WO2017036154A1 - Information processing method, server, and computer storage medium - Google Patents

Information processing method, server, and computer storage medium

Info

Publication number
WO2017036154A1
Authority
WO
WIPO (PCT)
Prior art keywords
virus
feature
file
operation instruction
classification model
Prior art date
Application number
PCT/CN2016/080796
Other languages
English (en)
French (fr)
Inventor
林舒婕
杨宜
李璐鑫
于涛
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017036154A1 (zh)
Priority to US15/700,650 (patent US11163877B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G06F21/564 Static detection by virus signature recognition
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F21/567 Computer malware detection or handling, e.g. anti-virus arrangements using dedicated hardware
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Definitions

  • The present invention relates to communication technologies, and in particular to an information processing method, a server, and a computer storage medium.
  • The mainstream solutions for identifying viruses are as follows. First, a segment of binary signature code that uniquely identifies the virus is extracted from a known virus sample and added to a virus database; during virus detection, the file is searched for matching virus signature data, thereby identifying the virus. Second, the unknown virus is run in a virtual machine so that its behavior can be detected and the virus identified.
  • The first solution, signature code extraction, identifies known viruses quickly and accurately, but it either fails to recognize new viruses or recognizes them with a high false alarm rate.
  • In the second solution, where the virus is run in a virtual machine to detect its behavior, the detection result depends entirely on the professional level of the analyst. This greatly increases labor cost and also suffers from false alarms. In addition, a virus may be packed, which forces the analyst to work through a large number of application programming interface (API) call sequences and makes the analysis inefficient.
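  • The first solution described above, and its blind spot for new variants, can be illustrated with a minimal Python sketch. The signature bytes, the virus name, and the function name `scan_for_signatures` are hypothetical and chosen only for illustration; real signature databases are far more elaborate.

```python
def scan_for_signatures(data: bytes, signature_db: dict[str, bytes]) -> list[str]:
    """Return the names of all known signatures that occur anywhere in `data`."""
    return [name for name, sig in signature_db.items() if sig in data]


# Hypothetical database: one made-up byte signature for a made-up virus name.
SIGNATURE_DB = {"Demo.Virus.A": bytes.fromhex("deadbeef4142")}

known_sample = b"\x00\x01" + bytes.fromhex("deadbeef4142") + b"\x02"
# Same payload with the last signature byte changed, standing in for a new variant.
new_variant = b"\x00\x01" + bytes.fromhex("deadbeef4143") + b"\x02"

print(scan_for_signatures(known_sample, SIGNATURE_DB))  # the known sample is found
print(scan_for_signatures(new_variant, SIGNATURE_DB))   # the variant is not
```

  • A one-byte change is enough to escape an exact-match signature, which is the weakness the classification-model approach of the embodiments is meant to address.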
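  • The embodiments of the present invention are intended to provide an information processing method, a server, and a computer storage medium that at least solve the problems existing in the prior art and can identify viruses accurately.
  • An information processing method includes:
  • acquiring at least one executable file of a specified type, extracting at least one first operation instruction from the at least one executable file of the specified type, determining whether the feature of the first operation instruction conforms to a preset policy, and, if so, determining the first operation instruction as a feature instruction;
  • extracting a feature value of the feature instruction, and using the feature value of the feature instruction to construct a virus classification model and obtain structural feature parameters of the virus; and
  • when at least one file to be detected is identified according to the virus classification model, extracting at least one second operation instruction from the file to be detected, determining whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus, and, if so, identifying the file to be detected as a virus file.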
  • A server includes:
  • a determining unit, configured to acquire at least one executable file of a specified type, extract at least one first operation instruction from the at least one executable file of the specified type, determine whether the feature of the first operation instruction conforms to a preset policy, and, if so, determine the first operation instruction as a feature instruction;
  • a processing unit, configured to extract a feature value of the feature instruction, and use the feature value of the feature instruction to construct a virus classification model and obtain structural feature parameters of the virus; and
  • an identification unit, configured to, when at least one file to be detected is identified according to the virus classification model, extract at least one second operation instruction from the file to be detected, determine whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus, and, if so, identify the file to be detected as a virus file.
  • Embodiments of the present invention also provide a computer storage medium in which computer executable instructions are stored, the computer executable instructions being configured to perform the above information processing method.
  • The information processing method of the embodiments of the present invention includes: acquiring at least one executable file of a specified type, extracting at least one first operation instruction from the at least one executable file of the specified type, determining whether the feature of the first operation instruction conforms to a preset policy, and, if so, determining the first operation instruction as a feature instruction; extracting a feature value of the feature instruction, and using the feature value of the feature instruction to construct a virus classification model and obtain structural feature parameters of the virus; and, when at least one file to be detected is identified according to the virus classification model, extracting at least one second operation instruction from the file to be detected, determining whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus, and, if so, identifying the file to be detected as a virus file.
  • With the embodiments of the present invention, an executable file that may be a virus file is detected and analyzed. When the feature of a first operation instruction in the executable file is found to conform to the preset policy, the executable file is regarded as a known virus file. The first operation instruction of the known virus file that conforms to the preset policy is used as a feature instruction and its feature value is extracted, so that a virus classification model can be constructed from the extracted feature values and the structural feature parameters of the virus can be obtained.
  • FIG. 1 is a schematic diagram of the hardware entities of the parties performing information interaction in an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of the implementation of Embodiment 1 of the present invention.
  • FIGS. 3-5 are schematic diagrams of examples of classification models (classifiers) according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of the implementation of Embodiment 2 of the present invention.
  • FIG. 7 is a schematic flowchart of the implementation of Embodiment 3 of the present invention.
  • FIG. 8 is a schematic structural diagram of the composition according to Embodiment 4 of the present invention.
  • FIG. 9 is a schematic diagram of the hardware structure according to Embodiment 5 of the present invention.
  • FIG. 1 is a schematic diagram of the hardware entities of the parties performing information interaction according to an embodiment of the present invention.
  • FIG. 1 includes terminal devices 11-12, base stations 21-23, and a server 31. Information is exchanged between terminals and between a terminal and the server.
  • During this exchange, files infected with viruses may also be transmitted.
  • Virus files include known virus files and/or unknown virus files. For the sake of information security during information interaction, viruses in the exchanged messages need to be identified.
  • The embodiments of the present invention collect virus files, both known and unknown, on the server 31 side. Because virus files are usually executable files, an executable file that may be a virus file is first detected and analyzed. When the feature of a first operation instruction in the executable file is found to conform to the preset policy, the executable file is regarded as a known virus file. The first operation instruction of the known virus file that conforms to the preset policy is used as a feature instruction and its feature value is extracted, so that a virus classification model can be constructed from the extracted feature values and the structural feature parameters of the virus can be obtained.
  • When a file to be detected is identified according to the virus classification model, the second operation instruction in the file to be detected is extracted, and it is determined whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus. If it does, the file to be detected is identified as a virus file.
  • FIG. 1 is only one example of a system architecture for implementing the embodiments of the present invention.
  • The embodiments of the present invention are not limited to the system structure shown in FIG. 1. The embodiments below are proposed on the basis of this system architecture.
  • The information processing method of this embodiment of the present invention is shown in FIG. 2 and includes the following steps.
  • Step 101: Acquire at least one executable file of a specified type, and extract at least one first operation instruction from the at least one executable file of the specified type.
  • An executable file that may be a virus file is first detected and analyzed.
  • Such an executable file may be a PE file, whose file name suffix is usually ".exe".
  • The PE file contains a large amount of static information.
  • The static information comprises ordinary information and instruction information.
  • The ordinary information may be function entry information.
  • The instruction information consists of the operation instructions that call the functions.
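  • As a hedged illustration of selecting "executable files of a specified type" in step 101, the sketch below checks whether a byte string has the basic PE layout: the `MZ` magic at offset 0 and a `PE\0\0` signature at the offset stored at 0x3C. The patent text only mentions the ".exe" suffix; the header check and the function name `is_pe_file` are assumptions added here for illustration.

```python
import struct


def is_pe_file(data: bytes) -> bool:
    """Check the DOS 'MZ' magic and the PE signature that the DOS header points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic 68-byte header used only to exercise the check; it is not a runnable program.
fake_pe = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x40) + b"PE\x00\x00"

print(is_pe_file(fake_pe))
print(is_pe_file(b"#!/bin/sh\necho hello\n"))
```

  • Extracting the operation instructions themselves would additionally require parsing the section table and disassembling the code section, which is outside the scope of this sketch.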
  • Step 102: Determine whether the feature of the first operation instruction conforms to a preset policy. If so, perform step 103; otherwise, perform no processing.
  • Because the server has not yet determined whether the PE file is a file needed for analysis, that is, a known virus file, the instruction information in the PE file must also be detected and analyzed.
  • The instruction information of known virus files has certain characteristics. For example, some instruction information in a virus file calls special system application program interface (API) functions to achieve its destructive purpose, such as reading and writing the registry or modifying critical system paths, and the code of viruses of the same kind is often similar. As another example, instruction information with these characteristics appears many times and at high frequency, or instruction information whose purpose is not yet clear, even if it lacks these characteristics, appears many times and at high frequency. As a further example, many viruses use encryption, so the size and the boundary of the region at the code entry can be extracted as feature values. These conditions are therefore used as the preset policy.
  • The features of the instruction information in the PE file are compared with the features in the preset policy. Instruction information found to have one of these features is used as an important feature instruction for the subsequent construction of the classifier.
  • Step 103: If the feature of the first operation instruction conforms to the preset policy, determine the first operation instruction as a feature instruction.
  • The preset policy includes the conditions mentioned in step 102. Some instruction information in a virus file calls special system API functions to achieve its destructive purpose, such as reading and writing the registry or modifying critical system paths, and the code of viruses of the same kind is often similar. Instruction information with these characteristics appears many times and at high frequency, or instruction information whose purpose is not yet clear appears many times and at high frequency even without these characteristics. Many viruses use encryption, so the size and the boundary of the region at the code entry can be extracted as feature values. The preset policy is not limited to these conditions.
  • Step 104: Extract a feature value of the feature instruction, and use the feature value of the feature instruction to construct a virus classification model and obtain structural feature parameters of the virus.
  • Through steps 101-103, the feature values of the important feature instructions in known virus files are obtained. These feature values are used to construct a virus classification model, from which the structural feature parameters of the virus can be derived.
  • Step 105: When at least one file to be detected is identified according to the virus classification model, extract at least one second operation instruction from the file to be detected.
  • The file to be detected may be a new type of virus file or a normal file that is not infected with a virus.
  • Identifying a file to be detected according to the virus classification model in effect means matching it against the structural feature parameters of the virus.
  • The classification model contains the structural feature parameters of various viruses.
  • There is at least one virus classification model. When there is more than one, for example two, the structural feature parameters of the virus in either model may be used to match the file to be detected. Alternatively, the two models may first be iterated or superimposed to obtain a comprehensive classification model, and the structural feature parameters of the virus in the comprehensive classification model are then used to match the file to be detected.
  • There may also be more virus classification models, for example five, of which at least two are iterated or superimposed to obtain a comprehensive classification model. For example, the models are iterated or superimposed in pairs to obtain a first comprehensive classification model and a second comprehensive classification model, which are then iterated or superimposed with the remaining original classification model.
  • FIGS. 3-5 show several iterated or superimposed combinations of classification models (or classifiers).
  • Step 106: Determine whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus. If so, perform step 107; otherwise, perform no processing.
  • Step 107: If the feature value of the second operation instruction conforms to the structural feature parameters of the virus, identify the file to be detected as a virus file.
  • The file to be detected may be a new type of virus file or a normal file that is not infected with a virus. The feature values of the instruction information in the file to be detected are extracted and identified according to the at least one virus classification model, which in effect means matching them against the structural feature parameters of the virus. As long as there is at least one match, the file to be detected is a virus file. Furthermore, the classification result of the virus classification model makes it possible to determine which type of virus the virus file, or one or more pieces of instruction information in the virus file, belongs to.
  • FIGS. 3-5 show several iterated or superimposed combinations of classification models (or classifiers).
  • The classification constructor is denoted A11. As shown in FIG. 3, three virus classification models are constructed in the classification constructor, and the file to be detected is matched against the structural feature parameters in any one of the virus classification models. As shown in FIG. 4, five virus classification models are constructed in the classification constructor. Models 1 to 4 are first iterated or superimposed to obtain comprehensive classification model 1 and comprehensive classification model 2, while model 5 is not iterated or superimposed. The structural feature parameters of the virus in comprehensive classification model 1, comprehensive classification model 2, and model 5 are then used to match the file to be detected; as shown in FIG.
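  • The FIG. 4 arrangement and the "at least one match" rule described above could be sketched as follows. The patent does not say how models are iterated or superimposed, so averaging two scores is an assumption here, as are the score range, the 0.5 threshold, and all names (`Model`, `superimpose`, `is_virus`, `make_model`).

```python
from typing import Callable, Sequence

# Assumed interface: a model maps feature values to a virus score in [0, 1].
Model = Callable[[Sequence[float]], float]


def superimpose(a: Model, b: Model) -> Model:
    """Combine two models by averaging their scores (an assumed combination rule)."""
    return lambda features: (a(features) + b(features)) / 2.0


def is_virus(features: Sequence[float], models: Sequence[Model],
             threshold: float = 0.5) -> bool:
    """Flag the file as soon as any one model reports a match."""
    return any(model(features) >= threshold for model in models)


def make_model(index: int) -> Model:
    """Toy model that simply reads one feature value as its score."""
    return lambda features: features[index]


m1, m2, m3, m4, m5 = (make_model(i) for i in range(5))
# FIG. 4 layout: models 1-4 are combined in pairs, model 5 is used as is.
fig4_models = [superimpose(m1, m2), superimpose(m3, m4), m5]

print(is_virus([0.9, 0.3, 0.0, 0.0, 0.0], fig4_models))
print(is_virus([0.9, 0.0, 0.2, 0.2, 0.1], fig4_models))
```

  • In the second call model 1 alone would have matched, but the averaged comprehensive model does not, which shows that the choice of combination rule changes the detection outcome.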
  • In step 102, whether the feature of the first operation instruction conforms to the preset policy is determined in one or more of the following manners. Manner 1: determine whether the first operation instruction is used to call a specified system application program interface (API) function to perform operations including modifying the registry or modifying critical system paths; if so, the preset policy is met. Manner 2: determine whether the frequency and/or number of occurrences of the first operation instruction reaches a preset threshold; if so, the preset policy is met. Manner 3: determine whether the size and/or boundary of the region at the entry of the static code information contained in the first operation instruction meets a preset condition; if so, the preset policy is met.
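  • The three manners can be sketched as a single predicate. The `Instruction` record, its field names, the threshold values, and the form of the manner 3 condition are all hypothetical; the patent leaves the preset threshold and preset condition unspecified. The two API names are real Windows registry functions, chosen here only as examples of registry-modifying calls.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Illustrative choice of "specified system API functions" (registry writes).
SENSITIVE_APIS = {"RegSetValueExA", "RegCreateKeyExA"}


@dataclass(frozen=True)
class Instruction:
    """Hypothetical record for one operation instruction extracted from a PE file."""
    mnemonic: str
    called_api: Optional[str] = None
    entry_region_size: Optional[int] = None


def conforms_to_policy(instr: Instruction, counts: Counter,
                       min_count: int = 3, max_entry_region: int = 4096) -> bool:
    manner_1 = instr.called_api in SENSITIVE_APIS            # calls a specified API
    manner_2 = counts[instr.mnemonic] >= min_count           # occurs often enough
    manner_3 = (instr.entry_region_size is not None          # assumed preset condition
                and instr.entry_region_size > max_entry_region)
    return manner_1 or manner_2 or manner_3


def select_feature_instructions(instructions: list) -> list:
    """Keep the instructions that conform to at least one manner of the policy."""
    counts = Counter(i.mnemonic for i in instructions)
    return [i for i in instructions if conforms_to_policy(i, counts)]


sample = [
    Instruction("call", called_api="RegSetValueExA"),
    Instruction("mov"),
    Instruction("xor"), Instruction("xor"), Instruction("xor"),
    Instruction("jmp", entry_region_size=8192),
]
features = select_feature_instructions(sample)
print([i.mnemonic for i in features])
```

  • Only the single `mov` is dropped: it calls no listed API, appears once, and carries no entry-region information.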
  • The information processing method of this embodiment of the present invention is shown in FIG. 6 and includes the following steps.
  • Step 201: Acquire at least one executable file of a specified type, and extract at least one first operation instruction from the at least one executable file of the specified type.
  • An executable file that may be a virus file is first detected and analyzed.
  • Such an executable file may be a PE file, whose file name suffix is usually ".exe".
  • The PE file contains a large amount of static information.
  • The static information comprises ordinary information and instruction information.
  • The ordinary information may be function entry information.
  • The instruction information consists of the operation instructions that call the functions.
  • Step 202: Determine whether the feature of the first operation instruction conforms to a preset policy. If so, perform step 203; otherwise, perform no processing.
  • Because the server has not yet determined whether the PE file is a file needed for analysis, that is, a known virus file, the instruction information in the PE file must also be detected and analyzed.
  • The instruction information of known virus files has certain characteristics. For example, some instruction information in a virus file calls special system application program interface (API) functions to achieve its destructive purpose, such as reading and writing the registry or modifying critical system paths, and the code of viruses of the same kind is often similar. As another example, instruction information with these characteristics appears many times and at high frequency, or instruction information whose purpose is not yet clear, even if it lacks these characteristics, appears many times and at high frequency. As a further example, many viruses use encryption, so the size and the boundary of the region at the code entry can be extracted as feature values. These conditions are therefore used as the preset policy.
  • The features of the instruction information in the PE file are compared with the features in the preset policy. Instruction information found to have one of these features is used as an important feature instruction for the subsequent construction of the classifier.
  • Step 203: If the feature of the first operation instruction conforms to the preset policy, determine the first operation instruction as a feature instruction.
  • The preset policy includes the conditions mentioned in step 202. Some instruction information in a virus file calls special system API functions to achieve its destructive purpose, such as reading and writing the registry or modifying critical system paths, and the code of viruses of the same kind is often similar. Instruction information with these characteristics appears many times and at high frequency, or instruction information whose purpose is not yet clear appears many times and at high frequency even without these characteristics. Many viruses use encryption, so the size and the boundary of the region at the code entry can be extracted as feature values. The preset policy is not limited to these conditions.
  • Step 204: Use the feature value of the feature instruction as a reference sample, perform information impurity analysis on the reference sample to obtain a baseline reference value for dividing virus types, and divide the virus classification model according to the baseline reference value to obtain at least one virus classification model, from which the structural feature parameters of the virus that characterize the virus classification are obtained.
  • Through steps 201-203, the feature values of the important feature instructions in known virus files are obtained and used to construct a virus classification model. The structural feature parameters of the virus obtained in this way summarize a general representation of new viruses, their variants, and viruses of similar structure, which provides a good basis for the final virus detection.
  • Dividing the virus classification model according to the baseline reference value to obtain at least one virus classification model includes: when the virus classification model is a decision tree model, using the baseline reference value as a branch node of the decision tree, dividing the samples to the left and right sides of the branch node, and establishing at least one classification and regression tree in the direction in which the gradient of the residual decreases. The at least one classification model is formed from the at least one classification and regression tree.
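  • One plausible reading of the impurity analysis in step 204 is the search for the split value that minimizes impurity on the two sides of a branch node. The sketch below uses Gini impurity, which is an assumption; the patent does not name the impurity measure. The function names and the toy data are likewise hypothetical.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity of a list of class labels (0.0 for a pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: list, labels: list) -> tuple:
    """Return (threshold, weighted impurity) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy reference sample: one feature value per file, with its known class.
feature_values = [1, 2, 3, 10, 11, 12]
classes = ["clean", "clean", "clean", "virus", "virus", "virus"]
print(best_split(feature_values, classes))
```

  • The returned threshold plays the role of the baseline reference value at the branch node: samples at or below it go to the left side, the rest to the right.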
  • Step 205: When at least one file to be detected is identified according to the virus classification model, extract at least one second operation instruction from the file to be detected.
  • The file to be detected may be a new type of virus file or a normal file that is not infected with a virus.
  • Identifying a file to be detected according to the virus classification model in effect means matching it against the structural feature parameters of the virus.
  • The classification model contains the structural feature parameters of various viruses.
  • There is at least one virus classification model. When there is more than one, for example two, the structural feature parameters of the virus in either model may be used to match the file to be detected. Alternatively, the two models may first be iterated or superimposed to obtain a comprehensive classification model, and the structural feature parameters of the virus in the comprehensive classification model are then used to match the file to be detected.
  • There may also be more virus classification models, for example five, of which at least two are iterated or superimposed to obtain a comprehensive classification model. For example, the models are iterated or superimposed in pairs to obtain a first comprehensive classification model and a second comprehensive classification model, which are then iterated or superimposed with the remaining original classification model.
  • FIGS. 3-5 show several iterated or superimposed combinations of classification models (or classifiers).
  • Step 206: Determine whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus. If so, perform step 207; otherwise, perform no processing.
  • Step 207: If the feature value of the second operation instruction conforms to the structural feature parameters of the virus, identify the file to be detected as a virus file.
  • The file to be detected may be a new type of virus file or a normal file that is not infected with a virus. The feature values of the instruction information in the file to be detected are extracted and identified according to the at least one virus classification model, which in effect means matching them against the structural feature parameters of the virus. If there is at least one match, the file to be detected is a virus file. Furthermore, the classification result of the virus classification model makes it possible to determine which type of virus the virus file, or one or more pieces of instruction information in the virus file, belongs to.
  • In a specific application of the information processing method of the embodiments of the present invention, when the virus classification model is a decision tree model, the baseline reference value is used as a branch node of the decision tree, the samples are divided to the left and right sides of the branch node, and at least one classification and regression tree is established in the direction in which the gradient of the residual decreases.
  • The process of this specific application includes:
  • Step 301: Acquire an estimated value of the reference sample and an actual value obtained by training.
  • Step 302: Calculate, from the estimated value and the actual value, the gradient formed by the residual between them.
  • Step 303: Establish a new classification and regression tree in the direction in which each gradient decreases, and repeat N times, where N is a natural number greater than 1, to obtain N classification and regression trees.
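  • Steps 301-303 read like gradient boosting with regression trees. The sketch below is one way to realize them under stated assumptions: squared-error loss (for which the negative gradient equals the residual, actual minus estimate), one-split trees (stumps), and a learning rate of 0.5. None of these choices, nor the names `fit_stump` and `boost`, come from the patent.

```python
def fit_stump(xs: list, residuals: list):
    """Fit a one-split regression tree to the residuals by least squares.

    Assumes `xs` contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs: list, ys: list, n_trees: int = 20, learning_rate: float = 0.5):
    """Build N trees, each fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    estimates = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        # Steps 301-302: residual between actual and estimated values.
        residuals = [y - e for y, e in zip(ys, estimates)]
        # Step 303: a new tree in the direction that reduces the residual.
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        estimates = [e + learning_rate * tree(x) for x, e in zip(xs, estimates)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Toy data: one feature value per sample, label 1.0 for virus and 0.0 for clean.
model = boost([1, 2, 3, 10, 11, 12], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(round(model(2), 3), round(model(11), 3))
```

  • With each round the remaining residual on this toy data halves, so after 20 trees the predictions are very close to the labels.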
  • Embodiment 4:
  • The server includes:
  • a determining unit 41, configured to acquire at least one executable file of a specified type, extract at least one first operation instruction from the at least one executable file of the specified type, determine whether the feature of the first operation instruction conforms to a preset policy, and, if so, determine the first operation instruction as a feature instruction;
  • a processing unit 42, configured to extract the feature value of the feature instruction, and use the feature value of the feature instruction to construct a virus classification model and obtain structural feature parameters of the virus; and
  • an identifying unit 43, configured to, when at least one file to be detected is identified according to the virus classification model, extract at least one second operation instruction from the file to be detected, determine whether the feature value of the second operation instruction conforms to the structural feature parameters of the virus, and, if so, identify the file to be detected as a virus file.
  • With this embodiment, an executable file that may be a virus file can be detected and analyzed. When the feature of a first operation instruction in the executable file is found to conform to the preset policy, the executable file is regarded as a known virus file. The first operation instruction of the known virus file that conforms to the preset policy is used as a feature instruction and its feature value is extracted, so that a virus classification model can be constructed from the extracted feature values and the structural feature parameters of the virus can be obtained.
  • The determining unit 41 is further configured to determine whether the feature of the first operation instruction conforms to the preset policy in one or more of the following manners. Manner 1: determine whether the first operation instruction is used to call a specified system application program interface (API) function to perform operations including modifying the registry or modifying critical system paths; if so, the preset policy is met. Manner 2: determine whether the frequency and/or number of occurrences of the first operation instruction reaches a preset threshold; if so, the preset policy is met. Manner 3: determine whether the size and/or boundary of the region at the entry of the static code information contained in the first operation instruction meets a preset condition; if so, the preset policy is met.
  • the instruction information of known virus files has certain characteristics. For example, some instructions in a virus file call special system application programming interface (API) functions to achieve a destructive purpose, such as reading and writing the registry or modifying critical system paths, and the code of viruses of the same kind is often similar. As another example, instructions with these characteristics occur many times and frequently, or instructions occur many times and frequently even when no definite characteristic has yet been established. As a further example, many viruses use encryption, so the size and boundary of the region at the code entry can be extracted as feature values.
  • these cases are therefore all used as preset policies. When the instruction information in a PE file is analyzed, its features are compared with the features in the preset policies; if a match is found, the instruction is treated as an important feature instruction for the subsequent construction of the classifier.
  • the processing unit 42 may further include: a first sub-processing unit configured to use the feature value of the feature instruction as a reference sample and perform information impurity analysis on the reference sample to obtain a baseline reference value for dividing virus types; and a second sub-processing unit configured to divide the virus classification model according to the baseline reference value to obtain at least one virus classification model, and to obtain, through the at least one virus classification model, the structural feature parameters of the virus that characterize the virus classification.
  • the first sub-processing unit may specifically be a feature value extraction module configured to extract the feature value of the feature instruction, use it as a reference sample, and perform information impurity analysis on the reference sample to obtain a baseline reference value for dividing virus types.
  • the second sub-processing unit may specifically be a classifier construction module that divides the virus classification model according to the baseline reference value to obtain at least one virus classification model, and obtains, through the at least one virus classification model, the structural feature parameters of the virus that characterize the virus classification.
  • the identifying unit may specifically be a virus detection module configured to identify a virus and to detect which virus class the identified virus belongs to.
  • when the virus classification model is a decision tree model, the baseline reference value is used as a branch node of the decision tree, the samples are partitioned to the left and right of the branch node, and at least one classification and regression tree is built in the direction in which the gradient of the residual decreases; the at least one classification and regression tree constitutes the at least one virus classification model.
  • specifically, the estimated value of the reference sample and the actual value obtained through training are obtained; the gradient formed by the residual between the estimated value and the actual value is computed; and in each round a new classification and regression tree is built in the direction in which the gradient decreases. This is repeated N times, where N is a natural number greater than 1, to obtain N classification and regression trees.
  • Embodiment 5:
  • the foregoing server may be an electronic device formed by a cluster system, in which the unit functions are either merged into one device or deployed separately.
  • the client and the server each include at least a database for storing data and a processor for data processing, or include a storage medium located in the server or a separately deployed storage medium.
  • for the processor used for data processing, processing may be performed by a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • the storage medium contains operation instructions, which may be computer-executable code, and the operation instructions are used to implement the information processing method of the foregoing embodiments of the present invention.
  • the apparatus includes a processor 51, a storage medium 52, and at least one external communication interface 53; the processor 51, the storage medium 52, and the external communication interface 53 are all connected by a bus 54.
  • Embodiments of the present invention also provide a computer storage medium storing computer-executable instructions configured to perform the foregoing information processing method.
  • the embodiment of the present invention is specifically a scheme for classifying and predicting unknown viruses based on a gradient boosting decision tree.
  • the embodiment of the present invention predicts the virus type mainly by analyzing the similarity between the structure of important feature instructions in the PE files of unknown and known viruses, which improves the detection success rate for unknown viruses.
  • the solution does not need to run the virus; it only needs to extract and analyze the static information in the PE file.
  • the method of the embodiment of the present invention may be implemented with a feature value extraction module, a classifier construction module, and a virus detection module.
  • Specific examples and algorithms are as follows. Through the feature value extraction module, the classifier construction module, and the virus detection module, at least the following processing must be completed: 1) extract features from known viruses; 2) store the extracted features in a database for later classifier construction; 3) construct, from the extracted features, the classifier used to identify viruses and their specific classes; 4) use the constructed classifier to detect unknown virus files, identify viruses, and output the virus class.
  • the vast majority of virus files are in PE format. In most operations a virus file behaves like an ordinary PE file, but it often achieves its destructive purpose by calling certain special system API functions, such as those that read and write the registry or modify critical system paths. The code of viruses of the same kind is often similar, so the important feature instructions in the PE file are extracted and their occurrence counts serve as the basis for judgment. Because many viruses use encryption, the size and boundary of the region at the code entry can also be extracted as feature values for comparison.
  • for example, from m virus files in PE format, n meaningful operation instructions are selected as features to form an m*n matrix X, where X(ij) is the number of times the j-th feature occurs in the i-th file; the virus classes of the m virus files form an m-dimensional vector Y, where Y(i) is the virus class of the i-th virus file.
  • in short, viruses are identified and classified by analyzing the similarity of important instructions and structures in PE files.
  • the classifier of the embodiment of the invention is implemented with a gradient boosting decision tree.
  • the principle is as follows: when the algorithm starts, each sample is assigned an estimated value, and initially all samples have the same estimate. The model obtained at each training step (a classification and regression tree) estimates some data points correctly and others incorrectly, so the gradient of the residual between each data point's estimated value and its actual value is computed.
  • in each round a new model is built in the direction in which the gradient decreases. Repeating this N times (N is specified by the user) yields N simple classifiers (classification and regression trees), which are combined (by weighting, voting, or the like) into a final model.
  • the N simple classifiers are built as classification and regression trees. The specific steps include:
  • step d: if the number of rows in LX or RX after the split is smaller than a user-set number, return to step a and continue searching for X(kl); if the information impurity measure is below a certain value, perform step e, otherwise perform step g;
  • step f: select X(kl) as the split node; the left subtree of the node holds the data set LX, Lg with values greater than the feature value, and the right subtree holds the data set RX, Rg with values less than or equal to the feature value;
  • the two core points are: 1) A classification and regression tree is used, with the information impurity measure determining the branch nodes of the tree. By analogy with an amplifier in communications, which amplifies a signal so that it is easier to recognize and distinguish, the more impure the information is, the more readily it serves as a reference value for distinction, so impurity analysis makes it easier to separate out virus classes. 2) Building classification and regression trees in the direction in which the gradient of the residual decreases brings the predicted result closer to the true result.
  • the classifier yields N simple classifiers, which are applied in turn to the unknown virus file. In each one, the file follows the branch conditions of the tree to the leaf node it matches; the information gain of the leaf node reached in each classifier is computed, and the class with the largest sum is the final result. Any single classifier may be used, or several classifiers may be iterated together; the iterative scheme avoids overfitting of the trees and also improves the success rate of virus identification and classification.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in an actual implementation other divisions are possible, such as combining multiple units or components, integrating them into another system, or ignoring or not executing some features.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
  • the units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units, that is, may be located in one place or distributed to a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • a person of ordinary skill in the art will understand that all or part of the steps of the foregoing embodiments may be completed by hardware under the control of program instructions. The foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the above-described integrated unit of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a standalone product.
  • the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • the embodiment of the present invention detects and analyzes an executable file that may be a virus file. When a feature of the first operation instruction in the executable file is found to conform to the preset policy, the executable file is treated as a known virus file, and the first operation instruction that conforms to the preset policy is used as a feature instruction whose feature value is extracted, so that a virus classification model can be constructed from the extracted feature value and the structural feature parameters of the virus obtained.
  • when a file to be detected is examined to see whether it is a virus file, the second operation instruction in the file is extracted and its feature value is checked against the structural feature parameters of the virus; if it matches, the file to be detected is identified as a virus file. No manual involvement in the analysis is needed, and the virus can be identified accurately by comparing structural feature parameters, which improves both recognition accuracy and recognition efficiency.

Abstract

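An information processing method, a server, and a computer storage medium. The information processing method includes: obtaining at least one executable file of a specified type and extracting at least one first operation instruction from the at least one executable file (101); determining whether a feature of the first operation instruction conforms to a preset policy (102) and, if so, determining the first operation instruction to be a feature instruction (103); extracting a feature value of the feature instruction, using the feature value to construct a virus classification model, and obtaining structural feature parameters of a virus through analysis (104); when identifying at least one file to be detected according to the virus classification model, extracting at least one second operation instruction from the at least one file to be detected (105); determining whether a feature value of the second operation instruction matches the structural feature parameters of the virus (106); and, if it does, identifying the file to be detected as a virus file (107).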

Description

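Information processing method, server, and computer storage medium
Technical Field
The present invention relates to communication technologies, and in particular to an information processing method, a server, and a computer storage medium.
Background
With the development of network technology, viruses spread faster and are more destructive, which poses a great challenge to antivirus work. The mainstream existing solutions for identifying viruses are: (1) extracting a binary signature from a known virus sample, where the signature uniquely identifies the virus, adding the signature to a virus database, and searching for matching signature data during detection in order to identify the virus; and (2) running an unknown virus in a virtual machine to observe its behavior and thereby identify it.
The existing solutions have the following drawbacks. The first solution, signature extraction, identifies known viruses quickly and accurately, but it either fails to identify new viruses or does so with a high false-positive rate. With the second solution, running the virus in a virtual machine to observe its behavior, the result depends entirely on the expertise of the analyst, which greatly increases labor cost and still suffers from false positives; in addition, viruses may be packed, so the analyst faces a very large number of application programming interface (API) call sequences, which makes the analysis inefficient. The related art offers no effective solution to the problem of identifying viruses accurately.
Summary
In view of this, embodiments of the present invention aim to provide an information processing method, a server, and a computer storage medium that solve at least the problems of the prior art and can identify viruses accurately.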
本发明实施例的技术方案是这样实现的:
本发明实施例的一种信息处理方法,所述方法包括:
获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第一操作指令确定为特征指令;
提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;
根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
本发明实施例的一种服务器,所述服务器包括:
判断单元,配置为获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第一操作指令确定为特征指令;
处理单元,配置为提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;
识别单元,配置为根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
本发明实施例还提供一种计算机存储介质,其中存储有计算机可执行指令,该计算机可执行指令配置执行上述信息处理方法。
本发明实施例的信息处理方法,包括:获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第一操作指令确定为特征指令;提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
采用本发明实施例,对可能为病毒文件的可执行文件进行检测分析,当发现该可执行文件中的第一操作指令的特征符合预设策略,则将该可执行文件作为已知的病毒文件,将该已知的病毒文件中符合预设策略的第一操作指令作为特征指令并对其特征值进行提取,以便于通过提取的该特征值来构造病毒分类模型,从而分析得到病毒的结构特征参数。对待检测的文件进行识别,看是否为病毒文件时,提取该待检测的文件中的第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。不仅无需人工介入病毒识别的分析,而且能通过结构特征参数的比对,精准的识别出病毒,从而提高了识别精度和识别效率。
附图说明
图1为本发明实施例中进行信息交互的各方硬件实体的示意图;
图2为本发明实施例一的一个实现流程示意图;
图3-5为本发明实施例分类模型(分类器)的实例示意图;
图6为本发明实施例二的一个实现流程示意图;
图7为本发明实施例三的一个实现流程示意图;
图8为本发明实施例四的一个组成结构示意图;
图9为本发明实施例五的一个硬件结构示意图。
具体实施方式
下面结合附图对技术方案的实施作进一步的详细描述。
图1为本发明实施例中进行信息交互的各方硬件实体的示意图,图1中包括:终端设备11-12,基站21-23,服务器31,终端与终端之间,终端与服务器之间在信息交互的过程中除了传输未感染病毒的正常文件,也可能会传播感染病毒的病毒文件,病毒文件包括已知病毒文件和/或未知病毒文件,为了信息交互中的信息安全考虑,需要从众多信息中识别出病毒。
基于上述图1所示的系统,采用本发明实施例,是通过在服务器31侧收集病毒文件,包括已知病毒文件和未知病毒文件,由于病毒文件通常为可执行文件,因此,首先对可能为病毒文件的可执行文件进行检测分析,当发现该可执行文件中的第一操作指令的特征符合预设策略,则将该可执行文件作为已知的病毒文件,将该已知的病毒文件中符合预设策略的第一操作指令作为特征指令并对其特征值进行提取,以便于通过提取的该特征值来构造病毒分类模型,从而分析得到病毒的结构特征参数。之后,对待检测的文件进行识别,看是否为病毒文件时,提取该待检测的文件中的第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
针对图1所示的系统架构,举例来说,在服务器31中至少需要完成以下处理:1)对已知病毒的特征提取处理;2)将提取的特征存入数据库以用于后续构造分类器使用;3)根据提取的特征构造用于识别病毒和病毒具体分类的该分类器;4)对未知病毒文件利用该构造分类器进行病毒检测,识别出病毒,输出病毒分类。
上述图1的例子只是实现本发明实施例的一个系统架构实例,本发明实施例并不限于上述图1所述的系统结构,基于该系统架构,提出本发明各个实施例。
实施例一、
本发明实施例的信息处理方法,如图2所示,所述方法包括:
步骤101、获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令。
这里,由于病毒文件通常为可执行文件,因此,首先对可能为病毒文件的可执行文件进行检测分析,该可能为病毒文件的可执行文件可以称为PE文件,通常该PE文件的文件名后缀为“.exe”。PE文件中包含大量的静态信息,静态信息中有普通的信息,也有指令信息,比如普通的信息可以是函数入口信息,指令信息是调用该函数的操作指令。
步骤102、判断所述第一操作指令的特征是否符合预设策略,如果是,则执行步骤103,否则,不予进行处理。
这里,由于服务器并不确定该PE文件是否就是分析所需要的文件,即已知病毒文件,因此,还需要对PE文件中的指令信息进行检测分析。
由于已知病毒文件的指令信息具备一定的特征,比如,病毒文件中的某些指令信息会调用一些较为特殊的系统应用程序接口(API)函数来达到其破坏目的,如对注册表的读写操作、修改系统关键路径,同种病毒的代码部分往往具有相似性;比如,具备这些特征的指令信息的出现次数和频率很频繁,或者即便不具有某种特征,还未明确的情况下指令信息的出现次数和频率很多频繁;比如,很多病毒采用加密技术,可以提取代码入口处的区域大小、边界等做为特征值等等;因此,将这些情况都作为预设策略,对PE文件中的指令信息进行检测分析时,按照指令信息的特征与预设策略中的特征进行比对,如果发现具备一定的特征,则将这样的指令信息 作为重要特征指令,用于后续构造分类器使用。
步骤103、所述第一操作指令的特征符合所述预设策略,将所述第一操作指令确定为特征指令。
这里,预设策略包括:上述步骤102中提及的,病毒文件中的某些指令信息会调用一些较为特殊的系统API函数来达到其破坏目的,如对注册表的读写操作、修改系统关键路径,同种病毒的代码部分往往具有相似性;比如,具备这些特征的指令信息的出现次数和频率很频繁,或者即便不具有某种特征,还未明确的情况下指令信息的出现次数和频率很多频繁;比如,很多病毒采用加密技术,可以提取代码入口处的区域大小、边界等做为特征值等等这些策略,但是不限于本文指出的这些策略。
步骤104、提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数。
这里,通过上述步骤101-103对已知病毒文件中重要特征指令的特征值分析,并将所述特征指令的特征值用于构造病毒分类模型,分析得到的该病毒的结构特征参数必然能概括出大部分病毒及其变种或类似结构的新型病毒的通用表现形式,从而为最终的病毒检测提供良好的判断依据。
步骤105、根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令。
待检测文件可以为新型的病毒文件,也可能为未感染病毒的正常文件,根据所述病毒分类模型对至少一个待检测的文件进行识别,实际上是进行病毒的结构特征参数的匹配,该病毒分类模型中包括了各种病毒的结构特征参数。病毒分类模型为至少一个,当病毒分类模型为不止一个时,比如两个病毒分类模型,可以将其中任意一个病毒分类模型中的病毒的结构特征参数用于与待检测文件进行匹配,也可以将两个病毒分类模型进行迭代或叠加后先得到综合分类模型,再将该综合分类模型中的病毒的结构特征 参数用于与待检测文件进行匹配。当然,病毒分类模型可以为更多,比如五个,可以与其中至少一个病毒分类模型进行迭代或叠加后得到综合分类模型,如,分别两两迭代或叠加得到第一综合分类模型和第二综合分类模型,再与剩下的一个原有的分类模型进行迭代或叠加等等。如图3-图5所示为几种分类模型(或称分类器)的迭代或叠加组合实例。
步骤106、判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果是,则执行步骤107;否则,不予进行处理。
步骤107、所述第二操作指令的特征值符合所述病毒的结构特征参数,识别出所述待检测的文件为病毒文件。
待检测文件可以为新型的病毒文件,也可能为未感染病毒的正常文件,提取该待检测文件中的指令信息特征值,将其根据所述至少一个病毒分类模型进行识别,实际上是进行病毒的结构特征参数的匹配,只要有至少一个匹配结果,则说明该待检测文件为病毒文件,进一步,可以根据病毒分类模型的病毒分类结果对该病毒文件、或病毒文件中某条或多条指令信息为哪一类病毒进行区分。
如图3-5所示为几种分类模型(或称分类器)的迭代或叠加组合实例,图3-图5中以A11代表分类构造器,如图3所示,分类构造器中所构造的病毒分类模型有3个,待检测文件分别与其中任意一个病毒分类模型中的结构特征参数进行匹配;如图4所示,分类构造器中所构造的病毒分类模型有5个,模型1-4这4个病毒分类模型两两进行迭代或叠加后先得到综合分类模型1和综合分类模型2,模型5不进行迭代或叠加,之后将综合分类模型1、综合分类模型2、模型5中的病毒的结构特征参数用于与待检测文件进行匹配;如图5所示,分类构造器中所构造的病毒分类模型有5个,模型1-2这2个病毒分类模型两两进行迭代或叠加后先得到综合分类模型1,模型3-5这3个病毒分类模型进行迭代或叠加后先得到综合分类模型2’, 之后将综合分类模型1、综合分类模型2’、中的病毒的结构特征参数用于与待检测文件进行匹配。
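Based on Embodiment 1, in practice, determining in step 102 whether the feature of the first operation instruction conforms to the preset policy follows one or more of the following manners. Manner 1: determine whether the first operation instruction is used to call a specified system application programming interface (API) function in order to perform operations including modifying the registry or modifying a critical system path; if so, the preset policy is met. Manner 2: determine whether the frequency and/or number of occurrences of the first operation instruction reaches a preset threshold; if so, the preset policy is met. Manner 3: determine whether the size and/or boundary of the region at the entry of the static code information contained in the first operation instruction meets a preset condition; if so, the preset policy is met.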
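The three manners of the preset policy can be sketched as a single predicate. This is only an illustration under stated assumptions, not the implementation described in the application: the `Instruction` record, the API names in `SUSPICIOUS_APIS`, the threshold values, and the form of the "preset condition" in manner 3 are all hypothetical, since the text names none of them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical examples of registry / system-path APIs; the source names no specific functions.
SUSPICIOUS_APIS = {"RegSetValueExA", "RegCreateKeyExA", "MoveFileExA"}


@dataclass
class Instruction:
    mnemonic: str                             # e.g. "call"
    api_name: Optional[str] = None            # API the instruction calls, if any
    count: int = 1                            # number of occurrences in the file
    entry_region_size: Optional[int] = None   # size of the region at the code entry


def meets_preset_policy(ins, min_count=50, max_entry_size=512):
    """Return True if the instruction matches at least one of the three manners."""
    # Manner 1: the instruction calls a specified system API function.
    if ins.api_name in SUSPICIOUS_APIS:
        return True
    # Manner 2: the occurrence count reaches a preset threshold (assumed value).
    if ins.count >= min_count:
        return True
    # Manner 3: the entry region meets a preset condition (assumed: "small region").
    if ins.entry_region_size is not None and ins.entry_region_size <= max_entry_size:
        return True
    return False
```

An instruction for which `meets_preset_policy` returns `True` would be kept as a feature instruction; all others are ignored, as in step 102.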
实施例二、
本发明实施例的信息处理方法,如图6所示,所述方法包括:
步骤201、获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令。
这里,由于病毒文件通常为可执行文件,因此,首先对可能为病毒文件的可执行文件进行检测分析,该可能为病毒文件的可执行文件可以称为PE文件,通常该PE文件的文件名后缀为“.exe”。PE文件中包含大量的静态信息,静态信息中有普通的信息,也有指令信息,比如普通的信息可以是函数入口信息,指令信息是调用该函数的操作指令。
步骤202、判断所述第一操作指令的特征是否符合预设策略,如果是,则执行步骤203,否则,不予进行处理。
这里,由于服务器并不确定该PE文件是否就是分析所需要的文件,即已知病毒文件,因此,还需要对PE文件中的指令信息进行检测分析。
由于已知病毒文件的指令信息具备一定的特征,比如,病毒文件中的某些指令信息会调用一些较为特殊的系统应用程序接口(API)函数来达到 其破坏目的,如对注册表的读写操作、修改系统关键路径,同种病毒的代码部分往往具有相似性;比如,具备这些特征的指令信息的出现次数和频率很频繁,或者即便不具有某种特征,还未明确的情况下指令信息的出现次数和频率很多频繁;比如,很多病毒采用加密技术,可以提取代码入口处的区域大小、边界等做为特征值等等;因此,将这些情况都作为预设策略,对PE文件中的指令信息进行检测分析时,按照指令信息的特征与预设策略中的特征进行比对,如果发现具备一定的特征,则将这样的指令信息作为重要特征指令,用于后续构造分类器使用。
步骤203、所述第一操作指令的特征符合所述预设策略,将所述第一操作指令确定为特征指令。
这里,预设策略包括:上述步骤202中提及的,病毒文件中的某些指令信息会调用一些较为特殊的系统API函数来达到其破坏目的,如对注册表的读写操作、修改系统关键路径,同种病毒的代码部分往往具有相似性;比如,具备这些特征的指令信息的出现次数和频率很频繁,或者即便不具有某种特征,还未明确的情况下指令信息的出现次数和频率很多频繁;比如,很多病毒采用加密技术,可以提取代码入口处的区域大小、边界等做为特征值等等这些策略,但是不限于本文指出的这些策略。
步骤204、将所述特征指令的特征值作为参考样本,对所述参考样本进行信息不纯度的分析,以得到病毒种类划分的基准参考值,根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,通过所述至少一个病毒分类模型得到用于表征病毒分类的所述病毒的结构特征参数。
这里,通过上述步骤201-203对已知病毒文件中重要特征指令的特征值分析,并将所述特征指令的特征值用于构造病毒分类模型,分析得到的该病毒的结构特征参数必然能概括出大部分病毒及其变种或类似结构的新型 病毒的通用表现形式,从而为最终的病毒检测提供良好的判断依据。
这里,所述根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,包括:所述病毒分类模型为决策树模型时,以所述基准参考值作为决策树的分支节点,分别向所述分支节点的左侧和右侧进行划分并在残差的梯度减少方向建立至少一个分类回归树,由所述至少一个分类回归树构成所述至少一个病毒分类模型。
步骤205、根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令。
待检测文件可以为新型的病毒文件,也可能为未感染病毒的正常文件,根据所述病毒分类模型对至少一个待检测的文件进行识别,实际上是进行病毒的结构特征参数的匹配,该病毒分类模型中包括了各种病毒的结构特征参数。病毒分类模型为至少一个,当病毒分类模型为不止一个时,比如两个病毒分类模型,可以将其中任意一个病毒分类模型中的病毒的结构特征参数用于与待检测文件进行匹配,也可以将两个病毒分类模型进行迭代或叠加后先得到综合分类模型,再将该综合分类模型中的病毒的结构特征参数用于与待检测文件进行匹配。当然,病毒分类模型可以为更多,比如五个,可以与其中至少一个病毒分类模型进行迭代或叠加后得到综合分类模型,如,分别两两迭代或叠加得到第一综合分类模型和第二综合分类模型,再与剩下的一个原有的分类模型进行迭代或叠加等等。如图3-图5所示为几种分类模型(或称分类器)的迭代或叠加组合实例。
步骤206、判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果是,则执行步骤207;否则,不予进行处理。
步骤207、所述第二操作指令的特征值符合所述病毒的结构特征参数,识别出所述待检测的文件为病毒文件。
待检测文件可以为新型的病毒文件,也可能为未感染病毒的正常文件, 提取该待检测文件中的指令信息特征值,将其根据所述至少一个病毒分类模型进行识别,实际上是进行病毒的结构特征参数的匹配,只要有至少一个匹配结果,则说明该待检测文件为病毒文件,进一步,可以根据病毒分类模型的病毒分类结果对该病毒文件、或病毒文件中某条或多条指令信息为哪一类病毒进行区分。
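Embodiment 3
Based on Embodiment 2, one specific application of the information processing method of this embodiment is as follows: when the virus classification model is a decision tree model, the baseline reference value is used as a branch node of the decision tree, the samples are partitioned to the left and right of the branch node, and at least one classification and regression tree is built in the direction in which the gradient of the residual decreases.
As shown in FIG. 7, the procedure of this specific application includes:
Step 301: Obtain the estimated value of the reference sample and the actual value obtained through training.
Step 302: From the estimated value and the actual value, compute the gradient formed by the residual between them.
Step 303: In each round, build a new classification and regression tree in the direction in which the gradient decreases; repeat N times, where N is a natural number greater than 1, to obtain N classification and regression trees.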
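Steps 301 to 303 follow the usual gradient boosting loop. The sketch below is a simplified illustration, assuming a squared-error loss (so the negative gradient equals the residual), a single numeric feature, and one-split regression stumps in place of full classification and regression trees; the function names and the `learning_rate` parameter are assumptions, not part of the source.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # no valid split: predict the mean everywhere
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build n_rounds stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)            # every sample starts with the same estimate
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # step 302
        stump = fit_stump(xs, residuals)                  # step 303
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Each round shrinks the remaining residual, which is the sense in which new trees are built "in the direction in which the gradient decreases".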
这里需要指出的是:以下涉及服务器项的描述,与上述方法描述是类似的,同方法的有益效果描述,不做赘述。对于本发明服务器实施例中未披露的技术细节,请参照本发明上述实施例的描述。
实施例四:
本发明实施例的一种服务器,如图8所示,所述服务器包括:
判断单元41配置为获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第 一操作指令确定为特征指令。
处理单元42配置为提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;
识别单元43配置为根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
采用本发明实施例,通过判断单元41、处理单元42和识别单元43,能对可能为病毒文件的可执行文件进行检测分析,当发现该可执行文件中的第一操作指令的特征符合预设策略,则将该可执行文件作为已知的病毒文件,将该已知的病毒文件中符合预设策略的第一操作指令作为特征指令并对其特征值进行提取,以便于通过提取的该特征值来构造病毒分类模型,从而分析得到病毒的结构特征参数。对待检测的文件进行识别,看是否为病毒文件时,提取该待检测的文件中的第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。不仅无需人工介入病毒识别的分析,而且能通过结构特征参数的比对,精准的识别出病毒,从而提高了识别精度和识别效率。
在本发明实施例一具体应用中,判断单元41还配置为采取以下一种或多种方式来判断所述第一操作指令的特征是否符合预设策略,包括:方式一:判断所述第一操作指令是否用于调用指定的系统应用程序接口API函数,以执行包括修改注册表或修改系统关键路径在内的操作,如果是,则符合所述预设策略;方式二:判断所述第一操作指令的出现频率和/或次数是否达到预设的阈值,如果是,则符合所述预设策略;方式三:判断所述第一操作指令所包含的静态代码信息入口处的区域大小和/或边界是否符合 预设条件,如果是,则符合所述预设策略。
由于已知病毒文件的指令信息具备一定的特征,比如,病毒文件中的某些指令信息会调用一些较为特殊的系统应用程序接口(API)函数来达到其破坏目的,如对注册表的读写操作、修改系统关键路径,同种病毒的代码部分往往具有相似性;比如,具备这些特征的指令信息的出现次数和频率很频繁,或者即便不具有某种特征,还未明确的情况下指令信息的出现次数和频率很多频繁;比如,很多病毒采用加密技术,可以提取代码入口处的区域大小、边界等做为特征值等等;因此,将这些情况都作为预设策略,对PE文件中的指令信息进行检测分析时,按照指令信息的特征与预设策略中的特征进行比对,如果发现具备一定的特征,则将这样的指令信息作为重要特征指令,用于后续构造分类器使用。
在本发明实施例一具体应用中,处理单元42还可以包括:第一子处理单元,配置为将所述特征指令的特征值作为参考样本,对所述参考样本进行信息不纯度的分析,以得到病毒种类划分的基准参考值;第二子处理单元,配置为根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,通过所述至少一个病毒分类模型得到用于表征病毒分类的所述病毒的结构特征参数。
这里,所述第一子处理单元可以具体为特征值提取模块,配置为提取特征指令的特征值,将所述特征指令的特征值作为参考样本,对所述参考样本进行信息不纯度的分析,以得到病毒种类划分的基准参考值。所述第二子处理单元可以具体为构造分类器模块,根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,通过所述至少一个病毒分类模型得到用于表征病毒分类的所述病毒的结构特征参数。相应的,所述识别单元,可以具体为病毒检测模块,配置为识别出病毒和检测出识别出的病毒属于何种病毒分类。
这里需要指出的是,构造分类器模块在所述病毒分类模型为决策树模型时,以所述基准参考值作为决策树的分支节点,分别向所述分支节点的左侧和右侧进行划分并在残差的梯度减少方向建立至少一个分类回归树,由所述至少一个分类回归树构成所述至少一个病毒分类模型。具体的,是获取所述参考样本的估计值与训练得到的实际值;根据所述估计值和所述实际值,计算出估计值与实际值的残差所构成的梯度;在每一次梯度减少的方向建立一个新的分类回归树,重复N次,N为大于1的自然数,得到N个分类回归树。
有关特征值提取模块、构造分类器模块、病毒检测模块的具体举例和算法描述请详见后续的应用场景描述。
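Embodiment 5:
It should be noted that the foregoing server may be an electronic device formed by a cluster system, in which the unit functions are either merged into one device or deployed separately. Both the client and the server include at least a database for storing data and a processor for data processing, or include a storage medium located in the server or a separately deployed storage medium.
For the processor used for data processing, processing may be performed by a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA). The storage medium contains operation instructions, which may be computer-executable code, and the steps of the information processing method of the foregoing embodiments are implemented through these operation instructions.
An example of the server as a hardware entity S11 is shown in FIG. 9. The apparatus includes a processor 51, a storage medium 52, and at least one external communication interface 53, all connected through a bus 54.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions configured to perform the foregoing information processing method.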
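A real application scenario is used below to illustrate the embodiments of the present invention:
In this virus identification scenario, the embodiment is specifically a scheme for classifying and predicting unknown viruses based on a gradient boosting decision tree. It predicts the virus type mainly by analyzing the similarity between the structure of important feature instructions in the PE files of unknown and known viruses, which raises the detection success rate for unknown viruses. Moreover, the scheme does not need to run the virus; it only needs to extract and analyze the static information in the PE file.
The method may be implemented with a feature value extraction module, a classifier construction module, and a virus detection module; specific examples and algorithms follow. Together these modules must complete at least the following processing: 1) extract features from known viruses; 2) store the extracted features in a database for later classifier construction; 3) construct, from the extracted features, the classifier used to identify viruses and their specific classes; 4) use the constructed classifier to detect unknown virus files, identify viruses, and output the virus class.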
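1. Feature value extraction module. The vast majority of virus files are in PE format. In most operations a virus file behaves like an ordinary PE file, but it often achieves its destructive purpose by calling certain special system API functions, such as those that read and write the registry or modify critical system paths. The code of viruses of the same kind is often similar, so the important feature instructions in the PE file are extracted and their occurrence counts serve as the basis for judgment. Because many viruses use encryption, the size and boundary of the region at the code entry can also be extracted as feature values for comparison. For example, from m virus files in PE format, n meaningful operation instructions are selected as features to form an m*n matrix X, where X(ij) is the number of times the j-th feature occurs in the i-th file; the virus classes of the m virus files form an m-dimensional vector Y, where Y(i) is the virus class of the i-th virus file. In short, viruses are identified and classified by analyzing the similarity of important instructions and structures in PE files.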
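The matrix X and vector Y can be illustrated with a short sketch. It assumes that each file has already been disassembled into a list of instruction strings and that the list of feature instructions is given; the function name and the sample data are hypothetical.

```python
from collections import Counter


def build_feature_matrix(files, feature_instructions):
    """Build X (m rows of n occurrence counts) and Y (m virus class labels).

    files: list of (instruction_list, virus_class) pairs, one per PE file.
    feature_instructions: the n instructions selected as features.
    """
    X, Y = [], []
    for instructions, virus_class in files:
        counts = Counter(instructions)
        # X[i][j] = number of times feature j occurs in file i
        X.append([counts[f] for f in feature_instructions])
        Y.append(virus_class)
    return X, Y


# Made-up sample: two files, two feature instructions.
samples = [
    (["call RegSetValueExA", "xor", "xor"], "trojan"),
    (["xor"], "worm"),
]
X, Y = build_feature_matrix(samples, ["call RegSetValueExA", "xor"])
```

Here `X` has one row per file and one column per feature instruction, and `Y` holds the class of each file in the same order.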
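2. Classifier construction module. The classifier of this embodiment is implemented with a gradient boosting decision tree. The principle is as follows: when the algorithm starts, each sample is assigned an estimated value, and initially all samples have the same estimate. The model obtained at each training step (a classification and regression tree) estimates some data points correctly and others incorrectly. The gradient of the residual between each data point's estimated value and its actual value is computed, and in each round a new model is built in the direction in which the gradient decreases. Repeating this N times (N is specified by the user) yields N simple classifiers (classification and regression trees), which are combined (by weighting, voting, or the like) into a final model. The N simple classifiers are built as classification and regression trees; the specific steps are:
a. Input X and the gradient g of Y-p;
b. Traverse X, pick the feature value X(ij) of any feature, and compute the information impurity after splitting on each such value (the Gini index, the twoing index, or the ordered twoing index may be used); the larger the impurity, the more mixed the virus classes currently contained in X. Find the X(kl) at which the impurity is smallest;
c. Find all values d(k) {k: 1, 2, ...} in X(il) {i: 0 to m-1} that are greater than X(kl); the corresponding rows of X form the left-subtree data set LX, with corresponding g denoted Lg, and the remaining rows form the right-subtree data set RX, whose corresponding virus classes are denoted Rg;
d. If the number of rows in LX or RX after the split is smaller than a user-set number, return to step a and continue searching for X(kl); if the information impurity measure is below a certain value, perform step e, otherwise perform step g;
e. Return the mean of the g values corresponding to the current X as a leaf node of the classification tree;
f. Select X(kl) as the split node; the left subtree of the node holds the data set LX, Lg with values greater than the feature value, and the right subtree holds the data set RX, Rg with values less than or equal to the feature value;
g. Take the left-subtree data set LX, Lg as the new data set X, g and perform step b;
h. Take the right-subtree data set RX, Rg as the new data set X, g and perform step c.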
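The two core points are: 1) A classification and regression tree is used, with the information impurity measure determining the branch nodes of the tree. By analogy with an amplifier in communications, which amplifies a signal so that it is easier to recognize and distinguish, the more impure the information is, the more readily it serves as a reference value for distinction, so impurity analysis makes it easier to separate out virus classes. 2) Building classification and regression trees in the direction in which the gradient of the residual decreases brings the predicted result closer to the true result.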
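The impurity-based choice of a split value in step b can be sketched with the Gini index, one of the measures the text allows. This is an illustration under assumptions: impurity is computed on class labels, candidate thresholds are the observed feature values, and rows with a value greater than the threshold go to the left subset as in step c. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure set, larger when classes are more mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the lowest-impurity split."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] > t]    # values above the threshold
            right = [lab for row, lab in zip(X, y) if row[j] <= t]  # remaining rows
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best
```

Applying `best_split` recursively to the left and right subsets, and stopping when a subset is small or pure enough, gives the tree-building loop of steps c to h.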
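3. Virus detection module. The classifier yields N simple classifiers, which are applied in turn to the unknown virus file. In each one, the file follows the branch conditions of the tree to the leaf node it matches; the information gain of the leaf node reached in each classifier is computed, and the class with the largest sum is the final result. Any single classifier may be used, or several classifiers may be iterated together; the iterative scheme avoids overfitting of the trees and also improves the success rate of virus identification and classification.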
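The detection pass over N trees can be sketched as follows. The tree layout (nested dictionaries), the per-class leaf scores standing in for the leaf "information gain", and the sample trees are all assumptions made for illustration; the source does not specify a data structure.

```python
def predict_tree(node, x):
    """Walk one tree from the root to a leaf and return the leaf's per-class scores."""
    while "leaf" not in node:
        # Values greater than the threshold go left, as in the tree-building steps.
        node = node["left"] if x[node["feature"]] > node["threshold"] else node["right"]
    return node["leaf"]


def classify(trees, x):
    """Sum the leaf scores over all trees and return the class with the largest total."""
    totals = {}
    for tree in trees:
        for cls, score in predict_tree(tree, x).items():
            totals[cls] = totals.get(cls, 0.0) + score
    return max(totals, key=totals.get)


# Two hand-made trees over a two-feature count vector (hypothetical values).
tree_1 = {"feature": 0, "threshold": 2,
          "left": {"leaf": {"trojan": 0.8, "benign": 0.2}},
          "right": {"leaf": {"benign": 0.9, "trojan": 0.1}}}
tree_2 = {"feature": 1, "threshold": 0,
          "left": {"leaf": {"trojan": 0.6, "benign": 0.4}},
          "right": {"leaf": {"benign": 0.7, "trojan": 0.3}}}
```

Summing over several trees, rather than trusting one, is what the text describes as iterating multiple classifiers to reduce overfitting.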
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元,即可以位于一个地方,也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。
另外,在本发明各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本发明上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本发明各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。
工业实用性
采用本发明实施例,对可能为病毒文件的可执行文件进行检测分析,当发现该可执行文件中的第一操作指令的特征符合预设策略,则将该可执行文件作为已知的病毒文件,将该已知的病毒文件中符合预设策略的第一操作指令作为特征指令并对其特征值进行提取,以便于通过提取的该特征值来构造病毒分类模型,从而分析得到病毒的结构特征参数。对待检测的文件进行识别,看是否为病毒文件时,提取该待检测的文件中的第二操作 指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。不仅无需人工介入病毒识别的分析,而且能通过结构特征参数的比对,精准的识别出病毒,从而提高了识别精度和识别效率。

Claims (11)

  1. 一种信息处理方法,所述方法包括:
    获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第一操作指令确定为特征指令;
    提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;
    根据所述病毒分类模型对至少一个待检测的文件进行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
  2. 根据权利要求1所述的方法,其中,所述判断所述第一操作指令的特征是否符合预设策略,包括以下一种或多种方式:
    方式一:判断所述第一操作指令是否用于调用指定的系统应用程序接口API函数,以执行包括修改注册表或修改系统关键路径在内的操作,如果是,则符合所述预设策略;
    方式二:判断所述第一操作指令的出现频率和/或次数是否达到预设的阈值,如果是,则符合所述预设策略;
    方式三:判断所述第一操作指令所包含的静态代码信息入口处的区域大小和/或边界是否符合预设条件,如果是,则符合所述预设策略。
  3. 根据权利要求1或2所述的方法,其中,所述将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数,包括:
    将所述特征指令的特征值作为参考样本,对所述参考样本进行信息不纯度的分析,以得到病毒种类划分的基准参考值;
    根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,通过所述至少一个病毒分类模型得到用于表征病毒分类的所述病毒的结构特征参数。
  4. 根据权利要求3所述的方法,其中,所述根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,包括:
    所述病毒分类模型为决策树模型时,以所述基准参考值作为决策树的分支节点,分别向所述分支节点的左侧和右侧进行划分并在残差的梯度减少方向建立至少一个分类回归树,由所述至少一个分类回归树构成所述至少一个病毒分类模型。
  5. 根据权利要求4所述的方法,其中,所述分别向所述分支节点的左侧和右侧进行划分并在残差的梯度减少方向建立至少一个分类回归树,包括:
    获取所述参考样本的估计值与训练得到的实际值;
    根据所述估计值和所述实际值,计算出估计值与实际值的残差所构成的梯度;
    在每一次梯度减少的方向建立一个新的分类回归树,重复N次,N为大于1的自然数,得到N个分类回归树。
  6. 一种服务器,所述服务器包括:
    判断单元,配置为获取至少一个指定类型的可执行文件,从所述至少一个指定类型的可执行文件中提取至少一个第一操作指令,判断所述第一操作指令的特征是否符合预设策略,如果符合所述预设策略,则将所述第一操作指令确定为特征指令;
    处理单元,配置为提取所述特征指令的特征值,将所述特征指令的特征值用于构造病毒分类模型,分析得到病毒的结构特征参数;
    识别单元,配置为根据所述病毒分类模型对至少一个待检测的文件进 行识别时,从所述至少一个待检测的文件中提取至少一个第二操作指令,判断所述第二操作指令的特征值是否符合所述病毒的结构特征参数,如果符合所述病毒的结构特征参数,则识别出所述待检测的文件为病毒文件。
  7. 根据权利要求6所述的服务器,其中,所述判断单元,还配置为采取以下一种或多种方式来判断所述第一操作指令的特征是否符合预设策略:
    方式一:判断所述第一操作指令是否用于调用指定的系统应用程序接口API函数,以执行包括修改注册表或修改系统关键路径在内的操作,如果是,则符合所述预设策略;
    方式二:判断所述第一操作指令的出现频率和/或次数是否达到预设的阈值,如果是,则符合所述预设策略;
    方式三:判断所述第一操作指令所包含的静态代码信息入口处的区域大小和/或边界是否符合预设条件,如果是,则符合所述预设策略。
  8. 根据权利要求6或7所述的服务器,其中,所述处理单元,还配置为:
    第一子处理单元,配置为将所述特征指令的特征值作为参考样本,对所述参考样本进行信息不纯度的分析,以得到病毒种类划分的基准参考值;
    第二子处理单元,配置为根据所述基准参考值进行病毒分类模型的划分,得到至少一个病毒分类模型,通过所述至少一个病毒分类模型得到用于表征病毒分类的所述病毒的结构特征参数。
  9. 根据权利要求8所述的服务器,其中,所述第二子处理单元,还配置为在所述病毒分类模型为决策树模型时,以所述基准参考值作为决策树的分支节点,分别向所述分支节点的左侧和右侧进行划分并在残差的梯度减少方向建立至少一个分类回归树,由所述至少一个分类回归树构成所述至少一个病毒分类模型。
  10. 根据权利要求9所述的服务器,其中,所述第二子处理单元,还配置为获取所述参考样本的估计值与训练得到的实际值;根据所述估计值和所述实际值,计算出估计值与实际值的残差所构成的梯度;在每一次梯度减少的方向建立一个新的分类回归树,重复N次,N为大于1的自然数,得到N个分类回归树。
  11. 一种计算机存储介质,所述计算机存储介质中存储有计算机可执行指令,该计算机可执行指令配置为执行权利要求1至5任一项所述的信息处理方法。
PCT/CN2016/080796 2015-09-02 2016-04-29 一种信息处理方法及服务器、计算机存储介质 WO2017036154A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/700,650 US11163877B2 (en) 2015-09-02 2017-09-11 Method, server, and computer storage medium for identifying virus-containing files

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510555804.2A CN106485146B (zh) 2015-09-02 2015-09-02 一种信息处理方法及服务器
CN201510555804.2 2015-09-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/700,650 Continuation US11163877B2 (en) 2015-09-02 2017-09-11 Method, server, and computer storage medium for identifying virus-containing files

Publications (1)

Publication Number Publication Date
WO2017036154A1 true WO2017036154A1 (zh) 2017-03-09

Family

ID=58186601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/080796 WO2017036154A1 (zh) 2015-09-02 2016-04-29 Information processing method, server, and computer storage medium

Country Status (3)

Country Link
US (1) US11163877B2 (zh)
CN (1) CN106485146B (zh)
WO (1) WO2017036154A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948829A (zh) * 2021-03-03 2021-06-11 深信服科技股份有限公司 File scanning and removal method, system, device, and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
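CN109522915B (zh) * 2017-09-20 2022-08-23 腾讯科技(深圳)有限公司 Virus file clustering method, apparatus, and readable medium
CN109558731B (zh) * 2017-09-26 2022-04-08 腾讯科技(深圳)有限公司 Signature processing method, apparatus, and storage medium
KR102046748B1 (ko) * 2019-04-25 2019-11-19 숭실대학교산학협력단 Tree-boosting-based method for assessing application risk, and recording medium and apparatus for performing the same
CN110336835B (zh) * 2019-08-05 2021-10-19 深信服科技股份有限公司 Malicious behavior detection method, user equipment, storage medium, and apparatus
CN113806744B (zh) * 2020-06-16 2023-09-05 深信服科技股份有限公司 Virus identification method, apparatus, device, and readable storage medium
CN112347479B (zh) * 2020-10-21 2021-08-24 北京天融信网络安全技术有限公司 False-positive correction method, apparatus, device, and storage medium for malware detection
CN112445760B (zh) * 2020-11-13 2024-05-14 三六零数字安全科技集团有限公司 File classification method, device, storage medium, and apparatus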

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
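CN103164651A (zh) * 2011-12-15 2013-06-19 西门子公司 Apparatus and method for extracting virus file signatures, and virus detection system
CN103942495A (zh) * 2010-12-31 2014-07-23 北京奇虎科技有限公司 Machine-learning-based program identification method and apparatus
CN104700030A (zh) * 2013-12-04 2015-06-10 腾讯科技(深圳)有限公司 Virus data search method, apparatus, and server
CN104751054A (zh) * 2013-12-31 2015-07-01 贝壳网际(北京)安全技术有限公司 Malicious program identification method and apparatus, and mobile terminal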

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5930392A (en) * 1996-07-12 1999-07-27 Lucent Technologies Inc. Classification technique using random decision forests
US7310624B1 (en) * 2000-05-02 2007-12-18 International Business Machines Corporation Methods and apparatus for generating decision trees with discriminants and employing same in data classification
US7089592B2 (en) * 2001-03-15 2006-08-08 Brighterion, Inc. Systems and methods for dynamic detection and prevention of electronic fraud
US7424619B1 (en) * 2001-10-11 2008-09-09 The Trustees Of Columbia University In The City Of New York System and methods for anomaly detection and adaptive learning
US7505948B2 (en) * 2003-11-18 2009-03-17 Aureon Laboratories, Inc. Support vector regression for censored data
US20060287969A1 (en) * 2003-09-05 2006-12-21 Agency For Science, Technology And Research Methods of processing biological data
US7930353B2 (en) * 2005-07-29 2011-04-19 Microsoft Corporation Trees of classifiers for detecting email spam
US8839418B2 (en) * 2006-01-18 2014-09-16 Microsoft Corporation Finding phishing sites
US7908234B2 (en) * 2008-02-15 2011-03-15 Yahoo! Inc. Systems and methods of predicting resource usefulness using universal resource locators including counting the number of times URL features occur in training data
IL191744A0 (en) * 2008-05-27 2009-02-11 Yuval Elovici Unknown malcode detection using classifiers with optimal training sets
US20100192222A1 (en) * 2009-01-23 2010-07-29 Microsoft Corporation Malware detection using multiple classifiers
US8190647B1 (en) * 2009-09-15 2012-05-29 Symantec Corporation Decision tree induction that is sensitive to attribute computational complexity
US8375450B1 (en) * 2009-10-05 2013-02-12 Trend Micro, Inc. Zero day malware scanner
US8401982B1 (en) * 2010-01-14 2013-03-19 Symantec Corporation Using sequencing and timing information of behavior events in machine learning to detect malware
US8326760B2 (en) * 2010-09-13 2012-12-04 Cybersource Corporation Computer-based collective intelligence recommendations for transaction review
US8869277B2 (en) * 2010-09-30 2014-10-21 Microsoft Corporation Realtime multiple engine selection and combining
TW201216106A (en) * 2010-10-13 2012-04-16 Univ Nat Taiwan Science Tech Intrusion detecting system and method to establish classifying rules thereof
US8682812B1 (en) * 2010-12-23 2014-03-25 Narus, Inc. Machine learning based botnet detection using real-time extracted traffic features
CA2829569C (en) * 2011-03-10 2016-05-17 Textwise Llc Method and system for unified information representation and applications thereof
CN102243699B (zh) * 2011-06-09 2013-11-27 深圳市安之天信息技术有限公司 Malicious code detection method and system
US8955133B2 (en) * 2011-06-09 2015-02-10 Microsoft Corporation Applying antimalware logic without revealing the antimalware logic to adversaries
US8694444B2 (en) * 2012-04-20 2014-04-08 Xerox Corporation Learning multiple tasks with boosted decision trees
US9609456B2 (en) * 2012-05-14 2017-03-28 Qualcomm Incorporated Methods, devices, and systems for communicating behavioral analysis information
US9100366B2 (en) * 2012-09-13 2015-08-04 Cisco Technology, Inc. Early policy evaluation of multiphase attributes in high-performance firewalls
US9292688B2 (en) * 2012-09-26 2016-03-22 Northrop Grumman Systems Corporation System and method for automated machine-learning, zero-day malware detection
US9392463B2 (en) * 2012-12-20 2016-07-12 Tarun Anand System and method for detecting anomaly in a handheld device
US9491187B2 (en) * 2013-02-15 2016-11-08 Qualcomm Incorporated APIs for obtaining device-specific behavior classifier models from the cloud
US9769189B2 (en) * 2014-02-21 2017-09-19 Verisign, Inc. Systems and methods for behavior-based automated malware analysis and classification
US9380065B2 (en) * 2014-03-12 2016-06-28 Facebook, Inc. Systems and methods for identifying illegitimate activities based on historical data
US10318882B2 (en) * 2014-09-11 2019-06-11 Amazon Technologies, Inc. Optimized training of linear machine learning models
US10783254B2 (en) * 2014-10-02 2020-09-22 Massachusetts Institute Of Technology Systems and methods for risk rating framework for mobile applications
US20160127319A1 (en) * 2014-11-05 2016-05-05 ThreatMetrix, Inc. Method and system for autonomous rule generation for screening internet transactions
US10909468B2 (en) * 2015-02-27 2021-02-02 Verizon Media Inc. Large-scale anomaly detection with relative density-ratio estimation
US10599844B2 (en) * 2015-05-12 2020-03-24 Webroot, Inc. Automatic threat detection of executable files based on static data analysis
US20160335432A1 (en) * 2015-05-17 2016-11-17 Bitdefender IPR Management Ltd. Cascading Classifiers For Computer Security Applications
US9690938B1 (en) * 2015-08-05 2017-06-27 Invincea, Inc. Methods and apparatus for machine learning based malware detection
CN108351862B (zh) * 2015-08-11 2023-08-22 科格诺亚公司 Method and apparatus for determining developmental progress using artificial intelligence and user input
US20170046510A1 (en) * 2015-08-14 2017-02-16 Qualcomm Incorporated Methods and Systems of Building Classifier Models in Computing Devices

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
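CN112948829A (zh) * 2021-03-03 2021-06-11 深信服科技股份有限公司 File scanning and removal method, system, device, and storage medium
CN112948829B (zh) * 2021-03-03 2023-11-03 深信服科技股份有限公司 File scanning and removal method, system, device, and storage medium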

Also Published As

Publication number Publication date
US20170372069A1 (en) 2017-12-28
US11163877B2 (en) 2021-11-02
CN106485146A (zh) 2017-03-08
CN106485146B (zh) 2019-08-13

Similar Documents

Publication Publication Date Title
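WO2017036154A1 (zh) Information processing method, server, and computer storage medium
Han et al. MalDAE: Detecting and explaining malware based on correlation and fusion of static and dynamic characteristics
CN109784056B (zh) Deep-learning-based malware detection method
CN109063055B (zh) Homologous binary file retrieval method and apparatus
TWI419003B (zh) Method and system for automated analysis and classification of malware
WO2015101097A1 (zh) Feature extraction method and apparatus
WO2017032261A1 (zh) Identity authentication method, apparatus, and device
CN109271788B (zh) Deep-learning-based Android malware detection method
CN110427755A (zh) Method and apparatus for identifying script files
CN109670318B (zh) Vulnerability detection method based on loop verification of kernel control flow graphs
CN109829302B (zh) Android malicious application family classification method, apparatus, and electronic device
RU2728498C1 (ru) Method and system for determining the affiliation of software from its source code
CN107273746A (zh) Method for detecting malware variants based on APK string features
CN113360906A (zh) Interpretable, graph-embedding-based automatic detection of Android malware
JP2017004123A (ja) Determination device, determination method, and determination program
Kakisim et al. Sequential opcode embedding-based malware detection method
CN112434296A (zh) Method and apparatus for detecting malicious Android applications
CN113221109A (zh) Intelligent malicious file analysis method based on generative adversarial networks
CN113704759B (zh) AdaBoost-based Android malware detection method, system, and storage medium
Agarkar et al. Malware detection & classification using machine learning
CN108959922B (zh) Bayesian-network-based malicious document detection method and apparatus
CN113139185B (zh) Malicious code detection method and system based on heterogeneous information networks
Lee et al. Robust IoT Malware Detection and Classification Using Opcode Category Features on Machine Learning
Pranav et al. Detection of botnets in IoT networks using graph theory and machine learning
KR102192196B1 (ko) Apparatus and method for malware detection using AI-based machine learning cross-validation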

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16840583

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 20/07/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16840583

Country of ref document: EP

Kind code of ref document: A1