CN111475810B - Malicious software detector training method and system, and detection method and system - Google Patents


Info

Publication number
CN111475810B
Authority
CN
China
Prior art keywords
detector
malware
sample
training
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010285088.1A
Other languages
Chinese (zh)
Other versions
CN111475810A (en)
Inventor
林肖红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jeeseen Network Technologies Co Ltd
Original Assignee
Guangzhou Jeeseen Network Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-04-13
Filing date
2020-04-13
Publication date
2021-04-06
Application filed by Guangzhou Jeeseen Network Technologies Co Ltd
Priority to CN202010285088.1A
Publication of CN111475810A
Application granted
Publication of CN111475810B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a malware detector training method and system, and a malware detection method and system, in the technical field of computer malware detection and handling. The training method comprises: inputting software samples into a detector D and training the detector D; generating a noise code with a generator G; inserting the noise code into a malware sample to form a noisy malware sample; and inputting the noisy malware sample into the detector D and training the detector D again. The detector D serves as the discriminative model and the generator G as the generative model of a generative adversarial network (GAN). The invention can identify malware to which noise has been added, which helps improve network security; and because the detector D and the generator G form a GAN that produces noisy malware for D to learn from, detection capability is improved.

Description

Malicious software detector training method and system, and detection method and system
Technical Field
The invention relates to the technical field of computer malware detection and handling, and in particular to a malware detector training method and system and a malware detection method and system.
Background
Malware refers to software programs deliberately designed to let an attacker compromise a computer, server, client, or computer network. Representative types include viruses, worms, Trojan horses, backdoors, rootkits, ransomware, and botnets. Malware is extremely harmful, and its detection and prevention have long been important issues in cyberspace security.
Current malware detection methods include traditional analysis based on expert experience and machine learning methods based on big data. Machine learning methods let a machine learn the characteristics of benign software and malware from a large number of samples and then judge whether target software is benign or malicious based on what it has learned; they require no expert experience and are highly efficient. However, more and more machine learning algorithms face the challenge of adversarial attacks, in which specially crafted noisy samples cause the model to misclassify. Researchers have already proposed effective attack methods against machine-learning-based malware detectors.
A generative adversarial network (GAN) is a deep learning model in which a generative module and a discriminative module within one framework learn through a mutual game, so that the model produces the desired output.
The Chinese invention application "Method and apparatus for defending against black-box attacks on a malware image-format detection model", publication number CN110826059A, provides a defense method comprising: 1) acquiring a data set and dividing it into a training set and a test set; 2) converting the samples into malware image format; 3) constructing a black-box attack model that generates perturbations based on a deep convolutional generative adversarial network (DCGAN), whose structure is divided into a generator and a discriminator; 4) through continuous adversarial training between the generator and discriminator built in step 3), having the generator finally produce adversarial examples that imitate sample B; 5) retraining the malware image-format detection model to be optimized with the adversarial examples obtained in step 4), yielding a malware detection model able to defend against adversarial attacks; and 6) identifying malware with that model. The invention also includes an apparatus implementing this defense method.
The Chinese invention application "Malware detection method and system for an adversarial network", publication number CN110619216A, analyzes historical software data to construct a noise-simulating malware model: normal software and malware are input into a black-box model and labeled to generate software samples, and the noise-simulating model is trained on those samples so that it can continuously combine and mutate malware. Once trained, the model is connected to a machine learning module as a simulated malware source, and the module is continuously trained on this malware to improve its detection capability.
These two publications disclose malware detection methods and apparatus from different perspectives. However, neither discloses a solution for detecting malware to which noise has been added.
Disclosure of Invention
The object of the invention is to provide a robust malware detection method and system, optimized with a generative adversarial network, that can detect malware to which noise has been added.
To solve the above problem, the technical solution of the invention is as follows:
A malware detector training method, comprising the following steps:
inputting software samples into a detector D and training the detector D;
generating a noise code with a generator G;
inserting the noise code into a malware sample to form a noisy malware sample;
inputting the noisy malware sample into the detector D and training the detector D again;
wherein the detector D serves as the discriminative model and the generator G as the generative model of a generative adversarial network.
Further, before the software samples are input into the detector D, the method includes preprocessing them: each byte of a software sample is treated as an integer from 0 to 255, and all the integers are arranged in their original order to form a feature vector. If the feature vector is longer than 2 million (2,000,000) elements, the excess is discarded; if it is shorter, it is padded with 0 until its length reaches 2 million.
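A minimal Python sketch of this preprocessing step, assuming the sample is available as raw `bytes`. The function name `preprocess` and the `length` parameter are illustrative only; the default length of 2,000,000 follows the description.

```python
FEATURE_LEN = 2_000_000  # "200 ten thousand" in the original text, i.e. 2 million


def preprocess(sample: bytes, length: int = FEATURE_LEN) -> list[int]:
    """Turn a binary executable into a fixed-length vector of byte values 0-255.

    Hypothetical helper: bytes keep their original order, anything beyond
    `length` is discarded, and shorter samples are zero-padded.
    """
    vec = list(sample[:length])  # iterating over bytes yields ints in 0..255
    vec.extend([0] * (length - len(vec)))  # pad with 0 up to `length`
    return vec


# Usage: "M" = 77, "Z" = 90, 0x90 = 144, then two padding zeros.
print(preprocess(b"MZ\x90", length=5))  # [77, 90, 144, 0, 0]
```

One consequence of this scheme: zero padding makes a short file indistinguishable from the same file with trailing zero bytes appended; the description does not address this.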
Further, the noise code is binary code generated by inputting a string of random bytes following a Bernoulli distribution into the generator G.
Further, inserting the noise code into the malware sample comprises inserting the entire noise code into the middle or at the end of the malware sample.
Further, the method includes correcting the noisy malware sample, the correction consisting of adjusting header field values of the noisy malware sample.
Further, before the generator G generates the noise code, the method includes training the generator G according to feedback from the detector D.
Training the generator G according to the feedback from the detector D comprises:
adjusting the parameters of the generator G;
generating a training noise code and inserting it into the malware sample to form a training noisy malware sample;
inputting the training noisy malware sample into the detector D to obtain an output;
repeating the above steps until the output of the detector D approaches 0;
wherein an output of 0 indicates that the software's degree of maliciousness is 0.
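The loop above can be sketched as follows. This is a deliberately simplified stand-in, not the patented procedure: instead of adjusting the weights of a neural generator G with an optimizer OptimG, it hill-climbs the noise bytes directly (random search), and it treats the detector as a black-box function returning a score in [0, 1]. The names `search_noise` and `toy_detector` are hypothetical.

```python
import random
from typing import Callable


def search_noise(detector: Callable[[bytes], float], malware: bytes,
                 noise_len: int = 8, steps: int = 200, tol: float = 1e-3,
                 seed: int = 0) -> tuple[bytes, float]:
    """Randomly mutate an appended noise code, keeping changes that lower the score."""
    rng = random.Random(seed)
    best = bytes(rng.getrandbits(8) for _ in range(noise_len))
    best_score = detector(malware + best)
    for _ in range(steps):
        cand = bytearray(best)
        cand[rng.randrange(noise_len)] = rng.getrandbits(8)  # perturb one byte
        score = detector(malware + bytes(cand))
        if score < best_score:  # keep only changes that lower the maliciousness score
            best, best_score = bytes(cand), score
        if best_score <= tol:  # output has effectively reached 0
            break
    return best, best_score


def toy_detector(sample: bytes) -> float:
    """Hypothetical detector whose score drops as the last 8 bytes get larger."""
    tail = sample[-8:]
    return 1.0 - sum(tail) / (255 * len(tail))


noise, score = search_noise(toy_detector, b"MZ")
print(len(noise), round(score, 3))
```

A GAN-style implementation would instead propagate the detector's output back through G's parameters; random search is used here only because it needs no gradient code.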
A malware detector training system, comprising:
a detector D unit for detecting software;
a generator G unit for generating noise codes for malware;
a corrector unit for inserting the noise code into the malware.
A malware detection method that detects malware using a detector D trained by any of the above methods.
A malware detection system comprising a detection unit configured to detect malware using a detector D trained by any of the above methods.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention can identify malware to which noise has been added, which helps improve network security;
2. the detector D and the generator G form a generative adversarial network, so that training continuously improves both the ability of G to generate noise codes and the ability of D to detect noisy malware;
3. the corrector corrects the malware after the noise code is added, so the new malware remains usable and retains the original malicious functionality; this simulates noise-added malware in a real environment and improves detection capability;
4. the corrector adjusts header field values of the binary executable, so the new malware remains usable and retains the original malicious functionality;
5. using the generator G to produce noise codes allows a large number of noise codes to be generated, which improves training efficiency;
6. training the detector D on a large number of adversarial examples teaches it to recognize the important characteristics of malware, which improves its robustness.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the system framework of the invention;
FIG. 2 is a flowchart of training the detector D according to the invention;
FIG. 3 is a flowchart of training the generator G according to the invention.
Detailed Description
To make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to the drawings.
Embodiment:
FIGS. 1 to 3 show a malware detector training method and system, and a malware detection method and system.
A malware detector training system, comprising:
a detector D unit for detecting software and outputting feedback;
a generator G unit for generating noise codes for malware;
a corrector unit for inserting the noise codes into the malware and correcting the malware;
an optimizer OptimD unit for setting and adjusting the parameters of the detector D;
an optimizer OptimG unit for setting and adjusting the parameters of the generator G.
The detector D is the discriminative mathematical model of the generative adversarial network. It uses a neural network to fit a function F(x), where x is the feature vector obtained by preprocessing a binary executable file, and F(x) outputs a value between 0 and 1. The larger the value, the greater the probability that the software is malware.
In this application, "binary executable" is synonymous with "software".
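To make the role of F(x) concrete, here is a toy Python stand-in for the detector D. It is an assumption, not the patented network: instead of a neural network over the full 2-million-element vector, it applies a single logistic unit to a 256-bin byte histogram, so its output is a sigmoid value between 0 and 1. The class name `ToyDetector` is hypothetical.

```python
import math


class ToyDetector:
    """Hypothetical stand-in for detector D: F(x) = sigmoid(w . h(x) + b)."""

    def __init__(self) -> None:
        self.w = [0.0] * 256  # one weight per byte value
        self.b = 0.0

    @staticmethod
    def histogram(x: list[int]) -> list[float]:
        """Relative frequency of each byte value 0-255 in the feature vector x."""
        counts = [0] * 256
        for v in x:
            counts[v] += 1
        n = len(x) or 1  # avoid division by zero on an empty vector
        return [c / n for c in counts]

    def score(self, x: list[int]) -> float:
        """Implements F(x): a value between 0 and 1, larger meaning more likely malware."""
        h = self.histogram(x)
        a = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-a))


d = ToyDetector()
print(d.score([77, 90, 144, 0, 0]))  # 0.5: untrained zero weights express no preference
```

Because preprocessing zero-pads to 2 million elements, the padding dominates the byte-0 bin for small files; the histogram is used here only to keep the example short.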
The generator G is the generative mathematical model of the generative adversarial network and is implemented with a neural network. A string of random bytes following a Bernoulli distribution is input to the generator G, which generates a piece of binary code.
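A toy Python sketch of the generator G, under stated assumptions: the "random bytes following a Bernoulli distribution" are modeled as a vector of independent Bernoulli(p) bits, and G is a single dense layer whose sigmoid outputs are scaled to byte values. The layer sizes, `p`, and the class name `ToyGenerator` are hypothetical; the description does not specify G's architecture.

```python
import math
import random


class ToyGenerator:
    """Hypothetical stand-in for generator G: Bernoulli bits in, a noise code out."""

    def __init__(self, in_bits: int = 32, out_bytes: int = 16,
                 p: float = 0.5, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.in_bits, self.p = in_bits, p
        # Dense-layer weights and biases: the parameters an optimizer like OptimG would adjust.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(in_bits)]
                  for _ in range(out_bytes)]
        self.b = [0.0] * out_bytes

    def sample_input(self) -> list[int]:
        """Each input bit is drawn independently from Bernoulli(p)."""
        return [1 if self.rng.random() < self.p else 0 for _ in range(self.in_bits)]

    def generate(self) -> bytes:
        """Map one random Bernoulli input vector to a binary noise code."""
        z = self.sample_input()
        code = []
        for row, bias in zip(self.w, self.b):
            a = sum(wi * zi for wi, zi in zip(row, z)) + bias
            code.append(round(255 / (1 + math.exp(-a))))  # sigmoid scaled to 0..255
        return bytes(code)


g = ToyGenerator()
noise = g.generate()
print(len(noise))  # 16
```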
The generative adversarial network is the structure composed of the detector D and the generator G; it is used to train the detector D continuously, so that D remains accurate when confronted with adversarial examples.
A malware detector training method, comprising the following steps:
inputting software samples into the detector D and training the detector D;
generating a noise code with the generator G;
inserting the noise code into a malware sample to form a noisy malware sample;
inputting the noisy malware samples into the detector D and training the detector D again.
Specifically, inputting the software samples into the detector D and training the detector D comprises:
A software sample set is prepared that includes both malware and non-malware.
The optimizer OptimD randomly initializes the training parameters of the detector D, including the initial weights, the learning rate, the batch size of a single forward pass, and the number of training epochs.
The software samples are preprocessed: each byte of a binary executable file (that is, a software sample) is treated as an integer from 0 to 255, and all the integers are arranged in their original order to form a feature vector. If the vector is longer than 2 million elements, the excess is discarded; if it is shorter, it is padded with 0 until its length reaches 2 million.
Each software sample is preprocessed to obtain its feature vector.
The feature vectors of the software sample set are input into the detector D as training data, and D outputs feedback.
The optimizer OptimD adjusts and optimizes the parameters of the detector D until D is stable, that is, until its outputs for malware approach 1 and rise no further and its outputs for non-malware approach 0 and fall no further.
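The initialization and "train until stable" procedure can be sketched as mini-batch gradient descent on a logistic model. Assumptions: plain logistic regression stands in for the neural network; only the weights are randomly initialized, while the learning rate, batch size, and epoch limit are passed in as hyperparameters; and "stable" is read as "the gap between the mean malware score and the mean non-malware score improves by less than `tol` in an epoch". `train_detector` and `predict` are hypothetical names.

```python
import math
import random


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def predict(w: list[float], b: float, x: list[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_detector(X, y, lr=0.5, batch_size=4, max_epochs=500, tol=1e-4, seed=0):
    """Mini-batch logistic regression; labels are 1 for malware, 0 for non-malware."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in X[0]]  # random initial weights
    b = 0.0
    order = list(range(len(X)))
    prev_gap = None
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            gw, gb = [0.0] * len(w), 0.0
            for i in batch:
                err = predict(w, b, X[i]) - y[i]  # gradient of log loss w.r.t. the logit
                gw = [g + err * xj for g, xj in zip(gw, X[i])]
                gb += err
            w = [wj - lr * g / len(batch) for wj, g in zip(w, gw)]
            b -= lr * gb / len(batch)
        scores = [predict(w, b, x) for x in X]
        mal = [s for s, t in zip(scores, y) if t == 1]
        ben = [s for s, t in zip(scores, y) if t == 0]
        gap = sum(mal) / len(mal) - sum(ben) / len(ben)
        # "Stable": malware scores no longer rise and benign scores no longer fall.
        if prev_gap is not None and gap - prev_gap < tol:
            break
        prev_gap = gap
    return w, b, epoch


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
w, b, epochs = train_detector(X, y)
print([round(predict(w, b, x), 2) for x in X], epochs)
```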
Generating the noise code with the generator G comprises:
the generator G generates one noise code for each malware sample. Both the noise code and the training noise code are binary code generated by inputting a string of random bytes following a Bernoulli distribution into the generator G.
Inserting the noise codes into the malware samples to form noisy malware samples comprises:
the corrector inserts the noise code into a malware sample to form a noisy malware sample; the noisy malware samples together form a noisy malware sample set.
Specifically, the corrector inserts the entire noise code into the middle or at the end of the malware sample, that is, at a random position within a section of the exe file, to form the noisy malware sample, and then corrects the header field values of the noisy malware sample. The correction adjusts the header field values of the noisy malware sample (a binary executable) so that the new malware sample remains usable and keeps its original characteristics.
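A minimal sketch of the insertion step, assuming the sample and the noise code are raw `bytes`. It inserts the whole noise code unchanged, either at the end or at a random interior offset. The header correction is deliberately omitted: the description does not say which header fields are adjusted, and in an executable format such as PE, inserting bytes mid-file shifts later file offsets that the headers refer to, so a real corrector would have to patch them. The function name `insert_noise` is hypothetical.

```python
import random
from typing import Optional


def insert_noise(sample: bytes, noise: bytes, where: str = "end",
                 rng: Optional[random.Random] = None) -> bytes:
    """Insert the entire noise code into the middle or at the end of a sample."""
    if where == "end":
        return sample + noise
    if where == "middle":
        rng = rng or random.Random()
        # Random interior offset; header fields are NOT corrected in this sketch.
        pos = rng.randint(1, max(1, len(sample) - 1))
        return sample[:pos] + noise + sample[pos:]
    raise ValueError("where must be 'middle' or 'end'")


print(insert_noise(b"MZ\x90\x00", b"\xaa\xbb"))  # b'MZ\x90\x00\xaa\xbb'
```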
The corrector is a program that a person skilled in the art can write, without creative effort, to implement the insertion and correction functions.
Inputting the noisy malware samples into the detector D and training the detector D comprises:
adding the noisy malware sample set to the prepared software sample set and preprocessing the noisy malware samples; inputting all the feature vectors into the detector D; and using the optimizer OptimD to adjust and optimize the parameters of D until D is stable and its output values no longer increase.
Further, before the generator G generates the noise code, the method includes training the generator G according to the feedback from the detector D, which comprises:
the optimizer OptimG randomly initializes the parameters of the generator G;
the generator G generates a training noise code for each malware sample;
the corrector inserts the training noise code into the malware sample and corrects the header of the malware sample into which the training noise code was inserted, forming a training noisy malware sample set;
the training noisy malware sample set is input into the preliminarily trained detector D, which outputs feedback.
The optimizer OptimG adjusts and optimizes the parameters of the generator G, and the preceding three steps are repeated until the output of the detector D is stable and the feedback tends toward non-malware, that is, until the output of D approaches 0.
Training of the generator G is then complete.
The purpose of this step is to make the noise code generated by the trained generator G cause the detector D to misclassify.
For example, adding noise code 1 to a malware sample M yields M1. After preprocessing, M1 is input into the detector D, which outputs a value nearer 1 than 0, for example 0.6, indicating that M1 (that is, M) is malware. The generator G is then adjusted to generate noise code 2, which is added to M to form M2. After preprocessing, M2 is input into D, which outputs a value nearer 0, for example 0.2, indicating that M2 (that is, M) is non-malware. This cycle repeats until the output of D approaches 0 and falls no further.
Because the noise code generated by a generator G trained in this way makes the detector D misclassify, the detection capability of D improves in its next round of training, so that D no longer misclassifies malware to which noise code has been added. The strengthened detector D is then used to train G again, until the noise code of G once more causes a misclassification. Repeating this cycle continuously improves the detection capability of D.
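The alternating procedure (train D, train G against D, retrain D on the noisy malware, repeat) can be sketched as a loop over caller-supplied functions. Everything here is a hypothetical scaffold: `train_d`, `train_g`, and `make_noisy` stand for the detector training, generator training, and insert-and-correct steps described above, and the noisy copies are assumed to keep the malware label 1.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # a software sample (e.g. bytes)
D = TypeVar("D")  # a trained detector
G = TypeVar("G")  # a trained generator


def adversarial_rounds(train_d: Callable[[List[S], List[int]], D],
                       train_g: Callable[[D], G],
                       make_noisy: Callable[[G, S], S],
                       samples: List[S], labels: List[int],
                       rounds: int = 3) -> Tuple[D, List[int]]:
    """Alternate detector and generator training; return final D and data-set sizes."""
    malware = [s for s, t in zip(samples, labels) if t == 1]
    samples, labels = list(samples), list(labels)
    d = train_d(samples, labels)  # preliminary training of D
    sizes = []
    for _ in range(rounds):
        g = train_g(d)  # G learns noise codes that push D's output toward 0
        noisy = [make_noisy(g, m) for m in malware]  # one noisy copy per original malware sample
        samples += noisy
        labels += [1] * len(noisy)  # assumed: noise does not change the ground truth
        d = train_d(samples, labels)  # retrain D on the enlarged set
        sizes.append(len(samples))
    return d, sizes


# Dummy stand-ins just to show the control flow.
d, sizes = adversarial_rounds(lambda xs, ys: len(xs), lambda d: d,
                              lambda g, s: s + b"!", [b"m", b"b"], [1, 0], rounds=2)
print(d, sizes)  # 4 [3, 4]
```

In this sketch the training set grows by one noisy copy of each original malware sample per round; whether earlier rounds' noisy samples should be kept or replaced is not specified in the description.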
A malware detection method that detects malware using a detector D trained by the above method.
A malware detection system comprising a detection unit configured to detect malware using a detector D trained by the above method.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (4)

1. A malware detector training method comprising the steps of:
preprocessing software samples: taking each byte of a software sample as an integer from 0 to 255 and arranging all the integers in their original order to generate a feature vector of length 2 million; each software sample is preprocessed to obtain one feature vector; the software samples comprise malware samples and non-malware samples;
inputting the preprocessed software samples into a detector D and preliminarily training the detector D, specifically comprising:
preparing a software sample set;
the optimizer OptimD randomly initializes the training parameters of the detector D, including the initial weights, the learning rate, the batch size of a single forward pass, and the number of training epochs;
inputting the feature vectors of the software sample set into the detector D as training data, the detector D outputting feedback;
the optimizer OptimD adjusts and optimizes the parameters of the detector D until D is stable, that is, its outputs for malware approach 1 and rise no further and its outputs for non-malware approach 0 and fall no further;
training a generator G according to the feedback of the detector D, ending the training when the output of the detector D approaches 0, specifically comprising:
the optimizer OptimG randomly initializes the parameters of the generator G;
generating a training noise code, inserting the training noise code into a malware sample, and correcting the header of the malware sample into which the training noise code was inserted, to form a training noisy malware sample;
inputting the training noisy malware sample into the preliminarily trained detector D, the detector D outputting feedback;
the optimizer OptimG adjusts and optimizes the parameters of the generator G, and the above steps are repeated until the output of the detector D is stable and the feedback tends toward non-malware, that is, the output of the detector D approaches 0;
the training of the generator G is complete;
the generator G generates a noise code, the noise code being binary code generated by inputting a string of random bytes following a Bernoulli distribution into the generator G;
inserting the noise code into a malware sample to form a noisy malware sample, wherein inserting the noise code into the malware sample comprises inserting the entire noise code into the middle or at the end of the malware sample;
correcting the noisy malware sample, the correction being an adjustment of the header field values of the noisy malware sample;
inputting the noisy malware sample into the detector D and training the detector D again;
continuing to train the generator G with the retrained detector D, and repeating the above steps;
wherein the detector D serves as the discriminative model and the generator G as the generative model of a generative adversarial network, the generative adversarial network being the structure composed of the detector D and the generator G and being used to train the detector D continuously;
the output of the detector D lies between 0 and 1; an output approaching 0 indicates that the input software sample is non-malware, and an output approaching 1 indicates that the input software sample is malware.
2. A malware detector training system using the malware detector training method of claim 1, comprising:
a detector D unit for detecting software;
a generator G unit for generating noise codes for malware;
a corrector unit for inserting the noise code into the malware;
an optimizer OptimD unit for setting and adjusting the parameters of the detector D;
an optimizer OptimG unit for setting and adjusting the parameters of the generator G;
wherein the detector D is the discriminative mathematical model of the generative adversarial network and uses a neural network to fit a function F(x), where x is the feature vector obtained by preprocessing a binary executable file and F(x) outputs a value between 0 and 1; the larger the value, the greater the probability that the software is malware.
3. A malware detection method, characterized in that malware detection is performed using the detector D trained by the method of claim 1.
4. A malware detection system comprising a detection unit configured to perform malware detection using the detector D trained by the method of claim 1.
CN202010285088.1A 2020-04-13 2020-04-13 Malicious software detector training method and system, and detection method and system Active CN111475810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010285088.1A CN111475810B (en) 2020-04-13 2020-04-13 Malicious software detector training method and system, and detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010285088.1A CN111475810B (en) 2020-04-13 2020-04-13 Malicious software detector training method and system, and detection method and system

Publications (2)

Publication Number Publication Date
CN111475810A 2020-07-31
CN111475810B 2021-04-06

Family

ID=71752244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010285088.1A Active CN111475810B (en) 2020-04-13 2020-04-13 Malicious software detector training method and system, and detection method and system

Country Status (1)

Country Link
CN (1) CN111475810B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231703B (en) * 2020-11-09 2022-08-05 北京理工大学 Malicious software countermeasure sample generation method combined with API fuzzy processing technology
CN112380537A (en) * 2020-11-30 2021-02-19 北京天融信网络安全技术有限公司 Method, device, storage medium and electronic equipment for detecting malicious software

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839291B2 (en) * 2017-07-01 2020-11-17 Intel Corporation Hardened deep neural networks through training from adversarial misclassified data
CN109165688A (en) * 2018-08-28 2019-01-08 暨南大学 A kind of Android Malware family classification device construction method and its classification method
CN110427756B (en) * 2019-06-20 2021-05-04 中国人民解放军战略支援部队信息工程大学 Capsule network-based android malicious software detection method and device
CN110581857B (en) * 2019-09-17 2022-04-08 武汉思普崚技术有限公司 Virtual execution malicious software detection method and system
CN110619216B (en) * 2019-09-17 2021-09-03 武汉思普崚技术有限公司 Malicious software detection method and system for adversarial network
CN110598794A (en) * 2019-09-17 2019-12-20 武汉思普崚技术有限公司 Classified countermeasure network attack detection method and system
CN110826059B (en) * 2019-09-19 2021-10-15 浙江工业大学 Method and device for defending black box attack facing malicious software image format detection model

Also Published As

Publication number Publication date
CN111475810A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
Ding et al. Intrusion detection system for NSL-KDD dataset using convolutional neural networks
Bhatia et al. Unsupervised machine learning for network-centric anomaly detection in IoT
CN111475810B (en) Malicious software detector training method and system, and detection method and system
Liu et al. ATMPA: attacking machine learning-based malware visualization detection methods via adversarial examples
Wressnegger et al. Zoe: Content-based anomaly detection for industrial control systems
CN111885035A (en) Network anomaly detection method, system, terminal and storage medium
CN110119620A (en) System and method of the training for detecting the machine learning model of malice container
Liu et al. LSTM-CGAN: Towards generating low-rate DDoS adversarial samples for blockchain-based wireless network detection models
Camuto et al. Towards a theoretical understanding of the robustness of variational autoencoders
CN112613036A (en) Malicious sample enhancement method, malicious program detection method and corresponding devices
KR20190028880A (en) Method and appratus for generating machine learning data for botnet detection system
CN112733954A (en) Abnormal traffic detection method based on generation countermeasure network
Lightbody et al. Host-based intrusion detection system for iot using convolutional neural networks
CN112153045B (en) Method and system for identifying encrypted field of private protocol
Alyasiri et al. Grammatical evolution for detecting cyberattacks in Internet of Things environments
Zhan et al. AMGmal: Adaptive mask-guided adversarial attack against malware detection with minimal perturbation
Zhang et al. Evasion attacks based on wasserstein generative adversarial network
CN115277065B (en) Anti-attack method and device in abnormal traffic detection of Internet of things
Mendonça et al. An extremely lightweight approach for ddos detection at home gateways
CN111310186A (en) Method, device and system for detecting confusion command line
CN116015788A (en) Malicious traffic protection method and system based on active detection
Pan Iot network behavioral fingerprint inference with limited network traces for cyber investigation
Yin et al. Botnet detection based on genetic neural network
Lin et al. A hypergraph-based machine learning ensemble network intrusion detection system
Pavlik et al. Cyber creative GAN for novel malicious packets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant