CN111753299A - Unbalanced malicious software detection method based on packet integration - Google Patents
Unbalanced malicious software detection method based on packet integration
- Publication number
- CN111753299A
- Authority
- CN
- China
- Prior art keywords
- samples
- malicious
- data set
- sample
- unbalanced
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Virology (AREA)
- General Health & Medical Sciences (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention belongs to the technical field of information security and relates in particular to an unbalanced malicious software detection method based on packet integration, comprising the following steps: S1, feature extraction: extracting permission information and API call information from experimental samples to form a feature vector set, where the experimental samples comprise normal samples and malicious samples and the normal samples outnumber the malicious samples; S2, feature optimization: screening the feature vector set with an information gain algorithm to remove redundant features, yielding an unbalanced data set; and S3, detecting the unbalanced data set with a grouping integration detection algorithm so as to classify the normal samples and the malicious samples. The method addresses the difficulty of guaranteeing the accuracy and stability of malicious software detection when the data set is unbalanced.
Description
Technical Field
The invention belongs to the technical field of information security, and particularly relates to an unbalanced malicious software detection method based on packet integration.
Background
The Android platform is popular with a large number of mobile phone manufacturers because it is open source; according to the latest IDC statistics, Android phones hold 87% of the market. That same openness also makes the platform vulnerable to malicious software: 97% of the mobile malware discovered is related to the Android platform. Attackers use malicious software to steal users' private information, carry out malicious fee deduction, and so on. The security situation of mobile phones is therefore severe, and malware detection has become a research focus in the field of information security.
Machine learning has been highly successful in information security fields such as spam filtering. Researchers have applied machine learning algorithms to Android malware detection, proposed a number of detection algorithms, and verified their effectiveness. One prior work proposes a lightweight detection scheme based on sensitive permissions: it analyzes how permissions differ across sample types, removes redundant permissions, and finally uses a nearest neighbor classification algorithm to identify malicious software. Zhang et al. propose an Android malware detection strategy based on naive Bayes, which uses as feature attributes whether permissions are abused and whether sensitive permissions are used in combination. Yang Hong et al. extract the permission and component intent information of Android software as features and detect malicious software with a random forest algorithm optimized by weighted voting. Although many detection methods have been proposed, most of them assume that the malicious and normal software in the training data do not differ greatly in number. In practice, normal samples can be collected in bulk from third-party markets with a crawler, whereas malicious samples are costly and difficult to collect, so normal samples far outnumber malicious ones. The resulting imbalance in the training data makes the accuracy and stability of a detection method difficult to guarantee.
Disclosure of Invention
In order to solve the problem of low detection and classification precision of malicious software caused by data imbalance, the invention provides an unbalanced malicious software detection method based on packet integration, and the effectiveness of the method in solving the problem of data imbalance is verified on a real data set.
In order to achieve the purpose, the invention adopts the following technical scheme:
An unbalanced malware detection method based on packet integration comprises the following steps:
S1, feature extraction: extracting permission information and API call information from experimental samples to form a feature vector set; the experimental samples comprise normal samples and malicious samples, and the number of normal samples is larger than the number of malicious samples;
S2, feature optimization: screening the feature vector set with an information gain algorithm to remove redundant features, yielding an unbalanced data set;
S3, detecting the unbalanced data set with a grouping integration detection algorithm so as to classify the normal samples and the malicious samples.
Preferably, the step S3 specifically includes the following steps:
S31, randomly extracting three data sets from the unbalanced data set to serve as a training data set, a verification data set and a test data set, respectively; the numbers of normal samples and malicious samples in the training data set are denoted b and m, respectively;
S32, randomly extracting, without replacement, m samples from the normal samples of the training data set and combining them with the m malicious samples to form a new data set $D_i$; extracting k times in total to form k balanced data sets, where k = b/m;
S33, training a decision tree on each data set $D_i$, giving k decision tree classifiers; testing the classification performance of each decision tree classifier t on the verification data set in turn, and recording its recall rate as $r_t$; for decision tree classifier t, if a malicious sample is misclassified, adding the misclassified sample to the training of the next decision tree classifier; k base classifiers are formed in this way;
S34, combining the k base classifiers into an integrated decision tree classifier C by weighted voting;
S35, for each test sample x in the test data set, inputting x into the k base classifiers of the integrated decision tree classifier C and calculating the weighted voting result of the k base classifiers, the calculation formula being as follows:
wherein $r_t$ is the recall rate of decision tree classifier t;
when counting the votes for the malicious class, class c is the malicious sample and class non-c is the normal sample;
when counting the votes for the normal class, class c is the normal sample and class non-c is the malicious sample;
the total votes for the malicious class and for the normal class are calculated, and the class with the most votes is selected as the final class of sample x.
Preferably, the step S1 specifically includes the following steps:
S11, writing a Python program to read the permission and API call information in the experimental samples to form a feature set;
S12, deduplicating the feature set to form a new feature set FS;
S13, for every sample, determining whether the sample contains each element of the new feature set FS; if the sample contains the corresponding feature in FS, the corresponding element of its feature vector is set to 1; otherwise the corresponding element is set to 0; all samples are traversed to form a feature vector set FVS.
Preferably, the step S1 further includes:
and adding a flag bit at the end of each feature vector, wherein 0 represents a normal sample, and 1 represents a malicious sample.
Preferably, the proportion of the training data set in the unbalanced data set is greater than 50%.
Preferably, the proportion of the training data set in the unbalanced data set is 60%, the proportion of the verification data set in the unbalanced data set is 20%, and the proportion of the test data set in the unbalanced data set is 20%.
Preferably, in step S2, the information gain algorithm calculates a difference between the entropy value of the feature and the conditional entropy thereof to obtain an IG value of the feature, and the larger the IG value is, the more important the feature is.
Preferably, in step S2, two indexes, recall and G-mean, are used as the metrics for screening, defined as follows:
if a sample is predicted to be malicious and is actually malicious, the malicious sample is predicted correctly, and the number of such samples is recorded as TP;
if a sample is predicted to be normal but is actually malicious, the malicious sample is predicted incorrectly, and the number of such samples is recorded as FN;
if a sample is predicted to be malicious but is actually normal, the normal sample is predicted incorrectly, and the number of such samples is recorded as FP;
if a sample is predicted to be normal and is actually normal, the normal sample is predicted correctly, and the number of such samples is recorded as TN.
Compared with the prior art, the invention has the following beneficial effects:
The unbalanced malicious software detection method based on packet integration uses a grouping integration detection algorithm: the normal software samples are divided into several groups by random sampling, the normal samples of each group are combined with all of the malicious software samples to train one classification model, and finally all of the classification models are fused by an ensemble method. This overcomes the difficulty of guaranteeing the accuracy and stability of malicious software detection when the data set is unbalanced.
Drawings
FIG. 1 is a block flow diagram of an unbalanced malware detection method based on packet integration according to an embodiment of the present invention;
FIG. 2 is a flow chart of a packet integration detection algorithm of an embodiment of the present invention;
FIG. 3 is a usage ranking graph of partial rights information in an experimental sample according to an embodiment of the invention;
FIG. 4 is a usage ranking graph of a portion of API call information in an experimental sample according to an embodiment of the present invention;
FIG. 5 is a simulation of the detection method of an embodiment of the present invention over different feature subset numbers.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
The invention provides an unbalanced malicious software detection method based on grouping integration for solving the problem of low detection and classification precision of malicious software caused by data imbalance, and verifies the effectiveness of the method in solving the unbalanced problem on a real data set.
Specifically, as shown in fig. 1, the unbalanced malware detection method based on packet integration according to the embodiment of the present invention is implemented by a feature extraction module, a feature optimization module and a classification detection module. The feature extraction module uses a Python program to extract permission and API call information from the experimental samples and form a feature vector set. The feature optimization module addresses feature redundancy: it removes redundant features with an information gain algorithm, which prevents overfitting and improves detection efficiency. The classification detection module addresses the bias of traditional classifiers toward the majority class by providing a grouping integration detection algorithm for detecting unbalanced data sets.
As shown in fig. 2, the packet integration-based unbalanced malware detection method according to the embodiment of the present invention specifically includes:
s1, feature extraction: extracting authority information and API calling information from an experimental sample to form a characteristic vector set; the experimental samples comprise normal samples and malicious samples, and the number of the normal samples is larger than that of the malicious samples.
The method specifically comprises the following steps:
Step 1: A Python program is written to read the permission and API call information in the samples to form a feature set. Android custom permissions belong only to specific samples and are therefore not counted.
Step 2: The features extracted in step 1 are deduplicated to form a new feature set FS.
Step 3: For every sample, it is determined whether the sample contains each element of the set FS; if the sample contains the corresponding feature in FS, the corresponding element of its feature vector is set to 1; otherwise it is set to 0. All samples are traversed to form the feature vector set FVS. In addition, a flag bit is appended to the end of each feature vector, where 0 denotes a normal sample and 1 denotes a malicious sample.
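Steps 2 and 3 can be illustrated with a minimal Python sketch. It assumes the permission and API call names have already been read from each sample as plain strings; the APK parsing itself is not shown. The function names `build_feature_set` and `to_vector` and the example feature strings are hypothetical and are not taken from the patent.

```python
def build_feature_set(samples):
    """Deduplicate all extracted features into an ordered feature set FS.

    `samples` is a list of (features, label) pairs, where `features` is an
    iterable of strings and `label` is 0 (normal) or 1 (malicious).
    Sorting is an assumption made here so that vector positions are stable.
    """
    return sorted({f for features, _ in samples for f in features})


def to_vector(features, feature_set, label):
    """Encode one sample as a 0/1 vector over FS with the label flag appended."""
    present = set(features)
    return [1 if f in present else 0 for f in feature_set] + [label]


def build_fvs(samples):
    """Traverse all samples to form the feature vector set FVS."""
    fs = build_feature_set(samples)
    return fs, [to_vector(features, fs, label) for features, label in samples]
```

Each row of the returned FVS has one element per feature in FS plus the trailing flag bit, so the last column can be split off as the class label before training.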
S2, feature optimization: and screening the characteristic vector set by adopting an information gain algorithm to remove redundant characteristics and obtain an unbalanced data set.
Android software involves a wide variety of permission features and API call features, so the feature space is high-dimensional and prone to the curse of dimensionality. To reduce the influence of redundant features on the detection result and improve detection efficiency, the embodiment of the invention uses an information gain (IG) algorithm to select, from the original feature space, the features with strong class-discriminating ability.
(1) Information gain based feature correlation analysis
In information theory, information gain measures the difference between two probability distributions. Applied to feature selection, it measures how much a feature contributes to classification: the algorithm obtains the IG value of a feature by calculating the difference between the entropy and the conditional entropy given the feature, and the larger the value, the higher the correlation between the feature and the class.
The calculation formula is as follows:
$$IG(t) = -\sum_{i=1}^{m} P(C_i)\log P(C_i) + P(t)\sum_{i=1}^{m} P(C_i \mid t)\log P(C_i \mid t) + P(\bar{t})\sum_{i=1}^{m} P(C_i \mid \bar{t})\log P(C_i \mid \bar{t})$$
where m represents the total number of classes; $P(t)$ represents the probability that feature t occurs; $P(C_i)$ represents the probability that class $C_i$ occurs; $P(C_i \mid t)$ represents the contribution of feature t to class $C_i$, i.e. the conditional probability of class $C_i$ when feature t is present; $P(\bar{t})$ represents the probability that the extracted features do not contain feature t; and $P(C_i \mid \bar{t})$ represents the conditional probability that a training sample belongs to class $C_i$ when its extracted features do not contain feature t.
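As a hedged illustration, the information gain of a binary feature can be computed as the class entropy minus the conditional entropy given presence or absence of the feature. The sketch below assumes base-2 logarithms (the source does not state the base) and rows laid out as 0/1 feature values followed by the class flag; the function names are hypothetical.

```python
import math
from collections import Counter


def _sum_p_log_p(labels):
    """Return sum_i P(C_i) * log2 P(C_i) over the classes in `labels` (<= 0)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one 0/1 feature column with respect to the class labels."""
    n = len(labels)
    with_t = [y for v, y in zip(column, labels) if v == 1]
    without_t = [y for v, y in zip(column, labels) if v != 1]
    p_t = len(with_t) / n
    return (-_sum_p_log_p(labels)
            + p_t * _sum_p_log_p(with_t)
            + (1 - p_t) * _sum_p_log_p(without_t))


def top_k_features(rows, k):
    """Indices of the k features with the largest IG; the last column is the label."""
    labels = [row[-1] for row in rows]
    n_features = len(rows[0]) - 1
    gains = [information_gain([row[j] for row in rows], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]
```

A feature that perfectly separates two equally sized classes gets an IG of 1 bit under these assumptions, while a feature that is independent of the class gets 0.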
(2) Evaluation and selection of feature subsets
Two indexes, recall and G-mean, are used as the evaluation metrics. Both can be calculated from the confusion matrix shown in table 1, in which the positive class is the malicious sample and the negative class is the normal sample.
TABLE 1 confusion matrix
 | Predicted as positive class | Predicted as negative class |
---|---|---|
Actually positive class | TP | FN |
Actually negative class | FP | TN |
The recall and G-mean are calculated by the following formulas.
$$recall = \frac{TP}{TP + FN}$$
$$G\text{-}mean = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$
The recall is the proportion of malicious samples that are predicted correctly. The G-mean reflects the overall performance of the classifier on both positive and negative samples, and its value depends on the detection rates of both malicious software and normal software; in other words, the screening of the feature vector set is adjusted according to the final detection rates of malicious and normal software.
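A small Python sketch of the two metrics, assuming the standard definitions in which recall is TP / (TP + FN) and G-mean is the geometric mean of the detection rates of the two classes. The function names are hypothetical, and returning 0.0 for an empty class is a choice made here, not something stated in the source.

```python
import math


def recall(tp, fn):
    """Fraction of actually malicious samples that were predicted malicious."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the malicious and normal detection rates."""
    normal_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall(tp, fn) * normal_rate)
```

For example, with TP = 1035, FN = 77, TN = 24659 and FP = 938, these functions give a recall of about 0.931 and a G-mean of about 0.947.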
And S3, detecting the unbalanced data set by utilizing a grouping integration detection algorithm so as to classify the normal samples and the malicious samples.
Most traditional machine learning algorithms assume balanced samples. When they process an unbalanced data set, the classification results are biased toward the majority class and prediction of the minority class is poor, even though the minority-class prediction is the more important one: if a malicious sample is predicted to be normal, the user suffers a loss. The invention therefore provides an unbalanced malware detection algorithm based on packet integration for unbalanced data sets, with a decision tree as the base classifier.
(1) Data packet
The unbalanced data set is divided into a training data set, a verification data set and a test data set, and k balanced data sets are constructed from the training data set by random sampling. Then k decision tree base classifiers are trained on the balanced data sets; the classification performance of each decision tree base classifier is tested on the verification set, and the malicious data it misclassifies are added to the training of the next decision tree base classifier. Finally, the k decision tree base classifiers are combined by weighted voting into a decision tree integrated classifier C, whose output is taken as the classification result of a test sample during classification detection.
The embodiment of the invention adopts the decision tree as the base classifier because the decision tree is a weak classifier, which, when used as a base classifier in an ensemble, gives better classification performance than a strong classifier does.
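The data grouping can be sketched in Python as follows. This is a sketch under stated assumptions: the normal training samples are shuffled once and cut into disjoint groups of m samples, so no normal sample is drawn twice; k is taken as the integer part of b/m, since the source does not say how a non-integer ratio is handled; and the function name and `seed` parameter are hypothetical.

```python
import random


def make_balanced_groups(normal, malicious, seed=0):
    """Build k = b // m balanced data sets from an unbalanced training set.

    Each group holds m normal samples (label 0) drawn without replacement
    plus all m malicious samples (label 1).
    """
    m = len(malicious)
    if m == 0:
        raise ValueError("at least one malicious sample is required")
    k = len(normal) // m

    shuffled = list(normal)
    random.Random(seed).shuffle(shuffled)

    groups = []
    for i in range(k):
        chunk = shuffled[i * m:(i + 1) * m]
        samples = chunk + list(malicious)
        labels = [0] * m + [1] * m
        groups.append((samples, labels))
    return groups
```

Normal samples left over after the last full group are simply unused in this sketch; whether they should be discarded or folded into a final group is left open by the source.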
(2) Integrated learning
After the data have been grouped and the base classifiers trained, the base classifiers are combined by weighted voting.
Specifically, the flow of the packet integration detection algorithm is shown in fig. 2, and the detailed steps are as follows:
Step 1: From the normal data set and the malicious data set (together, the unbalanced data set), randomly extract 60% of the samples as the training data set, 20% as the verification data set and 20% as the test data set. In the training data set, the numbers of normal samples and malicious samples are denoted b and m, respectively.
Step 2: Randomly extract, without replacement, m samples from the normal training samples and combine them with the m malicious samples to form a new data set $D_i$. Extraction is performed k times in total, where k = b/m, forming k balanced data sets.
Step 3: Train a decision tree on each data set $D_i$, giving k decision tree classifiers in total. Test the classification performance of each decision tree classifier t on the verification data set and record its recall rate as $r_t$. For decision tree classifier t, if a malicious sample is misclassified, add the misclassified sample to the training of the next decision tree classifier. In this way k base classifiers are formed, and the k base classifiers are combined into an integrated decision tree classifier C by weighted voting.
Step 4: For each test sample x in the test data set, input x into the k base classifiers of the integrated decision tree classifier C and calculate the weighted voting result of the k base classifiers. The formula is as follows:
where $r_t$ is the recall rate of decision tree classifier t, and $T_{c,x}(x)$ is defined as follows:
When counting the votes for the malicious class, class c is the malicious sample and class non-c is the normal sample; when counting the votes for the normal class, class c is the normal sample and class non-c is the malicious sample. When counting the votes for the normal class, $T_{c,x}(x) = 1$ if the base classifier judges sample x to be a normal sample, and $T_{c,x}(x) = 0$ if the base classifier judges sample x to be a malicious sample. When counting the votes for the malicious class, $T_{c,x}(x) = 1$ if the base classifier judges sample x to be a malicious sample, and $T_{c,x}(x) = 0$ if the base classifier judges sample x to be a normal sample.
The total votes for the malicious class and for the normal class are then calculated, and the class with the most votes is selected as the final class of sample x.
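The voting formula itself is not reproduced in this text. A minimal Python sketch of steps 3 and 4 follows, assuming the weighted vote for a class is the sum, over the base classifiers, of the recall $r_t$ multiplied by the 0/1 indicator described above. The function names, the `train` callback standing in for decision tree training, the reading that the samples carried forward are the malicious verification samples the previous classifier missed, and the tie-break toward the malicious class are all illustrative assumptions, not part of the patent.

```python
MALICIOUS, NORMAL = 1, 0


def train_with_feedback(groups, train, validation):
    """Train one base classifier per balanced group, chaining missed samples.

    groups:     list of (samples, labels) balanced data sets
    train:      callable(samples, labels) -> classifier, where the classifier
                is itself a callable mapping a sample to MALICIOUS or NORMAL
    validation: list of (sample, label) pairs
    Returns the base classifiers and their recalls r_t on the validation set.
    """
    classifiers, recalls = [], []
    carried = []  # malicious validation samples missed by the previous classifier
    validation_malicious = [x for x, y in validation if y == MALICIOUS]
    for samples, labels in groups:
        samples = list(samples) + carried
        labels = list(labels) + [MALICIOUS] * len(carried)
        clf = train(samples, labels)
        missed = [x for x in validation_malicious if clf(x) != MALICIOUS]
        total = len(validation_malicious)
        recalls.append((total - len(missed)) / total if total else 0.0)
        classifiers.append(clf)
        carried = missed
    return classifiers, recalls


def weighted_vote(classifiers, recalls, x):
    """Each base classifier adds its recall r_t to the class it predicts for x."""
    votes = {MALICIOUS: 0.0, NORMAL: 0.0}
    for clf, r_t in zip(classifiers, recalls):
        votes[clf(x)] += r_t
    # Assumption: ties go to the malicious class, since a missed malicious
    # sample is described as the costlier error.
    return MALICIOUS if votes[MALICIOUS] >= votes[NORMAL] else NORMAL
```

Weighting by recall means a base classifier that detects more of the malicious verification samples has more influence on the final decision. In practice `train` could wrap any decision tree implementation that exposes a per-sample prediction.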
The packet integration-based unbalanced malware detection method of the embodiment of the invention was applied as follows:
The experimental data come from the Drebin website and contain 123453 normal samples and 5560 malicious samples; each sample contains 545 features. The usage-rate rankings of part of the permission information and API call information in the malicious and normal samples are shown in figures 3 and 4.
The IG value of each feature was calculated by the IG (information gain) algorithm; the larger the IG value, the more important the feature. The top-ranked feature attributes are shown in table 2.
TABLE 2 characteristics and their corresponding IG values
After screening, an optimal feature subset (i.e., the unbalanced data set) is obtained. The packet integration detection algorithm provided by the invention was simulated with different numbers of features in the subset; the simulation result is shown in fig. 5.
On the experimental data set, with the top 70 attributes ranked by the IG algorithm as input features, the classification detection strategy of the invention was compared with the kNN, svm and RF algorithms provided by the sklearn package in Python. The results are shown in Table 3.
TABLE 3 comparison test data of the packet integration detection algorithm of the present invention with other existing algorithms
Index | kNN | svm | RF | Algorithm of the invention |
TP | 877 | 855 | 940 | 1035 |
TN | 25511 | 25554 | 25525 | 24659 |
FP | 86 | 43 | 72 | 938 |
FN | 235 | 257 | 172 | 77 |
recall | 0.788 | 0.768 | 0.846 | 0.931 |
G-mean | 0.866 | 0.866 | 0.918 | 0.947 |
The foregoing has outlined rather broadly the preferred embodiments and principles of the present invention and it will be appreciated that those skilled in the art may devise variations of the present invention that are within the spirit and scope of the appended claims.
Claims (8)
1. An unbalanced malware detection method based on packet integration is characterized by comprising the following steps:
S1, feature extraction: extracting permission information and API call information from experimental samples to form a feature vector set; the experimental samples comprise normal samples and malicious samples, and the number of normal samples is larger than the number of malicious samples;
S2, feature optimization: screening the feature vector set with an information gain algorithm to remove redundant features, yielding an unbalanced data set;
S3, detecting the unbalanced data set with a grouping integration detection algorithm so as to classify the normal samples and the malicious samples.
2. The unbalanced malware detection method based on packet integration as claimed in claim 1, wherein the step S3 specifically comprises the following steps:
S31, randomly extracting three data sets from the unbalanced data set to serve as a training data set, a verification data set and a test data set, respectively; the numbers of normal samples and malicious samples in the training data set are denoted b and m, respectively;
S32, randomly extracting, without replacement, m samples from the normal samples of the training data set and combining them with the m malicious samples to form a new data set $D_i$; extracting k times in total to form k balanced data sets, where k = b/m;
S33, training a decision tree on each data set $D_i$, giving k decision tree classifiers; testing the classification performance of each decision tree classifier t on the verification data set in turn, and recording its recall rate as $r_t$; for decision tree classifier t, if a malicious sample is misclassified, adding the misclassified sample to the training of the next decision tree classifier; k base classifiers are formed in this way;
S34, combining the k base classifiers into an integrated decision tree classifier C by weighted voting;
S35, for each test sample x in the test data set, inputting x into the k base classifiers of the integrated decision tree classifier C and calculating the weighted voting result of the k base classifiers, the calculation formula being as follows:
wherein $r_t$ is the recall rate of decision tree classifier t;
when counting the votes for the malicious class, class c is the malicious sample and class non-c is the normal sample;
when counting the votes for the normal class, class c is the normal sample and class non-c is the malicious sample;
the total votes for the malicious class and for the normal class are calculated, and the class with the most votes is selected as the final class of sample x.
3. The unbalanced malware detection method based on packet integration as claimed in claim 2, wherein the step S1 specifically comprises the following steps:
S11, writing a Python program to read the permission and API call information in the experimental samples to form a feature set;
S12, deduplicating the feature set to form a new feature set FS;
S13, for every sample, determining whether the sample contains each element of the new feature set FS; if the sample contains the corresponding feature in FS, the corresponding element of its feature vector is set to 1; otherwise the corresponding element is set to 0; all samples are traversed to form a feature vector set FVS.
4. The unbalanced malware detection method based on packet integration as claimed in claim 3, wherein the step S1 further comprises:
and adding a flag bit at the end of each feature vector, wherein 0 represents a normal sample, and 1 represents a malicious sample.
5. The method according to claim 2, wherein the proportion of the training data set in the unbalanced data set is greater than 50%.
6. The method according to claim 2, wherein the proportion of the training data set in the unbalanced data set is 60%, the proportion of the verification data set in the unbalanced data set is 20%, and the proportion of the test data set in the unbalanced data set is 20%.
7. The method as claimed in claim 1, wherein in step S2, the information gain algorithm calculates the difference between the entropy and the conditional entropy of the features to obtain the IG value of the features, and the larger the IG value is, the more important the features are.
8. The unbalanced malware detection method based on packet integration as claimed in claim 1, wherein in the step S2, two indexes of recall rate call and G-mean are used as metrics for screening, specifically as follows:
if a sample is predicted to be malicious and is actually malicious, the number of such correctly predicted malicious samples is recorded as TP;
if a sample is predicted to be normal but is actually malicious, the number of such incorrectly predicted malicious samples is recorded as FP;
if a sample is predicted to be malicious but is actually normal, the number of such incorrectly predicted normal samples is recorded as FN;
if a sample is predicted to be normal and is actually normal, the number of such correctly predicted normal samples is recorded as TN;
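The four counts and the two screening metrics can be sketched as below. The formulas themselves do not appear in this section, so the code assumes the standard definitions: recall is the fraction of malicious samples detected, and G-mean is the square root of recall multiplied by the fraction of normal samples correctly identified. The function and variable names are hypothetical. Descriptive names are used for the counts because the widely used convention labels a malicious sample predicted as normal FN and a normal sample predicted as malicious FP.

```python
from math import sqrt


def screening_metrics(y_true, y_pred):
    """Return (recall, g_mean) for 0/1 labels (0 = normal, 1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    malicious_correct = sum(1 for t, p in pairs if t == 1 and p == 1)
    malicious_missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    normal_flagged = sum(1 for t, p in pairs if t == 0 and p == 1)
    normal_correct = sum(1 for t, p in pairs if t == 0 and p == 0)

    n_malicious = malicious_correct + malicious_missed
    n_normal = normal_correct + normal_flagged
    # Guard against empty classes (returning 0.0 is an assumption).
    recall = malicious_correct / n_malicious if n_malicious else 0.0
    specificity = normal_correct / n_normal if n_normal else 0.0
    return recall, sqrt(recall * specificity)


recall, g_mean = screening_metrics(
    y_true=[1, 1, 0, 0, 0, 0],
    y_pred=[1, 0, 0, 0, 0, 1],
)
```

Unlike plain accuracy, G-mean drops to zero when either class is entirely misclassified, which is why it is commonly paired with recall on unbalanced data.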
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010571828.8A CN111753299A (en) | 2020-06-22 | 2020-06-22 | Unbalanced malicious software detection method based on packet integration |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753299A (en) | 2020-10-09 |
Family
ID=72675578
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010571828.8A Withdrawn CN111753299A (en) | 2020-06-22 | 2020-06-22 | Unbalanced malicious software detection method based on packet integration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753299A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560435A (en) * | 2020-12-18 | 2021-03-26 | 北京声智科技有限公司 | Text corpus processing method, device, equipment and storage medium |
CN112560435B (en) * | 2020-12-18 | 2022-03-11 | 北京声智科技有限公司 | Text corpus processing method, device, equipment and storage medium |
CN112764791A (en) * | 2021-01-25 | 2021-05-07 | 济南大学 | Incremental updating malicious software detection method and system |
CN112764791B (en) * | 2021-01-25 | 2023-08-08 | 济南大学 | Incremental update malicious software detection method and system |
CN112800426A (en) * | 2021-02-09 | 2021-05-14 | 北京工业大学 | Malicious code data unbalanced processing method based on group intelligent algorithm and cGAN |
CN112800426B (en) * | 2021-02-09 | 2024-03-22 | 北京工业大学 | Malicious code data unbalanced processing method based on group intelligent algorithm and cGAN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112329016B (en) | Visual malicious software detection device and method based on deep neural network | |
CN111428231B (en) | Safety processing method, device and equipment based on user behaviors | |
CN110704840A (en) | Convolutional neural network CNN-based malicious software detection method | |
CN111753299A (en) | Unbalanced malicious software detection method based on packet integration | |
Kim et al. | Fusions of GA and SVM for anomaly detection in intrusion detection system | |
CN109359439A (en) | Software detecting method, device, equipment and storage medium | |
CN110084609B (en) | Transaction fraud behavior deep detection method based on characterization learning | |
CN112464232B (en) | Android system malicious software detection method based on mixed feature combination classification | |
CN110287311B (en) | Text classification method and device, storage medium and computer equipment | |
CN112950445B (en) | Compensation-based detection feature selection method in image steganalysis | |
CN111062036A (en) | Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment | |
CN115600194A (en) | Intrusion detection method, storage medium and device based on XGboost and LGBM | |
Muttaqien et al. | Increasing performance of IDS by selecting and transforming features | |
CN115577357A (en) | Android malicious software detection method based on stacking integration technology | |
CN113420291B (en) | Intrusion detection feature selection method based on weight integration | |
Weng et al. | UCM-net: A U-net-like tampered-region-related framework for copy-move forgery detection | |
An et al. | Benchmarking the robustness of image watermarks | |
CN113724779B (en) | SNAREs protein identification method, system, storage medium and equipment based on machine learning technology | |
CN116170187A (en) | Industrial Internet intrusion monitoring method based on CNN and LSTM fusion network | |
CN115842645A (en) | UMAP-RF-based network attack traffic detection method and device and readable storage medium | |
CN115688107A (en) | Fraud-related APP detection system and method | |
CN114510720A (en) | Android malicious software classification method based on feature fusion and NLP technology | |
CN111383716B (en) | Screening method, screening device, screening computer device and screening storage medium | |
CN113792141A (en) | Feature selection method based on covariance measurement factor | |
CN112749759A (en) | Preprocessing method, system and application of confrontation sample of deep neural network map |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20201009 |