CN110162975B - Multi-step abnormal point detection method based on neighbor propagation clustering algorithm - Google Patents

Info

Publication number
CN110162975B
Authority
CN
China
Prior art keywords
sample
data
application program
abnormal
point detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910452071.8A
Other languages
Chinese (zh)
Other versions
CN110162975A (en)
Inventor
朱会娟
冯霞
王良民
黎洋
顾伟
曹晓雯
房浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN201910452071.8A priority Critical patent/CN110162975B/en
Publication of CN110162975A publication Critical patent/CN110162975A/en
Application granted granted Critical
Publication of CN110162975B publication Critical patent/CN110162975B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-step abnormal point detection method based on a neighbor propagation clustering algorithm. The method effectively addresses the curse of dimensionality in abnormal point detection, so that detection is not disturbed by the data noise that redundant features or too many irrelevant features introduce. It also overcomes the excessive dependence of traditional clustering-based or distance-based abnormal point detection techniques on the choice of initial values. The effectiveness of the method was verified by ten-fold cross-validation on a real data set collected from VirusShare and Google Play, and the method has broad application prospects in the field of network security.

Description

Multi-step abnormal point detection method based on neighbor propagation clustering algorithm
Technical Field
The invention belongs to the field of network security technology, and particularly relates to a multi-step abnormal point detection method based on a neighbor propagation clustering algorithm.
Background
The rapid development of the Internet has diversified the channels through which software spreads and made application environments more complex. This greatly facilitates the spread and attack of malware, which is now more aggressive and more harmful than traditional computer viruses. Because application stores audit submissions loosely and users can freely publish and download Android applications through third-party application markets, Android has become the main target of malware: according to recent research data, up to 97% of mobile malware targets Android devices. Malware is the generic term for software that is installed without explicit notice or permission and that has malicious intent or performs malicious functions that violate the legitimate interests of the user. Malware often shows distinctive behavior, such as frequently accessing files, using the network, sending short messages, or reading the user's address book. A research and analysis report on network privacy security and network fraud (second half of 2018) lists network security problems, such as fraud by counterfeit bank short messages, that arise when Android applications steal private information such as user address books, geographic locations, and other entertainment and payment information. Therefore, malware analysis and detection on the Android platform plays a crucial role in network security research.
However, conventional malware detection methods tend to be retrospective: they rely on a sufficient number of known malware samples, from which the corresponding malware patterns are mined only after the malware has spread widely. To address this situation in Android malware detection, the invention introduces anomaly detection. Anomaly detection (outlier detection) aims to detect data that does not conform to normal behavior. It is widely applied in databases, data mining, machine learning, statistics, and related fields; applications include fraud detection for credit cards and insurance, network intrusion detection and fault diagnosis, identification of new features in satellite image analysis, health and medical monitoring, detection of public safety emergencies, and identification of novel molecular structures in drug research. Distance-based and clustering-based methods are two typical approaches to anomaly detection, but in practice they face two major challenges: (1) high-dimensional data contains redundant features or too many irrelevant features, and the resulting data noise lowers the accuracy of abnormal point detection; (2) traditional clustering-based or distance-based abnormal point detection techniques (e.g., KNN, K-means, K-center) require accurate prior knowledge and depend heavily on initial values, such as the number of clusters and the initialization of the cluster centers, so their effectiveness depends largely on whether those initial values are set reasonably.
Disclosure of Invention
Purpose of the invention: to overcome the deficiencies of the prior art, the invention provides a multi-step abnormal point detection method based on a neighbor propagation clustering algorithm.
Technical solution: the multi-step abnormal point detection method based on a neighbor propagation clustering algorithm according to the invention comprises the following steps:
step 1, obtaining normal Android applications from Google Play, the official Android application store, and obtaining malicious Apps from a virus sample library (for example http://virusshare.com/), constructing an App sample set (containing normal samples and malicious samples), and dividing the App sample set into a training set and a test set;
step 2, extracting the data flows in the sample set with the FlowDroid tool, an example of a data flow being {user information → log}, thereby constructing a feature set of data flow frequencies $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{m \times n}$, where m is the number of data flows counted, i.e., the original feature dimension of the data set, and n is the number of samples in the sample set;
step 3, constructing a feature vector with the data flows as features and, as each feature value, the frequency with which the sample App invokes the corresponding data flow; if a sample App does not invoke a given data flow, the corresponding feature value is set to 0;
step 4, reducing the dimensionality of the high-dimensional data from step 3 with the EstSNE dimension reduction technique;
step 5, dividing the App samples into 13 subclasses related to user-sensitive information (such as account information, contact information, and database operations); specifically, if an App accesses contact information stored on the device through an application program interface, it is assigned to the "contact information class"; the division is overlapping, that is, one App may belong to several subclasses at the same time, because the same App may access location information, contact information, and other sensitive information;
step 6, clustering part of the normal Apps in each subclass with the neighbor propagation (affinity propagation, AP) algorithm, that is, dividing the Apps into different topics to mine the normal pattern of each topic, and calculating the reference points of each topic;
step 7, calculating the anomaly scores of the candidate sample set with the NPOD method, that is, taking the 13 groups of reference point sets calculated in step 6 as reference sets and calculating the anomaly score of each candidate App in each of the 13 subclasses; if an App is not assigned to a subclass, its anomaly score for that subclass is set to 0; an anomaly score vector is then constructed;
step 8, training a 1SVM (one-class support vector machine) classifier model on the pre-divided training set (all normal samples);
step 9, predicting and evaluating Android malware on the pre-divided test set (containing normal samples and malicious samples) with the 1SVM classifier trained in step 8.
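Steps 2 and 3 can be illustrated with a minimal sketch. It assumes the data flows of each App have already been extracted (for example by FlowDroid) and are available as plain strings; the App names, the flow strings, and the function `build_frequency_matrix` are hypothetical and not part of the patent. The sketch stores one row per App, which is the transpose of the $m \times n$ layout used in the text.

```python
from collections import Counter

def build_frequency_matrix(app_flows):
    """Return (flow_names, matrix), where matrix[i][j] is how often
    data flow j occurs in App i; flows an App never uses get 0."""
    # Fix a stable column order over every flow seen in any App.
    flow_names = sorted({flow for flows in app_flows.values() for flow in flows})
    matrix = []
    for app in sorted(app_flows):
        counts = Counter(app_flows[app])
        matrix.append([counts.get(name, 0) for name in flow_names])
    return flow_names, matrix

# Hypothetical "source->sink" data flows per App.
apps = {
    "app_a": ["user_info->log", "user_info->log", "location->network"],
    "app_b": ["contacts->sms"],
}
flow_names, X = build_frequency_matrix(apps)
for name, row in zip(sorted(apps), X):
    print(name, dict(zip(flow_names, row)))
```

Keeping the column order fixed across Apps matters here, because every later step compares Apps feature by feature.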
Further, the detailed process of reducing the dimensionality of the high-dimensional data in step 4 is as follows:
let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ denote the high-dimensional data set; the EstSNE dimension reduction method constructs a probability distribution P over pairs of high-dimensional objects and a probability distribution Q over the corresponding points in the low-dimensional space, and then obtains the optimal low-dimensional representation of the points by minimizing the KL divergence between the two distributions, namely:
Figure BDA0002075456920000031
$p_{ij}$ represents the similarity of samples $x_i$ and $x_j$ in the high-dimensional space X, calculated according to the formula:
Figure BDA0002075456920000032
Figure BDA0002075456920000033
$\delta_i$ represents the variance of the Gaussian distribution; $p_{i|j}$ is calculated in the same way as $p_{j|i}$;
$q_{ij}$ represents the similarity of samples $y_i$ and $y_j$ in the low-dimensional space (i.e., the space obtained by reducing X), $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$, where d is the dimension of the data after dimensionality reduction, calculated as $q_{ij} = \left(\left(1 + \lVert y_i - y_j \rVert^2\right) K\right)^{-1}$,
Figure BDA0002075456920000034
where p and q are summation indices.
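For orientation, the standard t-SNE formulation is reproduced below. It agrees with the definitions given in the text (a KL-divergence objective, Gaussian similarities in the high-dimensional space, and the stated form of $q_{ij}$), but it is an assumption that EstSNE uses exactly these expressions; the patent's own equations are the ones in the referenced figures. In standard t-SNE the Gaussian bandwidth is written as a standard deviation, so $\delta_i^2$ plays the role of the variance here.

```latex
\begin{aligned}
C &= \mathrm{KL}(P \,\|\, Q) = \sum_{i}\sum_{j \neq i} p_{ij}\,\log\frac{p_{ij}}{q_{ij}},\\[4pt]
p_{j|i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\delta_i^2\right)}
                {\sum_{k \neq i}\exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\delta_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},\\[4pt]
q_{ij} &= \left(\left(1 + \lVert y_i - y_j\rVert^2\right) K\right)^{-1},
\qquad
K = \sum_{p}\sum_{q \neq p}\left(1 + \lVert y_p - y_q\rVert^2\right)^{-1}.
\end{aligned}
```

With this choice of K the values $q_{ij}$ sum to 1 over all pairs, which is what makes Q a probability distribution comparable to P.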
Further, in step 6, the reference point calculation method is as follows:
(6.1) using the negative squared Euclidean distance $s(i, j) = -\lVert x_i - x_j \rVert^2$, calculating the similarity matrix s between every pair of the N samples in the normal sample set, and setting the reference degree (preference) p to the median of s;
(6.2) initializing the attribution degree (availability) matrix $A_{N \times N}$ and the attraction degree (responsibility) matrix $R_{N \times N}$ to 0;
(6.3) updating the attraction degree matrix by the rule
Figure BDA0002075456920000041
and updating the attribution degree matrix by the rule
Figure BDA0002075456920000042
where the attraction degree r(i, j) represents how well suited data point j is to serve as the class representative (exemplar) of data point i, and the attribution degree a(i, j) represents how appropriate it is for data point i to select data point j as its class representative;
if the number of iterations exceeds the set maximum, or the cluster centers remain unchanged over several iterations, the calculation stops and the class centers and the sample points of each class are determined; otherwise the attraction degree r(i, j) and the attribution degree a(i, j) continue to be updated iteratively;
(6.4) setting each cluster center as a reference point
Figure BDA0002075456920000043
Wherein k is the automatically determined number of clusters and h is the total number of cluster centers.
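The procedure in (6.1) to (6.4) can be sketched with the standard affinity propagation update rules. This is an illustration under stated assumptions, not the patent's implementation: the update rules are the textbook ones (the patent's own rules are in the referenced figures), and the damping factor, the iteration limits, the function name `affinity_propagation`, and the sample points are all hypothetical choices.

```python
from statistics import median

def affinity_propagation(points, max_iter=200, stable_iters=15, damping=0.5):
    """Return the exemplars (cluster centers) found by affinity propagation."""
    n = len(points)
    # (6.1) similarity s(i, j) = -||x_i - x_j||^2
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    # Preference on the diagonal: median of the pairwise similarities.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    # (6.2) both message matrices start at 0.
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    previous, unchanged, exemplars = None, 0, []
    for _ in range(max_iter):
        # (6.3) attraction degree: r(i,j) = s(i,j) - max_{k != j} (a(i,k) + s(i,k))
        for i in range(n):
            for j in range(n):
                rival = max(A[i][k] + S[i][k] for k in range(n) if k != j)
                R[i][j] = damping * R[i][j] + (1 - damping) * (S[i][j] - rival)
        # Attribution degree: a(i,j) = min(0, r(j,j) + sum_{k not in {i,j}} max(0, r(k,j)))
        for j in range(n):
            positive = [max(0.0, R[k][j]) for k in range(n)]
            support = sum(positive) - positive[j]  # sum over k != j
            for i in range(n):
                if i == j:
                    new = support
                else:
                    new = min(0.0, R[j][j] + support - positive[i])
                A[i][j] = damping * A[i][j] + (1 - damping) * new
        # A point is an exemplar when its self-evidence is positive.
        exemplars = [j for j in range(n) if R[j][j] + A[j][j] > 0]
        if exemplars and exemplars == previous:
            unchanged += 1
            if unchanged >= stable_iters:
                break  # centers unchanged over several iterations
        else:
            previous, unchanged = exemplars, 0
    # (6.4) each cluster center becomes a reference point.
    return [points[j] for j in exemplars]

# Hypothetical two-dimensional samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
reference_points = affinity_propagation(samples)
print(reference_points)
```

The number of clusters is never passed in; it follows from the preference value, which is the property the method relies on to avoid choosing initial values by hand.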
Further, the method for calculating the anomaly score by using the NPOD in the step 7 comprises the following steps:
(7.1) traversing the candidate sample set $X_c$ for which anomaly scores need to be calculated;
(7.2) by the formula
Figure BDA0002075456920000044
calculating the reference set $C_{ref}(x_c)$;
(7.3) by the formula $\mathrm{OutScr}(x_c) = (\mathrm{locDist}(x_c) + \mathrm{gloDist}(x_c))/2$, calculating the anomaly score $\mathrm{OutScr}_g(x_c)$ of the candidate sample $x_c$,
where $\mathrm{locDist}(x_c) = [lo/(l-2)] \times [o(x_c)/l]$, l being the number of elements in the reference set;
$\mathrm{gloDist}(x_c) = gl/(k-2)$, where k is the number of the calculated reference points
Figure BDA0002075456920000045
;
Figure BDA0002075456920000046
Figure BDA0002075456920000047
are elements in the reference set;
Figure BDA0002075456920000048
(7.4) traversing the 13 subclasses related to user-sensitive information to construct the anomaly score vector $\mathrm{OutScrVector}(x) \leftarrow \{\mathrm{OutScr}_1(x), \ldots, \mathrm{OutScr}_{catNum}(x)\}$.
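The assembly of the score vector in (7.4) can be sketched as follows. The scoring function used here, the distance to the nearest reference point, is a deliberately simple stand-in and is NOT the NPOD formula, whose terms are defined in the referenced figures. The subclass names, the reference points, and both function names are hypothetical.

```python
import math

def nearest_reference_distance(x, refs):
    """Stand-in score (not NPOD): distance from x to its closest reference point."""
    return min(math.dist(x, r) for r in refs)

def anomaly_score_vector(x, member_of, reference_sets, categories,
                         score_fn=nearest_reference_distance):
    """One score per subclass, in the order given by `categories`;
    the score is 0 for every subclass the App does not belong to."""
    return [score_fn(x, reference_sets[c]) if c in member_of else 0.0
            for c in categories]

# Hypothetical subclasses and reference points (the patent uses 13 subclasses).
categories = ["account", "contact", "database"]
reference_sets = {
    "account": [(0.0, 0.0)],
    "contact": [(3.0, 4.0), (6.0, 8.0)],
    "database": [(1.0, 1.0)],
}
vector = anomaly_score_vector((0.0, 0.0), {"contact"}, reference_sets, categories)
print(vector)
```

Passing the scoring function as a parameter keeps the vector construction independent of how each per-subclass score is computed, so an NPOD implementation could be substituted without changing the rest.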
Beneficial effects: based on the neighbor propagation clustering algorithm, the invention constructs a multi-step anomaly detection model for Android malware using the EstSNE dimension reduction method, the NPOD anomaly score calculation method, the 1SVM (one-class support vector machine) classification algorithm, and so on. Compared with the prior art, the invention has the following advantages:
1) Efficiency: extracting data flow features and calculating their frequencies characterizes malware comprehensively and at fine granularity; combining the dimension reduction techniques PCA and t-SNE, the AP clustering algorithm, and the 1SVM algorithm from machine learning yields a multi-step abnormal point detection technique that detects Android malware efficiently;
2) Extensibility: in any environment that supports the Android platform, the method can effectively detect newly emerging malware or malware variants;
3) Intelligence: the method does not rely on known malware patterns; instead it mines normal behavior patterns and detects abnormal points, and thereby identifies malware effectively. This overcomes the curse of dimensionality and the excessive dependence on initial value settings that affect traditional abnormal point detection techniques, and it addresses the low detection accuracy that results from having too few known samples when new malware or malware variants first appear.
Drawings
FIG. 1 is a general framework schematic of the present invention;
FIG. 2 is a schematic diagram of data flow features extracted in the present invention;
FIG. 3 is a schematic diagram of a reference point and an abnormal point according to the present invention;
FIG. 4 is a schematic illustration of the anomaly score vector calculated by the present invention;
FIG. 5 is a schematic diagram of a 1SVM classification model in the present invention.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in FIG. 1, the invention comprises the following three parts: (1) a hybrid dimension reduction technique, EstSNE, which combines the advantages of PCA and t-SNE; (2) a parameter-free anomaly score calculation method, NPOD, built on the AP clustering algorithm, whose scoring function considers both the local distance and the global distance between a candidate sample and the reference clusters; (3) a one-class SVM classifier trained to predict malware. The specific steps are as follows:
step 1, obtaining normal Android applications from Google Play, the official Android application store, and obtaining malicious Apps from a virus sample library (for example http://virusshare.com/), and constructing an App sample set;
step 2, extracting the data flows in the sample set with the FlowDroid tool, an example of a data flow being {user information → log}, thereby constructing a feature set of data flow frequencies $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{m \times n}$, where m is the number of data flows counted, i.e., the original feature dimension of the data set;
step 3, constructing a feature vector with the data flows as features and, as each feature value, the frequency with which the sample App invokes the corresponding data flow; if a sample App does not invoke a given data flow, the corresponding feature value is set to 0; FIG. 2 shows an example of the original features of this embodiment, the data flows (the invocation frequencies of these data flows are the input features of EstSNE);
step 4, reducing the dimensionality of the high-dimensional data from step 3 with the EstSNE dimension reduction technique;
step 5, dividing the App samples into 13 subclasses related to user-sensitive information (e.g., account information, contact information, and database operations); as shown in FIG. 4, if an App accesses contact information stored on the device through an application program interface, it is assigned to the "contact information class"; the division is overlapping, that is, one App may belong to several subclasses at the same time, because the same App may access location information, contact information, and other sensitive information;
step 6, as shown in FIG. 3, clustering part of the normal Apps in each subclass with the neighbor propagation (affinity propagation, AP) algorithm, that is, dividing the Apps into different topics to mine the normal pattern of each topic, and calculating the reference points of each topic;
step 7, calculating the anomaly scores of the candidate sample set with the NPOD method, that is, calculating the anomaly score of each candidate App in each of the 13 subclasses from the 13 groups of reference point sets calculated in step 6; if an App is not assigned to a subclass, its anomaly score for that subclass is set to 0; an anomaly score vector is then constructed;
step 8, training a 1SVM (one-class support vector machine) classifier model on the pre-divided training set (all normal samples);
step 9, predicting and evaluating Android malware on the pre-divided test set (containing normal samples and malicious samples) with the 1SVM classifier trained in step 8.
As shown in FIG. 5, to evaluate the effectiveness of the method in detecting Android malware, this embodiment introduces the following evaluation criteria: precision, accuracy, and F-measure, defined respectively as follows:
Figure BDA0002075456920000061
Figure BDA0002075456920000062
Figure BDA0002075456920000071
Figure BDA0002075456920000072
where TP (true positive) is a positive sample correctly classified by the classifier; TN (true negative) is a negative sample correctly classified by the classifier; FP (false positive) is a negative sample incorrectly labeled as positive; and FN (false negative) is a positive sample incorrectly labeled as negative.
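The evaluation criteria can be computed from the four counts as sketched below. The sketch uses the standard textbook definitions, which are assumed to match the formulas in the referenced figures; recall is included because the F-measure is defined from precision and recall. The counts in the usage line are made up for illustration and are not results from the patent.

```python
def evaluate(tp, tn, fp, fn):
    """Standard precision, recall, accuracy and F-measure from confusion counts."""
    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}

# Made-up counts, for illustration only.
metrics = evaluate(tp=90, tn=85, fp=15, fn=10)
print(metrics)
```

Which class counts as positive is a labeling convention the text does not fix; treating malicious samples as positive is the usual choice in malware detection.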
Under the same experimental environment (for example, with c = 256, g = 0.0658, and nu = 0.06 set in the 1SVM and a polynomial kernel function), the comparison of experimental results in Table 1 shows that the invention outperforms the conventional ORCA abnormal point detection method, which is based on the K-nearest neighbor (KNN) algorithm. The accuracy of the invention reaches 95.74%, whereas that of the ORCA method is 90.09%; that is, the invention improves accuracy by 5.65 percentage points under the same experimental environment.
TABLE 1 Experimental comparison of the method of the present invention and the ORCA anomaly detection method in the aspect of Android malware detection
Figure BDA0002075456920000073
In this embodiment, the effectiveness of the method is verified by ten-fold cross-validation on a real data set collected from VirusShare and Google Play, and the experimental results show that the method achieves an accuracy of 95.74%. Moreover, the method was compared with the traditional ORCA anomaly detection model under the same experimental conditions, and the comparison shows that the multi-step abnormal point detection method of the invention performs markedly better than the ORCA method.
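Ten-fold cross-validation can be sketched as an index split. This is a generic illustration: the function name `k_fold_indices`, the seed, and the sample count are hypothetical, and the text does not state how the folds are combined with the rule that the 1SVM is trained on normal samples only. One plausible reading is that malicious samples are dropped from each training fold before fitting.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for fold_no, (train, test) in enumerate(k_fold_indices(25), start=1):
    print(fold_no, len(train), len(test))
```

Each sample appears in exactly one test fold, so the reported accuracy is an average over predictions made on data the classifier did not see during training.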
In conclusion, the method addresses both the curse of dimensionality and the excessive dependence on initial parameter settings, and is applied to Android malware detection for the first time. The data flow invocation frequency of each application is extracted as the original feature, EstSNE reduces the dimensionality, the samples are then divided into subclasses, the NPOD method calculates the anomaly score of each sample in every subclass, and finally the 1SVM classifier is trained to predict malware.

Claims (4)

1. A multi-step abnormal point detection method based on a neighbor propagation clustering algorithm, characterized by comprising the following steps:
step 1, acquiring normal Android application programs from Google Play, the official Android application store, acquiring malicious application programs from a virus data sample library, constructing an application program sample set containing normal samples and malicious samples, and dividing the application program sample set into a training set and a test set;
step 2, extracting the data flows in the sample set with the FlowDroid tool, thereby constructing a high-dimensional data set of data flow frequencies $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{m \times n}$, where m is the number of data flows counted, i.e., the original feature dimension of the data set, and n is the number of samples in the sample set;
step 3, constructing a feature vector with the data flows as features and, as each feature value, the frequency with which the sample application program invokes the corresponding data flow; if a sample application program does not invoke a given data flow, the corresponding feature value is set to 0;
step 4, reducing the dimensionality of the high-dimensional data set constructed in step 2 with the EstSNE dimension reduction technique;
step 5, dividing the application program samples into 13 subclasses related to user-sensitive information, the subclasses being divided according to the SUSI standard;
step 6, clustering part of the normal application programs in each subclass with the neighbor propagation (affinity propagation, AP) algorithm, that is, dividing the application programs into different topics to mine the normal pattern of each topic, and calculating the reference points of each topic;
step 7, calculating the anomaly scores of the candidate sample set with the NPOD method, that is, calculating the anomaly score of each candidate application program in each of the 13 subclasses from the 13 groups of reference point sets calculated in step 6; if an application program is not assigned to a subclass, its anomaly score for that subclass is set to 0; finally, an anomaly score vector is constructed;
step 8, training a one-class SVM classifier model on the pre-divided training set;
step 9, predicting, on the pre-divided test set and with the one-class SVM classifier trained in step 8, whether an Android application program is malware.
2. The multi-step abnormal point detection method based on the neighbor propagation clustering algorithm according to claim 1, wherein the detailed process of reducing the dimensionality of the high-dimensional data set in step 4 is as follows:
the EstSNE dimension reduction method constructs a probability distribution P over pairs of high-dimensional objects and a probability distribution Q over the corresponding points in the low-dimensional space, and then obtains the optimal low-dimensional representation of the points by minimizing the KL divergence between the two distributions, namely:
Figure FDA0003826606910000021
$p_{ij}$ represents the similarity of samples $x_i$ and $x_j$ in the high-dimensional space X,
Figure FDA0003826606910000022
Figure FDA0003826606910000023
$\delta_i$ represents the variance of the Gaussian distribution; $x_i$ and $x_j$ are samples in the high-dimensional space X;
$q_{ij}$ represents the similarity of samples $y_i$ and $y_j$ in the low-dimensional space $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$, where d is the dimension of the data after dimensionality reduction, $q_{ij} = \left(\left(1 + \lVert y_i - y_j \rVert^2\right) K\right)^{-1}$,
Figure FDA0003826606910000024
$y_i$ and $y_j$ are samples in the low-dimensional space.
3. The multi-step abnormal point detection method based on the neighbor propagation clustering algorithm according to claim 1, wherein: the reference point calculation method in the step 6 comprises the following specific steps:
(6.1) using the negative squared Euclidean distance $s(i, j) = -\lVert x_i - x_j \rVert^2$, calculating the similarity matrix s between every pair of the N samples in the normal sample set, and setting the reference degree (preference) p to the median of s;
(6.2) initializing the attribution degree (availability) matrix $A_{N \times N}$ and the attraction degree (responsibility) matrix $R_{N \times N}$ to 0;
(6.3) updating the attraction degree matrix by the rule
Figure FDA0003826606910000025
and updating the attribution degree matrix by the rule
Figure FDA0003826606910000026
where the attraction degree r(i, j) represents how well suited data point j is to serve as the class representative (exemplar) of data point i, and the attribution degree a(i, j) represents how appropriate it is for data point i to select data point j as its class representative;
if the number of iterations exceeds the set maximum, or the cluster centers remain unchanged over several iterations, the calculation stops and the class centers and the sample points of each class are determined; otherwise the attraction degree r(i, j) and the attribution degree a(i, j) continue to be updated iteratively;
(6.4) setting each cluster center as a reference point
Figure FDA0003826606910000027
Wherein k is the automatically determined number of clusters and h is the total number of cluster centers.
4. The multi-step abnormal point detection method based on the neighbor propagation clustering algorithm according to claim 3, wherein: the method for calculating the abnormal score by adopting the NPOD in the step 7 comprises the following steps:
(7.1) traversing the candidate sample set $X_c$ for which anomaly scores need to be calculated;
(7.2) by the formula
Figure FDA0003826606910000031
calculating the reference set $C_{ref}(x_c)$, where
Figure FDA0003826606910000032
represents a reference point in (6.4);
(7.3) by the formula $\mathrm{OutScr}(x_c) = (\mathrm{locDist}(x_c) + \mathrm{gloDist}(x_c))/2$, calculating the anomaly score $\mathrm{OutScr}_g(x_c)$ of the candidate sample $x_c$,
where $\mathrm{locDist}(x_c) = [lo/(l-2)] \times [o(x_c)/l]$, l being the number of elements in the reference set,
$\mathrm{gloDist}(x_c) = gl/(k-2)$, where k is the number of the reference points
Figure FDA0003826606910000033
calculated in (6.4),
Figure FDA0003826606910000034
Figure FDA0003826606910000035
are the elements in the reference set, and
Figure FDA0003826606910000036
(7.4) traversing the 13 subclasses related to user-sensitive information to construct the anomaly score vector $\mathrm{OutScrVector}(x) \leftarrow \{\mathrm{OutScr}_1(x), \ldots, \mathrm{OutScr}_{catNum}(x)\}$.
CN201910452071.8A 2019-05-28 2019-05-28 Multi-step abnormal point detection method based on neighbor propagation clustering algorithm Active CN110162975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452071.8A CN110162975B (en) 2019-05-28 2019-05-28 Multi-step abnormal point detection method based on neighbor propagation clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910452071.8A CN110162975B (en) 2019-05-28 2019-05-28 Multi-step abnormal point detection method based on neighbor propagation clustering algorithm

Publications (2)

Publication Number Publication Date
CN110162975A (en) 2019-08-23
CN110162975B (en) 2022-10-25

Family

ID=67629654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910452071.8A Active CN110162975B (en) 2019-05-28 2019-05-28 Multi-step abnormal point detection method based on neighbor propagation clustering algorithm

Country Status (1)

Country Link
CN (1) CN110162975B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991508A (en) * 2019-11-25 2020-04-10 珠海复旦创新研究院 Anomaly detector recommendation method, device and equipment
CN112839327B (en) * 2021-01-21 2022-08-16 河北工程大学 Personnel validity detection method and device based on WiFi signals
CN113288122B (en) * 2021-05-21 2023-12-19 河南理工大学 Wearable sitting posture monitoring device and sitting posture monitoring method
CN113569920B (en) * 2021-07-06 2024-05-31 上海顿飞信息科技有限公司 Second neighbor anomaly detection method based on automatic coding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599686B (en) * 2016-10-12 2019-06-21 四川大学 A kind of Malware clustering method based on TLSH character representation
CN106845240A (en) * 2017-03-10 2017-06-13 西京学院 A kind of Android malware static detection method based on random forest
CN106919841A (en) * 2017-03-10 2017-07-04 西京学院 A kind of efficient Android malware detection model DroidDet based on rotation forest

Also Published As

Publication number Publication date
CN110162975A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110162975B (en) Multi-step abnormal point detection method based on neighbor propagation clustering algorithm
US11188649B2 (en) System and method for classification of objects of a computer system
Biggio et al. Poisoning behavioral malware clustering
Frank et al. Mining permission request patterns from android and facebook applications
Zarni Aung Permission-based android malware detection
Jerome et al. Using opcode-sequences to detect malicious Android applications
US20070136455A1 (en) Application behavioral classification
CN105229661B (en) Method, computing device and the storage medium for determining Malware are marked based on signal
US11580222B2 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
JP2020115320A (en) System and method for detecting malicious file
Shezan et al. Read between the lines: An empirical measurement of sensitive applications of voice personal assistant systems
WO2019061664A1 (en) Electronic device, user's internet surfing data-based product recommendation method, and storage medium
Imran et al. Using hidden markov model for dynamic malware analysis: First impressions
CN107402957B (en) Method and system for constructing user behavior pattern library and detecting user behavior abnormity
Sanz et al. Anomaly detection using string analysis for android malware detection
Chen et al. More semantics more robust: Improving android malware classifiers
Allix et al. Machine learning-based malware detection for Android applications: History matters!
Wolfe et al. Comprehensive behavior profiling for proactive Android malware detection
Li et al. Novel Android Malware Detection Method Based on Multi-dimensional Hybrid Features Extraction and Analysis.
CN106998336B (en) Method and device for detecting user in channel
CN105631336A (en) System and method for detecting malicious files on mobile device, and computer program product
Morcos et al. A surrogate-based technique for Android malware detectors' explainability
Ndagi et al. Machine learning classification algorithms for adware in android devices: a comparative evaluation and analysis
Lajevardi et al. Markhor: malware detection using fuzzy similarity of system call dependency sequences
Mirzaei et al. Scrutinizer: Detecting code reuse in malware via decompilation and machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant