CN108154178A - Semi-supervised support attack detection method based on improved SVM-KNN algorithms - Google Patents
Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm
- Publication number: CN108154178A (application CN201711416340.2A)
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06F18/2411 - Pattern recognition; classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
- H04L63/1416 - Network security; event detection, e.g. attack signature detection
Abstract
The invention discloses a semi-supervised shilling attack ("support attack") detection method based on an improved SVM-KNN algorithm, comprising: collecting the labelled users as a training set and training an initial SVM classifier; performing a preliminary classification of the unlabelled users with the initial SVM classifier; merging the data of users classified as normal into the training set, and performing a secondary classification of the remaining user data with an improved KNN similarity formula as the distance measure of the KNN algorithm; updating the training set and retraining a new SVM classifier; judging whether the classification results reach the optimal detection performance, outputting the final classifier if so, and otherwise looping back to classify the users in the unlabelled set; and performing shilling attack detection on user data with the final classifier. The technical scheme of the invention improves the generalisation ability and detection accuracy of shilling attack detection; with little labelled information and in a continually changing environment its performance is superior to that of previous attack detection algorithms.
Description
Technical field
The present invention relates to the technical field of network security, and more particularly to a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm.
Background art
In a practical network environment the identity of a large number of users cannot be determined, and the shilling attacks faced become increasingly complex. On a shopping site such as Taobao, for example, some users can be confirmed as genuine by conditions such as activity level, positive-rating ratio or "crown" membership level, but most users merely carry out a few shopping operations and may never leave a review, so for this portion of users it cannot be determined whether they are genuine. Attackers, with their understanding of the site, can also construct ever more complex attack models. Existing attack detection algorithms, however, perform unsatisfactorily when faced with such novel, increasingly complex shilling attacks and with only a small number of users of confirmed identity.
Summary of the invention
In view of at least one of the above problems, the present invention provides a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm. First a labelled user data set and an unlabelled user data set are established; an initial SVM classifier is then trained on the small amount of labelled user data; the distance of each unlabelled user from the boundary of the initial SVM classifier is calculated, and the user is classified by the SVM if the distance exceeds a set threshold and by KNN otherwise; the newly labelled data are added to the training set and the SVM classifier is retrained; and this process is iterated until an SVM classifier of high classification precision is finally obtained. The method exploits the accuracy of the labelled data, as semi-supervised detectors do, makes reasonable use of the distribution of the unlabelled data, and combines the SVM and KNN algorithms, thereby improving generalisation ability and detection accuracy; with little labelled information and in a continually changing environment its performance is superior to that of previous attack detection algorithms.
To achieve the above object, the present invention provides a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm, comprising: dividing the user set into a labelled user set and an unlabelled user set, and training an initial SVM classifier with the labelled user set as the training set; performing a preliminary classification of any sample user in the unlabelled user set with the initial SVM classifier; merging the user data labelled in the preliminary classification into the training set, and merging the remaining user data into a near-boundary vector set; and performing a secondary classification of the users in the near-boundary vector set, with the improved KNN similarity formula (1) as the distance measure of the KNN algorithm:

Sim(x_i, d_l) = a·Sim_q(x_i, d_l) + b·Sim_q(x_i, C̄_j) + c·Sim_H(x_i, d_l)   (1)

where a + b + c = 1, Sim_q is the weighted cosine similarity, C̄_j is the centre vector of class c_j and Sim_H is the proportion of feature items the two samples share.

The labelled user data obtained by the KNN classification are merged into the training set, and a new SVM classifier is retrained with the updated training set; whether the classification results reach the optimal detection performance is judged, and if so the final classifier is output, otherwise the method loops back to classify the users in the unlabelled user set; shilling attack detection is then performed on user data with the final classifier.
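The steps above can be sketched end-to-end as the following self-training loop. This is a minimal sketch, not the patented implementation: a trivial centroid-based linear score stands in for the SVM decision function f(x), plain Euclidean KNN stands in for the improved similarity, and all function names (`fit_svm`, `knn_label`, `semi_supervised_svm_knn`) are illustrative, not from the patent.

```python
# Sketch of the semi-supervised SVM-KNN loop: label confident samples by the
# (stand-in) SVM score, defer near-boundary samples to KNN, grow the training
# set, retrain. Labels: +1 = normal user, -1 = attack user.

def fit_svm(X, y):
    """Toy stand-in for SVM training: returns the two class centroids."""
    pos = [x for x, l in zip(X, y) if l == 1]
    neg = [x for x, l in zip(X, y) if l == -1]
    mean = lambda vs: [sum(c) / len(vs) for c in zip(*vs)]
    return mean(pos), mean(neg)

def f(model, x):
    """Stand-in decision value: positive when x is closer to the +1 centroid."""
    cp, cn = model
    d = lambda c: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
    return d(cn) - d(cp)

def knn_label(x, X, y, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, u)) ** 0.5, l)
                   for u, l in zip(X, y))
    votes = [l for _, l in dists[:k]]
    return 1 if votes.count(1) >= votes.count(-1) else -1

def semi_supervised_svm_knn(X_lab, y_lab, X_unlab, eps=0.5, max_iter=5):
    X, y = list(X_lab), list(y_lab)
    pending = list(X_unlab)
    for _ in range(max_iter):
        if not pending:
            break
        model = fit_svm(X, y)
        near_boundary = []
        for x in pending:
            v = f(model, x)
            if abs(v) > eps:            # confident: label by the SVM sign
                X.append(x); y.append(1 if v > 0 else -1)
            else:                       # near the boundary: defer to KNN
                near_boundary.append(x)
        for x in near_boundary:         # secondary classification by KNN
            X.append(x); y.append(knn_label(x, X, y))
        pending = []
    return fit_svm(X, y), X, y          # final (stand-in) classifier
```

A point far from the boundary is absorbed directly with the SVM label, while an ambiguous point is labelled by its neighbours only after the confident points have already been added to the training set, matching the order of the steps above.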
In the above technical solution, preferably, performing the preliminary classification of any sample user in the unlabelled user set with the initial SVM classifier specifically comprises: selecting any sample user from the unlabelled user set and calculating the value of the classification decision function f(x) with the SVM formula (2):

f(x) = Σ_{i=1..n} α_i·y_i·K(x_i, x) + b   (2)

where the x_i are the training samples, y_i their labels, α_i the Lagrange multipliers, K the kernel function and b the bias; judging whether the absolute value |f(x)| of the classification decision function exceeds a given classification threshold ε (0 < ε < 1); and, if so, labelling the sample as a normal user.
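As a concrete illustration of the decision function and the threshold test |f(x)| > ε, the snippet below evaluates f(x) = Σ α_i·y_i·K(x_i, x) + b with an RBF kernel. The support vectors, multipliers α_i, bias b and threshold value are made-up numbers for the sketch, not values from the patent.

```python
import math

def rbf(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, svs, alphas, ys, b, kernel=rbf):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x) for sv, a, y in zip(svs, alphas, ys)) + b

svs    = [(0.0, 0.0), (2.0, 2.0)]   # hypothetical support vectors
alphas = [1.0, 1.0]                 # hypothetical multipliers
ys     = [-1, +1]
b      = 0.0
eps    = 0.3                        # classification threshold, 0 < eps < 1

x = (1.9, 2.1)
fx = decision(x, svs, alphas, ys, b)
confident = abs(fx) > eps           # only then does the SVM label directly
```

A point near the positive support vector gets a decision value close to +1 and is labelled by the SVM; a point midway between the two support vectors gets |f(x)| below ε and would instead be handed to the KNN stage.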
In the above technical solution, preferably, performing the secondary classification of the users in the near-boundary vector set, with the improved KNN similarity formula (1) as the distance measure of the KNN algorithm, specifically comprises: vectorising the user data in the training set consistently with the sample user data to be classified in the near-boundary vector set; calculating with the distance measure the distance between the sample to be classified and each sample in the training set, and selecting the k nearest samples as the nearest neighbours of the sample to be classified; calculating in turn the weight with which each sample among the nearest neighbours belongs to the different classifications; and comparing the weights with which the sample belongs to the different classifications, and assigning the sample to the classification of largest weight.
In the above technical solution, preferably, the weight vector with which a sample belongs to each classification c_j is q_j = (q_j1, q_j2, ..., q_jp), where q_j1 + q_j2 + ... + q_jp = 1; the weight value q_jk is the weight corresponding to feature item t_k, and the size of the weight value expresses the importance of t_k in the different classifications.
In the above technical solution, preferably, judging whether the classification results reach the optimal detection performance, outputting the final classifier if so and otherwise looping back to classify the users in the unlabelled user set, specifically comprises: calculating the precision and recall of the classification process from the classification results, where the classification results comprise four kinds of data: true positives and true negatives, respectively the users correctly judged to be attack users and genuine users, and false positives and false negatives, respectively the users wrongly judged to be attack users and genuine users; the precision is calculated as precision = true positives / (true positives + false positives), and the recall is calculated as recall = true positives / (true positives + false negatives); judging whether the precision and the recall reach a preset optimal threshold; and, if they do, outputting the SVM-KNN classifier that produced the classification results as the final classifier, otherwise looping back to classify the users in the unlabelled user set.
Compared with the prior art, the beneficial effects of the present invention are as follows. With the semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm provided by the invention, a labelled user data set and an unlabelled user data set are first established; an initial SVM classifier is then trained on the small amount of labelled user data; the distance of each unlabelled user from the boundary of the initial SVM classifier is calculated, and the user is classified by the SVM if the distance exceeds a set threshold and by KNN otherwise; the newly labelled data are added to the training set and the SVM classifier is retrained; and this process is iterated until an SVM classifier of high classification precision is finally obtained. The method exploits the accuracy of the labelled data, as semi-supervised detectors do, makes reasonable use of the distribution of the unlabelled data, and combines the SVM and KNN algorithms, thereby improving generalisation ability and detection accuracy; with little labelled information and in a continually changing environment its performance is superior to that of previous attack detection algorithms.
Description of the drawings
Fig. 1 is a flow diagram of the semi-supervised shilling attack detection method based on the improved SVM-KNN algorithm disclosed in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the classification principle of the SVM-KNN classifier disclosed in an embodiment of the present invention;
Fig. 3 to Fig. 8 are data plots of the attack detection experiments disclosed in an embodiment of the present invention.
Specific embodiment
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical scheme of the embodiments of the present invention is described clearly and completely below in conjunction with the accompanying drawings. The described embodiments are evidently only a part, not the whole, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative work fall within the protection scope of the present invention.
The present invention is described in further detail below in conjunction with the accompanying drawings:
As shown in Fig. 1 and Fig. 2, the semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm provided by the invention comprises the following steps. Step S11: divide the user set into a labelled user set and an unlabelled user set, and train an initial SVM classifier with the labelled user set as the training set. Step S12: perform a preliminary classification of any sample user in the unlabelled user set with the initial SVM classifier. Step S13: merge the user data labelled in the preliminary classification into the training set, and merge the remaining user data into the near-boundary vector set. Step S14: perform a secondary classification of the users in the near-boundary vector set, with the improved KNN similarity formula (1) as the distance measure of the KNN algorithm:

Sim(x_i, d_l) = a·Sim_q(x_i, d_l) + b·Sim_q(x_i, C̄_j) + c·Sim_H(x_i, d_l)   (1)

where Sim_q(x_i, d_l) is the cosine similarity after the weight values are introduced, Sim_q(x_i, C̄_j) is the corresponding similarity to the class centre, Sim_H(x_i, d_l) is the ratio of the number of shared feature items to the total number of feature items, and the parameters satisfy a + b + c = 1; x_i and d_l are sample feature vectors, q_jk is the weight of feature item t_k, w_ik and w_jk (1 ≤ k ≤ p) are the weights of the k-th feature item of samples d_i and d_j respectively, x_ik and w_jk are the coordinates in the k-th dimension, p is the number of feature items, and C̄_j is the weight vector of the class centre.

Step S15: merge the labelled user data obtained by the KNN classification into the training set, and retrain a new SVM classifier with the updated training set. Step S16: judge whether the classification results reach the optimal detection performance; if so, output the final classifier, otherwise loop back to classify the users in the unlabelled user set. Step S17: perform shilling attack detection on user data with the final classifier.
In the above embodiment, preferably, performing the preliminary classification of any sample user in the unlabelled user set with the initial SVM classifier specifically comprises: selecting any sample user from the unlabelled user set and calculating the value of the classification decision function f(x) with the SVM formula (2):

f(x) = Σ_{i=1..n} α_i·y_i·K(x_i, x) + b   (2)

judging whether the absolute value |f(x)| of the classification decision function exceeds the given classification threshold ε (0 < ε < 1); and, if so, labelling the sample as a normal user.
In the above embodiment, preferably, performing the secondary classification of the users in the near-boundary vector set, with the improved KNN similarity formula (1) as the distance measure of the KNN algorithm, specifically comprises: vectorising the user data in the training set consistently with the sample user data to be classified in the near-boundary vector set; calculating with the distance measure the distance between the sample to be classified and each sample in the training set, and selecting the k nearest samples as the nearest neighbours of the sample to be classified; calculating in turn the weight with which each sample among the nearest neighbours belongs to the different classifications; and comparing the weights with which the sample belongs to the different classifications, and assigning the sample to the classification of largest weight.
In the above embodiment, preferably, the weight vector with which a sample belongs to each classification c_j is q_j = (q_j1, q_j2, ..., q_jp), where q_j1 + q_j2 + ... + q_jp = 1; the weight value q_jk is the weight corresponding to feature item t_k, and the size of the weight value expresses the importance of t_k in the different classifications.
In the above embodiment, preferably, judging whether the classification results reach the optimal detection performance, outputting the final classifier if so and otherwise looping back to classify the users in the unlabelled user set, specifically comprises: calculating the precision and recall of the classification process from the classification results, where the classification results comprise four kinds of data: true positives and true negatives, respectively the users correctly judged to be attack users and genuine users, and false positives and false negatives, respectively the users wrongly judged to be attack users and genuine users; precision = true positives / (true positives + false positives), and recall = true positives / (true positives + false negatives); judging whether the precision and recall reach the preset optimal threshold; and, if they do, outputting the SVM-KNN classifier that produced the classification results as the final classifier, otherwise looping back to classify the users in the unlabelled user set.
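The precision and recall judgment above reduces to a few counts over the four kinds of classification results; a minimal sketch follows (the function name and the 1/0 label convention, with 1 denoting an attack user, are ours):

```python
# Precision = TP / (TP + FP); recall = TP / (TP + FN), counted over paired
# ground-truth and predicted labels.

def precision_recall(y_true, y_pred, attack=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == attack and p == attack)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != attack and p == attack)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == attack and p != attack)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Both values would then be compared against the preset optimal threshold to decide whether to stop iterating or to loop back over the unlabelled set.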
In this embodiment, the shilling attack problem is addressed with an attack detection method based on semi-supervised machine learning. Common machine-learning-based attack detection methods fall into supervised learning, unsupervised learning and semi-supervised learning.

Specifically, supervised learning is the process of adjusting the parameters of a classifier with a set of samples of known class until the required performance is reached. Shilling attacks are built from different construction models, and there are necessarily differences between the features of shilling attackers and genuine users; these feature differences are what the detection exploits. Research to date has proposed many feature indicators, which can be divided into generic indicators, model-based indicators and profile-based indicators. Generic indicators characterise the difference between shilling attackers and normal users through the difference in the rating distributions of the target item and the filler items, for example the Rating Deviation from Mean Agreement (RDMA) and the nearest-neighbour similarity (DegSim). Model-based indicators are built according to the characteristic rating patterns of the different shilling attack models and the different attack profiles, and are used to distinguish shilling attackers from normal users, for example the variation of a user's mean rating (MeanVar) and the mean deviation between the highest-rated item and the remaining item set (FMTD). Profile-based indicators discriminate through the statistical differences between the rating features within a profile, for example the target-item focus indicator (TMF).
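As an illustration of the generic indicators just named, the RDMA indicator can be sketched with its usual definition from the literature (the patent does not reproduce the formula, so this is an assumption): RDMA_u = (1/N_u)·Σ_i |r_ui − r̄_i| / NR_i, where the sum runs over the N_u items rated by user u, r̄_i is the mean rating of item i, and NR_i is the number of ratings item i has received.

```python
# RDMA: mean per-item rating deviation, with each deviation damped by the
# item's popularity. A large RDMA flags a user who disagrees strongly with
# consensus on rarely rated items, as shilling profiles tend to do.

def rdma(user_ratings, all_ratings):
    """user_ratings: {item: rating}; all_ratings: {item: [all ratings]}."""
    total = 0.0
    for item, r in user_ratings.items():
        rs = all_ratings[item]
        mean = sum(rs) / len(rs)
        total += abs(r - mean) / len(rs)
    return total / len(user_ratings)
```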
Unsupervised learning solves pattern-recognition problems from training samples whose classes are unknown (unlabelled). Unsupervised detection algorithms generally rely on clustering to separate the different users, and can be divided, according to the model used, into dimensionality-reduction-based algorithms and model-based algorithms. Dimensionality-reduction algorithms follow the idea of finding the most important variables expressing the rating matrix, and then detect shilling attacks from the difference between normal users and attackers. Existing research includes an unsupervised user-screening algorithm based on principal component analysis (PCA Select Users): using the principle of PCA dimensionality reduction, it extracts mutually independent features, and users with low correlation between features are likely to be shilling attackers. There is also the unsupervised attack-profile search algorithm UnRAP, which judges whether a user is a shilling attacker from how closely the user matches the rating matrix. Model-based algorithms use the latent representation of different models to reconstruct the rating matrix, and detect shilling attacks through the differences under the model. Existing research includes a shilling attack detection algorithm based on probabilistic latent semantic analysis (PLSA), which calculates the average distance of each user and weighs it with a "density" indicator; the "density" of shilling attackers is considered relatively large, so the attacks can be detected. In addition, an SVD detection algorithm based on singular value decomposition has been proposed: since the ratings of shilling attackers follow certain rules, there is a large difference between the low-dimensional model of the attackers and that of normal users after singular value decomposition.
Semi-supervised learning performs pattern recognition using a large amount of unlabelled data together with labelled data. A semi-supervised detection algorithm knows the profiles of only a small fraction of the shilling attack samples and constructs the detector from the profile features of this small fraction; an example is the Semi-SAD algorithm, which first trains a preliminary classifier on the labelled data with a naive Bayes classifier and then improves the classifier on the unlabelled data, raising its detection performance. Semi-supervised learning, in other words, uses labelled and unlabelled sample data together to generate a suitable classification model. Its idea is to train a base classifier with the small amount of labelled samples, use the base classifier to label the unlabelled sample data, then use the samples labelled by the base classifier as a new training sample set to retrain the classifier, finally obtaining a classifier trained from a small set of samples of known label and a large set of samples of unknown label.
The present invention combines the SVM algorithm with an improved KNN algorithm and performs attack detection by way of semi-supervised learning.
Specifically, the nearest-neighbour method (NN) is one of the most important non-parametric methods in pattern recognition; the original nearest-neighbour method was proposed by Cover and Hart in 1967. So-called k-nearest-neighbour classification examines the K samples most similar to the sample to be classified, and judges the class attribute of the sample to be classified from the classes of these K samples. The basic principle of the NN classifier is: for a sample vector x to be classified, with all training samples as representative points, find the K most similar samples among the representative points, take these K samples as the candidate classes, use the similarity between x and the K samples as the measurement weight and, with a similarity threshold set at the same time, the class of x can be determined.
The degree of correlation between two similar samples is called their similarity. Once samples are represented as vectors, the distance between samples can be used to measure the degree of similarity of two texts. There are many methods for computing the distance between two samples, for example the Euclidean distance, the cosine distance, the city-block distance, the correlation distance and the Hamming distance.

In the vector space model, after feature extraction each sample can be converted into a space vector of relatively low dimension composed of a group of terms or features (t_1, t_2, t_3, ..., t_p); each feature t_i has a corresponding weight value w_i (representing the importance of the feature t_i in the sample). The features t_1, t_2, t_3, ..., t_p newly chosen after feature selection can be regarded as the axes of a p-dimensional coordinate system, and w_1, w_2, w_3, ..., w_p are the corresponding feature values on each axis. With this coordinate representation of the sample vectors, the similarity between samples can be measured. Let the sample vectors be d_i = (w_i1, w_i2, w_i3, ..., w_ip) and d_j = (w_j1, w_j2, w_j3, ..., w_jp); the definitions of several distances are given below.
(1) Euclidean distance

For a p-dimensional feature space the distance is defined as:

D(d_i, d_j) = √( Σ_{k=1..p} (w_ik − w_jk)² )   (3)

where d_i and d_j are the feature vectors of the samples, p is the dimension of the feature vector space, w_ik is the coordinate of the k-th dimension of sample d_i, and w_jk is the coordinate of the k-th dimension of sample d_j. The smaller the distance between two samples, the higher their degree of similarity and the more likely they belong to the same class; conversely, the more dissimilar they are, the more likely they belong to different classes.
(2) Cosine distance

The cosine distance is usually calculated from the inner product of the feature vectors or from the cosine of their included angle θ; the smaller the angle between the vectors, i.e. the larger the cosine value, the higher the similarity. The formula for calculating the similarity of two feature vectors with the cosine distance is as follows:

Sim(d_i, d_j) = cos θ = ( Σ_{k=1..p} w_ik·w_jk ) / ( √(Σ_{k=1..p} w_ik²) · √(Σ_{k=1..p} w_jk²) )   (4)

or

Sim(d_i, d_j) = d_i·d_j / (‖d_i‖·‖d_j‖)   (5)

where w_ik and w_jk (1 ≤ k ≤ p) are the weights of the k-th feature item of samples d_i and d_j respectively, and p is the number of feature items, that is, the dimension of the feature vector space. Formula (5) in fact first normalises the feature vectors to unit length and then takes the inner product; the purpose of the normalisation is to make texts of different lengths comparable.
(3) City-block distance

The city-block distance between the sample feature vectors d_i and d_j is defined as follows:

D(d_i, d_j) = |w_i1 − w_j1| + |w_i2 − w_j2| + ... + |w_ip − w_jp|   (6)

where d_i = (w_i1, w_i2, ..., w_ip) and d_j = (w_j1, w_j2, ..., w_jp) represent the feature vectors of samples W_i and W_j, and D(d_i, d_j) represents the distance between the sample points d_i and d_j in the sample set.
(4) Correlation distance

Let the i-th row of the sample matrix, d_i = (w_i1, w_i2, ..., w_ip), be the feature vector corresponding to a sample, with c_i (1 ≤ i ≤ c) the sample classes. The correlation distance of any two sample vectors is defined, in the usual Pearson form, as:

D(d_i, d_j) = 1 − ( Σ_{k=1..p} (w_ik − w̄_i)(w_jk − w̄_j) ) / ( √(Σ_{k=1..p} (w_ik − w̄_i)²) · √(Σ_{k=1..p} (w_jk − w̄_j)²) )   (7)

where w̄_i and w̄_j are the means of the components of d_i and d_j respectively.
(5) Hamming distance

In binary coding theory, the Hamming weight represents the number of "1" symbols in a code word; it is also called the code weight and is abbreviated W. For example, for the code word "110010" the code length is p = 6 and the code weight is W = 3.

The Hamming distance between two code words x = (x_1 x_2 ... x_k ... x_p) and y = (y_1 y_2 ... y_k ... y_p) of code length p is defined as:

D(x, y) = Σ_{k=1..p} (x_k ⊕ y_k)   (8)

where ⊕ represents modulo-2 addition, x_k ∈ {0, 1} and y_k ∈ {0, 1}. In D(x, y), x and y are code words, and D(x, y) is the number of positions at which the symbols of the two code words differ; its size embodies the degree of difference between the two code words, and the larger the value of formula (8), the greater the difference between the two code words.
After feature selection, a text represented with Boolean weights can be arranged as a code word of code length p; for example sample W_1 can be expressed as d_1 = (10011100101010.....101), where 0 and 1 correspond to the states of the sample: a component position carrying no sample information is recorded as 0, and a component position carrying sample information is recorded as 1. In this way the sample set is put into one-to-one correspondence with a code word set, so the problem of studying text similarity is in fact the problem of finding the Hamming distance between two code words. If the code words corresponding to texts W_1 and W_2 are d_1 and d_2 respectively, the Hamming distance between the two samples can be represented with formula (8). In the vector space model the value of D(d_1, d_2) lies between 0 and p: when two p-bit sample code words are completely identical their Hamming distance is 0, and when the p-bit code words are entirely different their Hamming distance is p, so D(x, y) quantitatively describes the degree of difference between different texts.
When carrying out sample classification, the sample set is first converted into a code word set; for the code word d_1 = (x_1 x_2 ... x_k ... x_p) of sample W_1 and the code word d_2 = (y_1 y_2 ... y_k ... y_p) of sample W_2, a similarity can be defined with the following formula:

Sim(d_1, d_2) = 1 − D(d_1, d_2)/p   (9)

where x_k and y_k represent the values on the k-th component of d_1 (corresponding to W_1) and d_2 (corresponding to W_2) respectively, taking the value 0 or 1. The degree of similarity is described with formula (9): when two samples are substantially similar, i.e. their code words are essentially identical, the similarity Sim(d_1, d_2) is closer to 1; conversely, when the code words are entirely different, Sim(d_1, d_2) is closer to 0.
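The distance definitions above can be written compactly as follows; vectors are plain Python sequences of equal length p, and the Hamming similarity is implemented as 1 − D/p, the reading consistent with the limiting behaviour just described (1 for identical code words, 0 for entirely different ones).

```python
import math

def euclidean(di, dj):                      # Euclidean distance, formula (3)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(di, dj)))

def cosine_sim(di, dj):                     # cosine similarity, (4)/(5)
    dot = sum(a * b for a, b in zip(di, dj))
    ni = math.sqrt(sum(a * a for a in di))
    nj = math.sqrt(sum(b * b for b in dj))
    return dot / (ni * nj)

def city_block(di, dj):                     # city-block distance, formula (6)
    return sum(abs(a - b) for a, b in zip(di, dj))

def hamming_dist(x, y):                     # formula (8), x_k, y_k in {0, 1}
    return sum(a ^ b for a, b in zip(x, y))

def hamming_sim(x, y):                      # formula (9): 1 - D(x, y)/p
    return 1 - hamming_dist(x, y) / len(x)
```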
The basic steps of the KNN algorithm are as follows.

Step 1: vectorise the data in the training set.

Step 2: apply to the data to be classified the same vectorisation as used for the training set.

Step 3: according to a distance formula, for example the cosine formula

Sim(x, d_l) = ( Σ_{k=1..p} x_k·w_lk ) / ( √(Σ_{k=1..p} x_k²) · √(Σ_{k=1..p} w_lk²) )   (10)

calculate the distance between the sample to be classified and each sample in the training set, and select the k nearest samples as the k nearest neighbours of the sample to be classified.

Step 4: from the k selected neighbours, calculate in turn the weight of belonging to each class, for example as

W(x, c_j) = Σ_{d_l ∈ KNN(x)} Sim(x, d_l)·I(d_l, c_j)   (11)

where KNN(x) denotes the k nearest neighbours of x and the membership indicator I(d_l, c_j) is 1 if d_l belongs to class c_j and 0 otherwise; the specific methods are plentiful in the prior art and are not described in detail here.

Step 5: compare the weight sizes; the class with the largest weight is the class the sample belongs to.
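The five steps above can be condensed into a minimal sketch; similarity-weighted voting with the indicator I(d_l, c_j) is used for step 4, and the function name and defaults are ours, not from the patent.

```python
# Basic KNN: rank training samples by similarity, keep the top k, sum each
# neighbour's similarity into its class weight, return the heaviest class.

def knn_classify(x, train, k=3, sim=None):
    """train: list of (vector, label) pairs; returns the winning label."""
    if sim is None:                         # default: cosine similarity (10)
        sim = lambda u, v: sum(a * b for a, b in zip(u, v)) / (
            (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))
    neighbours = sorted(train, key=lambda dl: sim(x, dl[0]), reverse=True)[:k]
    weights = {}
    for vec, label in neighbours:           # step 4: per-class weight
        weights[label] = weights.get(label, 0.0) + sim(x, vec)
    return max(weights, key=weights.get)    # step 5: largest weight wins
```

Passing a different `sim` callable is how the improved similarity of formula (1) would be slotted in without changing the voting logic.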
By study and experiment, two problems with the similarity formula of the above KNN algorithm are readily found:

(1) The similarity formula does not involve whether, and to what degree, the sample under test x_i is similar to the different classes of the training set; it takes no account of the degree of aggregation of the samples of each class, nor of the distance of the sample under test from the class centre of each class c_j in the training set. The greater the aggregation of a class c_j and the nearer x_i is to its class centre, the more similar the sample under test is to that class.

(2) The formula does not take into account that training sets of different classes may influence the classification result differently because they contain different numbers of characteristic items: some classes lean on certain featured items while others lean on other characteristic items, and the characteristic items contained also differ between classes. That is, the number of features possessed in common by the sample to be classified and the samples of each class is not considered.
Denominator, which is equivalent to, divided by the mould of sample vector is long, has done normalizing can be seen that for above-mentioned COS distance formula (10)
Change is handled, and eliminates influence of the sample vector length to classifying quality.
Consider the first problem. For each class cj, denote the class-centre vector; it is obtained by summing all sample vectors of that class in the training set and dividing by the number n of training samples. The similarity of the sample to be classified xi to the class centre of each class cj can then be expressed as:
Next consider the second problem: the number of features co-owned by the sample to be classified xi and each sample point in each class cj of the training set. Denote by the Hamming similarity the ratio of the number of common feature items to the total number of feature items (the dimension p of the feature vector space), defined as:
where xi=(xi1,xi2,...,xip) is the sample to be classified, dj=(wj1,wj2,...,wjp) is any sample in the training set, and xik, wjk (1≤k≤p) are the coordinates in the k-th dimension, taking the value 0 or 1.
Considering the above two points, the improved cosine-distance similarity is defined as:
where a+b+c=1; the optimal allocation ratio can be sought through repeated experiments so that the classification effect of the improved similarity is optimal. The improved cosine-distance formula solves the above two problems, but it is clear from the formula that the feature items carry no weighted values, so all feature items have equal weight. A final step is therefore needed: introducing weighted values allows finer adjustment and improves the classification effect. According to the class attributes, we set the weight vector with which a sample belongs to each class cj as qj, qj=(qj1,qj2,...,qjp), where qj1+qj2+...+qjp=1; the weighted value qjk is the weight corresponding to feature item tk, and the size of qjk represents the importance level of tk in the different classes.
In summary, the cosine similarity formula after the weighted values are added to formula (10) is:
Likewise, the similarity formula of the sample to be classified to the centre of each class cj also introduces the weighted values on the basis of formula (12):
Combining formulas (15), (16) and (13), the finally improved similarity formula is obtained as:
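The improved similarity just described — a weighted combination (a+b+c=1) of the cosine to the sample, the cosine to the class centre, and the shared-feature ratio, with per-feature weights q — can be sketched as follows. Since the patent's formula images are not reproduced here, the exact placement of the weights q in each term is an assumption:

```python
import numpy as np

def improved_similarity(x, d, center, q, a=0.4, b=0.3, c=0.3):
    """Improved cosine similarity: a * cos(x, d) + b * cos(x, class centre)
    + c * (shared-feature ratio), with per-feature weights q (sum(q) == 1).
    How q enters each cosine term is an assumption, not taken from the patent."""
    def wcos(u, v):
        u, v = q * u, q * v                        # apply per-feature weights
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    shared = np.sum((x > 0) & (d > 0)) / len(x)    # Hamming-style overlap ratio
    return a * wcos(x, d) + b * wcos(x, center) + c * shared
```

A sample compared with itself should score higher than when compared with a different sample of the same class, which is the behaviour the combination is designed to produce.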
Next, the support vector machine (SVM) is a machine learning algorithm proposed by Vapnik et al. in 1995 on the basis of the structural risk minimization principle. Its distinguishing features are structural risk minimization and strong generalization ability on classification problems. The support vector machine algorithm can be sketched as follows: the samples in the input space are mapped by some nonlinear function into a feature space, in which the two classes of samples become linearly separable, and the optimal linear separating hyperplane of the samples in that feature space is then sought. The SVM algorithm covers both linear classification and nonlinear classification.
(1) linear classification
SVM defines the optimal hyperplane and converts the search for the optimal linear hyperplane into solving a quadratic programming problem. Based on Mercer's theorem, the sample space is mapped by a nonlinear mapping into a high-dimensional feature space, so that the nonlinearity problem in the original sample space can be solved there by linear methods.
The support vector machine was proposed for binary classification. Assume training samples (xi,yi), i=1,2,...,l, x ∈ Rd, yi ∈ {-1,1}, and a separating hyperplane w·x+b=0. For the separating plane to classify all samples correctly and with a class margin, it must satisfy:
yi[(w·xi)+b]-1≥0 (18)
The class margin is computed as:
2/||w|| (19)
Maximizing the margin 2/||w|| amounts to minimizing ||w||, so solving for the optimal hyperplane can be expressed as a constrained optimization problem: under the constraint of formula (18), minimize the function:
Φ(w)=||w||²/2 (20)
Introduce the Lagrange function:
L(w,b,α)=||w||²/2-Σi αi{yi[(w·xi)+b]-1} (21)
where αi>0 are the Lagrange coefficients. Taking the partial derivatives of formula (21) with respect to w and b and setting them equal to 0 gives
w=Σi αi yi xi (22)
Σi αi yi=0 (23)
so the above problem can be converted into a simpler dual problem. Substituting formulas (22) and (23) into (21) yields the dual optimization problem: maximize the function
Q(α)=Σi αi-(1/2)Σi Σj αi αj yi yj (xi·xj) (24)
subject to
Σi αi yi=0 (25)
where αi≥0, i=1,...,l.
This is a quadratic programming (QP) extreme-value problem under inequality constraints. According to the Karush-Kuhn-Tucker (KKT) conditions, the solution of this optimization problem must satisfy:
αi{yi[(w·xi)+b]-1}=0, i=1,...,l (27)
Therefore αi is 0 for most samples; the samples with αi≠0, for which the equality in formula (18) holds, are called support vectors. In the support vector machine algorithm, the support vectors are the key elements of the training set: they lie closest to the decision boundary. If all other training samples were removed and training were repeated, the same separating plane would be obtained.
After solving the above quadratic programming problem, the classification decision function can be expressed as:
f(x)=sgn(Σi αi* yi (xi·x)+b*) (28)
The summation in the formula runs only over the support vectors, i.e. only the training samples whose αi is nonzero determine the classification result, while the other samples are irrelevant to it. b* is the classification threshold. When the training sample set is linearly inseparable, nonnegative slack variables ξi, i=1,2,...,l are introduced, and the optimization problem of the generalized separating plane becomes:
min ||w||²/2+C Σi ξi (30)
Its dual problem is to maximize the following function over α, subject to
s.t. yi[(w·xi)+b]≥1-ξi (31)
where C>0 is a constant called the error penalty parameter; it controls the degree of penalty for misclassified samples. ξi are the nonnegative slack variables introduced when the training samples are linearly inseparable.
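As an executable illustration of the soft-margin objective ||w||²/2 + C·Σξi above, the following sketch trains a linear SVM by sub-gradient descent on the hinge-loss form of that objective, rather than by solving the dual QP described in the text (the optimizer choice is a simplification, and the learning-rate/epoch values are arbitrary):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Sub-gradient descent on min 1/2||w||^2 + C * sum(max(0, 1 - y(w.x + b))).
    An illustrative stand-in for the QP/dual solution, not the patent's solver."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (np.dot(w, xi) + b)
            if margin < 1:                     # hinge loss active: push towards margin
                w += lr * (C * yi * xi - w)
                b += lr * C * yi
            else:                              # only the regularizer pulls on w
                w += lr * (-w)
    return w, b
```

On linearly separable toy data the learned (w, b) separates the two classes with the correct signs.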
(2) Nonlinear Classification
For nonlinear classification problems, an appropriate inner-product kernel function K(xi,xj) can realize linear classification after some nonlinear transformation. The objective function to be optimized then becomes:
Q(α)=Σi αi-(1/2)Σi Σj αi αj yi yj K(xi,xj) (33)
and the corresponding classification decision function is expressed as:
f(x)=sgn(Σi αi* yi K(xi,x)+b*) (34)
The classification decision function above is the support vector machine. As described above, the original problem is converted into its dual problem, so that the computational complexity no longer depends on the dimensionality of the space but on the number of samples, especially the number of support vectors among the samples. This feature of the support vector machine enables it to handle high-dimensional problems effectively.
In SVM, the introduction of the kernel function K(xi,xj) converts the inner-product operation in the high-dimensional space into a kernel computation in the original space, so that nonlinear classification is realized without increasing the complexity of the algorithm. Different kernel functions construct different SVMs, so the choice of kernel function is crucial. Four kernel functions are in common use:
(1) linear inner product kernel function
K(xi,xj)=(xi·xj) (35)
(2) polynomial kernel
K(xi,xj)=[(xi·xj)+C]q, q > 0 (36)
(3) radial basis kernel
K(xi,xj)=exp(-||xi-xj||²/2σ²) (37)
(4) two-layer neural network kernel
K(xi,xj)=tanh (v (xi·xj)+θ) (38)
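The four kernels (35)-(38) can be written directly in code as a compact illustration; the parameter names C, q, sigma, v and theta mirror the formulas:

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)                                            # formula (35)

def poly_kernel(xi, xj, C=1.0, q=2):
    return (np.dot(xi, xj) + C) ** q                                 # formula (36)

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))  # formula (37)

def sigmoid_kernel(xi, xj, v=1.0, theta=0.0):
    return np.tanh(v * np.dot(xi, xj) + theta)                       # formula (38)
```

Note that the radial basis kernel of a vector with itself is always 1, since the exponent vanishes.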
As shown in Figure 1, analysis of the distribution of the samples misclassified by the SVM shows that, like other classifiers, the SVM makes its errors mostly near the interface. We can therefore improve classification performance by improving the classification precision for samples near the interface. The SVM can be regarded as a 1NN classifier with only one representative point per class. When a sample point lies near the interface, since the SVM takes only one representative point per class from the support vectors, that representative point sometimes cannot represent the class well. Combining SVM with KNN remedies this, because KNN takes all support vectors of each class as representative points and can therefore classify with higher accuracy. Specifically, for a sample x to be identified, compute the difference of the distances from x to the representative points x+ and x- of the two classes of support vectors. If the distance difference exceeds a given threshold, x lies far from the separating plane (regions I and II in Fig. 1), and classification with SVM is generally correct. When the distance difference is below the given threshold, x lies close to the interface (region III); the SVM, which computes only the distances from x to the two class representative points, then misclassifies more easily, so KNN is used to classify the sample point: every support vector is taken as a representative point, the distances between the sample to be identified and each support vector are computed, and the judgment is obtained from them.
The basic SVM-KNN classifier algorithm comprises the following steps:
Step 1:Obtain the support vectors and the constant b using the traditional SVM algorithm. Let T be the test set, Tsv the support vector set, k the number of neighbours taken, and ε the classification threshold, typically set to 1; if ε is 0, the system degenerates into a traditional SVM classifier;
Step 2:If T ≠ ∅, take x ∈ T; if T = ∅, stop;
Step 3:Substitute the values into the formula, choosing the linearly separable or linearly inseparable formula as the case requires;
Step 4:If |f(x)|>ε, output f(x) directly; if |f(x)|≤ε, pass Tsv, x and k to the KNN algorithm for classification, with Tsv as the whole sample set; the returned value is the output value;
Step 5:Set T=T-{x} and return to Step 1.
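The decision rule of Steps 2-4 can be sketched as follows; `f`, `eps` and `k` follow the notation above, while Euclidean distance to the support vectors is an assumed choice for the KNN stage:

```python
import numpy as np

def svm_knn_classify(x, f, support_X, support_y, eps=1.0, k=3):
    """Hybrid decision: trust the SVM far from the boundary (|f(x)| > eps),
    fall back to a KNN vote over the support vector set near it."""
    score = f(x)
    if abs(score) > eps:                      # far from the boundary: SVM decides
        return 1 if score > 0 else -1
    d = np.linalg.norm(support_X - x, axis=1)  # near the boundary: KNN over Tsv
    nn = np.argsort(d)[:k]
    votes = support_y[nn]
    return 1 if votes.sum() > 0 else -1
```

For example, with a toy decision function, a point well outside the margin is labelled by the SVM branch, while a point inside the margin is decided by its nearest support vectors.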
Since the samples misclassified by the SVM are concentrated near the interface, and the sample points near the interface are mostly support vectors, the KNN classifier can be combined in to improve the classification performance of the SVM: different classification methods are applied to sample points distributed differently in space, and classifier performance is improved by means of semi-supervised learning. Support attack detection is an iterative process. First, a labelled user data set and an unlabelled user data set are established. Next, an initial SVM classifier is trained from the small amount of labelled user data, and the distance between each unlabelled user datum and the boundary of the initial SVM classifier is computed; if the distance exceeds the set threshold, SVM is used for classification, otherwise KNN is used. The newly labelled data are added to the training set and the SVM classifier is retrained. Iterating this process continually finally yields an SVM classifier of higher classification precision.
When the semi-supervised SVM-KNN is applied to support attack detection, the algorithm is improved. The algorithm applies different classifiers according to different situations, merges newly labelled data into the training set for retraining, and iterates until a classifier of higher precision is trained. Therefore, to improve algorithm performance, SVM and KNN can each also be improved independently to some degree; for example, SVM can be improved with respect to convergence speed and data-imbalance problems.
In the concrete practice of the present invention, the improved SVM-KNN semi-supervised attack detection method comprises the following steps:
Step 1:Divide the known user set into two parts. One part is the labelled set L={(u1,cj),(u2,cj),...,(um,cj)}, where m is the number of labelled users and cj denotes the class; j takes the values 1 and 2, since attack detection is a binary classification problem with only two classes: c1=1 denotes normal users and c2=-1 denotes attack users. The other part is the unlabelled user set U={u`1,u`2,...,u`n}, where n is the number of unlabelled users. Take the labelled user set as the training set and train the initial SVM classifier;
Step 2:Choose any sample u`i from the unlabelled user data set U and obtain the value of the classification decision function f(x) by the SVM calculation formula (the nonlinear formula):
Step 3:When |f(x)|>ε, u`i can be judged to lie far from the classification boundary; the classification result can be output directly and the newly labelled data merged into the training set. When |f(x)|<ε, u`i is judged to lie near the classification boundary, where ε is the given classification threshold (0<ε<1); this part of the data, close to the classification boundary, is added to the near-boundary vector set Usv={ui ∈ U, i=1,2,...,k}, where k is the number of near-boundary vectors;
Step 4:Reclassify the user data in the set Usv using the improved KNN;
Step 5:Put the newly labelled boundary user data classified by KNN into the original training set, and train a new SVM classifier on the expanded, updated training set;
Step 6:Judge whether the result reaches the best detection performance. If so, output the final classifier; if not, go to Step 2, re-optimize the training user data set, perform SVM training, and iterate.
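Steps 1-6 can be sketched as the following iteration. `train_svm` and `knn_label` are hypothetical stand-ins for the SVM training routine and the improved KNN of the text; their interfaces are assumptions:

```python
import numpy as np

def semi_supervised_svm_knn(L_X, L_y, U_X, train_svm, knn_label, eps=0.5):
    """Sketch of Steps 1-6: label unlabelled users with the SVM far from the
    boundary and with KNN near it, merging each into the training set and
    retraining. `train_svm(X, y)` returns a decision function f(x);
    `knn_label(x, X, y)` returns a class label (both supplied by the caller)."""
    L_X, L_y = list(L_X), list(L_y)
    f = train_svm(np.array(L_X), np.array(L_y))        # Step 1: initial classifier
    for x in U_X:                                      # Step 2: pick unlabelled user
        score = f(x)
        if abs(score) > eps:                           # Step 3: far from boundary
            label = 1 if score > 0 else -1
        else:                                          # Step 4: boundary -> KNN
            label = knn_label(x, np.array(L_X), np.array(L_y))
        L_X.append(x); L_y.append(label)               # Step 5: expand training set
        f = train_svm(np.array(L_X), np.array(L_y))    # retrain (Step 6 iterates)
    return f
```

Retraining after every sample is deliberately naive; a practical implementation would retrain in batches.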
Detection performance is assessed by two indices, the accuracy rate and the recall ratio. The classified data comprise four kinds: true positives and true negatives denote the numbers correctly judged as attack users and real users respectively, while false positives and false negatives denote the numbers wrongly judged as attack users and real users respectively. The calculation formulas of the accuracy rate and the recall ratio are: accuracy rate = true positives/(true positives + false positives); recall ratio = true positives/(true positives + false negatives).
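The two indices can be computed directly from the four counts:

```python
def precision_recall(tp, fp, fn):
    """Accuracy rate (precision) = TP/(TP+FP); recall ratio = TP/(TP+FN),
    matching the formulas in the text."""
    return tp / (tp + fp), tp / (tp + fn)
```

For instance, 90 correctly detected attackers with 10 false alarms and 30 misses gives an accuracy rate of 0.9 and a recall ratio of 0.75.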
In the above algorithm, during each iterative training of the classifier, the labelling quality of the boundary samples added to the training set strongly influences the classification effect; but because the labelled user data are insufficient, the classification ability of the initial SVM is weak and boundary samples are easily misclassified. Therefore, each time the training set is updated, KNN is introduced to classify the boundary samples and assist the SVM in optimizing the labelling quality of the boundary data, thereby improving the detection precision of the final classifier.
The improved SVM-KNN semi-supervised support attack detection algorithm was tested experimentally.
The experiment uses the MovieLens 100K data set. The data set contains the 1-to-5 rating data of 943 users on 1682 films, and every user has rated at least 20 films. The original users are taken by default as ordinary users whose ratings are normal and trustworthy. Attack users are constructed with an attack scale of 15% and filler rates of 3%, 5%, 10%, 15% and 20%; the attack types are random attack, average attack and bandwagon attack. The data set is divided into a training set and a test set: the training set contains 189 normal users and 128 attack users, and the test set contains 754 normal users and 113 attack users (to allow comparison with similar schemes, the data distribution follows that of other published experiments). In the experiment, the SVM uses the radial basis function.
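For illustration, attack profiles of the kind described (randomly chosen filler items plus a pushed target item) can be generated as follows. The exact construction used in the experiments may differ; the function name, parameters and the push rating of 5 are assumptions consistent with the MovieLens 1-to-5 scale:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_attack_profiles(R, n_attackers, filler_rate, target_item, kind="random"):
    """Illustrative shilling-profile generator over a user-item rating matrix R
    (0 = unrated). kind='random' fills with uniform ratings 1..5; kind='average'
    fills with each item's mean rating. The target item is pushed with rating 5."""
    n_items = R.shape[1]
    item_means = np.nanmean(np.where(R > 0, R, np.nan), axis=0)  # per-item mean
    profiles = np.zeros((n_attackers, n_items))
    n_filler = int(filler_rate * n_items)
    for p in profiles:
        fillers = rng.choice(np.delete(np.arange(n_items), target_item),
                             size=n_filler, replace=False)
        if kind == "random":
            p[fillers] = rng.integers(1, 6, size=n_filler)       # uniform 1..5
        else:                                                    # 'average' attack
            p[fillers] = np.clip(np.round(item_means[fillers]), 1, 5)
        p[target_item] = 5                                       # push the target
    return profiles
```

A bandwagon attack would additionally rate a few universally popular items with 5; that variant is omitted here for brevity.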
Figs. 3-8 show the accuracy rates and recall ratios of the different detection methods under the different attacks. It can be seen that the ordinary SVM-KNN semi-supervised attack detection outperforms the SVM classifier, and the improved SVM-KNN algorithm (with the optimized KNN) performs better still: its accuracy rate and recall ratio under the different attack patterns are above those of both the ordinary SVM-KNN detection and the SVM classifier detection.
The above are embodiments of the present invention. In view of the technical problem in the prior art that KNN causes the SVM-KNN attack detection precision to be poor and the detection effect unsatisfactory, the present invention proposes a semi-supervised support attack detection method based on an improved SVM-KNN algorithm. A labelled user data set and an unlabelled user data set are first established; next, an initial SVM classifier is trained from a small amount of labelled user data, and the distance between the unlabelled user data and the boundary of the initial SVM classifier is calculated. If the distance exceeds the set threshold, classification uses SVM, otherwise KNN; the newly labelled data are added to the training set and the SVM classifier is retrained; iterating this process continually finally yields an SVM classifier of higher classification precision. The method exploits the accuracy of the labelled data through a semi-supervised detector, makes reasonable use of the distribution regularity of the unlabelled data, and combines the SVM and KNN algorithms, thereby improving generalization ability and detection accuracy; in an environment of little information and constant change, its performance is superior to previous attack detection algorithms.
The above are only preferred embodiments of the present invention and are not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (5)
1. A semi-supervised support attack detection method based on an improved SVM-KNN algorithm, characterized by comprising:
dividing a user set into a labelled user set and an unlabelled user set, and training an initial SVM classifier with the labelled user set as the training set;
performing preliminary classification on any sample user in the unlabelled user set using the initial SVM classifier;
merging the user data labelled in the preliminary classification into the training set, and merging the remaining user data into a near-boundary vector set;
using the improved KNN similarity formula (1) as the distance formula of the KNN algorithm, forming an SVM-KNN classifier to perform secondary classification on the users in the near-boundary vector set,
wherein the cosine similarity, the cosine similarity after the weighted values are added, and the ratio of the number of common feature items to the total number of feature items are combined with parameters a+b+c=1; xi and dl are the feature vectors of samples, qjk is the weighted value of feature item tk, wik and wjk (1≤k≤p) are the weights of the k-th feature item in samples di and dj respectively, xik, wjk (1≤k≤p) are the coordinates in the k-th dimension, p is the number of feature items, and the class-centre vector carries its corresponding weight;
merging the labelled user data obtained by the KNN algorithm classification into the training set, and retraining a new SVM classifier with the updated training set;
judging whether the classification result reaches the best detection performance; if so, outputting the final classifier, otherwise looping the classification of the users in the unlabelled user set;
performing support attack detection on user data using the final classifier.
2. The semi-supervised support attack detection method based on the improved SVM-KNN algorithm according to claim 1, characterized in that performing preliminary classification on any sample user in the unlabelled user set using the initial SVM classifier specifically comprises:
choosing any sample user from the unlabelled user set, and calculating the value of the classification decision function f(x) using the SVM calculation formula (2);
judging whether the absolute value |f(x)| of the classification decision function exceeds a given classification threshold ε (0<ε<1);
if the judgment is yes, labelling the sample as a normal user.
3. The semi-supervised support attack detection method based on the improved SVM-KNN algorithm according to claim 1, characterized in that using the improved KNN similarity formula (1) as the distance formula of the KNN algorithm to perform secondary classification on the users in the near-boundary vector set specifically comprises:
vectorizing, consistently, the user data in the training set and the sample user data to be classified in the near-boundary vector set;
calculating the distance between the sample to be classified and each sample in the training set using the distance formula, and choosing the k nearest samples as the nearest neighbours of the sample to be classified;
calculating in turn the weighted value with which each sample among the nearest neighbours belongs to each class;
comparing the weighted values with which a given sample belongs to the different classes, and assigning the sample to the class whose weighted value is largest.
4. The semi-supervised support attack detection method based on the improved SVM-KNN algorithm according to claim 3, characterized in that the weight vector with which a given sample belongs to each class cj is qj, qj=(qj1,qj2,...,qjp), where qj1+qj2+...+qjp=1; the weighted value qjk is the weight corresponding to feature item tk, and the size of the weighted value represents the importance of tk in the different classes.
5. The semi-supervised support attack detection method based on the improved SVM-KNN algorithm according to claim 1, characterized in that judging whether the classification result reaches the best detection performance, outputting the final classifier if the judgment is yes, and otherwise looping the classification of the users in the unlabelled user set specifically comprises:
calculating, according to the classification result, the accuracy rate and the recall ratio of the classification process, wherein the classification result comprises four kinds of data: true positive, true negative, false positive and false negative; the true positives and true negatives are the data correctly judged as attack users and real users respectively, and the false positives and false negatives are the data wrongly judged as attack users and real users respectively; the calculation formula of the accuracy rate is: accuracy rate = true positives/(true positives + false positives), and the calculation formula of the recall ratio is: recall ratio = true positives/(true positives + false negatives);
judging whether the accuracy rate and the recall ratio reach preset optimal thresholds;
if the accuracy rate and the recall ratio reach the preset optimal thresholds, outputting the SVM-KNN classifier that produced the classification result as the final classifier, and otherwise looping the classification of the users in the unlabelled user set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711416340.2A CN108154178A (en) | 2017-12-25 | 2017-12-25 | Semi-supervised support attack detection method based on improved SVM-KNN algorithms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711416340.2A CN108154178A (en) | 2017-12-25 | 2017-12-25 | Semi-supervised support attack detection method based on improved SVM-KNN algorithms |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108154178A true CN108154178A (en) | 2018-06-12 |
Family
ID=62464444
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711416340.2A Pending CN108154178A (en) | 2017-12-25 | 2017-12-25 | Semi-supervised support attack detection method based on improved SVM-KNN algorithms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108154178A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108769079A (en) * | 2018-07-09 | 2018-11-06 | 四川大学 | A kind of Web Intrusion Detection Techniques based on machine learning |
CN109087482A (en) * | 2018-09-18 | 2018-12-25 | 西安交通大学 | A kind of falling detection device and method |
CN109299741A (en) * | 2018-06-15 | 2019-02-01 | 北京理工大学 | A kind of network attack kind identification method based on multilayer detection |
CN109818929A (en) * | 2018-12-26 | 2019-05-28 | 天翼电子商务有限公司 | Based on the unknown threat cognitive method actively from step study, system, storage medium, terminal |
CN109903166A (en) * | 2018-12-25 | 2019-06-18 | 阿里巴巴集团控股有限公司 | A kind of data Risk Forecast Method, device and equipment |
CN109934004A (en) * | 2019-03-14 | 2019-06-25 | 中国科学技术大学 | The method of privacy is protected in a kind of machine learning service system |
CN110020532A (en) * | 2019-04-15 | 2019-07-16 | 苏州浪潮智能科技有限公司 | A kind of information filtering method, system, equipment and computer readable storage medium |
CN110225055A (en) * | 2019-06-22 | 2019-09-10 | 福州大学 | A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model |
CN110428458A (en) * | 2018-12-26 | 2019-11-08 | 西安电子科技大学 | Depth information measurement method based on the intensive shape coding of single frames |
CN110602090A (en) * | 2019-09-12 | 2019-12-20 | 天津理工大学 | Block chain-based support attack detection method |
CN110808968A (en) * | 2019-10-25 | 2020-02-18 | 新华三信息安全技术有限公司 | Network attack detection method and device, electronic equipment and readable storage medium |
CN111757328A (en) * | 2020-06-23 | 2020-10-09 | 南京林业大学 | Cross-technology communication cheating attack detection method |
CN112153000A (en) * | 2020-08-21 | 2020-12-29 | 杭州安恒信息技术股份有限公司 | Method and device for detecting network flow abnormity, electronic device and storage medium |
CN112288015A (en) * | 2020-10-30 | 2021-01-29 | 国网四川省电力公司电力科学研究院 | Distribution network electrical topology identification method and system based on edge calculation improved KNN |
CN112529108A (en) * | 2020-12-28 | 2021-03-19 | 内蒙动力机械研究所 | Machine learning-based nondestructive testing data prediction method for solid rocket engine |
CN113079123A (en) * | 2020-01-03 | 2021-07-06 | 中国移动通信集团广东有限公司 | Malicious website detection method and device and electronic equipment |
CN113255474A (en) * | 2021-05-07 | 2021-08-13 | 华中科技大学 | Automobile engine fault diagnosis method and device |
CN113420772A (en) * | 2021-08-24 | 2021-09-21 | 常州微亿智造科技有限公司 | Defect detection method and device based on multi-classifier and SVDD (singular value decomposition and direct decomposition) cooperative algorithm |
CN113469251A (en) * | 2021-07-02 | 2021-10-01 | 南京邮电大学 | Method for classifying unbalanced data |
CN113722607A (en) * | 2021-06-25 | 2021-11-30 | 河海大学 | Improved clustering-based trust attack detection method |
CN114039794A (en) * | 2019-12-11 | 2022-02-11 | 支付宝(杭州)信息技术有限公司 | Abnormal flow detection model training method and device based on semi-supervised learning |
CN116881828A (en) * | 2023-07-19 | 2023-10-13 | 西华师范大学 | Abnormal detection method of KNN algorithm based on subspace similarity |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050100992A1 (en) * | 2002-04-17 | 2005-05-12 | Noble William S. | Computational method for detecting remote sequence homology |
CN104239436A (en) * | 2014-08-27 | 2014-12-24 | 南京邮电大学 | Network hot event detection method based on text classification and clustering analysis |
CN105426426A (en) * | 2015-11-04 | 2016-03-23 | 北京工业大学 | KNN text classification method based on improved K-Medoids |
CN106250442A (en) * | 2016-07-26 | 2016-12-21 | 新疆大学 | The feature selection approach of a kind of network security data and system |
CN106557785A (en) * | 2016-11-23 | 2017-04-05 | 山东浪潮云服务信息科技有限公司 | A kind of support vector machine method of optimization data classification |
CN106951466A (en) * | 2017-03-01 | 2017-07-14 | 常州大学怀德学院 | Field text feature and system based on KNN SVM |
Non-Patent Citations (1)
Title |
---|
吕成戍 等: "基于SVM-KNN的半监督托攻击检测方法", 《计算机工程与应用》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299741A (en) * | 2018-06-15 | 2019-02-01 | 北京理工大学 | A kind of network attack kind identification method based on multilayer detection |
CN109299741B (en) * | 2018-06-15 | 2022-03-04 | 北京理工大学 | Network attack type identification method based on multi-layer detection |
CN108769079A (en) * | 2018-07-09 | 2018-11-06 | 四川大学 | A kind of Web Intrusion Detection Techniques based on machine learning |
CN109087482A (en) * | 2018-09-18 | 2018-12-25 | 西安交通大学 | A kind of falling detection device and method |
CN109903166A (en) * | 2018-12-25 | 2019-06-18 | 阿里巴巴集团控股有限公司 | A kind of data Risk Forecast Method, device and equipment |
CN109903166B (en) * | 2018-12-25 | 2024-01-30 | 创新先进技术有限公司 | Data risk prediction method, device and equipment |
CN110428458A (en) * | 2018-12-26 | 2019-11-08 | 西安电子科技大学 | Depth information measurement method based on the intensive shape coding of single frames |
CN109818929A (en) * | 2018-12-26 | 2019-05-28 | 天翼电子商务有限公司 | Based on the unknown threat cognitive method actively from step study, system, storage medium, terminal |
CN109934004A (en) * | 2019-03-14 | 2019-06-25 | 中国科学技术大学 | The method of privacy is protected in a kind of machine learning service system |
CN110020532A (en) * | 2019-04-15 | 2019-07-16 | 苏州浪潮智能科技有限公司 | A kind of information filtering method, system, equipment and computer readable storage medium |
CN110225055A (en) * | 2019-06-22 | 2019-09-10 | 福州大学 | A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model |
CN110602090A (en) * | 2019-09-12 | 2019-12-20 | 天津理工大学 | Block chain-based support attack detection method |
CN110808968A (en) * | 2019-10-25 | 2020-02-18 | 新华三信息安全技术有限公司 | Network attack detection method and device, electronic equipment and readable storage medium |
CN114039794A (en) * | 2019-12-11 | 2022-02-11 | 支付宝(杭州)信息技术有限公司 | Abnormal flow detection model training method and device based on semi-supervised learning |
CN113079123A (en) * | 2020-01-03 | 2021-07-06 | 中国移动通信集团广东有限公司 | Malicious website detection method and device and electronic equipment |
CN111757328A (en) * | 2020-06-23 | 2020-10-09 | 南京林业大学 | Cross-technology communication cheating attack detection method |
CN112153000A (en) * | 2020-08-21 | 2020-12-29 | 杭州安恒信息技术股份有限公司 | Method and device for detecting network flow abnormity, electronic device and storage medium |
CN112153000B (en) * | 2020-08-21 | 2023-04-18 | 杭州安恒信息技术股份有限公司 | Method and device for detecting network flow abnormity, electronic device and storage medium |
CN112288015A (en) * | 2020-10-30 | 2021-01-29 | 国网四川省电力公司电力科学研究院 | Distribution network electrical topology identification method and system based on edge calculation improved KNN |
CN112529108A (en) * | 2020-12-28 | 2021-03-19 | 内蒙动力机械研究所 | Machine learning-based nondestructive testing data prediction method for solid rocket engine |
CN113255474A (en) * | 2021-05-07 | 2021-08-13 | 华中科技大学 | Automobile engine fault diagnosis method and device |
CN113722607A (en) * | 2021-06-25 | 2021-11-30 | 河海大学 | Support attack detection method based on improved clustering |
CN113722607B (en) * | 2021-06-25 | 2023-12-08 | 河海大学 | Support attack detection method based on improved clustering |
CN113469251A (en) * | 2021-07-02 | 2021-10-01 | 南京邮电大学 | Method for classifying unbalanced data |
CN113420772A (en) * | 2021-08-24 | 2021-09-21 | 常州微亿智造科技有限公司 | Defect detection method and device based on multi-classifier and SVDD (singular value decomposition and direct decomposition) cooperative algorithm |
CN116881828A (en) * | 2023-07-19 | 2023-10-13 | 西华师范大学 | Anomaly detection method using a KNN algorithm based on subspace similarity |
CN116881828B (en) * | 2023-07-19 | 2024-05-17 | 西华师范大学 | Anomaly detection method using a KNN algorithm based on subspace similarity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108154178A (en) | Semi-supervised support attack detection method based on improved SVM-KNN algorithms | |
US20210390355A1 (en) | Image classification method based on reliable weighted optimal transport (rwot) | |
Schubert et al. | On evaluation of outlier rankings and outlier scores | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
CN103309953B (en) | Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers | |
Liang et al. | Learning very fast decision tree from uncertain data streams with positive and unlabeled samples | |
Fang et al. | Confident learning-based domain adaptation for hyperspectral image classification | |
Yu et al. | Cutset-type possibilistic c-means clustering algorithm | |
CN108877947A (en) | Deep sample learning method based on iterative mean clustering | |
CN107679138A (en) | Spectral feature selection method based on local scale parameter, entropy and cosine similarity | |
CN111815582B (en) | Two-dimensional code region detection method for improving background priori and foreground priori | |
CN115577357A (en) | Android malicious software detection method based on stacking integration technology | |
Khezri et al. | A novel semi-supervised ensemble algorithm using a performance-based selection metric to non-stationary data streams | |
Xue et al. | Deep constrained low-rank subspace learning for multi-view semi-supervised classification | |
Seyghaly et al. | Interference recognition for fog enabled IoT architecture using a novel tree-based method | |
Poongodi et al. | Support vector machine with information gain based classification for credit card fraud detection system. | |
Zhou et al. | Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm | |
CN113837266A (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
Singhal et al. | Image classification using bag of visual words model with FAST and FREAK | |
Cong et al. | Exact and consistent interpretation of piecewise linear models hidden behind APIs: A closed form solution | |
CN106529585A (en) | Piano music score difficulty identification method based on large-interval projection space learning | |
CN113128556B (en) | Deep learning test case sequencing method based on mutation analysis | |
Lin et al. | Automated classification of Wuyi rock tealeaves based on support vector machine | |
Zheng et al. | An Improved k-Nearest Neighbor Classification Algorithm Using Shared Nearest Neighbor Similarity. | |
Pryor et al. | Deepfake detection analyzing hybrid dataset utilizing CNN and SVM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2018-06-12 |