CN108154178A - Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm - Google Patents

Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm

Info

Publication number
CN108154178A
CN108154178A (application CN201711416340.2A)
Authority
CN
China
Prior art keywords
user
sample
classification
svm
knn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711416340.2A
Other languages
Chinese (zh)
Inventor
沈琦
牛立坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201711416340.2A priority Critical patent/CN108154178A/en
Publication of CN108154178A publication Critical patent/CN108154178A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm, comprising: collecting labeled users as a training set and training an initial SVM classifier; performing preliminary classification on the unlabeled user set with the initial SVM classifier; merging the user data of normal users into the training set, and performing secondary classification on the remaining user data with the KNN algorithm, using an improved KNN similarity formula as the distance measure of the KNN algorithm; updating the training set and retraining a new SVM classifier; judging whether the classification result reaches the optimal detection performance, and if so, outputting the final classifier, otherwise looping back to classify the users in the unlabeled user set; and performing shilling attack detection on user data with the final classifier. The technical scheme of the invention improves the generalization ability and detection accuracy of shilling attack detection; with only a small amount of labeled information and in a continually changing environment, it outperforms previous attack detection algorithms.

Description

Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm
Technical field
The present invention relates to the technical field of network security, and more particularly to a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm.
Background technology
In a real network environment, the identity of a large number of users cannot be determined, and the shilling attacks a system faces grow ever more complex. On a shopping website such as Taobao, for example, some users can be determined to be genuine by conditions such as activity level, positive-feedback rate or crown-level membership; most users, however, merely complete a few shopping steps and never even leave a rating, so it cannot be determined whether they are genuine. Meanwhile, as attackers learn more about a website, they can construct increasingly complex attack models. Existing attack detection algorithms perform unsatisfactorily when facing such novel, increasingly complex shilling attacks with only a small number of users of confirmed identity.
Invention content
In view of at least one of the above problems, the present invention provides a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm. A labeled user data set and an unlabeled user data set are first established; an initial SVM classifier is then trained on the small amount of labeled user data; the distance between each unlabeled user sample and the boundary of the initial SVM classifier is calculated, and if the distance exceeds a set threshold the sample is classified by the SVM, otherwise by KNN; the newly labeled data are added to the training set and the SVM classifier is retrained; and the above process is iterated until an SVM classifier of high classification accuracy is finally obtained. The method exploits both the accuracy of the labeled data, as a semi-supervised detector, and the distribution regularity of the unlabeled data, and combines the SVM and KNN algorithms, thereby improving generalization ability and detection accuracy; with only a small amount of labeled information and in a continually changing environment, it outperforms previous attack detection algorithms.
To achieve the above object, the present invention provides a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm, comprising: dividing the user set into a labeled user set and an unlabeled user set, and training an initial SVM classifier with the labeled user set as the training set; performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier; merging the user data that is labeled in the preliminary classification into the training set, and merging the remaining user data into a near-boundary vector set; performing secondary classification on the users in the near-boundary vector set, using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm,

Sim(xi, dj) = a·Sim1(xi, dj) + b·Sim2(xi, C̄j) + c·Sim3(xi, dj) (1)

where Sim1 is the weighted cosine similarity between the sample to be classified and a training sample, Sim2 is the weighted cosine similarity between the sample and the class-center vector C̄j, Sim3 is the Hamming similarity, i.e. the proportion of feature items the two samples share, and a + b + c = 1;

merging the labeled user data obtained by the KNN classification into the training set, and retraining a new SVM classifier with the updated training set; judging whether the classification result reaches the optimal detection performance, and if so, outputting the final classifier, otherwise looping back to classify the users in the unlabeled user set; and performing shilling attack detection on user data with the final classifier.
In the above technical scheme, preferably, performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier specifically comprises: selecting a sample user from the unlabeled user set, and calculating the value of the classification decision function f(x) with the SVM calculation formula (2),

f(x) = Σi=1..l αi·yi·K(xi, x) + b (2)

judging whether the absolute value |f(x)| of the classification decision function exceeds a given classification threshold ε (0 < ε < 1); and if so, labeling the sample as a normal user.
In the above technical scheme, preferably, using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm to perform secondary classification on the users in the near-boundary vector set specifically comprises: vectorizing the user data in the training set consistently with the sample user data to be classified in the near-boundary vector set; calculating with the distance measure the distance between the sample to be classified and each sample in the training set, and selecting the k nearest samples as the nearest neighbors of the sample to be classified; calculating in turn the weight with which each sample among the nearest neighbors belongs to each class; and comparing the weights with which the sample belongs to the different classes, and assigning the sample to the class with the largest weight.
In the above technical scheme, preferably, the weight vector with which a sample belongs to each class cj is qj = (qj1, qj2, ..., qjp), where qj1 + qj2 + ... + qjp = 1; the weight qjk is the weight corresponding to the feature item tk, and the size of the weight represents the importance of tk in the different classes.
In the above technical scheme, preferably, judging whether the classification result reaches the optimal detection performance, outputting the final classifier if so, and otherwise looping back to classify the users in the unlabeled user set specifically comprises:

calculating the precision and recall of the classification process from the classification result, wherein the classification result comprises four kinds of data, true positives, true negatives, false positives and false negatives; the true positives and true negatives are the data correctly judged as attack users and genuine users respectively, and the false positives and false negatives are the data wrongly judged as attack users and genuine users respectively; the precision is calculated as precision = true positives / (true positives + false positives), and the recall is calculated as recall = true positives / (true positives + false negatives); judging whether the precision and the recall reach preset optimal thresholds; and if they do, outputting the SVM-KNN classifier that produced the classification result as the final classifier, otherwise looping back to classify the users in the unlabeled user set.
Compared with the prior art, the beneficial effects of the present invention are as follows. With the semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm provided by the invention, a labeled user data set and an unlabeled user data set are first established; an initial SVM classifier is then trained on the small amount of labeled user data; the distance between each unlabeled user sample and the boundary of the initial SVM classifier is calculated, and if the distance exceeds the set threshold the sample is classified by SVM, otherwise by KNN; the newly labeled data are added to the training set and the SVM classifier is retrained; and this process is iterated until an SVM classifier of high classification accuracy is finally obtained. The method exploits both the accuracy of the labeled data, as a semi-supervised detector, and the distribution regularity of the unlabeled data, and combines the SVM and KNN algorithms, thereby improving generalization ability and detection accuracy; with only a small amount of labeled information and in a continually changing environment, it outperforms previous attack detection algorithms.
Description of the drawings
Fig. 1 is a schematic flowchart of the semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm disclosed in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the classification principle of the SVM-KNN classifier disclosed in an embodiment of the present invention;
Fig. 3 to Fig. 8 are data graphs of the attack detection experiments disclosed in an embodiment of the present invention.
Specific embodiment
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical scheme in the embodiments of the present invention is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
The present invention is described in further detail below in conjunction with the accompanying drawings:
As shown in Fig. 1 and Fig. 2, the semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm provided by the present invention comprises: step S11, dividing the user set into a labeled user set and an unlabeled user set, and training an initial SVM classifier with the labeled user set as the training set; step S12, performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier; step S13, merging the user data that is labeled in the preliminary classification into the training set, and merging the remaining user data into a near-boundary vector set; step S14, performing secondary classification on the users in the near-boundary vector set, using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm,

Sim(xi, dj) = a·Sim1(xi, dj) + b·Sim2(xi, C̄j) + c·Sim3(xi, dj) (1)

where Sim1(xi, dj) is the cosine similarity after the weights are added in, Sim2(xi, C̄j) is the weighted cosine similarity between the sample and the class center, Sim3(xi, dj) is the ratio of the number of common feature items to the total number of feature items, and the parameters satisfy a + b + c = 1; xi and dj are the feature vectors of samples, qjk is the weight of the feature item tk, wik and wjk (1 ≤ k ≤ p) are the weights of the k-th feature item in samples di and dj respectively, xik and wjk (1 ≤ k ≤ p) are the coordinates in the k-th dimension, p is the number of feature items, and w̄jk is the weight (component) of the class-center vector;

step S15, merging the labeled user data obtained by the KNN classification into the training set, and retraining a new SVM classifier with the updated training set; step S16, judging whether the classification result reaches the optimal detection performance, and if so, outputting the final classifier, otherwise looping back to classify the users in the unlabeled user set; and step S17, performing shilling attack detection on user data with the final classifier.
In the above embodiment, preferably, performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier specifically comprises: selecting a sample user from the unlabeled user set, and calculating the value of the classification decision function f(x) with the SVM calculation formula (2); judging whether the absolute value |f(x)| of the classification decision function exceeds the given classification threshold ε (0 < ε < 1); and if so, labeling the sample as a normal user.
In the above embodiment, preferably, using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm to perform secondary classification on the users in the near-boundary vector set specifically comprises: vectorizing the user data in the training set consistently with the sample user data to be classified in the near-boundary vector set; calculating with the distance measure the distance between the sample to be classified and each sample in the training set, and selecting the k nearest samples as the nearest neighbors of the sample to be classified; calculating in turn the weight with which each sample among the nearest neighbors belongs to each class; and comparing the weights with which the sample belongs to the different classes, and assigning the sample to the class with the largest weight.
In the above embodiment, preferably, the weight vector with which a sample belongs to each class cj is qj = (qj1, qj2, ..., qjp), where qj1 + qj2 + ... + qjp = 1; the weight qjk is the weight corresponding to the feature item tk, and the size of the weight represents the importance of tk in the different classes.
In the above embodiment, preferably, judging whether the classification result reaches the optimal detection performance, outputting the final classifier if so, and otherwise looping back to classify the users in the unlabeled user set specifically comprises: calculating the precision and recall of the classification process from the classification result, wherein the classification result comprises four kinds of data, true positives, true negatives, false positives and false negatives; the true positives and true negatives are the data correctly judged as attack users and genuine users respectively, and the false positives and false negatives are the data wrongly judged as attack users and genuine users respectively; the precision is calculated as precision = true positives / (true positives + false positives), and the recall as recall = true positives / (true positives + false negatives); judging whether the precision and the recall reach preset optimal thresholds; and if they do, outputting the SVM-KNN classifier that produced the classification result as the final classifier, otherwise looping back to classify the users in the unlabeled user set.
In this embodiment, a semi-supervised machine-learning attack detection method is adopted for the shilling attack problem. Common machine-learning-based attack detection methods comprise supervised learning, unsupervised learning and semi-supervised learning.
Specifically, supervised learning refers to the process of adjusting the parameters of a classifier with a set of samples of known class until the required performance is reached. Shilling attacks are constructed from different attack models, so attack profiles necessarily differ from genuine users in certain features, and shilling attacks can be detected using these features. Existing research has proposed many feature indicators, which can be divided into generic indicators, model-based indicators and intra-profile indicators. Generic indicators capture the difference between shilling attackers and normal users from the difference in the rating distributions of target items and filler items, such as the Rating Deviation from Mean Agreement (RDMA) and the nearest-neighbor similarity (DegSim). Model-based indicators are constructed according to the rating patterns peculiar to different attack models, i.e. different attack profiles, to distinguish shilling attackers from normal users, such as the variation of a user's ratings about their mean (MeanVar) and the average deviation of the highest-rated items from the remaining item set (FMTD). Intra-profile indicators distinguish attackers by statistical differences among the ratings within a profile, such as the degree of focus on target items (TMF).
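As an illustration of how such generic indicators are computed from a rating matrix, the following sketch computes RDMA and DegSim with numpy. It is illustrative only: the helper names are ours, the matrix layout (users × items, NaN for missing ratings) is an assumption, and the exact variants of both indicators differ across the literature.

```python
import numpy as np

def rdma(R):
    """Rating Deviation from Mean Agreement, one score per user.

    R: (n_users, n_items) rating matrix with np.nan for missing ratings.
    For each user, average the absolute deviation of each of their ratings
    from the item mean, weighted by 1 / (number of ratings the item has).
    """
    item_mean = np.nanmean(R, axis=0)           # mean rating of each item
    item_count = np.sum(~np.isnan(R), axis=0)   # number of ratings per item
    scores = np.zeros(R.shape[0])
    for u in range(R.shape[0]):
        rated = ~np.isnan(R[u])
        dev = np.abs(R[u, rated] - item_mean[rated]) / item_count[rated]
        scores[u] = dev.sum() / rated.sum()
    return scores

def degsim(R, k=10):
    """DegSim: average similarity of each user to its k most similar users."""
    Rz = np.where(np.isnan(R), 0.0, R)
    Rc = Rz - Rz.mean(axis=1, keepdims=True)    # center each user's ratings
    norms = np.linalg.norm(Rc, axis=1) + 1e-12
    sim = (Rc @ Rc.T) / np.outer(norms, norms)  # Pearson-like user-user similarity
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    return np.sort(sim, axis=1)[:, -k:].mean(axis=1)
```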
Unsupervised learning refers to solving various problems in pattern recognition from training samples of unknown (unlabeled) class. Unsupervised detection algorithms generally rely on clustering to separate different users and, according to the model used, can be divided into dimensionality-reduction-based algorithms and model-based algorithms. Dimensionality-reduction algorithms follow the idea of finding the most important variables that express the rating matrix, then detect shilling attacks from the difference between normal users and attackers. Existing research includes an unsupervised user-selection algorithm based on principal component analysis (PCA Select Users): using the principle of PCA dimensionality reduction, it extracts mutually independent features, and users with low correlation among the features are likely to be shilling attackers. There is also the unsupervised attack-profile mining algorithm UnRAP, which uses the degree to which a user matches the rating matrix to decide whether the user is a shilling attacker. Model-based algorithms use the latent representation of different models to reconstruct the rating matrix and detect shilling attacks by the differences under the model. Existing research includes a shilling attack detection algorithm based on probabilistic latent semantic analysis (PLSA): the average distance of each user is calculated and measured with a "density" indicator, and since the "density" of shilling attackers is relatively large, the attacks can be detected. In addition, an SVD detection algorithm based on singular value decomposition has been proposed: since the ratings of shilling attackers follow certain rules, the low-dimensional model of an attacker after singular value decomposition differs considerably from that of a normal user.
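A minimal sketch of the PCA-based user-selection idea just described, under our own reading of it (the function name, the use of SVD and the flagging rule are illustrative assumptions, not the cited algorithm's exact specification): users whose standardized profiles load weakly on the first principal components are the ones most mutually correlated, and are flagged as likely shilling attackers.

```python
import numpy as np

def pca_select_users(R, n_components=3, n_flag=50):
    """Flag the n_flag users contributing least to the first principal
    components of the item-standardized rating matrix."""
    Rz = np.where(np.isnan(R), 0.0, R)                     # missing ratings -> 0
    X = (Rz - Rz.mean(axis=0)) / (Rz.std(axis=0) + 1e-12)  # standardize each item
    U, S, Vt = np.linalg.svd(X, full_matrices=False)       # PCA via SVD, rows = users
    loading = np.sum(U[:, :n_components] ** 2, axis=1)     # per-user loading
    return np.argsort(loading)[:n_flag]                    # smallest loadings first
```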
Semi-supervised learning refers to pattern recognition that uses a large amount of unlabeled data together with labeled data. A semi-supervised detection algorithm knows the profiles of only a small fraction of shilling attack samples and constructs a detector from the profile features of this small fraction; for example, the Semi-SAD algorithm first trains a preliminary classifier on labeled data with a naive Bayes classifier and then improves the classifier on unlabeled data, raising its detection performance. Semi-supervised learning, i.e. synthetically using labeled sample data and unlabeled sample data to produce a suitable classification model, proceeds as follows: a basic classifier is trained with a small number of labeled samples; the basic classifier is then used to label the unlabeled sample data; the samples labeled by the basic classifier are taken as a new training sample set to retrain the classifier; and a classifier trained from the small set of samples of known label and the large set of samples of unknown label is finally obtained.
The present invention combines the SVM algorithm with an improved KNN algorithm and performs attack detection by way of semi-supervised learning.
Specifically, the nearest-neighbor method (NN) is one of the most important nonparametric methods in pattern recognition; the original nearest-neighbor method was proposed by Cover and Hart in 1967. The so-called k-nearest-neighbor method examines the K samples most similar to the sample to be classified and judges the class attribute of that sample from the classes of these K samples. The basic principle of the NN classifier is: for a sample vector x to be classified, with all training samples as representative points, find the K most similar samples among the representative points, take these K samples as candidate classes, use the similarity between x and the K samples as the weighting measure and, with a similarity threshold set, determine the class of x.
The degree of correlation between two similar samples is called similarity. When samples are represented as vectors, the distance between samples can be used to measure the degree of similarity between them. There are many methods of computing the distance between two samples, such as the Euclidean distance, the cosine distance, the city-block distance, the correlation distance and the Hamming distance.

In the vector space model, after feature extraction each sample can be converted into a relatively low-dimensional space vector composed of a group of feature items (t1, t2, t3, ..., tp); each feature ti has a corresponding weight wi (representing the importance of the feature ti in the sample). The features t1, t2, t3, ..., tp chosen after feature selection can be regarded as the axes of a p-dimensional coordinate system, and w1, w2, w3, ..., wp are the feature values on each axis. With this coordinate representation of sample vectors, the similarity between samples can be measured. Let the sample vectors be di = (wi1, wi2, wi3, ..., wip) and dj = (wj1, wj2, wj3, ..., wjp); several distances are defined below.
(1) Euclidean distance

For a p-dimensional feature space the distance is defined as:

D(di, dj) = √(Σk=1..p (wik − wjk)²) (3)

where di and dj are the feature vectors of the samples, p is the dimension of the feature vector space, wik denotes the k-th coordinate of sample di, and wjk the k-th coordinate of sample dj. The smaller the distance between two samples, the higher their degree of similarity and the more likely they belong to the same class; conversely, the more dissimilar they are, the more likely they belong to different classes.
(2) Cosine distance

The commonly used cosine distance is computed from the inner product of the feature vectors or from the cosine of their angle θ; the smaller the angle between the vectors, i.e. the larger the cosine value, the higher the similarity. The formula for the similarity of two feature vectors by the cosine distance is:

Sim(di, dj) = cos θ = Σk=1..p wik·wjk / (√(Σk=1..p wik²)·√(Σk=1..p wjk²)) (4)

or, with di and dj first normalized to unit length,

Sim(di, dj) = di·dj (5)

where wik and wjk (1 ≤ k ≤ p) are the weights of the k-th feature item in samples di and dj respectively, and p is the number of feature items, i.e. the dimension of the feature vector space. Formula (5) in fact first normalizes the feature vectors to unit length and then takes the inner product; the purpose of the normalization is to make texts of inconsistent length comparable.
(3) City-block distance

The city-block distance between sample feature vectors di and dj is defined as follows:

D(di, dj) = |wi1−wj1| + |wi2−wj2| + ... + |wip−wjp| (6)

where di = (wi1, wi2, ..., wip) and dj = (wj1, wj2, ..., wjp) denote the feature vectors of samples Wi and Wj, and D(di, dj) denotes the distance between the sample points di and dj in the sample set.
(4) Correlation distance

Let the i-th row of the sample matrix, di = (wi1, wi2, ..., wip), be the feature vector of a sample and cj (1 ≤ j ≤ c) the sample classes. The correlation distance of any two sample vectors is defined as:

D(di, dj) = 1 − r(di, dj) (7)

where r(di, dj) = Σk=1..p (wik − w̄i)(wjk − w̄j) / (√(Σk=1..p (wik − w̄i)²)·√(Σk=1..p (wjk − w̄j)²)) is the correlation coefficient of di and dj, and w̄i and w̄j are the means of the components of di and dj respectively.
(5) Hamming distance

In binary coding theory, the Hamming weight is the number of "1" symbols in a code word, also called the code weight and abbreviated W; for example, for the code word "110010" the code length is p = 6 and the code weight is W = 3.

The Hamming distance between two code words x = (x1 x2 ... xk ... xp) and y = (y1 y2 ... yk ... yp) of code length p is defined as:

D(x, y) = Σk=1..p (xk ⊕ yk) (8)

where ⊕ denotes modulo-2 addition and xk ∈ {0,1}, yk ∈ {0,1}. In D(x, y), x and y are code words, and D(x, y) is the number of positions at which the symbols of the two code words differ; its size embodies the degree of difference between the two code words, and the larger the value of formula (8), the greater the difference between the two code words.

After feature selection, with Boolean weighting a text can be arranged into a code word of length p; for example, sample W1 can be expressed as d1 = (10011100101010.....101), where 0 and 1 correspond to the two states of the sample: a component position carrying no sample information is recorded as 0 and one carrying sample information as 1. In this way the sample set is placed in one-to-one correspondence with a code-word set, and the problem of text similarity is in fact that of the Hamming distance between two code words. If the code words corresponding to texts W1 and W2 are d1 and d2 respectively, the Hamming distance between the two samples can be expressed by formula (8). The value of D(d1, d2) lies between 0 and p in the vector space model: when the p-bit code words of two sample vectors are completely identical the Hamming distance between them is 0, and when the code words are entirely different the Hamming distance is p, so D(x, y) quantitatively describes the degree of difference between different texts.
When classifying samples, the sample set is first converted into a code-word set. For the code word d1 = (x1 x2 ... xk ... xp) of sample W1 and the code word d2 = (y1 y2 ... yk ... yp) of sample W2, the similarity can be defined by the following formula:

Sim(d1, d2) = 1 − (1/p)·Σk=1..p (xk ⊕ yk) (9)

where xk and yk denote the values, 0 or 1, of the k-th components of d1 (corresponding to sample W1) and d2 (corresponding to sample W2) respectively. The degree of similarity is described by formula (9): when two samples are essentially similar, i.e. their code words are essentially identical, the similarity Sim(d1, d2) approaches 1; conversely, when the code words are entirely different, Sim(d1, d2) approaches 0.
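The five distances and the code-word similarity of formula (9) translate directly into code. The sketch below assumes dense numpy vectors (0/1 vectors in the Hamming case) and is for illustration only:

```python
import numpy as np

def euclidean(di, dj):          # formula (3)
    return np.sqrt(np.sum((di - dj) ** 2))

def cosine_sim(di, dj):         # formula (4)
    return di @ dj / (np.linalg.norm(di) * np.linalg.norm(dj))

def city_block(di, dj):         # formula (6)
    return np.sum(np.abs(di - dj))

def correlation_dist(di, dj):   # formula (7): 1 - Pearson correlation
    return 1.0 - np.corrcoef(di, dj)[0, 1]

def hamming_dist(x, y):         # formula (8); x, y are 0/1 vectors
    return int(np.sum(x != y))

def hamming_sim(x, y):          # formula (9)
    return 1.0 - hamming_dist(x, y) / len(x)
```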
The basic algorithm steps of KNN comprise the following (a code sketch follows the steps):

Step 1: vectorize the data in the training set;

Step 2: vectorize the data to be classified by the KNN algorithm consistently with the training set;

Step 3: according to a distance formula, such as the cosine formula

Sim(di, dj) = Σk=1..p wik·wjk / (√(Σk=1..p wik²)·√(Σk=1..p wjk²)) (10)

calculate the distance between the sample to be classified and each sample in the training set, and select the k nearest samples as the k nearest neighbors of that sample;

Step 4: from the k neighbors selected, calculate in turn the weight of belonging to each class, for example as the sum of Sim(x, dl)·I(dl, cj) over the samples dl in KNN(x), where KNN(x) denotes the k nearest neighbors of x and the indicator I(dl, cj) equals 1 if dl belongs to class cj and 0 otherwise; other specific methods for the weight I(dl, cj) exist in the prior art and are not described here;

Step 5: compare the weights; the class with the largest weight is the class the sample belongs to.
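A minimal sketch of steps 3 to 5, assuming the samples are already vectorized as numpy arrays; the similarity-weighted vote is one common choice for the class weight:

```python
import numpy as np

def cosine_sim(di, dj):
    # cosine similarity, formula (10)
    return di @ dj / (np.linalg.norm(di) * np.linalg.norm(dj) + 1e-12)

def knn_classify(x, train_X, train_y, k, sim=cosine_sim):
    """Steps 3-5: classify x by its k most similar training samples.

    The weight of class cj is the sum of sim(x, dl) * I(dl, cj) over the
    neighbors dl in KNN(x), I(dl, cj) being 1 if dl belongs to cj, else 0.
    """
    sims = np.array([sim(x, d) for d in train_X])
    nearest = np.argsort(sims)[-k:]               # indices of the k nearest neighbors
    weights = {}
    for i in nearest:
        weights[train_y[i]] = weights.get(train_y[i], 0.0) + sims[i]
    return max(weights, key=weights.get)          # class with the largest weight
```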
Study and experiment readily reveal the following two problems with the similarity formula of the above KNN algorithm:

(1) the similarity formula does not consider whether, and to what degree, the test sample xi is similar to the different classes of the training set; it considers neither the degree of aggregation of each class of samples nor the distance from the test sample point to the class center of each class cj in the training set, whereas the greater the aggregation of a class cj and the nearer xi lies to its class center, the more similar the test sample is to that class;

(2) the formula does not consider that the training sets of different classes may influence the classification result differently because they contain different numbers of feature items: some classes lean on certain prominent feature items while other classes lean on other feature items, and different classes contain different feature items; that is, the number of features co-owned by the sample to be classified and each class of the sample set is not considered.

As can be seen from the above cosine distance formula (10), dividing by the denominator is equivalent to dividing by the length of the sample vector, i.e. a normalization that eliminates the influence of sample vector length on the classification effect.
The first problem is considered first. Denote the class center vector of each class cj by

C̄j = (1/n)·Σd∈cj d (11)

i.e. the result of summing all sample vectors of the class in the training set and dividing by their number n. The similarity of the sample to be classified xi to the class center of each class cj can then be expressed as:

Sim(xi, C̄j) = Σk=1..p xik·w̄jk / (√(Σk=1..p xik²)·√(Σk=1..p w̄jk²)) (12)

where w̄jk is the k-th component of the class center vector C̄j.
Next consider the second problem: the number of features co-owned by the sample to be classified xi and the samples of each class cj in the training set. Denote by Sim3 the ratio of the number of common feature items to the total number of feature items (the dimension p of the feature vector space), defined by the Hamming similarity:

Sim3(xi, dj) = 1 − (1/p)·Σk=1..p (xik ⊕ wjk) (13)

where xi = (xi1, xi2, ..., xip) denotes the sample to be classified, dj = (wj1, wj2, ..., wjp) is any sample in the training set, and the coordinates xik, wjk (1 ≤ k ≤ p) in the k-th dimension take the value 0 or 1.
Considering the above two points, the improved cosine-distance similarity is defined as:

Sim(xi, dj) = a·Sim(xi, dj) + b·Sim(xi, C̄j) + c·Sim3(xi, dj) (14)

where a + b + c = 1; the optimal allocation of the proportions can be sought through many experiments, so that the classification effect of the improved similarity is optimal. The improved cosine-distance formula solves the above two problems, but it is clear from the formula that the feature items carry no weights, so all feature items have equal weight. A final step is therefore needed: introducing weights, which allows finer adjustment and improves the classification effect. According to the class attribute, we set the weight vector with which a sample belongs to each class cj as qj = (qj1, qj2, ..., qjp), where qj1 + qj2 + ... + qjp = 1; the weight qjk is the weight corresponding to the feature item tk, and the size of qjk represents the importance of tk in the different classes.
In conclusion the cosine similarity formula after weighted value is added in formula (10):
Ibid, sample to be sorted and each class cjThe similarity formula at class center also introduce weight on the basis of formula (12) Value:
Comprehensive (15) formula, (16) formula, (13) formula show that the similarity formula finally improved is:
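Formulas (11) to (17) combine into a single scoring routine. The sketch below is one illustrative reading: the class-center vector and the class weight vector qj are supplied by the caller, and nonzero components are treated as 1 for the Hamming term (13).

```python
import numpy as np

def weighted_cos(q, x, d):
    # weighted cosine similarity, formulas (15)/(16)
    qx, qd = q * x, q * d
    return qx @ qd / (np.linalg.norm(qx) * np.linalg.norm(qd) + 1e-12)

def improved_sim(x, d, center, q, a, b, c):
    """Improved similarity of formula (17), with a + b + c = 1.

    x: sample to be classified; d: a training sample of class cj;
    center: class-center vector C_j of formula (11); q: weight vector qj.
    """
    sim1 = weighted_cos(q, x, d)          # weighted cosine, formula (15)
    sim2 = weighted_cos(q, x, center)     # weighted class-center similarity, (16)
    sim3 = np.mean((x > 0) == (d > 0))    # Hamming similarity, formula (13)
    return a * sim1 + b * sim2 + c * sim3
```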
Next, the support vector machine (SVM) is a machine learning algorithm proposed by Vapnik et al. in 1995 on the basis of the structural risk minimization principle. Its distinguishing features are structural risk minimization and strong generalization ability. For classification problems, the support vector machine algorithm can be sketched as follows: the samples of the input space are mapped by some nonlinear function into a feature space in which the two classes of samples are linearly separable, and the optimal linear separating hyperplane of the samples in this feature space is sought. The SVM algorithm covers linear classification and nonlinear classification.
(1) Linear classification

SVM defines the optimal hyperplane and converts the search for the optimal linear hyperplane into the solution of a quadratic programming problem. Based on Mercer's theorem, the sample space is mapped by a nonlinear mapping into a high-dimensional feature space, so that the highly nonlinear problem in the sample space can be solved there with linear methods.
The support vector machine was proposed for binary classification. Suppose the training samples are (xi, yi), i = 1, 2, ..., l, x ∈ Rd, yi ∈ {−1, 1}, and there exists a separating hyperplane w·x + b = 0. For the separating plane to classify all samples correctly and to possess a class interval, it must satisfy:

yi[(w·xi)+b]−1 ≥ 0 (18)

The class interval is calculated to be:

2/||w|| (19)

Maximizing the class interval 2/||w|| means minimizing ||w||, so the problem of solving the optimal hyperplane can be expressed as a constrained optimization problem, namely minimizing, under the constraint of formula (18), the function:

Φ(w) = (1/2)||w||² (20)
Introduce the Lagrange function:

L(w, b, α) = (1/2)||w||² − Σi=1..l αi{yi[(w·xi)+b] − 1} (21)

where the αi > 0 are Lagrange coefficients. Taking the partial derivatives of formula (21) with respect to w and b and setting them equal to 0 gives

w = Σi=1..l αi·yi·xi (22)

Σi=1..l αi·yi = 0 (23)

and the above problem can be converted into a simpler dual problem. Substituting formulas (22) and (23) into (21) yields the dual optimization problem: maximize the function

W(α) = Σi=1..l αi − (1/2)·Σi=1..l Σj=1..l αi·αj·yi·yj·(xi·xj) (24)

subject to

Σi=1..l αi·yi = 0 (25)

where αi ≥ 0, i = 1, ..., l.
This is a quadratic programming (QP, Quadratic Programming) extreme-value problem under inequality constraints. According to the Karush-Kuhn-Tucker (KKT) conditions, the solution of this optimization problem must satisfy:

αi ≥ 0, i = 1, ..., l (26)

αi{yi[(w·xi)+b] − 1} = 0, i = 1, ..., l (27)

Therefore the αi corresponding to most samples are 0, and the samples with αi ≠ 0, for which the equality in formula (18) holds, are called support vectors. In the support vector machine algorithm the support vectors are the key elements of the training set: they lie closest to the decision boundary, and if all other training samples were removed and training repeated, the same separating plane would be obtained.
After the above quadratic programming problem is solved, the classification decision function can be expressed as:

f(x) = sgn(Σi=1..l αi*·yi·(xi·x) + b*) (28)

The summation in the formula is carried out only over the support vectors, i.e. only the training samples whose αi is nonzero determine the classification result, while the other samples are irrelevant to it; b* is the classification threshold. When the training sample set is linearly inseparable, non-negative slack variables ξi, i = 1, 2, ..., l, are introduced, and the optimization problem for the generalized optimal separating plane becomes:

min Φ(w, ξ) = (1/2)||w||² + C·Σi=1..l ξi (29)

Its dual problem is to maximize over α the function:

W(α) = Σi=1..l αi − (1/2)·Σi=1..l Σj=1..l αi·αj·yi·yj·(xi·xj), 0 ≤ αi ≤ C (30)

s.t. yi[(w·xi)+b] ≥ 1−ξi (31)

where C > 0 is a constant called the error penalty parameter, which controls the degree of penalty on misclassified samples, and the ξi are the non-negative slack variables introduced when the training samples are linearly inseparable.
(2) Nonlinear classification

For nonlinear classification problems, an appropriate inner-product kernel function K(xi, xj) can realize the linear classification that follows some nonlinear transformation; the objective function to be optimized then becomes:

W(α) = Σi=1..l αi − (1/2)·Σi=1..l Σj=1..l αi·αj·yi·yj·K(xi, xj) (32)

s.t. Σi=1..l αi·yi = 0, 0 ≤ αi ≤ C (33)

and the corresponding classification decision function is expressed as:

f(x) = sgn(Σi=1..l αi*·yi·K(xi, x) + b*) (34)

The above classification decision function is exactly the support vector machine. As described above, the original problem is converted into its dual problem, so that the computational complexity no longer depends on the dimension of the space but on the number of samples, in particular the number of support vectors; this feature enables the support vector machine to cope effectively with high-dimensional problems.
In SVM, the introduction of the kernel function K(xi, xj) converts the inner-product operation of the high-dimensional space into a kernel computation over inner products in the original space, realizing nonlinear classification without increasing the complexity of the algorithm. Different kernel functions construct different SVMs, and the choice of kernel function is of prime importance. Four kernel functions are in common use (collected in code after the list):

(1) linear inner-product kernel

K(xi, xj) = (xi·xj) (35)

(2) polynomial kernel

K(xi, xj) = [(xi·xj)+C]^q, q > 0 (36)

(3) radial basis kernel

K(xi, xj) = exp(−||xi−xj||²/2σ²) (37)

(4) two-layer neural network (sigmoid) kernel

K(xi, xj) = tanh(v(xi·xj)+θ) (38)
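The four kernels (35) to (38) in code, for reference; σ, C, q, v and θ are hyperparameters to be chosen by the user:

```python
import numpy as np

def linear_kernel(xi, xj):                     # formula (35)
    return xi @ xj

def poly_kernel(xi, xj, C=1.0, q=3):           # formula (36)
    return (xi @ xj + C) ** q

def rbf_kernel(xi, xj, sigma=1.0):             # formula (37)
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(xi, xj, v=1.0, theta=0.0):  # formula (38)
    return np.tanh(v * (xi @ xj) + theta)
```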
As shown in Fig. 2, analysis of the distribution of the sample points misclassified by SVM shows that, like other classifiers, the SVM classifier has its error sample points concentrated near the interface; classification performance can therefore be improved by raising the classification precision of the samples near the interface. The SVM can be regarded as a 1NN classifier in which each class has only one representative point. When a sample point lies near the interface, since SVM takes only one representative point per class of support vectors, that representative point sometimes cannot represent the class well; combining SVM with KNN at this point, i.e. taking all support vectors of each class as representative points, gives the classifier higher classification accuracy. Specifically, for a sample x to be identified, the difference between the distances from x to the representative points x+ and x− of the two classes of support vectors is calculated. If the distance difference exceeds a given threshold, i.e. x lies far from the separating plane, as in regions I and II of Fig. 2, SVM classification is generally correct. If the distance difference is below the given threshold, i.e. x lies near the interface and falls into region III, SVM classification, which computes only the distances from x to the single representative points taken for the two classes, misclassifies more easily; KNN is then used to classify the sample point, taking each support vector as a representative point, computing the distance between the sample to be identified and each support vector, and obtaining the judgment from these distances.
The basic SVM-KNN classifier algorithm comprises the following steps (a code sketch follows the steps):

Step 1: obtain the support vectors and the constant b with the traditional SVM algorithm. Let T be the test set, Tsv the support vector set, k the number of neighbors taken and ε the classification threshold, typically set to 1; if ε is 0, the system degenerates into a traditional SVM classifier;

Step 2: if T ≠ ∅, take an x ∈ T; if T = ∅, stop;

Step 3: substitute the values into the formula for calculation, choosing the linearly separable or the linearly inseparable formula as the case may be;

Step 4: if |f(x)| > ε, output f(x) directly; if |f(x)| ≤ ε, pass Tsv, x and k to the KNN algorithm for classification, with Tsv as the whole sample set, and take the returned value as the output value;

Step 5: set T = T − {x} and go to Step 1.
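A sketch of these steps using scikit-learn's SVC for the SVM part, the decision_function value playing the role of f(x); the library choice and the simple majority vote over the k nearest support vectors are our assumptions, not part of the patent:

```python
import numpy as np
from sklearn.svm import SVC

def svm_knn_predict(svm, X_sv, y_sv, x, k, eps):
    """Basic SVM-KNN: trust the SVM far from the boundary (|f(x)| > eps),
    fall back to KNN over the support vector set Tsv otherwise."""
    fx = svm.decision_function(x.reshape(1, -1))[0]
    if abs(fx) > eps:
        return 1 if fx > 0 else -1                 # step 4: SVM decides directly
    d = np.linalg.norm(X_sv - x, axis=1)           # distances to the support vectors
    nearest = np.argsort(d)[:k]                    # k nearest support vectors
    return 1 if y_sv[nearest].sum() >= 0 else -1   # majority vote among them

# Step 1: the support vector set comes from a trained classifier, e.g.
# svm = SVC(kernel="rbf").fit(X_train, y_train)
# X_sv, y_sv = svm.support_vectors_, y_train[svm.support_]
```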
Since the samples misclassified by SVM are concentrated near the interface, and the sample points near the interface are mostly support vectors, the KNN classifier can be combined with SVM to improve its classification performance, applying different classification methods to sample points distributed differently in space and improving classifier performance by way of semi-supervised learning. Shilling attack detection is an iterative process: a labeled user data set and an unlabeled user data set are first established; next, an initial SVM classifier is trained on the small amount of labeled user data, and the distance between each unlabeled user sample and the boundary of the initial SVM classifier is calculated; if the distance exceeds the set threshold the sample is classified by SVM, otherwise by KNN; the newly labeled data are added to the training set and the SVM classifier is retrained; and the process is iterated until an SVM classifier of high classification accuracy is finally obtained.

The semi-supervised SVM-KNN is applied to shilling attack detection with improvements to the algorithm. According to different situations the algorithm uses different classifiers, merges the newly labeled data into the training set for retraining, and iterates until a classifier of higher precision is finally trained; therefore, to improve the performance of the algorithm, SVM and KNN can each also be improved independently to some extent, for example SVM with respect to convergence speed and problems of unbalanced data.
In the concrete practice of the present invention, the improved semi-supervised SVM-KNN attack detection method comprises the following steps (a code sketch follows the steps):

Step 1: divide the known user set into two parts. One part is the labeled set L = {(u1, cj), (u2, cj), ..., (um, cj)}, where m denotes the number of labeled users and cj denotes the class; j takes the values 1 and 2, since attack detection is a binary classification problem with only two classes in all, c1 = 1 denoting normal users and c2 = −1 denoting attack users. The other part is the unlabeled user set U = {u`1, u`2, ..., u`n}, where n denotes the number of unlabeled users. Take the labeled user set as the training set and train an initial SVM classifier;

Step 2: select a sample u`i from the unlabeled user data set U and obtain the value of the classification decision function f(x) by the SVM calculation formula, which in the nonlinear case is:

f(x) = Σi=1..l αi·yi·K(xi, x) + b

Step 3: when |f(x)| > ε, it can be judged that u`i lies far from the classification boundary, the classification result can be output directly, and the newly labeled data are merged directly into the training set; when |f(x)| < ε, it can be judged that u`i lies near the classification boundary, ε being the given classification threshold (0 < ε < 1), and this part of the data, close to the classification boundary, is added to the near-boundary vector set Usv = {ui ∈ U, i = 1, 2, ..., k}, where k is the number of near-boundary vectors;

Step 4: reclassify the user data in the set Usv with the improved KNN;

Step 5: put the newly labeled boundary user data classified by KNN into the original training set, and train a new SVM classifier with the expanded, updated training set;

Step 6: judge whether the result reaches the best detection performance; if so, output the final classifier; if not, go to Step 2 to re-optimize the training user data set, perform SVM training, and iterate the loop.
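Steps 1 to 6 amount to the following self-training loop. This is a simplified sketch under stated assumptions: labels are 1 for normal and −1 for attack users as in Step 1, the improved-KNN step is stubbed by the knn_classify sketch given earlier, and the stopping test of Step 6 is reduced to an iteration limit.

```python
import numpy as np
from sklearn.svm import SVC

def semi_supervised_svm_knn(X_l, y_l, X_u, eps=0.5, k=5, max_iter=10):
    """Grow the training set from unlabeled user profiles, then return
    the final classifier (Steps 1-6 of the improved method)."""
    X_train, y_train = X_l.copy(), y_l.copy()
    for _ in range(max_iter):                           # Step 6: iterate the loop
        svm = SVC(kernel="rbf").fit(X_train, y_train)   # Steps 1/5: (re)train SVM
        if len(X_u) == 0:
            break
        f = svm.decision_function(X_u)                  # Step 2: f(x) per user
        far = np.abs(f) > eps                           # Step 3: far from the boundary?
        X_far, y_far = X_u[far], np.where(f[far] > 0, 1, -1)
        X_near = X_u[~far]                              # near-boundary set Usv
        y_near = np.array([knn_classify(x, X_train, y_train, k)
                           for x in X_near])            # Step 4: (improved) KNN
        X_train = np.vstack([X_train, X_far, X_near])   # Step 5: expand training set
        y_train = np.concatenate([y_train, y_far, y_near])
        X_u = X_u[:0]                                   # all users are now labeled
    return SVC(kernel="rbf").fit(X_train, y_train)      # final classifier
```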
Detection performance is assessed by the two indicators precision and recall. The classification data comprise four kinds: true positives and true negatives denote the numbers correctly judged as attack users and genuine users respectively, while false positives and false negatives denote the numbers wrongly judged as attack users and genuine users respectively. The calculation formulas of precision and recall are: precision = true positives / (true positives + false positives), recall = true positives / (true positives + false negatives).
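In code, with the attack class (label −1 in the notation of Step 1) treated as the positive class, the two indicators are computed as follows (a trivial sketch):

```python
import numpy as np

def precision_recall(y_true, y_pred, attack_label=-1):
    """Precision and recall of attack detection, attack users as positives."""
    tp = np.sum((y_pred == attack_label) & (y_true == attack_label))  # true positives
    fp = np.sum((y_pred == attack_label) & (y_true != attack_label))  # false positives
    fn = np.sum((y_pred != attack_label) & (y_true == attack_label))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```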
In the above algorithm, during each iteration of classifier training, the labeling quality of the boundary samples added to the training set has a great influence on the classification effect; but since the labeled user data are insufficient, the classification ability of the initial SVM is weak, and boundary samples are easily misclassified. Therefore, each time the training set is updated here, KNN is introduced to classify the boundary samples and assist the SVM in optimizing the labeling quality of the boundary data, thereby improving the detection precision of the final classifier.
The improved SVM-KNN semi-supervised shilling attack detection algorithm was tested experimentally.

The experiment uses the MovieLens 100K data set. The data set contains the 1-to-5 rating data of 943 users on 1682 movies, and every user has rated at least 20 movies. The original users are taken by default to be ordinary users whose ratings are normal and credible. Attack users were constructed with an attack size of 15% and filler sizes of 3%, 5%, 10%, 15% and 20%; the attack types are the random attack, the average attack and the bandwagon attack. The data set is divided into a training set and a test set: the training set contains 189 normal users and 128 attack users, and the test set contains 754 normal users and 113 attack users (for comparison with similar schemes, the data distribution of the experiment fully follows that of other published work). In the experiment, the SVM uses the radial basis function.
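For orientation, random, average and bandwagon attack profiles can be generated along the following lines. This is a sketch following the common definitions of these attack models, not code from the patent; the parameter names and the rating-matrix layout are assumptions.

```python
import numpy as np

def make_attack_profiles(R, target, n_attackers, filler_size, kind="average",
                         popular=None, seed=0):
    """Generate shilling profiles: rate the target item 5 and fill a random
    filler set of size filler_size * n_items with ratings drawn according
    to the attack model; popular = indices of popular items (bandwagon)."""
    rng = np.random.default_rng(seed)
    n_items = R.shape[1]
    item_mean = np.nanmean(R, axis=0)
    g_mean, g_std = np.nanmean(R), np.nanstd(R)
    profiles = np.full((n_attackers, n_items), np.nan)
    for a in range(n_attackers):
        filler = rng.choice(n_items, int(filler_size * n_items), replace=False)
        if kind == "average":                  # item means as filler ratings
            profiles[a, filler] = item_mean[filler].round()
        else:                                  # random / bandwagon: N(mean, std)
            profiles[a, filler] = np.clip(
                rng.normal(g_mean, g_std, filler.size).round(), 1, 5)
        if kind == "bandwagon" and popular is not None:
            profiles[a, popular] = 5           # push the selected popular items
        profiles[a, target] = 5                # push the target item
    return profiles
```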
Fig. 3 to Fig. 8 show the precision and recall of the different detection methods under the different attacks. It can be seen that the ordinary semi-supervised SVM-KNN attack detection is better than the SVM classifier, and that the improved SVM-KNN algorithm (with the optimized KNN) performs better still: its precision and recall under the different attack modes are both higher than those of ordinary SVM-KNN detection and of SVM classifier detection.
The above are embodiments of the present invention. Considering the technical problem in the prior art that the KNN component causes the attack detection precision of SVM-KNN to be poor and the detection effect unsatisfactory, the present invention proposes a semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm: a labeled user data set and an unlabeled user data set are first established; an initial SVM classifier is then trained on the small amount of labeled user data; the distance between each unlabeled user sample and the boundary of the initial SVM classifier is calculated, and if the distance exceeds the set threshold the sample is classified by SVM, otherwise by KNN; the newly labeled data are added to the training set and the SVM classifier is retrained; and the process is iterated until an SVM classifier of high classification accuracy is finally obtained. The method exploits both the accuracy of the labeled data, as a semi-supervised detector, and the distribution regularity of the unlabeled data, and combines the SVM and KNN algorithms, thereby improving generalization ability and detection accuracy; with only a small amount of labeled information and in a continually changing environment, it outperforms previous attack detection algorithms.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (5)

1. A semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm, characterized by comprising:

dividing the user set into a labeled user set and an unlabeled user set, and training an initial SVM classifier with the labeled user set as the training set;

performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier;

merging the user data that is labeled in the preliminary classification into the training set, and merging the remaining user data into a near-boundary vector set;

using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm to form an SVM-KNN classifier, and performing secondary classification on the users in the near-boundary vector set,

Sim(xi, dj) = a·Sim1(xi, dj) + b·Sim2(xi, C̄j) + c·Sim3(xi, dj) (1)

where Sim1(xi, dj) is the cosine similarity after the weights are added in, Sim2(xi, C̄j) is the weighted cosine similarity between the sample and the class center, Sim3(xi, dj) is the ratio of the number of common feature items to the total number of feature items, the parameters satisfy a + b + c = 1, xi and dj are the feature vectors of samples, qjk is the weight of the feature item tk, wik and wjk (1 ≤ k ≤ p) are the weights of the k-th feature item in samples di and dj respectively, xik and wjk (1 ≤ k ≤ p) are the coordinates in the k-th dimension, p is the number of feature items, and w̄jk is the weight (component) of the class-center vector;

merging the labeled user data obtained by the KNN classification into the training set, and retraining a new SVM classifier with the updated training set;

judging whether the classification result reaches the optimal detection performance, and if so, outputting the final classifier, otherwise looping back to classify the users in the unlabeled user set;

performing shilling attack detection on user data with the final classifier.
2. The semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm according to claim 1, characterized in that performing preliminary classification on each sample user in the unlabeled user set with the initial SVM classifier specifically comprises:

selecting a sample user from the unlabeled user set, and calculating the value of the classification decision function f(x) with the SVM calculation formula (2);

judging whether the absolute value |f(x)| of the classification decision function exceeds a given classification threshold ε (0 < ε < 1);

if so, labeling the sample as a normal user.
3. The semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm according to claim 1, characterized in that using the improved KNN similarity formula (1) as the distance measure of the KNN algorithm to perform secondary classification on the users in the near-boundary vector set specifically comprises:

vectorizing the user data in the training set consistently with the sample user data to be classified in the near-boundary vector set;

calculating with the distance measure the distance between the sample to be classified and each sample in the training set, and selecting the k nearest samples as the nearest neighbors of the sample to be classified;

calculating in turn the weight with which each sample among the nearest neighbors belongs to each class;

comparing the weights with which the sample belongs to the different classes, and assigning the sample to the class with the largest weight.
4. The semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm according to claim 3, characterized in that the weight vector with which a sample belongs to each class cj is qj = (qj1, qj2, ..., qjp), where qj1 + qj2 + ... + qjp = 1; the weight qjk is the weight corresponding to the feature item tk, and the size of the weight represents the importance of tk in the different classes.
5. The semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm according to claim 1, characterized in that judging whether the classification result reaches the optimal detection performance, outputting the final classifier if so, and otherwise looping back to classify the users in the unlabeled user set specifically comprises:

calculating the precision and recall of the classification process from the classification result, wherein the classification result comprises four kinds of data, true positives, true negatives, false positives and false negatives; the true positives and true negatives are the data correctly judged as attack users and genuine users respectively, and the false positives and false negatives are the data wrongly judged as attack users and genuine users respectively; the precision is calculated as precision = true positives / (true positives + false positives), and the recall is calculated as recall = true positives / (true positives + false negatives);

judging whether the precision and the recall reach preset optimal thresholds;

if the precision and recall reach the preset optimal thresholds, outputting the SVM-KNN classifier that produced the classification result as the final classifier, otherwise looping back to classify the users in the unlabeled user set.
CN201711416340.2A 2017-12-25 2017-12-25 Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm Pending CN108154178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711416340.2A CN108154178A (en) Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711416340.2A CN108154178A (en) Semi-supervised shilling attack detection method based on an improved SVM-KNN algorithm

Publications (1)

Publication Number Publication Date
CN108154178A true CN108154178A (en) 2018-06-12

Family

ID=62464444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711416340.2A Pending CN108154178A (en) 2017-12-25 2017-12-25 Semi-supervised support attack detection method based on improved SVM-KNN algorithms

Country Status (1)

Country Link
CN (1) CN108154178A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050100992A1 (en) * 2002-04-17 2005-05-12 Noble William S. Computational method for detecting remote sequence homology
CN104239436A (en) * 2014-08-27 2014-12-24 南京邮电大学 Network hot event detection method based on text classification and clustering analysis
CN105426426A (en) * 2015-11-04 2016-03-23 北京工业大学 KNN text classification method based on improved K-Medoids
CN106250442A (en) * 2016-07-26 2016-12-21 新疆大学 The feature selection approach of a kind of network security data and system
CN106557785A (en) * 2016-11-23 2017-04-05 山东浪潮云服务信息科技有限公司 A kind of support vector machine method of optimization data classification
CN106951466A (en) * 2017-03-01 2017-07-14 常州大学怀德学院 Field text feature and system based on KNN SVM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lü Chengshu et al.: "Semi-supervised support attack detection method based on SVM-KNN" (基于SVM-KNN的半监督托攻击检测方法), Computer Engineering and Applications (《计算机工程与应用》) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299741A (en) * 2018-06-15 2019-02-01 北京理工大学 A kind of network attack kind identification method based on multilayer detection
CN109299741B (en) * 2018-06-15 2022-03-04 北京理工大学 Network attack type identification method based on multi-layer detection
CN108769079A (en) * 2018-07-09 2018-11-06 四川大学 A kind of Web Intrusion Detection Techniques based on machine learning
CN109087482A (en) * 2018-09-18 2018-12-25 西安交通大学 A kind of falling detection device and method
CN109903166A (en) * 2018-12-25 2019-06-18 阿里巴巴集团控股有限公司 A kind of data Risk Forecast Method, device and equipment
CN109903166B (en) * 2018-12-25 2024-01-30 创新先进技术有限公司 Data risk prediction method, device and equipment
CN110428458A (en) * 2018-12-26 2019-11-08 西安电子科技大学 Depth information measurement method based on the intensive shape coding of single frames
CN109818929A (en) * 2018-12-26 2019-05-28 天翼电子商务有限公司 Based on the unknown threat cognitive method actively from step study, system, storage medium, terminal
CN109934004A (en) * 2019-03-14 2019-06-25 中国科学技术大学 The method of privacy is protected in a kind of machine learning service system
CN110020532A (en) * 2019-04-15 2019-07-16 苏州浪潮智能科技有限公司 A kind of information filtering method, system, equipment and computer readable storage medium
CN110225055A (en) * 2019-06-22 2019-09-10 福州大学 A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model
CN110602090A (en) * 2019-09-12 2019-12-20 天津理工大学 Block chain-based support attack detection method
CN110808968A (en) * 2019-10-25 2020-02-18 新华三信息安全技术有限公司 Network attack detection method and device, electronic equipment and readable storage medium
CN114039794A (en) * 2019-12-11 2022-02-11 支付宝(杭州)信息技术有限公司 Abnormal flow detection model training method and device based on semi-supervised learning
CN113079123A (en) * 2020-01-03 2021-07-06 中国移动通信集团广东有限公司 Malicious website detection method and device and electronic equipment
CN111757328A (en) * 2020-06-23 2020-10-09 南京林业大学 Cross-technology communication cheating attack detection method
CN112153000A (en) * 2020-08-21 2020-12-29 杭州安恒信息技术股份有限公司 Method and device for detecting network flow abnormity, electronic device and storage medium
CN112153000B (en) * 2020-08-21 2023-04-18 杭州安恒信息技术股份有限公司 Method and device for detecting network flow abnormity, electronic device and storage medium
CN112288015A (en) * 2020-10-30 2021-01-29 国网四川省电力公司电力科学研究院 Distribution network electrical topology identification method and system based on edge calculation improved KNN
CN112529108A (en) * 2020-12-28 2021-03-19 内蒙动力机械研究所 Machine learning-based nondestructive testing data prediction method for solid rocket engine
CN113255474A (en) * 2021-05-07 2021-08-13 华中科技大学 Automobile engine fault diagnosis method and device
CN113722607A (en) * 2021-06-25 2021-11-30 河海大学 Improved clustering-based support attack detection method
CN113722607B (en) * 2021-06-25 2023-12-08 河海大学 Support attack detection method based on improved clustering
CN113469251A (en) * 2021-07-02 2021-10-01 南京邮电大学 Method for classifying unbalanced data
CN113420772A (en) * 2021-08-24 2021-09-21 常州微亿智造科技有限公司 Defect detection method and device based on multi-classifier and SVDD (support vector data description) cooperative algorithm
CN116881828A (en) * 2023-07-19 2023-10-13 西华师范大学 Abnormal detection method of KNN algorithm based on subspace similarity
CN116881828B (en) * 2023-07-19 2024-05-17 西华师范大学 Abnormal detection method of KNN algorithm based on subspace similarity

Similar Documents

Publication Publication Date Title
CN108154178A (en) Semi-supervised support attack detection method based on improved SVM-KNN algorithms
US20210390355A1 (en) Image classification method based on reliable weighted optimal transport (rwot)
Schubert et al. On evaluation of outlier rankings and outlier scores
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
CN103309953B (en) Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers
Liang et al. Learning very fast decision tree from uncertain data streams with positive and unlabeled samples
Fang et al. Confident learning-based domain adaptation for hyperspectral image classification
Yu et al. Cutset-type possibilistic c-means clustering algorithm
CN108877947A (en) Depth sample learning method based on iteration mean cluster
CN107679138A (en) Spectrum signature system of selection based on local scale parameter, entropy and cosine similarity
CN111815582B (en) Two-dimensional code region detection method for improving background priori and foreground priori
CN115577357A (en) Android malicious software detection method based on stacking integration technology
Khezri et al. A novel semi-supervised ensemble algorithm using a performance-based selection metric to non-stationary data streams
Xue et al. Deep constrained low-rank subspace learning for multi-view semi-supervised classification
Seyghaly et al. Interference recognition for fog enabled IoT architecture using a novel tree-based method
Poongodi et al. Support vector machine with information gain based classification for credit card fraud detection system.
Zhou et al. Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm
CN113837266A (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
Singhal et al. Image classification using bag of visual words model with FAST and FREAK
Cong et al. Exact and consistent interpretation of piecewise linear models hidden behind APIs: A closed form solution
CN106529585A (en) Piano music score difficulty identification method based on large-interval projection space learning
CN113128556B (en) Deep learning test case sequencing method based on mutation analysis
Lin et al. Automated classification of Wuyi rock tealeaves based on support vector machine
Zheng et al. An Improved k-Nearest Neighbor Classification Algorithm Using Shared Nearest Neighbor Similarity.
Pryor et al. Deepfake detection analyzing hybrid dataset utilizing CNN and SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180612