CN105809030B - A recommender system security detection method based on data tracking - Google Patents
- Publication number
- CN105809030B (CN201610120727.2A)
- Authority
- CN
- China
- Prior art keywords
- project
- user
- prediction
- scoring
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention proposes a recommender system security detection method based on data tracking. It addresses the shortcomings of traditional collaborative filtering recommender systems when detecting user profile injection: detection is time-consuming, detection performance is poor, and the methods do not scale to big data processing. The invention first exploits the applicability of the extended Kalman filter (EKF) to time-varying nonlinear dynamic systems to track and predict the rating behaviour of items, and then applies linear discriminant analysis (LDA) to cluster the users who gave abnormal ratings to an item, so as to identify the attacking users and their profiles for that item. Using the extended Kalman filter avoids examining large amounts of irrelevant data, which improves detection efficiency and the robustness of the system. Applying a tracking algorithm to recommender system security detection enables uninterrupted online detection and reduces the false detection rate. Linear discriminant analysis reduces the dimensionality of multi-feature user data, so that profile injection attacks by malicious users are detected effectively and the detection rate increases.
Description
Technical field
The present invention relates to a data processing method, and in particular to a recommender system security detection method, implemented by means of data tracking techniques in a big data context, that can be applied to the field of e-commerce.
Background art
The emergence and spread of the Internet has brought users massive amounts of data and met their demand for information services in the information age. However, the sharp growth in data volume that came with the rapid development of networks means that users facing massive data cannot extract the part that is actually useful to them. To address this problem and improve user experience, Internet service providers have designed or adopted collaborative filtering recommender systems, which, through their analysis, proactively push potentially useful information to users. In today's mobile e-commerce, collaborative filtering recommender systems are widely used, and the quality and reliability of recommendations have become a major issue of growing concern to users. Because recommender systems are inherently open and user information is sensitive, malicious users can inject a large number of fake user profiles (for example, fake user ratings for a certain product) into a recommender system in order to distort its recommendations so that they serve the attackers' interests, which in turn undermines users' trust in the system. This behaviour is called a "user profile injection attack" (User Profile Injection Attack) or "shilling attack" (Shilling Attack).
In practice, user profile injection attacks can be divided by purpose into two types: the "push attack" (Push Attack) and the "nuke attack" (Nuke Attack). A push attack makes the target item recommended markedly more often than other items, so that the target item is strongly recommended to users. A nuke attack makes the target item recommended markedly less often than other items, so that the system's recommendation of the target item is suppressed.
In e-commerce, some vendors go to great lengths to manipulate recommender systems into recommending their own products to users, thereby suppressing competitors and obtaining illegitimate profit. Real-life examples are not lacking. In June 2001, a company used forged film reviews to recommend a newly released film to users. In October 2011, an online marketplace unilaterally raised its annual technical service fee, which led large numbers of small and medium-sized sellers to mount a large-scale coordinated attack on big stores by means such as malicious negative reviews. In addition, a search engine has frequently found that its search algorithm can be gamed by some websites using various means to raise their ranking in the search results.
Such attacks seriously interfere with the normal operation of a recommender system. They mislead users into accepting or purchasing information or goods they do not really need, so users gradually lose trust in the recommender system, customers are lost, and the system suffers a double loss of reputation and profit. Research on methods for detecting attacks on recommender systems is therefore particularly important.
At present, the detection methods proposed in the prior art fall into three classes: supervised, unsupervised, and semi-supervised. Each class has its own strengths, and many detection methods have been derived from them. With the arrival of the big data era, the data volume of recommender systems grows by the day, but most methods in these three classes do not consider the efficiency of attack detection, so data processing is inefficient and slow and the methods are hard to apply to e-commerce in the big data era. A recommender system detection method with higher detection efficiency and better security is therefore needed to deliver convenient and efficient service.
Summary of the invention
To address the defects of recommender system attack detection methods in the prior art, the present invention uses technical means such as data tracking to solve problems such as low detection efficiency and the vulnerability of the system to attack. Against the background of big data being widely used in e-commerce, the invention aggregates large amounts of useful data to provide more efficient and secure detection.
The recommender system security detection method based on data tracking proposed by the present invention comprises the following three steps:
1. Data preprocessing: detect the rated items.
2. Item tracking and prediction: exploit the applicability of the extended Kalman filter (Extended Kalman Filter) to time-varying nonlinear dynamic systems to track and predict the rating behaviour of items.
3. Attack user classification: apply linear discriminant analysis LDA (Linear Discriminant Analysis) to cluster the users who gave abnormal ratings to an item, so as to identify the attacking users and their profiles for that item.
Further, the data preprocessing specifically comprises:
A: Traverse the ratings of all items in the current recommender system to obtain the historical rating data of all users.
B: From the historical rating data of all users, determine the statistical features avg_t and var_t (the rating mean and rating variance) of item j in item set i at time t.
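As an illustration of step B, the per-item statistics can be computed in one pass over the rating records. The sketch below is a minimal example, assuming that ratings arrive as (user, item, score) triples and that var_t is the population variance; the function name `item_stats` and the sample data are hypothetical and do not come from the patent.

```python
from statistics import fmean, pvariance

def item_stats(ratings):
    """Return {item: (avg, var)} from an iterable of (user, item, score) triples."""
    by_item = {}
    for _user, item, score in ratings:
        by_item.setdefault(item, []).append(score)
    # Population variance is an assumption; the source only says "variance".
    return {item: (fmean(scores), pvariance(scores))
            for item, scores in by_item.items()}

# Hypothetical snapshot of the rating history at one sampling instant t.
ratings = [("u1", "j1", 5), ("u2", "j1", 3), ("u3", "j1", 4), ("u1", "j2", 2)]
stats = item_stats(ratings)
print(stats["j1"])  # (avg_t, var_t) for item j1
```

Recomputing these statistics at every sampling instant gives the avg_t and var_t series from which the SACA and SVCA values are derived.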
The item tracking and prediction specifically comprises:
C: From avg_{t-1} and var_{t-1} of item j, compute the SACA and SVCA values of item j at time t.
D: Initialize the extended Kalman filter and obtain the system state X(t), the observation state Y(t), and the system state error covariance matrix P(t) of the current item j at time t.
E: Compute the system state of item j at time t+1, X(t+1|t) = f[t, X(t)] + G(t)W(t), and the observation state at time t+1, Y(t+1|t) = h[t, X(t)] + V(t), where W(t) is the process noise, V(t) is the observation noise, G(t) is the noise distribution matrix, and f and h are the nonlinear state and observation functions of the system, to which a first-order Taylor expansion is applied.
F: Solve the first-order linear state equation and compute the state transition matrix Φ(t+1) of item j at time t+1.
G: Solve the first-order linear observation equation and compute the observation matrix H(t+1) of item j at time t+1.
H: Using the state transition matrix, extrapolate the system state error covariance matrix of item j at time t+1: P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q, where Q is the process noise variance and Φ^T is the transpose of Φ.
I: Solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) (H(t+1) P(t+1|t) H^T(t+1) + R)^{-1}, where R is the variance of the Gaussian white noise and H^T is the transpose of H.
J: Update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1)(Y(t) - Y(t+1|t)), and at the same time update the corresponding system state error covariance matrix, P(t+1) = (I_n - K(t+1)H(t+1)) P(t+1|t), where I_n is the identity matrix of order n.
K: Predict the attacking user's next behaviour from the historical data, that is, judge whether the attack prediction result at this point is valid.
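Steps E to J follow the usual predict/update cycle of an extended Kalman filter. The sketch below is a scalar (one-dimensional) simplification, so Φ, H, P, Q, R, and K become plain numbers and the matrix inverse in the gain becomes a division. The function `ekf_step`, the example functions f and h, and all numeric values are assumptions made for illustration; the source does not specify them. The zero-mean noise terms W(t) and V(t) are dropped from the predictions.

```python
import math

def ekf_step(x, p, y_obs, f, df, h, dh, q, r):
    """One scalar EKF cycle. Returns (updated state, updated covariance, predicted observation)."""
    x_pred = f(x)                    # step E: predicted state X(t+1|t)
    phi = df(x)                      # step F: linearised transition (derivative of f)
    p_pred = phi * p * phi + q       # step H: P(t+1|t) = Phi P Phi^T + Q
    y_pred = h(x_pred)               # step E: predicted observation Y(t+1|t)
    hk = dh(x_pred)                  # step G: linearised observation (derivative of h)
    k = p_pred * hk / (hk * p_pred * hk + r)   # step I: Kalman gain
    x_new = x_pred + k * (y_obs - y_pred)      # step J: state update
    p_new = (1.0 - k * hk) * p_pred            # step J: covariance update
    return x_new, p_new, y_pred

# Hypothetical mildly nonlinear model, tracked over a few observations.
f = lambda x: x + 0.1 * math.sin(x)
df = lambda x: 1.0 + 0.1 * math.cos(x)
h = lambda x: x
dh = lambda x: 1.0

x, p = 0.5, 1.0
for y_obs in [0.6, 0.7, 0.9]:
    x, p, y_pred = ekf_step(x, p, y_obs, f, df, h, dh, q=0.01, r=0.1)
    print(round(x, 4), round(p, 4))
```

In the method described here the state would be built from the SACA and SVCA values of an item, which makes it a vector; the same equations then apply with matrix products and a matrix inverse.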
The attack user classification step comprises:
L: Traverse the target items obtained by tracking and prediction, and obtain the set of all abnormal users who rated the target item, Att_i = {att_1, att_2, …, att_l}, and the data matrix D = [d_1, d_2, ..., d_L].
M: Compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item.
N: Compute the within-class scatter matrix S_W and the between-class scatter matrix S_b of the data matrix D.
O: Let a be the dimension-reduction transformation vector. The projection of vector d_i obtained through a is g_i = a^T d_i. The vector a is obtained by solving the eigenvalue equation of the scatter matrices, where λ is the corresponding eigenvalue.
P: The best projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The nearest-neighbour algorithm KNN (K-Nearest Neighbor) is used to partition the projections of the data matrix. Among the abnormal users, the projections of attackers cluster together because of the similarity of their rating features (highest and second-highest ratings), whereas the projections of abnormal users who are not attackers do not cluster, so the attacking users are separated out.
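Steps M to P can be sketched for the two-class, two-feature case, where the LDA eigenvalue problem has the well-known closed form a ∝ S_W^{-1}(μ_1 - μ_2). This is a simplification, assuming that each user is described by the pair (rating mean, rating variance), that the within-class scatter matrix is unweighted, and that a small set of already labelled users is available for the nearest-neighbour vote. All function names and numbers are hypothetical.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def within_scatter(classes):
    """Unweighted within-class scatter matrix S_W for 2-D samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in classes:
        mu = mean_vec(rows)
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Two-class LDA direction a = S_W^{-1} (mu_a - mu_b), via a 2x2 inverse."""
    sw = within_scatter([class_a, class_b])
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

def project(a, d):
    """Projection g = a^T d of one user's feature vector."""
    return a[0] * d[0] + a[1] * d[1]

def knn_label(g, labelled, k=3):
    """Majority label among the k labelled projections nearest to g."""
    nearest = sorted(labelled, key=lambda item: abs(item[0] - g))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical (rating mean, rating variance) features of abnormal users.
attackers = [(4.9, 0.10), (5.0, 0.05), (4.8, 0.12)]
normals = [(3.0, 1.0), (3.6, 1.4), (2.4, 1.2), (3.0, 1.6)]
a = fisher_direction(attackers, normals)
labelled = ([(project(a, d), "attack") for d in attackers]
            + [(project(a, d), "normal") for d in normals])
print(knn_label(project(a, (4.95, 0.08)), labelled))
```

Attack profiles have high, tightly concentrated ratings, so their projections fall close together on the one-dimensional axis, which is what the nearest-neighbour step relies on.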
Further, in step K of the item tracking and prediction process, the two necessary conditions for judging whether an attack prediction result is valid are:
(1) The prediction result is given after short-term tracking. Only if the change within the short time Δt is predicted accurately is the prediction result timely.
(2) The relevant behaviour of the item can be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy. Only when the system state of the item is predicted accurately is the prediction result valid.
Condition (2) can be expressed as a formula, in which:
AR_u denotes the set of items to which attacking user u gave abnormal ratings (the highest or second-highest rating), since a single attacking user may have one or more target items;
total_{u,j} denotes the total number of times that item j, to which user u gave an abnormal rating, has been tracked;
Υ_{u,j} denotes the probability that the system accurately predicts item j, to which user u gave an abnormal rating;
CONT_{u,j} counts the number of times within the short period that the state transition equation and the observation equation are close, that is, the number of accurate predictions. They are close when their difference is less than a given small value ξ while X(t) ≥ ρ and Y(t) ≥ ω, where ρ is the anomaly threshold of the state transition and ω is the anomaly threshold of the observation equation;
CAL_u denotes the probability, expressed as a percentage, that a user who gives abnormal ratings to target items is an attacking user.
In more detail: because the rating behaviour of attacking users is regular, it can be summarized and judged by the tracking and prediction method. A normal user may also, out of personal preference, give a target item a high rating that looks abnormal, but because the rating behaviour of normal users is not regular, it is hard to track and predict and the CAL_u value will be very low. Hence the closer CAL_u is to 100%, the more accurately the user's abnormally rated items are tracked and predicted and the more likely the user is an attacker; conversely, the lower the value, the less likely the user is an attacker. When the item tracking time exceeds Δt and the CAL_u value of an item exceeds the given prediction threshold η, jump to step L; otherwise, return to step E.
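The counting rule for CONT and the final branching decision can be sketched as follows. This is a minimal illustration that treats the state and the observation as scalars and takes CAL_u as an already computed input, because the formula for CAL_u is not reproduced in this text. The function names and sample values are hypothetical.

```python
def count_close_predictions(states, observations, xi, rho, omega):
    """CONT: number of steps where |X - Y| < xi while X >= rho and Y >= omega."""
    return sum(1 for x, y in zip(states, observations)
               if abs(x - y) < xi and x >= rho and y >= omega)

def next_step(tracked_seconds, cal_u, delta_t, eta):
    """Move on to classification only if tracking lasted longer than delta_t
    and CAL_u exceeds the prediction threshold eta; otherwise keep tracking."""
    if tracked_seconds > delta_t and cal_u > eta:
        return "classify"
    return "keep_tracking"

# Hypothetical tracked states X(t) and observations Y(t) for one item.
cont = count_close_predictions([1.0, 2.0, 0.1], [1.05, 2.5, 0.12],
                               xi=0.1, rho=0.5, omega=0.5)
print(cont, next_step(700, 0.9, delta_t=600, eta=0.8))
```

Here "classify" corresponds to jumping to the attack user classification step and "keep_tracking" to returning to the state prediction step.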
The range of Δt is derived from the time over which an attacking user carries out the attack. It can be set flexibly according to the needs of the application, preferably 600 s to 259200 s.
Advantageous effects: the recommender system security detection method based on data tracking proposed by the present invention has the following advantages:
1. Using the extended Kalman filter EKF (Extended Kalman Filter) avoids examining large amounts of irrelevant data, which improves detection efficiency, and because the recommender system can be monitored in real time, system robustness increases.
2. The invention is the first to apply a tracking algorithm to recommender system security detection, which enables uninterrupted online detection and reduces the false detection rate.
3. Using linear discriminant analysis LDA (Linear Discriminant Analysis) to further reduce the dimensionality of multi-feature user data allows profile injection attacks by malicious users to be detected effectively, increasing the detection rate and reducing the error rate.
In summary, the invention overcomes the low efficiency and low accuracy of traditional detection methods. Because it focuses on tracking and predicting items, it avoids a large number of invalid or inefficient operations and is therefore characterized by high efficiency, a high detection rate, and a low error rate.
The important terms used in the present invention and their constraints are as follows:
User set: the set U = {U_1, U_2, …, U_m} of the m users in the system.
Item set: the set i = {i_1, i_2, …, i_n} of the n items in the system. Hereinafter j denotes the j-th item (or class of items) in set i.
Short-term average change activity (SACA): short-term average change activity reflects the situation in which the average of the overall ratings of a target item rises quickly over a short period because attacking users keep raising its rating. Here avg_t denotes the average rating of an item in the item set at time t, τ is the SACA average correction value, and F_t denotes the set of ratings remaining in the user profiles at time t after the items with abnormal ratings (highest and second-highest ratings) are removed. The specific calculation is given by a formula in these quantities.
Short-term variance change activity (SVCA): short-term variance change activity reflects the situation in which the variance of the overall ratings of a target item falls quickly over a short period because attacking users keep raising its rating. This attribute reveals abnormal short-term changes in the target item. Here var_t denotes the variance of an item in the item set at time t, and υ is the SVCA variance correction value. The specific calculation is given by the formula: SVCA_t = |var_t - var_{t-1} - |F_t|υ|
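A minimal sketch of the SVCA formula follows. The function name, argument names, and sample numbers are illustrative assumptions; `n_remaining` stands for |F_t|, the number of ratings left after the abnormal ones are removed.

```python
def svca(var_t, var_prev, n_remaining, upsilon):
    """SVCA_t = |var_t - var_{t-1} - |F_t| * upsilon|."""
    return abs(var_t - var_prev - n_remaining * upsilon)

# Hypothetical values: the rating variance drops from 1.0 to 0.4 between
# two sampling instants, with 10 remaining ratings and upsilon = 0.01.
value = svca(var_t=0.4, var_prev=1.0, n_remaining=10, upsilon=0.01)
print(value)
```

A sharp drop in variance between consecutive instants gives a large SVCA value, which marks the item as a candidate for tracking.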
Extended Kalman filter (EKF): the basic idea of the extended Kalman filter (Extended Kalman Filter) is to linearize a nonlinear system and then compute an optimal estimate of the system state from the system's input and output observations. In essence it is an efficient recursive filter.
Linear discriminant analysis (LDA): the basic idea of linear discriminant analysis (Linear Discriminant Analysis) is to project high-dimensional pattern samples onto the best discriminant vector space, so as to extract class information and compress the dimensionality of the feature space. After projection, the pattern samples have the maximum between-class distance and the minimum within-class distance in the new subspace, that is, the patterns are optimally separable in that space. It is therefore an effective feature extraction method.
Data matrix D: D = [d_1, d_2, ..., d_L] is the data matrix whose columns are the statistical features (mean, variance, median, and so on) of the users who rated the tracked item. Each d_i contains the rating statistics of one user and is a matrix of h rows (h is the number of statistical features).
Within-class scatter matrix S_W: the matrix formed by the dispersion within the samples of a class; it is a symmetric positive semidefinite matrix. Its value indicates how densely the sample points are grouped: the larger the value, the more dispersed the points, and the smaller the value, the more concentrated. The matrix is expressed by a formula in which the superscript T denotes the transpose of a matrix or vector (here and below). Suppose there are c classes of items in total. The formula uses the i-th sample point of class j and w_j, the number of samples (or weight) of class j: the more sample points a class has, the larger its weight. In addition, the rating mean vector of class j is computed from D_j, the data statistics vectors in the data matrix that relate to class j.
Between-class scatter matrix S_b: the matrix formed by the dispersion between the samples of different classes; it is also a symmetric positive semidefinite matrix. The more dispersed the sample points of different classes are from one another, the better. Its definition uses the rating mean vector.
Nearest-neighbour algorithm (KNN): the basic idea of the nearest-neighbour algorithm (K-Nearest Neighbor) is that if most of the K nearest samples to a given sample in the feature space belong to a certain class, the sample also belongs to that class and shares the characteristics of the samples in that class. The present invention applies this method to the projection results of linear discriminant analysis, computing Euclidean distances as an auxiliary means of identifying attacking users.
Dimension-reduction transformation vector a: the optimal vector that reduces the original multidimensional data to one dimension while making the separation between classes most pronounced.
Random attack: a simple attack model in which the ratings of the filler items come from the average rating of the corresponding items, and the target item is given the highest or lowest rating. Random attacks are easy to carry out, but their effect is not very good.
Popular attack: a development of the random attack. Its construction follows the Zipf long-tail law, under which only a small number of items attract the attention of most people. The attacker chooses popular items as the selected items; the popularity of an item is usually measured by the number of times it has been rated. The popular attack model is more effective than the random attack and is not very complicated to carry out.
Attack size (Attack Size): the percentage of all user profiles in the rating system that are attack profiles.
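The random attack model and the attack size defined above can be sketched as follows, for example to generate test profiles. This is a minimal illustration, assuming a 1 to 5 rating scale, filler ratings rounded from the item averages, and an attack size computed against whatever total profile count the caller supplies. All names and values are hypothetical.

```python
import random

def random_attack_profile(item_means, target, n_filler,
                          r_min=1, r_max=5, push=True, seed=None):
    """Build one fake profile: filler items get their item's average rating
    (rounded), the target item gets the highest (push) or lowest (nuke) rating."""
    rng = random.Random(seed)
    candidates = [item for item in item_means if item != target]
    fillers = rng.sample(candidates, n_filler)
    profile = {item: round(item_means[item]) for item in fillers}
    profile[target] = r_max if push else r_min
    return profile

def attack_size(n_attack_profiles, n_total_profiles):
    """Attack profiles as a percentage of all profiles in the system."""
    return 100.0 * n_attack_profiles / n_total_profiles

# Hypothetical item averages; "t" is the target item of a push attack.
item_means = {"a": 3.2, "b": 4.1, "c": 2.6, "t": 3.0}
profile = random_attack_profile(item_means, target="t", n_filler=2, seed=0)
print(profile, attack_size(30, 1000))
```

A popular attack would differ only in how the non-target items are chosen: frequently rated items would be selected instead of a uniform random sample.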
Description of the drawings
The present invention is described in further detail below with reference to the drawings and an embodiment.
Fig. 1 is a flow chart of the recommender system detection method.
Fig. 2 compares the run time with that of the SVM and UnRAP detection methods.
Specific embodiment
A specific embodiment is presented below in which the detection method of the present invention is applied to a realistic e-commerce scenario. The MovieLens data set is used as the rating database of the recommender system. The data set is provided by the GroupLens research group at the University of Minnesota in the United States and contains 100000 rating records by 943 users for 1682 films. It is rich in attributes and consists of real data, and it is widely used in fields such as data mining. The comparison methods used are SVM and UnRAP, both of which are popular detection methods in current research and applications. The random attack and the popular attack, which are common in recommender systems, are used to attack the method of the invention and the SVM and UnRAP methods, with attack sizes (Attack Size) of 3% and 15%, respectively, for the two attack types. The superiority of the method of the invention is demonstrated by comparing the experimental results.
The specific implementation steps are as follows:
1. Data preprocessing
Step 1) Traverse all users in the current MovieLens data set and obtain the historical rating data of all users, which also include the ratings that attacking users gave to target items.
Step 2) From the historical rating data of all users, determine the statistical features avg_t and var_t of item j at time t. In this embodiment t = 30 s, that is, the system checks the data once every 30 s.
2. Item tracking and prediction
Step 3) From avg_{t-1} and var_{t-1} of item j, compute the SACA and SVCA values of the current item j at time t.
Step 4) Initialize the EKF and obtain the system state X(t), the observation state Y(t), and the system state error covariance matrix P(t) of the current item j at time t.
Step 5) Compute the system state of item j at time t+1, X(t+1|t) = f[t, X(t)] + G(t)W(t), and the observation state at time t+1, Y(t+1|t) = h[t, X(t)] + V(t).
Step 6) Solve the first-order linear state equation and compute the state transition matrix Φ(t+1) of item j at time t+1.
Step 7) Solve the first-order linear observation equation and compute the observation matrix H(t+1) of item j at time t+1.
Step 8) Using the state transition matrix, extrapolate the system state error covariance matrix of item j at time t+1: P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q.
Step 9) Solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) (H(t+1) P(t+1|t) H^T(t+1) + R)^{-1}.
Step 10) Update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1)(Y(t) - Y(t+1|t)), and at the same time update the corresponding system state error covariance matrix, P(t+1) = (I_n - K(t+1)H(t+1)) P(t+1|t).
Step 11) Predict the attacking user's next behaviour from the historical data, that is, judge whether the attack prediction result at this point is valid. The two necessary conditions for a valid prediction are:
(1) The prediction result is given after short-term tracking. Only if the change within the short time Δt is predicted accurately is the prediction result timely. The range of Δt is derived from the time over which an attacking user carries out the attack, and its setting must satisfy the basic requirement of timeliness. If the rating of an abnormal item rose too quickly and too markedly, ordinary users would notice the anomaly even without tracking and the attack would be exposed, which defeats the attacker's purpose. The design intent here is therefore to track the process by which the attacking user carries out the attack. This time is clearly not fixed; it is set to 600 s to 259200 s in this embodiment and can in practice be set flexibly according to the needs of the application.
(2) The relevant behaviour of the item can be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy. Only when the system state of the item is predicted accurately is the prediction result valid.
Condition (2) can be expressed as a formula, in which:
AR_u denotes the set of items to which attacking user u gave abnormal ratings (the highest or second-highest rating), since a single attacking user may have one or more target items;
total_{u,j} denotes the total number of times that item j, to which user u gave an abnormal rating, has been tracked;
Υ_{u,j} denotes the probability that the system accurately predicts item j, to which user u gave an abnormal rating;
CONT_{u,j} counts the number of times within the short period that the state transition equation and the observation equation are close, that is, the number of accurate predictions. They are close when their difference is less than a given small value ξ while X(t) ≥ ρ and Y(t) ≥ ω, where ρ is the anomaly threshold of the state transition and ω is the anomaly threshold of the observation equation;
CAL_u denotes the probability, expressed as a percentage, that a user who gives abnormal ratings to target items is an attacking user.
In more detail: because the rating behaviour of attacking users is regular, it can be summarized and judged by the tracking and prediction method. A normal user may also, out of personal preference, give a target item a high rating that looks abnormal, but because the rating behaviour of normal users is not regular, it is hard to track and predict and the CAL_u value will be very low. Hence the closer CAL_u is to 100%, the more accurately the user's abnormally rated items are tracked and predicted and the more likely the user is an attacker; conversely, the lower the value, the less likely the user is an attacker. When the item tracking time exceeds Δt and the CAL_u value of an item exceeds the given prediction threshold η, jump to step 12); otherwise, return to step 5).
3. Attack user classification
Step 12) Traverse item j among the items obtained by tracking and prediction, and obtain the set of all abnormal users who rated the target item, Att_i = {att_1, att_2, …, att_l}, and the data matrix D = [d_1, d_2, ..., d_L]. Because the ratings of attacking users are high and concentrated, their means and variances are close and almost identical, and this feature can be used to classify the users' rating data.
Step 13) Compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item.
Step 14) Compute the within-class scatter matrix S_W and the between-class scatter matrix S_b of the data matrix D.
Step 15) Let a be the dimension-reduction transformation vector. The projection of vector d_i obtained through a is g_i = a^T d_i. The vector a is obtained by solving the eigenvalue equation of the scatter matrices, where λ is the corresponding eigenvalue.
Step 16) The best projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The nearest-neighbour algorithm KNN (K-Nearest Neighbor) is used to partition the projections of the data matrix. Among the abnormal users, the projections of attackers cluster together because of the similarity of their rating features (highest and second-highest ratings), whereas the projections of abnormal users who are not attackers do not cluster, so the attacking users are separated out.
4. Analysis and verification of results
Fig. 2 compares the time taken by the method of the invention and by the classical detection methods SVM and UnRAP to detect attacking users. The average processing time per user is used as the run-time index (each user has 106 ratings on average). Under the same running environment and at the same detection accuracy, the unit run time of the method of the invention is better than that of the comparison methods. This is because the computation used by the method is simple and fast: abnormal data are found by large-scale comparison, these abnormal data are then tracked, and the judgement is made from the result of the tracking prediction. At the same time, the linear discriminant analysis used by the method further reduces the amount of data that has to be processed. The other methods have to keep processing abnormal data when detecting malicious users and lack the real-time capability of the method of the invention.
The above example shows that the method of the invention effectively detects malicious users in the system and strengthens the robustness of the system, and that operating efficiency also increases substantially, so the method has important application value. The foregoing is only a specific embodiment of the present invention and is not intended to limit it. The data set and attack modes used apply only to this embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (4)
1. A recommendation system security detection method based on data tracking, characterized by comprising the following steps:
1) user historical rating data preprocessing: examining the historical rating data of rated items;
2) item tracking and prediction: exploiting the applicability of the extended Kalman filter to time-varying nonlinear dynamic systems to track and predict the rating behavior of items;
3) attack user classification: applying linear discriminant analysis to cluster the users who gave abnormal ratings to an item, so as to identify the attack users of that item and their profiles.
2. The recommendation system security detection method according to claim 1, characterized in that:
The data preprocessing specifically comprises:
A: traversing the ratings of all items in the current recommendation system to obtain the historical rating data of all users;
B: determining, from the historical rating data of all users, the statistical features of item j in item set i at time t, namely the rating mean $avg_t$ and the rating variance $var_t$;
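As an illustration of step B only, the statistics of one item over one time window could be computed as in the following minimal Python sketch. The function name `item_rating_stats`, the `(timestamp, rating)` tuple layout, the half-open window and the use of the population variance are all assumptions made for the example; the source does not specify them.

```python
from statistics import fmean, pvariance


def item_rating_stats(ratings, t_start, t_end):
    """Return (avg_t, var_t) for one item over the window [t_start, t_end).

    `ratings` is a list of (timestamp_seconds, rating) pairs for a single item.
    Returns None when the window holds no rating.
    """
    window = [rating for ts, rating in ratings if t_start <= ts < t_end]
    if not window:
        return None
    # Population variance is an assumption; the source only says "rating variance".
    return fmean(window), pvariance(window)


# Hypothetical ratings of item j: two fall into the first 60-second window.
ratings_j = [(0, 4.0), (30, 5.0), (70, 3.0), (90, 5.0)]
stats = item_rating_stats(ratings_j, 0, 60)
```

Calling the function once per consecutive window yields the sequence of $avg_t$ and $var_t$ values that the tracking steps consume.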
The item tracking and prediction specifically comprises:
C: calculating, from $avg_{t-1}$ and $var_{t-1}$ of item j, the short-term average change activity (SACA) value and the short-term variance change activity (SVCA) value of item j at time t;
D: initializing the extended Kalman filter to obtain the system state $X(t)$, the observation state $Y(t)$ and the system state error covariance matrix $P(t)$ of the current item j at time t;
E: calculating the system state of item j at time t+1, $X(t+1|t) = f[t, X(t)] + G(t)W(t)$, and the observation state at time t+1, $Y(t+1|t) = h[t, X(t)] + V(t)$, where $W(t)$ is the process noise, $f$ and $h$ are the nonlinear functions of the system to which a first-order Taylor expansion is applied, $V(t)$ is the observation noise, and $G(t)$ is the noise distribution matrix;
F: solving the first-order linear state equation to calculate the state transition matrix $\Phi(t+1)$ of item j at time t+1;
G: solving the first-order linear observation equation to calculate the observation matrix $H(t+1)$ of item j at time t+1;
H: calculating the system state error covariance matrix $P(t+1|t)$ from the extrapolated state transition matrix of item j at time t+1, $P(t+1|t) = \Phi(t+1)P(t|t)\Phi^T(t+1) + Q$, where $Q$ is the process noise variance and $\Phi^T$ is the transpose of $\Phi$;
I: solving for the Kalman gain matrix $K(t+1)$ of item j at time t+1, which controls the convergence speed, $K(t+1) = P(t+1|t)H^T(t+1)\left(H(t+1)P(t+1|t)H^T(t+1) + R\right)^{-1}$, where $R$ is the variance of the Gaussian white noise and $H^T$ is the transpose of $H$;
J: updating the system state of item j at time t+1, $X(t+1) = X(t+1|t) + K(t+1)\left(Y(t) - Y(t+1|t)\right)$, and at the same time updating the corresponding system state error covariance matrix, $P(t+1) = \left(I_n - K(t+1)H(t+1)\right)P(t+1|t)$, where $I_n$ is the identity matrix of order n;
K: accurately predicting the next action of the attack user from the historical data, that is, judging whether the attack prediction result is valid at this point;
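The predict and update cycle of steps E to J can be sketched for a scalar state as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the state is one-dimensional, the zero-mean noise terms are dropped from the predictions, the gain uses the standard inverse of the innovation variance, the innovation is taken against the newest observation, and the model functions `f`, `h` and their derivatives in the demo are hypothetical.

```python
import math


def ekf_step(x, p, y_obs, f, h, df, dh, q, r):
    """One scalar extended Kalman filter cycle.

    x, p  : current state estimate and its error variance
    y_obs : newest observation
    f, h  : nonlinear state and observation functions
    df, dh: their first derivatives (first-order Taylor linearisation)
    q, r  : process and observation noise variances
    """
    x_pred = f(x)                       # step E: predicted state
    phi = df(x)                         # step F: state transition "matrix"
    p_pred = phi * p * phi + q          # step H: predicted error variance
    h_lin = dh(x_pred)                  # step G: observation "matrix"
    y_pred = h(x_pred)                  # step E: predicted observation
    k = p_pred * h_lin / (h_lin * p_pred * h_lin + r)   # step I: Kalman gain
    x_new = x_pred + k * (y_obs - y_pred)               # step J: state update
    p_new = (1.0 - k * h_lin) * p_pred                  # step J: variance update
    return x_new, p_new


# Hypothetical mildly nonlinear model for the rating mean of one item.
def f(x):
    return x + 0.1 * math.sin(x)


def df(x):
    return 1.0 + 0.1 * math.cos(x)


def h(x):
    return x


def dh(x):
    return 1.0


x, p = 4.0, 1.0
for y in [4.1, 4.0, 4.2, 4.9]:  # made-up observed rating means
    x, p = ekf_step(x, p, y, f, h, df, dh, q=0.01, r=0.25)
```

A large gap between an observation and the predicted observation is the kind of signal that step K would evaluate; how that gap is thresholded is not shown here.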
The attack user classification specifically comprises:
L: traversing the target items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \cdots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$;
M: calculating the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the j-th item;
N: calculating the within-class scatter matrix $S_b$ and the between-class scatter matrix $S_W$ of the data matrix D;
O: letting $a$ be the dimensionality-reduction transformation vector, the projection of vector $d_i$ under $a$ is $g_i = a^T d_i$; $a$ is solved from the corresponding eigenvalue equation, in which $\lambda$ is the eigenvalue;
P: the best projection of the data matrix D onto the projection plane α is $g_i = a^T d_i$; the projections of the data matrix are partitioned using the nearest-neighbor algorithm KNN; because the rating features of the attackers among the abnormal users are similar, namely their highest and second-highest ratings, their projections cluster together, whereas the projections of abnormal users who are not attackers do not cluster, so that the attack users can be separated out.
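The projection and nearest-neighbor partition of steps N to P can be illustrated with a two-class, two-feature Fisher discriminant. The sketch below is only an assumption-laden example: it supposes that a small set of already labeled users is available, uses the closed-form two-class direction proportional to the inverse within-class scatter times the difference of class means, and invents the feature vectors (share of highest ratings, share of second-highest ratings) purely for demonstration.

```python
def mean_vec(rows):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def fisher_direction(class0, class1):
    """Projection vector a for two classes with two features each."""
    mu0, mu1 = mean_vec(class0), mean_vec(class1)
    s = [[0.0, 0.0], [0.0, 0.0]]  # within-class scatter, summed over both classes
    for rows, mu in ((class0, mu0), (class1, mu1)):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # a = inverse(s) * diff, written out for the 2x2 case
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]


def project(a, d):
    """g = a^T d"""
    return a[0] * d[0] + a[1] * d[1]


def knn_label(g, labeled, k=3):
    """Majority vote among the k labeled projections nearest to g."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - g))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


# Hypothetical labeled abnormal users: label 0 = not an attacker, 1 = attacker.
normal = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.35), (0.35, 0.25)]
attackers = [(0.9, 0.1), (0.95, 0.05), (0.85, 0.1), (0.9, 0.05)]

a = fisher_direction(normal, attackers)
labeled = [(project(a, d), 0) for d in normal] + [(project(a, d), 1) for d in attackers]

suspect = knn_label(project(a, (0.88, 0.08)), labeled)
ordinary = knn_label(project(a, (0.3, 0.3)), labeled)
```

In this toy data the attacker projections fall close together and far from the others, which is the clustering effect that step P relies on.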
3. The recommendation system security detection method according to claim 2, characterized in that, in step K of the item tracking and prediction process, the two necessary conditions for judging whether the attack prediction result is valid are:
1) the prediction result is given after short-term tracking; only when the change within the short time Δt is predicted accurately is the prediction result timely;
2) the relevant state of the item can be predicted successfully; if the prediction is correct only for non-target items, its content has no reference value in terms of accuracy; only when the system state of the item is predicted accurately is the prediction result valid, which can be expressed by the following formula:
Wherein, $AR_u$ denotes the set of items to which attack user u gave abnormal ratings, namely the highest and second-highest ratings;
$total_{u,j}$ denotes the total number of times that item j, abnormally rated by user u, is tracked;
$\Upsilon_{u,j}$ denotes the probability that the system accurately predicts item j abnormally rated by user u;
$CONT_{u,j}$ counts the number of accurate predictions, that is, the number of times the state transition equation and the observation equation are close within the short time;
$CAL_u$ denotes the probability that user u becomes an attack user because of an abnormal rating of the target item; when the item tracking time exceeds Δt and the $CAL_u$ value of an item exceeds the given prediction threshold η, the method jumps to step L, and otherwise returns to step E.
4. The recommendation system security detection method according to claim 3, characterized in that Δt ranges from 600 s to 259200 s.
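The branching rule at the end of claim 3, combined with the Δt range of claim 4 (600 s is 10 minutes and 259200 s is 72 hours), can be sketched as below. The function name `next_step`, the string return values and the example numbers are hypothetical, and the $CAL_u$ value is taken as an input because its formula is not reproduced in this text.

```python
DELTA_T_MIN_S = 600       # lower bound of the short tracking time, in seconds
DELTA_T_MAX_S = 259200    # upper bound, in seconds (72 hours)


def next_step(tracking_time_s, cal_value, delta_t_s, eta):
    """Return "L" to start attack user classification, or "E" to keep tracking.

    tracking_time_s: how long the item has been tracked so far
    cal_value      : CAL_u value computed elsewhere
    delta_t_s      : chosen short time window, must lie in the claimed range
    eta            : prediction threshold
    """
    if not DELTA_T_MIN_S <= delta_t_s <= DELTA_T_MAX_S:
        raise ValueError("delta_t_s must be between 600 and 259200 seconds")
    if tracking_time_s > delta_t_s and cal_value > eta:
        return "L"
    return "E"


# Hypothetical values: one-hour window, threshold 0.6.
decision = next_step(tracking_time_s=7200, cal_value=0.8, delta_t_s=3600, eta=0.6)
```

Both conditions must hold before classification starts; an item tracked for less than Δt stays in the tracking loop even when its $CAL_u$ value is high.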
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610120727.2A CN105809030B (en) | 2016-03-03 | 2016-03-03 | A kind of commending system safety detection method based on data tracing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610120727.2A CN105809030B (en) | 2016-03-03 | 2016-03-03 | A kind of commending system safety detection method based on data tracing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105809030A CN105809030A (en) | 2016-07-27 |
CN105809030B CN105809030B (en) | 2018-07-10 |
Family
ID=56466017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610120727.2A Active CN105809030B (en) | 2016-03-03 | 2016-03-03 | A kind of commending system safety detection method based on data tracing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105809030B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126103B (en) * | 2018-10-30 | 2023-09-26 | 百度在线网络技术(北京)有限公司 | Method and device for judging life stage state of user |
CN109391626B (en) * | 2018-11-15 | 2021-07-30 | 东信和平科技股份有限公司 | Method and related device for judging whether network attack result is unsuccessful |
CN111192429A (en) * | 2020-01-20 | 2020-05-22 | 天津合极电气科技有限公司 | Fire early warning detection method based on charge trajectory tracking technology |
CN111175046A (en) * | 2020-03-18 | 2020-05-19 | 北京工业大学 | Rolling bearing fault diagnosis method based on manifold learning and s-k-means clustering |
CN112312169B (en) * | 2020-11-20 | 2022-09-30 | 广州欢网科技有限责任公司 | Method and equipment for checking program scoring validity |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118382A (en) * | 2010-10-31 | 2011-07-06 | 华南理工大学 | System and method for detecting attack of collaborative recommender based on interest combination |
CN102184364A (en) * | 2011-05-26 | 2011-09-14 | 南京财经大学 | Semi-supervised learning-based recommendation system shilling attack detection method |
CN104809393A (en) * | 2015-05-11 | 2015-07-29 | 重庆大学 | Shilling attack detection algorithm based on popularity classification features |
Also Published As
Publication number | Publication date |
---|---|
CN105809030A (en) | 2016-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105809030B (en) | A kind of commending system safety detection method based on data tracing | |
Horvat et al. | The use of machine learning in sport outcome prediction: A review | |
Ryman-Tubb et al. | How Artificial Intelligence and machine learning research impacts payment card fraud detection: A survey and industry benchmark | |
Yang et al. | Estimating user behavior toward detecting anomalous ratings in rating systems | |
CN102184364A (en) | Semi-supervised learning-based recommendation system shilling attack detection method | |
CN104484602B (en) | A kind of intrusion detection method, device | |
CN101826105B (en) | Phishing webpage detection method based on Hungary matching algorithm | |
Ramchandran et al. | Unsupervised anomaly detection for high dimensional data—An exploratory analysis | |
Giatsoglou et al. | Nd-sync: Detecting synchronized fraud activities | |
Zhang et al. | Graph embedding-based approach for detecting group shilling attacks in collaborative recommender systems | |
Du et al. | An overview of correlation-filter-based object tracking | |
CN104851025A (en) | Case-reasoning-based personalized recommendation method for E-commerce website commodity | |
Mahmood et al. | Using machine learning techniques for rising star prediction in basketball | |
CN104751353A (en) | Cluster and Slope One prediction based collaborative filtering method | |
JP2020524346A (en) | Method, apparatus, computer device, program and storage medium for predicting short-term profits | |
Maldonado et al. | Out-of-time cross-validation strategies for classification in the presence of dataset shift | |
Roy et al. | Exploiting Deep Learning Based Classification Model for Detecting Fraudulent Schemes over Ethereum Blockchain | |
Geng et al. | Novel IAPSO-LSTM neural network for risk analysis and early warning of food safety | |
CN109857928A (en) | User preference prediction technique based on polynary credit evaluation | |
Mim et al. | A soft voting ensemble learning approach for credit card fraud detection | |
Tekin et al. | Customer lifetime value prediction for gaming industry: fuzzy clustering based approach | |
CN106682875A (en) | Data analyzing and processing technology based marketing campaign prize supplier recommendation method | |
CN114510645B (en) | Method for solving long-tail recommendation problem based on extraction of effective multi-target groups | |
Srivastava et al. | Usage of Analytics in the World of Sports | |
CN108805162A (en) | A kind of saccharomycete multiple labeling feature selection approach and device based on particle group optimizing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20160727; Assignee: NUPT INSTITUTE OF BIG DATA RESEARCH AT YANCHENG CO., LTD.; Assignor: Nanjing Post & Telecommunication Univ.; Contract record no.: X2019980001249; Denomination of invention: Data tracking based recommendation system security detection method; Granted publication date: 20180710; License type: Common License; Record date: 20191224 |