CN105809030A

CN105809030A - Data tracking based recommendation system security detection method

Info

Publication number: CN105809030A
Application number: CN201610120727.2A
Authority: CN
Inventors: 黄海平; 李峰; 朱洁; 叶宁; 王鹏; 王汝传; 沙超; 吴鹏飞
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2016-03-03
Filing date: 2016-03-03
Publication date: 2016-07-27
Anticipated expiration: 2036-03-03
Also published as: CN105809030B

Abstract

The invention proposes a data-tracking-based security detection method for recommender systems. It addresses shortcomings of conventional attack detection for collaborative filtering recommender systems, such as long detection time for user profile injection attacks, poor detection performance, and an inability to scale to big data processing. The method exploits the fact that extended Kalman filtering (EKF) applies to time-varying nonlinear dynamic systems to track and predict the rating state of an item; the users who gave abnormal ratings to that item are then clustered using linear discriminant analysis (LDA), so that the attacking users of the item and their profiles can be determined. Because EKF avoids examining large amounts of irrelevant data, detection efficiency improves and system robustness increases. Applying a tracking algorithm to recommender system security detection enables continuous online detection and lowers the false detection rate. LDA reduces the dimensionality of multi-feature user data, so that profile injection attacks by malicious users are detected effectively and the detection rate rises.

Description

A data-tracking-based security detection method for recommender systems

Technical field

The present invention relates to a data processing method, and in particular to a recommender system security detection method that is implemented with data tracking techniques in a big data setting and is applicable to the field of e-commerce.

Background art

The emergence and spread of the Internet has brought users massive amounts of data and met their demand for information services in the information age. However, as networks develop rapidly and data volumes grow sharply, users facing this mass of data can no longer pick out the information that is actually useful to them. To address this problem and improve the user experience, Internet service providers design or deploy collaborative filtering recommender systems, which analyze the data and proactively push potentially useful information to users. Collaborative filtering recommender systems are now widely used in mobile e-commerce, and the quality and reliability of their recommendations have become a growing concern for users. Because recommender systems are inherently open and user information is sensitive, malicious users can inject large numbers of fake user profiles (for example, fabricated user ratings of a particular commodity) to distort the authenticity of the recommendations, so that the recommendations serve the attackers' interests and users' trust in the system is undermined. This behavior is called a "user profile injection attack" or a "shilling attack".

In practice, user profile injection attacks fall into two types according to their purpose: the "push attack" and the "nuke attack". A push attack makes the target item be recommended noticeably more often than other items, so that the target item is strongly recommended to users. A nuke attack makes the target item be recommended noticeably less often than other items, so that the target item is suppressed in the system's recommendations.

In e-commerce, some vendors try by every means to manipulate recommender systems into recommending their own products, thereby suppressing competitors and obtaining illegitimate profit. Real-world examples are not lacking. In June 2001, a company used forged film reviews to recommend a newly released film to users. In October 2011, an online marketplace unilaterally raised its annual technical service fee, which led many small and medium sellers to jointly attack large shops on a large scale by means such as malicious negative reviews. In addition, a search engine operator has repeatedly found that some websites cheat its search algorithm by various means in order to raise their ranking in search results.

Such attacks seriously interfere with the normal operation of a recommender system. They mislead users into accepting or buying information or goods they do not really need, gradually erode users' trust in the system, and cause customers to leave, so the recommender system suffers a double loss of reputation and profit. Research on methods for detecting attacks on recommender systems is therefore particularly important.

At present, the detection methods proposed in the prior art fall into three classes: supervised, unsupervised and semi-supervised. Each class has its own strengths, and many detection algorithms have been derived from them. With the arrival of the big data era, the data volume of recommender systems grows day by day, yet most methods in these three classes do not consider the efficiency of attack detection. Their data processing is therefore inefficient and slow, which makes them hard to apply to e-commerce in the big data era. A recommender system detection method is needed that is more efficient, more secure, and able to provide a convenient and efficient service.

Summary of the invention

The present invention addresses the defects of prior-art attack detection methods for recommender systems and uses techniques such as data tracking to solve problems such as low detection efficiency and vulnerability of the system to attack. Now that big data is widely used in e-commerce, the invention aggregates large amounts of valid data to provide a more efficient and secure detection service.

The data-tracking-based recommender system security detection method proposed by the present invention comprises the following three steps:

1. Data preprocessing: examine the rated items.

2. Item tracking and prediction: exploit the fact that the extended Kalman filter (EKF) applies to time-varying nonlinear dynamic systems to track and predict the rating state of an item.

3. Attacking user classification: use linear discriminant analysis (LDA) to perform cluster analysis on the users who gave abnormal ratings to the item, thereby determining the attacking users of that item and their profiles.

Further, the data preprocessing specifically includes:

A: Traverse the ratings of all items in the current recommender system to obtain the historical rating data of all users.

B: From the historical rating data of all users, determine the statistical features $avg_t$ and $var_t$ of item j in item set i at time t.
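Steps A and B can be sketched as follows. This is only a minimal illustration: the `(user, item, score, timestamp)` record layout and the function name `item_statistics` are assumptions made here, and the population variance is used because the text does not say which variance estimator is intended.

```python
from collections import defaultdict

def item_statistics(ratings, t):
    """Return {item: (avg_t, var_t)} over all ratings with timestamp <= t.

    `ratings` is an iterable of (user, item, score, timestamp) tuples;
    this record layout is an assumption made for illustration.
    """
    scores = defaultdict(list)
    for _user, item, score, ts in ratings:
        if ts <= t:
            scores[item].append(score)
    stats = {}
    for item, values in scores.items():
        avg = sum(values) / len(values)
        # Population variance (assumed; the text does not name the estimator).
        var = sum((v - avg) ** 2 for v in values) / len(values)
        stats[item] = (avg, var)
    return stats

history = [
    ("u1", "j", 4, 10),
    ("u2", "j", 2, 20),
    ("u3", "j", 5, 40),  # arrives after t = 30, so it is ignored below
    ("u1", "k", 3, 15),
]
stats_30 = item_statistics(history, t=30)
```

Calling the function again with a later `t` gives the next pair of statistics, which is what the tracking stage consumes.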

The item tracking and prediction specifically includes:

C: From $avg_{t-1}$ and $var_{t-1}$ of item j, compute the SACA value and the SVCA value of item j at time t.

D: Initialize the EKF, obtaining the system state of item j at time t, $X(t) = \begin{pmatrix} SACA_t \\ SVCA_t \end{pmatrix}$, the observation state $Y(t)$ and the system state error covariance matrix $P(t)$.

E: Compute the system state of item j at time t+1, $X(t+1|t) = f[t, X(t)] + G(t)W(t)$, and the observation state at time t+1, $Y(t+1|t) = h[t, X(t)] + V(t)$, where $W(t)$ is the process noise, $f$ and $h$ are the nonlinear functions of the system to which a first-order Taylor expansion is applied, $V(t)$ is the observation noise, and $G(t)$ is the noise distribution matrix.

F: Solve the first-order linearized state equation and compute the state transition matrix of item j at time t+1, $\Phi(t+1) = \frac{\partial f}{\partial X}$.

G: Solve the first-order linearized observation equation and compute the observation matrix of item j at time t+1, $H(t+1) = \frac{\partial h}{\partial X}$.

H: Use the state transition matrix of item j computed for time t+1 to obtain the system state error covariance matrix $P(t+1|t) = \Phi(t+1)P(t|t)\Phi^T(t+1) + Q$, where $Q$ is the process noise variance and $\Phi^T$ is the transpose of $\Phi$.

I: Solve for the Kalman gain matrix of item j at time t+1, which controls the convergence speed: $K(t+1) = P(t+1|t)H^T(t+1)\left[H(t+1)P(t+1|t)H^T(t+1) + R\right]^{-1}$, where $R$ is the variance of the Gaussian white noise and $H^T$ is the transpose of $H$.

J: Update the system state of item j at time t+1, $X(t+1) = X(t+1|t) + K(t+1)\left(Y(t) - Y(t+1|t)\right)$, and at the same time update the corresponding system state error covariance matrix $P(t+1) = \left(I_n - K(t+1)H(t+1)\right)P(t+1|t)$, where $I_n$ is the identity matrix of order n.

K: Predict the next behavior of the attacking user from the historical data, i.e. determine whether the attack prediction result is currently valid.
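The recursion in steps E to J can be sketched as one predict/update function. This is a hedged illustration of the textbook EKF recursion for a two-component state, not the patented implementation: the text does not specify `f`, `h` or the noise settings, so the identity model in the demo is a placeholder assumption. The sketch also follows the textbook convention of evaluating `h` at the predicted state and comparing against the newest available observation.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_T(A):
    """Transpose a matrix."""
    return [list(col) for col in zip(*A)]

def mat_add(A, B, sign=1.0):
    """Return A + sign * B element-wise."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv2(M):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ekf_step(x, P, y_obs, f, h, jac_f, jac_h, Q, R):
    """One EKF predict/update cycle for a state x = [SACA, SVCA]."""
    x_pred = f(x)                      # step E: X(t+1|t), zero-mean noise omitted
    y_pred = h(x_pred)                 # step E: Y(t+1|t)
    Phi = jac_f(x)                     # step F: state transition matrix
    H = jac_h(x_pred)                  # step G: observation matrix
    P_pred = mat_add(mat_mul(mat_mul(Phi, P), mat_T(Phi)), Q)    # step H
    S = mat_add(mat_mul(mat_mul(H, P_pred), mat_T(H)), R)
    K = mat_mul(mat_mul(P_pred, mat_T(H)), inv2(S))              # step I
    innovation = [[yo - yp] for yo, yp in zip(y_obs, y_pred)]
    correction = mat_mul(K, innovation)
    x_new = [xp + c[0] for xp, c in zip(x_pred, correction)]     # step J
    identity = [[1.0, 0.0], [0.0, 1.0]]
    P_new = mat_mul(mat_add(identity, mat_mul(K, H), sign=-1.0), P_pred)
    return x_new, P_new

# Placeholder model (assumption): identity dynamics and identity observation.
identity_fn = lambda x: list(x)
identity_jac = lambda x: [[1.0, 0.0], [0.0, 1.0]]
x1, P1 = ekf_step(
    x=[1.0, 2.0], P=[[1.0, 0.0], [0.0, 1.0]], y_obs=[3.0, 2.0],
    f=identity_fn, h=identity_fn, jac_f=identity_jac, jac_h=identity_jac,
    Q=[[0.0, 0.0], [0.0, 0.0]], R=[[1.0, 0.0], [0.0, 1.0]])
```

With equal state and observation uncertainty, the gain is one half, so the updated state lands midway between the prediction and the observation.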

The attacking user classification comprises:

L: Traverse the target items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \ldots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$.

M: Compute the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the j-th item class.

N: Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_b$ of the data matrix D.

O: Let $a$ be the dimensionality reduction vector. The projection of vector $d_i$ obtained with $a$ is $g_i = a^T d_i$. The vector $a$ is obtained by solving the equation $S_b a = \lambda S_W a$, where $\lambda$ is an eigenvalue of $S_W^{-1} S_b$.

P: The optimal projection of the data matrix D onto the projection plane α is $g_i = a^T d_i$. The nearest neighbor algorithm KNN (K-Nearest Neighbor) is used to partition the projected data. The projections of attackers among the abnormal users cluster together because their rating features (the highest and second-highest ratings) are similar, whereas the projections of abnormal users who are not attackers do not cluster, so the attacking users are separated out.
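Steps M to P can be illustrated with a small two-class, two-feature sketch. It assumes that the abnormal users have already been split into two tentative groups and that each user is described by a (mean rating, rating variance) pair; both are assumptions made here. For two classes, the direction that solves the Fisher LDA eigenproblem is proportional to $S_W^{-1}(\mu_a - \mu_b)$, which is what `fisher_direction` computes.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def scatter(rows, mu):
    """Sum of outer products (d - mu)(d - mu)^T for 2-feature rows."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        diff = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += diff[i] * diff[j]
    return S

def fisher_direction(class_a, class_b):
    """Two-class Fisher direction a = S_W^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    s_a, s_b = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(a, d):
    """Projection g_i = a^T d_i."""
    return a[0] * d[0] + a[1] * d[1]

# Hypothetical (mean rating, rating variance) features per abnormal user.
suspected = [(4.8, 0.1), (4.9, 0.2), (5.0, 0.1)]
others = [(3.0, 1.0), (3.5, 1.4), (2.5, 1.2)]
a = fisher_direction(suspected, others)
proj_suspected = [project(a, d) for d in suspected]
proj_others = [project(a, d) for d in others]
```

On this toy data the two groups of projections do not overlap, which is the separation the nearest neighbor step relies on.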

Further, in step K of the item tracking and prediction process, the two necessary conditions for determining whether the attack prediction result is valid are:

(1) The prediction must be delivered after short-term tracking; only a prediction that correctly captures the change within the short time Δt is timely.

(2) The relevant state of the item must be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy; the prediction is valid only if the system state of the target item can be predicted accurately.

Condition (2) can be expressed as a calculation formula, in which:

$AR_u$ denotes the set of items to which attacking user u gave abnormal ratings (the highest and second-highest ratings), since a single attacking user may have one or more target items; $total_{u,j}$ denotes the total number of times item j, to which user u gave an abnormal rating, has been tracked; $\Upsilon_{u,j}$ denotes the probability that the system accurately predicts item j, which user u rated abnormally; $CONT_{u,j}$ counts the number of times the state transition equation and the observation equation are close within the short time, i.e. the number of accurate predictions (the difference between the state transition equation and the observation equation is less than a given small value ξ, while $X(t) \geq \rho$ and $Y(t) \geq \omega$, where ρ is the anomaly threshold of the state transition and ω is the anomaly threshold of the observation equation); $CAL_u$ denotes the probability that the user is an attacking user because of the abnormal ratings given to target items, expressed as a percentage. In more detail: the rating behavior of an attacking user is regular and can therefore be summarized and judged by the tracking and prediction method. A normal user may also give a target item a seemingly abnormal high rating out of personal preference, but because the rating behavior of normal users is not regular, it is hard to track and predict, and the $CAL_u$ value will be very low. The closer $CAL_u$ is to 100%, the more accurately the abnormally rated items of this user have been tracked and predicted, and the more likely the user is an attacking user; conversely, the lower the value, the less likely the user is an attacking user. When the item tracking time exceeds Δt and the $CAL_u$ value of an item exceeds the given prediction threshold η, jump to step L; otherwise, return to step E.
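The decision rule at the end of step K can be sketched as below. The source formula for condition (2) is not reproduced in this text, so the forms used here are assumptions: $\Upsilon_{u,j}$ is taken as $CONT_{u,j} / total_{u,j}$ and $CAL_u$ as the average of $\Upsilon_{u,j}$ over $AR_u$, expressed as a percentage. The function names are hypothetical.

```python
def cal_u(cont, total):
    """ASSUMED form of CAL_u: mean over AR_u of CONT/total, in percent.

    `cont` and `total` map each item j in AR_u to CONT_{u,j} and total_{u,j}.
    """
    ratios = [cont[j] / total[j] for j in total if total[j] > 0]
    if not ratios:
        return 0.0
    return 100.0 * sum(ratios) / len(ratios)

def should_classify(elapsed, delta_t, cal, eta):
    """True: go on to step L. False: return to step E and keep tracking."""
    return elapsed > delta_t and cal > eta

# Hypothetical counts for a user with two abnormally rated items.
cal = cal_u(cont={"a": 9, "b": 8}, total={"a": 10, "b": 10})
go_to_step_l = should_classify(elapsed=900, delta_t=600, cal=cal, eta=80.0)
```

Both conditions must hold before classification starts; a high $CAL_u$ reached before Δt has elapsed keeps the item in the tracking loop.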

The range of Δt derives from the time an attacking user needs to carry out the attack and can be set flexibly according to the needs of the application; it is preferably 600 s to 259200 s.

Beneficial effects: the data-tracking-based recommender system security detection method proposed by the present invention has the following advantages:

1. Using the extended Kalman filter (EKF) reduces the examination of large amounts of irrelevant data and thereby improves detection efficiency; because the recommender system can be examined in real time, system robustness increases.

2. The invention is the first to apply a tracking algorithm to recommender system security detection; it enables uninterrupted online detection of the system and lowers the false detection rate.

3. Using linear discriminant analysis (LDA) to reduce the dimensionality of multi-feature user data allows profile injection attacks by malicious users to be detected effectively, raising the detection rate and lowering the error rate.

In summary, the invention overcomes the low efficiency and low accuracy of traditional detection methods. Because it focuses on tracking and predicting items, it avoids a large number of invalid or inefficient operations, and thus offers high efficiency, a high detection rate and a low error rate.

The important terms used in the present invention and their constraints are as follows:

User set: the set of the m users in the system, $U = \{U_1, U_2, \ldots, U_m\}$.

Item set: the set of the n items in the system, $i = \{i_1, i_2, \ldots, i_n\}$; hereinafter j denotes the j-th item (or item class) in set i.

Short-term average change activity (SACA): SACA reflects the situation in which attacking users, by continually raising the rating of a target item, cause the average of the target item's overall ratings to rise quickly within a short time. Here $avg_t$ denotes the average rating at time t of an item in the item set, τ is the SACA average correction value, and $F_t$ denotes the set of ratings remaining in the user profiles at time t after the abnormally rated items (the highest and second-highest ratings) have been removed. SACA is computed from $avg_t$, $avg_{t-1}$, τ and $|F_t|$.

Short-term variance change activity (SVCA): SVCA reflects the situation in which attacking users, by continually raising the rating of a target item, cause the variance of the target item's overall ratings to fall quickly within a short time; this attribute reveals abnormal changes of the target item over a short period. Here $var_t$ denotes the variance at time t of an item in the item set and υ is the SVCA variance correction value. It is calculated as $SVCA_t = \left| var_t - var_{t-1} - |F_t|\,\upsilon \right|$.
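The two activity measures can be sketched as small functions. The SVCA expression follows the formula given in the text. The SACA formula is not reproduced in the text, so the form below simply mirrors SVCA with $avg$ and τ in place of $var$ and υ; that mirroring is an assumption, and the numeric inputs are made up for illustration.

```python
def svca(var_t, var_prev, n_remaining, upsilon):
    """SVCA_t = |var_t - var_{t-1} - |F_t| * upsilon|, as given in the text."""
    return abs(var_t - var_prev - n_remaining * upsilon)

def saca(avg_t, avg_prev, n_remaining, tau):
    """ASSUMED form mirroring SVCA: |avg_t - avg_{t-1} - |F_t| * tau|."""
    return abs(avg_t - avg_prev - n_remaining * tau)

# Hypothetical values: variance drops and average rises between two samples.
svca_value = svca(var_t=0.5, var_prev=1.0, n_remaining=10, upsilon=0.01)
saca_value = saca(avg_t=4.0, avg_prev=3.0, n_remaining=10, tau=0.05)
```

The pair `(saca_value, svca_value)` is the two-component state that the EKF tracks for each item.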

Extended Kalman filter (EKF): the basic idea of the extended Kalman filter is to linearize a nonlinear system and then use the system's input and output observation data to produce an optimal estimate of the system state. In essence it is an efficient recursive filter.

Linear discriminant analysis (LDA): the basic idea of linear discriminant analysis is to project high-dimensional pattern samples onto the best discriminant vector space, so as to extract classification information and compress the dimensionality of the feature space. After projection, the pattern samples are guaranteed to have the maximum between-class distance and the minimum within-class distance in the new subspace, i.e. the patterns have the best separability in that space. It is therefore an effective feature extraction method.

Data matrix D: $D = [d_1, d_2, \ldots, d_L]$ is a data matrix whose columns are built from statistical features (mean, variance, median and so on) of the users who rated the tracked item. Each $d_i$ contains the rating statistics of one user and is a vector with h rows, where h is the number of statistical features.

Within-class scatter matrix $S_W$: the matrix formed by the scatter within each class of samples; it is a symmetric positive semidefinite matrix. The values of the scatter matrix indicate how densely the sample points are packed: the larger the value, the more dispersed the points, and the smaller the value, the more concentrated. The matrix is expressed in terms of the sample points and the class mean vectors, where the superscript T denotes the transpose of a matrix or vector (here and below). Assume there are c item classes in total; $w_j$ denotes the number of samples (or the weight) of the j-th item class, so the more sample points a class contains, the larger its weight. The rating mean vector $\mu_j$ of the j-th item class is computed over $D_j$, the data statistics vectors in the data matrix that relate to the j-th item class.

Between-class scatter matrix $S_b$: the matrix formed by the scatter between the samples of the different classes; it is also a symmetric positive semidefinite matrix. The more dispersed the sample points of different classes are from one another, the better. It is expressed in terms of the class mean vectors $\mu_j$ and the overall rating mean vector $\mu$.
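For reference, the textbook Fisher LDA definitions that match the symbols above can be written as follows. This is a hedged reconstruction rather than the formulas of the original filing: the sample notation $d_i^{(j)}$ (the i-th sample of class j) and the exact weighting by $w_j$ are assumptions, with $L = \sum_j w_j$.

```latex
\begin{aligned}
\mu_j &= \frac{1}{w_j}\sum_{i=1}^{w_j} d_i^{(j)}, &
\mu &= \frac{1}{L}\sum_{j=1}^{c} w_j\,\mu_j, \\
S_W &= \sum_{j=1}^{c}\sum_{i=1}^{w_j}
      \bigl(d_i^{(j)}-\mu_j\bigr)\bigl(d_i^{(j)}-\mu_j\bigr)^{T}, &
S_b &= \sum_{j=1}^{c} w_j\,(\mu_j-\mu)(\mu_j-\mu)^{T}.
\end{aligned}
```

Under these definitions both matrices are sums of outer products, which is why they are symmetric and positive semidefinite.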

Nearest neighbor algorithm (KNN): the basic idea of the K-nearest-neighbor algorithm is that if most of the K nearest samples of a given sample in feature space belong to a certain class, the sample also belongs to that class and shares the characteristics of the samples in it. The present invention applies this method to the projection results of the linear discriminant analysis, using the Euclidean distance as an auxiliary means of judging attacking users.
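A minimal K-nearest-neighbor vote on one-dimensional projection values might look like this. The labelled projection values and the labels `"attacker"` and `"normal"` are hypothetical; in one dimension the Euclidean distance reduces to the absolute difference.

```python
from collections import Counter

def knn_classify(query, labelled, k=3):
    """Majority vote among the k labelled projections closest to `query`.

    `labelled` is a list of (projection_value, label) pairs.
    """
    neighbours = sorted(labelled, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _value, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical projected values g_i with known labels.
labelled = [
    (35.0, "attacker"), (33.7, "attacker"), (36.6, "attacker"),
    (2.0, "normal"), (-2.6, "normal"), (-6.1, "normal"),
]
label_near_cluster = knn_classify(34.2, labelled)
label_far_from_cluster = knn_classify(0.5, labelled)
```

An odd `k` avoids ties when there are only two classes.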

Dimensionality reduction vector a: the optimal vector that reduces the original multidimensional data to one dimension while making the separation between different classes most pronounced.

Random attack: the random attack is a simple attack model. The ratings of its filler items are taken from the average rating of the respective item, and the target item is given the highest or the lowest rating. Although a random attack is easy to carry out, it is not very effective.

Popular attack: the popular attack is a development of the random attack. Its construction follows Zipf's long-tail law, according to which only a small number of items attract the attention of most people. The attacker uses popular items as the selected items; the popularity of an item is usually measured by the number of times it has been rated. The popular attack model is more effective than the random attack and is not very complicated to carry out.

Attack size: the percentage of attack user profiles among all user profiles in the rating system.
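The random attack model and the attack size described above can be sketched as follows. This is an illustration under stated assumptions: the 1 to 5 rating scale, the rounding of item averages to whole ratings, and the function names are not taken from the text.

```python
import random

def random_attack_profile(item_means, target, filler_count, r_max=5, rng=None):
    """Build one push-attack profile as described in the text.

    Filler items get (rounded) per-item average ratings; the target item
    gets the highest rating `r_max`. For a nuke attack, pass the lowest
    rating as `r_max` instead.
    """
    rng = rng or random.Random()
    candidates = [item for item in item_means if item != target]
    fillers = rng.sample(candidates, filler_count)
    profile = {item: round(item_means[item]) for item in fillers}
    profile[target] = r_max
    return profile

def attack_size(n_attack_profiles, n_total_profiles):
    """Attack profiles as a percentage of all profiles in the system."""
    return 100.0 * n_attack_profiles / n_total_profiles

# Hypothetical per-item average ratings.
means = {"a": 3.6, "b": 2.2, "c": 4.1, "target": 3.0}
profile = random_attack_profile(means, "target", filler_count=2,
                                rng=random.Random(0))
size = attack_size(30, 1000)
```

Passing a seeded `random.Random` keeps the generated profile reproducible between runs.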

Brief description of the drawings

The present invention is described in further detail below with reference to the drawings and embodiments.

Fig. 1 is a flow chart of the recommender system detection method.

Fig. 2 compares the running time of the method with that of the SVM and UnRAP detection methods.

Detailed description of the invention

A specific embodiment is given below in which the detection method of the present invention is applied to a realistic e-commerce scenario. The MovieLens data set is used as the rating database of the recommender system in the experiment. This data set is provided by the GroupLens research group at the University of Minnesota in the United States and contains 100,000 rating records given by 943 users to 1682 films; it has rich attributes and real data, and is widely used in fields such as data mining. The baseline detection methods used for comparison are SVM and UnRAP, both of which are popular detection methods in current research and applications. Random attacks and popular attacks, both common in recommender systems, are launched against the system and detected with the method of the invention and with the SVM and UnRAP methods. The attack sizes of the two attack types are 3% and 15% respectively, and the comparison of the experimental results demonstrates the superiority of the method of the invention.

The specific implementation steps are described next:

1. Data preprocessing

Step 1) Traverse all users in the current MovieLens data set to obtain the historical rating data of all users, which also includes the ratings that attacking users gave to the target items.

Step 2) From the historical rating data of all users, determine the statistical features $avg_t$ and $var_t$ of item j at time t. In the present embodiment the sampling interval is 30 s, i.e. the system examines the data once every 30 s.

2. Item tracking and prediction

Step 3) From $avg_{t-1}$ and $var_{t-1}$ of item j, compute the SACA value and the SVCA value of item j at time t.

Step 4) Initialize the EKF, obtaining the system state of item j at time t, $X(t) = \begin{pmatrix} SACA_t \\ SVCA_t \end{pmatrix}$, the observation state $Y(t)$ and the system state error covariance matrix $P(t)$.

Step 5) Compute the system state of item j at time t+1, $X(t+1|t) = f[t, X(t)] + G(t)W(t)$, and the observation state at time t+1, $Y(t+1|t) = h[t, X(t)] + V(t)$.

Step 6) Solve the first-order linearized state equation and compute the state transition matrix of item j at time t+1, $\Phi(t+1) = \frac{\partial f}{\partial X}$.

Step 7) Solve the first-order linearized observation equation and compute the observation matrix of item j at time t+1, $H(t+1) = \frac{\partial h}{\partial X}$.

Step 8) Use the state transition matrix of item j computed for time t+1 to obtain the system state error covariance matrix $P(t+1|t) = \Phi(t+1)P(t|t)\Phi^T(t+1) + Q$.

Step 9) Solve for the Kalman gain matrix of item j at time t+1, which controls the convergence speed: $K(t+1) = P(t+1|t)H^T(t+1)\left[H(t+1)P(t+1|t)H^T(t+1) + R\right]^{-1}$.

Step 10) Update the system state of item j at time t+1, $X(t+1) = X(t+1|t) + K(t+1)\left(Y(t) - Y(t+1|t)\right)$, and at the same time update the corresponding system state error covariance matrix $P(t+1) = \left(I_n - K(t+1)H(t+1)\right)P(t+1|t)$.

Step 11) Predict the next behavior of the attacking user from the historical data, i.e. determine whether the attack prediction result is currently valid. The two necessary conditions for a valid prediction are as follows. (1) The prediction must be delivered after short-term tracking; only a prediction that correctly captures the change within the short time Δt is timely. The range of Δt derives from the time an attacking user needs to carry out the attack, and its setting must satisfy the basic requirement of timeliness. If the rating rose too quickly and too sharply, the abnormal item would not need tracking at all: ordinary users would notice the anomaly, the attack would be exposed, and the attacker's purpose would be defeated. The design intent here is therefore to track the attacking user while the attack is being carried out. This time is clearly not fixed; it is set to 600 s to 259200 s in the present embodiment and can in practice be set flexibly according to the needs of the application. (2) The relevant state of the item must be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy; the prediction is valid only if the system state of the target item can be predicted accurately.

Condition (2) can be expressed as a calculation formula, in which:

$AR_u$ denotes the set of items to which attacking user u gave abnormal ratings (the highest and second-highest ratings), since a single attacking user may have one or more target items; $total_{u,j}$ denotes the total number of times item j, to which user u gave an abnormal rating, has been tracked; $\Upsilon_{u,j}$ denotes the probability that the system accurately predicts item j, which user u rated abnormally; $CONT_{u,j}$ counts the number of times the state transition equation and the observation equation are close within the short time, i.e. the number of accurate predictions (the difference between the state transition equation and the observation equation is less than a given small value ξ, while $X(t) \geq \rho$ and $Y(t) \geq \omega$, where ρ is the anomaly threshold of the state transition and ω is the anomaly threshold of the observation equation); $CAL_u$ denotes the probability that the user is an attacking user because of the abnormal ratings given to target items, expressed as a percentage. In more detail: the rating behavior of an attacking user is regular and can therefore be summarized and judged by the tracking and prediction method. A normal user may also give a target item a seemingly abnormal high rating out of personal preference, but because the rating behavior of normal users is not regular, it is hard to track and predict, and the $CAL_u$ value will be very low. The closer $CAL_u$ is to 100%, the more accurately the abnormally rated items of this user have been tracked and predicted, and the more likely the user is an attacking user; conversely, the lower the value, the less likely the user is an attacking user. When the item tracking time exceeds Δt and the $CAL_u$ value of an item exceeds the given prediction threshold η, jump to step 12); otherwise, return to step 5).

3. Attacking user classification

Step 12) Traverse item j among the items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \ldots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$. Because the ratings of attacking users are high and concentrated, their means and variances are close to one another and nearly identical; this property is used to classify the users' rating data.

Step 13) Compute the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the j-th item class.

Step 14) Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_b$ of the data matrix D.

Step 15) Let $a$ be the dimensionality reduction vector. The projection of vector $d_i$ obtained with $a$ is $g_i = a^T d_i$. The vector $a$ is obtained by solving the equation $S_b a = \lambda S_W a$, where $\lambda$ is an eigenvalue of $S_W^{-1} S_b$.

Step 16) The optimal projection of the data matrix D onto the projection plane α is $g_i = a^T d_i$. The nearest neighbor algorithm KNN (K-Nearest Neighbor) is used to partition the projected data. The projections of attackers among the abnormal users cluster together because their rating features (the highest and second-highest ratings) are similar, whereas the projections of abnormal users who are not attackers do not cluster, so the attacking users are separated out.

4. Result analysis and verification

Fig. 2 compares the time taken by the method of the invention with that of the classical attack detection methods SVM and UnRAP when detecting attacking users. The average processing time per user is used as the running-time metric (each user rates 106 times on average). In the same running environment and at the same detection accuracy, the per-user running time of the method of the invention is lower than that of the other methods. This is because the computation used by the method is simple and fast: abnormal data are found by broad comparison, these abnormal data are then tracked, and the judgment is made from the result of tracking and prediction. In addition, the linear discriminant analysis used by the method further reduces the amount of data that has to be processed. The other methods, by contrast, must process abnormal data continually when detecting malicious users and do not offer the real-time capability of the method of the invention.

The above embodiment shows that the method of the invention can effectively detect malicious users in the system, strengthens the robustness of the system, and substantially improves operating efficiency; the method therefore has significant practical value. The foregoing is only one specific embodiment of the present invention and is not intended to limit it; the data set and attack models used are specific to this embodiment. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A data-tracking-based recommender system security detection method, characterized in that it comprises the following steps:

1) data preprocessing: examining the rated items;

2) item tracking and prediction: exploiting the fact that the extended Kalman filter (EKF) applies to time-varying nonlinear dynamic systems to track and predict the rating state of an item;

3) attacking user classification: using linear discriminant analysis to perform cluster analysis on the users who gave abnormal ratings to the item, thereby determining the attacking users of that item and their profiles.

2. The data-tracking-based recommender system security detection method according to claim 1, characterized in that the data preprocessing specifically includes:

A: traversing the ratings of all items in the current recommender system to obtain the historical rating data of all users;

B: from the historical rating data of all users, determining the statistical features $avg_t$ and $var_t$ of item j in item set i at time t;

Described PROJECT TRACKING and prediction specifically include:

C: the avg according to project j_t-1And var_t-1Calculate acquisition project j in the SACA value of t and SVCA value；

D: EKF initializes, obtaining current project j in the system mode of t is

X (t) = (\begin{matrix} S A C A_{t} \\ {SVCA}_{t} \end{matrix}),

Observer state Y (t) and system mode error co-variance matrix P (t)；

E: computational item j at system mode X (t+1 | the t)=f [t in t+1 moment, X (t)]+G (t) W (t), the observer state Y in t+1 moment (t+1 | t)=h [t, X (t)]+V (t), wherein, W (t) is process noise, f (t) and h (t) is the nonlinear function that system does that first order Taylor launches to obtain, V (t) is observation noise, and G (t) is noise profile matrix；

Φ (t + 1) = \frac{\partial f}{\partial X};

H: using the state transition matrix of item j extrapolated at time t+1, compute the system state error covariance matrix P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q, where Q is the process noise variance and Φ^T is the transpose of Φ;

I: solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) (H(t+1) P(t+1|t) H^T(t+1) + R)^{-1}, where R is the variance of the Gaussian white noise and H^T is the transpose of H;

J: update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1)(Y(t) - Y(t+1|t)), and at the same time update the corresponding system state error covariance matrix, P(t+1) = (I_n - K(t+1) H(t+1)) P(t+1|t), where I_n is the identity matrix of order n;

K: accurately predict the next action of the attack user from the historical data, that is, determine whether the current attack prediction result is valid;
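One predict/update cycle of steps E, H, I and J can be sketched as below for the two-component state (SACA, SVCA). This is an illustrative sketch, not the claimed implementation. The following are assumptions: the helper names, the identity models used for f and h in the example, treating the noise terms as zero-mean so that they drop out of the prediction, taking H(t+1) as the Jacobian of h (the surviving text does not retain its definition), evaluating h at the predicted state, and using the newly observed measurement in the innovation, as in the standard EKF.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def ekf_step(x, P, y_obs, f, h, jac_f, jac_h, Q, R):
    """One EKF cycle. x and y_obs are 2x1 columns; P, Q, R are 2x2 matrices."""
    x_pred = f(x)                                            # step E: X(t+1|t)
    y_pred = h(x_pred)                                       # step E: Y(t+1|t)
    Phi = jac_f(x)                                           # Phi(t+1) = df/dX
    P_pred = add(matmul(matmul(Phi, P), transpose(Phi)), Q)  # step H
    H = jac_h(x_pred)                                        # assumed: dh/dX
    S = add(matmul(matmul(H, P_pred), transpose(H)), R)
    K = matmul(matmul(P_pred, transpose(H)), inv2(S))        # step I
    x_new = add(x_pred, matmul(K, sub(y_obs, y_pred)))       # step J: state
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    P_new = matmul(sub(I2, matmul(K, H)), P_pred)            # step J: covariance
    return x_new, P_new, K


# Toy example with identity models (an assumption; the claims leave f and h open).
I2 = [[1.0, 0.0], [0.0, 1.0]]
identity = lambda x: x
jacobian = lambda x: I2

x0 = [[3.0], [1.0]]          # (SACA_t, SVCA_t)
y1 = [[5.0], [2.0]]          # observed state
x1, P1, K1 = ekf_step(x0, I2, y1, identity, identity, jacobian, jacobian,
                      Q=I2, R=[[2.0, 0.0], [0.0, 2.0]])
```

With these toy values the gain weights prediction and observation equally, so the updated state lies midway between the predicted state and the observation.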

The attack user classification specifically comprises:

M: compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item;

N: compute the within-class scatter matrix S_b and the between-class scatter matrix S_W of the data matrix D;

O: a is the dimensionality-reduction transformation vector; the projection of vector d_i obtained with the transformation vector a is g_i = a^T d_i; a is solved from the equation [equation not reproduced in this text], where λ is an eigenvalue of [expression not reproduced in this text];

P: the best projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The nearest-neighbor algorithm KNN is used to partition the projections of the data matrix: the attackers among the abnormal users cluster together because of the similarity of their rating features, namely the highest and second-highest ratings, whereas the projections of abnormal users who are not attackers do not cluster, so that the attack users can be separated out.
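The projection g_i = a^T d_i and the clustering criterion of step P can be illustrated with the sketch below. It is not the claimed procedure: the transformation vector `a` is supplied by hand (the equation that determines it is not reproduced in this text), and a simple nearest-neighbor distance test with hypothetical parameters `k` and `radius` stands in for the KNN partition. All function names are likewise hypothetical.

```python
def project(a, d):
    """g_i = a^T d_i for one user vector d_i."""
    return sum(ai * di for ai, di in zip(a, d))


def knn_mean_distance(values, i, k):
    """Mean distance from values[i] to its k nearest other projections."""
    dists = sorted(abs(values[i] - v) for j, v in enumerate(values) if j != i)
    return sum(dists[:k]) / k


def clustered_users(D, a, k=2, radius=0.5):
    """Indices of users whose projections lie close to their k nearest neighbors."""
    g = [project(a, d) for d in D]
    return [i for i in range(len(g)) if knn_mean_distance(g, i, k) <= radius]


# Toy data: rows are rating-feature vectors of abnormal users.
D = [
    [5.0, 5.0],  # users 0-2 give the highest or second-highest ratings
    [5.0, 4.0],
    [4.0, 5.0],
    [1.0, 1.0],  # users 3-4 are abnormal but dissimilar to each other
    [4.0, 2.0],
]
a = [0.5, 0.5]   # assumed projection vector, for illustration only
suspects = clustered_users(D, a)
```

In this toy data the first three users project to nearby values and are flagged, while the remaining two stay isolated.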

3. The recommendation system security detection method based on data tracking according to claim 2, characterized in that, in step K of the item tracking and prediction process, the two necessary conditions for determining whether the attack prediction result is valid are:

1) the prediction result is given after short-term tracking; only if the exact change within the short time Δt is successfully predicted is the prediction result timely;

2) the relevant state of the item can be successfully predicted; if the prediction is correct only for non-target items, its content has no reference value; only when the system state of the item is predicted accurately is the prediction result valid, which can be expressed by the following formula:

where AR_u denotes the set of items to which attack user u gave abnormal ratings, namely the highest and second-highest ratings;

total_u,j denotes the total number of times that item j, to which user u gave an abnormal rating, has been tracked;

Υ_u,j denotes the probability that the system accurately predicts item j, for which the rating of user u is abnormal;

CONT_u,j counts the number of times the state transition equation and the observation equation yield close values within the short time, that is, the number of accurate predictions;

CAL_u denotes the probability that user u may be an attack user because of abnormal ratings given to target items; when the item tracking time exceeds Δt and, at the same time, the CAL_u value for some item exceeds the given prediction threshold η, jump to step L; otherwise return to step E.

4. The recommendation system security detection method based on data tracking according to claim 3, characterized in that Δt ranges from 600 s to 259200 s.
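The branching rule of claim 3, together with the Δt range of claim 4, can be sketched as follows. This is a minimal illustration. The function name `next_step`, its parameters, and the example values of Δt and η are hypothetical. The CAL_u value is taken as a given input because the formula that produces it is not reproduced in this text.

```python
DT_MIN_S, DT_MAX_S = 600, 259200  # range of the short time Δt, in seconds


def next_step(tracked_s, cal_u, delta_t_s, eta):
    """Return "L" to proceed to step L, or "E" to return to step E.

    tracked_s : how long the item has been tracked, in seconds
    cal_u     : CAL_u value, supplied by the caller
    delta_t_s : chosen Δt, which must lie within the claimed range
    eta       : prediction threshold η (value chosen by the operator)
    """
    if not DT_MIN_S <= delta_t_s <= DT_MAX_S:
        raise ValueError("delta_t_s is outside the range 600 s to 259200 s")
    # Both conditions must hold: tracking longer than Δt AND CAL_u above η.
    return "L" if tracked_s > delta_t_s and cal_u > eta else "E"


# Example with an assumed Δt of one hour and an assumed threshold of 0.6.
decision = next_step(tracked_s=7200, cal_u=0.8, delta_t_s=3600, eta=0.6)
```

Tracking continues from step E whenever either condition fails, for example when the item has not yet been tracked for longer than Δt.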