CN105809030A - Data tracking based recommendation system security detection method - Google Patents


Info

Publication number
CN105809030A
Authority
CN
China
Prior art keywords
project
user
prediction
scoring
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610120727.2A
Other languages
Chinese (zh)
Other versions
CN105809030B (en)
Inventor
黄海平
李峰
朱洁
叶宁
王鹏
王汝传
沙超
吴鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Application filed by Nanjing Post and Telecommunication University
Priority to CN201610120727.2A
Publication of CN105809030A
Application granted
Publication of CN105809030B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention proposes a recommendation system security detection method based on data tracking, to overcome shortcomings of conventional detection in collaborative filtering recommendation systems, such as long detection times for user profile injection attacks, poor effectiveness against attacks, and inability to adapt to big data processing. The method uses the extended Kalman filter (EKF), which is applicable to time-varying nonlinear dynamic systems, to track and predict the rating state of an item, and then uses linear discriminant analysis (LDA) to perform cluster analysis on the users with abnormal ratings for that item, so that the attack users in the item and their profiles can be determined. The EKF method avoids examining large amounts of irrelevant data, which improves detection efficiency and enhances system robustness. Applying a tracking algorithm to recommendation system security detection enables continuous online detection and reduces the false detection rate. The LDA method reduces the dimensionality of multi-feature users, so that profile injection attacks by malicious users are detected effectively and the detection rate is increased.

Description

Recommendation system security detection method based on data tracking
Technical field
The present invention relates to a data processing method, and in particular to a recommendation system security detection method, applicable to the e-commerce field, that is implemented by means of data tracking in a big data context.
Background technology
The emergence and spread of the Internet have brought users massive amounts of data, meeting their demand for information services in the information age. However, as networks have developed rapidly, data volumes have grown so large that users facing them cannot extract the information that is actually useful to them. To address this problem and improve user experience, Internet service providers have designed or adopted collaborative filtering recommendation systems, which analyze user data in order to push potentially useful information to users proactively. Collaborative filtering recommendation systems are now widely used in mobile e-commerce, and the quality and reliability of their recommendations have become issues of growing concern to users. Because of the inherent openness of recommendation systems and the sensitivity of user information, malicious users can inject large numbers of fake user profiles (for example, fake users who rate a particular product) into a recommendation system in order to distort the authenticity of its recommendations, so that the recommendations serve their interests; this in turn undermines users' trust in the system. This behavior is called a "user profile injection attack" (User Profile Injection Attack) or "shilling attack" (Shilling Attack).
In practice, depending on the attack purpose, user profile injection attacks can be divided into two types: push attacks (Push Attack) and nuke attacks (Nuke Attack). A push attack makes the target item recommended significantly more often than other items, so that the target item is strongly favored in recommendations to users. A nuke attack makes the target item recommended significantly less often than other items, thereby preventing the target item from being recommended by the system.
In e-commerce transactions, some manufacturers manipulate recommendation systems by every available means so that their own products are recommended to users, suppressing competitors and earning illegitimate profits. Real-world examples abound. In June 2001, a company used fake film reviews to promote a newly released film. In October 2011, an online mall unilaterally raised its annual technical service fee, which led many small and medium-sized sellers to launch large-scale joint attacks on large shops by means such as malicious negative reviews. In addition, a certain search engine has frequently found that websites cheat its search algorithm by various means to raise their ranking in search results.
These attacks seriously disrupt the normal operation of recommendation systems: they mislead users into accepting or buying information or items they do not really need and gradually erode users' trust in the system. The result is a loss of customers, and the recommendation system suffers a double loss of reputation and profit. Research on techniques for detecting attacks on recommendation systems is therefore particularly important.
At present, the detection methods proposed in the prior art fall into three classes: supervised, unsupervised, and semi-supervised. Each class has its own strengths, and many detection methods have been derived from them. With the arrival of the big data era, the volume of data in recommendation systems grows daily, but most methods in these three classes do not consider the efficiency of attack detection, resulting in low data processing efficiency and long processing times, which makes them hard to apply to e-commerce in the big data era. A recommendation system detection method is therefore needed that has higher detection efficiency, is more secure, and is convenient to implement and efficient to operate.
Summary of the invention
Aiming at the defects of existing recommendation system attack detection methods, the present invention uses technical means such as data tracking to solve problems such as low detection efficiency and system vulnerability. With big data widely used in e-commerce, the present invention exploits the aggregation of large volumes of effective data to provide more efficient and secure detection services.
The recommendation system security detection method based on data tracking proposed by the present invention comprises the following three steps:
1. Data preprocessing: detect the rated items.
2. Item tracking and prediction: use the extended Kalman filter (EKF, Extended Kalman Filter), which is applicable to time-varying nonlinear dynamic systems, to track and predict the rating state of items.
3. Attack user classification: use linear discriminant analysis (LDA, Linear Discriminant Analysis) to perform cluster analysis on the users with abnormal ratings for an item, thereby identifying the attack users in that item and their profiles.
Further, the data preprocessing specifically comprises:
A: Traverse the ratings of all items in the current recommendation system to obtain the historical rating data of all users.
B: From the historical rating data of all users, compute the statistical features $avg_t$ and $var_t$ of item $j$ in item set $i$ at time $t$.
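Step B names the statistics but does not say how they are computed. A minimal sketch, assuming ratings arrive as hypothetical `(user, item, rating, timestamp)` tuples and that $var_t$ is the population variance of all ratings an item has received up to time $t$ (the function name `item_statistics` is illustrative):

```python
from collections import defaultdict
from statistics import mean, pvariance

def item_statistics(ratings, t):
    """Return {item: (avg_t, var_t)} over all ratings with timestamp <= t.

    ratings: iterable of hypothetical (user, item, rating, timestamp) tuples.
    """
    by_item = defaultdict(list)
    for _user, item, rating, ts in ratings:
        if ts <= t:  # only history available at time t
            by_item[item].append(rating)
    # population variance is an assumption; the text only says "variance"
    return {item: (mean(rs), pvariance(rs)) for item, rs in by_item.items()}
```

Calling it once per detection interval (every 30 s in the embodiment) yields the $(avg_t, var_t)$ series that step C compares with $(avg_{t-1}, var_{t-1})$.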
The item tracking and prediction specifically comprises:
C: From $avg_{t-1}$ and $var_{t-1}$ of item $j$, compute the SACA and SVCA values of item $j$ at time $t$.
D: Initialize the EKF: obtain the system state of the current item $j$ at time $t$, $X(t) = [SACA_t, SVCA_t]^T$, the observer state $Y(t)$, and the system state error covariance matrix $P(t)$.
E: Compute the system state of item $j$ at time $t+1$, $X(t+1|t) = f[t, X(t)] + G(t)W(t)$, and the observer state at time $t+1$, $Y(t+1|t) = h[t, X(t)] + V(t)$, where $W(t)$ is the process noise, $f$ and $h$ are the nonlinear state and observation functions of the system, which are linearized by first-order Taylor expansion, $V(t)$ is the observation noise, and $G(t)$ is the noise distribution matrix.
F: Solve the first-order linear state equation to compute the state transition matrix of item $j$ at time $t+1$, $\Phi(t+1) = \partial f / \partial X$.
G: Solve the first-order linear observation equation to compute the observation matrix $H(t+1)$ of item $j$ at time $t+1$, i.e., the Jacobian $\partial h / \partial X$.
H: Using the state transition matrix extrapolated for item $j$ at time $t+1$, compute the system state error covariance matrix $P(t+1|t) = \Phi(t+1) P(t|t) \Phi^T(t+1) + Q$, where $Q$ is the process noise variance and $\Phi^T$ is the transpose of $\Phi$.
I: Compute the Kalman gain matrix $K(t+1)$ of item $j$ at time $t+1$, which controls the convergence speed: $K(t+1) = P(t+1|t) H^T(t+1) \left[ H(t+1) P(t+1|t) H^T(t+1) + R \right]^{-1}$, where $R$ is the variance of the white Gaussian noise and $H^T$ is the transpose of $H$.
J: Update the system state of item $j$ at time $t+1$, $X(t+1) = X(t+1|t) + K(t+1)\left[ Y(t) - Y(t+1|t) \right]$, and at the same time update the corresponding system state error covariance matrix $P(t+1) = \left[ I_n - K(t+1) H(t+1) \right] P(t+1|t)$, where $I_n$ is the identity matrix of order $n$.
K: Accurately predict the attack users' next behavior from historical data, i.e., judge whether the current attack prediction result is valid.
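Steps E to J form one predict-update cycle of an extended Kalman filter. The NumPy sketch below is illustrative only: the text does not specify $f$, $h$, or their Jacobians, so they are passed in as hypothetical callables; the noise term $G(t)W(t)$ is folded into the covariance $Q$; and, as in the textbook EKF, the predicted observation and $H$ are evaluated at the predicted state.

```python
import numpy as np

def ekf_step(x, P, y, f, h, f_jac, h_jac, Q, R):
    """One EKF cycle for the item state x = [SACA_t, SVCA_t]^T.

    f, h         : hypothetical nonlinear state and observation functions
    f_jac, h_jac : their Jacobians (Phi = df/dX, H = dh/dX)
    y            : the newly observed measurement vector
    """
    # Step E: state prediction X(t+1|t)
    x_pred = f(x)
    # Steps F and H: linearize f and propagate the error covariance
    Phi = f_jac(x)
    P_pred = Phi @ P @ Phi.T + Q
    # Step G: linearize h around the predicted state
    H = h_jac(x_pred)
    y_pred = h(x_pred)
    # Step I: Kalman gain K = P H^T (H P H^T + R)^-1
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Step J: correct the state and covariance with the innovation
    x_new = x_pred + K @ (y - y_pred)
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

With an identity model (`f`, `h`, and both Jacobians equal to the identity), unit prior covariance, `Q = 0`, and `R = I`, the gain is `0.5 I`, so the update moves the state halfway toward the observation.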
The attack user classification step comprises:
L: Traverse the target items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \ldots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$;
M: Compute the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the $j$-th item (class).
N: Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_b$ of the data matrix $D$.
O: Let $a$ be the dimensionality reduction transformation vector; the projection of vector $d_i$ obtained with $a$ is $g_i = a^T d_i$. Solve $S_b a = \lambda S_W a$ for $a$, where $\lambda$ is an eigenvalue of $S_W^{-1} S_b$.
P: The optimal projection of the data matrix $D$ onto the projection plane $\alpha$ is $g_i = a^T d_i$. The projections are partitioned with the K-nearest-neighbor algorithm (KNN, K-Nearest Neighbor): the attackers among the abnormal users cluster together because their rating features (the highest and second-highest ratings) are similar, whereas the projections of abnormal users who are not attackers do not cluster, so the attack users can be separated out.
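Steps M to O amount to Fisher's linear discriminant. A minimal NumPy sketch, assuming the columns of `D` are the users' feature vectors $d_i$ and `labels` gives each column's class; $S_b$ weights each class mean by its sample count $w_j$, and a pseudo-inverse stands in for $S_W^{-1}$ in case $S_W$ is singular (both are assumptions, not details from the text):

```python
import numpy as np

def lda_project(D, labels):
    """Return the LDA direction a and the 1-D projections g_i = a^T d_i."""
    D = np.asarray(D, dtype=float)          # h x L, one column per user
    labels = np.asarray(labels)
    h = D.shape[0]
    mu = D.mean(axis=1, keepdims=True)      # overall mean vector mu
    S_W = np.zeros((h, h))
    S_b = np.zeros((h, h))
    for c in np.unique(labels):
        D_c = D[:, labels == c]
        mu_c = D_c.mean(axis=1, keepdims=True)                # class mean mu_j
        centered = D_c - mu_c
        S_W += centered @ centered.T                          # within-class scatter
        S_b += D_c.shape[1] * (mu_c - mu) @ (mu_c - mu).T     # between-class scatter
    # Solve S_b a = lambda S_W a via the eigenvectors of pinv(S_W) S_b
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_b)
    a = eigvecs[:, np.argmax(eigvals.real)].real
    a = a / np.linalg.norm(a)
    return a, a @ D
```

The returned projections are one number per user, which is the input that the KNN partition in step P works on.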
Further, in step K of the item tracking and prediction process, the two necessary conditions for judging whether an attack prediction result is valid are:
(1) Timeliness: a prediction result is given after short-term tracking, and it is timely only if it successfully and accurately predicts the change within the short time $\Delta t$;
(2) Relevance: the prediction must succeed for the relevant items. If it is correct only for non-target items, its content has no reference value in terms of accuracy; the prediction result is valid only if it accurately predicts the system state of the target items.
Condition (2) can be expressed by the following formula:
where $AR_u$ is the set of items to which attack user $u$ gave abnormal ratings (the highest and second-highest ratings); it is a set because a single attack user may have one or more target items. $total_{u,j}$ is the total number of times that item $j$, to which user $u$ gave an abnormal rating, was tracked. $\Upsilon_{u,j}$ is the probability that the system accurately predicts item $j$ rated abnormally by user $u$. $CONT_{u,j}$ counts the number of times within the short period that the state transition equation and the observation equation are close, i.e., their difference is smaller than a given small value $\xi$ while $X(t) \ge \rho$ and $Y(t) \ge \omega$, where $\rho$ is the state transition anomaly threshold and $\omega$ is the observation equation anomaly threshold; in other words, it counts the accurate predictions. $CAL_u$, expressed as a percentage, is the probability that user $u$ is an attack user because of abnormal ratings on target items. (In more detail: because the rating behavior of attack users is regular, it can be summarized and judged with the tracking and prediction method. A normal user may also give a target item a seemingly abnormal high rating out of personal preference, but because normal users' rating behavior is not regular, it is hard to track and predict, and their $CAL_u$ value will be very low. Therefore, the closer $CAL_u$ is to 100%, the more accurately this user's abnormally rated items are tracked and predicted, and the more likely the user is an attack user; conversely, the lower the value, the less likely the user is an attack user.) When the item tracking time exceeds $\Delta t$ and the $CAL_u$ value for some item exceeds the given prediction threshold $\eta$, jump to step L; otherwise, return to step E.
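The formula for condition (2) is not reproduced in this text. The sketch below assumes, from the definitions above, that $\Upsilon_{u,j} = CONT_{u,j} / total_{u,j}$ and that $CAL_u$ is the mean of $\Upsilon_{u,j}$ over $j \in AR_u$, expressed as a percentage; it also treats the state and observation at each tracking step as scalars. All three choices are assumptions, and the function names are hypothetical.

```python
def count_accurate(xs, ys, xi, rho, omega):
    """CONT_{u,j}: steps where |X - Y| < xi with X >= rho and Y >= omega."""
    return sum(1 for x, y in zip(xs, ys)
               if abs(x - y) < xi and x >= rho and y >= omega)

def cal_u(tracks, xi, rho, omega):
    """CAL_u in percent; tracks maps each item j in AR_u to (xs, ys) series.

    Assumes Upsilon_{u,j} = CONT_{u,j} / total_{u,j} and CAL_u = mean over AR_u.
    """
    upsilons = []
    for xs, ys in tracks.values():
        total = len(xs)                                   # total_{u,j}
        cont = count_accurate(xs, ys, xi, rho, omega)     # CONT_{u,j}
        upsilons.append(cont / total if total else 0.0)   # Upsilon_{u,j}
    return 100.0 * sum(upsilons) / len(upsilons) if upsilons else 0.0
```

For example, an item predicted accurately in 3 of 4 tracked steps and another in 1 of 2 give $CAL_u = 62.5\%$ under these assumptions.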
The range of $\Delta t$ derives from the time attack users take to carry out an attack; it can be set flexibly according to the needs of the application and is preferably 600 s to 259200 s (10 minutes to 3 days).
Beneficial effects: the recommendation system security detection method based on data tracking proposed by the present invention has the following advantages:
1. Using the extended Kalman filter (EKF, Extended Kalman Filter) avoids examining large amounts of irrelevant data, which improves detection efficiency; and because the recommendation system can be monitored in real time, system robustness is increased.
2. The present invention is the first to apply a tracking algorithm to recommendation system security detection; it enables continuous online detection and reduces the false detection rate.
3. Linear discriminant analysis (LDA, Linear Discriminant Analysis) further reduces the dimensionality of multi-feature users, so that profile injection attacks by malicious users are detected effectively, increasing the detection rate and reducing the error rate.
In summary, the present invention overcomes the low efficiency and low accuracy of traditional detection methods. Because it focuses on tracking and predicting items, it avoids a large amount of invalid or inefficient computation, giving it high efficiency, a high detection rate, and a low error rate.
The important terms used in the present invention and their constraints are as follows:
User set: the set of the $m$ users in the system, $U = \{U_1, U_2, \ldots, U_m\}$;
Item set: the set of the $n$ items in the system, $i = \{i_1, i_2, \ldots, i_n\}$; below, $j$ denotes the $j$-th item (class) in set $i$;
Short-term average change activity (SACA, short-term average change activity): reflects the situation in which attack users, by continually raising their ratings of a target item, cause the overall average rating of the target item to rise rapidly within a short time. Here $avg_t$ is the average rating of an item in the item set at time $t$, $\tau$ is the SACA average-rating correction value, and $F_t$ is the set of ratings remaining in the user profiles at time $t$ after removing the abnormal ratings (the highest and second-highest ratings). It is calculated as follows:
Short-term variance change activity (SVCA, short-term variance change activity): reflects the situation in which attack users, by continually raising their ratings of a target item, cause the variance of the target item's overall ratings to drop rapidly within a short time; this attribute reveals abnormal short-term changes in the target item. $var_t$ is the variance of an item in the item set at time $t$, and $\upsilon$ is the SVCA variance correction value. It is calculated as: $SVCA_t = \left| var_t - var_{t-1} - |F_t|\,\upsilon \right|$
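The SVCA formula can be checked on a toy item. A sketch assuming a 1-5 rating scale, so that the abnormal (highest and second-highest) ratings are 5 and 4, and reading $|F_t|$ as the number of the item's current ratings that remain after those are removed; the function name `svca` and the population variance are illustrative choices:

```python
from statistics import pvariance

ABNORMAL_RATINGS = {5, 4}  # assumed: highest and second-highest on a 1-5 scale

def svca(prev_ratings, curr_ratings, upsilon):
    """SVCA_t = | var_t - var_{t-1} - |F_t| * upsilon |."""
    var_prev = pvariance(prev_ratings)   # var_{t-1}
    var_t = pvariance(curr_ratings)      # var_t
    remaining = [r for r in curr_ratings if r not in ABNORMAL_RATINGS]  # F_t
    return abs(var_t - var_prev - len(remaining) * upsilon)
```

For ratings `[1, 3, 5, 3]` at $t-1$ (variance 2) and `[5, 5, 5, 4]` at $t$ (variance 0.1875), $F_t$ is empty and `svca` returns 1.8125, the sharp variance drop that a push attack produces.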
Extended Kalman filter (EKF): the basic idea of the extended Kalman filter (Extended Kalman Filter) is to linearize the nonlinear system and then optimally estimate the system state from the system's input and output observations; in essence, it is an efficient recursive filter.
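Linearization means replacing $f$ (and likewise $h$) by its Jacobian, as in $\Phi(t+1) = \partial f / \partial X$. Because the text does not give $f$ in closed form, one option, assumed here rather than taken from the source, is a central-difference approximation:

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian d func / d x evaluated at x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x), dtype=float)
    jac = np.zeros((fx.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        # column k: sensitivity of every output to the k-th state component
        jac[:, k] = (np.asarray(func(x + step)) - np.asarray(func(x - step))) / (2 * eps)
    return jac
```

If $f$ is known analytically, its exact Jacobian is preferable; the numerical version is a fallback that costs two evaluations of $f$ per state component.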
Linear discriminant analysis (LDA): the basic idea of linear discriminant analysis (Linear Discriminant Analysis) is to project high-dimensional pattern samples onto the optimal discriminant vector space, thereby extracting classification information and compressing the dimensionality of the feature space. After projection, the pattern samples are guaranteed to have the largest between-class distance and the smallest within-class distance in the new subspace, i.e., the patterns have the best separability in that space. It is therefore an effective feature extraction method.
Data matrix $D$: $D = [d_1, d_2, \ldots, d_L]$ is the data matrix whose columns are statistical features, such as the mean, variance, and median, of the users who rated the tracked item; each $d_i$ contains the rating statistical features of one user and is a column with $h$ rows ($h$ is the number of statistical features).
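A sketch of building $D$ with $h = 3$ features (mean, variance, median) per user, assuming each abnormal user's ratings are already collected in a hypothetical `{user: [ratings]}` dictionary. $D$ is returned as a list of $h$ rows, so column $i$ is $d_i$:

```python
from statistics import mean, median, pvariance

def build_data_matrix(user_ratings):
    """Return (users, D) where D is h x L with h = 3 features per user."""
    users = sorted(user_ratings)  # fixed column order d_1 .. d_L
    columns = [[mean(r), pvariance(r), median(r)]
               for r in (user_ratings[u] for u in users)]
    # transpose: each column of D is one user's feature vector d_i
    D = [list(row) for row in zip(*columns)]
    return users, D
```

Attack users who give uniformly high ratings end up with near-identical columns (high mean and median, low variance), which is what the later LDA and KNN steps exploit.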
Within-class scatter matrix $S_W$: the matrix formed by the dispersion of the samples within each class; it is a symmetric positive semidefinite matrix. Its values indicate how densely the sample points are packed: the larger the value, the more dispersed the points, and the smaller, the more concentrated. The matrix is expressed as follows (superscript $T$ denotes the transpose of a matrix or vector, with the same meaning below). Assume there are $c$ item classes, with the sample points of class $j$ indexed by $i$; $w_j$ denotes the number of samples (or weight) of class $j$, so the more sample points a class has, the larger its weight. In addition, $\mu_j$ denotes the rating mean vector of class $j$, and $D_j$ denotes the data statistics vectors in the data matrix related to class $j$.
Between-class scatter matrix $S_b$: the matrix formed by the dispersion between samples of different classes; it is also symmetric positive semidefinite. The more widely the sample points of different classes are separated, the better. Here $\mu$ is the overall rating mean vector.
K-nearest neighbor (KNN): the basic idea of the K-nearest-neighbor algorithm (K-Nearest Neighbor) is that if most of the K nearest samples to a sample in feature space belong to a certain class, then the sample also belongs to that class and shares the characteristics of the samples in it. The present invention applies this method to the projection results of linear discriminant analysis, using Euclidean distance as a supplementary means of identifying attack users.
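A minimal majority-vote KNN on the one-dimensional LDA projections $g_i$, where Euclidean distance reduces to an absolute difference. The labeled reference projections are an assumption made for illustration; the text does not say where labeled samples come from.

```python
from collections import Counter

def knn_classify(query, ref_points, ref_labels, k=3):
    """Label a projection g by majority vote of its k nearest reference projections."""
    # 1-D Euclidean distance |g - g_ref|
    order = sorted(range(len(ref_points)), key=lambda i: abs(ref_points[i] - query))
    votes = Counter(ref_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Because attack users' projections cluster tightly, a query landing in that cluster is outvoted by attack neighbors, while scattered normal-but-abnormal raters are not.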
Dimensionality reduction transformation vector $a$: the optimal vector that reduces the original multidimensional data to one dimension such that the different classes are most clearly separated.
Random attack: a simple attack model in which the filler item ratings are taken from the average rating of the corresponding items, while the target item receives the highest or lowest rating. Although random attacks are easy to implement, they are not very effective.
Popular attack: a development of the random attack whose construction follows Zipf's long-tail law, i.e., only a small number of items attract the attention of most people. The attacker uses popular items as the selected items, and the popularity of an item is usually measured by the number of times it has been rated. The popular attack model is more effective than the random attack and is not complicated to implement.
Attack size (Attack Size): the percentage of all user profiles in the system that are attack user profiles.
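Experiments like those in the embodiment need attack profiles built from these two models. A hedged sketch for push attacks only, assuming a 1-5 scale and filler ratings rounded from item means as the random-attack definition states; the popular-attack ratings for selected items are not specified, so the same rule is reused. All names are hypothetical, and attack size is computed against a caller-supplied total of profiles.

```python
import random

def make_push_profiles(item_means, item_counts, target, n_profiles,
                       n_filler, model="random", r_max=5, seed=0):
    """Hypothetical generator of push-attack profiles ({item: rating})."""
    rng = random.Random(seed)
    others = [i for i in item_means if i != target]
    # popular model: the most-rated items (head of the Zipf long tail)
    popular = sorted(others, key=lambda i: item_counts[i], reverse=True)[:n_filler]
    profiles = []
    for _ in range(n_profiles):
        chosen = popular if model == "popular" else rng.sample(others, n_filler)
        profile = {i: round(item_means[i]) for i in chosen}  # rating from item mean
        profile[target] = r_max  # push attack: highest rating on the target item
        profiles.append(profile)
    return profiles

def attack_size(n_attack_profiles, n_all_profiles):
    """Attack size in percent of all user profiles in the system."""
    return 100.0 * n_attack_profiles / n_all_profiles
```

A nuke attack would set `profile[target]` to the lowest rating instead.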
Description of the drawings
The present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of the recommendation system detection method.
Fig. 2 compares the running time of the method with that of the SVM and UnRAP detection methods.
Detailed description of the embodiments
A specific embodiment is given below to apply the detection method of the present invention to a real e-commerce scenario. The MovieLens data set is used as the rating database of the recommendation system in the experiments. This data set is provided by the GroupLens research group at the University of Minnesota, USA, and contains 100,000 ratings of 1,682 movies by 943 users; because its attributes are rich and its data are real, it is widely used in fields such as data mining. The comparison detection methods are SVM and UnRAP, both popular in current research and applications. The random attack and popular attack, both common in recommendation systems, are used to attack the system, and the method of the present invention is compared with the SVM and UnRAP methods; the attack sizes used are 3% and 15%. The experimental comparison demonstrates the superiority of the method of the present invention.
The specific implementation steps are described next:
1. Data preprocessing
Step 1) Traverse all users in the current MovieLens data set to obtain the historical rating data of all users, including the target item ratings injected by attack users.
Step 2) From the historical rating data of all users, compute the statistical features $avg_t$ and $var_t$ of item $j$ at time $t$; in this embodiment $t = 30$ s, i.e., the system examines the data once every 30 s.
2. Item tracking and prediction
Step 3) From $avg_{t-1}$ and $var_{t-1}$ of item $j$, compute the SACA and SVCA values of the current item $j$ at time $t$.
Step 4) Initialize the EKF filter: obtain the system state of the current item $j$ at time $t$, $X(t) = [SACA_t, SVCA_t]^T$, the observer state $Y(t)$, and the system state error covariance matrix $P(t)$.
Step 5) Compute the system state of item $j$ at time $t+1$, $X(t+1|t) = f[t, X(t)] + G(t)W(t)$, and the observer state at time $t+1$, $Y(t+1|t) = h[t, X(t)] + V(t)$.
Step 6) Solve the first-order linear state equation to compute the state transition matrix of item $j$ at time $t+1$, $\Phi(t+1) = \partial f / \partial X$.
Step 7) Solve the first-order linear observation equation to compute the observation matrix $H(t+1)$ of item $j$ at time $t+1$, i.e., the Jacobian $\partial h / \partial X$.
Step 8) Using the state transition matrix extrapolated for item $j$ at time $t+1$, compute the system state error covariance matrix $P(t+1|t) = \Phi(t+1) P(t|t) \Phi^T(t+1) + Q$.
Step 9) Compute the Kalman gain matrix $K(t+1)$ of item $j$ at time $t+1$, which controls the convergence speed: $K(t+1) = P(t+1|t) H^T(t+1) \left[ H(t+1) P(t+1|t) H^T(t+1) + R \right]^{-1}$.
Step 10) Update the system state of item $j$ at time $t+1$, $X(t+1) = X(t+1|t) + K(t+1)\left[ Y(t) - Y(t+1|t) \right]$, and at the same time update the corresponding system state error covariance matrix $P(t+1) = \left[ I_n - K(t+1) H(t+1) \right] P(t+1|t)$.
Step 11) Accurately predict the attack users' next behavior from historical data, i.e., judge whether the current attack prediction result is valid. The two necessary conditions for a valid prediction are as follows. (1) A prediction result is given after short-term tracking, and it is timely only if it successfully and accurately predicts the change within the short time $\Delta t$. The range of $\Delta t$ derives from the time attack users take to carry out an attack, and it must satisfy the basic requirement of timeliness. If, however, the ratings rise too quickly and too conspicuously, the anomalous item does not need to be tracked: ordinary people would notice the anomaly as well, the attack would be exposed, and this would defeat the attacker's purpose. The design intent here is therefore to track attacks while they are being carried out. Since this execution time is clearly not fixed, it is set to 600 s to 259200 s in this embodiment.
In practice, it can be set flexibly according to the needs of the application. (2) The prediction must succeed for the relevant items: if it is correct only for non-target items, its content has no reference value in terms of accuracy; the prediction result is valid only if it accurately predicts the system state of the target items. Condition (2) can be expressed by the following formula:
where $AR_u$ is the set of items to which attack user $u$ gave abnormal ratings (the highest and second-highest ratings); it is a set because a single attack user may have one or more target items. $total_{u,j}$ is the total number of times that item $j$, to which user $u$ gave an abnormal rating, was tracked. $\Upsilon_{u,j}$ is the probability that the system accurately predicts item $j$ rated abnormally by user $u$. $CONT_{u,j}$ counts the number of times within the short period that the state transition equation and the observation equation are close, i.e., their difference is smaller than a given small value $\xi$ while $X(t) \ge \rho$ and $Y(t) \ge \omega$, where $\rho$ is the state transition anomaly threshold and $\omega$ is the observation equation anomaly threshold; in other words, it counts the accurate predictions. $CAL_u$, expressed as a percentage, is the probability that user $u$ is an attack user because of abnormal ratings on target items. (In more detail: because the rating behavior of attack users is regular, it can be summarized and judged with the tracking and prediction method. A normal user may also give a target item a seemingly abnormal high rating out of personal preference, but because normal users' rating behavior is not regular, it is hard to track and predict, and their $CAL_u$ value will be very low. Therefore, the closer $CAL_u$ is to 100%, the more accurately this user's abnormally rated items are tracked and predicted, and the more likely the user is an attack user; conversely, the lower the value, the less likely the user is an attack user.) When the item tracking time exceeds $\Delta t$ and the $CAL_u$ value for some item exceeds the given prediction threshold $\eta$, jump to step 12); otherwise, return to step 5).
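The control flow at the end of step 11) can be sketched as follows, with one tracking tick every 30 s as in step 2). $\Delta t = 600$ s is the lower end of the embodiment's range, and the threshold $\eta = 80\%$ is a hypothetical value, since the text does not fix one; `cal_snapshots` holds the $CAL_u$ values observed at each tick.

```python
TICK_S = 30  # step 2): the data are examined every 30 s

def next_step(elapsed_s, cal_values, delta_t_s=600, eta=80.0):
    """Return 12 (classify attack users) or 5 (keep tracking)."""
    if elapsed_s > delta_t_s and any(c > eta for c in cal_values):
        return 12
    return 5

def first_trigger(cal_snapshots, delta_t_s=600, eta=80.0):
    """Elapsed time (s) at which step 12) is first reached, or None."""
    for k, cals in enumerate(cal_snapshots, start=1):
        elapsed = k * TICK_S
        if next_step(elapsed, cals, delta_t_s, eta) == 12:
            return elapsed
    return None
```

Even when $CAL_u$ is high from the first tick, classification waits until tracking has run longer than $\Delta t$, which enforces the timeliness condition (1).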
3. Attack user classification
Step 12) Traverse item $j$ among the items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \ldots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$. Because attack users' ratings are high and concentrated, their means and variances are close to one another, almost identical, and this feature is used to classify the users' rating data.
Step 13) Compute the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the $j$-th item.
Step 14) Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_b$ of the data matrix $D$.
Step 15) Let $a$ be the dimensionality reduction transformation vector; the projection of vector $d_i$ obtained with $a$ is $g_i = a^T d_i$. Solve $S_b a = \lambda S_W a$ for $a$, where $\lambda$ is an eigenvalue of $S_W^{-1} S_b$.
Step 16) The optimal projection of the data matrix $D$ onto the projection plane $\alpha$ is $g_i = a^T d_i$. The projections are partitioned with the K-nearest-neighbor algorithm (KNN, K-Nearest Neighbor): the attackers among the abnormal users cluster together because their rating features (the highest and second-highest ratings) are similar, whereas the projections of abnormal users who are not attackers do not cluster, so the attack users can be separated out.
4. Result analysis and verification
Fig. 2 compares the time taken by the method of the present invention with that of the classic detection methods SVM and UnRAP when detecting attack users. The average processing time per user (each user has 106 ratings on average) is used as the running-time metric. In the same running environment and at the same detection accuracy, the method of the present invention needs less running time than the comparison methods. This is because its computations are simple and fast: it first finds abnormal data through a broad comparison, then tracks these abnormal data, and makes its judgment from the tracking and prediction results. In addition, the linear discriminant analysis used by the method further reduces the amount of data to be processed. Other methods, by contrast, must continually process abnormal data when detecting malicious users and lack the real-time capability of the method of the present invention.
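The figure of 106 ratings per user follows from the MovieLens data set sizes given in the embodiment:

```latex
\bar{r} = \frac{100\,000 \ \text{ratings}}{943 \ \text{users}} \approx 106.04 \ \text{ratings per user}
```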
The above embodiment shows that the method of the present invention can effectively detect malicious users in the system, enhance system robustness, and substantially improve operating efficiency, and therefore has important application value. The foregoing is only one specific embodiment of the present invention and is not intended to limit it; the data set and attack modes used in this embodiment apply only to this embodiment. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A recommendation system security detection method based on data tracing, characterized by comprising the following steps:
1) data preprocessing: detect the rated items;
2) item tracking and prediction: use the extended Kalman filter (EKF), which is applicable to time-varying nonlinear dynamic systems, to track and predict how items are rated;
3) attack-user classification: use linear discriminant analysis to cluster the users who give abnormal ratings to an item, thereby identifying the attack users on that item and their profiles.
2. The recommendation system security detection method based on data tracing according to claim 1, characterized in that the data preprocessing specifically comprises:
A: traverse the ratings of all items in the current recommendation system to obtain the historical rating data of all users;
B: from the historical rating data of all users, compute the statistical characteristics $avg_t$ and $var_t$ of item $j$ in item set $i$ at time $t$;
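Steps A and B amount to per-item summary statistics over the rating history. A minimal sketch, assuming a hypothetical record layout of `(user, item, score, timestamp)` and using the population variance (the source does not say which variance estimator is meant):

```python
from statistics import fmean, pvariance
from typing import Iterable, Tuple

# Hypothetical record layout: (user_id, item_id, score, timestamp)
Rating = Tuple[str, str, float, int]


def item_stats(history: Iterable[Rating], item: str, t: int) -> Tuple[float, float]:
    """Return (avg_t, var_t) of all scores given to `item` up to and including time t."""
    scores = [score for _user, it, score, ts in history if it == item and ts <= t]
    if not scores:
        raise ValueError(f"no ratings for item {item!r} up to time {t}")
    # Population variance is an assumption; the source does not name the estimator.
    return fmean(scores), pvariance(scores)
```

Calling `item_stats` once per item and time step yields the $avg_t$, $var_t$ series that step C consumes.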
the item tracking and prediction specifically comprises:
C: from the $avg_{t-1}$ and $var_{t-1}$ of item $j$, compute its SACA and SVCA values at time $t$;
D: EKF initialization: obtain the system state of the current item $j$ at time $t$, $X(t) = [SACA_t, SVCA_t]^{T}$, the observation state $Y(t)$, and the system-state error covariance matrix $P(t)$;
E: compute the predicted state of item $j$ at time $t+1$, $X(t+1 \mid t) = f[t, X(t)] + G(t)W(t)$, and the predicted observation at time $t+1$, $Y(t+1 \mid t) = h[t, X(t)] + V(t)$, where $W(t)$ is the process noise, $V(t)$ is the observation noise, $G(t)$ is the noise distribution matrix, and $f$ and $h$ are the nonlinear system functions, linearized by first-order Taylor expansion;
F: solve the first-order linearized state equation to compute the state-transition matrix of item $j$ at time $t+1$, $\Phi(t+1) = \partial f / \partial X$;
G: solve the first-order linearized observation equation to compute the observation matrix of item $j$ at time $t+1$, $H(t+1) = \partial h / \partial X$;
H: using the state-transition matrix extrapolated for item $j$ at time $t+1$, compute the system-state error covariance matrix $P(t+1 \mid t) = \Phi(t+1) P(t \mid t) \Phi^{T}(t+1) + Q$, where $Q$ is the process-noise variance and $\Phi^{T}$ is the transpose of $\Phi$;
I: compute the Kalman gain matrix $K(t+1)$ of item $j$ at time $t+1$, which controls the convergence speed: $K(t+1) = P(t+1 \mid t) H^{T}(t+1) [H(t+1) P(t+1 \mid t) H^{T}(t+1) + R]^{-1}$, where $R$ is the variance of the white Gaussian noise and $H^{T}$ is the transpose of $H$;
J: update the system state of item $j$ at time $t+1$, $X(t+1) = X(t+1 \mid t) + K(t+1)[Y(t) - Y(t+1 \mid t)]$, and at the same time the corresponding error covariance matrix $P(t+1) = [I_n - K(t+1) H(t+1)] P(t+1 \mid t)$, where $I_n$ is the $n \times n$ identity matrix;
K: predict the attack users' next behavior accurately from historical data, i.e. judge whether the current attack prediction is valid;
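Steps D to J follow the standard extended Kalman filter cycle. The sketch below is a generic, pure-Python version of one predict/update step for a state vector such as $X(t) = [SACA_t, SVCA_t]^{T}$. It assumes a scalar observation (so the bracketed term in step I is a number and its inverse is a division), takes user-supplied nonlinear functions `f` and `h`, and approximates $\Phi = \partial f / \partial X$ and $H = \partial h / \partial X$ by forward differences. The helper names, the finite-difference Jacobians, and the scalar-observation restriction are assumptions, not part of the source. As is usual for an EKF, the zero-mean noise terms enter only through `Q` and `R`.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def jacobian(fn: Callable[[Vec], Vec], x: Vec, eps: float = 1e-6) -> Mat:
    """Forward-difference approximation of d fn / d x (rows: outputs, cols: inputs)."""
    fx = fn(x)
    J = [[0.0] * len(x) for _ in fx]
    for c in range(len(x)):
        x_step = list(x)
        x_step[c] += eps
        f_step = fn(x_step)
        for r in range(len(fx)):
            J[r][c] = (f_step[r] - fx[r]) / eps
    return J


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Mat) -> Mat:
    return [list(row) for row in zip(*A)]


def ekf_step(x: Vec, P: Mat, y: float,
             f: Callable[[Vec], Vec], h: Callable[[Vec], Vec],
             Q: Mat, R: float) -> Tuple[Vec, Mat]:
    """One EKF predict/update cycle for a scalar observation y."""
    n = len(x)
    # Prediction: X(t+1|t) = f(X), Phi = df/dX, P(t+1|t) = Phi P Phi^T + Q
    x_pred = f(x)
    Phi = jacobian(f, x)
    PPt = matmul(matmul(Phi, P), transpose(Phi))
    P_pred = [[PPt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # Linearized observation: H = dh/dX evaluated at the predicted state (1 x n)
    H = jacobian(h, x_pred)
    PHt = matmul(P_pred, transpose(H))      # n x 1
    S = matmul(H, PHt)[0][0] + R            # scalar innovation variance
    K = [row[0] / S for row in PHt]         # Kalman gain K = P H^T S^-1
    # Update: X = X(t+1|t) + K (y - h(X(t+1|t))), P = (I - K H) P(t+1|t)
    innovation = y - h(x_pred)[0]
    x_new = [xp + k * innovation for xp, k in zip(x_pred, K)]
    I_KH = [[(1.0 if i == j else 0.0) - K[i] * H[0][j] for j in range(n)]
            for i in range(n)]
    return x_new, matmul(I_KH, P_pred)
```

For example, `ekf_step([3.5, 0.8], [[1.0, 0.0], [0.0, 1.0]], 3.9, f=lambda x: [x[0], 0.9 * x[1]], h=lambda x: [x[0]], Q=[[0.01, 0.0], [0.0, 0.01]], R=0.25)` advances a hypothetical two-component state by one step; these model functions and noise levels are placeholders, not values from the source.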
the attack-user classification specifically comprises:
L: traverse the target items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, $Att_i = \{att_1, att_2, \ldots, att_l\}$, and the data matrix $D = [d_1, d_2, \ldots, d_L]$;
M: compute the rating mean vector $\mu$ of all abnormal users and the rating mean vector $\mu_j$ of the $j$-th item;
N: compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_b$ of data matrix $D$;
O: let $a$ be the dimensionality-reduction transformation vector, so that the projection of vector $d_i$ under $a$ is $g_i = a^{T} d_i$; obtain $a$ by solving $S_W^{-1} S_b a = \lambda a$, where $\lambda$ is an eigenvalue of $S_W^{-1} S_b$;
P: the best projection of data matrix $D$ onto the projection plane $\alpha$ is $g_i = a^{T} d_i$; use the K-nearest-neighbor algorithm (KNN) to partition these projections. Attackers among the abnormal users cluster because their rating features, namely the highest and second-highest ratings, are similar, whereas the projections of abnormal users who are not attackers do not cluster; the attack users are thereby separated out.
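Steps M to O describe Fisher linear discriminant analysis. Claim 2 does not say how the rows of $D$ are grouped into classes, so the sketch below assumes two already-labelled groups of rating vectors. In the two-class case $S_b$ has rank one, and the leading eigenvector of $S_W^{-1} S_b$ is proportional to $S_W^{-1}(\mu_1 - \mu_2)$, so the direction $a$ can be obtained with one linear solve instead of an eigen-decomposition. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]
Group = Sequence[Sequence[float]]


def mean_vec(rows: Group) -> Vec:
    """Component-wise mean of a group of rating vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def add_outer(M: Mat, u: Vec, w: float = 1.0) -> None:
    """In place: M += w * u u^T."""
    for i, u_i in enumerate(u):
        for j, u_j in enumerate(u):
            M[i][j] += w * u_i * u_j


def scatter_matrices(groups: Sequence[Group]) -> Tuple[Mat, Mat]:
    """Return (S_W, S_b): within-class and between-class scatter matrices."""
    mu = mean_vec([row for g in groups for row in g])   # overall mean vector
    d = len(mu)
    S_W = [[0.0] * d for _ in range(d)]
    S_b = [[0.0] * d for _ in range(d)]
    for g in groups:
        mu_c = mean_vec(g)                               # class mean vector
        for row in g:
            add_outer(S_W, [x - m for x, m in zip(row, mu_c)])
        add_outer(S_b, [m - m0 for m, m0 in zip(mu_c, mu)], w=len(g))
    return S_W, S_b


def solve(A: Mat, b: Vec) -> Vec:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular; S_W may need regularization")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fisher_direction(g1: Group, g2: Group) -> Vec:
    """Unit vector a proportional to S_W^{-1} (mu_1 - mu_2)."""
    S_W, _ = scatter_matrices([g1, g2])
    diff = [m1 - m2 for m1, m2 in zip(mean_vec(g1), mean_vec(g2))]
    a = solve(S_W, diff)
    norm = sum(v * v for v in a) ** 0.5
    return [v / norm for v in a]
```

The resulting `a` gives the projections $g_i = a^{T} d_i$ used in step P; with more than two classes, a general eigen-solver for $S_W^{-1} S_b$ would be needed instead.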
3. The recommendation system security detection method based on data tracing according to claim 2, characterized in that, in step K of the item tracking and prediction, the two necessary conditions for an attack prediction to be valid are:
1) the prediction is made after short-term tracking, and only if it correctly predicts the change within a short time $\Delta t$ is the result timely;
2) the prediction must correctly capture the state of the relevant item: if it can only make correct predictions for non-target items, its content has no reference value, and only when the system state of the item is predicted accurately is the result valid. This can be expressed by the following formula:
where $AR_u$ is the set of items to which attack user $u$ gave abnormal ratings, i.e. the highest and second-highest ratings;
$total_{u,j}$ is the total number of times item $j$, to which user $u$ gave an abnormal rating, has been tracked;
$\Upsilon_{u,j}$ is the probability that the system accurately predicts item $j$, to which user $u$ gave an abnormal rating;
$CONT_{u,j}$ counts how many times within the short time window the state-transition equation and the observation equation are close, i.e. the number of accurate predictions;
$CAL_u$ is the probability that user $u$ is an attack user because of abnormal ratings given to target items. When the item tracking time exceeds $\Delta t$ and the $CAL_u$ value for an item exceeds the given prediction threshold $\eta$, the method jumps to step L; otherwise it returns to step E.
4. The recommendation system security detection method based on data tracing according to claim 3, characterized in that $\Delta t$ ranges from 600 s to 259200 s (10 minutes to 72 hours).
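The control flow of claims 3 and 4 can be sketched as a small decision function. The source's formula for $CAL_u$ is not reproduced here, so the sketch takes $CAL_u$ as an input; `accuracy_rate` shows one plausible, hypothetical reading of $\Upsilon_{u,j}$ as the fraction $CONT_{u,j} / total_{u,j}$. Only the $\Delta t$ bounds and the jump rule (step L when the tracking time exceeds $\Delta t$ and $CAL_u$ exceeds $\eta$, otherwise step E) come from the claims; the function and constant names are illustrative.

```python
DELTA_T_MIN_S = 600        # lower bound of delta t in claim 4 (10 minutes)
DELTA_T_MAX_S = 259_200    # upper bound of delta t in claim 4 (72 hours)


def accuracy_rate(cont: int, total: int) -> float:
    """Hypothetical estimate of Upsilon_{u,j} as CONT_{u,j} / total_{u,j}."""
    if total <= 0 or not 0 <= cont <= total:
        raise ValueError("need total > 0 and 0 <= cont <= total")
    return cont / total


def next_step(tracking_time_s: float, cal_u: float, delta_t_s: float, eta: float) -> str:
    """Return 'L' (go on to attack-user classification) or 'E' (keep tracking), per claim 3."""
    if not DELTA_T_MIN_S <= delta_t_s <= DELTA_T_MAX_S:
        raise ValueError("delta_t_s must lie between 600 s and 259200 s (claim 4)")
    if tracking_time_s > delta_t_s and cal_u > eta:
        return "L"
    return "E"
```

Taking $CAL_u$ as a parameter keeps the sketch independent of the missing formula; any scoring rule that yields a probability can be plugged in.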
CN201610120727.2A 2016-03-03 2016-03-03 Recommendation system security detection method based on data tracing Active CN105809030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610120727.2A CN105809030B (en) 2016-03-03 2016-03-03 Recommendation system security detection method based on data tracing

Publications (2)

Publication Number Publication Date
CN105809030A 2016-07-27
CN105809030B 2018-07-10

Family

ID=56466017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610120727.2A Active CN105809030B (en) 2016-03-03 2016-03-03 Recommendation system security detection method based on data tracing

Country Status (1)

Country Link
CN (1) CN105809030B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102118382A (en) * 2010-10-31 2011-07-06 华南理工大学 System and method for detecting attack of collaborative recommender based on interest combination
CN102184364A (en) * 2011-05-26 2011-09-14 南京财经大学 Semi-supervised learning-based recommendation system shilling attack detection method
CN104809393A (en) * 2015-05-11 2015-07-29 重庆大学 Shilling attack detection algorithm based on popularity classification features

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126103A (en) * 2018-10-30 2020-05-08 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN111126103B (en) * 2018-10-30 2023-09-26 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN109391626A (en) * 2018-11-15 2019-02-26 东信和平科技股份有限公司 A kind of method and relevant apparatus determining that network attack result is not accomplished
CN109391626B (en) * 2018-11-15 2021-07-30 东信和平科技股份有限公司 Method and related device for judging whether network attack result is unsuccessful
CN111192429A (en) * 2020-01-20 2020-05-22 天津合极电气科技有限公司 Fire early warning detection method based on charge trajectory tracking technology
CN111175046A (en) * 2020-03-18 2020-05-19 北京工业大学 Rolling bearing fault diagnosis method based on manifold learning and s-k-means clustering
CN112312169A (en) * 2020-11-20 2021-02-02 广州欢网科技有限责任公司 Method and equipment for checking program scoring validity

Also Published As

Publication number Publication date
CN105809030B (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN102184364A (en) Semi-supervised learning-based recommendation system shilling attack detection method
CN105809030A (en) Data tracking based recommendation system security detection method
Zhang et al. Detecting spammer groups from product reviews: a partially supervised learning model
CN104484602B (en) A kind of intrusion detection method, device
CN106126549A (en) A kind of community's trust recommendation method decomposed based on probability matrix and system thereof
CN101388024B (en) Compression space high-efficiency search method based on complex network
CN103617235A (en) Method and system for network navy account number identification based on particle swarm optimization
CN105095476A (en) Collaborative filtering recommendation method based on Jaccard equilibrium distance
CN106934035A (en) Concept drift detection method in a kind of multi-tag data flow based on class and feature distribution
CN107491557A (en) A kind of TopN collaborative filtering recommending methods based on difference privacy
CN107944485A (en) The commending system and method, personalized recommendation system found based on cluster group
CN104239496A (en) Collaborative filtering method based on integration of fuzzy weight similarity measurement and clustering
CN104766219B (en) Based on the user's recommendation list generation method and system in units of list
CN104751353A (en) Cluster and Slope One prediction based collaborative filtering method
CN104899321A (en) Collaborative filtering recommendation method based on item attribute score mean value
Yanfang et al. Research on E-commerce user churn prediction based on logistic regression
CN111428145A (en) Recommendation method and system fusing tag data and naive Bayesian classification
Zandian et al. Feature extraction method based on social network analysis
Boukhers et al. Ensemble and multimodal approach for forecasting cryptocurrency price
Roy et al. Exploiting Deep Learning Based Classification Model for Detecting Fraudulent Schemes over Ethereum Blockchain
Li et al. Robust personalized ranking from implicit feedback
CN103678709B (en) Recommendation system attack detection method based on time series data
Tekin et al. Customer lifetime value prediction for gaming industry: fuzzy clustering based approach
Liao et al. Accumulative Time Based Ranking Method to Reputation Evaluation in Information Networks
Yu et al. A robust Bayesian probabilistic matrix factorization model for collaborative filtering recommender systems based on user anomaly rating behavior detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20160727

Assignee: NUPT INSTITUTE OF BIG DATA RESEARCH AT YANCHENG CO., LTD.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: X2019980001249

Denomination of invention: Data tracking based recommendation system security detection method

Granted publication date: 20180710

License type: Common License

Record date: 20191224