CN105809030B - A recommender system security detection method based on data tracking - Google Patents


Info

Publication number
CN105809030B
Authority
CN
China
Prior art keywords
project
user
prediction
scoring
matrix
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610120727.2A
Other languages
Chinese (zh)
Other versions
CN105809030A (en
Inventor
黄海平
李峰
朱洁
叶宁
王鹏
王汝传
沙超
吴鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Application filed by Nanjing Post and Telecommunication University
Priority to CN201610120727.2A
Publication of CN105809030A
Application granted
Publication of CN105809030B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes a recommender system security detection method based on data tracking. It addresses the shortcomings of traditional collaborative filtering recommender systems when detecting injected user profiles: detection is time-consuming, performs poorly against attacks, and does not scale to big data processing. The invention first exploits the fact that the extended Kalman filter (EKF) can be applied to time-varying nonlinear dynamic systems to track and predict the rating behavior of items, and then uses linear discriminant analysis (LDA) to cluster the users with abnormal ratings on an item, thereby identifying the attacking users of that item and their profiles. Using the extended Kalman filter avoids examining large amounts of irrelevant data, which improves detection efficiency and the robustness of the system. Applying a tracking algorithm to recommender system security detection enables uninterrupted online detection and reduces the false detection rate. Linear discriminant analysis reduces the dimensionality of multi-feature user data, so that profile injection attacks by malicious users are detected effectively and the detection rate increases.

Description

A recommender system security detection method based on data tracking
Technical field
The present invention relates to a data processing method, and in particular to a recommender system security detection method that is implemented with data tracking techniques against a big data background and can be applied to the e-commerce field.
Background art
The emergence and spread of the Internet have brought users massive amounts of data and met their demand for information services in the information age. However, the sharp growth in data volume that has accompanied the rapid development of networks leaves users unable to pick out the information that is actually useful to them. To address this problem and improve the user experience, Internet service providers have designed or adopted collaborative filtering recommender systems, which analyze user data in order to proactively push potentially useful information to users. Collaborative filtering recommender systems are now widely used in mobile e-commerce, and at the same time the quality and reliability of recommendations have become an issue of growing concern to users. Because recommender systems are inherently open and user information is sensitive, malicious users can inject a large number of fake user profiles into a recommender system (for example, fake user ratings for a particular product) in order to distort the system's recommendations so that they serve the attackers' interests, which in turn undermines users' trust in the recommender system. This behavior is called a "user profile injection attack" (User Profile Injection Attack) or "shilling attack" (Shilling Attack).
In practice, user profile injection attacks fall into two types according to their purpose: "push attacks" (Push Attack) and "nuke attacks" (Nuke Attack). A push attack makes the target item recommended markedly more often than other items, so that the target item is strongly recommended to users. A nuke attack makes the target item recommended markedly less often than other items, so that the system is less likely to recommend it.
In e-commerce transactions, some vendors will go to any length to manipulate recommender systems into recommending their own products to users, thereby suppressing competitors and obtaining illegitimate profit. Real life offers no shortage of examples. In June 2001, a company used forged film reviews to recommend a newly released film to users. In October 2011, an online mall unilaterally raised its annual technical service fee, which led many small and medium-sized sellers to mount a large-scale joint attack on large shops by means such as malicious negative reviews. In addition, a search engine has frequently found that some websites use various means to deceive its search algorithm and raise their ranking in search results.
Such attacks seriously interfere with the normal operation of a recommender system and mislead users into accepting or buying information or goods they do not actually need. Users gradually lose trust in the recommender system, customers are lost, and the system suffers a double loss of reputation and profit. Research on methods for detecting attacks on recommender systems is therefore particularly important.
At present, the detection methods proposed in the prior art fall into three classes: supervised, unsupervised, and semi-supervised. Each class has its own strengths, and many detection methods have been derived from them. With the arrival of the big data era, the volume of data in recommender systems grows by the day, yet most methods in these three classes do not consider the efficiency of attack detection. Their data processing is therefore inefficient and slow, which makes them poorly suited to e-commerce in the big data era. A recommender system detection method with higher detection efficiency and better security is thus needed in order to deliver convenient and efficient service.
Summary of the invention
To address the defects of prior-art attack detection methods for recommender systems, the present invention uses techniques such as data tracking to solve problems such as low detection efficiency and system vulnerability. Against the background of big data being widely applied in e-commerce, the present invention aggregates large amounts of valid data to provide a more efficient and more secure detection service.
The recommender system security detection method based on data tracking proposed by the present invention comprises the following three steps:
1. Data preprocessing: detect the rated items.
2. Item tracking and prediction: exploiting the fact that the extended Kalman filter (Extended Kalman Filter) can be applied to time-varying nonlinear dynamic systems, track and predict the rating behavior of items.
3. Attack user classification: use linear discriminant analysis LDA (Linear Discriminant Analysis) to cluster the users with abnormal ratings on an item, thereby identifying the attacking users of that item and their profiles.
Further, the data preprocessing specifically comprises:
A: Traverse the ratings of all items in the current recommender system and obtain the historical rating data of all users.
B: From the historical rating data of all users, determine the statistical features at time t of item j in the item set i, namely the rating mean avg_t and the rating variance var_t.
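Steps A and B can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tuple layout `(user, item, score, timestamp)`, the function name `item_stats`, and the sample ratings are assumptions, and the population variance is used because the text does not say which variance estimator is intended.

```python
from statistics import fmean, pvariance

def item_stats(ratings, item, t):
    """Return (avg_t, var_t) for `item` from ratings with timestamp <= t.

    `ratings` is a list of (user, item, score, timestamp) tuples; this layout
    is an assumption made for the sketch.
    """
    scores = [s for (_, i, s, ts) in ratings if i == item and ts <= t]
    if not scores:
        return None  # the item has no ratings yet
    return fmean(scores), pvariance(scores)

# Hypothetical rating log.
ratings = [
    ("u1", "j", 4, 10),
    ("u2", "j", 2, 20),
    ("u3", "j", 5, 40),  # arrives after t = 30, so it is ignored below
    ("u1", "k", 3, 15),
]
avg_t, var_t = item_stats(ratings, "j", t=30)
```

Recomputing the two statistics at every sampling instant gives the sequence of avg_t and var_t values that the tracking stage consumes.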
The item tracking and prediction specifically comprises:
C: From avg_{t-1} and var_{t-1} of item j, compute the SACA and SVCA values of item j at time t.
D: Initialize the extended Kalman filter and obtain the system state X(t) of the current item j at time t, the observation state Y(t), and the system state error covariance matrix P(t).
E: Compute the predicted system state of item j at time t+1, X(t+1|t) = f[t, X(t)] + G(t)W(t), and the predicted observation state at time t+1, Y(t+1|t) = h[t, X(t)] + V(t), where W(t) is the process noise, f and h are the nonlinear functions that the system linearizes by first-order Taylor expansion, V(t) is the observation noise, and G(t) is the noise distribution matrix.
F: Solve the first-order linearized state equation to compute the state transition matrix Φ(t+1) of item j at time t+1.
G: Solve the first-order linearized observation equation to compute the observation matrix H(t+1) of item j at time t+1.
H: Using the state transition matrix of item j at time t+1, compute the predicted system state error covariance matrix P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q, where Q is the process noise variance and Φ^T is the transpose of Φ.
I: Solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) [H(t+1) P(t+1|t) H^T(t+1) + R]^{-1}, where R is the variance of the Gaussian white noise and H^T is the transpose of H.
J: Update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1)[Y(t) - Y(t+1|t)], and at the same time update the corresponding system state error covariance matrix, P(t+1) = (I_n - K(t+1)H(t+1)) P(t+1|t), where I_n is the n-th order identity matrix.
K: Accurately predict the next behavior of the attacking user from the historical data, that is, judge whether the current attack prediction result is valid.
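Steps E to J amount to one predict/update cycle of an extended Kalman filter. The following is a minimal sketch for a scalar state, not the patent's implementation: the function name `ekf_step`, the models `f` and `h`, the central-difference linearization, and all numeric values are assumptions made for illustration. The noise term G(t)W(t) is folded into the process noise variance `q`, and the correction uses the newest observation `y_new`, as in the standard EKF.

```python
import math

def ekf_step(x, p, y_new, f, h, q, r, eps=1e-6):
    """One predict/update cycle of a scalar extended Kalman filter."""
    # Steps E/F: predict the state and linearise f around the current estimate.
    x_pred = f(x)
    phi = (f(x + eps) - f(x - eps)) / (2 * eps)
    # Step H: predicted error covariance, phi * P * phi + Q.
    p_pred = phi * p * phi + q
    # Step G: linearise h around the predicted state.
    h_jac = (h(x_pred + eps) - h(x_pred - eps)) / (2 * eps)
    # Step I: Kalman gain (the matrix inverse becomes a division for scalars).
    k = p_pred * h_jac / (h_jac * p_pred * h_jac + r)
    # Step J: correct the prediction and shrink the covariance.
    x_new = x_pred + k * (y_new - h(x_pred))
    p_new = (1 - k * h_jac) * p_pred
    return x_new, p_new

# Hypothetical models and observations, chosen only to exercise the function.
f = lambda x: x + 0.1 * math.sin(x)
h = lambda x: x + 0.01 * x ** 3
x, p = 0.5, 1.0
for y in [0.62, 0.66, 0.73]:
    x, p = ekf_step(x, p, y, f, h, q=0.01, r=0.1)
```

A vector-valued state, for example one that stacks SACA and SVCA, would replace the scalar derivatives with Jacobian matrices and the division with a matrix inverse.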
The attack user classification step comprises:
L: Traverse the target items obtained by tracking and prediction, and obtain the set of all abnormal users who rated the target item, Att_i = {att_1, att_2, …, att_l}, together with the data matrix D = [d_1, d_2, ..., d_L];
M: Compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item.
N: Compute the between-class scatter matrix S_b and the within-class scatter matrix S_W of the data matrix D.
O: Let a be the dimensionality-reduction transformation vector. The projection of a vector d_i under a is g_i = a^T d_i. The vector a is obtained by solving an eigenvalue equation, in which λ is the eigenvalue.
P: The optimal projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The nearest neighbor algorithm KNN (K-Nearest Neighbor) is used to partition the projections of the data matrix. Projections of attackers among the abnormal users cluster together because their rating features (highest and second-highest ratings) are similar, whereas projections of abnormal users who are not attackers do not cluster, so the attacking users can be separated out.
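Steps N to P can be illustrated with the textbook two-class Fisher discriminant. The sketch below is an example under stated assumptions rather than the patent's procedure: it presumes two already-labelled groups, two features per user (rating mean and rating variance), and the hypothetical names `fisher_direction` and `project`. For two classes the direction reduces to a = S_W^{-1}(μ_a - μ_b).

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher LDA direction for 2-D feature vectors."""
    def mean(rows):
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

    def scatter(rows, mu):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter: sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(a, d):
    """g = a^T d for 2-D vectors."""
    return a[0] * d[0] + a[1] * d[1]

# Hypothetical (rating mean, rating variance) features per user.
attackers = [(4.9, 0.1), (4.8, 0.2), (5.0, 0.15)]
normals = [(3.0, 1.2), (3.5, 0.9), (2.5, 1.5), (4.0, 1.0)]
a = fisher_direction(attackers, normals)
g_attack = [project(a, d) for d in attackers]
g_normal = [project(a, d) for d in normals]
```

On this toy data the attacker projections lie close together and apart from the normal users, which is the clustering behavior step P relies on.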
Further, in step K of the item tracking and prediction process, the two necessary conditions for judging whether the attack prediction result is valid are:
(1) The prediction result must be given after short-term tracking. Only if the change within the short time Δt is predicted accurately is the prediction result timely;
(2) The state of the item concerned must be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy. Only if the system state of the item is predicted accurately is the prediction result valid.
Condition (2) can be expressed as a formula, whose terms are defined as follows.
Here, AR_u denotes the set of items to which attacking user u gave abnormal ratings (highest and second-highest ratings), since a single attacking user may have one or more target items; total_{u,j} denotes the total number of times that item j, abnormally rated by user u, was tracked; Υ_{u,j} denotes the probability that the system accurately predicts item j, abnormally rated by user u; CONT_{u,j} counts the number of times within the short period that the state transition equation and the observation equation are close (their difference is smaller than a given small value ξ, while X(t) ≥ ρ and Y(t) ≥ ω, where ρ is the anomaly threshold for the state transition and ω is the anomaly threshold for the observation equation), that is, the number of accurate predictions; CAL_u denotes the probability that a user who gives abnormal ratings to target items is an attacking user, expressed as a percentage. In more detail: because the rating behavior of attacking users is regular, it can be summarized and judged by the tracking and prediction method. A normal user may also give a target item a high rating that looks abnormal because of personal preference, but since the rating behavior of normal users is not regular, it is hard to track and predict, and the CAL_u value will be very low. Hence the closer CAL_u is to 100%, the more accurately the user's abnormally rated items were tracked and predicted and the more likely the user is an attacking user; conversely, the lower the value, the less likely the user is an attacking user. When the item tracking time exceeds Δt and the CAL_u value for an item exceeds the given prediction threshold η, jump to step L; otherwise, return to step E.
The range of Δt is derived from the time over which an attacking user carries out the attack and can be set flexibly according to the needs of the application; it is preferably 600 s to 259200 s.
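The formula for CAL_u is not reproduced in this text, so the sketch below rests on a guess: it takes Υ_{u,j} = CONT_{u,j} / total_{u,j} and averages it over the items in AR_u, expressed as a percentage. The function names `cal_u` and `next_step` and the sample numbers are hypothetical; only the final decision rule (tracking time above Δt and CAL_u above η) comes from the text.

```python
def cal_u(cont, total):
    """Assumed CAL_u: mean over j in AR_u of CONT_{u,j} / total_{u,j}, in percent.

    `cont` and `total` map each abnormally rated item j of user u to its
    accurate-prediction count and its tracking count.
    """
    rates = [cont[j] / total[j] for j in total if total[j] > 0]
    if not rates:
        return 0.0
    return 100.0 * sum(rates) / len(rates)

def next_step(tracked_seconds, cal, delta_t, eta):
    """Decision rule from the text: classify once both thresholds are exceeded."""
    if tracked_seconds > delta_t and cal > eta:
        return "step L"
    return "step E"

# Hypothetical counts for one user with two abnormally rated items.
total = {"j1": 10, "j2": 20}
cont = {"j1": 9, "j2": 16}
cal = cal_u(cont, total)  # roughly 85 percent
decision = next_step(tracked_seconds=700, cal=cal, delta_t=600, eta=80.0)
```

With Δt = 600 s (the lower end of the preferred range) and a hypothetical η of 80, this user would be passed on to the classification stage.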
Advantageous effects: The recommender system security detection method based on data tracking proposed by the present invention has the following advantages:
1. Using the extended Kalman filter EKF (Extended Kalman Filter) avoids examining large amounts of irrelevant data and thus improves detection efficiency; and because the recommender system can be monitored in real time, the robustness of the system increases.
2. The present invention is the first to apply a tracking algorithm to recommender system security detection, which enables uninterrupted online detection and reduces the false detection rate.
3. Linear discriminant analysis LDA (Linear Discriminant Analysis) further reduces the dimensionality of multi-feature user data, which lowers the error rate, so that profile injection attacks by malicious users are detected effectively and the detection rate increases.
In summary, the present invention overcomes the low efficiency and low accuracy of traditional detection methods. Because it focuses on tracking and predicting items, it avoids a large number of invalid or inefficient operations, and therefore offers high efficiency, a high detection rate, and a low error rate.
The important terms used in the present invention and their constraints are as follows:
User set: the set U = {U_1, U_2, …, U_m} of the m users in the system;
Item set: the set i = {i_1, i_2, …, i_n} of the n items in the system; hereinafter j denotes the j-th item (or item class) in the set i;
Short-term average change activity (SACA): short-term average change activity reflects the situation in which an attacking user keeps raising the rating of a target item, causing the mean of the item's overall ratings to rise rapidly within a short time. Here avg_t is the rating mean at time t of an item in the item set, τ is the SACA mean correction value, and F_t is the set of ratings that remains in the user profiles at time t after the items with abnormal ratings (highest and second-highest ratings) are removed. SACA is computed from these quantities by a formula.
Short-term variance change activity (SVCA): short-term variance change activity reflects the situation in which an attacking user keeps raising the rating of a target item, causing the variance of the item's overall ratings to fall rapidly within a short time. This attribute reveals abnormal changes of the target item in the short term. Here var_t is the rating variance at time t of an item in the item set and υ is the SVCA variance correction value. The calculation is: SVCA_t = |var_t - var_{t-1} - |F_t| υ|
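The SVCA formula can be evaluated directly. In the sketch below, `svca` follows the formula as given, while `saca` is only an assumed analogue for the mean, since the SACA formula itself is not shown in this text. The function names and the numeric inputs are hypothetical.

```python
def svca(var_t, var_prev, n_remaining, upsilon):
    """SVCA_t = |var_t - var_{t-1} - |F_t| * upsilon|, as given in the text."""
    return abs(var_t - var_prev - n_remaining * upsilon)

def saca(avg_t, avg_prev, n_remaining, tau):
    """Assumed analogue for the mean; the SACA formula is not reproduced here."""
    return abs(avg_t - avg_prev - n_remaining * tau)

# Hypothetical values: the variance drops from 1.0 to 0.5 while |F_t| = 10.
svca_t = svca(var_t=0.5, var_prev=1.0, n_remaining=10, upsilon=0.01)
saca_t = saca(avg_t=4.2, avg_prev=3.6, n_remaining=10, tau=0.01)
```

Here `n_remaining` stands for |F_t|, the number of ratings left after the abnormal ones are removed.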
Extended Kalman filter (EKF): the basic idea of the extended Kalman filter (Extended Kalman Filter) is to linearize a nonlinear system and then compute an optimal estimate of the system state from the system's input and output observations. In essence it is an efficient recursive filter.
Linear discriminant analysis (LDA): the basic idea of linear discriminant analysis (Linear Discriminant Analysis) is to project high-dimensional pattern samples onto the optimal discriminant vector space, so as to extract classification information and compress the dimensionality of the feature space. After projection, the pattern samples are guaranteed to have the maximum between-class distance and the minimum within-class distance in the new subspace, that is, the patterns are most separable in that space. It is therefore an effective feature extraction method.
Data matrix D: D = [d_1, d_2, ..., d_L] is the data matrix whose columns are the statistical features (such as mean, variance, and median) of the ratings of the users who rated the tracked item. Each d_i contains the rating statistics of one user and has h rows, where h is the number of statistical features.
Within-class scatter matrix S_W: the matrix formed from the dispersion within each class of samples; it is a symmetric positive semi-definite matrix. The magnitude of the scatter matrix expresses how densely the sample points are packed: the larger the value, the more dispersed the points, and the smaller the value, the more concentrated. (The superscript T denotes the transpose of a matrix or vector, here and below.) Suppose there are c classes of items in total. The matrix is expressed as a formula over the i-th sample point of each class-j item, where w_j denotes the number of samples (or weight) of class-j items; the more sample points a class has, the larger its weight. In addition, the rating mean vector of class-j items is computed from D_j, the data statistics vectors in the data matrix that relate to class-j items.
Between-class scatter matrix S_b: the matrix formed from the dispersion between the samples of the different classes; it is also a symmetric positive semi-definite matrix. The more dispersed the sample points of different classes are from one another, the better. It is defined in terms of the rating mean vector.
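The formulas for the two scatter matrices did not survive in this text. For orientation only, the standard textbook definitions are given below; they are an assumption about what the original formulas looked like, and the sample symbol x_i^{(j)} and the total count N are introduced here purely for illustration.

```latex
\mu_j = \frac{1}{w_j} \sum_{i=1}^{w_j} x_i^{(j)}, \qquad
\mu = \frac{1}{N} \sum_{j=1}^{c} w_j \, \mu_j, \qquad N = \sum_{j=1}^{c} w_j

S_W = \sum_{j=1}^{c} \sum_{i=1}^{w_j}
      \bigl(x_i^{(j)} - \mu_j\bigr)\bigl(x_i^{(j)} - \mu_j\bigr)^{T}, \qquad
S_b = \sum_{j=1}^{c} w_j \,(\mu_j - \mu)(\mu_j - \mu)^{T}

a^{*} = \arg\max_{a} \frac{a^{T} S_b \, a}{a^{T} S_W \, a}
\quad\Longrightarrow\quad S_b \, a = \lambda \, S_W \, a
```

In this standard form the optimal projection vector a is the eigenvector belonging to the largest eigenvalue λ of the generalized eigenproblem on the last line.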
Nearest neighbor algorithm (KNN): the basic idea of the K-nearest-neighbor algorithm (K-Nearest Neighbor) is that if most of the K samples nearest to a given sample in feature space belong to a certain class, then that sample also belongs to that class and shares the characteristics of the samples in it. The present invention applies this method to the projection results of the linear discriminant analysis, using Euclidean distance as an auxiliary means of identifying attacking users.
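A minimal K-nearest-neighbor vote over one-dimensional projections might look like the following. The function name `knn_label`, the labelled reference points, and the choice k = 3 are assumptions for illustration; in one dimension the Euclidean distance is simply the absolute difference.

```python
from collections import Counter

def knn_label(query, labelled, k=3):
    """Majority label among the k labelled projections closest to `query`."""
    nearest = sorted(labelled, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical projected values g_i with known labels.
labelled = [
    (-8.2, "attack"), (-8.9, "attack"), (-8.8, "attack"),
    (-14.0, "normal"), (-12.4, "normal"), (-15.6, "normal"),
]
label_near_attackers = knn_label(-8.5, labelled)
label_near_normals = knn_label(-13.0, labelled)
```

An odd k avoids ties when there are only two classes.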
Dimensionality-reduction transformation vector a: the optimal vector that reduces the original multidimensional data to one dimension while making the separation between classes as pronounced as possible.
Random attack: a simple attack model in which the ratings of the filler items are taken from the average ratings of the respective items, while the target item is given the highest or the lowest rating. Although a random attack is easy to carry out, it is not very effective.
Popular attack: a development of the random attack. Its construction follows Zipf's long-tail law, according to which only a small number of items attract the attention of most people. The attacker chooses popular items as the selected items, where the popularity of an item is usually measured by the number of times it has been rated. The popular attack model is more effective than the random attack and is not very complicated to carry out.
Attack size (Attack Size): the percentage of all user profiles in the rating system that are attack profiles.
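The random attack and the attack size defined above can be made concrete with a short sketch. It follows the description in this text (filler items rated at their item average, target item rated at an extreme); the function names, the 1 to 5 rating scale, and the sample averages are assumptions.

```python
def random_attack_profile(item_avgs, target, filler_items, push=True,
                          r_min=1, r_max=5):
    """Build one fake profile: fillers get their item average, the target an extreme."""
    profile = {i: round(item_avgs[i]) for i in filler_items if i != target}
    # A push attack gives the highest rating, a nuke attack the lowest.
    profile[target] = r_max if push else r_min
    return profile

def attack_size(n_attack_profiles, n_total_profiles):
    """Attack profiles as a percentage of all user profiles in the system."""
    return 100.0 * n_attack_profiles / n_total_profiles

# Hypothetical item averages on a 1-5 scale.
item_avgs = {"a": 3.6, "b": 2.2, "t": 3.0}
profile = random_attack_profile(item_avgs, target="t", filler_items=["a", "b"])
size = attack_size(3, 100)
```

A popular attack would differ only in how the extra selected items are chosen, namely by how often they have been rated.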
Description of the drawings
The present invention is described in further detail below with reference to the drawings and an embodiment.
Fig. 1 is a flow chart of the detection method for the recommender system.
Fig. 2 compares the running time of the method with that of the SVM and UnRAP detection methods.
Specific embodiment
A specific embodiment is presented below in which the detection method of the present invention is applied to a realistic e-commerce scenario. The MovieLens data set is used as the rating database of the recommender system. The data set is provided by the GroupLens research group at the University of Minnesota in the United States and contains 100,000 rating records given by 943 users to 1,682 films. It has rich attributes and real data, and is widely used in fields such as data mining. The detection methods used for comparison are SVM and UnRAP, both of which are popular in current research and applications. The random attack and the popular attack, which are common in recommender systems, are launched against the method of the present invention and against the SVM and UnRAP methods, with attack sizes (Attack Size) of 3% and 15% respectively for the two attack types. The superiority of the method of the present invention is demonstrated by comparing the experimental results.
The specific implementation steps are described next:
1. Data preprocessing
Step 1) Traverse all users in the current MovieLens data set and obtain the historical rating data of all users, which also includes the ratings that attacking users gave to the target items they attacked.
Step 2) From the historical rating data of all users, determine the statistical features avg_t and var_t of item j at time t. In this embodiment t = 30 s, that is, the system samples the data once every 30 s.
2. Item tracking and prediction
Step 3) From avg_{t-1} and var_{t-1} of item j, compute the SACA and SVCA values of the current item j at time t.
Step 4) Initialize the EKF and obtain the system state X(t) of the current item j at time t, the observation state Y(t), and the system state error covariance matrix P(t).
Step 5) Compute the predicted system state of item j at time t+1, X(t+1|t) = f[t, X(t)] + G(t)W(t), and the predicted observation state at time t+1, Y(t+1|t) = h[t, X(t)] + V(t).
Step 6) Solve the first-order linearized state equation to compute the state transition matrix Φ(t+1) of item j at time t+1.
Step 7) Solve the first-order linearized observation equation to compute the observation matrix H(t+1) of item j at time t+1.
Step 8) Using the state transition matrix of item j at time t+1, compute the predicted system state error covariance matrix P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q.
Step 9) Solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) [H(t+1) P(t+1|t) H^T(t+1) + R]^{-1}.
Step 10) Update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1)[Y(t) - Y(t+1|t)], and at the same time update the corresponding system state error covariance matrix, P(t+1) = (I_n - K(t+1)H(t+1)) P(t+1|t).
Step 11) Accurately predict the next behavior of the attacking user from the historical data, that is, judge whether the current attack prediction result is valid. The two necessary conditions for a valid prediction are: (1) The prediction result must be given after short-term tracking; only if the change within the short time Δt is predicted accurately is the prediction result timely. The range of Δt is derived from the time over which an attacking user carries out the attack, and its setting must satisfy the basic requirement of timeliness and validity. If the ratings of an anomalous item rise too fast or too conspicuously, ordinary people can notice the anomaly even without tracking, so the attack is exposed, which defeats the attacker's purpose. The design intent here is therefore to track the period over which the attacking user carries out the attack. This period is clearly not fixed; it is set to 600 s to 259200 s in this embodiment and can be set flexibly according to the needs of the application. (2) The state of the item concerned must be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy. Only if the system state of the item is predicted accurately is the prediction result valid. Condition (2) can be expressed as a formula, whose terms are defined as follows.
AR_u denotes the set of items to which attacking user u gave abnormal ratings (highest and second-highest ratings), since a single attacking user may have one or more target items; total_{u,j} denotes the total number of times that item j, abnormally rated by user u, was tracked; Υ_{u,j} denotes the probability that the system accurately predicts item j, abnormally rated by user u; CONT_{u,j} counts the number of times within the short period that the state transition equation and the observation equation are close (their difference is smaller than a given small value ξ, while X(t) ≥ ρ and Y(t) ≥ ω, where ρ is the anomaly threshold for the state transition and ω is the anomaly threshold for the observation equation), that is, the number of accurate predictions; CAL_u denotes the probability that a user who gives abnormal ratings to target items is an attacking user, expressed as a percentage. In more detail: because the rating behavior of attacking users is regular, it can be summarized and judged by the tracking and prediction method. A normal user may also give a target item a high rating that looks abnormal because of personal preference, but since the rating behavior of normal users is not regular, it is hard to track and predict, and the CAL_u value will be very low. Hence the closer CAL_u is to 100%, the more accurately the user's abnormally rated items were tracked and predicted and the more likely the user is an attacking user; conversely, the lower the value, the less likely the user is an attacking user. When the item tracking time exceeds Δt and the CAL_u value for an item exceeds the given prediction threshold η, jump to step 12); otherwise, return to step 5).
3. Attack user classification
Step 12) Traverse item j among the items obtained by tracking and prediction, and obtain the set of all abnormal users who rated the target item, Att_i = {att_1, att_2, …, att_l}, together with the data matrix D = [d_1, d_2, ..., d_L]. Because the ratings of attacking users are high and concentrated, their means and variances are also close to one another, almost identical, and this feature can be used to classify the users' rating data.
Step 13) Compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item.
Step 14) Compute the between-class scatter matrix S_b and the within-class scatter matrix S_W of the data matrix D.
Step 15) Let a be the dimensionality-reduction transformation vector. The projection of a vector d_i under a is g_i = a^T d_i. The vector a is obtained by solving an eigenvalue equation, in which λ is the eigenvalue.
Step 16) The optimal projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The nearest neighbor algorithm KNN (K-Nearest Neighbor) is used to partition the projections of the data matrix. Projections of attackers among the abnormal users cluster together because their rating features (highest and second-highest ratings) are similar, whereas projections of abnormal users who are not attackers do not cluster, so the attacking users can be separated out.
4. Analysis and verification of results
Fig. 2 compares the time taken by the method of the present invention with that of the classical detection methods SVM and UnRAP when detecting attacking users. The present invention uses the average processing time per user as the running-time index (each user has 106 ratings on average). Under the same running environment and at the same detection accuracy, the method of the present invention has a better running time per unit than the other methods. This is because its computation is simple and fast: it finds abnormal data by large-scale comparison, then tracks that abnormal data and makes its judgment from the tracking and prediction results. In addition, the linear discriminant analysis used by the method further reduces the amount of data that must be processed. The other methods, by contrast, must process abnormal data continually when detecting malicious users and lack the real-time capability of the method of the present invention.
The above embodiment shows that the method of the present invention can effectively detect malicious users in the system, enhances the robustness of the system, and greatly improves operating efficiency; the method therefore has significant application value. The foregoing is only a specific embodiment of the present invention and is not intended to limit it. The data set and attack models used are specific to this embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. A recommendation system security detection method based on data tracking, characterized by comprising the following steps:
1) Preprocessing of user historical rating data: detect the historical rating data of the rated items;
2) Item tracking and prediction: exploiting the fact that the extended Kalman filter can be applied to time-varying nonlinear dynamic systems, track and predict the rating behavior of the items;
3) Attack user classification: perform cluster analysis on the users with abnormal ratings of an item using linear discriminant analysis, so as to identify the attack users in the item and their profiles.
2. The recommendation system security detection method according to claim 1, characterized in that:
The data preprocessing specifically comprises:
A: Traverse the ratings of all items in the current recommendation system to obtain the historical rating data of all users;
B: From the respective historical rating data of all users, determine the statistical features of item j in the item set i at time t: the rating mean avg_t and the rating variance var_t;
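Step B can be illustrated with a short Python sketch. The data layout (a list of (timestamp, score) pairs per item), the choice to accumulate all ratings up to time t, and the use of the population variance are assumptions made for the example; the claim does not fix them.

```python
from statistics import fmean, pvariance


def rating_stats(ratings, t):
    """Return (avg_t, var_t) for one item from the ratings given up to time t.

    `ratings` is a list of (timestamp, score) pairs for a single item j.
    Returns None when the item has no rating at or before t.
    """
    scores = [score for ts, score in ratings if ts <= t]
    if not scores:
        return None
    return fmean(scores), pvariance(scores)


# Ratings of one item; the last rating arrives after t = 3 and is ignored.
item_j = [(1, 4), (2, 5), (3, 3), (10, 1)]
avg_t, var_t = rating_stats(item_j, t=3)
```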
The item tracking and prediction specifically comprises:
C: From avg_{t-1} and var_{t-1} of item j, compute the short-term average change activity (SACA) value and the short-term variance change activity (SVCA) value of item j at time t;
D: Initialize the extended Kalman filter to obtain the system state X(t) of the current item j at time t, the observation state Y(t), and the system state error covariance matrix P(t);
E: Compute the system state of item j at time t+1, X(t+1|t) = f[t, X(t)] + G(t)W(t), and the observation state at time t+1, Y(t+1|t) = h[t, X(t)] + V(t), where W(t) is the process noise, f and h are the nonlinear functions of the system, which are expanded in a first-order Taylor series, V(t) is the observation noise, and G(t) is the noise distribution matrix;
F: Solve the first-order linearized state equation and compute the state transition matrix Φ(t+1) of item j at time t+1;
G: Solve the first-order linearized observation equation and compute the observation matrix H(t+1) of item j at time t+1;
H: Using the state transition matrix of item j extrapolated to time t+1, compute the system state error covariance matrix P(t+1|t) = Φ(t+1) P(t|t) Φ^T(t+1) + Q, where Q is the process noise variance and Φ^T is the transpose of Φ;
I: Solve for the Kalman gain matrix K(t+1) of item j at time t+1, which controls the convergence speed: K(t+1) = P(t+1|t) H^T(t+1) [H(t+1) P(t+1|t) H^T(t+1) + R]^{-1}, where R is the variance of the Gaussian white noise and H^T is the transpose of H;
J: Update the system state of item j at time t+1, X(t+1) = X(t+1|t) + K(t+1) (Y(t) - Y(t+1|t)), and at the same time update the corresponding system state error covariance matrix P(t+1) = (I_n - K(t+1) H(t+1)) P(t+1|t), where I_n is the n-th order identity matrix;
K: Accurately predict the next action of the attack user from the historical data, that is, judge whether the attack prediction result is valid at this time;
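Steps E to J can be sketched for a one-dimensional state in Python. The sketch follows the common textbook form of the extended Kalman filter, which may differ in detail from the claimed steps: the observation is predicted from the predicted state, and the update uses a measurement y_next taken at time t+1. The function name, the argument names, and the scalar state are assumptions made to keep the example short.

```python
def ekf_step(x, p, y_next, f, h, df, dh, q, r):
    """One predict/update cycle of a scalar extended Kalman filter.

    x, p   : current state estimate and its error variance
    y_next : measurement at the next time step (assumed available)
    f, h   : nonlinear state and observation functions
    df, dh : their first derivatives (the 1x1 Jacobians)
    q, r   : process and observation noise variances
    """
    # Step E: predict the state (noise terms have zero mean).
    x_pred = f(x)
    # Step F: state transition "matrix" from the first-order expansion of f.
    phi = df(x)
    # Step H: predicted error variance, P(t+1|t) = phi * P * phi + Q.
    p_pred = phi * p * phi + q
    # Step G: observation "matrix" from the first-order expansion of h.
    h_mat = dh(x_pred)
    y_pred = h(x_pred)
    # Step I: Kalman gain, with the innovation variance inverted.
    k = p_pred * h_mat / (h_mat * p_pred * h_mat + r)
    # Step J: correct the state and the error variance.
    x_new = x_pred + k * (y_next - y_pred)
    p_new = (1.0 - k * h_mat) * p_pred
    return x_new, p_new, k


# Linear special case: identity dynamics and identity observation.
x1, p1, k1 = ekf_step(0.0, 1.0, 2.0,
                      f=lambda x: x, h=lambda x: x,
                      df=lambda x: 1.0, dh=lambda x: 1.0,
                      q=0.0, r=1.0)
```

With equal prior and measurement variances the gain is 0.5, so the updated state lies halfway between the prediction and the measurement.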
The attack user classification specifically comprises:
L: Traverse the target items obtained by tracking and prediction to obtain the set of all abnormal users who rated the target item, Att_i = {att_1, att_2, ..., att_l}, and the data matrix D = [d_1, d_2, ..., d_L];
M: Compute the rating mean vector μ of all abnormal users and the rating mean vector μ_j of the j-th item;
N: Compute the within-class scatter matrix S_b and the between-class scatter matrix S_W of the data matrix D;
O: Let a be the dimensionality-reduction transformation vector; the projection of the vector d_i under a is g_i = a^T d_i, and a is obtained by solving the eigenvalue equation, in which λ is the eigenvalue;
P: The best projection of the data matrix D onto the projection plane α is g_i = a^T d_i. The projections of the data matrix are partitioned with the nearest-neighbor algorithm KNN. Attackers among the abnormal users cluster together because their rating features, namely the highest and second-highest ratings, are similar, whereas the projections of abnormal users who are not attackers do not cluster; the attack users are thereby separated out.
3. The recommendation system security detection method according to claim 2, characterized in that, in step K of the item tracking and prediction process, the two necessary conditions for judging whether the attack prediction result is valid are:
1) The prediction result is given after short-term tracking; the prediction result is timely only if the change within the short time Δt is predicted accurately;
2) The relevant behavior of the item can be predicted successfully. If the prediction is correct only for non-target items, its content has no reference value in terms of accuracy; the prediction result is valid only if the system state of the item is predicted accurately, which can be expressed by the following formula:
where AR_u denotes the set of items to which attack user u gave abnormal ratings, that is, the highest and second-highest ratings;
total_{u,j} denotes the total number of times that item j, abnormally rated by user u, is tracked;
Υ_{u,j} denotes the probability that the system accurately predicts item j, abnormally rated by user u;
CONT_{u,j} counts how many times the state transition equation and the observation equation are close within the short time, that is, the number of accurate predictions;
CAL_u denotes the probability that user u may become an attack user by giving abnormal ratings to target items. When the item tracking time exceeds Δt and the CAL_u value for an item exceeds the given prediction threshold η, jump to step L; otherwise return to step E.
4. The recommendation system security detection method according to claim 3, characterized in that Δt ranges from 600 s to 259200 s.
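The branching rule at the end of claim 3, together with the Δt range of claim 4 (600 s is 10 minutes and 259200 s is 72 hours), can be sketched in Python. The value of CAL_u is taken as an input here because its defining formula is not reproduced in this text; the function and constant names are hypothetical.

```python
DELTA_T_MIN_S = 600      # lower bound of delta t from claim 4, in seconds
DELTA_T_MAX_S = 259200   # upper bound of delta t from claim 4, in seconds


def next_step(tracking_time_s, cal_u, eta, delta_t_s):
    """Decide whether to classify (step L) or keep tracking (step E).

    tracking_time_s : how long the item has been tracked, in seconds
    cal_u           : precomputed CAL_u value for the user (assumed given)
    eta             : prediction threshold
    delta_t_s       : short tracking window delta t, in seconds
    """
    if not DELTA_T_MIN_S <= delta_t_s <= DELTA_T_MAX_S:
        raise ValueError("delta t must lie between 600 s and 259200 s")
    # Both conditions must hold before moving on to classification.
    if tracking_time_s > delta_t_s and cal_u > eta:
        return "L"
    return "E"


decision = next_step(tracking_time_s=7200, cal_u=0.8, eta=0.6, delta_t_s=3600)
```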
CN201610120727.2A 2016-03-03 2016-03-03 A kind of commending system safety detection method based on data tracing Active CN105809030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610120727.2A CN105809030B (en) 2016-03-03 2016-03-03 A kind of commending system safety detection method based on data tracing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610120727.2A CN105809030B (en) 2016-03-03 2016-03-03 A kind of commending system safety detection method based on data tracing

Publications (2)

Publication Number Publication Date
CN105809030A CN105809030A (en) 2016-07-27
CN105809030B true CN105809030B (en) 2018-07-10

Family

ID=56466017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610120727.2A Active CN105809030B (en) 2016-03-03 2016-03-03 A kind of commending system safety detection method based on data tracing

Country Status (1)

Country Link
CN (1) CN105809030B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126103B (en) * 2018-10-30 2023-09-26 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN109391626B (en) * 2018-11-15 2021-07-30 东信和平科技股份有限公司 Method and related device for judging whether network attack result is unsuccessful
CN111192429A (en) * 2020-01-20 2020-05-22 天津合极电气科技有限公司 Fire early warning detection method based on charge trajectory tracking technology
CN111175046A (en) * 2020-03-18 2020-05-19 北京工业大学 Rolling bearing fault diagnosis method based on manifold learning and s-k-means clustering
CN112312169B (en) * 2020-11-20 2022-09-30 广州欢网科技有限责任公司 Method and equipment for checking program scoring validity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102118382A (en) * 2010-10-31 2011-07-06 华南理工大学 System and method for detecting attack of collaborative recommender based on interest combination
CN102184364A (en) * 2011-05-26 2011-09-14 南京财经大学 Semi-supervised learning-based recommendation system shilling attack detection method
CN104809393A (en) * 2015-05-11 2015-07-29 重庆大学 Shilling attack detection algorithm based on popularity classification features

Also Published As

Publication number Publication date
CN105809030A (en) 2016-07-27

Similar Documents

Publication Publication Date Title
CN105809030B (en) A kind of commending system safety detection method based on data tracing
Horvat et al. The use of machine learning in sport outcome prediction: A review
Ryman-Tubb et al. How Artificial Intelligence and machine learning research impacts payment card fraud detection: A survey and industry benchmark
Yang et al. Estimating user behavior toward detecting anomalous ratings in rating systems
CN102184364A (en) Semi-supervised learning-based recommendation system shilling attack detection method
CN104484602B (en) A kind of intrusion detection method, device
CN101826105B (en) Phishing webpage detection method based on Hungary matching algorithm
Ramchandran et al. Unsupervised anomaly detection for high dimensional data—An exploratory analysis
Giatsoglou et al. Nd-sync: Detecting synchronized fraud activities
Zhang et al. Graph embedding-based approach for detecting group shilling attacks in collaborative recommender systems
Du et al. An overview of correlation-filter-based object tracking
CN104851025A (en) Case-reasoning-based personalized recommendation method for E-commerce website commodity
Mahmood et al. Using machine learning techniques for rising star prediction in basketball
CN104751353A (en) Cluster and Slope One prediction based collaborative filtering method
JP2020524346A (en) Method, apparatus, computer device, program and storage medium for predicting short-term profits
Maldonado et al. Out-of-time cross-validation strategies for classification in the presence of dataset shift
Roy et al. Exploiting Deep Learning Based Classification Model for Detecting Fraudulent Schemes over Ethereum Blockchain
Geng et al. Novel IAPSO-LSTM neural network for risk analysis and early warning of food safety
CN109857928A (en) User preference prediction technique based on polynary credit evaluation
Mim et al. A soft voting ensemble learning approach for credit card fraud detection
Tekin et al. Customer lifetime value prediction for gaming industry: fuzzy clustering based approach
CN106682875A (en) Data analyzing and processing technology based marketing campaign prize supplier recommendation method
CN114510645B (en) Method for solving long-tail recommendation problem based on extraction of effective multi-target groups
Srivastava et al. Usage of Analytics in the World of Sports
CN108805162A (en) A kind of saccharomycete multiple labeling feature selection approach and device based on particle group optimizing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20160727

Assignee: NUPT INSTITUTE OF BIG DATA RESEARCH AT YANCHENG CO., LTD.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: X2019980001249

Denomination of invention: Data tracking based recommendation system security detection method

Granted publication date: 20180710

License type: Common License

Record date: 20191224
