CN110334780A - Streaming big data security processing - Google Patents

Streaming big data security processing

Info

Publication number
CN110334780A
Authority
CN
China
Prior art keywords
user
node
degree
similarity
network
Prior art date
Legal status
Withdrawn
Application number
CN201910641669.1A
Other languages
Chinese (zh)
Inventor
马涛
Current Assignee
Chengdu Weijia Software Co Ltd
Original Assignee
Chengdu Weijia Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Weijia Software Co Ltd
Priority to CN201910641669.1A
Publication of CN110334780A
Legal status: Withdrawn (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a streaming big data security processing method. The method comprises: calculating the local centripetal degree feature of each user node based on the social network topology; calculating the similarity between the session contents that each user publishes; and, by setting risk discrimination thresholds, identifying as dishonest users those user nodes whose local centripetal degree is below a preset threshold and whose session content similarity is above a preset threshold. The proposed method reduces the number of user behavior features to be analyzed, eliminates redundancy between features, uses a more efficient classification model, effectively improves both the speed and the accuracy of credit evaluation, and is better suited to streaming computation over massive data.

Description

Streaming big data security processing
Technical field
The present invention relates to network security, and in particular to a streaming big data security processing method.
Background art
The development of internet communication and big data technology provides a solid data and technical basis for determining users' credit ratings. Studies have found that a user's online behavior is the form that human behavior takes on the internet; in essence it is consistent with social behavior, and changes in a user's assets and operating status are reflected in their network behavior. The social relationships revealed by network behavior data are increasingly regarded as strongly correlated with a user's credit standing. User credit therefore resides not only in financial statements and mortgage records, but also in unstructured data such as user behavior data and social relationships. These data are generated continuously and fed into data analysis and mining engines. Compared with traditional data, streaming data is real-time, volatile, bursty, random and unbounded. Because internet services demand short system response times, these data generally have to be analyzed and computed in real time. Improving the accuracy and timeliness of user credit computation over massive internet streaming data has therefore become a pressing problem in big data analysis. With network scale now growing geometrically, the volume of data to be inspected is enormous and traditional network analysis and monitoring tools and platforms struggle to cope, while storing and processing large amounts of social network data consumes considerable resources and time. Moreover, as user behavior and social relationships grow more complex, existing methods can neither identify the behavioral features of risk users nor manage dishonest users, and they introduce computational lag.
Summary of the invention
To solve the above problems of the prior art, the invention proposes a streaming big data security processing method, comprising:
Based on the social network topology, the local centripetal degree feature of each user node is calculated; in the social network topology, nodes represent users and edges represent social relationships between users;
The local centripetal degree of a node indicates how much the associated energy of the network decreases after the node is removed from the network;
Wherein the associated energy of the social network graph G is defined as:
$E_L(G) = \sum_{i=1}^{n} \theta_i^2$
where $\theta_1, \ldots, \theta_n$ are the eigenvalues of the Kirchhoff (Laplacian) matrix of G;
the Kirchhoff matrix of G is $L(G) = D(G) - A(G)$;
$A(G)$ is the adjacency matrix of G, and $D(G)$ is the diagonal matrix of vertex out-degrees.
For a topology graph G with n nodes whose out-degrees are $d_1, d_2, \ldots, d_n$ and whose adjacency matrix is symmetric (every relationship is mutual), the associated energy is
$E_L(G) = \sum_{i=1}^{n} d_i^2 + \sum_{i=1}^{n} d_i$
In the social network, users are represented by nodes and social relationships by edges. A directed edge $a = (i, j)$ from node $V_i$ to node $V_j$ indicates that user i has had at least one session with user j.
Let H be the graph obtained by removing vertex v from G; the local centripetal degree of v is then:
$C_v = E_L(G) - E_L(H)$
The local centripetal degree of each user is calculated and compared with a preselected centripetal degree threshold;
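The local centripetal degree can be sketched numerically as follows. This is a minimal illustration, not the patented implementation: it assumes the social graph fits in a dense NumPy adjacency matrix, and the function names `associated_energy` and `local_centrality` are hypothetical. It computes $E_L$ directly from the eigenvalues of $L(G) = D(G) - A(G)$ and obtains $C_v$ by deleting vertex v and recomputing.

```python
import numpy as np


def associated_energy(adj: np.ndarray) -> float:
    """E_L(G): sum of squared eigenvalues of the Kirchhoff matrix L(G) = D(G) - A(G)."""
    degrees = adj.sum(axis=1)            # out-degree of every vertex
    kirchhoff = np.diag(degrees) - adj   # L(G) = D(G) - A(G)
    eigenvalues = np.linalg.eigvals(kirchhoff)
    # For a directed graph the eigenvalues may be complex, but sum(theta^2) equals
    # trace(L @ L), which is real, so only the real part is kept.
    return float(np.real(np.sum(eigenvalues ** 2)))


def local_centrality(adj: np.ndarray, v: int) -> float:
    """C_v = E_L(G) - E_L(H), where H is G with vertex v (and its edges) removed."""
    keep = [i for i in range(adj.shape[0]) if i != v]
    sub = adj[np.ix_(keep, keep)]        # adjacency matrix of H
    return associated_energy(adj) - associated_energy(sub)


if __name__ == "__main__":
    # Undirected path 0 - 1 - 2 stored as a symmetric adjacency matrix.
    path = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]], dtype=float)
    print(associated_energy(path))                        # roughly 10.0 (eigenvalues 0, 1, 3)
    print([local_centrality(path, v) for v in range(3)])  # the centre vertex scores highest
```

Because $\sum_i \theta_i^2 = \operatorname{tr}(L^2)$, the eigen-decomposition is not strictly needed. For an undirected, unweighted graph the energy reduces to $\sum_i d_i^2 + \sum_i d_i$, so removing v gives $C_v = d_v^2 + d_v + 2\sum_{u \in N(v)} d_u$, which depends only on v's degree and its neighbours' degrees; that locality keeps the feature cheap to update on a stream. In the path example the centre vertex scores 10 and each end vertex 6.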
Then the session text content in the streaming data is segmented into phrases, and the semantic distances between these phrases are computed using bag-of-words analysis;
A closed bag-of-words feature set is used to compute session content similarity; each feature set contains a list of words with similar semantics. By checking the similarity of these words, the similarity of the entire content is obtained, from which the similarity between the sessions published by each user is computed;
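The phrase-level similarity step can be illustrated with a small sketch. It assumes that each closed feature set is a hand-made synonym group (the word lists below are invented), uses whitespace tokenisation in place of real phrase segmentation, and scores two sessions by the cosine similarity of their feature-set counts; averaging over all pairs of a user's sessions gives one similarity value per user. Cosine similarity is our choice here; the source does not name the measure.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical closed feature sets: each list groups words with similar meaning.
FEATURE_SETS = [
    ["cheap", "discount", "bargain"],
    ["click", "visit", "open"],
    ["link", "url", "site"],
]
# Map every word to the id of its feature set; unknown words stay as themselves.
WORD_TO_SET = {word: f"set{i}" for i, words in enumerate(FEATURE_SETS) for word in words}


def to_bag(text: str) -> Counter:
    """Bag of feature-set ids (whitespace split stands in for phrase segmentation)."""
    return Counter(WORD_TO_SET.get(tok, tok) for tok in text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def session_similarity(sessions: list[str]) -> float:
    """Mean pairwise similarity between all sessions published by one user."""
    bags = [to_bag(s) for s in sessions]
    pairs = list(combinations(bags, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    spammer = ["Cheap link click", "bargain url visit", "discount site open"]
    normal = ["lunch with friends today", "the match was great"]
    print(session_similarity(spammer), session_similarity(normal))
```

The three spam-like posts use different words but fall into the same feature sets, so their similarity is close to 1, while the two ordinary posts share nothing and score 0. Mapping words to feature sets before comparing is what lets reworded content, which expresses the same meaning with different words, still be detected.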
After the local centripetal degree of each user and the similarity between the sessions each user publishes have been computed, risk discrimination thresholds are set; user nodes whose local centripetal degree is below the preset centripetal degree threshold and whose session content similarity is above the preset similarity threshold are filtered out and identified as dishonest users.
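The final filtering rule is a conjunction of two threshold tests. A minimal sketch, with a hypothetical `UserScores` record and illustrative threshold values (the source leaves both thresholds as presets):

```python
from dataclasses import dataclass


@dataclass
class UserScores:
    user_id: str
    centrality: float   # local centripetal degree C_v
    similarity: float   # similarity between the user's sessions


def flag_dishonest(users: list[UserScores], centrality_threshold: float,
                   similarity_threshold: float) -> list[str]:
    """Return users with low local centrality AND highly similar session content."""
    return [
        u.user_id
        for u in users
        if u.centrality < centrality_threshold and u.similarity > similarity_threshold
    ]


if __name__ == "__main__":
    users = [
        UserScores("a", 2.0, 0.95),   # weakly connected, repetitive posts
        UserScores("b", 40.0, 0.97),  # repetitive but central in the network
        UserScores("c", 1.5, 0.10),   # peripheral but varied posts
    ]
    print(flag_dishonest(users, centrality_threshold=5.0, similarity_threshold=0.8))
```

Requiring both conditions means a well-connected user who happens to repeat content, or a peripheral user with varied posts, is not flagged; only user "a" is reported in the example.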
Compared with the prior art, the present invention has the following advantages:
The invention proposes a streaming big data security processing method that reduces the number of user behavior features to be analyzed, eliminates redundancy between features, uses a more efficient classification model, effectively improves both the speed and the accuracy of credit evaluation, and is better suited to streaming computation over massive data.
Description of the drawings
Fig. 1 is a flow chart of the streaming big data security processing method according to an embodiment of the present invention.
Specific embodiment
A detailed description of one or more embodiments of the invention is provided below together with accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of them.
One aspect of the present invention provides a streaming big data security processing method. Fig. 1 is a flow chart of the streaming big data security processing method according to an embodiment of the present invention.
The present invention detects social network behavior by monitoring social network users' behavior records and generates behavioral risk warning information. The user behavior records include social network session information and, optionally, user transaction records. For session information, port mirroring is performed at the egress of the cluster nodes to import the session messages into a host used for security detection; the raw streaming messages are captured, decoded and preprocessed before being forwarded to the detection engine. Preprocessing includes session classification, fragment reassembly and session reconstruction. After preprocessing, the headers and payloads of the streaming data messages are inspected by matching them against detection rules stored in a database and against predefined risk signatures, so that risk behaviors are identified and intercepted.
The text content in the social network session records includes, but is not limited to, instant messaging chat records, self-media posts, microblog or forum messages, comments on news websites and product reviews on e-commerce websites. These social network behavior records are only examples; the specific behaviors may differ in practice and are not specifically limited here.
In the analysis of social network behavior records, risk behavior features are extracted using rule matching and a user-based behavior model. The social network behavior records are first obtained from the server; the log files are then pattern-matched against the decision rules in the database, and redundant records generated by normal behavior are eliminated before credit evaluation, so that violations present in the records are identified and extracted.
Information generally propagates between users along network relationships, and these hop paths can represent a user's operations when accessing a community website. By scanning the social network graph structure, a pair consisting of the current user to be analyzed and an associated user is established to represent the hop relationship between the two. Server logs are then analyzed to reconstruct the user's paths and behavior on the pages actually accessed.
Based on the behavioral information and warning information provided above, statistics are computed through risk statistics, vulnerability analysis and availability analysis. The streaming training set and test set of social network user behavior are read separately; after standardization, the training and test data are reduced in dimension with principal component analysis (PCA) to remove redundant data, forming the credit data feature set.
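The standardisation and PCA step could look like the sketch below. It assumes numeric feature matrices with one row per user, fits the scaling statistics and the principal axes on the training set only and applies them to the test set, and computes PCA through the singular value decomposition; `standardize_and_pca` and `n_components` are hypothetical names.

```python
import numpy as np


def standardize_and_pca(train: np.ndarray, test: np.ndarray, n_components: int):
    """Z-score with training statistics, then project both sets onto the
    top principal axes of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0                     # constant columns would divide by zero
    train_z = (train - mean) / std
    test_z = (test - mean) / std            # reuse training statistics for the test set
    # Right singular vectors of the centred training matrix are the principal axes.
    _, singular_values, vt = np.linalg.svd(train_z, full_matrices=False)
    components = vt[:n_components]          # shape (n_components, n_features)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return train_z @ components.T, test_z @ components.T, explained[:n_components]
```

Fitting on the training set only keeps test information out of the transformation; the returned explained-variance ratios can guide how many components to keep when redundant, highly correlated features are collapsed.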
The credit data features may be selected, singly or in combination, from the following streaming data sets: historical credit features, such as a user's payment and repayment history through financial websites and purchase, return and order-cancellation records on shopping websites; social relationship features, i.e. the credit data of other users associated with the user in the social network, together with the closeness, depth and breadth of the user's contacts with them, such as relationship duration and session ratio; behavioral preference features, i.e. the user's behavioral patterns derived from the types, time periods and frequency of web pages or applications accessed and from social network reviews; and identity attribute features, i.e. identity-related attributes (age, occupation, marital status, education level) predicted from the user's network behavior and checked for consistency with the basic information the user entered. These feature items are only examples; the number of feature items in an actual sample may be larger or smaller than shown, and the specific items may differ, without limitation here.
Multidimensional data features are selected from the credit data feature set. The data features and type labels of the training set are loaded into a DBN (deep belief network) classifier for training, and the features of the test set are then loaded to predict their types, yielding the classification predictions for the test set. The DBN classifier has multiple hidden layers, and different activation functions are used between hidden layers. The trained classifier predicts the types of the test data, completing machine-learning-based credit grade detection. Finally, the type labels of the test set are compared with the values predicted by the DBN classifier for evaluation.
When classifying the selected credit data features, a hyperplane is preferably defined to divide the data set into two classes, positive samples and negative samples. Assume a linearly separable two-class sample set $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where n is the number of samples and $y_i \in \{+1, -1\}$, satisfying:
$y_i(\omega \cdot x_i + b) - 1 \ge 0$
where ω is the feature weight adjustment parameter and b is the bias. The classifier that minimizes $\|\omega\|^2 / 2$ is the optimal classifier, so solving for the optimal credit data classifier is converted into a quadratic programming problem:
$\max_{a} W(a) = \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j (x_i \cdot x_j)$
where $a_i$ are Lagrange multipliers with $a_i \ge 0$, subject to the constraint:
$\sum_{i=1}^{n} a_i y_i = 0$
From this solution, the optimal classifier function is:
$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} a_i^{*} y_i (x_i \cdot x) + b^{*} \right)$
where sgn is the sign function.
If the optimal classifier cannot separate the two classes, slack (fault-tolerance) factors $\xi_i \ge 0$ are introduced so that:
$y_i(\omega \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, n$
Λ denotes the discrimination threshold of the generalized classifier. With CP denoting the penalty factor, the generalized optimal classifier is obtained by minimizing $\frac{1}{2}\|\omega\|^2 + CP \sum_{i=1}^{n} \xi_i$. Its dual problem is the same as in the linearly separable case, except that the constraint on $a_i$ becomes:
$0 \le a_i \le CP, \quad i = 1, 2, \ldots, n$
For nonlinear classification problems, the associated data are mapped into a higher-dimensional space, where the problem is solved by linear classification of the mapped features. The classification function is then:
$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} a_i^{*} y_i \Phi(x_i, x) + b^{*} \right)$
where Φ is the kernel function $\Phi(x_i, x) = [(x \cdot x_i) + 1]^{d}$, a polynomial kernel of degree d.
Classification of user credit features based on feature selection is then completed according to the value of f(x).
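Once the multipliers $a_i^*$ and bias $b^*$ have been obtained from the quadratic programme, classification only needs the kernel expansion $f(x) = \operatorname{sgn}(\sum_i a_i^* y_i \Phi(x_i, x) + b^*)$. The sketch below assumes the multipliers are already known (a real system would obtain them from a QP solver); the toy support vectors and multipliers are invented so the result can be checked by hand, and a zero score is arbitrarily mapped to +1.

```python
import numpy as np


def poly_kernel(xi: np.ndarray, x: np.ndarray, degree: int = 2) -> float:
    """Polynomial kernel [(x . x_i) + 1]^degree."""
    return float((np.dot(x, xi) + 1.0) ** degree)


def svm_decision(x, support_vectors, labels, alphas, bias, degree=2) -> int:
    """f(x) = sgn(sum_i a_i* y_i K(x_i, x) + b*) for already-solved multipliers."""
    score = sum(
        a * y * poly_kernel(sv, x, degree)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]   # toy support vectors
    labels, alphas = [1, -1], [0.5, 0.5]                    # hand-picked multipliers
    print(svm_decision(np.array([2.0, 3.0]), svs, labels, alphas, 0.0, degree=1))
    print(svm_decision(np.array([-0.5, 1.0]), svs, labels, alphas, 0.0, degree=1))
```

With degree 1 and multipliers 0.5 and 0.5 (which satisfy $\sum_i a_i y_i = 0$), the +1 terms of the kernel cancel and the decision reduces to the sign of the first coordinate, so the two example points are classified +1 and -1.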
Optionally, after the credit feature information has been extracted, deep learning maps the features of the behavioral data of identified risk users and of the current user to be analyzed into a newly built feature space with risk identification capability; the similarity scores of the two are weighted and averaged in this space, yielding the similarity between the behavioral features of the current user to be analyzed and those of the risk users.
In the training stage on streaming data, a large library of credit behaviors is used in advance to train a deep neural network that extracts credit feature information; the training samples come from a social network session dataset and a financial transaction dataset. In this deep neural network structure, each convolutional layer is immediately followed by an activation function layer, which converts linear inputs into nonlinear outputs. The output of a hidden-layer node is:
$h_i(x) = \max_{j \in [1, k]} \left( x^{T} W_{ij} + b_{ij} \right)$
where $W_{ij}$ is the node value in column i, row j of the feature matrix and $b_{ij}$ is the bias (balance factor) of that node. Each hidden-layer unit corresponds to k sub-hidden-layers, and the maximum of the k sub-node outputs is taken as the output of the activation function. With two activation nodes (k = 2), the number of channels is halved after each convolutional layer.
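The activation layer described above is a maxout unit: each output keeps the largest of k affine pieces. A minimal NumPy sketch, assuming the weights are stored as an array of shape (d, m, k); the helper `channel_maxout` illustrates the channel halving obtained with k = 2 by taking the maximum over groups of adjacent channels of a (channels, length) feature map. Both function names are hypothetical.

```python
import numpy as np


def maxout(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """h_i(x) = max_j (x^T W[:, i, j] + b[i, j]), j = 1..k.

    x: (d,) input, W: (d, m, k) weights, b: (m, k) biases -> (m,) output.
    """
    z = np.einsum("d,dmk->mk", x, W) + b   # z[i, j] = x^T W[:, i, j] + b[i, j]
    return z.max(axis=1)                     # keep the largest of the k pieces


def channel_maxout(feature_map: np.ndarray, k: int = 2) -> np.ndarray:
    """Max over groups of k consecutive channels: (C, L) -> (C // k, L)."""
    c, length = feature_map.shape
    return feature_map.reshape(c // k, k, length).max(axis=1)


if __name__ == "__main__":
    fm = np.arange(8.0).reshape(4, 2)        # 4 channels, 2 time steps
    print(channel_maxout(fm).shape)           # channels halved from 4 to 2
```

Taking the maximum rather than a fixed nonlinearity lets each unit learn a piecewise-linear activation, at the cost of k times as many weights per output.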
In the pooling layer, the maximum and the mean of the features in each neighborhood are output as the new feature values of that neighborhood, so that behavioral context information is retained. Specifically, after convolution and activation, the behavioral feature data are max-pooled and average-pooled separately, and the two pooling results are concatenated as the new features.
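The combination of max and average pooling can be sketched as follows, assuming non-overlapping windows along the time axis of a (channels, length) behavior feature map and concatenation along the channel axis; the window size is a free parameter not fixed by the source, and `dual_pool` is a hypothetical name.

```python
import numpy as np


def dual_pool(features: np.ndarray, window: int) -> np.ndarray:
    """Non-overlapping max and average pooling along the last axis, concatenated.

    features: (C, L) with L divisible by window -> (2 * C, L // window).
    """
    c, length = features.shape
    blocks = features.reshape(c, length // window, window)
    max_pooled = blocks.max(axis=2)    # strongest response in each window
    avg_pooled = blocks.mean(axis=2)   # overall level of each window (context)
    return np.concatenate([max_pooled, avg_pooled], axis=0)
```

For a single channel `[1, 3, 2, 6]` with window 2, the max branch gives `[3, 6]` and the average branch `[2, 4]`; keeping both preserves peaks as well as the surrounding level of activity.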
To obtain a network model whose credit feature representation is good enough for feature extraction, the mapping space that the deep neural network learns from a limited number of samples should have intra-class distances as small as possible and inter-class distances as large as possible. A clustering constraint is therefore added to the cost function L, pulling same-class data together and pushing different-class data apart:
In the formula, m is the number of clusters, $x_i$ is a sample feature vector in the i-th cluster, $W^{T}$ is the transpose of the regression matrix, λ is the weight decay parameter, and $c_{x_i}$ is the cluster centre factor corresponding to feature vector $x_i$.
In risk prevention, besides identifying behaviors such as users' transaction defaults, the risk of fraud between individual users in the social network must also be addressed. Fraudulent users typically disguise themselves as normal users to gain others' trust, spread false propaganda online, invest some time and real benefits early on to win victims' trust, and disappear quickly after obtaining illegal gains; others illegally amass wealth through order brushing, crowdfunding and similar schemes. The key to identifying users with fraudulent behavior lies in extracting and representing their behavioral features. The present invention represents user behavior by a credit sequence cluster containing a set of feature sequences; without any annotation or prior knowledge of the behavioral structure, the feature sequence sets are classified and learned directly and automatically.
First, the user's network behavior is decomposed into basic feature sequences; second, the feature sequences are transformed into index sequences. This yields the training behavior set $\{(V_n, y_n)\}$, $n = 1, 2, \ldots, N$, where $V_n$ is the behavior set of a user and $y_n \in \{1, 2, \ldots, C\}$ is the operating-feature type label; N is the number of user operations and C is the number of types. For example, the features analyzed for user fraud include the user's social network topology parameters, friendship duration, the numbers of friends added and deleted within a preset time, the ratio of friendship duration to the time of fund transfers, the number of deleted friends to whom funds were transferred in the same period, and the ratio of such transfers to the total number of transfers.
The behavior $V_n$ is then expressed as a feature sequence $X_n$, defined as:
$X_n = [X_{1,n}, \ldots, X_{i,n}, \ldots, X_{l_n,n}]$
where $X_{i,n}$ is the feature set computed for the i-th period and $l_n$ is the number of periods in $V_n$.
A set of feature sequences is expressed as $\mu = \{p_i \mid i = 1, \ldots, N_p\}$, where $N_p$ is the number of feature sequence sets. The i-th feature sequence $p_i$ is defined as $\{X_i, \tau_i\}$,
where $\tau_i$ is the detection threshold.
To compute $X_i$, all training feature sequences $\{X_1, \ldots, X_N\}$ are first matrix-transformed to obtain representative periods and an index that clusters all periods. The transformation matrix A is expressed as follows,
where the two quantities compared are the Fisher vectors describing type t in $X_{i,n}$ and $X_{k,m}$, respectively.
Then, for the i-th credit sequence cluster, a detection threshold $\tau_i$ is set when building the training data sequences, so that noisy sequence patterns are not mined.
A feature sequence $X_n$ is converted into an index sequence, expressed as
$I_n = [I_{1,n}, \ldots, I_{i,n}, \ldots, I_{l_n,n}]$
where $I_{i,n}$ is the i-th feature sequence index: $X_n$ is processed with the feature sequence detection model, and the index $I_{i,n}$ is chosen so that the SVM response is maximal.
From the trained index sequences $[I_1, I_2, \ldots, I_N]$, a feature sequence set R is obtained by a data mining algorithm. R represents the local feature structure of a user operation, and its j-th sequence $R_j$ is defined as:
$R_j = \{c_j, s_j, x_j, w_j\}$
where $c_j \in \{1, 2, \ldots, C\}$ is the operating-feature type label, $s_j$ is the sequence pattern, $x_j$ is the feature of the feature sequence set, and $w_j$ is the weight of operating-feature type $c_j$ in $R_j$.
To compute $s_j$, the training data indices are first collected, and sequence patterns are then computed from the collected training index sequences. Because the same sequence pattern can be mined from two operating-feature classes, a weight $w_j$ is set for pattern $s_j$ to express its relative support. If the same pattern occurs in two or more operating-feature types, the weight of the feature sequence set is reduced; if a pattern appears in only one type, its weight reaches the maximum value of 1.
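The weight $w_j$ can be read as a relative support. One plausible interpretation, which is our assumption rather than a formula given in the source, is the fraction of the pattern's occurrences that fall in its own class: it equals 1 when the pattern is mined from one class only and decreases as other classes share it. `pattern_weight` is a hypothetical name.

```python
from collections import Counter


def pattern_weight(pattern_counts: Counter, target_class: int) -> float:
    """Relative support of a sequence pattern for one operating-feature class.

    pattern_counts maps class label -> number of training index sequences of that
    class in which the pattern was mined. Returns 1.0 when the pattern occurs only
    in target_class and a smaller value when other classes share it.
    """
    total = sum(pattern_counts.values())
    return pattern_counts[target_class] / total if total else 0.0
```

A pattern seen five times in class 1 and nowhere else gets weight 1.0; the same pattern seen five times in each of two classes gets 0.5, so it contributes less to either class's evaluation function.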
Each feature sequence set represents one specific operation type and preserves the temporal relationships among the feature sequences; because the types are diverse, complex features can be modeled effectively. For a test behavior $V_T$ and feature sequence set R, the evaluation function of an operating feature c can be expressed as:
In the formula, $\alpha_{j,c}$, $\beta_{j,c}$ and $\gamma_{j,c}$ are the parameters of the j-th feature sequence set for operating-feature type c, $N_R$ is the number of feature sequence sets, $I_T$ is the sequence index, $X_T$ is the feature sequence of $V_T$, and $\sigma(I_T, s_j)$ is the sequence credit feature, which measures the structural similarity between the test behavior and the feature sequence set. With the initial values $F(n, 0) = 0$ for $n \in [0, L]$ and $F(0, m) = -m$ for $m \in [0, m_j]$, where L is the number of periods in $I_T$ and $m_j$ is the length of sequence pattern $s_j$, the matching matrix F is defined as follows:
$F(n, m) = \max\{F(n-1, m-1) + A(X_{n,T}, X_{m,j}),\ F(n-1, m) - 1,\ F(n, m-1) - 1\}$
The sequence credit feature matches a long sequence describing the whole operation structure (the test sequence) against a short sequence describing part of the structure of one operating feature. When $s_j$ matches the test sequence, $\sigma(I_T, s_j)$ attains its maximum score:
$\sigma(I_T, s_j) = \max_{n} \frac{F(n, m_j)}{m_j}$
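The matching matrix F and the score $\sigma(I_T, s_j)$ amount to a semi-global alignment: the pattern must be matched in full, but it may start at any period of the test sequence. The sketch below uses a unit penalty for skipping an element of either sequence, consistent with the initial values $F(0, m) = -m$; the similarity function standing in for $A(X_{n,T}, X_{m,j})$ is a hypothetical +1/-1 match score, not the Fisher-vector based matrix described earlier.

```python
import numpy as np


def sequence_score(test_seq, pattern, similarity) -> float:
    """sigma(I_T, s_j) = max_n F(n, m_j) / m_j computed from the matching matrix F.

    F(n, 0) = 0 lets the pattern start anywhere in the test sequence;
    F(0, m) = -m charges -1 for every pattern element left unmatched.
    """
    L, mj = len(test_seq), len(pattern)
    F = np.zeros((L + 1, mj + 1))
    F[0, :] = -np.arange(mj + 1)
    for n in range(1, L + 1):
        for m in range(1, mj + 1):
            F[n, m] = max(
                F[n - 1, m - 1] + similarity(test_seq[n - 1], pattern[m - 1]),
                F[n - 1, m] - 1,   # skip a period of the test sequence
                F[n, m - 1] - 1,   # skip an element of the pattern
            )
    return float(F[:, mj].max() / mj)


if __name__ == "__main__":
    match = lambda a, b: 1.0 if a == b else -1.0   # hypothetical similarity A
    print(sequence_score([0, 1, 2, 3], [1, 2], match))   # pattern found inside the sequence
    print(sequence_score([0, 1, 2, 3], [5, 6], match))   # pattern absent
```

Normalising by $m_j$ makes scores of short and long patterns comparable: a pattern found intact scores 1.0 under this similarity, and one that never occurs scores -1.0.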
In the understanding and cognition stage of credit data features, a stepwise analysis algorithm is used to identify the sequence credit feature $\sigma(I_T, s_j)$ accurately and quickly: the within-class scatter matrix $S_w$ is made as small as possible and the between-class scatter matrix $S_b$ as large as possible, to achieve optimal classification performance. The Fisher function J is computed as:
$J(\varphi) = \frac{\varphi^{T} S_b \varphi}{\varphi^{T} S_w \varphi}$
where φ is an n-dimensional vector. The φ that maximizes J is chosen as the projection direction, giving the largest $S_b$ and the smallest $S_w$ after projection; a group of d optimal discriminant vectors is selected to build the projection matrix W:
$W = [\varphi_1, \varphi_2, \ldots, \varphi_d]$
Finally, in the learning based on stepwise analysis, PCA is applied to reduce the dimension of the projection matrix W, eliminating redundant feature information and completing the identification of risk user features.
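The Fisher criterion $J(\varphi) = \varphi^T S_b \varphi / \varphi^T S_w \varphi$ can be maximised in closed form by solving the generalised eigenproblem $S_b \varphi = \lambda S_w \varphi$ and keeping the eigenvectors with the largest eigenvalues as the columns of W. A minimal NumPy sketch; the small ridge added to $S_w$ is an assumption to keep it invertible, and `fisher_projection` is a hypothetical name.

```python
import numpy as np


def fisher_projection(X: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Return W = [phi_1, ..., phi_d], the directions maximising
    J(phi) = (phi^T S_b phi) / (phi^T S_w phi)."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)      # within-class scatter
        diff = (mean_c - overall_mean)[:, None]
        S_b += len(Xc) * diff @ diff.T               # between-class scatter
    # Generalised eigenproblem S_b phi = lambda S_w phi; the ridge keeps S_w invertible.
    M = np.linalg.solve(S_w + 1e-9 * np.eye(n_features), S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]           # largest J first
    return eigvecs[:, order[:d]].real
```

With C classes, $S_b$ has rank at most C - 1, so at most C - 1 directions carry discriminative information; this bounds the useful d before any further PCA step.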
After the credit risk identification of some users has been completed, the credit evaluation of other, new users can be based on a deep network analysis of whether the behavior pattern of the current new user is similar to that of risk users, thereby identifying shared features between risk users and the current user to be analyzed. Specifically, feature-sample pairs $(x_f, x_c)$ of confirmed risk users and the current user to be analyzed are first recorded, where $x_f$ and $x_c$ are the credit feature vectors of the risk user and the user to be analyzed, respectively. The goal of deep learning is to find a mapping function f such that $f(x_f)$ and $f(x_c)$ satisfy the following relations in the newly built feature space: when the new user has behavior-pattern features similar to the risk user's, the distance between $f(x_c)$ and $f(x_f)$ is as small as possible; when the user does not, the distance between $f(x_f)$ and $f(x_c)$ is as large as possible.
To simplify the problem further, a convolutional network is trained before the deep learning algorithm. By learning a set of hierarchical nonlinear transformations, it projects the feature-sample pairs into the newly built feature space, in which positive pairs exceed a preset threshold and negative pairs fall below it, so that the depth network can make accurate judgments.
Suppose the depth network has M layers, with p(m) neurons in layer m, m = 1, 2, 3, …, M. For a given user behavior feature vector x, the output of layer m is:
$h^{(m)} = \tanh\left( W^{(m)} h^{(m-1)} + b^{(m)} \right)$
where $h^{(0)} = x$, $W^{(m)}$ is the weight parameter of layer m and $b^{(m)}$ is the bias of layer m. Passing $x_f$ and $x_c$ through the above M nonlinear layers gives:
$F(x_f) = h_f^{(M)}, \quad F(x_c) = h_c^{(M)}$, and the squared distance between the risk user and the current user to be analyzed in the new feature space is: $d^2_{fc}(x_f, x_c) = \|F(x_f) - F(x_c)\|_2^2$
The behavior-pattern similarity measure between the user and the risk user should then satisfy:
if $d^2_{fc}(x_f, x_c) < \tau - 1$, then $x_f$ and $x_c$ have behavior-pattern similarity;
if $d^2_{fc}(x_f, x_c) > \tau + 1$, then $x_f$ and $x_c$ have no behavior-pattern similarity;
where τ is the preset risk distance threshold, so that positive and negative samples are well separated in the newly built feature space.
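The layered tanh mapping and the two-sided distance test can be sketched as below. The sketch covers only the forward pass and the decision rule: it assumes the weights have already been trained, the hand-set weights in the example are invented, and distances falling between $\tau - 1$ and $\tau + 1$ are reported as undecided because the source does not say how that band is handled. `forward` and `pair_decision` are hypothetical names.

```python
import numpy as np


def forward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """h^(m) = tanh(W^(m) h^(m-1) + b^(m)) with h^(0) = x; returns h^(M)."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)
    return h


def pair_decision(x_f, x_c, weights, biases, tau: float) -> str:
    """Compare d^2_fc = ||F(x_f) - F(x_c)||^2 against the margins tau - 1 and tau + 1."""
    diff = forward(x_f, weights, biases) - forward(x_c, weights, biases)
    d2 = float(diff @ diff)
    if d2 < tau - 1:
        return "similar"
    if d2 > tau + 1:
        return "dissimilar"
    return "undecided"   # inside the margin band; handling is not specified


if __name__ == "__main__":
    weights, biases = [10.0 * np.eye(2)], [np.zeros(2)]   # one hand-set layer (M = 1)
    risk_user = np.array([1.0, 1.0])
    print(pair_decision(risk_user, np.array([1.0, 1.0]), weights, biases, tau=2.0))
    print(pair_decision(risk_user, np.array([-1.0, -1.0]), weights, biases, tau=2.0))
```

Because tanh bounds every output in [-1, 1], the squared distance can never exceed 4 times the output dimension, which constrains sensible values of τ.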
The optimization objective function is then set as follows:
where
β is an adjustment operator; the weight parameters W and biases b are obtained from the above formula by stochastic gradient descent.
Deep learning yields the new feature representation pair $(x'_f, x'_c)$, and a similarity algorithm then gives the behavior-pattern similarity $S_{fc}(x'_f, x'_c)$ between the current user to be analyzed and a given risk user:
This is the final user similarity estimate, where $x'_{fi}$ and $x'_{ci}$ are the i-th components of feature vectors $x'_f$ and $x'_c$, and d is the dimension of the feature vectors.
If the behavior-pattern similarity between the current user to be analyzed and a risk user is greater than a preset threshold, the current user is identified as a dishonest user.
Besides the above features based on the social network graph, user credit evaluation also requires semantic analysis of session content. For example, some advertising users attract legitimate users' visits by repeatedly sending similar content, and use tools to re-issue content that expresses the same meaning with different wording, which makes them harder to distinguish from normal users. For this reason, the embodiment of the present invention computes the local centripetal degree feature of each user in the social network topology and identifies risk users disguised as normal users.
In the social network, users are represented by nodes and social relationships by edges. A directed edge $a = (i, j)$ from node $V_i$ to node $V_j$ indicates that user i has had at least one session with user j. Even if dishonest users change their own attributes, it is much harder for them to change their position in the social network topology. Therefore, the following features of user nodes are computed based on this topology.
The local centripetal degree of a node is the degree to which the network's associated energy decreases after the node is removed from the network. It takes into account not only local density information but also bottleneck information. The associated energy of a topology graph is defined as:
$E_L(G) = \sum_{i=1}^{n} \theta_i^2$
Here $\theta_i$ ($i = 1, \ldots, n$) are the eigenvalues of the Kirchhoff (Laplacian) matrix of graph $G$, and their sum equals the sum of all vertex out-degrees. Let $A(G)$ be the adjacency matrix of $G$ and $D(G)$ the diagonal matrix of vertex out-degrees. The Kirchhoff matrix of $G$ is then $L(G) = D(G) - A(G)$.
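The following is a NumPy sketch of these definitions, assuming a dense adjacency matrix in which row sums give the out-degrees. For a directed graph the Kirchhoff matrix need not be symmetric, so its eigenvalues can be complex. The sum of their squares is still real because it equals $\operatorname{tr}(L^2)$.

```python
import numpy as np


def kirchhoff_matrix(adjacency: np.ndarray) -> np.ndarray:
    """L(G) = D(G) - A(G), where D(G) holds vertex out-degrees (row sums of A)."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a


def associated_energy(adjacency: np.ndarray) -> float:
    """E_L(G): sum of the squared eigenvalues of L(G)."""
    a = np.asarray(adjacency, dtype=float)
    if a.size == 0:
        return 0.0  # the empty graph has no energy
    theta = np.linalg.eigvals(kirchhoff_matrix(a))
    # Complex eigenvalues (directed case) still square-sum to trace(L @ L),
    # a real number, so only the real part is kept.
    return float(np.sum(theta ** 2).real)


triangle = np.ones((3, 3)) - np.eye(3)  # undirected triangle, every degree is 2
print(round(associated_energy(triangle), 6))  # 18.0 (eigenvalues 0, 3, 3)
```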
Consider a topological graph $G$ with $n$ vertices whose out-degrees are $d_1, d_2, \ldots, d_n$. Its associated energy can be expressed in terms of these degrees and reflects how densely connected the graph is internally. Removing a vertex lowers the associated energy of the graph, and the size of the drop in $E_L(G)$ reflects how important that vertex is in the graph. Let $H$ be the graph obtained by removing vertex $v$ from $G$. The local centripetal degree of vertex $v$ is:
$C_v = E_L(G) - E_L(H)$
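The source works with out-degrees of directed edges. For the special case of an undirected, unweighted graph with $m$ edges, which is an assumption here, both quantities have closed forms in the vertex degrees. This makes the local-density term and the neighbor (bottleneck) term explicit. $N(v)$ denotes the neighbor set of $v$.

```latex
% Undirected, unweighted G: L(G) is symmetric, L_{ii} = d_i, L_{ij} = -1 for each neighbor j.
E_L(G) = \sum_{i=1}^{n} \theta_i^2 = \operatorname{tr}\!\left(L(G)^2\right)
       = \sum_{i=1}^{n} \sum_{j=1}^{n} L_{ij}^2
       = \sum_{i=1}^{n} d_i^2 + \sum_{i=1}^{n} d_i
       = \sum_{i=1}^{n} d_i^2 + 2m

% Removing v deletes its own term d_v^2 + d_v and lowers each neighbor's degree by one:
% (d_u^2 + d_u) - ((d_u - 1)^2 + (d_u - 1)) = 2 d_u
C_v = E_L(G) - E_L(H) = d_v^2 + d_v + 2 \sum_{u \in N(v)} d_u
```

A user with few neighbors, who are themselves weakly connected, has a small $d_v$ and small neighbor degrees, and therefore a small $C_v$.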
Dishonest users have unstable social-network structures and only weak relationships with their neighbor nodes. Removing these socially unimportant dishonest users from the network therefore lowers the network's energy only slightly.
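The following sketch computes $C_v$ on a dense adjacency matrix by recomputing the energy with and without each vertex. It uses $\operatorname{tr}(L^2)$, which equals the sum of the squared eigenvalues. Recomputing from scratch for every vertex is fine for illustration but too slow for a large streaming graph. The star-graph example is hypothetical.

```python
import numpy as np


def associated_energy(adjacency: np.ndarray) -> float:
    """E_L(G) for L = D - A, computed as trace(L @ L) (= sum of squared eigenvalues)."""
    a = np.asarray(adjacency, dtype=float)
    if a.size == 0:
        return 0.0
    lap = np.diag(a.sum(axis=1)) - a
    return float(np.trace(lap @ lap))


def local_centripetal_degree(adjacency: np.ndarray, v: int) -> float:
    """C_v = E_L(G) - E_L(H), where H is G with vertex v and its edges removed."""
    a = np.asarray(adjacency, dtype=float)
    keep = [i for i in range(a.shape[0]) if i != v]
    h = a[np.ix_(keep, keep)]  # degrees in H are recomputed from the reduced matrix
    return associated_energy(a) - associated_energy(h)


# Undirected star: vertex 0 is the hub, vertices 1-3 are leaves.
star = np.zeros((4, 4))
star[0, 1:] = star[1:, 0] = 1
print([local_centripetal_degree(star, v) for v in range(4)])  # [18.0, 8.0, 8.0, 8.0]
```

The loosely attached leaves score far lower than the hub. That is the signal the method thresholds on.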
Dishonest users act out of specific commercial interests, so the session content they publish tends to be highly similar. It often contains large amounts of repeated session text, harmful links, and other closely related material. The session text in the streaming data is therefore first decomposed into phrases, and the semantic distance between these phrases is then computed with a bag-of-words analysis. Content similarity is computed with closed bag-of-words feature sets. Each feature set contains a list of words with similar semantics. Checking the similarity of these words yields the similarity of the entire content. From this, the similarity between the session contents that each user publishes at different times is computed.
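The following sketch illustrates closed feature sets. Each word is mapped to the id of the feature set that contains it, so synonyms count as the same feature. Messages are then compared as bags of feature ids. Several parts are assumptions and not taken from the source: the `FEATURE_SETS` contents, whitespace tokenization (real Chinese text would need word segmentation), cosine similarity between bags, and averaging over all message pairs per user.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List

# HYPOTHETICAL closed feature sets: each list groups words with similar semantics.
FEATURE_SETS: List[List[str]] = [
    ["cheap", "discount", "bargain"],
    ["click", "visit", "open"],
    ["link", "url", "site"],
]
WORD_TO_SET = {word: k for k, words in enumerate(FEATURE_SETS) for word in words}


def feature_bag(text: str) -> Counter:
    """Map each token to its feature-set id; words outside every set stay as themselves."""
    tokens = text.lower().split()  # ASSUMPTION: whitespace tokenization
    return Counter(WORD_TO_SET.get(t, t) for t in tokens)


def content_similarity(a: str, b: str) -> float:
    """Cosine similarity between the feature bags of two messages."""
    bag_a, bag_b = feature_bag(a), feature_bag(b)
    dot = sum(count * bag_b[key] for key, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_content_similarity(messages: List[str]) -> float:
    """Mean pairwise similarity over one user's messages (0.0 with fewer than two)."""
    pairs = list(combinations(messages, 2))
    if not pairs:
        return 0.0
    return sum(content_similarity(a, b) for a, b in pairs) / len(pairs)


print(content_similarity("cheap link click now", "bargain url visit now"))  # 1.0
```

The two example messages share no literal word except "now". They still score 1.0 because every other word falls in the same feature set, which is how rephrased spam is caught.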
Once each user's local centripetal degree and the similarity between that user's session contents have been obtained, risk discrimination thresholds are set. User nodes whose local centripetal degree is below the preset centripetal-degree threshold and whose session-content similarity is above the preset similarity threshold are filtered out and identified as dishonest users.
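The two-threshold filter reduces to a conjunction of both conditions. In this sketch, the dictionary inputs and threshold values are hypothetical, and a user with no recorded similarity is assumed to have similarity 0.0.

```python
from typing import Dict, Hashable, List


def flag_dishonest_users(centripetal: Dict[Hashable, float],
                         similarity: Dict[Hashable, float],
                         centripetal_threshold: float,
                         similarity_threshold: float) -> List[Hashable]:
    """Users whose local centripetal degree is below its threshold AND whose
    session-content similarity is above its threshold (both must hold)."""
    return [u for u, c in centripetal.items()
            if c < centripetal_threshold
            # ASSUMPTION: users without a similarity score default to 0.0
            and similarity.get(u, 0.0) > similarity_threshold]
```

A peripheral user who posts varied content, or a central user who posts repetitive content, is not flagged. Only the combination of both signals triggers the filter.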
A significant proportion of the default transactions that bring risk to the system occur because a user's upper-level node in a chain transaction has low credit. The invention therefore also performs risk-diffusion identification for risk-diffusion behavior in chain transactions. A passive credit-reduction threshold is set for each user from the average of all of that user's transactions over a past period. When several transactions occur at the same time, the influence of the network structure on diffusion is also taken into account. A network $G(V, E)$ is built from real transaction data. The nodes $V$ are the set of all transacting users, where $S(x)$ is the set of risk-free users and $I(x)$ is the set of risky users. The edges $E$ are the set of transactions between users in the network. The weight on edge $E_{ij}$ is denoted $\{a_{ij}\}$ and gives the number of transactions between the two users. The state of user $i$ is $n_i$, where $n_i = 1$ means default and $n_i = 0$ means no default. The state of transaction $E_{ij}$ is $e_{ij}$, where $e_{ij} = 1$ means the transaction is abnormal and $e_{ij} = 0$ means it is normal. The passive credit-reduction count of user $j$ is $d_j = \sum_i A_{ij} a_{ij} e_{ij}$. The passive credit-reduction thresholds are denoted $\{\delta_i\}$, the passive credit-reduction counts $\{F_i\}$, and the set of users whose credit has been passively reduced through risk diffusion is $Risk(x)$. The diffusion process is as follows:
A) Initialize all users in the normal state (S), then randomly switch a subset of users to the risk state (I). That is, randomly change $n_i$ from 0 to 1 for a subset of users, and mark a transaction $E_{ij}$ of each such user as defaulted, $e_{ij} = 1$.
B) Add the number of defaults to the upper-level user. Once the upper-level user's passive credit-reduction count exceeds the given threshold, that is, when $d_j = \sum_i A_{ij} a_{ij} e_{ij} > \delta_j$, that user changes from S to I.
C) Record the number $F_i$ of times each user $i$ has spread risk and turned users into the risk state. The diffusion process ends when the set $Risk(x)$ of users with passively reduced credit is the same after two consecutive rounds of diffusion.
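Steps A to C describe a threshold-diffusion process. The simulation below is a sketch under several assumptions that the source does not fix:

- An edge `(i, j)` means user `i` trades with its upper-level user `j`.
- Once a user is risky, all of its transactions count as abnormal ($e_{ij} = 1$).
- Users without a threshold never switch state.
- $F_i$ counts the users that turn risky in a round where `i` is risky and connected to them.

Seeds can be passed explicitly. Otherwise they are sampled with a fixed-seed `random.Random` so that results are reproducible.

```python
import math
import random
from typing import Dict, Hashable, Iterable, Optional, Tuple

Edge = Tuple[Hashable, Hashable]  # (i, j): user i trades with upper-level user j


def simulate_risk_diffusion(weights: Dict[Edge, int],
                            delta: Dict[Hashable, float],
                            seeds: Optional[Iterable[Hashable]] = None,
                            seed_fraction: float = 0.1,
                            rng: Optional[random.Random] = None):
    """Threshold diffusion over a transaction network (steps A-C).

    weights[(i, j)] = a_ij, the number of transactions on edge E_ij.
    delta[j]        = passive credit-reduction threshold of user j.
    Returns (risk_set, spread_count): risk_set stands for Risk(x), the users
    whose credit was passively reduced; spread_count[i] is the ASSUMED F_i.
    """
    users = {u for edge in weights for u in edge} | set(delta)
    if seeds is None:  # Step A: random initial risk users
        rng = rng or random.Random(0)
        k = min(len(users), max(1, int(seed_fraction * len(users))))
        seeds = rng.sample(sorted(users, key=str), k)
    seeds = set(seeds)
    risky = set(seeds)
    spread_count = dict.fromkeys(users, 0)
    while True:
        d = dict.fromkeys(users, 0)
        for (i, j), a_ij in weights.items():
            if i in risky:  # ASSUMPTION: e_ij = 1 for every trade of a risky user
                d[j] += a_ij  # d_j = sum_i A_ij * a_ij * e_ij
        # Step B: users whose count exceeds their threshold switch from S to I.
        newly = {j for j in users - risky if d[j] > delta.get(j, math.inf)}
        if not newly:  # Step C: Risk(x) unchanged between rounds, so stop
            return risky - seeds, spread_count
        for i, j in weights:
            if i in risky and j in newly:
                spread_count[i] += 1
        risky |= newly
```

With edges `a→b` (3 trades), `b→c` (5), `c→d` (1), thresholds `{a: 1, b: 2, c: 4, d: 2}`, and seed `{"a"}`, risk spreads to `b` and then to `c`, and stops before `d`.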
Default transactions are extracted from the transaction network to form a sub-network, and the passive credit-reduction counts $F_i$ are sorted in descending order. The top $X$ users in the sub-network with the highest $F_i$ are taken as the identified high-risk propagating users, where $X$ is a preset quantity threshold.
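Ranking the sub-network by $F_i$ is a sort restricted to the endpoints of the default transactions. In this sketch, the input format is hypothetical and ties are broken by user name. The tie-breaking rule is an assumption that keeps the output deterministic.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def top_risk_propagators(spread_count: Dict[Hashable, int],
                         default_edges: Iterable[Tuple[Hashable, Hashable]],
                         x: int) -> List[Hashable]:
    """Users of the default-transaction sub-network ranked by F_i (descending), top X."""
    sub_nodes = {u for edge in default_edges for u in edge}
    ranked = sorted(sub_nodes, key=lambda u: (-spread_count.get(u, 0), str(u)))
    return ranked[:x]
```

Users outside the default sub-network are excluded even when their $F_i$ is high.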
After the identification phase for the streaming data is complete, the invention applies further restriction or control strategies. These strategies limit the violations of risk users and dishonest users and reduce the adverse effects of illegal operations on normal users and on the social network environment. They include, but are not limited to, the following:
1: Restrict the user's ability to expand their social circle. When the user searches for other new users, or the system recommends new friends to the user, reduce the number of objects visible to the user or pushed by the system. Specifically, the new users that could originally be recommended to the user are sorted by risk value from low to high, and a preset proportion of new users at the high-risk end of this ranking is hidden. This limits the influence of risk users on normal users.
2: Mark the user. If the user produces no violations within a certain period but defaults again after that period, the user is again identified as a dishonest user and placed in a risk grade higher than the original control level.
3: If a user is determined to be a malicious defaulter, take mandatory control measures, including freezing the account, to prevent the user from continuing to affect the social network environment.
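Strategy 1 amounts to sorting the candidates by risk and truncating the list. In this sketch, the risk values are hypothetical inputs, and rounding the hidden count down is an assumption.

```python
from typing import Dict, Hashable, List, Sequence


def visible_recommendations(candidates: Sequence[Hashable],
                            risk: Dict[Hashable, float],
                            hide_ratio: float) -> List[Hashable]:
    """Sort candidate new users by risk value (low to high) and hide the preset
    proportion at the high-risk end (ASSUMPTION: hidden count is rounded down)."""
    if not 0.0 <= hide_ratio <= 1.0:
        raise ValueError("hide_ratio must lie in [0, 1]")
    ranked = sorted(candidates, key=lambda u: risk[u])
    n_hide = int(len(ranked) * hide_ratio)
    return ranked[:len(ranked) - n_hide]
```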
In summary, the invention proposes a streaming big data security processing method. It reduces the number of behavioral features of the user to be analyzed, eliminates redundancy between features, and uses a more efficient classification model. Together these improve both the speed and the accuracy of credit evaluation and better suit streaming computation over massive data.
Those skilled in the art should understand that each module or step of the invention described above can be implemented with a general-purpose computing system. The modules or steps may be concentrated in a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system and stored in a storage system for execution by that computing system. The invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the specific embodiments of the invention described above only illustrate or explain the principles of the invention and do not limit it. Any modification, equivalent replacement, improvement, or the like made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the claims, or within equivalents of that scope and those boundaries.

Claims (1)

1. A streaming big data security processing method, characterized by comprising:
computing, based on the social network topology, the local centripetal degree feature of each user node, where the nodes of the social network topology represent users and the edges represent the social relationships between users;
the local centripetal degree of a node denotes the amount by which the associated energy of the network decreases after the node is removed from the network;
where the associated energy of the social network graph G is defined as:
$E_L(G) = \sum_{i=1}^{n} \theta_i^2$
$\theta_i$ denotes the eigenvalues of the Kirchhoff (Laplacian) matrix of graph $G$;
the Kirchhoff matrix of graph $G$ is $L(G) = D(G) - A(G)$;
$A(G)$ is the adjacency matrix of graph $G$, and $D(G)$ is the diagonal matrix of vertex out-degrees;
for a topological graph $G$ with $n$ nodes whose out-degrees are $d_1, d_2, \ldots, d_n$, the associated energy is expressed in terms of these out-degrees;
in the social network, nodes represent users and edges represent social relationships, and an edge $a = (i, j)$ directed from node $V_i$ to node $V_j$ indicates that user $i$ has had at least one session with user $j$;
letting $H$ be the graph obtained by removing vertex $v$ from graph $G$, the local centripetal degree of vertex $v$ is:
$C_v = E_L(G) - E_L(H)$
computing the local centripetal degree of each user and comparing it with a preselected centripetal-degree threshold;
then decomposing the session text in the streaming data into phrases and computing the semantic distance between these phrases by bag-of-words analysis;
computing session-content similarity with closed bag-of-words feature sets, where each feature set contains a list of words with similar semantics; obtaining the similarity of entire contents by checking the similarity of these words, and thereby computing the similarity between the session contents each user publishes at different times;
after the local centripetal degree of each user and the similarity between the session contents each user publishes at different times have been computed, setting risk discrimination thresholds, filtering out the user nodes whose local centripetal degree is below the preset centripetal-degree threshold and whose session-content similarity is above the preset similarity threshold, and identifying them as dishonest users.
CN201910641669.1A 2019-07-16 2019-07-16 Streaming big data security processing Withdrawn CN110334780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910641669.1A CN110334780A (en) 2019-07-16 2019-07-16 Streaming big data security processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910641669.1A CN110334780A (en) 2019-07-16 2019-07-16 Streaming big data security processing

Publications (1)

Publication Number Publication Date
CN110334780A true CN110334780A (en) 2019-10-15

Family

ID=68145255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910641669.1A Withdrawn CN110334780A (en) 2019-07-16 2019-07-16 Streaming big data security processing

Country Status (1)

Country Link
CN (1) CN110334780A (en)

Similar Documents

Publication Publication Date Title
CN110348528A (en) Method is determined based on the user credit of multidimensional data mining
US11080709B2 (en) Method of reducing financial losses in multiple payment channels upon a recognition of fraud first appearing in any one payment channel
Olszewski A probabilistic approach to fraud detection in telecommunications
US7721336B1 (en) Systems and methods for dynamic detection and prevention of electronic fraud
Lopez-Rojas et al. Money laundering detection using synthetic data
Lekha et al. Data mining techniques in detecting and predicting cyber crimes in banking sector
Singh et al. Fraud detection by monitoring customer behavior and activities
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
CN109829721B (en) Online transaction multi-subject behavior modeling method based on heterogeneous network characterization learning
CN110347669A (en) Risk prevention method based on streaming big data analysis
CN113139876B (en) Risk model training method, risk model training device, computer equipment and readable storage medium
Mezei et al. Credit risk evaluation in peer-to-peer lending with linguistic data transformation and supervised learning
Badawi et al. Detection of money laundering in bitcoin transactions
Barman et al. A complete literature review on financial fraud detection applying data mining techniques
Lata et al. A comprehensive survey of fraud detection techniques
CN115438102A (en) Space-time data anomaly identification method and device and electronic equipment
Torres et al. A proposal for online analysis and identification of fraudulent financial transactions
Reddy et al. CNN-Bidirectional LSTM based Approach for Financial Fraud Detection and Prevention System
Abdulghani et al. Credit card fraud detection using XGBoost algorithm
Ni et al. A Victim‐Based Framework for Telecom Fraud Analysis: A Bayesian Network Model
Xiao et al. Explainable fraud detection for few labeled time series data
CN110334780A (en) Streaming big data security processing
Hanae et al. End-to-End Real-time Architecture for Fraud Detection in Online Digital Transactions
CN116451050A (en) Abnormal behavior recognition model training and abnormal behavior recognition method and device
Xu et al. Multi-view Heterogeneous Temporal Graph Neural Network for “Click Farming” Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191015
