CN108304479A - A kind of fast density cluster double-layer network recommendation method based on graph structure filtering - Google Patents
A fast density clustering double-layer network recommendation method based on graph structure filtering
- Publication number
- CN108304479A, CN201711469928.4A
- Authority
- CN
- China
- Prior art keywords
- user
- comment
- cluster
- score
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
A fast density clustering double-layer network recommendation method based on graph structure filtering, comprising the following steps: 1) based on historical user comments, TextGAN automatically generates simulated comments that closely resemble real samples, and these serve as accurately labeled false comments; 2) the real historical comments and the labeled false simulated comments are taken as input; because the generated false comments closely resemble real ones, a graph-based false-information filter built on user access records is designed, which detects fake users and false comments by iteratively updating the confidence of users, shops, and comments; 3) to address the sparsity of recommendation data, a recommendation method based on a fast density clustering double-layer network is designed; the method selects its parameters adaptively and obtains good clustering results, which yields more effective personalized recommendation lists and improves recommendation accuracy. The invention uses a generative adversarial network to generate false samples that closely resemble real comment data, and proposes an efficient and reliable fast density clustering double-layer network recommendation method based on graph structure filtering.
Description
Technical field
The invention belongs to the field of information recommendation methods and relates to a fast density clustering double-layer network recommendation method based on graph structure filtering.
Background art
With the rapid development of network technology, information exchange has become increasingly frequent, which makes selecting information difficult. Faced with large amounts of information, users cannot extract what is useful; this is the information overload problem, and recommender systems emerged in response. In practice, a recommender system influences users' choices, so some shops, to maximize their own interests, use fake users and false comments to raise the probability that a target shop is recommended and to lower the recommendation probability of similar shops. Filtering false comments effectively and recommending accurately are therefore essential.
Recommendation techniques include content-based recommendation, knowledge-based recommendation, and collaborative filtering. Content-based and knowledge-based recommendation rely on the content of the objects to be recommended and do not depend on users' ratings of shops. Collaborative filtering finds, for a given user, people with similar preferences or shops similar to those the user likes; it works well and is widely used. However, most recommender systems suffer from a sparse user-item association matrix, that is, users have few ratings or consumption records for items. When similar users are sought for a target user, data sparsity directly affects the accuracy of the recommendation results. Introducing clustering into recommender systems offers a way to address data sparsity: a clustering-based recommender system compresses large amounts of sparse data into a series of dense subsets. Xue et al. cluster users with the K-means algorithm and, for each user, select the K most similar users in the same cluster as neighbors; Guo et al. propose a clustering recommendation algorithm that iteratively clusters users according to rating information and community trust relationships. Because the clustering result strongly affects clustering-based recommendation algorithms, and clustering algorithms commonly suffer from cluster centers that are hard to determine and from poor robustness to parameters, recommendation quality is directly affected.
Recommendation methods alleviate problems such as information overload, but they are easily affected by false information contained in the database. To reduce this influence, a filter must be introduced into the recommender system to detect and remove false information. Since Jindal et al. first raised the problem of detecting fake users and false comments, research on false-information detection has grown steadily. Filters based on supervised learning can detect false information effectively, but they depend heavily on training data with class labels. When the training set is small, their filtering performance is poor.
Summary of the invention
In order to effectively filter out the influence of fake users and false information on recommender systems, and to overcome the low efficiency and poor reliability of existing recommendation methods, the present invention provides an efficient and reliable fast density clustering double-layer network recommendation method based on graph structure filtering.
The technical solution adopted by the present invention to solve the technical problem is:
A fast density clustering double-layer network recommendation method based on graph structure filtering, comprising the following steps:
1) based on historical user comments, a TextGAN-based generator automatically generates simulated comment data that serve as accurately labeled false comments; the generated comments closely resemble real comments;
2) because the false comments closely resemble real comments, a graph-based false-information filter is designed from users' access information; it computes the confidence of users and comments and filters out false information;
3) a recommendation method based on a fast density clustering double-layer network is designed to obtain each user's personalized recommendation list quickly and efficiently.
Further, in step 1), the TextGAN-based simulated comment generation takes part of the real historical comments as input and uses TextGAN to generate closely similar simulated comments that serve as attack data;
The TextGAN-based automatic comment technique generates, from the input text sentences, comments similar to the input. Each generated comment is assigned a rating according to the sentiment its text expresses. The purpose of sentiment analysis is to judge the tendency of each comment from its emotion words and to classify the comment as positive or negative. Each sense of each word has a positive score Ps and a negative score Ns; subtracting the negative score from the positive score gives the score Score of that sense:
$Score = Ps - Ns$ (1)
The score of each word sense lies in [-1, 1]; a sense with a score greater than 0 is considered positive, and otherwise negative;
To attach a rating to a comment text, the adjectives, adverbs, verbs, and nouns that express sentiment, excluding feature words, are extracted as emotion words; the scores of all emotion words in the sentence are summed and combined with the mean rating of the real comment samples used to generate the comment, which gives the final rating of the sentence.
Further, in step 2), the user-item network consists of three parts: user nodes, item nodes, and ratings. The graph-based filter defines a confidence rule for each of these elements and filters out fake user nodes and false ratings through repeated iteration;
For any user node u, the confidence $H_u$ is expressed as:
where $n_u$ denotes the number of ratings left by user node u, and the term indexed by i denotes the confidence of the i-th rating of user u;
To confine the user confidence to a bounded interval, let
where T(u) ∈ (-1, 1). Because T(u) and $H_u$ are in one-to-one correspondence, and the boundedness of T(u) makes it easier to set a user confidence threshold later, T(u) is used as the confidence of user node u;
For any rating v, the confidence H(v) is computed as:
where $\varphi_v$ denotes the target item of comment v, $R(\varphi_v)$ denotes the confidence of that item, and A(v) denotes the influence of user confidence on the confidence of v;
For any item t, the confidence R(t) is computed as:
where
$U_t$ denotes the set of users who have accessed item t, $\psi_v$ denotes the rating value given to item t in rating v, and α is a threshold parameter that weighs the rating attribute;
First, all T(u) and R(t) are initialized to 1 and the confidence H(v) of every rating is computed; once H(v) has been computed, T(u) and R(t) are computed in turn; the next round of H(v) is then computed from the updated T(u) and R(t). After several iterations, T(u), H(v), and R(t) converge, and the algorithm outputs their final values.
The range of T(u) and H(v) is (-1, 1). A method is proposed for quickly determining the confidence threshold from the frequency distribution of the confidence values: if the threshold falls in the valley between the two peaks of the frequency distribution, fake users and real users can be distinguished effectively, which completes the filtering of fake user nodes.
In step 3), the double-layer network recommendation method based on fast density clustering comprises the following steps:
3.1) extract the feature information of users and items, and cluster users and items separately according to their feature information;
3.2) build a double-layer bipartite network model and make the final recommendation according to the network structure and the clustering results.
In step 3.1), an algorithm that quickly determines the cluster centers is used.
Definition 1: for any sample point i, the local density $\rho_i$ is computed as:
$\rho_i = \sum_j \xi(d_{ij} - d_c)$ (7)
where $d_{ij}$ denotes the distance between sample point i and sample point j;
Definition 2: for any sample point i, the minimum distance $\delta_i$ is the smallest distance from point i to any point whose local density is no lower than that of point i:
$\delta_i = \min_{j:\,\rho_j \ge \rho_i} d_{ij}$ (9)
To address the problems of the algorithm that automatically determines cluster centers, a variable γ is introduced, defined as:
$\gamma_i = \rho_i \times \delta_i$ (10)
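As an illustration of Definitions 1 and 2 and formula (10), the following minimal sketch computes $\rho_i$, $\delta_i$, and $\gamma_i$ with NumPy. Several details are assumptions, not taken from the text: $\xi(x)$ is taken to be 1 for x < 0 and 0 otherwise, $d_c$ is treated as a user-supplied cutoff distance, a point is never compared with itself, and a point with no neighbor of equal or higher density receives its largest distance as $\delta_i$. The function name `density_peak_scores` is hypothetical.

```python
import numpy as np

def density_peak_scores(points, d_c):
    """Return (rho, delta, gamma) for each row of `points`."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d_ij.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(pts)
    not_self = ~np.eye(n, dtype=bool)
    # rho_i: number of other points closer than the cutoff d_c
    # (assumed xi(x) = 1 if x < 0 else 0).
    rho = ((dist < d_c) & not_self).sum(axis=1)
    delta = np.empty(n)
    for i in range(n):
        # Candidates: other points whose density is at least rho_i, as in formula (9).
        mask = (rho >= rho[i]) & not_self[i]
        # Assumed convention: with no such point, use the largest distance from i.
        delta[i] = dist[i, mask].min() if mask.any() else dist[i].max()
    gamma = rho * delta  # formula (10)
    return rho, delta, gamma
```

Points that are both dense and far from any denser point get a large γ, which is what makes γ usable as a single score for picking cluster centers.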
From the definition of γ, the probability density distribution of γ is obtained. Its shape resembles a normal distribution, so a confidence interval is computed from the approximating normal curve and singular points are identified by means of that interval;
Assume that γ follows a normal distribution with mean μ and standard deviation σ. To determine the mean and standard deviation, first compute the sample mean and the sample variance S; μ and σ are then obtained by the method of moments:
Further analysis of the γ density plot of a data set shows that the γ values of all data points are non-negative. This means that, for any data point i, the distribution of the γ values is not strictly normal: data points are missing on the interval where γ would be negative, which strongly affects the result of formula (11). To obtain accurate values of μ and σ, the data in the missing interval must be filled in:
First compute the sample mean; select the sample points within the corresponding range and update the sample mean; select the sample points within the updated range again and update the sample mean. Iterate until the sample mean no longer changes or changes very little, which gives the final sample mean. By symmetry, taking the final sample mean as the axis of symmetry, the data within the corresponding interval are mirrored into (-∞, 0], which compensates for the missing data on the negative half-axis of the γ density plot. The sample variance S is computed from the current data, and formula (11) then gives μ and σ;
After μ and σ are obtained, a normal distribution curve is determined. The confidence interval is chosen according to the 5σ rule of the normal distribution to find the singular points, as follows:
Set the boundary value Wide = μ + 5σ and compare the γ values of all points in the data set with Wide. For data point i, if $\gamma_i$ > Wide, mark i as a cluster center.
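The final selection step can be sketched as follows. This is a simplified illustration only: it estimates μ and σ directly from the plain sample mean and standard deviation of γ and omits the symmetric data-completion step described above, because the exact ranges used in that step are not reproduced in this text. The function name `select_cluster_centers` and the parameter `k_sigma` are hypothetical.

```python
import numpy as np

def select_cluster_centers(gamma, k_sigma=5.0):
    """Mark points whose gamma exceeds Wide = mu + k_sigma * sigma."""
    gamma = np.asarray(gamma, dtype=float)
    mu = gamma.mean()      # moment estimate of the mean (no data completion)
    sigma = gamma.std()    # moment estimate of the standard deviation
    wide = mu + k_sigma * sigma
    centers = np.flatnonzero(gamma > wide)
    return centers, wide
```

Because the threshold is derived from the data rather than set by hand, the number of cluster centers does not have to be chosen in advance.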
In step 3.2), personalized recommendation for users is carried out with a double-layer bipartite network framework, comprising the following steps:
3.2.1) cluster users and items separately to obtain a set of user clusters and a set of item clusters; with user clusters and item clusters as nodes, count the number of accesses between each user cluster and each item cluster and build the first-layer bipartite network; apply a bipartite-network-based recommendation algorithm to this network to obtain a personalized recommendation list for every user cluster;
3.2.2) for each user cluster, select the top N item clusters from the recommendation list obtained in the previous step; with the users in the user cluster and the items in the selected item clusters as nodes, and the ratings between users and items as edges, build the second-layer bipartite network; likewise, apply a bipartite-network-based recommendation algorithm to the second-layer network to obtain the personalized recommendation list of each user.
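The two layers can be sketched in plain Python. The text does not specify which bipartite-network recommendation algorithm is used, so this sketch substitutes a simple ranking by access count as a placeholder for the first layer and only assembles the node sets of the second layer. All function names and the data layout (dictionaries mapping users and items to cluster ids, and a list of `(user, item)` visits) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def first_layer_counts(visits, user_cluster, item_cluster):
    """Count accesses between each (user cluster, item cluster) pair."""
    counts = Counter()
    for user, item in visits:
        counts[(user_cluster[user], item_cluster[item])] += 1
    return counts

def top_item_clusters(counts, n):
    """Placeholder first-layer recommender: rank item clusters by access count."""
    per_cluster = defaultdict(list)
    for (uc, ic), c in counts.items():
        per_cluster[uc].append((c, ic))
    return {
        uc: [ic for c, ic in sorted(pairs, key=lambda p: (-p[0], p[1]))[:n]]
        for uc, pairs in per_cluster.items()
    }

def second_layer_nodes(uc, chosen, user_cluster, item_cluster):
    """Users of cluster `uc` and items of its selected item clusters."""
    users = sorted(u for u, c in user_cluster.items() if c == uc)
    items = sorted(i for i, c in item_cluster.items() if c in chosen[uc])
    return users, items
```

The second-layer network is much smaller than the full user-item network, since it only contains the items of the top N item clusters for one user cluster at a time.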
The technical concept of the present invention is as follows. Goodfellow et al. first proposed the generative adversarial network (GAN). The model has been very successful on real-valued data but is less effective on discrete data, especially text. To enable generative adversarial networks to handle discrete text data effectively, Zhang et al. proposed the text generative adversarial network (TextGAN). The model consists of a generator and a discriminator: the generator is a recurrent neural network and the discriminator is a convolutional neural network.
In the TextGAN framework, a recurrent neural network serves as the generator and its output is smoothly approximated; a convolutional neural network serves as the discriminator and extracts the most important semantic features for discrimination. The convolutional neural network in this framework consists of one convolutional layer and one max-pooling layer. The max-pooling layer effectively filters out uninformative words and extracts the most important features of the sentence.
The objective function of the TextGAN framework differs from that of the standard GAN: it adds a feature-matching optimization term. The iterative optimization comprises the following two steps:
Minimize:
Minimize: where $\Sigma_s$ and $\Sigma_r$ denote the covariance matrices of the real-sentence feature vectors $f_s$ and the simulated-sentence feature vectors $f_r$, respectively, and $\mu_s$ and $\mu_r$ denote the mean vectors of $f_s$ and $f_r$. The second loss function $L_G$ is the Jensen-Shannon divergence between the two multivariate Gaussian distributions $N(\mu_r, \Sigma_r)$ and $N(\mu_s, \Sigma_s)$.
Clustering-based recommendation: real data are often sparse, which makes the results of conventional recommendation algorithms relatively poor. Introducing clustering into the recommendation algorithm compresses large amounts of sparse data into a series of dense subsets and effectively addresses data sparsity.
Joseph et al. classify users with a topic model, distinguishing both the type of user (tourist or driver) and the user's interests; Rana et al. propose a dynamic recommender system that clusters users with an evolutionary algorithm; Wang et al. cluster users with the K-means algorithm, estimate the ratings in the user-shop matrix, and obtain the target user's preferences; Puntheeranurak et al. propose a hybrid recommendation algorithm that clusters users with fuzzy K-means; Connor et al. cluster items with a series of partitioning algorithms and compute a predicted value for each subset.
The influence of false information on recommender systems is increasingly prominent, and its detection has drawn attention. Supervised learning is among the most important detection techniques. Jindal et al. use a supervised learning algorithm to detect false comments from key features of the comments and of the users, treating highly duplicated comments as false; Li et al. propose a method for detecting false information based on co-training; Lim et al. use behavioral features to analyze and detect comments; Wang et al. propose a graph-based filtering algorithm that filters false information according to the relationships among users, shops, and access records.
False information strongly affects the results of a recommender system. To improve recommendation accuracy, a filter that removes false information must be added to the recommender system.
The beneficial effects of the present invention are mainly as follows: 1. An automatic comment method based on a text generative adversarial network is used. The method generates simulated comments from historical comments with a generative adversarial network and assigns each simulated comment a rating by sentiment analysis, which yields a data set with accurate comment labels. 2. A graph-based filter that quickly determines the node confidence threshold is designed. The filter analyzes the graph structure of the comment network between users and items and effectively removes fake user nodes and false comments from the network, improving the accuracy of the recommendation algorithm. 3. A double-layer network recommendation algorithm based on fast density clustering is proposed. The algorithm determines the cluster centers from the distribution of the local density and minimum distance of the data points, selects its parameters adaptively, and obtains good clustering results, thereby improving recommendation accuracy.
Description of the drawings
Fig. 1 is a flow chart of the fast density clustering double-layer network recommendation method based on graph structure filtering.
Fig. 2 is the frequency distribution of user confidence for data containing false information.
Fig. 3 is the basic block diagram of the clustering-based recommendation method.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 to Fig. 3, a fast density clustering double-layer network recommendation method based on graph structure filtering comprises the following steps:
1) based on historical user comments, TextGAN automatically generates simulated comment data that serve as accurately labeled false comments;
2) the real historical comments and the labeled false simulated comments are taken as input, and a graph-based false-information filter is designed to extract the real comments;
3) a recommendation algorithm based on a fast density clustering double-layer network is designed to obtain each user's personalized recommendation list.
In step 1), the TextGAN-based simulated comment generation takes real historical comments with higher ratings as input and uses TextGAN to generate simulated comments with higher ratings. Likewise, real historical comments with lower ratings are used as input to generate simulated comments with lower ratings. The generated comments serve two purposes: (1) higher-rated comments raise the probability that the target shop is recommended by the recommender system; (2) lower-rated comments lower the probability that the recommender system recommends shops similar to the target shop.
Following the TextGAN model, comments on restaurants in the Yelp data set are used as input data to generate simulated comments similar to the input. Comments in the Yelp data set are usually accompanied by the user's rating of the item. Ratings are integers from 1 to 5, so when processing the data, 3 can be used as the boundary for judging the user's sentiment toward the item: a rating above 3 indicates positive sentiment, and otherwise the sentiment is negative. In this way the sentiment of user comments can be screened effectively, dividing them roughly into positive comments and negative comments with corresponding labels.
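The labeling rule above is simple enough to state as code. The following minimal sketch assumes each comment is available as a `(text, rating)` pair; the function names are hypothetical.

```python
def sentiment_label(rating, boundary=3):
    """Ratings above the boundary are positive; all others are negative."""
    return "positive" if rating > boundary else "negative"

def split_by_sentiment(comments):
    """Group comment texts by the label derived from their 1-5 rating."""
    groups = {"positive": [], "negative": []}
    for text, rating in comments:
        groups[sentiment_label(rating)].append(text)
    return groups
```

Note that a rating of exactly 3 falls into the negative group under this rule, since only ratings above 3 count as positive.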
The TextGAN-based automatic comment technique generates, from the input text sentences, comments similar to the input, with the same sentiment tendency as the input sentences. To generate positive comments, a large number of positive comments from the Yelp data set are used as input to train the model, and the model then outputs simulated comments. A real input sample is, for example, "Very nice and clean place to have breakfast or lunch". Generated simulated comments are, for example, "Great food and service." and "It was amazing. I am a fan, and the service was really great.".
Each generated comment is assigned a rating according to the sentiment its text expresses. The purpose of sentiment analysis is to judge the tendency of each comment from its emotion words and to classify the comment as positive or negative. There are many sentiment analysis methods; SentiWordNet is chosen here. SentiWordNet is a large lexical resource consisting of a large text file that lists the senses of each word together with their scores. Each sense of each word has a positive score Ps and a negative score Ns; subtracting the negative score from the positive score gives the score Score of that sense:
$Score = Ps - Ns$ (14)
The score of each word sense lies in [-1, 1]; a sense with a score greater than 0 is considered positive, and otherwise negative.
To attach a rating to a comment text, the adjectives, adverbs, verbs, and nouns that express sentiment, excluding feature words, are extracted as emotion words; the scores of all emotion words in the sentence are summed and combined with the mean rating of the real comment samples used to generate the comment, which gives the final rating of the sentence.
Sentiment analysis of the TextGAN-generated simulated comments listed above proceeds as follows:
"Great food and service.": "food" and "service" are treated as feature words, and their influence on the sentiment of the sentence is not considered. According to the sentiment analysis above, "Great" has Score = 0.25 in the sentiment dictionary, and the mean rating of the real input comment samples is 4, so the rating of this simulated comment is 4.25.
"It was amazing. I am a fan, and the service was really great.": "service" is treated as a feature word. According to the sentiment analysis, "amazing" has Score = 0.15, "great" has Score = 0.25, and "really" has Score = 0.375; the mean rating of the real input comment samples is 4, and the rating of this simulated comment is 4.75.
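The first worked example can be reproduced with a small sketch. It assumes the simplest reading of the rule, namely that the summed emotion-word scores are added to the mean rating of the real input samples; the second example in the text reports 4.75, whereas plain addition would give 4.775, so the exact combination rule may differ. The word scores below are the ones quoted in the examples; a real system would look them up per word sense in SentiWordNet. The names `EXAMPLE_SCORES`, `FEATURE_WORDS`, and `comment_rating` are hypothetical.

```python
import re

# Word scores (Ps - Ns) quoted in the worked examples; illustrative only.
EXAMPLE_SCORES = {"great": 0.25, "amazing": 0.15, "really": 0.375}
# Feature words are excluded from the sentiment sum.
FEATURE_WORDS = {"food", "service"}

def comment_rating(sentence, mean_real_rating,
                   scores=EXAMPLE_SCORES, feature_words=FEATURE_WORDS):
    """Sum emotion-word scores and add the mean rating of the real samples."""
    words = re.findall(r"[a-z']+", sentence.lower())
    emotion_total = sum(scores.get(w, 0.0) for w in words if w not in feature_words)
    return mean_real_rating + emotion_total
```

For "Great food and service." with a mean real rating of 4, only "great" contributes, so the sketch yields the 4.25 stated in the text.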
In step 2), Wang et al. first proposed the graph-based filter. The algorithm analyzes the network relations between users and items, computes the confidence of all user nodes by simple iteration, and finally filters out fake user nodes with low confidence, which improves the robustness of network recommendation algorithms against interference. However, this algorithm cannot choose the confidence threshold of user nodes effectively, and its filtering performance depends heavily on the data set. A method is proposed here that quickly determines the user confidence threshold, effectively removes fake user nodes from the network, and improves the accuracy of the recommendation algorithm.
The user-item network consists of three parts: user nodes, item nodes, and ratings. The graph-based filter defines a confidence rule for each of these elements and filters out fake user nodes and false ratings through repeated iteration.
For any user node u, the confidence $H_u$ is expressed as:
where $n_u$ denotes the number of ratings left by user node u, and the term indexed by i denotes the confidence of the i-th rating of user u.
To confine the user confidence to a bounded interval, let
where T(u) ∈ (-1, 1). Because T(u) and $H_u$ are in one-to-one correspondence, and the boundedness of T(u) makes it easier to set a user confidence threshold later, T(u) is used as the confidence of user node u.
For any rating v, the confidence H(v) is computed as:
where $\varphi_v$ denotes the target item of comment v, $R(\varphi_v)$ denotes the confidence of that item, and A(v) denotes the influence of user confidence on the confidence of v.
For any item t, the confidence R(t) is computed as:
where
$U_t$ denotes the set of users who have accessed item t, $\psi_v$ denotes the rating value given to item t in rating v, and α is a threshold parameter that weighs the rating attribute.
From the description above, T(u), H(v), and R(t) are closely interrelated. In general, all T(u) and R(t) are first initialized to 1 and the confidence H(v) of every rating is computed; once H(v) has been computed, T(u) and R(t) are computed in turn; the next round of H(v) is then computed from the updated T(u) and R(t). After several iterations, T(u), H(v), and R(t) converge, and the algorithm outputs their final values.
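The order of the updates can be sketched as a generic fixed-point loop. Because the formulas for H(v), T(u), and R(t) are not reproduced in this text, the sketch takes the three update rules as caller-supplied functions rather than guessing them; the function name, the callback signatures, and the stopping tolerance are all assumptions made for illustration.

```python
def iterate_confidence(ratings, update_h, update_t, update_r,
                       tol=1e-6, max_iter=100):
    """Alternate H(v) -> T(u) -> R(t) updates until the values stabilize.

    ratings: list of (user, item) pairs, one entry per rating v.
    update_h(ratings, T, R) -> dict of rating confidences H
    update_t(ratings, H)    -> dict of user confidences T
    update_r(ratings, H, T) -> dict of item confidences R
    """
    # Initialize every T(u) and R(t) to 1, as described in the text.
    T = {u: 1.0 for u, _ in ratings}
    R = {t: 1.0 for _, t in ratings}
    H = {}
    for _ in range(max_iter):
        new_H = update_h(ratings, T, R)           # H(v) from current T(u), R(t)
        new_T = update_t(ratings, new_H)          # then T(u)
        new_R = update_r(ratings, new_H, new_T)   # then R(t)
        change = max(
            [abs(new_T[u] - T[u]) for u in T] + [abs(new_R[t] - R[t]) for t in R]
        )
        T, R, H = new_T, new_R, new_H
        if change < tol:
            break
    return T, H, R
```

Passing the update rules in as functions keeps the loop independent of the particular confidence formulas, so the same skeleton applies once those formulas are available.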
Because the range of T(u) and H(v) is (-1, 1), the original graph-based filter generally uses 0 directly as the threshold on T(u) and H(v) for judging whether u and v are genuine. However, this approach filters very differently on different data sets, which narrows the applicability of the algorithm. To remove this drawback, a method is proposed that quickly determines the confidence threshold from the frequency distribution of the confidence values.
Take user nodes as an example. Because the comments that virtual user nodes leave on items are generally targeted and repetitive, the confidence values of virtual user nodes do not differ much; that is, they concentrate in some subinterval of (−1, 1), while the confidence of real user nodes should generally be higher than that of virtual nodes. In fact, the subsequent experimental analysis shows that the confidence of a large number of real user nodes is close to 1. The frequency distribution of user-node confidence therefore takes the bimodal form shown in Fig. 2. If the confidence threshold falls in the valley between the two peaks, virtual users and real users can be effectively distinguished, which completes the filtering of virtual user nodes.
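One way to locate such a valley is sketched below with NumPy. It is an illustration under stated assumptions, not the patent's procedure: the function name `valley_threshold`, the bin count `bins` and the minimum peak separation `min_sep` are hypothetical. The second peak is taken to be the tallest bin at least `min_sep` bins away from the tallest one, and the threshold is the centre of the lowest bin between them.

```python
import numpy as np


def valley_threshold(confidences, bins=20, min_sep=5):
    """Return a threshold in the lowest histogram bin between the two main peaks."""
    counts, edges = np.histogram(confidences, bins=bins, range=(-1.0, 1.0))
    # Tallest bin: the first peak.
    first = int(np.argmax(counts))
    # Second peak: tallest bin at least min_sep bins away from the first
    # (assumption made so that both peaks do not fall in the same mode).
    far = np.abs(np.arange(bins) - first) >= min_sep
    if not far.any() or counts[far].max() == 0:
        raise ValueError("no second peak found; the distribution is not bimodal here")
    second = int(np.flatnonzero(far)[np.argmax(counts[far])])
    left, right = sorted((first, second))
    # Lowest bins between the peaks; take the middle one if several tie.
    segment = counts[left:right + 1]
    lows = np.flatnonzero(segment == segment.min())
    valley = left + int(lows[len(lows) // 2])
    return float((edges[valley] + edges[valley + 1]) / 2)
```

User nodes whose confidence T(u) lies below the returned value would then be treated as virtual. The result depends on the bin width, so the choice of `bins` would need checking against the actual data.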
Step 3) addresses two main problems of clustering-based recommendation algorithms: the typical representatives (cluster centers) of users or items have to be determined manually, and similar users need personalized recommendations. A double-layer network recommendation method based on fast density clustering is adopted. Its basic framework is shown in Fig. 3 and is completed in two steps:
3.1) Extract the feature information of users and items, and cluster the users and the items separately according to their feature information;
3.2) Build the double-layer bipartite network model, and make the final recommendation according to the network structure and the clustering results.
In step 3.1), social network data are often highly sparse. In addition, because such data contain a very large number of nodes that are added and updated in real time, conventional recommendation algorithms incur high time complexity and give poor recommendations when processing them. A clustering-based recommendation algorithm can compress a large amount of sparse data into a series of dense subsets, which both improves the recommendation quality and reduces the time complexity of the algorithm.
Rodriguez et al. proposed an algorithm that automatically determines cluster centers. That algorithm assumes that a cluster center has a high density and lies relatively far from any point of higher density. However, the algorithm still has two shortcomings: it cannot determine cluster centers fully automatically, and the density radius directly affects the clustering result. Building on this idea, we propose an algorithm that quickly determines cluster centers and effectively solves both problems.
Definition 1 (local density): for any sample point i, its local density $\rho_i$ is computed as:
$$\rho_i = \sum_j \xi(d_{ij} - d_c) \tag{20}$$
where $d_{ij}$ denotes the distance between sample point i and sample point j.
Definition 2 (minimum distance): for any sample point i, its minimum distance $\delta_i$ is the smallest distance from point i to any point whose local density is not lower than that of point i:
$$\delta_i = \min_{j:\,\rho_j \ge \rho_i} d_{ij} \tag{22}$$
To address the problems of the algorithm that automatically determines cluster centers, we introduce a variable γ, defined as:
$$\gamma_i = \rho_i \times \delta_i \tag{23}$$
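The three quantities in formulas (20), (22) and (23) can be computed as in the following NumPy sketch. Several points are assumptions and not taken from the text: ξ(x) is taken to be 1 for x < 0 and 0 otherwise, as in the density-peak algorithm of Rodriguez et al.; $d_c$ is treated as a given cutoff distance; distances are Euclidean; and a point with no neighbour of equal or higher density is assigned its largest distance to any other point. The function name `density_peak_scores` is hypothetical.

```python
import numpy as np


def density_peak_scores(points, d_c):
    """Return (rho, delta, gamma) for each sample point."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d_ij (assumed metric).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # rho_i = sum_j xi(d_ij - d_c), with xi(x) = 1 if x < 0 else 0 (assumption);
    # subtract 1 so that the point itself (d_ii = 0) is not counted.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(len(pts))
    for i in range(len(pts)):
        # Other points whose density is at least rho_i, as in formula (22).
        mask = rho >= rho[i]
        mask[i] = False
        # Fallback for a point with no such neighbour (assumption).
        delta[i] = dist[i, mask].min() if mask.any() else dist[i].max()
    gamma = rho * delta  # formula (23)
    return rho, delta, gamma
```

Points that are both dense and far from any denser point receive a large γ, which is why γ is used afterwards to single out cluster centers.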
From the definition of γ, the probability density distribution of γ can be obtained; its shape resembles a normal distribution. A confidence interval is computed from the approximating normal curve, and singular points are identified using this interval.
Assume that γ follows a normal distribution with mean μ and standard deviation σ. To determine the mean and the standard deviation, first compute the sample mean and the sample variance S; then, by the method of moments:
Further analysis of the γ density distribution of a dataset shows that the γ values of all data points are non-negative. This means that, for any data point i, the distribution of γ is not strictly normal: data points are missing in the interval where γ would be negative, which can strongly affect the result of formula (24). To obtain accurate values of μ and σ, the data in the missing interval must be completed:
First compute the sample mean; select the sample points within the corresponding range and update the sample mean; then select the sample points within the updated range and update the sample mean again. Iterate in this way until the sample mean no longer changes or changes only negligibly, and take the result as the final sample mean. By symmetry, with the final sample mean as the axis of symmetry, the data in the corresponding interval are mirrored into (−∞, 0], which compensates for the missing data of the γ density distribution on the negative half-axis. The sample variance S is computed from the resulting data, and formula (24) is then used to obtain μ and σ.
Once μ and σ have been found, a normal distribution curve is obtained, and the confidence interval is chosen according to the 5σ rule of the normal distribution in order to find singular points. Specifically:
Set the boundary value Wide = μ + 5σ and compare the γ value of every point in the dataset with Wide. For a data point i, if $\gamma_i$ > Wide, mark i as a cluster center.
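The final selection step is a single comparison against the boundary value, as the short sketch below shows. It takes μ and σ as inputs, assuming they have already been estimated by the completion procedure above; the function name `select_centers` is hypothetical.

```python
import numpy as np


def select_centers(gamma, mu, sigma):
    """Return the indices of points whose gamma exceeds Wide = mu + 5 * sigma."""
    gamma = np.asarray(gamma, dtype=float)
    wide = mu + 5.0 * sigma  # boundary value from the 5-sigma rule
    return np.flatnonzero(gamma > wide)
```

Because the rule only compares each $\gamma_i$ with one boundary value, the number of cluster centers follows from the data and does not have to be specified in advance.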
In step 3.2), personalized recommendation for users is carried out with the double-layer bipartite network framework, which comprises the following steps:
3.2.1) Cluster the users and the items separately to obtain a set of user clusters and a set of item clusters. With user clusters and item clusters as nodes, count the number of accesses between each user cluster and each item cluster and build the first-layer bipartite network. Apply a bipartite-network-based recommendation algorithm to this network to obtain a personalized recommendation list for every user cluster.
3.2.2) For each user cluster, select the top N item clusters from the personalized recommendation list obtained in the previous step. With the users in the user cluster and the items in the selected item clusters as nodes, and the rating information between users and items as edges, build the second-layer bipartite network. A bipartite-network-based recommendation algorithm is likewise applied to the second-layer network, finally yielding a personalized recommendation list for each user.
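The construction of the two layers can be sketched in plain Python as follows. The function names and data layout are hypothetical. Ranking item clusters by raw access count is only a stand-in for the bipartite-network-based recommendation algorithm, which is not specified in this passage.

```python
from collections import defaultdict


def first_layer_weights(accesses, user_cluster, item_cluster):
    """Count accesses between each user cluster and each item cluster."""
    weights = defaultdict(int)
    for user, item in accesses:
        weights[(user_cluster[user], item_cluster[item])] += 1
    return dict(weights)


def top_item_clusters(weights, n):
    """For each user cluster, keep the n most-accessed item clusters.

    Stand-in ranking (assumption): sort by access count, ties by name.
    """
    per_cluster = defaultdict(list)
    for (uc, ic), w in weights.items():
        per_cluster[uc].append((w, ic))
    return {
        uc: [ic for _, ic in sorted(pairs, key=lambda p: (-p[0], str(p[1])))[:n]]
        for uc, pairs in per_cluster.items()
    }


def second_layer_edges(uc, chosen, ratings, user_cluster, item_cluster):
    """Rating edges between users of cluster uc and items of the chosen clusters."""
    chosen = set(chosen)
    return [
        (u, i, s)
        for (u, i, s) in ratings
        if user_cluster[u] == uc and item_cluster[i] in chosen
    ]
```

The first layer uses only access counts between clusters, while the second layer uses individual ratings, which matches the two different edge constructions described in the text.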
The double-layer network structure effectively reduces the complexity of the original bipartite network and improves the running efficiency of the recommendation algorithm. In addition, the two layers build their edges in different ways: the number of access records between users and items is first used to find the item clusters most relevant to each user cluster, and the specific rating information is then used to make personalized recommendations to each user. This gives the recommendation algorithm higher accuracy.
Claims (7)
1. A fast density clustering double-layer network recommendation method based on graph structure filtering, characterized in that the method comprises the following steps:
1) based on TextGAN, simulated comment data are automatically generated from historical user comment information and used as false comments with accurately labeled classes; the generated comment data closely resemble real comments and are difficult to detect with traditional false-comment filtering methods;
2) considering that comment data generated by machine learning methods are difficult to filter with traditional methods alone, a graph-based virtual information filter is designed, which filters the data mixed with fake users according to the behavioral features of the users;
3) a recommendation method based on a fast density clustering double-layer network is designed, which efficiently obtains the personalized recommendation list of each user.
2. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 1, characterized in that, in step 1), virtual comments are generated with TextGAN by taking part of the real historical comments as input and generating virtual comments that closely resemble the real samples;
the TextGAN-based automatic comment technique can generate comment information similar to the input from the input text sentences; the generated simulated comments are given different ratings according to the emotions expressed in the text; the purpose of sentiment analysis is to judge the tendency of each comment from the emotion words it contains and to classify it as positive or negative; each usage of each word has a corresponding positive score and negative score, and subtracting the negative score Ns from the positive score Ps gives the score Score of that usage of the word:
$$\mathrm{Score} = Ps - Ns \tag{1}$$
the final score of each word lies in [−1, 1]; a score greater than 0 is taken to mean that this usage of the word has a positive tendency, and otherwise a negative tendency;
to add rating information to the comment text, apart from feature words, the adjectives, adverbs, verbs and nouns that can express sentiment orientation are extracted as emotion words; the scores of all emotion words in the sentence are summed and, taking into account the mean rating of the real comment samples from which the comment was generated, the final score of the sentence is computed.
3. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 1 or 2, characterized in that, in step 2), the network of users and items consists mainly of three parts: user nodes, item nodes and rating information; the graph-structure-based filter sets a corresponding confidence calculation rule for each of these elements and filters out virtual user nodes and rating information through repeated iteration;
for any user node u, its confidence is denoted $H_u$:
where $n_u$ denotes the number of ratings left by user node u, and the remaining term denotes the confidence of the i-th rating of user u;
to confine the confidence of the user to a fixed interval, let
where T(u) ∈ (−1, 1); since T(u) and $H_u$ are in one-to-one correspondence, and the boundedness of T(u) is better suited to setting the user confidence threshold in the subsequent process, T(u) is ultimately used to express the confidence of user node u;
for any rating v, its confidence H(v) is computed as:
where $\phi_v$ denotes the target item of comment v, R($\phi_v$) denotes the confidence of that target item, and A(v) denotes the influence of the user's confidence on the confidence of v;
for any item t, its confidence R(t) is computed as:
where $U_t$ denotes the set of users who have accessed item t, $\psi_v$ denotes the specific score value of the rating information between t and r, and α is a threshold parameter for weighing the nature of the rating information;
all T(u) and R(t) are first initialized to 1, and the confidence H(v) of every rating is computed; once H(v) has been computed, T(u) and R(t) are computed in turn; the H(v) of the next round is then computed from the updated T(u) and R(t); after several such iterations, T(u), H(v) and R(t) gradually converge, and the algorithm outputs their final values.
4. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 3, characterized in that the range of T(u) and H(v) is (−1, 1), and a method is proposed that quickly determines the confidence threshold from the frequency distribution of the confidence values; if the confidence threshold falls in the valley between the two peaks of the frequency distribution, virtual users and real users can be effectively distinguished, which completes the filtering of virtual user nodes.
5. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 1 or 2, characterized in that, in step 3), a double-layer network recommendation method based on fast density clustering is adopted, which specifically comprises the following steps:
5.1) extract the feature information of users and items, and cluster the users and the items separately according to their feature information;
5.2) build the double-layer bipartite network model, and make the final recommendation according to the network structure and the clustering results.
6. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 5, characterized in that, in step 3), an algorithm that quickly determines cluster centers is adopted, which specifically comprises the following steps:
Definition 1: for any sample point i, its local density $\rho_i$ is computed as:
$$\rho_i = \sum_j \xi(d_{ij} - d_c) \tag{7}$$
where $d_{ij}$ denotes the distance between sample point i and sample point j;
Definition 2: for any sample point i, its minimum distance $\delta_i$ is the smallest distance from point i to any point whose local density is not lower than that of point i:
$$\delta_i = \min_{j:\,\rho_j \ge \rho_i} d_{ij} \tag{9}$$
to address the problems of the algorithm that automatically determines cluster centers, a variable γ is introduced, defined as:
$$\gamma_i = \rho_i \times \delta_i \tag{10}$$
from the definition of γ, the probability density distribution of γ is obtained; its shape resembles a normal distribution; a confidence interval is computed from the approximating normal curve, and singular points are identified using this interval;
assume that γ follows a normal distribution with mean μ and standard deviation σ; to determine the mean and the standard deviation, first compute the sample mean and the sample variance S; then, by the method of moments:
further analysis of the γ density distribution of a dataset shows that the γ values of all data points are non-negative, which means that, for any data point i, the distribution of γ is not strictly normal: data points are missing in the interval where γ would be negative, which can strongly affect the result of formula (11); to obtain accurate values of μ and σ, the data in the missing interval must be completed:
first compute the sample mean; select the sample points within the corresponding range and update the sample mean; then select the sample points within the updated range and update the sample mean again; iterate in this way until the sample mean no longer changes or changes only negligibly, and take the result as the final sample mean; by symmetry, with the final sample mean as the axis of symmetry, the data in the corresponding interval are mirrored into (−∞, 0], which compensates for the missing data of the γ density distribution on the negative half-axis; the sample variance S is computed from the resulting data, and formula (11) is then used to obtain μ and σ;
once μ and σ have been found, a normal distribution curve is obtained, and the confidence interval is chosen according to the 5σ rule of the normal distribution in order to find singular points, as follows:
set the boundary value Wide = μ + 5σ and compare the γ value of every point in the dataset with Wide; for a data point i, if $\gamma_i$ > Wide, mark i as a cluster center.
7. The fast density clustering double-layer network recommendation method based on graph structure filtering according to claim 6, characterized in that, in step 3), personalized recommendation for users is carried out with the double-layer bipartite network framework, which specifically comprises the following steps:
7.1) cluster the users and the items separately to obtain a set of user clusters and a set of item clusters; with user clusters and item clusters as nodes, count the number of accesses between each user cluster and each item cluster and build the first-layer bipartite network; apply a bipartite-network-based recommendation algorithm to this network to obtain a personalized recommendation list for every user cluster;
7.2) for each user cluster, select the top N item clusters from the personalized recommendation list obtained in the previous step; with the users in the user cluster and the items in the selected item clusters as nodes, and the rating information between users and items as edges, build the second-layer bipartite network; a bipartite-network-based recommendation algorithm is likewise applied to the second-layer network, finally yielding a personalized recommendation list for each user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711469928.4A CN108304479B (en) | 2017-12-29 | 2017-12-29 | Quick density clustering double-layer network recommendation method based on graph structure filtering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711469928.4A CN108304479B (en) | 2017-12-29 | 2017-12-29 | Quick density clustering double-layer network recommendation method based on graph structure filtering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108304479A true CN108304479A (en) | 2018-07-20 |
CN108304479B CN108304479B (en) | 2022-05-03 |
Family
ID=62868047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711469928.4A Active CN108304479B (en) | 2017-12-29 | 2017-12-29 | Quick density clustering double-layer network recommendation method based on graph structure filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304479B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016062095A1 (en) * | 2014-10-24 | 2016-04-28 | 华为技术有限公司 | Video classification method and apparatus |
CN107506480A (en) * | 2017-09-13 | 2017-12-22 | 浙江工业大学 | A kind of excavated based on comment recommends method with the double-deck graph structure of Density Clustering |
Non-Patent Citations (3)
Title |
---|
GUAN WANG et al.: "Review Graph based Online Store Review Spammer Detection", IEEE * |
JINYIN CHEN et al.: "Double Layered Recommendation Algorithm Based on Fast Density Clustering: Case Study on Yelp Social Networks Dataset", IEEE * |
YIZHE ZHANG et al.: "Generating Text via Adversarial Training", IEEE * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508740A (en) * | 2018-11-09 | 2019-03-22 | 郑州轻工业学院 | Object hardness identification method based on Gaussian mixed noise production confrontation network |
CN109508740B (en) * | 2018-11-09 | 2019-08-13 | 郑州轻工业学院 | Object hardness identification method based on Gaussian mixed noise production confrontation network |
CN112989179A (en) * | 2019-12-13 | 2021-06-18 | 北京达佳互联信息技术有限公司 | Model training and multimedia content recommendation method and device |
CN112989179B (en) * | 2019-12-13 | 2023-07-28 | 北京达佳互联信息技术有限公司 | Model training and multimedia content recommendation method and device |
CN111783980A (en) * | 2020-06-28 | 2020-10-16 | 大连理工大学 | Ranking learning method based on dual cooperation generation type countermeasure network |
CN112950295A (en) * | 2021-04-21 | 2021-06-11 | 北京大米科技有限公司 | User data mining method and device, readable storage medium and electronic equipment |
CN112950295B (en) * | 2021-04-21 | 2024-03-19 | 北京大米科技有限公司 | Method and device for mining user data, readable storage medium and electronic equipment |
CN114241263A (en) * | 2021-12-17 | 2022-03-25 | 电子科技大学 | Radar interference semi-supervised open set identification system based on generation countermeasure network |
CN114241263B (en) * | 2021-12-17 | 2023-05-02 | 电子科技大学 | Radar interference semi-supervised open set recognition system based on generation of countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
CN108304479B (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||