CN107085616B - False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location based service) - Google Patents


Info

Publication number
CN107085616B
Authority
CN
China
Prior art keywords
places
node
lbsn
abnormal
competition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710397805.8A
Other languages
Chinese (zh)
Other versions
CN107085616A (en)
Inventor
曹玖新
郭一方
马卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a method for detecting suspicious places with false comment activity in an LBSN based on multi-dimensional attribute mining, comprising the following steps: first, suspicious places with false comment activity are labeled; second, abnormal features are extracted from the place scores, the spatio-temporal attributes, and the text content of place comments in the LBSN, covering both the overall comment abnormality of a place and the malicious competition relationship between places; a logistic regression machine learning method is then trained to obtain the abnormality degree of each place and the competition degree between two places; next, a Markov random field detection model is constructed from the competition relationships between places, fusing the abnormal features of those competition relationships with the LBSN network topology; the probability that any place is a suspicious place is calculated with the detection model; and finally each place is marked as to whether it is a suspicious place with false comment activity. The detection method improves the accuracy of detecting suspicious places with false comment activity.

Description

False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location-based social network)
Technical Field
The invention relates to a method for detecting suspicious places with false comments based on multi-dimensional attribute mining in an LBSN (location-based social network).
Background
In recent years, with the rapid development of mobile terminal positioning technology and mobile internet technology, location-based social network (LBSN) platforms have achieved great success. An LBSN connects the virtual social space with the real behavioral space through location, fusing online and offline relationships: users publish comments on physical places online, and offline they rely on those comments to explore and discover new places and to choose which places to visit, consume at, or use services at. However, the massive information on LBSN platforms contains many false comments, most of which come from organized false comment activities. These activities change the reputation of a place by publishing large numbers of false comments, thereby influencing users' visiting decisions, securing illegitimate benefits for place merchants, damaging the network environment, and seriously harming user experience and the platform's reputation. Identifying and detecting suspicious places with false comment activity is therefore of great practical significance.
Current techniques for detecting merchants with false comment activity mainly target traditional e-commerce websites. Little research addresses the detection of suspicious places with false comment activity in LBSNs, and none considers false comment activity caused by competition among place merchants. In a real LBSN, whether a place has false comment activity can be detected from the abnormalities that its overall comments exhibit in the time, space, score, and text dimensions, and suspicious places whose false comment activity stems from malicious competition can be further uncovered through the competition relationships between places, thereby improving the detection accuracy for suspicious places with false comment activity.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for detecting suspicious places with false comments based on multi-dimensional attribute mining in an LBSN, by which suspicious places with false comment activity can be identified and detected.
To solve the above technical problem, the invention adopts the following technical scheme: a method for detecting suspicious places with false comments based on multi-dimensional attribute mining in an LBSN, which detects suspicious places by using the abnormal feature information of places in the LBSN together with the competition relationships between places, and comprises the following steps:
1) Based on the filtered comment information in the LBSN, false comment activity is identified manually; suspicious places with false comment activity and credible places without false comment behavior are labeled, and the places are divided into a training set and a test set. At the same time, competing place pairs with malicious competition activity and non-competing place pairs are labeled, and the place pairs are divided into a training set and a test set.
2) Places with false comment activity are analyzed; abnormal features of the overall comments of a place are extracted from the place scores, the spatio-temporal attributes, and the text content of place comments in the LBSN; and an abnormal feature set of places is constructed.
3) Competition between places is analyzed; abnormal features of the malicious competition relationship between two places are extracted from multiple dimensions of the LBSN; and an abnormal feature set of the competition relationship between places is constructed.
4) An abnormality degree function is constructed with a logistic regression machine learning method, and the feature weight parameters in the function are learned from the positive and negative examples labeled in step 1), yielding the abnormality degree $\varepsilon_l$ of each place in the data set and the abnormality degree $\varepsilon_c$ of competition between places.
5) A Markov random field detection model based on the LBSN is constructed. The model comprises nodes and edges: nodes represent places, and edges represent competition relationships between places. Nodes fall into two categories, suspicious places and credible places. The prior probability that a node belongs to each category is set from the abnormality degree of the place obtained in step 4), and the association degree matrix between places under the different categories is set from the abnormality degree of competition between the two places obtained in step 4).
6) Based on the Markov random field detection model obtained in step 5), an information value (message) $m_{i\to j}(\sigma_j)$ passed from node $v_i$ to node $v_j$ is defined, the information values are propagated iteratively over the model, and finally a confidence $b_i(\sigma_i)$ is generated for each node $v_i$, which serves as the marginal probability that node $v_i$ belongs to class $\sigma_i$.
7) Finally, each place is marked as to whether it is a suspicious place with false comment activity according to the node confidence obtained in step 6).
The specific method for labeling places with false comment activity in the data set in step 1) is as follows: based on the comment information automatically filtered in the LBSN, a subset of places with a high proportion of filtered comments is selected and the false comments at those places are labeled manually; places whose proportion of false comments exceeds a certain threshold are labeled as suspicious places with false comment activity, and places without filtered comments are randomly selected and labeled as credible places.
The specific method for extracting the overall comment abnormal features of any place l in the data set from different dimensions in step 2) is as follows: the overall score difference OSD(l) of the place is extracted from the score difference dimension, the review burstiness MRD(l) from the time dimension, the check-in period distribution difference D(r‖c) from the spatio-temporal dimension, and the content similarity MCS(l) from the comment text dimension.
The specific method for extracting, from different dimensions in step 3), the abnormal features of malicious competition between two competing places $l_m, l_n$ in the data set is as follows: the comment difference URD($l_m$, $l_n$) of the common users of the two competing places is extracted from the score difference dimension, their comment time synergy ATI($l_m$, $l_n$) from the time dimension, and their content similarity ACS($l_m$, $l_n$) from the comment text dimension.
The specific method in step 4) for training with the logistic regression machine learning method to obtain the abnormality degree of each place and the abnormality degree of the competition relationship between places comprises the following 3 steps:
a) Construct the feature vector $\vec{x}_L$ from the abnormal feature set of places. Based on the training set of places labeled in step 1), train with the gradient descent method to obtain the weight vector $\vec{\omega}_L$ corresponding to the abnormal feature vector of places.
b) Construct the feature vector $\vec{x}_C$ from the abnormal feature set of the competition relationship between places. Based on the training set of competing place pairs labeled in step 1), train with maximum likelihood estimation and the gradient descent method to obtain the weight vector $\vec{\omega}_C$ corresponding to the abnormal feature vector of the competition relationship between places.
c) Calculate the abnormality degree $\varepsilon_l$ of all places from the abnormal features of places and their weights, and the abnormality degree $\varepsilon_c$ of the competition relationship between all places from the abnormal features of the competition relationship and their weights, as follows:
$$\varepsilon_l = \frac{1}{1 + \exp\left(-\vec{\omega}_L \cdot \vec{x}_L\right)}$$
$$\varepsilon_c = \frac{1}{1 + \exp\left(-\vec{\omega}_C \cdot \vec{x}_C\right)}$$
where $\vec{x}$ is the feature vector constructed from the feature set and $\vec{\omega}$ is the feature weight vector corresponding to that feature vector.
The specific method for iteratively propagating the information value $m_{i\to j}(\sigma_j)$ based on the detection model in step 6) is as follows:
$$m_{i\to j}(\sigma_j) = Z_1 \sum_{\sigma_i \in M} \psi_{ij}(\sigma_i, \sigma_j)\, \phi_i(\sigma_i) \prod_{v_k \in N(v_i)\setminus v_j} m_{k\to i}(\sigma_i)$$
where $M$ is the class set of the nodes, $\psi_{ij}(\sigma_i, \sigma_j)$ is the association degree value of node $v_i$ and node $v_j$ under their respective classes $\sigma_i, \sigma_j$, $\phi_i(\sigma_i)$ is the prior probability value of the node itself under class $\sigma_i$, $m_{k\to i}(\sigma_i)$ is the information value passed to the node by another neighbor node $v_k$ of $v_i$, $N(v_i)$ is the set of all neighbor nodes of $v_i$, $N(v_i)\setminus v_j$ is the set of all neighbor nodes of $v_i$ other than $v_j$, and $Z_1$ is a normalization constant that ensures $\sum_{\sigma_j \in M} m_{i\to j}(\sigma_j) = 1$, i.e., the information values under all classes sum to 1.
In step 6), the confidence $b_i(\sigma_i)$ of each node $v_i$ under class $\sigma_i$ is calculated as the probability that node $v_i$ belongs to class $\sigma_i$, as follows:
$$b_i(\sigma_i) = Z_2\, \phi_i(\sigma_i) \prod_{v_k \in N(v_i)} m_{k\to i}(\sigma_i)$$
where $\phi_i(\sigma_i)$ is the prior probability value of node $v_i$ under class $\sigma_i$, $m_{k\to i}(\sigma_i)$ is the information value passed to $v_i$ by its neighbor node $v_k$, $N(v_i)$ is the set of all neighbor nodes of $v_i$, and $Z_2$ is a normalization constant that ensures $\sum_{\sigma_i \in M} b_i(\sigma_i) = 1$ over the class set $M$, i.e., the confidences of node $v_i$ under all classes sum to 1.
The invention has the following beneficial effects:
1. Abnormal features of places are extracted from the abnormalities that place comments in the LBSN exhibit in the score, time, space, and text dimensions; places are classified with a logistic regression machine learning method; and suspicious places with false comment activity are detected effectively.
2. Competition relationships between places are introduced to improve detection; abnormal features of competition between places are extracted, so that places that may have false comment activity are mined in depth.
3. The abnormal features of places and the abnormal features of competition between places are fused and act jointly on the detection of places with false comment activity, improving detection accuracy.
Drawings
Fig. 1 is a flow chart of the abnormal feature extraction of the present invention.
FIG. 2 is a flow diagram of false comment activity location detection in accordance with the present invention.
Fig. 3 is an overall system framework diagram of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, which is defined in the appended claims, as interpreted by those skilled in the art.
Referring to Fig. 1, Fig. 2 and Fig. 3, the method of the invention for detecting suspicious places with false comments based on multi-dimensional attribute mining in an LBSN comprises the following steps:
Step 1: Based on the comment information automatically filtered in the LBSN, select a subset of places with a high proportion of filtered comments and manually label the false comments at those places; label places whose proportion of false comments exceeds a certain threshold as suspicious places with false comment activity, and randomly select places without filtered comments and label them as credible places. Then randomly divide the data into two parts, S and T, at a ratio of 4:1, where S is the training set and T is the test set.
Based on the labeled suspicious places, take the place pairs that share common commenting users, whose distance is below a certain threshold, and whose label-category similarity is above a certain threshold as the candidate set of place pairs that may compete. By manual labeling, mark the candidate pairs in which malicious competition has caused false comment activity as competing place pairs, and randomly select candidate pairs without malicious competition activity as non-competing place pairs. Then randomly divide these data into two parts, S and T, at a ratio of 4:1, where S is the training set and T is the test set.
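The 4:1 random division into S and T described in step 1 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `split_4_to_1`, the fixed seed, and the placeholder place IDs are assumptions made for the example.

```python
import random

def split_4_to_1(items, seed=0):
    """Randomly split items into a training set S and a test set T at a 4:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = (len(shuffled) * 4) // 5         # first four fifths go to S
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labelled place IDs.
places = [f"place_{i}" for i in range(10)]
S, T = split_4_to_1(places)
print(len(S), len(T))
```

The same helper can be applied unchanged to the list of labelled place pairs.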
Step 2: Analyze the places with false comment activity and, based on multiple dimensions of the LBSN such as score, time, space, and text, extract and quantify the abnormal features of any place l in the data set.
1) Extract the overall score difference OSD(l) of place l from the score difference dimension:
$$\mathrm{OSD}(l) = \mathrm{avg}_{i \in R_l}\, |d_i|, \qquad d_i = r_i(t) - \mathrm{avg}_{t' < t}\, r_i(t')$$
where $t$ is the release time of a comment $i \in R_l$ of the place, $R_l$ is the set of comments of place l, $r_i(t)$ is the score of comment i at time $t$, $\mathrm{avg}_{t' < t}\, r_i(t')$ is the average score of place l before time $t$, $d_i$ is the difference between the score of comment $r_i(t)$ and the average score of place l before the comment time, and $\mathrm{avg}_{i \in R_l}\, |d_i|$ is the average score difference over all comments of the place.
2) Extract the review burstiness MRD(l) of place l from the time dimension:
Figure GDA0002847131530000063
where n is the number of comments place l receives in one day, avg(n) is the average daily number of comments of place l over the days that have comments, max(n) is the maximum daily number of comments of place l, and
Figure GDA0002847131530000064
is the absolute deviation of the maximum daily number of comments of the place.
3) Extract the check-in period distribution difference D(r‖c) of place l from the spatio-temporal dimension:
$$D(r \,\|\, c) = \sum_{k=1}^{7} r_k \log \frac{r_k}{c_k}$$
where $k \in \{1, 2, \dots, 7\}$ indexes the days of the weekly cycle, $r$ is the distribution vector of comments on place l over the weekly cycle, $c$ is the distribution vector of check-ins at place l over the weekly cycle, and the KL divergence $D(r \,\|\, c)$ describes the difference between the check-in time distribution and the comment time distribution of the place.
4) Extract the content similarity MCS(l) of place l from the comment text dimension:
Figure GDA0002847131530000067
where all comment texts of the place are taken as the corpus space, and cosine($r_i$, $r_j$) is the TF-IDF-based text cosine similarity of any two comments $r_i$, $r_j$ of place l.
5) Construct the abnormal feature set $\Psi_L$ of places from the feature values extracted for all places in the data set. The set contains four features, in order: the overall score difference OSD(l), the review burstiness MRD(l), the check-in period distribution difference D(r‖c), and the content similarity MCS(l).
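Two of the place-level features of step 2 can be sketched directly from their verbal definitions: OSD(l) as the average absolute difference between a comment's score and the place's average score before that comment, and D(r‖c) as a KL divergence between the weekly comment and check-in distributions. The sketch below is illustrative only. Skipping the first comment (it has no earlier average), the add-one smoothing of the weekly histograms, and all function and variable names are assumptions not stated in the text.

```python
import math

def osd(reviews):
    """Overall score difference: mean |d_i| over the reviews of one place.

    reviews: list of (timestamp, score) tuples. d_i is the score of review i
    minus the mean score of the reviews that precede it in sorted order.
    Assumption: the first review has no earlier average and is skipped.
    """
    ordered = sorted(reviews)
    diffs, running_total = [], 0.0
    for idx, (_, score) in enumerate(ordered):
        if idx > 0:
            diffs.append(abs(score - running_total / idx))
        running_total += score
    return sum(diffs) / len(diffs) if diffs else 0.0

def weekly_distribution(weekdays, smoothing=1.0):
    """Distribution over the 7 days of the week from weekday indices 0..6.

    Assumption: add-one smoothing keeps every entry positive so that the
    KL divergence below is always finite.
    """
    counts = [smoothing] * 7
    for day in weekdays:
        counts[day] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(r, c):
    """D(r || c) = sum_k r_k * log(r_k / c_k)."""
    return sum(rk * math.log(rk / ck) for rk, ck in zip(r, c) if rk > 0)

# Hypothetical data for one place.
reviews = [(1, 3.0), (2, 3.0), (3, 5.0)]
r = weekly_distribution([0, 1, 2])  # weekdays on which comments were posted
c = weekly_distribution([5, 5, 6])  # weekdays on which check-ins occurred
print(osd(reviews), kl_divergence(r, c))
```

A place whose comments cluster on different weekdays than its check-ins yields a larger divergence, which is the abnormality signal the feature is meant to capture.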
Step 3: Analyze the competition between places and, based on multiple dimensions of the LBSN, extract and quantify the abnormal competition features of any potentially competing place pair $l_m, l_n$ in the data set.
1) Extract the comment difference URD($l_m$, $l_n$) of the common users of two competing places $l_m, l_n$ from the score difference dimension:
$$\mathrm{URD}(l_m, l_n) = \mathrm{avg}_{i \in U}\, |d_i|, \qquad d_i = r_i(l_m) - r_i(l_n) \tag{5}$$
where $U$ is the set of common users of places $l_m$ and $l_n$, $r_i(l)$ is the rating given by user i to place l, and $d_i$ is the difference between the ratings given by user i to the two competing places $l_m$ and $l_n$.
2) Extract the comment time synergy ATI($l_m$, $l_n$) of the common users of two competing places $l_m, l_n$ from the time dimension:
$$\mathrm{ATI}(l_m, l_n) = \mathrm{avg}_{i \in U}\, |T_i(l_m) - T_i(l_n)| \tag{6}$$
where $T_i(l)$ is the time at which user i commented on place l, and $|T_i(l_m) - T_i(l_n)|$ is the interval between the comments of user i on the two competing places $l_m$ and $l_n$.
3) Extract the content similarity ACS($l_m$, $l_n$) of the common users of two competing places $l_m, l_n$ from the comment text dimension:
Figure GDA0002847131530000071
where $R_U$, the set of comments on the competing places by the common user set $U$, is taken as the corpus space, and cosine($r_i$, $r_j$) is the TF-IDF-based cosine similarity of the comment texts $r_i$, $r_j$ published by common users on the competing places.
4) Construct the abnormal feature set of competition between places from the feature values extracted for all potentially competing place pairs in the data set. The set contains three features, in order: the comment difference URD($l_m$, $l_n$), the time synergy ATI($l_m$, $l_n$), and the content similarity ACS($l_m$, $l_n$).
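Formulas (5) and (6) share one shape: an average absolute difference taken over the users common to both places. A minimal sketch under that reading is shown below. The helper name `mean_abs_diff`, the dictionary layout keyed by user ID, the sample values, and the choice to return 0.0 when there are no common users are all assumptions made for illustration.

```python
def mean_abs_diff(values_m, values_n):
    """Average |values_m[u] - values_n[u]| over users u present in both dicts."""
    common_users = values_m.keys() & values_n.keys()
    if not common_users:
        return 0.0  # assumption: no common users means no measurable difference
    return sum(abs(values_m[u] - values_n[u]) for u in common_users) / len(common_users)

# Hypothetical ratings and comment times (in days) keyed by user ID.
ratings_m = {"u1": 5, "u2": 5, "u3": 4}
ratings_n = {"u1": 1, "u2": 2, "u4": 3}
times_m = {"u1": 10.0, "u2": 20.0, "u3": 30.0}
times_n = {"u1": 11.0, "u2": 23.0, "u4": 5.0}

urd = mean_abs_diff(ratings_m, ratings_n)  # formula (5): rating difference
ati = mean_abs_diff(times_m, times_n)      # formula (6): comment time interval
print(urd, ati)
```

Here only `u1` and `u2` commented on both places, so a large `urd` together with a small `ati` would indicate users who rated the two places very differently within a short time.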
Step 4: Train on the feature vectors obtained in step 2 and step 3 with a logistic regression machine learning method to obtain the abnormality degree $\varepsilon_l$ of each place and the competition degree $\varepsilon_c$ between two places. The abnormality degree and the competition degree are calculated in the same way; taking the calculation of $\varepsilon_l$ as an example, the main steps are as follows:
1) Construct the feature vector $\vec{x}_L$ of the place from the abnormal feature set $\Psi_L$, where the i-th component $x_i$ is the i-th feature value in $\Psi_L$.
2) Assign a weight $\omega$ to each feature dimension and construct the feature weight vector $\vec{\omega}_L$ corresponding to the feature vector $\vec{x}_L$, where the weight $\omega_i$ in $\vec{\omega}_L$ expresses the importance of the i-th feature to the abnormality degree $\varepsilon_l$ of the place.
3) Construct a degree function representing the abnormality degree of the place based on a binomial logistic regression model:
$$\varepsilon_l = \frac{1}{1 + \exp\left(-\vec{\omega}_L \cdot \vec{x}_L\right)}$$
where $\varepsilon_l \in [0, 1]$; the closer $\varepsilon_l$ is to 1, the higher the abnormality degree of place l.
4) On the constructed training set of places, learn the function parameters with maximum likelihood estimation and the gradient descent method to obtain the feature weight vector $\vec{\omega}_L$.
5) From the abnormal feature vector $\vec{x}_L$ of any place l in the data set and the feature weight vector $\vec{\omega}_L$, calculate the abnormality degree $\varepsilon_l$ of all places l in the data set.
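Step 4 can be illustrated with a small, dependency-free logistic regression fitted by gradient descent on the negative log-likelihood. This is a hedged sketch, not the patent's code: the learning rate, epoch count, toy feature values, and the constant 1.0 prepended to each feature vector as a bias term are assumptions, since the text does not say whether an intercept is used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def abnormality(weights, features):
    """Degree in [0, 1]: logistic function of the weighted feature sum."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def fit(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean negative log-likelihood."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            err = abnormality(weights, x) - y  # derivative of the loss w.r.t. z
            for k, xk in enumerate(x):
                grad[k] += err * xk
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights

# Hypothetical training data: [bias, feature_1, feature_2]; label 1 = suspicious.
X = [[1.0, 0.1, 0.2], [1.0, 0.2, 0.1], [1.0, 0.9, 0.8], [1.0, 0.8, 0.9]]
y = [0, 0, 1, 1]
w = fit(X, y)
eps_credible = abnormality(w, X[0])
eps_suspicious = abnormality(w, X[2])
print(eps_credible, eps_suspicious)
```

The same routine would be run a second time on the place-pair features to obtain the competition degree, because the text states that both degrees are computed in the same way.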
Step 5: Constructing the Markov random field detection model based on the LBSN comprises the following 3 steps:
1) Construct a network G(V, E) based on the LBSN and the Markov random field, where V is the node set and E is the set of place-place edges; the edges represent the competition relationships between places and are taken from the candidate set of potentially competing place pairs selected in step 1.
2) For node $v_m$, let $\phi_m(\sigma_m)$ be the prior probability distribution of node $v_m$ under the different classes $\sigma_m$, indicating the likelihood that the place belongs to each class. The abnormality degree $\varepsilon_l$ of the place obtained in step 4 is the prior value of the node under the suspicious-place class, and $1 - \varepsilon_l$ is the prior value of the node under the credible-place class.
3) For each place-place edge in E, let $\psi_{mn}(\sigma_m, \sigma_n)$ be the association degree matrix of node $v_m$ and node $v_n$ under each class, representing how strongly the classes of two competing places are correlated. If node $v_m$ is a suspicious place, the abnormality degree of competition $\varepsilon_c$ gives the likelihood that malicious competition exists between the places, and $1 - \varepsilon_c$ gives the likelihood that it does not. If node $v_m$ is a credible place, malicious competition features between the places are not considered, and the association degrees of $v_m$ with $v_n$ being a suspicious place or a credible place are assumed equal, both 1/2.
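The priors and association degrees of step 5 can be written down as small tables. The sketch below is one reading of the text, not a confirmed layout: the class indices, the function names, and in particular the row and column orientation of the 2×2 matrix (row = class of $v_m$, column = class of $v_n$) are assumptions.

```python
SUSPICIOUS, CREDIBLE = 0, 1  # assumed class indices

def node_prior(eps_l):
    """Prior of a place: eps_l for 'suspicious', 1 - eps_l for 'credible'."""
    return [eps_l, 1.0 - eps_l]

def edge_potential(eps_c):
    """Association degrees indexed as [class of v_m][class of v_n].

    Row SUSPICIOUS: eps_c that the neighbour is suspicious too (malicious
    competition), 1 - eps_c otherwise. Row CREDIBLE: 1/2 for both classes,
    because competition features are ignored for a credible place.
    """
    return [[eps_c, 1.0 - eps_c],
            [0.5, 0.5]]

# Hypothetical degrees from the logistic regression step.
prior = node_prior(0.8)
psi = edge_potential(0.9)
print(prior, psi)
```

With this layout a credible place exerts no pull on its neighbour, while a suspicious place pushes a strongly competing neighbour toward the suspicious class.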
Step 6: Calculate, with the detection model obtained in step 5, the probability that each place is a suspicious place with false comment activity, as follows:
1) In the model obtained in step 5, define the information value $m_{i\to j}(\sigma_j)$ passed from any node $v_i$ to node $v_j$. The information value is propagated as follows:
$$m_{i\to j}(\sigma_j) = Z_1 \sum_{\sigma_i \in M} \psi_{ij}(\sigma_i, \sigma_j)\, \phi_i(\sigma_i) \prod_{v_k \in N(v_i)\setminus v_j} m_{k\to i}(\sigma_i)$$
where $M$ is the class set of the nodes, $\phi_i(\sigma_i)$ is the prior probability value of the node under class $\sigma_i$ obtained in step 5, $\psi_{ij}(\sigma_i, \sigma_j)$ is the association degree value of node $v_i$ and node $v_j$ under their respective classes $\sigma_i, \sigma_j$, $m_{k\to i}(\sigma_i)$ is the information value passed to the node by another neighbor node $v_k$ of $v_i$, $N(v_i)$ is the set of all neighbor nodes of node $v_i$, and $Z_1$ is a normalization constant that ensures $\sum_{\sigma_j \in M} m_{i\to j}(\sigma_j) = 1$.
2) Initialize all information values to 1.
3) Select some of the nodes to start the iterative propagation of information values, updating the information values continuously in the process.
4) When the change in every information value between two consecutive updates is smaller than a certain threshold, the class distribution of all nodes has reached a stable state and the propagation of information values stops.
5) Calculate the confidence $b_i(\sigma_i)$ of each node $v_i$ under class $\sigma_i$ as the probability that node $v_i$ belongs to class $\sigma_i$:
$$b_i(\sigma_i) = Z_2\, \phi_i(\sigma_i) \prod_{v_k \in N(v_i)} m_{k\to i}(\sigma_i)$$
where $Z_2$ is a normalization constant that ensures $\sum_{\sigma_i \in M} b_i(\sigma_i) = 1$.
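The propagation in step 6 follows the pattern of loopy belief propagation, which can be sketched as below. Several points are assumptions rather than statements from the text: messages are updated synchronously for all edges in each round (the text only says that some nodes are selected to start), the tolerance and iteration cap are arbitrary, and the data layout and names are chosen for the example.

```python
def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def belief_propagation(priors, potentials, tol=1e-6, max_iters=100):
    """Loopy belief propagation over two classes (0 = suspicious, 1 = credible).

    priors:     {node: [p_suspicious, p_credible]}
    potentials: {(i, j): psi} with psi[s_i][s_j]; both (i, j) and (j, i)
                must be present for every edge.
    """
    neighbors = {v: set() for v in priors}
    for i, j in potentials:
        neighbors[i].add(j)
    messages = {edge: [1.0, 1.0] for edge in potentials}  # initialised to 1

    for _ in range(max_iters):
        updated = {}
        for (i, j), psi in potentials.items():
            out = []
            for s_j in range(2):
                total = 0.0
                for s_i in range(2):
                    term = priors[i][s_i] * psi[s_i][s_j]
                    for k in neighbors[i] - {j}:  # all neighbours except j
                        term *= messages[(k, i)][s_i]
                    total += term
                out.append(total)
            updated[(i, j)] = normalize(out)
        change = max((abs(a - b) for e in messages
                      for a, b in zip(messages[e], updated[e])), default=0.0)
        messages = updated
        if change < tol:  # stable state reached
            break

    beliefs = {}
    for i in priors:
        belief = list(priors[i])
        for k in neighbors[i]:
            belief = [b * m for b, m in zip(belief, messages[(k, i)])]
        beliefs[i] = normalize(belief)
    return beliefs

# Hypothetical two-place network: A looks abnormal, B is undecided.
priors = {"A": [0.9, 0.1], "B": [0.5, 0.5]}
psi = [[0.9, 0.1], [0.5, 0.5]]
potentials = {("A", "B"): psi, ("B", "A"): psi}
beliefs = belief_propagation(priors, potentials)
print(beliefs["B"])
```

In this toy network the undecided place B ends up with a suspicious-class confidence of about 0.86, pulled up by its strongly competing and abnormal neighbour A.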
Step 7: Using the confidence of each node $v_i$ under the suspicious-place class σ obtained in step 6,
Figure GDA0002847131530000095
select a suitable partition threshold δ based on the detection results on the test set, and mark the places satisfying
Figure GDA0002847131530000096
as suspicious places with false comment activity.
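Choosing the partition threshold δ from test-set results can be sketched as a simple search over candidate values. The selection criterion is not specified in the text, so maximizing accuracy is an assumption here, as are the strict comparison `>`, the candidate grid, the function name, and the sample confidences.

```python
def pick_threshold(confidences, labels, candidates):
    """Return the candidate delta with the highest accuracy (first one on ties)."""
    def accuracy(delta):
        hits = sum((conf > delta) == bool(label)
                   for conf, label in zip(confidences, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)

# Hypothetical suspicious-class confidences and true labels (1 = suspicious).
confidences = [0.95, 0.86, 0.40, 0.10]
labels = [1, 1, 0, 0]
delta = pick_threshold(confidences, labels, [0.3, 0.5, 0.7, 0.9])
suspicious = [conf > delta for conf in confidences]
print(delta, suspicious)
```

Other criteria such as F1 score could be substituted by changing the inner scoring function.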

Claims (7)

  1. A method for detecting suspicious places with false comments based on multi-dimensional attribute mining in an LBSN, characterized in that suspicious places with false comments are detected by using the abnormal features of places in the LBSN and the competition between places, the method comprising the following steps:
    1) based on the filtered comment information in the LBSN, manually identifying false comment activity, labeling suspicious places with false comment activity and credible places without false comment behavior, and dividing them into a training set and a test set;
    2) analyzing the places with false comment activity, extracting abnormal features of the overall comments of a place from the place scores, the spatio-temporal attributes, and the text content of place comments in the LBSN, and constructing an abnormal feature set of places;
    3) analyzing the competition between places, extracting abnormal features of the malicious competition relationship between two places from multiple dimensions of the LBSN, and constructing an abnormal feature set of the competition relationship between places;
    4) splicing the features in the feature sets obtained in step 2) and step 3) into feature vectors respectively, constructing an abnormality degree function with a logistic regression machine learning method, learning the feature weight parameters in the function from the positive and negative examples labeled in step 1), and obtaining the abnormality degree $\varepsilon_l$ of each place in the data set and the abnormality degree $\varepsilon_c$ of competition between places;
    5) constructing a Markov random field detection model based on the LBSN, the model comprising nodes and edges, wherein the nodes represent places and the edges represent competition relationships between places; the nodes fall into two categories, suspicious places and credible places; the prior probability that a node belongs to each category is set from the abnormality degree of the place obtained in step 4); and the association degree matrix between places under the different categories is set from the abnormality degree of competition between the two places obtained in step 4);
    6) based on the Markov random field detection model obtained in step 5), defining the information value $m_{i\to j}(\sigma_j)$ passed from node $v_i$ to node $v_j$, iteratively propagating the information values over the model, and finally generating for each node $v_i$ a confidence $b_i(\sigma_i)$ as the marginal probability that node $v_i$ belongs to class $\sigma_i$;
    7) finally marking each place as to whether it is a suspicious place with false comment activity according to the node confidence obtained in step 6).
  2. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 1, wherein the suspicious places with false comment activity in the data set of step 1) are labeled as follows: based on the comments automatically filtered by the LBSN, the false comments in the LBSN are manually labeled, and suspicious places with false comment activity and credible places are labeled according to those false comments.
  3. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 1, wherein in step 2), abnormal features are extracted from the overall comments of any place in the data set along a rating difference dimension, a time dimension, a space dimension and a comment text dimension.
  4. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 1, wherein in step 3), abnormal features are extracted from the competition relationship between two places in the data set along a rating difference dimension, a time dimension and a comment text dimension.
  5. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 3 or 4, wherein obtaining in step 4) the place abnormality degree ε_l of each place in the data set and the competition abnormality degree ε_c between places comprises the following 3 steps:
    a) splicing the abnormal feature set of the places into a feature vector
    Figure FDA0002901905870000021
    and, based on the training set of places labeled in step 1), training by the gradient descent method to obtain the weight vector corresponding to the abnormal feature vector of the places
    Figure FDA0002901905870000022
    b) splicing the abnormal feature set of the competition relationships between places into a feature vector
    Figure FDA0002901905870000023
    and, based on the training set of competing place pairs labeled in step 1), training by maximum likelihood estimation and the gradient descent method to obtain the weight vector corresponding to the abnormal feature vector of the competition relationships between places
    Figure FDA0002901905870000024
    c) calculating the abnormality degree ε_l of every place from the abnormal features and weights of the places, and calculating the abnormality degree ε_c of every competition relationship between places from the abnormal features and weights of the competition relationships, wherein ε_l and ε_c are calculated as follows:
    Figure FDA0002901905870000025
    Figure FDA0002901905870000031
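    The training and scoring steps of claim 5 can be illustrated with a minimal Python sketch. The formula images of the claim are not reproduced in this text, so the sketch assumes the standard logistic (sigmoid) form ε = 1 / (1 + exp(-w·x)) and batch gradient descent on the log-likelihood. The function names, the toy feature columns, the learning rate and the epoch count are all hypothetical and do not come from the patent.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def abnormality_degree(w, x):
    # Assumed form: epsilon = sigmoid(w . x), a value in (0, 1).
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))

def fit_weights(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the negative log-likelihood
    # (maximum likelihood estimation for logistic regression).
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, label in zip(X, y):
            err = abnormality_degree(w, x) - label
            for k in range(d):
                grad[k] += err * x[k]
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
    return w

# Hypothetical toy data: columns are [bias, rating deviation, comment burstiness].
X = [[1.0, 0.9, 0.8], [1.0, 0.8, 0.7], [1.0, 0.1, 0.2], [1.0, 0.2, 0.1]]
y = [1, 1, 0, 0]  # 1 = labeled suspicious, 0 = labeled credible

w_l = fit_weights(X, y)
eps_l = [abnormality_degree(w_l, x) for x in X]
```

    The same two functions would be applied unchanged to the competition-relationship feature vectors to obtain ε_c; only the training pairs and labels differ.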
  6. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 5, wherein in step 6) the information values
    Figure FDA0002901905870000032
    are iteratively propagated based on the Markov random field detection model as follows:
    Figure FDA0002901905870000033
    wherein M is the set of node categories,
    Figure FDA0002901905870000034
    is the association degree value of node v_i and node v_j under their respective categories σ_i, σ_j,
    Figure FDA0002901905870000035
    is the prior probability value of node v_i under category σ_i,
    Figure FDA0002901905870000036
    is the information value passed to node v_i under category σ_i by another neighbor node v_k of v_i, N(v_i) is the set of all neighbor nodes of node v_i, N(v_i)\v_j is the set of all neighbor nodes of node v_i other than node v_j, and Z_1 is a normalization constant,
    Figure FDA0002901905870000037
    to ensure
    Figure FDA0002901905870000038
    that is, the information values
    Figure FDA0002901905870000039
    over all categories sum to 1.
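    The propagation described in claim 6 can be sketched as follows. Because the formula image is not reproduced here, the sketch assumes the standard sum-product (loopy belief propagation) update that matches the listed terms: the message from v_i to v_j under category σ_j is the sum over σ_i in M of the association value times the prior times the product of the messages arriving at v_i from its neighbors other than v_j, divided by Z_1. The names update_message and propagate, the synchronous update schedule, and all toy numbers are assumptions; how the association matrix is derived from ε_c is not shown.

```python
CLASSES = ("suspicious", "credible")  # the category set M

def update_message(i, j, prior, psi, messages, neighbors):
    # Compute the normalized message from node i to node j (assumed sum-product form).
    raw = {}
    for sj in CLASSES:
        total = 0.0
        for si in CLASSES:
            incoming = 1.0
            for k in neighbors[i]:
                if k != j:  # N(v_i) \ v_j
                    incoming *= messages[(k, i)][si]
            total += psi[(i, j)][(si, sj)] * prior[i][si] * incoming
        raw[sj] = total
    z1 = sum(raw.values())  # normalization constant Z_1
    return {sj: raw[sj] / z1 for sj in CLASSES}

def propagate(prior, psi, neighbors, iterations=10):
    # Start from uniform messages and update all edges synchronously.
    messages = {(i, j): {s: 1.0 / len(CLASSES) for s in CLASSES}
                for i in neighbors for j in neighbors[i]}
    for _ in range(iterations):
        messages = {(i, j): update_message(i, j, prior, psi, messages, neighbors)
                    for (i, j) in messages}
    return messages

# Hypothetical toy graph: three places, competition edges A-B and B-C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Assumed prior: [eps_l, 1 - eps_l] for each place.
prior = {p: {"suspicious": e, "credible": 1.0 - e}
         for p, e in {"A": 0.8, "B": 0.5, "C": 0.2}.items()}
# Hypothetical symmetric association matrix, reused for every directed edge.
table = {("suspicious", "suspicious"): 0.7, ("suspicious", "credible"): 0.3,
         ("credible", "suspicious"): 0.3, ("credible", "credible"): 0.7}
psi = {(i, j): table for i in neighbors for j in neighbors[i]}

messages = propagate(prior, psi, neighbors)
```

    On a tree-shaped graph such as this toy example the messages settle after a few rounds; on graphs with cycles the iteration count acts as a practical stopping rule rather than a convergence guarantee.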
  7. The method for detecting false-comment suspicious places in an LBSN based on multi-dimensional attribute mining according to claim 6, wherein in step 6) the confidence
    Figure FDA00029019058700000310
    of each node v_i under category σ_i is calculated as the probability that node v_i belongs to category σ_i, the confidence being calculated as follows:
    Figure FDA00029019058700000311
    wherein Z_2 is a normalization constant,
    Figure FDA00029019058700000312
    to ensure
    Figure FDA00029019058700000313
    that is, the confidences of node v_i over all categories sum to 1.
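    The confidence calculation of claim 7 and the final labeling of step 7) can be sketched in Python. The formula image is not reproduced here, so the sketch assumes the usual belief form: the confidence of v_i under category σ_i is the prior times the product of all incoming messages, divided by Z_2. The function names, the argmax labeling rule and the toy numbers are hypothetical.

```python
def belief(i, prior, messages, neighbors):
    # Assumed form: prior(sigma) * product of messages from all neighbors, normalized by Z_2.
    raw = {}
    for s in prior[i]:
        value = prior[i][s]
        for k in neighbors[i]:
            value *= messages[(k, i)][s]
        raw[s] = value
    z2 = sum(raw.values())  # normalization constant Z_2
    return {s: v / z2 for s, v in raw.items()}

def label(i, prior, messages, neighbors):
    # Hypothetical decision rule: pick the category with the highest confidence.
    b = belief(i, prior, messages, neighbors)
    return max(b, key=b.get)

# Hypothetical toy input: place A with one competitor B.
neighbors = {"A": ["B"]}
prior = {"A": {"suspicious": 0.6, "credible": 0.4}}
messages = {("B", "A"): {"suspicious": 0.7, "credible": 0.3}}

b_A = belief("A", prior, messages, neighbors)
label_A = label("A", prior, messages, neighbors)
```

    A fixed confidence threshold could replace the argmax rule if the application prefers fewer false alarms; the claims only state that the label is derived from the node confidence.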
CN201710397805.8A 2017-05-31 2017-05-31 False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location based service) Active CN107085616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710397805.8A CN107085616B (en) 2017-05-31 2017-05-31 False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location based service)

Publications (2)

Publication Number Publication Date
CN107085616A CN107085616A (en) 2017-08-22
CN107085616B true CN107085616B (en) 2021-03-16

Family

ID=59608640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710397805.8A Active CN107085616B (en) 2017-05-31 2017-05-31 False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location based service)

Country Status (1)

Country Link
CN (1) CN107085616B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784124B (en) * 2017-11-23 2021-08-24 重庆邮电大学 LBSN (location based service) hyper-network link prediction method based on space-time relationship
CN109639633B (en) * 2018-11-02 2021-11-12 平安科技(深圳)有限公司 Abnormal flow data identification method, abnormal flow data identification device, abnormal flow data identification medium, and electronic device
CN109670542A (en) * 2018-12-11 2019-04-23 田刚 A kind of false comment detection method based on comment external information
CN109829733B (en) * 2019-01-31 2023-02-03 重庆大学 False comment detection system and method based on shopping behavior sequence data
CN113434628B (en) * 2021-05-14 2023-07-25 南京信息工程大学 Comment text confidence detection method based on feature level and propagation relation network
CN113468553B (en) * 2021-06-02 2022-07-19 湖北工业大学 Privacy protection analysis system and method for industrial big data
CN113724035B (en) * 2021-07-29 2023-10-17 河海大学 Malicious user detection method based on feature learning and graph reasoning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010067070A1 (en) * 2008-12-11 2010-06-17 Scansafe Limited Malware detection
CN103235933A (en) * 2013-04-15 2013-08-07 东南大学 Vehicle abnormal behavior detection method based on Hidden Markov Model

Also Published As

Publication number Publication date
CN107085616A (en) 2017-08-22

Similar Documents

Publication Publication Date Title
CN107085616B (en) False comment suspicious site detection method based on multi-dimensional attribute mining in LBSN (location based service)
CN110162593B (en) Search result processing and similarity model training method and device
WO2022041979A1 (en) Information recommendation model training method and related device
CN107835113B (en) Method for detecting abnormal user in social network based on network mapping
CN107330461B (en) Emotion and trust based collaborative filtering recommendation method
CN106992994B (en) Automatic monitoring method and system for cloud service
Hu et al. Social spammer detection with sentiment information
WO2019128529A1 (en) Url attack detection method and apparatus, and electronic device
Fire et al. Computationally efficient link prediction in a variety of social networks
US20190073593A1 (en) Detecting content items in violation of an online system policy using templates based on semantic vectors representing content items
US20140122294A1 (en) Determining a characteristic group
CN112231570B (en) Recommendation system support attack detection method, device, equipment and storage medium
Olmezogullari et al. Pattern2Vec: Representation of clickstream data sequences for learning user navigational behavior
US9286379B2 (en) Document quality measurement
CN110855648B (en) Early warning control method and device for network attack
WO2019051962A1 (en) Real relationship matching method and apparatus for social platform users, and readable storage medium
WO2019019385A1 (en) Cross-platform data matching method and apparatus, computer device and storage medium
CN109471978B (en) Electronic resource recommendation method and device
Boididou et al. Learning to detect misleading content on twitter
US20150134663A1 (en) Method, apparatus, and computer-readable storage medium for grouping social network nodes
CN112771564A (en) Artificial intelligence engine that generates semantic directions for web sites to map identities for automated entity seeking
US20160124965A1 (en) Biased Users Detection
WO2018068648A1 (en) Information matching method and related device
US20190073410A1 (en) Text-based network data analysis and graph clustering
CN110990683A (en) Microblog rumor integrated identification method and device based on region and emotional characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant