CN113821706B - Social network user credibility assessment method based on soft interval support vector machine - Google Patents

Social network user credibility assessment method based on soft interval support vector machine

Info

Publication number
CN113821706B
Authority
CN
China
Prior art keywords
user
credibility
information
social network
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111119250.3A
Other languages
Chinese (zh)
Other versions
CN113821706A (en
Inventor
邢玲
高建平
吴红海
赵康
姚景龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202111119250.3A
Publication of CN113821706A
Application granted
Publication of CN113821706B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social network user credibility assessment method based on a soft interval support vector machine. Profile information and generated content information of users are crawled from a social network, and the users are labeled. For each user, the user profile information credibility is calculated from the profile information, and the user generated content information credibility is calculated from the generated content information. The vector formed by these two credibility values is taken as the input of a training sample, the label of the user is taken as the label of the training sample, and the soft interval support vector machine is trained. When the credibility of a user in the social network needs to be assessed, the user's profile information credibility and generated content information credibility are obtained and input into the trained soft interval support vector machine to obtain the user credibility assessment result. The invention improves the accuracy of user credibility assessment through the soft interval support vector machine.

Description

Social network user credibility assessment method based on soft interval support vector machine
Technical Field
The invention belongs to the technical field of credibility assessment of social network users, and in particular relates to a social network user credibility assessment method based on a soft interval support vector machine.
Background
In the big data era, the number of social network platforms and of their users has grown explosively. Social network platforms have become an indispensable platform for information interaction and medium for information dissemination in daily life, and they host a huge and complex user population. Users are the key nodes through which information spreads on a social platform, and a flood of malicious users impairs the smooth and healthy spread of information on the platform. Credibility assessment of social network users is also significant for research in information screening, public opinion governance, network security, user identification and other fields. Quantifying and assessing the credibility of users has therefore become an important topic in social network user credibility research.
Social networks make it convenient for people to exchange information and express emotion, but their openness also lets a large number of malicious users into the network. Malicious users generate large amounts of false and malicious behavior or information in the social network, and they raise their apparent credibility with fabricated profile information. To better distinguish malicious users from trusted users, user credibility must be assessed reasonably and accurately. Social network user credibility assessment mainly consists of quantitatively analyzing user information in the network and representing user credibility through calculations on that information. To ensure that the assessment is accurate and reasonable, the processing and quantification of each feature item in the user profile information and the user generated content information must be strengthened, and the accuracy of the user credibility assessment algorithm must be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a social network user credibility assessment method based on a soft interval support vector machine. The soft interval support vector machine algorithm moves the user credibility assessment from a one-dimensional space, in which the credibility values are linearly summed, to a two-dimensional coordinate space, so as to improve the accuracy of user credibility assessment.
In order to achieve the above purpose, the social network user credibility assessment method based on the soft interval support vector machine comprises the following steps:
S1: crawling profile information and generated content information of N users from a social network, wherein the profile information of a user comprises the user nickname, the user education level, the user profile and the user mutual-follower count, and the generated content information of a user comprises the number of likes, the number of forwards and the number of comments on the user's blog posts; then labeling the users, wherein a label flag(i) = 1 indicates that user i is trusted and a label flag(i) = 0 indicates that user i is not trusted, i = 1, 2, …, N;
S2: extracting feature attribute data from the profile information of each user, and then calculating the user profile information credibility $U_P(i)$;
S3: extracting feature attribute data from the generated content information of each user, and then calculating the user generated content information credibility $U_{ucg}(i)$;
S4: forming the vector $(U_P(i), U_{ucg}(i))$ from the user profile information credibility $U_P(i)$ and the user generated content information credibility $U_{ucg}(i)$ of each user as the input of a training sample, and taking the label flag(i) of the user as the label of the training sample;
S5: adopting the soft interval support vector machine as the social network user credibility assessment model, and training it with the training samples obtained in step S4;
S6: when the credibility of a user in the social network needs to be assessed, calculating the user profile information credibility of the user by the same method as in step S2, calculating the user generated content information credibility of the user by the same method as in step S3, forming the two values into a vector, and inputting the vector into the soft interval support vector machine trained in step S5 to obtain the user credibility assessment result.
In the social network user credibility assessment method based on a soft interval support vector machine of the invention, profile information and generated content information of users are crawled from a social network and the users are labeled. The user profile information credibility is calculated from the profile information of each user, and the user generated content information credibility is calculated from the generated content information of each user. The vector formed by the two credibility values of each user is taken as the input of a training sample, the label of the user is taken as the label of the training sample, and the soft interval support vector machine is trained. When the credibility of a user in the social network needs to be assessed, the user's profile information credibility and generated content information credibility are obtained and input into the soft interval support vector machine to obtain the user credibility assessment result.
By using the soft interval support vector machine, the invention moves the user credibility assessment from a one-dimensional space obtained by linear summation to a two-dimensional coordinate space. This solves the problem that user credibility assessment results are aliased at the threshold and improves the accuracy of the user credibility assessment.
Drawings
FIG. 1 is a flow chart of an embodiment of a social network user credibility assessment method based on a soft interval support vector machine of the present invention;
FIG. 2 is a graph comparing the accuracy of the credibility assessment results of the present invention with three comparison methods;
FIG. 3 is a graph comparing the precision of the credibility assessment results of the present invention with three comparison methods;
FIG. 4 is a graph comparing the recall of the credibility assessment results of the present invention with three comparison methods;
FIG. 5 is a graph comparing the F1 scores of the credibility assessment results of the present invention with three comparison methods.
Detailed Description
Embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they might obscure the main content of the invention.
Examples
FIG. 1 is a flowchart of an embodiment of a social network user credibility assessment method based on a soft interval support vector machine. As shown in FIG. 1, the method for evaluating the credibility of the social network user based on the soft interval support vector machine comprises the following specific steps:
S101: acquiring user data:
Profile information and generated content information of N users are crawled from a social network. The profile information of a user comprises the user nickname, the user education level, the user profile and the user mutual-follower count, and the generated content information of a user comprises the number of likes, the number of forwards and the number of comments on the user's blog posts. The users are then labeled: a label flag(i) = 1 indicates that user i is trusted and a label flag(i) = 0 indicates that user i is not trusted, i = 1, 2, …, N.
S102: calculating the credibility of the user profile information:
User profile information reflects the authenticity of a user in the social network and is itself highly credible, so the credibility of the user profile information can be used to assess the credibility of a social network user. For example, the Sina Weibo platform has a complete personal information system, and each item of personal information is subject to strict format checking when it is filled in, which ensures that the information is real and valid. The platform involves 20 types of user information: 14 types of user profile information and 6 types of user generated content information. The user profile information includes: user nickname, UID, gender, birthday, educational background, user profile, URL, profession, company, hometown, fan count, cross-correlation count, mutual-follower count, and interest tags. The user generated content information includes: number of blog posts, number of likes on blog posts, number of forwards of blog posts, number of comments on blog posts, number of blog post tags, and special symbols in blog posts.
Feature attribute data are extracted from the profile information of each user, and the user profile information credibility $U_P(i)$ is then calculated.
The user profile information credibility is divided into an overall part and a local part. The user profile information integrity represents the overall credibility of the profile information, the user profile information influence index represents its local credibility, and the two are linearly summed to obtain the user profile information credibility.
The user profile information integrity is the ratio of the number of personal information tags that the user is willing to disclose to other users in the social network to the total number of tags in the user information integrity evaluation system. The user profile information integrity UI(i) is therefore calculated as follows:
$$UI(i) = \frac{A(i)}{n}$$
wherein A(i) represents the number of personal information tags actually disclosed by user i, and n represents the total number of personal information tags of a user in the social network.
The user profile information influence index is the weighted sum of a limited number of feature items that contribute strongly to the credibility calculation based on user profile information. The feature items in the user profile information are numerous, and selecting many of them introduces calculation errors and increases the calculation cost, so this embodiment selects only the user profile, the user nickname, the user education level and the mutual-follower count to represent the user profile information influence index. The user profile information influence index G(i) of each user is therefore calculated as follows:
$$G(i) = \lambda_1 F(i) + \lambda_2 E(i) + \lambda_3 P(i) + \lambda_4 H(i)$$
wherein F(i) represents the nickname type number of user i, $F(i) = 1, 2, \dots, K_F$, and $K_F$ represents the number of user nickname types; E(i) represents the education level of user i, $E(i) = 1, 2, \dots, K_E$, and $K_E$ represents the number of user education levels; P(i) represents the profile status of user i, where P(i) = 0 indicates that user i has no profile and P(i) = 1 indicates that user i has a profile; H(i) represents the mutual-follower count level of user i, $H(i) = 1, 2, \dots, K_H$, and $K_H$ represents the number of mutual-follower count levels; $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ respectively represent the weights of the user nickname type F(i), the user education level E(i), the user profile status P(i) and the user mutual-follower count level H(i).
The user profile information credibility $U_P(i)$ of each user is then calculated by the following formula:
$$U_P(i) = UI(i) + G(i)$$
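The calculation of step S102 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `Profile` class, its field names and the default `total_tags=14` (taken from the 14 profile items listed for the example platform) are assumptions made here, and the weights are the values reported for this embodiment.

```python
from dataclasses import dataclass

# Weights reported in the embodiment for nickname type, education level,
# profile status and mutual-follower count level, in that order.
LAMBDAS = (0.375, 0.25, 0.25, 0.125)


@dataclass
class Profile:
    """Hypothetical container for the quantities used in step S102."""
    disclosed_tags: int    # A(i): personal information tags the user discloses
    nickname_type: int     # F(i) in 1..K_F
    education_level: int   # E(i) in 1..K_E
    has_profile: int       # P(i): 1 if the user has a profile, else 0
    mutual_level: int      # H(i) in 1..K_H


def profile_integrity(p: Profile, total_tags: int) -> float:
    """UI(i) = A(i) / n."""
    return p.disclosed_tags / total_tags


def influence_index(p: Profile, lambdas=LAMBDAS) -> float:
    """G(i) = l1*F(i) + l2*E(i) + l3*P(i) + l4*H(i)."""
    l1, l2, l3, l4 = lambdas
    return (l1 * p.nickname_type + l2 * p.education_level
            + l3 * p.has_profile + l4 * p.mutual_level)


def profile_credibility(p: Profile, total_tags: int = 14) -> float:
    """U_P(i) = UI(i) + G(i); total_tags=14 is an assumed value of n."""
    return profile_integrity(p, total_tags) + influence_index(p)
```

For instance, a user who discloses 7 of 14 tags and has F = 2, E = 4, P = 1 and H = 6 gets UI = 0.5 and G = 2.75, so `profile_credibility` returns 3.25.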
The weights $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ required by the user profile information influence index must account for the differences in type and magnitude between the different kinds of information. This embodiment calculates the weights by an information entropy weight allocation method, as follows:
According to the number of values that each of the four feature items in the user profile information influence index can take, a weight allocation judgment matrix $A_G$ is constructed for the feature items, wherein $K_P = 2$ denotes the number of values of the user profile status.
The judgment matrix $A_G$ represents the pairwise ratios between the four feature items, and each ratio expresses the relative importance of two feature items in the influence index. Eigendecomposition is performed on $A_G$ to obtain the maximum eigenvalue $\lambda_{max}$; the corresponding eigenvector is normalized, and the normalized vector is used as the weight vector $(\lambda_4, \lambda_3, \lambda_2, \lambda_1)$.
In this embodiment, $K_F = 2$, $K_E = 4$ and $K_H = 6$ are used to instantiate the judgment matrix $A_G$.
The maximum eigenvalue obtained from the eigendecomposition is $\lambda_{max} = 4$, and the consistency ratio CR = 0.0006 is much smaller than 0.1, which meets the requirement of the consistency test and shows that the judgment matrix is reasonable. Normalizing the eigenvector corresponding to $\lambda_{max} = 4$ gives the weights of the feature items: $\lambda_1 = 0.375$, $\lambda_2 = 0.25$, $\lambda_3 = 0.25$, $\lambda_4 = 0.125$.
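The eigenvector-based weighting described above can be illustrated with a short Python sketch. The judgment matrix of the embodiment is not reproduced in this text, so the matrix below is a hypothetical, perfectly consistent one built from the importance scores 3 : 2 : 2 : 1, chosen only because they normalize to the reported weights. The function names, the use of power iteration, and the random index RI = 0.90 (the value commonly tabulated for a 4 × 4 pairwise comparison matrix) are likewise assumptions of this sketch.

```python
def principal_eigen(matrix, iterations=100):
    """Power iteration for a positive square matrix.

    Returns (largest eigenvalue, eigenvector normalized to sum to 1).
    """
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        eigenvalue = sum(w)            # valid because v sums to 1
        v = [x / eigenvalue for x in w]
    return eigenvalue, v


def consistency_ratio(lambda_max, n, random_index=0.90):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    return ((lambda_max - n) / (n - 1)) / random_index


# Hypothetical importance scores; NOT the matrix used in the patent.
scores = (3.0, 2.0, 2.0, 1.0)
A_G = [[si / sj for sj in scores] for si in scores]

lambda_max, weights = principal_eigen(A_G)
cr = consistency_ratio(lambda_max, len(A_G))
```

Because this illustrative matrix is perfectly consistent, the sketch yields a maximum eigenvalue of 4 and a consistency ratio of 0; a judgment matrix with small inconsistencies would give a slightly larger eigenvalue and a small positive ratio.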
S103: calculating the credibility of the user generated content information:
The invention defines the credibility of user generated content information in terms of two aspects of the blog posts a user publishes, the influence breadth and the propagation breadth, and determines the calculation content and feature items accordingly. The two aspects characterize user credibility from two different angles, the influence and the propagation of user generated content in the social network, so their results are linearly summed to obtain the user generated content information credibility.
The influence breadth of a user's published blog posts is the degree to which the posts influence other users, and is mainly reflected in how often other users like and comment on the target user's posts. The calculation formula of the influence breadth IU(i) of each user's published blog posts is as follows:
wherein $M_i$ represents the number of blog posts published by user i, $D_{i,m}$ represents the number of likes of the m-th blog post published by user i, and $C_{i,m}$ represents the number of comments on the m-th blog post published by user i, $m = 1, 2, \dots, M_i$. In the formula, 1 is added to the denominator to prevent it from being zero. The larger the value of IU(i), the greater the influence of the user generated content.
The propagation breadth of a user's published blog posts is how often the posts are browsed by other users, and is measured mainly by the length of the forwarding chain of the posts: the longer the forwarding chain of a published post, the wider the spread of the user generated content. The calculation formula of the propagation breadth CU(i) of each user's published blog posts is as follows:
wherein $RT_{i,m}$ represents the forwarding chain length of the m-th blog post published by user i.
The user generated content information credibility $U_{ucg}(i)$ of each user is then calculated by the following formula:
$$U_{ucg}(i) = IU(i) + CU(i)$$
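Step S103 can be sketched as follows. The exact formulas for IU(i) and CU(i) are not reproduced in this text, so the forms used here are assumptions: each is taken to be a sum over the user's posts divided by $M_i + 1$, which matches the description of the symbols and the remark that 1 is added to the denominator. Only the final sum of the two parts is taken directly from the description; all function names are hypothetical.

```python
def influence_breadth(likes, comments):
    """IU(i), ASSUMED form: (sum of D_im + sum of C_im) / (M_i + 1)."""
    if len(likes) != len(comments):
        raise ValueError("likes and comments must cover the same posts")
    return (sum(likes) + sum(comments)) / (len(likes) + 1)


def propagation_breadth(chain_lengths):
    """CU(i), ASSUMED form: (sum of RT_im) / (M_i + 1)."""
    return sum(chain_lengths) / (len(chain_lengths) + 1)


def ugc_credibility(likes, comments, chain_lengths):
    """Credibility of user generated content: IU(i) + CU(i)."""
    return (influence_breadth(likes, comments)
            + propagation_breadth(chain_lengths))
```

With three posts that received likes `[3, 5, 1]`, comments `[2, 0, 1]` and forwarding chain lengths `[4, 2, 0]`, the assumed forms give IU = 12 / 4 = 3.0 and CU = 6 / 4 = 1.5, so the credibility is 4.5. A user with no posts gets 0.0 rather than a division error, which is the purpose of the added 1.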
S104: determining a training sample:
The vector $(U_P(i), U_{ucg}(i))$ formed by the user profile information credibility $U_P(i)$ and the user generated content information credibility $U_{ucg}(i)$ of each user is taken as the input of a training sample, and the label flag(i) of the user is taken as the label of the training sample.
S105: training a soft interval support vector machine:
The soft interval support vector machine is used as the social network user credibility assessment model and is trained with the training samples obtained in step S104.
The input data of the social network user credibility assessment model in the invention are the two-dimensional data $(U_P(i), U_{ucg}(i))$. With a linear discriminant function $f(x) = W^T x + b$ in the two-dimensional space, the hyperplane $W^T x + b = 0$ can be used as the classification line, where x represents the input, W is the weight vector, b is the classification threshold, and the superscript T denotes the transpose. The classification line is required to classify all samples correctly, that is, to satisfy the following formula:
$$y_i (W^T x_i + b) - 1 \ge 0, \quad i = 1, 2, \dots, N, \quad y_i = \pm 1$$
wherein $W = \{\omega_1, \omega_2, \dots, \omega_d\}$ is the normal vector, which determines the direction of the hyperplane, d is the number of features, and b determines the distance between the hyperplane and the origin. Once W and b are determined, a separating hyperplane is uniquely determined. The distance between the separating hyperplane and any point on the margin hyperplanes on either side of it is $1 / \lVert W \rVert$.
Specifically, the requirement that all sample points in the support vector machine satisfy the constraint is expressed algebraically as follows:
$$\min_{W, b} \ \frac{1}{2} \lVert W \rVert^2 \quad \text{s.t.} \quad y_i (W^T x_i + b) \ge 1, \quad i = 1, 2, \dots, N$$
The soft interval support vector machine allows some samples to violate the constraint. When the data are not linearly separable, some sample points cannot satisfy the condition that the functional margin is at least 1, that is, $1 - y_i (W^T x_i + b) > 0$. The remedy is to introduce a slack variable $\zeta_i$ for each sample point so that, for the sample points that violate the constraint, the functional margin plus the slack variable is at least 1. The problem then becomes the following:
$$\min_{W, b} \ \frac{1}{2} \lVert W \rVert^2 + C \sum_{i=1}^{N} \zeta_{0/1}\big(y_i (W^T x_i + b) - 1\big)$$
wherein $\zeta_{0/1}$ represents the 0/1 loss function, as follows:
$$\zeta_{0/1}(z) = \begin{cases} 1, & z < 0 \\ 0, & \text{otherwise} \end{cases}$$
On the one hand, to optimize the soft interval support vector machine and improve the assessment accuracy, slack variables are introduced into the constraints and a balance coefficient C is added to the objective function. On the other hand, the separating hyperplane must still maximize the margin while keeping the number of samples that violate the constraint as small as possible. The balance coefficient C is the coefficient that balances these two parts, minimizing $\frac{1}{2} \lVert W \rVert^2$ and satisfying $y_i (W^T x_i + b) + \zeta_i \ge 1$. The soft interval support vector machine can then be written as the following formula:
$$\min_{W, b, \zeta} \ \frac{1}{2} \lVert W \rVert^2 + C \sum_{i=1}^{N} \zeta_i$$
$$\text{s.t.} \quad y_i (W^T x_i + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, 2, \dots, N$$
wherein C > 0 is called the balance coefficient. A large value of C strengthens the penalty on misclassification, and a small value of C weakens it. The value of the balance coefficient is taken as $C = 10^k$, with k = -3, -2, -1, 0, 1, 2, 3.
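A minimal Python sketch of training such a classifier on two-dimensional inputs follows. The description does not say which solver is used, so this sketch swaps in plain subgradient descent on the equivalent hinge-loss objective $\frac{1}{2}\lVert W \rVert^2 + C \sum_i \max(0, 1 - y_i(W^T x_i + b))$, where the hinge term plays the role of the slack variable. The function names, the learning rate, the number of epochs and the toy data are all assumptions made for illustration, and labels flag = 1 / 0 are mapped to y = +1 / -1.

```python
def train_soft_margin_svm(samples, flags, C=1.0, epochs=5000, lr=0.001):
    """Subgradient descent on 0.5*||W||^2 + C * sum(hinge losses).

    samples: list of (U_P, U_ucg) pairs; flags: 1 = trusted, 0 = not trusted.
    Returns the weight vector W (two components) and the threshold b.
    """
    ys = [1 if f == 1 else -1 for f in flags]
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        gw = [w[0], w[1]]          # gradient of the 0.5*||W||^2 term
        gb = 0.0
        for (x1, x2), y in zip(samples, ys):
            # A sample contributes only while its functional margin is below 1.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb -= C * y
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def assess(w, b, x):
    """Return 1 (trusted) if W^T x + b >= 0, otherwise 0 (not trusted)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0


# Toy (U_P, U_ucg) vectors, invented for illustration only.
train_x = [(3.0, 4.0), (3.5, 3.0), (4.0, 4.5), (1.0, 0.5), (0.5, 1.0), (1.5, 1.0)]
train_flags = [1, 1, 1, 0, 0, 0]
W, b = train_soft_margin_svm(train_x, train_flags, C=1.0)
```

The fixed learning rate suits moderate values of C; for the larger values in the grid $C = 10^k$ the step size would need to shrink accordingly, and choosing among the candidate values (for example on held-out data) is not shown here.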
With the soft interval support vector machine, the user credibility assessment is moved from a one-dimensional space obtained by linear summation to a two-dimensional coordinate space, which improves the accuracy of the user credibility assessment.
S106: user credibility assessment:
When the credibility of a user in the social network needs to be assessed, the user profile information credibility of the user is calculated by the same method as in step S102, the user generated content information credibility of the user is calculated by the same method as in step S103, and the two values are formed into a vector and input into the soft interval support vector machine trained in step S105 to obtain the user credibility assessment result.
To better illustrate the technical effect of the invention, it is verified experimentally with a specific example. In the experiment, user data are taken from the social network Sina Weibo, and the credibility assessment results are evaluated with accuracy, precision, recall and F-measure (F1) as evaluation indices.
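The four evaluation indices can be computed from predicted and true labels as in the sketch below. Treating the trusted label (flag = 1) as the positive class is an assumption of this sketch, as are the function name and the convention of returning 0.0 when a denominator is zero.

```python
def classification_metrics(true_flags, predicted_flags):
    """Accuracy, precision, recall and F1, with flag 1 as the positive class."""
    pairs = list(zip(true_flags, predicted_flags))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For true labels `[1, 1, 1, 0, 0, 0, 1, 0]` and predictions `[1, 1, 0, 0, 0, 1, 1, 0]` there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four indices equal 0.75.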
To demonstrate the effectiveness and rationality of the invention in user credibility assessment, this embodiment selects three user credibility assessment methods as comparison methods, assesses user credibility with them and with the invention, and compares the results. Comparison method 1 is the method in A. Narayanan, A. Garg, I. Arora, T. Sureka, et al., "IronSense: Towards the Identification of Fake User-Profiles on Twitter Using Machine Learning," 2018 Fourteenth International Conference on Information Processing (ICINPRO), pp. 1-7, Bangalore, India, 2018, which applies machine learning algorithms to quantitatively learn from user profile information in order to assess user credibility. Comparison method 2 is the method in H. Slimi, I. Bounhas and Y. Slimani, "URL-Based Tweet Credibility Evaluation," 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), pp. 1-6, Abu Dhabi, United Arab Emirates, Nov. 2019, which characterizes user credibility by quantifying user generated content information. Comparison method 3 is the method in Q. Sun, N. Wang, Y. Zhou, et al., "Identification of Influential Online Social Network Users Based on Multi-features," International Journal of Pattern Recognition & Artificial Intelligence, vol. 30, no. 6, pp. 1659015.1-1659015.15, 2016, which considers multiple types of user information together and processes the user information quantitatively with the PageRank algorithm in order to assess user credibility.
FIG. 2 is a graph comparing the accuracy of the credibility assessment results of the present invention with the three comparison methods. FIG. 3 is a graph comparing the precision of the credibility assessment results of the present invention with the three comparison methods. FIG. 4 is a graph comparing the recall of the credibility assessment results of the present invention with the three comparison methods. FIG. 5 is a graph comparing the F1 scores of the credibility assessment results of the present invention with the three comparison methods. As can be seen from FIGS. 2 to 5, the present invention assesses user credibility in a two-dimensional plane, which avoids the problem that linear summation easily causes trusted users and malicious users to be aliased at the classification threshold. The number of users and the evaluation indices are negatively correlated: as the number of users keeps increasing, more users fall inside the margin hyperplanes, so the slack becomes larger, the tolerance to noisy data becomes smaller, and the error of the user assessment result becomes larger, which makes the evaluation indices show a downward trend. The accuracy of the three comparison methods and of the present invention decreases by 0.13, 0.14, 0.08 and 0.07 respectively; the method of the present invention shows the smallest decrease in accuracy and therefore better robustness.
The illustrative embodiments of the invention are described above to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: various changes that fall within the spirit and scope of the invention as defined by the appended claims are protected.

Claims (4)

1. A social network user credibility assessment method based on a soft interval support vector machine is characterized by comprising the following steps:
S1: crawling configuration file information and generated content information of N users from a social network, wherein the configuration file information of a user comprises the user's nickname, education level, user profile and number of mutual followers, and the generated content information of a user comprises the number of likes, the number of forwards and the number of comments of the user's blog posts; then labeling the users, wherein a label flag(i)=1 indicates that user i is trusted and a label flag(i)=0 indicates that user i is not trusted, i=1,2,…,N;
S2: extracting characteristic attribute data from the configuration file information of each user, and then calculating the user configuration file information credibility $U_P(i)$;
S3: extracting characteristic attribute data from the generated content information of each user, and then calculating the user generated content information credibility $U_{ucg}(i)$;
S4: forming the vector $(U_P(i), U_{ucg}(i))$ from the user configuration file information credibility $U_P(i)$ and the user generated content information credibility $U_{ucg}(i)$ of each user as the input of a training sample, and taking the label flag(i) of the user as the label of the training sample;
S5: adopting a soft interval support vector machine as the social network user credibility evaluation model, and training the soft interval support vector machine with the training samples obtained in step S4;
S6: when the credibility of a user in the social network needs to be evaluated, calculating the user configuration file information credibility of the user by the same method as in step S2 and the user generated content information credibility of the user by the same method as in step S3, forming the two values into a vector, and inputting the vector into the soft interval support vector machine trained in step S5 to obtain the user credibility evaluation result.
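Steps S4 to S6 can be illustrated with a small sketch. It assumes a linear kernel and trains the soft interval (soft-margin) support vector machine by stochastic subgradient descent on the hinge loss; the claim does not specify the kernel or the solver. The function names, the penalty `C`, the learning rate and the epoch count are all hypothetical choices.

```python
import random


def train_soft_margin_svm(vectors, flags, C=1.0, lr=0.01, epochs=1000, seed=0):
    """Fit a linear soft-margin SVM on 2-D vectors (U_P, U_ucg).

    Minimises 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)) by
    stochastic subgradient descent. Labels flag=1/0 are mapped to y=+1/-1.
    Assumption: linear kernel and this solver are illustrative choices only.
    """
    rng = random.Random(seed)
    n = len(vectors)
    ys = [1.0 if flag == 1 else -1.0 for flag in flags]
    w = [0.0, 0.0]
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], ys[i]
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Per-sample share of the regulariser gradient.
            grad_w = [w[0] / n, w[1] / n]
            grad_b = 0.0
            if margin < 1.0:
                # Sample lies inside the margin or is misclassified:
                # the hinge (slack) term contributes to the subgradient.
                grad_w[0] -= C * y * x[0]
                grad_w[1] -= C * y * x[1]
                grad_b -= C * y
            w[0] -= lr * grad_w[0]
            w[1] -= lr * grad_w[1]
            b -= lr * grad_b
    return w, b


def assess_user(model, vector):
    """Return 1 (trusted) or 0 (not trusted) for one (U_P, U_ucg) vector."""
    w, b = model
    return 1 if w[0] * vector[0] + w[1] * vector[1] + b >= 0 else 0


if __name__ == "__main__":
    # Made-up training vectors (U_P, U_ucg) with their labels flag(i).
    train_x = [(2.0, 2.0), (2.5, 1.8), (1.8, 2.4), (0.2, 0.3), (0.5, 0.1), (0.1, 0.6)]
    train_flags = [1, 1, 1, 0, 0, 0]
    svm = train_soft_margin_svm(train_x, train_flags)
    print(assess_user(svm, (2.2, 2.1)), assess_user(svm, (0.3, 0.2)))
```

A larger `C` penalises slack more heavily and approaches a hard margin, while a smaller `C` tolerates more users inside the margin.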
2. The method for evaluating the credibility of a social network user according to claim 1, wherein the user configuration file information credibility $U_P(i)$ in step S2 is calculated as follows:
the user configuration file information integrity UI(i) is calculated using the following formula:
wherein A(i) represents the number of personal information tags actually disclosed by user i, and n represents the total number of personal information tags of a user in the social network;
the user configuration file information impact index G(i) is calculated using the following formula:
$G(i)=\lambda_1 F(i)+\lambda_2 E(i)+\lambda_3 P(i)+\lambda_4 H(i)$
wherein F(i) represents the nickname type number of user i, $F(i)=1,2,\dots,K_F$, and $K_F$ represents the number of nickname types; E(i) represents the education level of user i, $E(i)=1,2,\dots,K_E$, and $K_E$ represents the number of education levels; P(i) represents the user profile state of user i, P(i)=0 represents that user i has no user profile, and P(i)=1 represents that user i has a user profile; H(i) represents the mutual-follower count level of user i, $H(i)=1,2,\dots,K_H$, and $K_H$ represents the number of mutual-follower count levels; $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are preset weights of the nickname type F(i), the education level E(i), the user profile state P(i) and the mutual-follower count level H(i), respectively;
then the user configuration file information credibility $U_P(i)$ of each user is calculated using the following formula:
$U_P(i)=UI(i)+G(i)$.
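The calculation in this claim can be sketched as below. The formula for UI(i) is not reproduced in the text, so the sketch assumes the ratio A(i)/n, which is only a guess based on the definitions of A(i) and n. The function names and the example weights are hypothetical.

```python
def profile_integrity(disclosed_tags, total_tags):
    """UI(i). ASSUMPTION: taken as A(i) / n, since the source formula is missing."""
    return disclosed_tags / total_tags


def profile_impact_index(nickname_type, education_level, has_profile,
                         follower_level, weights):
    """G(i) = l1*F(i) + l2*E(i) + l3*P(i) + l4*H(i), as stated in the claim."""
    l1, l2, l3, l4 = weights
    return (l1 * nickname_type + l2 * education_level
            + l3 * has_profile + l4 * follower_level)


def profile_credibility(disclosed_tags, total_tags, nickname_type,
                        education_level, has_profile, follower_level, weights):
    """U_P(i) = UI(i) + G(i), as stated in the claim."""
    ui = profile_integrity(disclosed_tags, total_tags)
    g = profile_impact_index(nickname_type, education_level, has_profile,
                             follower_level, weights)
    return ui + g


if __name__ == "__main__":
    # Hypothetical user: 6 of 8 tags disclosed, F=2, E=3, P=1, H=2,
    # with equal illustrative weights.
    print(profile_credibility(6, 8, 2, 3, 1, 2, (0.25, 0.25, 0.25, 0.25)))
```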
3. The method for evaluating the credibility of a social network user according to claim 2, wherein the weights $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are calculated by an information entropy weight distribution method, specifically as follows:
according to the number of values of each of the four characteristic items in the user configuration file information impact index, constructing a weight distribution judgment matrix $A_G$ of the characteristic items of the user configuration file information impact index:
wherein $K_P=2$ represents the number of user profile states;
performing eigendecomposition on the judgment matrix $A_G$ to obtain the eigenvector corresponding to the maximum eigenvalue $\lambda_{max}$, normalizing this eigenvector, and taking the normalized vector as the weight vector $(\lambda_1,\lambda_2,\lambda_3,\lambda_4)$.
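The eigenvector step can be sketched with power iteration, which finds the principal eigenvector of a matrix with positive entries. The judgment matrix itself is not reproduced in the text, so the example builds a hypothetical matrix whose entry (j, k) is the ratio of the value counts K_j / K_k. That construction, the counts other than K_P = 2, and the function name are assumptions for illustration only.

```python
def weights_from_judgment_matrix(matrix, max_iterations=1000, tol=1e-12):
    """Return (weights, lambda_max) for a judgment matrix with positive entries.

    Power iteration converges to the eigenvector of the largest eigenvalue;
    the vector is normalised to sum to 1 so it can be used as weights.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    lambda_max = 0.0
    for _ in range(max_iterations):
        product = [sum(matrix[r][c] * vector[c] for c in range(n)) for r in range(n)]
        # Because `vector` sums to 1, sum(A v) equals lambda_max at convergence.
        lambda_max = sum(product)
        updated = [value / lambda_max for value in product]
        converged = max(abs(u - v) for u, v in zip(updated, vector)) < tol
        vector = updated
        if converged:
            break
    return vector, lambda_max


if __name__ == "__main__":
    # Hypothetical value counts (K_F, K_E, K_P, K_H); only K_P = 2 is from the claim.
    counts = [4, 5, 2, 4]
    # ASSUMPTION: ratio-based judgment matrix, not the matrix of the patent.
    judgment = [[cj / ck for ck in counts] for cj in counts]
    print(weights_from_judgment_matrix(judgment))
```

For a ratio matrix of this kind the resulting weights are simply proportional to the counts, which makes the sketch easy to check by hand.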
4. The method for evaluating the credibility of a social network user according to claim 1, wherein the user generated content information credibility $U_{ucg}(i)$ in step S3 is calculated as follows:
the influence breadth IU(i) of the blog posts published by each user is calculated using the following formula:
wherein $M_i$ represents the number of blog posts published by user i, $D_{i,m}$ represents the number of likes of the m-th blog post published by user i, and $C_{i,m}$ represents the number of comments on the m-th blog post published by user i, $m=1,2,\dots,M_i$;
the propagation breadth CU(i) of the blog posts published by each user is calculated using the following formula:
wherein $RT_{i,m}$ represents the forwarding chain length of the m-th blog post published by user i; then the user generated content information credibility $U_{ucg}(i)$ of each user is calculated using the following formula:
$U_{ucg}(i)=IU(i)+CU(i)$.
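The final sum in this claim can be sketched as below. The formulas for IU(i) and CU(i) are not reproduced in the text, so the sketch substitutes simple per-post averages as placeholders: the mean of likes plus comments for IU(i) and the mean forwarding chain length for CU(i). These placeholders and the function name are assumptions, not the formulas of the patent.

```python
def generated_content_credibility(posts):
    """Return IU(i) + CU(i) for one user.

    `posts` is a list of (likes, comments, forwarding_chain_length) tuples,
    one per blog post, i.e. (D_im, C_im, RT_im) for m = 1..M_i.
    ASSUMPTION: IU and CU are per-post averages; the source formulas are missing.
    """
    if not posts:
        return 0.0  # a user with no blog posts contributes nothing
    count = len(posts)
    influence_breadth = sum(likes + comments for likes, comments, _ in posts) / count
    propagation_breadth = sum(chain for _, _, chain in posts) / count
    return influence_breadth + propagation_breadth


if __name__ == "__main__":
    # Hypothetical user with two blog posts.
    print(generated_content_credibility([(10, 2, 3), (4, 0, 1)]))
```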

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111119250.3A CN113821706B (en) 2021-09-24 2021-09-24 Social network user credibility assessment method based on soft interval support vector machine

Publications (2)

Publication Number    Publication Date
CN113821706A (en)     2021-12-21
CN113821706B (en)     2024-03-19

Family

ID=78921115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111119250.3A Active CN113821706B (en) 2021-09-24 2021-09-24 Social network user credibility assessment method based on soft interval support vector machine

Country Status (1)

Country Link
CN (1) CN113821706B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919794A (en) * 2019-03-14 2019-06-21 Harbin Engineering University Microblog user trust evaluation method based on belief propagation
WO2019183191A1 (en) * 2018-03-22 2019-09-26 Michael Bronstein Method of news evaluation in social media networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
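Construction of an AHP-based credibility evaluation model for social network information; 刘银萍; 李光强; 余容; 尹健; 情报探索; 20180915 (No. 09); full text *
Industrial fault identification based on active learning and weighted support vector machine; 朱东阳; 沈静逸; 黄炜平; 梁军; 浙江大学学报(工学版); 20170415 (No. 04); full text *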

Similar Documents

Publication Publication Date Title
Borges et al. Combining similarity features and deep representation learning for stance detection in the context of checking fake news
Taddy Multinomial inverse regression for text analysis
Wang et al. Drifted Twitter spam classification using multiscale detection test on KL divergence
Wang et al. Diversified and scalable service recommendation with accuracy guarantee
Korayem et al. De-anonymizing users across heterogeneous social computing platforms
Wang et al. Optimal feature selection for learning-based algorithms for sentiment classification
Peng et al. Multicriteria Decision‐Making Approach with Hesitant Interval‐Valued Intuitionistic Fuzzy Sets
Alabadla et al. Systematic review of using machine learning in imputing missing values
Oh A YouTube spam comments detection scheme using cascaded ensemble machine learning model
Goyal et al. Embedding networks with edge attributes
Phiwhorm et al. Adaptive multiple imputations of missing values using the class center
Ekosputra et al. Supervised machine learning algorithms to detect instagram fake accounts
Kurniawanda et al. Analysis sentiment cyberbullying in instagram comments with xgboost method
CN114139634A (en) Multi-label feature selection method based on paired label weights
CN113821706B (en) Social network user credibility assessment method based on soft interval support vector machine
Zhang et al. MERL: Multimodal event representation learning in heterogeneous embedding spaces
CN113343118A (en) Hot event discovery method under mixed new media
Fergus et al. Performance evaluation metrics
Alrubaian et al. A credibility assessment model for online social network content
Li et al. Deeplabel: Automated issue classification for issue tracking systems
Godichon-Baggioni et al. A penalized criterion for selecting the number of clusters for K-medians
Zhao et al. S3UCA: Soft‐Margin Support Vector Machine‐Based Social Network User Credibility Assessment Method
Hussein et al. Machine learning approach to sentiment analysis in data mining
Thirumoorthy et al. A feature selection model for document classification using Tom and Jerry Optimization algorithm
Fadel et al. A comparative study for supervised learning algorithms to analyze sentiment tweets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant