CN113821706A - Social network user reliability evaluation method based on soft interval support vector machine - Google Patents

Social network user reliability evaluation method based on soft interval support vector machine

Info

Publication number
CN113821706A
Authority
CN
China
Prior art keywords
user
credibility
information
social network
generated content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111119250.3A
Other languages
Chinese (zh)
Other versions
CN113821706B (en)
Inventor
邢玲
高建平
吴红海
赵康
姚景龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202111119250.3A
Publication of CN113821706A
Application granted
Publication of CN113821706B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social network user reliability evaluation method based on a soft interval support vector machine (soft-margin SVM). User profile information and user-generated content information are crawled from a social network, and each user is labeled. For each user, the user profile information credibility is calculated from the profile information, and the user-generated content information credibility is calculated from the generated content information. The vector formed by these two credibility values is used as the input of a training sample, the user's label is used as the label of the training sample, and the soft interval support vector machine is trained on these samples. When the credibility of a user in the social network needs to be evaluated, the user's profile information credibility and user-generated content information credibility are obtained and input into the trained soft interval support vector machine to obtain the user credibility evaluation result. Using a soft interval support vector machine improves the accuracy of user credibility evaluation.

Description

Social network user reliability evaluation method based on soft interval support vector machine
Technical Field
The invention belongs to the technical field of social network user credibility evaluation, and in particular relates to a social network user reliability evaluation method based on a soft interval support vector machine (that is, a soft-margin SVM).
Background
In the big data era, the number of social network platforms and users has grown explosively. Social network platforms have become indispensable media for information exchange and dissemination in daily life, and they host large and complex user populations. Users are the key nodes through which information spreads on a social platform, and a flood of malicious users undermines the healthy development of information dissemination. Evaluating the credibility of social network users is therefore significant for information screening, public opinion governance, network security, user identification, and related fields, and quantifying and evaluating user credibility has become an important research topic.
Social networks make it convenient for people to exchange information and express emotions, but their openness also lets in large numbers of malicious users. Malicious users can generate large amounts of false or malicious behavior and information, and they inflate their apparent credibility with fabricated profile information. To better distinguish malicious users from credible users, user credibility must be evaluated reasonably and accurately. Evaluating the credibility of social network users mainly consists of quantitatively analyzing user information in the network and computing a credibility value from it. To ensure that the evaluation is accurate and reasonable, the processing and quantification of each feature item in the user profile information and the user-generated content information need to be strengthened, and the precision of the evaluation algorithm needs to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a social network user reliability evaluation method based on a soft-interval support vector machine.
In order to achieve the purpose, the social network user reliability evaluation method based on the soft interval support vector machine comprises the following steps:
S1: Crawl the profile information and generated content information of N users from a social network. The profile information of a user includes the user nickname, user education level, user profile description, and mutual fan count; the generated content information of a user includes the like count, forward count, and comment count of the user's blog posts. Then label the users: flag(i) = 1 means that user i is credible and flag(i) = 0 means that user i is not credible, i = 1, 2, …, N;
S2: Extract feature attribute data from the profile information of each user, and calculate the user profile information credibility $U_P(i)$;
S3: Extract feature attribute data from the generated content information of each user, and calculate the user-generated content information credibility $U_{ucg}(i)$;
S4: Use the vector $(U_P(i), U_{ucg}(i))$ formed by the user profile information credibility $U_P(i)$ and the user-generated content information credibility $U_{ucg}(i)$ of each user as the input of a training sample, and use the user's label flag(i) as the label of the training sample;
S5: Adopt a soft interval support vector machine as the social network user credibility evaluation model, and train it with the training samples obtained in step S4;
S6: When the credibility of a user in the social network needs to be evaluated, calculate the user's profile information credibility by the method of step S2, calculate the user's generated content information credibility by the method of step S3, form the vector from the two values, and input it into the soft interval support vector machine trained in step S5 to obtain the user credibility evaluation result.
In the social network user reliability evaluation method based on a soft interval support vector machine, user profile information and user-generated content information are crawled from a social network, and each user is labeled. The user profile information credibility is calculated from the profile information of each user, and the user-generated content information credibility is calculated from the generated content information of each user. The vector formed by these two credibility values is used as the input of a training sample, the user's label is used as the label of the training sample, and the soft interval support vector machine is trained on these samples. When the credibility of a user in the social network needs to be evaluated, the user's profile information credibility and generated content information credibility are obtained and input into the trained soft interval support vector machine to obtain the user credibility evaluation result.
The method uses the soft interval support vector machine to move user credibility evaluation from the one-dimensional space of a linear sum into a two-dimensional coordinate space, which resolves the aliasing of evaluation results near the decision threshold and improves the accuracy of user credibility evaluation.
Drawings
FIG. 1 is a flowchart of an embodiment of a social network user reliability assessment method based on a soft interval support vector machine according to the present invention;
FIG. 2 is a graph comparing the accuracy of the credibility evaluation results of the present invention and three comparison methods;
FIG. 3 is a graph comparing the precision of the credibility evaluation results of the present invention and three comparison methods;
FIG. 4 is a graph comparing the recall of the credibility evaluation results of the present invention and three comparison methods;
FIG. 5 is a graph comparing the F1 scores of the credibility evaluation results of the present invention and three comparison methods.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted where they would obscure the subject matter of the present invention.
Examples
FIG. 1 is a flowchart of a specific embodiment of a social network user credibility assessment method based on a soft interval support vector machine according to the present invention. As shown in fig. 1, the social network user reliability evaluation method based on the soft interval support vector machine of the present invention specifically includes the steps of:
S101: Acquire user data:
Crawl the profile information and generated content information of N users from a social network. The profile information of a user includes the user nickname, user education level, user profile description, and mutual fan count; the generated content information of a user includes the like count, forward count, and comment count of the user's blog posts. Then label the users: flag(i) = 1 means that user i is credible and flag(i) = 0 means that user i is not credible, i = 1, 2, …, N.
S102: Calculate the user profile information credibility:
User profile information in a social network reflects the authenticity of the user and is itself highly credible, so the credibility of the profile information can be used to evaluate the credibility of the user. For example, the Sina microblog platform has a complete personal information system and applies strict format checks when personal information is filled in, which helps ensure that the information is authentic and valid. The user information involved comprises 20 items: 14 items of user profile information and 6 items of user-generated content information. The user profile information includes: user nickname, UID, gender, birthday, educational background, user profile description, URL, occupation, company, hometown, fan count, following count, mutual fan count, and interest tags. The user-generated content information includes: number of blog posts, blog post like count, blog post forward count, blog post comment count, number of blog post tags, and special characters in blog posts.
Extract feature attribute data from the profile information of each user, and calculate the user profile information credibility $U_P(i)$.
In this embodiment, the user profile information credibility is divided into an overall credibility and a local credibility. The overall credibility is characterized by the user profile information integrity, and the local credibility is characterized by the user profile information influence index. The user profile information credibility is obtained by linearly summing the integrity and the influence index.
The user profile information integrity is the ratio of the number of personal information tags that a user is willing to disclose to other users in the social network to the total number of tags in the user information integrity evaluation system. The user profile information integrity $UI(i)$ is therefore calculated as follows:
$$UI(i) = \frac{a(i)}{n}$$
where $a(i)$ is the number of personal information tags actually disclosed by user i, and $n$ is the total number of personal information tags for users in the social network.
The user profile information influence index is a weighted sum of a small number of feature items in the profile information that contribute strongly to the credibility calculation. The profile information contains many feature items, and selecting too many of them introduces calculation errors and increases computational overhead. In this embodiment, therefore, only the user profile description, the user nickname, the user education level, and the mutual fan count are used to represent the influence index. The user profile information influence index $G(i)$ of each user is calculated as follows:
$$G(i) = \lambda_1 F(i) + \lambda_2 E(i) + \lambda_3 P(i) + \lambda_4 H(i)$$
where $F(i)$ is the nickname category index of user i, $F(i) = 1, 2, \dots, K_F$, and $K_F$ is the number of nickname categories; $E(i)$ is the education level of user i, $E(i) = 1, 2, \dots, K_E$, and $K_E$ is the number of education levels; $P(i)$ is the profile description status of user i, where $P(i) = 0$ means that user i has no profile description and $P(i) = 1$ means that user i has one; $H(i)$ is the mutual fan count level of user i, $H(i) = 1, 2, \dots, K_H$, and $K_H$ is the number of mutual fan count levels; and $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are the preset weights of the nickname category $F(i)$, the education level $E(i)$, the profile description status $P(i)$, and the mutual fan count level $H(i)$, respectively.
The user profile information credibility $U_P(i)$ of each user is then calculated as follows:
$$U_P(i) = UI(i) + G(i)$$
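The calculation in S102 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the total tag count n = 14 is an assumption taken from the 14 profile items listed for the Sina microblog example, and the weights are those reported in the worked example of this embodiment (0.375, 0.25, 0.25, 0.125).

```python
# Minimal sketch of S102 under stated assumptions; all names are illustrative.
TOTAL_TAGS = 14  # n: assumed equal to the 14 profile items listed for Sina microblog

# lambda_1 .. lambda_4 as reported in the worked example of this embodiment.
WEIGHTS = (0.375, 0.25, 0.25, 0.125)


def profile_integrity(disclosed_tags: int, total_tags: int = TOTAL_TAGS) -> float:
    """UI(i) = a(i) / n."""
    if not 0 <= disclosed_tags <= total_tags:
        raise ValueError("disclosed_tags must lie in [0, total_tags]")
    return disclosed_tags / total_tags


def influence_index(nickname_type: int, education_level: int,
                    has_description: int, mutual_fan_level: int,
                    weights=WEIGHTS) -> float:
    """G(i) = l1*F(i) + l2*E(i) + l3*P(i) + l4*H(i)."""
    l1, l2, l3, l4 = weights
    return (l1 * nickname_type + l2 * education_level
            + l3 * has_description + l4 * mutual_fan_level)


def profile_credibility(disclosed_tags: int, nickname_type: int,
                        education_level: int, has_description: int,
                        mutual_fan_level: int) -> float:
    """U_P(i) = UI(i) + G(i)."""
    return (profile_integrity(disclosed_tags)
            + influence_index(nickname_type, education_level,
                              has_description, mutual_fan_level))


# Hypothetical user: 7 of 14 tags disclosed, F = 2, E = 3, P = 1, H = 4.
u_p = profile_credibility(7, 2, 3, 1, 4)
print(u_p)  # 0.5 + (0.75 + 0.75 + 0.25 + 0.5) = 2.75
```

Note that UI(i) lies in [0, 1] while G(i) grows with the level indices, so in this reading the influence index can dominate the sum; whether any rescaling is intended is not stated in the text.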
The weights $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ required by the user profile information influence index must account for the differences in type and magnitude between the feature items. In this embodiment the weights are calculated with an entropy weight distribution method, as follows:
Based on the value ranges of the four feature items in the user profile information influence index, construct the weight distribution judgment matrix $A_G$ of the feature items:
Figure BDA0003276475690000051
where $K_P = 2$ is the number of values that the user profile description status can take.
The judgment matrix $A_G$ contains the pairwise ratios between the four feature items; each ratio expresses the relative importance of two feature items in the influence index. Perform eigendecomposition on $A_G$ to obtain the maximum eigenvalue $\lambda_{max}$, normalize the corresponding eigenvector, and take the normalized vector as the weight vector $(\lambda_4, \lambda_3, \lambda_2, \lambda_1)$.
In this example, assuming $K_F = 2$, $K_E = 4$, and $K_H = 6$, the judgment matrix $A_G$ is:
Figure BDA0003276475690000052
Eigendecomposition gives the maximum eigenvalue $\lambda_{max} = 4$, and the consistency ratio is CR = 0.0006, which is far below 0.1. The matrix therefore passes the consistency test, which indicates that the judgment matrix is reasonable. Normalizing the eigenvector corresponding to $\lambda_{max} = 4$ gives the feature item weights $\lambda_1 = 0.375$, $\lambda_2 = 0.25$, $\lambda_3 = 0.25$, and $\lambda_4 = 0.125$.
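The eigenvector step described here can be illustrated as follows. The judgment matrix of the embodiment is an image that is not reproduced in this text, so the sketch does not use it: instead it builds a hypothetical, perfectly consistent matrix with entries a[i][j] = w[i] / w[j] from the reported weights, and checks that the principal eigenvector recovers them. The power iteration, the consistency index formula CI = (λmax − n)/(n − 1), and the random index value 0.90 for n = 4 are standard choices assumed for illustration, not details taken from the patent.

```python
# Sketch: weights from a pairwise judgment matrix via its principal eigenvector.
# The matrix below is HYPOTHETICAL (built from the reported weights), because the
# patent's own matrix A_G is not reproduced in the text.
reported = [0.375, 0.25, 0.25, 0.125]
A = [[wi / wj for wj in reported] for wi in reported]


def principal_eigen(matrix, iterations=100):
    """Power iteration; returns (lambda_max, eigenvector normalised to sum to 1)."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = sum(w)              # sum(v) == 1, so sum(A v) estimates lambda_max
        v = [x / lam for x in w]  # renormalise so the entries sum to 1
    return lam, v


lambda_max, weights = principal_eigen(A)
n = len(A)
ci = (lambda_max - n) / (n - 1)  # consistency index
RI_4 = 0.90                      # commonly tabulated random index for n = 4 (assumption)
cr = ci / RI_4                   # consistency ratio; values below 0.1 pass the test
print(lambda_max, weights, cr)
```

For this constructed matrix the result is λmax ≈ 4, weights ≈ (0.375, 0.25, 0.25, 0.125) and CR ≈ 0, which mirrors the values reported above; a real judgment matrix would generally give CR slightly above zero.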
S103: Calculate the user-generated content information credibility:
The credibility of user-generated content information is defined from two aspects of the blog posts a user publishes: their influence breadth and their propagation breadth. These two quantities characterize user credibility from the perspectives of the influence and the propagation of user-generated content in the social network, so the user-generated content information credibility is obtained by linearly summing them.
The influence breadth of a user's blog posts is the degree to which the posts published by the user affect other users. It is mainly reflected in how often other users like and comment on the target user's posts. The influence breadth $IU(i)$ of each user's blog posts is calculated as follows:
Figure BDA0003276475690000061
where $M_i$ is the number of blog posts published by user i, $D_{i,m}$ is the like count of the m-th blog post published by user i, and $C_{i,m}$ is the comment count of the m-th blog post published by user i, $m = 1, 2, \dots, M_i$. The 1 added to the denominator prevents the denominator from being zero. The larger the value of $IU(i)$, the greater the influence breadth of the user-generated content.
The propagation breadth of a user's blog posts is how often other users browse the posts published by the user. It is mainly measured by the length of the forwarding chain of the user's posts: the longer the forwarding chain, the wider the propagation of the user-generated content. The propagation breadth $CU(i)$ of each user's blog posts is calculated as follows:
Figure BDA0003276475690000062
where $RT_{i,m}$ is the forwarding chain length of the m-th blog post published by user i.
The user-generated content information credibility $U_{ucg}(i)$ of each user is then calculated as follows:
$$U_{ucg}(i) = IU(i) + CU(i)$$
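A possible reading of S103 is sketched below. The two formula images for IU(i) and CU(i) are not reproduced in this text, so the per-post averages used here are an assumption inferred from the symbol definitions (sums over the user's posts, with 1 added to the denominator); only the final sum IU(i) + CU(i) is stated explicitly. All function names are hypothetical.

```python
# Sketch of S103. ASSUMED forms, inferred from the symbol definitions only:
#   IU(i) = sum_m (D[i,m] + C[i,m]) / (M_i + 1)
#   CU(i) = sum_m RT[i,m] / (M_i + 1)
from typing import Sequence


def influence_breadth(likes: Sequence[int], comments: Sequence[int]) -> float:
    """Assumed IU(i): likes plus comments, averaged with a +1 denominator."""
    if len(likes) != len(comments):
        raise ValueError("need one like count and one comment count per post")
    m_i = len(likes)  # M_i, the number of posts
    return (sum(likes) + sum(comments)) / (m_i + 1)


def propagation_breadth(forward_chain_lengths: Sequence[int]) -> float:
    """Assumed CU(i): forwarding chain lengths, averaged with a +1 denominator."""
    m_i = len(forward_chain_lengths)
    return sum(forward_chain_lengths) / (m_i + 1)


def ugc_credibility(likes, comments, forward_chain_lengths) -> float:
    """U_ucg(i) = IU(i) + CU(i), as stated in the text."""
    return (influence_breadth(likes, comments)
            + propagation_breadth(forward_chain_lengths))


# Hypothetical user with three posts.
u_ucg = ugc_credibility([10, 4, 0], [2, 1, 1], [3, 0, 1])
print(u_ucg)  # (14 + 4) / 4 + 4 / 4 = 5.5
```

With this reading a user with no posts gets a credibility of 0 rather than a division error, which is the role the text assigns to the added 1.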
S104: Determine the training samples:
Use the vector $(U_P(i), U_{ucg}(i))$ formed by the user profile information credibility $U_P(i)$ and the user-generated content information credibility $U_{ucg}(i)$ of each user as the input of a training sample, and use the user's label flag(i) as the label of the training sample.
S105: Train the soft interval support vector machine:
Adopt the soft interval support vector machine as the social network user credibility evaluation model, and train it with the training samples obtained in step S104.
The input of the social network user credibility evaluation model is the two-dimensional data $(U_P(i), U_{ucg}(i))$. With the linear discriminant function $f(x) = W^T x + b$ in this two-dimensional space, the samples can be separated by the hyperplane $W^T x + b = 0$, where $x$ is the input, $W$ is the weight vector, $b$ is the classification threshold, and the superscript $T$ denotes transposition. For the separating line to classify all samples correctly, it must satisfy:
$$y_i (W^T x_i + b) - 1 \ge 0, \quad i = 1, 2, \dots, N, \quad y_i = \pm 1$$
where $W = \{\omega_1, \omega_2, \dots, \omega_d\}$ is the normal vector, which determines the direction of the hyperplane; $d$ is the number of features; and $b$ determines the distance between the hyperplane and the origin. Once $W$ and $b$ are determined, the separating hyperplane is uniquely determined. The distance from the boundary hyperplane on each side to the separating hyperplane is
$$\frac{1}{\lVert W \rVert}$$
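As a brief worked step, this distance follows from the standard point-to-hyperplane distance formula. This is textbook geometry under the canonical scaling $y_i(W^T x_i + b) \ge 1$, not text recovered from the patent. For any point $x_0$ on a boundary hyperplane $W^T x_0 + b = \pm 1$:

```latex
d(x_0) = \frac{\lvert W^{T} x_0 + b \rvert}{\lVert W \rVert}
       = \frac{1}{\lVert W \rVert},
\qquad
\text{so the two boundary hyperplanes are } \frac{2}{\lVert W \rVert} \text{ apart.}
```

Maximizing this separation is therefore equivalent to minimizing $\tfrac{1}{2}\lVert W \rVert^2$, which is the quadratic term that appears in the support vector machine objective.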
Specifically, every sample point in the support vector machine must satisfy the following constraints:
$$\begin{cases} W^T x_i + b \ge +1, & y_i = +1 \\ W^T x_i + b \le -1, & y_i = -1 \end{cases}$$
The soft interval support vector machine allows some samples to violate this constraint. When the data are not linearly separable, some sample points cannot achieve a functional margin (function interval) of at least 1, that is
Figure BDA0003276475690000076
$1 - y_i(W^T x_i + b) > 0$. The remedy is to introduce a slack variable (relaxation variable) $\zeta_i$ for each sample point so that, for the sample points that violate the constraint, the functional margin plus the slack variable is at least 1. The constraint then becomes the following:
Figure BDA0003276475690000073
where $\zeta_{0/1}$ denotes the 0/1 loss function, defined as follows:
$$\zeta_{0/1}(z) = \begin{cases} 1, & z < 0 \\ 0, & \text{otherwise} \end{cases}$$
On the one hand, to optimize the soft interval support vector machine and improve evaluation accuracy, slack variables are introduced into the constraints and a balance coefficient $C$ is added to the objective function. On the other hand, the separating hyperplane should maximize the margin while keeping the number of samples that violate the constraint as small as possible, so the balance coefficient $C$ is used to trade off
Figure BDA0003276475690000075
and $y_i(W^T x_i + b) + \zeta_i \ge 1$. The optimization problem of the soft interval support vector machine can then be written as:
$$\min_{W, b, \zeta}\ \frac{1}{2}\lVert W \rVert^2 + C \sum_{i=1}^{N} \zeta_i$$
$$\text{s.t.}\quad y_i(W^T x_i + b) \ge 1 - \zeta_i,\quad \zeta_i \ge 0,\quad i = 1, 2, \dots, N$$
where $C > 0$ is the balance coefficient: a larger $C$ penalizes misclassification more heavily, and a smaller $C$ penalizes it less. The candidate values of the balance coefficient are $C = 10^k$, $k = -3, -2, -1, 0, 1, 2, 3$.
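The training step can be sketched without any library as full-batch subgradient descent on the equivalent hinge form of the objective, ½‖W‖² + C·Σ max(0, 1 − yᵢ(Wᵀxᵢ + b)), where the optimal slack is ζᵢ = max(0, 1 − yᵢ(Wᵀxᵢ + b)). The patent does not say which solver is used, so the solver, the step size rule, the epoch budget, the mapping of flag 0/1 to y = −1/+1, and the eight sample vectors are all assumptions made for illustration.

```python
# Sketch of S104-S105: subgradient descent on the soft-margin objective
#   1/2 * ||W||^2 + C * sum_i max(0, 1 - y_i * (W . x_i + b)).
# Solver, step size, epoch budget and data are illustrative assumptions.

def train_soft_margin_svm(samples, flags, c=1.0, base_lr=0.005, epochs=4000):
    """samples: list of (U_P, U_ucg); flags: 1 = credible, 0 = not credible."""
    ys = [1.0 if flag == 1 else -1.0 for flag in flags]  # flag(i) -> y_i = +/-1
    lr = base_lr / max(1.0, c)  # keeps the per-sample update bounded for large C
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [w[0], w[1]]  # gradient of the 1/2 * ||W||^2 term
        grad_b = 0.0
        for (x1, x2), y in zip(samples, ys):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:  # slack zeta_i > 0
                grad_w[0] -= c * y * x1
                grad_w[1] -= c * y * x2
                grad_b -= c * y
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(model, sample):
    """Return 1 (credible) if W . x + b >= 0, otherwise 0."""
    (w, b), (x1, x2) = model, sample
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0.0 else 0


# Hypothetical training vectors (U_P(i), U_ucg(i)) with labels flag(i).
X = [(2.75, 5.5), (3.0, 4.0), (2.5, 6.0), (3.2, 5.0),
     (1.0, 0.5), (0.8, 1.0), (1.5, 0.2), (1.2, 1.5)]
flags = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_soft_margin_svm(X, flags, c=1.0)
train_predictions = [predict(model, x) for x in X]

# Candidate balance coefficients C = 10^k, k = -3 .. 3, as listed in the text.
for k in range(-3, 4):
    m = train_soft_margin_svm(X, flags, c=10.0 ** k)
    acc = sum(predict(m, x) == flag for x, flag in zip(X, flags)) / len(X)
    print(f"C = 10^{k}: training accuracy {acc:.2f}")
```

The loop over C uses a fixed iteration budget and reports training accuracy only, so runs with small C may not be fully converged; in practice the coefficient would be chosen on held-out data, and a dedicated quadratic programming or library solver would replace this loop.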
With the soft interval support vector machine, user credibility evaluation moves from the one-dimensional space of a linear sum to a two-dimensional coordinate space, which improves the accuracy of user credibility evaluation.
S106: Evaluate user credibility:
When the credibility of a user in the social network needs to be evaluated, calculate the user's profile information credibility by the method of step S102, calculate the user's generated content information credibility by the method of step S103, form the vector from the two values, and input it into the soft interval support vector machine trained in step S105 to obtain the user credibility evaluation result.
To better illustrate the technical effect of the invention, it is verified experimentally with a specific example. In the experiment, user data are collected from the Sina microblog social network, and accuracy, precision, recall, and the F-measure (F1) are used as the metrics for assessing the credibility evaluation results.
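The four metrics named here can be computed from the true and predicted labels as in the sketch below. Treating flag = 1 (credible) as the positive class is an assumption, since the text does not say which class is positive, and the label lists are made-up examples rather than experimental data.

```python
# Sketch: accuracy, precision, recall and F1 for binary credibility labels.
# Assumption: flag 1 (credible) is treated as the positive class.

def classification_metrics(true_flags, predicted_flags):
    if len(true_flags) != len(predicted_flags):
        raise ValueError("both sequences must have the same length")
    pairs = list(zip(true_flags, predicted_flags))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(truth, predicted)
print(metrics)  # every metric is 0.75 for this example
```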
To compare the effectiveness and rationality of the invention in user credibility evaluation, this embodiment selects three user credibility evaluation methods as comparison methods and compares their results with those of the invention. Comparison method 1 uses the method in A. Narayanan, A. Garg, I. Arora, T. Sureka et al., "IronSense: Towards the Identification of Fake User-Profiles on Twitter Using Machine Learning," 2018 Fourteenth International Conference on Information Processing (ICINPRO), pp. 1-7, Bangalore, India, 2018, which applies machine learning algorithms to quantify user profile information for user credibility evaluation. Comparison method 2 uses the method in H. Slimi, I. Bounhas and Y. Slimani, "URL-Based Tweet Credibility Evaluation," 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), pp. 1-6, Abu Dhabi, United Arab Emirates, Nov. 2019, which characterizes user credibility by quantifying user-generated content information. Comparison method 3 uses the method in "Identification of Influential Online Social Network Users Based on Multi-features," International Journal of Pattern Recognition and Artificial Intelligence, vol. 30, no. 6, pp. 1659015.1-1659015.15, 2016, which considers multiple types of user information together and quantifies and processes them with the PageRank algorithm to evaluate user credibility.
FIG. 2 is a graph comparing the accuracy of the credibility evaluation results of the present invention and three comparison methods. FIG. 3 is a graph comparing the precision of the credibility evaluation results of the present invention and three comparison methods. FIG. 4 is a graph comparing the recall of the credibility evaluation results of the present invention and three comparison methods. FIG. 5 is a graph comparing the F1 scores of the credibility evaluation results of the present invention and three comparison methods. As can be seen from FIGS. 2, 3, 4 and 5, the method of the invention evaluates user credibility in a two-dimensional plane, which avoids the aliasing of credible users and malicious users near the classification threshold that linear summation easily causes. The number of users and the evaluation metrics are negatively correlated: as the number of users grows, the number of users inside the margin hyperplanes grows, so the slack variables increase, the tolerance to noisy data decreases, the error of the evaluation results increases, and the metrics trend downward. The accuracy of the three comparison methods and of the invention declines at rates of 0.13, 0.14, 0.08, and 0.07, respectively; the method of the invention has the lowest rate of decline and better robustness.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these embodiments. Changes that are apparent to those skilled in the art fall within the invention as long as they remain within the spirit and scope defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (4)

1. A social network user credibility assessment method based on a soft interval support vector machine, characterized by comprising the following steps:
S1: crawling the configuration file information and the generated content information of N users from a social network, wherein the configuration file information of a user comprises the user nickname, the user education level, the user profile and the mutual-follower count, and the generated content information of a user comprises the numbers of likes, reposts and comments of the user's blog posts; then labeling each user, wherein a label flag(i) = 1 indicates that user i is credible and a label flag(i) = 0 indicates that user i is not credible, i = 1, 2, …, N;
S2: extracting characteristic attribute data from the configuration file information of each user, and calculating the user configuration file information credibility U_P(i);
S3: extracting characteristic attribute data from the generated content information of each user, and calculating the user generated content information credibility U_ucg(i);
S4: for each user, constructing a vector (U_P(i), U_ucg(i)) from the user configuration file information credibility U_P(i) and the user generated content information credibility U_ucg(i) as the input of a training sample, and taking the label flag(i) of the user as the label of the training sample;
S5: adopting a soft interval support vector machine as the social network user credibility evaluation model, and training the soft interval support vector machine with the training samples obtained in step S4;
S6: when the credibility of a user in the social network needs to be evaluated, calculating the user configuration file information credibility of the user by the same method as in step S2 and the user generated content information credibility of the user by the same method as in step S3, forming the vector, and inputting it into the soft interval support vector machine trained in step S5 to obtain the user credibility evaluation result.
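Steps S4 to S6 can be illustrated by the following minimal sketch. It is not the patented implementation: it assumes a linear soft interval (soft-margin) support vector machine trained by plain batch subgradient descent on the objective 0.5·||w||² + C·Σ max(0, 1 − y(w·x + b)), with the labels flag(i) ∈ {0, 1} mapped to y ∈ {−1, +1}. The function names, the hyperparameters `C`, `lr` and `epochs`, and the toy samples are all hypothetical.

```python
def train_soft_margin_svm(samples, flags, C=1.0, lr=0.01, epochs=2000):
    """Train a linear soft-margin SVM on 2-D inputs (U_P(i), U_ucg(i)).

    Minimises 0.5*||w||^2 + C * sum(hinge losses) by batch subgradient descent.
    """
    ys = [1.0 if f == 1 else -1.0 for f in flags]  # flag 1 -> +1, flag 0 -> -1
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [w[0], w[1]]  # gradient of the regulariser 0.5*||w||^2
        grad_b = 0.0
        for (x1, x2), y in zip(samples, ys):
            # A sample contributes only if it lies inside the margin or is misclassified.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:
                grad_w[0] -= C * y * x1
                grad_w[1] -= C * y * x2
                grad_b -= C * y
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (credible) or 0 (not credible) for a vector x = (U_P, U_ucg)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0


# Hypothetical training vectors (U_P(i), U_ucg(i)) and labels flag(i).
toy_samples = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.1, 0.3)]
toy_flags = [1, 1, 0, 0]
toy_w, toy_b = train_soft_margin_svm(toy_samples, toy_flags)
toy_result = predict(toy_w, toy_b, (0.85, 0.75))
```

The parameter `C` controls how strongly margin violations are penalised; a smaller value tolerates more samples inside the margin. In practice a library solver would normally replace the hand-written training loop.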
2. The method for assessing social network user credibility as claimed in claim 1, wherein the user profile information credibility U_P(i) in step S2 is calculated as follows:
calculating the user profile information integrity UI (i) by adopting the following formula:
Figure FDA0003276475680000011
wherein A (i) represents the number of personal information tags actually disclosed by the user i, and n represents the total number of the personal information tags of the user in the social network;
calculating the user profile information influence index G (i) by adopting the following formula:
G(i)=λ1F(i)+λ2E(i)+λ3P(i)+λ4H(i)
wherein F(i) denotes the nickname category index of user i, F(i) = 1, 2, …, K_F, and K_F denotes the number of user nickname categories; E(i) denotes the education level of user i, E(i) = 1, 2, …, K_E, and K_E denotes the number of user education levels; P(i) denotes the profile state of user i, where P(i) = 0 indicates that user i has no profile and P(i) = 1 indicates that user i has a profile; H(i) denotes the mutual-follower count level of user i, H(i) = 1, 2, …, K_H, and K_H denotes the number of mutual-follower count levels; λ1, λ2, λ3, λ4 denote the preset weights of the nickname category F(i), the education level E(i), the profile state P(i) and the mutual-follower count level H(i), respectively;
then, the user profile information credibility U_P(i) of each user is calculated by the following formula:
U_P(i) = UI(i) + G(i).
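The calculation in claim 2 can be sketched as follows. The formula for the integrity UI(i) appears only as an image in this text, so the sketch assumes, from the definitions of A(i) and n, that UI(i) is the ratio A(i)/n; that ratio is an assumption, not a quotation of the claim. The formulas for G(i) and U_P(i) follow the text above. All function names, weights and input values are hypothetical.

```python
def profile_integrity(disclosed_tags, total_tags):
    """UI(i), ASSUMED here to be A(i) / n (the original formula is an image)."""
    return disclosed_tags / total_tags


def influence_index(F, E, P, H, weights):
    """G(i) = λ1*F(i) + λ2*E(i) + λ3*P(i) + λ4*H(i), as stated in claim 2."""
    l1, l2, l3, l4 = weights
    return l1 * F + l2 * E + l3 * P + l4 * H


def profile_credibility(disclosed_tags, total_tags, F, E, P, H, weights):
    """U_P(i) = UI(i) + G(i), as stated in claim 2."""
    return profile_integrity(disclosed_tags, total_tags) + influence_index(F, E, P, H, weights)


# Hypothetical user: 6 of 10 information tags disclosed, nickname category 2,
# education level 3, has a profile (P = 1), mutual-follower level 4.
hypothetical_weights = (0.1, 0.2, 0.3, 0.4)
u_p = profile_credibility(6, 10, F=2, E=3, P=1, H=4, weights=hypothetical_weights)
```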
3. The social network user credibility assessment method of claim 1, wherein the weights λ1, λ2, λ3, λ4 are calculated by an information entropy weight distribution method, specifically as follows:
according to the numbers of possible values of the four feature items in the user profile information influence index, a weight distribution judgment matrix A_G of the feature items of the user profile information influence index is constructed:
Figure FDA0003276475680000021
wherein K_P = 2 denotes the number of profile states of a user;
performing eigendecomposition on the judgment matrix A_G to obtain the maximum eigenvalue λmax, normalizing the corresponding eigenvector, and taking the normalized vector as the weight vector (λ1, λ2, λ3, λ4).
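The eigenvector step of claim 3 can be sketched as follows. The judgment matrix A_G itself is given only as an image in this text, so the sketch does not reproduce it; it uses a hypothetical positive 4×4 matrix and power iteration, which is one common way to obtain the eigenvector of the maximum eigenvalue of a positive matrix. The function name, the tolerance and the example matrix are assumptions.

```python
def principal_eigenvector_weights(matrix, iterations=1000, tol=1e-12):
    """Return (approximate maximum eigenvalue, eigenvector normalised to sum 1).

    Uses power iteration; intended for matrices with strictly positive entries.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        total = sum(nxt)
        # Because sum(v) == 1, sum(A v) converges to the maximum eigenvalue.
        eigenvalue = total
        nxt = [x / total for x in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, v)) < tol
        v = nxt
        if converged:
            break
    return eigenvalue, v


# Hypothetical judgment matrix built from ratios of assumed importances;
# it is NOT the matrix A_G of the claim.
assumed = [0.4, 0.3, 0.2, 0.1]
hypothetical_matrix = [[a / b for b in assumed] for a in assumed]
lam_max, weights = principal_eigenvector_weights(hypothetical_matrix)
```

For a perfectly consistent matrix such as the hypothetical one above, the normalised eigenvector reproduces the assumed importances and the maximum eigenvalue equals the matrix dimension.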
4. The method for assessing social network user credibility as claimed in claim 1, wherein the user generated content information credibility U_ucg(i) in step S3 is calculated as follows:
calculating the influence breadth IU(i) of the blog posts published by each user by adopting the following formula:
Figure FDA0003276475680000022
wherein M_i denotes the number of blog posts published by user i, D_{i,m} denotes the number of likes of the m-th blog post published by user i, and C_{i,m} denotes the number of comments on the m-th blog post published by user i, m = 1, 2, …, M_i;
calculating the propagation degree CU(i) of the blog posts published by each user by adopting the following formula:
Figure FDA0003276475680000023
wherein RT_{i,m} denotes the length of the forwarding chain of the m-th blog post published by user i;
then, the user generated content information credibility U_ucg(i) of each user is calculated by the following formula:
U_ucg(i) = IU(i) + CU(i).
CN202111119250.3A 2021-09-24 2021-09-24 Social network user credibility assessment method based on soft interval support vector machine Active CN113821706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111119250.3A CN113821706B (en) 2021-09-24 2021-09-24 Social network user credibility assessment method based on soft interval support vector machine

Publications (2)

Publication Number Publication Date
CN113821706A true CN113821706A (en) 2021-12-21
CN113821706B CN113821706B (en) 2024-03-19

Family

ID=78921115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111119250.3A Active CN113821706B (en) 2021-09-24 2021-09-24 Social network user credibility assessment method based on soft interval support vector machine

Country Status (1)

Country Link
CN (1) CN113821706B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919794A (en) * 2019-03-14 2019-06-21 哈尔滨工程大学 A kind of microblog users method for evaluating trust based on belief propagation
WO2019183191A1 (en) * 2018-03-22 2019-09-26 Michael Bronstein Method of news evaluation in social media networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
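刘银萍; 李光强; 余容; 尹健: "Construction of an AHP-based credibility evaluation model for social network information", 情报探索 (Information Research), no. 09, 15 September 2018 (2018-09-15) *
朱东阳; 沈静逸; 黄炜平; 梁军: "Industrial fault identification based on active learning and weighted support vector machine", 浙江大学学报(工学版) (Journal of Zhejiang University (Engineering Science)), no. 04, 15 April 2017 (2017-04-15) *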

Also Published As

Publication number Publication date
CN113821706B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant