CN113806616B - Microblog user identification method, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113806616B
Authority
CN
China
Prior art keywords
user
posting
preset
microblog
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110938155.XA
Other languages
Chinese (zh)
Other versions
CN113806616A (en
Inventor
李际朝
李青龙
李轩
张旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Starlight Information Technology Co ltd
Original Assignee
Beijing Smart Starlight Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Starlight Information Technology Co ltd
Priority to CN202110938155.XA
Publication of CN113806616A
Application granted
Publication of CN113806616B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a microblog user identification method, system, electronic device and storage medium. The method comprises: determining a basic behavior initial score for each piece of microblog data within a first preset time according to its posting content, posting time, posted pictures, the poster's user name, the list of users the poster follows, the poster's follower list, the proportion of blank reposts (reposts with no added text) among a preset number of the poster's microblogs, the number of comments received, and the user authentication level; generating an initial user relationship graph from the microblog users and the basic behavior initial scores to obtain a total behavior score; if the total behavior score is greater than a first preset score, the user is a natural registered account user; if it is smaller than a second preset score, which is itself smaller than the first preset score, the user is an unnatural registered account user; and if it is neither greater than the first preset score nor smaller than the second preset score, the user is a user of undetermined status. Diversifying the factors that characterize a user account improves the accuracy of user identification.

Description

Microblog user identification method, system, electronic equipment and storage medium
Technical Field
The invention relates to the field of identification of abnormal accounts in social networks, in particular to a microblog user identification method, a microblog user identification system, electronic equipment and a storage medium.
Background
In recent years, microblogs have become a major public social network platform in China and play an important role in public and social events. However, more and more organized, purpose-driven spam accounts have also appeared on microblogs. Operating in batches and by script, they flood the public discussion space and manufacture public-opinion effects that mislead the public. Distinguishing these "water army" (paid poster) accounts from real user accounts is therefore important for keeping the initiative in public opinion and cleaning up the network environment.
Current methods for detecting abnormal accounts mostly rely either on collecting the abnormal behavior of such accounts during hot events or on deep learning with neural networks. Judging water army accounts from hot events is highly subjective, easily misclassifies ordinary users, and has low accuracy; deep learning algorithms are complex and hard to interpret, and they frequently produce misjudgments of users that cannot be explained.
Disclosure of Invention
In view of this, embodiments of the present invention provide a microblog user identification method, system, electronic device and storage medium, so as to overcome the inaccurate user identification of the prior art.
To this end, embodiments of the invention provide the following technical solutions:
according to a first aspect, an embodiment of the present invention provides a microblog user identification method, including: for each piece of microblog data within a first preset time, acquiring the posting content, the posting time, the posted pictures, the poster's user name, the list of users the poster follows, the poster's follower list, the proportion of blank reposts among a preset number of the poster's microblogs, the number of comments the poster received within a second preset time, and the user authentication level;
judging, for each piece of microblog data, whether the user authentication level of the corresponding microblog user indicates an authenticated user;
if the microblog user corresponding to the microblog data is an authenticated user, setting the basic behavior initial score of that authenticated user to a preset authentication score;
if the microblog user corresponding to the microblog data is a non-authenticated user, calculating the basic behavior initial score of that non-authenticated user according to the posting content, the posting time, the posted pictures, the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs, and preset weights;
connecting the poster's user name, the list of users the poster follows and the basic behavior initial score corresponding to each piece of microblog data into an initial user relationship graph, wherein the user relationship graph represents the relationships between a microblog user and the followed users and followers associated with that microblog user;
setting the total behavior score of each authenticated user in the initial user relationship graph to the preset authentication score;
obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial scores, the list of users the poster follows and the poster's follower list;
judging, for each microblog user, whether the corresponding total behavior score is greater than a first preset score;
if the total behavior score corresponding to the microblog user is greater than the first preset score, the microblog user is a natural registered account user;
if the total behavior score corresponding to the microblog user is smaller than or equal to the first preset score, judging whether it is smaller than a second preset score, wherein the second preset score is smaller than the first preset score;
if the total behavior score corresponding to the microblog user is smaller than the second preset score, the microblog user is an unnatural registered account user;
if the total behavior score corresponding to the microblog user is greater than or equal to the second preset score, the microblog user is a user of undetermined status.
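The three-way decision at the end of the method can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `AccountStatus` and `classify_user` and the threshold values in the usage lines are hypothetical; only the comparison logic comes from the steps above.

```python
from enum import Enum


class AccountStatus(Enum):
    NATURAL = "natural registered account user"
    UNNATURAL = "unnatural registered account user"
    UNDETERMINED = "user of undetermined status"


def classify_user(total_score: float, first_preset: float, second_preset: float) -> AccountStatus:
    """Classify a user from the total behavior score and two preset scores."""
    # The method requires the second preset score to be smaller than the first.
    if second_preset >= first_preset:
        raise ValueError("second preset score must be smaller than the first preset score")
    if total_score > first_preset:
        return AccountStatus.NATURAL
    if total_score < second_preset:
        return AccountStatus.UNNATURAL
    # second_preset <= total_score <= first_preset
    return AccountStatus.UNDETERMINED


# Hypothetical thresholds, chosen only for illustration.
print(classify_user(0.9, first_preset=0.7, second_preset=0.3).value)
print(classify_user(0.5, first_preset=0.7, second_preset=0.3).value)
```

Note that both boundary values fall into the undetermined band, because the first comparison is strict "greater than" and the second is strict "smaller than".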
Optionally, the step of calculating, if the microblog user corresponding to the microblog data is a non-authenticated user, the basic behavior initial score of that non-authenticated user according to the posting content, the posting time, the posted pictures, the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs and the preset weights includes:
obtaining, according to the posting time and the posting content, the number of repeated microblogs of the non-authenticated user within a third preset time, that is, the number of microblogs whose content is the same as the posting content;
dividing the number of repeated microblogs by a first preset number to obtain the posting-content repeatability;
obtaining, according to the posting time and the posted pictures, the number of repeated pictures of the non-authenticated user within the third preset time, that is, the number of pictures that are the same as the posted picture;
dividing the number of repeated pictures by a second preset number to obtain the posted-picture repeatability;
and obtaining the basic behavior initial score of the non-authenticated user according to the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs, the preset weights, the posting-content repeatability and the posted-picture repeatability.
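The repeatability calculation can be sketched as below. Several details are assumptions made only for illustration: "the same content" is treated as exact string equality, the third preset time is treated as a window around the post's time, the post itself is excluded from the count, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def count_repeated_posts(
    other_posts: List[Tuple[datetime, str]],
    target_text: str,
    target_time: datetime,
    third_preset_time: timedelta,
) -> int:
    """Count the user's other posts in the window whose text equals the target text."""
    return sum(
        1
        for posted_at, text in other_posts
        if abs(posted_at - target_time) <= third_preset_time and text == target_text
    )


def repeatability(repeat_count: int, preset_number: int) -> float:
    """Repeatability = number of repeats divided by a preset number."""
    if preset_number <= 0:
        raise ValueError("preset number must be positive")
    return repeat_count / preset_number


base = datetime(2021, 8, 1, 12, 0)
others = [
    (base - timedelta(hours=1), "buy now"),   # inside the window, same text
    (base + timedelta(hours=2), "buy now"),   # inside the window, same text
    (base - timedelta(days=3), "buy now"),    # outside the window
    (base - timedelta(hours=3), "hello"),     # different text
]
repeats = count_repeated_posts(others, "buy now", base, timedelta(days=1))
print(repeats, repeatability(repeats, preset_number=10))
```

The same division applies to pictures, with the second preset number in place of the first.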
Optionally, the step of obtaining, according to the posting time and the posted pictures, the number of repeated pictures of the non-authenticated user within the third preset time includes:
determining the picture size of the posted picture;
judging whether the picture size of the posted picture is larger than a preset size;
if the picture size of the posted picture is smaller than or equal to the preset size, performing no picture repetition comparison for the posted picture, the number of repeated pictures that are the same as the posted picture within the third preset time being 0;
if the picture size of the posted picture is larger than the preset size, removing the preset size from the bottom of the posted picture and from the bottom of each microblog picture within the third preset time, comparing the cropped microblog pictures with the cropped posted picture for repetition, and determining the number of repeated pictures of the non-authenticated user that are the same as the posted picture within the third preset time.
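A minimal sketch of the crop-then-compare step follows. It assumes, purely for illustration, that a picture is a list of pixel rows, that "picture size" means the height in rows, that the preset size is a number of rows trimmed from the bottom (for example a watermark strip), and that two cropped pictures are "the same" when they are exactly equal. The patent text here does not specify the comparison method, and the function names are hypothetical.

```python
from typing import List

Picture = List[List[int]]  # rows of pixel values; a stand-in for real image data


def crop_bottom(picture: Picture, preset_size: int) -> Picture:
    """Drop the last `preset_size` rows of the picture."""
    return picture[: len(picture) - preset_size]


def count_repeated_pictures(target: Picture, others: List[Picture], preset_size: int) -> int:
    """Count pictures that equal the target once the bottom strip is removed."""
    # Pictures no larger than the preset size are not compared at all.
    if len(target) <= preset_size:
        return 0
    cropped_target = crop_bottom(target, preset_size)
    return sum(
        1
        for other in others
        # Assumption: candidates that are too small to crop are skipped.
        if len(other) > preset_size and crop_bottom(other, preset_size) == cropped_target
    )


target = [[1, 2], [3, 4], [9, 9]]
others = [
    [[1, 2], [3, 4], [7, 7]],  # differs only in the bottom row
    [[1, 2], [0, 0], [9, 9]],  # differs above the bottom row
    [[1, 2]],                  # too small to compare
]
print(count_repeated_pictures(target, others, preset_size=1))
```

Cropping before comparison means that pictures differing only in the bottom strip still count as repeats.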
Optionally, the basic behavior initial score of the non-authenticated user is calculated as follows:
basic behavior score = (number of comments received within the second preset time × comment weight) - (proportion of blank reposts among the poster's preset number of microblogs × forwarding weight) - posted-picture repeatability - posting-content repeatability.
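As a worked example, the formula can be evaluated in a few lines. This sketch assumes the formula reads as comments times comment weight, minus blank-repost proportion times forwarding weight, minus the two repeatability terms; the function name and all numeric values are hypothetical.

```python
def basic_behavior_score(
    comments_received: int,
    blank_repost_ratio: float,
    comment_weight: float,
    forwarding_weight: float,
    picture_repeatability: float,
    content_repeatability: float,
) -> float:
    """Score for a non-authenticated user under the assumed reading of the formula."""
    return (
        comments_received * comment_weight
        - blank_repost_ratio * forwarding_weight
        - picture_repeatability
        - content_repeatability
    )


# 12 * 0.05 - 0.5 * 0.4 - 0.1 - 0.2, which is approximately 0.1
score = basic_behavior_score(
    comments_received=12,
    blank_repost_ratio=0.5,
    comment_weight=0.05,
    forwarding_weight=0.4,
    picture_repeatability=0.1,
    content_repeatability=0.2,
)
print(round(score, 6))
```

Under this reading, receiving comments raises the score, while blank reposts and repeated content or pictures lower it.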
Optionally, after the step of calculating, if the microblog user corresponding to the microblog data is a non-authenticated user, the basic behavior initial score of that non-authenticated user according to the posting content, the posting time, the posted pictures, the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs and the preset weights, the method further includes:
judging whether the poster's user name of the non-authenticated user is an all-digit user name;
if the poster's user name is an all-digit user name, subtracting a first preset user-name score constant from the basic behavior initial score of the non-authenticated user;
if the poster's user name is not an all-digit user name, judging whether it is an all-Chinese-character user name with a preset number of characters;
if the poster's user name is an all-Chinese-character user name with the preset number of characters, subtracting a second preset user-name score constant from the basic behavior initial score of the non-authenticated user, wherein the second preset user-name score constant is smaller than the first preset user-name score constant;
if the poster's user name is not an all-Chinese-character user name with the preset number of characters, keeping the basic behavior initial score of the non-authenticated user unchanged.
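The user-name adjustment can be sketched as below. Two points are assumptions for illustration: "Chinese character" is matched with the basic CJK Unified Ideographs range U+4E00 to U+9FFF, and "preset number of characters" is read as an exact length. The function name and the constants in the usage lines are hypothetical.

```python
import re

ALL_DIGITS = re.compile(r"[0-9]+")
ALL_CHINESE = re.compile(r"[\u4e00-\u9fff]+")  # assumption: basic CJK block only


def apply_username_penalty(
    score: float,
    username: str,
    first_constant: float,
    second_constant: float,
    preset_char_count: int,
) -> float:
    """Lower the basic behavior initial score for suspicious user-name patterns."""
    # The method requires the second constant to be smaller than the first.
    if second_constant >= first_constant:
        raise ValueError("second constant must be smaller than the first constant")
    if ALL_DIGITS.fullmatch(username):
        return score - first_constant
    if ALL_CHINESE.fullmatch(username) and len(username) == preset_char_count:
        return score - second_constant
    return score


print(apply_username_penalty(1.0, "12345678", 0.5, 0.2, preset_char_count=4))
print(apply_username_penalty(1.0, "alice01", 0.5, 0.2, preset_char_count=4))
```

An all-digit name is checked first, so it always receives the larger of the two penalties.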
Optionally, the step of obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial scores, the list of users the poster follows and the poster's follower list includes:
step S701: determining, according to the initial user relationship graph, the basic behavior initial score of each non-authenticated user, the followers of that non-authenticated user and their basic behavior initial scores, and the non-authenticated users among the users it follows and their basic behavior initial scores;
step S702: obtaining the follower average behavior score of the non-authenticated user according to its followers and their basic behavior initial scores;
step S703: obtaining the followed-user average behavior score of the non-authenticated user according to the non-authenticated users among the users it follows and their basic behavior initial scores;
step S704: obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to that user's basic behavior initial score, follower average behavior score and followed-user average behavior score.
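Steps S701 to S704 can be sketched as follows. The text above does not give the combining formula, so this sketch simply adds the two averages to the user's own score; that sum, the choice of 0.0 as the average of an empty list, and all names are assumptions made for illustration.

```python
from typing import Dict, List, Set


def mean(values: List[float]) -> float:
    # Assumption: a user with no neighbours contributes an average of 0.0.
    return sum(values) / len(values) if values else 0.0


def total_behavior_scores(
    base_scores: Dict[str, float],
    followers: Dict[str, List[str]],
    following: Dict[str, List[str]],
    authenticated: Set[str],
    preset_auth_score: float,
) -> Dict[str, float]:
    """Total behavior score per user from the relationship graph (S701-S704)."""
    totals: Dict[str, float] = {}
    for user, base in base_scores.items():
        if user in authenticated:
            totals[user] = preset_auth_score  # fixed for authenticated users
            continue
        # S702: average over the user's followers.
        follower_avg = mean([base_scores[f] for f in followers.get(user, []) if f in base_scores])
        # S703: average over the non-authenticated users this user follows.
        followed_avg = mean(
            [
                base_scores[f]
                for f in following.get(user, [])
                if f in base_scores and f not in authenticated
            ]
        )
        # S704: combine; a plain sum is an assumption of this sketch.
        totals[user] = base + follower_avg + followed_avg
    return totals


base_scores = {"a": 1.0, "b": 0.5, "c": 0.25, "v": 10.0}
totals = total_behavior_scores(
    base_scores,
    followers={"a": ["b", "c"]},
    following={"a": ["v", "b"]},
    authenticated={"v"},
    preset_auth_score=10.0,
)
print(totals["a"])
```

Here user "a" has followers "b" and "c" (average 0.375) and follows one non-authenticated user "b" (average 0.5), so its total is 1.0 + 0.375 + 0.5.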
Optionally, after the step of obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to that user's basic behavior initial score, follower average behavior score and followed-user average behavior score, the method further includes:
step S705: reconnecting the poster's user name, the list of users the poster follows and the total behavior score corresponding to each piece of microblog data into a new user relationship graph, and calculating the total behavior score of each microblog user in the new user relationship graph;
step S706: calculating the score difference of each microblog user from that user's total behavior score in the initial user relationship graph and in the new user relationship graph;
step S707: obtaining an average difference from the score differences of all microblog users and the total number of microblog users;
step S708: judging whether the average difference is smaller than a preset difference threshold;
step S709: if the average difference is smaller than the preset difference threshold, taking the total behavior score as the final total behavior score;
step S710: if the average difference is greater than or equal to the preset difference threshold, taking the new user relationship graph as the initial user relationship graph and the total behavior scores of the microblog users in the new user relationship graph as the basic behavior initial scores, and returning to step S701 until the average difference is smaller than the preset difference threshold.
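The loop in steps S705 to S710 is a fixed-point iteration, which can be sketched as below. The scoring pass is passed in as a function; `toy_step` is a deliberately simple, hypothetical stand-in and not the patent's scoring rule. Using the absolute difference and adding a `max_rounds` guard are also assumptions of this sketch.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def iterate_scores(
    initial: Scores,
    step: Callable[[Scores], Scores],
    threshold: float,
    max_rounds: int = 100,
) -> Scores:
    """Repeat the graph scoring pass until the average change is below threshold."""
    if not initial:
        return {}
    current = step(initial)  # totals in the initial graph (S701-S704)
    for _ in range(max_rounds):
        updated = step(current)  # totals in the new graph (S705)
        diffs = [abs(updated[u] - current[u]) for u in current]  # S706
        average_diff = sum(diffs) / len(diffs)  # S707
        if average_diff < threshold:  # S708, S709
            return updated
        current = updated  # S710: the new graph becomes the initial graph
    raise RuntimeError("scores did not stabilise within max_rounds")


# Hypothetical two-user graph in which each user is the other's only neighbour.
neighbours = {"a": ["b"], "b": ["a"]}


def toy_step(scores: Scores) -> Scores:
    # Stand-in scoring pass: blend own score with the neighbours' average.
    return {
        u: 0.75 * s + 0.25 * sum(scores[n] for n in neighbours[u]) / len(neighbours[u])
        for u, s in scores.items()
    }


final = iterate_scores({"a": 1.0, "b": 0.0}, toy_step, threshold=1e-6)
print(round(final["a"], 4), round(final["b"], 4))
```

With this toy pass the two scores move toward each other each round and settle near 0.5. A pass that only ever adds neighbour scores would not settle, which is why the guard on the number of rounds is included.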
According to a second aspect, an embodiment of the present invention provides a microblog user identification system, including:
the first acquisition module, used for acquiring, for each piece of microblog data within a first preset time, the posting content, the posting time, the posted pictures, the poster's user name, the list of users the poster follows, the poster's follower list, the proportion of blank reposts among a preset number of the poster's microblogs, the number of comments the poster received within a second preset time, and the user authentication level;
the first judging module, used for judging, for each piece of microblog data, whether the user authentication level of the corresponding microblog user indicates an authenticated user;
the first processing module, used for setting the basic behavior initial score of the authenticated user to a preset authentication score if the microblog user corresponding to the microblog data is an authenticated user;
the second processing module, used for calculating, if the microblog user corresponding to the microblog data is a non-authenticated user, the basic behavior initial score of that non-authenticated user according to the posting content, the posting time, the posted pictures, the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs, and preset weights;
the third processing module, used for connecting the poster's user name, the list of users the poster follows and the basic behavior initial score corresponding to each piece of microblog data into an initial user relationship graph, wherein the user relationship graph represents the relationships between a microblog user and the followed users and followers associated with that microblog user;
the fourth processing module, used for setting the total behavior score of each authenticated user in the initial user relationship graph to the preset authentication score;
the fifth processing module, used for obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial scores, the list of users the poster follows and the poster's follower list;
the second judging module, used for judging, for each microblog user, whether the corresponding total behavior score is greater than a first preset score;
the sixth processing module, used for determining that the microblog user is a natural registered account user if the total behavior score corresponding to the microblog user is greater than the first preset score;
the third judging module, used for judging, if the total behavior score corresponding to the microblog user is smaller than or equal to the first preset score, whether it is smaller than a second preset score, wherein the second preset score is smaller than the first preset score;
the seventh processing module, used for determining that the microblog user is an unnatural registered account user if the total behavior score corresponding to the microblog user is smaller than the second preset score;
and the eighth processing module, used for determining that the microblog user is a user of undetermined status if the total behavior score corresponding to the microblog user is greater than or equal to the second preset score.
Optionally, the second processing module includes:
the first processing unit, used for obtaining, according to the posting time and the posting content, the number of repeated microblogs of the non-authenticated user whose content is the same as the posting content within a third preset time;
the second processing unit, used for dividing the number of repeated microblogs by a first preset number to obtain the posting-content repeatability;
the third processing unit, used for obtaining, according to the posting time and the posted pictures, the number of repeated pictures of the non-authenticated user that are the same as the posted picture within the third preset time;
the fourth processing unit, used for dividing the number of repeated pictures by a second preset number to obtain the posted-picture repeatability;
and the fifth processing unit, used for obtaining the basic behavior initial score of the non-authenticated user according to the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs, the preset weights, the posting-content repeatability and the posted-picture repeatability.
Optionally, the third processing unit includes:
the first processing subunit, used for determining the picture size of the posted picture;
the judging subunit, used for judging whether the picture size of the posted picture is larger than a preset size;
the second processing subunit, used for performing no picture repetition comparison for the posted picture if its picture size is smaller than or equal to the preset size, the number of repeated pictures that are the same as the posted picture within the third preset time being 0;
and the third processing subunit, used for, if the picture size of the posted picture is larger than the preset size, removing the preset size from the bottom of the posted picture and from the bottom of each microblog picture within the third preset time, comparing the cropped microblog pictures with the cropped posted picture for repetition, and determining the number of repeated pictures of the non-authenticated user that are the same as the posted picture within the third preset time.
Optionally, the basic behavior initial score of the non-authenticated user is calculated as follows:
basic behavior score = (number of comments received within the second preset time × comment weight) - (proportion of blank reposts among the poster's preset number of microblogs × forwarding weight) - posted-picture repeatability - posting-content repeatability.
Optionally, the system further comprises:
a fourth judging module, used for judging whether the poster's user name of the non-authenticated user is an all-digit user name;
a ninth processing module, used for subtracting a first preset user-name score constant from the basic behavior initial score of the non-authenticated user if the poster's user name is an all-digit user name;
a fifth judging module, used for judging, if the poster's user name is not an all-digit user name, whether it is an all-Chinese-character user name with a preset number of characters;
a tenth processing module, used for subtracting a second preset user-name score constant from the basic behavior initial score of the non-authenticated user if the poster's user name is an all-Chinese-character user name with the preset number of characters, wherein the second preset user-name score constant is smaller than the first preset user-name score constant;
and an eleventh processing module, used for keeping the basic behavior initial score of the non-authenticated user unchanged if the poster's user name is not an all-Chinese-character user name with the preset number of characters.
Optionally, the fifth processing module includes:
the sixth processing unit, used for determining, according to the initial user relationship graph, the basic behavior initial score of each non-authenticated user, the followers of that non-authenticated user and their basic behavior initial scores, and the non-authenticated users among the users it follows and their basic behavior initial scores;
the seventh processing unit, used for obtaining the follower average behavior score of the non-authenticated user according to its followers and their basic behavior initial scores;
the eighth processing unit, used for obtaining the followed-user average behavior score of the non-authenticated user according to the non-authenticated users among the users it follows and their basic behavior initial scores;
and the ninth processing unit, used for obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to that user's basic behavior initial score, follower average behavior score and followed-user average behavior score.
Optionally, the fifth processing module further includes:
a tenth processing unit, used for reconnecting the poster's user name, the list of users the poster follows and the total behavior score corresponding to each piece of microblog data into a new user relationship graph, and calculating the total behavior score of each microblog user in the new user relationship graph;
an eleventh processing unit, used for calculating the score difference of each microblog user from that user's total behavior score in the initial user relationship graph and in the new user relationship graph;
a twelfth processing unit, used for obtaining an average difference from the score differences of all microblog users and the total number of microblog users;
a judging unit, used for judging whether the average difference is smaller than a preset difference threshold;
a thirteenth processing unit, used for taking the total behavior score as the final total behavior score if the average difference is smaller than the preset difference threshold;
and a fourteenth processing unit, used for, if the average difference is greater than or equal to the preset difference threshold, taking the new user relationship graph as the initial user relationship graph and the total behavior scores of the microblog users in the new user relationship graph as the basic behavior initial scores, and returning to the sixth processing unit until the average difference is smaller than the preset difference threshold.
According to a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the microblog user identification method of any implementation of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to perform the microblog user identification method of any implementation of the first aspect.
The technical scheme of the embodiment of the invention has the following advantages:
the embodiment of the invention provides a microblog user identification method, system, electronic device and storage medium, wherein the method comprises: for each piece of microblog data within a first preset time, acquiring the posting content, the posting time, the posted pictures, the poster's user name, the list of users the poster follows, the poster's follower list, the proportion of blank reposts among a preset number of the poster's microblogs, the number of comments the poster received within a second preset time, and the user authentication level; judging, for each piece of microblog data, whether the user authentication level of the corresponding microblog user indicates an authenticated user; if so, setting the basic behavior initial score of that authenticated user to a preset authentication score; if not, calculating the basic behavior initial score of that non-authenticated user according to the posting content, the posting time, the posted pictures, the number of comments the poster received within the second preset time, the proportion of blank reposts among the preset number of the poster's microblogs, and preset weights; connecting the poster's user name, the list of users the poster follows and the basic behavior initial score corresponding to each piece of microblog data into an initial user relationship graph, wherein the user relationship graph represents the relationships between a microblog user and the followed users and followers associated with that microblog user; setting the total behavior score of each authenticated user in the initial user relationship graph to the preset authentication score; obtaining the total behavior score of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial scores, the list of users the poster follows and the poster's follower list; judging, for each microblog user, whether the corresponding total behavior score is greater than a first preset score; if it is greater than the first preset score, the microblog user is a natural registered account user; if it is smaller than or equal to the first preset score, judging whether it is smaller than a second preset score, wherein the second preset score is smaller than the first preset score; if it is smaller than the second preset score, the microblog user is an unnatural registered account user; if it is greater than or equal to the second preset score, the microblog user is a user of undetermined status.
In this method, the posting content, the posting time, the posted pictures, the poster's user name, the list of users the poster follows, the poster's follower list, the proportion of blank reposts among the preset number of the poster's microblogs, the number of comments the poster received within the second preset time, and the user authentication level are first obtained for each piece of microblog data within the first preset time.
Secondly, judging the user authentication level of each microblog user, wherein when the user authentication level is an authenticated user, the initial score of the basic behavior of the authenticated user is a preset authentication score, and the preset authentication score is a larger fixed value, so that the calculation stability of a subsequent relation diagram is ensured; when the user authentication level is a non-authenticated user, calculating the basic behavior initial score of the non-authenticated user corresponding to the microblog data according to the posting content, the posting time, the posting picture, the number of commented in a second preset time of the poster, the blank-to-post microblog proportion of the number of the posting preset by the poster in the microblog and the preset weight of the poster. Then, all microblog users are connected into an initial user relation diagram according to the user names of the senders, the attention user list of the senders and the initial score of the basic behavior; the total score value of the authenticated user behavior in the initial user relation diagram is equal to the preset authentication score value, and the fixed value is kept unchanged; and for the non-authenticated user, obtaining the behavior scores of the fan user and the concerned user according to the initial user relation diagram, and adding the behavior influences of the fan user and the concerned user to the basic behavior initial score of the non-authenticated user to obtain the total behavior score value of the non-authenticated user. And finally, carrying out water arming judgment on the microblog users according to the total behavioral score value of each microblog user, and identifying the microblog water arming account. 
According to the method, the behavior influence of the fan users and concerned users associated with the microblog users and a plurality of characteristics of the microblog users are comprehensively considered, the behavior total score value of each microblog user is obtained, then the water army recognition is carried out according to the size of the behavior total score value, the water army account can be more accurately recognized by diversified water army account influence factors, and the accuracy of the water army recognition is improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a specific example of a microblog user identification method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a specific example of a microblog user identification system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
An embodiment of the invention provides a microblog user identification method which, as shown in FIG. 1, comprises steps S1-S12.
Step S1: acquiring, for each piece of microblog data within a first preset time, the posting content, posting time, posting picture, poster user name, poster attention user list, poster fan user list, blank-forward microblog proportion among the poster's preset number of microblogs, number of comments received by the poster within a second preset time, and user authentication level.
In this embodiment, the first preset time is determined empirically and is not less than 7 days, which guarantees accuracy; specifically, it may be set to 20 days. This is merely illustrative and not limiting; in other embodiments, other values may be set reasonably according to actual needs.
In this embodiment, a web crawler or a commercial interface may be used to obtain the full set of microblog data within the first preset time, that is, all microblog data in the first preset time period. Each piece of microblog data comprises the posting content, posting time, posting picture, poster user name, poster attention user list, poster fan user list, blank-forward microblog proportion among the poster's preset number of microblogs, number of comments received by the poster within the second preset time, and user authentication level.
In this embodiment, the preset number of microblogs is not less than 20; a value lower than 20 reduces the recognition accuracy of the system. Specifically, it may be set to 20. This is merely illustrative and not limiting; in other embodiments, other values may be set reasonably according to actual needs. A blank-forward microblog (blank transfer microblog) is one that simply forwards the original microblog without adding any comment. The blank-forward microblog proportion is calculated as: blank-forward microblog proportion = (number of blank-forward microblogs / total number of forwarded microblogs) × 100%.
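The proportion of blank (comment-free) forwarded microblogs described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (a list of dicts with a `comment` key) and the function name are assumptions, and the result is returned as a fraction between 0 and 1, which equals the percentage form of the formula.

```python
def blank_forward_proportion(forwarded_posts):
    """Fraction of forwarded microblogs to which the user added no comment.

    `forwarded_posts` is a hypothetical list of dicts, one per forwarded
    microblog, whose "comment" key holds the text added when forwarding.
    """
    if not forwarded_posts:
        return 0.0  # assumption: no forwarded microblogs gives a proportion of 0
    blank = sum(1 for post in forwarded_posts if not post["comment"].strip())
    return blank / len(forwarded_posts)


posts = [{"comment": ""}, {"comment": "  "}, {"comment": "nice"}, {"comment": ""}]
print(blank_forward_proportion(posts))  # 0.75
```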
In this embodiment, the second preset time is determined according to an empirical value obtained by a pre-experiment and may specifically be set to 20 days. This is merely illustrative and not limiting; in other embodiments, other values may be set reasonably according to actual needs. The number of comments received by the poster within the second preset time is an absolute count of comments received, not a proportion.
In this embodiment, the user authentication level includes microblog yellow V, orange V, blue V, gold V, and microblog VIP level 1 or above. This is merely illustrative and not limiting.
Step S2: judging, for each piece of microblog data, whether the user authentication level of the corresponding microblog user indicates an authenticated user.
In this embodiment, authenticated users include microblog member users and real-name authenticated users. When the microblog user is an authenticated user, step S3 is executed; when the microblog user is not an authenticated user, step S4 is executed.
Step S3: if the user authentication level of the microblog user corresponding to the microblog data indicates an authenticated user, the basic behavior initial score of that authenticated user is a preset authentication score.
In this embodiment, a water army account is not a real user's account and normally does not go through user-level authentication. Therefore, when the microblog user corresponding to the microblog data is an authenticated user, the user is unlikely to be a water army account, and the basic behavior initial score of the authenticated user is set to a relatively large fixed value, which reduces the fluctuation of user scores during the iterative calculation. In this embodiment, the preset authentication score is greater than or equal to 5 and may specifically be set to 10; this is merely illustrative and not limiting. In other embodiments, the preset authentication score may be 1, thereby reducing the amount of computation.
Step S4: if the user authentication level of the microblog user corresponding to the microblog data indicates a non-authenticated user, calculating the basic behavior initial score of that non-authenticated user from the posting content, the posting time, the posting picture, the number of comments received by the poster within the second preset time, the blank-forward microblog proportion among the poster's preset number of microblogs, and preset weights.
In this embodiment, when the microblog user corresponding to the microblog data is a non-authenticated user, a further judgment must be made according to the user's behavior. The posting content repetition degree can be determined from the posting content and posting time of the microblog data, and the posting picture repetition degree can be determined from the posting picture and posting time. The basic behavior initial score of the non-authenticated user is then obtained from the number of comments received, the commented weight, the blank-forward microblog proportion, the forwarding weight, the posting picture repetition degree and the posting content repetition degree. Considering the user's behavior from several factors together allows water army accounts to be identified better.
In this embodiment, the basic behavior initial score of the non-authenticated user is calculated as: basic behavior initial score = number of comments received within the second preset time × commented weight - blank-forward microblog proportion among the poster's preset number of microblogs × forwarding weight - posting picture repetition degree - posting content repetition degree.
Step S5: connecting the poster user name, poster attention user list and basic behavior initial score corresponding to each piece of microblog data into an initial user relationship graph, wherein the user relationship graph represents the relationships between a microblog user and the followed users and fan users associated with that user.
In this embodiment, the poster user name, poster attention user list and basic behavior initial score corresponding to each piece of microblog data are connected to form an initial user relationship graph. The graph consists of nodes and edges between nodes: each microblog user (that is, each poster user name) is a node, the value on the node is the basic behavior initial score of that user, and the follow and fan relationships between different microblog users are the edges. The user relationship graph is a network of the associations among microblog users. Through it, all users associated with a given microblog user can be clearly determined, and the behavior of those associated users assists in judging whether the microblog user exhibits water army behavior, which improves the accuracy of user identification.
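One possible in-memory representation of such a relationship graph is sketched below. The `UserNode` class, the record keys and the choice to derive fan edges by reversing the attention lists are illustrative assumptions, not details given in the source.

```python
from dataclasses import dataclass, field


@dataclass
class UserNode:
    name: str
    base_score: float             # basic behavior initial score stored on the node
    authenticated: bool = False
    following: set = field(default_factory=set)  # users this user follows
    fans: set = field(default_factory=set)       # users who follow this user


def build_relation_graph(records):
    """Build {user name: UserNode} from hypothetical per-user records.

    Each record is a dict with keys "name", "base_score", "following"
    (the attention user list) and optionally "authenticated".
    """
    graph = {
        r["name"]: UserNode(r["name"], r["base_score"], r.get("authenticated", False))
        for r in records
    }
    for r in records:
        for followed in r["following"]:
            if followed in graph:  # skip users outside the collected data
                graph[r["name"]].following.add(followed)
                graph[followed].fans.add(r["name"])
    return graph


records = [
    {"name": "a", "base_score": 10.0, "authenticated": True, "following": ["b"]},
    {"name": "b", "base_score": 0.5, "following": ["a", "c"]},
    {"name": "c", "base_score": -2.0, "following": ["b"]},
]
graph = build_relation_graph(records)
print(sorted(graph["b"].fans))  # ['a', 'c']
```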
Step S6: setting the total behavior score of each authenticated user in the initial user relationship graph to the preset authentication score.
In this embodiment, since an authenticated user is very unlikely to be a water army account, the behavior of associated users can be ignored after the initial user relationship graph is generated, and the total behavior score of the authenticated user equals the user's basic behavior initial score, which is still the preset authentication score. This reduces the amount of calculation, reduces the fluctuation of user scores, and improves the stability of the relationship graph.
Step S7: obtaining the total behavior score of each non-authenticated user in the initial user relationship graph from the initial user relationship graph, the basic behavior initial score, the poster attention user list and the poster fan user list.
In this embodiment, for a non-authenticated user, water army judgment must combine the microblog user's own behavior with the behavior of associated users. The basic behavior initial scores of the users associated with the non-authenticated user are determined from the initial user relationship graph, where the associated users comprise the microblog users followed by the poster, obtained from the poster attention user list, and the microblog users who follow the poster, obtained from the poster fan user list. The total behavior score of the non-authenticated user is determined from three factors: the basic behavior initial score, the user fan behavior score and the user attention behavior score, which improves the accuracy of water army recognition.
Specifically, the total behavior score of the non-authenticated user is calculated as:
total behavior score = basic behavior initial score + mean behavior score of the user's fans + mean behavior score of the users the user follows.
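The total-score formula above can be illustrated with a short single-pass sketch. It assumes that each "mean behavior score" is the arithmetic mean of the associated users' basic behavior initial scores and that a user with no fans or no followed users contributes a mean of 0; the source does not spell out either point, and all names here are hypothetical.

```python
PRESET_AUTH_SCORE = 10.0  # example value given in the embodiment


def mean_or_zero(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def total_behavior_score(name, base_score, fans, following, authenticated):
    """Total behavior score for one user.

    base_score: {user: basic behavior initial score}
    fans / following: {user: iterable of associated user names}
    authenticated: set of authenticated user names (their score stays fixed)
    """
    if name in authenticated:
        return PRESET_AUTH_SCORE
    fan_mean = mean_or_zero(base_score[u] for u in fans.get(name, ()))
    follow_mean = mean_or_zero(base_score[u] for u in following.get(name, ()))
    return base_score[name] + fan_mean + follow_mean


base_score = {"a": 10.0, "b": 0.5, "c": -2.0}
fans = {"b": ["a", "c"]}
following = {"b": ["a", "c"]}
authenticated = {"a"}
print(total_behavior_score("b", base_score, fans, following, authenticated))  # 8.5
print(total_behavior_score("a", base_score, fans, following, authenticated))  # 10.0
```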
Step S8: judging, for each microblog user, whether the corresponding total behavior score is greater than a first preset score, wherein the first preset score is smaller than the preset authentication score. If the total behavior score corresponding to the microblog user is greater than the first preset score, step S9 is executed; if it is not greater than the first preset score, step S10 is executed.
In this embodiment, the first preset score is determined according to a pre-experimental empirical value, and may specifically be set to 1; this is only schematically described in the present embodiment, and is not limited thereto. The first preset score is set to be smaller than the preset authentication score, and the preset authentication score is a larger score value, so that the authenticated user can be ensured to be necessarily identified as a non-water army, and misjudgment is not caused.
Step S9: if the total score value of the behaviors corresponding to the microblog users is larger than the first preset score value, the microblog users are natural registered account users.
In this embodiment, when the total score value of the behaviors corresponding to the microblog user is greater than the first preset score value, it is considered that the user does not have water army behaviors, so that the microblog user is determined to be a natural registered account user, that is, a non-water army. Specifically, the natural registration account number refers to an account number generated by a natural person actively registering. Typically, each natural person user has only one or a few microblog account numbers.
Step S10: if the total behavior score value corresponding to the microblog user is smaller than or equal to the first preset score value, judging whether the total behavior score value corresponding to the microblog user is smaller than a second preset score value, wherein the second preset score value is smaller than the first preset score value.
In this embodiment, when the total score value of the behaviors corresponding to the microblog users is smaller than or equal to the first preset score value, it is further determined whether the total score value is smaller than the second preset score value. When the total score value of the behaviors is smaller than the second preset score value, executing the step S11; and when the total score value of the behaviors is not smaller than the second preset score value, executing the step S12.
In this embodiment, the second preset score is determined according to an empirical value and is smaller than the first preset score; it may specifically be set to -1. This is merely illustrative and not limiting.
Step S11: if the total score value of the behaviors corresponding to the microblog users is smaller than the second preset score value, the microblog users are unnatural registered account users.
In this embodiment, an unnatural registered account is an account registered in batches by an automated program. Such accounts are enormous in number and are held by a small number of users, who use them for purposeful social network activities through automated programs or dedicated manual teams. Unnatural registered accounts are water army accounts.
In this embodiment, when the total behavior score corresponding to the microblog user is smaller than the second preset score, the score is low and the probability that the user is a water army account is high, so the microblog user is determined to be a water army account.
Step S12: if the total score value of the behaviors corresponding to the microblog users is greater than or equal to the second preset score value, the microblog users are state undetermined users.
In this embodiment, when the total behavior score corresponding to the microblog user lies between the second preset score and the first preset score, the user's water army behavior is not obvious and is difficult to determine, so the microblog user is set as a state-pending user.
The advantage of the pending interval is that users who are difficult to classify are retained for manual identification, which further improves accuracy.
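The three-way decision of steps S8-S12 can be summarized in a small sketch. The threshold values 1 and -1 are the example values from the embodiment; the function name and the returned labels are illustrative assumptions.

```python
FIRST_PRESET_SCORE = 1.0    # example value from the embodiment
SECOND_PRESET_SCORE = -1.0  # example value; must be below the first preset score


def classify_user(total_score,
                  first=FIRST_PRESET_SCORE,
                  second=SECOND_PRESET_SCORE):
    """Map a total behavior score to one of the three user states."""
    if total_score > first:
        return "natural"    # natural registered account (not water army)
    if total_score < second:
        return "unnatural"  # unnatural registered account (water army)
    return "pending"        # second <= score <= first: left for manual review


for score in (8.5, 1.0, 0.0, -1.0, -1.5):
    print(score, classify_user(score))
```

Note that both boundary values fall in the pending interval, because the first comparison is strictly "greater than" and the second strictly "less than".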
In this method, the posting content, posting time, posting picture, poster user name, poster attention user list, poster fan user list, blank-forward microblog proportion among the poster's preset number of microblogs, number of comments received by the poster within the second preset time, and user authentication level are first obtained for each piece of microblog data within the first preset time. Second, the user authentication level of each microblog user is judged. For an authenticated user, the basic behavior initial score is the preset authentication score, a relatively large fixed value, which keeps the subsequent relationship graph calculation stable. For a non-authenticated user, the basic behavior initial score is calculated from the posting content, the posting time, the posting picture, the number of comments received within the second preset time, the blank-forward microblog proportion among the poster's preset number of microblogs, and the preset weights.
Then, all microblog users are connected into an initial user relationship graph according to the poster user names, the poster attention user lists and the basic behavior initial scores. The total behavior score of each authenticated user in the graph equals the preset authentication score and stays fixed. For each non-authenticated user, the behavior scores of the user's fan users and followed users are obtained from the initial user relationship graph, and their behavior influence is added to the user's basic behavior initial score to obtain the user's total behavior score. Finally, water army judgment is performed on each microblog user according to the user's total behavior score, and microblog water army accounts are identified. The method jointly considers the behavior of each microblog user and the behavior influence of the associated fan users and followed users to obtain a total behavior score, and then identifies water army accounts according to the magnitude of that score. Using diversified influence factors allows water army accounts to be identified more accurately and improves the accuracy of user identification.
As an exemplary embodiment, step S4, in which the basic behavior initial score of a non-authenticated user is calculated from the posting content, the posting time, the posting picture, the number of comments received by the poster within the second preset time, the blank-forward microblog proportion among the poster's preset number of microblogs, and the preset weights, includes steps S401 to S405.
S401: obtaining, from the posting time and the posting content, the number of repeated microblogs within a third preset time whose content is the same as the posting content of the non-authenticated user.
In this embodiment, the third preset time is determined according to an empirical value, and may specifically be set to 1 day, which is only schematically described in this embodiment, but not limited thereto; of course, in other embodiments, other values may be set, and may be set reasonably according to actual needs.
Specifically, the microblog contents released by all microblog users within the third preset time are first found according to the release time of each microblog. Each of these contents is then compared with the posting content of the non-authenticated user; whenever the two are the same, the number of repeated microblogs is increased by 1. This yields the number of repeated microblogs within the third preset time whose content is the same as the posting content of the non-authenticated user.
S402: dividing the number of repeated microblogs by the first preset number to obtain the repeatability of the posting content.
In this embodiment, the first preset number is determined according to an empirical value, and may specifically be 100000, which is only schematically described in this embodiment, and is not limited thereto; of course, in other embodiments, other values may be set, and may be set reasonably according to actual needs.
In this embodiment, the posting content repetition degree is calculated as:
posting content repetition degree = number of repeated microblogs within the third preset time whose content is the same as the posting content of the non-authenticated user / first preset number.
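Steps S401-S402 can be sketched as below. The sketch assumes that "within the third preset time" means within one day of the post's own posting time, that the post itself is not counted as a repeat, and that content comparison is exact string equality; the record keys and names are hypothetical.

```python
from datetime import datetime, timedelta

FIRST_PRESET_NUMBER = 100_000          # example value from the embodiment
THIRD_PRESET_TIME = timedelta(days=1)  # example value from the embodiment


def content_repetition(post, all_posts,
                       window=THIRD_PRESET_TIME,
                       preset_number=FIRST_PRESET_NUMBER):
    """Posting content repetition degree for one post.

    `post` and every entry of `all_posts` are dicts with the
    hypothetical keys "user", "time" (datetime) and "content".
    """
    repeated = sum(
        1
        for other in all_posts
        if other is not post                              # assumption: skip the post itself
        and abs(other["time"] - post["time"]) <= window   # inside the time window
        and other["content"] == post["content"]           # identical content
    )
    return repeated / preset_number


now = datetime(2021, 8, 1, 12, 0)
mine = {"user": "u1", "time": now, "content": "buy now"}
posts = [
    mine,
    {"user": "u2", "time": now + timedelta(hours=3), "content": "buy now"},
    {"user": "u3", "time": now - timedelta(hours=20), "content": "buy now"},
    {"user": "u4", "time": now + timedelta(days=2), "content": "buy now"},  # outside window
    {"user": "u5", "time": now, "content": "hello"},                        # different content
]
print(content_repetition(mine, posts) == 2 / FIRST_PRESET_NUMBER)  # True
```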
S403: obtaining, from the posting time and the posting picture, the number of repeated pictures within the third preset time that are the same as the posting picture of the non-authenticated user.
In this embodiment, the microblog pictures released by all microblog users within the third preset time are first found according to the release time of each microblog. Each of these pictures is then compared with the posting picture of the non-authenticated user; whenever the two are the same, the picture is a repeated picture and the number of repeated pictures is increased by 1. This yields the number of repeated pictures within the third preset time that are the same as the posting picture of the non-authenticated user.
S404: dividing the number of repeated pictures by the second preset number to obtain the repeatability of the posted pictures.
In this embodiment, the second preset number is determined according to an empirical value, and may specifically be 100000, which is only schematically described in this embodiment, and is not limited thereto; of course, in other embodiments, other values may be set, and may be set reasonably according to actual needs.
In this embodiment, the posting picture repetition degree is calculated as:
posting picture repetition degree = number of repeated pictures within the third preset time that are the same as the posting picture of the non-authenticated user / second preset number.
S405: obtaining the basic behavior initial score of the non-authenticated user from the number of comments received by the poster within the second preset time, the blank-forward microblog proportion among the poster's preset number of microblogs, the preset weights, the posting content repetition degree and the posting picture repetition degree.
In this embodiment, the preset weights include a commented weight corresponding to the number of comments received and a forwarding weight corresponding to the blank-forward microblog proportion. Both weights are determined according to empirical values in the range 0.01-0.5; specifically, the commented weight is set to 0.4 and the forwarding weight to 0.2. This is merely illustrative and not limiting; in other embodiments, other values may be set.
In this embodiment, the basic behavior initial score is calculated as follows.
Basic behavior initial score = number of comments received within the second preset time × commented weight - blank-forward microblog proportion among the poster's preset number of microblogs × forwarding weight - posting picture repetition degree - posting content repetition degree
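The formula above translates directly into code. In this sketch the default weights are the example values 0.4 and 0.2 from the embodiment, the blank-forward proportion is assumed to be passed as a fraction between 0 and 1, and the function and parameter names are illustrative.

```python
def base_behavior_score(comments_received,
                        blank_forward_ratio,
                        picture_repetition,
                        content_repetition,
                        comment_weight=0.4,
                        forward_weight=0.2):
    """Basic behavior initial score of a non-authenticated user."""
    return (comments_received * comment_weight   # being commented on raises the score
            - blank_forward_ratio * forward_weight  # blank forwarding lowers it
            - picture_repetition                    # repeated pictures lower it
            - content_repetition)                   # repeated content lowers it


# A user with no comments who only forwards without comment: 0 * 0.4 - 1.0 * 0.2
print(base_behavior_score(0, 1.0, 0.0, 0.0))  # -0.2
# A user with 5 comments received and mostly blank forwards still scores well above 1
print(base_behavior_score(5, 0.75, 0.0002, 0.001) > 1.0)  # True
```

With these example weights, comments received dominate the score, while the two repetition degrees (each divided by a preset number such as 100000) act as small penalties.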
The method comprises the steps that the repeated microblog quantity which is the same as the posting content in a third preset time is obtained according to the posting time and the posting content, and the posting content repeatability is calculated according to the repeated microblog quantity; obtaining the number of repeated pictures which are the same as the posting pictures in a third preset time according to the posting time and the posting pictures, and calculating the repetition degree of the posting pictures according to the number of the repeated pictures; and finally, obtaining the basic behavior initial score of the non-authentication user according to the number of commented in the second preset time of the poster, the blank transfer microblog proportion in the microblog of the preset posting number of the poster, the preset weight, the posting content repetition degree and the posting picture repetition degree. Comprehensively considering a plurality of behaviors of the microblog user, comprehensively scoring the behaviors to obtain a basic behavior initial score, enriching the influence of the behaviors of the microblog user, enabling the basic behavior initial score to be more accurate, jointly carrying out water army recognition through diversified influence factors, and improving the accuracy of user recognition.
As an exemplary embodiment, step S403, in which the number of repeated pictures within the third preset time that are the same as the posting picture of the non-authenticated user is obtained from the posting time and the posting picture, includes steps S4031 to S4034.
Step S4031: determining the picture size of the posting picture from the posting picture.
In this embodiment, the picture size of the posting picture is determined from the posting picture itself.
Step S4032: judging whether the picture size of the posting picture is larger than a preset size. If the size of the posted picture is not greater than the preset size, executing step S4033; if the size of the posted picture is greater than the preset size, step S4034 is performed.
In this embodiment, the preset size is obtained by statistical analysis of a large number of pictures, and common expression (emoticon) pictures are excluded by means of the preset size. The specific value is not limited in this embodiment; in other embodiments, other values may be set reasonably according to actual needs.
Step S4033: if the picture size of the posted picture is smaller than or equal to the preset size, the posted picture is not subjected to picture repetition comparison, and the number of repeated pictures which are the same as the posted picture in the third preset time is 0.
In this embodiment, when the size of the posted picture is smaller than or equal to the preset size, it is indicated that the posted picture is smaller, possibly an expression picture, etc., that is, the probability of the expression picture is larger, and the picture repetition degree comparison is not performed on the posted picture, and the number of repeated pictures is 0.
Step S4034: if the picture size of the posting picture is larger than the preset size, removing the preset size from the bottom of the posting picture and from the bottom of each microblog picture released within the third preset time, comparing the cropped microblog pictures with the cropped posting picture, and determining the number of repeated pictures within the third preset time that are the same as the posting picture of the non-authenticated user.
In this embodiment, when the picture size of the posting picture is larger than the preset size, it indicates that the posting picture is not an expression picture, and it is necessary to compare the picture repetition degree. And respectively removing preset sizes from the bottoms of the microblog pictures corresponding to all the released microblogs and the posting pictures of the non-authenticated user within a third preset time to remove watermarks in the pictures. And comparing the microblog pictures with the watermark removed with the posting pictures, if the microblog pictures are the same as the posting pictures, adding 1 to the number of the repeated pictures, and obtaining the number of the repeated pictures of the non-authentication user which are the same as the posting pictures in a third preset time after comparing all the microblog pictures.
Determining whether to perform image repetition calculation according to the image size of the posted image; when the size of the posting picture is smaller, the picture repetition degree comparison is not carried out; when the size of the posting picture is larger than the preset size, respectively removing the preset size from the bottom of the picture to remove the picture watermark, avoiding the influence of the picture watermark on the repeatability, and comparing the picture repeatability of the microblog picture and the posting picture with the removed preset size to obtain the number of the repeated pictures; the complexity of picture processing and comparison is reduced on the basis of ensuring the accuracy of repeated picture calculation.
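Steps S4031-S4034 can be illustrated with a toy sketch in which a picture is a list of pixel rows. Treating the "picture size" as the number of rows, using the same preset size for both the threshold and the cropped strip, and comparing pictures by exact equality are all assumptions made for illustration; a real system would work on decoded image data.

```python
def crop_bottom(picture, strip_rows):
    """Return the picture without its last `strip_rows` rows (watermark area)."""
    return picture[: max(len(picture) - strip_rows, 0)]


def count_repeated_pictures(posting_picture, other_pictures, preset_size):
    """Number of other pictures equal to the posting picture after cropping."""
    if len(posting_picture) <= preset_size:
        return 0  # small picture, likely an emoticon: skip the comparison
    target = crop_bottom(posting_picture, preset_size)
    return sum(
        1 for pic in other_pictures if crop_bottom(pic, preset_size) == target
    )


posting = [[1, 2], [3, 4], [9, 9]]          # last row stands in for a watermark
others = [
    [[1, 2], [3, 4], [7, 7]],               # same image, different watermark
    [[1, 2], [5, 5], [9, 9]],               # different image
]
print(count_repeated_pictures(posting, others, preset_size=1))   # 1
print(count_repeated_pictures([[1, 2]], others, preset_size=1))  # 0
```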
As an exemplary embodiment, after step S4, in which the basic behavior initial score of a non-authenticated user is calculated from the posting content, the posting time, the posting picture, the number of comments received by the poster within the second preset time, the blank-forward microblog proportion among the poster's preset number of microblogs, and the preset weights, the method further includes steps S13-S17.
Step S13: judging whether the poster user name of the non-authenticated user is an all-digital user name. If the poster user name is an all-digital user name, step S14 is executed; if the poster user name is not an all-digital user name, step S15 is executed.
In this embodiment, an all-digital user name is a user name of the form "user" followed only by digits, for example user_123456789.
Statistics over a large number of microblog water army user names show that accounts with such user names are likely to be water army accounts, so the basic behavior initial score of the non-authenticated user corresponding to such a user name needs to be reduced.
Step S14: if the poster user name is an all-digital user name, subtracting a first preset user name score constant from the basic behavior initial score of the non-authenticated user.
In this embodiment, the first preset user name score constant is in the range 0.3-0.5 and may specifically be 0.3. This is only an illustrative description and is not limiting; in other embodiments, other values may be set.
Because all-digital user names are statistically likely to belong to water army accounts, the first preset user name score constant is subtracted from the basic behavior initial score of the non-authenticated user corresponding to the all-digital user name, which lowers that score.
Step S15: if the poster user name is not an all-digital user name, judging whether the poster user name of the non-authenticated user is an all-Chinese-character user name with the preset number of words. If the poster user name is an all-Chinese-character user name with the preset number of words, step S16 is executed; if it is not, step S17 is executed.
In this embodiment, statistics over a large number of microblog water army user names show that a microblog user whose user name is an all-Chinese-character user name with the preset number of words is also relatively likely to be water army. Therefore, when the poster user name is not an all-digital user name, it is further necessary to judge whether it is an all-Chinese-character user name with the preset number of words.
In this embodiment, the preset number of words (that is, the number of Chinese characters in the user name) is obtained from statistics over a large number of microblog water army user names and may specifically be 7. This is only an illustrative description and is not limiting; in other embodiments, other values may be set.
Step S16: if the poster user name is an all-Chinese-character user name with the preset number of words, subtracting a second preset user name score constant from the basic behavior initial score of the non-authenticated user, wherein the second preset user name score constant is smaller than the first preset user name score constant.
In this embodiment, the second preset user name score constant is in the range 0.1-0.3 and may specifically be 0.1. This is only an illustrative description and is not limiting; in other embodiments, other values may be set.
In this embodiment, when the poster user name is an all-Chinese-character user name with the preset number of words, the corresponding microblog user is more likely to be water army, so the basic behavior initial score of the corresponding non-authenticated user needs to be reduced. The second preset user name score constant is therefore subtracted from the basic behavior initial score of that non-authenticated user.
In this embodiment, statistics over a large number of microblog water army accounts show that water army accounts are more likely to use all-digital user names than all-Chinese-character user names, so the second preset user name score constant is smaller than the first preset user name score constant. That is, when an all-digital user name is used, a larger value is subtracted and the basic behavior initial score of the non-authenticated user drops more; when an all-Chinese-character user name is used, a smaller value is subtracted and the basic behavior initial score drops less.
Step S17: if the poster user name is not an all-Chinese-character user name with the preset number of words, keeping the basic behavior initial score of the non-authenticated user unchanged.
In this embodiment, when the poster user name is not an all-Chinese-character user name with the preset number of words, no water army judgment can be made from the user name, so the basic behavior initial score of the non-authenticated user remains unchanged.
In this way, the basic behavior initial score of the non-authenticated user is adjusted according to whether the poster user name is an all-digital user name and whether it is an all-Chinese-character user name with the preset number of words. When the poster user name is of a kind commonly used by microblog water army accounts, the basic behavior initial score of the non-authenticated user is reduced, which increases the possibility that the user is judged to be water army. Adding the user name judgment enriches the features used for water army identification and improves its accuracy.
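Steps S13-S17 can be sketched as follows, for illustration only. The constants 0.3, 0.1 and 7 are the example values given in this embodiment. The regular expression for an all-digital user name, the use of the CJK Unified Ideographs range U+4E00 to U+9FFF to recognise Chinese characters, and the function name `adjust_score_for_user_name` are assumptions made for this sketch.

```python
import re

# Assumption: an all-digital user name is "user", an optional underscore,
# then digits only (the example in the text is user_123456789).
ALL_DIGITAL = re.compile(r"user_?\d+")


def is_all_chinese(name: str, preset_words: int) -> bool:
    """True if name has exactly preset_words characters, all Chinese."""
    return len(name) == preset_words and all(
        "\u4e00" <= ch <= "\u9fff" for ch in name
    )


def adjust_score_for_user_name(
    score: float,
    name: str,
    first_constant: float = 0.3,   # example value from this embodiment
    second_constant: float = 0.1,  # example value from this embodiment
    preset_words: int = 7,         # example value from this embodiment
) -> float:
    """Apply steps S13-S17 to a basic behavior initial score."""
    if ALL_DIGITAL.fullmatch(name):          # S13 -> S14
        return score - first_constant
    if is_all_chinese(name, preset_words):   # S15 -> S16
        return score - second_constant
    return score                             # S17: unchanged
```

Because the all-digital check runs first and returns early, a user name can trigger at most one of the two deductions, which matches the branch order of steps S13 and S15.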
As an exemplary embodiment, the step in step S7 of obtaining the total behavior score value of the non-authenticated users in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial score, the attention user list of the poster and the fan user list of the poster includes steps S701-S704.
Step S701: determining, according to the initial user relationship graph, the basic behavior initial score of each non-authenticated user, the fan users of the non-authenticated user and the basic behavior initial scores of those fan users, and the non-authenticated attention users among the attention users and the basic behavior initial scores of those non-authenticated attention users.
In this embodiment, the attention users of a non-authenticated user may include authenticated attention users and non-authenticated attention users. An authenticated attention user is an attention user that is an authenticated user, and a non-authenticated attention user is an attention user that is a non-authenticated user. The basic behavior initial score of an authenticated attention user does not change and is the preset authentication score. In the initial user relationship graph, the basic behavior initial score of an authenticated user is the preset authentication score; the behavior score of the authenticated user remains unchanged and is not influenced by the users associated with it, so the authenticated user is a node with a fixed behavior score value in the user relationship graph.
Step S702: obtaining the user fan average behavior score corresponding to the non-authenticated user according to the fan users and the basic behavior initial scores corresponding to the fan users.
In this embodiment, the number of fan users is counted, and an average is calculated from the number of fan users and their basic behavior initial scores to obtain the user fan average behavior score corresponding to the non-authenticated user. Specifically, the basic behavior initial scores of all fan users are summed to obtain the total fan behavior score, and this total is divided by the number of fan users to obtain the user fan average behavior score corresponding to the non-authenticated user.
In this embodiment, the fan users of a non-authenticated user may include authenticated fan users and non-authenticated fan users, that is, fan users that are authenticated users and fan users that are non-authenticated users. Generally, an authenticated user does not follow a water army user and does not become its fan. When an authenticated fan user exists among the fan users of the non-authenticated user, the basic behavior initial score of that authenticated user is relatively large, and because the user fan average behavior score is an additive term, the total behavior score value increases, which reduces the probability that the user is judged to be water army and increases the probability that the user is judged to be non-water army. Therefore, all fan users are included in the average when the user fan average behavior score is calculated.
Step S703: obtaining the user attention average behavior score corresponding to the non-authenticated user according to the non-authenticated attention users among the attention users and the basic behavior initial scores corresponding to those non-authenticated attention users.
In this embodiment, the attention users of a non-authenticated user may include authenticated attention users and non-authenticated attention users, that is, attention users that are authenticated users and attention users that are non-authenticated users. The number of non-authenticated attention users among the attention users is counted, and an average is calculated from that number and the basic behavior initial scores of the non-authenticated attention users to obtain the user attention average behavior score corresponding to the non-authenticated user. Specifically, the basic behavior initial scores of all non-authenticated attention users are summed to obtain the total attention behavior score, and this total is divided by the number of non-authenticated attention users to obtain the user attention average behavior score corresponding to the non-authenticated user.
In this embodiment, the presence of authenticated attention users among the attention users of a non-authenticated user does not make it more likely that the user is not water army. For example, a water army user may follow an already authenticated big-V microblog user, and doing so does not reduce the probability that the user is water army. Therefore, among the attention users only the behavior influence of the non-authenticated attention users is considered, and the behavior influence of the authenticated attention users is not considered. Accordingly, only the non-authenticated attention users are averaged when the user attention average behavior score is calculated.
Step S704: obtaining the total behavior score value corresponding to each non-authenticated user in the initial user relationship graph according to the basic behavior initial score, the user fan average behavior score and the user attention average behavior score of each non-authenticated user.
In this embodiment, the calculation formula of the total score value of the behavior corresponding to the non-authenticated user is as follows:
total behavior score value = basic behavior initial score + user fan average behavior score + user attention average behavior score, where the user attention average behavior score is the average over the non-authenticated users in the attention user list.
In this way, the total behavior score value of a non-authenticated user is adjusted according to its fan users and attention users. The non-authenticated user's own behavior, the behavior of its fan users and the behavior of its attention users are all taken into account, which adds features for water army judgment and improves the accuracy of water army identification.
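For illustration, steps S701-S704 can be sketched as a single pass over the user relationship graph. The dictionary-based graph representation, the function names, and the choice to treat an empty fan list or an empty non-authenticated attention list as contributing 0 are assumptions of this sketch; the text does not specify how empty lists are handled.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]
Edges = Dict[str, List[str]]


def _average(values: List[float]) -> float:
    # Assumption: an empty list contributes 0 to the total score.
    return sum(values) / len(values) if values else 0.0


def total_behavior_scores(
    base_scores: Scores,      # basic behavior initial score per user
    authenticated: Set[str],  # users whose score is the fixed preset value
    fans: Edges,              # user -> fan users
    attention: Edges,         # user -> attention (followed) users
) -> Scores:
    """Total = base + fan average + non-authenticated attention average."""
    totals: Scores = {}
    for user, base in base_scores.items():
        if user in authenticated:
            totals[user] = base  # fixed node, not influenced by neighbours
            continue
        # S702: all fan users, authenticated or not, enter the average.
        fan_avg = _average(
            [base_scores[f] for f in fans.get(user, []) if f in base_scores]
        )
        # S703: only non-authenticated attention users enter the average.
        attention_avg = _average(
            [
                base_scores[a]
                for a in attention.get(user, [])
                if a in base_scores and a not in authenticated
            ]
        )
        totals[user] = base + fan_avg + attention_avg  # S704
    return totals
```

The asymmetry between the two averages follows the reasoning above: an authenticated fan raises the score, whereas following an authenticated user says nothing about the follower.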
As an exemplary embodiment, step S7 further includes steps S705-S710 after the step of obtaining the total behavior score value corresponding to each non-authenticated user in the initial user relationship graph according to the basic behavior initial score, the user fan average behavior score and the user attention average behavior score of each non-authenticated user.
Step S705: reconnecting the poster user name of the microblog user, the attention user list of the poster and the total behavior score value corresponding to each piece of microblog data into a new user relationship graph, and calculating the total behavior score value of each microblog user in the new user relationship graph.
In this embodiment, the poster user name of the microblog user, the attention user list of the poster and the total behavior score value corresponding to each piece of microblog data are connected to form a new user relationship graph. The new user relationship graph has the same nodes and connecting lines as the initial user relationship graph; only the score values of the nodes differ. In this embodiment, the user relationship graph is redrawn instead of updating the parameters of the existing graph, which is simpler to implement in code.
The calculation of the total score value of the behavior in the new user relationship graph is similar to the calculation in the initial user relationship graph, and will not be described again.
Step S706: calculating the score difference corresponding to each microblog user according to the total behavior score value of that microblog user in the initial user relationship graph and its total behavior score value in the new user relationship graph.
In this embodiment, for each node (one node corresponds to one microblog user), the total behavior score value of the node in the initial user relationship graph is subtracted from the total behavior score value of the same node in the new user relationship graph, and the absolute value of the difference is used as the score difference of the node.
Step S707: obtaining an average difference according to the score difference corresponding to each microblog user and the total number of microblog users.
In the embodiment, the calculated score difference values of all the nodes are summed to obtain a score total difference value; then, dividing the score total difference by the total number of microblog users to obtain an average difference.
Step S708: judging whether the average difference is smaller than a preset difference threshold. If the average difference is smaller than the preset difference threshold, step S709 is executed; if the average difference is not smaller than the preset difference threshold, step S710 is executed.
In this embodiment, the preset difference threshold is determined by preliminary experiments and may, for example, be set to a value in the range 0.1-0.2. This is only an illustrative description and is not limiting; in other embodiments, other values may be set according to actual needs.
Step S709: if the average difference is smaller than the preset difference threshold, taking the total behavior score value as the final total behavior score value.
In this embodiment, when the average difference is smaller than the preset difference threshold, the error meets the requirement and no further iteration is needed, so the total behavior score value is used as the final total behavior score value.
Step S710: if the average difference is greater than or equal to the preset difference threshold, taking the new user relationship graph as an initial user relationship graph, taking the total score of the behaviors of the microblog users in the new user relationship graph as the initial score of the basic behaviors, and returning to the step S701 until the average difference is smaller than the preset difference threshold.
In this embodiment, when the average difference is not smaller than the preset difference threshold, the error is too large to meet the requirement and iterative calculation is needed to reach convergence. The new user relationship graph is used as the initial user relationship graph, the total behavior score values of the microblog users in the new user relationship graph are used as the basic behavior initial scores, and the process returns to step S701 and repeats until the average difference is smaller than the preset difference threshold.
The poster user name of the microblog user, the attention user list of the poster and the total behavior score value are reconnected into a new user relationship graph, a new total behavior score value is calculated, and the score difference between the total behavior score values of each microblog user in the initial user relationship graph and in the new user relationship graph is then calculated. The average difference over all microblog users in the relationship graph is calculated from the score differences, and the total behavior score values of the non-authenticated users are iteratively recalculated according to the average difference, so that the average difference decreases, the fluctuation of the total behavior score values is reduced, the total score of each microblog user becomes more stable and accurate, and the accuracy of water army identification is improved.
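The iteration of steps S705-S710 can be sketched as follows, as a simplified illustration rather than a definitive implementation. The sketch assumes the dictionary-based graph used for the single pass of steps S701-S704, returns the most recently calculated scores as the final total behavior score values, and adds a hypothetical `max_rounds` cap because the text does not state how many rounds are needed. The default threshold 0.1 is taken from the example range 0.1-0.2.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]
Edges = Dict[str, List[str]]


def _average(values: List[float]) -> float:
    # Assumption: an empty list contributes 0.
    return sum(values) / len(values) if values else 0.0


def one_pass(scores: Scores, authenticated: Set[str],
             fans: Edges, attention: Edges) -> Scores:
    """Steps S701-S704: one recalculation over the whole graph."""
    totals: Scores = {}
    for user, base in scores.items():
        if user in authenticated:
            totals[user] = base  # fixed preset authentication score
            continue
        fan_avg = _average(
            [scores[f] for f in fans.get(user, []) if f in scores]
        )
        attention_avg = _average(
            [scores[a] for a in attention.get(user, [])
             if a in scores and a not in authenticated]
        )
        totals[user] = base + fan_avg + attention_avg
    return totals


def mean_abs_difference(old: Scores, new: Scores) -> float:
    """Steps S706-S707: average absolute score change over all users."""
    return sum(abs(new[u] - old[u]) for u in old) / len(old)


def iterate_scores(base_scores: Scores, authenticated: Set[str],
                   fans: Edges, attention: Edges,
                   threshold: float = 0.1, max_rounds: int = 20) -> Scores:
    """Repeat the pass until the average difference falls below threshold."""
    current = one_pass(base_scores, authenticated, fans, attention)
    for _ in range(max_rounds):  # hypothetical cap, not in the text
        # S705: rebuild the graph with the totals as the new base scores.
        new = one_pass(current, authenticated, fans, attention)
        if mean_abs_difference(current, new) < threshold:  # S708 -> S709
            return new
        current = new  # S710: the new graph becomes the initial graph
    return current
```

The average in `mean_abs_difference` is taken over every user in the graph, including authenticated users whose difference is always 0, following the description of dividing by the total number of microblog users.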
This embodiment also provides a microblog user identification system, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the following embodiments is preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The embodiment also provides a microblog user identification system, as shown in fig. 2, including:
the first acquisition module 1 is used for acquiring, for each piece of microblog data, the posting content, posting time, posting picture, poster user name, attention user list of the poster, fan user list of the poster, blank-repost microblog ratio among the poster's preset number of microblogs, number of comments received by the poster within the second preset time, and user authentication level;
the first judging module 2 is used for respectively judging whether the user authentication level of the microblog user corresponding to each piece of microblog data is an authenticated user;
the first processing module 3 is configured to, if the user authentication level of the microblog user corresponding to the microblog data is an authenticated user, set a basic behavior initial score of the authenticated user corresponding to the microblog data as a preset authentication score;
the second processing module 4 is configured to calculate, if the user authentication level of the microblog user corresponding to the microblog data is a non-authenticated user, the basic behavior initial score of the non-authenticated user corresponding to the microblog data according to the posting content, the posting time, the posting picture, the number of comments received by the poster within the second preset time, the blank-repost microblog ratio among the poster's preset number of microblogs, and the preset weight;
the third processing module 5 is configured to connect the poster user name of the microblog user, the attention user list of the poster and the basic behavior initial score corresponding to each piece of microblog data into an initial user relationship graph, where the user relationship graph is used to represent the relationship between the microblog user and the attention users and fan users associated with the microblog user;
a fourth processing module 6, configured to set a total score value of behaviors of authenticated users in the initial user relationship diagram as a preset authentication score;
the fifth processing module 7 is configured to obtain a total behavior score value of the non-authenticated user in the initial user relationship diagram according to the initial user relationship diagram, the initial basic behavior score, the attention user list of the poster, and the fan user list of the poster;
the second judging module 8 is configured to respectively judge whether the total behavior score value corresponding to each microblog user is greater than a first preset score value;
the sixth processing module 9 is configured to, if the total score value of the behaviors corresponding to the microblog users is greater than the first preset score value, determine that the microblog users are natural registered account users;
the third judging module 10 is configured to judge whether the total score value of the behavior corresponding to the microblog user is smaller than a second preset score if the total score value of the behavior corresponding to the microblog user is smaller than or equal to the first preset score, where the second preset score is smaller than the first preset score;
the seventh processing module 11 is configured to, if the total score value of the behaviors corresponding to the microblog users is smaller than the second preset score value, determine that the microblog users are unnatural registered account users;
and an eighth processing module 12, configured to, if the total score value of the behaviors corresponding to the microblog users is greater than or equal to the second preset score value, determine that the microblog users are state pending users.
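The three-way decision performed by the second and third judging modules and the sixth to eighth processing modules can be sketched as follows. The function name and the returned labels are hypothetical, and the two preset score values are passed as parameters because no concrete values are given in this passage.

```python
def classify_user(total_score: float,
                  first_preset: float,
                  second_preset: float) -> str:
    """Map a total behavior score value to one of three account categories."""
    if second_preset >= first_preset:
        # The text requires the second preset score to be the smaller one.
        raise ValueError("second_preset must be smaller than first_preset")
    if total_score > first_preset:
        return "natural"    # natural registered account user
    if total_score < second_preset:
        return "unnatural"  # unnatural registered account user
    return "pending"        # state pending user
```

Scores exactly equal to either preset value fall into the pending category, which follows from the strict comparisons described for the two judging modules.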
As an exemplary embodiment, the second processing module includes: the first processing unit, used for obtaining, according to the posting time and the posting content, the number of repeated microblogs of the non-authenticated user that are the same as the posting content within a third preset time; the second processing unit, used for dividing the number of repeated microblogs by the first preset number to obtain the posting content repetition degree; the third processing unit, used for obtaining, according to the posting time and the posting picture, the number of repeated pictures of the non-authenticated user that are the same as the posting picture within the third preset time; the fourth processing unit, used for dividing the number of repeated pictures by the second preset number to obtain the posting picture repetition degree; and the fifth processing unit, used for obtaining the basic behavior initial score of the non-authenticated user according to the number of comments received by the poster within the second preset time, the blank-repost microblog ratio among the poster's preset number of microblogs, the preset weight, the posting content repetition degree and the posting picture repetition degree.
As an exemplary embodiment, the third processing unit includes: the first processing subunit is used for determining the picture size of the posting picture according to the posting picture; the judging subunit is used for judging whether the picture size of the posting picture is larger than a preset size; the second processing subunit is configured to, if the size of the picture of the posted picture is smaller than or equal to the preset size, not compare the picture repetition degree of the posted picture, where the number of repeated pictures that are the same as the posted picture in the third preset time is 0; and the third processing subunit is used for respectively removing the preset size from the bottom of the picture when the picture size of the posting picture is larger than the preset size, comparing the picture repetition degree of the microblog picture and the posting picture with the removed preset size, and determining the number of repeated pictures of the non-authentication user, which is the same as the posting picture, in the third preset time.
As an exemplary embodiment, the calculation formula of the basic behavior initial score of the non-authenticated user is as follows:
basic behavior score = number of comments received by the poster within the second preset time × commented weight - blank-repost microblog ratio among the poster's preset number of microblogs × forwarding weight - posting picture repetition degree - posting content repetition degree.
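As a minimal sketch, the formula can be evaluated as follows, reading it as the weighted comment count minus the weighted blank-repost ratio minus the two repetition degrees. The function and parameter names are hypothetical, and the weights and preset numbers are left as parameters because no values are given in this passage.

```python
def repetition_degree(repeated_count: int, preset_number: int) -> float:
    """Repeated items divided by the preset number (content or pictures)."""
    return repeated_count / preset_number


def basic_behavior_initial_score(
    comments_received: int,     # comments within the second preset time
    commented_weight: float,    # preset weight for comments received
    blank_repost_ratio: float,  # share of blank reposts in recent posts
    forwarding_weight: float,   # preset weight for blank reposts
    picture_repetition: float,  # posting picture repetition degree
    content_repetition: float,  # posting content repetition degree
) -> float:
    """Comments raise the score; blank reposts and repetition lower it."""
    return (
        comments_received * commented_weight
        - blank_repost_ratio * forwarding_weight
        - picture_repetition
        - content_repetition
    )
```

For example, 10 comments with a commented weight of 0.5, a blank-repost ratio of 0.5 with a forwarding weight of 2.0, and repetition degrees of 0.25 and 0.5 give 5 - 1 - 0.25 - 0.5 = 3.25.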
As an exemplary embodiment, the system further includes: a fourth judging module, configured to judge whether the poster user name of the non-authenticated user is an all-digital user name; a ninth processing module, configured to subtract a first preset user name score constant from the basic behavior initial score of the non-authenticated user if the poster user name is an all-digital user name; a fifth judging module, configured to judge whether the poster user name of the non-authenticated user is an all-Chinese-character user name with a preset number of words if the poster user name is not an all-digital user name; a tenth processing module, configured to subtract a second preset user name score constant from the basic behavior initial score of the non-authenticated user if the poster user name is an all-Chinese-character user name with the preset number of words, where the second preset user name score constant is smaller than the first preset user name score constant; and an eleventh processing module, configured to keep the basic behavior initial score of the non-authenticated user unchanged if the poster user name is not an all-Chinese-character user name with the preset number of words.
As an exemplary embodiment, the fifth processing module includes: a sixth processing unit, configured to determine, according to the initial user relationship graph, the basic behavior initial score of each non-authenticated user, the fan users of the non-authenticated user and the basic behavior initial scores of those fan users, and the non-authenticated attention users among the attention users and the basic behavior initial scores of those non-authenticated attention users; a seventh processing unit, configured to obtain the user fan average behavior score corresponding to the non-authenticated user according to the fan users and the basic behavior initial scores corresponding to the fan users; an eighth processing unit, configured to obtain the user attention average behavior score corresponding to the non-authenticated user according to the non-authenticated attention users among the attention users and their basic behavior initial scores; and a ninth processing unit, configured to obtain the total behavior score value corresponding to each non-authenticated user in the initial user relationship graph according to the basic behavior initial score, the user fan average behavior score and the user attention average behavior score of each non-authenticated user.
As an exemplary embodiment, the fifth processing module further includes: a tenth processing unit, configured to reconnect the poster user name of the microblog user, the attention user list of the poster and the total behavior score value corresponding to each piece of microblog data into a new user relationship graph, and to calculate the total behavior score value of each microblog user in the new user relationship graph; an eleventh processing unit, configured to calculate the score difference corresponding to each microblog user according to the total behavior score value of that microblog user in the initial user relationship graph and its total behavior score value in the new user relationship graph; a twelfth processing unit, configured to obtain an average difference according to the score difference corresponding to each microblog user and the total number of microblog users; a judging unit, configured to judge whether the average difference is smaller than a preset difference threshold; a thirteenth processing unit, configured to take the total behavior score value as the final total behavior score value if the average difference is smaller than the preset difference threshold; and a fourteenth processing unit, configured to, if the average difference is greater than or equal to the preset difference threshold, take the new user relationship graph as the initial user relationship graph, take the total behavior score values of the microblog users in the new user relationship graph as the basic behavior initial scores, and return to the sixth processing unit until the average difference is smaller than the preset difference threshold.
The microblog user identification system in this embodiment is presented in the form of functional units, where the units refer to ASIC circuits, processors and memories executing one or more software or fixed programs, and/or other devices that may provide the functionality described above.
Further functional descriptions of the above respective modules are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the invention also provides an electronic device, as shown in fig. 3, which includes one or more processors 71 and a memory 72, and in fig. 3, one processor 71 is taken as an example.
The electronic device may further include: an input device 73 and an output device 74.
The processor 71, the memory 72, the input device 73 and the output device 74 may be connected by a bus or in another manner; connection by a bus is taken as an example in fig. 3.
The processor 71 may be a central processing unit (Central Processing Unit, CPU). The processor 71 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 72 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the microblog user identification method in the embodiment of the present application. Processor 71 executes various functional applications of the server and data processing, i.e., implements the microblog user identification method of the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in memory 72.
Memory 72 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of the processing device operated by the server, or the like. In addition, memory 72 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 72 may optionally include memory located remotely from processor 71, and such remote memory may be connected to the network connection device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 73 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 74 may include a display device such as a display screen.
One or more modules are stored in the memory 72 that, when executed by the one or more processors 71, perform the method shown in fig. 1.
It will be appreciated by those skilled in the art that implementing all or part of the above-described embodiments of the method may be accomplished by a computer program instructing the relevant hardware, and the executed program may be stored in a computer readable storage medium, and the program may include the steps of the above-described embodiments of the microblog user identification method when executed. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a Hard Disk (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the kind described above.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (7)

1. A microblog user identification method, characterized by comprising the following steps:
acquiring, for each piece of microblog data within a first preset time, the posting content, the posting time, the posting pictures, the user name of the poster, the attention user list of the poster, the fan user list of the poster, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, the number of comments received, and the user authentication level;
judging, for each piece of microblog data, whether the user authentication level of the corresponding microblog user is that of an authenticated user;
if the user authentication level of the microblog user corresponding to the microblog data is that of an authenticated user, setting the basic behavior initial score of the authenticated user corresponding to the microblog data to a preset authentication score;
if the user authentication level of the microblog user corresponding to the microblog data is that of a non-authenticated user, calculating the basic behavior initial score of the non-authenticated user corresponding to the microblog data according to the posting content, the posting time, the posting pictures, the number of comments received by the poster within a second preset time, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, and preset weights;
wherein the step of calculating the basic behavior initial score of the non-authenticated user corresponding to the microblog data comprises: obtaining, according to the posting time and the posting content, the number of repeated microblogs of the non-authenticated user whose content is the same as the posting content within a third preset time; dividing the number of repeated microblogs by a first preset number to obtain the posting content repeatability; obtaining, according to the posting time and the posting pictures, the number of repeated pictures of the non-authenticated user that are the same as the posting pictures within the third preset time; dividing the number of repeated pictures by a second preset number to obtain the posting picture repeatability; and obtaining the basic behavior initial score of the non-authenticated user according to the number of comments received by the poster within the second preset time, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, the preset weights, the posting content repeatability and the posting picture repeatability;
the calculation formula of the basic behavior initial score of the non-authenticated user is as follows:
basic behavior initial score = (number of comments received within the second preset time × comment weight) − (proportion of blank-repost microblogs among the poster's preset number of posted microblogs × forwarding weight) − posting picture repeatability − posting content repeatability;
wherein the step of calculating the basic behavior initial score of the non-authenticated user corresponding to the microblog data further comprises: judging whether the poster user name of the non-authenticated user is an all-digit user name; if the poster user name is an all-digit user name, subtracting a first preset user name score constant from the basic behavior initial score of the non-authenticated user; if the poster user name is not an all-digit user name, judging whether the poster user name of the non-authenticated user is an all-Chinese-character user name with a preset number of characters; if the poster user name is an all-Chinese-character user name with the preset number of characters, subtracting a second preset user name score constant from the basic behavior initial score of the non-authenticated user, wherein the second preset user name score constant is smaller than the first preset user name score constant; if the poster user name is not an all-Chinese-character user name with the preset number of characters, keeping the basic behavior initial score of the non-authenticated user unchanged;
connecting the poster user name of the microblog user corresponding to each piece of microblog data, the attention user list of the poster and the basic behavior initial score to form an initial user relationship graph, wherein the user relationship graph represents the relationship between the microblog user and the attention users and fan users associated with the microblog user;
setting the behavior total score value of each authenticated user in the initial user relationship graph to the preset authentication score;
obtaining the behavior total score value of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial score, the attention user list of the poster and the fan user list of the poster;
judging, for each microblog user, whether the corresponding behavior total score value is larger than a first preset score value;
if the behavior total score value corresponding to the microblog user is larger than the first preset score value, the microblog user is a natural registered account user;
if the behavior total score value corresponding to the microblog user is smaller than or equal to the first preset score value, judging whether the behavior total score value corresponding to the microblog user is smaller than a second preset score value, wherein the second preset score value is smaller than the first preset score value;
if the behavior total score value corresponding to the microblog user is smaller than the second preset score value, the microblog user is an unnatural registered account user;
if the behavior total score value corresponding to the microblog user is greater than or equal to the second preset score value, the microblog user is a status-undetermined user.
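For illustration only, the scoring and threshold steps of claim 1 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class `ScoreParams`, every numeric default, the function names, and the use of the Unicode range U+4E00 to U+9FFF to detect Chinese characters are hypothetical choices that do not appear in the claims.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoreParams:
    # Every value here is an illustrative placeholder, not a value from the patent.
    comment_weight: float = 1.0
    forwarding_weight: float = 10.0
    first_preset_number: int = 10       # divisor for the repeated-microblog count
    second_preset_number: int = 10      # divisor for the repeated-picture count
    all_digit_penalty: float = 2.0      # first preset user name score constant
    chinese_name_penalty: float = 1.0   # second constant, smaller than the first
    chinese_name_length: int = 4        # preset number of characters
    first_preset_score: float = 5.0     # threshold for "natural"
    second_preset_score: float = 0.0    # threshold for "unnatural", below the first


def basic_behavior_initial_score(comments_received, blank_repost_ratio,
                                 repeated_posts, repeated_pictures, p):
    """Apply the claim-1 formula for a non-authenticated user."""
    content_repeatability = repeated_posts / p.first_preset_number
    picture_repeatability = repeated_pictures / p.second_preset_number
    return (comments_received * p.comment_weight
            - blank_repost_ratio * p.forwarding_weight
            - picture_repeatability
            - content_repeatability)


def apply_user_name_adjustment(score, user_name, p):
    """Subtract the user name constants for all-digit or all-Chinese names."""
    if re.fullmatch(r"[0-9]+", user_name):
        return score - p.all_digit_penalty
    # Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
    if re.fullmatch(r"[\u4e00-\u9fff]{%d}" % p.chinese_name_length, user_name):
        return score - p.chinese_name_penalty
    return score


def classify(total_score, p):
    """Three-way decision on the behavior total score value."""
    if total_score > p.first_preset_score:
        return "natural"
    if total_score < p.second_preset_score:
        return "unnatural"
    return "undetermined"
```

A score exactly equal to either threshold falls into the undetermined class, which matches the "smaller than or equal to" and "greater than or equal to" wording of the claim.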
2. The microblog user identification method according to claim 1, wherein the step of obtaining, according to the posting time and the posting pictures, the number of repeated pictures of the non-authenticated user that are the same as the posting pictures within the third preset time comprises:
determining the picture size of the posting picture;
judging whether the picture size of the posting picture is larger than a preset size;
if the picture size of the posting picture is smaller than or equal to the preset size, not performing picture repetition comparison on the posting picture, the number of repeated pictures that are the same as the posting picture within the third preset time being 0;
if the picture size of the posting picture is larger than the preset size, removing the preset size from the bottom of the posting picture and of each microblog picture within the third preset time, comparing the picture repetition degree between each microblog picture and the posting picture after the removal, and determining the number of repeated pictures of the non-authenticated user that are the same as the posting picture within the third preset time.
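As a rough illustration of claim 2, the sketch below treats a picture as a list of pixel rows and the "picture size" as its height in rows. Both choices are assumptions, as are the function names, the exact-equality comparison (the claim does not say how picture repetition degree is measured), and the decision to skip window pictures that are not taller than the preset size.

```python
def crop_bottom(rows, preset_height):
    """Return the picture without its bottom `preset_height` pixel rows."""
    return rows[:len(rows) - preset_height]


def count_repeated_pictures(posting_picture, window_pictures, preset_height):
    """Count pictures in the time window that match the posting picture
    once the bottom strip has been removed from both."""
    # Pictures no taller than the preset size are not compared at all.
    if len(posting_picture) <= preset_height:
        return 0
    cropped_post = crop_bottom(posting_picture, preset_height)
    count = 0
    for other in window_pictures:
        if len(other) <= preset_height:
            continue  # assumption: too-small window pictures are skipped
        # Assumption: "same picture" means identical pixels after cropping.
        if crop_bottom(other, preset_height) == cropped_post:
            count += 1
    return count
```

Cropping the bottom strip before comparing means that two copies of one image that differ only in that strip, for example in an overlaid watermark, are still counted as repeats.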
3. The microblog user identification method according to claim 1, wherein the step of obtaining the behavior total score value of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial score, the attention user list of the poster and the fan user list of the poster comprises:
step S701: determining, according to the initial user relationship graph, the basic behavior initial score of each non-authenticated user, the fan users corresponding to the non-authenticated user and the basic behavior initial scores of those fan users, and the non-authenticated attention users among the attention users and the basic behavior initial scores of those non-authenticated attention users;
step S702: obtaining the user fan average behavior score corresponding to the non-authenticated user according to the fan users and the basic behavior initial scores corresponding to the fan users;
step S703: obtaining the user attention average behavior score corresponding to the non-authenticated user according to the non-authenticated attention users among the attention users and the basic behavior initial scores corresponding to the non-authenticated attention users;
step S704: obtaining the behavior total score value corresponding to each non-authenticated user in the initial user relationship graph according to the basic behavior initial score, the user fan average behavior score and the user attention average behavior score of each non-authenticated user.
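Steps S701 to S704 can be pictured with the Python sketch below. The claim does not state how the three quantities are combined in step S704, so the weighted sum and its weights `w_self`, `w_fans` and `w_attention` are purely an assumption, as are the dictionary-based graph representation, the function names, and the choice to use 0.0 as the average when a user has no scored neighbours.

```python
from statistics import mean


def average_score(users, scores):
    """Mean score of the listed users that have a score (0.0 if none)."""
    known = [scores[u] for u in users if u in scores]
    return mean(known) if known else 0.0


def behavior_total_scores(scores, fans, attention, authenticated, auth_score,
                          w_self=0.5, w_fans=0.25, w_attention=0.25):
    """Compute a behavior total score value for every user in the graph.

    scores:        user -> basic behavior initial score
    fans:          user -> list of fan users
    attention:     user -> list of users that this user follows
    authenticated: set of authenticated users (fixed at auth_score)
    """
    totals = {}
    for user, own_score in scores.items():
        if user in authenticated:
            totals[user] = auth_score
            continue
        # S702: average over all fan users of this user.
        fan_avg = average_score(fans.get(user, []), scores)
        # S703: average over non-authenticated attention users only.
        non_auth = [u for u in attention.get(user, []) if u not in authenticated]
        attention_avg = average_score(non_auth, scores)
        # S704: combination rule is an assumption (weighted sum).
        totals[user] = (w_self * own_score
                        + w_fans * fan_avg
                        + w_attention * attention_avg)
    return totals
```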
4. The microblog user identification method according to claim 3, wherein after the step of obtaining the behavior total score value corresponding to each non-authenticated user in the initial user relationship graph according to the basic behavior initial score, the user fan average behavior score and the user attention average behavior score of each non-authenticated user, the method further comprises:
step S705: reconnecting the poster user name of the microblog user corresponding to each piece of microblog data, the attention user list of the poster and the behavior total score value into a new user relationship graph, and calculating the behavior total score value of each microblog user in the new user relationship graph;
step S706: calculating the score difference corresponding to each microblog user according to the behavior total score value of each microblog user in the initial user relationship graph and the behavior total score value of each microblog user in the new user relationship graph;
step S707: obtaining an average difference according to the score difference corresponding to each microblog user and the total number of microblog users;
step S708: judging whether the average difference is smaller than a preset difference threshold;
step S709: if the average difference is smaller than the preset difference threshold, taking the behavior total score value as the final behavior total score value;
step S710: if the average difference is greater than or equal to the preset difference threshold, taking the new user relationship graph as the initial user relationship graph, taking the behavior total score values of the microblog users in the new user relationship graph as the basic behavior initial scores, and returning to step S701 until the average difference is smaller than the preset difference threshold.
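The loop formed by steps S701 to S710 amounts to repeating the graph-based score update until the scores stop changing. A minimal Python sketch follows; the `update` callable stands in for whatever computation steps S701 to S704 perform, and the names, the `max_rounds` safety cap, the use of absolute differences, and the choice to return the scores from the new graph in step S709 are all assumptions, since the claim leaves these points open.

```python
def iterate_until_stable(initial_scores, update, diff_threshold, max_rounds=100):
    """Repeat the score update until the average change is below the threshold.

    initial_scores: user -> basic behavior initial score
    update:         function mapping a score dict to a new score dict
                    (hypothetical stand-in for steps S701 to S704)
    """
    if not initial_scores:
        return {}
    scores = dict(initial_scores)
    new_totals = scores
    for _ in range(max_rounds):          # safety cap, not part of the claim
        totals = update(scores)          # S701-S704 on the initial graph
        new_totals = update(totals)      # S705 on the rebuilt graph
        # S706-S707: per-user difference, then the average over all users.
        avg_diff = sum(abs(new_totals[u] - totals[u]) for u in totals) / len(totals)
        if avg_diff < diff_threshold:    # S708-S709
            return new_totals
        scores = new_totals              # S710: feed totals back as initial scores
    return new_totals
```

With a contracting update, for example one that moves every score halfway towards a fixed value, the average difference shrinks each round and the loop ends well before the cap.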
5. A microblog user identification system, comprising:
a first acquisition module, configured to acquire the posting content, the posting time, the posting pictures, the user name of the poster, the attention user list of the poster, the fan user list of the poster, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, the number of comments received by the poster within a second preset time, and the user authentication level;
a first judging module, configured to judge, for each piece of microblog data, whether the user authentication level of the corresponding microblog user is that of an authenticated user;
a first processing module, configured to set the basic behavior initial score of the authenticated user corresponding to the microblog data to a preset authentication score if the user authentication level of the microblog user corresponding to the microblog data is that of an authenticated user;
a second processing module, configured to calculate the basic behavior initial score of the non-authenticated user corresponding to the microblog data according to the posting content, the posting time, the posting pictures, the number of comments received by the poster within the second preset time, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, and preset weights, if the user authentication level of the microblog user corresponding to the microblog data is that of a non-authenticated user;
wherein the second processing module includes: a first processing unit, configured to obtain, according to the posting time and the posting content, the number of repeated microblogs of the non-authenticated user whose content is the same as the posting content within a third preset time; a second processing unit, configured to divide the number of repeated microblogs by a first preset number to obtain the posting content repeatability; a third processing unit, configured to obtain, according to the posting time and the posting pictures, the number of repeated pictures of the non-authenticated user that are the same as the posting pictures within the third preset time; a fourth processing unit, configured to divide the number of repeated pictures by a second preset number to obtain the posting picture repeatability; and a fifth processing unit, configured to obtain the basic behavior initial score of the non-authenticated user according to the number of comments received by the poster within the second preset time, the proportion of blank-repost microblogs among the poster's preset number of posted microblogs, the preset weights, the posting content repeatability and the posting picture repeatability;
the calculation formula of the basic behavior initial score of the non-authenticated user is as follows:
basic behavior initial score = (number of comments received within the second preset time × comment weight) − (proportion of blank-repost microblogs among the poster's preset number of posted microblogs × forwarding weight) − posting picture repeatability − posting content repeatability;
a fourth judging module, configured to judge whether the poster user name of the non-authenticated user is an all-digit user name; a ninth processing module, configured to subtract a first preset user name score constant from the basic behavior initial score of the non-authenticated user if the poster user name is an all-digit user name; a fifth judging module, configured to judge whether the poster user name of the non-authenticated user is an all-Chinese-character user name with a preset number of characters if the poster user name is not an all-digit user name; a tenth processing module, configured to subtract a second preset user name score constant from the basic behavior initial score of the non-authenticated user if the poster user name is an all-Chinese-character user name with the preset number of characters, wherein the second preset user name score constant is smaller than the first preset user name score constant; an eleventh processing module, configured to keep the basic behavior initial score of the non-authenticated user unchanged if the poster user name is not an all-Chinese-character user name with the preset number of characters;
a third processing module, configured to connect the poster user name of the microblog user corresponding to each piece of microblog data, the attention user list of the poster and the basic behavior initial score into an initial user relationship graph, wherein the user relationship graph represents the relationship between the microblog user and the attention users and fan users associated with the microblog user;
a fourth processing module, configured to set the behavior total score value of each authenticated user in the initial user relationship graph to the preset authentication score;
a fifth processing module, configured to obtain the behavior total score value of each non-authenticated user in the initial user relationship graph according to the initial user relationship graph, the basic behavior initial score, the attention user list of the poster and the fan user list of the poster;
a second judging module, configured to judge, for each microblog user, whether the corresponding behavior total score value is larger than a first preset score value;
a sixth processing module, configured to determine that the microblog user is a natural registered account user if the behavior total score value corresponding to the microblog user is larger than the first preset score value;
a third judging module, configured to judge whether the behavior total score value corresponding to the microblog user is smaller than a second preset score value if the behavior total score value corresponding to the microblog user is smaller than or equal to the first preset score value, wherein the second preset score value is smaller than the first preset score value;
a seventh processing module, configured to determine that the microblog user is an unnatural registered account user if the behavior total score value corresponding to the microblog user is smaller than the second preset score value;
and an eighth processing module, configured to determine that the microblog user is a status-undetermined user if the behavior total score value corresponding to the microblog user is greater than or equal to the second preset score value.
6. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the microblog user identification method of any of claims 1-4.
7. A computer-readable storage medium having stored thereon computer instructions for causing the computer to perform the microblog user identification method of any one of claims 1-4.
CN202110938155.XA 2021-08-16 2021-08-16 Microblog user identification method, system, electronic equipment and storage medium Active CN113806616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110938155.XA CN113806616B (en) 2021-08-16 2021-08-16 Microblog user identification method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110938155.XA CN113806616B (en) 2021-08-16 2021-08-16 Microblog user identification method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113806616A CN113806616A (en) 2021-12-17
CN113806616B true CN113806616B (en) 2023-08-22

Family

ID=78943125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110938155.XA Active CN113806616B (en) 2021-08-16 2021-08-16 Microblog user identification method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113806616B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103095499A (en) * 2013-01-17 2013-05-08 上海交通大学 Method for capturing water armies on microblog platforms
CN103198161A (en) * 2013-04-28 2013-07-10 中国科学院计算技术研究所 Microblog ghostwriter identifying method and device
EP2933981A1 (en) * 2014-04-17 2015-10-21 Comptel OYJ Method and system of user authentication
CN106354845A (en) * 2016-08-31 2017-01-25 上海交通大学 Microblog rumor recognizing method and system based on propagation structures
CN108052543A (en) * 2017-11-23 2018-05-18 北京工业大学 A kind of similar account detection method of microblogging based on map analysis cluster
CN109558555A (en) * 2018-08-20 2019-04-02 湖北大学 Microblog water army detection method and detection system based on artificial immunity danger theory
CN110956210A (en) * 2019-11-29 2020-04-03 重庆邮电大学 Semi-supervised network water force identification method and system based on AP clustering
US10673880B1 (en) * 2016-09-26 2020-06-02 Splunk Inc. Anomaly detection to identify security threats

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SGHM:a hybrid model for spammer detection in weibo;Yu Liu等;2014 IEEE/ACM international conference on advances in social networks analysis and mining;942-947 *

Also Published As

Publication number Publication date
CN113806616A (en) 2021-12-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant