CN110728543B - Abnormal account identification method and device - Google Patents


Publication number
CN110728543B
Authority
CN
China
Legal status
Active
Application number
CN201910982159.0A
Other languages
Chinese (zh)
Other versions
CN110728543A (en)
Inventor
吴明平
梁新敏
陈羲
吴明辉
Current Assignee
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN201910982159.0A
Publication of CN110728543A
Application granted
Publication of CN110728543B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Abstract

The invention provides a method and a device for identifying an abnormal account. The method comprises: acquiring account information of an account to be identified, the account information comprising user identity information, user behavior information and content publishing information; determining, according to a preset algorithm and the account information, whether the account to be identified is abnormal, the preset algorithm comprising one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm; and, if so, determining the abnormal type of the account to be identified, the abnormal types comprising at least a machine-posting type and a human-posting type. The method and the device can effectively improve the accuracy and reliability of abnormal account identification.

Description

Abnormal account identification method and device
Technical Field
The invention relates to the field of network technology, and in particular to a method and a device for identifying an abnormal account.
Background
With the rapid development of the internet and e-commerce, online shopping has become increasingly popular, and before buying certain goods people like to look up relevant information on vertical websites, especially industry forums. However, many industry forums suffer from large volumes of posts by paid posters (the so-called "water army"), and this posting has become organized. It not only seriously damages the credibility of the websites but also easily misleads users' judgment.
At present, major forums use various methods to identify paid-poster accounts, such as statistical analysis of posting data and network detection. However, these methods do not identify paid-poster accounts accurately enough.
Disclosure of Invention
In view of this, the present invention provides a method and a device for identifying an abnormal account that can effectively improve the accuracy and reliability of abnormal account identification.
In a first aspect, an embodiment of the present invention provides a method for identifying an abnormal account, including: acquiring account information of an account to be identified, the account information comprising user identity information, user behavior information and content publishing information; determining, according to a preset algorithm and the account information, whether the account to be identified is abnormal, the preset algorithm comprising one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm; and, if so, determining the abnormal type of the account to be identified, the abnormal types comprising at least a machine-posting type and a human-posting type.
In one embodiment, the step of acquiring account information of the account to be identified includes: acquiring the account information of the account to be identified within a preset time period.
In one embodiment, the preset algorithm includes a concentration algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information includes: acquiring the reply time of each reply published by the account to be identified, and calculating the reply time difference between every two adjacent replies; dividing the replies published by the account into a plurality of reply blocks based on the reply time differences, where each reply block contains at least one reply and the time difference between two adjacent replies in the same block is smaller than that between two adjacent replies in different blocks; calculating the reply concentration of each reply block according to the reply time of each reply in the block and the number of replies in the block; calculating the total reply concentration of the account according to the reply concentration of each block and the total number of reply blocks; determining whether the total reply concentration is smaller than a preset threshold; and, if so, determining that the account to be identified is abnormal.
In one embodiment, the step of calculating the reply concentration of each reply block according to the reply time of each reply in the block and the number of replies in the block includes calculating the reply concentration of each reply block according to the following formula:
$$C_{ij} = \frac{1}{m-1}\sum_{k=2}^{m}\left(t_k - t_{k-1}\right)$$
where $C_{ij}$ denotes the reply concentration of the j-th reply block of account i; m denotes the number of replies in that block; and $t_k - t_{k-1}$ denotes the time difference between two adjacent replies in that block. The step of calculating the total reply concentration of the account according to the reply concentration of each block and the total number of reply blocks includes calculating the total reply concentration according to the following formula:
$$C_i = \frac{1}{J}\sum_{j=1}^{J} C_{ij}$$
where $C_i$ denotes the total reply concentration of account i, and J denotes the total number of reply blocks of account i.
In one embodiment, the preset algorithm includes a mean algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information includes: acquiring the total number of replies published by the account to be identified and the word count of each reply; calculating the average reply word count of the account from the total number of replies and the word count of each reply; determining whether the average reply word count is smaller than a preset word count and whether the number of replies published by the account is greater than a preset number; and, if both are true, determining that the account to be identified is abnormal.
In one embodiment, the step of calculating the average reply word count of the account to be identified according to the total number of replies and the word count of each reply includes calculating it according to the following formula:
$$T_i = \frac{1}{N}\sum_{n=1}^{N} q_n$$
where $T_i$ denotes the average reply word count of account i; N denotes the total number of replies of account i; and $q_n$ denotes the word count of the n-th reply of account i.
In one embodiment, the preset algorithm includes a community detection algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information includes: acquiring the number of main posts and the number of replies published by the account to be identified, and the posting time period; if, within the posting time period, the increase in the number of main posts and the increase in the number of replies both exceed a first preset number, constructing a relationship graph between the main posts and the replies published by the account in that period; performing community detection on the relationship graph with a community detection algorithm to obtain the number of newly registered accounts and the number of main posts published by those accounts; determining whether the number of newly registered accounts is greater than a second preset number and whether the number of main posts they published is smaller than a third preset number; and, if both are true, determining that the account to be identified is abnormal.
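The final judgment of this embodiment can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes community detection has already produced the member accounts of one community, and it assumes an account counts as "newly registered" if it registered no more than `new_account_days` days before the start of the posting time period, a definition the text does not give. `Member`, `community_is_abnormal` and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Member:
    """One account inside a detected community (hypothetical record type)."""
    user_id: str
    registered_at: datetime
    main_posts_in_window: int  # main posts this account published in the posting time period


def community_is_abnormal(
    members: list[Member],
    window_start: datetime,
    new_account_days: int,  # assumption: "newly registered" = registered within this many days before window_start
    second_preset: int,     # the new-account count must exceed this
    third_preset: int,      # the new accounts' main-post count must stay below this
) -> bool:
    cutoff = window_start - timedelta(days=new_account_days)
    new_accounts = [m for m in members if m.registered_at >= cutoff]
    new_main_posts = sum(m.main_posts_in_window for m in new_accounts)
    # Many fresh accounts that mostly reply rather than start threads.
    return len(new_accounts) > second_preset and new_main_posts < third_preset
```

The two-sided test mirrors the pattern the text describes: a burst of new accounts that mostly reply rather than start their own threads.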
In one embodiment, the preset algorithm includes a text classification algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information includes: acquiring the text of a fourth preset number of featured posts and the text of the main posts published by the account to be identified; based on the acquired featured-post text, classifying the text of the account's main posts with a text classification algorithm to obtain the number of the account's main posts classified as featured posts; determining whether that number is greater than a fifth preset number; and, if so, determining that the account to be identified is abnormal.
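As a rough illustration of the counting and threshold steps, the sketch below replaces the SVM classifier (which the description says may be implemented with Libsvm) with a simpler, swapped-in rule: a main post counts as featured-like if its bag-of-words cosine similarity to any featured post reaches a threshold. The whitespace tokenizer, the 0.5 threshold and all function names are assumptions for illustration; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    # Assumption: whitespace tokenization; Chinese text needs word segmentation first.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def count_featured_like(featured_texts, main_post_texts, similarity_threshold=0.5):
    """Number of the account's main posts that resemble at least one featured post."""
    featured = [_bag_of_words(t) for t in featured_texts]
    return sum(
        1
        for text in main_post_texts
        if any(_cosine(_bag_of_words(text), f) >= similarity_threshold for f in featured)
    )


def is_abnormal_by_text(featured_texts, main_post_texts, fifth_preset, similarity_threshold=0.5):
    """Abnormal when the featured-like count exceeds the fifth preset number."""
    return count_featured_like(featured_texts, main_post_texts, similarity_threshold) > fifth_preset
```

A trained classifier such as an SVM would also learn from non-featured examples; the similarity rule is used here only because it needs no training step.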
In one embodiment, the abnormal types further include a featured-post type, and the step of determining the abnormal type of the account to be identified includes: if the account is determined to be abnormal by the concentration algorithm or the mean algorithm, determining that its abnormal type is the machine-posting type; if the account is determined to be abnormal by the community detection algorithm, determining that its abnormal type is the human-posting type; and if the account is determined to be abnormal by the text classification algorithm, determining that its abnormal type is the featured-post type.
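The type assignment in this embodiment amounts to a lookup from the algorithm that flagged the account to an abnormal type. The sketch below shows that mapping; the enum and function names are hypothetical, and giving an account every matching type when several algorithms flag it is an assumption, since the text does not say how overlaps are resolved.

```python
from enum import Enum


class Algorithm(Enum):
    CONCENTRATION = "concentration"
    MEAN = "mean"
    COMMUNITY_DETECTION = "community detection"
    TEXT_CLASSIFICATION = "text classification"


class AbnormalType(Enum):
    MACHINE_POSTING = "machine posting"
    HUMAN_POSTING = "human posting"
    FEATURED_POST = "featured post"


# Which abnormal type each algorithm's positive result implies.
_TYPE_BY_ALGORITHM = {
    Algorithm.CONCENTRATION: AbnormalType.MACHINE_POSTING,
    Algorithm.MEAN: AbnormalType.MACHINE_POSTING,
    Algorithm.COMMUNITY_DETECTION: AbnormalType.HUMAN_POSTING,
    Algorithm.TEXT_CLASSIFICATION: AbnormalType.FEATURED_POST,
}


def abnormal_types(flagged_by: set[Algorithm]) -> set[AbnormalType]:
    """Map the algorithms that flagged an account to its abnormal type(s).

    Assumption: an account flagged by several algorithms receives every matching type.
    """
    return {_TYPE_BY_ALGORITHM[a] for a in flagged_by}
```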
In a second aspect, an embodiment of the present invention provides a device for identifying an abnormal account, including: an account information acquisition module, configured to acquire account information of an account to be identified, the account information comprising user identity information, user behavior information and content publishing information; an account abnormality determination module, configured to determine, according to a preset algorithm and the account information, whether the account to be identified is abnormal, the preset algorithm comprising a concentration algorithm, a mean algorithm, a community detection algorithm or a text classification algorithm; and an abnormal type determination module, configured to determine the abnormal type of the account to be identified, the abnormal types comprising at least a machine-posting type and a human-posting type.
The method and the device for identifying an abnormal account provided by the embodiments of the present invention first acquire the account information of the account to be identified (user identity information, user behavior information and content publishing information), then determine, according to a preset algorithm and the account information, whether the account is abnormal and what its abnormal type is, where the preset algorithm comprises one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm. Because user identity, user behavior and published content are all taken into account, abnormal accounts can be identified more accurately; and because the identification uses one or more of the concentration, mean, community detection and text classification algorithms, the accuracy and reliability of abnormal account identification are further improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of a method for identifying an abnormal account according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of dividing replies into reply blocks based on reply time differences according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for identifying an abnormal account according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, spam posting by paid posters is no longer just about flooding threads and boosting visibility; it has become organized behavior. Existing methods for identifying paid-poster accounts include: statistically analyzing forum reply data and building a user cooperation network to identify organized groups of paid posters; treating identification as a binary classification problem, using users' personal information as features and applying a classification algorithm such as logistic regression to separate paid-poster accounts from ordinary ones; and clustering users' personal information and posting statistics with a clustering algorithm, then treating certain of the resulting clusters as paid posters. However, these existing methods are not accurate enough, and the embodiments of the present invention therefore provide a method and a device for identifying an abnormal account.
To facilitate understanding of this embodiment, the method for identifying an abnormal account disclosed in the embodiments of the present invention is first described in detail. Referring to the flowchart shown in Fig. 1, the method may be executed by an electronic device and mainly includes the following steps S101 to S103:
Step S101: acquire account information of the account to be identified.
The account information includes user identity information, user behavior information and content publishing information. The user identity information may be the user id, the main-post user id, the reply user id, the registration time and the like; the user behavior information may be information related to the user's main posts and replies, such as the posting time, the main post id, the user's number of main posts and number of featured posts; the content publishing information may be the post title, the post content, the reply content and the like. In one embodiment, the acquired account information may be stored by category in a user information table, a user main-post table and a user reply table. As shown in Table 1, the user information table contains the following account information: user id, number of main posts, number of replies, number of featured posts and registration time;
Table 1: User information table
User id | Number of main posts | Number of replies | Number of featured posts | Registration time
The user main-post table is shown in Table 2 below; it contains the following account information: user id, posting time, main post id, main post title, main post content and whether the post is featured;
Table 2: User main-post table
User id | Posting time | Main post id | Main post title | Main post content | Featured flag
Referring to Table 3, the user reply table contains the following account information: main-post user id, reply user id, reply time, main post id and reply content.
Table 3: User reply table
Main-post user id | Reply user id | Reply time | Main post id | Reply content
It should be understood that the contents of Tables 1 to 3 above are merely illustrative and should not be construed as limiting.
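A minimal in-memory representation of the three tables might look like the following Python dataclasses. The field names are hypothetical translations of the column headers, and the `replies_to_own_post` helper is an illustrative addition supporting the relaxed time constraint the description applies to replies on one's own main posts.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UserInfo:
    """Table 1: user information table (hypothetical field names)."""
    user_id: str
    main_post_count: int
    reply_count: int
    featured_post_count: int
    registered_at: datetime


@dataclass
class MainPost:
    """Table 2: user main-post table."""
    user_id: str
    posted_at: datetime
    main_post_id: str
    title: str
    content: str
    is_featured: bool


@dataclass
class Reply:
    """Table 3: user reply table."""
    main_post_user_id: str
    reply_user_id: str
    replied_at: datetime
    main_post_id: str
    content: str

    def replies_to_own_post(self) -> bool:
        # True when the replier is also the author of the main post being replied to.
        return self.main_post_user_id == self.reply_user_id
```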
Step S102: determine, according to a preset algorithm and the account information, whether the account to be identified is abnormal.
The preset algorithm includes one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm. In one embodiment, the concentration algorithm may calculate the concentration of the replies posted by the user, that is, the average time difference between two adjacent replies posted by the user over a period of time; the mean algorithm may calculate the average word count of the user's replies; the community detection algorithm may be the LPA (label propagation) algorithm; and the text classification algorithm may be an SVM text classification algorithm, specifically implemented with the Libsvm package.
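For reference, a generic label propagation (LPA) pass can be written as below. This is the textbook algorithm under assumptions made only for this sketch (nodes visited in sorted order, ties broken by the smallest label, at most `max_iter` sweeps) so that the result is deterministic; the patent's own LPA steps may differ in detail. `label_propagation` is a hypothetical name, every node must appear as a key of the adjacency mapping, and nodes must be mutually comparable (for example, all strings).

```python
from collections import Counter


def label_propagation(adjacency, max_iter=100):
    """adjacency: {node: iterable of neighbours} for an undirected graph.

    Returns {node: label}; nodes sharing a label form one community.
    """
    labels = {node: node for node in adjacency}  # each node starts with its own label
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):  # fixed visiting order keeps the sketch deterministic
            neighbours = list(adjacency[node])
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            # Adopt the most frequent neighbour label; break ties by the smallest label.
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no label moved during a full sweep
            break
    return labels
```

On two disconnected triangles, for example, each triangle converges to a single shared label.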
Step S103: if so, determine the abnormal type of the account to be identified.
The abnormal types include at least a machine-posting type and a human-posting type. The machine-posting type may refer to a machine-operated paid-poster account that publishes large numbers of posts. Such an account is characterized by publishing many posts within a concentrated period of time, although if the posts it replies to are its own main posts, the time constraint may be relaxed. The human-posting type may refer to organized posting by a group of paid posters: typically a coordinator issues posting tasks according to an activity schedule, and many new accounts post at the same time, covering both main posts and replies.
The method for identifying an abnormal account provided by the embodiments of the present invention first acquires the account information of the account to be identified (user identity information, user behavior information and content publishing information), then determines, according to a preset algorithm and the account information, whether the account is abnormal and what its abnormal type is, where the preset algorithm comprises one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm. Because user identity, user behavior and published content are all taken into account, abnormal accounts can be identified more accurately; and because the identification uses one or more of the concentration, mean, community detection and text classification algorithms, the accuracy and reliability of abnormal account identification are further improved.
To identify abnormal accounts more accurately and reliably, acquiring the account information of the account to be identified includes acquiring the account information within a preset time period. In one embodiment, the account information of users of a vertical forum over 3 months may be acquired and stored by category in the user information table, the user main-post table and the user reply table.
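Preparing the per-account data for a preset time period could be sketched as follows, assuming reply rows that carry an author id and a timestamp. The `ReplyRow` type and `replies_in_period` function are hypothetical. The 3-month period is passed as explicit start and end datetimes rather than being computed from a month count.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class ReplyRow(NamedTuple):
    """One row of the user reply table (hypothetical subset of columns)."""
    reply_user_id: str
    replied_at: datetime
    content: str


def replies_in_period(rows, start: datetime, end: datetime):
    """Group replies by author, keeping rows with start <= replied_at < end.

    Each author's replies are sorted by time, ready for computing adjacent time differences.
    """
    grouped = defaultdict(list)
    for row in rows:
        if start <= row.replied_at < end:
            grouped[row.reply_user_id].append(row)
    for user_rows in grouped.values():
        user_rows.sort(key=lambda r: r.replied_at)
    return dict(grouped)
```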
Because abnormal accounts come in several types, different preset algorithms are used to identify the different types more accurately. Specific implementations for determining whether the account to be identified is abnormal are given below for each preset algorithm.
(I) When the preset algorithm is a concentration algorithm, determining whether the account to be identified is abnormal according to the preset algorithm and the account information mainly includes the following steps a1 to a6:
Step a1: acquire the reply time of each reply published by the account to be identified, and calculate the reply time difference between every two adjacent replies;
In one embodiment, a machine account that spams replies typically publishes a large number of replies within a concentrated period of time, so acquiring the reply time of each reply may mean acquiring the reply times of the replies the account published within that period; if the posts the user replies to are main posts published by the user himself, the time-period restriction may be relaxed.
Step a2: divide the replies published by the account to be identified into a plurality of reply blocks based on the reply time differences, where each reply block contains at least one reply, and the time difference between two adjacent replies in the same block is smaller than that between two adjacent replies in different blocks;
In one embodiment, the replies published by the account may be divided into reply blocks by starting a new block wherever the time difference between two adjacent replies is greater than 30 minutes. For ease of understanding, the specific manner of dividing reply blocks is illustrated with reference to the schematic diagram in Fig. 2.
As shown in Fig. 2, the numbers 1 to 13 represent the replies published by the account over a period of time, the arrow represents the time axis, and each interval represents the reply time difference between two adjacent replies. As the figure shows, these differences vary. When dividing the user's replies into blocks, a block boundary is placed wherever two adjacent replies are more than 30 minutes apart. In the figure, the time differences between replies 2 and 3, 6 and 7, 10 and 11, and 11 and 12 are greater than 30 minutes, so the 13 replies are split at these four points: replies 1 and 2 form one reply block, replies 3 to 6 form a second, replies 7 to 10 a third, reply 11 a fourth, and replies 12 and 13 a fifth.
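The blocking rule of step a2 can be sketched in Python as below. The function name `split_into_blocks` and the synthetic timestamps are hypothetical; the timestamps are chosen only so that the gaps after replies 2, 6, 10 and 11 exceed 30 minutes, as in Fig. 2.

```python
def split_into_blocks(times, max_gap_seconds=30 * 60):
    """Split reply timestamps (seconds, sorted ascending) into reply blocks.

    A new block starts whenever the gap to the previous reply is greater than
    max_gap_seconds; a gap of exactly max_gap_seconds stays in the same block.
    """
    blocks = []
    for t in times:
        if blocks and t - blocks[-1][-1] <= max_gap_seconds:
            blocks[-1].append(t)
        else:
            blocks.append([t])
    return blocks


# Hypothetical timestamps for replies 1..13: 45-minute gaps after replies
# 2, 6, 10 and 11, one-minute gaps everywhere else.
long_gap_after = {2, 6, 10, 11}
times, t = [], 0
for reply_no in range(1, 14):
    times.append(t)
    t += 45 * 60 if reply_no in long_gap_after else 60

blocks = split_into_blocks(times)
print([len(b) for b in blocks])  # [2, 4, 4, 1, 2]
```

The printed block sizes match the five reply blocks described for Fig. 2.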
Step a3: calculate the reply concentration of each reply block according to the reply time of each reply in the block and the number of replies in the block;
In a specific embodiment, the reply concentration of each reply block may be calculated according to the following formula:
$$C_{ij} = \frac{1}{m-1}\sum_{k=2}^{m}\left(t_k - t_{k-1}\right)$$
where $C_{ij}$ denotes the reply concentration of the j-th reply block of account i; m denotes the number of replies in that block; and $t_k - t_{k-1}$ denotes the time difference between two adjacent replies in that block.
Step a4: calculate the total reply concentration of the account to be identified according to the reply concentration of each reply block and the total number of reply blocks;
In a specific embodiment, the total reply concentration of the account may be calculated according to the following formula:
$$C_i = \frac{1}{J}\sum_{j=1}^{J} C_{ij}$$
where $C_i$ denotes the total reply concentration of account i; $C_{ij}$ denotes the reply concentration of the j-th reply block of account i; and J denotes the total number of reply blocks of account i.
Step a5: determine whether the total reply concentration is smaller than a preset threshold;
In one embodiment, the total reply concentration is the user's average time interval between replies. A normal user generally needs about 60 seconds on average to complete one reply, so the preset threshold for the total reply concentration may be set to 60 seconds.
Step a6: if so, determine that the account to be identified is abnormal.
In a specific embodiment, when the calculated total reply concentration of the account to be identified is less than 60 seconds, the account can be determined to be abnormal.
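Steps a3 to a6 can be sketched as follows, taking reply blocks (lists of sorted timestamps in seconds) as input. This is a minimal sketch that follows the description's definition of concentration as the average time difference between adjacent replies. A block with a single reply has no adjacent pair, so the sketch skips such blocks; that handling is an assumption, as the text does not address it. The function names are hypothetical.

```python
def block_concentration(block):
    """C_ij: mean time difference (seconds) between adjacent replies in one block.

    The block must be sorted and contain at least two replies.
    """
    gaps = [later - earlier for earlier, later in zip(block, block[1:])]
    return sum(gaps) / len(gaps)


def total_concentration(blocks):
    """C_i: mean of C_ij over the account's blocks.

    Assumption: single-reply blocks are skipped, so J counts only blocks with
    two or more replies. Returns None when no block qualifies.
    """
    scores = [block_concentration(b) for b in blocks if len(b) >= 2]
    return sum(scores) / len(scores) if scores else None


def is_abnormal_by_concentration(blocks, threshold_seconds=60):
    """Steps a5/a6: abnormal when the total reply concentration is below the threshold."""
    c = total_concentration(blocks)
    return c is not None and c < threshold_seconds
```

For instance, blocks `[[0, 10, 20], [100, 130]]` give per-block values of 10 s and 30 s, a total of 20 s, and therefore an abnormal result under the 60-second threshold.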
(II) When the preset algorithm includes a mean algorithm, determining whether the account to be identified is abnormal according to the preset algorithm and the account information mainly includes the following steps b1 to b4:
Step b1: acquire the total number of replies published by the account to be identified and the word count of each reply;
Step b2: calculate the average reply word count of the account according to the total number of replies and the word count of each reply;
In one embodiment, the average reply word count of the account may be calculated according to the following formula:
$$T_i = \frac{1}{N}\sum_{n=1}^{N} q_n$$
where $T_i$ denotes the average reply word count of account i; N denotes the total number of replies of account i; and $q_n$ denotes the word count of the n-th reply of account i.
Step b3: Determine whether the average reply word count is less than a preset word count and whether the number of replies posted by the account to be identified is greater than a preset number;
Step b4: If both determinations are yes, determine that the account to be identified is abnormal.
In a specific embodiment, when the replies posted by the account to be identified are short on average and numerous in total, the account can be determined to be an abnormal account.
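Steps b1 to b4 amount to a mean followed by two comparisons. The Python sketch below shows this; the thresholds (5 words, 50 replies) are hypothetical placeholders, since the text leaves the preset word count and preset number open. For Chinese posts the "word count" would normally be a character count.

```python
from typing import Sequence


def average_reply_words(word_counts: Sequence[int]) -> float:
    """T_i = (1/N) * sum(Q_n): mean word (character) count over the N replies."""
    if not word_counts:
        raise ValueError("the account has no replies")
    return sum(word_counts) / len(word_counts)


def is_abnormal_by_mean(word_counts: Sequence[int], max_avg_words: float = 5.0,
                        min_replies: int = 50) -> bool:
    """Steps b3-b4: many replies AND a short average reply length.

    max_avg_words and min_replies are HYPOTHETICAL preset values.
    """
    # The reply-count test runs first, so an account with no replies is never
    # passed to average_reply_words.
    return (len(word_counts) > min_replies
            and average_reply_words(word_counts) < max_avg_words)
```

Checking the reply count first keeps the function total: an account with no replies simply returns `False`.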
(III) When the preset algorithm comprises a community detection algorithm, judging whether the account to be identified is abnormal according to the preset algorithm and the account information mainly comprises the following steps c1 to c5:
Step c1: Acquire the number of main posts, the number of replies, and the posting time period of the account to be identified;
Step c2: If both the increase in the number of main posts and the increase in the number of replies within the posting time period are larger than a first preset number, construct a relationship graph between the main posts and the replies posted by the account to be identified within that period;
In a specific embodiment, the main posts and replies posted by the account to be identified may be counted in time order, a time period in which both counts increase sharply may be selected, and a relationship graph between the main posts and replies posted by the account within the selected period may then be constructed.
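The window selection and graph construction in step c2 might look like the Python sketch below. It rests on assumptions the text does not fix: counts are aggregated per period (for example per day), an "increase" is measured against the previous period, and the relationship graph uses account IDs as nodes, with an undirected edge between the author of a main post and each account that replied to it. Accounts are chosen as nodes because step c3 counts newly registered accounts inside the detected communities.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def burst_periods(main_post_counts: Sequence[int], reply_counts: Sequence[int],
                  first_preset: int) -> List[int]:
    """Indexes k of periods whose main-post AND reply counts both grew by more
    than first_preset compared with period k-1 (ASSUMED definition of increase)."""
    return [k for k in range(1, len(main_post_counts))
            if main_post_counts[k] - main_post_counts[k - 1] > first_preset
            and reply_counts[k] - reply_counts[k - 1] > first_preset]


def build_reply_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """pairs: (main_post_author, reply_author) for replies inside a burst period.

    Returns an undirected adjacency map between account IDs; self-replies are
    ignored because they add no relationship between different accounts.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for author, replier in pairs:
        if author != replier:
            graph[author].add(replier)
            graph[replier].add(author)
    return dict(graph)
```

Returning a symmetric adjacency map in which every account appears as a key is convenient for label-propagation style community detection.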
Step c3: Perform community detection on the relationship graph with the community detection algorithm to obtain the number of newly registered accounts and the number of main posts published by those newly registered accounts;
In one embodiment, the community detection algorithm may be the label propagation algorithm (LPA), whose logic comprises the following steps (1) to (3):
Step (1): Assign a label to each node, i.e., node 1 receives label 1 and node i receives label i (an existing label i is used directly; otherwise it is generated);
Step (2): Traverse the N nodes (for i = 1:N); for each node, find its neighbors, collect their labels, and replace the node's label with the label that occurs most frequently among them; if several labels tie for the highest frequency, select one of them at random;
Step (3): If no node label changes any more, stop iterating; otherwise, repeat step (2).
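Steps (1) to (3) of LPA can be sketched in Python as follows. The input is assumed to be a symmetric adjacency map that lists every node as a key. One refinement is added that the text does not mention: a node keeps its current label when that label is already among the most frequent neighbor labels. Without this, random tie-breaking can make labels oscillate forever, so step (3) would never trigger. A `max_iter` cap is also added as a safety net. Both choices are assumptions of this sketch.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Set


def label_propagation(graph: Dict[Hashable, Set[Hashable]], max_iter: int = 100,
                      seed: int = 0) -> Dict[Hashable, Hashable]:
    """graph: symmetric adjacency map that lists every node as a key."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}          # step (1): node i gets label i
    for _ in range(max_iter):
        changed = False
        for node, neighbours in graph.items():       # step (2): for i = 1:N
            if not neighbours:
                continue
            counts = Counter(labels[nb] for nb in neighbours)
            top = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == top]
            # ASSUMPTION: keep the current label if it is already among the most
            # frequent ones; otherwise break ties at random as step (2) says.
            if labels[node] not in tied:
                labels[node] = rng.choice(tied)
                changed = True
        if not changed:                              # step (3): labels are stable
            break
    return labels


def group_communities(labels: Dict[Hashable, Hashable]) -> Dict[Hashable, List[Hashable]]:
    """Collect nodes that share a final label into one community."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return dict(groups)
```

On two disconnected triangles, for example, each triangle settles on a single shared label, giving two communities of three nodes.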
Step c4: Determine whether the number of newly registered accounts is greater than a second preset number and whether the number of main posts published by the newly registered accounts is smaller than a third preset number;
In a specific embodiment, the third preset number (the limit on main posts published by newly registered accounts) may be set to 3.
Step c5: If both determinations are yes, determine that the account to be identified is abnormal.
In one embodiment, when a detected community contains a large number of newly registered users whose main-post counts are below 3, the account to be identified is determined to be an abnormal account.
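The decision in steps c4 and c5 can be sketched as a Python predicate over the community that contains the account to be identified. Three points are assumptions, not taken from the text. First, "newly registered" is taken to mean registered within a hypothetical `new_within_days` window. Second, the second preset number defaults to an illustrative 10. Third, the main-post limit (3, from the embodiment above) is applied to each new account individually rather than to their total.

```python
from datetime import date
from typing import Dict, Iterable


def is_abnormal_by_community(community: Iterable[str], registered_on: Dict[str, date],
                             main_posts: Dict[str, int], as_of: date,
                             new_within_days: int = 30, second_preset: int = 10,
                             third_preset: int = 3) -> bool:
    """Steps c4-c5 for the community that contains the account to be identified."""
    # ASSUMPTION: "newly registered" = registered within new_within_days of as_of.
    new_accounts = [acc for acc in community
                    if (as_of - registered_on[acc]).days <= new_within_days]
    if len(new_accounts) <= second_preset:
        return False
    # ASSUMPTION: the main-post limit is applied to each new account individually.
    return all(main_posts.get(acc, 0) < third_preset for acc in new_accounts)
```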
(IV) When the preset algorithm comprises a text classification algorithm, judging whether the account to be identified is abnormal according to the preset algorithm and the account information mainly comprises the following steps d1 to d4:
Step d1: Acquire the text of a fourth preset number of leisure-type essence (featured) posts and the text of the main posts published by the account to be identified;
In one embodiment, essence-post "water army" (paid poster) accounts have typically been registered for many years. They publish leisure-type main posts in exchange for fuel cards or other benefits, and these posts are usually marked as essence posts. A typical post reads: "Doesn't every city dweller long to escape the city now and then, to get out into nature and far away? As it happens, I am one of those people. This time I could take the car out to the suburbs, leave the noise of the city behind, and enjoy the beauty of nature." To identify such accounts, the text of a certain number of leisure-type essence posts is acquired: some leisure-type essence posts are manually labeled, together with some normal main posts, for the subsequent text classification.
Step d2: Based on the acquired text of the leisure-type essence posts, classify the text of the main posts published by the account to be identified with a text classification algorithm to obtain the number of those main posts that are leisure-type essence posts;
In a specific embodiment, the text classification algorithm may be an SVM text classification algorithm, which mainly comprises four steps: text feature extraction, text feature representation, normalization, and text classification. Specifically, the SVM text classifier can be implemented with the LIBSVM package. The model returns two results, label and score: label is the predicted label, and score is the degree of membership of the sample in its predicted class, with a higher score indicating higher confidence in that class.
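The patent uses LIBSVM for this step. To keep the illustration dependency-free, the Python sketch below swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method on the hinge loss. It mirrors the four stages: term extraction, bag-of-words representation, L2 normalization, and classification. Tokenization by whitespace is an assumption; Chinese posts would first need a word segmenter. The returned score is the signed margin w·x, which is not the same quantity as LIBSVM's score. The `fifth_preset` default of 5 is hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def featurize(text: str) -> Vector:
    """Feature extraction, representation and normalisation: term counts scaled
    to unit L2 norm. ASSUMPTION: whitespace tokens (Chinese needs segmentation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_linear_svm(samples: Sequence[Tuple[str, int]], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos SGD on the hinge loss (no bias term).
    Labels: +1 = leisure-type essence post, -1 = normal main post."""
    rng = random.Random(seed)
    data = [(featurize(text), y) for text, y in samples]
    w: Vector = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * dot(w, x) < 1          # hinge loss active for this sample?
            for tok in w:                          # regularisation shrink
                w[tok] *= 1.0 - eta * lam
            if violated:
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def predict(w: Vector, text: str) -> Tuple[int, float]:
    """Return (label, score); score is the signed margin w.x, not a probability."""
    score = dot(w, featurize(text))
    return (1 if score > 0 else -1), score


def is_abnormal_by_text(w: Vector, main_posts: List[str], fifth_preset: int = 5) -> bool:
    """Steps d3-d4: abnormal if more than fifth_preset main posts look like essence posts."""
    return sum(1 for p in main_posts if predict(w, p)[0] == 1) > fifth_preset
```

For example, a model trained on a few labeled leisure posts and car-maintenance posts assigns a positive label to "weekend drive to escape the city" and a negative label to "engine repair price question". A production setup would more likely use LIBSVM itself with TF-IDF features.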
Step d3: Determine whether the number of main posts classified as leisure-type essence posts is greater than a fifth preset number;
Step d4: If so, determine that the account to be identified is abnormal.
In a specific embodiment, when the number of main posts published by the account to be identified that are classified as leisure-type essence posts is greater than the fifth preset number, the account is determined to be an abnormal account.
After the account to be identified has been found abnormal by the preset algorithms, this embodiment may further determine its abnormality type. In one implementation, the abnormality types may include a machine post-brushing type and a human post-brushing type, and may further include an essence post-brushing type. Accounts of this last type have typically been registered for many years and publish leisure-type main posts, usually essence posts, in exchange for fuel cards or other benefits, so such abnormal accounts can be classified as the essence post-brushing type. In this embodiment, determining the abnormality type of the account to be identified mainly comprises the following steps 1 to 3:
Step 1: If the account to be identified is judged abnormal by the concentration algorithm or the mean algorithm, determine that its abnormality type is the machine post-brushing type;
Step 2: If the account to be identified is judged abnormal by the community detection algorithm, determine that its abnormality type is the human post-brushing type;
Step 3: If the account to be identified is judged abnormal by the text classification algorithm, determine that its abnormality type is the essence post-brushing type.
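Steps 1 to 3 are a fixed lookup from the algorithm that flagged the account to an abnormality type, as the Python sketch below shows. The algorithm name strings are hypothetical identifiers. The text does not say what happens when several algorithms fire at once, so the sketch assumes every matching type is reported.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class AbnormalType(Enum):
    MACHINE = "machine post-brushing"
    HUMAN = "human post-brushing"
    ESSENCE = "essence post-brushing"


# Steps 1-3: which algorithm's verdict maps to which abnormality type.
# The dictionary keys are HYPOTHETICAL algorithm identifiers.
TYPE_BY_ALGORITHM: Dict[str, AbnormalType] = {
    "concentration": AbnormalType.MACHINE,
    "mean": AbnormalType.MACHINE,
    "community_detection": AbnormalType.HUMAN,
    "text_classification": AbnormalType.ESSENCE,
}


def abnormal_types(triggered: Iterable[str]) -> Set[AbnormalType]:
    """Map the names of the algorithms that flagged the account to abnormality types.
    ASSUMPTION: when several algorithms fire, all matching types are reported."""
    return {TYPE_BY_ALGORITHM[name] for name in triggered}
```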
In summary, the abnormal account identification method of this embodiment first acquires the account information of the account to be identified (user identity information, user behavior information, and content publishing information). It then judges whether the account is abnormal according to a preset algorithm and that information, and determines the abnormality type; the preset algorithm comprises one or more of a concentration algorithm, a mean algorithm, a community detection algorithm, and a text classification algorithm. Because the method considers user identity, user behavior, and published content together, it can identify abnormal accounts more accurately. Applying one or more of these four algorithms further improves the accuracy and reliability of abnormal account identification.
Corresponding to the abnormal account identification method provided in the foregoing embodiment, an embodiment of the present invention further provides an apparatus for identifying abnormal accounts. Referring to the schematic structural diagram of the apparatus shown in FIG. 3, the apparatus may include the following components:
An account information acquisition module 301, configured to acquire account information of an account to be identified, the account information comprising user identity information, user behavior information, and content publishing information.
An account abnormality judgment module 302, configured to judge whether the account to be identified is abnormal according to a preset algorithm and the account information, the preset algorithm comprising a concentration algorithm, a mean algorithm, a community detection algorithm, or a text classification algorithm.
An account abnormality type determination module 303, configured to determine the abnormality type of the account to be identified, the abnormality types comprising at least a machine post-brushing type and a human post-brushing type.
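One way to wire modules 301 to 303 together is sketched below in Python. The `AccountInfo` field layout, the class and method names, and the use of plain callables for each algorithm are all hypothetical. The sketch only shows the data flow: information is acquired, each configured algorithm is run, and flagged algorithms are mapped to abnormality types.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class AccountInfo:
    """What module 301 returns; the field layout is HYPOTHETICAL."""
    account_id: str
    identity: Dict[str, object] = field(default_factory=dict)
    behaviour: Dict[str, object] = field(default_factory=dict)
    content: Dict[str, object] = field(default_factory=dict)


class AbnormalAccountIdentifier:
    def __init__(self,
                 fetch_info: Callable[[str], AccountInfo],          # module 301
                 checks: Dict[str, Callable[[AccountInfo], bool]],  # module 302
                 type_by_check: Dict[str, str]) -> None:            # module 303
        self.fetch_info = fetch_info
        self.checks = checks
        self.type_by_check = type_by_check

    def identify(self, account_id: str) -> Set[str]:
        """Return the abnormality types found; an empty set means no check fired."""
        info = self.fetch_info(account_id)
        flagged = [name for name, check in self.checks.items() if check(info)]
        return {self.type_by_check[name] for name in flagged}
```

Passing the algorithms in as callables keeps module 302 open to any subset of the four algorithms, matching the "one or more" wording.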
The apparatus provided by this embodiment considers user identity, user behavior, and published content together when identifying abnormal accounts, so it can identify them accurately. By applying one or more of a concentration algorithm, a mean algorithm, a community detection algorithm, and a text classification algorithm, it further improves the accuracy and reliability of abnormal account identification.
In an embodiment, the account abnormality judgment module 302 is further configured to: acquire the reply time of each reply posted by the account to be identified and calculate the time difference between every two adjacent replies; divide the replies posted by the account into a plurality of reply blocks based on these time differences, where each reply block contains at least one reply and the time difference between two adjacent replies in the same block is lower than that between two adjacent replies in different blocks; calculate the reply concentration of each reply block from the reply times and the number of replies in that block; calculate the total reply concentration of the account from the reply concentration of each block and the total number of blocks; judge whether the total reply concentration is smaller than a preset threshold; and if so, determine that the account to be identified is abnormal.
In an embodiment, the account abnormality judgment module 302 is further configured to: acquire the total number of replies posted by the account to be identified and the word count of each reply; calculate the average reply word count of the account from the total number of replies and the word count of each reply; judge whether the average reply word count is less than a preset word count and whether the number of replies posted by the account is greater than a preset number; and if both determinations are yes, determine that the account to be identified is abnormal.
In an embodiment, the account abnormality judgment module 302 is further configured to: acquire the number of main posts, the number of replies, and the posting time period of the account to be identified; if both the increase in the number of main posts and the increase in the number of replies within the posting time period are larger than a first preset number, construct a relationship graph between the main posts and replies posted by the account within that period; perform community detection on the relationship graph with a community detection algorithm to obtain the number of newly registered accounts and the number of main posts they published; judge whether the number of newly registered accounts is larger than a second preset number and whether the number of main posts published by the newly registered accounts is smaller than a third preset number; and if both determinations are yes, determine that the account to be identified is abnormal.
In an embodiment, the account abnormality judgment module 302 is further configured to: acquire the text of a fourth preset number of leisure-type essence posts and the text of the main posts published by the account to be identified; based on the acquired essence-post text, classify the main posts of the account with a text classification algorithm to obtain the number of main posts that are leisure-type essence posts; judge whether that number is larger than a fifth preset number; and if so, determine that the account to be identified is abnormal.
In one embodiment, the abnormality types further include an essence post-brushing type, and the account abnormality type determination module 303 is further configured to: if the account to be identified is judged abnormal by the concentration algorithm or the mean algorithm, determine that its abnormality type is the machine post-brushing type; if it is judged abnormal by the community detection algorithm, determine that its abnormality type is the human post-brushing type; and if it is judged abnormal by the text classification algorithm, determine that its abnormality type is the essence post-brushing type.
The apparatus provided by this embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, anything not mentioned in the apparatus embodiments can be found in the corresponding content of the method embodiments.
Finally, it should be noted that the above embodiments are only specific implementations of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand the following. Within the technical scope disclosed herein, anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A method for identifying an abnormal account, characterized by comprising the following steps:
acquiring account information of an account to be identified; the account information comprises user identity information, user behavior information and content release information;
judging whether the account to be identified is abnormal or not according to a preset algorithm and the account information; the preset algorithm comprises one or more of a concentration algorithm, a mean algorithm, a community detection algorithm and a text classification algorithm;
if so, determining the abnormality type of the account to be identified, the abnormality types comprising at least a machine post-brushing type and a human post-brushing type;
wherein the preset algorithm comprises a text classification algorithm, and the step of judging whether the account to be identified is abnormal according to the preset algorithm and the account information comprises:
acquiring the text of a fourth preset number of leisure-type essence posts and the text of the main posts published by the account to be identified;
classifying, based on the acquired text of the leisure-type essence posts, the text of the main posts published by the account to be identified with the text classification algorithm, to obtain the number of main posts that are leisure-type essence posts;
judging whether the number of main posts that are leisure-type essence posts is larger than a fifth preset number;
if yes, determining that the account to be identified is abnormal.
2. The method according to claim 1, wherein the step of acquiring account information of the account to be identified comprises:
acquiring account information of an account to be identified in a preset time period.
3. The method according to claim 1, wherein the preset algorithm comprises a concentration algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information comprises:
acquiring the reply time of each reply posted by the account to be identified, and calculating the time difference between every two adjacent replies;
dividing the replies posted by the account to be identified into a plurality of reply blocks based on the time differences, each reply block comprising at least one reply, wherein the time difference between two adjacent replies in the same reply block is lower than the time difference between two adjacent replies in different reply blocks;
calculating the reply concentration of each reply block according to the reply time of each reply in that block and the number of replies in that block;
calculating the total reply concentration of the account to be identified according to the reply concentration of each reply block and the total number of reply blocks;
judging whether the total reply concentration is smaller than a preset threshold;
if so, determining that the account to be identified is abnormal.
4. The method of claim 3, wherein the step of calculating the reply concentration of each reply block according to the reply time of each reply in that block and the number of replies in that block comprises:
calculating the reply concentration of each reply block according to the following formula:
Figure FDA0003640822610000021
where $C_{ij}$ denotes the reply concentration of the j-th reply block of account i to be identified; m denotes the number of replies in the j-th reply block of account i; and $t_m - t_{m-1}$ denotes the time difference between two adjacent replies in the j-th reply block of account i;
the step of calculating the total reply concentration of the account to be identified according to the reply concentration of each reply block and the total number of reply blocks comprises:
calculating the total reply concentration of the account to be identified according to the following formula:
$$C_i = \frac{1}{J}\sum_{j=1}^{J} C_{ij}$$
where $C_i$ denotes the total reply concentration of account i to be identified, and $J$ denotes the total number of reply blocks of account i.
5. The method according to claim 1, wherein the preset algorithm comprises a mean algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information comprises:
acquiring the total number of replies posted by the account to be identified and the word count of each reply posted by the account;
calculating the average reply word count of the account to be identified according to the total number of replies of the account and the word count of each reply;
judging whether the average reply word count is less than a preset word count and whether the total number of replies posted by the account to be identified is greater than a preset total number;
and if both determinations are yes, determining that the account to be identified is abnormal.
6. The method of claim 5, wherein the step of calculating the average reply word count of the account to be identified according to the total number of replies of the account and the word count of each reply comprises:
calculating the average reply word count of the account to be identified according to the following formula:
$$T_i = \frac{1}{N}\sum_{n=1}^{N} Q_n$$
where $T_i$ denotes the average reply word count of account i to be identified; $N$ denotes the total number of replies of account i; and $Q_n$ denotes the word count of the n-th reply of account i.
7. The method according to claim 1, wherein the preset algorithm comprises a community detection algorithm, and the step of determining whether the account to be identified is abnormal according to the preset algorithm and the account information comprises:
acquiring the number of main posts, the number of replies, and the posting time period of the account to be identified;
if both the increase in the number of main posts and the increase in the number of replies within the posting time period are larger than a first preset number, constructing a relationship graph between the main posts and the replies posted by the account to be identified within the posting time period;
performing community identification on the relationship graph with the community detection algorithm to obtain the number of newly registered accounts and the number of main posts published by the newly registered accounts;
judging whether the number of newly registered accounts is larger than a second preset number and whether the number of main posts published by the newly registered accounts is smaller than a third preset number;
and if both determinations are yes, determining that the account to be identified is abnormal.
8. The method of claim 1, wherein the abnormality types further comprise an essence post-brushing type;
the step of determining the abnormality type of the account to be identified comprises:
if the account to be identified is judged abnormal according to the concentration algorithm or the mean algorithm, determining that its abnormality type is the machine post-brushing type;
if the account to be identified is judged abnormal according to the community detection algorithm, determining that its abnormality type is the human post-brushing type;
and if the account to be identified is judged abnormal according to the text classification algorithm, determining that its abnormality type is the essence post-brushing type.
9. An apparatus for identifying abnormal accounts, characterized by comprising:
an account information acquisition module, configured to acquire account information of an account to be identified, the account information comprising user identity information, user behavior information, and content publishing information;
an account abnormality judgment module, configured to judge whether the account to be identified is abnormal according to a preset algorithm and the account information, the preset algorithm comprising a concentration algorithm, a mean algorithm, a community detection algorithm, or a text classification algorithm;
an account abnormality type determination module, configured to determine the abnormality type of the account to be identified, the abnormality types comprising at least a machine post-brushing type and a human post-brushing type;
wherein the preset algorithm comprises a text classification algorithm, and the account abnormality judgment module is further configured to: acquire the text of a fourth preset number of leisure-type essence posts and the text of the main posts published by the account to be identified; classify, based on the acquired text of the leisure-type essence posts, the text of the main posts published by the account with the text classification algorithm to obtain the number of main posts that are leisure-type essence posts; judge whether the number of main posts that are leisure-type essence posts is larger than a fifth preset number; and if yes, determine that the account to be identified is abnormal.
CN201910982159.0A 2019-10-15 2019-10-15 Abnormal account identification method and device Active CN110728543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910982159.0A CN110728543B (en) 2019-10-15 2019-10-15 Abnormal account identification method and device

Publications (2)

Publication Number Publication Date
CN110728543A (en) 2020-01-24
CN110728543B (en) 2022-08-09

Family

ID=69221418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910982159.0A Active CN110728543B (en) 2019-10-15 2019-10-15 Abnormal account identification method and device

Country Status (1)

Country Link
CN (1) CN110728543B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340544A (en) * 2020-02-25 2020-06-26 上海昌投网络科技有限公司 Method and device for judging whether WeChat public number is read by swiping
CN111507377B (en) * 2020-03-24 2023-08-11 微梦创科网络科技(中国)有限公司 Method and device for identifying number-keeping accounts in batches
CN112905662A (en) * 2021-02-08 2021-06-04 上海宏原信息科技有限公司 Method, system and device for distinguishing true and false consumers of internet
CN112989167B (en) * 2021-04-15 2021-08-06 腾讯科技(深圳)有限公司 Method, device and equipment for identifying transport account and computer readable storage medium
CN116882409A (en) * 2023-09-08 2023-10-13 中国科学院自动化研究所 Abnormal account detection method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103617235A (en) * 2013-11-26 2014-03-05 中国科学院信息工程研究所 Method and system for online water army account identification based on particle swarm optimization
CN103778186A (en) * 2013-12-31 2014-05-07 南京财经大学 Method for detecting sockpuppets
CN107332931A (en) * 2017-08-07 2017-11-07 合肥工业大学 Method and device for identifying the water army in machine-type forums

Non-Patent Citations (3)

Title
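A fast detection algorithm for water army accounts in online forums; Chen Guirong et al.; Journal of Hunan University (Natural Sciences); 2015-04-25 (No. 04); pp. 114-120 *
Research progress on community discovery in online social networks; Pan Li et al.; Journal of Electronics & Information Technology; 2017-09-15 (No. 09); pp. 2097-2107 *
Research on SVM-based identification of marketing spam posts on Sina Weibo; Ye Shiren et al.; Natural Science Journal of Xiangtan University; 2015-12-15 (No. 04); pp. 70-74 *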

Also Published As

Publication number Publication date
CN110728543A (en) 2020-01-24

Similar Documents

Publication Publication Date Title
CN110728543B (en) Abnormal account identification method and device
CN110009174B (en) Risk recognition model training method and device and server
Dickerson et al. Using sentiment to detect bots on twitter: Are humans more opinionated than bots?
CN103793484B (en) The fraud identifying system based on machine learning in classification information website
CN107239440B (en) Junk text recognition method and device
CN108038627B (en) Object evaluation method and device
CN109388634B (en) Address information processing method, terminal device and computer readable storage medium
Hristakieva et al. The spread of propaganda by coordinated communities on social media
CN109933648B (en) Real user comment distinguishing method and device
CN111506785B (en) Social text-based network public opinion topic identification method and system
CN110930218B (en) Method and device for identifying fraudulent clients and electronic equipment
CN111641608A (en) Abnormal user identification method and device, electronic equipment and storage medium
CN115577172A (en) Article recommendation method, device, equipment and medium
CN109783805A (en) A kind of network community user recognition methods and device
Gunawan et al. Filtering spam text messages by using Twitter-LDA algorithm
Wang et al. Boosting election prediction accuracy by crowd wisdom on social forums
CN112990989B (en) Value prediction model input data generation method, device, equipment and medium
Ghanem et al. Agents of influence in social networks.
CN113886697A (en) Clustering algorithm based activity recommendation method, device, equipment and storage medium
CN112835910A (en) Enterprise information and policy information processing method and device
Christopher et al. Review authenticity verification using supervised learning and reviewer personality traits
CN113254919B (en) Abnormal device identification method, electronic device, and computer-readable storage medium
CN104503959B (en) Method and equipment for predicting emotional tendency of user
CN110599195B (en) Method for identifying bill swiping
CN103970727A (en) Topic-based anti-cheating method, device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant