CN112905662A - Method, system and device for distinguishing true and false consumers of internet - Google Patents

Info

Publication number
CN112905662A
Authority
CN
China
Prior art keywords
threshold value
user
abnormal
account
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110171747.3A
Other languages
Chinese (zh)
Inventor
杨骏
郭奕楷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hongyuan Information Technology Co ltd
Original Assignee
Shanghai Hongyuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hongyuan Information Technology Co ltd filed Critical Shanghai Hongyuan Information Technology Co ltd
Priority to CN202110171747.3A priority Critical patent/CN112905662A/en
Publication of CN112905662A publication Critical patent/CN112905662A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method, a system and a device for distinguishing genuine consumers from fake consumers on the Internet. A multi-modal abnormal-account identification method is designed for social platforms and industry vertical platforms, combining what users say with how they behave. Sample users are selected at random and their behavior data and published content data are extracted; a dynamic threshold for judging feature anomalies is established to quantify the difference between abnormal and normal user behavior; and an evaluation mechanism built on the dynamic threshold dynamically evaluates the state of each user and identifies abnormal user accounts.

Description

Method, system and device for distinguishing true and false consumers of internet
Technical Field
The invention relates to the technical field of traffic analysis, and in particular to a method, a system and a device for distinguishing genuine consumers from fake consumers on the Internet.
Background
At present, there are three kinds of methods for identifying fake consumers (abnormal network accounts): methods based on content feature analysis, on user behavior feature analysis, and on user relationship feature analysis.
Content feature analysis uses natural language processing techniques, such as text analysis and sentiment analysis, to measure the similarity and emotional tendency of texts and thereby identify abnormal accounts. This technique was used mainly in the early Internet era, for example to identify spam by detecting duplicate content. As the Internet has developed, however, abnormal accounts have become better at concealing themselves and can even imitate the speech of real consumers, so they can no longer be identified effectively from text content. As online platforms diversify, the ways consumers express themselves have also become more complex, and meaningless content (e.g., repeated posts) is increasingly common. Judging by text content alone therefore causes many real users to be misidentified as abnormal accounts.
Methods based on user behavior features rely on machine learning models such as logistic regression, naive Bayes and random forests. They avoid the problem that abnormal accounts whose speech resembles a real person's cannot be identified, and they can achieve relatively high accuracy and recall in data validation. However, as the amount of data on social platforms grows (to hundreds of millions or billions of records), such methods become almost impossible to apply commercially, for three reasons: 1) the model needs training data in which abnormal accounts have been manually judged and labeled, and finding abnormal accounts among 100 million user records requires a large amount of manual labeling, which is costly; 2) even with labeled abnormal-account data, prediction consumes a significant amount of computing resources and time; 3) the accuracy of such models is usually around 80%, which still falls short of commercial requirements.
Similarly, methods based on user relationship identification rely on models such as graph theory and probabilistic graphical models, and encounter problems in application similar to those of machine learning models. In addition, as Internet users pay more attention to protecting their personal privacy, the social network data these methods require has become very difficult to obtain, which makes them even harder to implement.
Disclosure of Invention
The invention aims to provide a method, a system and a device for distinguishing genuine consumers from fake consumers on Internet platforms.
In order to achieve the above object, one aspect of the present invention provides an Internet true and false consumer distinguishing method, comprising the following steps:
data acquisition: randomly selecting sample users, and extracting the users' behavior data and published content data;
threshold determination: establishing a dynamic threshold for judging feature anomalies, wherein the dynamic threshold expresses the quantitative difference between abnormal and normal user behavior;
account identification: establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of each user and identify abnormal user accounts.
Further, in the data acquisition step, the extracted behavior data and published content data of a user include:
user information, including the user name, the user account, the following count (the number of accounts the user follows) and the fan count;
actively published content, including the text and the publishing time;
forwarded and commented content, including the content being forwarded or commented on, the text written when forwarding or commenting, and the time of forwarding or commenting.
Further, the threshold determination step includes:
calculating the quartiles of the number of abnormal behaviors of the user accounts, and recording the upper and lower quartiles as Q3 and Q1 respectively;
calculating the upper and lower outlier boundaries, the upper outlier boundary being Q3 + 1.5(Q3 - Q1) and the lower outlier boundary being Q1 - 1.5(Q3 - Q1);
using the upper outlier boundary as the threshold, values greater than the upper outlier boundary being outliers.
Further, the method also comprises:
calculating the number of active days and the number of inactive days of the user account;
drawing a box plot bounded by the number of active days and the number of inactive days of the user account, and setting a threshold according to the box plot.
Further, the method judges the abnormal features of a user account according to the following conditions:
the ratio of the account's following count to its fan count (following count / fan count) is greater than threshold 1, and the following count is greater than threshold 2;
the number of abnormal behaviors within half a year is greater than threshold 3, the abnormal behaviors comprising:
the number of replies to a single piece of content is greater than threshold 4;
under a single piece of content, a reply is repeated more than threshold 4 times and its length is greater than threshold 5;
the number of times the same content is forwarded is greater than threshold 4;
published content is repeated more than threshold 6 times and its length is greater than threshold 5;
where threshold 1 is 9.6, threshold 2 is 110, threshold 3 is 4, threshold 4 is 4, threshold 5 is 10, and threshold 6 is 7.
Further, the judgment of an abnormal user account also comprises:
calculating the edit distance between the user name of the user account and the user name of an identified abnormal account, the edit distance being Editdistance(S1, S2), where S1 is the user name string and S2 is the abnormal account's user name string;
and when the edit distance is larger than 2, judging that the user account is an abnormal account.
In another aspect, the invention also provides an Internet true and false consumer distinguishing system, which comprises:
a data acquisition unit for randomly selecting sample users and extracting the users' behavior data and published content data;
a threshold determination unit for establishing a dynamic threshold for judging feature anomalies, the dynamic threshold expressing the quantitative difference between abnormal and normal user behavior;
an account identification unit for establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of each user and identify abnormal user accounts.
In another aspect, the present invention also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method described above.
In another aspect, the present invention also provides a computer-readable storage medium in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the steps of the method described above.
The invention provides a method, a system and a device for distinguishing genuine consumers from fake consumers on the Internet. A multi-modal abnormal-account identification method is designed for social platforms and industry vertical platforms, combining what users say with how they behave. Sample users are selected at random and their behavior data and published content data are extracted; a dynamic threshold for judging feature anomalies is established to quantify the difference between abnormal and normal user behavior; and an evaluation mechanism built on the dynamic threshold dynamically evaluates the state of each user and identifies abnormal user accounts.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of an Internet true and false consumer distinguishing method according to an embodiment of the present invention.
Fig. 2 is a block diagram of an Internet true and false consumer distinguishing system according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
The invention mainly solves the problem of identifying genuine and fake consumers on social platforms and industry vertical platforms. Because the platforms have different characteristics, methods for identifying abnormal network accounts need to be designed separately for social platforms and for industry vertical platforms.
Fig. 1 is a flowchart of an Internet true and false consumer distinguishing method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the following steps:
S100, data acquisition: randomly selecting sample users, and extracting the users' behavior data and published content data.
Specifically, the sample users of the present invention are drawn from social platforms and industry vertical platforms. The extracted information includes: user information, including the user name, the user ID, the following count (the number of accounts the user follows), the fan count, and the like; all actively published content, including the text, the publishing time, and the like; and all forwarded and commented content, including the content being forwarded or commented on, the text written when forwarding or commenting, the time of forwarding or commenting, and the like.
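As an illustration only, the fields extracted in S100 could be held in records such as the following. This is a minimal sketch: the class and field names (`UserInfo`, `Post`, `Interaction`, `UserSample`, `following_count`, and so on) are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class UserInfo:
    user_name: str
    user_id: str
    following_count: int  # number of accounts the user follows
    fan_count: int        # number of fans (followers)


@dataclass
class Post:
    text: str               # actively published text
    published_at: datetime  # publishing time


@dataclass
class Interaction:
    kind: str             # hypothetical label: "forward" or "comment"
    target_content: str   # content being forwarded or commented on
    written_text: str     # text written when forwarding or commenting
    created_at: datetime  # time of forwarding or commenting


@dataclass
class UserSample:
    info: UserInfo
    posts: List[Post] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)
```

Keeping posts and interactions as separate lists mirrors the three groups of extracted information named in the text.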
S200, threshold determination: establishing a dynamic threshold for judging feature anomalies, wherein the dynamic threshold expresses the quantitative difference between abnormal and normal user behavior.
In one embodiment, for industry vertical websites, the behavior and content features of abnormal accounts are derived by scenario inference. On an industry vertical platform, the main behavior and content features of abnormal accounts include:
Feature one: a large number of reply posts on the platform, or participation in publishing activity-related content. Platform-type abnormal accounts are typically used to reply to posts and build momentum for activities launched by the platform.
Feature two: after a long period without posting, the account starts posting again, and the content it posts mainly involves participating in activities or discussing brands. This is one of the typical features of abandoned-account-type abnormal accounts: the platform recycles accounts that have not posted for a long time and uses them to increase platform activity.
Feature three: the interval between the registration time and the first posting time is long. This is also one of the typical features of abandoned-account-type abnormal accounts: the platform recycles accounts that have not posted for a long time and uses them to increase platform activity.
Feature four: the account may spend many months posting or replying to a large number of brand-related posts. This is a feature of marketing-account-type abnormal accounts.
Specifically, in this embodiment, the abnormal accounts and ordinary user accounts of each platform are identified in a data-driven manner, and the quantitative differences in their behavior and content features are captured by thresholds.
It will be appreciated that abnormal accounts are relatively few compared with ordinary users, that their behavior and content may show anomalies in the above features, and that an abnormal account shows an anomaly in at least one feature. The thresholds may accordingly be set to be updated dynamically once per year.
In one embodiment, the dynamic threshold may be determined by a threshold calculation process, the calculation process including:
calculating the quartiles of the number of abnormal behaviors of the user accounts, and recording the upper and lower quartiles as Q3 and Q1 respectively.
The upper and lower outlier boundaries are calculated, the upper outlier boundary being Q3 + 1.5(Q3 - Q1) and the lower outlier boundary being Q1 - 1.5(Q3 - Q1).
The upper outlier boundary is used as the threshold, and values greater than the upper outlier boundary are outliers.
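The calculation above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the quartile convention (`method="inclusive"` of Python's `statistics.quantiles`) is an assumption, since the text does not say how Q1 and Q3 are interpolated.

```python
import statistics
from typing import List, Sequence, Tuple


def outlier_bounds(counts: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) outlier boundaries from Q1 and Q3."""
    # Assumption: inclusive quartile interpolation; other conventions
    # give slightly different Q1/Q3 values on small samples.
    q1, _median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_outliers(counts: Sequence[float]) -> List[bool]:
    """Mark each count that is greater than the upper outlier boundary."""
    _lower, upper = outlier_bounds(counts)
    return [c > upper for c in counts]


# Illustrative, made-up counts of abnormal behaviors per account.
example_counts = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
example_flags = flag_outliers(example_counts)
```

With these made-up counts, Q1 is 2.25 and Q3 is 4.0, so the upper boundary is 6.625 and only the account with 40 abnormal behaviors is flagged.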
In another embodiment, after the upper and lower outlier boundaries are determined, a box plot may be drawn and the threshold set at the upper edge of the box plot; values that lie above the upper edge in the plot are outliers.
For example, to judge whether an account has the feature "no posts for a long period, then active again", a box plot is drawn in the above manner.
In particular, it was found that most users on the platform do not become active again after 122 days of inactivity. The threshold is therefore set at 122 days, and an account satisfies the feature "no posts for a long period, then active again" if it has been inactive for more than 122 days before resuming activity.
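A check for this feature might look like the sketch below. It assumes that "inactive" means the gap in days between two consecutive posts; the patent does not define how inactive days are counted, and the function names are hypothetical. The value 122 is the one reported in the text.

```python
from datetime import date
from typing import Sequence

# Threshold reported in the text for one platform (days of inactivity).
INACTIVITY_THRESHOLD_DAYS = 122


def longest_gap_days(post_dates: Sequence[date]) -> int:
    """Longest number of days between two consecutive posts."""
    ordered = sorted(post_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    # An account with fewer than two posts has no gap to measure.
    return max(gaps, default=0)


def resumed_after_long_silence(
    post_dates: Sequence[date],
    threshold_days: int = INACTIVITY_THRESHOLD_DAYS,
) -> bool:
    """True if the account posted again after a gap above the threshold."""
    return longest_gap_days(post_dates) > threshold_days
```

Because the threshold is a parameter, the same function can be reused when the value is recomputed for another platform or another period.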
In another embodiment, for a social platform, the behavior and content features of abnormal accounts include:
Feature one: the account follows many accounts but has only a few fans. This is an important feature of fake-fan-type abnormal accounts, which typically exist to inflate the fan counts of those who buy fans.
Feature two: the same content is replied to or forwarded many times, and the reply or forwarding text is not identical each time. This is a feature of marketing-account-type abnormal accounts; the repeated replies or forwards are meant to make the content look lively.
Feature three: the same content is replied to or forwarded many times, and the reply or forwarding text is identical each time. This is a feature of robot-account-type abnormal accounts; the repeated replies or forwards are meant to make the content look lively.
Feature four: the same piece of content is published many times. This is a feature of robot-account-type abnormal accounts; the content is repeated so that the information spreads more widely.
Feature five: the account name is similar to the account name of an identified abnormal account. This is a feature of robot-account-type abnormal accounts, since many accounts with similar names are usually registered automatically by a robot program.
Specifically, in this embodiment, the abnormal accounts and ordinary user accounts of each platform are identified in a data-driven manner, and the quantitative differences in their behavior and content features are captured by thresholds.
The abnormal features of a user account are judged according to the following conditions:
the ratio of the account's following count to its fan count (following count / fan count) is greater than threshold 1, and the following count is greater than threshold 2;
the number of abnormal behaviors within half a year is greater than threshold 3, wherein the abnormal behaviors comprise:
the number of replies to a single piece of content is greater than threshold 4;
under a single piece of content, a reply is repeated more than threshold 4 times and its length is greater than threshold 5;
the number of times the same content is forwarded is greater than threshold 4;
published content is repeated more than threshold 6 times and its length is greater than threshold 5.
Threshold 1 is 9.6, threshold 2 is 110, threshold 3 is 4, threshold 4 is 4, threshold 5 is 10, and threshold 6 is 7.
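The conditions above could be coded roughly as follows. This is a sketch under stated assumptions: the first condition is read as a following-count to fan-count ratio compared with threshold 1 (a reading suggested by the non-integer value 9.6), an account with zero fans is treated as having an infinite ratio, and all names (`Thresholds`, the `kind` labels, the function names) are hypothetical. How the individual conditions are combined into a final verdict is not spelled out in the text, so they are kept as separate checks.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Thresholds:
    follow_fan_ratio: float = 9.6  # threshold 1
    min_following: int = 110       # threshold 2
    abnormal_behaviors: int = 4    # threshold 3
    repeat_count: int = 4          # threshold 4
    min_length: int = 10           # threshold 5
    publish_repeat_count: int = 7  # threshold 6


def looks_like_fake_fan(following: int, fans: int, t: Thresholds = Thresholds()) -> bool:
    # Assumption: zero fans counts as an infinitely large ratio.
    ratio = following / fans if fans > 0 else float("inf")
    return ratio > t.follow_fan_ratio and following > t.min_following


def is_abnormal_behavior(kind: str, count: int, length: int = 0,
                         t: Thresholds = Thresholds()) -> bool:
    if kind in ("replies_to_one_item", "forwards_of_one_item"):
        return count > t.repeat_count
    if kind == "repeated_reply_text":
        return count > t.repeat_count and length > t.min_length
    if kind == "repeated_published_text":
        return count > t.publish_repeat_count and length > t.min_length
    raise ValueError(f"unknown behavior kind: {kind}")


def exceeds_behavior_limit(events: Iterable[Tuple[str, int, int]],
                           t: Thresholds = Thresholds()) -> bool:
    """events: (kind, count, length) records from the last half year."""
    abnormal = sum(1 for kind, count, length in events
                   if is_abnormal_behavior(kind, count, length, t))
    return abnormal > t.abnormal_behaviors
```

Grouping the six values in one `Thresholds` object makes it straightforward to swap in recomputed values when the thresholds are updated.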
Further, the judgment of an abnormal user account also comprises:
calculating the edit distance between the user name of the user account and the user name of an identified abnormal account, the edit distance being Editdistance(S1, S2), where S1 is the user name string and S2 is the abnormal account's user name string. An edit is defined as one of: adding a character, modifying a character, or deleting a character.
And when the edit distance is larger than 2, judging that the user account is an abnormal account.
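The edit distance with these three operations is the standard Levenshtein distance, which can be computed with dynamic programming as in the sketch below. The function name `edit_distance` is illustrative (the text only names `Editdistance (S1, S2)`), and the comparison of the result with the value 2 is left to the caller.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Minimum number of single-character additions, modifications
    or deletions needed to turn s1 into s2 (Levenshtein distance)."""
    # previous[j] holds the distance between the processed prefix of s1
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # add c2
                previous[j - 1] + cost,  # modify c1 into c2 (or keep it)
            ))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory use grows with the length of one user name rather than with the product of both lengths.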
S300, account identification: establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of each user and identify abnormal user accounts.
Specifically, the evaluation mechanism is shown in fig. 4: an account that satisfies the conditions shown there is an abnormal account, and all other accounts are real consumer accounts.
The figure contains several blue decision boxes (blue diamonds), which denote thresholds that are the same for every platform and every time period. It also contains four yellow decision boxes (yellow diamonds), which denote thresholds that differ between platforms or change over time.
S400, verification: sampling 1000 users from the platform, comparing the manual labels with the prediction results, and calculating the accuracy of the true and false consumer account identification algorithm.
In one embodiment, the main indicator used to measure the technical effect is accuracy, which is calculated as follows:
accuracy = number of correctly identified accounts in the sample / number of accounts in the sample.
For each platform type, 1000 account samples were randomly drawn and manually labeled, and the accuracy was then computed: 95.3% for the industry vertical platform and 97.6% for the social platform. These figures are far higher than the average level reported in academic papers and on the market.
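The accuracy formula can be written out as the short sketch below. The function name and the boolean label encoding (`True` for an abnormal account) are assumptions for illustration, and the labels in the usage example are made up rather than taken from the patent's samples.

```python
from typing import Sequence


def accuracy(manual_labels: Sequence[bool], predicted_labels: Sequence[bool]) -> float:
    """Share of sampled accounts whose prediction matches the manual label."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not manual_labels:
        raise ValueError("at least one sampled account is required")
    correct = sum(m == p for m, p in zip(manual_labels, predicted_labels))
    return correct / len(manual_labels)


# Made-up labels for four accounts: three predictions match the manual labels.
example_accuracy = accuracy([True, False, False, True], [True, False, True, True])
```

In this toy example three of the four predictions agree with the manual labels, giving an accuracy of 0.75.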
Fig. 2 is a block diagram of an Internet true and false consumer distinguishing system according to an embodiment of the present invention. As shown in fig. 2, the system of the present embodiment includes:
a data acquisition unit 100 for randomly selecting sample users and extracting the users' behavior data and published content data.
A threshold unit 200 for establishing a dynamic threshold for judging feature anomalies, wherein the dynamic threshold expresses the quantitative difference between abnormal and normal user behavior.
An account identification unit 300 for establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of each user and identify abnormal user accounts.
A verification unit 400 for sampling 1000 users from the platform, comparing the manual labels with the prediction results, and calculating the accuracy of the true and false consumer account identification algorithm.
In another aspect, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the above method.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the steps of the above method.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 3, an electronic device of one embodiment of the invention includes one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and memory 4000.
In one embodiment of the invention, the processor 1000, the input device 2000, the output device 3000, and the memory 4000 may be connected by a bus or other means. The input device 2000, the output device 3000 may be a standard wired or wireless communication interface.
The processor 1000 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
Memory 4000 may be a high speed RAM memory or a non-volatile memory such as a disk memory. The memory 4000 is used to store a set of computer programs, and the input device 2000, the output device 3000, and the processor 1000 may call the program codes stored in the memory 4000.
The memory 4000 stores a computer program comprising program instructions that, when executed by the processor, cause the processor to perform the steps of the Internet true and false consumer distinguishing method described in the above embodiments.
An embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory such as a disk memory. An external computing device may connect to the computer-readable storage medium, directly or through a network, to read the set of computer programs stored in it. The computer program stored in the computer-readable storage medium comprises program instructions which, when executed by a processor, cause the processor to perform the steps of the method described in the above embodiments.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An Internet true and false consumer distinguishing method, characterized by comprising the following steps:
data acquisition: randomly selecting sample users, and extracting the users' behavior data and published content data;
threshold determination: establishing a dynamic threshold for judging feature anomalies, wherein the dynamic threshold expresses the quantitative difference between abnormal and normal user behavior;
account identification: establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of each user and identify abnormal user accounts.
2. The Internet true and false consumer distinguishing method as claimed in claim 1, wherein the behavior data and published content data extracted in the data acquisition step comprise:
user information, including the user name, the user account, the following count (the number of accounts the user follows) and the fan count;
actively published content, including the text and the publishing time;
forwarded and commented content, including the content being forwarded or commented on, the text written when forwarding or commenting, and the time of forwarding or commenting.
3. The Internet true and false consumer distinguishing method as claimed in claim 1, wherein the threshold determination step comprises:
calculating the quartiles of the number of abnormal behaviors of the user accounts, and recording the upper and lower quartiles as Q3 and Q1 respectively;
calculating the upper and lower outlier boundaries, the upper outlier boundary being Q3 + 1.5(Q3 - Q1) and the lower outlier boundary being Q1 - 1.5(Q3 - Q1);
using the upper outlier boundary as the threshold, values greater than the upper outlier boundary being outliers.
4. The Internet true and false consumer distinguishing method as claimed in claim 3, further comprising:
calculating the number of active days and the number of inactive days of the user account;
drawing a box plot bounded by the number of active days and the number of inactive days of the user account, and setting a threshold according to the box plot.
5. The internet consumer authentication method according to claim 4, further comprising determining the abnormal characteristics of the user account according to the following steps:
judging whether the ratio of the following count to the follower count of the user account is greater than threshold 1 and the following count is greater than threshold 2;
judging whether the number of abnormal behaviors within half a year is greater than threshold 3, the abnormal behaviors comprising:
the number of replies under a single piece of content being greater than threshold 4;
under a single piece of content, replies whose repetition count is greater than threshold 4 and whose length is greater than threshold 5;
the number of times the same content is forwarded being greater than threshold 4;
published content whose repetition count is greater than threshold 6 and whose length is greater than threshold 5;
wherein threshold 1 is 9.6, threshold 2 is 110, threshold 3 is 4, threshold 4 is 4, threshold 5 is 10, and threshold 6 is 7.
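The checks of claim 5 can be illustrated with the sketch below. Several points are assumptions rather than statements of the claim: threshold 1 is read as a bound on the following-to-follower ratio (its value 9.6 is not an integer), the two checks are returned separately because the claim does not say how they are combined, and all class, field and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    follow_ratio: float = 9.6  # threshold 1 (assumed to bound following/followers)
    following: int = 110       # threshold 2
    abnormal_count: int = 4    # threshold 3
    repeat: int = 4            # threshold 4
    length: int = 10           # threshold 5
    post_repeat: int = 7       # threshold 6

def profile_is_suspicious(following, followers, t=Thresholds()):
    """Follow-ratio check; an account with no followers has an infinite ratio."""
    ratio = following / followers if followers else float("inf")
    return ratio > t.follow_ratio and following > t.following

def count_repeated_posts(posts, t=Thresholds()):
    """Count distinct published texts repeated more than threshold 6 times
    whose length exceeds threshold 5."""
    counts = Counter(posts)
    return sum(
        1 for text, n in counts.items()
        if n > t.post_repeat and len(text) > t.length
    )

def behavior_is_suspicious(abnormal_count_half_year, t=Thresholds()):
    """Check the half-year abnormal-behavior count against threshold 3."""
    return abnormal_count_half_year > t.abnormal_count
```

For example, an account following 1000 users with 100 followers has a ratio of 10, which exceeds 9.6, and a following count above 110, so the profile check returns True.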
6. The internet consumer authentication method according to claim 5, wherein the determination of abnormal user accounts further comprises:
calculating the edit distance between the username of the user account and the username of an identified abnormal account, wherein the edit distance is EditDistance(S1, S2), S1 is the username string of the user account, and S2 is the username string of the abnormal account;
and when the edit distance is larger than 2, judging that the user account is an abnormal account.
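The claim does not name a specific edit-distance algorithm. The sketch below assumes the common Levenshtein distance (insertions, deletions and substitutions each cost 1) and only computes EditDistance(S1, S2); the comparison against 2 stated in the claim is left to the caller. The function name and example usernames are hypothetical.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance between s1 and s2, computed row by row."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

# Hypothetical usernames: S1 is the account under test, S2 a known abnormal one.
d = edit_distance("user_001", "user_002")
```

Keeping only two rows of the table holds memory at O(len(S2)), which matters when one username is compared against a large list of identified abnormal accounts.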
7. An internet consumer authentication system, comprising:
a data acquisition unit, used for randomly selecting sample users and extracting the behavior data and published content data of the users;
a threshold determination unit, used for establishing a dynamic threshold for judging feature abnormality, the dynamic threshold expressing the quantitative difference between the abnormal behavior and the normal behavior of a user;
and an account identification unit, used for establishing an evaluation mechanism based on the dynamic threshold to dynamically evaluate the state of the user and identify abnormal user accounts.
8. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the steps of the method according to any one of claims 1 to 6.
CN202110171747.3A 2021-02-08 2021-02-08 Method, system and device for distinguishing true and false consumers of internet Pending CN112905662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110171747.3A CN112905662A (en) 2021-02-08 2021-02-08 Method, system and device for distinguishing true and false consumers of internet

Publications (1)

Publication Number Publication Date
CN112905662A (en) 2021-06-04

Family

ID=76122792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110171747.3A Pending CN112905662A (en) 2021-02-08 2021-02-08 Method, system and device for distinguishing true and false consumers of internet

Country Status (1)

Country Link
CN (1) CN112905662A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886518A (en) * 2015-12-15 2017-06-23 国家计算机网络与信息安全管理中心 A kind of method of microblog account classification
CN110728543A (en) * 2019-10-15 2020-01-24 秒针信息技术有限公司 Abnormal account identification method and device
CN110990242A (en) * 2019-11-29 2020-04-10 上海观安信息技术股份有限公司 Method and device for determining fluctuation abnormity of user operation times
US20200242161A1 (en) * 2019-01-29 2020-07-30 International Business Machines Corporation Crowdsourced prevention or reduction of dissemination of selected content in a social media platform
CN111901171A (en) * 2020-07-29 2020-11-06 腾讯科技(深圳)有限公司 Anomaly detection and attribution method, device, equipment and computer readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574362A (en) * 2024-01-15 2024-02-20 广东茉莉数字科技集团股份有限公司 Method and system for resolving abnormal data of dactylogyrus account
CN117574362B (en) * 2024-01-15 2024-04-30 广东茉莉数字科技集团股份有限公司 Method and system for resolving abnormal data of dactylogyrus account

Similar Documents

Publication Publication Date Title
Yang et al. Botometer 101: Social bot practicum for computational social scientists
CN110009174B (en) Risk recognition model training method and device and server
Moreno-Marcos et al. Sentiment analysis in MOOCs: A case study
TW201443811A (en) Social media impact assessment (1)
CN109360089A (en) Credit risk prediction technique and device
US20220210268A1 (en) Tool for annotating and reviewing audio conversations
US20220156460A1 (en) Tool for categorizing and extracting data from audio conversations
Lee et al. Detecting fake reviews with supervised machine learning algorithms
CN111324739A (en) Text emotion analysis method and system
US10706371B2 (en) Data processing techniques
Aralikatte et al. Fault in your stars: an analysis of android app reviews
CN105183743A (en) Prediction method of MicroBlog public sentiment propagation range
Antretter et al. Predicting startup survival from digital traces: Towards a procedure for early stage investors
CN115577172A (en) Article recommendation method, device, equipment and medium
CN112016855A (en) User industry identification method and device based on relational network matching and electronic equipment
Nielek et al. Spiral of hatred: social effects in internet auctions. between informativity and emotion
Khan et al. Possible effects of emoticon and emoji on sentiment analysis web services of work organisations
CN112905662A (en) Method, system and device for distinguishing true and false consumers of internet
CN112580350A (en) Appeal analysis method and device, electronic equipment and storage medium
Keerthana et al. Accurate prediction of fake job offers using machine learning
JP2020135434A (en) Enterprise information processing device, enterprise event prediction method and prediction program
CN109242690A (en) Finance product recommended method, device, computer equipment and readable storage medium storing program for executing
WO2021129368A1 (en) Method and apparatus for determining client type
CN112860892A (en) Data labeling method, device and equipment in AI model
CN113704599A (en) Marketing conversion user prediction method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination