WO2022068493A1 - Abnormal user review method, apparatus, electronic device and storage medium - Google Patents
Abnormal user review method, apparatus, electronic device and storage medium
- Publication number
- WO2022068493A1 (PCT/CN2021/115230)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- users
- user
- abnormal
- reviewed
- user set
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/1396—Protocols specially adapted for monitoring users' activity
Definitions
- The present application relates to the technical field of content review, for example, to a method, apparatus, electronic device and storage medium for reviewing abnormal users.
- When users are reviewed, abnormal users may be pedophile users or other users engaged in illegal behavior.
- Pedophile users are the focus of the review: they must be detected among a large number of users and penalized.
- Video data suffers from inaccurate information. For example, it may be impossible to determine whether a video is prohibited for minors or whether it is pornographic.
- Pedophile users also exhibit adversarial behavior. For example, they follow and like one another and use these interactions to tip each other off and avoid detection, or they use altered forms of pornographic terms that only pedophile users understand, or even coin vocabulary exclusive to pedophile users. As a result, effective features related to pedophilia cannot be extracted to detect these users.
- In the related art, a user is classified by inputting that user's features to obtain that user's classification.
- The user's behavior and classification result do not affect the classification results of other users, nor are they affected by the behavior and classification results of other users.
- Such a classification algorithm therefore cannot dynamically determine whether a user is a pedophile user by taking into account the interactions between pedophile users.
- The present application provides an abnormal user review method, apparatus, electronic device and storage medium to solve the problem in the related art that pedophile users cannot be reviewed effectively and accurately, because the information is inaccurate, effective features cannot be extracted, and classification algorithms classify each user in isolation and so cannot make use of the interactions between pedophile users.
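- This application provides an abnormal user review method, including:
- acquiring historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes behavior data formed by interactions between the users to be reviewed;
- extracting, for each user to be reviewed, a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data;
- calculating the probability that each user to be reviewed is an abnormal user according to the preset probabilities of the events associated with the plurality of preset valid features;
- establishing a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users;
- solving for the maximum value of the total probability function with a preset condition as a constraint to determine candidate users; and
- reviewing the candidate users to obtain abnormal users.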
- the present application provides an abnormal user review device, including:
- a historical behavior data acquisition module configured to acquire historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes behavior data formed by interactions between a plurality of users to be reviewed;
- a feature extraction module configured to extract a plurality of preset valid features from the historical behavior data for each user to be reviewed, wherein the preset valid features are features in preset sample data;
- a user probability calculation module configured to calculate the probability that each user to be reviewed is an abnormal user according to the preset probability of the events associated with the plurality of preset valid features
- a total probability function establishment module configured to establish a total probability function using the probability that the plurality of users to be reviewed are abnormal users
- a total probability function solving module configured to use preset conditions as constraints to solve the maximum value of the total probability function to determine candidate users
- the auditing module is configured to audit the candidate users to obtain abnormal users.
- the application provides an electronic device, the electronic device includes:
- one or more processors;
- a storage apparatus configured to store one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned abnormal user review method.
- the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the above-mentioned abnormal user auditing method is implemented.
- Fig. 1 is a flow chart of steps of a method for reviewing abnormal users provided in Embodiment 1 of the present application;
- FIG. 2 is a flowchart of the steps of a method for reviewing abnormal users provided in Embodiment 2 of the present application;
- FIG. 3 is a structural block diagram of an abnormal user review device provided in Embodiment 3 of the present application.
- FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
- FIG. 1 is a flowchart of steps of a method for reviewing abnormal users provided in Embodiment 1 of the present application.
- The embodiment of the present application can be applied to reviewing abnormal users in order to detect pedophile users or other users engaged in illegal behavior.
- The method can be executed by the abnormal user review apparatus of the embodiment of the present application; the apparatus may be implemented in hardware or software and integrated in the electronic device provided by the embodiment of the present application. As shown in FIG. 1, the abnormal user review method can include the following steps:
- the abnormal user may refer to a pedophile user, and may also include users with other illegal behaviors.
- the embodiment of the present application uses a pedophile user as an example to describe the method for reviewing abnormal users.
- the users to be reviewed can be users on short video and live broadcast platforms, and the users to be reviewed can be some of the specified users or all users.
- Historical behavior data can include data formed by the behavior of the users to be reviewed on short video and live broadcast platforms, such as records of videos watched, videos liked, comments posted, and other users followed. It can also include data formed by other users' behavior toward the user to be reviewed, such as other users following the user to be reviewed, or liking, watching, and commenting on videos posted by the user to be reviewed.
- The historical behavior data may also include information about the user to be reviewed, such as a user identifier (UID) and gender.
- Event tracking points (also called buried points) can be set up on the short video and live broadcast platforms, and the historical behavior data of the users to be reviewed can be collected through them. The data can also be obtained in other ways, such as from user logs. The embodiments of this application do not limit how historical behavior data is obtained.
- To obtain the sample data, a specified number of sample users may be determined first; these are users labeled as normal users or abnormal users. The historical behavior data of the sample users is acquired, behavioral features are extracted from it, and the validity of each behavioral feature is determined. When a behavioral feature is valid, it is taken as a valid feature and the event associated with it is determined. A statistical algorithm then calculates the probability that a user is an abnormal user when the event associated with the valid feature occurs, and this probability is taken as the preset probability. The result is sample data that includes multiple valid features and the preset probabilities of the events associated with them.
- the sample data can be updated by using the reviewed user as a sample user.
- For example, the valid features may include prohibited words related to pedophilia contained in the comments of the user to be reviewed, as well as the videos that the user to be reviewed has liked, commented on, and viewed.
- The corresponding preset valid features can be extracted from the behavior data of the users to be reviewed, such as their comments, views, likes, and posts.
- The sample data includes multiple preset valid features and the preset probabilities of the events associated with them. For each user to be reviewed, after the preset valid features are extracted, the probability that the user is an abnormal user is obtained by summing the preset probabilities of the events associated with those features.
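- The summation described above can be sketched as follows. This is a minimal illustration only: the function name `user_abnormal_probability`, the feature names, and the probability values are hypothetical and do not come from the application.

```python
from typing import Dict, Iterable


def user_abnormal_probability(features: Iterable[str],
                              preset_prob: Dict[str, float]) -> float:
    """Sum the preset probabilities of the events associated with the
    preset valid features extracted for one user to be reviewed.

    Features that are not in the sample data are ignored.
    """
    return sum(preset_prob[f] for f in set(features) if f in preset_prob)


# Hypothetical sample data: valid feature -> preset probability.
preset_prob = {
    "prohibited_word_in_comment": 0.5,
    "follows_flagged_account": 0.25,
}

# Hypothetical features extracted from one user's historical behavior data.
score = user_abnormal_probability(
    ["prohibited_word_in_comment", "follows_flagged_account", "not_in_sample"],
    preset_prob,
)
```

- Because the text specifies a plain sum, the result behaves like a score: it can exceed 1 when many features are present, so any normalization would be a separate design decision.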
- The total probability function is the sum of the probabilities that the users to be reviewed are abnormal users. When the classification of a user to be reviewed changes, the function value of the total probability function changes.
- The proportion of abnormal users among the specified number of sample users can be calculated, and the number of abnormal users among all users to be reviewed can be estimated by multiplying this proportion by the total number of users to be reviewed.
- A recall rate can also be set, and the product of the recall rate and the number of abnormal users among all users to be reviewed calculated. The constraint can then be set as follows: when the function value of the total probability function is at its maximum, the number of candidate users is greater than this product and less than the number of abnormal users among all users to be reviewed.
- When solving, the users to be reviewed can first be divided into an abnormal user set and a normal user set; the users in the abnormal user set are assumed to be abnormal users and those in the normal user set are assumed to be normal users, and the first function value of the total probability function is calculated from this initial division.
- The normal user set is then traversed. The first user to be reviewed currently traversed in the normal user set is treated as an abnormal user, and the abnormal user set is traversed in turn, treating each second user to be reviewed in it as a normal user. After each second user is traversed, the function value of the total probability function is calculated from the current abnormal user set and normal user set, which yields multiple second function values, and the minimum second function value is determined among them.
- When the minimum second function value is greater than the first function value, the second user to be reviewed corresponding to the minimum second function value is identified in the abnormal user set, determined to be a normal user, and moved to the normal user set; the first user to be reviewed currently traversed is determined to be an abnormal user and moved from the normal user set to the abnormal user set; and the process returns to traversing the normal user set.
- The user identifiers of the candidate users can be sent to the review back end, where the candidate users are manually reviewed to obtain the abnormal users.
- The historical behavior data of the users to be reviewed includes data formed by interactions between them. The probability that each user to be reviewed is an abnormal user is calculated from the valid features extracted from the historical behavior data, the probabilities of all users to be reviewed are used to construct a total probability function, the maximum value of the total probability function is solved under preset constraints to obtain candidate users, and the candidate users are reviewed to obtain abnormal users.
- FIG. 2 is a flowchart of the steps of an abnormal user review method provided in Embodiment 2 of the present application. This embodiment is described on the basis of Embodiment 1. As shown in FIG. 2, the abnormal user review method in this embodiment may include the following steps:
- sample data includes valid features and a preset probability that a user is an abnormal user when an event associated with the valid feature occurs.
- Sample data can be preset through statistics. For example, a specified number (such as 1000) of sample users labeled as abnormal users or normal users may be selected; multiple valid features are then extracted from the historical behavior data of the sample users, and, for the event associated with each valid feature, the probability that a sample user is an abnormal user when the event occurs is calculated and taken as the preset probability.
- To determine whether a behavioral feature is valid, count, among the sample users that have the behavioral feature, the first number that are abnormal users and the second number that are normal users; obtain the third number of abnormal users and the fourth number of normal users among the specified number of sample users; calculate the first ratio of the first number to the third number and the second ratio of the second number to the fourth number; and calculate the absolute value of the difference between the first ratio and the second ratio. The behavioral feature is determined to be a valid feature when the absolute value is greater than a preset threshold.
- For example, suppose abnormal users are reviewed in order to identify pedophile users, and N behavioral features are extracted from the historical behavior data of all sample users. Let behavioral feature B be a prohibited word contained in a user's comments. Among the specified number of sample users, let P_1 be the proportion of pedophile users that have behavioral feature B among all pedophile users, and P_2 the proportion of normal users that have behavioral feature B among all normal users. If |P_1 - P_2| is greater than the preset threshold, behavioral feature B is a valid feature.
- the probability of being an abnormal user is P(A 0
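- The validity test on a behavioral feature can be sketched as below. The ratio comparison follows the description above; the `preset_probability` estimator (the share of sample users having the feature who are labeled abnormal) is an assumption, since the text only says that a statistical algorithm is used. All names and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

# One sample user: (set of behavioral features, True if labeled abnormal).
Sample = Tuple[Set[str], bool]


def feature_validity(samples: List[Sample], feature: str,
                     threshold: float) -> Tuple[bool, float]:
    """Return (is_valid, |P_1 - P_2|) for one behavioral feature."""
    first = sum(1 for feats, abnormal in samples if abnormal and feature in feats)
    second = sum(1 for feats, abnormal in samples if not abnormal and feature in feats)
    third = sum(1 for _, abnormal in samples if abnormal)   # all abnormal users
    fourth = len(samples) - third                           # all normal users
    if third == 0 or fourth == 0:
        raise ValueError("samples must contain both abnormal and normal users")
    p1 = first / third    # first ratio
    p2 = second / fourth  # second ratio
    diff = abs(p1 - p2)
    return diff > threshold, diff


def preset_probability(samples: List[Sample], feature: str) -> float:
    """Assumed estimator: fraction of sample users having the feature
    who are labeled abnormal."""
    labels = [abnormal for feats, abnormal in samples if feature in feats]
    if not labels:
        raise ValueError("no sample user has this feature")
    return sum(labels) / len(labels)


# Toy data: 4 abnormal users (3 with feature "B"), 4 normal users (1 with "B").
samples: List[Sample] = [
    ({"B"}, True), ({"B"}, True), ({"B"}, True), (set(), True),
    ({"B"}, False), (set(), False), (set(), False), (set(), False),
]
is_valid, diff = feature_validity(samples, "B", threshold=0.25)
prob_b = preset_probability(samples, "B")
```

- In this toy data P_1 is 3/4 and P_2 is 1/4, so the difference of 0.5 exceeds the hypothetical threshold of 0.25 and feature B would be kept.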
- By extracting behavioral features from the historical behavior data of sample users and determining which are valid, the embodiment of the present application can set valid features specifically for pedophilia or other illegal behavior in order to review abnormal users. This addresses the problem in the related art that pedophile users behave adversarially, using altered forms of pornographic words that only they understand, or even coining their own vocabulary to avoid detection, so that effective pedophilia-related features cannot be extracted. Because valid features can be set for pedophile users, the accuracy of detecting them improves.
- S202. Acquire historical behavior data of multiple users to be reviewed from the event tracking points (buried points) of multiple platforms.
- Historical behavior data can include at least one of: user gender, videos liked by the user, user comments, accounts that the user follows, accounts that follow the user, videos shared by the user, the number of videos liked by the user, and the number of videos posted by the user. It may also include other behavior data.
- Once the sample data has been determined from the sample users, it includes the preset valid features used for reviewing abnormal users, and the corresponding preset valid features can be extracted from the historical behavior data of each user to be reviewed.
- The sample data includes the preset probability of the event associated with each preset valid feature. For each user to be reviewed, the preset probabilities of the events associated with that user's preset valid features can be determined and summed to obtain the probability that the user is an abnormal user.
- The total probability function is obtained by summing the probabilities that the users to be reviewed are abnormal users. When one of the users to be reviewed changes from an abnormal user to a normal user, or from a normal user to an abnormal user, the probability that the user is an abnormal user changes, and so does the function value of the total probability function.
- S206 Initialize the abnormal user set and the normal user set to divide the multiple users to be reviewed into the abnormal user set and the normal user set.
- In one example, all users to be reviewed may first be regarded as normal users; the set of users to be reviewed for which the total probability function takes its maximum value is taken as the abnormal user set, and the set of remaining users to be reviewed is taken as the normal user set.
- For example, let the total number of users to be reviewed be num(Ω). The proportion s of abnormal users can be estimated from the specified number of sample users, and the product of s and num(Ω) gives the number of abnormal users among the users to be reviewed, num(Ω_1). Assuming that all users to be reviewed are normal users, the probability that each user to be reviewed is an abnormal user is calculated; the num(Ω_1) users to be reviewed for which the total probability function takes its maximum value form the abnormal user set P, and the remaining users to be reviewed form the normal user set Q.
- In another example, the historical behavior data of each user to be reviewed can be input into a pre-trained classification model to obtain a classification result, namely the probability that the user is an abnormal user or a normal user, and the user is assigned to the abnormal user set or the normal user set according to that probability. For instance, a regression neural network, deep neural network, recurrent neural network or other network can be trained to perform a preliminary classification of the users to be reviewed and obtain the classification probability that each is an abnormal or a normal user; the top num(Ω_1) users ranked by classification probability of being an abnormal user are taken as the abnormal user set, and the remaining users are assigned to the normal user set.
- The abnormal user set and the normal user set can also be initialized in other ways.
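- The classifier-based initialization can be sketched as a top-k split. This is a minimal illustration that assumes the per-user probabilities are already available; the function name `initialize_sets`, the rounding down of s × num(Ω), and the tie-break by user identifier are assumptions not stated in the text.

```python
import math
from typing import Dict, Set, Tuple


def initialize_sets(classifier_prob: Dict[str, float],
                    s: float) -> Tuple[Set[str], Set[str]]:
    """Split users into an initial abnormal set P and normal set Q.

    classifier_prob maps a user identifier to the probability that the
    user is abnormal; s is the preset proportion of abnormal users.
    """
    k = math.floor(s * len(classifier_prob))  # assumed rounding of num(Omega_1)
    # Rank by descending probability; ties broken by user id for determinism.
    ranked = sorted(classifier_prob, key=lambda u: (-classifier_prob[u], u))
    return set(ranked[:k]), set(ranked[k:])


# Hypothetical classifier outputs for four users to be reviewed.
probs = {"u1": 0.9, "u2": 0.1, "u3": 0.6, "u4": 0.3}
abnormal_set, normal_set = initialize_sets(probs, s=0.5)
```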
- The users to be reviewed in the abnormal user set are regarded as abnormal users and those in the normal user set as normal users, and the first function value of the total probability function is calculated. Because the total probability function is the sum of the probabilities of all users to be reviewed, when the probability of each user to be reviewed changes, the value of the total probability function changes as well; this yields the first function value S_0.
- The normal user set can then be traversed, treating the first user to be reviewed currently traversed in the normal user set as an abnormal user. After the first user to be reviewed is traversed, the abnormal user set is traversed, treating the second user to be reviewed currently traversed in the abnormal user set as a normal user. After each second user to be reviewed is traversed, the second function value S_1 of the total probability function is calculated from the current abnormal user set and the current normal user set, and the traversal of the abnormal user set is repeated for the next second user.
- The minimum second function value can be determined among the second function values obtained after traversing the abnormal user set. When the minimum second function value is greater than the first function value, the second user to be reviewed corresponding to the minimum second function value is identified in the abnormal user set, determined to be a normal user, and moved to the normal user set; the first user to be reviewed currently traversed is determined to be an abnormal user and moved from the normal user set to the abnormal user set; and the process returns to the step of traversing the normal user set.
- Here, all users to be reviewed Q_n in the normal user set Q are normal users, and all users to be reviewed P_m in the abnormal user set P are abnormal users. The traversal process of the abnormal user set and the normal user set is as follows:
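- The swap-based traversal described in the text can be sketched as below. This is an illustration under stated assumptions, not the applicant's implementation: the text does not say how the total probability depends on the current split, so it is passed in as a hypothetical callback `total_probability(abnormal, normal)`; restarting the traversal after each swap and updating S_0 to the accepted second function value are also assumptions. The toy scoring function at the end exists only to exercise the loop.

```python
from typing import Callable, Dict, Optional, Set, Tuple

TotalFn = Callable[[Set[str], Set[str]], float]


def swap_search(abnormal: Set[str], normal: Set[str],
                total_probability: TotalFn) -> Tuple[Set[str], Set[str], float]:
    """Traverse the normal set; for each 'first' user, try replacing every
    'second' user in the abnormal set, take the minimum second function
    value, and swap when that minimum exceeds the first function value."""
    abnormal, normal = set(abnormal), set(normal)
    s0 = total_probability(abnormal, normal)  # first function value S_0
    swapped = True
    while swapped:
        swapped = False
        for first in sorted(normal):
            best_second: Optional[str] = None
            min_s1: Optional[float] = None
            for second in sorted(abnormal):
                trial_abnormal = (abnormal - {second}) | {first}
                trial_normal = (normal - {first}) | {second}
                s1 = total_probability(trial_abnormal, trial_normal)  # S_1
                if min_s1 is None or s1 < min_s1:
                    best_second, min_s1 = second, s1
            if best_second is not None and min_s1 is not None and min_s1 > s0:
                abnormal = (abnormal - {best_second}) | {first}
                normal = (normal - {first}) | {best_second}
                s0 = min_s1
                swapped = True
                break  # return to traversing the (updated) normal set
    return abnormal, normal, s0


# Toy, hypothetical total probability: sum of fixed scores over the abnormal set.
toy_score: Dict[str, float] = {"a": 0.25, "b": 0.5, "c": 0.875, "d": 0.125}


def toy_total(abnormal: Set[str], normal: Set[str]) -> float:
    return sum(toy_score[u] for u in abnormal)


final_abnormal, final_normal, final_value = swap_search({"a", "b"}, {"c", "d"}, toy_total)
```

- Each accepted swap strictly increases the function value and the number of possible splits is finite, so the loop terminates. Swaps keep the size of the abnormal set unchanged, so a size constraint that holds at initialization continues to hold.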
- The traversal terminates when the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed.
- The problem of solving the total probability function is as follows:
- $\max \sum_{x_i \in \Omega} P(x_i)$
- $\text{s.t.} \quad r \cdot \mathrm{num}(\Omega_1) < \mathrm{num}(V_1) < \mathrm{num}(\Omega_1)$
- $P(x_i)$ is the probability that the i-th user to be reviewed is an abnormal user
- $\Omega$ is the set of all users to be reviewed
- s.t. (subject to) introduces the condition to be satisfied
- $r$ is the preset recall rate
- $V_1$ is the set of abnormal users after the traversal is terminated
- $\mathrm{num}(V_1)$ is the number of abnormal users in the set of abnormal users
- $\mathrm{num}(\Omega_1)$ is the number of all abnormal users among all users to be reviewed
- $\mathrm{num}(\Omega_1) = s \times \mathrm{num}(\Omega)$
- $s$ is the preset proportion of abnormal users.
- The condition for stopping the traversal is: after the normal user set has been traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed.
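- The stopping condition is a pair of strict bounds on the size of the abnormal user set, which can be checked as in the sketch below. The function names and the example values of s, r, and the user count are hypothetical.

```python
from typing import Tuple


def candidate_count_bounds(total_users: int, s: float, r: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the number of candidate users.

    s is the preset proportion of abnormal users and r the preset recall
    rate; upper is the estimated number of abnormal users s * total_users,
    lower is r times that number.
    """
    num_abnormal = s * total_users
    return r * num_abnormal, num_abnormal


def satisfies_constraint(num_candidates: int, total_users: int,
                         s: float, r: float) -> bool:
    """True when lower < num_candidates < upper (both strict)."""
    lower, upper = candidate_count_bounds(total_users, s, r)
    return lower < num_candidates < upper


# Hypothetical values: 800 users to be reviewed, s = 0.125, r = 0.5.
bounds = candidate_count_bounds(800, s=0.125, r=0.5)
ok = satisfies_constraint(80, 800, s=0.125, r=0.5)
```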
- The users to be reviewed in the abnormal user set obtained after traversing the normal user set and satisfying the preset conditions are the candidate users, and they can be manually reviewed.
- The user identifiers of the candidate users may be sent to the review back end, where the candidate users are reviewed. If manual review determines that a candidate user is an abnormal user, such as a pedophile user, the candidate user is marked as an abnormal user; otherwise the candidate user is marked as a normal user.
- an abnormal user label may be marked for the abnormal user, and a normal user label may be marked for the users other than the abnormal user among the users to be reviewed.
- the users marked with abnormal user labels or normal user labels are used as sample users to update the sample data.
- In the embodiment of the present application, sample data is preset that includes valid features and, for each, a preset probability that a user is an abnormal user when the event associated with the valid feature occurs. For each user to be reviewed, multiple preset valid features are extracted from the historical behavior data, and the preset probabilities of the events associated with them are summed to obtain the probability that the user is an abnormal user. A total probability function is established from the probabilities that all users to be reviewed are abnormal users. After the abnormal user set and the normal user set are initialized, the first function value of the total probability function is calculated from them; the two sets are then traversed to recalculate the second function value of the total probability function, and the sets are updated according to the first and second function values. When the maximum function value of the total probability function is obtained after traversing the normal user set under the preset conditions, the users to be reviewed in the abnormal user set are determined to be candidate users, and the candidate users are reviewed to obtain abnormal users.
- Valid features are preset through the sample data, extracted from the historical behavior data, and converted into the probabilities of their associated events. Features that are effective for reviewing abnormal users can thus be obtained, and feature calculation is converted into probability calculation, which addresses the problem that inaccurate data affects the accuracy of reviewing abnormal users.
- Determining candidate users by solving the total probability function not only makes use of the interactions between the users to be reviewed but also lets their classification results affect one another, so that candidate users can be accurately determined for review.
- Valid features can be set for pedophilia or other illegal behavior in order to review abnormal users, which addresses the problem in the related art that pedophile users behave adversarially, using altered forms of pornographic text that only they understand or even coining their own vocabulary to avoid detection, so that effective features related to pedophilia cannot be extracted. Because valid features can be set for pedophile users, the accuracy of detecting them improves.
- After review, abnormal users are marked with the abnormal user label and normal users with the normal user label, and the marked users are used as sample users to update the valid features of the sample data and the preset probability that a user is an abnormal user. On the one hand, the sample data can be updated dynamically so that candidate users are solved for dynamically; on the other hand, this enlarges the data source of sample users and reduces the cost of acquiring them.
- FIG. 3 is a structural block diagram of an abnormal user verification device provided in Embodiment 3 of the present application.
- the abnormal user verification device in the embodiment of the present application may include the following modules:
- the historical behavior data acquisition module 301 is configured to acquire the historical behavior data of a plurality of users to be reviewed, the historical behavior data including behavior data formed by interactions between the users to be reviewed;
- the feature extraction module 302 is configured to extract, for each user to be reviewed, multiple preset valid features from the historical behavior data, the preset valid features being features in the preset sample data;
- the user probability calculation module 303 is configured to calculate the probability that the user to be reviewed is an abnormal user according to the preset probabilities of the events associated with the multiple preset valid features;
- the total probability function establishment module 304 is configured to use the probabilities that all users to be reviewed are abnormal users to establish a total probability function;
- the total probability function solving module 305 is configured to obtain the maximum value of the total probability function with the preset condition as a constraint to determine candidate users;
- the review module 306 is configured to review the candidate users to obtain abnormal users.
- the abnormal user verification apparatus provided in the embodiment of the present application can execute the abnormal user verification method provided in the first embodiment or the second embodiment of the present application, and has corresponding functional modules and effects of the execution method.
- FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
- the electronic device may include: a processor 401 , a memory 402 , a display screen 403 with a touch function, an input device 404 , an output device 405 and a communication device 406 .
- the number of processors 401 in the electronic device may be one or more, and one processor 401 is taken as an example in FIG. 4 .
- the processor 401 , the memory 402 , the display screen 403 , the input device 404 , the output device 405 and the communication device 406 of the electronic device can be connected through a bus or other means, and the connection through a bus is taken as an example in FIG. 4 .
- the electronic device is configured to execute the abnormal user review method provided by any embodiment of the present application.
- Embodiments of the present application further provide a computer-readable storage medium; when the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the abnormal user review method described in the above method embodiments.
- references to the terms “one embodiment,” “some embodiments,” “example,” or “some examples” mean that the features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application.
- schematic representations of the above terms do not necessarily refer to the same embodiment or example.
- the described features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
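The work of the feature extraction module 302 and the user probability calculation module 303 can be illustrated with a minimal Python sketch. It assumes the variant stated in the claims, in which a user's abnormal-user probability is the sum of the preset probabilities of the valid features found in that user's history. The function name `abnormal_score`, the feature names, and the probability values are hypothetical and do not come from the application; the sketch also makes no attempt to normalize the sum, since this section does not say whether or how that is done.

```python
from typing import Dict, Iterable


def abnormal_score(user_features: Iterable[str],
                   preset_probabilities: Dict[str, float]) -> float:
    """Sum the preset probabilities of the valid features a user exhibits.

    user_features: behavior features observed in one user's history.
    preset_probabilities: valid feature -> preset probability taken from the
        sample data (illustrative values only).
    Features that are not valid features contribute nothing.
    """
    observed = set(user_features)  # count each feature at most once
    return sum(p for feature, p in preset_probabilities.items()
               if feature in observed)


# Hypothetical sample data: two valid features and their preset probabilities.
preset = {"follows_flagged_account": 0.6, "uses_coded_vocabulary": 0.25}

# One score per user to be reviewed; these per-user values are what a total
# probability function would then be built from.
scores = {
    "user_1": abnormal_score(["follows_flagged_account", "likes_video"], preset),
    "user_2": abnormal_score(["likes_video"], preset),
}
```

Keeping the preset probabilities in a plain mapping means that refreshing the sample data after a review round only replaces the mapping; the scoring function itself does not change.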
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Algebra (AREA)
- Operations Research (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Business, Economics & Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Human Resources & Organizations (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (16)
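- An abnormal user review method, comprising: acquiring historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data comprises behavior data formed by interactions among the plurality of users to be reviewed; for each user to be reviewed, extracting a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data; calculating, according to preset probabilities of events associated with the plurality of valid features, a probability that each user to be reviewed is an abnormal user; establishing a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users; solving for a maximum value of the total probability function, with a preset condition as a constraint, to determine candidate users; and reviewing the candidate users to obtain abnormal users.
- The method according to claim 1, further comprising, before acquiring the historical behavior data of the plurality of users to be reviewed: acquiring sample data, wherein the sample data comprises valid features and preset probabilities that a user is an abnormal user when the event associated with a valid feature occurs.
- The method according to claim 2, wherein acquiring the sample data comprises: acquiring historical behavior data of a specified number of sample users, wherein the specified number of sample users are users labeled with abnormal user labels and normal user labels; extracting a plurality of valid features from the historical behavior data of the specified number of sample users; and taking, as the preset probability, the probability that a sample user is an abnormal user when the event associated with the valid feature occurs.
- The method according to claim 3, wherein extracting the plurality of valid features from the historical behavior data of the specified number of sample users comprises: extracting a plurality of behavior features from the historical behavior data of the specified number of sample users; for each behavior feature, acquiring a first user number of sample users having the behavior feature who are abnormal users, and acquiring a second user number of sample users having the behavior feature who are normal users; acquiring a third user number of abnormal users and a fourth user number of normal users among the specified number of sample users; calculating a first ratio of the first user number to the third user number, and a second ratio of the second user number to the fourth user number; calculating the absolute value of the difference between the first ratio and the second ratio; and determining that the behavior feature is a valid feature when the absolute value is greater than a preset threshold.
- The method according to claim 1, wherein acquiring the historical behavior data of the plurality of users to be reviewed comprises: acquiring the historical behavior data of the plurality of users to be reviewed from event-tracking points of a plurality of platforms, wherein the historical behavior data comprises at least one of: user gender, videos liked by the user, comments made by the user, accounts followed by the user, accounts that follow the user, videos shared by the user, the number of videos liked by the user, and the number of videos published by the user.
- The method according to claim 1, wherein calculating, according to the preset probabilities of the events associated with the plurality of valid features, the probability that each user to be reviewed is an abnormal user comprises: summing the preset probabilities of the events associated with the plurality of preset valid features to obtain the probability that each user to be reviewed is an abnormal user.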
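- The method according to any one of claims 1-6, wherein solving for the maximum value of the total probability function, with the preset condition as a constraint, to determine the candidate users comprises: initializing an abnormal user set and a normal user set so as to divide the plurality of users to be reviewed into the abnormal user set and the normal user set; calculating a first function value of the total probability function based on the abnormal user set and the normal user set; traversing the abnormal user set and the normal user set to recalculate a second function value of the total probability function; updating the abnormal user set and the normal user set according to the first function value and the second function value, and returning to the step of traversing the abnormal user set and the normal user set to recalculate the second function value of the total probability function, until, after the normal user set has been fully traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among the plurality of users to be reviewed and greater than the product of the total number and a recall rate, wherein the total number is the product of a preset proportion of abnormal users and the number of the plurality of users to be reviewed; and determining the users to be reviewed contained in the abnormal user set as the candidate users.
- The method according to claim 7, wherein initializing the abnormal user set and the normal user set so as to divide the plurality of users to be reviewed into the abnormal user set and the normal user set comprises: treating the plurality of users to be reviewed as normal users, and taking the set of users to be reviewed obtained when solving for the maximum value of the total probability function as the abnormal user set; and taking the set of users to be reviewed other than those in the abnormal user set as the normal user set.
- The method according to claim 7, wherein initializing the abnormal user set and the normal user set so as to divide the plurality of users to be reviewed into the abnormal user set and the normal user set comprises: inputting the historical behavior data of each user to be reviewed into a pre-trained classification model to obtain a classification result for each user to be reviewed, wherein the classification result indicates that the user to be reviewed is an abnormal user or a normal user; and dividing each user to be reviewed into the abnormal user set or the normal user set according to the classification result.
- The method according to claim 7, wherein traversing the abnormal user set and the normal user set to recalculate the second function value of the total probability function comprises: traversing the normal user set so as to treat a first user to be reviewed, currently traversed in the normal user set, as an abnormal user; after traversing to the first user to be reviewed in the normal user set, traversing the abnormal user set so as to treat a second user to be reviewed, currently traversed in the abnormal user set, as a normal user; after traversing to the second user to be reviewed in the abnormal user set, calculating the second function value of the total probability function based on the current abnormal user set and the current normal user set; repeating the operations of traversing the abnormal user set so as to treat the currently traversed second user to be reviewed as a normal user and, after traversing to the second user to be reviewed in the abnormal user set, calculating the second function value of the total probability function based on the current abnormal user set and the current normal user set, until the abnormal user set has been fully traversed, to obtain a plurality of second function values; and determining a minimum second function value from the plurality of second function values.
- The method according to claim 10, wherein updating the abnormal user set and the normal user set according to the first function value and the second function value comprises: when the minimum second function value is greater than the first function value, determining, in the abnormal user set, the second user to be reviewed corresponding to the minimum second function value; determining the second user to be reviewed corresponding to the minimum second function value as a normal user and moving that user to the normal user set, and determining the currently traversed first user to be reviewed as an abnormal user and moving that user from the normal user set to the abnormal user set; and returning to the operation of traversing the normal user set so as to treat the first user to be reviewed, currently traversed in the normal user set, as an abnormal user.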
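- The method according to any one of claims 1-6, wherein reviewing the candidate users to obtain abnormal users comprises: sending user identifiers of the candidate users to a review backend so that the review backend performs manual review of the candidate users; and receiving a review result sent by the review backend, wherein the review result comprises the abnormal users determined from among the candidate users.
- The method according to claim 12, further comprising, after receiving the review result sent by the review backend: labeling the abnormal users with an abnormal user label, and labeling the users to be reviewed other than the abnormal users with a normal user label; and updating the sample data using the users labeled with the abnormal user label or the normal user label as sample users.
- An abnormal user review apparatus, comprising: a historical behavior data acquisition module, configured to acquire historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data comprises behavior data formed by interactions among the plurality of users to be reviewed; a feature extraction module, configured to extract, for each user to be reviewed, a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data; a user probability calculation module, configured to calculate, according to preset probabilities of events associated with the plurality of preset valid features, a probability that each user to be reviewed is an abnormal user; a total probability function establishment module, configured to establish a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users; a total probability function solving module, configured to solve for a maximum value of the total probability function, with a preset condition as a constraint, to determine candidate users; and a review module, configured to review the candidate users to obtain abnormal users.
- An electronic device, comprising: at least one processor; and a storage apparatus configured to store at least one program, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the abnormal user review method according to any one of claims 1-13.
- A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the abnormal user review method according to any one of claims 1-13.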
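The feature-selection rule of claim 4 is plain counting arithmetic, so a short Python sketch may make it easier to follow. The four user numbers and the two ratios follow the claim wording. Two points are assumptions rather than statements from the application: the function name `select_valid_features` and the sample layout are hypothetical, and the preset probability of claim 3 is estimated here as the share of abnormal users among the sample users who have the feature.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], bool]  # (behavior features, is_abnormal label)


def select_valid_features(samples: List[Sample],
                          threshold: float) -> Dict[str, float]:
    """Return {valid feature: preset probability} for labeled sample users."""
    n_abnormal = sum(1 for _, abnormal in samples if abnormal)  # third user number
    n_normal = len(samples) - n_abnormal                        # fourth user number
    if n_abnormal == 0 or n_normal == 0:
        raise ValueError("need at least one abnormal and one normal sample user")

    abnormal_with: Counter = Counter()  # first user number, per feature
    normal_with: Counter = Counter()    # second user number, per feature
    for features, abnormal in samples:
        (abnormal_with if abnormal else normal_with).update(features)

    valid: Dict[str, float] = {}
    for feature in set(abnormal_with) | set(normal_with):
        first, second = abnormal_with[feature], normal_with[feature]
        first_ratio = first / n_abnormal
        second_ratio = second / n_normal
        # Claim 4: keep the feature if the two ratios differ by more than the threshold.
        if abs(first_ratio - second_ratio) > threshold:
            # Assumption: preset probability = P(abnormal | feature) in the sample.
            valid[feature] = first / (first + second)
    return valid


# Hypothetical labeled sample users.
samples = [
    ({"a", "b"}, True),
    ({"a"}, True),
    ({"b"}, False),
    ({"b", "c"}, False),
]
valid_features = select_valid_features(samples, threshold=0.5)
```

In this toy sample, feature `a` appears in every abnormal user and in no normal user, so its two ratios differ by 1.0 and it is kept; features `b` and `c` differ by exactly 0.5, which does not exceed the threshold, so they are dropped.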
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/245,653 US20230336637A1 (en) | 2020-09-30 | 2021-08-30 | Method and apparatus for moderating abnormal users, electronic device, and storage medium |
EP21874154.4A EP4198775A4 (en) | 2020-09-30 | 2021-08-30 | ABNORMAL USER AUDIT METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011066006.0A CN112199640B (zh) | 2020-09-30 | 2020-09-30 | Abnormal user review method and apparatus, electronic device, and storage medium |
CN202011066006.0 | 2020-09-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022068493A1 (zh) | 2022-04-07 |
Family
ID=74013675
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/115230 WO2022068493A1 (zh) | 2020-09-30 | 2021-08-30 | Abnormal user review method and apparatus, electronic device, and storage medium |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230336637A1 (zh) |
EP (1) | EP4198775A4 (zh) |
CN (1) | CN112199640B (zh) |
WO (1) | WO2022068493A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115994203A (zh) * | 2023-02-20 | 2023-04-21 | 广州佰锐网络科技有限公司 | AI-based data annotation processing method and system, and AI middle platform |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
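CN112199640B (zh) * | 2020-09-30 | 2024-03-12 | 广州市百果园网络科技有限公司 | Abnormal user review method and apparatus, electronic device, and storage medium |
CN113766256A (zh) * | 2021-02-09 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Live-streaming risk control method and apparatus |
CN113163218A (zh) * | 2021-02-09 | 2021-07-23 | 百果园技术(新加坡)有限公司 | Method and system for detecting users in a live-streaming room, electronic device, and storage medium |
CN113255929B (zh) * | 2021-05-27 | 2023-04-18 | 支付宝(中国)网络技术有限公司 | Method and apparatus for obtaining interpretable reasons for abnormal users |
CN113485305B (zh) * | 2021-07-28 | 2023-04-07 | 成都飞机工业(集团)有限责任公司 | Aircraft field-service fault diagnosis system and method |
CN114205676B (zh) * | 2021-12-08 | 2024-05-28 | 广州方硅信息技术有限公司 | Live-streaming monitoring method, apparatus, medium, and computer device |
CN116488934A (zh) * | 2023-05-29 | 2023-07-25 | 无锡车联天下信息技术有限公司 | Domain-controller-based network security management method and system |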
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170124464A1 (en) * | 2015-10-28 | 2017-05-04 | Fractal Industries, Inc. | Rapid predictive analysis of very large data sets using the distributed computational graph |
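CN107517251A (zh) * | 2017-08-16 | 2017-12-26 | 北京小度信息科技有限公司 | Information push method and apparatus |
CN110237530A (zh) * | 2019-06-14 | 2019-09-17 | 腾讯科技(深圳)有限公司 | Abnormal behavior detection method and apparatus, and readable storage medium |
CN110929799A (zh) * | 2019-11-29 | 2020-03-27 | 上海盛付通电子支付服务有限公司 | Method for detecting abnormal users, electronic device, and computer-readable medium |
CN111090813A (zh) * | 2019-12-20 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Content processing method and apparatus, and computer-readable storage medium |
CN112199640A (zh) * | 2020-09-30 | 2021-01-08 | 广州市百果园网络科技有限公司 | Abnormal user review method and apparatus, electronic device, and storage medium |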
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008141256A2 (en) * | 2007-05-10 | 2008-11-20 | Mary Kay Hoal | Social networking system |
CN103853841A (zh) * | 2014-03-19 | 2014-06-11 | 北京邮电大学 | Method for analyzing abnormal behavior of social network users |
US9985916B2 (en) * | 2015-03-03 | 2018-05-29 | International Business Machines Corporation | Moderating online discussion using graphical text analysis |
2020
- 2020-09-30 CN CN202011066006.0A patent/CN112199640B/zh active Active
2021
- 2021-08-30 WO PCT/CN2021/115230 patent/WO2022068493A1/zh active Application Filing
- 2021-08-30 US US18/245,653 patent/US20230336637A1/en active Pending
- 2021-08-30 EP EP21874154.4A patent/EP4198775A4/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4198775A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4198775A1 (en) | 2023-06-21 |
US20230336637A1 (en) | 2023-10-19 |
EP4198775A4 (en) | 2024-03-13 |
CN112199640B (zh) | 2024-03-12 |
CN112199640A (zh) | 2021-01-08 |
Similar Documents
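Publication | Publication Date | Title |
---|---|---|
WO2022068493A1 (zh) | | Abnormal user review method and apparatus, electronic device, and storage medium |
CN110162593B (zh) | | Search result processing and similarity model training method and apparatus |
US11288444B2 (en) | | Optimization techniques for artificial intelligence |
US11036791B2 (en) | | Computerized system and method for determining non-redundant tags from a user's network activity |
CN105279495B (zh) | | Video description method based on deep learning and text summarization |
US11514244B2 (en) | | Structured knowledge modeling and extraction from images |
CN106951422B (zh) | | Web page training method and apparatus, and search intent recognition method and apparatus |
US9087297B1 (en) | | Accurate video concept recognition via classifier combination |
CN110781273B (zh) | | Text data processing method and apparatus, electronic device, and storage medium |
CN106874253A (zh) | | Method and apparatus for identifying sensitive information |
CN102637163A (zh) | | Control method and system for semantics-based multi-level ontology matching |
CN108108353B (zh) | | Bullet-comment-based video semantic annotation method and apparatus, and electronic device |
Zhang et al. | | Cross-modal image sentiment analysis via deep correlation of textual semantic |
CN104077417A (zh) | | Person tag recommendation method and system in social networks |
CN110598070A (zh) | | Application type identification method and apparatus, server, and storage medium |
CN113688310B (zh) | | Content recommendation method, apparatus, device, and storage medium |
CN113301442A (zh) | | Method, device, medium, and program product for determining live-streaming resources |
CN112183881A (zh) | | Social-network-based public opinion event prediction method, device, and storage medium |
CN111143508B (zh) | | Event detection and tracking method and system based on short communication texts |
Yuan et al. | | Sentiment analysis using social multimedia |
CN113535949A (zh) | | Multimodal joint event detection method based on pictures and sentences |
CN115775349A (zh) | | Fake news detection method and apparatus based on multimodal fusion |
CN117891939A (zh) | | Text classification method combining particle swarm optimization with a CNN convolutional neural network |
Karthikeyan et al. | | Machine learning techniques application: social media, agriculture, and scheduling in distributed systems |
CN113901817A (zh) | | Document classification method and apparatus, computer device, and storage medium |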
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21874154 Country of ref document: EP Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 202327016666 Country of ref document: IN |
| ENP | Entry into the national phase | Ref document number: 2021874154 Country of ref document: EP Effective date: 20230316 |
| WWE | Wipo information: entry into national phase | Ref document number: 2023107430 Country of ref document: RU |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 523440093 Country of ref document: SA |
| WWE | Wipo information: entry into national phase | Ref document number: 523440093 Country of ref document: SA |