WO2022068493A1 - Abnormal user review method, apparatus, electronic device and storage medium - Google Patents

Abnormal user review method, apparatus, electronic device and storage medium

Info

Publication number
WO2022068493A1
WO2022068493A1 PCT/CN2021/115230 CN2021115230W WO2022068493A1 WO 2022068493 A1 WO2022068493 A1 WO 2022068493A1 CN 2021115230 W CN2021115230 W CN 2021115230W WO 2022068493 A1 WO2022068493 A1 WO 2022068493A1
Authority
WO
WIPO (PCT)
Prior art keywords
users
user
abnormal
reviewed
user set
Application number
PCT/CN2021/115230
Other languages
English (en)
French (fr)
Inventor
李益永
黄秋实
孙准
井雪
项伟
Original Assignee
百果园技术(新加坡)有限公司
李益永
Application filed by 百果园技术(新加坡)有限公司, 李益永
Priority to US18/245,653 (US20230336637A1)
Priority to EP21874154.4A (EP4198775A4)
Publication of WO2022068493A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/1396 Protocols specially adapted for monitoring users' activity

Definitions

  • the present application relates to the technical field of content review, for example, to a method, apparatus, electronic device and storage medium for reviewing abnormal users.
  • When reviewing users, abnormal users can be pedophile users or users with other illegal behaviors.
  • pedophile users are the focus of the review. It is necessary to detect pedophile users from a large number of users and punish them.
  • the video data has the problem of inaccurate information. For example, it is impossible to determine whether a video is a video that is prohibited from being watched by minors, and whether it is a pornographic video.
  • pedophile users exhibit adversarial behaviors, such as interacting with one another by following and liking.
  • By following each other, they tip one another off to avoid detection. They also use altered variants of pornographic words that only pedophile users can understand, or even coin vocabulary used exclusively among pedophile users. As a result, effective pedophilia-related features cannot be extracted to detect pedophile users.
  • the method for classifying users is to input the characteristics of a user to obtain the user's classification.
  • the user's behavior and classification results will not affect the classification results of other users, nor will they be affected by the behavior and classification results of other users.
  • the algorithm for classifying users cannot dynamically find out whether a user is a pedophile by combining the interaction between pedophile users.
  • the present application provides an abnormal user review method, apparatus, electronic device and storage medium to solve the problem in the related art that pedophile users cannot be reviewed effectively and accurately, because the information is inaccurate, effective features cannot be extracted, and classification algorithms classify each user in isolation and therefore cannot make use of the interactive behavior between pedophile users.
  • This application provides an abnormal user review method, including:
  • preset valid features are extracted from the historical behavior data, wherein the preset valid features are features in preset sample data
  • the maximum value of the total probability function is obtained with the preset condition as a constraint to determine a candidate user; the candidate user is reviewed to obtain an abnormal user.
  • the present application provides an abnormal user review device, including:
  • a historical behavior data acquisition module configured to acquire historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes behavior data formed by interactions between a plurality of users to be reviewed;
  • a feature extraction module configured to extract a plurality of preset valid features from the historical behavior data for each user to be reviewed, wherein the preset valid features are features in preset sample data;
  • a user probability calculation module configured to calculate the probability that each user to be reviewed is an abnormal user according to the preset probability of the events associated with the plurality of preset valid features
  • a total probability function establishment module configured to establish a total probability function using the probability that the plurality of users to be reviewed are abnormal users
  • a total probability function solving module configured to use preset conditions as constraints to solve the maximum value of the total probability function to determine candidate users
  • the auditing module is configured to audit the candidate users to obtain abnormal users.
  • the application provides an electronic device, the electronic device includes:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned abnormal user review method.
  • the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the above-mentioned abnormal user auditing method is implemented.
  • FIG. 1 is a flowchart of steps of a method for reviewing abnormal users provided in Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of steps of a method for reviewing abnormal users provided in Embodiment 2 of the present application;
  • FIG. 3 is a structural block diagram of an abnormal user review device provided in Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • FIG. 1 is a flowchart of steps of a method for reviewing abnormal users provided in Embodiment 1 of the present application.
  • the embodiment of the present application can be applied to reviewing abnormal users to detect pedophile users or other illegal users.
  • the method can be executed by the abnormal user review device of the embodiment of the present application. The device may be implemented in hardware or software and integrated in the electronic device provided by the embodiment of the present application, as shown in FIG.
  • the abnormal user review method can include the following steps:
  • the abnormal user may refer to a pedophile user, and may also include users with other illegal behaviors.
  • the embodiment of the present application uses a pedophile user as an example to describe the method for reviewing abnormal users.
  • the users to be reviewed can be users on short video and live broadcast platforms, and the users to be reviewed can be some of the specified users or all users.
  • Historical behavior data can include data formed by the behavior of users to be reviewed on short video and live broadcast platforms, such as records of videos watched, videos liked, comments posted, and other users followed. Historical behavior data can also include data formed by other users' behavior toward the user to be reviewed, such as other users following the user to be reviewed, or liking, watching, and commenting on the videos and comments posted by the user to be reviewed.
  • the historical behavior data may also include information of the user to be reviewed, such as a user identifier (User Identifier, UID), gender, and the like of the user to be reviewed.
  • data tracking points (event instrumentation) can be set up on short video and live broadcast platforms
  • the historical behavior data of users to be reviewed can be collected through these tracking points
  • the historical behavior data of users to be reviewed can also be obtained through other methods such as user logs.
  • The embodiments of this application do not limit how historical behavior data is obtained.
  • a specified number of sample users may be determined first; the sample users are users that have been marked with a normal user label or an abnormal user label.
  • the historical behavior data of the specified number of sample users is obtained.
  • behavioral features are extracted from the historical behavior data of the sample users, and the validity of each behavioral feature is determined.
  • when a behavioral feature is valid, it is used as a valid feature, the event associated with the valid feature is determined, and a statistical algorithm is used to calculate the probability that a user is an abnormal user when the associated event occurs. This probability is taken as the preset probability, and the result is sample data that includes multiple valid features and the preset probabilities of the events associated with them.
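The text does not say which statistical algorithm produces the preset probability. As a minimal sketch, assuming the preset probability is simply the fraction of labelled sample users having a feature who are abnormal, it could be estimated as follows. The function name `estimate_preset_probabilities` and the input layout are hypothetical.

```python
from collections import Counter


def estimate_preset_probabilities(sample_features, labels):
    """Estimate P(abnormal | feature event occurred) from labelled sample users.

    sample_features: dict mapping user id -> set of features seen for that user
    labels:          dict mapping user id -> True if the user is labelled abnormal
    Assumption: a plain conditional frequency stands in for the unspecified
    "statistical algorithm".
    """
    seen = Counter()           # users exhibiting each feature
    abnormal_seen = Counter()  # abnormal users exhibiting each feature
    for user, features in sample_features.items():
        for feature in features:
            seen[feature] += 1
            if labels[user]:
                abnormal_seen[feature] += 1
    return {feature: abnormal_seen[feature] / seen[feature] for feature in seen}
```

Re-running this estimate after newly reviewed users are added to the labelled set is one way the sample data could be kept up to date.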
  • the sample data can be updated by using the reviewed user as a sample user.
  • the valid features may be prohibited words related to pedophilia contained in the comments of the user to be reviewed, and the videos that the user to be reviewed has liked, commented on, and viewed.
  • the corresponding preset effective features can be extracted from the behavior data of the users to be reviewed, such as comments, views, likes, and publications.
  • the sample data includes multiple preset valid features and the preset probabilities of the events associated with them. For each user to be reviewed, after the multiple preset valid features are extracted, the probability that the user is an abnormal user can be obtained by summing the preset probabilities of the events associated with those valid features.
  • the total probability function is the sum of the probabilities that the users to be reviewed are abnormal users.
  • when the probability that any user to be reviewed is an abnormal user changes, the function value of the total probability function changes.
  • the proportion of abnormal users among a specified number of sample users can be calculated, and the number of abnormal users among all users to be reviewed can be obtained by multiplying this proportion by the total number of users to be reviewed.
  • a recall rate can be set and the product of the recall rate and the total number of all users to be reviewed calculated. The constraint can then be set as follows: when the function value of the total probability function is at its maximum, the number of candidate users is greater than this product and less than the number of abnormal users among all users to be reviewed.
  • the users to be reviewed can be preliminarily divided into an abnormal user set and a normal user set. The users in the abnormal user set are assumed to be abnormal users and the users in the normal user set are assumed to be normal users, and the first function value of the total probability function is calculated based on this initial division.
  • each time the abnormal user set is traversed, the function value of the total probability function is calculated based on the current abnormal user set and normal user set, which yields multiple second function values, and the minimum second function value is determined from them.
  • when the minimum second function value is greater than the first function value, the second user to be reviewed that corresponds to the minimum second function value is identified in the abnormal user set.
  • that second user to be reviewed is determined to be a normal user and moved to the normal user set, the first user to be reviewed that is currently traversed is determined to be an abnormal user and moved from the normal user set to the abnormal user set, and the process returns to traversing the normal user set.
  • the user ID of the candidate user can be sent to the review backend, where the candidate user is manually reviewed to obtain the abnormal users.
  • the historical behavior data of the user to be reviewed includes the data formed by the interactive behaviors of the users to be reviewed.
  • the probability that each user to be reviewed is an abnormal user is calculated from the valid features extracted from the historical behavior data, the probabilities of all users to be reviewed are used to construct a total probability function, the maximum value of the total probability function is solved under preset constraints to obtain candidate users, and the candidate users are reviewed to obtain abnormal users.
  • FIG. 2 is a flowchart of steps of an abnormal user review method provided in Embodiment 2 of the present application. This embodiment is described on the basis of the foregoing Embodiment 1. As shown in FIG. 2, the abnormal user review method in this embodiment may include the following steps:
  • sample data includes valid features and a preset probability that a user is an abnormal user when an event associated with the valid feature occurs.
  • sample data can be preset through statistics, and the sample data includes valid features and a preset probability that the user is an abnormal user when an event associated with the valid feature occurs.
  • a specified number (such as 1000) of sample users is obtained
  • the sample users are users marked with the abnormal user label or the normal user label. Multiple valid features are extracted from the historical behavior data of the sample users, and for the event associated with each valid feature,
  • the probability that a sample user is an abnormal user when the event occurs is calculated and taken as the preset probability.
  • to determine whether a behavioral feature is valid: among the sample users having the behavioral feature, obtain the first number, of abnormal users, and the second number, of normal users; among the specified number of sample users, obtain the third number, of abnormal users, and the fourth number, of normal users; calculate the first ratio of the first number to the third number and the second ratio of the second number to the fourth number; calculate the absolute value of the difference between the first ratio and the second ratio; and determine that the behavioral feature is a valid feature when the absolute value is greater than a preset threshold.
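The validity test above compares how often a behavioral feature occurs among abnormal sample users with how often it occurs among normal sample users. A minimal sketch, assuming labelled sample users are available as a dictionary; the function name `is_valid_feature` and the default threshold of 0.2 are hypothetical, since the text only says the threshold is preset:

```python
def is_valid_feature(users_with_feature, labels, threshold=0.2):
    """Return True when |first ratio - second ratio| exceeds the threshold.

    users_with_feature: set of sample user ids that exhibit the behavioral feature
    labels:             dict mapping sample user id -> True if labelled abnormal
    threshold:          illustrative value only; the source does not give one
    """
    abnormal = {user for user, is_abnormal in labels.items() if is_abnormal}
    normal = set(labels) - abnormal
    if not abnormal or not normal:
        raise ValueError("need both abnormal and normal sample users")

    first_number = len(users_with_feature & abnormal)  # abnormal users with the feature
    second_number = len(users_with_feature & normal)   # normal users with the feature
    first_ratio = first_number / len(abnormal)          # divided by the third number
    second_ratio = second_number / len(normal)          # divided by the fourth number
    return abs(first_ratio - second_ratio) > threshold
```

A feature that appears equally often in both groups gives a difference near zero and is rejected, which matches the intent that only features separating the two groups are kept.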
  • in an example, abnormal users are reviewed to determine pedophile users. Assume that N behavioral features are extracted from the historical behavior data of all sample users, and that one behavioral feature B is a prohibited word contained in a user's comment. Among the specified number of sample users, the proportion of pedophile users having behavioral feature B among all pedophile users is P_1, and the proportion of normal users having behavioral feature B among all normal users is P_2. If
  • the probability of being an abnormal user is P(A 0
  • the embodiment of the present application extracts behavioral features from the historical behavior data of sample users and determines valid features, so valid features can be set for pedophilia or other illegal behaviors in order to review abnormal users. This solves the problem in the related art that pedophile users exhibit adversarial behavior: they use altered variants of pornographic words that only pedophile users can understand, and even coin vocabulary exclusive to pedophile users to avoid detection, so that effective pedophilia-related features cannot be extracted to detect them. Because valid features can be set specifically for pedophile users, the accuracy of pedophile user detection improves.
  • S202 Acquire historical behavior data of multiple users to be reviewed from the tracking points of multiple platforms.
  • Historical behavior data can include at least one of: user gender, videos liked by the user, user comments, accounts that the user follows, accounts that follow the user, videos shared by the user, the number of videos liked by the user, and the number of videos posted by the user. The historical behavior data may also include other behavior data.
  • After the sample data is determined from the sample users, it includes the preset valid features for reviewing abnormal users, and the corresponding preset valid features can be extracted from the historical behavior data of each user to be reviewed.
  • the sample data includes the preset probability of the event associated with each preset valid feature. For each user to be reviewed, the preset probabilities of the events associated with that user's multiple preset valid features can be determined and summed to obtain the probability that the user is an abnormal user.
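The per-user calculation described here is a lookup followed by a sum. A minimal sketch, assuming the sample data is available as a dictionary from feature name to preset probability; the function name and the feature names used in the test are hypothetical:

```python
def abnormal_probability(user_features, preset_probability):
    """Sum the preset probabilities of the events tied to a user's valid features.

    user_features:      iterable of feature names extracted for one user
    preset_probability: dict mapping valid feature -> preset probability
    Features that are not in the sample data are ignored, since only preset
    valid features count toward the sum.
    """
    return sum(
        preset_probability[feature]
        for feature in user_features
        if feature in preset_probability
    )
```

Because the text specifies a plain sum, the result can exceed 1 for a user with many valid features, so it is better read as a score than as a normalized probability.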
  • the total probability function is obtained by summing the probabilities that the users to be reviewed are abnormal users. For this function, when one of the users to be reviewed changes from an abnormal user to a normal user, or from a normal user to an abnormal user, the probability that the user is an abnormal user changes, and the function value of the total probability function changes with it.
  • S206 Initialize the abnormal user set and the normal user set to divide the multiple users to be reviewed into the abnormal user set and the normal user set.
  • all users to be reviewed may first be regarded as normal users; the set of users to be reviewed for which the total probability function takes its maximum value is taken as the abnormal user set, and the set of the remaining users to be reviewed is taken as the normal user set.
  • the total number of users to be reviewed is num(Ω). The proportion s of abnormal users can be estimated from the specified number of sample users, and the product of s and num(Ω) gives the number of abnormal users among the users to be reviewed, num(Ω_1). Assuming that all users to be reviewed are normal users, the probability that each user to be reviewed is an abnormal user is calculated, and the num(Ω_1) users to be reviewed for which the total probability function takes its maximum value form the abnormal user set P. The remaining users to be reviewed form the normal user set Q.
  • the historical behavior data of each user to be reviewed can be input into a pre-trained classification model to obtain a classification result, namely the probability that the user to be reviewed is an abnormal user or a normal user, and the user is assigned to the abnormal user set or the normal user set according to that probability. For example, a network such as a regression neural network, a deep neural network, or a recurrent neural network can be trained to perform a preliminary classification of the users to be reviewed and obtain, for each user, the classification probability of being an abnormal user or a normal user. The num(Ω_1) users with the highest classification probability of being abnormal users form the abnormal user set, and the remaining users to be reviewed are placed in the normal user set.
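The classifier-based initialization amounts to ranking users by their abnormal-class probability and taking the top group, whose size is the preset proportion of abnormal users times the number of users to be reviewed. A minimal sketch, assuming the classifier output is already available as a dictionary; the function name is hypothetical, and rounding the expected count to the nearest integer is an assumption the text does not address:

```python
def initialize_sets(classifier_probability, abnormal_ratio):
    """Split users into initial abnormal and normal sets by classifier score.

    classifier_probability: dict mapping user id -> probability of being abnormal
    abnormal_ratio:         preset proportion s of abnormal users
    """
    # Expected number of abnormal users: s times the number of users to review.
    n_abnormal = round(abnormal_ratio * len(classifier_probability))
    ranked = sorted(
        classifier_probability, key=classifier_probability.get, reverse=True
    )
    abnormal_set = set(ranked[:n_abnormal])
    normal_set = set(ranked[n_abnormal:])
    return abnormal_set, normal_set
```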
  • abnormal user set and the normal user set can also be initialized in other ways.
  • the users to be reviewed in the abnormal user set are regarded as abnormal users and those in the normal user set as normal users in order to calculate the first function value of the total probability function. Because the total probability function is the sum of the probabilities of all users to be reviewed, its value changes whenever the probability of any user to be reviewed changes; the value calculated here is the first function value S_0.
  • the normal user set can be traversed, taking the first user to be reviewed currently traversed in the normal user set as an abnormal user.
  • after each first user to be reviewed is taken from the normal user set, the abnormal user set is traversed, taking the second user to be reviewed currently traversed in the abnormal user set as a normal user.
  • after each second user to be reviewed is taken, the second function value S_1 of the total probability function is calculated based on the current abnormal user set and the current normal user set, and the traversal of the abnormal user set is repeated for the next second user to be reviewed.
  • the minimum second function value is determined from the multiple second function values obtained by traversing the abnormal user set. When the minimum second function value is greater than the first function value, the second user to be reviewed that corresponds to it is identified in the abnormal user set, determined to be a normal user, and moved to the normal user set; the currently traversed first user to be reviewed is determined to be an abnormal user and moved from the normal user set to the abnormal user set; and the process returns to the step of traversing the normal user set.
  • all users Qn to be reviewed in the normal user set Q are normal users
  • all users Pm to be reviewed in the abnormal user set P are abnormal users
  • the traversal process of the abnormal user set and the normal user set is as follows:
  • the preset condition is that the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed.
  • the problem of solving the probability function is as follows:
  • P(x_i) is the probability that the i-th user to be reviewed is an abnormal user
  • Ω is the set of all users to be reviewed
  • s.t. (subject to) introduces the constraint to be satisfied
  • r is the preset recall rate
  • V_1 is the set of abnormal users after the traversal terminates
  • num(V_1) is the number of abnormal users in the set of abnormal users
  • num(Ω_1) is the number of all abnormal users among all users to be reviewed
  • num(Ω_1) = s × num(Ω)
  • s is the preset proportion of abnormal users.
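The formula that these symbol definitions refer to did not survive on this page. Based only on the definitions above and the stated stopping condition, and writing Ω for the set of all users to be reviewed, one consistent way to write the constrained problem is the following. This is a reconstruction under those assumptions, not the original typeset formula:

```latex
\max_{V_1 \subseteq \Omega} \; \sum_{x_i \in \Omega} P(x_i)
\qquad \text{s.t.} \qquad
r \cdot \mathrm{num}(\Omega_1) \;<\; \mathrm{num}(V_1) \;<\; \mathrm{num}(\Omega_1),
\qquad
\mathrm{num}(\Omega_1) = s \times \mathrm{num}(\Omega)
```

Here each P(x_i) depends on which users are currently placed in V_1, which is why moving a user between the two sets changes the value of the sum.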
  • the condition for stopping the traversal is: after the normal user set is traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed.
  • the user to be reviewed in the abnormal user set obtained after traversing the normal user set and satisfying the preset conditions is the candidate user, and the candidate user can be manually reviewed.
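The traversal listing itself is not reproduced on this page, so the following is only a minimal sketch of the swap rule and the count condition as they are worded in the text. The names `refine_by_swapping`, `satisfies_constraint`, and `max_rounds`, and the form of the `total_probability` callable (a function of the current abnormal user set), are hypothetical. How the count condition and the swap loop are combined in the full procedure is not shown in the text, so the two are kept separate here.

```python
def satisfies_constraint(abnormal_set, n_abnormal_expected, recall):
    """Count condition: recall * expected total < size of abnormal set < expected total."""
    return recall * n_abnormal_expected < len(abnormal_set) < n_abnormal_expected


def refine_by_swapping(abnormal_set, normal_set, total_probability, max_rounds=1000):
    """Apply the swap rule until no swap is accepted.

    total_probability: callable taking the current abnormal user set and
                       returning the value of the total probability function
                       (hypothetical interface; user ids must be sortable).
    max_rounds:        safety bound on the number of passes (an assumption).
    """
    abnormal, normal = set(abnormal_set), set(normal_set)
    for _ in range(max_rounds):
        if not abnormal:
            break
        first_value = total_probability(abnormal)  # S_0 for the current division
        swapped = False
        for first_user in sorted(normal):
            # Second function values: first_user is treated as abnormal while
            # each second_user in turn is treated as normal.
            second_values = {
                second_user: total_probability((abnormal - {second_user}) | {first_user})
                for second_user in sorted(abnormal)
            }
            candidate = min(second_values, key=second_values.get)
            # Rule as worded in the text: accept only when even the minimum
            # second function value exceeds the first function value.
            if second_values[candidate] > first_value:
                abnormal = (abnormal - {candidate}) | {first_user}
                normal = (normal - {first_user}) | {candidate}
                swapped = True
                break  # return to traversing the normal user set
        if not swapped:
            break
    return abnormal, normal
```

Each accepted swap strictly increases the function value, so the loop terminates even without the `max_rounds` bound. Because the rule keys on the minimum second function value, it accepts a swap only when every possible replacement would be an improvement, which makes it conservative.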
  • the user identifier of the candidate user may be sent to the review backend, where the candidate user is reviewed. If manual review determines that the candidate user is an abnormal user, such as a pedophile user, the candidate user is marked as an abnormal user; otherwise the candidate user is marked as a normal user.
  • an abnormal user label may be marked for the abnormal user, and a normal user label may be marked for the users other than the abnormal user among the users to be reviewed.
  • the users marked with abnormal user labels or normal user labels are used as sample users to update the sample data.
  • sample data is preset, including valid features and the preset probability that a user is an abnormal user when the event associated with each valid feature occurs. For each user to be reviewed, multiple preset valid features are extracted from the historical behavior data, and the preset probabilities of the events associated with them are summed to obtain the probability that the user is an abnormal user. The probabilities of all users to be reviewed are used to establish a total probability function. After the abnormal user set and the normal user set are initialized, the first function value of the total probability function is calculated based on the two sets; the two sets are then traversed to recalculate second function values of the total probability function, and the sets are updated according to the first and second function values. When the maximum function value of the total probability function is obtained after traversing the normal user set under the preset conditions, the users to be reviewed in the abnormal user set are determined as candidate users, and the candidate users are reviewed to obtain abnormal users.
  • valid features are preset through sample data, extracted from historical behavior data, and converted into the probabilities of the events associated with them. This provides effective features for user review and turns feature calculation into probability calculation, which solves the problem that inaccurate data affects the accuracy of abnormal user review. This approach not only makes use of the interactive behavior between the users to be reviewed, but also lets their classification results affect one another, so that candidate users can be accurately determined for review.
  • valid features can be set for pedophilia or other illegal behaviors in order to review abnormal users. This solves the problem in the related art that pedophile users exhibit adversarial behaviors, use altered variants of pornographic words that only pedophile users can understand, and even coin vocabulary exclusive to pedophile users to avoid detection, so that effective pedophilia-related features cannot be extracted for detection.
  • valid features can be set specifically for pedophile users in order to detect them, which improves the accuracy of pedophile user detection.
  • the abnormal users are marked with the abnormal user label and the normal users with the normal user label, and the marked users are used as sample users to update the valid features of the sample data and the preset probability that a user is an abnormal user. On the one hand, this dynamically updates the sample data so that candidate users are solved for dynamically; on the other hand, it adds a data source for sample users and reduces the cost of acquiring them.
  • FIG. 3 is a structural block diagram of an abnormal user verification device provided in Embodiment 3 of the present application.
  • the abnormal user verification device in the embodiment of the present application may include the following modules:
  • the historical behavior data acquisition module 301 is configured to acquire the historical behavior data of a plurality of users to be reviewed, where the historical behavior data includes behavior data formed by the interaction between the multiple users to be reviewed;
  • the feature extraction module 302 is configured to extract, for each user to be reviewed, multiple preset valid features from the historical behavior data, where the preset valid features are features in the preset sample data;
  • the user probability calculation module 303 is configured to calculate the probability that the user to be reviewed is an abnormal user based on the preset probabilities of the events associated with the multiple preset valid features;
  • the total probability function establishment module 304 is configured to use the probabilities that all users to be reviewed are abnormal users to establish a total probability function;
  • the total probability function solving module 305 is configured to obtain the maximum value of the total probability function with the preset condition as a constraint to determine candidate users;
  • the review module 306 is configured to review the candidate users to obtain abnormal users.
  • the abnormal user verification apparatus provided in the embodiment of the present application can execute the abnormal user verification method provided in the first embodiment or the second embodiment of the present application, and has corresponding functional modules and effects of the execution method.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • the electronic device may include: a processor 401 , a memory 402 , a display screen 403 with a touch function, an input device 404 , an output device 405 and a communication device 406 .
  • the number of processors 401 in the electronic device may be one or more, and one processor 401 is taken as an example in FIG. 4 .
  • the processor 401 , the memory 402 , the display screen 403 , the input device 404 , the output device 405 and the communication device 406 of the electronic device can be connected through a bus or other means, and the connection through a bus is taken as an example in FIG. 4 .
  • the electronic device is configured to execute the abnormal user review method provided by any embodiment of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium, when the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the abnormal user verification method described in the above method embodiments.
  • references to the terms “one embodiment,” “some embodiments,” “example,” or “some examples”, etc. mean that the features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application.
  • schematic representations of the above terms do not necessarily refer to the same embodiment or example.
  • the described features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Human Resources & Organizations (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

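An abnormal user review method and apparatus, an electronic device, and a storage medium. The abnormal user review includes: acquiring historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes historical behavior data formed by interactions among the plurality of users to be reviewed (S101); for each user to be reviewed, extracting a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data (S102); calculating the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of valid features (S103); establishing a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users (S104); solving for the maximum of the total probability function under a preset condition as a constraint to determine candidate users (S105); and reviewing the candidate users to obtain abnormal users (S106).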

Description

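Abnormal user review method and apparatus, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202011066006.0, filed with the China Patent Office on September 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of content moderation, for example to an abnormal user review method and apparatus, an electronic device, and a storage medium.
Background
With the development of Internet technology, a variety of video platforms have entered people's daily lives, and people can watch or upload videos through short-video and live-streaming platforms. However, because users are diverse and a healthy network environment must be maintained, users need to be reviewed to determine whether they are abnormal users.
When users are reviewed, an abnormal user may be a pedophile user or another kind of violating user. To protect minors, pedophile users are the focus of review and must be detected among a large number of users and penalized. However, video data suffers from inaccurate information: for example, it may be impossible to determine whether a video is prohibited for minors or whether it is pornographic. In addition, pedophile users behave adversarially. For example, they interact with one another by following and liking each other and use mutual following to tip each other off and evade detection, or they use variants of pornographic words that only pedophile users understand, or even coin vocabulary used exclusively among themselves, to evade detection. As a result, valid features related to pedophilia cannot be extracted to detect pedophile users.
Furthermore, a user classification method takes the features of one user as input and outputs the class of that user. That user's behavior and classification result do not affect the classification results of other users, nor are they affected by the behavior and classification results of other users. Such a classification algorithm therefore cannot use the interactions among pedophile users to dynamically determine whether a user is a pedophile user.
In summary, in the related art, information is inaccurate, valid features cannot be extracted because pedophile users behave adversarially, and classification algorithms that classify users one at a time cannot exploit the interactions among pedophile users. Consequently, pedophile users cannot be reviewed and penalized effectively and accurately.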
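Summary
The present application provides an abnormal user review method and apparatus, an electronic device, and a storage medium, to solve the problem in the related art that pedophile users cannot be reviewed effectively and accurately because information is inaccurate, valid features cannot be extracted, and classification algorithms that classify users one at a time cannot exploit the interactions among pedophile users.
The present application provides an abnormal user review method, including:
acquiring historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes behavior data formed by interactions among the plurality of users to be reviewed;
for each user to be reviewed, extracting a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data;
calculating the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of preset valid features;
establishing a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users;
solving for the maximum of the total probability function under a preset condition as a constraint to determine candidate users; and reviewing the candidate users to obtain abnormal users.
The present application provides an abnormal user review apparatus, including:
a historical behavior data acquisition module, configured to acquire historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data includes behavior data formed by interactions among the plurality of users to be reviewed;
a feature extraction module, configured to extract, for each user to be reviewed, a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data;
a user probability calculation module, configured to calculate the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of preset valid features;
a total probability function establishment module, configured to establish a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users;
a total probability function solving module, configured to solve for the maximum of the total probability function under a preset condition as a constraint to determine candidate users;
a review module, configured to review the candidate users to obtain abnormal users.
The present application provides an electronic device, including:
one or more processors;
a storage apparatus, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the abnormal user review method described above.
The present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the abnormal user review method described above.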
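Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an abnormal user review method provided in Embodiment 1 of the present application;
FIG. 2 is a flowchart of the steps of an abnormal user review method provided in Embodiment 2 of the present application;
FIG. 3 is a structural block diagram of an abnormal user review apparatus provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments.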
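Embodiment 1
FIG. 1 is a flowchart of the steps of an abnormal user review method provided in Embodiment 1 of the present application. This embodiment is applicable to reviewing users in order to detect pedophile users or other violating users. The method may be executed by the abnormal user review apparatus of the embodiments of the present application, which may be implemented in hardware or software and integrated in the electronic device provided by the embodiments of the present application. As shown in FIG. 1, the abnormal user review method of this embodiment may include the following steps:
S101. Acquire historical behavior data of a plurality of users to be reviewed, where the historical behavior data includes behavior data formed by interactions among the plurality of users to be reviewed.
In this embodiment, an abnormal user may be a pedophile user and may also be a user who commits other violations; this embodiment uses pedophile users as the example for describing the abnormal user review method. The users to be reviewed may be users of short-video and live-streaming platforms, and may be a specified subset of users or all users.
The historical behavior data may include data formed by the behavior of a user to be reviewed on short-video and live-streaming platforms, such as records of the videos the user watched, the videos the user liked, the user's comments, and the other users the user followed. It may also include data formed by the behavior of other users toward the user to be reviewed, such as other users following the user to be reviewed, or liking, watching, or commenting on the videos and comments that user published. The historical behavior data may further include information about the user to be reviewed, for example the User Identifier (UID) and gender.
In practice, data tracking points may be set up on short-video and live-streaming platforms and used to collect the historical behavior data of the users to be reviewed; the data may also be obtained by other means, such as user logs. This embodiment places no limit on how the historical behavior data is acquired.
S102. For each user to be reviewed, extract a plurality of preset valid features from the historical behavior data, where the preset valid features are features in preset sample data.
In this embodiment, a specified number of sample users may first be determined; they may be users labeled with a normal user label or an abnormal user label. The historical behavior data of these sample users is then acquired, behavior features are extracted from it, and the validity of each behavior feature is determined. A behavior feature that is valid is taken as a valid feature, and the event associated with the valid feature is determined. A statistical algorithm is used to calculate the probability that a user is an abnormal user when the event associated with the valid feature occurs, and this probability is taken as the preset probability. The result is sample data comprising multiple valid features and the preset probabilities of the events associated with them. After the users to be reviewed have been reviewed and abnormal and normal users have been determined, the reviewed users may be used as sample users to update the sample data.
In this embodiment, taking pedophile users as the example of abnormal users, valid features may be prohibited words related to pedophilia contained in the comments of a user to be reviewed, pornographic videos featuring minors that the user liked, commented on, or watched, pornographic videos featuring minors that the user published, and so on. The corresponding preset valid features can then be extracted from the user's comment, watch, like, and publish behavior data.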
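S103. Calculate the probability that each user to be reviewed is an abnormal user according to the preset probabilities of the events associated with the plurality of preset valid features.
In this embodiment, the sample data includes multiple preset valid features and the preset probabilities of the events associated with them. For each user to be reviewed, after the multiple preset valid features are extracted, the preset probabilities of the associated events may be summed to obtain the probability that the user is an abnormal user.
S104. Establish a total probability function using the probabilities that all users to be reviewed are abnormal users.
The total probability function is the sum of the probabilities that all users to be reviewed are abnormal users. When a user to be reviewed changes from an abnormal user to a normal user, or from a normal user to an abnormal user, the value of the total probability function changes.
S105. Solve for the maximum of the total probability function under a preset condition as a constraint to determine candidate users.
In practice, after statistics are computed over the sample users, the proportion of abnormal users among the specified number of sample users can be calculated. Multiplying this proportion by the total number of users to be reviewed gives the number of abnormal users among all users to be reviewed. A recall rate may also be set, and the product of the recall rate and that number of abnormal users calculated. The constraint may then be set as follows: when the value of the total probability function is at its maximum, the number of candidate users is greater than the product of the recall rate and the number of abnormal users among all users to be reviewed, and less than the number of abnormal users among all users to be reviewed.
Under this constraint, the users to be reviewed may first be preliminarily divided into an abnormal user set and a normal user set, with the users in the abnormal user set assumed to be abnormal and the users in the normal user set assumed to be normal. The first function value of the total probability function is calculated on the basis of this preliminary division.
The normal user set is then traversed, and the first user to be reviewed currently reached in the normal user set is treated as an abnormal user. For each such step, the abnormal user set is traversed, and the second user to be reviewed currently reached in the abnormal user set is treated as a normal user; at each step of this inner traversal, the value of the total probability function is calculated on the basis of the current abnormal user set and normal user set, which yields multiple second function values. The minimum second function value is determined among them. When the minimum second function value is greater than the first function value, the second user to be reviewed corresponding to the minimum second function value is identified in the abnormal user set, determined to be a normal user, and moved to the normal user set, and the first user to be reviewed currently reached is determined to be an abnormal user and moved from the normal user set to the abnormal user set. The process then returns to the step of traversing the normal user set, and continues until the normal user set has been fully traversed and the number of users in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed. The users to be reviewed in the final abnormal user set are the candidate users.
S106. Review the candidate users to obtain abnormal users.
After the candidate users are determined, their user identifiers may be sent to a review back end, where the candidate users are reviewed manually to obtain the abnormal users.
In this embodiment, the historical behavior data of the users to be reviewed includes data formed by interactions among them. After valid features are set from the sample data, valid features are extracted from the historical behavior data to calculate the probability that each user to be reviewed is an abnormal user, a total probability function is constructed from the probabilities of all users to be reviewed, the maximum of the total probability function is solved for under the preset constraint to obtain candidate users, and the candidate users are reviewed to obtain abnormal users. On the one hand, valid features are preset from the sample data, extracted from the historical behavior data, and converted into probabilities of the associated events. This makes it possible to extract valid features for abnormal user review from the historical data formed by interactions among the users to be reviewed and turns feature computation into probability computation, which solves the problem that inaccurate data degrades the accuracy of abnormal user review. On the other hand, constructing a total probability function and solving it over all users to be reviewed as a whole both exploits the interactions among the users and lets their classification results influence one another, so that candidate users can be determined accurately for review.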
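Embodiment 2
FIG. 2 is a flowchart of the steps of an abnormal user review method provided in Embodiment 2 of the present application. This embodiment builds on Embodiment 1. As shown in FIG. 2, the abnormal user review method of this embodiment may include the following steps:
S201. Acquire sample data, where the sample data includes valid features and the preset probability that a user is an abnormal user when the event associated with a valid feature occurs.
In this embodiment, the sample data may be set in advance by statistical means. The sample data includes valid features and the preset probability that a user is an abnormal user when the event associated with a valid feature occurs. In one example, the historical behavior data of a specified number (for example, 1000) of sample users may be acquired, where the sample users are users labeled with an abnormal user label or a normal user label. Multiple valid features are then extracted from the historical behavior data of the sample users, and the probability that a sample user is an abnormal user when the event associated with a valid feature occurs is calculated and used as the preset probability.
To extract valid features, multiple behavior features may first be extracted from the historical behavior data of the sample users. For each behavior feature, a first user count is obtained (sample users that have the feature and are abnormal users) together with a second user count (sample users that have the feature and are normal users). A third user count (abnormal users among the specified number of sample users) and a fourth user count (normal users among them) are also obtained. A first ratio of the first user count to the third user count and a second ratio of the second user count to the fourth user count are calculated, followed by the absolute value of the difference between the first ratio and the second ratio. When this absolute value is greater than a preset threshold, the behavior feature is determined to be a valid feature.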
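The validity test described in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function name `is_valid_feature` and the data layout (one `(has_feature, is_abnormal)` pair per labeled sample user) are assumptions, and the default threshold of 0.02 is only the example value the description mentions.

```python
from typing import Iterable, Tuple


def is_valid_feature(samples: Iterable[Tuple[bool, bool]], threshold: float = 0.02) -> bool:
    """Return True when a behavior feature separates abnormal from normal sample users.

    Each sample is a (has_feature, is_abnormal) pair for one labeled sample user.
    The pair layout is an illustrative assumption.
    """
    first = second = third = fourth = 0
    for has_feature, is_abnormal in samples:
        if is_abnormal:
            third += 1                  # third count: all abnormal sample users
            first += int(has_feature)   # first count: abnormal users with the feature
        else:
            fourth += 1                 # fourth count: all normal sample users
            second += int(has_feature)  # second count: normal users with the feature
    if third == 0 or fourth == 0:
        raise ValueError("need at least one abnormal and one normal sample user")
    first_ratio = first / third         # share of abnormal users showing the feature
    second_ratio = second / fourth      # share of normal users showing the feature
    return abs(first_ratio - second_ratio) > threshold
```

A feature that appears equally often in both groups yields a difference of zero and is rejected, which is the sense in which the threshold measures how discriminative the feature is.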
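In one example, users are reviewed to identify pedophile users. Suppose N behavior features are extracted from the historical behavior data of all sample users, and one of them, behavior feature B, is the presence of prohibited words in a user's comments. Among the specified number of sample users, the pedophile users that have behavior feature B account for a proportion P_1 of all pedophile users, and the normal users that have behavior feature B account for a proportion P_2 of all normal users. If |P_1 - P_2| > d (d is a threshold that may be set to 0.02), behavior feature B is discriminative; it is therefore considered valid and is used as a preset valid feature. Traversing the N behavior features in the same way yields multiple valid features and the probabilities of the events associated with them. In one example, the events associated with the valid features are as follows:
Event A_0: the user is a pedophile user;
Event A_1: the user liked a pornographic video whose subject is a minor;
Event A_2: a comment sent by the user contains a prohibited word;
Event A_3: the user follows the account of a pedophile user;
Event A_4: the user is followed by the account of a pedophile user;
Event A_5: the user watched a pornographic video whose subject is a minor;
Event A_6: the user shared a pornographic video whose subject is a minor;
Event A_7: the number of videos the user liked divided by the number of videos the user published is greater than 30;
Event A_8: the number of videos the user liked divided by the number of videos the user published is less than 30;
Event A_9: the user commented on a pornographic video whose subject is a minor;
Event A_10: the user is male;
Event A_11: the user is female;
Event A_12: the user's gender is NULL;
Event A^c denotes that event A does not occur.
If the historical behavior data of each user to be reviewed includes the above events, there are 2^8 × 3 = 768 event combinations in total for a user to be reviewed, denoted B_j, 1 ≤ j ≤ 768. The probability that the user to be reviewed is an abnormal user under each event is P(A_0|A_n), that is, the probability that the user is an abnormal user when the user's historical behavior data contains event A_n. In the example above, P(A_0|A_6) = 0.015 means that when a user to be reviewed has shared a pornographic video whose subject is a minor, the probability that this user is a pedophile user is 1.5%. Because the number of sample users is a specified number and the number of occurrences of each event can be counted from the historical behavior data of all sample users, statistical analysis of that data yields the probability that a sample user is a pedophile user when each event occurs; this embodiment does not describe those probabilities in further detail.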
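As a rough check of the count and of how a preset probability might be obtained, the sketch below enumerates the event combinations and estimates P(A_0|A_n) as a relative frequency over labeled sample users. Several things here are assumptions: the description only says the probabilities come from statistical analysis, so the relative-frequency estimator is one simple choice; treating A_7 and A_8 as the two outcomes of a single ratio test is one reading that reproduces 2^8 × 3; and the dictionary keys `events` and `abnormal` are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple

# Eight yes/no observations. A7 and A8 are treated here as the two outcomes of a
# single ratio test, which is one reading that reproduces the count 2^8 x 3.
BINARY_EVENTS = ("A1", "A2", "A3", "A4", "A5", "A6", "A7_or_A8", "A9")
GENDER_EVENTS = ("A10", "A11", "A12")  # male, female, NULL: exactly one applies


def event_combinations() -> Iterator[Tuple[Tuple[bool, ...], str]]:
    """Enumerate every combination B_j of binary outcomes and gender."""
    for flags in product((False, True), repeat=len(BINARY_EVENTS)):
        for gender in GENDER_EVENTS:
            yield flags, gender


def preset_probability(sample_users: List[dict], event: str) -> float:
    """Estimate P(A0 | event) as a relative frequency over labeled sample users.

    Each sample user is a dict with hypothetical keys "events" (a set of event
    names) and "abnormal" (the label assigned during review).
    """
    with_event = [u for u in sample_users if event in u["events"]]
    if not with_event:
        raise ValueError(f"no sample user exhibits event {event}")
    return sum(1 for u in with_event if u["abnormal"]) / len(with_event)
```

With 200 sample users exhibiting A6, of whom 3 are labeled abnormal, this estimator returns 3/200 = 0.015, the value used in the example.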
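In practice, more valid features may be extracted and more events may be set; this embodiment places no limit on the number or content of the valid features and events.
In this embodiment, behavior features are extracted from the historical behavior data of the sample users and valid features are determined from them, so valid features can be set for pedophilia or other violations in order to identify abnormal users. This addresses the problem in the related art that pedophile users behave adversarially, using variants of pornographic words that only pedophile users understand, or even coining vocabulary used exclusively among themselves, to evade detection, so that valid features related to pedophilia cannot be extracted. Valid features can be set specifically for pedophile users to detect them, which improves the accuracy of pedophile user detection.
S202. Acquire the historical behavior data of a plurality of users to be reviewed from the tracking points of multiple platforms.
Data tracking points may be set up on short-video and live-streaming platforms and used to collect the historical behavior data of the users to be reviewed. The historical behavior data may include at least one of: the user's gender, the videos the user liked, the user's comments, the accounts the user follows, the accounts that follow the user, the videos the user shared, the number of videos the user liked, and the number of videos the user published. It may also include other behavior data.
S203. For each user to be reviewed, extract a plurality of preset valid features from the historical behavior data.
Once the sample data has been determined from the sample users, it contains the preset valid features used for abnormal user review, and the corresponding preset valid features can be extracted from the historical behavior data of each user to be reviewed.
S204. Sum the preset probabilities of the events associated with the plurality of preset valid features to obtain the probability that each user to be reviewed is an abnormal user.
Once the preset probability of the event associated with each preset valid feature in the sample data has been determined from the sample users, the preset probabilities of the events associated with the preset valid features of a user to be reviewed can be summed to obtain the probability that this user is an abnormal user.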
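The summation in S204 is small enough to show directly. In this sketch the function name, the mapping from event names to preset probabilities, and the choice to count each event at most once per user are all assumptions made for illustration.

```python
from typing import Iterable, Mapping


def abnormal_probability(user_events: Iterable[str], preset: Mapping[str, float]) -> float:
    """Sum the preset probabilities of the events observed for one user.

    Events without a preset probability are ignored, and each event is counted
    once even if it was observed several times (an illustrative assumption).
    """
    return sum(preset[event] for event in set(user_events) if event in preset)
```

Note that a plain sum is not bounded by 1, so the result behaves as a score for ranking users rather than as a calibrated probability.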
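S205. Establish a total probability function using the probabilities that all users to be reviewed are abnormal users.
The probabilities that all users to be reviewed are abnormal users are summed to obtain the total probability function. When one of the users to be reviewed changes from an abnormal user to a normal user, or from a normal user to an abnormal user, the probability that this user is an abnormal user changes, so the value of the total probability function changes as well.
S206. Initialize an abnormal user set and a normal user set so as to divide the plurality of users to be reviewed between the abnormal user set and the normal user set.
In an optional embodiment, all users to be reviewed may first be treated as normal users; the set of users obtained when the total probability function is maximized is taken as the abnormal user set, and the set of all remaining users to be reviewed is taken as the normal user set.
The total number of users to be reviewed is num(Ω). The proportion s of abnormal users can be obtained statistically from the specified number of sample users, and the product of s and num(Ω) gives the number of abnormal users among the users to be reviewed, num(Ω_1). Under the assumption that all users to be reviewed are normal users, the probability that each user is an abnormal user is calculated; the num(Ω_1) users that make the total probability function largest form the abnormal user set P, and the remaining users form the normal user set Q.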
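One way to read this initialization is as a top-k selection: with every user provisionally normal, the num(Ω_1) users with the highest individual probabilities give the largest sum. The sketch below follows that reading. The function name, rounding num(Ω_1) down to an integer, and breaking ties by user identifier are assumptions that the description does not specify.

```python
import math
from typing import Dict, List, Tuple


def initialize_sets(prob: Dict[str, float], abnormal_ratio: float) -> Tuple[List[str], List[str]]:
    """Split users into an initial abnormal set P and normal set Q.

    prob maps a user identifier to that user's probability of being abnormal,
    computed with all users provisionally treated as normal.
    """
    k = math.floor(abnormal_ratio * len(prob))  # num(Ω1) = s × num(Ω), rounded down (assumption)
    ranked = sorted(prob, key=lambda uid: (-prob[uid], uid))  # highest probability first
    return ranked[:k], ranked[k:]
```

For four users with probabilities 0.1, 0.5, 0.3, and 0.05 and a ratio of 0.5, the two users scored 0.5 and 0.3 form P and the other two form Q.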
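In another optional embodiment, the historical behavior data of each user to be reviewed may be input into a pre-trained classification model to obtain a classification result for that user, the classification result being the probability that the user is an abnormal user or a normal user. According to that probability, the user is assigned to the abnormal user set or the normal user set. For example, a regression neural network, a deep neural network, a recurrent neural network, or a similar network may be trained to classify the users to be reviewed preliminarily and obtain, for each user, the classification probability of being an abnormal user or a normal user. The num(Ω_1) users with the highest classification probability of being abnormal form the abnormal user set, and the remaining users are assigned to the normal user set.
In practice, the abnormal user set and the normal user set may also be initialized in other ways.
S207. Calculate the first function value of the total probability function on the basis of the abnormal user set and the normal user set.
The users in the abnormal user set are treated as abnormal users and the users in the normal user set as normal users in order to calculate the first function value of the total probability function. Because the total probability function is the sum of the probabilities of all users to be reviewed, once the users have been initialized as normal or abnormal, the probability of each user changes and the value of the total probability function changes accordingly, which gives the first function value S_0.
S208. Traverse the abnormal user set and the normal user set to recalculate the second function value of the total probability function.
In an optional embodiment of the present application, after the normal user set and the abnormal user set are initialized, the normal user set may be traversed so that the first user to be reviewed currently reached in the normal user set is treated as an abnormal user. After that first user is reached, the abnormal user set is traversed so that the second user to be reviewed currently reached in the abnormal user set is treated as a normal user. After that second user is reached, the second function value S_1 of the total probability function is calculated on the basis of the current abnormal user set and the current normal user set. These operations (treating the currently reached second user as a normal user and calculating S_1) are repeated until the abnormal user set has been fully traversed, which yields multiple second function values S_1, among which the minimum second function value S_1min is determined.
S209. Update the abnormal user set and the normal user set according to the first function value and the second function value.
The minimum second function value may be determined among the multiple second function values obtained after each full traversal of the abnormal user set. When the minimum second function value is greater than the first function value, the second user to be reviewed corresponding to the minimum second function value is identified in the abnormal user set, determined to be a normal user, and moved to the normal user set, and the first user to be reviewed currently reached is determined to be an abnormal user and moved from the normal user set to the abnormal user set. The process then returns to the step of traversing the normal user set so that the first user to be reviewed currently reached in the normal user set is treated as an abnormal user.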
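To illustrate how the normal user set and the abnormal user set are traversed and updated in this embodiment, consider the following example:
In one example, denote the normal user set as Q = {Q1, Q2, ..., Qn}, where every user to be reviewed Qn in Q is a normal user, and denote the abnormal user set as P = {P1, P2, ..., Pm}, where every user to be reviewed Pm in P is an abnormal user. The abnormal user set and the normal user set are traversed as follows:
S1. Reach user Q1 in the normal user set Q and change Q1 to an abnormal user;
S2. Reach user P1 in the abnormal user set P and change P1 to a normal user, calculate the second function value S_11 of the total probability function on the basis of the current abnormal user set and the current normal user set, and then change P1 back to an abnormal user;
S3. Continue to user P2 in the abnormal user set P and change P2 to a normal user, calculate the second function value S_12 of the total probability function on the basis of the current abnormal user set and normal user set, and then change P2 back to an abnormal user. Continuing in this way, each user Pm reached in P yields one second function value S_1m, and multiple second function values are obtained once P has been fully traversed;
S4. Determine the minimum second function value S_1min among the multiple second function values S_1m;
S5. When the minimum second function value S_1min is greater than the first function value S_0, the user Pm in the abnormal user set P that corresponds to S_1min is determined to be a normal user and moved to the normal user set Q, and the user Qn currently reached in Q is determined to be an abnormal user and moved to the abnormal user set P. Return to S1 and continue with the next user in Q, repeating S1 to S5 until all users in Q have been traversed.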
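Steps S1 to S5 can be sketched as a pairwise swap search. The objective is passed in as a hypothetical callable `total_fn` that maps the current abnormal set to a function value, because the exact objective is only available as a figure in the source. Updating S_0 to the accepted value after a swap and visiting only the users that were in Q at the start are assumptions. The `select` parameter defaults to `min`, as the text prescribes; passing `max` gives a greedier variant that is not described in the source.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Tuple


def swap_search(
    abnormal: Iterable[Hashable],
    normal: Iterable[Hashable],
    total_fn: Callable[[FrozenSet[Hashable]], float],
    select: Callable = min,
) -> Tuple[List[Hashable], List[Hashable], float]:
    """Exchange users between P (abnormal) and Q (normal) following S1 to S5."""
    abnormal, normal = list(abnormal), list(normal)
    s0 = total_fn(frozenset(abnormal))              # first function value S_0
    for q in list(normal):                          # S1: visit each user initially in Q
        trials = []
        for p in abnormal:                          # S2-S3: flip one P user at a time
            trial_set = (frozenset(abnormal) - {p}) | {q}
            trials.append((total_fn(trial_set), p))  # second function value S_1m
        if not trials:
            break                                   # empty P: nothing to exchange
        s_sel, p_sel = select(trials, key=lambda t: t[0])  # S4: S_1min by default
        if s_sel > s0:                              # S5: accept the exchange
            abnormal[abnormal.index(p_sel)] = q
            normal[normal.index(q)] = p_sel
            s0 = s_sel                              # assumption: S_0 tracks the accepted value
    return abnormal, normal, s0
```

Because every exchange moves one user in each direction, the size of the abnormal set never changes, so a size constraint that holds for the initial sets continues to hold. With the default `min`, an exchange is accepted only when every possible exchange for the current Q user would exceed S_0, which makes the update conservative.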
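As the traversal process above shows, as the normal user set and the abnormal user set are updated, the value of the total probability function keeps increasing, and the maximum is found in this way. The stopping condition may be that, after the normal user set has been fully traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed. That is, the problem of solving the total probability function is as follows:
Figure PCTCN2021115230-appb-000001
s.t.
r × num(Ω_1) ≤ num(V_1) ≤ num(Ω_1)
Figure PCTCN2021115230-appb-000002
In the above formulas, P(x_i) is the probability that the i-th user to be reviewed is an abnormal user, Ω is the set of all users to be reviewed, s.t. (subject to) indicates that a condition must be satisfied,
Figure PCTCN2021115230-appb-000003
denotes any user to be reviewed in the set of all users to be reviewed, r is the preset recall rate, V_1 is the abnormal user set after the traversal terminates, num(V_1) is the number of abnormal users in the abnormal user set, num(Ω_1) is the number of all abnormal users among all users to be reviewed, num(Ω_1) = s × num(Ω), and s is the preset proportion of abnormal users.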
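The size constraint on the candidate set is simple arithmetic and can be checked as in the sketch below. The function names are hypothetical, and the inclusive bounds follow the inequality as written in the formula (the surrounding prose says "less than" and "greater than").

```python
from typing import Tuple


def candidate_count_bounds(num_users: int, abnormal_ratio: float, recall: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on num(V1), the size of the final abnormal set."""
    upper = abnormal_ratio * num_users  # num(Ω1) = s × num(Ω)
    lower = recall * upper              # r × num(Ω1)
    return lower, upper


def satisfies_constraint(num_candidates: int, num_users: int,
                         abnormal_ratio: float, recall: float) -> bool:
    """Check r × num(Ω1) <= num(V1) <= num(Ω1)."""
    lower, upper = candidate_count_bounds(num_users, abnormal_ratio, recall)
    return lower <= num_candidates <= upper
```

For instance, with 1000 users to be reviewed, s = 0.05, and r = 0.8, a candidate set of 45 users lies between the bounds of about 40 and 50, whereas a set of 30 or 60 users does not.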
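S210. After the traversal stops, determine the users to be reviewed contained in the abnormal user set as the candidate users.
The stopping condition is: after the normal user set has been fully traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among all users to be reviewed and greater than the product of that total number and the recall rate, where the total number is the product of the preset proportion of abnormal users and the number of all users to be reviewed.
The users to be reviewed in the abnormal user set obtained after the normal user set has been fully traversed and the preset condition is satisfied are the candidate users, and the candidate users may be reviewed manually.
S211. Review the candidate users to obtain abnormal users.
In an optional embodiment of the present application, the user identifiers of the candidate users may be sent to a review back end, where the candidate users are reviewed. If a candidate user is determined by manual review to be an abnormal user, for example a pedophile user, the candidate user is labeled as an abnormal user; otherwise the candidate user is labeled as a normal user.
In an optional embodiment of the present application, after manual review has identified the abnormal users among the candidate users, the abnormal users may be given an abnormal user label and the other users to be reviewed may be given a normal user label. The users carrying an abnormal user label or a normal user label are then used as sample users to update the sample data.
In this embodiment, sample data is acquired that includes valid features and the preset probability that a user is an abnormal user when the event associated with a valid feature occurs. After the historical behavior data of a plurality of users to be reviewed is acquired from the tracking points of multiple platforms, a plurality of preset valid features is extracted from the historical behavior data for each user to be reviewed, and the preset probabilities of the events associated with those features are summed to obtain the probability that the user is an abnormal user. A total probability function is established from the probabilities of all users to be reviewed. After the abnormal user set and the normal user set are initialized, the first function value of the total probability function is calculated on the basis of the two sets, the two sets are traversed to recalculate the second function value of the total probability function, and the two sets are updated according to the first and second function values. When the maximum value of the total probability function is obtained after the normal user set has been fully traversed under the preset condition, the users to be reviewed contained in the abnormal user set are determined as candidate users, and the candidate users are reviewed to obtain abnormal users. On the one hand, valid features are preset from the sample data, extracted from the historical behavior data, and converted into probabilities of the associated events. This makes it possible to extract valid features for abnormal user review from the historical data formed by interactions among the users to be reviewed and turns feature computation into probability computation, which solves the problem that inaccurate data degrades the accuracy of abnormal user review. On the other hand, constructing a total probability function and solving it over all users to be reviewed as a whole both exploits the interactions among the users and lets their classification results influence one another, so that candidate users can be determined accurately for review.
Because behavior features are extracted from the historical behavior data of the sample users and valid features are determined from them, valid features can be set for pedophilia or other violations in order to identify abnormal users. This addresses the problem in the related art that pedophile users behave adversarially, using variants of pornographic words that only pedophile users understand, or even coining vocabulary used exclusively among themselves, to evade detection, so that valid features related to pedophilia cannot be extracted. Valid features can be set specifically for pedophile users to detect them, which improves the accuracy of pedophile user detection.
After the abnormal users are determined, the abnormal users are given an abnormal user label and the normal users are given a normal user label, and the labeled users are used as sample users to update the valid features of the sample data and the preset probability that a user is an abnormal user when the event associated with a valid feature occurs. On the one hand, the sample data can be updated dynamically so that candidate users are solved for dynamically; on the other hand, this adds a data source for sample users and reduces the cost of acquiring them.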
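The feedback loop in the preceding paragraph can be illustrated with running counts that are updated each time a reviewed user is labeled. The class name `SampleStatistics` and its methods are hypothetical, and the relative-frequency estimate of the preset probability is an assumption, since the description does not fix a particular statistical method.

```python
from collections import defaultdict
from typing import Iterable


class SampleStatistics:
    """Running counts from which preset probabilities can be re-estimated."""

    def __init__(self) -> None:
        self._with_event = defaultdict(int)           # labeled users exhibiting each event
        self._abnormal_with_event = defaultdict(int)  # of those, users labeled abnormal

    def add_labeled_user(self, events: Iterable[str], is_abnormal: bool) -> None:
        """Fold one reviewed and labeled user into the sample data."""
        for event in set(events):
            self._with_event[event] += 1
            if is_abnormal:
                self._abnormal_with_event[event] += 1

    def preset_probability(self, event: str) -> float:
        """Relative-frequency estimate of P(abnormal | event) from the labels so far."""
        seen = self._with_event.get(event, 0)
        if seen == 0:
            raise KeyError(event)
        return self._abnormal_with_event.get(event, 0) / seen
```

Each review round then only needs to call `add_labeled_user` for the newly labeled users before the next round reads the updated probabilities.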
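Embodiment 3
FIG. 3 is a structural block diagram of an abnormal user review apparatus provided in Embodiment 3 of the present application. As shown in FIG. 3, the abnormal user review apparatus of this embodiment may include the following modules:
a historical behavior data acquisition module 301, configured to acquire historical behavior data of a plurality of users to be reviewed, where the historical behavior data includes behavior data formed by interactions among the plurality of users to be reviewed; a feature extraction module 302, configured to extract, for each user to be reviewed, a plurality of preset valid features from the historical behavior data, where the preset valid features are features in preset sample data; a user probability calculation module 303, configured to calculate the probability that the user to be reviewed is an abnormal user according to the preset probabilities of the events associated with the plurality of preset valid features; a total probability function establishment module 304, configured to establish a total probability function using the probabilities that all users to be reviewed are abnormal users; a total probability function solving module 305, configured to solve for the maximum of the total probability function under a preset condition as a constraint to determine candidate users; and a review module 306, configured to review the candidate users to obtain abnormal users.
The abnormal user review apparatus provided in this embodiment can execute the abnormal user review method provided in Embodiment 1 or Embodiment 2 of the present application, and has the functional modules and effects corresponding to the executed method.
Embodiment 4
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application. As shown in FIG. 4, the electronic device may include a processor 401, a memory 402, a display screen 403 with a touch function, an input device 404, an output device 405, and a communication device 406. The electronic device may have one or more processors 401; FIG. 4 takes one processor 401 as an example. The processor 401, memory 402, display screen 403, input device 404, output device 405, and communication device 406 of the electronic device may be connected by a bus or by other means; FIG. 4 takes connection by a bus as an example. The electronic device is configured to execute the abnormal user review method provided by any embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium. When the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to execute the abnormal user review method described in the method embodiments above.
Because the apparatus, electronic device, and storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the method embodiments.
In the description herein, references to the terms "one embodiment", "some embodiments", "example", or "some examples" mean that the features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. Herein, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the described features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.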

Claims (16)

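  1. An abnormal user review method, comprising:
    acquiring historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data comprises behavior data formed by interactions among the plurality of users to be reviewed;
    for each user to be reviewed, extracting a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data;
    calculating the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of valid features;
    establishing a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users;
    solving for the maximum of the total probability function under a preset condition as a constraint to determine candidate users;
    reviewing the candidate users to obtain abnormal users.
  2. The method according to claim 1, further comprising, before the acquiring historical behavior data of a plurality of users to be reviewed:
    acquiring sample data, wherein the sample data comprises valid features and a preset probability that a user is an abnormal user in a case where the event associated with a valid feature occurs.
  3. The method according to claim 2, wherein the acquiring sample data comprises:
    acquiring historical behavior data of a specified number of sample users, wherein the specified number of sample users are users labeled with an abnormal user label or a normal user label;
    extracting a plurality of valid features from the historical behavior data of the specified number of sample users;
    taking the probability that a sample user is an abnormal user when the event associated with a valid feature occurs as the preset probability.
  4. The method according to claim 3, wherein the extracting a plurality of valid features from the historical behavior data of the specified number of sample users comprises:
    extracting a plurality of behavior features from the historical behavior data of the specified number of sample users;
    for each behavior feature, obtaining a first user count of sample users that have the behavior feature and are abnormal users, and obtaining a second user count of sample users that have the behavior feature and are normal users;
    obtaining a third user count of abnormal users and a fourth user count of normal users among the specified number of sample users;
    calculating a first ratio of the first user count to the third user count, and calculating a second ratio of the second user count to the fourth user count;
    calculating the absolute value of the difference between the first ratio and the second ratio;
    determining that the behavior feature is a valid feature in a case where the absolute value is greater than a preset threshold.
  5. The method according to claim 1, wherein the acquiring historical behavior data of a plurality of users to be reviewed comprises:
    acquiring the historical behavior data of the plurality of users to be reviewed from tracking points of multiple platforms, wherein the historical behavior data comprises at least one of the following:
    the user's gender, the videos the user liked, the user's comments, the accounts the user follows, the accounts that follow the user, the videos the user shared, the number of videos the user liked, and the number of videos the user published.
  6. The method according to claim 1, wherein the calculating the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of valid features comprises:
    summing the preset probabilities of the events associated with the plurality of preset valid features to obtain the probability that each user to be reviewed is an abnormal user.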
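  7. The method according to any one of claims 1-6, wherein the solving for the maximum of the total probability function under a preset condition as a constraint to determine candidate users comprises:
    initializing an abnormal user set and a normal user set so as to divide the plurality of users to be reviewed between the abnormal user set and the normal user set;
    calculating a first function value of the total probability function on the basis of the abnormal user set and the normal user set;
    traversing the abnormal user set and the normal user set to recalculate a second function value of the total probability function;
    updating the abnormal user set and the normal user set according to the first function value and the second function value, and returning to the step of traversing the abnormal user set and the normal user set to recalculate the second function value of the total probability function, until, after the normal user set has been fully traversed, the number of users to be reviewed in the abnormal user set is less than the total number of abnormal users among the plurality of users to be reviewed and greater than the product of the total number and a recall rate, wherein the total number is the product of a preset proportion of abnormal users and the number of the plurality of users to be reviewed;
    determining the users to be reviewed contained in the abnormal user set as the candidate users.
  8. The method according to claim 7, wherein the initializing an abnormal user set and a normal user set so as to divide the plurality of users to be reviewed between the abnormal user set and the normal user set comprises:
    treating the plurality of users to be reviewed as normal users, and taking the set of users to be reviewed obtained when the total probability function is maximized as the abnormal user set;
    taking the set of users to be reviewed other than those in the abnormal user set as the normal user set.
  9. The method according to claim 7, wherein the initializing an abnormal user set and a normal user set so as to divide the plurality of users to be reviewed between the abnormal user set and the normal user set comprises:
    inputting the historical behavior data of each user to be reviewed into a pre-trained classification model to obtain a classification result for each user to be reviewed, wherein the classification result is that the user to be reviewed is an abnormal user or a normal user;
    assigning each user to be reviewed to the abnormal user set or the normal user set according to the classification result.
  10. The method according to claim 7, wherein the traversing the abnormal user set and the normal user set to recalculate a second function value of the total probability function comprises:
    traversing the normal user set so that a first user to be reviewed currently reached in the normal user set is treated as an abnormal user;
    after the first user to be reviewed in the normal user set is reached, traversing the abnormal user set so that a second user to be reviewed currently reached in the abnormal user set is treated as a normal user;
    after the second user to be reviewed in the abnormal user set is reached, calculating the second function value of the total probability function on the basis of the current abnormal user set and the current normal user set;
    repeating the operations of traversing the abnormal user set so that the second user to be reviewed currently reached in the abnormal user set is treated as a normal user and, after the second user to be reviewed in the abnormal user set is reached, calculating the second function value of the total probability function on the basis of the current abnormal user set and the current normal user set, until the abnormal user set has been fully traversed, to obtain a plurality of second function values;
    determining a minimum second function value among the plurality of second function values.
  11. The method according to claim 10, wherein the updating the abnormal user set and the normal user set according to the first function value and the second function value comprises:
    in a case where the minimum second function value is greater than the first function value, identifying, in the abnormal user set, the second user to be reviewed corresponding to the minimum second function value;
    determining the second user to be reviewed corresponding to the minimum second function value to be a normal user and moving that user to the normal user set, determining the first user to be reviewed currently reached to be an abnormal user and moving that user from the normal user set to the abnormal user set, and returning to the operation of traversing the normal user set so that the first user to be reviewed currently reached in the normal user set is treated as an abnormal user.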
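  12. The method according to any one of claims 1-6, wherein the reviewing the candidate users to obtain abnormal users comprises:
    sending the user identifiers of the candidate users to a review back end, so that the review back end performs manual review of the candidate users;
    receiving a review result sent by the review back end, wherein the review result comprises the abnormal users identified among the candidate users.
  13. The method according to claim 12, further comprising, after the receiving a review result sent by the review back end:
    giving the abnormal users an abnormal user label, and giving the users to be reviewed other than the abnormal users a normal user label;
    using the users given an abnormal user label or a normal user label as sample users to update the sample data.
  14. An abnormal user review apparatus, comprising:
    a historical behavior data acquisition module, configured to acquire historical behavior data of a plurality of users to be reviewed, wherein the historical behavior data comprises behavior data formed by interactions among the plurality of users to be reviewed;
    a feature extraction module, configured to extract, for each user to be reviewed, a plurality of preset valid features from the historical behavior data, wherein the preset valid features are features in preset sample data;
    a user probability calculation module, configured to calculate the probability that each user to be reviewed is an abnormal user according to preset probabilities of events associated with the plurality of preset valid features;
    a total probability function establishment module, configured to establish a total probability function using the probabilities that the plurality of users to be reviewed are abnormal users;
    a total probability function solving module, configured to solve for the maximum of the total probability function under a preset condition as a constraint to determine candidate users;
    a review module, configured to review the candidate users to obtain abnormal users.
  15. An electronic device, comprising:
    at least one processor;
    a storage apparatus, configured to store at least one program;
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the abnormal user review method according to any one of claims 1-13.
  16. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the abnormal user review method according to any one of claims 1-13.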
PCT/CN2021/115230 2020-09-30 2021-08-30 Abnormal user review method and apparatus, electronic device, and storage medium WO2022068493A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/245,653 US20230336637A1 (en) 2020-09-30 2021-08-30 Method and apparatus for moderating abnormal users, electronic device, and storage medium
EP21874154.4A EP4198775A4 (en) 2020-09-30 2021-08-30 ABNORMAL USER AUDIT METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011066006.0A CN112199640B (zh) 2020-09-30 2020-09-30 Abnormal user review method and apparatus, electronic device, and storage medium
CN202011066006.0 2020-09-30

Publications (1)

Publication Number Publication Date
WO2022068493A1 true WO2022068493A1 (zh) 2022-04-07

Family

ID=74013675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115230 WO2022068493A1 (zh) 2020-09-30 2021-08-30 Abnormal user review method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
US (1) US20230336637A1 (zh)
EP (1) EP4198775A4 (zh)
CN (1) CN112199640B (zh)
WO (1) WO2022068493A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994203A (zh) * 2023-02-20 2023-04-21 广州佰锐网络科技有限公司 AI-based data annotation processing method and system, and AI middle platform

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
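CN112199640B (zh) * 2020-09-30 2024-03-12 广州市百果园网络科技有限公司 Abnormal user review method and apparatus, electronic device, and storage medium
CN113766256A (zh) * 2021-02-09 2021-12-07 北京沃东天骏信息技术有限公司 Live-streaming risk control method and apparatus
CN113163218A (zh) * 2021-02-09 2021-07-23 百果园技术(新加坡)有限公司 Method and system for detecting users in a live-streaming room, electronic device, and storage medium
CN113255929B (zh) * 2021-05-27 2023-04-18 支付宝(中国)网络技术有限公司 Method and apparatus for obtaining interpretable reasons for abnormal users
CN113485305B (zh) * 2021-07-28 2023-04-07 成都飞机工业(集团)有限责任公司 Aircraft field-service fault diagnosis system and method
CN114205676B (zh) * 2021-12-08 2024-05-28 广州方硅信息技术有限公司 Live-streaming monitoring method, apparatus, medium, and computer device
CN116488934A (zh) * 2023-05-29 2023-07-25 无锡车联天下信息技术有限公司 Domain-controller-based network security management method and system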

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph
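CN107517251A (zh) * 2017-08-16 2017-12-26 北京小度信息科技有限公司 Information push method and apparatus
CN110237530A (zh) * 2019-06-14 2019-09-17 腾讯科技(深圳)有限公司 Abnormal behavior detection method and apparatus, and readable storage medium
CN110929799A (zh) * 2019-11-29 2020-03-27 上海盛付通电子支付服务有限公司 Method, electronic device, and computer-readable medium for detecting abnormal users
CN111090813A (zh) * 2019-12-20 2020-05-01 腾讯科技(深圳)有限公司 Content processing method and apparatus, and computer-readable storage medium
CN112199640A (zh) * 2020-09-30 2021-01-08 广州市百果园网络科技有限公司 Abnormal user review method and apparatus, electronic device, and storage medium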

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008141256A2 (en) * 2007-05-10 2008-11-20 Mary Kay Hoal Social networking system
CN103853841A (zh) * 2014-03-19 2014-06-11 北京邮电大学 Method for analyzing abnormal behavior of social network users
US9985916B2 (en) * 2015-03-03 2018-05-29 International Business Machines Corporation Moderating online discussion using graphical text analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4198775A4 *

Also Published As

Publication number Publication date
EP4198775A1 (en) 2023-06-21
US20230336637A1 (en) 2023-10-19
EP4198775A4 (en) 2024-03-13
CN112199640B (zh) 2024-03-12
CN112199640A (zh) 2021-01-08

Similar Documents

Publication Publication Date Title
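WO2022068493A1 (zh) Abnormal user review method and apparatus, electronic device, and storage medium
CN110162593B (zh) Search result processing and similarity model training method and apparatus
US11288444B2 (en) Optimization techniques for artificial intelligence
US11036791B2 (en) Computerized system and method for determining non-redundant tags from a user's network activity
CN105279495B (zh) Video description method based on deep learning and text summarization
US11514244B2 (en) Structured knowledge modeling and extraction from images
CN106951422B (zh) Web page training method and apparatus, and search intent recognition method and apparatus
US9087297B1 (en) Accurate video concept recognition via classifier combination
CN110781273B (zh) Text data processing method and apparatus, electronic device, and storage medium
CN106874253A (zh) Method and apparatus for identifying sensitive information
CN102637163A (zh) Control method and system for semantics-based multi-level ontology matching
CN108108353B (zh) Bullet-comment-based video semantic annotation method and apparatus, and electronic device
Zhang et al. Cross-modal image sentiment analysis via deep correlation of textual semantic
CN104077417A (zh) Method and system for recommending person tags in social networks
CN110598070A (zh) Application type identification method and apparatus, server, and storage medium
CN113688310B (zh) Content recommendation method, apparatus, device, and storage medium
CN113301442A (zh) Method, device, medium, and program product for determining live-streaming resources
CN112183881A (zh) Social-network-based public opinion event prediction method, device, and storage medium
CN111143508B (zh) Event detection and tracking method and system based on communication short texts
Yuan et al. Sentiment analysis using social multimedia
CN113535949A (zh) Multimodal joint event detection method based on pictures and sentences
CN115775349A (zh) Fake news detection method and apparatus based on multimodal fusion
CN117891939A (zh) Text classification method combining particle swarm optimization with a CNN convolutional neural network
Karthikeyan et al. Machine learning techniques application: social media, agriculture, and scheduling in distributed systems
CN113901817A (zh) Document classification method and apparatus, computer device, and storage medium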

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21874154

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202327016666

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 2021874154

Country of ref document: EP

Effective date: 20230316

WWE Wipo information: entry into national phase

Ref document number: 2023107430

Country of ref document: RU

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 523440093

Country of ref document: SA

WWE Wipo information: entry into national phase

Ref document number: 523440093

Country of ref document: SA