WO2019165697A1 - Method and device for identifying click farming users, terminal device and storage medium

Info

Publication number
WO2019165697A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
users
risk score
popular
target
Prior art date
Application number
PCT/CN2018/084638
Other languages
French (fr)
Chinese (zh)
Inventor
王璐
陈少杰
张文明
Original Assignee
武汉斗鱼网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉斗鱼网络科技有限公司
Publication of WO2019165697A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods

Definitions

  • The present disclosure relates to the field of Internet technologies, and in particular to a method, an apparatus, a terminal device, and a storage medium for identifying click farming users.
  • The current approach to identifying abnormal click farming behavior relies on strong rules that match obvious abnormal features. This approach can identify some risky users, but its conditions are very strict, so it only finds cheating users with obvious features and misses users who cheat without showing such features. It is therefore necessary to provide a more accurate method for identifying click farming users.
  • One object of the present disclosure is to provide a method for identifying click farming users that improves the accuracy of identifying such users.
  • Another object of the present disclosure is to provide an identification device for click farming users that improves the accuracy of identifying such users.
  • Another object of the present disclosure is to provide a terminal device that improves the accuracy of identifying click farming users.
  • An embodiment of the present disclosure provides a method for identifying click farming users, the method comprising: acquiring user features of all users; determining a first risk score according to the user features corresponding to each user; identifying the users determined to be click farming users among all users as target click farming users; identifying the users determined not to be click farming users among all users as target normal users; determining a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users; determining, according to the first risk score and the second risk score, a final risk score of each user being a click farming user; and identifying other click farming users according to the final risk score.
  • An embodiment of the present disclosure further provides an identification device for click farming users, the device comprising: an acquisition module configured to acquire user features of all users; a first determining module configured to determine a first risk score according to the user features corresponding to each user; a first identification module configured to identify the users determined to be click farming users among all users as target click farming users; a second identification module configured to identify the users determined not to be click farming users among all users as target normal users; a second determining module configured to determine a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users; and a score determining module configured to determine, according to the first risk score and the second risk score, a final risk score of each user being a click farming user, and to identify other click farming users according to the final risk score.
  • An embodiment of the present disclosure further provides a terminal device including a memory and a processor, the memory being configured to store computer program code, and the processor being configured to execute the computer program code stored in the memory to implement the method for identifying click farming users.
  • An embodiment of the present disclosure further provides a readable storage medium storing executable instructions which, when executed by one or more processors, implement the method for identifying click farming users.
  • An embodiment of the present disclosure provides a method, an apparatus, a terminal device, and a storage medium for identifying click farming users; the method and the apparatus are applied to the terminal device.
  • The method includes acquiring user features of all users, determining a first risk score according to the user features corresponding to each user, identifying the users determined to be click farming users among all users as target click farming users, and identifying the users determined not to be click farming users among all users as target normal users.
  • The second risk score of each user is determined according to the target click farming users and the target normal users.
  • The final risk score of each user is determined according to the first risk score and the second risk score.
  • Because the first risk score and the second risk score of the current user are calculated from two different aspects and then combined into the final risk score, click farming users are less likely to be missed, and the accuracy of identifying a user as a click farming user improves.
  • FIG. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of a method for identifying click farming users according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of one sub-step of the method for identifying click farming users according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of another sub-step of the method for identifying click farming users according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the functional modules of an identification device for click farming users according to the present disclosure.
  • FIG. 1 is a schematic structural diagram of a terminal device 100 according to an embodiment of the present disclosure.
  • The method for identifying click farming users is applied to the terminal device 100, which may be, but is not limited to, a smart electronic device such as a tablet or a desktop computer.
  • The terminal device 100 includes an identification device 110 for click farming users, a memory 120, a memory controller 130, a processor 140, a peripheral interface 150, an input/output unit 160, an audio unit 170, and a display unit 180.
  • The memory 120, the memory controller 130, the processor 140, the peripheral interface 150, the input/output unit 160, the audio unit 170, and the display unit 180 are directly or indirectly electrically connected to each other to implement data transmission or interaction.
  • The components can be electrically connected to one another via one or more communication buses or signal lines.
  • The identification device 110 for click farming users includes at least one software function module that can be stored in the memory 120, or built into the operating system (OS) of the terminal device 100, in the form of software or firmware.
  • The processor 140 is configured to execute executable modules stored in the memory 120, such as the software function modules or computer programs included in the identification device 110 for click farming users.
  • The memory 120 can be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
  • The memory 120 is configured to store a program, and the processor 140 executes the program after receiving an execution instruction. The method executed by the terminal device 100, as defined by the flow disclosed in any of the embodiments of the present disclosure, may be applied to the processor 140 or implemented by the processor 140.
  • The processor 140 may be an integrated circuit chip with signal processing capabilities.
  • The processor 140 may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP), or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The general-purpose processor may be a microprocessor, or the processor 140 may be any conventional processor or the like.
  • The peripheral interface 150 couples various input/output devices to the processor 140 and to the memory 120.
  • In some instances, the peripheral interface 150, the processor 140, and the memory controller 130 can be implemented in a single chip. In other instances, they can be implemented by separate chips.
  • The input/output unit 160 is configured to let the user enter data, enabling user interaction with the terminal device 100.
  • The input/output unit 160 may be, but is not limited to, a mouse, a keyboard, and the like.
  • The audio unit 170 provides an audio interface to the user, which may include one or more microphones, one or more speakers, and audio circuitry.
  • The display unit 180 provides an interactive interface (e.g., a user interface) between the terminal device 100 and the user, or is configured to display image data to the user for reference.
  • The display unit 180 can be a liquid crystal display or a touch display.
  • If it is a touch display, it can be a capacitive or resistive touch screen that supports single-point and multi-touch operations. Supporting single-point and multi-touch operations means that the touch display can sense touch operations occurring simultaneously at one or more locations on the display and pass the sensed touch operations to the processor 140 for calculation and processing.
  • The method for identifying click farming users may be used to identify users who engage in click farming behavior, such as fake orders, on major websites.
  • The website can be, but is not limited to, an e-commerce website (such as Taobao), a live broadcast platform, and the like.
  • In the following description, the method is applied to a live broadcast platform as an example.
  • The method includes:
  • Step S110: acquiring user features of all users.
  • The user features are features related to the user's click farming behavior.
  • A click farming user usually inflates popularity in multiple live rooms within a short time, or posts bullet comments in multiple live rooms within a short time. Accordingly, the number of live rooms the user watches within a predetermined time, or the number of bullet comments the user posts in multiple live rooms within a predetermined time, is a user feature that needs to be acquired.
  • Each user has multiple user features.
  • For a user who views multiple live rooms in a short time, the user features can be data such as the start time and end time of each viewing of each live room and the viewing duration for each live room. For example, the user features of user A may be: the start time M1, the end time M2, and the viewing duration M3 of user A watching live room X.
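  • As a minimal sketch of how such features might be derived, the following assumes the viewing data is available as (user, room, start, end) records with times in seconds. The record layout, the function name `extract_user_features`, and the feature names are illustrative assumptions, not part of the disclosure.

```python
from collections import defaultdict


def extract_user_features(watch_log):
    """Aggregate per-user viewing features from (user, room, start, end) records.

    Times are in seconds; the record layout is an assumption for illustration.
    """
    per_user = defaultdict(list)
    for user, room, start, end in watch_log:
        # Keep start time, end time and viewing duration for every viewing.
        per_user[user].append((room, start, end, end - start))

    features = {}
    for user, views in per_user.items():
        durations = [duration for _, _, _, duration in views]
        features[user] = {
            "rooms_watched": len({room for room, _, _, _ in views}),
            "mean_watch_seconds": sum(durations) / len(durations),
            "views": views,
        }
    return features


watch_log = [
    ("A", "X", 0, 30),    # user A watches room X for 30 seconds
    ("A", "Y", 40, 50),   # then room Y for 10 seconds
    ("B", "X", 0, 3600),  # user B watches room X for an hour
]
features = extract_user_features(watch_log)
```

  • A user who hops between many rooms with short viewing durations, like user A here, ends up with a high room count and a low mean duration.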
  • The terminal device may acquire the user features of all users in, for example, one of the following three ways.
  • In the first way, the terminal device obtains the user log generated during the whole live broadcast of a live room. The user log records, for each user who views the live room during the broadcast, the viewing of all live rooms watched, including this one. The start time, end time, and viewing duration of each user's viewing of each live room can thus be obtained, which yields the user features of all users viewing the live room.
  • In the second way, the terminal device intermittently obtains, according to a preset acquisition duration, the user log generated within that duration during the live broadcast of a live room. The user log records, for each user who views the live room within the preset acquisition duration, the viewing of all live rooms watched, including this one.
  • The start time, end time, and viewing duration of each user's viewing of each live room can thus be obtained once per preset time period, which yields the user features of all users viewing the live room. For example, if the preset acquisition duration is 2 hours and the preset time interval is 30 minutes, the terminal device obtains the user log generated within a given 2-hour period of the live broadcast.
  • This way effectively reduces the amount of data the terminal device has to process and can reduce power consumption to a certain extent.
  • In the third way, the terminal device first monitors the current popularity value of the live room, so that this value is available in real time.
  • The terminal device compares the current popularity value with a preset popularity threshold. If the current value is not less than the threshold, the live room is already popular: it has enough viewers and attracts a lot of attention (for example, a well-known anchor), so its popularity most likely reflects the anchor's own influence, and click farming is a low-probability event. In that case the terminal device does no further processing and continues monitoring.
  • If the current value is less than the threshold, the terminal device obtains the user log generated by the live room from that moment until the live broadcast ends. The user log records, for each user who views the live room between that moment and the end of the broadcast, the viewing of all live rooms watched, including this one; that is, the start time, end time, and viewing duration of each user's viewing of each live room.
  • The user features of all users viewing the live room are thereby obtained. In this way the terminal device processes logs only when user features actually need to be acquired, which effectively reduces the amount of data it has to process and, to a certain extent, its power consumption.
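  • The threshold check of the third way can be sketched as follows, assuming popularity readings arrive as a plain list of numbers in time order. The function name `first_collection_index` and its parameters are hypothetical and only illustrate the comparison described above.

```python
def first_collection_index(popularity_readings, popularity_threshold):
    """Return the index of the first reading at which log collection would start.

    Readings at or above the threshold are treated as organically popular
    (monitoring simply continues); the first reading below it triggers collection.
    Returns None if no reading falls below the threshold.
    """
    for index, value in enumerate(popularity_readings):
        if value < popularity_threshold:
            return index
    return None


start_at = first_collection_index([5000, 4200, 900, 800], popularity_threshold=1000)
never = first_collection_index([5000, 6000], popularity_threshold=1000)
```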
  • Step S120: determining a first risk score according to the user features corresponding to each user.
  • FIG. 3 is a schematic flowchart of the sub-steps of step S120 of the method for identifying click farming users of the present disclosure.
  • Step S120 includes:
  • Step S121: classifying all users according to the user features by using a predetermined algorithm.
  • The predetermined algorithm is the k-means algorithm, a clustering algorithm that groups data by repeatedly recomputing center points and distances. Based on the acquired user features, the users are divided into 10 classes with the k-means algorithm. It will be readily understood that the specific number of classes can be customized as needed.
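  • A minimal pure-Python sketch of k-means clustering is shown below. It is a generic textbook version, not the disclosure's implementation; the function name `kmeans`, the random initialization, and the fixed iteration count are assumptions. The disclosure uses 10 classes, while the toy example uses 2 to stay readable.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centers, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for idx, point in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


# One feature per user: mean viewing duration in seconds (toy data).
points = [(10.0,), (12.0,), (11.0,), (3600.0,), (3500.0,)]
centers, labels = kmeans(points, k=2)
```

  • In practice the features would usually be scaled before clustering so that no single feature dominates the distance.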
  • Step S122: selecting key indicators from the user features corresponding to each class of users.
  • A key indicator may be, for example, the user's viewing duration for each live room.
  • There are multiple key indicators; that is, the key indicators are the viewing durations corresponding to the multiple live rooms the user watches over the whole viewing process.
  • Step S124: determining the first risk score of each user according to the magnitude of the average value of the key indicators.
  • If the selected key indicator is a feature such as the user's viewing duration for each live room, then the shorter the duration, the more likely the user is click farming for the anchor.
  • Accordingly, the smaller the average of the key indicators, the higher the risk that the user is a click farming user. High-risk classes are assigned 1 point and low-risk classes are assigned 0.1 point.
  • Every user is assigned the first risk score of the class to which the user belongs.
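  • The scoring rule can be sketched as below. The text gives the two score values (1 and 0.1) but not how a class is judged high risk, so the cutoff `high_risk_cutoff` on the class average is an assumption, as are the function and parameter names.

```python
def first_risk_scores(durations_by_user, labels_by_user, high_risk_cutoff):
    """Score each user by the mean key indicator of the class the user belongs to.

    durations_by_user: user -> list of viewing durations (the key indicators).
    labels_by_user: user -> class label from the clustering step.
    high_risk_cutoff: assumed cutoff; classes averaging below it count as high risk.
    """
    totals, counts = {}, {}
    for user, durations in durations_by_user.items():
        label = labels_by_user[user]
        totals[label] = totals.get(label, 0.0) + sum(durations)
        counts[label] = counts.get(label, 0) + len(durations)
    class_mean = {label: totals[label] / counts[label] for label in totals}
    # A short average viewing time marks a class as high risk (1 point), else 0.1.
    return {
        user: 1.0 if class_mean[labels_by_user[user]] < high_risk_cutoff else 0.1
        for user in durations_by_user
    }


scores = first_risk_scores(
    {"A": [30, 10], "B": [20], "C": [3600]},
    {"A": 0, "B": 0, "C": 1},
    high_risk_cutoff=60,
)
```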
  • Step S130: identifying the users determined to be click farming users among all users as target click farming users.
  • A user whose user features satisfy a preset strong rule is selected as a target click farming user. The preset strong rule requires the user to match a predetermined, strictly defined feature, such as, but not limited to, the following.
  • The user logs in to the account from multiple different devices within a predetermined time; or, on a live broadcast platform, the backend has no log for the user, yet the user keeps posting bullet comments.
  • Satisfying a preset strong rule indicates that the user shows obvious click farming features; if a user has user features that satisfy a preset strong rule, the user is determined to be a target click farming user.
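  • One of the strong rules named above, logging in from multiple devices within a predetermined time, might be checked with a sliding window as in this sketch. The function name `is_target_click_farmer`, the event layout, and the concrete limits are hypothetical; the disclosure does not give numeric values.

```python
def is_target_click_farmer(login_events, window_seconds, max_devices):
    """Check one strong rule: too many distinct devices within a time window.

    login_events: list of (timestamp_seconds, device_id) for a single user.
    Returns True if any window of window_seconds holds more than max_devices devices.
    """
    events = sorted(login_events)
    start = 0
    for end in range(len(events)):
        # Shrink the window until it spans at most window_seconds.
        while events[end][0] - events[start][0] > window_seconds:
            start += 1
        devices = {device for _, device in events[start:end + 1]}
        if len(devices) > max_devices:
            return True
    return False


suspicious = is_target_click_farmer(
    [(0, "d1"), (100, "d2"), (200, "d3"), (300, "d4")],
    window_seconds=600,
    max_devices=3,
)
ordinary = is_target_click_farmer(
    [(0, "d1"), (5000, "d2")],
    window_seconds=600,
    max_devices=1,
)
```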
  • Step S140: identifying the users determined not to be click farming users among all users as target normal users.
  • Normal users among all users can be identified according to a predetermined rule. The predetermined rule includes a plurality of user features, which jointly determine that the corresponding user is a target normal user.
  • The most important user feature for determining that a user is normal is payment behavior, that is, whether the user has continuously paid for the services enjoyed.
  • A normal user is a user determined to have no click farming behavior.
  • Step S150: determining a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users.
  • FIG. 4 is a schematic flowchart of the sub-steps of step S150 of the method for identifying click farming users of the present disclosure.
  • Step S150 includes:
  • Step S151: constructing a network structure among all users.
  • Normally each user has a unique IP address or a fixed device address when logging in to the website or live broadcast platform, but because click farming users exist, multiple users may share the same IP address or device address.
  • Users with the same IP address or device address are connected by a line, which forms a network structure among all users.
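  • The construction of this network can be sketched as an adjacency map in which two users are linked whenever they share an IP address or device address. The function name `build_user_graph` and the (user, address) record layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(login_records):
    """Link users that share an address.

    login_records: list of (user, address), where address is an IP or device address.
    Returns a dict mapping each user to the set of users sharing an address with it.
    """
    users_by_address = defaultdict(set)
    for user, address in login_records:
        users_by_address[address].add(user)

    graph = defaultdict(set)
    for user, _ in login_records:
        graph[user]  # touch the key so isolated users still appear as nodes
    for users in users_by_address.values():
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


graph = build_user_graph([
    ("A", "1.1.1.1"),
    ("B", "1.1.1.1"),  # A and B share an IP address
    ("C", "2.2.2.2"),
    ("C", "dev-9"),
    ("D", "dev-9"),    # C and D share a device address
    ("E", "3.3.3.3"),  # E shares nothing and stays isolated
])
```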
  • Step S152: calculating the similarity weight between every two users according to the user features of the target click farming users and the target normal users.
  • Because the target click farming users and the target normal users have been determined in advance, the user features that characterize a target click farming user and those that characterize a target normal user are also known. If, in the network structure, a user is connected to a target click farming user and/or a target normal user, the user features shared by the user and the target click farming user are selected and the similarity weight between them is calculated, and the user features shared by the user and the target normal user are selected and the similarity weight between the user and the target normal user is calculated.
  • X_ui is the i-th user feature of user u
  • X_vi is the i-th user feature of user v
  • w_uv is the similarity weight between user u and user v.
  • Since only the target click farming users and the target normal users are known at the beginning of the calculation, even if a user of unknown status is connected to other unknown users as well as to target click farming users and target normal users, only the similarity weights between that user and the target click farming users and target normal users are calculated at first.
  • Once the similarity weights of the currently unknown user are known, the similarity weights of other users can be calculated according to the connections between that user and other users in the network structure.
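  • The similarity formula itself is not reproduced in this text, so the sketch below substitutes a common choice, a Gaussian kernel over the squared distance between feature vectors. This kernel, the parameter `sigma`, and the function name `similarity_weight` are assumptions and should not be read as the formula of the disclosure.

```python
import math


def similarity_weight(features_u, features_v, sigma=1.0):
    """Assumed similarity: exp(-sum_i (x_ui - x_vi)^2 / sigma^2), a value in (0, 1].

    features_u and features_v hold the user features shared by users u and v,
    in the same order. Identical vectors give 1.0; distant vectors approach 0.
    """
    squared_distance = sum((xu - xv) ** 2 for xu, xv in zip(features_u, features_v))
    return math.exp(-squared_distance / sigma ** 2)


w_same = similarity_weight([1.0, 2.0], [1.0, 2.0])
w_near = similarity_weight([1.0, 2.0], [1.5, 2.0])
w_far = similarity_weight([1.0, 2.0], [4.0, 6.0])
```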
  • Step S153: calculating the second risk score of the current user according to the similarity weights of the other users connected to the current user in the network structure and the initial scores of the target click farming users and the target normal users.
  • The second risk score of each user can be calculated from the similarity weights and the initial scores of the target click farming users and the target normal users.
  • In the first round, the score of user A is only an initially calculated score: it takes into account only the connections between user A and the target click farming users and target normal users, and ignores connections to other users.
  • When user A is connected to target click farming users, target normal users, or other suspected users, the algorithm needs to be iterated for multiple rounds to improve accuracy, so that the second risk score of user A converges to a stable value; that stable value is then used as user A's second risk score.
  • The score S(A)_0 of user A calculated in the first round serves as the base score for the second round, and the algorithm then iterates on A to obtain a more precise second risk score for user A.
  • Preferably, in the embodiment of the present disclosure, 15 iterations are performed when calculating each user's second risk score, and the score that stabilizes after these iterations is taken as the second risk score of the user.
  • In the formula for the second risk score:
  • S_k(i) is the second risk score of user i in the k-th iteration
  • the weight coefficient lies between 0 and 1
  • w_ji is the similarity weight between user j and user i.
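  • Because the formula is not reproduced in this text, the following is only a sketch of one plausible iteration consistent with the symbols above: each unlabeled user's score is a blend of its initial score and the similarity-weighted average of its neighbours' scores from the previous round, while target users keep their initial scores. The update rule, the name `alpha` for the weight coefficient, the initial values (1 for target click farming users, 0 for target normal users, 0.5 for unknown users), and all function names are assumptions.

```python
def second_risk_scores(graph, weights, initial_scores, labeled, alpha=0.85, rounds=15):
    """Propagate risk over the user graph (assumed update rule, see lead-in).

    graph: user -> set of neighbours.
    weights: frozenset({u, v}) -> similarity weight w_uv.
    initial_scores: user -> S_0; labeled: users whose scores stay fixed.
    """
    scores = dict(initial_scores)
    for _ in range(rounds):
        updated = {}
        for user, neighbours in graph.items():
            total = sum(weights[frozenset((user, n))] for n in neighbours)
            if user in labeled or total == 0:
                updated[user] = scores[user]  # targets and isolated users do not move
                continue
            neighbour_avg = sum(
                weights[frozenset((user, n))] * scores[n] for n in neighbours
            ) / total
            updated[user] = (1 - alpha) * initial_scores[user] + alpha * neighbour_avg
        scores = updated
    return scores


# Chain: F (target click farmer) - U1 - U2 - N (target normal user).
graph = {"F": {"U1"}, "U1": {"F", "U2"}, "U2": {"U1", "N"}, "N": {"U2"}}
weights = {
    frozenset(("F", "U1")): 1.0,
    frozenset(("U1", "U2")): 1.0,
    frozenset(("U2", "N")): 1.0,
}
initial = {"F": 1.0, "U1": 0.5, "U2": 0.5, "N": 0.0}
scores = second_risk_scores(graph, weights, initial, labeled={"F", "N"})
```

  • In this toy chain the user next to the known click farmer settles at a higher score than the user next to the known normal user, and 15 rounds are enough for the values to stabilize.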
  • Step S160: determining, according to the first risk score and the second risk score, a final risk score of each user being a click farming user, and identifying other click farming users according to the final risk score.
  • The user's first risk score is determined by classifying the users, the user's second risk score is determined by iterative calculation, and the user's final risk score is then determined from the first risk score and the second risk score.
  • The final risk score is calculated from the two scores using weighting coefficients W1 and W2.
  • The weighting coefficient W1 is usually smaller than W2, because the iterative algorithm judges whether a user is a click farming user more accurately than the classification does.
  • The final risk score of each user is compared with a predetermined threshold. If the final risk score is greater than the threshold, the user is determined to be a malicious click farming user. Judging users in this way is more accurate, and the current final risk score of each user also makes it possible to estimate whether the user may become a malicious user, so that subsequent monitoring can focus on that user.
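  • The combination and thresholding can be sketched as below, assuming the final score is a weighted sum of the two scores. The weighted-sum form, the example weights (chosen only so that W1 is smaller than W2), the threshold value, and the function names are assumptions, since the formula is not reproduced in this text.

```python
def final_risk_score(first_score, second_score, w1=0.3, w2=0.7):
    """Assumed combination: W1 * first score + W2 * second score, with W1 < W2."""
    return w1 * first_score + w2 * second_score


def flag_click_farmers(first_scores, second_scores, threshold, w1=0.3, w2=0.7):
    """Return the users whose final risk score exceeds the predetermined threshold."""
    return {
        user
        for user in first_scores
        if final_risk_score(first_scores[user], second_scores[user], w1, w2) > threshold
    }


flagged = flag_click_farmers(
    first_scores={"A": 1.0, "B": 0.1},
    second_scores={"A": 0.9, "B": 0.2},
    threshold=0.6,
)
```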
  • FIG. 5 is a schematic diagram of the functional modules of a device for identifying click farming users of the present disclosure.
  • The device includes an obtaining module 111, a first determining module 112, a first identification module 113, a second identification module 114, a second determining module 115, and a score determination module 116.
  • The obtaining module 111 is configured to acquire user features of all users.
  • Step S110 may be performed by the obtaining module 111.
  • The first determining module 112 is configured to determine a first risk score according to the user features corresponding to each user.
  • Steps S120 to S124 may be performed by the first determining module 112.
  • The first identification module 113 is configured to identify the users determined to be click farming users among all users as target click farming users.
  • Step S130 may be performed by the first identification module 113.
  • The second identification module 114 is configured to identify the users determined not to be click farming users among all users as target normal users.
  • Step S140 may be performed by the second identification module 114.
  • The second determining module 115 is configured to determine a second risk score of each user according to the target click farming users and the target normal users.
  • Steps S150 to S153 may be performed by the second determining module 115.
  • The score determination module 116 is configured to determine a final risk score of each user being a click farming user based on the first risk score and the second risk score, and to identify other click farming users based on the final risk score.
  • Step S160 may be performed by the score determination module 116.
  • In summary, an embodiment of the present disclosure provides a method, an apparatus, a terminal device, and a storage medium for identifying click farming users; the method and the apparatus are applied to the terminal device.
  • The method includes acquiring user features of all users, determining a first risk score according to the user features corresponding to each user, identifying the users determined to be click farming users among all users as target click farming users, and identifying the users determined not to be click farming users among all users as target normal users.
  • The second risk score of each user is determined according to the target click farming users and the target normal users.
  • The final risk score of each user is determined according to the first risk score and the second risk score.
  • The first risk score and the second risk score of the current user are calculated from two different aspects, and the final risk score of the current user being a click farming user is then calculated from both. This more elaborate calculation improves the accuracy of identifying a user as a click farming user.
  • Each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that comprises one or more executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in a different order from that illustrated in the drawings.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
  • The functional modules in the various embodiments of the present disclosure may be integrated to form a separate part, each module may exist separately, or two or more modules may be integrated to form a separate part.
  • The functions, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer-readable storage medium.
  • Such a computer-readable storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
  • The embodiments of the present disclosure provide a method, a device, a terminal device, and a storage medium for identifying click farming users.
  • The method and device for identifying click farming users are applied to the terminal device, which avoids missing click farming users and improves the accuracy of identifying them.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to the technical field of the Internet, in particular to a method and device for identifying click farming users, a terminal device and a storage medium, the method for identifying a click farming user comprising: acquiring user features of all users; determining a first risk score according to the user features corresponding to each user; further identifying a click farming user determined among all users as a target click farming user; identifying a non-click farming user determined among all users as a target normal user; determining a second risk score of each user according to the target click farming user and the target normal user; and finally, determining the final risk score of each user being a click farming user according to the first risk score and the second risk score, and identifying click farming users according to the final risk score. With the present solution, the accuracy of identifying a user as a click farming user is improved.

Description

Method, device, terminal device and storage medium for identifying click farming users
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 2018101691861, entitled "Method, Device and Terminal Device for Identifying Click Farming Users", filed with the Chinese Patent Office on February 28, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of the Internet, and in particular to a method, a device, a terminal device and a storage medium for identifying click farming users.
Background
On many Internet platforms, fake click farming behavior is common and serves various purposes, such as placing fake orders on Taobao or inflating a streamer's popularity on a live streaming platform. Such behavior severely harms the ecosystem of an Internet platform.
The current approach to identifying abnormal click farming behavior relies on strong rules, which identify users by a few conspicuous abnormal features. This approach can identify some risky users, but its requirements are very strict, so it only finds cheating users with obvious features and misses users who cheat without showing obvious features. It is therefore necessary to provide a more accurate method for identifying click farming users.
Summary of the invention
An object of the present disclosure is to provide a method for identifying click farming users, so as to improve the accuracy of identifying click farming users.
Another object of the present disclosure is to provide a device for identifying click farming users, so as to improve the accuracy of identifying click farming users.
Another object of the present disclosure is to provide a terminal device, so as to improve the accuracy of identifying click farming users.
To achieve the above objects, the embodiments of the present disclosure adopt the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides a method for identifying click farming users, the method comprising: acquiring user features of all users; determining a first risk score according to the user features corresponding to each user; identifying the users confirmed among all users to be click farming users as target click farming users; identifying the users confirmed among all users to be non-click farming users as target normal users; determining a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users; determining, according to the first risk score and the second risk score, a final risk score of each user being a click farming user; and identifying other click farming users according to the final risk score.
In a second aspect, an embodiment of the present disclosure further provides a device for identifying click farming users, the device comprising: an acquisition module configured to acquire user features of all users; a first determination module configured to determine a first risk score according to the user features corresponding to each user; a first identification module configured to identify the users confirmed among all users to be click farming users as target click farming users; a second identification module configured to identify the users confirmed among all users to be non-click farming users as target normal users; a second determination module configured to determine a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users; and a score determination module configured to determine, according to the first risk score and the second risk score, a final risk score of each user being a click farming user, and to identify other click farming users according to the final risk score.
In a third aspect, an embodiment of the present disclosure further provides a terminal device comprising a memory and a processor, the memory being configured to store computer program code and the processor being configured to execute the computer program code stored in the memory to implement the method for identifying click farming users.
In a fourth aspect, an embodiment of the present disclosure further provides a readable storage medium storing executable instructions which, when executed by one or more processors, implement the method for identifying click farming users.
The embodiments of the present disclosure provide a method, a device, a terminal device and a storage medium for identifying click farming users; the method and device are applied to the terminal device. The method includes acquiring user features of all users, determining a first risk score according to the user features corresponding to each user, identifying the users confirmed among all users to be click farming users as target click farming users, identifying the users confirmed among all users to be non-click farming users as target normal users, and determining a second risk score of each user according to the target click farming users and the target normal users. Finally, a final risk score of each user being a click farming user is determined according to the first risk score and the second risk score. In this solution, the first risk score and the second risk score of a user are calculated from two separate perspectives, and the final risk score of the user being a click farming user is then calculated from both. Combining the two scores makes it possible to find click farming users whose features are not obvious, which avoids missing click farming users and improves the accuracy of identifying a user as a click farming user.
To make the above objects, features and advantages of the present disclosure more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure.
FIG. 2 is a schematic flowchart of a method for identifying click farming users according to an embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of sub-steps of the method for identifying click farming users according to an embodiment of the present disclosure.
FIG. 4 is a schematic flowchart of further sub-steps of the method for identifying click farming users according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of the functional modules of a device for identifying click farming users according to an embodiment of the present disclosure.
Reference numerals: 100-terminal device; 110-device for identifying click farming users; 120-memory; 130-memory controller; 140-processor; 150-peripheral interface; 160-input/output unit; 170-audio unit; 180-display unit; 111-acquisition module; 112-first determination module; 113-first identification module; 114-second identification module; 115-second determination module; 116-score determination module.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, in the description of the present disclosure, the terms "first", "second" and the like are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Referring to FIG. 1, which is a schematic structural diagram of a terminal device 100 according to an embodiment of the present disclosure, the method for identifying click farming users provided by the embodiments of the present disclosure is applied to the terminal device 100. The terminal device 100 may be, but is not limited to, a smart electronic device such as a tablet computer or a desktop computer. The terminal device 100 includes a device 110 for identifying click farming users, a memory 120, a memory controller 130, a processor 140, a peripheral interface 150, an input/output unit 160, an audio unit 170 and a display unit 180.
The memory 120, memory controller 130, processor 140, peripheral interface 150, input/output unit 160, audio unit 170 and display unit 180 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to one another through one or more communication buses or signal lines. The device 110 for identifying click farming users includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the terminal device 100. The processor 140 is configured to execute executable modules stored in the memory 120, such as the software functional modules or computer programs included in the device 110 for identifying click farming users.
The memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The memory 120 is configured to store a program, and the processor 140 executes the program after receiving an execution instruction. The method performed by the terminal device 100, as defined by the flow disclosed in any of the foregoing embodiments of the present disclosure, may be applied to the processor 140 or implemented by the processor 140.
The processor 140 may be an integrated circuit chip with signal processing capability. The processor 140 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present disclosure. A general-purpose processor may be a microprocessor, or the processor 140 may be any conventional processor.
The peripheral interface 150 couples various input/output devices to the processor 140 and the memory 120. In some embodiments, the peripheral interface 150, the processor 140 and the memory controller 130 may be implemented in a single chip. In other embodiments, they may each be implemented by a separate chip.
The input/output unit 160 is configured to let the user input data so that the user can interact with the terminal device 100. The input/output unit 160 may be, but is not limited to, a mouse, a keyboard, or the like.
The audio unit 170 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The display unit 180 provides an interactive interface (for example, a user operation interface) between the terminal device 100 and the user, or is configured to display image data for the user's reference. In this embodiment, the display unit 180 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations occurring simultaneously at one or more positions on the display and hand the sensed touch operations to the processor 140 for calculation and processing.
Referring to FIG. 2, which is a schematic flowchart of a method for identifying click farming users according to an embodiment of the present disclosure, the method may be used to identify users who engage in click farming on major websites. Such websites may be, but are not limited to, e-commerce websites (such as Taobao) and live streaming platforms. In the embodiments of the present disclosure, the method is described as applied to a live streaming platform. The method includes:
Step S110: acquire user features of all users.
It should be noted that the user features are features related to a user's click farming behavior. On a live streaming platform, for example, a user typically inflates a streamer's popularity by watching many live rooms within a short time or by posting bullet comments in many live rooms within a short time. The number of live rooms a user watches within a predetermined time, or the number of bullet comments the user posts across multiple live rooms within a predetermined time, is then a user feature to be acquired. Understandably, each user has multiple user features. To describe the solution of this embodiment clearly, the user features may optionally be the data generated when a user watches multiple live rooms within a short time, such as the start time, end time and watch duration for each live room. For example, the user features of user A may be: the start time M1 at which user A began watching live room X, the end time M2, and the watch duration M3.
Further, the terminal device may acquire the user features of all users in, for example, one of the following three ways.
In a first implementation, the terminal device obtains the user log generated by a live room during one broadcast. The user log records, for every user who watched that live room during the broadcast, how the user watched every live room visited during that time, including this one. The start time, end time and watch duration of each user for each live room watched can thus be obtained, which yields the user features of all users who watched the live room.
In a second implementation, the terminal device obtains the user log of a live room in windows of a preset collection duration separated by a preset time interval. The user log records, for every user who watched the live room within a collection window, how the user watched every live room visited, including this one. The start time, end time and watch duration of each user for each live room watched within each window can thus be obtained, which yields the user features of all users who watched the live room. For example, if the preset collection duration is 2 hours and the preset time interval is 30 minutes, the terminal device continuously collects the user log generated during a 2-hour window of the broadcast, pauses for 30 minutes, then collects the user log generated during the next 2-hour window, and repeats this cycle until the broadcast ends. This approach effectively reduces the amount of data the terminal device has to process and, to some extent, lowers power consumption.
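The collection schedule of the second implementation can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `collection_windows`, the use of minutes as the time unit, and the choice to truncate the last window at the end of the broadcast are all assumptions.

```python
def collection_windows(broadcast_start, broadcast_end, window=120, gap=30):
    """Return the (start, end) log-collection windows, in minutes.

    Each window lasts `window` minutes and is followed by a pause of `gap`
    minutes. Truncating the last window at the end of the broadcast is an
    assumption; the text only says the cycle repeats until the broadcast ends.
    """
    windows = []
    t = broadcast_start
    while t < broadcast_end:
        windows.append((t, min(t + window, broadcast_end)))
        t += window + gap
    return windows


# A 400-minute broadcast with the 2-hour / 30-minute values from the example.
print(collection_windows(0, 400))
```

With these values, logs are collected for 340 of the 400 minutes; a shorter window or a longer gap reduces the processed volume further.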
In a third implementation, the terminal device first monitors the current popularity value of the live room, so that the value is available in real time. Using a preset popularity threshold, the terminal device determines whether the current popularity value is below the threshold. If it is not, the live room is currently popular, has enough viewers and attracts considerable attention (for example, a well-known streamer); its popularity is then most likely due to the streamer's own influence, and click farming is a low-probability event. The terminal device therefore takes no further action and continues monitoring. If the current popularity value is below the threshold, the live room is currently not very popular, has few viewers and attracts little attention (for example, a minor streamer), so it may resort to click farming to raise its visibility. The terminal device therefore starts obtaining the user log generated by the live room from that moment until the current broadcast ends. The user log records, for every user who watched the live room between the start of collection and the end of the broadcast, how the user watched every live room visited, including this one. The start time, end time and watch duration of each user for each live room watched can thus be obtained, which again yields the user features of all users who watched the live room. With this approach the terminal device runs the acquisition flow only when user features are actually needed, which also effectively reduces the amount of data it has to process and, to some extent, lowers power consumption.
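As a rough sketch of the trigger in the third implementation, the function below scans a series of popularity samples and reports when log collection would start. The name `collection_start` and the idea of sampling popularity as a list of numbers are illustrative assumptions.

```python
def collection_start(popularity_samples, threshold):
    """Return the index of the first sample below the threshold, or None.

    From that sample onward the user log would be collected until the
    broadcast ends; None means the room stayed at or above the threshold.
    """
    for index, value in enumerate(popularity_samples):
        if value < threshold:
            return index
    return None


print(collection_start([500, 320, 90, 400], threshold=100))
```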
It should be noted that, in the third implementation, the terminal device may instead be set to start collection when the current popularity value is above the popularity threshold, if in practice it is the highly popular live rooms that tend to attract click farming. In other words, the implementation described here is only one option and is not limiting; the specific implementation may be chosen according to the actual situation.
Step S120: determine a first risk score according to the user features corresponding to each user.
In other words, a first risk score is assessed for each user from the acquired user features, and the first risk score indicates how likely the user is to be a click farming user. Referring to FIG. 3, which is a schematic flowchart of the sub-steps of step S120 of the method for identifying click farming users according to an embodiment of the present disclosure, step S120 includes:
Step S121: classify all users according to the user features using a predetermined algorithm.
The predetermined algorithm is the k-means algorithm, a clustering algorithm that groups data by iteratively updating cluster centers and computing distances to them. Based on the acquired user features, the k-means algorithm divides the users into 10 classes. Understandably, the number of classes can be customized as needed.
Step S122: select key indicators from the user features corresponding to each class of users.
In other words, users placed in the same class have similar user features, and key indicators are selected from the user features of each class. A key indicator may be, for example, a user's watch duration for each live room. Preferably, there are multiple key indicators; that is, the key indicators are the watch durations for the multiple live rooms a user watched over the whole process.
Step S123: calculate the average value of the key indicators of the users in each class.
Step S124: determine the first risk score of each user according to the magnitude of the average value.
In other words, if the selected key indicators are user features such as the watch duration for each live room, a shorter duration indicates a higher likelihood that the user is inflating a streamer's popularity. The smaller the average of the key indicators, the higher the risk that the user is a click farming user. The class with the highest risk is scored 1 point and the class with the lowest risk 0.1 point. Each user's first risk score is then determined by the class the user belongs to.
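Steps S121 to S124 can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions: each user is reduced to a single feature (the mean watch duration), k-means is initialized naively with the first k points, only 2 clusters are used instead of the 10 mentioned above, and the scores between 1.0 and 0.1 are spaced linearly by cluster rank, which the text does not specify.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns a cluster label for every point."""
    centers = [list(p) for p in points[:k]]  # naive init (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def first_risk_scores(durations, k=2):
    """Score users by the mean watch duration of the cluster they fall in."""
    # S121: cluster users on one feature, their mean watch duration (seconds).
    points = [(sum(d) / len(d),) for d in durations]
    labels = kmeans(points, k)
    # S122/S123: the key indicator is watch duration; average it per cluster.
    means = {}
    for c in set(labels):
        values = [p[0] for p, lab in zip(points, labels) if lab == c]
        means[c] = sum(values) / len(values)
    # S124: shortest mean duration = highest risk (1.0), longest = 0.1.
    ranked = sorted(means, key=means.get)
    n = len(ranked)
    score = {c: 1.0 if n == 1 else round(1.0 - 0.9 * rank / (n - 1), 2)
             for rank, c in enumerate(ranked)}
    return [score[lab] for lab in labels]


# Two users hop between rooms within seconds; two watch for a long time.
watch_durations = [[5, 8, 6], [7, 4, 9], [1800, 2400], [3600, 1500]]
print(first_risk_scores(watch_durations))
```

In practice a library implementation of k-means with better initialization would normally replace the hand-written loop; the sketch only shows how the cluster averages map to scores.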
Step S130: identify the users confirmed among all users to be click farming users as target click farming users.
In other words, the users whose user features satisfy preset strong rules are selected as target click farming users. The preset strong rules require a user to match predetermined, strictly defined features. Such a feature may be, but is not limited to, a user logging in to the same account from many different devices within a predetermined time or, on a live streaming platform, a user who keeps posting bullet comments although the backend has no log of that user sending any. The preset strong rules indicate that a user shows obvious click farming features, so a user whose features satisfy them is determined to be a target click farming user. A user who satisfies the preset strong rules is certainly a target click farming user, but a user who does not satisfy them is not necessarily free of click farming: the rules only filter out users with obvious click farming behavior and cannot clear the remaining users of suspicion.
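A strong-rule filter of this kind might look like the sketch below. The field names (`devices_in_window`, `comments_seen`, `comment_log_entries`) and the device limit are hypothetical; the text gives the two rules only as examples and does not fix any thresholds.

```python
def is_target_click_farmer(user, max_devices=3):
    """Strong-rule check; field names and the device limit are hypothetical."""
    # Rule 1: the account logged in from many devices within the time window.
    too_many_devices = user["devices_in_window"] > max_devices
    # Rule 2: bullet comments appear although the backend logged none being sent.
    unlogged_comments = (user["comments_seen"] > 0
                         and user["comment_log_entries"] == 0)
    return too_many_devices or unlogged_comments


users = {
    "A": {"devices_in_window": 5, "comments_seen": 0, "comment_log_entries": 0},
    "B": {"devices_in_window": 1, "comments_seen": 12, "comment_log_entries": 0},
    "C": {"devices_in_window": 1, "comments_seen": 12, "comment_log_entries": 12},
}
targets = sorted(name for name, u in users.items() if is_target_click_farmer(u))
print(targets)
```

A user who fails both rules, such as C here, is simply left unlabeled at this stage rather than being marked as normal.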
Step S140: identify the users confirmed among all users to be non-click farming users as target normal users.
In other words, the normal users among all users can be identified according to a predetermined rule. The predetermined rule includes multiple user features that together determine that the corresponding user is a target normal user. The most important feature for determining that a user is normal is payment behavior, that is, whether the user keeps paying for the services used. A normal user is a user confirmed to have no click farming behavior.
Step S150: determine a second risk score of each user according to the user features corresponding to the target click farming users and the target normal users.
Specifically, referring to FIG. 4, which is a schematic flowchart of the sub-steps of step S150 of the method for identifying click farming users according to an embodiment of the present disclosure, step S150 includes:
Step S151: construct a network structure among all users.
Understandably, each user normally logs in to a website or live streaming platform from a unique IP address or a fixed device address. Because of click farming, however, multiple users may share the same IP address or device address. Users with the same IP address or device address are therefore connected by an edge, which forms the network structure among all users.
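The network construction of step S151 can be sketched as an adjacency map in which two users are linked whenever they share an IP address or a device address. The input format, a list of `(user, ip, device)` tuples, and the function name `build_user_network` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_user_network(logins):
    """Link users that share an IP address or a device address.

    `logins` is a list of (user, ip, device) tuples (an assumed format).
    Returns a dict mapping each user to the set of connected users.
    """
    groups = defaultdict(set)
    for user, ip, device in logins:
        groups[("ip", ip)].add(user)
        groups[("device", device)].add(user)
    graph = {user: set() for user, _, _ in logins}
    for members in groups.values():
        # Every pair of users inside one group gets an edge.
        for u, v in combinations(sorted(members), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


logins = [
    ("A", "1.1.1.1", "d1"),
    ("B", "1.1.1.1", "d2"),  # shares an IP address with A
    ("C", "2.2.2.2", "d2"),  # shares a device with B
    ("D", "3.3.3.3", "d3"),  # shares nothing
]
print(build_user_network(logins))
```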
Step S152: calculate, in turn, the similarity weight between every two users according to the user features of the target click farming users and the target normal users.
Because the target click farming users and the target normal users have been determined in advance, the user features that characterize a target click farming user and those that characterize a target normal user are both known. If, in the network structure, a user is connected to a target click farming user and/or a target normal user, the user features that the user has in common with the target click farming user are selected to calculate the similarity weight between the two, and the user features that the user has in common with the target normal user are selected to calculate the similarity weight between the user and the target normal user.
For example, suppose user A, whose click farming status is undetermined, is connected in the network structure to user B, user C and user D, where user B and user C are confirmed target click farming users and user D is a confirmed target normal user. The similarities between user A and users B, C and D are then calculated separately. Suppose the results are W(AB) = 0.7, W(AC) = 0.5 and W(AD) = 0.3; that is, the similarity between user A and user B is 0.7, between user A and user C is 0.5, and between user A and user D is 0.3. The similarity is calculated as:
Figure PCTCN2018084638-appb-000001
where X_ui is the i-th user feature of user u, X_vi is the i-th user feature of user v, and w_uv is the similarity weight between user u and user v.
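The similarity formula itself is available only as an image and cannot be recovered from the text. Purely as a stand-in, the sketch below uses cosine similarity over the feature vectors X_u and X_v; this choice, and the function name `similarity_weight`, are assumptions and should not be read as the formula of the disclosure.

```python
import math


def similarity_weight(x_u, x_v):
    """Cosine similarity between two feature vectors (assumed stand-in).

    x_u, x_v: equal-length sequences holding the features shared by
              user u and user v, in the same order.
    """
    dot = sum(a * b for a, b in zip(x_u, x_v))
    norm_u = math.sqrt(sum(a * a for a in x_u))
    norm_v = math.sqrt(sum(b * b for b in x_v))
    if norm_u == 0 or norm_v == 0:
        # No usable signal for an all-zero vector; treat as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)
```

Any symmetric similarity measure over the shared features could be substituted here without changing the later steps.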
It should be noted that, because only the target click-farming users and the target normal users are known at the outset, the initial calculation computes only the similarity between a user of unknown status and the target click-farming users and target normal users, even if that user is also connected to other users of unknown status. Once the similarity weights of a currently unknown user are known, the similarity weights of other users can be calculated from that user's connections in the network structure.
Step S153: calculate the second risk score of the current user according to the similarity weights of the other users connected to the current user in the network structure and the initial scores of the target click-farming users and the target normal users.
That is, once the similarity weights between users are known, the second risk score of each user can be calculated from those weights and the initial scores of the target click-farming users and the target normal users. Specifically, let the initial scores of the target click-farming users B and C be S(B)_0 = 1 and S(C)_0 = 1, and the initial score of the target normal user D be S(D)_0 = 0. The score of the unknown user A is then S(A)_0 = (1*0.7 + 1*0.5 + 0*0.3)/(0.7 + 0.5 + 0.3) = 0.8. Note that this is only user A's initial score: it accounts only for A's connections to target click-farming users and target normal users and ignores any connections A has to other suspect users. To improve accuracy, the algorithm is iterated over multiple rounds until A's score converges to a stable value, which is then taken as user A's second risk score.
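The first-round calculation is a similarity-weighted average of the seed scores. A minimal sketch, using the weights and initial scores from the example (the function name `initial_score` and the dictionary layout are assumptions):

```python
def initial_score(weights, seed_scores):
    """Similarity-weighted average of the neighbours' initial scores.

    weights:     neighbour -> similarity weight to the current user
    seed_scores: neighbour -> initial score (1 for a target click-farming
                 user, 0 for a target normal user)
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * seed_scores[n] for n, w in weights.items()) / total


# Values from the worked example: W(AB) = 0.7, W(AC) = 0.5, W(AD) = 0.3.
weights_a = {"B": 0.7, "C": 0.5, "D": 0.3}
seeds = {"B": 1.0, "C": 1.0, "D": 0.0}
s_a0 = initial_score(weights_a, seeds)  # 1.2 / 1.5, i.e. about 0.8
```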
Specifically, if A is connected not only to users B, C, and D but also to a user E of unknown status, the similarity weight between user A and user E is calculated first; suppose W(AE) = 0.5. User A's score is then calculated as: S(A)_1 = 0.8*S(A)_0 + 0.2*(1*0.7 + 1*0.5 + 0*0.3 + S(E)_0*0.5)/(0.7 + 0.5 + 0.3 + 0.5)
Thus, in the second round, the score S(A)_0 obtained for user A in the first round serves as the base score, and iterating the algorithm on A in this way yields a more precise second risk score for user A. Preferably, in this embodiment of the present disclosure, the second risk score of each user is calculated over 15 iterations, and the score that has stabilized after these iterations is taken as the user's second risk score. The formula for the second risk score is:
Figure PCTCN2018084638-appb-000002
where S_k(i) is the second risk score of user i at the k-th iteration, α is a weight coefficient between 0 and 1, and w_ji is the similarity weight between user j and user i.
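The iteration can be sketched as below. Because the formula is available only as an image, the update rule used here, S_k(i) = α·S_{k-1}(i) + (1 − α)·Σ_j w_ji·S_{k-1}(j) / Σ_j w_ji, is an assumption inferred from the worked example (α = 0.8). Holding the seed users' scores fixed and starting unknown users at 0 are further assumptions, as is the function name `propagate`.

```python
def propagate(graph_weights, seeds, alpha=0.8, rounds=15):
    """Iteratively propagate risk scores over the user network.

    graph_weights: user -> {neighbour: similarity weight}; every user that
                   appears as a neighbour should also appear as a key
    seeds:         user -> fixed score (1 = target click-farming user,
                   0 = target normal user); held constant (assumption)
    """
    # Unknown users start at 0.0 (assumption).
    scores = {u: seeds.get(u, 0.0) for u in graph_weights}
    for _ in range(rounds):
        updated = {}
        for user, neighbours in graph_weights.items():
            if user in seeds:
                updated[user] = seeds[user]
                continue
            total = sum(neighbours.values())
            weighted = sum(w * scores.get(n, 0.0) for n, w in neighbours.items())
            average = weighted / total if total else 0.0
            # Blend the previous score with the neighbours' weighted average.
            updated[user] = alpha * scores[user] + (1 - alpha) * average
        scores = updated
    return scores


# Network from the worked example, with E of unknown status.
graph_weights = {
    "A": {"B": 0.7, "C": 0.5, "D": 0.3, "E": 0.5},
    "B": {"A": 0.7},
    "C": {"A": 0.5},
    "D": {"A": 0.3},
    "E": {"A": 0.5},
}
scores = propagate(graph_weights, {"B": 1.0, "C": 1.0, "D": 0.0})
```

Under these assumptions the scores of A and E both rise toward a common fixed point of 0.8, with E lagging A because E is influenced only through A.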
Step S160: determine, according to the first risk score and the second risk score, the final risk score that each user is a click-farming user, and identify other click-farming users according to the final risk score.
The first risk score of each user was determined earlier by classifying the users, and the second risk score was determined by iterative calculation; the final risk score of a user is determined jointly from these two scores. The final risk score is calculated as:
$$S_u = w_1 S_{u1} + w_2 S_{u2}$$
where w_i (i = 1, 2) are weight coefficients, each between 0 and 1 and satisfying the constraint shown in formula image PCTCN2018084638-appb-000003; S_u1 is the first risk score and S_u2 is the second risk score. Because the iterative algorithm judges whether a user is a click-farming user more precisely than classification does, the weight coefficient w_1 is usually smaller than w_2.
Further, the final risk score of each user is compared with a predetermined threshold; if the final risk score is greater than the threshold, the user is determined to be a malicious click-farming user. Judging users in this way is more accurate, and each user's current final risk score can also be used to estimate whether the user is likely to become a malicious click-farming user, so that the user can be monitored closely afterwards.
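Combining the two scores and applying the threshold can be sketched as follows. The weights 0.3 and 0.7 (chosen so that w_1 < w_2), the threshold 0.6, the sample scores, and the function names are all illustrative assumptions; the disclosure does not give concrete values.

```python
def final_risk_score(s1, s2, w1=0.3, w2=0.7):
    """Weighted combination S_u = w1 * S_u1 + w2 * S_u2 (weights assumed)."""
    return w1 * s1 + w2 * s2


def flag_click_farmers(first, second, threshold=0.6, w1=0.3, w2=0.7):
    """Return the users whose final risk score exceeds the threshold.

    first, second: user -> first / second risk score
    """
    return {
        user
        for user in first
        if final_risk_score(first[user], second[user], w1, w2) > threshold
    }


# Illustrative scores only.
first_scores = {"A": 0.5, "D": 0.2}
second_scores = {"A": 0.8, "D": 0.1}
flagged = flag_click_farmers(first_scores, second_scores)
```

With these sample values, A's final score is 0.3*0.5 + 0.7*0.8 = 0.71 and exceeds the threshold, while D's is 0.13 and does not.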
Referring to FIG. 5, which is a schematic diagram of the functional modules of an apparatus 110 for identifying click-farming users provided by an embodiment of the present disclosure, the apparatus includes an acquisition module 111, a first determination module 112, a first identification module 113, a second identification module 114, a second determination module 115, and a score determination module 116.
The acquisition module 111 is configured to acquire the user features of all users.
In this embodiment of the present disclosure, step S110 may be performed by the acquisition module 111.
The first determination module 112 is configured to determine a first risk score according to the user features of each user.
In this embodiment of the present disclosure, steps S120 to S124 may be performed by the first determination module 112.
The first identification module 113 is configured to identify the confirmed click-farming users among all users as target click-farming users.
In this embodiment of the present disclosure, step S130 may be performed by the first identification module 113.
The second identification module 114 is configured to identify the confirmed non-click-farming users among all users as target normal users.
In this embodiment of the present disclosure, step S140 may be performed by the second identification module 114.
The second determination module 115 is configured to determine a second risk score for each user according to the target click-farming users and the target normal users.
In this embodiment of the present disclosure, steps S150 to S153 may be performed by the second determination module 115.
The score determination module 116 is configured to determine, according to the first risk score and the second risk score, the final risk score that each user is a click-farming user, and to identify other click-farming users according to the final risk score.
In this embodiment of the present disclosure, step S160 may be performed by the score determination module 116.
These operations have already been described in detail in the section on the method for identifying click-farming users and are not repeated here.
In summary, embodiments of the present disclosure provide a method, an apparatus, a terminal device, and a storage medium for identifying click-farming users, where the method and apparatus are applied to a terminal device. The method includes acquiring the user features of all users; determining a first risk score according to the user features of each user; identifying the confirmed click-farming users among all users as target click-farming users; identifying the confirmed non-click-farming users among all users as target normal users; and determining a second risk score for each user according to the target click-farming users and the target normal users. Finally, the final risk score that each user is a click-farming user is determined according to the first risk score and the second risk score. In this solution, the first and second risk scores of the current user are calculated from two separate perspectives and then combined into the final risk score; the calculation is more thorough, which improves the accuracy of identifying a user as a click-farming user.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions configured to implement the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present disclosure may be integrated to form a single independent part, may each exist separately, or two or more modules may be integrated to form a single independent part.
If the functions are implemented as software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present disclosure, in essence, or the part that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations.
Furthermore, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure; those skilled in the art may make various modifications and changes to it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The foregoing is only a specific implementation of the present disclosure, and the scope of protection of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be determined by the scope of protection of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a method, an apparatus, a terminal device, and a storage medium for identifying click-farming users, where the method and apparatus are applied to a terminal device. Click-farming users are thereby less likely to be missed, and the accuracy of identifying a user as a click-farming user is improved.

Claims (16)

  1. A method for identifying click-farming users, characterized in that the method comprises:
    acquiring user features of all users;
    determining a first risk score according to the user features of each user;
    identifying the confirmed click-farming users among all users as target click-farming users;
    identifying the confirmed non-click-farming users among all users as target normal users;
    determining a second risk score for each user according to the user features of the target click-farming users and the target normal users;
    determining, according to the first risk score and the second risk score, a final risk score that each user is a click-farming user;
    identifying other click-farming users according to the final risk score.
  2. The method according to claim 1, characterized in that the step of determining a first risk score according to the user features of each user comprises:
    classifying all users with a predetermined algorithm according to the user features;
    selecting key indicators from the user features of each class of users;
    calculating the average of the key indicators of the users in each class;
    determining the first risk score of each user according to the magnitude of the average.
  3. The method according to claim 2, characterized in that the key indicators of a user comprise: a plurality of viewing durations corresponding to a plurality of live rooms viewed by the user.
  4. The method according to any one of claims 1 to 3, characterized in that the step of determining a second risk score for each user according to the user features of the target click-farming users and the target normal users comprises:
    constructing a network structure among all users;
    calculating, in turn, the similarity weight between each pair of users according to the user features of the target click-farming users and the target normal users;
    calculating the second risk score of the current user according to the similarity weights of the other users connected to the current user in the network structure and the initial scores of the target click-farming users and the target normal users.
  5. The method according to claim 4, characterized in that the similarity weight between each pair of users is calculated as:
    Figure PCTCN2018084638-appb-100001
    where X_ui is the i-th user feature of user u, X_vi is the i-th user feature of user v, and w_uv is the similarity weight between user u and user v.
  6. The method according to claim 5, characterized in that the second risk score of the current user is calculated as:
    Figure PCTCN2018084638-appb-100002
    where S_k(i) is the second risk score of user i at the k-th iteration, α is a weight coefficient between 0 and 1, and w_ji is the similarity weight between user j and user i.
  7. The method according to any one of claims 1 to 6, characterized in that the step of identifying other click-farming users according to the final risk score comprises:
    comparing the final risk score of each user with a predetermined threshold, and if the final risk score is greater than the predetermined threshold, determining that the user corresponding to the final risk score is a click-farming user.
  8. The method according to any one of claims 1 to 7, characterized in that acquiring the user features of all users comprises:
    during a live broadcast in a live room, intermittently acquiring, for a preset acquisition duration at a time and with a preset time interval between two consecutive preset acquisition durations, the user features of all users within each preset acquisition duration.
  9. The method according to any one of claims 1 to 7, characterized in that acquiring the user features of all users comprises:
    during a live broadcast in a live room, and upon determining that the current popularity value of the live room is less than a preset popularity threshold, acquiring the user features of all users in the live room.
  10. An apparatus for identifying click-farming users, characterized in that the apparatus comprises:
    an acquisition module configured to acquire user features of all users;
    a first determination module configured to determine a first risk score according to the user features of each user;
    a first identification module configured to identify the confirmed click-farming users among all users as target click-farming users;
    a second identification module configured to identify the confirmed non-click-farming users among all users as target normal users;
    a second determination module configured to determine a second risk score for each user according to the user features of the target click-farming users and the target normal users;
    a score determination module configured to determine, according to the first risk score and the second risk score, a final risk score that each user is a click-farming user, and to identify other click-farming users according to the final risk score.
  11. The apparatus according to claim 10, characterized in that the first determination module is further configured to:
    classify all users with a predetermined algorithm according to the user features;
    select key indicators from the user features of each class of users;
    calculate the average of the key indicators of each user in each class;
    determine the first risk score of each user according to the magnitude of the average.
  12. The apparatus according to claim 10 or 11, characterized in that the second determination module is further configured to:
    construct a network structure among all users;
    calculate, in turn, the similarity weight between each pair of users according to the user features of the target click-farming users and the target normal users;
    calculate the second risk score of the current user according to the similarity weights of the other users connected to the current user in the network structure and the initial scores of the target click-farming users and the target normal users.
  13. The apparatus according to any one of claims 10 to 12, characterized in that the acquisition module is specifically configured to:
    during a live broadcast in a live room, intermittently acquire, for a preset acquisition duration at a time and with a preset time interval between two consecutive preset acquisition durations, the user features of all users within each preset acquisition duration.
  14. The apparatus according to any one of claims 10 to 12, characterized in that the acquisition module is specifically configured to:
    during a live broadcast in a live room, and upon determining that the current popularity value of the live room is less than a preset popularity threshold, acquire the user features of all users in the live room.
  15. A terminal device, characterized in that the terminal device comprises a memory and a processor, the memory being configured to store computer program code, and the processor being configured to execute the computer program code stored in the memory to implement the method for identifying click-farming users according to any one of claims 1 to 9.
  16. A readable storage medium, characterized in that it stores executable instructions which, when executed by one or more processors, implement the method for identifying click-farming users according to any one of claims 1 to 9.
PCT/CN2018/084638 2018-02-28 2018-04-26 Method and device for identifying click farming users, terminal device and storage medium WO2019165697A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810169186.1A CN108390883B (en) 2018-02-28 2018-02-28 Identification method and device for people-refreshing user and terminal equipment
CN201810169186.1 2018-02-28

Publications (1)

Publication Number Publication Date
WO2019165697A1 true WO2019165697A1 (en) 2019-09-06

Family

ID=63070177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084638 WO2019165697A1 (en) 2018-02-28 2018-04-26 Method and device for identifying click farming users, terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN108390883B (en)
WO (1) WO2019165697A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905722B (en) * 2019-02-21 2021-07-23 武汉瓯越网视有限公司 Method for determining suspected node and related equipment
CN110069923A (en) * 2019-03-13 2019-07-30 咪咕文化科技有限公司 A kind of method and relevant apparatus identifying risk subscribers
CN109905411B (en) * 2019-04-25 2021-11-16 北京腾云天下科技有限公司 Abnormal user identification method and device and computing equipment
CN111125192B (en) * 2019-12-20 2023-04-07 北京明略软件系统有限公司 Method and device for determining similarity between objects
CN111488491B (en) * 2020-06-24 2020-10-16 武汉斗鱼鱼乐网络科技有限公司 Method, system, medium and equipment for identifying target anchor
CN113067808B (en) * 2021-03-15 2022-07-05 上海哔哩哔哩科技有限公司 Data processing method, live broadcast method, authentication server and live broadcast data server
CN114679600A (en) * 2022-03-24 2022-06-28 上海哔哩哔哩科技有限公司 Data processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2908495A1 (en) * 2014-02-18 2015-08-19 Palo Alto Research Center Incorporated System and method for modeling behavior change and consistency to detect malicious insiders
CN107093090A (en) * 2016-10-25 2017-08-25 北京小度信息科技有限公司 Abnormal user recognition methods and device
CN107146089A (en) * 2017-03-29 2017-09-08 北京三快在线科技有限公司 The single recognition methods of one kind brush and device, electronic equipment
CN107454441A (en) * 2017-06-30 2017-12-08 武汉斗鱼网络科技有限公司 A kind of method for detecting direct broadcasting room brush popularity behavior and live Platform Server
CN107516246A (en) * 2017-08-25 2017-12-26 北京京东尚科信息技术有限公司 Determination method, determining device, medium and the electronic equipment of user type

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514178A (en) * 2012-06-18 2014-01-15 阿里巴巴集团控股有限公司 Searching and sorting method and device based on click rate
MA42847A (en) * 2015-09-18 2018-07-25 Mms Usa Holdings Inc MICRO-MOMENT ANALYSIS
CN111629010B (en) * 2015-11-23 2023-03-10 创新先进技术有限公司 Malicious user identification method and device
CN105915960A (en) * 2016-03-31 2016-08-31 广州华多网络科技有限公司 User type determination method and device
CN106557955A (en) * 2016-11-29 2017-04-05 流量海科技成都有限公司 Net about car exception order recognition methodss and system
CN106777024A (en) * 2016-12-08 2017-05-31 北京小米移动软件有限公司 Recognize the method and device of malicious user
CN107423883B (en) * 2017-06-15 2020-04-07 创新先进技术有限公司 Risk identification method and device for to-be-processed service and electronic equipment
CN107506921B (en) * 2017-08-14 2020-06-05 上海携程商务有限公司 Order risk identification method, system, storage medium and electronic equipment
CN107483500A (en) * 2017-09-25 2017-12-15 咪咕文化科技有限公司 A kind of Risk Identification Method based on user behavior, device and storage medium

Also Published As

Publication number Publication date
CN108390883B (en) 2020-08-04
CN108390883A (en) 2018-08-10

Similar Documents

Publication Publication Date Title
WO2019165697A1 (en) Method and device for identifying click farming users, terminal device and storage medium
WO2019134307A1 (en) Malicious user identification method and apparatus, and readable storage medium
US10277480B2 (en) Method, apparatus, and system for determining a location corresponding to an IP address
JP7294760B2 (en) Method and Apparatus for Performing Media Device Asset Certification
CN112364202B (en) Video recommendation method and device and electronic equipment
WO2017101506A1 (en) Information processing method and device
US20170187737A1 (en) Method and electronic device for processing user behavior data
WO2019134285A1 (en) Live broadcast room recommendation method, electronic device and readable storage medium
WO2017045532A1 (en) Application program classification display method and apparatus
US10404524B2 (en) Resource and metric ranking by differential analysis
WO2018196553A1 (en) Method and apparatus for obtaining identifier, storage medium, and electronic device
CN108062692B (en) Recording recommendation method, device, equipment and computer readable storage medium
EP3346396A1 (en) Multimedia resource quality assessment method and apparatus
WO2016062220A1 (en) Video playing detection method and device
CN111242709A (en) Message pushing method and device, equipment and storage medium thereof
WO2015051750A1 (en) Determining ranking threshold for applications
US7991648B2 (en) Opportunity index for identifying a user's unmet needs
CN106649645B (en) Playlist processing method and device
CN109688217B (en) Message pushing method and device and electronic equipment
US20210357553A1 (en) Apparatus and method for option data object performance prediction and modeling
US10262058B2 (en) Method and apparatus for evaluating search prompting system
CN111027065A (en) Ransomware identification method and device, electronic equipment and storage medium
CN108495150B (en) Method and device for determining video click satisfaction
WO2018205642A1 (en) Video revenue calculation modeling device and method, video recommendation device and method, server, and storage medium
CN105653645B (en) Network information attention degree evaluation method and device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18908094; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 18908094; Country of ref document: EP; Kind code of ref document: A1