WO2019134307A1 - Malicious user identification method and apparatus, and readable storage medium (恶意用户识别方法、装置及可读存储介质) - Google Patents

Malicious user identification method and apparatus, and readable storage medium (恶意用户识别方法、装置及可读存储介质)

Info

Publication number
WO2019134307A1
WO2019134307A1 · PCT/CN2018/084636
Authority
WO
WIPO (PCT)
Prior art keywords
user
value
suspect
users
barrage
Prior art date
Application number
PCT/CN2018/084636
Other languages
English (en)
French (fr)
Inventor
王璐
陈少杰
张文明
Original Assignee
Wuhan Douyu Network Technology Co., Ltd. (武汉斗鱼网络科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co., Ltd. (武汉斗鱼网络科技有限公司)
Publication of WO2019134307A1 publication Critical patent/WO2019134307A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Definitions

  • The present disclosure relates to the field of big data processing technologies, and in particular to a malicious user identification method and apparatus, and a readable storage medium.
  • A live streaming platform is an Internet social platform that provides streamer users with channels for presenting live video and interacting with other users online.
  • On live streaming platforms, there are often malicious users who seek improper benefits through improper means.
  • For example, a video live streaming platform reflects a streamer's influence and popularity through the number of users following the streamer, but malicious users may profit by fabricating large numbers of fake follows, which disturbs the ecological balance of the platform and harms the interests of other normal users. In such cases, malicious users need to be identified so that their behavior on the platform can be blocked or exposed.
  • An object of the present disclosure includes providing a malicious user identification method, the method comprising:
  • iteratively calculating the suspicion value of each user through a probability graph model;
  • after multiple rounds of the iterative calculation, taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  • Optionally, the step of taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users includes:
  • taking, on the empirical distribution function, the suspicion value corresponding to the point where the rising slope exceeds a preset rising threshold as the suspicion threshold;
  • taking users to be identified whose suspicion value is greater than the suspicion threshold as malicious users.
  • Optionally, the step of iteratively calculating the user's suspicion value through a probability graph model includes:
  • stopping the iterative calculation when the degree of change of the suspicion value of every user is less than a preset change threshold.
  • Optionally, the step of iteratively calculating the user's suspicion value includes:
  • performing a preset number of rounds of iterative calculation on each user's suspicion value through the probability graph model.
  • The barrage-sending behavior features include the set of live rooms in which a user sent barrages and at least one barrage-sending action statistic; the step of calculating the similarity value between every two users according to each user's barrage-sending behavior features includes the sub-steps described below.
  • The types of barrage-sending action statistics include the number of barrages sent, the time periods in which barrages are sent, the time intervals between barrages, the number of words per barrage, and the number of times preset keywords appear in barrages.
  • Acquiring the barrage-sending behavior features of the multiple users includes: intermittently acquiring the barrage-sending behavior features of multiple users within each preset acquisition duration, with a preset time interval between two consecutive preset acquisition durations.
  • Acquiring the barrage-sending behavior features of the multiple users includes: acquiring the barrage-sending behavior features of multiple users when it is determined that the current number of barrages sent is greater than a preset barrage count threshold.
  • The formula for iteratively calculating the user's suspicion value through the probability graph model is given as a figure in the original application, where:
  • S_k(i) is the suspicion value of the i-th user in the k-th round of iterative calculation;
  • α is a weight coefficient with a value between 0 and 1;
  • w_ji is the similarity value between user j and user i.
  • The first initial suspicion value is 1 and the second initial suspicion value is 0.
  • a feature acquisition module, configured to acquire barrage-sending behavior features of multiple users, where the multiple users include at least one confirmed malicious user and users to be identified other than the malicious user;
  • a similarity calculation module, configured to calculate a similarity value between every two of the multiple users according to the barrage-sending behavior features;
  • an initialization module, configured to set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the user to be identified to a second initial suspicion value, where the first initial suspicion value is higher than the second initial suspicion value;
  • an iterative calculation module, configured to, for each user, iteratively calculate that user's suspicion value through a probability graph model according to the user's current suspicion value and the similarity values with other users;
  • an identification module, configured to, after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  • The identification module is specifically configured to calculate an empirical distribution function of the suspicion values of the multiple users; take, on the empirical distribution function, the suspicion value corresponding to the point where the rising slope exceeds a preset rising threshold as the suspicion threshold; and take users to be identified whose suspicion value is greater than the suspicion threshold as malicious users.
  • The iterative calculation module is specifically configured to perform the iterative calculation on each user's suspicion value through the probability graph model; for each user, calculate the degree of change of the suspicion value before and after the current round of iterative calculation; and stop the iterative calculation when the degree of change of the suspicion value of every user is less than a preset change threshold.
  • The barrage-sending behavior features include the set of live rooms in which a user sent barrages and at least one barrage-sending action statistic; the similarity calculation module is specifically configured to calculate a first similarity parameter between two users according to the sets of live rooms in which the two users sent barrages; calculate a second similarity parameter between the two users according to each barrage-sending action statistic of the two users; and calculate the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
  • The feature acquisition module is configured to intermittently acquire the barrage-sending behavior features of multiple users within each preset acquisition duration, with a preset time interval between two consecutive preset acquisition durations.
  • The feature acquisition module is configured to acquire the barrage-sending behavior features of multiple users when it is determined that the current number of barrages sent is greater than a preset barrage count threshold.
  • The malicious user identification method and apparatus, and the readable storage medium provided by the present disclosure use a probability graph algorithm to calculate the suspicion values of users to be identified according to the barrage-sending behavior features of known malicious users and of the users to be identified, and identify malicious users based on those suspicion values. In this way, based on the association in barrage-sending behavior between users to be identified and confirmed malicious users, malicious users whose malicious behavior is not obvious can be effectively screened out.
  • FIG. 1 is a schematic diagram of a data processing device according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a malicious user identification method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of sub-steps of step S120 in the malicious user identification method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a malicious user identification apparatus according to an embodiment of the present disclosure.
  • Icon: 100-data processing device; 110-malicious user identification device; 111-feature acquisition module; 112-similarity calculation module; 113-initialization module; 114-iterative calculation module; 115-identification module; 120-memory; 130-processor.
  • horizontal simply means that its direction is more horizontal than “vertical”, and does not mean that the structure must be completely horizontal, but may be slightly inclined.
  • Malicious users usually appear in large groups. Some of them have obvious malicious behavior features, while others may show no obvious malicious behavior. Malicious users within a group generally follow the same or similar behavior patterns, so even those whose malicious behavior is not obvious remain correlated with the rest of the group.
  • Based on this, the inventor proposes iteratively calculating suspicion values through a probability graph model, so that other malicious users sharing the same behavior pattern as confirmed malicious users can be screened out accurately, identifying users without obvious malicious behavior that are associated with confirmed malicious users.
  • FIG. 1 is a block diagram of a data processing device 100 according to a preferred embodiment of the present disclosure.
  • the data processing device 100 includes a malicious user identification device 110, a memory 120, and a processor 130.
  • the components of the memory 120 and the processor 130 are electrically connected directly or indirectly to each other to implement data transmission or interaction.
  • the components can be electrically connected to one another via one or more communication buses or signal lines.
  • The malicious user identification device 110 includes at least one software function module that can be stored in the memory 120, or solidified in the operating system (OS) of the data processing device 100, in the form of software or firmware.
  • the processor 130 is configured to execute an executable module stored in the memory 120, such as a software function module and a computer program included in the malicious user identification device 110.
  • The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or the like.
  • the memory 120 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction.
  • the processor 130 may be an integrated circuit chip with signal processing capabilities.
  • The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Such a processor may implement or carry out the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure.
  • The general-purpose processor may be a microprocessor, or any conventional processor.
  • FIG. 2 is a flowchart of the malicious user identification method applied to the data processing device 100 shown in FIG. 1. The method includes the following steps.
  • Step S110: Acquire barrage-sending behavior features of multiple users, where the multiple users include at least one confirmed malicious user and users to be identified other than the malicious user.
  • The barrage-sending behavior features may include the set of live rooms in which a user sent barrages and at least one barrage-sending action statistic. The types of barrage-sending action statistics include one or more of the number of barrages sent, the time periods in which barrages are sent, the time intervals between barrages, the number of words per barrage, and the number of times preset keywords appear in barrages.
  • One way for the data processing device to obtain the barrage-sending behavior features of the multiple users is intermittently: it acquires the features within each preset acquisition duration, with a preset time interval between two consecutive acquisition durations.
  • For example, if the preset acquisition duration is 1 minute and the preset time interval is 30 seconds, the data processing device continuously collects the barrage-sending behavior features of the multiple users for 1 minute, pauses for 30 seconds, then collects the features for the next 1-minute window, and so on in a loop. It can be understood that this approach effectively reduces the amount of data the device must process and can reduce power consumption to a certain extent.
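The intermittent scheme above (collect for a preset duration, pause for a preset interval, repeat) can be sketched as a small scheduling helper. This is a minimal illustration rather than the patent's implementation; the function name and parameters are assumptions, with the 1-minute/30-second values from the example as defaults.

```python
def acquisition_schedule(total_secs, acquire_secs=60, pause_secs=30):
    """Return the (start, end) windows, in seconds, during which barrage-sending
    features are collected within total_secs, alternating collect/pause phases."""
    windows, t = [], 0
    while t + acquire_secs <= total_secs:
        windows.append((t, t + acquire_secs))   # collect continuously
        t += acquire_secs + pause_secs          # then pause before the next window
    return windows
```

With the defaults, a 3-minute span yields two collection windows, `(0, 60)` and `(90, 150)`, matching the 1-minute-on / 30-seconds-off loop described above.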
  • Another way for the data processing device to obtain the barrage-sending behavior features is event-driven: the device monitors the number of barrages currently being sent, so that the current barrage count is available in real time.
  • The data processing device can then compare the current barrage count against a preset barrage count threshold. If the count is not greater than the threshold, the number of barrages generated by users in the live room is still small, that is, the room is not yet active enough, and the presence of malicious users is a low-probability event; the device therefore skips further processing and continues monitoring.
  • Step S120: Calculate a similarity value between every two of the multiple users according to the barrage-sending behavior features.
  • Step S120 may include sub-steps S121 to S123.
  • Sub-step S121: Calculate a first similarity parameter between two users according to the sets of live rooms in which the two users sent barrages.
  • Sub-step S122: Calculate a second similarity parameter between the two users according to each barrage-sending action statistic of the two users.
  • Sub-step S123: Calculate the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
  • Denote the similarity value between user u and user v as w_uv; its formula is given as a figure in the original application, where:
  • R_u and R_v are the sets of live rooms in which user u and user v sent barrages;
  • x_ui is the i-th barrage-sending action statistic of user u;
  • x_vi is the i-th barrage-sending action statistic of user v.
  • The first similarity parameter can be computed from R_u and R_v;
  • the second similarity parameter can be computed from x_ui and x_vi.
  • By assigning weight coefficients w1 and w2 to the first and second similarity parameters, a relatively balanced similarity value can be obtained. That is, the assigned weight coefficients w1 and w2 make the resulting similarity value more accurate; they are chosen according to the actual situation, for example w1 = 0.8 and w2 = 0.2.
  • The setting of w1 and w2 mainly depends on which matters more for the judgment: the live rooms in which barrages are sent, or the barrage-sending actions themselves. For example, if in practice most malicious users perform malicious operations across many live rooms, then the set of rooms is more important and w1 takes a higher proportion; otherwise, w2 is set higher.
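The exact similarity formula appears only as a figure in the original application, so the sketch below is a plausible reading of the text: the first similarity parameter as the Jaccard overlap of the two users' live-room sets, the second as an averaged per-statistic closeness, combined with the weight coefficients w1 and w2 (0.8/0.2 per the example above). The combining functions and the function name are assumptions, not the patent's exact formula.

```python
def similarity(rooms_u, rooms_v, stats_u, stats_v, w1=0.8, w2=0.2):
    """Illustrative pairwise similarity w_uv between two users.

    rooms_u, rooms_v: sets of live-room ids where each user sent barrages (R_u, R_v).
    stats_u, stats_v: equal-length lists of barrage-action statistics (x_ui, x_vi).
    """
    # First similarity parameter: Jaccard overlap of the live-room sets.
    union = rooms_u | rooms_v
    p1 = len(rooms_u & rooms_v) / len(union) if union else 0.0

    # Second similarity parameter: per-statistic closeness (ratio of the
    # smaller to the larger value), averaged over all statistic types.
    p2 = (sum(min(a, b) / max(a, b) if max(a, b) > 0 else 1.0
              for a, b in zip(stats_u, stats_v)) / len(stats_u)
          if stats_u else 0.0)

    # Weighted combination of the two parameters.
    return w1 * p1 + w2 * p2
```

For two users who share 2 of 4 rooms and have identical statistics, this gives 0.8 × 0.5 + 0.2 × 1.0 = 0.6.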
  • Step S130: Set the suspicion value of the malicious user to a first initial suspicion value, and set the suspicion value of the user to be identified to a second initial suspicion value, where the first initial suspicion value is higher than the second initial suspicion value.
  • The suspicion value represents the degree to which a user may be a malicious user;
  • the probability graph model is used to perform multiple rounds of iterative calculation on the suspicion values to make them more accurate.
  • The confirmed malicious user's suspicion value is set to a larger first initial suspicion value (for example, 1), and the to-be-identified user's suspicion value is set to a second initial suspicion value smaller than the first (for example, 0).
  • This embodiment may further determine whether the number of obviously malicious users identified by the rules is greater than a preset malicious user count threshold.
  • If not, the number of obviously malicious users is very small at this point, and correspondingly there are likely to be few users with non-obvious malicious behavior, so running the subsequent algorithm is of little use and mainly adds power consumption. Therefore, when the determination is negative, the data processing device may terminate the subsequent flow.
  • If so, there are many obviously malicious users at this point, and there are likely to be many users with non-obvious malicious behavior as well, so the subsequent algorithm is worth running. Therefore, when the determination is positive, the data processing device continues with the subsequent algorithm flow.
  • Step S140: For each user, according to the user's current suspicion value and the similarity values between the user and other users, iteratively calculate the user's suspicion value through a probability graph model.
  • The user's suspicion value is iteratively calculated by a formula given as a figure in the original application, where:
  • S_k(i) is the suspicion value of the i-th user in the k-th round of iterative calculation;
  • α is a weight coefficient with a value between 0 and 1;
  • w_ji is the similarity value between user j and user i.
  • The similarity values between users are used to characterize propagation probabilities, so that the suspicion value of fake-follow behavior spreads over the probability graph according to the correlations between users.
  • The matrix of similarity values between users and the vector of initial suspicion values are likewise given as figures in the original application.
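The update formula itself is an omitted figure. A standard label-propagation form consistent with the symbols described (α in (0, 1), w_ji the similarity between users j and i, S_k(i) the i-th user's suspicion value in round k) would be S_k(i) = α·S_0(i) + (1 − α)·Σ_j w_ji·S_{k−1}(j) / Σ_j w_ji. The sketch below implements that assumed form, including the stopping rule from the step described below (stop when every user's value changes less than a preset threshold); function name, α default, and the exact update are assumptions.

```python
def propagate_suspicion(W, seeds, alpha=0.15, max_iters=10, tol=1e-4):
    """Iteratively spread suspicion values over a similarity graph.

    W:     n x n list-of-lists of similarity values w_ji (symmetric, zero diagonal).
    seeds: initial suspicion values S_0 (1.0 for confirmed malicious users,
           0.0 for users to be identified).
    """
    n = len(seeds)
    s = list(seeds)
    for _ in range(max_iters):
        new_s = []
        for i in range(n):
            denom = sum(W[j][i] for j in range(n) if j != i)
            spread = (sum(W[j][i] * s[j] for j in range(n) if j != i) / denom
                      if denom > 0 else 0.0)
            # Keep a share of the initial label, absorb the rest from neighbors.
            new_s.append(alpha * seeds[i] + (1 - alpha) * spread)
        # Stop once every user's value changes less than the preset threshold.
        if max(abs(a - b) for a, b in zip(new_s, s)) < tol:
            s = new_s
            break
        s = new_s
    return s
```

On a chain of three users where only the first is a confirmed malicious seed, the user directly similar to the seed ends up with a higher suspicion value than the user two hops away, which is the screening behavior the method relies on.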
  • Step S150: After multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  • In one approach, for each user, the data processing device calculates the degree of change of the suspicion value before and after the current round of iterative calculation,
  • and stops the iterative calculation when the degree of change of the suspicion value of every user is less than a preset change threshold.
  • The change threshold may be expressed as a percentage change relative to the previous iteration result.
  • Alternatively, a number of rounds is preset, and the iterative calculation is performed on each user's suspicion value that preset number of times; for example, 10 rounds of iterative calculation are performed.
  • To determine the suspicion threshold, the data processing device may calculate an empirical distribution function of the suspicion values of the multiple users.
  • On the empirical distribution function, the suspicion value corresponding to the point where the rising slope exceeds a preset rising threshold is taken as the suspicion threshold. That is, an obvious inflection point is found on the empirical distribution function, after which the function rises significantly, and the suspicion value corresponding to that inflection point is used as the suspicion threshold.
  • Users to be identified whose suspicion value is greater than the suspicion threshold are taken as malicious users.
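The patent locates the threshold at an inflection point where the ECDF's rising slope exceeds a preset rising threshold. As a rough illustrative equivalent, the sketch below places the threshold at the lower edge of the widest gap in the sorted suspicion values: the ECDF is flat across that gap and rises sharply just after it. The function name and the widest-gap heuristic are assumptions, not the patent's exact criterion.

```python
def suspicion_threshold(values):
    """Pick a suspicion threshold from the empirical distribution of values.

    values: suspicion values of all users after the iterative calculation
            (at least two values, not all equal, for a meaningful result).
    """
    xs = sorted(values)
    # The widest gap between consecutive sorted values marks the flat stretch
    # of the ECDF; the steep rise just after it is the inflection point.
    gaps = [(b - a, a) for a, b in zip(xs, xs[1:])]
    _width, thr = max(gaps)
    return thr  # users with suspicion greater than thr are flagged
```

For example, with most users clustered near zero and a small high-suspicion cluster, the threshold lands between the two clusters, so exactly the high cluster is flagged.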
  • The malicious user identification method provided by this embodiment propagates suspicion values from users with obvious malicious behavior features, thereby finding potentially risky users who do not themselves show obvious malicious behavior features.
  • This embodiment further provides a malicious user identification device 110 applied to the data processing device 100 shown in FIG. 1.
  • The device includes a feature acquisition module 111, a similarity calculation module 112, an initialization module 113, an iterative calculation module 114, and an identification module 115.
  • The feature acquisition module 111 is configured to acquire barrage-sending behavior features of multiple users, where the multiple users include at least one confirmed malicious user and users to be identified other than the malicious user.
  • The feature acquisition module 111 may be configured to perform step S110 shown in FIG. 2; for a specific description of the feature acquisition module 111, refer to the description of step S110.
  • The similarity calculation module 112 is configured to calculate a similarity value between every two of the multiple users according to the barrage-sending behavior features.
  • The similarity calculation module 112 may be configured to perform step S120 shown in FIG. 2; for a specific description of the similarity calculation module 112, refer to the description of step S120.
  • The initialization module 113 is configured to set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the user to be identified to a second initial suspicion value, where the first initial suspicion value is higher than the second initial suspicion value.
  • The initialization module 113 may be configured to perform step S130 shown in FIG. 2; for a specific description of the initialization module 113, refer to the description of step S130.
  • The iterative calculation module 114 is configured to, for each user, iteratively calculate that user's suspicion value through the probability graph model according to the user's current suspicion value and the similarity values with other users.
  • The iterative calculation module 114 may be configured to perform step S140 shown in FIG. 2; for a specific description of the iterative calculation module 114, refer to the description of step S140.
  • The identification module 115 is configured to, after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  • The identification module 115 may be configured to perform step S150 shown in FIG. 2; for a specific description of the identification module 115, refer to the description of step S150.
  • The malicious user identification method and apparatus use a probability graph algorithm to calculate the suspicion values of users to be identified according to the barrage-sending behavior features of known malicious users and of the users to be identified, and identify malicious users based on those suspicion values. In this way, based on the association in barrage-sending behavior between users to be identified and confirmed malicious users, malicious users whose malicious behavior is not obvious can be effectively screened out.
  • Each block of the flowchart or block diagrams may represent a module, program segment, or portion of code that comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions, or by a combination of dedicated hardware and computer instructions.
  • Each functional module in the various embodiments of the present disclosure may be integrated to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
  • The functions, if implemented in the form of software function modules and sold or used as independent products, may be stored in a computer-readable storage medium.
  • Such a computer-readable storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.
  • The malicious user identification method, apparatus, and readable storage medium provided by the embodiments of the present disclosure use a probability graph algorithm to calculate the suspicion values of users to be identified according to the barrage-sending behavior features of known malicious users and of the users to be identified, and identify malicious users based on those suspicion values. In this way, based on the association in barrage-sending behavior between users to be identified and confirmed malicious users, malicious users whose malicious behavior is not obvious can be effectively screened out, no malicious users are missed, and the security of the live platform is ensured.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a malicious user identification method, apparatus, and readable storage medium. The method includes: acquiring barrage-sending behavior features of multiple users; calculating, according to the barrage-sending behavior features, a similarity value between every two of the multiple users; setting the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the user to be identified to a second initial suspicion value; for each user, iteratively calculating that user's suspicion value through a probability graph model according to the user's current suspicion value and the similarity values between the user and other users; and, after multiple rounds of the iterative calculation, taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users. In this way, based on the association in barrage-sending behavior features between users to be identified and confirmed malicious users, malicious users whose malicious behavior features are not obvious can be effectively screened out.

Description

Malicious user identification method and apparatus, and readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 2018100007598, titled "Malicious User Identification Method and Apparatus", filed with the Chinese Patent Office on January 2, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of big data processing technologies, and in particular to a malicious user identification method, apparatus, and readable storage medium.
Background
A live streaming platform is an Internet social platform that provides streamer users with channels for presenting live video and interacting with other users online. On live streaming platforms, there are often malicious users who seek improper benefits through improper means. For example, a video live streaming platform reflects a streamer's influence and popularity through the number of users following the streamer, but malicious users may profit by fabricating large numbers of fake follows, which disturbs the ecological balance of the platform and harms the interests of other normal users. In such cases, malicious users need to be identified so that their behavior on the platform can be blocked or exposed.
At present, in order to avoid identifying normal users as malicious, existing techniques usually apply fairly strict rules when identifying malicious users, treating only users with obvious malicious behavior features as malicious. Although this approach discovers and blocks such malicious users to some extent, it misses malicious users whose malicious behavior features are not obvious.
SUMMARY
It is an object of the present disclosure to provide a malicious user identification method, the method comprising:
acquiring barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user;
calculating, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users;
setting the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value;
for each user, iteratively calculating that user's suspicion value by means of a probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users;
after multiple rounds of the iterative calculation, taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
Optionally, in the above method, the step of taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users comprises:
calculating an empirical distribution function of the suspicion values of the multiple users;
taking, as the suspicion threshold, the suspicion value corresponding to the point on the empirical distribution function at which the rising slope exceeds a preset slope threshold;
taking, among the users to be identified, users whose suspicion value is greater than the suspicion threshold as malicious users.
Optionally, in the above method, the step of iteratively calculating the user's suspicion value by means of a probabilistic graph model comprises:
performing the iterative calculation on each user's suspicion value by means of the probabilistic graph model;
for each user, calculating the degree of change of the suspicion value before and after the current round of iterative calculation;
stopping the iterative calculation when the degree of change of every user's suspicion value is smaller than a preset change threshold.
Optionally, in the above method, the step of iteratively calculating the user's suspicion value comprises:
performing a preset number of rounds of iterative calculation on each user's suspicion value by means of the probabilistic graph model.
Optionally, in the above method, the barrage-sending behavior characteristics include the set of live rooms in which a user sends barrages and at least one kind of barrage-sending action statistic; and the step of calculating, according to the barrage-sending behavior characteristics of each user, a similarity value between every two users comprises:
calculating a first similarity parameter between every two users according to the sets of live rooms in which the two users send barrages;
calculating a second similarity parameter between the two users according to each kind of barrage-sending action statistic of the two users;
calculating the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
Optionally, in the above method, the kinds of barrage-sending action statistics include one or more of: the number of barrages sent, the time period in which barrages are sent, the interval between barrages, the number of characters per barrage, and the number of times a preset keyword appears in the barrages.
Optionally, in the above method, acquiring barrage-sending behavior characteristics of multiple users comprises: intermittently acquiring, over each of a series of acquisition windows of a preset duration separated by a preset time interval, the barrage-sending behavior characteristics of the multiple users within that window.
Optionally, in the above method, acquiring barrage-sending behavior characteristics of multiple users comprises: acquiring the barrage-sending behavior characteristics of the multiple users upon determining that the number of barrages currently being sent is greater than a preset barrage-count threshold.
Optionally, in the above method, the formula for iteratively calculating the user's suspicion value by means of the probabilistic graph model is:
S_k(i) = α·S_0(i) + (1 − α) · ( Σ_{j≠i} w_ji·S_{k−1}(j) ) / ( Σ_{j≠i} w_ji )
where S_k(i) is the suspicion value of the i-th user in the k-th round of iterative calculation; α is a weight coefficient with a value between 0 and 1; and w_ji is the similarity value between user j and user i.
Optionally, in the above method, the first initial suspicion value is 1 and the second initial suspicion value is 0.
It is also an object of the present disclosure to provide a malicious user identification apparatus, the apparatus comprising:
a feature acquisition module configured to acquire barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user;
a similarity calculation module configured to calculate, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users;
an initialization module configured to set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value;
an iterative calculation module configured to, for each user, iteratively calculate that user's suspicion value by means of a probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users;
an identification module configured to, after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
Optionally, in the above apparatus, the identification module is specifically configured to calculate an empirical distribution function of the suspicion values of the multiple users; take, as the suspicion threshold, the suspicion value corresponding to the point on the empirical distribution function at which the rising slope exceeds a preset slope threshold; and take, among the users to be identified, users whose suspicion value is greater than the suspicion threshold as malicious users.
Optionally, in the above apparatus, the iterative calculation module is specifically configured to perform the iterative calculation on each user's suspicion value by means of the probabilistic graph model; for each user, calculate the degree of change of the suspicion value before and after the current round of iterative calculation; and stop the iterative calculation when the degree of change of every user's suspicion value is smaller than a preset change threshold.
Optionally, in the above apparatus, the barrage-sending behavior characteristics include the set of live rooms in which a user sends barrages and at least one kind of barrage-sending action statistic; and the similarity calculation module is specifically configured to calculate a first similarity parameter between every two users according to the sets of live rooms in which the two users send barrages; calculate a second similarity parameter between the two users according to each kind of barrage-sending action statistic of the two users; and calculate the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
Optionally, in the above apparatus, the feature acquisition module is specifically configured to intermittently acquire, over each of a series of acquisition windows of a preset duration separated by a preset time interval, the barrage-sending behavior characteristics of the multiple users within that window.
Optionally, in the above apparatus, the feature acquisition module is specifically configured to acquire the barrage-sending behavior characteristics of the multiple users upon determining that the number of barrages currently being sent is greater than a preset barrage-count threshold.
It is a further object of the present disclosure to provide a readable storage medium storing executable instructions which, when executed by one or more processors, implement the malicious user identification method provided by the present disclosure.
Compared with the prior art, the present disclosure has the following beneficial effects:
The malicious user identification method and apparatus and the readable storage medium provided by the present disclosure use a probabilistic graph algorithm to calculate suspicion values for users to be identified, based on the barrage-sending behavior characteristics of known malicious users and of the users to be identified, and identify malicious users according to the suspicion values. In this way, based on the association in barrage-sending behavior between the users to be identified and the already-confirmed malicious users, malicious users whose malicious behavior characteristics are not obvious can be effectively screened out.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present disclosure and should not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic diagram of a data processing device provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a malicious user identification method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the sub-steps of step S120 in the malicious user identification method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a malicious user identification apparatus provided by an embodiment of the present disclosure.
Reference numerals: 100 - data processing device; 110 - malicious user identification apparatus; 111 - feature acquisition module; 112 - similarity calculation module; 113 - initialization module; 114 - iterative calculation module; 115 - identification module; 120 - memory; 130 - processor.
DETAILED DESCRIPTION
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the claimed scope of the disclosure but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present disclosure, it should be noted that terms such as "first", "second" and "third" are used only to distinguish the descriptions and are not to be understood as indicating or implying relative importance.
In addition, terms such as "horizontal", "vertical" and "overhanging" do not require components to be absolutely horizontal or overhanging; they may be slightly inclined. For example, "horizontal" merely means that a direction is more horizontal relative to "vertical"; it does not mean that the structure must be completely horizontal, and it may be slightly inclined.
In the description of the present disclosure, it should also be noted that, unless otherwise expressly specified and limited, the terms "disposed", "mounted", "coupled" and "connected" should be understood broadly: for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect via an intermediate medium, or an internal communication between two elements. For a person of ordinary skill in the art, the specific meanings of the above terms in the present disclosure can be understood according to the specific circumstances.
The inventors have found through research that malicious users usually appear as a fairly large group, in which some users have obvious malicious behavior characteristics while others do not. Malicious users within one group generally share the same or similar malicious behavior patterns; these patterns may not be obvious, but they are correlated.
Therefore, in this embodiment, the inventors propose propagating and iteratively calculating suspicion values through a probabilistic graph model, so as to screen out further malicious users who share behavior patterns with already-confirmed malicious users. This makes it possible to accurately identify users whose malicious behavior is not obvious but who are associated with confirmed malicious users.
Referring to Fig. 1, Fig. 1 is a block diagram of a data processing device 100 provided by a preferred embodiment of the present disclosure. The data processing device 100 includes a malicious user identification apparatus 110, a memory 120 and a processor 130.
The memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines. The malicious user identification apparatus 110 includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware or solidified in the operating system (OS) of the data processing device 100. The processor 130 is configured to execute the executable modules stored in the memory 120, such as the software functional modules and computer programs included in the malicious user identification apparatus 110.
The memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM). The memory 120 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip with signal processing capability. The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or executing the methods, steps and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor, or any conventional processor.
Referring to Fig. 2, Fig. 2 is a flowchart of a malicious user identification method applicable to the data processing device 100 shown in Fig. 1. The steps of the method are described in detail below.
Step S110: acquire barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user.
In this embodiment, the barrage-sending behavior characteristics may include the set of live rooms in which a user sends barrages and at least one kind of barrage-sending action statistic; the kinds of statistics include one or more of: the number of barrages sent, the time period in which barrages are sent, the interval between barrages, the number of characters per barrage, and the number of times a preset keyword appears in the barrages.
As one approach, the data processing device may acquire the barrage-sending behavior characteristics of the multiple users intermittently: it collects for a preset acquisition duration, pauses for a preset time interval, and then collects again. For example, if the preset acquisition duration is 1 minute and the preset interval is 30 seconds, the device continuously acquires the characteristics for 1 minute, pauses for 30 seconds, then acquires for the next 1-minute window, and so on in a loop. It can be understood that this approach effectively reduces the amount of data the device must process and, to a certain extent, reduces power consumption.
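The intermittent acquisition schedule just described can be sketched as follows. The function name and the idea of precomputing window boundaries (rather than sleeping in a loop) are illustrative assumptions, not the disclosure's implementation:

```python
def collection_windows(start, window_s=60, gap_s=30, count=3):
    """Compute (begin, end) timestamps for `count` intermittent
    acquisition windows: collect for `window_s` seconds, pause for
    `gap_s` seconds, then collect again."""
    windows = []
    t = start
    for _ in range(count):
        windows.append((t, t + window_s))
        t += window_s + gap_s  # the next window begins after the pause
    return windows

# With a 1-minute window and a 30-second pause, a new window begins every 90 s.
print(collection_windows(0))  # [(0, 60), (90, 150), (180, 240)]
```

A collector would then gather barrage features only inside these windows, which is what keeps the data volume and power consumption down, as noted above.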
As another approach, the data processing device may monitor the number of barrages currently being sent, obtaining that number in real time, and compare it against a preset barrage-count threshold. If the number does not exceed the threshold, the users are not sending many barrages in the live rooms, i.e. the rooms are not currently very popular, so the appearance of malicious users is a low-probability event; the device therefore performs no further processing and continues monitoring. If the number exceeds the threshold, many barrages are being sent, i.e. the rooms are currently quite popular and malicious users may appear; the device therefore proceeds to acquire the barrage-sending behavior characteristics of the multiple users. It can be understood that this approach makes the device run the acquisition procedure only when it is actually needed, which likewise reduces the amount of data processed and, to a certain extent, reduces power consumption.

Step S120: calculate, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users.
In this embodiment, referring to Fig. 3, step S120 may include sub-steps S121 to S123.
Sub-step S121: calculate a first similarity parameter between every two users according to the sets of live rooms in which the two users send barrages.
Sub-step S122: calculate a second similarity parameter between the two users according to each kind of barrage-sending action statistic of the two users.
Sub-step S123: calculate the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
For example, in this embodiment, denoting the similarity value between user u and user v as w_uv:
w_uv = w1 · P1(u, v) + w2 · P2(u, v)
where R_u and R_v are the sets of live rooms in which user u and user v send barrages, respectively; x_ui is the i-th barrage-sending action statistic of user u and x_vi the i-th statistic of user v, there being N kinds of statistics in total; P1 is the first similarity parameter, computed from R_u and R_v; P2 is the second similarity parameter, computed from the x_ui and x_vi; and w1 and w2 are weight coefficients satisfying w1 + w2 = 1.
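As a concrete sketch of sub-steps S121 to S123, the following computes a similarity value of this weighted-sum form. The Jaccard overlap for the first parameter and the `1/(1+|a-b|)` closeness kernel for the second are illustrative assumptions; the text fixes only that the two parameters come from the room sets and the action statistics and are combined with weights satisfying w1 + w2 = 1:

```python
def similarity(rooms_u, rooms_v, stats_u, stats_v, w1=0.8, w2=0.2):
    """Weighted similarity between users u and v: w1 * first parameter
    (overlap of the live rooms they sent barrages in) + w2 * second
    parameter (mean closeness of their N barrage-action statistics)."""
    assert abs(w1 + w2 - 1.0) < 1e-9  # the weights must sum to 1
    union = rooms_u | rooms_v
    p1 = len(rooms_u & rooms_v) / len(union) if union else 0.0
    # Illustrative per-statistic kernel: 1 when equal, decaying with distance.
    p2 = sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(stats_u, stats_v)) / len(stats_u)
    return w1 * p1 + w2 * p2

# Two users sharing one common room out of three, with close statistics:
w = similarity({"r1", "r2"}, {"r2", "r3"}, [10, 5], [10, 7])
print(round(w, 3))  # 0.4
```

Here p1 = 1/3 and p2 = (1 + 1/3)/2 = 2/3, so w = 0.8·(1/3) + 0.2·(2/3) = 0.4.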
From the above formula it is clear that the first similarity parameter is obtained from R_u and R_v, and the second similarity parameter from the x_ui and x_vi. By weighting the first and second similarity parameters with the coefficients w1 and w2, a relatively balanced similarity value is obtained from the two parameters. In other words, the assigned weight coefficients w1 and w2 make the resulting similarity value more accurate, and they are chosen according to the actual situation, for example w1 = 0.8 and w2 = 0.2. The setting of w1 and w2 mainly depends on which matters more for the judgment: the live rooms in which barrages are sent, or the barrage-sending actions. For instance, if in practice malicious users mostly perform malicious operations across many live rooms, the live rooms carry more weight in the judgment and w1 takes the larger share; otherwise w2 takes the larger share.

Step S130: set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value.
In this embodiment, the suspicion value characterizes the degree to which a user may be a malicious user, and multiple rounds of iterative calculation through the probabilistic graph model make the suspicion value more accurate.
Before the iterative calculation, an initial suspicion value must be set for each user, and the iteration then proceeds from these initial values. In this embodiment, since the identity of the malicious user has already been confirmed, the malicious user's suspicion value is set to a relatively large first initial suspicion value (e.g. 1), and the suspicion value of each user to be identified is set to a second initial suspicion value that is smaller than the first (e.g. 0).
Further, this embodiment may also judge whether the number of obviously malicious users identified by the rules is greater than a preset malicious-user-count threshold. If not, there are very few obviously malicious users, so there are probably correspondingly few users with non-obvious malicious behavior; running the subsequent algorithm is then of little use while consuming considerable power, and the data processing device may terminate the subsequent procedure. Conversely, if the judgment is positive, there are many obviously malicious users, so there may also be many users with non-obvious malicious behavior; running the subsequent algorithm is then well worthwhile, and the device continues with the subsequent algorithm flow.
Step S140: for each user, iteratively calculate that user's suspicion value by means of the probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users.
In this embodiment, each user's suspicion value is iterated with the following formula:
S_k(i) = α·S_0(i) + (1 − α) · ( Σ_{j≠i} w_ji·S_{k−1}(j) ) / ( Σ_{j≠i} w_ji )
where S_k(i) is the suspicion value of the i-th user in the k-th round of iterative calculation; α is a weight coefficient with a value between 0 and 1; and w_ji is the similarity value between user j and user i.
Based on the above design, following the idea of propagating suspicion values in a probabilistic graph computation, the similarity values between users represent the propagation probabilities, so that the follow-spamming suspicion values are propagated over the probability graph according to the correlations between users; with continued iteration, the users' suspicion values converge toward a stable distribution.
Suppose there are three users A, B and C; strong rules identify user A as a malicious user, while B and C are users to be identified; the weight coefficient is set to 0.8. The similarity values between the users are:
w_AB = 0.5
w_AC = 0.1
w_BC = 0.2
The initialized suspicion values are:
S_0(A) = 1, S_0(B) = 0, S_0(C) = 0
The results of the first round of iterative calculation are:
S_1(A) = 0.8 × 1 + 0.2 × (0.5 × 0 + 0.1 × 0) / (0.5 + 0.1) = 0.8
S_1(B) = 0.8 × 0 + 0.2 × (0.5 × 1 + 0.2 × 0) / (0.5 + 0.2) ≈ 0.143
S_1(C) = 0.8 × 0 + 0.2 × (0.1 × 1 + 0.2 × 0) / (0.1 + 0.2) ≈ 0.067
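The propagation rule and the three-user example can be reproduced with a short script. The update below uses the neighbour-weight normalisation reconstructed in this text (the original formula survives only as an image placeholder), and the convergence test mirrors the stopping rule described under step S150:

```python
def propagate(s0, w, alpha=0.8, tol=1e-6, max_rounds=100):
    """w[i][j] is the (symmetric) similarity between users i and j.
    Iterates S_k(i) = alpha*S_0(i) + (1-alpha) *
    (sum_j w[i][j]*S_{k-1}(j)) / (sum_j w[i][j]) until every user's
    value changes by less than `tol`, or `max_rounds` is reached."""
    s = dict(s0)
    for _ in range(max_rounds):
        nxt = {}
        for i in s:
            spread = sum(wij * s[j] for j, wij in w[i].items())
            nxt[i] = alpha * s0[i] + (1 - alpha) * spread / sum(w[i].values())
        if all(abs(nxt[i] - s[i]) < tol for i in s):
            return nxt
        s = nxt
    return s

# The worked example: A is the confirmed malicious user, alpha = 0.8.
w = {"A": {"B": 0.5, "C": 0.1},
     "B": {"A": 0.5, "C": 0.2},
     "C": {"A": 0.1, "B": 0.2}}
s1 = propagate({"A": 1.0, "B": 0.0, "C": 0.0}, w, max_rounds=1)
print({u: round(v, 3) for u, v in s1.items()})  # {'A': 0.8, 'B': 0.143, 'C': 0.067}
```

Running with the default `max_rounds` lets the values settle into the stable distribution mentioned above, after which the thresholding of step S150 is applied.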
Step S150: after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
In one approach of this embodiment, the iterative calculation is stopped when every user's suspicion value has converged to a certain degree.
For example, for each user the data processing device calculates the degree of change of the suspicion value before and after the current round of iterative calculation, and stops the iteration when every user's change is smaller than a preset change threshold. The change threshold may be a percentage change relative to the previous iteration's result.
In another approach of this embodiment, a preset number of rounds is fixed in advance, and each user's suspicion value is iterated that preset number of times, e.g. 10 rounds.
After completing the multiple rounds of iterative calculation, the data processing device may calculate the empirical distribution function of the suspicion values of the multiple users and take, as the suspicion threshold, the suspicion value corresponding to the point on the empirical distribution function at which the rising slope exceeds a preset slope threshold. For example, an obvious knee point is located on the empirical distribution function, after which the function rises markedly, and the suspicion value at that knee is taken as the suspicion threshold.
Then, among the users to be identified, the users whose suspicion value is greater than the suspicion threshold are taken as malicious users.
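The threshold selection just described can be sketched as follows. Checking where the local slope of the empirical distribution function exceeds a preset value is the rule stated above; the particular slope estimate from consecutive sorted values, and the fallback when no knee is found, are illustrative assumptions:

```python
def ecdf_threshold(values, slope_min=2.0):
    """Sort the suspicion values and return the first value after which
    the empirical distribution function's local slope exceeds
    `slope_min`, i.e. where suspicion values start to cluster densely.
    Falls back to the maximum value when no such knee exists."""
    xs = sorted(values)
    n = len(xs)
    for i in range(n - 1):
        gap = xs[i + 1] - xs[i]
        # Each sorted value raises the ECDF by 1/n, so the local slope
        # between two consecutive values is (1/n) / gap.
        if gap > 0 and (1 / n) / gap > slope_min:
            return xs[i]
    return xs[-1]

# Suspicion values after iteration: two users cluster well above the rest.
suspicion = {"B": 0.05, "C": 0.30, "D": 0.44, "E": 0.45, "F": 0.46}
t = ecdf_threshold(suspicion.values())
flagged = sorted(u for u, s in suspicion.items() if s > t)
print(t, flagged)  # 0.44 ['E', 'F']
```

The dense cluster at 0.44-0.46 makes the ECDF rise steeply there, so the knee lands at 0.44 and only the users strictly above it are flagged.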
In this way, the malicious user identification method provided by this embodiment propagates the suspicion values of malicious users with obvious malicious behavior characteristics, thereby finding potential-risk users who exhibit no clear malicious behavior characteristics.
Referring to Fig. 4, this embodiment also provides a malicious user identification apparatus 110 applicable to the data processing device 100 shown in Fig. 1. The apparatus includes a feature acquisition module 111, a similarity calculation module 112, an initialization module 113, an iterative calculation module 114 and an identification module 115.
The feature acquisition module 111 is configured to acquire barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user.
In this embodiment, the feature acquisition module 111 may be configured to perform step S110 shown in Fig. 2; for a specific description of the feature acquisition module 111, refer to the description of step S110.
The similarity calculation module 112 is configured to calculate, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users.
In this embodiment, the similarity calculation module 112 may be configured to perform step S120 shown in Fig. 2; for a specific description of the similarity calculation module 112, refer to the description of step S120.
The initialization module 113 is configured to set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value.
In this embodiment, the initialization module 113 may be configured to perform step S130 shown in Fig. 2; for a specific description of the initialization module 113, refer to the description of step S130.
The iterative calculation module 114 is configured to, for each user, iteratively calculate that user's suspicion value by means of a probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users.
In this embodiment, the iterative calculation module 114 may be configured to perform step S140 shown in Fig. 2; for a specific description of the iterative calculation module 114, refer to the description of step S140.
The identification module 115 is configured to, after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
In this embodiment, the identification module 115 may be configured to perform step S150 shown in Fig. 2; for a specific description of the identification module 115, refer to the description of step S150.
In summary, the malicious user identification method and apparatus provided by the present disclosure use a probabilistic graph algorithm to calculate suspicion values for users to be identified, based on the barrage-sending behavior characteristics of known malicious users and of the users to be identified, and identify malicious users according to the suspicion values. In this way, based on the association in barrage-sending behavior between the users to be identified and the already-confirmed malicious users, malicious users whose malicious behavior characteristics are not obvious can be effectively screened out.
In the embodiments provided by the present disclosure, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the architectures, functions and operations of possible implementations of the apparatus, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions configured to implement the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present disclosure may be integrated to form an independent part, the modules may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between them. Moreover, the terms "comprise", "include" and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.
The above are only specific embodiments of the present disclosure, but its scope of protection is not limited thereto. Any changes or substitutions readily conceivable by any person skilled in the art within the technical scope disclosed herein shall be covered by the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of the claims.
INDUSTRIAL APPLICABILITY
The malicious user identification method and apparatus and the readable storage medium provided by the embodiments of the present disclosure use a probabilistic graph algorithm to calculate suspicion values for users to be identified, based on the barrage-sending behavior characteristics of known malicious users and of the users to be identified, and identify malicious users according to the suspicion values. In this way, based on the association in barrage-sending behavior between the users to be identified and the already-confirmed malicious users, malicious users whose malicious behavior characteristics are not obvious can be effectively screened out, no malicious user is missed, and the security of the live streaming platform is ensured.

Claims (17)

  1. A malicious user identification method, characterized in that the method comprises:
    acquiring barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user;
    calculating, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users;
    setting the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value;
    for each user, iteratively calculating that user's suspicion value by means of a probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users;
    after multiple rounds of the iterative calculation, taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  2. The method according to claim 1, characterized in that the step of taking users to be identified whose suspicion value is greater than a suspicion threshold as malicious users comprises:
    calculating an empirical distribution function of the suspicion values of the multiple users;
    taking, as the suspicion threshold, the suspicion value corresponding to the point on the empirical distribution function at which the rising slope exceeds a preset slope threshold;
    taking, among the users to be identified, users whose suspicion value is greater than the suspicion threshold as malicious users.
  3. The method according to claim 1, characterized in that the step of iteratively calculating the user's suspicion value by means of a probabilistic graph model comprises:
    performing the iterative calculation on each user's suspicion value by means of the probabilistic graph model;
    for each user, calculating the degree of change of the suspicion value before and after the current round of iterative calculation;
    stopping the iterative calculation when the degree of change of every user's suspicion value is smaller than a preset change threshold.
  4. The method according to claim 1, characterized in that the step of iteratively calculating the user's suspicion value comprises:
    performing a preset number of rounds of iterative calculation on each user's suspicion value by means of the probabilistic graph model.
  5. The method according to claim 1, characterized in that the barrage-sending behavior characteristics include the set of live rooms in which a user sends barrages and at least one kind of barrage-sending action statistic; and the step of calculating, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users comprises:
    calculating a first similarity parameter between every two users according to the sets of live rooms in which the two users send barrages;
    calculating a second similarity parameter between the two users according to each kind of barrage-sending action statistic of the two users;
    calculating the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
  6. The method according to any one of claims 1-5, characterized in that the kinds of barrage-sending action statistics include one or more of: the number of barrages sent, the time period in which barrages are sent, the interval between barrages, the number of characters per barrage, and the number of times a preset keyword appears in the barrages.
  7. The method according to claim 6, characterized in that acquiring barrage-sending behavior characteristics of multiple users comprises:
    intermittently acquiring, over each of a series of acquisition windows of a preset duration separated by a preset time interval, the barrage-sending behavior characteristics of the multiple users within that window.
  8. The method according to claim 7, characterized in that acquiring barrage-sending behavior characteristics of multiple users comprises:
    acquiring the barrage-sending behavior characteristics of the multiple users upon determining that the number of barrages currently being sent is greater than a preset barrage-count threshold.
  9. The method according to any one of claims 1-8, characterized in that the formula for iteratively calculating the user's suspicion value by means of the probabilistic graph model is:
    S_k(i) = α·S_0(i) + (1 − α) · ( Σ_{j≠i} w_ji·S_{k−1}(j) ) / ( Σ_{j≠i} w_ji )
    where S_k(i) is the suspicion value of the i-th user in the k-th round of iterative calculation; α is a weight coefficient with a value between 0 and 1; and w_ji is the similarity value between user j and user i.
  10. The method according to any one of claims 1-9, characterized in that the first initial suspicion value is 1 and the second initial suspicion value is 0.
  11. A malicious user identification apparatus, characterized in that the apparatus comprises:
    a feature acquisition module configured to acquire barrage-sending behavior characteristics of multiple users, wherein the multiple users include at least one already-confirmed malicious user and users to be identified other than the malicious user;
    a similarity calculation module configured to calculate, according to the barrage-sending behavior characteristics, a similarity value between every two of the multiple users;
    an initialization module configured to set the suspicion value of the malicious user to a first initial suspicion value and the suspicion value of the users to be identified to a second initial suspicion value, wherein the first initial suspicion value is higher than the second initial suspicion value;
    an iterative calculation module configured to, for each user, iteratively calculate that user's suspicion value by means of a probabilistic graph model, according to the user's current suspicion value and the similarity values between the user and the other users;
    an identification module configured to, after multiple rounds of the iterative calculation, take users to be identified whose suspicion value is greater than a suspicion threshold as malicious users.
  12. The apparatus according to claim 11, characterized in that the identification module is specifically configured to calculate an empirical distribution function of the suspicion values of the multiple users; take, as the suspicion threshold, the suspicion value corresponding to the point on the empirical distribution function at which the rising slope exceeds a preset slope threshold; and take, among the users to be identified, users whose suspicion value is greater than the suspicion threshold as malicious users.
  13. The apparatus according to claim 11, characterized in that the iterative calculation module is specifically configured to perform the iterative calculation on each user's suspicion value by means of the probabilistic graph model; for each user, calculate the degree of change of the suspicion value before and after the current round of iterative calculation; and stop the iterative calculation when the degree of change of every user's suspicion value is smaller than a preset change threshold.
  14. The apparatus according to claim 11, characterized in that the barrage-sending behavior characteristics include the set of live rooms in which a user sends barrages and at least one kind of barrage-sending action statistic; and the similarity calculation module is specifically configured to calculate a first similarity parameter between every two users according to the sets of live rooms in which the two users send barrages; calculate a second similarity parameter between the two users according to each kind of barrage-sending action statistic of the two users; and calculate the similarity value between the two users according to the first similarity parameter and the second similarity parameter.
  15. The apparatus according to claim 11 or 14, characterized in that the feature acquisition module is specifically configured to intermittently acquire, over each of a series of acquisition windows of a preset duration separated by a preset time interval, the barrage-sending behavior characteristics of the multiple users within that window.
  16. The apparatus according to claim 11 or 14, characterized in that the feature acquisition module is specifically configured to acquire the barrage-sending behavior characteristics of the multiple users upon determining that the number of barrages currently being sent is greater than a preset barrage-count threshold.
  17. A readable storage medium, characterized in that it stores executable instructions which, when executed by one or more processors, implement the malicious user identification method according to any one of claims 1-10.
PCT/CN2018/084636 2018-01-02 2018-04-26 恶意用户识别方法、装置及可读存储介质 WO2019134307A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810000759.8A CN108174296B (zh) 2018-01-02 2018-01-02 恶意用户识别方法及装置
CN201810000759.8 2018-01-02

Publications (1)

Publication Number Publication Date
WO2019134307A1 true WO2019134307A1 (zh) 2019-07-11

Family

ID=62516946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084636 WO2019134307A1 (zh) 2018-01-02 2018-04-26 恶意用户识别方法、装置及可读存储介质

Country Status (2)

Country Link
CN (1) CN108174296B (zh)
WO (1) WO2019134307A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632153A (zh) * 2020-12-29 2021-04-09 国网安徽省电力有限公司 一种违约用电识别方法及装置
CN113609408A (zh) * 2021-08-10 2021-11-05 公安部交通管理科学研究所 一种基于距离计算的机动车信息查询异常判别方法及系统
CN113726887A (zh) * 2021-08-30 2021-11-30 广州虎牙科技有限公司 一种用户行为评估方法、装置、电子设备及计算机可读存储介质
CN113761277A (zh) * 2020-09-23 2021-12-07 北京沃东天骏信息技术有限公司 一种风控方法、装置、电子设备和存储介质
CN113938692A (zh) * 2020-07-13 2022-01-14 武汉斗鱼网络科技有限公司 一种视频直播的风险控制方法及装置
CN114173138A (zh) * 2021-10-22 2022-03-11 武汉斗鱼网络科技有限公司 一种处理异常视频up主的方法、装置、介质及设备
CN115396734A (zh) * 2022-05-16 2022-11-25 北京大学 一种针对视频集中的弹幕和用户行为的可视化方法及系统

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765171B (zh) * 2018-07-09 2022-06-21 武汉斗鱼网络科技有限公司 一种不良用户甄别方法、存储介质、电子设备及系统
CN109151518B (zh) * 2018-08-06 2021-02-02 武汉斗鱼网络科技有限公司 一种被盗账号的识别方法、装置及电子设备
CN109003181B (zh) * 2018-08-17 2022-05-13 腾讯科技(深圳)有限公司 可疑用户确定方法、装置、设备和计算机可读存储介质
CN109255371B (zh) * 2018-08-23 2021-06-15 武汉斗鱼网络科技有限公司 一种确定直播平台虚假关注用户的方法以及相关设备
CN109255391B (zh) * 2018-09-30 2021-07-23 武汉斗鱼网络科技有限公司 一种识别恶意用户的方法、装置及存储介质
CN109257617B (zh) * 2018-09-30 2021-11-09 武汉斗鱼网络科技有限公司 一种确定直播平台中嫌疑用户的方法以及相关设备
CN109451359B (zh) * 2018-10-31 2020-10-16 武汉斗鱼网络科技有限公司 一种关注异常的检测方法、装置、设备和存储介质
CN109615461B (zh) * 2018-11-09 2022-04-29 创新先进技术有限公司 目标用户识别方法、违规商户识别方法和装置
CN110197375A (zh) * 2018-11-28 2019-09-03 腾讯科技(深圳)有限公司 一种相似用户识别方法、装置、相似用户识别设备和介质
CN109587248B (zh) * 2018-12-06 2023-08-29 腾讯科技(深圳)有限公司 用户识别方法、装置、服务器及存储介质
CN109840778A (zh) * 2018-12-21 2019-06-04 上海拍拍贷金融信息服务有限公司 欺诈用户的识别方法及装置、可读存储介质
CN109905722B (zh) * 2019-02-21 2021-07-23 武汉瓯越网视有限公司 一种确定嫌疑节点的方法以及相关设备
CN110222297B (zh) * 2019-06-19 2021-07-23 武汉斗鱼网络科技有限公司 一种标签用户的识别方法以及相关设备
CN110442801B (zh) * 2019-07-26 2021-11-19 新华三信息安全技术有限公司 一种目标事件的关注用户的确定方法及装置
CN110427999B (zh) * 2019-07-26 2022-02-22 武汉斗鱼网络科技有限公司 一种账号相关性评估方法、装置、设备及介质
CN112667961A (zh) * 2019-10-16 2021-04-16 武汉斗鱼网络科技有限公司 一种识别广告弹幕发布者的方法及系统
CN111125192B (zh) * 2019-12-20 2023-04-07 北京明略软件系统有限公司 一种确定对象之间相似度的方法和装置
CN111371767B (zh) * 2020-02-20 2022-05-13 深圳市腾讯计算机系统有限公司 恶意账号识别方法、恶意账号识别装置、介质及电子设备
CN111476510B (zh) * 2020-06-23 2020-10-16 武汉斗鱼鱼乐网络科技有限公司 一种风险用户识别的方法及系统、存储介质、设备
CN112153221B (zh) * 2020-09-16 2021-06-29 北京邮电大学 一种基于社交网络图计算的通信行为识别方法
CN112395556B (zh) * 2020-09-30 2022-09-06 广州市百果园网络科技有限公司 异常用户检测模型训练方法、异常用户审核方法及装置
CN113159778B (zh) * 2020-12-24 2023-11-24 西安四叶草信息技术有限公司 一种金融欺诈的检测方法及装置
CN114302216B (zh) * 2021-08-25 2024-03-22 上海哔哩哔哩科技有限公司 一种弹幕处理方法、装置、设备及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102724182A (zh) * 2012-05-30 2012-10-10 北京像素软件科技股份有限公司 异常客户端的识别方法
CN104156447A (zh) * 2014-08-14 2014-11-19 天格科技(杭州)有限公司 一种智能社交平台广告预警及处理方法
US9503465B2 (en) * 2013-11-14 2016-11-22 At&T Intellectual Property I, L.P. Methods and apparatus to identify malicious activity in a network
CN106452809A (zh) * 2015-08-04 2017-02-22 北京奇虎科技有限公司 一种数据处理方法和装置
CN107093090A (zh) * 2016-10-25 2017-08-25 北京小度信息科技有限公司 异常用户识别方法及装置
CN107481009A (zh) * 2017-08-28 2017-12-15 广州虎牙信息科技有限公司 识别直播平台异常充值用户的方法、装置及终端

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7953676B2 (en) * 2007-08-20 2011-05-31 Yahoo! Inc. Predictive discrete latent factor models for large scale dyadic data
CN103077240B (zh) * 2013-01-10 2015-09-23 北京工商大学 一种基于概率图模型的微博水军识别方法
CN105915960A (zh) * 2016-03-31 2016-08-31 广州华多网络科技有限公司 一种用户类型的确定方法及装置
CN107316205A (zh) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 识别持卡人属性的方法、装置、计算机可读介质及系统
CN107451854B (zh) * 2017-07-12 2020-05-05 阿里巴巴集团控股有限公司 确定用户类型的方法及装置、电子设备


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938692B (zh) * 2020-07-13 2024-02-09 广州壹点通网络科技有限公司 一种视频直播的风险控制方法及装置
CN113938692A (zh) * 2020-07-13 2022-01-14 武汉斗鱼网络科技有限公司 一种视频直播的风险控制方法及装置
CN113761277A (zh) * 2020-09-23 2021-12-07 北京沃东天骏信息技术有限公司 一种风控方法、装置、电子设备和存储介质
CN112632153B (zh) * 2020-12-29 2023-10-20 国网安徽省电力有限公司 一种违约用电识别方法及装置
CN112632153A (zh) * 2020-12-29 2021-04-09 国网安徽省电力有限公司 一种违约用电识别方法及装置
CN113609408B (zh) * 2021-08-10 2023-05-02 公安部交通管理科学研究所 一种基于距离计算的机动车信息查询异常判别方法及系统
CN113609408A (zh) * 2021-08-10 2021-11-05 公安部交通管理科学研究所 一种基于距离计算的机动车信息查询异常判别方法及系统
CN113726887A (zh) * 2021-08-30 2021-11-30 广州虎牙科技有限公司 一种用户行为评估方法、装置、电子设备及计算机可读存储介质
CN113726887B (zh) * 2021-08-30 2024-03-15 广州虎牙科技有限公司 一种用户行为评估方法、装置、电子设备及计算机可读存储介质
CN114173138A (zh) * 2021-10-22 2022-03-11 武汉斗鱼网络科技有限公司 一种处理异常视频up主的方法、装置、介质及设备
CN114173138B (zh) * 2021-10-22 2023-08-22 广州新特珑电子有限公司 一种处理异常视频up主的方法、装置、介质及设备
CN115396734A (zh) * 2022-05-16 2022-11-25 北京大学 一种针对视频集中的弹幕和用户行为的可视化方法及系统
CN115396734B (zh) * 2022-05-16 2024-03-08 北京大学 一种针对视频集中的弹幕和用户行为的可视化方法及系统

Also Published As

Publication number Publication date
CN108174296A (zh) 2018-06-15
CN108174296B (zh) 2019-09-10

Similar Documents

Publication Publication Date Title
WO2019134307A1 (zh) 恶意用户识别方法、装置及可读存储介质
US10135788B1 (en) Using hypergraphs to determine suspicious user activities
WO2019165697A1 (zh) 刷人气用户的识别方法、装置、终端设备及储存介质
US10885306B2 (en) Living body detection method, system and non-transitory computer-readable recording medium
JP6528448B2 (ja) ネットワーク攻撃監視装置、ネットワーク攻撃監視方法、及びプログラム
US10785134B2 (en) Identifying multiple devices belonging to a single user
US9866573B2 (en) Dynamic malicious application detection in storage systems
US10003607B1 (en) Automated detection of session-based access anomalies in a computer network through processing of session data
US10318727B2 (en) Management device, management method, and computer-readable recording medium
US8205255B2 (en) Anti-content spoofing (ACS)
AU2018217323A1 (en) Methods and systems for identifying potential enterprise software threats based on visual and non-visual data
TW201712586A (zh) 惡意程式碼分析方法與系統、資料處理裝置及電子裝置
WO2019136850A1 (zh) 风险行为识别方法、存储介质、设备及系统
US10484419B1 (en) Classifying software modules based on fingerprinting code fragments
TW201523487A (zh) 一種消息推送方法、裝置及系統
CN104980402B (zh) 一种识别恶意操作的方法及装置
US20200012784A1 (en) Profile generation device, attack detection device, profile generation method, and profile generation computer program
US20120124217A1 (en) Adjusting The Connection Idle Timeout In Connection Pools
US10404524B2 (en) Resource and metric ranking by differential analysis
WO2018068664A1 (zh) 网络信息识别方法和装置
JP2015148539A (ja) 地震情報配信システムとノイズ判定方法
Brito et al. Detecting social-network bots based on multiscale behavioral analysis
CN109547427B (zh) 黑名单用户识别方法、装置、计算机设备及存储介质
Zheng et al. Ssl-cleanse: Trojan detection and mitigation in self-supervised learning
CN114157480A (zh) 网络攻击方案的确定方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18898170

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18898170

Country of ref document: EP

Kind code of ref document: A1