WO2018113551A1 - Identification method and device, and anti-junk content system - Google Patents

Identification method and device, and anti-junk content system

Info

Publication number
WO2018113551A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
behavior
illegal
target behavior
content
Prior art date
Application number
PCT/CN2017/115573
Other languages
French (fr)
Chinese (zh)
Inventor
张祥
安伟亭
魏虎
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2018113551A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H04L63/1441 Countermeasures against malicious traffic

Definitions

  • the present invention relates to information technology, and in particular, to an identification method and apparatus, and an anti-spam content system.
  • the invention provides an identification method and device and an anti-spam content system, which are used to solve the technical problem that the illegal user identification effect in the prior art is poor.
  • an identification method comprising:
  • the collection node records the behavior performed by the user
  • the illegal user identification node acquires the behavior performed by the user from the collection node
  • the illegal user identification node determines a single degree of target behavior in the behavior performed by the user
  • the illegal user identification node identifies whether the user is an illegal user according to a single degree of the target behavior.
  • an identification method comprising:
  • the collection node records the behavior performed by the user
  • the illegal content identification node acquires the behavior performed by the user from the collection node
  • the illegal content identification node determines a single degree of target behavior in the behavior performed by the user
  • the illegal content identification node identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.
  • an anti-spam content system including: a collection node and an illegal user identification node;
  • the collection node is configured to record the behavior performed by the user
  • the illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the user is an illegal user.
  • an anti-spam content system including: a collection node and an illegal content identification node;
  • the collection node is configured to record the behavior performed by the user
  • the illegal content identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
  • an identification method including:
  • An illegal user is identified based on a single degree of the target behavior.
  • an identification device including:
  • a determination module for determining a single degree of target behavior in the behavior performed by the user
  • An identification module for identifying an illegal user according to a single degree of the target behavior.
  • an identification method including:
  • an identification device including:
  • a determination module for determining a single degree of target behavior in the behavior performed by the user
  • an identification module configured to identify, according to a single degree of the target behavior, whether the content generated by the target behavior is illegal content.
  • the identification method and device and the anti-spam content system provided by the embodiments of the present invention can identify an illegal user by using the single degree of the target behavior in the behavior performed by the user, thereby identifying users who perform behaviors such as spam pushing while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing the interval between target behaviors, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 1 is a schematic diagram of interaction of an identification method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic diagram of interaction of another identification method according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic diagram of interaction of an identification method according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of interaction of another identification method according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic flowchart of an identification method according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic flowchart diagram of an identification method according to Embodiment 4 of the present invention.
  • FIG. 7 is a schematic structural diagram of an anti-spam content system according to Embodiment 5 of the present invention.
  • FIG. 8 is a schematic structural diagram of an identification device according to Embodiment 6 of the present invention.
  • FIG. 9 is a schematic structural diagram of an identification device according to Embodiment 7 of the present invention.
  • FIG. 10 is a schematic flowchart diagram of an identification method according to Embodiment 8 of the present invention.
  • FIG. 11 is a schematic structural diagram of an anti-spam content system according to Embodiment 9 of the present invention.
  • FIG. 12 is a schematic structural diagram of an identification device according to Embodiment 10 of the present invention.
  • FIG. 1 is a schematic diagram of interaction of an identification method according to Embodiment 1 of the present invention.
  • the method provided by this embodiment is performed by a collection node and an illegal user identification node.
  • the method may include:
  • step 101 the collection node records the behavior performed by the user.
  • the collection node may record the behavior performed by the user through the log data of the service system, so that the illegal user identification node can identify illegal users according to the behavior of the user.
  • Step 102 The illegal user identification node acquires the behavior performed by the user from the collection node.
  • the illegal user identification node may periodically acquire the behavior performed by the user from the collection node.
  • the steps of acquiring the behavior performed by the user and identifying illegal users may be performed during an idle period of the service system.
  • Step 103 The illegal user identification node determines a single degree of the target behavior in the behavior performed by the user, and identifies whether the user is an illegal user according to a single degree of the target behavior.
  • the inventors analyzed the behavior of current illegal users and found that illegal users often succeed in evading recognition by increasing the interval between executions of the target behavior. Therefore, the prior-art practice of counting the number of executions of the target behavior in the short term cannot recognize such illegal users.
  • At the same time, the inventors found that although these illegal users spread out the target behavior by increasing the execution interval, their behavior is more single than that of normal users: they perform fewer behaviors unrelated to their illegal purpose and focus more on the target behavior related to spam pushing or malicious billing. Thus, the inventors propose that these illegal users can be identified based on the single degree of the target behavior.
  • the target behavior should be a behavior necessary for the illegal user to carry out the illegal activity. For example, when identifying illegal users who publish spam, the information publishing behavior can be taken as the target behavior; when identifying malicious billing users, the purchase behavior can be taken as the target behavior.
  • data mining may be performed on each user's historical behavior to obtain the behaviors repeatedly performed by the user, and a behavior whose number of repeated executions exceeds a preset threshold is taken as the target behavior.
  • the sequence formed by the user's operations is mined using a suffix array or dynamic programming to obtain the longest common subsequence.
  • the mining result includes: the longest common subsequence, and the support corresponding to the longest common subsequence.
  • the longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order between the operations is the same; the support is the number of times the behavior pattern has been executed.
  • the operations in the longest common subsequence whose support exceeds the preset threshold, together with their order, are selected as the target behavior.
  • the target behavior generally includes at least two operation steps; that is, the target behavior specifies both the content of the operations and the order in which the operations are performed.
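  • The mining step described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it uses brute-force counting of contiguous operation windows instead of a suffix array or dynamic programming, and the function name `mine_target_behavior`, the parameter names, and the operation labels are all assumptions introduced for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def mine_target_behavior(
    operations: List[str],
    support_threshold: int = 2,
    min_length: int = 2,
) -> Optional[Tuple[Tuple[str, ...], int]]:
    """Find the longest repeated operation pattern for one user.

    A pattern is a contiguous run of operations (same operations, same
    order). Its support is the number of times it occurs in the sequence.
    The pattern is returned only if its support exceeds support_threshold.
    Brute-force window counting stands in for a suffix array here.
    """
    n = len(operations)
    # Try the longest candidate length first so the first hit is the longest.
    for length in range(n - 1, min_length - 1, -1):
        counts = Counter(
            tuple(operations[i:i + length]) for i in range(n - length + 1)
        )
        pattern, support = counts.most_common(1)[0]
        if support > support_threshold:
            return pattern, support
    return None


# Hypothetical operation log of a single user.
history = [
    "browse", "login", "post", "message",
    "search", "login", "post", "message",
    "login", "post", "message", "view",
]
result = mine_target_behavior(history, support_threshold=2)
```

  • Searching from the longest length downward keeps the sketch short; a suffix array, as mentioned in the text, would be the better choice for long operation sequences.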
  • the illegal user identification node calculates the proportion of the target behavior in the behavior performed by the user, and uses this proportion to indicate the single degree of the target behavior. As a possible implementation manner, the illegal user identification node first calculates the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user, and then corrects the calculated ratio with a smoothing algorithm to obtain the proportion of the target behavior in the behavior performed by the user.
  • the illegal user identification node estimates the probability that the user is an illegal user according to a single degree of the target behavior, and the illegal user identification node identifies the illegal user according to the probability. As a possible implementation manner, the illegal user identification node calculates that the user is an illegal user according to a single degree of the target behavior of the user in the first time period and the execution frequency of the target behavior in the second time period. The probability.
  • the duration of the first time period is greater than the duration of the second time period.
  • the first time period corresponds to the long term
  • the second time period corresponds to the short term.
  • for the short-term behavior of the user, that is, the behavior performed in the most recent statistical time window, the execution frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated based on the short-term behavior.
  • for the long-term behavior of the user, the single degree of the target behavior is calculated, and the probability that the user is an illegal user is estimated based on the long-term behavior.
  • the illegal user is identified based on the probability determined by the long-term behavior and the probability determined by the short-term behavior.
  • FIG. 2 is a schematic diagram of interaction of another identification method according to Embodiment 1 of the present invention. As shown in FIG. 2, after identifying whether the user is an illegal user, the method further includes:
  • Step 104 If the user is identified as an illegal user, the illegal user identification node provides the identified illegal user to the management node.
  • the identification method further includes:
  • step 105 the management node uses the operation permission restriction measure to punish the illegal user.
  • the identification method further includes:
  • Step 106 The management node blocks the content published by the illegal user.
  • the identification method provided in this embodiment is mainly applied to scenarios in which spam content is published, and may specifically identify illegal users who publish the spam content. Accordingly, in this application scenario, the target behavior is specifically a behavior necessary for publishing spam content, such as posting logs, sending in-site messages, and leaving messages.
  • In this embodiment, users who push spam content while increasing the interval between executions of the target behavior can be identified. Since illegal users often evade prior-art recognition by increasing the interval between target behaviors, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 3 is a schematic diagram of interaction of an identification method according to Embodiment 2 of the present invention.
  • the method provided by this embodiment is performed by a collection node and an illegal content identification node.
  • the method may include:
  • step 201 the collecting node records the behavior performed by the user.
  • the collection node may record the behavior performed by the user through the log data of the service system, so that the illegal content identification node identifies the illegal content according to the behavior of the user.
  • Step 202 The illegal content identification node acquires the behavior performed by the user from the collection node.
  • the illegal content identification node may periodically acquire the behavior performed by the user from the collection node.
  • the steps of acquiring the behavior performed by the user and identifying illegal content may be performed during an idle period of the service system.
  • Step 203 The illegal content identification node determines a single degree of the target behavior in the behavior performed by the user, and identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.
  • FIG. 4 is a schematic diagram of interaction of another identification method according to Embodiment 2 of the present invention. As shown in FIG. 4, after identifying illegal content, the method further includes:
  • Step 204 If the content is identified as illegal content, the illegal content identification node provides the identified illegal content to the management node.
  • the identification method further includes:
  • Step 205 The management node blocks the illegal content.
  • the identification method further includes:
  • Step 206 The management node uses the operation permission restriction measure to punish the user who issues the illegal content.
  • FIG. 5 is a schematic flowchart of an identification method according to Embodiment 3 of the present invention.
  • the method provided in this embodiment may be used to identify a certain type of illegal user, for example, an illegal user who publishes spam or a malicious billing user; this embodiment does not limit the type of the illegal user.
  • the method includes:
  • Step 301 Determine a single degree of target behavior in the behavior performed by the user.
  • the inventors analyzed the behavior of current illegal users and found that illegal users often succeed in evading recognition by increasing the interval between executions of the target behavior. Therefore, the prior-art practice of counting the number of executions of the target behavior in the short term cannot recognize such illegal users.
  • At the same time, the inventors found that although these illegal users spread out the target behavior by increasing the execution interval, their behavior is more single than that of normal users: they perform fewer behaviors unrelated to their illegal purpose and focus more on the target behavior related to spam pushing or malicious billing. Thus, the inventors propose that these illegal users can be identified based on the single degree of the target behavior.
  • the target behavior should be a behavior necessary for an illegal user to carry out the illegal activity.
  • for example, when identifying illegal users who publish spam, the information publishing behavior may be taken as the target behavior.
  • when identifying malicious billing users, the purchase behavior may be taken as the target behavior.
  • the sequence formed by the user's operations is mined using a suffix array or dynamic programming to obtain the longest common subsequence.
  • the mining result includes: the longest common subsequence, and the support corresponding to the longest common subsequence.
  • the longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order between the operations is the same; the support is the number of times the behavior pattern has been executed.
  • the operations in the longest common subsequence whose support exceeds the preset threshold, together with their order, are selected as the target behavior.
  • the target behavior generally includes at least two operation steps; that is, the target behavior specifies both the content of the operations and the order in which the operations are performed.
  • As a possible implementation of determining the single degree of the target behavior, the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user during the sampling period may be calculated. When the total number of sampled behaviors is small, a smoothing algorithm, for example Laplacian smoothing, may be used to correct the calculated ratio in order to reduce noise and improve accuracy, yielding the proportion of the target behavior in the behavior performed by the user.
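  • The smoothed ratio described above can be sketched as follows. The text names Laplacian smoothing but does not give its parameters, so the additive form below, the default smoothing value of 1, and the choice of two categories (target behavior versus everything else) are assumptions; `single_degree` is a hypothetical function name.

```python
def single_degree(
    target_count: int,
    total_count: int,
    smoothing: float = 1.0,
    num_categories: int = 2,
) -> float:
    """Laplace-smoothed share of the target behavior among all behaviors.

    target_count: executions of the target behavior in the sampling period.
    total_count:  all behaviors performed by the user in the same period.
    With few samples the result is pulled toward 1 / num_categories,
    which damps the noise of ratios computed from very little data.
    """
    if target_count < 0 or total_count < target_count:
        raise ValueError("counts must satisfy 0 <= target_count <= total_count")
    return (target_count + smoothing) / (total_count + smoothing * num_categories)


# A user with only two recorded behaviors, both of them target behavior:
# the raw ratio would be 1.0, while the smoothed value is more cautious.
sparse_user = single_degree(2, 2)
# A user with no recorded behavior at all gets the neutral prior.
empty_user = single_degree(0, 0)
```

  • Without smoothing, a user with two recorded behaviors would look exactly as single as a user with two thousand; the added constants make the sparse case less extreme.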
  • Step 302 Identify an illegal user according to a single degree of the target behavior.
  • as a possible implementation, if the single degree of the target behavior exceeds a limit, the user is identified as an illegal user.
  • as another possible implementation, the single degree of the target behavior determined in step 301 reflects the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, and is used to estimate the probability that the user is an illegal user based on the long-term behavior.
  • for the short-term behavior of the user, the execution frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated based on the short-term behavior.
  • the illegal user is identified based on the probability determined by the long-term behavior and the probability determined by the short-term behavior.
  • recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 6 is a schematic flowchart of an identification method according to Embodiment 4 of the present invention.
  • an identification method is described based on an identification process of an illegal user that issues spam content.
  • the target behavior may be an information publishing behavior.
  • the method provided by this embodiment may be performed by an illegal user identification node, which is set in an anti-spam system.
  • the method includes:
  • Step 401 Perform statistics on the behavior of the current user.
  • the user behavior data is obtained, and the behavior of each user is counted one by one, including:
  • count_tgt_acc: the number of times the user has performed the target behavior within the current time window. count_tgt_acc reflects how frequently the user performs the target behavior in the short term.
  • Step 402 Calculate, according to the statistical result, a probability S of each user as an illegal user who issues spam.
  • the S parameter is used to indicate the probability that the user is an illegal user who issues spam.
  • the parameter S includes two parts. One part is calculated from the short-term behavior, that is, the behavior performed in the current time window, and gives the probability that the user is an illegal user who publishes spam.
  • E[count_tgt_acc] represents the expected value of count_tgt_acc
  • the expected value may specifically be the average value of count_tgt_acc over all users.
  • the other part is calculated from the long-term behavior, that is, the behavior performed from the historical time window up to the current time window, and likewise gives the probability that the user is an illegal user who publishes spam.
  • E[count_ratio] represents the expected value of count_ratio, and the expected value may specifically be the average of count_ratio over all users.
  • count_ratio quantifies the single degree to which the user performs the target behavior. Laplacian smoothing is added in the calculation of count_ratio to handle users with few recorded behaviors and to avoid the larger calculation error that sparse behavior would cause.
  • a Laplacian smoothing parameter is used in the smoothing process.
  • a weight is set to adjust the relative influence of the long-term behavior calculation result and the short-term behavior calculation result on the finally calculated probability.
  • the weight ranges from 0 to 1.
  • Step 403 Determine whether the probability S of the user is greater than a preset threshold. If yes, execute step 404; otherwise, proceed to identify the next user.
  • Step 404 If it is determined that the user is an illegal user that issues spam, send the information of the user to the management node to perform permission restriction or block content processing.
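  • Steps 401 to 404 can be sketched end to end as follows. The exact formula for S is not reproduced in this text, so the weighted sum of the two normalized terms below is an assumption that only follows the description (a short-term term normalized by E[count_tgt_acc], a long-term term normalized by E[count_ratio], and a weight between 0 and 1). The field names `count_tgt_all` and `count_all`, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class UserStats:
    count_tgt_acc: int  # target-behavior executions in the current time window
    count_tgt_all: int  # target-behavior executions over the long-term window (assumed field)
    count_all: int      # all behaviors over the long-term window (assumed field)


def count_ratio(stats: UserStats, smoothing: float = 1.0) -> float:
    """Laplace-smoothed long-term share of the target behavior."""
    return (stats.count_tgt_all + smoothing) / (stats.count_all + 2 * smoothing)


def score_users(
    users: Dict[str, UserStats], weight: float = 0.5, smoothing: float = 1.0
) -> Dict[str, float]:
    """Assumed form: S = w * short / E[short] + (1 - w) * ratio / E[ratio]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if not users:
        return {}
    ratios = {uid: count_ratio(s, smoothing) for uid, s in users.items()}
    # Expected values are taken as averages over all users, as in the text.
    expected_acc = mean(s.count_tgt_acc for s in users.values())
    expected_ratio = mean(ratios.values())
    scores = {}
    for uid, s in users.items():
        short_term = s.count_tgt_acc / expected_acc if expected_acc > 0 else 0.0
        long_term = ratios[uid] / expected_ratio
        scores[uid] = weight * short_term + (1.0 - weight) * long_term
    return scores


def flag_illegal_users(scores: Dict[str, float], threshold: float) -> List[str]:
    """Step 403: keep users whose score S is greater than the threshold."""
    return [uid for uid, s in scores.items() if s > threshold]


# Hypothetical data: "spammer" posts rarely in the short term but does
# almost nothing except the target behavior over the long term.
users = {
    "alice": UserStats(count_tgt_acc=2, count_tgt_all=5, count_all=100),
    "bob": UserStats(count_tgt_acc=3, count_tgt_all=10, count_all=120),
    "spammer": UserStats(count_tgt_acc=2, count_tgt_all=58, count_all=60),
}
flagged = flag_illegal_users(score_users(users, weight=0.5), threshold=1.5)
```

  • In this sample the short-term count alone does not separate the spammer from alice; the long-term ratio term is what lifts the score, which is the effect the embodiment relies on.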
  • In this embodiment, users who push spam content while increasing the interval between executions of the target behavior can be identified. Since illegal users often evade prior-art recognition by increasing the interval between target behaviors, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 7 is a schematic structural diagram of an anti-spam content system according to Embodiment 5 of the present invention. As shown in FIG. 7, the anti-spam content system includes: a collection node and an illegal user identification node.
  • the collection node is used to record the behavior performed by the user.
  • the collection node is the interface between the anti-spam system and the online service system, and is used to collect the behavior of users in the service system.
  • the log can be obtained from the service system and parsed to read the user behavior data.
  • the illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the user is an illegal user.
  • the illegal user identification node is used to perform the identification methods provided in the foregoing Embodiment 1, Embodiment 3, and Embodiment 4. For details, refer to the related descriptions in those embodiments.
  • the anti-spam system further includes: a management node.
  • the management node is configured to acquire the identified illegal user from the illegal user identification node, and impose an operation authority restriction measure to punish the illegal user. And/or, the management node is configured to acquire the identified illegal user from the illegal user identification node, and block the content published by the illegal user.
  • the anti-spam system can run on a server to identify illegal users who publish spam and then restrict the operation rights of the identified illegal users so as to prohibit them from posting information on the website, thereby reducing the spam posted on the website.
  • the illegal user identification node completes the process of identifying the illegal user and performing the operation authority restriction on the identified illegal user by interacting with the management node.
  • the illegal user identification node acquires data on user behavior from the collection node and analyzes it to identify, among the users, the illegal users who publish spam. The analysis process may include: for each user, determining a single degree of the target behavior in the behavior performed by the user, and then identifying whether the user is an illegal user according to the single degree of the target behavior.
  • the illegal user identification node provides the identified illegal users to the management node. The management node can review the identified illegal users and, after confirming that the identification of the illegal users who publish spam is correct, set a corresponding operation permission restriction measure for each illegal user, for example, prohibiting the user from posting logs for 3 days, or freezing the user account for 3 days.
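  • The operation permission restriction applied by the management node can be sketched as follows. This is only an illustration of a time-limited restriction such as the 3-day examples above; the class `ManagementNode`, its methods, and the action names `post_log`, `send_message`, and `all` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Restriction:
    action: str           # restricted action, or "all" for a frozen account
    expires_at: datetime  # moment at which the restriction is lifted


class ManagementNode:
    """Keeps time-limited restrictions for users confirmed as illegal."""

    def __init__(self) -> None:
        self._restrictions: Dict[str, Restriction] = {}

    def punish(
        self,
        user_id: str,
        action: str = "post_log",
        days: int = 3,
        now: Optional[datetime] = None,
    ) -> Restriction:
        """Restrict one action (or "all") for the given number of days."""
        start = now or datetime.now()
        restriction = Restriction(action, start + timedelta(days=days))
        self._restrictions[user_id] = restriction
        return restriction

    def is_allowed(
        self, user_id: str, action: str, now: Optional[datetime] = None
    ) -> bool:
        """Return True unless an unexpired restriction covers the action."""
        moment = now or datetime.now()
        restriction = self._restrictions.get(user_id)
        if restriction is None or moment >= restriction.expires_at:
            return True
        return restriction.action not in (action, "all")


node = ManagementNode()
t0 = datetime(2017, 1, 1)
node.punish("u1", action="post_log", days=3, now=t0)  # no posting logs for 3 days
node.punish("u2", action="all", days=3, now=t0)       # account frozen for 3 days
```

  • Passing `now` explicitly keeps the sketch deterministic; a real deployment would also need the review step described above before any restriction is applied.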
  • In this embodiment, users who push spam content while increasing the interval between executions of the target behavior can be identified. Since illegal users often evade prior-art recognition by increasing the interval between target behaviors, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 8 is a schematic structural diagram of an identification device according to Embodiment 6 of the present invention. As shown in FIG. 8, the device includes: a determining module 41 and an identification module 42.
  • a determination module 41 is operative to determine a single degree of target behavior in the behavior performed by the user.
  • the identification module 42 is configured to identify an illegal user according to a single degree of the target behavior.
  • the identification module 42 identifies the illegal user by using the single degree of the target behavior determined by the determining module 41, thereby identifying users who perform behaviors such as spam pushing while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing the interval between target behaviors, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate of illegal users, and optimizes the recognition effect.
  • FIG. 9 is a schematic structural diagram of an identification device according to Embodiment 7 of the present invention. To clearly illustrate the previous embodiment, this embodiment provides a possible implementation manner of the identification device, as shown in FIG. 9.
  • the determining module 41 further includes: a calculating unit 411 and an indicating unit 412.
  • the calculating unit 411 is configured to calculate a proportion of the target behavior in the behavior performed by the user.
  • the indicating unit 412 is configured to use the specific gravity to indicate a single degree of the target behavior.
  • the calculation unit 411 includes: a calculation subunit 4111 and a smoothing subunit 4112.
  • a calculating subunit 4111 configured to calculate a ratio between the number of executions of the target behavior and the total number of times the user performs the behavior
  • the smoothing subunit 4112 is configured to correct the calculated ratio by using a smoothing algorithm to obtain a proportion of the target behavior in the behavior performed by the user.
  • the identification module 42 includes a prediction unit 421 and an identification unit 422.
  • the prediction unit 421 is configured to estimate a probability that the user is an illegal user according to a single degree of the target behavior.
  • the prediction unit 421 is specifically configured to calculate the probability that the user is an illegal user according to the single degree of the target behavior of the user in the first time period and the execution frequency of the target behavior in the second time period.
  • the duration of the first time period is greater than the duration of the second time period, since the first time period corresponds to the long term and the second time period to the short term.
  • the identifying unit 422 is configured to identify an illegal user according to the probability.
  • the target behavior includes at least two steps.
  • the identification device further includes an analysis module 43.
  • the analyzing module 43 is configured to perform an analysis for each user's behavior to obtain the behavior repeatedly performed by the user, and use the behavior that the number of repeated executions exceeds a preset threshold as the target behavior.
  • FIG. 10 is a schematic flowchart of an identification method according to Embodiment 8 of the present invention.
  • the method provided in this embodiment may be used to identify illegal content. As shown in FIG. 10, the method includes:
  • Step 801 Determine a single degree of target behavior in the behavior performed by the user.
  • the inventors analyzed the behavior of current illegal users and found that they often evade identification by increasing the interval between executions of the target behavior. As a result, the prior-art practice of counting how many times the target behavior is executed within a short period can no longer identify such illegal users.
  • At the same time, the inventors found that although these illegal users lengthen the execution interval, they perform the target behavior more singly than normal users do. That is, they perform fewer behaviors unrelated to their illegal purpose and concentrate on the target behavior related to pushing spam content or malicious order brushing. The inventors therefore propose identifying these illegal users based on the single degree of the target behavior.
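  • As one possible implementation, the target behavior should be a behavior that an illegal user must perform in order to carry out the illegal behavior.
  • For example, when identifying illegal users who publish spam content, information publishing may be used as the target behavior.
  • When identifying users who brush orders, the purchase behavior may be used as the target behavior.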
  • As another possible implementation, to increase identification accuracy, data mining may be performed on each user's historical behavior to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repeated executions exceeds a preset threshold is used as the target behavior.
  • For example, data mining is performed on the sequence formed by the user's operations, using a suffix array or dynamic programming, to obtain the longest common subsequence.
  • the mining result includes: the longest common subsequence, and the support corresponding to the longest common subsequence.
  • the longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order of the operations is the same; the support is the number of times the behavior pattern has been executed.
  • the operations in the longest common subsequence whose support exceeds the preset threshold and their order are selected as the target behavior.
  • In this case the target behavior generally includes at least two operations; that is, the target behavior specifies both the content of the operations and the order in which they are performed.
  • As a possible implementation of determining the single degree of the target behavior, the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user during the sampling period may be calculated. When the total number of sampled behaviors is small, a smoothing algorithm may be used to correct the calculated ratio in order to reduce noise and improve accuracy, yielding the proportion of the target behavior in the behavior performed by the user. For example, a Laplace smoothing algorithm can be used.
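The ratio-plus-smoothing step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `single_degree`, the smoothing strength `alpha`, and the choice of two categories (target behavior versus everything else) are assumptions made for the example.

```python
def single_degree(target_count: int, total_count: int,
                  alpha: float = 1.0, num_categories: int = 2) -> float:
    """Laplace-smoothed proportion of the target behavior among all behaviors.

    alpha and num_categories are illustrative assumptions; the source only
    says that a smoothing algorithm such as Laplace smoothing may be used.
    """
    if target_count < 0 or total_count < target_count:
        raise ValueError("counts must satisfy 0 <= target_count <= total_count")
    # Additive smoothing pulls the ratio toward 1 / num_categories when
    # few behaviors were sampled, which damps noise from tiny samples.
    return (target_count + alpha) / (total_count + alpha * num_categories)


if __name__ == "__main__":
    # One observed behavior that happens to be the target: raw ratio is 1.0,
    # but the smoothed value stays moderate because the sample is tiny.
    print(single_degree(1, 1))
    # Ninety target behaviors out of one hundred: the smoothed value is
    # close to the raw ratio because the sample is large.
    print(single_degree(90, 100))
```

With this choice, a user with a single sampled behavior is not scored as more "single" than a user with ninety target behaviors out of one hundred, which is the effect the smoothing is meant to achieve.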
  • Step 802 Identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
  • When the single degree of the target behavior exceeds a threshold, the user is identified as an illegal user, and the content generated by the target behavior performed by the illegal user is illegal content.
  • On the one hand, for the short-term behavior of the user, that is, the behavior performed within the most recent statistical time window, the single degree of the target behavior may be determined as in step 801, and the probability that the user is an illegal user is then estimated from that single degree.
  • On the other hand, for the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, the frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated from the long-term behavior.
  • the illegal user is then identified from the two probabilities, and the content generated by the target behavior performed by the illegal user is determined to be illegal content.
  • FIG. 11 is a schematic structural diagram of an anti-spam content system according to Embodiment 9 of the present invention. As shown in FIG. 11, the anti-spam content system includes: a collection node and an illegal content identification node.
  • the collection node is used to record the behavior performed by the user.
  • the collection node is an interface between the anti-spam system and the online service system, and is used to complete the collection of the behavior of the user in the service system.
  • Specifically, logs can be obtained from the service system, and the user behavior data can be read by parsing the logs.
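A collection node of this kind might turn raw log lines into per-user behavior sequences roughly as sketched below. The tab-separated `timestamp<TAB>user_id<TAB>action` layout and the function name `parse_behavior_log` are hypothetical; the source does not describe the log format of the service system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_behavior_log(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (timestamp, action) records by user from a hypothetical log format.

    Each valid line is assumed to be: ISO-8601 timestamp, user id, action,
    separated by tabs. Lines that do not match are skipped.
    """
    behaviors: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # ignore malformed or unrelated log lines
        timestamp, user_id, action = parts
        behaviors[user_id].append((timestamp, action))
    for records in behaviors.values():
        # ISO-8601 timestamps in one time zone sort correctly as strings.
        records.sort()
    return dict(behaviors)


if __name__ == "__main__":
    sample = [
        "2017-01-01T10:00:00\tu1\tpost",
        "not a behavior record",
        "2017-01-01T09:00:00\tu1\tlogin",
        "2017-01-01T09:30:00\tu2\tbrowse",
    ]
    for user, records in parse_behavior_log(sample).items():
        print(user, records)
```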
  • An illegal content identification node configured to acquire, from the collection node, the behavior performed by the user; determine the single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
  • the illegitimate content identification node is used to perform the identification method provided in the foregoing Embodiment 2 and Embodiment 8.
  • For details, refer to the related description in the foregoing embodiments; they are not repeated in this embodiment.
  • the anti-spam system further includes: a management node.
  • a management node configured to acquire the identified illegal content from the illegal content identification node and to block the illegal content; and/or a management node configured to acquire the identified illegal content from the illegal content identification node and to penalize the user who published the illegal content by restricting operation permissions.
  • the anti-spam content system provided in this embodiment can effectively identify illegal users and the illegal content they publish, and can penalize the illegal users while blocking the illegal content. Compared with having an administrator review published content or blocking published content automatically by keyword, it eliminates malicious advertising and other harmful content at the source, that is, at the information publisher, which effectively cleans up the network environment and improves the user experience.
  • FIG. 12 is a schematic structural diagram of an identification device according to Embodiment 10 of the present invention. As shown in FIG. 12, the device includes: a determining module 91 and an identifying module 92.
  • a determination module 91 is used to determine a single degree of target behavior in the behavior performed by the user.
  • the identification module 92 is configured to identify, according to a single degree of the target behavior, whether the content generated by the target behavior is illegal content.
  • the determining module 91 is specifically configured to calculate the proportion of the target behavior in the behavior performed by the user, and to use the proportion to indicate the single degree of the target behavior.
  • the determining module 91 calculates the proportion of the target behavior in the behavior performed by the user by: calculating the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user; and correcting the calculated ratio with a smoothing algorithm to obtain the proportion of the target behavior in the behavior performed by the user.
  • the identification module 92 is specifically configured to estimate a probability that the content is illegal content according to a single degree of the target behavior; and identify the illegal content according to the probability.
  • the identification module 92 estimates the probability that the content is illegal content according to the single degree of the target behavior by: calculating the probability that the content is illegal content according to the single degree of the user's target behavior in a first time period and the execution frequency of the target behavior in a second time period, where the duration of the first time period is less than the duration of the second time period.
  • the target behavior includes at least two steps.
  • before determining the single degree of the target behavior in the behavior performed by the user, the determining module 91 is further configured to: analyze each user's behavior to obtain the behaviors repeatedly performed by the user; and use a behavior whose number of repeated executions exceeds a preset threshold as the target behavior.
  • the aforementioned program can be stored in a computer readable storage medium.
  • the program when executed, performs the steps including the foregoing method embodiments; and the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention provide an identification method and device, and an anti-junk content system for identifying an illegal user according to a similarity of target behaviors in behaviors performed by a user, thereby identifying a user performing the target behaviors at an increased time interval in order to push junk content. Since illegal users often adopt the method of increasing a time interval at which target behaviors are performed to avoid detection, the identification method, which is based on similarity of target behaviors, can reduce the probability of an illegal user successfully avoiding detection, increasing the rate of illegal user identification, and optimizing identification of the illegal users.

Description

Identification method and device and anti-spam content system
This application claims priority to Chinese Patent Application No. 201611207325.2, filed on December 23, 2016 and entitled "Identification method and device and anti-spam content system", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to information technology, and in particular, to an identification method and device and an anti-spam content system.
Background
With the continuous development of the Internet, the number of illegal users keeps rising. Their presence has significantly degraded the website environment and sharply worsened the user experience. For example, social networking sites that offer products such as blogs or communities usually have a large number of illegal users who publish spam content (spammers) and who frequently post large amounts of advertising or pornographic content for malicious promotion. These illegal users must therefore be identified so that they cannot continue such illegal behavior.
In the prior art, the number of times a user performs a certain behavior within a short period is recorded, a threshold is set empirically, and the user is identified as an illegal user when the number of times the user performs the behavior exceeds the threshold. In practice, however, because illegal users have changed the way they perform illegal behavior, their recognition rate has become lower and lower and the identification effect has deteriorated.
Summary of the invention
The present invention provides an identification method and device and an anti-spam content system, which are used to solve the technical problem that illegal users are identified poorly in the prior art.
To achieve the above object, embodiments of the present invention adopt the following technical solutions:
In a first aspect, an identification method is provided, including:
a collection node records the behavior performed by the user;
an illegal user identification node acquires the behavior performed by the user from the collection node;
the illegal user identification node determines the single degree of the target behavior in the behavior performed by the user;
the illegal user identification node identifies, according to the single degree of the target behavior, whether the user is an illegal user.
In a second aspect, an identification method is provided, the method including:
a collection node records the behavior performed by the user;
an illegal content identification node acquires the behavior performed by the user from the collection node;
the illegal content identification node determines the single degree of the target behavior in the behavior performed by the user;
the illegal content identification node identifies, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
In a third aspect, an anti-spam content system is provided, including: a collection node and an illegal user identification node;
the collection node (a log collection and parsing node) is configured to record the behavior performed by the user;
the illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine the single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the user is an illegal user.
In a fourth aspect, an anti-spam content system is provided, including: a collection node and an illegal content identification node;
the collection node is configured to record the behavior performed by the user;
the illegal content identification node is configured to acquire, from the collection node, the behavior performed by the user; determine the single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
In a fifth aspect, an identification method is provided, including:
determining the single degree of the target behavior in the behavior performed by the user;
identifying an illegal user according to the single degree of the target behavior.
In a sixth aspect, an identification device is provided, including:
a determining module, configured to determine the single degree of the target behavior in the behavior performed by the user;
an identification module, configured to identify an illegal user according to the single degree of the target behavior.
In a seventh aspect, an identification method is provided, including:
determining the single degree of the target behavior in the behavior performed by the user;
identifying, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
In an eighth aspect, an identification device is provided, including:
a determining module, configured to determine the single degree of the target behavior in the behavior performed by the user;
an identification module, configured to identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
The identification method and device and the anti-spam content system provided by the embodiments of the present invention identify illegal users according to the single degree of the target behavior in the behavior performed by the user, and can therefore identify users who push spam content or perform similar behavior while increasing the interval between executions of the target behavior. Because illegal users often evade prior-art identification by increasing this interval, identification based on the single degree of the target behavior reduces the probability that an illegal user successfully evades identification, raises the recognition rate of illegal users, and improves the identification effect.
The above description is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 is a schematic interaction diagram of an identification method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic interaction diagram of another identification method according to Embodiment 1 of the present invention;
FIG. 3 is a schematic interaction diagram of an identification method according to Embodiment 2 of the present invention;
FIG. 4 is a schematic interaction diagram of another identification method according to Embodiment 2 of the present invention;
FIG. 5 is a schematic flowchart of an identification method according to Embodiment 3 of the present invention;
FIG. 6 is a schematic flowchart of an identification method according to Embodiment 4 of the present invention;
FIG. 7 is a schematic structural diagram of an anti-spam content system according to Embodiment 5 of the present invention;
FIG. 8 is a schematic structural diagram of an identification device according to Embodiment 6 of the present invention;
FIG. 9 is a schematic structural diagram of an identification device according to Embodiment 7 of the present invention;
FIG. 10 is a schematic flowchart of an identification method according to Embodiment 8 of the present invention;
FIG. 11 is a schematic structural diagram of an anti-spam content system according to Embodiment 9 of the present invention;
FIG. 12 is a schematic structural diagram of an identification device according to Embodiment 10 of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
The identification method and device and the anti-spam content system provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
FIG. 1 is a schematic interaction diagram of an identification method according to Embodiment 1 of the present invention. The method provided by this embodiment is performed by a collection node and an illegal user identification node. As shown in FIG. 1, the method may include:
Step 101: the collection node records the behavior performed by the user.
Optionally, the collection node may record the behavior performed by the user from the log data of the service system, so that the illegal user identification node can identify illegal users according to user behavior.
Step 102: the illegal user identification node acquires the behavior performed by the user from the collection node.
Optionally, the illegal user identification node may periodically acquire the behavior performed by the user from the collection node. In general, to reduce load, the steps of acquiring the behavior performed by the user and identifying illegal users may be performed during idle periods of the service system.
Step 103: the illegal user identification node determines the single degree of the target behavior in the behavior performed by the user, and identifies, according to the single degree of the target behavior, whether the user is an illegal user.
The inventors analyzed the behavior of current illegal users and found that they often evade identification by increasing the interval between executions of the target behavior. As a result, the prior-art practice of counting how many times the target behavior is executed within a short period can no longer identify such illegal users. At the same time, the inventors found that although these illegal users lengthen the execution interval, they perform the target behavior more singly than normal users do. That is, they perform fewer behaviors unrelated to their illegal purpose and concentrate on the target behavior related to pushing spam content or malicious order brushing. The inventors therefore propose identifying these illegal users based on the single degree of the target behavior.
As one possible implementation, the target behavior should be a behavior that an illegal user must perform in order to carry out the illegal behavior. For example, when identifying illegal users who publish spam content, information publishing may be used as the target behavior; when identifying users who brush orders, the purchase behavior may be used as the target behavior.
As another possible implementation, to increase identification accuracy, data mining may be performed on each user's historical behavior to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repeated executions exceeds a preset threshold is used as the target behavior.
For example, data mining is performed on the sequence formed by the user's operations, using a suffix array or dynamic programming, to obtain the longest common subsequence. The mining result includes the longest common subsequence and the support corresponding to it. The longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order of the operations is the same; the support is the number of times the behavior pattern has been executed. The operations in a longest common subsequence whose support exceeds the preset threshold, together with their order, are selected as the target behavior.
The inventors found that this works because these illegal users often follow the same behavior pattern. Data mining can therefore be performed periodically to obtain this repeated behavior pattern, that is, the target behavior. In this case the target behavior generally includes at least two operations; that is, the target behavior specifies both the content of the operations and the order in which they are performed.
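The mining of a repeated behavior pattern and its support can be illustrated with the sketch below. It deliberately swaps in a simple brute-force scan over contiguous operation runs instead of the suffix array or dynamic programming mentioned in the text, and it counts support as the number of non-overlapping occurrences; the function names and both of these choices are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def count_non_overlapping(ops: Sequence[str], pattern: Pattern) -> int:
    """Count non-overlapping occurrences of pattern in ops, scanning left to right."""
    count, i, k = 0, 0, len(pattern)
    while i + k <= len(ops):
        if tuple(ops[i:i + k]) == pattern:
            count += 1
            i += k  # skip past this occurrence so matches never overlap
        else:
            i += 1
    return count


def mine_target_behavior(ops: Sequence[str], min_support: int,
                         min_len: int = 2) -> Optional[Tuple[Pattern, int]]:
    """Return the longest contiguous pattern whose support exceeds min_support.

    Brute force stands in for the suffix-array or dynamic-programming
    approach named in the text; it is only suitable for short sequences.
    """
    # A pattern needs min_support + 1 non-overlapping occurrences,
    # so it cannot be longer than this bound.
    max_len = len(ops) // (min_support + 1)
    for length in range(max_len, min_len - 1, -1):
        candidates = dict.fromkeys(
            tuple(ops[i:i + length]) for i in range(len(ops) - length + 1)
        )
        best: Optional[Tuple[Pattern, int]] = None
        for pattern in candidates:
            support = count_non_overlapping(ops, pattern)
            if support > min_support and (best is None or support > best[1]):
                best = (pattern, support)
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    history = ["login", "post", "logout", "browse",
               "login", "post", "logout", "login", "post", "logout"]
    print(mine_target_behavior(history, min_support=2))
```

The default `min_len=2` mirrors the statement that the target behavior generally includes at least two operations.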
After the target behavior is determined, the illegal user identification node calculates the proportion of the target behavior in the behavior performed by the user, and uses the proportion to indicate the single degree of the target behavior. As one possible implementation, the illegal user identification node first calculates the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user, and then corrects the calculated ratio with a smoothing algorithm to obtain the proportion of the target behavior in the behavior performed by the user.
The illegal user identification node then estimates, according to the single degree of the target behavior, the probability that the user is an illegal user, and identifies illegal users according to the probability. As one possible implementation, the illegal user identification node calculates the probability that the user is an illegal user according to the single degree of the user's target behavior in a first time period and the execution frequency of the target behavior in a second time period.
The duration of the first time period is less than the duration of the second time period; the first time period corresponds to the short term and the second time period corresponds to the long term. On the one hand, for the short-term behavior of the user, that is, the behavior performed within the most recent statistical time window, the single degree of the target behavior can be determined, and the probability that the user is an illegal user is then estimated from that single degree. On the other hand, for the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, the frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated from the long-term behavior. The illegal user is then identified from the probability determined from the long-term behavior and the probability determined from the short-term behavior.
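The text does not say how the short-term single degree and the long-term execution frequency are turned into probabilities or how the two are combined. The sketch below shows one way this could look, purely as an assumption: the single degree is used directly as the short-term probability, a saturating curve maps daily frequency to the long-term probability, and the two are merged by a weighted average and compared with a threshold. Every name and constant here (`scale`, `weight_short`, `threshold`) is hypothetical.

```python
import math
from typing import Tuple


def probability_from_single_degree(single_degree: float) -> float:
    """Assumption: the smoothed proportion itself serves as a probability."""
    return min(max(single_degree, 0.0), 1.0)


def probability_from_frequency(executions: int, window_days: float,
                               scale: float = 20.0) -> float:
    """Assumption: map executions per day to [0, 1) with a saturating curve."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    rate = executions / window_days
    return 1.0 - math.exp(-rate / scale)


def identify_user(single_degree: float, executions: int, window_days: float,
                  weight_short: float = 0.5,
                  threshold: float = 0.6) -> Tuple[float, bool]:
    """Combine the short-term and long-term estimates and apply a threshold."""
    p_short = probability_from_single_degree(single_degree)
    p_long = probability_from_frequency(executions, window_days)
    probability = weight_short * p_short + (1.0 - weight_short) * p_long
    return probability, probability >= threshold


if __name__ == "__main__":
    # A user who almost only posts, and posts often over the long window.
    print(identify_user(single_degree=0.9, executions=900, window_days=30))
    # A user whose posting is a small share of varied, infrequent activity.
    print(identify_user(single_degree=0.1, executions=30, window_days=30))
```

In a real system the weights and threshold would presumably be tuned on labeled data; the values above only make the example run.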
Further, building on FIG. 1, FIG. 2 is a schematic interaction diagram of another identification method according to Embodiment 1 of the present invention. As shown in FIG. 2, after identifying whether the user is an illegal user, the method further includes:
Step 104: if the user is identified as an illegal user, the illegal user identification node provides the identified illegal user to the management node.
Further, as one possible implementation, after the illegal user identification node provides the identified illegal user to the management node, the identification method further includes:
Step 105: the management node penalizes the illegal user by restricting operation permissions.
As another possible implementation, after the illegal user identification node provides the identified illegal user to the management node, the identification method further includes:
Step 106: the management node blocks the content published by the illegal user.
The identification method provided in this embodiment is mainly applied to scenarios in which spam content is published, and can specifically identify illegal users who publish spam content. Accordingly, in this scenario the target behavior is a behavior necessary for publishing spam content, for example, information publishing behaviors such as posting blog entries, sending in-site messages, and leaving comments.
In this embodiment, illegal users are identified according to the single degree of the target behavior in the behavior performed by the user, so that users who push spam content or perform similar behavior while increasing the interval between executions of the target behavior can be identified. Because illegal users often evade prior-art identification by increasing this interval, identification based on the single degree of the target behavior reduces the probability that an illegal user successfully evades identification, raises the recognition rate of illegal users, and improves the identification effect.
实施例二Embodiment 2
图3为本发明实施例二提供的一种识别方法的交互示意图,本实施例所提供的方法由采集节点、非法内容识别节点执行,如图3所示,方法可以包括:FIG. 3 is a schematic diagram of interaction of an identification method according to Embodiment 2 of the present invention. The method provided by this embodiment is performed by an collection node and an illegal content identification node. As shown in FIG. 3, the method may include:
Step 201: The collection node records the behaviors performed by the user.
Optionally, the collection node may record the behaviors performed by the user from the log data of the service system, so that the illegal content identification node can identify illegal content according to the user's behaviors.
Step 202: The illegal content identification node obtains the behaviors performed by the user from the collection node.
Optionally, the illegal content identification node may obtain the behaviors performed by the user from the collection node periodically. In general, to reduce load, the steps of obtaining the user's behaviors and identifying illegal content may be performed during idle periods of the service system.
Step 203: The illegal content identification node determines the singleness of the target behavior among the behaviors performed by the user and, according to that singleness, identifies whether the content generated by the target behavior is illegal content.
Specifically, whether the user is an illegal user can be identified from the singleness of the target behavior among the behaviors performed by the user; for an illegal user, the content generated by that user's behavior is then identified as illegal content.
The steps for identifying a user as an illegal user are not repeated in this embodiment; see the related description in the foregoing embodiment.
Further, on the basis of FIG. 3, FIG. 4 is an interaction diagram of another identification method provided in Embodiment 2 of the present invention. As shown in FIG. 4, after the illegal content is identified, the method further includes:
Step 204: If the content is identified as illegal content, the illegal content identification node provides the identified illegal content to the management node.
Further, as one possible implementation, after the illegal content identification node provides the identified illegal content to the management node, the identification method further includes:
Step 205: The management node blocks the illegal content.
As another possible implementation, after the illegal content identification node provides the identified illegal content to the management node, the identification method further includes:
Step 206: The management node applies an operation permission restriction to penalize the user who published the illegal content.
In this embodiment, whether the content generated by the target behavior is illegal content is identified according to the singleness of the target behavior among the behaviors performed by the user, so that spam content pushed while lengthening the interval between executions of the target behavior can be identified. Because illegal users often lengthen this interval to evade the identification used in the prior art, identification based on the singleness of the target behavior lowers the probability that illegal content evades identification, raises the identification rate of illegal content, and improves the identification result.
Embodiment 3
FIG. 5 is a flowchart of an identification method provided in Embodiment 3 of the present invention. The method provided in this embodiment can be used to identify a certain type of illegal user, for example illegal users who publish spam content or users who maliciously place fake orders. This embodiment does not limit the type of illegal user. As shown in FIG. 5, the method includes:
Step 301: Determine the singleness of the target behavior among the behaviors performed by the user.
The inventors analyzed the behavior of current illegal users and found that they often evade identification by lengthening the interval between executions of the target behavior. The prior-art practice of counting how many times the target behavior is executed within a short period can therefore no longer identify such users. The inventors also found that, although these illegal users lengthen the execution interval, they perform the target behavior more exclusively than normal users do. In other words, they seldom perform behaviors unrelated to their illegal purpose and concentrate on the target behaviors associated with pushing spam content or placing fake orders. The inventors therefore propose identifying these illegal users based on the singleness of the target behavior.
Specifically, the proportion of the target behavior among the behaviors performed by the user can be calculated, and that proportion can be used to indicate the singleness of the target behavior.
As one possible implementation, the target behavior should be a behavior that an illegal user must perform in order to carry out the illegal activity. For example, when identifying illegal users who publish spam content, information publishing can be taken as the target behavior; when identifying users who place fake orders, purchasing can be taken as the target behavior.
As another possible implementation, to increase identification accuracy, data mining can be performed on each user's historical behaviors to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repetitions exceeds a preset threshold is taken as the target behavior.
For example, the sequence formed by the user's operations is mined using a suffix array or dynamic programming to obtain the longest common subsequence. The mining result includes the longest common subsequence and its support. The longest common subsequence is a behavior pattern that the user has performed at least twice, meaning that the operations performed are the same and the order of the operations is also the same; the support is the number of times the behavior pattern has been performed. The operations, and their order, in a longest common subsequence whose support exceeds the preset threshold are selected as the target behavior.
The reason is that the inventors found that these illegal users frequently follow the same behavior pattern. Data mining can therefore be performed periodically to obtain this repeated behavior pattern, which is the target behavior. In this case the target behavior generally contains at least two operations; that is, the target behavior specifies both the operations and the order in which they are performed.
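The pattern mining described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the application: it treats a behavior pattern as a contiguous run of operations, uses dynamic programming to find the longest run that occurs at least twice without overlapping, and counts non-overlapping occurrences as the support. The function names and operation labels are hypothetical.

```python
def longest_repeated_run(ops):
    """Longest contiguous run of operations that occurs at least twice without overlap."""
    n = len(ops)
    # dp[i][j]: length of the matching run ending at ops[i-1] and ops[j-1] (i < j)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # the second condition keeps the two occurrences from overlapping
            if ops[i - 1] == ops[j - 1] and dp[i - 1][j - 1] < j - i:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return tuple(ops[best_end - best_len:best_end])


def support(ops, pattern):
    """Number of non-overlapping occurrences of pattern in ops."""
    if not pattern:
        return 0
    count, i, m = 0, 0, len(pattern)
    while i + m <= len(ops):
        if tuple(ops[i:i + m]) == pattern:
            count += 1
            i += m
        else:
            i += 1
    return count


def mine_target_behavior(ops, min_support):
    """Return the repeated pattern if its support exceeds min_support, else None."""
    pattern = longest_repeated_run(ops)
    return pattern if support(ops, pattern) > min_support else None
```

A production system would more likely use a suffix array for long histories, since the table above grows quadratically with the length of the operation sequence.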
After the target behavior has been determined, one possible way to determine its singleness is to calculate, over a sampling period, the ratio between the number of times the target behavior was executed and the total number of behaviors performed by the user. When the total number of sampled behaviors is small, a smoothing algorithm can additionally be applied to correct the calculated ratio in order to reduce noise and improve precision, yielding the proportion of the target behavior among the behaviors performed by the user. For example, Laplace smoothing can be used.
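A minimal sketch of such a smoothed proportion is shown below. The text names Laplace smoothing but does not give its exact form at this point, so the additive form used here, the default `beta`, and the choice of two categories (target versus non-target behavior) are assumptions made for illustration.

```python
def smoothed_share(target_count, total_count, beta=1.0, categories=2):
    """Additive (Laplace) smoothed share of the target behavior among all behaviors.

    beta and categories are illustrative assumptions, not values from the source.
    """
    if target_count < 0 or total_count < target_count:
        raise ValueError("counts must satisfy 0 <= target_count <= total_count")
    return (target_count + beta) / (total_count + beta * categories)
```

With these assumptions a user observed performing a single behavior that happens to be the target behavior gets a share of 2/3 rather than 1, while a user with 90 target behaviors out of 100 still gets a share close to 0.9, which is the noise reduction for sparse samples that the paragraph describes.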
Step 302: Identify illegal users according to the singleness of the target behavior.
As one possible implementation, if the singleness of the target behavior is higher than a limit, the user is identified as an illegal user.
As another possible implementation, on the one hand, for the user's short-term behavior, that is, the behavior performed within the most recent statistical time window, the singleness of the target behavior can be determined as in step 301, and the probability that the user is an illegal user is estimated from the short-term behavior according to that singleness. On the other hand, for the user's long-term behavior, that is, the behavior performed within the historical statistical time windows, the frequency of the target behavior is calculated and the probability that the user is an illegal user is estimated from the long-term behavior. Illegal users are then identified according to both the probability determined from the long-term behavior and the probability determined from the short-term behavior.
It can be seen that, by identifying illegal users according to the singleness of the target behavior among the behaviors performed by a user, users who push spam content while lengthening the interval between executions of the target behavior can be identified. Because illegal users often lengthen this interval to evade the identification used in the prior art, identification based on the singleness of the target behavior lowers the probability that an illegal user evades identification, raises the identification rate of illegal users, and improves the identification result.
Embodiment 4
FIG. 6 is a flowchart of an identification method provided in Embodiment 4 of the present invention. This embodiment describes the identification method using the process of identifying illegal users who publish spam content; specifically, the target behavior may be an information publishing behavior. The method provided in this embodiment may be performed by an illegal user identification node, which is deployed in an anti-spam system.
As shown in FIG. 6, the method includes:
Step 401: Collect statistics on the behavior of the current user.
Specifically, user behavior data is obtained and statistics are collected on the behavior of each user in turn, including:
A. Short-term behavior statistics:
Count the cumulative number of times the user has executed the target behavior within the current time window, count_tgt_acc. Here count_tgt_acc reflects how frequently the user executes the target behavior within a short time.
B. Long-term behavior statistics:
Count the cumulative number of times the user has executed the target behavior from the time window starting at 0:00 of the current day up to the current time window, count_tgt_total, and count the total number of times the user has executed any behavior over the same span, count_all_total.
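The three counters can be gathered in a single pass over the behavior records, as in the sketch below. The record layout `(user_id, behavior, timestamp)`, the function name, and the interpretation of the current time window as a sliding window that ends at `now` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def collect_counts(events, now, window, target_behaviors):
    """Per-user count_tgt_acc, count_tgt_total and count_all_total.

    events: iterable of (user_id, behavior, timestamp) tuples (assumed layout).
    window: length of the current time window, taken to end at `now`.
    """
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    window_start = now - window
    stats = defaultdict(
        lambda: {"count_tgt_acc": 0, "count_tgt_total": 0, "count_all_total": 0}
    )
    for user_id, behavior, ts in events:
        if not (day_start <= ts <= now):
            continue  # only behavior from 0:00 today up to now is counted
        user = stats[user_id]
        user["count_all_total"] += 1
        if behavior in target_behaviors:
            user["count_tgt_total"] += 1
            if ts > window_start:
                user["count_tgt_acc"] += 1
    return dict(stats)
```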
Step 402: According to the statistics, calculate for each user the probability S that the user is an illegal user who publishes spam content.
In this embodiment, the parameter S represents the probability that the user is an illegal user who publishes spam content.
The parameter S is calculated as follows:
[Formula image: Figure PCTCN2017115573-appb-000001]
where
[Formula image: Figure PCTCN2017115573-appb-000002]
The parameter S consists of two parts. One part is the probability, calculated from the short-term behavior (that is, the behavior performed within the current time window), that the user is an illegal user who publishes spam content. It corresponds to the following part of the formula:
[Formula image: Figure PCTCN2017115573-appb-000003]
In this part, E[count_tgt_acc] denotes the expected value of count_tgt_acc, which may specifically be the average of count_tgt_acc over all users.
The other part is the probability, calculated from the long-term behavior (that is, the behavior performed in the period from the historical time window to the current time window), that the user is an illegal user who publishes spam content. It corresponds to the following part of the formula:
[Formula image: Figure PCTCN2017115573-appb-000004]
In this part, E[count_ratio] denotes the expected value of count_ratio, which may specifically be the average of count_ratio over all users.
Here count_ratio quantifies the singleness with which the user executes the target behavior. Laplace smoothing is added to the calculation of count_ratio to handle users with few behaviors, avoiding the larger calculation error that a small number of behaviors would otherwise cause. In the foregoing formula, β is the Laplace smoothing parameter used in the smoothing.
To adjust how strongly the long-term calculation and the short-term calculation each influence the value of S, a weight α is set, which controls the relative weight of the long-term result and the short-term result in the final probability. The weight α takes a value from 0 to 1.
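The exact formula for S is given only in the formula figures, which are not reproduced in this text, so the sketch below is not the formula of the application. It merely illustrates how a weight α between 0 and 1 can trade off a short-term part against a long-term part. The convex combination, the assignment of α to the short-term part, the `squash` function, and all names are assumptions.

```python
def squash(x):
    """Map a non-negative ratio to [0, 1); a value equal to the mean maps to 0.5."""
    return x / (1.0 + x)


def score(count_tgt_acc, mean_tgt_acc, count_ratio, mean_ratio, alpha=0.5):
    """Illustrative S: weighted mix of a short-term part and a long-term part.

    mean_tgt_acc and mean_ratio stand for E[count_tgt_acc] and E[count_ratio]
    and must be positive. The functional form is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    short_term = squash(count_tgt_acc / mean_tgt_acc)
    long_term = squash(count_ratio / mean_ratio)
    return alpha * short_term + (1.0 - alpha) * long_term
```

Under these assumptions a user whose counters equal the population means scores 0.5, and setting `alpha` to 1 or 0 makes the score depend on the short-term or the long-term part alone.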
Step 403: Determine whether the user's probability S is greater than a preset threshold. If so, perform step 404; otherwise, proceed to identify the next user.
Step 404: If the user is determined to be an illegal user who publishes spam content, send the user's information to the management node, which restricts the user's permissions or blocks the user's content.
In this embodiment, illegal users are identified according to the singleness of the target behavior among the behaviors performed by a user, so that users who push spam content while lengthening the interval between executions of the target behavior can be identified. Because illegal users often lengthen this interval to evade the identification used in the prior art, identification based on the singleness of the target behavior lowers the probability that an illegal user evades identification, raises the identification rate of illegal users, and improves the identification result.
Embodiment 5
This embodiment provides an anti-spam content system. FIG. 7 is a structural diagram of an anti-spam content system provided in Embodiment 5 of the present invention. As shown in FIG. 7, the anti-spam content system includes a collection node and an illegal user identification node.
The collection node is configured to record the behaviors performed by the user.
Specifically, the collection node is the interface between the anti-spam content system and the online service system and collects the user's behaviors in the service system. It may obtain logs from the service system and may also parse the logs to read the recorded user behavior data.
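A collection node's log parsing might look like the sketch below. The source does not describe the log format, so the tab-separated layout `timestamp<TAB>user_id<TAB>behavior`, the timestamp format, and the names `BehaviorRecord` and `parse_log_line` are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class BehaviorRecord(NamedTuple):
    timestamp: datetime
    user_id: str
    behavior: str


def parse_log_line(line: str) -> Optional[BehaviorRecord]:
    """Parse one line of the assumed log format; return None for malformed lines."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    try:
        ts = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return BehaviorRecord(ts, parts[1], parts[2])
```

Returning `None` for malformed lines lets the collection node skip noise in the service logs instead of aborting a whole batch.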
The illegal user identification node is configured to obtain the behaviors performed by the user from the collection node, determine the singleness of the target behavior among the behaviors performed by the user, and identify, according to that singleness, whether the user is an illegal user.
It should be noted that the illegal user identification node performs the identification methods provided in Embodiment 1, Embodiment 3, and Embodiment 4; see the related descriptions in those embodiments, which are not repeated here.
Further, the anti-spam content system also includes a management node.
The management node is configured to obtain the identified illegal user from the illegal user identification node and apply an operation permission restriction to penalize the illegal user; and/or the management node is configured to obtain the identified illegal user from the illegal user identification node and block the content published by the illegal user.
The anti-spam content system can run on a server. It identifies illegal users who publish spam content and then restricts the operation permissions of the identified illegal users so that they cannot publish information on the website, thereby reducing the spam content published on the website.
As one possible implementation, the illegal user identification node interacts with the management node to complete the process of identifying illegal users and restricting the operation permissions of the identified illegal users.
Specifically, as shown in FIG. 7, the illegal user identification node obtains user behavior data from the collection node and analyzes it to identify, among all users, the illegal users who publish spam content. The analysis may include, for each user, determining the singleness of the target behavior among the behaviors performed by that user and then identifying, according to that singleness, whether the user is an illegal user. The illegal user identification node provides the identified illegal users to the management node, which may review them. After the review confirms that a user is indeed an illegal user who publishes spam content, a corresponding operation permission restriction is set for that user, for example prohibiting the user from posting log entries for 3 days or freezing the user's account for 3 days.
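The interaction between the identification node and the management node can be sketched as below. This is an illustration under assumptions: the class and function names, the unsmoothed share used for flagging, and the restriction label are hypothetical, and the review step is reduced to a callback.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementNode:
    """Reviews flagged users and records a restriction for confirmed ones."""
    restrictions: dict = field(default_factory=dict)

    def handle(self, user_id, review, measure="no_posting_for_3_days"):
        # review is a callable standing in for the manual or automated re-check
        if review(user_id):
            self.restrictions[user_id] = measure
            return True
        return False


def identify(behaviors_by_user, target_behaviors, limit):
    """Flag users whose share of target behaviors exceeds limit."""
    flagged = []
    for user_id, behaviors in behaviors_by_user.items():
        if not behaviors:
            continue  # no recorded behavior, nothing to judge
        share = sum(b in target_behaviors for b in behaviors) / len(behaviors)
        if share > limit:
            flagged.append(user_id)
    return flagged
```

Keeping the review inside the management node mirrors the division of work described above: the identification node only proposes candidates, and a restriction is recorded only after confirmation.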
In this embodiment, illegal users are identified according to the singleness of the target behavior among the behaviors performed by a user, so that users who push spam content while lengthening the interval between executions of the target behavior can be identified. Because illegal users often lengthen this interval to evade the identification used in the prior art, identification based on the singleness of the target behavior lowers the probability that an illegal user evades identification, raises the identification rate of illegal users, and improves the identification result.
Embodiment 6
FIG. 8 is a structural diagram of an identification device provided in Embodiment 6 of the present invention. As shown in FIG. 8, the device includes a determining module 41 and an identification module 42.
The determining module 41 is configured to determine the singleness of the target behavior among the behaviors performed by the user.
The identification module 42 is configured to identify illegal users according to the singleness of the target behavior.
In the identification device provided in this embodiment, the identification module 42 identifies illegal users according to the singleness, determined by the determining module 41, of the target behavior among the behaviors performed by a user, so that users who push spam content while lengthening the interval between executions of the target behavior can be identified. Because illegal users often lengthen this interval to evade the identification used in the prior art, identification based on the singleness of the target behavior lowers the probability that an illegal user evades identification, raises the identification rate of illegal users, and improves the identification result.
Embodiment 7
FIG. 9 is a structural diagram of an identification device provided in Embodiment 7 of the present invention. To explain the previous embodiment clearly, this embodiment provides one possible implementation of the identification device. As shown in FIG. 9, on the basis of FIG. 8, the determining module 41 further includes a calculation unit 411 and an indication unit 412.
The calculation unit 411 is configured to calculate the proportion of the target behavior among the behaviors performed by the user.
The indication unit 412 is configured to use the proportion to indicate the singleness of the target behavior.
The calculation unit 411 includes a calculation subunit 4111 and a smoothing subunit 4112.
The calculation subunit 4111 is configured to calculate the ratio between the number of times the target behavior was executed and the total number of behaviors performed by the user.
The smoothing subunit 4112 is configured to apply a smoothing algorithm to correct the calculated ratio, yielding the proportion of the target behavior among the behaviors performed by the user.
Further, the identification module 42 includes a prediction unit 421 and an identification unit 422.
The prediction unit 421 is configured to estimate, according to the singleness of the target behavior, the probability that the user is an illegal user.
Specifically, the prediction unit 421 is configured to calculate the probability that the user is an illegal user according to the singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period.
The first time period is shorter than the second time period.
The identification unit 422 is configured to identify illegal users according to the probability.
Further, the target behavior includes at least two operations.
On this basis, the identification device further includes an analysis module 43.
The analysis module 43 is configured to analyze each user's behaviors to obtain the behaviors that the user performs repeatedly, and to take a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
Embodiment 8
FIG. 10 is a flowchart of an identification method provided in Embodiment 8 of the present invention. The method provided in this embodiment can be used to identify illegal content. As shown in FIG. 10, the method includes:
Step 801: Determine the singleness of the target behavior among the behaviors performed by the user.
The inventors analyzed the behavior of current illegal users and found that they often evade identification by lengthening the interval between executions of the target behavior. The prior-art practice of counting how many times the target behavior is executed within a short period can therefore no longer identify such users. The inventors also found that, although these illegal users lengthen the execution interval, they perform the target behavior more exclusively than normal users do. In other words, they seldom perform behaviors unrelated to their illegal purpose and concentrate on the target behaviors associated with pushing spam content or placing fake orders. The inventors therefore propose identifying these illegal users based on the singleness of the target behavior.
Specifically, the proportion of the target behavior among the behaviors performed by the user can be calculated, and that proportion can be used to indicate the singleness of the target behavior.
As one possible implementation, the target behavior should be a behavior that an illegal user must perform in order to carry out the illegal activity. For example, when identifying illegal users who publish spam content, information publishing can be taken as the target behavior; when identifying users who place fake orders, purchasing can be taken as the target behavior.
As another possible implementation, to increase identification accuracy, data mining can be performed on each user's historical behaviors to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repetitions exceeds a preset threshold is taken as the target behavior.
For example, the sequence formed by the user's operations is mined using a suffix array or dynamic programming to obtain the longest common subsequence. The mining result includes the longest common subsequence and its support. The longest common subsequence is a behavior pattern that the user has performed at least twice, meaning that the operations performed are the same and the order of the operations is also the same; the support is the number of times the behavior pattern has been performed. The operations, and their order, in a longest common subsequence whose support exceeds the preset threshold are selected as the target behavior.
The reason is that the inventors found that these illegal users frequently follow the same behavior pattern. Data mining can therefore be performed periodically to obtain this repeated behavior pattern, which is the target behavior. In this case the target behavior generally contains at least two operations; that is, the target behavior specifies both the operations and the order in which they are performed.
After the target behavior has been determined, one possible way to determine its singleness is to calculate, over a sampling period, the ratio between the number of times the target behavior was executed and the total number of behaviors performed by the user. When the total number of sampled behaviors is small, a smoothing algorithm can additionally be applied to correct the calculated ratio in order to reduce noise and improve precision, yielding the proportion of the target behavior among the behaviors performed by the user. For example, Laplace smoothing can be used.
Step 802: According to the singleness of the target behavior, identify whether the content generated by the target behavior is illegal content.
As one possible implementation, if the singleness of the target behavior is higher than a limit, the user is identified as an illegal user, and the content generated by the target behavior performed by the illegal user is illegal content.
As another possible implementation, on the one hand, for the user's short-term behavior, that is, the behavior performed within the most recent statistical time window, the singleness of the target behavior can be determined as in step 801, and the probability that the user is an illegal user is estimated from the short-term behavior according to that singleness. On the other hand, for the user's long-term behavior, that is, the behavior performed within the historical statistical time windows, the frequency of the target behavior is calculated and the probability that the user is an illegal user is estimated from the long-term behavior. Illegal users are then identified according to both probabilities, and the content generated by the target behavior performed by an illegal user is determined to be illegal content.
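Once illegal users have been identified, deriving the illegal content is a simple filter, as in the sketch below. The record layout with the keys `content_id`, `user_id`, and `behavior` and the function name are assumptions made for illustration.

```python
def illegal_content_ids(posts, illegal_users, target_behaviors):
    """Ids of content produced by a target behavior of an identified illegal user.

    posts: iterable of dicts with keys 'content_id', 'user_id', 'behavior'
    (an assumed layout).
    """
    return [
        p["content_id"]
        for p in posts
        if p["user_id"] in illegal_users and p["behavior"] in target_behaviors
    ]
```

Content that an illegal user produced through a non-target behavior is left out, which matches the wording that only content generated by the target behavior is determined to be illegal.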
It can be seen that identifying whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior among the behaviors the user performs makes it possible to identify spam content pushed by lengthening the interval between executions of the target behavior. In the prior art, illegal users often evade identification in exactly this way, so identification based on the degree of singleness of the target behavior reduces the probability that illegal content evades identification, raises the identification rate for illegal content, and improves the identification results.
Embodiment 9
This embodiment provides an anti-spam content system. FIG. 11 is a schematic structural diagram of the anti-spam content system provided by Embodiment 9 of the present invention. As shown in FIG. 11, the anti-spam content system includes a collection node and an illegal content identification node.
The collection node is configured to record the behaviors performed by users.
Specifically, the collection node is the interface between the anti-spam content system and the online business system, and it collects the behaviors that users perform in the business system. It may obtain logs from the business system and may also parse those logs to read the user behavior data recorded in them.
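As a rough illustration of the log parsing performed by the collection node, the sketch below assumes a tab-separated log line of the form `timestamp<TAB>user_id<TAB>behavior`. This format, the `BehaviorRecord` type and `parse_log` are hypothetical; the text does not describe the business system's log layout.

```python
from typing import Iterable, Iterator, NamedTuple


class BehaviorRecord(NamedTuple):
    timestamp: str
    user_id: str
    behavior: str


def parse_log(lines: Iterable[str]) -> Iterator[BehaviorRecord]:
    """Yield one record per well-formed line; skip malformed lines."""
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # not in the assumed three-field format
        yield BehaviorRecord(*parts)


sample_log = [
    "2017-12-12T10:00:00\tu1\tpost_comment\n",
    "bad line",
    "",
]
records = list(parse_log(sample_log))
```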
The illegal content identification node is configured to obtain the behaviors performed by the user from the collection node; determine the degree of singleness of the target behavior among the behaviors performed by the user; and identify, according to the degree of singleness of the target behavior, whether the content generated by the target behavior is illegal content.
It should be noted that the illegal content identification node performs the identification methods provided in Embodiments 2 and 8 above. For details, refer to the descriptions in those embodiments, which are not repeated here.
Further, the anti-spam content system also includes a management node.
The management node is configured to obtain the identified illegal content from the illegal content identification node and block that illegal content; and/or the management node is configured to obtain the identified illegal content from the illegal content identification node and penalize the user who published the illegal content by applying operation permission restrictions.
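The two management-node actions, blocking content and restricting its publisher, can be sketched as below. The `ManagementNode` class, its method names and the `content_id -> publisher_id` mapping are hypothetical; the text does not specify which operation permissions are restricted, so the sketch models only a publishing restriction.

```python
from typing import Dict, Set


class ManagementNode:
    """Tracks blocked content and users with restricted permissions."""

    def __init__(self) -> None:
        self.blocked_content: Set[str] = set()
        self.restricted_users: Set[str] = set()

    def handle(self, illegal_content: Dict[str, str],
               block: bool = True, restrict: bool = True) -> None:
        """illegal_content maps content_id -> publisher user_id.

        Either action, or both, may be applied ("and/or" in the text).
        """
        for content_id, publisher_id in illegal_content.items():
            if block:
                self.blocked_content.add(content_id)
            if restrict:
                self.restricted_users.add(publisher_id)

    def is_visible(self, content_id: str) -> bool:
        return content_id not in self.blocked_content

    def may_publish(self, user_id: str) -> bool:
        return user_id not in self.restricted_users


node = ManagementNode()
node.handle({"c1": "u1", "c2": "u1"})
```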
At present, websites host large numbers of illegal users who publish spam content, frequently posting advertisements and other objectionable content for malicious promotion, which degrades the network environment and the user experience. The anti-spam content system provided by this embodiment can effectively identify these illegal users and the illegal content they publish, penalize the illegal users accordingly, and also block the illegal content. Compared with approaches that process each item of content individually, such as administrator review of published content or automatic keyword-based blocking, it can stop the malicious publishing of advertisements and other objectionable content at its source, the publisher, which effectively cleans up the network environment and improves the user experience.
It can be seen that identifying whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior among the behaviors the user performs makes it possible to identify spam content pushed by lengthening the interval between executions of the target behavior. In the prior art, illegal users often evade identification in exactly this way, so identification based on the degree of singleness of the target behavior reduces the probability that illegal content evades identification, raises the identification rate for illegal content, and improves the identification results.
Embodiment 10
FIG. 12 is a schematic structural diagram of an identification device provided by Embodiment 10 of the present invention. As shown in FIG. 12, the device includes a determining module 91 and an identification module 92.
The determining module 91 is configured to determine the degree of singleness of the target behavior among the behaviors performed by the user.
The identification module 92 is configured to identify, according to the degree of singleness of the target behavior, whether the content generated by the target behavior is illegal content.
Optionally, the determining module 91 is specifically configured to calculate the proportion of the behaviors performed by the user that the target behavior accounts for, and to use that proportion to indicate the degree of singleness of the target behavior.
The determining module 91 calculates this proportion by: calculating the ratio of the number of executions of the target behavior to the total number of behaviors performed by the user; and applying a smoothing algorithm to correct the calculated ratio, thereby obtaining the proportion of the behaviors performed by the user that the target behavior accounts for.
The identification module 92 is specifically configured to estimate, according to the degree of singleness of the target behavior, the probability that the content is illegal content, and to identify illegal content according to that probability.
The identification module 92 estimates this probability by: calculating the probability that the content is illegal content from the degree of singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period, where the first time period is shorter than the second time period.
As one possible implementation, the target behavior includes at least two operation steps. Before determining the degree of singleness of the target behavior among the behaviors performed by the user, the determining module 91 is further configured to: analyze the behaviors of each user to obtain the behaviors that the user performs repeatedly; and take a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
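One way to picture the discovery of a multi-step target behavior is to count repeated windows of consecutive operations in a user's history. The sliding-window approach, the function name `find_target_behaviors` and its parameters are assumptions for illustration; the text only requires that the target behavior have at least two steps and be repeated more often than a preset threshold.

```python
from collections import Counter


def find_target_behaviors(operations, steps=2, min_repeats=3):
    """Return operation sequences of length `steps` seen more than
    `min_repeats` times in one user's ordered operation list.

    Windows overlap, so a strictly alternating pattern also produces
    its shifted variant with a slightly lower count.
    """
    if steps < 2:
        raise ValueError("a target behavior has at least two steps")
    windows = Counter(
        tuple(operations[i:i + steps])
        for i in range(len(operations) - steps + 1)
    )
    return {seq for seq, count in windows.items() if count > min_repeats}


ops = ["open_item", "post_comment"] * 5 + ["search"]
targets = find_target_behaviors(ops, steps=2, min_repeats=4)
```

In this example the pair (`open_item`, `post_comment`) occurs five times and is selected, while the shifted pair (`post_comment`, `open_item`) occurs four times and is not.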
It can be seen that identifying whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior among the behaviors the user performs makes it possible to identify spam content pushed by lengthening the interval between executions of the target behavior. In the prior art, illegal users often evade identification in exactly this way, so identification based on the degree of singleness of the target behavior reduces the probability that illegal content evades identification, raises the identification rate for illegal content, and improves the identification results.
A person of ordinary skill in the art will understand that all or some of the steps of the method embodiments above may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments above. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (40)

  1. An identification method, characterized in that the method comprises:
    recording, by a collection node, behaviors performed by a user;
    obtaining, by an illegal user identification node, the behaviors performed by the user from the collection node;
    determining, by the illegal user identification node, a degree of singleness of a target behavior among the behaviors performed by the user;
    identifying, by the illegal user identification node, whether the user is an illegal user according to the degree of singleness of the target behavior.
  2. The identification method according to claim 1, characterized in that the determining, by the illegal user identification node, of the degree of singleness of the target behavior among the behaviors performed by the user comprises:
    calculating, by the illegal user identification node, the proportion of the behaviors performed by the user that the target behavior accounts for, and using the proportion to indicate the degree of singleness of the target behavior.
  3. The identification method according to claim 2, characterized in that the calculating of the proportion of the behaviors performed by the user that the target behavior accounts for comprises:
    calculating, by the illegal user identification node, the ratio of the number of executions of the target behavior to the total number of behaviors performed by the user;
    correcting, by the illegal user identification node using a smoothing algorithm, the calculated ratio to obtain the proportion of the behaviors performed by the user that the target behavior accounts for.
  4. The identification method according to claim 1, characterized in that the identifying, by the illegal user identification node, of whether the user is an illegal user according to the degree of singleness of the target behavior comprises:
    estimating, by the illegal user identification node, the probability that the user is an illegal user according to the degree of singleness of the target behavior;
    identifying, by the illegal user identification node, illegal users according to the probability.
  5. The identification method according to claim 4, characterized in that the estimating, by the illegal user identification node, of the probability that the user is an illegal user according to the degree of singleness of the target behavior comprises:
    calculating, by the illegal user identification node, the probability that the user is an illegal user from the degree of singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period, wherein the first time period is shorter than the second time period.
  6. The identification method according to any one of claims 1-5, characterized in that the target behavior comprises at least two operation steps.
  7. The identification method according to any one of claims 1-5, characterized in that, before the illegal user identification node determines the degree of singleness of the target behavior among the behaviors performed by the user, the method further comprises:
    analyzing, by the illegal user identification node, the behaviors of each user to obtain the behaviors that the user performs repeatedly;
    taking, by the illegal user identification node, a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
  8. The identification method according to any one of claims 1-5, characterized in that, after the identifying of whether the user is an illegal user, the method further comprises:
    if the user is identified as an illegal user, providing, by the illegal user identification node, the identified illegal user to a management node;
    penalizing, by the management node, the illegal user by applying operation permission restrictions.
  9. The identification method according to any one of claims 1-5, characterized in that, after the identifying of whether the user is an illegal user, the method further comprises:
    if the user is identified as an illegal user, providing, by the illegal user identification node, the identified illegal user to a management node;
    blocking, by the management node, the content published by the illegal user.
  10. An identification method, the method comprising:
    recording, by a collection node, behaviors performed by a user;
    obtaining, by an illegal content identification node, the behaviors performed by the user from the collection node;
    determining, by the illegal content identification node, a degree of singleness of a target behavior among the behaviors performed by the user;
    identifying, by the illegal content identification node, whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior.
  11. The identification method according to claim 10, characterized in that the determining, by the illegal content identification node, of the degree of singleness of the target behavior among the behaviors performed by the user comprises:
    calculating, by the illegal content identification node, the proportion of the behaviors performed by the user that the target behavior accounts for, and using the proportion to indicate the degree of singleness of the target behavior.
  12. The identification method according to claim 11, characterized in that the calculating of the proportion of the behaviors performed by the user that the target behavior accounts for comprises:
    calculating, by the illegal content identification node, the ratio of the number of executions of the target behavior to the total number of behaviors performed by the user;
    correcting, by the illegal content identification node using a smoothing algorithm, the calculated ratio to obtain the proportion of the behaviors performed by the user that the target behavior accounts for.
  13. The identification method according to claim 10, characterized in that the identifying, by the illegal content identification node, of whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior comprises:
    estimating, by the illegal content identification node, the probability that the content is illegal content according to the degree of singleness of the target behavior;
    identifying, by the illegal content identification node, illegal users according to the probability.
  14. The identification method according to claim 13, characterized in that the estimating, by the illegal content identification node, of the probability that the content is illegal content according to the degree of singleness of the target behavior comprises:
    calculating, by the illegal content identification node, the probability that the content is illegal content from the degree of singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period, wherein the first time period is shorter than the second time period.
  15. The identification method according to any one of claims 10-14, characterized in that the target behavior comprises at least two operation steps.
  16. The identification method according to any one of claims 10-14, characterized in that, before the illegal content identification node determines the degree of singleness of the target behavior among the behaviors performed by the user, the method further comprises:
    analyzing, by the illegal content identification node, the behaviors of each user to obtain the behaviors that the user performs repeatedly;
    taking, by the illegal content identification node, a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
  17. The identification method according to any one of claims 10-14, characterized in that, after the identifying of whether the content generated by the target behavior is illegal content, the method further comprises:
    if the content is identified as illegal content, providing, by the illegal content identification node, the identified illegal content to a management node;
    blocking, by the management node, the illegal content.
  18. The identification method according to any one of claims 10-14, characterized in that, after the identifying of whether the content generated by the target behavior is illegal content, the method further comprises:
    if the content is identified as illegal content, providing, by the illegal content identification node, the identified illegal content to a management node;
    penalizing, by the management node, the user who published the illegal content by applying operation permission restrictions.
  19. An anti-spam content system, characterized by comprising a collection node and an illegal user identification node, wherein:
    the collection node is configured to record behaviors performed by a user;
    the illegal user identification node is configured to obtain the behaviors performed by the user from the collection node; determine a degree of singleness of a target behavior among the behaviors performed by the user; and identify whether the user is an illegal user according to the degree of singleness of the target behavior.
  20. The anti-spam content system according to claim 19, characterized in that the system further comprises:
    a management node configured to obtain the identified illegal user from the illegal user identification node and penalize the illegal user by applying operation permission restrictions.
  21. The anti-spam content system according to claim 19, characterized in that the system further comprises:
    a management node configured to obtain the identified illegal user from the illegal user identification node and block the content published by the illegal user.
  22. An anti-spam content system, characterized by comprising a collection node and an illegal content identification node, wherein:
    the collection node is configured to record behaviors performed by a user;
    the illegal content identification node is configured to obtain the behaviors performed by the user from the collection node; determine a degree of singleness of a target behavior among the behaviors performed by the user; and identify whether the content generated by the target behavior is illegal content according to the degree of singleness of the target behavior.
  23. The anti-spam content system according to claim 22, characterized in that the system further comprises:
    a management node configured to obtain the identified illegal content from the illegal content identification node and block the illegal content.
  24. The anti-spam content system according to claim 22, characterized in that the system further comprises:
    a management node configured to obtain the identified illegal content from the illegal content identification node and penalize the user who published the illegal content by applying operation permission restrictions.
  25. An identification method, characterized by comprising:
    determining a degree of singleness of a target behavior among the behaviors performed by a user;
    identifying illegal users according to the degree of singleness of the target behavior.
  26. The identification method according to claim 25, characterized in that the determining of the degree of singleness of the target behavior comprises:
    calculating the proportion of the behaviors performed by the user that the target behavior accounts for;
    using the proportion to indicate the degree of singleness of the target behavior.
  27. The identification method according to claim 26, characterized in that the calculating of the proportion of the behaviors performed by the user that the target behavior accounts for comprises:
    calculating the ratio of the number of executions of the target behavior to the total number of behaviors performed by the user;
    correcting the calculated ratio using a smoothing algorithm to obtain the proportion of the behaviors performed by the user that the target behavior accounts for.
  28. The identification method according to claim 25, characterized in that the identifying of illegal users according to the degree of singleness of the target behavior comprises:
    estimating the probability that the user is an illegal user according to the degree of singleness of the target behavior;
    identifying illegal users according to the probability.
  29. The identification method according to claim 28, characterized in that the estimating of the probability that the user is an illegal user according to the degree of singleness of the target behavior comprises:
    calculating the probability that the user is an illegal user from the degree of singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period, wherein the first time period is shorter than the second time period.
  30. The identification method according to any one of claims 25-29, characterized in that the target behavior comprises at least two operation steps.
  31. The identification method according to any one of claims 25-29, characterized in that, before the determining of the degree of singleness of the target behavior among the behaviors performed by the user, the method further comprises:
    analyzing the behaviors of each user to obtain the behaviors that the user performs repeatedly;
    taking a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
  32. An identification device, characterized by comprising:
    a determining module configured to determine a degree of singleness of a target behavior among the behaviors performed by a user;
    an identification module configured to identify illegal users according to the degree of singleness of the target behavior.
  33. An identification method, characterized by comprising:
    determining a degree of singleness of a target behavior among the behaviors performed by a user;
    identifying, according to the degree of singleness of the target behavior, whether the content generated by the target behavior is illegal content.
  34. The identification method according to claim 33, characterized in that the determining of the degree of singleness of the target behavior among the behaviors performed by the user comprises:
    calculating the proportion of the behaviors performed by the user that the target behavior accounts for;
    using the proportion to indicate the degree of singleness of the target behavior.
  35. The identification method according to claim 34, characterized in that the calculating of the proportion of the behaviors performed by the user that the target behavior accounts for comprises:
    calculating the ratio of the number of executions of the target behavior to the total number of behaviors performed by the user;
    correcting the calculated ratio using a smoothing algorithm to obtain the proportion of the behaviors performed by the user that the target behavior accounts for.
  36. The identification method according to claim 33, characterized in that the identifying, according to the degree of singleness of the target behavior, of whether the content generated by the target behavior is illegal content comprises:
    estimating the probability that the content is illegal content according to the degree of singleness of the target behavior;
    identifying illegal content according to the probability.
  37. The identification method according to claim 36, characterized in that the estimating of the probability that the content is illegal content according to the degree of singleness of the target behavior comprises:
    calculating the probability that the content is illegal content from the degree of singleness of the user's target behavior within a first time period and the execution frequency of the target behavior within a second time period, wherein the first time period is shorter than the second time period.
  38. The identification method according to any one of claims 33-37, characterized in that the target behavior comprises at least two operation steps.
  39. The identification method according to any one of claims 33-37, characterized in that, before the determining of the degree of singleness of the target behavior among the behaviors performed by the user, the method further comprises:
    analyzing the behaviors of each user to obtain the behaviors that the user performs repeatedly;
    taking a behavior whose number of repetitions exceeds a preset threshold as the target behavior.
  40. An identification device, characterized by comprising:
    a determining module configured to determine a degree of singleness of a target behavior among the behaviors performed by a user;
    an identification module configured to identify, according to the degree of singleness of the target behavior, whether the content generated by the target behavior is illegal content.
PCT/CN2017/115573 2016-12-23 2017-12-12 Identification method and device, and anti-junk content system WO2018113551A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611207325.2 2016-12-23
CN201611207325.2A CN108243142A (en) 2016-12-23 2016-12-23 Recognition methods and device and anti-spam content system

Publications (1)

Publication Number Publication Date
WO2018113551A1 true WO2018113551A1 (en) 2018-06-28

Family

ID=62624416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/115573 WO2018113551A1 (en) 2016-12-23 2017-12-12 Identification method and device, and anti-junk content system

Country Status (3)

Country Link
CN (1) CN108243142A (en)
TW (1) TW201824048A (en)
WO (1) WO2018113551A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389085A (en) * 2008-10-14 2009-03-18 中国联合通信有限公司 Rubbish short message recognition system and method based on sending behavior
CN101472245A (en) * 2007-12-27 2009-07-01 中国移动通信集团公司 Method and apparatus for intercepting rubbish short message
CN102413076A (en) * 2011-12-22 2012-04-11 网易(杭州)网络有限公司 Spam mail judging system based on behavior analysis
CN104067280A (en) * 2011-10-18 2014-09-24 迈可菲公司 System and method for detecting a malicious command and control channel

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090119242A1 (en) * 2007-10-31 2009-05-07 Miguel Vargas Martin System, Apparatus, and Method for Internet Content Detection
CN104836781B (en) * 2014-02-20 2018-11-09 腾讯科技(北京)有限公司 Distinguish the method and device for accessing user identity
CN104980402B (en) * 2014-04-09 2020-02-21 腾讯科技(北京)有限公司 Method and device for identifying malicious operation
CN105808639B (en) * 2016-02-24 2021-02-09 平安科技(深圳)有限公司 Network access behavior identification method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101472245A (en) * 2007-12-27 2009-07-01 中国移动通信集团公司 Method and apparatus for intercepting rubbish short message
CN101389085A (en) * 2008-10-14 2009-03-18 中国联合通信有限公司 Rubbish short message recognition system and method based on sending behavior
CN104067280A (en) * 2011-10-18 2014-09-24 迈可菲公司 System and method for detecting a malicious command and control channel
CN102413076A (en) * 2011-12-22 2012-04-11 网易(杭州)网络有限公司 Spam mail judging system based on behavior analysis

Also Published As

Publication number Publication date
TW201824048A (en) 2018-07-01
CN108243142A (en) 2018-07-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17883813

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17883813

Country of ref document: EP

Kind code of ref document: A1