WO2018113551A1

WO2018113551A1 - Identification method and device, and anti-junk content system

Info

Publication number: WO2018113551A1
Application number: PCT/CN2017/115573
Authority: WO
Inventors: 张祥; 安伟亭; 魏虎
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2016-12-23
Filing date: 2017-12-12
Publication date: 2018-06-28
Also published as: TW201824048A; CN108243142A

Abstract

The embodiments of the invention provide an identification method and device, and an anti-junk content system for identifying an illegal user according to a similarity of target behaviors in behaviors performed by a user, thereby identifying a user performing the target behaviors at an increased time interval in order to push junk content. Since illegal users often adopt the method of increasing a time interval at which target behaviors are performed to avoid detection, the identification method, which is based on similarity of target behaviors, can reduce the probability of an illegal user successfully avoiding detection, increasing the rate of illegal user identification, and optimizing identification of the illegal users.

Description

Identification method and device and anti-spam system

The present application claims priority to a Chinese patent application filed on December 23, 2016.

Technical field

The present invention relates to information technology, and in particular, to an identification method and apparatus, and an anti-spam content system.

Background technique

With the continuous development of the Internet, the number of illegal users is also rising. The existence of these illegal users has led to a significant deterioration of the website environment, and the user experience has dropped dramatically. For example, in social networking sites that offer products such as blogs or communities, there are usually a large number of illegal users (Spammers) who post spam, frequently posting large amounts of advertising/pornography on the site for malicious promotion. Therefore, these illegal users must be identified to prevent them from continuing such illegal activities.

In the prior art, the number of times a user performs a certain behavior within a short period is recorded, and a threshold is set empirically; when the number of times the user performs the behavior exceeds the threshold, the user is identified as an illegal user. In practice, however, illegal users have changed how they carry out their illegal behavior, so the recognition rate for illegal users keeps falling and the recognition effect keeps getting worse.

Summary of the invention

The invention provides an identification method and device and an anti-spam content system, which are used to solve the technical problem that the illegal user identification effect in the prior art is poor.

In order to achieve the above object, embodiments of the present invention adopt the following technical solutions:

In a first aspect, an identification method is provided, comprising:

The collection node records the behavior performed by the user;

The illegal user identification node acquires the behavior performed by the user from the collection node;

The illegal user identification node determines a single degree of target behavior in the behavior performed by the user;

The illegal user identification node identifies whether the user is an illegal user according to a single degree of the target behavior.

In a second aspect, an identification method is provided, the method comprising:

The collection node records the behavior performed by the user;

The illegal content identification node acquires the behavior performed by the user from the collection node;

The illegal content identification node determines a single degree of target behavior in the behavior performed by the user;

The illegal content identification node identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.

In a third aspect, an anti-spam content system is provided, including: a collection node and an illegal user identification node;

The collection node is configured to record the behavior performed by the user;

The illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the user is an illegal user.

In a fourth aspect, an anti-spam content system is provided, including: a collection node and an illegal content identification node;

The collection node is configured to record the behavior performed by the user;

The illegal content identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.

In a fifth aspect, an identification method is provided, including:

Determining a single degree of target behavior in the behavior performed by the user;

An illegal user is identified based on a single degree of the target behavior.

In a sixth aspect, an identification device is provided, including:

A determination module, configured to determine a single degree of target behavior in the behavior performed by the user;

An identification module, configured to identify an illegal user according to the single degree of the target behavior.

In a seventh aspect, an identification method is provided, including:

Identifying whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.

In an eighth aspect, an identification device is provided, including:

An identification module, configured to identify, according to a single degree of the target behavior, whether the content generated by the target behavior is illegal content.

The identification method and device and the anti-spam content system provided by the embodiments of the present invention identify an illegal user according to the single degree of the target behavior in the behavior performed by the user, and can thereby identify users who push spam or perform similar actions while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing this interval, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate for illegal users, and optimizes the recognition effect.

The above description is only an overview of the technical solutions of the present invention. To make the above-described and other objects, features and advantages of the present invention more clearly understood, specific embodiments of the invention are set forth below.

DRAWINGS

Various other advantages and benefits will become apparent to those skilled in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals are used to refer to the same parts. In the drawings:

FIG. 1 is a schematic diagram of interaction of an identification method according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of interaction of another identification method according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of interaction of an identification method according to Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of interaction of another identification method according to Embodiment 2 of the present invention;

FIG. 5 is a schematic flowchart of an identification method according to Embodiment 3 of the present invention;

FIG. 6 is a schematic flowchart of an identification method according to Embodiment 4 of the present invention;

FIG. 7 is a schematic structural diagram of an anti-spam content system according to Embodiment 5 of the present invention;

FIG. 8 is a schematic structural diagram of an identification device according to Embodiment 6 of the present invention;

FIG. 9 is a schematic structural diagram of an identification device according to Embodiment 7 of the present invention;

FIG. 10 is a schematic flowchart of an identification method according to Embodiment 8 of the present invention;

FIG. 11 is a schematic structural diagram of an anti-spam content system according to Embodiment 9 of the present invention;

FIG. 12 is a schematic structural diagram of an identification device according to Embodiment 10 of the present invention.

Detailed description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be more thoroughly understood and so that its scope can be fully conveyed to those skilled in the art.

The identification method and device and the anti-spam content system provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1

FIG. 1 is a schematic diagram of interaction of an identification method according to Embodiment 1 of the present invention. The method provided by this embodiment is performed by a collection node and an illegal user identification node. As shown in FIG. 1, the method may include:

In step 101, the collection node records the behavior performed by the user.

Optionally, the collection node may record the behavior performed by the user through the log data of the service system, so that the illegal user identification node can identify illegal users according to the behavior of the users.

Step 102: The illegal user identification node acquires the behavior performed by the user from the collection node.

Optionally, the illegal user identification node may periodically acquire the behavior performed by the user from the collection node. Generally, to reduce the load, the steps of acquiring the behavior performed by the user and identifying illegal users may be scheduled for the idle period of the service system.

Step 103: The illegal user identification node determines a single degree of the target behavior in the behavior performed by the user, and identifies whether the user is an illegal user according to a single degree of the target behavior.

By analyzing the behavior of current illegal users, the inventors found that illegal users often succeed in evading recognition by increasing the interval at which they perform the target behavior. The prior-art practice of counting how many times the target behavior is performed within a short period therefore cannot recognize such illegal users. At the same time, the inventors found that although these illegal users spread out the target behavior by increasing the execution interval, their behavior is more singular than that of normal users: they perform fewer behaviors unrelated to their illegal purposes and concentrate on the target behavior related to pushing spam or malicious order brushing. The inventors therefore propose that these illegal users can be identified based on the single degree of the target behavior.

As a possible implementation, the target behavior should be a behavior that an illegal user must perform to carry out the illegal activity. For example, when identifying illegal users who publish spam, publishing information can be taken as the target behavior; when identifying order-brushing users (users who place fake orders), the purchase behavior can be taken as the target behavior.

As another possible implementation, to increase the accuracy of recognition, data mining may be performed on each user's historical behavior to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repeated executions exceeds a preset threshold is taken as the target behavior.

For example, a suffix array or dynamic programming is applied to the sequence formed by the user's operations to mine the longest common subsequence. The mining result includes the longest common subsequence and the support corresponding to it. The longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order between the operations is the same; the support is the number of times the behavior pattern has been executed. The operations in a longest common subsequence whose support exceeds the preset threshold, together with their order, are selected as the target behavior.

This approach works because, as the inventors found, these illegal users often adopt the same behavior pattern. Data mining can therefore be performed periodically to obtain the repeated behavior pattern, that is, the target behavior. In this case the target behavior generally includes at least two operations; that is, the target behavior specifies both the content of the operations and the order in which they are performed.
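As a rough illustration of the mining step described above, the following Python sketch uses dynamic programming to find the longest operation pattern that occurs at least twice in one user's operation sequence and then counts its support. It is a simplification under stated assumptions: it only considers contiguous, non-overlapping runs of operations, and the function names, operation labels, and threshold are hypothetical rather than taken from the embodiments.

```python
from typing import List, Sequence


def longest_repeated_pattern(ops: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of operations that occurs at
    least twice without overlapping (empty list if nothing repeats)."""
    n = len(ops)
    # dp[i][j]: length of the longest common run ending at ops[i-1] and ops[j-1]
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # the second condition keeps the two occurrences from overlapping
            if ops[i - 1] == ops[j - 1] and dp[i - 1][j - 1] < j - i:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return list(ops[best_end - best_len:best_end])


def support(ops: Sequence[str], pattern: Sequence[str]) -> int:
    """Count non-overlapping occurrences of pattern in ops."""
    m = len(pattern)
    if m == 0:
        return 0
    count, i = 0, 0
    while i + m <= len(ops):
        if list(ops[i:i + m]) == list(pattern):
            count += 1
            i += m
        else:
            i += 1
    return count


def mine_target_behavior(ops: Sequence[str], min_support: int) -> List[str]:
    """Return the repeated pattern if its support exceeds min_support."""
    pattern = longest_repeated_pattern(ops)
    return pattern if support(ops, pattern) > min_support else []
```

For a sequence such as login, post, message, browse, login, post, message, login, post, message, the sketch would report the three-operation pattern login, post, message with a support of 3.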

After determining the target behavior, the illegal user identification node calculates the proportion of the target behavior in the behavior performed by the user and uses this proportion to indicate the single degree of the target behavior. As a possible implementation, the illegal user identification node first calculates the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user, and then corrects the calculated ratio with a smoothing algorithm to obtain the proportion of the target behavior in the behavior performed by the user.

Further, the illegal user identification node estimates the probability that the user is an illegal user according to the single degree of the target behavior, and identifies the illegal user according to that probability. As a possible implementation, the illegal user identification node calculates the probability that the user is an illegal user according to the single degree of the user's target behavior in a first time period and the execution frequency of the target behavior in a second time period.

The duration of the first time period is less than the duration of the second time period: the first time period corresponds to the short term, and the second time period corresponds to the long term. On the one hand, for the short-term behavior of the user, that is, the behavior performed in the most recent statistical time window, the single degree of the target behavior is determined, and the probability that the user is an illegal user is estimated from that single degree. On the other hand, for the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, the frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated on the basis of the long-term behavior. The illegal user is then identified based on both the probability determined from the long-term behavior and the probability determined from the short-term behavior.

Further, on the basis of FIG. 1, FIG. 2 is a schematic diagram of interaction of another identification method according to Embodiment 1 of the present invention. As shown in FIG. 2, after identifying whether the user is an illegal user, the method further includes:

Step 104: If the user is identified as an illegal user, the illegal user identification node provides the identified illegal user to the management node.

Further, as a possible implementation manner, after the illegal user identification node provides the identified illegal user to the management node, the identification method further includes:

In step 105, the management node uses the operation permission restriction measure to punish the illegal user.

As another possible implementation manner, after the illegal user identification node provides the identified illegal user to the management node, the identification method further includes:

Step 106: The management node blocks the content published by the illegal user.

The identification method provided in this embodiment is mainly applied to scenarios in which spam content is published, and may specifically identify illegal users who publish spam content. Accordingly, in this application scenario the target behavior is a behavior necessary for publishing spam content, such as posting blog entries, sending in-site messages, or leaving comments.

In this embodiment, by identifying an illegal user according to the single degree of the target behavior in the behavior performed by the user, it is possible to identify users who push spam content or perform similar actions while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing this interval, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate for illegal users, and optimizes the recognition effect.

Embodiment 2

FIG. 3 is a schematic diagram of interaction of an identification method according to Embodiment 2 of the present invention. The method provided by this embodiment is performed by a collection node and an illegal content identification node. As shown in FIG. 3, the method may include:

In step 201, the collecting node records the behavior performed by the user.

Optionally, the collection node may record the behavior performed by the user through the log data of the service system, so that the illegal content identification node identifies the illegal content according to the behavior of the user.

Step 202: The illegal content identification node acquires the behavior performed by the user from the collection node.

Optionally, the illegal content identification node may periodically acquire the behavior performed by the user from the collection node. Generally, to reduce the load, the steps of acquiring the behavior performed by the user and identifying illegal content may be scheduled for the idle period of the service system.

Step 203: The illegal content identification node determines a single degree of the target behavior in the behavior performed by the user, and identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.

Specifically, whether the user is an illegal user can be identified according to the single degree of the target behavior in the behavior performed by the user; if the user is an illegal user, the content generated by that user's target behavior is then identified as illegal content.

The specific steps for identifying whether the user is an illegal user are not repeated in this embodiment; for details, refer to the related description in the foregoing embodiment.

Further, on the basis of FIG. 3, FIG. 4 is a schematic diagram of interaction of another identification method according to Embodiment 2 of the present invention. As shown in FIG. 4, after identifying illegal content, the method further includes:

Step 204: If the content is identified as illegal content, the illegal content identification node provides the identified illegal content to the management node.

Further, as a possible implementation manner, after the illegal content identification node provides the identified illegal content to the management node, the identification method further includes:

Step 205: The management node blocks the illegal content.

As another possible implementation manner, after the illegal content identification node provides the identified illegal content to the management node, the identification method further includes:

Step 206: The management node uses the operation permission restriction measure to punish the user who issues the illegal content.

In this embodiment, by identifying whether the content generated by the target behavior is illegal content according to the single degree of the target behavior in the behavior performed by the user, it is possible to identify spam content that is pushed while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing this interval, recognition based on the single degree of the target behavior reduces the probability that illegal content escapes recognition, improves the recognition rate for illegal content, and optimizes the recognition effect.

Embodiment 3

FIG. 5 is a schematic flowchart of an identification method according to Embodiment 3 of the present invention. The method provided in this embodiment may be used to identify a certain type of illegal user, for example an illegal user who publishes spam or a user who maliciously brushes orders; this embodiment does not limit the type of illegal user. As shown in FIG. 5, the method includes:

Step 301: Determine a single degree of target behavior in the behavior performed by the user.

Specifically, the proportion of the target behavior in the behavior performed by the user may be calculated, and this proportion is then used to indicate the single degree of the target behavior.

As a possible implementation, the target behavior should be a behavior that an illegal user must perform to carry out the illegal activity. For example, when identifying illegal users who publish spam, publishing information may be taken as the target behavior; when identifying order-brushing users (users who place fake orders), the purchase behavior may be taken as the target behavior.

As another possible implementation, to increase the accuracy of identification, data mining may be performed on the user's historical behavior to obtain the behaviors that the user performs repeatedly, and a behavior whose number of repeated executions exceeds a preset threshold is taken as the target behavior.

For example, a suffix array or dynamic programming is applied to the sequence formed by the user's operations to mine the longest common subsequence. The mining result includes the longest common subsequence and the support corresponding to it. The longest common subsequence is a behavior pattern that the user has performed at least twice, that is, the operations performed are the same and the order between the operations is the same; the support is the number of times the behavior pattern has been executed. The operations in a longest common subsequence whose support exceeds the preset threshold, together with their order, are selected as the target behavior.

This approach works because, as the inventors found, these illegal users often adopt the same behavior pattern. Data mining can therefore be performed regularly to obtain the repeated behavior pattern, that is, the target behavior. In this case the target behavior generally includes at least two operations; that is, the target behavior specifies both the content of the operations and the order in which they are performed.

After the target behavior is determined, as a possible implementation of determining its single degree, the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user during the sampling period may be calculated. When the total number of sampled behaviors is small, a smoothing algorithm, for example Laplace smoothing, may be used to correct the calculated ratio, reducing noise and improving accuracy, to obtain the proportion of the target behavior in the behavior performed by the user.
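The smoothed ratio described above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and the particular smoothing form, which adds a pseudo-count beta to each of two outcomes (target and non-target behavior), is an assumption, since the text names Laplace smoothing without giving its exact form.

```python
def single_degree(target_count: int, total_count: int, beta: float = 1.0) -> float:
    """Smoothed proportion of target behaviors among all sampled behaviors.

    Assumed form: additive (Laplace) smoothing over two outcomes,
    target vs. non-target, each given a pseudo-count of beta.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if target_count < 0 or total_count < target_count:
        raise ValueError("counts must satisfy 0 <= target_count <= total_count")
    return (target_count + beta) / (total_count + 2 * beta)
```

With beta set to 1, a user with a single sampled behavior that happens to be the target behavior gets a value of 2/3 rather than 1, while a user with 900 target behaviors out of 1000 stays close to the raw ratio of 0.9. Small samples are thus pulled toward the middle, which is the noise-reducing effect the paragraph describes.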

Step 302: Identify an illegal user according to a single degree of the target behavior.

As a possible implementation, if the single degree of the target behavior exceeds a threshold, the user is identified as an illegal user.

As another possible implementation, on the one hand, for the short-term behavior of the user, that is, the behavior performed in the most recent statistical time window, the single degree of the target behavior may be determined as in step 301, and the probability that the user is an illegal user is then estimated from that single degree on the basis of the short-term behavior. On the other hand, for the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, the frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated on the basis of the long-term behavior. The illegal user is then identified based on both the probability determined from the long-term behavior and the probability determined from the short-term behavior.

It can be seen that by identifying an illegal user according to the single degree of the target behavior in the behavior performed by the user, it is possible to identify users who push spam content while increasing the interval between executions of the target behavior. Since illegal users often evade prior-art recognition by increasing this interval, recognition based on the single degree of the target behavior reduces the probability that illegal users escape recognition, improves the recognition rate for illegal users, and optimizes the recognition effect.

Embodiment 4

FIG. 6 is a schematic flowchart of an identification method according to Embodiment 4 of the present invention. This embodiment describes the identification method using the process of identifying illegal users who publish spam content as an example; specifically, the target behavior may be an information publishing behavior. The method provided by this embodiment may be performed by an illegal user identification node, which is set in an anti-spam system.

As shown in Figure 6, the method includes:

Step 401: Perform statistics on the behavior of the current user.

Specifically, the user behavior data is obtained, and the behavior of each user is counted one by one, including:

A. Short-term behavior statistics:

Count the number of times count_{tgt_acc} that the user has performed the target behavior within the current time window. count_{tgt_acc} reflects how often the user performs the target behavior in a short time.

B. Long-term behavior statistics:

Count, over the period from 0:00 on the current day to the current time window, the number of times count_{tgt_total} that the user has performed the target behavior, and, over the same period, the total number of times count_{all_total} that the user has performed all behaviors.
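The statistics in step 401 could be gathered along the following lines. This Python sketch is illustrative only: the `Behavior` record, the `collect_stats` function, and the 10-minute length of the current time window are assumptions, since the embodiment does not specify a record layout or a window length.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Set


@dataclass(frozen=True)
class Behavior:
    user_id: str
    action: str
    timestamp: datetime


@dataclass
class UserStats:
    count_tgt_acc: int = 0    # target behaviors in the current time window
    count_tgt_total: int = 0  # target behaviors since 0:00 today
    count_all_total: int = 0  # all behaviors since 0:00 today


def collect_stats(behaviors: Iterable[Behavior], user_id: str,
                  target_actions: Set[str], now: datetime,
                  window: timedelta = timedelta(minutes=10)) -> UserStats:
    """Count one user's short-term and long-term behavior statistics."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    window_start = now - window  # assumed window length, not from the text
    stats = UserStats()
    for b in behaviors:
        # skip other users and anything outside today's statistics period
        if b.user_id != user_id or not (day_start <= b.timestamp <= now):
            continue
        stats.count_all_total += 1
        if b.action in target_actions:
            stats.count_tgt_total += 1
            if b.timestamp > window_start:
                stats.count_tgt_acc += 1
    return stats
```

Calling `collect_stats` once per user yields the three counts used in the probability calculation of step 402.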

Step 402: Calculate, according to the statistical result, a probability S of each user as an illegal user who issues spam.

In this embodiment, the S parameter is used to indicate the probability that the user is an illegal user who issues spam.

The formula for calculating the S parameter is as follows:

among them,

The parameter S consists of two parts. One part is computed from the short-term behavior, that is, the behavior performed in the current time window, and gives the probability, estimated from that behavior, that the user is an illegal user who publishes spam. In this part, E[count_{tgt_acc}] denotes the expected value of count_{tgt_acc}, which may specifically be the average of count_{tgt_acc} over all users.

The other part is computed from the long-term behavior, that is, the behavior performed from the historical time window up to the current time window, and gives the probability, estimated from that behavior, that the user is an illegal user who publishes spam. In this part, E[count_{ratio}] denotes the expected value of count_{ratio}, which may specifically be the average of count_{ratio} over all users.

Here, count_{ratio} quantifies the single degree with which the user performs the target behavior. Laplace smoothing is added to the calculation of count_{ratio} to handle users with few recorded behaviors and to avoid the increase in calculation error that a small sample would cause. In the above formula, β is the Laplace smoothing parameter used in the smoothing process.

To adjust the relative influence of the long-term behavior calculation and the short-term behavior calculation on the value of S, a weight α is set, which adjusts how strongly the long-term result and the short-term result each affect the finally calculated probability. The weight α ranges from 0 to 1.
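The formula for S is not reproduced in this text, so the following Python sketch uses an assumed form that is merely consistent with the definitions above: each part is normalized by its expected value, count_{ratio} is smoothed with β, and α weights the short-term part while 1 − α weights the long-term part. The function name, the smoothing form, and the assignment of α to the short-term part are all assumptions, and the result of this sketch is a relative score rather than a calibrated probability.

```python
def s_score(count_tgt_acc: int, count_tgt_total: int, count_all_total: int,
            expected_tgt_acc: float, expected_ratio: float,
            alpha: float = 0.5, beta: float = 1.0) -> float:
    """Weighted combination of short-term and long-term evidence (assumed form).

    expected_tgt_acc stands for E[count_{tgt_acc}] and expected_ratio for
    E[count_{ratio}], for example averages over all users.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if expected_tgt_acc <= 0 or expected_ratio <= 0:
        raise ValueError("expected values must be positive")
    # short-term part: target-behavior count in the current window vs. its expectation
    short_term = count_tgt_acc / expected_tgt_acc
    # long-term part: Laplace-smoothed ratio (assumed two-outcome form) vs. its expectation
    count_ratio = (count_tgt_total + beta) / (count_all_total + 2 * beta)
    long_term = count_ratio / expected_ratio
    return alpha * short_term + (1.0 - alpha) * long_term
```

Under this assumed form, a user who posts slowly (a short-term count near the average) but does almost nothing except post still scores well above an average user, because the long-term ratio term dominates.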

Step 403: Determine whether the user's probability S is greater than a preset threshold. If so, perform step 404; otherwise, move on to identify the next user.

Step 404: If the user is determined to be an illegal user who publishes spam, send the user's information to the management node, which restricts the user's permissions or blocks the user's content.

Embodiment 5

This embodiment provides an anti-spam content system. FIG. 7 is a schematic structural diagram of an anti-spam content system according to Embodiment 5 of the present invention. As shown in FIG. 7, the anti-spam content system includes a collection node and an illegal user identification node.

The collection node is used to record the behavior performed by the user.

Specifically, the collection node is the interface between the anti-spam system and the online service system and is used to collect the behavior of users in the service system. In particular, it can obtain logs from the service system and parse them to read the user behavior data.
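A collection node of this kind might parse log lines roughly as in the Python sketch below. The log format shown (an ISO timestamp followed by `user=` and `action=` fields) is purely hypothetical, as are the `BehaviorRecord` type and the `parse_log_line` function; the embodiment does not describe the actual log layout of the service system.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class BehaviorRecord(NamedTuple):
    timestamp: datetime
    user_id: str
    action: str


def parse_log_line(line: str) -> Optional[BehaviorRecord]:
    """Parse one line of the assumed format
    '<ISO timestamp> user=<id> action=<name>'; return None if it does not match."""
    parts = line.split()
    if len(parts) != 3:
        return None
    ts_text, user_field, action_field = parts
    if not user_field.startswith("user=") or not action_field.startswith("action="):
        return None
    try:
        timestamp = datetime.fromisoformat(ts_text)
    except ValueError:
        return None
    return BehaviorRecord(timestamp,
                          user_field[len("user="):],
                          action_field[len("action="):])
```

Lines that do not match the assumed format are skipped by returning `None`, so a malformed entry does not interrupt collection.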

The illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the user is an illegal user.

It should be noted that the illegal user identification node is used to perform the identification methods provided in the foregoing Embodiments 1, 3, and 4. For details, refer to the related description in those embodiments.

Further, the anti-spam system further includes: a management node.

The management node is configured to acquire the identified illegal user from the illegal user identification node, and impose an operation authority restriction measure to punish the illegal user. And/or, the management node is configured to acquire the identified illegal user from the illegal user identification node, and block the content published by the illegal user.

The anti-spam system can run on a server to identify illegal users who publish spam and then restrict the operation rights of the identified illegal users, prohibiting them from posting information on the website and thereby reducing the spam posted on the website.

As a possible implementation manner, the illegal user identification node completes the process of identifying the illegal user and performing the operation authority restriction on the identified illegal user by interacting with the management node.

Specifically, as shown in FIG. 7, the illegal user identification node acquires data on user behavior from the collection node and analyzes it to identify, from among the users, the illegal users who publish spam. The analysis may include: for each user, determining a single degree of the target behavior in the behavior performed by the user, and then identifying whether the user is an illegal user according to the single degree of the target behavior. The illegal user identification node provides the identified illegal users to the management node. The management node can review the identified illegal users and, after confirming that they do publish spam, set a corresponding operation authority restriction measure for each illegal user, for example, prohibiting the user from posting logs for 3 days or freezing the user account for 3 days.
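The interaction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the single degree is approximated here by the unsmoothed share of one target action, and the function names, the `threshold` of 0.8, and the `min_actions` guard are hypothetical. Only the 3-day restriction comes from the example in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def identify_illegal_users(behaviors: Iterable[Tuple[str, str]],
                           target_action: str,
                           threshold: float = 0.8,
                           min_actions: int = 5) -> List[str]:
    """Identification node: flag users whose target-action share exceeds threshold."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user_id, action in behaviors:
        per_user[user_id][action] += 1
    flagged = []
    for user_id, counts in per_user.items():
        total = sum(counts.values())
        # min_actions (assumed) avoids flagging users with very little history
        if total >= min_actions and counts[target_action] / total > threshold:
            flagged.append(user_id)
    return flagged


def apply_restrictions(flagged: Iterable[str], days: int = 3) -> Dict[str, str]:
    """Management node: map each confirmed user to a restriction measure."""
    return {user_id: f"posting disabled for {days} days" for user_id in flagged}
```

In a real deployment the management node would sit between these two calls to review the flagged users before any restriction is applied.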

Embodiment 6

FIG. 8 is a schematic structural diagram of an identification device according to Embodiment 6 of the present invention. As shown in FIG. 8, the identification device includes a determining module 41 and an identification module 42.

The determining module 41 is configured to determine a single degree of the target behavior in the behavior performed by the user.

The identification module 42 is configured to identify an illegal user according to the single degree of the target behavior.

In the identification device provided by this embodiment, the identification module 42 identifies illegal users according to the single degree, determined by the determining module 41, of the target behavior in the behavior performed by the user. This makes it possible to identify users who push spam while increasing the interval between executions of the target behavior. Because illegal users in the prior art often increase this interval to evade identification, identification based on the single degree of the target behavior reduces the probability that illegal users evade identification, improves the identification rate of illegal users, and optimizes the identification effect.

Embodiment 7

FIG. 9 is a schematic structural diagram of an identification device according to Embodiment 7 of the present invention. To clearly illustrate the previous embodiment, this embodiment provides a possible implementation of the identification device. As shown in FIG. 9, the determining module 41 further includes a calculating unit 411 and an indicating unit 412.

The calculating unit 411 is configured to calculate a proportion of the target behavior in the behavior performed by the user.

The indicating unit 412 is configured to use the proportion to indicate the single degree of the target behavior.

The calculating unit 411 includes a calculating subunit 4111 and a smoothing subunit 4112.

The calculating subunit 4111 is configured to calculate the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user.

The smoothing subunit 4112 is configured to correct the calculated ratio by using a smoothing algorithm to obtain a proportion of the target behavior in the behavior performed by the user.
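To make the two subunits concrete, the sketch below computes the ratio and corrects it in one step. This passage does not name the smoothing algorithm, so additive (Laplace) smoothing is used purely as an assumption; the function name and the parameters `alpha` and `num_kinds` are hypothetical.

```python
def smoothed_proportion(target_count: int, total_count: int,
                        alpha: float = 1.0, num_kinds: int = 2) -> float:
    """Additive (Laplace) smoothing of target_count / total_count.

    alpha and num_kinds are illustrative parameters, not values from the source.
    """
    if target_count < 0 or total_count < target_count:
        raise ValueError("counts must satisfy 0 <= target_count <= total_count")
    return (target_count + alpha) / (total_count + alpha * num_kinds)
```

The point of the correction is that a user who performed the target behavior once out of one action (raw ratio 1.0) should not score higher than a user who performed it 90 times out of 100; with these assumed parameters the smoothed values are 2/3 and 91/102 respectively.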

Further, the identification module 42 includes a prediction unit 421 and an identification unit 422.

The prediction unit 421 is configured to estimate a probability that the user is an illegal user according to a single degree of the target behavior.

Specifically, the prediction unit 421 is configured to calculate the probability that the user is an illegal user according to the single degree of the user's target behavior in a first time period and the execution frequency of the target behavior in a second time period.

The duration of the first time period is less than the duration of the second time period.

The identifying unit 422 is configured to identify an illegal user according to the probability.
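One way the prediction unit and the identifying unit could work together is sketched below. The formula that combines the short-term single degree with the long-term execution frequency is not given in this passage, so a logistic combination is assumed here for illustration only; the weights, the bias, and the threshold are hypothetical values.

```python
import math


def illegal_probability(single_degree: float, long_term_frequency: float,
                        w_single: float = 4.0, w_freq: float = 0.1,
                        bias: float = -4.0) -> float:
    """Logistic combination of the two signals; weights are illustrative only."""
    # single_degree is assumed to lie in [0, 1]; frequency is a non-negative count
    z = w_single * single_degree + w_freq * long_term_frequency + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_illegal(probability: float, threshold: float = 0.5) -> bool:
    """Identifying unit: compare the probability with a preset threshold."""
    return probability > threshold
```

Because both weights are positive, the probability rises with either signal, so a user who spaces out the target behavior in the short term can still be caught through the long-term frequency.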

Further, the target behavior includes at least two steps.

Based on this, the identification device further includes an analysis module 43.

The analysis module 43 is configured to analyze each user's behavior to obtain the behaviors repeatedly performed by the user, and to take a behavior whose number of repeated executions exceeds a preset threshold as the target behavior.
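Since the target behavior includes at least two steps, one simple way to find repeated behaviors is to count consecutive action sequences of a fixed length. The sketch below is an assumption about how such an analysis could be done, not the method of the embodiment; `find_target_behaviors`, the window length `n`, and `min_repeats` are hypothetical names, and overlapping windows are counted.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def find_target_behaviors(actions: Sequence[str], n: int = 2,
                          min_repeats: int = 3) -> List[Tuple[str, ...]]:
    """Return n-step action sequences repeated more than min_repeats times."""
    counts = Counter(tuple(actions[i:i + n])
                     for i in range(len(actions) - n + 1))
    return [seq for seq, c in counts.items() if c > min_repeats]
```

For example, a user whose history is five repetitions of `visit` followed by `post` yields the two-step sequence `("visit", "post")` five times, which exceeds a threshold of 4 and is taken as the target behavior.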

Embodiment 8

FIG. 10 is a schematic flowchart of an identification method according to Embodiment 8 of the present invention. The method provided in this embodiment may be used to identify illegal content. As shown in FIG. 10, the method includes:

Step 801: Determine a single degree of target behavior in the behavior performed by the user.

Step 802: Identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.

As a possible implementation, if the single degree of the target behavior exceeds a limit, the user is identified as an illegal user, and the content generated by the target behavior performed by the illegal user is illegal content.

As another possible implementation, on the one hand, for the short-term behavior of the user, that is, the behavior performed within the most recent statistical time window, the single degree of the target behavior is determined as in step 801, and the probability that the user is an illegal user is then estimated from that single degree. On the other hand, for the long-term behavior of the user, that is, the behavior performed within the historical statistical time window, the frequency of the target behavior is calculated, and the probability that the user is an illegal user is estimated from that frequency. The illegal user is then identified according to the probability determined from the long-term behavior and the probability determined from the short-term behavior, and the content generated by the target behavior performed by the illegal user is determined to be illegal content.
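The last step, marking content as illegal once its author has been identified, can be illustrated as follows. The record layout (dictionaries with `content_id`, `user_id`, and `action` keys) and the function name are assumptions made for this sketch and do not come from the source.

```python
from typing import Dict, Iterable, List, Set


def illegal_content_ids(contents: Iterable[Dict[str, str]],
                        illegal_users: Set[str],
                        target_action: str) -> List[str]:
    """Return ids of content produced by the target action of an illegal user."""
    return [c["content_id"] for c in contents
            if c["user_id"] in illegal_users and c["action"] == target_action]
```

Only content produced by the target behavior is marked; other content from the same user is left alone, which matches the wording that the content generated by the target behavior is determined to be illegal.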

It can be seen that, by identifying whether the content generated by the target behavior is illegal content according to the single degree of the target behavior in the behavior performed by the user, spam that is pushed while increasing the interval between executions of the target behavior can be identified. Because illegal users in the prior art often increase this interval to evade identification, identification based on the single degree of the target behavior reduces the probability that illegal content evades identification, improves the identification rate of illegal content, and optimizes the identification effect for illegal content.

Embodiment 9

This embodiment provides an anti-spam content system. FIG. 11 is a schematic structural diagram of an anti-spam content system according to Embodiment 9 of the present invention. As shown in FIG. 11, the anti-spam content system includes a collection node and an illegal content identification node.

The collection node is used to record the behavior performed by the user.

The illegal content identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.

It should be noted that the illegitimate content identification node is used to perform the identification method provided in the foregoing Embodiment 2 and Embodiment 8. For details, refer to the related description in the foregoing embodiment, and details are not repeatedly described in this embodiment.

Further, the anti-spam system further includes: a management node.

The management node is configured to acquire the identified illegal content from the illegal content identification node and block the illegal content. And/or, the management node is configured to acquire the identified illegal content from the illegal content identification node and impose an operation authority restriction measure to punish the user who publishes the illegal content.

Because a large number of illegal users publish spam on websites, frequently posting advertisements and other inappropriate content for malicious promotion, the network environment deteriorates and the user experience suffers. The anti-spam content system provided in this embodiment can effectively identify the illegal users and the illegal content they publish, and can restrict the illegal users while blocking the illegal content. Compared with having an administrator review posted content or automatically blocking published content by keyword, it can eliminate the malicious publication of advertisements and other bad content at the source, namely the information publisher, effectively purifying the network environment and improving the user experience.

Embodiment 10

FIG. 12 is a schematic structural diagram of an identification device according to Embodiment 10 of the present invention. As shown in FIG. 12, the identification device includes a determining module 91 and an identification module 92.

The determining module 91 is configured to determine a single degree of the target behavior in the behavior performed by the user.

The identification module 92 is configured to identify, according to a single degree of the target behavior, whether the content generated by the target behavior is illegal content.

Optionally, the determining module 91 is specifically configured to calculate a proportion of the target behavior in the behavior performed by the user, and to use the proportion to indicate the single degree of the target behavior.

Calculating, by the determining module 91, the proportion of the target behavior in the behavior performed by the user includes: calculating the ratio between the number of executions of the target behavior and the total number of behaviors performed by the user; and correcting the calculated ratio by using a smoothing algorithm to obtain the proportion of the target behavior in the behavior performed by the user.

The identification module 92 is specifically configured to estimate a probability that the content is illegal content according to a single degree of the target behavior; and identify the illegal content according to the probability.

Estimating, by the identification module 92, the probability that the content is illegal content according to the single degree of the target behavior includes: calculating the probability that the content is illegal content according to the single degree of the user's target behavior in a first time period and the execution frequency of the target behavior in a second time period, wherein the duration of the first time period is less than the duration of the second time period.

As a possible implementation, the target behavior includes at least two steps. Before determining the single degree of the target behavior in the behavior performed by the user, the determining module 91 is further configured to: analyze each user's behavior to obtain the behaviors repeatedly performed by the user; and take a behavior whose number of repeated executions exceeds a preset threshold as the target behavior.

One of ordinary skill in the art will appreciate that all or part of the steps to implement the various method embodiments described above may be accomplished by hardware associated with the program instructions. The aforementioned program can be stored in a computer readable storage medium. The program, when executed, performs the steps including the foregoing method embodiments; and the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and are not intended to be limiting. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

An identification method, the method comprising:

The collection node records the behavior performed by the user;

The illegal user identification node acquires the behavior performed by the user from the collection node;

The illegal user identification node determines a single degree of target behavior in the behavior performed by the user;

The illegal user identification node identifies whether the user is an illegal user according to a single degree of the target behavior.
The identification method according to claim 1, wherein the illegal user identification node determines a single degree of target behavior in the behavior performed by the user, including:

The illegal user identification node calculates a proportion of the target behavior in the behavior performed by the user, and uses the proportion to indicate the single degree of the target behavior.
The identification method according to claim 2, wherein the calculating the proportion of the target behavior in the behavior performed by the user comprises:

The illegal user identification node calculates a ratio between the number of executions of the target behavior and the total number of times the user performs the behavior;

Using a smoothing algorithm, the illegal user identification node corrects the calculated ratio to obtain the proportion of the target behavior in the behavior performed by the user.
The identification method according to claim 1, wherein the illegal user identification node identifies whether the user is an illegal user according to a single degree of the target behavior, including:

The illegal user identification node estimates a probability that the user is an illegal user according to a single degree of the target behavior;

The illegal user identification node identifies an illegal user according to the probability.
The identification method according to claim 4, wherein the illegal user identification node estimates the probability that the user is an illegal user according to a single degree of the target behavior, including:

The illegal user identification node calculates a probability that the user is an illegal user according to a single degree of the target behavior of the user in the first time period and the execution frequency of the target behavior in the second time period; wherein the duration of the first time period is less than the duration of the second time period.
The identification method according to any one of claims 1 to 5, wherein the target behavior comprises at least two steps of operation.
The identification method according to any one of claims 1 to 5, wherein before the illegal user identification node determines a single degree of target behavior in the behavior performed by the user, the method further includes:

The illegal user identification node performs an analysis on the behavior of each user to obtain the behavior repeatedly performed by the user;

The illegal user identification node takes the behavior that the number of repeated executions exceeds a preset threshold as the target behavior.
The identification method according to any one of claims 1-5, wherein after the identifying whether the user is an illegal user, the method further includes:

If the user is identified as an illegal user, the illegal user identification node provides the identified illegal user to the management node;

The management node uses the operation authority restriction measure to punish the illegal user.
The identification method according to any one of claims 1-5, wherein after the identifying whether the user is an illegal user, the method further includes:

If the user is identified as an illegal user, the illegal user identification node provides the identified illegal user to the management node;

The management node blocks content published by the illegal user.
An identification method, the method comprising:

The collection node records the behavior performed by the user;

The illegal content identification node acquires the behavior performed by the user from the collection node;

The illegal content identification node determines a single degree of target behavior in the behavior performed by the user;

The illegal content identification node identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.
The identification method according to claim 10, wherein the illegal content identification node determines a single degree of target behavior in the behavior performed by the user, including:

The illegal content identification node calculates a proportion of the target behavior in the behavior performed by the user, and uses the proportion to indicate the single degree of the target behavior.
The identification method according to claim 11, wherein the calculating the proportion of the target behavior in the behavior performed by the user comprises:

The illegal content identification node calculates a ratio between the number of executions of the target behavior and the total number of times the user performs the behavior;

Using a smoothing algorithm, the illegal content identification node corrects the calculated ratio to obtain the proportion of the target behavior in the behavior performed by the user.
The identification method according to claim 10, wherein the illegal content identification node identifies whether the content generated by the target behavior is illegal content according to a single degree of the target behavior, including:

The illegal content identification node estimates a probability that the content is illegal content according to a single degree of the target behavior;

The illegal content identification node identifies illegal content according to the probability.
The identification method according to claim 13, wherein the illegal content identification node estimates the probability that the content is illegal content according to a single degree of the target behavior, and includes:

The illegal content identification node calculates a probability that the content is illegal content according to a single degree of the target behavior of the user in the first time period and the execution frequency of the target behavior in the second time period; wherein the duration of the first time period is less than the duration of the second time period.
The identification method according to any one of claims 10 to 14, wherein the target behavior comprises at least two steps of operation.
The identification method according to any one of claims 10 to 14, wherein before the illegal content identification node determines a single degree of the target behavior in the behavior performed by the user, the method further includes:

The illegal content identification node performs an analysis on the behavior of each user to obtain the behavior repeatedly performed by the user;

The illegal content identification node takes the behavior that the number of repeated executions exceeds a preset threshold as the target behavior.
The identification method according to any one of claims 10 to 14, wherein after the identifying whether the content generated by the target behavior is illegal content, the method further includes:

If the content is identified as illegal content, the illegal content identification node provides the identified illegal content to the management node;

The management node blocks the illegal content.
The identification method according to any one of claims 10 to 14, wherein after the identifying whether the content generated by the target behavior is illegal content, the method further includes:

If the content is identified as illegal content, the illegal content identification node provides the identified illegal content to the management node;

The management node uses the operation authority restriction measure to punish the user who issues the illegal content.
An anti-spam content system, comprising: a collection node and an illegal user identification node;

The collection node is configured to record behavior performed by the user;

The illegal user identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify whether the user is an illegal user according to the single degree of the target behavior.
The anti-spam system of claim 19, wherein the system further comprises:

The management node is configured to acquire the identified illegal user from the illegal user identification node, and impose an operation authority restriction measure to punish the illegal user.
The anti-spam system of claim 19, wherein the system further comprises:

A management node, configured to acquire the identified illegal user from the illegal user identification node, and block the content published by the illegal user.
An anti-spam content system, comprising: a collection node and an illegal content identification node;

The collection node is configured to record behavior performed by the user;

The illegal content identification node is configured to acquire, from the collection node, the behavior performed by the user; determine a single degree of the target behavior in the behavior performed by the user; and identify, according to the single degree of the target behavior, whether the content generated by the target behavior is illegal content.
The anti-spam system of claim 22, wherein the system further comprises:

A management node, configured to acquire the identified illegal content from the illegal content identification node, and block the illegal content.
The anti-spam system of claim 22, wherein the system further comprises:

A management node, configured to acquire the identified illegal content from the illegal content identification node, and use the operation authority restriction measure to punish the user who issues the illegal content.
An identification method, comprising:

Determining a single degree of target behavior in the behavior performed by the user;

Identifying an illegal user according to the single degree of the target behavior.
The identification method according to claim 25, wherein said determining a single degree of target behavior comprises:

Calculating a proportion of the target behavior in the behavior performed by the user;

Using the proportion to indicate the single degree of the target behavior.
The identification method according to claim 26, wherein the calculating the proportion of the target behavior in the behavior performed by the user comprises:

Calculating a ratio between the number of executions of the target behavior and the total number of behaviors performed by the user;

Using a smoothing algorithm, correcting the calculated ratio to obtain the proportion of the target behavior in the behavior performed by the user.
The identification method according to claim 25, wherein the identifying the illegal user according to the single degree of the target behavior comprises:

Estimating the probability that the user is an illegal user according to a single degree of the target behavior;

An illegal user is identified based on the probability.
The identification method according to claim 28, wherein the estimating the probability that the user is an illegal user according to a single degree of the target behavior comprises:

Calculating a probability that the user is an illegal user according to a single degree of the target behavior of the user in the first time period and a frequency of execution of the target behavior in the second time period; wherein the duration of the first time period is less than the duration of the second time period.
The identification method according to any one of claims 25 to 29, wherein the target behavior comprises at least two steps of operation.
The identification method according to any one of claims 25 to 29, wherein before the determining a single degree of the target behavior in the behavior performed by the user, the method further comprises:

Analyzing each user's behavior to obtain the behavior repeatedly performed by the user;

Taking the behavior whose number of repeated executions exceeds a preset threshold as the target behavior.
An identification device, comprising:

a determination module for determining a single degree of target behavior in the behavior performed by the user;

An identification module for identifying an illegal user according to a single degree of the target behavior.
An identification method, comprising:

Determining a single degree of target behavior in the behavior performed by the user;

Identifying whether the content generated by the target behavior is illegal content according to a single degree of the target behavior.
The identification method according to claim 33, wherein said determining a single degree of target behavior in the behavior performed by the user comprises:

Calculating a proportion of the target behavior in the behavior performed by the user;

Using the proportion to indicate the single degree of the target behavior.
The identification method according to claim 34, wherein the calculating the proportion of the target behavior in the behavior performed by the user comprises:

Calculating a ratio between the number of executions of the target behavior and the total number of behaviors performed by the user;

Using a smoothing algorithm, correcting the calculated ratio to obtain the proportion of the target behavior in the behavior performed by the user.
The identification method according to claim 33, wherein the identifying whether the content generated by the target behavior is illegal according to a single degree of the target behavior comprises:

Estimating the probability that the content is illegal content according to a single degree of the target behavior;

Illegal content is identified based on the probability.
The identification method according to claim 36, wherein the estimating the probability that the content is illegal content according to a single degree of the target behavior comprises:

Calculating a probability that the content is illegal content according to a single degree of the target behavior of the user in the first time period and a frequency of execution of the target behavior in the second time period; wherein the duration of the first time period is less than the duration of the second time period.
The identification method according to any one of claims 33 to 37, wherein the target behavior comprises at least two steps of operation.
The identification method according to any one of claims 33 to 37, wherein before the determining a single degree of the target behavior in the behavior performed by the user, the method further comprises:

Analyzing each user's behavior to obtain the behavior repeatedly performed by the user;

Taking the behavior whose number of repeated executions exceeds a preset threshold as the target behavior.
An identification device, comprising:

a determination module for determining a single degree of target behavior in the behavior performed by the user;

An identification module, configured to identify, according to a single degree of the target behavior, whether the content generated by the target behavior is illegal content.