CN114885006A

CN114885006A - Method for identifying real user based on comprehensive characteristics

Info

Publication number: CN114885006A
Application number: CN202210507585.0A
Authority: CN
Inventors: 王子健; 徐桢虎; 戴金良; 王磊; 李华
Original assignee: Sichuan Cover Media Technology Co ltd
Current assignee: Sichuan Cover Media Technology Co ltd
Priority date: 2022-05-10
Filing date: 2022-05-10
Publication date: 2022-08-09

Abstract

The invention relates to the technical field of user identification and, in order to improve the accuracy of real-user identification, provides a method for identifying a real user based on comprehensive characteristics, comprising the following steps: step 1, performing real-user identification based on the current network environment characteristics; step 2, performing real-user identification based on historical behavior; step 3, performing real-user identification based on the user behavior flow path; and step 4, determining that a user is a real user when the user is judged to be a real user in each of steps 1 to 3. The method greatly improves identification accuracy and can effectively defend against script-simulated attacks and similar attacks.

Description

Method for identifying real user based on comprehensive characteristics

Technical Field

The invention relates to the technical field of user identification, in particular to a method for identifying a real user based on comprehensive characteristics.

Background

The number of users and the number of online users are the core metrics of today's internet companies and products. The most direct way for an enterprise to increase its daily active users is through marketing activities. However, with the rise of bonus hunters (the so-called "wool party") and the use of scripting techniques such as order brushing, most of the benefits of a marketing activity can be captured by non-genuine users within the first few seconds after it starts. Identifying the real users in an activity is therefore essential for reducing company losses and ensuring that operational activities deliver their maximum benefit.

Currently, common defenses against bonus hunting are WeChat authorization, SMS verification codes, and human-machine challenges. However, these methods have drawbacks. WeChat authorization diverts most user traffic into WeChat and still cannot block all script attacks. Human-machine challenges and SMS verification codes are costly, degrade the user experience, and cannot stop many attacks that employ increasingly intelligent recognition techniques. Other real-user identification techniques, such as Referer header monitoring and IP address range checks, can also be bypassed by scripts that simulate them.

Disclosure of Invention

In order to improve the identification accuracy of the real user, the application provides a method for identifying the real user based on comprehensive characteristics.

The technical scheme adopted by the invention for solving the problems is as follows:

The method for identifying a real user based on comprehensive characteristics comprises the following steps:

step 1, performing real-user identification based on the current network environment characteristics;

step 2, performing real-user identification based on historical behavior;

step 3, performing real-user identification based on the user behavior flow path;

and step 4, determining that a user is a real user when the user is judged to be a real user in each of steps 1 to 3.
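The step-4 decision rule can be sketched as a plain conjunction. This is a minimal illustration, assuming each of steps 1 to 3 has already reduced its judgment to a boolean; the `Verdicts` and `is_real_user` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Verdicts:
    """Outcome of each independent check for one request (True = judged real)."""
    network_env: bool  # step 1: current network environment characteristics
    history: bool      # step 2: historical behavior
    flow_path: bool    # step 3: user behavior flow path

def is_real_user(v: Verdicts) -> bool:
    """Step 4: the user is real only if all three checks judged it real."""
    return v.network_env and v.history and v.flow_path

print(is_real_user(Verdicts(True, True, True)))   # True
print(is_real_user(Verdicts(True, False, True)))  # False
```

Because the rule is a conjunction, a single failed check is enough to reject a request, which lets each check specialize in one kind of malicious user.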

Further, the current network environment characteristics comprise: the client IP, the client Referer information, the website URL information, and the client UA (User-Agent) information.

Further, the specific steps of step 1 are:

step 11, extracting the current network environment characteristics from the HTTP request sent by the user;

step 12, generalizing the current network environment characteristics;

step 13, dividing the current network request environments into a plurality of classes by similarity calculation and clustering;

step 14, mapping the characteristic values of actual real users onto the classes to serve as a characteristic template of real users in the current network environment;

and step 15, identifying and judging real users by using the characteristic template.
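Step 11 only says that the features come from the user's HTTP request. As a hedged sketch, assuming a Python WSGI application (PEP 3333), where request headers appear in the `environ` dict as `HTTP_*` keys and the peer address as the CGI variable `REMOTE_ADDR`, the four features could be collected as follows; `EnvFeatures` and `extract_features` are hypothetical names.

```python
from typing import Mapping, NamedTuple

class EnvFeatures(NamedTuple):
    client_ip: str
    referer: str
    url: str
    user_agent: str

def extract_features(environ: Mapping[str, str]) -> EnvFeatures:
    """Step 11: collect the four current-network features from a WSGI environ.

    Assumes the app is reached directly, so REMOTE_ADDR is the client IP; behind
    a reverse proxy, the client IP would have to come from a trusted forwarding header.
    """
    path = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    return EnvFeatures(
        client_ip=environ.get("REMOTE_ADDR", ""),
        referer=environ.get("HTTP_REFERER", ""),        # HTTP "Referer" header
        url=path + ("?" + query if query else ""),
        user_agent=environ.get("HTTP_USER_AGENT", ""),  # HTTP "User-Agent" header
    )
```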

Further, the generalization rules in step 12 are: the last two octets of the client IP are generalized; the domain name is extracted from the client Referer information; the website URL information is replaced with special characters; and only the browser information is retained from the client UA information.
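The generalization rules of step 12 are named but not specified in detail. The sketch below is one possible reading, assuming IPv4 addresses, Python's standard `re` and `urllib.parse` modules, and hypothetical class symbols (`#` digits, `@` letters, `$` long hex strings, `%` other alphanumerics, `&` base64-like tokens). The browser list and all helper names are illustrative, not the patented implementation.

```python
import re
from urllib.parse import urlsplit

# Hypothetical class symbols; the description only says "special characters".
_TOKEN = re.compile(r"[A-Za-z0-9+=_-]+")
_HEX = re.compile(r"[0-9a-f]{8,}")
_ALNUM = re.compile(r"[A-Za-z0-9]+")
_BROWSERS = ("Edg", "MicroMessenger", "Firefox", "Chrome", "Safari")  # checked in order

def generalize_ip(ip: str) -> str:
    """Keep the first two IPv4 octets and mask the last two, e.g. 'aaa.bbb.@'."""
    parts = ip.split(".")
    return f"{parts[0]}.{parts[1]}.@" if len(parts) == 4 else "@"

def generalize_referer(referer: str) -> str:
    """Domain extraction: keep only the host name of the Referer URL."""
    return urlsplit(referer).hostname or ""

def _mask(token: str) -> str:
    if token.isdigit():
        return "#"   # numeric part
    if token.isalpha():
        return "@"   # alphabetic part
    if _HEX.fullmatch(token):
        return "$"   # long lowercase hex string (ids, hashes)
    if _ALNUM.fullmatch(token):
        return "%"   # other alphanumeric mix
    return "&"       # contains + = _ or -: treated as base64-like

def generalize_url(url: str) -> str:
    """Replace each path token with a class symbol, keeping '/' separators."""
    return _TOKEN.sub(lambda m: _mask(m.group()), urlsplit(url).path)

def generalize_ua(ua: str) -> str:
    """Retain only the browser family from the User-Agent string."""
    for name in _BROWSERS:
        if name + "/" in ua:
            return name
    return "@"

print(generalize_url("https://m.thecover.cn/news/12345/5f3a9c2e1b"))  # /@/#/$
```

Separators are left untouched so that the structure of each value survives while concrete values collapse into a small number of classes, which is what makes the later similarity calculation meaningful. Chrome is checked before Safari because Chrome User-Agent strings also contain a `Safari/` token.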

Further, the step 2 comprises:

step 21, acquiring all user behavior data within a period before the current activity time;

step 22, performing statistical analysis on the user behavior data and attaching corresponding labels to each user's activity-related behavior;

step 23, using the behavior data of known real users and malicious users as a training set to obtain a mapping between the labels and real users, and using the mapping as a real-user template;

and step 24, identifying and judging real users according to the real-user template.

Further, the step 3 comprises:

step 31, presetting, for the activity flow, the access points that a user's behavior must pass through, together with the expected access time intervals and access modes;

step 32, recording in real time the access points, access time intervals, and access modes that the user passes through;

and step 33, judging whether the access points, access time intervals, and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user; otherwise, the user is determined to be a malicious user.

Further, in step 33, when the access time interval and access mode of an access point recorded in step 32 match those preset in step 31, that access point is lit (marked as passed); when all access points are lit, the user is determined to be a real user.

Compared with the prior art, the invention has the following beneficial effects: real users are identified from three aspects, namely the network environment, historical behavior, and the real-time path, which greatly improves identification accuracy and effectively defends against script-simulated attacks and similar attacks; moreover, the user does not need to perform additional operations such as human-machine challenges, which improves the user experience.

Drawings

Fig. 1 is a flowchart of the method for identifying a real user based on comprehensive characteristics.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

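As shown in Fig. 1, the method for identifying a real user based on comprehensive characteristics includes:

step 1, performing real-user identification based on the current network environment characteristics; the current network environment characteristics include the client IP, client Referer information, website URL information, client UA information, and the like;

step 2, performing real-user identification based on historical behavior;

step 3, performing real-user identification based on the user behavior flow path;

and step 4, determining that a user is a real user when the user is judged to be a real user in each of steps 1 to 3.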

Specifically, step 1 comprises the following steps:

step 11, extracting the current network environment characteristics from the HTTP request sent by the user;

step 12, generalizing the current network environment characteristics: the numeric parts, alphabetic parts, alphanumeric parts, hexadecimal parts, and base64 parts of the HTTP traffic are each replaced with a special character. This removes the differences between individual values while retaining the structural characteristics of the information, which facilitates identification and judgment. In this embodiment, the generalization rules are as follows: the last two octets of the IP are generalized, giving the format aaa.bbb.@; the domain name is extracted from the client Referer information, and a common Referer format such as https://m.thecover.cn/xxx is generalized into a type format such as # ############# - ## - #; the URL is replaced with special characters, generalized as @/####; and the UA retains information such as the browser, generalized as @ @ @. Other generalization formats may also be employed without limitation.

step 13, dividing the current network request environments into a plurality of classes by similarity calculation and clustering: after generalization, specific feature items such as the IP and URL are obtained, and their feature values differ from request to request. Similarity calculation and clustering are applied to these feature values to divide the current network request environments into a plurality of classes. For example, a request whose IP matches rule 1, whose UA matches rule 2, and whose URL matches rule 3 is assigned to class 1, and further classes are obtained by permuting and combining the different values. Similarity calculation and clustering are prior art and are not described further here;

step 14, mapping the characteristic values of actual real users and actual malicious users onto the classes to serve as the characteristic templates of real users and malicious users in the current network environment: known real users and known malicious users are used as the label set and combined with the current network classes obtained in step 13 to determine whether each class corresponds to real users or malicious users, yielding a characteristic template from which, given the feature values of a request, it can be inferred whether the request comes from a real user;

and step 15, identifying and judging real users and malicious users by using the characteristic template.
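Steps 13 to 15 can be illustrated end to end. The description treats similarity calculation and clustering as prior art without naming an algorithm, so the sketch below swaps in a simple single-pass "leader" clustering over concatenated generalized feature strings, scored with the standard library's `difflib.SequenceMatcher.ratio()`, and then builds the template by majority vote over known real and malicious users. The 0.8 threshold, the `|`-joined key format, the majority-vote rule, and all function names are assumptions.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

def assign(key: str, leaders: Sequence[str], threshold: float = 0.8) -> int:
    """Index of the first cluster leader at least `threshold` similar to `key`, else -1."""
    for cid, leader in enumerate(leaders):
        if SequenceMatcher(None, key, leader).ratio() >= threshold:
            return cid
    return -1

def leader_cluster(keys: Sequence[str], threshold: float = 0.8) -> Tuple[List[int], List[str]]:
    """Step 13 (sketch): single-pass leader clustering of generalized feature strings."""
    leaders: List[str] = []
    class_ids: List[int] = []
    for key in keys:
        cid = assign(key, leaders, threshold)
        if cid < 0:  # no similar leader: this key founds a new class
            leaders.append(key)
            cid = len(leaders) - 1
        class_ids.append(cid)
    return class_ids, leaders

def build_template(class_ids: Sequence[int], user_types: Sequence[str]) -> Dict[int, str]:
    """Step 14: map each class to the majority type ("real"/"malicious") of its known users."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for cid, user_type in zip(class_ids, user_types):
        votes[cid][user_type] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def judge(template: Dict[int, str], leaders: Sequence[str], key: str) -> str:
    """Step 15: assign a new request to a class and read its type off the template."""
    return template.get(assign(key, leaders), "unknown")

keys = [
    "110.53.@|m.thecover.cn|/@/#|Chrome",
    "110.53.@|m.thecover.cn|/@/#|Chrome",
    "203.0.@|yangmao.example|/$|@",
]
class_ids, leaders = leader_cluster(keys)
template = build_template(class_ids, ["real", "real", "malicious"])
print(judge(template, leaders, "203.0.@|yangmao.example|/$|@"))  # malicious
```

Leader clustering is order-dependent and compares each request only against cluster leaders; it is chosen here for brevity, and any standard clustering method could take its place.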

The step 2 comprises the following steps:

step 21, acquiring all detailed user behavior data within a period before the current activity time; the period is typically 5 minutes;

step 22, performing statistical analysis on the data and attaching quasi-real-time labels to the user's activity, such as the user's registration time, operation frequency, and time of entering the activity;

step 23, using the behavior of known real users and malicious users as a training set to obtain a mapping between combinations of quasi-real-time labels and whether the user is a real user, thereby obtaining a set of templates for judging from the quasi-real-time label values whether a user is a real user. For example, analysis and mining of historical data show that a user whose registration time is close to the activity time, whose operation frequency is high, and whose operation path enters the activity directly is very likely a bonus hunter; this combination of profile feature values can be used as a template for subsequently judging user requests;

and step 24, judging for each user request, by means of the templates, whether the user is a real user.
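Steps 21 and 22 can be sketched as a windowed aggregation, assuming behavior logs are available as (user, timestamp, page) events and each user's registration time is known. The sketch uses the 5-minute window from step 21 and computes three of the labels the description names (registration time relative to the activity start, operation frequency, and whether the path enters the activity directly). `Event`, `window_labels`, the page name `"activity"`, and the label keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class Event:
    user_id: str
    ts: datetime
    page: str  # hypothetical page identifier, e.g. "home" or "activity"

def window_labels(
    events: Iterable[Event],
    registered_at: Dict[str, datetime],
    activity_start: datetime,
    now: datetime,
    window: timedelta = timedelta(minutes=5),
) -> Dict[str, dict]:
    """Steps 21-22: per-user quasi-real-time labels over the window [now - window, now]."""
    by_user: Dict[str, List[Event]] = {}
    for e in sorted(events, key=lambda e: e.ts):
        if now - window <= e.ts <= now:  # step 21: keep only events inside the window
            by_user.setdefault(e.user_id, []).append(e)
    minutes = window.total_seconds() / 60
    return {
        uid: {
            # minutes between registration and activity start (small = suspicious)
            "reg_gap_min": (activity_start - registered_at[uid]).total_seconds() / 60,
            # operations per minute within the window
            "ops_per_min": len(evs) / minutes,
            # whether the first operation in the window lands directly on the activity page
            "direct_entry": evs[0].page == "activity",
        }
        for uid, evs in by_user.items()
    }
```

Recomputing the labels over a short sliding window keeps them "quasi-real-time": a burst of operations shows up within minutes, while events older than the window no longer influence the judgment.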

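The step 3 comprises the following steps:

step 31, presetting, for the activity flow, the access points that a user's behavior must pass through, together with the expected access time intervals and access modes;

step 32, recording in real time the access points, access time intervals, and access modes that the user passes through;

step 33, judging whether the access points, access time intervals, and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user; otherwise, the user is determined to be a malicious user.

Preferably, when the access time interval and access mode of an access point recorded in step 32 match those preset in step 31, that access point is lit; when all access points are lit, the user is determined to be a real user.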
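The lighting scheme of steps 31 to 33 can be sketched as a small state machine. This is an illustration only, assuming that waypoints must be lit in their preset order, that each waypoint name is unique, and that an access which does not match the next expected waypoint is simply ignored; `Waypoint`, `PathTracker`, the mode strings, and the time bounds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass(frozen=True)
class Waypoint:
    """Step 31: a preset access point with its expected access mode and time gap."""
    name: str
    mode: str         # expected access mode, e.g. "launch" or "click" (hypothetical)
    min_gap_s: float  # allowed seconds since the previously lit waypoint
    max_gap_s: float

@dataclass
class PathTracker:
    waypoints: List[Waypoint]
    lit: Set[str] = field(default_factory=set)
    last_ts: Optional[float] = field(default=None, init=False)
    next_idx: int = field(default=0, init=False)

    def record(self, name: str, mode: str, ts: float) -> None:
        """Step 32: record one access and light the next waypoint if it matches."""
        if self.next_idx >= len(self.waypoints):
            return
        wp = self.waypoints[self.next_idx]
        gap_ok = self.last_ts is None or wp.min_gap_s <= ts - self.last_ts <= wp.max_gap_s
        if name == wp.name and mode == wp.mode and gap_ok:
            self.lit.add(wp.name)
            self.next_idx += 1
            self.last_ts = ts

    def is_real_user(self) -> bool:
        """Step 33: real only when every preset waypoint has been lit."""
        return len(self.lit) == len(self.waypoints)

INF = float("inf")
# Hypothetical path modeled on the example: open client, activity channel,
# pull down channel list, activity details, participation button.
PATH = [
    Waypoint("open_client", "launch", 0, INF),
    Waypoint("activity_channel", "click", 0.5, 30),
    Waypoint("channel_list", "pull_down", 0.5, 60),
    Waypoint("activity_detail", "click", 0.5, 60),
    Waypoint("join_button", "click", 0.5, 120),
]
```

A script that calls the participation endpoint directly never lights the earlier waypoints, and one that replays the path with machine-speed gaps fails the minimum-interval check, so in both cases `is_real_user()` returns False.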

Examples

An activity is about to be launched that will attract real users, bonus-hunter users, and script-brushing users. Bonus-hunter ("wool party") users mainly refer to large numbers of user accounts registered in batches before the activity starts; script-brushing accounts refer to accounts whose operators have studied the client or the activity webpage and can simulate a given user's requests.

First, real-user identification is performed on the current network environment characteristics. Various information about the current network is collected, which in this embodiment mainly includes the IP address, Referer information, UA information, and the like. To facilitate later identification and judgment, the collected current network information is concatenated in sequence, the concatenated features are generalized, and the current network request environments are divided into a plurality of classes by similarity calculation and clustering, for example, class 1: the IP address range is A.A.110.0-A.A.110.225 and the Referer contains the word "yangmao"; class 2: the IP address range is A.B.110.0-A.B.110.225 and the Referer contains the word "cover"; and so on.

Analysis of existing data shows that the order-brushing script manifests as a huge request volume from the A.A.110.0-A.A.110.225 address range with Referer information containing the word "yangmao". Comparison and classification then yield template conditions such as: class 1 is malicious users, class 2 is real users, class 3 is script-brushing users, and so on.

For each user request, whether the user is a real user can then be judged by means of the template.
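The example's classes combine an IP address range with a word in the Referer. A minimal sketch of how such class rules and the resulting template could be applied per request, using Python's standard `ipaddress` module, is shown below. Because the example anonymizes the ranges as "A.A.110.x" and "A.B.110.x", the code substitutes the documentation ranges 203.0.113.x and 198.51.100.x (RFC 5737); the rule table, the template, and the function names are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical rules mirroring the example's shape:
# (class id, first address, last address, word the Referer must contain)
CLASS_RULES = [
    (1, "203.0.113.0", "203.0.113.225", "yangmao"),
    (2, "198.51.100.0", "198.51.100.225", "cover"),
]
TEMPLATE = {1: "malicious", 2: "real"}  # class -> user type, learned offline

def classify(ip: str, referer: str) -> Optional[int]:
    """Return the first class whose IP range and Referer word both match, else None."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:  # the sketch's rules are IPv4 only
        return None
    for class_id, first, last, word in CLASS_RULES:
        in_range = ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)
        if in_range and word in referer:
            return class_id
    return None

def judge_request(ip: str, referer: str) -> str:
    cid = classify(ip, referer)
    return TEMPLATE.get(cid, "unknown") if cid is not None else "unknown"
```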

Next, users are judged using historical behavior analysis. First, a time window is determined, generally 5 minutes, and all labels of each user are calculated over all user requests in the window, mainly including registration time, operation frequency, operation path, and the like. Data mining on historical data then yields the correspondence between combinations of label values and user types. For example, users whose registration time is close to the activity start time, whose operation frequency is high, and whose operation path enters the activity page directly are generally bonus hunters; this label value combination is recorded as template 1, and users matching it are malicious users. The user types corresponding to the other label value combinations are calculated in turn.

When a user request arrives, the statistics of the labels corresponding to the user's behavior within the time window are compared with the templates to determine whether the user is a real user.
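Steps 23 and 24, and the template-1 example, can be sketched as learning from known real and malicious users which combination of discretized labels usually belongs to which user type, then looking up each new request's combination. The input is a label dictionary shaped like `{"reg_gap_min": ..., "ops_per_min": ..., "direct_entry": ...}`. The thresholds (60 minutes for a "recent" registration, 1 operation per minute for "high" frequency), the majority-vote rule, and treating unseen combinations as real by default (leaving the other two checks to catch them) are assumptions, not values from the description.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Combo = Tuple[bool, bool, bool]

def discretize(labels: dict, recent_reg_min: float = 60.0, high_ops_per_min: float = 1.0) -> Combo:
    """Turn raw labels into a hashable label-value combination (thresholds are assumptions)."""
    return (
        labels["reg_gap_min"] <= recent_reg_min,     # registered shortly before the activity
        labels["ops_per_min"] >= high_ops_per_min,  # high operation frequency
        bool(labels["direct_entry"]),               # path goes straight to the activity
    )

def learn_templates(training: List[Tuple[dict, bool]]) -> Dict[Combo, bool]:
    """Step 23: map each label combination to the majority verdict (True = real user)."""
    votes: Dict[Combo, Counter] = defaultdict(Counter)
    for labels, is_real in training:
        votes[discretize(labels)][is_real] += 1
    return {combo: counts.most_common(1)[0][0] for combo, counts in votes.items()}

def judge(templates: Dict[Combo, bool], labels: dict, default: bool = True) -> bool:
    """Step 24: look up a request's combination; unseen combinations use `default`."""
    return templates.get(discretize(labels), default)
```

With these thresholds, the template-1 profile from the example (recent registration, high frequency, direct entry) discretizes to `(True, True, True)`, and that combination is judged malicious once the training set says so.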

Real-user identification is then performed on the user behavior flow path. For example, in this activity, the normal entry path includes opening the client, clicking the activity channel, pulling down the channel list, entering the activity details, and clicking the participation button. These waypoints are set as points to be lit, together with the time intervals expected between them, and so on.

After the user's request enters, a lighting operation is executed each time the user passes one of the waypoints set above. When the user performs the specific activity operation, it is judged whether the user has lit all the waypoints; if not, the user is very likely using an order-brushing script and is identified as a malicious user.

It should be noted that the creation of specific identification templates may be adjusted according to actual needs, and the application does not limit it.

These three modes identify real users from three aspects, namely the network environment, historical behavior, and the real-time path: analysis of the network environment and the real-time path can identify order-brushing scripts, while analysis of historical behavior can identify bonus hunters. By applying different identification modes to different kinds of malicious users, identification accuracy is greatly improved, and script-simulated attacks and similar attacks can be effectively defended against; moreover, the user does not need to perform additional authentication operations such as human-machine challenges, which improves the user experience.

Claims

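1. A method for identifying a real user based on comprehensive characteristics, characterized by comprising the following steps:

step 1, performing real-user identification based on the current network environment characteristics;

step 2, performing real-user identification based on historical behavior;

step 3, performing real-user identification based on the user behavior flow path;

and step 4, determining that a user is a real user when the user is judged to be a real user in each of steps 1 to 3.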

2. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein the current network environment characteristics comprise: the client IP, the client Referer information, the website URL information, and the client UA information.

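3. The method for identifying a real user based on comprehensive characteristics of claim 2, wherein the specific steps of step 1 are as follows:

step 11, extracting the current network environment characteristics from the HTTP request sent by the user;

step 12, generalizing the current network environment characteristics;

step 13, dividing the current network request environments into a plurality of classes by similarity calculation and clustering;

step 14, mapping the characteristic values of actual real users onto the classes to serve as a characteristic template of real users in the current network environment;

and step 15, identifying and judging real users by using the characteristic template.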

4. The method for identifying a real user based on comprehensive characteristics of claim 3, wherein the generalization rules in step 12 are: the last two octets of the client IP are generalized; the domain name is extracted from the client Referer information; the website URL information is replaced with special characters; and only the browser information is retained from the client UA information.

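5. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein step 2 comprises:

step 21, acquiring all user behavior data within a period before the current activity time;

step 22, performing statistical analysis on the user behavior data and attaching corresponding labels to each user's activity-related behavior;

step 23, using the behavior data of known real users and malicious users as a training set to obtain a mapping between the labels and real users, and using the mapping as a real-user template;

and step 24, identifying and judging real users according to the real-user template.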

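6. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein step 3 comprises:

step 31, presetting, for the activity flow, the access points that a user's behavior must pass through, together with the expected access time intervals and access modes;

step 32, recording in real time the access points, access time intervals, and access modes that the user passes through;

and step 33, judging whether the access points, access time intervals, and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user; otherwise, the user is determined to be a malicious user.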

7. The method of claim 6, wherein in step 33, when the access time interval and access mode of an access point recorded in step 32 match those preset in step 31, that access point is lit; when all access points are lit, the user is determined to be a real user.