CN114885006A - Method for identifying real user based on comprehensive characteristics - Google Patents

Info

Publication number
CN114885006A
Authority
CN
China
Prior art keywords
user
real
users
identifying
real user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210507585.0A
Other languages
Chinese (zh)
Inventor
王子健
徐桢虎
戴金良
王磊
李华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Cover Media Technology Co ltd
Original Assignee
Sichuan Cover Media Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Cover Media Technology Co ltd
Priority to CN202210507585.0A
Publication of CN114885006A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to the technical field of user identification and, in order to improve the accuracy of real-user identification, provides a method for identifying a real user based on comprehensive characteristics, comprising: step 1, performing real-user identification on the current network environment features; step 2, performing real-user identification on historical behavior; step 3, performing real-user identification on the user's behavior flow path; and step 4, determining that the user is a real user when steps 1 to 3 all judge the user to be a real user. The method greatly improves identification accuracy and provides effective defense against script simulation attacks and similar methods.

Description

Method for identifying real user based on comprehensive characteristics
Technical Field
The invention relates to the technical field of user identification, in particular to a method for identifying a real user based on comprehensive characteristics.
Background
User count and online user count are core metrics for today's internet companies and products. The most direct way for an enterprise to raise its daily user count is usually to run marketing activities. However, with the rise of "wool party" users (bonus hunters who exploit promotions) and of scripting techniques such as order brushing, non-real users can take most of the rewards of a marketing activity within its first few seconds. Identifying the real users in an activity therefore helps reduce the company's losses and lets its operational activities yield the most benefit.
Currently, common defenses against wool-party abuse are WeChat verification and man-machine challenges such as SMS verification codes. These approaches have drawbacks. WeChat verification diverts most user traffic into WeChat and still cannot block all script attacks. Man-machine challenges and SMS verification codes are costly, degrade the user experience, and cannot stop many attacks that add intelligent recognition techniques. Other techniques for identifying real users, such as Referer header monitoring and IP address range checks, can also be bypassed by scripts that simulate the expected values.
Disclosure of Invention
In order to improve the identification accuracy of the real user, the application provides a method for identifying the real user based on comprehensive characteristics.
The technical scheme adopted by the invention for solving the problems is as follows:
The method for identifying a real user based on comprehensive characteristics comprises the following steps:
step 1, performing real-user identification on the current network environment features;
step 2, performing real-user identification on historical behavior;
step 3, performing real-user identification on the user's behavior flow path;
step 4, determining that the user is a real user when steps 1 to 3 all judge the user to be a real user.
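The combination rule in step 4 can be sketched as a simple conjunction of the three verdicts. This is a minimal illustration only; the names `Verdicts` and `is_real_user` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Verdicts:
    """Results of steps 1 to 3 for one user (hypothetical container)."""
    network_env_ok: bool   # step 1: current network environment features
    history_ok: bool       # step 2: historical behavior
    flow_path_ok: bool     # step 3: behavior flow path


def is_real_user(v: Verdicts) -> bool:
    # Step 4: the user counts as real only if every check says "real".
    return v.network_env_ok and v.history_ok and v.flow_path_ok
```

A single failed check is enough to reject the user, which matches the requirement that steps 1 to 3 all agree.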
Further, the current network environment features comprise: client IP, client Referer information, website url information and client UA information.
Further, the specific steps of step 1 are:
step 11, extracting the current network environment features from the HTTP traffic of the user request;
step 12, generalizing the current network environment features;
step 13, dividing the current network request environments into a plurality of classes using similarity calculation and clustering;
step 14, mapping the feature values of known real users onto the classes to obtain a feature template of real users in the current network environment;
step 15, identifying real users using the feature template.
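As an illustration of step 11, the four features can be read from the connection address, the request line and the request headers. The sketch below assumes the request is already available as a request line plus a header dictionary; `EnvFeatures` and `extract_features` are hypothetical names. In HTTP the header carrying the referring page is spelled `Referer`.

```python
from typing import Dict, NamedTuple


class EnvFeatures(NamedTuple):
    client_ip: str
    referer: str
    url: str
    user_agent: str


def extract_features(remote_addr: str, request_line: str,
                     headers: Dict[str, str]) -> EnvFeatures:
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # A request line looks like "GET /path HTTP/1.1"; the target is field 2.
    parts = request_line.split()
    url = parts[1] if len(parts) >= 2 else ""
    return EnvFeatures(
        client_ip=remote_addr,
        referer=lowered.get("referer", ""),
        url=url,
        user_agent=lowered.get("user-agent", ""),
    )
```

Missing headers yield empty strings, so later steps can treat an absent Referer or UA as a feature value of its own.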
Further, the generalization rule in step 12 is: the last two segments of the client IP are generalized, the domain name is extracted from the client Referer information, the website url information is replaced with special characters, and the browser information is retained from the client UA information.
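The four generalization rules can be sketched as follows. The placeholder characters (`@` for letter runs and IP segments, `#` for digit runs) and the browser list are assumptions chosen for illustration; the application allows other generalized formats.

```python
import re
from urllib.parse import urlsplit


def generalize_ip(ip: str) -> str:
    """Keep the first two segments of an IPv4 address, mask the last two."""
    parts = ip.split(".")
    if len(parts) != 4:
        return "@"
    return ".".join(parts[:2] + ["@", "@"])


def referer_domain(referer: str) -> str:
    """Extract only the host name from a Referer value."""
    return urlsplit(referer).hostname or ""


def generalize_url(url: str) -> str:
    """Replace digit runs with '#' and letter runs with '@' in the path."""
    path = urlsplit(url).path
    path = re.sub(r"[0-9]+", "#", path)
    return re.sub(r"[A-Za-z]+", "@", path)


def browser_family(user_agent: str) -> str:
    """Keep only a coarse browser name (assumed list, checked in order)."""
    # Order matters: Edge UAs also contain "Chrome", Chrome UAs "Safari".
    for name in ("Edg", "Chrome", "Firefox", "Safari"):
        if name in user_agent:
            return name
    return "other"
```

After generalization, two requests that differ only in concrete numbers or identifiers produce the same structural signature, which is what makes the later similarity comparison meaningful.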
Further, step 2 comprises:
step 21, acquiring all user behavior data in the period preceding the current activity time;
step 22, performing statistical analysis on the user behavior data and attaching corresponding labels to each user's behavior toward the activity;
step 23, using the behavior data of known real users and malicious users as a training set to obtain a mapping between labels and real users, and using the mapping as a real-user template;
step 24, identifying real users according to the real-user template.
Further, step 3 comprises:
step 31, presetting, for the activity flow, the access points that user behavior must pass through, together with the access time interval and access mode of each;
step 32, recording in real time the access points the user passes through, with their access time intervals and access modes;
step 33, judging whether the access points, access time intervals and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user, otherwise a malicious user.
Further, in step 33, when the access time interval and access mode recorded for an access point in step 32 match those preset in step 31, that access point is marked as lit; when all access points are lit, the user is determined to be a real user.
Compared with the prior art, the invention has the following beneficial effects: real users are identified from three aspects, namely network environment, historical behavior and real-time path, which greatly improves identification accuracy and provides effective defense against script simulation attacks and similar methods; and the user does not need to perform additional operations such as man-machine challenges, which improves the user experience.
Drawings
Fig. 1 is a flowchart of the method for identifying a real user based on comprehensive characteristics.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in Fig. 1, the method for identifying a real user based on comprehensive characteristics comprises:
step 1, performing real-user identification on the current network environment features; the current network environment features comprise client IP, client Referer information, website url information, client UA information and the like;
step 2, performing real-user identification on historical behavior;
step 3, performing real-user identification on the user's behavior flow path;
step 4, determining that the user is a real user when steps 1 to 3 all judge the user to be a real user.
Specifically, the steps of step 1 are as follows:
step 11, extracting the current network environment features from the HTTP traffic of the user request;
step 12, generalizing the current network environment features: the numeric parts, letter parts, mixed alphanumeric parts, hexadecimal parts and base64 parts of the HTTP traffic are each replaced with special characters. Generalization removes the differences between individual values while retaining the structure of the information, which makes identification easier. In this embodiment the generalization rules are as follows: the last two segments of the IP are generalized, giving the format aaa.bbb. @; the domain name is extracted from the client Referer information, and a common Referer format such as https://m.thecover.cn/xxx is generalized into a format of the type # ############# - ## - #; the url is replaced with special characters and generalized as @/####; the UA retains information such as the browser, with the generalized format @ @ @. Other generalized formats may also be used, without limitation.
step 13, dividing the current network request environments into a plurality of classes using similarity calculation and clustering: after generalization, specific feature items such as IP and url are obtained, and the feature values of these items differ from request to request. Similarity calculation and clustering are applied to the feature values to divide the current network request environments into classes. For example, if the IP matches rule 1, the UA matches rule 2 and the url matches rule 3, the request is assigned to class 1; further classes are obtained from the other combinations of values. Similarity calculation and clustering are prior art and are not described further here;
step 14, mapping the feature values of known real users and known malicious users onto the classes to obtain feature templates of real users and malicious users in the current network environment: known real users and known malicious users serve as the labeled set and are combined with the classes obtained in step 13 to determine whether each class corresponds to real users or malicious users. The result is a feature template from which, given the feature values, it can be inferred whether a request comes from a real user;
step 15, identifying real users and malicious users using the feature template.
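Steps 13 and 14 can be sketched as follows. The application leaves the similarity measure and clustering algorithm open, so this example swaps in stand-ins: string similarity from Python's `difflib.SequenceMatcher`, a greedy single-pass clustering, and a majority vote over labeled samples. The threshold of 0.8 and the `|`-joined signature format are assumptions.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def cluster(signatures: List[str],
            threshold: float = 0.8) -> Tuple[List[str], List[int]]:
    """Greedy clustering: each signature joins the first cluster whose
    representative is at least `threshold` similar, else starts a new one."""
    representatives: List[str] = []
    assignment: List[int] = []
    for sig in signatures:
        for idx, rep in enumerate(representatives):
            if SequenceMatcher(None, sig, rep).ratio() >= threshold:
                assignment.append(idx)
                break
        else:
            representatives.append(sig)
            assignment.append(len(representatives) - 1)
    return representatives, assignment


def build_template(assignment: List[int], labels: List[str]) -> Dict[int, str]:
    """Map each cluster id to the most common known label among its members."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for cluster_id, label in zip(assignment, labels):
        votes[cluster_id][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
```

For step 15, a new request would be generalized, assigned to the most similar representative, and judged by that cluster's label in the template.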
Step 2 comprises the following steps:
step 21, acquiring all detailed user behavior data in the period preceding the current activity time; the period is typically 5 minutes;
step 22, performing statistical analysis on the data and attaching quasi-real-time labels to each user's behavior toward the activity, such as the user's registration time, operation frequency and time of entering the activity;
step 23, using the behavior of known real users and malicious users as a training set to obtain a mapping between combinations of quasi-real-time labels and whether the user is a real user, thereby obtaining a set of templates that judge from the quasi-real-time label values whether a user is a real user. For example, analysis and mining of historical data may show that a user profile whose registration time is close to the current activity time, whose operation frequency is high and whose operation path enters the activity directly very probably belongs to a wool party user; this combination of profile feature values can then serve as a template for judging later user requests;
step 24, judging, for each user request, whether the user is a real user by means of the templates.
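Steps 22 to 24 can be sketched as a lookup table learned from labeled history. The bucket thresholds (60 minutes, 30 operations per minute), the tag names and the default verdict are hypothetical; the application does not fix the training method, so a simple majority vote per tag combination is used here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Features = Tuple[float, float, bool]  # (minutes since registration, ops/min, direct entry)
Combo = Tuple[str, str, str]


def tag_combo(minutes_since_registration: float, ops_per_minute: float,
              entered_directly: bool) -> Combo:
    """Turn raw statistics into a combination of coarse labels (step 22)."""
    return (
        "new_account" if minutes_since_registration < 60 else "old_account",
        "high_freq" if ops_per_minute > 30 else "normal_freq",
        "direct_entry" if entered_directly else "normal_entry",
    )


def learn_templates(rows: Iterable[Tuple[Features, str]]) -> Dict[Combo, str]:
    """Learn label-combination -> user-type from known users (step 23)."""
    votes: Dict[Combo, Counter] = defaultdict(Counter)
    for features, user_type in rows:
        votes[tag_combo(*features)][user_type] += 1
    return {combo: c.most_common(1)[0][0] for combo, c in votes.items()}


def judge(templates: Dict[Combo, str], features: Features,
          default: str = "real") -> str:
    """Judge a new request by its label combination (step 24)."""
    return templates.get(tag_combo(*features), default)
```

Treating unseen combinations as "real" is one possible policy; a stricter deployment might route them to a further check instead.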
Step 3 comprises the following steps:
step 31, presetting, for the activity flow, the access points that user behavior must pass through, together with the access time interval and access mode of each;
step 32, recording in real time the access points the user passes through, with their access time intervals and access modes;
step 33, judging whether the access points, access time intervals and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user, otherwise a malicious user.
Preferably, when the access time interval and access mode recorded for an access point in step 32 match those preset in step 31, that access point is marked as lit; when all access points are lit, the user is determined to be a real user.
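The lighting check of steps 31 to 33 can be sketched as below. The `Waypoint` and `Visit` structures, the interpretation of the access time interval as an allowed range of seconds since the previous recorded visit, and the mode strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Waypoint:
    name: str
    min_gap_s: float  # shortest allowed time since the previous visit
    max_gap_s: float  # longest allowed time since the previous visit
    mode: str         # expected access mode, e.g. "click"


@dataclass(frozen=True)
class Visit:
    name: str
    timestamp: float
    mode: str


def lit_waypoints(preset: List[Waypoint], visits: List[Visit]) -> Set[str]:
    """Return the names of preset waypoints that the visits light up."""
    by_name = {w.name: w for w in preset}
    lit: Set[str] = set()
    prev_time = None
    for visit in sorted(visits, key=lambda v: v.timestamp):
        waypoint = by_name.get(visit.name)
        if waypoint is None:
            continue  # not a preset access point; ignore it
        # The first recorded visit has no predecessor, so only its mode counts.
        gap_ok = prev_time is None or (
            waypoint.min_gap_s <= visit.timestamp - prev_time <= waypoint.max_gap_s
        )
        if gap_ok and visit.mode == waypoint.mode:
            lit.add(waypoint.name)
        prev_time = visit.timestamp
    return lit


def passes_flow_check(preset: List[Waypoint], visits: List[Visit]) -> bool:
    # The user is judged real only when every preset waypoint is lit.
    return lit_waypoints(preset, visits) == {w.name for w in preset}
```

A script that jumps straight to the final request, or that replays the path with implausibly short gaps, leaves at least one waypoint unlit and fails the check.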
Examples
Suppose an activity is about to be launched that will attract real users, wool party users and order-brushing script users. Wool party users are mainly large numbers of accounts registered in batches before the activity starts. Order-brushing script accounts are accounts driven by scripts that were written after studying the client or the activity web page and that can simulate certain user requests.
First, real-user identification is performed on the current network environment features. Various information about the current network is collected; in this embodiment it mainly comprises the IP address, Referer information, UA information and so on. To ease later identification, the collected information is concatenated in a fixed order, the concatenated features are generalized, and the current network request environments are divided into classes using similarity calculation and clustering. For example, class 1: the IP address range is A.A.110.0-A.A.110.225 and the Referer contains the string yangmao; class 2: the IP address range is A.B.110.0-A.B.110.225 and the Referer contains the string cover; and so on.
Analysis of existing data shows that order-brushing scripts produce a huge number of requests from the address range A.A.110.0-A.A.110.225 with Referer information containing the string yangmao. Comparing this against the classes yields template conditions such as: class 1 is a malicious user, class 2 is a real user, class 3 is an order-brushing script user, and so on.
For each user request, the template can then be used to judge whether the user is a real user.
Next, users are judged using historical behavior analysis. A time window is first determined, generally 5 minutes, and for all user requests in the window the labels of each user are computed, mainly registration time, operation frequency, operation path and so on. Data mining on historical data yields the correspondence between combinations of label values and user types. For example, when the registration time is close to the activity start time, the operation frequency is relatively high and the operation path enters the activity page directly, the user is generally a wool party user; this combination of label values is recorded as template 1, and users matching it are malicious users. The user types corresponding to the other label value combinations are computed in turn.
When a user request arrives, the labels computed from the user's behavior in the time window are compared with the templates to determine whether the user is a real user.
Finally, real-user identification is performed on the user's behavior flow path. In this activity, for example, the normal entry path consists of opening the client, clicking the activity channel, pulling down the channel list, entering the activity details, clicking the participation button and so on. The waypoints to be lit are configured, together with the time interval expected at each, and so on.
After a user request arrives, a lighting operation is performed each time the user passes one of the configured waypoints. When the user performs the specific activity operation, it is checked whether the user has lit all waypoints; if not, the user is very probably an order-brushing script and is identified as a malicious user.
It should be noted that the creation of the specific identification templates may be adjusted according to actual needs, and the application does not limit it.
The three methods identify real users from three aspects: network environment, historical behavior and real-time path. Analysis of the network environment and the real-time path can identify order-brushing scripts, and analysis of historical behavior can identify wool parties. Using different identification methods for different kinds of malicious users greatly improves identification accuracy and provides effective defense against script simulation attacks and similar methods. In addition, the user does not need to perform additional authentication operations such as man-machine challenges, which improves the user experience.
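The class conditions in this example can be written as explicit rules. In the sketch below the placeholder prefixes A.A and A.B are replaced by the made-up private prefixes 10.10 and 10.20 so that the code runs; `CLASS_RULES` and `classify` are hypothetical names.

```python
from ipaddress import ip_address
from typing import Optional, Tuple

# (class name, first IP, last IP, substring required in Referer, verdict)
# 10.10.* and 10.20.* stand in for the A.A.* and A.B.* placeholders.
CLASS_RULES = [
    ("class 1", "10.10.110.0", "10.10.110.225", "yangmao", "malicious"),
    ("class 2", "10.20.110.0", "10.20.110.225", "cover", "real"),
]


def classify(client_ip: str, referer: str) -> Optional[Tuple[str, str]]:
    """Return (class name, verdict) for the first matching rule, else None."""
    ip = ip_address(client_ip)
    for name, first, last, needle, verdict in CLASS_RULES:
        in_range = ip_address(first) <= ip <= ip_address(last)
        if in_range and needle in referer:
            return name, verdict
    return None
```

A request that matches no rule returns `None`, leaving the decision to the other two checks or to a default policy.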
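Computing the labels for one user over the 5-minute window might look like the following sketch. The `Event` structure, the page name `"activity"` and the label keys are assumptions; only the window length and the three kinds of label come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

WINDOW_S = 300  # the 5-minute window mentioned in the text


@dataclass
class Event:
    user_id: str
    timestamp: float  # seconds
    page: str


def window_tags(events: List[Event], user_id: str, now: float,
                registered_at: float,
                activity_start: float) -> Dict[str, Union[float, bool]]:
    """Compute the labels of one user from the events inside the window."""
    recent = sorted(
        (e for e in events
         if e.user_id == user_id and now - WINDOW_S <= e.timestamp <= now),
        key=lambda e: e.timestamp,
    )
    return {
        # How close the registration was to the start of the activity.
        "registration_gap_s": abs(activity_start - registered_at),
        # Average number of operations per minute over the whole window.
        "ops_per_minute": len(recent) / (WINDOW_S / 60),
        # True if the first page seen in the window is the activity page.
        "direct_entry": bool(recent) and recent[0].page == "activity",
    }
```

The resulting dictionary is what would be compared with the templates when the user's request arrives.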

Claims (7)

1. A method for identifying a real user based on comprehensive characteristics, characterized by comprising the following steps:
step 1, performing real-user identification on the current network environment features;
step 2, performing real-user identification on historical behavior;
step 3, performing real-user identification on the user's behavior flow path;
step 4, determining that the user is a real user when steps 1 to 3 all judge the user to be a real user.
2. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein the current network environment features comprise: client IP, client Referer information, website url information and client UA information.
3. The method for identifying a real user based on comprehensive characteristics of claim 2, wherein the specific steps of step 1 are as follows:
step 11, extracting the current network environment features from the HTTP traffic of the user request;
step 12, generalizing the current network environment features;
step 13, dividing the current network request environments into a plurality of classes using similarity calculation and clustering;
step 14, mapping the feature values of known real users onto the classes to obtain a feature template of real users in the current network environment;
step 15, identifying real users using the feature template.
4. The method for identifying a real user based on comprehensive characteristics of claim 3, wherein the generalization rule in step 12 is: the last two segments of the client IP are generalized, the domain name is extracted from the client Referer information, the website url information is replaced with special characters, and the browser information is retained from the client UA information.
5. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein step 2 comprises:
step 21, acquiring all user behavior data in the period preceding the current activity time;
step 22, performing statistical analysis on the user behavior data and attaching corresponding labels to each user's behavior toward the activity;
step 23, using the behavior data of known real users and malicious users as a training set to obtain a mapping between labels and real users, and using the mapping as a real-user template;
step 24, identifying real users according to the real-user template.
6. The method for identifying a real user based on comprehensive characteristics of claim 1, wherein step 3 comprises:
step 31, presetting, for the activity flow, the access points that user behavior must pass through, together with the access time interval and access mode of each;
step 32, recording in real time the access points the user passes through, with their access time intervals and access modes;
step 33, judging whether the access points, access time intervals and access modes recorded in step 32 match those preset in step 31; if so, the user is determined to be a real user, otherwise a malicious user.
7. The method of claim 6, wherein in step 33, when the access time interval and access mode recorded for an access point in step 32 match those preset in step 31, that access point is marked as lit, and when all access points are lit, the user is determined to be a real user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210507585.0A CN114885006A (en) 2022-05-10 2022-05-10 Method for identifying real user based on comprehensive characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210507585.0A CN114885006A (en) 2022-05-10 2022-05-10 Method for identifying real user based on comprehensive characteristics

Publications (1)

Publication Number Publication Date
CN114885006A true CN114885006A (en) 2022-08-09

Family

ID=82676174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210507585.0A Pending CN114885006A (en) 2022-05-10 2022-05-10 Method for identifying real user based on comprehensive characteristics

Country Status (1)

Country Link
CN (1) CN114885006A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination