CN106657007A - Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model - Google Patents
Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model
- Publication number
- CN106657007A (application CN201611019839.5A)
- Authority
- CN
- China
- Prior art keywords
- behavior
- frequency
- user
- threshold values
- cookie
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0236—Filtering by address, protocol, port number or service, e.g. IP-address or URL
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0633—Lists, e.g. purchase orders, compilation or processing
- G06Q30/0635—Processing of requisition or of purchase orders
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention provides a method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model. After the number of registrations in a predetermined time period is found to exceed a recognition threshold derived from the number of registrations in a reference time period, at least one highly concentrated cluster of registered accounts is marked. The user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records of each marked cluster are hashed into a globally unique encoded string IP+Cookie+Agent, which forms a unique user identifier. User IPs whose online ticket booking behavior attributes are abnormal are then recognized and stored in a blacklist for isolation. By determining the various thresholds used to recognize scalpers, the method provides a data basis for judging them. Because the behavior features of users are recorded in real time, the method also provides a real-time basis for intercepting scalpers. The blacklist allows scalpers to be blocked in advance, which makes resource distribution more reasonable and fair.
Description
Technical field
The present invention relates to the field of abnormal behavior recognition, and in particular to a method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model.
Background art
Tickets for live performances are expensive and scarce, which attracts large numbers of scalpers who grab tickets in bulk (abnormal online ticket booking behavior) and resell them at inflated prices. Scalpers harm the interests of users and greatly reduce both the user experience of online ticketing and the user stickiness of the platform. To grab tickets, scalpers often register many accounts in bulk by machine and use those accounts to issue large numbers of high-frequency requests, placing orders as fast as possible to occupy the available resources. Scalpers therefore generally grab tickets by means of programs. At present, scalpers are identified by collecting statistics on each user's access source, access frequency and access period and finding accesses that deviate from those of most users; such accesses are judged to be scalper activity, and a scalper blacklist is built. A scalper is not necessarily defined as a real user; it may also be a resource. If a scalper uses a resource to grab tickets, that resource is also added to the scalper blacklist, so there can be IP blacklists, Cookie blacklists, account blacklists and so on.
The current way of identifying scalpers relies mainly on monitoring access logs: the logs are parsed, and the access frequency and access intervals of each IP, Cookie, device and account are calculated in order to recognize abnormal access. This deters scalpers to some extent. In using this technique, however, the inventors found the following problems. First, identification along a single dimension cannot uniquely distinguish user devices and easily misclassifies normal users. Take IP as an example: a scalper and normal users in the same building or residential compound share the same egress IP, so identification by IP alone easily flags normal users. Second, frequency-based identification recognizes scalpers only to a limited extent; when a scalper lengthens the access interval and lowers the access frequency, a judgment becomes difficult. Scalpers can also simulate different clients and grab tickets through multiple channels. Finally, to grab tickets quickly a scalper takes shortcuts instead of operating like a normal user, so the scalper's behavior trajectory lacks key steps. Existing recognition methods based on abnormal traffic access therefore no longer meet the need to identify scalpers.
Summary of the invention
To solve the above technical problems, the invention provides a method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model. The method can distinguish abnormal online batch ticket booking behavior from normal ticket booking behavior features, isolate it, and reduce the probability of misrecognition, making the distribution of resources more reasonable and fair.
The invention provides a method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model, comprising:
after detecting that the number of registrations in a predetermined time period exceeds a recognition threshold derived from the number of registrations in a reference time period, obtaining at least one highly concentrated cluster of registered accounts, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm;
hashing the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records of the at least one marked cluster into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier;
extracting the online ticket booking behavior attributes from the historical and real-time online ticket booking behavior records of the user identifier;
recognizing the user IPs with abnormal behavior attributes among the online ticket booking behavior attributes, and storing those user IPs in a blacklist for isolation.
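The text names DBSCAN for the clustering step but gives no parameters and does not say which features are clustered. As a minimal sketch, assuming each registration is represented only by its timestamp in seconds, and with `eps` and `min_pts` chosen purely for illustration, a one-dimensional DBSCAN can be written in plain Python as follows. Points that receive a cluster id form a dense burst of registrations; points labelled -1 are noise.

```python
from typing import List, Optional

NOISE = -1

def dbscan_1d(points: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE using DBSCAN."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return [label if label is not None else NOISE for label in labels]

# Hypothetical registration timestamps (seconds): two bursts and two stragglers.
timestamps = [0, 1, 2, 3, 100, 200, 201, 202, 203, 500]
labels = dbscan_1d(timestamps, eps=2, min_pts=3)
```

In a real deployment the feature vector would likely include more than the timestamp, and a library implementation would replace the quadratic neighbor search used here.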
Further, hashing the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records into a globally unique encoded string IP+Cookie+Agent to form a unique user identifier comprises:
hashing the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records into a globally unique encoded string IP+Cookie+Agent by means of a hash function, forming the unique user identifier.
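The hashing step can be sketched as below. The text does not name a hash function, so SHA-256 from Python's standard library is an assumption, as is treating the Agent as the HTTP User-Agent string; the function name `user_id` and the separator are illustrative only. A separator that cannot occur in the fields keeps different (IP, Cookie, Agent) triples from concatenating to the same input.

```python
import hashlib

def user_id(ip: str, cookie: str, agent: str) -> str:
    """Hash IP + Cookie + Agent into one fixed-length identifier string."""
    # The unit-separator control character keeps field boundaries unambiguous.
    raw = "\x1f".join((ip, cookie, agent))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

uid_a = user_id("203.0.113.7", "sid=abc123", "Mozilla/5.0")
uid_b = user_id("203.0.113.7", "sid=abc123", "curl/7.50")
```

Two clients behind the same egress IP then receive different identifiers as long as their Cookie or Agent differs, which is the multi-dimensional distinction the background section asks for. A hash is unique only up to collisions, which are negligible for a function of this size.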
Further, recognizing the user IPs with abnormal behavior attributes among the online ticket booking behavior attributes and storing those user IPs in a blacklist for isolation comprises:
recognizing the frequency thresholds and blacklists in the online ticket booking behavior attributes, the frequency thresholds including but not limited to one or more of: the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs;
identifying the user IPs with abnormal behavior by means of the frequency thresholds and blacklists, and storing the identified user IPs in the blacklist for isolation.
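A minimal sketch of this comparison, assuming the frequencies have already been computed per IP and that "abnormal" means strictly exceeding a threshold (the text does not state the comparison rule). The metric names and the function `flag_abnormal_ips` are hypothetical.

```python
from typing import Mapping, MutableSet, Set

def flag_abnormal_ips(
    metrics: Mapping[str, Mapping[str, float]],
    thresholds: Mapping[str, float],
    blacklist: MutableSet[str],
) -> Set[str]:
    """Return IPs that exceed any configured threshold and add them to the blacklist."""
    flagged: Set[str] = set()
    for name, per_ip in metrics.items():
        limit = thresholds.get(name)
        if limit is None:
            continue                    # no threshold configured for this metric
        for ip, value in per_ip.items():
            if value > limit:           # assumed rule: strictly above the threshold
                flagged.add(ip)
    blacklist.update(flagged)
    return flagged

# Hypothetical inputs: requests per IP and distinct URLs per IP.
metrics = {
    "ip_hits": {"203.0.113.7": 950, "203.0.113.8": 12},
    "ip_distinct_urls": {"203.0.113.7": 3, "203.0.113.9": 40},
}
thresholds = {"ip_hits": 500, "ip_distinct_urls": 30}
blacklist = {"198.51.100.1"}
flagged = flag_abnormal_ips(metrics, thresholds, blacklist)
```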
Further, extracting the online ticket booking behavior attributes from the historical and real-time online ticket booking behavior records of the user identifier comprises:
extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
extracting, from the historical transaction behavior records of the user identifier, the thresholds for potentially abnormal purchasing behavior and the blacklist of abnormal registered users who exceed those thresholds;
collecting in real time the current user access frequency and path from the current access behavior records of the user identifier.
Further, extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
loading the log file contents of the historical behavior records into the big-data warehouse Hive, creating in Hive a data table that matches the log file format, and formatting the log file contents into the data table;
calculating the access frequencies in the data table and storing the results in the big-data warehouse, the access frequencies including but not limited to one or more of: the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs;
observing the frequency distribution with a histogram, determining the historical frequency thresholds in a user-defined manner, and storing the historical frequency thresholds.
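The text leaves the threshold choice to a person reading the histogram. As one possible sketch, the code below builds the histogram and then takes a nearest-rank percentile as the threshold; the percentile rule, the bin width and the sample values are assumptions, not part of the described method.

```python
import math
from collections import Counter
from typing import Dict, List

def histogram(values: List[int], bin_width: int) -> Dict[int, int]:
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def percentile_threshold(values: List[int], q: int) -> int:
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-IP daily access counts: most are small, two are extreme.
daily_hits = [1, 2, 2, 3, 3, 3, 4, 50, 120]
bins = histogram(daily_hits, bin_width=10)
threshold = percentile_threshold(daily_hits, q=75)
```

Here the histogram shows seven values in the lowest bin and two isolated outliers, and the 75th-percentile threshold of 4 separates them.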
Further, extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
collecting the previous day's user access log files from the different servers in the nginx proxy server cluster onto the distributed storage system HDFS;
loading the log file contents into the big-data warehouse Hive, creating in Hive a data table that matches the log file format, and formatting the log file contents into the data table;
calculating in Hive the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs; storing the results in the big-data warehouse; observing the frequency distribution with a histogram and determining the historical frequency thresholds in a user-defined manner;
identifying abnormal clients on the basis of the determined historical frequency thresholds and the calculated frequencies, and storing them in the blacklist table.
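The described pipeline runs in Hive on HDFS. Purely as a plain-Python stand-in for the parsing and counting steps, the sketch below assumes the logs use nginx's default "combined" format, which carries the client IP, request line and user agent but no cookie; a deployment that needs the IP+cookie+agent key would have to log the cookie through a custom format. The sample log lines are invented.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# nginx "combined": ip - user [time] "METHOD url PROTO" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

def ip_frequencies(lines: Iterable[str]) -> Tuple[Counter, Dict[str, int]]:
    """Count requests per IP and distinct URLs per IP, skipping malformed lines."""
    hits: Counter = Counter()
    urls = defaultdict(set)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        hits[record["ip"]] += 1
        urls[record["ip"]].add(record["url"])
    return hits, {ip: len(seen) for ip, seen in urls.items()}

sample = [
    '203.0.113.7 - - [10/Nov/2016:10:00:01 +0800] "GET /ticket/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Nov/2016:10:00:01 +0800] "GET /ticket/2 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Nov/2016:10:00:02 +0800] "GET /ticket/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Nov/2016:10:00:03 +0800] "GET /ticket/1 HTTP/1.1" 200 512 "-" "curl/7.50"',
    'not a log line',
]
hits, distinct_urls = ip_frequencies(sample)
```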
Further, extracting the thresholds for potentially abnormal purchasing behavior from the historical transaction behavior records of the user identifier comprises the following steps:
importing the historical transaction behavior records into the data warehouse;
calculating, for each user IP, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked;
observing with a histogram the distributions of the number of tickets booked per single session, the number of items purchased and the average number of tickets booked, determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules, and storing those thresholds.
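The three purchase statistics are not defined precisely in the text. The sketch below therefore assumes one reading: "tickets per single session" is the largest ticket total an IP bought for any one performance session, "items purchased" is the number of distinct shows, and "average tickets" is the mean ticket total per session. The `Order` record layout and all field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple

class Order(NamedTuple):
    ip: str
    item: str       # the show or event
    session: str    # one specific performance of that show
    tickets: int

class PurchaseStats(NamedTuple):
    max_tickets_per_session: int
    items_bought: int
    avg_tickets_per_session: float

def purchase_stats(orders: Iterable[Order]) -> Dict[str, PurchaseStats]:
    """Aggregate orders into per-IP purchase statistics."""
    per_session = defaultdict(lambda: defaultdict(int))
    items = defaultdict(set)
    for order in orders:
        per_session[order.ip][(order.item, order.session)] += order.tickets
        items[order.ip].add(order.item)
    stats = {}
    for ip, sessions in per_session.items():
        totals = list(sessions.values())
        stats[ip] = PurchaseStats(max(totals), len(items[ip]), sum(totals) / len(totals))
    return stats

orders = [
    Order("203.0.113.7", "concert-x", "evening-1", 2),
    Order("203.0.113.7", "concert-x", "evening-1", 4),
    Order("203.0.113.7", "play-y", "matinee-2", 2),
    Order("203.0.113.8", "concert-x", "evening-1", 1),
]
stats = purchase_stats(orders)
```

The resulting per-IP values are what a histogram would then be drawn over to pick the abnormal purchasing thresholds.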
Further, extracting from the historical transaction behavior records of the user identifier the blacklist of abnormal registered users who exceed the abnormal purchasing behavior thresholds comprises the following steps:
importing all of the previous day's transaction records and at least one year of transaction records into the data warehouse;
calculating, for each user IP over one year, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked; observing the distributions of these three values with a histogram, and determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules;
identifying, on the basis of the determined thresholds for potentially abnormal purchasing behavior and the calculated frequencies, the abnormal registered users who exceed those thresholds, and storing them in the blacklist table.
Further, collecting in real time the current user access frequency and abnormal access paths from the current access behavior records of the user identifier comprises:
reading the nginx access log file in real time and sending it to the log processing system;
the log processing system receiving in real time the logs sent by the log collection system and, using one second as the calculation window, calculating the access frequency of each IP, the frequency of the URLs accessed by each IP, the access frequency of each IP+cookie+agent, the frequency of the URLs accessed by each IP+cookie+agent, and the abnormal access paths, and storing the results in a cache.
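The one-second calculation window can be sketched as a tumbling window keyed by the whole-second part of each event timestamp. The event tuple layout is an assumption, an in-memory dictionary stands in for the cache, and only the per-IP and per-IP-and-URL counters are shown; the IP+cookie+agent counters would follow the same pattern with a different key.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

Event = Tuple[float, str, str]  # (unix timestamp, ip, url)

def window_counts(events: Iterable[Event]) -> DefaultDict[int, Counter]:
    """Group events into one-second tumbling windows and count hits per key."""
    cache: DefaultDict[int, Counter] = defaultdict(Counter)
    for timestamp, ip, url in events:
        window = int(timestamp)          # whole second (non-negative timestamps)
        cache[window][ip] += 1           # access frequency of the IP
        cache[window][(ip, url)] += 1    # frequency of this URL for the IP
    return cache

events = [
    (100.1, "203.0.113.7", "/order"),
    (100.5, "203.0.113.7", "/detail"),
    (100.9, "203.0.113.7", "/order"),
    (101.0, "203.0.113.7", "/order"),
    (101.2, "203.0.113.8", "/order"),
]
cache = window_counts(events)
```

A streaming system would evict windows once they close; this sketch keeps every window so the counts can be inspected.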
Further, the calculation of the recognition threshold includes:
where α is the data surge ratio, P1 is the number of registrations in the pre-identification time period, n is the one base period immediately preceding the pre-identification time period, n+m is the multiple consecutive base periods preceding the pre-identification time period, Pi is the number of registrations in the reference time period, Pmax is the maximum number of registrations among the multiple consecutive base periods preceding the pre-identification time period, and Pmin is the minimum number of registrations among the multiple consecutive base periods preceding the pre-identification time period.
After detecting that the number of registrations in a predetermined time period exceeds the recognition threshold derived from the number of registrations in the reference time period, the invention obtains at least one highly concentrated cluster of registered accounts, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm; hashes the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records of the at least one marked cluster into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier; extracts the online ticket booking behavior attributes from the historical and real-time online ticket booking behavior records of the user identifier; and recognizes the user IPs with abnormal behavior attributes among those attributes and stores them in a blacklist for isolation. Determining the various thresholds for recognizing scalpers provides a data basis for judging scalpers. Recording users' behavior features (frequency and trajectory) in real time provides a real-time basis for intercepting scalpers as they act. A blacklist can be built and used to intercept scalpers in advance, making the distribution of resources more reasonable and fair.
Description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model provided by the invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of the invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment one
Embodiment 1 of the invention provides a method for recognizing abnormal batch ticket booking behavior based on a DBSCAN model which, as shown in Fig. 1, comprises steps S110 to S140.
In step S110, after detecting that the number of registrations in a predetermined time period exceeds the recognition threshold derived from the number of registrations in a reference time period, at least one highly concentrated cluster of registered accounts is obtained, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm.
In step S120, the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records of the at least one marked cluster are hashed into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier.
In step S130, the online ticket booking behavior attributes are extracted from the historical and real-time online ticket booking behavior records of the user identifier.
In step S140, the user IPs with abnormal behavior attributes among the online ticket booking behavior attributes are recognized and stored in a blacklist for isolation.
Further, hashing the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records into a globally unique encoded string IP+Cookie+Agent to form a unique user identifier comprises:
hashing the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records into a globally unique encoded string IP+Cookie+Agent by means of a hash function, forming the unique user identifier.
Further, recognizing the user IPs with abnormal behavior attributes among the online ticket booking behavior attributes and storing those user IPs in a blacklist for isolation comprises:
recognizing the frequency thresholds and blacklists in the online ticket booking behavior attributes, the frequency thresholds including but not limited to one or more of: the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs;
identifying the user IPs with abnormal behavior by means of the frequency thresholds and blacklists, and storing the identified user IPs in the blacklist for isolation.
Further, extracting the online ticket booking behavior attributes from the historical and real-time online ticket booking behavior records of the user identifier comprises:
extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
extracting, from the historical transaction behavior records of the user identifier, the thresholds for potentially abnormal purchasing behavior and the blacklist of abnormal registered users who exceed those thresholds;
collecting in real time the current user access frequency and path from the current access behavior records of the user identifier.
Further, extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
loading the log file contents of the historical behavior records into the big-data warehouse Hive, creating in Hive a data table that matches the log file format, and formatting the log file contents into the data table;
calculating the access frequencies in the data table and storing the results in the big-data warehouse, the access frequencies including but not limited to one or more of: the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs;
observing the frequency distribution with a histogram, determining the historical frequency thresholds in a user-defined manner, and storing the historical frequency thresholds.
Further, extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
collecting the previous day's user access log files from the different servers in the nginx proxy server cluster onto the distributed storage system HDFS;
loading the log file contents into the big-data warehouse Hive, creating in Hive a data table that matches the log file format, and formatting the log file contents into the data table;
calculating in Hive the access frequency of each IP, the frequency with which each IP accesses different URLs, the access frequency of each IP+cookie+agent, and the frequency with which each IP+cookie+agent accesses different URLs; storing the results in the big-data warehouse; observing the frequency distribution with a histogram and determining the historical frequency thresholds in a user-defined manner;
identifying abnormal clients on the basis of the determined historical frequency thresholds and the calculated frequencies, and storing them in the blacklist table.
Further, extracting the thresholds for potentially abnormal purchasing behavior from the historical transaction behavior records of the user identifier comprises the following steps:
importing the historical transaction behavior records into the data warehouse;
calculating, for each user IP, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked;
observing with a histogram the distributions of the number of tickets booked per single session, the number of items purchased and the average number of tickets booked, determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules, and storing those thresholds.
Further, extracting from the historical transaction behavior records of the user identifier the blacklist of abnormal registered users who exceed the abnormal purchasing behavior thresholds comprises the following steps:
importing all of the previous day's transaction records and at least one year of transaction records into the data warehouse;
calculating, for each user IP over one year, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked; observing the distributions of these three values with a histogram, and determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules;
identifying, on the basis of the determined thresholds for potentially abnormal purchasing behavior and the calculated frequencies, the abnormal registered users who exceed those thresholds, and storing them in the blacklist table.
Further, collecting in real time the current user access frequency and abnormal access paths from the current access behavior records of the user identifier comprises:
reading the nginx access log file in real time and sending it to the log processing system;
the log processing system receiving in real time the logs sent by the log collection system and, using one second as the calculation window, calculating the access frequency of each IP, the frequency of the URLs accessed by each IP, the access frequency of each IP+cookie+agent, the frequency of the URLs accessed by each IP+cookie+agent, and the abnormal access paths, and storing the results in a cache.
Further, the calculation of the recognition threshold includes:
where α is the data surge ratio, P1 is the number of registrations in the pre-identification time period, n is the one base period immediately preceding the pre-identification time period, n+m is the multiple consecutive base periods preceding the pre-identification time period, Pi is the number of registrations in the reference time period, Pmax is the maximum number of registrations among the multiple consecutive base periods preceding the pre-identification time period, and Pmin is the minimum number of registrations among the multiple consecutive base periods preceding the pre-identification time period.
After detecting that the number of registrations in a predetermined time period exceeds the recognition threshold derived from the number of registrations in the reference time period, the embodiment of the invention obtains at least one highly concentrated cluster of registered accounts, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm; hashes the user IP, Cookie and access agent environment (Agent) in the online ticket booking behavior records of the at least one marked cluster into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier; extracts the online ticket booking behavior attributes from the historical and real-time online ticket booking behavior records of the user identifier; and recognizes the user IPs with abnormal behavior attributes among those attributes and stores them in a blacklist for isolation. Determining the various thresholds for recognizing scalpers provides a data basis for judging scalpers. Recording users' behavior features (frequency and trajectory) in real time provides a real-time basis for intercepting scalpers as they act. A blacklist can be built and used to intercept scalpers in advance, making the distribution of resources more reasonable and fair.
The above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the invention is not limited by the described order of actions, because according to the invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units, for instance, is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
It should be pointed out that, depending on the needs of the implementation, each step/component described in this application may be split into more steps/components, and two or more steps/components, or partial operations of steps/components, may be combined into new steps/components, to achieve the purpose of the invention.
The above method according to the invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk or magneto-optical disk), or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network and stored in a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller or programmable hardware includes a storage component (for example RAM, ROM or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor or hardware, the processing method described here is implemented. Moreover, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing that processing.
The above is only a specific embodiment of the invention, but the protection scope of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. Therefore, the protection scope of the invention shall be defined by the scope of the claims.
Claims (10)
1. A method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model, characterized by comprising:
After detecting that the number of registrations in a predetermined time period is higher than a recognition threshold based on the number of registrations in a reference time period, obtaining at least one highly concentrated registered-account cluster, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm;
Hashing the user IP, Cookie, and access agent environment (Agent) in the online ticket-booking behavior records of the marked at least one highly concentrated registered-account cluster into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier;
Extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier;
Identifying the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes, and storing the user IPs with abnormal behavior attributes in a blacklist for isolation.
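As an illustrative sketch only, the density-based scan in the first step of claim 1 could look like the following pure-Python DBSCAN over registration timestamps. The function name `dbscan_1d`, the parameters `eps` and `min_pts`, and the choice to cluster on timestamps alone are assumptions made for illustration; the claim does not specify which registration features are clustered.

```python
from typing import List, Optional

NOISE = -1  # label for points that belong to no dense cluster


def dbscan_1d(times: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each registration timestamp with a cluster id (0, 1, ...) or NOISE."""
    n = len(times)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(times[j] - times[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be re-labelled later as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels
```

Timestamps that receive a non-negative label form the highly concentrated registration clusters; points labelled `NOISE` are isolated registrations. Suitable values for `eps` and `min_pts` depend on the traffic being monitored.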
2. The method of claim 1, characterized in that hashing the user IP, Cookie, and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique encoded string IP+Cookie+Agent to form a unique user identifier comprises:
Hashing, by means of a hash function, the user IP, Cookie, and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique encoded string IP+Cookie+Agent, forming the unique user identifier.
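A minimal sketch of such a hashing step is given below, assuming SHA-256 as the hash function and a length-prefixed concatenation of the three fields. The claim names neither the hash function nor the encoding, so both choices, along with the function name `make_user_id`, are hypothetical.

```python
import hashlib


def make_user_id(ip: str, cookie: str, agent: str) -> str:
    """Return a hex digest that identifies one IP+Cookie+Agent combination."""
    # Length-prefix each field so that ("ab", "c") and ("a", "bc") cannot
    # collapse into the same concatenated input.
    payload = "".join(f"{len(part.encode('utf-8'))}:{part}"
                      for part in (ip, cookie, agent))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The digest is unique only up to hash collisions, which are negligible for SHA-256 in practice; the same triple always maps to the same identifier, so records from different log sources can be joined on it.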
3. The method of claim 1 or 2, characterized in that identifying the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes and storing the user IPs with abnormal behavior attributes in a blacklist for isolation comprises:
Identifying the frequency thresholds and the blacklist in the online ticket-booking behavior attributes, the frequency thresholds including but not limited to one or more of: the visit frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent visit frequency, or the frequency with which IP+cookie+agent accesses different URLs;
Identifying the user IPs with abnormal behavior by means of the frequency thresholds and the blacklist, and storing the identified user IPs in the blacklist for isolation.
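The check in claim 3 can be sketched as follows, under the assumption that an IP is abnormal when it is already blacklisted or exceeds either of two per-IP thresholds. The function name `find_abnormal_ips` and both threshold parameters are hypothetical, and only two of the four frequencies listed in the claim are covered.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def find_abnormal_ips(requests: Iterable[Tuple[str, str]],
                      max_requests_per_ip: int,
                      max_distinct_urls_per_ip: int,
                      blacklist: Set[str]) -> Set[str]:
    """requests: (ip, url) pairs. Returns the IPs that should be isolated."""
    hits: Counter = Counter()            # total requests per IP
    urls: Dict[str, Set[str]] = {}       # distinct URLs requested per IP
    for ip, url in requests:
        hits[ip] += 1
        urls.setdefault(ip, set()).add(url)

    abnormal = set()
    for ip in hits:
        if (ip in blacklist
                or hits[ip] > max_requests_per_ip
                or len(urls[ip]) > max_distinct_urls_per_ip):
            abnormal.add(ip)
    return abnormal
```

The returned set can then be merged into the stored blacklist, for example with `blacklist |= find_abnormal_ips(...)`.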
4. The method of any one of claims 1-3, characterized in that extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier comprises:
Extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
Extracting, from the historical transaction behavior records of the user identifier, the thresholds for potentially abnormal purchasing behavior and the blacklist of abnormal registered users who exceed the abnormal purchasing behavior thresholds;
Collecting in real time the current user visit frequency and paths from the current access behavior records of the user identifier.
5. The method of claim 4, characterized in that extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
Loading the log file contents of the historical behavior records into the big data warehouse Hive, creating a log-file-format data table in Hive, and formatting the log file contents into the data table;
Calculating the visit frequencies in the data table and storing the results in the big data warehouse, the visit frequencies including but not limited to one or more of: the visit frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent visit frequency, or the frequency with which IP+cookie+agent accesses different URLs;
Observing the frequency distribution with a histogram, determining the historical frequency thresholds according to self-defined rules, and storing the historical frequency thresholds.
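The frequency calculation and histogram step can be sketched in plain Python as below. This stands in for the Hive queries named in the claim and is not Hive code. The helper names are hypothetical, and the nearest-rank percentile is only one possible self-defined rule for turning the observed distribution into a threshold.

```python
import math
from collections import Counter


def visit_frequencies(records):
    """records: iterable of (ip, cookie, agent, url) tuples from formatted logs."""
    per_ip = Counter()
    per_identity = Counter()
    for ip, cookie, agent, url in records:
        per_ip[ip] += 1
        per_identity[(ip, cookie, agent)] += 1
    return per_ip, per_identity


def histogram(counts, bin_width):
    """Map each bin's lower edge to the number of keys whose count falls in it."""
    bins = Counter()
    for count in counts.values():
        bins[(count // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


def percentile_threshold(counts, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of the observed counts."""
    ordered = sorted(counts.values())
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]
```

An analyst would inspect the output of `histogram` for a long tail and then fix the threshold, either by hand or with a rule such as `percentile_threshold(per_ip, 99)`.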
6. The method of claim 4, characterized in that extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
Collecting the previous day's user access log files from the different servers in the nginx proxy server cluster into the distributed storage system HDFS;
Loading the log file contents into the big data warehouse Hive, creating a log-file-format data table in Hive, and formatting the log file contents into the data table;
Calculating in Hive the visit frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent visit frequency, and the frequency with which IP+cookie+agent accesses different URLs; storing the results in the big data warehouse; and observing the frequency distribution with a histogram and determining the historical frequency thresholds according to self-defined rules;
Identifying abnormal clients based on the determined historical frequency thresholds and the frequency calculation results, and storing them in the blacklist table.
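The following sketch shows how one day's access log lines might be parsed and checked against a threshold. It assumes nginx's default `combined` log format, which records the client address, request line, and user agent but not the cookie; logging the cookie would require a custom `log_format`, which is not shown. The regular expression and the names `parse_line` and `flag_ips` are hypothetical, and plain Python again stands in for HDFS and Hive.

```python
import re
from collections import Counter

# Assumed layout (nginx "combined" format):
# ip - user [time] "METHOD url PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def flag_ips(lines, ip_threshold):
    """Return, sorted, the IPs whose request count exceeds ip_threshold."""
    hits = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:  # skip lines that do not fit the assumed format
            hits[record["ip"]] += 1
    return sorted(ip for ip, count in hits.items() if count > ip_threshold)
```

The flagged IPs correspond to the abnormal clients that the claim stores in the blacklist table.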
7. The method of claim 4, characterized in that extracting the thresholds for potentially abnormal purchasing behavior from the historical transaction behavior records of the user identifier comprises the following steps:
Importing the historical transaction behavior records into the data warehouse;
Calculating, for each user IP, the number of tickets purchased per single session, the number of items purchased, and the average number of tickets purchased;
Observing with a histogram the distributions of the number of tickets purchased per single session, the number of items purchased, and the average number of tickets purchased; determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules; and storing the thresholds for potentially abnormal purchasing behavior.
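A hedged sketch of the per-IP purchase statistics follows. The record layout `(ip, project_id, session_id, tickets)` and the reading of the three metrics (largest ticket total for any one session, number of distinct projects bought, mean tickets per order) are assumptions, since the claim does not define them precisely.

```python
from collections import defaultdict


def purchase_metrics(transactions):
    """transactions: iterable of (ip, project_id, session_id, tickets) tuples."""
    per_session = defaultdict(lambda: defaultdict(int))  # ip -> session -> tickets
    projects = defaultdict(set)                          # ip -> distinct projects
    totals = defaultdict(int)                            # ip -> total tickets
    orders = defaultdict(int)                            # ip -> number of orders
    for ip, project_id, session_id, tickets in transactions:
        per_session[ip][(project_id, session_id)] += tickets
        projects[ip].add(project_id)
        totals[ip] += tickets
        orders[ip] += 1
    return {
        ip: {
            "max_tickets_per_session": max(per_session[ip].values()),
            "projects_purchased": len(projects[ip]),
            "avg_tickets_per_order": totals[ip] / orders[ip],
        }
        for ip in totals
    }
```

Each of the three values can then be plotted as a histogram across all IPs to choose the abnormal-purchase thresholds.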
8. The method of claim 4, characterized in that extracting, from the historical transaction behavior records of the user identifier, the blacklist of abnormal registered users who exceed the abnormal purchasing behavior thresholds comprises the following steps:
Importing all transaction records of the previous day and the transaction records of at least one year into the data warehouse;
Calculating, for each user IP within one year, the number of tickets purchased per single session, the number of items purchased, and the average number of tickets purchased; observing with a histogram the distributions of the number of tickets purchased per single session, the number of items purchased, and the average number of tickets purchased; and determining the thresholds for potentially abnormal purchasing behavior by analysis according to custom rules;
Identifying, based on the determined thresholds for potentially abnormal purchasing behavior and the frequency calculation results, the abnormal registered users who exceed the abnormal purchasing behavior thresholds, and storing them in the blacklist table.
9. The method of claim 4, characterized in that collecting in real time the current user visit frequency and abnormal access paths from the current access behavior records of the user identifier comprises:
Reading the nginx access log files in real time and sending them to the log processing system;
Receiving in real time, at the log processing system, the logs sent by the log collection system; calculating, with one second as the calculation window, the IP visit frequency, the frequency with which each IP accesses URLs, the IP+cookie+agent visit frequency, the frequency with which IP+cookie+agent accesses URLs, and the abnormal access paths; and storing the results in a cache.
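The one-second calculation window can be sketched as a tumbling window keyed by the whole-second part of each event's timestamp. The event layout and the function name `per_second_frequencies` are assumptions; abnormal-path detection and the cache write mentioned in the claim are omitted from this sketch.

```python
import math
from collections import Counter, defaultdict


def per_second_frequencies(events):
    """events: iterable of (timestamp_seconds, ip, cookie, agent, url) tuples."""
    windows = defaultdict(lambda: {
        "ip": Counter(),            # requests per IP
        "ip_url": Counter(),        # requests per (IP, url)
        "identity": Counter(),      # requests per IP+cookie+agent
        "identity_url": Counter(),  # requests per (IP+cookie+agent, url)
    })
    for ts, ip, cookie, agent, url in events:
        window = windows[math.floor(ts)]  # one-second tumbling window
        identity = (ip, cookie, agent)
        window["ip"][ip] += 1
        window["ip_url"][(ip, url)] += 1
        window["identity"][identity] += 1
        window["identity_url"][(identity, url)] += 1
    return dict(windows)
```

In a streaming deployment the counters for a window would be flushed to the cache once that second has elapsed, rather than being held in one dictionary as here.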
10. The method of claim 1, characterized in that the calculation of the recognition threshold includes:
Where α is the data surge ratio, P1 is the number of registrations in the pre-identification time period, n is one consecutive base period before the pre-identification time period, n+m is multiple consecutive base periods before the pre-identification time period, Pi is the number of registrations in the reference time period, Pmax is the maximum number of registrations among the multiple consecutive base periods before the pre-identification time period, and Pmin is the minimum number of registrations among the multiple consecutive base periods before the pre-identification time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611019839.5A CN106657007A (en) | 2016-11-18 | 2016-11-18 | Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611019839.5A CN106657007A (en) | 2016-11-18 | 2016-11-18 | Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106657007A true CN106657007A (en) | 2017-05-10 |
Family
ID=58808057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611019839.5A Pending CN106657007A (en) | 2016-11-18 | 2016-11-18 | Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106657007A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI638319B (en) * | 2017-08-25 | 2018-10-11 | 拓元股份有限公司 | Internet ticketing system |
CN108900478A (en) * | 2018-06-11 | 2018-11-27 | 阿里巴巴集团控股有限公司 | The detection method and device of unusual fluctuation attack, safety protection equipment |
CN109685536A (en) * | 2017-10-18 | 2019-04-26 | 北京京东尚科信息技术有限公司 | Method and apparatus for output information |
CN109949069A (en) * | 2019-01-28 | 2019-06-28 | 平安科技(深圳)有限公司 | Suspicious user screening technique, device, computer equipment and storage medium |
CN110322028A (en) * | 2018-03-29 | 2019-10-11 | 北京红马传媒文化发展有限公司 | Method for managing resource, device and electronic equipment |
CN110322573A (en) * | 2018-03-30 | 2019-10-11 | 北京红马传媒文化发展有限公司 | User authentication method, user authentication device and electronic equipment |
CN110675228A (en) * | 2019-09-27 | 2020-01-10 | 支付宝(杭州)信息技术有限公司 | User ticket buying behavior detection method and device |
CN111723655A (en) * | 2020-05-12 | 2020-09-29 | 五八有限公司 | Face image processing method, device, server, terminal, equipment and medium |
CN111860644A (en) * | 2020-07-20 | 2020-10-30 | 北京百度网讯科技有限公司 | Abnormal account identification method, device, equipment and storage medium |
CN111899856A (en) * | 2020-07-25 | 2020-11-06 | 广州海鹚网络科技有限公司 | Risk control method, device, equipment and storage medium for hospital registration |
CN111984634A (en) * | 2019-05-22 | 2020-11-24 | 中国移动通信集团山西有限公司 | Alarm transaction extraction method, device, equipment and computer storage medium |
CN112364347A (en) * | 2020-11-19 | 2021-02-12 | 全知科技(杭州)有限责任公司 | High-performance computing method for identifying high-frequency data access and operation |
CN114187010A (en) * | 2021-10-25 | 2022-03-15 | 武汉斗鱼网络科技有限公司 | Method, device, medium and equipment for identifying anchor with ticket swiping behavior |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105791255A (en) * | 2014-12-23 | 2016-07-20 | 阿里巴巴集团控股有限公司 | Method and system for identifying computer risks based on account clustering |
CN105808988A (en) * | 2014-12-31 | 2016-07-27 | 阿里巴巴集团控股有限公司 | Method and device for identifying exceptional account |
CN105956911A (en) * | 2016-05-23 | 2016-09-21 | 北京小米移动软件有限公司 | Purchase request processing method and device |
- 2016-11-18 CN CN201611019839.5A patent/CN106657007A/en active Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI638319B (en) * | 2017-08-25 | 2018-10-11 | 拓元股份有限公司 | Internet ticketing system |
CN109685536A (en) * | 2017-10-18 | 2019-04-26 | 北京京东尚科信息技术有限公司 | Method and apparatus for output information |
CN110322028A (en) * | 2018-03-29 | 2019-10-11 | 北京红马传媒文化发展有限公司 | Method for managing resource, device and electronic equipment |
CN110322573A (en) * | 2018-03-30 | 2019-10-11 | 北京红马传媒文化发展有限公司 | User authentication method, user authentication device and electronic equipment |
CN108900478A (en) * | 2018-06-11 | 2018-11-27 | 阿里巴巴集团控股有限公司 | The detection method and device of unusual fluctuation attack, safety protection equipment |
CN109949069A (en) * | 2019-01-28 | 2019-06-28 | 平安科技(深圳)有限公司 | Suspicious user screening technique, device, computer equipment and storage medium |
WO2020155508A1 (en) * | 2019-01-28 | 2020-08-06 | 平安科技(深圳)有限公司 | Suspicious user screening method and apparatus, computer device and storage medium |
CN111984634A (en) * | 2019-05-22 | 2020-11-24 | 中国移动通信集团山西有限公司 | Alarm transaction extraction method, device, equipment and computer storage medium |
CN111984634B (en) * | 2019-05-22 | 2023-07-21 | 中国移动通信集团山西有限公司 | Alarm transaction extraction method, device, equipment and computer storage medium |
CN110675228A (en) * | 2019-09-27 | 2020-01-10 | 支付宝(杭州)信息技术有限公司 | User ticket buying behavior detection method and device |
CN111723655A (en) * | 2020-05-12 | 2020-09-29 | 五八有限公司 | Face image processing method, device, server, terminal, equipment and medium |
CN111723655B (en) * | 2020-05-12 | 2024-03-08 | 五八有限公司 | Face image processing method, device, server, terminal, equipment and medium |
CN111860644A (en) * | 2020-07-20 | 2020-10-30 | 北京百度网讯科技有限公司 | Abnormal account identification method, device, equipment and storage medium |
CN111899856A (en) * | 2020-07-25 | 2020-11-06 | 广州海鹚网络科技有限公司 | Risk control method, device, equipment and storage medium for hospital registration |
CN112364347A (en) * | 2020-11-19 | 2021-02-12 | 全知科技(杭州)有限责任公司 | High-performance computing method for identifying high-frequency data access and operation |
CN114187010A (en) * | 2021-10-25 | 2022-03-15 | 武汉斗鱼网络科技有限公司 | Method, device, medium and equipment for identifying anchor with ticket swiping behavior |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106657007A (en) | Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model | |
CN106453357A (en) | Network ticket buying abnormal behavior recognition method and system and equipment | |
CN103795612B (en) | Rubbish and illegal information detecting method in instant messaging | |
CN108809745A (en) | A kind of user's anomaly detection method, apparatus and system | |
CN106022800A (en) | User feature data processing method and device | |
CN106022834A (en) | Advertisement against cheating method and device | |
CN110648172B (en) | Identity recognition method and system integrating multiple mobile devices | |
WO2022247955A1 (en) | Abnormal account identification method, apparatus and device, and storage medium | |
CN108647730A (en) | A kind of data partition method and system based on historical behavior co-occurrence | |
CN110457601B (en) | Social account identification method and device, storage medium and electronic device | |
CN107977855B (en) | Method and device for managing user information | |
CN107332931A (en) | The recognition methods of waterborne troops of machine type forum and device | |
Liu et al. | SDHM: A hybrid model for spammer detection in Weibo | |
CN113902534A (en) | Interactive risk group identification method based on stock community relation map | |
CN114140248A (en) | AI artificial intelligence technology-based abnormal transaction identification method | |
CN115378619A (en) | Sensitive data access method, electronic equipment and computer readable storage medium | |
Li et al. | Customer churn prediction in telecom using big data analytics | |
CN115860482A (en) | Shop risk identification method and device, equipment, medium and product thereof | |
CN117875501A (en) | Social media user behavior prediction system and method based on big data | |
CN109919667A (en) | A kind of method and apparatus of the IP of enterprise for identification | |
CN112199388A (en) | Strange call identification method and device, electronic equipment and storage medium | |
CN116402546A (en) | Store risk attribution method and device, equipment, medium and product thereof | |
CN111309706A (en) | Model training method and device, readable storage medium and electronic equipment | |
CN107705135A (en) | A kind of method that potential commercial value is evaluated based on company's storage contact data | |
CN114385899A (en) | User group accurate identification system and method based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170510 |