CN105337987A - Network user identity authentication method and system - Google Patents
- Publication number
- CN105337987A (application CN201510810443.1A)
- Authority
- CN
- China
- Prior art keywords
- session
- user
- algorithm
- content
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Security & Cryptography (AREA)
- Storage Device Security (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a network user identity authentication method and system. The method comprises the following steps: collecting all web browsing records of a legitimate user within a set time period as a session, and processing each browsing record into the form <top-level domain of the website address, content class, timestamp>; acquiring m sessions of the legitimate user and processing each session as follows: obtaining n feature values of the user's web browsing from the session; processing the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values; and calculating the score of the session from the feature values and the corresponding weight matrix; and calculating the classification threshold of the legitimate user from the scores of the sessions with a fourth algorithm. The technical scheme authenticates continuously on the basis of three aspects of the browsing records, namely website, content and time, and thereby improves the authentication effect.
Description
Technical field
The present invention relates to network security technology, and in particular to a network user identity authentication method and system.
Background art
With the development of information technology and Internet technology, the number of Internet users in China keeps growing, and online shopping and online transactions are increasingly frequent. Going online has become an indispensable part of many people's lives. At the same time, fraud in online shopping transactions has risen sharply in recent years, and new forms of online fraud that combine social deception with technical means have become the primary security threat to users' online lives. Authenticating network users is an important way of securing online transactions. User identity authentication can be divided into two classes: one-time authentication and continuous authentication. One-time authentication currently includes traditional password-based authentication, smart-card-based authentication, and authentication based on the user's biometric and behavioral characteristics. However, one-time authentication verifies the user only at a single moment: once authentication passes, the user's identity is judged legitimate. It therefore cannot protect the user well afterwards, and it cannot provide continuous authentication. Research on continuous authentication is still relatively scarce, and existing work mainly studies the relations within a user's sequence of website addresses or within the content the user browses. The user's browsing behavior is not considered comprehensively enough, and the authentication effect leaves room for improvement.
In view of this, how to find a technical scheme that further improves the security of network user identity authentication has become a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a network user identity authentication method and system that address the need to further improve the security of network user identity authentication in the prior art.
To achieve the above and other related objects, the invention provides a network user identity authentication method, comprising: collecting all web browsing records of a legitimate user within a set time period, each browsing record comprising the address of the browsed web page, its text content and a timestamp; extracting the top-level domain from the web page address, extracting keywords from the text content to determine the content class to which the text content belongs, processing each browsing record into the form <top-level domain, content class, timestamp>, and taking all browsing records obtained within the set time period as one session; acquiring m sessions of the legitimate user and processing each session as follows: from all browsing records in the session, counting the top-level domains the user accesses most frequently, mining the relation between top-level domain and content class in the browsing records with a set first algorithm, and mining the relation between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing; processing the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values; calculating the score of the session from the feature values and the corresponding weight matrix; and calculating the classification threshold of the legitimate user from the scores of the m sessions with a fourth algorithm.
Optionally, the method further comprises: acquiring a new session and calculating its score; when the score falls within the range of the classification threshold, judging that the current user is the legitimate user; when the score does not fall within that range, judging that the current user is not the legitimate user.
Optionally, the feature values comprise: the number of elements contained in the session; the number of frequently accessed websites contained in the session; the number of frequent itemsets matched by the session; the number of frequently accessed websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and a target column.
Optionally, the first algorithm comprises the Apriori algorithm.
Optionally, the second algorithm comprises: using maximum likelihood estimation to calculate, from the browsing records of the session, the parameter values of the normal distribution that the user's browsing time for each content class follows.
Optionally, the parameter values comprise: [formula not reproduced], where $time_i$ is the relative time at which the user browses content class $content_i$.
Optionally, the third algorithm comprises the logistic regression (LR) algorithm.
Optionally, the fourth algorithm comprises: [formula not reproduced]; the classification threshold is then [formula not reproduced], where $score_{legal,i}$ is the score of the i-th session and there are m sessions in total.
Optionally, the set time period comprises 30 minutes.
The invention also provides a network user identity authentication system, comprising: a user session acquisition module for collecting all web browsing records of a legitimate user within a set time period, each browsing record comprising the address of the browsed web page, its text content and a timestamp, extracting the top-level domain from the web page address, extracting keywords from the text content to determine the content class to which the text content belongs, processing each browsing record into the form <top-level domain, content class, timestamp>, and taking all browsing records obtained within the set time period as one session; a session score calculation module for, for a given session, counting from all browsing records in the session the top-level domains the user accesses most frequently, mining the relation between top-level domain and content class in the browsing records with a set first algorithm, and mining the relation between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing, processing the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values, and calculating the score of the session from the feature values and the corresponding weight matrix; and a classification threshold determination module for acquiring the scores of multiple sessions of the legitimate user and calculating the classification threshold of the legitimate user with a fourth algorithm.
Optionally, the system further comprises a user legality judgment module for acquiring a new session and calculating its score; when the score falls within the range of the classification threshold, judging that the current user is the legitimate user; when the score does not fall within that range, judging that the current user is not the legitimate user.
Optionally, the feature values comprise: the number of elements contained in the session; the number of frequently accessed websites contained in the session; the number of frequent itemsets matched by the session; the number of frequently accessed websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and a target column.
Optionally, the first algorithm comprises the Apriori algorithm.
Optionally, the second algorithm comprises: using maximum likelihood estimation to calculate, from the browsing records of the session, the parameter values of the normal distribution that the user's browsing time for each content class follows.
Optionally, the parameter values comprise: [formula not reproduced], where $time_i$ is the relative time at which the user browses content class $content_i$.
Optionally, the third algorithm comprises the logistic regression (LR) algorithm.
Optionally, the fourth algorithm comprises: [formula not reproduced]; the classification threshold is then [formula not reproduced], where $score_{legal,i}$ is the score of the i-th session and there are m sessions in total.
Optionally, the set time period comprises 30 minutes.
As described above, the network user identity authentication method and system of the present invention have the following beneficial effects: 1) mining is performed on two factors of the user's browsing, (website address, content) and (content, time), rather than on only one of them, so that the authentication method matches the user's browsing habits; 2) association rules are used to mine the user's browsing habits from the combined (website address, content) pairs, and a normal distribution is used to find the time periods in which the user frequently accesses each content class; 3) continuous authentication is achieved while the user browses web pages.
Brief description of the drawings
Fig. 1 is a schematic flowchart of one embodiment of the network user identity authentication method of the present invention.
Fig. 2 is a schematic flowchart of another embodiment of the network user identity authentication method of the present invention.
Fig. 3 is a schematic module diagram of one embodiment of the network user identity authentication system of the present invention.
Description of reference numerals
1 Network user identity authentication system
11 User session acquisition module
12 Session score calculation module
13 Classification threshold determination module
14 User legality judgment module
S1 ~ S4 Steps
Detailed description of the embodiments
Embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention.
It should be noted that the drawings provided in this embodiment illustrate the basic concept of the invention only schematically. They show only the components related to the invention and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of each component may vary freely, and the component layout may be more complex.
The invention provides a network user identity authentication method, which authenticates identity according to the user's browsing behavior. In one embodiment, as shown in Fig. 1, the method comprises:
Step S1: collect all web browsing records of a legitimate user within a set time period, each browsing record comprising the address of the browsed web page, its text content and a timestamp; extract the top-level domain from the web page address, extract keywords from the text content to determine the content class to which the text content belongs, process each browsing record into the form <top-level domain, content class, timestamp>, and take all browsing records obtained within the set time period as one session. In one embodiment, the web browsing records of a user are collected and processed into the session structure {(domain, content, timestamp)}, which serves as the basis for the subsequent analysis. The collected browsing records are divided at intervals of 30 minutes, so that one session is obtained every 30 minutes. Step S1 is performed repeatedly, for example m times, to obtain m sessions, and the m sessions are finally merged into the corresponding session set S. Subsequent authentication is likewise performed per access behavior of the user (one session, 30 minutes).
In one embodiment, the SQLite database that ships with the Chrome browser is first used to collect the browsing records of the legitimate user. The SQLite database records the details of each web page the user browses. For each page browsed by each user, the URL (Uniform Resource Locator), i.e. the web page address, the text content and the timestamp are collected as the original browsing record. A browsing record is denoted r; its attributes are shown in Table 1 below:
After the raw data are obtained, they can be processed as follows. First, each browsing record in the session is processed by extracting the top-level domain from its URL. Then, using the text classification samples of the Sogou Lab together with articles from the web, keywords are extracted from the text content of each class, and the title of the web page to be classified is matched against these keywords to obtain the content class to which the page belongs. After this processing, the first original browsing record in Table 1, for example, becomes (news.163.com, society, timestamp); data in this form are denoted web page p(domain, content, timestamp).
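The record processing and 30-minute session division of step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Page`, `extract_domain`, `classify_title` and `build_sessions` are hypothetical, the host name of the URL stands in for what the text calls the top-level domain (as in the news.163.com example), the keyword table is supplied by the caller, and sessions are fixed 30-minute buckets counted from timestamp 0.

```python
from collections import namedtuple
from urllib.parse import urlsplit

# One processed browsing record p(domain, content, timestamp).
Page = namedtuple("Page", ["domain", "content", "timestamp"])

SESSION_SECONDS = 30 * 60  # set time period of 30 minutes


def extract_domain(url):
    """Return the host name of the URL (assumed stand-in for the 'top-level domain')."""
    return urlsplit(url).hostname or ""


def classify_title(title, keywords_by_class):
    """Pick the content class whose keywords occur most often in the page title."""
    best_class, best_hits = "unknown", 0
    for content_class, keywords in keywords_by_class.items():
        hits = sum(1 for kw in keywords if kw in title)
        if hits > best_hits:
            best_class, best_hits = content_class, hits
    return best_class


def build_sessions(records, keywords_by_class):
    """records: iterable of (url, title, timestamp_in_seconds).

    Returns a list of sessions in time order, each a list of Page tuples
    that fall into the same 30-minute bucket.
    """
    buckets = {}
    for url, title, ts in sorted(records, key=lambda r: r[2]):
        page = Page(extract_domain(url), classify_title(title, keywords_by_class), ts)
        buckets.setdefault(int(ts // SESSION_SECONDS), []).append(page)
    return [buckets[k] for k in sorted(buckets)]


records = [
    ("http://news.163.com/a.html", "society news today", 0),
    ("http://www.taobao.com/x", "buy shoes", 600),
    ("http://news.163.com/b.html", "society", 1900),
]
keywords = {"society": ["society"], "shopping": ["buy", "shop"]}
sessions = build_sessions(records, keywords)
```

With these three sample records, the first two fall into the first 30-minute bucket and the third into the next one, so two sessions result.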
Step S2: acquire m sessions of the legitimate user and process each session as follows: from all browsing records in the session, count the top-level domains the user accesses most frequently, mine the relation between top-level domain and content class in the browsing records with a set first algorithm, and mine the relation between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing; process the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values; calculate the score of the session from the feature values and the corresponding weight matrix. In one embodiment, the collected browsing records are divided at intervals of 30 minutes, so that one session is obtained every 30 minutes; step S1 is performed m times to obtain m sessions, and the m sessions are finally merged into the corresponding session set S. The feature values comprise: the number of elements contained in the session; the number of frequently accessed websites contained in the session; the number of frequent itemsets matched by the session; the number of frequently accessed websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and a target column. The first algorithm comprises the Apriori algorithm. Apriori is a frequent itemset algorithm for mining association rules; its core idea is to mine frequent itemsets in two stages, candidate set generation and downward-closure checking. The algorithm has been widely used in fields such as business and network security.
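The two stages of Apriori mentioned above, candidate generation and downward-closure checking, can be sketched as follows. This is an illustrative sketch only: the function name `apriori`, the representation of each session as a set of (domain, content) items, and the sample data are assumptions, and the threshold test uses a strict comparison to mirror the condition support(fc) > δ used in this description.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support > min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates whose support exceeds the threshold.
        level = {c: s for c in current if (s := support(c)) > min_support}
        frequent.update(level)
        k += 1
        # Stage 1, candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Stage 2, downward closure: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


news = ("news.163.com", "society")
shop = ("www.taobao.com", "shopping")
sport = ("x.com", "sports")
sessions_as_itemsets = [{news, shop}, {news, shop, sport}, {news}, {shop, sport}]
frequent_itemsets = apriori(sessions_as_itemsets, min_support=0.4)
```

In this sample, the pair {news, sport} occurs in only one of four sessions and is dropped, which in turn prunes the three-item candidate before its support is ever counted.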
Described second algorithm comprises: the method for maximal possibility estimation browses record from described session the parameter value calculating the normal distribution that user obeys the browsing time to each content class.Described parameter value
wherein, timei is the relative time of user when browsing content class contenti; The frequent time period number that described parameter is mated for adding up described session.Described 3rd algorithm comprises: LR logistic regression algorithm.Logistic regression is typical two sorting algorithms, and the model relative straightforward produced by it is simple, easily explains, and is not easy to produce Expired Drugs.It is a process of study f:X->Y equation in fact, we can n tuple variable vector X=<X1 given in advance, X2..., Xn> and m unit object vector Y=<Y1, Y2..., Ym>, and logistic regression learns a function f (X) exactly, the variate-value that the function learnt can be provided in advance according to us is fit object vector Y farthest.
In one embodiment, different users browse particular content on different websites at different times. Based on this browsing characteristic, features are extracted mainly from three aspects: frequently accessed website addresses, (website address, content) and (content, time period). From the session set {(domain, content, timestamp)}, feature extraction is carried out through frequent website statistics, frequent itemset mining and frequent time period mining to obtain the user's browsing features. In one embodiment, the statistics can be computed for multiple users at the same time:
Frequently accessed website statistics: because the web pages each user browses frequently differ, the 15 top-level domains each user accesses most frequently are counted and placed in the frequently accessed address class FU_j of the corresponding user j.
Frequent itemset mining: the Apriori algorithm is used to mine the sequential relations that exist between (website address, content) pairs. In this process, a suitable support threshold δ is chosen by experiment. For a frequent itemset fc: X, Y, if support(fc) > δ, the frequent itemset is added to the web-browsing frequent itemset set FC_j of the corresponding user j.
Frequent time period mining: for a given user, the time at which the user browses a given content class is assumed to follow a normal distribution. Using the (content, time) data obtained from the processed session set S*, a normal distribution model is built for the time at which the user browses each content class. Because the parameters of the normal distribution cannot be obtained exactly, maximum likelihood estimation is used to calculate, from the sessions, the parameter values of the normal distribution that the user's browsing time for each content class follows, where: [formula not reproduced]
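The frequent time period mining described above can be sketched with the textbook maximum likelihood estimates of a normal distribution, namely the sample mean and the variance that divides by n. The formula in this description is not reproduced in the text, so the code below is an assumption based on the standard estimator, not the patent's own expression; the function name `fit_time_models` and the use of hour-of-day as the relative time are likewise hypothetical.

```python
import math
from collections import defaultdict


def fit_time_models(pages):
    """pages: iterable of (content_class, relative_time).

    Returns {content_class: (mu, sigma)}, the maximum likelihood estimates
    of a normal distribution fitted to the browsing times of each class.
    """
    times = defaultdict(list)
    for content, t in pages:
        times[content].append(t)

    models = {}
    for content, ts in times.items():
        mu = sum(ts) / len(ts)
        # The MLE of the variance divides by n (not n - 1).
        variance = sum((t - mu) ** 2 for t in ts) / len(ts)
        models[content] = (mu, math.sqrt(variance))
    return models


# Assumed relative time: hour of day at which the page was browsed.
time_models = fit_time_models([("news", 8.0), ("news", 10.0), ("shopping", 20.0)])
```

A class with a single observation gets a standard deviation of zero, so a practical system would probably need a minimum sample count or a floor on sigma before using the model.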
The corresponding feature values of each session are then obtained; each session in the session set has a corresponding feature vector, denoted $fv_{ji}$. In one embodiment, $fv_{ji} = \langle length_i, pun_i, mrn_i, rpun_i, mrml_i, mral_i, mrms_i, mras_i, mtn_i, target_i \rangle$; the meaning of each value is shown in Table 2.
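The ten values of the feature vector can be computed from one session roughly as follows. This is a sketch under assumptions: the mapping of the abbreviations to the feature list given earlier (length, pun, mrn, rpun, mrml, mral, mrms, mras, mtn, target) is inferred from their order, the function name and data layout are hypothetical, and a page is counted as matching a frequent time period when its time lies within one standard deviation of the fitted mean, which is an assumed interval because the interval formula is not reproduced in this text.

```python
def session_features(session, frequent_sites, frequent_itemsets, time_models, target=1):
    """Compute [length, pun, mrn, rpun, mrml, mral, mrms, mras, mtn, target].

    session:           list of (domain, content, time) tuples
    frequent_sites:    set of frequently accessed domains (FU)
    frequent_itemsets: {frozenset of (domain, content) items: support} (FC)
    time_models:       {content: (mu, sigma)} (FT)
    """
    domains = {d for d, _, _ in session}
    items = {(d, c) for d, c, _ in session}

    length = len(session)                      # elements in the session
    pun = len(domains & frequent_sites)        # frequent sites in the session
    # Frequent itemsets that are fully contained in the session.
    matched = {fc: s for fc, s in frequent_itemsets.items() if fc <= items}
    mrn = len(matched)
    rpun = len({d for fc in matched for d, _ in fc if d in frequent_sites})
    mrml = max((len(fc) for fc in matched), default=0)
    mral = sum(len(fc) for fc in matched) / mrn if mrn else 0.0
    mrms = max(matched.values(), default=0.0)
    mras = sum(matched.values()) / mrn if mrn else 0.0
    # Assumed interval: within one standard deviation of the mean.
    mtn = sum(
        1 for _, c, t in session
        if c in time_models and abs(t - time_models[c][0]) <= time_models[c][1]
    )
    return [length, pun, mrn, rpun, mrml, mral, mrms, mras, mtn, target]


session = [
    ("news.163.com", "society", 9.0),
    ("www.taobao.com", "shopping", 21.0),
    ("x.com", "sports", 3.0),
]
fu = {"news.163.com", "www.taobao.com"}
fc = {
    frozenset({("news.163.com", "society")}): 0.75,
    frozenset({("news.163.com", "society"), ("www.taobao.com", "shopping")}): 0.5,
    frozenset({("y.com", "games")}): 0.6,
}
ft = {"society": (9.0, 1.0), "shopping": (14.0, 2.0)}
features = session_features(session, fu, fc, ft)
```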
After the feature value set of the sessions is obtained, the LR logistic regression algorithm is used to carry out the authentication method based on user browsing features (hereinafter UBFAA); the detailed procedure is shown in Algorithm 1.
Algorithm 1: authentication method based on user browsing features (UBFAA)
Input: session set S* of the legitimate user, frequent itemset set FC of the legitimate user, frequently accessed address set FU of the legitimate user, and frequent access time period set FT
Output: feature value weight matrix w, array score_legal
1) Traverse each session s*_i of the legitimate user's session set S*
mrtl = 0; // total length of the frequent itemsets matched by session s*_i
mrts = 0; // total support of the frequent itemsets matched by session s*_i
pun = 0;
length = number of elements contained in session s*_i;
target = 1;
1.1) Traverse the frequently accessed address set FU of the legitimate user
if FU contains an fu_j equal to the top-level domain of a web page in the current session, then pun is incremented by 1;
1.2) Traverse the frequent itemset set FC of the legitimate user
if the current session contains frequent itemset fc_j:
1.2.1) mrn is incremented by 1, mrtl accumulates the length of the current frequent itemset, and mrts accumulates the support of the current frequent itemset;
1.2.2) the maximum support of the rules matched by the current session is stored in mrms;
1.2.3) the maximum length of the rules matched by the current session is stored in mrml;
1.2.4) the number of frequently accessed websites contained in fc_j is counted and stored in rpun;
1.3) Obtain the average support mras and the average length mral of the rules matched by the current session;
1.4) Traverse the frequent time period set FT of the legitimate user
if FT contains an ft_j such that the content of a web page in the current session equals ft_j.content and the time of that web page lies in the interval [formula not reproduced],
then mtn is incremented by 1;
1.5) Write the attributes of session s*_i into the ten-tuple set FV_i;
2) Traverse the ten-tuple set FV_i
2.1) create the matrix datas, set its first column to all 1s, and store the feature data in the matrix;
2.2) create the matrix labels, and store the last column of FV_i in labels;
2.3) create a 10×1 weight matrix w whose values are all 1;
3) Set the learning rate of LR logistic regression alpha = 0.01 and the maximum number of iterations of LR maxCycles = 500;
4) While the number of iterations is less than maxCycles, use gradient descent to calculate the value of the weight matrix w;
5) Use the weight matrix w to calculate the score of each session, and store it in the array score_legal;
6) Return the weight matrix w and the legitimate session score array score_legal.
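Steps 2) to 6) of Algorithm 1 can be sketched in plain Python as follows. The sketch keeps the stated settings (a leading column of 1s, weights initialized to 1, alpha = 0.01, maxCycles = 500), but several points are assumptions: the function names are hypothetical, the update shown is the usual batch gradient step on the logistic log-likelihood, and the training data contain sessions labeled 0 as well as 1, since a classifier cannot be fitted from one label alone and the text does not say where the negative examples come from. The toy data use a single feature for brevity; with the nine features of the ten-tuple plus the leading 1, the weight vector has the ten entries mentioned in step 2.3.

```python
import math


def sigmoid(z):
    # Clamp the argument so math.exp never overflows on extreme scores.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(feature_rows, labels, alpha=0.01, max_cycles=500):
    """Fit logistic regression weights by batch gradient steps.

    feature_rows: list of feature lists (without the target column)
    labels:       list of 0/1 targets, one per row
    Returns the weight list w, where w[0] is the bias term.
    """
    # Step 2.1: prepend a constant 1 to every row.
    data = [[1.0] + [float(v) for v in row] for row in feature_rows]
    # Step 2.3: all weights start at 1.
    w = [1.0] * len(data[0])
    for _ in range(max_cycles):
        # Prediction error of every row under the current weights.
        errors = [
            y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for x, y in zip(data, labels)
        ]
        # Move each weight along the gradient of the log-likelihood.
        for j in range(len(w)):
            w[j] += alpha * sum(e * x[j] for e, x in zip(errors, data))
    return w


def session_score(w, features):
    """Linear score w0 + w1*f1 + ... of one session."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))


# Toy data with a single feature: two legitimate and two other sessions.
rows = [[4.0], [5.0], [0.0], [1.0]]
targets = [1, 1, 0, 0]
w = train_weights(rows, targets)
score_legal = [session_score(w, r) for r, t in zip(rows, targets) if t == 1]
score_other = [session_score(w, r) for r, t in zip(rows, targets) if t == 0]
```

Features with very different scales, such as session length versus support values, may make a fixed learning rate of 0.01 unstable, so scaling the features first is a reasonable precaution.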
Then, from the weight matrix w obtained by the above algorithm and the feature value vector $fv_j$ of session j, the corresponding score $score_j$ is calculated with the following formula:
For $fv_i \in FV$,
$$score = w_0 + w_1 \cdot fv_i.length + w_2 \cdot fv_i.pun + \dots + w_{10} \cdot fv_i.mtn$$
For the m sessions of the legitimate user, the score array $score_{legal} = \{score_{legal,1}, score_{legal,2}, \dots, score_{legal,m}\}$ is obtained.
Step S3: calculate the classification threshold of the legitimate user from the scores of the m sessions with a fourth algorithm. In one embodiment, the fourth algorithm comprises: [formula not reproduced]; the classification threshold is [formula not reproduced], where $score_{legal,i}$ is the score of the i-th session and there are m sessions in total.
In one embodiment, as shown in Fig. 2, the method further comprises:
Step S4: acquire a new session and calculate its score; when the score falls within the range of the classification threshold, judge that the current user is the legitimate user; when the score does not fall within that range, judge that the current user is not the legitimate user. The method of step S1 is used to obtain the current (new) session, the method of step S2 is used to calculate its score, and the classification threshold from step S3 is then used to judge whether the user to whom the current session belongs is the legitimate user.
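The decision of step S4 can be sketched as a range check. The formula of the fourth algorithm is not reproduced in this text, so `threshold_range` below is a stand-in and not the patent's formula: it assumes, purely for illustration, a range of k standard deviations around the mean of the legitimate session scores. Both function names and the parameter k are hypothetical.

```python
import statistics


def threshold_range(legal_scores, k=2.0):
    """Stand-in for the fourth algorithm (assumption): mean +/- k standard deviations."""
    mean = statistics.fmean(legal_scores)
    std = statistics.pstdev(legal_scores)
    return (mean - k * std, mean + k * std)


def is_legal_user(score, score_range):
    """Step S4: the user is judged legitimate when the score lies within the range."""
    low, high = score_range
    return low <= score <= high


score_range = threshold_range([4.0, 6.0], k=1.0)
accepted = is_legal_user(5.5, score_range)
rejected = is_legal_user(9.0, score_range)
```

Because the check runs once per 30-minute session, it can be repeated for as long as the user keeps browsing, which is what makes the authentication continuous.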
The invention also provides a network user identity authentication system, which can use the network user identity authentication method described above. In one embodiment, as shown in Fig. 3, the network user identity authentication system 1 comprises a user session acquisition module 11, a session score calculation module 12 and a classification threshold determination module 13, where:
The user session acquisition module 11 is used to collect all web browsing records of a user within a set time period, each browsing record comprising the address of the browsed web page, its text content and a timestamp; to extract the top-level domain from the web page address; to extract keywords from the text content to determine the content class to which the text content belongs; to process each browsing record into the form <top-level domain, content class, timestamp>; and to take all browsing records obtained within the set time period as one session. In one embodiment, the set time period comprises 30 minutes.
The session score calculation module 12 is connected to the user session acquisition module 11 and, for a given session, is used to: count, from all browsing records in the session, the top-level domains the user accesses most frequently; mine the relation between URL top-level domain and content class in the browsing records using a set first algorithm; mine the relation between content class and time period in the browsing records using a set second algorithm, thereby obtaining n feature values of the user's web browsing; process the obtained feature values according to a set third algorithm to obtain a weight matrix corresponding to the feature values; and calculate the score of the session from the feature values and the corresponding weight matrix. In one embodiment, the feature values comprise: the number of elements the session contains; the number of frequently accessed websites the session contains; the number of frequent itemsets the session matches; the number of frequently accessed websites contained in the frequent itemsets the session matches; the length of the longest frequent itemset the session matches; the average length of the frequent itemsets the session matches; the maximum support of the frequent itemsets the session matches; the average support of the frequent itemsets the session matches; the number of frequent time periods the session matches; and the target column. The first algorithm comprises the Apriori algorithm. The second algorithm comprises: using maximum likelihood estimation to calculate, from the browsing records of the session, the parameter values of the normal distribution that the user's browsing time for each content class follows. The parameter values are:
where time_i is the relative time at which the user browses content class content_i. The parameters are used to count the number of frequent time periods the session matches. The third algorithm comprises the gradient descent method.
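The second algorithm can be illustrated with the standard maximum likelihood estimates for a normal distribution, namely the sample mean and the variance computed with a divisor of n. The sketch below is an assumption-laden example rather than the patent's own formula: the relative time is taken to be hours since midnight, and a content class counts as a matched frequent time period when it is browsed within mu ± k·sigma, a window chosen here purely for illustration.

```python
import math
from collections import defaultdict

def fit_normal_by_class(records):
    """MLE of (mu, sigma) of the relative browsing time for each content class.

    `records` holds (domain, content_class, relative_time) tuples.
    """
    times = defaultdict(list)
    for _domain, content_class, t in records:
        times[content_class].append(t)
    params = {}
    for content_class, ts in times.items():
        mu = sum(ts) / len(ts)
        var = sum((t - mu) ** 2 for t in ts) / len(ts)  # MLE divides by n, not n - 1
        params[content_class] = (mu, math.sqrt(var))
    return params

def count_frequent_periods(session, params, k=1.0):
    """Count content classes in the session browsed within mu +/- k*sigma.

    The +/- k*sigma window is an assumption made for this sketch.
    """
    matched = set()
    for _domain, content_class, t in session:
        if content_class in params:
            mu, sigma = params[content_class]
            if abs(t - mu) <= k * sigma:
                matched.add(content_class)
    return len(matched)
```

The count returned by `count_frequent_periods` would then serve as one of the n feature values of a session.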
The classification threshold determination module 13 is connected to the session score calculation module 12 and is used to obtain multiple session scores of the legitimate user and to calculate the legitimate user's classification threshold using a fourth algorithm. In one embodiment, the fourth algorithm comprises:
the classification threshold is then:
where score_legal_i is the score of the i-th session of the legitimate user, with m sessions in total.
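The exact threshold formula is not legible in this text, so the following is only one plausible way to turn the m legitimate session scores into a threshold range, not the patented fourth algorithm. It assumes the range is the mean of the scores plus or minus k population standard deviations; both the form of the range and the parameter `k` are assumptions.

```python
import statistics

def classification_threshold(legit_scores, k=2.0):
    """Return (low, high) as mean -/+ k population standard deviations.

    Assumption: this interval form is illustrative; the source formula is missing.
    """
    mean = statistics.fmean(legit_scores)
    std = statistics.pstdev(legit_scores)
    return (mean - k * std, mean + k * std)
```

A larger `k` widens the range, which lowers the chance of rejecting the legitimate user at the cost of accepting more impostor sessions.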
In one embodiment, as shown in Figure 3, the network user identity authentication system 1 further comprises a user legitimacy judgment module 14, which is connected to the classification threshold determination module 13, the session score calculation module 12 and the user session acquisition module 11. It is used to obtain a new session from the user session acquisition module 11 and to have the session score calculation module 12 calculate the score of the new session. When the score falls within the range of the legitimate user's classification threshold obtained by the classification threshold determination module 13, the current user is judged to be the legitimate user; when the score falls outside that range, the current user is judged not to be the legitimate user. The technical solution of the invention distinguishes the browsing behavior of different users in three respects, namely the sequence of URLs browsed, the content and the browsing time, thereby providing a reliable guarantee for the security of user accounts. In experimental tests, at a false alarm rate of 10%, the network identity authentication of the invention reaches a detection rate of 93.6%, which shows good verification performance and helps keep user accounts secure.
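The two rates quoted above can be computed from labelled session scores along the following lines. The definitions used here are assumptions, since the text does not spell them out: the false alarm rate is taken as the share of legitimate sessions that are rejected, and the detection rate as the share of non-legitimate sessions that are rejected. The function name `rates` and the example scores are hypothetical.

```python
def rates(legit_scores, impostor_scores, threshold_range):
    """Return (false_alarm_rate, detection_rate) for a threshold range."""
    low, high = threshold_range

    def rejected(score):
        # A session is rejected when its score lies outside the range.
        return not (low <= score <= high)

    false_alarm = sum(rejected(s) for s in legit_scores) / len(legit_scores)
    detection = sum(rejected(s) for s in impostor_scores) / len(impostor_scores)
    return false_alarm, detection
```

Sweeping the threshold range and recording both rates gives the trade-off curve from which an operating point such as a 10% false alarm rate can be read.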
In summary, the network user identity authentication method and system of the invention have the following beneficial effects: 1) sequence mining is performed on two factor pairs, the (URL, content) browsed by the user and (content, time), rather than on any single factor alone, so that the authentication method of the invention conforms to the user's browsing habits. 2) Association rules are used to mine the user's browsing habits from the combined (URL, content) pairs, and a normal distribution is used to find the time periods in which the user frequently accesses each content class. 3) Continuous authentication is achieved while the user browses web pages. The invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utility.
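The association mining on (URL, content) pairs mentioned in point 2) can be sketched with a small Apriori implementation. This is an illustrative sketch only: it assumes each browsing record is treated as one transaction containing a site item and a content class item, and the `site:` and `class:` prefixes in the usage example are hypothetical labels.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        level = {}
        for c in current:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets,
        # then prune candidates that have an infrequent (k-1)-subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

records = [
    {"site:news.example.com", "class:news"},
    {"site:news.example.com", "class:news"},
    {"site:shop.example.com", "class:shopping"},
    {"site:news.example.com", "class:shopping"},
]
frequent = apriori(records, min_support=0.5)
```

Feature values such as the number, length and support of the frequent itemsets matched by a session could then be derived from the returned dictionary.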
The above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the invention.
Claims (10)
1. A network user identity authentication method, characterized in that the method comprises:
collecting all web page browsing records of a legitimate user within a set time period, each browsing record comprising the URL of the browsed page, its text content and a timestamp; extracting the top-level domain from the URL of the browsed page; extracting keywords from the text content and thereby determining the content class to which the text content belongs; processing each browsing record into the form <URL top-level domain, content class, timestamp>; and taking all browsing records obtained within the set time period as one session;
obtaining m sessions of the legitimate user and, for each session: counting, from all browsing records in the session, the top-level domains the user accesses most frequently; mining the relation between URL top-level domain and content class in the browsing records using a set first algorithm; mining the relation between content class and time period in the browsing records using a set second algorithm, thereby obtaining n feature values of the user's web browsing; processing the obtained feature values according to a set third algorithm to obtain a weight matrix corresponding to the feature values; and calculating the score of the session from the feature values and the corresponding weight matrix;
calculating the classification threshold of the legitimate user from the scores of the m sessions using a fourth algorithm.
2. The network user identity authentication method according to claim 1, characterized in that the method further comprises: obtaining a new session and calculating the score of the new session; when the score falls within the range of the classification threshold, judging the current user to be the legitimate user; and when the score falls outside the range of the classification threshold, judging the current user not to be the legitimate user.
3. The network user identity authentication method according to claim 1, characterized in that the feature values comprise: the number of elements the session contains; the number of frequently accessed websites the session contains; the number of frequent itemsets the session matches; the number of frequently accessed websites contained in the frequent itemsets the session matches; the length of the longest frequent itemset the session matches; the average length of the frequent itemsets the session matches; the maximum support of the frequent itemsets the session matches; the average support of the frequent itemsets the session matches; the number of frequent time periods the session matches; and the target column.
4. The network user identity authentication method according to claim 1, characterized in that the first algorithm comprises the Apriori algorithm.
5. The network user identity authentication method according to claim 1, characterized in that the second algorithm comprises: using maximum likelihood estimation to calculate, from the browsing records of the session, the parameter values of the normal distribution that the user's browsing time for each content class follows.
6. The network user identity authentication method according to claim 5, characterized in that the parameter values comprise:
wherein time_i is the relative time at which the user browses content class content_i.
7. The network user identity authentication method according to claim 1, characterized in that the third algorithm comprises the logistic regression (LR) algorithm.
8. The network user identity authentication method according to claim 1, characterized in that the fourth algorithm comprises:
the classification threshold is then:
wherein score_legal_i is the score of the i-th session.
9. A network user identity authentication system, characterized in that the system comprises:
a user session acquisition module, used to collect all web page browsing records of a legitimate user within a set time period, each browsing record comprising the URL of the browsed page, its text content and a timestamp; to extract the top-level domain from the URL of the browsed page; to extract keywords from the text content and thereby determine the content class to which the text content belongs; to process each browsing record into the form <URL top-level domain, content class, timestamp>; and to take all browsing records obtained within the set time period as one session;
a session score calculation module, used, for a given session, to: count, from all browsing records in the session, the top-level domains the user accesses most frequently; mine the relation between URL top-level domain and content class in the browsing records using a set first algorithm; mine the relation between content class and time period in the browsing records using a set second algorithm, thereby obtaining n feature values of the user's web browsing; process the obtained feature values according to a set third algorithm to obtain a weight matrix corresponding to the feature values; and calculate the score of the session from the feature values and the corresponding weight matrix;
a classification threshold determination module, used to obtain multiple session scores of the legitimate user and to calculate the classification threshold of the legitimate user using a fourth algorithm.
10. The network user identity authentication system according to claim 9, characterized in that the system further comprises a user legitimacy judgment module, used to obtain a new session and calculate the score of the new session; when the score falls within the range of the classification threshold, the current user is judged to be the legitimate user; and when the score falls outside the range of the classification threshold, the current user is judged not to be the legitimate user.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510810443.1A CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
PCT/CN2016/070994 WO2017084205A1 (en) | 2015-11-20 | 2016-01-15 | Network user identity authentication method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510810443.1A CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105337987A true CN105337987A (en) | 2016-02-17 |
CN105337987B CN105337987B (en) | 2018-07-03 |
Family
ID=55288270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510810443.1A Active CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105337987B (en) |
WO (1) | WO2017084205A1 (en) |
Also Published As
Publication number | Publication date |
---|---|
CN105337987B (en) | 2018-07-03 |
WO2017084205A1 (en) | 2017-05-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||