CN105337987B - Method and system for network user identity authentication - Google Patents
- Publication number
- CN105337987B (application CN201510810443.1A)
- Authority
- CN
- China
- Prior art keywords
- session
- user
- score
- algorithm
- browsing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Security & Cryptography (AREA)
- Storage Device Security (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention provides a method and system for network user identity authentication. The method includes: collecting all web page browsing records of a legitimate user within a set time period as one session, and processing each browsing record into the form <top-level domain of the URL, content class, timestamp>; obtaining m sessions of the legitimate user and, for each session, obtaining n feature values of the user's web browsing from the session, processing the feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values, and calculating the score of the session from the feature values and the corresponding weight matrix; and calculating the classification threshold of the legitimate user from the session scores using a fourth algorithm. By performing continuous authentication from three aspects of the browsing records (URL, content and time), the technical solution improves authentication performance.
Description
Technical field
The present invention relates to network security technology, and in particular to a method and system for network user identity authentication.
Background art
With the development of information technology and the Internet, the number of Internet users in China keeps growing, and online shopping and transactions are increasingly frequent. Being online has become an indispensable part of many people's lives. At the same time, fraud in online transactions has risen sharply in recent years, and new forms of network fraud that combine social deception with technical means have become the primary security threat to people's online lives. Authenticating the identity of network users is an important way to secure online transactions. User identity authentication can be divided into two classes: one-time authentication and continuous authentication. One-time authentication currently includes traditional password-based authentication, smart-card-based authentication, and authentication based on the user's biometric and behavioral characteristics. However, one-time authentication checks the user only at a single moment and treats the identity as legitimate once the check passes, so it cannot protect the user well; continuous authentication was proposed for this reason. Research on continuous authentication is still relatively scarce, and existing work mainly studies the relationships within a user's URL sequence or within the content the user browses. These approaches do not consider user browsing behavior comprehensively enough, and their authentication performance leaves room for improvement.
In view of this, how to find a technical solution that further improves the security of network user identity authentication has become a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a method and system for network user identity authentication, to solve the problem in the prior art that the security of network user identity authentication needs further improvement.
To achieve the above and other related objects, the present invention provides a method for network user identity authentication, comprising: collecting all web page browsing records of a legitimate user within a set time period, each browsing record including the URL of the browsed page, the text content and a timestamp; extracting the top-level domain from the URL, extracting keywords from the text content to determine the content class to which the text content belongs, processing each browsing record into the form <top-level domain of the URL, content class, timestamp>, and taking all browsing records obtained within the set time period as one session; obtaining m sessions of the legitimate user and processing each session as follows: from all browsing records in the session, counting the top-level domains the user visits most frequently, mining the relationship between top-level domain and content class in the browsing records with a set first algorithm, and mining the relationship between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing; processing the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values; calculating the score of the session from the feature values and the corresponding weight matrix; and calculating the classification threshold of the legitimate user from the scores of the m sessions using a fourth algorithm.
Optionally, the method further comprises: obtaining a new session and calculating its score; when the score falls within the range of the classification threshold, determining that the current user is the legitimate user; when the score does not fall within that range, determining that the current user is not the legitimate user.
Optionally, the feature values include: the number of elements in the session; the number of frequently visited websites in the session; the number of frequent itemsets matched by the session; the number of frequently visited websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and the target column.
Optionally, the first algorithm includes the Apriori algorithm.
Optionally, the second algorithm includes: using maximum likelihood estimation to compute, from the browsing records of the session, the parameters of the normal distribution that the user's browsing time for each content class follows.
Optionally, the third algorithm includes the logistic regression (LR) algorithm.
Optionally, the fourth algorithm includes: [formula omitted in the source text]; the classification threshold is then [formula omitted in the source text], where score_legal_i is the score of the i-th session and there are m sessions in total.
Optionally, the set time period may be 30 minutes.
The present invention also provides a system for network user identity authentication, comprising: a user session acquisition module for collecting all web page browsing records of a legitimate user within a set time period, each browsing record including the URL of the browsed page, the text content and a timestamp, extracting the top-level domain from the URL, extracting keywords from the text content to determine the content class to which the text content belongs, processing each browsing record into the form <top-level domain of the URL, content class, timestamp>, and taking all browsing records obtained within the set time period as one session; a session score computing module for, for each session, counting from all browsing records in the session the top-level domains the user visits most frequently, mining the relationship between top-level domain and content class in the browsing records with a set first algorithm, mining the relationship between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing, processing the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values, and calculating the score of the session from the feature values and the corresponding weight matrix; and a classification threshold determining module for obtaining multiple session scores of the legitimate user and calculating the classification threshold of the legitimate user using a fourth algorithm.
Optionally, the system further includes a user legitimacy judgment module for obtaining a new session and calculating its score; when the score falls within the range of the classification threshold, the module determines that the current user is the legitimate user; when the score does not fall within that range, it determines that the current user is not the legitimate user.
Optionally, the feature values include: the number of elements in the session; the number of frequently visited websites in the session; the number of frequent itemsets matched by the session; the number of frequently visited websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and the target column.
Optionally, the first algorithm includes the Apriori algorithm.
Optionally, the second algorithm includes: using maximum likelihood estimation to compute, from the browsing records of the session, the parameters of the normal distribution that the user's browsing time for each content class follows.
Optionally, the third algorithm includes the logistic regression (LR) algorithm.
Optionally, the fourth algorithm includes: [formula omitted in the source text]; the classification threshold is then [formula omitted in the source text], where score_legal_i is the score of the i-th session and there are m sessions in total.
Optionally, the set time period may be 30 minutes.
As described above, the method and system for network user identity authentication of the present invention have the following beneficial effects: 1) The two factor pairs of user browsing, (URL, content) and (content, time), are mined as sequences rather than considering only one factor in isolation, so the authentication method matches the user's browsing habits. 2) Association rules are used to mine the user's browsing habits over (URL, content) jointly, and a normal distribution is used to find the time periods in which the user frequently visits each content class. 3) Continuous authentication is achieved while the user browses web pages.
Brief description of the drawings
Fig. 1 is a flow diagram of one embodiment of the method for network user identity authentication of the present invention.
Fig. 2 is a flow diagram of another embodiment of the method for network user identity authentication of the present invention.
Fig. 3 is a module diagram of one embodiment of the system for network user identity authentication of the present invention.
Reference numerals
1 Network user identity authentication system
11 User session acquisition module
12 Session score computing module
13 Classification threshold determining module
14 User legitimacy judgment module
S1~S4 Steps
Detailed description
Embodiments of the present invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention.
It should be noted that the drawings provided in this embodiment illustrate the basic concept of the invention only schematically. They show only the components related to the invention rather than the number, shape and size of the components in an actual implementation; in an actual implementation the form, quantity and proportion of each component may vary freely, and the component layout may be more complex.
The present invention provides a method for network user identity authentication that authenticates identity from the user's browsing behavior. In one embodiment, as shown in Fig. 1, the method includes:
Step S1: collect all web page browsing records of a legitimate user within a set time period, each browsing record including the URL of the browsed page, the text content and a timestamp; extract the top-level domain from the URL, extract keywords from the text content to determine the content class to which the text content belongs, process each browsing record into the form <top-level domain of the URL, content class, timestamp>, and take all browsing records obtained within the set time period as one session. In one embodiment, the web browsing records of a user are collected and processed into a session structure of the form {(domain, content, timestamp)}, which serves as the basis for subsequent analysis. The collected browsing records are divided at 30-minute intervals, so that every 30 minutes yields one session. Step S1 is performed multiple times, for example m times to obtain m sessions, and the m sessions are finally merged into the corresponding session set S. Subsequent authentication likewise takes one access behavior of the user (that is, one 30-minute session) as the unit.
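The 30-minute division in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_sessions` is hypothetical, timestamps are assumed to be seconds since the epoch, and windows are assumed to be aligned to multiples of 30 minutes, since the text does not say how the window boundaries are chosen.

```python
from collections import defaultdict

SESSION_SECONDS = 30 * 60  # the 30-minute period mentioned in the text


def split_into_sessions(records, window=SESSION_SECONDS):
    """Group (domain, content, timestamp) records into fixed time windows.

    Assumption: timestamps are integer seconds and each window covers
    [k * window, (k + 1) * window); the source does not specify alignment.
    """
    buckets = defaultdict(list)
    for record in sorted(records, key=lambda r: r[2]):
        buckets[record[2] // window].append(record)
    # Return the non-empty windows in chronological order.
    return [buckets[key] for key in sorted(buckets)]
```

Each returned list plays the role of one session; collecting m of them gives the session set S used in the later steps.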
In one embodiment, the browsing records of the legitimate user are first collected from the SQLite database that ships with the Chrome browser. This database records the details of each web page the user browses. For each user, the URL (Uniform Resource Locator, that is, the web page address), the text content and the timestamp of each browsed page are collected as the raw browsing record. A browsing record is denoted r, and its attributes are shown in Table 1 below:
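As a rough illustration of this collection step, the sketch below reads raw records from a Chrome-style history database with Python's `sqlite3` module. It assumes the commonly seen layout of Chrome's History file (a `urls` table with `id`, `url`, `title` and a `visits` table with `url`, `visit_time`, where `visit_time` counts microseconds since 1601-01-01 UTC); this layout is not described in the patent and may differ between browser versions. The function names are hypothetical, and the page title stands in for the text content.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: Chrome stores visit times as microseconds since 1601-01-01 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time_to_datetime(microseconds):
    """Convert a Chrome-style timestamp to a timezone-aware datetime."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)


def read_raw_records(conn):
    """Return (url, title, visit time) tuples ordered by visit time.

    Assumption: `conn` is a sqlite3 connection to a copy of the history
    database with `urls` and `visits` tables as described above.
    """
    rows = conn.execute(
        "SELECT urls.url, urls.title, visits.visit_time "
        "FROM visits JOIN urls ON visits.url = urls.id "
        "ORDER BY visits.visit_time"
    )
    return [(url, title, chrome_time_to_datetime(t)) for url, title, t in rows]
```

In practice the history file is usually copied before it is opened, because the browser keeps it locked while running.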
After the raw data is obtained, it is processed as follows. First, for each browsing record in the session, the top-level domain is extracted from its URL. Then the text classification samples from Sogou Labs, together with articles from the web, are used to extract keywords from the text content of each class; these keywords are matched against the title of the web page to be classified to obtain the content class of the page. After this processing, the first raw browsing record in Table 1, for example, becomes (news.163.com, society, timestamp). Data in this form is denoted web page p(domain, content, timestamp).
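The record processing just described can be sketched as below. The keyword lists are hypothetical stand-ins for the keywords the text derives from the Sogou Labs samples, and the simple substring count is an assumed matching rule. Because the example in the text, news.163.com, is a full host name, the sketch uses the host name of the URL as the "top-level domain".

```python
from urllib.parse import urlparse

# Hypothetical keyword lists; the source derives them from a labeled corpus.
CLASS_KEYWORDS = {
    "society": ["society", "community"],
    "sports": ["football", "match"],
}


def extract_domain(url):
    """Return the host name of the URL, e.g. 'news.163.com'."""
    return urlparse(url).hostname or ""


def classify_title(title, class_keywords=CLASS_KEYWORDS):
    """Pick the class whose keywords occur most often in the page title.

    Assumption: plain substring matching; titles that match no keyword
    are labeled 'unknown'.
    """
    lowered = title.lower()
    best_class, best_hits = "unknown", 0
    for content_class, keywords in class_keywords.items():
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits > best_hits:
            best_class, best_hits = content_class, hits
    return best_class


def to_page(url, title, timestamp):
    """Build the p(domain, content, timestamp) triple for one record."""
    return (extract_domain(url), classify_title(title), timestamp)
```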
Step S2: obtain m sessions of the legitimate user and process each session as follows: from all browsing records in the session, count the top-level domains the user visits most frequently, mine the relationship between top-level domain and content class in the browsing records with a set first algorithm, and mine the relationship between content class and time period in the browsing records with a set second algorithm, thereby obtaining n feature values of the user's web browsing; process the obtained feature values with a set third algorithm to obtain a weight matrix corresponding to the feature values; and calculate the score of the session from the feature values and the corresponding weight matrix. In one embodiment, the collected browsing records are divided at 30-minute intervals so that every 30 minutes yields one session; step S1 is performed m times to obtain m sessions, and the m sessions are finally merged into the corresponding session set S. The feature values include: the number of elements in the session; the number of frequently visited websites in the session; the number of frequent itemsets matched by the session; the number of frequently visited websites contained in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session; the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent time periods matched by the session; and the target column. The first algorithm includes the Apriori algorithm. Apriori is a frequent itemset algorithm for mining association rules; its core idea is to mine frequent itemsets in two stages, candidate generation and downward-closure checking. The algorithm has been widely applied in fields such as business and network security.
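The two stages of Apriori mentioned above (candidate generation and downward-closure pruning) can be illustrated with a small pure-Python sketch. It is a generic textbook version, not the patent's code: each transaction is assumed to be the set of items observed in one session, for example (domain, content) pairs, and support is taken as the fraction of transactions that contain an itemset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Assumption: an itemset is frequent when its support is strictly
    greater than min_support, mirroring support(fc) > delta in the text.
    """
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s > min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Downward closure: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent
```

The returned dictionary can serve as the frequent itemset collection of one user, with the support value kept for later feature extraction.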
The second algorithm includes: using maximum likelihood estimation to compute, from the browsing records of the session, the parameters of the normal distribution that the user's browsing time for each content class follows. The parameter values are [formula omitted in the source text], where time_i is the relative time at which the user browses content class content_i. These parameters are used to count the number of frequent time periods matched by the session.
The third algorithm includes the logistic regression (LR) algorithm. Logistic regression is a typical binary classification algorithm; the model it produces is relatively simple and easy to interpret, and it is not prone to overfitting. In essence it learns a mapping f: X -> Y. Given in advance an n-tuple of variables X = <X1, X2, ..., Xn> and an m-element target vector Y = <Y1, Y2, ..., Ym>, logistic regression learns a function f(X) that fits the target vector Y as closely as possible on the given variable values.
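For the maximum likelihood step of the second algorithm, the standard estimates for a normal distribution are the sample mean and the 1/n sample variance. The sketch below computes them for the browsing times of one content class. The helper `in_frequent_period` is an assumption: the text does not reproduce the interval it uses, so the sketch takes a period of k standard deviations around the mean, with a hypothetical parameter `k`.

```python
import math


def fit_normal_mle(times):
    """Maximum likelihood estimates (mu, sigma) of a normal distribution.

    `times` holds the relative browsing times of one content class.
    The MLE of the variance divides by n, not by n - 1.
    """
    n = len(times)
    mu = sum(times) / n
    variance = sum((t - mu) ** 2 for t in times) / n
    return mu, math.sqrt(variance)


def in_frequent_period(time, mu, sigma, k=1.0):
    """Assumed rule: the frequent period is [mu - k*sigma, mu + k*sigma]."""
    return mu - k * sigma <= time <= mu + k * sigma
```

For example, browsing times of 9, 10 and 11 (hours of the day) give a mean of 10, so a visit at 10.5 falls inside the assumed period while a visit at 12 does not.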
In one embodiment, different users browse particular content on different websites at different times. Based on this characteristic of user browsing, features are extracted from three aspects: frequently visited URLs, (URL, content) and (content, time period). From the session set {(domain, content, timestamp)}, feature extraction is carried out through frequent URL statistics, frequent itemset mining and frequent time period mining to obtain the user's browsing features. In one embodiment, statistics can be computed for multiple users at the same time:
Frequently visited website statistics: because each user frequently browses different web pages, the 15 top-level domains that each user visits most frequently are counted and placed in the frequently visited URL class FU_j of the corresponding user j.
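Counting the most frequently visited domains is a simple tally. The sketch below uses `collections.Counter`; the function name and the record layout (domain, content, timestamp) are assumptions for illustration, and 15 is the value stated in the text.

```python
from collections import Counter


def frequent_domains(pages, top_n=15):
    """Return the top_n most visited domains, most frequent first.

    `pages` is an iterable of (domain, content, timestamp) triples.
    """
    counts = Counter(domain for domain, _content, _timestamp in pages)
    return [domain for domain, _count in counts.most_common(top_n)]
```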
Frequent itemset mining: the Apriori algorithm is used to mine the sequence relationships that exist between (URL, content) pairs. In this process, a suitable support threshold δ is chosen by experiment. For a frequent itemset fc: X, Y, if support(fc) > δ, the frequent itemset is added to the web browsing frequent itemset collection FC_j of the corresponding user j.
The corresponding feature values of each session are then obtained. Each session in the session set has its own feature values, denoted fv_ji. In one embodiment, fv_ji = <length_i, pun_i, mrn_i, rpun_i, mrml_i, mral_i, mrms_i, mras_i, mtn_i, target_i>, where the meaning of each value is shown in Table 2.
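To make the feature vector more concrete, the sketch below computes eight of the ten values for one session. It rests on several assumptions that the text does not spell out: a session "matches" a frequent itemset when it contains every (domain, content) pair of that itemset, frequent itemsets are given as a dictionary from itemset to support, and rpun counts the distinct frequently visited domains that appear in the matched itemsets. The values mtn and target are left out because they depend on the time period rule and on the label.

```python
def session_features(session, frequent_domain_list, frequent_itemsets):
    """Compute length, pun, mrn, rpun, mrml, mral, mrms and mras.

    session: list of (domain, content, timestamp) triples.
    frequent_itemsets: {frozenset of (domain, content) pairs: support}.
    """
    frequent_domain_set = set(frequent_domain_list)
    pairs = {(domain, content) for domain, content, _ts in session}
    # Assumed matching rule: the itemset is a subset of the session's pairs.
    matched = {fc: s for fc, s in frequent_itemsets.items() if fc <= pairs}
    lengths = [len(fc) for fc in matched]
    supports = list(matched.values())
    matched_domains = {domain for fc in matched for domain, _content in fc}
    return {
        "length": len(session),
        "pun": sum(1 for d, _c, _t in session if d in frequent_domain_set),
        "mrn": len(matched),
        "rpun": len(matched_domains & frequent_domain_set),
        "mrml": max(lengths, default=0),
        "mral": sum(lengths) / len(lengths) if lengths else 0.0,
        "mrms": max(supports, default=0.0),
        "mras": sum(supports) / len(supports) if supports else 0.0,
    }
```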
After the feature value set of the sessions is obtained, the LR logistic regression algorithm is used to carry out the authentication method based on user browsing features (hereinafter UBFAA); the detailed procedure is shown in Algorithm 1.
Algorithm 1: Authentication method based on user browsing features (UBFAA)
Input: session set S* of the legitimate user, frequent itemsets FC of the legitimate user, frequently visited address set FU of the legitimate user, and frequent access time period set FT
Output: feature value weight matrix w, array score_legal
1) Traverse each session s*_i in the session set S* of the legitimate user
    mrtl = 0; // total length of the frequent itemsets matched by session s*_i
    mrts = 0; // total support of the frequent itemsets matched by session s*_i
    pun = 0;
    length = number of elements contained in session s*_i;
    target = 1;
  1.1) Traverse the frequently visited address set FU of the legitimate user
    if FU contains an fu_j equal to the top-level domain of a web page in the current session, then pun is incremented by 1;
  1.2) Traverse the frequent itemsets FC of the legitimate user
    if the current session contains frequent itemset fc_j:
    1.2.1) increment mrn by 1, add the length of the current frequent itemset to mrtl, and add its support to mrts;
    1.2.2) store the maximum support of the rules matched by the current session in mrms;
    1.2.3) store the maximum length of the rules matched by the current session in mrml;
    1.2.4) count the frequently visited websites contained in fc_j and store the number in rpun;
  1.3) Compute the average support mras and the average length mral of the rules matched by the current session;
  1.4) Traverse the frequent time period set FT of the legitimate user
    if FT contains an ft_j such that page.content = ft_j.content for a web page of the current session and page.time lies in the interval [interval omitted in the source text], then mtn is incremented by 1;
  1.5) Write the attributes of session s*_i into the ten-tuple set FV_i;
2) Traverse the ten-tuple set FV_i
  2.1) create the matrix datas, set its first column to all 1s, and store the features in the matrix;
  2.2) create the matrix labels and store the last column of FV_i in labels;
  2.3) create a weight matrix w of size 10*1 with all values 1;
3) Set the learning rate of the LR logistic regression to alpha = 0.01 and the maximum number of cycles to maxCycles = 500;
4) While the number of iterations is less than maxCycles, repeatedly apply gradient descent to update the weight matrix w;
5) Use the weight matrix w to compute the score of each session and store it in the array score_legal;
6) Return the weight matrix w and the legitimate session score array score_legal.
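Steps 2) to 4) of Algorithm 1 amount to fitting a logistic regression by batch gradient descent. The sketch below is a minimal pure-Python version under stated assumptions: the function names are hypothetical, labels are 0 or 1, the update minimizes the negative log-likelihood, and the defaults alpha = 0.01 and max_cycles = 500 are the values given in the text. The text does not say where sessions with target 0 come from, so the sketch simply takes the labels as input.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, alpha=0.01, max_cycles=500):
    """Return weights [w0, w1, ...] fitted by batch gradient descent.

    A leading 1 is added to every row (the all-ones first column of the
    data matrix), and all weights start at 1 as in step 2.3.
    """
    rows = [[1.0] + list(row) for row in features]
    weights = [1.0] * len(rows[0])
    for _ in range(max_cycles):
        # Prediction error for every training row with the current weights.
        errors = [
            sigmoid(sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        # Gradient of the negative log-likelihood, one component per weight.
        for j in range(len(weights)):
            gradient = sum(e * row[j] for e, row in zip(errors, rows))
            weights[j] -= alpha * gradient
    return weights
```

With nine feature values per session this yields ten weights, which matches the 10*1 weight matrix in step 2.3.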
Then, from the weight matrix w obtained by the above algorithm and the feature value vector fv_j corresponding to session j, the corresponding score score_j is calculated as follows:
For fv_i ∈ FV,
score = w0 + w1*fv_i.length + w2*fv_i.pun + ... + w10*fv_i.mtn
For the m sessions of the legitimate user, this gives the score array score_legal = {score_legal_1, score_legal_2, ..., score_legal_m}.
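The score is a weighted sum of the session's feature values plus a bias term. A minimal sketch, assuming the weights are ordered as [w0, w1, ...] with w0 the bias and the feature values listed in the same order as the remaining weights (the function name is hypothetical):

```python
def session_score(weights, feature_values):
    """Compute w0 + w1*x1 + w2*x2 + ... for one session."""
    if len(weights) != len(feature_values) + 1:
        raise ValueError("expected one weight per feature plus a bias weight")
    bias, rest = weights[0], weights[1:]
    return bias + sum(w * x for w, x in zip(rest, feature_values))
```

Applying this function to each of the m legitimate sessions gives the score array used to derive the classification threshold.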
Step S3: calculate the classification threshold of the legitimate user from the scores of the m sessions using the fourth algorithm. In one embodiment, the fourth algorithm includes: [formula omitted in the source text]; the classification threshold is [formula omitted in the source text], where score_legal_i is the score of the i-th session and there are m sessions in total.
In one embodiment, as shown in Fig. 2, the method further includes:
Step S4: obtain a new session and calculate its score; when the score falls within the range of the classification threshold, determine that the current user is the legitimate user; when the score does not fall within that range, determine that the current user is not the legitimate user. A current session (the new session) is obtained using the method of step S1, its score is calculated using the method of step S2, and the classification threshold from step S3 is then used to judge whether the user to whom the current session belongs is the legitimate user.
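The decision in step S4 reduces to a range check on the new session's score. Because the threshold formula is not reproduced in the text, the sketch below takes the range as an input pair (low, high) rather than computing it; the function name and the inclusive bounds are assumptions.

```python
def is_legitimate(score, threshold_range):
    """Return True when the score lies inside the classification range.

    threshold_range: (low, high) bounds produced by step S3; how they are
    derived is not shown here because the source omits the formula.
    """
    low, high = threshold_range
    return low <= score <= high
```

In a continuous authentication loop this check would run once per 30-minute session, so a user who fails it can be challenged again without waiting for the next login.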
The present invention also provides a system for network user identity authentication, which may use the method described above. In one embodiment, as shown in Fig. 3, the network user identity authentication system 1 includes a user session acquisition module 11, a session score computing module 12 and a classification threshold determining module 13, where:
The user session acquisition module 11 is used to collect all web page browsing records of the user within a set time period, each browsing record including the URL of the browsed page, the text content and a timestamp; to extract the top-level domain from the URL and extract keywords from the text content to determine the content class to which the text content belongs; to process each browsing record into the form <top-level domain of the URL, content class, timestamp>; and to take all browsing records obtained within the set time period as one session. In one embodiment, the set time period is 30 minutes.
The classification threshold determining module 13 is connected to the session score computing module 12 and is used to obtain multiple session scores of the legitimate user and to calculate the classification threshold of the legitimate user using the fourth algorithm. In one embodiment, the fourth algorithm includes: [formula omitted in the source text]; the classification threshold is then [formula omitted in the source text], where score_legal_i is the score of the i-th session and there are m sessions in total.
In one embodiment, as shown in Figure 3, the network user identity authentication system 1 further includes a user
legitimacy judgment module 14. The user legitimacy judgment module 14 is connected to the classification threshold
determination module 13, the session score calculation module 12, and the user session acquisition module 11. It
obtains a new session from the user session acquisition module 11 and has the session score calculation module 12
calculate the score of the new session. When the score falls within the range of the legitimate user's classification
threshold obtained by the classification threshold determination module 13, the current user is judged to be the
legitimate user; when the score does not fall within that range, the current user is judged not to be the legitimate
user.
The technical solution of the present invention distinguishes the browsing patterns of different users in three
respects: the sequence of URLs browsed, the content, and the browsing time, thereby providing a reliable safeguard for
the security of user accounts. In experimental tests, at a false alarm rate of 10%, the network identity
authentication of the present invention achieves a detection rate of 93.6%, a good verification result that helps
ensure user account security.
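The threshold and judgment steps performed by modules 13 and 14 can be illustrated with the sketch below. The source text does not reproduce the formula of the fourth algorithm, so the range used here (mean plus or minus `k` population standard deviations of the legitimate user's session scores) is purely an assumption for illustration, as are the names `threshold_range`, `is_legitimate`, and the parameter `k`.

```python
from statistics import mean, pstdev


def threshold_range(legit_scores, k=2.0):
    """Hypothetical range: mean +/- k population standard deviations.

    This stands in for the fourth algorithm, whose formula is not
    available in the source text.
    """
    center = mean(legit_scores)
    spread = pstdev(legit_scores)
    return center - k * spread, center + k * spread


def is_legitimate(new_score, bounds):
    """Accept the new session only if its score lies inside the range."""
    low, high = bounds
    return low <= new_score <= high
```

For example, `is_legitimate(score, threshold_range(scores))` would accept a new session whose score sits close to the legitimate user's historical scores and reject one far outside them. The actual range should follow the patent's own formula where it is available.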
In conclusion a kind of method for authentication of identification of network user and system of the present invention have the advantages that:1) will
(network address, content) and (content, time) two factors that user is browsed carry out the excavation of sequence rather than only examine merely
Wherein some factor is considered, so that the authentication method of the present invention meets the browsing custom of user.It 2), will using correlation rule
(network address, content) joint carries out the excavation that user browses custom;Based on normal distribution, to find frequency of the user to each content
Numerous access time section.3) certification of duration has been achieved the effect that during user browses webpage.So the present invention is effectively
It overcomes various shortcoming of the prior art and has high industrial utilization.
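The normal-distribution step mentioned in point 2) can be sketched as follows. For a normal distribution, the maximum-likelihood estimates are the sample mean and the variance computed with divisor n. The function names, the use of hour-of-day as the relative time, and the rule in `in_frequent_period` (within `width` standard deviations of the mean) are assumptions for illustration and are not taken from the patent.

```python
import math
from collections import defaultdict


def fit_normal_by_class(records):
    """Maximum-likelihood mean and standard deviation per content class.

    records: iterable of (content_class, relative_time) pairs.
    """
    times = defaultdict(list)
    for content_class, t in records:
        times[content_class].append(t)
    params = {}
    for content_class, values in times.items():
        mu = sum(values) / len(values)
        # The MLE of the variance divides by n, not n - 1.
        var = sum((v - mu) ** 2 for v in values) / len(values)
        params[content_class] = (mu, math.sqrt(var))
    return params


def in_frequent_period(t, mu, sigma, width=1.0):
    """Hypothetical rule: a time is 'frequent' within width * sigma of the mean."""
    return abs(t - mu) <= width * sigma
```

With browsing times of 8, 9, and 10 o'clock for a "news" class, the fit gives a mean of 9 and a standard deviation of about 0.82, so a visit at 9:30 counts as inside the frequent period under the assumed rule.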
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to
limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the
spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of
ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention
shall still be covered by the claims of the present invention.
Claims (7)
1. A network user identity authentication method, characterized in that the method comprises:
collecting all web browsing records of a legitimate user within a set time period, each browsing record including the
URL of the browsed web page, its text content, and a timestamp; extracting the top-level domain name from the URL,
extracting keywords from the text content and thereby determining the content class to which the text content belongs,
and processing each browsing record into the form <top-level domain name, content class, timestamp>; and taking all
browsing records obtained within the set time period as one session;
obtaining m sessions of the legitimate user and processing each session as follows: from all browsing records in the
session, counting the multiple top-level domain names that the user visits most frequently, mining the relationship
between top-level domain names and content classes in the browsing records using a set first algorithm, and mining the
relationship between content classes and time periods in the browsing records using a set second algorithm, thereby
obtaining n feature values of the user's web browsing; processing the obtained feature values according to a set third
algorithm to obtain a weight matrix corresponding to the feature values; and calculating the score of the session from
the feature values and the corresponding weight matrix;
calculating the classification threshold of the legitimate user from the scores of the m sessions using a fourth
algorithm;
the method further comprising: obtaining a new session and calculating the score of the new session; when the score
falls within the range of the classification threshold, judging that the current user is the legitimate user; when the
score does not fall within the range of the classification threshold, judging that the current user is not the
legitimate user;
wherein the fourth algorithm comprises: [formula omitted in source]; the classification threshold is then [formula
omitted in source], where $score_{legal,i}$ is the score of the i-th session.
2. The network user identity authentication method according to claim 1, characterized in that the feature values
include: the number of elements contained in the session; the number of frequently visited websites contained in the
session; the number of frequent itemsets matched by the session; the number of frequently visited websites contained
in the frequent itemsets matched by the session; the length of the longest frequent itemset matched by the session;
the average length of the frequent itemsets matched by the session; the maximum support of the frequent itemsets
matched by the session; the average support of the frequent itemsets matched by the session; the number of frequent
time periods matched by the session; and a target column.
3. The network user identity authentication method according to claim 1, characterized in that the first algorithm
comprises the Apriori algorithm.
4. The network user identity authentication method according to claim 1, characterized in that the second algorithm
comprises: using maximum likelihood estimation to calculate, from the browsing records of the session, the parameter
values of the normal distribution followed by the user's browsing time for each content class.
5. The network user identity authentication method according to claim 4, characterized in that the parameter values
include: [formula omitted in source], where $time_i$ is the relative time at which the user browses content class
$content_i$.
6. The network user identity authentication method according to claim 1, characterized in that the third algorithm
comprises the LR (logistic regression) algorithm.
7. A network user identity authentication system, characterized in that the system comprises:
a user session acquisition module, used to collect all web browsing records of a legitimate user within a set time
period, each browsing record including the URL of the browsed web page, its text content, and a timestamp; to extract
the top-level domain name from the URL, extract keywords from the text content and thereby determine the content class
to which the text content belongs, and process each browsing record into the form
<top-level domain name, content class, timestamp>; and to take all browsing records obtained within the set time
period as one session;
a session score calculation module, used to, for one session: count, from all browsing records in the session, the
multiple top-level domain names that the user visits most frequently; mine the relationship between top-level domain
names and content classes in the browsing records using a set first algorithm; mine the relationship between content
classes and time periods in the browsing records using a set second algorithm, thereby obtaining n feature values of
the user's web browsing; process the obtained feature values according to a set third algorithm to obtain a weight
matrix corresponding to the feature values; and calculate the score of the session from the feature values and the
corresponding weight matrix;
a classification threshold determination module, used to obtain the scores of multiple sessions of the legitimate user
and to calculate the classification threshold of the legitimate user using a fourth algorithm;
the system further comprising a user legitimacy judgment module, used to obtain a new session and calculate the score
of the new session; when the score falls within the range of the classification threshold, the current user is judged
to be the legitimate user; when the score does not fall within the range of the classification threshold, the current
user is judged not to be the legitimate user;
wherein the fourth algorithm comprises: [formula omitted in source]; the classification threshold is then [formula
omitted in source], where $score_{legal,i}$ is the score of the i-th session.
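As an illustration of the frequent-itemset mining that claim 3 assigns to the Apriori algorithm, the sketch below mines itemsets from sessions whose items are domain and content-class labels. It is a compact textbook-style version, not the patent's implementation; the function name `apriori`, the `site:` and `class:` item prefixes, and the `min_support` parameter are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

Given four sessions in which `site:a.com` and `class:news` co-occur three times, a call with `min_support=0.5` reports that pair with support 0.75. Features such as the number of matched frequent itemsets or their maximum support (claim 2) could then be derived by comparing a new session against the returned itemsets.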
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510810443.1A CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
PCT/CN2016/070994 WO2017084205A1 (en) | 2015-11-20 | 2016-01-15 | Network user identity authentication method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510810443.1A CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105337987A (en) | 2016-02-17 |
CN105337987B (en) | 2018-07-03 |
Family
ID=55288270
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201510810443.1A | Active | CN105337987B (en) | 2015-11-20 | 2015-11-20 | A kind of method for authentication of identification of network user and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105337987B (en) |
WO (1) | WO2017084205A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414212A (en) * | 2019-08-05 | 2019-11-05 | 国网电子商务有限公司 | A kind of multidimensional characteristic dynamic identity authentication method and system towards power business |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10154041B2 (en) * | 2015-01-13 | 2018-12-11 | Microsoft Technology Licensing, Llc | Website access control |
CN106776895B (en) * | 2016-11-29 | 2019-05-14 | 天津大学 | Interpersonal relationships based on person-to-person session information automates portrait method |
CN107368718B (en) * | 2017-07-06 | 2022-08-16 | 同济大学 | User browsing behavior authentication method and system |
CN109903067B (en) * | 2017-12-08 | 2021-07-16 | 北京京东尚科信息技术有限公司 | Information processing method and device |
CN110324292B (en) * | 2018-03-30 | 2022-01-07 | 富泰华工业(深圳)有限公司 | Authentication device, authentication method, and computer storage medium |
CN108632087B (en) * | 2018-04-26 | 2021-12-28 | 深圳市华迅光通信有限公司 | Internet access management method and system based on router |
CN109918873B (en) * | 2019-03-05 | 2022-12-06 | 西安电子科技大学 | Continuous identity authentication method for acquiring user interaction behavior by using mobile terminal |
CN117040923B (en) * | 2023-09-28 | 2024-03-19 | 联通(广东)产业互联网有限公司 | User behavior anomaly detection method and system based on Apriori algorithm |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104270358A (en) * | 2014-09-25 | 2015-01-07 | 同济大学 | Trusted network transaction system client side monitor and implementation method thereof |
CN104618372A (en) * | 2015-02-02 | 2015-05-13 | 同济大学 | Device and method for authenticating user identity based on WEB browsing habits |
CN104731914A (en) * | 2015-03-24 | 2015-06-24 | 浪潮集团有限公司 | Method for detecting user abnormal behavior based on behavior similarity |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10042927B2 (en) * | 2006-04-24 | 2018-08-07 | Yeildbot Inc. | Interest keyword identification |
CN103544150B (en) * | 2012-07-10 | 2016-03-09 | 腾讯科技(深圳)有限公司 | For browser of mobile terminal provides the method and system of recommendation information |
CN104838629B (en) * | 2012-12-07 | 2017-11-21 | 微秒资讯科技发展有限公司 | Use mobile device and the method and system that are authenticated by means of certificate to user |
- 2015-11-20: CN application CN201510810443.1A, patent CN105337987B, status: Active
- 2016-01-15: WO application PCT/CN2016/070994, patent WO2017084205A1, status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2017084205A1 (en) | 2017-05-26 |
CN105337987A (en) | 2016-02-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105337987B (en) | A kind of method for authentication of identification of network user and system | |
CN104077396B (en) | Method and device for detecting phishing website | |
WO2019218514A1 (en) | Method for extracting webpage target information, device, and storage medium | |
CN101820366B (en) | Pre-fetching-based fishing web page detection method | |
CN108959383A (en) | Analysis method, device and the computer readable storage medium of network public-opinion | |
CN106685936B (en) | Webpage tampering detection method and device | |
CN104199874A (en) | Webpage recommendation method based on user browsing behaviors | |
CN108416198A (en) | Man-machine identification model establishes device, method and computer readable storage medium | |
CN102170446A (en) | Fishing webpage detection method based on spatial layout and visual features | |
WO2016201938A1 (en) | Multi-stage phishing website detection method and system | |
CN101534306A (en) | Detecting method and a device for fishing website | |
CN104135365A (en) | A method, a server, and a client for verifying an access request | |
CN106302438A (en) | A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means | |
CN103838785A (en) | Vertical search engine in patent field | |
CN107341183A (en) | A kind of Website classification method based on darknet website comprehensive characteristics | |
CN107438083B (en) | Detection method for phishing site and its detection system under a kind of Android environment | |
CN104050243B (en) | It is a kind of to search for the network search method combined with social activity and its system | |
CN109413023A (en) | The training of machine recognition model and machine identification method, device, electronic equipment | |
CN107608980A (en) | Information-pushing method and system based on the analysis of DPI big datas | |
CN107818132A (en) | A kind of webpage agent discovery method based on machine learning | |
Tao | [Retracted] Application of Data Mining in the Analysis of Martial Arts Athlete Competition Skills and Tactics | |
CN109522692A (en) | Webpage machine behavioral value method and system | |
Camiña et al. | Towards building a masquerade detection method based on user file system navigation | |
CN106330861A (en) | Website detection method and apparatus | |
CN111125747B (en) | Commodity browsing privacy protection method and system for commercial website user |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||