CN104765884A - Fingerprint extraction method and fingerprint identification method of HTTPS web pages - Google Patents


Info

Publication number
CN104765884A
Authority
CN
China
Prior art keywords
length
fingerprint
unknown
https
https webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510213462.6A
Other languages
Chinese (zh)
Other versions
CN104765884B (en)
Inventor
余翔湛
何慧
张伟哲
叶麟
张宏莉
康宁
丛小亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201510213462.6A priority Critical patent/CN104765884B/en
Publication of CN104765884A publication Critical patent/CN104765884A/en
Application granted granted Critical
Publication of CN104765884B publication Critical patent/CN104765884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a fingerprint extraction method and a fingerprint identification method for HTTPS web pages. The fingerprint extraction method comprises: obtaining, from the data stream of an HTTPS web page to be processed, the ciphertext length and encryption mode of each of the multiple objects of the page; obtaining, from the ciphertext length and encryption mode of each object, the plaintext length interval of that object so as to determine the information of the object, where the information of an object comprises its maximum length, minimum length, and average length; and constructing the fingerprint of the page from the information of each of its objects. The fingerprint identification method extracts the information of the objects of the HTTPS web pages to be identified and matches it against the information in an HTTPS web page fingerprint database to complete identification. Both methods are highly feasible and achieve high identification accuracy.

Description

Fingerprint extraction method and fingerprint identification method for HTTPS web pages
Technical Field
The present invention relates to the field of computer technology, and in particular to a fingerprint extraction method and a fingerprint identification method for HTTPS web pages.
Background
With the development of traffic identification technology, the demand for it in network management has become increasingly widespread. Traffic identification is no longer confined to identifying applications, as it was in the past; more emphasis is now placed on methods for identifying encrypted traffic such as P2P, SSL, and SSH. With the development in recent years of the SSL protocol and its derivative, the TLS protocol, the HTTPS protocol (the combination of the HTTP protocol and the SSL protocol) has gradually risen.
HTTPS is a cryptographic protocol that ensures the secure transmission of web data. In HTTPS, HTTP is responsible for transmitting the web data, and the SSL protocol is responsible for data encryption and authentication. HTTPS is now widely used in critical services such as online banking, online payment, and e-commerce. Many web sites also transmit all of their data over HTTPS to secure the communication process. Even general web sites that usually use HTTP transmit pages involving users' private information, such as login and registration pages, over HTTPS, and some provide users with a dedicated HTTPS channel. HTTPS has therefore taken a place in the web communications market, and HTTPS encrypted traffic is increasingly widespread and will continue to grow. However, current techniques for identifying HTTPS encrypted traffic have low identification accuracy and poor feasibility.
Summary of the Invention
The invention provides a fingerprint extraction method and a fingerprint identification method for HTTPS web pages. Its purpose is to solve the problem that the identification accuracy for encrypted web page traffic based on the HTTPS protocol is currently low.
To achieve this purpose, the invention adopts the following technical solution:
A fingerprint extraction method for an HTTPS web page comprises: obtaining, from the data stream of an HTTPS web page to be processed, the ciphertext length and encryption mode of each of the multiple objects of the page; obtaining, from the ciphertext length and encryption mode of each object, the plaintext length interval of each object so as to determine the information of each object, where the information of each object comprises the maximum length, minimum length, and average length corresponding to that object; and building the fingerprint of the page from the information of each of its objects.
Preferably, in the step of obtaining the plaintext length interval of each object, for each of the multiple objects: when the object uses a stream cipher, its plaintext length interval is $L(D) = [L(E) - n\,L(Mac),\ L(E) - n\,L(Mac)]$; when the object uses a block cipher, its plaintext length interval is $L(D) = [L(E) - n\,L(Mac) - n - n(bs-1),\ L(E) - n\,L(Mac) - n]$. Here $L(D)$ denotes the plaintext length interval of the object; the expression to the left of the comma is the minimum length of the object and the expression to the right of the comma is its maximum length; $L(E)$ is the ciphertext length of the object; $L(Mac)$ is the length of the check information (the MAC) determined by the encryption mode of the object; $n$ is the number of fragments into which the object is split in transmission; and $bs$ is the block size used by the encryption mode of the object.
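As an illustration, with assumed values that do not come from the patent ($L(E) = 1000$, $L(Mac) = 20$, $n = 2$, $bs = 16$), the two formulas work out as follows:

```latex
\begin{aligned}
\text{stream cipher:}\quad & L(D) = [\,1000 - 2\cdot 20,\ 1000 - 2\cdot 20\,] = [\,960,\ 960\,] \\
\text{block cipher:}\quad  & L(D) = [\,1000 - 2\cdot 20 - 2 - 2\,(16-1),\ 1000 - 2\cdot 20 - 2\,] = [\,928,\ 958\,]
\end{aligned}
```

With a stream cipher the interval collapses to a single value, whereas with a block cipher the unknown padding in each fragment widens it by up to $n(bs-1)$ bytes.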
Preferably, the constructed fingerprint of the HTTPS web page to be processed is $fp = \{obj_i,\ i = 1, 2, \ldots, N_0\}$, where $N_0$ is the number of objects in the page, $fp$ is the fingerprint of the page, and $obj_i = \{obj_i\_min,\ obj_i\_max,\ obj_i\_s\}$. Here $obj_i\_min$ is the minimum length of the $i$-th object of the page, $obj_i\_max$ is the maximum length of the $i$-th object, and $obj_i\_s$ is the average length of the $i$-th object, with $obj_i\_s = \dfrac{obj_i\_min + obj_i\_max}{2}$.
A fingerprint identification method for HTTPS web pages comprises: capturing the data streams of a predetermined number of unknown HTTPS web pages to determine the ciphertext length and encryption mode of all unknown objects contained in those pages; obtaining, from the ciphertext length and encryption mode of each unknown object, the plaintext length interval of that unknown object so as to determine its information, where the information of each unknown object comprises the maximum length, minimum length, and average length corresponding to that unknown object; building, from the information of all unknown objects, the data set to be identified that corresponds to the predetermined number of unknown HTTPS web pages; and matching the data set to be identified against the fingerprint of each known HTTPS web page in a predetermined fingerprint database, so as to determine from the matching result the fingerprint of the known HTTPS web page corresponding to the data set to be identified, which is taken as the fingerprint identification result of the data set.
Preferably, the step of matching the data set to be identified against the fingerprint of each known HTTPS web page in the predetermined fingerprint database comprises, for each unknown object contained in the predetermined number of unknown HTTPS web pages: judging whether the plaintext length interval of each known object of each known HTTPS web page intersects the plaintext length interval of the unknown object and, if it does, storing the information of that known object in the match set of the known HTTPS web page to which it belongs; determining, among all known objects in the match set of each known HTTPS web page, the matching object of the unknown object, namely the one whose average length is closest to the average length of the unknown object; and storing the correspondence between the matching object and the unknown object in the match set of the known HTTPS web page to which the matching object belongs.
Preferably, the lower bound of the plaintext length interval of the unknown object may equal the minimum length of the unknown object minus a preset buffer factor, and the upper bound may equal the maximum length of the unknown object plus the buffer factor.
Preferably, the step of determining, from the matching result, the fingerprint of the known HTTPS web page corresponding to the data set to be identified comprises: calculating the matching coefficient of each known HTTPS web page from the number and total bytes of all unknown objects contained in the predetermined number of unknown HTTPS web pages, the number and total bytes of all known objects of each known HTTPS web page in the predetermined fingerprint database, the number and total bytes of all known objects in each match set, and the average lengths of the known object and the unknown object in each correspondence in each match set; removing, from the matching coefficients of all known HTTPS web pages, those smaller than a first coefficient threshold, and sorting all remaining matching coefficients in ascending order to obtain a sorted coefficient set; calculating, for every two adjacent coefficients in the current coefficient set, the ratio between the preceding and the following coefficient, and determining the minimum of all calculated ratios; deleting from the coefficient set the later of the two adjacent coefficients corresponding to this minimum, together with all matching coefficients ranked after it, so as to update the current coefficient set; determining a second coefficient threshold from the largest matching coefficient in the current coefficient set, and removing from the coefficient set the matching coefficients smaller than the second coefficient threshold; and taking the fingerprints of all known HTTPS web pages corresponding to the remaining matching coefficients as the fingerprint identification result of the data set to be identified.
Preferably, the second coefficient threshold equals a predetermined multiple of the largest matching coefficient in the current coefficient set, where the predetermined multiple takes a value between 0 and 1.
Compared with the prior art, the present invention has the following beneficial effects:
The fingerprint extraction method and fingerprint identification method for HTTPS web pages of the present invention are more feasible and achieve higher identification accuracy. They make it possible to manage network services effectively while ensuring information security, and to prevent offenders from transmitting illegal or harmful information through HTTPS-encrypted web pages.
Brief Description of the Drawings
Fig. 1 is a flowchart of an example of the fingerprint extraction method for HTTPS web pages according to an embodiment of the present invention; and
Fig. 2 is a flowchart of an example of the fingerprint identification method for HTTPS web pages according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and beneficial effects of the present invention clearer, embodiments of the invention are described below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
An embodiment of the present invention provides a fingerprint extraction method for an HTTPS web page. The method comprises: obtaining, from the data stream of an HTTPS web page to be processed, the ciphertext length and encryption mode of each of the multiple objects of the page; obtaining, from the ciphertext length and encryption mode of each object, the plaintext length interval of each object so as to determine the information of each object, where the information of each object comprises the maximum length, minimum length, and average length corresponding to that object; and building the fingerprint of the page from the information of each of its objects.
Fig. 1 shows a flowchart of an example process of the fingerprint extraction method for HTTPS web pages according to an embodiment of the present invention. As shown in Fig. 1, after the process starts, step S110 is performed first.
In step S110, the ciphertext length and encryption mode of each of the multiple objects of the HTTPS web page to be processed are obtained from the data stream of that page (which may be, for example, any one of several HTTPS web pages to be processed). Step S120 is then performed.
In step S120, the plaintext length interval of each object is obtained from the ciphertext length and encryption mode of each of the multiple objects of the page, so as to determine the information of each object, where the information of each object comprises the maximum length, minimum length, and average length corresponding to that object. Step S130 is then performed.
Preferably, in step S120, the plaintext length interval of each object can be obtained as follows. For each of the multiple objects: when the object uses a stream cipher, its plaintext length interval is $L(D) = [L(E) - n\,L(Mac),\ L(E) - n\,L(Mac)]$; when the object uses a block cipher, its plaintext length interval is $L(D) = [L(E) - n\,L(Mac) - n - n(bs-1),\ L(E) - n\,L(Mac) - n]$. Here $L(D)$ denotes the plaintext length interval of the object; the expression to the left of the comma is the minimum length of the object and the expression to the right of the comma is its maximum length; $L(E)$ is the ciphertext length of the object; $L(Mac)$ is the length of the check information (the MAC) determined by the encryption mode of the object; $n$ is the number of fragments into which the object is split in transmission; and $bs$ is the block size used by the encryption mode of the object.
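The interval computation of step S120 can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the function name, the parameter names, and the convention that `block_size=None` means a stream cipher are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def plaintext_length_interval(cipher_len: int,
                              mac_len: int,
                              fragments: int,
                              block_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (min_len, max_len), i.e. L(D), for one object.

    cipher_len -> L(E), mac_len -> L(Mac), fragments -> n, block_size -> bs.
    block_size=None is an assumed convention meaning "stream cipher".
    """
    # Every fragment carries one MAC, so n * L(Mac) bytes are never plaintext.
    base = cipher_len - fragments * mac_len
    if block_size is None:
        # Stream cipher: no padding, so the interval is a single value.
        return base, base
    # Block cipher: upper bound is L(E) - n*L(Mac) - n.
    upper = base - fragments
    # Lower bound subtracts a further n * (bs - 1) bytes of possible padding.
    lower = upper - fragments * (block_size - 1)
    return lower, upper
```

For example, `plaintext_length_interval(1000, 20, 2, block_size=16)` evaluates the block-cipher formula with $L(E) = 1000$, $L(Mac) = 20$, $n = 2$, $bs = 16$.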
In step S130, the fingerprint of the HTTPS web page to be processed is built from the information of each of its objects, which completes the fingerprint extraction for that page. The process then ends.
Preferably, the constructed fingerprint of the HTTPS web page to be processed can be $fp = \{obj_i,\ i = 1, 2, \ldots, N_0\}$, where $N_0$ is the number of objects in the page, $fp$ is the fingerprint of the page, and $obj_i = \{obj_i\_min,\ obj_i\_max,\ obj_i\_s\}$. Here $obj_i\_min$ is the minimum length of the $i$-th object of the page, $obj_i\_max$ is the maximum length of the $i$-th object, and $obj_i\_s$ is the average length of the $i$-th object, with $obj_i\_s = \dfrac{obj_i\_min + obj_i\_max}{2}$.
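One possible in-memory representation of such a fingerprint is sketched below. The class name `ObjectInfo` and the function `build_fingerprint` are hypothetical; the patent only specifies that each object carries a minimum, a maximum, and an average length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    """Information of one object: obj_i = {obj_i_min, obj_i_max, obj_i_s}."""
    min_len: int
    max_len: int

    @property
    def avg_len(self) -> float:
        # obj_i_s is the midpoint of the plaintext length interval.
        return (self.min_len + self.max_len) / 2


def build_fingerprint(intervals: List[Tuple[int, int]]) -> List[ObjectInfo]:
    """Build fp = {obj_i, i = 1..N_0} from the objects' plaintext length intervals."""
    return [ObjectInfo(lo, hi) for lo, hi in intervals]
```

Deriving the average from the two bounds, rather than storing it, keeps the three values consistent by construction.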
As the description above shows, the fingerprint extraction method for HTTPS web pages according to this embodiment obtains the plaintext length interval of each object from the ciphertext length and encryption mode of each of the multiple objects of the page, determines the information of each object from that interval, and thereby obtains the fingerprint of the page. This extraction method is highly feasible, facilitates subsequent decryption and the implementation of the fingerprint identification process, and leads to higher accuracy in subsequent identification.
In addition, an embodiment of the present invention provides a fingerprint identification method for HTTPS web pages. The method comprises: capturing the data streams of a predetermined number of unknown HTTPS web pages to determine the ciphertext length and encryption mode of all unknown objects contained in those pages; obtaining, from the ciphertext length and encryption mode of each unknown object, the plaintext length interval of that unknown object so as to determine its information, where the information of each unknown object comprises the maximum length, minimum length, and average length corresponding to that unknown object; building, from the information of all unknown objects, the data set to be identified that corresponds to the predetermined number of unknown HTTPS web pages; and matching the data set to be identified against the fingerprint of each known HTTPS web page in a predetermined fingerprint database, so as to determine from the matching result the fingerprint of the known HTTPS web page corresponding to the data set to be identified, which is taken as the fingerprint identification result of the data set.
Fig. 2 shows a flowchart of an example process of the fingerprint identification method for HTTPS web pages according to an embodiment of the present invention. As shown in Fig. 2, after the process starts, step S210 is performed first.
In step S210, the data streams of a predetermined number of unknown HTTPS web pages are captured to determine the ciphertext length and encryption mode of all unknown objects contained in those pages. Step S220 is then performed.
In step S220, the plaintext length interval of each unknown object is obtained from its ciphertext length and encryption mode so as to determine the information of each unknown object, where the information of each unknown object comprises the maximum length, minimum length, and average length corresponding to that unknown object. Step S230 is then performed.
In step S230, the data set to be identified that corresponds to the predetermined number of unknown HTTPS web pages is built from the information of all unknown objects. Step S240 is then performed.
In step S240, the data set to be identified is matched against the fingerprint of each known HTTPS web page in the predetermined fingerprint database, so as to determine from the matching result the fingerprint of the known HTTPS web page corresponding to the data set to be identified, which is taken as the fingerprint identification result of the data set. The process then ends.
Preferably, in step S240, the data set to be identified can be matched against the fingerprint of each known HTTPS web page in the predetermined fingerprint database as follows. For each unknown object contained in the predetermined number of unknown HTTPS web pages, it is judged whether the plaintext length interval of each known object of each known HTTPS web page intersects the plaintext length interval of the unknown object. If it does, the information of that known object is stored in the match set of the known HTTPS web page to which it belongs. Then, among all known objects in the match set of each known HTTPS web page, the matching object of the unknown object is determined as the one that minimizes the distance between its average length and the average length of the unknown object (that is, the difference between the two average lengths), and the correspondence between the matching object and the unknown object is stored in the match set of the known HTTPS web page to which the matching object belongs. Here, the lower bound of the plaintext length interval of the unknown object may equal the minimum length of the unknown object minus a preset buffer factor, and the upper bound may equal the maximum length of the unknown object plus the buffer factor.
Preferably, in step S240, the fingerprint of the known HTTPS web page corresponding to the data set to be identified can be determined as follows. The matching coefficient of each known HTTPS web page is calculated from the number and total bytes of all unknown objects contained in the predetermined number of unknown HTTPS web pages, the number and total bytes of all known objects of each known HTTPS web page in the predetermined fingerprint database, the number and total bytes of all known objects in each match set, and the average lengths of the known object and the unknown object in each correspondence in each match set. From the matching coefficients of all known HTTPS web pages, those smaller than a first coefficient threshold are removed, and all remaining matching coefficients are sorted in ascending order to obtain a sorted coefficient set. For every two adjacent coefficients in the current coefficient set, the ratio between the preceding and the following coefficient is calculated, and the minimum of all calculated ratios is determined. The later of the two adjacent coefficients corresponding to this minimum, together with all matching coefficients ranked after it, is deleted from the coefficient set so as to update the current coefficient set. A second coefficient threshold is determined from the largest matching coefficient in the current coefficient set, and the matching coefficients smaller than the second coefficient threshold are removed from the coefficient set. The fingerprints of all known HTTPS web pages corresponding to the remaining matching coefficients are taken as the fingerprint identification result of the data set to be identified.
The second coefficient threshold can, for example, equal a predetermined multiple of the largest matching coefficient in the current coefficient set, where the predetermined multiple takes a value between 0 and 1.
An application example of the fingerprint identification method for HTTPS web pages according to an embodiment of the present invention is described below.
First, once a complete HTTPS data stream has been captured, its data is analyzed to obtain the encryption algorithm and message digest algorithm of the stream, and the plaintext length intervals of all response objects in the stream are calculated and stored in the unknown object set. After a predetermined number (for example, 10) of HTTPS data streams have been captured, the unknown object set contains $N_u$ objects in total, and fingerprint identification is then performed on these $N_u$ objects.
Let $UKOBJ$ denote the data set to be identified described above. Then:
$$UKOBJ = \{ukobj_k,\ k = 1, 2, \ldots, N_u\}.$$
Here $N_u$ is the total number of unknown objects contained in the predetermined number of unknown HTTPS web pages, $ukobj_k$ is the information of the $k$-th of those unknown objects, and $ukobj_k = \{ukobj_k\_min,\ ukobj_k\_max,\ ukobj_k\_s\}$.
$ukobj_k\_min$ is the minimum length of the $k$-th unknown object, $ukobj_k\_max$ is its maximum length, and $ukobj_k\_s$ is its average length, with $ukobj_k\_s = \dfrac{ukobj_k\_min + ukobj_k\_max}{2}$.
Suppose the predetermined fingerprint database contains the fingerprints of $M$ known HTTPS web pages, expressed as $\{fp_m,\ m = 1, 2, \ldots, M\}$, where $fp_m$ is the fingerprint of the $m$-th known HTTPS web page in the database and $obj_j^{(m)}$ is the information of the $j$-th known object of this $m$-th known HTTPS web page:
$$obj_j^{(m)} = \{obj_j^{(m)}\_min,\ obj_j^{(m)}\_max,\ obj_j^{(m)}\_s\}.$$
Here $obj_j^{(m)}\_min$ is the minimum length of the $j$-th known object of the $m$-th known HTTPS web page, $obj_j^{(m)}\_max$ is its maximum length, and $obj_j^{(m)}\_s$ is its average length, with $obj_j^{(m)}\_s = \dfrac{obj_j^{(m)}\_min + obj_j^{(m)}\_max}{2}$.
For each unknown object contained in the predetermined number of unknown HTTPS web pages, each known object of each known HTTPS web page is tested as follows: does the plaintext length interval of the known object, $[obj_j^{(m)}\_min,\ obj_j^{(m)}\_max]$, intersect the plaintext length interval of the unknown object, $[ukobj_k\_min,\ ukobj_k\_max]$? If it does, the information of the known object is stored in the match set $R_m$ of this known HTTPS web page, and the subsequent tests continue; otherwise, the next known object is tested directly.
In one example, the test can check whether $[obj_j^{(m)}\_min,\ obj_j^{(m)}\_max]$ and $[ukobj_k\_min - \alpha,\ ukobj_k\_max + \alpha]$ intersect; if they do, $obj_j^{(m)}$ is stored in the corresponding match set $R_m$. Here $\alpha$ is the buffer factor. Adding the buffer factor offsets, to some extent, the effect that differences between browsers or system kernels have on the HTTP message headers. The value of $\alpha$ can, for example, lie between 10 and 30.
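The buffered intersection test can be sketched as a single predicate. The function and parameter names are assumptions; `alpha` stands for the buffer factor $\alpha$, and setting it to 0 gives the unbuffered test.

```python
def intervals_intersect(known_min: int, known_max: int,
                        unknown_min: int, unknown_max: int,
                        alpha: int = 0) -> bool:
    """True if [known_min, known_max] intersects
    [unknown_min - alpha, unknown_max + alpha]."""
    # Two closed intervals intersect exactly when each one starts
    # no later than the other one ends.
    return known_min <= unknown_max + alpha and unknown_min - alpha <= known_max
```

Only the unknown object's interval is widened here, which follows the wording of the text; widening both intervals would be a different design.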
Then, for each unknown object contained in the predetermined number of unknown HTTPS web pages, the matching object of that unknown object is determined among all known objects in the match set $R_m$ of each known HTTPS web page, such that the distance between the average length $obj_{j'}^{(m)}\_s$ of the matching object and the average length $ukobj_{k'}\_s$ of the unknown object is minimal. The correspondence between the information $obj_{j'}^{(m)}$ of the matching object and the information $ukobj_{k'}$ of the unknown object is stored in the match set $R_m$ of this known HTTPS web page, where $k'$ indexes the unknown object and $j'$ indexes the known object chosen as its match.
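Selecting the matching object for one unknown object against one known web page could look like the sketch below. Objects are represented as `(min_len, max_len)` tuples, and `best_match` is a hypothetical name; the patent does not say how ties between equally close candidates are broken, so this sketch simply keeps the first one.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (min_len, max_len) of one object


def average(interval: Interval) -> float:
    """Average length: the midpoint of the plaintext length interval."""
    return (interval[0] + interval[1]) / 2


def best_match(unknown: Interval,
               known_objects: List[Interval],
               alpha: int = 0) -> Optional[Interval]:
    """Return the known object closest in average length to `unknown`,
    considering only known objects whose interval intersects the
    unknown object's interval widened by the buffer factor alpha."""
    lo, hi = unknown[0] - alpha, unknown[1] + alpha
    candidates = [k for k in known_objects if k[0] <= hi and lo <= k[1]]
    if not candidates:
        return None  # no known object of this page matches
    target = average(unknown)
    # min() returns the first candidate on ties (an assumed tie-break).
    return min(candidates, key=lambda k: abs(average(k) - target))
```

Calling `best_match` once per unknown object and per known page, and recording each non-`None` result as a pair, yields the correspondences that the match set $R_m$ is described as holding.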
Then, for the match set $R_m$ of each known HTTPS web page, the following information is extracted: (1) the number $uk\_num$ of all unknown objects contained in the predetermined number of unknown HTTPS web pages; (2) the total bytes $uk\_bytes$ contained in the predetermined number of unknown HTTPS web pages; (3) the number $fp_m\_num$ of all known objects of this known HTTPS web page; (4) the total bytes $fp_m\_bytes$ of all known objects of this known HTTPS web page; (5) the number $R_m\_num$ of all known objects in the match set $R_m$ (that is, the number of known objects of the current HTTPS web page that were matched to unknown objects); (6) the total bytes $R_m\_bytes$ of all known objects in the match set $R_m$; (7) the average length $obj_{j'}^{(m)}\_s$ of the known object in each correspondence in the match set $R_m$; and (8) the average length $ukobj_{k'}\_s$ of the unknown object in each correspondence in the match set $R_m$.
From these eight kinds of information, the matching coefficient of each known HTTPS web page is calculated:
$$\omega_m = \frac{R_m\_num}{fp_m\_num} \times \frac{R_m\_num}{uk\_num} \times \frac{R_m\_bytes}{fp_m\_bytes} \times \frac{R_m\_bytes}{uk\_bytes} \times \prod_{k'=1}^{N_k} \min\!\left(\frac{obj_{j'}^{(m)}\_s}{ukobj_{k'}\_s},\ \frac{ukobj_{k'}\_s}{obj_{j'}^{(m)}\_s}\right),$$
where $N_k$ is the number of unknown objects contained in the match set $R_m$.
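The matching coefficient can be computed directly from the eight quantities. In this sketch the function name and the `pairs` argument, a list of `(known_avg, unknown_avg)` tuples with one entry per correspondence in $R_m$, are assumptions about how the inputs are passed in.

```python
from math import prod
from typing import List, Tuple


def matching_coefficient(r_num: int, r_bytes: int,
                         fp_num: int, fp_bytes: int,
                         uk_num: int, uk_bytes: int,
                         pairs: List[Tuple[float, float]]) -> float:
    """Compute omega_m for one known web page.

    r_*  -> R_m_num, R_m_bytes   (match set)
    fp_* -> fp_m_num, fp_m_bytes (known page)
    uk_* -> uk_num, uk_bytes     (unknown data set)
    pairs: (obj_j'_s, ukobj_k'_s) average lengths, one per correspondence.
    """
    # Coverage of the known page and of the unknown set, by count and by bytes.
    coverage = ((r_num / fp_num) * (r_num / uk_num)
                * (r_bytes / fp_bytes) * (r_bytes / uk_bytes))
    # Each pair contributes a factor in (0, 1]; it equals 1 only when the
    # two average lengths are identical.
    similarity = prod(min(k / u, u / k) for k, u in pairs)
    return coverage * similarity
```

Because every factor is at most 1, $\omega_m$ reaches 1 only when the match set covers both the known page and the unknown set completely and every matched pair has identical average lengths.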
From the matching coefficients of all known HTTPS web pages, those smaller than the first coefficient threshold $\beta$ are removed, and all remaining matching coefficients are sorted in ascending order to obtain the sorted coefficient set $\rho = \{\omega_{k_1}, \omega_{k_2}, \ldots, \omega_{k_{N\rho 1}}\} = \{\omega_{k_p},\ p = 1, 2, \ldots, N\rho 1\}$, where $k_p = k_1, k_2, \ldots, k_{N\rho 1}$ are the sequence numbers of the known HTTPS web pages corresponding to the matching coefficients, and $N\rho 1$ is the number of matching coefficients in the current coefficient set (that is, in the coefficient set after the matching coefficients smaller than the first coefficient threshold $\beta$ have been removed). If the coefficient set $\rho$ is empty at this point, the predetermined number of HTTPS flows contain no traffic produced by a known HTTPS web page in the predetermined fingerprint database, and fingerprint identification is complete; otherwise, processing continues as follows. The value of the first coefficient threshold $\beta$ can be set empirically or determined by testing, and is not described in detail here.
For every two adjacent coefficients in the current coefficient set, the ratio between the preceding and the following coefficient is calculated (here $p = 1, 2, \ldots, N\rho 1 - 1$), and the minimum of all calculated ratios is determined. The later of the two adjacent coefficients corresponding to this minimum, together with all matching coefficients ranked after it, is deleted from the coefficient set, which updates the current coefficient set to $\rho' = \{\omega_{k'_1}, \omega_{k'_2}, \ldots, \omega_{k'_{N\rho 2}}\} = \{\omega_{k'_{p'}},\ p' = 1, 2, \ldots, N\rho 2\}$, where $k'_{p'} = k'_1, k'_2, \ldots, k'_{N\rho 2}$ and $N\rho 2$ is the number of matching coefficients in the current coefficient set (that is, in the coefficient set after this deletion).
Then, the second coefficient threshold $\beta'$ is determined from the largest matching coefficient $\omega_{k_c}$ in the current coefficient set. The second coefficient threshold $\beta'$ can equal a predetermined multiple of the largest matching coefficient in the current coefficient set, where the predetermined multiple takes a value between 0 and 1. For example, $\beta' = \theta \cdot \omega_{k_c}$, with $k_c \in \{k_1, k_2, \ldots, k_{N\rho 1}\}$, where $\theta$ ($0 < \theta < 1$) is a preset ratio coefficient (the predetermined multiple above). The matching coefficients smaller than the second coefficient threshold $\beta'$ are removed from the coefficient set, giving the coefficient set formed by the remaining matching coefficients, $\rho'' = \{\omega_{k''_1}, \omega_{k''_2}, \ldots, \omega_{k''_{N\rho 3}}\} = \{\omega_{k''_{p''}},\ p'' = 1, 2, \ldots, N\rho 3\}$. Then $k''_{p''} = k''_1, k''_2, \ldots, k''_{N\rho 3}$ is the result of fingerprint identification, and $N\rho 3$ is the number of matching coefficients in the current coefficient set (that is, in the coefficient set after the matching coefficients smaller than the second coefficient threshold $\beta'$ have been removed). In other words, the fingerprints of all known HTTPS web pages corresponding to the remaining matching coefficients are the fingerprint identification result of the data set to be identified. The value of the predetermined multiple can be set empirically or determined by testing, and is not described in detail here.
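The two threshold filters can be sketched as follows. This sketch deliberately covers only the removal below $\beta$ and the removal below $\beta' = \theta \cdot \max$; the adjacent-ratio cut that the text places between them is left out, so the result is not the complete procedure. The function name and the dictionary keyed by a page identifier are assumptions.

```python
from typing import Dict


def filter_coefficients(coeffs: Dict[str, float],
                        beta: float,
                        theta: float) -> Dict[str, float]:
    """Apply the first threshold beta, then the second threshold
    beta' = theta * (largest remaining coefficient), with 0 < theta < 1.

    Assumption: the adjacent-ratio pruning step is NOT applied here.
    """
    # First threshold: drop coefficients smaller than beta.
    kept = {page: w for page, w in coeffs.items() if w >= beta}
    if not kept:
        # Empty set: no known page in the database produced this traffic.
        return {}
    # Second threshold, relative to the best remaining match.
    beta_prime = theta * max(kept.values())
    return {page: w for page, w in kept.items() if w >= beta_prime}
```

A relative second threshold keeps several pages when their coefficients are close to the best one, which fits the text's statement that the result may contain the fingerprints of more than one known web page.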
As the above description shows, the fingerprint identification method for HTTPS web pages according to the embodiment of the present invention uses the fingerprint extraction method described above to obtain the fingerprints of unknown HTTPS web pages, compares them with the fingerprints in a predetermined fingerprint database, and determines the identification result from the comparison. The method is highly feasible and achieves high identification accuracy.
The fingerprint extraction method and fingerprint identification method for HTTPS web pages according to the embodiments of the present invention allow network services to be managed more effectively while information security is preserved, and can also prevent lawbreakers from transmitting illegal or harmful information by means of HTTPS-encrypted web pages.
Although the embodiments are disclosed as above, their content is adopted only to facilitate understanding of the technical solution of the present invention and is not intended to limit the invention. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the core technical solution disclosed, but the scope of protection of the invention is still defined by the appended claims.

Claims (8)

1. A fingerprint extraction method for HTTPS web pages, characterized in that the fingerprint extraction method comprises:
obtaining, from the data stream of an HTTPS web page to be processed, the ciphertext length and encryption mode of each of multiple objects of the HTTPS web page to be processed;
obtaining, from the ciphertext length and encryption mode of each of the multiple objects of the HTTPS web page to be processed, the plaintext length interval of each object, so as to determine the information of each object, wherein the information of each object comprises the maximum length, minimum length and average length corresponding to that object; and
constructing the fingerprint of the HTTPS web page to be processed from the information of each of its multiple objects.
2. The fingerprint extraction method according to claim 1, characterized in that, in the step of obtaining the plaintext length interval of each of the multiple objects:
for each object among the multiple objects,
when the object uses stream encryption, the plaintext length interval of the object is $L(D) = [L(E) - n \cdot L(\mathrm{Mac}),\ L(E) - n \cdot L(\mathrm{Mac})]$,
when the object uses block encryption, the plaintext length interval of the object is $L(D) = [L(E) - n \cdot L(\mathrm{Mac}) - n - n(bs - 1),\ L(E) - n \cdot L(\mathrm{Mac}) - n]$,
wherein $L(D)$ denotes the plaintext length interval of the object, the expression to the left of the comma in the interval denotes the minimum length of the object, and the expression to the right of the comma denotes the maximum length of the object; $L(E)$ denotes the ciphertext length of the object, $L(\mathrm{Mac})$ denotes the length of the check information determined by the encryption mode of the object, $n$ denotes the number of fragments into which the object is split during transmission, and $bs$ denotes the block size used by the encryption mode of the object.
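For illustration only, and outside the claim language, the two interval formulas can be evaluated with a short sketch such as the following. The function name `plaintext_interval` and its parameter names are hypothetical; the arithmetic follows the two expressions for $L(D)$ given above.

```python
def plaintext_interval(cipher_len, mac_len, n_fragments, mode, block_size=None):
    """Return (min_len, max_len) of the plaintext for one object.

    cipher_len  -- L(E), total ciphertext length of the object
    mac_len     -- L(Mac), check-information length per fragment
    n_fragments -- n, number of fragments the object was sent in
    mode        -- "stream" or "block" (hypothetical labels)
    block_size  -- bs, required when mode == "block"
    """
    base = cipher_len - n_fragments * mac_len
    if mode == "stream":
        # Stream encryption: the interval collapses to a single value.
        return base, base
    if mode == "block":
        if block_size is None:
            raise ValueError("block_size is required for block encryption")
        upper = base - n_fragments
        lower = upper - n_fragments * (block_size - 1)
        return lower, upper
    raise ValueError("unknown encryption mode: %r" % (mode,))


stream_iv = plaintext_interval(1000, 20, 2, "stream")               # (960, 960)
block_iv = plaintext_interval(1000, 20, 2, "block", block_size=16)  # (928, 958)
```

With block encryption the interval has width $n(bs - 1)$, which is why a block-encrypted object yields a range rather than a single value.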
3. The fingerprint extraction method according to claim 1 or 2, characterized in that the constructed fingerprint of the HTTPS web page to be processed is:
$fp = \{obj_i,\ i = 1, 2, \ldots, N_0\}$,
wherein $N_0$ denotes the number of objects included in the HTTPS web page to be processed, $fp$ denotes the fingerprint of the HTTPS web page to be processed,
$obj_i = \{obj_i\_min,\ obj_i\_max,\ obj_i\_s\}$,
$obj_i\_min$ denotes the minimum length of the i-th object of the HTTPS web page to be processed, $obj_i\_max$ denotes the maximum length of the i-th object, $obj_i\_s$ denotes the average length of the i-th object, and
$obj_i\_s = \dfrac{obj_i\_min + obj_i\_max}{2}$.
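As a non-authoritative illustration of this structure, a fingerprint can be represented as a list of per-object records, each holding the minimum, maximum and average length. The function name `build_fingerprint` and the dictionary keys are assumptions made for the sketch; only the averaging rule comes from the formula above.

```python
def build_fingerprint(intervals):
    """Build fp = {obj_i} from a list of (min_len, max_len) intervals.

    Each object record stores obj_i_min, obj_i_max and obj_i_s, where
    obj_i_s is the midpoint of the interval as in the formula above.
    """
    return [
        {"min": lo, "max": hi, "s": (lo + hi) / 2}
        for lo, hi in intervals
    ]


fp = build_fingerprint([(928, 958), (960, 960)])
# fp[0] is {"min": 928, "max": 958, "s": 943.0}
```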
4. A fingerprint identification method for HTTPS web pages, characterized in that the fingerprint identification method comprises:
capturing the data streams of a predetermined number of unknown HTTPS web pages, so as to determine the ciphertext length and encryption mode of all unknown objects included in the predetermined number of unknown HTTPS web pages;
obtaining, from the ciphertext length and encryption mode of each unknown object, the plaintext length interval of each unknown object, so as to determine the information of each unknown object, wherein the information of each unknown object comprises the maximum length, minimum length and average length corresponding to that unknown object;
constructing, from the information of all the unknown objects, the data set to be identified that corresponds to the predetermined number of unknown HTTPS web pages; and
matching the data set to be identified against the fingerprint of each known HTTPS web page in a predetermined fingerprint database, and determining from the matching result the fingerprint of the known HTTPS web page corresponding to the data set to be identified, as the fingerprint identification result for the data set to be identified.
5. The fingerprint identification method according to claim 4, characterized in that the step of matching the data set to be identified against the fingerprint of each known HTTPS web page in the predetermined fingerprint database comprises:
for each unknown object included in the predetermined number of unknown HTTPS web pages,
judging whether the plaintext length interval of each known object of each known HTTPS web page intersects the plaintext length interval of the unknown object, and, if there is an intersection, storing the information of that known object in the match set corresponding to the known HTTPS web page to which the known object belongs,
determining the matching object of the unknown object among all known objects contained in the match set corresponding to each known HTTPS web page, such that the distance between the average length of the matching object and the average length of the unknown object is minimal, and storing the correspondence between the matching object and the unknown object in the match set corresponding to the known HTTPS web page to which the matching object belongs.
6. The fingerprint identification method according to claim 5, characterized in that the lower bound of the plaintext length interval of the unknown object equals the minimum length of the unknown object minus a preset buffer factor, and the upper bound of the plaintext length interval of the unknown object equals the maximum length of the unknown object plus the buffer factor.
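For illustration only, and outside the claim language, the interval-intersection test with a buffer factor and the nearest-average selection can be sketched for a single known web page as follows. Treating one known page at a time is an assumption of the sketch, as are the function name `best_match`, the `buffer` parameter name and the dictionary keys.

```python
def best_match(unknown, known_objects, buffer=0):
    """Pick the known object that best matches one unknown object.

    unknown       -- dict with "min", "max", "s" for the unknown object
    known_objects -- list of such dicts for one known page (assumption)
    buffer        -- preset buffer factor widening the unknown interval
    """
    lo = unknown["min"] - buffer
    hi = unknown["max"] + buffer
    # Two closed intervals intersect when each starts before the other ends.
    candidates = [k for k in known_objects if k["min"] <= hi and k["max"] >= lo]
    if not candidates:
        return None
    # Among the intersecting objects, choose the one whose average length
    # is closest to that of the unknown object.
    return min(candidates, key=lambda k: abs(k["s"] - unknown["s"]))


known = [
    {"min": 900, "max": 925, "s": 912.5},
    {"min": 945, "max": 975, "s": 960.0},
]
unknown = {"min": 980, "max": 990, "s": 985.0}
no_hit = best_match(unknown, known)         # None: no interval intersects
hit = best_match(unknown, known, buffer=5)  # the second known object
```

The example shows the effect of the buffer factor: widening the unknown interval by 5 bytes on each side turns a near miss into a match.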
7. The fingerprint identification method according to any one of claims 4 to 6, characterized in that the step of determining from the matching result the fingerprint of the known HTTPS web page corresponding to the data set to be identified comprises:
calculating the matching coefficient corresponding to each known HTTPS web page from: the number and total bytes of all unknown objects included in the predetermined number of unknown HTTPS web pages; the number and total bytes of all known objects included in each known HTTPS web page in the predetermined fingerprint database; the number and total bytes of all known objects included in each match set; and the average length of the known object and the average length of the unknown object in each correspondence included in each match set;
among the matching coefficients corresponding to all known HTTPS web pages, removing those smaller than a first coefficient threshold, and sorting all remaining matching coefficients in ascending order to obtain the sorted coefficient set;
calculating the front-to-back coefficient ratio of every two adjacent coefficients in the current coefficient set, determining the minimum of all the calculated ratios, and deleting from the coefficient set the latter matching coefficient of the two adjacent coefficients corresponding to this minimum together with all matching coefficients ranked after it, so as to update the current coefficient set;
determining a second coefficient threshold from the largest matching coefficient in the current coefficient set, removing the matching coefficients in the coefficient set that are smaller than the second coefficient threshold, and taking the fingerprints of all known HTTPS web pages corresponding to the remaining matching coefficients as the fingerprint identification result for the data set to be identified.
8. The fingerprint identification method according to claim 7, characterized in that the second coefficient threshold equals a predetermined multiple of the largest matching coefficient in the current coefficient set, wherein the predetermined multiple takes a value between 0 and 1.
CN201510213462.6A 2015-04-30 2015-04-30 A kind of fingerprint identification method of HTTPS webpages Active CN104765884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510213462.6A CN104765884B (en) 2015-04-30 2015-04-30 A kind of fingerprint identification method of HTTPS webpages

Publications (2)

Publication Number Publication Date
CN104765884A true CN104765884A (en) 2015-07-08
CN104765884B CN104765884B (en) 2018-06-22

Family

ID=53647711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510213462.6A Active CN104765884B (en) 2015-04-30 2015-04-30 A kind of fingerprint identification method of HTTPS webpages

Country Status (1)

Country Link
CN (1) CN104765884B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741745A (en) * 2009-12-29 2010-06-16 苏州融通科技有限公司 Method and system for identifying application traffic of peer-to-peer network
US20110314269A1 (en) * 2009-12-10 2011-12-22 Angelos Stavrou Website Detection
CN102404396A (en) * 2011-11-14 2012-04-04 北京星网锐捷网络技术有限公司 Method, device and system for identifying peer-to-peer (P2P) flow and equipment
CN104038389A (en) * 2014-06-19 2014-09-10 高长喜 Multiple application protocol identification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUAN ZHENLONG等: ""PPI:towards precise page identification for encrypted web-browsing traffic"", 《PROCEEDINGS OF THE NINTH ACM/IEEE SYMPOSIUM ON ARCHITECTURE FOR NETWORKING AND COMMUNICATIONS SYSTEMS.IEEE PRESS》 *
吴家顺: ""Website指纹识别攻击与防护技术研究"", 《中国优秀硕士学位论文全文数据库》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018233379A1 (en) * 2017-06-23 2018-12-27 武汉斗鱼网络科技有限公司 Method and device for obtaining data plaintext, electronic terminal, and readable storage medium
CN109831448A (en) * 2019-03-05 2019-05-31 南京理工大学 For the detection method of particular encryption web page access behavior
CN112788159A (en) * 2020-12-31 2021-05-11 山西三友和智慧信息技术股份有限公司 Webpage fingerprint tracking method based on DNS traffic and KNN algorithm
CN112788159B (en) * 2020-12-31 2022-07-08 山西三友和智慧信息技术股份有限公司 Webpage fingerprint tracking method based on DNS traffic and KNN algorithm
CN113407880A (en) * 2021-05-06 2021-09-17 中南大学 Access behavior identification method suitable for encrypted HTTP/2 webpage
CN116016365A (en) * 2023-01-06 2023-04-25 哈尔滨工业大学 Webpage identification method based on data packet length information under encrypted flow
CN116016365B (en) * 2023-01-06 2023-09-19 哈尔滨工业大学 Webpage identification method based on data packet length information under encrypted flow

Also Published As

Publication number Publication date
CN104765884B (en) 2018-06-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant