CN111737702A - Web fingerprint identification method based on Chebyshev inequality - Google Patents

Web fingerprint identification method based on Chebyshev inequality

Info

Publication number
CN111737702A
Authority
CN
China
Prior art keywords
web
web fingerprint
fingerprint
fingerprints
triple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010574486.5A
Other languages
Chinese (zh)
Inventor
武军成
张攀
张长英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN202010574486.5A
Publication of CN111737702A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a web fingerprint identification method based on the Chebyshev inequality and belongs to the technical field of network security. The method comprises the following steps: collecting the features of known web fingerprints and establishing a triple library; collecting the web fingerprint features of a target website and calculating a sample value of the target website's web fingerprint; and substituting that sample value into the Chebyshev inequality while traversing the triple library to judge whether the target website's web fingerprint matches a web fingerprint in the triple library. Compared with the prior art, the distinguishing technical feature of this scheme is that the sample value of the collected features is calculated as a linear weighted sum, and web fingerprint identification is then performed using the Chebyshev inequality; according to the invention, the resulting identification is more accurate than one-by-one comparison against a traditional feature library.

Description

Web fingerprint identification method based on Chebyshev inequality
Technical Field
The invention relates to the technical field of network security, in particular to a web fingerprint identification method based on Chebyshev inequality.
Background
When a penetration test is carried out, the first step is to collect information about the target, gathering as much as possible through social engineering, web fingerprint identification, directory scanning, subdomain brute-forcing, co-hosted site and C-segment queries, and the like. This is the first step of a penetration test and also the most important one; the amount of information collected often determines the result of the test.
Web fingerprint identification performs component identification on a website: whether the website uses a CMS framework and which one, whether it runs Nginx or Tomcat, whether the front-end framework is Vue.js or React.js, whether the back end is written in Java or PHP, whether it uses Struts or WebLogic, and so on. It belongs to the information collection phase that precedes a penetration test; once this information has been collected, targeted attacks can be launched so that the target system is attacked quickly and accurately.
Existing web fingerprint identification matches the collected features one by one against a feature library. One disadvantage is that it depends on a very large feature library: the more data the library holds, the more accurate the identification result. The other disadvantage is low identification accuracy: to avoid having their web fingerprints identified, some websites change their original features into other information, so the expected features are absent and the web fingerprint is likely not to be identified.
The problem addressed by this patent application is the technical deficiency of existing web fingerprint identification, which matches the collected features one by one against a huge feature library. The technical essence of the solution is to use the Chebyshev inequality to identify the web fingerprint. Compared with the prior art, its distinguishing technical feature is that, after the features are collected, a sample value is calculated as a linear weighted sum, and the Chebyshev inequality is then used to identify the web fingerprint.
Disclosure of Invention
The invention addresses the technical deficiency of existing web fingerprint identification, which matches the collected features one by one against a huge feature library. The solution identifies the web fingerprint using the Chebyshev inequality; compared with the prior art, its distinguishing technical feature is that, after the features are collected, a sample value is calculated as a linear weighted sum, and the Chebyshev inequality is then used to identify the web fingerprint.
In order to achieve the purpose, the invention adopts the following technical scheme:
a web fingerprint identification method based on Chebyshev inequality comprises the following steps:
collecting the characteristics of known web fingerprints and establishing a triple library;
collecting the characteristics of the web fingerprint of the target website, and calculating to obtain a sample value of the web fingerprint of the target website;
and substituting the sample value of the target website web fingerprint into the Chebyshev inequality, traversing the triple library, and judging whether the target website web fingerprint accords with the web fingerprint in the triple library.
Further, the collecting the characteristics of the known web fingerprints and establishing a triple library comprise:
the method comprises the steps of collecting a large number of characteristics of known web fingerprints, establishing a sample set according to a linear weight method, and establishing a triple library according to the sample set.
Further, the collecting a plurality of characteristics of known web fingerprints, establishing a sample set according to a linear weighting method, and establishing a triple library according to the sample set includes:
calculating the sample value $X$ of each web fingerprint from the different elements of its features according to their weight proportions, wherein the sample value $X$ is calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
$x_1$, $x_2$, $x_3$, $x_4$ are the different elements that make up the features of each web fingerprint; each takes the value 1 if the element is present and 0 if it is not; wherein:
$x_1$ is the URL path, $x_2$ is the HTML source code keyword, $x_3$ is the HTTP response header keyword, and $x_4$ is the default error page keyword;
$r_1$, $r_2$, $r_3$, $r_4$ are the corresponding weight coefficients of the elements, with $r_1 + r_2 + r_3 + r_4 = 100\%$; wherein:
$r_1$ is the weight coefficient of the URL path, $r_2$ is the weight coefficient of the HTML source code keyword, $r_3$ is the weight coefficient of the HTTP response header keyword, and $r_4$ is the weight coefficient of the default error page keyword;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values according to the calculation formula, and establishing a sample set of the web fingerprints;
calculating the mathematical expectation $\mu$ and the standard deviation $\sigma$ of the sample set of each web fingerprint from the large number of sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$, and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
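The library-building step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `sample_value` and `build_triple_library` are hypothetical, the weights reuse the 30/20/30/20 split from the patent's worked example, the observation data is made up, and the choice of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import fmean, pstdev

# Weights r1..r4 (URL path, HTML keyword, response-header keyword,
# error-page keyword). The text only requires that they sum to 100%;
# this particular split is borrowed from the worked example.
WEIGHTS = (0.3, 0.2, 0.3, 0.2)


def sample_value(features, weights=WEIGHTS):
    """Return X = x1*r1 + x2*r2 + x3*r3 + x4*r4 for 0/1 feature flags."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(x * r for x, r in zip(features, weights))


def build_triple_library(observations):
    """Return a list of (name, mu, sigma) triples.

    `observations` maps a fingerprint name to the feature vectors
    collected from sites known to run that software.
    """
    library = []
    for name, vectors in observations.items():
        values = [sample_value(v) for v in vectors]
        mu = fmean(values)
        sigma = pstdev(values)  # population standard deviation (an assumption)
        library.append((name, mu, sigma))
    return library


# Made-up observations, for illustration only.
observations = {
    "wordpress": [(1, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 1)],
    "drupal": [(0, 1, 1, 0), (0, 0, 1, 0)],
}
library = build_triple_library(observations)
for triple in library:
    print(triple)
```

In practice each fingerprint would need many more observations than shown here for the mean and standard deviation to be meaningful.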
Further, the step of traversing the triple library by substituting the sample value of the target website web fingerprint into the chebyshev inequality to determine whether the target website web fingerprint matches the web fingerprint in the triple library includes:
the Chebyshev inequality is as follows:
$$P\{|X_{\mathrm{Target}} - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$$
$X_{\mathrm{Target}}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation, and $\varepsilon^2$ is a constant, set as $T$; if the probability $P$ obtained satisfies $P \le \sigma^2/\varepsilon^2$, the web fingerprint of the target website is considered to be the web fingerprint in the triple corresponding to the mathematical expectation $\mu$ and the standard deviation $\sigma$.
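For reference, the standard form of Chebyshev's inequality for a random variable $X$ with finite mean $\mu$ and variance $\sigma^2$, together with its complementary form, is given below. The complementary form is not written out in the source; it is included here only because it makes the role of the constant $T = \varepsilon^2$ easier to see.

```latex
P\{|X-\mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}
\quad\Longleftrightarrow\quad
P\{|X-\mu| < \varepsilon\} \ge 1 - \frac{\sigma^2}{\varepsilon^2},
\qquad \varepsilon > 0 .
```

With $\varepsilon^2 = T$ the bound is $\sigma^2/T$, which is informative (below 1) only when $\sigma^2 < T$.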
Compared with the prior art, the invention has the beneficial effects that:
the distinguishing technical feature of this scheme is that the sample value of the collected features is calculated as a linear weighted sum, and web fingerprint identification is then performed using the Chebyshev inequality; according to the invention, the resulting identification is more accurate than one-by-one comparison against a traditional feature library.
Drawings
FIG. 1 is a flow chart of a web fingerprint identification method based on Chebyshev inequality according to the present invention.
Detailed Description
The present invention will be further described with reference to the following examples, which are intended to illustrate only some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, other embodiments used by those skilled in the art without any creative effort belong to the protection scope of the present invention.
Example 1:
as shown in fig. 1, a web fingerprint identification method based on chebyshev inequality includes the following steps:
collecting the characteristics of known web fingerprints and establishing a triple library;
collecting the characteristics of the web fingerprint of the target website, and calculating to obtain a sample value of the web fingerprint of the target website;
and substituting the sample value of the target website web fingerprint into the Chebyshev inequality, traversing the triple library, and judging whether the target website web fingerprint accords with the web fingerprint in the triple library.
Further, the collecting the characteristics of the known web fingerprints and establishing a triple library comprise:
the method comprises the steps of collecting a large number of characteristics of known web fingerprints, establishing a sample set according to a linear weight method, and establishing a triple library according to the sample set.
Further, the collecting a plurality of characteristics of known web fingerprints, establishing a sample set according to a linear weighting method, and establishing a triple library according to the sample set includes:
calculating the sample value $X$ of each web fingerprint from the different elements of its features according to their weight proportions, wherein the sample value $X$ is calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
$x_1$, $x_2$, $x_3$, $x_4$ are the different elements that make up the features of each web fingerprint; each takes the value 1 if the element is present and 0 if it is not; wherein:
$x_1$ is the URL path, $x_2$ is the HTML source code keyword, $x_3$ is the HTTP response header keyword, and $x_4$ is the default error page keyword;
$r_1$, $r_2$, $r_3$, $r_4$ are the corresponding weight coefficients of the elements, with $r_1 + r_2 + r_3 + r_4 = 100\%$; wherein:
$r_1$ is the weight coefficient of the URL path, $r_2$ is the weight coefficient of the HTML source code keyword, $r_3$ is the weight coefficient of the HTTP response header keyword, and $r_4$ is the weight coefficient of the default error page keyword;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values according to the calculation formula, and establishing a sample set of the web fingerprints;
calculating the mathematical expectation $\mu$ and the standard deviation $\sigma$ of the sample set of each web fingerprint from the large number of sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$, and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
Further, the step of traversing the triple library by substituting the sample value of the target website web fingerprint into the chebyshev inequality to determine whether the target website web fingerprint matches the web fingerprint in the triple library includes:
the Chebyshev inequality is as follows:
$$P\{|X_{\mathrm{Target}} - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$$
$X_{\mathrm{Target}}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation, and $\varepsilon^2$ is a constant, set as $T$; if the probability $P$ obtained satisfies $P \le \sigma^2/\varepsilon^2$, the web fingerprint of the target website is considered to be the web fingerprint in the triple corresponding to the mathematical expectation $\mu$ and the standard deviation $\sigma$.
Specific examples are given below for illustration:
1) First, the sample value is calculated from the elements that form each web fingerprint feature according to their weight proportions, using the formula:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
A large number of websites with known web fingerprints are found on the Internet, sample values are obtained with the formula above, and the mathematical expectation $\mu$ and standard deviation $\sigma$ of each fingerprint's sample set are calculated from these sample values. Assume that the mathematical expectation $\mu$ and standard deviation $\sigma$ of the WordPress fingerprint are 1 and 0.5, respectively.
The web fingerprint name, the mathematical expectation $\mu$, and the standard deviation $\sigma$ are stored as a triple in a database.
2) Given a target website http://test.com, features are first collected from the target to obtain the website's web fingerprint, in the following steps:
2.1 Access http://test.com/wp-admin and check whether it is accessible; if so, $x_1 = 1$, otherwise $x_1 = 0$;
2.2 Access http://test.com, inspect the HTML source code of the home page, and check whether the keyword wordpress is present; if so, $x_2 = 1$, otherwise $x_2 = 0$;
2.3 Access http://test.com, inspect the response headers returned by the server, and check whether they contain an X-Powered-By field naming php; if so, $x_3 = 1$, otherwise $x_3 = 0$;
2.4 Access a random nonexistent path, for example http://test.com/2332/34sfsd/, and check whether the error message contains the keyword wordpress; if so, $x_4 = 1$, otherwise $x_4 = 0$;
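Steps 2.1 to 2.4 can be sketched in Python as below. This is an illustrative sketch only: the `Response` type and the `collect_features` function are hypothetical names, the responses are canned rather than fetched over the network, and treating any status code below 400 as "accessible" is an assumption the source does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def collect_features(admin, home, missing, keyword="wordpress"):
    """Return (x1, x2, x3, x4) as 0/1 flags for steps 2.1 to 2.4.

    admin   -- response for /wp-admin
    home    -- response for the home page
    missing -- response for a random nonexistent path
    """
    kw = keyword.lower()
    # 2.1: treat any status below 400 as "accessible" (an assumption).
    x1 = int(admin.status < 400)
    # 2.2: keyword present in the home page HTML source.
    x2 = int(kw in home.body.lower())
    # 2.3: X-Powered-By header mentions php (header names are case-insensitive).
    lowered = {k.lower(): v for k, v in home.headers.items()}
    x3 = int("php" in lowered.get("x-powered-by", "").lower())
    # 2.4: keyword present in the error page for a nonexistent path.
    x4 = int(kw in missing.body.lower())
    return (x1, x2, x3, x4)


# Canned responses, for illustration only; no network access is performed.
admin = Response(status=200)
home = Response(status=200, headers={"Server": "nginx"},
                body="<link href='/wp-content/themes/x.css'> WordPress")
missing = Response(status=404, body="Page not found | WordPress")
print(collect_features(admin, home, missing))
```

Separating the probing from the network layer keeps the feature logic testable; a real scanner would build the three `Response` objects from actual HTTP requests to the target.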
Assume that $x_1$, $x_2$, $x_3$, $x_4$ take the values 1, 1, 0, 1, respectively, and that the weights $r_1$, $r_2$, $r_3$, $r_4$ are 30%, 20%, 30%, 20%. The sample value is then calculated with the linear weight formula above, so the sample value of http://test.com is
$$X = 1 \times 30\% + 1 \times 20\% + 0 \times 30\% + 1 \times 20\% = 0.7;$$
3) The sample value obtained in the previous step is substituted into the Chebyshev inequality, the triples obtained in the first step are traversed with $\varepsilon^2 = 0.5$, and it is judged whether the probability $P$ obtained is less than or equal to $\sigma^2/\varepsilon^2$. The Chebyshev inequality is as follows:
$$P\{|X - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$$
The triple of WordPress is (Wordpress, 1, 0.5). When the traversal reaches the WordPress triple, the sample value $X = 0.7$ of http://test.com is substituted into the Chebyshev inequality, giving:
$$P\{|0.7 - 1| \ge \varepsilon\} \le \sigma^2/\varepsilon^2 = 0.25/0.5 = 0.5$$
Because the bound obtained is $P \le 0.5$, the website http://test.com is considered to use the WordPress framework.
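The traversal in step 3 can be sketched in Python as follows. The source does not spell out how the probability $P$ is evaluated for a single target value, so this sketch adopts one possible reading and labels it as an assumption: a triple is accepted when the target's sample value lies within $\varepsilon = \sqrt{T}$ of the stored $\mu$, and the Chebyshev bound $\sigma^2/T$ is reported alongside the match. The function names, the second triple, and the sample values are hypothetical.

```python
import math


def chebyshev_bound(sigma, t):
    """Upper bound sigma^2 / eps^2 on P{|X - mu| >= eps}, with eps^2 = t."""
    if t <= 0:
        raise ValueError("t (eps squared) must be positive")
    return sigma ** 2 / t


def match_fingerprints(x_target, library, t):
    """Traverse (name, mu, sigma) triples and return (name, bound) matches.

    Assumed decision rule (not stated in this form by the source): accept a
    triple when the target lies within eps of mu, where eps = sqrt(t).
    """
    eps = math.sqrt(t)
    matches = []
    for name, mu, sigma in library:
        bound = chebyshev_bound(sigma, t)
        if abs(x_target - mu) < eps:
            matches.append((name, bound))
    return matches


# The WordPress triple and T = 0.5 come from the example; the second triple
# and the sample values below are made up for illustration.
library = [("wordpress", 1.0, 0.5), ("examplecms", 0.1, 0.2)]
for x in (0.2, 0.6, 0.9):
    print(x, match_fingerprints(x, library, t=0.5))
```

Under this reading a target can match more than one triple when the stored means are close together, so a practical implementation would need a tie-breaking rule, for example preferring the smallest deviation from $\mu$.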
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A web fingerprint identification method based on Chebyshev inequality is characterized by comprising the following steps:
collecting the characteristics of known web fingerprints and establishing a triple library;
collecting the characteristics of the web fingerprint of the target website, and calculating to obtain a sample value of the web fingerprint of the target website;
and substituting the sample value of the target website web fingerprint into the Chebyshev inequality, traversing the triple library, and judging whether the target website web fingerprint accords with the web fingerprint in the triple library.
2. The chebyshev-inequality-based web fingerprint identification method according to claim 1, wherein the gathering of the features of the known web fingerprints and the establishing of the triple library comprise:
the method comprises the steps of collecting a large number of characteristics of known web fingerprints, establishing a sample set according to a linear weight method, and establishing a triple library according to the sample set.
3. The chebyshev-inequality-based web fingerprint identification method according to claim 2, wherein the collecting features of a large number of known web fingerprints, establishing a sample set according to a linear weighting method, and establishing a triplet library according to the sample set comprises:
calculating the sample value $X$ of each web fingerprint from the different elements of its features according to their weight proportions, wherein the sample value $X$ is calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
$x_1$, $x_2$, $x_3$, $x_4$ are the different elements that make up the features of each web fingerprint; each takes the value 1 if the element is present and 0 if it is not; wherein:
$x_1$ is the URL path, $x_2$ is the HTML source code keyword, $x_3$ is the HTTP response header keyword, and $x_4$ is the default error page keyword;
$r_1$, $r_2$, $r_3$, $r_4$ are the corresponding weight coefficients of the elements, with $r_1 + r_2 + r_3 + r_4 = 100\%$; wherein:
$r_1$ is the weight coefficient of the URL path, $r_2$ is the weight coefficient of the HTML source code keyword, $r_3$ is the weight coefficient of the HTTP response header keyword, and $r_4$ is the weight coefficient of the default error page keyword;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values according to the calculation formula, and establishing a sample set of the web fingerprints;
calculating the mathematical expectation $\mu$ and the standard deviation $\sigma$ of the sample set of each web fingerprint from the large number of sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$, and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
4. The chebyshev-inequality-based web fingerprint identification method according to claim 3, wherein the step of traversing the triple library by substituting the sample value of the web fingerprint of the target website into the chebyshev-inequality for determining whether the web fingerprint of the target website conforms to the web fingerprint in the triple library comprises:
the Chebyshev inequality is as follows:
$$P\{|X_{\mathrm{Target}} - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$$
$X_{\mathrm{Target}}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation, and $\varepsilon^2$ is a constant, set as $T$; if the probability $P$ obtained satisfies $P \le \sigma^2/\varepsilon^2$, the web fingerprint of the target website is considered to be the web fingerprint in the triple corresponding to the mathematical expectation $\mu$ and the standard deviation $\sigma$.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010574486.5A CN111737702A (en) 2020-06-22 2020-06-22 Web fingerprint identification method based on Chebyshev inequality

Publications (1)

Publication Number Publication Date
CN111737702A (en) 2020-10-02

Family

ID=72650436

Country Status (1)

Country Link
CN (1) CN111737702A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347328A (en) * 2020-10-27 2021-02-09 杭州安恒信息技术股份有限公司 Network platform identification method, device, equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065095A (en) * 2013-01-29 2013-04-24 四川大学 WEB vulnerability scanning method and vulnerability scanner based on fingerprint recognition technology
CN104954192A (en) * 2014-03-27 2015-09-30 东华软件股份公司 Network flow monitoring method and device
CN105337985A (en) * 2015-11-19 2016-02-17 北京师范大学 Attack detection method and system
US20190294642A1 (en) * 2017-08-24 2019-09-26 Bombora, Inc. Website fingerprinting
CN110311888A (en) * 2019-05-09 2019-10-08 深信服科技股份有限公司 A kind of Web anomalous traffic detection method, device, equipment and medium
CN110879891A (en) * 2019-08-14 2020-03-13 奇安信科技集团股份有限公司 Vulnerability detection method and device based on web fingerprint information
CN111008405A (en) * 2019-12-06 2020-04-14 杭州安恒信息技术股份有限公司 Website fingerprint identification method based on file Hash

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
闫淑筠 et al.: "An effective Web fingerprint identification method", Journal of University of Chinese Academy of Sciences *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201002