CN111737702A - Web fingerprint identification method based on Chebyshev inequality - Google Patents
- Publication number
- CN111737702A
- Authority
- CN
- China
- Prior art keywords
- web
- web fingerprint
- fingerprint
- fingerprints
- triple
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a web fingerprint identification method based on the Chebyshev inequality, belonging to the technical field of network security. The method comprises the following steps: collecting the features of known web fingerprints and establishing a triple library; collecting the features of the target website's web fingerprint and calculating its sample value; and substituting that sample value into the Chebyshev inequality while traversing the triple library, to determine whether the target website's web fingerprint matches a web fingerprint in the triple library. Compared with the prior art, the distinctive technical feature of this scheme is that the sample value of the collected features is calculated with linear weights, and web fingerprint identification is then performed with the Chebyshev inequality; the web fingerprint results obtained with the method of the invention are more accurate than those obtained by comparing features one by one against a traditional feature library.
Description
Technical Field
The invention relates to the technical field of network security, and in particular to a web fingerprint identification method based on the Chebyshev inequality.
Background
In practical penetration testing, the first step is to gather information about the target, collecting as much as possible through social engineering, web fingerprinting, directory scanning, subdomain brute-forcing, side-site (same-server) and C-segment (same /24 subnet) queries, and so on. This is both the first and the most important step of a penetration test, and the amount of information collected often determines the outcome of the test.
Web fingerprinting identifies the components of a website: whether it uses a CMS framework and, if so, which one; whether it runs on Nginx or Tomcat; whether its front end uses Vue.js or React.js; whether its back end is written in Java or PHP; whether it uses Struts, WebLogic and so on. It belongs to the information-gathering stage before a penetration test, and once this information has been collected, targeted attacks can be carried out so that the target system can be attacked quickly and accurately.
The traditional approach matches the collected features one by one against a feature library, and it has two disadvantages. First, it depends on a huge feature library: the more data the library holds, the more accurate the identification result. Second, its identification accuracy is low, because some websites replace their characteristic features with other information to avoid being fingerprinted; when those features are absent, the web fingerprint is likely to go unidentified.
The problem addressed by this patent application is a technical deficiency of web fingerprinting: collected features are matched one by one against a huge feature library. The technical essence of the solution is to identify web fingerprints with the Chebyshev inequality. Compared with the prior art, its distinctive technical feature is that, after the features are collected, a sample value is calculated with linear weights and the Chebyshev inequality is then used to identify the web fingerprint.
Disclosure of Invention
The problem addressed by this patent application is the technical deficiency of web fingerprinting that matches collected features one by one against a huge feature library. The solution is technically characterized by identifying web fingerprints with the Chebyshev inequality; compared with the prior art, its distinctive technical feature is that, after the features are collected, a sample value is calculated with linear weights and the Chebyshev inequality is then used to identify the web fingerprint.
To achieve this purpose, the invention adopts the following technical scheme:
a web fingerprint identification method based on the Chebyshev inequality, comprising the following steps:
collecting the features of known web fingerprints and establishing a triple library;
collecting the features of the target website's web fingerprint and calculating its sample value;
and substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library, and determining whether the target website's web fingerprint matches a web fingerprint in the triple library.
Further, collecting the features of known web fingerprints and establishing the triple library comprises:
collecting the features of a large number of known web fingerprints, establishing a sample set by a linear weighting method, and establishing the triple library from the sample set.
Further, collecting the features of a large number of known web fingerprints, establishing a sample set by the linear weighting method and establishing the triple library from the sample set comprises:
calculating the sample value X of each web fingerprint by weighting the different elements that make up its features, X being calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
where $x_1, x_2, x_3, x_4$ are the elements that make up the features of each web fingerprint, each taking the value 1 if the element is present and 0 if it is absent:
$x_1$ is the URL path, $x_2$ the HTML source-code keyword, $x_3$ the HTTP response-header keyword and $x_4$ the default error-page keyword;
$r_1, r_2, r_3, r_4$ are the weight coefficients of the corresponding elements: $r_1$ for the URL path, $r_2$ for the HTML source-code keyword, $r_3$ for the HTTP response-header keyword and $r_4$ for the default error-page keyword, with $r_1 + r_2 + r_3 + r_4 = 100\%$;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values with the above formula, and establishing a sample set for each web fingerprint;
calculating the mathematical expectation $\mu$ and standard deviation $\sigma$ of each web fingerprint's sample set from these sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$ and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
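The linear-weight sample value and the triple library described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the names `sample_value`, `build_triple_library` and `Triple` are hypothetical, weights are given as fractions that sum to 1 (i.e. 100%), and $\sigma$ is computed as the population standard deviation because the text does not say which estimator is used. A real implementation would persist the triples in a database rather than a Python list.

```python
import math
import statistics
from typing import Dict, List, NamedTuple, Sequence

# Hypothetical element order: URL path, HTML source keyword,
# HTTP response-header keyword, default error-page keyword.
N_ELEMENTS = 4


class Triple(NamedTuple):
    """One entry of the triple library: (fingerprint name, mu, sigma)."""
    name: str
    mu: float
    sigma: float


def sample_value(x: Sequence[int], r: Sequence[float]) -> float:
    """X = x1*r1 + x2*r2 + x3*r3 + x4*r4 with x_i in {0, 1} and sum(r) == 1."""
    if len(x) != N_ELEMENTS or len(r) != N_ELEMENTS:
        raise ValueError("expected exactly four elements and four weights")
    if any(xi not in (0, 1) for xi in x):
        raise ValueError("each element must be 0 (absent) or 1 (present)")
    if not math.isclose(sum(r), 1.0):
        raise ValueError("weights must sum to 100%")
    return sum(xi * ri for xi, ri in zip(x, r))


def build_triple_library(
    observations: Dict[str, List[Sequence[int]]], r: Sequence[float]
) -> List[Triple]:
    """Turn per-fingerprint element vectors into (name, mu, sigma) triples.

    Assumption: sigma is the population standard deviation (statistics.pstdev).
    """
    library = []
    for name, vectors in observations.items():
        samples = [sample_value(x, r) for x in vectors]
        library.append(Triple(name, statistics.fmean(samples), statistics.pstdev(samples)))
    return library
```

For example, two WordPress observations with element vectors (1, 1, 1, 1) and (1, 1, 0, 1) and weights (0.3, 0.2, 0.3, 0.2) give sample values 1.0 and 0.7, hence $\mu = 0.85$ and $\sigma = 0.15$.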
Further, substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library and determining whether the target website's web fingerprint matches a web fingerprint in the triple library comprises:
the Chebyshev inequality is as follows:
$$P\{|X_{Target} - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}$$
where $X_{Target}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation and $\varepsilon^2$ is a constant, set to $T$; if the probability obtained satisfies $P \le \sigma^2/\varepsilon^2$, the target website's web fingerprint is considered to be the web fingerprint in the triple corresponding to $\mu$ and $\sigma$.
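The text does not spell out how the probability $P$ is evaluated for a single target value, since Chebyshev's inequality itself always holds. The sketch below therefore assumes one common distribution-free reading: with $\varepsilon = \sqrt{T}$, a triple is accepted when $|X_{Target} - \mu| < \varepsilon$, i.e. when the target falls inside the interval that, by Chebyshev's inequality, carries at least $1 - \sigma^2/\varepsilon^2$ of that fingerprint's probability mass; the bound $\sigma^2/\varepsilon^2$ is reported alongside. The names `Triple`, `chebyshev_bound` and `match_fingerprints` are hypothetical.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    name: str      # web fingerprint name
    mu: float      # mathematical expectation of its sample set
    sigma: float   # standard deviation of its sample set


def chebyshev_bound(sigma: float, t: float) -> float:
    """Upper bound sigma^2 / epsilon^2 on P{|X - mu| >= epsilon}, epsilon^2 = T, capped at 1."""
    if t <= 0:
        raise ValueError("T = epsilon^2 must be positive")
    return min(1.0, sigma ** 2 / t)


def match_fingerprints(
    x_target: float, library: Iterable[Triple], t: float
) -> List[Tuple[str, float]]:
    """Traverse the triple library and return (name, bound) for every accepted triple.

    Assumed acceptance rule: |x_target - mu| < epsilon, with epsilon = sqrt(T).
    """
    if t <= 0:
        raise ValueError("T = epsilon^2 must be positive")
    epsilon = math.sqrt(t)
    return [
        (tr.name, chebyshev_bound(tr.sigma, t))
        for tr in library
        if abs(x_target - tr.mu) < epsilon
    ]
```

A smaller $T$ narrows the acceptance interval but loosens (raises) the bound $\sigma^2/T$; the text leaves this trade-off as a tuning choice.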
Compared with the prior art, the invention has the beneficial effects that:
Compared with the prior art, the distinctive technical feature of this scheme is that the sample value of the collected features is calculated with linear weights, and web fingerprint identification is then performed with the Chebyshev inequality; the web fingerprint results obtained with the method of the invention are more accurate than those obtained by comparing features one by one against a traditional feature library.
Drawings
FIG. 1 is a flow chart of a web fingerprint identification method based on Chebyshev inequality according to the present invention.
Detailed Description
The present invention is further described below with reference to the following example, which illustrates only some, not all, of its embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Example 1:
As shown in FIG. 1, a web fingerprint identification method based on the Chebyshev inequality comprises the following steps:
collecting the features of known web fingerprints and establishing a triple library;
collecting the features of the target website's web fingerprint and calculating its sample value;
and substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library, and determining whether the target website's web fingerprint matches a web fingerprint in the triple library.
Further, collecting the features of known web fingerprints and establishing the triple library comprises:
collecting the features of a large number of known web fingerprints, establishing a sample set by a linear weighting method, and establishing the triple library from the sample set.
Further, collecting the features of a large number of known web fingerprints, establishing a sample set by the linear weighting method and establishing the triple library from the sample set comprises:
calculating the sample value X of each web fingerprint by weighting the different elements that make up its features, X being calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
where $x_1, x_2, x_3, x_4$ are the elements that make up the features of each web fingerprint, each taking the value 1 if the element is present and 0 if it is absent:
$x_1$ is the URL path, $x_2$ the HTML source-code keyword, $x_3$ the HTTP response-header keyword and $x_4$ the default error-page keyword;
$r_1, r_2, r_3, r_4$ are the weight coefficients of the corresponding elements: $r_1$ for the URL path, $r_2$ for the HTML source-code keyword, $r_3$ for the HTTP response-header keyword and $r_4$ for the default error-page keyword, with $r_1 + r_2 + r_3 + r_4 = 100\%$;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values with the above formula, and establishing a sample set for each web fingerprint;
calculating the mathematical expectation $\mu$ and standard deviation $\sigma$ of each web fingerprint's sample set from these sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$ and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
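For one web fingerprint whose sample set holds $n$ sample values $X_1, \dots, X_n$, the mathematical expectation and standard deviation can be estimated as below. The text does not say whether $\sigma$ uses a $1/n$ or a $1/(n-1)$ normalization; the $1/n$ (population) form shown here is an assumption, chosen because with it Chebyshev's inequality holds exactly for the empirical distribution of the sample set.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2}
```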
Further, substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library and determining whether the target website's web fingerprint matches a web fingerprint in the triple library comprises:
the Chebyshev inequality is as follows:
$$P\{|X_{Target} - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}$$
where $X_{Target}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation and $\varepsilon^2$ is a constant, set to $T$; if the probability obtained satisfies $P \le \sigma^2/\varepsilon^2$, the target website's web fingerprint is considered to be the web fingerprint in the triple corresponding to $\mu$ and $\sigma$.
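Chebyshev's inequality holds for any distribution with finite variance. Substituting $\varepsilon^2 = T$, taking the complementary event, and alternatively writing $\varepsilon = k\sigma$ shows how the constant controls the bound:

```latex
P\{|X - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2} = \frac{\sigma^2}{T},
\qquad
P\{|X - \mu| < \varepsilon\} \ge 1 - \frac{\sigma^2}{T},
\qquad
\varepsilon = k\sigma \;\Rightarrow\; P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}
```

For instance, with $\sigma = 0.5$ and $T = 0.5$ the bound is $0.25/0.5 = 0.5$, so at least half of the fingerprint's probability mass lies within $\sqrt{0.5} \approx 0.707$ of $\mu$.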
A specific example is given below for illustration:
1) First, the sample value is calculated by weighting the elements that make up the features of each web fingerprint according to their respective proportions, using the formula:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
A large number of websites with known web fingerprints are found on the Internet, their sample values are obtained with this formula, and the mathematical expectation $\mu$ and standard deviation $\sigma$ of each fingerprint's sample set are calculated from these values. Assume that $\mu$ and $\sigma$ of the WordPress fingerprint are 1 and 0.5, respectively.
The web fingerprint name, $\mu$ and $\sigma$ are then stored as a triple in the database.
2) Given a target website http://test.com, its features are first collected to obtain its web fingerprint, as follows:
2.1 Access http://test.com/wp-admin and check whether it is accessible; if so, $x_1 = 1$, otherwise $x_1 = 0$;
2.2 Access http://test.com and check whether the HTML source of the homepage contains the keyword wordpress; if so, $x_2 = 1$, otherwise $x_2 = 0$;
2.3 Access http://test.com and check whether the response headers returned by the server contain an X-Powered-By field with the value php; if so, $x_3 = 1$, otherwise $x_3 = 0$;
2.4 Access a random nonexistent path, e.g. http://test.com/2332/34sfsd/, and check whether the error page contains the keyword wordpress; if so, $x_4 = 1$, otherwise $x_4 = 0$;
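Steps 2.1 to 2.4 can be sketched with Python's standard `urllib.request`. The WordPress probe path, keyword, header field and nonexistent path come from the example; the function names, the 10-second timeout, and the rule that a final 2xx/3xx status counts as "accessible" are assumptions. Parsing is kept separate from the network calls so the feature extraction can be checked offline; connection failures (`urllib.error.URLError`) are not handled in this sketch.

```python
import urllib.error
import urllib.request
from typing import Mapping, Tuple

# (status code, response headers, body text) of one HTTP exchange.
Response = Tuple[int, Mapping[str, str], str]

# WordPress signature from the example; other fingerprints would use other values.
PROBE_PATH = "/wp-admin"        # 2.1 characteristic URL path
KEYWORD = "wordpress"           # 2.2 / 2.4 source and error-page keyword
HEADER_NAME = "X-Powered-By"    # 2.3 response-header field ...
HEADER_VALUE = "php"            # ... and the value it should contain
MISSING_PATH = "/2332/34sfsd/"  # 2.4 path assumed not to exist


def fetch(url: str, timeout: float = 10.0) -> Response:
    """GET a URL; HTTP error responses (4xx/5xx) are returned rather than raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, dict(resp.headers), body
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        return err.code, dict(err.headers), body


def extract_features(
    probe: Response, home: Response, missing: Response
) -> Tuple[int, int, int, int]:
    """Map the three responses to the binary elements (x1, x2, x3, x4)."""
    x1 = int(200 <= probe[0] < 400)              # 2.1 probe path is accessible
    x2 = int(KEYWORD in home[2].lower())         # 2.2 keyword in homepage HTML
    x3 = int(any(                                # 2.3 header field present
        name.lower() == HEADER_NAME.lower() and HEADER_VALUE in value.lower()
        for name, value in home[1].items()
    ))
    x4 = int(KEYWORD in missing[2].lower())      # 2.4 keyword in error page
    return x1, x2, x3, x4


def collect_features(base_url: str) -> Tuple[int, int, int, int]:
    """Run steps 2.1-2.4 against a live site (requires network access)."""
    base = base_url.rstrip("/")
    return extract_features(
        fetch(base + PROBE_PATH), fetch(base + "/"), fetch(base + MISSING_PATH)
    )
```

For the example site this would be called as `collect_features("http://test.com")`.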
Suppose $x_1, x_2, x_3, x_4$ take the values 1, 1, 0, 1 respectively, and the weights $r_1, r_2, r_3, r_4$ are 30%, 20%, 30% and 20%. The sample value is then calculated with the linear weighting formula above:
$$X = 1 \times 30\% + 1 \times 20\% + 0 \times 30\% + 1 \times 20\% = 0.7$$
so the sample value of http://test.com is $X = 0.7$.
3) Substitute the sample value obtained in the previous step into the Chebyshev inequality, traverse the triples obtained in the first step, set $\varepsilon^2 = 0.5$, and judge whether the obtained probability $P$ is less than or equal to $\sigma^2/\varepsilon^2$. The Chebyshev inequality is:
$$P\{|X - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}$$
The WordPress triple is (WordPress, 1, 0.5). When the traversal reaches it, the sample value $X = 0.7$ of http://test.com is substituted into the Chebyshev inequality, giving:
$$P\{|0.7 - 1| \ge \sqrt{0.5}\} \le 0.25/0.5 = 0.5$$
Because $P \le 0.5$, the website http://test.com is considered to use the WordPress framework.
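The arithmetic of the worked example can be rechecked with a few lines of Python. The weights, element values, WordPress triple and $\varepsilon^2 = 0.5$ are those stated in the example; the acceptance test $|X - \mu| < \varepsilon$ is an assumed reading of the decision rule, not something the text states explicitly.

```python
import math

# Values from the worked example.
WEIGHTS = (0.30, 0.20, 0.30, 0.20)       # r1..r4 = 30%, 20%, 30%, 20%
FEATURES = (1, 1, 0, 1)                  # x1..x4 observed for http://test.com
NAME, MU, SIGMA = "wordpress", 1.0, 0.5  # WordPress triple
T = 0.5                                  # epsilon^2

x_target = sum(x * r for x, r in zip(FEATURES, WEIGHTS))  # linear-weight sample value
epsilon = math.sqrt(T)
bound = SIGMA ** 2 / T                   # Chebyshev bound sigma^2 / epsilon^2
inside = abs(x_target - MU) < epsilon    # assumed acceptance test

print(f"X = {x_target:.2f}, |X - mu| = {abs(x_target - MU):.2f}, epsilon = {epsilon:.3f}")
print(f"bound sigma^2/epsilon^2 = {bound:.2f}; accept {NAME}: {inside}")
```

Running it reports $X = 0.70$, a bound of 0.50 and an accepted WordPress triple, since $|0.7 - 1| = 0.3$ is smaller than $\varepsilon = \sqrt{0.5} \approx 0.707$.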
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (4)
1. A web fingerprint identification method based on the Chebyshev inequality, characterized by comprising the following steps:
collecting the features of known web fingerprints and establishing a triple library;
collecting the features of the target website's web fingerprint and calculating its sample value;
and substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library, and determining whether the target website's web fingerprint matches a web fingerprint in the triple library.
2. The Chebyshev-inequality-based web fingerprint identification method according to claim 1, wherein collecting the features of known web fingerprints and establishing the triple library comprises:
collecting the features of a large number of known web fingerprints, establishing a sample set by a linear weighting method, and establishing the triple library from the sample set.
3. The Chebyshev-inequality-based web fingerprint identification method according to claim 2, wherein collecting the features of a large number of known web fingerprints, establishing a sample set by the linear weighting method and establishing the triple library from the sample set comprises:
calculating the sample value X of each web fingerprint by weighting the different elements that make up its features, X being calculated as:
$$X = x_1 r_1 + x_2 r_2 + x_3 r_3 + x_4 r_4$$
where $x_1, x_2, x_3, x_4$ are the elements that make up the features of each web fingerprint, each taking the value 1 if the element is present and 0 if it is absent:
$x_1$ is the URL path, $x_2$ the HTML source-code keyword, $x_3$ the HTTP response-header keyword and $x_4$ the default error-page keyword;
$r_1, r_2, r_3, r_4$ are the weight coefficients of the corresponding elements: $r_1$ for the URL path, $r_2$ for the HTML source-code keyword, $r_3$ for the HTTP response-header keyword and $r_4$ for the default error-page keyword, with $r_1 + r_2 + r_3 + r_4 = 100\%$;
finding a large number of websites with known web fingerprints on the Internet, obtaining a large number of sample values with the above formula, and establishing a sample set for each web fingerprint;
calculating the mathematical expectation $\mu$ and standard deviation $\sigma$ of each web fingerprint's sample set from these sample values;
and storing the web fingerprint name, the mathematical expectation $\mu$ and the standard deviation $\sigma$ as a triple in a database to obtain the triple library.
4. The Chebyshev-inequality-based web fingerprint identification method according to claim 3, wherein substituting the sample value of the target website's web fingerprint into the Chebyshev inequality, traversing the triple library and determining whether the target website's web fingerprint matches a web fingerprint in the triple library comprises:
the Chebyshev inequality is as follows:
$$P\{|X_{Target} - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}$$
where $X_{Target}$ is the sample value of the target website's web fingerprint, $\mu$ is the mathematical expectation, $\sigma$ is the standard deviation and $\varepsilon^2$ is a constant, set to $T$; if the probability obtained satisfies $P \le \sigma^2/\varepsilon^2$, the target website's web fingerprint is considered to be the web fingerprint in the triple corresponding to $\mu$ and $\sigma$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010574486.5A CN111737702A (en) | 2020-06-22 | 2020-06-22 | Web fingerprint identification method based on Chebyshev inequality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111737702A true CN111737702A (en) | 2020-10-02 |
Family
ID=72650436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010574486.5A Pending CN111737702A (en) | 2020-06-22 | 2020-06-22 | Web fingerprint identification method based on Chebyshev inequality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737702A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112347328A (en) * | 2020-10-27 | 2021-02-09 | 杭州安恒信息技术股份有限公司 | Network platform identification method, device, equipment and readable storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103065095A (en) * | 2013-01-29 | 2013-04-24 | 四川大学 | WEB vulnerability scanning method and vulnerability scanner based on fingerprint recognition technology |
CN104954192A (en) * | 2014-03-27 | 2015-09-30 | 东华软件股份公司 | Network flow monitoring method and device |
CN105337985A (en) * | 2015-11-19 | 2016-02-17 | 北京师范大学 | Attack detection method and system |
US20190294642A1 (en) * | 2017-08-24 | 2019-09-26 | Bombora, Inc. | Website fingerprinting |
CN110311888A (en) * | 2019-05-09 | 2019-10-08 | 深信服科技股份有限公司 | A kind of Web anomalous traffic detection method, device, equipment and medium |
CN110879891A (en) * | 2019-08-14 | 2020-03-13 | 奇安信科技集团股份有限公司 | Vulnerability detection method and device based on web fingerprint information |
CN111008405A (en) * | 2019-12-06 | 2020-04-14 | 杭州安恒信息技术股份有限公司 | Website fingerprint identification method based on file Hash |
- 2020-06-22: application CN202010574486.5A filed in CN, published as CN111737702A (status: active, Pending)
Non-Patent Citations (1)
Title |
---|
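闫淑筠 et al.: "一种有效的Web指纹识别方法" ("An effective Web fingerprint identification method"), 《中国科学院大学学报》 (Journal of University of Chinese Academy of Sciences) * |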
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112866023B (en) | Network detection method, model training method, device, equipment and storage medium | |
CN107437026B (en) | Malicious webpage advertisement detection method based on advertisement network topology | |
CN112989348B (en) | Attack detection method, model training method, device, server and storage medium | |
CN110505202B (en) | Attack organization discovery method and system | |
CN113785289A (en) | System and method for dynamically generating a set of API endpoints | |
CN113706100B (en) | Real-time detection and identification method and system for Internet of things terminal equipment of power distribution network | |
CN114168968A (en) | Vulnerability mining method based on Internet of things equipment fingerprints | |
CN114650176A (en) | Phishing website detection method and device, computer equipment and storage medium | |
CN110708339A (en) | Correlation analysis method based on WEB log | |
CN111737702A (en) | Web fingerprint identification method based on Chebyshev inequality | |
CN115098151A (en) | Fine-grained intranet equipment firmware version detection method | |
CN114201756A (en) | Vulnerability detection method and related device for intelligent contract code segment | |
CN114372267A (en) | Malicious webpage identification and detection method based on static domain, computer and storage medium | |
CN112968870A (en) | Network group discovery method based on frequent itemset | |
CN109992960B (en) | Counterfeit parameter detection method and device, electronic equipment and storage medium | |
CN110851828A (en) | Malicious URL monitoring method and device based on multi-dimensional features and electronic equipment | |
CN112003884A (en) | Network asset acquisition and natural language retrieval method | |
CN113992625B (en) | Domain name source station detection method, system, computer and readable storage medium | |
CN115392238A (en) | Equipment identification method, device, equipment and readable storage medium | |
CN115085948B (en) | Network security situation assessment method based on improved D-S evidence theory | |
CN110147506B (en) | URL duplication eliminating method and device | |
CN113992390A (en) | Phishing website detection method and device and storage medium | |
CN108573155B (en) | Method and device for detecting vulnerability influence range, electronic equipment and storage medium | |
CN114257565A (en) | Method, system and server for mining domain name with potential threat | |
CN116723050B (en) | Imitation website detection method, device, equipment and medium based on graph database |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201002 |