CN111708967A - Fingerprint identification method based on website map - Google Patents

Publication number
CN111708967A
Authority
CN
China
Prior art keywords
website
tree
fingerprint
sitemap
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010530722.3A
Other languages
Chinese (zh)
Other versions
CN111708967B (en)
Inventor
刘传兴
陈怡�
祝晓春
周波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zheda Net New International Software Technology Service Co ltd
Original Assignee
Zhejiang Zheda Net New International Software Technology Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zheda Net New International Software Technology Service Co ltd
Priority to CN202010530722.3A
Publication of CN111708967A
Application granted
Publication of CN111708967B
Current legal status: Active
Anticipated expiration
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a fingerprint identification method based on a website map, which comprises the following steps: capturing a website map of a target website, and representing the website map by using an n-ary tree; pruning the website map tree according to the website path blacklist, reserving a website path capable of reflecting the website fingerprint, and generating a simplified website map tree; establishing a website fingerprint-website map tree library; traversing a website map tree in a website fingerprint-website map tree library, and matching the website map tree with a target website map tree to acquire fingerprint information of a target website; and updating the corresponding information of the website fingerprint of the target website and the website map tree into a website fingerprint-website map tree library. The invention improves the efficiency and the accuracy of acquiring the website fingerprint.

Description

Fingerprint identification method based on website map
Technical Field
The invention provides a website fingerprint identification method based on a constructed website map, and relates to core technologies and algorithms for website map filtering, website fingerprint identification on the filtered website map, and website fingerprint identification library establishment.
Background
With the rapid development of the mobile internet, the number of websites has grown explosively; at the same time, website defense technologies have multiplied and defense systems have matured. As a result, when a security tester performs a security test on a website, the fingerprint of the website often cannot be identified quickly. In addition, conventional website fingerprint identification is unstable: a change to a deployed file often causes a large error or makes identification impossible. Accurate and rapid website fingerprint identification helps security testers carry out more targeted security tests.
Disclosure of Invention
The invention aims to provide a fingerprint identification method based on a website map, aiming at the defects of the prior art.
The purpose of the invention is realized by the following technical scheme: a fingerprint identification method based on a sitemap comprises the following steps:
(1) generating a website map tree: capturing a website map of a target website, and representing the website map by using an n-ary tree T0;
(2) pruning a website map tree: pruning the sitemap tree T0 according to the website path blacklist, reserving a website path capable of reflecting the website fingerprint, and generating a simplified sitemap tree T1;
(3) establishing a website fingerprint-website map tree library D1;
(4) fingerprint identification: traversing a website map tree in a website fingerprint-website map tree library, and matching the website map tree with a target website map tree to acquire fingerprint information of a target website;
(5) website fingerprint-website map tree library updating: and updating the corresponding information of the website fingerprint of the target website and the website map tree into a website fingerprint-website map tree library.
Further, each node of the n-ary tree T0 has two attributes: val, the value of the current node, and children, the list of child nodes of the current node.
Further, in the step (2), a blacklist is established for general fields which cannot reflect the website characteristics;
when a node of the sitemap tree appears in the blacklist, the node is pruned;
and when the number of child nodes of a node of the sitemap tree is larger than the node threshold, the node is pruned.
Further, in the step (3), the method for constructing the website fingerprint-website map tree library specifically includes:
establishing a website fingerprint library D0, and storing a plurality of website fingerprints;
establishing a website fingerprint-website map tree library D1, and storing the one-to-many relationship between the website fingerprints and the website map tree;
for each website fingerprint in D0, find the website corresponding to the website fingerprint, then obtain the sitemap of the website, and further obtain the sitemap tree of the website, and add the corresponding information of the website fingerprint-sitemap tree to D1.
Further, in the step (4), the sitemap tree of the target website is set to T0, and each record in D1 is traversed to obtain the website fingerprint-sitemap tree correspondence; the currently traversed website fingerprint is set to F1 and its sitemap tree to T1, and T0 and T1 are matched by the following specific calculation method:
first, T0 is traversed level by level, and the val of each traversed node is compared with the val of the root node of T1; if they differ, the traversal continues downwards; if they are the same, the subtree of T0 rooted at the current node is set as T2, and the similarity between T2 and T1 is calculated;
if the calculated similarity of T1 and T2 is higher than the similarity threshold, T1 and T2 are considered successfully matched, which further confirms that the target website matches the website fingerprint F1; if T1 and T2 are successfully matched, T2 is truncated so that only the nodes within the height H of T1 are kept, the result is marked as T3, and the correspondence F1-T3 is recorded;
and after the similarity calculation of T1 and T2 is finished, the level-by-level traversal of T0 continues and the calculation process is repeated, finally yielding a group of fingerprints matched with T0 and the new sitemap tree T3 corresponding to each fingerprint in the group.
Further, in the step (4), a specific method for calculating the similarity between T1 and T2 is as follows:
performing hierarchical traversal from a root node of the T1, performing comparison of nodes val between the T1 and the T2 in the same layer, and finishing the similarity calculation process after the T1 traversal is finished; recording the depth d of the root node as 0, and sequentially increasing the depth of the node from top to bottom; at each layer, the similarity of the layer is calculated firstly, and the calculation formula is as follows:
Figure BDA0002535109510000021
where the maximum number of nodes is the larger of the node counts of T1 and T2 at the same depth d;
after the similarity of every layer has been calculated, the per-layer similarities are summed and the total is recorded as sum;
the similarity calculation formulas of T1 and T2 are as follows:
Figure BDA0002535109510000022
Further, in the step (5), the fingerprints matched with the target website obtained in the step (4) and the new sitemap tree T3 corresponding to each fingerprint are written into the website fingerprint-website map tree library, so that the information in the library is further expanded and its coverage is increased.
The invention has the beneficial effects that: the method includes the steps that a website map of a target website is captured, and the website map is represented by an n-ary tree; pruning the website map tree according to the website path blacklist, reserving a website path capable of reflecting the website fingerprint, and generating a simplified website map tree; establishing a website fingerprint-website map tree library; traversing a website map tree in a website fingerprint-website map tree library, and matching the website map tree with a target website map tree to acquire fingerprint information of a target website; and updating the corresponding information of the website fingerprint of the target website and the website map tree into a website fingerprint-website map tree library. The invention improves the efficiency and the accuracy of acquiring the website fingerprint.
Drawings
FIG. 1 is a schematic view of a process of fingerprint identification based on a sitemap according to the present invention;
FIG. 2 is a schematic diagram of an n-ary tree corresponding to a Baidu sitemap;
FIG. 3 is a simplified schematic diagram of the matching process of the sitemap trees T0 and T1.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 1, the fingerprint identification method based on the sitemap provided by the present invention includes the following steps:
(1) generating a website map tree: capturing a website map of a target website, and representing the website map by using an n-ary tree T0;
(2) pruning a website map tree: pruning the sitemap tree T0 according to the website path blacklist, reserving a website path capable of reflecting the website fingerprint, and generating a simplified sitemap tree T1;
(3) establishing a website fingerprint-website map tree library;
(4) fingerprint identification: traversing a website map tree in a website fingerprint-website map tree library, and matching the website map tree with a target website map tree to acquire fingerprint information of a target website;
(5) website fingerprint-website map tree library updating: and updating the corresponding information of the website fingerprint of the target website and the website map tree into a website fingerprint-website map tree library.
In the step (1), capturing a website map of a target website, and representing the website map by using an n-ary tree T0, which is specifically as follows:
Taking the official Baidu website (https://www.baidu.com/) as an example, the n-ary tree corresponding to its sitemap is shown in FIG. 2. It should be noted that a sitemap is a navigation web page file generated according to the structure, framework, and content of a website, and any prior-art method may be used to capture the sitemap of a target website.
Further, the data structure of the nodes in the n-ary tree is defined as follows:
Figure BDA0002535109510000031
Figure BDA0002535109510000041
it can be seen that each node has two attributes, val representing the value of the current node and children representing the list of children of the current node.
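The node definition itself is given only as a figure that is not reproduced in this text. The following minimal Python sketch shows one structure consistent with the description (a `val` and a `children` list). The helper `build_sitemap_tree`, the choice of one URL path segment per node, and the `"/"` root value are illustrative assumptions, not details taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Node:
    val: str                                             # value of the current node
    children: list[Node] = field(default_factory=list)   # child nodes of the current node


def build_sitemap_tree(urls: list[str], root_val: str = "/") -> Node:
    """Hypothetical helper: insert each URL path, segment by segment, under one root."""
    root = Node(root_val)
    for url in urls:
        node = root
        # Assumption: every non-empty path segment becomes one tree level.
        for segment in filter(None, urlparse(url).path.split("/")):
            child = next((c for c in node.children if c.val == segment), None)
            if child is None:
                child = Node(segment)
                node.children.append(child)
            node = child
    return root
```

Paths that share a prefix share the corresponding nodes, so the tree branches only where the site's paths diverge.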
In the step (2), a blacklist is established for general fields which cannot reflect the characteristics of the website, and the blacklist includes but is not limited to the following named fields: time, pure numbers, etc.
When a certain node of the sitemap tree exists in the blacklist, the node is cut off.
When the number of child nodes of a certain node of the sitemap tree is greater than the node threshold (100 can be taken), the node is pruned, and the pruned sitemap tree T1 is generated.
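The two pruning rules can be sketched as follows. The regular expressions standing in for the blacklist, the decision to drop a pruned node together with its whole subtree, and the choice never to prune the root are assumptions made for illustration; the text only names "time" and "pure numbers" as example blacklist fields and suggests 100 as the threshold.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    val: str
    children: list[Node] = field(default_factory=list)


# Hypothetical blacklist: patterns for fields that say nothing about the site.
BLACKLIST_PATTERNS = [
    re.compile(r"\d+"),                 # pure numbers
    re.compile(r"\d{4}-\d{2}-\d{2}"),   # date-like values (one reading of "time")
]


def is_blacklisted(val: str) -> bool:
    return any(p.fullmatch(val) for p in BLACKLIST_PATTERNS)


def prune(node: Node, max_children: int = 100) -> Node:
    """Return a pruned copy of the tree; the input tree is left unchanged."""
    kept = []
    for child in node.children:
        # Rule 1: the node's value is blacklisted.
        # Rule 2: the node has more children than the threshold.
        if is_blacklisted(child.val) or len(child.children) > max_children:
            continue  # assumption: the node is removed with its subtree
        kept.append(prune(child, max_children))
    return Node(node.val, kept)
```

Returning a copy keeps the unpruned tree available, which may be convenient when tuning the blacklist or the threshold.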
In the step (3), the website fingerprint is used to accurately identify the target web application; specifically, it identifies features in five major aspects of the target web application: application name (version), server software (version), programming language (version), application framework (version), and application components.
The website fingerprint-website map tree library is constructed as follows. First, a website fingerprint library D0 is established to store a plurality of website fingerprints. Then a website fingerprint-website map tree library D1 is established to store the one-to-many relationship between website fingerprints and sitemap trees. Because D1 is built from the website fingerprints present in D0, the variety of fingerprints in D0 determines the variety of fingerprints in D1. Furthermore, because the fingerprint of the target website is identified by comparing the similarity between the sitemap tree of the target website and the sitemap trees of known fingerprints in D1, a richer D0 yields a richer D1 and a greater probability of accurately identifying the fingerprint of the target website. Therefore, as many existing website fingerprints as possible should be added when establishing D0. After the website fingerprint library D0 is complete, D1 is built. In this process, a typical sitemap tree must be found for each website fingerprint in D0: a website corresponding to the fingerprint is found first, the sitemap of that website is obtained, the sitemap tree is derived from it, and the fingerprint-sitemap tree correspondence is added to D1. Note that the same website fingerprint may correspond to multiple sitemap trees, which is why the correspondence between website fingerprints and sitemap trees is one-to-many.
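One way to hold the one-to-many relationship is a mapping from each fingerprint to a list of trees. This is a sketch only: the dictionary layout, the function `build_d1`, and the caller-supplied `trees_for` callback (which stands in for finding a website, capturing its sitemap, and deriving its tree) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Node:
    val: str
    children: list[Node] = field(default_factory=list)


def build_d1(
    d0: Iterable[str],
    trees_for: Callable[[str], Iterable[Node]],
) -> dict[str, list[Node]]:
    """Build D1 (fingerprint -> list of sitemap trees) from the fingerprints in D0.

    trees_for is a hypothetical callback returning the sitemap trees of
    websites known to carry the given fingerprint.
    """
    d1: dict[str, list[Node]] = {}
    for fingerprint in d0:
        # One fingerprint may map to several trees (one-to-many).
        d1.setdefault(fingerprint, []).extend(trees_for(fingerprint))
    return d1
```

Every fingerprint in D0 receives an entry in D1, even if no tree was found for it, so gaps in coverage stay visible.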
In the step (4), each sitemap tree in the website fingerprint-website map tree library is matched against the sitemap tree of the target website as follows:
The sitemap tree obtained for the target website is set as T0. Each record in D1 is traversed to obtain a website fingerprint-sitemap tree correspondence; the currently traversed website fingerprint is set as F1 and its sitemap tree as T1. T0 and T1 are then matched. To describe the matching process more clearly, the simplified sitemap trees T0 and T1 in fig. 3 are used as an example, and the nodes in these trees are likewise shown in simplified form. The specific calculation method is as follows:
firstly, traversing T0 hierarchically, comparing the val of the traversed node with the val of the root node of T1, and if the val of the traversed node is different from the val of the root node of T1, continuing to traverse downwards; if the two nodes are the same, a tree with the current node as the root node in T0 is set as T2, and then the similarity between T2 and T1 is calculated. In fig. 3, when traversing to the node b in T0, the condition is satisfied, at this time, T2 is T2 shown in fig. 2, and the specific method for calculating the similarity between T1 and T2 is as follows:
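The search for candidate subtrees T2 can be sketched as a breadth-first (level-order) traversal of T0. The function name `candidate_subtrees` and the use of a generator are illustrative choices, not part of the described method.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    val: str
    children: list[Node] = field(default_factory=list)


def candidate_subtrees(t0: Node, t1: Node) -> Iterator[Node]:
    """Yield, in level order, every node of T0 whose val equals the root val of T1.

    Each yielded node is the root of one candidate subtree T2.
    """
    queue = deque([t0])
    while queue:
        node = queue.popleft()
        if node.val == t1.val:
            yield node
        # Keep descending in either case, so later occurrences are also found.
        queue.extend(node.children)
```

Because the traversal continues after a hit, a root value that occurs several times in T0 produces several candidates, each of which can then be scored against T1.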
the height of T1 is first calculated and is denoted as H. Then, from the root node of T1, i.e. node b, hierarchical traversal is started, and comparison of node val between T1 and T2 is performed at the same layer, and after the traversal of T1 is completed, the similarity calculation process is ended. And recording the depth d of the root node as 0, and sequentially increasing the depth of the node from top to bottom. At each layer, the similarity of the layer is calculated firstly, and the calculation formula is as follows:
Figure BDA0002535109510000051
Here, the maximum number of nodes is the larger of the node counts of T1 and T2 at the same depth d.
After the similarity of every layer has been calculated, the per-layer similarities are summed and the total is recorded as sum. The similarity of T1 and T2 is calculated as follows:
Figure BDA0002535109510000052
according to the above calculation method, if the similarity of T1 and T2 is calculated to be 50%, it indicates that nearly 50% of nodes in T1 are matched with T2, and at this time, it can be considered that the matching degree of T1 and T2 is high. Therefore, 50% may be used as the threshold for confirming the matching of T1 and T2, that is, if the similarity of T1 and T2 is calculated to be higher than 50%, it may be considered that T1 and T2 achieve a successful matching, that is, it may be further confirmed that the target website may match the website fingerprint F1. And if the T1 and the T2 can be successfully matched, intercepting the T2, reserving nodes in the height H range, marking as T3, and recording corresponding information of F1-T3 at the moment.
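Because both formulas survive only as figure references, the sketch below rests on an assumed reading of them: the similarity of a layer is the number of node values shared by T1 and T2 at depth d divided by the larger node count at that depth, and the overall similarity is the sum of the layer similarities divided by the number of layers of T1 (taken here as the height H). Both the reading and the helper names `levels`, `similarity`, `truncate`, and `match` are assumptions, not the patent's own definitions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    val: str
    children: list[Node] = field(default_factory=list)


def levels(root: Node) -> list[list[str]]:
    """Node values grouped by depth; index 0 holds the root."""
    out, layer = [], [root]
    while layer:
        out.append([n.val for n in layer])
        layer = [c for n in layer for c in n.children]
    return out


def similarity(t1: Node, t2: Node) -> float:
    """Assumed formula: mean over T1's layers of matched / max node count."""
    l1, l2 = levels(t1), levels(t2)
    total = 0.0
    for d, vals1 in enumerate(l1):
        vals2 = l2[d] if d < len(l2) else []
        matched = sum((Counter(vals1) & Counter(vals2)).values())
        total += matched / max(len(vals1), len(vals2))
    return total / len(l1)


def truncate(node: Node, n_levels: int) -> Node:
    """Copy the tree, keeping only the top n_levels levels."""
    if n_levels <= 1:
        return Node(node.val)
    return Node(node.val, [truncate(c, n_levels - 1) for c in node.children])


def match(t1: Node, t2: Node, threshold: float = 0.5) -> Optional[Node]:
    """Return T3 (T2 cut to T1's height) if the similarity exceeds the threshold."""
    if similarity(t1, t2) > threshold:
        return truncate(t2, len(levels(t1)))
    return None
```

The comparison is strict, following the wording "higher than 50%". The per-layer comparison in this sketch ignores which parent a node hangs under; a stricter variant could compare children parent by parent.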
After the similarity calculation of T1 and T2 is completed, the level-by-level traversal of T0 continues and the calculation process is repeated, so that a group of fingerprints matched with T0, together with the new sitemap tree T3 corresponding to each fingerprint in the group, is finally obtained.
In the step (5), the website fingerprint-website map tree library is updated as follows: the fingerprints matched with the target website in the step (4), together with the new sitemap tree T3 corresponding to each fingerprint, are written into the website fingerprint-website map tree library. This further expands the information in the library, increases its coverage, and in turn improves the efficiency and accuracy of acquiring website fingerprints.
The foregoing is only a preferred embodiment of the present invention; although the invention has been disclosed through preferred embodiments, these are not intended to limit it. Those skilled in the art can, using the methods and techniques disclosed above, make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present invention.
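The update step can be sketched as appending each matched (fingerprint, T3) pair to D1. The dictionary layout of D1, the function `update_library`, and the skipping of trees that are already stored are assumptions added for illustration; the text does not say how duplicates are handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    val: str
    children: list[Node] = field(default_factory=list)


def update_library(
    d1: dict[str, list[Node]],
    matches: list[tuple[str, Node]],
) -> int:
    """Add each (fingerprint, T3) pair to D1 and return how many trees were added."""
    added = 0
    for fingerprint, t3 in matches:
        trees = d1.setdefault(fingerprint, [])
        # Dataclass equality compares val and children recursively,
        # so structurally identical trees are stored only once (an assumption).
        if t3 not in trees:
            trees.append(t3)
            added += 1
    return added
```

Each identified target can then contribute another tree for its fingerprint, which is how the coverage of the library grows over time.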

Claims (7)

1. A fingerprint identification method based on a website map is characterized by comprising the following steps:
(1) generating a website map tree: capturing a website map of a target website, and representing the website map by using an n-ary tree T0;
(2) pruning a website map tree: pruning the sitemap tree T0 according to the website path blacklist, reserving a website path capable of reflecting the website fingerprint, and generating a simplified sitemap tree T1;
(3) establishing a website fingerprint-website map tree library D1;
(4) fingerprint identification: traversing a website map tree in a website fingerprint-website map tree library, and matching the website map tree with a target website map tree to acquire fingerprint information of a target website;
(5) website fingerprint-website map tree library updating: and updating the corresponding information of the website fingerprint of the target website and the website map tree into a website fingerprint-website map tree library.
2. The sitemap-based fingerprint identification method according to claim 1, wherein in the step (1), each node of the n-ary tree T0 has two attributes: val, the value of the current node, and children, the list of child nodes of the current node.
3. The sitemap-based fingerprint identification method according to claim 1, wherein in the step (2), a blacklist is established for general fields that cannot reflect the website characteristics;
when a node of the sitemap tree appears in the blacklist, the node is pruned;
and when the number of child nodes of a node of the sitemap tree is larger than the node threshold, the node is pruned.
4. The sitemap-based fingerprint identification method according to claim 1, wherein in the step (3), the website fingerprint-sitemap tree library construction method specifically comprises:
establishing a website fingerprint library D0, and storing a plurality of website fingerprints;
establishing a website fingerprint-website map tree library D1, and storing the one-to-many relationship between the website fingerprints and the website map tree;
for each website fingerprint in D0, find the website corresponding to the website fingerprint, then obtain the sitemap of the website, and further obtain the sitemap tree of the website, and add the corresponding information of the website fingerprint-sitemap tree to D1.
5. The sitemap-based fingerprint identification method according to claim 1, wherein in the step (4), the sitemap tree of the target website is set as T0, each record in D1 is traversed to obtain the website fingerprint-sitemap tree correspondence, the currently traversed website fingerprint is F1 and its sitemap tree is T1, and T0 and T1 are matched by the following specific calculation method:
first, T0 is traversed level by level, and the val of each traversed node is compared with the val of the root node of T1; if they differ, the traversal continues downwards; if they are the same, the subtree of T0 rooted at the current node is set as T2, and the similarity between T2 and T1 is calculated;
if the calculated similarity of T1 and T2 is higher than the similarity threshold, T1 and T2 are considered successfully matched, which further confirms that the target website matches the website fingerprint F1; if T1 and T2 are successfully matched, T2 is truncated so that only the nodes within the height H of T1 are kept, the result is marked as T3, and the correspondence F1-T3 is recorded;
and after the similarity calculation of T1 and T2 is finished, the level-by-level traversal of T0 continues and the calculation process is repeated, finally yielding a group of fingerprints matched with T0 and the new sitemap tree T3 corresponding to each fingerprint in the group.
6. The sitemap-based fingerprint identification method according to claim 5, wherein in the step (4), the specific method for calculating the similarity of T1 and T2 is as follows:
performing hierarchical traversal from a root node of the T1, performing comparison of nodes val between the T1 and the T2 in the same layer, and finishing the similarity calculation process after the T1 traversal is finished; recording the depth d of the root node as 0, and sequentially increasing the depth of the node from top to bottom; at each layer, the similarity of the layer is calculated firstly, and the calculation formula is as follows:
Figure FDA0002535109500000021
where the maximum number of nodes is the larger of the node counts of T1 and T2 at the same depth d;
after the similarity of every layer has been calculated, the per-layer similarities are summed and the total is recorded as sum;
the similarity calculation formulas of T1 and T2 are as follows:
Figure FDA0002535109500000022
7. The sitemap-based fingerprint identification method according to claim 5, wherein in the step (5), the fingerprints matched with the target website obtained in the step (4) and the new sitemap tree T3 corresponding to each fingerprint are written into the website fingerprint-website map tree library, so as to further expand the information in the library and increase its coverage.
CN202010530722.3A 2020-06-11 2020-06-11 Fingerprint identification method based on sitemap Active CN111708967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010530722.3A CN111708967B (en) 2020-06-11 2020-06-11 Fingerprint identification method based on sitemap

Publications (2)

Publication Number Publication Date
CN111708967A (en) 2020-09-25
CN111708967B (en) 2023-05-16

Family

ID=72539827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010530722.3A Active CN111708967B (en) 2020-06-11 2020-06-11 Fingerprint identification method based on sitemap

Country Status (1)

Country Link
CN (1) CN111708967B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879110A (en) * 2023-02-09 2023-03-31 北京金信网银金融信息服务有限公司 System for identifying financial risk website based on fingerprint penetration technology

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1840765A1 (en) * 2006-03-02 2007-10-03 Indigen Solutions SARL Process for extracting data from a web site
WO2010108421A1 (en) * 2009-03-27 2010-09-30 腾讯科技(深圳)有限公司 Method and apparatus for authenticating a website
CN103116760A (en) * 2013-02-18 2013-05-22 人民搜索网络股份公司 Method and device for identifying text-missing web pages
CN103778164A (en) * 2012-10-26 2014-05-07 广州市邦富软件有限公司 Web page link characteristic mode recognition algorithm
CN104182412A (en) * 2013-05-24 2014-12-03 中国移动通信集团安徽有限公司 Webpage crawling method and webpage crawling system
EP3147867A1 (en) * 2015-09-24 2017-03-29 Samsung Electronics Co., Ltd. Apparatus for and method of traversing tree
CN108563729A (en) * 2018-04-04 2018-09-21 福州大学 A kind of bidding website acceptance of the bid information extraction method based on dom tree
CN109376291A (en) * 2018-11-08 2019-02-22 杭州安恒信息技术股份有限公司 A kind of method and device of the website fingerprint information scanning based on web crawlers
CN109783753A (en) * 2018-12-14 2019-05-21 平安普惠企业管理有限公司 The tree-shaped drawing generating method of web site url, device, equipment and storage medium
CN110851606A (en) * 2019-11-18 2020-02-28 杭州安恒信息技术股份有限公司 Website clustering method and system based on webpage structure similarity
CN111008405A (en) * 2019-12-06 2020-04-14 杭州安恒信息技术股份有限公司 Website fingerprint identification method based on file Hash

Also Published As

Publication number Publication date
CN111708967B (en) 2023-05-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant