CN111475464A - Method for automatically discovering and mining fingerprints of Web component - Google Patents


Info

Publication number
CN111475464A
Authority
CN
China
Prior art keywords
component
website
file
fingerprint
library
Prior art date
Legal status
Granted
Application number
CN202010197426.6A
Other languages
Chinese (zh)
Other versions
CN111475464B (en)
Inventor
陈龙
周双飞
夏书银
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date 2020-03-19
Filing date 2020-03-19
Publication date 2020-07-31
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010197426.6A
Publication of CN111475464A
Application granted
Publication of CN111475464B
Status: Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for automatically discovering and mining Web component fingerprints, and belongs to the field of computer networks. The method comprises the following steps: collecting web page data of websites under different domain names and storing it in a website web page database; calculating the digital digest (Hash value) of each unique JS file, CSS file and static image file in the source code of open source components; extracting from the website static file feature library the digests whose occurrence count is greater than N (N is a natural number greater than 2) and matching them one by one against the digests in the component source code file feature database; based on the website_component association library, extracting the component's special file path features and keyword features from the component source code file feature library and matching each of them in the web pages of the many websites that contain the component; and selecting the features with the most hits from the candidate component fingerprint library and adding them to the component fingerprint library. The invention enables Web component fingerprints to be discovered and mined automatically.

Description

Method for automatically discovering and mining fingerprints of Web component
Technical Field
The invention belongs to the field of computer networks, and relates to a method for automatically discovering and mining fingerprints of Web components.
Background
A website is composed of components: servers, databases, web containers, plug-ins, middleware and the like are all website components. Component fingerprint matching is generally used to identify which components a website is built from. A component fingerprint is a piece of information that uniquely identifies a component, such as the Hash value of one of its unique static files (a JS file, a CSS file, an image, etc.), a special file path, or a keyword field. A successful fingerprint match indicates that the website uses the component.
Among these fingerprint types, identifying a component by the Hash value of its static files is the most accurate method, and the present invention builds on this feature.
The richness and accuracy of the component fingerprint library are the main constraints on component fingerprint identification. The rapid growth in the number of components and frequent version updates keep adding and changing component fingerprints, so collecting component fingerprints has become a time-consuming and labor-intensive task.
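To make the matching idea concrete, the following is a minimal Python sketch of how a finished fingerprint library could be applied to one website. It assumes SHA-256 as the digest (the patent only says "Hash value"); the functions `digest` and `identify_components` and the dictionary layout of `fingerprints` are hypothetical illustrations, not part of the patent.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hash value of a static file (SHA-256 is an assumption; the patent names no algorithm)."""
    return hashlib.sha256(content).hexdigest()


def identify_components(static_files, page_text, fingerprints):
    """Return the set of components whose fingerprint matches one website.

    static_files: raw bytes of the site's JS, CSS and image files
    page_text:    the site's HTML source, searched for path and keyword fingerprints
    fingerprints: hypothetical layout {component: {"hashes": set, "paths": set, "keywords": set}}
    """
    site_hashes = {digest(content) for content in static_files}
    found = set()
    for component, fp in fingerprints.items():
        hash_hit = bool(site_hashes & fp.get("hashes", set()))
        text_hit = any(
            marker in page_text
            for marker in (*fp.get("paths", ()), *fp.get("keywords", ()))
        )
        if hash_hit or text_hit:
            found.add(component)  # a successful fingerprint match means the site uses it
    return found


if __name__ == "__main__":
    lib_js = b"/*! demo-lib v1.0 */ function demo() {}"
    fingerprints = {
        "Component 1": {"hashes": {digest(lib_js)}, "paths": set(), "keywords": set()},
        "Component 2": {"hashes": set(), "paths": {"/static/demo-cms/"}, "keywords": set()},
    }
    html = '<link rel="stylesheet" href="/static/demo-cms/site.css">'
    print(sorted(identify_components([lib_js], html, fingerprints)))  # ['Component 1', 'Component 2']
```

Because identical bytes always produce the same digest, a hash hit pins down one exact file, which is why the patent treats static file hashes as the most accurate fingerprint type; path and keyword checks are looser substring tests here.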
The defects and shortcomings of the prior art are as follows:
existing component fingerprint discovery relies mainly on manual labeling, which is why every Web component fingerprint identification platform or open source tool offers a function or channel for submitting component fingerprints; this approach suffers from high cost and low efficiency.
Disclosure of Invention
In view of the above, the present invention provides a method for automatically discovering and mining Web component fingerprints. It solves the main problem of discovering component fingerprints automatically, completes this task efficiently and at low cost, and removes the dependence on manual labeling, so that component fingerprints are mainly discovered automatically, with manual labeling playing a supporting role.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for automatically discovering and mining Web component fingerprints, the method comprising the steps of:
1) establishing a web page database, a website static file feature library (storing the numerical digests, i.e., Hash values, of static files), a component source code file feature library, a website_component association library, a component fingerprint library, and a candidate component fingerprint library;
2) collecting web page data of websites under different domain names and storing it in the web page database;
3) processing the website data, comprising the following steps:
3.1) calculating the Hash values of the website's JavaScript files, cascading style sheet (CSS) files and static image files, as well as the path features and keyword features of its special files; one website has multiple static file numerical digests (Hash values), special file paths and keyword features;
3.2) storing the calculated Hash values in the website static file feature library; if a Hash value is already stored in the library, increasing its Count by 1;
4) calculating the Hash values of the unique JS files, CSS files and static image files in the source code files of open source components, together with the path features and keyword features of the unique files, and storing the results in the component source code file feature database; one component has several static file Hash values;
5) comparing and matching the Hash value features of the website static files with the Hash values of the component source code files, comprising the following steps:
5.1) extracting from the website static file feature library a Hash value record whose Count is greater than N, where N is any natural number greater than 2;
5.2) comparing the Hash value extracted in step 5.1 one by one with the Hash values in the component source code file feature database; if two Hash values are identical, the match succeeds;
5.3) if the matching in step 5.2 succeeds, writing the matched Hash value into the component fingerprint library as a fingerprint of the component; at the same time, marking the component identifier on every website in the website static file feature database that contains the Hash value, thereby associating the component with those websites, and writing the association results into the website_component association database. This completes one round of matching; the next Hash value with Count > N is then extracted and matched, until all Hash values with Count > N have been matched against the Hash values in the component source code file feature library. If the matching fails, returning to step 5.1;
6) extracting all website information related to a given component from the website_component association library, and extracting the corresponding web page data from the web page database;
7) for the extracted component, extracting its special file path features and keyword features from the component source code file feature database, and matching each feature in turn against the extracted web page data; if a match succeeds, writing the feature into the candidate component fingerprint library, and increasing the feature's Count by 1 each time it is successfully matched in the web page data of a different website;
8) selecting the features whose hit count Count > M (M is any natural number greater than 2) from the candidate component fingerprint library and writing them into the component fingerprint library.
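Step 1 names six stores but does not define their layout. The sketch below is one possible SQLite schema for them, using Python's standard `sqlite3` module; every table and column name is a hypothetical choice made for illustration, loosely following the columns shown in the worked example of the Detailed Description.

```python
import sqlite3

# Hypothetical table and column names: the patent names the six stores of step 1
# but does not define their schemas.
SCHEMA = """
CREATE TABLE webpage (                       -- web page database
    site_url TEXT PRIMARY KEY,
    html     TEXT NOT NULL
);
CREATE TABLE site_static_file_feature (      -- website static file feature library
    file_hash TEXT PRIMARY KEY,
    file_name TEXT,
    site_urls TEXT NOT NULL,                  -- comma-separated, as in the worked example
    count     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE component_source_feature (      -- component source code file feature library
    component     TEXT NOT NULL,
    feature_type  TEXT NOT NULL CHECK (feature_type IN ('hash', 'path', 'keyword')),
    feature_value TEXT NOT NULL,
    PRIMARY KEY (component, feature_type, feature_value)
);
CREATE TABLE site_component (                -- website_component association library
    site_url  TEXT NOT NULL,
    component TEXT NOT NULL,
    PRIMARY KEY (site_url, component)
);
CREATE TABLE component_fingerprint (         -- component fingerprint library
    component     TEXT NOT NULL,
    feature_type  TEXT NOT NULL,
    feature_value TEXT NOT NULL,
    PRIMARY KEY (component, feature_type, feature_value)
);
CREATE TABLE candidate_fingerprint (         -- candidate component fingerprint library
    component     TEXT NOT NULL,
    feature_type  TEXT NOT NULL,
    feature_value TEXT NOT NULL,
    count         INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (component, feature_type, feature_value)
);
"""


def create_stores(path: str = ":memory:") -> sqlite3.Connection:
    """Create the six stores of step 1 in one SQLite database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping hash, path and keyword features in one table with a `feature_type` column mirrors the "Type" column of the candidate library example; separate tables per feature type would work just as well.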
Optionally, the static file features of the website include the Hash values of the website's JavaScript files, cascading style sheet (CSS) files and static image files.
Optionally, the component source code file features include the Hash values of the unique JS files, CSS files and static image files in the component source code files, as well as unique file path features and keyword features.
Optionally, the method for determining whether the Hash value of a static file is a component fingerprint comprises: comparing the Hash values of the component's static files one by one with the Hash values of static files that appear multiple times across different websites, and determining whether they are identical; if so, the Hash value is determined to be a component fingerprint.
Optionally, the component fingerprint mining method comprises: matching the special file path features and keyword features among the component source code file features against the data of a large number of websites containing the component; if a match succeeds, determining whether the feature already exists in the candidate component fingerprint library; if so, increasing the feature's Count by 1, and if not, writing the feature into the candidate component fingerprint library.
Optionally, the method for selecting special file path and keyword features comprises: selecting the special file path and keyword fingerprints with many hits and writing them into the component fingerprint library.
The beneficial effects of the invention are as follows: the invention uses statistics and Hash value comparison to discover Hash value fingerprints of component static files and, building on the discovered static file Hash value fingerprints, further mines the special file path features and keyword features that can serve as component fingerprints. Component fingerprints are thereby discovered and mined automatically.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a process for discovering new component fingerprints;
FIG. 2 is a component fingerprint mining process.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided only to illustrate the invention and are not intended to limit it. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced and do not represent the dimensions of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the present invention, terms indicating an orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientation or positional relationship shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and are not to be construed as limiting the present invention; those skilled in the art can understand their specific meaning according to the specific situation.
The invention mainly aims to overcome the shortcomings of traditional component fingerprint discovery, and provides a method for automatically discovering and updating component fingerprints. Website data from different domain names is collected, and the number of occurrences of static files such as JS files, CSS files and images is counted. The Hash values of static files that occur more than N times are matched against the static file Hash values of components. If a match succeeds, the static file Hash value is taken as a component fingerprint; building on this static file Hash value fingerprint, further fingerprints of the component are then discovered, so that component fingerprints are discovered automatically.
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. FIG. 1 and FIG. 2 are schematic flowcharts of the method for automatically discovering and mining component fingerprints according to the present invention; the method includes:
As shown in FIG. 1, assume that web page data collection has started.
Step A: collect website data and process it to obtain the Hash values of static files. Assume the website static file feature library currently stores the following records:
Static file name | Static file Hash value | Site_url   | Count
A                | Hash_A                 | url1, url2 | 2
B                | Hash_B                 | url        | 1
C                | Hash_C                 | url        | 1
Each time the Hash value of a static file is collected again, the address of the website where it was found is appended to Site_url and the Count is increased by 1. For example, if the Hash value of static file A is collected again on url3 and url4 during collection, the database records change as follows:
Static file name | Static file Hash value | Site_url               | Count
A                | Hash_A                 | url1, url2, url3, url4 | 4
B                | Hash_B                 | url                    | 1
C                | Hash_C                 | url                    | 1
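The Step A bookkeeping can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: SHA-256 stands in for the unspecified hash algorithm, `StaticFileRecord` and `record_static_file` are hypothetical names, and each collection is assumed to come from a different website, as in the example above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StaticFileRecord:
    """One row of the website static file feature library."""
    file_name: str
    file_hash: str
    site_urls: list = field(default_factory=list)
    count: int = 0


def record_static_file(library, file_name, content, site_url):
    """Step A: hash a collected static file and update the library in place.

    library maps Hash value -> StaticFileRecord. SHA-256 is an assumed digest, and
    every collection is assumed to come from a different website.
    """
    file_hash = hashlib.sha256(content).hexdigest()
    record = library.get(file_hash)
    if record is None:
        record = StaticFileRecord(file_name, file_hash)
        library[file_hash] = record
    record.site_urls.append(site_url)  # append the website address to Site_url
    record.count += 1                  # and increase Count by 1
    return record


if __name__ == "__main__":
    library = {}
    for url in ["url1", "url2", "url3", "url4"]:
        rec = record_static_file(library, "A", b"contents of static file A", url)
    print(rec.site_urls, rec.count)  # ['url1', 'url2', 'url3', 'url4'] 4
```

Keying the library by Hash value rather than file name means the same file served under different names on different sites is still counted as one static file.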
Step B: process the source code files of open source components, and calculate and analyze each component's unique static file Hash values and other features, mainly file path features and key field features. An example of the records stored in the component source code file feature database is as follows:
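[Table image in the original: example records of the component source code file feature database for Component 1 and Component 2]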
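Step B does not say how a static file is judged "unique" to a component. The Python sketch below assumes one plausible reading: a Hash value is unique if it occurs in exactly one component's source tree, so shared files such as a common polyfill are discarded. The input layout, the `STATIC_SUFFIXES` list and the `component_features` name are hypothetical, and keyword extraction is not shown.

```python
import hashlib
from collections import Counter

# Assumed set of "static file" extensions (JS, CSS and images).
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico")


def component_features(components):
    """Step B: compute unique static file Hash values and path features per component.

    components: hypothetical input {component name: {relative path: file bytes}}.
    "Unique" is interpreted here as "the Hash value occurs in exactly one component".
    """
    per_component = {}
    for name, files in components.items():
        per_component[name] = {
            hashlib.sha256(content).hexdigest(): path
            for path, content in files.items()
            if path.lower().endswith(STATIC_SUFFIXES)
        }
    # Count in how many components each Hash value appears.
    seen_in = Counter(h for hashes in per_component.values() for h in hashes)
    features = {}
    for name, hashes in per_component.items():
        unique = {h: p for h, p in hashes.items() if seen_in[h] == 1}
        features[name] = {"hashes": unique, "paths": set(unique.values())}
    return features
```

Discarding shared hashes matters because a file bundled by several components would otherwise be written as a fingerprint of whichever component is matched first in Step D.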
Step C: when the Count of a record in the website static file feature library is greater than N (N is any natural number greater than 2), extract the Hash value of that record. A Hash value is only meaningful as a component fingerprint if it is identified on multiple websites, hence N > 2. If N is 3, only static file A in the website static file feature library meets the condition, so the Hash value of A, Hash_A, is extracted.
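A minimal Python sketch of the Step C selection, reproducing the N = 3 example from the tables above. The function name is hypothetical, and processing the highest Count first is an assumption; the patent only says records are taken one at a time.

```python
def select_frequent_hashes(counts, n=3):
    """Step C: return the Hash values whose Count is greater than N.

    counts: {Hash value: Count} taken from the website static file feature library.
    Ordering by descending Count is an assumption, not stated in the patent.
    """
    if n <= 2:
        raise ValueError("N must be a natural number greater than 2")
    frequent = [(h, c) for h, c in counts.items() if c > n]
    frequent.sort(key=lambda item: item[1], reverse=True)
    return [h for h, _ in frequent]


if __name__ == "__main__":
    print(select_frequent_hashes({"Hash_A": 4, "Hash_B": 1, "Hash_C": 1}, n=3))  # ['Hash_A']
```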
Step D: compare the extracted Hash value one by one with the static file Hash values of the components (Component 1 and Component 2).
If the matching succeeds, the following two operations are performed:
1) Write Hash_A into the component fingerprint library as a fingerprint of the matched component. For example, if Hash_A successfully matches a static file of Component 1, the records in the component fingerprint library become:
Component name | Hash value fingerprint | File path feature fingerprint | Key field feature fingerprint
Component 1    | Hash_A                 | null                          | null
2) Based on Hash_A, extract the information of the websites containing Hash_A from the website static file feature library, label those websites with the component, and write the labels into the website_component association library. For example, the website static file feature library records four websites containing Hash_A (url1, url2, url3 and url4), so the records in the website_component association library become:
Site_url | Component list
url1     | Component 1
url2     | Component 1
url3     | Component 1
url4     | Component 1
This round then ends, and the Hash value of the next record with Count > N is selected for comparison and matching.
If the matching fails, return to Step C.
Step E: extract the Hash value fingerprints from the component fingerprint library and obtain the websites containing each component Hash value fingerprint from the website_component association database. If there are many such websites, select L of them; otherwise select all of them. Then match the component's file path features and keyword features in the web page data of the selected websites.
If a match succeeds, add the feature information to the candidate component fingerprint library. Assume the candidate component fingerprint library currently stores the following data:
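Step D, with its two operations on success, can be sketched in Python as below. This is a hedged illustration: plain dictionaries stand in for the databases, and `match_and_associate` and its parameter names are hypothetical. It assumes the component hashes were made unique per component in Step B, so the first matching component is the only one.

```python
def match_and_associate(candidates, site_urls_by_hash, component_hashes,
                        fingerprint_lib, site_component):
    """Step D: match candidate Hash values against component static file Hash values.

    candidates:        Hash values selected in Step C
    site_urls_by_hash: {Hash value: [site URLs]} from the website static file feature library
    component_hashes:  {component: set of unique static file Hash values} from Step B
    fingerprint_lib:   {component: set of Hash value fingerprints}, updated in place
    site_component:    {site URL: set of components}, updated in place
    Returns the candidates that matched no component.
    """
    unmatched = []
    for file_hash in candidates:
        # Hash values are unique per component after Step B, so the first hit suffices.
        owner = next((c for c, hashes in component_hashes.items() if file_hash in hashes), None)
        if owner is None:
            unmatched.append(file_hash)  # matching failed: continue with the next candidate
            continue
        fingerprint_lib.setdefault(owner, set()).add(file_hash)  # operation 1)
        for url in site_urls_by_hash.get(file_hash, []):         # operation 2)
            site_component.setdefault(url, set()).add(owner)
    return unmatched
```

With the example data (Hash_A found on url1 to url4 and belonging to Component 1), this yields the two tables shown under operations 1) and 2).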
Component name | Feature     | Type    | Count
Component 1    | Feature 1_1 | keyword | 1
Component 1    | Feature 1_2 | path    | 5
Component 2    | Feature 2_1 | keyword | 8
For example, if Feature 1_1 of Component 1 is matched successfully in the web page data of one more website, the database records change to:
[Table image in the original: the updated candidate component fingerprint library records, in which the Count of Feature 1_1 has increased by 1]
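The Step E matching loop can be sketched in Python as follows, under several assumptions: features are matched as plain substrings of the page HTML (the patent does not specify the matching rule), the hypothetical `max_sites` parameter plays the role of L, and sites are sampled with `random.Random.sample` when there are more than L of them. Each feature is counted at most once per website, matching "the feature Count is increased by 1 every time it is successfully matched in different website web page data".

```python
import random


def mine_candidate_features(component, site_component, webpages, source_features,
                            candidates, max_sites=100, seed=0):
    """Step E: match a component's path and keyword features in the pages of its sites.

    site_component:  {site URL: set of components} (website_component association library)
    webpages:        {site URL: HTML text} (web page database)
    source_features: {component: {"path": set, "keyword": set}}
    candidates:      {(component, feature, type): Count}, updated in place
    max_sites stands in for L; random sampling is an assumption.
    """
    sites = sorted(url for url, comps in site_component.items() if component in comps)
    if len(sites) > max_sites:
        sites = random.Random(seed).sample(sites, max_sites)
    for url in sites:
        page = webpages.get(url, "")
        for ftype in ("path", "keyword"):
            for feature in source_features.get(component, {}).get(ftype, ()):
                if feature in page:  # simple substring match (assumed rule)
                    key = (component, feature, ftype)
                    candidates[key] = candidates.get(key, 0) + 1
    return candidates
```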
Step F: for each component, select the features with the most hits in the candidate component fingerprint library, i.e., the feature records with the largest Count. A threshold Y is set that changes dynamically with the number of matched websites, and component features with Count > Y are written into the component fingerprint library. For example, for Component 1, Feature 1_2 performs better and satisfies Count > Y; its Type field is keyword, so it is a key field feature fingerprint, and the database records are updated to:
Component name | Hash value fingerprint | File path feature fingerprint | Key field feature fingerprint
Component 1    | Hash_A                 | null                          | Feature 1_2
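A sketch of Step F in Python. The patent says only that Y changes dynamically with the number of matched websites, so the rule used here, Y = max(minimum, ratio × websites searched), is a hypothetical choice; `ratio` and `minimum` are illustrative parameters, with `minimum=3` echoing the requirement M > 2 from step 8. The feature's Type field decides which fingerprint column it is written to.

```python
def promote_candidates(candidates, sites_searched, fingerprint_lib, ratio=0.5, minimum=3):
    """Step F: move candidate features with Count > Y into the component fingerprint library.

    candidates:      {(component, feature, type): Count}, type being "path" or "keyword"
    sites_searched:  {component: number of websites matched in Step E}
    fingerprint_lib: {component: {"hash": set, "path": set, "keyword": set}}, updated in place
    Y = max(minimum, ratio * sites_searched) is a hypothetical rule.
    """
    promoted = []
    for (component, feature, ftype), count in candidates.items():
        y = max(minimum, ratio * sites_searched.get(component, 0))
        if count > y:
            entry = fingerprint_lib.setdefault(
                component, {"hash": set(), "path": set(), "keyword": set()}
            )
            entry[ftype].add(feature)  # the Type field decides the fingerprint column
            promoted.append((component, feature))
    return promoted
```

Scaling Y with the number of websites searched keeps a feature from being promoted just because a popular component was checked against many pages; any monotonic rule would serve the same purpose.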
Through the above steps, a reliable component fingerprint database is obtained automatically. For example, the fingerprint of Component 1 is "Hash value fingerprint: Hash_A; key field feature fingerprint: Feature 1_2", which can be used for actual Web component fingerprint identification.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A method for automatically discovering and mining Web component fingerprints, characterized in that the method comprises the following steps:
1) establishing a web page database, a website static file feature library (storing the numerical digests, i.e., Hash values, of static files), a component source code file feature library, a website_component association library, a component fingerprint library, and a candidate component fingerprint library;
2) collecting web page data of websites under different domain names and storing it in the web page database;
3) processing the website data, comprising the following steps:
3.1) calculating the Hash values of the website's JavaScript files, cascading style sheet (CSS) files and static image files, as well as the path features and keyword features of its special files; one website has multiple static file numerical digests (Hash values), special file paths and keyword features;
3.2) storing the calculated Hash values in the website static file feature library; if a Hash value is already stored in the library, increasing its Count by 1;
4) calculating the Hash values of the unique JS files, CSS files and static image files in the source code files of open source components, together with the path features and keyword features of the unique files, and storing the results in the component source code file feature database; one component has several static file Hash values;
5) comparing and matching the Hash value features of the website static files with the Hash values of the component source code files, comprising the following steps:
5.1) extracting from the website static file feature library a Hash value record whose Count is greater than N, where N is any natural number greater than 2;
5.2) comparing the Hash value extracted in step 5.1 one by one with the Hash values in the component source code file feature database; if two Hash values are identical, the match succeeds;
5.3) if the matching in step 5.2 succeeds, writing the matched Hash value into the component fingerprint library as a fingerprint of the component; at the same time, marking the component identifier on every website in the website static file feature database that contains the Hash value, thereby associating the component with those websites, and writing the association results into the website_component association database; this completes one round of matching, and the next Hash value with Count > N is then extracted and matched, until all Hash values with Count > N have been matched against the Hash values in the component source code file feature library; if the matching fails, returning to step 5.1;
6) extracting all website information related to a given component from the website_component association library, and extracting the corresponding web page data from the web page database;
7) for the extracted component, extracting its special file path features and keyword features from the component source code file feature database, and matching each feature in turn against the extracted web page data; if a match succeeds, writing the feature into the candidate component fingerprint library, and increasing the feature's Count by 1 each time it is successfully matched in the web page data of a different website;
8) selecting the features whose hit count Count > M from the candidate component fingerprint library and writing them into the component fingerprint library, wherein M is any natural number greater than 2.
2. The method for automatically discovering and mining Web component fingerprints according to claim 1, wherein the static file features of the website comprise the Hash values of the website's JavaScript files, cascading style sheet (CSS) files and static image files.
3. The method for automatically discovering and mining Web component fingerprints according to claim 1, wherein the component source code file features comprise the Hash values of the unique JS files, CSS files and static image files in the component source code files, as well as unique file path features and unique keyword features.
4. The method for automatically discovering and mining Web component fingerprints according to claim 1, wherein the method for determining whether the Hash value of a static file is a component fingerprint comprises: comparing the Hash values of the component's static files one by one with the Hash values of static files that appear multiple times across different websites, and determining whether they are identical; if so, determining the Hash value to be a component fingerprint.
5. The method for automatically discovering and mining Web component fingerprints according to claim 1, wherein the component fingerprint mining method comprises: matching the special file path features and keyword features among the component source code file features against a large amount of website data containing the component; if a match succeeds, determining whether the feature already exists in the candidate component fingerprint library; if so, increasing the feature's Count by 1, and if not, writing the feature into the candidate component fingerprint library.
6. The method for automatically discovering and mining Web component fingerprints according to claim 5, wherein the method for selecting the special file path and keyword features comprises: selecting the special file paths and keyword fingerprints with many hits and writing them into the component fingerprint library.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010197426.6A CN111475464B (en) 2020-03-19 2020-03-19 Method for automatically finding and mining fingerprints of Web component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010197426.6A CN111475464B (en) 2020-03-19 2020-03-19 Method for automatically finding and mining fingerprints of Web component

Publications (2)

Publication Number Publication Date
CN111475464A 2020-07-31
CN111475464B 2023-04-25

Family

ID=71747637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010197426.6A Active CN111475464B (en) 2020-03-19 2020-03-19 Method for automatically finding and mining fingerprints of Web component

Country Status (1)

Country Link
CN (1) CN111475464B (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
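闫淑筠 et al.: "一种有效的Web指纹识别方法" (An effective Web fingerprint identification method), 《中国科学院大学学报》 (Journal of University of Chinese Academy of Sciences) *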

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131508A * 2020-09-25 2020-12-25 Sangfor Technologies Inc. Method, equipment, device and medium for identifying fingerprint of website application framework
CN113946566A * 2021-12-20 2022-01-18 Peking University Web system fingerprint database construction method and device and electronic equipment
CN113946566B * 2021-12-20 2022-03-18 Peking University Web system fingerprint database construction method and device and electronic equipment

Also Published As

Publication number Publication date
CN111475464B 2023-04-25

Similar Documents

Publication Publication Date Title
WO2019214245A1 (en) Information pushing method and apparatus, and terminal device and storage medium
JP2022535792A (en) Discovery of data field semantic meaning from data field profile data
US8271495B1 (en) System and method for automating categorization and aggregation of content from network sites
US20100257440A1 (en) High precision web extraction using site knowledge
US7711719B1 (en) Massive multi-pattern searching
US20120102015A1 (en) Method and System for Performing a Comparison
GB2513472A (en) Resolving similar entities from a database
CN101796480A (en) Integrating external related phrase information into a phrase-based indexing information retrieval system
CN113254751B (en) Method, equipment and storage medium for accurately extracting complex webpage structured information
US8423885B1 (en) Updating search engine document index based on calculated age of changed portions in a document
CN105589894B (en) Document index establishing method and device and document retrieval method and device
CN106649557B (en) Semantic association mining method for defect report and mail list
CN103324929B Handwritten Chinese character recognition method based on substructure learning
CN111475464B Method for automatically finding and mining fingerprints of Web component
CN111488385A (en) Data processing method and device based on artificial intelligence and computer equipment
CN108959550B (en) User focus mining method, device, equipment and computer readable medium
CN113065018A (en) Audio and video index library creating and retrieving method and device and electronic equipment
Machanavajjhala et al. Collective extraction from heterogeneous web lists
CN109740097B (en) Webpage text extraction method based on logical link block
CN107169065B (en) Method and device for removing specific content
CN107832389B (en) Data management method and device
CN110705297A (en) Enterprise name-identifying method, system, medium and equipment
Dejean Extracting structured data from unstructured document with incomplete resources
EP1138007A1 (en) System and method for finding near matches among records in databases
CN112131215B (en) Bottom-up database information acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant