CN117454881B - Website dynamic tag analysis method based on static page - Google Patents

Publication number
CN117454881B
Authority
CN
China
Prior art keywords
website
data
page
dynamic tag
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311748269.3A
Other languages
Chinese (zh)
Other versions
CN117454881A (en
Inventor
刘志雨
赵志庆
侯玉柱
张昊
靳学庚
张雨铭威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rongxing Technology Co ltd
Original Assignee
Rongxing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rongxing Technology Co ltd
Priority to CN202311748269.3A
Publication of CN117454881A
Application granted
Publication of CN117454881B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the field of website dynamic tag analysis, and in particular to a website dynamic tag analysis method based on a static page, comprising the following steps: S1, acquiring a website hierarchical structure and performing initial analysis processing to obtain a multi-level static page; S2, acquiring static page element code data by using the multi-level static page; S3, performing dynamic analysis processing by using the static page element code data to obtain a website dynamic tag analysis result. By extracting and redistributing the page source code, the method optimizes website collection and display. The dynamic tag analysis is also compatible with more complex page tag structures, and for large website structures and long-running analysis flows it improves the accuracy and real-time performance of the tag analysis result at each moment.

Description

Website dynamic tag analysis method based on static page
Technical Field
The invention relates to the field of website dynamic tag analysis, in particular to a website dynamic tag analysis method based on a static page.
Background
Static pages are a common display type in website architectures. They display a large amount of content, and the displayed content differs by page level. As the demand for open source collection grows, existing schemes can already perform dynamic tag analysis of static pages; for the multi-level pages of a large website architecture, however, the accuracy of collection and analysis and the practical implementation of such schemes remain urgent problems to be solved.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a website dynamic tag analysis method based on a static page, which extracts the source code of static pages for analysis, performs multi-level, layer-by-layer output verification, and improves tag analysis capability and accuracy.
To achieve the above purpose, the invention provides a website dynamic tag analysis method based on a static page, comprising:
S1, acquiring a website hierarchical structure and performing initial analysis processing to obtain a multi-level static page;
S2, acquiring static page element code data by using the multi-level static page;
S3, performing dynamic analysis processing by using the static page element code data to obtain a website dynamic tag analysis result.
Preferably, acquiring the website hierarchical structure and performing initial analysis processing to obtain the multi-level static page includes:
S1-1, collecting the open source page corresponding to the website;
S1-2, obtaining the website hierarchical structure by using the open source page corresponding to the website;
S1-3, performing full collection according to the website hierarchical structure to obtain the multi-level static page of the website hierarchical structure;
wherein the multi-level static page includes js files and css style sheet files.
Preferably, acquiring the static page element code data by using the multi-level static page includes:
S2-1, acquiring website page compatible data according to the multi-level static page;
S2-2, performing alignment display processing on the multi-level static page by using the website page compatible data to obtain the static page element code data.
Further, acquiring the website page compatible data according to the multi-level static page includes:
S2-1-1, judging whether the open source page corresponding to the multi-level static page contains table data; if so, extracting the table data of the open source page, arranging it in its original order to obtain secondary processing table data, and executing S2-1-2; otherwise, directly executing S2-1-2;
S2-1-2, judging whether the open source page corresponding to the multi-level static page contains file data; if so, acquiring the download address corresponding to the file data of the open source page and executing S2-1-3; otherwise, directly executing S2-1-3;
S2-1-3, judging whether the open source page corresponding to the multi-level static page contains picture data; if so, executing S2-1-4; otherwise, using the secondary processing table data and the download address corresponding to the file data as the website page compatible data;
S2-1-4, judging whether the picture data and the table data are associated; if so, inserting the secondary processing table data into the picture data to obtain tertiary processing table data as the website page compatible data; otherwise, executing S2-1-5;
S2-1-5, judging whether the picture data and the file data are associated; if so, acquiring a secondary address corresponding to the picture data according to the download address corresponding to the file data, and using the secondary processing table data, the download address corresponding to the file data and the secondary address corresponding to the picture data as the website page compatible data; otherwise, using the secondary processing table data, the download address corresponding to the file data and the download address corresponding to the picture data as the website page compatible data.
Further, performing dynamic analysis processing by using the static page element code data to obtain the website dynamic tag analysis result includes:
S3-1, acquiring a website dynamic tag according to the static page element code data;
S3-2, performing loop analysis verification by using the website dynamic tag to obtain a website dynamic tag iteration result;
S3-3, performing backtracking verification processing according to the website dynamic tag iteration result to obtain a backtracking verification result of the website dynamic tag iteration result;
S3-4, obtaining the website dynamic tag analysis result according to the backtracking verification result.
Further, acquiring the website dynamic tag according to the static page element code data includes:
S3-1-1, acquiring the corresponding element nodes from the static page element code data as HTML tags;
S3-1-2, judging whether the number of HTML tags is 1; if so, executing S3-1-3; otherwise, sequentially acquiring the URL features of the HTML tags corresponding to the static page element code data and directly executing S3-1-4;
S3-1-3, judging whether the link address of the HTML tag is consistent with the link address of the open source page corresponding to the website; if so, using the HTML tag as the website dynamic tag; otherwise, returning to S3-1-1;
S3-1-4, judging whether the address sequence corresponding to the URL features is completely consistent with the address sequence corresponding to the website hierarchical structure; if so, using the URL features as the website dynamic tag; otherwise, returning to S1-1.
Further, performing loop analysis verification by using the website dynamic tag to obtain the website dynamic tag iteration result includes:
S3-2-1, judging whether the website dynamic tag is an HTML tag; if so, directly outputting the HTML tag as the website dynamic tag iteration result; otherwise, executing S3-2-2;
S3-2-2, judging whether the website dynamic tag and the website page compatible data correspond step by step; if so, establishing a primary data-address mapping between each URL feature of the website dynamic tag and each sub-data item of the corresponding website page compatible data, and executing S3-2-3; otherwise, performing loop verification processing;
S3-2-3, using the primary data-address mapping of the open source page corresponding to the current website as a data reference;
S3-2-4, judging whether the website hierarchical structure has changed; if so, executing S3-2-5; otherwise, directly executing S3-2-6;
S3-2-5, judging whether the current website hierarchical structure is a subset of the immediately preceding website hierarchical structure; if so, deleting the primary data-address mapping corresponding to the changed part of the website hierarchical structure and executing S3-2-6; otherwise, returning to S1-2;
S3-2-6, using the data reference as the website dynamic tag iteration result at the current moment;
wherein step-by-step correspondence means that each URL feature of the website dynamic tag and each sub-data item in the website page compatible data correspond to each other.
Further, performing the loop verification processing includes:
S3-2-2-1, acquiring the non-corresponding website dynamic tag according to the website page compatible data;
S3-2-2-2, acquiring the website page compatible data corresponding to the non-corresponding website dynamic tag;
S3-2-2-3, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag corresponds to the open source page corresponding to the website; if so, executing S3-2-2-4; otherwise, returning to S2-1;
S3-2-2-4, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag has a combined address; if so, executing S3-2-2-5; otherwise, directly executing S3-2-2-6;
S3-2-2-5, judging whether the combined address is consistent with the download address corresponding to the file data of the open source page corresponding to the website; if so, establishing a supplementary data-address mapping from the download address corresponding to the file data and the corresponding URL feature, adding it to the primary data-address mapping, and returning to S3-2-2; otherwise, returning to S2-1-5;
S3-2-2-6, judging whether the part of the website hierarchical structure corresponding to the non-corresponding website dynamic tag is a subset of the website hierarchical structure of the open source page corresponding to the website; if so, returning to S3-2-1; otherwise, returning to S1-2;
wherein the combined address is the secondary address of the picture data in the website page compatible data.
Further, performing backtracking verification processing according to the website dynamic tag iteration result to obtain the backtracking verification result of the website dynamic tag iteration result includes:
S3-3-1, judging whether step return processing exists at the current moment; if so, acquiring the preset data of the step return processing and executing S3-3-2; otherwise, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is output directly;
S3-3-2, judging whether the preset data corresponds to the backtracking verification node set; if so, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is output; otherwise, returning to the screening step corresponding to the preset data;
wherein the step return processing is the pre-screening return step at the moment corresponding to S3-3-1, and the backtracking verification node set comprises, in order, the open source pages corresponding to the website.
Further, obtaining the website dynamic tag analysis result according to the backtracking verification result includes:
when the backtracking verification result is normal, directly outputting the website dynamic tag iteration result at the current moment as the website dynamic tag analysis result;
when the backtracking verification result corresponds to existing step return processing, taking the moment corresponding to the pre-screening return step as the starting moment, and acquiring the website dynamic tag iteration results from the starting moment to the current moment together with the corresponding website page compatible data.
Compared with the closest prior art, the invention has the following beneficial effects:
The extraction and redistribution of the page source code optimizes website collection and display. Meanwhile, the dynamic tag analysis is compatible with more complex page tag structures, and for large website structures and long-running analysis flows the accuracy and real-time performance of the tag analysis result at each moment are improved.
Drawings
FIG. 1 is a flow chart of a website dynamic tag parsing method based on a static page.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1: the invention provides a website dynamic tag analysis method based on a static page which, as shown in FIG. 1, comprises the following steps:
S1, acquiring a website hierarchical structure and performing initial analysis processing to obtain a multi-level static page;
S2, acquiring static page element code data by using the multi-level static page;
S3, performing dynamic analysis processing by using the static page element code data to obtain a website dynamic tag analysis result.
S1 specifically comprises:
S1-1, collecting the open source page corresponding to the website;
S1-2, obtaining the website hierarchical structure by using the open source page corresponding to the website;
S1-3, performing full collection according to the website hierarchical structure to obtain the multi-level static page of the website hierarchical structure;
wherein the multi-level static page includes js files and css style sheet files.
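As an illustration of S1-1 to S1-3, the following minimal sketch derives page levels and the referenced js and css files from pages that have already been collected. It assumes the pages are held in memory as a dictionary from URL to HTML; the names `LinkCollector`, `build_hierarchy` and the `example.test` URLs are hypothetical and are not taken from the patent.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects page links plus the js and css files a page references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.assets = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(urljoin(self.base_url, attrs["href"]))


def build_hierarchy(pages, root_url):
    """Breadth-first walk over collected pages.

    Returns ({url: level}, {url: [asset urls]}); level 0 is the root page.
    """
    levels, assets = {root_url: 0}, {}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        collector = LinkCollector(url)
        collector.feed(pages[url])
        assets[url] = collector.assets
        for link in collector.links:
            # Only follow links to pages that were actually collected.
            if link in pages and link not in levels:
                levels[link] = levels[url] + 1
                queue.append(link)
    return levels, assets


# Hypothetical three-level site used only for illustration.
PAGES = {
    "https://example.test/": '<a href="/news/">News</a><script src="/app.js"></script>',
    "https://example.test/news/": '<link rel="stylesheet" href="/site.css"><a href="/news/1.html">Item</a>',
    "https://example.test/news/1.html": "<p>Body</p>",
}
levels, assets = build_hierarchy(PAGES, "https://example.test/")
```

Here `levels` plays the role of the website hierarchical structure and `assets` lists the js and css files that would be localized together with each page.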
S2 specifically comprises:
S2-1, acquiring website page compatible data according to the multi-level static page;
S2-2, performing alignment display processing on the multi-level static page by using the website page compatible data to obtain the static page element code data.
S2-1 specifically comprises:
S2-1-1, judging whether the open source page corresponding to the multi-level static page contains table data; if so, extracting the table data of the open source page, arranging it in its original order to obtain secondary processing table data, and executing S2-1-2; otherwise, directly executing S2-1-2;
S2-1-2, judging whether the open source page corresponding to the multi-level static page contains file data; if so, acquiring the download address corresponding to the file data of the open source page and executing S2-1-3; otherwise, directly executing S2-1-3;
S2-1-3, judging whether the open source page corresponding to the multi-level static page contains picture data; if so, executing S2-1-4; otherwise, using the secondary processing table data and the download address corresponding to the file data as the website page compatible data;
S2-1-4, judging whether the picture data and the table data are associated; if so, inserting the secondary processing table data into the picture data to obtain tertiary processing table data as the website page compatible data; otherwise, executing S2-1-5;
S2-1-5, judging whether the picture data and the file data are associated; if so, acquiring a secondary address corresponding to the picture data according to the download address corresponding to the file data, and using the secondary processing table data, the download address corresponding to the file data and the secondary address corresponding to the picture data as the website page compatible data; otherwise, using the secondary processing table data, the download address corresponding to the file data and the download address corresponding to the picture data as the website page compatible data.
In this embodiment, the picture data and the file data are considered associated when the picture content and the document content are related to each other.
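The branching in S2-1-1 to S2-1-5 can be sketched as a single function. This is only one possible reading: the `PageData` container, its field names, the dictionary layout of the result, and the way the secondary address is formed (the picture file name appended under the file download address) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageData:
    """Hypothetical summary of what was found on one open source page."""
    table_rows: list = field(default_factory=list)  # table cells in page order
    file_url: Optional[str] = None                   # download address of file data
    picture_url: Optional[str] = None                # download address of picture data
    picture_in_table: bool = False                   # picture associated with the table
    picture_in_file: bool = False                    # picture associated with the file


def compatible_data(page: PageData) -> dict:
    # S2-1-1: keep table data in its original order (secondary processing).
    table = list(page.table_rows) if page.table_rows else None
    # S2-1-2: record the file download address, if any.
    file_url = page.file_url
    # S2-1-3: without picture data, table plus file address is the result.
    if page.picture_url is None:
        return {"table": table, "file_url": file_url}
    # S2-1-4: picture associated with the table -> tertiary processing table data.
    if page.picture_in_table:
        return {"table": {"picture": page.picture_url, "rows": table}}
    # S2-1-5: picture associated with the file -> secondary address under the
    # file download address (assumed form: file address + picture file name).
    if page.picture_in_file and file_url:
        name = page.picture_url.rsplit("/", 1)[-1]
        secondary = file_url.rstrip("/") + "/" + name
        return {"table": table, "file_url": file_url, "picture_url": secondary}
    return {"table": table, "file_url": file_url, "picture_url": page.picture_url}
```

For example, a page with a file at `https://example.test/files/report` and an associated picture `https://example.test/img/fig1.png` would yield the secondary address `https://example.test/files/report/fig1.png` under these assumptions.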
In this embodiment, combining the above data and processing steps, the alignment display processing is implemented as follows:
1. Collection mode: the original open source page is collected in full, including its js files and css style sheet files, and these files are matched to the reference files of the original page in a specific manner, so that the static files are localized;
2. Table compatibility: so that the displayed form stays close to that of the original text, standard table tags are parsed and embedded together with the text contained in each cell, and the cells are assembled in their original order to restore the content and layout of the original table;
3. File compatibility: when the collected information source contains a file, the source is parsed through a specific tag to determine whether it contains a download address, and the download address is acquired. If the file is hosted overseas, the download is forwarded through a proxy on an overseas server and the file is pushed back to the local side, and a new file path is embedded in place of the original download path so that the file can be downloaded. After the file is returned, the analysis service supports direct preview of files of specific types through browser rendering, and files that cannot be previewed can still be downloaded;
4. Picture compatibility: when the collected information source contains a picture file, the picture type and the picture download path are parsed through the picture tag, and collecting and downloading multiple pictures is supported. If an overseas picture address is encountered, the download is forwarded through a proxy on an overseas server and the file is pushed back to the local side; a new file path is embedded into the original picture tag path, and a specific marker is inserted at the original display position of the picture.
In this embodiment, the secondary address is a sub-address under the download address corresponding to the file data and is used for downloading the corresponding picture data.
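The table compatibility step (item 2 above) can be illustrated with Python's standard `html.parser`. This is a minimal sketch under stated assumptions: the helper names `TableExtractor`, `extract_table` and `rebuild_table` are hypothetical, nested tables are not handled, and all cells are re-emitted as plain `<td>` elements.

```python
from html import escape
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text row by row, preserving document order."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_table(html_text):
    """Return the cell text of the table as a list of rows, in original order."""
    parser = TableExtractor()
    parser.feed(html_text)
    # Collapse runs of whitespace inside each cell.
    return [[" ".join(cell.split()) for cell in row] for row in parser.rows]


def rebuild_table(rows):
    """Re-emit the cells inside plain standard table tags."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


source = ('<table class="x"><tr><th>Name</th><th>Size</th></tr>'
          '<tr><td> a.pdf </td><td><b>2</b> MB</td></tr></table>')
rows = extract_table(source)
restored = rebuild_table(rows)
```

Site-specific attributes and inline markup are dropped, while the cell text and the row and column order of the original table are kept.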
S3 specifically comprises:
S3-1, acquiring a website dynamic tag according to the static page element code data;
S3-2, performing loop analysis verification by using the website dynamic tag to obtain a website dynamic tag iteration result;
S3-3, performing backtracking verification processing according to the website dynamic tag iteration result to obtain a backtracking verification result of the website dynamic tag iteration result;
S3-4, obtaining the website dynamic tag analysis result according to the backtracking verification result.
S3-1 specifically comprises:
S3-1-1, acquiring the corresponding element nodes from the static page element code data as HTML tags;
S3-1-2, judging whether the number of HTML tags is 1; if so, executing S3-1-3; otherwise, sequentially acquiring the URL features of the HTML tags corresponding to the static page element code data and directly executing S3-1-4;
S3-1-3, judging whether the link address of the HTML tag is consistent with the link address of the open source page corresponding to the website; if so, using the HTML tag as the website dynamic tag; otherwise, returning to S3-1-1;
S3-1-4, judging whether the address sequence corresponding to the URL features is completely consistent with the address sequence corresponding to the website hierarchical structure; if so, using the URL features as the website dynamic tag; otherwise, returning to S1-1.
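The selection logic of S3-1-2 to S3-1-4 can be sketched as follows. The sketch assumes each HTML tag is represented as a `(tag name, link address)` pair and that the URL feature of a tag is simply its link address; the function name and the returned triple are hypothetical.

```python
def select_dynamic_tag(html_tags, page_url, hierarchy_urls):
    """Pick the website dynamic tag from the element nodes of S3-1-1.

    html_tags: list of (tag_name, link_address) pairs, in document order.
    Returns (kind, value, next_step); next_step is None when a tag was chosen,
    otherwise the label of the step the method returns to.
    """
    if len(html_tags) == 1:                        # S3-1-2
        _, link = html_tags[0]
        if link == page_url:                       # S3-1-3
            return ("html_tag", html_tags[0], None)
        return (None, None, "S3-1-1")
    # More than one tag: compare the ordered URL features with the hierarchy.
    url_features = [link for _, link in html_tags]
    if url_features == list(hierarchy_urls):       # S3-1-4
        return ("url_features", url_features, None)
    return (None, None, "S1-1")
```

A single tag that links to the collected page is used directly, while several tags are accepted only when their address sequence matches the address sequence of the website hierarchical structure exactly.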
S3-2 specifically comprises:
S3-2-1, judging whether the website dynamic tag is an HTML tag; if so, directly outputting the HTML tag as the website dynamic tag iteration result; otherwise, executing S3-2-2;
S3-2-2, judging whether the website dynamic tag and the website page compatible data correspond step by step; if so, establishing a primary data-address mapping between each URL feature of the website dynamic tag and each sub-data item of the corresponding website page compatible data, and executing S3-2-3; otherwise, performing loop verification processing;
S3-2-3, using the primary data-address mapping of the open source page corresponding to the current website as a data reference;
S3-2-4, judging whether the website hierarchical structure has changed; if so, executing S3-2-5; otherwise, directly executing S3-2-6;
S3-2-5, judging whether the current website hierarchical structure is a subset of the immediately preceding website hierarchical structure; if so, deleting the primary data-address mapping corresponding to the changed part of the website hierarchical structure and executing S3-2-6; otherwise, returning to S1-2;
S3-2-6, using the data reference as the website dynamic tag iteration result at the current moment;
wherein step-by-step correspondence means that each URL feature of the website dynamic tag and each sub-data item in the website page compatible data correspond to each other.
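The mapping and hierarchy check in S3-2-2 to S3-2-6 can be sketched as below. Two assumptions are made for illustration: step-by-step correspondence is modeled as two equally long lists paired by position, and "deleting the mapping of the changed hierarchy" is modeled as dropping entries whose URL no longer appears in the current hierarchy. The function names are hypothetical.

```python
def build_mapping(url_features, sub_data):
    """S3-2-2: pair each URL feature with its sub-data item.

    Returns None when the two lists do not correspond, in which case the
    method would continue with loop verification processing.
    """
    if len(url_features) != len(sub_data):
        return None
    return dict(zip(url_features, sub_data))


def iterate(mapping, previous_hierarchy, current_hierarchy):
    """S3-2-3 to S3-2-6. Returns (iteration_result, next_step)."""
    reference = dict(mapping)                      # S3-2-3: data reference
    previous, current = set(previous_hierarchy), set(current_hierarchy)
    if current != previous:                        # S3-2-4: hierarchy changed
        if not current <= previous:                # S3-2-5: not a subset
            return None, "S1-2"
        for url in previous - current:             # drop mappings of removed parts
            reference.pop(url, None)
    return reference, None                         # S3-2-6
```

When the hierarchy only shrinks, the stale entries are pruned and the remaining mapping is reused; when new levels appear, the sketch signals a return to S1-2 so that the hierarchy is acquired again.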
In this embodiment, the HTML tag is defined as follows: every component of an HTML document is a node, each HTML tag is an element node, and each comment is a comment node.
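The distinction between element nodes and comment nodes can be seen with Python's standard `html.parser`. The class name `NodeLister` and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Separates element nodes (HTML tags) from comment nodes."""

    def __init__(self):
        super().__init__()
        self.element_nodes = []
        self.comment_nodes = []

    def handle_starttag(self, tag, attrs):
        # Every start tag opens one element node.
        self.element_nodes.append(tag)

    def handle_comment(self, data):
        # Comments are nodes too, but they are not HTML tags.
        self.comment_nodes.append(data.strip())


lister = NodeLister()
lister.feed('<div><!-- nav --><a href="/news/">News</a></div>')
```

Only the entries in `element_nodes` would be candidates for HTML tags in S3-1-1; the comment is kept separate.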
S3-2-2 specifically comprises:
S3-2-2-1, acquiring the non-corresponding website dynamic tag according to the website page compatible data;
S3-2-2-2, acquiring the website page compatible data corresponding to the non-corresponding website dynamic tag;
S3-2-2-3, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag corresponds to the open source page corresponding to the website; if so, executing S3-2-2-4; otherwise, returning to S2-1;
S3-2-2-4, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag has a combined address; if so, executing S3-2-2-5; otherwise, directly executing S3-2-2-6;
S3-2-2-5, judging whether the combined address is consistent with the download address corresponding to the file data of the open source page corresponding to the website; if so, establishing a supplementary data-address mapping from the download address corresponding to the file data and the corresponding URL feature, adding it to the primary data-address mapping, and returning to S3-2-2; otherwise, returning to S2-1-5;
S3-2-2-6, judging whether the part of the website hierarchical structure corresponding to the non-corresponding website dynamic tag is a subset of the website hierarchical structure of the open source page corresponding to the website; if so, returning to S3-2-1; otherwise, returning to S1-2;
wherein the combined address is the secondary address of the picture data in the website page compatible data.
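Because the loop verification mostly decides which earlier step to return to, it can be sketched as a function that returns the label of the next step. The parameter names are hypothetical, and "consistent" in S3-2-2-5 is interpreted here, as an assumption, to mean that the combined address lies under the file download address.

```python
def loop_verification_step(data_matches_page, combined_address, file_address,
                           url_feature, tag_levels, site_levels, mapping):
    """Return the label of the step to run next; may extend `mapping` in place.

    data_matches_page: result of the check in S3-2-2-3.
    combined_address:  secondary address of the picture data, or None.
    file_address:      download address of the file data, or None.
    tag_levels / site_levels: hierarchy parts compared in S3-2-2-6.
    """
    if not data_matches_page:                      # S3-2-2-3
        return "S2-1"
    if combined_address is not None:               # S3-2-2-4
        # S3-2-2-5 (assumed reading): the combined address is under the file address.
        if file_address and combined_address.startswith(file_address):
            mapping[url_feature] = file_address    # supplementary data-address mapping
            return "S3-2-2"
        return "S2-1-5"
    if set(tag_levels) <= set(site_levels):        # S3-2-2-6
        return "S3-2-1"
    return "S1-2"
```

A caller would run the step named by the return value and invoke this function again until the tags and the compatible data correspond.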
S3-3 specifically comprises:
S3-3-1, judging whether step return processing exists at the current moment; if so, acquiring the preset data of the step return processing and executing S3-3-2; otherwise, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is output directly;
S3-3-2, judging whether the preset data corresponds to the backtracking verification node set; if so, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is output; otherwise, returning to the screening step corresponding to the preset data;
wherein the step return processing is the pre-screening return step at the moment corresponding to S3-3-1, and the backtracking verification node set comprises, in order, the open source pages corresponding to the website.
S3-4 specifically comprises:
S3-4-1, when the backtracking verification result is normal, directly outputting the website dynamic tag iteration result at the current moment as the website dynamic tag analysis result;
S3-4-2, when the backtracking verification result corresponds to existing step return processing, taking the moment corresponding to the pre-screening return step as the starting moment, and acquiring the website dynamic tag iteration results from the starting moment to the current moment together with the corresponding website page compatible data.
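S3-3 and S3-4 can be read together as choosing between outputting the current result and replaying the results since a step return. The sketch below is one interpretation only: the history format, the `(return_time, preset_page)` record and the membership test against the verification node set are assumptions, not details given in the text.

```python
def backtrack_verify(history, step_return, verification_nodes):
    """Combine S3-3 and S3-4 for the latest iteration result.

    history: list of (time, iteration_result, compatible_data), oldest first.
    step_return: None, or (return_time, preset_page) recorded when an earlier
        step jumped back (hypothetical representation of the preset data).
    verification_nodes: ordered open source pages of the website.
    """
    now, result, _ = history[-1]
    if step_return is None:                        # S3-3-1 and S3-4-1
        return {"status": "normal", "results": [result]}
    start, preset_page = step_return
    if preset_page in verification_nodes:          # S3-3-2
        return {"status": "normal", "results": [result]}
    # S3-4-2: collect results and compatible data from the return moment to now.
    window = [(r, d) for t, r, d in history if start <= t <= now]
    return {"status": "step_return", "results": window}
```

With no step return, only the current iteration result is output; otherwise the window from the return moment onward is returned so that every moment since the return can be inspected.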
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (3)

1. A website dynamic tag analysis method based on a static page, characterized by comprising the following steps:
S1, acquiring a website hierarchical structure and performing initial analysis processing to obtain a multi-level static page;
S1-1, collecting the open source page corresponding to the website;
S1-2, obtaining the website hierarchical structure from the open source page corresponding to the website;
S1-3, performing overall acquisition processing according to the hierarchical structure to obtain the multi-level static page of the website hierarchical structure;
wherein the multi-level static page comprises js files and css style sheet files;
S2, acquiring static page element code data by using the multi-level static page;
S2-1, acquiring website page compatible data according to the multi-level static page;
S2-1-1, judging whether the open source page corresponding to the multi-level static page has form data; if so, extracting the form data of the open source page, arranging it in its original order to obtain secondary processing form data, and executing S2-1-2; otherwise, directly executing S2-1-2;
S2-1-2, judging whether the open source page corresponding to the multi-level static page has file data; if so, acquiring the download address of the file data on the open source page and executing S2-1-3; otherwise, directly executing S2-1-3;
S2-1-3, judging whether the open source page corresponding to the multi-level static page has picture data; if so, executing S2-1-4; otherwise, using the secondary processing form data and the download address corresponding to the file data as the website page compatible data;
S2-1-4, judging whether the picture data and the form data are associated; if so, inserting the secondary processing form data into the picture data to obtain tertiary processing form data as the website page compatible data; otherwise, executing S2-1-5;
S2-1-5, judging whether the picture data and the file data are associated; if so, acquiring a secondary address corresponding to the picture data according to the download address corresponding to the file data, and using the secondary processing form data, the download address corresponding to the file data and the secondary address corresponding to the picture data as the website page compatible data; otherwise, using the secondary processing form data, the download address corresponding to the file data and the download address corresponding to the picture data as the website page compatible data;
S2-2, performing alignment display processing on the multi-level static page by using the website page compatible data to obtain the static page element code data;
S3, performing dynamic analysis processing by using the static page element code data to obtain a website dynamic tag analysis result;
S3-1, acquiring a website dynamic tag according to the static page element code data;
S3-1-1, acquiring the corresponding element nodes as HTML tags by using the static page element code data;
S3-1-2, judging whether the number of HTML tags is 1; if so, executing S3-1-3; otherwise, sequentially acquiring the URL features of the HTML tags corresponding to the static page element code data and directly executing S3-1-4;
S3-1-3, judging whether the link address of the HTML tag is consistent with the link address of the open source page corresponding to the website; if so, using the HTML tag as the website dynamic tag; otherwise, returning to S3-1-1;
S3-1-4, judging whether the address sequence corresponding to the URL features is completely consistent with the address sequence corresponding to the website hierarchical structure; if so, using the URL features as the website dynamic tag; otherwise, returning to S1-1;
S3-2, performing loop analysis verification by using the website dynamic tag to obtain a website dynamic tag iteration result;
S3-2-1, judging whether the website dynamic tag is an HTML tag; if so, directly outputting the HTML tag as the website dynamic tag iteration result; otherwise, executing S3-2-2;
S3-2-2, judging whether the website dynamic tag and the website page compatible data correspond step by step; if so, establishing a primary data-address mapping by using each URL feature of the website dynamic tag and each piece of sub-data of the corresponding website page compatible data, and executing S3-2-3; otherwise, performing loop verification processing;
S3-2-3, using the primary data-address mapping of the open source page corresponding to the current website as a data reference;
S3-2-4, judging whether the website hierarchical structure has changed; if so, executing S3-2-5; otherwise, directly executing S3-2-6;
S3-2-5, judging whether the current website hierarchical structure is a subset of the immediately preceding website hierarchical structure; if so, deleting the primary data-address mapping corresponding to the changed website hierarchical structure and executing S3-2-6; otherwise, returning to S1-2;
S3-2-6, using the data reference as the website dynamic tag iteration result at the current moment;
wherein each URL feature of the website dynamic tag and each piece of sub-data in the website page compatible data correspond to each other step by step;
S3-3, performing backtracking verification processing according to the website dynamic tag iteration result to obtain a backtracking verification result of the website dynamic tag iteration result;
S3-3-1, judging whether step return processing exists at the current moment; if so, acquiring preset data of the step return processing and executing S3-3-2; otherwise, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is directly output;
S3-3-2, judging whether the preset data corresponds to the backtracking verification node set; if so, the backtracking verification result of the website dynamic tag iteration result is normal, and the website dynamic tag iteration result corresponding to the current moment is output; otherwise, returning to the screening step corresponding to the preset data;
wherein the step return processing is the preset screening return step at the moment corresponding to S3-3-1, and the backtracking verification node set comprises, in order, the open source pages corresponding to the website;
S3-4, obtaining the website dynamic tag analysis result according to the backtracking verification result.
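The branching in steps S2-1-1 through S2-1-5 of claim 1 can be sketched as follows. This is only an illustrative reading, not the patented implementation: the `OpenSourcePage` container, its field names, and the dictionary keys are hypothetical, and the claim does not say how the secondary address is derived, so resolving the picture path against the file's download address with `urljoin` is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class OpenSourcePage:
    """Hypothetical summary of what one open source page contains."""
    form_rows: Optional[List[str]] = None   # form data in page order, None if absent
    file_url: Optional[str] = None          # download address of the file data
    picture_url: Optional[str] = None       # download address of the picture data
    picture_linked_to_form: bool = False    # outcome of the S2-1-4 association check
    picture_linked_to_file: bool = False    # outcome of the S2-1-5 association check


def build_compatible_data(page: OpenSourcePage) -> dict:
    """Return website page compatible data following the S2-1 decision cascade."""
    # S2-1-1: keep the form data in its original order (secondary processing form data).
    form = list(page.form_rows) if page.form_rows is not None else None
    # S2-1-2: record the download address of the file data, if any.
    file_url = page.file_url
    # S2-1-3: without picture data, form data plus file address is the result.
    if page.picture_url is None:
        return {"form": form, "file_url": file_url}
    # S2-1-4: picture associated with the form -> insert the form data into the picture data.
    if page.picture_linked_to_form:
        return {"picture": {"url": page.picture_url, "form": form}}
    # S2-1-5: picture associated with the file -> derive a secondary address.
    if page.picture_linked_to_file and file_url is not None:
        # Assumption: the secondary address is the picture path resolved
        # against the download address of the file data.
        secondary = urljoin(file_url, page.picture_url)
        return {"form": form, "file_url": file_url, "picture_secondary_url": secondary}
    # Otherwise the picture keeps its own download address.
    return {"form": form, "file_url": file_url, "picture_url": page.picture_url}
```

Under these assumptions, a page whose picture `chart.png` is associated with a file at `https://example.com/files/report.pdf` would receive the secondary address `https://example.com/files/chart.png`.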
2. The website dynamic tag analysis method based on a static page according to claim 1, wherein performing the loop verification processing comprises:
S3-2-2-1, acquiring the non-corresponding website dynamic tag according to the website page compatible data;
S3-2-2-2, acquiring the website page compatible data corresponding to the non-corresponding website dynamic tag;
S3-2-2-3, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag corresponds to the open source page corresponding to the website; if so, executing S3-2-2-4; otherwise, returning to S2-1;
S3-2-2-4, judging whether the website page compatible data corresponding to the non-corresponding website dynamic tag has a merged address; if so, executing S3-2-2-5; otherwise, directly executing S3-2-2-6;
S3-2-2-5, judging whether the merged address is consistent with the download address corresponding to the file data of the open source page corresponding to the website; if so, establishing a supplementary data-address mapping by using the download address corresponding to the file data and the corresponding URL feature, adding it to the primary data-address mapping, and returning to S3-2-2; otherwise, returning to S2-1-5;
S3-2-2-6, judging whether the part of the website hierarchical structure corresponding to the non-corresponding website dynamic tag is a subset of the website hierarchical structure of the open source page corresponding to the website; if so, returning to S3-2-1; otherwise, returning to S1-2;
wherein the merged address is the secondary address of the picture data in the website page compatible data.
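The control flow of steps S3-2-2-3 through S3-2-2-6 in claim 2 amounts to choosing which earlier step to resume from. A minimal sketch is given below, assuming the individual checks have already been evaluated by the caller. The function name, its parameters, and the choice to key the mapping by URL feature are hypothetical; the claim does not fix a data layout.

```python
from typing import Dict, Optional, Set


def loop_verification_next_step(
    on_open_source_page: bool,
    merged_address: Optional[str],
    file_download_address: Optional[str],
    url_feature: str,
    tag_hierarchy_part: Set[str],
    page_hierarchy: Set[str],
    primary_mapping: Dict[str, str],
) -> str:
    """Return the label of the step to resume from for a non-corresponding tag."""
    # S3-2-2-3: the compatible data must belong to the website's open source page.
    if not on_open_source_page:
        return "S2-1"
    # S3-2-2-4: is there a merged address (secondary address of picture data)?
    if merged_address is not None:
        # S3-2-2-5: it must match the download address of the file data.
        if merged_address == file_download_address:
            # Supplementary data-address mapping, added to the primary mapping.
            # Assumption: the mapping is keyed by URL feature.
            primary_mapping[url_feature] = file_download_address
            return "S3-2-2"
        return "S2-1-5"
    # S3-2-2-6: subset test on the hierarchy part covered by the tag.
    if tag_hierarchy_part <= page_hierarchy:
        return "S3-2-1"
    return "S1-2"
```

Returning a step label keeps the sketch free of any assumption about how the surrounding steps are implemented; a caller could dispatch on the label in a loop until S3-2-2 succeeds.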
3. The website dynamic tag analysis method based on a static page according to claim 1, wherein obtaining the website dynamic tag analysis result according to the backtracking verification result comprises:
when the backtracking verification result is normal, directly outputting the website dynamic tag iteration result at the current moment as the website dynamic tag analysis result;
when the backtracking verification result corresponds to existing step return processing, taking the moment corresponding to the preset screening return step as the starting moment, and acquiring the website dynamic tag iteration results from the starting moment to the current moment together with the corresponding website page compatible data.
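The two output cases of claim 3 can be illustrated with a short sketch. It assumes, hypothetically, that iteration results are kept as a time-ordered history of `Snapshot` records and that the absence of step return processing is signalled by `return_start_time=None`; neither representation comes from the patent text.

```python
from typing import List, NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical record of one moment in the analysis."""
    time: int                  # moment at which the result was produced
    iteration_result: dict     # website dynamic tag iteration result
    compatible_data: dict      # website page compatible data at that moment


def analysis_result(history: List[Snapshot], return_start_time: Optional[int]):
    """Select the output described in claim 3 from a time-ordered history."""
    current = history[-1]
    # Normal backtracking verification result: output the current iteration result.
    if return_start_time is None:
        return current.iteration_result
    # Step return processing exists: collect every iteration result, with its
    # compatible data, from the starting moment up to the current moment.
    return [
        (s.iteration_result, s.compatible_data)
        for s in history
        if return_start_time <= s.time <= current.time
    ]
```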
CN202311748269.3A 2023-12-19 2023-12-19 Website dynamic tag analysis method based on static page Active CN117454881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311748269.3A CN117454881B (en) 2023-12-19 2023-12-19 Website dynamic tag analysis method based on static page

Publications (2)

Publication Number Publication Date
CN117454881A CN117454881A (en) 2024-01-26
CN117454881B CN117454881B (en) 2024-03-08

Family

ID=89585797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311748269.3A Active CN117454881B (en) 2023-12-19 2023-12-19 Website dynamic tag analysis method based on static page

Country Status (1)

Country Link
CN (1) CN117454881B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010170453A (en) * 2009-01-26 2010-08-05 Nippon Business Engineering:Kk Static web site construction method, static web site construction service providing method, dynamic/static conversion processor, and dynamic/static conversion processing program
CN103685189A (en) * 2012-09-17 2014-03-26 百度在线网络技术(北京)有限公司 Website security evaluation method and system
CN112818200A (en) * 2021-01-28 2021-05-18 平安普惠企业管理有限公司 Data crawling and event analyzing method and system based on static website
CN114817811A (en) * 2022-05-07 2022-07-29 盐城金堤科技有限公司 Website analysis method and device
CN117093260A (en) * 2023-10-16 2023-11-21 戎行技术有限公司 Fusion model website structure analysis method based on decision tree classification algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Directed Web Crawler Based on WebDriver; Shi Yongkun; Software (软件); 2016-09-15 (No. 09); pp. 94-97 *

Also Published As

Publication number Publication date
CN117454881A (en) 2024-01-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant