CN112470154A - Method and device for detecting safety of web page - Google Patents


Info

Publication number
CN112470154A
Authority
CN
China
Prior art keywords
web
page
dom tree
dom
tested
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201880095842.6A
Other languages
Chinese (zh)
Other versions
CN112470154B (en)
Inventor
黄增强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN112470154A
Application granted
Publication of CN112470154B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The method comprises the steps of eliminating a public part in each web response page, wherein the public part in each web response page is a part irrelevant to Structured Query Language (SQL) injection in the web page to be tested, calculating the similarity between the non-public part of the first web response page and the non-public part of the second web response page, and detecting SQL injection risks of the web page to be tested according to the calculated similarity, so that the influence of general content in the web response page on similarity calculation is reduced, and the accuracy of calculating the similarity is improved.

Description

Method and device for detecting safety of web page
Technical Field
The present application relates to the field of network security, and more particularly, to a method and apparatus for detecting security of a web page.
Background
Detection of Structured Query Language (SQL) injection vulnerabilities is one of the scanning capabilities that an automated vulnerability scanning tool must possess. In the detection process, the automation tool needs to send a plurality of requests to the server interface, and judges whether the SQL injection vulnerability exists according to the response condition of the server. The current SQL vulnerability detection scheme commonly used in the industry is to perform similarity judgment based on a response page of an SQL statement with an access parameter of logical true or logical false, and has high computational complexity and low accuracy.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for detecting security of a web page, so as to improve accuracy of calculating similarity.
In a first aspect, a method for detecting security of a web page is provided, including: sending a first test request and a second test request to a server of a web page to be tested, wherein the first test request comprises a legal test request, and the second test request comprises an illegal test request; receiving a first web response page corresponding to the first test request and a second web response page corresponding to the second test request; removing a public part in each web response page, wherein the public part in each web response page is a part irrelevant to SQL injection in the web page to be tested; and calculating the similarity of the non-public part of the first web response page and the non-public part of the second web response page, and detecting the SQL injection risk of the webpage to be tested according to the calculated similarity.
When the similarity of the web response page is calculated, the interference of the public part in the web page on the similarity calculation can be eliminated, and the accuracy of the similarity calculation can be improved.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the rejecting a common part in each web response page includes:
respectively acquiring a Document Object Model (DOM) tree of the first web response page and a DOM tree of the second web response page; acquiring a DOM tree of the template of the web page to be tested, wherein the template of the web page to be tested indicates a public part of the web page to be tested; according to the DOM tree of the template of the web page to be tested, removing subtrees in the DOM tree of the first web response page, wherein the subtrees are the same as the DOM tree of the template; and according to the DOM tree of the template of the web page to be tested, removing the subtree which is the same as the DOM tree of the template in the DOM tree of the second web response page.
Therefore, the embodiment of the application uses the DOM tree of the template of the web page to be tested to remove the public content from the DOM tree of each web response page, which facilitates the similarity calculation.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the obtaining a DOM tree of a template of the web page to be tested specifically includes:
sending at least two access requests to a server of the web page to be tested, wherein the at least two access requests comprise different access parameters; receiving at least two access response pages; obtaining at least two access DOM trees according to the at least two access response pages; and acquiring a DOM tree of the template of the web page according to the at least two access DOM trees, wherein the DOM tree of the template of the web page comprises a common sub-tree of the at least two access DOM trees.
Therefore, the embodiment of the application sends a plurality of access requests to the server of the web page to be tested so as to obtain the DOM tree of the template of the web page.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the obtaining a DOM tree of a template of the web page specifically includes:
and reversely traversing the nodes of the at least two access DOM trees to obtain a common sub-tree of the at least two access DOM trees.
Therefore, the embodiment of the application can obtain the common subtrees of the multiple access DOM trees by traversal, so that the DOM tree of the template of the web page can be obtained accurately.
With reference to any one possible implementation manner of the first possible implementation manner, the second possible implementation manner, and the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the calculating a similarity between the non-common portion of the first web response page and the non-common portion of the second web response page specifically includes:
and calculating the similarity between the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed.
Therefore, the similarity between the DOM tree of the first web response page and the DOM tree of the second web response page after the subtree is removed can be directly calculated, so that the SQL injection risk of the webpage to be tested can be known conveniently.
With reference to any one possible implementation manner of the first possible implementation manner, the second possible implementation manner, and the third possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the calculating the similarity of the at least two eliminated test DOM trees specifically includes:
rendering the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed, and generating at least two test images; and calculating the image similarity of the at least two test images.
Therefore, the DOM tree of the first web response page and the DOM tree of the second web response page after the subtree is removed can be rendered respectively to obtain a plurality of test images, and an image similarity calculation method is used for calculation so as to know the SQL injection risk of the webpage to be tested. In addition, the rendered test image can be displayed, so that the webpage to be tested can be reflected more intuitively, and the SQL injection risk of the webpage to be tested can be judged more favorably.
In a second aspect, an apparatus for detecting security of a web page is provided, which includes a test response acquisition unit, a similarity calculation unit, and a common part acquisition unit. These units are adapted to perform the method of the first aspect and of the various implementations of the first aspect.
In a third aspect, a computing device is provided that includes at least one processor and a storage unit; the storage unit is used for storing instructions; the processor is coupled to the storage unit, and the instructions, when executed by the at least one processor, cause the processor to perform the method of the first aspect and any of the implementations of the first aspect.
In a fourth aspect, there is provided a computer program product comprising: program code which, when run by a processing unit, transceiver, or processor of a computing device, causes the computing device to perform the method of any of the first aspect and its possible embodiments described above.
In a fifth aspect, a computer-readable storage medium is provided, which stores a program that causes a detection apparatus for security of a web page to execute the method of the first aspect and any of its possible embodiments.
Drawings
Fig. 1 is a schematic diagram of an example of a system architecture to which an embodiment of the present application is applied.
FIG. 2 is a schematic view of a web site page.
FIG. 3 is a schematic flow chart diagram of a method for detecting security of a web page according to an embodiment of the present application.
Fig. 4 is a schematic diagram of an example of an embodiment of the present application.
FIG. 5 is a flow chart of reverse traversal according to an embodiment of the application.
FIG. 6 is a schematic diagram of an example according to an embodiment of the present application.
Fig. 7 is a diagram illustrating an example of extracting a common tree according to an embodiment of the present application.
Fig. 8 is a schematic block diagram of a detection apparatus for security of a web page according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of a detection apparatus for security of a web page according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The technical scheme of the embodiment of the application can be applied to a network security detection system or a network security scanning system, for example, an automatic vulnerability scanning tool, a Structured Query Language (SQL) injection scanning system, a cloud vulnerability scanning tool, a Web attack scanning system and other tools for detecting network security. Some terms or concepts referred to are briefly described below.
SQL injection inserts an SQL command into a Web form submission, or into the query string of a domain name or page request, and thereby tricks the server into executing a malicious SQL command. (For convenience of description, the following takes the World Wide Web, abbreviated Web, as an example of a network, without limitation.) Specifically, SQL injection uses an existing application to inject (malicious) SQL commands into a background database engine, and can obtain the database of a website that has a security vulnerability by entering (malicious) SQL statements in a Web form. SQL injection is one of the most common forms of Web attack.
The Document Object Model (DOM), which can access and modify the content and structure of a document in a platform- and language-independent manner, is a common way to represent and process a HyperText Markup Language (HTML) document or an Extensible Markup Language (XML) document. The DOM is designed so that it can be used from any programming language. DOM techniques allow the user interface to change dynamically, for example by dynamically displaying or hiding an element, changing its properties, or adding an element, which greatly enhances the interactivity of pages. The DOM is in effect a document model described in an object-oriented manner. The DOM defines the objects needed to represent and modify a document, the behavior and properties of those objects, and the relationships between those objects. The DOM may be considered a tree representation of the data and structure on the page.
The risk detection of SQL injection, also called vulnerability detection of SQL injection, can be realized by a similarity algorithm and a threshold value. Specifically, the similarity calculation method uses the following formula:
Figure PCTCN2018101148-APPB-000001
where y represents the similarity, b represents the number of differing characters, and a represents the total character length. The similarity therefore depends on the number of differing characters and on the total string length. Many characters in the response page of a web page are irrelevant to the SQL injection vulnerability. That is, a large part of the total string length in the denominator is general content, which is the same in every response of the web page. For a fixed number of differing characters in the numerator, the more general content the denominator contains, the less sensitive the similarity becomes, and the more vulnerabilities are missed in the detection result. Therefore, the embodiment of the present application proposes a method that reduces the interference terms of the similarity algorithm as far as possible, so as to improve the accuracy of the similarity calculation.
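The formula image itself is not reproduced in this text. Purely as an illustration, the sketch below assumes the form y = 1 - b/a, which is consistent with the description of b as the numerator and a as the denominator but is not confirmed by the source. It shows how shared general content inflates a and masks the same number of differing characters. The function name and the sample numbers are hypothetical.

```python
def similarity(diff_chars: int, total_chars: int) -> float:
    """Assumed form y = 1 - b / a (b: differing characters, a: total length)."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 1.0 - diff_chars / total_chars


# Hypothetical numbers: 200 characters differ between two responses.
b = 200
with_common = similarity(b, 10_000)  # page still contains the general content
without_common = similarity(b, 400)  # only the non-public part is compared

print(f"with common part:    {with_common:.2f}")
print(f"without common part: {without_common:.2f}")
```

Under this assumed form, the same 200 differing characters barely move the score while the general content is present, and move it substantially once that content is removed.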
FIG. 1 is a schematic diagram of an example of a system architecture 100 to which embodiments of the present application may be applied. As shown in fig. 1, the system architecture 100 includes a server 110 of a web page and a security detection device 120 of the web page. The security detection device 120 is used for detecting SQL injection risks of the web page.
Alternatively, the security detection apparatus 120 of the web page may include a test response acquisition unit 121, a similarity calculation unit 122, and a common part acquisition unit 123. The test response acquiring unit 121 is configured to acquire a web response page required by security detection; the similarity calculation unit 122 is configured to calculate a similarity according to the web response page acquired by the test response acquisition unit 121, so as to determine an SQL injection risk or an SQL injection vulnerability of the web page. The common part acquiring unit 123 is configured to acquire a common part in each web response page.
It should be understood that the embodiment of the present application is not limited to the specific deployment of the server 110 for the web page. Alternatively, the server 110 of the web page may be a separately deployed server, or may be a server deployed in a data center, for example, a virtual machine in a public cloud, a private cloud, or a hybrid cloud platform.
It should also be understood that the embodiment of the present application is not limited to the specific deployment of the security detection apparatus 120 for a web page. Alternatively, the security detection apparatus 120 for the web page may be deployed on a physical machine/server that is independent from the server for the web page, or may be deployed in a data center where the server for the web page is located, such as a management service or a virtual machine in a public cloud, a private cloud, or a hybrid cloud platform.
It should be further understood that fig. 1 only schematically shows a system architecture diagram of the embodiment of the present application, and does not limit the embodiment of the present application, and in fact, fig. 1 may further include other modules or units that interact or communicate with the server 110 of the web page and/or the security detection apparatus 120 of the web page.
The method for detecting the security of the web page according to the embodiment of the present application may be performed by the security detection apparatus 120 (which may be simply referred to as a security detection apparatus) of the web page in fig. 1. For convenience of description, the following description will be given taking the security detection apparatus as an example, and the description will be made in a unified way.
In the embodiment of the application, the safety detection device eliminates the public part in the web response page, namely the part irrelevant to SQL injection in the web page to be tested, and calculates the similarity according to the eliminated web response page.
For ease of understanding, common portions of the web response page are described herein in connection with the example of FIG. 2. Fig. 2 shows a schematic diagram of a page relating to an artist. As shown in fig. 2, in the presented web page content, if the user clicks to view a different artist, different content is presented with respect to the "specific content presentation by artist" in the box, but the same content is presented with respect to the navigation bar. In the content presented by the web page, the content corresponding to the navigation bar may be understood as the public content related to the web page, including the contents of "first page, category, artist …" and the like, and "specific content presentation by artist" may be considered as the non-public part of the web response page. That is, the template corresponding to the web page in fig. 2 can be understood as the content of the common part composition. It should be understood that fig. 2 is only schematic representation of content that may be included in a web page.
The method for detecting the security of the web page in the embodiment of the present application is described in detail below.
FIG. 3 is a schematic flow chart diagram of a method 300 of detecting security of a web page in accordance with an embodiment of the present application. Alternatively, the method 300 may be performed by a security detection device, such as the security detection device 120 of fig. 1 described above. The method 300 includes:
s310, sending a first test request and a second test request to a server of the web page to be tested, wherein the first test request comprises a legal test request, and the second test request comprises an illegal test request.
Optionally, the second test request includes a test request with the SQL statement logic in the access parameter being true or a test request with the SQL statement logic in the access parameter being false. It should also be understood that, in addition to the first test request and the second test request, a test request with true SQL statement logic in at least one access parameter and a test request with false SQL statement logic in at least one access parameter may be sent to the server of the web page to be tested at the same time, so as to increase the number of received web response pages and improve the test accuracy. Here, the first test request and the second test request may be for the same access parameter of the web page to be tested. For example, if the access parameter is "id=1", the access parameter of the first test request is "id=1", and the access parameter of the second test request is "id=1 and 1=1" or "id=1 and 1=2". The former (i.e., "id=1 and 1=1") is a test request whose SQL statement logic in the access parameter is true, and the latter (i.e., "id=1 and 1=2") is a test request whose SQL statement logic in the access parameter is false.
Wherein, the server of the web page to be tested may be the server 110 of the web page in fig. 1.
It should be understood that the above test requests (including true requests and false requests) are only exemplary descriptions and do not limit the embodiments of the present application; other requests may also be used in specific implementations. For example, a request with true SQL statement logic in the access parameter may also use other logical conditions, such as "id=2 and id=1+1 or id=(3-1)×1", which can obtain the same response content as "id=1 and id=1".
Specifically, for example, the security detection device sends to the server a request with true SQL statement logic in the access parameter (http://testphp.vulnweb.com/artists.php?artist=2%20and%201=1) and a request with false SQL statement logic in the access parameter (http://testphp.vulnweb.com/artists.php?artist=2%20and%201=2).
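The construction of such test requests can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the helper name `build_test_urls`, the host `example.com`, and the path are hypothetical, and only the "and 1=1" / "and 1=2" suffixes come from the description above.

```python
from urllib.parse import quote


def build_test_urls(base_url: str, param: str, value: str) -> dict:
    """Return the legal request plus a logic-true and a logic-false variant."""

    def make(payload: str) -> str:
        # safe="=" leaves "=" unescaped; spaces are percent-encoded as %20
        return f"{base_url}?{param}={quote(payload, safe='=')}"

    return {
        "legal": make(value),
        "true": make(f"{value} and 1=1"),
        "false": make(f"{value} and 1=2"),
    }


# Hypothetical target; replace with the page under test.
urls = build_test_urls("http://example.com/news.php", "id", "1")
for kind, url in urls.items():
    print(kind, url)
```

Only requests against systems one is authorized to test should be sent.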
S320, receiving a first web response page corresponding to the first test request and a second web response page corresponding to the second test request.
S330, removing a public part in each web response page, wherein the public part in each web response page is a part irrelevant to SQL injection in the web page to be tested.
Specifically, after obtaining the first web response page and the second web response page, the security detection apparatus needs to remove the public portion in the web response page.
It should be noted that the common part in each web response page can be understood as the general content of the web page to be tested. For example, multiple access requests are sent for the same web page to be tested, and the content appearing in the response page corresponding to each access request is the common part of the web page to be tested. As can be appreciated in conjunction with FIG. 2 above, the common portion of the web response page may be understood to be the content of the common portion of the web page content, such as the navigation bar of FIG. 2.
Optionally, S330 includes:
respectively acquiring a Document Object Model (DOM) tree of the first web response page and a DOM tree of the second web response page;
acquiring a DOM tree of the template of the web page to be tested, wherein the template of the web page to be tested indicates a public part of the web page to be tested;
according to the DOM tree of the template of the web page to be tested, removing a sub-tree which is the same as the DOM tree of the template in the DOM tree of the first web response page;
and removing the subtree in the DOM tree of the second web response page according to the DOM tree of the template of the web page to be tested.
Specifically, the security detection device parses the content of the first web response page to obtain a DOM tree of the first web response page. The security detection device then prunes the nodes in the DOM tree of the first web response page by using the DOM tree of the template of the web page to be tested, so that the pruned DOM tree of the first web response page does not contain the nodes corresponding to the public part of the web page to be tested. Similarly, the security detection device parses the content of the second web response page to obtain a DOM tree of the second web response page and prunes the nodes in that DOM tree by using the template of the web page to be tested, so that the pruned DOM tree of the second web response page does not contain the nodes corresponding to the public part of the web page to be tested.
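The pruning step can be sketched as below, under several assumptions that are not in the source: the pages are well-formed markup that `xml.etree.ElementTree` can parse (real HTML would need an HTML parser), the template is represented as a hypothetical `<template>` wrapper whose children are the common subtrees, and two subtrees count as "the same" when their tag, attributes, text, and children match.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Structural fingerprint of a subtree (tail text is ignored in this sketch)."""
    return (
        node.tag,
        tuple(sorted(node.attrib.items())),
        (node.text or "").strip(),
        tuple(signature(child) for child in node),
    )


def prune_common(page_root, template_root):
    """Remove from page_root every subtree identical to a template subtree."""
    common = {signature(sub) for sub in template_root}

    def walk(parent):
        for child in list(parent):  # iterate over a copy while removing
            if signature(child) in common:
                parent.remove(child)
            else:
                walk(child)

    walk(page_root)
    return page_root


template = ET.fromstring(
    "<template><nav><a>Home</a><a>Artists</a></nav></template>")
page = ET.fromstring(
    "<html><body><nav><a>Home</a><a>Artists</a></nav>"
    "<div id='content'>Artist details</div></body></html>")

pruned = prune_common(page, template)
print(ET.tostring(pruned, encoding="unicode"))
```

After pruning, only the content node remains under `body`; the navigation subtree that matches the template is gone.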
For ease of understanding, the web response page after the common portion is culled is described herein in connection with the example in FIG. 4. As shown in fig. 4, in contrast to fig. 2, the content of the common portion in the web response page in fig. 4 has been culled, leaving "specific content presentation by artist". It should be understood that the content displayed in the "artist's specific content presentation" box may be detailed content about the artist, not specifically expanded in fig. 4. Here, fig. 4 may be understood as a normal response page of the web page after the common part is removed. In addition, for the web page after the common part is removed, it should be understood by those skilled in the art that if a test request with a true condition is sent to the server of the web page, a response page same as that of fig. 4 can be obtained theoretically; if a test request with a false condition is sent to the server of the web page, the theoretically obtained response page differs from fig. 4 in that the content in the "artist's specific content presentation" box is empty.
In this way, each web response page obtained by the security detection device does not include a part irrelevant to SQL injection in the web page to be tested.
Optionally, the security detection apparatus may obtain a template of the web page to be tested. The template of the web page to be tested can be understood as a service model formed by the general contents of the web page to be tested.
Optionally, the security detection apparatus may acquire a DOM tree of a template of the web page to be tested. The specific way in which the security detection apparatus of the embodiment of the present application obtains the DOM tree of the template of the web page to be tested is described in detail below. Optionally, the obtaining of the template of the web page may specifically include:
the security detection device sends at least two access requests to a server of the web page to be tested, wherein the at least two access requests comprise different access parameters; receiving at least two access response pages; obtaining at least two access DOM trees according to the at least two access response pages; and acquiring the DOM tree of the template of the web page to be tested according to the at least two access DOM trees, wherein the DOM tree of the template of the web page to be tested comprises the common sub-tree of the at least two access DOM trees.
Specifically, the security detection apparatus sends a plurality of access requests to a server of the web page to be tested, where the plurality of access requests include different access parameters, so as to obtain different response contents for the web page to be tested. After obtaining the response pages of the access requests, the security detection device parses them to obtain a plurality of DOM trees. The security detection device then obtains the common subtrees of the DOM trees by traversing and comparing the nodes of the DOM trees, and obtains the DOM tree of the template of the web page to be tested from these common subtrees.
Optionally, the security detection apparatus obtains the common sub-tree of the at least two access DOM trees by reversely traversing the nodes of the at least two access DOM trees, and the specific method is described in detail later.
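Extracting the common subtrees of two access DOM trees can be sketched as follows. This is not the reverse traversal referred to above; it swaps in a simpler signature intersection as an illustration, and it assumes well-formed markup and exact equality of tag, attributes, text, and children. The function names and sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Structural fingerprint of a subtree: tag, attributes, text, children."""
    return (
        node.tag,
        tuple(sorted(node.attrib.items())),
        (node.text or "").strip(),
        tuple(signature(child) for child in node),
    )


def common_subtrees(root_a, root_b):
    """Return the maximal subtrees of root_a that also occur in root_b."""
    seen_in_b = {signature(node) for node in root_b.iter()}
    result = []

    def walk(node):
        if signature(node) in seen_in_b:
            result.append(node)  # maximal match: do not descend further
        else:
            for child in node:
                walk(child)

    walk(root_a)
    return result


page_2 = ET.fromstring(
    "<html><nav><a>Home</a></nav><div>News item 2</div></html>")
page_3 = ET.fromstring(
    "<html><nav><a>Home</a></nav><div>News item 3</div></html>")

template_parts = common_subtrees(page_2, page_3)
print([part.tag for part in template_parts])
```

Recomputing signatures at every node is quadratic in the worst case; caching them per node would be the natural refinement for large pages.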
Specifically, the web page to be tested provides an access interface, and web response pages of the web page to be tested under different access parameters are obtained through access requests with different access parameters. The at least two access requests may be understood as different URLs directed to the same web service interface. For example, the web page to be tested includes a news website interface that presents content through http://a.com/news.php?id=1. The security detection device crawls the web page with the service parameter news.php?id=2 and the web page with the service parameter news.php?id=3 of the same news website interface, and different web page contents of the news website interface can be presented. Between the web content with the service parameter news.php?id=2 and the web content with the service parameter news.php?id=3, everything other than the news content itself is similar, for example public information such as the public navigation bar of the web page. These similar parts may be understood as public content, which constitutes the template of the news website interface. Accordingly, the DOM tree corresponding to the public content is the DOM tree of the template of the news website interface.
It should be understood that the security detection device may also locally store a template of the web page to be tested, and can be directly used without limitation.
S340, calculating the similarity of the non-public part of the first web response page and the non-public part of the second web response page, and detecting the SQL injection risk of the webpage to be tested according to the calculated similarity.
It should be noted that the non-public part of the first web response page may be understood as follows: a first test request is sent to the server of the web page to be tested, and the public part obtained in the previous step is removed from the resulting first web response page; what remains is the non-public part of the first web response page. Similarly, the non-public part of the second web response page may be understood as follows: a second test request is sent to the server of the web page to be tested, and the public part obtained in the previous step is removed from the resulting second web response page; what remains is the non-public part of the second web response page.
Specifically, the security detection apparatus uses the non-public part of the first web response page as a reference, and if the second test request is a test request whose SQL statement logic in the access parameter is true, when the similarity between the non-public part of the second web response page and the non-public part of the first web response page exceeds a first threshold, it may be considered that the SQL injection vulnerability exists; if the second test request is a test request with a false SQL statement logic in the access parameter, the SQL injection vulnerability may be considered to exist when the similarity between the non-public part of the second web response page and the non-public part of the first web response page is lower than a second threshold. It should be understood that the first threshold and the second threshold may be preset according to requirements or experience, and are not limited thereto.
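The threshold decision described above can be sketched as a small function. The function name and the default threshold values are hypothetical; the source states only that the thresholds are preset according to requirements or experience.

```python
def indicates_injection(similarity: float, logic_is_true: bool,
                        first_threshold: float = 0.9,
                        second_threshold: float = 0.5) -> bool:
    """Apply the rule for one second test request.

    similarity is measured against the non-public part of the first
    (legal) response. Threshold defaults are illustrative assumptions.
    """
    if logic_is_true:
        # A logic-true payload should leave the page essentially unchanged.
        return similarity > first_threshold
    # A logic-false payload should change the page noticeably.
    return similarity < second_threshold


print(indicates_injection(0.97, logic_is_true=True))
print(indicates_injection(0.10, logic_is_true=False))
print(indicates_injection(0.95, logic_is_true=False))
```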
It should also be understood that, in this embodiment of the application, at least one test request in which the SQL statement logic in the access parameter is true and at least one test request in which the SQL statement logic in the access parameter is false may also be sent to the server of the web page to be tested at the same time, and the similarity between the non-public part of each resulting web response page and the non-public part of the first web response page is calculated separately. Whether the SQL injection vulnerability exists can then be judged according to the calculated similarities and one or more preset thresholds; the method for judging the SQL injection vulnerability according to the calculated similarities and the preset thresholds is not described herein again.
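The threshold rule described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the `Probe` type, the function names, the default threshold values (0.9 and 0.5) and the policy of requiring every probe to agree are all assumptions introduced here, since the text leaves the thresholds and the multi-request judgment open.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One test response compared against the reference (first) response."""
    logic_is_true: bool  # True for a payload whose SQL logic is true, False otherwise
    similarity: float    # similarity to the non-public part of the first response, in [0, 1]


def probe_suggests_injection(probe, first_threshold=0.9, second_threshold=0.5):
    """Apply the rule: true logic must stay similar, false logic must diverge."""
    if probe.logic_is_true:
        return probe.similarity > first_threshold
    return probe.similarity < second_threshold


def page_suggests_injection(probes, first_threshold=0.9, second_threshold=0.5):
    """Assumed combination policy: every probe must individually suggest injection."""
    return bool(probes) and all(
        probe_suggests_injection(p, first_threshold, second_threshold) for p in probes
    )


if __name__ == "__main__":
    probes = [Probe(logic_is_true=True, similarity=0.97),
              Probe(logic_is_true=False, similarity=0.12)]
    print(page_suggests_injection(probes))
```

Requiring agreement from both a true-logic and a false-logic probe is one conservative choice; other policies (for example, voting across several probes) fit the same interface.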
Optionally, S340 includes: calculating the similarity between the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed.
In the embodiment of the application, the security detection device removes the public part of each web response page, which removes all invalid nodes and eliminates interfering nodes. As a result, only the character string lengths of the valid nodes need to be used when the similarity is calculated, the character string lengths of the invalid nodes need not be included in the similarity formula, and the accuracy of vulnerability detection can be improved.
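The effect of removing the public part before comparing can be sketched as follows. This is an illustration under stated assumptions: the `Node` type is a hypothetical stand-in for a parsed DOM node, and `difflib.SequenceMatcher` is swapped in for the embodiment's similarity formula, which is not reproduced in this passage.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified DOM node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: tuple = ()


def cull(node, template_subtrees):
    """Return a copy of node without subtrees equal to a template subtree.

    Returns None when the node itself is one of the template subtrees.
    """
    if node in template_subtrees:
        return None
    kept = tuple(
        c for c in (cull(child, template_subtrees) for child in node.children)
        if c is not None
    )
    return Node(node.tag, node.text, kept)


def visible_text(node):
    """Concatenate the text of the nodes that survived the removal."""
    if node is None:
        return ""
    return node.text + "".join(visible_text(c) for c in node.children)


def text_similarity(a, b):
    """Stand-in similarity in [0, 1] over the remaining text only."""
    return SequenceMatcher(None, visible_text(a), visible_text(b)).ratio()


if __name__ == "__main__":
    nav = Node("div", "", (Node("a", "About Us"),))  # public part
    first = Node("body", "", (nav, Node("div", "artist details")))
    second = Node("body", "", (nav, Node("div", "artist details")))
    template = {nav}
    print(text_similarity(cull(first, template), cull(second, template)))
```

Because the shared navigation subtree is dropped from both pages, its text no longer contributes to the score; only the page-specific content is compared.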
Furthermore, the content actually presented by a page can differ greatly from its underlying code, and some hidden code never appears in the rendered image. Because SQL injection detection is concerned only with the content seen by the user, text comparison can still be inaccurate. Therefore, after the subtree is removed, the embodiment of the application can also use an image (the image of the effective area presented to the user) to perform similarity analysis and obtain a more accurate analysis result.
Optionally, S340 includes: rendering the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed, and generating at least two test images;
and calculating the image similarity of the at least two test images.
Specifically, the security detection device may render the DOM tree of the first web response page after the subtree is removed and present the corresponding image, and may likewise render the DOM tree of the second web response page after the subtree is removed and present the corresponding image. The user can then compare the rendered images visually, without calculating a similarity, or the similarity of the web pages can be calculated by using an image similarity method. Optionally, the image similarity method may be a binarization algorithm or another similarity comparison algorithm in the image field, which is not limited herein. Optionally, if the first image similarity satisfies a corresponding image similarity threshold and the second image similarity satisfies a corresponding image similarity threshold, the SQL injection vulnerability may be considered to exist. The image similarity threshold may be set in advance, which is not limited herein.
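A binarization-based comparison of two rendered images can be sketched as follows. This is only one possible reading of "binarization algorithm": the images are assumed to be already rendered and converted to equally sized grayscale pixel grids (lists of rows of 0 to 255 values), the cutoff of 128 is an arbitrary assumption, and the rendering step itself is not shown.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to rows of 0/1 bits."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def image_similarity(gray_a, gray_b, threshold=128):
    """Fraction of pixel positions whose binarized values agree."""
    bits_a = binarize(gray_a, threshold)
    bits_b = binarize(gray_b, threshold)
    same_shape = len(bits_a) == len(bits_b) and all(
        len(ra) == len(rb) for ra, rb in zip(bits_a, bits_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in bits_a)
    if total == 0:
        return 1.0  # two empty images are treated as identical
    same = sum(a == b for ra, rb in zip(bits_a, bits_b) for a, b in zip(ra, rb))
    return same / total


if __name__ == "__main__":
    first_image = [[0, 255], [255, 0]]
    second_image = [[10, 250], [0, 0]]
    print(image_similarity(first_image, second_image))  # 3 of 4 pixels agree: 0.75
```

The resulting score can be compared against a preset image similarity threshold in the same way as the text similarity.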
In addition, HTML text actually contains many character strings, such as comments, that interfere with text similarity comparison, whereas comments and other invisible elements are not presented in the rendered image. Image rendering therefore avoids the interference of such character strings with the similarity calculation. That is, if the similarity formula is used for calculation, comment-like character strings add extra character string length and thereby affect the accuracy of the similarity calculation; the image rendering method can be used to avoid this problem.
This is described here in connection with the example in fig. 6. As shown in fig. 6, for the URL http://testphp.vulnweb.com/artists.php?artist=2%20and%201=1, the method of the embodiment is used to remove div#siteInfo, div#masthead and div#navBar from the DOM tree corresponding to the source code of that address and to retain div#content. The left side of fig. 6 is the source code corresponding to the address after div#siteInfo, div#masthead and div#navBar are removed (the source code may be viewed through a DOM element viewer or obtained in other ways, which is not limited herein), and the right side of fig. 6 is the rendered image of that source code. The image on the right of fig. 6 shows that the web page after removal displays only the content of the non-public part, that is, the content corresponding to div#content. Moreover, there are comment strings in the source code (those skilled in the art will appreciate that only some of the comment strings are schematically outlined in fig. 6, and that other comment strings actually exist). Similarly to the comment strings mentioned above, hidden nodes, Cascading Style Sheets (CSS) and JavaScript code in the source code of the web page may also add extra character string length and thereby affect the accuracy of the similarity calculation; the image rendering method can be used to avoid these problems.
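How comments, style sheets and scripts inflate the raw source length relative to what the user sees can be illustrated with a short sketch. It does not render an image; it only extracts the visible text with Python's standard `html.parser` module so the two lengths can be compared. The class name, function name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that a user would see, skipping script and style content."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

    # handle_comment is not overridden: the base class discards comments.


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    source = ('<div id="content"><!-- hidden note --><style>p{color:red}</style>'
              '<p>Artist name</p><script>var x = 1;</script></div>')
    text = visible_text(source)
    print(len(source), len(text), repr(text))
```

In this sample the visible text is a small fraction of the source string, which is the kind of extra length that a length-based text similarity would pick up and an image comparison would not.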
Optionally, the acquiring, by the security detection apparatus, of the DOM tree of the template of the web page specifically includes: obtaining the common subtree of the at least two access DOM trees by reversely traversing the nodes of the at least two access DOM trees, where the common subtree of the at least two access DOM trees is the DOM tree of the template of the web page.
Specifically, the security detection apparatus traverses the nodes of the multiple access DOM trees in reverse, that is, it moves upward layer by layer in the following order: leaf nodes, parent nodes, subtrees, largest subtrees, and so on, and obtains the common subtree of the multiple DOM trees by comparison. The process of extracting a common subtree according to an embodiment of the present application is described below with reference to fig. 5, taking two DOM trees as an example. As shown in fig. 5, the security detection device extracts the effective visible text distribution of the leaf nodes of the two DOM trees and judges whether the contents of the leaf nodes of the two DOM trees are the same; if so, the traversal continues, and if not, the extraction of the maximum subtree ends. If the two DOM trees have leaf nodes with the same content, the parent nodes of those leaf nodes are extracted from both DOM trees, and it is then judged whether the parent nodes are the same. If the parent nodes are different, the extraction of the maximum subtree ends. If the parent nodes are the same, it is further judged whether other subtrees exist in the two DOM trees besides the parent nodes: if other subtrees exist, the traversal process is executed on those subtrees; if no other subtrees exist, the parent nodes of the current nodes in the two DOM trees are extracted and the comparison continues. It should be understood that the process of extracting a common subtree is described by taking two DOM trees as an example, and the number of DOM trees is not limited in the embodiments of the present application.
This is described in detail below in conjunction with the DOM trees of fig. 7. As shown in fig. 7, the first leaf nodes labeled a of DOM tree 1 (id = 1) and DOM tree 2 (id = 2) are selected for comparison. For example, the content in the web page corresponding to the leaf node labeled a of DOM tree 1 and to the leaf node labeled a of DOM tree 2 is "About Us", so leaf node a of DOM tree 1 and leaf node a of DOM tree 2 are considered to be the same. Then the parent node of the leaf node labeled a in the DOM tree with id = 1 and the parent node of the leaf node labeled a in the DOM tree with id = 2 are extracted, and it is judged whether the two parent nodes are the same; if they are not the same, the extraction of the maximum subtree ends. Here, the parent nodes corresponding to the leaf nodes labeled a of DOM tree 1 and DOM tree 2 are the same, and there are further leaf nodes labeled a under them, so the leaf nodes labeled a under the parent node in DOM tree 1 and those under the parent node in DOM tree 2 also need to be compared, and the result is also the same. The comparison then continues in this way with the parent nodes at the next layer up. When the subtrees are compared, tree 1 and tree 2 contain subtrees with different contents, for example the subtree with content 1 in tree 1 differs from the subtree with content 2 in tree 2, so the largest common subtree extracted this time is subtree x. Similarly, other largest common subtrees, such as subtree y, may be extracted by working upward from other leaf nodes. Thus, subtree x and subtree y constitute the DOM tree of the template of the web page to be tested.
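The extraction of largest common subtrees can be sketched as follows. This is a simplified variant, not the flow of fig. 5: instead of climbing from each leaf and comparing parents step by step, it collects every subtree of the second tree (leaves first) and then keeps the topmost subtrees of the first tree that also occur there. The `Node` type and the sample trees (including the link text other than "About Us") are hypothetical, and positional correspondence between the two trees is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified DOM node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: tuple = ()


def all_subtrees(node):
    """Yield every subtree in post-order, so leaves come before their parents."""
    for child in node.children:
        yield from all_subtrees(child)
    yield node


def maximal_common_subtrees(tree_1, tree_2):
    """Return the largest subtrees of tree_1 that also occur, node for node, in tree_2."""
    seen_in_tree_2 = set(all_subtrees(tree_2))
    result = []

    def visit(node):
        if node in seen_in_tree_2:
            result.append(node)      # the whole subtree is shared; stop descending
            return
        for child in node.children:  # something below differs; look for smaller matches
            visit(child)

    visit(tree_1)
    return result


if __name__ == "__main__":
    subtree_x = Node("div", "", (Node("a", "About Us"), Node("a", "Home")))
    subtree_y = Node("div", "", (Node("p", "footer"),))
    tree_1 = Node("body", "", (subtree_x, Node("div", "content 1"), subtree_y))
    tree_2 = Node("body", "", (subtree_x, Node("div", "content 2"), subtree_y))
    template = maximal_common_subtrees(tree_1, tree_2)
    print(template == [subtree_x, subtree_y])
```

As in the fig. 7 walk-through, the two trees differ only in their content subtree, so the template consists of the two shared subtrees.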
It should be understood that the examples in fig. 2, 4 to 7 are only for facilitating the understanding of the embodiments of the present application by those skilled in the art, and are not intended to limit the embodiments of the present application to the specific scenarios illustrated. It will be apparent to those skilled in the art that various equivalent modifications or variations can be made from the examples of fig. 2, 4 to 7, and such modifications or variations also fall within the scope of the embodiments of the present application. The method for detecting the security of the web page according to the embodiment of the present application is described in detail above with reference to fig. 1 to 7. The following describes a detection apparatus for security of a web page according to an embodiment of the present application with reference to fig. 8 and 9. It should be understood that the technical features described in the method embodiments are equally applicable to the following apparatus embodiments.
Fig. 8 shows a schematic block diagram of a detection apparatus 800 for security of a web page according to an embodiment of the present application, and the detection apparatus 800 may be used for the security detection apparatus 120 in fig. 1. Optionally, the apparatus 800 may be implemented by software and/or hardware, and the embodiment of the present application is not limited thereto. The apparatus 800 comprises:
a test response acquisition unit 810 and a similarity calculation unit 820;
the test response obtaining unit 810 is configured to:
sending a first test request and a second test request to a server of a web page to be tested, wherein the first test request comprises legal test parameters, and the second test request comprises illegal test parameters;
receiving a first web response page corresponding to the first test request and a second web response page corresponding to the second test request;
the similarity calculation unit 820 is configured to:
removing a public part in each web response page, wherein the public part in each web response page is a part irrelevant to SQL injection in the web page to be tested;
and calculating the similarity of the non-public part of the first web response page and the non-public part of the second web response page, and detecting the SQL injection risk of the web page to be tested according to the calculated similarity.
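How the test response acquisition unit might build its test requests can be sketched as follows. This only constructs URLs with Python's standard `urllib.parse` module and sends nothing; the function name and the idea of appending a suffix to one named parameter are assumptions for illustration, with the example address taken from the fig. 6 discussion.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_test_url(url, parameter, suffix=""):
    """Return url with suffix appended to the value of one query parameter."""
    parts = urlsplit(url)
    query = [
        (key, value + suffix if key == parameter else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = "http://testphp.vulnweb.com/artists.php?artist=2"
    first_request = build_test_url(base, "artist")               # legal parameter
    true_request = build_test_url(base, "artist", " and 1=1")    # SQL logic is true
    false_request = build_test_url(base, "artist", " and 1=2")   # SQL logic is false
    print(first_request)
    print(true_request)
    print(false_request)
```

`urlencode` percent-encodes the payload (a space becomes `+` and `=` becomes `%3D`), which is equivalent on the server side to the `%20` form shown in the example URL.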
Optionally, the detecting device 800 further includes a common part acquisition unit 830;
the public part obtaining unit 830 is configured to obtain a DOM tree of a template of the web page to be tested, where the template of the web page to be tested indicates a public part of the web page to be tested;
the similarity calculation unit 820 is configured to remove a common part in each web response page, and specifically includes:
respectively acquiring a DOM tree of the first web response page and a DOM tree of the second web response page;
according to the DOM tree of the template of the web page to be tested, which is acquired by the public part acquisition unit 830, removing the subtree in the DOM tree of the first web response page that is the same as the DOM tree of the template;
and according to the DOM tree of the template of the web page to be tested, removing the subtree in the DOM tree of the second web response page that is the same as the DOM tree of the template.
In an optional implementation manner, the similarity calculation unit 820 is configured to obtain a DOM tree of a template of the web page to be tested, and specifically includes:
sending at least two access requests to a server of the web page to be tested, wherein the at least two access requests contain different access parameters;
receiving at least two access response pages;
obtaining at least two access DOM trees according to the at least two access response pages;
and acquiring a DOM tree of the template of the web page according to the at least two access DOM trees, wherein the DOM tree of the template of the web page comprises a common sub-tree of the at least two access DOM trees.
Optionally, the similarity calculation unit 820 is configured to obtain a DOM tree of a template of the web page, and specifically includes:
and reversely traversing the nodes of the at least two access DOM trees to obtain the common subtree of the at least two access DOM trees.
In an optional implementation manner, the similarity calculation unit 820 is configured to calculate the similarity between the non-common part of the first web response page and the non-common part of the second web response page, and specifically includes:
and calculating the similarity between the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed.
In an optional implementation manner, the similarity calculation unit 820 is configured to calculate the similarity of the at least two removed test DOM trees, and specifically includes:
rendering the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed, and generating at least two test images;
and calculating the image similarity of the at least two test images.
It should be understood that the apparatus 800 for detecting security of a web page according to an embodiment of the present application may correspond to the method for detecting security of a web page in the foregoing method embodiment, for example, the method in fig. 3, and the above and other management operations and/or functions of each module in the apparatus 800 are respectively for implementing corresponding steps of the method in the foregoing method embodiment, so that beneficial effects in the foregoing method embodiment may also be implemented, and for brevity, no detailed description is provided here.
Fig. 9 is a schematic block diagram of a device 900 for detecting security of a web page according to an embodiment of the present application. As shown in fig. 9, the detection apparatus 900 includes a processing unit 901 and a communication interface 902. The processing unit 901 is configured to execute functions defined by an operating system and various software programs running on the detection apparatus 900, for example, functions of various software components on the security detection apparatus 800 shown in fig. 8; specifically, for example, the processing unit 901 is configured to implement the function of the similarity calculation unit 820. The communication interface 902 is used for performing communication interaction with other computing nodes and for implementing the function of the test response obtaining unit 810. The other computing nodes may be other physical servers, and the communication interface 902 may specifically be a network adapter card. Optionally, the detection apparatus 900 may further include an input/output interface 903, where the input/output interface 903 is connected to an input/output device and is configured to receive input information and output an operation result. The input/output device may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the physical server may also include a secondary storage 904, also commonly referred to as external storage; the storage medium of the secondary storage 904 may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., optical disk), or a semiconductor medium (e.g., solid state disk), among others.
The processing unit 901 may have various specific implementation forms. For example, the processing unit 901 may include a processor 9011 and a memory 9012, and the processor 9011 may execute related operations according to a program unit stored in the memory 9012. The processor 9011 may be a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), and the processor 9011 may be a single-core processor or a multi-core processor. The processing unit 901 may also be implemented by using a logic device with built-in processing logic, such as a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or the like. Moreover, fig. 9 is merely an example of a security detection device, which may include more or fewer components than shown in fig. 9, or a different arrangement of components.
The method disclosed in the embodiments of the present application may be applied to a processor, or may be implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a system on chip (SoC), a Central Processing Unit (CPU), a Network Processor (NP), a microcontroller unit (MCU), a programmable logic device (PLD), or other integrated chip. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
It will be appreciated that the memory in the embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. Volatile memory can be random access memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or a portion of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

  1. A method of detecting security of a web page, comprising:
    sending a first test request and a second test request to a server of a web page to be tested, wherein the first test request comprises a legal test request, and the second test request comprises an illegal test request;
    receiving a first web response page corresponding to the first test request and a second web response page corresponding to the second test request;
    removing a public part in each web response page, wherein the public part in each web response page is a part irrelevant to Structured Query Language (SQL) injection in the web page to be tested;
    and calculating the similarity of the non-public part of the first web response page and the non-public part of the second web response page, and detecting the SQL injection risk of the webpage to be tested according to the calculated similarity.
  2. The method of claim 1, wherein the culling common portions of each web response page comprises:
    respectively acquiring a Document Object Model (DOM) tree of the first web response page and a DOM tree of the second web response page;
    acquiring a DOM tree of the template of the web page to be tested, wherein the template of the web page to be tested indicates a public part of the web page to be tested;
    according to the DOM tree of the template of the web page to be tested, removing a sub-tree which is the same as the DOM tree of the template in the DOM tree of the first web response page;
    and according to the DOM tree of the template of the web page to be tested, removing the subtree which is the same as the DOM tree of the template in the DOM tree of the second web response page.
  3. The method according to claim 2, wherein the obtaining of the DOM tree of the template of the web page to be tested specifically comprises:
    sending at least two access requests to a server of the web page to be tested, wherein the at least two access requests comprise different access parameters;
    receiving at least two access response pages;
    obtaining at least two access DOM trees according to the at least two access response pages;
    and acquiring a DOM tree of the template of the web page according to the at least two access DOM trees, wherein the DOM tree of the template of the web page comprises a common sub-tree of the at least two access DOM trees.
  4. The method according to claim 3, wherein the obtaining of the DOM tree of the template of the web page specifically comprises:
    and reversely traversing the nodes of the at least two access DOM trees to obtain the common subtree of the at least two access DOM trees.
  5. The method according to any one of claims 2-4, wherein the calculating the similarity between the non-common part of the first web response page and the non-common part of the second web response page specifically comprises:
    and calculating the similarity between the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed.
  6. The method according to any of claims 2 to 4, wherein said calculating the similarity of said at least two eliminated test DOM trees comprises:
    rendering the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed, and generating at least two test images;
    and calculating the image similarity of the at least two test images.
  7. The detection device for the safety of the web page is characterized by comprising a test response acquisition unit and a similarity calculation unit;
    the test response acquisition unit is used for:
    sending a first test request and a second test request to a server of a web page to be tested, wherein the first test request comprises legal test parameters, and the second test request comprises illegal test parameters;
    receiving a first web response page corresponding to the first test request and a second web response page corresponding to the second test request;
    the similarity calculation unit is configured to:
    removing a public part in each web response page, wherein the public part in each web response page is a part irrelevant to SQL injection in the web page to be tested;
    and calculating the similarity of the non-public part of the first web response page and the non-public part of the second web response page, and detecting the SQL injection risk of the web page to be tested according to the calculated similarity.
  8. The detection device of claim 7, further comprising a common part acquisition unit,
    the public part acquisition unit is used for acquiring a DOM tree of a template of the web page to be tested, and the template of the web page to be tested indicates a public part of the web page to be tested;
    the similarity calculation unit is used for eliminating a common part in each web response page, and specifically includes:
    respectively acquiring a DOM tree of the first web response page and a DOM tree of the second web response page;
    according to the DOM tree of the template of the web page to be tested, which is obtained by the public part obtaining unit, removing a sub-tree which is the same as the DOM tree of the template in the DOM tree of the first web response page;
    and according to the DOM tree of the template of the web page to be tested, removing the subtree which is the same as the DOM tree of the template in the DOM tree of the second web response page.
  9. The detection apparatus according to claim 8, wherein the similarity calculation unit is configured to obtain a DOM tree of the template of the web page to be tested, and specifically includes:
    sending at least two access requests to a server of the web page to be tested, wherein the at least two access requests contain different access parameters;
    receiving at least two access response pages;
    obtaining at least two access DOM trees according to the at least two access response pages;
    and acquiring a DOM tree of the template of the web page according to the at least two access DOM trees, wherein the DOM tree of the template of the web page comprises a common sub-tree of the at least two access DOM trees.
  10. The apparatus according to claim 9, wherein the similarity calculation unit is configured to obtain a DOM tree of the template of the web page, and specifically includes:
    and reversely traversing the nodes of the at least two access DOM trees to obtain the common subtree of the at least two access DOM trees.
  11. The apparatus according to any one of claims 8 to 10, wherein the similarity calculation unit is configured to calculate a similarity between the non-common part of the first web response page and the non-common part of the second web response page, and specifically includes:
    and calculating the similarity between the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed.
  12. The detection apparatus according to any one of claims 8 to 10, wherein the similarity calculation unit is configured to calculate the similarity of the at least two eliminated test DOM trees, and specifically includes:
    rendering the DOM tree of the first web response page after the subtree is removed and the DOM tree of the second web response page after the subtree is removed, and generating at least two test images;
    and calculating the image similarity of the at least two test images.
  13. A computing device comprising at least one processor and a storage unit;
    the storage unit is configured to store instructions;
    the at least one processor is coupled with the storage unit, and the instructions, when executed by the at least one processor, cause the processor to perform the method of any one of claims 1 to 6.
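
The steps recited in claims 9 to 11 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Node` class, the `page` helper, and all function names are hypothetical. The sketch assumes that "reverse traversal" means visiting children before their parents, that two subtrees are "common" when their tag, text, and descendants match exactly, and that a sequence-matching ratio from the standard `difflib` module is an acceptable stand-in for the similarity measure, which the claims do not specify.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    """Hypothetical, simplified DOM node: a tag, its text, and child nodes."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def signature(node):
    """Canonical string for a subtree: tag, text, then child signatures."""
    inner = "".join(signature(child) for child in node.children)
    return f"<{node.tag}>{node.text}{inner}</{node.tag}>"


def subtree_signatures(root):
    """Collect the signature of every subtree, children before parents."""
    found = set()

    def visit(node):
        for child in node.children:
            visit(child)
        found.add(signature(node))

    visit(root)
    return found


def template_signatures(access_trees):
    """Subtrees present in every access DOM tree (assumed template parts)."""
    return set.intersection(*(subtree_signatures(t) for t in access_trees))


def strip_template(node, template):
    """Copy of the tree without template subtrees; None if fully removed."""
    if signature(node) in template:
        return None
    kept = []
    for child in node.children:
        stripped = strip_template(child, template)
        if stripped is not None:
            kept.append(stripped)
    return Node(node.tag, node.text, kept)


def flatten(node):
    """Pre-order list of 'tag:text' labels; empty for a removed tree."""
    if node is None:
        return []
    labels = [f"{node.tag}:{node.text}"]
    for child in node.children:
        labels.extend(flatten(child))
    return labels


def similarity(a, b):
    """Ratio in [0, 1] between the non-common parts of two pages."""
    return SequenceMatcher(None, flatten(a), flatten(b)).ratio()


def page(body_text):
    """Hypothetical response page: fixed nav and footer, variable body."""
    return Node("html", "", [
        Node("nav", "Home"),
        Node("div", body_text),
        Node("footer", "site footer"),
    ])


# Two access requests with different parameters yield the template.
template = template_signatures([page("item 1"), page("item 2")])

# Compare the non-common parts of two test response pages.
same = similarity(strip_template(page("row A"), template),
                  strip_template(page("row A"), template))
diff = similarity(strip_template(page("row A"), template),
                  strip_template(page("error"), template))
print(same, diff)  # prints "1.0 0.5"
```

In this sketch the shared `nav` and `footer` subtrees are treated as the template and removed, so only the variable `div` content influences the score. Without that step, the fixed parts of the page would inflate the similarity of two responses whose payload-dependent content actually differs.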
CN201880095842.6A 2018-08-17 2018-08-17 Method and device for detecting web page security Active CN112470154B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101148 WO2020034212A1 (en) 2018-08-17 2018-08-17 Method and device for checking web page security

Publications (2)

Publication Number Publication Date
CN112470154A true CN112470154A (en) 2021-03-09
CN112470154B CN112470154B (en) 2024-03-05

Family

ID=69524556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880095842.6A Active CN112470154B (en) 2018-08-17 2018-08-17 Method and device for detecting web page security

Country Status (2)

Country Link
CN (1) CN112470154B (en)
WO (1) WO2020034212A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2598412A (en) 2020-08-18 2022-03-02 Clario Tech Ltd A method for detecting a web skimmer on a "payment page"

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120239598A1 (en) * 2011-03-15 2012-09-20 Cascaval Gheorghe C Machine Learning Method to Identify Independent Tasks for Parallel Layout in Web Browsers
CN102831345A (en) * 2012-07-30 2012-12-19 西北工业大学 Injection point extracting method in SQL (Structured Query Language) injection vulnerability detection
CN106919503A (en) * 2016-11-15 2017-07-04 阿里巴巴集团控股有限公司 The method of testing and device of application program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799830B (en) * 2012-08-06 2015-06-17 厦门市美亚柏科信息股份有限公司 Improved SQL (Structured Query Language) injection flaw detection method
CN105279086B (en) * 2015-10-16 2018-01-19 山东大学 A kind of method of the automatic detection e-commerce website logic leak based on flow chart
CN106503244A (en) * 2016-11-08 2017-03-15 天津海量信息技术股份有限公司 A kind of processing method of URL similarity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张晨: "基于网页DOM树比对的SQL注入漏洞检测", 《计算机工程》, vol. 38, no. 18, pages 111 - 115 *
罗明宇: "基于DOM树序列值比对的SQL注入漏洞检测", 《计算机工程与设计》, vol. 36, no. 2, pages 350 - 354 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377867A (en) * 2021-06-10 2021-09-10 四川省明厚天信息技术股份有限公司 Data synchronization method and device and electronic equipment
CN113377867B (en) * 2021-06-10 2022-10-21 四川省明厚天信息技术股份有限公司 Data synchronization method and device and electronic equipment

Also Published As

Publication number Publication date
WO2020034212A1 (en) 2020-02-20
CN112470154B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
US20130031461A1 (en) Detecting repeat patterns on a web page
CN110537180B (en) System and method for tagging elements in internet content within a direct browser
US9037965B2 (en) Browser and operating system compatibility
US7299407B2 (en) Marking and annotating electronic documents
US9904936B2 (en) Method and apparatus for identifying elements of a webpage in different viewports of sizes
US20150067476A1 (en) Title and body extraction from web page
CN108021692B (en) Method for monitoring webpage, server and computer readable storage medium
JP6203374B2 (en) Web page style address integration
KR101892206B1 (en) Bidirectional text checker
CN109033282B (en) Webpage text extraction method and device based on extraction template
KR20140038459A (en) Live browser tooling in an integrated development environment
CN107612908B (en) Webpage tampering monitoring method and device
CN111008348A (en) Anti-crawler method, terminal, server and computer readable storage medium
CN109271598B (en) Method, device and storage medium for extracting news webpage content
US20150134669A1 (en) Element identification in a tree data structure
WO2018093899A1 (en) Electronic form identification using spatial information
CN106547895B (en) Webpage information extraction method and device
US10452723B2 (en) Detecting malformed application screens
CN112470154B (en) Method and device for detecting web page security
CN111460803A (en) Equipment identification method based on Web management page of industrial Internet of things equipment
EP3173965A1 (en) System and method for enablement of data masking for web documents
US10339207B2 (en) Identifying a functional fragment of a document object model tree
CN110390037B (en) Information classification method, device and equipment based on DOM tree and storage medium
CN111414404A (en) Data visualization device and method
CN111177518A (en) Webpage purification method, system and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220208

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant