CN114329149A - Detection method and device for automatically capturing page information, electronic equipment and readable storage medium

Info

Publication number
CN114329149A
CN114329149A CN202111460324.XA CN202111460324A CN114329149A CN 114329149 A CN114329149 A CN 114329149A CN 202111460324 A CN202111460324 A CN 202111460324A CN 114329149 A CN114329149 A CN 114329149A
Authority
CN
China
Prior art keywords
page
dom tree
response
information
application
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111460324.XA
Other languages
Chinese (zh)
Inventor
马粤
李勇
罗仕强
李凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
River Security Inc
Original Assignee
River Security Inc
Application filed by River Security Inc
Priority to CN202111460324.XA
Publication of CN114329149A

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a detection method and device for automatic capture of page information, an electronic device, and a readable storage medium, and relates to the technical field of data processing, in particular to artificial intelligence fields such as big data and information flow. The specific implementation scheme is as follows: a detection code is inserted into an obtained response page requested by an application, and the response page with the inserted detection code is sent to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event; in response to DOM tree structure information of the response page provided by the application, the DOM tree structure information is checked to determine whether abnormal behavior of automatic page information capture exists, where the application provides the DOM tree structure information by executing the response operation of the DOM tree change event; and in response to determining that such abnormal behavior exists, abnormal response processing is performed.

Description

Detection method and device for automatically capturing page information, electronic equipment and readable storage medium
Technical Field
The disclosure relates to the technical field of data processing, in particular to the technical field of artificial intelligence such as big data and information flow.
Background
With the in-depth development of the internet, applications (APPs) for terminals are emerging endlessly. While using an application, a user may encounter a situation in which the page information of the application is maliciously captured by an automatic capture tool. This not only results in theft of the core content of the application, but may also cause the service server of the application to crash.
Therefore, effectively protecting the page information of an application and preventing it from being maliciously captured by automatic capture tools is of great significance.
Disclosure of Invention
The disclosure provides a detection method and device for automatically capturing page information, electronic equipment and a readable storage medium.
According to an aspect of the present disclosure, a method for detecting automatic page information capture is provided, including:
inserting a detection code into an obtained response page requested by an application, and sending the response page with the inserted detection code to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event;
in response to DOM tree structure information of the response page provided by the application, performing detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic page information capture exists, wherein the DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event; and
in response to determining that abnormal behavior of automatic page information capture exists, performing abnormal response processing.
According to another aspect of the present disclosure, there is provided a detection apparatus for automatically capturing page information, including:
a code inserting unit, configured to insert a detection code into an obtained response page requested by an application, and send the response page with the inserted detection code to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event;
a structure detection unit, configured to, in response to DOM tree structure information of the response page provided by the application, perform detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic page information capture exists, wherein the DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event; and
an exception handling unit, configured to perform abnormal response processing in response to determining that abnormal behavior of automatic page information capture exists.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the aspects and any possible implementation described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the above-described aspect and any possible implementation.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspect and any possible implementation as described above.
As can be seen from the foregoing technical solutions, in the embodiments of the present disclosure, a detection code is inserted into an obtained response page requested by an application, and the response page with the inserted detection code is sent to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event. Then, in response to DOM tree structure information of the response page provided by the application, the DOM tree structure information is checked to determine whether abnormal behavior of automatic page information capture exists, where the application provides the DOM tree structure information by executing the response operation of the DOM tree change event. Abnormal response processing can then be performed in response to determining that such abnormal behavior exists. Because the inserted detection code can monitor DOM tree change events, whether abnormal behavior of automatic page information capture exists can be determined from changes to the DOM tree of the response page. This makes it possible to detect abnormal capture behavior that conventional detection based on page content cannot detect, which improves the reliability of detecting automatic page information capture.
In addition, by adopting the technical scheme provided by the disclosure, the safety of page information access of the application can be effectively improved.
In addition, by adopting the technical scheme provided by the disclosure, the user experience can be effectively improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor. The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device for implementing the detection method for automatically capturing page information according to the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is to be understood that the described embodiments are only a few, and not all, of the disclosed embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terminal device involved in the embodiments of the present disclosure may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), and other intelligent devices; the display device may include, but is not limited to, a personal computer, a television, and the like having a display function.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects.
With the in-depth development of the internet, applications (APPs) for terminals are emerging endlessly. While using an application, a user may encounter a situation in which the page information of the application is maliciously captured by an automatic capture tool. This not only results in theft of the core content of the application, but may also cause the service server of the application to crash.
An application may execute code written in a scripting language, the page code may execute in a page space, and the code of a plug-in mechanism executes in a special process space, which may control the Document Object Model (DOM) tree structure of the page space. However, the code of the page space is not aware of the code of the plug-in mechanism of the process space.
Taking a crawler as an example, the crawler can access information of a computer system through an abnormal way or an aggressive technical means, and great interference is caused to normal operation of the system and normal users.
At present, one way to implement a crawler is to write an automated program using a browser plug-in mechanism and a scripting language, so as to capture page information automatically.
The crawler (i.e. plug-in type crawler) using the browser plug-in mechanism operates the DOM tree of the page to satisfy various functions in the crawling process. All of the page elements on a page may be organized in a tree structure, forming a DOM tree of pages. The DOM tree of a page may be composed of node objects and relationships between the node objects.
For example, each tag in a HyperText Markup Language (HTML) page is loaded as an element node object in the DOM tree of the HTML page; each attribute of a tag is loaded as an attribute node object in the DOM tree; and the content of each tag is loaded as a text node object in the DOM tree.
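The mapping from HTML tags, attributes, and tag contents to node objects can be illustrated with a minimal sketch. It uses Python's standard html.parser module and is only an approximation of a browser DOM: the names DomBuilder and parse_dom are hypothetical, and attributes are stored as a dictionary on each element node rather than as separate attribute node objects.

```python
from html.parser import HTMLParser


class DomBuilder(HTMLParser):
    """Builds a simplified DOM-like tree of element and text nodes."""

    def __init__(self):
        super().__init__()
        self.root = {"type": "element", "tag": "#document", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Each tag becomes an element node; its attributes are kept on the node.
        node = {"type": "element", "tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element (tolerates unclosed tags).
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Non-blank tag content becomes a text node under the current element.
        if data.strip():
            self._stack[-1]["children"].append({"type": "text", "value": data.strip()})


def parse_dom(html: str) -> dict:
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

For instance, parsing `<body><button id="buy">Buy</button></body>` yields a body element node whose single child is a button element node with the attribute id and one text node child.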
Take the Button element node object in the DOM tree of a page, which represents a button, as an example. During access by a normal user, the Button element is only displayed by the application, or its attributes are only read by code. A plug-in type crawler, by contrast, modifies the attributes of the Button element, for example by adding a mark that records the capture process.
Because such a crawler uses the special permissions of the plug-in, it can bypass conventional detection of automatic page information capture that is based on page content.
Therefore, it is desirable to provide a detection method capable of detecting abnormal behavior of automatic capture of page information that cannot be detected by conventional detection means.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, as shown in fig. 1.
101. Inserting a detection code in the obtained response page requested by the application, and sending the response page inserted with the detection code to the application so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event.
The detection code may be code written by using a scripting language, such as JavaScript code.
102. In response to DOM tree structure information of the response page provided by the application, performing detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic page information capture exists.
The DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event.
103. In response to determining that abnormal behavior of automatic page information capture exists, performing abnormal response processing.
Thus, by using the standard interface of the application to monitor DOM tree change events of the response page, and combining them with the changed DOM tree structure information, it is possible to detect abnormal behavior of automatic page information capture that conventional detection means cannot detect.
It should be noted that part or all of the execution subjects of 101 to 103 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) set in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a detection device for automatically capturing page information on the network side, which is not particularly limited in this embodiment.
It is to be understood that the application may be a native application (native app) installed on the local terminal, or may also be a web page program (webApp) of a browser on the local terminal, which is not limited in this embodiment.
In this way, a detection code is inserted into an obtained response page requested by an application, and the response page with the inserted detection code is sent to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event. Then, in response to DOM tree structure information of the response page provided by the application, the DOM tree structure information is checked to determine whether abnormal behavior of automatic page information capture exists, where the application provides the DOM tree structure information by executing the response operation of the DOM tree change event. Abnormal response processing can then be performed in response to determining that such abnormal behavior exists. Because the inserted detection code can monitor DOM tree change events, whether abnormal behavior of automatic page information capture exists can be determined from changes to the DOM tree of the response page. This makes it possible to detect abnormal capture behavior that conventional detection based on page content cannot detect, which improves the reliability of detecting automatic page information capture.
In the present disclosure, in a preferred embodiment, 101 to 103 may be executed by an independent device on the network side, for example, a detection device for automatically capturing page information on the network side, and at this time, the application side and the service server side may perform the service logic normally without any modification.
Optionally, in a possible implementation manner of this embodiment, in 101, a response page requested by an application and returned by a service server may be specifically intercepted, and then a detection code is inserted in the intercepted response page. Then, a response page into which the detection code is inserted is sent to the application.
The inserted detection code may contain behavior characteristics of the DOM tree structure and behavior analysis logic based on the DOM tree structure, and has undergone obfuscation and encryption processing.
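The interception and insertion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name insert_detection_code, the reporting path /__dom_report, and the choice to insert right after the opening head tag are all hypothetical, and the obfuscation and encryption of the detection code are omitted. The embedded JavaScript uses the browser's standard MutationObserver interface.

```python
import re

# Illustrative detection code (JavaScript) to be inserted into the response page.
DETECTION_SNIPPET = """<script>
(function () {
  // Runs in the page: watch the whole document for DOM tree changes.
  var observer = new MutationObserver(function (records) {
    var changes = records.map(function (r) {
      return {
        type: r.type,
        target: r.target.nodeName,
        attribute: r.attributeName,
        added: r.addedNodes.length,
        removed: r.removedNodes.length
      };
    });
    // Hypothetical reporting endpoint; the source does not name one.
    fetch("/__dom_report", { method: "POST", body: JSON.stringify(changes) });
  });
  observer.observe(document.documentElement, {
    attributes: true, childList: true, subtree: true
  });
})();
</script>"""

# Matches <head> or <head ...> but not <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def insert_detection_code(page_html: str, snippet: str = DETECTION_SNIPPET) -> str:
    """Insert the snippet right after the opening <head> tag, or prepend it if absent."""
    match = _HEAD_OPEN.search(page_html)
    if match is None:
        return snippet + page_html
    return page_html[:match.end()] + snippet + page_html[match.end():]
```

Inserting the code early in the page is a design choice of this sketch: the observer is then registered before most later modifications of the DOM tree take place.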
Optionally, in a possible implementation of this embodiment, after 101, once the application receives the response page, it may execute the detection code inserted in the response page while displaying or reading the response page. After executing the detection code, the application registers the DOM tree change event to be monitored according to the behavior characteristics of the DOM tree structure, and registers the response operation of the monitored DOM tree change event according to the behavior analysis logic based on the DOM tree structure.
The DOM tree change event may include, but is not limited to, at least one of the following events:
change events of element attributes of elements of the DOM tree;
newly added events of elements of the DOM tree; and
removal events of DOM tree elements.
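The three kinds of change events listed above correspond naturally to the record types reported by a browser's MutationObserver interface ("attributes" and "childList"). The following sketch classifies records shaped like such mutation records; the dictionary layout, the enum DomChangeEvent, and the function classify are assumptions made for illustration only.

```python
from enum import Enum
from typing import List


class DomChangeEvent(Enum):
    ATTRIBUTE_CHANGED = "attribute_changed"
    ELEMENT_ADDED = "element_added"
    ELEMENT_REMOVED = "element_removed"


def classify(record: dict) -> List[DomChangeEvent]:
    """Map one mutation-record-like dict to the change events it represents."""
    events = []
    if record.get("type") == "attributes":
        events.append(DomChangeEvent.ATTRIBUTE_CHANGED)
    elif record.get("type") == "childList":
        # A single childList record may report both added and removed nodes.
        if record.get("addedNodes"):
            events.append(DomChangeEvent.ELEMENT_ADDED)
        if record.get("removedNodes"):
            events.append(DomChangeEvent.ELEMENT_REMOVED)
    return events
```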
The response operation of the DOM tree change event may be: if the DOM tree change event occurs, providing the DOM tree structure information of the corresponding response page to the service server.
In one specific implementation, the response operation of the DOM tree change event is: if the DOM tree change event occurs, providing the DOM tree structure information of the corresponding response page to the service server separately.
For example, a message sent by an application to a service server may be simulated, and DOM tree structure information of a corresponding response page may be provided to the service server.
In this example, the response operation of the DOM tree change event is specifically: if the DOM tree change event occurs, inserting the DOM tree structure information of the corresponding response page into a new message sent by the application to the service server, thereby providing it to the service server.
The new message is a message that the response operation uses solely to provide the DOM tree structure information of the corresponding response page to the service server.
Accordingly, in 102, in response to a new message triggered by the application and containing the DOM tree structure information of the response page, the DOM tree structure information may be checked to determine whether abnormal behavior of automatic page information capture exists. The application inserts the DOM tree structure information of the response page into the new message by executing the response operation of the DOM tree change event.
In another specific implementation, the response operation of the DOM tree change event is: if the DOM tree change event occurs, providing the DOM tree structure information of the corresponding response page to the service server together with other information that the application sends to the service server.
For example, the DOM tree structure information of the corresponding response page may be provided to the service server together with the page request sent by the application to the service server.
In this example, the response operation of the DOM tree change event is specifically: if the DOM tree change event occurs, inserting the DOM tree structure information of the corresponding response page into a page request sent by the application to the service server, thereby providing it to the service server.
Accordingly, in 102, in response to a page request triggered by the application and containing the DOM tree structure information of the response page, the DOM tree structure information may be checked to determine whether abnormal behavior of automatic page information capture exists. The application inserts the DOM tree structure information of the response page into the page request by executing the response operation of the DOM tree change event.
Or, for another example, the DOM tree structure information of the corresponding response page may be provided to the service server together with the heartbeat request sent by the application to the service server.
In this example, the response operation of the DOM tree change event is specifically: if the DOM tree change event occurs, inserting the DOM tree structure information of the corresponding response page into a heartbeat request sent by the application to the service server, thereby providing it to the service server.
Accordingly, in 102, in response to a heartbeat request triggered by the application and containing the DOM tree structure information of the response page, the DOM tree structure information may be checked to determine whether abnormal behavior of automatic page information capture exists. The application inserts the DOM tree structure information of the response page into the heartbeat request by executing the response operation of the DOM tree change event.
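The three ways of carrying the DOM tree structure information described above (a separate new message, a page request, or a heartbeat request) can be sketched as follows. Messages are modeled as plain dictionaries; the header name X-DOM-Info, the message kind "dom_report", and all function names are hypothetical, since the disclosure does not specify a wire format.

```python
import json
from typing import Optional

DOM_HEADER = "X-DOM-Info"  # hypothetical header name, not taken from the source


def new_dom_message(dom_info: list) -> dict:
    """Application side: a separate message used only to carry the DOM information."""
    return {"kind": "dom_report", "headers": {}, "body": json.dumps(dom_info)}


def attach_dom_info(message: dict, dom_info: list) -> dict:
    """Application side: piggyback DOM information on a page or heartbeat request."""
    headers = dict(message.get("headers", {}))
    headers[DOM_HEADER] = json.dumps(dom_info)
    return {**message, "headers": headers}


def extract_dom_info(message: dict) -> Optional[list]:
    """Detection side: recover the DOM information from any of the three carriers."""
    if message.get("kind") == "dom_report":
        return json.loads(message["body"])
    raw = message.get("headers", {}).get(DOM_HEADER)
    return json.loads(raw) if raw is not None else None
```

A request that carries no DOM information yields None, in which case there is nothing for step 102 to check.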
Optionally, in a possible implementation manner of this embodiment, in 102, specifically, the DOM tree structure information of the response page sent by the application to the service server may be intercepted, so that the DOM tree structure information of the response page is detected and processed.
Optionally, in a possible implementation manner of this embodiment, the DOM tree structure information of the response page may include, but is not limited to, the following information:
structure change information of the DOM tree of the response page; or
complete structure information of the DOM tree of the response page.
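When complete structure information is provided instead of change information, the changes can be derived by comparing two snapshots. The sketch below assumes a deliberately simple snapshot format, a mapping from an element path to that element's attributes; this format and the function name diff_snapshots are illustrative assumptions rather than part of the disclosure.

```python
from typing import Dict, List

Snapshot = Dict[str, Dict[str, str]]  # element path -> attribute name/value pairs


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[dict]:
    """Derive structure change information from two complete DOM snapshots."""
    changes = []
    for path in sorted(after.keys() - before.keys()):
        changes.append({"change": "element_added", "path": path, "attrs": after[path]})
    for path in sorted(before.keys() - after.keys()):
        changes.append({"change": "element_removed", "path": path})
    for path in sorted(before.keys() & after.keys()):
        old, new = before[path], after[path]
        for name in sorted(old.keys() | new.keys()):
            if old.get(name) != new.get(name):
                changes.append({"change": "attribute_changed", "path": path,
                                "attribute": name,
                                "old": old.get(name), "new": new.get(name)})
    return changes
```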
Optionally, in a possible implementation manner of this embodiment, in 102, specifically, the changed element of the DOM tree may be determined according to the DOM tree structure information of the response page, and further, whether an abnormal behavior of automatic capture of page information exists may be determined according to an element attribute of the changed element of the DOM tree.
Specifically, after the changed elements of the DOM tree are determined, whether abnormal behavior of automatic page information capture exists may be determined from the element attributes of the changed elements in combination with an abnormal capture feature database, and the crawler type and name corresponding to the abnormal behavior may further be determined.
For example, according to the DOM tree structure information of the response page, the changed element of the DOM tree may be determined to be a Body element, and the attribute value of its modified element attribute sidexplaning flag (a custom string) may be determined to be 1. In combination with the abnormal capture feature database, it can then be determined that abnormal behavior of automatic page information capture exists and that it is caused by a plug-in crawler named "Katalon Recorder".
As another example, according to the DOM tree structure information of the response page, the newly added element of the DOM tree may be determined to be a Div element, and the attribute value of its element attribute Style may be determined to contain text such as "!important; visibility: visible !important; width: 100% !important; z-index: 2147483646 !important;". In combination with the abnormal capture feature database, it can then be determined that abnormal behavior of automatic page information capture exists and that it is caused by a plug-in crawler named "Web crawler".
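Matching changed elements against an abnormal capture feature database can be sketched as a lookup over signatures. The two entries below are built from the two examples above; the attribute name "sidexplaning flag" and the crawler name "Web crawler" are copied as they appear in the text and may not match the real identifiers, and the Signature layout, the change-record format, and the function identify_crawler are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Signature:
    crawler_name: str
    change_type: str               # "attribute_changed" or "element_added"
    tag: str                       # lower-case element name
    attribute: str
    matches: Callable[[str], bool]


# Illustrative database; names are taken from the text as written there.
FEATURE_DB: List[Signature] = [
    Signature("Katalon Recorder", "attribute_changed", "body",
              "sidexplaning flag", lambda v: v == "1"),
    Signature("Web crawler", "element_added", "div",
              "style", lambda v: "2147483646" in v and "important" in v),
]


def identify_crawler(changes: List[dict],
                     db: List[Signature] = FEATURE_DB) -> Optional[str]:
    """Return the name of the first crawler whose signature matches a change."""
    for change in changes:
        for sig in db:
            if change["change"] != sig.change_type or change["tag"].lower() != sig.tag:
                continue
            value = change.get("attrs", {}).get(sig.attribute)
            if value is not None and sig.matches(value):
                return sig.crawler_name
    return None
```

A return value of None means that no signature matched, that is, no abnormal behavior was recognized from the reported changes.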
In general, existing crawler detection relies on special objects of the scripting language. When a crawler runs in the plug-in space of a browser plug-in, its code characteristics cannot be detected, so existing detection means fail. By using structural changes of the DOM tree, detection based on the DOM behavior of the page is achieved, and a crawler can be detected even if it runs in the plug-in space of a browser plug-in.
In 103, in response to determining that abnormal behavior of automatic page information capture exists, the abnormal response processing may be: directly rejecting the page request of the application; or allowing the service server to process the page request of the application normally, obtaining another response page requested by the application as usual, and raising an abnormal behavior alarm; or blocking the current user of the application; or blocking the current application. This embodiment does not particularly limit this.
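The four abnormal response options in 103 can be sketched as a small policy object. The class and field names (Action, ResponsePolicy, and the request keys "user" and "app") are assumptions for illustration; how users and applications are identified is not specified in the disclosure.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REJECT = 1
    FORWARD_WITH_ALARM = 2
    BLOCK_USER = 3
    BLOCK_APPLICATION = 4


class ResponsePolicy:
    def __init__(self, action: Action):
        self.action = action
        self.alarms = []            # requests that triggered an alarm
        self.blocked_users = set()
        self.blocked_apps = set()

    def is_blocked(self, request: dict) -> bool:
        return (request.get("user") in self.blocked_users
                or request.get("app") in self.blocked_apps)

    def on_abnormal(self, request: dict) -> Optional[dict]:
        """Return the request to forward to the service server, or None to drop it."""
        if self.action is Action.FORWARD_WITH_ALARM:
            self.alarms.append(request)
            return request
        if self.action is Action.BLOCK_USER:
            self.blocked_users.add(request["user"])
        elif self.action is Action.BLOCK_APPLICATION:
            self.blocked_apps.add(request["app"])
        return None
```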
In the present disclosure, if it is determined that no abnormal behavior of automatic page information capture exists, normal response processing may be performed. For example, the service server may be allowed to process the service request of the application, or another response page requested by the application may be obtained as usual and operations 101 to 103 re-executed, which is not particularly limited in this embodiment.
The technical solution of the present disclosure is described in detail below, taking as an example a separately deployed detection device and page request and response operations carried out through interaction between a browser and a service server, as shown in FIG. 2.
201. The browser sends a first page request to the service server.
202. The detection device intercepts the first page request without any processing.
203. The detection device directly forwards the intercepted first page request to the service server.
204. The service server returns a first response page to the browser according to the first page request.
205. The detection device intercepts the first response page and inserts a detection code into it.
206. The detection device sends the first response page with the inserted detection code to the browser.
207. The browser executes the detection code to register the DOM tree change event to be monitored and the response operation of the DOM tree change event.
The response operation of the DOM tree change event is specifically: if the DOM tree change event occurs, inserting the DOM tree structure information of the first response page into a page request sent by the browser to the service server, thereby providing it to the service server.
208. When the browser detects a DOM tree change event, it inserts the DOM tree structure information of the first response page into a second page request and sends the second page request to the service server.
209. The detection device intercepts the second page request carrying the DOM tree structure information of the first response page, and determines from that information whether abnormal behavior of automatic page information capture exists.
Specifically, the detection device may intercept the second page request carrying the DOM tree structure information of the first response page, determine the changed elements of the DOM tree from that information, and then determine, from the element attributes of the changed elements in combination with the abnormal capture feature database, whether abnormal behavior of automatic page information capture exists.
In one case, in response to determining that there is an abnormal behavior in which the page information is automatically captured, an abnormal response process may be performed.
For example, the second page request carrying the DOM tree structure information of the first response page is directly rejected.
As another example, the second page request carrying the DOM tree structure information of the first response page is forwarded to the service server and an abnormal behavior alarm is raised; the service server processes the second page request normally, another response page requested by the application is obtained as usual, and operations 205 to 209 are re-executed.
As another example, all page requests of the current user of the application are blocked.
As another example, all page requests of the current application are blocked.
In another case, in response to determining that there is no abnormal behavior of automatic crawling of page information, normal response processing may be performed.
For example, the second page request inserted into the DOM tree structure information of the first response page is forwarded to the service server, which normally processes the second page request.
Or for another example, another response page requested by the application is normally acquired, and the operations 205 to 209 are executed again.
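As a non-limiting illustration, the abnormal and normal response processing described above may be sketched in Python as follows. The names `Policy` and `Gateway` and the return values "rejected" and "forwarded" are hypothetical and are introduced only for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Policy(Enum):
    REJECT = auto()             # directly reject the second page request
    FORWARD_AND_ALARM = auto()  # forward the request and raise an alarm
    BLOCK_USER = auto()         # block all requests of the current user
    BLOCK_APPLICATION = auto()  # block all requests of the current application


@dataclass
class Gateway:
    policy: Policy
    blocked_users: set = field(default_factory=set)
    blocked_apps: set = field(default_factory=set)
    alarms: list = field(default_factory=list)

    def handle(self, user: str, app: str, abnormal: bool) -> str:
        # Requests from a previously blocked user or application never pass.
        if user in self.blocked_users or app in self.blocked_apps:
            return "rejected"
        # Normal response processing: forward to the service server.
        if not abnormal:
            return "forwarded"
        # Abnormal response processing, selected by the configured policy.
        if self.policy is Policy.REJECT:
            return "rejected"
        if self.policy is Policy.FORWARD_AND_ALARM:
            self.alarms.append((user, app))
            return "forwarded"
        if self.policy is Policy.BLOCK_USER:
            self.blocked_users.add(user)
            return "rejected"
        self.blocked_apps.add(app)
        return "rejected"
```

Keeping the policy as a configuration value reflects that the examples above are alternatives; which one applies in a given deployment is not prescribed by the disclosure.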
In this embodiment, a detection code is inserted into the acquired response page requested by the application, and the response page with the detection code inserted is sent to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event. Then, in response to DOM tree structure information of the response page provided by the application, detection processing is performed on the DOM tree structure information to determine whether abnormal behavior of automatic capture of page information exists, where the DOM tree structure information is provided by the application by executing the response operation of the DOM tree change event. Abnormal response processing can thus be performed in response to determining that such abnormal behavior exists. Because the detection code capable of monitoring the DOM tree change event is inserted into the response page, whether abnormal behavior of automatic capture of page information exists can be determined from changes in the DOM tree of the response page. Abnormal capture behaviors that conventional detection based on page content cannot detect can thereby be detected effectively, which improves the reliability of detecting automatic capture of page information.
In addition, by adopting the technical solution provided by the present disclosure, the security of page information access of the application can be effectively improved.
In addition, by adopting the technical solution provided by the present disclosure, no manual operation is required, the operation is simple and not error-prone, and the security, efficiency, and reliability of page information access of the application can be further improved.
In addition, by adopting the technical scheme provided by the disclosure, the user experience can be effectively improved.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required for the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure. As shown in Fig. 3, the detection apparatus 300 for automatic capture of page information of this embodiment may include a code insertion unit 301, a structure detection unit 302, and an exception handling unit 303. The code insertion unit 301 is configured to insert a detection code into an acquired response page requested by an application, and send the response page with the detection code inserted to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event. The structure detection unit 302 is configured to, in response to DOM tree structure information of the response page provided by the application, perform detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic capture of page information exists, where the DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event. The exception handling unit 303 is configured to perform abnormal response processing in response to determining that abnormal behavior of automatic capture of page information exists.
It should be noted that part or all of the detection apparatus for automatic capture of page information in this embodiment may be an application located on the local terminal, a functional unit such as a plug-in or Software Development Kit (SDK) provided in an application located on the local terminal, a processing engine located in a network-side server, or a distributed system located on the network side, for example, a processing engine or a distributed system in a network-side detection platform for automatic capture of page information; this embodiment is not particularly limited in this respect.
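As a non-limiting illustration, the cooperation of units 301 to 303 may be sketched in Python as follows. The class name `DetectionApparatus`, the request dictionary with a `dom_info` key, and the return value "forward" are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DetectionApparatus:
    # Code insertion unit 301: response page -> page with detection code.
    insert_code: Callable[[str], str]
    # Structure detection unit 302: DOM tree structure info -> abnormal or not.
    detect: Callable[[dict], bool]
    # Exception handling unit 303: request -> result of abnormal handling.
    handle_exception: Callable[[dict], str]

    def on_response(self, page: str) -> str:
        """Called for each response page before it reaches the application."""
        return self.insert_code(page)

    def on_request(self, request: dict) -> str:
        """Called for each page request sent by the application."""
        dom_info = request.get("dom_info")
        if dom_info is not None and self.detect(dom_info):
            return self.handle_exception(request)
        return "forward"
```

Passing the three units in as callables keeps the sketch neutral about where each unit runs, consistent with the statement that the apparatus may be a local plug-in, a server-side engine, or a distributed system.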
It is to be understood that the application may be a native application (native app) installed on the local terminal, or may also be a web page program (webApp) of a browser on the local terminal, which is not limited in this embodiment.
Optionally, in a possible implementation manner of this embodiment, the structure detecting unit 302 may be specifically configured to, in response to a page request that is triggered by the application and includes DOM tree structure information of the response page, perform detection processing on the DOM tree structure information of the response page, so as to determine whether there is an abnormal behavior of automatically capturing page information; and the DOM tree structure information of the response page is inserted into the page request by the application through executing the response operation of the DOM tree change event.
Optionally, in a possible implementation manner of this embodiment, the code inserting unit 301 may be further configured to normally acquire another response page requested by the application in response to determining that there is no abnormal behavior of automatic crawling of page information.
Optionally, in a possible implementation manner of this embodiment, the DOM tree structure information of the response page may include, but is not limited to, the following information:
structure change information of a DOM tree of the response page; or
complete structure information of the DOM tree of the response page.
Optionally, in a possible implementation manner of this embodiment, the structure detection unit 302 may be specifically configured to determine the changed elements of the DOM tree according to the DOM tree structure information of the response page, and to determine, according to the element attributes of the changed elements of the DOM tree, whether abnormal behavior of automatic capture of page information exists.
It should be noted that the method in the embodiment corresponding to Fig. 1 and the method executed by the detection device in the embodiment corresponding to Fig. 2 may be implemented by the detection apparatus for automatic capture of page information provided in this embodiment. For a detailed description, reference may be made to the relevant contents of the embodiments corresponding to Fig. 1 and Fig. 2, and details are not repeated here.
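As a non-limiting illustration, the changed elements of the DOM tree may be derived from either form of DOM tree structure information as sketched below in Python. The `kind` field, the path-keyed element mapping, and the use of a previously stored snapshot for comparison are assumptions made for this sketch.

```python
from typing import Optional


def changed_elements(info: dict, previous: Optional[dict] = None) -> dict:
    """Return a mapping of element path -> attributes for changed elements.

    info["kind"] == "changes": info["elements"] already lists the changes.
    info["kind"] == "full":    info["elements"] is the complete DOM tree,
                               which is compared with a previous snapshot.
    Elements that were removed are ignored in this sketch.
    """
    if info["kind"] == "changes":
        return dict(info["elements"])
    before = previous or {}
    after = info["elements"]
    # An element counts as changed if it is new or its attributes differ.
    return {path: attrs for path, attrs in after.items() if before.get(path) != attrs}
```

Structure change information keeps the page request small, whereas complete structure information requires the detecting side to hold an earlier snapshot for comparison; the disclosure allows either form.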
In this embodiment, a detection code is inserted into the acquired response page requested by the application, and the response page with the detection code inserted is sent to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event. Then, in response to DOM tree structure information of the response page provided by the application, detection processing is performed on the DOM tree structure information to determine whether abnormal behavior of automatic capture of page information exists, where the DOM tree structure information is provided by the application by executing the response operation of the DOM tree change event. Abnormal response processing can thus be performed in response to determining that such abnormal behavior exists. Because the detection code capable of monitoring the DOM tree change event is inserted into the response page, whether abnormal behavior of automatic capture of page information exists can be determined from changes in the DOM tree of the response page. Abnormal capture behaviors that conventional detection based on page content cannot detect can thereby be detected effectively, which improves the reliability of detecting automatic capture of page information.
In addition, by adopting the technical solution provided by the present disclosure, the security of page information access of the application can be effectively improved.
In addition, by adopting the technical solution provided by the present disclosure, no manual operation is required, the operation is simple and not error-prone, and the security, efficiency, and reliability of page information access of the application can be further improved.
In addition, by adopting the technical scheme provided by the disclosure, the user experience can be effectively improved.
In the technical solution of the present disclosure, the acquisition, storage, use, and the like of the response pages involved all comply with relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 4, the electronic device 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the electronic device 400 can also be stored. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A number of components in the electronic device 400 are connected to the I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the electronic device 400 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 401 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 401 executes the methods and processes described above, such as the detection method for automatic capture of page information. For example, in some embodiments, the detection method for automatic capture of page information may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the detection method for automatic capture of page information described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured by any other suitable means (e.g., by means of firmware) to perform the detection method for automatic capture of page information.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (12)

1. A detection method for automatic capture of page information, characterized by comprising:
inserting a detection code into an acquired response page requested by an application, and sending the response page with the detection code inserted to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event;
in response to DOM tree structure information of the response page provided by the application, performing detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic capture of page information exists; wherein the DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event; and
in response to determining that abnormal behavior of automatic capture of page information exists, performing abnormal response processing.
2. The method according to claim 1, wherein the performing detection processing on the DOM tree structure information of the response page, in response to the DOM tree structure information of the response page provided by the application, to determine whether abnormal behavior of automatic capture of page information exists comprises:
in response to a page request that is triggered by the application and contains the DOM tree structure information of the response page, performing detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic capture of page information exists; wherein the DOM tree structure information of the response page is inserted into the page request by the application by executing the response operation of the DOM tree change event.
3. The method of claim 1, further comprising:
in response to determining that no abnormal behavior of automatic capture of page information exists, normally acquiring another response page requested by the application.
4. The method of claim 1, wherein the DOM tree structure information of the response page comprises:
structure change information of a DOM tree of the response page; or
complete structure information of the DOM tree of the response page.
5. The method according to any one of claims 1 to 4, wherein the detecting the DOM tree structure information of the response page comprises:
determining changed elements of the DOM tree according to the DOM tree structure information of the response page;
and determining whether abnormal behaviors of automatic capturing of page information exist or not according to the element attributes of the changed elements of the DOM tree.
6. A detection apparatus for automatic capture of page information, characterized by comprising:
a code insertion unit, configured to insert a detection code into an acquired response page requested by an application, and send the response page with the detection code inserted to the application, so that the application executes the detection code to register a DOM tree change event to be monitored and a response operation of the DOM tree change event;
a structure detection unit, configured to, in response to DOM tree structure information of the response page provided by the application, perform detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic capture of page information exists; wherein the DOM tree structure information of the response page is provided by the application by executing the response operation of the DOM tree change event; and
an exception handling unit, configured to perform abnormal response processing in response to determining that abnormal behavior of automatic capture of page information exists.
7. The apparatus according to claim 6, wherein the structure detection unit is specifically configured to:
in response to a page request that is triggered by the application and contains the DOM tree structure information of the response page, perform detection processing on the DOM tree structure information of the response page to determine whether abnormal behavior of automatic capture of page information exists; wherein the DOM tree structure information of the response page is inserted into the page request by the application by executing the response operation of the DOM tree change event.
8. The apparatus of claim 6, wherein the code insertion unit is further configured to:
in response to determining that no abnormal behavior of automatic capture of page information exists, normally acquire another response page requested by the application.
9. The apparatus of claim 6, wherein the DOM tree structure information of the response page comprises:
structure change information of a DOM tree of the response page; or
complete structure information of the DOM tree of the response page.
10. The apparatus according to any one of claims 6 to 9, wherein the structure detection unit is specifically configured to:
determine the changed elements of the DOM tree according to the DOM tree structure information of the response page; and
determine, according to the element attributes of the changed elements of the DOM tree, whether abnormal behavior of automatic capture of page information exists.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202111460324.XA 2021-12-02 2021-12-02 Detection method and device for automatically capturing page information, electronic equipment and readable storage medium Pending CN114329149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111460324.XA CN114329149A (en) 2021-12-02 2021-12-02 Detection method and device for automatically capturing page information, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111460324.XA CN114329149A (en) 2021-12-02 2021-12-02 Detection method and device for automatically capturing page information, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114329149A true CN114329149A (en) 2022-04-12

Family

ID=81048713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111460324.XA Pending CN114329149A (en) 2021-12-02 2021-12-02 Detection method and device for automatically capturing page information, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114329149A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579347A (en) * 2022-04-24 2022-06-03 浙江口碑网络技术有限公司 Page abnormity detection method and device, computer equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination