CN111143650A - Method, device, medium and electronic equipment for acquiring page data - Google Patents


Publication number
CN111143650A
Authority
CN
China
Prior art keywords
page
response
login
data
request
Prior art date
Legal status
Granted
Application number
CN201911295167.4A
Other languages
Chinese (zh)
Other versions
CN111143650B (en)
Inventor
王政操
张霞
Current Assignee
Neusoft Corp
Original Assignee
Neusoft Corp
Application filed by Neusoft Corp
Priority to CN201911295167.4A
Priority claimed from CN201911295167.4A
Publication of CN111143650A
Application granted
Publication of CN111143650B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The disclosure relates to a method, an apparatus, a medium, and an electronic device for acquiring page data. The method comprises: receiving an original request of a page operation; acquiring characteristic parameters of the original request and storing them in a target structure; generating a data acquisition request according to the target structure, wherein the data acquisition request is used to request the page data of a response page of the original request; sending the data acquisition request to a server; and receiving the page data of the response page sent by the server. Because the data acquisition request is generated from the characteristic parameters of the original request and sent directly to the server, the page data can be acquired without simulating the request-response operation of a browser, which reduces the complexity of acquiring page data and improves its efficiency and accuracy.

Description

Method, device, medium and electronic equipment for acquiring page data
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a medium, and an electronic device for acquiring page data.
Background
With the development of computer technology, the demand for network data is increasing. In some scenarios, a user may want to obtain the data of a certain page, but if the page does not provide a data export function, it may be difficult for the user to obtain the data. To solve this problem, the prior art usually obtains the page data of such pages with crawler technology.
However, in the prior art, web crawlers are usually implemented with a headless browser that simulates page operations in order to acquire the data in a page. Because user operations are simulated, every step of the simulation must be consistent with the user operation and free of error; if any intermediate step goes wrong, the whole process fails and the page data cannot be acquired.
Disclosure of Invention
The invention aims to provide a method, a device, a medium and electronic equipment for conveniently and accurately acquiring page data.
In order to achieve the above object, according to a first aspect of the present disclosure, there is provided a method of acquiring page data, the method including:
receiving an original request of page operation;
acquiring the characteristic parameters of the original request, and storing the characteristic parameters into a target structure body;
generating a data acquisition request according to the target structure body, wherein the data acquisition request is used for requesting to acquire page data of a response page of the original request;
sending the data acquisition request to a server;
and receiving the page data of the response page sent by the server.
Optionally, before the step of sending the data acquisition request to the server, the method further includes:
if it is determined that a login operation is required before the original request is responded to, obtaining a response token corresponding to the login operation;
the generating a data acquisition request according to the target structure includes:
and generating a data acquisition request according to the target structure body and the response token.
Optionally, the obtaining of the response token corresponding to the login operation includes:
and if the login script corresponding to the original request exists, playing back the login script to obtain a response token after the login information used by the login script passes authentication, wherein the login script is generated by prerecording based on a login plug-in.
Optionally, in the process of playing back the login script, if the current operation is a verification code input operation, obtaining a verification code image in the current page, and sending the verification code image to a verification code identification module, so that the verification code identification module identifies the verification code image to obtain verification code information; and executing the verification code input operation according to the received verification code information obtained by the verification code identification module.
Optionally, the obtaining of the response token corresponding to the login operation includes:
if the login operation is required before the original request is responded, and a login script corresponding to the original request does not exist, outputting prompt information to prompt a user to perform the login operation;
detecting a login state through a login plug-in;
and receiving a response token sent by the login plug-in when the login plug-in detects that the login is successful.
Optionally, after the step of receiving the page data of the response page sent by the server, the method further includes:
determining a text similarity parameter corresponding to the first page text information of the response page and the second page text information of the response sample corresponding to the original request;
determining whether the response page is matched with the response sample or not according to the text similarity parameter;
and if the response page is determined to be matched with the response sample, storing the page data of the response page.
Optionally, the storing the page data of the response page includes:
and storing the page data of the response page into a structured file so as to store the page data into a database based on the structured file.
According to a second aspect of the present disclosure, there is provided an apparatus for acquiring page data, the apparatus comprising:
the first receiving module is used for receiving an original request of page operation;
the first storage module is used for acquiring the characteristic parameters of the original request and storing the characteristic parameters into a target structure body;
the generating module is used for generating a data acquisition request according to the target structure body, wherein the data acquisition request is used for requesting to acquire page data of a response page of the original request;
the sending module is used for sending the data acquisition request to a server;
and the second receiving module is used for receiving the page data of the response page sent by the server.
Optionally, the apparatus further comprises:
an obtaining module, configured to, before the sending module sends the data obtaining request to a server, obtain a response token corresponding to a login operation if it is determined that the login operation is required before the original request is responded;
the generation module is configured to:
and generating a data acquisition request according to the target structure body and the response token.
Optionally, the obtaining module is configured to:
and if the login script corresponding to the original request exists, playing back the login script to obtain a response token after the login information used by the login script passes authentication, wherein the login script is generated by prerecording based on a login plug-in.
Optionally, in the process of playing back the login script, if the current operation is a verification code input operation, obtaining a verification code image in the current page, and sending the verification code image to a verification code identification module, so that the verification code identification module identifies the verification code image to obtain verification code information; and executing the verification code input operation according to the received verification code information obtained by the verification code identification module.
Optionally, the obtaining module includes:
the output sub-module is used for outputting prompt information to prompt the user to perform a login operation in the case where a login operation is required before responding to the original request and no login script corresponding to the original request exists;
the detection submodule is used for detecting the login state through the login plug-in;
and the receiving submodule is used for receiving a response token sent by the login plugin under the condition of detecting that the login is successful.
Optionally, the apparatus further comprises:
a first determining module, configured to determine, after the second receiving module receives the page data of the response page sent by the server, a text similarity parameter corresponding to first page text information of the response page and second page text information of a response sample corresponding to the original request;
the second determining module is used for determining whether the response page is matched with the response sample or not according to the text similarity parameter;
and the second storage module is used for storing the page data of the response page under the condition that the response page is determined to be matched with the response sample.
Optionally, the second storage module is configured to:
and storing the page data of the response page into a structured file so as to store the page data into a database based on the structured file.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods of the first aspect described above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any of the first aspects above.
In the above technical solution, an original request of a page operation is received; the characteristic parameters of the original request are acquired and stored in a target structure; a data acquisition request is generated according to the target structure; and the data acquisition request is sent to a server and the page data of the response page sent by the server is received. Because the data acquisition request is generated from the characteristic parameters of the original request and the data is requested from the server on that basis, the page data can be acquired without simulating the request-response operation of a browser. On the one hand, this reduces the complexity of acquiring page data, simplifies the way page data is acquired, and improves the reliability and accuracy of acquisition. On the other hand, it broadens the range of applications of the method for acquiring page data.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart of a method of obtaining page data provided in accordance with one embodiment of the present disclosure;
FIG. 2 is a block diagram of an apparatus for acquiring page data provided in accordance with one embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a flowchart illustrating a method for acquiring page data according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
In S11, an original request of a page operation is received, where the page operation may be a page query operation, a view operation, or the like, and the original request is the request initiated to perform the page operation.
In S12, the characteristic parameters of the original request are acquired, and the characteristic parameters are stored in the target structure.
Optionally, the characteristic parameters may include a path parameter corresponding to the original request, which may be obtained from the URL (Uniform Resource Locator) of the original request. The original request may be used to request an entire page, in which case the path parameter is the URL of the original request; it may also be used to request part of the data in a page. For example, if the original request is used to request the text portion of a page, the path parameter corresponding to the original request is the URL corresponding to that text portion.
Optionally, the characteristic parameters may further include a query parameter, which is a parameter that restricts the data displayed in the page. For example, when querying a system for the list of students whose score is higher than 80, the query parameter is that the score is higher than 80. Because the systems corresponding to different original requests differ, in one embodiment the query parameter may be obtained from the URL of the original request; in another embodiment it may be obtained from the request body of the original request, which is not limited by the present disclosure.
After obtaining the above-mentioned characteristic parameters of the original request, the characteristic parameters may be stored in the target structure. Illustratively, the target structure may be a JSON file, whose format is as follows:
URL: http://localhost:8080/api/{user-defined path A}/{user-defined path B}
Path parameters: ?{user-defined parameter a}={parameter value}&{user-defined parameter b}={parameter value}
Query parameters:
[Figures BDA0002320317830000061 and BDA0002320317830000071: query-parameter portion of the target structure, reproduced as images in the original publication]
Therefore, after the characteristic parameters are extracted, they can be stored in the target structure as parameter values. Because a standardized representation is used for original requests of different forms, original requests expressed in different ways can be represented and managed uniformly, which improves the compatibility of the method and broadens its application scenarios and scope.
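As a rough illustration of step S12, the sketch below extracts path and query parameters from a URL (and, optionally, from request-body parameters) and stores them in a JSON-serializable structure. The function name `build_target_structure`, the field names, and the sample URL are assumptions made for this example; the disclosure does not specify them.

```python
import json
from urllib.parse import urlsplit, parse_qsl


def build_target_structure(url, body_params=None):
    """Collect the characteristic parameters of an original request.

    The field names ("url", "path_parameters", "query_parameters") are
    illustrative assumptions, not the layout used in the disclosure.
    """
    parts = urlsplit(url)
    # Path parameters: the non-empty segments of the URL path.
    path_parameters = [segment for segment in parts.path.split("/") if segment]
    # Query parameters may come from the URL or from the request body.
    query_parameters = dict(parse_qsl(parts.query))
    query_parameters.update(body_params or {})
    return {
        "url": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "path_parameters": path_parameters,
        "query_parameters": query_parameters,
    }


structure = build_target_structure(
    "http://localhost:8080/api/students/list?min_score=80",
    body_params={"page": "1"},
)
print(json.dumps(structure, indent=2))
```

Keeping both URL-derived and body-derived query parameters in one mapping is one way to give differently shaped original requests the same representation.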
In S13, a data acquisition request for requesting acquisition of page data of a response page of the original request is generated according to the target structure.
In S14, a data acquisition request is sent to the server.
That is, in the present application, when the page data of the response page is acquired according to the original request, the data acquisition request is generated directly from the characteristic parameters in the target structure, rather than by simulating browser operations according to the original request as in the prior art. The present application is therefore applicable not only to systems with a B/S (Browser/Server) architecture but also to C/S (Client/Server) architectures and apps.
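Steps S13 and S14 can be pictured with the following minimal sketch, which builds an HTTP request object directly from a target structure instead of driving a browser. The structure layout and the function name `build_data_acquisition_request` are hypothetical, and a plain GET request is assumed; the request is constructed but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_data_acquisition_request(structure):
    """Turn a target structure into an HTTP request object (not yet sent)."""
    url = structure["url"]
    if structure.get("query_parameters"):
        # Re-attach the stored query parameters to the stored URL.
        url = f"{url}?{urlencode(structure['query_parameters'])}"
    # Assumption: the page data is served by a plain GET endpoint.
    return Request(url, method="GET")


structure = {
    "url": "http://localhost:8080/api/students/list",
    "query_parameters": {"min_score": "80"},
}
request = build_data_acquisition_request(structure)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is omitted here so that the sketch runs without a server.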
In S15, page data of the response page sent by the server is received.
In the above technical solution, an original request of a page operation is received; the characteristic parameters of the original request are acquired and stored in a target structure; a data acquisition request is generated according to the target structure; and the data acquisition request is sent to a server and the page data of the response page sent by the server is received. Because the data acquisition request is generated from the characteristic parameters of the original request and the data is requested from the server on that basis, the page data can be acquired without simulating the request-response operation of a browser. On the one hand, this reduces the complexity of acquiring page data, simplifies the way page data is acquired, and improves the reliability and accuracy of acquisition. On the other hand, it broadens the range of applications of the method for acquiring page data.
The following describes in detail a method of determining a data acquisition request corresponding to an original request.
For example, a recording tool may be used to record in advance a process from when a user initiates an operation request to when a page corresponding to the operation request is obtained, so that a description file of a recording result may be obtained.
And then, the description file of the recording result can be analyzed, and the analyzed result is converted into a predefined standard format. Since the formats of the description files of the recording results obtained by different recording tools may be different, the adaptation to various recording tools can be realized by converting the analyzed result into a standard format. It should be noted that, in the prior art, in the process of initiating one operation request to obtain data of a response page corresponding to the operation request, multiple request interactions may be performed with a background server, for example, a request for obtaining a style file of the response page is initiated, a request for obtaining a JS file of the response page is initiated, and a request for obtaining page data of the response page is initiated. In the above operation request, the request for acquiring the page data of the response page is a request for acquiring data in the process of responding to the operation request, that is, a data acquisition request described in the present disclosure.
All requests made in the process of responding to the operation request are contained in the description file in the standard format (for convenience, referred to below as the formatted description file). Illustratively, the formatted description file includes seven requests, a through g. When a data acquisition request is determined from the formatted description file, keyword information may first be extracted from the page data displayed in the response page. Based on the keyword information, a request containing the keyword information is searched for in the formatted description file; for example, request g is found and taken as a query request. The parameters carried in request g can then be used as new keyword information to search the formatted description file for requests containing the new keyword information. For example, if the parameters carried in request g are cookie A and ciphertext B, the request corresponding to cookie A and the request corresponding to ciphertext B may each be searched for in the formatted description file. When multiple requests are found, the one with the earliest request time is used as the new query request. If no new request is found, or the request found is a login request, the analysis ends. The determined query requests are then sorted by request time from earliest to latest, and the sorted result is used as the data acquisition request; that is, the data acquisition request may comprise multiple sub-requests with a sequential dependency.
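The backward search described above can be sketched as follows. The `RecordedRequest` type, its fields, and the sample data are hypothetical; in particular, modelling "a request containing the keyword" as a request whose response contains that value, and leaving the login request out of the result, are assumptions of this example rather than statements of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedRequest:
    name: str
    time: int                                          # request timestamp
    params: set = field(default_factory=set)           # values the request carries
    response_values: set = field(default_factory=set)  # values found in its response
    is_login: bool = False


def trace_data_acquisition_requests(recorded, page_keywords):
    """Walk backwards from the displayed page data to the requests behind it."""
    chain = []
    keywords = set(page_keywords)
    while keywords:
        candidates = [
            r for r in recorded
            if r not in chain and r.response_values & keywords
        ]
        if not candidates:
            break  # no new request found: analysis ends
        # When several requests match, the earliest one becomes the new query request.
        current = min(candidates, key=lambda r: r.time)
        if current.is_login:
            break  # a login request ends the analysis (excluded here by assumption)
        chain.append(current)
        keywords = set(current.params)
    # The sub-requests are replayed in the order in which they were recorded.
    return sorted(chain, key=lambda r: r.time)


recorded = [
    RecordedRequest("a", 1, response_values={"cookie-A"}, is_login=True),
    RecordedRequest("b", 2, response_values={"body { margin: 0 }"}),  # style file
    RecordedRequest("c", 3, params={"cookie-A"}, response_values={"ciphertext-B"}),
    RecordedRequest("g", 7, params={"ciphertext-B"}, response_values={"Alice", "92"}),
]
sub_requests = trace_data_acquisition_requests(recorded, {"Alice"})
names = [r.name for r in sub_requests]
print(names)
```

In this sample, request g supplies the page data, request c supplies the ciphertext that g carries, and the style-file request b is never selected.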
The recording tool may be any existing recording tool, and the parsing of the description file is prior art, which is not limited in this disclosure.
Therefore, by the technical scheme, the data acquisition request actually used for acquiring the page data in one operation request can be determined. Therefore, when the page data is acquired, the corresponding data acquisition request can be generated according to the original request, so that unnecessary style files or JS files can be effectively avoided from being acquired, and the efficiency of acquiring the page data is improved.
Optionally, before the step of sending the data obtaining request to the server, the method further includes:
if it is determined that a login operation is required before the original request is responded to, obtaining a response token corresponding to the login operation. The response token is returned by the server after the user's login information passes authentication, and represents the valid login state of the user.
In one embodiment, whether a login operation is required before responding to the original request may be determined by an identifier associated with the original request. For example, if a login operation is required, the associated identifier (e.g., identifier 1) indicates that a login operation is required; if no login operation is required, the associated identifier (e.g., identifier 2) indicates that no login operation is required. The identifier may be associated with the corresponding original request when the data acquisition request is determined. Therefore, when the original request is acquired, whether a login operation is required can be determined directly from the associated identifier.
The generating a data acquisition request according to the target structure includes:
and generating a data acquisition request according to the target structure body and the response token.
In the above technical solution, before the data acquisition request is sent to the server, it is determined whether a login operation is required before responding to the original request, and thereby whether the current user needs to be in a valid login state. If a login operation is required, the response token corresponding to the login operation is obtained to ensure that the user is in a valid login state. The response token can then be carried in the data acquisition request when it is generated, which ensures that the correct response page is returned, improves the efficiency and accuracy of page data acquisition, and prevents the user's login state from affecting page data acquisition.
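A minimal sketch of this check is shown below: the associated identifier decides whether a token is fetched, and the token is then attached to the request headers. The identifier values, the `build_headers` helper, and the use of an `Authorization: Bearer` header are all assumptions for illustration; a real system might expect the token in a cookie or a custom header.

```python
LOGIN_REQUIRED = 1      # hypothetical value of the associated identifier
LOGIN_NOT_REQUIRED = 2  # hypothetical value of the associated identifier


def build_headers(login_identifier, obtain_token):
    """Return request headers, fetching a token only when login is required.

    `obtain_token` is a caller-supplied callable standing in for script
    playback or a manual-login flow.
    """
    headers = {}
    if login_identifier == LOGIN_REQUIRED:
        token = obtain_token()
        # Assumption: the server accepts the token in an Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return headers


headers_with_login = build_headers(LOGIN_REQUIRED, lambda: "example-token")
headers_without_login = build_headers(LOGIN_NOT_REQUIRED, lambda: "unused")
print(headers_with_login, headers_without_login)
```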
Optionally, an exemplary implementation of the obtaining of the response token corresponding to the login operation is as follows:
and if the login script corresponding to the original request exists, playing back the login script to obtain a response token after the login information used by the login script passes authentication, wherein the login script is generated by prerecording based on a login plug-in.
For example, for a simpler login process, such as one that requires no verification code or one that requires only an image verification code, the user may record a login script through the login plug-in. The login information entered by the user during login, such as the account and password, is recorded in the login script, and the login script is associated with the original request. Therefore, when it is determined that a login operation is required before responding to the original request, the login script corresponding to the original request can be obtained and played back to log in, so that the response token is obtained.
Optionally, in the process of playing back the login script, if the current operation is a verification code input operation, acquiring a verification code image in the current page, and sending the verification code image to a verification code identification module, so that the verification code image is identified by the verification code identification module to acquire verification code information; and executing the verification code input operation according to the received verification code information obtained by the verification code identification module.
In this embodiment, in a login scenario that requires an image verification code, the image verification code changes dynamically each time the user logs in. Therefore, during playback of the login script, if the current operation is a verification code input operation, the verification code information shown in the image verification code currently displayed in the page must be entered. The verification code image in the current page can therefore be acquired and recognized by the verification code identification module, and the recognized verification code information is entered to complete the login operation.
By the technical scheme, automatic login operation can be realized through the pre-recorded login script, so that participation of a user can be effectively reduced, and the workload of the user is reduced. Meanwhile, the login operation can be completed according to the login script and the dynamically-changed image verification code in the page by acquiring the verification code image and identifying the verification code information, so that the accuracy and convenience of the login operation are improved, the user operation is further reduced, and the user experience is improved.
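The playback logic with a dynamically recognized verification code might be organized as in the sketch below. Everything here is hypothetical: the step format, the `FakePage` stand-in for a real page, and the toy recognizer exist only to make the control flow runnable and do not reflect any particular recording tool or recognition module.

```python
class FakePage:
    """Stand-in for a real page; used only to make the sketch runnable."""

    def __init__(self, expected_code):
        self.expected_code = expected_code
        self.inputs = {}

    def verification_code_image(self):
        # A real page would return image bytes; the code is embedded as text here.
        return f"<image:{self.expected_code}>".encode()

    def perform(self, action, target, value):
        if action.startswith("input"):
            self.inputs[target] = value

    def response_token(self):
        ok = self.inputs.get("code") == self.expected_code
        return "example-token" if ok else None


def play_back_login_script(steps, page, recognize):
    """Replay recorded steps; verification-code steps are filled in at playback time."""
    for step in steps:
        if step["action"] == "input_verification_code":
            # The recorded value would be stale, so read the code currently shown.
            value = recognize(page.verification_code_image())
        else:
            value = step.get("value")
        page.perform(step["action"], step.get("target"), value)
    return page.response_token()


steps = [
    {"action": "input_text", "target": "username", "value": "demo"},
    {"action": "input_text", "target": "password", "value": "secret"},
    {"action": "input_verification_code", "target": "code"},
    {"action": "click", "target": "submit"},
]
# Toy recognizer: strips the "<image:" prefix and ">" suffix used by FakePage.
token = play_back_login_script(steps, FakePage("7K2P"), lambda image: image.decode()[7:-1])
print(token)
```

Passing the recognizer in as a callable keeps the playback loop independent of whichever recognition module is actually used.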
Optionally, another implementation manner of obtaining the response token corresponding to the login operation is as follows:
if the login operation is required before the original request is responded, and a login script corresponding to the original request does not exist, outputting prompt information to prompt a user to perform the login operation;
The login state is detected through the login plug-in, where the login state may include login success and login failure. When the user logs in successfully, the server returns a response token corresponding to the user's login information; at this point the login plug-in can detect that the login succeeded and can acquire the response token.
And receiving a response token sent by the login plug-in when the login plug-in detects that the login is successful.
In this embodiment, in the case where there is no login script corresponding to the original request, the response token may be obtained by prompting the user to perform a login operation. Specifically, the user can log in a browser or a client, and the login operation of the user can be monitored through a login plug-in, wherein the login plug-in is executed in a hidden mode, and the operation is transparent to the user. Therefore, after the user successfully logs in, the response token when the login is successful can be obtained through the login plug-in. The storage location of the response token may be different for different systems, e.g., the response token may be stored in a cookie, local storage, or session storage. Thus, data in the cookie, the local storage, and the session storage can be obtained through the login plug-in, thereby obtaining the response token.
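Because the token location differs between systems, a plug-in could check each storage area in turn, as in the following sketch. The storage snapshot is modelled as plain dictionaries, and the candidate key names are guesses for illustration only; none of them come from the disclosure.

```python
def find_response_token(storages, candidate_keys=("token", "access_token", "JSESSIONID")):
    """Look for a token in the cookie, local storage and session storage, in that order."""
    for storage_name in ("cookie", "local_storage", "session_storage"):
        storage = storages.get(storage_name, {})
        for key in candidate_keys:
            if key in storage:
                return storage_name, storage[key]
    return None  # no token found in any storage area


storages = {
    "cookie": {"theme": "dark"},
    "local_storage": {"access_token": "example-token"},
    "session_storage": {},
}
found = find_response_token(storages)
print(found)
```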
In the above technical solution, if a login operation is required before responding to the original request and no login script corresponding to the original request exists, the user is prompted to log in. The user only needs to log in through the browser or client, without operating the login plug-in, for the response token to be obtained. This avoids repeated login operations by the user, ensures the accuracy of the subsequently generated data acquisition request, and improves the accuracy of the acquired page data.
To ensure the accuracy of the acquired page data, the present disclosure further provides the following embodiments.
Optionally, after the step of receiving the page data of the response page sent by the server, the method further includes:
determining a text similarity parameter between the first page text information of the response page and the second page text information of the response sample corresponding to the original request. The response sample corresponding to the original request is a preset page that responds to the original request; it may be the response page used when analyzing and determining the data acquisition request corresponding to the original request, or a page set by a developer based on that response page, which is not limited by the present disclosure.
For example, the first page text information of the response page may be a text vector corresponding to the response page, and the second page text information of the response sample corresponding to the original request may be a text vector corresponding to the response sample, so that the text similarity parameter may be determined by the text vector corresponding to the response page and the text vector corresponding to the response sample. For example, the text similarity parameter may be a distance or cosine similarity between a text vector corresponding to the response page and a text vector corresponding to the response sample. The way of calculating the distance or cosine similarity between vectors is the prior art, and is not described herein again.
And determining whether the response page is matched with the response sample or not according to the text similarity parameter.
As an example, if the text similarity parameter is a distance between a text vector corresponding to the response page and a text vector corresponding to the response sample, it is determined that the response page matches the response sample when the distance is smaller than a preset distance threshold.
As an example, if the text similarity parameter is the cosine similarity between the text vector corresponding to the response page and the text vector corresponding to the response sample, it is determined that the response page matches the response sample when the cosine similarity is greater than a preset cosine similarity threshold.
The distance threshold and the cosine similarity threshold may be set according to an actual usage scenario, which is not limited in the present disclosure.
And if the response page is determined to be matched with the response sample, storing the page data of the response page. That is, when the response page is determined to be an accurate page, its page data is stored.
In the above technical solution, whether the page data of the response page is the data that the original request would obtain is determined by checking whether the response page matches the response sample. The page data of the response page is stored only when the response page is determined to match the response sample, so that checking the response page avoids the problem of the acquired page data being inconsistent with the data the original request would obtain, and the accuracy and effectiveness of acquiring page data are improved. At the same time, incorrect page data is effectively prevented from being stored, which avoids wasting storage resources.
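The cosine-similarity check can be illustrated with a minimal sketch. The disclosure does not specify how text vectors are built, so the bag-of-words vectors, the function names, and the threshold value of 0.8 used here are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def text_vector(text: str) -> Counter:
    """Build a simple bag-of-words vector (an assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page cannot match anything
    return dot / (norm_a * norm_b)


def matches_sample(page_text: str, sample_text: str, threshold: float = 0.8) -> bool:
    """Return True when the response page is similar enough to the sample.

    The default threshold is an illustrative value, not one from the disclosure.
    """
    similarity = cosine_similarity(text_vector(page_text), text_vector(sample_text))
    return similarity > threshold
```

A page that shares most of its wording with the response sample passes the check, while an unrelated page, such as a login prompt returned in place of the expected data, shares few terms with the sample and is rejected.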
Optionally, the storing the page data of the response page includes:
and storing the page data of the response page into a structured file so as to store the page data into a database based on the structured file.
Illustratively, the structured file may be an HTML, JSON, or XML file. The page data of the response page is stored in the structured file, so that the page data can then be stored in the database through ETL technology based on the structured file. ETL (Extract-Transform-Load) refers to the process of extracting data from a source, transforming it, and loading it into a destination.
In the above technical solution, the page data is stored in a structured file, so that the data in the structured file can be conveniently stored in a database. The data is thus stored safely and persistently and is easy for the user to view, which improves the user experience.
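A minimal sketch of this two-stage storage, assuming a JSON structured file and an SQLite database, is shown below. The record layout (`url`, `fields`), the table name `page_data`, and the function names are hypothetical, since the disclosure does not fix a schema or a database.

```python
import json
import sqlite3
from pathlib import Path


def save_structured(page_data: dict, path: Path) -> None:
    """Write the page data of a response page to a JSON structured file."""
    path.write_text(json.dumps(page_data, ensure_ascii=False), encoding="utf-8")


def load_into_db(path: Path, conn: sqlite3.Connection) -> None:
    """Extract the record from the file, transform it, and load it into the database."""
    # Extract: read the structured file back.
    record = json.loads(path.read_text(encoding="utf-8"))
    # Transform: flatten the record into a row (assumed schema).
    row = (record["url"], json.dumps(record["fields"], ensure_ascii=False))
    # Load: insert the row into the destination table.
    conn.execute("CREATE TABLE IF NOT EXISTS page_data (url TEXT, fields TEXT)")
    conn.execute("INSERT INTO page_data (url, fields) VALUES (?, ?)", row)
    conn.commit()
```

Keeping the file write and the database load as separate steps means the structured file can serve as an intermediate artifact that is loaded later or by a different tool.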
The present disclosure also provides an apparatus for acquiring page data, as shown in fig. 2, the apparatus 10 includes:
a first receiving module 100, configured to receive an original request for a page operation;
a first storage module 200, configured to obtain a feature parameter of the original request, and store the feature parameter in a target structure;
a generating module 300, configured to generate a data obtaining request according to the target structure, where the data obtaining request is used to request to obtain page data of a response page of the original request;
a sending module 400, configured to send the data obtaining request to a server;
a second receiving module 500, configured to receive the page data of the response page sent by the server.
Optionally, the apparatus further comprises:
an obtaining module, configured to, before the sending module sends the data obtaining request to a server, obtain a response token corresponding to a login operation if it is determined that the login operation is required before the original request is responded;
the generation module is configured to:
and generating a data acquisition request according to the target structure body and the response token.
Optionally, the obtaining module is configured to:
and if the login script corresponding to the original request exists, playing back the login script to obtain a response token after the login information used by the login script passes authentication, wherein the login script is generated by prerecording based on a login plug-in.
Optionally, in the process of playing back the login script, if the current operation is a verification code input operation, obtaining a verification code image in the current page, and sending the verification code image to a verification code identification module, so that the verification code identification module identifies the verification code image to obtain verification code information; and executing the verification code input operation according to the received verification code information obtained by the verification code identification module.
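The verification code handling during playback might look like the following sketch. The recorded script format (`Step`), the `get_captcha_image` callable, and the `recognize` callable standing in for the verification code identification module are all hypothetical; the disclosure does not define these interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One recorded operation of a login script (assumed format)."""
    action: str                  # "fill" or "captcha"
    target: str                  # name of the form field the step writes to
    value: Optional[str] = None  # recorded value; unused for captcha steps


def replay_login_script(
    steps: List[Step],
    get_captcha_image: Callable[[], bytes],
    recognize: Callable[[bytes], str],
) -> Dict[str, str]:
    """Replay the recorded steps and return the filled-in login form."""
    form: Dict[str, str] = {}
    for step in steps:
        if step.action == "captcha":
            # Fetch the verification code image from the current page and
            # hand it to the recognition module, then enter the result.
            image = get_captcha_image()
            form[step.target] = recognize(image)
        else:
            # Ordinary input operation: replay the recorded value.
            form[step.target] = step.value or ""
    return form
```

The verification code is resolved at playback time rather than recorded, because its value changes on every login attempt.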
Optionally, the obtaining module includes:
the output sub-module is used for outputting prompt information to prompt a user to carry out login operation before responding to the original request and under the condition that a login script corresponding to the original request does not exist;
the detection submodule is used for detecting the login state through the login plug-in;
and the receiving submodule is used for receiving a response token sent by the login plugin under the condition of detecting that the login is successful.
Optionally, the apparatus further comprises:
a first determining module, configured to determine, after the second receiving module receives the page data of the response page sent by the server, a text similarity parameter corresponding to first page text information of the response page and second page text information of a response sample corresponding to the original request;
the second determining module is used for determining whether the response page is matched with the response sample or not according to the text similarity parameter;
and the second storage module is used for storing the page data of the response page under the condition that the response page is determined to be matched with the response sample.
Optionally, the second storage module is configured to:
and storing the page data of the response page into a structured file so as to store the page data into a database based on the structured file.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
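As a rough illustration of how the first storage module and the generating module could cooperate, the sketch below stores feature parameters in a target structure and builds a data acquisition request from it. The fields chosen as feature parameters (URL, method, headers, query parameters), the class and function names, and the use of an `Authorization: Bearer` header to carry the response token are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request


@dataclass
class TargetStructure:
    """Holds the feature parameters of an original request (assumed fields)."""
    url: str
    method: str = "GET"
    headers: Dict[str, str] = field(default_factory=dict)
    params: Dict[str, str] = field(default_factory=dict)


def store_feature_parameters(original_request: dict) -> TargetStructure:
    """Extract feature parameters from the original request into the structure."""
    return TargetStructure(
        url=original_request["url"],
        method=original_request.get("method", "GET").upper(),
        headers=dict(original_request.get("headers", {})),
        params=dict(original_request.get("params", {})),
    )


def generate_data_request(target: TargetStructure, token: Optional[str] = None) -> Request:
    """Build the data acquisition request, attaching the token when present."""
    headers = dict(target.headers)
    if token:
        # Carrying the token as a Bearer header is an illustrative assumption.
        headers["Authorization"] = f"Bearer {token}"
    query = urlencode(target.params)
    if target.method == "GET":
        url = f"{target.url}?{query}" if query else target.url
        return Request(url, headers=headers, method="GET")
    return Request(target.url, data=query.encode("utf-8"), headers=headers, method=target.method)
```

The returned `Request` object is only constructed here; sending it to the server and receiving the page data would correspond to the sending module and the second receiving module.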
The present disclosure also provides a computer interface, which is used to implement the steps of the above method for acquiring page data. Packaging the method for acquiring page data as a computer interface makes it convenient for a third party to call the method, which improves the convenience and compatibility of acquiring page data.
Fig. 3 is a block diagram illustrating an electronic device 700 according to an example embodiment. As shown in fig. 3, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the above-mentioned method for acquiring page data. The memory 702 is used to store various types of data to support operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, for example contact data, transmitted and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described method of acquiring page data.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described method of acquiring page data is also provided. For example, the computer readable storage medium may be the memory 702 comprising program instructions executable by the processor 701 of the electronic device 700 to perform the method of obtaining page data described above.
Fig. 4 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 4, an electronic device 1900 includes a processor 1922, which may be one or more in number, and a memory 1932 for storing computer programs executable by the processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the above-described method of acquiring page data.
Additionally, the electronic device 1900 may also include a power component 1926 and a communication component 1950. The power component 1926 may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 1900. In addition, the electronic device 1900 may also include input/output (I/O) interfaces 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux, etc.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described method of acquiring page data is also provided. For example, the computer readable storage medium may be the memory 1932 that includes program instructions executable by the processor 1922 of the electronic device 1900 to perform the method for obtaining page data described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method of acquiring page data when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure, as long as they do not depart from the spirit of the present disclosure.

Claims (10)

1. A method for acquiring page data, the method comprising:
receiving an original request of page operation;
acquiring the characteristic parameters of the original request, and storing the characteristic parameters into a target structure body;
generating a data acquisition request according to the target structure body, wherein the data acquisition request is used for requesting to acquire page data of a response page of the original request;
sending the data acquisition request to a server;
and receiving the page data of the response page sent by the server.
2. The method of claim 1, wherein prior to the step of sending the data acquisition request to the server, the method further comprises:
if it is determined that a login operation is required before the original request is responded to, obtaining a response token corresponding to the login operation;
the generating a data acquisition request according to the target structure includes:
and generating a data acquisition request according to the target structure body and the response token.
3. The method of claim 2, wherein obtaining the response token corresponding to the login operation comprises:
and if the login script corresponding to the original request exists, playing back the login script to obtain a response token after the login information used by the login script passes authentication, wherein the login script is generated by prerecording based on a login plug-in.
4. The method according to claim 3, wherein in the process of playing back the login script, if the current operation is a verification code input operation, obtaining a verification code image in the current page, and sending the verification code image to a verification code identification module so that the verification code identification module identifies the verification code image to obtain verification code information; and executing the verification code input operation according to the received verification code information obtained by the verification code identification module.
5. The method of claim 2, wherein obtaining the response token corresponding to the login operation comprises:
if the login operation is required before the original request is responded, and a login script corresponding to the original request does not exist, outputting prompt information to prompt a user to perform the login operation;
detecting a login state through a login plug-in;
and receiving a response token sent by the login plug-in when the login plug-in detects that the login is successful.
6. The method of claim 1, wherein after the step of receiving page data of the response page sent by the server, the method further comprises:
determining a text similarity parameter corresponding to the first page text information of the response page and the second page text information of the response sample corresponding to the original request;
determining whether the response page is matched with the response sample or not according to the text similarity parameter;
and if the response page is determined to be matched with the response sample, storing the page data of the response page.
7. The method of claim 6, wherein storing the page data of the response page comprises:
and storing the page data of the response page into a structured file so as to store the page data into a database based on the structured file.
8. An apparatus for obtaining page data, the apparatus comprising:
the first receiving module is used for receiving an original request of page operation;
the first storage module is used for acquiring the characteristic parameters of the original request and storing the characteristic parameters into a target structure body;
the generating module is used for generating a data acquisition request according to the target structure body, wherein the data acquisition request is used for requesting to acquire page data of a response page of the original request;
the sending module is used for sending the data acquisition request to a server;
and the second receiving module is used for receiving the page data of the response page sent by the server.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.
CN201911295167.4A 2019-12-16 Method, device, medium and electronic equipment for acquiring page data Active CN111143650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911295167.4A CN111143650B (en) 2019-12-16 Method, device, medium and electronic equipment for acquiring page data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911295167.4A CN111143650B (en) 2019-12-16 Method, device, medium and electronic equipment for acquiring page data

Publications (2)

Publication Number Publication Date
CN111143650A true CN111143650A (en) 2020-05-12
CN111143650B CN111143650B (en) 2024-04-26

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434234A (en) * 2021-06-29 2021-09-24 青岛海尔科技有限公司 Page jump method, device, computer readable storage medium and processor

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178739A (en) * 2007-11-30 2008-05-14 四川长虹电器股份有限公司 Embedded browsing system and web page browsing method
JP2011070453A (en) * 2009-09-25 2011-04-07 Five Drive Inc Procurement information retrieval system
CN102487403A (en) * 2010-12-03 2012-06-06 腾讯科技(深圳)有限公司 Method and device for executing JS (JavaScript) by server side
CN102918819A (en) * 2011-06-03 2013-02-06 华为技术有限公司 Method, apparatus and system for online application processing
US20130073382A1 (en) * 2011-09-16 2013-03-21 Kontera Technologies, Inc. Methods and systems for enhancing web content based on a web search query
US9158845B1 (en) * 2004-04-29 2015-10-13 Aol Inc. Reducing latencies in web page rendering
WO2015192288A1 (en) * 2014-06-16 2015-12-23 华为技术有限公司 Method, terminal and system for establishing communication connection
CN109063142A (en) * 2018-08-06 2018-12-21 网宿科技股份有限公司 Web page resources method for pushing and server
CN109445968A (en) * 2018-11-09 2019-03-08 金瓜子科技发展(北京)有限公司 Service request processing method, device, equipment and the storage medium of different agreement
CN110198333A (en) * 2018-04-18 2019-09-03 腾讯科技(深圳)有限公司 Data capture method and device, storage medium and electronic device
CN110505258A (en) * 2018-05-17 2019-11-26 腾讯科技(深圳)有限公司 Webpage load and response method, device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434234A (en) * 2021-06-29 2021-09-24 青岛海尔科技有限公司 Page jump method, device, computer readable storage medium and processor
CN113434234B (en) * 2021-06-29 2023-06-09 青岛海尔科技有限公司 Page jump method, device, computer readable storage medium and processor

Similar Documents

Publication Publication Date Title
CN108897691B (en) Data processing method, device, server and medium based on interface simulation service
CN107122258B (en) Method and equipment for checking state code of test interface
CN107122297B (en) Method and equipment for generating request message of test interface
CN111262759B (en) Internet of things platform testing method, device, equipment and storage medium
CN110879903A (en) Evidence storage method, evidence verification method, evidence storage device, evidence verification device, evidence storage equipment and evidence verification medium
CN107436844B (en) Method and device for generating interface use case aggregate
WO2018059393A1 (en) Test method for mobile application program, server, terminal and storage medium
CN107085549B (en) Method and device for generating fault information
CN108521612B (en) Video abstract generation method, device, server and storage medium
CN109376534B (en) Method and apparatus for detecting applications
CN111931188A (en) Vulnerability testing method and system under login scene
CN111654495B (en) Method, apparatus, device and storage medium for determining traffic generation source
CN111309632A (en) Application program testing method and device, computer equipment and storage medium
CN106658666B (en) Method and equipment for establishing wireless connection
CN113961836A (en) Page jump method and device, electronic equipment and storage medium
CN111506496A (en) Test data acquisition method and device, electronic equipment and storage medium
CN111597559B (en) System command injection vulnerability detection method and device, equipment and storage medium
CN113067802A (en) User identification method, device, equipment and computer readable storage medium
CN111143650B (en) Method, device, medium and electronic equipment for acquiring page data
CN115022201B (en) Data processing function test method, device, equipment and storage medium
CN113886221B (en) Test script generation method and device, storage medium and electronic equipment
CN111143650A (en) Method, device, medium and electronic equipment for acquiring page data
CN113806815B (en) File signing method and system
CN112685072B (en) Method, device, equipment and storage medium for generating communication address knowledge base
CN114816815A (en) Fault positioning method, log format configuration method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant