CN110968755A - Method and device for crawling data - Google Patents


Info

Publication number
CN110968755A
Authority
CN
China
Prior art keywords
crawling
data
task information
program
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811145420.3A
Other languages
Chinese (zh)
Inventor
陆生辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201811145420.3A priority Critical patent/CN110968755A/en
Publication of CN110968755A publication Critical patent/CN110968755A/en
Pending legal-status Critical Current

Abstract

The invention discloses a method and a device for crawling data, which aim to solve the problem that a data crawling program cannot execute tasks continuously. The method comprises the following steps: a background service receives task information sent by a server, wherein the task information instructs that data in a user interface be crawled; the background service, in response to the task information, sends the task information to a data crawling program; and the background service receives a crawling result from the data crawling program and sends the crawling result to the server, wherein the crawling result is the data in the user interface crawled by the data crawling program according to the task information.

Description

Method and device for crawling data
Technical Field
The invention relates to the technical field of information crawling, in particular to a method and a device for crawling data.
Background
The user interface automated testing tool (Uiautomator) is a tool released by Google for automated user interface testing of Android phone software; it automates what would otherwise be ordinary manual testing. Specifically, such testing clicks each control element and checks whether the output matches expectations. For example, correct and incorrect user names and passwords are entered on a login interface, and the login button is then clicked to check whether login succeeds and whether an error prompt appears. The tool provides a complete set of operations, including launching applications, operating controls, and acquiring information.
However, data crawling programs such as the user interface automated testing tool are stateless: the program terminates after running a task once and cannot execute tasks continuously.
Disclosure of Invention
In view of the foregoing problems, an object of the embodiments of the present invention is to provide a method and an apparatus for crawling data, so as to solve the problem that a data crawling program cannot continuously perform tasks.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for crawling data, the method including: a background service receives task information sent by a server, wherein the task information instructs that data in a user interface be crawled; the background service, in response to the task information, sends the task information to a data crawling program; and the background service receives a crawling result from the data crawling program and sends the crawling result to the server, wherein the crawling result is the data in the user interface crawled by the data crawling program according to the task information.
In other embodiments of the present invention, the background service, in response to the task information, sends the task information to a data crawling program, including: the background service responds to the task information and judges the state of the data crawling program; when the data crawling program is in a closed state, starting the data crawling program, and sending the task information to the data crawling program; and when the data crawling program is in a running state, sending the task information to the data crawling program.
In other embodiments of the present invention, sending the task information to the data crawling program when the data crawling program is in a running state includes: when the data crawling program is in a running state, determining the execution order of the task information and the task information currently being executed by the data crawling program; when the task information is to be executed first, sending the task information directly to the data crawling program, so that the data crawling program executes the task information first; and when the task information being executed is to be executed first, sending the task information to the data crawling program after the data crawling program finishes the task information being executed, so that the data crawling program executes the task information only after finishing the task information being executed.
In other embodiments of the present invention, the background service receiving the crawling result from the data crawling program and sending the crawling result to the server includes: when the data crawled by the data crawling program is not the data required by the task information, sending a first crawling result to the server, wherein the first crawling result includes information indicating that the data crawled by the data crawling program is not the data required by the task information; and/or, when the data crawling program encounters an error while executing the task information, sending a second crawling result to the server, wherein the second crawling result includes information indicating that the data crawling program encountered an error while executing the task information; and/or, when the data crawled by the data crawling program is the data required by the task information, sending a third crawling result to the server, wherein the third crawling result includes the data crawled by the data crawling program and information indicating that the crawled data is the data required by the task information, so that the third crawling result can be processed.
In other embodiments of the present invention, when the data crawled by the data crawling program is the data required by the task information, sending a third crawling result to the server includes: when the data crawled by the data crawling program is all data required by the task information, sending the third crawling result to the server; when the data crawled by the data crawling program is partial data required by the task information, splicing the third crawling result with an additional result to generate a splicing result; sending the splicing result to the server; wherein the additional result is data different from the third crawling result that is needed by the task information.
In other embodiments of the present invention, the method further comprises: the background service receives a control command sent by the server, wherein the control command is generated by the server according to the crawling result, and the control command is used for indicating to restart or terminate the data crawling program.
In other embodiments of the present invention, the method further comprises: the background service generates a task state according to the crawling result, wherein the task state is used for indicating whether the data crawling program completes the task in the task information or not; the background service determines whether to continue to acquire new task information sent by the server according to the task state; when the task state indicates that the data crawling program completes the task in the task information, the background service continues to acquire the new task information; and when the task state indicates that the data crawling program does not complete the task in the task information, the background service stops acquiring the new task information.
In other embodiments of the present invention, the obtaining, by the background service, the task information sent by the server includes: the background service sends a query to the server according to a preset time interval, wherein the query is used for indicating the server to send the task information to the background service; and the background service receives the task information sent by the server.
In other embodiments of the present invention, the task information includes a crawling task and configuration information. The background service sending the task information to the data crawling program in response to the task information includes: the background service responds to the crawling task, wherein the crawling task indicates the content to be crawled; and the background service sends the configuration information to the data crawling program, wherein the configuration information specifies the operation steps by which the data crawling program completes the crawling task; and/or the background service receiving the crawling result of the data crawling program includes: the background service receives the data in the user interface crawled by the data crawling program according to the operation steps in the configuration information and the content indicated by the crawling task.
In a second aspect, an embodiment of the present invention provides a method for crawling data, where the method includes: the data crawling program receives task information sent by a background service, the background service is used for receiving the task information sent by a server, and the task information is used for indicating data in a crawling user interface; the data crawling program responds to the task information, crawls data in the user interface according to the task information and generates a crawling result; and the data crawling program sends the crawling result to the background service, and the background service is used for sending the crawling result to the server.
In other embodiments of the present invention, the task information includes: crawling tasks and configuration information; the crawling task is used for indicating crawling content, and the configuration information is an operation step of completing the crawling task for the data crawling program; the data crawling program responds to the task information, crawls data in the user interface according to the task information, and comprises the following steps: and the data crawling program crawls the data in the user interface according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
In a third aspect, an embodiment of the present invention provides an apparatus for crawling data, where the apparatus includes: the receiving module is configured to receive task information sent by the server through background service, wherein the task information is used for indicating data in the crawling user interface; the sending module is configured to respond to the task information by the background service and send the task information to a data crawling program; and the processing module is configured to receive a crawling result of the data crawling program by the background service and send the crawling result to the server, wherein the crawling result is data in the user interface crawled by the data crawling program according to the task information.
In a fourth aspect, an embodiment of the present invention provides an apparatus for crawling data, including: the receiving module is configured to receive task information sent by a background service by a data crawling program, wherein the background service is used for receiving the task information sent by a server, and the task information is used for indicating data in a crawling user interface; the processing module is configured to respond to the task information by the data crawling program, crawl data in the user interface according to the task information and generate a crawling result; a sending module configured to send the crawling result to the background service by the data crawling program, wherein the background service is used for sending the crawling result to the server.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the method in one or more of the above technical solutions.
The method and the device for crawling data provided by the embodiment of the invention comprise the following steps that firstly, a background service receives task information which is sent by a server and used for indicating data in a crawling user interface; then, the background service responds to the task information and sends the task information to the data crawling program, so that the data crawling program crawls data in the user interface according to the task information; and finally, the background service receives the crawling result of the data crawling program and sends the crawling result to the server. Therefore, the data crawling program receives the task information through the background service, and the background service can run all the time in the background, namely, the background service can receive the task information all the time and send the task information to the data crawling program, so that the data crawling program can continue to execute the task in the next task information after executing the task in one task information. Therefore, in the technical scheme provided by the invention, the task information is received through the background service, and the task information is sent to the data crawling program, so that the data crawling program can continue to execute the task in the next task information after executing the task in one task information, and meanwhile, the crawling result can be fed back to the server through the background service, so that the server can conveniently monitor the crawling process.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to the drawings without creative efforts for those skilled in the art.
FIG. 1 is a block diagram of a system for crawling data according to an embodiment of the present invention;
FIG. 2 is a first flowchart illustrating a method for crawling data according to an embodiment of the present invention;
FIG. 3 is a second flowchart illustrating a method for crawling data according to an embodiment of the present invention;
FIG. 4 is a first diagram illustrating an apparatus for crawling data according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a device for crawling data according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. Other embodiments may be derived from these embodiments by those of ordinary skill in the art without the exercise of inventive faculty.
An embodiment of the present invention provides a system for crawling data. FIG. 1 is a schematic diagram of the architecture of the system. As shown in FIG. 1, the system includes a server 110, a background service 120, and a data crawling program 130.
The background service receives task information from the server and sends it to the data crawling program; the data crawling program crawls data in a user interface according to the task information and sends the crawling result to the background service; and the background service sends the crawling result to the server.
The following describes a method for crawling data according to an embodiment of the present invention with reference to the above system.
In practical applications, the method can be applied wherever data needs to be crawled, for example when crawling user interface data of an Android mobile phone application (APP). So that the data crawling program can proceed to the next task after finishing one, a background service that runs continuously in the background keeps receiving tasks and forwards each received task to the data crawling program.
Fig. 2 is a first schematic flowchart of a method for crawling data in an embodiment of the present invention, and referring to fig. 2, the method includes:
S210: the background service receives task information sent by the server;
S220: the background service, in response to the task information, sends the task information to the data crawling program;
S230: the background service receives the crawling result of the data crawling program and sends the crawling result to the server.
The task information instructs that data in a user interface be crawled, and the background service obtains it from the server. The background service may receive task information passively: when task information is generated in the server, the server sends it to the background service, so the background service obtains the task information as soon as it is generated, which can improve crawling efficiency. The background service may also acquire task information actively: it queries the server once per interval, obtaining task information if the server generated any during that interval and obtaining nothing otherwise. Active acquisition avoids the problem of the server mistakenly sending task information to programs other than the background service, and so ensures that the data crawling program in this method crawls according to the task information. The manner of obtaining the task information is not specifically limited here.
Here, the background service may be any component that can run in the background for a long time and receive task information sent by the server, for example an Android Service, which can run in the background for a long time, can interact with other services bound to it, and can receive task information sent by the server.
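The active (polling) mode described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: `poll_for_tasks`, `fetch_task`, and `handle_task` are hypothetical names, and the server is modeled as a callable that returns `None` when no task information was generated during the interval.

```python
import time
from typing import Callable, Optional


def poll_for_tasks(
    fetch_task: Callable[[], Optional[dict]],
    handle_task: Callable[[dict], None],
    interval_seconds: float,
    max_polls: int,
) -> int:
    """Query the server once per interval; return how many tasks were handled.

    fetch_task stands in for the query sent to the server (hypothetical);
    it returns None when the server generated no task in that interval.
    """
    handled = 0
    for _ in range(max_polls):
        task = fetch_task()
        if task is not None:
            handle_task(task)
            handled += 1
        time.sleep(interval_seconds)
    return handled


# Usage: a fake server that produces a task on the second poll only.
pending = [None, {"task": "crawl price list"}, None]
received = []
count = poll_for_tasks(lambda: pending.pop(0), received.append, 0.0, 3)
```

A real background service would poll indefinitely rather than for a fixed `max_polls`; the bound is used here only so that the sketch terminates.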
In practical applications, the background service, once started, runs in the background continuously. When the server has task information, the background service receives the task information sent by the server and forwards it to the data crawling program, so that the data crawling program crawls data in the user interface according to the task information and generates a crawling result, the crawling result being the data in the user interface crawled according to the task information; the background service then receives the crawling result and sends it to the server. When the server has no task information, the background service has no task information to send to the data crawling program, and the data crawling program ends after executing the task in the last task information.
The data crawling program may be a tool capable of crawling data in a user interface, for example the user interface automated testing tool Uiautomator. The user interface is the one that the corresponding task information requires to be crawled.
It can be seen that when task information is generated in a server, the background service which is always operated in the background receives the task information sent by the server and sends the task information to a data crawling program, so that the data crawling program can crawl data in a user interface according to the task information, the background service ceaselessly receives the task information and sends the task information to the data crawling program, and the data crawling program can continue to execute a task in the next task information after executing the task in one task information, so that the repeated manual starting of the data crawling program is avoided; when the server has no task information, the background service has no task information to send to the data crawling program, and the data crawling program is closed after executing the task in the last task information, so that the data crawling program is prevented from occupying resources when not executing the task information; meanwhile, the crawling result can be fed back to the server through the background service, and the server can monitor the crawling process conveniently.
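The relay role of the background service in S210 to S230 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of the invention: the server's tasks are modeled as an iterable of dictionaries, and `run_background_service`, `fake_crawl`, and the `id`/`target` keys are hypothetical.

```python
from typing import Callable, Iterable, List


def run_background_service(
    tasks: Iterable[dict],
    crawl: Callable[[dict], dict],
    report: Callable[[dict], None],
) -> None:
    """Relay each task to the crawler and each crawling result to the server."""
    for task in tasks:          # S210: task information received from the server
        result = crawl(task)    # S220: forward the task to the data crawling program
        report(result)          # S230: send the crawling result back to the server


def fake_crawl(task: dict) -> dict:
    # Stand-in for the data crawling program: pretends to read one UI field.
    return {"task_id": task["id"], "data": f"text of {task['target']}"}


reports: List[dict] = []
run_background_service(
    [{"id": 1, "target": "title"}, {"id": 2, "target": "price"}],
    fake_crawl,
    reports.append,
)
```

Because the loop lives in the long-running service rather than in the crawler, the crawler itself can remain stateless and still process one task after another.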
Based on the foregoing embodiment, to prevent the user interface from going uncrawled because the data crawling program is closed and cannot receive the task information, S220 further includes:
S2211: the background service, in response to the task information, determines the state of the data crawling program; when the data crawling program is in a closed state, S2212 is executed; when the data crawling program is in a running state, S2213 is executed;
S2212: start the data crawling program and send the task information to it;
S2213: send the task information to the data crawling program.
First, when the background service receives task information, it determines the state of the data crawling program. The background service may determine the state directly, or the data crawling program may report its current state to the background service. The manner of determining the state is not specifically limited here.
Then, when the data crawling program is in a closed state, the background service calls a low-level Linux script through the Android runtime environment to start the data crawling program and, once it has started, sends it the task information; when the data crawling program is in a running state, the background service sends the task information to it directly. In either case the data crawling program can then crawl the user interface data according to the task information.
Therefore, by judging the state of the data crawling program, when the data crawling program is not started, the background service starts the data crawling program firstly, and then sends the task information to the data crawling program, so that the problem that the data crawling program cannot receive the task information because the data crawling program is not started after the background service sends the task information can be solved.
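The state check in S2211 to S2213 can be sketched in Python as follows. The `Crawler` class, `CrawlerState` enum, and `dispatch` function are hypothetical stand-ins introduced for illustration; starting the program is reduced to a state change, whereas the description above starts it by calling a script.

```python
from enum import Enum


class CrawlerState(Enum):
    CLOSED = "closed"
    RUNNING = "running"


class Crawler:
    """Hypothetical stand-in for the data crawling program."""

    def __init__(self) -> None:
        self.state = CrawlerState.CLOSED
        self.inbox = []

    def start(self) -> None:
        # The description starts the program via a script; here only the state changes.
        self.state = CrawlerState.RUNNING

    def receive(self, task: dict) -> None:
        if self.state is not CrawlerState.RUNNING:
            raise RuntimeError("crawler is closed and cannot receive tasks")
        self.inbox.append(task)


def dispatch(crawler: Crawler, task: dict) -> bool:
    """Send a task, starting the crawler first if needed.

    Returns True if the crawler had to be started.
    """
    started = False
    if crawler.state is CrawlerState.CLOSED:   # S2211 -> S2212
        crawler.start()
        started = True
    crawler.receive(task)                      # S2212 / S2213
    return started


crawler = Crawler()
first = dispatch(crawler, {"id": 1})
second = dispatch(crawler, {"id": 2})
```

In this sketch the first task finds the crawler closed and starts it, while the second is delivered directly.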
Based on the foregoing embodiments, to ensure that the data crawling program executes task information in the correct order, S2213 further includes:
S2213a: determine the execution order of the task information and the task information currently being executed by the data crawling program; when the task information needs to be executed first, S2213b is executed; when the task information being executed needs to be executed first, S2213c is executed;
S2213b: send the task information directly to the data crawling program, so that the data crawling program executes the task information first;
S2213c: after the data crawling program finishes the task information being executed, send the task information to the data crawling program, so that the data crawling program executes it only after finishing the task information being executed.
First, when the data crawling program is in a running state, the execution order of the task information and the task information being executed is determined. The order may be determined according to the urgency of the two pieces of task information or according to their importance; the basis for deciding which is executed first is not specifically limited here.
Then, when the task information needs to be executed first, the background service sends it directly to the data crawling program, so that it either overwrites the task information being executed or pushes the task information being executed into a waiting queue, and the data crawling program executes it first. When the task information being executed needs to be executed first, the background service waits until the data crawling program has finished it and only then sends the task information.
Therefore, determining the execution order of the task information and the task information being executed fixes the time at which the background service sends the task information to the data crawling program. This guarantees that the task information that needs to be executed first is executed first, and prevents task information that should run later from disrupting it.
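The ordering decision in S2213a to S2213c can be sketched in Python as follows. The sketch assumes that each task carries a numeric `priority` field in which a lower number means more urgent; this field, the `schedule` function, and the waiting list are illustrative assumptions, since the basis for ordering is left open above.

```python
from typing import List, Optional, Tuple


def schedule(
    running: Optional[dict], new_task: dict, waiting: List[dict]
) -> Tuple[dict, List[dict]]:
    """Return (task to run now, waiting queue) after a new task arrives.

    Assumption: each task carries a numeric "priority", lower meaning more urgent.
    """
    if running is None:
        return new_task, waiting
    if new_task["priority"] < running["priority"]:
        # S2213b: the new task runs first; the running task moves to the
        # front of the waiting queue.
        return new_task, [running] + waiting
    # S2213c: the running task continues; the new task is held until it finishes.
    return running, waiting + [new_task]


current, queue = schedule(None, {"id": "a", "priority": 5}, [])
current, queue = schedule(current, {"id": "b", "priority": 1}, queue)
current, queue = schedule(current, {"id": "c", "priority": 3}, queue)
```

The sketch implements the variant in which the preempted task is pushed into a waiting queue rather than overwritten, and it does not re-sort the waiting queue.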
Based on the foregoing embodiment, so that the server can learn how far the data crawling program has completed the task in the task information, S230 further includes:
when the data crawled by the data crawling program is not the data required by the task information, sending a first crawling result to a server, wherein the first crawling result comprises information indicating that the data crawled by the data crawling program is not the data required by the task information;
when the data crawling program has an error in the process of executing the task information, sending a second crawling result to the server, wherein the second crawling result comprises information indicating that the data crawling program has an error in the process of executing the task information;
and when the data crawled by the data crawling program is the data required by the task information, sending a third crawling result to the server, wherein the third crawling result comprises the data crawled by the data crawling program and information indicating that the data crawled by the data crawling program is the data required by the task information so as to process the third crawling result.
First, after receiving the crawling result, the background service may perform preliminary judgment on the crawling result.
Specifically, when the form of the data in the crawling result differs from the form of the data required by the task information, the crawling result can be judged to be error information produced because the data crawling program failed while executing the task, that is, the second crawling result. When the content of the data in the crawling result is completely different from the content required by the task information, the crawling result can be judged to be data that the task information does not require, that is, the first crawling result. When the content of the data in the crawling result is the same as, or is part of, the content required by the task information, the crawling result can be judged to be data that the task information requires, that is, the third crawling result.
Then, the corresponding crawling result is sent to the server, so that the server can learn how far the data crawling program has completed the task in the task information.
Here, the first crawling result and the second crawling result may be text information or voice information; their form of expression is not specifically limited here.
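The preliminary judgment of a crawling result can be sketched in Python as follows. The sketch makes two assumptions that are not part of the description: crawled data is modeled as a dictionary keyed by field name, and any non-dictionary value (for example an error string) counts as data of a different form. The function name `classify_result` is hypothetical.

```python
from typing import Any, Set


def classify_result(result: Any, required_fields: Set[str]) -> str:
    """Return "first", "second", or "third" for a crawling result.

    Assumptions: crawled data is a dict keyed by field name, and anything
    that is not a dict (for example an error string) has a different form.
    """
    if not isinstance(result, dict):
        return "second"   # different form: an error occurred during execution
    if required_fields.isdisjoint(result):
        return "first"    # no overlap: the crawled data is not what the task needs
    return "third"        # full or partial overlap: data the task needs


required = {"title", "price"}
labels = [
    classify_result("TimeoutError: widget not found", required),
    classify_result({"banner": "ad"}, required),
    classify_result({"title": "Phone"}, required),
]
```

Note that a partial match (only `title` out of `title` and `price`) is still classified as a third crawling result, consistent with the "same as, or part of" criterion.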
Based on the foregoing embodiments, so that the server receives more complete data for the user, sending a third crawling result to the server when the data crawled by the data crawling program is data required by the task information further includes:
when the data crawled by the data crawling program is all data required by the task information, sending a third crawling result to the server;
when the data crawled by the data crawling program is partial data required by the task information, splicing the third crawling result with the additional result to generate a splicing result; and sending the splicing result to a server.
The additional result is data which is different from the third crawling result and is needed by the task information, wherein the splicing result can be all data needed by the task information or partial data needed by the task information, and even if the splicing result is partial data needed by the task information, the splicing result is more complete compared with the third crawling result.
Specifically, when the background service judges that the data crawled by the data crawling program is all data required by the task information, the third crawling result is sent to the server, namely all data required by the task information and information indicating that the data crawled by the data crawling program is all data required by the task information are sent to the server, so that the server can perform next processing on all data required by the task information or can be used by a user.
Alternatively, when the crawling result contains only part of the data required by the task information, the background service obtains from other channels an additional result that the task information requires and that differs from the crawling result. The other channels may be data required by the task information that was crawled by another data crawling program or by another user interface crawling tool, or existing data required by the task information.
In addition, after determining that the data crawled by the data crawling program is data required by the task information, and before sending the data to the server or splicing it with an additional result, the background service checks whether the data contains anything the task information does not require. If not, the data needs no processing; if so, the data is cleaned by deleting whatever the task information does not require, leaving only the required data.
Therefore, the crawling results received by the server are processed crawling results, and the server does not need to process data crawled by the data crawling program.
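The cleaning and splicing steps above can be sketched as follows. This is a minimal illustration, assuming the crawled data can be modelled as a dictionary of named fields and that the task information lists the field names it requires. The function names `clean` and `splice`, the sample field names, and the rule that the third crawling result takes precedence over the additional result for a shared field are all hypothetical and are not taken from the embodiment.

```python
from typing import Dict, Iterable, Tuple


def clean(data: Dict[str, str], required: Iterable[str]) -> Dict[str, str]:
    """Keep only the fields that the task information requires."""
    wanted = set(required)
    return {key: value for key, value in data.items() if key in wanted}


def splice(
    third_result: Dict[str, str],
    additional_result: Dict[str, str],
    required: Iterable[str],
) -> Tuple[Dict[str, str], bool]:
    """Merge a partial crawling result with an additional result.

    Returns the splicing result and a flag telling whether it now
    contains every required field.
    """
    required_fields = list(required)
    spliced = clean(third_result, required_fields)
    # Fill in only the fields the crawler missed (assumed precedence rule).
    for key, value in clean(additional_result, required_fields).items():
        spliced.setdefault(key, value)
    complete = all(key in spliced for key in required_fields)
    return spliced, complete


required_fields = ["title", "price", "rating"]
crawled = {"title": "Widget", "price": "9.99", "banner_ad": "Buy now"}
extra = {"rating": "4.5", "price": "10.49"}
spliced, complete = splice(crawled, extra, required_fields)
```

Here `banner_ad` is removed by cleaning, and `rating` is supplied by the additional result. If the additional result were empty, the splicing result would still be returned, but flagged as incomplete.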
Based on the foregoing embodiments, in order to be able to restart or terminate the data crawling program, the method further comprises:
S240: the background service receives the control command sent by the server.
The control command is generated by the server according to the crawling result and is used for instructing the background service to restart or terminate the data crawling program.
Specifically, after the server receives a first crawling result or a third crawling result, the server generates a control command for indicating termination of the data crawling program and sends the control command to the background service, the background service calls the script according to the control command to terminate the data crawling program, the data crawling program can be closed after task information is executed, and resource occupation is avoided; and after the server receives the second crawling result, the server generates a control command for indicating the data crawling program to be restarted, and sends the control command to the background service, and the background service calls the script according to the control command to restart the data crawling program, so that the data crawling program can be automatically restarted when an error occurs in the process of executing the task information.
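The mapping from crawling result to control command described above can be sketched as follows. This is only an illustration under stated assumptions: the enum names, the `command_for` and `apply_command` functions, and the script names `stop_crawler.sh` and `restart_crawler.sh` are hypothetical placeholders, and the script runner is injected so that the example does not execute anything on the device.

```python
from enum import Enum
from typing import Callable


class ResultKind(Enum):
    FIRST = "data not required by the task information"
    SECOND = "error while executing the task information"
    THIRD = "data required by the task information"


class ControlCommand(Enum):
    TERMINATE = "terminate"
    RESTART = "restart"


def command_for(kind: ResultKind) -> ControlCommand:
    """Server side: choose the control command for a crawling result."""
    if kind is ResultKind.SECOND:
        return ControlCommand.RESTART
    # First and third crawling results both end the current run.
    return ControlCommand.TERMINATE


def apply_command(command: ControlCommand, run_script: Callable[[str], None]) -> None:
    """Background-service side: call the script that matches the command."""
    # The script names are placeholders, not real file names.
    scripts = {
        ControlCommand.TERMINATE: "stop_crawler.sh",
        ControlCommand.RESTART: "restart_crawler.sh",
    }
    run_script(scripts[command])


invoked = []
apply_command(command_for(ResultKind.SECOND), invoked.append)
```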
Based on the foregoing embodiments, to prevent the data crawling program from receiving multiple pieces of task information at the same time, after S230 the method further comprises:
S251: the background service generates a task state according to the crawling result;
S252: the background service determines, according to the task state, whether to continue to acquire new task information sent by the server; when the task state indicates that the data crawling program has completed the task in the task information, S253 is executed; when the task state indicates that the data crawling program has not completed the task in the task information, S254 is executed;
S253: the background service continues to acquire new task information;
S254: the background service stops acquiring new task information.
The task state is used for indicating whether the data crawling program has completed the task in the task information.
Specifically, after the background service obtains the crawling result, it generates the task state according to the crawling result. If the task state indicates that the data crawling program has finished executing the task information, the background service can continue to receive new task information sent by the server, so that the data crawling program can go on to execute the new task information once the current task information is finished, which improves crawling efficiency. If the task state indicates that the data crawling program has not finished executing the task information, the background service stops receiving new task information from the server. This avoids the situation in which the background service obtains new task information early and sends it to the data crawling program, interfering with the task information being executed. In other words, it prevents the data crawling program from running incorrectly because it received multiple pieces of task information at the same time, and can therefore improve the accuracy of the crawled data.
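Steps S251 to S254 can be sketched as a small gate that hands out a new task only when the previous one is reported complete. This is a minimal sketch, assuming the server's pending tasks can be modelled as an in-memory queue and that the task state is a single boolean. The class name `TaskGate` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class TaskGate:
    """Hands out new task information only when the last task is complete."""

    def __init__(self, server_tasks: Iterable[str]) -> None:
        self._server_tasks: Deque[str] = deque(server_tasks)
        self._task_completed = True  # nothing is running at start-up

    def update_task_state(self, completed: bool) -> None:
        """S251: record the task state derived from a crawling result."""
        self._task_completed = completed

    def acquire_next(self) -> Optional[str]:
        """S252 to S254: acquire new task information only if allowed."""
        if not self._task_completed or not self._server_tasks:
            return None  # S254: stop acquiring while a task is in flight
        self._task_completed = False
        return self._server_tasks.popleft()  # S253: continue acquiring


gate = TaskGate(["task-1", "task-2"])
first = gate.acquire_next()
blocked = gate.acquire_next()  # task-1 has not completed yet
gate.update_task_state(completed=True)
second = gate.acquire_next()
```

The second call returns `None` because `task-1` is still in flight, which is the behaviour that keeps the data crawling program from holding two pieces of task information at once.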
It should be noted that S251 and S240 need not be executed in any particular order.
Based on the foregoing embodiment, in order to enable the background service to receive task information continuously, S210 further comprises:
S211: the background service sends a query to the server at a preset time interval;
S212: the background service receives the task information sent by the server.
The query is used for instructing the server to send the task information to the background service.
Specifically, the background service may send a query to the server at a preset time interval, so as to obtain new task information after each interval. The preset time interval may be the time the data crawling program needs to complete one piece of task information, or half of that time, and is not limited herein. Because the background service receives new task information from the server at the preset time interval, the data crawling program can crawl continuously according to the task information.
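The polling in S211 and S212 can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `query_server` stands in for whatever request the background service actually makes and returns `None` when the server has no task, the function name `poll_server` is hypothetical, and `max_queries` exists only to keep the example finite (a real service would presumably keep polling for as long as it runs). The sleep function is injected so the example can be run without waiting.

```python
import time
from typing import Callable, Optional


def poll_server(
    query_server: Callable[[], Optional[str]],
    handle_task: Callable[[str], None],
    interval_seconds: float,
    max_queries: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Query the server at a fixed interval and hand over any task received.

    Returns the number of tasks that were handled.
    """
    handled = 0
    for attempt in range(max_queries):
        task = query_server()  # S211: send the query
        if task is not None:
            handle_task(task)  # S212: task information received
            handled += 1
        if attempt < max_queries - 1:
            sleep(interval_seconds)  # wait for the preset time interval
    return handled


replies = iter([None, "task-1", "task-2"])
received = []
waits = []
count = poll_server(
    lambda: next(replies),
    received.append,
    interval_seconds=30.0,
    max_queries=3,
    sleep=waits.append,
)
```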
Based on the foregoing embodiment, in order to improve the operating efficiency and accuracy of the data crawling program, the task information further includes a crawling task and configuration information, and S220 comprises:
S2221: the background service responds to the crawling task;
S2222: the background service sends the configuration information to the data crawling program.
The crawling task is used for indicating the crawling content, and the configuration information specifies the operation steps by which the data crawling program completes the crawling task.
Specifically, the background service responds to the crawling task and sends the configuration information corresponding to the crawling task to the data crawling program; the data crawling program opens the corresponding app according to the crawling content indicated by the crawling task, crawls data in the app interface according to the configuration information, and sends the crawling result to the background service; and the background service receives the crawling result and sends the processed crawling result to the server.
It should be noted that the crawling result received by the background service is the data in the user interface that the data crawling program crawled according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
It can be seen that the background service responds to the crawling task and sends the configuration information corresponding to the crawling task to the data crawling program, so that the data crawling program only needs to crawl the data in the user interface according to the configuration information. The configuration information does not need to be preset or stored in the data crawling program, which can improve the operating efficiency of the data crawling program. Different configuration information can also be sent to the data crawling program for different crawling tasks, which avoids the errors that easily occur when a large amount of configuration information is stored in the data crawling program in advance and must be looked up when called. Sending a given piece of configuration information to the data crawling program only when it is needed helps ensure that the data crawling program executes the crawling task accurately.
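The split of task information into a crawling task and configuration information can be sketched as follows. This is an illustration only: the `TaskInfo` structure, the message layout, and the sample step strings are hypothetical, and the embodiment does not specify how the operation steps are encoded. The point of the sketch is that the data crawling program receives its operation steps with each task and stores none of them in advance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TaskInfo:
    crawl_task: str                # what content to crawl
    config_steps: Tuple[str, ...]  # operation steps for this task only


def handle_task_info(
    task_info: TaskInfo,
    send_to_crawler: Callable[[Dict[str, object]], None],
) -> None:
    """S2221 and S2222: respond to the crawling task, then forward its steps."""
    message: Dict[str, object] = {
        "crawl_task": task_info.crawl_task,
        "steps": list(task_info.config_steps),
    }
    send_to_crawler(message)


outbox: List[Dict[str, object]] = []
task = TaskInfo(
    crawl_task="product_prices",
    config_steps=("open shopping app", "search for product", "read price label"),
)
handle_task_info(task, outbox.append)
```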
The working process of the method for crawling data according to the embodiment of the present invention is described below by using a specific example.
First, the Android Service is started and kept running in the background, where it continuously acquires crawling tasks and configuration information from the server. After the Android Service acquires a crawling task, it uses the Android runtime environment to call an underlying Linux script that starts the Uiautomator, and sends the configuration information to the Uiautomator. The Uiautomator opens the corresponding app according to the configuration information, crawls the user interface data named in the crawling task, and sends the crawling result, or any problem encountered while executing the task, to the Android Service through interprocess communication. The Android Service analyzes the crawling result: when the crawling result is data required by the task information, it sends the crawling result to the server and updates the task state; when it receives an error log, it sends the error log to the server; and it restarts or terminates the Uiautomator by calling the underlying Linux script according to the feedback information of the server. Because the Android Service can continuously acquire task information from the server, the Uiautomator can continuously crawl data in the user interface according to the task information, and the Uiautomator does not occupy phone resources when no task information is being executed.
Based on the same inventive concept, an embodiment of the present invention further provides a method for crawling data, fig. 3 is a schematic flow diagram of the method for crawling data in the embodiment of the present invention, and as shown in fig. 3, the method includes: s310: the data crawling program receives task information sent by a background service, the background service is used for receiving the task information sent by the server, and the task information is used for indicating data in a crawling user interface; s320: the data crawling program responds to the task information, crawls data in the user interface according to the task information and generates a crawling result; s330: and the data crawling program sends the crawling result to the background service, and the background service is used for sending the crawling result to the server.
Based on the foregoing embodiments, the task information includes a crawling task and configuration information, wherein the crawling task is used for indicating the crawling content, and the configuration information specifies the operation steps by which the data crawling program completes the crawling task. The step in which the data crawling program responds to the task information and crawls data in the user interface according to the task information comprises: the data crawling program crawls data in the user interface according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
Here, it should be noted that the above description of this method embodiment is similar to the description of the preceding method embodiment and has similar beneficial effects. For technical details not disclosed in this method embodiment, reference is made to the description of the preceding method embodiment.
Based on the same inventive concept, an embodiment of the present invention further provides an apparatus for crawling data, fig. 4 is a schematic structural diagram of the apparatus for crawling data in the embodiment of the present invention, and as shown in fig. 4, the apparatus 400 for crawling data includes: the receiving module 410 is configured to receive task information sent by a server by a background service, wherein the task information is used for indicating data in a crawling user interface; a sending module 420 configured to respond to the task information by the background service and send the task information to the data crawling program; and the processing module 430 is configured to receive the crawling result of the data crawling program by the background service and send the crawling result to the server, wherein the crawling result is data in a user interface crawled by the data crawling program according to the task information.
Based on the foregoing embodiment, the sending module is configured to respond to the task information by the background service and determine a state of the data crawling program; when the data crawling program is in a closed state, starting the data crawling program and sending task information to the data crawling program; and when the data crawling program is in the running state, sending task information to the data crawling program.
Based on the foregoing embodiment, the sending module is configured to determine, when the data crawling program is in a running state, an execution order of the task information and task information being executed in the current data crawling program; when the task information is executed firstly, the task information is directly sent to the data crawling program, so that the data crawling program executes the task information firstly; when the executing task information is executed first, after the data crawling program finishes executing the executing task information, the task information is sent to the data crawling program, so that the data crawling program finishes executing the executing task information and then executes the task information.
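The ordering decision above can be sketched as follows. The passage does not say how the execution order is determined, so this sketch assumes a numeric priority in which a lower number runs first. The `Task` type, the `route` and `on_running_task_finished` functions, and the priority rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # lower number runs first (assumed ordering rule)


def route(
    new_task: Task,
    running_task: Optional[Task],
    send: Callable[[Task], None],
    deferred: List[Task],
) -> str:
    """Send the new task now if it should run first, otherwise hold it back."""
    if running_task is None or new_task.priority < running_task.priority:
        send(new_task)
        return "sent"
    deferred.append(new_task)
    return "deferred"


def on_running_task_finished(send: Callable[[Task], None], deferred: List[Task]) -> None:
    """Release one held-back task once the running task has finished."""
    if deferred:
        send(deferred.pop(0))


running = Task("current", priority=2)
sent: List[Task] = []
deferred: List[Task] = []
urgent_outcome = route(Task("urgent", priority=1), running, sent.append, deferred)
later_outcome = route(Task("later", priority=3), running, sent.append, deferred)
on_running_task_finished(sent.append, deferred)
sent_names = [task.name for task in sent]
```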
Based on the foregoing embodiment, the processing module is configured to send a first crawling result to the server when the data crawled by the data crawling program is not data required by the task information, the first crawling result including information indicating that the data crawled by the data crawling program is not data required by the task information; and/or to send a second crawling result to the server when an error occurs while the data crawling program is executing the task information, the second crawling result including information indicating that an error occurred while the data crawling program was executing the task information; and/or to send a third crawling result to the server when the data crawled by the data crawling program is data required by the task information, the third crawling result including the data crawled by the data crawling program and information indicating that the data crawled by the data crawling program is data required by the task information, so that the third crawling result can be processed.
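The three kinds of crawling result can be sketched as a single classification function. This is a minimal sketch, assuming crawled data is a dictionary of named fields and that "data required by the task information" means at least one required field is present. The function name, the result layout, and the choice to return exactly one result per call (the text allows "and/or") are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def build_crawl_result(
    crawled: Optional[Dict[str, str]],
    error: Optional[str],
    required: Set[str],
) -> Dict[str, object]:
    """Return the first, second, or third crawling result for one task run."""
    if error is not None:
        # Second crawling result: an error occurred during execution.
        return {"kind": "second", "info": f"error during execution: {error}"}
    if not crawled or not (required & set(crawled)):
        # First crawling result: nothing the task information asked for.
        return {"kind": "first", "info": "crawled data is not required data"}
    # Third crawling result: carries the data and the indication together.
    return {"kind": "third", "data": crawled, "info": "crawled data is required data"}


required = {"title", "price"}
ok = build_crawl_result({"title": "Widget"}, None, required)
wrong = build_crawl_result({"banner_ad": "Buy now"}, None, required)
failed = build_crawl_result(None, "app did not open", required)
```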
Based on the foregoing embodiment, the processing module is configured to send a third crawling result to the server when the data crawled by the data crawling program is all data required by the task information; when the data crawled by the data crawling program is partial data required by the task information, splicing the third crawling result with the additional result to generate a splicing result; sending the splicing result to a server; wherein the additional result is data which is different from the third crawling result and is needed by the task information.
Based on the foregoing embodiment, the apparatus further comprises: a first control module; and the first control module is configured to receive a control command sent by the server by the background service, wherein the control command is generated by the server according to the crawling result, and the control command is used for indicating to restart or terminate the data crawling program.
Based on the foregoing embodiment, the apparatus further comprises: a second control module; the second control module is configured to generate a task state according to the crawling result by the background service, wherein the task state is used for indicating whether the data crawling program completes the task in the task information; the background service determines whether to continue to acquire new task information sent by the server according to the task state; when the task state indicates that the data crawling program completes the task in the task information, the background service continues to acquire new task information; and when the task state indicates that the data crawling program does not complete the task in the task information, the background service stops acquiring new task information.
Based on the foregoing embodiment, the receiving module is configured to send, by the background service, an inquiry to the server at a preset time interval, where the inquiry is used to instruct the server to send task information to the background service; and the background service receives the task information sent by the server.
Based on the foregoing embodiments, the task information includes a crawling task and configuration information; the receiving module is configured for the background service to respond to the crawling task, the crawling task being used for indicating the crawling content, and for the background service to send the configuration information to the data crawling program, the configuration information specifying the operation steps by which the data crawling program completes the crawling task; and/or the processing module is configured for the background service to receive the data in the user interface that the data crawling program crawled according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
Here, it should be noted that: the above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention for understanding.
Based on the same inventive concept, an embodiment of the present invention further provides an apparatus for crawling data. Fig. 5 is a schematic structural diagram of the apparatus for crawling data in the embodiment of the present invention. Referring to fig. 5, the apparatus 500 for crawling data includes: a receiving module 510, configured for the data crawling program to receive task information sent by a background service, where the background service is used for receiving the task information sent by the server, and the task information is used for indicating data to be crawled in the user interface; a processing module 520, configured for the data crawling program to respond to the task information, crawl data in the user interface according to the task information, and generate a crawling result; and a sending module 530, configured for the data crawling program to send the crawling result to the background service, where the background service is used for sending the crawling result to the server.
Based on the foregoing embodiments, the task information includes: crawling tasks and configuration information; the crawling task is used for indicating the crawling content, and the configuration information is an operation step of completing the crawling task for a data crawling program; and the processing module is configured to crawl data in the user interface according to the operation steps in the configuration information and the crawling content indicated by the crawling task by the data crawling program.
Here, it should be noted that: the above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention for understanding.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, where the electronic device may include the data crawling apparatus in the above embodiment, and the electronic device may be a mobile phone, a tablet computer, or the like. The electronic device includes: at least one processor, and at least one memory, bus connected with the processor; the processor and the memory complete mutual communication through a bus; the processor is used for calling the program instructions in the memory to execute the method for crawling the data in one or more embodiments, and is configured to receive task information sent by the server by the background service, wherein the task information is used for indicating data in the crawling user interface; the background service responds to the task information and sends the task information to the data crawling program; and the background service receives the crawling result of the data crawling program and sends the crawling result to the server, wherein the crawling result is data in a user interface which is crawled by the data crawling program according to the task information.
Based on the foregoing embodiment, the processor is configured to respond to the task information by the background service and determine a state of the data crawling program; when the data crawling program is in a closed state, starting the data crawling program and sending task information to the data crawling program; and when the data crawling program is in the running state, sending task information to the data crawling program.
Based on the foregoing embodiment, the processor is configured to determine, when the data-crawling program is in a running state, an execution order of the task information and task information being executed in the current data-crawling program; when the task information is executed firstly, the task information is directly sent to the data crawling program, so that the data crawling program executes the task information firstly; when the executing task information is executed first, after the data crawling program finishes executing the executing task information, the task information is sent to the data crawling program, so that the data crawling program finishes executing the executing task information and then executes the task information.
Based on the foregoing embodiment, a processor configured to send a first crawling result to a server when data crawled by a data crawling program is not data required for task information, the first crawling result including information indicating that the data crawled by the data crawling program is not data required for task information; and/or when the data crawling program has an error in the process of executing the task information, sending a second crawling result to the server, wherein the second crawling result comprises information indicating that the data crawling program has an error in the process of executing the task information; and/or when the data crawled by the data crawling program is the data required by the task information, sending a third crawling result to the server, wherein the third crawling result comprises the data crawled by the data crawling program and information indicating that the data crawled by the data crawling program is the data required by the task information, so that the third crawling result is processed.
Based on the foregoing embodiment, the processor is configured to send a third crawling result to the server when the data crawled by the data crawling program is all data required by the task information; when the data crawled by the data crawling program is partial data required by the task information, splicing the third crawling result with the additional result to generate a splicing result; sending the splicing result to a server; wherein the additional result is data which is different from the third crawling result and is needed by the task information.
Based on the foregoing embodiment, the processor is further configured to receive, by the background service, a control command sent by the server, where the control command is generated by the server according to the crawling result, and the control command is used for instructing to restart or terminate the data crawling program.
Based on the foregoing embodiment, the processor is further configured to generate, by the background service, a task state according to the crawling result, where the task state is used to indicate whether the data crawling program completes the task in the task information; the background service determines whether to continue to acquire new task information sent by the server according to the task state; when the task state indicates that the data crawling program completes the task in the task information, the background service continues to acquire new task information; and when the task state indicates that the data crawling program does not complete the task in the task information, the background service stops acquiring new task information.
Based on the foregoing embodiment, the processor is configured to send, by the background service, a query to the server at a preset time interval, where the query is used to instruct the server to send task information to the background service; and the background service receives the task information sent by the server.
Based on the foregoing embodiments, the task information includes a crawling task and configuration information; the processor is configured for the background service to respond to the crawling task, the crawling task being used for indicating the crawling content, and for the background service to send the configuration information to the data crawling program, the configuration information specifying the operation steps by which the data crawling program completes the crawling task; and/or the processor is configured for the background service to receive the data in the user interface that the data crawling program crawled according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
Here, it should be noted that: the above description of the embodiments of the electronic device is similar to the description of the embodiments of the method described above, and has similar advantageous effects to the embodiments of the method. For technical details not disclosed in the embodiments of the electronic device according to the embodiments of the present invention, please refer to the description of the method embodiments of the present invention.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, where the electronic device may include the data crawling apparatus in the above embodiment, and the electronic device may be a mobile phone, a tablet computer, or the like. The electronic device includes: at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory communicate with each other through the bus; the processor is used for calling the program instructions in the memory to execute the method for crawling data in one or more of the above embodiments, and is configured for the data crawling program to receive task information sent by a background service, where the background service is used for receiving the task information sent by the server, and the task information is used for indicating data to be crawled in the user interface; for the data crawling program to respond to the task information, crawl data in the user interface according to the task information, and generate a crawling result; and for the data crawling program to send the crawling result to the background service, where the background service is used for sending the crawling result to the server.
Based on the foregoing embodiments, the task information includes: crawling tasks and configuration information; the crawling task is used for indicating the crawling content, and the configuration information is an operation step of completing the crawling task for a data crawling program; and the processor is configured to crawl data in the user interface according to the operation steps in the configuration information and the crawling content indicated by the crawling task by the data crawling program.
Here, it should be noted that: the above description of the embodiments of the electronic device is similar to the description of the embodiments of the method described above, and has similar advantageous effects to the embodiments of the method. For technical details not disclosed in the embodiments of the electronic device according to the embodiments of the present invention, please refer to the description of the method embodiments of the present invention.
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for crawling data as in one or more embodiments above.
Here, it should be noted that: the above description of the computer-readable storage medium embodiments is similar to the description of the method embodiments described above, with similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the computer-readable storage medium of the embodiments of the present invention, reference is made to the description of the method embodiments of the present invention for understanding.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of crawling data, the method comprising:
a background service receiving task information sent by a server, wherein the task information instructs the crawling of data in a user interface;
the background service, in response to the task information, sending the task information to a data crawling program; and
the background service receiving a crawling result from the data crawling program and sending the crawling result to the server, wherein the crawling result is the data in the user interface crawled by the data crawling program according to the task information.
2. The method of claim 1, wherein the background service receiving a crawling result from the data crawling program and sending the crawling result to the server comprises:
when the data crawled by the data crawling program is not the data required by the task information, sending a first crawling result to the server, wherein the first crawling result comprises information indicating that the data crawled by the data crawling program is not the data required by the task information; and/or
when the data crawling program encounters an error while executing the task information, sending a second crawling result to the server, wherein the second crawling result comprises information indicating that the data crawling program encountered an error while executing the task information; and/or
when the data crawled by the data crawling program is the data required by the task information, sending a third crawling result to the server, wherein the third crawling result comprises the data crawled by the data crawling program and information indicating that the data crawled by the data crawling program is the data required by the task information, so that the third crawling result can be processed.
3. The method of claim 1, further comprising:
the background service receiving a control command sent by the server, wherein the control command is generated by the server according to the crawling result and instructs that the data crawling program be restarted or terminated; and/or
the background service generating a task state according to the crawling result, wherein the task state indicates whether the data crawling program has completed the task in the task information; the background service determining, according to the task state, whether to continue acquiring new task information sent by the server; when the task state indicates that the data crawling program has completed the task in the task information, the background service continuing to acquire the new task information; and when the task state indicates that the data crawling program has not completed the task in the task information, the background service ceasing to acquire the new task information.
4. The method of claim 1, wherein the background service receiving task information sent by the server comprises:
the background service sending a query to the server at a preset time interval, wherein the query instructs the server to send the task information to the background service; and
the background service receiving the task information sent by the server.
5. The method of any of claims 1-4, wherein the task information comprises a crawling task and configuration information; wherein the background service, in response to the task information, sending the task information to a data crawling program comprises: the background service responding to the crawling task, wherein the crawling task indicates the content to be crawled; and the background service sending the configuration information to the data crawling program, wherein the configuration information specifies the operation steps by which the data crawling program completes the crawling task; and/or
the background service receiving the crawling result from the data crawling program comprises: the background service receiving the data in the user interface crawled by the data crawling program according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
6. A method of crawling data, the method comprising:
a data crawling program receiving task information sent by a background service, wherein the background service is used for receiving the task information sent by a server, and the task information instructs the crawling of data in a user interface;
the data crawling program, in response to the task information, crawling data in the user interface according to the task information and generating a crawling result; and
the data crawling program sending the crawling result to the background service, wherein the background service is used for sending the crawling result to the server.
7. The method of claim 6, wherein the task information comprises a crawling task and configuration information; the crawling task indicates the content to be crawled, and the configuration information specifies the operation steps by which the data crawling program completes the crawling task; and wherein the data crawling program, in response to the task information, crawling data in the user interface according to the task information comprises:
the data crawling program crawling the data in the user interface according to the operation steps in the configuration information and the crawling content indicated by the crawling task.
8. An apparatus for crawling data, the apparatus comprising:
a receiving module configured to receive, through a background service, task information sent by a server, wherein the task information instructs the crawling of data in a user interface;
a sending module configured to cause the background service to respond to the task information and send the task information to a data crawling program; and
a processing module configured to cause the background service to receive a crawling result from the data crawling program and send the crawling result to the server, wherein the crawling result is the data in the user interface crawled by the data crawling program according to the task information.
9. An apparatus for crawling data, the apparatus comprising:
a receiving module configured to cause a data crawling program to receive task information sent by a background service, wherein the background service is used for receiving the task information sent by a server, and the task information instructs the crawling of data in a user interface;
a processing module configured to cause the data crawling program to respond to the task information, crawl data in the user interface according to the task information, and generate a crawling result; and
a sending module configured to cause the data crawling program to send the crawling result to the background service, wherein the background service is used for sending the crawling result to the server.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 7.
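
As an informal illustration of the message flow recited in claims 1 to 4, the following Python sketch models one polling cycle of the background service. It is only a sketch under stated assumptions and is not the applicant's implementation: the names `BackgroundService`, `TaskInfo`, `CrawlResult`, `ResultKind`, `fetch_task`, `run_crawler`, `report` and `is_required` are hypothetical, the server and the data crawling program are stood in for by plain callables, and treating only the third crawling result as "task completed" is an assumption made for the example.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, List, Optional


class ResultKind(Enum):
    WRONG_DATA = 1      # "first crawling result": data is not what the task requires
    CRAWLER_ERROR = 2   # "second crawling result": the crawler raised an error
    REQUIRED_DATA = 3   # "third crawling result": data is what the task requires


@dataclass
class TaskInfo:
    crawl_task: str                                   # what to crawl (claim 5)
    config: List[str] = field(default_factory=list)   # operation steps (claim 5)


@dataclass
class CrawlResult:
    kind: ResultKind
    data: Any = None
    detail: str = ""


class BackgroundService:
    """Hypothetical relay between a server and a data crawling program."""

    def __init__(self,
                 fetch_task: Callable[[], Optional[TaskInfo]],
                 run_crawler: Callable[[TaskInfo], Any],
                 report: Callable[[CrawlResult], None],
                 is_required: Callable[[TaskInfo, Any], bool]) -> None:
        self.fetch_task = fetch_task      # stands in for querying the server
        self.run_crawler = run_crawler    # stands in for the data crawling program
        self.report = report              # stands in for sending a result to the server
        self.is_required = is_required    # assumed check that the data matches the task
        self.task_done = True             # task state (claim 3); True allows a new fetch

    def poll_once(self) -> Optional[CrawlResult]:
        # Claim 3: stop acquiring new tasks while the last one is not completed.
        if not self.task_done:
            return None
        task = self.fetch_task()
        if task is None:
            return None
        try:
            data = self.run_crawler(task)
        except Exception as exc:
            result = CrawlResult(ResultKind.CRAWLER_ERROR, detail=str(exc))
        else:
            if self.is_required(task, data):
                result = CrawlResult(ResultKind.REQUIRED_DATA, data=data)
            else:
                result = CrawlResult(ResultKind.WRONG_DATA,
                                     detail="crawled data does not match the task")
        # Assumption: only a third crawling result counts as a completed task.
        self.task_done = result.kind is ResultKind.REQUIRED_DATA
        self.report(result)
        return result

    def run(self, interval_seconds: float, max_cycles: int) -> None:
        # Claim 4: query the server at a preset time interval.
        for _ in range(max_cycles):
            self.poll_once()
            time.sleep(interval_seconds)
```

In this sketch a caller would construct `BackgroundService` with its own transport functions and call `run` with the preset interval. The restart or terminate control command of claim 3 is not modelled; a real service would need some way to reset `task_done` after the server intervenes.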
CN201811145420.3A 2018-09-29 2018-09-29 Method and device for crawling data Pending CN110968755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811145420.3A CN110968755A (en) 2018-09-29 2018-09-29 Method and device for crawling data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811145420.3A CN110968755A (en) 2018-09-29 2018-09-29 Method and device for crawling data

Publications (1)

Publication Number Publication Date
CN110968755A (en) 2020-04-07

Family

ID=70027153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811145420.3A Pending CN110968755A (en) 2018-09-29 2018-09-29 Method and device for crawling data

Country Status (1)

Country Link
CN (1) CN110968755A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100100963A1 (en) * 2008-10-21 2010-04-22 Flexilis, Inc. System and method for attack and malware prevention
CN103856467A (en) * 2012-12-06 2014-06-11 百度在线网络技术(北京)有限公司 Method and distributed system for achieving safety scanning
US20150058324A1 (en) * 2013-08-19 2015-02-26 Joseph Gregory Kauwe Systems and methods of enabling integrated activity scheduling, sharing and real-time social connectivity through an event-sharing platform
CN104484405A (en) * 2014-12-15 2015-04-01 北京国双科技有限公司 Method and device for carrying out crawling task
CN104714875A (en) * 2015-03-11 2015-06-17 浪潮集团有限公司 Distributed automatic collecting method
CN107562541A (en) * 2017-09-05 2018-01-09 广东科杰通信息科技有限公司 A kind of distributed reptile method of load balancing, crawler system
CN107689951A (en) * 2017-07-26 2018-02-13 上海壹账通金融科技有限公司 Web data crawling method, device, user terminal and readable storage medium storing program for executing
CN107832136A (en) * 2017-11-28 2018-03-23 广州启生信息技术有限公司 The management method and device of a kind of web crawler
CN107870861A (en) * 2017-10-10 2018-04-03 上海壹账通金融科技有限公司 The concurrent testing method and application server of web page crawl
CN108089967A (en) * 2017-12-12 2018-05-29 成都睿码科技有限责任公司 A kind of method for crawling Android mobile phone App data

Similar Documents

Publication Publication Date Title
CN108182077B (en) Device, firmware upgrading method and device of device and storage medium
CN109376088B (en) Automatic test system and automatic test method
CN108121543B (en) Software code compiling processing method and device
CN108228444B (en) Test method and device
WO2017193737A1 (en) Software testing method and system
CN111026645A (en) User interface automatic testing method and device, storage medium and electronic equipment
WO2018076969A1 (en) Software upgrade method, computing device, and channel control device
CN107066600A (en) Automatic method, system, mobile terminal and the readable storage medium storing program for executing for skipping advertising page
US9170924B2 (en) Ecosystem certification of a partner product
CN107967207B (en) Method and device for testing user interface interaction function
CN110609755A (en) Message processing method, device, equipment and medium for cross-block chain node
CN108733545B (en) Pressure testing method and device
CN110543429B (en) Test case debugging method, device and storage medium
CN109032705B (en) Application program execution method and device, electronic equipment and readable storage medium
WO2020041957A1 (en) Vehicle diagnostic method and device and readable storage medium
US20210097787A1 (en) Information presentation method and apparatus
CN110750453B (en) HTML 5-based intelligent mobile terminal testing method, system, server and storage medium
CN107025126B (en) Resource scheduling method, NFVO and system
CN110968755A (en) Method and device for crawling data
CN111045919B (en) Method, device, background server, storage medium and system for debugging program
CN106484604B (en) Application test control method and device
CN107967363B (en) Data processing method and device and electronic equipment
CN107168756B (en) Radio frequency driving compiling and debugging method, client, server and storage device
CN113448607B (en) Method and device for firmware upgrading and intelligent household appliance
CN111400173B (en) VTS test method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407