CN108062244B - Crawler task canceling method and device - Google Patents

Info

Publication number
CN108062244B
Authority
CN
China
Prior art keywords
target
crawler task
crawler
task
running
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610987134.6A
Other languages
Chinese (zh)
Other versions
CN108062244A (en)
Inventor
朱长坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201610987134.6A priority Critical patent/CN108062244B/en
Publication of CN108062244A publication Critical patent/CN108062244A/en
Application granted granted Critical
Publication of CN108062244B publication Critical patent/CN108062244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a method and a device for canceling a crawler task. The method comprises the following steps: detecting whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; if it is detected that the target crawler task has been injected into the target message queue, determining the unique identifier of the target crawler task and running the target crawler task; detecting, while the target crawler task is running, whether a cancel instruction has been received; if the cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier; and, when a change in the running state is detected, canceling the running target crawler task according to the modified running state. The method and the device solve the problem in the related art that canceling a crawler task wastes a large amount of the system's communication resources.

Description

Crawler task canceling method and device
Technical Field
The application relates to the technical field of internet, in particular to a method and a device for canceling a crawler task.
Background
In both general-purpose and temporary crawler frameworks, a user injects a crawler task into the crawler system through a WebAPI, and can cancel a running crawler task through the WebAPI task cancellation interface. The general-purpose crawler cancels tasks through an observer pattern over remotely broadcast messages (for example, publish-subscribe): each crawler module subscribes to push messages for a certain keyword when it starts, the WebAPI publishes a message carrying that keyword when a crawler task is to be cancelled, and the crawler module, having subscribed to that keyword, receives the task cancellation message. The temporary crawler framework uses the same general crawler module, but differs in that each temporary crawler task starts its own crawler process, whose resources are reclaimed after the crawl completes, so its overall architecture differs from that of the general-purpose crawler. If the remote-broadcast cancellation mechanism of the general-purpose crawler were used to cancel temporary crawler tasks, the crawler module of every crawler task would subscribe to messages pushed under the same keyword. All subscribers would receive each published message, but the message is meaningful to only one of them and meaningless to the rest, which wastes communication resources.
In the temporary crawler framework, a crawler module is started for each temporary crawler task. If the general-purpose crawler's mechanism is adopted, every temporary crawler task creates its own task cancellation subscription, so when many temporary crawler tasks run at the same time there are many cancellation subscribers. Once the WebAPI publishes a cancellation message for a particular crawler task, all subscribers are notified, yet the cancellation applies to only one of them, which wastes a large amount of communication resources. Moreover, once the WebAPI publishes a cancellation message for a crawler task, all resources (for example, crawler process resources and message queue resources) need to be reclaimed immediately, which is difficult to achieve in the existing system with the general-purpose crawler's cancellation mechanism.
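The cost difference described above can be illustrated with a small counting model. This is only a sketch under stated assumptions: the function names and task IDs are hypothetical, and the targeted case assumes a single watcher per task node.

```python
# Hypothetical model: count message deliveries caused by one cancel request.

def broadcast_deliveries(running_task_ids, cancelled_task_id):
    """Publish-subscribe: every running crawler process subscribed to the same
    keyword, so each one receives the cancel message and must inspect it."""
    delivered = len(running_task_ids)
    useful = sum(1 for task_id in running_task_ids if task_id == cancelled_task_id)
    return delivered, useful


def targeted_deliveries(running_task_ids, cancelled_task_id):
    """Per-task state node: only the watcher of the cancelled task is notified
    (assumes one watcher per node)."""
    delivered = 1 if cancelled_task_id in running_task_ids else 0
    return delivered, delivered


tasks = [f"task-{i}" for i in range(100)]
print(broadcast_deliveries(tasks, "task-42"))  # (100, 1)
print(targeted_deliveries(tasks, "task-42"))   # (1, 1)
```

With 100 concurrent temporary tasks, the broadcast model delivers 100 messages of which 99 are discarded, while the per-task model delivers one.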
No effective solution has yet been proposed for the problem in the related art that canceling a crawler task wastes a large amount of the system's communication resources.
Disclosure of Invention
The present application mainly aims to provide a method and an apparatus for canceling a crawler task, so as to solve the problem of a lot of communication resource waste of a system when canceling the crawler task in the related art.
In order to achieve the above object, according to one aspect of the present application, a method for canceling a crawler task is provided. The method comprises the following steps: detecting whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; if it is detected that the target crawler task has been injected into the target message queue, determining the unique identifier of the target crawler task and running the target crawler task; detecting, while the target crawler task is running, whether a cancel instruction has been received, wherein the cancel instruction is an instruction to cancel the running target crawler task; if the cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier; and, when a change in the running state is detected, canceling the running target crawler task according to the modified running state.
Further, after running the target crawler task, the method further comprises: creating a target child node under the crawler task running root node, wherein the crawler task running root node is used for recording the crawler tasks in the running state in the crawler system, and the target child node is used for recording information about the target crawler task; adding running state data of the crawler task to the target child node; and setting a monitoring mechanism for the target child node, wherein the monitoring mechanism is used for monitoring changes in the running state of the target crawler task.
Further, if a cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier comprises: if the cancel instruction is received, calling a crawler task cancellation interface to send the unique identifier of the target crawler task; searching, under the crawler task running root node and according to the received unique identifier, for the target child node that is executing the target crawler task; and modifying the running state of the crawler task in the target child node that was found.
Further, when a change in the running state is detected, canceling the running target crawler task according to the modified running state comprises: when the monitoring mechanism observes that the running state of the target crawler task has changed, sending a start message to the monitoring unit of the crawler system, wherein the start message instructs that a crawler task termination program be started to delete the message queue of the process resources of the crawler system; and, after the monitoring unit receives the start message, starting the crawler task termination program to delete the message queue of the process resources of the crawler system.
Further, before detecting whether the target crawler task has been injected into the target message queue, the method further comprises at least one of the following steps: initializing the target message queue; initializing the message queue of the process resources of the crawler system; and initializing the ZooKeeper nodes.
Further, initializing the ZooKeeper nodes further includes: initializing the crawler task running root node; and/or initializing the crawler task cancellation root node, wherein the crawler task cancellation root node is used for recording cancelled crawler tasks.
In order to achieve the above object, according to another aspect of the present application, a crawler task canceling device is provided. The device includes: a first detection unit, configured to detect whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; a determining unit, configured to determine the unique identifier of the target crawler task and run the target crawler task when it is detected that the target crawler task has been injected into the target message queue; a second detection unit, configured to detect, while the target crawler task is running, whether a cancel instruction has been received, wherein the cancel instruction is an instruction to cancel the running target crawler task; a modification unit, configured to modify the running state of the target crawler task according to its unique identifier when the cancel instruction is received; and an execution unit, configured to cancel the running target crawler task according to the modified running state when a change in the running state is detected.
Further, the device further includes: a creating unit, configured to create a target child node under the crawler task running root node after the target crawler task is run, wherein the crawler task running root node is used for recording the crawler tasks in the running state in the crawler system, and the target child node is used for recording information about the target crawler task; an adding unit, configured to add running state data of the crawler task to the target child node; and a setting unit, configured to set a monitoring mechanism for the target child node, wherein the monitoring mechanism is used for monitoring changes in the running state of the target crawler task.
Further, the modification unit includes: a first sending module, configured to call a crawler task cancellation interface to send the unique identifier of the target crawler task when the cancel instruction is received; a searching module, configured to search, under the crawler task running root node and according to the received unique identifier, for the target child node that is executing the target crawler task; and a modification module, configured to modify the running state of the crawler task in the target child node that was found.
Further, the execution unit includes: a second sending module, configured to send a start message to the monitoring unit of the crawler system when the monitoring mechanism observes that the running state of the target crawler task has changed, wherein the start message instructs that a crawler task termination program be started to delete the message queue of the process resources of the crawler system; and a deleting module, configured to start the crawler task termination program to delete the message queue of the process resources of the crawler system after the monitoring unit receives the start message.
The present application adopts the following steps: detecting whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; if it is detected that the target crawler task has been injected into the target message queue, determining the unique identifier of the target crawler task and running the target crawler task; detecting, while the target crawler task is running, whether a cancel instruction has been received, wherein the cancel instruction is an instruction to cancel the running target crawler task; if the cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier; and, when a change in the running state is detected, canceling the running target crawler task according to the modified running state. This solves the problem in the related art that canceling a crawler task wastes a large amount of the system's communication resources, and thereby reduces the communication overhead of the system when a crawler task is cancelled.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for canceling a crawler task provided according to an embodiment of the present application; and
FIG. 2 is a schematic diagram of a crawler task cancellation device provided according to an embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
ZooKeeper is an open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization, and group services. ZooKeeper aims to encapsulate complex, error-prone key services and to offer users a simple, easy-to-use interface on top of a high-performance, functionally stable system. As a distributed service framework, ZooKeeper is mainly used to solve consistency problems of application systems in a distributed cluster. It stores data in a directory-node tree similar to a file system, but it is not intended as a dedicated data store; its main role is to maintain and monitor state changes of the data it holds, and cluster management can be built on monitoring those changes. The ZooKeeper watch mechanism: a ZooKeeper node can be watched for modification of the data it stores and for changes to its child nodes, and the client that set the watch is notified when such a change occurs. This is ZooKeeper's most important feature for applications, and it can be used to implement centralized configuration management, cluster management, distributed locks, and so on. The official description of the watch mechanism states: a watch event is a one-time trigger; when the data on which a watch is set changes, the server sends the change to the clients that set the watch in order to notify them.
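The one-time trigger behaviour of a watch can be illustrated with a small in-memory model. This is a sketch only: `TinyNodeStore` and its methods are hypothetical stand-ins and are not the API of ZooKeeper or of any ZooKeeper client library.

```python
from typing import Callable, Dict, List, Optional

WatchCallback = Callable[[str, str], None]


class TinyNodeStore:
    """In-memory stand-in for a ZooKeeper-like node store (not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[WatchCallback]] = {}

    def create(self, path: str, data: str) -> None:
        self._data[path] = data

    def get(self, path: str, watch: Optional[WatchCallback] = None) -> str:
        # Reading with a watch registers a one-time callback for the next change.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set(self, path: str, data: str) -> None:
        self._data[path] = data
        # Remove the watches before firing them, so each fires at most once.
        for callback in self._watches.pop(path, []):
            callback(path, data)


events = []
store = TinyNodeStore()
store.create("/demo", "v0")
store.get("/demo", watch=lambda path, data: events.append((path, data)))
store.set("/demo", "v1")  # fires the watch
store.set("/demo", "v2")  # no watch is registered any more, nothing fires
print(events)  # [('/demo', 'v1')]
```

A client that wants to keep observing a node therefore has to register a new watch each time it is notified.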
According to an embodiment of the application, a method for canceling a crawler task is provided.
Fig. 1 is a flowchart of a method for canceling a crawler task according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
Step S101, detecting whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks.
The system periodically checks whether a target crawler task has been injected into the target message queue. After the WebAPI has started, an external user calls the WebAPI task injection interface and passes the task parameters in JSON format; the system then detects that the target crawler task has been injected into the target message queue.
Optionally, in the method for canceling a crawler task provided in the embodiment of the present application, before detecting whether the target crawler task has been injected into the target message queue, the method further includes performing at least one of the following steps: initializing the target message queue; initializing the message queue of the process resources of the crawler system; and initializing the ZooKeeper nodes.
Optionally, in the method for canceling a crawler task provided in the embodiment of the present application, initializing the ZooKeeper nodes further includes: initializing the crawler task running root node; and/or initializing the crawler task cancellation root node, wherein the crawler task cancellation root node is used for recording cancelled crawler tasks.
Specifically, before the steps of the present application are executed, the monitoring agent of the crawler system is initialized. The monitoring agent initializes three important resources: the monitoring agent message queue, the process resources (the crawler, preprocessing, and task termination programs), and ZooKeeper. Initializing the monitoring agent message queue means creating the message queue of the monitoring agent, into which new tasks are injected. Initializing the process resources means initializing the paths of the process resources, so that the corresponding programs can be started from those paths. When the WebAPI is initialized, three nodes are created in ZooKeeper: the crawler task running root node, the monitoring agent root node, and the task cancellation root node. Initializing ZooKeeper means initializing certain ZooKeeper nodes, namely these three root nodes, and then creating a child node under the monitoring agent root node whose name is the name of the monitoring agent. The task running root node is the root of the running crawler tasks: once a task starts crawling, a child node is created under it. The monitoring agent root node is the root of the monitoring agents: when a new monitoring agent is deployed, a child node is created under it. The task cancellation root node is the root under which a child node is created when a task is cancelled.
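The initialization described above might be sketched as follows. This is an illustration under assumptions: the node paths, program paths, and the function name are hypothetical (the source names the three root nodes but gives no paths), and a plain dictionary stands in for the ZooKeeper node tree.

```python
import queue

# Hypothetical node paths for the three root nodes named in the description.
RUNNING_ROOT = "/crawler/running"
AGENT_ROOT = "/crawler/agents"
CANCEL_ROOT = "/crawler/cancelled"


def init_monitoring_agent(nodes: dict, agent_name: str) -> dict:
    """Set up the three resources the description lists for a monitoring agent."""
    # 1. Message queue into which new tasks are injected.
    task_queue = queue.Queue()
    # 2. Paths of the programs the agent can start later (placeholder values).
    program_paths = {
        "crawler": "bin/crawler",
        "preprocess": "bin/preprocess",
        "terminator": "bin/terminator",
    }
    # 3. Root nodes (created only if absent) plus this agent's own child node,
    #    which is named after the agent.
    for root in (RUNNING_ROOT, AGENT_ROOT, CANCEL_ROOT):
        nodes.setdefault(root, "")
    nodes[f"{AGENT_ROOT}/{agent_name}"] = ""
    return {"name": agent_name, "queue": task_queue, "programs": program_paths}


nodes = {}
agent = init_monitoring_agent(nodes, "agent-1")
print(sorted(nodes))
```

Using `setdefault` for the root nodes keeps the sketch safe to call once per agent, since several agents share the same roots.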
Step S102, if it is detected that the target crawler task has been injected into the target message queue, determining the unique identifier of the target crawler task and running the target crawler task.
The WebAPI task interface obtains all child nodes of the monitoring agent root node and randomly selects one agent from them. The task is then sent to the message queue of the selected agent. If the task is successfully sent to the message queue of that monitoring agent of the crawler system, a task ID is returned to the user as the unique identifier of the target crawler task.
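A minimal sketch of this dispatch step is shown below. The function name, the use of a UUID as the task ID, and the dictionary of per-agent queues are assumptions made for illustration; the source does not specify how the ID is generated.

```python
import json
import queue
import random
import uuid


def inject_task(agent_queues: dict, task_params_json: str, rng=random) -> str:
    """Pick one registered monitoring agent at random and enqueue the task."""
    if not agent_queues:
        raise RuntimeError("no monitoring agent is registered")
    params = json.loads(task_params_json)  # task parameters arrive as JSON
    task_id = uuid.uuid4().hex             # hypothetical ID scheme
    agent_name = rng.choice(sorted(agent_queues))
    agent_queues[agent_name].put({"id": task_id, "params": params})
    return task_id                         # handed back as the unique identifier


agent_queues = {"agent-1": queue.Queue(), "agent-2": queue.Queue()}
task_id = inject_task(agent_queues, '{"url": "https://example.com"}')
queued = sum(q.qsize() for q in agent_queues.values())
print(task_id, queued)
```

Because the ID is returned only after the task has been placed on a queue, the caller can later use it to address that one task when cancelling.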
Optionally, in the method for canceling a crawler task provided in the embodiment of the present application, after the target crawler task is run, the method further includes: creating a target child node under the crawler task running root node, wherein the crawler task running root node is used for recording the crawler tasks in the running state in the crawler system, and the target child node is used for recording information about the target crawler task; adding running state data of the crawler task to the target child node; and setting a monitoring mechanism for the target child node, wherein the monitoring mechanism is used for monitoring changes in the running state of the target crawler task.
Specifically, running the target crawler task proceeds as follows once the monitoring agent program observes a new task in its message queue. Note that monitoring here means that the monitoring agent program is bound to the message queue through an event, and the event is triggered when a new task is injected into the message queue. First, a child node is created under the task running root node; the node is named after the task ID, running state data is written into it, and a watch is set on the node's data, so that whenever the node's data changes the monitoring agent is notified and handles the change accordingly. The monitoring agent then invokes the process resources and starts the programs (the crawler, preprocessing, and task termination programs) from the initialized paths. When the task termination program starts, it also sets a watch on the task running node.
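The agent's handling of a new task could look roughly like the sketch below. All names are hypothetical: the node path, the `"Running"` state string, and the `start_program` callback (which stands in for launching a real process) are assumptions, and dictionaries stand in for the ZooKeeper nodes and watches.

```python
def start_task(task, nodes, watches, start_program, on_state_change):
    """Handle one task taken from the agent's queue (illustrative names only)."""
    path = "/crawler/running/" + task["id"]  # the node is named after the task ID
    nodes[path] = "Running"                  # write the running-state data
    # The agent's own watch on the node's data.
    watches.setdefault(path, []).append(on_state_change)
    # Start the three programs; start_program is a stand-in for a process launch.
    started = [start_program(name, task) for name in ("crawler", "preprocess", "terminator")]
    return path, started


nodes, watches = {}, {}
path, started = start_task(
    {"id": "task-001"},
    nodes,
    watches,
    start_program=lambda name, task: f"{name}:{task['id']}",
    on_state_change=lambda node_path, state: None,
)
print(path, nodes[path], started)
```

In the described system the termination program would register its own watch on the same node when it starts; the sketch omits that step and registers only the agent's watch.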
Step S103, detecting, while the target crawler task is running, whether a cancel instruction has been received, wherein the cancel instruction is an instruction to cancel the running target crawler task.
While the target crawler task is running, a user may need to cancel it, that is, send a cancel instruction. The system detects whether an instruction to cancel the running target crawler task has been received.
Step S104, if the cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier.
Optionally, in the method for canceling a crawler task provided in the embodiment of the present application, if a cancel instruction is received, modifying the running state of the target crawler task according to its unique identifier includes: if the cancel instruction is received, calling a crawler task cancellation interface to send the unique identifier of the target crawler task; searching, under the crawler task running root node and according to the received unique identifier, for the target child node that is executing the target crawler task; and modifying the running state of the crawler task in the target child node that was found.
If a cancel instruction is received, the WebAPI task cancellation interface is called with the task ID (the unique identifier of the target crawler task) as its parameter. Inside the WebAPI task cancellation interface, the node of the executing task is looked up under the task running root node in ZooKeeper according to the task ID, and the node data is modified, that is, the execution state of the task is changed to being cancelled (Canceling).
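The lookup-and-modify step can be sketched as follows. The function name and node paths are hypothetical, and dictionaries stand in for the ZooKeeper nodes and their one-time watches; only the state name `Canceling` comes from the description.

```python
def cancel_task(nodes: dict, watches: dict, task_id: str) -> bool:
    """Mark one running task as Canceling and notify only that node's watchers."""
    path = f"/crawler/running/{task_id}"
    if path not in nodes:
        return False               # no running task has this identifier
    nodes[path] = "Canceling"
    # One-time watches: remove them from the registry before calling them.
    for callback in watches.pop(path, []):
        callback(path, "Canceling")
    return True


nodes = {
    "/crawler/running/task-001": "Running",
    "/crawler/running/task-002": "Running",
}
notified = []
watches = {p: [lambda path, state: notified.append((path, state))] for p in nodes}

ok = cancel_task(nodes, watches, "task-001")
missing = cancel_task(nodes, watches, "task-999")
print(ok, missing, notified)
```

Only the watcher of `task-001` is called; the watcher registered for `task-002` stays untouched, which is the contrast with the broadcast approach in the Background.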
Step S105, when a change in the running state is detected, canceling the running target crawler task according to the modified running state of the target crawler task.
Optionally, in the method for canceling a crawler task provided in the embodiment of the present application, when a change in the running state is detected, canceling the running target crawler task according to the modified running state includes: when the monitoring mechanism observes that the running state of the target crawler task has changed, sending a start message to the monitoring unit of the crawler system, wherein the start message instructs that a crawler task termination program be started to delete the message queue of the process resources of the crawler system; and, after the monitoring unit receives the start message, starting the crawler task termination program to delete the message queue of the process resources of the crawler system.
When the WebAPI task cancellation interface changes the data of the executing task node to the being-cancelled state, the task termination program that was started earlier receives a notification and deletes the message queue of the process resources. It then changes the task node data to the cancelled state (cancelled). The monitoring agent then receives the notification that the task has been cancelled and deletes the process resources, which completes the task cancellation.
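The two-stage hand-off between the termination program and the monitoring agent can be modelled as below. This is a sketch under assumptions: the `Node` class, the callback names, and the `"Cancelled"` state string are hypothetical, and watches are modelled as one-time callbacks that the agent re-registers until it sees the final state.

```python
class Node:
    """One task node with one-time watches (in-memory stand-in, not ZooKeeper)."""

    def __init__(self, state):
        self.state = state
        self._watches = []

    def watch(self, callback):
        self._watches.append(callback)

    def set(self, state):
        self.state = state
        fired, self._watches = self._watches, []  # clear before firing
        for callback in fired:
            callback(self)


log = []
resources = {"process_queue": True, "processes": True}


def terminator_watch(node):
    # Stage 1: the termination program reacts to the Canceling state.
    if node.state == "Canceling":
        resources["process_queue"] = False
        log.append("terminator: process message queue deleted")
        node.set("Cancelled")  # triggers the second state change


def agent_watch(node):
    # Stage 2: the monitoring agent reacts only to the final state.
    if node.state == "Cancelled":
        resources["processes"] = False
        log.append("agent: process resources deleted")
    else:
        node.watch(agent_watch)  # re-register, since watches fire only once


task_node = Node("Running")
task_node.watch(agent_watch)       # set when the agent creates the node
task_node.watch(terminator_watch)  # set when the termination program starts
task_node.set("Canceling")         # what the cancellation interface does
print(task_node.state, log)
```

Splitting the cancellation into two state changes lets each party release only what it owns: the termination program removes the queue first, and the agent reclaims the processes once the node reports the final state.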
In summary, through the above steps, a temporary task is injected into the message queue of a monitoring agent (monitor agent) through the open WebAPI interface. The monitor agent then starts the corresponding crawler preprocessing program, crawler program, and termination program module, and creates a task node in ZooKeeper to track the running state of the task. The termination program module also watches the task state in ZooKeeper, and once the task state changes it receives a notification and takes the corresponding action. When the task state changes to being cancelled, the termination program module receives the notification and cancels the crawler task. Because only the programs watching that task's node are notified, the waste of system communication resources is avoided, and the communication overhead of the system when a crawler task is cancelled is reduced.
The method for canceling a crawler task provided in the embodiment of the present application detects whether a target crawler task has been injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; if it is detected that the target crawler task has been injected into the target message queue, determines the unique identifier of the target crawler task and runs the target crawler task; detects, while the target crawler task is running, whether a cancel instruction has been received, wherein the cancel instruction is an instruction to cancel the running target crawler task; if the cancel instruction is received, modifies the running state of the target crawler task according to its unique identifier; and, when a change in the running state is detected, cancels the running target crawler task according to the modified running state. This solves the problem in the related art that canceling a crawler task wastes a large amount of the system's communication resources, and thereby reduces the communication overhead of the system when a crawler task is cancelled.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the present application further provides a device for canceling the crawler task, and it should be noted that the device for canceling the crawler task of the embodiment of the present application may be used to execute the method for canceling the crawler task provided by the embodiment of the present application. The following describes a crawler task cancellation device provided in an embodiment of the present application.
Fig. 2 is a schematic diagram of a crawler task cancellation apparatus according to an embodiment of the present application. As shown in fig. 2, the apparatus includes: a first detection unit 10, a determining unit 20, a second detection unit 30, a modifying unit 40 and an execution unit 50.
Specifically, the first detection unit 10 is configured to detect whether a target crawler task is injected into a target message queue, where the target message queue is the message queue belonging to a monitoring unit of the crawler system, and the monitoring unit is configured to monitor crawler tasks.
The determining unit 20 is configured to determine the unique identifier of the target crawler task and run the target crawler task when it is detected that the target crawler task is injected into the target message queue.
The second detection unit 30 is configured to detect whether a cancel instruction is received while the target crawler task is running, where the cancel instruction is an instruction to cancel running the target crawler task.
The modifying unit 40 is configured to modify the running state of the target crawler task according to the unique identifier of the target crawler task when the cancel instruction is received.
The execution unit 50 is configured to cancel running the target crawler task according to the modified running state of the target crawler task when a change in the running state is detected.
According to the device for canceling a crawler task provided by the embodiment of the present application, the first detection unit 10 detects whether a target crawler task is injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of the crawler system, and the monitoring unit is used for monitoring crawler tasks; the determining unit 20 determines the unique identifier of the target crawler task and runs the target crawler task when it is detected that the target crawler task is injected into the target message queue; the second detection unit 30 detects whether a cancel instruction is received while the target crawler task is running, wherein the cancel instruction is an instruction to cancel running the target crawler task; the modifying unit 40 modifies the running state of the target crawler task according to its unique identifier when the cancel instruction is received; and the execution unit 50 cancels running the target crawler task according to the modified running state when a change in the running state is detected. This solves the problem in the related art that canceling a crawler task wastes a large amount of the system's communication resources, and thereby reduces the system's communication overhead when a crawler task is cancelled.
Optionally, in the apparatus for canceling a crawler task provided in an embodiment of the present application, the apparatus further includes: a creating unit, configured to create a target child node under a crawler task running root node after the target crawler task is run, where the crawler task running root node is used for recording the crawler tasks in a running state in the crawler system, and the target child node is used for recording information of the target crawler task; an adding unit, configured to add data on the running state of the crawler task in the target child node; and a setting unit, configured to set a monitoring mechanism for the target child node, where the monitoring mechanism is used for monitoring changes in the running state of the target crawler task.
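The three units above (create a child node, add state data, set a monitoring mechanism) can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: `NodeTree` is a hypothetical in-memory tree, the path `/crawler/running` is an invented name for the crawler task running root node, and the watch is modelled as firing once, in the manner of ZooKeeper's standard one-time watches.

```python
from typing import Callable, Dict, List, Optional

RUNNING_ROOT = "/crawler/running"  # hypothetical path of the running root node

Watch = Callable[[str, bytes], None]


class NodeTree:
    """Hypothetical in-memory tree of path -> data with one-shot watches."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watch]] = {}

    def create(self, path: str, data: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0]
        if parent and parent not in self._data:
            raise KeyError(f"parent node {parent} does not exist")
        if path in self._data:
            raise KeyError(f"node {path} already exists")
        self._data[path] = data

    def children(self, path: str) -> List[str]:
        prefix = path + "/"
        names = (p[len(prefix):] for p in self._data if p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)

    def get(self, path: str, watch: Optional[Watch] = None) -> bytes:
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set(self, path: str, data: bytes) -> None:
        if path not in self._data:
            raise KeyError(f"node {path} does not exist")
        self._data[path] = data
        # One-shot: watches are removed before they fire.
        for callback in self._watches.pop(path, []):
            callback(path, data)


def register_running_task(tree: NodeTree, task_id: str, on_change: Watch) -> str:
    path = f"{RUNNING_ROOT}/{task_id}"
    tree.create(path)                # creating unit: child node under the root
    tree.set(path, b"running")       # adding unit: running-state data
    tree.get(path, watch=on_change)  # setting unit: monitor later changes
    return path
```

Because the watch is registered after the initial state is written, the callback fires only for later changes such as a cancellation. With one-shot semantics, a caller that needs further notifications would have to register the watch again.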
Optionally, in the device for canceling a crawler task provided in an embodiment of the present application, the modifying unit 40 includes: a first sending module, configured to call a crawler task cancellation interface to send the unique identifier of the target crawler task when the cancel instruction is received; a searching module, configured to search, under the crawler task running root node and according to the received unique identifier of the target crawler task, for the target child node that is executing the target crawler task; and a modifying module, configured to modify the running state of the crawler task under the found target child node.
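A minimal sketch of the modifying unit's three steps (send the identifier, find the child node, modify its state) is shown below. The names `TaskNode`, `RunningRoot` and `cancel_crawler_task`, and the state strings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskNode:
    """Child node recording one running crawler task (hypothetical layout)."""
    task_id: str
    state: str = "running"


@dataclass
class RunningRoot:
    """Stand-in for the crawler task running root node."""
    children: Dict[str, TaskNode] = field(default_factory=dict)

    def find(self, task_id: str) -> Optional[TaskNode]:
        return self.children.get(task_id)


def cancel_crawler_task(root: RunningRoot, task_id: str) -> bool:
    """Hypothetical cancel interface: returns True if the state was modified."""
    node = root.find(task_id)  # search by the unique identifier
    if node is None or node.state != "running":
        return False           # unknown task, or not currently running
    node.state = "cancelled"   # the only write a cancel request performs
    return True
```

The caller only needs the task's unique identifier. In this sketch, repeated or unknown cancel requests are reported as no-ops rather than errors, which is a design choice of the example and not something the source specifies.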
Optionally, in the apparatus for canceling a crawler task provided in an embodiment of the present application, the execution unit 50 includes: a second sending module, configured to send a start message to the monitoring unit of the crawler system when the monitoring mechanism detects that the running state of the target crawler task has changed, where the start message instructs that a crawler task termination program be started to delete the message queue of process resources of the crawler system; and a deleting module, configured to start the crawler task termination program to delete the message queue of process resources of the crawler system after the monitoring unit receives the start message.
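The hand-off described above (state change, start message, termination program, queue deletion) might look like the following sketch. The message tag `START_TERMINATOR`, the per-task layout of `resource_queues`, and all function names are assumptions made for the example; the source does not specify how the process-resource queue is organized.

```python
import queue
from typing import Dict, Tuple

START_TERMINATOR = "start-terminator"  # hypothetical start-message tag


def terminate(resource_queues: Dict[str, "queue.Queue"], task_id: str) -> None:
    """Stand-in for the termination program: drop the task's resource queue."""
    resource_queues.pop(task_id, None)


class MonitoringUnit:
    def __init__(self) -> None:
        self.inbox: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        # Assumed layout: one queue of process resources per running task.
        self.resource_queues: Dict[str, "queue.Queue"] = {}

    def handle_pending(self) -> int:
        """Process queued messages; return how many terminations were started."""
        handled = 0
        while True:
            try:
                kind, task_id = self.inbox.get_nowait()
            except queue.Empty:
                return handled
            if kind == START_TERMINATOR:
                terminate(self.resource_queues, task_id)  # deleting module
                handled += 1


def on_state_change(monitor: MonitoringUnit, task_id: str, new_state: str) -> None:
    """Watch callback: the second sending module emits the start message."""
    if new_state == "cancelled":
        monitor.inbox.put((START_TERMINATOR, task_id))
```

The watch callback does no cleanup itself. It only enqueues a start message, so the monitoring unit remains the single place where termination programs are launched.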
The crawler task canceling device comprises a processor and a memory, wherein the first detection unit 10, the determination unit 20, the second detection unit 30, the modification unit 40, the execution unit 50 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the crawler task is cancelled by adjusting kernel parameters.
The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: detecting whether a target crawler task is injected into a target message queue, wherein the target message queue is the message queue belonging to a monitoring unit of a crawler system, and the monitoring unit is used for monitoring crawler tasks; if it is detected that the target crawler task is injected into the target message queue, determining the unique identifier of the target crawler task and running the target crawler task; detecting whether a cancel instruction is received while the target crawler task is running, wherein the cancel instruction is an instruction to cancel running the target crawler task; if the cancel instruction is received, modifying the running state of the target crawler task according to the unique identifier of the target crawler task; and when a change in the running state is detected, canceling running of the target crawler task according to the modified running state of the target crawler task.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A method for canceling a crawler task, comprising:
detecting whether a target crawler task is injected into a target message queue, wherein the target message queue is a message queue to which a monitoring unit of a crawler system belongs, and the monitoring unit is used for monitoring the crawler task;
if the target crawler task is detected to be injected into the target message queue, determining a unique identifier of the target crawler task and running the target crawler task;
detecting whether a cancel instruction is received in the process of running the target crawler task, wherein the cancel instruction is an instruction for indicating to cancel running the target crawler task;
if the cancel instruction is received, modifying the running state of the target crawler task according to the unique identifier of the target crawler task; and
when a change in the running state is detected, canceling running of the target crawler task according to the modified running state of the target crawler task;
wherein after running the target crawler task, the method further comprises:
creating a target child node under a crawler task operation root node, wherein the crawler task operation root node is used for recording a crawler task in an operating state in the crawler system, and the target child node is used for recording information of the target crawler task;
adding data of the running state of the crawler task in the target child node; and
setting a monitoring mechanism for the target child node, wherein the monitoring mechanism is used for monitoring the change of the running state of the target crawler task.
2. The method of claim 1, wherein modifying the running state of the target crawler task based on the unique identification of the target crawler task if the cancel instruction is received comprises:
if the cancel instruction is received, calling a crawler task cancel interface to send the unique identifier of the target crawler task;
searching the target child node executing the target crawler task under the crawler task operation root node according to the received unique identifier of the target crawler task; and
modifying the running state of the crawler task under the found target child node.
3. The method of claim 1, wherein performing cancellation of the target crawler task according to the modified running state of the target crawler task upon detecting the change in the running state comprises:
when monitoring that the running state of the target crawler task changes, the monitoring mechanism sends a start message to the monitoring unit of the crawler system, wherein the start message is used for indicating to start a crawler task termination program to delete a message queue of process resources of the crawler system; and
after the monitoring unit receives the start message, starting the crawler task termination program to delete the message queue of the process resource of the crawler system.
4. The method of claim 1, wherein prior to detecting whether a target crawler task has been injected in a target message queue, the method further comprises performing at least one of:
initializing the target message queue;
initializing a message queue of process resources of the crawler system; and
initializing the Zookeeper node.
5. The method of claim 4, wherein initializing the Zookeeper node further comprises:
initializing the crawler task operation root node; and/or
initializing the crawler task eliminating root node, wherein the crawler task eliminating root node is used for recording eliminated crawler tasks.
6. A crawler task cancellation apparatus, comprising:
a first detection unit, configured to detect whether a target crawler task is injected into a target message queue, wherein the target message queue is a message queue to which a monitoring unit of a crawler system belongs, and the monitoring unit is used for monitoring the crawler task;
a determining unit, configured to determine the unique identifier of the target crawler task and run the target crawler task when it is detected that the target crawler task is injected into the target message queue;
a second detection unit, configured to detect whether a cancel instruction is received in the process of running the target crawler task, wherein the cancel instruction is an instruction for indicating to cancel running the target crawler task;
a modification unit, configured to modify the running state of the target crawler task according to the unique identifier of the target crawler task when the cancel instruction is received; and
an execution unit, configured to cancel running of the target crawler task according to the modified running state of the target crawler task when a change in the running state is detected;
wherein the apparatus further comprises:
a creating unit, configured to create a target child node under a crawler task running root node after the target crawler task is run, wherein the crawler task running root node is used for recording the crawler task in a running state in the crawler system, and the target child node is used for recording the information of the target crawler task;
an adding unit, configured to add the data of the running state of the crawler task in the target child node; and
a setting unit, configured to set a monitoring mechanism for the target child node, wherein the monitoring mechanism is used for monitoring the change of the running state of the target crawler task.
7. The apparatus of claim 6, wherein the modifying unit comprises:
a first sending module, configured to call a crawler task canceling interface to send the unique identifier of the target crawler task when the cancel instruction is received;
a searching module, configured to search, under the crawler task running root node and according to the received unique identifier of the target crawler task, for the target child node which is executing the target crawler task; and
a modification module, configured to modify the running state of the crawler task under the found target child node.
8. The apparatus of claim 6, wherein the execution unit comprises:
a second sending module, configured to send a start message to the monitoring unit of the crawler system when the monitoring mechanism monitors that the running state of the target crawler task changes, where the start message is used to instruct to start a crawler task termination program to delete a message queue of process resources of the crawler system; and
a deleting module, configured to start the crawler task termination program to delete the message queue of the process resource of the crawler system after the monitoring unit receives the start message.
CN201610987134.6A 2016-11-09 2016-11-09 Reptile task canceling method and device Active CN108062244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610987134.6A CN108062244B (en) 2016-11-09 2016-11-09 Reptile task canceling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610987134.6A CN108062244B (en) 2016-11-09 2016-11-09 Reptile task canceling method and device

Publications (2)

Publication Number Publication Date
CN108062244A CN108062244A (en) 2018-05-22
CN108062244B true CN108062244B (en) 2021-03-26

Family

ID=62137486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610987134.6A Active CN108062244B (en) 2016-11-09 2016-11-09 Reptile task canceling method and device

Country Status (1)

Country Link
CN (1) CN108062244B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492145A (en) * 2018-11-08 2019-03-19 大连瀚闻资讯有限公司 Extensive circulation crawler management method applied to public sentiment platform
CN110262882A (en) * 2019-06-17 2019-09-20 北京思特奇信息技术股份有限公司 A kind of distributed communication command scheduling system and method
CN112035721A (en) * 2020-07-22 2020-12-04 大箴(杭州)科技有限公司 Crawler cluster monitoring method and device, storage medium and computer equipment
CN113920698B (en) * 2021-11-25 2023-08-04 杭州安恒信息技术股份有限公司 Early warning method, device, equipment and medium for interface abnormal call

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101268447A (en) * 2005-05-26 2008-09-17 美国联合包裹服务公司 Software process monitor
CN103856467A (en) * 2012-12-06 2014-06-11 百度在线网络技术(北京)有限公司 Method and distributed system for achieving safety scanning

Also Published As

Publication number Publication date
CN108062244A (en) 2018-05-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant