CN112579247A

CN112579247A - Method and device for determining task state

Info

Publication number: CN112579247A
Application number: CN201910925908.6A
Authority: CN
Inventors: 张猛
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-09-27
Filing date: 2019-09-27
Publication date: 2021-03-30

Abstract

The invention discloses a method and a device for determining task states, and relates to the technical field of computers. One embodiment of the method comprises: acquiring an event occurring in a container set, and determining the current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or acquiring events occurring at the deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the events; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state. The implementation mode can accurately judge the current state of the task by acquiring the container set event and/or the deployment unit event.

Description

Method and device for determining task state

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for determining task states.

Background

Kubernetes is an application for managing, orchestrating, and scheduling containers across multiple hosts. In the prior art, to determine the state of a task in Kubernetes, Fabric8 (an application program interface for Kubernetes) may be used to interact with the API Server (a key service process that provides the sole entry point for resource operations, together with mechanisms such as authentication, authorization, access control, and application program interface registration and discovery) in order to query the state of a Pod (the smallest management element in Kubernetes; a Pod contains at least one container). The state of the task is then determined from the state of the Pod.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

1. The Pod state cannot directly reflect the task state, so judging the task state entirely from the Pod state, as the prior art does, is unreliable.

2. The prior art must frequently use Fabric8 to interact with the API Server, which puts pressure on the API Server. In addition, information about historical tasks executed in Kubernetes cannot be queried in the prior art.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for determining a task state, which may accurately determine a current state of a task by acquiring a container set event and/or a deployment unit event.

To achieve the above object, according to one aspect of the present invention, there is provided a method of determining a task state, wherein the tasks are executed in containers managed by a container set deployed by a deployment unit; the task state comprises a start-related state and a running state (also rendered herein as the operating state), and the start-related state comprises a start failure state.

The method for determining the task state of the embodiment of the invention comprises the following steps: acquiring an event occurring in the container set, and determining the current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or acquiring events occurring at the deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the events; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

Optionally, the task state further comprises a stop state; and, the method further comprises: acquiring an event occurring in the deployment unit, and determining the number of available container sets controlled by the deployment unit from the event; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

Optionally, the start-related state further comprises a start state (startup has begun) and a starting state (startup is in progress), and the latest state of the task is stored in a cache unit; and the method further comprises: after determining the current state of the task, storing the current state in the cache unit; before performing the step of determining the number of available container sets controlled by the deployment unit from the event, skipping that step if the event is determined not to carry a stop event identifier; after acquiring an event occurring at the deployment unit, determining the identifier of the task from the event, and determining the current state of the task as the start state when the identifier is judged not to be stored in the cache unit; and after acquiring an event occurring at the container set, determining the current state of the task as the starting state if the latest state of the task stored in the cache unit is the start state.

Optionally, the task is a streaming computing task executed in a Spark cluster built on a Kubernetes platform, and the cache unit stores historical task information; the container is a Docker container, the container set is a Driver Pod that schedules an executor container set (Executor Pod), the deployment unit is a Deployment, and the state of the Driver Pod further includes at least one of the following: a pending state, a failed state, a deleted state; acquiring events occurring in the container set includes: monitoring the Driver Pod with a monitoring unit, and receiving an event sent by the monitoring unit when an event occurs at the Driver Pod; acquiring events occurring at the deployment unit includes: monitoring the Deployment with a monitoring unit, and receiving an event sent by the monitoring unit when an event occurs at the Deployment; and the method further comprises: after receiving an event occurring at the Executor Pod, acquiring, from the tag of the Executor Pod, the address information of the Spark user interface started at the Driver Pod.

To achieve the above object, according to another aspect of the present invention, there is provided a method of determining a task state, wherein the tasks are executed in containers managed by a container set deployed by a deployment unit; the task state includes a start-related state, a running state, and a stop state.

The method for determining the task state of the embodiment of the invention comprises the following steps: acquiring an event occurring in the container set, and determining the current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or acquiring events occurring at the deployment unit, and determining the number of available container sets controlled by the deployment unit from the events; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

To achieve the above object, according to still another aspect of the present invention, there is provided an apparatus for determining a task state. Wherein the tasks are executed in containers managed by a container set deployed by a deployment unit; the task state comprises a starting related state and an operating state, and the starting related state comprises a starting failure state.

The device for determining the task state of the embodiment of the invention can comprise: a first state judgment unit, configured to acquire an event occurring in the container set, and determine a current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or the second state judgment unit is used for acquiring the events occurring in the deployment unit and determining the number of the unavailable container sets controlled by the deployment unit from the events; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

To achieve the above object, according to still another aspect of the present invention, there is provided an apparatus for determining a task state, wherein the tasks are executed in containers managed by a container set deployed by a deployment unit; the task state includes a start-related state, a running state, and a stop state.

The device for determining the task state of the embodiment of the invention can comprise: a first state judgment unit, configured to acquire an event occurring in the container set, and determine a current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or, a third state judgment unit, configured to acquire an event occurring in the deployment unit, and determine, from the event, the number of available container sets controlled by the deployment unit; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.

An electronic device of the present invention includes: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the method for determining the task state provided by the invention.

To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.

A computer-readable storage medium of the invention has stored thereon a computer program which, when being executed by a processor, carries out the method of determining a task state provided by the invention.

According to the technical scheme of the invention, one embodiment of the invention has the following advantages or beneficial effects:

first, in the embodiment of the present invention, a container set event and a deployment unit event related to a task may be obtained, then, a current state of the container set is extracted from the container set event, a current parameter state of the deployment unit (for example, the number of available container sets and the number of unavailable container sets) is extracted from the deployment unit event, and the current state of the container set, the current parameter state of the deployment unit, and a task recent state stored in advance are combined and judged, so as to obtain an accurate current state of the task. In practical application, the current state of the container set and the latest state of the task are judged according to a preset strategy to determine whether the task is in a running state, and the current parameter state of the deployment unit and the latest state of the task are judged according to the preset strategy to determine whether the task is in a failed starting state or a stopped state.

Secondly, the invention can use the monitoring mechanism (Watch mechanism) of Kubernetes to acquire the container set events and the deployment unit events. Specifically, two monitoring units (watchers) are started to monitor the container set and the deployment unit in parallel, and each event is automatically captured and pushed when it occurs at the container set or the deployment unit, which avoids repeated queries and relieves the pressure on the API Server. In addition, the invention can use a cache unit (such as Redis) to store the latest state of the current task and the historical task information, so that historical task information can be queried directly through the cache unit interface when needed.
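The push-style monitoring described above can be illustrated with a minimal Python sketch. It does not call the Kubernetes API: two threads stand in for the two monitoring units, and a shared queue stands in for the channel on which events are pushed. The function names, the event dictionaries, and their field names are hypothetical and chosen only for illustration.

```python
import queue
import threading


def watcher(kind, source_events, sink):
    """Stand-in for one monitoring unit: push each event as it occurs."""
    for event in source_events:
        sink.put((kind, event))
    sink.put((kind, None))  # sentinel: this watcher has no more events


def run_watchers(pod_events, deployment_events):
    """Run a Pod watcher and a Deployment watcher in parallel and collect events."""
    sink = queue.Queue()
    threads = [
        threading.Thread(target=watcher, args=("pod", pod_events, sink)),
        threading.Thread(target=watcher, args=("deployment", deployment_events, sink)),
    ]
    for thread in threads:
        thread.start()

    received, finished = [], 0
    while finished < len(threads):
        kind, event = sink.get()  # blocks until a watcher pushes something
        if event is None:
            finished += 1
        else:
            received.append((kind, event))
    for thread in threads:
        thread.join()
    return received


events = run_watchers(
    pod_events=[{"app": "wordcount", "status": "Running"}],
    deployment_events=[{"app": "wordcount", "available": 1, "unavailable": 0}],
)
```

The consumer never queries the sources itself; it only reacts to what the watchers push, which is the property the text relies on to reduce load on the API Server. The arrival order of events from the two watchers is not fixed.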

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a diagram illustrating the main steps of a method for determining task status according to a first embodiment of the present invention;

FIG. 2 is a flowchart illustrating the execution of a method for determining task status according to a first embodiment of the present invention;

FIG. 3 is a diagram illustrating the main steps of a method for determining task status according to a second embodiment of the present invention;

FIG. 4 is a schematic diagram of portions of an apparatus for determining task status corresponding to the method of FIG. 1;

FIG. 5 is a schematic diagram of portions of an apparatus for determining task status corresponding to the method shown in FIG. 3;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 7 is a schematic structural diagram of an electronic device for implementing the method for determining task status in the embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In embodiments of the present invention, state determination and tracking may be performed on tasks executing in Kubernetes. Kubernetes is a lightweight, extensible open-source platform for managing containerized applications and services, through which applications can be deployed and scaled automatically. In Kubernetes, the containers constituting an application are combined into a logical unit so that they are easier to manage and discover. This logical unit is the Pod (container set), the most basic operation unit of Kubernetes, which internally encapsulates one or more closely related containers (such as Docker containers; Docker is an open-source application container engine). One or more tags may be attached to a Pod to mark related information: a task tag marks the task executed by the Pod, and a role tag marks the function and role of the Pod. For example, "app=wordcount" is a task tag, where app represents an application and wordcount (counting word occurrences) is a task identifier (which may be a task name); "role=driver" is a role tag, where role represents a role and driver represents a driver.

The Pod state may be the pending state (Pending), the running state (Running), the succeeded state (Succeeded), the failed state (Failed), the unknown state (Unknown), or the deleted state (Killed); each state is described in detail in the following table.

The Deployment (deployment unit) is the way a Pod is actually deployed to a cluster. Its main purpose is to declare how many copies of a Pod should run at the same time; when a Deployment is added to the cluster, it automatically creates and monitors the required number of Pods. If a Pod disappears, the Deployment automatically recreates it. One or more tags may also be attached to the Deployment to mark related information, such as a task tag that marks the task to be executed. The Deployment can use one or more fields to describe its current state, such as available (the number of available container sets) and unavailable (the number of unavailable container sets). The following table shows the values of the available field and the unavailable field in each state when the Deployment controls one Pod.
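As a small illustration of how the task tag and role tag might be read, the following Python sketch parses a comma-separated tag string of the form used in the example above. The function name and the comma-separated input format are assumptions made for this sketch; they are not taken from the source.

```python
def parse_labels(selector: str) -> dict:
    """Parse a string such as 'app=wordcount,role=driver' into a dict."""
    labels = {}
    for item in selector.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key:
            raise ValueError(f"malformed label: {item!r}")
        labels[key] = value
    return labels


labels = parse_labels("app=wordcount,role=driver")
task_id = labels["app"]                  # task identifier from the task tag
is_driver = labels["role"] == "driver"   # role from the role tag
```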

Deployment state	available	unavailable
Start	0	0
Start failure	0	1
Running	1	0
Stop	0	0
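The table can be read as a lookup from field values to candidate states. The Python sketch below, with hypothetical names, encodes the four rows and shows that the pair (0, 0) matches both the start row and the stop row, so the field values alone cannot separate those two states. This is one way to see why the description combines the field values with the previously stored task state.

```python
# Rows of the table above: state name -> (available, unavailable)
TABLE = {
    "start": (0, 0),
    "start_failed": (0, 1),
    "running": (1, 0),
    "stopped": (0, 0),
}


def candidate_states(available, unavailable):
    """Return every table row whose field values match the given pair."""
    return sorted(
        name for name, fields in TABLE.items()
        if fields == (available, unavailable)
    )


ambiguous = candidate_states(0, 0)     # matches two rows
unambiguous = candidate_states(0, 1)   # matches one row
```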

In the prior art, the task execution state is generally determined from the Pod state: the task is judged to be in the running state when the Pod is in the running state, in the start failure state when the Pod is in the failed state, and in the stop state when the Pod is in the deleted state. However, this method is inaccurate: when a Pod is in the failed or deleted state, the Deployment can automatically recreate it, and the execution of the task is not affected, so the task is wrongly determined to be in the start failure state or the stop state.

Based on the above considerations, the invention can judge the task execution state comprehensively by combining the Pod and the Deployment. In the stage of starting or stopping the task, the Deployment reflects the real working state of the system (the start of the Deployment means the start of the task, the start failure of the Deployment means the start failure of the task, and the stop of the Deployment means the stop of the task), so the available field data and the unavailable field data of the Deployment, combined with the latest state of the task, can be used to judge whether the task has started, failed to start, or stopped. In the task running stage, the Pod state is strongly correlated with the task state, so whether the task is running is judged by combining the Pod state with the latest state of the task. Thus, the invention can determine the task state over the complete life cycle from start to stop.

In the embodiment of the present invention, the task states include the following five types: start, starting, start failure, running, and stop. These can be represented by different values of the phase field; the specific correspondence and description of the task states are shown in the following table.

The start state, the starting state, and the start failure state all belong to the start-related states. It should be noted that, in the embodiment of the present invention, a cache unit (e.g., Redis) may be used to store the state of the current task (i.e., a task that currently needs to be executed). After the current state of the current task is determined, it may be written to the cache unit; at the next determination of the task state, that stored state serves as the latest state of the current task. The latest state of the task stored in the cache unit may be represented by the phase value. In practical applications, in addition to the latest state of the current task, the cache unit stores the task identifier (e.g., the task name), the latest state of the Pod executing the task, the field data of the Deployment executing the task, and other data to be stored. It will be appreciated that the task identifier may be added to the tags of the Pod and the Deployment, thereby associating the Pod, the Deployment, and the task in the cache unit. In addition, the cache unit can also store information on historical tasks (tasks executed in a past period that do not currently need to be executed), such as the task identifier, task description, and execution time; when historical task information is needed, it can be queried directly through the cache unit interface.
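The role of the cache unit can be sketched in Python with an in-memory dictionary standing in for Redis. The phase numbering 0 to 4 follows the values used in the description of FIG. 2; the class, method, and field names are hypothetical, and no real Redis client is used.

```python
from enum import IntEnum


class Phase(IntEnum):
    START = 0
    STARTING = 1
    START_FAILED = 2
    RUNNING = 3
    STOPPED = 4


# The three states that the description groups as start-related.
START_RELATED = {Phase.START, Phase.STARTING, Phase.START_FAILED}


class TaskCache:
    """In-memory stand-in for the cache unit (e.g. Redis)."""

    def __init__(self):
        self._records = {}

    def latest_phase(self, task_id):
        """Return the stored phase, or None if the task has no record."""
        record = self._records.get(task_id)
        return None if record is None else record["phase"]

    def update(self, task_id, phase, **extra):
        """Store the newly determined state plus any extra data (Pod state, fields)."""
        record = self._records.setdefault(task_id, {"task_id": task_id})
        record["phase"] = Phase(phase)
        record.update(extra)
        return record


cache = TaskCache()
cache.update("wordcount", Phase.START, pod_status="Pending")
cache.update("wordcount", Phase.RUNNING, pod_status="Running")
```

Each determination overwrites the stored phase, so the next determination reads it back as the latest state of the task.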

In addition, the tasks targeted by the method can be of various types, such as single-run computing tasks, batch computing tasks, and streaming computing tasks. In practical applications, the task may be a Spark (a big data processing engine) task based on the Kubernetes platform, such as a Spark Streaming (Spark's stream computing framework) task. It will be appreciated that there is generally no need to consider the succeeded and unknown states of a Pod when executing a streaming computing task, since the succeeded state is generally used for state determination of batch computing tasks, and valuable information is generally not available from the unknown state.

In a Spark cluster built on a Kubernetes platform, Pods generally take one of two roles, Driver and Executor. In the process of executing a task, a developer generally starts a Driver Pod (driver container set) through a replication controller (RC) or a replica set (RS), and the Driver Pod starts one or more Executor Pods (executor container sets). When executing the task, the Driver Pod divides the task into a plurality of subtasks to be executed by the Executor Pods; the Driver Pod is also responsible for tracking the running condition of the Executor Pods, allocating subtasks to them, receiving the calculation results they return, and so on. In this scenario, tasks are executed within containers that are encapsulated in the Executor Pods, the Executor Pods are managed by the Driver Pod, and the Driver Pod is deployed by the Deployment. This scenario is referred to as the first scenario, and the technical solution of the present invention is mainly introduced through it. A scenario without Driver Pods and Executor Pods is referred to as the second scenario, in which a task is executed in a container, the container is encapsulated in and managed by a Pod, and the Pod is deployed by a Deployment.

It should be understood that, although the following description mainly takes the task state determination scenario of the Kubernetes platform as an example, this does not limit the application scenario of the present invention. In fact, the method of the present invention can be applied to any other applicable scenario, and objects such as the "container set" and the "deployment unit" are not limited to the "Pod" and the "Deployment" in Kubernetes, but may represent corresponding objects in the applicable scenario. The terms "first", "second", and the like are used herein to describe various objects, but these objects are not limited by those terms; the terms are used only to distinguish one object from another. For example, without departing from the scope of the present invention, the first threshold may be referred to as a second threshold, and the second threshold may also be referred to as a first threshold, where the first threshold and the second threshold are both thresholds, but not the same threshold. Furthermore, it should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.

Fig. 1 is a schematic diagram of the main steps of a method for determining the task state in the first embodiment of the present invention. In a first embodiment, the current state of a task may be determined by acquiring container set events and/or deployment unit events. Generally, a container set event occurs when a container set is created or a state changes, and the container set event can carry information such as a task identifier corresponding to the container set, a current state of the container set, and the like; the deployment unit event occurs when the deployment unit is created, the state changes or the field data changes, and the deployment unit event can carry information such as a task identifier corresponding to the deployment unit and the current field data of the deployment unit.

In the first scenario, when a task is started for the first time, a deployment unit must be created first. After the deployment unit is created, a record of the task (such as the task identifier and the task state) may be stored in the cache unit (which held no record of the task before), and the current state of the task is determined as the start state. The deployment unit then creates a driver container set through the RC or RS, and during the creation of the driver container set the current state of the task may be determined as the starting state. Afterwards, if the deployment unit fails to start, the task has failed to start, and the unavailable field value of the deployment unit is 1 at this time; if the driver container set enters the running state, the task enters the running state. Finally, when the task needs to be stopped, for example because a script is being updated or execution of the task is being abandoned, the deployment unit may be stopped first; the available field value of the deployment unit is then zero, which means that the task is in the stop state. In addition, in the first scenario, the task may enter the start failure state directly from the start state (that is, creation of the driver container set fails while the task is in the start state), or may enter the running state directly from the start failure state.
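The life cycle in this paragraph can be written down as a small transition map. The Python sketch below lists only the transitions named in the paragraph; the state names, the map, and the helper function are hypothetical, and any transition not listed is simply treated as outside the described life cycle.

```python
# Allowed next states for each state; None means "no record of the task yet".
TRANSITIONS = {
    None: {"start"},
    "start": {"starting", "start_failed"},
    "starting": {"start_failed", "running"},
    "start_failed": {"running"},
    "running": {"stopped"},
    "stopped": set(),
}


def follows_lifecycle(path):
    """Check that a sequence of task states uses only the listed transitions."""
    previous = None
    for state in path:
        if state not in TRANSITIONS.get(previous, set()):
            return False
        previous = state
    return True


normal_run = follows_lifecycle(["start", "starting", "running", "stopped"])
recovered = follows_lifecycle(["start", "start_failed", "running"])
skipped = follows_lifecycle(["start", "stopped"])
```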

As shown in fig. 1, the method for determining a task state according to this embodiment may be specifically executed according to the following steps:

step S101: acquiring an event occurring in a container set, and determining the current state of the container set from the event; and when the current state is the running state and the latest state of the pre-stored task is the starting related state, determining the current state of the task as the running state.

It is to be understood that the container set used in this step to determine the task state may be the driver container set in the first scenario. This step may obtain container set events through a listening mechanism. Specifically, in the first scenario, a listening unit (e.g., a watcher in Kubernetes) may be started before the task is executed; the listening unit listens to the driver container set and the executor container set, and an event sent by the listening unit is received whenever an event occurs at a container set. Upon receiving a container set event, the task identifier, which associates the event with a task stored in the cache unit, and the container set role may be extracted from it. When the container set is an executor container set, its tag carries the network address of the driver container set so that it can communicate with the driver container set. The network address of the driver container set can therefore be obtained from the tag as the user interface address information of Spark started in the driver container set (i.e., the IP address and port of the Spark UI, where IP address refers to an Internet Protocol address), so that the state of Spark can be conveniently viewed and tracked.

If the container set corresponding to the received event is a driver container set, it can first be determined in the cache unit whether the latest state of the task is the start state. If so, the current state of the task may be determined as the starting state (the first driver container set event received while the task is in the start state is typically a creation event, so the task transitions to the starting state). If not, the current state of the driver container set is extracted from the event, and the current state of the task is determined as the running state when the current state of the driver container set is the running state and the latest state of the task is a start-related state. If the current state of the driver container set is not the running state, or the latest state of the task is not a start-related state, the task state stored in the cache unit is not updated, and monitoring of the container set continues.
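The decision for driver container set events described in step S101 can be sketched as follows. This is a minimal Python illustration, assuming the cache unit is a plain dictionary from task identifier to state and that each event is a dictionary with hypothetical "app" and "status" keys; none of these names come from the source.

```python
START, STARTING, START_FAILED, RUNNING = "start", "starting", "start_failed", "running"
START_RELATED = {START, STARTING, START_FAILED}


def on_driver_pod_event(cache, event):
    """Update and return the task state after one driver container set event."""
    task_id = event["app"]
    latest = cache.get(task_id)
    if latest == START:
        # First driver event after the start state: the task is now starting.
        cache[task_id] = STARTING
    elif event["status"] == "Running" and latest in START_RELATED:
        cache[task_id] = RUNNING
    # Otherwise the stored state is left untouched.
    return cache.get(task_id)


cache = {"wordcount": START}
trace = [
    on_driver_pod_event(cache, {"app": "wordcount", "status": status})
    for status in ("Pending", "Running", "Failed")
]
```

In this trace the final Failed event leaves the task in the running state, which reflects the point made earlier that a failed Pod may be recreated by the Deployment without affecting the task.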

Step S102: acquiring an event occurring in a deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the event; and when the number is equal to a preset first threshold value and the latest state of the pre-stored tasks is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

In this step, a monitoring unit can be started to monitor the deployment unit, and an event sent by the monitoring unit is received when an event occurs at the deployment unit. After the deployment unit event is received, the task identifier can be extracted from it, and it is judged whether the task identifier is stored in the cache unit. If it is not, the task is a new task; the task identifier and related data can then be recorded in the cache unit, and the current state of the task is determined as the start state. If the task identifier is already stored in the cache unit, the task is not a new task, and it is first determined in the cache unit whether the latest state of the task is the start state or the starting state. If so, it is then determined whether the unavailable field value of the deployment unit equals a preset first threshold (generally, the first threshold may be set to the total number of container sets controlled by the deployment unit; in the first scenario, the first threshold is 1). If the unavailable field value equals the first threshold, the deployment unit has failed to start, so the current state of the task may be determined as the start failure state. If the unavailable field value does not equal the first threshold, the original state of the task is maintained. It should be noted that step S102 and step S101 may be executed independently or in combination.

In an optional implementation, if it is determined in the cache unit that the latest state of the task is neither the start state nor the starting state, it is determined whether the latest state is the running state. If not, the original state of the task is maintained. If so, the available field value of the deployment unit is extracted from the event, and it is judged whether the available field value equals a preset second threshold (the second threshold can be set as required; in the first scenario, it is zero). If the available field value equals the second threshold, the deployment unit has stopped, so the current state of the task may be determined as the stop state. If the available field value does not equal the second threshold, the original state of the task is maintained.
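The start failure check of step S102 can be sketched in Python as follows, assuming a dictionary as the cache unit and event dictionaries with hypothetical "app" and "unavailable" keys. The default threshold of 1 is the value the text gives for the first scenario; all names are illustrative.

```python
FIRST_THRESHOLD = 1  # total container sets controlled by the deployment unit (first scenario)


def on_deployment_event(cache, event, first_threshold=FIRST_THRESHOLD):
    """Update and return the task state after one deployment unit event."""
    task_id = event["app"]
    if task_id not in cache:
        # Unknown identifier: a new task, recorded in the start state.
        cache[task_id] = "start"
    elif cache[task_id] in ("start", "starting") and event["unavailable"] == first_threshold:
        cache[task_id] = "start_failed"
    # Otherwise the original state is maintained.
    return cache[task_id]


cache = {}
first = on_deployment_event(cache, {"app": "wordcount", "unavailable": 0})
second = on_deployment_event(cache, {"app": "wordcount", "unavailable": 1})
```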

As a preferred solution, before extracting the available field value of the deployment unit from the event, it may be first determined whether the event carries a stop event identifier (e.g. stop event code): if the event carries the stop event identifier, the event is indicated as a stop event, and the available field value of the deployment unit can be continuously extracted and judged; if the event does not carry the stop event identifier, the event is not the stop event, and at this time, the judgment process can be terminated, and the original state of the task is maintained. Through the steps, the accuracy of judging the task stop state can be improved.
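The effect of checking the stop event identifier first can be seen in a short Python sketch. The "is_stop_event" key is a hypothetical stand-in for the stop event identifier, and the default second threshold of zero is the value given for the first scenario.

```python
SECOND_THRESHOLD = 0  # value given for the first scenario


def check_stop(latest_state, event, second_threshold=SECOND_THRESHOLD):
    """Return the task state after a deployment unit event, given the latest state."""
    if latest_state != "running":
        return latest_state              # only a running task can be stopped
    if not event.get("is_stop_event", False):
        return latest_state              # not a stop event: skip the field check
    if event["available"] == second_threshold:
        return "stopped"
    return latest_state


stopped = check_stop("running", {"is_stop_event": True, "available": 0})
still_running = check_stop("running", {"available": 0})
```

The second call shows the case the guard is meant to handle: the available count is zero, but the event is not a stop event, so the task is not reported as stopped.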

Fig. 2 is a schematic flowchart of the method for determining a task state according to the first embodiment of the present invention. As shown in fig. 2, in the first scenario, the task state may be determined through the following specific steps. First, a monitoring task is started, and two monitoring units are started to monitor the Pod (lower branch) and the Deployment (upper branch) respectively. In the upper branch, after a Deployment event is acquired, the app name is extracted from the Deployment event (the app name serves as the task identifier), and it is judged whether the app name exists in Redis (the cache unit). If not, a task record is created in Redis, phase is stored as 0 (that is, the task is determined to be in the start-up start state), and the current state of the Driver container set is set to pending. If the app name exists in Redis, it is further determined whether phase is 0 or 1 (that is, whether the latest state of the task is the start-up start state or the start-up in-progress state). If so, the unavailable field value of the Deployment is acquired: when the unavailable field value is 1, the task phase in the cache unit is updated to 2 (that is, the task is determined to be in the start failure state) and the state of the Driver container set is updated to failed; when the unavailable field value is not 1, the original state of the task is maintained.

If it is determined in the above step that phase is neither 0 nor 1, it is determined whether phase is 3. If not, the original state of the task is maintained; if so, it is further judged whether the Deployment event is a stop event. If the Deployment event is not a stop event, the original state of the task is maintained; if the Deployment event is a stop event, the available field value of the Deployment is acquired. When the available field value is not zero, the original state of the task is maintained; when the available field value is zero, the task phase in the cache unit is updated to 4 (that is, the task is determined to be in the stopped state), and the state of the Driver container set is updated to deleted.

In the lower branch, after a Pod event is detected, the Pod role is acquired first. When the Pod is an Executor Pod, the Spark UI address is acquired from the Executor Pod tag. When the Pod is a Driver Pod, it is judged whether the task phase stored in the cache is 0. If so, phase is updated to 1 (that is, the task is determined to be in the start-up in-progress state), the Pod state being pending at this time. If not, the current state of the Pod is extracted from the Pod event, and it is judged whether phase is 1 or 2 and whether the current state of the Pod is the running state. If phase is 1 or 2 and the current state of the Pod is the running state, phase in the cache unit is updated to 3 (that is, the task is determined to be in the running state). If phase is neither 1 nor 2, or the current state of the Pod is not the running state, the flow ends and the Pod continues to be monitored.
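The lower branch can be sketched with the phase codes used in fig. 2 (0 to 4). The sketch below is illustrative only: two dictionaries stand in for the cache unit, the role strings, the `"Running"` status string, and the label key `spark-ui-address` are assumed names, and the actual tag that carries the Spark UI address is not specified in the description.

```python
# Phase codes as used in fig. 2.
START_BEGUN, STARTING, START_FAILED, RUNNING, STOPPED = range(5)


def on_pod_event(phases, ui_addresses, app_name, role, pod_status,
                 labels=None):
    """Handle one Pod event and return the task phase afterwards.

    phases: dict app name -> phase code (stand-in for the cache unit).
    ui_addresses: dict app name -> Spark UI address.
    role: "driver" or "executor" (hypothetical role strings).
    pod_status: current Pod state taken from the event, e.g. "Running".
    labels: Pod tags; "spark-ui-address" is a hypothetical key.
    """
    if role == "executor":
        # Executor Pod: only the Spark UI address is read from its tag.
        address = (labels or {}).get("spark-ui-address")
        if address is not None:
            ui_addresses[app_name] = address
        return phases.get(app_name)

    phase = phases.get(app_name)
    if phase == START_BEGUN:
        # First Driver Pod event after registration: start-up in progress.
        phases[app_name] = STARTING
    elif phase in (STARTING, START_FAILED) and pod_status == "Running":
        # Phase 1 or 2 and the Pod is running: the task is running.
        phases[app_name] = RUNNING
    # Otherwise nothing changes and listening simply continues.
    return phases.get(app_name)
```

Note that a task in phase 2 (start failure) can still move to phase 3 here, which matches the condition "phase is 1 or 2" in the flow above.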

It should be noted that the upper branch and the lower branch in fig. 2 are closely related. Specifically, when the upper branch determines that a task is in the start failure state or the stopped state, it may need to rely on a previously determined start-up in-progress state or running state, both of which are determined by the lower branch; when the lower branch determines that the task is in the start-up in-progress state or the running state, it may need to rely on a previously determined start-up start state or start failure state, both of which are determined by the upper branch. In addition, the Deployment manages and schedules the Pod, and a change in the state of the Deployment can trigger a Pod event, so the judgment logic of the upper branch and that of the lower branch are strongly correlated.
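The dependency between the two branches can be made explicit by listing, for each phase change in fig. 2, which branch performs it. The table and helper below are a reading aid built from the transitions described above; the names `TRANSITIONS` and `branches_for` are hypothetical.

```python
# (phase before, phase after) -> branch that performs the change.
# None means the task is not yet recorded in the cache unit.
TRANSITIONS = {
    (None, 0): "deployment",  # new task registered: start-up start
    (0, 1): "pod",            # first Driver Pod event: start-up in progress
    (0, 2): "deployment",     # unavailable value equals the first threshold
    (1, 2): "deployment",
    (1, 3): "pod",            # Driver Pod reported as running
    (2, 3): "pod",
    (3, 4): "deployment",     # stop event and available value is zero
}


def branches_for(path):
    """Return the branch responsible for each step along a phase path."""
    return [TRANSITIONS[(a, b)] for a, b in zip(path, path[1:])]
```

For a task that starts normally and is later stopped, `branches_for([None, 0, 1, 3, 4])` alternates between the two branches, which is why neither branch can determine the full life cycle on its own.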

In the second scenario, Pods are not divided into a Driver Pod and Executor Pods, and the Deployment directly deploys one or more Pods. The flow for determining the task state is basically the same as in the first scenario, except that: in the judgment branch that monitors the Pod, the role of the Pod does not need to be judged; and in the judgment branch that monitors the Deployment, the first threshold may be set to the total number of Pods, while the second threshold may still be set to zero.

Fig. 3 is a schematic diagram of the main steps of a method for determining the task state in a second embodiment of the present invention. As shown in fig. 3, the following steps may be performed in the second embodiment to determine the task status.

Step S101: acquiring an event occurring in a container set, and determining the current state of the container set from the event; and when the current state is the running state and the latest state of the pre-stored task is the starting related state, determining the current state of the task as the running state. This step is the same as step S101 in the first embodiment, and is not repeated here.

Step S302: acquiring an event occurring in a deployment unit, and determining the number of available container sets controlled by the deployment unit from the event; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state. This step is the same as the step of determining that the current state of the task is the stopped state in the first embodiment, and is not repeated here. It is understood that step S302 and step S101 may be executed independently or in combination.

Preferably, in this embodiment, an event occurring at the deployment unit may be obtained, and the number of unavailable container sets controlled by the deployment unit may be determined from the event; and when the number is equal to a preset first threshold value and the latest state of the pre-stored tasks is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state. This step is the same as the step of determining that the task is in the failed start state in the first embodiment, and the specific details are not repeated here.

In addition, in an alternative implementation, before the step of determining the number of available container sets controlled by the deployment unit from the deployment unit event is performed, if it is determined that the deployment unit event does not carry the stop event identifier, the step is not performed. After the event occurring at the deployment unit is acquired, the task identifier is determined from the event, and the current state of the task is determined as the start-up start state when the task identifier is judged not to be stored in the cache unit. After the event occurring in the container set is acquired, if the latest state of the task stored in the cache unit is the start-up start state, the current state of the task is determined as the start-up in-progress state.

In the technical solution of the embodiment of the invention, the container set event and the deployment unit event related to the task can be acquired; the current state of the container set is then extracted from the container set event, and the current parameter state of the deployment unit is extracted from the deployment unit event; and the current state of the container set, the current parameter state of the deployment unit, and the pre-stored latest state of the task are judged in combination, so that the accurate current state of the task is obtained. In practical application, the current state of the container set and the latest state of the task are judged according to a preset strategy to determine whether the task is in the running state, and the current parameter state of the deployment unit and the latest state of the task are judged according to the preset strategy to determine whether the task is in the start failure state or the stopped state. Meanwhile, the invention can adopt the monitoring mechanism of Kubernetes to acquire the container set event and the deployment unit event. Specifically, two monitoring units are started to monitor the container set and the deployment unit in parallel, and when an event occurs at the container set or the deployment unit, the event is automatically captured and pushed, so that the pressure on the API Server is relieved. In addition, the invention can use the cache unit to store the latest state of the current task and historical task information, and the interface of the cache unit can be queried directly when historical task information is needed.
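The parallel monitoring arrangement can be sketched with two threads that push events into a shared queue. This is a simplified stand-in and not the Kubernetes watch interface itself: the event sources are ordinary iterables, and `listen`, `run_listeners`, and `handle` are hypothetical names.

```python
import queue
import threading


def listen(source, kind, out):
    """Forward every event from one monitored resource to the shared queue."""
    for event in source:
        out.put((kind, event))
    out.put((kind, None))  # sentinel: this source has no more events


def run_listeners(pod_events, deployment_events, handle):
    """Monitor both sources in parallel and hand each event to `handle`.

    Events must not be None, because None is reserved as the sentinel.
    """
    out = queue.Queue()
    threads = [
        threading.Thread(target=listen, args=(pod_events, "pod", out)),
        threading.Thread(target=listen,
                         args=(deployment_events, "deployment", out)),
    ]
    for t in threads:
        t.start()
    finished = 0
    while finished < len(threads):
        kind, event = out.get()  # blocks until an event is pushed
        if event is None:
            finished += 1
        else:
            handle(kind, event)
    for t in threads:
        t.join()
```

Because events are pushed to the consumer as they occur, the state judgment runs only when something changes, rather than on a polling schedule that queries the API Server repeatedly.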

It should be noted that, for convenience of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts described, and that some steps may in fact be performed in other orders or concurrently. Moreover, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily all required to implement the invention.

To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.

Referring to fig. 4, an apparatus 400 for determining a task state according to an embodiment of the present invention (the apparatus corresponds to the method of the first embodiment) may include: a first state judgment unit 401 and/or a second state judgment unit 402. Wherein the tasks are executed in containers managed by a container set deployed by a deployment unit; the task state comprises a starting related state and an operating state, and the starting related state comprises a starting failure state.

The first state judgment unit 401 may be configured to: acquiring an event occurring in the container set, and determining the current state of the container set from the event; and when the current state is the running state and the pre-stored latest state of the task is the starting related state, determining the current state of the task as the running state. The second state determination unit 402 is configured to: acquiring an event occurring in the deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the event; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

In an embodiment of the present invention, the task state further includes a stop state. The apparatus 400 may further include a third status determining unit, configured to obtain an event occurring at the deployment unit, and determine, from the event, the number of available container sets controlled by the deployment unit; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

The start-related state further comprises a start-up start state and a start-up in-progress state, and the latest state of the task is stored in a cache unit. The apparatus 400 may further comprise an execution unit configured to: after determining the current state of the task, store the current state in the cache unit; before performing the step of determining the number of available container sets controlled by the deployment unit from the event, skip the step if the event is determined not to carry the stop event identifier; after acquiring an event occurring at the deployment unit, determine the identifier of the task from the event, and determine the current state of the task as the start-up start state when the identifier is judged not to be stored in the cache unit; and after acquiring an event occurring in the container set, determine the current state of the task as the start-up in-progress state if the latest state of the task stored in the cache unit is the start-up start state.

In addition, in the embodiment of the present invention, the task is a streaming computing task, the task is executed in a Spark cluster built on a Kubernetes platform, and the cache unit stores historical task information. The container is a Docker container, the container set is a Driver Pod that schedules an Executor container set (Executor Pod), the deployment unit is a Deployment, and the state of the Driver Pod further includes at least one of the following: a pending state, a failed state, and a deleted state. The first state judgment unit 401 may be further configured to: monitor the Driver Pod by using a monitoring unit, and receive the event sent by the monitoring unit when an event occurs at the Driver Pod. The second state judgment unit 402 may be further configured to: monitor the Deployment by using a monitoring unit, and receive the event sent by the monitoring unit when an event occurs at the Deployment. The execution unit may be further configured to: after receiving an event occurring at the Executor Pod, acquire, from the tag of the Executor Pod, the address information of the Spark user interface started at the Driver Pod.

Fig. 5 is a schematic diagram of a portion of an apparatus for determining a task state corresponding to the method shown in fig. 3. As shown in fig. 5, the apparatus 500 includes: a first state judgment unit 401 and/or a third state judgment unit 502. The task is executed in a container, the container is managed by a container set, and the container set is deployed by a deployment unit; the task state includes a start-related state, a running state, and a stopped state.

Specifically, the first state judgment unit 401 may be configured to obtain an event occurring in the container set, and determine a current state of the container set from the event; and when the current state is the running state and the pre-stored latest state of the task is the starting related state, determining the current state of the task as the running state. The third state judgment unit 502 may be configured to obtain an event occurring at the deployment unit, and determine the number of available container sets controlled by the deployment unit from the event; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

In the embodiment of the present invention, the startup related state includes a startup start state, a startup in progress state, and a startup failure state. The apparatus 500 may further include a second state determination unit, configured to obtain an event occurring at the deployment unit, and determine, from the event, the number of unavailable container sets controlled by the deployment unit; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

As a preferred aspect, the apparatus 500 further comprises an execution unit configured to: after determining the current state of the task, store the current state in a cache unit; before performing the step of determining the number of available container sets controlled by the deployment unit from the event, skip the step if the event is determined not to carry the stop event identifier; after acquiring an event occurring at the deployment unit, determine the identifier of the task from the event, and determine the current state of the task as the start-up start state when the identifier is judged not to be stored in the cache unit; and after acquiring an event occurring in the container set, determine the current state of the task as the start-up in-progress state if the latest state of the task stored in the cache unit is the start-up start state.

Preferably, in the embodiment of the present invention, the task is a streaming computing task, the task is executed in a Spark cluster built on a Kubernetes platform, and the cache unit stores historical task information. The container is a Docker container, the container set is a Driver Pod that schedules an Executor container set (Executor Pod), the deployment unit is a Deployment, and the state of the Driver Pod further includes at least one of the following: a pending state, a failed state, and a deleted state. The first state judgment unit 401 may be further configured to: monitor the Driver Pod by using a monitoring unit, and receive the event sent by the monitoring unit when an event occurs at the Driver Pod. The second state judgment unit may be further configured to: monitor the Deployment by using a monitoring unit, and receive the event sent by the monitoring unit when an event occurs at the Deployment. The execution unit may be further configured to: after receiving an event occurring at the Executor Pod, acquire, from the tag of the Executor Pod, the address information of the Spark user interface started at the Driver Pod.

Fig. 6 illustrates an exemplary system architecture 600 of a method of determining task status or a device for determining task status to which embodiments of the invention may be applied.

As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604 and a server 605 (this architecture is merely an example, and the components included in a specific architecture may be adjusted according to the specific application). The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages and the like. Various client applications, such as an application for determining task status (for example only), may be installed on the terminal devices 601, 602, 603.

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 605 may be a server providing various services, such as a background server (just an example) that supports the application for determining the task state operated by the user on the terminal devices 601, 602, 603. The background server may process the received state determination request and feed back a processing result (e.g., the determined task state, by way of example only) to the terminal devices 601, 602, 603.

It should be noted that the method for determining the task state provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the device for determining the task state is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The invention also provides an electronic device. The electronic device of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining the task state provided by the invention.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with the electronic device implementing an embodiment of the present invention. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the computer system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the central processing unit 701, performs the above-described functions defined in the system of the present invention.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first state judgment unit and a second state judgment unit. Where the names of these units do not in some cases constitute a limitation on the unit itself, for example, the first state judgment unit may also be described as "a unit for determining the current state of the task from the container set event".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform steps comprising: acquiring an event occurring in a container set, and determining the current state of the container set from the event; when the current state is an operating state and the latest state of the pre-stored task is a starting related state, determining the current state of the task as an operating state; and/or acquiring events occurring at the deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the events; and when the number is equal to a preset first threshold value and the latest state of the pre-stored tasks is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of determining a task state, wherein the task is executed in a container, the container is managed by a set of containers, the set of containers is deployed by a deployment unit; the task state comprises a starting related state and an operating state, and the starting related state comprises a starting failure state;

characterized in that the method comprises:

acquiring an event occurring in the container set, and determining the current state of the container set from the event; when the current state is an operating state and the latest state of the task stored in advance is a starting related state, determining the current state of the task as an operating state; and/or

acquiring an event occurring in the deployment unit, and determining the number of unavailable container sets controlled by the deployment unit from the event; and when the number is equal to a preset first threshold value and the latest state of the tasks stored in advance is a starting related state except for a starting failure state, determining the current state of the tasks as the starting failure state.

2. The method of claim 1, wherein the task state further comprises a stop state; and, the method further comprises:

acquiring an event occurring in the deployment unit, and determining the number of available container sets controlled by the deployment unit from the event; and when the number is equal to a preset second threshold value and the latest state of the tasks stored in advance is the running state, determining the current state of the tasks as the stop state.

3. The method of claim 2, wherein the starting related state further comprises: a start-up start state and a start-up in-progress state, wherein the latest state of the task is stored in a cache unit; and, the method further comprises:

after determining the current state of the task, storing the current state in a cache unit;

prior to performing the step of determining the number of available container sets controlled by the deployment unit from the event: if the event is determined not to carry the stop event identifier, the step is not executed;

after acquiring an event occurring at the deployment unit: determining the identifier of the task from the event, and determining the current state of the task as the start-up start state when the identifier is judged not to be stored in the cache unit;

after acquiring an event that occurs at the set of containers: if the latest state of the task stored in the cache unit is the start-up start state, determining the current state of the task as the start-up in-progress state.

4. The method of claim 3,

the task is a streaming computing task, the task is executed in a Spark cluster built on a Kubernetes platform, and historical task information is stored in the cache unit;

the container is a Docker container, the container set is a Driver Pod that schedules an Executor container set (Executor Pod), the deployment unit is a Deployment, and the state of the Driver Pod further includes at least one of the following: a pending state, a failed state, a deleted state;

acquiring events occurring in the set of containers includes: monitoring the Driver Pod by using a monitoring unit, and receiving an event sent by the monitoring unit when the Driver Pod generates the event;

acquiring events occurring at the deployment unit includes: monitoring the Deployment by using a monitoring unit, and receiving an event sent by the monitoring unit when the Deployment generates the event;

and, the method further comprises: after receiving an event occurring at the Executor Pod, acquiring, from the tag of the Executor Pod, the address information of the Spark user interface started at the Driver Pod.

5. A method of determining a task state, wherein the task is executed in a container, the container is managed by a set of containers, the set of containers is deployed by a deployment unit; the task state comprises a starting related state, a running state and a stopping state;

characterized in that the method comprises:

6. The method of claim 5, wherein the starting related state further comprises: a start-up start state and a start-up in-progress state, wherein the latest state of the task is stored in a cache unit; and, the method further comprises:

7. The method of claim 6,

8. An apparatus that determines a status of a task, wherein the task is executed in a container, the container is managed by a container set, and the container set is deployed by a deployment unit; the task state comprises a starting related state and an operating state, and the starting related state comprises a starting failure state;

characterized in that the device comprises:

a first state judgment unit, configured to acquire an event occurring in the container set and determine the current state of the container set from the event; and, when the current state is an operating state and the pre-stored latest state of the task is a starting related state, to determine the current state of the task as an operating state; and/or

a second state judgment unit, configured to acquire an event occurring at the deployment unit and determine, from the event, the number of unavailable container sets controlled by the deployment unit; and, when the number is equal to a preset first threshold value and the pre-stored latest state of the task is a starting related state other than the starting failure state, to determine the current state of the task as the starting failure state.
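
The decision rules of the two judgment units in claim 8 can be sketched as two pure functions. This is an illustrative reading of the claim, not the applicant's implementation: the enum members, the function names, the `"Running"` string for the container set state, and the value chosen for `FIRST_THRESHOLD` are all assumptions.

```python
from enum import Enum
from typing import Optional


class TaskState(Enum):
    STARTING = "starting"          # a starting related state
    START_FAILED = "start_failed"  # the starting failure state
    RUNNING = "running"            # the operating state


# Starting related states, which per claim 8 include the starting failure state.
START_RELATED = {TaskState.STARTING, TaskState.START_FAILED}

# Hypothetical value; the claim only says "a preset first threshold value".
FIRST_THRESHOLD = 1


def judge_from_container_set(pod_state: str, latest: TaskState) -> Optional[TaskState]:
    """First unit: a running container set moves a starting task to RUNNING."""
    if pod_state == "Running" and latest in START_RELATED:
        return TaskState.RUNNING
    return None  # no state change is derived from this event


def judge_from_deployment(unavailable: int, latest: TaskState) -> Optional[TaskState]:
    """Second unit: unavailable container sets at the threshold mean start failure."""
    if (
        unavailable == FIRST_THRESHOLD
        and latest in START_RELATED
        and latest is not TaskState.START_FAILED
    ):
        return TaskState.START_FAILED
    return None  # no state change is derived from this event
```

Returning `None` when no rule fires keeps the stored latest state untouched, which matches the claim's wording that a new state is determined only when both conditions hold.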

9. An apparatus that determines a status of a task, wherein the task is executed in a container, the container is managed by a container set, and the container set is deployed by a deployment unit; the task state comprises a starting related state, a running state and a stopping state;

characterized in that the device comprises:

a third state judgment unit, configured to acquire an event occurring at the deployment unit and determine, from the event, the number of available container sets controlled by the deployment unit; and, when the number is equal to a preset second threshold value and the pre-stored latest state of the task is the running state, to determine the current state of the task as the stop state.
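
The rule applied by the third state judgment unit of claim 9 can be sketched in the same way. The names below and the value of `SECOND_THRESHOLD` are assumptions made for illustration; the claim only refers to "a preset second threshold value".

```python
from enum import Enum
from typing import Optional


class TaskState(Enum):
    STARTING = "starting"  # a starting related state
    RUNNING = "running"    # the running state
    STOPPED = "stopped"    # the stop state


# Hypothetical value: zero available container sets is treated as "stopped".
SECOND_THRESHOLD = 0


def judge_stop(available: int, latest: TaskState) -> Optional[TaskState]:
    """Third unit: a running task whose available container sets hit the threshold stops."""
    if available == SECOND_THRESHOLD and latest is TaskState.RUNNING:
        return TaskState.STOPPED
    return None  # no state change is derived from this event
```

Checking the stored latest state as well as the count prevents a task that has not yet reached the running state from being reported as stopped.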

10. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

11. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.