CN113760472A

CN113760472A - Method and device for scheduling push tasks

Info

Publication number: CN113760472A
Application number: CN202010495110.5A
Authority: CN
Inventors: 燕媛媛
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2021-12-07

Abstract

The invention discloses a method and a device for scheduling push tasks, and relates to the technical field of computers. One embodiment of the method comprises the following steps. First, all push tasks are test-run, and the remaining cluster throughput of the target cluster that receives the pushed data is acquired periodically during the test run. Next, the execution order of all push tasks is determined from the task parameters of each push task and the periodically acquired remaining cluster throughput. Finally, the data is pushed to the target cluster in that execution order. Because all push tasks are test-run, this implementation automatically coordinates the execution order of all push tasks on one cluster, so the throughput of the whole cluster can be used efficiently.

Description

Method and device for scheduling push tasks

Technical Field

The invention relates to the technical field of computers, and in particular to a method and a device for scheduling push tasks.

Background

A push task pushes offline data to a cluster. To make efficient use of a cluster, one cluster may host the indexes of multiple projects. When billions of records are pushed to a cluster, cluster throughput can become the bottleneck. In the prior art, throughput utilization of an ES (Elasticsearch) cluster is generally improved by raising the cluster write speed and by processing push tasks serially or in parallel. However, when cluster throughput is limited, these approaches cannot improve the utilization of the throughput of the whole cluster.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for scheduling push tasks. By test-running all push tasks, they automatically coordinate the execution order of all push tasks on one cluster, so the throughput of the whole cluster can be used efficiently.

To achieve the above object, according to a first aspect of the embodiments of the present invention, a method for scheduling push tasks is provided. The method includes:

test-running all push tasks, and periodically acquiring, during the test run, the remaining cluster throughput of the target cluster that receives the pushed data;

determining the execution order of all push tasks according to the task parameters of each push task and the periodically acquired remaining cluster throughput; and

pushing the data to the target cluster according to the execution order.

Optionally, the task parameters include the occupied throughput of the task and the task priority.

Determining the execution order of all push tasks according to the task parameters of each push task and the periodically acquired remaining cluster throughput includes:

each time the remaining cluster throughput is acquired, sorting the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput in descending order of task priority to obtain a sorting result; and

determining the execution order of all push tasks according to the sorting results corresponding to each acquisition of the remaining cluster throughput.

Optionally, the task parameters further include the push duration and the start time.

After the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput are sorted in descending order of task priority, the method further includes:

judging whether the sorted task sequence contains waiting push tasks with the same task priority; if not, taking the task sequence as the sorting result; if so, re-sorting the waiting push tasks with the same task priority in ascending order of push duration (shortest first);

judging whether the re-sorted task sequence contains waiting push tasks with the same push duration; if so, sorting those tasks by start time (earliest first) to obtain the sorting result; otherwise, taking the re-sorted task sequence as the sorting result.

Optionally, after the execution order of all push tasks is determined, the method further includes judging whether the execution order of all push tasks satisfies a first adjustment condition. If it does, the execution order of all push tasks is adjusted through the following steps:

adjusting the task priority of at least one waiting push task to be the same as the task priority of the preceding push task, then test-running all push tasks again and determining a new execution order of all push tasks;

judging whether the total execution time of the push tasks under the new execution order is longer than that under the original execution order; if so, rolling back the task priority of the adjusted waiting push tasks and leaving the execution order of all push tasks unchanged; otherwise, taking the new execution order as the execution order of all push tasks.

The first adjustment condition includes the following: an idle period exists on the push time axis corresponding to the execution order, and waiting push tasks exist in the idle period. An idle period is a period in which no push task occupies the target cluster, that is, the remaining cluster throughput of the target cluster equals its maximum cluster throughput. The preceding push task is the push task executed last before the idle period.

Optionally, adjusting the task priority of at least one waiting push task to be the same as the task priority of the preceding push task includes:

determining the waiting tasks whose priority is to be adjusted according to the maximum cluster throughput of the target cluster and the occupied throughput of each waiting push task on the push time axis; and

adjusting the task priority of those waiting tasks to be the same as the task priority of the preceding push task, and updating the remaining cluster throughput of the target cluster after the adjustment.

Optionally, after the task priority of at least one waiting push task is adjusted to be the same as the task priority of the preceding push task, the method further includes:

judging whether the remaining cluster throughput of the target cluster after the adjustment is greater than zero; if so, increasing the occupied throughput of the adjusted waiting push tasks so that the remaining cluster throughput of the target cluster becomes zero.

Optionally, after the execution order of all push tasks is determined, the method further includes judging whether the execution order of all push tasks satisfies a second adjustment condition. If it does, the execution order of all push tasks is adjusted through the following step:

adjusting the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput to be the same as the task priority of the push task being executed, then test-running all push tasks again and determining a new execution order of all push tasks.

The second adjustment condition includes the following: a non-full period exists on the push time axis corresponding to the execution order, and a waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput exists in the non-full period. A non-full period is a period in which the remaining cluster throughput of the target cluster is greater than zero and less than the maximum cluster throughput.
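The two period types above can be recognized from the remaining-throughput samples collected during a test run. The following Python sketch is illustrative only. It assumes the samples are integers taken at a fixed interval, and it treats a period as idle when no push task is running, that is, when the remaining cluster throughput equals the maximum cluster throughput. This is how the idle-period examples in the detailed description behave. The function name `classify_periods` is hypothetical.

```python
from typing import List, Tuple


def classify_periods(samples: List[int], highest: int) -> List[Tuple[str, int, int]]:
    """Group consecutive remaining-throughput samples into labelled periods.

    Each result is (label, first_sample, last_sample). Labels:
      "idle"     : remaining == highest (no push task occupies the cluster)
      "non-full" : 0 < remaining < highest
      "full"     : remaining == 0
    """
    periods: List[Tuple[str, int, int]] = []
    for i, remaining in enumerate(samples):
        if not 0 <= remaining <= highest:
            raise ValueError(f"sample {i} out of range: {remaining}")
        if remaining == highest:
            label = "idle"
        elif remaining == 0:
            label = "full"
        else:
            label = "non-full"
        # Extend the current period while the label stays the same.
        if periods and periods[-1][0] == label:
            periods[-1] = (label, periods[-1][1], i)
        else:
            periods.append((label, i, i))
    return periods


# Remaining throughput sampled once per minute on a cluster whose maximum is 10.
print(classify_periods([0, 0, 3, 3, 10, 10, 0], highest=10))
# [('full', 0, 1), ('non-full', 2, 3), ('idle', 4, 5), ('full', 6, 6)]
```

With the classification in hand, the first adjustment condition applies to "idle" runs that contain waiting push tasks, and the second applies to "non-full" runs.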

Optionally, adjusting the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput to be the same as the task priority of the push task being executed includes:

determining the waiting tasks whose priority is to be adjusted according to the remaining cluster throughput and the occupied throughput of each waiting push task on the push time axis; and

adjusting the task priority of those waiting tasks to be the same as the task priority of the push task being executed, and updating the remaining cluster throughput of the target cluster after the adjustment.

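Optionally, after the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput is adjusted to be the same as the task priority of the push task being executed, the method further includes judging whether the remaining cluster throughput of the target cluster after the adjustment is greater than zero. If it is, the occupied throughput of the adjusted waiting push tasks is increased so that the remaining cluster throughput of the target cluster becomes zero.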

According to a second aspect of the embodiments of the present invention, an apparatus for scheduling push tasks is provided. The apparatus includes:

an acquisition module, configured to test-run all push tasks and periodically acquire, during the test run, the remaining cluster throughput of the target cluster that receives the pushed data;

a determining module, configured to determine the execution order of all push tasks according to the task parameters of each push task and the periodically acquired remaining cluster throughput; and

a pushing module, configured to push the data to the target cluster according to the execution order.

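Optionally, the task parameters include the occupied throughput of the task and the task priority. The determining module determines the execution order of all push tasks in two steps. First, each time the remaining cluster throughput is acquired, it sorts the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput in descending order of task priority to obtain a sorting result. Second, it determines the execution order of all push tasks according to the sorting results corresponding to each acquisition.

Optionally, the task parameters further include the push duration and the start time. After the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput are sorted in descending order of task priority, the determining module is further configured to re-sort tasks with the same task priority in ascending order of push duration. It then sorts tasks with the same push duration by start time (earliest first) to obtain the sorting result.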

Optionally, after the execution order of all push tasks is determined, the determining module is further configured to judge whether the execution order satisfies a first adjustment condition. If it does, the module adjusts the execution order of all push tasks through the same steps as described for the method: adjusting the task priority of at least one waiting push task to that of the preceding push task, test-running all push tasks again, and rolling back the adjustment if the total execution time becomes longer.

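Optionally, the determining module adjusts the task priority of at least one waiting push task to be the same as the task priority of the preceding push task in three steps. It determines the waiting tasks whose priority is to be adjusted according to the maximum cluster throughput of the target cluster and the occupied throughput of each waiting push task on the push time axis. It then adjusts their task priority accordingly. Finally, it updates the remaining cluster throughput of the target cluster.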

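Optionally, after the task priority of at least one waiting push task is adjusted to be the same as the task priority of the preceding push task, the determining module is further configured to judge whether the remaining cluster throughput of the target cluster is greater than zero. If it is, the module increases the occupied throughput of the adjusted waiting push tasks so that the remaining cluster throughput becomes zero.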

Optionally, after the execution order of all push tasks is determined, the determining module is further configured to judge whether the execution order satisfies a second adjustment condition. If it does, the module adjusts the execution order of all push tasks through the same step as described for the method: adjusting the task priority of at least one waiting push task that fits into the remaining cluster throughput to that of the push task being executed, and then test-running all push tasks again.

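Optionally, the determining module adjusts the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput to be the same as the task priority of the push task being executed in three steps. It determines the waiting tasks whose priority is to be adjusted according to the remaining cluster throughput and the occupied throughput of each waiting push task on the push time axis. It then adjusts their task priority accordingly. Finally, it updates the remaining cluster throughput of the target cluster.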

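Optionally, after the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput is adjusted to be the same as the task priority of the push task being executed, the determining module is further configured to judge whether the remaining cluster throughput of the target cluster is greater than zero. If it is, the module increases the occupied throughput of the adjusted waiting push tasks so that the remaining cluster throughput becomes zero.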

According to a third aspect of the embodiments of the present invention, an electronic device for scheduling push tasks is provided. The device includes:

one or more processors;

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided by the first aspect of the embodiments of the present invention.

According to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided. The medium stores a computer program that, when executed by a processor, implements the method provided by the first aspect of the embodiments of the present invention.

One embodiment of the above invention has the following advantages or benefits. Because all push tasks are test-run, the execution order of all push tasks on one cluster is coordinated automatically, so the throughput of the whole cluster can be used efficiently. The embodiment also monitors how each push task executes along the whole push time axis and dynamically adjusts task parameters such as the task priority and the occupied throughput. This further improves the utilization of the whole cluster's throughput.

Further effects of the above optional implementations are described below in connection with the embodiments.

Drawings

The drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of the main flow of a method for scheduling push tasks according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for scheduling push tasks in an optional embodiment of the present invention;

FIG. 3 is a flowchart of adjusting the task parameters of push tasks based on idle periods in an optional embodiment of the present invention;

FIG. 4 is a flowchart of adjusting the task parameters of push tasks based on non-full periods in an optional embodiment of the present invention;

FIG. 5 is a schematic diagram of the main modules of an apparatus for scheduling push tasks according to an embodiment of the present invention;

FIG. 6 is a diagram of an exemplary system architecture in which embodiments of the present invention may be applied;

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

According to one aspect of the embodiments of the present invention, a method for scheduling push tasks is provided.

FIG. 1 is a schematic diagram of the main flow of the method for scheduling push tasks according to an embodiment of the present invention. As shown in FIG. 1, the method includes step S101, step S102, and step S103.

Steps S101 and S102 test-run all push tasks and determine their execution order. Specifically, in step S101, all push tasks are test-run, and the remaining cluster throughput of the target cluster receiving the pushed data is acquired periodically during the test run. In step S102, the execution order of all push tasks is determined according to the task parameters of each push task and the periodically acquired remaining cluster throughput. In step S103, the data is pushed to the target cluster according to the execution order.

For example, while all push tasks are test-run, the remaining cluster throughput of the target cluster is acquired once per minute, twice in total. After the first acquisition, the execution order of all push tasks is determined as T1, T2, T3, T4, T5, based on the task parameters of each push task and the first acquired remaining cluster throughput. The second acquisition takes place when push task T2 finishes. Based on the task parameters and the second acquired remaining cluster throughput, the order of the push tasks not yet executed is determined as T3, T5, T4. The execution order obtained from the test run is therefore T1, T2, T3, T5, T4.

The content and number of the task parameters can be chosen according to actual conditions. Suppose, for example, that the task parameters include only the occupied throughput. In that case, when the execution order is determined from the task parameters and the periodically acquired remaining cluster throughput, all push tasks are sorted by occupied throughput in descending or ascending order.

Optionally, the task parameters include the occupied throughput of the task and the task priority. Determining the execution order of all push tasks according to the task parameters of each push task and the periodically acquired remaining cluster throughput involves two steps. First, each time the remaining cluster throughput is acquired, the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput are sorted in descending order of task priority to obtain a sorting result. Second, the execution order of all push tasks is determined according to the sorting results corresponding to each acquisition of the remaining cluster throughput.

Because the execution order is determined in descending order of task priority, important or core data is pushed to the target cluster first.

Optionally, the task parameters further include the push duration and the start time. After the waiting push tasks whose occupied throughput is less than or equal to the remaining cluster throughput are sorted in descending order of task priority, the method further judges whether the sorted task sequence contains waiting push tasks with the same task priority. If it does not, the task sequence is taken as the sorting result. If it does, the waiting push tasks with the same task priority are re-sorted in ascending order of push duration (shortest first). The method then judges whether the re-sorted task sequence contains waiting push tasks with the same push duration. If it does, those tasks are sorted by start time (earliest first) to obtain the sorting result. Otherwise, the re-sorted task sequence is taken as the sorting result.
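The tie-breaking rule above can be expressed in two ways: as the successive re-sorts described in the text, or as a single sort on a composite key. Both give the same order because Python's sort is stable. The sketch below shows both forms. The `WaitingTask` record and the convention that a larger integer means a higher task priority are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WaitingTask:
    name: str
    throughput: int  # occupied throughput
    priority: int    # larger value = higher priority (assumption)
    duration: int    # push duration, minutes
    start: int       # start time, minutes


def rank_composite(waiting: List[WaitingTask], remaining: int) -> List[WaitingTask]:
    """Keep tasks that fit, then order by priority desc, duration asc, start asc."""
    eligible = [t for t in waiting if t.throughput <= remaining]
    return sorted(eligible, key=lambda t: (-t.priority, t.duration, t.start))


def rank_in_passes(waiting: List[WaitingTask], remaining: int) -> List[WaitingTask]:
    """Same result via stable sorts, applied from least to most significant key."""
    eligible = [t for t in waiting if t.throughput <= remaining]
    eligible.sort(key=lambda t: t.start)
    eligible.sort(key=lambda t: t.duration)
    eligible.sort(key=lambda t: t.priority, reverse=True)  # reverse keeps stability
    return eligible


waiting = [
    WaitingTask("t1", 3, priority=2, duration=30, start=5),
    WaitingTask("t2", 2, priority=2, duration=10, start=8),
    WaitingTask("t3", 4, priority=1, duration=10, start=0),
    WaitingTask("t4", 2, priority=2, duration=10, start=1),
    WaitingTask("t5", 9, priority=3, duration=5, start=0),  # does not fit into 5
]
print([t.name for t in rank_composite(waiting, remaining=5)])  # ['t4', 't2', 't1', 't3']
```

The composite key is usually the simpler form to maintain. The multi-pass version mirrors the wording of the method step by step.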

FIG. 2 is a flowchart of a method for scheduling push tasks in an optional embodiment of the present invention. In the optional embodiment shown in FIG. 2, the maximum cluster throughput of the target cluster is measured first. The task parameters of each push task, such as its occupied throughput and push duration, are measured as well and recorded for later use in determining the execution order of the push tasks. In this embodiment, the task parameters of each push task can be dynamically monitored and optimized, and the modified parameters are recorded so that the execution order is adjusted accordingly. For each push task, the scheduler first checks whether the current remaining cluster throughput of the target cluster is smaller than the occupied throughput of the push task. If it is, the current environment is not suitable for executing the task, so the scheduler waits and repeatedly checks whether the upstream task has finished. If it is not, the scheduler checks whether any higher-priority push task has not yet been executed. If one exists, a more important or core push task must be executed first, so the task is not executed, and the scheduler waits and repeatedly checks whether the upstream task has finished. If none exists, the scheduler checks whether any push task with the same task priority has a shorter push duration. If one does, the task is not executed, and the scheduler waits and repeatedly checks whether the upstream task has finished. If none does, the push task is executed.
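The per-task checks of FIG. 2 can be sketched as a single predicate. This is a hedged illustration with several assumptions. `Task` and `should_execute` are hypothetical names. `pending` is assumed to hold the push tasks that have not yet been executed, and a larger integer is assumed to mean a higher priority. A `False` result corresponds to the "wait and check again" branch in the flowchart.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    throughput: int  # occupied throughput
    priority: int    # larger value = higher priority (assumption)
    duration: int    # push duration, minutes


def should_execute(task: Task, remaining: int, pending: List[Task]) -> bool:
    """Apply the three FIG. 2 checks in order; False means wait and retry."""
    others = [o for o in pending if o is not task]
    # 1. Not enough remaining cluster throughput for this task.
    if remaining < task.throughput:
        return False
    # 2. A higher-priority push task has not been executed yet.
    if any(o.priority > task.priority for o in others):
        return False
    # 3. A same-priority push task with a shorter push duration goes first.
    if any(o.priority == task.priority and o.duration < task.duration for o in others):
        return False
    return True


a = Task("a", throughput=4, priority=2, duration=30)
b = Task("b", throughput=3, priority=2, duration=10)
print(should_execute(b, remaining=10, pending=[a, b]))  # True
print(should_execute(a, remaining=10, pending=[a, b]))  # False: b is shorter
```

Read literally, check 2 makes a task wait even when the higher-priority task does not currently fit into the remaining throughput. A production scheduler might restrict that check to tasks that fit, as the sorting rule above does.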

In this embodiment, among several push tasks with the same task priority, those with shorter push durations are processed first. This increases the concurrency of push tasks and therefore the utilization of the whole cluster's throughput.

By test-running all push tasks, the embodiment of the invention automatically coordinates the execution order of all push tasks on one cluster, so the throughput of the whole cluster can be used efficiently.

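In some optional embodiments, after the execution order of all push tasks is determined, the method further includes judging whether the execution order of all push tasks satisfies the first adjustment condition. If it does, the execution order is adjusted as follows. The task priority of at least one waiting push task is adjusted to be the same as that of the preceding push task, and all push tasks are test-run again to determine a new execution order. If the total execution time under the new order is longer than under the original order, the adjusted priorities are rolled back and the order is left unchanged. Otherwise, the new order is adopted. The first adjustment condition is that an idle period exists on the push time axis corresponding to the execution order and that waiting push tasks exist in that idle period. The preceding push task is the push task executed last before the idle period.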

For example, suppose an idle period exists on the push time axis, the task priority of the last push task executed before the idle period is A, and several waiting push tasks exist in the idle period. The task priority of the waiting push task with the earliest start time, the highest task priority, or the shortest push duration after the idle period is adjusted to A, so that this waiting task is executed in the idle period. This embodiment makes full use of the target cluster's throughput during idle periods and raises its overall throughput utilization over the push period.

Optionally, adjusting the task priority of at least one waiting push task to be the same as the task priority of the preceding push task involves two steps. First, the waiting tasks whose priority is to be adjusted are determined according to the maximum cluster throughput of the target cluster and the occupied throughput of each waiting push task on the push time axis. Second, the task priority of those waiting tasks is adjusted to be the same as that of the preceding push task, and the remaining cluster throughput of the target cluster is updated after the adjustment.

When many waiting push tasks exist in the idle period, the waiting tasks whose priority is to be adjusted can be selected by start time (earliest first), by task priority (highest first), or by push duration (shortest first). For example, suppose the maximum cluster throughput is 10 and the idle period contains 4 waiting push tasks, listed in descending order of priority, whose occupied throughputs are 3, 3, 3 and 4 respectively. The first three waiting tasks together occupy 9, which fits within 10. Adding the fourth would require 13, which exceeds 10. Therefore the task priorities of the first three waiting push tasks are adjusted to be the same as that of the preceding push task.
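The selection in this example can be sketched as a first-fit pass over the waiting tasks in their screening order. Each task whose occupied throughput still fits into the available capacity is promoted to the target priority. For an idle period, the capacity is the maximum cluster throughput. For a non-full period, it would be the remaining cluster throughput. Two choices here are assumptions: skipping a task that does not fit and continuing with later ones, and the hypothetical names `Waiting` and `promote_into_period`. The usage line assumes four waiting tasks with occupied throughputs 3, 3, 3 and 4 and a maximum cluster throughput of 10. For that input, stopping at the first misfit would select the same tasks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Waiting:
    name: str
    throughput: int  # occupied throughput
    priority: int    # larger value = higher priority (assumption)


def promote_into_period(ordered: List[Waiting], capacity: int,
                        target_priority: int) -> Tuple[List[str], int]:
    """Promote fitting tasks to target_priority; return (promoted names, leftover)."""
    promoted: List[str] = []
    leftover = capacity
    for task in ordered:
        if task.throughput <= leftover:
            task.priority = target_priority  # same priority as the preceding task
            leftover -= task.throughput
            promoted.append(task.name)
    return promoted, leftover


queue = [Waiting("w1", 3, 4), Waiting("w2", 3, 3), Waiting("w3", 3, 2), Waiting("w4", 4, 1)]
print(promote_into_period(queue, capacity=10, target_priority=5))  # (['w1', 'w2', 'w3'], 1)
```

The returned leftover is the remaining cluster throughput after the adjustment. It is the quantity the next optional step checks against zero.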

Optionally, after the task priority of at least one waiting push task is adjusted to be the same as that of the preceding push task, the method further includes judging whether the remaining cluster throughput of the target cluster after the adjustment is greater than zero. If it is, the occupied throughput of the adjusted waiting push tasks is increased so that the remaining cluster throughput of the target cluster becomes zero.

The occupied throughput of the adjusted waiting push tasks can be raised in different ways. The remaining cluster throughput of the target cluster at the time of adjustment can be distributed evenly among all the adjusted waiting push tasks, or allocated to only some of them. The amount allocated to each adjusted task may be the same or different.

FIG. 3 is a flowchart of adjusting the task parameters of push tasks based on idle periods in an optional embodiment of the present invention. As shown in FIG. 3, the first check is whether an idle period exists on the push time axis. If not, the process ends. If so, the next check is whether waiting push tasks exist in the idle period. If not, the process ends. If so, the task priority and the occupied throughput of those waiting push tasks are raised. All push tasks are then test-run again to determine the total execution time of all push tasks after the adjustment. This total is compared with the total execution time before the adjustment. If it is longer, the task priority and the occupied throughput of the push tasks are rolled back, and the task parameters of the push tasks are left unadjusted.
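The accept-or-roll-back step of FIG. 3 can be sketched as follows. The sketch assumes task parameters are held in a plain dictionary and that `total_time` stands in for re-running the test and returning the total execution time. Both the data layout and the name `try_adjustment` are hypothetical. Following the text, the adjustment is rolled back only when the new total execution time is strictly longer.

```python
import copy
from typing import Callable, Dict, Tuple

Params = Dict[str, Dict[str, int]]  # task name -> {"priority": ..., "throughput": ...}


def try_adjustment(params: Params, changes: Params,
                   total_time: Callable[[Params], int]) -> Tuple[Params, bool]:
    """Apply changes to a copy; keep them only if the test run does not get longer."""
    before = total_time(params)
    candidate = copy.deepcopy(params)  # the original stays untouched for rollback
    for name, fields in changes.items():
        candidate[name].update(fields)
    if total_time(candidate) > before:
        return params, False  # roll back: original parameters stay in force
    return candidate, True


params = {"a": {"priority": 3, "throughput": 4}, "w": {"priority": 1, "throughput": 3}}


def toy_total_time(p: Params) -> int:
    # Stand-in for a real test run: promoting "w" shortens the total time.
    return 60 if p["w"]["priority"] == 3 else 75


new_params, accepted = try_adjustment(params, {"w": {"priority": 3, "throughput": 4}}, toy_total_time)
print(accepted, new_params["w"])  # True {'priority': 3, 'throughput': 4}
```

Working on a deep copy makes rollback trivial, because the original parameters are never mutated.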

For example, suppose the maximum cluster throughput is 10 and the idle period contains 4 waiting push tasks, listed in descending order of priority, whose occupied throughputs are 3, 3, 3 and 4 respectively. After the task priorities of the first three waiting push tasks are adjusted to be the same as that of the preceding push task, the remaining cluster throughput of the target cluster is 10 - 9 = 1, which is greater than zero. The remaining 1 unit of throughput can then be added to any one of the tasks whose priority was adjusted. In this embodiment, the occupied throughput of the adjusted waiting push tasks is increased until the remaining cluster throughput of the target cluster is zero. This makes maximal use of the cluster's throughput and raises overall throughput utilization.
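The text allows the leftover throughput to be allocated evenly or unevenly across the adjusted tasks. The sketch below shows one possible even split, using a hypothetical helper `distribute_leftover`. Each task gets the integer share, and the first few tasks absorb the remainder. In the idle-period case (three tasks of 3 promoted under a maximum of 10, leftover 1), the extra unit goes to the first adjusted task. In the non-full-period case (tasks of 1 and 2, leftover 2), the result is 2 and 3.

```python
from typing import List


def distribute_leftover(throughputs: List[int], leftover: int) -> List[int]:
    """Add leftover throughput to the adjusted tasks as evenly as possible."""
    if leftover < 0:
        raise ValueError("leftover must be non-negative")
    if not throughputs:
        return []
    share, extra = divmod(leftover, len(throughputs))
    # The first `extra` tasks receive one additional unit each.
    return [tp + share + (1 if i < extra else 0) for i, tp in enumerate(throughputs)]


print(distribute_leftover([3, 3, 3], 1))  # [4, 3, 3]
print(distribute_leftover([1, 2], 2))     # [2, 3]
```

After the split, the adjusted tasks' throughputs plus those of the running tasks add up to the cluster capacity, so the remaining cluster throughput becomes zero as required.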

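In other optional embodiments, after the execution order of all push tasks is determined, the method further includes judging whether the execution order of all push tasks satisfies the second adjustment condition. If it does, the execution order is adjusted as follows. The task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput is adjusted to be the same as that of the push task being executed. All push tasks are then test-run again to determine a new execution order. The second adjustment condition is that a non-full period exists on the push time axis corresponding to the execution order and that a waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput exists in that period.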

For example, suppose a non-full period exists on the push time axis, the task priority of the push task being executed in that period is A, and several waiting push tasks exist in the non-full period. The task priority of the waiting push task with the earliest start time, the highest task priority, or the shortest push duration after the non-full period is adjusted to A, so that this waiting task is executed in the non-full period. This embodiment makes full use of the target cluster's remaining throughput during non-full periods and raises its overall throughput utilization over the push period.

Optionally, adjusting the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput to be the same as that of the push task being executed involves two steps. First, the waiting tasks whose priority is to be adjusted are determined according to the remaining cluster throughput and the occupied throughput of each waiting push task on the push time axis. Second, the task priority of those waiting tasks is adjusted to be the same as that of the push task being executed, and the remaining cluster throughput of the target cluster is updated after the adjustment.

When many waiting push tasks exist in the non-full period, the waiting tasks whose priority is to be adjusted can be selected by start time (earliest first), by task priority (highest first), or by push duration (shortest first). For example, suppose the maximum cluster throughput is 10 and the remaining cluster throughput is 3 during a 15-minute period, so that period is a non-full period. If a waiting push task with an occupied throughput of 3 exists in the non-full period, its task priority can be adjusted to be the same as that of the push task currently being executed.

Optionally, after the task priority of at least one waiting push task whose occupied throughput is less than or equal to the remaining cluster throughput is adjusted to be the same as that of the push task being executed, the method further includes judging whether the remaining cluster throughput of the target cluster after the adjustment is greater than zero. If it is, the occupied throughput of the adjusted waiting push tasks is increased so that the remaining cluster throughput of the target cluster becomes zero.

The occupied throughput of the adjusted waiting push tasks can be raised in different ways. The remaining cluster throughput of the target cluster at the time of adjustment can be distributed evenly among all the adjusted waiting push tasks, or allocated to only some of them. The amount allocated to each adjusted task may be the same or different.

FIG. 4 is a flowchart of adjusting the task parameters of push tasks based on non-full periods in an optional embodiment of the present invention. As shown in FIG. 4, the first check is whether a non-full period exists on the push time axis. If not, the process ends. If so, the next check is whether waiting push tasks exist in the non-full period. If not, the process ends. If so, the task priority and the occupied throughput of those waiting push tasks are raised. All push tasks are then test-run again to determine the total execution time of all push tasks after the adjustment. This total is compared with the total execution time before the adjustment. If it is longer, the task priority and the occupied throughput of the push tasks are rolled back, and the task parameters of the push tasks are left unadjusted.

Illustratively, assume that the maximum throughput of the cluster is 10 and that the cluster remaining throughput is 5 during a 15-minute period; this 15-minute period is then an under-filled period. If two push-waiting tasks with task occupation throughputs of 1 and 2 respectively exist in the under-filled period, the task priorities of both are adjusted to be the same as the task priority of the push task currently being executed, after which the cluster remaining throughput of the target cluster is 5 - 1 - 2 = 2, which is greater than zero. The remaining throughput of 2 may then be added to the two adjusted tasks, for example one unit each, so that their task occupation throughputs become 2 and 3 respectively. In this embodiment, increasing the task occupation throughput of the adjusted push-waiting tasks until the cluster remaining throughput of the target cluster is zero makes maximal use of the target cluster's throughput and raises the overall cluster throughput utilization.
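The arithmetic of this example can be checked step by step. The symbols $C_{\max}$, $R_0$, $R_1$, $R_2$ are notation introduced here for illustration, and the even split of the leftover (one unit per task) is one of the distributions the text permits:

```latex
\begin{aligned}
C_{\max} &= 10,\qquad R_0 = 5 \quad\text{(load of running tasks } C_{\max} - R_0 = 5\text{)}\\
R_1 &= R_0 - (1 + 2) = 2 > 0\\
(1 + 1,\; 2 + 1) &= (2,\; 3)\\
R_2 &= R_0 - (2 + 3) = 0
\end{aligned}
```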

If push data is written to the cluster by batch commits with multiple threads, increasing the write speed further once the throughput is already saturated causes push data to be discarded because the cluster has no time to process it, resulting in data loss. If cluster throughput is improved by upgrading hardware, the cost of cluster machines rises, and once the hardware has been improved as far as it can go, the continuously growing volume of push data still runs into the write bottleneck. If the cluster write speed is increased by lengthening the refresh interval, reducing the number of replicas, and the like, search real-time performance and data reliability are sacrificed. If push tasks are processed serially, the failure or delay of any preceding task delays the push time of the whole cluster. If push tasks are processed in parallel, the differing data volumes of the push tasks mean that a low degree of concurrency makes pushing take too long and cannot make efficient use of the throughput of the whole cluster. With a high degree of concurrency, because the upstream tasks of the different indexes finish at different times, too many push tasks may run at once, and the resulting excessive concurrency causes data pushed to the cluster to be lost. Adding a retry mechanism for lost data does not raise the cluster throughput, and excessive retries of lost data further increase the load on the whole cluster.

In actual production, the goal is not the highest possible write speed but raising the cluster throughput utilization as far as possible with limited hardware while still meeting the requirements of the business scenario. The embodiments of the present invention automatically coordinate the execution order of all push tasks on one cluster by test-running all the push tasks, thereby making efficient use of the throughput of the whole cluster. By detecting the execution of each push task along the whole push time axis and dynamically adjusting task parameters such as task priority and task occupation throughput, the utilization of the overall cluster throughput can be further improved.

It should be noted that the cluster mentioned in the embodiments of the present invention may be an ES (Elasticsearch, a Lucene-based search server) cluster, or another type of cluster, such as a MySQL (relational database management system) cluster or an HBase (distributed, column-oriented open-source database) cluster.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for implementing the above method.

FIG. 5 is a schematic diagram of the main modules of an apparatus for scheduling push tasks according to an embodiment of the present invention. As shown in FIG. 5, the apparatus 500 for scheduling push tasks includes:

an obtaining module 501, configured to test-run all push tasks and periodically obtain, during the test run, the cluster remaining throughput of the target cluster that receives the push data;

a determining module 502, configured to determine the execution order of all push tasks according to the task parameters of each push task and the periodically obtained cluster remaining throughput; and

a pushing module 503, configured to push the push data to the target cluster in the execution order.
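A skeleton of the three modules might look as follows. It is a sketch only: `probe` stands in for a hypothetical cluster-metrics query, the test run is reduced to sampling that probe at a fixed interval, and the greedy ordering rule in `DeterminingModule` (at each sampled instant, admit tasks in priority order if their start time has passed and their occupation throughput fits) is an assumption for illustration that does not model task durations. All class and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PushTask:
    name: str
    priority: int    # task priority; smaller value = more urgent (assumption)
    start_time: int  # earliest start, hypothetical minutes from schedule start
    throughput: int  # task occupation throughput


class ObtainingModule:
    """Module 501: samples the cluster remaining throughput at a fixed interval."""

    def __init__(self, probe: Callable[[int], int], interval: int) -> None:
        self.probe = probe  # hypothetical: returns remaining throughput at time t
        self.interval = interval

    def sample(self, horizon: int) -> Dict[int, int]:
        return {t: self.probe(t) for t in range(0, horizon, self.interval)}


class DeterminingModule:
    """Module 502: derives an execution order from task parameters and the samples."""

    def order(self, tasks: List[PushTask], samples: Dict[int, int]) -> List[PushTask]:
        pending = sorted(tasks, key=lambda t: (t.priority, t.start_time))
        ordered: List[PushTask] = []
        for instant in sorted(samples):
            free = samples[instant]
            for task in list(pending):  # iterate over a copy so removal is safe
                if task.start_time <= instant and task.throughput <= free:
                    ordered.append(task)
                    pending.remove(task)
                    free -= task.throughput
        return ordered + pending  # tasks that never fitted are placed last


class PushingModule:
    """Module 503: pushes each task's data to the target cluster in order."""

    def __init__(self, push: Callable[[PushTask], None]) -> None:
        self.push = push  # hypothetical sink, e.g. a wrapper around a bulk-write client

    def run(self, ordered: List[PushTask]) -> None:
        for task in ordered:
            self.push(task)
```

Keeping the three responsibilities in separate objects matches the module split of apparatus 500, so the ordering rule can be replaced without touching sampling or pushing.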

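According to a further aspect of the embodiments of the present invention, there is provided an electronic device for push task scheduling, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method provided by the embodiments of the present invention.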

Fig. 6 illustrates an exemplary system architecture 600 of a method or apparatus for push task scheduling to which embodiments of the present invention may be applied.

As shown in FIG. 6, the system architecture 600 may include terminal devices 601, 602, and 603, a network 604, and a server 605. The network 604 serves as a medium providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links or fiber-optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (by way of example only).

The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 605 may be a server providing various services, for example a back-end management server (by way of example only) that supports shopping websites browsed by users with the terminal devices 601, 602, 603. The back-end management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example target push information or product information, by way of example only) to the terminal device.

It should be noted that the method for push task scheduling provided by the embodiments of the present invention is generally executed by the server 605, and accordingly the apparatus for push task scheduling is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising an obtaining module, a determining module, and a pushing module. The obtaining module test-runs all push tasks and periodically obtains, during the test run, the cluster remaining throughput of the target cluster receiving the push data; the determining module determines the execution order of all push tasks according to the task parameters of each push task and the periodically obtained cluster remaining throughput; and the pushing module pushes the push data to the target cluster in the execution order. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the pushing module may also be described as "a module that pushes the push data to the target cluster in the execution order".

As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: test-run all push tasks and periodically obtain, during the test run, the cluster remaining throughput of the target cluster receiving the push data; determine the execution order of all push tasks according to the task parameters of each push task and the periodically obtained cluster remaining throughput; and push the push data to the target cluster in the execution order.

According to the technical solution of the embodiments of the present invention, test-running all push tasks allows the execution order of all push tasks on one cluster to be coordinated automatically, so that the throughput of the whole cluster is used efficiently. By detecting the execution of each push task along the whole push time axis and dynamically adjusting task parameters such as task priority and task occupation throughput, the utilization of the overall cluster throughput can be further improved.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for push task scheduling, comprising:

2. The method of claim 1, wherein the task parameters comprise: task occupation throughput and task priority;

3. The method of claim 1, wherein the task parameters further comprise: a push duration and a start time;

4. The method of claim 2, wherein after determining the execution order of all the push tasks, the method further comprises: judging whether the execution order of all the push tasks satisfies a first adjustment condition; and if so, adjusting the execution order of all the push tasks through the following steps:

5. The method of claim 4, wherein adjusting the task priority of at least one push-waiting task to be the same as the task priority of a preceding push task comprises:

6. The method of claim 4 or 5, wherein after adjusting the task priority of at least one push-waiting task to be the same as the task priority of a preceding push task, the method further comprises:

7. The method of claim 2, wherein after determining the execution order of all the push tasks, the method further comprises: judging whether the execution order of all the push tasks satisfies a second adjustment condition; and if so, adjusting the execution order of all the push tasks through the following steps:

8. The method of claim 7, wherein adjusting the task priority of at least one push-waiting task whose task occupation throughput is less than or equal to the cluster remaining throughput to be the same as the task priority of the push task being executed comprises:

9. The method of claim 7 or 8, wherein after adjusting the task priority of at least one push-waiting task whose task occupation throughput is less than or equal to the cluster remaining throughput to be the same as the task priority of the push task being executed, the method further comprises:

10. An apparatus for push task scheduling, comprising:

11. An electronic device for push task scheduling, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.

12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.