CN115712489A - Task scheduling method and device for deep learning platform and electronic equipment

Info

Publication number
CN115712489A
Authority
CN
China
Prior art keywords
load
processed
task
target
tasks
Prior art date
Legal status
Pending
Application number
CN202110954074.9A
Other languages
Chinese (zh)
Inventor
刘鹤
沈大赛
谢远江
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202110954074.9A
Publication of CN115712489A


Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a task scheduling method for a deep learning platform, which comprises: acquiring, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption; selecting, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1; and allocating the target task to be processed to the target GPU corresponding to the target load gate for processing. The task scheduling method and device for a deep learning platform and the electronic device provided by the invention can improve the accuracy of load balancing, shorten the total time taken to complete all tasks, and improve processing efficiency.

Description

Task scheduling method and device for deep learning platform and electronic equipment
Technical Field
The invention relates to the technical field of deep learning, and in particular to a task scheduling method and device for a deep learning platform, and an electronic device.
Background
In an existing deep learning platform, a service generally needs to use multiple Graphics Processing Units (GPUs) for computation. Each computation task must be deployed to one GPU for feature computation, and multiple threads are deployed on each GPU, each of which can execute one task computation at a time, so a single GPU may need to execute multiple task computations simultaneously.
Because each GPU may need to execute multiple task computations simultaneously, a task scheduling policy is needed to balance the computation load across all GPUs, avoiding the situation in which individual GPUs are heavily loaded and compute slowly while other GPUs are lightly loaded and compute quickly. In the prior art, tasks are generally allocated to idle threads by polling the threads of each GPU. However, this can concentrate time-consuming tasks on some GPUs and quick tasks on others, leaving some GPUs heavily loaded and others lightly loaded, so the prior art suffers from low load-balancing accuracy, as the sketch below illustrates.
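For contrast only, the following Python sketch illustrates the prior-art polling-style allocation described above; the function name and the task representation are assumptions introduced here, not taken from the patent.

```python
# A sketch of prior-art polling: tasks go to GPUs in arrival order, ignoring
# their cost, so high-cost tasks can cluster on a few GPUs.
def round_robin(tasks_ms, n_gpus):
    loads = [0] * n_gpus
    for i, cost in enumerate(tasks_ms):
        loads[i % n_gpus] += cost   # thread polling assigns by arrival order
    return loads

# e.g. costs arriving as [30, 5, 25, 6] on 2 GPUs -> loads [55, 11]:
# GPU 0 receives 55 ms of work while GPU 1 receives only 11 ms.
print(round_robin([30, 5, 25, 6], 2))
```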
Disclosure of Invention
Embodiments of the invention provide a task scheduling method and device for a deep learning platform, and an electronic device, which can improve the accuracy of load balancing, shorten the total time taken to complete all tasks, and improve processing efficiency.
The first aspect of the embodiments of the present invention provides a task scheduling method for a deep learning platform, where the method includes:
acquiring, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
selecting, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and allocating the target task to be processed to the target GPU corresponding to the target load gate for processing.
Optionally, before acquiring, from the task queue, the target task to be processed whose time consumption is not lower than the set time consumption, the method further includes:
acquiring all current tasks to be processed;
and sorting all the tasks to be processed by time consumption and then placing them into the task queue.
Optionally, the acquiring all current tasks to be processed includes:
in response to a currently received request, acquiring the computation tasks of a plurality of deep learning maps corresponding to the request, and taking the computation tasks of the plurality of deep learning maps as all the tasks to be processed.
Optionally, the sorting all the tasks to be processed by time consumption and then placing them into the task queue includes:
acquiring the expected time consumption of each of all the tasks to be processed;
and placing all the tasks to be processed into the task queue in descending order of expected time consumption.
Optionally, the placing all the tasks to be processed into the task queue in descending order of expected time consumption includes:
sorting all the tasks to be processed in descending order of expected time consumption;
and placing all the sorted tasks to be processed into the task queue.
Optionally, the selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate includes:
acquiring a load value corresponding to each of the N load gates, wherein the load value is determined according to the number of currently waiting threads of the load gate;
and selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate according to the load value corresponding to each load gate.
Optionally, the selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate includes:
selecting the load gate with the lowest load from the N load gates as the target load gate.
Optionally, the allocating the target task to be processed to the target GPU corresponding to the target load gate for processing includes:
allocating the target task to be processed to an idle thread in the target GPU for processing, and adjusting the current load of the target load gate.
The second aspect of the embodiments of the invention further provides a task scheduling device for a deep learning platform, where the device includes:
a target task acquiring unit, configured to acquire, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
a target load gate selecting unit, configured to select, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and a task processing unit, configured to allocate the target task to be processed to the target GPU corresponding to the target load gate for processing.
Optionally, the device further includes:
a task-to-be-processed acquiring unit, configured to acquire all current tasks to be processed;
and a sorting unit, configured to sort all the tasks to be processed by time consumption and then place them into the task queue.
Optionally, the task-to-be-processed acquiring unit is configured to, in response to a currently received request, acquire the computation tasks of a plurality of deep learning maps corresponding to the request, and take the computation tasks of the plurality of deep learning maps as all the tasks to be processed.
Optionally, the target load gate selecting unit is configured to acquire a load value corresponding to each of the N load gates, where the load value is determined according to the number of currently waiting threads of the load gate, and to select, from the N load gates, one load gate whose load is lower than the set load as the target load gate according to the load value corresponding to each load gate.
A third aspect of the embodiments of the invention provides an electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including operating instructions for performing the task scheduling method for a deep learning platform according to the first aspect.
A fourth aspect of the embodiments of the invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the task scheduling method for a deep learning platform provided in the first aspect.
The technical solutions in the embodiments of the invention have at least the following technical effects:
Based on the above technical solution, a target task to be processed whose time consumption is not lower than a set time consumption is acquired from the task queue; one load gate whose load is lower than a set load is selected from the N load gates as a target load gate; and the target task to be processed is allocated to the target GPU corresponding to the target load gate for processing. Since the time consumption of the target task is not lower than the set time consumption, the load of the target load gate is lower than the set load, and the N load gates correspond one-to-one to the N GPUs, high-time-consumption tasks are always allocated to lightly loaded target GPUs for processing. This improves the accuracy of load balancing and, on that basis, the balance of the total task load; because high-time-consumption tasks are scheduled and computed first, the total time taken to complete all tasks is effectively shortened and task processing efficiency is improved.
Drawings
Fig. 1 is a schematic flowchart of a task scheduling method for a deep learning platform according to an embodiment of the present invention;
FIG. 2 is a block diagram of a task scheduling device for a deep learning platform according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The main implementation principle, specific implementations, and corresponding beneficial effects of the technical solutions of the embodiments of the invention are explained in detail below with reference to the accompanying drawings.
Examples
Referring to fig. 1, an embodiment of the present invention provides a task scheduling method for a deep learning platform, where the method includes:
s101, acquiring a target task to be processed with the consumed time not lower than a set consumed time from a task queue;
s102, selecting one load gate with the load lower than a set load from N load gates as a target load gate, wherein the N load gates are in one-to-one correspondence with N GPUs included in the deep learning platform, and N is an integer greater than 1;
s103, distributing the target task to be processed to a target GPU corresponding to the target load door for processing.
In an embodiment of the present specification, the deep learning platform may be a deep learning inference platform.
In step S101, if the tasks to be processed in the task queue have not been sorted by time consumption, the expected time consumption of each task to be processed must first be computed, and one task is then selected from the set as the target task to be processed, in descending order of the computed expected time consumption. If the tasks to be processed in the task queue have already been sorted by time consumption, one task is selected directly from the set as the target task to be processed, in descending order of time consumption.
In this embodiment of the specification, the set time consumption may be determined according to the time consumption of each task to be processed in the task queue. For example, the set time consumption may be not less than the second-highest or third-highest time consumption in the task queue and not more than the highest time consumption: if the time consumption of 3 tasks to be processed in the task queue is 30 ms (milliseconds), 20 ms and 15 ms in sequence, the set time consumption may be determined to be 20 ms, 25 ms or 26 ms. Of course, the set time consumption can also be set according to actual conditions, and this specification does not limit it specifically.
Specifically, the set time consumption changes as the tasks to be processed in the task queue change. If the time consumption of the 3 tasks to be processed in the task queue at one moment is 30 ms, 20 ms and 15 ms in sequence, then according to 30 ms and 20 ms the set time consumption can be determined to be 20 ms, 25 ms, 26 ms or the like, so that the 30 ms task is allocated to the corresponding GPU for processing. At the next moment, the time consumption of the 2 remaining tasks is 20 ms and 15 ms, and the set time consumption can be determined to be 19 ms, 17 ms, 16 ms or the like according to 20 ms and 15 ms, so that the 20 ms task is allocated to the corresponding GPU for processing. At the moment after that, only one task remains in the task queue, with a time consumption of 15 ms; since the set time consumption must not exceed the highest time consumption, it is determined to be 15 ms, 14 ms, 13 ms or the like, so that the 15 ms task is allocated to the corresponding GPU for processing.
Preferably, the set time consumption is not less than the second-highest time consumption in the task queue, so that the tasks to be processed in the task queue are selected as target tasks one by one in descending order of time consumption; steps S102-S103 are executed for each selected target task, thereby completing the processing of every task to be processed in the task queue. A sketch of this threshold rule follows.
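Purely as an illustration, the following Python sketch shows one way the set time consumption could be derived from the queue contents as described above; the function name and the representation of tasks by their expected cost in milliseconds are assumptions, not part of the patent.

```python
# A minimal sketch: the threshold must not exceed the highest expected cost,
# and preferably is not below the second-highest, so the most expensive
# pending task is always the one that qualifies as the target task.
def set_time_consumption(expected_costs_ms):
    costs = sorted(expected_costs_ms, reverse=True)
    if len(costs) == 1:
        return costs[0]   # only one task left: the threshold equals its cost
    return costs[1]       # any value in [second-highest, highest] works

print(set_time_consumption([30, 20, 15]))  # 20 -> the 30 ms task qualifies
print(set_time_consumption([20, 15]))      # 15 -> the 20 ms task qualifies
print(set_time_consumption([15]))          # 15 -> the last task qualifies
```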
In this embodiment of the specification, before step S101 is executed, all current tasks to be processed may be acquired, sorted by time consumption and then placed into the task queue; in this way, when step S101 is executed, the target task can be acquired more quickly from the time-sorted tasks.
In this embodiment of the specification, when all current tasks to be processed are acquired, if a plurality of requests are received at the same time, the computation tasks of the deep learning maps corresponding to the plurality of requests are acquired and taken together as all the tasks to be processed. Specifically, for each request: if the request corresponds to the computation task of only one deep learning map, that computation task is added directly to the collection; if the request corresponds to the computation tasks of multiple deep learning maps, all of those computation tasks are added. After the computation tasks corresponding to every request have been added, all the tasks to be processed have been acquired; a sketch of this gathering step follows.
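As an illustration of this gathering step only, the sketch below flattens per-request graph computations into one pending-task list; the Request class and its graph_tasks field are hypothetical names introduced here, not taken from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical structure: each request carries the computation task(s) of
# one or more deep learning maps; every map computation is one pending task.
@dataclass
class Request:
    graph_tasks: list = field(default_factory=list)  # (name, expected ms)

def collect_pending_tasks(requests):
    pending = []
    for request in requests:
        pending.extend(request.graph_tasks)  # one or many tasks per request
    return pending

reqs = [Request([("A1", 12)]), Request([("A2", 10), ("A3", 20), ("A4", 6)])]
print(collect_pending_tasks(reqs))  # all four tasks, still unsorted
```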
Specifically, when all current tasks to be processed are acquired, only one request may be received at a given time; in response to the currently received request, the computation tasks of the plurality of deep learning maps corresponding to that request are acquired and taken as all the tasks to be processed.
Specifically, when a request is received, it comprises the computation of a plurality of deep learning maps (i.e., feature maps), and the computation of each deep learning map is taken as one computation task. Different deep learning maps may have different computation time consumption because of their different model structures, so after the computation tasks of the plurality of deep learning maps are acquired, the expected time consumption of each computation task is calculated, and the computation tasks are then sorted by expected time consumption and placed into the task queue as all the tasks to be processed.
After all the tasks to be processed are acquired, the expected time consumption of each task can be obtained, and all the tasks can then be placed into the task queue in descending order of expected time consumption. In this way, all the tasks in the task queue are ordered from largest to smallest expected time consumption, so when step S101 is executed, the task with the highest (or next-highest) time consumption can directly be taken as the target task according to the set time consumption, shortening the time needed to acquire the target task and improving acquisition efficiency.
Specifically, when placing all the tasks to be processed into the task queue in descending order of expected time consumption, the tasks may first be sorted in descending order of expected time consumption and then placed into the queue; alternatively, the tasks may be placed directly into the queue in descending order of expected time consumption without a separate sorting step. This specification does not limit the choice.
For example, suppose one request is received and it comprises the computation of 4 deep learning maps, whose computation tasks are A1, A2, A3 and A4 with expected time consumption of 12 ms, 10 ms, 20 ms and 6 ms respectively. Sorting A1-A4 in descending order of expected time consumption yields the sequence {A3, A1, A2, A4}, which is then placed into the task queue by calling the put method once per task, so that A3, A1, A2 and A4 enter the queue in order. If the set time consumption is 18 ms, then because 20 ms > 18 ms, A3 is selected as the target task to be processed; a sketch of this step follows.
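A minimal sketch of the sort-and-enqueue step, assuming tasks are (name, expected ms) tuples (the patent names only the put method; everything else here is illustrative):

```python
import queue

task_queue = queue.Queue()

# (task name, expected time consumption in ms), per the example above
tasks = [("A1", 12), ("A2", 10), ("A3", 20), ("A4", 6)]

# sort in descending order of expected time consumption, then call put()
# once per task so the queue holds A3, A1, A2, A4 in that order
for task in sorted(tasks, key=lambda t: t[1], reverse=True):
    task_queue.put(task)

head = task_queue.get()
print(head)  # ('A3', 20): with a set time consumption of 18 ms, 20 > 18,
             # so A3 is the first target task to be dispatched
```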
After the target to-be-processed task is determined, step S102 is performed.
In step S102, after the target task to be processed is determined, the operation of selecting a target load gate is triggered automatically. At this time, the load value corresponding to each of the N load gates is acquired, where the load value is determined according to the number of currently waiting threads of the load gate; one load gate whose load is lower than the set load is then selected from the N load gates as the target load gate according to the load value of each gate. The number of currently waiting threads refers to the number of threads currently in the waiting state; for example, if 3 threads of the GPU corresponding to a certain load gate are in the waiting state, the number of currently waiting threads of that load gate is 3.
In the embodiments of this specification, N is an integer greater than 1, so the value of N may be 2, 3, 5, 8 and the like; this specification does not limit it specifically.
In the embodiments of this specification, the set load may be determined according to the load value of each of the N load gates. For example, the set load may be not greater than the second-lowest or third-lowest load value of the N load gates and greater than the lowest load value: if the load values of 3 load gates are -5, -2 and -8 in sequence, the set load may be -6, -7 and the like. Of course, the set load may also be set according to actual conditions, and this specification does not limit it specifically.
Specifically, the load gate with the lowest or next-lowest load may be selected from the N load gates as the target load gate; that is, the load gate with the smallest or next-smallest load value may be selected.
Preferably, the set load is smaller than the next-lowest load value of the N load gates and larger than the lowest load value, so that the target task to be processed is allocated to the load gate with the lowest load. Combined with the set time consumption being not less than the second-highest time consumption in the task queue, the target task determined each time has the highest time consumption and is allocated to the load gate with the lowest load. This further raises the probability that high-time-consumption tasks are scheduled and computed first, further shortens the total time required to complete all tasks, and effectively improves task processing efficiency.
Specifically, a gate_states array is created in advance for each of the N GPUs, and the N gate_states arrays serve as the N load gates. For each gate_states array, the corresponding load value is initialized to 0, and whenever a thread of the GPU corresponding to the gate enters the idle state, the load value of that gate is decreased by one (if 3 threads are in the waiting state, the gate's load value is -3); in this way the load value of each of the N load gates is obtained. After the load values are obtained, one load gate whose load value is lower than the set load is selected from the N load gates as the target load gate. Of course, an idle thread could instead add or subtract a different amount from the gate's load value, for example adding 1, adding 2 or subtracting 3 per idle thread; this specification does not limit the convention. In the following, the case in which each idle thread decreases the gate's load value by one is taken as the example, and a sketch of this bookkeeping follows.
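The Python sketch below illustrates this load-gate bookkeeping under stated assumptions: the class name, field names and the use of a condition variable are illustrative choices; only the convention from the text (load initialized to 0, decremented by one per idle thread) is taken from the patent.

```python
import threading

# A minimal sketch of one load gate per GPU: the load value starts at 0 and
# drops by one for every worker thread of the corresponding GPU that is
# currently waiting for work, so a GPU with 3 idle threads has load -3.
class LoadGate:
    def __init__(self, gpu_id):
        self.gpu_id = gpu_id
        self.load = 0                       # 0 = no idle threads
        self.cond = threading.Condition()   # lets the scheduler wake a worker

    def thread_waiting(self):               # a worker thread became idle
        with self.cond:
            self.load -= 1

    def thread_resumed(self):               # a worker thread picked up a task
        with self.cond:
            self.load += 1

gates = [LoadGate(i) for i in range(4)]     # N = 4 GPUs, one gate each
```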
Taking a deep learning platform with 4 GPUs as an example, the GPUs are B1, B2, B3 and B4 in sequence, and a gate_states array is created for each of them, giving G1, G2, G3 and G4 in sequence. Suppose that, according to the number of threads currently in the waiting state on each GPU, the load values of G1, G2, G3 and G4 are -3, -4, -6 and -5 in sequence; if the set load is -5.5, then because -6 < -5.5, G3 is selected as the target load gate.
In the practical application process, after the put method has been called to place all the sorted tasks to be processed into the task queue, the select_and_signal method is called; calling select_and_signal selects the load gate with the lowest load and notifies a thread corresponding to that load gate.
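A possible shape for select_and_signal, building on the hypothetical LoadGate sketch above; the patent names the method but not its implementation, so this body is an assumption.

```python
# A sketch only: pick the gate with the smallest load value (i.e. the GPU
# with the most idle threads) and wake one worker thread waiting on it.
def select_and_signal(gates):
    target = min(gates, key=lambda g: g.load)
    with target.cond:
        target.cond.notify()   # notify one thread of the corresponding GPU
    return target
```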
After the target load gate is determined, step S103 is performed.
In step S103, after the target load gate is determined, the target GPU corresponding to it can be obtained, since the N load gates correspond one-to-one to the N GPUs; the target task to be processed is then allocated to an idle thread in the target GPU for processing, and the current load of the target load gate is adjusted.
For example, if G3 is selected as the target load gate, then because G3 corresponds to B3, the target GPU is determined to be B3 and a thread in the waiting state in B3 is notified; the target task to be processed is allocated to that idle thread. Since one idle thread of B3 has resumed work, the current load value of G3 becomes -6 + 1 = -5, i.e., it is adjusted from -6 to -5. In this way, the target tasks in the task queue whose time consumption exceeds the set time consumption are allocated to lightly loaded target GPUs, so that high-time-consumption tasks are scheduled and computed first as far as possible, the total time required to complete all tasks is shortened, and task processing efficiency is effectively improved. A sketch of this dispatch step follows.
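Putting the pieces together under the same assumptions, here is a sketch of one dispatch iteration covering S101-S103; the dispatch function and the way the woken worker receives its task are illustrative, not taken from the patent.

```python
# A sketch of one scheduling iteration, reusing the task_queue, gates and
# select_and_signal sketches above.
def dispatch(task_queue, gates):
    task = task_queue.get()           # S101: highest-time-consumption task
    gate = select_and_signal(gates)   # S102: lowest-load gate, worker woken
    gate.thread_resumed()             # S103: one idle thread resumes, e.g.
    return task, gate.gpu_id          #       G3's load goes from -6 to -5

# Draining the queue repeats S101-S103 until every pending task is placed:
# while not task_queue.empty():
#     task, gpu_id = dispatch(task_queue, gates)
```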
In the embodiments of this specification, for the remaining tasks to be processed in the task queue other than the target task, steps S101-S103 are executed for each remaining task. The set time consumption must then be determined according to the expected time consumption of the tasks currently in the queue, so both the set time consumption and the set load change with the actual situation; as a result, the tasks with low time consumption are allocated relative to the GPUs with high load, and the load-balancing accuracy of the N GPUs can be improved.
Based on the above technical solution, a target task to be processed whose time consumption is not lower than the set time consumption is acquired from the task queue; one load gate whose load is lower than the set load is selected from the N load gates as the target load gate; and the target task is allocated to the target GPU corresponding to the target load gate for processing. Since the time consumption of the target task is not lower than the set time consumption, the load of the target load gate is lower than the set load, and the N load gates correspond one-to-one to the N GPUs, high-time-consumption tasks are allocated to lightly loaded target GPUs for processing. This improves the accuracy of load balancing and, on that basis, the balance of the total task load; because high-time-consumption tasks are scheduled and computed first, the total time taken to complete all tasks is effectively shortened and task processing efficiency is improved.
The embodiment of the present invention also provides a task scheduling apparatus for a deep learning platform, referring to fig. 2, where the apparatus includes:
a target task acquiring unit 201, configured to acquire, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
a target load gate selecting unit 202, configured to select, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and a task processing unit 203, configured to allocate the target task to be processed to the target GPU corresponding to the target load gate for processing.
In an alternative embodiment, the apparatus further comprises:
a task-to-be-processed acquiring unit, configured to acquire all current tasks to be processed;
and a sorting unit, configured to sort all the tasks to be processed by time consumption and then place them into the task queue.
In an optional implementation manner, the task-to-be-processed acquiring unit is configured to, in response to a currently received request, acquire the computation tasks of a plurality of deep learning maps corresponding to the request, and take the computation tasks of the plurality of deep learning maps as all the tasks to be processed.
In an optional implementation manner, the sorting unit is configured to obtain the expected time consumption of each of all the tasks to be processed, and to place all the tasks to be processed into the task queue in descending order of expected time consumption.
In an optional implementation manner, the sorting unit is configured to sort all the tasks to be processed in descending order of expected time consumption, and to place all the sorted tasks into the task queue.
In an optional implementation manner, the target load gate selecting unit 202 is configured to obtain a load value corresponding to each of the N load gates, where the load value is determined according to the number of currently waiting threads of the load gate, and to select, from the N load gates, one load gate whose load is lower than the set load as the target load gate according to the load value corresponding to each load gate.
In an alternative embodiment, the target load gate selecting unit 202 is configured to select a load gate with the lowest load from the N load gates as the target load gate.
In an optional embodiment, the task processing unit 203 is configured to allocate the target task to be processed to an idle thread in the target GPU for processing, and adjust the current load of the target load gate.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 3 is a block diagram illustrating an electronic device 800 for a task scheduling method for a deep learning platform, according to an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 can detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; it can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of task scheduling for a deep learning platform, the method comprising:
acquiring, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
selecting, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and allocating the target task to be processed to the target GPU corresponding to the target load gate for processing.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (14)

1. A task scheduling method for a deep learning platform, the method comprising:
acquiring, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
selecting, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and allocating the target task to be processed to the target GPU corresponding to the target load gate for processing.
2. The method of claim 1, wherein before acquiring, from the task queue, the target task to be processed whose time consumption is not lower than the set time consumption, the method further comprises:
acquiring all current tasks to be processed;
and sorting all the tasks to be processed by time consumption and then placing them into the task queue.
3. The method of claim 2, wherein the obtaining all current tasks to be processed comprises:
and responding to a currently received request, acquiring the calculation tasks of a plurality of deep learning maps corresponding to the request, and taking the calculation tasks of the plurality of deep learning maps as all the tasks to be processed.
4. The method of claim 2, wherein the sorting all the tasks to be processed by time consumption and then placing them into the task queue comprises:
acquiring the expected time consumption of each of all the tasks to be processed;
and placing all the tasks to be processed into the task queue in descending order of expected time consumption.
5. The method of claim 4, wherein the placing all the tasks to be processed into the task queue in descending order of expected time consumption comprises:
sorting all the tasks to be processed in descending order of expected time consumption;
and placing all the sorted tasks to be processed into the task queue.
6. The method according to any one of claims 1 to 5, wherein the selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate comprises:
acquiring a load value corresponding to each of the N load gates, wherein the load value is determined according to the number of currently waiting threads of the load gate;
and selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate according to the load value corresponding to each load gate.
7. The method of claim 6, wherein the selecting, from the N load gates, one load gate whose load is lower than the set load as the target load gate comprises:
selecting the load gate with the lowest load from the N load gates as the target load gate.
8. The method of claim 7, wherein the allocating the target task to be processed to the target GPU corresponding to the target load gate for processing comprises:
allocating the target task to be processed to an idle thread in the target GPU for processing, and adjusting the current load of the target load gate.
9. A task scheduling device for a deep learning platform, the device comprising:
a target task acquiring unit, configured to acquire, from a task queue, a target task to be processed whose time consumption is not lower than a set time consumption;
a target load gate selecting unit, configured to select, from N load gates, one load gate whose load is lower than a set load as a target load gate, wherein the N load gates correspond one-to-one to the N GPUs included in the deep learning platform, and N is an integer greater than 1;
and a task processing unit, configured to allocate the target task to be processed to the target GPU corresponding to the target load gate for processing.
10. The device of claim 9, further comprising:
a task-to-be-processed acquiring unit, configured to acquire all current tasks to be processed;
and a sorting unit, configured to sort all the tasks to be processed by time consumption and then place them into the task queue.
11. The device according to claim 10, wherein the task-to-be-processed acquiring unit is configured to, in response to a currently received request, acquire the computation tasks of a plurality of deep learning maps corresponding to the request, and take the computation tasks of the plurality of deep learning maps as all the tasks to be processed.
12. The device according to any one of claims 9 to 11, wherein the target load gate selecting unit is configured to acquire a load value corresponding to each of the N load gates, where the load value is determined according to the number of currently waiting threads of the load gate, and to select, from the N load gates, one load gate whose load is lower than the set load as the target load gate according to the load value corresponding to each load gate.
13. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including operating instructions for performing the method according to any one of claims 1 to 8.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps corresponding to the method according to any one of claims 1 to 8.
CN202110954074.9A 2021-08-19 2021-08-19 Task scheduling method and device for deep learning platform and electronic equipment Pending CN115712489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110954074.9A CN115712489A (en) 2021-08-19 2021-08-19 Task scheduling method and device for deep learning platform and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110954074.9A CN115712489A (en) 2021-08-19 2021-08-19 Task scheduling method and device for deep learning platform and electronic equipment

Publications (1)

Publication Number Publication Date
CN115712489A 2023-02-24

Family

ID=85230065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110954074.9A Pending CN115712489A (en) 2021-08-19 2021-08-19 Task scheduling method and device for deep learning platform and electronic equipment

Country Status (1)

Country Link
CN (1) CN115712489A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination