CN110096356B - Resource scheduling method, device, electronic equipment and storage medium - Google Patents

Resource scheduling method, device, electronic equipment and storage medium

Info

Publication number
CN110096356B
Authority
CN
China
Prior art keywords
resource
communication time
modules
module
resource modules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910223115.XA
Other languages
Chinese (zh)
Other versions
CN110096356A (en)
Inventor
程京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910223115.XA
Publication of CN110096356A
Application granted
Publication of CN110096356B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses a resource scheduling method, a resource scheduling device, an electronic device and a storage medium. The resource scheduling method comprises the following steps: acquiring initial communication time parameters among a plurality of resource modules and the occupancy rate of each resource module; obtaining current communication time parameters among the resource modules according to the initial communication time parameters among the resource modules and the occupancy rates of the resource modules; and determining, according to the current communication time parameters among the resource modules, a resource module loop containing a preset number of resource modules, wherein the sum of the current communication time parameters between adjacent resource modules in the resource module loop is minimal. With this technical scheme, resource scheduling for the TensorFlow Allreduce framework can be optimized: resource modules are allocated automatically and an efficient communication loop is determined, so that the communication time in the training process is reduced and the training speed of deep learning is improved.

Description

Resource scheduling method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a resource scheduling method and apparatus, an electronic device, and a storage medium.
Background
TensorFlow is a symbolic mathematics system based on dataflow programming and is widely used in scenarios such as image classification, audio processing, recommendation systems, and natural language processing. TensorFlow is a very flexible framework that can run on one or more CPUs and GPUs of a personal computer or server, and even on mobile devices.
In deep learning frameworks, multiple machines or multiple cards are often needed to accelerate model training. Ring-Allreduce is a distributed deep learning architecture that has been integrated into TensorFlow 1.11 and later versions, where it serves as one of the distributed calling modes of the Estimator high-level API. In the Ring-allreduce architecture, every device is a worker and the workers form a ring, so no central node is needed to aggregate the gradients calculated by all workers. In an iteration, each worker completes its mini-batch training, computes a gradient, and passes the gradient to the next worker in the ring, while also receiving gradients from the previous worker. For a ring containing N workers, each worker can update the model parameters after receiving the gradients of the other N-1 workers.
Compared with the conventional PS (parameter server) architecture, the Ring-allreduce architecture is bandwidth-optimized because the bandwidth of every node in the cluster is fully utilized. In addition, gradients in deep learning training are computed with the BP (back propagation) algorithm, in which the gradients of the later layers are computed first and the gradients of the earlier layers become available afterwards. The Ring-allreduce architecture can exploit this property by transmitting the gradients of the later layers while the gradients of the earlier layers are still being computed, thereby reducing training time.
To perform deep learning training with the Ring-allreduce architecture, how to schedule resources so as to form a ring containing multiple workers is a technical problem to be solved by those skilled in the art.
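The ring exchange described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each gradient is a single number, all transfers in one pass happen at the same time, and the function name ring_allreduce_sum is hypothetical. Production implementations typically split tensors into chunks, which is not modelled here.

```python
def ring_allreduce_sum(gradients):
    """Simulate the simplified ring exchange described in the text.

    gradients: one number per worker (a stand-in for a gradient tensor).
    Returns the per-worker totals after N-1 passes around the ring.
    """
    n = len(gradients)
    totals = list(gradients)      # what each worker has accumulated so far
    in_flight = list(gradients)   # the value each worker forwards next
    for _ in range(n - 1):
        # Worker i receives from worker (i - 1) % n; all transfers are parallel.
        received = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [t + r for t, r in zip(totals, received)]
        in_flight = received      # forward what was just received
    return totals


result = ring_allreduce_sum([1.0, 2.0, 3.0, 4.0])
```

After N-1 passes every worker holds the sum of all N gradients without any central aggregation node, which is why the order of the workers in the ring (and the communication time between neighbours) matters.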
Disclosure of Invention
In order to overcome the problems in the related art, the present application provides a resource scheduling method, device, electronic device and storage medium.
According to a first aspect of the present application, there is provided a resource scheduling method, which is applied to an electronic device including a plurality of resource modules, the method including:
acquiring initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
obtaining current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
and determining a resource module loop containing a preset number of resource modules according to the current communication time parameters among the resource modules, wherein the sum of the current communication time parameters among the adjacent resource modules in the resource module loop is minimum.
In an optional implementation manner, the electronic device includes a plurality of sub devices, each of the sub devices includes the resource module, and the step of obtaining the initial communication time parameter between the resource modules includes:
and acquiring initial communication time parameters among the resource modules according to the connection mode among the resource modules in the sub-equipment and the network connection type among the sub-equipment.
In an optional implementation manner, the step of obtaining the current communication time parameter between the plurality of resource modules according to the initial communication time parameter between the plurality of resource modules and the occupancy rate of each of the resource modules includes:
judging whether the electronic equipment comprises a first resource module, wherein the occupancy rate of the first resource module is greater than or equal to a first preset threshold value;
if so, adjusting the communication time parameter between each resource module and the first resource module from the initial communication time parameter to be more than a second preset threshold value, and obtaining the current communication time parameter between the resource modules.
In an optional implementation manner, the step of determining a resource module loop including a preset number of resource modules according to a current communication time parameter between the plurality of resource modules includes:
generating an adjacency matrix according to the current communication time parameters among the resource modules;
and solving the adjacency matrix by adopting a dynamic programming algorithm, and determining a resource module loop containing a preset number of resource modules.
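As an illustration of solving the adjacency matrix with dynamic programming, the following sketch applies a Held-Karp style recurrence to every candidate set of modules. The source does not specify which dynamic programming formulation is used, so the choice of Held-Karp, the function names cheapest_cycle and min_ring_dp, and the example matrix are all assumptions made for illustration. The preset number k is assumed to be at least 2.

```python
from itertools import combinations
import math


def cheapest_cycle(cost, nodes):
    """Held-Karp DP: cheapest cycle visiting every module in `nodes` once."""
    k = len(nodes)
    start = nodes[0]
    # best[(mask, j)] = (cost of the cheapest path from start to nodes[j]
    # that visits exactly the positions in mask, previous position)
    best = {(1 << j, j): (cost[start][nodes[j]], 0) for j in range(1, k)}
    for size in range(2, k):
        for subset in combinations(range(1, k), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                rest = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(rest, i)][0] + cost[nodes[i]][nodes[j]], i)
                    for i in subset if i != j
                )
    full = (1 << k) - 2  # every position except the start
    total, j = min(
        (best[(full, j)][0] + cost[nodes[j]][start], j) for j in range(1, k)
    )
    ring, mask = [], full
    while j != 0:  # walk the stored predecessors back to the start
        ring.append(nodes[j])
        mask, j = mask & ~(1 << j), best[(mask, j)][1]
    return total, [start] + ring[::-1]


def min_ring_dp(cost, k):
    """Try each k-subset of modules and keep the cheapest cycle."""
    candidates = combinations(range(len(cost)), k)
    return min((cheapest_cycle(cost, c) for c in candidates), key=lambda r: r[0])


INF = math.inf  # hypothetical example: module 4 is treated as unreachable
cost = [
    [0, 1, 2, 2, INF],
    [1, 0, 2, 2, INF],
    [2, 2, 0, 1, INF],
    [2, 2, 1, 0, INF],
    [INF, INF, INF, INF, 0],
]
total, ring = min_ring_dp(cost, 4)
```

The outer loop over subsets keeps the sketch short; it is practical only for small clusters, since the number of subsets grows quickly with the number of modules.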
In an optional implementation manner, the step of determining a resource module loop including a preset number of resource modules according to a current communication time parameter between the plurality of resource modules includes:
determining all loops to be distributed according to the current communication time parameters among the resource modules, wherein each loop to be distributed comprises a preset number of resource modules;
calculating the sum of current communication time parameters between adjacent resource modules in each loop to be distributed;
and determining the loop to be allocated with the minimum sum of the current communication time parameters as the resource module loop.
According to a second aspect of the present application, there is provided a resource scheduling apparatus, the apparatus being applied to an electronic device, the electronic device including a plurality of resource modules, the apparatus including:
a first obtaining module configured to obtain an initial communication time parameter among the plurality of resource modules and an occupancy rate of each of the resource modules;
the second acquisition module is configured to obtain current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
the allocation module is configured to determine a resource module loop including a preset number of resource modules according to the current communication time parameters among the plurality of resource modules, and the sum of the current communication time parameters among adjacent resource modules in the resource module loop is minimum.
In one optional implementation manner, the electronic device includes a plurality of sub-devices, each of the sub-devices includes the resource module, and the first obtaining module is further configured to:
and acquiring initial communication time parameters among the resource modules according to the connection mode among the resource modules in the sub-equipment and the network connection type among the sub-equipment.
In one optional implementation, the second obtaining module is further configured to:
judging whether the electronic equipment comprises a first resource module, wherein the occupancy rate of the first resource module is greater than or equal to a first preset threshold value;
if so, adjusting the communication time parameter between each resource module and the first resource module from the initial communication time parameter to be more than a second preset threshold value, and obtaining the current communication time parameter between the resource modules.
In one optional implementation, the allocation module is further configured to:
generating an adjacency matrix according to the current communication time parameters among the resource modules;
and solving the adjacency matrix by adopting a dynamic programming algorithm, and determining a resource module loop containing a preset number of resource modules.
In one optional implementation, the allocation module is further configured to:
determining all loops to be distributed according to the current communication time parameters among the resource modules, wherein each loop to be distributed comprises a preset number of resource modules;
calculating the sum of current communication time parameters between adjacent resource modules in each loop to be distributed;
and determining the loop to be allocated with the minimum sum of the current communication time parameters as the resource module loop.
According to a third aspect of the present application, there is provided an electronic device comprising:
a resource module;
a memory for storing resource module executable instructions;
wherein the resource module is configured to perform the resource scheduling method according to the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having instructions which, when executed by a resource module of an electronic device, enable the electronic device to perform the resource scheduling method according to the first aspect.
According to a fifth aspect of the present application, there is provided a computer program product, wherein instructions, when executed by a resource module of an electronic device, enable the electronic device to perform the resource scheduling method according to the first aspect.
The technical scheme provided by the application can comprise the following beneficial effects:
by adopting the technical scheme, the resource scheduling of the TensorFlow Allreduce framework can be optimized, the resource modules are automatically allocated, and the optimized efficient communication loop (resource module loop) with the minimum sum of the current communication time parameters is determined, so that the communication time in the training process can be reduced, and the training speed in the deep learning process is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating steps of a resource scheduling method according to the present application.
Fig. 2 is a flowchart illustrating a procedure for obtaining a current communication time parameter according to the present application.
FIG. 3 is a flow chart illustrating one step of determining a resource module loop shown in the present application.
FIG. 4 is a flow chart illustrating another process for determining a resource module loop.
Fig. 5 is a block diagram of a resource scheduling apparatus according to the present application.
Fig. 6 is a schematic structural diagram of an electronic device shown in the present application.
Fig. 7 is a block diagram of an electronic device shown in the present application.
Fig. 8 is a block diagram of an electronic device shown in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
AI (Artificial Intelligence) refers to the intelligence exhibited by machines made by humans. Generally, artificial intelligence refers to technology that realizes human-like intelligence by means of ordinary computer programs.
Machine learning is a discipline that studies how computers can simulate or implement human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
The field of machine learning has progressed continuously since 2006, and artificial intelligence has increasingly entered our daily lives. The techniques used in the AI field depend not only on the parallel processing capacity of cloud computing for big data, but also on deep learning algorithms. Deep learning directly attempts to solve the problem of abstract cognition and has taken artificial intelligence to a new level. Deep learning has attracted attention in academia, and, because of its strong practicality, industry has also begun investing in it on a large scale; many products, such as self-driving cars, smart homes, and even speech recognition on everyday mobile phones, benefit from deep learning.
At present, deep learning has achieved breakthroughs in several major fields: in speech recognition, deep learning replaced the Gaussian Mixture Model (GMM) in the acoustic model with a deep model and reduced the error rate by about 30%; in image recognition, constructing a deep Convolutional Neural Network (CNN) reduced the Top-5 error rate from 26% to 15%, and deepening the network structure further reduced it to 11%; in natural language processing, deep learning achieves results roughly on par with other methods while eliminating complicated feature extraction steps. It can be said that deep learning is, so far, the intelligent learning method closest to the human brain.
With the rapid development of the deep learning field, TensorFlow has become the most popular deep learning project in recent years. Ring-Allreduce is a distributed deep learning architecture that has been integrated into TensorFlow 1.11 and later versions, where it serves as one of the distributed calling modes of the Estimator high-level API.
In TensorFlow's current allreduce distribution, devices can be assigned by manually specifying which devices to use for distributed execution. However, the inventors have found problems with having the user specify devices manually: the algorithm engineer needs to know the GPU status of the physical machine, which raises the cost of learning the underlying layer; the code must be modified whenever devices are specified, and when several people share a server, the idle status of the underlying devices must be checked and the code modified before every run, which reduces code reusability; and improper manual specification (such as a misoperation) can cause problems such as downtime of the physical machine.
In addition, if the user does not specify devices, TensorFlow allocates the available devices in order of device number when the allreduce architecture is called. The inventors have found that scheduling by device number can produce non-optimal schedules, which increases the communication time between devices. Because deep learning involves an extremely large number of communication operations, the increased communication time slows training and lowers engineering efficiency.
In order to solve the above problem, a resource scheduling method provided by an embodiment of the present application is illustrated with reference to fig. 1, and is applied to an electronic device, where the electronic device includes a plurality of resource modules, and the method includes the following steps.
In step S101, an initial communication time parameter between a plurality of resource modules and an occupancy rate of each resource module are obtained.
The resource module may be a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU).
The occupancy rate represents the extent to which the resource module is occupied by running processes or programs: the higher the occupancy rate, the more programs are running on the resource module; the lower the occupancy rate, the fewer programs are running on it.
In practical applications, the occupancy rate may be obtained in various manners, for example, by a resource manager or monitoring software.
The initial communication time parameters between the resource modules may be obtained in various manners, for example, may be determined according to connection types between the resource modules, and a specific implementation manner of obtaining the initial communication time parameters between the resource modules will be described in detail in the following embodiments.
In step S102, a current communication time parameter between the plurality of resource modules is obtained according to the initial communication time parameter between the plurality of resource modules and the occupancy rate of each resource module.
In practical application, there are various ways to determine the current communication time parameters between resource modules according to the occupancy rates of the resource modules and the initial communication time parameters between the resource modules. For example, whether a resource module is occupied or not may be determined according to the occupancy rate of each resource module, and then the communication time parameters between the occupied resource module and each other resource module may be adjusted to obtain the current communication time parameters between the plurality of resource modules; in another implementation manner, an initial state matrix may be determined according to initial communication time parameters between resource modules, then whether the resource modules are occupied is determined according to the occupancy rates of the resource modules, and a matrix element associated with the occupied resource module in the initial state matrix is adjusted to obtain a current state matrix, where the matrix element in the current state matrix is the current communication time parameter between the resource modules. The following embodiments will detail both implementations.
In step S103, a resource module loop including a preset number of resource modules is determined according to the current communication time parameters between the resource modules, and the sum of the current communication time parameters between adjacent resource modules in the resource module loop is the minimum.
There are various ways of determining the resource module loop according to the current communication time parameters between the resource modules. For example, all loops to be allocated that contain the preset number of resource modules may be determined first, the sum of the current communication time parameters of each loop to be allocated is then calculated, and the loop to be allocated with the smallest sum is determined as the resource module loop. Alternatively, the current state matrix (adjacency matrix) formed by the current communication time parameters among the resource modules may be solved with a dynamic programming algorithm to determine the resource module loop with the minimum sum of current communication time parameters. The value of the preset number may be determined according to actual needs, and this embodiment does not limit it.
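The enumeration approach can be sketched as follows. This is a minimal illustration, assuming the current communication time parameters are held in a square list-of-lists matrix; the function name min_ring_bruteforce and the example values are hypothetical and are not taken from the source.

```python
from itertools import combinations, permutations


def min_ring_bruteforce(cost, k):
    """Enumerate every candidate ring of k modules and keep the cheapest."""
    best_total, best_ring = float("inf"), None
    for nodes in combinations(range(len(cost)), k):
        first, rest = nodes[0], nodes[1:]
        # Fixing the first module avoids re-counting rotations of one ring.
        for order in permutations(rest):
            ring = (first,) + order
            # Sum the parameters between adjacent modules, closing the loop.
            total = sum(cost[ring[i]][ring[(i + 1) % k]] for i in range(k))
            if total < best_total:
                best_total, best_ring = total, ring
    return best_total, best_ring


# Hypothetical 4-module example: modules 0-1 and 2-3 are fast pairs.
cost = [
    [0, 1, 2, 2],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]
best_total, best_ring = min_ring_bruteforce(cost, 4)
```

Enumeration is simple to verify but its running time grows factorially with the preset number, which is one reason a dynamic programming solution may be preferred for larger rings.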
In practical application, when K nodes (such as GPUs) need to be scheduled, a GPU loop including K GPUs can be determined, wherein the sum of current communication time parameters between adjacent GPUs in the GPU loop is minimum, and the GPU loop is used as a scheduling resource, so that the most efficient communication loop can be obtained by automatically allocating resource modules.
When the determined resource module loop is used for model training, the occupancy rates of a preset number of resource modules in the resource module loop are increased, when the model training is finished and the resource module resources are released, the corresponding occupancy rates of the resource modules are reduced, and calculation can be performed according to the updated occupancy rates of the resource modules in the subsequent resource scheduling process.
The resource scheduling method provided by this embodiment can optimize resource scheduling for the TensorFlow Allreduce framework, automatically allocate resource modules and determine an optimized, efficient communication loop, thereby reducing the communication time in the training process and improving the training speed of deep learning. Moreover, algorithm engineers do not need to consider the status of the underlying devices or the scheduling strategy, which reduces their workload and lets them concentrate on algorithm research. In addition, no code needs to be modified during resource module allocation, which improves code reusability and avoids problems such as physical machine downtime caused by manual misoperation.
In an implementation manner of obtaining an initial communication time parameter between a plurality of resource modules, an electronic device includes a plurality of sub-devices, each sub-device includes a resource module, and in step S101, the method may specifically include: and obtaining initial communication time parameters among the resource modules according to the connection mode among the resource modules in the sub-equipment and the network connection type among the sub-equipment.
The electronic device shown in fig. 6 is a small cluster, and includes a plurality of sub-devices, namely, a machine 1, a machine 2, and a machine 3, where each sub-device includes a resource module, and the resource module is a GPU. The initial communication time parameter between resource modules inside each sub-device and between sub-devices may be as shown in the topological diagram in fig. 6, and the value on the topological line is the initial communication time parameter (which may or may not represent the real communication time) for representing the speed of communication between devices, and a smaller initial communication time parameter indicates a faster communication speed.
The initial communication time parameters between the resource modules inside a sub-device may be determined according to the connection mode between the resource modules. For example, a keyword describing the connection mode between resource modules may be obtained through the Nvidia GPU driver, and the initial communication time parameter may then be determined from a preset correspondence between keywords and initial communication time parameters. For example, the correspondence between the keywords representing different connection relationships and the initial communication time parameter may be defined as: SOC = 7, SYS = 6, NODE = 5, PHB = 4, PXB = 3, PIX = 2, NV# = 1. This correspondence can be configured according to actual conditions.
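The keyword lookup can be sketched as a small table. The numeric values below are the ones given in the text; the names LINK_COST, NVLINK_COST and initial_cost are hypothetical, and reading "NV#" as "NV followed by a number" (for example NV1 or NV2) is an assumption about the keyword format.

```python
import re

# Keyword -> initial communication time parameter (values from the text).
LINK_COST = {"SOC": 7, "SYS": 6, "NODE": 5, "PHB": 4, "PXB": 3, "PIX": 2}
NVLINK_COST = 1  # assumed to apply to any "NV#" keyword such as NV1 or NV2


def initial_cost(keyword):
    """Map a connection keyword to its initial communication time parameter."""
    key = keyword.strip().upper()
    if re.fullmatch(r"NV\d+", key):
        return NVLINK_COST
    if key not in LINK_COST:
        raise ValueError(f"unknown connection keyword: {keyword!r}")
    return LINK_COST[key]
```

Keeping the correspondence in a plain dictionary makes it easy to reconfigure for a particular cluster, which the text explicitly allows.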
In fig. 6, the communication structures of the GPUs in the machine 1 and the machine 2 are the same, and here, taking the machine 1 as an example for description, the initial communication time parameters between the GPU0 and the GPU1, and between the GPU2 and the GPU3 are both 1, and the communication is fastest; communication between the GPU0 and the GPU2, between the GPU0 and the GPU3, between the GPU1 and the GPU2, and between the GPU1 and the GPU3 all need to pass through a bridge, so the initial communication time parameter is 2, and the communication is slow. In fig. 6, the machine 3 includes two sets of resource modules, the first set of resource modules includes GPU0, GPU1, GPU2, and GPU3, the second set of resource modules includes GPU4, GPU5, GPU6, and GPU7, and the initial communication time parameter between the resource modules in the set is shown as a numerical value on a topological line in fig. 6, and since the connection between the first set of resource modules and the second set of resource modules needs to be transmitted through one NUMA node, the initial communication time parameter between any GPU in the first set of resource modules and any GPU in the second set of resource modules can be set to 4.
The initial communication time parameter between the sub-devices may be determined according to the network connection type between the sub-devices. Specifically, the communication distance between the sub-devices may be obtained according to the network connection type between them, and the initial communication time parameter between the sub-devices may then be determined according to a correspondence between communication distance and initial communication time parameter. The network connection type may include a wireless connection, a wired connection, and the like. For example, the type of network connection between the machines shown in fig. 6 is a wireless connection, and the initial communication time parameter between the machines may be set to 10; that is, the initial communication time parameter between any GPU in machine x and any GPU in machine y may be set to 10, where x and y each take the values 1, 2, and 3 and x is not equal to y.
Then, according to the initial communication time parameters between the resource modules inside the sub-devices and the initial communication time parameters between the sub-devices, the initial communication time parameter between any two GPUs of the whole cluster can be obtained. For example, the initial communication time parameter between the GPU0 in the machine 1 and the GPU2 in the machine 1 is 2, and the initial communication time parameter between the GPU0 in the machine 1 and the GPU1 in the machine 3 is 10; the initial communication time parameter between the GPU0 in machine 3 and the GPU5 in machine 3 is 4; the initial communication time parameter between the GPU2 in machine 1 and the GPU3 in machine 2 is 10.
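Combining the per-machine parameters with the inter-machine parameter can be sketched as below. The function name build_initial_matrix, the 0-based machine and GPU indices, and the list-of-lists representation are assumptions for illustration. Only the four-GPU layout described for machine 1 and machine 2 is reproduced, because the values inside each group of machine 3 appear only in fig. 6.

```python
def build_initial_matrix(machines, inter_machine_cost=10):
    """Build the cluster-wide initial communication time matrix.

    machines: list of square per-machine matrices (lists of lists).
    Returns (labels, cost), where labels[i] = (machine index, local GPU index).
    """
    labels = [(m, g) for m, local in enumerate(machines) for g in range(len(local))]
    cost = []
    for m_a, g_a in labels:
        row = []
        for m_b, g_b in labels:
            if m_a == m_b:
                row.append(machines[m_a][g_a][g_b])  # same machine: local topology
            else:
                row.append(inter_machine_cost)       # different machines: network
        cost.append(row)
    return labels, cost


# Four-GPU layout described for machine 1 and machine 2 (0-based here).
quad = [
    [0, 1, 2, 2],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]
labels, cost = build_initial_matrix([quad, quad])
```

With this layout, the entry for GPU0 and GPU2 of the first machine is 2, and the entry for GPU2 of the first machine and GPU3 of the second machine is 10, matching the examples in the text.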
In an implementation manner of this embodiment, referring to fig. 2, step S102 may further include:
in step S201, it is determined whether the electronic device includes a first resource module, and an occupancy rate of the first resource module is greater than or equal to a first preset threshold.
In this step, it is determined whether the electronic device, that is, the plurality of resource modules, includes an occupied first resource module. Specifically, it can be determined whether the plurality of resource modules includes a resource module whose occupancy rate is greater than or equal to a first preset threshold. The specific value of the first preset threshold may be set according to actual conditions, and may be set to 50%, for example.
When the occupancy rate of a certain resource module is greater than or equal to a first preset threshold value, the resource module can be judged to be occupied, and the electronic equipment comprises the first resource module; when the occupancy rate of a certain resource module is smaller than the first preset threshold value, the resource module can be judged to be unoccupied and to be in a current available state; when all the resource modules are in the current available state, the first resource module is not included in the electronic device.
In step S202, if yes, the communication time parameter between each resource module and the first resource module is adjusted from the initial communication time parameter to be greater than or equal to a second preset threshold, so as to obtain the current communication time parameter between the plurality of resource modules.
Specifically, when the electronic device includes an occupied resource module, that is, a first resource module, the communication time parameter between each resource module (whether in the occupied state or in the current available state) and the first resource module may be adjusted, and the initial communication time parameter is adjusted to be greater than or equal to a second preset threshold value, so as to obtain the current communication time parameter between the plurality of resource modules. The second preset threshold may be set according to an actual situation, and the specific value is not limited in the present application, for example, the current communication time parameter between each resource module and the first resource module may be set to infinity.
When the occupied resource of the first resource module is released, the current communication time parameter (infinity) between the resource module in the current available state and the released first resource module can be recovered to the initial communication time parameter, and the current communication time parameters (infinity) between the other occupied resource modules and the released first resource module are unchanged.
The initial communication time parameter between the resource modules in the current available state is not adjusted, that is, the current communication time parameter between the two resource modules in the current available state is still the initial communication time parameter.
In another implementation manner of this embodiment, step S102 may include: determining an initial state matrix according to the initial communication time parameters among the plurality of resource modules; and, when the electronic device includes a first resource module whose occupancy rate is greater than or equal to the first preset threshold, adjusting the matrix elements in the initial state matrix that are associated with the first resource module to obtain a current state matrix, wherein each matrix element in the current state matrix is a current communication time parameter between resource modules.
Specifically, the initial state matrix is the weighted adjacency matrix of an undirected graph, in which the weights represent the initial communication time parameters between the associated resource modules. The matrix element in the mth row and the nth column of the initial state matrix is the initial communication time parameter between the mth resource module and the nth resource module in the cluster.
The process of determining whether the electronic device includes the first resource module may refer to the description in step S201.
When the electronic device includes an occupied resource module, i.e., a first resource module, a matrix element associated with the first resource module may be adjusted, and a value is adjusted from an initial communication time parameter to a current communication time parameter (e.g., infinity), so as to obtain a current state matrix. For example, when the number of the occupied resource module is k, the elements on the k-th row and the elements on the k-th column in the current state matrix are both infinite.
When resource modules in the current available state, for example, two resource modules numbered i and j are both in the current available state, the matrix element value in the ith row and the jth column in the current state matrix is still the initial communication time parameter.
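The adjustment and release behaviour described above can be sketched as a function that rebuilds the current state matrix from the initial state matrix and the latest occupancy rates. This is a sketch under stated assumptions: the name `current_parameters`, the 3x3 example values, and the occupancy figures are hypothetical, while the 50% threshold and the use of infinity as the adjusted value follow the examples given in the text.

```python
import math

def current_parameters(initial, occupancy, first_threshold=0.5, adjusted=math.inf):
    """Return the current state matrix.

    Row k and column k are set to `adjusted` for every module k whose
    occupancy rate is at or above `first_threshold`; all other entries
    keep their initial communication time parameter.
    """
    occupied = {k for k, rate in enumerate(occupancy) if rate >= first_threshold}
    n = len(initial)
    return [[adjusted if (i in occupied or j in occupied) else initial[i][j]
             for j in range(n)]
            for i in range(n)]

# Hypothetical 3-module initial state matrix (diagonal 0 is an assumption).
initial = [[0, 1, 2],
           [1, 0, 2],
           [2, 2, 0]]

# Modules 0 and 2 are occupied, module 1 is available.
busy = current_parameters(initial, occupancy=[0.9, 0.1, 0.7])

# Module 0 is released while module 2 stays occupied.
after_release = current_parameters(initial, occupancy=[0.0, 0.1, 0.7])
```

Because the matrix is recomputed from the initial values each time, releasing module 0 restores its entry with the available module 1 to the initial parameter, while its entry with the still-occupied module 2 remains infinite, matching the release rule in the text.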
In an implementation manner of this embodiment, referring to fig. 3, in step S103, the method may further include:
in step S301: and generating an adjacency matrix according to the current communication time parameters among the resource modules.
Specifically, the adjacency matrix describes a weighted undirected graph, in which the weights represent the current communication time parameters between the associated resource modules. The matrix element in the mth row and the nth column of the adjacency matrix is the current communication time parameter between the mth resource module and the nth resource module in the cluster. The adjacency matrix may be the same as the current state matrix described previously.
When the number of an occupied resource module is k, since the current communication time parameter between the kth resource module and any other resource module (whether occupied or currently available) is, for example, infinity, the elements in the kth row and the elements in the kth column of the adjacency matrix are all infinite. When the resource module numbered i and the resource module numbered j are both in the current available state, the current communication time parameter between the ith resource module and the jth resource module is the initial communication time parameter, so the matrix element in the ith row and the jth column of the adjacency matrix is the initial communication time parameter between the ith resource module and the jth resource module.
Wherein m, n, i, j, and k are numbers of resource modules in the cluster, for example, numbers of GPUs 0-3 in machine 1 in fig. 6 may be 0-3 in sequence; the numbers of the GPUs 0-3 in the machine 2 can be 4-7 in sequence; the numbering of GPUs 0-7 in machine 3 may be 8-15 in sequence.
In step S302: and solving the adjacency matrix by adopting a dynamic programming algorithm, and determining a resource module loop containing a preset number of resource modules.
Specifically, a recursive method may be adopted to traverse the elements of the adjacency matrix depth-first, solving for the K resource modules with the smallest sum of current communication time parameters between adjacent resource modules, so as to obtain the resource module loop.
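One way to picture the recursive traversal is the sketch below. It is a plain depth-first search with pruning, which is not necessarily the dynamic programming formulation the embodiment has in mind; the function name `best_loop` is hypothetical, and the example matrix uses the machine 1 values from fig. 6 with an assumed 0 on the diagonal.

```python
import math

def best_loop(matrix, k):
    """Depth-first search for the k-module loop with the smallest sum.

    Returns (loop, total); loop is None when every candidate loop has an
    infinite sum, for example because it would need an occupied module.
    """
    n = len(matrix)
    best = {"cost": math.inf, "loop": None}

    def extend(path, cost):
        if cost >= best["cost"]:
            return  # prune: this partial path cannot beat the best loop
        if len(path) == k:
            total = cost + matrix[path[-1]][path[0]]  # close the loop
            if total < best["cost"]:
                best["cost"], best["loop"] = total, tuple(path)
            return
        # Only visit modules numbered above the start, so each loop is
        # rooted at its smallest module and not searched from every start.
        for nxt in range(path[0] + 1, n):
            if nxt not in path:
                extend(path + [nxt], cost + matrix[path[-1]][nxt])

    for start in range(n):
        extend([start], 0)
    return best["loop"], best["cost"]

# Machine 1 of fig. 6: pairs (0,1) and (2,3) cost 1, bridged pairs cost 2.
machine1 = [[0, 1, 2, 2],
            [1, 0, 2, 2],
            [2, 2, 0, 1],
            [2, 2, 1, 0]]
print(best_loop(machine1, 4))

# Same machine with module 3 occupied: row 3 and column 3 are infinite.
inf = math.inf
occupied3 = [[0, 1, 2, inf],
             [1, 0, 2, inf],
             [2, 2, 0, inf],
             [inf, inf, inf, inf]]
print(best_loop(occupied3, 4))
```

In the first call the search settles on the loop 0-1-2-3 with a sum of 1 + 2 + 1 + 2 = 6. In the second call every 4-module loop must pass through the occupied module, so no loop is returned.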
In another implementation manner of this embodiment, referring to fig. 4, in step S103, the method may further include:
in step S401, all to-be-allocated loops are determined according to the current communication time parameter between the plurality of resource modules, and each to-be-allocated loop includes a preset number of resource modules.
Specifically, the Tarjan algorithm can be applied to the adjacency matrix to solve for all loops to be allocated (each including K resource modules), and a Set is used to remove repeated loops to be allocated, so that a set of loops to be allocated, each containing K nodes, is obtained.
A Set is a type of collection that admits no repeated values, so repeated loops to be allocated can be removed by using a Set. There are two main implementations of Set: TreeSet and HashSet.
In step S402, the sum of the current communication time parameters between the adjacent resource modules in each loop to be allocated is calculated.
Specifically, the sum of the current communication time parameters between adjacent resource modules is solved for each loop to be allocated in the set. For example, when a loop to be allocated includes resource modules a-b-c-d connected end to end, the sum of the current communication time parameters of the loop to be allocated is $t_{ab} + t_{bc} + t_{cd} + t_{da}$, where $t_{ab}$, $t_{bc}$, $t_{cd}$, and $t_{da}$ are the current communication time parameters between a and b, between b and c, between c and d, and between d and a, respectively.
It can be understood that, when the loop to be allocated includes occupied resource modules, the sum of the current communication time parameters of the loop to be allocated is infinite.
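The four-module example generalizes directly to a loop of K resource modules. The symbols $T$ and $v_i$ below are introduced only for illustration and do not appear in the original text. For a loop to be allocated that visits modules $v_1, v_2, \dots, v_K$ and returns to $v_1$, the sum is

$$T = \sum_{i=1}^{K} t_{v_i v_{i+1}}, \qquad v_{K+1} = v_1 .$$

If any module on the loop is occupied, at least one term $t_{v_i v_{i+1}}$ is infinite, and therefore $T = \infty$. As a worked instance using the machine 1 values of fig. 6, the loop GPU0-GPU1-GPU2-GPU3 gives $T = t_{01} + t_{12} + t_{23} + t_{30} = 1 + 2 + 1 + 2 = 6$ when all four GPUs are available.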
In step S403, the loop to be allocated with the minimum sum of the current communication time parameters is determined as the resource module loop.
Specifically, the to-be-allocated loops may be sorted in the order from small to large according to the sum of the current communication time parameters, and the to-be-allocated loop with the smallest sum of the current communication time parameters is determined as the resource module loop.
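Steps S401 to S403 can be sketched as follows. Note the substitution: the loops are enumerated by brute force with `itertools.permutations` rather than with the Tarjan algorithm named in the text, which is only practical for small clusters. The helper names `canonical`, `loop_sum`, and `pick_loop` are hypothetical, a Python `set` stands in for the Set used to drop repeated loops, and the code assumes K does not exceed the number of modules.

```python
from itertools import permutations

def canonical(loop):
    """Normalize a loop so rotations and reversals compare equal."""
    i = loop.index(min(loop))
    rotated = loop[i:] + loop[:i]            # smallest module first
    mirrored = rotated[:1] + rotated[:0:-1]  # same loop, opposite direction
    return min(rotated, mirrored)

def loop_sum(matrix, loop):
    """Sum of parameters between adjacent modules, including the closing edge."""
    return sum(matrix[a][b] for a, b in zip(loop, loop[1:] + loop[:1]))

def pick_loop(matrix, k):
    """Return (loop, sum, number of distinct candidate loops)."""
    # S401: all loops of k modules, with repeated loops removed by the set.
    candidates = {canonical(p) for p in permutations(range(len(matrix)), k)}
    # S402 and S403: compute each sum, sort ascending, take the smallest.
    ranked = sorted(candidates, key=lambda loop: (loop_sum(matrix, loop), loop))
    return ranked[0], loop_sum(matrix, ranked[0]), len(candidates)

# Machine 1 of fig. 6 (diagonal 0 is an assumption).
machine1 = [[0, 1, 2, 2],
            [1, 0, 2, 2],
            [2, 2, 0, 1],
            [2, 2, 1, 0]]
result = pick_loop(machine1, 4)
print(result)
```

For four modules and K = 4, the 24 permutations collapse to 3 distinct loops once rotations and reversals are removed. A caller should still check whether the returned sum is infinite, which indicates that no loop avoiding occupied modules exists.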
According to the resource scheduling method provided by this embodiment, the GPU topology and an optimal selection algorithm are used to automatically allocate resource modules: the resource module loop with the minimum sum of current communication time parameters is solved for, the resource modules are scheduled, and the scheduling strategy of the underlying TensorFlow devices is optimized, so that the communication time overhead among devices is minimized when training with the TensorFlow Allreduce framework, and the training speed is increased. Meanwhile, algorithm engineers do not need to consider the scheduling strategy, which reduces the developers' workload and lets them concentrate on researching the algorithm.
Fig. 5 is a block diagram of a resource scheduling apparatus according to the present application. Referring to fig. 5, the apparatus is applied to an electronic device including a plurality of resource modules, and may include:
a first obtaining module 51 configured to obtain an initial communication time parameter between the plurality of resource modules and an occupancy rate of each of the resource modules;
a second obtaining module 52, configured to obtain a current communication time parameter between the plurality of resource modules according to the initial communication time parameter between the plurality of resource modules and the occupancy rate of each of the resource modules;
the allocating module 53 is configured to determine a resource module loop including a preset number of resource modules according to the current communication time parameters among the plurality of resource modules, where a sum of the current communication time parameters among adjacent resource modules in the resource module loop is minimum.
The resource module may be a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU).
The occupancy rate represents the extent to which processes or programs running on the resource module occupy it: the higher the occupancy rate, the more programs are running on the resource module; the lower the occupancy rate, the fewer programs are running on it.
In practical applications, the first obtaining module 51 may obtain the occupancy rate in various ways, for example, by a resource manager or monitoring software.
The initial communication time parameters between the resource modules may be obtained by the first obtaining module 51 in various ways, for example, may be determined according to the connection types between the resource modules, and the specific implementation manner of obtaining the initial communication time parameters between the resource modules will be described in detail in the following embodiments.
In practical applications, the second obtaining module 52 has a plurality of implementation manners for determining the current communication time parameter between the resource modules according to the occupancy rates of the resource modules and the initial communication time parameter between the resource modules. For example, whether a resource module is occupied or not may be determined according to the occupancy rate of each resource module, and then the communication time parameters between the occupied resource module and each other resource module may be adjusted to obtain the current communication time parameters between the plurality of resource modules; in another implementation manner, an initial state matrix may be determined according to initial communication time parameters between resource modules, then whether the resource modules are occupied is determined according to the occupancy rates of the resource modules, and a matrix element associated with the occupied resource module in the initial state matrix is adjusted to obtain a current state matrix, where the matrix element in the current state matrix is the current communication time parameter between the resource modules. The following embodiments will describe these two implementations in detail.
The allocating module 53 may determine the resource module loop according to the current communication time parameters between the resource modules in various ways. For example, it may first determine all loops to be allocated that include a preset number of resource modules, then calculate the sum of the current communication time parameters of each loop to be allocated, and determine the loop to be allocated with the smallest sum as the resource module loop. Alternatively, it may solve the current state matrix (adjacency matrix) formed by the current communication time parameters among the resource modules with a dynamic programming algorithm, so as to determine the resource module loop with the minimum sum of current communication time parameters. The preset number may be determined according to actual needs, and this embodiment does not limit it.
In practical application, when K nodes (such as GPUs) need to be scheduled, a GPU loop including K GPUs can be determined, wherein the sum of current communication time parameters between adjacent GPUs in the GPU loop is minimum, and the GPU loop is used as a scheduling resource, so that the most efficient communication loop can be obtained by automatically allocating resource modules.
When the determined resource module loop is used for model training, the occupancy rates of a preset number of resource modules in the resource module loop are increased, when the model training is finished and the resource module resources are released, the corresponding occupancy rates of the resource modules are reduced, and calculation can be performed according to the updated occupancy rates of the resource modules in the subsequent resource scheduling process.
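The occupancy life cycle described above can be sketched with a small scheduler object. Everything here is illustrative: the class name `LoopScheduler`, the brute-force search, and the choice to set occupancy to 1.0 during training and 0.0 after release are assumptions, since the text only says that the occupancy rate rises and falls.

```python
import math
from itertools import permutations

class LoopScheduler:
    """Hypothetical helper that reschedules using updated occupancy rates."""

    def __init__(self, initial, threshold=0.5):
        self.initial = initial
        self.threshold = threshold
        self.occupancy = [0.0] * len(initial)

    def _time(self, i, j):
        # Current parameter: infinite if either endpoint is occupied.
        busy = (self.occupancy[i] >= self.threshold
                or self.occupancy[j] >= self.threshold)
        return math.inf if busy else self.initial[i][j]

    def schedule(self, k):
        """Pick the cheapest k-module loop of free modules and occupy it."""
        best, best_cost = None, math.inf
        for loop in permutations(range(len(self.initial)), k):
            cost = sum(self._time(a, b)
                       for a, b in zip(loop, loop[1:] + loop[:1]))
            if cost < best_cost:
                best, best_cost = loop, cost
        if best is None:
            return None  # every loop touches an occupied module
        for module in best:
            self.occupancy[module] = 1.0  # assumed: training occupies it fully
        return best

    def release(self, loop):
        """Training finished: the modules become available again."""
        for module in loop:
            self.occupancy[module] = 0.0

# Machine 1 of fig. 6 (diagonal 0 is an assumption).
machine1 = [[0, 1, 2, 2],
            [1, 0, 2, 2],
            [2, 2, 0, 1],
            [2, 2, 1, 0]]
scheduler = LoopScheduler(machine1)
first = scheduler.schedule(2)    # takes the fast pair GPU0-GPU1
second = scheduler.schedule(2)   # GPU0 and GPU1 are busy, so GPU2-GPU3
third = scheduler.schedule(2)    # nothing left
scheduler.release(first)
fourth = scheduler.schedule(2)   # GPU0-GPU1 are available again
```

Each call recomputes the current parameters from the stored occupancy rates, so a later scheduling request automatically avoids modules that are still training and reuses modules that have been released.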
The resource scheduling apparatus provided by this embodiment can optimize resource scheduling for the TensorFlow Allreduce framework, automatically allocating resource modules and determining an optimized, efficient communication loop, thereby reducing the communication time in the training process and improving the training speed of deep learning. Moreover, algorithm engineers do not need to consider the underlying devices or the scheduling strategy, which reduces the developers' workload and lets them concentrate on researching the algorithm. In addition, code does not need to be modified during resource module allocation, which enhances code reusability and avoids problems such as physical downtime caused by manual misoperation.
In an optional implementation manner, the electronic device includes a plurality of sub-devices, each of the sub-devices includes the resource module, and the first obtaining module 51 is further configured to:
and acquiring initial communication time parameters among the resource modules according to the connection mode among the resource modules in the sub-equipment and the network connection type among the sub-equipment.
In an optional implementation, the second obtaining module 52 is further configured to:
judging whether the electronic equipment comprises a first resource module, wherein the occupancy rate of the first resource module is greater than or equal to a first preset threshold value;
if so, adjusting the communication time parameter between each resource module and the first resource module from the initial communication time parameter to be more than a second preset threshold value, and obtaining the current communication time parameter between the resource modules.
In an alternative implementation, the allocating module 53 is further configured to:
generating an adjacency matrix according to the current communication time parameters among the resource modules;
and solving the adjacency matrix by adopting a dynamic programming algorithm, and determining a resource module loop containing a preset number of resource modules.
In an alternative implementation, the allocating module 53 is further configured to:
determining all loops to be distributed according to the current communication time parameters among the resource modules, wherein each loop to be distributed comprises a preset number of resource modules;
calculating the sum of current communication time parameters between adjacent resource modules in each loop to be distributed;
and determining the loop to be allocated with the minimum sum of the current communication time parameters as the resource module loop.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs operations and advantageous effects have been described in detail in the embodiment related to the method, and will not be elaborated upon here.
Fig. 7 is a block diagram of an electronic device 800 shown in the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more resource modules 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile and non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the resource module 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 8 is a block diagram of an electronic device 1900 shown in the present application. For example, the electronic device 1900 may be provided as a server.
Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more resource modules and memory resources, represented by memory 1932, for storing instructions, e.g., application programs, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
A1, a resource scheduling method, the method is applied to an electronic device, the electronic device comprises a plurality of resource modules, the method comprises:
acquiring initial communication time parameters among the resource modules and the occupancy rate of each resource module;
obtaining current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
and determining a resource module loop containing a preset number of resource modules according to the current communication time parameters among the resource modules, wherein the sum of the current communication time parameters among the adjacent resource modules in the resource module loop is minimum.
A2, the resource scheduling method according to A1, wherein the electronic device includes a plurality of sub-devices, each of the sub-devices includes the resource module, and the step of obtaining the initial communication time parameters between the resource modules includes:
and acquiring initial communication time parameters among the resource modules according to the connection mode among the resource modules in the sub-equipment and the network connection type among the sub-equipment.
A3, the resource scheduling method according to A1, wherein the step of obtaining the current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rate of each resource module includes:
judging whether the electronic equipment comprises a first resource module, wherein the occupancy rate of the first resource module is greater than or equal to a first preset threshold value;
if so, adjusting the communication time parameter between each resource module and the first resource module from the initial communication time parameter to be more than a second preset threshold value, and obtaining the current communication time parameter between the resource modules.
A4, the method for scheduling resources according to any one of A1 to A3, wherein the step of determining a resource module loop including a predetermined number of resource modules according to the current communication time parameter between the resource modules comprises:
generating an adjacency matrix according to the current communication time parameters among the resource modules;
and solving the adjacency matrix by adopting a dynamic programming algorithm, and determining a resource module loop containing a preset number of resource modules.
A5, the method for scheduling resources according to any one of A1 to A3, wherein the step of determining a resource module loop including a predetermined number of resource modules according to the current communication time parameter between the resource modules comprises:
determining all loops to be distributed according to the current communication time parameters among the resource modules, wherein each loop to be distributed comprises a preset number of resource modules;
calculating the sum of current communication time parameters between adjacent resource modules in each loop to be distributed;
and determining the loop to be allocated with the minimum sum of the current communication time parameters as the resource module loop.
A6, a resource scheduling apparatus, the apparatus is applied to an electronic device, the electronic device includes a plurality of resource modules, the apparatus includes:
a first obtaining module configured to obtain an initial communication time parameter among the plurality of resource modules and an occupancy rate of each of the resource modules;
the second acquisition module is configured to obtain current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
the allocation module is configured to determine a resource module loop including a preset number of resource modules according to the current communication time parameters among the plurality of resource modules, and the sum of the current communication time parameters among adjacent resource modules in the resource module loop is minimum.
A7. The resource scheduling apparatus according to A6, wherein the electronic device includes a plurality of sub-devices, each of the sub-devices includes the resource module, and the first obtaining module is further configured to:
obtain the initial communication time parameters among the plurality of resource modules according to the connection mode among the resource modules within each sub-device and the network connection type among the sub-devices.
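As an illustration of A7, the sketch below derives initial communication time parameters from two lookups: the connection mode inside a sub-device and the network connection type between sub-devices. The connection names (`nvlink`, `pcie`, `infiniband`, `ethernet`), the numeric values, and the names `ResourceModule` and `initial_time_matrix` are all assumptions made for the example; the original text does not specify them.

```python
from dataclasses import dataclass

# Hypothetical relative communication times; real values would be measured
# or configured for the actual hardware.
INTRA_DEVICE_TIME = {"nvlink": 1.0, "pcie": 2.0}
INTER_DEVICE_TIME = {"infiniband": 5.0, "ethernet": 10.0}


@dataclass(frozen=True)
class ResourceModule:
    sub_device: str  # sub-device that hosts the module
    index: int       # position of the module inside that sub-device


def initial_time_matrix(modules, intra_mode, inter_type):
    """Build the initial communication time matrix.

    intra_mode maps a sub-device name to its internal connection mode.
    inter_type maps frozenset({dev_a, dev_b}) to the network type between them.
    """
    n = len(modules)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(modules):
        for j, b in enumerate(modules):
            if i == j:
                continue  # no cost from a module to itself
            if a.sub_device == b.sub_device:
                matrix[i][j] = INTRA_DEVICE_TIME[intra_mode[a.sub_device]]
            else:
                pair = frozenset((a.sub_device, b.sub_device))
                matrix[i][j] = INTER_DEVICE_TIME[inter_type[pair]]
    return matrix
```

Keying the inter-device lookup on an unordered pair keeps the resulting matrix symmetric, which matches the idea of a single communication time parameter between two modules.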
A8. The resource scheduling apparatus according to A6, wherein the second obtaining module is further configured to:
determine whether the electronic device includes a first resource module, the first resource module being a resource module whose occupancy rate is greater than or equal to a first preset threshold; and
if so, adjust the communication time parameter between each resource module and the first resource module from the initial communication time parameter to a value greater than a second preset threshold, thereby obtaining the current communication time parameters among the plurality of resource modules.
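The adjustment described in A8 can be sketched as below. The function name `current_time_matrix`, its parameter names, and the default of using infinity as the adjusted value are assumptions for illustration; the text only requires that the adjusted value exceed the second preset threshold.

```python
from math import inf


def current_time_matrix(initial, occupancy, first_threshold, second_threshold,
                        penalty=inf):
    """Derive current communication time parameters from the initial ones.

    Any module whose occupancy rate is >= first_threshold has every
    communication time parameter to and from it raised to `penalty`,
    which must be greater than second_threshold.
    """
    if not penalty > second_threshold:
        raise ValueError("penalty must exceed the second preset threshold")
    n = len(initial)
    current = [row[:] for row in initial]  # leave the initial matrix untouched
    for busy in range(n):
        if occupancy[busy] >= first_threshold:
            for other in range(n):
                if other != busy:
                    current[busy][other] = penalty
                    current[other][busy] = penalty
    return current
```

Raising these entries makes any loop through a heavily occupied module more expensive than loops that avoid it, so a minimum-sum search will pass over busy modules whenever an alternative exists.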
A9. The resource scheduling apparatus according to any one of A6 to A8, wherein the allocation module is further configured to:
generate an adjacency matrix according to the current communication time parameters among the plurality of resource modules; and
solve the adjacency matrix with a dynamic programming algorithm to determine the resource module loop containing the preset number of resource modules.
A10. The resource scheduling apparatus according to any one of A6 to A8, wherein the allocation module is further configured to:
determine all loops to be allocated according to the current communication time parameters among the plurality of resource modules, wherein each loop to be allocated contains the preset number of resource modules;
calculate, for each loop to be allocated, the sum of the current communication time parameters between adjacent resource modules in that loop; and
determine the loop to be allocated that has the minimum sum of current communication time parameters as the resource module loop.

Claims (10)

1. A resource scheduling method, applied to an electronic device that comprises a plurality of resource modules, the method comprising:
obtaining initial communication time parameters among the plurality of resource modules and an occupancy rate of each of the resource modules, wherein the initial communication time parameters are determined according to the connection type among the resource modules;
obtaining current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules;
determining, according to the current communication time parameters among the plurality of resource modules, a resource module loop containing a preset number of resource modules, wherein the sum of the current communication time parameters between adjacent resource modules in the resource module loop is minimal;
wherein the resource module loop is used for model training; and
wherein the step of obtaining the current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules comprises:
determining whether the electronic device comprises a first resource module, the first resource module being a resource module whose occupancy rate is greater than or equal to a first preset threshold; and
if so, adjusting the communication time parameter between each resource module and the first resource module from the initial communication time parameter to a value greater than a second preset threshold, thereby obtaining the current communication time parameters among the plurality of resource modules.
2. The method according to claim 1, wherein the electronic device comprises a plurality of sub-devices, each of the sub-devices comprises the resource module, and the step of obtaining the initial communication time parameters among the plurality of resource modules comprises:
obtaining the initial communication time parameters among the plurality of resource modules according to the connection mode among the resource modules within each sub-device and the network connection type among the sub-devices.
3. The method according to claim 1 or 2, wherein the step of determining a resource module loop containing a preset number of resource modules according to the current communication time parameters among the plurality of resource modules comprises:
generating an adjacency matrix according to the current communication time parameters among the plurality of resource modules; and
solving the adjacency matrix with a dynamic programming algorithm to determine the resource module loop containing the preset number of resource modules.
4. The method according to claim 1 or 2, wherein the step of determining a resource module loop containing a preset number of resource modules according to the current communication time parameters among the plurality of resource modules comprises:
determining all loops to be allocated according to the current communication time parameters among the plurality of resource modules, wherein each loop to be allocated contains the preset number of resource modules;
calculating, for each loop to be allocated, the sum of the current communication time parameters between adjacent resource modules in that loop; and
determining the loop to be allocated that has the minimum sum of current communication time parameters as the resource module loop.
5. A resource scheduling apparatus, applied to an electronic device that comprises a plurality of resource modules, the apparatus comprising:
a first obtaining module configured to obtain initial communication time parameters among the plurality of resource modules and an occupancy rate of each of the resource modules, wherein the initial communication time parameters are determined according to the connection type among the resource modules;
a second obtaining module configured to obtain current communication time parameters among the plurality of resource modules according to the initial communication time parameters among the plurality of resource modules and the occupancy rates of the resource modules; and
an allocation module configured to determine, according to the current communication time parameters among the plurality of resource modules, a resource module loop containing a preset number of resource modules, wherein the sum of the current communication time parameters between adjacent resource modules in the resource module loop is minimal;
wherein the resource module loop is used for model training; and
wherein the second obtaining module is further configured to:
determine whether the electronic device comprises a first resource module, the first resource module being a resource module whose occupancy rate is greater than or equal to a first preset threshold; and
if so, adjust the communication time parameter between each resource module and the first resource module from the initial communication time parameter to a value greater than a second preset threshold, thereby obtaining the current communication time parameters among the plurality of resource modules.
6. The resource scheduling apparatus according to claim 5, wherein the electronic device comprises a plurality of sub-devices, each of the sub-devices comprises the resource module, and the first obtaining module is further configured to:
obtain the initial communication time parameters among the plurality of resource modules according to the connection mode among the resource modules within each sub-device and the network connection type among the sub-devices.
7. The resource scheduling apparatus according to claim 5 or 6, wherein the allocation module is further configured to:
generate an adjacency matrix according to the current communication time parameters among the plurality of resource modules; and
solve the adjacency matrix with a dynamic programming algorithm to determine the resource module loop containing the preset number of resource modules.
8. The resource scheduling apparatus according to claim 5 or 6, wherein the allocation module is further configured to:
determine all loops to be allocated according to the current communication time parameters among the plurality of resource modules, wherein each loop to be allocated contains the preset number of resource modules;
calculate, for each loop to be allocated, the sum of the current communication time parameters between adjacent resource modules in that loop; and
determine the loop to be allocated that has the minimum sum of current communication time parameters as the resource module loop.
9. An electronic device, comprising:
a resource module; and
a memory for storing instructions executable by the resource module;
wherein the resource module is configured to perform the resource scheduling method according to any one of claims 1-4.
10. A non-transitory computer-readable storage medium, wherein instructions stored in the storage medium, when executed by a resource module of an electronic device, enable the electronic device to perform the resource scheduling method according to any one of claims 1-4.
CN201910223115.XA 2019-03-22 2019-03-22 Resource scheduling method, device, electronic equipment and storage medium Active CN110096356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910223115.XA CN110096356B (en) 2019-03-22 2019-03-22 Resource scheduling method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910223115.XA CN110096356B (en) 2019-03-22 2019-03-22 Resource scheduling method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110096356A CN110096356A (en) 2019-08-06
CN110096356B true CN110096356B (en) 2022-06-03

Family

ID=67443287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910223115.XA Active CN110096356B (en) 2019-03-22 2019-03-22 Resource scheduling method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110096356B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105016B (en) * 2019-12-06 2023-04-28 浪潮电子信息产业股份有限公司 Data processing method and device, electronic equipment and readable storage medium
CN111176841B (en) * 2019-12-20 2023-08-11 北京达佳互联信息技术有限公司 Distribution method and device of graphics processor resources, electronic equipment and storage medium
CN113965574B (en) * 2021-09-27 2022-07-12 西安交通大学 Cloud service data center backup virtual resource scheduling method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3399418A1 (en) * 2017-05-05 2018-11-07 INTEL Corporation Fine-grain compute communication execution for deep learning frameworks
CN108986063A (en) * 2018-07-25 2018-12-11 浪潮(北京)电子信息产业有限公司 The method, apparatus and computer readable storage medium of gradient fusion
CN109034386A (en) * 2018-06-26 2018-12-18 中国科学院计算机网络信息中心 A kind of deep learning system and method based on Resource Scheduler

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11094029B2 (en) * 2017-04-10 2021-08-17 Intel Corporation Abstraction layers for scalable distributed machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Horovod: fast and easy distributed deep learning in TensorFlow; Alexander Sergeev, Mike Del Balso; ResearchGate; 2018-02-21; full text *

Also Published As

Publication number Publication date
CN110096356A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
US11699213B2 (en) Image-capturing device and method for controlling same
CN110096356B (en) Resource scheduling method, device, electronic equipment and storage medium
CN111651263B (en) Resource processing method and device of mobile terminal, computer equipment and storage medium
CN110298437B (en) Neural network segmentation calculation method and device, storage medium and mobile terminal
CN107657590B (en) Picture processing method and device and storage medium
JP2021517282A (en) Network modules, allocation methods and devices, electronic devices and storage media
CN109858614B (en) Neural network training method and device, electronic equipment and storage medium
CN107590534B (en) Method and device for training deep convolutional neural network model and storage medium
WO2023201947A1 (en) Methods, systems, and storage media for task dispatch
CN113032112A (en) Resource scheduling method and device, electronic equipment and storage medium
CN109255784B (en) Image processing method and device, electronic equipment and storage medium
CN109272118B (en) Data training method, device, equipment and storage medium
CN106339260B (en) Task allocation method and device based on Jenkins platform
CN114595785A (en) Model training method and device, electronic equipment and storage medium
CN105847558A (en) Calendar event mode switching method based on mobile terminal and device thereof
CN107480773B (en) Method and device for training convolutional neural network model and storage medium
CN111312243B (en) Equipment interaction method and device
CN110689478B (en) Image stylization processing method and device, electronic equipment and readable medium
CN112486658A (en) Task scheduling method and device for task scheduling
CN110543900A (en) Image processing method and device, electronic equipment and storage medium
US20200293884A1 (en) Image processing method and device and terminal
CN110909886B (en) Machine learning network operation method, device and medium
CN109711386B (en) Method and device for obtaining recognition model, electronic equipment and storage medium
CN113204443A (en) Data processing method, equipment, medium and product based on federal learning framework
CN111176841B (en) Distribution method and device of graphics processor resources, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant