CN109034386A - Deep learning system and method based on a resource scheduler - Google Patents
Deep learning system and method based on a resource scheduler
- Publication number
- CN109034386A (application numbers CN201810668856.4A / CN201810668856A)
- Authority
- CN
- China
- Prior art keywords
- deep learning
- resource scheduler
- resource
- performance calculation
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention provides a deep learning system and method based on a resource scheduler, comprising: multiple high-performance computing nodes, each containing multiple graphics processors; and further comprising a resource scheduler and a deep learning framework. The resource scheduler selects and allocates the required resources from the multiple high-performance computing nodes to the user according to the user's request. A parsing plug-in parses the environment variables of the resources the resource scheduler has allocated to the user and obtains the corresponding parameters. The deep learning framework forms a job process according to those parameters and thereby begins executing the deep learning program. After the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process. The present invention provides a unified, centrally managed system for all kinds of deep learning frameworks and effectively improves the operating efficiency of distributed learning frameworks.
Description
Technical field
The present invention relates to the technical field of artificial intelligence and deep learning, and in particular to a deep learning system and method based on a resource scheduler.
Background technique
With the arrival of the Internet of Things and the mobile Internet era, data is generated in all kinds of forms from every aspect of production and daily life, for example: sensors, log files, e-mails, social media, and all kinds of pictures and videos. It is estimated that 80% of current data is unstructured, and unstructured data is growing 15 times faster than structured data; the total volume of global data is expected to reach 40 zettabytes (10^21 bytes) by 2020. Humanity has truly entered a data-centered era. Traditionally, HPC (High Performance Computing) has been closely tied to the solution of large-scale scientific computing and big-data applications. HPC naturally possesses a complete, mature, highly optimized family of system technologies for high-performance computing, such as: dedicated performance-optimized interconnect networks (InfiniBand, the IBM Blue Gene interconnect), high-performance message-passing libraries (MPI), rich mathematical libraries accelerated for all kinds of architectures (BLAS, LAPACK), efficient parallel file systems (Lustre, ParaStor), and schedulers (Slurm, LSF) that bind all this software together.
To adapt mature high-performance computing to the big-data-oriented algorithms represented by deep learning, and to meet the demands of artificial intelligence and machine learning on facilities based on HPC clusters, the foremost problem we currently need to solve is how to make the scheduler accommodate distributed deep learning frameworks, so that learning and training on large-scale data can be carried out.
Summary of the invention
To solve the above problems, in a first aspect, the present invention provides a deep learning system based on a resource scheduler, comprising: multiple high-performance computing nodes, each high-performance computing node containing multiple graphics processors; and further comprising a resource scheduler and a deep learning framework. The resource scheduler selects and allocates the required resources from the multiple high-performance computing nodes to the user according to the user's request. A parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user and obtains the corresponding parameters. The deep learning framework forms a job process according to the parameters and thereby begins executing the deep learning program. After the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
Preferably, parsing plug-in unit be application container engine, application container engine include Singularity, Shifter or
Docker。
Preferably, the step of parsing, by the pre-written parsing plug-in, the environment variables of the resources the resource scheduler allocates to the user and obtaining the corresponding parameters includes: parsing, by the pre-written parsing plug-in, the environment variables SLURM_JOB_NODELIST and SLURMD_NODENAME of the resources allocated to the user, to obtain the corresponding parameters cluster, job_name, and task_index; the deep learning framework forms a job process according to the parameters cluster, job_name, and task_index, and thereby begins executing the deep learning program.
Preferably, the number of high-performance computing nodes is 48, and each high-performance computing node contains 8 graphics processors.
Preferably, the resource scheduler is the Slurm resource scheduler.
Preferably, the deep learning framework is the TensorFlow deep learning framework.
In a second aspect, the present invention provides a deep learning method based on a resource scheduler, comprising the following steps: the resource scheduler selects and allocates the required resources from multiple high-performance computing nodes to the user according to the user's request; a pre-written parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user and obtains the corresponding parameters; the deep learning framework forms a job process according to the parameters and thereby begins executing the deep learning program; after the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
Preferably, the parsing plug-in is an application container engine, and the application container engine includes Singularity, Shifter, or Docker.
Preferably, the step of parsing, by the pre-written parsing plug-in, the environment variables of the resources the resource scheduler allocates to the user and obtaining the corresponding parameters includes: parsing, by the pre-written parsing plug-in, the environment variables SLURM_JOB_NODELIST and SLURMD_NODENAME of the resources allocated to the user, to obtain the corresponding parameters cluster, job_name, and task_index; the deep learning framework forms a job process according to the parameters cluster, job_name, and task_index, and thereby begins executing the deep learning program.
Preferably, the number of high-performance computing nodes is 48, and each high-performance computing node contains 8 graphics processors.
The present invention provides a unified, centrally managed system for all kinds of deep learning frameworks. By combining the scheduler with a distributed storage system, all deep learning frameworks can coexist in one management system; through a pre-written plug-in, the scheduler supports scheduling distributed learning frameworks, so that a distributed deep learning framework can be invoked and cancelled by the scheduler just like an ordinary program. This effectively improves the operating efficiency of distributed learning frameworks.
Detailed description of the invention
Fig. 1 is a schematic flow diagram of a deep learning method based on a resource scheduler provided by Embodiment 1 of the present invention;
Fig. 2 is a graph of the experimental results of the single-node performance test of Embodiment 1 of the present invention;
Fig. 3 is a graph of the distributed TensorFlow results of Embodiment 1 of the present invention.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
An embodiment of the present invention provides a deep learning system based on a resource scheduler. The system includes multiple high-performance computing nodes, each containing multiple graphics processors; for example, the number of high-performance computing nodes is 48, and each node contains 8 graphics processors. The deep learning system based on a resource scheduler provided by the embodiment of the present invention further includes a resource scheduler and a deep learning framework, wherein the resource scheduler is the Slurm resource scheduler and the deep learning framework is the TensorFlow deep learning framework. The resource scheduler selects and allocates the required resources from the multiple high-performance computing nodes to the user according to the user's request; a parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user and obtains the corresponding parameters; the deep learning framework forms a job process according to the parameters and thereby begins executing the deep learning program; after the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
Wherein, parsing plug-in unit be application container engine, application container engine include Singularity, Shifter or
Docker.In past ten years, from certain engineers, self hobby is gradually transformed into globalization to virtualization technology
Industrialized basal needs.All kinds of exploitation environment are wrapped in container by Singularity as container, in this way, for work
Oneself exploitation or required software environment can be loaded among Singularity by Cheng Shi, this kind of people of scientist, a structure
It builds, repeatedly multi-platform seamless interfacing uses, and improves the working efficiency of engineering staff and scientific research personnel, they are absorbed in
In the research of more core more importantly question essence, without excessively worried by the trifling thing in corner;For system manager
For be also the big liberation of production, nowadays all kinds of softwares enthusiastically come out like the mushrooms after rain, and each user demand may have phase not to the utmost
Together, unified that all softwares are installed, it is unrealistic also It is not necessary to, may use 1.0 versions with a software, party A-subscriber, and party B-subscriber
Use 1.2 versions, it is also possible to which A software translating relies on version repository 0.5, and B software translating relies on version repository 0.8.
The demand of more ideally user and system manager both sides are capable of in encapsulation of the Singlularity as an environment.
The container that Singularity sheet is designed and developed as HPC, the natural key technology example supported in high-performance calculation
Such as InfiniBand and Lustre, seamless interfacing is in all high-performance resource managers, such as Slurm, Torque, SGE.It is special
Not, Singularity can be integrated into a plug-in unit in Slurm, and Slurm jobs is so enabled to naturally to run
Into the container of Singularity.Common containers Singularity, Shifter and Docker are compared as follows table:
(Comparison table of Singularity, Shifter, and Docker not reproduced in this text.)
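As a concrete illustration of the integration just described, the sketch below builds the kind of command line involved when Slurm launches a job inside a Singularity container. It is an assumption-laden sketch, not the patent's implementation: the flag spellings (`--gres=gpu:N`, `singularity exec --nv`) are common usage, but the exact syntax depends on the cluster's Slurm and Singularity configuration.

```python
import shlex

def build_srun_singularity_cmd(image, script, nodes=1, gpus_per_node=8):
    """Build an srun command line that runs a training script inside a
    Singularity image. Flag names are illustrative; real clusters may
    configure GPU requests differently."""
    cmd = [
        "srun",
        f"--nodes={nodes}",
        f"--gres=gpu:{gpus_per_node}",   # request GPUs on each node
        "singularity", "exec", "--nv",   # --nv maps in the host NVIDIA stack
        image,
        "python", script,
    ]
    return shlex.join(cmd)

print(build_srun_singularity_cmd("tensorflow.sif", "train.py", nodes=4))
# → srun --nodes=4 --gres=gpu:8 singularity exec --nv tensorflow.sif python train.py
```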
The step described above of parsing, by the pre-written parsing plug-in, the environment variables of the resources the resource scheduler allocates to the user and obtaining the corresponding parameters includes: parsing, by the pre-written parsing plug-in, the environment variables SLURM_JOB_NODELIST and SLURMD_NODENAME of the resources allocated to the user, to obtain the corresponding parameters cluster, job_name, and task_index; the deep learning framework forms a job process according to the parameters cluster, job_name, and task_index, and thereby begins executing the deep learning program.
The embodiment of the present invention provides a unified, centrally managed system for all kinds of deep learning frameworks. By combining the scheduler with a distributed storage system, all deep learning frameworks can coexist in one management system; through the pre-written plug-in, the scheduler supports scheduling distributed learning frameworks, so that a distributed deep learning framework can be invoked and cancelled by the scheduler just like an ordinary program. This effectively improves the operating efficiency of distributed learning frameworks.
Correspondingly, an embodiment of the present invention provides a deep learning method based on a resource scheduler. As shown in Fig. 1, the method includes the following steps:
S101: the resource scheduler selects and allocates the required resources from multiple high-performance computing nodes to the user according to the user's request;
S102: a pre-written parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user and obtains the corresponding parameters;
S103: the deep learning framework forms a job process according to the parameters and thereby begins executing the deep learning program; after the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
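The allocate–parse–run–reclaim lifecycle of these steps can be sketched as a minimal Python mock. The node pool, environment hand-off, and callback interface are all illustrative assumptions; a real deployment would delegate allocation and reclamation to Slurm.

```python
class ResourceScheduler:
    """Minimal mock of the scheduler's role: allocate nodes for a user
    request (S101), then reclaim them when the job finishes."""
    def __init__(self, nodes):
        self.free = list(nodes)
        self.allocated = {}

    def allocate(self, user, n_nodes):
        grant, self.free = self.free[:n_nodes], self.free[n_nodes:]
        self.allocated[user] = grant
        # Hand the grant to the job the way Slurm does: via the environment.
        return {"SLURM_JOB_NODELIST": ",".join(grant)}

    def reclaim(self, user):
        self.free += self.allocated.pop(user)

def run_deep_learning_job(sched, user, n_nodes, train_fn):
    env = sched.allocate(user, n_nodes)               # S101: allocation
    nodes = env["SLURM_JOB_NODELIST"].split(",")      # S102: parse env vars
    try:
        return train_fn(nodes)                        # S103: job process runs
    finally:
        sched.reclaim(user)                           # resources reclaimed

sched = ResourceScheduler([f"node{i}" for i in range(4)])
result = run_deep_learning_job(sched, "alice", 2, lambda nodes: len(nodes))
```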
The embodiment of the present invention provides a unified, centrally managed method for all kinds of deep learning frameworks. By combining the scheduler with a distributed storage system, all deep learning frameworks can coexist in one management system; through the pre-written plug-in, the scheduler supports scheduling distributed learning frameworks, so that a distributed deep learning framework can be invoked and cancelled by the scheduler just like an ordinary program. This effectively improves the operating efficiency of distributed learning frameworks.
The performance experiments on the deep learning system are described below:
A deep learning system needs to perform intensive computation on large data sets, and therefore places high demands on the read/write bandwidth for massive data; only then can the network latency of each deep learning business model during training be kept under control. This deep learning system uses ParaStor storage over an InfiniBand network, which guarantees the data transmission rate.
The deep learning system has 48 nodes, with 8 Tesla P100 GPUs on each node. Based on the Slurm scheduler, the platform implements scheduling with dynamic allocation down to the granularity of individual cards, which satisfies the scheduling needs of all kinds of deep learning workloads.
The deep learning system supports all kinds of deep learning frameworks with excellent scalability; the current mainstream deep learning frameworks are all supported, such as TensorFlow, Caffe, MXNet, PyTorch, etc. To balance the demands of different framework versions and of user-customized deep learning frameworks, the platform provides container-level virtualization and can reuse Docker containers. Once a user has successfully composed an environment, it can be seamlessly transplanted onto this platform and used for the relevant experiments without any further changes, which greatly improves the convenience of experimentation.
To demonstrate representative use of the deep learning system, this test chose the most typical deep learning framework, TensorFlow, and carried out on a single node: a direct bare-metal run; a test in which Slurm performs dynamic node allocation; a test of the TensorFlow environment inside a virtualized container; and a test in which Slurm schedules and loads the TensorFlow Singularity container. The single-node tests were run with 1, 2, 4, and 8 GPUs. The table below shows the test results.
| | Python | Slurm | Singularity | Singularity on Slurm |
|---|---|---|---|---|
| 1 GPU | 159.3 | 159.9 | 158.9 | 160.49 |
| 2 GPU | 315.5 | 307.8 | 310.9 | 313.1 |
| 4 GPU | 630.4 | 615.2 | 621.9 | 613.0 |
| 8 GPU | 1133.6 | 1108.7 | 1135.2 | 1099.8 |
TensorFlow test results
The result output by the single-node experiments is the number of images processed per second. The data set chosen for the tests is the ImageNet 2012 set; the deep learning network is uniformly ResNet-50, and a batch size of 32 is used throughout training.
From the single-node results, all four groups of experiments reach a linear speedup ratio, and testing with Slurm scheduling and Singularity virtualization brings almost no loss of performance. The results of the different test modes are virtually identical. (Note — Python: logging in to the compute node directly; Slurm: Slurm dynamically schedules the compute node; Singularity: logging in to the compute node and testing with Singularity; Singularity on Slurm: Slurm schedules Singularity for the node test.) The performance of the four groups of experiments is shown in Fig. 2.
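The linear-speedup claim can be checked directly from the Python column of the table above: throughput divided by the 1-GPU baseline gives the speedup, and speedup divided by GPU count gives the parallel efficiency (about 99% at 2 and 4 GPUs and about 89% at 8 GPUs for these numbers).

```python
# Images/second from the Python column of the single-node table above.
throughput = {1: 159.3, 2: 315.5, 4: 630.4, 8: 1133.6}

base = throughput[1]
for gpus, rate in sorted(throughput.items()):
    speedup = rate / base          # relative to the 1-GPU run
    efficiency = speedup / gpus    # fraction of ideal linear scaling
    print(f"{gpus} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```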
8 GPUs are already enough to meet the essential training requirements of most deep learning tasks; to support the training and acceleration of ultra-large models, distributed TensorFlow has also been seamlessly integrated with Slurm's dynamic scheduling. The table below shows the performance test of distributed TensorFlow from 1 to 8 nodes.
| | 1 node | 2 nodes | 4 nodes | 8 nodes |
|---|---|---|---|---|
| Python | 1133.7 | 1888.4 | 3868.9 | 7561.5 |
From the measured performance results, the extension from 8 GPUs to 64 GPUs also achieves an essentially linear speedup, demonstrating the platform's outstanding scalability. The specific results are shown in Fig. 3.
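The "essentially linear" claim can likewise be quantified from the distributed table above: with 8 GPUs per node, scaling from 8 to 64 GPUs keeps per-GPU throughput at roughly 83–85% of the single-node figure.

```python
# Totals from the distributed table above; each node carries 8 GPUs.
node_throughput = {1: 1133.7, 2: 1888.4, 4: 3868.9, 8: 7561.5}

per_gpu_base = node_throughput[1] / 8          # single-node rate per GPU
for nodes, rate in sorted(node_throughput.items()):
    gpus = nodes * 8
    efficiency = (rate / gpus) / per_gpu_base  # fraction of ideal linear scaling
    print(f"{gpus:2d} GPUs: {rate:7.1f} img/s, scaling efficiency {efficiency:.0%}")
```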
From the experimental results, the deep learning system based on Slurm achieves linear GPU speedup from a single node to multiple nodes, and its computing performance is excellent enough to meet the requirements of deep learning. Dynamic allocation of the platform's computing resources serves multiple users at the same time while remaining convenient to use. Seamless integration of mainstream frameworks, support for the coexistence of virtualized containers of multiple versions, broad user customization, and excellent portability are all realized. The whole platform combines a traditional high-performance computing architecture with today's hottest deep learning technology and produces good application value.
The specific embodiments described above further describe in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A deep learning system based on a resource scheduler, comprising: multiple high-performance computing nodes, each high-performance computing node containing multiple graphics processors; characterized by further comprising: a resource scheduler and a deep learning framework, wherein
the resource scheduler is configured to select and allocate the required resources from the multiple high-performance computing nodes to the user according to the user's request;
a parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user, to obtain corresponding parameters;
the deep learning framework forms a job process according to the parameters and thereby begins executing a deep learning program; and after the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
2. The system according to claim 1, characterized in that the parsing plug-in is an application container engine, and the application container engine includes Singularity, Shifter, or Docker.
3. The system according to claim 1, characterized in that the step of parsing, by the pre-written parsing plug-in, the environment variables of the resources the resource scheduler allocates to the user and obtaining corresponding parameters includes: parsing, by the pre-written parsing plug-in, the environment variables SLURM_JOB_NODELIST and SLURMD_NODENAME of the resources the resource scheduler allocates to the user, to obtain the corresponding parameters cluster, job_name, and task_index;
the deep learning framework forms a job process according to the parameters cluster, job_name, and task_index, and thereby begins executing the deep learning program.
4. The system according to claim 1, characterized in that the number of high-performance computing nodes is 48, and each high-performance computing node contains 8 graphics processors.
5. The system according to claim 1, characterized in that the resource scheduler is the Slurm resource scheduler.
6. The system according to claim 1, characterized in that the deep learning framework is the TensorFlow deep learning framework.
7. A deep learning method based on a resource scheduler, characterized by comprising the following steps: the resource scheduler selects and allocates the required resources from multiple high-performance computing nodes to the user according to the user's request;
a pre-written parsing plug-in parses the environment variables of the resources the resource scheduler allocates to the user, to obtain corresponding parameters;
a deep learning framework forms a job process according to the parameters and thereby begins executing a deep learning program; and after the deep learning program completes, the resource scheduler reclaims all allocated resources, completing the whole deep learning process.
8. The method according to claim 7, characterized in that the parsing plug-in is an application container engine, and the application container engine includes Singularity, Shifter, or Docker.
9. The method according to claim 7, characterized in that the step of parsing, by the pre-written parsing plug-in, the environment variables of the resources the resource scheduler allocates to the user and obtaining corresponding parameters includes: parsing, by the pre-written parsing plug-in, the environment variables SLURM_JOB_NODELIST and SLURMD_NODENAME of the resources the resource scheduler allocates to the user, to obtain the corresponding parameters cluster, job_name, and task_index;
the deep learning framework forms a job process according to the parameters cluster, job_name, and task_index, and thereby begins executing the deep learning program.
10. The method according to claim 7, characterized in that the number of high-performance computing nodes is 48, and each high-performance computing node contains 8 graphics processors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668856.4A CN109034386A (en) | 2018-06-26 | 2018-06-26 | A kind of deep learning system and method based on Resource Scheduler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668856.4A CN109034386A (en) | 2018-06-26 | 2018-06-26 | A kind of deep learning system and method based on Resource Scheduler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109034386A true CN109034386A (en) | 2018-12-18 |
Family
ID=64610901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810668856.4A Pending CN109034386A (en) | 2018-06-26 | 2018-06-26 | A kind of deep learning system and method based on Resource Scheduler |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109034386A (en) |
- 2018-06-26 — CN CN201810668856.4A patent/CN109034386A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9934147B1 (en) * | 2015-06-26 | 2018-04-03 | Emc Corporation | Content-aware storage tiering techniques within a job scheduling system |
CN107733977A (en) * | 2017-08-31 | 2018-02-23 | 北京百度网讯科技有限公司 | A kind of cluster management method and device based on Docker |
CN108170417A (en) * | 2017-12-29 | 2018-06-15 | 曙光信息产业(北京)有限公司 | A kind of method and apparatus that high performance job scheduling frame is integrated in MESOS clusters |
Non-Patent Citations (2)
Title |
---|
JACOB_WJJ: "Implementation of submitting TensorFlow tasks via Slurm", 《HTTPS://BLOG.CSDN.NET/JIANGBO1017/ARTICLE/DETAILS/78591846》 *
LU Zhonghua et al.: "Design of a Slurm-based deep learning high-performance computing platform and its scheduling implementation technology", 《科研信息化技术与应用》 *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109739514A (en) * | 2018-12-21 | 2019-05-10 | 北京中科寒武纪科技有限公司 | Parameter processing method and Related product |
US11699073B2 (en) | 2018-12-29 | 2023-07-11 | Cambricon Technologies Corporation Limited | Network off-line model processing method, artificial intelligence processing device and related products |
CN111695672A (en) * | 2019-03-14 | 2020-09-22 | 百度(美国)有限责任公司 | Method for improving AI engine MAC utilization rate |
CN111695672B (en) * | 2019-03-14 | 2023-09-08 | 百度(美国)有限责任公司 | Method for improving MAC utilization rate of AI engine |
CN110096356A (en) * | 2019-03-22 | 2019-08-06 | 北京达佳互联信息技术有限公司 | Resource regulating method, device, electronic equipment and storage medium |
CN110096356B (en) * | 2019-03-22 | 2022-06-03 | 北京达佳互联信息技术有限公司 | Resource scheduling method, device, electronic equipment and storage medium |
CN109976911A (en) * | 2019-03-25 | 2019-07-05 | 哈尔滨工程大学 | A kind of adaptive resource dispatching method |
CN110389834A (en) * | 2019-06-28 | 2019-10-29 | 苏州浪潮智能科技有限公司 | A kind of method and apparatus for submitting deep learning training mission |
CN110389834B (en) * | 2019-06-28 | 2022-07-12 | 苏州浪潮智能科技有限公司 | Method and device for submitting deep learning training task |
CN111221541A (en) * | 2019-12-26 | 2020-06-02 | 曙光信息产业(北京)有限公司 | Cluster parallel program deployment method and device |
CN113065642A (en) * | 2021-04-09 | 2021-07-02 | 中电科数字科技(集团)有限公司 | Artificial intelligence acceleration method and system based on heterogeneous computing |
CN113065642B (en) * | 2021-04-09 | 2023-04-07 | 中电科数字科技(集团)有限公司 | Artificial intelligence acceleration method and system based on heterogeneous computing |
CN114968559A (en) * | 2022-05-06 | 2022-08-30 | 苏州国科综合数据中心有限公司 | LSF-based method for multi-host multi-GPU distributed arrangement of deep learning model |
CN114968559B (en) * | 2022-05-06 | 2023-12-01 | 苏州国科综合数据中心有限公司 | LSF-based multi-host multi-GPU distributed arrangement deep learning model method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109034386A (en) | A kind of deep learning system and method based on Resource Scheduler | |
Taylor | Distributed simulation: state-of-the-art and potential for operational research | |
Mao et al. | Analysis of average shortest‐path length of scale‐free network | |
CN106649085A (en) | Cloud computing-based software test system | |
CN106199696B (en) | Earthquake data processing system and method | |
CN104053179B (en) | A kind of C RAN system integration project platforms | |
Cui et al. | Multiple DAGs workflow scheduling algorithm based on reinforcement learning in cloud computing | |
CN107480717A (en) | Train job processing method and system, computing device, computer-readable storage medium | |
Li et al. | Intermediate data placement and cache replacement strategy under Spark platform | |
Yilmaz et al. | Panel: The future of research in modeling & simulation | |
CN109992715A (en) | Information displaying method, device, medium and calculating equipment | |
Scarpiniti et al. | VirtFogSim: A parallel toolbox for dynamic energy-delay performance testing and optimization of 5G mobile-fog-cloud virtualized platforms | |
CN110351145A (en) | A kind of radio network functions method of combination of the virtualization based on economic benefit | |
Varghese et al. | DocLite: A docker-based lightweight cloud benchmarking tool | |
CN106293947A (en) | GPU CPU mixing resource allocation system and method under virtualization cloud environment | |
CN107301088A (en) | A kind of method and apparatus for managing virtual machine batch migration | |
CN109976873A (en) | The scheduling scheme acquisition methods and dispatching method of containerization distributed computing framework | |
Xhafa et al. | A parallel grid-based implementation for real-time processing of event log data of collaborative applications | |
Liu et al. | KubFBS: A fine‐grained and balance‐aware scheduling system for deep learning tasks based on kubernetes | |
Depasquale et al. | Dynamics of research into modeling the power consumption of virtual entities used in the telco cloud | |
Cao et al. | Throughput optimization for Storm-based processing of stream data on clouds | |
Mondal et al. | Toward optimal load prediction and customizable autoscaling scheme for kubernetes | |
Wang et al. | GPARS: Graph predictive algorithm for efficient resource scheduling in heterogeneous GPU clusters | |
Muniyandi et al. | A representation of membrane computing with a clustering algorithm on the graphical processing unit | |
Nasonov et al. | The multi-level adaptive approach for efficient execution of multi-scale distributed applications with dynamic workload |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20181218 |