CN109144661A - A deep learning management method based on Docker - Google Patents


Info

Publication number
CN109144661A
Authority
CN
China
Prior art keywords
module
docker
cluster
deep learning
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810741533.3A
Other languages
Chinese (zh)
Inventor
冯涛
王辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201810741533.3A
Publication of CN109144661A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a deep learning management method based on Docker, comprising a cluster module, a Registry module, a storage module, and a user module. The Registry module records information about the Docker images saved in the storage module and, when necessary, transfers Docker images to nodes in the cluster module. The storage module stores the Docker images recorded in the Registry module. The cluster module consists of multiple nodes; it receives requests from the user module and generates Docker containers with the specified resources to complete the corresponding tasks. The user module provides identity authentication, cluster monitoring, and task submission services, and through it a user can request Docker containers from the cluster module to run deep learning tasks. In use, a user is verified through the user module and, once verified, can submit deep learning tasks to the platform and train a deep learning model.

Description

A deep learning management method based on Docker
Technical field
The present invention belongs to the fields of cloud computing virtualization technology and artificial-intelligence deep learning, and in particular relates to a deep learning management method based on Docker.
Technical background
Since the start of the 21st century, achievements in artificial intelligence have grown steadily. Deep learning is a branch of machine learning that aims to build neural networks simulating the analytical learning of the human brain, processing data by imitating the brain's working mechanisms. It is commonly applied in fields such as video recognition, image recognition, and speech recognition. However, as data volumes keep growing, the computing resources required for deep learning training grow as well. A platform is therefore desired that manages and monitors these computing resources in a unified way and improves their utilization.
Therefore, in view of the shortcomings of the prior art, it is necessary to propose a technical solution to the technical problems that exist in the prior art.
Summary of the invention:
The object of the present invention is to propose a deep learning management method based on Docker that improves the utilization of computing resources during deep learning training.
The object of the present invention is achieved through the following technical solution:
A deep learning management method based on Docker, specifically comprising the following steps:
Step S1: the user performs identity authentication through the service provided by the user module. After logging in, the user selects a specified Docker image, sets the parameters required to generate the Docker container and the configuration required for deep learning model training, and sends a request to the cluster module to apply for specific resources;
Step S2: after the cluster module receives the request from the user module, a node in the cluster is first selected to check whether the current cluster environment has enough resources to generate the Docker containers. If the free resources do not meet the requirements, the task waits and the reason for waiting is returned to the user module. If the free resources meet the requirements, the cluster module designates the nodes to be used for this task and proceeds to step S3;
Step S3: determine whether the nodes designated by the cluster module in step S2 contain the Docker image required by the task. If every node contains the required Docker image, proceed to step S4. If some node lacks the required Docker image, that node requests it from the Registry module and receives it from the storage module. If the Registry module has no record of the required image, an error message is returned. Once every node has received the required Docker image, proceed to step S4;
Step S4: a new Docker container is generated on each node, and training is carried out according to the configured deep learning model training parameters.
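The control flow of steps S2 to S4 can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the `Node` and `Request` classes, the first-fit node selection, and the in-memory `registry` set are all assumptions introduced here, and "pulling" an image is simulated by adding its name to the node's image set.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one cluster node and its free resources."""
    name: str
    free_gpus: int
    free_cpus: int
    free_mem_gb: int
    images: set = field(default_factory=set)


@dataclass
class Request:
    """Hypothetical request sent in step S1 (resources are per container)."""
    image: str
    nodes_needed: int
    gpus: int
    cpus: int
    mem_gb: int


def select_nodes(cluster, req):
    """S2: first-fit selection; returns None when too few nodes have room."""
    fit = [n for n in cluster
           if n.free_gpus >= req.gpus
           and n.free_cpus >= req.cpus
           and n.free_mem_gb >= req.mem_gb]
    return fit[:req.nodes_needed] if len(fit) >= req.nodes_needed else None


def ensure_image(node, image, registry):
    """S3: True if the node holds the image, pulling it if the registry has it."""
    if image in node.images:
        return True
    if image not in registry:
        return False
    node.images.add(image)  # stands in for a transfer from the storage module
    return True


def handle_request(cluster, registry, req):
    """Run S2 to S4 and return a (status, detail) pair."""
    nodes = select_nodes(cluster, req)
    if nodes is None:
        return ("waiting", "insufficient free resources")
    for node in nodes:
        if not ensure_image(node, req.image, registry):
            return ("error", f"image {req.image!r} not found in registry")
    for node in nodes:  # S4: reserve resources for the new container
        node.free_gpus -= req.gpus
        node.free_cpus -= req.cpus
        node.free_mem_gb -= req.mem_gb
    return ("running", [node.name for node in nodes])
```

A caller that receives `"waiting"` would retry later, which mirrors the wait branch of step S2.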
As a preferred technical solution, the system architecture that implements the method of the present invention comprises a cluster module, a Registry module, a storage module, and a user module. The Registry module records information about the Docker images saved in the storage module and, when necessary, transfers Docker images to nodes in the cluster module. The storage module stores the Docker images recorded in the Registry module. The cluster module consists of multiple nodes; it receives requests from the user module and generates Docker containers with the specified resources to complete the corresponding tasks. The user module provides identity authentication, user login, and task submission services, and through it a user can request Docker containers from the cluster module to run deep learning tasks.
Further, the Registry module is implemented by a private-registry Docker container running on a Linux host. The Registry module is mainly responsible for recording the Docker images currently saved in the storage module so that nodes in the cluster module can download them later. When a new Docker image needs to be uploaded, the user uploads it through the user module.
Further, the cluster module consists of multiple nodes; it receives requests from the user module and generates Docker containers with the specified resources to complete the corresponding tasks. The cluster module can be built with cluster software such as Hadoop or etcd. After it is generated, a Docker container periodically returns information such as its running status and log output to the user module. After the task finishes, the Docker container is destroyed automatically and the destruction information is returned to the user module.
Further, the user module provides identity authentication, user login, and task submission services, and through it a user can request Docker containers from the cluster module to run deep learning tasks. The authentication, cluster monitoring, and task submission services provided by the user module are implemented as web services. Identity authentication is completed by visiting the corresponding URL in a browser. After authentication, the user can observe in the browser information such as the resource usage and network communication status of each node in the cluster, and can also select a resource configuration and a deep learning task and start training. While a task is training, the user module receives information from the cluster module and displays the running status of the task in real time.
Compared with the prior art, the present invention has the following beneficial effects:
1. Specified resources can be requested dynamically in a browser to complete a task, and the procedure is more convenient than using Docker technology on its own.
2. When more computing resources are needed, they can be added quickly through the cluster software.
3. The resources in the cluster are managed in a unified, visual way, which greatly improves the utilization of computing resources in the cluster.
Description of the drawings
Fig. 1 is a system structure diagram of the Docker-based deep learning management method of the present invention;
Fig. 2 is a flow chart of the steps of the Docker-based deep learning management method of the present invention.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings and an embodiment, but is not limited thereto.
Docker is an open-source software project that automates the deployment of applications inside software containers. With Docker, specified resources can be allocated dynamically to produce a computing environment with a specific configuration. Docker images therefore offer an approach to managing computing resources dynamically and improving their utilization.
Figs. 1 and 2 show the system structure diagram and the flow chart of the steps of the Docker-based deep learning management method of the present invention. The method specifically comprises the following steps:
S1: the user performs identity authentication through the service provided by the user module. After logging in, the user selects a specified Docker image, sets the parameters required to generate the Docker container (including but not limited to the number of GPUs, the number of CPU cores, and the memory size) and the configuration required for deep learning model training (including but not limited to the network framework, the path of the data needed for training, and the type of model to train), and applies to the cluster module for specific resources.
S2: after the cluster module receives the request from the user module, a node in the cluster is first selected to check whether the current cluster environment has enough resources to generate the Docker containers. If the free resources do not meet the requirements, the task waits, and after a period of time it executes S2 again. If the free resources meet the requirements, the cluster module designates the nodes to be used for this task and proceeds to S3.
S3: determine whether the nodes designated by the cluster module in S2 contain the Docker image required by the task. If every node contains the required Docker image, proceed to S4. If some node lacks the required Docker image, that node requests it from the Registry module and receives it from the storage module. If the Registry module has no record of the required image, an error message is returned. Once every node has received the required Docker image, proceed to S4.
S4: a new Docker container is generated on each node, and training is carried out according to the configured deep learning model training parameters.
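To make the parameters of S1 concrete, the sketch below models a request and translates it into a `docker run` argument list that S4 could execute on a node. The patent does not say how containers are launched, so everything here is an assumption: the `TrainingRequest` fields, the environment variable names `FRAMEWORK` and `MODEL_TYPE`, and the `/data` mount point are hypothetical. The `--cpus`, `--memory`, `-v`, `-e`, and `--gpus` options are standard Docker CLI flags; `--gpus` needs Docker 19.03 or later with the NVIDIA container runtime, which postdates this application.

```python
from dataclasses import dataclass


@dataclass
class TrainingRequest:
    """Hypothetical container and training settings chosen by the user in S1."""
    image: str
    gpus: int
    cpu_cores: int
    memory_gb: int
    framework: str
    data_path: str
    model_type: str


def docker_run_args(req, container_data_dir="/data"):
    """Build (but do not execute) a docker run command for one node."""
    args = [
        "docker", "run", "-d",
        "--cpus", str(req.cpu_cores),            # CPU core limit
        "--memory", f"{req.memory_gb}g",         # memory limit
        "-v", f"{req.data_path}:{container_data_dir}:ro",  # training data, read-only
        "-e", f"FRAMEWORK={req.framework}",      # hypothetical variable names
        "-e", f"MODEL_TYPE={req.model_type}",
    ]
    if req.gpus > 0:
        args += ["--gpus", str(req.gpus)]        # number of GPUs to expose
    args.append(req.image)                       # options must precede the image
    return args
```

Returning the argument list instead of running it keeps the sketch testable on machines without Docker; a node agent could pass the list to `subprocess.run`.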
The system architecture that implements the method of the present invention further comprises a cluster module, a Registry module, a storage module, and a user module.
Further, the Registry module is implemented by a private-registry Docker container running on a Linux host. Docker is installed on the Linux host where the Registry module resides, the registry image is downloaded with Docker's pull command, and a private registry is created from that image to hold the images subsequently uploaded to the Registry module.
Further, the storage module is implemented by the Linux host where the Registry module resides. A large-capacity hard disk is mounted on that host, and the storage path for Docker images in the Registry module is set to the corresponding path on the newly mounted disk.
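The registry and storage setup described above can be sketched as the Docker commands it would involve. The function names, the port `5000`, the `registry:2` tag, and the example paths are assumptions chosen for illustration; the patent names none of them. `/var/lib/registry` is the default data directory of the official `registry` image, so bind-mounting the large disk there places image data on it. The functions only build argument lists and never invoke Docker.

```python
def registry_setup_commands(storage_dir, port=5000, tag="2"):
    """Commands to download the registry image and start a private registry.

    storage_dir is the path on the newly mounted disk (hypothetical value
    supplied by the caller); it is bind-mounted over the registry's data
    directory so that uploaded images land on that disk.
    """
    image = f"registry:{tag}"
    return [
        ["docker", "pull", image],
        ["docker", "run", "-d", "--restart=always", "--name", "registry",
         "-p", f"{port}:5000",
         "-v", f"{storage_dir}:/var/lib/registry",
         image],
    ]


def push_commands(local_image, registry_host, port=5000):
    """Commands to upload a local image to the private registry."""
    remote = f"{registry_host}:{port}/{local_image}"
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]
```

For example, `push_commands("tf-train:1.0", "registry-host")` yields a `docker tag` followed by a `docker push` to `registry-host:5000/tf-train:1.0`; cluster nodes would then pull the image under that same name.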
Further, the cluster module consists of multiple nodes; it receives requests from the user module and generates Docker containers with the specified resources to complete the corresponding tasks. Building the cluster module requires several Linux hosts. First, passwordless SSH login is configured among these hosts so that every host can log in to every other host over SSH without a password. Second, existing cluster software is used to assemble these Linux hosts into a cluster. Finally, Docker is installed on every Linux host in the cluster, and the Docker containers on different hosts are enabled to communicate with one another. In addition, each host in the cluster module runs a corresponding script that returns real-time information about the host and the containers on it.
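The patent does not describe the per-host reporting script, so the following is only a guess at its shape, using the Python standard library. The function name `host_status` and the field names are hypothetical. Container-level figures are left out; on a real node they could come from a command such as `docker stats --no-stream`.

```python
import json
import os
import shutil
import socket
import time


def host_status(path="/"):
    """Collect a small snapshot of host state for the monitoring view."""
    disk = shutil.disk_usage(path)
    status = {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_free_bytes": disk.free,
    }
    # The load average is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            status["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return status


if __name__ == "__main__":
    # A real script would send this to the monitoring service periodically;
    # here it is simply printed as JSON.
    print(json.dumps(host_status()))
```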
Further, the user module can be implemented on one of the nodes in the cluster, or on a separate Linux host outside the cluster. The user module mainly implements the identity authentication, user login, and task submission functions. These functions are implemented with a set of Java frameworks and are accessed through a browser.
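The patent implements these functions with Java web frameworks that it does not name. The sketch below shows only the logic of authenticate-then-submit, written in Python for brevity and without any web layer. The class name, the method names, and the token scheme are all assumptions; passwords are hashed with PBKDF2 from the standard library.

```python
import hashlib
import hmac
import secrets


class UserModule:
    """In-memory stand-in for authentication, login, and task submission."""

    def __init__(self):
        self._users = {}      # username -> (salt, password hash)
        self._sessions = {}   # session token -> username
        self.submitted = []   # (username, task) pairs awaiting the cluster

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, username, password):
        salt = secrets.token_bytes(16)
        self._users[username] = (salt, self._hash(password, salt))

    def login(self, username, password):
        """Return a session token on success, or None on failure."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, expected = record
        if not hmac.compare_digest(self._hash(password, salt), expected):
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def submit_task(self, token, task):
        """Queue a task for the cluster; only logged-in users may submit."""
        user = self._sessions.get(token)
        if user is None:
            raise PermissionError("login required")
        self.submitted.append((user, task))
        return len(self.submitted) - 1  # task id
```

In a deployed system the queued task would be forwarded to the cluster as the resource request, and the token would travel in a browser session.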
The above description of the embodiment is only intended to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
The foregoing description of the disclosed embodiment enables those skilled in the art to implement or use the present invention. Various modifications to the embodiment will be readily apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiment shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A deep learning management method based on Docker, characterized in that it specifically comprises the following steps:
Step S1: the user performs identity authentication through the service provided by the user module. After logging in, the user selects a specified Docker image, sets the parameters required to generate the Docker container and the configuration required for deep learning model training, and sends a request to the cluster module to apply for specific resources;
Step S2: after the cluster module receives the request from the user module, a node in the cluster is first selected to check whether the current cluster environment has enough resources to generate the Docker containers. If the free resources do not meet the requirements, the task waits and the reason for waiting is returned to the user module. If the free resources meet the requirements, the cluster module designates the nodes to be used for this task and proceeds to step S3;
Step S3: determine whether the nodes designated by the cluster module in step S2 contain the Docker image required by the task. If every node contains the required Docker image, proceed to step S4. If some node lacks the required Docker image, that node requests it from the Registry module and receives it from the storage module. If the Registry module has no record of the required image, an error message is returned. Once every node has received the required Docker image, proceed to step S4;
Step S4: a new Docker container is generated on each node, and training is carried out according to the configured deep learning model training parameters;
wherein the Registry module records information about the Docker images saved in the storage module and transfers Docker images to the nodes in the cluster module; the storage module stores the Docker images recorded in the Registry module; the cluster module consists of multiple nodes, receives requests from the user module, and generates Docker containers with the specified resources to complete the corresponding tasks; and the user module provides identity authentication, cluster monitoring, and task submission services, and requests Docker containers from the cluster module to run deep learning tasks.
2. The deep learning management method based on Docker according to claim 1, characterized in that: the Registry module is implemented by a private-registry Docker container running on a Linux host; the storage module is implemented by a Linux host with a large amount of free space; the cluster module is built from several Linux hosts; and the user module is implemented by a service running on a Linux host.
3. The deep learning management method based on Docker according to claim 2, characterized in that: the Linux hosts that make up the modules must maintain normal network communication with one another and have passwordless SSH login configured among them.
CN201810741533.3A 2018-07-05 2018-07-05 A kind of deep learning management method based on docker Pending CN109144661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810741533.3A CN109144661A (en) 2018-07-05 2018-07-05 A kind of deep learning management method based on docker

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810741533.3A CN109144661A (en) 2018-07-05 2018-07-05 A kind of deep learning management method based on docker

Publications (1)

Publication Number Publication Date
CN109144661A true CN109144661A (en) 2019-01-04

Family

ID=64799977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810741533.3A Pending CN109144661A (en) 2018-07-05 2018-07-05 A kind of deep learning management method based on docker

Country Status (1)

Country Link
CN (1) CN109144661A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209574A (en) * 2019-05-14 2019-09-06 深圳极视角科技有限公司 A kind of data mining system based on artificial intelligence
CN111274018A (en) * 2020-01-21 2020-06-12 行星算力(深圳)科技有限公司 Distributed training method based on DL framework
CN112035220A (en) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Processing method, device and equipment for operation task of development machine and storage medium
CN112148348A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Task processing method and device and storage medium
CN112148419A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Mirror image management method, device and system in cloud platform and storage medium
CN112579149A (en) * 2020-12-24 2021-03-30 第四范式(北京)技术有限公司 Generation method, device, equipment and storage medium of model training program mirror image

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105511943A (en) * 2015-12-03 2016-04-20 华为技术有限公司 Docker container running method and device
EP3270289A2 (en) * 2016-06-23 2018-01-17 Sap Se Container-based multi-tenant computing infrastructure
CN106790483A (en) * 2016-12-13 2017-05-31 武汉邮电科学研究院 Hadoop group systems and fast construction method based on container technique
CN106850621A (en) * 2017-02-07 2017-06-13 南京云创大数据科技股份有限公司 A kind of method based on container cloud fast construction Hadoop clusters
CN107733977A (en) * 2017-08-31 2018-02-23 北京百度网讯科技有限公司 A kind of cluster management method and device based on Docker
CN107783818A (en) * 2017-10-13 2018-03-09 北京百度网讯科技有限公司 Deep learning task processing method, device, equipment and storage medium
CN108062246A (en) * 2018-01-25 2018-05-22 北京百度网讯科技有限公司 For the resource regulating method and device of deep learning frame

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JASONBSEN DM等: "Contain this,unleashing docker for hpc", 《PROCEEDINGS OF THE CRAY USER GROUP》 *
耿朋等: "面向Dockerfile的容器镜像构建工具", 《计算机系统应用》 *
肖熠等: "一种针对GPU资源的深度学习容器云研究", 《中国传媒大学学报自然科学版》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209574A (en) * 2019-05-14 2019-09-06 深圳极视角科技有限公司 A kind of data mining system based on artificial intelligence
CN112148348A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Task processing method and device and storage medium
CN112148419A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Mirror image management method, device and system in cloud platform and storage medium
CN112148348B (en) * 2019-06-28 2023-10-20 杭州海康威视数字技术股份有限公司 Task processing method, device and storage medium
CN112148419B (en) * 2019-06-28 2024-01-02 杭州海康威视数字技术股份有限公司 Mirror image management method, device and system in cloud platform and storage medium
CN111274018A (en) * 2020-01-21 2020-06-12 行星算力(深圳)科技有限公司 Distributed training method based on DL framework
CN112035220A (en) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Processing method, device and equipment for operation task of development machine and storage medium
CN112579149A (en) * 2020-12-24 2021-03-30 第四范式(北京)技术有限公司 Generation method, device, equipment and storage medium of model training program mirror image
WO2022135592A1 (en) * 2020-12-24 2022-06-30 第四范式(北京)技术有限公司 Method and apparatus for generating model training program mirror image, device, and storage medium
CN112579149B (en) * 2020-12-24 2024-01-30 第四范式(北京)技术有限公司 Method, device, equipment and storage medium for generating model training program mirror image

Similar Documents

Publication Publication Date Title
CN109144661A (en) A kind of deep learning management method based on docker
McChesney et al. Defog: fog computing benchmarks
WO2021115480A1 (en) Federated learning method, device, equipment, and storage medium
CN106484622B (en) Test method, apparatus and system
CN112712182B (en) Model training method and device based on federal learning and storage medium
CN104252337B (en) Task executing method, boundary means and grid service server in computing system
CN108255605A (en) Image recognition cooperative computing method and system based on neural network
CN107508722B (en) Service monitoring method and device
CN108809722A (en) A kind of method, apparatus and storage medium of deployment Kubernetes clusters
Mukherjee et al. Cloud computing: future framework for e-Governance
US20100153482A1 (en) Cloud-Based Automation of Resources
CN106325975A (en) Method for automatically deploying and managing big data clusters through Docker container
CN105681454B (en) A kind of adaptive connection cloud desktop method and system
CN109104467A (en) Develop environment construction method, apparatus and plateform system and storage medium
US11483218B2 (en) Automating 5G slices using real-time analytics
CN110083455B (en) Graph calculation processing method, graph calculation processing device, graph calculation processing medium and electronic equipment
CN108924221A (en) The method and apparatus for distributing resource
US20130297803A1 (en) Method for providing development and deployment services using a cloud-based platform and devices thereof
CN112764875B (en) Intelligent calculation-oriented lightweight portal container microservice system and method
CN112667594A (en) Heterogeneous computing platform based on hybrid cloud resources and model training method
CN111124617B (en) Method and device for creating block chain system, storage medium and electronic device
CN108196764A (en) Application architecture dispositions method, device, system and cloud platform
CN108667639A (en) A kind of method for managing resource under privately owned cloud environment and management server
CN107168844B (en) Performance monitoring method and device
Longo et al. Urban pollution monitoring based on mobile crowd sensing: An osmotic computing approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190104