CN111459506B - Deep learning platform cluster deployment method and device, medium and electronic equipment - Google Patents

Deep learning platform cluster deployment method and device, medium and electronic equipment

Info

Publication number
CN111459506B
CN111459506B (application CN202010136850.XA)
Authority
CN
China
Prior art keywords
deep learning
installation package
learning platform
installation
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010136850.XA
Other languages
Chinese (zh)
Other versions
CN111459506A
Inventor
钟孝勋
贺波
万书武
李均
蒋英明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010136850.XA
Publication of CN111459506A
Application granted granted Critical
Publication of CN111459506B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G06F 8/61 Installation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to the field of process optimization, and discloses a deployment method, device, medium and electronic equipment for a deep learning platform cluster. The method comprises the following steps: setting a storage node; uploading a deep learning platform installation package, a deep learning computing installation package corresponding to the deep learning platform installation package, a system installation package on which the deep learning platform depends to run on a target system, and an installation program of the deep learning platform to the storage node for storage; and issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform according to the download instruction, and then, by running the installation program, downloads and installs from the storage node the deep learning platform installation package matched with the target device, the system installation package on which the deep learning platform depends and/or the deep learning computing installation package. By this method, the deployment efficiency of the deep learning platform, and in particular of a deep learning platform cluster, is improved, and the deployment cost is reduced.

Description

Deep learning platform cluster deployment method and device, medium and electronic equipment
Technical Field
The disclosure relates to the technical field of process optimization, and in particular relates to a deployment method, device, medium and electronic equipment of a deep learning platform cluster.
Background
A wave of artificial intelligence is sweeping the world, and with its development, platforms that provide services for deep learning, such as deep learning frameworks, keep emerging. At present, however, installing a deep learning platform requires considering many factors, such as the version of the deep learning platform itself, the version of the operating system on the device to be installed, the versions of the various installation packages in the environment on which the deep learning platform depends, and the hardware configuration of the device. Deploying a deep learning platform on a device therefore requires professional skills and cumbersome operations from the personnel involved, and when a cluster of deep learning platforms is to be deployed, problems such as low deployment efficiency and high deployment cost arise.
Disclosure of Invention
In the technical field of process optimization, in order to solve the above technical problems, the purpose of the present disclosure is to provide a deployment method, device, medium and electronic equipment for a deep learning platform cluster.
According to an aspect of the present application, there is provided a deployment method of a deep learning platform cluster, the method comprising:
Setting a storage node;
uploading an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends to run on a target system to the storage node for storage, wherein the installation packages of all the deep learning platforms are installation packages of the same deep learning platform;
uploading an installation program of the deep learning platform to the storage node for storage;
and issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads an installation program of the deep learning platform according to the download instruction, and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package, by running the installation program of the deep learning platform.
According to another aspect of the present application, there is provided a deployment apparatus of a deep learning platform cluster, the apparatus comprising:
a setting module configured to set a storage node;
The first uploading module is configured to upload an installation package of at least one deep learning platform, a deep learning calculation installation package corresponding to the installation package of the at least one deep learning platform and a system installation package on which the deep learning platform depends to run on a target system to the storage node for storage, wherein the installation packages of all the deep learning platforms are the installation packages of the same deep learning platform;
the second uploading module is configured to upload the installation program of the deep learning platform to the storage node for storage;
the instruction issuing module is configured to issue a downloading instruction to target equipment to which the deep learning platform is to be deployed, so that the target equipment downloads an installation program of the deep learning platform according to the instruction of the downloading instruction, and then downloads and installs an installation package of the deep learning platform matched with the target equipment, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package from the storage node by running the installation program of the deep learning platform.
According to another aspect of the application there is provided a computer readable program medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method as described above.
According to another aspect of the present application, there is provided an electronic apparatus including:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement a method as described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
the deployment method of a deep learning platform cluster provided by the application comprises the following steps: setting a storage node; uploading an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends to run on a target system to the storage node for storage, wherein the installation packages of all the deep learning platforms are installation packages of the same deep learning platform; uploading an installation program of the deep learning platform to the storage node for storage; and issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform according to the download instruction, and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package, by running the installation program of the deep learning platform.
In this method, after the storage node is set, the installation packages required for installing and running the deep learning platform, namely the installation package of the deep learning platform, the deep learning computing installation package and the system installation package on which the platform depends on the target system, together with the installation program of the deep learning platform, are uploaded to the storage node. When the deep learning platform is to be deployed on a target device, an instruction is issued to that device; the target device then automatically downloads the installation program of the deep learning platform from the storage node according to the download instruction, and, by running the installation program, automatically downloads from the storage node the installation packages required for installing and running the deep learning platform that match the target device. Automatic adaptation, downloading and installation of the required installation packages are thereby achieved, which improves the deployment efficiency of the deep learning platform, in particular of a deep learning platform cluster, and reduces the deployment cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a system architecture diagram illustrating a method of deployment of a deep learning platform cluster, according to an example embodiment;
FIG. 2 is a flowchart illustrating a method of deployment of a deep learning platform cluster, according to an example embodiment;
FIG. 3 is a flowchart illustrating details of steps 220 and 240 in one embodiment, based on the embodiment corresponding to FIG. 2;
FIG. 4 is a flowchart showing steps performed by a target device in running an installation script of a deep learning platform, according to an example embodiment;
FIG. 5 is a block diagram of a deployment apparatus of a deep learning platform cluster, shown in accordance with an exemplary embodiment;
FIG. 6 is an exemplary block diagram of an electronic device implementing a method for deploying a deep learning platform cluster as described above, according to one illustrative embodiment;
FIG. 7 is a diagram of a computer readable storage medium implementing the deployment method of the deep learning platform cluster described above, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.
The present disclosure first provides a deployment method for a deep learning platform cluster. A deep learning platform may also be called a deep learning framework; it exposes classes, functions and API (Application Programming Interface) interfaces for programming implementations of various deep learning algorithms. These deep learning algorithms may include deep convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), and the like. Common deep learning platforms include TensorFlow, Keras, PyTorch, Baidu PaddlePaddle, and the like. Deployment of a deep learning platform cluster refers to deploying the deep learning platform on a plurality of terminals or devices respectively, and the deployment method provided by the disclosure enables such cluster deployment to be carried out efficiently.
The implementation terminal of the present disclosure may be any device having computing and processing functions that can be connected to external devices for receiving or transmitting data. It may be a portable mobile device, such as a smart phone, a tablet computer, a notebook computer or a PDA (Personal Digital Assistant), a fixed device, such as a computer device, a field terminal, a desktop computer, a server or a workstation, or a collection of multiple devices, such as the physical infrastructure of cloud computing.
Preferably, the implementation terminal of the present disclosure may be a server or a physical infrastructure of cloud computing.
Fig. 1 is a system architecture diagram illustrating a method of deployment of a deep learning platform cluster according to an exemplary embodiment. As shown in fig. 1, the system architecture includes a file server 120, a user terminal 130, and a device cluster 110 to be deployed with a deep learning platform, where the device cluster 110 includes a plurality of devices, and the user terminal 130 and each device, each device and the file server 120, and the file server 120 and the user terminal 130 are all connected by communication links. In this embodiment, the user terminal 130 is an execution terminal of the present application, and each target device in the target device cluster 110 may be organized with a preset architecture to implement data interaction. When the deployment method of the deep learning platform cluster provided by the present disclosure is applied to the system architecture shown in fig. 1, a specific process may be as follows: first, the user terminal 130 configures the file server 120 as a storage node, so that the file server 120 can receive and store data transmitted from the user terminal 130; then, the user terminal 130 may upload the installation package of the deep learning platform, the corresponding deep learning calculation installation package, the system installation package on which the deep learning platform depends, and the installation program of the deep learning platform to the storage node for storage; then, the user terminal 130 issues a download instruction to a target device to be deployed with the deep learning platform in the multiple devices of the device cluster 110 through a communication link, the target device can download an installation program of the deep learning platform according to the instruction of the download instruction, and then the target device downloads and installs an installation package of the deep learning platform adapted to the target device, a system installation package on which the deep learning platform depends and/or a deep learning calculation installation package from a storage node by running the installation program, so that the deployment of the deep learning platform on the target device is realized, and the deep learning platforms deployed on the target devices form the deep learning platform cluster.
It should be noted that, the embodiment shown in fig. 1 is only one embodiment of the present application, and although in this embodiment, a device in a device cluster is a different device from an executing terminal of the present application, and a storage node for storing an installation program and various installation packages is one node, in other embodiments or specific applications, a device in a device cluster may be an executing terminal of the present application, and may be a different device from a target device, and a storage node for storing an installation program and various installation packages may be a plurality of nodes, such as a storage node cluster.
Fig. 2 is a flow chart illustrating a method of deployment of a deep learning platform cluster, according to an example embodiment. The embodiment may be executed by the foregoing server or desktop computer, as shown in fig. 2, and may include the following steps:
step 210, a storage node is set.
The storage node may be any device having storage and communication functions, which may be the same type of device as the execution terminal of the present application, or may be a different type of device. The number of the storage nodes may be one or plural, for example, when plural storage nodes are provided, the plural storage nodes may be a server cluster.
The storage nodes may be arranged in a variety of ways.
In one embodiment, the setting storage node includes:
and sending an installation package of the storage management system to the target node so that the target node becomes a storage node after receiving and installing the installation package.
In one embodiment, the target node has a client pre-installed, and the home terminal has a server corresponding to the client pre-installed, and the setting storage node includes:
and configuring a configuration file corresponding to the client of the target node at the server side so as to set the target node as a storage node.
In one embodiment, the storage node is a file server.
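For illustration only (the directory, port and file names below are assumptions and are not specified in the present disclosure), a file server acting as the storage node could simply expose the uploaded installation packages and the installer script over HTTP so that target devices can fetch them with wget:
# Illustrative sketch: publish the installation packages and installer script over HTTP.
# Paths, port and file names are assumptions.
mkdir -p /data/dl-repo
cp install-tensorflow.sh tensorflow_gpu-1.11.0.whl cuda_9.0.run cudnn-9.0-v7.0.tgz \
   gcc-7.4.tar.gz glibc-2.17.tar.gz /data/dl-repo/
cd /data/dl-repo && python3 -m http.server 8080    # simple static file server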
Step 220: uploading an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends to run on a target system to the storage node for storage.
Wherein, all the installation packages of the deep learning platform are the installation packages of the same deep learning platform.
As previously described, the deep learning platform may be any of a variety of software system architectures that support running deep learning models or algorithms, such as TensorFlow.
When the number of the installation packages of the deep learning platform uploaded to the storage node for storage is multiple, the installation packages of each deep learning platform can be the same platform but different versions of installation packages.
The target system may be any of various operating systems, such as the Linux-based Red Hat (RedHat) operating system, the Linux-based Community Enterprise Operating System (CentOS), the Linux-based Ubuntu operating system, and the like.
The deep learning computing installation package is an installation package that assists the platform with deep learning training and inference. A detailed description of the installation package of the deep learning platform, the deep learning computing installation package and the system installation package is given in the explanation of step 240 and is not repeated here.
Step 230: uploading the installation program of the deep learning platform to the storage node for storage.
The installer may be any program entity that may be used to install the deep learning platform, such as software, modules, components, scripts, or the like.
Step 240: issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads an installation program of the deep learning platform according to the download instruction, and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package, by running the installation program of the deep learning platform.
The download instruction is an instruction for instructing the target device to download an installer of the deep learning platform.
That the target device downloads and installs, from the storage node, the installation package of the deep learning platform matched with the target device, the system installation package on which the deep learning platform depends and/or the deep learning computing installation package by running the installation program means that the target device may download and install these packages from the storage node at different times, and may download only some of them; for example, it may download and install from the storage node only the installation package of the deep learning platform matched with the target device and the system installation package on which the deep learning platform depends, i.e. two of the installation packages.
In one embodiment, the Ansible automated operations and maintenance tool is used to issue the download instruction to the target device on which the deep learning platform is to be deployed.
Ansible is an automated operations tool that does not require a client or agent to be installed on the remote host; it communicates with the remote host over SSH.
For example, if the target system is the RedHat operating system and the deep learning platform is TensorFlow, the download instruction issued to the target device on which the deep learning platform is to be deployed may be wget install-tensorflow.sh, where install-tensorflow.sh is a script-type installation program.
In this step, by issuing a download instruction to the target device on which the deep learning platform is to be deployed, the target device that receives the issued instruction can complete the construction of the deep learning platform and the environment on which it depends according to the instruction.
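As a minimal sketch of how such a download instruction might be pushed to a whole device cluster with Ansible (the inventory group name and the storage node address are assumptions, not specified in the present disclosure):
# Assumed inventory group "dl_nodes" and storage node address; adjust to the actual environment.
ansible dl_nodes -m shell -a \
  "wget http://file-server:8080/install-tensorflow.sh && sh install-tensorflow.sh"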
In one embodiment, the deep learning computing installation package includes a general parallel computing architecture installation package and a deep neural network library installation package, the system installation package on which the deep learning platform depends running on the target system includes a compiler-suite installation package and a system core library installation package, the at least one deep learning platform installation package includes at least one deep learning platform graphics processing version installation package, the deep learning computing installation package corresponds to the deep learning platform graphics processing version installation package, and the specific steps of step 220 and step 240 may be as shown in fig. 3.
Fig. 3 is a flowchart illustrating details of steps 220 and 240 in one embodiment, based on the embodiment corresponding to fig. 2. As shown in fig. 3, the method comprises the following steps:
and 220', uploading an installation package of at least one deep learning platform, a general parallel computing architecture installation package and a deep neural network library installation package corresponding to the installation package of the graphics processing version of the deep learning platform, and an installation package and a system core library installation package of a compiler suite matched with the installation package of the deep learning platform to the storage node for storage.
In this embodiment, to install and run the deep learning platform, in addition to the installation package of the deep learning platform itself, other dependent installation packages are required, including a general parallel computing architecture installation package, a deep neural network library installation package, an installation package of a compiler suite, and a system core library installation package.
The correspondence between the installation packages refers to the version correspondence of the two installation packages, so that the two installation packages can be mutually compatible after installation and operation to realize cooperative work.
Since an installation package of a given version is often developed against other kinds of installation packages of particular versions and a particular operating system version, running that installation package likewise depends on those other installation packages having the corresponding version numbers and on an operating system of the corresponding version, so that compatibility is achieved. For example, if a graphics processing version installation package of a certain version of a deep learning platform is to be installed on a device, then in order to use the deep learning platform properly, a general parallel computing architecture installation package, a deep neural network library installation package and so on of the versions corresponding to that deep learning installation package also need to be installed.
In one embodiment, the installation package of the deep learning platform is an Anaconda package with an embedded tensorflow-gpu package and/or tensorflow-cpu package, the general parallel computing architecture installation package corresponding to the graphics processing version installation package of the deep learning platform is the CUDA package corresponding to the tensorflow-gpu package, the deep neural network library installation package corresponding to the graphics processing version installation package of the deep learning platform is the cuDNN package corresponding to the tensorflow-gpu package, the installation package of the compiler suite is a GCC package, and the system core library installation package is a GLIBC package.
For example, when the installed tensorflow-gpu package has version number 1.11.0, the cuDNN package corresponding to it should have version number 7.0, and the CUDA package corresponding to it should have version number 9.0.
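Purely as an illustrative sketch (only the 1.11.0 / CUDA 9.0 / cuDNN 7.0 correspondence comes from the example above; the lookup-table form itself is an assumption), an installation script could encode such version correspondences as follows:
#!/bin/bash
# Hypothetical correspondence table: tensorflow-gpu version -> "CUDA-version cuDNN-version"
declare -A DEPS=( ["1.11.0"]="9.0 7.0" )
TF_VERSION="1.11.0"
read -r CUDA_VERSION CUDNN_VERSION <<< "${DEPS[$TF_VERSION]}"
echo "tensorflow-gpu ${TF_VERSION} requires CUDA ${CUDA_VERSION} and cuDNN ${CUDNN_VERSION}"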
CUDA (Compute Unified Device Architecture) is a computing platform proposed by the graphics card manufacturer NVIDIA.
cuDNN (the NVIDIA CUDA Deep Neural Network library) is a GPU-accelerated library of deep neural network primitives.
GLIBC is the GNU C Library. The GNU C Library project provides the core libraries for the GNU system and for GNU/Linux systems, as well as for many other systems that use Linux as the kernel. These libraries provide key APIs (Application Programming Interfaces) including ISO C11, POSIX.1-2008, BSD and OS-specific APIs, covering foundational facilities such as open, read, write, malloc, printf, getaddrinfo, dlopen, pthread_create, crypt, login and exit.
GNU is a recursive acronym for "GNU is Not Unix". GNU is a free operating system whose component software is released entirely under the GPL.
GCC is the GNU Compiler Collection.
Step 240': issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads an installation program of the deep learning platform according to the download instruction, and then, by running the installation program of the deep learning platform, downloads and installs from the storage node either: an installation package of the deep learning platform matched with the target device, the general parallel computing architecture installation package and deep neural network library installation package corresponding to the graphics processing version installation package of the deep learning platform, and the installation package of the compiler suite and system core library installation package matched with the installation package of the deep learning platform; or
an installation package of the deep learning platform matched with the target device, and the installation package of the compiler suite and system core library installation package matched with the installation package of the deep learning platform.
In this embodiment, by running the installation program of the deep learning platform, the target device can download and install, for its particular device type, different numbers and kinds of installation packages matched with it; for example, for the graphics processing version installation package of the deep learning platform, the general parallel computing architecture installation package and the deep neural network library installation package are also downloaded and installed accordingly. This ensures that the installation packages downloaded by each type of target device for the deep learning platform to be deployed match that device, so that deployment of the deep learning platform on the target device can be completed automatically, which improves the deployment efficiency of the deep learning platform, in particular cluster deployment of the deep learning platform, and reduces the deployment cost.
In one embodiment, the setting storage node includes:
setting a plurality of storage nodes;
the uploading the installation package of the at least one deep learning platform, the deep learning calculation installation package corresponding to the installation package of the at least one deep learning platform and the system installation package on which the deep learning platform depends to run on a target system to the storage node for storage comprises the following steps:
respectively uploading an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends to run on a target system to the plurality of storage nodes for storage;
the downloading instruction is issued to the target device to be deployed with the deep learning platform, so that the target device downloads the installation program of the deep learning platform according to the instruction of the downloading instruction, and then downloads and installs the installation package of the deep learning platform matched with the target device, the system installation package and/or the deep learning computing installation package depending on the deep learning platform from the storage node by running the installation program of the deep learning platform, and the method comprises the following steps:
issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads an installation program of the deep learning platform according to the download instruction, and then downloads and installs, from the storage node closest to the target device, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package, by running the installation program of the deep learning platform.
In this embodiment, by setting a plurality of storage nodes and having the target device download the installation package of the deep learning platform and the installation packages it depends on from the storage node closest to it when the platform is to be deployed, the download rate of the resources needed for deployment is improved to a certain extent, the download delay is reduced, and the deployment efficiency of the deep learning platform is improved. In addition, since the installation package of the deep learning platform, the deep learning computing installation package and the system installation package are uploaded to a plurality of storage nodes respectively, the load on any single storage node is reduced.
In one embodiment, the uploading the installation program of the deep learning platform to the storage node for storage includes:
uploading the installation script of the deep learning platform to the storage node for storage;
the downloading instruction is issued to the target device to be deployed with the deep learning platform, so that the target device downloads the installation program of the deep learning platform according to the instruction of the downloading instruction, and then downloads and installs the installation package of the deep learning platform matched with the target device, the system installation package and/or the deep learning computing installation package depending on the deep learning platform from the storage node by running the installation program of the deep learning platform, and the method comprises the following steps:
issuing a download instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads an installation script of the deep learning platform according to the download instruction, and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package, by running the installation script of the deep learning platform.
For example, the target system may be a Linux-based RedHat operating system, the deep learning platform is TensorFlow, and the installation script is named install-tensorflow.sh; the command used to run the script is then sh install-tensorflow.sh.
In one embodiment, the deep learning computing installation package includes a general parallel computing architecture installation package and a deep neural network library installation package, the system installation package on which the deep learning platform depends running on a target system includes an installation package of a compiler suite and a system core library installation package, the installation package of the at least one deep learning platform includes an installation package of a graphics processing version of the at least one deep learning platform, a correspondence between the installation packages is a correspondence of a version number of the installation package, and the deep learning computing installation package corresponds to the installation package of the graphics processing version of the deep learning platform, when an installation script of the deep learning platform is executed by the target device, the steps implemented may be as shown in fig. 4.
FIG. 4 is a flowchart showing steps performed by a target device in running an installation script of a deep learning platform, according to an example embodiment. As shown in fig. 4, the method comprises the following steps:
Information acquisition step 410: acquiring the system kernel version information of the target device.
The cat /etc/redhat-release command may be used to obtain the system kernel version information.
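A minimal sketch of this information acquisition step, assuming the version number is parsed out of /etc/redhat-release:
# Step 410 sketch: read the release file and extract the version number, e.g. 7.4 or 6.7.
OS_RELEASE=$(cat /etc/redhat-release)
OS_VERSION=$(echo "$OS_RELEASE" | grep -oE '[0-9]+\.[0-9]+' | head -n1)
echo "detected system version: $OS_VERSION"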
Kernel version determination step 420: comparing the system kernel version information with preset version information; if they are consistent, proceeding to the system installation package version determination step, and if they are inconsistent, proceeding to the system installation package upgrade step.
If the predetermined version information is 7.4 and the obtained system kernel version information is 6.7, the two are inconsistent, and the process proceeds to step 430.
System installation package upgrade step 430: downloading and installing, from the storage node, the installation package of the compiler suite and the system core library installation package corresponding to the preset version information, respectively.
The specific process of this step may be as follows: after installation directories for the compiler suite and the system core library are created, the installation directory of the compiler suite is entered and the compiler suite corresponding to the preset version information is downloaded from the storage node into that directory and installed; then the installation directory of the system core library is entered and the system core library corresponding to the preset version information is downloaded from the storage node into that directory and installed. For example, the compiler suite may be gcc and the system core library may be glibc; the gcc corresponding to the preset version information is downloaded from the storage node into its installation directory with the wget command and then installed, and the glibc corresponding to the preset version information is downloaded from the storage node into its installation directory with the wget command and then installed.
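A sketch of this upgrade step, assuming an HTTP storage node and illustrative archive names (the URL, file names and build commands are not specified in the present disclosure):
# Step 430 sketch: fetch and unpack the compiler suite and core library from the storage node.
STORE=http://file-server:8080
mkdir -p /opt/gcc-install /opt/glibc-install
cd /opt/gcc-install   && wget "$STORE/gcc-7.4.tar.gz"    && tar -xzf gcc-7.4.tar.gz
cd /opt/glibc-install && wget "$STORE/glibc-2.17.tar.gz" && tar -xzf glibc-2.17.tar.gz
# ...followed by the usual configure / make / make install for each package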
System installation package version determination step 440: acquiring the version of the compiler suite and the version of the system core library currently installed on the target device to determine whether both are the preset versions; if so, proceeding to the device type determination step, and if not, proceeding to the system installation package upgrade step.
For example, if the compiler suite is gcc and the system core library is glibc, the corresponding commands for obtaining the versions currently installed on the target device may be gcc --version and ldd --version, respectively.
This step performs a secondary check on the version of the compiler suite and the version of the system core library.
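A sketch of this secondary check, with 7.4 and 2.17 as assumed preset version numbers:
# Step 440 sketch: verify the installed compiler suite and system core library versions.
GCC_VER=$(gcc --version | head -n1 | grep -oE '[0-9]+\.[0-9]+' | head -n1)
GLIBC_VER=$(ldd --version | head -n1 | grep -oE '[0-9]+\.[0-9]+' | head -n1)
if [ "$GCC_VER" = "7.4" ] && [ "$GLIBC_VER" = "2.17" ]; then
  echo "compiler suite and core library are at the preset versions"
else
  echo "versions differ from the preset versions: repeat the upgrade step"
fi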
Device type determination step 450: determining whether the target device is a graphics processing device; if so, proceeding to the deep learning computing installation package installation step, and if not, proceeding to the platform installation package installation step.
For example, if a graphics processing (GPU) device is a device equipped with an NVIDIA GPU, the lspci | grep -i nvidia command may be used to determine whether the target device is a GPU device; when the target device is a GPU device, NVIDIA GPU related information is returned for the command.
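A minimal sketch of this device type check:
# Step 450 sketch: treat the device as a GPU device if lspci reports an NVIDIA adapter.
if lspci | grep -i nvidia > /dev/null; then
  echo "GPU device: proceed to the deep learning computing installation package installation step"
else
  echo "CPU device: proceed to the platform installation package installation step"
fi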
Deep learning computing installation package installation step 460: downloading and installing the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node.
Platform installation package installation step 470: acquiring the device type of the target device and the version number of the general parallel computing architecture installed on the target device; if the target device is a graphics processing device, downloading and installing from the storage node the graphics processing version installation package of the deep learning platform corresponding to the version number of the general parallel computing architecture; if the target device is not a graphics processing device, acquiring the version number of the compiler suite and the version number of the system core library installed on the target device and then downloading and installing from the storage node the non-graphics-processing version installation package of the deep learning platform corresponding to those version numbers.
By executing the nvidia-smi | grep CUDA command, the device type of the target device and the version number of the installed general parallel computing architecture can be obtained. If the target device returns related information for the nvidia-smi | grep CUDA command, it is determined to be a GPU device and the returned information contains the CUDA version number; a directory is then created and entered with the cd command, the tensorflow-gpu installation package corresponding to that CUDA version number is downloaded from the storage node with the wget command, and the tensorflow-gpu installation package is installed. If the target device does not return related information for the nvidia-smi | grep CUDA command, it is determined to be a CPU device; the version numbers of the gcc and glibc installed on the target device are acquired, the tensorflow-cpu installation package corresponding to those version numbers is downloaded from the storage node with the wget command, and finally the tensorflow-cpu installation package is installed.
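A sketch of this selection logic, assuming an HTTP storage node and illustrative wheel file names installed with pip (none of these names are specified in the present disclosure):
# Step 470 sketch: choose the platform package according to the device type.
STORE=http://file-server:8080
if INFO=$(nvidia-smi | grep "CUDA"); then
  CUDA_VER=$(echo "$INFO" | grep -oE '[0-9]+\.[0-9]+' | head -n1)
  echo "GPU device with CUDA ${CUDA_VER}: fetching the matching GPU build"
  PKG=tensorflow_gpu-1.11.0-cp36-cp36m-manylinux1_x86_64.whl   # assumed name for the CUDA 9.0 build
else
  echo "CPU device: fetching the CPU build matching the installed gcc/glibc"
  PKG=tensorflow-1.11.0-cp36-cp36m-manylinux1_x86_64.whl       # assumed name for the CPU build
fi
wget "$STORE/$PKG" && pip install "./$PKG"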
In one embodiment, the determining whether the target device is a graphics processing device includes:
after a graphics processing information acquisition instruction is run, determining whether the target device is a graphics processing device based on the graphics processing information returned by the target device for the instruction, wherein the graphics processing information includes the model of the graphics processor of the target device;
the storage node also stores a driver corresponding to each graphics processor model, and the downloading and installing of the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node includes:
downloading and installing a driver corresponding to the model of the graphics processor of the target device from the storage node;
downloading and installing a general parallel computing architecture installation package and a corresponding version of deep neural network library installation package from the storage node;
and writing the configuration environment corresponding to the general parallel computing architecture and the deep neural network library into a preset catalog.
In this embodiment, not only are the general parallel computing architecture and the deep neural network library downloaded and installed, but the configuration of the driver and of the environment is also completed.
The model of the graphics processor of the target device may be various graphics processor models, such as the model of the GPU of nvidia.
The configuration environment can be written to the preset directory in various ways. For example, the permissions of /usr/local/cuda may be modified with the chown command, the configuration environment may then be written to /etc/profile, and source /etc/profile may be executed so that the saved environment configuration takes effect.
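A sketch of this GPU branch, assuming illustrative driver and package file names, installer flags, the default /usr/local/cuda prefix and PATH/LD_LIBRARY_PATH environment variables (all assumptions, not specified in the present disclosure):
# Sketch: install the driver, CUDA and cuDNN, then write the environment configuration.
STORE=http://file-server:8080
wget "$STORE/NVIDIA-driver.run"   && sh NVIDIA-driver.run --silent
wget "$STORE/cuda_9.0.run"        && sh cuda_9.0.run --silent --toolkit
wget "$STORE/cudnn-9.0-v7.0.tgz"  && tar -xzf cudnn-9.0-v7.0.tgz -C /usr/local
chown -R root:root /usr/local/cuda           # adjust permissions of the CUDA directory
cat >> /etc/profile << 'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF
source /etc/profile                          # make the saved configuration take effect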
In one embodiment, the preset version information is a version number, the name of each installation package stored in the storage node includes the version number of that installation package, and installation packages that correspond to one another are stored in the storage node in association with one another.
In one embodiment, the preset version information is a version number, the name of each installation package stored in the storage node includes the version number of the installation package, the storage node further stores a table of correspondence between the version numbers of the installation packages, and by querying the table, another installation package corresponding to one installation package can be determined.
In one embodiment, test code is also uploaded to the storage node for storage, and when the installation script of the deep learning platform is executed by the target device, the following step is further implemented after the platform installation package installation step:
Testing step: downloading and running the test code from the storage node to verify whether the deep learning platform has been deployed correctly.
For example, if the test code is test.py and the deep learning platform is a TensorFlow managed by Anaconda, the corresponding path under the Anaconda installation directory can be entered first, test.py downloaded from the storage node with the wget command, and then test.py run under that path for verification. For example, it may be verified that the deep learning platform has been built correctly by running /appcom/anaconda3/bin/python test.py.
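A minimal sketch of this testing step, reusing the assumed storage node address and the Anaconda path from the example above:
# Testing step sketch: fetch the test code and run it with the platform's Python.
cd /appcom/anaconda3
wget http://file-server:8080/test.py
/appcom/anaconda3/bin/python test.py && echo "deep learning platform deployed correctly"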
The advantage of this embodiment is that, after the deep learning platform and the various installation packages required for its operation have been set up, the test code is additionally run as a check, ensuring that the deployed deep learning platform and its environment are reliable.
In one embodiment, the test code may be:
import tensorflow as tf
# Creating a session with log_device_placement=True logs which device (CPU/GPU) each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
the disclosure also provides a deployment device of the deep learning platform cluster, and the following is an embodiment of the device of the disclosure.
Fig. 5 is a block diagram illustrating a deployment apparatus of a deep learning platform cluster, according to an example embodiment. As shown in fig. 5, the apparatus 500 includes:
A setting module 510 configured to set a storage node;
a first uploading module 520 configured to upload, to the storage node, an installation package of at least one deep learning platform, a deep learning computation installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends to run on a target system, where all the installation packages of the deep learning platform are installation packages of the same deep learning platform;
a second uploading module 530 configured to upload an installer of the deep learning platform to the storage node for storage;
the instruction issuing module 540 is configured to issue a downloading instruction to a target device to which the deep learning platform is to be deployed, so that the target device downloads an installation program of the deep learning platform according to the instruction of the downloading instruction, and then downloads and installs an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends and/or a deep learning computing installation package from the storage node by running the installation program of the deep learning platform.
According to a third aspect of the present disclosure, there is also provided an electronic device capable of implementing the above method.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module" or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is in the form of a general purpose computing device. Components of electronic device 600 may include, but are not limited to: the at least one processing unit 610, the at least one memory unit 620, and a bus 630 that connects the various system components, including the memory unit 620 and the processing unit 610.
Wherein the storage unit stores program code that is executable by the processing unit 610 such that the processing unit 610 performs steps according to various exemplary embodiments of the present invention described in the above-described "example methods" section of the present specification.
The storage unit 620 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 621 and/or cache memory 622, and may further include Read Only Memory (ROM) 623.
The storage unit 620 may also include a program/utility 624 having a set (at least one) of program modules 625, such program modules 625 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may be a local bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 800 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. As shown, network adapter 660 communicates with other modules of electronic device 600 over bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
According to a fourth aspect of the present disclosure, there is also provided a computer readable storage medium having stored thereon a program product capable of implementing the method described herein above. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
Referring to fig. 7, a program product 700 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of the processes included in the method according to the exemplary embodiments of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. It is also readily understood that these processes may be performed synchronously or asynchronously, for example, in a plurality of modules.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. A method for deploying a deep learning platform cluster, the method comprising:
setting a storage node;
uploading an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends for running on a target system to the storage node for storage, wherein the installation packages of all the deep learning platforms are installation packages of the same deep learning platform;
uploading an installation program of the deep learning platform to the storage node for storage;
issuing a downloading instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends, and/or a deep learning computing installation package by running the installation program of the deep learning platform;
wherein the deep learning computing installation package comprises a general parallel computing architecture installation package and a deep neural network library installation package, the system installation package on which the deep learning platform depends for running on the target system comprises a compiler suite installation package and a system core library installation package, the installation package of the at least one deep learning platform comprises at least one graphics processing version installation package of the deep learning platform, the correspondence between installation packages is a correspondence of their version numbers, and the deep learning computing installation package corresponds to the graphics processing version installation package of the deep learning platform;
the uploading of the installation program of the deep learning platform to the storage node for storage comprises: uploading an installation script of the deep learning platform to the storage node for storage;
the issuing of the downloading instruction to the target device on which the deep learning platform is to be deployed comprises: issuing a downloading instruction to the target device on which the deep learning platform is to be deployed, so that the target device downloads the installation script of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends, and/or a deep learning computing installation package by running the installation script of the deep learning platform;
when the installation script of the deep learning platform is executed by the target device, the following steps are implemented:
an information acquisition step: acquiring system kernel version information of the target device;
a kernel version judging step: comparing the system kernel version information with preset version information; if they are consistent, proceeding to the system installation package version judging step, and if they are inconsistent, proceeding to the system installation package upgrading step;
a system installation package upgrading step: downloading and installing, from the storage node, the compiler suite installation package and the system core library installation package corresponding to the preset version information, respectively;
a system installation package version judging step: acquiring the version of the compiler suite and the version of the system core library currently installed on the target device to determine whether both are the preset versions; if so, proceeding to the device type judging step, and if not, proceeding to the system installation package upgrading step;
a device type judging step: judging whether the target device is a graphics processing device; if so, proceeding to the deep learning computing installation package installation step, and if not, proceeding to the platform installation package installation step;
a deep learning computing installation package installation step: downloading and installing the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node;
a platform installation package installation step: acquiring the device type of the target device and the version number of the general parallel computing architecture installed on the target device; if the target device is a graphics processing device, downloading and installing, from the storage node, the graphics processing version installation package of the deep learning platform corresponding to the version number of the general parallel computing architecture; and if the target device is not a graphics processing device, after acquiring the version number of the compiler suite and the version number of the system core library installed on the target device, downloading and installing, from the storage node, the non-graphics processing version installation package of the deep learning platform corresponding to the version number of the compiler suite and the version number of the system core library.
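By way of a non-limiting illustration (not part of the claims), the control flow of the installation script recited in claim 1 can be sketched as follows. Everything concrete in the sketch is an assumption chosen for readability: the storage node address, the preset version strings, the package names, and the probe commands (platform.release, gcc, ldd, nvidia-smi) merely stand in for the recited kernel check, compiler suite, system core library, general parallel computing architecture, deep neural network library, and graphics processing device check.

```python
# Illustrative sketch only; not the claimed implementation. Storage node URL, preset
# versions, package names and probe commands are assumptions for illustration.
import platform
import subprocess

STORAGE_NODE = "http://storage-node.example/packages"   # hypothetical storage node
PRESET_KERNEL = "3.10.0-1062"                            # hypothetical preset kernel version
PRESET_COMPILER = "7.3.0"                                # hypothetical compiler suite version
PRESET_CORE_LIB = "2.27"                                 # hypothetical system core library version


def download_and_install(package):
    # Stand-in for "download from the storage node and install".
    print(f"installing {STORAGE_NODE}/{package}")


def run(cmd):
    # Run a probe command and return its stdout, or "" if the command is missing.
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
    except FileNotFoundError:
        return ""


def installed_compiler_and_corelib():
    compiler = run(["gcc", "-dumpversion"])                    # compiler suite version
    ldd_lines = run(["ldd", "--version"]).splitlines()
    corelib = ldd_lines[0].split()[-1] if ldd_lines else ""    # system core library version
    return compiler, corelib


def is_graphics_device():
    # Device type judging step: any graphics-processing probe would do here.
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False


def deploy():
    # Information acquisition + kernel version judging step.
    if platform.release() != PRESET_KERNEL:
        # System installation package upgrading step.
        download_and_install(f"compiler-suite-{PRESET_COMPILER}")
        download_and_install(f"system-core-lib-{PRESET_CORE_LIB}")

    # System installation package version judging step.
    if installed_compiler_and_corelib() != (PRESET_COMPILER, PRESET_CORE_LIB):
        download_and_install(f"compiler-suite-{PRESET_COMPILER}")
        download_and_install(f"system-core-lib-{PRESET_CORE_LIB}")

    if is_graphics_device():
        # Deep learning computing installation package installation step.
        download_and_install("parallel-computing-arch-10.1")    # hypothetical version
        download_and_install("deep-neural-network-lib-7.6")     # version matched to the architecture
        # Platform installation package installation step, graphics processing version.
        download_and_install("dl-platform-gpu-for-arch-10.1")
    else:
        # Platform installation package installation step, non-graphics version.
        compiler, corelib = installed_compiler_and_corelib()
        download_and_install(f"dl-platform-cpu-gcc{compiler}-corelib{corelib}")


if __name__ == "__main__":
    deploy()
```

A real installation script would loop from the system installation package version judging step back to the upgrading step until both versions match the preset versions, as the claim recites; the sketch performs a single re-check to keep the flow readable.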
2. The method of claim 1, wherein the judging whether the target device is a graphics processing device comprises:
after a graphics processing information acquisition instruction is run, judging whether the target device is a graphics processing device based on the graphics processing information returned by the target device in response to the instruction, wherein the graphics processing information comprises the model of the graphics processor of the target device;
the storage node further stores a driver corresponding to each graphics processor model, and the downloading and installing of the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node comprises:
downloading and installing a driver corresponding to the model of the graphics processor of the target device from the storage node;
downloading and installing the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node;
and writing the environment configuration corresponding to the general parallel computing architecture and the deep neural network library into a preset directory.
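The flow added by claim 2 (query the graphics processor model, install a driver matched to that model, install the general parallel computing architecture and the deep neural network library, and write their environment configuration into a preset directory) can likewise be sketched. The driver table, package names, file paths and the use of nvidia-smi below are assumptions for illustration, not part of the claim.

```python
# Illustrative sketch of claim 2; all names are assumptions.
import subprocess

STORAGE_NODE = "http://storage-node.example/packages"     # hypothetical
DRIVER_BY_MODEL = {                                        # hypothetical GPU model -> driver map
    "Tesla V100": "gpu-driver-440.33",
    "Tesla P40": "gpu-driver-418.87",
}
ENV_CONFIG_PATH = "/etc/profile.d/dl_platform.sh"          # hypothetical preset directory (needs root)


def install(package):
    # Stand-in for "download from the storage node and install".
    print(f"installing {STORAGE_NODE}/{package}")


def gpu_model():
    # Graphics processing information acquisition: returns the graphics processor model,
    # or None when the target device is not a graphics processing device.
    try:
        probe = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return None
    lines = probe.stdout.splitlines()
    return lines[0].strip() if probe.returncode == 0 and lines else None


model = gpu_model()
if model is not None:
    install(DRIVER_BY_MODEL.get(model, "gpu-driver-generic"))   # driver matched to the GPU model
    install("parallel-computing-arch-10.1")                     # general parallel computing architecture
    install("deep-neural-network-lib-7.6")                      # deep neural network library (matching version)
    # Write the environment configuration for both libraries into the preset directory.
    with open(ENV_CONFIG_PATH, "w") as f:
        f.write("export PATH=/usr/local/parallel-arch/bin:$PATH\n")
        f.write("export LD_LIBRARY_PATH=/usr/local/parallel-arch/lib64:$LD_LIBRARY_PATH\n")
```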
3. The method of claim 1, wherein the content uploaded to the storage node for storage further comprises test code, and wherein, when executed by the target device, the installation script of the deep learning platform further implements, after the platform installation package installation step, the following step:
a testing step: downloading and running the test code from the storage node to verify whether the deep learning platform has been deployed correctly.
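The testing step of claim 3 amounts to fetching test code from the storage node, running it on the target device, and checking the result. A minimal sketch follows, with an assumed URL and file name.

```python
# Illustrative sketch of the testing step; URL and file name are assumptions.
import subprocess
import urllib.request

TEST_CODE_URL = "http://storage-node.example/packages/verify_platform.py"   # hypothetical

urllib.request.urlretrieve(TEST_CODE_URL, "verify_platform.py")   # download the test code
result = subprocess.run(["python3", "verify_platform.py"])        # run it on the target device
print("deployment verified" if result.returncode == 0 else "deployment check failed")
```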
4. The method of claim 1, wherein the setting of a storage node comprises:
setting a plurality of storage nodes;
the uploading of the installation package of the at least one deep learning platform, the deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and the system installation package on which the deep learning platform depends for running on a target system to the storage node for storage comprises:
uploading the installation package of the at least one deep learning platform, the deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and the system installation package on which the deep learning platform depends for running on the target system to each of the plurality of storage nodes for storage;
the issuing of the downloading instruction to the target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node, the installation package of the deep learning platform matched with the target device, the system installation package on which the deep learning platform depends, and/or the deep learning computing installation package by running the installation program of the deep learning platform, further comprises:
issuing a downloading instruction to the target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node closest to the target device, the installation package of the deep learning platform matched with the target device, the system installation package on which the deep learning platform depends, and/or the deep learning computing installation package by running the installation program of the deep learning platform.
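Claim 4 leaves open how the storage node "closest" to the target device is chosen. One plausible reading, sketched below under the assumption of a TCP-reachable node list, is to probe each node and pick the one with the lowest connection latency.

```python
# Illustrative sketch of choosing the "closest" storage node; the node list is an assumption.
import socket
import time

STORAGE_NODES = ["node-a.example:80", "node-b.example:80", "node-c.example:80"]   # hypothetical


def connect_latency(node, timeout=1.0):
    # Approximate "closeness" by TCP connect time; unreachable nodes are never chosen.
    host, port = node.split(":")
    start = time.monotonic()
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")


nearest = min(STORAGE_NODES, key=connect_latency)
print(f"downloading installation packages from {nearest}")
```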
5. A deployment apparatus for a deep learning platform cluster, the apparatus comprising:
a setting module configured to set a storage node;
a first uploading module configured to upload an installation package of at least one deep learning platform, a deep learning computing installation package corresponding to the installation package of the at least one deep learning platform, and a system installation package on which the deep learning platform depends for running on a target system to the storage node for storage, wherein the installation packages of all the deep learning platforms are installation packages of the same deep learning platform;
a second uploading module configured to upload the installation program of the deep learning platform to the storage node for storage;
an instruction issuing module configured to issue a downloading instruction to a target device on which the deep learning platform is to be deployed, so that the target device downloads the installation program of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends, and/or a deep learning computing installation package by running the installation program of the deep learning platform;
wherein the deep learning computing installation package comprises a general parallel computing architecture installation package and a deep neural network library installation package, the system installation package on which the deep learning platform depends for running on the target system comprises a compiler suite installation package and a system core library installation package, the installation package of the at least one deep learning platform comprises at least one graphics processing version installation package of the deep learning platform, the correspondence between installation packages is a correspondence of their version numbers, and the deep learning computing installation package corresponds to the graphics processing version installation package of the deep learning platform;
the uploading of the installation program of the deep learning platform to the storage node for storage comprises: uploading an installation script of the deep learning platform to the storage node for storage;
the issuing of the downloading instruction to the target device on which the deep learning platform is to be deployed comprises: issuing a downloading instruction to the target device on which the deep learning platform is to be deployed, so that the target device downloads the installation script of the deep learning platform as instructed by the downloading instruction and then downloads and installs, from the storage node, an installation package of the deep learning platform matched with the target device, a system installation package on which the deep learning platform depends, and/or a deep learning computing installation package by running the installation script of the deep learning platform;
when the installation script of the deep learning platform is executed by the target device, the following steps are implemented:
an information acquisition step: acquiring system kernel version information of the target device;
a kernel version judging step: comparing the system kernel version information with preset version information; if they are consistent, proceeding to the system installation package version judging step, and if they are inconsistent, proceeding to the system installation package upgrading step;
a system installation package upgrading step: downloading and installing, from the storage node, the compiler suite installation package and the system core library installation package corresponding to the preset version information, respectively;
a system installation package version judging step: acquiring the version of the compiler suite and the version of the system core library currently installed on the target device to determine whether both are the preset versions; if so, proceeding to the device type judging step, and if not, proceeding to the system installation package upgrading step;
a device type judging step: judging whether the target device is a graphics processing device; if so, proceeding to the deep learning computing installation package installation step, and if not, proceeding to the platform installation package installation step;
a deep learning computing installation package installation step: downloading and installing the general parallel computing architecture installation package and the deep neural network library installation package of the corresponding version from the storage node;
a platform installation package installation step: acquiring the device type of the target device and the version number of the general parallel computing architecture installed on the target device; if the target device is a graphics processing device, downloading and installing, from the storage node, the graphics processing version installation package of the deep learning platform corresponding to the version number of the general parallel computing architecture; and if the target device is not a graphics processing device, after acquiring the version number of the compiler suite and the version number of the system core library installed on the target device, downloading and installing, from the storage node, the non-graphics processing version installation package of the deep learning platform corresponding to the version number of the compiler suite and the version number of the system core library.
6. A computer readable program medium, characterized in that it stores computer program instructions, which when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 4.
7. An electronic device, the electronic device comprising:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 4.
CN202010136850.XA 2020-03-02 2020-03-02 Deep learning platform cluster deployment method and device, medium and electronic equipment Active CN111459506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136850.XA CN111459506B (en) 2020-03-02 2020-03-02 Deep learning platform cluster deployment method and device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010136850.XA CN111459506B (en) 2020-03-02 2020-03-02 Deep learning platform cluster deployment method and device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111459506A CN111459506A (en) 2020-07-28
CN111459506B true CN111459506B (en) 2023-10-13

Family

ID=71682449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136850.XA Active CN111459506B (en) 2020-03-02 2020-03-02 Deep learning platform cluster deployment method and device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111459506B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084391B (en) * 2020-09-08 2024-02-09 中国平安人寿保险股份有限公司 Method, device, equipment and computer medium for acquiring dependent package information
CN112446490A (en) * 2020-11-27 2021-03-05 苏州浪潮智能科技有限公司 Network training data set caching method, device, equipment and storage medium


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729019A (en) * 2017-10-09 2018-02-23 平安普惠企业管理有限公司 Method, apparatus, equipment and the computer-readable storage medium of version deployment
CN108762768A (en) * 2018-05-17 2018-11-06 烽火通信科技股份有限公司 Network Intelligent Service dispositions method and system
CN108845816A (en) * 2018-06-22 2018-11-20 平安科技(深圳)有限公司 Application program update method, system, computer equipment and storage medium
CN109543829A (en) * 2018-10-15 2019-03-29 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Method and system for hybrid deployment of deep learning neural network on terminal and cloud
CN109582315A (en) * 2018-10-26 2019-04-05 北京百度网讯科技有限公司 Service privatization method, apparatus, computer equipment and storage medium
CN109460827A (en) * 2018-11-01 2019-03-12 郑州云海信息技术有限公司 A kind of deep learning environment is built and optimization method and system
CN109491669A (en) * 2018-12-29 2019-03-19 北京奇安信科技有限公司 Deployment installation method, equipment, system and the medium of data
CN110377314A (en) * 2019-07-19 2019-10-25 苏州浪潮智能科技有限公司 A kind of method for upgrading system of distributed memory system, device, equipment and medium
CN110750282A (en) * 2019-10-14 2020-02-04 支付宝(杭州)信息技术有限公司 Method and device for running application program and GPU node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiong Wei. Deep learning deployment and computation optimization technology for mobile devices. 电子制作 (Electronic Production), 2017, No. 12, full text. *

Also Published As

Publication number Publication date
CN111459506A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
US11144439B2 (en) Emulation-based testing of a microservices architecture
US8621419B2 (en) Automating the life cycle of a distributed computing application
US7886292B2 (en) Methodology of individualized software deployment for hardware-independent personal computer mass development
US20140237463A1 (en) Dynamically generate and execute a context-specific patch installation procedure on a computing system
US20200319879A1 (en) Development project blueprint and package generation
CN110647332A (en) Software deployment method and device based on container cloud
CN111459506B (en) Deep learning platform cluster deployment method and device, medium and electronic equipment
CN111740948B (en) Data packet issuing method, dynamic updating method, device, equipment and medium
US20210075706A1 (en) Automated integrated test system and method thereof
US11061739B2 (en) Dynamic infrastructure management and processing
CN112631915B (en) Method, system, device and medium for PCIE device software simulation
CN114064113A (en) Host version control method and device
CN113709243A (en) Equipment remote control method and device, electronic equipment and storage medium
US20180165136A1 (en) A system, method, computer program and data signal for hosting and executing a program on a mainframe
US10176062B2 (en) Cloud servers and methods for handling dysfunctional cloud services
US11593098B2 (en) Synchronization of source code under development in multiple concurrent instances of an integrated development environment
CN113986263A (en) Code automation test method, device, electronic equipment and storage medium
CN111209018B (en) Method and device for processing application upgrading prompt information and electronic equipment
CN113961232A (en) Terminal, method and platform server for providing integrated development environment
CN112559006A (en) Enterprise client automatic upgrading method, system, equipment and storage medium
CN116991355B (en) Method, system and device for supporting LED driving chip by modifying and iterating script
CN109960522A (en) A kind of method for upgrading software and device
CN114253615B (en) Method and device for setting bootstrap program, electronic equipment and storage medium
CN116820445A (en) Front-end project deployment method and device based on VS Code
Tinetti Yet Another Example of ESPx Over the Air (OTA) Firmware Upload and Related Details

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant