CN112311605B - Cloud platform and method for providing machine learning service - Google Patents

Cloud platform and method for providing machine learning service

Info

Publication number
CN112311605B
Authority
CN
China
Prior art keywords
environment
request
machine learning
training
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011226841.6A
Other languages
Chinese (zh)
Other versions
CN112311605A (en)
Inventor
马震
王志洋
马慧荣
黄严
张德兵
邓亚峰
赵勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gelingshentong Information Technology Co ltd
Original Assignee
Beijing Gelingshentong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gelingshentong Information Technology Co ltd filed Critical Beijing Gelingshentong Information Technology Co ltd
Priority to CN202011226841.6A priority Critical patent/CN112311605B/en
Publication of CN112311605A publication Critical patent/CN112311605A/en
Application granted granted Critical
Publication of CN112311605B publication Critical patent/CN112311605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04Network management architectures or arrangements
    • H04L41/044Network management architectures or arrangements comprising hierarchical management structures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/0803Configuration setting
    • H04L41/0823Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A cloud platform and method for providing machine learning services, comprising an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management and maintenance module. The public library service module is used for logging, configuration parameters and mathematical calculations; the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-classified tasks; the application service module is used for displaying task running states and machine learning results; and the management and maintenance module is used for managing image resources with Harbor. With the scheme of the present application, environments can be migrated simply, tracking experiments and deploying machine learning become very easy, and experimental results can be reproduced.

Description

Cloud platform and method for providing machine learning service
Technical Field
The present application relates to cloud computing technology, and in particular, to a cloud platform and method for providing machine learning services.
Background
Machine learning is a very popular technology, with applications in fields such as security, transportation, medicine, finance and retail.
Although machine learning can produce excellent results, it is still complex to use in practice. In addition to the common challenges of software development, machine learning developers face new challenges, including experiment management (e.g., tracking which parameters, code and data produced a given result), reproducibility (e.g., ensuring the same code can be executed again later in the same operating environment), deployment of models to production environments, and data governance (auditing the models and data used throughout the organization). These workflow-related challenges around the machine learning lifecycle are typically the biggest hurdle to using machine learning in a production environment and scaling it inside an organization.
Currently, some cloud platforms support online training of algorithms, and these cloud platforms often provide the following functionality: the user selects resources such as GPUs and storage, along with the corresponding environments, by clicking, and then writes their own code for training. Machine learning with these cloud platforms still has the following problems:
1. There are numerous mutually independent tools. From data preparation to model training, hundreds of software tools cover the stages of the machine learning lifecycle, and machine learning developers need to deploy a production environment built around tens of libraries;
2. Experimental results are difficult to reproduce. Training a model requires large amounts of data and a purpose-built environment; when the model is used in the user's actual production environment, the environment must be redeployed and large amounts of data acquired again before nearly identical experimental results can possibly be obtained;
3. Tracking experiments and deploying machine learning is difficult. Machine learning algorithms have tens of configurable parameters, tracking these parameters and their values is very difficult, and migrating trained models to a production environment is very challenging.
Problems in the prior art:
At present, there is no cloud platform dedicated to machine learning, so users must redo a great deal of complex work, such as data preparation and environment deployment, when using a model in different environments.
Disclosure of Invention
The embodiments of the present application provide a cloud platform and method for providing machine learning services, so as to address the above technical problems.
According to a first aspect of the present application, there is provided a cloud platform for providing machine learning services, comprising: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management maintenance module,
the public library service module is used for logging, configuration parameters and mathematical calculations;
the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-classified tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources with Harbor.
According to a second aspect of the present application, there is provided a method for machine learning using a cloud platform for providing machine learning services as described above, comprising:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
and saving the environment and the data as separate images using Harbor, according to a received environment save request and data save request.
With the cloud platform provided by the embodiments of the present application, after a training run finishes, the environment and the data can each be saved as an image resource. Machine learning developers do not need to redeploy the environment in subsequent production environments, the trained model can be migrated to the production environment simply, tracking experiments and deploying machine learning become very easy, and experimental results can be reproduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate and explain the application and together with the description serve to explain the application and do not constitute an undue limitation. In the drawings:
fig. 1 shows a schematic structural diagram of a cloud platform for providing a machine learning service in an example one of the present application;
FIG. 2 is a flow chart of a method of providing machine learning services in example two of the present application;
fig. 3 shows an architecture schematic of a machine learning cloud platform in example four of the present application.
Detailed Description
To make the technical solutions and advantages of the examples of the present application clearer, exemplary examples of the present application are described in detail below in conjunction with the accompanying drawings. Obviously, the described examples are only some examples of the present application, not an exhaustive list of all examples. It should be noted that, where no conflict arises, the examples and the features within them may be combined with each other.
Example one
Fig. 1 shows a schematic structural diagram of a cloud platform for providing a machine learning service in an example of the present application.
As shown, the cloud platform for providing machine learning service includes: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management maintenance module,
the public library service module is used for logging, configuration parameters and mathematical calculations;
the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-classified tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources with Harbor.
In a specific implementation, the WEB-classified tasks may include machine learning training tasks, environment and data imaging tasks, and the like.
With the cloud platform provided by the embodiments of the present application, after a training run finishes, the environment and the data can each be saved as an image resource. Machine learning developers do not need to redeploy the environment in subsequent production environments, the trained model can be migrated to the production environment simply, tracking experiments and deploying machine learning become very easy, and experimental results can be reproduced.
In one embodiment, the RESTful microservice module comprises:
the training data storage unit is used for receiving a training data storage request and storing training data in the training data storage request;
and the training data extraction unit is used for receiving a training data extraction request and, according to that request, retrieving and returning the requested training data.
In one embodiment, the training data storage unit is configured to receive a training data storage request and store training data in the training data storage request to a cloud data center.
In one embodiment, the RESTful microservice module comprises:
the environment preservation unit is used for receiving an environment preservation request and saving the environment image carried in that request;
and the environment extraction unit is used for receiving an environment extraction request and, according to that request, retrieving and returning the requested environment.
In one embodiment, the environment preservation unit is configured to receive an environment preservation request and save the environment image in that request to a local training center.
In one embodiment, the management maintenance module includes:
the image building unit is used for building Docker images; in particular, this may refer to packaging the training environments or code required for machine learning into a Docker image.
The publishing unit is used for publishing Docker images. Specifically, the platform uniformly manages all built images, and users can select the image files they need, so that they can quickly deploy their own machine learning training environments.
The monitoring unit is used for monitoring Kubernetes, Docker and micro-service resources. Specifically, the monitoring unit may monitor these resources in real time to ensure efficient operation of the entire cluster. If a machine learning training environment crashes, the embodiment of the application learns of the crash through the monitoring unit and quickly recovers the training environment, ensuring that the relevant data are not lost and that the service remains highly available.
And the orchestration unit is used for orchestrating resources. In practice, machine learning training requires large amounts of computing resources, including CPU, GPU, memory, disk, network and so on. Orchestrating these resources allocates them across the cluster so that all users can make better use of the cluster's elastic computing resources.
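The resource orchestration described above can be illustrated with a small helper that turns a user's resource selection into a Kubernetes-style container resource specification. This is a minimal sketch, not the platform's actual code; the field layout follows the Kubernetes `resources.requests`/`resources.limits` convention, and `nvidia.com/gpu` is the resource name commonly exposed by the NVIDIA device plugin.

```python
def build_resource_spec(cpu: str, memory: str, gpus: int) -> dict:
    """Build a Kubernetes-style container resource spec from a user's selection."""
    requests = {"cpu": cpu, "memory": memory}
    limits = dict(requests)
    if gpus > 0:
        # GPUs are requested via the device-plugin resource name and
        # appear under "limits" (requests are implied equal for GPUs).
        limits["nvidia.com/gpu"] = str(gpus)
    return {"requests": requests, "limits": limits}


spec = build_resource_spec(cpu="4", memory="16Gi", gpus=2)
# e.g. {'requests': {'cpu': '4', 'memory': '16Gi'},
#       'limits': {'cpu': '4', 'memory': '16Gi', 'nvidia.com/gpu': '2'}}
```

A scheduler (Kubernetes, in this platform) can then match such a spec against the free capacity of each host in the cluster.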
Example two
Based on the same inventive concept, the present application example provides a method for machine learning by using the cloud platform for providing machine learning services as described in example one.
Fig. 2 shows a flow chart of a method for providing machine learning services in example two of the present application.
As shown, the method for providing machine learning service includes:
step 201, acquiring a training data set;
In one embodiment, the training data set is stored on a disk of the IaaS layer and managed through the MinIO service at the SaaS layer; the public library service module is invoked to operate MinIO and obtain the corresponding training data set.
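As an illustration of how training data might be addressed in object storage, the sketch below composes a per-user object prefix and notes the corresponding MinIO SDK calls in comments. The bucket name, prefix scheme and endpoint are all hypothetical assumptions, not taken from the patent.

```python
def dataset_object_prefix(user_id: str, dataset_id: str) -> str:
    """Compose the object-storage prefix under which a user's training
    data set is kept. The naming scheme here is illustrative only."""
    return f"datasets/{user_id}/{dataset_id}/"


# With the hypothetical prefix above, fetching a data set through the
# MinIO Python SDK would look roughly like:
#
#   from minio import Minio
#   client = Minio("minio.example.internal", access_key="...", secret_key="...")
#   prefix = dataset_object_prefix("user-a", "faces-v1")
#   for obj in client.list_objects("training-data", prefix=prefix, recursive=True):
#       client.fget_object("training-data", obj.object_name, "/data/" + obj.object_name)
```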
Step 202, loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
In one embodiment, resources such as GPU, CPU and storage are allocated at the IaaS layer, the image is then loaded onto the IaaS resources through the Kubernetes public service module, and the user can then view and use the training environment for their own algorithm through the RESTful service.
Step 203, saving the environment and the data as separate images using Harbor, according to the received environment save request and data save request.
In one embodiment, when the RESTful service receives a user's request to save a training environment, it invokes the Docker public service module to package the image; after packaging is completed, the Harbor basic service is invoked to upload the image to the master node of the Harbor service.
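The save flow just described — package the environment image, then upload it to Harbor — can be sketched as follows. Harbor addresses images as `<registry>/<project>/<repository>:<tag>`; the registry host and project names below are hypothetical.

```python
def harbor_image_ref(registry: str, project: str, name: str, tag: str) -> str:
    """Compose a fully qualified image reference for a Harbor registry.
    Harbor organizes repositories under projects: <registry>/<project>/<repo>:<tag>."""
    return f"{registry}/{project}/{name}:{tag}"


ref = harbor_image_ref("harbor.example.internal", "ml-envs", "face-recognition", "v1")
# In Docker CLI terms, the save flow then amounts to:
#   docker commit <container> <ref>   # snapshot the running training environment
#   docker push <ref>                 # upload the image to the Harbor master node
```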
With the cloud platform provided by the embodiments of the present application, after a training run finishes, the environment and the data can each be saved. Machine learning developers can simply migrate the trained model to the production environment without redeploying the environment there, tracking experiments and deploying machine learning become very easy, and experimental results can be reproduced.
In one embodiment, saving the environment and the data as images according to the received environment save request and data save request respectively includes:
saving the environment image to a local training center according to the received environment save request;
and saving the data image to a cloud data center according to the received data save request.
In one embodiment, the method further comprises:
extracting the requested environment according to the received environment extraction request;
and loading the extracted environment into an algorithm service by using a RESTful micro-service framework, and generating an application program interface to be provided for an application service module.
In one embodiment, the context save request is from a first terminal and the context extract request is from a second terminal.
Example three
The machine learning cloud platform provided by the embodiments of the present application adopts a cloud computing architecture design: system support is deployed on cloud computing IaaS-layer services, the PaaS layer adopts a Kubernetes + Docker application model, and the SaaS layer comprises public library services, a RESTful micro-service framework (micro-service kernel loading, HTTP API), application services + WEB, management and maintenance, and so on.
In particular, the method comprises the steps of,
public library service: including basic functions such as journaling, configuration, mathematical calculations, etc.;
RESTful micro-service framework: based on the Flask framework, it is used to unify the micro-service interfaces, decouple them from the business, and provide a unified RESTful application program interface (API). Specifically, it handles received requests and the scheduling and lifecycle management of WEB-classified tasks.
Management and maintenance: Harbor manages image resources, including Docker image building and publishing, and the monitoring and orchestration of Kubernetes, Docker and micro-service resources.
Application WEB: displays the running state of tasks and the machine learning results in the UI, and also includes monitoring displays for resources.
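The unified RESTful API that the micro-service framework provides is commonly realized with a shared response envelope, so that every service's reply has the same shape. The sketch below is an assumption-based illustration — the field names are not from the patent; in a Flask-based service each view would return this envelope serialized as JSON.

```python
def envelope(data=None, code: int = 0, message: str = "ok") -> dict:
    """Uniform response body shared by all microservice endpoints, so
    that clients parse every service's reply the same way."""
    return {"code": code, "message": message, "data": data}


# Success and error replies share one shape:
ok = envelope({"task_id": "train-42", "state": "Running"})
err = envelope(code=1001, message="GPU quota exceeded")
```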
This cloud service is customized for machine learning workloads, providing functions and services such as AI training, AI online services, training model management, training environment management and GPU resource management, realizing one-stop hosting of machine learning tasks, and it is generally applicable to common machine learning business scenarios such as image recognition and audio/video processing. In particular, it has the following advantages:
1. Hybrid cloud resource management: precise management of computing resources such as GPUs across different IDCs.
Existing cloud platforms can only manage resources such as CPU, memory and disk, and cannot manage GPUs sensibly. Algorithm training depends heavily on GPUs, and different algorithms have different GPU requirements. The present application schedules GPU resources uniformly through Kubernetes, achieving precise management of GPU resources.
2. Users can scale resources such as GPU and storage for their training on their own, configure local SSD storage themselves, and the platform supports expansion to multiple cloud storage types.
The embodiments of the present application allocate and schedule computing resources through Kubernetes and build a bridge between users and resources through the public service module and the RESTful service. Users can click to select the required resource configuration through the front-end service in the browser.
3. The training service packages the training algorithm in a Docker image; users can upload custom algorithm images to the DGnet image center, and the training service can pull the training image. The image center provides basic image templates for AI frameworks such as TensorFlow, MXNet, Keras and Caffe.
Algorithm training often requires different machine learning environments such as TensorFlow, Caffe and MXNet, but Linux systems do not install these environments by default. To reduce the time spent deploying this part of the environment, the present application builds some basic algorithm training environments with Docker, which users can use directly. Existing cloud platforms only support installing a Linux system and do not deploy the related algorithm and model environments.
4. The training service provides one-stop hosted training, and also supports distributed AI training tasks and interactive AI training tasks. The platform implements functions such as GPU node scheduling, training data upload and download, and task disaster recovery, and offers high availability.
Conventional cloud computing platforms do not customize for algorithm training; the embodiments of the present application customize for AI algorithm training, including providing AI-related images, GPU node scheduling and so on.
Example four
For the purposes of facilitating the practice of this application, this application example will be described in terms of a specific example.
Fig. 3 shows an architecture schematic of a machine learning cloud platform in example four of the present application.
As shown, the figure illustrates the entire flow of user usage. The flow in Fig. 3 is transparent to the user; it is the pipeline invoked by the program as a whole and the basic services that the service needs to call. The system makes calls in the order shown in the figure, invoking basic services in the data center, the training center and the integrated service center along the way to schedule the related resources.
The machine learning cloud platform provided by the embodiments of the present application comprises a data center, a training center and an integrated service center, all supported by and deployed on cloud computing IaaS-layer services; the PaaS layer of the cloud platform adopts a Kubernetes + Docker application model, and the SaaS layer of the cloud platform is a micro-service machine learning system.
Suppose a face recognition model is to be built: user A uploads a number of face pictures, obtained from public security departments or through other channels, as training data to the cloud platform provided by this example, and the platform stores the training data in the data center;
user a builds a training environment, including determining the number of GPUs and storage resources, algorithms used for training, and so forth. The program can perform resource scheduling in the training center module, and schedule the task instance of the user to the host machine meeting the requirements according to the requirements of the user. Training is performed after the instance is started. After training is completed, a face recognition model is obtained, a user A clicks a storage environment, the mirror image of the environment is manufactured and stored in a training center, the mirror image storage is performed through a DockerSDK to perform storage deleting operation, and RBAC management of the mirror image is performed through an integrated HarborSDK.
When user B wants to use the face recognition model, a request can be sent to the cloud platform provided by this example, which extracts the image of the face recognition model's environment and provides it to user B's client. The user can select the required instance configuration and image through the list options of the Web front end. The back-end program packages the user's requirements into a Yaml configuration file, and the scheduling system performs scheduling and configuration according to that file.
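The back end's packaging of user B's front-end selections into a Yaml configuration file might look like the following sketch, which builds a minimal Kubernetes-Pod-style mapping ready to be serialized to YAML. All names and the exact structure are illustrative assumptions, not taken from the patent.

```python
def build_pod_spec(task_name, image_ref, gpus):
    """Translate a user's Web-form selection into a minimal
    Kubernetes-Pod-style mapping, ready to be dumped as YAML."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": task_name},
        "spec": {
            "containers": [{
                "name": task_name,
                "image": image_ref,  # e.g. an image pulled from Harbor
                "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
            }],
            "restartPolicy": "Never",
        },
    }


spec = build_pod_spec("face-rec-infer",
                      "harbor.example.internal/ml-envs/face-recognition:v1", 1)
# `yaml.safe_dump(spec)` (PyYAML) would then yield the configuration
# file that is handed to the scheduling system.
```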
The main functional modules of the system comprise: public library services, RESTful micro-service frameworks (micro-service kernel loading, HTTP API), application services + WEB, management maintenance, etc.
Public library service: including basic functions such as journaling, configuration, mathematical calculations, etc.;
RESTful micro-service framework: based on the Flask framework, it mainly unifies the micro-service interfaces, decouples them from the business, and provides a unified RESTful API.
Management and maintenance: Harbor manages image resources, mainly including Docker image building and publishing, and the monitoring and orchestration of Kubernetes, Docker and micro-service resources.
Application WEB: if the micro-service is regarded as the deep server side, then the shallow server-side application and the WEB client are contained here. The server side handles received requests and the scheduling and lifecycle management of WEB-classified tasks. The WEB side displays the running state of tasks and the machine learning results in the UI, and also includes monitoring displays for resources.
It will be appreciated by those skilled in the art that examples of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the examples of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java and the scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to examples of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred examples of the present application have been described, additional variations and modifications to those examples may occur to those skilled in the art once they learn of the basic inventive concept. It is therefore intended that the appended claims be interpreted as including the preferred examples and all such alterations and modifications as fall within the scope of the present application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (5)

1. A cloud platform for providing machine learning services, comprising an IaaS layer, a PaaS layer, and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module, and a management maintenance module, wherein:
the public library service module is used for logging, configuration parameters, and mathematical computation;
the RESTful micro-service module is used for processing received requests and for the scheduling and life-cycle management of web classification tasks;
the application service module is used for displaying task running states and machine learning results;
the management maintenance module is used for managing image resources by means of a Harbor registry;
the RESTful microservice module comprises:
the environment preservation unit is used for receiving an environment preservation request and preserving the environment mirror image in the environment preservation request;
the environment extraction unit is used for receiving an environment extraction request and feeding back the requested environment extraction according to the environment extraction request;
the RESTful microservice module comprises:
the training data storage unit is used for receiving a training data storage request and storing training data in the training data storage request;
the training data extraction unit is used for receiving a training data extraction request and extracting and feeding back the requested training data according to the training data extraction request;
the training data storage unit is used for receiving a training data storage request and storing training data in the training data storage request to a cloud data center;
the environment preservation unit is used for receiving an environment preservation request and preserving the environment mirror image in the environment preservation request to a local training center.
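The division of labor recited in claim 1 can be illustrated with a minimal, hypothetical sketch: an environment image is saved to a "local training center" store, and training data to a "cloud data center" store, with matching extraction units. The class and field names are illustrative, not taken from the patent.

```python
# Hypothetical sketch of the RESTful micro-service units of claim 1.
# The two backing stores stand in for the local training center
# (environment images) and the cloud data center (training data).

class RestfulMicroservice:
    def __init__(self):
        self.local_training_center = {}   # environment images
        self.cloud_data_center = {}       # training data

    # environment saving unit: save the environment image in the request
    def save_environment(self, request):
        self.local_training_center[request["name"]] = request["image"]

    # environment extraction unit: feed back the requested environment
    def extract_environment(self, request):
        return self.local_training_center[request["name"]]

    # training data saving unit: save training data to the cloud data center
    def save_training_data(self, request):
        self.cloud_data_center[request["name"]] = request["data"]

    # training data extraction unit: feed back the requested training data
    def extract_training_data(self, request):
        return self.cloud_data_center[request["name"]]
```

In a real deployment each method would sit behind a RESTful endpoint; the in-memory dictionaries merely make the save/extract symmetry of the claim concrete.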
2. The cloud platform of claim 1, wherein the management maintenance module comprises:
an image building unit, used for building Docker images;
a publishing unit, used for publishing the Docker images;
a monitoring unit, used for monitoring Kubernetes, Docker, and micro-service resources; and
an orchestration unit, used for orchestrating resources.
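The image-building and publishing units of claim 2 would in practice wrap `docker build` and `docker push` against a Harbor registry. A sketch of the reference format and the commands involved (host, project, and tag names are made up for illustration):

```python
# Illustrative sketch of the management-maintenance units of claim 2.
# Harbor image references follow the registry/project/name:tag convention.

def harbor_reference(harbor_host, project, name, tag):
    """Compose the image reference used when publishing to Harbor."""
    return f"{harbor_host}/{project}/{name}:{tag}"

def build_and_publish_commands(context_dir, reference):
    """The shell commands the building and publishing units would run."""
    return [
        f"docker build -t {reference} {context_dir}",
        f"docker push {reference}",
    ]
```

A caller would then execute the returned commands (e.g. via `subprocess.run`) after authenticating to the registry with `docker login`.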
3. A method of machine learning using the cloud platform for providing machine learning services according to claim 1 or 2, comprising:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model; and
saving the environment and the data as images, respectively, by means of Harbor according to a received environment save request and a received data save request;
wherein saving the environment and the data as images according to the received environment save request and data save request comprises:
saving the environment image to a local training center according to the received environment save request; and
saving the data image to a cloud data center according to the received data save request.
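The training step of claim 3 — load a data set and an initial model, train, obtain a trained model — can be sketched without the GPU cluster. The toy model (a single weight fitted by gradient descent) and all names are illustrative stand-ins for the real framework and cluster scheduler:

```python
# Minimal, hypothetical sketch of the training step of claim 3: a training
# data set and a predetermined initial model are loaded and trained.
# The "model" here is a single linear weight fitted by gradient descent.

def train(dataset, model, lr=0.1, epochs=200):
    """Fit y = w * x by gradient descent; stands in for cluster training."""
    w = model["w"]
    for _ in range(epochs):
        # mean gradient of the squared error over the data set
        grad = sum(2 * (w * x - y) * x for x, y in dataset) / len(dataset)
        w -= lr * grad
    return {"w": w}

dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # samples of y = 2x
trained = train(dataset, {"w": 0.0})              # converges toward w = 2
```

On the real platform the trained model and its environment would then be saved as images through the Harbor registry, as the claim recites.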
4. The method of claim 3, further comprising:
extracting the requested environment according to a received environment extraction request; and
loading the extracted environment into an algorithm service by means of a RESTful micro-service framework, and generating an application program interface to be provided to the application service module.
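The serving step of claim 4 — wrap the extracted environment as an algorithm service and generate an API for the application service module — can be sketched as a route table of JSON handlers. The `/predict` route and the echo payload are invented for illustration; a real deployment would mount such handlers in a RESTful framework:

```python
# Hypothetical sketch of claim 4: the extracted environment is loaded into
# an algorithm service and an application program interface is generated.

import json

def make_algorithm_service(environment):
    """Wrap the extracted environment as a callable inference service."""
    def predict(payload):
        # In a real deployment the environment image supplies the model
        # runtime; here the service simply echoes, tagged with the env name.
        return {"environment": environment["name"], "input": payload}
    return predict

def generate_api(service):
    """Generate a minimal API: route -> JSON-in / JSON-out handler."""
    def handler(body):
        return json.dumps(service(json.loads(body)))
    return {"/predict": handler}

api = generate_api(make_algorithm_service({"name": "env1"}))
```

The application service module would consume the generated route, POSTing JSON requests and rendering the JSON responses.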
5. The method of claim 4, wherein the environment save request is from a first terminal and the environment extraction request is from a second terminal.
CN202011226841.6A 2020-11-06 2020-11-06 Cloud platform and method for providing machine learning service Active CN112311605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011226841.6A CN112311605B (en) 2020-11-06 2020-11-06 Cloud platform and method for providing machine learning service


Publications (2)

Publication Number Publication Date
CN112311605A CN112311605A (en) 2021-02-02
CN112311605B true CN112311605B (en) 2023-12-22

Family

ID=74326202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011226841.6A Active CN112311605B (en) 2020-11-06 2020-11-06 Cloud platform and method for providing machine learning service

Country Status (1)

Country Link
CN (1) CN112311605B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799742B (en) * 2021-02-09 2024-02-13 上海海事大学 Machine learning practical training system and method based on micro-service
CN115167292B (en) * 2021-04-12 2024-04-09 清华大学 Intelligent factory operating system based on industrial Internet architecture

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654066A (en) * 2016-02-02 2016-06-08 北京格灵深瞳信息技术有限公司 Vehicle identification method and device
CN107659609A (en) * 2017-07-26 2018-02-02 Deep learning support platform and deep learning training method based on cloud computing
CN107704252A (en) * 2017-10-20 2018-02-16 Method and system for providing an artificial intelligence platform to a user
CN108170520A (en) * 2018-01-29 2018-06-15 Cloud computing resource management method and device
CN109144724A (en) * 2018-07-27 2019-01-04 Micro-service resource scheduling system and method
CN109885389A (en) * 2019-02-19 2019-06-14 Container-based parallel deep learning scheduling and training method and system
CN109961151A (en) * 2017-12-21 2019-07-02 System for machine learning computing services and method for machine learning
CN110058922A (en) * 2019-03-19 2019-07-26 Method and device for extracting metadata of a machine learning task
CN110245003A (en) * 2019-06-06 2019-09-17 Machine learning single-machine algorithm orchestration system and method
CN110413294A (en) * 2019-08-06 2019-11-05 Service delivery system, method, apparatus and device
CN110795072A (en) * 2019-10-16 2020-02-14 北京航空航天大学 Crowd-sourcing competition platform framework system and method based on crowd intelligence
CN111026409A (en) * 2019-10-28 2020-04-17 烽火通信科技股份有限公司 Automatic monitoring method, device, terminal equipment and computer storage medium
CN111158745A (en) * 2019-12-30 2020-05-15 山东浪潮商用系统有限公司 Data processing platform based on Docker
CN111625316A (en) * 2020-05-15 2020-09-04 苏州浪潮智能科技有限公司 Environment deployment method and device, electronic equipment and storage medium
CN111861020A (en) * 2020-07-27 2020-10-30 深圳壹账通智能科技有限公司 Model deployment method, device, equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225323B2 (en) * 2012-09-07 2019-03-05 Oracle International Corporation System and method for providing java cloud services for use with a cloud computing environment
US9921885B2 (en) * 2015-06-19 2018-03-20 Vmware, Inc. Resource management for containers in a virtualized environment
US11475353B2 (en) * 2017-12-01 2022-10-18 Appranix, Inc. Automated application reliability management using adaptable machine learning models
US11423326B2 (en) * 2018-09-14 2022-08-23 Microsoft Technology Licensing, Llc Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system
US11048564B2 (en) * 2018-09-21 2021-06-29 International Business Machines Corporation API evolution and adaptation based on cognitive selection and unsupervised feature learning
US11200142B2 (en) * 2018-10-26 2021-12-14 International Business Machines Corporation Perform preemptive identification and reduction of risk of failure in computational systems by training a machine learning module
US11507434B2 (en) * 2019-02-01 2022-11-22 Hewlett Packard Enterprise Development Lp Recommendation and deployment engine and method for machine learning based processes in hybrid cloud environments
US11562176B2 (en) * 2019-02-22 2023-01-24 Cisco Technology, Inc. IoT fog as distributed machine learning structure search platform


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Rapid Deployment Scheme for Enterprise Development and Testing Environments; Xu Xing; Li Yinqiao; Liu Xuefeng; Mao Jianhua; Industrial Control Computer (Issue 03); full text *
Design and Implementation of a Deep Learning Container Cloud Platform Based on Docker and Kubernetes; Luo Shenghao; China Master's Theses Full-text Database; 2020-01-15; main text pp. 11-57 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing

Applicant after: Beijing gelingshentong Information Technology Co.,Ltd.

Address before: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing

Applicant before: BEIJING DEEPGLINT INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant