CN112311605A - Cloud platform and method for providing machine learning service - Google Patents

Cloud platform and method for providing machine learning service

Info

Publication number
CN112311605A
Authority
CN
China
Prior art keywords
environment
request
machine learning
training
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011226841.6A
Other languages
Chinese (zh)
Other versions
CN112311605B (en)
Inventor
马震
王志洋
马慧荣
黄严
张德兵
邓亚峰
赵勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Deepglint Information Technology Co ltd
Original Assignee
Beijing Deepglint Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Deepglint Information Technology Co ltd filed Critical Beijing Deepglint Information Technology Co ltd
Priority to CN202011226841.6A priority Critical patent/CN112311605B/en
Publication of CN112311605A publication Critical patent/CN112311605A/en
Application granted granted Critical
Publication of CN112311605B publication Critical patent/CN112311605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/04 Network management architectures or arrangements
    • H04L 41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A cloud platform and method for providing machine learning services, comprising: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management and maintenance module. The public library service module is used for logging, configuration parameters and mathematical calculations; the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-class tasks; the application service module is used for displaying task running states and machine learning results; and the management and maintenance module is used for managing image resources with Harbor. With the scheme in this application, environments can be migrated simply, experiments can be tracked and machine learning deployed with ease, and experimental results can be reproduced.

Description

Cloud platform and method for providing machine learning service
Technical Field
The present application relates to cloud computing technologies, and in particular, to a cloud platform and method for providing machine learning services.
Background
Machine learning is a very popular technology and is widely applied in fields such as security, transportation, healthcare, finance and retail.
Although machine learning can produce excellent results, using it in practice remains complex. Beyond the common challenges of software development, machine learning developers face new ones, including experiment management (e.g., tracking which parameters, code and data produced a result), reproducibility (e.g., being able to run the same code later in the same operating environment), deploying models to production, and data governance (auditing the models and data used throughout an organization). These workflow challenges around the machine learning lifecycle are typically the biggest hurdles to using machine learning in production and scaling it within an organization.
At present, some cloud platforms support online training services for algorithms. These platforms typically work as follows: the user clicks to apply for resources such as GPUs and storage along with a corresponding environment, then writes code for training. The following problems remain when such cloud platforms are used for machine learning:
1. the tooling is fragmented: from data preparation to model training, hundreds of independent software tools cover the stages of the machine learning lifecycle, and machine learning developers may need to assemble a production environment from dozens of libraries;
2. experimental results are hard to reproduce: model training requires large amounts of data and a purpose-built environment, so reproducing roughly the same results in a user's actual production environment requires redeploying the environment and re-acquiring large amounts of data;
3. tracking experiments and deploying machine learning is difficult: machine learning algorithms have dozens of configurable parameters, tracking these parameters and their values is tedious, and migrating a trained model to a production environment is very challenging.
Problems existing in the prior art:
at present, there is no cloud platform dedicated to machine learning, so a user must redo a large amount of complex work, such as data preparation and environment deployment, when using a model in a different environment.
Disclosure of Invention
The examples of the present application provide a cloud platform and a method for providing machine learning services, so as to solve the above technical problems.
According to a first aspect of the examples of the present application, there is provided a cloud platform for providing machine learning services, comprising: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management and maintenance module, wherein,
the public library service module is used for logging, configuration parameters and mathematical calculations;
the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-class tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources with Harbor.
According to a second aspect of the examples of the present application, there is provided a method for machine learning using the above cloud platform for providing machine learning services, including:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model onto a predetermined GPU cluster, and training to obtain a trained algorithm model;
and saving images of the environment and the data separately, using Harbor, according to a received environment save request and data save request.
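The three steps of the method above can be sketched end to end in Python. Every class and function name here is illustrative, a stand-in for the platform's real services rather than its actual API, and the Harbor registry is represented by a plain dict:

```python
# Hypothetical sketch of the three-step flow: acquire data, train on a
# GPU cluster, then save environment and data images via Harbor.
# All names are illustrative, not the platform's real interfaces.

class MachineLearningPipeline:
    def __init__(self, registry):
        self.registry = registry  # stands in for the Harbor image registry

    def acquire_training_data(self, dataset_id):
        # Step 1: fetch the training data set (here, an in-memory stub).
        return {"dataset_id": dataset_id, "samples": ["img_001", "img_002"]}

    def train(self, dataset, initial_model):
        # Step 2: load data and the initial model onto a GPU cluster and
        # train. Real training is elided; we only record the submission.
        return {"model": initial_model, "trained_on": dataset["dataset_id"]}

    def save_mirrors(self, env_name, data_name):
        # Step 3: save environment and data images separately.
        self.registry[env_name] = "environment-image"
        self.registry[data_name] = "data-image"
        return sorted(self.registry)

registry = {}
pipe = MachineLearningPipeline(registry)
dataset = pipe.acquire_training_data("faces-v1")
model = pipe.train(dataset, initial_model="resnet50")
saved = pipe.save_mirrors("env:faces-v1", "data:faces-v1")
```

Because the environment image is saved alongside the data image, a later consumer can pull both without redeploying anything, which is the reproducibility claim the text makes.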
The cloud service is customized for machine learning workloads. With the cloud platform provided in this example, once a training run is complete, the environment and the data can each be saved as image resources. Machine learning developers therefore need not redeploy the environment in subsequent production settings: the trained model can simply be migrated to production, experiments are easy to track, deployment is straightforward, and experimental results can be reproduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate example embodiments of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 shows a schematic structural diagram of a cloud platform for providing machine learning services in Example One of the present application;
Fig. 2 is a flow chart illustrating a method for providing machine learning services in Example Two of the present application;
Fig. 3 shows an architecture diagram of a machine learning cloud platform in Example Four of the present application.
Detailed Description
To make the technical solutions and advantages in the examples of the present application clearer, the examples are described below in further detail with reference to the accompanying drawings. The described examples are only some of the examples of the present application, not an exhaustive list. The examples in this application, and the features within them, may be combined with each other where there is no conflict.
Example one
Fig. 1 shows a schematic structural diagram of a cloud platform for providing machine learning services in Example One of the present application.
As shown, the cloud platform for providing machine learning services includes: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management and maintenance module, wherein,
the public library service module is used for logging, configuration parameters and mathematical calculations;
the RESTful micro-service module is used for processing received requests and for the scheduling and lifecycle management of WEB-class tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources with Harbor.
In a specific implementation, the WEB-class tasks may include machine learning training tasks, environment and data imaging tasks, and the like.
The cloud service is customized for machine learning workloads. With the cloud platform provided in this example, once a training run is complete, the environment and the data can each be saved as image resources. Machine learning developers therefore need not redeploy the environment in subsequent production settings: the trained model can simply be migrated to production, experiments are easy to track, deployment is straightforward, and experimental results can be reproduced.
In one embodiment, the RESTful microservice module comprises:
the training data storage unit is used for receiving a training data save request and saving the training data carried in it;
and the training data extraction unit is used for receiving a training data extraction request and extracting and returning the requested training data.
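These two units can be sketched with a minimal in-memory stand-in; the request fields (`name`, `payload`) are hypothetical, and a dict takes the place of the cloud data center that the real unit would persist to:

```python
# Illustrative in-memory sketch of the training-data save and extract
# units. Request/response field names are assumptions, not the
# platform's real schema.

class TrainingDataService:
    def __init__(self):
        self._store = {}  # stand-in for cloud data center storage

    def handle_save_request(self, request):
        # Save the training data carried in the save request.
        self._store[request["name"]] = request["payload"]
        return {"status": "saved", "name": request["name"]}

    def handle_extract_request(self, request):
        # Extract and return the requested training data, if present.
        payload = self._store.get(request["name"])
        if payload is None:
            return {"status": "not_found"}
        return {"status": "ok", "payload": payload}

svc = TrainingDataService()
svc.handle_save_request({"name": "faces-v1", "payload": [b"img-bytes"]})
resp = svc.handle_extract_request({"name": "faces-v1"})
```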
In one embodiment, the training data saving unit is configured to receive a training data saving request and save training data in the training data saving request to a cloud data center.
In one embodiment, the RESTful microservice module comprises:
the environment storage unit is used for receiving an environment storage request and storing an environment mirror image in the environment storage request;
and the environment extraction unit is used for receiving an environment extraction request and extracting and returning the requested environment according to the request.
In one embodiment, the environment saving unit is configured to receive an environment saving request and save an environment image in the environment saving request to a local training center.
In one embodiment, the administration and maintenance module includes:
the image building unit is used for building Docker images; specifically, this can refer to packaging a training environment or code blocks required for machine learning into an image file.
The publishing unit is used for publishing Docker images; specifically, the platform manages all built images centrally, and users can select the image files they need, so that their machine learning training environments can be deployed quickly.
The monitoring unit is used for monitoring Kubernetes, Docker and micro-service resources; specifically, it may monitor these resources in real time to keep the whole cluster running efficiently. If a machine learning training environment crashes, this embodiment learns of the crash through the monitoring unit and quickly restores the environment, so that the related data are not lost and high service availability is ensured.
The orchestration unit is used for orchestrating resources. Machine learning training requires large amounts of computing resources, including CPU, GPU, memory, disk and network. Orchestration allocates these resources within the cluster so that all users can make better use of its elastic compute capacity.
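The orchestration unit's placement decision can be illustrated with a toy first-fit scheduler. The resource field names below are hypothetical, and the real platform delegates this job to Kubernetes rather than implementing it by hand:

```python
# Toy first-fit scheduler: place a training task on the first host with
# enough free GPU/CPU/memory, reserving those resources. Field names are
# illustrative only; Kubernetes performs the real scheduling.

def schedule(task, hosts):
    """Return the name of the first host satisfying the task, or None."""
    for host in hosts:
        if (host["free_gpus"] >= task["gpus"]
                and host["free_cpus"] >= task["cpus"]
                and host["free_mem_gb"] >= task["mem_gb"]):
            # Reserve the resources on the chosen host.
            host["free_gpus"] -= task["gpus"]
            host["free_cpus"] -= task["cpus"]
            host["free_mem_gb"] -= task["mem_gb"]
            return host["name"]
    return None  # no host can run the task right now

hosts = [
    {"name": "node-a", "free_gpus": 1, "free_cpus": 8, "free_mem_gb": 32},
    {"name": "node-b", "free_gpus": 4, "free_cpus": 32, "free_mem_gb": 128},
]
placed = schedule({"gpus": 2, "cpus": 8, "mem_gb": 64}, hosts)
```

A real orchestrator also handles preemption, queueing and disaster tolerance, which the sketch deliberately omits.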
Example two
Based on the same inventive concept, this example provides a method for machine learning using the cloud platform for providing machine learning services described in Example One.
Fig. 2 is a flowchart illustrating a method for providing a machine learning service according to example two of the present application.
As shown, the method for providing a machine learning service includes:
Step 201, acquiring a training data set;
In an embodiment, the training data set is stored on a disk at the IaaS layer. This example manages training sets through the MinIO service, and the public library service module is called to operate MinIO and obtain the corresponding training data set.
Step 202, loading the training data set and a predetermined initial algorithm model onto a predetermined GPU cluster, and training to obtain a trained algorithm model;
In one implementation, resources such as GPU, CPU and storage are first allocated at the IaaS layer; an image is then loaded onto the IaaS layer through the Kubernetes public service module, after which the user can view and use their algorithm training environment through the RESTful service.
Step 203, saving images of the environment and the data separately, using Harbor, according to the received environment save request and data save request.
In one implementation, when the RESTful service receives a user's request to save a training environment, the Docker public service module is called to package an image; after packaging is complete, the Harbor basic service is called to upload the image to the master node of the Harbor service.
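One way to picture the pack-and-upload flow is as the shell commands it corresponds to: commit the training container to an image, tag it for the Harbor registry, and push. The registry host and project below are hypothetical, and the commands are constructed but deliberately not executed:

```python
# Build the docker CLI command sequence for saving a training
# environment: commit -> tag for Harbor -> push. The registry host and
# project name are placeholders, not the platform's real addresses.

def save_environment_commands(container_id, image, tag,
                              registry="harbor.example.internal",
                              project="training-envs"):
    ref = f"{registry}/{project}/{image}:{tag}"
    return [
        ["docker", "commit", container_id, f"{image}:{tag}"],
        ["docker", "tag", f"{image}:{tag}", ref],
        ["docker", "push", ref],
    ]

cmds = save_environment_commands("abc123", "face-train-env", "v1")
```

In the platform itself this would go through the Docker SDK rather than the CLI, but the sequence of operations is the same.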
The cloud service is customized for machine learning workloads. With the cloud platform provided in this example, the environment and the data can each be saved after a training run completes, so machine learning developers can simply migrate the trained model to the production environment without redeploying the environment there; experiments are easy to track, deployment is straightforward, and experimental results can be reproduced.
In one embodiment, saving images of the environment and the data separately according to the received environment save request and data save request includes:
saving the environment image to a local training center according to the received environment save request;
and saving the data image to a cloud data center according to the received data save request.
In one embodiment, the method further comprises:
extracting the requested environment according to a received environment extraction request;
and loading the extracted environment into an algorithm service using the RESTful micro-service framework, generating an application program interface (API), and providing the API to the application service module.
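A stdlib-only sketch of exposing the extracted environment through a generated API follows. The platform's actual framework is Flask-based; the route path and handler here are hypothetical stand-ins for the model inside the extracted environment:

```python
# Minimal route dispatcher mimicking how a RESTful micro-service
# framework could expose an extracted environment as an API. The path
# and response fields are illustrative assumptions.

class MicroService:
    def __init__(self):
        self.routes = {}

    def route(self, method, path):
        def register(handler):
            self.routes[(method, path)] = handler
            return handler
        return register

    def dispatch(self, method, path, body=None):
        handler = self.routes.get((method, path))
        if handler is None:
            return 404, {"error": "not found"}
        return 200, handler(body)

app = MicroService()

@app.route("POST", "/v1/face-recognition/predict")
def predict(body):
    # Stand-in for invoking the model inside the extracted environment.
    return {"model": "face-recognition", "input": body, "label": "person"}

status, result = app.dispatch("POST", "/v1/face-recognition/predict",
                              {"image": "img_001"})
```

The application service module would consume such an endpoint to display results; in Flask the same shape is `@app.route(..., methods=["POST"])`.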
In one embodiment, the environment save request comes from a first terminal and the environment extraction request comes from a second terminal.
Example three
The machine learning cloud platform provided in this example adopts a cloud computing architecture: system support is deployed on cloud computing IaaS-layer services, the PaaS layer adopts a Kubernetes + Docker application model, and the SaaS layer comprises: public library services, a RESTful micro-service framework (micro-service core load, HTTP API), application services + WEB, management and maintenance, and the like.
In particular:
public library service: including basic functions such as logging, configuration, mathematical calculations, etc.;
RESTful micro-service framework: based on the Flask framework, it unifies the micro-service interfaces, decouples them from the business services, and unifies the RESTful Application Program Interfaces (APIs). In particular, it handles received requests and the scheduling and lifecycle management of WEB-class tasks.
Management and maintenance: Harbor manages the image resources; this module covers Docker image building and publishing, monitoring of Kubernetes, Docker and micro-service resources, and resource orchestration.
WEB application: UI display of task running states and machine learning results, as well as monitoring displays for resources.
The cloud service is customized for machine learning workloads, providing functions and services such as AI training, AI online services, training model management, training environment management and GPU resource management, realizing one-stop hosting of machine learning tasks. It suits common machine learning scenarios such as image recognition and audio/video processing. Its advantages include the following:
1. Hybrid cloud resource management: precise management of computing resources such as GPUs across different IDCs.
Existing cloud platforms can only manage resources such as CPU, memory and disk, and cannot manage GPUs properly. Algorithm training depends heavily on the GPU, and different algorithms place different demands on it. In this example, GPU resources are scheduled uniformly through Kubernetes, enabling precise GPU management.
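For illustration, a GPU request is conventionally expressed to Kubernetes as an extended-resource limit (`nvidia.com/gpu`, advertised by the NVIDIA device plugin) on the container spec. The dict below stands in for the YAML manifest, and the image name is just an example:

```python
# Build the pod manifest (as a plain dict) that asks the Kubernetes
# scheduler for GPUs via the nvidia.com/gpu extended resource. The pod
# and image names are illustrative.

def training_pod_spec(name, image, gpus):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": image,
                "resources": {
                    # The NVIDIA device plugin advertises this resource;
                    # the scheduler then places the pod on a GPU node.
                    "limits": {"nvidia.com/gpu": gpus},
                },
            }],
            "restartPolicy": "Never",
        },
    }

spec = training_pod_spec("face-train", "tensorflow/tensorflow:latest-gpu", 2)
```

CPU, memory and disk are handled by the same `resources` block, which is why a Kubernetes-based platform can manage all of them uniformly.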
2. Users can scale GPU, storage and other resources for training on their own, customize SSD local storage configuration, and extend to multiple cloud storage types.
In this example, computer resources are allocated and scheduled through Kubernetes, and a bridge between users and resources is built through the public service module and the RESTful service. Users can click to select the resource configuration they need through the front-end service in a browser.
3. The training service encapsulates training algorithms in Docker images; users can upload self-defined algorithm images to the DGnet image center, and the training service can pull training images. The image center provides base image templates for AI frameworks such as TensorFlow, MXNet, Keras and Caffe.
Algorithm training often requires machine learning environments such as TensorFlow, Caffe and MXNet, which a Linux system does not provide by default. To reduce environment deployment time, this example builds some basic algorithm training environments with Docker, which users can use directly. Existing cloud platforms only support installing a Linux system and do not deploy the related algorithm model environments.
4. The training service realizes one-stop hosted training and supports both distributed and interactive AI training tasks. The platform implements GPU node scheduling, training data upload and download, task disaster tolerance and other functions, with high availability.
Traditional cloud computing platforms do not customize for algorithm training; this example customizes for AI algorithm training, including providing AI-related images, GPU node scheduling, and so on.
Example four
To facilitate implementation of the present application, a specific example is given below.
Fig. 3 shows an architecture diagram of a machine learning cloud platform in example four of the present application.
As shown, the figure presents the entire flow of a user's usage. The flow in Fig. 3 is transparent to the user: it is the pipeline invoked by the whole program, together with the basic services that must be called throughout. The system makes calls in the order of the flow in Fig. 3, invoking basic services in the data center, training center and integrated service center to schedule the related resources.
The machine learning cloud platform provided in this example comprises a data center, a training center and an integrated service center. System support is deployed on cloud computing IaaS-layer services, the PaaS layer of the cloud platform adopts a Kubernetes + Docker application model, and the SaaS layer is a micro-service machine learning system.
Suppose a face recognition model is to be built. User A uploads a number of face images, obtained from a public security department or other channels, to the cloud platform provided in this example as training data, and the platform saves the training data to the data center;
User A builds a training environment, including choosing the number of GPUs, storage resources, the algorithm used for training, and so on. The program performs resource scheduling in the training center module and schedules the user's task instance onto a host machine that meets the requirements. Training runs after the instance starts. When training completes, a face recognition model is obtained; user A clicks to save the environment, and an image of the environment is saved to the training center. Image saving and deletion go through the Docker SDK, and RBAC management of images is handled through the integrated Harbor SDK.
When user B wants to use the face recognition model, a request can be sent to the cloud platform provided in this example, which extracts the image data of the face recognition model's environment and provides it to user B's client. The user can select the needed instance configuration and image through list options on the Web front end. The back-end program packages the user's requirements into a Yaml configuration file, and the scheduling system then performs explicit scheduling and configuration according to the Yaml file.
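The packaging step might look like the following sketch, where the selection keys are hypothetical and the tiny Yaml emitter (sufficient only for nested dicts of scalars) stands in for a real library such as PyYAML:

```python
# Pack the Web front end's selections into the config the scheduler
# consumes, then render it as Yaml text. Keys and values are
# illustrative; real code would use a Yaml library for serialization.

def package_job_request(user, selections):
    return {
        "user": user,
        "image": selections["image"],  # image chosen from the list option
        "resources": {
            "gpus": selections["gpus"],
            "cpus": selections["cpus"],
            "memory_gb": selections["memory_gb"],
        },
    }

def to_yaml(d, indent=0):
    # Tiny Yaml emitter for nested dicts of scalars only.
    lines = []
    for key, value in d.items():
        pad = "  " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)

request = package_job_request(
    "user-b", {"image": "face-recognition-env:v1", "gpus": 1,
               "cpus": 4, "memory_gb": 16})
yaml_text = to_yaml(request)
```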
The main functional modules of the system include: public library services, a RESTful micro-service framework (micro-service core load, HTTP API), application services + WEB, and management and maintenance.
Public library services: basic functions such as logging, configuration and mathematical calculations;
RESTful micro-service framework: based on the Flask framework, it mainly unifies the micro-service interfaces, decouples them from the business services, and unifies the RESTful APIs.
Management and maintenance: Harbor manages the image resources; this mainly covers Docker image building and publishing, monitoring of Kubernetes, Docker and micro-service resources, and resource orchestration.
WEB application: if the micro-services are regarded as the deep server side, this covers the shallow server-side application and the WEB client. The server side handles received requests and the scheduling and lifecycle management of WEB-class tasks; the WEB side displays task running states and machine learning results in the UI, and also monitors and displays resources.
As will be appreciated by one skilled in the art, examples of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the examples of the present application can be implemented in various computer languages, for example, the object-oriented programming language Java and the scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to examples of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred examples of the present application have been described, additional variations and modifications in those examples may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A cloud platform for providing machine learning services, comprising an IaaS layer, a PaaS layer, and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module, and a management and maintenance module, wherein,
the public library service module is used for logging, configuration parameter management, and mathematical calculation;
the RESTful micro-service module is used for processing received requests and for the scheduling and life-cycle management of Web-type tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources by using Harbor.
2. The cloud platform of claim 1, wherein the RESTful micro-service module comprises:
the training data saving unit is used for receiving a training data saving request and saving the training data carried in the training data saving request;
and the training data extraction unit is used for receiving a training data extraction request and extracting and feeding back the requested training data according to the training data extraction request.
3. The cloud platform of claim 2, wherein the training data saving unit is configured to receive a training data saving request and save training data in the training data saving request to a cloud data center.
4. The cloud platform of claim 1, wherein the RESTful micro-service module comprises:
the environment saving unit is used for receiving an environment saving request and saving an environment image carried in the environment saving request;
and the environment extraction unit is used for receiving an environment extraction request and extracting and feeding back the requested environment according to the environment extraction request.
5. The cloud platform of claim 4, wherein the environment saving unit is configured to receive an environment saving request and save an environment image in the environment saving request to a local training center.
6. The cloud platform of claim 1, wherein the administration and maintenance module comprises:
the image making unit is used for making Docker images;
the release unit is used for releasing Docker images;
the monitoring unit is used for monitoring Kubernetes, Docker, and micro-service resources;
and the orchestration unit is used for orchestrating resources.
7. A method for machine learning using the cloud platform for providing machine learning services according to any one of claims 1 to 6, comprising:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
and saving images of the environment and the data respectively, by using Harbor, according to the received environment saving request and data saving request.
8. The method of claim 7, wherein saving images of the environment and the data respectively according to the received environment saving request and data saving request comprises:
saving the environment image to a local training center according to the received environment saving request;
and saving the data image to a cloud data center according to the received data saving request.
9. The method of claim 7, further comprising:
extracting the requested environment according to the received environment extraction request;
and loading the extracted environment into an algorithm service by using a RESTful micro-service framework, generating an application program interface, and providing the application program interface to the application service module.
10. The method of claim 9, wherein the environment saving request is from a first terminal and the environment extraction request is from a second terminal.
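The image making and release units of claim 6 correspond to a standard Docker build-and-push flow against a Harbor registry. The commands below are an illustrative ops sketch only: the registry host `harbor.example.com`, the project name `ml-platform`, and the image tag are placeholders, not values from the patent.

```shell
# Image making unit: build a Docker image from the current directory's Dockerfile.
docker build -t trainer:1.0 .

# Release unit: tag the image for a (hypothetical) Harbor project and push it.
docker tag trainer:1.0 harbor.example.com/ml-platform/trainer:1.0
docker login harbor.example.com
docker push harbor.example.com/ml-platform/trainer:1.0
```

Harbor then supplies registry features such as access control and replication that the management and maintenance module can build on for monitoring and orchestration.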
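Claims 2 and 3 describe paired units that save training data on request and later extract and feed it back. The sketch below is a hypothetical, standard-library-only illustration of that request/response contract: the `DataStore` class, its method names, and the in-memory dictionary (standing in for the cloud data center of claim 3) are assumptions for illustration, not part of the claimed platform.

```python
# Minimal sketch of the training-data saving and extraction units (claims 2-3).
# The DataStore class and its method names are hypothetical; a real platform
# would persist data to a cloud data center rather than an in-memory dict.

class DataStore:
    """In-memory stand-in for the cloud data center."""

    def __init__(self):
        self._blobs = {}

    def handle_save_request(self, dataset_id, payload):
        # Training data saving unit: receive a saving request and store the data.
        self._blobs[dataset_id] = payload
        return {"status": "saved", "dataset": dataset_id}

    def handle_extract_request(self, dataset_id):
        # Training data extraction unit: look up and feed back the requested data.
        if dataset_id not in self._blobs:
            return {"status": "not_found", "dataset": dataset_id}
        return {"status": "ok", "dataset": dataset_id, "data": self._blobs[dataset_id]}


store = DataStore()
store.handle_save_request("faces-v1", [0.1, 0.2, 0.3])
result = store.handle_extract_request("faces-v1")
```

In a deployed system the two handlers would sit behind RESTful endpoints of the micro-service module, with the saving and extraction requests arriving from different terminals as in claim 10.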
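The training step of claim 7 — load a training data set and a predetermined initial algorithm model, then train — can be illustrated with a minimal gradient-descent loop. This is a standard-library-only stand-in under stated assumptions: the linear model, learning rate, epoch count, and synthetic data set are all hypothetical, and no GPU cluster is involved.

```python
# Illustrative stand-in for claim 7's training step: fit y = w*x + b by
# gradient descent on a mean-squared-error loss. A real platform would
# dispatch this job to the predetermined GPU cluster.

def train(dataset, epochs=200, lr=0.05):
    w, b = 0.0, 0.0  # "predetermined initial algorithm model"
    n = len(dataset)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data set generated from y = 2x + 1.
training_set = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w, b = train(training_set)
```

The returned `(w, b)` is the "trained algorithm model" that the method then wraps in an environment image for saving and later serving.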
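Claim 9 loads an extracted environment into an algorithm service and exposes it through an application program interface. A minimal sketch of such an interface, using only Python's standard library, is shown below; the `/predict` route, the port, and the stand-in `model` function are hypothetical, and a real deployment would serve the environment through the platform's RESTful micro-service framework instead.

```python
# Sketch of claim 9: expose a trained algorithm model behind an HTTP
# application program interface for the application service module.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model(x):
    # Stand-in for the trained algorithm model loaded from the environment.
    return 2.0 * x + 1.0


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": model(payload["x"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the sketch quiet; a real service would log via the public
        # library service module instead.
        pass


def serve(port=8000):
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` publishes the interface; the application service module can then POST `{"x": 3}` to `/predict` and display the returned machine learning result.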
CN202011226841.6A 2020-11-06 2020-11-06 Cloud platform and method for providing machine learning service Active CN112311605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011226841.6A CN112311605B (en) 2020-11-06 2020-11-06 Cloud platform and method for providing machine learning service


Publications (2)

Publication Number Publication Date
CN112311605A (en) 2021-02-02
CN112311605B (en) 2023-12-22

Family

ID=74326202



Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140075035A1 (en) * 2012-09-07 2014-03-13 Oracle International Corporation System and method for providing java cloud services for use with a cloud computing environment
CN105654066A (en) * 2016-02-02 2016-06-08 北京格灵深瞳信息技术有限公司 Vehicle identification method and device
US20160371127A1 (en) * 2015-06-19 2016-12-22 Vmware, Inc. Resource management for containers in a virtualized environment
CN107659609A (en) * 2017-07-26 2018-02-02 北京天云融创软件技术有限公司 A kind of deep learning support platform and deep learning training method based on cloud computing
CN107704252A (en) * 2017-10-20 2018-02-16 北京百悟科技有限公司 A kind of method and system for providing a user artificial intelligence platform
CN108170520A (en) * 2018-01-29 2018-06-15 北京搜狐新媒体信息技术有限公司 A kind of cloud computing resources management method and device
CN109144724A (en) * 2018-07-27 2019-01-04 众安信息技术服务有限公司 A kind of micro services resource scheduling system and method
US20190171966A1 (en) * 2017-12-01 2019-06-06 Govindarajan Rangasamy Automated application reliability management using adaptable machine learning models
CN109885389A (en) * 2019-02-19 2019-06-14 山东浪潮云信息技术有限公司 A kind of parallel deep learning scheduling training method and system based on container
CN109961151A (en) * 2017-12-21 2019-07-02 同方威视科技江苏有限公司 For the system for calculating service of machine learning and for the method for machine learning
CN110058922A (en) * 2019-03-19 2019-07-26 华为技术有限公司 A kind of method, apparatus of the metadata of extraction machine learning tasks
CN110245003A (en) * 2019-06-06 2019-09-17 中信银行股份有限公司 A kind of machine learning uniprocessor algorithm arranging system and method
CN110413294A (en) * 2019-08-06 2019-11-05 中国工商银行股份有限公司 Service delivery system, method, apparatus and equipment
CN110795072A (en) * 2019-10-16 2020-02-14 北京航空航天大学 Crowd-sourcing competition platform framework system and method based on crowd intelligence
US20200089651A1 (en) * 2018-09-14 2020-03-19 Microsoft Technology Licensing, Llc Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system
US20200097338A1 (en) * 2018-09-21 2020-03-26 International Business Machines Corporation Api evolution and adaptation based on cognitive selection and unsupervised feature learning
CN111026409A (en) * 2019-10-28 2020-04-17 烽火通信科技股份有限公司 Automatic monitoring method, device, terminal equipment and computer storage medium
US20200133820A1 (en) * 2018-10-26 2020-04-30 International Business Machines Corporation Perform preemptive identification and reduction of risk of failure in computational systems by training a machine learning module
CN111158745A (en) * 2019-12-30 2020-05-15 山东浪潮商用系统有限公司 Data processing platform based on Docker
US20200250012A1 (en) * 2019-02-01 2020-08-06 Hewlett Packard Enterprise Development Lp Recommendation and deployment engine and method for machine learning based processes in hybrid cloud environments
US20200272859A1 (en) * 2019-02-22 2020-08-27 Cisco Technology, Inc. Iot fog as distributed machine learning structure search platform
CN111625316A (en) * 2020-05-15 2020-09-04 苏州浪潮智能科技有限公司 Environment deployment method and device, electronic equipment and storage medium
CN111861020A (en) * 2020-07-27 2020-10-30 深圳壹账通智能科技有限公司 Model deployment method, device, equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XU Xing; LI Yinqiao; LIU Xuefeng; MAO Jianhua: "Design and Implementation of a Rapid Deployment Scheme for Enterprise Development and Test Environments", Industrial Control Computer, no. 03 *
LUO Shenghao: "Design and Implementation of a Deep Learning Container Cloud Platform Based on Docker and Kubernetes", China Master's Theses Full-text Database, 15 January 2020 (2020-01-15), pages 11-57 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799742A (en) * 2021-02-09 2021-05-14 上海海事大学 Machine learning training system and method based on micro-service
CN112799742B (en) * 2021-02-09 2024-02-13 上海海事大学 Machine learning practical training system and method based on micro-service
CN115167292A (en) * 2021-04-12 2022-10-11 清华大学 Intelligent factory operating system based on industrial Internet architecture
CN115167292B (en) * 2021-04-12 2024-04-09 清华大学 Intelligent factory operating system based on industrial Internet architecture
CN113824790A (en) * 2021-09-23 2021-12-21 大连华信计算机技术股份有限公司 Cloud native PaaS management platform supporting enterprise-level application
CN113824790B (en) * 2021-09-23 2024-04-26 信华信技术股份有限公司 Cloud native PaaS management platform supporting enterprise-level application



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing

Applicant after: Beijing gelingshentong Information Technology Co.,Ltd.

Address before: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing

Applicant before: BEIJING DEEPGLINT INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant