CN112311605A - Cloud platform and method for providing machine learning service - Google Patents
- Publication number
- CN112311605A (application CN202011226841.6A)
- Authority
- CN
- China
- Prior art keywords
- environment
- request
- machine learning
- training
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A cloud platform and method for providing machine learning services, comprising an IaaS layer, a PaaS layer, and a SaaS layer. The IaaS layer is provided with a system support module; the PaaS layer is provided with Kubernetes and Docker; and the SaaS layer comprises a public library service module, a RESTful microservice module, an application service module, and a management and maintenance module. The public library service module is used for logging, configuration parameters, and mathematical calculation; the RESTful microservice module processes received requests and handles scheduling and lifecycle management of WEB classification tasks; the application service module displays task running states and machine learning results; and the management and maintenance module manages image resources using Harbor. With the scheme of the present application, the environment can be easily migrated, tracking experiments and deploying machine learning become straightforward, and experimental results can be reproduced.
Description
Technical Field
The present application relates to cloud computing technologies, and in particular, to a cloud platform and method for providing machine learning services.
Background
Machine learning is a very popular technology and is widely applied in fields such as security, transportation, medical treatment, finance, and retail.
Although machine learning can produce excellent results, its use is still complex in practice. Beyond the common challenges of software development, machine learning developers face new ones, including experiment management (e.g., tracking which parameters, code, and data produced a result), reproducibility (e.g., being able to execute the same code later in the same operating environment), deploying models to production, and data governance (auditing the models and data used throughout the organization). These workflow challenges surrounding the machine learning lifecycle are typically the biggest hurdles to using machine learning in a production environment and scaling it within an organization.
At present, some cloud platforms support online training services for algorithms. These platforms typically work as follows: the user applies, by clicking, for resources such as GPUs and storage and for the corresponding environment, and then writes their own code for training. The following problems remain when such cloud platforms are used for machine learning:
1. There are countless independent tools: from data preparation to model training, hundreds of software tools cover each phase of the machine learning lifecycle, and machine learning developers need to build production environments around dozens of libraries;
2. Experimental results are difficult to reproduce: model training requires a large amount of data and a purpose-built environment, so when the model is used in a user's actual production environment, the environment must be redeployed and a large amount of data obtained again before approximately the same experimental results can be achieved;
3. Tracking experiments and deploying machine learning is difficult: machine learning algorithms have dozens of configurable parameters, tracking these parameters and their values is very hard, and migrating a trained model to a production environment is very challenging.
Problems existing in the prior art:
at present, there is no cloud platform specifically for machine learning, so a user must repeat a large amount of complex work, such as data preparation and environment deployment, when using a model in different environments.
Disclosure of Invention
The examples of the present application provide a cloud platform and a method for providing machine learning services, so as to solve the above technical problems.
According to a first aspect of the examples of the present application, there is provided a cloud platform for providing machine learning services, comprising an IaaS layer, a PaaS layer, and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful microservice module, an application service module, and a management and maintenance module, wherein,
the public library service module is used for logging, configuration parameters, and mathematical calculation;
the RESTful microservice module is used for processing received requests and for scheduling and lifecycle management of WEB classification tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources using Harbor.
According to a second aspect of an example of the present application, there is provided a method for machine learning by using the cloud platform for providing a machine learning service, including:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
and using Harbor to save images of the environment and the data, respectively, according to the received environment saving request and data saving request.
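The three steps above can be sketched as a minimal pipeline. This is an illustrative stand-in only: the function names, the in-memory dictionaries playing the roles of the data center and the Harbor registry, and the stub "training" step are all hypothetical, not the platform's actual API.

```python
# Minimal sketch of the method's three steps: acquire data, train, save images.
# All names are illustrative; a real deployment would call MinIO, the GPU
# cluster scheduler, and Harbor instead of these in-memory stand-ins.

def acquire_training_set(dataset_id, data_center):
    """Step 1: fetch the training data set from the data center."""
    return data_center[dataset_id]

def train(model_params, training_set):
    """Step 2: stand-in for loading model and data onto the GPU cluster."""
    # "Training" here just records the parameters and samples seen.
    return {"params": model_params, "samples_seen": len(training_set)}

def save_images(env_name, model, registry):
    """Step 3: save environment and data images to the registry stand-in."""
    registry[f"{env_name}/env"] = {"kind": "environment"}
    registry[f"{env_name}/data"] = {"kind": "data", "model": model}
    return sorted(registry)

data_center = {"faces-v1": ["img_%03d.jpg" % i for i in range(32)]}
registry = {}

training_set = acquire_training_set("faces-v1", data_center)
model = train({"lr": 0.01, "epochs": 5}, training_set)
saved = save_images("face-recognition", model, registry)
print(saved)
```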
The cloud service here is customized for machine learning business. With the cloud platform provided in this example, once a training run is completed, the environment and the data can each be saved as image resources. Machine learning developers no longer need to redeploy the environment in subsequent production environments: the trained model can simply be migrated to production, tracking experiments and deploying machine learning become straightforward, and experimental results can be reproduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate example embodiments of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 illustrates a schematic structural diagram of a cloud platform for providing a machine learning service in an example one of the present application;
FIG. 2 is a flow chart illustrating a method for providing machine learning services in example two of the present application;
fig. 3 shows an architecture diagram of a machine learning cloud platform in example four of the present application.
Detailed Description
In order to make the technical solutions and advantages in the examples of the present application more apparent, the following further detailed description of the examples of the present application with reference to the accompanying drawings makes it clear that the described examples are only a part of the examples of the present application, and not an exhaustive list of all examples. It should be noted that the examples and features of the examples in this application may be combined with each other without conflict.
Example one
Fig. 1 shows a schematic structural diagram of a cloud platform for providing a machine learning service in an example one of the present application.
As shown, the cloud platform for providing machine learning services includes an IaaS layer, a PaaS layer, and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful microservice module, an application service module, and a management and maintenance module, wherein,
the public library service module is used for logging, configuration parameters, and mathematical calculation;
the RESTful microservice module is used for processing received requests and for scheduling and lifecycle management of WEB classification tasks;
the application service module is used for displaying task running states and machine learning results;
and the management and maintenance module is used for managing image resources using Harbor.
In a specific implementation, the WEB classification tasks may include machine learning training tasks, environment and data image-saving tasks, and the like.
The cloud service here is customized for machine learning business. With the cloud platform provided in this example, once a training run is completed, the environment and the data can each be saved as image resources. Machine learning developers no longer need to redeploy the environment in subsequent production environments: the trained model can simply be migrated to production, tracking experiments and deploying machine learning become straightforward, and experimental results can be reproduced.
In one embodiment, the RESTful microservice module comprises:
the training data storage unit is used for receiving a training data storage request and storing the training data in the training data storage request;
and the training data extraction unit is used for receiving a training data extraction request and extracting and feeding back the requested training data according to the training data extraction request.
In one embodiment, the training data saving unit is configured to receive a training data saving request and save training data in the training data saving request to a cloud data center.
In one embodiment, the RESTful microservice module comprises:
the environment storage unit is used for receiving an environment storage request and storing an environment mirror image in the environment storage request;
and the environment extraction unit is used for receiving an environment extraction request and extracting and returning the requested environment according to the environment extraction request.
In one embodiment, the environment saving unit is configured to receive an environment saving request and save an environment image in the environment saving request to a local training center.
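A minimal sketch of how the four units above could be separated, with plain dictionaries standing in for the cloud data center and the local training center. The class and method names are hypothetical illustrations, not the platform's actual interface.

```python
# Hypothetical sketch: training data goes to the cloud data center,
# environment images go to the local training center, as described above.

class RestfulMicroservice:
    def __init__(self):
        self.cloud_data_center = {}      # stand-in for MinIO / cloud storage
        self.local_training_center = {}  # stand-in for the local image store

    # training data saving / extraction units
    def save_training_data(self, request):
        self.cloud_data_center[request["name"]] = request["data"]
        return {"status": "saved", "location": "cloud_data_center"}

    def extract_training_data(self, request):
        return self.cloud_data_center[request["name"]]

    # environment saving / extraction units
    def save_environment(self, request):
        self.local_training_center[request["name"]] = request["image"]
        return {"status": "saved", "location": "local_training_center"}

    def extract_environment(self, request):
        return self.local_training_center[request["name"]]

svc = RestfulMicroservice()
svc.save_training_data({"name": "faces-v1", "data": [b"img0", b"img1"]})
svc.save_environment({"name": "tf-train-env", "image": "sha256:abc123"})
print(svc.extract_environment({"name": "tf-train-env"}))
```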
In one embodiment, the administration and maintenance module includes:
the image making unit is used for building Docker images; specifically, this may refer to packaging a training environment or the code blocks required for machine learning into an image file.
The publishing unit is used for publishing Docker images; specifically, the platform centrally manages all built images, and a user can select the image file they need, so that their machine learning training environment can be deployed quickly.
The monitoring unit is used for monitoring Kubernetes, Docker, and microservice resources; specifically, the monitoring unit may monitor Kubernetes, Docker, and microservice resources in real time to ensure efficient operation of the whole cluster. If a machine learning training environment crashes, this embodiment learns of the crash through the monitoring unit and quickly restores the environment, so that the related data are not lost and high service availability is ensured.
And the orchestration unit is used for orchestrating resources. Specifically, machine learning training requires a large amount of computing resources, including CPU, GPU, memory, disk, network, and so on. Orchestration allocates these resources within the cluster so that all users can make better use of the cluster's elastic computing resources.
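The orchestration unit's core decision — place a task on a host whose free resources satisfy its CPU/GPU/memory request — can be sketched as follows. The host names, resource figures, and first-fit policy are illustrative assumptions; a real platform delegates this to the Kubernetes scheduler.

```python
# Hypothetical sketch of the orchestration unit: pick the first host whose
# free resources satisfy the task's request, then reserve those resources.

HOSTS = {
    "node-1": {"cpu": 16, "gpu": 0, "mem_gb": 64},
    "node-2": {"cpu": 32, "gpu": 4, "mem_gb": 128},
}

def schedule(task, hosts):
    """Return the name of the first host with enough free resources, else None."""
    for name, free in sorted(hosts.items()):
        if all(free.get(k, 0) >= v for k, v in task["needs"].items()):
            for k, v in task["needs"].items():
                free[k] -= v  # reserve resources on the chosen host
            return name
    return None

task = {"name": "train-faces", "needs": {"cpu": 8, "gpu": 2, "mem_gb": 32}}
placed_on = schedule(task, HOSTS)
print(placed_on)
```

node-1 is skipped here because it has no GPUs; a real scheduler would also weigh affinity, fragmentation, and fairness rather than using plain first-fit.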
Example two
Based on the same inventive concept, the application example provides a method for machine learning by using the cloud platform for providing the machine learning service as in the first example.
Fig. 2 is a flowchart illustrating a method for providing a machine learning service according to example two of the present application.
As shown, the method for providing a machine learning service includes:
Step 201, acquiring a training data set;
in one embodiment, the training data set is stored on disks of the IaaS layer. This example manages training sets through the SaaS service MinIO and calls the public library service module to operate MinIO, so as to obtain the corresponding training data set.
Step 202, loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
in one implementation, resources such as a GPU, a CPU and a storage are firstly distributed on an IaaS layer, then a mirror image is loaded to the IaaS through a kubernets public service module, and then a user can check and use the algorithm training environment of the user through RESTful service.
And step 203, using Harbor to save images of the environment and the data, respectively, according to the received environment saving request and data saving request.
In one implementation, when the RESTful service receives a user's request to save a training environment, the Docker public service module is called to pack an image; after packing is completed, the Harbor basic service is called to upload the image to the master node of the Harbor service.
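The pack-and-upload step amounts to composing a fully qualified image reference and pushing it to the registry's master node. A minimal sketch, in which the registry host name `harbor.example.com`, the project name, and the in-memory "master node" list are all made-up placeholders:

```python
# Hypothetical sketch: after a training environment is packed as an image,
# build the registry/project/name:tag reference and record the push.

def harbor_reference(registry, project, name, tag):
    """Compose a fully qualified image reference."""
    return f"{registry}/{project}/{name}:{tag}"

def push(reference, store):
    """Stand-in for `docker push`: record the reference at the master node."""
    store.append(reference)
    return reference

master_node = []
ref = harbor_reference("harbor.example.com", "ml-envs", "face-train-env", "v1")
push(ref, master_node)
print(master_node)
```

In a real deployment the same reference format is what `docker tag` and `docker push` operate on, with Harbor's project acting as the middle path segment.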
The cloud service here is customized for machine learning business. With the cloud platform provided in this example, the environment and the data can each be saved after one training run; machine learning developers can simply migrate the trained model to the production environment without redeploying the environment there, tracking experiments and deploying machine learning become straightforward, and experimental results can be reproduced.
In one embodiment, saving images of the environment and the data according to the received environment saving request and data saving request respectively includes:
saving the environment image to a local training center according to the received environment saving request;
and saving the data image to a cloud data center according to the received data saving request.
In one embodiment, the method further comprises:
extracting a requested environment according to the received environment extraction request;
and loading the extracted environment into an algorithm service by utilizing a RESTful micro-service framework, generating an application program interface and providing the application program interface for an application service module.
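The step of loading an extracted environment into an algorithm service and generating an application program interface can be sketched with a tiny route table standing in for the RESTful microservice framework. The route path, handler, and environment handle below are hypothetical illustrations, not the platform's real API.

```python
# Hypothetical sketch: once an environment image is extracted, register an
# API route for it so the application service module can call the algorithm.

routes = {}

def register(path):
    """Minimal stand-in for a Flask-style @app.route decorator."""
    def wrap(fn):
        routes[path] = fn
        return fn
    return wrap

def load_environment(image_ref):
    """Pretend to start the extracted environment and return a handle."""
    return {"image": image_ref, "ready": True}

env = load_environment("harbor.example.com/ml-envs/face-train-env:v1")

@register("/api/v1/predict")
def predict(request):
    # a real service would forward the request into the running environment
    return {"env": env["image"], "label": "face", "input": request["data"]}

response = routes["/api/v1/predict"]({"data": "photo.jpg"})
print(response["label"])
```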
In one embodiment, the context save request is from a first terminal and the context extract request is from a second terminal.
Example three
The machine learning cloud platform provided in this embodiment adopts a cloud computing architecture design. The system supports deployment on cloud computing IaaS-layer services; the PaaS layer adopts the Kubernetes + Docker application mode; and the SaaS layer comprises public library services, a RESTful microservice framework (microservice core load, HTTP API), application services + WEB, management and maintenance, and so on.
In particular, the method comprises the following steps of,
public library service: including basic functions such as logging, configuration, mathematical calculations, etc.;
RESTful microservice framework: based on the Flask framework, it is used to unify the microservice interfaces, decouple them from the business services, and unify the RESTful Application Program Interfaces (APIs). Specifically, it handles received requests and the scheduling and lifecycle management of WEB classification tasks.
Management and maintenance: Harbor manages the image resources, including Docker image building and publishing; monitoring of Kubernetes, Docker, and microservice resources; and resource orchestration.
WEB application: UI display of task running states and machine learning results, as well as resource monitoring displays.
This cloud service is customized for machine learning business and provides functions and services such as AI training, AI online services, training model management, training environment management, and GPU resource management, realizing one-stop hosting of machine learning tasks. It is broadly applicable to common machine learning business scenarios such as image recognition and audio/video processing. In particular, it offers the following advantages:
1. Hybrid cloud resource management: computing resources such as the GPUs of different IDCs are managed precisely.
Existing cloud platforms can only manage resources such as CPU, memory, and disk, and cannot manage GPUs properly. Algorithm training depends heavily on the GPU, and different algorithms place different demands on it. In this embodiment, GPU resources are scheduled uniformly through Kubernetes, so that they are managed precisely.
2. For training, users can scale GPU, storage, and other resources themselves, customize SSD local storage configurations, and expand to various cloud storage types.
In this embodiment, computer resources are allocated and scheduled through Kubernetes, and a bridge between users and resources is built through the public service module and the RESTful service. The user can click and select the needed resource configuration through the front-end service in the browser.
3. The training service encapsulates training algorithms in Docker images. A user can upload a custom algorithm image to the DGnet image center, from which the training service pulls the training image; the image center provides base image templates for AI frameworks such as TensorFlow, MXNet, Keras, and Caffe.
Algorithm training often requires different machine learning environments such as TensorFlow, Caffe, and MXNet, which are not installed on a Linux system by default. To reduce environment deployment time, this embodiment builds some basic algorithm training environments with Docker, which users can use directly. Existing cloud platforms only support installing a Linux system and do not deploy the related algorithm model environments.
4. The training service provides one-stop hosted training and supports both distributed and interactive AI training tasks. The platform implements GPU node scheduling, training data upload and download, task disaster tolerance, and other functions, with high availability.
Traditional cloud computing platforms do not customize for algorithm training; this embodiment customizes for AI algorithm training, including providing AI-related images, GPU node scheduling, and so on.
Example four
In order to facilitate the implementation of the present application, the present application is illustrated by a specific example.
Fig. 3 shows an architecture diagram of a machine learning cloud platform in example four of the present application.
As shown, the entire user workflow is presented. The flow in Fig. 3 is transparent to the user: it is the pipeline invoked by the program and the basic services that must be called throughout the service in this example, executed in the order of the three flows shown in the figure. In the process, basic services of the data center, the training center, and the integrated service center are called to schedule the related resources.
The machine learning cloud platform provided in this embodiment comprises a data center, a training center, and an integrated service center, all supported by the system and deployed on cloud computing IaaS-layer services; the PaaS layer of the cloud platform adopts the Kubernetes + Docker application mode, and its SaaS layer is a microservice machine learning system.
Suppose a face recognition model is to be built. User A uploads a number of face pictures, obtained from the public security department or through other channels, to the cloud platform provided in this example as training data, and the platform stores the training data in the data center.
User A then builds a training environment, including determining the number of GPUs, the storage resources, the algorithm used for training, and so on. The program performs resource scheduling in the training center module and schedules the user's task instance onto a host machine that meets the requirements. Training is performed after the instance starts. After training is completed, a face recognition model is obtained; user A clicks "save environment", and an image of the environment is saved to the training center. Images are saved and deleted through the Docker SDK, and RBAC management of images is handled through the integrated Harbor SDK.
When user B wants to use the face recognition model, a request can be sent to the cloud platform; the platform extracts the image of the face recognition model's environment and provides it to user B's client. The user can select the needed instance configuration and image through the list options of the Web front end. The back-end program packages the user's requirements into a YAML configuration file, and the scheduling system then performs scheduling and configuration according to that file.
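The packaging of user B's front-end selections into a configuration file can be sketched as building a Kubernetes-style pod spec. The field names follow the usual Pod schema (the `nvidia.com/gpu` resource key is the common Kubernetes convention for GPU requests), but the image reference and resource amounts are illustrative; the sketch serializes to JSON only because the standard library has no YAML module.

```python
import json

# Hypothetical sketch: package a user's selections (image, GPU count,
# memory) into a Kubernetes-style pod spec for the scheduling system.

def build_spec(name, image, gpus, mem_gb):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "train",
                "image": image,
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": gpus,  # GPU request, K8s convention
                        "memory": f"{mem_gb}Gi",
                    }
                },
            }]
        },
    }

spec = build_spec("face-train", "harbor.example.com/ml-envs/tf:v1", 2, 32)
print(json.dumps(spec, indent=2))  # in practice this would be emitted as YAML
```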
The main functional modules of the system comprise: public library services, RESTful microservice frameworks (microservice core load, HTTP API), application services + WEB, administrative maintenance, etc.
Public library service: including basic functions such as logging, configuration, mathematical calculations, etc.;
RESTful microservice framework: based on the Flask framework, it mainly unifies the microservice interfaces, decouples them from the business services, and unifies the RESTful API.
Management and maintenance: Harbor manages the image resources, mainly including Docker image building and publishing; monitoring of Kubernetes, Docker, and microservice resources; and resource orchestration.
WEB application: if the microservice is regarded as the deep (back-end) server, this part comprises the shallow application server and the WEB client. The server side processes requests and handles scheduling and lifecycle management of WEB classification tasks. The WEB end displays task running states and machine learning results in the UI, and also displays resource monitoring.
As will be appreciated by one skilled in the art, examples of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the examples of the present application can be implemented in various computer languages, for example, the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to examples of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred examples of the present application have been described, additional variations and modifications in those examples may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A cloud platform that provides machine learning services, comprising: an IaaS layer, a PaaS layer and a SaaS layer, wherein the IaaS layer is provided with a system support module, the PaaS layer is provided with Kubernetes and Docker, and the SaaS layer comprises a public library service module, a RESTful micro-service module, an application service module and a management and maintenance module, wherein,
the public library service module is used for recording logs, configuration parameters and mathematical calculation;
the RESTful micro-service module is used for processing the received request, scheduling and life cycle management of the WEB classification task;
the application service module is used for displaying the task running state and the machine learning result;
and the management and maintenance module is used for managing mirror image resources by using Harbor.
2. The cloud platform of claim 1, wherein said RESTful microservices module comprises:
the training data storage unit is used for receiving a training data storage request and storing the training data in the training data storage request;
and the training data extraction unit is used for receiving a training data extraction request and extracting and feeding back the requested training data according to the training data extraction request.
3. The cloud platform of claim 2, wherein the training data saving unit is configured to receive a training data saving request and save training data in the training data saving request to a cloud data center.
4. The cloud platform of claim 1, wherein said RESTful microservices module comprises:
the environment storage unit is used for receiving an environment storage request and storing an environment mirror image in the environment storage request;
and the environment extraction unit is used for receiving an environment extraction request and extracting and feeding back the requested environment according to the environment extraction request.
5. The cloud platform of claim 4, wherein the environment saving unit is configured to receive an environment saving request and save an environment image in the environment saving request to a local training center.
6. The cloud platform of claim 1, wherein the administration and maintenance module comprises:
the mirror image manufacturing unit is used for Docker mirror image manufacturing;
the release unit is used for releasing the Docker mirror image;
the monitoring unit is used for monitoring Kubernetes, Docker and micro-service resources;
and the arranging unit is used for arranging the resources.
7. A method for machine learning using the cloud platform for providing machine learning services according to any one of claims 1 to 6, comprising:
acquiring a training data set;
loading the training data set and a predetermined initial algorithm model into a predetermined GPU cluster, and training to obtain a trained algorithm model;
and respectively carrying out mirror image storage on the environment and the data by utilizing a Harbor according to the received environment storage request and the data storage request.
8. The method of claim 7, wherein mirroring the context and the data according to the received context save request and the data save request respectively comprises:
saving the environment mirror image to a local training center according to the received environment saving request;
and storing the data mirror image to a cloud data center according to the received data storage request.
9. The method of claim 7, further comprising:
extracting a requested environment according to the received environment extraction request;
and loading the extracted environment into an algorithm service by utilizing a RESTful micro-service framework, generating an application program interface and providing the application program interface for an application service module.
10. The method of claim 9, wherein the context save request is from a first terminal and the context extract request is from a second terminal.
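For illustration only (not part of the claims), the method flow of claims 7 to 9 — train a model, save the environment image, extract it, and expose the trained model through a generated application program interface — can be sketched as follows. The `ImageRegistry` class and the `train` function below are stand-in stubs assumed for the sketch, not Harbor or a real GPU cluster:

```python
# Sketch of the claimed flow: train, save an environment image to a
# registry stub, extract it, and serve predictions through an API.

class ImageRegistry:
    """Stand-in for a Harbor-style mirror-image registry."""

    def __init__(self):
        self._images = {}

    def save(self, name, image):
        # Handles an environment/data save request.
        self._images[name] = image

    def extract(self, name):
        # Handles an environment extraction request.
        return self._images[name]


def train(dataset, initial_model):
    # Placeholder for training on a GPU cluster: the "model" simply
    # memorises the majority label of the training data set.
    labels = [label for _, label in dataset]
    majority = max(set(labels), key=labels.count)
    return {**initial_model, "predict": lambda _x, m=majority: m}


registry = ImageRegistry()
dataset = [([0.1], "cat"), ([0.9], "dog"), ([0.8], "dog")]
model = train(dataset, {"name": "demo"})
registry.save("env:v1", {"model": model})      # environment save request
served = registry.extract("env:v1")["model"]   # environment extract request


def api(features):
    """Generated application program interface for the application service module."""
    return {"prediction": served["predict"](features)}
```

In the platform of claim 1, `registry.save` would push to Harbor and `api` would be generated by the RESTful micro-service framework; both are simplified here.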
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011226841.6A CN112311605B (en) | 2020-11-06 | 2020-11-06 | Cloud platform and method for providing machine learning service |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112311605A true CN112311605A (en) | 2021-02-02 |
CN112311605B CN112311605B (en) | 2023-12-22 |
Family
ID=74326202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011226841.6A Active CN112311605B (en) | 2020-11-06 | 2020-11-06 | Cloud platform and method for providing machine learning service |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112311605B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112799742A (en) * | 2021-02-09 | 2021-05-14 | 上海海事大学 | Machine learning training system and method based on micro-service
CN112799742B (en) * | 2021-02-09 | 2024-02-13 | 上海海事大学 | Machine learning practical training system and method based on micro-service
CN115167292A (en) * | 2021-04-12 | 2022-10-11 | 清华大学 | Intelligent factory operating system based on industrial Internet architecture
CN115167292B (en) * | 2021-04-12 | 2024-04-09 | 清华大学 | Intelligent factory operating system based on industrial Internet architecture
CN113824790A (en) * | 2021-09-23 | 2021-12-21 | 大连华信计算机技术股份有限公司 | Cloud native PaaS management platform supporting enterprise-level application
CN113824790B (en) * | 2021-09-23 | 2024-04-26 | 信华信技术股份有限公司 | Cloud native PaaS management platform supporting enterprise-level application
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140075035A1 (en) * | 2012-09-07 | 2014-03-13 | Oracle International Corporation | System and method for providing java cloud services for use with a cloud computing environment |
CN105654066A (en) * | 2016-02-02 | 2016-06-08 | 北京格灵深瞳信息技术有限公司 | Vehicle identification method and device |
US20160371127A1 (en) * | 2015-06-19 | 2016-12-22 | Vmware, Inc. | Resource management for containers in a virtualized environment |
CN107659609A (en) * | 2017-07-26 | 2018-02-02 | 北京天云融创软件技术有限公司 | A kind of deep learning support platform and deep learning training method based on cloud computing |
CN107704252A (en) * | 2017-10-20 | 2018-02-16 | 北京百悟科技有限公司 | A kind of method and system for providing a user artificial intelligence platform |
CN108170520A (en) * | 2018-01-29 | 2018-06-15 | 北京搜狐新媒体信息技术有限公司 | A kind of cloud computing resources management method and device |
CN109144724A (en) * | 2018-07-27 | 2019-01-04 | 众安信息技术服务有限公司 | A kind of micro services resource scheduling system and method |
US20190171966A1 (en) * | 2017-12-01 | 2019-06-06 | Govindarajan Rangasamy | Automated application reliability management using adaptable machine learning models |
CN109885389A (en) * | 2019-02-19 | 2019-06-14 | 山东浪潮云信息技术有限公司 | A kind of parallel deep learning scheduling training method and system based on container |
CN109961151A (en) * | 2017-12-21 | 2019-07-02 | 同方威视科技江苏有限公司 | For the system for calculating service of machine learning and for the method for machine learning |
CN110058922A (en) * | 2019-03-19 | 2019-07-26 | 华为技术有限公司 | A kind of method, apparatus of the metadata of extraction machine learning tasks |
CN110245003A (en) * | 2019-06-06 | 2019-09-17 | 中信银行股份有限公司 | A kind of machine learning uniprocessor algorithm arranging system and method |
CN110413294A (en) * | 2019-08-06 | 2019-11-05 | 中国工商银行股份有限公司 | Service delivery system, method, apparatus and equipment |
CN110795072A (en) * | 2019-10-16 | 2020-02-14 | 北京航空航天大学 | Crowd-sourcing competition platform framework system and method based on crowd intelligence |
US20200089651A1 (en) * | 2018-09-14 | 2020-03-19 | Microsoft Technology Licensing, Llc | Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system |
US20200097338A1 (en) * | 2018-09-21 | 2020-03-26 | International Business Machines Corporation | Api evolution and adaptation based on cognitive selection and unsupervised feature learning |
CN111026409A (en) * | 2019-10-28 | 2020-04-17 | 烽火通信科技股份有限公司 | Automatic monitoring method, device, terminal equipment and computer storage medium |
US20200133820A1 (en) * | 2018-10-26 | 2020-04-30 | International Business Machines Corporation | Perform preemptive identification and reduction of risk of failure in computational systems by training a machine learning module |
CN111158745A (en) * | 2019-12-30 | 2020-05-15 | 山东浪潮商用系统有限公司 | Data processing platform based on Docker |
US20200250012A1 (en) * | 2019-02-01 | 2020-08-06 | Hewlett Packard Enterprise Development Lp | Recommendation and deployment engine and method for machine learning based processes in hybrid cloud environments |
US20200272859A1 (en) * | 2019-02-22 | 2020-08-27 | Cisco Technology, Inc. | Iot fog as distributed machine learning structure search platform |
CN111625316A (en) * | 2020-05-15 | 2020-09-04 | 苏州浪潮智能科技有限公司 | Environment deployment method and device, electronic equipment and storage medium |
CN111861020A (en) * | 2020-07-27 | 2020-10-30 | 深圳壹账通智能科技有限公司 | Model deployment method, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
徐星; 李银桥; 刘学锋; 毛建华: "Design and Implementation of a Rapid Deployment Scheme for Enterprise Development and Test Environments" (企业开发、测试环境快速部署方案的设计与实现), 工业控制计算机, no. 03 *
罗晟皓: "Design and Implementation of a Deep Learning Container Cloud Platform Based on Docker and Kubernetes" (基于Docker和Kubernetes的深度学习容器云平台的设计与实现), 《中国优秀硕士学位论文全文数据库》, 15 January 2020 (2020-01-15), pages 11-57 *
Also Published As
Publication number | Publication date |
---|---|
CN112311605B (en) | 2023-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11086661B2 (en) | Container chaining for automated process completion | |
US9661071B2 (en) | Apparatus, systems and methods for deployment and management of distributed computing systems and applications | |
US10635406B2 (en) | Determining the identity of software in software containers | |
US9971593B2 (en) | Interactive content development | |
US10048955B2 (en) | Accelerating software builds | |
US20180088926A1 (en) | Container image management using layer deltas | |
Yang et al. | A profile-based approach to just-in-time scalability for cloud applications | |
JP2019215877A (en) | Visual content development | |
CN111527474B (en) | Dynamic delivery of software functions | |
CN112104723B (en) | Multi-cluster data processing system and method | |
US20180246753A1 (en) | Program execution without the use of bytecode modification or injection | |
CN112311605B (en) | Cloud platform and method for providing machine learning service | |
CN113791765B (en) | Resource arrangement method, device and equipment of cloud service and storage medium | |
US11288232B2 (en) | Database deployment objects and deterministic locking models | |
US9934019B1 (en) | Application function conversion to a service | |
US20150378689A1 (en) | Application instance staging | |
CN110019059B (en) | Timing synchronization method and device | |
CN113326098B (en) | Cloud management platform supporting KVM virtualization and container virtualization | |
US11543945B1 (en) | Accurate local depiction of preview of a program window included in a remote graphical desktop | |
KR101838944B1 (en) | Rendering system and method | |
US9866451B2 (en) | Deployment of enterprise applications | |
US11202130B1 (en) | Offline video presentation | |
US11893403B1 (en) | Automation service | |
US20230176839A1 (en) | Automatic management of applications in a containerized environment | |
US11314718B2 (en) | Shared disk buffer pool update and modification |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing; Applicant after: Beijing gelingshentong Information Technology Co.,Ltd. Address before: 100192 Block B, Building 1, Tiandi Adjacent to Maple Industrial Park, No. 1, North Yongtaizhuang Road, Haidian District, Beijing; Applicant before: BEIJING DEEPGLINT INFORMATION TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |