CN108427992B - Machine learning training system and method based on edge cloud computing - Google Patents


Publication number
CN108427992B
Authority
CN
China
Prior art keywords
model
training
data
machine learning
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810219917.9A
Other languages
Chinese (zh)
Other versions
CN108427992A (en)
Inventor
张小纳
董艳敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANDONG YIRAN INFORMATION TECHNOLOGY Co.,Ltd.
Original Assignee
Jinan Feixiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Feixiang Information Technology Co ltd
Priority to CN201810219917.9A
Publication of CN108427992A
Application granted
Publication of CN108427992B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine learning training system and method based on edge cloud computing. User terminals continuously acquire new data to expand and optimize the training samples; the machine learning platform trains and outputs a model and rapidly releases it to the user terminals; the user terminals collect data on how the model performs in application and feed it back to the machine learning platform, which retrains the model. This forms a virtuous circle that continuously improves the training capability of the machine learning platform and the accuracy of the model. At the same time, computing resources are brought closer to users, and the resource pooling and resource management capabilities of the cloud computing platform enable rapid deployment and unified management of the machine learning platform.

Description

Machine learning training system and method based on edge cloud computing
Technical Field
The invention relates to a machine learning training system and method based on edge cloud computing.
Background
There are currently many machine learning platforms, and setting up and maintaining them is tedious. How to quickly set up a machine learning training platform and quickly start a training task has become an urgent problem. A cloud computing platform can be used to solve it.
Cloud computing generally refers to an infrastructure platform based on large-scale data centers. The cloud is formed by building centralized large-scale data centers and computing centers, and it provides the resources and services that users require; such a large-scale data center is therefore also called a core cloud. Edge cloud computing is an open cloud platform that integrates core network, computing, storage and application capabilities on the side close to the object or data source, and provides services at the nearest end.
An edge cloud is composed of server nodes distributed in the same region. It handles the service requests of local users and provides them with cloud computing services quickly and flexibly. Edge clouds are connected to one another through a backbone network; a user logs in over the network to the geographically nearest edge cloud and uses the services it provides. On the one hand, the edge cloud processes the data flow between the core cloud and the user terminal and, by exploiting the correlation between communication data, reduces network overhead and latency and guarantees the quality of cloud computing services; on the other hand, the edge cloud stores the data frequently accessed by terminals and the common data required by cloud computing services.
For machine learning, a training task needs high-density computing resources. The computing resources of the core cloud are far from where users generate data and apply models, so user requirements cannot be answered quickly. An edge cloud brings the computing resources needed for training closer to the user terminal, so that a trained model can be released to the terminal more conveniently and the terminal can report how the model is used in a timely manner.
In addition, the models produced by most current machine learning platforms cannot be rapidly released and applied, and user feedback cannot be obtained quickly to correct the models in time. As a result, users cannot quickly experience the benefits of machine learning.
In summary, the training methods of most current machine learning platforms have the following disadvantages:
(1) the number of machine learning platforms is large, the deployment and implementation processes are complex, the maintenance cost is high, and unified management cannot be realized;
(2) the computing resources of the training platform are far away from the user data and cannot be close to the user;
(3) the training model cannot be quickly released to the end user;
(4) the application effect of the training model cannot be timely acquired and fed back to the training platform for training again.
To address these shortcomings, the invention provides a machine learning training method based on edge cloud computing, which solves the problems of complex deployment and lack of unified management of machine learning platforms, of bringing computing resources close to users, of rapidly releasing trained models, and of collecting user feedback for retraining.
Disclosure of Invention
In order to solve the problems, the invention provides a machine learning training system and method based on edge cloud computing.
Firstly, the invention provides a machine learning training system based on edge cloud computing, in which user terminals continuously acquire new data to expand and optimize the training samples, the machine learning platform trains and outputs a model and rapidly releases it to the user terminals, and the user terminals collect data on how the model performs in application and feed it back to the machine learning platform to complete the retraining of the model. This forms a virtuous circle that continuously improves the training capability of the machine learning platform and the accuracy of the model.
Meanwhile, the invention provides a machine learning training method based on edge cloud computing, which uses an edge cloud computing platform supporting virtualization and container technology to provide the computing resources required for training, so that the computing resources are closer to users, and which relies on the resource pooling and resource management capabilities of the cloud computing platform to achieve rapid deployment and unified management of the machine learning platform.
In order to achieve the purpose, the invention adopts the following technical scheme:
a machine learning training system based on edge cloud computing comprises an edge cloud computing system and a terminal, wherein:
the edge cloud computing system comprises a machine learning platform and a cloud platform, wherein the cloud platform simultaneously manages a virtualization platform and a container platform, the virtualization platform realizes pooling of resources through a virtualization technology and elastic allocation of the resources, so that the utilization rate of infrastructure is improved, and the container platform realizes decoupling of the machine learning platform and hardware resources by using the container technology;
the machine learning platform is configured to execute the training tasks, and specifically comprises a model production device and a model feedback device, wherein the model production device is configured to manage the training scheduling, verification, archiving, publishing, subscription and updating of models; the model feedback device is configured to collect, analyze and feed back data from the production environment;
and the terminal receives the training model, collects the training result and feeds the training result back to the machine learning platform.
Further, the model production device comprises a training scheduling module, a model verification module, a model archiving module, a model publishing module, a user subscription module, an update decision module and a sample management module, wherein:
the training scheduling module arranges and schedules the training tasks and ensures that they run effectively;
the model verification module performs local verification before a model is archived and prevents invalid models from being output;
the model archiving module provides querying, retrieval, storage, backup, deletion, destruction and classified management of model files;
the model publishing module rapidly publishes the trained model and distributes it to each user terminal;
the user subscription module lets user terminals subscribe to models, so that models are distributed according to user demand and unnecessary transmission is avoided;
the update decision module, after receiving feedback from users, decides whether the model needs to be updated and, if so, notifies the training module to create a new training task;
and the sample management module manages the sampling, storage, expansion, optimization and updating of samples.
Further, the model feedback device comprises a collection module, an analysis module and a feedback module, wherein:
the collection module collects data on the actual effect of the model in application and completes the data acquisition on the terminal;
the analysis module is responsible for cleaning, filtering, analyzing and summarizing the data, generating samples and completing data preparation;
the feedback module is responsible for adding the initial samples output by the analysis module to the formal sample set after they pass verification, and for feeding the data back to the update decision module of the model production device, which decides whether the model needs to be updated.
Furthermore, the edge cloud computing system is located at the near end of the terminal and responds quickly to changes in data.
The training method based on the system comprises the following steps:
after a training task starts, retrieving the corresponding samples from the sample set;
applying for container resources, and starting training after initial configuration;
after training finishes, outputting a model;
verifying the model on the test sample set; if the model is valid, archiving it, and otherwise ending the training;
after archiving, starting the publishing process, distributing the model to the corresponding terminals, and receiving the accuracy and performance data they feed back;
filtering and analyzing the feedback data, screening out the valid data, and adding it to the sample set;
deciding whether to start a model update task by analyzing the accuracy data and the performance data; if an update is needed, starting a new training task, and otherwise ending the feedback process.
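The final decision step can be illustrated with a minimal Python sketch. The patent does not state how accuracy and performance data are weighed, so the record fields (`accuracy`, `latency_ms`), the averaging, and both thresholds below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    """One feedback record reported by a terminal (field names are hypothetical)."""
    accuracy: float    # fraction of correct predictions observed on the terminal
    latency_ms: float  # observed inference latency, used here as the performance metric


def needs_update(records, min_accuracy=0.90, max_latency_ms=200.0):
    """Return True when the aggregated feedback suggests retraining.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    if not records:
        return False  # no feedback yet, so there is nothing to decide on
    avg_accuracy = mean(r.accuracy for r in records)
    avg_latency = mean(r.latency_ms for r in records)
    return avg_accuracy < min_accuracy or avg_latency > max_latency_ms
```

With these assumed thresholds, a terminal reporting 80% accuracy would trigger a new training task, while feedback within both limits would end the feedback process.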
The training task scheduling method based on the system comprises the following steps:
summarizing all currently received training tasks, including timed tasks, temporary tasks, subscription tasks and update tasks;
evaluating the priority of the tasks and sorting them;
calculating the time cost of all tasks and evaluating the resource requirements of all tasks;
evaluating the currently available resources by collecting the container resources in the cloud platform and calculating the amount available;
arranging the tasks according to priority, time cost, resource requirements and available resources;
after the tasks are arranged, applying for container resources in that order and scheduling the training tasks.
Further, the priorities, from highest to lowest, are: subscription tasks, temporary tasks, update tasks and timed tasks.
Further, the time cost evaluation parameters include, but are not limited to: number of model parameters, sample set size, task type, and historical time cost.
Further, the resource demand evaluation parameters include, but are not limited to: model size, sample set size, and historical demand.
The training model issuing method based on the system comprises the following steps:
firstly, querying the list of users subscribed to the model;
sorting the users by priority, the sorting parameters including user level, model usage frequency and online state;
and querying the authorization that each terminal has granted to the machine learning platform; pushing the model directly if the authorization permits direct updating, and only sending a model update notification to the terminal if the authorization permits notifications only.
The feedback and update implementation method based on the system comprises the following steps:
firstly, collecting accuracy data, performance data and usage frequency data of the model in application;
pushing the collected data for analysis, filtering it for integrity and validity, and then classifying, aggregating and statistically analyzing it;
according to the analysis result, using part of the data to generate a user usage report and the usable part to generate sample data;
and feeding back the user report directly, verifying the sample data against the requirements for formal samples, adding the data that passes to the sample set, and finally feeding the data back.
Compared with the prior art, the invention has the beneficial effects that:
1. The method can be applied to an edge cloud that is located at the end nearest to the user data and has strong computing resources, so that a machine learning platform based on the edge cloud can train models in the environment closest to the user. This solves the problems of long distance between user data and computing resources, high data transmission cost and delayed response. Meanwhile, centralized management of the deep learning platform is achieved by using the resource management function of the edge cloud computing system.
2. The training method provided by the invention implements a model production device, which schedules training tasks in a unified way, completes the training, verification and archiving of models, and releases models quickly so that users can experience them quickly. This fundamentally solves the problems that models are updated slowly and cannot be adapted quickly to user scenarios.
3. The training method implements a feedback device, which continuously collects performance data of the model in application and feeds it back to the machine learning platform, forming a virtuous cycle of continuous model optimization.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic view of an edge cloud of the present invention;
FIG. 2 is a machine learning platform deployment diagram of the present invention;
FIG. 3 is an overall flow chart of the training method of the present invention;
FIG. 4 is a schematic view of a model production device of the present invention;
FIG. 5 is a schematic view of a model feedback device of the present invention;
FIG. 6 is a training task scheduling flow diagram of the present invention;
FIG. 7 is a flow diagram of model release in accordance with the present invention;
FIG. 8 is a flow chart of the feedback and update of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
In the present invention, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", "bottom", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only terms of relationships determined for convenience of describing structural relationships of the parts or elements of the present invention, and are not intended to refer to any parts or elements of the present invention, and are not to be construed as limiting the present invention.
In the present invention, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be determined according to specific situations by persons skilled in the relevant scientific or technical field, and are not to be construed as limiting the present invention.
As described in the background, existing machine learning platforms suffer from several problems: the deployment and implementation process is complicated, maintenance cost is high, unified management is not possible, the platform cannot be close to the user, and trained models cannot be rapidly released to end users.
As shown in fig. 1, the invention uses an edge cloud computing system to manage the machine learning platform. The edge cloud computing system supports both virtualization and container technology and can manage virtual machines and containers at the same time. It manages the various services of the machine learning platform in a unified way, which effectively reduces the routine work of deployment, operation and maintenance, upgrading and reconstruction. The non-core services of machine learning run on the virtualization platform, which pools resources through virtualization technology and allocates them elastically, thereby improving infrastructure utilization. The core services (including but not limited to the model training service) run on the container platform, which uses container technology to decouple the machine learning platform from the hardware resources; this not only supports a variety of machine learning platforms but also improves the utilization of GPU hardware resources and reduces loss, ensuring that GPU resources are effectively allocated to the core services. Meanwhile, the edge cloud computing system is located at the near end of the user terminal, so computing resources are closer to the data, which effectively reduces data transmission cost and allows changes in the data to be answered quickly.
Specifically, the machine learning platform comprises a machine learning training platform, a model production device and a model feedback device. The training platform supports the currently popular frameworks (including but not limited to TensorFlow and Caffe) and is used to execute the training tasks. The model production device manages the training scheduling, verification, archiving, publishing, subscription and updating of models. The model feedback device collects, analyzes and feeds back data from the production environment.
In the model production device, as shown in fig. 4, the training scheduling module arranges and schedules the training tasks and ensures that they run effectively. The model verification module performs local verification before a model is archived and prevents invalid models from being output. The model archiving module provides querying, retrieval, storage, backup, deletion, destruction and classified management of model files. The model publishing module rapidly publishes the trained model and distributes it to each user terminal. The user subscription module lets user terminals subscribe to models, so that models are distributed according to user demand and unnecessary transmission is avoided. The update decision module, after receiving feedback from users, decides whether the model needs to be updated and, if so, notifies the training module to create a new training task. The sample management module manages the sampling, storage, expansion, optimization and updating of samples. The model production device manages models effectively, ensures that they are distributed in time, and ensures that the latest model is put into use promptly.
In the model feedback device, as shown in fig. 5, the collection module collects data on the actual effect of the model in application and completes the data acquisition on the user terminal. The analysis module is responsible for cleaning, filtering, analyzing and summarizing the data, generating samples and completing data preparation. The feedback module is responsible for adding the initial samples output by the analysis module to the formal sample set after they pass verification, and for feeding the data back to the update decision module of the model production device, which decides whether the model needs to be updated. The feedback device collects and reports user data automatically, so the machine learning platform obtains user feedback in time, responds quickly to changes in user data, and provides users with a continuously improved and optimized model.
Machine learning platform deployment method
(1) As shown in fig. 1, the edge cloud is located between the core cloud and the user terminal, and provides computing resources for the user nearby;
(2) as shown in fig. 2, the edge cloud platform provides two resources, a virtual machine and a container, to the outside;
(3) the machine learning platform runs on the cloud platform: the non-core services, namely the model production device and the model feedback device, use virtual machine resources and run on the virtualization platform, while the training platform, which relies on physical GPU resources, uses container resources and runs on the container platform.
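The placement rule in step (3) can be written down as a small lookup. The following Python sketch is only an illustration: the service names are hypothetical labels for the model production device, the model feedback device and the training platform, and a real cloud platform would expose its own resource APIs.

```python
from enum import Enum


class Resource(Enum):
    VIRTUAL_MACHINE = "virtual machine"
    CONTAINER = "container"


# Hypothetical service names; the split follows the deployment rule in step (3).
NON_CORE_SERVICES = {"model_production", "model_feedback"}
CORE_SERVICES = {"training"}


def place(service):
    """Choose the resource type a machine learning platform service runs on."""
    if service in CORE_SERVICES:
        return Resource.CONTAINER        # GPU-dependent training runs in containers
    if service in NON_CORE_SERVICES:
        return Resource.VIRTUAL_MACHINE  # management services run in virtual machines
    raise ValueError(f"unknown service: {service}")
```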
The overall process of the training method of the invention is shown in fig. 3:
(1) after a training task starts, first retrieving the corresponding samples from the sample set;
(2) applying for container resources, and starting training after initial configuration;
(3) after training finishes, outputting a model;
(4) verifying the model on the test sample set; if the model is valid, archiving it, and otherwise ending the training;
(5) after archiving, starting the publishing process and distributing the model to users;
(6) when a user terminal uses the model, collecting accuracy and performance data and feeding it back to the machine learning platform;
(7) after the machine learning platform receives the feedback, filtering and analyzing the data, screening out the valid data, and adding it to the sample set;
(8) the update decision module of the model production device deciding whether to start a model update task by analyzing the accuracy data and the performance data; if an update is needed, starting a new training task, and otherwise ending the feedback process.
In conclusion, the method applies to the training tasks of all models: a training task is started, a model is generated, verified, archived and released to the user terminals, feedback is collected from them, and finally the feedback result decides whether a model update task is started. The whole process forms a virtuous circle in which the model is continuously upgraded and improved.
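The closed loop described above can be sketched as a short control function. This is a minimal Python sketch under stated assumptions: every callable (`train`, `validate`, `publish`, `collect_feedback`, `needs_update`) is a hypothetical hook supplied by the caller, and the `max_rounds` limit is added here as a safety bound that the patent does not mention.

```python
def run_training_cycle(train, validate, publish, collect_feedback,
                       needs_update, max_rounds=3):
    """Run train -> validate -> archive -> publish -> feedback until no update is needed.

    All callables are hypothetical hooks; max_rounds is an assumed safety limit.
    Returns the list of archived models, oldest first.
    """
    archive = []
    for _ in range(max_rounds):
        model = train()
        if not validate(model):
            break                    # invalid model: end training without archiving
        archive.append(model)        # archive the valid model
        publish(model)               # distribute it to the subscribed terminals
        feedback = collect_feedback(model)
        if not needs_update(feedback):
            break                    # feedback is satisfactory: stop the loop
    return archive
```

Passing the steps in as functions keeps the loop independent of any particular training framework, which matches the patent's aim of supporting several machine learning platforms.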
The training task scheduling steps of the invention are shown in fig. 6:
(1) the training scheduling module summarizes all currently received training tasks, including timed tasks, temporary tasks, subscription tasks and update tasks;
(2) evaluating the priority of the tasks and sorting them, the priorities from highest to lowest being: subscription tasks, temporary tasks, update tasks and timed tasks;
(3) calculating the time cost of all tasks, the time cost evaluation parameters including but not limited to: number of model parameters, sample set size, task type and historical time cost;
(4) evaluating the resource requirements of all tasks, the resource requirement evaluation parameters including but not limited to: model size, sample set size and historical demand;
(5) evaluating the currently available resources by collecting the container resources in the cloud platform and calculating the amount available;
(6) arranging the tasks according to priority, time cost, resource requirements and available resources;
(7) after the tasks are arranged, applying for container resources in that order and scheduling the training tasks.
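The arrangement in steps (2) to (6) can be illustrated with a minimal Python sketch. The patent fixes the priority order but not how time cost and resources enter the arrangement, so the tie-break on shorter estimated time, the greedy fit, and the use of a GPU count as the resource unit are assumptions; the `Task` fields are hypothetical.

```python
from dataclasses import dataclass

# Priority order from step (2): a lower number means a higher priority.
PRIORITY = {"subscription": 0, "temporary": 1, "update": 2, "timed": 3}


@dataclass
class Task:
    name: str
    kind: str           # one of the keys of PRIORITY
    est_minutes: float  # estimated time cost, step (3)
    gpus_needed: int    # estimated resource requirement, step (4)


def arrange(tasks, available_gpus):
    """Split tasks into (scheduled, waiting) given the free container GPUs.

    Tasks are ordered by priority, then by shorter estimated time; each task is
    scheduled if its requirement still fits. Both rules are assumptions.
    """
    ordered = sorted(tasks, key=lambda t: (PRIORITY[t.kind], t.est_minutes))
    scheduled, waiting = [], []
    free = available_gpus
    for task in ordered:
        if task.gpus_needed <= free:
            free -= task.gpus_needed  # reserve container resources for this task
            scheduled.append(task)
        else:
            waiting.append(task)      # not enough resources now; retry later
    return scheduled, waiting
```

Tasks left in `waiting` would be reconsidered on the next scheduling pass, once container resources are released.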
The model release steps of the invention are shown in fig. 7:
(1) firstly, querying the list of users subscribed to the model;
(2) sorting the users by priority, the sorting parameters being: user level, model usage frequency and online state;
(3) querying the authorization that each user terminal has granted to the machine learning platform; pushing the model directly if the authorization permits direct updating, and only sending a model update notification to the user if the authorization permits notifications only;
(4) the user downloads the model and applies the new model.
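Steps (1) to (3) amount to sorting the subscribers and then choosing between pushing and notifying. The Python sketch below is illustrative only: the patent names the sorting parameters but not how they are combined, so ordering by online state first, then user level, then usage frequency is an assumption, and the `Subscriber` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    terminal_id: str
    user_level: int          # higher means more important
    usage_per_day: int       # how often the terminal uses the model
    online: bool
    allow_direct_push: bool  # authorization granted by the terminal


def release_plan(subscribers):
    """Return (terminal_id, action) pairs in release order.

    The combination of sorting parameters is an assumed weighting.
    """
    ordered = sorted(
        subscribers,
        key=lambda s: (s.online, s.user_level, s.usage_per_day),
        reverse=True,
    )
    return [
        (s.terminal_id, "push" if s.allow_direct_push else "notify")
        for s in ordered
    ]
```

A terminal that receives only a notification would then download the model itself, as in step (4).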
The feedback and update steps of the invention are shown in fig. 8:
(1) firstly, the user terminal collects accuracy data, performance data and usage frequency data of the model in application;
(2) pushing the collected data to the analysis module of the feedback device, filtering it for integrity and validity, and then classifying, aggregating and statistically analyzing it;
(3) according to the analysis result, using part of the data to generate a user usage report and the usable part to generate sample data;
(4) feeding back the user report directly, verifying the sample data against the requirements for formal samples, adding the data that passes to the sample set, and feeding the data back to the update decision module of the model production device.
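Steps (2) and (3) can be sketched as a filter followed by an aggregation. In this minimal Python sketch the record fields, the validity rule (all fields present, accuracy within [0, 1]) and the criterion for a candidate sample (the record carries a ground-truth `label`) are all assumptions made for illustration; the patent does not define them.

```python
REQUIRED_FIELDS = ("terminal_id", "accuracy", "latency_ms", "uses")


def is_valid(record):
    """Integrity and validity filter: all fields present, accuracy within [0, 1]."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    return 0.0 <= record["accuracy"] <= 1.0


def process_feedback(records):
    """Filter raw feedback, build a usage report, and pick candidate samples.

    Records that carry a ground-truth "label" are treated as candidate samples;
    that criterion is an assumption made for this sketch.
    """
    valid = [r for r in records if is_valid(r)]
    report = {
        "records": len(valid),
        "total_uses": sum(r["uses"] for r in valid),
        "mean_accuracy": (
            sum(r["accuracy"] for r in valid) / len(valid) if valid else None
        ),
    }
    candidates = [r for r in valid if r.get("label") is not None]
    return report, candidates
```

The candidate samples would still have to pass the formal-sample verification of step (4) before being added to the sample set.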
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims (8)

1. A machine learning training system based on edge cloud computing, characterized in that it comprises an edge cloud computing system and a terminal, wherein:
the edge cloud computing system comprises a machine learning platform and a cloud platform, wherein the cloud platform simultaneously manages a virtualization platform and a container platform, the virtualization platform realizes pooling of resources through a virtualization technology and elastic allocation of the resources, so that the utilization rate of infrastructure is improved, and the container platform realizes decoupling of the machine learning platform and hardware resources by using the container technology;
the machine learning platform is configured to execute the training tasks, and specifically comprises a model production device and a model feedback device, wherein the model production device is configured to manage the training scheduling, verification, archiving, publishing, subscription and updating of models; the model feedback device is configured to collect, analyze and feed back data from the production environment;
the terminal receives the training model, collects the training result and feeds the training result back to the machine learning platform;
the model feedback device comprises a collection module, an analysis module and a feedback module, wherein:
the collection module collects data on the actual effect of the model in use and completes data acquisition from the terminal;
the analysis module is responsible for cleaning, filtering, analyzing and summarizing data, generating a sample and completing data preparation work;
the feedback module is responsible for adding the initial samples output by the analysis module to the formal sample set after they have been checked, and for feeding them back to the update decision module of the model production device, which decides whether the model needs to be updated.
2. The machine learning training system based on edge cloud computing as claimed in claim 1, wherein: the model production device comprises a training scheduling module, a model verification module, a model archiving module, a model publishing module, a user subscription module, an update decision module and a sample management module, wherein:
the training scheduling module arranges and schedules the training tasks and ensures that they run effectively;
the model verification module performs local verification before a model is archived, preventing invalid models from being output;
the model archiving module provides query, retrieval, storage, backup, deletion, destruction and classification management of the model files;
the model publishing module rapidly publishes the model output by training and distributes it to each user terminal;
the user subscription module lets user terminals subscribe to models, so that models are distributed according to user requirements and unnecessary transmission is avoided;
after receiving feedback from a user, the update decision module decides whether the model needs to be updated and, if so, notifies the training scheduling module to create a new training task;
and the sample management module manages the sampling, storage, expansion, optimization and updating of the samples.
3. The machine learning training system based on edge cloud computing as claimed in claim 1, wherein: the edge cloud computing system is located close to the terminal so that it can respond rapidly to data changes.
4. A machine learning training method based on edge cloud computing, using the machine learning training system based on edge cloud computing of any one of claims 1 to 3, characterized in that the method comprises the following steps:
after the training task starts, retrieving the corresponding samples from the sample set;
applying for container resources, and starting training after initialization configuration;
after the training is finished, outputting a model;
verifying the model on the test sample set; if the model is valid, archiving it, and otherwise ending the training;
after archiving, starting the publishing process, distributing the model to the corresponding terminals, and receiving accuracy and performance data as feedback;
filtering and analyzing the feedback data, screening out the valid data and adding it to the sample set;
deciding whether to start a model update task by analyzing the accuracy data and the performance data; if the model needs to be updated, starting a new training task, and otherwise terminating the feedback;
the feedback and update are implemented as follows:
first, collecting accuracy data, performance data and usage-frequency data from the model in use;
pushing the collected data, filtering it for integrity and validity, and then performing classification, aggregation and statistical analysis;
according to the analysis result, using part of the data to generate a user usage report and the usable part to generate sample data;
and feeding back the user report directly, verifying the sample data against the requirements for formal samples, adding it to the sample set once it passes, and finally feeding back the data.
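For illustration only, the train, verify, archive, publish and feedback steps of claim 4 could be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Feedback`, `run_training_cycle` and `min_accuracy` are hypothetical, and the threshold-based rules for model validity and for triggering an update are assumptions, since the claim does not say how those decisions are made.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Feedback:
    """Hypothetical record returned by one terminal."""
    accuracy: float                              # accuracy observed in use
    latency_ms: float                            # performance figure observed in use
    samples: list = field(default_factory=list)  # candidate samples from the field
    valid: bool = True                           # outcome of integrity/validity filtering


def run_training_cycle(
    sample_set: list,
    test_set: list,
    train: Callable[[list], object],
    evaluate: Callable[[object, list], float],
    publish: Callable[[object], List[Feedback]],
    min_accuracy: float = 0.9,                   # assumed threshold, not from the claims
) -> dict:
    """Run one pass through the claimed steps and report what happened."""
    model = train(list(sample_set))              # retrieve samples, train, output a model
    if evaluate(model, test_set) < min_accuracy:
        # Verification failed: the model is not archived and training ends.
        return {"archived": False, "retrain": False}
    feedback = publish(model)                    # distribute the model, receive feedback
    useful = [f for f in feedback if f.valid]    # filter the feedback data
    for f in useful:
        sample_set.extend(f.samples)             # expand the sample set with valid data
    # Assumed decision rule: retrain when mean field accuracy falls below the threshold.
    mean_acc = sum(f.accuracy for f in useful) / len(useful) if useful else 1.0
    return {"archived": True, "retrain": mean_acc < min_accuracy}
```

The training, evaluation and publishing steps are passed in as callables so that the control flow can be read on its own; in a real deployment they would wrap container allocation and network distribution.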
5. The machine learning training method based on edge cloud computing as claimed in claim 4, wherein: the method for scheduling the training task comprises the following steps:
summarizing all currently received training tasks, including: timed tasks, temporary tasks, subscription tasks and update tasks;
evaluating the priority of the tasks and sorting them;
calculating the time cost of all tasks and evaluating the resource requirements of all tasks;
evaluating the currently available resources by collecting information on the container resources in the cloud platform and calculating the amount available;
arranging the tasks according to the priority, the time cost, the resource requirements and the available resources;
after the tasks are arranged, applying for container resources in that order and scheduling the training tasks.
6. The edge cloud computing-based machine learning training method of claim 5, wherein: in descending order of priority, the tasks are subscription tasks, temporary tasks, update tasks and timed tasks.
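The arrangement steps of claim 5, together with the priority order of claim 6, can be sketched as a sort followed by a greedy admission pass. This is only one possible reading: `Task`, `arrange`, the tie-break on estimated time, and the choice to measure resources in whole containers are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Priority order stated in claim 6, highest priority first.
PRIORITY = {"subscription": 0, "temporary": 1, "update": 2, "timed": 3}


@dataclass
class Task:
    """Hypothetical training task description."""
    name: str
    kind: str            # one of the PRIORITY keys
    est_minutes: float   # estimated time cost
    containers: int      # estimated resource demand, in containers


def arrange(tasks, available_containers):
    """Return (scheduled, deferred) lists of tasks."""
    # Assumed tie-break: within a priority class, shorter tasks go first.
    ordered = sorted(tasks, key=lambda t: (PRIORITY[t.kind], t.est_minutes))
    scheduled, deferred = [], []
    free = available_containers
    for task in ordered:
        if task.containers <= free:
            free -= task.containers      # apply for container resources in order
            scheduled.append(task)
        else:
            deferred.append(task)        # wait until resources are released
    return scheduled, deferred
```

With four containers available and one task of each kind, the subscription and temporary tasks would be admitted first and the remaining tasks deferred if they no longer fit.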
7. The edge cloud computing-based machine learning training method of claim 5, wherein: time cost assessment parameters include, but are not limited to: number of model parameters, sample set size, task type and historical time cost;
resource demand assessment parameters include, but are not limited to: model size, sample set size, and historical demand.
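The claims list the inputs to the time cost assessment but not a formula. Purely as an illustration, one way to combine those inputs is to blend a size-based guess with the mean of past runs. Everything specific here is an assumption: the function name `estimate_time_cost`, the `TYPE_FACTOR` multipliers, the `seconds_per_unit` constant and the equal weighting of history.

```python
from statistics import mean

# Hypothetical per-type multipliers; the claims only say task type is a parameter.
TYPE_FACTOR = {"subscription": 1.0, "temporary": 1.0, "update": 0.5, "timed": 1.0}


def estimate_time_cost(n_params, n_samples, task_type, history,
                       seconds_per_unit=1e-9, history_weight=0.5):
    """Estimate training time in seconds from the parameters named in claim 7."""
    # Size-based guess: proportional to parameters times samples (assumed model).
    size_based = n_params * n_samples * seconds_per_unit * TYPE_FACTOR[task_type]
    if not history:
        return size_based                # no past runs to learn from
    # Blend the guess with the mean of historical time costs.
    return history_weight * mean(history) + (1 - history_weight) * size_based
```

A resource demand estimate could follow the same pattern, using model size, sample set size and historical demand as inputs.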
8. The machine learning training method based on the edge cloud computing as claimed in claim 4, further comprising a method for publishing the trained model, comprising the steps of:
first, querying the list of users subscribed to the model;
sorting the users by priority, the sorting parameters comprising user level, model usage frequency and online status;
and querying the authorization each terminal has granted to the machine learning platform: where the authorization permits direct updating, pushing the model directly, and where it permits only notifications, merely sending the terminal a model update message.
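The publishing steps of claim 8 can be sketched as a sort over subscribers followed by a per-terminal choice between pushing and notifying. The names `Subscriber` and `plan_release` are hypothetical, and the relative weight of the three sorting parameters (online status first, then user level, then usage frequency) is an assumption; the claim names the parameters but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    """Hypothetical subscription record for one terminal."""
    terminal_id: str
    level: int                  # user level, higher means more important
    use_frequency: float        # how often the terminal uses the model
    online: bool                # current online status
    allow_direct_update: bool   # authorization granted to the platform


def plan_release(subscribers):
    """Return (terminal_id, action) pairs in delivery order."""
    # Assumed ordering: online terminals first, then by level, then by frequency.
    ordered = sorted(
        subscribers,
        key=lambda s: (s.online, s.level, s.use_frequency),
        reverse=True,
    )
    plan = []
    for s in ordered:
        # Push the model only where the terminal has authorized direct updates.
        action = "push" if s.allow_direct_update else "notify"
        plan.append((s.terminal_id, action))
    return plan
```

Returning a plan rather than sending anything keeps the sketch free of network code; a caller would iterate over the plan and perform the actual push or notification.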
CN201810219917.9A 2018-03-16 2018-03-16 Machine learning training system and method based on edge cloud computing Active CN108427992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810219917.9A CN108427992B (en) 2018-03-16 2018-03-16 Machine learning training system and method based on edge cloud computing

Publications (2)

Publication Number Publication Date
CN108427992A (en) 2018-08-21
CN108427992B (en) 2020-09-01

Family

ID=63158386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810219917.9A Active CN108427992B (en) 2018-03-16 2018-03-16 Machine learning training system and method based on edge cloud computing

Country Status (1)

Country Link
CN (1) CN108427992B (en)

Also Published As

Publication number Publication date
CN108427992A (en) 2018-08-21

Legal Events

Date Code Title Description
PB01 Publication

SE01 Entry into force of request for substantive examination

GR01 Patent grant

TR01 Transfer of patent right
Effective date of registration: 20211227
Address after: 250000 room 1802, J2 office building, Wanda Plaza, 57 Gongye South Road, Jinan, Shandong
Patentee after: SHANDONG YIRAN INFORMATION TECHNOLOGY Co.,Ltd.
Address before: 250101 1-1303, building 4, Dinghao Plaza, No. 44, Gongye South Road, high tech Zone, Jinan, Shandong
Patentee before: JINAN FEIXIANG INFORMATION TECHNOLOGY CO.,LTD.

PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A machine learning training system and method based on edge cloud computing
Effective date of registration: 20220826
Granted publication date: 20200901
Pledgee: Ji'nan finance Company limited by guarantee
Pledgor: SHANDONG YIRAN INFORMATION TECHNOLOGY Co.,Ltd.
Registration number: Y2022980013542

PC01 Cancellation of the registration of the contract for pledge of patent right
Date of cancellation: 20230926
Granted publication date: 20200901
Pledgee: Ji'nan finance Company limited by guarantee
Pledgor: SHANDONG YIRAN INFORMATION TECHNOLOGY Co.,Ltd.
Registration number: Y2022980013542