CN113537504A - Transfer learning method, device and system based on edge nodes - Google Patents

Transfer learning method, device and system based on edge nodes

Info

Publication number
CN113537504A
Authority
CN
China
Prior art keywords
model
training
edge
data set
task
Prior art date
Legal status
Pending
Application number
CN202010288894.4A
Other languages
Chinese (zh)
Inventor
周胜平
吴栋
林俊杰
吴敏
梁乔忠
仲景武
Current Assignee
Alpha Cloud Computing Shenzhen Co ltd
Original Assignee
Alpha Cloud Computing Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Alpha Cloud Computing Shenzhen Co ltd filed Critical Alpha Cloud Computing Shenzhen Co ltd
Priority to CN202010288894.4A
Publication of CN113537504A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Abstract

The invention provides a model training method for transfer learning based on edge nodes, which comprises the following steps: the cloud center performs model training to generate an initial copy of a common model for transfer learning; an edge training node obtains the common model according to its assigned training task and executes that task; an edge verification node obtains an intermediate copy of the common model and verifies it; and the cloud center selects the best intermediate copies according to the verification results of the intermediate copies and optimizes them further to obtain the final common model. This model training method for transfer learning can effectively exploit the distribution of computing resources between the cloud center and the edge nodes, improve the real-time performance of data acquisition and processing at the edge nodes, reduce the cost of obtaining AI models, and accelerate the application and popularization of AI.

Description

Transfer learning method, device and system based on edge nodes
Technical Field
The present application relates to transfer learning methods for AI, and in particular to a model training method for transfer learning based on edge nodes.
Background
It is now widely recognized that Artificial Intelligence (AI) will be one of the most influential technologies of the twenty-first century and beyond. The mainstream AI technologies in use today still belong to machine learning and its related algorithms. Among the many machine learning algorithms, one called transfer learning has developed rapidly in recent years. It differs from earlier machine learning algorithms in that it can transfer modules learned on one type of task to another type of task in the form of a common model.
With the advent of the Internet of Things (IoT), more and more devices are joining the data collection front line. As the number and variety of devices grow, an increasingly rich and diverse set of tasks is built on the data they collect. In the traditional mode, where transfer learning uses only cloud-center computing resources, the real-time performance of data processing cannot be guaranteed, and the complementary computing capabilities of the cloud center and the edge nodes are wasted.
Disclosure of Invention
Therefore, the present invention trains the transfer learning model on edge AI nodes, so as to make full use of the distribution of edge-cloud computing resources and guarantee the real-time performance of data acquisition and processing. These methods can be applied to unspecified tools, devices and systems, and even to a data center or cloud service center, to form a model training system for transfer learning. Accordingly, the invention includes the following aspects:
In one aspect, a transfer learning method based on edge nodes is provided. The method is used for a model training process in which the model training is implemented on edge nodes, and further comprises:
determining a first data set, wherein the first data set is the result of first model training; sending the first data set, the sending being a response to a second model training request from an edge training node; receiving a second data set, wherein the second data set comes from an edge verification node and is the result of second model verification; generating a third data set, the third data set comprising a plurality of second data sets; and generating a fourth data set, the fourth data set being derived from the third data set. Further, the first data set is an initial copy of a common model of the model training process. Further, the fourth data set is a final copy of the common model of the model training process. Further, the second data set comes from the edge verification node, the edge verification node receives the trained model from the edge training node, and the edge training node receives the first data set. Further, the second data set is an intermediate copy of the common model of the model training process; a high-quality intermediate copy of the common model is selected from the third data set; and the high-quality intermediate copy is trained further.
In a data center or cloud environment with overall control capability, multiple connected and managed edge AI nodes (edge nodes for short) are used for the model training of transfer learning, so that the distribution of edge-cloud computing resources is fully exploited, the real-time performance of data acquisition and processing is guaranteed, and the cost of model training and of AI applications is reduced. In this aspect, the model training process is implemented in the facilities and systems of the cloud center. The model training method for transfer learning proceeds as follows: prepare a model training task set, and divide and confirm a local (cloud) training task subset and a remote (edge node) training task subset; execute the local model training tasks, and obtain an initial copy of the common model from the training results; issue model training task information, where the issued information describes the remote training task subset and is sent to edge nodes with model training capability; jointly optimize the common model between the local side and the remote side, the common model obtained during this optimization being an intermediate copy of the model; select and eliminate among the intermediate copies of the common model and, based on the selected high-quality copies, start a local model training task again to further optimize the common model; and determine the final common model by selection and elimination among the optimized training results. Further, the local and remote joint optimization of the common model includes: initializing the edge nodes and sending training task information to each edge node; receiving a request from an edge node, where the request contains training task information and the training task has a corresponding initial copy of the common model; sending the matched training task, together with the corresponding initial copy of the common model, to the requesting edge node; waiting for the edge node to complete the training task and send its training result; receiving the training result to obtain the intermediate copy of the common model submitted by the edge node; and again receiving a model training task request from an edge node (possibly a different one).
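By way of illustration only, the following Python sketch outlines the cloud-side workflow described above. All names (CommonModel, train_locally, cloud_training_pipeline) and the placeholder arithmetic are assumptions introduced for this example; the application does not define a concrete API.

```python
# Illustrative sketch of the cloud-side workflow; all names and the
# placeholder "training" arithmetic are assumptions, not part of the patent.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommonModel:
    weights: Dict[str, float] = field(default_factory=dict)
    stage: str = "initial"    # "initial" | "intermediate" | "final"
    score: float = 0.0        # verification score reported by edge verification nodes


def train_locally(model: CommonModel, task: str) -> CommonModel:
    # Stand-in for a cloud-center training job on one task.
    model.weights[task] = model.weights.get(task, 0.0) + 1.0
    return model


def cloud_training_pipeline(local_tasks: List[str],
                            edge_results: List[CommonModel]) -> CommonModel:
    # Local training of the cloud task subset yields the initial copy.
    initial = CommonModel()
    for task in local_tasks:
        initial = train_locally(initial, task)
    initial.stage = "initial"

    # edge_results stands in for the intermediate copies produced by edge
    # training nodes and scored by edge verification nodes during the
    # local/remote joint optimization described above.
    best = max(edge_results, key=lambda m: m.score, default=initial)

    # Retrain the selected high-quality intermediate copy locally and
    # promote it to the final copy of the common model.
    final = train_locally(best, "local-refinement")
    final.stage = "final"
    return final
```

In a real deployment the edge results would arrive through the joint-optimization exchange with the edge nodes rather than as an in-memory list.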
Therefore, products and service systems that include some or all of the above methods and steps can provide a model training process for transfer learning based on edge nodes, fully exploit the distribution of edge-cloud computing resources, guarantee the real-time performance of data acquisition and processing, and reduce the cost of AI applications, giving AI the ability to boost more convenient cloud and big-data applications and accelerating their popularization.
In another aspect, a model training device for transfer learning based on edge nodes is provided. The device is a system composed of middle platforms and their constituent devices. The middle platforms include an AI middle platform, a data middle platform and an IoT middle platform, which together further comprise:
a task management module: the module is used for managing training tasks;
a model training module: the module is used for executing model training performed by the cloud center;
a transfer learning module: the module is used for executing management related to the transfer learning characteristics;
a model management module: the module is used for managing the model related to the cloud center and the edge side;
a data management module: the module is used for managing data maintained by the data center;
a data calculation module: the module is used for executing the calculation process of the data managed by the data center;
the data distribution module: the module is used for executing the function of distributing cloud center data to the edge or the terminal side;
a device management module: the module is used for executing the management function of the edge node or the terminal equipment;
the data interaction module: the module is used for managing interactive contents and operations among edge nodes or terminal equipment;
a data acquisition module: the module is used for collecting data from the edge and terminal sides.
The units provided by the invention, together with the other units, modules, related platforms and related engines required for an actual product, jointly realize a model training device for transfer learning based on edge nodes. This is expressed as follows: the model training device prepares a model training task set, and divides and confirms a local (cloud) training task subset and a remote (edge node) training task subset; the model training device executes the local model training tasks and obtains an initial copy of the common model from the training results; the model training device issues model training task information, where the issued information describes the remote training task subset and is sent to edge nodes with model training capability; the model training device and the remote nodes jointly optimize the common model, the common model obtained during this joint optimization being an intermediate copy of the model; the model training device selects and eliminates among the intermediate copies of the common model and, based on the selected high-quality copies, starts a local model training task again to further optimize the common model; and the model training device determines the final common model by selection and elimination among the optimized training results. Further, the local and remote joint optimization of the common model includes: the model training device initializes the edge nodes and sends training task information to each edge node; the model training device receives a request from an edge node, where the request contains training task information and the training task has a corresponding initial copy of the common model; the model training device sends the matched training task, together with the corresponding initial copy of the common model, to the requesting edge node; the model training device waits for the edge node to complete the training task and send its training result; the model training device receives the training result to obtain the intermediate copy of the common model submitted by the edge node; and the model training device again receives a model training task request from an edge node (possibly a different one).
In this way, products and service systems equipped with the above functional devices can provide a model training process for transfer learning based on edge nodes, make full use of the distribution of edge-cloud computing resources, guarantee the real-time performance of data acquisition and processing, and reduce the cost of AI applications, so that AI can boost more convenient cloud and big-data applications and accelerate their popularization.
In another aspect, a model training method for transfer learning is provided. The method is used by the edge training nodes that perform model training, and further includes:
receiving a training node initialization command, wherein the initialization command is sent from the AI middle platform; executing the initialization process of the training node to complete the node initialization; receiving model training task information, confirming the training task corresponding to the node according to that information, and then sending the corresponding training task request; receiving the description of the training task and determining the required common model according to the training task; requesting the corresponding common-model-related module and receiving it; generating the training task according to its deployment requirements and executing the model training task; completing the model training task and obtaining a common model from the training result; and submitting the obtained common model for verification.
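The following minimal Python sketch mirrors the edge training node flow just listed. The CloudClient class and its methods are hypothetical stand-ins for whatever transport the cloud center exposes; the dictionary-based model and the one-line training update are placeholders.

```python
# Hypothetical sketch of an edge training node; CloudClient and its methods
# are invented stand-ins, and "training" is reduced to a placeholder update.
class CloudClient:
    """Stand-in for the edge node's connection to the cloud center."""

    def fetch_task(self, node_id: str) -> dict:
        # A real client would send the training task request here.
        return {"task_id": "edge-task-1", "model_id": "common-model-v0"}

    def fetch_common_model(self, model_id: str) -> dict:
        # A real client would download the common model initial copy here.
        return {"id": model_id, "weights": {}}

    def submit_for_verification(self, model: dict) -> None:
        print(f"submitted intermediate copy {model['id']} for verification")


def run_edge_training_node(node_id: str, client: CloudClient) -> None:
    # Node initialization is assumed to have completed on receipt of the
    # cloud-center command; request a task and its matching common model.
    task = client.fetch_task(node_id)
    model = client.fetch_common_model(task["model_id"])

    # Train on locally collected data (placeholder weight update).
    model["weights"][task["task_id"]] = 1.0
    model["id"] += "-intermediate"

    # Submit the resulting intermediate copy for verification.
    client.submit_for_verification(model)


if __name__ == "__main__":
    run_edge_training_node("edge-01", CloudClient())
```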
In addition to the algorithm application and extension capabilities of transfer learning, a model training device realized through distributed edge nodes has the advantages of good real-time data processing and full, effective use of resources. Under the centralized connection and unified management of the AI middle platform, further: the model training device receives a training node initialization command sent from the AI middle platform; the model training device executes the initialization process of the training node to complete the node initialization; the model training device receives model training task information, confirms the training task corresponding to the node according to that information, and then sends the corresponding training task request; the model training device receives the description of the training task and determines the required common model according to the training task; the model training device requests the corresponding common-model-related module and receives it; the model training device generates the training task according to its deployment requirements and executes the model training task; the model training device completes the model training task and obtains a common model from the training result; and the model training device submits the obtained common model for verification.
In this way, products and service systems equipped with the above functional devices can provide a model training process for transfer learning based on edge nodes, make full use of the distribution of edge-cloud computing resources, guarantee the real-time performance of data acquisition and processing, and reduce the cost of AI applications, so that AI can boost more convenient cloud and big-data applications and accelerate their popularization.
In another aspect, a model verification method for transfer learning is provided. The method is used by the edge verification nodes that perform model verification, and further includes:
receiving a verification node initialization command, wherein the initialization command is sent from the AI middle platform; executing the initialization process of the verification node to complete the node initialization; receiving model verification task information, confirming the verification task corresponding to the node according to that information, and then sending the corresponding verification task request; receiving the description of the verification task and determining the required common model according to the verification task; requesting the corresponding common-model-related module and receiving it; generating the verification task according to its deployment requirements and training a verification model; generating the verification task according to its deployment requirements and executing the verification task of the common model; and uploading and submitting the verification result of the common model.
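For illustration, a possible shape of the verification step run on an edge verification node is sketched below. The scoring rule (inverse of mean absolute error on locally held samples) and every name are assumptions; the application does not prescribe a verification metric.

```python
# Hypothetical verification step on an edge verification node; the linear
# "prediction" and the inverse-error score are placeholder assumptions.
from typing import Dict, List, Tuple


def verify_common_model(model: Dict, samples: List[Tuple[float, float]]) -> Dict:
    # Treat the sum of weights as a trivial one-parameter predictor.
    weight = sum(model.get("weights", {}).values()) or 1.0
    errors = [abs(weight * x - y) for x, y in samples]
    mean_error = sum(errors) / max(len(errors), 1)
    # Higher score means the intermediate copy fits the local data better.
    return {"model_id": model.get("id", "unknown"), "score": 1.0 / (1.0 + mean_error)}


# Usage: the returned result (the "second data set" of the claims) would be
# uploaded to the cloud center, which aggregates many such results.
result = verify_common_model(
    {"id": "common-model-v0-intermediate", "weights": {"edge-task-1": 1.0}},
    samples=[(1.0, 1.1), (2.0, 1.9)],
)
print(result)
```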
In addition to the algorithm application and extension capabilities of transfer learning, a model verification device realized through distributed edge nodes has the advantages of good real-time data processing and full, effective use of resources. Under the centralized connection and unified management of the AI middle platform, further: the model verification device receives a verification node initialization command sent from the AI middle platform; the model verification device executes the initialization process of the verification node to complete the node initialization; the model verification device receives model verification task information, confirms the verification task corresponding to the node according to that information, and then sends the corresponding verification task request; the model verification device receives the description of the verification task and determines the required common model according to the verification task; the model verification device requests the corresponding common-model-related module and receives it; the model verification device generates the verification task according to its deployment requirements and trains a verification model; the model verification device generates the verification task according to its deployment requirements and executes the verification task of the common model; and finally the model verification device uploads and submits the verification result of the common model.
In this way, products and service systems equipped with the above functional devices can provide model training and verification processes for transfer learning based on edge nodes, make full use of the distribution of edge-cloud computing resources, guarantee the real-time performance of data acquisition and processing, and reduce the cost of AI applications, so that AI can boost more convenient cloud and big-data applications and accelerate their popularization.
In another aspect, an edge node is provided that simultaneously performs the model training method of the edge training node and the model verification method of the edge verification node. The methods performed by the edge node during model training and model verification are not repeated here.
In another aspect, a computer-readable storage medium is provided, which stores program instructions that, when executed by a processor, cause the processor to perform the above-described methods.
In another aspect, an apparatus for management is provided that includes a storage component, a processing component and a communication component, which are interconnected. The storage component is used for storing data processing code, and the communication component is used for information interaction with external devices; the processing component is configured to invoke the program code to perform the functions of the apparatus described above.
Drawings
In order to more clearly illustrate the technical solution of the present invention and to more clearly illustrate the elements, modes and processes for achieving the objects of the present invention, the following drawings are provided for illustrating the embodiments of the present invention:
FIG. 1 is a diagram of a system component of the proposed invention;
FIG. 2 is a diagram of a system component of the proposed invention;
FIG. 3 is a diagram of a system component of the proposed invention;
FIG. 4 is a schematic diagram of an embodiment of the invention proposed in the present application;
FIG. 5 is a schematic diagram of an embodiment of the invention proposed in the present application;
FIG. 6 is one of the flow charts for implementing the operation of the invention proposed in this application;
FIG. 7 is one of the flow charts for implementing the operation of the invention proposed in this application;
FIG. 8 is one of the flow charts for implementing the operation of the invention proposed in this application;
FIG. 9 is one of the flow charts for implementing the operation of the invention proposed in this application.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings.
The terms "first", "second", and "third", etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As used in this application, the terms "server," "device," "apparatus," "unit," "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a server may be, but is not limited to, a processor, a data processing platform, a computing device, a computer, two or more computers, or the like; a unit may be, but is not limited to being, a process running on a processor, a runnable object, an executable, a thread of execution, or any other executable computer program. One or more units may reside within a process and/or thread of execution and a unit may be localized on one computer and/or distributed between 2 or more computers. In addition, these units may execute from various computer readable media having various data structures stored thereon. The elements may communicate by way of local and/or remote processes based on a signal having one or more data packets (e.g., data from two elements interacting with another element in a local system, distributed system, and/or across a network, such as the internet with other systems by way of the signal).
First, some terms in the present application are explained so as to be easily understood by those skilled in the art. The terms listed include the following:
(1) cloud computing: Cloud Computing refers to a new computing paradigm that integrates and connects resources in a network environment and provides computing, storage, and even software to users as services. It differs from older computing paradigms in that, to the user, its resources have no visible, fixed form; it is therefore called cloud computing;
(2) artificial intelligence: Artificial Intelligence, AI for short, is a general term for the methods, technologies, software, hardware and systems that simulate human intelligence through computing systems;
(3) machine learning: machine learning is an important branching technique in the field of AI. Machine learning extracts data patterns from the sample data in order to make the best possible predictions of the application data. From the current development results, machine learning is divided into supervised learning, unsupervised learning and reinforcement learning;
(4) task scheduling: when multiple tasks run in the same environment, competition among them for resources is inevitable. To improve resource utilization, guarantee the orderly operation of tasks, and ensure that tasks obtain valid logical results, the execution of subtasks within tasks must be arranged according to how the tasks compete for resources and what data dependencies exist within them. This arrangement is called "task scheduling". Generally, in a distributed environment, task scheduling involves two levels of meaning: scheduling related to remote data dependencies, and task switching related to competition for local resources;
(5) neural network algorithm: the neural network in this application refers mainly to the artificial neural network, and the neural network algorithm refers mainly to the artificial neural network algorithm. In machine learning and cognitive science, a neural network is a mathematical or computational model that mimics the structure and function of a biological neural network and is used to estimate or approximate a function. A neural network algorithm is formed by connecting a large number of artificial neurons, whose structure can change in response to changes in external information, giving the algorithm an adaptive capability, that is, a learning capability;
(6) transfer learning: transfer Learning, is a research problem in Machine Learning (ML), and focuses on obtaining common knowledge so that it can be applied to solve one problem as well as another related problem. Namely, the migration learning mainly aims at finding and extracting reusable knowledge;
(7) copy of the model: the concept proposed by the present invention. In the implementation process of the transfer learning, reusable knowledge exists in a mode of a common model. The invention proposes that the common model in the transfer learning can be continuously trained and optimized in different training tasks, so that it continuously changes in the form of a copy. The common model obtained by the initial transfer learning is an initial copy of the common model, the common model obtained by the continuous training process is an intermediate copy, and the common model obtained by the optimization (or close to the final determined version) is a final copy.
Next, the problem addressed by the present invention and the technical method for solving it are summarized. With the popularization of AI applications, the traditional approach of training and generating an AI model by collecting and organizing large-scale data shows a clear disadvantage, which is especially pronounced in IoT scenarios with richer connectivity and data acquisition capabilities. The present invention improves the flexibility of AI applications through a collaborative, edge-node-based model training mode and reduces the cost of obtaining AI models, and is thus expected to promote the popularization of AI applications in a wider range of settings.
The invention will be further explained with reference to the drawings. Wherein:
Fig. 1 is a system composition diagram of the invention of the present application. This figure illustrates the functional component structure that implements the model training for transfer learning proposed by the present invention. 110 is the cloud center, which comprises an AI middle platform, a data middle platform and an IoT middle platform; 120/130 are AI edge nodes, each of which includes a model training unit, a model verification unit and a data acquisition unit. Wherein:
111-AI middle platform: the middle platform is used for providing capabilities related to algorithm learning and model training and completing the corresponding functions;
112-data middle platform: the middle platform is used for providing capabilities related to data management and data computation and completing the corresponding functions;
113-IoT middle platform: the middle platform is used for providing capabilities related to Internet of Things (IoT) connection and management and completing the corresponding functions;
121/131-model training unit: the unit is used for executing algorithm learning and model training tasks deployed on the edge nodes;
122/132-model verification unit: the unit is used for executing a model verification task deployed on the edge node;
123/133-data acquisition unit: the unit is used for collecting and managing data required by the edge nodes in the algorithm learning and model training processes.
Fig. 2 is a system composition diagram of the invention of the present application. This figure illustrates the functional component structure that implements the model training for transfer learning proposed by the present invention. The AI middle platform shown at 210 further includes: 211 task management module, 212 transfer learning module, 213 model training module and 214 model management module; the data middle platform shown at 220 further comprises: 221 data management module, 222 data calculation module and 223 data distribution module; the IoT middle platform shown at 230 further comprises: 231 device management module, 232 data interaction module and 233 data acquisition module. Wherein:
211-task management module: the module is used for managing training tasks;
213-model training module: the module is used for executing model training performed by the cloud center;
212-transfer learning Module: the module is used for executing management related to the transfer learning characteristics;
214-model management module: the module is used for managing the model related to the cloud center and the edge side;
221-data management module: the module is used for managing data maintained by the data center;
222-data calculation module: the module is used for executing the calculation process of the data managed by the data center;
223-data distribution module: the module is used for executing the function of distributing cloud center data to the edge or the terminal side;
231-device management module: the module is used for executing the management function of the edge node or the terminal equipment;
232-data interaction module: the module is used for managing interactive contents and operations among edge nodes or terminal equipment;
233-data acquisition module: the module is used for collecting data from the edge and terminal sides.
Fig. 3 is a system composition diagram of the invention of the present application. This figure illustrates the functional component structure that implements the model training for transfer learning proposed by the present invention. The model training unit shown at 310 further comprises 311 a model management module and 312 a training control module; the model verification unit shown at 320 further comprises 321 a model management module and 322 a verification control module; the data acquisition unit shown at 330 further comprises 331 a data acquisition module and 332 a data organization module. Optionally, the model training unit and the model verification unit may reside on the same edge node; optionally, they may also reside on different edge nodes. Wherein:
311-model management module: the module is used for managing a model deployed and operated on the equipment node;
312-training control module: the module is used for managing and controlling a model in a training process on equipment nodes;
321-model management module: the module is used for managing a model deployed and operated on the equipment node;
322-verification control module: the module is used for managing and controlling a model in a verification process on the equipment node;
331-data acquisition module: the module is used for collecting and aggregating data from the IoT devices connected to the edge AI device node;
332-data organization module: the module is used for storing, organizing and managing the data collected from the IoT equipment.
Fig. 4 is a schematic diagram of an embodiment of the invention proposed in the present application. This figure illustrates the case where individual model training tasks share a common model, where: 511 is a private model A possessed by model training task A, 512 is a private model B possessed by model training task B, 513 is a private model C possessed by model training task C, and 514 is a common model D shared by model training task A, model training task B and model training task C. Further, as an example: private model A further comprises certain private modules 51 and 52, private model B further comprises certain private modules 61 and 62, private model C further comprises certain private modules 71 and 72, and common model D further comprises certain common modules 10, 20, 30, 40 and 50. It should be noted that the number of model training tasks, the relationship between them, and the composition of models and modules are only illustrative of the implementation of the inventive method and are not intended to limit the contents or features of the invention.
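The sharing relationship of Fig. 4 can be pictured with the tiny Python sketch below: the module numbers come from the figure, while the dictionary layout is an assumption made only to show that all three tasks reference one and the same common model.

```python
# Fig. 4 sharing relationship; module numbers follow the figure, the rest is
# an illustrative assumption.
shared_common_model_D = {"modules": [10, 20, 30, 40, 50]}

task_models = {
    "task_A": {"private_modules": [51, 52], "common": shared_common_model_D},
    "task_B": {"private_modules": [61, 62], "common": shared_common_model_D},
    "task_C": {"private_modules": [71, 72], "common": shared_common_model_D},
}

# Because every task references the same object, continued training that
# updates the common model is visible to all tasks that share it.
shared_common_model_D["modules"].append(60)
assert 60 in task_models["task_A"]["common"]["modules"]
```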
Fig. 5 is a schematic diagram of an embodiment of the invention provided in the present application. The figure illustrates the composition relationship between the cloud center and the edge nodes, and the model composition of the system for a model training task. 511 is a schematic of a cloud center, 512 is a schematic of an edge node, 521 is a schematic of a cloud training task, 522 is a schematic of an edge training task, and 523 is a schematic of an edge verification task. Typically, there is one cloud center and N edge nodes, so the cloud training task of 521 is simply the "cloud training task", while the tasks of the edge nodes can be denoted "edge training task-i" and "edge verification task-j". Further, the illustrated model composition is: the cloud training task shown in 521 is composed of a cloud-training-task private model and a public model (initial copy), the edge training task shown in 522 is composed of a public model (intermediate copy), and the edge verification task shown in 523 is composed of a private model and a public model (intermediate copy). It should be noted that the quantitative relationships described here are merely an implementation example and do not limit the features of the invention. In one implementation, the edge training task and the edge verification task run on the same edge node; in another implementation, they run on different edge nodes. When they run on different nodes, the numbers of edge nodes running training tasks and verification tasks can be in any ratio N:M.
Fig. 6 shows one of the operation execution flows proposed by the present invention. The figure illustrates the full generation process in transfer learning: the cloud center generates the initial copy of the public model, drives the edge nodes to generate intermediate copies, and finally optimizes them to obtain the final copy. Wherein:
10A-preparation of model training tasks: this operation is used to prepare the model training tasks for transfer learning at the cloud center, including but not limited to the model training tasks run by the cloud center to generate the initial copy of the public model, and the model training tasks run on edge nodes to generate intermediate copies of the public model;
10B-perform local model training tasks: the operation is used for deploying and executing a model training task running in a cloud center;
10C-get initial copy of common model: the operation is used for obtaining a public model from a model obtained by model training running in a cloud center, and the public model is used as an initial copy of the public model;
10D-local/remote joint optimization common model: this operation is used to jointly optimize a common model between local at the cloud center and remote at which the edge nodes reside. FIG. 7 is a diagram illustrating a detailed implementation process of the joint optimization;
10E-selection and elimination of intermediate copies of the public model: the operation is used for selecting and eliminating the public model intermediate copy obtained by the joint optimization, so that a high-quality public model intermediate copy is obtained;
10F-local training and optimization common model intermediate copy: the operation is used for retraining and optimizing the selected high-quality public model intermediate copy locally at the cloud center;
10G-determine the final copy of the common model: this operation is used to evaluate and derive a final copy of the common model from the locally trained and optimized intermediate copy of the common model.
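A minimal sketch of operations 10E-10G follows: keep the best-scoring intermediate copies, retrain them locally, and promote one to the final copy. The score field, the number of survivors and the constant "improvement" are assumptions for illustration only.

```python
# Illustrative selection (10E), local retraining (10F) and finalization (10G)
# of public-model intermediate copies; scoring and retraining are placeholders.
from typing import Dict, List


def select_and_finalize(intermediate_copies: List[Dict], keep: int = 2) -> Dict:
    # 10E: keep only the highest-scoring intermediate copies.
    survivors = sorted(intermediate_copies, key=lambda c: c["score"], reverse=True)[:keep]

    # 10F: placeholder for local retraining/optimization of each survivor.
    for copy in survivors:
        copy["score"] += 0.05

    # 10G: the best copy after retraining becomes the final public model.
    return max(survivors, key=lambda c: c["score"])


final_copy = select_and_finalize([
    {"id": "edge-01-copy", "score": 0.71},
    {"id": "edge-02-copy", "score": 0.64},
    {"id": "edge-03-copy", "score": 0.58},
])
print(final_copy)
```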
Fig. 7 shows one of the operation execution flows proposed by the present invention. The figure illustrates the detailed implementation of the local/remote joint optimization of the public model driven by the cloud center. It should be noted that the illustrated serial sequence of receiving a training task request, sending a task, receiving a task result, and receiving the next training task request is only one way of describing multiple training tasks; it should not be read as excluding or constraining configurations in which multiple edge nodes request and train multiple tasks in parallel. Wherein:
20A-sending a command to initialize the edge node: this operation is used to send edge node initialization commands from the cloud center to each edge node in preparation for the model training task. Further, the initialization command may include optional model training task list information;
20B-training task request of receiving node: the operation is used for receiving a training task request which is sent by a model training task determined according to a model training task list after initialization of each edge node is completed;
20C-sending training task and common model initial copy: the operation is used for sending model training task content corresponding to the request and public model content required by training according to the request condition of the edge node;
20D-waiting for the node to report the training task result: this operation waits for the edge node to report the result of the model training task;
20E-common model intermediate copy of receiving node: the operation is used for receiving task results sent by the nodes after the training task is completed, particularly public model intermediate copies;
20F-receiving a model training task request for an edge node: this operation is used to receive the next model training task request from the edge node. The next request here may be a request of a different node, or a request of the same node for the next training task.
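The request/assign/collect loop of Fig. 7 (20A-20F) might be organized as in the queue-based Python sketch below. The queues, the simulated replies and all identifiers are assumptions used to keep the example self-contained; as noted above, real edge nodes may issue requests and train in parallel.

```python
# Illustrative cloud-side loop for Fig. 7; queues and simulated replies are
# assumptions, and real edge nodes may request and train in parallel.
import queue

requests = queue.Queue()   # training task requests from edge nodes (20B, 20F)
results = queue.Queue()    # intermediate copies reported back (20E)

# 20A: an initialization command with an optional task list would be broadcast here.
pending_tasks = ["edge-task-1", "edge-task-2"]
initial_copy = {"id": "common-model-v0", "weights": {}}

# Simulated incoming requests from two edge nodes.
requests.put("edge-01")
requests.put("edge-02")

intermediate_copies = []
while pending_tasks and not requests.empty():
    node = requests.get()                                   # 20B / 20F
    task = pending_tasks.pop(0)
    assignment = {"node": node, "task": task,
                  "model": dict(initial_copy)}              # 20C
    # 20D/20E: in this sketch the node's reply is simulated immediately.
    results.put({"node": assignment["node"],
                 "copy_id": assignment["task"] + "-intermediate"})
    intermediate_copies.append(results.get())

print(intermediate_copies)
```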
Fig. 8 shows one of the operation execution flows proposed by the present invention. The figure illustrates a training process in which an edge node that completes a transfer learning algorithm performs a model training task as a model training device. Such edge nodes are called edge training nodes. Wherein:
30A-receiving training node initialization command: the operation is used for the edge training node to receive a node initialization command sent from the cloud center. The initialization command comprises optional model training task list information;
30B-perform node initialization operation: this operation is used to complete the initialization operation of the edge training node. Specifically, the operation determines a model training task for transfer learning according to the optional model training list information, so as to send a task request for obtaining the training task content to the cloud center;
30C-receive model training task: the operation is used for receiving model training task content sent to the edge training node from the cloud center;
30D-request/receive common model correlation module: the operation is used for obtaining a public model corresponding to the training task according to the request, and a module corresponding to the public model or other cloud center contents;
30E-performing model training tasks: the operation is used for deploying model training tasks and resources required by running the tasks on the edge training nodes and executing the model training tasks;
30F-get a common model of training results: the operation is used for obtaining a public model intermediate copy from a model training result after the edge training node completes a model training task;
30G-submission validation public model: this operation is used to send the common model intermediate copy from the edge training node to the corresponding verification node for model verification.
Fig. 9 shows one of the operation execution flows proposed by the present invention. The figure illustrates a verification process in which an edge node that completes a transfer learning algorithm performs a model verification task as a model verification device. Such edge nodes are called edge verification nodes. Wherein:
40A-receiving verification node initialization command: this operation is used for the edge verification node to receive a node initialization command sent from the cloud center. The initialization command comprises optional model verification task list information;
40B-node initialization operation is performed: this operation is used to complete the initialization operation of the edge verification node. Specifically, the operation determines a model verification task of transfer learning according to the optional model verification list information, and sends a task request for obtaining the content of the verification task to the cloud center;
40C-receive model verification task: the operation is used for receiving a model verification task sent to the edge verification node from the cloud center;
40D-request model verification related content: the operation is used for sending a model verification request to a corresponding edge training node according to the description of the verification task so as to obtain the content required by the verification task. In particular, the verification task content contains a common model copy of the transfer learning;
40E-receive model verification of related content: this operation is used to receive content relevant to the model verification task. In particular, the relevant content contains a copy of the common model of the transfer learning;
40F-training verification model: this operation is used to train a special verification model at the edge verification node;
40G-performing common model verification: the operations are for performing a model validation task on an edge validation node;
40H-upload public model verification results: this operation is used to upload the results of the model verification task from the edge verification node.
In this application, the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, and may be located in a single network node, or may be distributed on multiple network nodes. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, according to specific constraints and implementation requirements, functional components in the embodiments of the present application may be integrated into one component, or each component may exist alone physically, or two or more components may be integrated into one component. The integrated components can be realized in a form of hardware or a form of software functional units.
The integrated components, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing one or more computer devices (which may be personal computers, servers, or network devices) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be understood that, in the various embodiments of the present application, the serial numbers of the above-mentioned processes do not mean a strict order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. While the present application has been described herein in conjunction with various embodiments, other variations to the disclosed embodiments may be understood and effected by those skilled in the art in practicing the present application as claimed herein.

Claims (8)

1. An edge node-based transfer learning method is used for a model training process, and the model training is implemented on edge nodes. The method is characterized by comprising the following steps:
determining a first data set, wherein the first data set is a result of first model training;
sending a first data set, the sending of the first data set resulting from a response to a second model training request of one edge training node;
receiving a second data set, wherein the second data set is from an edge verification node and is a result of second model verification;
generating a third data set, the third data set comprising a plurality of second data sets;
generating a fourth data set, the fourth data set from the third data set.
2. The method of claim 1, further comprising:
the first data set is an initial copy of a common model of the model training process.
3. The method of claim 1, further comprising:
the fourth data set is a final copy of a common model of the model training process.
4. The method of claim 1, wherein receiving the second data set further comprises:
the second data set is from the edge verification node, and the edge verification node receives a model obtained by training from the edge training node;
the edge training node receives the first data set.
5. The method of claim 1, wherein generating a fourth data set further comprises:
the second dataset is an intermediate copy of a common model of the model training process;
selecting from the third data set a high-quality intermediate copy of the common model;
and training the high-quality intermediate copy.
6. A computer program, characterized in that it comprises means for performing the method according to any one of claims 1 to 5.
7. A computer-readable storage medium, characterized in that the computer storage medium stores program instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-5.
8. The device for executing the computer program is characterized by comprising a processing component, a storage component and a communication module component, wherein the processing component, the storage component and the communication module component are connected with each other, the storage component is used for storing data processing codes, and the communication module is used for carrying out information interaction with external equipment; the processing component is configured for calling program code for performing the method according to any of claims 1-5.
CN202010288894.4A 2020-04-13 2020-04-13 Transfer learning method, device and system based on edge nodes Pending CN113537504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010288894.4A CN113537504A (en) 2020-04-13 2020-04-13 Transfer learning method, device and system based on edge nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010288894.4A CN113537504A (en) 2020-04-13 2020-04-13 Transfer learning method, device and system based on edge nodes

Publications (1)

Publication Number Publication Date
CN113537504A (en) 2021-10-22

Family

ID=78120041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010288894.4A Pending CN113537504A (en) 2020-04-13 2020-04-13 Transfer learning method, device and system based on edge nodes

Country Status (1)

Country Link
CN (1) CN113537504A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024065682A1 (en) * 2022-09-30 2024-04-04 Shenzhen Tcl New Technology Co., Ltd. Communication devices and methods for machine learning model training


Legal Events

PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211022