CN116562399A - Model training method and device with end-edge-cloud collaboration - Google Patents

Model training method and device with end-edge-cloud collaboration

Info

Publication number
CN116562399A
Authority
CN
China
Prior art keywords
gradient
data
gradient information
cloud server
aggregation
Prior art date
Legal status
Pending
Application number
CN202310844489.XA
Other languages
Chinese (zh)
Inventor
宋金洲
孙仁恩
魏鹏
张冠男
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310844489.XA
Publication of CN116562399A
Pending legal-status Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing

Abstract

One or more embodiments of the present disclosure provide a method and an apparatus for end-edge-cloud collaborative model training. The model training method includes receiving gradient information sent by each application end corresponding to an edge node, performing aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data, and sending the aggregated gradient data to a cloud server so that the cloud server trains a global model according to the aggregated gradient data. In the embodiments of this specification, the gradient information of the application ends is aggregated by the edge node, which reduces the data volume of the gradient information while retaining the features of all the gradient information to the greatest extent.

Description

Model training method and device with end-edge-cloud collaboration
Technical Field
One or more embodiments of the present disclosure relate to the field of big data analysis technologies, and in particular, to an end-edge-cloud collaborative model training method and apparatus.
Background
As machine learning technology develops rapidly, some problems have emerged. For example, when a cloud server trains a machine learning model, the feature data of the user side cannot be handed directly to the cloud server because of privacy and security concerns, which is what gave rise to federated learning technology.
In federated learning, the user side does not need to exchange local data with the cloud server; model training can be completed collaboratively merely by sending the gradient information obtained from local model training to the cloud. However, in scenarios with massive sample data, the volume of gradient information sent by the user side is large, so model training on the cloud takes a long time and proceeds slowly.
Disclosure of Invention
In order to improve the training speed of a machine learning model, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training method and device, an end-edge-cloud system, and a storage medium.
In a first aspect, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training method, applied to an edge node, where the method includes:
receiving gradient information sent by each application end corresponding to the edge node, wherein the gradient information is obtained by training a local model deployed at the application end by the application end based on local characteristic data;
performing aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data;
and sending the aggregated gradient data to a cloud server so that the cloud server trains a global model deployed on the cloud server according to the aggregated gradient data.
In one or more embodiments of the present disclosure, the aggregating processing based on the received gradient information sent by each application end to obtain aggregated gradient data includes:
grouping the received gradient information sent by each application end according to the preset granularity to obtain a plurality of gradient information groups;
for each gradient information group, carrying out weighted average processing on all gradient information included in the gradient information group based on a preset weight value to obtain gradient data corresponding to the gradient information group;
and obtaining the aggregation gradient data according to the gradient data corresponding to each gradient information group.
In one or more embodiments of the present disclosure, the aggregating processing based on the received gradient information sent by each application end to obtain aggregated gradient data includes:
carrying out logarithmic aggregation on the gradient information based on the received data quantity of the gradient information sent by each application end to obtain a plurality of gradient information groups;
for each gradient information group, carrying out weighted average processing on all gradient information included in the gradient information group based on a preset weight value to obtain gradient data corresponding to the gradient information group;
and obtaining the aggregation gradient data according to the gradient data corresponding to each gradient information group.
In one or more embodiments of the present specification, the method further comprises:
receiving gradient update parameters sent by the cloud server, wherein the gradient update parameters are obtained by training the global model by the cloud server according to the aggregated gradient data;
and updating the preset weight value based on the gradient updating parameter.
In one or more embodiments of the present disclosure, the aggregating processing based on the received gradient information sent by each application end to obtain aggregated gradient data includes:
acquiring current time information;
under the condition that the current time information is located in a first time period, performing first aggregation processing based on the received gradient information sent by each application end to obtain first aggregation gradient data;
and under the condition that the current time information is positioned in a second time period, performing second aggregation processing based on the received gradient information sent by each application end to obtain second aggregation gradient data, wherein the data volume of the first aggregation gradient data is different from that of the second aggregation gradient data.
In a second aspect, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training method, applied to a cloud server, where the method includes:
receiving aggregation gradient data sent by each edge node, wherein the aggregation gradient data is obtained by the edge node according to the method of any implementation manner of the first aspect;
and training the global model deployed on the cloud server according to the received aggregated gradient data sent by each edge node.
In one or more embodiments of the present specification, the method further comprises:
and in the global model training process, obtaining gradient update parameters for updating the preset weight values of the edge nodes, and sending the gradient update parameters to the edge nodes so that the edge nodes update the preset weight values based on the gradient update parameters.
In a third aspect, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training apparatus, applied to an edge node, the apparatus comprising:
the information receiving module is configured to receive gradient information sent by each application end corresponding to the edge node, wherein the gradient information is obtained by training a local model deployed at the application end by the application end based on local characteristic data;
the aggregation processing module is configured to perform aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data;
and the data sending module is configured to send the aggregation gradient data to a cloud server so that the cloud server trains a global model deployed on the cloud server according to the aggregation gradient data.
In a fourth aspect, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training apparatus, applied to a cloud server, where the apparatus includes:
the data receiving module is configured to receive the aggregation gradient data sent by each edge node, wherein the aggregation gradient data is obtained by the edge node according to the method of any implementation mode of the first aspect;
and the model training module is configured to train the global model deployed on the cloud server according to the received aggregated gradient data sent by each edge node.
In a fifth aspect, one or more embodiments of the present specification provide an end-edge-cloud system comprising:
at least one application end;
at least one edge node for performing the method according to any implementation of the first aspect; and
a cloud server for performing the method according to any embodiment of the second aspect.
In a sixth aspect, one or more embodiments of the present specification provide a storage medium storing computer instructions for causing a computer to perform the method according to any embodiment of the first or second aspects.
The model training method in one or more embodiments of the present disclosure includes receiving gradient information sent by each application end corresponding to an edge node, performing aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data, and sending the aggregated gradient data to a cloud server, which trains a global model according to the aggregated gradient data. In the embodiments of this specification, the gradient information of the application ends is aggregated by the edge node, which reduces the data volume of the gradient information while retaining the features of all the gradient information to the greatest extent.
Drawings
Fig. 1 is a schematic diagram of an end-edge-cloud system according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 3 is a schematic diagram of a model training method in an exemplary embodiment of the present disclosure.
Fig. 4 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 5 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 6 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 7 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 8 is a flow chart of a model training method in an exemplary embodiment of the present description.
Fig. 9 is a block diagram of a model training device in an exemplary embodiment of the present disclosure.
Fig. 10 is a block diagram of a model training device in an exemplary embodiment of the present disclosure.
Fig. 11 is a block diagram of an apparatus in an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification may be described as being broken down into multiple steps in other embodiments; while various steps described in this specification may be combined into a single step in other embodiments.
User information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in this specification are both information and data authorized by the user or sufficiently authorized by the parties, and the collection, use and processing of relevant data requires compliance with relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation portals for the user to choose authorization or denial.
In the traditional end-cloud collaborative model training technique, the feature data of the user end is uploaded directly to a cloud server for model training. However, part of the user-end feature data involves user privacy; to ensure privacy security, such sensitive feature data cannot be uploaded directly to the cloud server, and thus federated learning (Federated Learning) technology has come into use.
In federated learning, the user side can train a local model based on local feature data to obtain gradient information, encrypt the gradient information, and upload it to the cloud server, and the cloud server completes model training on the cloud based on the gradient information. Federated learning ensures that the user feature data never leaves the local device, thereby protecting user privacy.
In the traditional end-cloud collaborative model training scheme, the gradient information of all user ends is uploaded to the cloud server for model training; because of the user feature differences between regions and the huge data volume, the machine learning model is prone to overfitting, and data delay and other factors make model training slow. On this basis, the end-edge-cloud collaborative system architecture has been proposed.
End-edge-cloud collaboration is a form of distributed computing based on edge computing, whose architecture is shown, for example, in fig. 1. The "end" refers to the application end, that is, user terminal devices such as mobile phones, wearable devices, various sensors, cameras, and smart home devices. The "edge" refers to edge nodes, which can be deployed close to the application end devices and are responsible for operations such as preliminary filtering, analysis, and storage of the data uploaded by the application end devices in one area; for example, in the example of fig. 1 there are 3 edge nodes in total, each corresponding to the application end devices in one area. The "cloud" is a cloud server or cloud server cluster, responsible for analyzing, processing, and storing the data uploaded by each edge node.
In the end-edge-cloud collaborative system architecture, each edge node can be deployed close to the user end devices, so that each edge node is responsible for computing and storing data for the end devices within its range and then uploading the data to the cloud. This near-the-data computing mode can effectively reduce data delay, improve stability, save network bandwidth, and share the load of the cloud server.
In the related art, in end-edge-cloud collaborative machine learning model training, the end side trains a local model using local feature data to obtain gradient information and then uploads the gradient information to an edge node. Each edge node gathers the gradient information of all the end devices within its own area of responsibility and then uploads it to the cloud server, so that the cloud performs model training according to the gradient information from each edge node.
However, in model training scenarios of massive data scale, the volume of gradient information from the end devices is large, so the data volume uploaded to the cloud by the edge nodes is also large. Since the speed at which the cloud server can consume gradient information during model training is limited, model training on the cloud server takes a long time and proceeds slowly.
In view of these defects of the related art, one or more embodiments of the present disclosure provide an end-edge-cloud collaborative model training method and device, an end-edge-cloud system, and a storage medium, which aim to use end-edge-cloud collaborative model training to improve the model training speed of the cloud, shorten training time, and guarantee model precision and effect.
In some embodiments, this specification provides an end-edge-cloud collaborative model training method, which can be applied to an edge node, with the edge node performing the processing.
As shown in fig. 2, the end-edge-cloud collaborative model training method provided in one or more embodiments of the present disclosure includes:
S210, receiving gradient information sent by each application end corresponding to the edge node.
With reference to fig. 1, the end-edge-cloud system includes a cloud server and a plurality of edge nodes, where the edge nodes may be deployed in different regions, so that each edge node is responsible for operations such as data reception, processing, and storage for all application end devices within its region.
In some implementations, the edge node can be, for example, an edge node server, such as a CDN (Content Delivery Network) node server.
The application end is a user terminal, such as a mobile phone, a wearable device, or a smart home device. It can be understood that when the user uses the application end device, the device can collect some of the user's usage habits; for example, when the user uses a certain mobile phone application (App), the device can collect information such as the user's clicking habits, page exposure duration, and shopping records, and this information is the local feature data collected by the application end device.
The local feature data can be divided into sensitive data and non-sensitive data, where sensitive data refers to feature data related to user privacy, and non-sensitive data refers to feature data unrelated to user privacy. Since non-sensitive data does not involve user privacy, the application end can upload it directly to the cloud to participate in model training. Sensitive data, however, cannot be uploaded directly to the cloud because of user privacy protection.
In federated learning, a local model needs to be deployed at each application end. An application end holding local feature data can train the local model locally with that data, and corresponding gradient information is obtained through this local training. Gradient information indicates the direction in which the model should be updated; updating the model parameters through the gradient information is precisely the process of model iteration and training. Gradient information does not directly expose the original local feature data, so transmitting gradient information protects the security of the application end's private data well, which is also the core idea of federated learning.
In this embodiment of the present disclosure, each application end may train its local model based on the local feature data it owns to obtain corresponding gradient information, and each edge node then receives the gradient information sent by all application ends within its own area of responsibility. For example, in the example scenario of fig. 1, the gradient information obtained by application ends A1 to A3 after training their respective local models is sent to edge node D1, the gradient information obtained by application ends B1 to B3 is sent to edge node D2, and the gradient information obtained by application ends C1 to C3 is sent to edge node D3.
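For illustration only and not as part of the patent text, the following minimal Python sketch shows one way an application end might derive gradient information from a local model and its local feature data before reporting it to its edge node; the linear model, the squared loss, and the function and variable names are assumptions.

```python
import numpy as np

def local_gradient(weights, features, labels):
    """Compute the gradient of a simple linear model's squared loss on the
    application end's local feature data (a stand-in for the local model)."""
    predictions = features @ weights
    errors = predictions - labels
    return features.T @ errors / len(labels)  # this vector is the gradient information

# Hypothetical application end A1 reporting to its edge node D1
local_weights = np.zeros(8)
local_features = np.random.rand(100, 8)  # local feature data (never uploaded)
local_labels = np.random.rand(100)
gradient_info = local_gradient(local_weights, local_features, local_labels)
# send_to_edge_node("D1", gradient_info)  # transport layer is outside this sketch
```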
S220, performing aggregation processing based on the received gradient information sent by each application end to obtain aggregation gradient data.
In the embodiment of the present disclosure, after receiving the gradient information sent by each application end, the edge node does not directly upload all of the gradient information to the cloud server, but first performs aggregation processing on it. Aggregation processing of gradient information means that gradient information of huge data volume is processed, by data aggregation, into aggregated gradient data of smaller data volume while maintaining a view over the full data.
Taking the edge node D1 in the example scenario of fig. 1 as an example, assume that edge node D1 receives gradient information sent by 1 million end devices within its area of responsibility, so that the data volume of gradient information received by edge node D1 is 1 million records. If edge node D1 sent this gradient information directly to the cloud server, the speed at which the cloud server consumes gradient information during model training would be far below this order of magnitude, so the cloud server's data storage pressure would be high and model training would be slow.
Therefore, in this embodiment, edge node D1 may perform aggregation processing on the received gradient information. The aggregation processing mainly extracts the features of the gradient information and condenses the massive gradient information into data of much smaller size while losing as few features as possible. For example, by aggregating the 1 million pieces of gradient information, the resulting aggregated gradient data may contain only a small fraction of that number of records, so the data volume is reduced by a multiple or even exponentially.
In this embodiment, the method by which the edge node aggregates gradient information is not limited; for example, the gradient information may be aggregated using an aggregation function such as a weighted average or logarithmic aggregation, and those skilled in the art may also select an appropriate aggregation method by balancing model training speed and accuracy or by taking the peaks and valleys of the data volume into consideration.
It will be appreciated that the aggregation processing differs from conventional data screening. Data screening keeps only a portion of the total data in order to reduce the data volume, but in that manner the characteristics of the screened-out data are completely lost. The aggregation processing in the embodiments of the present disclosure is still performed over the full amount of gradient information, and the resulting aggregated gradient data retains the features of all gradient information to the greatest extent, so subsequent model training still takes place over a global view, laying a data foundation for model accuracy and effect.
S230, sending the aggregated gradient data to a cloud server so that the cloud server trains a global model deployed on the cloud server according to the aggregated gradient data.
In federated learning, a global model needs to be deployed on the cloud server side. Because of the difference in computing capacity, the global model of the cloud server may be more complex than the local model at the application end, but it performs essentially the same function; in this embodiment of the present specification, the global model of the cloud server and the local model of the application end may be regarded as models with the same function.
It can be understood that each application end trains its local model using its own local feature data, and the environments of different application ends are completely isolated from one another. The global model of the cloud server, by contrast, is trained using the aggregated gradient data sent by all the edge nodes, so the trained global model learns the data characteristics of all application ends. Having the cloud server train the global model using all the aggregated gradient data is precisely the goal of end-edge-cloud collaborative model training.
In this embodiment of the present disclosure, after the gradient information is aggregated to obtain aggregated gradient data, each edge node may send the aggregated gradient data to a cloud server, and the cloud server may train the global model according to the received aggregated gradient data.
In some embodiments, as the cloud server continues to train the global model, a new version of the global model can be obtained once the training sample data reaches a certain amount. The cloud server can then send the new version of the global model to the edge nodes or to each application end, so that the latest global model is deployed at each edge node and application end, and the application end can use the latest version of the global model for inference tasks such as information recommendation and search prediction. The above process is then repeated, with the model continuously trained and updated using new data, which can be understood by those skilled in the art and is not repeated in this specification.
As can be seen from the foregoing, in the embodiment of the present disclosure, the gradient information of the application end is aggregated by the edge node, so that the data size of the gradient information is reduced, and the features of all gradient information can be retained to the greatest extent.
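A minimal sketch of the edge-node side of steps S210 to S230, assuming gradient information arrives as numeric vectors; `aggregate` and `send_to_cloud` are hypothetical placeholders for the aggregation strategies and the transport described elsewhere in this specification.

```python
import numpy as np

def edge_node_round(received_gradients, aggregate, send_to_cloud):
    """One round at an edge node: receive gradient information from all
    application ends in its area (S210), aggregate it (S220), and send the
    aggregated gradient data to the cloud server (S230)."""
    aggregated_gradient_data = aggregate(received_gradients)
    send_to_cloud(aggregated_gradient_data)
    return aggregated_gradient_data

# Example with three application ends and a plain mean as a placeholder aggregation
gradients = [np.random.rand(8) for _ in range(3)]
edge_node_round(
    gradients,
    aggregate=lambda grads: [np.mean(grads, axis=0)],
    send_to_cloud=lambda data: None,  # transport stubbed out for the sketch
)
```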
Fig. 3 illustrates a framework diagram of the end-edge-cloud collaborative model training method in some embodiments of the present description, which is described below in connection with fig. 3.
As shown in fig. 3, for each application end, user feature data, that is, local feature data described in the present specification, may be collected in real time during the use process of the user. The manner and principles of user characteristic data collection will be understood and fully implemented by those skilled in the art with reference to the relevant art, and this description is not intended to be limiting.
Each application end can train the local model deployed locally according to the local feature data, the training process of the local model is a supervised training process, and can be understood and implemented by a person skilled in the art with reference to related technologies.
After the gradient information is obtained through local training, each application end can send the gradient information to its corresponding edge node. In a massive-data-scale scenario, the data volume of the gradient information received by each edge node is large, so the edge node can aggregate the received gradient information through one or more of the method steps described herein to obtain aggregated gradient data of reduced volume. Each edge node then uploads the aggregated gradient data to the cloud server, which trains the global model based on the aggregated gradient data.
In some embodiments of the present disclosure, the edge node needs to weigh two considerations when aggregating the gradient information: first, the data volume after aggregation; second, the effect and precision of the cloud server's model training.
It can be understood that the smaller the data volume of the aggregated gradient data obtained after aggregation, the more feature information is lost during aggregation, so subsequent cloud model training is faster but model accuracy is relatively poorer. Conversely, the larger the data volume of the aggregated gradient data, the less feature information is lost during aggregation, so subsequent cloud model training is slower but model accuracy is relatively higher.
Thus, in some embodiments of the present disclosure, different aggregation processing modes may be used in different situations. For example, taking a mobile phone application as an exemplary scenario, users tend to use their phones more during the day and less at night, so during end-edge-cloud collaborative model training the data volume in the daytime is far greater than at night. Accordingly, in some embodiments of the present disclosure, different aggregation processing modes may be set for different time periods, as described below with reference to fig. 4.
As shown in fig. 4, in some embodiments of the model training method illustrated in this disclosure, the process of performing aggregation based on the received gradient information sent by each application end to obtain aggregated gradient data includes:
S410, acquiring current time information.
S420, when the current time information is in the first time period, performing first aggregation processing based on the received gradient information sent by each application end to obtain first aggregated gradient data.
S430, when the current time information is in the second time period, performing second aggregation processing based on the received gradient information sent by each application end to obtain second aggregated gradient data.
In this embodiment of the present disclosure, different time periods may be defined in advance, so that the time period to which the current model training belongs is determined from the current training time, and the edge node may then select the aggregation processing mode corresponding to that time period to aggregate the received gradient information.
For example, in one example scenario, each day from 07:00 to 20:00 may be designated the first time period, and from 20:00 to 07:00 of the next day the second time period. It can be appreciated that the first time period is daytime, when the system's data volume is larger and the cloud server's data consumption pressure is greater, so the edge node should give priority to reducing the data volume sent to the cloud server. The second time period is nighttime, when the system's data volume is smaller and the cloud server's data consumption pressure is lower, so the edge node should give priority to ensuring the accuracy of model training.
Therefore, in this example, during end-edge-cloud collaborative model training, the system's current time information can be acquired in real time, and whether it falls within the first time period or the second time period is then determined.
If the current time falls within the first time period, the system data volume is larger, so the edge node can perform the first aggregation processing on the received gradient information, and the data volume of the first aggregated gradient data obtained after this processing is smaller. For example, the first aggregation process may be logarithmic aggregation, so that the amount of aggregated gradient data decreases exponentially compared with the original gradient information, thereby relieving the data consumption pressure of the cloud server.
If the current time falls within the second time period, the system data volume is smaller, so the edge node can perform the second aggregation processing on the received gradient information, and the data volume of the second aggregated gradient data obtained after this processing is larger than that of the first aggregated gradient data. For example, the second aggregation process may be a grouped weighted average, so that the data volume of the aggregated gradient data is reduced by a multiple compared with the original gradient information, thereby improving the accuracy of model training while still matching the consumption speed of the cloud server.
Of course, those skilled in the art will understand that the way the model training time periods are divided is not limited to the above example; the time periods may be divided arbitrarily according to the peaks and valleys of the system data volume and the requirements of the specific scenario, which is not described in detail here. The specific procedures of the first aggregation process and the second aggregation process are described later in this specification.
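The following sketch illustrates the time-based selection of S410 to S430, assuming the 07:00 to 20:00 split of the example above; the two aggregation callables are placeholders for the first and second aggregation processes, whose details follow later.

```python
from datetime import datetime, time

# Example split from the description: 07:00-20:00 is the first (daytime) period
FIRST_PERIOD_START = time(7, 0)
FIRST_PERIOD_END = time(20, 0)

def choose_aggregation(now, first_aggregation, second_aggregation):
    """Select the aggregation mode from the current time (S410-S430): the first
    process (stronger reduction) during the heavy-load daytime period, the second
    process (milder reduction, higher accuracy) during the light-load night period."""
    in_first_period = FIRST_PERIOD_START <= now.time() < FIRST_PERIOD_END
    return first_aggregation if in_first_period else second_aggregation

# The two callables stand in for logarithmic aggregation and the grouped
# weighted average, respectively (see the sketches further below).
aggregate = choose_aggregation(
    datetime.now(),
    first_aggregation=lambda grads: grads,   # placeholder
    second_aggregation=lambda grads: grads,  # placeholder
)
```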
In this way, in the embodiments of this specification, the model training time is divided into periods and the edge nodes adopt different aggregation processing modes in different periods, so that in periods of large data volume the model training speed is guaranteed and the pressure on the cloud server is relieved, while in periods of relatively small data volume the model training precision is guaranteed and the model effect is improved.
As shown in fig. 5, in some embodiments of the model training method illustrated in this specification, the process by which an edge node performs the second aggregation processing based on the gradient information includes:
S510, grouping the received gradient information sent by each application end according to a preset granularity to obtain a plurality of gradient information groups.
Taking one edge node as an example, the edge node receives gradient information sent by each application end within its own area of responsibility; assume the data volume of this gradient information is n. In this embodiment of the present disclosure, the n pieces of gradient information first need to be grouped to obtain a plurality of gradient information groups.
The preset granularity is the number m of records included in each gradient information group, so grouping the n pieces of gradient information with preset granularity m yields n/m gradient information groups, each containing m pieces of gradient information.
S520, for each gradient information set, carrying out weighted average processing on all gradient information included in the gradient information set based on a preset weight value to obtain gradient data corresponding to the gradient information set.
It will be appreciated that, taking one gradient information group from the foregoing example, the group contains m pieces of gradient information in total. In this embodiment, a weighted average is taken over the m pieces of gradient information included in the group, yielding a single piece of gradient data for that group, which is expressed as:

P_j = ( Σ_{i=1}^{m} ω_i · T_i ) / ( Σ_{i=1}^{m} ω_i )  (1)

In formula (1), P_j represents the gradient data of the j-th gradient information group, T_i represents the i-th piece of gradient information, and ω_i represents the preset weight value corresponding to the i-th piece of gradient information.
Therefore, after the m pieces of gradient information in each gradient information group are weighted-averaged through formula (1), the gradient data corresponding to each group is obtained; since the n pieces of gradient information are divided into n/m groups in total, n/m pieces of gradient data are obtained.
S530, acquiring aggregate gradient data according to the gradient data corresponding to each gradient information set.
In the foregoing example, the n/m pieces of gradient data obtained by weighted-averaging the m pieces of gradient information in each gradient information group together constitute the aggregated gradient data of this embodiment of the specification.
It can be seen that in this example the data volume of the gradient information received by the edge node is n and the data volume of the aggregated gradient data is n/m, so the data volume sent to the cloud server is reduced by a multiple; for example, with gradient information of data volume n=100000 and preset granularity m=100, the data volume of the aggregated gradient data is reduced to n/m=100000/100=1000. In addition, the aggregation is still performed over the full set of gradient information, so the subsequent model training by the cloud server still rests on a global view and the model accuracy is ensured.
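A minimal sketch of the grouped weighted average of S510 to S530 (usable as the second aggregation process), assuming each piece of gradient information is a numeric vector, the preset weights ω_i default to uniform values, and formula (1) is applied per group; the function and variable names are assumptions.

```python
import numpy as np

def grouped_weighted_average(gradients, granularity_m, preset_weights=None):
    """Second aggregation process: group n gradients with preset granularity m
    (S510), take the weighted average of each group per formula (1) (S520), and
    return the n/m results as the aggregated gradient data (S530)."""
    n = len(gradients)
    if preset_weights is None:
        preset_weights = np.ones(n)  # preset weight values omega_i (uniform here)
    aggregated_gradient_data = []
    for start in range(0, n, granularity_m):
        group = np.array(gradients[start:start + granularity_m])
        omega = np.array(preset_weights[start:start + granularity_m])
        # formula (1): P_j = sum_i(omega_i * T_i) / sum_i(omega_i)
        aggregated_gradient_data.append((omega[:, None] * group).sum(axis=0) / omega.sum())
    return aggregated_gradient_data

# Example matching the text: n = 100000 gradients, granularity m = 100 -> 1000 records
gradients = [np.random.rand(8) for _ in range(100000)]
assert len(grouped_weighted_average(gradients, granularity_m=100)) == 1000
```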
As shown in fig. 6, in some embodiments of the model training method illustrated in this specification, the process by which an edge node performs the first aggregation processing based on the gradient information includes:
S610, performing logarithmic aggregation on the gradient information based on the data quantity of the received gradient information sent by each application end to obtain a plurality of gradient information groups.
Unlike the embodiment shown in fig. 5, in which the gradient information is grouped with a preset granularity, in this example the gradient information is grouped by means of logarithmic aggregation.
Taking one edge node as an example, the edge node receives gradient information sent by each application end within its own area of responsibility; assume the data volume of this gradient information is n. In this embodiment of the present disclosure, the n pieces of gradient information are divided into lg(n) gradient information groups, and the number of pieces of gradient information in each group is m=n/lg(n). For example, with a gradient information data volume of n=100000, the n=100000 pieces of gradient information are divided into lg(100000)=5 gradient information groups, each containing m=100000/5=20000 pieces of gradient information.
S620, for each gradient information set, carrying out weighted average processing on all gradient information included in the gradient information set based on a preset weight value to obtain gradient data corresponding to the gradient information set.
S630, acquiring aggregate gradient data according to the gradient data corresponding to each gradient information group.
It can be understood that, in the foregoing example, after the gradient information is divided into lg(n) groups, the m pieces of gradient information in each group may be weighted-averaged according to the foregoing process of S520 to S530 to obtain the gradient data corresponding to each group and, in turn, the aggregated gradient data, which is not described again in this specification.
It can be seen that in this example the data volume of the gradient information received by the edge node is n and the data volume of the aggregated gradient data is lg(n), so the data volume sent to the cloud server is reduced exponentially; for example, with gradient information of data volume n=100000, the data volume of the aggregated gradient data is reduced to lg(100000)=5, a greater reduction than in the embodiment of fig. 5. In addition, the aggregation is still performed over the full set of gradient information, so the subsequent model training by the cloud server still rests on a global view and the model accuracy is ensured.
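A sketch of the logarithmic aggregation of S610 to S630 (usable as the first aggregation process), assuming lg denotes the base-10 logarithm as in the numeric example above and reusing grouped_weighted_average from the previous sketch; names and rounding choices are assumptions.

```python
import math
import numpy as np

def logarithmic_aggregation(gradients, preset_weights=None):
    """First aggregation process: split n gradients into lg(n) groups of
    m = n / lg(n) each (S610), then weighted-average every group per
    formula (1) (S620-S630)."""
    n = len(gradients)
    num_groups = max(1, round(math.log10(n)))  # lg(n) groups
    granularity_m = math.ceil(n / num_groups)  # m = n / lg(n) gradients per group
    return grouped_weighted_average(gradients, granularity_m, preset_weights)

# Example matching the text: n = 100000 -> lg(n) = 5 groups of 20000 each
gradients = [np.random.rand(8) for _ in range(100000)]
assert len(logarithmic_aggregation(gradients)) == 5
```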
In this way, in the embodiments of this specification, the model training time is divided into periods and the edge nodes adopt different aggregation processing modes in different periods, so that in periods of large data volume the model training speed is guaranteed and the pressure on the cloud server is relieved, while in periods of relatively small data volume the model training precision is guaranteed and the model effect is improved.
It should be noted that, in some embodiments of end-edge-cloud collaborative model training, the edge node may only aggregate the gradient information sent by the application ends and transmit the resulting aggregated gradient data to the cloud server, without directly participating in the cloud server's subsequent model training process.
In other embodiments, the edge node may also participate in training together with the cloud server while the cloud server trains the global model, so that the preset weight values ω_i in the edge node are likewise iteratively updated to further improve model accuracy, as described below in connection with fig. 7.
As shown in fig. 7, in some embodiments, the model training method illustrated in the present specification further includes:
S710, receiving gradient update parameters sent by the cloud server.
S720, updating the preset weight value based on the gradient updating parameters.
As can be seen from formula (1), when the edge node performs weighted averaging on the gradient information, a preset weight value needs to be set for each piece of gradient information. The preset weight values may initially be set in advance, and then, during the cloud server's training of the global model, the preset weight values of the edge nodes are synchronously updated and optimized, so that the edge nodes learn the latest data characteristics and the aggregation effect on the gradient information is improved.
Specifically, when the cloud server trains the global model according to the aggregated gradient data sent by each edge node, it generates not only information for tuning the parameters of the global model but also information for tuning the preset weight values of each edge node; the latter information is the gradient update parameter described in this specification.
Thus, during each round of iterative training, the cloud server can send the gradient update parameters to each edge node. The gradient update parameters can be understood as gradient values for updating the preset weight values ω_i in formula (1), so that after receiving them the edge node can use the gradient update parameters to update the preset weight value ω_i of each piece of gradient information. This cycle repeats: in every round of iterative training, the edge node updates the preset weight values ω_i based on the gradient update parameters sent by the cloud server, so that the edge node learns the latest data characteristics and the effect and precision of the aggregation processing are improved.
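A sketch of the weight update of S710 to S720, assuming the cloud server sends one gradient update value per preset weight; the gradient-descent step and learning rate are assumptions, since the description only states that the preset weights are updated based on the gradient update parameters.

```python
import numpy as np

def update_preset_weights(preset_weights, gradient_update_params, learning_rate=0.01):
    """Update the edge node's preset weight values omega_i (S720) using the
    gradient update parameters received from the cloud server (S710).

    The simple gradient-descent step and learning rate used here are assumptions;
    the description only states that the preset weights are updated based on the
    gradient update parameters."""
    preset_weights = np.asarray(preset_weights, dtype=float)
    gradient_update_params = np.asarray(gradient_update_params, dtype=float)
    return preset_weights - learning_rate * gradient_update_params

# Example: weights for 100 pieces of gradient information, refreshed every round
omega = np.ones(100)
omega = update_preset_weights(omega, gradient_update_params=np.random.rand(100))
```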
In some embodiments, this specification provides an end-edge-cloud collaborative model training method, which can be applied to a cloud server, with the cloud server performing the processing.
As shown in fig. 8, the end-edge-cloud collaborative model training method provided in one or more embodiments of the present disclosure includes:
S810, receiving aggregated gradient data sent by each edge node.
S820, training a global model deployed on the cloud server according to the received aggregation gradient data sent by each edge node.
With reference to the foregoing embodiments of fig. 1 and fig. 3, each application end may collect local feature data and train its locally deployed local model with that data to obtain gradient information; after obtaining the gradient information through local training, each application end may send it to its corresponding edge node. The edge node may then aggregate the received gradient information through one or more of the foregoing method steps to obtain aggregated gradient data. Each edge node then uploads the aggregated gradient data to the cloud server, which trains the global model based on the aggregated gradient data.
It is worth noting that, as the cloud server continues to train the global model, a new version of the global model can be obtained once the training sample data reaches a certain amount. After the new version of the global model is obtained, the cloud server can choose whether to send it to the application ends or to the edge nodes according to the requirements of the specific scenario. For example, if the model inference service needs to run at the application end, the cloud server can issue the latest trained version of the global model to each application end so that the latest global model is deployed there, allowing the application end to use it for inference tasks such as information recommendation and search prediction.
As can be seen from the foregoing embodiment of fig. 7, when the cloud server trains the global model according to the aggregated gradient data sent by each edge node, it generates information for tuning the parameters of the global model and may also generate gradient update parameters for tuning the preset weight values of each edge node, sending these gradient update parameters to each edge node so that the edge node updates its preset weight values accordingly. Those skilled in the art can understand and implement the foregoing, and it is not described further in this specification.
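A sketch of the cloud-server side of S810 to S820, assuming the global model is again a simple vector of weights, the aggregated gradient data arrive as numeric vectors keyed by edge node, and the gradient update parameters returned to the edge nodes are derived as a simple per-edge summary; all of these choices are assumptions made for illustration.

```python
import numpy as np

def cloud_training_round(global_weights, aggregated_by_edge, learning_rate=0.1):
    """One round at the cloud server: consume the aggregated gradient data sent
    by every edge node (S810), apply it to the global model (S820), and produce
    gradient update parameters to return to the edge nodes.

    The per-edge mean used as the gradient update parameter is purely an
    assumption; the description does not specify how it is computed."""
    gradient_update_params = {}
    for edge_id, aggregated_gradient_data in aggregated_by_edge.items():
        for gradient in aggregated_gradient_data:
            global_weights = global_weights - learning_rate * gradient  # model step
        gradient_update_params[edge_id] = np.mean(aggregated_gradient_data, axis=0)
    return global_weights, gradient_update_params

# Example with two edge nodes, each sending a few aggregated gradient records
global_weights = np.zeros(8)
per_edge = {"D1": [np.random.rand(8) for _ in range(5)],
            "D2": [np.random.rand(8) for _ in range(5)]}
global_weights, updates = cloud_training_round(global_weights, per_edge)
```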
As can be seen from the foregoing, in the embodiments of the present disclosure, the gradient information of the application ends is aggregated by the edge node, which reduces the data volume of the gradient information while retaining the features of all gradient information to the greatest extent. Moreover, the model training time is divided into periods, and the edge nodes adopt different aggregation processing modes in different periods, so that in periods of large data volume the model training speed is guaranteed and the pressure on the cloud server is relieved, while in periods of relatively small data volume the model training precision is guaranteed and the model effect is improved.
In some embodiments, the present description provides an end-edge-cloud collaborative model training apparatus, which may be applied to an edge node.
As shown in fig. 9, the model training apparatus provided by one or more embodiments of the present disclosure includes:
the information receiving module 10 is configured to receive gradient information sent by each application end corresponding to the edge node, wherein the gradient information is obtained by training a local model deployed at the application end by the application end based on local characteristic data;
the aggregation processing module 20 is configured to perform aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data;
the data sending module 30 is configured to send the aggregated gradient data to a cloud server, so that the cloud server trains a global model deployed on the cloud server according to the aggregated gradient data.
In some embodiments, the present description provides an end-edge-cloud collaborative model training apparatus, which may be applied to a cloud server.
As shown in fig. 10, the model training apparatus provided by one or more embodiments of the present disclosure includes:
a data receiving module 40 configured to receive aggregate gradient data sent by each edge node, the aggregate gradient data being obtained by the edge node according to the method of any embodiment of the first aspect;
the model training module 50 is configured to train a global model deployed on the cloud server according to the received aggregated gradient data sent by each edge node.
In some embodiments, one or more embodiments of the present description provide an end-edge-cloud system, comprising:
the application end, the edge node, and the cloud server, for which those skilled in the art may refer to the foregoing embodiments; details are not repeated here.
In some embodiments, one or more embodiments of the present description provide a storage medium storing computer instructions for causing a computer to perform the method of any of the preceding embodiments.
Fig. 11 is a schematic structural diagram of an apparatus according to an exemplary embodiment, where the apparatus may be the foregoing application end, or may be an edge node, or may be a cloud server, and this disclosure is not limited thereto.
Referring to fig. 11, at the hardware level the device includes a processor 702, an internal bus 704, a network interface 706, a memory 708, and non-volatile storage 710, and may of course also include hardware required by other services. One or more embodiments of the present description may be implemented in software, for example by the processor 702 reading the corresponding computer program from the non-volatile storage 710 into the memory 708 and then running it. Of course, in addition to software implementations, one or more embodiments of the present disclosure do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The foregoing describes certain embodiments of the present description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present specification to describe various information, this information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of the present specification and is not intended to limit them to the particular embodiments described.

Claims (11)

1. An end-edge-cloud collaborative model training method applied to an edge node, the method comprising:
receiving gradient information sent by each application end corresponding to the edge node, wherein the gradient information is obtained by the application end by training a local model deployed at the application end based on local characteristic data;
performing aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data;
and sending the aggregated gradient data to a cloud server so that the cloud server trains a global model deployed on the cloud server according to the aggregated gradient data.
2. The method of claim 1, wherein performing aggregation processing based on the received gradient information sent by each application end to obtain the aggregated gradient data comprises:
grouping the received gradient information sent by each application end according to a preset granularity to obtain a plurality of gradient information groups;
for each gradient information group, performing weighted average processing on all gradient information included in the gradient information group based on a preset weight value to obtain gradient data corresponding to the gradient information group;
and obtaining the aggregated gradient data according to the gradient data corresponding to each gradient information group.
3. The method of claim 1, wherein performing aggregation processing based on the received gradient information sent by each application end to obtain the aggregated gradient data comprises:
grouping the gradient information based on the data volume of the gradient information received from each application end to obtain a plurality of gradient information groups;
for each gradient information group, performing weighted average processing on all gradient information included in the gradient information group based on a preset weight value to obtain gradient data corresponding to the gradient information group;
and obtaining the aggregated gradient data according to the gradient data corresponding to each gradient information group.
4. The method of claim 2 or 3, further comprising:
receiving gradient update parameters sent by the cloud server, wherein the gradient update parameters are obtained by training the global model by the cloud server according to the aggregated gradient data;
and updating the preset weight value based on the gradient update parameters.
5. The method of claim 1, wherein performing aggregation processing based on the received gradient information sent by each application end to obtain the aggregated gradient data comprises:
acquiring current time information;
in the case that the current time information falls within a first time period, performing first aggregation processing based on the received gradient information sent by each application end to obtain first aggregated gradient data;
and in the case that the current time information falls within a second time period, performing second aggregation processing based on the received gradient information sent by each application end to obtain second aggregated gradient data, wherein the data volume of the first aggregated gradient data is different from that of the second aggregated gradient data.
6. An end-edge-cloud collaborative model training method applied to a cloud server, the method comprising:
receiving aggregated gradient data sent by each edge node, wherein the aggregated gradient data is obtained by the edge node according to the method of any one of claims 1 to 5;
and training the global model deployed on the cloud server according to the received aggregated gradient data sent by each edge node.
7. The method of claim 6, further comprising:
and in the global model training process, obtaining gradient update parameters for updating the preset weight values of the edge nodes, and sending the gradient update parameters to the edge nodes so that the edge nodes update the preset weight values based on the gradient update parameters.
8. An end-edge-cloud collaborative model training apparatus applied to an edge node, the apparatus comprising:
an information receiving module configured to receive gradient information sent by each application end corresponding to the edge node, wherein the gradient information is obtained by the application end by training a local model deployed at the application end based on local characteristic data;
an aggregation processing module configured to perform aggregation processing based on the received gradient information sent by each application end to obtain aggregated gradient data;
and a data sending module configured to send the aggregated gradient data to a cloud server so that the cloud server trains a global model deployed on the cloud server according to the aggregated gradient data.
9. An end-edge-cloud collaborative model training apparatus applied to a cloud server, the apparatus comprising:
a data receiving module configured to receive aggregated gradient data sent by respective edge nodes, the aggregated gradient data being obtained by the edge nodes according to the method of any one of claims 1 to 5;
and a model training module configured to train the global model deployed on the cloud server according to the received aggregated gradient data sent by each edge node.
10. An end-edge-cloud system, comprising:
at least one application end;
at least one edge node for performing the method according to any of claims 1 to 5; and
a cloud server for performing the method according to any one of claims 6 to 7.
11. A storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 5 or to perform the method of any one of claims 6 to 7.
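For illustration only, the aggregation recited in claims 1 to 3 can be sketched in a few lines of Python. This is a minimal, hypothetical rendering, not the patented implementation: the class and parameter names (EdgeAggregator, group_size, preset_weights) are assumptions, grouping follows the preset granularity of claim 2, and each group is reduced to one gradient by a weighted average with preset weight values before being uploaded to the cloud server.

```python
# Minimal sketch of the edge-node aggregation described in claims 1-3.
# All names and defaults are illustrative assumptions, not taken from the patent.
from typing import List

import numpy as np


class EdgeAggregator:
    def __init__(self, group_size: int, preset_weights: List[float]):
        self.group_size = group_size                       # preset granularity
        self.preset_weights = np.asarray(preset_weights)   # preset weight values

    def aggregate(self, gradients: List[np.ndarray]) -> List[np.ndarray]:
        """Group the gradients reported by the application ends and reduce
        each group to a single weighted-average gradient."""
        aggregated = []
        for start in range(0, len(gradients), self.group_size):
            group = np.stack(gradients[start:start + self.group_size])
            w = self.preset_weights[:len(group)]
            w = w / w.sum()                                # normalise the weights
            aggregated.append(np.average(group, axis=0, weights=w))
        return aggregated                                  # aggregated gradient data


# Example: four application ends report gradients; a granularity of 2 yields
# two aggregated gradients to upload to the cloud server instead of four.
if __name__ == "__main__":
    client_grads = [np.random.randn(10) for _ in range(4)]
    edge = EdgeAggregator(group_size=2, preset_weights=[0.6, 0.4])
    upload = edge.aggregate(client_grads)
    print(len(upload), upload[0].shape)   # -> 2 (10,)
```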
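Claim 5 makes the aggregation depend on the current time so that the two periods yield aggregated data of different volumes. One plausible reading, sketched below under assumed period boundaries and group sizes (none of which come from the patent), is to pick a coarser granularity during a busy window and a finer one otherwise, then reuse an aggregator such as the one above.

```python
# Hypothetical time-dependent choice of aggregation granularity for claim 5.
# The period boundaries and group sizes are assumptions for illustration only.
from datetime import datetime, time


def choose_group_size(now: datetime,
                      busy_start: time = time(8, 0),
                      busy_end: time = time(22, 0),
                      busy_group_size: int = 16,
                      idle_group_size: int = 4) -> int:
    """Coarser groups (more application ends averaged together) in the busy
    period mean fewer uploaded gradients, i.e. a smaller data volume; idle
    hours keep finer groups and therefore more detail."""
    in_busy_period = busy_start <= now.time() < busy_end
    return busy_group_size if in_busy_period else idle_group_size


# Example: pick the granularity for the current round before aggregating.
# group_size = choose_group_size(datetime.now())
# edge = EdgeAggregator(group_size=group_size, preset_weights=[...])
```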
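Claims 4, 6 and 7 describe the cloud side: the server trains the global model from the edge nodes' aggregated gradients and, during training, derives gradient update parameters that each edge node uses to refresh its preset weight values. The claims do not fix how those parameters are computed, so the sketch below simply applies one gradient-descent step and returns a per-edge scalar as a stand-in update parameter; both choices are assumptions.

```python
# Hypothetical cloud-server step (claims 6-7) and edge-side weight update
# (claim 4). The SGD step and the form of the update parameter are assumed.
from typing import List, Tuple

import numpy as np


def cloud_training_step(global_params: np.ndarray,
                        edge_grads: List[np.ndarray],
                        lr: float = 0.01) -> Tuple[np.ndarray, List[float]]:
    """Average the aggregated gradients from all edge nodes, take one
    gradient-descent step on the global model, and emit one update
    parameter per edge node (here: cosine similarity to the mean)."""
    mean_grad = np.mean(np.stack(edge_grads), axis=0)
    new_params = global_params - lr * mean_grad
    update_params = [
        float(np.dot(g.ravel(), mean_grad.ravel())
              / (np.linalg.norm(g) * np.linalg.norm(mean_grad) + 1e-12))
        for g in edge_grads
    ]
    return new_params, update_params


def update_preset_weights(preset_weights: np.ndarray,
                          update_param: float,
                          step: float = 0.1) -> np.ndarray:
    """Edge-node side of claim 4: nudge the preset weight values using the
    gradient update parameter received from the cloud server (illustrative
    rule, renormalised so the weights remain a valid weighting)."""
    adjusted = preset_weights * (1.0 + step * update_param)
    return adjusted / adjusted.sum()
```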
CN202310844489.XA 2023-07-10 2023-07-10 Model training method and device with end-edge-cloud cooperation Pending CN116562399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310844489.XA CN116562399A (en) Model training method and device with end-edge-cloud cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310844489.XA CN116562399A (en) Model training method and device with end-edge-cloud cooperation

Publications (1)

Publication Number Publication Date
CN116562399A true CN116562399A (en) 2023-08-08

Family

ID=87503993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310844489.XA Pending CN116562399A (en) Model training method and device with end-edge-cloud cooperation

Country Status (1)

Country Link
CN (1) CN116562399A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232528A (en) * 2020-12-15 2021-01-15 之江实验室 Method and device for training federated learning model and federated learning system
CN113077060A (en) * 2021-03-30 2021-07-06 中国科学院计算技术研究所 Federal learning system and method aiming at edge cloud cooperation
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN115168876A (en) * 2022-04-13 2022-10-11 北京智芯微电子科技有限公司 Federated learning-based cloud edge-side cooperation method, control device and cooperation system
CN115408151A (en) * 2022-08-23 2022-11-29 哈尔滨工业大学 Method for accelerating federated learning training
CN116187429A (en) * 2022-12-14 2023-05-30 广东技术师范大学 End-edge-cloud collaborative synchronous federated learning training algorithm based on split learning
CN116186772A (en) * 2023-02-23 2023-05-30 支付宝(杭州)信息技术有限公司 Model training method and device based on federated learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117010485A (en) * 2023-10-08 2023-11-07 之江实验室 Distributed model training system and gradient protocol method in edge scene
CN117010485B (en) * 2023-10-08 2024-01-26 之江实验室 Distributed model training system and gradient protocol method in edge scene
CN117194992A (en) * 2023-11-01 2023-12-08 支付宝(杭州)信息技术有限公司 Model training and task execution method and device, storage medium and equipment
CN117194992B (en) * 2023-11-01 2024-04-19 支付宝(杭州)信息技术有限公司 Model training and task execution method and device, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN116562399A (en) Model training method and device with end-edge-cloud cooperation
TW202131661A (en) Device and method for network optimization and non-transitory computer-readable medium
CN107528904B (en) Method and apparatus for data distributed anomaly detection
CN113297396A (en) Method, device and equipment for updating model parameters based on federal learning
US20200272933A1 (en) Method and apparatus for mining target feature data
Zeb et al. Edge intelligence in softwarized 6G: Deep learning-enabled network traffic predictions
CN113836318A (en) Dynamic knowledge graph completion method and device and electronic equipment
CN114610475A (en) Training method of intelligent resource arrangement model
Shinkuma et al. Data assessment and prioritization in mobile networks for real-time prediction of spatial information with machine learning
CN109993562B (en) Satisfaction degree simulation method and device and terminal equipment
CN103870510A (en) Social network friend filtering method on basis of distributive parallel processing mode
CN115860153B (en) Wireless flow prediction method and system based on personalized packet federal learning
CN116258923A (en) Image recognition model training method, device, computer equipment and storage medium
CN115392491A (en) Model training method and device based on knowledge distillation and federal learning
CN116629386B (en) Model training method and device
CN115659060A (en) Information recommendation method and system based on dynamic graph neural network
Mukhina et al. Forecasting of the Urban Area State Using Convolutional Neural Networks
CN117495853B (en) Video data processing method, device and storage medium
CN116610868B (en) Sample labeling method, end-edge cloud cooperative training method and device
CN116610873B (en) Information recommendation method and device and storage medium
CN116600020B (en) Protocol generation method, terminal cloud collaborative recommendation method and device
US11736336B2 (en) Real-time monitoring of machine learning models in service orchestration plane
CN116644802A (en) Model training method and device
CN117076093B (en) Storage resource scheduling method and device based on machine learning and storage medium
CN116629351A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination