CN116611536B

CN116611536B - Model training method and device, electronic equipment and storage medium

Info

Publication number: CN116611536B
Application number: CN202310887452.5A
Authority: CN
Inventors: 周希敏; 张冠男; 魏鹏; 孙仁恩
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-07-19
Filing date: 2023-07-19
Publication date: 2023-09-29
Anticipated expiration: 2043-07-19
Also published as: CN116611536A

Abstract

One or more embodiments of the present disclosure provide a model training method and apparatus, an electronic device, and a storage medium. The method combines three parties, namely a terminal, an edge CDN node, and a cloud, to complete training of a target model, thereby improving the training effect of the target model. The method includes: receiving a plurality of gradient data, wherein the gradient data are generated by a terminal according to sample data and the target model and are sent to an edge CDN node; performing aggregation processing on the plurality of gradient data to obtain a first aggregation result; and sending the first aggregation result to a cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data.

Description

Model training method and device, electronic equipment and storage medium

Technical Field

One or more embodiments of the present disclosure relate to the field of model training technologies, and in particular, to a model training method and apparatus, an electronic device, and a storage medium.

Background

With the continuous development of artificial intelligence and big data technologies, more and more automated and intelligent services have emerged. These services give users a good experience and, in particular, an experience tailored to each user. Their quality depends on how well the underlying machine learning models are trained. In the related art, a terminal device collects user data and uploads it to a cloud, which uses the data to train the model and then distributes the model; however, the training effect achieved in this way still needs to be improved.

Disclosure of Invention

In view of this, one or more embodiments of the present disclosure provide a model training method and apparatus, an electronic device, and a storage medium.

In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

According to a first aspect of one or more embodiments of the present specification, there is provided a model training method, the method comprising:

receiving a plurality of gradient data, wherein the gradient data are generated by a terminal according to sample data and a target model and are sent to an edge CDN node;

performing aggregation processing on the plurality of gradient data to obtain a first aggregation result; and

sending the first aggregation result to a cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data.

In one embodiment of the present disclosure, the gradient data is compressed by a terminal before being sent to an edge CDN node;

before the aggregation processing is performed on the plurality of gradient data to obtain a first aggregation result, the method further includes:

decompressing each of the plurality of gradient data.

In one embodiment of the present disclosure, the gradient data are homomorphically encrypted by a terminal before being sent to an edge CDN node;

performing aggregation processing on the plurality of gradient data to obtain the first aggregation result includes:

performing aggregation processing on the plurality of homomorphically encrypted gradient data to obtain the first aggregation result;

sending the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result, includes:

sending the first aggregation result to the cloud, so that the cloud updates the target model according to the decryption result of the first aggregation result.

In one embodiment of the present disclosure, performing aggregation processing on the plurality of gradient data to obtain the first aggregation result includes:

averaging the plurality of gradient data and determining the resulting average as the first aggregation result.

In one embodiment of the present specification, the gradient data include a gradient and a model version;

performing aggregation processing on the plurality of gradient data to obtain the first aggregation result includes:

performing aggregation processing on those of the plurality of gradient data whose model version is the latest version to obtain the first aggregation result;

sending the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data, includes:

sending the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the gradient data whose model version is the latest version.

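In one embodiment of the present specification, performing aggregation processing on the plurality of gradient data to obtain the first aggregation result includes:

in response to the number of the plurality of gradient data reaching a preset number threshold, performing aggregation processing on the plurality of gradient data to obtain the first aggregation result.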

In an embodiment of the present disclosure, sending the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data, includes:

compressing the first aggregation result and then sending it to the cloud, so that the cloud decompresses it, updates the target model according to the decompressed first aggregation result, and sends the updated target model to the terminals that generated the plurality of gradient data.

In one embodiment of the present specification, the gradient data include a gradient and a terminal identifier;

sending the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data, includes:

sending the first aggregation result and an identifier list to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals indicated in the identifier list, wherein the identifier list includes the terminal identifier in each of the plurality of gradient data.

According to a second aspect of one or more embodiments of the present specification, there is provided a model training method, the method comprising:

receiving a first aggregation result, wherein the first aggregation result is obtained by aggregating a plurality of gradient data by an edge CDN node and is sent to a cloud, and the gradient data is generated by a terminal according to sample data and a target model and is sent to the edge CDN node;

updating the target model according to the first aggregation result, and sending the updated target model to the terminals that generated the plurality of gradient data.

In one embodiment of the present specification, the receiving the first aggregation result includes:

receiving a plurality of first aggregation results, wherein the plurality of first aggregation results are generated by at least one edge CDN node and sent to a cloud;

updating the target model according to the first aggregation result and sending the updated target model to the terminals that generated the plurality of gradient data includes:

performing aggregation processing on the plurality of first aggregation results to obtain a second aggregation result; and

updating the target model according to the second aggregation result, and sending the updated target model to the terminals corresponding to the plurality of first aggregation results.

In an embodiment of the present disclosure, the first aggregation result is obtained by an edge CDN node aggregating a plurality of homomorphically encrypted gradient data and is sent to the cloud;

updating the target model according to the second aggregation result includes:

updating the target model according to the homomorphically decrypted second aggregation result.

In one embodiment of the present disclosure, the plurality of first aggregation results are generated by a plurality of edge CDN nodes and sent to the cloud, where the plurality of edge CDN nodes belong to at least one region;

The receiving the first aggregation result further includes:

receiving node identifiers corresponding to each first aggregation result in the plurality of first aggregation results, wherein the node identifiers comprise identifiers of edge CDN nodes generating the first aggregation results;

performing aggregation processing on the plurality of first aggregation results to obtain a second aggregation result includes:

grouping the plurality of first aggregation results according to the node identifier corresponding to each first aggregation result, to obtain the at least one first aggregation result of each region;

performing aggregation processing on the at least one first aggregation result of each region, respectively, to obtain a second aggregation result for each region;

updating the target model according to the second aggregation result and sending the updated target model to the terminals corresponding to the plurality of first aggregation results includes:

for each region, updating the target model corresponding to the region according to the second aggregation result of that region, and sending the updated target model to the terminals corresponding to the at least one first aggregation result of that region.

In one embodiment of the present specification, the receiving the first aggregation result further includes:

receiving an identifier list corresponding to each of the plurality of first aggregation results, wherein the identifier list includes the terminal identifiers in the plurality of gradient data corresponding to that first aggregation result;

sending the updated target model to the terminals corresponding to the plurality of first aggregation results includes:

sending, according to the identifier list corresponding to each of the plurality of first aggregation results, the updated target model to the terminals corresponding to the plurality of first aggregation results.

In an embodiment of the present disclosure, the first aggregation result is obtained by an edge CDN node aggregating those of the plurality of gradient data whose model version is the latest version, and is sent to the cloud;

sending the updated target model to the terminals that generated the plurality of gradient data includes:

sending the updated target model to the terminals that generated the gradient data whose model version is the latest version.

In one embodiment of the present specification, the method further comprises:

sending the original model of the latest version of the target model to the terminals that generated gradient data whose model version is not the latest version.

According to a third aspect of one or more embodiments of the present specification, there is provided a model training apparatus, the apparatus comprising:

the first receiving module is configured to receive a plurality of gradient data, wherein the gradient data are generated by a terminal according to sample data and the target model and are sent to the edge CDN node;

the aggregation module is configured to perform aggregation processing on the plurality of gradient data to obtain a first aggregation result;

the first updating module is configured to send the first aggregation result to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data.

In one embodiment of the present specification, the apparatus further comprises a decompression module configured to:

decompress each of the plurality of gradient data before aggregation processing is performed on the plurality of gradient data to obtain the first aggregation result.

The aggregation module is specifically configured to:

when sending the first aggregation result to the cloud so that the cloud updates the target model according to the first aggregation result, the first updating module is specifically configured to:

In one embodiment of the present specification, the aggregation module is specifically configured to:

the aggregation module is specifically configured to:

the first updating module is specifically configured to:

In one embodiment of the present specification, the first updating module is specifically configured to:

the first updating module is specifically configured to:

According to a fourth aspect of one or more embodiments of the present specification, there is provided a model training apparatus, the apparatus comprising:

The second receiving module is used for receiving a first aggregation result, wherein the first aggregation result is obtained by aggregating a plurality of gradient data by an edge CDN node and is sent to a cloud, and the gradient data is generated by a terminal according to sample data and a target model and is sent to the edge CDN node;

and the second updating module is used for updating the target model according to the first aggregation result and sending the updated target model to a terminal generating the plurality of gradient data.

In one embodiment of the present specification, the second receiving module is specifically configured to:

the second updating module is specifically configured to:

When updating the target model according to the second aggregation result, the second updating module is specifically configured to:

the second receiving module is further configured to:

when performing aggregation processing on the plurality of first aggregation results to obtain a second aggregation result, the second updating module is specifically configured to:

when updating the target model according to the second aggregation result and sending the updated target model to the terminals corresponding to the plurality of first aggregation results, the second updating module is specifically configured to:

In one embodiment of the present specification, the second receiving module is further configured to:

when sending the updated target model to the terminals corresponding to the plurality of first aggregation results, the second updating module is specifically configured to:

when sending the updated target model to the terminals that generated the plurality of gradient data, the second updating module is specifically configured to:

In one embodiment of the present specification, the apparatus further includes a version module for:

According to a fifth aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor implements the method of the first or second aspect by executing the executable instructions.

According to a sixth aspect of one or more embodiments of the present specification, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method according to the first or second aspect.

The technical scheme provided by the embodiment of the specification can comprise the following beneficial effects:

In the model training method provided by the embodiments of this specification, the terminal generates gradient data according to the sample data and the target model and sends the gradient data to the edge CDN node; the edge CDN node performs aggregation processing on the received plurality of gradient data and sends the resulting first aggregation result to the cloud; and the cloud updates the target model according to the received first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data. By combining the terminal, the edge CDN node, and the cloud to train the target model, the method improves the training effect in at least three respects. First, the gradient data are generated on the terminal, so the terminal's sample data never leave the device; this protects user data and prevents leakage of user privacy, while allowing the terminal to use all of its local sample data (including the user's private data) so that the target model can learn comprehensive knowledge. Second, the gradient data are aggregated at the edge CDN node and only the first aggregation result is uploaded, which reduces the volume of data the cloud receives, lowers the cloud's bandwidth cost, and allows the cloud to update the model more frequently. Third, because the target model is updated based on a first aggregation result obtained from the gradient data of many terminals, each terminal effectively shares the data of the other terminals; the target model is thus trained on a comprehensive and diverse data set, which improves its accuracy and robustness.

Drawings

Fig. 1 is a schematic structural diagram of a terminal, an edge CDN node, and a cloud according to an exemplary embodiment.

Fig. 2 is a flowchart of a model training method running on an edge CDN node according to an exemplary embodiment.

Fig. 3 is a flowchart of a model training method running in the cloud according to an exemplary embodiment.

Fig. 4 is a flowchart of a method of updating a target model according to an exemplary embodiment.

Fig. 5 is a schematic diagram of an apparatus according to an exemplary embodiment.

Fig. 6 is a block diagram of a model training apparatus running on an edge CDN node according to an exemplary embodiment.

Fig. 7 is a block diagram of a model training apparatus running in the cloud according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.

It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.

With the continuous development of artificial intelligence and big data technologies, more and more automated and intelligent services have emerged. These services give users a good experience and, in particular, an experience tailored to each user. Their quality depends on how well the underlying machine learning models are trained. In the related art, a terminal device collects user data and uploads it to a cloud, which uses the data to train and distribute the model. With this approach, however, the terminal cannot upload certain private data to the cloud, so the model cannot learn sufficiently comprehensive knowledge, and the training effect needs to be improved.

Based on the above, in a first aspect, at least one embodiment of the present disclosure provides a model training method for training a target model that runs on a terminal, so that the target model can learn knowledge related both to the user of the terminal and to other users, improving the accuracy and pertinence of the services the target model provides. For example, if the target model is a content recommendation model, the method aims to make the recommended content meet the user's needs, e.g., to make the recommended videos ones that the user is interested in and willing to spend time watching.

The method can be implemented by the terminal, the edge CDN nodes, and the cloud shown in Fig. 1. The terminal may be a terminal device that runs the target model, such as a smartphone or a tablet computer. Edge CDN nodes are server nodes distributed across different geographic locations worldwide and are responsible for caching, accelerating, and delivering static and dynamic content. They differ from traditional CDN nodes in that they sit at the edge of the network, closest to the user, which effectively reduces network latency, speeds up loading, and improves user experience. The cloud may be the server side of the service provider. It can be understood that the service range of the server side can be divided into a plurality of regions: the server side maintains a target model for each region and delivers each region's target model to all terminals in that region, and the target model on each terminal is trained and iterated using this method. Each region may have a plurality of edge CDN nodes, and each edge CDN node manages all or some of the terminals in the region, that is, it receives the requests sent by a plurality of terminals in the region.

The terminal generates gradient data according to the sample data and the target model and sends the gradient data to the edge CDN node; the edge CDN node performs aggregation processing on the received plurality of gradient data and sends the resulting first aggregation result to the cloud; and the cloud updates the target model according to the received first aggregation result and sends the updated target model to the terminals that generated the plurality of gradient data. Because the gradient data are generated on the terminal, the terminal's sample data never leave the device, which protects user data and prevents leakage of user privacy, while the terminal can still use all of its local sample data (including the user's private data) so that the target model learns comprehensive knowledge. Because the gradient data are aggregated at the edge CDN node and only the first aggregation result is uploaded, the volume of data the cloud receives and its bandwidth cost are reduced, and the cloud can update the model more frequently. Because the target model is updated based on a first aggregation result obtained from many terminals' gradient data, each terminal effectively shares the data of the other terminals; the target model is thus trained on a comprehensive and diverse data set, which improves its accuracy and robustness.

The model training method is described in detail below from the perspectives of the edge CDN node and the cloud, respectively. The terminal-side processing is relatively simple and is covered within those two descriptions, so it is not described separately.

Referring to fig. 2, a flow of a model training method running at an edge CDN node is schematically shown, and includes steps S201 to S203.

In step S201, a plurality of gradient data are received, wherein the gradient data are generated by a terminal according to sample data and a target model, and are sent to an edge CDN node.

While running the target model, the terminal may collect sample data and label it according to user behavior. For example, after the target model recommends a video, the type of the video may be used as sample data, and the sample may be labeled with the proportion of the video the user watched (that is, the ratio of the watching duration to the total duration of the video). After the terminal device has collected and labeled a batch of sample data, it can input the batch into the xnn on-device intelligent inference engine to obtain the corresponding gradient.
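The labeling and gradient step above can be illustrated with a minimal sketch. It does not model the xnn engine; instead it assumes, purely for illustration, a linear model over one-hot video-type features trained with a squared-error loss. The function names `watch_ratio_label` and `batch_gradient` are hypothetical.

```python
from typing import List, Tuple

def watch_ratio_label(watch_seconds: float, video_seconds: float) -> float:
    # Label = fraction of the video actually watched, clipped to [0, 1].
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    return min(max(watch_seconds / video_seconds, 0.0), 1.0)

def batch_gradient(weights: List[float],
                   batch: List[Tuple[List[float], float]]) -> List[float]:
    # Mean gradient of 0.5 * (w . x - y)^2 over the batch, w.r.t. the weights.
    # This stands in for whatever loss the real target model uses.
    grad = [0.0] * len(weights)
    for features, label in batch:
        pred = sum(w * x for w, x in zip(weights, features))
        err = pred - label
        for i, x in enumerate(features):
            grad[i] += err * x
    return [g / len(batch) for g in grad]
```

For example, two one-hot samples labeled 0.5 and 1.0 against zero weights give the gradient `[-0.25, -0.5]`; only this vector, never the samples, leaves the terminal.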

It can be appreciated that, before sending the gradient data to the edge CDN node, the terminal may apply at least one of the following to the gradient data: compression, homomorphic encryption, and flattening. Compression may use a lossless data compression algorithm such as zstd, reducing the volume of data transmitted and the communication cost between the terminal and the edge CDN node. Homomorphic encryption protects the information in the gradient data during transmission. Flattening strips the gradient data of the structural information of the target model, preventing that structural information from leaking.
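A minimal sketch of flattening plus lossless compression follows, with the matching decompression the edge node would run before aggregation. It uses the standard-library `zlib` as a stand-in for zstd (which Python only offers through third-party packages), and assumes the per-layer gradients are concatenated in sorted-name order, an ordering the cloud would need to know to restore the structure. All function names are hypothetical.

```python
import struct
import zlib
from typing import Dict, List

def flatten(named_grads: Dict[str, List[float]]) -> List[float]:
    # Concatenate per-layer gradients in a fixed, agreed order (sorted layer
    # names here, an assumption) so the payload carries only numbers: no layer
    # names or shapes that would reveal the model structure.
    flat: List[float] = []
    for name in sorted(named_grads):
        flat.extend(named_grads[name])
    return flat

def compress_gradient(flat: List[float]) -> bytes:
    # Terminal side: serialize as little-endian float64, then compress losslessly.
    # zlib stands in for zstd here.
    raw = struct.pack(f"<{len(flat)}d", *flat)
    return zlib.compress(raw, 6)

def decompress_gradient(blob: bytes) -> List[float]:
    # Edge side: undo the compression before aggregation.
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

Because the compression is lossless, the edge node recovers bit-identical float values, so aggregation results are unaffected.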

The edge CDN node may receive gradient data sent by each terminal in its region, so the plurality of gradient data may be generated by one or more terminals in that region.

For example, the gradient data may include a gradient, a terminal identifier, a model version, and so on, where the terminal identifier may be the IP address of the terminal, the ID of the user logged in on the terminal, or the like.

In step S202, aggregation processing is performed on the plurality of gradient data, so as to obtain a first aggregation result.

Since the edge CDN node receives gradient data one after another in step S201, it may perform aggregation processing on the plurality of gradient data in response to their number reaching a preset number threshold, to obtain the first aggregation result. That is, as the edge CDN node receives gradient data reported by terminals in its region, it aggregates the received gradient data each time their number reaches the preset threshold.
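The threshold-triggered behavior can be sketched as a small buffer on the edge node. The class name `ThresholdAggregator`, its `receive` method, and the use of an element-wise mean as the default aggregation are illustrative assumptions.

```python
from typing import Callable, List, Optional

Vector = List[float]

def mean_aggregate(batch: List[Vector]) -> Vector:
    # Element-wise average of equally sized gradient vectors.
    return [sum(col) / len(batch) for col in zip(*batch)]

class ThresholdAggregator:
    """Hypothetical edge-side buffer: aggregate once `threshold` items arrive."""

    def __init__(self, threshold: int,
                 aggregate: Callable[[List[Vector]], Vector] = mean_aggregate):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.aggregate = aggregate
        self._pending: List[Vector] = []

    def receive(self, gradient: Vector) -> Optional[Vector]:
        # Buffer each incoming gradient; return a first aggregation result only
        # when the buffer reaches the threshold, then start a new batch.
        self._pending.append(gradient)
        if len(self._pending) < self.threshold:
            return None
        batch, self._pending = self._pending, []
        return self.aggregate(batch)
```

With a threshold of 3, the first two calls to `receive` return `None` and the third returns the mean of all three gradients; the buffer then starts empty again.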

If the gradient data were compressed by the terminal before being sent to the edge CDN node, each of the plurality of gradient data may be decompressed before this step is performed.

If the gradient data were homomorphically encrypted by the terminal before being sent to the edge CDN node, this step may aggregate the plurality of homomorphically encrypted gradient data to obtain the first aggregation result; that is, the aggregation is performed directly, without decrypting the gradient data.
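The text does not name a homomorphic scheme. As an assumption, the sketch below uses the additively homomorphic Paillier scheme, with toy keys built from two Mersenne primes (far too weak for real use) and a hypothetical fixed-point scale for float gradients. It shows the edge node summing ciphertexts without any key; because Paillier cannot divide a ciphertext by a plaintext, the sketch assumes the cloud divides the decrypted sum by the item count to obtain the average.

```python
import math
import secrets
from typing import List

# Toy Paillier keys from two Mersenne primes. Insecure; illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)   # valid because we use the generator g = N + 1
SCALE = 10**6          # fixed-point scale for float gradients (assumption)

def encrypt(x: float) -> int:
    # Terminal side. Negative values wrap around modulo N.
    m = round(x * SCALE) % N
    r = 0
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N)
    # With g = N + 1, g^m mod N^2 equals 1 + m*N.
    return (1 + m * N) * pow(r, N, N2) % N2

def add_encrypted(ciphertexts: List[int]) -> int:
    # Edge side: multiplying ciphertexts adds the plaintexts; no key needed.
    acc = 1
    for c in ciphertexts:
        acc = acc * c % N2
    return acc

def decrypt(c: int) -> float:
    # Cloud side: standard Paillier decryption, then undo the fixed-point encoding.
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:
        m -= N  # map back to a signed value
    return m / SCALE
```

Each gradient element would be encrypted separately in practice; the scalar version keeps the arithmetic visible. Summing encryptions of 0.5, -0.25 and 0.1 at the edge and decrypting at the cloud yields 0.35, which divided by 3 gives the average.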

If the gradient data include a model version, this step may aggregate only those of the plurality of gradient data whose model version is the latest version to obtain the first aggregation result; that is, gradient data whose model version is not the latest are discarded before aggregation. Gradient data generated by an older version of the model are not suitable for training the latest version of the target model, so determining the first aggregation result from latest-version gradient data improves the accuracy and pertinence of model training.

If the gradient data include a terminal identifier, an identifier list may be generated after this step is performed, where the identifier list includes the terminal identifier in each of the plurality of gradient data. That is, after this step generates the first aggregation result from the plurality of gradient data received in step S201, the terminal identifiers in the gradient data used to generate the first aggregation result are collected into the identifier list. It can be appreciated that if the gradient data include both a model version and a terminal identifier, the identifier list includes only the terminal identifiers from gradient data whose model version is the latest version; in other words, it includes only the terminal identifiers in the gradient data actually used to generate the first aggregation result.

For example, this step may average the plurality of gradient data and determine the resulting average as the first aggregation result.
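The version filtering, averaging, and identifier-list construction described for this step can be combined in one short sketch. The `GradientData` field names and the `first_aggregation` function are hypothetical, and the sketch assumes the edge node already knows the latest model version (for example, from the cloud).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GradientData:
    # Illustrative field names; the text lists a gradient, a terminal
    # identifier, and a model version as possible contents.
    gradient: List[float]
    terminal_id: str
    model_version: int

def first_aggregation(items: List[GradientData],
                      latest_version: int) -> Tuple[List[float], List[str]]:
    """Return (first aggregation result, identifier list)."""
    # Discard gradient data produced by an outdated model version.
    fresh = [d for d in items if d.model_version == latest_version]
    if not fresh:
        raise ValueError("no gradient data for the latest model version")
    # Element-wise average of the remaining gradients.
    result = [sum(col) / len(fresh) for col in zip(*(d.gradient for d in fresh))]
    # Identifier list: only terminals whose data contributed to the result.
    id_list = [d.terminal_id for d in fresh]
    return result, id_list
```

Here a terminal that reported gradients from version 2 while version 3 is current is excluded both from the average and from the identifier list.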

In step S203, the first aggregation result is sent to a cloud end, so that the cloud end updates the target model according to the first aggregation result, and sends the updated target model to a terminal that generates the plurality of gradient data.

In this step, the first aggregation result may be compressed before being sent to the cloud, so that the cloud decompresses it, updates the target model according to the decompressed first aggregation result, and sends the updated target model to the terminals that generated the plurality of gradient data. The compression may use a lossless data compression algorithm such as zstd, reducing the volume of data transmitted and the communication cost between the cloud and the edge CDN nodes.

For example, in this step the first aggregation result may be sent according to a communication protocol between the cloud and the edge CDN node; the cloud then performs data cleaning on the received first aggregation result according to that protocol to restore the first aggregation result.
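The protocol itself is not specified in the text. The sketch below assumes a hypothetical frame layout (a 4-byte header length, a JSON header carrying the node identifier, vector dimension, and identifier list, then a float64 body), compressed with `zlib` as a stand-in for zstd. On the cloud side, "data cleaning" is modeled as decompression, frame splitting, and a length check.

```python
import json
import struct
import zlib
from typing import Any, Dict, List

def encode_message(node_id: str, result: List[float], id_list: List[str]) -> bytes:
    # Hypothetical frame: 4-byte header length | JSON header | float64 body.
    header = json.dumps({"node_id": node_id, "dim": len(result),
                         "id_list": id_list}).encode("utf-8")
    body = struct.pack(f"<{len(result)}d", *result)
    frame = struct.pack("<I", len(header)) + header + body
    return zlib.compress(frame)  # stand-in for zstd

def decode_message(blob: bytes) -> Dict[str, Any]:
    # Cloud side "data cleaning": decompress, split the frame, validate sizes.
    frame = zlib.decompress(blob)
    (header_len,) = struct.unpack_from("<I", frame, 0)
    header = json.loads(frame[4:4 + header_len].decode("utf-8"))
    body = frame[4 + header_len:]
    dim = header["dim"]
    if len(body) != 8 * dim:
        raise ValueError("body length does not match the declared dimension")
    header["result"] = list(struct.unpack(f"<{dim}d", body))
    return header
```

Carrying the node identifier in the header lets the cloud group first aggregation results by region, and the identifier list tells it which terminals should receive the updated model.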

If the gradient data were homomorphically encrypted by the terminal before being sent to the edge CDN node, the first aggregation result may be sent to the cloud without decryption in this step, so that the cloud updates the target model according to the decryption result of the first aggregation result. The cloud first homomorphically decrypts the first aggregation result and then updates the target model using the decrypted result. Because the gradient data remain homomorphically encrypted throughout transmission and aggregation and are decrypted only at the cloud, neither the user data nor the user's privacy is exposed, which protects users' data security and privacy to the greatest extent.

If the gradient data include a terminal identifier, the first aggregation result and the identifier list may be sent to the cloud, so that the cloud updates the target model according to the first aggregation result and sends the updated target model to the terminals indicated in the identifier list, where the identifier list includes the terminal identifier in each of the plurality of gradient data. That is, after updating the target model according to the first aggregation result, the cloud may send the updated target model to the terminal to which each terminal identifier in the identifier list belongs. Because the cloud delivers a target model updated with a terminal's gradient data back to that same terminal, every terminal that contributes to training benefits from the training result and the target model on each terminal learns its own user's behavior; this improves the pertinence of model training and enables more targeted services.

If the gradient data include a model version, this step may send the first aggregation result to the cloud end. The cloud end updates the target model according to the first aggregation result and sends the updated target model to the terminals that generated those of the plurality of gradient data whose model version is the latest version. For example, this step may send the cloud end the first aggregation result together with an identifier list formed from the terminal identifiers in the latest-version gradient data. The cloud end then issues the updated target model according to this identifier list, so that it reaches exactly the terminals that generated latest-version gradient data.

In both cases, the cloud end updates the target model using the gradient data generated by particular terminals and then issues the updated target model to those terminals. The terminals that contribute to model training therefore benefit from its results, and the target model on each terminal can learn the behavior of its own users. Training thus becomes more targeted, and the trained model can provide more targeted services.

In addition, the original model of the latest-version target model can be issued to the terminals that generated gradient data whose model version is not the latest version. These terminals are thereby updated to the latest version without being influenced by data generated by other terminals.
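The version check at the edge CDN node might look like the following sketch. It assumes integer model versions. When the cloud end does not supply the latest version, the sketch treats the newest version seen in the batch as the latest. The aggregation is a plain mean. `VersionedGradient` and `filter_latest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VersionedGradient:
    terminal_id: str
    model_version: int
    gradient: List[float]


def filter_latest(uploads: List[VersionedGradient],
                  latest_version: Optional[int] = None
                  ) -> Tuple[Optional[List[float]], List[str], List[str]]:
    """Aggregate only gradients computed on the latest model version.

    Returns (mean gradient or None, identifier list of latest-version
    terminals, identifier list of terminals on an older version).
    """
    if not uploads:
        raise ValueError("no gradient data received")
    if latest_version is None:
        # Assumption: fall back to the newest version seen in this batch.
        latest_version = max(u.model_version for u in uploads)
    fresh = [u for u in uploads if u.model_version == latest_version]
    stale_ids = sorted({u.terminal_id for u in uploads
                        if u.model_version != latest_version})
    if not fresh:
        return None, [], stale_ids
    dim = len(fresh[0].gradient)
    mean = [sum(u.gradient[i] for u in fresh) / len(fresh) for i in range(dim)]
    fresh_ids = sorted({u.terminal_id for u in fresh})
    return mean, fresh_ids, stale_ids
```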

It can be understood that the cloud end may receive first aggregation results from a plurality of edge CDN nodes and update the target model by combining them. For example, it may update the target model of each region using the first aggregation results sent by the edge CDN nodes in that region. This is described in detail in the method running on the cloud end and is not repeated here.

Fig. 3 shows an exemplary flow of the model training method running on the cloud end, including steps S301 to S303.

In step S301, a first aggregation result is received, where the first aggregation result is obtained by aggregating a plurality of gradient data by an edge CDN node and is sent to a cloud, and the gradient data is generated by a terminal according to sample data and a target model and is sent to the edge CDN node.

Details of how the terminal generates the gradient data and how the edge CDN node generates the first aggregation result are given in the method applied to the edge CDN node and are not repeated here.

It can be understood that this step may receive a plurality of first aggregation results, generated by at least one edge CDN node (i.e., one or more edge CDN nodes) and sent to the cloud end.

For example, when a plurality of first aggregation results are received, this step may also receive a node identifier corresponding to each of them. The node identifier includes the identifier of the edge CDN node that generated that first aggregation result.

For example, when a plurality of first aggregation results are received, this step may also receive an identifier list corresponding to each of them. The identifier list includes the terminal identifiers in the plurality of gradient data from which that first aggregation result was obtained.

In step S302, the target model is updated according to the first aggregation result, and the updated target model is sent to a terminal that generates the plurality of gradient data.

The edge CDN node may obtain the first aggregation result by aggregating only those of the plurality of gradient data whose model version is the latest version before sending it to the cloud end. In that case, this step may send the updated target model to the terminals that generated the latest-version gradient data. For example, it may do so according to the identifier list corresponding to the first aggregation result, where the identifier list includes the terminal identifiers in the latest-version gradient data. In addition, the original model of the latest-version target model may be sent to the terminals that generated gradient data whose model version is not the latest version. These terminals are thereby updated to the latest version without being influenced by data generated by other terminals.
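On the cloud side, the two kinds of terminals could be served as in this sketch. Contributing terminals receive the newly updated model. Terminals that reported a stale version receive the unmodified latest-version model. The `ModelRelease` type, the version-increment rule, and the single gradient-descent step with learning rate `lr` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelRelease:
    version: int
    weights: Tuple[float, ...]


def plan_dispatch(latest: ModelRelease, mean_grad: List[float],
                  fresh_ids: List[str], stale_ids: List[str],
                  lr: float = 0.1) -> Dict[str, ModelRelease]:
    """Decide which model each terminal receives after an update round."""
    updated = ModelRelease(
        version=latest.version + 1,  # assumed versioning rule
        weights=tuple(w - lr * g for w, g in zip(latest.weights, mean_grad)),
    )
    # Stale terminals only catch up to the unmodified latest model, so they
    # are not influenced by gradients generated by other terminals.
    plan: Dict[str, ModelRelease] = {tid: latest for tid in stale_ids}
    # Contributing terminals receive the model trained on their gradients.
    plan.update({tid: updated for tid in fresh_ids})
    return plan
```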

Where step S301 receives a plurality of first aggregation results, this step may be performed in the manner shown in fig. 4, including sub-steps S3021 to S3022.

In sub-step S3021, aggregation processing is performed on the plurality of first aggregation results to obtain a second aggregation result.

When the first aggregation results are obtained by edge CDN nodes aggregating homomorphically encrypted gradient data and are then sent to the cloud end, the plurality of first aggregation results can be aggregated directly, without homomorphic decryption.

Suppose the first aggregation results are generated by a plurality of edge CDN nodes belonging to at least one region (i.e., the edge CDN nodes belong to the same region or to different regions) and sent to the cloud end. This sub-step may first group the first aggregation results by region according to the node identifier corresponding to each of them, obtaining at least one first aggregation result for each region. It then aggregates the first aggregation result(s) of each region separately, obtaining a second aggregation result for each region. That is, if all the edge CDN nodes that sent the first aggregation results belong to the same region, all of the first aggregation results are aggregated together. If they belong to different regions, the first aggregation results of each region involved are aggregated separately.
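Sub-step S3021 with regional grouping can be sketched as follows. The `node_region` mapping from edge CDN node identifier to region is hypothetical. Each first aggregation result is weighted by the number of gradient uploads behind it, so that the regional result equals the mean over all uploads in the region. That weighting is an assumption; the specification does not prescribe it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FirstResult:
    node_id: str                # edge CDN node that produced this result
    count: int                  # number of gradient uploads it aggregates
    mean_gradient: List[float]  # mean of those uploads


def second_aggregation_by_region(results: List[FirstResult],
                                 node_region: Dict[str, str]) -> Dict[str, List[float]]:
    """Group first aggregation results by region and merge each group."""
    groups: Dict[str, List[FirstResult]] = defaultdict(list)
    for r in results:
        groups[node_region[r.node_id]].append(r)  # node identifier -> region

    second: Dict[str, List[float]] = {}
    for region, rs in groups.items():
        total = sum(r.count for r in rs)
        dim = len(rs[0].mean_gradient)
        # Count-weighted mean equals the mean over every upload in the region.
        second[region] = [sum(r.count * r.mean_gradient[i] for r in rs) / total
                          for i in range(dim)]
    return second
```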

In sub-step S3022, the target model is updated according to the second aggregation result, and the updated target model is sent to the terminals corresponding to the plurality of first aggregation results.

When the first aggregation results are obtained by edge CDN nodes aggregating homomorphically encrypted gradient data and are then sent to the cloud end, this sub-step may update the target model according to the homomorphically decrypted second aggregation result. The gradient data stay homomorphically encrypted during transmission and aggregation and are decrypted only at this point. As a result, user data and user privacy are not exposed, which protects users' data security and privacy to the greatest extent.

For example, this sub-step may send the updated target model to the terminals corresponding to the plurality of first aggregation results according to the identifier list corresponding to each first aggregation result. That is, the updated target model is sent to the terminal to which each terminal identifier in each of those identifier lists belongs.

Suppose the first aggregation results are generated by a plurality of edge CDN nodes belonging to at least one region (i.e., the same region or different regions) and sent to the cloud end. For each region, this sub-step may update the target model corresponding to that region according to the region's second aggregation result. It then sends the updated target model to the terminals corresponding to the region's first aggregation result(s), for example according to the identifier list corresponding to each of those first aggregation results. In this example, a region's target model is trained with the first aggregation results reported by the edge CDN nodes in that region, that is, with the gradient data reported by terminals in that region. This further improves the pertinence of model training. Put simply, sample data generated by terminals in a region train that region's target model via the edge CDN nodes and the cloud end, and the trained model is issued back to the terminals in that region. The target models of different regions therefore differ, and each is tailored to its own region.
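The per-region update and dispatch of sub-step S3022 might be organized as below. Each region keeps its own weight vector and receives one gradient-descent step from its second aggregation result, with a learning rate `lr` that is an assumption. Every terminal named in that region's identifier lists is then mapped to the region whose updated model it should receive. Regions without new results keep their current model. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def update_regions(region_models: Dict[str, List[float]],
                   second_results: Dict[str, List[float]],
                   region_id_lists: Dict[str, List[List[str]]],
                   lr: float = 0.1) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    """Update each region's target model and decide which terminals get it.

    Returns (new model per region, terminal identifier -> region whose
    updated model that terminal should receive).
    """
    new_models = {region: list(w) for region, w in region_models.items()}
    dispatch: Dict[str, str] = {}
    for region, grad in second_results.items():
        weights = new_models[region]
        new_models[region] = [w - lr * g for w, g in zip(weights, grad)]
        # One identifier list per first aggregation result of this region.
        for id_list in region_id_lists.get(region, []):
            for terminal_id in id_list:
                dispatch[terminal_id] = region
    return new_models, dispatch
```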

Further details of the model training method running on the cloud end are given in the description of the method running on the edge CDN node and are not repeated here.

Fig. 5 is a schematic block diagram of a device according to an exemplary embodiment. Referring to fig. 5, at the hardware level the device includes a processor 502, an internal bus 504, a network interface 506, a memory 508, and a non-volatile storage 510. It may of course also include hardware required by other services. One or more embodiments of this specification may be implemented in software, for example by the processor 502 reading the corresponding computer program from the non-volatile storage 510 into the memory 508 and running it. Besides a software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or a logic device.

Referring to fig. 6, the model training apparatus may be applied to the device shown in fig. 5 to implement the technical solution of this specification. The apparatus comprises:

The first receiving module 601 is configured to receive a plurality of gradient data, where the gradient data is generated by a terminal according to sample data and a target model, and sent to an edge CDN node;

the aggregation module 602 is configured to aggregate the plurality of gradient data to obtain a first aggregation result;

the first updating module 603 is configured to send the first aggregation result to a cloud end, so that the cloud end updates the target model according to the first aggregation result, and sends the updated target model to a terminal that generates the plurality of gradient data.
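The three modules of fig. 6 can be pictured as methods on one object, as in this minimal sketch. Several details are assumptions made for illustration: the buffering policy (forward once `batch_size` uploads have arrived), the mean as the aggregation, and the `send_to_cloud` callback standing in for the network link to the cloud end. The docstrings map each method to its module by reference numeral.

```python
from typing import Callable, List, Sequence


class EdgeTrainingApparatus:
    """Hypothetical edge-side apparatus combining modules 601 to 603."""

    def __init__(self, send_to_cloud: Callable[[List[float]], None],
                 batch_size: int = 3):
        self._send = send_to_cloud
        self._batch_size = batch_size
        self._buffer: List[List[float]] = []

    def receive(self, gradient: Sequence[float]) -> None:
        """First receiving module (601): buffer one terminal's gradient."""
        self._buffer.append(list(gradient))
        if len(self._buffer) >= self._batch_size:
            self.update()

    def aggregate(self) -> List[float]:
        """Aggregation module (602): element-wise mean of the buffer."""
        n = len(self._buffer)
        return [sum(col) / n for col in zip(*self._buffer)]

    def update(self) -> None:
        """First updating module (603): forward the first aggregation result."""
        if self._buffer:
            self._send(self.aggregate())
            self._buffer.clear()
```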

the apparatus further comprises a decompression module for:

the aggregation module is specifically used for:

the first updating module is specifically configured to:

Referring to fig. 7, the model training apparatus may be applied to the device shown in fig. 5 to implement the technical solution of this specification. The apparatus comprises:

The second receiving module 701 is configured to receive a first aggregation result, where the first aggregation result is obtained by aggregating a plurality of gradient data by an edge CDN node and is sent to a cloud, and the gradient data is generated by a terminal according to sample data and a target model and is sent to the edge CDN node;

and the second updating module 702 is configured to update the target model according to the first aggregation result, and send the updated target model to a terminal that generates the plurality of gradient data.

the second updating module is specifically configured to:

the second receiving module is further configured to:

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:

- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technology;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements thus includes not only those elements but also other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit them. As used in one or more embodiments of this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

The user information (including but not limited to user device information and user personal information) and data (including but not limited to data for analysis, stored data, and presented data) involved in this application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. Corresponding operation entries are provided so that users can grant or refuse authorization.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."

The foregoing describes merely preferred embodiments of one or more embodiments of this specification and is not intended to limit them.

Claims

1. A method of model training, the method comprising:

and sending the first aggregation results and the node identifiers to a cloud end, so that the cloud end, after receiving a plurality of first aggregation results generated and sent by a plurality of edge CDN nodes belonging to at least one region, groups the plurality of first aggregation results according to the node identifier corresponding to each first aggregation result to obtain at least one first aggregation result of each region, performs aggregation processing on the at least one first aggregation result of each region to obtain a second aggregation result of each region, updates a target model corresponding to the region according to the second aggregation result of the region, and sends the updated target model to the terminals corresponding to the at least one first aggregation result of the region, wherein the node identifier comprises the identifier of the edge CDN node that generated the first aggregation result, and the terminals corresponding to a first aggregation result comprise the terminals that generated the plurality of gradient data corresponding to that first aggregation result.

2. The model training method according to claim 1, wherein the gradient data is compressed by a terminal before being sent to an edge CDN node;

3. The model training method according to claim 1, wherein the gradient data is homomorphic encrypted by a terminal before being sent to a CDN node;

4. The model training method according to claim 1, wherein the aggregating the plurality of gradient data to obtain a first aggregate result includes:

5. The model training method according to claim 1, wherein the gradient data comprise a gradient and a model version;

6. The model training method according to claim 1, wherein the aggregating the plurality of gradient data to obtain a first aggregate result includes:

7. The model training method of claim 1, wherein the sending the first aggregation result to a cloud end, so that the cloud end updates the target model according to the first aggregation result, and sends the updated target model to a terminal that generates the plurality of gradient data, includes:

8. The model training method according to claim 1, wherein the gradient data comprise a gradient and a terminal identifier;

9. A method of model training, the method comprising:

receiving a plurality of first aggregation results generated and sent by at least one edge CDN node, wherein the first aggregation results are obtained by aggregating a plurality of gradient data by the edge CDN node and are sent to a cloud, and the gradient data are generated by a terminal according to sample data and a target model and are sent to the edge CDN node;

performing aggregation processing on the plurality of first aggregation results to obtain a second aggregation result, updating the target model according to the second aggregation result, and sending the updated target model to the terminals corresponding to the plurality of first aggregation results, wherein the terminals corresponding to a first aggregation result comprise the terminals that generated the plurality of gradient data corresponding to that first aggregation result;

and under the condition that the plurality of first aggregation results are generated by a plurality of edge CDN nodes and sent to the cloud end, and the plurality of edge CDN nodes belong to at least one region:

the method further comprises the steps of:

10. The model training method according to claim 9, wherein the first aggregation result is obtained by aggregating a plurality of homomorphic encrypted gradient data by an edge CDN node and is sent to a cloud;

11. The model training method according to claim 9, wherein the receiving of the first aggregation results further comprises:

12. The model training method according to claim 9, wherein the first aggregation result is obtained by the edge CDN node aggregating those of the plurality of gradient data whose model version is the latest version and is sent to the cloud end;

the terminal corresponding to the first aggregation result comprises: a terminal that generated gradient data whose model version is the latest version among the plurality of gradient data corresponding to the first aggregation result.

13. The model training method of claim 12, the method further comprising:

sending the original model of the latest-version target model to the terminals that generated gradient data whose model version is not the latest version among the plurality of gradient data corresponding to the first aggregation result.

14. A model training apparatus, the apparatus comprising:

the first updating module is configured to send the first aggregation result and the node identifier to a cloud, so that the cloud receives a plurality of first aggregation results generated and sent by a plurality of edge CDN nodes belonging to at least one region, and then groups the plurality of first aggregation results according to the node identifier corresponding to each first aggregation result in the plurality of first aggregation results, so as to obtain at least one first aggregation result of each region, respectively perform aggregation processing on the at least one first aggregation result of each region, obtain a second aggregation result of each region, update the target model according to the second aggregation result of the region, and send the updated target model to a terminal corresponding to the at least one first aggregation result of the region, where the node identifier includes an identifier of the edge CDN node generating the first aggregation result, and the terminal corresponding to the first aggregation result includes a terminal generating a plurality of gradient data corresponding to the first aggregation result.

15. A model training apparatus, the apparatus comprising:

the second receiving module is used for receiving a plurality of first aggregation results generated and sent by at least one edge CDN node, wherein the first aggregation results are obtained by carrying out aggregation processing on a plurality of gradient data by the edge CDN node and are sent to a cloud, and the gradient data are generated by a terminal according to sample data and a target model and are sent to the edge CDN node;

the second updating module is configured to perform aggregation processing on the plurality of first aggregation results to obtain a second aggregation result, update the target model according to the second aggregation result, and send the updated target model to the terminals corresponding to the plurality of first aggregation results, wherein the terminals corresponding to a first aggregation result comprise the terminals that generated the plurality of gradient data corresponding to that first aggregation result;

the second receiving module is further configured to:

16. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to implement the method of any one of claims 1-13 by executing the executable instructions.

17. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-13.