CN115617827B - Service model joint updating method and system based on parameter compression - Google Patents

Service model joint updating method and system based on parameter compression

Info

Publication number
CN115617827B
CN115617827B (application number CN202211461638.6A)
Authority
CN
China
Prior art keywords
dimension
parameter
parameter vector
local
participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211461638.6A
Other languages
Chinese (zh)
Other versions
CN115617827A (en)
Inventor
周俊
朱海洋
陈为
陈晓丰
季永炜
谈旭炜
潘奇豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Products Zhongda Digital Technology Co ltd
Zhejiang University ZJU
Original Assignee
Products Zhongda Digital Technology Co ltd
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Products Zhongda Digital Technology Co ltd, Zhejiang University ZJU filed Critical Products Zhongda Digital Technology Co ltd
Priority to CN202211461638.6A priority Critical patent/CN115617827B/en
Publication of CN115617827A publication Critical patent/CN115617827A/en
Application granted granted Critical
Publication of CN115617827B publication Critical patent/CN115617827B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Abstract

In this service model updating method, when the server determines that the current iteration round is a preset target round, it judges whether the model updating process has entered a critical stage, based on the learning rates of two adjacent rounds or on the magnitude of the parameter change between them. If the critical stage has not been entered, the server sends each participant k pieces of indication information corresponding to the k dimensions. Each piece of indication information is determined by the server from the convergence status of the corresponding dimension across the n local parameter vectors received from the n participants, and indicates whether that dimension is to be compressed in the current and subsequent iteration rounds. Each participant accordingly compresses, in the current and subsequent iterations, the dimensions of its local parameter vector that have converged or are close to convergence, and provides the resulting target parameter vector, with its reduced data volume, to the server.

Description

Service model joint updating method and system based on parameter compression
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning, and in particular, to a method and a system for jointly updating a business model based on parameter compression.
Background
Large commodity supply chain integrated service enterprise groups operate across diverse industries at large scale, covering the whole process of integrated services such as purchasing, manufacturing, distribution, warehousing, logistics, delivery and financing, and therefore deposit and accumulate massive business and financial data. To fully mine the value of this data and empower business operations, the business model is updated with a distributed learning method based on these massive business and financial data. In this setting, multi-party joint model updating is the most widely used approach: in each iteration, the gradients or model parameters (collectively referred to as parameters) of the parties must be combined. However, frequently transferring such a large volume of parameters degrades the efficiency of model updating. How to update the model efficiently in a large-scale cluster has therefore become a key concern. To alleviate the communication bottleneck, some solutions propose to increase the sample batch size, i.e. each participant computes its parameters on a large batch of samples, thereby reducing the communication frequency. In the present scheme, the critical stage of the model updating process is identified, which ensures model accuracy while greatly reducing communication traffic.
Disclosure of Invention
One or more embodiments of the present specification describe a method and a system for jointly updating a service model based on parameter compression, which can greatly reduce communication traffic while ensuring model accuracy.
In a first aspect, a service model joint updating method based on parameter compression is provided, which includes:
each participant i determines a local parameter vector with k dimensions according to its local sample set and the local model parameters of the business model, and provides the local parameter vector to the server;
when the current round t equals a preset target round, the server judges a first condition and performs dimension convergence detection in parallel;
the first condition includes: the ratio of the round-(t+1) learning rate to the round-(t-1) learning rate is smaller than a learning rate threshold, or the target distance between the round-t aggregated parameter vector and the round-(t-1) aggregated parameter vector is not smaller than a distance threshold; the round-t aggregated parameter vector is obtained by aggregating the n local parameter vectors sent by the n participants;
the dimension convergence detection includes: for each dimension j, computing the mean and the variance of the n element values of the n local parameter vectors in that dimension, and determining from the computed mean and variance a signal-to-noise ratio for dimension j, which indicates the convergence status of that dimension; and comparing the signal-to-noise ratio of each dimension j with a ratio threshold to obtain corresponding indication information, which indicates whether dimension j is to be compressed in the current iteration and several subsequent iterations;
when the first condition is not satisfied, the server sends the k pieces of indication information corresponding to the k dimensions to each participant i;
each participant i compresses or leaves uncompressed each dimension j of its local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and provides the target parameter vector to the server;
and the server obtains a first update parameter of the service model based on the n target parameter vectors sent by the n participants, and sends the first update parameter to each participant i, so that each participant i updates its local model parameters based on the first update parameter for the next iteration.
In a second aspect, a system for jointly updating a business model based on parameter compression is provided, which includes:
each participant i is configured to determine a local parameter vector with k dimensions according to its local sample set and the local model parameters of the business model, and to provide the local parameter vector to the server;
the server is configured to judge a first condition and perform dimension convergence detection in parallel when the current round t equals a preset target round;
the first condition includes: the ratio of the round-(t+1) learning rate to the round-(t-1) learning rate is smaller than a learning rate threshold, or the target distance between the round-t aggregated parameter vector and the round-(t-1) aggregated parameter vector is not smaller than a distance threshold; the round-t aggregated parameter vector is obtained by aggregating the n local parameter vectors sent by the n participants;
the dimension convergence detection includes: for each dimension j, computing the mean and the variance of the n element values of the n local parameter vectors in that dimension, and determining from the computed mean and variance a signal-to-noise ratio for dimension j, which indicates the convergence status of that dimension; and comparing the signal-to-noise ratio of each dimension j with a ratio threshold to obtain corresponding indication information, which indicates whether dimension j is to be compressed in the current iteration and several subsequent iterations;
the server is further configured to send the k pieces of indication information corresponding to the k dimensions to each participant i if the first condition is not satisfied;
each participant i is further configured to compress or leave uncompressed each dimension j of its local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and to provide the target parameter vector to the server;
the server is further configured to obtain a first update parameter of the service model based on the n target parameter vectors sent by the n participants, and to send the first update parameter to each participant i, so that each participant i updates its local model parameters based on the first update parameter for the next iteration.
In the service model joint updating method and system based on parameter compression provided by one or more embodiments of the present specification, when the server determines that the current iteration round is a preset target round, it judges whether the model updating process has entered a critical stage based on the learning rates or the parameter change magnitude of two adjacent rounds. If the critical stage has not been entered, the server sends each participant k pieces of indication information corresponding to the k dimensions, where each piece of indication information is determined by the server from the convergence status of the corresponding dimension across the n local parameter vectors received from the n participants, and indicates whether that dimension is to be compressed in the current and subsequent iteration rounds. Each participant accordingly compresses, in the current and subsequent iterations, the dimensions of its local parameter vector that have converged or are close to convergence, and provides the resulting target parameter vector, with its reduced data volume, to the server. The communication traffic can thus be greatly reduced while model accuracy is ensured.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates one of the schematic diagrams of a business model joint update system based on parameter compression according to one embodiment;
FIG. 2 illustrates an interaction diagram of a business model joint update method based on parameter compression according to one embodiment;
FIG. 3 illustrates an interaction diagram of a merchandise recommendation model joint update method based on parameter compression according to one embodiment;
FIG. 4 shows a second schematic diagram of a business model joint update system based on parameter compression according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 shows one of the schematic diagrams of a business model joint update system based on parameter compression according to one embodiment. In FIG. 1, the system includes a server and n participants, where n is a positive integer. Each participant may be implemented as any device, platform, server or device cluster with computing and processing capabilities. The server includes a key stage identification device, a convergence detection device, a compression device and an updating device.
In fig. 1, each participant i of the participants 1 to n may determine a corresponding local parameter vector having k dimensions according to its local sample set and the local model parameters of the business model, and provide it to the server. Wherein k is a positive integer. The business model herein is used to predict the classification or regression values of business objects. The business object may be, for example, a picture, a text, a user or a commodity. Wherein i is a positive integer, and i is more than or equal to 1 and less than or equal to n.
After receiving the n local parameter vectors sent by the n participants, the server may first judge whether the current iteration round is a preset target round; if so, it judges the first condition through its key stage identification device and performs dimension convergence detection through its convergence detection device. It should be understood that the first-condition judgment and the dimension convergence detection may be performed in parallel.
The judgment of the first condition serves to identify the critical stage of the model updating process (the model parameter updates in this stage have a large influence on the convergence of the model parameters; this stage is also commonly called the core link of model updating). The first condition includes: the ratio of the round-(t+1) learning rate to the round-(t-1) learning rate is smaller than a learning rate threshold, or the target distance between the round-t aggregated parameter vector and the round-(t-1) aggregated parameter vector is not smaller than a distance threshold. The round-t aggregated parameter vector is obtained by aggregating the n local parameter vectors sent by the n participants. That is, the scheme identifies whether the model updating process has entered a critical stage based on the learning rates of two adjacent rounds or the magnitude of the parameter change.
The above dimension convergence detection may specifically include: for each dimension j, computing the mean and the variance of the n element values of the n local parameter vectors in that dimension, and determining from the computed mean and variance a signal-to-noise ratio for dimension j, which indicates the convergence status of that dimension. Here j is a positive integer, and 1 ≤ j ≤ k.
The above dimension convergence detection further includes: based on the signal-to-noise ratio determined for each dimension j, corresponding indication information is obtained, thus obtaining k indication information corresponding to k dimensions.
Taking a first dimension, i.e. any one of the k dimensions, as an example: if the signal-to-noise ratio of the first dimension is smaller than the ratio threshold, the indication information of the first dimension is determined to be a compression indication. If the signal-to-noise ratio of the first dimension is not smaller than the ratio threshold, the indication information of the first dimension is determined to be a non-compression indication.
When the first condition is not met, that is, when the model updating process has not entered a critical stage, the compression device sends the k pieces of indication information to each participant i.
Each participant i then compresses or leaves uncompressed each dimension j of its local parameter vector according to the received k pieces of indication information, obtains a target parameter vector, and provides it to the server.
Through its updating device, the server obtains a first update parameter of the service model based on the n target parameter vectors sent by the n participants, and sends the first update parameter to each participant i, so that each participant i updates its local model parameters based on the first update parameter for the next iteration.
It should be understood that after entering the next iteration, each participant i, having computed its local parameter vector, again compresses or leaves uncompressed each dimension j according to the received k pieces of indication information, until a compression end condition is reached. The compression end condition includes receiving k updated pieces of indication information, receiving an instruction to stop compression, satisfying an iteration end condition, or the like.
It should be noted that when the server identifies, through the key stage identification device, that the critical stage has not been entered, it sends the k pieces of indication information corresponding to the k dimensions to each participant, instructing each participant to compress the corresponding dimensions in the current and subsequent iteration rounds. Because the amount of parameter data after compression is smaller than before compression, the scheme reduces the data volume transmitted between the participants and the server. Moreover, since the compression is performed only in non-critical stages, model accuracy is not affected. That is to say, the scheme can greatly reduce the communication traffic while ensuring model accuracy.
FIG. 2 illustrates an interaction diagram of a business model joint update method based on parameter compression according to one embodiment. Note that the method involves multiple iteration rounds; FIG. 2 shows the interaction steps of the t-th round (t is a positive integer). Since all participants of the t-th round interact with the server in a similar way, FIG. 2 mainly shows the interaction between the server and an arbitrary participant of that round (called the first participant for convenience of description); the interaction steps of the other participants with the server can be inferred from those of the first participant. As shown in FIG. 2, the method may include the following steps:
In step 202, each participant i determines a local parameter vector with k dimensions according to its local sample set and the local model parameters of the business model, and provides the local parameter vector to the server.
Taking any participant as an example, the business object corresponding to the sample in the local sample set maintained by the participant may include any one of the following: pictures, text, users, and merchandise, etc.
In addition, the business model may be a classification model or a regression model for predicting a classification or regression value of the business object. In one embodiment, the business model may be implemented based on a decision tree algorithm, a Bayesian algorithm, etc., and in another embodiment, the business model may be implemented based on a neural network.
It should be noted that when the t-th iteration is the first iteration, the local model parameters may be obtained as follows: before the iterations start, the server initializes the service model and issues or provides the initialized model parameters to each participant, and each participant uses them as its local model parameters. Alternatively, in practical applications each participant may first agree on the structure of the model (for example, which kind of model is used, the number of layers, the number of neurons per layer, and so on) and then perform the same initialization to obtain its local model parameters.
When the t-th iteration is not the first iteration, the local model parameters are those updated in the (t-1)-th iteration.
Finally, the local parameter vector may be a gradient vector or a model parameter vector, and its determination may follow the prior art. Taking the gradient vector as an example, it may be determined as follows: a prediction result is determined from the local sample set and the local model parameters; a prediction loss is then determined from the prediction result and the sample labels; finally, the gradient vector of the local model parameters is determined from the prediction loss by back propagation.
It should be understood that, in practical applications, the above determination of the local parameter vector may itself involve multiple inner iterations, whose number may be preset.
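As a concrete illustration of the gradient-vector computation described above, the following minimal sketch uses a linear model with a mean-squared-error loss; the model, the loss function and all variable names are illustrative assumptions, not prescribed by the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 6))              # local sample set: 32 samples, 6 features
y = X @ np.arange(1.0, 7.0) + rng.normal(scale=0.1, size=32)   # sample labels
w = np.zeros(6)                           # local model parameters (k = 6 dimensions)

# forward pass: prediction result from the local samples and local parameters
pred = X @ w
# prediction loss (mean squared error between prediction result and sample labels)
loss = np.mean((pred - y) ** 2)
# back propagation: gradient of the loss w.r.t. the local model parameters,
# i.e. the k-dimensional local parameter vector sent to the server
grad = 2.0 / len(y) * X.T @ (pred - y)
```

The gradient vector `grad` is what participant i would provide to the server in step 202.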
In step 204, the server, upon determining that t equals a preset target round, judges the first condition and performs dimension convergence detection in parallel.
It should be understood that t is the number of iterations.
In practical applications, there may be multiple preset target rounds. In that case, t is compared with each preset target round in turn to determine whether t equals one of them.
The judgment of the first condition can also be understood as identifying the critical stage of the model updating process. The first condition may include: the ratio of the round-(t+1) learning rate to the round-(t-1) learning rate is smaller than a learning rate threshold, or the target distance between the round-t aggregated parameter vector and the round-(t-1) aggregated parameter vector is not smaller than a distance threshold.
In machine learning, gradient descent is a widely used parameter optimization algorithm for minimizing model error. It estimates the model parameters over multiple iterations, reducing the loss function in each iteration. A uniform learning rate is usually given to control the learning progress of the model during the iterations.
Generally speaking, the learning rate is larger in the early iterations, so the step size is longer and gradient descent proceeds faster. In the later iterations, the learning rate is gradually reduced to shorten the learning step, which facilitates model convergence and makes it easier to approach the optimal solution.
Therefore, the scheme identifies the critical stage based on the change in the learning rate between adjacent rounds. Specifically, when the ratio of the round-(t+1) learning rate to the round-(t-1) learning rate is smaller than the learning rate threshold, the first condition is judged to be satisfied, i.e., the critical stage of the model updating process has been entered. Given the learning-rate schedule described above, this means the scheme treats the later iterations as the critical stage of the model updating process.
In addition, the scheme also identifies the critical stage based on the magnitude of the parameter change between two adjacent rounds. Specifically, when the target distance between the round-t aggregated parameter vector and the round-(t-1) aggregated parameter vector is not smaller than the distance threshold, the first condition is judged to be satisfied, i.e., the critical stage of the model updating process has been entered. That is, iterations with a large parameter change are treated as the critical stage of the model updating process.
The t-th round aggregation parameter vector is obtained by aggregating n local parameter vectors sent by n participants. In one example, the t-th round aggregated parameter vector is obtained by summing n local parameter vectors.
The round-(t-1) aggregated parameter vector is computed analogously to the round-t one: it is obtained by aggregating the n local parameter vectors sent by the n participants in round t-1.
In one example, the target distance is obtained by computing the second-order (L2) norm distance between the round-t and round-(t-1) aggregated parameter vectors and the second-order norm of the round-(t-1) aggregated parameter vector, and then taking the ratio of the two. The specific calculation formula may be as follows:

d = ||grad_t - grad_{t-1}||_2 / ||grad_{t-1}||_2    (formula 1)

where d is the target distance, grad_t is the round-t aggregated parameter vector, and grad_{t-1} is the round-(t-1) aggregated parameter vector.
In summary, when the ratio of the learning rates of two adjacent rounds is smaller than the learning rate threshold, or the target distance between the aggregated parameter vectors of two adjacent rounds is not smaller than the distance threshold, the first condition is judged to be satisfied, i.e., the critical stage of the model updating process has been entered; otherwise the first condition is judged not to be satisfied.
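The two branches of the first condition can be sketched as follows. The function name and the concrete threshold values are illustrative assumptions; the relative distance follows formula 1.

```python
import numpy as np

def first_condition(lr_next, lr_prev, grad_t, grad_prev,
                    lr_threshold=0.5, dist_threshold=0.1):
    """True when the critical stage has been entered: the learning rate decays
    sharply between adjacent rounds, or the aggregated parameter vector still
    changes by a large relative amount (formula 1)."""
    lr_ratio = lr_next / lr_prev
    # formula 1: relative L2 distance between adjacent aggregated vectors
    d = np.linalg.norm(grad_t - grad_prev) / np.linalg.norm(grad_prev)
    return lr_ratio < lr_threshold or d >= dist_threshold

grad_prev = np.array([1.0, 2.0, 3.0, 4.0])            # round-(t-1) aggregated vector
small_step = grad_prev + np.array([0.01, -0.01, 0.0, 0.02])
big_step = grad_prev + np.array([1.0, -1.0, 0.5, 0.5])

stable = first_condition(0.09, 0.10, small_step, grad_prev)  # small change: not critical
moving = first_condition(0.09, 0.10, big_step, grad_prev)    # large change: critical
```

With a nearly unchanged aggregated vector the condition is not met, so compression indications would be sent; with a large parameter change it is met and compression is withheld.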
The dimension convergence detection in step 204 may include: for each dimension j, computing the mean and the variance of the n element values of the n local parameter vectors in that dimension, and determining from the computed mean and variance a signal-to-noise ratio for dimension j, which indicates the convergence status of that dimension.
That is, for each of the k dimensions, the n element values of the n local parameter vectors in that dimension are averaged and their variance is computed, giving k mean values and k variance values corresponding to the k dimensions. The ratio of the mean value to the variance value of each dimension j is then computed, giving the signal-to-noise ratio of each dimension j. The k signal-to-noise ratios corresponding to the k dimensions may form a signal-to-noise-ratio vector.
Then, the signal-to-noise ratio of each dimension j is compared with the corresponding ratio threshold. If it is smaller than the ratio threshold, the dimension has entered a stable stage and is close to convergence, so the indication information of that dimension is determined to be a compression indication. If it is not smaller than the ratio threshold, the dimension is still changing dynamically and has not converged, so the indication information of that dimension is determined to be a non-compression indication.
After the indication information of each dimension j has been determined, k pieces of indication information corresponding to the k dimensions are obtained, where each piece indicates whether the corresponding dimension is to be compressed in the current and subsequent iteration rounds. In other words, in the current and subsequent iterations, each participant i decides, according to the k pieces of indication information, whether or not to compress each dimension of its local parameter vector.
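The convergence detection just described can be sketched numerically. Taking the absolute value of the mean in the signal-to-noise ratio and the concrete ratio threshold value are illustrative assumptions:

```python
import numpy as np

# n = 3 local parameter vectors with k = 4 dimensions. In dimension 0 the
# gradients average out to ~0 (close to convergence); in the other dimensions
# the participants still agree on a large, consistent gradient direction.
local_vectors = np.array([
    [ 0.01, 0.40, -0.30, 2.0],
    [-0.02, 0.45, -0.20, 2.1],
    [ 0.01, 0.35, -0.40, 1.9],
])

mean = local_vectors.mean(axis=0)      # k mean values
var = local_vectors.var(axis=0)        # k variance values
snr = np.abs(mean) / (var + 1e-12)     # k signal-to-noise ratios (assumed |mean|/var)

RATIO_THRESHOLD = 30.0                 # illustrative ratio threshold
# True = compression indication (stable, close to convergence),
# False = non-compression indication (still changing dynamically).
indications = snr < RATIO_THRESHOLD
```

Only dimension 0, whose mean gradient is near zero relative to its spread, is marked for compression; the k boolean values are the k pieces of indication information sent to each participant.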
In step 206, the server sends k pieces of indication information corresponding to k dimensions to each participant i in case the first condition is not satisfied.
In other words, when the server identifies that the model updating process has not entered the critical stage, it sends the k pieces of indication information corresponding to the k dimensions to each participant i, so that each participant i compresses or leaves uncompressed each dimension j of its local parameter vector accordingly. When the model updating process has entered the critical stage, the k pieces of indication information are not sent. That is, the result of the dimension convergence detection (the k pieces of indication information) is used only when the first condition is not satisfied.
In step 208, each participant i compresses or leaves uncompressed each dimension j of its local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and provides the target parameter vector to the server.
In one example, the target parameter vector is obtained by:
for each dimension j of the k dimensions, it is judged whether the indication information of dimension j is a compression indication or a non-compression indication. If it is a compression indication, the element value of the local parameter vector in dimension j is quantized, and the quantized value is taken as the processing result; the quantized value has fewer bits than the element value. If it is a non-compression indication, the element value of the local parameter vector in dimension j is retained as the processing result. The target parameter vector is then formed from the processing results of the k dimensions.
In one example, the quantization process may include: the element value corresponding to dimension j (typically a floating point number containing 32 bits) is converted to an integer (e.g., multiplied by 10 to the power of 8), and then a predetermined number of bits (e.g., 4 bits or 8 bits) are truncated from the end as the corresponding quantized value.
Of course, in practical applications, a quantization value corresponding to any dimension may also be obtained by other quantization processing methods, which is not limited in this specification.
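As a concrete illustration of the quantization example above, the following sketch scales a floating-point element to an integer and truncates trailing bits. The scale factor and bit width here are illustrative assumptions; the text only requires that the quantized value contain fewer bits than the element value.

```python
def quantize_element(value, scale=10**8, bits=8):
    """Quantize a floating-point element value to a `bits`-bit integer.

    Sketch of the quantization example in the text: the element value
    (typically a 32-bit float) is first converted to an integer, e.g.
    by multiplying by 10 to the power of 8, and a predetermined number
    of bits (e.g. 4 or 8) is then kept from the end as the quantized
    value. Both `scale` and `bits` are illustrative choices.
    """
    as_int = int(value * scale)           # float -> integer
    return as_int & ((1 << bits) - 1)     # truncate to the trailing `bits` bits


# A dimension marked for compression transmits quantize_element(x)
# instead of the full 32-bit element value x.
```

A dimension carrying a non-compression indication would simply skip this call and keep its original element value.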
In another example, the target parameter vector is obtained by:
For each dimension j of the k dimensions, it is judged whether the indication information corresponding to dimension j is a compression indication or a non-compression indication. If the indication information is a compression indication, it is judged whether the element value of the local parameter vector corresponding to dimension j is smaller than an element value threshold; if so, 0 is taken as the processing result, otherwise the element value itself is taken as the processing result. If the indication information is a non-compression indication, the element value of the local parameter vector corresponding to dimension j is retained as the processing result. The target parameter vector is then formed from the processing results of the k dimensions.
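The thresholding example above can be sketched as follows. The threshold value, and the use of the absolute value in the comparison, are illustrative assumptions not fixed by the text.

```python
def build_target_vector(local_vector, indications, element_threshold=1e-3):
    """Form the target parameter vector by per-dimension thresholding.

    indications[j] is True when dimension j carries a compression
    indication. For a compressed dimension, an element whose magnitude
    falls below the element value threshold is replaced by 0; all other
    elements are kept unchanged.
    """
    result = []
    for element, compress in zip(local_vector, indications):
        if compress and abs(element) < element_threshold:
            result.append(0.0)   # small compressed element -> 0
        else:
            result.append(element)
    return result
```

The zeroed dimensions can then be transmitted sparsely (for example, as index–value pairs of the nonzero entries), which is what makes this processing a compression.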
In step 210, the server obtains a first update parameter of the service model based on n target parameter vectors sent by n participants, and sends the first update parameter to each participant i, so that each participant i updates a local model parameter thereof based on the first update parameter for a next iteration.
In an example, the obtaining of the first update parameter of the business model includes: and aggregating the n target parameter vectors to obtain an updated t-th aggregation parameter vector. And subtracting the product of the updated t-th round aggregation parameter vector and the t-th round learning rate from the global model parameter of the business model to obtain a first updating parameter of the business model.
Wherein, the aggregating n target parameter vectors may include: and averaging or weighted averaging the n target parameter vectors to obtain an updated polymerization parameter vector of the t-th round.
In one example, the first updated parameter of the business model may be obtained with reference to the following formula.
w_t = w_{t-1} - η_t · (1/n) · Σ_{i=1}^{n} m_i (Formula 2)

where w_t is the first update parameter, w_{t-1} is the global model parameter currently maintained by the server, η_t is the t-th round learning rate (also called the learning step size), n is the number of target parameter vectors, and m_i is the i-th target parameter vector.
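The server-side update of step 210 can be sketched as follows, under the plain-averaging aggregation choice (weighted averaging, the other choice mentioned above, would work analogously):

```python
def server_update(w_prev, target_vectors, learning_rate):
    """One server-side update step.

    Aggregates the n target parameter vectors by plain averaging to get
    the updated t-th round aggregation parameter vector, then applies
    w_t = w_{t-1} - eta_t * aggregate, as in Formula 2.
    """
    n = len(target_vectors)
    k = len(w_prev)
    # updated t-th round aggregation parameter vector (plain average)
    aggregate = [sum(vec[j] for vec in target_vectors) / n for j in range(k)]
    # first update parameter of the business model
    return [w_prev[j] - learning_rate * aggregate[j] for j in range(k)]
```

The returned vector is both the server's new global model parameter and the first update parameter sent down to each participant i.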
It should be appreciated that after obtaining the first update parameter, the server may update (or replace) its currently maintained global model parameter with the first update parameter, thereby obtaining an updated global model parameter for the next iteration.
In addition, each participant i, after receiving the first updated parameter, can update (or replace) its locally maintained local model parameter by using the first updated parameter, so as to obtain an updated local model parameter for the next iteration.
It should be understood that since the k pieces of indication information were received in the t-th iteration, after entering the next iteration each participant i determines a t+1-th round local parameter vector according to its local sample set and updated local model parameters, compresses or leaves uncompressed each dimension j of the t+1-th round local parameter vector based on the k pieces of indication information received in the t-th iteration to obtain a t+1-th round target parameter vector, and provides it to the server. The server then aggregates the n t+1-th round target parameter vectors sent by the n participants to obtain the t+1-th round update parameter and issues it to each participant i, so that the server and each participant i update the t+1-th round model parameters (including the global model parameters maintained by the server and the local model parameters maintained by each participant i); and so on, until the compression end condition is reached. The compression end condition includes receiving updated k pieces of indication information or a compression stop indication, or satisfying an iteration end condition (for example, the number of iterations reaching a predetermined round, or the global model parameters converging).
It should be noted that, in each iteration starting from the t-th round, each participant i sends the server a target parameter vector with a smaller data volume than the original local parameter vector. The scheme therefore reduces the data transmission volume between the participants and the server during model updating, which in turn improves data transmission efficiency and model updating efficiency.
The above describes how the t-th round model parameters (including the global model parameters maintained by the server and the local model parameters maintained by each participant i) are updated when the first condition is not satisfied, that is, in the non-critical stage. When the first condition is satisfied, that is, in the critical stage, the t-th round model parameters are updated directly using the local parameter vectors sent by each participant i. In other words, in the critical stage, no dimension of any participant's local parameter vector is compressed, so that the local parameter vector retains more useful information and the model updating process is not adversely affected.
Specifically, updating the t-th round model parameters directly using the local parameter vectors sent by each participant i means that the server obtains a second update parameter of the service model based on the n local parameter vectors and sends the second update parameter to each participant i, so that each participant i updates its local model parameters based on the second update parameter for the next iteration.
Of course, in practical applications, if the key phase is identified, the server may instruct each participant to perform compression with a low compression ratio on the respective local parameter vector, which is not limited in this specification.
The second update parameter may be obtained with reference to Formula 2 above; the details are not repeated here.
It should be further noted that, in step 204, if the server determines that t is not equal to the preset target round (that is, the round of the current iteration is not the preset target round), steps 206 to 210 may be replaced by: the server obtains another update parameter of the service model based on the n local parameter vectors sent by the n participants and sends it to each participant i, so that each participant i updates its local model parameters based on that update parameter for the next iteration.
Finally, after multiple iterations, each participant i uses the local model parameters obtained by the participant i as a business model which is updated by the participant i in cooperation with other participants.
Taking any participant as an example, in the case that the business object corresponding to the sample in the local sample set is a picture, the business model updated by the participant in cooperation with other participants may be a picture identification model. In the case that the business object corresponding to the sample in its local sample set is text, then the business model updated in cooperation with other participants may be a text recognition model. In the case that the business objects corresponding to the samples in the local sample set are commodities and users, then the business model updated by cooperation with other participants may be a commodity recommendation model and the like.
In summary, the service model joint updating method based on parameter compression provided in the embodiments of the present description can identify the critical stage of the model updating process. When it is identified that the update has entered the critical stage, compression of the participants' parameters to be transmitted is reduced as much as possible, to avoid affecting model parameter convergence. When it is identified that the update is in a non-critical stage, compression of the participants' parameters to be transmitted is increased as much as possible, to reduce communication traffic. Because compression is triggered automatically by this identification, the degree of automation of the decision is improved. In addition, the server sends the k pieces of indication information to each participant based on the result of the dimension convergence detection, which enables automatic control of participant-side parameter compression.
In addition, by automatically identifying the critical stage of model updating and taking appropriate compression measures, the method avoids the loss of model precision, or even the failure of model parameters to converge, that blind compression can cause. Through this automatic identification, the participants compress their communication only when necessary, which reduces communication traffic, speeds up model updating, and improves the scalability of model updating, so that larger clusters and larger model sizes can be supported.
In short, the scheme adaptively and automatically switches between compressed and uncompressed parameters: by selecting low compression in the critical stage and high compression elsewhere, it can greatly reduce communication traffic while ensuring the accuracy and effect of model parameter convergence.
The following describes the present solution by taking a business model as a commodity recommendation model as an example.
FIG. 3 illustrates an interaction diagram of a commodity recommendation model joint updating method based on parameter compression according to one embodiment. It should be noted that the method involves multiple rounds of iterations; FIG. 3 shows the interaction steps included in the t-th iteration (t being a positive integer). Since the interactions between each participant and the server are similar, FIG. 3 mainly shows the interaction between any one participant (called the first participant for convenience of description) and the server; the interactions between the other participants and the server can be understood with reference to those of the first participant. As shown in FIG. 3, the method may include the following steps:
step 302, each participant i determines local parameter vectors with k dimensions according to the local sample set and the local model parameters of the commodity recommendation model, and provides the local parameter vectors with k dimensions to the server.
The business objects corresponding to the samples in the local sample set include users and commodities, and the characteristics of the samples include user attributes (e.g., occupation, hobby, academic calendar, and the like), operation behaviors (e.g., browsing, clicking, closing, and the like), and commodity attributes (e.g., commodity category, commodity price, commodity details, and the like).
In step 304, when determining that t is equal to the preset target round, the server performs the first-condition judgment and the dimension convergence detection in parallel.
The first condition includes: the ratio of the t +1 th round learning rate to the t-1 th round learning rate is smaller than a learning rate threshold, or the target distance between the t-th round aggregation parameter vector and the t-1 th round aggregation parameter vector is not smaller than a distance threshold. The t-th aggregation parameter vector is obtained by aggregating n local parameter vectors sent by n participants.
The above dimension convergence detection includes: averaging and calculating the variance of n element values of n local parameter vectors corresponding to each dimension j, and determining the signal-to-noise ratio corresponding to each dimension j according to the average value and the variance value calculated for each dimension j, wherein the signal-to-noise ratio is used for indicating the convergence condition of the corresponding dimension; and comparing the signal-to-noise ratio corresponding to each dimension j with the ratio threshold value to obtain corresponding indication information, wherein the indication information is used for indicating whether the dimension j is compressed in the current iteration and a plurality of subsequent iterations.
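The dimension convergence detection described above can be sketched as follows. The compression rule (compress when the signal-to-noise ratio is below the ratio threshold) follows the rule stated elsewhere in this description; the threshold value itself is an illustrative assumption.

```python
def dimension_convergence_detection(local_vectors, ratio_threshold=1.0):
    """Per-dimension convergence detection via signal-to-noise ratio.

    For each dimension j, compute the mean and (population) variance of
    the n participants' element values, take SNR_j = mean / variance,
    and mark the dimension for compression when SNR_j is below the
    ratio threshold. Whether the raw mean or its absolute value is
    used is not specified in the text; this sketch uses the raw mean.
    """
    n = len(local_vectors)
    k = len(local_vectors[0])
    indications = []
    for j in range(k):
        values = [vec[j] for vec in local_vectors]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        snr = mean / variance if variance != 0 else float("inf")
        indications.append(snr < ratio_threshold)  # True => compression indication
    return indications
```

The k booleans returned correspond to the k pieces of indication information sent to each participant when the first condition is not satisfied.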
In step 306, the server sends k pieces of indication information corresponding to k dimensions to each participant i if the first condition is not satisfied.
In step 308, each participant i compresses or leaves uncompressed each dimension j of its local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and provides the target parameter vector to the server.
In step 310, the server obtains a first update parameter of the commodity recommendation model based on n target parameter vectors sent by n participants, and sends the first update parameter to each participant i, so that each participant i updates a local model parameter thereof based on the first update parameter for the next iteration.
After multiple iterations, each participant i uses the obtained local model parameters as a commodity recommendation model which is updated by the participants in cooperation with other participants.
In summary, the commodity recommendation model joint updating method based on parameter compression provided by the embodiments of the present description can update the commodity recommendation model under the condition of saving communication resources.
Corresponding to the service model joint updating method based on parameter compression, an embodiment of the present specification further provides a service model joint updating system based on parameter compression, as shown in fig. 4, the system may include: a server 402 and n participants 404.
Each participant 404 is configured to determine a local parameter vector having k dimensions from the local sample set and the local model parameters of the business model, and provide it to the server 402.
Wherein the local parameter vector comprises a gradient vector or a model parameter vector.
And the server 402 is configured to perform judgment of a first condition and dimension convergence detection in parallel when t is equal to a preset target turn.
Wherein the first condition comprises: the ratio of the t+1-th round learning rate to the t-1-th round learning rate is smaller than a learning rate threshold, or the target distance between the t-th round aggregation parameter vector and the t-1-th round aggregation parameter vector is not smaller than a distance threshold. The t-th round aggregation parameter vector is obtained by aggregating the n local parameter vectors sent by the n participants 404.
The target distance is obtained by calculating a second-order norm distance between the t-th round aggregation parameter vector and the t-1 st round aggregation parameter vector and a second-order norm of the t-1 st round aggregation parameter vector and then calculating a ratio of the second-order norm distance to the second-order norm.
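A minimal sketch of this target distance computation (the relative second-order norm change between consecutive aggregation parameter vectors):

```python
import math

def target_distance(agg_t, agg_t_minus_1):
    """Relative L2 change between consecutive aggregation vectors.

    Computes ||a_t - a_{t-1}||_2 / ||a_{t-1}||_2. In the first
    condition, this distance being not smaller than the distance
    threshold is one of the two triggers for the critical stage:
    a large relative change means the aggregated update direction
    has not yet stabilized.
    """
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(agg_t, agg_t_minus_1)))
    prev_norm = math.sqrt(sum(b ** 2 for b in agg_t_minus_1))
    return diff_norm / prev_norm
```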
The above dimension convergence detection includes: averaging and calculating the variance of n element values of n local parameter vectors corresponding to each dimension j, and determining the signal-to-noise ratio corresponding to each dimension j according to the average value and the variance value calculated for each dimension j, wherein the signal-to-noise ratio is used for indicating the convergence condition of the corresponding dimension; and comparing the signal-to-noise ratio corresponding to each dimension j with the ratio threshold to obtain corresponding indication information, wherein the indication information is used for indicating whether the dimension j is compressed in the current iteration and a plurality of subsequent iterations.
Wherein the determining a signal-to-noise ratio corresponding to each dimension j comprises:
the ratio of the mean to the variance values corresponding to each dimension j is calculated to obtain the signal-to-noise ratio corresponding to each dimension j.
In addition, each piece of indication information is a compression indication when the signal-to-noise ratio of the corresponding dimension is smaller than the ratio threshold, and is a non-compression indication when the signal-to-noise ratio of the corresponding dimension is not smaller than the ratio threshold.
Server 402 is further configured to send k indication information corresponding to k dimensions to each participant 404 if the first condition is not satisfied.
Each participant 404 is further configured to perform compression or non-compression processing on each dimension j of the corresponding local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and provide the target parameter vector to the server 402.
In one example, each participant 404 is specifically configured to: for each dimension j of the k dimensions, judge whether the indication information corresponding to dimension j is a compression indication or a non-compression indication. If the indication information is a compression indication, quantize the element value of the local parameter vector corresponding to dimension j and take the quantized value as the processing result; the quantized value contains fewer bits than the element value. If the indication information is a non-compression indication, retain the element value of the local parameter vector corresponding to dimension j as the processing result. Then form the target parameter vector from the processing results of the k dimensions.
In another example, each participant 404 is specifically configured to: for each dimension j of the k dimensions, judge whether the indication information corresponding to dimension j is a compression indication or a non-compression indication. If the indication information is a compression indication, judge whether the element value of the local parameter vector corresponding to dimension j is smaller than an element value threshold; if so, take 0 as the processing result, otherwise take the element value itself as the processing result. If the indication information is a non-compression indication, retain the element value of the local parameter vector corresponding to dimension j as the processing result. Then form the target parameter vector from the processing results of the k dimensions.
The server 402 is further configured to obtain a first update parameter of the service model based on the n target parameter vectors sent by the n participants 404, and send the first update parameter to each participant 404, so that each participant 404 updates the local model parameter thereof based on the first update parameter for the next iteration.
The server 402 is specifically configured to:
and aggregating the n target parameter vectors to obtain an updated t-th aggregation parameter vector. And subtracting the product of the updated t-th aggregation parameter vector and the t-th learning rate from the global model parameter of the business model to obtain a first updating parameter of the business model.
Optionally, the server 402 is further configured to, if the first condition is met, obtain a second update parameter of the service model based on the n local parameter vectors, and send the second update parameter to each of the participants 404, so that each of the participants 404 updates the local model parameter thereof based on the second update parameter for a next iteration.
The functions of each functional module of the device in the above embodiments of the present description may be implemented through each step of the above method embodiments, and therefore, a specific working process of the device provided in one embodiment of the present description is not repeated herein.
The service model joint updating system based on parameter compression provided by one embodiment of the present specification can update the service model under the condition of saving communication resources.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a server. Of course, the processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above-mentioned embodiments, objects, technical solutions and advantages of the present specification are further described in detail, it should be understood that the above-mentioned embodiments are only specific embodiments of the present specification, and are not intended to limit the scope of the present specification, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.

Claims (10)

1. A service model joint updating method based on parameter compression relates to a server and n participants; the method comprises a plurality of iterations, wherein any tth iteration comprises:
each participant i determines a local parameter vector with k dimensions according to the local sample set and the local model parameters of the business model, and provides the local parameter vector with k dimensions to the server;
the server judges a first condition and performs the dimension convergence detection in parallel under the condition that t is equal to a preset target round;
the first condition includes: the ratio of the t +1 th round learning rate to the t-1 th round learning rate is smaller than a learning rate threshold, or the target distance between the t-th round aggregation parameter vector and the t-1 th round aggregation parameter vector is not smaller than a distance threshold; the t-th round aggregation parameter vector is obtained by aggregating n local parameter vectors sent by the n participants;
the dimension convergence detection comprises: averaging and calculating the variance of n element values of the n local parameter vectors corresponding to each dimension j, and determining the signal-to-noise ratio corresponding to each dimension j according to the average value and the variance value calculated for each dimension j, wherein the signal-to-noise ratio is used for indicating the convergence condition of the corresponding dimension; comparing the signal-to-noise ratio corresponding to each dimension j with the ratio threshold to obtain corresponding indication information, wherein the indication information is used for indicating whether the dimension j is compressed in the current iteration and a plurality of subsequent iterations;
the server, in case the first condition is not satisfied, sending k pieces of indication information corresponding to the k dimensions to each participant i;
each participant i compresses or leaves uncompressed each dimension j of the corresponding local parameter vector according to the k pieces of indication information to obtain a target parameter vector, and provides the target parameter vector to the server;
and the server acquires a first updating parameter of the service model based on the n target parameter vectors sent by the n participants and sends the first updating parameter to each participant i, so that each participant i updates the local model parameter of the participant i based on the first updating parameter for the next iteration.
2. The method of claim 1, further comprising:
and under the condition that the first condition is met, the server acquires second updating parameters of the service model based on the n local parameter vectors and sends the second updating parameters to each participant i, so that each participant i updates the local model parameters of the participant i based on the second updating parameters for the next iteration.
3. The method of claim 1, wherein the target distance is obtained by calculating a second-order norm distance between the t-th aggregation parameter vector and the t-1 st aggregation parameter vector, and a second-order norm of the t-1 st aggregation parameter vector, and then calculating a ratio of the second-order norm distance to the second-order norm.
4. The method of claim 1, wherein the determining a signal-to-noise ratio corresponding to each dimension j comprises:
the ratio of the mean to the variance values corresponding to each dimension j is calculated to obtain the signal-to-noise ratio corresponding to each dimension j.
5. The method according to claim 1, wherein each of the k pieces of indication information is a compression indication if the signal-to-noise ratio of the corresponding dimension is smaller than the ratio threshold, and is a non-compression indication if the signal-to-noise ratio of the corresponding dimension is not smaller than the ratio threshold.
6. The method of claim 1, wherein the target parameter vector is obtained by:
for each dimension j in the k dimensions, judging whether the indication information corresponding to the dimension j is a compression indication or an uncompression indication;
under the condition that the indication information is a compression indication, carrying out quantization processing on the element value of the local parameter vector corresponding to the dimension j to obtain a quantization value as a processing result; the bit number contained in the quantization value is smaller than the bit number contained in the element value;
under the condition that the indication information is the non-compression indication, retaining the element value of the local parameter vector corresponding to the dimension j as a processing result;
and forming the target parameter vector based on the processing result corresponding to each dimension in the k dimensions.
7. The method of claim 1, wherein the target parameter vector is obtained by:
for each dimension j in the k dimensions, judging whether the indication information corresponding to the dimension j is a compression indication or an uncompression indication;
under the condition that the indication information is a compression indication, judging whether an element value of the local parameter vector corresponding to the dimension j is smaller than an element value threshold value, if so, taking 0 as a processing result, otherwise, taking the element value as the processing result;
under the condition that the indication information is the non-compression indication, retaining the element value of the local parameter vector corresponding to the dimension j as a processing result;
and forming the target parameter vector based on the processing result corresponding to each dimension in the k dimensions.
8. The method of claim 1, wherein the obtaining of the first updated parameter of the business model comprises:
aggregating the n target parameter vectors to obtain an updated t-th aggregation parameter vector;
and subtracting the product of the updated t-th round aggregation parameter vector and the t-th round learning rate from the global model parameter of the business model to obtain a first updating parameter of the business model.
9. The method of claim 1, wherein the local parameter vector comprises a gradient vector or a model parameter vector.
10. A business model joint updating system based on parameter compression performs multiple iterations through a server and n participants, wherein the t-th iteration comprises:
each participant i is used for determining a local parameter vector with k dimensions according to the local sample set and the local model parameters of the business model and providing the local parameter vector with k dimensions to the server;
the server is used for judging a first condition and performing dimensionality convergence detection in parallel under the condition that the t is equal to a preset target turn;
the first condition includes: the ratio of the t+1-th round learning rate to the t-1-th round learning rate is smaller than a learning rate threshold value, or the target distance between the t-th round aggregation parameter vector and the t-1-th round aggregation parameter vector is not smaller than a distance threshold value; the t-th round aggregation parameter vector is obtained by aggregating the n local parameter vectors sent by the n participants;
the dimension convergence detection comprises: averaging and calculating the variance of the n element values of the n local parameter vectors corresponding to each dimension j, and determining the signal-to-noise ratio corresponding to each dimension j according to the average value and the variance value calculated for each dimension j, wherein the signal-to-noise ratio is used for indicating the convergence condition of the corresponding dimension; comparing the signal-to-noise ratio corresponding to each dimension j with the ratio threshold to obtain corresponding indication information, wherein the indication information is used for indicating whether the dimension j is compressed in the current iteration and a plurality of subsequent iterations;
the server is further configured to send k pieces of indication information corresponding to the k dimensions to each participant i if the first condition is not satisfied;
each participant i is further configured to compress, or leave uncompressed, each dimension j of its local parameter vector according to the k pieces of indication information so as to obtain a target parameter vector, and to provide the target parameter vector to the server;
the server is further configured to obtain a first update parameter of the service model based on the n target parameter vectors sent by the n participants, and to send the first update parameter to each participant i, so that each participant i updates its local model parameters based on the first update parameter for the next iteration.
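The server-side procedure in claims 8 and 10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default ratio threshold, the use of plain averaging as the aggregation, and the assumption that a *small* signal-to-noise ratio marks a dimension as converged (the claim only states that the SNR is compared against a ratio threshold) are all choices made here for the sketch.

```python
import numpy as np

def dimension_convergence_mask(local_vectors, ratio_threshold=1.0, eps=1e-12):
    """Per-dimension convergence detection (claim 10): for each dimension j,
    compute the mean and variance of the n participants' element values,
    form a signal-to-noise ratio |mean| / std, and emit indication info.
    Here a low SNR is taken to mean the dimension is dominated by noise,
    i.e. converged, and can be compressed (dropped) in later rounds."""
    V = np.asarray(local_vectors, dtype=float)   # shape (n_participants, k)
    mean = V.mean(axis=0)                        # per-dimension mean
    var = V.var(axis=0)                          # per-dimension variance
    snr = np.abs(mean) / np.sqrt(var + eps)
    return snr < ratio_threshold                 # True -> compress dimension j

def server_round(local_vectors, global_params, lr, mask=None):
    """Aggregate the (optionally compressed) local vectors and apply the
    update rule of claim 8: params <- params - lr * aggregated_vector.
    Compressed dimensions contribute no update."""
    agg = np.mean(np.asarray(local_vectors, dtype=float), axis=0)
    if mask is not None:
        agg = np.where(mask, 0.0, agg)
    return global_params - lr * agg
```

In this sketch the n participants would send full k-dimensional vectors only until the target round; afterwards the masked dimensions are omitted from transmission entirely, which is where the communication saving of the parameter compression comes from.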
CN202211461638.6A 2022-11-18 2022-11-18 Service model joint updating method and system based on parameter compression Active CN115617827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211461638.6A CN115617827B (en) 2022-11-18 2022-11-18 Service model joint updating method and system based on parameter compression


Publications (2)

Publication Number Publication Date
CN115617827A CN115617827A (en) 2023-01-17
CN115617827B CN115617827B (en) 2023-04-07

Family

ID=84878871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211461638.6A Active CN115617827B (en) 2022-11-18 2022-11-18 Service model joint updating method and system based on parameter compression

Country Status (1)

Country Link
CN (1) CN115617827B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902473A (en) * 2021-09-29 2022-01-07 支付宝(杭州)信息技术有限公司 Training method and device of business prediction system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190213470A1 (en) * 2018-01-09 2019-07-11 NEC Laboratories Europe GmbH Zero injection for distributed deep learning
CN113688855B (en) * 2020-05-19 2023-07-28 华为技术有限公司 Data processing method, federal learning training method, related device and equipment
CN111931950B (en) * 2020-09-28 2021-01-26 支付宝(杭州)信息技术有限公司 Method and system for updating model parameters based on federal learning
WO2022106863A1 (en) * 2020-11-23 2022-05-27 Framatome Method and system for accelerating the convergence of an iterative computation code of physical parameters of a multi-parameter system
CN115081642B (en) * 2022-07-19 2022-11-15 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant