CN115617827A - A method and system for jointly updating business models based on parameter compression - Google Patents
A method and system for jointly updating business models based on parameter compression
- Publication number
- CN115617827A (Application CN202211461638.6A)
- Authority
- CN
- China
- Prior art keywords
- dimension
- parameter vector
- parameter
- local
- participant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/23 — Information retrieval; database structures and file system structures for structured data, e.g. relational data — Updating
- G06F16/1744 — File systems; details of further file system functions — Redundancy elimination performed by the file system using compression, e.g. sparse files
- G06N3/084 — Computing arrangements based on biological models — Neural networks — Learning methods — Backpropagation, e.g. using gradient descent
- G06Q30/0631 — Commerce — Electronic shopping [e-shopping] — Item recommendations
Description
Technical Field
One or more embodiments of this specification relate to the field of machine learning, and in particular to a method and system for jointly updating business models based on parameter compression.
Background
Commodity supply chain integrated-service enterprise groups, owing to their diversified industries and large business scale covering the entire service chain of procurement, manufacturing, distribution, warehousing, logistics, delivery and financing, have accumulated massive amounts of business and financial data. To fully exploit the value of this data, empower business operations and improve enterprise efficiency, distributed learning is used to update business models on the basis of this massive business and financial data. In this setting, joint model updating by multiple participants is the most widely adopted approach; it requires combining the participants' gradients or model parameters (collectively referred to as parameters) in every round of iteration. However, frequently transmitting parameters of such large scale degrades the efficiency of model updating. How to update a model efficiently in a large-scale cluster has therefore become a key concern. To alleviate the communication bottleneck, some schemes propose increasing the sample batch size, i.e., having each participant compute parameters over a large batch of samples, thereby reducing the communication frequency per iteration. In the present solution, the key stages of the model-update process are identified, so that communication traffic is greatly reduced while model accuracy is maintained.
Summary of the Invention
One or more embodiments of this specification describe a method and system for jointly updating business models based on parameter compression, which can greatly reduce communication traffic while maintaining model accuracy.
In a first aspect, a method for jointly updating a business model based on parameter compression is provided, including:
each participant i determining, according to its local sample set and the local model parameters of the business model, a local parameter vector having k dimensions, and providing it to the server;
the server, in a case where t is equal to a preset target round, performing the judgment of a first condition and dimension convergence detection in parallel;
the first condition including: the ratio of the learning rate of round t+1 to the learning rate of round t being smaller than a learning-rate threshold, or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 being not smaller than a distance threshold; the aggregated parameter vector of round t being obtained by aggregating the n local parameter vectors sent by the n participants;
the dimension convergence detection including: computing the mean and the variance of the n element values that the n local parameter vectors have in each dimension j, and determining, according to the mean and the variance computed for each dimension j, a signal-to-noise ratio corresponding to that dimension, the signal-to-noise ratio being used to indicate the convergence of the corresponding dimension; and comparing the signal-to-noise ratio corresponding to each dimension j with a ratio threshold to obtain corresponding indication information, the indication information being used to indicate whether dimension j is to be compressed in the current round and in several subsequent rounds of iteration;
the server, in a case where the first condition is not satisfied, sending the k pieces of indication information corresponding to the k dimensions to each participant i;
each participant i, according to the k pieces of indication information, compressing or not compressing each dimension j of its local parameter vector to obtain a target parameter vector, and providing it to the server;
the server, based on the n target parameter vectors sent by the n participants, obtaining a first updated parameter of the business model and delivering it to each participant i, so that each participant i updates its local model parameters based on the first updated parameter for use in the next round of iteration.
In a second aspect, a system for jointly updating a business model based on parameter compression is provided, including:
each participant i, configured to determine, according to its local sample set and the local model parameters of the business model, a local parameter vector having k dimensions, and provide it to the server;
the server, configured to, in a case where t is equal to a preset target round, perform the judgment of a first condition and dimension convergence detection in parallel;
the first condition including: the ratio of the learning rate of round t+1 to the learning rate of round t being smaller than a learning-rate threshold, or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 being not smaller than a distance threshold; the aggregated parameter vector of round t being obtained by aggregating the n local parameter vectors sent by the n participants;
the dimension convergence detection including: computing the mean and the variance of the n element values that the n local parameter vectors have in each dimension j, and determining, according to the mean and the variance computed for each dimension j, a signal-to-noise ratio corresponding to that dimension, the signal-to-noise ratio being used to indicate the convergence of the corresponding dimension; and comparing the signal-to-noise ratio corresponding to each dimension j with a ratio threshold to obtain corresponding indication information, the indication information being used to indicate whether dimension j is to be compressed in the current round and in several subsequent rounds of iteration;
the server, further configured to, in a case where the first condition is not satisfied, send the k pieces of indication information corresponding to the k dimensions to each participant i;
each participant i, further configured to compress or not compress, according to the k pieces of indication information, each dimension j of its local parameter vector to obtain a target parameter vector, and provide it to the server;
the server, further configured to obtain, based on the n target parameter vectors sent by the n participants, a first updated parameter of the business model and deliver it to each participant i, so that each participant i updates its local model parameters based on the first updated parameter for use in the next round of iteration.
In the method and system for jointly updating a business model based on parameter compression provided by one or more embodiments of this specification, when the server determines that the current iteration round is a preset target round, it judges, based on the learning rates or the magnitude of parameter change of two adjacent rounds, whether the model-update process has entered a critical stage. When the critical stage has not yet been reached, the server sends to every participant k pieces of indication information corresponding to the k dimensions; each piece of indication information is determined by the server from the convergence status of the corresponding dimension of the n local parameter vectors received from the n participants, and indicates whether that dimension is to be compressed in the current round and in several subsequent rounds of iteration. Each participant therefore compresses, in the current round and in the subsequent rounds, the dimensions of its local parameter vector that have converged or nearly converged, and provides the resulting, smaller target parameter vector to the server. Communication traffic can thus be greatly reduced while model accuracy is maintained.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a first schematic diagram of a system for jointly updating a business model based on parameter compression according to an embodiment;
Fig. 2 shows an interaction diagram of a method for jointly updating a business model based on parameter compression according to an embodiment;
Fig. 3 shows an interaction diagram of a method for jointly updating a commodity recommendation model based on parameter compression according to an embodiment;
Fig. 4 shows a second schematic diagram of a system for jointly updating a business model based on parameter compression according to an embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the accompanying drawings.
Fig. 1 shows a first schematic diagram of a system for jointly updating a business model based on parameter compression according to an embodiment. In Fig. 1, the system includes a server and n participants, where n is a positive integer. Each participant can be implemented as any device, platform, server or device cluster having computing and processing capabilities. The server includes a critical-stage identification device, a convergence detection device, a compression device and an update device.
In Fig. 1, each participant i among participants 1 to n can determine, according to its local sample set and the local model parameters of the business model, a corresponding local parameter vector having k dimensions, and provide it to the server, where k is a positive integer. The business model is used to predict a classification or a regression value of a business object; the business object can be, for example, a picture, a text, a user or a commodity. Here, i is a positive integer and 1 ≤ i ≤ n.
After receiving the n local parameter vectors sent by the n participants, the server can first judge whether the current iteration round is a preset target round; if so, it performs the judgment of the first condition through its critical-stage identification device and performs dimension convergence detection through its convergence detection device. It should be understood that the judgment of the first condition and the dimension convergence detection can be executed in parallel.
It should be noted that the judgment of the first condition serves to identify the critical stage of the model-update process (the parameter updates of this stage have a large influence on the convergence of the model parameters, and this stage is also commonly called the core phase of model updating). The first condition includes: the ratio of the learning rate of round t+1 to the learning rate of round t being smaller than the learning-rate threshold, or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 being not smaller than the distance threshold. The aggregated parameter vector of round t is obtained by aggregating the n local parameter vectors sent by the n participants. In other words, this solution identifies whether the model-update process has entered the critical stage based on the learning rates or the magnitude of parameter change of two adjacent rounds.
The dimension convergence detection may specifically include: computing the mean and the variance of the n element values that the n local parameter vectors have in each dimension j, and determining, according to the mean and the variance computed for each dimension j, a signal-to-noise ratio corresponding to that dimension, the signal-to-noise ratio being used to indicate the convergence of the corresponding dimension. Here, j is a positive integer and 1 ≤ j ≤ k.
The dimension convergence detection further includes: obtaining, based on the signal-to-noise ratio determined for each dimension j, corresponding indication information, thus obtaining k pieces of indication information corresponding to the k dimensions.
Taking an arbitrary first dimension among the k dimensions as an example, if the signal-to-noise ratio corresponding to the first dimension is smaller than the ratio threshold, the indication information corresponding to the first dimension is determined as a compression indication; if the signal-to-noise ratio corresponding to the first dimension is not smaller than the ratio threshold, the indication information corresponding to the first dimension is determined as a no-compression indication.
In a case where the first condition is not satisfied, that is, in a case where the model-update process has not entered the critical stage, the k pieces of indication information are sent to each participant i through the compression device.
Each participant i, according to the k pieces of indication information received, compresses or does not compress each dimension j of its local parameter vector to obtain a target parameter vector, and provides it to the server.
Through its update device, the server obtains, based on the n target parameter vectors sent by the n participants, the first updated parameter of the business model, and delivers it to each participant i, so that each participant i updates its local model parameters based on the first updated parameter for use in the next round of iteration.
It should be understood that, after entering the next round of iteration, each participant i, after computing its local parameter vector, compresses or does not compress each dimension j of the local parameter vector according to the k pieces of indication information received, until a compression end condition is reached. The compression end condition includes receiving k updated pieces of indication information or a stop-compression indication, or satisfying an iteration end condition, and the like.
It should be noted that, when the server identifies through the critical-stage identification device that the critical stage has not yet been reached, the server sends to each participant k pieces of indication information corresponding to the k dimensions, to instruct each participant to compress the corresponding dimensions in the current round and in several subsequent rounds of iteration. Since the amount of parameter data after compression is smaller than before compression, this solution can reduce the amount of data transmitted between the participants and the server. In addition, since the compression is performed only in the non-critical stage, it does not affect the accuracy of the model. In other words, this solution can greatly reduce communication traffic while maintaining model accuracy.
Fig. 2 shows an interaction diagram of a method for jointly updating a business model based on parameter compression according to an embodiment. It should be noted that the method involves multiple rounds of iteration; Fig. 2 shows the interaction steps included in the t-th round (t is a positive integer). Because the interaction between each participant of the t-th round and the server is similar, Fig. 2 mainly shows the interaction steps between any one participant of that round (referred to as the first participant for ease of description) and the server; for the interaction steps between the other participants of that round and the server, reference can be made to the interaction steps between the first participant and the server. As shown in Fig. 2, the method may include the following steps:
Step 202: each participant i determines, according to its local sample set and the local model parameters of the business model, a local parameter vector having k dimensions, and provides it to the server.
Taking any participant as an example, the business objects corresponding to the samples in the local sample set it maintains may include any of the following: pictures, texts, users, commodities, and the like.
In addition, the business model may be a classification model or a regression model, used to predict a classification or a regression value of the business object. In one embodiment, the business model may be implemented based on a decision-tree algorithm, a Bayesian algorithm, or the like; in another embodiment, the business model may be implemented based on a neural network.
It should be noted that, when the t-th iteration is the first iteration, the above local model parameters may be obtained by the server initializing the business model before the multiple rounds of iteration start and then delivering or providing the initialized model parameters to the participants, so that each participant can take the initialized model parameters as its local model parameters. Of course, in practical applications, the participants may also first agree on the structure of the model (for example, which model to use, the number of layers of the model, the number of neurons in each layer, and so on) and then perform identical initialization to obtain their respective local model parameters.
When the t-th iteration is not the first iteration, the above local model parameters may have been updated in the (t-1)-th iteration.
Finally, the above local parameter vector may include a gradient vector or a model parameter vector; for its determination, reference can be made to the prior art. Taking a gradient vector as an example, it may be determined as follows: a prediction result is first determined according to the local sample set and the local model parameters; a prediction loss is then determined according to the prediction result and the sample labels; finally, the gradient vector corresponding to the local model parameters is determined from the prediction loss using back-propagation.
It should be understood that, in practical applications, the determination of the local parameter vector may itself involve multiple rounds of iteration, and the specific number of rounds can be preset.
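For illustration only, the following Python sketch shows one way a participant might compute such a k-dimensional gradient vector for a simple logistic-regression business model; the model choice, function names and variable names are assumptions of this sketch and are not taken from the embodiments above.

```python
# Illustrative sketch (hypothetical names): how a participant i might derive a
# k-dimensional local parameter vector (here, a gradient vector) from its
# local sample set and local model parameters.
import numpy as np

def local_parameter_vector(X, y, w):
    """X: (m, k) local sample features; y: (m,) labels in {0, 1}; w: (k,) local model parameters.
    Returns the k-dimensional gradient of the average log-loss (the local parameter vector)."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))      # forward pass: predicted probabilities
    grad = X.T @ (p - y) / len(y)     # back-propagated gradient of the log-loss
    return grad                       # k-dimensional local parameter vector sent to the server
```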
Step 204: when the server judges that t is equal to a preset target round, it performs the judgment of the first condition and the dimension convergence detection in parallel.
It should be understood that t is the round number of the current iteration.
In practical applications, there may be multiple preset target rounds. When there are multiple preset target rounds, t may be compared with each preset target round in turn to judge whether t is equal to one of them.
The judgment of the first condition can also be understood as identifying the critical stage of the model-update process. The first condition may include: the ratio of the learning rate of round t+1 to the learning rate of round t being smaller than the learning-rate threshold, or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 being not smaller than the distance threshold.
In machine learning, gradient descent is a parameter-optimization algorithm widely used to minimize model error. Gradient descent estimates model parameters through multiple rounds of iteration, minimizing the loss function in each round. In gradient descent, a learning rate is usually given, which controls the learning progress of the model during the iterative process.
Generally, in the earlier iterations the learning rate is larger, so the step taken is longer and gradient descent can proceed faster. In the later iterations, the learning rate is gradually reduced to shrink the learning step, which helps the model converge and approach the optimal solution.
Therefore, this solution identifies the critical stage based on the magnitude of the change in learning rate between two adjacent rounds. Specifically, when the ratio of the learning rate of round t+1 to the learning rate of round t is smaller than the learning-rate threshold, the first condition is determined to be satisfied, i.e., the critical stage of the model-update process has been entered. Referring to the learning-rate schedule described above, it can be seen that this solution treats the later iterations as the critical stage of the model-update process.
In addition, this solution also identifies the critical stage based on the magnitude of parameter change between two adjacent rounds. Specifically, when the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 is not smaller than the distance threshold, the first condition is determined to be satisfied, i.e., the critical stage of the model-update process has been entered. In other words, this solution treats iterations in which the parameters change substantially as the critical stage of the model-update process.
The aggregated parameter vector of round t is obtained by aggregating the n local parameter vectors sent by the n participants. In one example, the aggregated parameter vector of round t is obtained by summing the n local parameter vectors.
The aggregated parameter vector of round t-1 is computed similarly to that of round t: it is obtained by aggregating the n local parameter vectors sent by the n participants in round t-1.
In one example, the target distance is obtained by computing the second-order-norm (L2-norm) distance between the aggregated parameter vector of round t and that of round t-1, computing the second-order norm of the aggregated parameter vector of round t-1, and then taking the ratio of the two. The specific formula is as follows:
d = ‖grad_t − grad_{t−1}‖₂ / ‖grad_{t−1}‖₂  (Formula 1)
where d is the target distance, grad_t is the aggregated parameter vector of round t, and grad_{t−1} is the aggregated parameter vector of round t−1.
In short, when the ratio of the learning rates of two adjacent rounds is smaller than the learning-rate threshold, or when the target distance between the aggregated parameter vectors of two adjacent rounds is not smaller than the distance threshold, the first condition is determined to be satisfied, i.e., the critical stage of the model-update process has been entered; otherwise, the first condition is determined not to be satisfied.
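As a non-authoritative illustration of this first-condition check, the sketch below evaluates both branches of the condition, using Formula 1 for the target distance; the threshold values and function name are placeholder assumptions.

```python
# Hedged sketch of the first-condition check (critical-stage identification).
import numpy as np

def entered_critical_stage(lr_next, lr_curr, grad_t, grad_t_prev,
                           lr_threshold=0.5, dist_threshold=0.1):
    # branch (a): the learning rate drops sharply between adjacent rounds
    lr_cond = (lr_next / lr_curr) < lr_threshold
    # branch (b): the normalized change of the aggregated parameter vector is large (Formula 1)
    d = np.linalg.norm(grad_t - grad_t_prev, ord=2) / np.linalg.norm(grad_t_prev, ord=2)
    dist_cond = d >= dist_threshold
    # True -> critical stage: the server does not send compression indications
    return lr_cond or dist_cond
```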
The dimension convergence detection in step 204 may include: computing the mean and the variance of the n element values that the n local parameter vectors have in each dimension j, and determining, according to the mean and the variance computed for each dimension j, a signal-to-noise ratio corresponding to that dimension, the signal-to-noise ratio being used to indicate the convergence of the corresponding dimension.
That is, for each of the k dimensions, the n element values of the n local parameter vectors in that dimension are averaged and their variance is computed, yielding k means and k variances corresponding to the k dimensions. Then the ratio of the mean to the variance of each dimension j can be computed, giving the signal-to-noise ratio of each dimension j. The k signal-to-noise ratios of the k dimensions can form a signal-to-noise-ratio vector.
Next, the signal-to-noise ratio of each dimension j can be compared with the corresponding ratio threshold. If it is smaller than the ratio threshold, the dimension has entered a stable phase and is close to convergence, so the indication information corresponding to that dimension is determined as a compression indication. If it is not smaller than the ratio threshold, the dimension is still changing dynamically and has not converged, so the indication information corresponding to that dimension is determined as a no-compression indication.
After the corresponding indication information has been determined for each dimension j, k pieces of indication information corresponding to the k dimensions are obtained; each piece of indication information indicates whether the corresponding dimension is to be compressed in the current round and in several subsequent rounds of iteration. In other words, in the current round and in the subsequent rounds, each participant i determines, according to these k pieces of indication information, whether to compress each dimension of the local parameter vector it determines.
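The detection just described can be illustrated with the following hedged sketch, which computes the per-dimension mean, variance and signal-to-noise ratio over the n local parameter vectors and derives the k compression indications; the ratio threshold and the small epsilon term are assumed values of this sketch.

```python
# Hedged sketch of the per-dimension convergence detection on the server.
import numpy as np

def dimension_indications(local_vectors, snr_threshold=1.0):
    """local_vectors: (n, k) array, one row per participant's local parameter vector.
    Returns a length-k boolean array: True = compress this dimension."""
    mean = local_vectors.mean(axis=0)            # per-dimension mean
    var = local_vectors.var(axis=0) + 1e-12      # per-dimension variance (epsilon avoids division by zero)
    snr = mean / var                             # per-dimension signal-to-noise ratio (a variant might use |mean|)
    return snr < snr_threshold                   # below threshold -> near convergence -> compression indication
```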
Step 206: when the first condition is not satisfied, the server sends to each participant i the k pieces of indication information corresponding to the k dimensions.
In other words, when the server identifies that the model-update process has not yet entered the critical stage, it sends to each participant i the k pieces of indication information corresponding to the k dimensions, so that each participant i compresses or does not compress each dimension j of its local parameter vector according to the k pieces of indication information. When the server identifies that the model-update process has entered the critical stage, it does not send the k pieces of indication information. That is, the result of the dimension convergence detection (the k pieces of indication information) is used only when the first condition is not satisfied.
Step 208: each participant i compresses or does not compress, according to the k pieces of indication information, each dimension j of its local parameter vector to obtain a target parameter vector, and provides it to the server.
In one example, the target parameter vector is obtained through the following steps:
For each dimension j of the k dimensions, judge whether the indication information corresponding to dimension j is a compression indication or a no-compression indication. If it is a compression indication, quantize the element value of the local parameter vector in dimension j to obtain a quantized value as the processing result; the quantized value contains fewer bits than the element value. If it is a no-compression indication, keep the element value of the local parameter vector in dimension j as the processing result. The target parameter vector is formed from the processing results of the k dimensions.
In one example, the quantization may include: converting the element value of dimension j (generally a 32-bit floating-point number) to an integer (for example, by multiplying by 10 to the 8th power), and then taking a predetermined number of bits (for example, 4 bits or 8 bits) from the end as the corresponding quantized value.
Of course, in practical applications, the quantized value of any dimension may also be obtained through other quantization methods, which is not limited in this specification.
In another example, the target parameter vector is obtained through the following steps:
For each dimension j of the k dimensions, judge whether the indication information corresponding to dimension j is a compression indication or a no-compression indication. If it is a compression indication, judge whether the element value of the local parameter vector in dimension j is smaller than an element-value threshold; if so, take 0 as the processing result, otherwise take the element value itself as the processing result. If it is a no-compression indication, keep the element value of the local parameter vector in dimension j as the processing result. The target parameter vector is formed from the processing results of the k dimensions.
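As an illustration of how a participant might form the target parameter vector under the quantization variant described above, consider the following sketch; the bit width, scaling factor and function names are illustrative assumptions, not values specified by the embodiments.

```python
# Hedged sketch: forming the target parameter vector from the local parameter
# vector and the k compression indications, using low-bit integer quantization.
def quantize(value, bits=8, scale=10**8):
    as_int = int(value * scale)          # float element value -> integer
    return as_int & ((1 << bits) - 1)    # keep only the lowest `bits` bits as the quantized value

def target_parameter_vector(local_vec, compress_flags, bits=8):
    """local_vec: length-k local parameter vector; compress_flags: length-k booleans from the server."""
    out = []
    for value, compress in zip(local_vec, compress_flags):
        if compress:
            out.append(quantize(float(value), bits))   # compressed dimension: low-bit quantized value
        else:
            out.append(float(value))                   # uncompressed dimension: element value kept as-is
    return out   # mixed-precision target parameter vector sent to the server
```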
Step 210: the server obtains, based on the n target parameter vectors sent by the n participants, the first updated parameter of the business model, and delivers it to each participant i, so that each participant i updates its local model parameters based on the first updated parameter for use in the next round of iteration.
In one example, obtaining the first updated parameter of the business model includes: aggregating the n target parameter vectors to obtain an updated aggregated parameter vector of round t, and subtracting, from the global model parameters of the business model, the product of the updated aggregated parameter vector of round t and the learning rate of round t, to obtain the first updated parameter of the business model.
Here, aggregating the n target parameter vectors may include: averaging, or computing a weighted average of, the n target parameter vectors to obtain the updated aggregated parameter vector of round t.
In one example, the first updated parameter of the business model can be obtained with reference to the following formula:
w_t = w_{t−1} − η_t · (1/n) · Σ_{i=1}^{n} m_i  (Formula 2)
where w_t is the first updated parameter, w_{t−1} is the global model parameter currently maintained by the server, η_t is the learning rate (also called the learning step size) of round t, n is the number of target parameter vectors, and m_i is the i-th target parameter vector.
It should be understood that, after obtaining the first updated parameter, the server can use it to update (or replace) the global model parameters it currently maintains, thereby obtaining updated global model parameters for use in the next round of iteration.
In addition, after receiving the first updated parameter, each participant i can use it to update (or replace) the local model parameters it maintains, thereby obtaining updated local model parameters for use in the next round of iteration.
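A minimal sketch of the server-side computation of Formula 2 is given below, assuming the n target parameter vectors have already been restored to floating-point form (dequantization of the compressed dimensions is omitted); the names are illustrative.

```python
# Hedged sketch of the server-side update of Formula 2.
import numpy as np

def server_update(w_prev, target_vectors, lr_t):
    """w_prev: (k,) global model parameters; target_vectors: (n, k) target parameter vectors; lr_t: learning rate of round t."""
    aggregated = np.mean(target_vectors, axis=0)   # average the n target parameter vectors
    w_t = w_prev - lr_t * aggregated               # Formula 2: the first updated parameter
    return w_t                                     # delivered to every participant i
```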
It should be understood that, since k pieces of indication information were received in the t-th iteration, after entering the next round each participant i, having determined the local parameter vector of round t+1 according to its local sample set and the updated local model parameters, compresses or does not compress each dimension j of that vector based on the k pieces of indication information received in round t, obtains the target parameter vector of round t+1, and provides it to the server. The server can then aggregate the n target parameter vectors of round t+1 sent by the n participants to obtain the updated parameter of round t+1 and deliver it to each participant i, so that the server and each participant i update the model parameters of round t+1 (including the global model parameters maintained by the server and the local model parameters maintained by each participant i); and so on, until a compression end condition is reached. The compression end condition includes receiving k updated pieces of indication information or a stop-compression indication, or satisfying an iteration end condition (for example, the number of iterations reaches a predetermined number of rounds or the global model parameters converge), and the like.
It should be noted that, since in each round of iteration from round t onward each participant i sends to the server a target parameter vector whose data volume is smaller than that of the original local parameter vector, this solution can reduce the amount of data transmitted between the participants and the server during the model-update process, thereby improving data-transmission efficiency, which in turn helps improve model-update efficiency.
The above describes how the model parameters of round t (including the global model parameters maintained by the server and the local model parameters maintained by each participant i) are updated when the first condition is not satisfied, i.e., in the non-critical stage. When the first condition is satisfied, i.e., in the critical stage, the local parameter vectors sent by each participant i are used directly to update the model parameters of round t. That is, in the critical stage, the dimensions of each participant's local parameter vector are not compressed, so that the local parameter vector contains more useful information and the model-update process is not affected.
Using the local parameter vectors sent by each participant i directly to update the model parameters of round t includes: the server obtaining, based on the n local parameter vectors, a second updated parameter of the business model and delivering it to each participant i, so that each participant i updates its local model parameters based on the second updated parameter for use in the next round of iteration.
Of course, in practical applications, if the critical stage is identified, the server may instead instruct the participants to compress their respective local parameter vectors at a low compression ratio, which is not limited in this specification.
The second updated parameter can be obtained with reference to Formula 2 above, which is not repeated here.
It should also be noted that, in step 204, if the server judges that t is not equal to a preset target round, i.e., that the current iteration round is not a preset target round, steps 206-210 may be replaced by: the server obtaining, based on the n local parameter vectors sent by the n participants, another updated parameter of the business model and delivering it to each participant i, so that each participant i updates its local model parameters based on that updated parameter for use in the next round of iteration.
Finally, after multiple rounds of iteration, each participant i takes the local model parameters it has obtained as the business model it has updated collaboratively with the other participants.
Taking any participant as an example, if the business objects corresponding to the samples in its local sample set are pictures, the business model it updates collaboratively with the other participants can be a picture recognition model; if the business objects are texts, the collaboratively updated business model can be a text recognition model; if the business objects are commodities and users, the collaboratively updated business model can be a commodity recommendation model, and so on.
In summary, the method for jointly updating a business model based on parameter compression provided by the embodiments of this specification can identify the critical stage of the model-update process. When the critical stage is identified, compression of the parameters to be transmitted by the participants is reduced as much as possible to avoid affecting the convergence of the model parameters; when a non-critical stage is identified, compression of the parameters to be transmitted by the participants is increased as much as possible to reduce communication traffic. Moreover, both the identification and the parameter compression are performed automatically, which raises the degree of automation of the decision-making. In addition, based on the result of the dimension convergence detection, the server sends k pieces of indication information to each participant, which enables automatic control of the participants' parameter compression.
Furthermore, this solution automatically identifies the critical stage of model updating and thus takes appropriate compression measures, avoiding the loss of model accuracy caused by blind compression, or even the failure of the model parameters to converge due to blind compression. Through the automatic identification method of this solution, the participants can compress communication when necessary, reducing communication traffic, speeding up model updating, and improving the scalability of model updating, so that larger clusters and larger model sizes can be supported.
In short, this solution is a method that adaptively and automatically switches between compressed and uncompressed parameters; by choosing low compression in the critical stage and high compression elsewhere, communication traffic can be greatly reduced while the accuracy and effectiveness of model-parameter convergence are guaranteed.
This solution is illustrated below by taking a commodity recommendation model as the business model.
Fig. 3 shows an interaction diagram of a method for jointly updating a commodity recommendation model based on parameter compression according to an embodiment. It should be noted that the method involves multiple rounds of iteration; Fig. 3 shows the interaction steps included in the t-th round (t is a positive integer). Because the interaction between each participant of the t-th round and the server is similar, Fig. 3 mainly shows the interaction steps between any one participant of that round (referred to as the first participant for ease of description) and the server; for the interaction steps between the other participants of that round and the server, reference can be made to the interaction steps between the first participant and the server. As shown in Fig. 3, the method may include the following steps:
Step 302: each participant i determines, according to its local sample set and the local model parameters of the commodity recommendation model, a local parameter vector having k dimensions, and provides it to the server.
The business objects corresponding to the samples in the local sample set include users and commodities, and the features of the samples include user attributes (e.g., occupation, hobbies, education), operation behaviors (e.g., browsing, clicking, closing) and commodity attributes (e.g., commodity category, commodity price, commodity details).
Step 304: when the server judges that t is equal to a preset target round, it performs the judgment of the first condition and the dimension convergence detection in parallel.
The first condition includes: the ratio of the learning rate of round t+1 to the learning rate of round t being smaller than the learning-rate threshold, or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 being not smaller than the distance threshold. The aggregated parameter vector of round t is obtained by aggregating the n local parameter vectors sent by the n participants.
The dimension convergence detection includes: computing the mean and the variance of the n element values that the n local parameter vectors have in each dimension j, and determining, according to the mean and the variance computed for each dimension j, a signal-to-noise ratio corresponding to that dimension, the signal-to-noise ratio being used to indicate the convergence of the corresponding dimension; and comparing the signal-to-noise ratio corresponding to each dimension j with the ratio threshold to obtain corresponding indication information, the indication information being used to indicate whether dimension j is to be compressed in the current round and in several subsequent rounds of iteration.
Step 306: when the first condition is not satisfied, the server sends to each participant i the k pieces of indication information corresponding to the k dimensions.
Step 308: each participant i compresses or does not compress, according to the k pieces of indication information, each dimension j of its local parameter vector to obtain a target parameter vector, and provides it to the server.
Step 310: the server obtains, based on the n target parameter vectors sent by the n participants, the first updated parameter of the commodity recommendation model, and delivers it to each participant i, so that each participant i updates its local model parameters based on the first updated parameter for use in the next round of iteration.
After multiple rounds of iteration, each participant i takes the local model parameters it has obtained as the commodity recommendation model it has updated collaboratively with the other participants.
In summary, the method for jointly updating a commodity recommendation model based on parameter compression provided by the embodiments of this specification can update the commodity recommendation model while saving communication resources.
Corresponding to the above method for jointly updating business models based on parameter compression, an embodiment of this specification further provides a system for jointly updating business models based on parameter compression. As shown in FIG. 4, the system may include a server 402 and n participants 404.
Each participant 404 is configured to determine a local parameter vector with k dimensions based on its local sample set and the local model parameters of the business model, and to provide it to the server 402.
Here, the local parameter vector includes a gradient vector or a model parameter vector.
The server 402 is configured to perform, when t equals the preset target round, the judgment of the first condition and the dimension convergence detection in parallel.
The first condition includes: the ratio of the learning rate of round t+1 to the learning rate of round t is less than a learning-rate threshold; or the target distance between the aggregated parameter vector of round t and the aggregated parameter vector of round t-1 is not less than a distance threshold. The aggregated parameter vector of round t is obtained by aggregating the n local parameter vectors sent by the n participants 404.
The above target distance is obtained by computing the L2-norm (second-order norm) distance between the aggregated parameter vector of round t and that of round t-1, computing the L2 norm of the aggregated parameter vector of round t-1, and then taking the ratio of the distance to that norm.
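Restated as a formula, with $\bar{g}_t$ denoting the aggregated parameter vector of round $t$ (the symbol is chosen here only for illustration), the target distance is

$$d_t = \frac{\lVert \bar{g}_t - \bar{g}_{t-1} \rVert_2}{\lVert \bar{g}_{t-1} \rVert_2},$$

and the first condition compares $d_t$ with the distance threshold.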
The above dimension convergence detection includes: computing, for each dimension j, the mean and variance of the n element values that the n local parameter vectors hold at that dimension; determining, from the mean and variance computed for each dimension j, a signal-to-noise ratio for that dimension, which indicates how well the dimension has converged; and comparing the signal-to-noise ratio of each dimension j with a proportion threshold to obtain corresponding indication information, which indicates whether dimension j is to be compressed in the current round and several subsequent rounds of iteration.
Determining the signal-to-noise ratio for each dimension j includes:
computing the ratio of the mean to the variance for each dimension j, thereby obtaining the signal-to-noise ratio for that dimension.
In addition, each piece of indication information is a compression indication when the signal-to-noise ratio of the corresponding dimension is less than the proportion threshold, and a non-compression indication when the signal-to-noise ratio of the corresponding dimension is not less than the proportion threshold.
The server 402 is further configured to send, to each participant 404, k pieces of indication information corresponding to the k dimensions when the first condition is not satisfied.
Each participant 404 is further configured to compress, or leave uncompressed, each dimension j of its local parameter vector according to the k pieces of indication information, obtain a target parameter vector, and provide it to the server 402.
In one example, each participant 404 is specifically configured to: for each dimension j of the k dimensions, determine whether the indication information corresponding to dimension j is a compression indication or a non-compression indication; if it is a compression indication, quantize the element value of the local parameter vector at dimension j to obtain a quantized value as the processing result, where the quantized value contains fewer bits than the element value; if it is a non-compression indication, retain the element value of the local parameter vector at dimension j as the processing result; and form the target parameter vector based on the processing results of the k dimensions.
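A hedged sketch of this quantization variant follows; the uniform 8-bit quantizer, its value range, and the choice to keep the dequantized value as the processing result are illustrative assumptions, since the example above only requires that the quantized value use fewer bits than the original element value.

```python
import numpy as np

def build_target_vector_quantized(local_vector, compress_indication, num_bits=8):
    """local_vector: length-k float array; compress_indication: length-k bool array."""
    v_min, v_max = float(local_vector.min()), float(local_vector.max())
    scale = (v_max - v_min) / (2 ** num_bits - 1)
    if scale == 0.0:
        scale = 1.0  # all elements equal; avoid division by zero
    target = np.empty_like(local_vector)
    for j, value in enumerate(local_vector):
        if compress_indication[j]:
            level = round((value - v_min) / scale)   # low-bit quantization level (what would be transmitted)
            target[j] = v_min + level * scale        # dequantized value kept as the processing result
        else:
            target[j] = value                        # non-compression indication: element value retained
    return target
```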
In another example, each participant 404 is specifically configured to: for each dimension j of the k dimensions, determine whether the indication information corresponding to dimension j is a compression indication or a non-compression indication; if it is a compression indication, determine whether the element value of the local parameter vector at dimension j is less than an element-value threshold, and take 0 as the processing result if it is, or the element value otherwise; if it is a non-compression indication, retain the element value of the local parameter vector at dimension j as the processing result; and form the target parameter vector based on the processing results of the k dimensions.
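Likewise, a minimal sketch of this thresholding variant; the element-value threshold `eps` is a placeholder, and comparing the magnitude of the element value is one possible reading of the comparison described above.

```python
import numpy as np

def build_target_vector_thresholded(local_vector, compress_indication, eps=1e-3):
    """Set small element values to 0 on dimensions that carry a compression indication."""
    target = local_vector.astype(float)
    for j, value in enumerate(local_vector):
        if compress_indication[j] and abs(value) < eps:
            target[j] = 0.0   # processing result is 0; otherwise the element value is kept
    return target
```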
The server 402 is further configured to obtain a first updated parameter of the business model based on the n target parameter vectors sent by the n participants 404, and to send it to each participant 404, so that each participant 404 updates its local model parameters based on the first updated parameter for use in the next round of iteration.
Specifically, the server 402 is configured to:
aggregate the n target parameter vectors to obtain an updated aggregated parameter vector for round t; and subtract, from the global model parameters of the business model, the product of the updated aggregated parameter vector for round t and the learning rate of round t, to obtain the first updated parameter of the business model.
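A sketch of this server-side update, assuming the aggregation of the n target parameter vectors is a simple element-wise average (the concrete aggregation rule is not fixed by this sketch):

```python
import numpy as np

def server_update(global_params, target_vectors, lr_t):
    """target_vectors: array of shape (n, k). Returns the first updated parameter."""
    agg_t = target_vectors.mean(axis=0)   # updated round-t aggregated parameter vector
    return global_params - lr_t * agg_t   # global parameters minus learning-rate-scaled aggregate
```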
Optionally, the server 402 is further configured to, when the first condition is satisfied, obtain a second updated parameter of the business model based on the n local parameter vectors and send it to each participant 404, so that each participant 404 updates its local model parameters based on the second updated parameter for use in the next round of iteration.
The functions of the functional modules of the apparatus in the above embodiments of this specification can be realized through the steps of the above method embodiments; therefore, the specific working process of the apparatus provided by the embodiments of this specification is not repeated here.
The system for jointly updating business models based on parameter compression provided by an embodiment of this specification can update the business model while saving communication resources.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the relevant description of the method embodiment.
The steps of the methods or algorithms described in connection with the disclosure of this specification may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may in turn reside in a server. Of course, the processor and the storage medium may also exist in the server as discrete components.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
The specific embodiments described above further explain in detail the purpose, technical solutions, and beneficial effects of this specification. It should be understood that the above are merely specific embodiments of this specification and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of this specification shall fall within the scope of protection of this specification.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211461638.6A CN115617827B (en) | 2022-11-18 | 2022-11-18 | Service model joint updating method and system based on parameter compression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211461638.6A CN115617827B (en) | 2022-11-18 | 2022-11-18 | Service model joint updating method and system based on parameter compression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115617827A true CN115617827A (en) | 2023-01-17 |
CN115617827B CN115617827B (en) | 2023-04-07 |
Family
ID=84878871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211461638.6A Active CN115617827B (en) | 2022-11-18 | 2022-11-18 | Service model joint updating method and system based on parameter compression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115617827B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190213470A1 (en) * | 2018-01-09 | 2019-07-11 | NEC Laboratories Europe GmbH | Zero injection for distributed deep learning |
CN111931950A (en) * | 2020-09-28 | 2020-11-13 | 支付宝(杭州)信息技术有限公司 | Method and system for updating model parameters based on federal learning |
WO2021232832A1 (en) * | 2020-05-19 | 2021-11-25 | 华为技术有限公司 | Data processing method, training method for federated learning and related apparatus, and device |
CN113902473A (en) * | 2021-09-29 | 2022-01-07 | 支付宝(杭州)信息技术有限公司 | Training method and device of business prediction system |
WO2022106863A1 (en) * | 2020-11-23 | 2022-05-27 | Framatome | Method and system for accelerating the convergence of an iterative computation code of physical parameters of a multi-parameter system |
CN115081642A (en) * | 2022-07-19 | 2022-09-20 | 浙江大学 | A method and system for multi-party collaborative updating of business forecast model |
- 2022-11-18 CN CN202211461638.6A patent/CN115617827B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190213470A1 (en) * | 2018-01-09 | 2019-07-11 | NEC Laboratories Europe GmbH | Zero injection for distributed deep learning |
WO2021232832A1 (en) * | 2020-05-19 | 2021-11-25 | 华为技术有限公司 | Data processing method, training method for federated learning and related apparatus, and device |
CN111931950A (en) * | 2020-09-28 | 2020-11-13 | 支付宝(杭州)信息技术有限公司 | Method and system for updating model parameters based on federal learning |
WO2022106863A1 (en) * | 2020-11-23 | 2022-05-27 | Framatome | Method and system for accelerating the convergence of an iterative computation code of physical parameters of a multi-parameter system |
CN113902473A (en) * | 2021-09-29 | 2022-01-07 | 支付宝(杭州)信息技术有限公司 | Training method and device of business prediction system |
CN115081642A (en) * | 2022-07-19 | 2022-09-20 | 浙江大学 | A method and system for multi-party collaborative updating of business forecast model |
Non-Patent Citations (2)
Title |
---|
SHAH SUHAIL MOHMAD et al.: "Model Compression for Communication Efficient Federated Learning" *
TIAN Guanzhong (田冠中): "Research on Key Technologies of Model Compression for Deep Convolutional Neural Networks" *
Also Published As
Publication number | Publication date |
---|---|
CN115617827B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112181666A (en) | Method, system, equipment and readable storage medium for equipment evaluation and federal learning importance aggregation based on edge intelligence | |
CN111339433B (en) | Information recommendation method and device based on artificial intelligence and electronic equipment | |
CN110119474B (en) | Recommendation model training method, prediction method and device based on recommendation model | |
CN110418356A (en) | A computing task offloading method, device, and computer-readable storage medium | |
WO2019180433A1 (en) | Predicting using digital twins | |
WO2022227217A1 (en) | Text classification model training method and apparatus, and device and readable storage medium | |
CN114064394A (en) | Edge computing-based security monitoring method, device and terminal equipment | |
CN116089607A (en) | Method, device, electronic equipment and storage medium for classifying intelligent response text | |
Yu et al. | Heterogeneous federated learning using dynamic model pruning and adaptive gradient | |
CN113487091A (en) | Network resource dynamic optimization method and device | |
CN115617827B (en) | Service model joint updating method and system based on parameter compression | |
WO2025031515A1 (en) | Multi-user multi-task computation offloading method and apparatus with throughput prediction, and medium | |
CN113344091A (en) | Method for determining optimal feature subset of multi-label stream features based on label correlation | |
CN115081642B (en) | Method and system for updating service prediction model in multi-party cooperation manner | |
CN114186831B (en) | Personal credit risk prediction method and system by applying transfer learning | |
CN114764603B (en) | Method and device for determining features for user classification model and business prediction model | |
CN117114077A (en) | Graph self-supervision learning method based on global graph bottleneck representation | |
CN117177299A (en) | Electric power internet of things-oriented edge computing task unloading method and system | |
CN114662607B (en) | Data labeling method, device, equipment and storage medium based on artificial intelligence | |
CN115102868A (en) | A QoS prediction method for web services based on SOM clustering and deep autoencoder | |
Lee et al. | Neural architecture search for computation offloading of dnns from mobile devices to the edge server | |
CN112990255A (en) | Method and device for predicting equipment failure, electronic equipment and storage medium | |
CN112543481A (en) | Method, device and system for balancing calculation force load of edge node | |
Deo et al. | Combining Retrospective Approximation with Importance Sampling for Optimising Conditional Value at Risk | |
CN114077681B (en) | Image data processing method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |