CN112990475A - Model training method and system based on multi-party secure computation - Google Patents

Model training method and system based on multi-party secure computation

Info

Publication number
CN112990475A
CN112990475A (application CN202110159936.9A)
Authority
CN
China
Prior art keywords
matrix
segment
product
model parameter
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110159936.9A
Other languages
Chinese (zh)
Other versions
CN112990475B (en)
Inventor
Zhou Yashun (周亚顺)
Zhao Yuan (赵原)
Yin Dong (尹栋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110159936.9A priority Critical patent/CN112990475B/en
Publication of CN112990475A publication Critical patent/CN112990475A/en
Application granted granted Critical
Publication of CN112990475B publication Critical patent/CN112990475B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/70: Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F 21/71: Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer, to assure secure computing or processing of information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of this specification disclose a model training method and system based on multi-party secure computation. The method is applied to a first computing party and comprises: performing a cooperative operation with the other computing party, based on the first share of the model parameter matrix, to obtain a first share of a first product matrix; performing a cooperative operation with the other computing party, based on the first share of the first product matrix, to obtain a first share of an activation matrix; performing a cooperative operation with the other computing party, based on the first share of the activation matrix and the label matrix, to obtain a first share of the gradient matrix of the current round; determining a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and determining a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.

Description

Model training method and system based on multi-party secure computation
Technical Field
The present disclosure relates to the field of information technology, and in particular to a model training method and system based on multi-party secure computation.
Background
In the big data era, data islands are ubiquitous. Data is often scattered across different enterprises, and enterprises do not fully trust one another because of competitive relationships and privacy concerns.
In some cases, enterprises need to perform collaborative secure modeling, so that a model can be trained jointly on the data of all parties while the privacy and security of each enterprise's data are fully protected. The data used for model training is dispersed among the participants of the collaborative modeling, and the data privacy and security of every participant must be protected throughout the training process.
Therefore, it is necessary to provide a model training method based on multi-party secure computation that trains a model while protecting the data privacy of each participant.
Disclosure of Invention
One aspect of the embodiments of this specification provides a model training method based on multi-party secure computation. The method is applied to a first computing party, which holds a label matrix and a first share of a model parameter matrix. The method comprises one or more rounds of iterative updating of the model parameters, where one round of iterative updating comprises: performing a cooperative operation with the other computing party, based on the first share of the model parameter matrix, to obtain a first share of a first product matrix, where the other computing party holds the feature matrix and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; performing a cooperative operation with the other computing party, based on the first share of the first product matrix, to obtain a first share of an activation matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; performing a cooperative operation with the other computing party, based on the first share of the activation matrix and the label matrix, to obtain a first share of the gradient matrix of the current round, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; determining a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and determining a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
Another aspect of the embodiments of this specification provides a model training system based on multi-party secure computation. The system is applied to a first computing party, which holds a label matrix and a first share of a model parameter matrix, and is used to perform one or more rounds of iterative updating of the model parameters. The system comprises: a first obtaining module, configured to obtain a first share of a first product matrix by performing a cooperative operation with the other computing party based on the first share of the model parameter matrix, where the other computing party holds the feature matrix and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; a second obtaining module, configured to obtain a first share of an activation matrix by performing a cooperative operation with the other computing party based on the first share of the first product matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; a third obtaining module, configured to obtain a first share of the gradient matrix of the current round by performing a cooperative operation with the other computing party based on the first share of the activation matrix and the label matrix, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; a first determination module, configured to determine a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and a second determination module, configured to determine a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
Another aspect of the embodiments of this specification provides a model training method based on multi-party secure computation. The method is applied to a second computing party, which holds a feature matrix and a second share of a model parameter matrix. The method comprises one or more rounds of iterative updating of the model parameters, where one round of iterative updating comprises: performing a cooperative operation with the other computing party, based on the feature matrix and the second share of the model parameter matrix, to obtain a second share of the first product matrix, where the other computing party holds the label matrix and a first share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; performing a cooperative operation with the other computing party, based on the second share of the first product matrix, to obtain a second share of the activation matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; performing a cooperative operation with the other computing party, based on the second share of the activation matrix and the feature matrix, to obtain a second share of the gradient matrix of the current round, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; determining a second share of the momentum gradient matrix of the current round based on the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round; and determining a second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round.
Another aspect of the embodiments of this specification provides a model training system based on multi-party secure computation. The system is applied to a second computing party, which holds a feature matrix and a second share of a model parameter matrix, and is used to perform one or more rounds of iterative updating of the model parameters. The system comprises: a fourth obtaining module, configured to obtain a second share of the first product matrix by performing a cooperative operation with the other computing party based on the feature matrix and the second share of the model parameter matrix, where the other computing party holds the label matrix and a first share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; a fifth obtaining module, configured to obtain a second share of the activation matrix by performing a cooperative operation with the other computing party based on the second share of the first product matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; a sixth obtaining module, configured to obtain a second share of the gradient matrix of the current round by performing a cooperative operation with the other computing party based on the second share of the activation matrix and the feature matrix, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; a third determination module, configured to determine a second share of the momentum gradient matrix of the current round based on the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round; and a fourth determination module, configured to determine a second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round.
Another aspect of the embodiments of this specification provides a model training method based on multi-party secure computation. The method is applied to either participant, which holds a first share of a feature matrix, a first share of a label matrix, and a first share of a model parameter matrix. The method comprises one or more rounds of iterative updating of the model parameters, where one round of iterative updating comprises: performing a cooperative operation with the other computing party, based on the first share of the model parameter matrix and the first share of the feature matrix, to obtain a first share of a first product matrix, where the other computing party holds a second share of the feature matrix, a second share of the label matrix, and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; performing a cooperative operation with the other computing party, based on the first share of the first product matrix, to obtain a first share of an activation matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; performing a cooperative operation with the other computing party, based on the first share of the activation matrix, the first share of the feature matrix, and the first share of the label matrix, to obtain a first share of the gradient matrix, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; determining a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and determining a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
Another aspect of the embodiments of this specification provides a model training system based on multi-party secure computation. The system is applied to either participant, which holds a first share of a feature matrix, a first share of a label matrix, and a first share of a model parameter matrix, and is used to perform one or more rounds of iterative updating of the model parameters. The system comprises: a seventh obtaining module, configured to obtain a first share of the first product matrix by performing a cooperative operation with the other computing party based on the first share of the model parameter matrix and the first share of the feature matrix, where the other computing party holds a second share of the feature matrix, a second share of the label matrix, and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix; an eighth obtaining module, configured to obtain a first share of an activation matrix by performing a cooperative operation with the other computing party based on the first share of the first product matrix, where each element of the activation matrix is the activation-function value of the corresponding element of the first product matrix; a ninth obtaining module, configured to obtain a first share of the gradient matrix by performing a cooperative operation with the other computing party based on the first share of the activation matrix, the first share of the feature matrix, and the first share of the label matrix, where the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix; a fifth determination module, configured to determine a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and a sixth determination module, configured to determine a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
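The final two steps of each round described in the aspects above require no interaction: once a party holds its share of the current gradient, it can update its shares of the momentum gradient and model parameters locally. The following is an illustrative plain-Python sketch, not the patent's implementation, assuming additive sharing over real numbers; the function and variable names are hypothetical, and beta and lr stand for the public momentum coefficient and learning rate:

```python
# Sketch: each computing party applies the momentum update locally on its own
# additive shares. Because the update is linear in the shared quantities,
# updating the shares independently is equivalent to updating the underlying
# secret values, so no cooperative operation is needed for these two steps.

def update_share(w_share, m_prev_share, g_share, beta=0.9, lr=0.1):
    """One round of the momentum update, applied to one party's shares."""
    m_share = beta * m_prev_share + (1.0 - beta) * g_share  # momentum gradient share
    w_new_share = w_share - lr * m_share                    # updated parameter share
    return w_new_share, m_share

# Linearity check: updating two shares separately reconstructs the same result
# as updating the plaintext values directly.
w0, w1 = 0.3, 0.4    # shares of w = 0.7
m0, m1 = 0.1, -0.1   # shares of m_prev = 0.0
g0, g1 = 0.25, 0.25  # shares of gradient g = 0.5

nw0, nm0 = update_share(w0, m0, g0)
nw1, nm1 = update_share(w1, m1, g1)
nw, nm = update_share(w0 + w1, m0 + m1, g0 + g1)
assert abs((nw0 + nw1) - nw) < 1e-12
assert abs((nm0 + nm1) - nm) < 1e-12
```

This linearity is why only the product and activation steps in the aspects above involve cooperative operations with the other party.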
Another aspect of the embodiments of this specification provides a model training apparatus based on multi-party secure computation, comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the model training method based on multi-party secure computation.
Another aspect of the embodiments of this specification provides a computer-readable storage medium storing computer instructions; when the computer instructions in the storage medium are read by a computer, the computer executes the model training method based on multi-party secure computation.
Drawings
The present description will be further explained by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, like numerals indicate like structures, wherein:
FIG. 1 is a schematic diagram of an exemplary application scenario of a multi-party secure computing based model training system in accordance with some embodiments of the present description;
FIG. 2 is an exemplary diagram of a multi-party secure multiplication protocol, shown in accordance with some embodiments of the present description;
FIG. 3 is an exemplary interaction flow diagram of a model training method based on multi-party secure computation according to some embodiments of the present description;
FIG. 4 is an exemplary interaction flow diagram of a multi-party security computation based model training method according to some embodiments of the present description;
FIG. 5 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description;
FIG. 6 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description;
FIG. 7 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of this specification, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description are only examples or embodiments of this specification; on the basis of these drawings, a person skilled in the art can apply this specification to other similar scenarios without inventive effort. Unless otherwise apparent from the context or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is one way of distinguishing different components, elements, parts, portions, or assemblies at different levels. These words may be replaced by other expressions that accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this specification to illustrate the operations performed by a system according to embodiments of this specification. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Instead, the steps may be processed in reverse order or simultaneously. Moreover, other operations may be added to these processes, or one or more steps may be removed from them.
Traditional machine learning needs to gather all training sample data in one place and train the model centrally. Privacy-preserving machine learning does not require centralizing the training data, and no data owner reveals its private data during training. When a privacy-preserving machine learning model is trained cooperatively, how to guarantee both that the private data of each data owner is not revealed and that the model trains quickly is a problem worth studying. For example, a logistic regression model may be trained with stochastic gradient descent, but because stochastic gradient descent converges only at first order, its convergence rate is slow and training takes a long time.
Therefore, some embodiments of this specification provide a model training method and system based on multi-party secure computation that use an exponentially weighted average of the model gradients, i.e. historical gradient information, to improve the convergence rate of the model, thereby speeding up training while protecting the data privacy and security of the participants. In an exponentially weighted average, the weight assigned to each element decays exponentially. The exponentially weighted average is also called an exponentially weighted moving average, a sliding average, or a moving average.
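The exponentially weighted average of gradients mentioned above can be sketched as follows. This is an illustrative example only; the decay coefficient beta and the function name are assumptions, not values from the patent:

```python
# Illustrative sketch of an exponentially weighted moving average of gradients.
# Each new average is m_t = beta * m_{t-1} + (1 - beta) * g_t, so the weight of
# a gradient from k rounds ago decays like beta**k * (1 - beta).

def ewma(gradients, beta=0.9):
    """Return the running exponentially weighted average after each element."""
    m = 0.0
    history = []
    for g in gradients:
        m = beta * m + (1.0 - beta) * g
        history.append(m)
    return history

# For a constant gradient sequence the average approaches the true value
# (after t steps it equals 1 - beta**t of it); for a noisy sequence it is
# smoothed toward the trend, which is what accelerates convergence.
print(ewma([1.0, 1.0, 1.0, 1.0]))
```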
It should be noted that, for convenience of description, the embodiments of this specification take a logistic regression model as an example. In some other embodiments, other types of models, such as a linear regression model, may also be used; this is not limited here. The technical solutions disclosed in the embodiments of this specification are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an exemplary application scenario of a multi-party security computation based model training system according to some embodiments of the present description.
As shown in FIG. 1, in an application scenario, a multi-party security computation based model training system 100 may include an A-party device 110, a B-party device 120, a third-party server 130, and a network 140.
In some embodiments, party A may hold the labels in the training data and party B may hold the feature data. In some embodiments, party A and party B may each hold a portion of the feature data and of the labels, e.g., each may hold one share of the feature data and one share of the labels. The data held by party A is party A's private data, and the data held by party B is party B's private data. During cooperative training of the model, neither party wants to expose its private data to the other. To protect the data privacy of both parties, the inputs (such as feature data and labels) and outputs (such as gradients and model parameters) of the computation steps involved in training can be stored on the two parties' devices in the form of shares, with party A and party B each holding one share. For example, party A may hold one share of the feature data and party B the other, and party B may hold one share of the labels and party A the other. Gradients and model parameters may be split based on secret sharing, with party A and party B each holding one share. The idea of secret sharing is that a secret is split in a suitable way and each share is kept by a different participant; a single participant cannot recover the secret, and the secret can be recovered only when several participants cooperate. In some embodiments, the full feature data may also be held by party A and the full label data by party B.
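The splitting idea described above can be illustrated with a minimal additive-sharing sketch. The modulus P and the function names here are illustrative assumptions for this sketch, not part of the patent:

```python
import secrets

# Illustrative additive secret sharing over the integers modulo a public
# prime P (an assumed, illustrative choice of modulus).
P = 2**61 - 1

def share(secret):
    """Split a secret into two additive shares; each share alone is
    uniformly random and reveals nothing about the secret."""
    s0 = secrets.randbelow(P)
    s1 = (secret - s0) % P
    return s0, s1

def reconstruct(s0, s1):
    """Only the two participants together can recover the secret."""
    return (s0 + s1) % P

s0, s1 = share(42)
assert reconstruct(s0, s1) == 42
```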
Devices 110/120 may include various types of computing devices with information transceiving capabilities, such as smart phones, laptop computers, desktop computers, servers, and the like.
In some embodiments, the servers may be independent servers or groups of servers, which may be centralized or distributed. In some embodiments, the server may be regional or remote. In some embodiments, the server may execute on a cloud platform. For example, the cloud platform may include one or any combination of a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like.
The third party server 130 may assist the party A device 110 and the party B device 120 in performing multi-party secure computations. For example, the third party server 130 may provide random numbers to both parties when party A performs a multi-party secure multiplication with party B. For more about multi-party secure multiplication, see the description of FIG. 2.
Network 140 connects the various components of the system so that communication can occur between the various components. The network between the various parts in the system may include wired networks and/or wireless networks. For example, network 140 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a ZigBee network (ZigBee), Near Field Communication (NFC), an intra-device bus, an intra-device line, a cable connection, and the like, or any combination thereof. The network connection between each two parts may be in one of the above-mentioned ways, or in a plurality of ways.
Most of the multi-party cooperative operations involved in model training under multi-party secure computation are based on a multi-party secure multiplication protocol. The protocol can be described as follows: when one multiplier is party A's private data and the other multiplier is party B's private data, neither party's device can directly compute the product of the two multipliers; instead, each party interacts with the other party's computing device according to the multi-party secure multiplication protocol and computes a share of the product based on its own private data. That is, party A and party B each obtain one share of the product. The third party server 130 may assist the party A device 110 and the party B device 120 in running the multi-party secure multiplication protocol.
FIG. 2 is an exemplary diagram of a multi-party secure multiplication protocol, shown in accordance with some embodiments of the present description. As shown in FIG. 2, the first computing party and the second computing party are the parties providing private data; for example, the first computing party owns private data a and the second computing party owns private data b.
In the multi-party secure multiplication protocol, the first and second computing parties may obtain random numbers in order to hide their own data from each other. In some embodiments, a random number may be a number, a vector, or a matrix. The random numbers may be generated by a third party (e.g., the third party server 130).
Referring to FIG. 2, the obtained random numbers may include u and v, where u and v may each be a matrix, a vector, or a number. The random numbers are distributed to the first and second computing parties according to a certain rule. Specifically, in some embodiments, a third party may generate two random numbers u and v, compute the product u·v, and split the product into two additive shares z0 and z1, i.e. z0 + z1 = u·v. The first computing party obtains u and z0, and the second computing party obtains v and z1. Suppose the product of the private data a owned by the first computing party and the private data b owned by the second computing party is to be computed; since a and b must not be leaked to the other party, the computation of a·b can be realized with the obtained random numbers.
In the computation, the first computing party encrypts its private data a with the random number u, i.e. it sends the value e = a − u to the second computing party; similarly, the second computing party encrypts its private data b with the random number v, i.e. it sends the value f = b − v to the first computing party. f and e can be regarded as encrypted versions of b and a, respectively. The second computing party only receives e from the first computing party and does not know a or u; likewise, the first computing party only receives f from the second computing party and does not know b or v. Thus the random numbers u and z0 obtained by the first computing party, together with its data a, remain secret from the second computing party; the random numbers v and z1 obtained by the second computing party, together with its data b, remain secret from the first computing party; and each party's data stays private.
The first computing party may then use its random numbers u and z0 and the value f received from the second computing party to compute c0 = u·f + z0. Similarly, the second computing party may use its random numbers v and z1 and the value e received from the first computing party to compute c1 = e·b + z1. Then c0 + c1 equals the desired product a·b, where c0 is the first share of a·b owned by the first computing party and c1 is the second share of a·b owned by the second computing party.
It can be verified that c0 + c1 = u·f + e·b + u·v = (u·b − u·v) + (a·b − u·b) + u·v = a·b, i.e., c0 + c1 = a·b. The above is an exemplary calculation process of the multi-party secure multiplication protocol. Based on this protocol, products involving the private data of other parties can be calculated without exposing the private data of any party. On this basis, secure computations completed cooperatively by multiple parties, such as matrix multiplication and polynomial evaluation, can be decomposed into this most basic multi-party secure multiplication problem, thereby completing multi-party secure matrix multiplication, polynomial evaluation, and the like.
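The protocol above can be sketched in a few lines of Python. This is an illustrative simulation only, with all three roles running in one process; the modulus P and the function name are assumptions, not part of this specification.

```python
import random

P = 2**61 - 1  # illustrative modulus for the shared arithmetic

def beaver_multiply(a, b):
    """Simulate the two-party secure multiplication sketched above.
    a is private to the first party, b to the second; a third party
    deals the correlated randomness u, v, z0, z1 with z0 + z1 = u*v."""
    # Third party: random u, v and additive shares of their product
    u, v = random.randrange(P), random.randrange(P)
    z0 = random.randrange(P)
    z1 = (u * v - z0) % P
    # Exchanged masked values: e = a - u goes to the second party,
    # f = b - v goes to the first party
    e = (a - u) % P
    f = (b - v) % P
    # Local computations: c0 = u*f + z0 (first party), c1 = e*b + z1 (second)
    c0 = (u * f + z0) % P
    c1 = (e * b + z1) % P
    return c0, c1  # additive shares of a*b

c0, c1 = beaver_multiply(37, 59)
assert (c0 + c1) % P == 37 * 59  # c0 + c1 reconstructs a*b
```

Neither share alone reveals a, b, or a·b; only their sum does, matching the derivation c0 + c1 = u·f + e·b + u·v = a·b.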
FIG. 3 is an exemplary interaction flow diagram of a multi-party security computation based model training method involving data interaction between multiple parties, according to some embodiments of the present description. Steps 302A-310A may be performed by a processing device of a first computing party, such as the device 120 of party B, and steps 302B-310B may be performed by a processing device of a second computing party, such as the device 110 of party a.
As shown in FIG. 3, the first computing party may hold the label matrix and the first share of the model parameter matrix, and the second computing party may hold the feature matrix and the second share of the model parameter matrix. The model parameter matrix may be split based on secret sharing before training begins and distributed to the first and second computing parties. For example, an initialized model parameter matrix may be generated by a third-party server and split into two matrix shares for distribution to the first and second computing parties. In some embodiments, the first and second shares of the model parameter matrix may also be initialized randomly and locally by the first and second computing parties. It should be noted that when the first computing party performs the operation, the second computing party is the other computing party relative to the first computing party; similarly, when the second computing party performs the operation, the first computing party is the other computing party relative to the second computing party. The label matrix of the first computing party and the feature matrix of the second computing party have a correspondence, meaning that the rows of the feature matrix of the second computing party and the rows of the label matrix of the first computing party are aligned one to one. For example, the second computing party holds the feature matrix, in which one row corresponds to the feature parameters of one training sample. In some embodiments, a training sample may include a plurality of feature parameters, so the feature parameters of a training sample may be represented as a (row) vector; specifically, the rows of the feature matrix, i.e., the feature parameters of the individual training samples, may be denoted X1, X2, …, Xn. The first computing party holds the label matrix, in which one row corresponds to the label value of one training sample.
In some embodiments, a training sample may contain one or more label values, so the label values of a training sample may be represented as a (row) vector; specifically, Y1, Y2, …, Yn may denote the rows of the label matrix, i.e., the label values of the individual training samples. It can be understood that when a training sample has only one label value, Y1, Y2, …, Yn each represent a single value, and the label matrix degenerates into a label vector. The correspondence between the aligned feature matrix and label matrix can be represented as (X1, Y1), (X2, Y2), …, (Xn, Yn), i.e., Y1 is the label of X1 and Y2 is the label of X2. In some embodiments, the alignment may be achieved by adding an identifier or ID: the same ID is added to the data belonging to the same training sample in the feature matrix and the label matrix, for example ID1 (X1, Y1), ID2 (X2, Y2), and during calculation the rows of the feature matrix and label matrix are arranged in the same ID order, ensuring that the feature data is aligned with the label data. In other words, row 1 of the feature matrix and row 1 of the label matrix correspond to the same training sample. Before the operation, the computing parties can reach consensus on the sample identifiers or IDs, ensure that the feature data and label data of the same training sample carry the same identifier/ID, and agree on the arrangement order based on the identifier/ID.
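The ID-based alignment described above can be sketched as follows; the helper name and sample data are hypothetical, not from this specification. Each party sorts its local rows by the agreed sample ID before computation:

```python
def align_by_id(rows):
    """rows: list of (sample_id, payload) pairs held by one party;
    returns the payloads sorted by the agreed sample ID."""
    return [payload for _, payload in sorted(rows, key=lambda r: r[0])]

# Second party holds feature rows, first party holds label rows,
# both keyed by the same sample IDs.
features = [("ID2", [0.5, 1.0]), ("ID1", [0.1, 0.2])]
labels = [("ID1", [1]), ("ID2", [0])]
X = align_by_id(features)
Y = align_by_id(labels)
# Row i of X and row i of Y now refer to the same training sample.
assert X == [[0.1, 0.2], [0.5, 1.0]]
assert Y == [[1], [0]]
```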
As shown in fig. 3, the interaction flow 300 may include the following operations.
Step 302A, based on the first partition of the model parameter matrix, performs cooperative operation with other computation parties to obtain a first partition of a first product matrix. In some embodiments, step 302A may be performed by the first obtaining module 510.
The model parameter matrix refers to model parameters expressed in the form of a matrix. For example, the parameter W of the logistic regression model.
In some embodiments, the model parameter matrix may be divided into multiple shares based on secret sharing and distributed to the various participants. For example, the model parameter matrix W may be split into a first share W1 and a second share W2, with W = W1 + W2.
The first share of the model parameter matrix refers to the share held by the first computing party, e.g., W1. The second share of the model parameter matrix (e.g., W2) may be held by the other computing party (e.g., the second computing party).
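The splitting W = W1 + W2 can be sketched as additive secret sharing over a ring. The modulus and helper name below are illustrative assumptions:

```python
import random

P = 2**61 - 1  # illustrative modulus

def share_matrix(W):
    """Split matrix W into additive shares W1, W2 with W1 + W2 = W (mod P)."""
    W1 = [[random.randrange(P) for _ in row] for row in W]
    W2 = [[(w - r) % P for w, r in zip(row, rrow)]
          for row, rrow in zip(W, W1)]
    return W1, W2

W = [[3, 1], [4, 1]]
W1, W2 = share_matrix(W)
# Either share alone is uniformly random; their sum recovers W.
recovered = [[(a + b) % P for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)]
assert recovered == W
```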
In some embodiments, the first computing party may cooperate with the second computing party to obtain the first share of the first product matrix. The first product matrix is the product of the model parameter matrix and the feature matrix, and the cooperative operation may be a multiplication based on the multi-party secure multiplication protocol. The feature matrix used for model training is held by the other computing party (e.g., the second computing party).
The first product matrix may be represented as WX, where W is the model parameter matrix and X is the feature matrix. Because the model parameter matrix W exists in the form of shares held respectively by the first computing party and the other computing party, with the first computing party holding the first share W1 and the other computing party holding the second share W2, we have WX = (W1 + W2)X = W1X + W2X. Here X is held by the other computing party, the data held by the first computing party and the other computing party cannot be disclosed to each other, and the process of cooperative computation using the multi-party secure multiplication protocol may be as shown in the following embodiments.
Since W2 and X are both held by the other computing party, the other computing party can directly and independently compute W2X locally; the part that actually requires cooperative computation using the multi-party secure multiplication protocol is therefore W1X, where W1 is held by the first computing party and X by the other computing party. It is easy to see that W1X can be decomposed into multiple multiplication and addition operations. When a multiplication is involved, the first and second computing parties compute according to the multi-party secure multiplication protocol to obtain the first and second shares of the corresponding product; when an addition is involved, the first and second computing parties each compute locally on the shares they hold. After multiple rounds of cooperative secure multiplication, the first and second shares of W1X are finally obtained.
From the above description, for the first product matrix WX, the first share of the first product matrix held by the first computing party is actually one share of W1X, namely [W1X]1; the other computing party holds the second share of the first product matrix, which is essentially the sum of the other share [W1X]2 of W1X and W2X.
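A matrix version of the Beaver protocol yields the shares of W1X directly, rather than decomposing into scalar multiplications one by one. The sketch below simulates all roles in one process; the names and the modulus are assumptions:

```python
import random

P = 2**61 - 1  # illustrative modulus

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)]
            for row in A]

def matadd(A, B, sign=1):
    return [[(a + sign * b) % P for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]

def rand_matrix(n, m):
    return [[random.randrange(P) for _ in range(m)] for _ in range(n)]

def secure_matmul(W1, X):
    """Matrix Beaver protocol, simulated in one process: W1 is private to
    the first party, X to the other; returns additive shares of W1*X."""
    n, k, m = len(W1), len(X), len(X[0])
    # Third party: random U, V and additive shares Z0 + Z1 = U*V
    U, V = rand_matrix(n, k), rand_matrix(k, m)
    Z0 = rand_matrix(n, m)
    Z1 = matadd(matmul(U, V), Z0, sign=-1)
    E = matadd(W1, U, sign=-1)  # first party reveals W1 - U
    F = matadd(X, V, sign=-1)   # other party reveals X - V
    share0 = matadd(matmul(U, F), Z0)  # [W1X]1, held by the first party
    share1 = matadd(matmul(E, X), Z1)  # [W1X]2, held by the other party
    return share0, share1

W1 = [[1, 2], [3, 4]]
X = [[5, 6], [7, 8]]
s0, s1 = secure_matmul(W1, X)
assert matadd(s0, s1) == matmul(W1, X)  # shares reconstruct W1*X
```

The identity share0 + share1 = U(X − V) + (W1 − U)X + UV = W1X holds term by term; the other computing party then adds its locally computed W2X to share1 to obtain the full second share of the first product matrix.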
It should be noted that, in the embodiment of the present specification, in order to clearly and concisely describe the calculation process, by default, necessary matrix transposition has been performed to meet the dimension requirement between matrix operations, and therefore, the description of the matrix transposition is not repeated.
Step 304A, based on the first partition of the first product matrix, performing cooperative operation with the other computing parties to obtain a first partition of an activation matrix. In some embodiments, step 304A may be performed by the second obtaining module 520.
The activation matrix is a matrix whose elements are the activation function values of the elements in the first product matrix, i.e., each element of the activation matrix is the activation function value of the corresponding element in the first product matrix. That is, based on the first share of the first product matrix, the processing device may compute, in cooperation with the other computing party, the activation function values of the individual elements in the first product matrix to obtain the first share of the activation matrix.
In some embodiments, the activation function used to calculate the activation function values may be the Sigmoid function, for example, Sigmoid(q) = 1 / (1 + e^(−q)).
In some embodiments, a polynomial may be used to fit the Sigmoid function; based on the first share of the first product matrix, the first computing party cooperates with the other computing party to compute the fitted activation function values by multi-party secure computation, thereby obtaining the first share of the activation matrix.
In some embodiments, the fitting polynomial of the Sigmoid function may be expressed in the form of a function, as shown in equation (1).
g(q) = h0 + h1·q + h2·q^2 + … + hn·q^n (1)
Wherein g (q) represents a fitting polynomial of a Sigmoid function, h is a preset coefficient, and all parties involved in multi-party safety calculation can know the fitting polynomial; q is a variable, i.e., the above-mentioned calculation yields the elements in the first product matrix; n is a natural number. Illustratively, the fitting polynomial of Sigmoid function may be
Figure BDA0002936079170000102
And the like. The first calculator and the second calculator can cooperatively determine a polynomial for fitting a Sigmoid function, and then the polynomial is calculated to obtain activationThe fragmentation of the matrix.
When the fitting polynomial of the Sigmoid function is determined to be a first-order polynomial, for example g(q) = 1/2 + (1/4)q, the first computing party may independently compute its share of the activation function values locally: the elements of the first share of the first product matrix are substituted into q in turn, yielding the first share g(q)1 of the activation matrix. Similarly, the other computing party can independently compute the second share g(q)2 of the activation matrix locally.
When the fitting polynomial of the Sigmoid function contains higher-order terms (such as second order or above), the first computing party may further cooperate with the other computing party to obtain the activation function values based on the multi-party secure multiplication protocol, thereby obtaining the first share of the activation matrix; similarly, the other computing party obtains the second share of the activation matrix. Take a fitting polynomial of the activation function such as g(q) = 1/2 + (1/4)q − (1/48)q^3 as an example. When calculating the activation function values, the corresponding elements of the first and second shares of the first product matrix are substituted into g(q), i.e., q = [w1x]1 + ([w1x]2 + w2x), where [w1x]1 denotes an element from the first share of the first product matrix and [w1x]2 + w2x denotes the corresponding element from the second share of the first product matrix. When such an element is expanded in the polynomial, it yields a number of single-party terms (such as (1/4)[w1x]1, which involve only one party's data) and product terms. The cross terms (i.e., products of two factors, one from the first computing party and one from the second computing party) can be obtained by the first computing party and the other computing party through the secure multiplication protocol, while the single-party terms can be computed directly locally. For the secure multiplication protocol, refer to the description of fig. 2 in the above steps, which is not repeated here.
After the computation is completed, the first computing party may obtain the first share of the activation function value of each element in the first product matrix, and thus the first share A1 of the activation matrix; the other computing party obtains the second share A2 of the activation matrix.
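For the first-order case, the share computation is purely local. The sketch below uses the illustrative fit g(q) = 0.5 + 0.25q (assumed coefficients, accurate only near q = 0) over floating-point shares:

```python
import math

def sigmoid(q):
    return 1.0 / (1.0 + math.exp(-q))

def g_shares(q1, q2):
    """Local evaluation of g(q) = 0.5 + 0.25*q on additive shares q = q1 + q2.
    The first party's share absorbs the constant term; no cross terms arise
    for a first-order polynomial, so no interaction is needed."""
    a1 = 0.5 + 0.25 * q1  # first party, locally
    a2 = 0.25 * q2        # second party, locally
    return a1, a2

q1, q2 = 0.3, -0.1  # additive shares of q = 0.2
a1, a2 = g_shares(q1, q2)
assert abs((a1 + a2) - (0.5 + 0.25 * 0.2)) < 1e-12
assert abs((a1 + a2) - sigmoid(0.2)) < 0.01  # good fit near zero
```

For higher-order terms the locality breaks down: q^3 = (q1 + q2)^3 expands into cross terms such as q1^2·q2, which is exactly where the secure multiplication protocol is needed.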
Step 306A, based on the first segment of the activation matrix and the label matrix, performing cooperative operation with the other computing parties to obtain a first segment of the gradient matrix of the current round. In some embodiments, step 306A may be performed by the third obtaining module 530.
The gradient is one of the important model data in model training. The gradient matrix refers to gradient data of a model expressed in the form of a matrix.
In some embodiments, the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix. The difference of the activation matrix and the label matrix may be used to represent the error of the prediction.
In some embodiments, the gradient matrix may be represented by equation (2) below.
dw=X(A-Y) (2)
Wherein dw is a gradient matrix, X is a feature matrix, A is an activation matrix, and Y is a label matrix.
Decomposing equation (2) gives dw = XA − XY, where the feature matrix X is held by the other computing party; the activation matrix A = A1 + A2, with the first computing party holding the first share A1 of the activation matrix and the other computing party holding the second share A2; and the label matrix Y is held by the first computing party. Then dw = X(A1 + A2) − XY.
That is, dw = XA1 + XA2 − XY. Given the data held by each computing party, XA1 can be obtained by cooperative computation between the first computing party and the other computing party; XA2 can be obtained by the other computing party directly through local independent computation; and XY can be obtained by cooperative computation between the first computing party and the other computing party. The cooperative computation method may refer to the computing process illustrated in step 302A and is not repeated here.
After the computation is completed, the first computing party may obtain the first share [dw]1 of the gradient matrix of the current round, [dw]1 = [XA1]1 − [XY]1, and the other computing party may obtain the second share [dw]2 of the gradient matrix of the current round, [dw]2 = [XA1]2 − [XY]2 + XA2.
Step 308A, determine a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round. In some embodiments, step 308A may be performed by the first determining module 540.
The momentum gradient matrix is obtained by weighting and summing the gradient matrices of historical iterations and the gradient matrix of the current iteration using an exponentially weighted average, i.e., a moving average whose weights decay exponentially. The weight of each gradient matrix decreases exponentially as the number of iterations since it grows: gradient matrices closer to the current iteration carry larger weights, but more historical gradient matrices still receive a certain weight.
In some embodiments, the processing device may calculate the momentum gradient matrix according to equation (3) below.
St = β·St-1 + (1 − β)·dw (3)
where St is the momentum gradient matrix of the current iteration round; β is a preset weight, whose value may lie between 0 and 1, for example 0.9; St-1 is the momentum gradient matrix of the previous round; and dw is the gradient matrix obtained by the current round of iterative update.
It should be noted that when the current round of iterative update is the first round, the momentum gradient matrix of the previous round may be 0, i.e., St-1 = 0; when proceeding to the second round of iterative update, the momentum gradient matrix of the previous round may be the gradient matrix of the first round, and so on. In some embodiments, when the current round of iterative update is the first round, the momentum gradient matrix may take a value other than 0, such as a preset non-zero initial value, which is not limited here.
Combining the above momentum gradient calculation process, the first share of the momentum gradient matrix of the current round may be calculated from the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round. Specifically, these shares may be substituted into the above equation: St-1 is then the first share of the momentum gradient matrix of the previous round, and dw is the first share of the gradient matrix of the current round. The whole equation can then be read as: the first share of the momentum gradient matrix of the current round is the weighted sum of the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round.
Because the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round are both held by the first computing party, the first computing party can compute directly and independently locally, without cooperating with other computing parties. Similarly, the other computing party can independently and directly compute the second share of the momentum gradient matrix of the current round locally.
Step 310A, determining a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round. In some embodiments, step 310A may be performed by the second determination module 550.
Momentum gradient descent is named by contrast with stochastic gradient descent: by using an exponentially weighted average of the gradients, it exploits the gradient information of historical iteration rounds and can improve the convergence speed of the model. Since model training minimizes the objective function by iteratively updating the model parameters along the negative gradient direction, the first share of the updated model parameter matrix may be determined based on the difference between the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
In some embodiments, the learning rate of the model may be preset, and the processing device may multiply the first share of the momentum gradient matrix of the current round by the preset learning rate to obtain a second product matrix, then calculate the difference between the first share of the model parameter matrix and the second product matrix, and determine that difference as the first share of the updated model parameter matrix.
In some embodiments, determining the first slice of the updated model parameter matrix may be represented by equation (4) below.
Wi = Wi-1 − lr·St (4)
where Wi denotes the first share of the updated model parameter matrix; Wi-1 denotes the first share of the model parameter matrix before the update; lr denotes the preset learning rate, a known quantity; and St denotes the first share of the momentum gradient matrix of the current round.
Substituting the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round into equation (4) yields the first share of the updated model parameter matrix. Similarly, the other computing party may compute the second share of the updated model parameter matrix in the same manner.
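Since equations (3) and (4) are linear in the shared quantities, each party can apply them to its own shares locally, and summing the two parties' results reproduces the plaintext update. A floating-point sketch with example β and lr values (matrices flattened to lists for brevity; all names are illustrative):

```python
beta, lr = 0.9, 0.1  # example preset weight and learning rate

def local_momentum_step(S_prev_share, dw_share):
    """Equation (3) applied share-wise: St = beta*St-1 + (1-beta)*dw."""
    return [beta * s + (1 - beta) * d for s, d in zip(S_prev_share, dw_share)]

def local_param_step(W_share, S_share):
    """Equation (4) applied share-wise: Wi = Wi-1 - lr*St."""
    return [w - lr * s for w, s in zip(W_share, S_share)]

# Shares held by the two parties.
dw1, dw2 = [0.2, -0.4], [0.1, 0.3]  # shares of dw = [0.3, -0.1]
S1 = local_momentum_step([0.0, 0.0], dw1)  # first round: previous momentum 0
S2 = local_momentum_step([0.0, 0.0], dw2)
# Reconstructed momentum equals (1 - beta)*dw on the first round.
assert all(abs((a + b) - 0.1 * d) < 1e-12
           for a, b, d in zip(S1, S2, [0.3, -0.1]))
W1 = local_param_step([1.0, 1.0], S1)
W2 = local_param_step([0.5, -0.5], S2)
# Reconstructed parameters equal W - lr*S for the plaintext quantities.
assert all(abs((a + b) - (w - lr * s)) < 1e-12
           for a, b, w, s in zip(W1, W2, [1.5, 0.5], [0.03, -0.01]))
```

No interaction is needed for these two steps precisely because they involve no product of two shared quantities.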
In the embodiments of this specification, the participants in the multi-party secure computation compute on their data during model training based on secret sharing, in cooperation with the other participants; the data never leaves its owner's domain, which ensures the privacy and security of each participant's data. During model training, momentum gradient descent, i.e., an exponentially weighted average of the gradients, introduces the gradient information of historical iteration rounds, improving the convergence speed of the model and reducing the time spent on model training.
In some embodiments, the second calculator may determine the second patch of the updated model parameter matrix based on a similar manner as the first calculator determines the first patch of the updated model parameter matrix described above. As shown in fig. 3, determining the second partition of the updated model parameter matrix may include steps 302B through 310B.
Step 302B, based on the feature matrix and the second share of the model parameter matrix, performing cooperative operation with the other computing party to obtain a second share of the first product matrix. In some embodiments, step 302B may be performed by the fourth obtaining module 610.
The other computing party holds the label matrix and the first share of the model parameter matrix.
The first product matrix is a product of a model parameter matrix and a feature matrix.
Step 304B, based on the second share of the first product matrix, performing cooperative operation with the other computing party to obtain a second share of the activation matrix, where the elements of the activation matrix are the activation function values of the corresponding elements in the first product matrix. In some embodiments, step 304B may be performed by the fifth obtaining module 620.
In some embodiments, the activation function value may be calculated by a fitting polynomial of the activation function.
In some embodiments, based on the second share of the first product matrix, the processing device may cooperate with the other computing party to obtain the second share of the activation matrix using the fitting polynomial.
And step 306B, obtaining a second segment of the gradient matrix of the current round by performing cooperative operation with the other computing parties based on the second segment of the activation matrix and the feature matrix. In some embodiments, step 306B may be performed by the sixth obtaining module 630.
The gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix.
Step 308B, determining a second share of the momentum gradient matrix of the current round based on the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round. In some embodiments, step 308B may be performed by the third determination module 640.
In some embodiments, the second share of the momentum gradient matrix of the current round is a weighted sum of the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round.
In some embodiments, when the current round of iterative update is the first round, the second share of the momentum gradient matrix of the previous round is 0; when the current round is the second round of iterative update, the second share of the momentum gradient matrix of the previous round is the second share of the gradient matrix of the first round.
Step 310B, determining a second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round. In some embodiments, step 310B may be performed by the fourth determination module 650.
In some embodiments, the processing device may multiply the second share of the momentum gradient matrix of the current round by a preset learning rate to obtain a third product matrix, then calculate the difference between the second share of the model parameter matrix and the third product matrix, and determine that difference as the second share of the updated model parameter matrix.
For the description of steps 302B-310B, the relevant details can be found in the description of steps 302A-310A. The difference is that steps 302A-310A are performed by the first computing party while steps 302B-310B are performed by the second computing party; the computations are carried out in the same way and the descriptions may be cross-referenced. In addition, steps 302A-310A and 302B-310B are performed simultaneously within one iteration of the update.
In this embodiment, the participants in the multi-party secure computation compute on their data during model training based on secret sharing, in cooperation with the other participants; the data never leaves its owner's domain, ensuring the privacy and security of each participant's data. During model training, momentum gradient descent, i.e., an exponentially weighted average of the gradients, introduces the gradient information of historical iteration rounds, improving the convergence speed of the model and reducing the time spent on model training.
FIG. 4 is an exemplary interaction flow diagram of a multi-party security computation based model training method involving data interaction between multiple parties, according to some embodiments of the present description. Steps 402-410 may be performed by a processing device, such as a device 110 of party a or a device 120 of party B.
In some embodiments, the interaction flow 400 shown in fig. 4 may be applied to either computing party; the dashed line through the middle of the steps indicates that the steps may be performed by either of the first and second computing parties. The first computing party may hold the first shares of the feature matrix, the label matrix, and the model parameter matrix, and the second computing party may hold the second shares of the feature matrix, the label matrix, and the model parameter matrix. In this embodiment, the first and second computing parties are thus fully peer participants, and the flow performed by either party is the same, namely flow 400. The feature matrix, the label matrix, and the model parameter matrix may be split based on secret sharing before training begins and distributed to the first and second computing parties. For example, referring to the splitting of the model parameter matrix in fig. 3, a first and a second share of the model parameter matrix may be generated and held by the respective participants. The first computing party may split the label matrix it holds into a first and a second share and distribute the second share of the label matrix to the second computing party; similarly, the second computing party may split the feature matrix it holds into a first and a second share and distribute the first share of the feature matrix to the first computing party. The first shares of the feature matrix and the label matrix held by the first computing party and the second shares held by the second computing party can be aligned by means of identifiers, IDs, and the like. For ease of understanding, whichever party executes flow 400 is represented in fig. 4 as the first computing party, and the second computing party represents the other party. It should be understood, however, that in some other embodiments the second computing party may equally act as the party executing flow 400, with the first computing party acting as the other party; this embodiment does not limit this.
In some embodiments, the feature matrix may be denoted by X, with first share X1 and second share X2; the label matrix by Y, with first share Y1 and second share Y2; and the model parameter matrix by W, with first share W1 and second share W2.
As shown in fig. 4, the interaction flow 400 may include the following operations.
Step 402, the first segment of the first product matrix is obtained based on the first segment of the model parameter matrix and the first segment of the feature matrix and through cooperative operation with other calculation parties. In some embodiments, step 402 may be performed by the seventh obtaining module 710.
For the description of the first segment of the model parameter matrix, reference may be made to the description of fig. 3, which is not repeated herein.
The first product matrix is a product of the model parameter matrix and the feature matrix.
Denote the first product matrix by WX; then WX = (W1 + W2)(X1 + X2). Expanding gives WX = (W1 + W2)X1 + (W1 + W2)X2 = W1X1 + W2X1 + W1X2 + W2X2. It should be noted that the necessary matrix transposition is likewise assumed here by default.
The cross terms W2X1 and W1X2 can both be obtained by the cooperative matrix multiplication described in step 302A and are not described again here. The local term W1X1 can be computed independently by the first computing party locally, and W2X2 can be computed independently by the second computing party locally.
After the calculation is completed, the first computing party may obtain the first share [WX]1 of the first product matrix, [WX]1 = W1X1 + [W2X1]1 + [W1X2]1. The second computing party may obtain the second share [WX]2 of the first product matrix, [WX]2 = W2X2 + [W2X1]2 + [W1X2]2.
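The expansion above can be sketched in scalar form (all names are illustrative; the Beaver routine mirrors the fig. 2 protocol): the local terms W1X1 and W2X2 are computed in place, and each cross term is turned into two shares by one secure multiplication.

```python
import random

P = 2**61 - 1  # illustrative modulus

def beaver_mul(a, b):
    """Two-party secure multiplication (fig. 2 protocol), simulated in one
    process; returns (share for a's holder, share for b's holder)."""
    u, v = random.randrange(P), random.randrange(P)
    z0 = random.randrange(P)
    z1 = (u * v - z0) % P
    f = (b - v) % P  # masked value sent to a's holder
    e = (a - u) % P  # masked value sent to b's holder
    return (u * f + z0) % P, (e * b + z1) % P

def shares_of_WX(W1, X1, W2, X2):
    """Scalar sketch of step 402: WX = W1X1 + W2X1 + W1X2 + W2X2,
    where the first party holds W1, X1 and the second party holds W2, X2."""
    c1a, c2a = beaver_mul(W1, X2)  # cross term W1*X2
    c2b, c1b = beaver_mul(W2, X1)  # cross term W2*X1 (roles swapped)
    wx1 = (W1 * X1 + c1a + c1b) % P  # [WX]1, held by the first party
    wx2 = (W2 * X2 + c2a + c2b) % P  # [WX]2, held by the second party
    return wx1, wx2

wx1, wx2 = shares_of_WX(W1=2, X1=3, W2=5, X2=7)
assert (wx1 + wx2) % P == (2 + 5) * (3 + 7)  # (W1+W2)(X1+X2) = 70
```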
Step 404, based on the first partition of the first product matrix, performing cooperative operation with the other computing parties to obtain a first partition of an activation matrix. In some embodiments, step 404 may be performed by an eighth obtaining module 720.
For the cooperative operation process and the description of step 404, refer to the description of step 304A, which is not repeated herein.
After the calculation is completed, A1 may likewise denote the first share of the activation matrix, held by the first computing party, and A2 the second share of the activation matrix, held by the second computing party.
And 406, performing cooperative operation with the other computing parties based on the first segment of the activation matrix, the first segment of the feature matrix and the first segment of the label matrix to obtain a first segment of a gradient matrix. In some embodiments, step 406 may be performed by the ninth obtaining module 730.
In some embodiments, the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix. For the description of the gradient matrix, reference may be made to the description of step 306A, which is not repeated here.
Let dw denote the gradient matrix; then dw = (A - Y)X = ((A1 + A2) - (Y1 + Y2))(X1 + X2). Expanding gives dw = (A1 + A2)(X1 + X2) - (Y1 + Y2)(X1 + X2), and expanding further, dw = (A1X1 + A1X2 + A2X1 + A2X2) - (Y1X1 + Y1X2 + Y2X1 + Y2X2). The local terms A1X1 and Y1X1 can be computed by the first computing party independently at its local end, and A2X2 and Y2X2 can be computed by the second computing party independently at its local end. The cross terms A1X2, A2X1, Y1X2 and Y2X1 are computed cooperatively by the first and second computing parties based on the matrix multiplication of step 302A, with the first computing party holding one fragment of each cross term and the second computing party holding the other fragment. In some embodiments, when computing the gradient matrix dw, some terms may also be combined locally first, e.g., A1 - Y1 and A2 - Y2, which reduces the computational complexity.
After the computation is completed, the first computing party may obtain a first fragment [dw]1 of the gradient matrix, where [dw]1 = A1X1 - Y1X1 + [A1X2]1 + [A2X1]1 - [Y1X2]1 - [Y2X1]1; the second computing party may obtain a second fragment [dw]2 of the gradient matrix, where [dw]2 = A2X2 - Y2X2 + [A1X2]2 + [A2X1]2 - [Y1X2]2 - [Y2X1]2.
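The gradient-fragment decomposition can be checked the same way. The sketch below (our illustration) again uses a local split to stand in for the secure multiplication that produces the cross-term fragments, and uses X.T to supply the transposition that the text says is applied implicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

def split(m, rng):
    """Additively share a matrix into two random-looking fragments."""
    s1 = rng.normal(size=m.shape)
    return s1, m - s1

n_feat, n_samp = 4, 8
A = rng.uniform(size=(1, n_samp))                       # activation matrix
Y = rng.integers(0, 2, size=(1, n_samp)).astype(float)  # label matrix
X = rng.normal(size=(n_feat, n_samp))                   # feature matrix
A1, A2 = split(A, rng)
Y1, Y2 = split(Y, rng)
X1, X2 = split(X, rng)

# Cross-term fragments; split() stands in for the secure matrix
# multiplication of step 302A.
a12_1, a12_2 = split(A1 @ X2.T, rng)
a21_1, a21_2 = split(A2 @ X1.T, rng)
y12_1, y12_2 = split(Y1 @ X2.T, rng)
y21_1, y21_2 = split(Y2 @ X1.T, rng)

dw_1 = A1 @ X1.T - Y1 @ X1.T + a12_1 + a21_1 - y12_1 - y21_1
dw_2 = A2 @ X2.T - Y2 @ X2.T + a12_2 + a21_2 - y12_2 - y21_2

assert np.allclose(dw_1 + dw_2, (A - Y) @ X.T)  # fragments reconstruct dw
```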
At step 408, a first segment of the momentum gradient matrix of the current round is determined based on the first segment of the momentum gradient matrix of the previous round and the first segment of the gradient matrix of the current round. In some embodiments, step 408 may be performed by the fifth determination module 740.
Step 410, determining a first segment of the updated model parameter matrix based on the first segment of the model parameter matrix and the first segment of the momentum gradient matrix of the current round. In some embodiments, step 410 may be performed by the sixth determination module 750.
Since step 408 and step 410 do not involve calculation of cross terms in the calculation process, the description of step 408 and step 410 can be directly referred to the description of step 308A and step 310A, and will not be described herein again.
In this embodiment, in addition to ensuring the privacy and security of each participant's data during multi-party secure computation model training, improving the convergence rate of the model, and reducing the time required for model training, the algorithm deployed on any participant taking part in the computation can be identical, so that all participants have equal status and the deployment complexity of the algorithm can be reduced.
It should be noted that the above description of the interaction flows is only for illustration and description, and does not limit the application scope of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, changes to the flow steps described herein, such as the addition of pre-processing steps and storage steps, may be made.
FIG. 5 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description. As shown in fig. 5, the system 500 may include a first obtaining module 510, a second obtaining module 520, a third obtaining module 530, a first determining module 540, and a second determining module 550.
The system is applied to a first computing party and used for performing multiple rounds of iterative updating on model parameters, wherein the first computing party holds a label matrix and a first fragment of a model parameter matrix.
The first obtaining module 510 may be configured to obtain a first slice of a first product matrix based on the first slice of the model parameter matrix in cooperation with other computation parties.
The other computing parties hold the feature matrix and a second fragment of the model parameter matrix; the first product matrix is a product of the model parameter matrix and the feature matrix. In some embodiments, the first obtaining module 510 may obtain the first partition of the first product matrix by performing a co-computation with other computing parties based on a multi-party secure multiplication protocol.
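The specification names a multi-party secure multiplication protocol without fixing a construction; a Beaver multiplication triple is one standard instantiation. The sketch below (our assumption for illustration) simulates the triple dealer locally, whereas real systems generate the triple cryptographically, and checks that the resulting fragments sum to W @ X. Both inputs are shown in fragment form; in the variant where one party holds the feature matrix in the clear, the counterpart's fragment is simply zero.

```python
import numpy as np

rng = np.random.default_rng(4)

def beaver_matmul_fragments(W1, W2, X1, X2, rng):
    """One secure matrix multiplication via a Beaver triple (simulated dealer)."""
    # Dealer distributes fragments of a random triple (U, V, Z) with Z = U @ V.
    U = rng.normal(size=W1.shape); U1 = rng.normal(size=U.shape); U2 = U - U1
    V = rng.normal(size=X1.shape); V1 = rng.normal(size=V.shape); V2 = V - V1
    Z = U @ V;                     Z1 = rng.normal(size=Z.shape); Z2 = Z - Z1

    # Each party masks its input fragments; the masked sums are opened.
    D = (W1 - U1) + (W2 - U2)   # = W - U, safe to reveal
    E = (X1 - V1) + (X2 - V2)   # = X - V, safe to reveal

    # Purely local assembly of fragments of W @ X.
    P1 = D @ E + D @ V1 + U1 @ E + Z1    # first computing party
    P2 = D @ V2 + U2 @ E + Z2            # second computing party
    return P1, P2

W = rng.normal(size=(1, 4)); W1 = rng.normal(size=W.shape); W2 = W - W1
X = rng.normal(size=(4, 8)); X1 = rng.normal(size=X.shape); X2 = X - X1
P1, P2 = beaver_matmul_fragments(W1, W2, X1, X2, rng)
assert np.allclose(P1 + P2, W @ X)
```

The identity behind it: W @ X = (D + U)(E + V) = D@E + D@V + U@E + Z, and each summand on the right is either public or already in fragment form.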
The second obtaining module 520 may be configured to obtain a first partition of the activation matrix based on the first partition of the first product matrix and cooperatively operated with the other computing parties.
The elements of the activation matrix are activation function values of the corresponding elements in the first product matrix. In some embodiments, the second obtaining module 520 may obtain the first partition of the activation matrix by cooperating with other computing parties based on a multi-party secure multiplication protocol.
In some embodiments, the activation function values are calculated through a polynomial fitted to the activation function. Based on the first partition of the first product matrix, the second obtaining module 520 may cooperate with the other computing parties to obtain the first partition of the activation matrix according to the fitted polynomial.
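The activation function and fitting range are not specified here; assuming a sigmoid activation fitted by least squares over [-4, 4] (both assumptions of ours), the fitting idea can be sketched as follows. The fit itself involves only public quantities; in the protocol, the fitted polynomial would then be evaluated on fragments using secure multiplications.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Least-squares polynomial fit of the sigmoid on an assumed working range.
z = np.linspace(-4.0, 4.0, 400)
coeffs = np.polyfit(z, sigmoid(z), deg=5)

# Quality of the public approximation on the fitting range.
approx = np.polyval(coeffs, z)
max_err = np.max(np.abs(approx - sigmoid(z)))
assert max_err < 0.05
```

Because a polynomial consists only of additions and multiplications, each power of the shared input can be computed with the same secure multiplication primitive used for the first product matrix.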
The third obtaining module 530 may be configured to obtain the first partition of the gradient matrix of the current round in cooperation with the other computing parties based on the first partition of the activation matrix and the label matrix.
In some embodiments, the third obtaining module 530 may obtain the first partition of the gradient matrix of the current round by calculation in cooperation with other calculating parties based on a multi-party secure multiplication protocol.
The first determination module 540 may be configured to determine a first slice of the momentum gradient matrix of the current round based on the first slice of the momentum gradient matrix of the previous round and the first slice of the gradient matrix of the current round.
In some embodiments, the first slice of the momentum gradient matrix of the current round is a weighted sum of the first slice of the momentum gradient matrix of the previous round and the first slice of the gradient matrix of the current round.
In some embodiments, when the current round of iterative updating is the first round, the first slice of the momentum gradient matrix of the previous round is 0; when the current round of iterative updating is the second round, the first slice of the momentum gradient matrix of the previous round is the first slice of the gradient matrix of the first round.
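The weighted sum above is stated without fixed weights; the sketch below assumes the usual exponential-weighting coefficients beta and 1 - beta (our assumption). Because the momentum update is linear, each party can apply it to its own fragment with no interaction, and the fragments still sum to the true momentum.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 0.9   # assumed momentum coefficient

g = rng.normal(size=(1, 4))                    # this round's full gradient
g1 = rng.normal(size=g.shape); g2 = g - g1     # gradient fragments

# First round: the previous momentum fragments are 0.
v1_prev = np.zeros_like(g1)
v2_prev = np.zeros_like(g2)

# Each party updates its fragment locally -- no interaction needed.
v1 = beta * v1_prev + (1 - beta) * g1
v2 = beta * v2_prev + (1 - beta) * g2

# The fragments still sum to the momentum of the full gradient.
assert np.allclose(v1 + v2, beta * (v1_prev + v2_prev) + (1 - beta) * g)
```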
The second determination module 550 may be configured to determine a first slice of the updated model parameter matrix based on the first slice of the model parameter matrix and the first slice of the momentum gradient matrix of the current round.
In some embodiments, the second determining module 550 may multiply the first slice of the momentum gradient matrix of the current round by a preset learning rate to obtain a second product matrix, then calculate the difference between the first slice of the model parameter matrix and the second product matrix and determine that difference as the first slice of the updated model parameter matrix.
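This update is likewise linear, so each party can scale its momentum fragment by the learning rate and subtract it from its parameter fragment entirely locally; the learning rate value below is an assumed placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)
lr = 0.1   # assumed preset learning rate

W = rng.normal(size=(1, 4))                    # model parameter matrix
W1 = rng.normal(size=W.shape); W2 = W - W1     # parameter fragments
v = rng.normal(size=W.shape)                   # current-round momentum
v1 = rng.normal(size=v.shape); v2 = v - v1     # momentum fragments

# Each party forms its "second product matrix" lr * v_i and subtracts
# it from its model-parameter fragment, all locally.
W1_new = W1 - lr * v1
W2_new = W2 - lr * v2

assert np.allclose(W1_new + W2_new, W - lr * v)   # fragments of updated W
```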
FIG. 6 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description. As shown in fig. 6, the system 600 may include a fourth obtaining module 610, a fifth obtaining module 620, a sixth obtaining module 630, a third determining module 640, and a fourth determining module 650.
The system is applied to a second computing party and used for performing multiple rounds of iterative updating on model parameters, wherein the second computing party holds the feature matrix and a second fragment of the model parameter matrix.
The fourth obtaining module 610 may obtain a second segment of the first product matrix by performing cooperative operation with other computing parties based on the feature matrix and the second segment of the model parameter matrix.
The other computing parties hold the label matrix and a first fragment of the model parameter matrix; the first product matrix is a product of the model parameter matrix and the feature matrix. In some embodiments, the fourth obtaining module 610 may obtain the second segment of the first product matrix by performing a cooperative calculation with the other computing parties based on a multi-party secure multiplication protocol.
The fifth obtaining module 620 obtains the second segment of the activation matrix by performing cooperative operation with the other computing parties based on the second segment of the first product matrix.
The elements of the activation matrix are activation function values of the corresponding elements in the first product matrix. In some embodiments, the fifth obtaining module 620 may obtain the second segment of the activation matrix by cooperating with other computing parties based on the multi-party secure multiplication protocol.
In some embodiments, the activation function values are calculated through a polynomial fitted to the activation function. Based on the second segment of the first product matrix, the fifth obtaining module 620 may operate in cooperation with the other computing parties to obtain the second segment of the activation matrix according to the fitted polynomial.
The sixth obtaining module 630 may be configured to obtain a second segment of the gradient matrix of the current round by performing a cooperative operation with the other computing parties based on the second segment of the activation matrix and the feature matrix.
In some embodiments, the sixth obtaining module 630 may obtain the second segment of the gradient matrix of the current round by performing a calculation in cooperation with other calculation parties based on the multi-party secure multiplication protocol.
The third determination module 640 may be configured to determine a second slice of the momentum gradient matrix of the current round based on the second slice of the momentum gradient matrix of the previous round and the second slice of the gradient matrix of the current round.
In some embodiments, the second slice of the momentum gradient matrix of the current round is a weighted sum of the second slice of the momentum gradient matrix of the previous round and the second slice of the gradient matrix of the current round.
In some embodiments, when the current round of iterative updating is the first round, the second slice of the momentum gradient matrix of the previous round is 0; when the current round of iterative updating is the second round, the second slice of the momentum gradient matrix of the previous round is the second slice of the gradient matrix of the first round.
The fourth determination module 650 may be configured to determine a second slice of the updated model parameter matrix based on the second slice of the model parameter matrix and the second slice of the momentum gradient matrix of the current round.
In some embodiments, the fourth determining module 650 may multiply the second segment of the momentum gradient matrix of the current round by a preset learning rate to obtain a second product matrix, then calculate the difference between the second fragment of the model parameter matrix and the second product matrix and determine that difference as the second fragment of the updated model parameter matrix.
FIG. 7 is an exemplary block diagram of a multi-party security computation based model training system in accordance with some embodiments of the present description. As shown in fig. 7, the system 700 may include a seventh obtaining module 710, an eighth obtaining module 720, a ninth obtaining module 730, a fifth determining module 740, and a sixth determining module 750.
The system can be applied to any participant for performing multiple rounds of iterative updating on model parameters, wherein the participant holds a first fragment of the feature matrix, a first fragment of the label matrix, and a first fragment of the model parameter matrix.
The seventh obtaining module 710 may be configured to obtain a first segment of the first product matrix by performing cooperative operation with other computing parties based on the first segment of the model parameter matrix and the first segment of the feature matrix.
In some embodiments, the other computing parties hold the remaining fragments of the feature matrix and the model parameter matrix; the first product matrix is a product of the model parameter matrix and the feature matrix.
In some embodiments, the seventh obtaining module 710 may obtain the first partition of the first product matrix by performing a cooperative calculation with other calculating parties based on a multi-party secure multiplication protocol.
The eighth obtaining module 720 may be configured to obtain a first partition of the activation matrix based on the first partition of the first product matrix and by cooperating with the other computing parties.
The elements of the activation matrix are activation function values of the corresponding elements in the first product matrix. In some embodiments, the eighth obtaining module 720 may obtain the first partition of the activation matrix by cooperating with other computing parties based on a multi-party secure multiplication protocol.
The ninth obtaining module 730 may be configured to obtain the first segment of the gradient matrix by performing cooperative operation with the other computing parties based on the first segment of the activation matrix, the first segment of the feature matrix, and the first segment of the label matrix.
In some embodiments, the ninth obtaining module 730 may obtain the first partition of the gradient matrix of the current round by calculation in cooperation with other calculating parties based on the multi-party secure multiplication protocol.
The fifth determination module 740 may be configured to determine a first tile of the momentum gradient matrix of the current round based on the first tile of the momentum gradient matrix of the previous round and the first tile of the gradient matrix of the current round.
In some embodiments, the first tile of the momentum gradient matrix of the current round is a weighted sum of the first tile of the momentum gradient matrix of the previous round and the first tile of the gradient matrix of the current round.
In some embodiments, when the current round of iterative updating is the first round, the first tile of the momentum gradient matrix of the previous round is 0; when the current round of iterative updating is the second round, the first tile of the momentum gradient matrix of the previous round is the first tile of the gradient matrix of the first round.
The sixth determination module 750 may be configured to determine a first tile of the updated model parameter matrix based on the first tile of the model parameter matrix and the first tile of the momentum gradient matrix of the current round.
In some embodiments, the sixth determining module 750 may multiply the first segment of the momentum gradient matrix of the current round by a preset learning rate to obtain a second product matrix, then calculate the difference between the first fragment of the model parameter matrix and the second product matrix and determine that difference as the first fragment of the updated model parameter matrix.
For a detailed description of the modules of the model training system based on multi-party security computation, reference may be made to the flowchart section of this specification, e.g., the associated description of fig. 3 and 4.
It should be understood that the systems shown in fig. 5-7 and their modules may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the model training system and its modules based on multi-party security computing is only for convenience of description and should not limit the present disclosure within the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. For example, the first obtaining module 510, the second obtaining module 520, the third obtaining module 530, the first determining module 540, and the second determining module 550 may be different modules in one system, or may be a module that implements the functions of two or more modules described above. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) in the model training process based on multi-party secure computation, data never leaves its owner's domain, ensuring the privacy and security of each participant's data; (2) in the model training process, momentum gradient descent, i.e., an exponentially weighted average of the gradients, introduces gradient information from historical iteration rounds, which improves the convergence speed of the model and reduces the time spent on model training.
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, an embodiment may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing quantities of components, attributes, and the like are used in some embodiments; it should be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples such numerical values are set forth as precisely as possible within the practical scope.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (17)

1. A model training method based on multi-party secure computation, applied to a first computing party, wherein the first computing party holds a label matrix and a first fragment of a model parameter matrix; the method comprises performing one or more rounds of iterative updating on the model parameters, wherein one round of iterative updating comprises:
performing cooperative operation with other computing parties based on the first fragment of the model parameter matrix to obtain a first fragment of a first product matrix; wherein the other computing parties hold a feature matrix and a second fragment of the model parameter matrix; the first product matrix is the product of the model parameter matrix and the feature matrix;
obtaining a first fragment of an activation matrix based on the first fragment of the first product matrix in cooperative operation with the other computing parties, wherein elements of the activation matrix are activation function values of the corresponding elements in the first product matrix;
based on the first fragment of the activation matrix and the label matrix, performing cooperative operation with the other computing parties to obtain a first fragment of the gradient matrix of the current round; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix;
determining a first fragment of the momentum gradient matrix of the current round based on the first fragment of the momentum gradient matrix of the previous round and the first fragment of the gradient matrix of the current round;
determining a first fragment of the updated model parameter matrix based on the first fragment of the model parameter matrix and the first fragment of the momentum gradient matrix of the current round.
2. The method of claim 1, wherein the activation function values are calculated through a polynomial fitted to the activation function; the obtaining a first fragment of an activation matrix based on the first fragment of the first product matrix in cooperation with the other computing parties comprises:
performing cooperative operation with the other computing parties based on the first fragment of the first product matrix to obtain the first fragment of the activation matrix according to the fitted polynomial.
3. The method of claim 1, wherein when the current round of iterative updating is the first round, the first fragment of the momentum gradient matrix of the previous round is 0; and when the current round of iterative updating is the second round, the first fragment of the momentum gradient matrix of the previous round is the first fragment of the gradient matrix of the first round.
4. The method of claim 1 or 3, wherein the first fragment of the momentum gradient matrix of the current round is a weighted sum of the first fragment of the momentum gradient matrix of the previous round and the first fragment of the gradient matrix of the current round.
5. The method of claim 1, wherein the determining a first fragment of an updated model parameter matrix based on the first fragment of the model parameter matrix and the first fragment of the momentum gradient matrix of the current round comprises:
multiplying the first fragment of the momentum gradient matrix of the current round by a preset learning rate to obtain a second product matrix;
and calculating the difference value of the first fragment of the model parameter matrix and the second product matrix, and determining the difference value as the first fragment of the updated model parameter matrix.
6. A model training system based on multi-party secure computation, applied to a first computing party, wherein the first computing party holds a label matrix and a first fragment of a model parameter matrix; the system is used for one or more rounds of iterative updating of model parameters, wherein the system comprises:
a first obtaining module, configured to obtain a first fragment of a first product matrix by performing cooperative operation with other computing parties based on the first fragment of the model parameter matrix; wherein the other computing parties hold a feature matrix and a second fragment of the model parameter matrix; the first product matrix is the product of the model parameter matrix and the feature matrix;
a second obtaining module, configured to obtain a first fragment of an activation matrix based on the first fragment of the first product matrix through cooperative operation with the other computing parties, wherein elements of the activation matrix are activation function values of the corresponding elements in the first product matrix;
a third obtaining module, configured to obtain, based on the first fragment of the activation matrix and the label matrix, a first fragment of the gradient matrix of the current round through cooperative operation with the other computing parties; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix and the feature matrix;
a first determination module to determine a first segment of the momentum gradient matrix of the current wheel based on the first segment of the momentum gradient matrix of the previous wheel and the first segment of the gradient matrix of the current wheel;
a second determining module to determine a first segment of the updated model parameter matrix based on the first segment of the model parameter matrix and the first segment of the momentum gradient matrix of the current wheel.
7. A model training method based on multi-party secure computation, applied to a second computing party holding a feature matrix and a second share of a model parameter matrix; the method comprises performing one or more rounds of iterative updating of model parameters, wherein one round of iterative updating comprises:
obtaining a second share of a first product matrix through cooperative computation with other computing parties based on the feature matrix and the second share of the model parameter matrix; wherein the other computing parties hold a label matrix and a first share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix;
obtaining a second share of an activation matrix through cooperative computation with the other computing parties based on the second share of the first product matrix, wherein each element of the activation matrix is the activation function value of the corresponding element in the first product matrix;
obtaining a second share of the gradient matrix of the current round through cooperative computation with the other computing parties based on the second share of the activation matrix and the feature matrix; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix, and the feature matrix;
determining a second share of the momentum gradient matrix of the current round based on the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round; and
determining a second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round.
8. The method of claim 7, wherein the activation function values are calculated through a polynomial fitted to the activation function; and the obtaining a second share of the activation matrix through cooperative computation with the other computing parties based on the second share of the first product matrix comprises:
performing cooperative computation with the other computing parties based on the second share of the first product matrix to obtain a second share of the activation matrix according to the fitted polynomial.
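Claim 8 replaces the activation function with a fitted polynomial, a common device in secure computation because additions and multiplications are the operations natively supported on secret shares. A minimal sketch of such a fit for a sigmoid activation; the degree, interval, and tolerance are illustrative assumptions, not values from the patent:

```python
import numpy as np

def fit_sigmoid_poly(degree=3, lo=-4.0, hi=4.0):
    """Least-squares polynomial fit to the sigmoid on [lo, hi].

    Once the coefficients are public, evaluating the polynomial on
    secret-shared inputs needs only shared additions and
    multiplications. Degree and interval are illustrative choices.
    """
    x = np.linspace(lo, hi, 1000)
    y = 1.0 / (1.0 + np.exp(-x))
    return np.polyfit(x, y, degree)   # highest-degree coefficient first

coeffs = fit_sigmoid_poly()
# By symmetry of the sigmoid about (0, 0.5), the fit passes near 0.5 at 0:
assert abs(np.polyval(coeffs, 0.0) - 0.5) < 0.05
```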
9. The method of claim 7, wherein when the current round of iterative updating is the first round, the second share of the momentum gradient matrix of the previous round is 0; and when the current round of iterative updating is the second round, the second share of the momentum gradient matrix of the previous round is the second share of the gradient matrix of the first round.
10. The method of claim 7 or 9, wherein the second share of the momentum gradient matrix of the current round is a weighted sum of the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round.
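The weighted sum in claim 10 is again linear, so each party can compute its share of the momentum gradient locally. A sketch under the assumption of two-party additive sharing; the weight `beta` is illustrative, as the patent does not fix the weights:

```python
import numpy as np

def momentum_share(prev_v_share, grad_share, beta=0.9):
    """One party's local step of the claimed momentum update.

    The momentum gradient matrix is a weighted sum of the previous
    round's momentum and the current round's gradient; a linear
    combination can be applied to each additive share independently.
    """
    return beta * prev_v_share + (1.0 - beta) * grad_share

# The two parties' shares recombine to the plaintext weighted sum:
rng = np.random.default_rng(1)
V_prev = rng.normal(size=(2, 2)); G = rng.normal(size=(2, 2))
Vp1 = rng.normal(size=(2, 2)); Vp2 = V_prev - Vp1
G1 = rng.normal(size=(2, 2)); G2 = G - G1
v1 = momentum_share(Vp1, G1)
v2 = momentum_share(Vp2, G2)
assert np.allclose(v1 + v2, 0.9 * V_prev + 0.1 * G)
```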
11. The method of claim 7, wherein determining the second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round comprises:
multiplying the second share of the momentum gradient matrix of the current round by a preset learning rate to obtain a third product matrix; and
calculating a difference between the second share of the model parameter matrix and the third product matrix, and determining the difference as the second share of the updated model parameter matrix.
12. A model training system based on multi-party secure computation, applied to a second computing party holding a feature matrix and a second share of a model parameter matrix; the system is used for performing one or more rounds of iterative updating of model parameters, and comprises:
a fourth obtaining module, configured to obtain a second share of a first product matrix through cooperative computation with other computing parties based on the feature matrix and the second share of the model parameter matrix; wherein the other computing parties hold a label matrix and a first share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix;
a fifth obtaining module, configured to obtain a second share of an activation matrix through cooperative computation with the other computing parties based on the second share of the first product matrix, wherein each element of the activation matrix is the activation function value of the corresponding element in the first product matrix;
a sixth obtaining module, configured to obtain a second share of the gradient matrix of the current round through cooperative computation with the other computing parties based on the second share of the activation matrix and the feature matrix; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix, and the feature matrix;
a third determining module, configured to determine a second share of the momentum gradient matrix of the current round based on the second share of the momentum gradient matrix of the previous round and the second share of the gradient matrix of the current round; and
a fourth determining module, configured to determine a second share of the updated model parameter matrix based on the second share of the model parameter matrix and the second share of the momentum gradient matrix of the current round.
13. A model training method based on multi-party secure computation, applied to any computing party, the party holding a first share of a feature matrix, a first share of a label matrix and a first share of a model parameter matrix; the method comprises performing one or more rounds of iterative updating of model parameters, wherein one round of iterative updating comprises:
obtaining a first share of a first product matrix through cooperative computation with other computing parties based on the first share of the model parameter matrix and the first share of the feature matrix; wherein the other computing parties hold a second share of the feature matrix, a second share of the label matrix and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix;
obtaining a first share of an activation matrix through cooperative computation with the other computing parties based on the first share of the first product matrix, wherein each element of the activation matrix is the activation function value of the corresponding element in the first product matrix;
obtaining a first share of the gradient matrix through cooperative computation with the other computing parties based on the first share of the activation matrix, the first share of the feature matrix and the first share of the label matrix; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix, and the feature matrix;
determining a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and
determining a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
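The steps of one round of iterative updating can be summarized by the plaintext computation that the cooperative operations jointly realize. In the protocol every matrix below exists only as additive shares held by the parties; this sketch shows the values those shares represent, with illustrative shapes and hyper-parameters (a sigmoid activation, `beta`, and `lr` are assumptions, not fixed by the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_round(W, X, Y, V_prev, beta=0.9, lr=0.01):
    """Plaintext reference for one round of iterative updating."""
    Z = X @ W                             # first product matrix
    A = sigmoid(Z)                        # activation matrix (element-wise)
    G = X.T @ (A - Y)                     # gradient matrix
    V = beta * V_prev + (1.0 - beta) * G  # momentum gradient matrix
    W_new = W - lr * V                    # updated model parameter matrix
    return W_new, V

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 samples, 2 features
Y = np.array([[1.0], [0.0], [1.0]])                 # label matrix
W = np.zeros((2, 1)); V = np.zeros((2, 1))
W, V = one_round(W, X, Y, V)
assert W.shape == (2, 1) and V.shape == (2, 1)
```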
14. A model training system based on multi-party secure computation, applied to any computing party, the party holding a first share of a feature matrix, a first share of a label matrix and a first share of a model parameter matrix; the system is used for performing one or more rounds of iterative updating of model parameters, and comprises:
a seventh obtaining module, configured to obtain a first share of a first product matrix through cooperative computation with other computing parties based on the first share of the model parameter matrix and the first share of the feature matrix; wherein the other computing parties hold a second share of the feature matrix, a second share of the label matrix and a second share of the model parameter matrix, and the first product matrix is the product of the model parameter matrix and the feature matrix;
an eighth obtaining module, configured to obtain a first share of an activation matrix through cooperative computation with the other computing parties based on the first share of the first product matrix, wherein each element of the activation matrix is the activation function value of the corresponding element in the first product matrix;
a ninth obtaining module, configured to obtain a first share of the gradient matrix through cooperative computation with the other computing parties based on the first share of the activation matrix, the first share of the feature matrix and the first share of the label matrix; wherein the gradient matrix is the product of the difference between the activation matrix and the label matrix, and the feature matrix;
a fifth determining module, configured to determine a first share of the momentum gradient matrix of the current round based on the first share of the momentum gradient matrix of the previous round and the first share of the gradient matrix of the current round; and
a sixth determining module, configured to determine a first share of the updated model parameter matrix based on the first share of the model parameter matrix and the first share of the momentum gradient matrix of the current round.
15. A model training apparatus based on multi-party secure computation, comprising a processor and a storage device for storing instructions which, when executed by the processor, implement the method of any one of claims 1 to 5.
16. A model training apparatus based on multi-party secure computation, comprising a processor and a storage device for storing instructions which, when executed by the processor, implement the method of any one of claims 7 to 11.
17. A model training apparatus based on multi-party secure computation, comprising a processor and a storage device for storing instructions which, when executed by the processor, implement the method of claim 13.
CN202110159936.9A 2021-02-05 2021-02-05 Model training method and system based on multi-party secure computation Active CN112990475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110159936.9A CN112990475B (en) 2021-02-05 2021-02-05 Model training method and system based on multi-party secure computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110159936.9A CN112990475B (en) 2021-02-05 2021-02-05 Model training method and system based on multi-party secure computation

Publications (2)

Publication Number Publication Date
CN112990475A true CN112990475A (en) 2021-06-18
CN112990475B CN112990475B (en) 2022-05-06

Family

ID=76347910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110159936.9A Active CN112990475B (en) Model training method and system based on multi-party secure computation

Country Status (1)

Country Link
CN (1) CN112990475B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358100A1 (en) * 2015-06-05 2016-12-08 Intel Corporation Techniques for improving classification performance in supervised learning
CN111160573A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties
CN111241570A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties
CN111523674A (en) * 2019-02-01 2020-08-11 阿里巴巴集团控股有限公司 Model training method, device and system
CN111723404A (en) * 2020-08-21 2020-09-29 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model
CN112288100A (en) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 Method, system and device for updating model parameters based on federal learning
Also Published As

Publication number Publication date
CN112990475B (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US10855455B2 (en) Distributed multi-party security model training framework for privacy protection
US11301571B2 (en) Neural-network training using secure data processing
US10600006B1 (en) Logistic regression modeling scheme using secrete sharing
US11176469B2 (en) Model training methods, apparatuses, and systems
CN111475854B (en) Collaborative computing method and system for protecting data privacy of two parties
CN110969264B (en) Model training method, distributed prediction method and system thereof
CN112561085B (en) Multi-classification model training method and system based on multi-party safety calculation
CN112632620B (en) Federal learning method and system for enhancing privacy protection
CN114021734B (en) Parameter calculation device, system and method for federal learning and privacy calculation
Sarkar et al. Fast and scalable private genotype imputation using machine learning and partially homomorphic encryption
CN112766514A (en) Method, system and device for joint training of machine learning model
CN111738360B (en) Two-party decision tree training method and system
CN115632761A (en) Multi-user distributed privacy protection regression method and device based on secret sharing
Chourasia et al. Adaptive neuro fuzzy interference and PNN memory based grey wolf optimization algorithm for optimal load balancing
CN113949510A (en) Privacy-protecting multi-party security computing method and system
CN112990475B (en) Model training method and system based on multi-party secure computation
CN117056962A (en) Federal learning large model fine tuning method and device
CN116417072B (en) Sensitive data security association analysis method and device based on secure multiparty calculation
CN113761350A (en) Data recommendation method, related device and data recommendation system
CN112990260B (en) Model evaluation method and system based on multi-party security calculation
CN111784078B (en) Distributed prediction method and system for decision tree
Ban et al. Spatial complexity in multi-layer cellular neural networks
CN115130568A (en) Longitudinal federated Softmax regression method and system supporting multiple parties
CN112989420B (en) Method and system for determining correlation coefficient for protecting data privacy
Wang et al. Improved quantum genetic algorithm in application of scheduling engineering personnel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant