CN114153854B

CN114153854B - Secret sharing-based multi-key grouping information acquisition method and system

Info

Publication number: CN114153854B
Application number: CN202210120960.6A
Authority: CN
Inventors: 方文静; 王力
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-02-09
Filing date: 2022-02-09
Publication date: 2022-05-10
Anticipated expiration: 2042-02-09
Also published as: CN114153854A

Abstract

The embodiments of this specification disclose a secret sharing-based multi-key grouping information acquisition method and system. The data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, and the elements of each data column have been arranged based on at least two information items serving as sorting keys. The method is performed by one of the parties and comprises: obtaining the fragments of the data columns corresponding to the sorting keys; for each sorting key, performing a secret sharing operation with the other parties based on the fragment of the data column corresponding to that sorting key to obtain a fragment of the grouping mark column corresponding to that sorting key, wherein the elements of the grouping mark column indicate grouping information of the elements in the data column corresponding to that sorting key; and performing a secret sharing operation with the other parties based on the fragments of the grouping mark columns corresponding to the respective sorting keys to obtain a fragment of the multi-key grouping mark column, wherein the elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys.

Description

Secret sharing-based multi-key grouping information acquisition method and system

Technical Field

The present disclosure relates to the field of information technologies, and in particular, to a method and a system for obtaining multi-key grouping information based on secret sharing.

Background

In the big data era, multiple participants of a business wish to collaborate to accomplish data tasks. However, due to data privacy, data isolation, and the like, the data of each party cannot be directly centralized and processed. Therefore, how to realize the data task of multi-party combination on the premise of ensuring data security becomes a problem to be solved urgently.

Disclosure of Invention

One embodiment of the present specification provides a secret sharing-based multi-key grouping information acquisition method. The data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, the elements of each data column have been arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object. The method is performed by one of the parties and comprises: obtaining secret sharing fragments of the data columns corresponding to the sorting keys; for each sorting key, performing a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to that sorting key to obtain a secret sharing fragment of the grouping mark column corresponding to that sorting key, wherein the elements of the grouping mark column indicate grouping information of the elements in the data column corresponding to that sorting key; and performing a secret sharing operation with the other parties based on the secret sharing fragments of the grouping mark columns corresponding to the respective sorting keys to obtain a secret sharing fragment of the multi-key grouping mark column, wherein the elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys.

One embodiment of the present specification provides a multi-key grouping information acquisition system based on secret sharing. The data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, the elements of each data column have been arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object. The system is implemented in one of the parties and comprises: an obtaining module configured to obtain the secret sharing fragments of the data columns corresponding to the sorting keys; a first secret sharing operation module configured to, for each sorting key, perform a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to that sorting key so as to obtain a secret sharing fragment of the grouping mark column corresponding to that sorting key, wherein the elements of the grouping mark column indicate grouping information of the elements in the data column corresponding to that sorting key; and a second secret sharing operation module configured to perform a secret sharing operation with the other parties based on the secret sharing fragments of the grouping mark columns corresponding to the respective sorting keys to obtain a secret sharing fragment of the multi-key grouping mark column, wherein the elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys.

One of the embodiments of the present specification provides a secret sharing-based multi-key grouping information obtaining apparatus, including a processor and a storage device, where the storage device is used to store instructions, and when the processor executes the instructions, the secret sharing-based multi-key grouping information obtaining method according to any embodiment of the present specification is implemented.

One embodiment of the present specification provides a data aggregation method based on secret sharing. The data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, the elements of each data column have been arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object. The method is performed by one of the parties and comprises: obtaining secret sharing fragments of the data columns corresponding to the sorting keys; for each sorting key, performing a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to that sorting key to obtain a secret sharing fragment of the grouping mark column corresponding to that sorting key, wherein the elements of the grouping mark column indicate grouping information of the elements in the data column corresponding to that sorting key; performing a secret sharing operation with the other parties based on the secret sharing fragments of the grouping mark columns corresponding to the respective sorting keys to obtain a secret sharing fragment of the multi-key grouping mark column, wherein the elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys; disclosing the secret sharing fragments of the multi-key grouping mark column to obtain the multi-key grouping mark column; and obtaining, according to the multi-key grouping mark column, the aggregation result corresponding to each group, and/or a secret sharing fragment of the aggregation result corresponding to each group, for the data column to be aggregated among the data columns corresponding to the plurality of information items.

One embodiment of the present specification provides a data aggregation system based on secret sharing. The data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, the elements of each data column have been arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object. The system is implemented in one of the parties and comprises: an obtaining module configured to obtain the secret sharing fragments of the data columns corresponding to the sorting keys; a first secret sharing operation module configured to, for each sorting key, perform a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to that sorting key so as to obtain a secret sharing fragment of the grouping mark column corresponding to that sorting key, wherein the elements of the grouping mark column indicate grouping information of the elements in the data column corresponding to that sorting key; a second secret sharing operation module configured to perform a secret sharing operation with the other parties based on the secret sharing fragments of the grouping mark columns corresponding to the respective sorting keys to obtain a secret sharing fragment of the multi-key grouping mark column, wherein the elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys; a disclosure module configured to disclose the secret sharing fragments of the multi-key grouping mark column to obtain the multi-key grouping mark column; and an aggregation module configured to obtain, according to the multi-key grouping mark column, the aggregation result corresponding to each group, and/or a secret sharing fragment of the aggregation result corresponding to each group, for the data column to be aggregated among the data columns corresponding to the plurality of information items.

One of the embodiments of the present specification provides a data aggregation apparatus based on secret sharing, including a processor and a storage device, where the storage device is configured to store instructions, and when the processor executes the instructions, the data aggregation method based on secret sharing according to any one of the embodiments of the present specification is implemented.

Drawings

The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is a schematic diagram of an application scenario of data aggregation based on secret sharing according to some embodiments of the present description;

FIG. 2 is an exemplary flow diagram of a method for data aggregation based on secret sharing, according to some embodiments of the present description;

FIG. 3 is a schematic illustration of a multi-key ordering according to some embodiments of the present description;

FIG. 4 is a schematic diagram of an operator for implementing a secret sharing comparison operation, according to some embodiments of the present description;

FIG. 5 is an example of obtaining a grouping flag column based on a data column k, where the grouping flag column is as long as the data column k, according to some embodiments of the present description;

FIG. 6 is an example of obtaining a grouping flag column based on a data column k, where the grouping flag column is not as long as the data column k, according to some embodiments of the present description;

FIGS. 7A-7D are schematic diagrams illustrating the equivalence relations between the grouping flag columns corresponding to the sorting keys and the multi-key grouping flag column according to some embodiments of the present description;

FIGS. 8A and 8B are two examples of determining grouping based on a multi-key grouping flag according to some embodiments of the present description;

FIG. 9 is an exemplary block diagram of a multi-key grouping information acquisition system based on secret sharing according to some embodiments of the present description;

FIG. 10 is an exemplary block diagram of a data aggregation system based on secret sharing according to some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that "system", "device", "unit" and/or "module" as used herein are terms for distinguishing different components, elements, parts, portions or assemblies at different levels. However, these terms may be replaced by other expressions that accomplish the same purpose.

As used in this specification, the terms "a", "an" and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.

Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or one or more steps may be removed from the processes.

Some basic concepts to which this specification relates will first be described.

Secret sharing is a cryptographic technique that splits a secret into several shares held by different parties, such that no single party can recover the secret; the secret can be recovered only if several parties cooperate, e.g., by disclosing the shares they each hold. The secret may take the form of a scalar value, an array, a vector, a matrix, etc. The shares obtained by splitting the secret can also be called secret sharing shards, or shards for short.
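The splitting and recovery described above can be sketched with additive secret sharing, which is one common scheme; the specification does not fix a particular scheme, so the ring size `MODULUS` and the helper names `split` and `reconstruct` below are assumptions made only for illustration.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size; not specified in the description


def split(secret, num_parties=2):
    """Split an integer into additive shards that sum to it modulo MODULUS."""
    shards = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    last = (secret - sum(shards)) % MODULUS
    return shards + [last]


def reconstruct(shards):
    """Recover the secret once every party has disclosed its shard."""
    return sum(shards) % MODULUS


# Party A keeps shard_a, party B keeps shard_b; each shard alone is a
# uniformly random value and reveals nothing about the secret.
shard_a, shard_b = split(42)
recovered = reconstruct([shard_a, shard_b])
```

In this sketch a secret that is an array or a matrix would simply be split element by element.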

Secure multi-party computation (SMPC, or MPC for short) addresses the problem of how to securely compute an agreed function without a trusted third party. MPC must ensure both input privacy and result correctness: the private data (input) of any party must not be revealed during the interactive computation, and the computed result must be consistent with the result obtained by directly inputting the private data of each party into the agreed function.

Secure multi-party computation can be implemented in conjunction with secret sharing, and the computation result (output) can be distributed to the parties in the form of shards. Specifically, through interactive computation, each party obtains an output shard of the agreed function, and the output shards held by the parties are equivalent to those obtained by directly inputting the private data of each party into the agreed function and then splitting the function output (the secret). In this specification, such secure multi-party computation implemented in conjunction with secret sharing is also referred to as a secret sharing operation, and may specifically be secret sharing combination, secret sharing sorting, or the like. In some embodiments, the inputs and/or intermediate results of a secret sharing operation may also exist in shard form, so as to protect the data privacy of each party.
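As a minimal sketch of a secret sharing operation whose output stays in shard form, consider two parties jointly computing the sum of their private inputs. Additive sharing modulo `MODULUS`, the helper `share`, and the input values are assumptions for illustration; the addition function is chosen only because it needs no interaction beyond the initial exchange of shards.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size


def share(value):
    """Split a value into two additive shards modulo MODULUS."""
    kept = secrets.randbelow(MODULUS)
    return kept, (value - kept) % MODULUS


x_private = 15  # party A's private input (hypothetical)
y_private = 27  # party B's private input (hypothetical)

x_a, x_b = share(x_private)  # A keeps x_a and sends x_b to B
y_a, y_b = share(y_private)  # B keeps y_b and sends y_a to A

# Each party computes locally on the shards it holds.
out_a = (x_a + y_a) % MODULUS  # party A's output shard
out_b = (x_b + y_b) % MODULUS  # party B's output shard

# Only if both output shards are disclosed is the result recovered.
result = (out_a + out_b) % MODULUS
```

Neither party sees the other's input, yet the disclosed result equals what the agreed function (here, addition) would return on the private inputs.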

Fig. 1 is a schematic diagram of an application scenario of data aggregation based on secret sharing according to some embodiments of the present description.

Referring to fig. 1, a plurality of data columns are vertically distributed between a first party (denoted as party A) and a second party (denoted as party B). Vertical distribution of data means that multiple parties hold information items of the same set of objects, but each party holds different information items. Suppose a two-dimensional table stores a plurality of information items of a plurality of objects, where each row corresponds to an object (or to an ID), the information items under each ID are collectively referred to as the record under that ID, and each column corresponds to an information item (also called a field). Vertical distribution is then equivalent to dividing the complete two-dimensional table vertically into a plurality of sections, where each section may comprise one or more data columns and the sections may be distributed over multiple parties. Under vertical distribution the IDs of the data may be aligned, i.e., the same row of every data column held by the parties corresponds to the same ID (object). In still other embodiments, vertical distribution of data may also mean that each element of the two-dimensional table is split into secret sharing shards, with the shards of each element held by multiple parties. In still other embodiments, vertical distribution of data may also refer to a mixture of the two aforementioned scenarios. For example, in the complete two-dimensional table, some columns are held by party A, other columns are held by party B, and some columns are held by both parties in the form of secret sharing shards, that is, party A holds the first shard of those columns and party B holds the second shard of those columns.

As shown in fig. 1, party A has the data columns corresponding to the three information items k1, k2 and k3, which serve as sort keys. Party B has the data column corresponding to information item k4, which serves as a sort key, and the data column to be aggregated (the column to be aggregated for short) corresponding to information item v. It is to be understood that the vertical distribution of data illustrated in fig. 1 is by way of example only. In some embodiments, any party may have the data column corresponding to only one sort key, or may have a plurality of data columns corresponding to a plurality of sort keys respectively, and the column to be aggregated may be held by any party. For convenience of description, data columns may be distinguished by key names, e.g., data columns k1, k2, k3, k4 and column v to be aggregated. Without loss of generality, this specification assumes that the second party (party B) is the party that owns the column v to be aggregated.

Parties A and B wish to cooperate securely to perform multi-key sorting and, based on the result of the multi-key sorting, to perform grouped aggregation on the column to be aggregated. In a real-world scenario, k1, k2, k3 and k4 may represent user features of different dimensions, v may be a field on which statistics are to be computed, and parties A and B want to cooperate on a data statistics task for a specific group of users, where the group consists of users whose features in all of these dimensions satisfy specific conditions (e.g., k1, k2, k3 and k4 are each equal to a specific value).
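The first form of vertical distribution (whole columns held by different parties, rows aligned by ID) can be illustrated with a small sketch. The table contents and the helper `record` are hypothetical; only the column names k1, k2 and v come from the description.

```python
# Complete two-dimensional table (hypothetical values): one row per
# object ID, one column per information item.
full_table = {
    "k1": [0, 1, 0],
    "k2": [1, 1, 0],
    "v": [10, 20, 30],
}

# Vertical distribution: each party holds different columns of the same objects.
party_a = {name: full_table[name] for name in ("k1", "k2")}
party_b = {name: full_table[name] for name in ("v",)}


def record(row):
    """Reassemble the record under one ID; valid because rows are ID-aligned."""
    merged = {}
    for columns in (party_a, party_b):
        for name, column in columns.items():
            merged[name] = column[row]
    return merged


second_record = record(1)
```

In the secret-shared form of vertical distribution, each party would instead hold one shard of every element rather than whole columns in plaintext.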

Fig. 2 is an exemplary flow diagram of a method for data aggregation based on secret sharing, according to some embodiments of the present description. The process 200 may be performed by either party.

The elements of each data column (e.g., the data columns corresponding to the sort keys and the column to be aggregated) have been arranged based on at least two information items serving as sort keys. That is, the elements of each data column are multi-key sorted. Multi-key sorting refers to sorting a sequence based on a plurality of (at least two) sort keys, so that the order of elements in the resulting sequence (i.e., the data column in step 210) is jointly determined by the plurality of sort keys. It will be appreciated that when several objects have the same value for every sort key, the positions (e.g., rows) of these objects in the result sequence are consecutive, and these objects or their associated data may be placed in the same group. Referring to fig. 3, the left side of the arrow shows the data columns corresponding to the information items k1, k2, k3 and k4 before multi-key sorting. These data columns undergo multiple rounds of iterative sorting (in ascending order) with k4, k3, k2 and k1 taken as the sort key in turn, and the result of the multi-key sorting is shown on the right side of the arrow. When several objects (distinguished by numbers) have the same k1, the same k2, the same k3 and the same k4, the rows of these objects in the sorted data columns k1, k2, k3, k4 are consecutive, and the objects (IDs) or their associated data (e.g., fields, records under the IDs) can be placed in the same group. As shown in fig. 3, the objects numbered 1 and 5 both have k1 = 0, k2 = 1, k3 = 0 and k4 = 1, so their rows in the result sequence are consecutive, and the objects numbered 1 and 5 or their associated data can be placed in the same group. Likewise, the objects numbered 0 and 4 both have k1 = 1, k2 = 1, k3 = 1 and k4 = 1, so their rows in the result sequence are consecutive, and the objects numbered 0 and 4 or their associated data can be placed in the same group.
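The iterative multi-key sorting can be sketched in plaintext as repeated stable sorts, from the least significant key (k4) to the most significant (k1). The key values of objects 0, 1, 4 and 5 follow the description above; the values for objects 2 and 3 and the initial row order are hypothetical, since the contents of fig. 3 are not reproduced in the text. In the actual scheme the sort runs on shards through a secret sharing operation.

```python
# object number -> (k1, k2, k3, k4)
rows = {
    0: (1, 1, 1, 1),
    1: (0, 1, 0, 1),
    2: (1, 0, 1, 0),  # hypothetical values
    3: (0, 0, 1, 1),  # hypothetical values
    4: (1, 1, 1, 1),
    5: (0, 1, 0, 1),
}

order = list(rows)  # assumed initial row order: 0, 1, 2, 3, 4, 5

# One stable ascending sort per key, taking k4, k3, k2, k1 in turn.
# Python's sorted() is stable, so ties keep their previous relative order.
for key_index in (3, 2, 1, 0):
    order = sorted(order, key=lambda obj: rows[obj][key_index])

# order is now [3, 1, 5, 2, 0, 4]: objects 1 and 5 are adjacent,
# and objects 0 and 4 are adjacent.
```

The result equals a single stable sort on the tuple (k1, k2, k3, k4), which is why objects that agree on every key end up in consecutive rows.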

For some aggregation operations that depend on ordering, such as finding the maximum, minimum or median, the data columns may first be sorted based on the information item corresponding to the data column to be aggregated (e.g., the column v to be aggregated in fig. 1), and then sorted and grouped based on the plurality of sort keys. It should be noted that, although in some embodiments the data column to be aggregated also serves as a basis for sorting, for the sake of distinction the information item corresponding to the data column to be aggregated is not referred to as a sort key. That is, when this specification refers to the data columns corresponding to the sort keys, it means data columns that do not serve as the data column to be aggregated. In some embodiments of the present description, sorting (e.g., in descending or ascending order) by a given sort key follows the principle of stable sorting, that is, two or more elements with equal values of that sort key keep, in the result sequence, the same relative order they had in the original data column. For example, in fig. 3, the elements numbered 1 and 5 in the k1 sequence are both 0, and in the result sequence (right side of the arrow) the element numbered 1 still precedes the element numbered 5.

The multi-key sorting can be performed through a secret sharing operation, after which either of the two parties holds shards of the multi-key sorted data columns. In particular, a shard of a data column may consist of shards of the elements in the data column.
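The benefit of sorting by the column to be aggregated first can be seen in a plaintext sketch: because the later sorts by the keys are stable, the values of v remain ordered inside every group, so the maximum, minimum and median of a group can be read off by position. The records below are hypothetical and use only two sort keys for brevity.

```python
# Each record is (k1, k2, v); the values are hypothetical.
records = [(0, 1, 5), (1, 0, 7), (0, 1, 2), (1, 0, 9), (0, 1, 8)]

# First sort by the column to be aggregated.
arranged = sorted(records, key=lambda r: r[2])

# Then apply stable sorts by the sort keys, least significant key first.
for key_index in (1, 0):
    arranged = sorted(arranged, key=lambda r: r[key_index])

# Within each (k1, k2) group, v is now ascending:
# [(0, 1, 2), (0, 1, 5), (0, 1, 8), (1, 0, 7), (1, 0, 9)]
group_max = {}
for k1, k2, v in arranged:
    group_max[(k1, k2)] = v  # the last row of a group carries its maximum
```

With this arrangement, the maximum of each group is simply the element just before the next group boundary.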

As shown in fig. 2, the process 200 may include the following steps:

Step 210, obtaining the fragments of the data columns corresponding to the sorting keys.

According to the foregoing, the data columns corresponding to the sorting keys obtained in step 210 are sorted by multiple keys.

Step 220, for each sorting key, performing secret sharing operation with another party based on the secret sharing fragment of the data column corresponding to the sorting key to obtain the fragment of the grouping mark column corresponding to the sorting key.

The elements of the grouping mark column (hereinafter also called the grouping flag column) indicate the grouping information of the elements in the data column corresponding to the sorting key. Because the data columns corresponding to the sorting keys are multi-key sorted, the order of the elements in the data column corresponding to any one sorting key is jointly determined by the plurality of sorting keys. In some embodiments, two or more consecutive elements having the same value in the data column corresponding to the sorting key belong to the same group. Of course, when a single element in the data column differs from both its previous element and its next element, that single element forms a group by itself.

Taking fig. 1 as an example, party A can obtain the shards <G1>_A, <G2>_A, <G3>_A, <G4>_A of the grouping flag columns corresponding to k1, k2, k3 and k4 respectively, and party B can obtain the shards <G1>_B, <G2>_B, <G3>_B, <G4>_B of the grouping flag columns corresponding to k1, k2, k3 and k4 respectively.

In some embodiments, the shards of the grouping flag column may be obtained through secret sharing comparison operations. Specifically, taking the data column k corresponding to the sorting key k as an example, for each pair of adjacent positions (i.e., adjacent rows) of the data column k, party A and party B may perform a secret sharing comparison operation based on the shards of the elements at that pair of adjacent positions, so as to obtain the shards of the grouping flag column corresponding to the sorting key k. In some embodiments, the grouping flag column may be equal in length to the data column k, and each element of the grouping flag column may indicate whether the corresponding element in the data column k is the same as its previous element. For example, a non-first (row) element of the grouping flag column may indicate whether the corresponding element in data column k is the same as its previous element; if not, it further indicates that there is a group boundary point between the corresponding element in data column k and its previous element, and otherwise it further indicates that there is no group boundary point between them. In particular, the first (row) element of the grouping flag column may be a preset value, for which reference may be made to the related description below. In some embodiments, the length of the grouping flag column may equal the number of pairs of adjacent positions in the data column k, i.e., each element of the grouping flag column corresponds to one pair of adjacent positions in the data column k and may indicate whether there is a group boundary point between the elements at that pair of adjacent positions.
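The two layouts of the grouping flag column can be sketched in plaintext as follows. The function names, the sample column and the encoding (1 for "same as the previous element", 0 for a group boundary, first element preset to 0) are assumptions for illustration; in the actual scheme each comparison is carried out on shards, so neither party sees the column or the flags.

```python
def flags_same_length(column, first_flag=0):
    """Flag column as long as the data column.

    Element i (i > 0) is 1 if column[i] equals column[i - 1], else 0.
    The first element is a preset value.
    """
    if not column:
        return []
    flags = [first_flag]
    for previous, current in zip(column, column[1:]):
        flags.append(1 if current == previous else 0)
    return flags


def flags_per_pair(column):
    """Flag column with one element per pair of adjacent positions."""
    return [1 if current == previous else 0
            for previous, current in zip(column, column[1:])]


column_k = [3, 3, 3, 5, 5, 7]  # hypothetical sorted data column
equal_length = flags_same_length(column_k)  # [0, 1, 1, 0, 1, 0]
one_shorter = flags_per_pair(column_k)      # [1, 1, 0, 1, 0]
```

In both layouts every 0 produced by a comparison marks a group boundary; the equal-length layout additionally marks the first row as the start of the first group.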

Fig. 4 illustrates an operator for implementing a secret sharing comparison operation (ciphertext comparison operator for short) and shows the input-output relationship of that operation. As shown in FIG. 4, in a two-party scenario, the inputs to the ciphertext comparison operator are two sets of shards, <M1>_A, <M2>_A and <M1>_B, <M2>_B, held by party A and party B respectively, where M1 and M2 are the two values (typically numerical values) involved in the comparison. Parties A and B each obtain a shard of the comparison result M (0 or 1, indicating whether M1 and M2 differ): party A obtains the shard <M>_A and party B obtains the shard <M>_B. It is understood that the expression rule of the comparison result in fig. 4 is only an example; for example, the expression rule may also be: when M1 = M2, M = 0; when M1 ≠ M2, M = 1. This specification does not limit the internal implementation of the ciphertext comparison operator, which is invoked as a black box; any existing or future algorithm that implements the foregoing comparison function can be used in the embodiments of this specification.

Fig. 5 and 6 provide examples of obtaining a grouping flag column based on a data column k when the ciphertext comparison operator shown in fig. 4 is applied.
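The input-output relationship of the ciphertext comparison operator can be mimicked with an idealized stand-in. The function `compare_ideal` below is not a secure protocol: it reconstructs the inputs internally, which a real operator must never do, and serves only to show that shards go in and shards of M come out. Additive sharing modulo `MODULUS`, the helper names, and the convention M = 1 when M1 = M2 are assumptions.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size


def share(value):
    """Return (shard for party A, shard for party B)."""
    shard_a = secrets.randbelow(MODULUS)
    return shard_a, (value - shard_a) % MODULUS


def reveal(shard_a, shard_b):
    return (shard_a + shard_b) % MODULUS


def compare_ideal(m1_shards, m2_shards):
    """Idealized black box: shards of M1 and M2 in, shards of M out.

    INSECURE stand-in for illustration only; a real operator computes the
    same output shards without ever reconstructing M1 or M2.
    """
    m1 = reveal(*m1_shards)
    m2 = reveal(*m2_shards)
    return share(1 if m1 == m2 else 0)


# Grouping flag column (equal-length layout, first element preset to 0)
# computed on a shared, already sorted data column.
column_k = [3, 3, 5]  # hypothetical values
column_shards = [share(x) for x in column_k]
flag_shards = [share(0)] + [
    compare_ideal(current, previous)
    for previous, current in zip(column_shards, column_shards[1:])
]
flags = [reveal(*s) for s in flag_shards]  # disclosed here only to check
```

Each party ends up with one shard per flag element, and the flags themselves stay hidden until the shards are disclosed.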

In the case that the grouping flag column is as long as the data column k, as shown in fig. 5, the row 2 element of the grouping flag column is 1, which indicates that the row 2 element of the data column k is the same as the row 1 element and further indicates that there is no group boundary point between the row 2 element and the row 1 element of the data column k; the row 3 element of the grouping flag column is 1, which indicates that the row 3 element of the data column k is the same as the row 2 element and further indicates that there is no group boundary point between the row 3 element and the row 2 element of the data column k; ...; the row 9 element of the grouping flag column is 0, which indicates that the row 9 element of the data column k is different from the row 8 element and further indicates that there is a group boundary point between the row 9 element and the row 8 element of the data column k; the row 10 element of the grouping flag column is 0, which indicates that the row 10 element of the data column k is different from the row 9 element and further indicates that there is a group boundary point between the row 10 element and the row 9 element of the data column k. It can be seen that the distribution of 0's in the grouping flag column reflects the distribution of the group boundary points in the data column k, or equivalently the distribution of the starting points of the individual groups in the data column k. In some embodiments, the row 1 element of the grouping flag column may be 0. It will be understood that the roles of 0 and 1 in the grouping flag column in the above embodiments may be interchanged.

In the case that the grouping flag column is not as long as the data column k, as shown in fig. 6, the row 1 element of the grouping flag column is 1, which indicates that there is no group boundary point between the row 1 element and the row 2 element of the data column k; the row 2 element of the grouping flag column is 1, which indicates that there is no group boundary point between the row 2 element and the row 3 element of the data column k; the row 3 element of the grouping flag column is 0, which indicates that there is a group boundary point between the row 3 element and the row 4 element of the data column k; ...; the row 8 element of the grouping flag column is 0, which indicates that there is a group boundary point between the row 8 element and the row 9 element of the data column k; the row 9 element of the grouping flag column is 0, which indicates that there is a group boundary point between the row 9 element and the row 10 element of the data column k. It can be seen that the distribution of 0's in the grouping flag column reflects the distribution of the group boundary points in the data column k. It will be understood that the roles of 0 and 1 in the grouping flag column in the above embodiments may be interchanged.

Step 230, performing secret sharing operation with another party based on the fragments of the grouping mark columns corresponding to the sorting keys to obtain the fragments of the multi-key grouping mark column.

The elements of the multi-key grouping flag column may indicate the joint grouping information, based on all of the sort keys, of the elements in each data column. When several objects have the same value for every sort key, the elements corresponding to these objects in a data column belong to the same (multi-key) group. Of course, a single element in a data column may also constitute a (multi-key) group by itself.

Because the multi-key grouping is determined jointly by a plurality of sorting keys, party A and party B can perform a secret sharing operation based on the fragments of the grouping flag columns corresponding to the sorting keys to obtain the fragments of the multi-key grouping flag column. Taking fig. 1 as an example, party A and party B may perform a secret sharing operation based on the fragments of the grouping flag columns corresponding to sorting keys k1, k2, k3, and k4 to obtain the fragments of the multi-key grouping flag column. The secret sharing operation here may refer to a secret sharing AND operation or a secret sharing OR operation.
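
The specification does not state how the secret sharing AND or OR operation is realized. As a hedged sketch of one common technique (not necessarily the one used in the embodiments), the following simulates two parties holding XOR shares of bits and computing AND with a Beaver triple. The trusted dealer that supplies the triple, the single-process simulation, and all function names are assumptions made for illustration.

```python
import secrets


def share(bit):
    """Split one bit into two XOR shares (one per party)."""
    r = secrets.randbelow(2)
    return (r, bit ^ r)


def reveal(sh):
    """Recombine two XOR shares into the plaintext bit."""
    return sh[0] ^ sh[1]


def and_shares(x, y):
    """AND of two XOR-shared bits using a Beaver triple."""
    # Assumption: a trusted dealer hands out shares of a random triple (a, b, a & b).
    a, b = secrets.randbelow(2), secrets.randbelow(2)
    a_sh, b_sh, c_sh = share(a), share(b), share(a & b)
    # The parties open the masked values d = x ^ a and e = y ^ b.
    d = (x[0] ^ a_sh[0]) ^ (x[1] ^ a_sh[1])
    e = (y[0] ^ b_sh[0]) ^ (y[1] ^ b_sh[1])
    # Each party computes its share locally; only party 0 adds the public term d & e.
    z0 = c_sh[0] ^ (d & b_sh[0]) ^ (e & a_sh[0]) ^ (d & e)
    z1 = c_sh[1] ^ (d & b_sh[1]) ^ (e & a_sh[1])
    return (z0, z1)


def or_shares(x, y):
    """OR of two XOR-shared bits, using x | y == x ^ y ^ (x & y)."""
    z = and_shares(x, y)
    return (x[0] ^ y[0] ^ z[0], x[1] ^ y[1] ^ z[1])


# Check the full truth table on freshly shared inputs.
for p in (0, 1):
    for q in (0, 1):
        assert reveal(and_shares(share(p), share(q))) == (p & q)
        assert reveal(or_shares(share(p), share(q))) == (p | q)
```

Applying `and_shares` or `or_shares` position by position to the fragments of the per-key grouping flag columns would yield fragments of the multi-key grouping flag column without revealing any individual flag.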

In some embodiments, the grouping flag column may be as long as the data column (their elements correspond one-to-one), and each element of the grouping flag column indicates whether the element at the same position in the data column is the same as its previous element. When the elements at a pair of adjacent positions are the same, the element of the grouping flag column at the position of the latter element is 1; otherwise, it is 0. With reference to the foregoing, a value of 0 means that there is a grouping boundary point between the elements at the pair of adjacent positions. The first element of the grouping flag column is 0. On this basis, the multi-key grouping flag column is equal to the bitwise AND of the grouping flag columns corresponding to the sorting keys. It will be appreciated that, since the output of an AND operation is 0 whenever any input bit is 0, the distribution of 0's in the multi-key grouping flag column still reflects the distribution of grouping boundary points in the data column. The grouping flag columns in fig. 7A to 7D are obtained based on the data columns (result sequences) shown in fig. 3. Referring to fig. 3 and 7A in combination, for a data column of length 6, a grouping flag column of length 6 may be obtained; on the left of the equation in fig. 7A are the grouping flag columns B1, B2, B3, B4 corresponding to sorting keys k1, k2, k3, k4, respectively, and their bitwise AND equals the multi-key grouping flag column B.
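
A minimal plaintext sketch of this AND combination is given below. The two key columns are hypothetical (they are not the data of fig. 3 or fig. 7A), only two sorting keys are used for brevity, and the helper names are illustrative.

```python
from functools import reduce
from operator import and_


def flags_same_length(column):
    """Flag i is 1 when row i equals row i-1; the first flag is 0."""
    return [0] + [int(column[i] == column[i - 1]) for i in range(1, len(column))]


def combine_and(flag_columns):
    """Bitwise AND of several flag columns, position by position."""
    return [reduce(and_, bits) for bits in zip(*flag_columns)]


# Hypothetical rows sorted by (k1, k2).
k1 = [1, 1, 1, 2, 2, 2]
k2 = [5, 5, 6, 6, 6, 7]
B1 = flags_same_length(k1)  # [0, 1, 1, 0, 1, 1]
B2 = flags_same_length(k2)  # [0, 1, 0, 1, 1, 0]
B = combine_and([B1, B2])
print(B)  # [0, 1, 0, 0, 1, 0]
```

A row keeps the flag 1 only when it agrees with its predecessor on every sorting key, so each 0 in `B` starts a new multi-key group: rows {0, 1}, {2}, {3, 4}, {5}.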

In some embodiments, the grouping flag column may be as long as the data column (their elements correspond one-to-one), and each element of the grouping flag column indicates whether the element at the same position in the data column is the same as its previous element. When the elements at a pair of adjacent positions are the same, the element of the grouping flag column at the position of the latter element is 0; otherwise, it is 1. With reference to the foregoing, a value of 1 means that there is a grouping boundary point between the elements at the pair of adjacent positions. The first element of the grouping flag column is 1. On this basis, the multi-key grouping flag column is equal to the bitwise OR of the grouping flag columns corresponding to the sorting keys. It will be appreciated that, since the output of an OR operation is 1 whenever any input bit is 1, the distribution of 1's in the multi-key grouping flag column still reflects the distribution of grouping boundary points in the data column. Referring to fig. 3 and 7B in combination, for a data column of length 6, a grouping flag column of length 6 may likewise be obtained; on the left of the equation in fig. 7B are the grouping flag columns B1, B2, B3, B4 corresponding to sorting keys k1, k2, k3, k4, respectively, and their bitwise OR equals the multi-key grouping flag column B.

In some embodiments, the length of the grouping flag column may equal the number of pairs of adjacent positions in the data column, i.e., each element of the grouping flag column corresponds to one pair of adjacent positions in the data column and indicates whether there is a grouping boundary point between the elements at the corresponding pair of adjacent positions. When the elements at a pair of adjacent positions are the same, the corresponding element in the grouping flag column is 1; otherwise, it is 0. With reference to the foregoing, a value of 0 means that there is a grouping boundary point between the elements at the pair of adjacent positions. On this basis, the multi-key grouping flag column is equal to the bitwise AND of the grouping flag columns corresponding to the sorting keys. It will be appreciated that, since the output of an AND operation is 0 whenever any input bit is 0, the distribution of 0's in the multi-key grouping flag column still reflects the distribution of grouping boundary points in the data column. Referring to fig. 3 and 7C in combination, for a data column of length 6, a grouping flag column of length 5 may be obtained; on the left of the equation in fig. 7C are the grouping flag columns B1, B2, B3, B4 corresponding to sorting keys k1, k2, k3, k4, respectively, and their bitwise AND equals the multi-key grouping flag column B.

In some embodiments, the length of the grouping flag column may equal the number of pairs of adjacent positions in the data column, i.e., each element of the grouping flag column corresponds to one pair of adjacent positions in the data column and indicates whether there is a grouping boundary point between the elements at the corresponding pair of adjacent positions. When the elements at a pair of adjacent positions are the same, the corresponding element in the grouping flag column is 0; otherwise, it is 1. With reference to the foregoing, a value of 1 means that there is a grouping boundary point between the elements at the pair of adjacent positions. On this basis, the multi-key grouping flag column is equal to the bitwise OR of the grouping flag columns corresponding to the sorting keys. It will be appreciated that, since the output of an OR operation is 1 whenever any input bit is 1, the distribution of 1's in the multi-key grouping flag column still reflects the distribution of grouping boundary points in the data column. Referring to fig. 3 and 7D in combination, for a data column of length 6, a grouping flag column of length 5 may be obtained; on the left of the equation in fig. 7D are the grouping flag columns B1, B2, B3, B4 corresponding to sorting keys k1, k2, k3, k4, respectively, and their bitwise OR equals the multi-key grouping flag column B.
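
The OR convention with one flag per adjacent pair can be sketched in plaintext as follows. The key columns and helper names are hypothetical and are not the data of fig. 3 or fig. 7D. The final check illustrates that the OR convention is simply the complement of the AND convention (De Morgan's law), which is why the 0 and 1 roles may be interchanged.

```python
from functools import reduce
from operator import and_, or_


def boundary_flags(column):
    """One flag per adjacent pair: 1 when the two elements differ (a boundary)."""
    return [int(column[i] != column[i + 1]) for i in range(len(column) - 1)]


def combine(flag_columns, op):
    """Combine several flag columns position by position with a bitwise operator."""
    return [reduce(op, bits) for bits in zip(*flag_columns)]


# Hypothetical rows sorted by (k1, k2).
k1 = [1, 1, 1, 2, 2, 2]
k2 = [5, 5, 6, 6, 6, 7]
B1, B2 = boundary_flags(k1), boundary_flags(k2)
B_or = combine([B1, B2], or_)
print(B_or)  # [0, 1, 1, 0, 1]

# De Morgan: OR of the flags equals NOT(AND of the inverted flags).
inverted = [[1 - f for f in col] for col in (B1, B2)]
B_and = combine(inverted, and_)
assert B_or == [1 - f for f in B_and]
```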

Step 240, disclosing the fragments of the multi-key grouping mark column to obtain the multi-key grouping mark column.

Disclosing the fragments of the multi-key grouping flag column means that the multi-key grouping flag column itself is disclosed. Disclosing the multi-key grouping flag column exposes only the joint grouping information of the elements in each data column based on the sorting keys (such as the number of groups and the number of elements in each group); the elements of the data columns themselves are not exposed. The disclosure of the multi-key grouping flag column is therefore an acceptable security concession in practical business scenarios.

In some embodiments, party A and party B may exchange their fragments of the multi-key grouping flag column so that each obtains the multi-key grouping flag column. In some embodiments, the two parties' fragments of the multi-key grouping flag column may be gathered at one party, such as party A, party B, or a third party, and the resulting multi-key grouping flag column may then be made available to either party.

In some embodiments, for the case where the length of the grouping flag column equals the number of pairs of adjacent positions in the data column, after the multi-key grouping flag column is obtained, a 0 or 1 may be added as a new head element at the head (before the original head element) of the multi-key grouping flag column, thereby converting it into the multi-key grouping flag column that would be obtained when the grouping flag column is as long as the data column; see fig. 7A to 7D for details.
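
This conversion amounts to prepending one element. A minimal sketch follows, with a hypothetical function name and sample flags; the head value follows the convention in use (0 when 0 marks a boundary, 1 when 1 marks a boundary), matching the first-element values stated for the equal-length embodiments.

```python
def to_full_length(pair_flags, head):
    """Prepend the head element so the flag column is as long as the data column."""
    if head not in (0, 1):
        raise ValueError("head must be 0 or 1")
    return [head] + list(pair_flags)


# AND convention (0 marks a boundary): the first row always starts a group.
print(to_full_length([1, 0, 0, 1, 0], head=0))  # [0, 1, 0, 0, 1, 0]
# OR convention (1 marks a boundary).
print(to_full_length([0, 1, 1, 0, 1], head=1))  # [1, 0, 1, 1, 0, 1]
```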

Step 250, obtaining, according to the multi-key grouping flag column, the aggregation result corresponding to each group of a data column to be aggregated among the data columns corresponding to the plurality of information items, and/or the fragments of the aggregation result corresponding to each group.

The multi-key grouping flag column may indicate joint grouping information of the elements in each data column (result sequence) based on the sorting keys, for example which objects' corresponding elements in the data column belong to the same group. With reference to the foregoing, in some embodiments, the distribution of 0's in the multi-key grouping flag column may reflect the distribution of grouping boundary points in the data column. In some embodiments, the distribution of 1's in the multi-key grouping flag column may reflect the distribution of grouping boundary points in the data column. Further, the grouping situation can be determined from the distribution of the grouping boundary points. Fig. 8A and 8B provide two examples of determining the grouping based on a multi-key grouping flag column. As shown in fig. 8A, for a data column of length 6, a multi-key grouping flag column B of length 6 can be obtained; the distribution of 1's in the multi-key grouping flag column B reflects the distribution of the grouping boundary points in the data column, and based on the multi-key grouping flag column B, a grouping situation represented by a group number column in ascending order from top to bottom can be obtained, in which the 6 objects (or their associated data) numbered 0 to 5 are divided into 4 groups. The element in the same row as a bold group number can be regarded as the first element of its group. As shown in fig. 8B, for a data column of length 6, a multi-key grouping flag column B of length 5 may be obtained; the distribution of 1's in the multi-key grouping flag column B reflects the distribution of grouping boundary points in the data column, and the grouping situation obtained from the multi-key grouping flag column B is the same as that in fig. 8A and is not described again here.

The secret sharing-based multi-key grouping information acquisition method provided by the embodiments of this specification provides intra-group aggregation with a security concession by disclosing the fragments of the multi-key grouping flag column, and can accomplish multi-party joint data tasks while ensuring data security.
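
As a hedged illustration of how a group number column of the kind described for fig. 8A might be derived from a disclosed multi-key grouping flag column, consider the sketch below. The function name, the `boundary` parameter, and the sample flags are assumptions; the flags are not those of fig. 8A or fig. 8B.

```python
from itertools import accumulate


def group_numbers(flags, boundary=1):
    """Map a full-length flag column to ascending group numbers starting at 0.

    flags[i] == boundary means row i starts a new group; row 0 always does.
    """
    if not flags:
        return []
    starts = [int(f == boundary) for f in flags]
    starts[0] = 1  # the first row always opens the first group
    return [n - 1 for n in accumulate(starts)]


# Hypothetical flag column in which 1 marks the first row of each group.
B = [1, 0, 1, 1, 0, 1]
print(group_numbers(B))  # [0, 0, 1, 2, 2, 3]

# A length n-1 flag column is handled by first prepending the head element.
pair_flags = [0, 1, 1, 0, 1]
print(group_numbers([1] + pair_flags))  # [0, 0, 1, 2, 2, 3]
```

The running sum of group starts gives each row its group number, so six rows with four group starts fall into four groups.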

In some embodiments, the aggregation result may include the result of one or more of the following operations on the elements within a group: summing, averaging, counting, finding the maximum, finding the median, finding the minimum, and the like. That is, the aggregation may refer to one or more of summing, averaging, counting, finding the maximum, finding the median, finding the minimum, and the like.

For intra-group counting, the number of elements in each group of the data column to be aggregated (i.e., the column to be aggregated) may be obtained from the multi-key grouping flag column. Referring to fig. 8A or fig. 8B, the number of elements in each group of the column to be aggregated may be determined according to the grouping situation (the group number corresponding to each object) derived from the multi-key grouping flag column, where the number of occurrences of each group number is the number of elements of the corresponding group.
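
Intra-group counting needs only the disclosed flag column. A minimal sketch, assuming a full-length flag column in which a configurable value marks the first row of each group (the function name and sample flags are hypothetical):

```python
def group_sizes(flags, boundary=1):
    """Return the number of rows in each group, in group order."""
    sizes = []
    for i, f in enumerate(flags):
        if i == 0 or f == boundary:
            sizes.append(1)   # this row opens a new group
        else:
            sizes[-1] += 1    # this row extends the current group
    return sizes


B = [1, 0, 1, 1, 0, 1]  # hypothetical flags, 1 marks a group start
print(group_sizes(B))   # [2, 1, 2, 1]
```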

When the data column to be aggregated is held by party A and party B in the form of fragments, each party can perform operations such as summation and averaging on its fragments of the elements of the same group in the data column to be aggregated, according to the grouping situation, to obtain the fragments of the aggregation result corresponding to each group. For example, for summation, each party may sum its fragments of the elements of the same group in the column to be aggregated, according to the grouping situation, to obtain a fragment of the sum corresponding to each group. For operations that depend on ordering, such as finding the maximum, the median, or the minimum, because the preceding sorting has also taken the data column to be aggregated into account, the fragments of the elements of the data column to be aggregated are likewise ordered within each group, so the fragment of the maximum, of the median, or of the minimum can be located directly within the group.
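
The local summation of fragments can be sketched as follows. The sketch assumes additive secret sharing modulo 2^64, which the specification does not state, and simulates both parties in one process; the names, values, and group numbers are hypothetical.

```python
import secrets

MOD = 2 ** 64  # assumed share ring; not specified in the source


def share_column(values):
    """Split each value into two additive shares modulo MOD."""
    a = [secrets.randbelow(MOD) for _ in values]
    b = [(v - r) % MOD for v, r in zip(values, a)]
    return a, b


def local_group_sums(shares, groups):
    """Each party sums its own shares per group, with no communication."""
    sums = {}
    for s, g in zip(shares, groups):
        sums[g] = (sums.get(g, 0) + s) % MOD
    return sums


values = [10, 20, 5, 7, 8, 1]  # column to be aggregated (never seen in the clear)
groups = [0, 0, 1, 2, 2, 3]    # group numbers from the disclosed flag column
a, b = share_column(values)
sum_a, sum_b = local_group_sums(a, groups), local_group_sums(b, groups)

# Recombining the two result fragments yields the per-group sums.
totals = {g: (sum_a[g] + sum_b[g]) % MOD for g in sum_a}
print(totals)  # {0: 30, 1: 5, 2: 15, 3: 1}
```

Because addition is linear over additive shares, neither party needs to interact with the other until the result fragments are combined.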

When the data to be aggregated is the private data of the party A or the party B, the party A or the party B can perform any aggregation operation according to the grouping condition to obtain the aggregation result of each group.
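
When the column is private plaintext held by one party, the aggregation is an ordinary local group-by. A minimal sketch using the Python standard library, with hypothetical names and sample data, and assuming the group numbers have already been derived from the disclosed multi-key grouping flag column:

```python
from itertools import groupby
from statistics import mean, median


def aggregate(values, groups, fn):
    """Apply fn to the values of each group; groups must be in ascending runs."""
    pairs = zip(groups, values)
    return {g: fn([v for _, v in run]) for g, run in groupby(pairs, key=lambda p: p[0])}


values = [10, 20, 5, 7, 8, 1]  # private column of one party
groups = [0, 0, 1, 2, 2, 3]    # group numbers from the disclosed flag column

print(aggregate(values, groups, sum))     # {0: 30, 1: 5, 2: 15, 3: 1}
print(aggregate(values, groups, max))     # {0: 20, 1: 5, 2: 8, 3: 1}
print(aggregate(values, groups, min))     # {0: 10, 1: 5, 2: 7, 3: 1}
print(aggregate(values, groups, len))     # {0: 2, 1: 1, 2: 2, 3: 1}
print(aggregate(values, groups, median))  # {0: 15.0, 1: 5, 2: 7.5, 3: 1}
```

`statistics.mean` can be passed in the same way for averaging.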

It should be noted that although this specification mainly takes a two-party scenario as an example, its principles can be generalized to a multi-party scenario (three or more parties). In a multi-party scenario, any two parties may act as a pair of counterparties to perform the aforementioned processes of party A and party B. In addition, when the data column to be aggregated is private to a single party, the other party may execute only steps 210 to 240 of the flow 200. It should further be noted that steps 210 to 230 of the flow 200 are also applicable to a scenario in which both parties hold fragments of the data columns corresponding to the sorting keys and need to cooperatively obtain the fragments of the multi-key grouping flag column. In this case, the fragments of the multi-key grouping flag column obtained by the two parties can serve as known inputs to other multi-party secure computing tasks, helping the two parties to complete those tasks.

Fig. 9 is an exemplary block diagram of a multi-key grouping information acquisition system based on secret sharing according to some embodiments of the present description. System 900 may be implemented in any of a number of parties.

As shown in fig. 9, the system 900 may include an obtaining module 910, a first secret sharing operation module 920, and a second secret sharing operation module 930.

The obtaining module 910 may be configured to obtain the fragments of the data columns corresponding to the sorting keys.

The first secret sharing operation module 920 may be configured to: for each sorting key, perform a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key, so as to obtain the fragment of the grouping flag column corresponding to the sorting key.

The second secret sharing operation module 930 may be configured to perform a secret sharing operation with the other parties based on the fragments of the grouping flag columns corresponding to the sorting keys, so as to obtain the fragments of the multi-key grouping flag column.

Fig. 10 is an exemplary block diagram of a data aggregation system based on secret sharing, shown in accordance with some embodiments of the present description. System 1000 can be implemented in any of a number of parties.

As shown in fig. 10, the system 1000 may include an obtaining module 1010, a first secret sharing operation module 1020, a second secret sharing operation module 1030, a disclosure module 1040, and an aggregation module 1050.

The obtaining module 1010 may be configured to obtain the fragments of the data columns corresponding to the sorting keys.

The first secret sharing operation module 1020 may be configured to: for each sorting key, perform a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key, so as to obtain the fragment of the grouping flag column corresponding to the sorting key.

The second secret sharing operation module 1030 may be configured to perform a secret sharing operation with the other parties based on the fragments of the grouping flag columns corresponding to the sorting keys, so as to obtain the fragments of the multi-key grouping flag column.

The disclosure module 1040 may be configured to disclose the fragments of the multi-key grouping flag column to obtain the multi-key grouping flag column.

The aggregation module 1050 may be configured to obtain, according to the multi-key grouping flag column, the aggregation result corresponding to each group of a data column to be aggregated among the data columns corresponding to the plurality of information items, and/or the fragments of the aggregation result corresponding to each group.

For more details on the system 900, the system 1000, and the modules thereof, reference may be made to fig. 2 and its associated description.

It should be understood that the systems shown in fig. 9 and fig. 10 and their modules may be implemented in various ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of such hardware circuits and software (e.g., firmware).

It should be noted that the above description of the system and its modules is for convenience of description only and does not limit this specification to the illustrated embodiments. It will be appreciated by those skilled in the art that, having understood the principle of the system, the modules may be combined arbitrarily, or a subsystem may be formed and connected to other modules, without departing from that principle. For example, in some embodiments, the first secret sharing operation module 1020 (or 920) and the second secret sharing operation module 1030 (or 930) may be two separate modules or may be combined into one module. Such variations are within the scope of this specification.

The beneficial effects that may be brought by the embodiments of this specification include, but are not limited to: (1) multi-party joint data tasks are accomplished while data security is ensured; (2) intra-group aggregation with a security concession is provided by disclosing the fragments of the multi-key grouping flag column; (3) the fragments of the grouping flag columns corresponding to the sorting keys can be computed in parallel to improve computational efficiency; (4) after the multi-key grouping flag column is obtained in plaintext form, the intra-group aggregation operation can be carried out locally at any party without multi-party cooperation, which greatly improves aggregation efficiency. It should be noted that different embodiments may produce different advantages; in different embodiments, any one or a combination of the above advantages, or any other advantage, may be obtained.

Having thus described the basic concepts, it will be apparent to those skilled in the art that the foregoing detailed disclosure is merely illustrative and does not limit the embodiments of this specification. Although not explicitly stated here, various modifications, improvements and adaptations to the embodiments of this specification may occur to those skilled in the art. Such modifications, improvements and adaptations are suggested by the embodiments of this specification and thus fall within the spirit and scope of its exemplary embodiments.

Also, this specification uses specific words to describe its embodiments. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of this specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the embodiments of this specification may be illustrated and described in terms of several patentable categories or situations, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the embodiments of this specification may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the embodiments of this specification may take the form of a computer product, including computer-readable program code, embodied in one or more computer-readable media.

The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Computer program code required for the operation of various portions of the embodiments of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as Software as a Service (SaaS).

In addition, unless explicitly stated in the claims, the order of processing elements and sequences, use of numbers and letters, or use of other names in the embodiments of the present specification are not intended to limit the order of the processes and methods in the embodiments of the present specification. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments of the invention. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.

For each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification, the entire contents are hereby incorporated by reference into this specification, except for application history documents that are inconsistent with or conflict with the contents of this specification, and except for documents (currently or later appended to this specification) that limit the broadest scope of the claims of this specification. It should be noted that if the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent with or contrary to those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are possible within the scope of the embodiments of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims

1. A multi-key grouping information acquisition method based on secret sharing, wherein data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, elements of each data column are arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object; the method is performed by one of the parties and comprises:

obtaining secret sharing fragments of the data columns corresponding to the sorting keys;

for each sorting key, performing a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key to obtain the secret sharing fragment of the grouping flag column corresponding to the sorting key, wherein the elements of the grouping flag column indicate grouping information of the elements in the data column corresponding to the sorting key;

and performing a secret sharing operation with the other parties based on the secret sharing fragments of the grouping flag columns corresponding to the sorting keys to obtain the secret sharing fragment of the multi-key grouping flag column, wherein the elements of the multi-key grouping flag column indicate joint grouping information, based on the sorting keys, of the elements in the data columns corresponding to the sorting keys.

2. The method of claim 1, wherein, for each sorting key, performing a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key to obtain the secret sharing fragment of the grouping flag column corresponding to the sorting key comprises, for each pair of adjacent positions of the data column:

performing a secret sharing comparison operation with the other parties based on the secret sharing fragments of the elements at the pair of adjacent positions to obtain the secret sharing fragment of the grouping flag column; the elements of the grouping flag column indicate whether the corresponding element in the data column is identical to its previous element.

3. The method according to claim 2, wherein when the elements at the pair of adjacent positions are the same, the element of the grouping flag column at the position corresponding to the latter element is 1, and otherwise is 0; the first element of the grouping flag column is 0;

the multi-key grouping flag column is equal to the bitwise AND of the grouping flag columns corresponding to the sorting keys.

4. The method according to claim 2, wherein when the elements at the pair of adjacent positions are the same, the element of the grouping flag column at the position corresponding to the latter element is 0, and otherwise is 1; the first element of the grouping flag column is 1;

the multi-key grouping flag column is equal to the bitwise OR of the grouping flag columns corresponding to the sorting keys.

5. The method of claim 1, wherein, for each sorting key, performing a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key to obtain the secret sharing fragment of the grouping flag column corresponding to the sorting key comprises, for each pair of adjacent positions of the data column:

performing a secret sharing comparison operation with the other parties based on the secret sharing fragments of the elements at the pair of adjacent positions to obtain the secret sharing fragment of the grouping flag column; the elements of the grouping flag column indicate whether there is a grouping boundary point between the elements at the corresponding pair of adjacent positions.

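6. The method of claim 5, wherein when the elements at the pair of adjacent positions are the same, the corresponding element in the grouping flag column is 1, and otherwise is 0;

the multi-key grouping flag column is equal to the bitwise AND of the grouping flag columns corresponding to the sorting keys.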

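7. The method of claim 5, wherein when the elements at the pair of adjacent positions are the same, the corresponding element in the grouping flag column is 0, and otherwise is 1;

the multi-key grouping flag column is equal to the bitwise OR of the grouping flag columns corresponding to the sorting keys.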

8. A multi-key grouping information acquisition system based on secret sharing, wherein data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, elements of each data column are arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object; the system is implemented in one of the parties and comprises:

an obtaining module configured to obtain the secret sharing fragments of the data columns corresponding to the sorting keys;

a first secret sharing operation module configured to: for each sorting key, perform a secret sharing operation with the other parties based on the secret sharing fragment of the data column corresponding to the sorting key to obtain the secret sharing fragment of the grouping flag column corresponding to the sorting key, wherein the elements of the grouping flag column indicate grouping information of the elements in the data column corresponding to the sorting key;

and a second secret sharing operation module configured to perform a secret sharing operation with the other parties based on the secret sharing fragments of the grouping flag columns corresponding to the sorting keys to obtain the secret sharing fragment of the multi-key grouping flag column, wherein the elements of the multi-key grouping flag column indicate joint grouping information, based on the sorting keys, of the elements in the data columns corresponding to the sorting keys.

9. A secret sharing-based multi-key grouping information acquisition apparatus, comprising a processor and a storage device, wherein the storage device is configured to store instructions, and the processor, when executing the instructions, implements the method according to any one of claims 1-7.
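
As a hedged illustration of how per-key grouping mark columns might be combined into a multi-key grouping mark column, the sketch below simulates two parties holding additive fragments and multiplies the marks element by element. It assumes the convention in which 1 means "same group as the neighbor", so the product is 1 only when every sorting key agrees. The ring size `MOD`, the helper names, and the use of Beaver-triple multiplication with a locally simulated dealer are assumptions of this sketch, not details taken from the claims.

```python
import secrets

MOD = 2**64  # hypothetical ring size for additive sharing


def share(x):
    """Split x into two additive fragments modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reveal(s0, s1):
    """Recombine two additive fragments."""
    return (s0 + s1) % MOD


def beaver_mul(x, y):
    """Multiply two shared values; x and y are (fragment0, fragment1) pairs.

    The triple (a, b, c = a*b) is generated here by a simulated trusted dealer.
    """
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % MOD)
    # The parties open the masked differences d = x - a and e = y - b.
    d = reveal((x[0] - a_sh[0]) % MOD, (x[1] - a_sh[1]) % MOD)
    e = reveal((y[0] - b_sh[0]) % MOD, (y[1] - b_sh[1]) % MOD)
    # x*y = c + d*b + e*a + d*e; only one party adds the public term d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1


def combine_marks(shared_mark_columns):
    """Element-wise product of the shared per-key mark columns."""
    joint = shared_mark_columns[0]
    for column in shared_mark_columns[1:]:
        joint = [beaver_mul(j, m) for j, m in zip(joint, column)]
    return joint


# Per-key marks (1 = adjacent elements equal under that key), shared between two parties.
key1_marks = [1, 1, 0, 1]
key2_marks = [1, 0, 1, 1]
shared = [[share(v) for v in col] for col in (key1_marks, key2_marks)]
joint = combine_marks(shared)
print([reveal(*s) for s in joint])  # [1, 0, 0, 1]
```

Under the opposite convention, in which 1 marks a boundary, the same building block could be applied to the complemented marks so that the joint mark is 1 whenever any key changes.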

10. A data aggregation method based on secret sharing, wherein data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, elements of each data column are arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object; the method is performed by one of the parties and comprises:

performing a secret sharing operation with the other parties based on the secret sharing fragment of the grouping mark column corresponding to each sorting key to obtain the secret sharing fragment of the multi-key grouping mark column, wherein elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys;

disclosing the secret sharing fragment of the multi-key grouping mark column to obtain the multi-key grouping mark column;

and obtaining, according to the multi-key grouping mark column, an aggregation result corresponding to each group of a data column to be aggregated among the data columns corresponding to the plurality of information items, and/or a secret sharing fragment of the aggregation result corresponding to each group.

11. The method of claim 10, wherein the aggregation result comprises a result of one or more of the following operations on the elements within a group: summing, averaging, counting, maximizing, and minimizing.
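
The aggregation step can be sketched as follows, assuming the multi-key grouping mark column has already been disclosed and uses the convention in which 1 means "same group as the previous element". The function names, the ring size `MOD`, and the sample data are hypothetical. Because summation is linear, each party can add up its own fragments group by group, which yields a fragment of each group's sum without further interaction; the other listed operations are shown on plaintext only.

```python
import secrets

MOD = 2**64  # hypothetical ring size for additive sharing


def split_groups(values, marks):
    """Cut a non-empty column into groups using a disclosed mark column.

    marks[i] == 1 means values[i] and values[i + 1] belong to the same group.
    """
    groups = [[values[0]]]
    for value, mark in zip(values[1:], marks):
        if mark == 1:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


def aggregate_plain(values, marks):
    """Per-group sum, count, mean, max and min on plaintext values."""
    return [
        {"sum": sum(g), "count": len(g), "mean": sum(g) / len(g),
         "max": max(g), "min": min(g)}
        for g in split_groups(values, marks)
    ]


def aggregate_share_sums(fragment_column, marks):
    """One party's local step: add its own fragments within each group."""
    return [sum(g) % MOD for g in split_groups(fragment_column, marks)]


marks = [1, 0, 1, 1, 0]            # disclosed multi-key grouping mark column
amounts = [10, 20, 5, 7, 9, 3]     # column to be aggregated (plaintext stand-in)

# Two-party additive fragments of the column to be aggregated.
party0 = [secrets.randbelow(MOD) for _ in amounts]
party1 = [(v - r) % MOD for v, r in zip(amounts, party0)]

sums0 = aggregate_share_sums(party0, marks)
sums1 = aggregate_share_sums(party1, marks)
print([(a + b) % MOD for a, b in zip(sums0, sums1)])  # [30, 21, 3]
```

Maximum and minimum are not linear, so on fragments they would require secret sharing comparison operations rather than local addition.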

12. A data aggregation system based on secret sharing, wherein data columns corresponding to a plurality of information items of a plurality of objects are vertically distributed among a plurality of parties, elements of each data column are arranged based on at least two information items serving as sorting keys, and the elements at the same position in each data column correspond to the same object; the system is implemented in one of the parties and comprises:

a second secret sharing operation module configured to perform a secret sharing operation with the other parties based on the secret sharing fragment of the grouping mark column corresponding to each sorting key to obtain the secret sharing fragment of the multi-key grouping mark column, wherein elements of the multi-key grouping mark column indicate joint grouping information, based on all of the sorting keys, of the elements in the data columns corresponding to the sorting keys;

a disclosure module configured to disclose the secret sharing fragment of the multi-key grouping mark column to obtain the multi-key grouping mark column;

and an aggregation module configured to obtain, according to the multi-key grouping mark column, an aggregation result corresponding to each group of a data column to be aggregated among the data columns corresponding to the plurality of information items, and/or a secret sharing fragment of the aggregation result corresponding to each group.

13. A data aggregation apparatus based on secret sharing, comprising a processor and a storage device, wherein the storage device is configured to store instructions, and the processor, when executing the instructions, implements the method of claim 10 or 11.