CN116522390A

CN116522390A - Data set processing method, electronic device and storage medium

Info

Publication number: CN116522390A
Application number: CN202310483611.5A
Authority: CN
Inventors: 薛原; 金凡; 秘相友
Original assignee: China Financial Certification Authority Co ltd
Current assignee: China Financial Certification Authority Co ltd
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-08-01

Abstract

The application relates to a data set processing method, an electronic device and a storage medium. The method comprises the following steps: forming a dictionary tree corresponding to each data holder in response to a dictionary tree construction instruction; determining the root node of each dictionary tree as the current query node of the corresponding data holder, and marking that node as an access completion node; triggering each data holder to execute a parameter processing interaction event in response to a parameter processing instruction; determining whether to record the character string attribute value of the current query node according to the target decryption parameter determined by the parameter processing interaction event; and updating the current query node of each data holder until the number of recorded character string attribute values reaches a preset number or the current query node cannot be updated, and then transmitting all the recorded character string attribute values to each data holder. The method allows a plurality of data holders to participate in finding a specific number of intersection elements, and finds that specific number of intersection elements efficiently while ensuring the information security of the data holders.

Description

Data set processing method, electronic device and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a data set processing method, an electronic device, and a storage medium.

Background

When parties or individuals that each hold a private data set cooperate to find a specific number of intersection elements among those data sets, the related art obtains all the intersection elements through conventional private set intersection (PSI) technology, which may leak redundant intersection elements and makes the search inefficient. In addition, the application scenario may place additional constraints on the required intersection elements, for example, selecting a convenient meeting time that is as early as possible without revealing the respective schedules of the participants.

In view of this, there is a need to propose a data set processing method capable of allowing a plurality of data holders to participate in finding a specific number of intersection elements, so as to efficiently find a specific number of intersection elements while securing information of a plurality of data holders.

Disclosure of Invention

In order to overcome the problems in the related art, the application provides a data set processing method, an electronic device and a storage medium, wherein the data set processing method can allow a plurality of data holders to participate in finding out a specific number of intersection elements, and the specific number of intersection elements can be efficiently found out under the condition of ensuring the information security of the plurality of data holders.

A first aspect of the present application provides a data set processing method, including:

triggering each data holder to form a dictionary tree corresponding to each data holder based on the data set held by each data holder in response to the dictionary tree construction instruction; respectively determining a root node in a dictionary tree correspondingly generated by each data holder as a current query node of each data holder, and marking the current query node of each data holder as an access completion node; responding to the parameter processing instruction to trigger each data holder to execute a parameter processing interaction event, wherein the parameter processing interaction event is used for determining a target decryption parameter corresponding to the current query node of the first data holder; determining whether to record the character string attribute value of the current query node according to the target decryption parameters; updating the current query node of each data holder and marking the updated current query node as an access completion node; determining whether to record the character string attribute value of the updated current query node according to the target decryption parameters corresponding to the updated current query node of each data holder; and until the record quantity of the character string attribute values reaches the preset quantity or the current query node of any data holder cannot be updated, sending all the recorded character string attribute values to each data holder.

In one embodiment, the parameter processing instructions include a first processing instruction and a second processing instruction; in triggering each data holder to execute a parameter handling interactivity event in response to a parameter handling instruction, the parameter handling interactivity event comprises: triggering each data holder to determine the state encryption parameters corresponding to each data holder based on the current query node of the dictionary tree and the state attribute parameters of each current query node in response to the first processing instruction; and triggering each data holder to determine a target decryption parameter corresponding to the current query node of the first data holder based on the state encryption parameter corresponding to each data holder respectively in response to the second processing instruction.

In one embodiment, the first processing instruction includes a key pair generation instruction and a parameter encryption instruction; triggering each data holder, in response to the first processing instruction, to determine the state encryption parameter corresponding to each data holder based on the current query node of its dictionary tree and the state attribute parameter of that current query node comprises: triggering the first data holder to generate a preset private key and a preset public key in response to the key pair generation instruction, and triggering the first data holder to send the preset public key to the other data holders; and triggering each data holder, in response to the parameter encryption instruction, to encrypt the state attribute parameter of its current query node with the preset public key to obtain the state encryption parameter corresponding to each data holder.

In one embodiment, the second processing instructions include parameter operation instructions and parameter decryption instructions; triggering each data holder to determine a target decryption parameter corresponding to the current query node of the first data holder based on the state encryption parameter corresponding to each data holder respectively in response to the second processing instruction comprises: triggering each data holder to determine a target encryption parameter in a second data holder based on the state encryption parameter corresponding to each data holder respectively in response to the parameter operation instruction; and responding to the parameter decryption instruction, triggering the second data holder to send the target encryption parameter to the first data holder, and triggering the first data holder to decrypt the target encryption parameter through a preset private key to obtain the target decryption parameter.

In one embodiment, triggering each data holder in response to the parameter operation instruction to determine the target encryption parameter in the second data holder based on the state encryption parameter respectively corresponding to each data holder includes: and multiplying the state encryption parameters corresponding to each data holder respectively in response to the parameter operation instruction to obtain the target encryption parameters.

In one embodiment, updating the current query node for each data holder includes: and responding to the query node update instruction to trigger each data holder to execute an array processing interaction event, wherein the array processing interaction event is used for updating the current query node of each data holder.

In one embodiment, in triggering each data holder to execute an array processing interactivity event in response to a query node update instruction, the array processing interactivity event comprises: triggering each data holder to encrypt each array element of the child node array of the current query node of each data holder through a preset public key in response to an array encryption instruction to obtain an initial encryption array corresponding to each data holder; the number of the array elements of the child node array is used for indicating whether the characters corresponding to the child node of the current query node are identical to the characters in the character set or not; triggering each data holder to determine a target encryption array in a second data holder based on the initial encryption arrays respectively corresponding to each data holder in response to the array operation instruction; the second data holder is triggered to send the target encrypted array to the first data holder in response to the array decryption instruction, and the first data holder is triggered to decrypt the target encrypted array through a preset private key to obtain a target decryption array; and determining the updated current query node according to the array elements in the target decryption array.

In one embodiment, determining the updated current query node according to the array elements in the target decryption array includes: if the target decryption array contains target array elements greater than zero, and among the child nodes whose characters are the characters in the character set corresponding to those target array elements there is a child node not yet marked as an access completion node, determining as the updated current query node the unmarked child node whose character corresponds to the target array element with the smallest array element number; if the target decryption array contains target array elements greater than zero but all such child nodes have already been marked as access completion nodes, or if the array elements in the target decryption array are all zero, judging whether the current query node is the root node; if the current query node is the root node, determining that the query for intersection elements among the data holders is complete, and executing the step of transmitting all recorded character string attribute values to each data holder; if the current query node is not the root node, determining the updated current query node according to the array elements in the target decryption array corresponding to the parent node of the current query node.

In one embodiment, triggering each data holder in response to the array operation instruction to determine a target encryption array in the second data holder based on the initial encryption array respectively corresponding to each data holder includes: and multiplying each array element in the initial encryption array corresponding to each data holder according to the array element number in a one-to-one correspondence manner in response to the array operation instruction to obtain the target encryption array.

In one embodiment, determining whether to record the string property value of the current query node based on the target decryption parameter comprises: and if the target decryption parameter is a preset parameter, determining to record the character string attribute value of the current query node.

In one embodiment, triggering each data holder to form its corresponding dictionary tree based on the data set it holds in response to the dictionary tree construction instruction includes: triggering each data holder to form an initial dictionary tree based on the data set it holds in response to the dictionary tree construction instruction; triggering each data holder to generate a plurality of random character strings in response to a dictionary tree protection instruction, and triggering each data holder to insert the generated random character strings into its initial dictionary tree to form the dictionary tree corresponding to that data holder; if the end tree node corresponding to the last character of a random character string already exists in the initial dictionary tree, the state attribute parameter of the end tree node is not updated; and if the end tree node corresponding to the last character of the random character string does not exist in the initial dictionary tree, the state attribute parameter of the end tree node is set to a preset distinguishing parameter.

A second aspect of the present application provides an electronic device, comprising:

a processor; and

a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method as described above.

A third aspect of the present application provides a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described above.

The technical solution provided by this application may have the following beneficial effects:

according to the data set processing method, the electronic device and the storage medium, each data holder is triggered to form the dictionary tree corresponding to each data holder based on the data set held by each data holder in response to the dictionary tree construction instruction, so that root nodes in the dictionary tree generated by each data holder correspondingly are respectively determined to be current query nodes of each data holder, and the current query nodes of each data holder are marked to be access completion nodes. And further responding to the parameter processing instruction to trigger each data holder to execute a parameter processing interaction event, wherein the parameter processing interaction event is used for determining a target decryption parameter corresponding to the current query node of the first data holder, so as to determine whether to record the character string attribute value of the current query node according to the target decryption parameter. Further updating the current query node of each data holder, marking the updated current query node as an access completion node, determining whether to record the character string attribute values of the updated current query node according to the target decryption parameters corresponding to the updated current query node of each data holder until the record quantity of the character string attribute values reaches the preset quantity or the current query node of any data holder cannot be updated, and transmitting all the recorded character string attribute values to each data holder. Thus, a plurality of data holders can be allowed to participate in finding out a specific number of intersection elements, and the specific number of intersection elements can be efficiently found out while ensuring the information security of the plurality of data holders.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. In the drawings, several embodiments of the present application are shown by way of example and not by way of limitation, and identical or corresponding reference numerals indicate identical or corresponding parts.

FIG. 1 is a first flow chart of a data set processing method according to an embodiment of the present disclosure;

FIG. 2 is a second flow chart of a data set processing method according to an embodiment of the present disclosure;

FIG. 3 is a third flow chart of a data set processing method according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments will now be described with reference to the accompanying drawings. It will be appreciated that for simplicity and clarity of illustration, where considered appropriate, reference numerals have been repeated among the figures to indicate corresponding or analogous elements. Furthermore, the present application sets forth numerous specific details in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the embodiments described herein. Moreover, this description should not be taken as limiting the scope of the embodiments described herein.

Obtaining all the intersection elements in the private data sets held by the parties through conventional private set intersection (PSI) technology may leak redundant intersection elements and makes the search inefficient. In addition, the application scenario may place additional constraints on the required intersection elements, for example, selecting a convenient meeting time that is as early as possible without revealing the respective schedules of the participants. In view of this, there is a need for a data set processing method that allows a plurality of data holders to participate in finding a specific number of intersection elements, so that the specific number of intersection elements can be found efficiently while the information security of the data holders is ensured.

In view of the above problems, embodiments of the present application provide a data set processing method, which can allow multiple data holders to participate in finding a specific number of intersection elements, and efficiently find a specific number of intersection elements while ensuring information security of the multiple data holders.

The following describes the technical scheme of the embodiments of the present application in detail with reference to the accompanying drawings.

FIG. 1 is a schematic flow chart of a data set processing method according to an embodiment of the present application. Referring to FIG. 1, the data set processing method shown in this embodiment may include:

In step 101, each data holder is triggered, in response to the dictionary tree construction instruction, to form its corresponding dictionary tree based on the data set it holds. The dictionary tree construction instruction is generated when an event occurs that requires finding a specific number of intersection elements among the private data sets held by the parties. It may be generated by user input or by a processor according to a monitored condition, as determined by the actual application, which is not limited in this application.

The dictionary tree, also called a trie or prefix tree, is a data structure for fast retrieval of strings. The data set of each data holder may contain a plurality of character strings, and the characters appearing in the strings of the data set together form a character set, which can be expressed as $\Sigma = \{c_1, \ldots, c_m\}$, where $\Sigma$ is the character set, $c_1, \ldots, c_m$ are the characters in the character set, and $m$ is an integer greater than or equal to 1. In particular, the data set may consist of strings in various languages such as English or Chinese. For example, when processing a Chinese string, the code or pinyin corresponding to each character in the string may first be obtained and then converted into bytes to form a byte sequence, so that the corresponding dictionary tree can be constructed from the byte sequence of each string according to the rule that each byte corresponds to one outgoing edge.
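As an illustration of the byte-sequence conversion described above, the following minimal sketch turns a string into the list of byte values that would label successive outgoing edges. The choice of UTF-8 and the helper name `to_byte_sequence` are assumptions made for this example; the text only requires that each character be mapped to a code and then to bytes.

```python
from typing import List


def to_byte_sequence(s: str, encoding: str = "utf-8") -> List[int]:
    """Return the byte values of s; each byte would label one outgoing edge."""
    # UTF-8 is an assumed encoding; any fixed character-to-bytes mapping works.
    return list(s.encode(encoding))


print(to_byte_sequence("abc"))  # [97, 98, 99]
print(to_byte_sequence("中"))   # [228, 184, 173]
```

With a byte-level alphabet, the character set has at most 256 entries regardless of the language of the strings.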

A dictionary tree constructed from the data set has the following characteristics. First, the dictionary tree has a plurality of tree nodes (including a root node), and every tree node except the root node carries one character of the character set. Second, for every non-leaf node, that is, every node that has child nodes, the characters carried by its child nodes are all different. Third, for any string belonging to the data set, there is a tree node such that the characters on the path from the root node to that tree node, connected in order, form the string. Finally, for every leaf node, that is, every node that has no child nodes, the string formed by connecting the characters on the path from the root node to the leaf node belongs to the data set.

In this embodiment, each tree node may have four attributes: a char attribute, a key attribute, a value attribute, and a child attribute. The value of the char attribute is the character on the tree node. The value of the key attribute is a state attribute parameter, which indicates whether the characters on the path from the root node to the current tree node, connected in order, form a string belonging to the data set; the state attribute parameter may be a Boolean value. The value of the value attribute is a string attribute value, which represents the string formed by connecting the characters on the path from the root node to the current tree node. The value of the child attribute is a child node array, which is a Boolean array whose length is the number of characters in the character set; each array element indicates whether the current node has a child node whose char attribute value is the corresponding character in the character set.

In this embodiment, the dictionary tree construction algorithm, whose time complexity is [expression not reproduced], may be, for example:

Firstly, creating a root node R of a dictionary tree;

Then, for each string $s_i$ in the data set $S$, let $v \leftarrow R$ (i.e. let the first node be the root node). For $j \leftarrow 0, \ldots, m_i$, where $j$ is the character number within the string: if the current tree node $v$ has a child node $u$ satisfying $u.\mathrm{char} = c_j$, then $v \leftarrow u$ (i.e. the current tree node is updated to the child node $u$); otherwise, create a node $u$, let $u.\mathrm{char} \leftarrow c_j$, update the key attribute, value attribute, and child attribute of $u$, and let $V(T) \leftarrow V(T) \cup \{u\}$, $E(T) \leftarrow E(T) \cup \{(v, u)\}$, $v \leftarrow u$ (i.e. the current tree node $v$ is updated to the tree node $u$). In this algorithm, $i$ is the number of the string in the data set, $0 \le i \le k-1$, and $k$ is the number of strings in the data set; $c_j$ is the character numbered $j$ in string $s_i$, and $m_i$ is the number of the last character of $s_i$; $u.\mathrm{char}$ is the char attribute of node $u$; $V(T)$ is the set of all tree nodes; $E(T)$ is the set of directed edges in the dictionary tree; and $(v, u)$ is the directed edge from the current node $v$ to node $u$;

Next, if $j = m_i$, then $v.\mathrm{key} \leftarrow \mathrm{True}$ and $v.\mathrm{value} \leftarrow s_i$, where key and value denote the key attribute and value attribute of the current node $v$. Because the current value of $j$ equals the number of the last character of string $s_i$, the characters on the path from the root node to the current tree node, connected in order, form a string belonging to the data set, so the key attribute of the current tree node is updated to True and its value attribute is updated to the string $s_i$.

It will be appreciated that the above dictionary tree construction algorithm is merely exemplary, and in practical applications, other construction algorithms may be selected to construct the dictionary tree, and the dictionary tree may be constructed by determining an appropriate construction algorithm according to the practical application, which is not limited in this aspect of the present application.
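The construction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the lowercase alphabet `ALPHABET`, the class name `TrieNode`, the function name `build_trie`, and the extra `children` dictionary (used only to reach child nodes) are hypothetical; the `char`, `key`, `value`, and `child` fields follow the four node attributes described earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed character set


@dataclass
class TrieNode:
    char: Optional[str] = None   # character on this node (None for the root)
    key: bool = False            # state attribute parameter
    value: str = ""              # string formed from the root to this node
    # Boolean child node array: one flag per character of the character set.
    child: List[bool] = field(default_factory=lambda: [False] * len(ALPHABET))
    # Helper links to the child nodes (not one of the four source attributes).
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


def build_trie(strings: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for s in strings:
        v = root                                  # start at the root node
        for c in s:
            u = v.children.get(c)
            if u is None:                         # no child carries c: create it
                u = TrieNode(char=c, value=v.value + c)
                v.children[c] = u
                v.child[ALPHABET.index(c)] = True
            v = u                                 # descend to the child
        v.key = True                              # last character reached
        v.value = s
    return root


root = build_trie(["cat", "car", "do"])
print(root.children["c"].children["a"].children["t"].key)  # True
print(root.children["c"].children["a"].key)                # False
```

Every string costs one pass over its characters, so nodes shared by common prefixes such as "ca" are created only once.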

In step 102, the root node in the dictionary tree generated by each data holder is determined as the current query node of each data holder, and the current query node of each data holder is marked as an access completion node. It can be understood that each data holder takes the root node of the respective dictionary tree as the first query node, and if any intermediate tree node is taken as the first query node, query errors may occur, so that the searching efficiency of intersection elements is reduced.

In the embodiment of the application, the tree nodes which are already used as the current query nodes are marked as access completion nodes, so that the problem of efficiency reduction caused by repeated access is avoided.

In step 103, each data holder is triggered to execute a parameter handling interaction event in response to the parameter handling instructions. The parameter processing instruction may be automatically generated after the dictionary tree corresponding to each data holder is built, or may be generated by user input, which needs to be determined according to the actual application situation, and the application is not limited in this aspect.

The above-mentioned parameter processing interaction event is used for determining a target decryption parameter corresponding to the current query node of the first data holder, where the parameter processing interaction event requires each data holder to participate in execution together.

In step 104, whether to record the string attribute value of the current query node is determined according to the target decryption parameter. The target decryption parameter reflects whether, for every data holder, the characters on the path from the root node to that data holder's current query node form a complete string in that data holder's data set. If so, it is determined to record the string attribute value of the current query node of the first data holder; it can be understood that this string attribute value is an intersection element shared by all data holders.
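One way steps 103 to 104 could be realized is sketched below. The text states that the first data holder generates the key pair, every holder encrypts its state attribute parameter with the public key, the ciphertexts are multiplied at the second data holder, and the first holder decrypts the result; it does not name the encryption scheme here. The sketch assumes a toy Paillier cryptosystem, in which multiplying ciphertexts adds the plaintexts, and assumes the preset parameter equals the number of holders. The primes 61 and 53 and all function names are illustrative only and far too small for real use.

```python
import math
import random


def keygen(p=61, q=53):
    """Toy Paillier key pair (first data holder). Insecure demo primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def all_holders_agree(key_flags):
    """key_flags[i] is holder i's state attribute parameter at its current node."""
    n, priv = keygen()                                   # first holder
    ciphertexts = [encrypt(n, int(f)) for f in key_flags]  # every holder
    target = 1
    for c in ciphertexts:                                # second holder multiplies
        target = target * c % (n * n)
    total = decrypt(n, priv, target)                     # first holder decrypts
    return total == len(key_flags)                       # assumed preset parameter


print(all_holders_agree([True, True, True]))   # True
print(all_holders_agree([True, False, True]))  # False
```

Under this assumed scheme the decrypted value is the count of holders whose parameter is True, so the first holder would learn more than a yes/no answer; a deployed protocol would need to address that.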

In step 105, the current query node of each data holder is updated and the updated current query node is marked as an access completion node. The current query node of each data holder is updated so that other intersection elements can be continuously searched, and the updated current query node also needs to be marked as an access completion node.

In step 106, whether to record the string attribute value of the updated current query node is determined according to the target decryption parameter corresponding to the updated current query node of each data holder. In this embodiment, steps 103 to 104 are essentially re-executed on the updated current query node to determine whether to record its string attribute value. If it is determined to record, the string attribute value of the updated current query node is recorded and the current query node of each data holder continues to be updated; if it is determined not to record, the current query node of each data holder continues to be updated directly, and so on.

In step 107, all the recorded string attribute values are sent to each data holder until the number of records of the string attribute values reaches a preset number or the current query node of any data holder cannot be updated. In this embodiment of the present application, a number may be preset to determine a specific number of intersection elements that each data holder needs to find in this collaboration. When the recorded number of the character string attribute values reaches the preset number, the fact that a sufficient number of intersection elements are searched is indicated, and therefore all the recorded character string attribute values are sent to each data holder, and each data holder can acquire the specific number of intersection elements required to be searched in the current cooperation.

It can be understood that when the current query node of any data holder cannot be updated, no tree node in that data holder's dictionary tree can serve as the updated current query node, that is, every tree node that needs to be accessed has already been accessed, so that data holder necessarily cannot share any further intersection element with the other data holders. The search for intersection elements can therefore end at this point, and all the recorded string attribute values are sent to each data holder, so that each data holder obtains the intersection elements found in the present cooperation.
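The overall control flow of steps 101 to 107 can be illustrated with a plaintext simulation. The sketch below deliberately omits all encryption: the checks that the method performs on decrypted parameters (whether every holder's current node ends a string, and which child characters every holder has) are computed directly, which a real deployment must not do. The nested-dictionary trie, the `_END` sentinel, and the function names are assumptions for this example.

```python
_END = object()  # sentinel key marking "state attribute parameter is True"


def build(strings):
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def first_common_strings(datasets, limit):
    """Return up to `limit` strings present in every data set, in
    depth-first order with children visited by ascending character."""
    if not datasets:
        return []
    found = []
    # Each stack entry holds a prefix and the matching node of every holder.
    stack = [("", [build(d) for d in datasets])]
    while stack and len(found) < limit:
        prefix, nodes = stack.pop()          # this node is now "visited"
        if all(_END in node for node in nodes):
            found.append(prefix)             # record the string attribute value
        common = set.intersection(*(set(node) - {_END} for node in nodes))
        # Push in reverse so the smallest character is explored first.
        for ch in sorted(common, reverse=True):
            stack.append((prefix + ch, [node[ch] for node in nodes]))
    return found


data = [
    ["car", "cat", "dog", "do"],
    ["cat", "do", "dot", "car"],
    ["do", "cat", "cow", "car"],
]
print(first_common_strings(data, 2))   # ['car', 'cat']
print(first_common_strings(data, 10))  # ['car', 'cat', 'do']
```

The loop stops as soon as the preset number of elements is recorded, so subtrees that lie after the last needed element are never explored; popping an entry with no common children corresponds to returning to the parent node.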

In summary, each data holder is triggered, in response to the dictionary tree construction instruction, to form its corresponding dictionary tree based on the data set it holds; the root node of each data holder's dictionary tree is then determined as that data holder's current query node and marked as an access completion node. Each data holder is then triggered, in response to the parameter processing instruction, to execute a parameter processing interaction event, which determines the target decryption parameter corresponding to the current query node of the first data holder, and whether to record the string attribute value of the current query node is determined according to that target decryption parameter. The current query node of each data holder is then updated and marked as an access completion node, and whether to record the string attribute value of the updated current query node is determined according to the corresponding target decryption parameter, until the number of recorded string attribute values reaches the preset number or the current query node of any data holder cannot be updated, whereupon all the recorded string attribute values are sent to each data holder. Thus, a plurality of data holders can participate in finding a specific number of intersection elements, and the specific number of intersection elements can be found efficiently while the information security of the data holders is ensured.

In some embodiments, the dictionary tree generated by each data holder separately may be protected by generating a random string, and the state attribute parameters of each current query node may also be processed to form target decryption parameters. Fig. 2 is a second flowchart of a data set processing method according to an embodiment of the present application, and referring to fig. 2, the data set processing method according to an embodiment of the present application may include:

In step 201, each data holder is triggered to form a dictionary tree corresponding to each data holder based on the respective held data set in response to the dictionary tree construction instruction.

In this embodiment of the present application, each data holder may be triggered to form an initial dictionary tree corresponding to each data holder based on a data set held by each data holder in response to a dictionary tree construction instruction, and then each data holder is triggered to generate a plurality of random character strings respectively in response to a dictionary tree protection instruction.

Each random character string is inserted into the initial dictionary tree. If the end tree node corresponding to the last character of the random character string already exists in the initial dictionary tree, the state attribute parameter of that end tree node is not updated. If the end tree node does not yet exist in the initial dictionary tree, it is created and its state attribute parameter is set to a preset distinguishing parameter. For example, the preset distinguishing parameter may be set to "False" or "0" in order to distinguish the random character string from the original character strings in the data set; it indicates that the character string formed by connecting the characters on the path from the root node to the end tree node does not belong to the data set. Inserting random character strings into the initial dictionary tree therefore protects the data of the initial dictionary tree without confusing it with the added strings.
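By way of a non-limiting illustration, the protection of the dictionary tree described above may be sketched as follows. The names `TrieNode`, `insert`, `build_protected_trie` and `contains`, the attribute `is_member` (standing for the state attribute parameter), and the way the random character strings are drawn are assumptions made for this sketch only and are not taken from the embodiments.

```python
import random


class TrieNode:
    """One tree node; `is_member` plays the role of the state attribute parameter."""

    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.is_member = False  # True only if the path from the root spells a data-set string


def insert(root, word, is_member):
    """Insert `word` and return the end tree node reached by its last character."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    # A random string never downgrades an existing node: if the end tree node
    # already marks a genuine string, its state attribute is left unchanged.
    if is_member:
        node.is_member = True
    return node


def build_protected_trie(data_set, alphabet, decoy_count, decoy_length, rng):
    """Build the initial dictionary tree, then insert random (decoy) strings."""
    root = TrieNode()
    for word in data_set:
        insert(root, word, is_member=True)
    for _ in range(decoy_count):
        decoy = "".join(rng.choice(alphabet) for _ in range(decoy_length))
        insert(root, decoy, is_member=False)
    return root


def contains(root, word):
    """Return True only for strings that belong to the original data set."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_member
```

In this sketch a random string that happens to coincide with a genuine string leaves the genuine entry intact, so membership in the data set is unaffected by however many random strings are inserted.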

In some application scenarios, for example when processing time-type privacy data sets, a meeting time that is convenient for all data holders and as early as possible is to be selected without revealing the respective schedules of the data holders. If the constraint of "as early as possible" does not exist, all data holders may also jointly pre-select a hash algorithm (also called a hash function, including the remainder method, the folding method, the radix conversion method, and the data rearrangement method) before building the dictionary tree, and each data holder maps each character string it holds (including each randomly generated character string) to a hash value through the hash algorithm, such that different character strings have different hash values. Each data holder may then construct its dictionary tree based on the calculated hash values. After the operations of steps 102 to 107 described above are performed based on these dictionary trees, when each data holder obtains a hash value common to all data holders, the associated character string may be obtained from the common hash value, that is, the intersection elements of all data holders are obtained.
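As a hedged illustration of the hashing variant, the sketch below maps each character string to a fixed-length key and keeps a local reverse index so that a common hash value can be mapped back to its character string. The choice of SHA-256 from Python's `hashlib`, the truncation length, and the names `hash_key` and `build_hash_index` are assumptions of this sketch; the embodiments only require that all data holders agree on one hash algorithm and that different strings receive different hash values.

```python
import hashlib


def hash_key(text, digest_chars=16):
    """Map a string to a fixed-length hexadecimal key (assumed hash: SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:digest_chars]


def build_hash_index(strings):
    """Return {hash value: original string}; reject colliding inputs."""
    index = {}
    for s in strings:
        key = hash_key(s)
        if key in index and index[key] != s:
            raise ValueError("hash collision between %r and %r" % (index[key], s))
        index[key] = s
    return index


# Each holder builds its dictionary tree over the keys of its own index.
# The set intersection below is only a plaintext stand-in for the protocol.
index_a = build_hash_index(["2023-05-01T09", "2023-05-02T10"])
index_b = build_hash_index(["2023-05-02T10", "2023-05-03T14"])
common = sorted(set(index_a) & set(index_b))
recovered = [index_a[h] for h in common]
```

Because the keys are hexadecimal, the character set of the resulting dictionary tree has a fixed size of 16. The keys no longer preserve the ordering of the original strings, which is consistent with using this variant only when the "as early as possible" constraint does not apply.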

In step 202, the root node in the dictionary tree generated by each data holder is determined as the current query node of each data holder, and the current query node of each data holder is marked as an access completion node.

In the embodiment of the present application, the content of step 202 is the same as that of step 102, and will not be described here again.

In step 203, each data holder is triggered to determine a state encryption parameter corresponding to each data holder based on the current query node of the respective dictionary tree and the state attribute parameter of each current query node in response to the first processing instruction. In embodiments of the present application, the parameter processing instructions may include, but are not limited to, a first processing instruction and a second processing instruction, wherein the first processing instruction may include, but is not limited to, a key pair generation instruction and a parameter encryption instruction.

Specifically, the first data holder may be triggered to generate a preset private key and a preset public key in response to the key pair generation instruction, and to send the preset public key to the remaining data holders. The first data holder may be any one of the data holders, or may be the data holder ranked first after all the data holders are arranged according to a random number sequence; in practical application, the first data holder needs to be determined according to the practical application situation, which is not limited in this application. In addition, the preset private key and the preset public key may be generated by the first data holder selecting a fully homomorphic encryption scheme, which is one kind of homomorphic encryption scheme. Homomorphic encryption (HE) is one of the main technologies for constructing a secure multiparty computing protocol architecture. Secure multiparty computing is a cryptography-based privacy computing technology that allows the data providers to cooperatively complete the computation of a predetermined function without revealing private data and without a trusted third party; it is characterized by decentralization, security of the input data, accuracy of the computing results, and the like. The homomorphic encryption scheme may be, for example, Brakerski-Gentry-Vaikuntanathan (BGV), Brakerski-Fan-Vercauteren (BFV), etc.; in practical application, an appropriate homomorphic encryption scheme needs to be selected according to the practical application, which is not limited in this application.

Further, in response to the parameter encryption instruction, each data holder is triggered to encrypt the state attribute parameter of its current query node with the preset public key to obtain the state encryption parameter corresponding to each data holder. The state encryption parameter can be expressed as $a_i$, where $i$ denotes the number of the data holder, $a_i$ denotes the state encryption parameter of the $i$-th data holder, $1 \le i \le n$, $n$ is the total number of data holders, and $n \ge 2$.

In step 204, each data holder is triggered to determine a target decryption parameter corresponding to the current query node of the first data holder based on the state encryption parameter corresponding to each data holder, respectively, in response to the second processing instruction. In embodiments of the present application, the second processing instruction may include, but is not limited to, a parameter operation instruction and a parameter decryption instruction.

Specifically, each data holder may be triggered to determine the target encryption parameter in the second data holder based on the state encryption parameter respectively corresponding to each data holder in response to the parameter operation instruction. In the embodiment of the application, the state encryption parameters corresponding to each data holder can be multiplied by each other in response to the parameter operation instruction to obtain the target encryption parameters. The second data holder may be any one data holder except the first data holder among all the data holders, or may be the last data holder after all the data holders are arranged according to a random number sequence, in practical application, the second data holder needs to be determined according to practical application conditions, which is not limited in this aspect of the application.

Let the $n$ data holders be $P_1, \ldots, P_n$ and let the state encryption parameters corresponding to the data holders be $a_1, \ldots, a_n$, respectively. The target encryption parameter may be calculated, for example, by the following expression:

$$b = a_1 \times a_2 \times \cdots \times a_n$$

where $b$ is the target encryption parameter, $P_1$ is the first data holder, and $P_n$ is the second data holder.

In practical applications, the target encryption parameter may be determined by chain multiplication, for example. Specifically, the first data holder $P_1$ sends its state encryption parameter $a_1$ to the data holder $P_2$, and $P_2$ calculates $a_1 \times a_2$. The data holder $P_2$ then sends the calculation result to the data holder $P_3$, and $P_3$ calculates $a_1 \times a_2 \times a_3$. The data holder $P_3$ then sends the calculation result to the data holder $P_4$, and so on, until the data holder $P_{n-1}$ sends its calculation result to the second data holder $P_n$, and $P_n$ finally calculates $a_1 \times a_2 \times \cdots \times a_n$, thereby determining the target encryption parameter $b$ in the second data holder. The first data holder $P_1$ does not obtain any intermediate calculation result in this calculation process.
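The chain multiplication may be sketched, purely for illustration, as the relay below. Plain integers stand in for the state encryption parameters, and the `*` operator stands in for the homomorphic multiplication of the chosen encryption scheme; the function name `chain_multiply` and the returned list of relayed messages are assumptions of this sketch.

```python
def chain_multiply(state_params):
    """Simulate the relay P_1 -> P_2 -> ... -> P_n.

    state_params[i] stands in for the state encryption parameter a_(i+1).
    Returns (b, messages), where messages[i] is the value sent from
    P_(i+1) to P_(i+2).
    """
    if len(state_params) < 2:
        raise ValueError("at least two data holders are required")
    running = state_params[0]          # P_1 sends a_1 to P_2
    messages = [running]
    for a_i in state_params[1:-1]:
        running = running * a_i        # P_i multiplies in its own parameter
        messages.append(running)       # and forwards the partial product
    b = running * state_params[-1]     # P_n computes the final product locally
    return b, messages
```

In this sketch $P_1$ only ever sends and never receives, which reflects the statement that the first data holder, the only party holding the preset private key, does not obtain intermediate calculation results. The relay needs $n-1$ sequential multiplications.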

As a further example, a divide-and-conquer multiplication strategy may also be employed to determine the target encryption parameter. Specifically, the $n$ data holders $P_1, \ldots, P_n$ may be grouped in pairs, where at most one group is allowed to contain only a single data holder. The product of the state encryption parameters of each group of data holders is calculated first, then the products of the calculation results of every two groups are calculated, and so on until the target encryption parameter $b$ is calculated; the data holder that determines the target encryption parameter $b$ is defined as the second data holder. Likewise, the first data holder $P_1$ does not obtain any intermediate calculation result in this calculation process.
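A minimal sketch of the pairwise grouping, under the same assumption that plain integers and `*` stand in for ciphertexts and homomorphic multiplication, is given below. The function name `pairwise_multiply` and the returned round counter are illustrative only.

```python
def pairwise_multiply(values):
    """Reduce the values pairwise; return (product, number_of_rounds)."""
    if not values:
        raise ValueError("at least one value is required")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        next_level = []
        # Multiply neighbours (0,1), (2,3), ...
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] * level[i + 1])
        # At most one group holds a single member; it is carried over as-is.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
        rounds += 1
    return level[0], rounds
```

Compared with the chain of $n-1$ sequential multiplications, the pairwise reduction finishes in $\lceil \log_2 n \rceil$ rounds, since the multiplications within one round are independent of each other.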

It will be appreciated that the above expression for calculating the target encryption parameter and the manner of calculating using the expression are merely exemplary, and in practical application, an appropriate calculation expression and calculation manner need to be selected according to the practical application, and the present application is not limited in this respect.

Further, the second data holder is triggered to send the target encryption parameters to the first data holder in response to the parameter decryption instruction, and the first data holder is triggered to decrypt the target encryption parameters through a preset private key to obtain target decryption parameters.

In step 205, it is determined whether to record the character string attribute value of the current query node according to the target decryption parameter. Specifically, if the target decryption parameter is a preset parameter, it is determined to record the character string attribute value of the current query node. In the embodiment of the present application, the state attribute parameter may be a Boolean value, i.e. "True" or "False"; since it needs to participate in the operation, "True" may be converted into "1" and "False" into "0". The preset parameter may therefore be set to 1: if the target decryption parameter is 1, it is determined to record the character string attribute value of the current query node, and $k \leftarrow k - 1$, where $k \ge 1$ is the preset number, that is, the specific number of intersection elements that the data holders need to find in the present collaboration.
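The recording decision of step 205 may be illustrated by the following sketch, in which encryption and decryption are omitted and the product is taken directly over the Boolean state attribute parameters. The function name `record_if_common` and its parameters are hypothetical.

```python
def record_if_common(state_bits, current_string, recorded, remaining):
    """Record `current_string` if every holder's state bit is True.

    state_bits: one Boolean per data holder (True -> 1, False -> 0).
    recorded:   list of character string attribute values found so far.
    remaining:  the counter k; it is decremented on every recording.
    Returns the updated value of k.
    """
    target = 1
    for bit in state_bits:
        target *= int(bit)             # plaintext stand-in for the encrypted product
    if target == 1:                    # the preset parameter is 1
        recorded.append(current_string)
        remaining -= 1
    return remaining
```

The product equals 1 only when every data holder contributes 1, so a single "False" from any data holder is enough to prevent recording.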

In some embodiments, the current query node of each data holder may be updated by triggering each data holder to execute an array processing interaction event in response to a query node update instruction to determine an updated current query node. Fig. 3 is a third flow chart of a data set processing method according to an embodiment of the present application, and referring to fig. 3, an array processing interaction event in the data set processing method according to an embodiment of the present application may include:

In step 301, in response to the array encryption instruction, each data holder is triggered to encrypt each array element of the child node array of its current query node with the preset public key, so as to obtain the initial encryption array corresponding to each data holder. The initial encryption array may be denoted as $A_i$, where $i$ denotes the number of the data holder and $A_i$ denotes the encrypted child node array of the $i$-th data holder. Each array element of the child node array indicates whether the current query node has a child node whose character is the character at the corresponding position in the character set; this may be indicated by a Boolean value.

Illustratively, assuming that the character set is {a, b, c} and the character corresponding to the child node of the current query node is c, the child node array of the current query node is [0, 0, 1]. It may be understood that in practical application the character set may contain more characters and the current query node may have various child nodes; the child node array of the current query node needs to be determined according to the practical application, which is not limited in this application.
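For illustration only, a child node array of this kind may be derived from the set of characters of the child nodes as sketched below; the function name `child_node_array` and the representation of the character set as an ordered string are assumptions of this sketch.

```python
def child_node_array(child_chars, charset):
    """Return one 0/1 entry per character of `charset`, in order.

    child_chars: characters of the child nodes of the current query node.
    charset:     the agreed, ordered character set, e.g. "abc".
    """
    return [1 if ch in child_chars else 0 for ch in charset]


only_c = child_node_array({"c"}, "abc")
a_and_c = child_node_array({"a", "c"}, "abc")
```

The array always has as many entries as the character set, regardless of how many child nodes the current query node actually has.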

In step 302, each data holder is triggered to determine a target encryption array in the second data holder based on the initial encryption arrays respectively corresponding to each data holder in response to the array operation instruction. In this embodiment of the present application, each array element in the initial encryption array corresponding to each data holder may be multiplied according to the array element number in a one-to-one correspondence manner in response to the array operation instruction, so as to obtain the target encryption array.

Let the $n$ data holders be $P_1, \ldots, P_n$ and let the initial encryption arrays corresponding to the data holders be $A_1, \ldots, A_n$, respectively. The target encryption array may be calculated, for example, by the following expression:

$$B[j] = A_1[j] \times A_2[j] \times \cdots \times A_n[j]$$

where $B[j]$ is the array element with array element number $j$ in the target encryption array, and $A_1[j], A_2[j], \ldots, A_n[j]$ are the array elements with array element number $j$ in the respective initial encryption arrays. Here $j$ is the array element number in the target encryption array or the initial encryption array and is also the character number in the character set, with $1 \le j \le m$, where $m$ is the number of characters in the character set and is also the array length of the target encryption array or the initial encryption array.

In practical applications, the target encryption array may be determined, for example, by the chain multiplication described in step 204. Specifically, the first data holder $P_1$ sends its initial encryption array $A_1$ to the data holder $P_2$, and $P_2$ calculates $A_1[j] \times A_2[j]$. Suppose $A_1 = [1, 1, 0]$ and $A_2 = [1, 0, 1]$. For $j = 1$, $A_1[1] \times A_2[1] = 1$; for $j = 2$, $A_1[2] \times A_2[2] = 0$; for $j = 3$, $A_1[3] \times A_2[3] = 0$. The calculation result for the first data holder $P_1$ and the data holder $P_2$ is therefore $[1, 0, 0]$. The data holder $P_2$ then sends the calculation result to the data holder $P_3$, and $P_3$ calculates $A_1[j] \times A_2[j] \times A_3[j]$. The data holder $P_3$ then sends the calculation result to the data holder $P_4$, and so on, until the data holder $P_{n-1}$ sends its calculation result to the second data holder $P_n$, and $P_n$ finally calculates $A_1[j] \times A_2[j] \times \cdots \times A_n[j]$, thereby determining the target encryption array $B$ in the second data holder. The first data holder $P_1$ does not obtain any intermediate calculation result in this calculation process.
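The element-by-element product can be checked with the short sketch below, in which plain integers again stand in for the encrypted array elements. The function name `elementwise_product` is an assumption of this sketch, and Python's zero-based indexing replaces the array element numbers $j = 1, \ldots, m$.

```python
from functools import reduce


def elementwise_product(arrays):
    """Multiply arrays position by position: B[j] = A_1[j] * ... * A_n[j]."""
    lengths = {len(a) for a in arrays}
    if len(lengths) != 1:
        raise ValueError("all arrays must be non-empty in number and equal in length")
    # zip(*arrays) yields one column (A_1[j], ..., A_n[j]) per position j.
    return [reduce(lambda x, y: x * y, column) for column in zip(*arrays)]


two_holders = elementwise_product([[1, 1, 0], [1, 0, 1]])
three_holders = elementwise_product([[1, 1, 0], [1, 0, 1], [1, 1, 1]])
```

A position of the result is non-zero only if every data holder has a child node for the corresponding character.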

As a further example, the divide-and-conquer multiplication strategy described in step 204 may also be employed to determine the target encryption array. Specifically, the $n$ data holders $P_1, \ldots, P_n$ may be grouped in pairs, where at most one group is allowed to contain only a single data holder. The product of the initial encryption arrays of each group of data holders is calculated first, then the products of the calculation results of every two groups are calculated, and so on until the target encryption array $B$ is calculated; the data holder that determines the target encryption array $B$ is defined as the second data holder. Likewise, the first data holder $P_1$ does not obtain any intermediate calculation result in this calculation process.

It will be appreciated that the above expression for calculating the target encryption array and the manner of calculating using the expression are merely exemplary, and in practical application, an appropriate calculation expression and calculation manner need to be selected according to the practical application, and the application is not limited in this respect.

In step 303, the second data holder is triggered to send the target encrypted array to the first data holder in response to the array decryption instruction, and the first data holder is triggered to decrypt the target encrypted array by the preset private key, so as to obtain the target decrypted array.

In step 304, an updated current query node is determined from array elements in the target decryption array. In this embodiment of the present application, the target decryption array obtained by the first data holder through decryption is sent to other data holders, so that other data holders can also check the array elements in the target decryption array.

Specifically, suppose the target decryption array contains a target array element greater than zero, and the child node whose character is the character corresponding to that target array element in the character set has not been marked as an access completion node. This means that a child node with that character exists under the current query node of every data holder and has not yet been accessed, so continuing the query at that child node may find a complete character string, or a prefix of a complete character string, that all data holders have in common. In this case, among the qualifying target array elements, the one with the smallest array element number is selected, and the child node whose character corresponds to it in the character set is determined as the updated current query node.

Illustratively, assuming that the character set is {a, b, c} and the target decryption array is [1, 0, 0], the tree node with character a is a child node of the current query node of every data holder. The array element number of the target array element is then 1, which is the smallest array element number in the target decryption array, and thus the child node with character a is determined as the updated current query node.
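The selection rule may be sketched as follows. The function name `next_child` and the representation of already accessed child nodes as a set of characters are assumptions made for this illustration.

```python
def next_child(target_array, charset, visited_chars):
    """Return the character of the child node to query next, or None.

    target_array:  decrypted element-wise product, one entry per character.
    charset:       the agreed, ordered character set, e.g. "abc".
    visited_chars: characters whose child nodes are already marked as
                   access completion nodes under the current query node.
    """
    # Scanning in array order yields the smallest array element number first.
    for index, value in enumerate(target_array):
        ch = charset[index]
        if value > 0 and ch not in visited_chars:
            return ch
    return None
```

A return value of `None` corresponds to the case in which no suitable child node exists, so the query either backtracks to the parent node or, at the root node, ends.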

In particular, in some application scenarios, for example when processing time-type privacy data sets, a meeting time that is convenient for all data holders and as early as possible is to be selected without revealing the respective schedules of the data holders. The additional condition of "as early as possible" means that the common intersection element that is smallest in lexicographic order needs to be found, which is why the limitation "the array element number is the smallest" is added. In contrast, in application scenarios where only a specific number of intersection elements of the data holders need to be found, the limitation "the array element number is the smallest" may not be required.

If the target decryption array contains target array elements greater than zero but the child nodes whose characters correspond to those target array elements in the character set have all been marked as access completion nodes, or if all array elements in the target decryption array are zero, it is further judged whether the current query node is the root node. Specifically, if the current query node is the root node, it is determined that the query for intersection elements among the data holders is complete. At this point the number of recorded character string attribute values may be smaller than the preset number k, but no suitable child node can serve as the updated current query node, which indicates that the current query node of each data holder cannot be updated further; the step of sending all the recorded character string attribute values to each data holder is therefore executed. If the current query node is not the root node, the updated current query node is determined according to the array elements in the target decryption array corresponding to the parent node of the current query node. In essence, step 304 is re-executed based on the target decryption array of the parent node, another child node of the parent node is determined as the updated current query node, and the operations of steps 103 to 104 are re-executed to determine whether to record the character string attribute value of the updated current query node. If it is determined to record, the character string attribute value of the updated current query node is recorded and the current query node of each data holder continues to be updated; if it is determined not to record, the current query node of each data holder continues to be updated directly, and so on.
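The overall traversal, including the return to the parent node, may be summarized by the plaintext simulation below. It is a sketch under stated assumptions rather than the protocol itself: all dictionary trees are held in one process, encryption and decryption are omitted, the dictionary trees are nested dictionaries in which the key "$" (assumed not to be in the character set) marks the end of a data-set string, and the names `build_trie`, `visit` and `first_k_common` are hypothetical.

```python
def build_trie(words):
    """Nested-dict dictionary tree; the key "$" marks a data-set string."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def visit(nodes, prefix, recorded):
    """Record `prefix` if the product of all holders' state bits is 1."""
    state = 1
    for node in nodes:
        state *= 1 if "$" in node else 0
    if state == 1:
        recorded.append(prefix)


def first_k_common(data_sets, charset, k):
    """Return up to k common strings, smallest array element number first."""
    roots = [build_trie(words) for words in data_sets]
    recorded = []
    visit(roots, "", recorded)
    # Each frame: (current query node per holder, prefix, visited child chars)
    stack = [(roots, "", set())]
    while stack and len(recorded) < k:
        nodes, prefix, visited = stack[-1]
        # Element-wise product of every holder's child node array.
        target = [1] * len(charset)
        for node in nodes:
            for j, ch in enumerate(charset):
                target[j] *= 1 if ch in node else 0
        nxt = next((ch for j, ch in enumerate(charset)
                    if target[j] > 0 and ch not in visited), None)
        if nxt is None:
            stack.pop()  # back to the parent; an empty stack ends the query
            continue
        visited.add(nxt)  # mark the child as an access completion node
        children = [node[nxt] for node in nodes]
        visit(children, prefix + nxt, recorded)
        stack.append((children, prefix + nxt, set()))
    return recorded


sets = [["ab", "ac", "b"], ["ac", "b", "ca"], ["b", "ac", "d"]]
result = first_k_common(sets, "abcd", 2)
```

In this sketch the traversal stops either when k strings have been recorded or when the root node has no further unvisited common child node, mirroring the two termination conditions described above.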

Corresponding to the embodiment of the application function implementation method, the application also provides electronic equipment for executing the data set processing method and corresponding embodiments.

Fig. 4 shows a block diagram of a hardware configuration of an electronic device 400 in which the data set processing method of the embodiment of the present application may be implemented. As shown in fig. 4, electronic device 400 may include a processor 410 and a memory 420. In the electronic apparatus 400 of fig. 4, only constituent elements related to the present embodiment are shown. Thus, it will be apparent to those of ordinary skill in the art that: electronic device 400 may also include common constituent elements that are different from those shown in fig. 4. Such as: a fixed point arithmetic unit.

Electronic device 400 may correspond to a computing device having various processing functions, such as functions for generating a neural network, training or learning a neural network, quantizing a floating-point neural network into a fixed-point neural network, or retraining a neural network. For example, the electronic device 400 may be implemented as various types of devices, such as a Personal Computer (PC), a server device, a mobile device, and so forth.

The processor 410 controls all functions of the electronic device 400. For example, the processor 410 controls all functions of the electronic device 400 by executing programs stored in the memory 420 on the electronic device 400. The processor 410 may be implemented by a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application Processor (AP), an artificial intelligence processor chip (IPU), etc. provided in the electronic device 400. However, the present application is not limited thereto.

In some embodiments, processor 410 may include an input/output (I/O) unit 411 and a computing unit 412. The I/O unit 411 may be used to receive various data such as a dictionary tree construction instruction and a parameter processing instruction. For example, the calculating unit 412 may be configured to trigger, in response to a dictionary tree construction instruction received via the I/O unit 411, each data holder to form a dictionary tree corresponding to each data holder based on the data set held by each data holder, or may trigger, in response to a parameter processing instruction, each data holder to execute a parameter processing interaction event, determine whether to record a string attribute value of the current query node according to a target decryption parameter determined by the parameter processing interaction event. This string property value may be output by the I/O unit 411, for example. The output data may be provided to memory 420 for reading by other devices (not shown) or may be provided directly to other devices for use.

The memory 420 is hardware for storing various data processed in the electronic device 400. For example, the memory 420 may store processed data and data to be processed in the electronic device 400. Memory 420 may store data that is involved in the processing of data sets that have been or are to be processed by processor 410. Further, the memory 420 may store applications, drivers, etc. to be driven by the electronic device 400. For example: the memory 420 may store various programs related to the data set processing method to be executed by the processor 410. The memory 420 may be a DRAM, but the present application is not limited thereto. The memory 420 may include at least one of volatile memory or nonvolatile memory. The nonvolatile memory may include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), flash memory, phase change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), ferroelectric RAM (FRAM), and the like. Volatile memory can include Dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, ferroelectric RAM (FeRAM), and the like. In an embodiment, the memory 420 may include at least one of a Hard Disk Drive (HDD), a Solid State Drive (SSD), a high density flash memory (CF), a Secure Digital (SD) card, a Micro-secure digital (Micro-SD) card, a Mini-secure digital (Mini-SD) card, an extreme digital (xD) card, a cache (caches), or a memory stick.

In summary, specific functions implemented by the memory 420 and the processor 410 of the electronic device 400 provided in the embodiments of the present disclosure may be explained in comparison with the foregoing embodiments in the present disclosure, and may achieve the technical effects of the foregoing embodiments, which will not be repeated herein.

In this embodiment, the processor 410 may be implemented in any suitable manner. For example, the processor 410 may take the form of, for example, a microprocessor or processor, and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, and an embedded microcontroller, among others.

It should be understood that the possible terms "first" or "second" and the like in the claims, specification and drawings disclosed herein are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of this application are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only, and is not intended to be limiting of the present disclosure. As used in the specification and claims of this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

It should also be appreciated that any of the modules, units, components, servers, computers, terminals, or devices illustrated herein that execute instructions may include or otherwise access a computer readable medium, such as a storage medium, computer storage medium, or data storage device (removable and/or non-removable) such as a magnetic disk, optical disk, or magnetic tape. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.

Although the embodiments of the present application are described above, the content is only an example adopted for understanding the present application, and is not intended to limit the scope and application scenario of the present application. Any person skilled in the art can make any modifications and variations in form and detail without departing from the spirit and scope of the disclosure, but the scope of the disclosure is still subject to the scope of the claims.

Claims

1. A data set processing method, comprising:

triggering each data holder to form a dictionary tree corresponding to each data holder based on the data set held by each data holder in response to the dictionary tree construction instruction;

respectively determining a root node in a dictionary tree correspondingly generated by each data holder as a current query node of each data holder, and marking the current query node of each data holder as an access completion node;

triggering each data holder to execute a parameter processing interaction event in response to the parameter processing instruction, wherein the parameter processing interaction event is used for determining a target decryption parameter corresponding to a current query node of the first data holder;

determining whether to record the character string attribute value of the current query node according to the target decryption parameters;

updating the current query node of each data holder and marking the updated current query node as the access completion node;

determining whether to record the character string attribute value of the updated current query node according to the target decryption parameters corresponding to the updated current query node of each data holder;

and until the record quantity of the character string attribute values reaches the preset quantity or the current query node of any data holder cannot be updated, sending all the recorded character string attribute values to each data holder.

2. The data set processing method according to claim 1, wherein the parameter processing instructions include a first processing instruction and a second processing instruction;

in the triggering each data holder to execute a parameter processing interaction event in response to a parameter processing instruction, the parameter processing interaction event includes:

triggering each data holder to determine a state encryption parameter corresponding to each data holder based on the current query node of the dictionary tree and the state attribute parameter of each current query node in response to the first processing instruction;

and responding to the second processing instruction to trigger each data holder to determine a target decryption parameter corresponding to the current query node of the first data holder based on the state encryption parameter corresponding to each data holder.

3. The data set processing method according to claim 2, wherein the first processing instruction includes a key pair generation instruction and a parameter encryption instruction;

the triggering, in response to the first processing instruction, each data holder to determine a state encryption parameter corresponding to each data holder based on a current query node of a respective dictionary tree and a state attribute parameter of each current query node includes:

triggering, in response to the key pair generation instruction, the first data holder to generate a preset private key and a preset public key and to send the preset public key to the other data holders;

and triggering, in response to the parameter encryption instruction, each data holder to encrypt the state attribute parameter of its current query node with the preset public key, to obtain the state encryption parameter corresponding to each data holder.

4. The data set processing method according to claim 3, wherein the second processing instruction includes a parameter operation instruction and a parameter decryption instruction;

the triggering, in response to the second processing instruction, each data holder to determine a target decryption parameter corresponding to the current query node of the first data holder based on the state encryption parameter corresponding to each data holder includes:

triggering each data holder to determine a target encryption parameter in a second data holder based on the state encryption parameter corresponding to each data holder respectively in response to the parameter operation instruction;

and, in response to the parameter decryption instruction, triggering the second data holder to send the target encryption parameter to the first data holder, and triggering the first data holder to decrypt the target encryption parameter with the preset private key to obtain the target decryption parameter.

5. The data set processing method according to claim 4, wherein triggering each data holder to determine the target encryption parameter in the second data holder based on the state encryption parameter respectively corresponding to each data holder in response to the parameter operation instruction comprises:

multiplying together, in response to the parameter operation instruction, the state encryption parameters respectively corresponding to the data holders to obtain the target encryption parameter.
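
The interaction described in claims 3 to 5 can be illustrated with a minimal sketch. The claims do not name an encryption scheme; the sketch assumes a toy Paillier cryptosystem, in which multiplying ciphertexts yields an encryption of the sum of the plaintexts. The function names (`keygen`, `encrypt`, `decrypt`, `combine`, `target_decryption_parameter`), the prime values, and the 1/0 encoding of the state attribute parameter are hypothetical and chosen only for illustration; the primes are far too small for real use.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """First data holder: generate a toy Paillier key pair (assumed scheme)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)   # (public key, private key)

def encrypt(n, m):
    """Any data holder: encrypt a state attribute parameter m under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """First data holder: decrypt with the private key."""
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def combine(n, ciphertexts):
    """Second data holder: multiply all state encryption parameters."""
    n2 = n * n
    product = 1
    for c in ciphertexts:
        product = product * c % n2
    return product

def target_decryption_parameter(states):
    """Run one parameter processing interaction for the given plaintext states."""
    n, private_key = keygen()                      # key pair generation
    encrypted = [encrypt(n, s) for s in states]    # parameter encryption
    target_encrypted = combine(n, encrypted)       # parameter operation
    return decrypt(n, private_key, target_encrypted)  # parameter decryption

# Hypothetical encoding: 1 = current query node ends a string, 0 = it does not.
print(target_decryption_parameter([1, 1, 1]))
print(target_decryption_parameter([1, 0, 1]))
```

Under these assumptions the first data holder learns only the combined value, not which individual holder contributed which state. Which value counts as the preset parameter that triggers recording is not fixed by this sketch.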

6. The data set processing method according to claim 4, wherein the updating of the current query node of each data holder comprises:

triggering, in response to a query node update instruction, each data holder to execute an array processing interaction event, wherein the array processing interaction event is used for updating the current query node of each data holder.

7. The data set processing method according to claim 6, wherein in the triggering each data holder to execute an array processing interaction event in response to a query node update instruction, the array processing interaction event includes:

triggering each data holder to encrypt each array element of the child node array of the current query node of each data holder through the preset public key respectively in response to the array encryption instruction to obtain an initial encryption array corresponding to each data holder;

the array length of the child node array is equal to the number of characters in the character set, the array element numbers of the child node array are in one-to-one correspondence with the character numbers of the character set, and the array elements of the child node array are used for indicating whether the characters corresponding to the child node of the current query node are identical to the characters in the character set or not;

triggering each data holder to determine a target encryption array in the second data holder based on the initial encryption array corresponding to each data holder respectively in response to the array operation instruction;

triggering, in response to an array decryption instruction, the second data holder to send the target encryption array to the first data holder, and triggering the first data holder to decrypt the target encryption array with the preset private key to obtain a target decryption array;

and determining the updated current query node according to the array elements in the target decryption array.
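
The child node array described in claim 7 can be sketched as follows. The character set `CHARSET`, the function name `child_node_array`, and the use of 1 and 0 as indicator values are assumptions made for illustration; the claim only requires that the array length equal the size of the character set and that element numbers correspond one-to-one to character numbers.

```python
CHARSET = "abcd"  # hypothetical character set shared by all data holders

def child_node_array(child_chars, charset=CHARSET):
    """Build the child node array of a query node.

    child_chars: the characters for which the current query node has a child.
    Element i is 1 if the node has a child for charset[i], otherwise 0.
    """
    return [1 if ch in child_chars else 0 for ch in charset]

# A node with children for 'a' and 'c' only.
print(child_node_array({"a", "c"}))
```

Each element of such an array would then be encrypted separately with the preset public key to form the initial encryption array.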

8. The data set processing method according to claim 7, wherein determining the updated current query node from the array elements in the target decryption array comprises:

if the target decryption array has at least one target array element greater than zero whose corresponding child node, namely the child node whose character is the same as the character corresponding to that target array element in the character set, is not marked as the access completion node, determining, as the updated current query node, the child node corresponding to the target array element with the smallest array element number among such elements;

if the target decryption array has target array elements greater than zero but the child nodes corresponding to all of those target array elements are already marked as access completion nodes, or if the array elements in the target decryption array are all zero, judging whether the current query node is the root node;

if the current query node is judged to be the root node, determining that the query for intersection elements among the data holders is complete, and executing the step of sending all recorded character string attribute values to each data holder;

if the current query node is judged not to be the root node, determining the updated current query node according to the array elements in the target decryption array corresponding to the father node of the current query node.
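
The node update rule of claim 8 amounts to a depth-first traversal with backtracking. The following is a minimal plaintext sketch, assuming the target decryption array for each node is already available; the names `select_child` and `traverse`, the representation of a node as a tuple of array element numbers, and the callback `decrypted_array_for` are hypothetical.

```python
def select_child(target_array, visited_children):
    """Return the smallest element number whose value is greater than zero and
    whose child is not yet marked as access-complete, or None if there is none."""
    for number, value in enumerate(target_array):
        if value > 0 and number not in visited_children:
            return number
    return None

def traverse(decrypted_array_for):
    """Visit nodes in the order given by the update rule.

    decrypted_array_for(path) returns the target decryption array of the node
    reached from the root by following the element numbers in `path`.
    """
    path = []                 # empty path = root node
    visited = {}              # node path -> element numbers already visited
    order = []                # nodes in the order they become the current node
    while True:
        key = tuple(path)
        seen = visited.setdefault(key, set())
        number = select_child(decrypted_array_for(key), seen)
        if number is not None:
            seen.add(number)          # mark the child as access-complete
            path.append(number)       # the child becomes the current query node
            order.append(tuple(path))
        elif not path:
            return order              # at the root with nothing left: finished
        else:
            path.pop()                # backtrack to the parent node

# Hypothetical decrypted arrays for a two-character set.
arrays = {(): [1, 1], (0,): [0, 1], (0, 1): [0, 0], (1,): [0, 0]}
print(traverse(lambda path: arrays[path]))
```

In the method itself the array for each node would come from the array processing interaction event rather than from a lookup table, and the loop would also stop once the preset number of values has been recorded.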

9. The data set processing method according to claim 7, wherein triggering each data holder in response to the array operation instruction to determine a target encryption array in the second data holder based on the initial encryption array respectively corresponding to each data holder comprises:

multiplying, in response to the array operation instruction, the array elements of the initial encryption arrays respectively corresponding to the data holders element by element, matched by array element number, to obtain the target encryption array.
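
The element-by-element multiplication of claim 9 can be sketched as below. The sketch assumes that ciphertexts are integers multiplied modulo some value `modulus` determined by the encryption scheme (for example, the square of the public modulus in a Paillier-type scheme); the function name `multiply_encrypted_arrays` and the sample numbers are hypothetical, and the sample values are plain integers standing in for ciphertexts.

```python
from functools import reduce

def multiply_encrypted_arrays(arrays, modulus):
    """Multiply the holders' encrypted arrays position by position.

    arrays: one list of integer ciphertexts per data holder, all of equal length.
    Returns the target encryption array.
    """
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("all arrays must have the same length")
    return [
        reduce(lambda x, y: x * y % modulus, column, 1)
        for column in zip(*arrays)  # column = elements sharing one element number
    ]

# Three holders, two array elements each (stand-in values).
print(multiply_encrypted_arrays([[2, 3], [4, 5], [6, 7]], 100))
```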

10. The data set processing method according to claim 1, wherein the determining whether to record the character string attribute value of the current query node according to the target decryption parameter includes:

if the target decryption parameter is a preset parameter, determining to record the character string attribute value of the current query node.

11. The data set processing method according to claim 2, wherein triggering each data holder to form a corresponding dictionary tree for each data holder based on the respective held data set in response to the dictionary tree construction instruction includes:

triggering each data holder to form an initial dictionary tree corresponding to each data holder based on the data set held by each data holder in response to the dictionary tree construction instruction;

triggering, in response to a dictionary tree protection instruction, each data holder to generate a plurality of random character strings and to insert the generated random character strings into its initial dictionary tree to form the dictionary tree corresponding to that data holder;

wherein, if the tail tree node corresponding to the last character of a random character string already exists in the initial dictionary tree, the state attribute parameter of that tail tree node is not updated; and if the tail tree node corresponding to the last character of the random character string does not exist in the initial dictionary tree, the state attribute parameter of the newly inserted tail tree node is set to a preset distinguishing parameter.
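
The dictionary tree protection step of claim 11 can be sketched as follows. The state values `NOT_END`, `END` and `DECOY`, the class `TrieNode`, the helper names, and the default alphabet are hypothetical; in particular, the claim does not state what value the preset distinguishing parameter takes.

```python
import secrets

NOT_END, END, DECOY = 0, 1, 2  # hypothetical state attribute parameter values

class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.state = NOT_END  # state attribute parameter

def insert_real(root, word):
    """Insert a string from the held data set and mark its tail node."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.state = END

def insert_decoy(root, word):
    """Insert a random string; only a newly created tail node gets DECOY."""
    node, tail_is_new = root, False
    for ch in word:
        tail_is_new = ch not in node.children
        node = node.children.setdefault(ch, TrieNode())
    if tail_is_new:
        node.state = DECOY  # an existing tail node keeps its state

def random_strings(count, length, alphabet="abcd"):
    """Generate `count` random strings of the given length."""
    return ["".join(secrets.choice(alphabet) for _ in range(length))
            for _ in range(count)]

def build_protected_trie(dataset, decoys):
    root = TrieNode()
    for word in dataset:   # initial dictionary tree
        insert_real(root, word)
    for word in decoys:    # protection step
        insert_decoy(root, word)
    return root

root = build_protected_trie(["ab", "abc"], ["ab", "ax", "a"])
print(root.children["a"].children["x"].state == DECOY)
```

Because a decoy never overwrites the state of a node that already exists, the strings of the original data set remain recognizable to their holder while the shape of the tree no longer reveals exactly which strings are held.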

12. An electronic device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-11.

13. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-11.