CN113626848A - Sample data generation method and device, electronic equipment and computer readable medium - Google Patents
Sample data generation method and device, electronic equipment and computer readable medium
- Publication number
- CN113626848A (CN202110974414.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- public
- encrypted
- encrypted data
- participant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a sample data generation method and apparatus, an electronic device, and a computer-readable medium, and relates to the field of computer technology. One specific embodiment comprises: receiving a sample data generation request, generating a preset character string, and determining an adding position for the preset character string; sending the preset character string and the adding position to each participant, so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; obtaining the encrypted data of each participant and computing their intersection to obtain public encrypted data; and determining public sample data based on the public encrypted data, and outputting the public sample data. By combining a message digest algorithm with an encryption scheme that adds a preset character string to the local data, the embodiment addresses the private set intersection problem in federated learning and adapts well to different scenarios.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating sample data, an electronic device, and a computer-readable medium.
Background
At present, in the field of federated learning, private set intersection methods are used to obtain training samples. However, each existing private set intersection method is tied to specific usage scenarios and has its own shortcomings in terms of deployment conditions, communication traffic, and the running time of each participant, so it is difficult to balance all of these aspects.
In the process of implementing the present application, the inventor finds that at least the following problems exist in the prior art:
when training samples of a federated model are obtained through private set intersection, the usage scenarios of the intersection method are narrow, and the deployment conditions, communication traffic, and running time of each participant are difficult to balance.
Disclosure of Invention
In view of this, embodiments of the present application provide a sample data generation method, an apparatus, an electronic device, and a computer-readable medium, which can solve the problem that, when training samples of a federated model are obtained through private set intersection, the usage scenarios of the intersection method are narrow and the deployment conditions, communication traffic, and running time of each participant are difficult to balance.
In order to achieve the above object, according to an aspect of the embodiments of the present application, there is provided a sample data generating method, including:
receiving a sample data generation request, generating a preset character string, and determining an adding position for the preset character string;
sending the preset character string and the adding position to each participant, so that each participant encrypts its local data based on the preset character string and the adding position to generate encrypted data;
acquiring the encrypted data of each participant and computing their intersection to obtain public encrypted data;
and determining public sample data based on the public encrypted data, and outputting the public sample data.
Optionally, generating the encrypted data comprises:
determining a data identifier corresponding to the adding position;
embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data;
based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
Optionally, determining common sample data comprises:
establishing, locally at each participant, a correspondence between the local data and the encrypted data;
and, based on the correspondence, locating the local data corresponding to the public encrypted data and determining the located local data as the public sample data.
Optionally, intersecting to obtain public encryption data includes:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Optionally, generating an asynchronous intersection-solving task based on each encrypted data includes:
pairing every two encrypted data to generate paired encrypted data;
and generating asynchronous intersection solving tasks based on the paired encrypted data.
Optionally, executing each asynchronous intersection solving task to obtain public encryption data, including:
executing each asynchronous intersection task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Optionally, after determining the common sample data, the method further comprises:
model training is performed based on common sample data.
In addition, the present application also provides a sample data generating apparatus, including:
the receiving unit is configured to receive a sample data generation request, further generate a preset character string and determine an adding position of the preset character string;
the encrypted data generating unit is configured to send the preset character string and the adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
the public encrypted data determining unit is configured to acquire each encrypted data and further obtain an intersection to obtain public encrypted data;
and the sample data determining unit is configured to determine the public sample data based on the public encryption data and output the public sample data.
Optionally, the encrypted data generating unit is further configured to:
determining a data identifier corresponding to the adding position;
embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data;
based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
Optionally, the sample data determining unit is further configured to:
establishing a corresponding relation between local data and each encrypted data locally at each participant;
and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
Optionally, the public encryption data determination unit is further configured to:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Optionally, the public encryption data determination unit is further configured to:
pairing every two encrypted data to generate paired encrypted data;
and generating asynchronous intersection solving tasks based on the paired encrypted data.
Optionally, the public encryption data determination unit is further configured to:
executing each asynchronous intersection task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Optionally, the sample data generating apparatus further comprises a model training unit configured to:
model training is performed based on common sample data.
In addition, the present application further provides an electronic device for generating sample data, including: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by one or more processors, the one or more processors realize the sample data generation method.
In addition, the present application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the sample data generating method as described above.
One embodiment of the above invention has the following advantages or benefits: a preset character string is generated upon receiving a sample data generation request, and an adding position for the preset character string is determined; the preset character string and the adding position are sent to each participant, so that each participant encrypts its local data based on them to generate encrypted data; the encrypted data of each participant is acquired and intersected to obtain public encrypted data; and public sample data is determined based on the public encrypted data and output. By combining the MD5 message-digest algorithm with an encryption scheme that adds a random salt (that is, the preset character string), the embodiment addresses the private set intersection problem in federated learning and adapts well to different scenarios.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a further understanding of the application and are not to be construed as limiting the application. Wherein:
Fig. 1 is a schematic diagram of the main flow of a sample data generation method according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of the main flow of a sample data generation method according to a second embodiment of the present application;
Fig. 3 is a schematic diagram of an application scenario of a sample data generation method according to a third embodiment of the present application;
Fig. 4 is a schematic diagram of the main units of a sample data generation apparatus according to an embodiment of the present application;
Fig. 5 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a sample data generation method according to a first embodiment of the present application, and as shown in fig. 1, the sample data generation method includes:
Step S101, receiving a sample data generation request, generating a preset character string, and determining an adding position for the preset character string.
In this embodiment, an execution subject (for example, a server) of the sample data generation method may receive the sample data generation request through a wired or wireless connection. Specifically, the request may be a request to generate training sample data for a federated learning model. Federated learning is a machine learning framework that helps multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations. After receiving the sample data generation request, the execution subject may generate a preset character string, for example a random salt. In cryptography, "salting" refers to inserting a specific character string at a fixed position in a password so that the hashed result differs from the hash of the original password; the inserted character string is called the "random salt". After generating the preset character string (i.e., the random salt), the execution subject may determine the position at which the preset character string is to be added to the local data of each participant, according to user settings or in any other manner. For example, the preset character string may be added before or after a particular numeric or alphabetic identifier in each participant's local data, or at a particular row and column of that data. The embodiments of the present application do not specifically limit this position.
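A minimal Python sketch of step S101 follows. The application does not specify how the salt is generated or how the adding position is represented, so the function names `generate_salt` and `choose_add_position`, the salt length, the alphanumeric alphabet, and the dictionary layout are all assumptions made for illustration.

```python
import secrets
import string


def generate_salt(length: int = 8) -> str:
    """Return a random alphanumeric string to serve as the preset character string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def choose_add_position(identifier: str = "7", after: bool = True) -> dict:
    """Describe the adding position: before or after a given identifier character.

    This dictionary layout is hypothetical; the source leaves the format open.
    """
    return {"identifier": identifier, "after": after}


salt = generate_salt()
position = choose_add_position("7")
```

The `secrets` module is used here rather than `random` because the salt is security-relevant; that choice is also an assumption rather than something the application states.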
Step S102, sending the preset character string and the adding position to each participant, so that each participant encrypts its local data based on the preset character string and the adding position to generate encrypted data.
In this embodiment, each participant may be a business, a bank, a school, or the like. The present application does not limit the specific form of each participant.
After determining the preset character string and its adding position, the execution subject may package the two and transmit the package to each participant through a custom protocol. Each participant then calls a preset encryption algorithm to encrypt its local data according to the received preset character string (for example, a random salt) and adding position, thereby generating encrypted data.
Step S103, acquiring the encrypted data of each participant and computing their intersection to obtain public encrypted data.
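The packaging step could look roughly like the sketch below. The application only says that a custom protocol is used, so the JSON encoding, the field names `salt` and `position`, and the helper names `pack_salt_message` and `unpack_salt_message` are assumptions for illustration, not the protocol of the application.

```python
import json
from typing import Tuple


def pack_salt_message(salt: str, position: dict) -> bytes:
    """Serialize the preset character string and adding position into one message."""
    payload = {"salt": salt, "position": position}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def unpack_salt_message(message: bytes) -> Tuple[str, dict]:
    """Recover the preset character string and adding position on the participant side."""
    payload = json.loads(message.decode("utf-8"))
    return payload["salt"], payload["position"]


# The execution subject packs once; every participant unpacks the same bytes.
message = pack_salt_message("abcdefg", {"identifier": "7", "after": True})
salt, position = unpack_salt_message(message)
```

Because every participant receives identical bytes, all of them apply the same salt at the same position, which is what makes their hash values comparable later.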
Specifically, the intersection is calculated to obtain public encryption data, which includes:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Specifically, generating an asynchronous intersection solving task based on each encrypted data includes:
pairing every two encrypted data to generate paired encrypted data. Specifically, the executing entity may pair each encrypted data generated by each participant two by two, and then obtain paired encrypted data.
And generating asynchronous intersection solving tasks based on the paired encrypted data. Specifically, after obtaining the paired encrypted data, the executing entity may generate an asynchronous intersection-solving task for solving an intersection of the paired encrypted data.
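One way to read the pairing step is sketched below with `asyncio`. The application does not say whether every possible pair or only disjoint pairs are formed; this sketch assumes every pair via `itertools.combinations`. The names `intersect_pair` and `run_pairwise_tasks` and the placeholder hash strings are hypothetical.

```python
import asyncio
from itertools import combinations


async def intersect_pair(a: set, b: set) -> set:
    """One asynchronous intersection task for a single pair of hash sets."""
    await asyncio.sleep(0)  # yield control; stands in for real I/O or remote work
    return a & b


async def run_pairwise_tasks(hash_sets: list) -> list:
    """Pair the hash sets two by two and run one task per pair concurrently."""
    tasks = [intersect_pair(a, b) for a, b in combinations(hash_sets, 2)]
    return await asyncio.gather(*tasks)


# Placeholder strings stand in for the participants' real hash values.
hash_sets = [{"h1", "h2", "h3"}, {"h2", "h3", "h4"}, {"h3", "h2", "h5"}]
pair_results = asyncio.run(run_pairwise_tasks(hash_sets))
```

With three participants this produces three pairwise intersections, which together form the encrypted intersection data set processed in the next stage.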
Specifically, executing each asynchronous intersection solving task to obtain public encryption data includes:
executing each asynchronous intersection task to generate an encrypted intersection data set; and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Specifically, after executing each asynchronous intersection task, the execution subject intersects the encrypted intersection data in the generated encrypted intersection data set again, and then intersects the results of that round, repeating until only two pieces of encrypted intersection data remain. A final intersection of these two yields the public encrypted data.
Step S104, determining the public sample data based on the public encrypted data, and outputting the public sample data.
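The repeated intersection can be sketched as a round-by-round reduction. The function name `reduce_to_common`, the handling of an odd leftover set, and the placeholder hash strings are assumptions; the application only describes repeating the intersection until a final pair is intersected.

```python
def reduce_to_common(intersections: list) -> set:
    """Repeatedly intersect neighbouring sets until a single set remains."""
    current = list(intersections)
    if not current:
        return set()
    while len(current) > 1:
        next_round = []
        # Intersect neighbours (0,1), (2,3), ... in this round.
        for i in range(0, len(current) - 1, 2):
            next_round.append(current[i] & current[i + 1])
        # With an odd count, the last set carries over unchanged.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]


encrypted_intersections = [{"h2", "h3", "h9"}, {"h2", "h3"}, {"h3", "h2", "h7"}]
common = reduce_to_common(encrypted_intersections)
```

Since set intersection is associative and commutative, the order of the rounds does not change the final result, which is why the individual rounds can run independently.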
Specifically, determining common sample data includes:
establishing, locally at each participant, a correspondence between the local data and the encrypted data. In this embodiment, the execution subject may establish, at each participant, a correspondence between that participant's local data and its encrypted data (that is, the data obtained by adding the preset character string at the preset position of the local data and encrypting the result with the preset encryption algorithm).
The execution subject may locate the local data corresponding to the public encrypted data based on the established correspondence, and determine the located local data as the public sample data.
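The lookup from public encrypted data back to local data can be sketched with a plain dictionary. The name `locate_common_samples`, the placeholder hash strings, and the sample identifiers are hypothetical.

```python
def locate_common_samples(hash_to_local: dict, common_hashes: set) -> list:
    """Map each public hash value back to the participant's own local data."""
    return sorted(hash_to_local[h] for h in common_hashes if h in hash_to_local)


# Correspondence built locally by one participant: hash value -> local data.
hash_to_local = {"h1": "id_001", "h2": "id_002", "h3": "id_003"}
common_samples = locate_common_samples(hash_to_local, {"h2", "h3", "h8"})
```

The table never leaves the participant, so only hash values are exchanged; a hash the participant does not hold (here `h8`) is simply skipped.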
Specifically, after determining the common sample data, the sample data generating method further includes:
model training is performed based on common sample data. Specifically, the federated learning model can be trained based on the public sample data.
In this embodiment, a preset character string is generated upon receiving a sample data generation request, and an adding position for the preset character string is determined; the preset character string and the adding position are sent to each participant, so that each participant encrypts its local data based on them to generate encrypted data; the encrypted data of each participant is acquired and intersected to obtain public encrypted data; and public sample data is determined based on the public encrypted data and output. By combining the MD5 message-digest algorithm with an encryption scheme that adds a random salt (that is, the preset character string), the embodiment addresses the private set intersection problem in federated learning and adapts well to different scenarios.
Fig. 2 is a schematic main flow diagram of a sample data generation method according to a second embodiment of the present application, and as shown in fig. 2, the sample data generation method includes:
Step S201, receiving a sample data generation request, generating a preset character string, and determining an adding position for the preset character string.
Step S202, sending the preset character string and the adding position to each participant, so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data.
The principle of step S201 to step S202 is similar to that of step S101 to step S102, and is not described here again.
Specifically, step S202 can also be realized by step S2021 to step S2023:
Step S2021, determining the data identifier corresponding to the adding position.
Specifically, the data identification may be 0, 1, 2, 3, etc. The data identifier is not specifically limited in the embodiments of the present application.
Step S2022, according to the data identifier, embed the preset character string into the local data corresponding to each participant, and generate each local embedded data.
After determining the data identifier corresponding to the adding position of the preset character string, the execution subject may further determine the preset row information and column information corresponding to the adding position, and send the data identifier together with the row and column information to each participant. Each participant can locate the embedding point from the received data identifier and row and column information, embed the preset character string into its local data at that point, and thereby generate its local embedded data.
For example, suppose the local data of one participant is x7faqgjw, the preset character string is abcdefg, the data identifier is 7, and the row and column information is the first row and second column. The located embedding point is then the "7" in the participant's local data, and the preset embedding rule may be to embed after the numeric identifier. The execution subject may embed the preset character string abcdefg after the "7" in x7faqgjw to obtain x7abcdefgfaqgjw, which is the local embedded data of the participant. The embedding rule for the preset character string is not specifically limited.
Step S2023, based on the message digest algorithm, processing each local embedded data to generate a hash value, and determining the hash value as the encrypted data.
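The embedding in this example can be sketched as a small string operation. The function name `embed_salt` and its parameters are hypothetical, and the sketch locates the embedding point only by the data identifier, ignoring the row and column information.

```python
def embed_salt(local_data: str, salt: str, identifier: str, after: bool = True) -> str:
    """Insert the salt before or after the first occurrence of the identifier."""
    index = local_data.find(identifier)
    if index == -1:
        raise ValueError(f"identifier {identifier!r} not found in local data")
    cut = index + len(identifier) if after else index
    return local_data[:cut] + salt + local_data[cut:]


# Values taken from the example: local data x7faqgjw, salt abcdefg, identifier 7.
embedded = embed_salt("x7faqgjw", "abcdefg", "7")
```

Embedding after the identifier splits the local data into `x7` and `faqgjw` and places the salt between them.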
After obtaining each piece of local embedded data, the execution subject may process it with a message digest algorithm to generate a hash value and determine the hash value as the encrypted data. Specifically, an MD5 hash may be computed over each piece of local embedded data, and the resulting hash value is used as the encrypted data for that piece. MD5 (MD5 Message-Digest Algorithm) is a widely used cryptographic hash function that produces a 128-bit hash value and is commonly used to check the integrity of transmitted messages. MD5 maps an input string of arbitrary length to a fixed-length output; identical plaintexts always yield the same digest, and the algorithm is irreversible: there is no decryption function that recovers the plaintext from a digest. For example, hashing the local embedded data from the example above with MD5 yields a 32-character hexadecimal hash value; the application gives 4a1690d5eb6c126ef68606dda68c2f79.
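Computing the digest with Python's standard `hashlib` module might look like the sketch below. The helper name `md5_hex` and the UTF-8 encoding are assumptions, since the application does not state how the string is encoded before hashing. The input is the example local data with the salt abcdefg embedded after the "7", and the sketch makes no claim to reproduce the hash value quoted in the application.

```python
import hashlib


def md5_hex(embedded: str) -> str:
    """Return the MD5 digest of the salted data as a 32-character hex string."""
    return hashlib.md5(embedded.encode("utf-8")).hexdigest()


digest = md5_hex("x7abcdefgfaqgjw")
```

Because the digest is deterministic, two participants holding the same local data and using the same salt and position obtain the same hash value, which is what allows the intersection to be computed over hashes alone.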
According to the embodiment, the local data of each participant is encrypted by using MD5 hash calculation in combination with a random salt adding mode, so that the encrypted local data of each participant has higher security and is not easy to crack.
Step S203, acquiring each encrypted data, and further obtaining an intersection to obtain public encrypted data.
Step S204, determining the public sample data based on the public encrypted data, and outputting the public sample data.
The principle of step S203 to step S204 is similar to that of step S103 to step S104, and is not described here again.
Fig. 3 is a schematic view of an application scenario of a sample data generation method according to a third embodiment of the present application. The sample data generation method of the embodiments of the present application can be applied to generating training samples for a federated learning model. As shown in fig. 3, the server 302 receives the sample data generation request 301, generates a preset character string 303, and determines an adding position 304 for the preset character string 303. The server 302 sends the preset character string 303 and the adding position 304 to each participant 305 (for example, participant 1, participant 2, and so on), so that each participant 305 encrypts its corresponding local data 306 (for example, local data 1 for participant 1, local data 2 for participant 2, and so on) based on the preset character string 303 and the adding position 304 to generate encrypted data 307 (for example, encrypted data 1, encrypted data 2, and so on). The server 302 obtains the encrypted data 307 and computes the intersection to obtain public encrypted data 308. The server 302 then determines public sample data 309 based on the public encrypted data 308 and outputs it.
In the embodiment of the present application, one party participating in federated learning (for example, party A) generates a random salt, determines the position at which the random salt is added to the sample ID, packages this information, and transmits it to the other parties through a custom protocol. Using the received random salt and addition rule, each of the other participants computes the MD5 hash value of each local sample ID and locally builds a correspondence table between sample IDs and hash values. The participants then perform a streaming intersection over the computed hash values, which can be processed in a distributed, parallel manner. The hash values of the public samples obtained from the final intersection are sent back to each participant, which can then recover the public sample IDs through its locally built mapping.
Fig. 4 is a schematic diagram of main units of a sample data generation apparatus according to an embodiment of the present application. As shown in fig. 4, the sample data generation apparatus includes a receiving unit 401, an encrypted data generating unit 402, a public encrypted data determining unit 403, and a sample data determining unit 404.
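The whole flow described in this paragraph can be sketched end to end in a single process. The party names, sample IDs, the character offset used as the addition rule, and the helpers `salted_md5` and `build_table` are all hypothetical, and a real deployment would exchange the hash values over a network rather than in memory.

```python
import hashlib
import secrets


def salted_md5(sample_id: str, salt: str, offset: int) -> str:
    """Insert the salt at a fixed character offset, then hash with MD5."""
    salted = sample_id[:offset] + salt + sample_id[offset:]
    return hashlib.md5(salted.encode("utf-8")).hexdigest()


def build_table(sample_ids: list, salt: str, offset: int) -> dict:
    """Local correspondence table: hash value -> sample ID."""
    return {salted_md5(s, salt, offset): s for s in sample_ids}


# Party A generates the salt and the addition rule, then shares both.
salt = secrets.token_hex(8)
offset = 2

party_ids = {
    "A": ["u001", "u002", "u003"],
    "B": ["u002", "u003", "u004"],
    "C": ["u003", "u002", "u005"],
}

# Each party hashes its own IDs locally; only hash values are exchanged.
tables = {name: build_table(ids, salt, offset) for name, ids in party_ids.items()}
common_hashes = set.intersection(*(set(t) for t in tables.values()))

# Each party maps the returned hash values back through its own table.
common_ids = {name: sorted(t[h] for h in common_hashes) for name, t in tables.items()}
```

Every party ends up with the same list of shared sample IDs without having sent any raw ID to the others.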
A receiving unit 401 configured to receive a sample data generation request, further generate a preset character string, and determine an adding position of the preset character string;
an encrypted data generating unit 402 configured to transmit the preset character string and the adding position to each participant, so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
a public encrypted data determining unit 403 configured to obtain each encrypted data, and further find an intersection to obtain public encrypted data;
a sample data determining unit 404 configured to determine the public sample data based on the public encrypted data, and output the public sample data.
In some embodiments, the encrypted data generation unit 402 is further configured to: determining a data identifier corresponding to the adding position; embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data; based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
In some embodiments, the sample data determination unit 404 is further configured to: establishing a corresponding relation between local data and each encrypted data locally at each participant; and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
In some embodiments, the public encrypted data determining unit 403 is further configured to: generate asynchronous intersection solving tasks based on the encrypted data; and execute each asynchronous intersection solving task to obtain the public encrypted data.
In some embodiments, the public encrypted data determining unit 403 is further configured to: pair the encrypted data two by two to generate paired encrypted data; and generate an asynchronous intersection solving task for each paired encrypted data.
In some embodiments, the public encrypted data determining unit 403 is further configured to: execute each asynchronous intersection solving task to generate an encrypted intersection data set; and solve the intersection of the encrypted intersection data in the encrypted intersection data set to generate the public encrypted data.
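The pairing, asynchronous execution, and final intersection described in the three preceding embodiments might look like the following sketch. It assumes that "pairing every two" means forming all unordered pairs of participants and uses Python's `asyncio` as the task mechanism; the application does not name a concurrency framework, and `intersect_pair` and `public_encrypted_data` are hypothetical names.

```python
import asyncio
from itertools import combinations


async def intersect_pair(a: set, b: set) -> set:
    """One asynchronous intersection solving task for a pair of encrypted sets."""
    await asyncio.sleep(0)  # yield to the event loop; stands in for real I/O
    return a & b


async def public_encrypted_data(encrypted_sets: list) -> set:
    """Intersect all pairs concurrently, then intersect the pairwise results."""
    if not encrypted_sets:
        return set()
    if len(encrypted_sets) == 1:
        return set(encrypted_sets[0])
    tasks = [
        asyncio.create_task(intersect_pair(a, b))
        for a, b in combinations(encrypted_sets, 2)
    ]
    # The gathered results form the "encrypted intersection data set".
    pair_results = await asyncio.gather(*tasks)
    return set.intersection(*pair_results)


# Hypothetical hash values held by three participants.
sets = [{"h1", "h2", "h3"}, {"h2", "h3", "h4"}, {"h3", "h2", "h5"}]
print(sorted(asyncio.run(public_encrypted_data(sets))))
```

With two or more participants, every set takes part in at least one pair, so intersecting the pairwise results yields the same elements as intersecting all sets directly.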
In some embodiments, the sample data generation apparatus further comprises a model training unit, not shown in fig. 4, configured to perform model training based on the public sample data.
It should be noted that the sample data generation method and the sample data generation apparatus according to the present application have a corresponding relationship in the specific implementation contents, and therefore the description of the duplicated contents is omitted.
Fig. 5 illustrates an exemplary system architecture 500 to which the sample data generation method or the sample data generation apparatus of the embodiments of the present application may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having sample data generation processing screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) that supports sample data generation requests submitted by users through the terminal devices 501, 502, 503. The background management server can receive a sample data generation request, generate a preset character string, and determine an adding position of the preset character string; send the preset character string and the adding position to each participant, so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; acquire each encrypted data and solve the intersection to obtain public encrypted data; and determine public sample data based on the public encrypted data and output the public sample data. The privacy set intersection problem in federated learning is thus addressed by combining a message digest algorithm (MD5) with random salt (namely, the preset character string), and the approach adapts well to different scenarios.
The sample data generation method provided in the embodiment of the present application is generally executed by the server 505, and accordingly, the sample data generation apparatus is generally installed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a terminal device of an embodiment of the present application. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the computer system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments disclosed herein, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments disclosed herein include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including a receiving unit, an encrypted data generating unit, a public encrypted data determining unit, and a sample data determining unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: receive a sample data generation request, generate a preset character string, and determine an adding position of the preset character string; send the preset character string and the adding position to each participant, so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; acquire each encrypted data and solve the intersection to obtain public encrypted data; and determine public sample data based on the public encrypted data and output the public sample data.
According to the technical solution of the embodiments of the present application, the privacy set intersection problem in federated learning is addressed by combining a message digest algorithm (MD5) with random salt (namely, the preset character string), and the approach adapts well to different scenarios.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (12)
1. A sample data generation method, comprising:
receiving a sample data generation request, generating a preset character string, and determining an adding position of the preset character string;
sending the preset character string and the adding position to each participant, so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data;
obtaining each encrypted data, and solving an intersection to obtain public encrypted data;
and determining public sample data based on the public encrypted data, and outputting the public sample data.
2. The method of claim 1, wherein generating the encrypted data comprises:
determining a data identifier corresponding to the adding position;
embedding the preset character string into the local data of each participant according to the data identifier to generate each local embedded data;
processing each local embedded data based on a message digest algorithm to generate a hash value, and determining the hash value as the encrypted data.
3. The method of claim 1, wherein said determining public sample data comprises:
establishing, locally at each participant, a correspondence between the local data and each encrypted data;
and, based on the correspondence, locating the local data corresponding to the public encrypted data, and determining the located local data as the public sample data.
4. The method of claim 1, wherein said solving an intersection to obtain public encrypted data comprises:
generating asynchronous intersection solving tasks based on each encrypted data;
and executing each asynchronous intersection solving task to obtain the public encrypted data.
5. The method of claim 4, wherein said generating asynchronous intersection solving tasks based on each encrypted data comprises:
pairing every two encrypted data to generate paired encrypted data;
and generating an asynchronous intersection solving task based on each paired encrypted data.
6. The method of claim 4, wherein said executing each asynchronous intersection solving task to obtain the public encrypted data comprises:
executing each asynchronous intersection solving task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set to generate the public encrypted data.
7. The method of claim 1, wherein after said determining public sample data, the method further comprises:
performing model training based on the public sample data.
8. A sample data generation apparatus, comprising:
a receiving unit configured to receive a sample data generation request, generate a preset character string, and determine an adding position of the preset character string;
an encrypted data generating unit configured to send the preset character string and the adding position to each participant, so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data;
a public encrypted data determining unit configured to acquire each encrypted data and solve an intersection to obtain public encrypted data;
and a sample data determining unit configured to determine public sample data based on the public encrypted data and output the public sample data.
9. The apparatus of claim 8, wherein the encrypted data generation unit is further configured to:
determining a data identifier corresponding to the adding position;
embedding the preset character string into the local data of each participant according to the data identifier to generate each local embedded data;
processing each local embedded data based on a message digest algorithm to generate a hash value, and determining the hash value as the encrypted data.
10. The apparatus of claim 8, wherein the sample data determining unit is further configured to:
establish, locally at each participant, a correspondence between the local data and each encrypted data;
and, based on the correspondence, locate the local data corresponding to the public encrypted data, and determine the located local data as the public sample data.
11. An electronic device for generating sample data, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110974414.4A CN113626848A (en) | 2021-08-24 | 2021-08-24 | Sample data generation method and device, electronic equipment and computer readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113626848A (en) | 2021-11-09 |
Family
ID=78387437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110974414.4A Pending CN113626848A (en) | 2021-08-24 | 2021-08-24 | Sample data generation method and device, electronic equipment and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113626848A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8566601B1 (en) * | 2012-09-12 | 2013-10-22 | Zeutro Llc | Systems and methods for functional encryption using a string of arbitrary length |
CN105812141A (en) * | 2016-03-07 | 2016-07-27 | 东北大学 | Outsourcing encrypted data-orientated verifiable intersection operation method and system |
CN112861175A (en) * | 2021-02-03 | 2021-05-28 | 华控清交信息科技(北京)有限公司 | Data processing method and device and data processing device |
CN113032817A (en) * | 2021-05-21 | 2021-06-25 | 北京百度网讯科技有限公司 | Data alignment method, device, equipment and medium based on block chain |
CN113051239A (en) * | 2021-03-26 | 2021-06-29 | 北京沃东天骏信息技术有限公司 | Data sharing method, use method of model applying data sharing method and related equipment |
CN113065155A (en) * | 2021-03-26 | 2021-07-02 | 杭州宇链科技有限公司 | Privacy set intersection method based on trusted execution environment assistance |
Non-Patent Citations (2)
Title |
---|
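申立艳; 陈小军; 时金桥; 胡兰兰: "Survey of privacy-preserving set intersection computation techniques", Journal of Computer Research and Development, no. 10, 31 December 2017 (2017-12-31) * |
陈陪宁: "Research on the application of encryption-based privacy protection in web information systems", Computer Knowledge and Technology, no. 12, 25 April 2013 (2013-04-25) * |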
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114900325A (en) * | 2022-03-25 | 2022-08-12 | 杭州博盾习言科技有限公司 | Privacy set intersection method, system, device and medium based on federal learning |
CN114900325B (en) * | 2022-03-25 | 2024-03-26 | 杭州博盾习言科技有限公司 | Federal learning-based privacy set intersection method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||