CN113626848A - Sample data generation method and device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN113626848A
Authority
CN
China
Prior art keywords
data
public
encrypted
encrypted data
participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110974414.4A
Other languages
Chinese (zh)
Inventor
袁梓焜
王科
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110974414.4A priority Critical patent/CN113626848A/en
Publication of CN113626848A publication Critical patent/CN113626848A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a sample data generation method, a sample data generation device, electronic equipment and a computer readable medium, and relates to the technical field of computers. One specific embodiment comprises: receiving a sample data generation request, generating a preset character string in response, and determining an adding position of the preset character string; sending the preset character string and the adding position to each participant so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; obtaining the encrypted data of each participant and computing the intersection to obtain public encrypted data; and determining public sample data based on the public encrypted data, and outputting the public sample data. Combining a message digest algorithm with an encryption scheme that adds a preset character string to local data solves the private set intersection problem in federated learning and adapts to a wide range of scenarios.

Description

Sample data generation method and device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating sample data, an electronic device, and a computer-readable medium.
Background
At present, in the field of federated learning, training samples are obtained through private set intersection. However, existing private set intersection methods each target specific usage scenarios and have their own shortcomings in the deployment conditions, communication traffic and running time of each participant, so it is difficult to balance all of these aspects.
In the process of implementing the present application, the inventor finds that at least the following problems exist in the prior art:
when training samples for a federated model are obtained through private set intersection, the usage scenarios of the intersection method are restricted, and it is difficult to balance the deployment conditions, communication traffic and running time of each participant.
Disclosure of Invention
In view of this, embodiments of the present application provide a sample data generation method, an apparatus, an electronic device, and a computer-readable medium, which can solve the problem that, when training samples for a federated model are obtained through private set intersection, the usage scenarios of the intersection method are restricted and it is difficult to balance the deployment conditions, communication traffic and running time of each participant.
In order to achieve the above object, according to an aspect of the embodiments of the present application, there is provided a sample data generating method, including:
receiving a sample data generation request, further generating a preset character string, and determining an adding position of the preset character string;
sending a preset character string and an adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
acquiring each encrypted data, and further solving an intersection to obtain public encrypted data;
and determining the public sample data based on the public encryption data, and outputting.
Optionally, generating the encrypted data comprises:
determining a data identifier corresponding to the adding position;
embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data;
based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
Optionally, determining common sample data comprises:
establishing a corresponding relation between local data and each encrypted data locally at each participant;
and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
Optionally, intersecting to obtain public encryption data includes:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Optionally, generating an asynchronous intersection-solving task based on each encrypted data includes:
pairing every two encrypted data to generate paired encrypted data;
and generating asynchronous intersection solving tasks based on the paired encrypted data.
Optionally, executing each asynchronous intersection solving task to obtain public encryption data, including:
executing each asynchronous intersection task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Optionally, after determining the common sample data, the method further comprises:
model training is performed based on common sample data.
In addition, the present application also provides a sample data generating apparatus, including:
the receiving unit is configured to receive a sample data generation request, further generate a preset character string and determine an adding position of the preset character string;
the encrypted data generating unit is configured to send the preset character string and the adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
the public encrypted data determining unit is configured to acquire each encrypted data and further obtain an intersection to obtain public encrypted data;
and the sample data determining unit is configured to determine the public sample data based on the public encryption data and output the public sample data.
Optionally, the encrypted data generating unit is further configured to:
determining a data identifier corresponding to the adding position;
embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data;
based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
Optionally, the sample data determining unit is further configured to:
establishing a corresponding relation between local data and each encrypted data locally at each participant;
and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
Optionally, the public encryption data determination unit is further configured to:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Optionally, the public encryption data determination unit is further configured to:
pairing every two encrypted data to generate paired encrypted data;
and generating asynchronous intersection solving tasks based on the paired encrypted data.
Optionally, the public encryption data determination unit is further configured to:
executing each asynchronous intersection task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Optionally, the sample data generating apparatus further comprises a model training unit configured to:
model training is performed based on common sample data.
In addition, the present application further provides an electronic device for generating sample data, including: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by one or more processors, the one or more processors realize the sample data generation method.
In addition, the present application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the sample data generating method as described above.
One embodiment of the above invention has the following advantages or benefits: a preset character string is generated upon receiving a sample data generation request, and an adding position of the preset character string is determined; the preset character string and the adding position are sent to each participant so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; the encrypted data of each participant is acquired and the intersection is computed to obtain public encrypted data; and the public sample data is determined based on the public encrypted data and output. Combining the MD5 message-digest algorithm with an encryption scheme that adds a random salt (namely, the preset character string) solves the private set intersection problem in federated learning and adapts to a wide range of scenarios.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a further understanding of the application and are not to be construed as limiting the application. Wherein:
Fig. 1 is a schematic view of a main flow of a sample data generation method according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of a main flow of a sample data generation method according to a second embodiment of the present application;
Fig. 3 is a schematic view of an application scenario of a sample data generation method according to a third embodiment of the present application;
Fig. 4 is a schematic diagram of main units of a sample data generation apparatus according to an embodiment of the present application;
Fig. 5 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a sample data generation method according to a first embodiment of the present application, and as shown in fig. 1, the sample data generation method includes:
step S101, receiving a sample data generation request, further generating a preset character string, and determining an adding position of the preset character string.
In this embodiment, an execution subject (for example, a server) of the sample data generation method may receive the sample data generation request through a wired or wireless connection. Specifically, the request may be a request to generate training sample data for a federated learning model. Federated learning is a machine learning framework that helps multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security and government regulations. After receiving the sample data generation request, the execution subject may generate a preset character string, for example a random salt. In cryptography, "salting" means inserting a specific character string at a fixed, arbitrarily chosen position of a password so that the hash of the salted password differs from the hash of the original password; the inserted character string is called the "random salt". After generating the preset character string (i.e., the random salt), the execution subject may determine, according to user settings or in any other manner, the position at which the preset character string is added to the local data of each participant. For example, the preset character string may be added before or after a particular numeric or alphabetic identifier in each participant's local data, or at a particular row and column of that data. The position is not specifically limited in the embodiments of the present application.
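As an illustration of step S101, the following minimal sketch generates a random salt and records where it should be added. The `SaltPlan` container, the `make_salt_plan` helper, the default marker "7" and the 16-byte salt length are assumptions made for this example; the application does not prescribe how the salt is produced or how the position is represented.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SaltPlan:
    """Hypothetical container for the preset character string and its adding position."""
    salt: str     # the preset character string (random salt)
    marker: str   # data identifier that locates the insertion point
    after: bool   # True: insert after the marker; False: insert before it


def make_salt_plan(marker: str = "7", after: bool = True, n_bytes: int = 16) -> SaltPlan:
    # secrets.token_hex reads from the OS random source;
    # n_bytes random bytes become 2 * n_bytes hexadecimal characters.
    return SaltPlan(salt=secrets.token_hex(n_bytes), marker=marker, after=after)


plan = make_salt_plan()
print(len(plan.salt), plan.marker, plan.after)
```

A fresh salt per request means that hash values from one intersection run cannot be matched against those of another run.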
And S102, sending the preset character string and the adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data.
In this embodiment, each participant may be a business, a bank, a school, or the like. The present application does not limit the specific form of each participant.
After determining the preset character string and its adding position, the execution subject may package the preset character string together with the adding position and transmit the package to each participant through a custom protocol. Each participant then calls a preset encryption algorithm to encrypt its own local data according to the received preset character string (such as a random salt) and adding position, thereby generating encrypted data.
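The custom protocol is not specified in the application. Purely as an assumed example, the packaged message could be a small JSON object; the field names `salt`, `marker` and `after` and both helper functions below are hypothetical.

```python
import json


def pack_salt_message(salt: str, marker: str, after: bool) -> bytes:
    # Hypothetical wire format: one UTF-8 encoded JSON object per message.
    return json.dumps({"salt": salt, "marker": marker, "after": after}, sort_keys=True).encode("utf-8")


def unpack_salt_message(payload: bytes) -> dict:
    message = json.loads(payload.decode("utf-8"))
    # Reject messages that lack any of the fields a participant needs.
    missing = {"salt", "marker", "after"} - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message


payload = pack_salt_message("abcdefg", "7", True)
received = unpack_salt_message(payload)
print(received)
```

Because the salt is the only secret shared among the participants, a real deployment would presumably send this message over an authenticated, encrypted channel.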
And step S103, acquiring each encrypted data, and further solving an intersection to obtain public encrypted data.
Specifically, the intersection is calculated to obtain public encryption data, which includes:
generating an asynchronous intersection solving task based on each encrypted data;
and executing each asynchronous intersection solving task to obtain public encryption data.
Specifically, generating an asynchronous intersection solving task based on each encrypted data includes:
pairing every two encrypted data to generate paired encrypted data. Specifically, the executing entity may pair each encrypted data generated by each participant two by two, and then obtain paired encrypted data.
And generating asynchronous intersection solving tasks based on the paired encrypted data. Specifically, after obtaining the paired encrypted data, the executing entity may generate an asynchronous intersection-solving task for solving an intersection of the paired encrypted data.
Specifically, executing each asynchronous intersection solving task to obtain public encryption data includes:
executing each asynchronous intersection task to generate an encrypted intersection data set; and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
Specifically, after executing each asynchronous intersection solving task, the execution subject intersects the encrypted intersection data in the resulting encrypted intersection data set again, and repeats this on each new round of results until only two pieces of encrypted intersection data remain. A final intersection of these two yields the public encrypted data.
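One possible reading of this pairwise procedure is a round-based reduction: adjacent hash sets are paired, each pair is intersected as an independent task, and the results feed the next round until a single set remains. The sketch below follows that reading; the function names and the use of a thread pool are assumptions, and the application does not fix a particular pairing or scheduling strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def intersect_pair(a: frozenset, b: frozenset) -> frozenset:
    return a & b


def reduce_by_pairs(hash_sets: list) -> frozenset:
    """Intersect sets pairwise, round by round, until one set remains."""
    if not hash_sets:
        return frozenset()
    current = [frozenset(s) for s in hash_sets]
    with ThreadPoolExecutor() as pool:
        while len(current) > 1:
            pairs = [(current[i], current[i + 1]) for i in range(0, len(current) - 1, 2)]
            # Each pair is an independent task, so a whole round can be submitted at once.
            futures = [pool.submit(intersect_pair, a, b) for a, b in pairs]
            next_round = [f.result() for f in futures]
            if len(current) % 2 == 1:
                next_round.append(current[-1])  # the unpaired set waits for the next round
            current = next_round
    return current[0]


common = reduce_by_pairs([
    {"h1", "h2", "h3", "h4"},
    {"h2", "h3", "h4", "h5"},
    {"h3", "h4", "h6"},
])
print(sorted(common))  # ['h3', 'h4']
```

Threads only illustrate the task structure here; for large hash sets the same rounds could be dispatched to separate processes or machines.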
And step S104, determining the public sample data based on the public encryption data and outputting the public sample data.
Specifically, determining common sample data includes:
and establishing a corresponding relation between the local data and each encrypted data locally at each participant. In this embodiment, the execution main body may establish, at each participant, a corresponding relationship between the local data of each participant and the encrypted data (that is, the encrypted data obtained by adding a preset character string to a preset position of the local data and encrypting with a preset encryption algorithm).
The execution subject may locate local data corresponding to the public encrypted data based on the established correspondence, and further determine the determined local data corresponding to the public encrypted data as public sample data.
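The lookup from public encrypted data back to local data can be sketched with a dictionary that each participant builds while hashing. The table contents and the `locate_common_samples` helper are illustrative placeholders, not values or names from the application.

```python
def locate_common_samples(lookup: dict, common_hashes: set) -> list:
    """Map public hash values back to local records.

    lookup maps hash value -> local record and is assumed to have been
    built by the participant when the hash values were computed.
    """
    return [lookup[h] for h in sorted(common_hashes) if h in lookup]


# Placeholder hash values stand in for real digests.
lookup = {"hash_a": "user_001", "hash_b": "user_002", "hash_c": "user_003"}
common = locate_common_samples(lookup, {"hash_c", "hash_a", "hash_z"})
print(common)  # ['user_001', 'user_003']
```

The table never leaves the participant, so only hash values cross the network.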
Specifically, after determining the common sample data, the sample data generating method further includes:
model training is performed based on common sample data. Specifically, the federated learning model can be trained based on the public sample data.
This embodiment generates a preset character string upon receiving a sample data generation request, and determines an adding position of the preset character string; sends the preset character string and the adding position to each participant so that each participant encrypts its own local data based on the preset character string and the adding position to generate encrypted data; acquires the encrypted data of each participant and computes the intersection to obtain public encrypted data; and determines the public sample data based on the public encrypted data and outputs it. Combining the MD5 message-digest algorithm with an encryption scheme that adds a random salt (namely, the preset character string) solves the private set intersection problem in federated learning and adapts to a wide range of scenarios.
Fig. 2 is a schematic main flow diagram of a sample data generation method according to a second embodiment of the present application, and as shown in fig. 2, the sample data generation method includes:
step S201, receiving a sample data generation request, further generating a preset character string, and determining an adding position of the preset character string.
Step S202, sending the preset character string and the adding position to each participant, so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data.
The principle of step S201 to step S202 is similar to that of step S101 to step S102, and is not described here again.
Specifically, step S202 can also be realized by step S2021 to step S2023:
step S2021, determining the data identifier corresponding to the adding position.
Specifically, the data identification may be 0, 1, 2, 3, etc. The data identifier is not specifically limited in the embodiments of the present application.
Step S2022, according to the data identifier, embed the preset character string into the local data corresponding to each participant, and generate each local embedded data.
After determining the data identifier corresponding to the adding position of the preset character string, the execution main body may determine preset row information and column information corresponding to the adding position again, and send the data identifier and the preset row information and column information to each participant. Each participant can position the embedded point according to the obtained data identifier and the preset row information and column information, so as to embed the preset character string into the respective local data based on the embedded point obtained by positioning, and generate each local embedded data corresponding to each participant.
For example, suppose the local data of one participant is x7faqgjw, the preset character string is abcdefg, the data identifier is 7, and the row information and column information are the first row and the second column. The located embedding point is then the "7" in the participant's local data, and the preset embedding rule may be to embed after the numeric identifier. Embedding the preset character string abcdefg after the "7" of x7faqgjw gives x7abcdefgfaqgjw, which is the local embedded data of the participant. The embedding rule of the preset character string is not specifically limited.
Step S2023, based on the message digest algorithm, process each local embedded data to generate a hash value, and determine the hash value as the encrypted data.
After obtaining each piece of local embedded data, the execution subject may process it with a message digest algorithm to generate a hash value, and then determine the hash value as the encrypted data. Specifically, an MD5 hash may be computed over each piece of local embedded data to obtain the corresponding hash value, and each hash value is used as the encrypted data for that piece of local embedded data. MD5 (the MD5 Message-Digest Algorithm) is a widely used hash function that produces a 128-bit hash value and is commonly used to check the integrity of transmitted messages. MD5 maps an input string of arbitrary length to a fixed-length output; the same input always yields the same hash value, and the function is one-way, so there is no decryption function that computes the input from the hash value. For example, the local embedded data x7abcdefgfaqgjw is hashed with MD5, and the resulting hash value is given in this example as 4a1690d5eb6c126ef68606dda68c2f79.
According to the embodiment, the local data of each participant is encrypted by using MD5 hash calculation in combination with a random salt adding mode, so that the encrypted local data of each participant has higher security and is not easy to crack.
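Steps S2021 to S2023 can be sketched with Python's standard `hashlib` module. The helper names `embed_salt` and `salted_md5` are hypothetical, and the sketch assumes the embedding point is the first occurrence of the data identifier; the row and column information mentioned above is ignored for brevity.

```python
import hashlib


def embed_salt(record: str, salt: str, marker: str, after: bool = True) -> str:
    """Insert the salt before or after the first occurrence of the marker."""
    idx = record.find(marker)
    if idx < 0:
        raise ValueError(f"marker {marker!r} not found in record")
    cut = idx + len(marker) if after else idx
    return record[:cut] + salt + record[cut:]


def salted_md5(record: str, salt: str, marker: str, after: bool = True) -> str:
    """Return the MD5 hex digest of the salted record."""
    embedded = embed_salt(record, salt, marker, after)
    return hashlib.md5(embedded.encode("utf-8")).hexdigest()


embedded = embed_salt("x7faqgjw", "abcdefg", "7")
print(embedded)  # x7abcdefgfaqgjw

digest = salted_md5("x7faqgjw", "abcdefg", "7")
print(len(digest))  # 32 hexadecimal characters, i.e. 128 bits
```

Every participant must apply the same salt and the same embedding rule, otherwise identical records would produce different hash values and drop out of the intersection.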
Step S203, acquiring each encrypted data, and further obtaining an intersection to obtain public encrypted data.
And step S204, determining the public sample data based on the public encryption data and outputting the public sample data.
The principle of step S203 to step S204 is similar to that of step S103 to step S104, and is not described here again.
Fig. 3 is a schematic view of an application scenario of a sample data generation method according to a third embodiment of the present application. The sample data generation method in the embodiment of the application can be applied to a scenario of generating training samples for a federated learning model. As shown in fig. 3, the server 302 receives the sample data generation request 301, generates a preset character string 303, and determines an adding position 304 of the preset character string 303. The server 302 sends the preset character string 303 and the adding position 304 to each participant 305 (which may include participant 1, participant 2, etc.), so that each participant 305 encrypts its corresponding local data 306 (local data 1 for participant 1, local data 2 for participant 2, etc.) based on the preset character string 303 and the adding position 304, and generates encrypted data 307 (which may include encrypted data 1, encrypted data 2, etc.). The server 302 obtains the encrypted data 307 and computes the intersection to obtain public encrypted data 308. The server 302 determines public sample data 309 based on the public encrypted data 308 and outputs it.
In the embodiment of the application, one party participating in federated learning (for example, party A) generates a random salt, determines the position at which the random salt is added to the sample ID, packages this information and transmits it to the remaining participants through a custom protocol. Following the addition rule, each of the other participants uses the received random salt and the MD5 algorithm to compute the hash values of its local sample IDs, and locally builds a table mapping each sample ID to its hash value. The participants then perform a streaming intersection over the computed hash values, which can be processed in a distributed, parallel manner. The hash values of the public samples obtained from the final intersection are sent back to each participant, and each participant recovers the public sample IDs from the returned hash values through its locally built mapping.
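The whole exchange can be simulated in one process as a rough sketch. The `Participant` class, the sample IDs and the character-offset form of the adding position are all assumptions made for illustration, and the plain set intersection here stands in for the streaming, distributed intersection described above.

```python
import hashlib
import secrets
from functools import reduce


class Participant:
    """Hypothetical participant holding sample IDs and a private lookup table."""

    def __init__(self, name, sample_ids):
        self.name = name
        self.sample_ids = list(sample_ids)
        self.table = {}  # hash value -> sample ID, never shared

    def hash_ids(self, salt: str, position: int) -> set:
        self.table = {}
        for sid in self.sample_ids:
            cut = min(position, len(sid))
            salted = sid[:cut] + salt + sid[cut:]
            self.table[hashlib.md5(salted.encode("utf-8")).hexdigest()] = sid
        return set(self.table)

    def resolve(self, common_hashes) -> list:
        return sorted(self.table[h] for h in common_hashes)


# One party generates the salt and the adding position and shares both.
salt = secrets.token_hex(8)
position = 2

parties = [
    Participant("A", ["id01", "id02", "id03"]),
    Participant("B", ["id02", "id03", "id04"]),
    Participant("C", ["id03", "id02", "id09"]),
]
hash_sets = [p.hash_ids(salt, position) for p in parties]

# Only hash values are compared; the raw sample IDs stay with their owners.
common_hashes = reduce(set.intersection, hash_sets)
results = [p.resolve(common_hashes) for p in parties]
print(results[0])  # ['id02', 'id03']
```

Each participant ends up with the same list of shared sample IDs without seeing the IDs that other participants hold exclusively.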
Fig. 4 is a schematic diagram of main units of a sample data generation apparatus according to an embodiment of the present application. As shown in fig. 4, the sample data generation apparatus includes a reception unit 401, an encrypted data generation unit 402, a common encrypted data determination unit 403, and a sample data determination unit 404.
A receiving unit 401 configured to receive a sample data generation request, further generate a preset character string, and determine an adding position of the preset character string;
an encrypted data generating unit 402 configured to transmit the preset character string and the adding position to each participant, so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
a public encrypted data determining unit 403 configured to obtain each encrypted data, and further find an intersection to obtain public encrypted data;
a sample data determination unit 404 configured to determine common sample data based on the common encrypted data, and output.
In some embodiments, the encrypted data generation unit 402 is further configured to: determining a data identifier corresponding to the adding position; embedding a preset character string into local data corresponding to each participant according to the data identification to generate each local embedded data; based on the message digest algorithm, each local embedded data is processed to generate a hash value, and the hash value is determined to be encrypted data.
In some embodiments, the sample data determination unit 404 is further configured to: establishing a corresponding relation between local data and each encrypted data locally at each participant; and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
In some embodiments, the public encryption data determination unit 403 is further configured to: generating an asynchronous intersection solving task based on each encrypted data; and executing each asynchronous intersection solving task to obtain public encryption data.
In some embodiments, the public encryption data determination unit 403 is further configured to: pairing every two encrypted data to generate paired encrypted data; and generating asynchronous intersection solving tasks based on the paired encrypted data.
In some embodiments, the public encryption data determination unit 403 is further configured to: executing each asynchronous intersection task to generate an encrypted intersection data set; and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate public encrypted data.
In some embodiments, the sample data generating apparatus further comprises a model training unit, not shown in fig. 4, configured to: model training is performed based on common sample data.
It should be noted that the sample data generation method and the sample data generation apparatus according to the present application have a corresponding relationship in the specific implementation contents, and therefore the description of the duplicated contents is omitted.
Fig. 5 illustrates an exemplary system architecture 500 to which the sample data generation method or the sample data generation apparatus of the embodiments of the present application may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices that have a display screen for sample data generation processing and support web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) that supports sample data generation requests submitted by users through the terminal devices 501, 502, 503. The background management server can receive a sample data generation request, generate a preset character string, and determine an adding position of the preset character string; send the preset character string and the adding position to each participant, so that each participant encrypts its local data based on the preset character string and the adding position to generate encrypted data; acquire the encrypted data of each participant and solve their intersection to obtain public encrypted data; and determine public sample data based on the public encrypted data and output the public sample data. The privacy-preserving intersection problem in federated learning is thus addressed by combining a message digest algorithm (MD5) with random salting (the salt being the preset character string), which adapts well to different scenarios.
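As a rough illustration of the first server-side step, the preset character string and its adding position could be generated as below. The application states that the salt is random but does not say how the position is chosen, so drawing a random offset up to `max_position` is an assumption, as are the function name and the default sizes.

```python
import secrets


def generate_salt_and_position(salt_bytes: int = 16, max_position: int = 8):
    """Generate a random preset character string and an adding position.

    Assumptions: the salt is a hex string of 2 * salt_bytes characters and
    the adding position is a random offset in the range 0..max_position.
    """
    salt = secrets.token_hex(salt_bytes)
    position = secrets.randbelow(max_position + 1)
    return salt, position


salt, position = generate_salt_and_position()
```

The server would then send the same `salt` and `position` to every participant, so that all of them salt and hash their local data in the same way.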
The sample data generation method provided in the embodiment of the present application is generally executed by the server 505, and accordingly, the sample data generation apparatus is generally installed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to fig. 6, shown is a block diagram of a computer system 600 suitable for implementing a terminal device of an embodiment of the present application. The terminal device shown in fig. 6 is only an example and should not impose any limitation on the functions or the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data necessary for the operation of the computer system 600 are also stored. The CPU601, ROM602, and RAM603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.
In particular, according to embodiments disclosed herein, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments disclosed herein include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as follows: a processor includes a receiving unit, an encrypted data generation unit, a public encrypted data determination unit, and a sample data determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs, and when the one or more programs are executed by one device, the device receives a sample data generation request, further generates a preset character string, and determines an adding position of the preset character string; sending a preset character string and an adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data; acquiring each encrypted data, and further solving an intersection to obtain public encrypted data; and determining the public sample data based on the public encryption data, and outputting.
According to the technical solution of the embodiments of the present application, the privacy-preserving intersection problem in federated learning is addressed by combining a message digest algorithm (MD5) with random salting (the salt being the preset character string), which adapts well to different scenarios.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A sample data generating method is characterized by comprising the following steps:
receiving a sample data generation request, further generating a preset character string, and determining an adding position of the preset character string;
sending the preset character string and the adding position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the adding position to generate encrypted data;
obtaining each encrypted data, and further solving an intersection to obtain public encrypted data;
and determining public sample data based on the public encrypted data, and outputting the public sample data.
2. The method of claim 1, wherein generating the encrypted data comprises:
determining a data identifier corresponding to the adding position;
embedding the preset character strings into local data corresponding to each participant according to the data identification to generate each local embedded data;
processing each piece of local embedded data based on a message digest algorithm to generate a hash value, and determining the hash value as the encrypted data.
3. The method of claim 1, wherein said determining public sample data comprises:
establishing a corresponding relation between local data and each encrypted data locally at each participant;
and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
4. The method of claim 1, wherein said solving an intersection to obtain public encrypted data comprises:
generating asynchronous intersection solving tasks based on each encrypted data;
and executing each asynchronous intersection solving task to obtain the public encrypted data.
5. The method of claim 4, wherein said generating asynchronous intersection solving tasks based on each encrypted data comprises:
pairing every two encrypted data to generate paired encrypted data;
and generating an asynchronous intersection solving task based on each paired encrypted data.
6. The method of claim 4, wherein said executing each asynchronous intersection solving task to obtain the public encrypted data comprises:
executing each asynchronous intersection solving task to generate an encrypted intersection data set;
and solving the intersection of the encrypted intersection data in the encrypted intersection data set so as to generate the public encrypted data.
7. The method of claim 1, wherein after said determining public sample data, the method further comprises:
and performing model training based on the public sample data.
8. A sample data generation apparatus, comprising:
the receiving unit is configured to receive a sample data generation request, further generate a preset character string, and determine an adding position of the preset character string;
an encrypted data generation unit configured to send the preset character string and the addition position to each participant so that each participant encrypts local data corresponding to each participant based on the preset character string and the addition position to generate encrypted data;
the public encrypted data determining unit is configured to acquire each encrypted data and further obtain an intersection to obtain public encrypted data;
and the sample data determining unit is configured to determine the public sample data based on the public encrypted data and output the public sample data.
9. The apparatus of claim 8, wherein the encrypted data generation unit is further configured to:
determining a data identifier corresponding to the adding position;
embedding the preset character strings into local data corresponding to each participant according to the data identification to generate each local embedded data;
processing each piece of local embedded data based on a message digest algorithm to generate a hash value, and determining the hash value as the encrypted data.
10. The apparatus of claim 8, wherein the sample data determination unit is further configured to:
establishing a corresponding relation between local data and each encrypted data locally at each participant;
and based on the corresponding relation, positioning local data corresponding to the public encrypted data, and further determining the determined local data corresponding to the public encrypted data as public sample data.
11. An electronic device for generating sample data, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110974414.4A 2021-08-24 2021-08-24 Sample data generation method and device, electronic equipment and computer readable medium Pending CN113626848A (en)

Publications (1)

Publication Number Publication Date
CN113626848A (en) 2021-11-09

Family

ID=78387437

Country Status (1)

Country Link
CN (1) CN113626848A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566601B1 (en) * 2012-09-12 2013-10-22 Zeutro Llc Systems and methods for functional encryption using a string of arbitrary length
CN105812141A (en) * 2016-03-07 2016-07-27 东北大学 Outsourcing encrypted data-orientated verifiable intersection operation method and system
CN112861175A (en) * 2021-02-03 2021-05-28 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN113032817A (en) * 2021-05-21 2021-06-25 北京百度网讯科技有限公司 Data alignment method, device, equipment and medium based on block chain
CN113051239A (en) * 2021-03-26 2021-06-29 北京沃东天骏信息技术有限公司 Data sharing method, use method of model applying data sharing method and related equipment
CN113065155A (en) * 2021-03-26 2021-07-02 杭州宇链科技有限公司 Privacy set intersection method based on trusted execution environment assistance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
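申立艳;陈小军;时金桥;胡兰兰: "隐私保护集合交集计算技术研究综述" [Survey of privacy-preserving set intersection computation techniques], 计算机研究与发展, no. 10, 31 December 2017 (2017-12-31) *
陈陪宁: "基于加密技术的隐私保护在WEB信息系统中的应用研究" [Research on applying encryption-based privacy protection in web information systems], 电脑知识与技术, no. 12, 25 April 2013 (2013-04-25) *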

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900325A (en) * 2022-03-25 2022-08-12 杭州博盾习言科技有限公司 Privacy set intersection method, system, device and medium based on federal learning
CN114900325B (en) * 2022-03-25 2024-03-26 杭州博盾习言科技有限公司 Federal learning-based privacy set intersection method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination