CN117424765B - Distributed one-hot encoding method, device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN117424765B
Authority
CN
China
Prior art keywords
global
local
category
category feature
participants
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311744887.0A
Other languages
Chinese (zh)
Other versions
CN117424765A (en)
Inventor
王德健
王慧东
董科雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Yikang Internet Technology Co ltd
Original Assignee
Tianjin Yikang Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Yikang Internet Technology Co ltd
Priority to CN202311744887.0A
Publication of CN117424765A
Application granted
Publication of CN117424765B
Status: Active (current legal status)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0006 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission format
    • H04L1/0015 Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the adaptation strategy

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a distributed one-hot encoding method, a device, electronic equipment and a computer storage medium. The method comprises: obtaining the local category feature set of each participant, where a local category feature set is the set of category features obtained after the participant's local data is deduplicated; generating a global encoder from the local category feature sets; and sending the global encoder to each participant, which uses it to one-hot encode its local data. Because the server collects only the local category feature sets, the privacy and security of data transmitted among the parties can be protected. In addition, because every participant encodes according to the same global encoder, all participants encode their local data in the same way, which prevents the one-hot encoding vectors produced by different participants from being inconsistent or conflicting and makes the participants' data easier to unify and manage.

Description

Distributed one-hot encoding method, device, electronic equipment and computer storage medium
Technical Field
The present invention relates to the field of data processing, and in particular to a distributed one-hot encoding method, apparatus, electronic device, and computer storage medium.
Background
In federated learning, character-type (categorical) data must be one-hot encoded. Current one-hot encoding typically collects the features to be encoded directly, i.e., gathers the original data in one place, and then encodes them. This approach raises privacy concerns, because it tends to expose each party's private data.
Disclosure of Invention
In view of the foregoing, an object of the embodiments of the present application is to provide a distributed one-hot encoding method, apparatus, electronic device, and computer storage medium that protect the privacy and security of data during one-hot encoding.
In a first aspect, an embodiment of the present application provides a distributed one-hot encoding method, applied to a server, including: obtaining the local category feature set of each participant, where the local category feature set is the category feature set obtained after the participant's local data is deduplicated; generating a global encoder from the local category feature sets; and sending the global encoder to each participant, the global encoder being used to one-hot encode each participant's local data.
In this implementation, the server collects each participant's local category feature set rather than the original local data, which protects the privacy and security of data transmitted between participants and the server and among participants. In addition, only the local category feature sets and the global encoder need to be transmitted between the server and the participants; neither the original data nor its one-hot encoding vectors are transmitted, which reduces the amount of data transmitted and improves communication quality and efficiency. Moreover, the server generates the global encoder after obtaining every participant's local category feature set, and all participants encode according to that encoder. All participants therefore encode their local data in the same way, which prevents their one-hot encoding vectors from being inconsistent or conflicting and makes the participants' data easier to unify and manage.
In one embodiment, generating a global encoder from the local category feature sets includes: deduplicating and aggregating all the obtained local category feature sets to obtain a global category feature set; and generating the global encoder from the global category feature set.
In this implementation, deduplicating all the obtained local category feature sets removes category features that several participants share, so the server does not one-hot encode the same category feature repeatedly, which improves one-hot encoding efficiency.
In one embodiment, generating the global encoder from the global category feature set includes: converting each value of each category feature in the global category feature set into a one-hot encoding vector, and generating the global encoder from the resulting one-hot encoding vectors.
In this implementation, one-hot encoding the category features in the global category feature set yields a one-hot encoding vector for every possible value of the category features that appears in any participant's local data. All participants thus use the same encoding, which prevents the encoding results of different participants from being inconsistent or conflicting and makes the data easier to unify and manage.
In one embodiment, generating a global encoder from the local category feature sets includes: one-hot encoding the category features in each participant's local category feature set separately, to obtain global encoders corresponding to the respective participants.
In this implementation, after obtaining the local category feature sets, the server one-hot encodes the category features in each local category feature set directly, without first merging them into a single unified set. For scenarios with few participants, this lowers the performance requirements on the server and reduces the cost of one-hot encoding.
In one embodiment, the global encoder includes the name of each category feature and its corresponding specific encoding value, which are used to one-hot encode each participant's local data.
In this implementation, because the global encoder stores the name of each category feature together with its specific encoding value, a participant encoding its local data can match each local value's name directly against the category feature names in the global encoder to obtain the corresponding encoding value, so each participant completes the one-hot encoding of its own local data locally. When the server and the participants exchange data, only the local category feature sets and the global encoder need to be transmitted; neither the original data nor its one-hot encoding vectors are transmitted, which reduces the amount of data transmitted and improves communication quality and efficiency.
In a second aspect, an embodiment of the present application further provides a distributed one-hot encoding method, applied to a participant, including: deduplicating each category feature of the participant's local data to obtain the participant's local category feature set; sending the local category feature set to a server; receiving a global encoder from the server, the global encoder having been generated from the local category feature sets of a plurality of participants; and one-hot encoding the participant's local data according to the global encoder.
In this implementation, the participant deduplicates its original data to obtain the local category feature set and sends that set, rather than the original local data, to the server, which protects the privacy and security of data transmitted between participants and the server and among participants. In addition, only the local category feature sets and the global encoder need to be transmitted between the server and the participants; neither the original data nor its one-hot encoding vectors are transmitted, which reduces the amount of data transmitted and improves communication quality and efficiency. Furthermore, because all participants encode according to the global encoder, they all encode their local data in the same way, which prevents their one-hot encoding vectors from being inconsistent or conflicting and makes the participants' data easier to unify and manage.
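The first and second aspects together describe one round of message exchange. The Python sketch below simulates that round in a single process under stated assumptions: plain function calls stand in for network messages, each participant holds one category feature as a list of strings, and the global category feature set is built in first-seen order. All function names and the participant data are hypothetical, not taken from the patent.

```python
# Hypothetical type alias: the global encoder as a value-name -> one-hot vector mapping.
Encoder = dict[str, list[int]]


def participant_prepare(local_data: list[str]) -> list[str]:
    """Participant side: deduplicate raw values; only this list is sent to the server."""
    return list(dict.fromkeys(local_data))


def server_build_encoder(local_sets: list[list[str]]) -> Encoder:
    """Server side: merge local sets without repeats, then give each value a one-hot vector."""
    global_set = list(dict.fromkeys(v for s in local_sets for v in s))
    n = len(global_set)
    return {v: [int(j == i) for j in range(n)] for i, v in enumerate(global_set)}


def participant_encode(local_data: list[str], encoder: Encoder) -> list[list[int]]:
    """Participant side: encode local values with the shared global encoder."""
    return [encoder[v] for v in local_data]


# Hypothetical raw data held by two participants; it never leaves them.
raw = {
    "P1": ["male", "female", "male"],
    "P2": ["other", "female", "other"],
}
uploads = [participant_prepare(data) for data in raw.values()]  # sent to the server
encoder = server_build_encoder(uploads)                         # sent back to every participant
encoded = {name: participant_encode(data, encoder) for name, data in raw.items()}
print(encoder)        # {'male': [1, 0, 0], 'female': [0, 1, 0], 'other': [0, 0, 1]}
print(encoded["P2"])  # [[0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

In this sketch only the deduplicated lists in `uploads` and the resulting `encoder` would cross the network; the per-record values and how often each value occurs stay with each participant.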
In a third aspect, an embodiment of the present application further provides a distributed one-hot encoding apparatus, applied to a server, including: an acquisition module, configured to obtain the local category feature set of each participant, the local category feature set being the category feature set obtained by deduplicating the participant's local data; a generation module, configured to generate a global encoder from the local category feature sets; and a first sending module, configured to send the global encoder to each participant, the global encoder being used to one-hot encode each participant's local data.
In a fourth aspect, an embodiment of the present application further provides a distributed one-hot encoding apparatus, applied to a participant, including: a processing module, configured to deduplicate each category feature of the participant's local data to obtain the participant's local category feature set; a second sending module, configured to send the local category feature set to a server; a receiving module, configured to receive the global encoder sent by the server, the global encoder being generated from the local category feature sets of the participants; and an encoding module, configured to one-hot encode the participant's local data according to the global encoder.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory storing machine-readable instructions executable by the processor; when the instructions are executed by the processor, they perform the steps of the method of the first aspect or any possible implementation of the first aspect, or of the second aspect or any possible implementation of the second aspect.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the distributed one-hot encoding method of the first aspect or any possible implementation of the first aspect.
To make the above objects, features and advantages of the present application clearer and easier to understand, embodiments are described in detail below with reference to the accompanying figures.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The following drawings show only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
Fig. 1 is a schematic diagram of interaction between a server and a participant according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of an electronic device according to an embodiment of the present application;
Fig. 3 is a flowchart of a distributed one-hot encoding method applied to a server according to an embodiment of the present application;
Fig. 4 is a flowchart of a distributed one-hot encoding method applied to a participant according to an embodiment of the present application;
Fig. 5 is a schematic functional block diagram of a distributed one-hot encoding device applied to a server according to an embodiment of the present application;
Fig. 6 is a schematic functional block diagram of a distributed one-hot encoding device applied to a participant according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Machine learning is a technique that utilizes data and algorithms to allow computers to learn and optimize automatically. A machine learning model is a mathematical function that can produce output results from input data. To train a machine learning model, we need to provide a large amount of data as input and the corresponding output results as labels. The data and the labels form a training set for adjusting parameters of the model so that the model can better fit the rules of the data.
However, not all data can be used directly as model input: machine learning models require numerical input. Many features in a typically acquired data set are character-type rather than numerical, yet they carry important information; gender, color, and country are examples. To solve this problem, category-type data usually has to be encoded, i.e., converted into numerical features. If the categories are simply mapped to integers, for example 1 for male and 2 for female, the model may wrongly infer that female is "greater than" male, or that some linear relationship exists between them, even though the categories have no size or order relationship. This is clearly unreasonable.
One common encoding scheme is one-hot encoding. It represents each category by a vector of 0s and 1s in which exactly one element is 1 and all others are 0; the position of that 1 is the index of the category among all possible categories. For example, the three categories red, green and blue can be represented by the vectors red: [1, 0, 0], green: [0, 1, 0], blue: [0, 0, 1]. Each category thus has a unique vector representation, and the vectors carry no size or order relationship, which prevents the model from making false assumptions or inferences about category-type data.
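The definition above can be illustrated with a minimal Python sketch. It assumes the full list of categories is known in advance and that a category's position in that list determines where its 1 goes; the function name `one_hot` is illustrative, not part of the patent.

```python
def one_hot(category: str, categories: list[str]) -> list[int]:
    """Return the one-hot vector of `category` within the ordered list `categories`."""
    index = categories.index(category)  # raises ValueError for an unknown category
    # Exactly one element is 1: the one at the category's index; all others are 0.
    return [1 if i == index else 0 for i in range(len(categories))]


colors = ["red", "green", "blue"]
for color in colors:
    print(color, one_hot(color, colors))
# red [1, 0, 0]
# green [0, 1, 0]
# blue [0, 0, 1]
```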
However, the inventors of the present application found that in a distributed scenario, one-hot encoding is needed when multiple parties jointly train a model through federated learning. If the features to be encoded are simply collected in one place and then one-hot encoded, there is, on the one hand, a privacy problem, because each participant's data is exposed; on the other hand, transmission becomes inefficient when the data volume is large.
In view of this, the present application proposes distributed one-hot encoding in which the server collects each participant's local category feature set rather than the original local data, thereby protecting the privacy and security of data transmitted between participants and the server and among participants. In addition, only the local category feature sets and the global encoder need to be transmitted between the server and the participants; neither the original data nor its one-hot encoding vectors are transmitted, which reduces the amount of data transmitted and improves communication quality and efficiency.
To facilitate understanding of this embodiment, the operating environment for executing the distributed one-hot encoding method disclosed in the embodiments of the present application is first described in detail.
Fig. 1 is a schematic diagram of interaction between a server and a participant according to an embodiment of the present application. The server is communicatively coupled to one or more participants via a network for data communication or interaction.
The server may be a web server, a database server, or the like. It one-hot encodes the category features in the local category feature sets sent by the participants and sends the generated global encoder to each participant.
A participant may be a personal computer (PC), tablet computer, smartphone, personal digital assistant (PDA), or other terminal device that needs to perform one-hot encoding. The participant deduplicates its local data to obtain a local category feature set, and one-hot encodes its local data according to the global encoder.
To facilitate understanding of this embodiment, an electronic device that performs the distributed one-hot encoding method disclosed in the embodiments of the present application is described in detail. The electronic device may be deployed on the server or on the participants; its deployment can be adjusted to actual conditions and is not specifically limited here.
Fig. 2 is a schematic block diagram of the electronic device. The electronic device 100 may include a memory 111, a memory controller 112, a processor 113, a peripheral interface 114, and an input/output unit 115. Those of ordinary skill in the art will appreciate that the structure shown in Fig. 2 is merely illustrative and does not limit the structure of the electronic device 100. For example, the electronic device 100 may include more or fewer components than shown in Fig. 2, or have a configuration different from that shown in Fig. 2.
The above-mentioned memory 111, memory controller 112, processor 113, peripheral interface 114 and input/output unit 115 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The processor 113 is used to execute executable modules stored in the memory.
The memory 111 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc. The memory 111 stores a program, and the processor 113 executes the program after receiving an execution instruction; the method performed by the electronic device 100 as defined by the process disclosed in any embodiment of the present application may be applied to, or implemented by, the processor 113.
The processor 113 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113, and the memory controller 112 may be implemented in a single chip. In other examples, they may be implemented by separate chips.
The input/output unit 115 allows a user to provide input data. It may be, but is not limited to, a mouse, a keyboard, and the like.
The electronic device 100 in this embodiment may be used to perform each step of each method provided in the embodiments of the present application. Implementations of the distributed one-hot encoding method are described in detail below through several embodiments.
Fig. 3 is a flowchart of a distributed one-hot encoding method applied to a server according to an embodiment of the present application. The flow shown in Fig. 3 is described in detail below.
Step 201, a local category feature set of each participant is obtained.
Here, the participants are one or more clients that need one-hot encoding. Each participant holds one or more pieces of local data, which may belong to one or more category features.
The local category feature set is a category feature set obtained by de-duplication of local data of each participant.
If a participant holds multiple pieces of local data, some of them may repeat; deduplicating the local data yields the category feature set of that participant's local data.
For example, if a participant holds the local data "male, female, male, other, male, other", deduplicating it yields the local category feature set: male, female, other.
If a participant holds the local data "yellow, green, yellow, red, green, purple, red, green, purple", deduplicating it yields the local category feature set: yellow, green, purple, red.
In one embodiment, each participant performs deduplication processing on the local data to obtain a corresponding local category feature set, and sends the local category feature set obtained by each participant to a server.
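The participant-side deduplication in the examples above can be sketched in Python as follows. This assumes the local data for one category feature is a list of strings; `dict.fromkeys` is used because it drops repeats while keeping first-seen order (dicts preserve insertion order since Python 3.7), which makes the output deterministic. The function name is hypothetical.

```python
def local_category_feature_set(local_data: list[str]) -> list[str]:
    """Deduplicate a participant's raw category values, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct value, in the order first seen.
    return list(dict.fromkeys(local_data))


print(local_category_feature_set(["male", "female", "male", "other", "male", "other"]))
# ['male', 'female', 'other']
print(local_category_feature_set(
    ["yellow", "green", "yellow", "red", "green", "purple", "red", "green", "purple"]))
# ['yellow', 'green', 'red', 'purple']
```

The output order is a choice of this sketch; as a set, the second result matches the example above.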
Step 202, generating a global encoder according to the local category feature set.
After obtaining the local category feature sets sent by the participants, the server one-hot encodes them to obtain the corresponding global encoder.
The global encoder here is the set of one-hot encoding vectors for the possible values of the category features that have appeared in any participant's local data.
Step 203, the global encoders are sent to the respective participants.
Here, the global encoder is used to one-hot encode each participant's local data.
After obtaining the global encoder, the server sends it to each participant, and each participant then uses it to one-hot encode its own local data.
For example, suppose the server-generated global encoder is {male: [1, 0, 0], female: [0, 1, 0], other: [0, 0, 1]}, and the local data of participants A, B, C and D are {male, female, male}, {male, female, male}, {other, female} and {other, female, other, male}, respectively. After receiving the global encoder, participant A one-hot encodes its local data into {[1, 0, 0], [0, 1, 0], [1, 0, 0]}; participant B likewise obtains {[1, 0, 0], [0, 1, 0], [1, 0, 0]}; participant C obtains {[0, 0, 1], [0, 1, 0]}; and participant D obtains {[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]}.
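The participant-side encoding in this example can be sketched as a dictionary lookup by value name. The sketch assumes the global encoder arrives as a mapping from value name to vector and, as an added assumption not stated in the patent, treats a local value missing from the encoder as an error rather than silently skipping it. Function and variable names are hypothetical.

```python
def encode_local_data(local_data: list[str],
                      encoder: dict[str, list[int]]) -> list[list[int]]:
    """One-hot encode a participant's local values by name lookup in the global encoder."""
    unknown = [value for value in local_data if value not in encoder]
    if unknown:
        # Assumption of this sketch: unseen values are an error, not silently dropped.
        raise KeyError(f"values missing from the global encoder: {unknown}")
    return [encoder[value] for value in local_data]


global_encoder = {"male": [1, 0, 0], "female": [0, 1, 0], "other": [0, 0, 1]}
local_data_by_party = {
    "A": ["male", "female", "male"],
    "B": ["male", "female", "male"],
    "C": ["other", "female"],
    "D": ["other", "female", "other", "male"],
}
for party, data in local_data_by_party.items():
    print(party, encode_local_data(data, global_encoder))
# A [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
# B [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
# C [[0, 0, 1], [0, 1, 0]]
# D [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Because every participant looks values up in the same mapping, the same value always yields the same vector regardless of which participant encodes it.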
In this implementation, the server collects each participant's local category feature set rather than the original local data, which protects the privacy and security of data transmitted between participants and the server and among participants. In addition, only the local category feature sets and the global encoder need to be transmitted between the server and the participants; neither the original data nor its one-hot encoding vectors are transmitted, which reduces the amount of data transmitted and improves communication quality and efficiency. Moreover, the server generates the global encoder after obtaining every participant's local category feature set, and all participants encode according to that encoder. All participants therefore encode their local data in the same way, which prevents their one-hot encoding vectors from being inconsistent or conflicting and makes the participants' data easier to unify and manage.
In one possible implementation, step 202 includes: deduplicating and aggregating all the obtained local category feature sets to obtain a global category feature set; and generating the global encoder from the global category feature set.
The server receives the local category feature sets of multiple participants and must encode all of them. Because these sets may partially overlap, the server can deduplicate the received sets before encoding to obtain the category features of all participants' local data, and then aggregate those category features into a global category feature set.
The global category feature set here contains the possible values of the category features that have appeared in any participant's local data.
For example, if the local category feature sets received by the server are {male, female}, {other}, {male, female, other}, {female}, {male, other} and {male, female}, the global category feature set may be {male, female, other}.
Likewise, if the server receives {red, green}, {yellow, green}, {purple, yellow, red}, {purple}, {yellow, purple}, and {red, yellow, green}, the global category feature set may be {red, green, yellow, purple}.
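The deduplication and aggregation described above amounts to a set union across participants. The following Python sketch illustrates it under assumptions and is not the patented implementation: the function name `aggregate_global_categories` is hypothetical, each local set is passed as a list, and values keep the order in which they are first seen. That ordering happens to reproduce both examples above, but the text does not prescribe one; any fixed order agreed by all parties would serve.

```python
from typing import Dict, Iterable, List, Sequence

def aggregate_global_categories(local_sets: Iterable[Sequence[str]]) -> List[str]:
    """Merge participants' local category feature sets into one global set.

    Duplicates across participants are dropped. Values keep first-seen order
    (an assumed convention); exact string matching is used, so "Red" and
    "red" would count as different values unless normalized beforehand.
    """
    seen: Dict[str, None] = {}
    for local_set in local_sets:
        for value in local_set:
            seen.setdefault(value, None)  # dicts preserve insertion order (Python 3.7+)
    return list(seen)

received = [["red", "green"], ["yellow", "green"], ["purple", "yellow", "red"],
            ["purple"], ["yellow", "purple"], ["red", "yellow", "green"]]
print(aggregate_global_categories(received))  # ['red', 'green', 'yellow', 'purple']
```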
Once the global category feature set is determined, each category feature in it may be one-hot encoded separately, and the one-hot codes of all category features in the set are collected to form the global encoder corresponding to that set.
In this implementation, deduplicating all the acquired local category feature sets removes category features that are repeated across participants, so the server does not one-hot encode the same category feature more than once, which improves encoding efficiency.
In one possible implementation, generating the global encoder from the global category feature set includes: converting each value of each category feature in the global category feature set into a one-hot vector, and generating the global encoder from the plurality of one-hot vectors so obtained.
That is, after obtaining the global category feature set, the server may one-hot encode each category feature in it; the set of one-hot vectors of all category features in the global category feature set constitutes the corresponding global encoder.
For example, if the global category feature set is {male, female, other}, one-hot encoding male, female, and other separately yields [1, 0, 0], [0, 1, 0], and [0, 0, 1], and the global encoder may be {male: [1, 0, 0], female: [0, 1, 0], other: [0, 0, 1]}.
If the global category feature set is {yellow, green, red, purple}, one-hot encoding yellow, green, red, and purple separately yields [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], and [0, 0, 0, 1], and the global encoder may be {yellow: [1, 0, 0, 0], green: [0, 1, 0, 0], red: [0, 0, 1, 0], purple: [0, 0, 0, 1]}.
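A minimal sketch of turning a global category feature set into a name-to-vector global encoder, assuming each category's vector position is its index in the set. The function name `build_global_encoder` and the plain-dict representation are illustrative assumptions, not details given in the text.

```python
from typing import Dict, List, Sequence

def build_global_encoder(global_categories: Sequence[str]) -> Dict[str, List[int]]:
    """Map each category name to its one-hot vector.

    Position i of every vector is 1 only for the i-th category in the set.
    """
    if len(set(global_categories)) != len(global_categories):
        raise ValueError("global category set must not contain duplicates")
    size = len(global_categories)
    encoder: Dict[str, List[int]] = {}
    for index, name in enumerate(global_categories):
        vector = [0] * size
        vector[index] = 1
        encoder[name] = vector
    return encoder

print(build_global_encoder(["yellow", "green", "red", "purple"]))
# {'yellow': [1, 0, 0, 0], 'green': [0, 1, 0, 0], 'red': [0, 0, 1, 0], 'purple': [0, 0, 0, 1]}
```

Keeping the category name as the dict key is what later lets a participant look up a vector by value, rather than receiving an anonymous list of vectors.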
In this implementation, one-hot encoding the category features in the global category feature set yields a one-hot vector for every possible value of a category feature that appears in any participant's local data. Because all participants then use the same encoding, the encoding results of different participants are not inconsistent or conflicting, which facilitates unified data management.
In one possible implementation, step 202 includes: one-hot encoding the category features in each participant's local category feature set separately, to obtain a global encoder for each of the participants.
In this embodiment, after obtaining each participant's local category feature set, the server may directly one-hot encode the category features in that set and generate a global encoder corresponding to it.
There may therefore be multiple global encoders, their number being determined by the number of participants: for example, 3 participants correspond to 3 global encoders, 2 participants correspond to 2 global encoders, and so on.
For example, if three participants (participant 1, participant 2, and participant 3) need one-hot encoding, they produce three local category feature sets: local category feature set 1, local category feature set 2, and local category feature set 3. After obtaining these sets, the server one-hot encodes the category features in each set and generates a corresponding global encoder from each set's one-hot vectors, yielding global encoder 1, global encoder 2, and global encoder 3.
If two participants (participant 1 and participant 2) need one-hot encoding, they produce two local category feature sets: local category feature set 1 and local category feature set 2. After obtaining these sets, the server one-hot encodes the category features in each set and generates a corresponding global encoder from each set's one-hot vectors, yielding global encoder 1 and global encoder 2.
If the server generates a global encoder for each local category feature set in this way, it sends each global encoder only to the corresponding participant.
For instance, in the example above, where the server generates global encoder 1, global encoder 2, and global encoder 3, the server sends global encoder 1 to participant 1, global encoder 2 to participant 2, and global encoder 3 to participant 3.
In this implementation, the server one-hot encodes the category features of each local category feature set directly, without first merging the sets. For scenarios with few participants, this lowers the performance requirements on the server and reduces the cost of one-hot encoding.
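A minimal sketch of this per-participant variant, assuming participants are identified by string IDs (`participant_1` and so on are hypothetical) and that "sending" is replaced by a print. In this variant the same category may receive vectors of different length or position at different participants (for example, `male` becomes `[1, 0]` for participant 1 but `[1, 0, 0]` for participant 3), which is the trade-off against a single shared global encoder.

```python
from typing import Dict, List, Sequence

def one_hot_encoder(categories: Sequence[str]) -> Dict[str, List[int]]:
    """One-hot encode every category of one local category feature set."""
    size = len(categories)
    return {name: [1 if i == j else 0 for j in range(size)]
            for i, name in enumerate(categories)}

def build_per_participant_encoders(
        local_sets: Dict[str, Sequence[str]]) -> Dict[str, Dict[str, List[int]]]:
    """Build one encoder per participant, without merging the local sets.

    Keys are participant IDs, so encoder k is routed back to participant k only.
    """
    return {pid: one_hot_encoder(cats) for pid, cats in local_sets.items()}

encoders = build_per_participant_encoders({
    "participant_1": ["male", "female"],
    "participant_2": ["other"],
    "participant_3": ["male", "female", "other"],
})
for pid, encoder in encoders.items():
    # Stand-in for sending encoder k to participant k (transport omitted).
    print(pid, encoder)
```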
In one possible implementation, the global encoder includes the name of each category feature and its corresponding specific encoding value.
The names of the category features and their specific encoding values are used to one-hot encode each participant's local data.
After one-hot encoding the category features, the server has a one-hot vector for each category feature. Because the global encoder is sent to each participant, and each participant must one-hot encode its own local data according to it, a global encoder containing only the one-hot vectors would not tell the participant which vector corresponds to which local value. The global encoder may therefore include the name of each category feature together with that feature's specific encoding value.
After receiving the global encoder, the participant matches each local value against the category feature names in the global encoder; once a value is matched to a category feature, that feature's specific encoding value is used as the encoding of the value.
For example, suppose the global encoder is {yellow: [1, 0, 0, 0], green: [0, 1, 0, 0], red: [0, 0, 1, 0], purple: [0, 0, 0, 1]} and the participant's local data are yellow, green, purple, red, green. During one-hot encoding, the participant matches "yellow" against the global encoder and obtains "yellow: [1, 0, 0, 0]"; matches "green" and obtains "green: [0, 1, 0, 0]"; matches "red" and obtains "red: [0, 0, 1, 0]"; and matches "purple" and obtains "purple: [0, 0, 0, 1]". The one-hot encoding of the participant's local data is therefore {[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]}.
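The name-matching step can be sketched as a dictionary lookup. `encode_local_data` is a hypothetical name, and raising an error for a value that is missing from the encoder is an assumption: the text does not say how unseen values are handled (with the aggregated global encoder, every participant's values should already be present).

```python
from typing import Dict, List, Sequence

def encode_local_data(local_data: Sequence[str],
                      global_encoder: Dict[str, List[int]]) -> List[List[int]]:
    """One-hot encode each local value by matching it to a category name."""
    encoded: List[List[int]] = []
    for value in local_data:
        if value not in global_encoder:
            # Assumption: unseen-value handling is unspecified; fail loudly.
            raise ValueError(f"category {value!r} is not in the global encoder")
        # Copy the vector so the caller cannot mutate the shared encoder.
        encoded.append(list(global_encoder[value]))
    return encoded

global_encoder = {"yellow": [1, 0, 0, 0], "green": [0, 1, 0, 0],
                  "red": [0, 0, 1, 0], "purple": [0, 0, 0, 1]}
print(encode_local_data(["yellow", "green", "purple", "red", "green"], global_encoder))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
```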
In this implementation, because the global encoder contains the name of each category feature and its specific encoding value, a participant can encode its local data by matching each value directly against the category feature names in the global encoder, so each participant completes the one-hot encoding of its own local data. The server and the participants therefore exchange only the local category feature sets and the global encoder, never the raw data or its one-hot vectors, which reduces the volume of data transmitted and improves communication quality and efficiency.
Fig. 4 is a flowchart of a distributed one-hot encoding method applied to a participant according to an embodiment of the present application. The flow shown in Fig. 4 is described in detail below.
Step 301, deduplicate each category feature of the participant's local data to obtain the local category feature set corresponding to the participant.
If a participant holds multiple local records, their values may partially overlap; deduplicating the local data yields the participant's category feature set.
Each participant deduplicates each of its category features in this way to obtain its own local category feature set.
Step 302, send the local category feature set to the server.
Each participant, after obtaining its local category feature set, sends it to the server. After receiving the local category feature sets sent by the participants, the server generates the corresponding global encoder.
Step 303, receive the global encoder sent by the server.
The global encoder is generated from the local category feature sets of multiple participants.
Step 304, one-hot encode the participant's local data according to the global encoder.
After obtaining the global encoder, the participant matches its local data against the global encoder to obtain the specific encoding value of each local value; these values are the one-hot vectors of the local data.
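Steps 301 to 304 can be simulated in a single process to show which data crosses the participant/server boundary. This is a sketch under stated assumptions: network transport is replaced by ordinary function calls, the server follows the aggregated-global-encoder embodiment, first-seen ordering is assumed, and all function and participant names are hypothetical.

```python
from typing import Dict, List, Sequence

def local_category_set(local_data: Sequence[str]) -> List[str]:
    """Step 301: deduplicate local values, keeping first-seen order."""
    return list(dict.fromkeys(local_data))

def server_build_encoder(local_sets: Sequence[Sequence[str]]) -> Dict[str, List[int]]:
    """Server side (assumed step 202): merge the sets, map name -> one-hot vector."""
    merged = list(dict.fromkeys(v for s in local_sets for v in s))
    return {name: [int(i == j) for j in range(len(merged))]
            for i, name in enumerate(merged)}

def encode(local_data: Sequence[str], encoder: Dict[str, List[int]]) -> List[List[int]]:
    """Step 304: encode each local value by looking up its name."""
    return [encoder[value] for value in local_data]

participants = {
    "p1": ["male", "female", "male"],
    "p2": ["other", "other"],
    "p3": ["female", "other", "male"],
}
# Step 302: only the deduplicated sets leave each participant, never raw records.
uploaded = [local_category_set(data) for data in participants.values()]
# Step 303: every participant receives the same global encoder.
encoder = server_build_encoder(uploaded)
encoded = {pid: encode(data, encoder) for pid, data in participants.items()}
print(encoder)        # {'male': [1, 0, 0], 'female': [0, 1, 0], 'other': [0, 0, 1]}
print(encoded["p2"])  # [[0, 0, 1], [0, 0, 1]]
```

Because all three participants encode with one encoder, `male` maps to the same vector everywhere, which is the consistency property the text attributes to the global encoder.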
In this implementation, the participant deduplicates its raw data to obtain the local category feature set and sends that set, rather than the raw local data, to the server, which protects the privacy and security of data exchanged between the participant and the server and among participants. In addition, only the local category feature sets and the global encoder are transmitted between the server and the participants. Neither the raw data nor its one-hot vectors need to be sent, which reduces the volume of data transmitted and improves communication quality and efficiency. Furthermore, because all participants encode with the same global encoder, the one-hot vectors they produce are not inconsistent or conflicting, which makes the participants' data easier to unify and manage.
Based on the same inventive concept, an embodiment of the present application further provides a distributed one-hot encoding apparatus applied to a server, corresponding to the distributed one-hot encoding method. Because the apparatus solves the problem on a principle similar to that of the method embodiment, its implementation may refer to the description of the method embodiment, and repeated details are omitted.
Fig. 5 is a functional block diagram of a distributed one-hot encoding apparatus applied to a server according to an embodiment of the present application. Each module of the apparatus in this embodiment is configured to perform the corresponding step of the method embodiment above. The apparatus includes an acquisition module 401, a generation module 402, and a first sending module 403, wherein:
the acquisition module 401 is configured to acquire the local category feature set of each participant, the local category feature set being the category feature set obtained by deduplicating that participant's local data.
The generation module 402 is configured to generate a global encoder according to the local category feature sets.
The first sending module 403 is configured to send the global encoder to each participant, the global encoder being used to one-hot encode each participant's local data.
In a possible implementation, the generation module 402 is further configured to: deduplicate and aggregate all the acquired local category feature sets to obtain a global category feature set; and generate the global encoder according to the global category feature set.
In a possible implementation, the generation module 402 is specifically configured to: convert each value of each category feature in the global category feature set into a one-hot vector, and generate the global encoder from the plurality of one-hot vectors so obtained.
In a possible implementation, the generation module 402 is further configured to: one-hot encode the category features in each participant's local category feature set separately, to obtain a global encoder for each of the participants.
Based on the same inventive concept, an embodiment of the present application further provides a distributed one-hot encoding apparatus applied to a participant, corresponding to the distributed one-hot encoding method. Because the apparatus solves the problem on a principle similar to that of the method embodiment, its implementation may refer to the description of the method embodiment, and repeated details are omitted.
Fig. 6 is a functional block diagram of a distributed one-hot encoding apparatus applied to a participant according to an embodiment of the present application. Each module of the apparatus in this embodiment is configured to perform the corresponding step of the method embodiment above. The apparatus includes a processing module 501, a second sending module 502, a receiving module 503, and an encoding module 504, wherein:
the processing module 501 is configured to deduplicate each category feature of the participant's local data to obtain the local category feature set corresponding to the participant.
The second sending module 502 is configured to send the local category feature set to a server.
The receiving module 503 is configured to receive a global encoder sent by the server, the global encoder being generated according to the local category feature sets of multiple participants.
The encoding module 504 is configured to one-hot encode the participant's local data according to the global encoder.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the distributed one-hot encoding method described in the method embodiments above.
The computer program product of the distributed one-hot encoding method provided by the embodiments of the present application includes a computer-readable storage medium storing program code, whose instructions may be used to perform the steps of the distributed one-hot encoding method described in the method embodiments. For details, refer to the method embodiments; they are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented as software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A distributed one-hot encoding method, applied to a server, comprising:
acquiring the local category feature set of each participant, wherein the local category feature set is a category feature set obtained by deduplicating the participant's local data;
generating a global encoder according to the local category feature sets;
sending the global encoder to each participant, the global encoder being used to one-hot encode the local data of each participant;
wherein generating the global encoder according to the local category feature sets comprises:
deduplicating and aggregating all the acquired local category feature sets to obtain a global category feature set;
generating the global encoder according to the global category feature set;
wherein generating the global encoder according to the global category feature set comprises:
converting each value of each category feature in the global category feature set into a one-hot vector, and generating the global encoder from the plurality of one-hot vectors corresponding to the global category feature set; the global encoder comprising the name of each category feature and its corresponding specific encoding value.
2. The method of claim 1, wherein generating the global encoder according to the local category feature sets comprises:
one-hot encoding the category features in each participant's local category feature set separately, to obtain a global encoder corresponding to each of the plurality of participants.
3. The method according to claim 1 or 2, wherein the names of the category features and the corresponding specific encoding values are used to one-hot encode the local data of each participant.
4. A distributed one-hot encoding method, applied to a participant, comprising:
deduplicating each category feature of the participant's local data to obtain the local category feature set corresponding to the participant;
sending the local category feature set to a server;
receiving a global encoder sent by the server, wherein the global encoder is generated according to the local category feature sets of a plurality of participants;
one-hot encoding the participant's local data according to the global encoder;
wherein the global encoder is generated by deduplicating and aggregating all the acquired local category feature sets to obtain a global category feature set, converting each value of each category feature in the global category feature set into a one-hot vector, and generating the global encoder from the plurality of one-hot vectors corresponding to the global category feature set; the global encoder comprising the name of each category feature and its corresponding specific encoding value.
5. A distributed one-hot encoding apparatus, applied to a server, comprising:
an acquisition module, configured to acquire the local category feature set of each participant, wherein the local category feature set is a category feature set obtained by deduplicating the participant's local data;
a generation module, configured to generate a global encoder according to the local category feature sets;
a first sending module, configured to send the global encoder to each participant, the global encoder being used to one-hot encode the local data of each participant;
wherein the generation module is further configured to deduplicate and aggregate all the acquired local category feature sets to obtain a global category feature set, and to generate the global encoder according to the global category feature set;
the generation module is further configured to convert each value of each category feature in the global category feature set into a one-hot vector, and to generate the global encoder from the plurality of one-hot vectors corresponding to the global category feature set; the global encoder comprising the name of each category feature and its corresponding specific encoding value.
6. A distributed one-hot encoding apparatus, applied to a participant, comprising:
a processing module, configured to deduplicate each category feature of the participant's local data to obtain the local category feature set corresponding to the participant;
a second sending module, configured to send the local category feature set to a server;
a receiving module, configured to receive a global encoder sent by the server, the global encoder being generated according to the local category feature sets of the participants;
an encoding module, configured to one-hot encode the participant's local data according to the global encoder;
wherein the global encoder is generated by deduplicating and aggregating all the acquired local category feature sets to obtain a global category feature set, converting each value of each category feature in the global category feature set into a one-hot vector, and generating the global encoder from the plurality of one-hot vectors corresponding to the global category feature set; the global encoder comprising the name of each category feature and its corresponding specific encoding value.
7. An electronic device, comprising a processor and a memory storing machine-readable instructions executable by the processor, wherein, when the electronic device runs, the machine-readable instructions, when executed by the processor, perform the steps of the method according to any one of claims 1 to 4.
8. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 4.
CN202311744887.0A 2023-12-19 2023-12-19 Distributed single-heat encoding method, device, electronic equipment and computer storage medium Active CN117424765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311744887.0A CN117424765B (en) 2023-12-19 2023-12-19 Distributed single-heat encoding method, device, electronic equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN117424765A CN117424765A (en) 2024-01-19
CN117424765B true CN117424765B (en) 2024-03-22

Family

ID=89532941

Similar Documents

Publication Publication Date Title
CN111914277B (en) Intersection data generation method and federal model training method based on intersection data
US20190163699A1 (en) Method and apparatus for information interaction
CN105205169A (en) Distributed image index and retrieval method
Liu et al. Interval‐valued intuitionistic fuzzy ordered weighted cosine similarity measure and its application in investment decision‐making
CN112070504B (en) Content inspection method and device for blockchain transaction
US20200394330A1 (en) Information processing system, data provision system, and related method
Borges et al. A generalised NGINAR (1) process with inflated‐parameter geometric counting series
CN113407851A (en) Method, device, equipment and medium for determining recommendation information based on double-tower model
CN117424765B (en) Distributed single-heat encoding method, device, electronic equipment and computer storage medium
CN107240021B (en) Insurance information conversion method and device
RU2014146752A (en) METHOD FOR SYNTACTIC SERVICE ANALYSIS, FLEXIBLE ADAPTED TO IMS SYSTEM TAG
CN108134799B (en) Novel coding and decoding method and device thereof
CN113162628B (en) Data encoding method, data decoding method, terminal and storage medium
CN113743129B (en) Information pushing method, system, equipment and medium based on neural network
CN113590829A (en) Bidirectional selection system for service objects and service providers and method of operation thereof
CN116244650B (en) Feature binning method, device, electronic equipment and computer readable storage medium
CN112598139A (en) Category coding method, category coding device, category coding apparatus, storage medium, and program product
CN111008276A (en) Complete entity relationship extraction method and device
CN110930195A (en) Data processing method and electronic equipment
CN117014382B (en) Data stream processing system and method based on convergence and distribution equipment
CN113011584B (en) Coding model training method, coding device and storage medium
CN113052661B (en) Method and device for acquiring attribute information, electronic equipment and storage medium
CN113469683B (en) Key storage method and device, electronic equipment and storage medium
CN116595978B (en) Object category identification method, device, storage medium and computer equipment
CN103955666A (en) Data transmission method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant