CN112449009B - SVD-based communication compression method and device for Federal learning recommendation system - Google Patents


Info

Publication number
CN112449009B
Authority
CN
China
Prior art keywords
data
uploaded
classification
target
gradient
Prior art date
Legal status
Active
Application number
CN202011274868.2A
Other languages
Chinese (zh)
Other versions
CN112449009A (en)
Inventor
刘刚
谭向前
周明洋
蔡树彬
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202011274868.2A priority Critical patent/CN112449009B/en
Publication of CN112449009A publication Critical patent/CN112449009A/en
Application granted granted Critical
Publication of CN112449009B publication Critical patent/CN112449009B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an SVD (singular value decomposition)-based communication compression method and device for a federated learning recommendation system. The method comprises the following steps: acquiring gradient data to be uploaded of a current client; grouping the gradient data to be uploaded, based on the target number sequence and a preset number of target numbers, to obtain multiple groups of gradient data to be uploaded; clustering each group of gradient data to be uploaded by rows with a preset clustering algorithm to obtain classification data with classification labels, and determining the classification label corresponding to each target number; generating classification label data according to the classification label corresponding to each target number and the target number sequence; and sending the classification data and the classification label data to a server. Data compression is thus realized by clustering the data to be uploaded, which improves the data compression ratio while reducing the influence of compression on the accuracy of the recommendation system.

Description

SVD-based federated learning recommendation system communication compression method and device
Technical Field
The invention relates to the technical field of computer network applications, and in particular to a communication compression method and device for an SVD-based federated learning recommendation system.
Background
After years of development, recommendation systems have become increasingly intelligent: they can learn users' preferences fairly comprehensively and deliver accurately targeted recommendations. With the popularization of smart phones, the number of network users has surged again, and traditional recommendation systems face problems such as scarce server resources and insufficient computing capacity. In addition, to recommend more accurately, a recommendation system may collect a wide range of user information, and user terminals such as mobile phones store large amounts of personal information, including private content. If this information is not protected, security problems such as privacy disclosure can easily arise.
To address these problems, the concept of federated learning based on model averaging has been proposed. The training step is moved to the user side, so that users do not need to upload personal information to a server; only the trained gradients need to be uploaded. This approach addresses both user privacy protection and the shortage of server computing resources. In an SVD-based federated learning recommendation system, however, the volume of gradient data to be uploaded is large and the upload bandwidth of user terminals such as mobile phones is limited; uploading uncompressed gradient data would greatly reduce transmission efficiency. Existing communication compression methods mainly include random masking, rank reduction, and deep gradient compression; when applied to the SVD-based federated learning recommendation system, however, they either compress poorly or degrade the accuracy of the overall recommendation model.
Disclosure of Invention
In view of this, embodiments of the present invention provide a communication compression method and apparatus for an SVD-based federated learning recommendation system, so as to overcome the lack in the prior art of a communication compression method suitable for such a system.
The embodiment of the invention provides a communication compression method of a federated learning recommendation system based on SVD, which is applied to a client and comprises the following steps:
acquiring gradient data to be uploaded of a current client;
based on the target number of the gradient data to be uploaded, clustering the gradient data to be uploaded by adopting a preset clustering algorithm to obtain classified data with classification labels, and determining the classification label corresponding to each target number;
generating classification label data according to the classification label corresponding to each target number and the sequence of the target numbers;
and sending the classification data and the classification label data to a server.
Optionally, the obtaining of gradient data to be uploaded at the current client includes:
acquiring local gradient data of the current client, and receiving previous round of global gradient data fed back by the server;
and based on the target number of the local gradient data, performing difference calculation on the local gradient data and the previous round of global gradient data to obtain the gradient data to be uploaded.
Optionally, the clustering of the gradient data to be uploaded by using a preset clustering algorithm based on the target numbers of the gradient data to be uploaded to obtain classification data with classification labels, and the determining of the classification label corresponding to each target number, include:
grouping the gradient data to be uploaded based on the target number sequence and the preset target number of the gradient data to be uploaded to obtain multiple groups of gradient data to be uploaded;
and based on the target numbers of the gradient data to be uploaded, clustering each group of gradient data to be uploaded by adopting a preset clustering algorithm to obtain classification data with classification labels, and determining the classification labels corresponding to the target numbers.
Optionally, the gradient data to be uploaded is matrix data with target numbers, and the preset number of target numbers is obtained by the following method:
acquiring the numbers of rows and columns and the target compression ratio of the matrix data;
and calculating the preset number of target numbers according to the numbers of rows and columns and the target compression ratio.
Optionally, the classification data with classification tags comprises: first classification data with a first classification label and second classification data with a second classification label.
Optionally, the generating of the classification label data according to the classification label corresponding to each target number and the target number sequence includes:
sequentially acquiring classification labels corresponding to the 32 target numbers according to the target number sequence to combine into 32-bit binary data;
and sequentially converting the 32-bit binary data into Int type data to generate the classification label data.
Optionally, the preset number of target numbers is an integer multiple of 32.
The embodiment of the invention also provides a communication compression device of the federal learning recommendation system based on SVD, which is applied to a client and comprises the following components:
the acquisition module is used for acquiring gradient data to be uploaded of the current client;
the first processing module is used for clustering the gradient data to be uploaded by adopting a preset clustering algorithm based on the target numbers of the gradient data to be uploaded to obtain classification data with classification labels, and determining the classification label corresponding to each target number;
the second processing module is used for generating classification label data according to the classification labels corresponding to the target numbers and the target number sequence;
and the third processing module is used for sending the classification data and the classification label data to a server.
An embodiment of the present invention further provides an electronic device, including: a memory and a processor communicatively connected to each other, the memory storing computer instructions, the processor executing the computer instructions so as to perform the communication compression method of the SVD-based federated learning recommendation system provided by the embodiment of the present invention.
The embodiment of the invention also provides a computer-readable storage medium, which stores computer instructions for enabling a computer to execute the communication compression method of the SVD-based federated learning recommendation system provided by the embodiment of the invention.
The technical scheme of the invention has the following advantages:
the embodiment of the invention provides a communication compression method and a communication compression system of a federal learning recommendation system based on SVD (singular value decomposition), wherein gradient data to be uploaded of a current client side are obtained; grouping the gradient data to be uploaded based on the target number sequence and the preset target number of the gradient data to be uploaded to obtain multiple groups of gradient data to be uploaded; clustering each group of gradient data to be uploaded by adopting a preset clustering algorithm according to rows to obtain classification data with classification labels, and determining the classification labels corresponding to the target numbers; generating classification label data according to the classification label corresponding to each target number and the target number sequence; and sending the classification data and the classification label data to a server. The data compression is realized by clustering the data to be uploaded, and the classification data with the classification tags and the classification tag data comprising the classification tags corresponding to the target numbers and the target number sequence are uploaded, so that the gradient data restored by the server through the classification tags has higher restoration degree, the accuracy of the recommendation model finally generated by the recommendation system is further ensured, and the influence of the data compression on the accuracy of the recommendation system is further reduced under the condition of improving the data compression ratio.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a communication compression method of a SVD-based federated learning recommendation system in an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a processing result of gradient data to be uploaded according to an embodiment of the present invention;
FIG. 3 is another diagram illustrating the processing result of gradient data to be uploaded according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a communication compression device of the SVD-based federated learning recommendation system in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.
At present, existing communication compression methods mainly include random masking, rank reduction, deep gradient compression, and the like.
Rank reduction: the basic idea of this method, proposed by Google for optimizing federated learning communication, is to express the matrix to be uploaded as the product of two smaller matrices, one of which is generated from a random seed while the other is used as the upload data. Suppose the original matrix to be uploaded is:
H = \begin{pmatrix} h_{11} & \cdots & h_{1 d_2} \\ \vdots & \ddots & \vdots \\ h_{d_1 1} & \cdots & h_{d_1 d_2} \end{pmatrix} \in \mathbb{R}^{d_1 \times d_2}
Assuming that the maximum rank of the matrix H is k (k a fixed value), the matrix H can be taken as the product of two matrices A and B, namely:
H = A B, \quad A \in \mathbb{R}^{d_1 \times k}, \quad B \in \mathbb{R}^{k \times d_2}
However, in the federated SVD recommendation system, the gradient Qi to be uploaded each round is on the order of 200k rows by 15-30 columns; that is, the rank of the matrix H above is at most 30. If k is set to 30, no compression is achieved, and if k is less than 30, then (d2 - k)·d1 pieces of information are necessarily lost. This method therefore cannot be applied well to this recommendation system.
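For illustration only, the rank-reduction idea sketched above can be written in a few lines (a minimal sketch assuming NumPy and small illustrative dimensions; this is the prior-art scheme, not the method of the present invention):

```python
# Sketch of the rank-reduction idea summarized above (prior art, not this
# invention). A is regenerated from a shared random seed on both sides, so
# only the much smaller matrix B must be uploaded. Sizes are illustrative.
import numpy as np

d1, d2, k = 1000, 30, 10                                  # illustrative dimensions
A = np.random.default_rng(42).standard_normal((d1, k))    # from the shared seed

H = np.random.default_rng(0).standard_normal((d1, d2))    # stand-in gradient matrix
B, *_ = np.linalg.lstsq(A, H, rcond=None)                 # client: best B with A @ B ≈ H
H_approx = A @ B                                          # server: rebuilt from seed + uploaded B
```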
Random masking: similar to rank reduction, this method turns the original matrix into a smaller upload; the difference is that it was originally proposed for compressing sparse matrices. A small, randomly selected subset of the values in a sparse matrix represents the whole matrix, so the data to be uploaded can be compressed substantially, and the small error this introduces is entirely acceptable by comparison. For dense matrices, however, this approach is fundamentally infeasible.
Deep gradient compression: for deep neural networks, deep gradient compression performs excellently. Gradient components smaller than a threshold are withheld from transmission and retained locally, which reduces unnecessary parameter transmission across the network layers; in the next training round, the retained components are added to the current round's gradients, so no training detail is lost. However, this method is unsuitable for the federated SVD recommendation system: the clients are independent and their data isolated, and the gradient of every training round plays an irreplaceable role globally. Each round must upload the complete gradient rather than a threshold-filtered one; otherwise the recommendation accuracy of the final model suffers a particularly large error.
Given that existing communication compression methods are difficult to apply to the SVD-based federated learning recommendation system, an embodiment of the present invention provides a communication compression method specifically for such a system. As shown in fig. 1, the method mainly comprises the following steps:
step S101: and acquiring gradient data to be uploaded of the current client. Specifically, in the SVD-based federal learning recommendation system, each client uploads gradient data obtained after local data training of the client to the server in each round, the gradient data to be uploaded is in a form of a gradient matrix containing target codes, then the server performs model averaging according to the received gradient matrix of each client to obtain a global gradient matrix, and feeds back the obtained global gradient matrix to each client until the recommendation model of the recommendation system is trained, and recommendation targets are recommended to users through the trained recommendation model, for example: when the recommendation system is used for recommending movies for the user, the target codes are movie numbers of all movies to be recommended, and in the gradient data to be uploaded, the target codes are sorted according to a fixed order, for example, sorted from small to large according to the codes, and correspondingly, the global gradient data fed back by the server are also sorted according to the same order.
Step S102: based on the target numbers of the gradient data to be uploaded, clustering the gradient data to be uploaded with a preset clustering algorithm to obtain classification data with classification labels, and determining the classification label corresponding to each target number. Specifically, in the embodiment of the present invention the preset clustering algorithm is kmeans++; experiments show that kmeans++ yields clustering results with good clustering quality. In practical applications other clustering algorithms, such as mean-shift clustering, may also be used, and the invention is not limited in this respect. In the embodiment of the invention, a classification label is set for each clustered class; the correspondence between the gradient data of each target number and each class is then established through the classification labels, so that the server can restore the gradient data to be uploaded of the current client relatively accurately from the classification label of each target number, reducing the impact on the accuracy of the recommendation model.
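As a concrete illustration of step S102, the following minimal sketch clusters one group of gradient rows with kmeans++ initialization (scikit-learn is an illustrative library choice the embodiment does not prescribe, and the two-class setting anticipates the embodiment described further below):

```python
# Hedged sketch of step S102: cluster gradient rows with kmeans++ and record,
# per row (i.e., per target number), the label of the cluster it fell into.
import numpy as np
from sklearn.cluster import KMeans

rows = np.random.default_rng(1).standard_normal((32, 20))   # one group: 32 rows x 20 cols

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(rows)
labels = km.labels_                      # classification label per target number
classification_data = km.cluster_centers_   # what is actually uploaded for this group
```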
Step S103: generating classification label data according to the classification label corresponding to each target number and the target number sequence. Specifically, the classification labels corresponding to the target numbers are arranged in the order of the target numbers to obtain the classification label data, so that the server can determine the classification label of each target number directly from the classification label data.
Step S104: sending the classification data and the classification label data to a server. Specifically, the matrix data formed by the classification data and the classification label data may be packaged together and uploaded to the server.
Through steps S101 to S104, the communication compression method of the SVD-based federated learning recommendation system provided in the embodiment of the present invention realizes data compression by clustering the data to be uploaded. Because what is uploaded is the labeled classification data together with label data recording the classification label of each target number in target-number order, the server can restore the gradient data from the classification labels with high fidelity, which preserves the accuracy of the recommendation model finally generated by the recommendation system and thus reduces the influence of data compression on accuracy while improving the data compression ratio.
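For illustration, the server-side restoration implied by these steps can be sketched as follows (a hedged sketch anticipating the grouped, two-class embodiment detailed below; the function name and array shapes are illustrative):

```python
# Hedged server-side sketch: rebuild each gradient row from the centroid of
# its cluster, then add back the previous round's global gradient (see steps
# S201-S202 below). Assumes the row count is a multiple of the group size k.
import numpy as np

def restore_gradient(centroids, labels, prev_global, k):
    """centroids: (num_groups, num_classes, N); labels: (I,) per target number,
    in target-number order; prev_global: (I, N) kept by the server."""
    group_idx = np.arange(len(labels)) // k     # which group each row belongs to
    delta = centroids[group_idx, labels]        # (I, N) approximate uploaded delta
    return prev_global + delta                  # undo the client-side differencing
```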
Specifically, in an embodiment, the step S101 includes the following steps:
step S201: and acquiring local gradient data of the current client, and receiving the previous round of global gradient data fed back by the server.
Step S202: and based on the target number of the local gradient data, performing difference calculation on the local gradient data and the global gradient data of the previous round to obtain gradient data to be uploaded.
Specifically, if each round directly used the local gradient data of the current client as the data to be uploaded, then, since there is no correlation between gradient values in the gradient data, excessively large differences between gradient values could cause a large amount of computation during clustering. Moreover, in a federated system the gradients of two successive rounds may barely change during training. The difference between the client's local gradient data for the current round and the global gradient data fed back by the server in the previous round is therefore used as the gradient data to be uploaded; after receiving the compressed gradient data, the server can restore the local gradient data of the current client using the previous round's global gradient data it has stored. This speeds up clustering without affecting the training of the recommendation model and reduces the amount of computation in the data compression process.
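A minimal round-trip sketch of this differencing (all arrays are illustrative):

```python
# Steps S201-S202 in miniature: upload the difference to the previous global
# gradient; the server restores the local gradient by adding it back.
import numpy as np

local_grad = np.random.default_rng(2).standard_normal((128, 20))
prev_global = np.random.default_rng(3).standard_normal((128, 20))

to_upload = local_grad - prev_global   # client: what actually gets clustered and sent
restored = prev_global + to_upload     # server: exact before compression
assert np.allclose(restored, local_grad)
```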
Specifically, in an embodiment, the step S102 includes the following steps:
step S301: and grouping the gradient data to be uploaded based on the target number sequence and the preset target number of the gradient data to be uploaded to obtain multiple groups of gradient data to be uploaded. Specifically, since the gradient data to be uploaded is matrix data with a target number, the preset target number is obtained by the following method: acquiring the row and column number and target compression multiplying power of matrix data; and calculating the number of preset target numbers according to the row number and the column number and the target compression ratio. In the embodiment of the present invention, a calculation formula of the compression ratio is shown as formula (1):
r = \frac{32 \cdot I \cdot N}{32 \cdot \frac{I}{K} \cdot 2N + 32 \cdot I} \qquad (1)
where r denotes the compression ratio, N denotes the number of columns of the matrix data, I denotes the number of rows (i.e., the number of target numbers), and K denotes the preset number of target numbers (the group size).
The above formula is simplified to obtain:
r = \frac{K N}{K + 2N} \qquad (2)
therefore, since the gradient data to be uploaded are known (i.e. the number of rows and columns of the matrix data is determined), the relationship between the compression ratio and the number of the preset target numbers can be obtained through the above formulas (1) and (2), and therefore the number of the preset target numbers can be obtained according to the compression ratio requirement set by the recommendation system. Of course, in practical applications, the number of preset target numbers may also be set empirically, and then the compression ratio may be estimated by the above formulas (1) and (2).
Step S302: based on the target numbers of the gradient data to be uploaded, clustering each group of gradient data to be uploaded with a preset clustering algorithm to obtain classification data with classification labels, and determining the classification label corresponding to each target number. Specifically, the matrix data may be grouped according to the target numbers of the gradient data to be uploaded (matrix data with target numbers) and the groups clustered; after all groups have been clustered, the classification results of the groups are merged to obtain the classification data, and the classification label of each target number within its group is determined. Since the preset number of target numbers (i.e., the number of target numbers per group) is fixed, the classification labels within each group can be represented by natural numbers from 0 to K-1 without data confusion. The result of this grouped clustering of the gradient data to be uploaded is shown in fig. 2: to the left of the arrow is the matrix data with target numbers (i.e., the gradient data to be uploaded), whose first column holds the target numbers; to the right of the arrow are, in order, the classification data with classification labels (first column: the classification labels) and the classification label data composed of the classification labels corresponding to the target numbers.
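Putting steps S301 and S302 together, a hedged end-to-end sketch of the client-side grouping and clustering, under the same assumptions as the earlier clustering sketch:

```python
# Sketch of steps S301-S302: split the I x N matrix into groups of K consecutive
# rows in target-number order, cluster each group, and collect the per-group
# centroids plus one label per target number. Library choice is illustrative.
import numpy as np
from sklearn.cluster import KMeans

def compress_client_gradient(grad, k, n_classes=2):
    # Assumes grad.shape[0] is a multiple of k, as in the embodiment's figures.
    labels, centroids = [], []
    for start in range(0, grad.shape[0], k):
        group = grad[start:start + k]
        km = KMeans(n_clusters=n_classes, init="k-means++",
                    n_init=10, random_state=0).fit(group)
        labels.append(km.labels_)
        centroids.append(km.cluster_centers_)
    return np.concatenate(labels), np.stack(centroids)

grad = np.random.default_rng(4).standard_normal((128, 20))
labels, centroids = compress_client_gradient(grad, k=32)   # 4 groups of 32 rows
```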
Specifically, as can be seen from formula (2), N can be regarded as a constant, and when K is much larger than N, r ≈ N; but the value of N is fixed, so the compression ratio of this method has a small upper bound. The reason is that the number of target numbers I is large while each classification label is stored as one 32-bit Int, so the classification label data occupies a large amount of space. To further improve the data compression ratio, in the embodiment of the present invention the classification data with classification labels is limited to: first classification data with a first classification label and second classification data with a second classification label. By restricting the clustering result of each group to two classes, the amount of classification data per group is reduced and the data compression ratio is improved. Further, the first classification label is 0, the second classification label is 1, and the preset number of target numbers is an integer multiple of 32. The step S103 then includes the following steps:
step S401: and sequentially acquiring the classification labels corresponding to the 32 target numbers according to the sequence of the target numbers to combine into 32-bit binary data.
Step S402: sequentially converting the 32-bit binary data into Int type data to generate the classification label data. If fewer than 32 target numbers remain, the sequence is padded with 0s at the end to form 32-bit binary data, which is then converted into Int type data for uploading.
Thus, by limiting the number of classes per group of clustering results to 2 and representing the classification labels by 0 and 1, the label value of every target number is guaranteed to be 0 or 1, so each classification label occupies only 1 bit. The binary 0/1 labels corresponding to each group of target numbers are combined, 32 bits at a time, into Int type data for uploading. This removes the previous problem of the classification label data being too large, thereby reducing the amount of communication data in transmission and improving the compression ratio.
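A minimal sketch of steps S401-S402 (NumPy; the most-significant-bit-first bit order is an illustrative choice the embodiment does not fix):

```python
# Pack the 0/1 classification labels, in target-number order, into 32-bit
# integers, zero-padding the tail as described above.
import numpy as np

def pack_labels(labels):
    pad = (-len(labels)) % 32                  # pad with 0s up to a multiple of 32
    bits = np.concatenate([np.asarray(labels, np.uint8), np.zeros(pad, np.uint8)])
    weights = 1 << np.arange(31, -1, -1, dtype=np.uint64)   # MSB first
    return (bits.reshape(-1, 32).astype(np.uint64) * weights).sum(1).astype(np.uint32)

packed = pack_labels([0, 1, 1, 0] * 20)        # 80 labels -> 3 Int values
```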
On the basis of the grouping clustering shown in fig. 2, the processing result of the serialized grouping clustering is shown in fig. 3, and at this time, the calculation formula of the compression ratio is shown in formula (3):
r = \frac{32 \cdot I \cdot N}{32 \cdot \frac{I}{K} \cdot 2N + I} \qquad (3)
where r denotes the compression ratio, N the number of columns of the matrix data, I the number of rows (i.e., the number of target numbers), and K the preset number of target numbers.
The above formula is simplified to obtain:
r = \frac{32 K N}{K + 64N} \qquad (4)
since the classification labels corresponding to the target numbers are combined in 32-bit groups, the value of K is a multiple of 32, that is, K is a multiple of K
K=2 n (n is a positive integer of 5 or more)
Theoretically, taking N = 20, formula (4) gives r ≈ 487 when K = 4096, a very desirable compression ratio. Experiments show that the accuracy of the recommendation system with compression differs only slightly from that without compression, and the difference lies within the system's acceptable error range.
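As a quick check, evaluating formula (4) with N = 20 reproduces, up to rounding, the theoretical compression ratios listed in Table 2 below:

```python
# Formula (4) with N = 20: r = 32*N*K / (K + 64*N).
def ratio(n, k):
    return 32 * n * k / (k + 64 * n)

for k in (512, 1024, 2048, 4096):
    print(k, round(ratio(20, k), 1))   # 182.9, 284.4, 393.8, 487.6
```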
With different numbers of clients, the local gradient data to be uploaded to the server by each client was compressed with the communication compression method of the SVD-based federated learning recommendation system provided by the embodiment of the present invention (serialized grouping clustering for short) before uploading, and compared against direct uploading without compression. The specific results are shown in Table 1. The experimental results show that the compression method provided by the embodiment of the present invention causes the system recommendation error (RMSE) to increase slightly, i.e., the recommendation accuracy of the model deviates slightly, but the deviations are small and, weighed against the gradient compression ratio achieved, lie within an acceptable range.
Table 1: influence of compression algorithm on accuracy under different number of clients
[The data of Table 1 are presented as an image in the original publication.]
In addition, different preset numbers of target numbers (i.e., values of K) were set, the local gradient data to be uploaded to the server by the client was compressed with the communication compression method of the SVD-based federated learning recommendation system provided by the embodiment of the present invention (serialized grouping clustering) before uploading, and comparison experiments were carried out; the results are shown in Table 2. The results show that increasing K increases the compression factor but costs the recommendation system some accuracy. Moreover, the compression method provided by the embodiment of the present invention compresses the uploaded gradient data substantially while keeping the RMSE deviation within the system's acceptable range, and is thus superior to existing compression methods.
Table 2: different K-value corresponding compression ratio and RMSE
Value of K 0 (not compressed) 512 1024 2048 4096
Convergence time(s) 937 1909 1767 1694 1605
Average RMSE 0.8036 0.8074 0.8075 0.8072 0.8068
Theoretical compression ratio - 183 284 393 487
Actual compression factor - 182 281 387 478
The communication compression method of the SVD-based federated learning recommendation system provided by the embodiment of the present invention improves the data compression ratio while reducing the influence of data compression on the accuracy of the recommendation model, thereby accelerating the convergence of recommendation model training.
The embodiment of the invention also provides a communication compression device of the federal learning recommendation system based on the SVD, as shown in fig. 4, the communication compression device of the federal learning recommendation system based on the SVD comprises:
the obtaining module 101 is configured to obtain gradient data to be uploaded of a current client. For details, refer to the related description of step S101 in the above method embodiment.
The first processing module 102 is configured to cluster the gradient data to be uploaded by using a preset clustering algorithm based on the target number of the gradient data to be uploaded to obtain classification data with classification tags, and determine the classification tag corresponding to each target number. For details, refer to the related description of step S102 in the above method embodiment.
The second processing module 103 is configured to generate the classification label data according to the classification label corresponding to each target number and the target number sequence. For details, refer to the related description of step S103 in the above method embodiment.
And a third processing module 104, configured to send the classification data and the classification label data to a server. For details, refer to the related description of step S104 in the above method embodiment.
Through the cooperation of the above components, the communication compression device of the SVD-based federated learning recommendation system provided in the embodiment of the present invention realizes data compression by clustering the data to be uploaded; by uploading the labeled classification data together with label data recording the classification label of each target number in target-number order, the server can restore the gradient data from the classification labels with high fidelity, which preserves the accuracy of the recommendation model finally generated by the recommendation system and reduces the influence of data compression on accuracy while improving the data compression ratio.
An embodiment of the present invention further provides an electronic device. As shown in fig. 5, the electronic device may include a processor 901 and a memory 902, which may be connected by a bus or in another manner; fig. 5 takes connection by a bus as an example.
The processor 901 may be a Central Processing Unit (CPU). The processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902, which when executed by the processor 901 performs the methods in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructed by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (8)

1. A communication compression method of a federated learning recommendation system based on SVD is applied to a client, and is characterized by comprising the following steps:
acquiring gradient data to be uploaded of a current client;
based on the target number of the gradient data to be uploaded, clustering the gradient data to be uploaded by adopting a preset clustering algorithm to obtain classified data with classification labels, and determining the classification label corresponding to each target number;
generating classification label data according to the classification label corresponding to each target number and the target number sequence;
sending the classification data and the classification label data to a server;
the method for determining the classification label corresponding to each target number comprises the following steps of clustering the gradient data to be uploaded by adopting a preset clustering algorithm based on the target number of the gradient data to be uploaded to obtain classification data with the classification label, and determining the classification label corresponding to each target number, wherein the method comprises the following steps:
grouping the gradient data to be uploaded based on the target number sequence and the preset target number of the gradient data to be uploaded to obtain multiple groups of gradient data to be uploaded;
based on the target numbers of the gradient data to be uploaded, clustering each group of gradient data to be uploaded by adopting a preset clustering algorithm to obtain classification data with classification labels, and determining the classification labels corresponding to the target numbers;
the gradient data to be uploaded is matrix data with target numbers, and the preset number of target numbers is obtained by the following method:
acquiring the numbers of rows and columns and the target compression ratio of the matrix data;
and calculating the preset number of target numbers according to the numbers of rows and columns and the target compression ratio.
2. The method according to claim 1, wherein the obtaining gradient data to be uploaded at the current client comprises:
acquiring local gradient data of the current client, and receiving previous round of global gradient data fed back by the server;
and based on the target number of the local gradient data, performing difference calculation on the local gradient data and the previous round of global gradient data to obtain the gradient data to be uploaded.
3. The method of claim 1, wherein the classification data with classification tags comprises: first classification data with a first classification label and second classification data with a second classification label.
4. The method of claim 3, wherein the first classification label is 0 and the second classification label is 1, and wherein the generating of classification label data according to the classification label corresponding to each target number and the target number sequence comprises:
sequentially acquiring classification labels corresponding to the 32 target numbers according to the target number sequence to combine into 32-bit binary data;
and sequentially converting the 32-bit binary data into Int type data to generate the classification label data.
5. The method of claim 3, wherein the preset number of target numbers is an integer multiple of 32.
6. An SVD-based federated learning recommendation system communication compression device, applied to a client, characterized by comprising:
the acquisition module is used for acquiring gradient data to be uploaded of the current client;
the first processing module is used for clustering the gradient data to be uploaded by adopting a preset clustering algorithm based on the target numbers of the gradient data to be uploaded to obtain classification data with classification labels, and determining the classification label corresponding to each target number;
the second processing module is used for generating classification label data according to the classification labels corresponding to the target numbers and the target number sequence;
the third processing module is used for sending the classification data and the classification label data to a server;
the first processing module is specifically configured to:
grouping the gradient data to be uploaded based on the target number sequence and the preset target number of the gradient data to be uploaded to obtain multiple groups of gradient data to be uploaded;
based on the target numbers of the gradient data to be uploaded, clustering each group of gradient data to be uploaded by adopting a preset clustering algorithm to obtain classification data with classification labels, and determining the classification labels corresponding to the target numbers;
the gradient data to be uploaded is matrix data with target numbers, and the preset number of target numbers is obtained by the following method:
acquiring the numbers of rows and columns and the target compression ratio of the matrix data;
and calculating the preset number of target numbers according to the numbers of rows and columns and the target compression ratio.
7. An electronic device, comprising:
a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-5.
8. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202011274868.2A 2020-11-12 2020-11-12 SVD-based communication compression method and device for Federal learning recommendation system Active CN112449009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011274868.2A CN112449009B (en) 2020-11-12 2020-11-12 SVD-based communication compression method and device for Federal learning recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011274868.2A CN112449009B (en) 2020-11-12 2020-11-12 SVD-based communication compression method and device for Federal learning recommendation system

Publications (2)

Publication Number Publication Date
CN112449009A CN112449009A (en) 2021-03-05
CN112449009B (en) 2023-01-10

Family

ID=74737868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011274868.2A Active CN112449009B (en) 2020-11-12 2020-11-12 SVD-based communication compression method and device for Federal learning recommendation system

Country Status (1)

Country Link
CN (1) CN112449009B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114125070B (en) * 2021-11-10 2023-06-13 深圳大学 Communication method, system, electronic device and storage medium for quantization compression
WO2023092323A1 (en) * 2021-11-24 2023-06-01 Intel Corporation Learning-based data compression method and system for inter-system or inter-component communications
CN114339252B (en) * 2021-12-31 2023-10-31 深圳大学 Data compression method and device
CN114861790B (en) * 2022-04-29 2023-03-17 深圳大学 Method, system and device for optimizing federal learning compression communication
CN115022316B (en) * 2022-05-20 2023-08-11 阿里巴巴(中国)有限公司 End cloud collaborative data processing system, method, equipment and computer storage medium
CN115600690A (en) * 2022-09-20 2023-01-13 天翼电子商务有限公司(Cn) Longitudinal federated learning discrete variable preprocessing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324812A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Federal recommendation method, device, equipment and medium based on transfer learning
CN111582505A (en) * 2020-05-14 2020-08-25 深圳前海微众银行股份有限公司 Federal modeling method, device, equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3635716A4 (en) * 2017-06-08 2021-04-07 D5Ai Llc Data splitting by gradient direction for neural networks
US11164105B2 (en) * 2017-11-13 2021-11-02 International Business Machines Corporation Intelligent recommendations implemented by modelling user profile through deep learning of multimodal user data
CN110297848B (en) * 2019-07-09 2024-02-23 深圳前海微众银行股份有限公司 Recommendation model training method, terminal and storage medium based on federal learning
CN111079022B (en) * 2019-12-20 2023-10-03 深圳前海微众银行股份有限公司 Personalized recommendation method, device, equipment and medium based on federal learning
CN111865815B (en) * 2020-09-24 2020-11-24 中国人民解放军国防科技大学 Flow classification method and system based on federal learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324812A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Federal recommendation method, device, equipment and medium based on transfer learning
CN111582505A (en) * 2020-05-14 2020-08-25 深圳前海微众银行股份有限公司 Federal modeling method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112449009A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN112449009B (en) SVD-based communication compression method and device for Federal learning recommendation system
CN110222048B (en) Sequence generation method, device, computer equipment and storage medium
Ma et al. Layer-wised model aggregation for personalized federated learning
CN105138647A (en) Travel network cell division method based on Simhash algorithm
CN108628898B (en) Method, device and equipment for data storage
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN107291935B (en) Spark and Huffman coding based CPIR-V nearest neighbor privacy protection query method
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
US10511695B2 (en) Packet-level clustering for memory-assisted compression of network traffic
CN116610731B (en) Big data distributed storage method and device, electronic equipment and storage medium
CN117119535A (en) Data distribution method and system for mobile terminal cluster hot spot sharing
CN110266834B (en) Area searching method and device based on internet protocol address
CN110728118B (en) Cross-data-platform data processing method, device, equipment and storage medium
CN110135465B (en) Model parameter representation space size estimation method and device and recommendation method
CN116187431A (en) Federal learning distillation method and device for non-independent co-distribution scene
CN104765790B (en) A kind of method and apparatus of data query
CN104391916A (en) GPEH data analysis method and device based on distributed computing platform
CN113807370A (en) Data processing method, device, equipment, storage medium and computer program product
CN112036418A (en) Method and device for extracting user features
CN110929118A (en) Network data processing method, equipment, device and medium
CN115329032B (en) Learning data transmission method, device, equipment and storage medium based on federated dictionary
Wu et al. Statistical prior aided separate compressed image sensing for green Internet of multimedia things
Li et al. A novel data compression technique incorporated with computer offloading in RGB-D SLAM
CN117437010A (en) Resource borrowing level prediction method, device, equipment, storage medium and program product
CN106802907B (en) The KPI calculation method of mobile LTE based on code stream addressing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant