CN116341689B - Training method and device for machine learning model, electronic equipment and storage medium - Google Patents

Training method and device for machine learning model, electronic equipment and storage medium

Info

Publication number
CN116341689B
Authority
CN
China
Prior art keywords
data
model
variable length
length coding
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310312859.5A
Other languages
Chinese (zh)
Other versions
CN116341689A (en)
Inventor
崔来中
苏晓鑫
周义朋
刘江川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202310312859.5A priority Critical patent/CN116341689B/en
Publication of CN116341689A publication Critical patent/CN116341689A/en
Application granted granted Critical
Publication of CN116341689B publication Critical patent/CN116341689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/04 Protocols for data compression, e.g. ROHC

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a training method and device for a machine learning model, an electronic device, and a storage medium. The method is applied to at least one client, where a server and a plurality of clients perform multiple iterations of distributed model training. In any iteration, the method comprises: receiving a global model sent by the server, and training the global model based on local sample data in the client to obtain model update data; sparsifying the model update data to obtain model update sparsified data; performing variable-length coding compression on the model update sparsified data to obtain model update variable-length coded data; and uploading the model update variable-length coded data to the server so that the server updates the global model according to the model update variable-length coded data. By applying variable-length coding compression to the data, the technical scheme allocates bits more reasonably, reduces the data compression error, and improves the model training accuracy.

Description

Training method and device for machine learning model, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a training method and apparatus for a machine learning model, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, various advanced machine learning models are designed and trained to provide a variety of services and applications that meet everyday needs.
Federated learning is a new paradigm of machine learning training that allows multiple clients to cooperatively train a machine learning model using their local data, without transmitting the local data elsewhere. The federated learning training framework comprises a server and a plurality of clients participating in training. The training process comprises a plurality of iterations, and in each iteration the server and the clients exchange model updates by transmitting compressed machine learning models or compressed model update data.
In the process of implementing the present invention, the inventors found that the prior art suffers at least from the following technical problems: large compression errors and low model training accuracy.
Disclosure of Invention
The invention provides a training method and device of a machine learning model, electronic equipment and a storage medium, so as to reduce compression errors and improve model training precision.
According to an aspect of the present invention, there is provided a training method of a machine learning model, applied to at least one client, wherein a server and a plurality of clients perform multiple iterations of distributed model training; in any iteration, the method includes:
receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data;
performing sparsification processing on the model update data to obtain model update sparsification data;
performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data;
and uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
According to another aspect of the present invention, there is provided a training apparatus of a machine learning model, applied to at least one client, the server performing a plurality of iterations of distributed model training with a plurality of clients, the apparatus comprising, during any one iteration:
the model update data determining module is used for receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data;
The update data sparsification module is used for performing sparsification processing on the model update data to obtain model update sparsification data;
the variable length coding compression module is used for performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data;
and the variable length coding data uploading module is used for uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of training a machine learning model according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the training method of the machine learning model according to any of the embodiments of the present invention when executed.
According to the technical scheme of the embodiments, the client receives the global model sent by the server and trains it on local sample data to obtain model update data. Sparsifying the model update data yields model update sparsified data, realizing a preliminary compression of the model update data. Performing variable-length coding compression on the preliminarily compressed sparsified data makes the allocation of bits more reasonable and reduces the data compression error. The model update variable-length coded data is then uploaded to the server, so that the server updates the global model according to it and obtains a more accurate target global model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method of training a federated learning model according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a model single round global iterative training provided in accordance with a first embodiment of the present invention;
FIG. 3 is a flowchart of a training method of a machine learning model according to a second embodiment of the present invention;
FIG. 4 is a flowchart of a training method of a machine learning model according to a third embodiment of the present invention;
fig. 5 is a schematic diagram of a data packet according to a third embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a training device for a machine learning model according to a fourth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device implementing a training method of a machine learning model according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Before describing the specific embodiments, the application scenario of the present application is described. Specifically, the training system of the machine learning model comprises a server and a plurality of clients participating in training. The server may be a parameter server located in the cloud; the client may be a terminal device with certain storage and computing capabilities; and the server and each client may communicate wirelessly, for example via mobile communication, Wi-Fi, etc.
Example 1
Fig. 1 is a flowchart of a machine learning model training method according to the first embodiment of the present invention. The method may be performed by a machine learning model training device, which may be implemented in hardware and/or software and configured in one or more clients. As shown in fig. 1, the method includes:
s110, receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data.
In this embodiment, during the machine learning model training process, the client downloads the global model from the server and initializes its local model with the downloaded global model; that is, the downloaded global model is used as the initial model to be trained locally. The client then trains the global model using its local sample data to obtain the client's model update data.
S120, performing sparsification processing on the model update data to obtain model update sparsification data.
In this embodiment, model update sparsified data refers to model update data that has undergone sparsification processing; in other words, it is the result of reducing the data volume of the model update data without losing necessary information.
Specifically, the model update data can be sparsified by methods such as random sparsification, TOP-K sparsification, or model pruning, so as to obtain the model update sparsified data.
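To make these options concrete, here is a minimal NumPy sketch of random sparsification and TOP-K sparsification; the function names and the flat (1-D) update vector are illustrative assumptions rather than anything specified by the patent.

```python
import numpy as np

def random_sparsify(update: np.ndarray, k: int) -> np.ndarray:
    """Keep k randomly chosen entries of the update, zero out the rest."""
    keep = np.random.choice(update.size, size=k, replace=False)
    out = np.zeros_like(update)
    out[keep] = update[keep]
    return out

def topk_sparsify(update: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of the update, zero out the rest."""
    keep = np.argpartition(np.abs(update), -k)[-k:]
    out = np.zeros_like(update)
    out[keep] = update[keep]
    return out

update = np.random.randn(10_000)             # stand-in for a flattened model update
sparse_update = topk_sparsify(update, k=100)  # only 1% of the entries survive
```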
S130, performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data.
In the present embodiment, variable-length coding compression means coding compression in which the code length can vary. Specifically, model update data with large values can be quantized with a larger number of bits, while model update data with small values can be quantized with a smaller number of bits. Compared with fixed-length coding compression, variable-length coding compression saves bits, reduces wasted traffic, makes fuller use of a given traffic budget, and, at the same compression rate, reduces the error introduced by compression, thereby improving the accuracy of model training.
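For illustration only, the sketch below contrasts coarse and fine unbiased stochastic quantization applied to small-magnitude and large-magnitude entries; the two-group split and the uniform quantizer are assumptions used to convey the variable-bit-width idea, not the coder specified later in the embodiments.

```python
import numpy as np

def stochastic_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Unbiased stochastic uniform quantization of x onto 2**bits levels."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    levels = 2 ** bits - 1
    scaled = (x - lo) / (hi - lo) * levels
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q] = x.
    q = floor + (np.random.rand(*x.shape) < (scaled - floor))
    return q / levels * (hi - lo) + lo

values = np.random.randn(1_000)
large = np.abs(values) >= np.median(np.abs(values))
coded = np.empty_like(values)
coded[large] = stochastic_quantize(values[large], bits=8)    # more bits for large values
coded[~large] = stochastic_quantize(values[~large], bits=3)  # fewer bits for small values
```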
And S140, uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
The target global model is the global model obtained after training in the current iteration.
Fig. 2 is a flowchart of a single round of global iterative training of the model according to the present embodiment. In the model training process, the client downloads the global model from the server and performs multiple rounds of local training on it. After finishing training, the client applies sparsification, parameter grouping, and other processing to the model update data, then performs variable-length coding compression on the processed data to obtain model update variable-length coded data, and uploads it to the server. The server aggregates the model update variable-length coded data uploaded by the multiple clients and updates the global model in the server with the aggregated data to obtain the target global model.
On the basis of the above embodiments, after performing variable length coding compression on the model update sparse data to obtain model update variable length coded data, the method further includes: scaling the model update variable length coding data to obtain scaled model update variable length coding data; correspondingly, uploading the model update variable length coded data to the server comprises: and uploading the scaled model update variable length coding data to a server.
It should be noted that, in the model training process, the error between the data before and after compression of the model update data needs to be bounded, so as to ensure that compression errors do not cause the trained model to deviate completely and fail to converge. To address this convergence problem, in this embodiment data scaling is achieved by dividing the model update variable-length coded data by a preset constant, thereby ensuring that the compression error is bounded; the preset constant can be determined or adjusted according to training experiments and is not limited herein.
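A minimal sketch of that scaling step follows; the value of the constant is a placeholder to be tuned experimentally, as the text notes.

```python
SCALE = 4.0  # preset constant; illustrative value, adjusted via training experiments

def scale_before_upload(coded_update):
    # Dividing by the preset constant keeps the compression error bounded.
    return coded_update / SCALE
```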
According to the technical scheme of this embodiment, the client receives the global model sent by the server and trains it on local sample data to obtain model update data. Sparsifying the model update data yields model update sparsified data, realizing a preliminary compression of the model update data. Performing variable-length coding compression on the preliminarily compressed sparsified data makes the allocation of bits more reasonable and reduces the data compression error. The model update variable-length coded data is then uploaded to the server, so that the server updates the global model according to it and obtains a more accurate target global model.
Example 2
Fig. 3 is a flowchart of a training method of a machine learning model according to a second embodiment of the present invention. The method of this embodiment may be combined with the alternatives of the machine learning model training method provided in the foregoing embodiment, and further optimizes that training method. Optionally, performing sparsification processing on the model update data to obtain model update sparsified data includes: determining the absolute values of the model update data; sorting the model update data based on their absolute values to obtain a sorting result of the model update data; and acquiring a preset number of elements from the sorting result of the model update data and determining the preset number of elements as the model update sparsified data.
As shown in fig. 3, the method includes:
s210, receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data.
S220, determining the absolute value of the model update data.
S230, based on the absolute value of the model updating data, ordering the model updating data to obtain an ordering result of the model updating data.
S240, acquiring a preset number of elements from the ordering result of the model update data, and determining the preset number of elements as model update sparse data.
S250, performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data.
And S260, uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
Illustratively, during the model training of the t-th global iteration, client $i$ may download the latest global model $w_t$ from the server and use it to initialize its local model, i.e. $w_i^{t,0} = w_t$. The client then runs multiple rounds of a batch gradient descent algorithm locally, and the accumulated gradient data are:

$$g_i^t = \sum_{j=0}^{E-1} \nabla F_i\left(w_i^{t,j}, \xi_i^{t,j}\right)$$

where $w_i^{t,j}$ denotes the model trained locally at client $i$ for $j$ rounds, $\xi_i^{t,j}$ denotes the local sample data randomly selected during local training, $F_i(\cdot)$ denotes the local loss function of client $i$, and $E$ denotes the number of local training rounds.
Further, the model update data may be calculated by the following formula:

$$u_i^t = g_i^t + e_i^t$$

where $u_i^t$ denotes the model update data, $g_i^t$ denotes the accumulated gradient data, and $e_i^t$ denotes the historical data retained locally at the client and not yet uploaded.
After obtaining the model update data, this embodiment sorts the $d$ entries of the model update data by absolute value from large to small, takes the first $k$ entries of the sorting result as the model update sparsified data, and keeps the remaining $d-k$ entries on the client. The model update sparsified data are determined by the formula:

$$\hat{u}_i^t\{l\} = \begin{cases} u_i^t\{l\}, & l \in \mathrm{top}_k\!\left(\left|u_i^t\right|\right) \\ 0, & \text{otherwise} \end{cases}$$

where $\hat{u}_i^t$ denotes the model update sparsified data, $u_i^t\{l\}$ denotes the $l$-th entry of the model update data, and $\mathrm{top}_k\!\left(\left|u_i^t\right|\right)$ denotes the set of indices of the first $k$ entries in the absolute-value ordering of the model update data.
In some alternative embodiments, the model update data may be subjected to sparsification processing by a TOP-K algorithm, so as to implement preliminary compression of the model update data, and obtain the model update sparsified data.
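Putting the formulas of this embodiment together, the following is a hedged sketch of one possible per-round client computation; the gradient function, learning rate, batch handling, and toy data are assumptions for illustration rather than the patent's exact procedure.

```python
import numpy as np

def topk_with_residual(update: np.ndarray, k: int):
    """TOP-K selection that also returns the d-k entries kept back on the client."""
    keep = np.argpartition(np.abs(update), -k)[-k:]
    sparse = np.zeros_like(update)
    sparse[keep] = update[keep]
    return sparse, update - sparse          # (uploaded part, retained residual e_i^t)

def client_round(global_weights, grad_fn, batches, residual, k, lr=0.01):
    """One local round: E batch-gradient steps, then TOP-K with residual retention."""
    w = global_weights.copy()
    accumulated = np.zeros_like(w)
    for batch in batches:                   # E local training rounds
        g = grad_fn(w, batch)               # gradient of the local loss F_i
        w -= lr * g
        accumulated += g
    update = accumulated + residual         # u_i^t = g_i^t + e_i^t
    return topk_with_residual(update, k)    # (sparse upload, new residual)

# Toy usage with a least-squares gradient (grad_fn and the data are assumptions).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 100)), rng.normal(size=64)
grad_fn = lambda w, b: X[b].T @ (X[b] @ w - y[b]) / len(b)
batches = [rng.choice(64, size=16, replace=False) for _ in range(5)]   # E = 5
sparse, residual = client_round(np.zeros(100), grad_fn, batches, np.zeros(100), k=10)
```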
According to the technical scheme of this embodiment, the absolute values of the model update data are determined, the model update data are sorted based on their absolute values to obtain the sorting result of the model update data, a preset number of elements are acquired from the sorting result and determined as the model update sparse data, and the preliminary compression of the model update data is thereby achieved.
Example 3
Fig. 4 is a flowchart of a training method of a machine learning model according to a third embodiment of the present invention. The method of this embodiment may be combined with the alternatives of the machine learning model training method provided in the foregoing embodiments, and further optimizes that training method. Optionally, performing variable-length coding compression on the model update sparsified data to obtain model update variable-length coded data includes: dividing the model update sparsified data into a plurality of data packets and determining the number of parameters allocated to each data packet; determining a target coding length based on the number of parameters allocated to each data packet; and compressing the model update sparsified data based on the target coding length to obtain the model update variable-length coded data.
As shown in fig. 4, the method includes:
s310, receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data.
S320, performing sparsification processing on the model update data to obtain model update sparsification data.
S330, dividing the model updating sparse data into a plurality of data packets, and determining the number of parameters distributed by each data packet.
S340, determining a target coding length based on the parameter quantity allocated by each data packet.
S350, compressing the model updating sparse data based on the target coding length to obtain model updating variable-length coding data.
S360, uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
In this embodiment, the model update data may be transmitted over the wireless network in the form of data packets. Specifically, the model update sparsified data may be divided into a plurality of data packets; within each data packet the coding length is the same, and the size of each data packet is a fixed value. The number of parameters allocated to each data packet can be determined from the model update sparsified data carried by the packet, and the target coding length can be determined from the number of parameters allocated to each data packet. Compressing the model update sparsified data according to the target coding length makes the number of bits allocated to the data more reasonable and reduces the data compression error. The model update variable-length coded data is then uploaded to the server, so that the server updates the global model according to it and obtains a more accurate target global model.
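A sketch of how the per-packet target coding length can follow from a fixed packet budget, consistent with the constraint $P_r(s+y_r)+H \le b$ stated later in this embodiment; the byte and bit figures, and the function names, are illustrative assumptions.

```python
def target_coding_length(b: int, H: int, P_r: int, s: int) -> int:
    """Largest integer y_r satisfying P_r * (s + y_r) + H <= b (budgets in bits)."""
    return (b - H) // P_r - s

def split_into_packets(packet_params):
    """Assign consecutive parameter indices to packets according to P_1, ..., P_R."""
    bounds, start = [], 0
    for P in packet_params:
        bounds.append((start, start + P))
        start += P
    return bounds

# Example: a 1500-byte packet with a 40-byte header and 16-bit position identifiers.
y_r = target_coding_length(b=1500 * 8, H=40 * 8, P_r=500, s=16)  # -> 7 bits per value
bounds = split_into_packets([500, 500, 200])  # -> [(0, 500), (500, 1000), (1000, 1200)]
```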
On the basis of the above embodiments, optionally, dividing the model update sparse data into a plurality of data packets, and determining the number of parameters allocated to each data packet includes: dividing the model updating sparse data into a plurality of data packets, inputting the model updating sparse data corresponding to each data packet into an optimization objective function, and carrying out minimization treatment on the optimization objective function to obtain the parameter quantity distributed by each data packet; the data packet comprises a data packet header and a data packet load, wherein the data packet header comprises header information, data packet position identification specification information, centroid identification specification information and centroid distribution information; the packet payload includes a location identifier and a centroid identifier for each parameter.
The optimization objective function may be an optimization objective function preset by a user or theoretically derived.
Illustratively, the optimization objective function may be:
where $x = P_{r-1} + P_r$; $P_r$ denotes the number of parameters allocated to the $r$-th data packet and $P_{r-1}$ the number allocated to the $(r-1)$-th data packet; $d$ denotes the data dimension; $\alpha$ denotes the decreasing exponent; $B$ is a constant; $Z_r = P_1 + P_2 + \dots + P_r$; $Q(P_r, y_r)$ denotes the quantization error after a vector of dimension $P_r$ is quantized to $y_r$ bits with an unbiased quantization algorithm; and $y_r$ denotes the target coding length corresponding to the $r$-th data packet.
It will be appreciated that the optimization objective can be treated as a single-variable optimization problem; when $Q(P_r, y_r)$ is an increasing convex function, it becomes the optimization of a strongly convex loss function and can therefore be solved. By iteratively optimizing the number of parameters allocated to each data packet, the optimal allocation is obtained, so that the best variable-length coding compression is set under the given communication traffic limit and the model performance is optimized.
In this embodiment, the process of determining the optimization objective function includes:
First, assume a vector $u_r$ with $P_r$ elements, where each element is quantized to $y_r$ bits, yielding a new vector $Q(u_r, y_r)$. The compression error is:
In addition, to enable analysis of the form of the compression error, the model update data can be fitted with a power-law distribution, expressed as follows:

$$\left|U\{l\}\right| \le \phi\, l^{\alpha}$$

where $U\{l\}$ denotes the element of vector $U$ with the $l$-th largest absolute value, $\alpha$ denotes the decreasing exponent with $\alpha < 0$, and $\phi$ denotes a constant.
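One way such a power-law bound could be fitted to an actual update is least-squares regression in log-log space; the fitting procedure below is an assumption added for illustration, since the text only states the bound itself.

```python
import numpy as np

def fit_power_law(update: np.ndarray):
    """Fit |U{l}| ~ phi * l**alpha by linear regression in log-log space."""
    mags = np.sort(np.abs(update))[::-1]    # l-th largest magnitude, l = 1..d
    mags = mags[mags > 0]
    l = np.arange(1, mags.size + 1)
    alpha, log_phi = np.polyfit(np.log(l), np.log(mags), deg=1)
    return alpha, np.exp(log_phi)           # alpha is expected to be negative

alpha, phi = fit_power_law(np.random.randn(10_000))
```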
After fitting the model update data using the power-law distribution, the compression error of the machine learning model is:
where $\gamma_i$ takes the following form:
where $Z_r = P_1 + P_2 + \dots + P_r$.
Before uploading the data, each client needs to determine the number of parameters $P_r$ allocated to the $r$-th data packet so that the corresponding compression error is minimized. The local optimization problem is expressed as follows:
$$\mathrm{s.t.}\quad P_r\,(s + y_r) + H \le b$$

Because there are $R+1$ variables in this local optimization problem, its properties cannot be analyzed, nor can it be solved, directly, and a corresponding optimization scheme needs to be designed. Note that $\gamma_i$ contains two terms: one is determined by the sparsification error, and the other by the quantization errors of the different allocations. This embodiment can therefore divide the optimization problem into two corresponding sub-problems.
First, for the determination of the value of $k$: under the communication traffic of the given $R$ data packets, it can be assumed that every parameter is compressed to 1 bit, from which the maximum number of transmittable parameters $k_{\max}$ can be calculated; assuming instead that every parameter is left uncompressed and represented with 32 bits, the minimum number of transmittable parameters $k_{\min}$ can be calculated. This embodiment may traverse different parameter counts within this range to obtain the optimal number of parameters to upload.
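That bracketing of k under a budget of R packets can be sketched as follows; charging s position-identifier bits per parameter is a bookkeeping assumption.

```python
def k_range(R: int, b: int, H: int, s: int):
    """Bounds on the number of uploadable parameters given R packets of b bits each."""
    payload = R * (b - H)            # total payload bits available
    k_max = payload // (s + 1)       # every parameter compressed to a single bit
    k_min = payload // (s + 32)      # every parameter kept at full 32-bit precision
    return k_min, k_max

k_min, k_max = k_range(R=10, b=1500 * 8, H=40 * 8, s=16)
# The client then traverses candidate values of k within [k_min, k_max].
```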
When traversing the number of parameters, for each given $k$ one needs to consider how to allocate these parameters among the $R$ data packets, i.e. determine $P_1, P_2, \dots, P_R$. For this kind of optimization, the present embodiment may use the sequential minimal optimization (SMO) algorithm, which is used to solve multi-variable optimization problems in support vector machines. Specifically, the SMO algorithm decomposes a multivariate optimization problem into a series of single-variable optimization problems that can be solved quickly. In this embodiment, only the parameter counts $P_{r-1}$ and $P_r$ of two adjacent data packets are considered at a time, while the remaining $R-2$ data packets are held fixed, and the optimization objective function is minimized over this pair of variables.
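Because the exact objective, including the $\gamma$ terms and $Q(P_r, y_r)$, is not reproduced here, the sketch below only illustrates the SMO-style step itself: hold every packet except one adjacent pair fixed, keep the pair's sum constant, and search for the split that minimizes a caller-supplied per-packet error function.

```python
def smo_step(P, r, error_fn):
    """Re-split x = P[r-1] + P[r] while every other packet size stays fixed."""
    x = P[r - 1] + P[r]
    if x < 2:                        # each packet must keep at least one parameter
        return P
    best = min(range(1, x), key=lambda p: error_fn(r - 1, p) + error_fn(r, x - p))
    P[r - 1], P[r] = best, x - best
    return P

def allocate_parameters(P, error_fn, sweeps=5):
    """SMO-style coordinate descent: repeated pairwise sweeps over adjacent packets."""
    for _ in range(sweeps):
        for r in range(1, len(P)):
            P = smo_step(P, r, error_fn)
    return P

# error_fn(index, size) stands in for the per-packet error term of the objective;
# the toy function below is for demonstration only.
P = allocate_parameters([100, 100, 100, 100], error_fn=lambda r, p: 1.0 / (p * (r + 1)))
```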
on the basis of the foregoing embodiments, optionally, determining the target coding length based on the number of parameters allocated to each data packet includes: and inputting the parameter quantity distributed by each data packet into a pre-configured coding length determining model to obtain a target coding length.
Optionally, the preconfigured coding length determination model is:

$$y_r = \left\lfloor \frac{b - H}{P_r} \right\rfloor - s$$

where $y_r$ denotes the target coding length of the $r$-th data packet, $b$ denotes the specification parameter of the data packet, $H$ denotes the header information of the data packet, $P_r$ denotes the number of parameters allocated to the data packet, and $s$ denotes the position identifier of a model update parameter.
Fig. 5 is a schematic diagram of a data packet according to the present embodiment. As shown in fig. 5, the data packet includes a data packet header and a data packet payload; s denotes the data packet position identifier specification information, y denotes the centroid identifier specification information, and the centroid distribution refers to the centroid distribution information. The data packet payload includes a position identifier and a centroid identifier for each parameter. The PID is the position identifier of a parameter in the data packet, from which the server can determine the specific position of the parameter within the model update; the CID is the centroid identifier of a parameter in the data packet, from which the server can determine which centroid the value of the parameter belongs to. It will be appreciated that data packets in the above format are transmitted, and the server decodes the information in the packets when it receives them, so that the correct model update data can be obtained.
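Fig. 5 fixes the packet layout only qualitatively, so the packing sketch below assumes concrete field widths (s-bit PIDs and y-bit CIDs packed back to back) purely to illustrate how the server can recover each parameter's position and centroid value.

```python
def pack_packet(pids, cids, s_bits, y_bits, centroids):
    """Pack (PID, CID) pairs plus the centroid table into a bit string (layout assumed)."""
    payload = ""
    for pid, cid in zip(pids, cids):
        payload += format(pid, f"0{s_bits}b") + format(cid, f"0{y_bits}b")
    return {"s": s_bits, "y": y_bits, "centroids": centroids, "payload": payload}

def unpack_packet(packet):
    """Server side: recover each parameter's position and quantized (centroid) value."""
    s, y = packet["s"], packet["y"]
    payload, out, step = packet["payload"], {}, s + y
    for i in range(0, len(payload), step):
        pid = int(payload[i:i + s], 2)
        cid = int(payload[i + s:i + step], 2)
        out[pid] = packet["centroids"][cid]   # value of the centroid this parameter maps to
    return out

packet = pack_packet(pids=[3, 17], cids=[1, 0], s_bits=16, y_bits=2,
                     centroids=[-0.5, 0.5, 1.5, 2.5])
restored = unpack_packet(packet)   # {3: 0.5, 17: -0.5}
```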
According to the technical scheme of this embodiment, the model update sparsified data is divided into a plurality of data packets, the number of parameters allocated to each data packet is determined, the target coding length is determined based on that number, and the model update sparsified data is compressed according to the target coding length. This makes the number of bits allocated to the data more reasonable and reduces the data compression error. The model update variable-length coded data is then uploaded to the server, so that the server updates the global model according to it and obtains a more accurate target global model.
Example 4
Fig. 6 is a schematic structural diagram of a training device for a machine learning model according to a fourth embodiment of the present invention. The training device of the machine learning model is applied to at least one client, the server performs multiple iterations of distributed model training with multiple clients, and in any iteration process, as shown in fig. 6, the device includes:
the model update data determining module 410 is configured to receive a global model sent by a server, train the global model based on local sample data in the client, and obtain model update data;
The update data sparsification module 420 is configured to perform sparsification processing on the model update data to obtain model update sparsification data;
the variable length coding compression module 430 is configured to perform variable length coding compression on the model update sparse data to obtain model update variable length coding data;
and the variable length coding data uploading module 440 is configured to upload the model update variable length coding data to the server, so that the server updates the global model according to the model update variable length coding data to obtain a target global model.
According to the technical scheme of this embodiment, the device receives the global model sent by the server and trains it on local sample data in the client to obtain model update data. Sparsifying the model update data yields model update sparsified data, realizing a preliminary compression of the model update data. Performing variable-length coding compression on the sparsified data makes the allocation of bits more reasonable and reduces the data compression error. The model update variable-length coded data is then uploaded to the server, so that the server updates the global model according to it and obtains a more accurate target global model.
In some alternative embodiments, the update data sparseness module 420 is further configured to:
determining an absolute value of the model update data;
based on the absolute value of the model updating data, sequencing the model updating data to obtain a sequencing result of the model updating data;
and acquiring a preset number of elements from the sorting result of the model update data, and determining the preset number of elements as model update sparse data.
In some alternative embodiments, the variable length code compression module 430 includes:
the distribution parameter number determining unit is used for dividing the model updating sparse data into a plurality of data packets and determining the parameter number distributed by each data packet;
a target coding length determining unit, configured to determine a target coding length based on the number of parameters allocated to each data packet;
and the variable length coding compression unit is used for compressing the model updating sparse data based on the target coding length to obtain model updating variable length coding data.
In some alternative embodiments, the allocation parameter number determining unit is further configured to:
dividing the model updating sparse data into a plurality of data packets, inputting the model updating sparse data corresponding to each data packet into an optimization objective function, and performing minimization treatment on the optimization objective function to obtain the parameter quantity distributed by each data packet;
The data packet comprises a data packet header and a data packet load, wherein the data packet header comprises header information, data packet position identification specification information, centroid identification specification information and centroid distribution information; the data packet load comprises a position identifier and a centroid identifier of each parameter.
In some alternative embodiments, the target coding length determining unit is further configured to:
and inputting the parameter quantity distributed by each data packet into a pre-configured code length determining model to obtain a target code length.
In some alternative embodiments, the preconfigured coding length determination model is:

$$y_r = \left\lfloor \frac{b - H}{P_r} \right\rfloor - s$$

where $y_r$ denotes the target coding length of the $r$-th data packet, $b$ denotes the specification parameter of the data packet, $H$ denotes the header information of the data packet, $P_r$ denotes the number of parameters allocated to the data packet, and $s$ denotes the position identifier of a model update parameter.
In some alternative embodiments, the training device of the machine learning model further includes:
the data scaling module is used for scaling the model updating variable-length coding data to obtain scaled model updating variable-length coding data;
correspondingly, the variable length encoded data uploading module 440 is further configured to:
And uploading the scaled model update variable length coding data to the server.
The training device of the machine learning model provided by the embodiment of the invention can execute the training method of the machine learning model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example 5
Fig. 7 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, eyeglasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An I/O interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as a training method for a machine learning model, the method comprising:
receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data;
Performing sparsification processing on the model update data to obtain model update sparsification data;
performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data;
and uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model.
In some embodiments, the training method of the machine learning model may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the training method of the machine learning model described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the training method of the machine learning model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for training a machine learning model, applied to at least one client, wherein a server performs multiple iterations of distributed model training with multiple clients, the method comprising, during any one iteration:
receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data;
performing sparsification processing on the model update data to obtain model update sparsification data;
performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data;
uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model;
the variable length coding compression is carried out on the model updating sparse data to obtain model updating variable length coding data, and the variable length coding data comprises the following steps:
dividing the model updating sparse data into a plurality of data packets, and determining the number of parameters distributed by each data packet; determining a target coding length based on the number of parameters allocated to each data packet; compressing the model update sparse data based on the target coding length to obtain model update variable-length coding data;
The dividing the model updating sparse data into a plurality of data packets, and determining the parameter quantity allocated by each data packet includes:
dividing the model updating sparse data into a plurality of data packets, inputting the model updating sparse data corresponding to each data packet into an optimization objective function, and performing minimization treatment on the optimization objective function to obtain the parameter quantity distributed by each data packet; the data packet comprises a data packet header and a data packet load, wherein the data packet header comprises header information, data packet position identification specification information, centroid identification specification information and centroid distribution information; the data packet load comprises a position identifier and a centroid identifier of each parameter.
2. The method of claim 1, wherein the performing the thinning process on the model update data to obtain model update thinned data comprises:
determining an absolute value of the model update data;
based on the absolute value of the model updating data, sequencing the model updating data to obtain a sequencing result of the model updating data;
and acquiring a preset number of elements from the ordering result of the model update data, and determining the preset number of elements as model update sparse data.
3. The method of claim 1, wherein said determining a target code length based on the number of parameters allocated for each of said data packets comprises:
and inputting the parameter quantity distributed by each data packet into a pre-configured code length determining model to obtain a target code length.
4. A method according to claim 3, wherein the preconfigured code length determining model is:
wherein y_r represents the target coding length of the r-th data packet, b represents the specification parameter of the data packet, H represents the header information of the data packet, P_r represents the number of parameters allocated to the data packet, and s represents the position identifier of the model update parameter.
5. The method of claim 1, further comprising, after said subjecting said model update sparse data to variable length coding compression, the steps of:
scaling the model update variable length coding data to obtain scaled model update variable length coding data;
correspondingly, the uploading the model update variable length coding data to the server comprises the following steps:
and uploading the scaled model update variable length coding data to the server.
6. A training device for a machine learning model, applied to at least one client, wherein a server performs multiple iterations of distributed model training with multiple clients, and wherein during any iteration, the device comprises:
the model update data determining module is used for receiving a global model sent by a server, and training the global model based on local sample data in the client to obtain model update data;
the update data sparsification module is used for performing sparsification processing on the model update data to obtain model update sparsification data;
the variable length coding compression module is used for performing variable length coding compression on the model updating sparse data to obtain model updating variable length coding data;
the variable length coding data uploading module is used for uploading the model updating variable length coding data to the server so that the server updates the global model according to the model updating variable length coding data to obtain a target global model;
wherein, the variable length coding compression module includes:
the distribution parameter number determining unit is used for dividing the model updating sparse data into a plurality of data packets and determining the parameter number distributed by each data packet;
A target coding length determining unit, configured to determine a target coding length based on the number of parameters allocated to each data packet;
the variable length coding compression unit is used for compressing the model updating sparse data based on the target coding length to obtain model updating variable length coding data;
the distribution parameter quantity determining unit is further configured to divide the model update sparse data into a plurality of data packets, input the model update sparse data corresponding to each data packet into an optimization objective function, and perform minimization processing on the optimization objective function to obtain the parameter quantity distributed by each data packet; the data packet comprises a data packet header and a data packet load, wherein the data packet header comprises header information, data packet position identification specification information, centroid identification specification information and centroid distribution information; the data packet load comprises a position identifier and a centroid identifier of each parameter.
7. An electronic device, the electronic device comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the training method of the machine learning model of any one of claims 1-5.
8. A computer readable storage medium storing computer instructions for causing a processor to perform the method of training the machine learning model of any one of claims 1-5.
CN202310312859.5A 2023-03-22 2023-03-22 Training method and device for machine learning model, electronic equipment and storage medium Active CN116341689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310312859.5A CN116341689B (en) 2023-03-22 2023-03-22 Training method and device for machine learning model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310312859.5A CN116341689B (en) 2023-03-22 2023-03-22 Training method and device for machine learning model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116341689A CN116341689A (en) 2023-06-27
CN116341689B true CN116341689B (en) 2024-02-06

Family

ID=86894467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310312859.5A Active CN116341689B (en) 2023-03-22 2023-03-22 Training method and device for machine learning model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116341689B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019219846A1 (en) * 2018-05-17 2019-11-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor
CN114548426B (en) * 2022-02-17 2023-11-24 Beijing Baidu Netcom Science and Technology Co., Ltd. Asynchronous federated learning method, business service prediction method, device and system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111713110A (en) * 2017-12-08 2020-09-25 Panasonic Intellectual Property Corporation of America Image encoding device, image decoding device, image encoding method, and image decoding method
CN111901829A (en) * 2020-07-10 2020-11-06 Jiangsu Intelligent Transportation and Intelligent Driving Research Institute Wireless federated learning method based on compressed sensing and quantitative coding
CN113222179A (en) * 2021-03-18 2021-08-06 Beijing University of Posts and Telecommunications Federated learning model compression method based on model sparsification and weight quantization
CN113178191A (en) * 2021-04-25 2021-07-27 Ping An Technology (Shenzhen) Co., Ltd. Federated learning-based speech characterization model training method, device, equipment and medium
CN113259333A (en) * 2021-04-29 2021-08-13 Shenzhen University Federated learning data compression method, system, terminal, server and storage medium
WO2022269469A1 (en) * 2021-06-22 2022-12-29 Nokia Technologies Oy Method, apparatus and computer program product for federated learning for non independent and non identically distributed data
CN114422606A (en) * 2022-03-15 2022-04-29 Harbin Institute of Technology (Shenzhen) (Harbin Institute of Technology Shenzhen Institute of Science and Technology Innovation) Federated learning communication overhead compression method, device, equipment and medium
CN114861790A (en) * 2022-04-29 2022-08-05 Shenzhen University Method, system and device for optimizing federated learning compression communication
CN115564062A (en) * 2022-09-26 2023-01-03 Nanjing University of Science and Technology Federated learning system and method based on model pruning and transmission compression optimization
CN115829027A (en) * 2022-10-31 2023-03-21 Guangdong University of Technology Comparative learning-based federated learning sparse training method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Fast Blockchain-Based Federated Learning Framework With Compressed Communications; Laizhong Cui et al.; IEEE Journal on Selected Areas in Communications; Vol. 40, No. 12; 3358-3372 *
On the ICN-IoT with federated learning integration of communication: Concepts, security-privacy issues, applications, and future perspectives; Anichur Rahman et al.; Future Generation Computer Systems; Vol. 138; 61-88 *
Machine learning analysis of big data based on a distributed computing framework; 潘世成 et al.; 《电子设计工程》 (Electronic Design Engineering); Vol. 28, No. 11; 79-83 *
Optimization of edge network communication performance based on cross-layer design; 黄涛; 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology); Vol. 2021, No. 12; I136-19 *

Also Published As

Publication number Publication date
CN116341689A (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN112561078B (en) Distributed model training method and related device
US20190236453A1 (en) Method and system for data transmission, and electronic device
US20190044535A1 (en) Systems and methods for compressing parameters of learned parameter systems
CN111091278A (en) Edge detection model construction method and device for mechanical equipment anomaly detection
CN114065863B (en) Federal learning method, apparatus, system, electronic device and storage medium
CN114298322A (en) Federal learning method, device, system, electronic equipment and computer readable medium
CN115567589B (en) Compression transmission method, device and equipment of JSON data and storage medium
CN114418086B (en) Method and device for compressing neural network model
CN112488060A (en) Object detection method, device, apparatus, medium, and program product
CN114374440A (en) Estimation method and device of classical capacity of quantum channel, electronic device and medium
CN115496970A (en) Training method of image task model, image recognition method and related device
CN116341689B (en) Training method and device for machine learning model, electronic equipment and storage medium
CN110046670B (en) Feature vector dimension reduction method and device
CN107645665A (en) A kind of method and device of WebP entropy codes
CN114065913A (en) Model quantization method and device and terminal equipment
CN113807397A (en) Training method, device, equipment and storage medium of semantic representation model
EP3683733A1 (en) A method, an apparatus and a computer program product for neural networks
CN116702861B (en) Compression method, training method, processing method and device of deep learning model
CN117708071B (en) Processing method and device for coal mine equipment operation parameters based on big data
CN116611495B (en) Compression method, training method, processing method and device of deep learning model
CN117827710B (en) DMA bandwidth determining method, device, equipment and medium based on AI chip
CN114640357B (en) Data encoding method, apparatus and storage medium
CN115482422B (en) Training method of deep learning model, image processing method and device
CN113963433B (en) Motion search method, motion search device, electronic equipment and storage medium
CN115049051A (en) Model weight compression method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant