WO2023279975A1 - Model processing method, federated learning method, and related devices - Google Patents

Model processing method, federated learning method, and related devices (一种模型处理方法、联邦学习方法及相关设备)

Info

Publication number
WO2023279975A1
Authority
WO
WIPO (PCT)
Prior art keywords
model, pruning, loss function, neural network
Application number
PCT/CN2022/100682
Other languages
English (en)
French (fr)
Inventor
李银川
邵云峰
刘潇峰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023279975A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 20/00 — Machine learning

Definitions

  • The embodiments of the present application relate to the field of terminal artificial intelligence, and in particular to a model processing method, a federated learning method, and related devices.
  • Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and to produce a new class of intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
  • Because deep neural networks are typically large, they often need to be simplified.
  • Among compression techniques, pruning is the most widely used: pruning compresses a deep neural network by removing some of its parameters and modules.
  • The classic pruning process includes three steps: first, the model is trained on a local dataset; then the trained model is pruned according to preset rules; finally, the pruned model is fine-tuned on the local dataset to avoid excessive loss of model accuracy. The whole process is cumbersome and inefficient.
  • The embodiments of the present application provide a model processing method and related devices, which can be used in combination with a federated learning method.
  • A first aspect of the present application provides a model processing method, which can be applied to model training and pruning scenarios.
  • The method can be executed by a model processing device (such as a client), or by a component of the client (such as a processor, a chip, or a chip system). The method includes: obtaining training data that includes label values; using the training data as input, training a neural network model according to a first loss function to obtain a first model, where the first model includes a plurality of substructures and each substructure includes at least two neurons; and pruning the first model based on a second loss function and a constraint condition to obtain a second model, where the second loss function is used to indicate that at least one of the plurality of substructures is to be pruned, the constraint condition is used to constrain the accuracy of the second model to be not lower than the accuracy of the first model, and the accuracy indicates the degree of difference between the output value of the model and the label value.
  • The first loss function can also be understood as a data loss function, which is mainly used to evaluate the accuracy of the model while the model is being trained on data.
  • The second loss function can be understood as a sparse loss function, which is mainly used to make the model sparse (that is, for pruning).
  • The substructure can be a channel, a feature map, a network layer, a subnetwork, or another predefined network structure composed of multiple neurons of the neural network model; when the neural network model is a convolutional neural network, the substructure can also be a convolution kernel. In short, a substructure can be regarded as a functional whole: pruning a substructure means pruning all the neurons it includes.
  • Pruning the first model under a constraint based on the data loss function is equivalent to constraining the pruning direction of the first model, so that the accuracy of the second model obtained by pruning is not lower than the accuracy of the first model. This reduces the subsequent fine-tuning steps needed to restore model accuracy, and pruning whole substructures is more efficient than pruning neurons one by one, so the accuracy of the pruned model is preserved while the efficiency of the pruning process is improved; the resulting model structure is also more concise.
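  • To make the substructure idea concrete, the following is a minimal PyTorch sketch (an illustration, not the patented method itself) of substructure-level pruning: each output channel of a convolutional layer is treated as one substructure, and the channels with the smallest L2 norms are removed as a whole. The helper name prune_smallest_channels is illustrative, and in a real network the input channels of the following layer would also have to be pruned to match.

```python
import torch
import torch.nn as nn

def prune_smallest_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Treat each output channel (filter) as one substructure and keep
    only the channels with the largest L2 norms."""
    weight = conv.weight.data                    # shape [out_ch, in_ch, kH, kW]
    norms = weight.flatten(1).norm(p=2, dim=1)   # one L2 norm per output channel
    n_keep = max(1, int(keep_ratio * weight.size(0)))
    keep = norms.topk(n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
smaller = prune_smallest_channels(conv, keep_ratio=0.5)   # 16 -> 8 channels
```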
  • The above constraint is specifically used to constrain the included angle between the descending direction of the first loss function and the descending direction of the second loss function to be less than or equal to 90 degrees.
  • The descending direction of the first loss function may be the gradient direction obtained by differentiating the first loss function, and the descending direction of the second loss function may be the gradient direction obtained by differentiating the second loss function. An included angle of at most 90 degrees is equivalent to the two gradients having a non-negative dot product.
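  • A hedged reading of this constraint in code: the sketch below checks whether the dot product of the two gradients is non-negative, which is the same as the included angle being at most 90 degrees. Here data_loss_fn and sparse_loss_fn are assumed stand-ins for the first and second loss functions; this is an illustration, not the patent's exact procedure.

```python
import torch

def descent_directions_compatible(model, data_loss_fn, sparse_loss_fn) -> bool:
    """True if the angle between the gradients of the two losses is <= 90
    degrees, i.e. their dot product is non-negative."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_data = torch.autograd.grad(data_loss_fn(model), params)
    g_sparse = torch.autograd.grad(sparse_loss_fn(model), params)
    dot = sum((gd * gs).sum() for gd, gs in zip(g_data, g_sparse))
    return dot.item() >= 0.0   # cos(angle) >= 0  <=>  angle <= 90 degrees
```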
  • The above constraint condition may specifically constrain the value of the first loss function of the second model to be less than or equal to the value of the first loss function of the first model.
  • In other words, the first model and the second model are used to predict the same data, and the first loss function is then used to measure the accuracy of each model: the smaller the value of the first loss function, the higher the accuracy of the model.
  • The comparison of accuracy may thus be made using the value of the first loss function of the second model and the value of the first loss function of the first model.
  • Alternatively, an evaluation method different from the first loss function may be used to compare the accuracy of the first model and the second model; the present application does not limit the specific evaluation method.
  • The above-mentioned second loss function includes a first sparse item, and the first sparse item is related to the weight of at least one substructure in the plurality of substructures.
  • The first sparse item in the second loss function treats each substructure as a whole, so pruning removes entire network structures such as channels, convolution kernels, feature maps, or network layers, rather than individual neurons. This greatly improves the efficiency of pruning, and the resulting model is more refined and lightweight.
  • The above-mentioned second loss function further includes a difference item, where the difference item indicates the difference between the first model and the second model.
  • In this way, the degree of difference between the models before and after pruning can be constrained to a certain extent, ensuring the similarity of the models before and after pruning and thus the accuracy of the pruned model.
  • The above step of pruning the first model based on the second loss function and the constraint condition to obtain the second model may include: calculating an update coefficient based on the constraint condition, where the update coefficient is used to adjust the direction of the first sparse item; using the update coefficient to update the first sparse item in the second loss function to obtain a third loss function, where the third loss function includes the difference item and a second sparse item, and the second sparse item is obtained from the update coefficient and the first sparse item; and pruning the first model based on the third loss function to obtain the second model.
  • In this way, the pruning direction can be adjusted by introducing the update coefficient, so that the pruned second model satisfies the constraint condition.
  • Pruning in this case can also be understood as directional pruning.
  • The above-mentioned third loss function can be written as:

    $L_3 = \|W_n - V_n\|_2^2 + \lambda \sum_i s_i \|W_n^{(i)}\|_2$

    where $\|\cdot\|_2$ is the L2 norm, $V_n$ are the parameters of the first model, $W_n$ are the parameters of the second model, $\lambda$ is a hyperparameter used to adjust the weight of the first sparse item, $s_i$ is the update coefficient (adjusted so that the constraint condition is satisfied), and $W_n^{(i)}$ are the parameters of the i-th substructure in the second model.
  • The first sparse item is updated by the update coefficient to obtain the second sparse item.
  • The updated second sparse item enables directional pruning of the model, ensuring that model accuracy is not lost after pruning.
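  • As an illustration (a sketch based on the reconstruction above, not a verified transcription of the claims), the third loss function can be computed as the difference item plus the update-coefficient-weighted group norms; the first sparse item corresponds to the same sum with all $s_i$ equal to 1. The argument names below are assumptions.

```python
import torch

def third_loss(pruned_params, frozen_params, substructures, s, lam=1e-3):
    """Reconstructed third loss: ||W_n - V_n||_2^2 + lam * sum_i s_i * ||W_n^(i)||_2.
    pruned_params / frozen_params: parameter tensors of the second / first model;
    substructures: one parameter tensor per substructure of the second model;
    s: update coefficients s_i, chosen so that the constraint condition holds."""
    # difference item: keeps the pruned model W_n close to the first model V_n
    diff = sum(((w - v) ** 2).sum() for w, v in zip(pruned_params, frozen_params))
    # second sparse item: the update coefficients steer the pruning direction
    sparse = sum(s_i * w_i.norm(p=2) for s_i, w_i in zip(s, substructures))
    return diff + lam * sparse
```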
  • Alternatively, the above step of pruning the first model based on the second loss function and the constraint condition to obtain the second model may include: randomly pruning the first model at least once based on the second loss function, until the second model obtained after pruning satisfies the constraint condition.
  • In other words, the first model is randomly pruned based on the second loss function to obtain a candidate second model; if the constraint condition is satisfied, the second model is output; if not, the random pruning step is repeated until the constraint condition is met.
  • Pruning can thus be performed by random pruning plus a constraint-condition judgement, and the pruned model is output only when the constraint condition is met. This approach does not need data to fine-tune the pruned model, which makes it more versatile.
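  • A minimal sketch of "random pruning plus constraint-condition judgement", using the loss-value form of the constraint; substructures is an assumed helper that yields one parameter tensor per substructure:

```python
import copy
import random
import torch

@torch.no_grad()
def random_prune_until_ok(model, substructures, first_loss_fn,
                          drop_ratio=0.1, max_tries=50):
    """Randomly zero out whole substructures until the candidate's first
    (data) loss is no higher than the unpruned model's."""
    baseline = first_loss_fn(model)
    for _ in range(max_tries):
        candidate = copy.deepcopy(model)
        for w_i in substructures(candidate):      # one tensor per substructure
            if random.random() < drop_ratio:
                w_i.zero_()                       # prune the whole substructure
        if first_loss_fn(candidate) <= baseline:  # constraint condition met
            return candidate
    return model                                  # fallback: keep the unpruned model
```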
  • The above method may be applied to a client in a federated learning system, where the data used to train the neural network model is the client's local data, for example data collected by the client's sensors or data generated while the client's applications run. The method then also includes: receiving the neural network model sent by an upstream device, and sending the second model to the upstream device.
  • Upstream devices are devices, such as servers, that can communicate with clients.
  • In other words, this method can be applied to federated learning scenarios.
  • It helps upstream devices obtain models with no loss of accuracy and a simplified structure for aggregation, reducing the communication burden on the uplink (that is, the communication link from the client to the upstream device).
  • The above training data includes image data, audio data, or text data. It can be understood that these three types are just examples; in practical applications, the specific form of the training data differs according to the type of task processed by the neural network model, which is not limited here.
  • The above-mentioned neural network model may be used to classify and/or identify image data. It can be understood that in practical applications the neural network model can also be used for target detection, information recommendation, speech recognition, character recognition, question answering, human-machine games, and so on, which are not limited here.
  • The pruning method provided by the embodiments of this application can be applied to a neural network model used in any scenario (for example: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.), improving the pruning efficiency of the neural network model while also preserving its accuracy.
  • The above steps of model training and pruning can be performed once or multiple times, as needed. In this way, a model that better matches user expectations can be obtained, and pruning is completed without loss of model accuracy, saving storage and communication costs.
  • A second aspect of the present application provides a federated learning method, which can be applied to model pruning scenarios.
  • This method can be executed by an upstream device (a cloud server or an edge server, etc.), or by a component of the upstream device (such as a processor, a chip, or a chip system). It can be understood as an operation of constrained pruning first, followed by aggregation.
  • The method includes: sending a neural network model to a plurality of downstream devices, the neural network model including a plurality of substructures, each substructure including at least two neurons; receiving a plurality of first models from the plurality of downstream devices, the first models being obtained by training the neural network model, where the loss function used in the training process is referred to as the first loss function; pruning the plurality of first models respectively based on a second loss function and a constraint condition, where the second loss function is used to indicate that substructures of the first models are to be pruned, and the constraint condition is used to constrain the accuracy of each first model after pruning to be not lower than its accuracy before pruning; and aggregating the pruned first models to obtain a second model (as sketched below).
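  • The patent does not fix a particular aggregation rule; the following is a FedAvg-style sketch under the assumptions that each pruned substructure is represented by zeros (so all client state dicts keep the same shapes) and that aggregation is a sample-count-weighted average.

```python
import torch

def aggregate(client_state_dicts, num_samples):
    """Sample-weighted average of the pruned client models (FedAvg-style)."""
    total = float(sum(num_samples))
    agg = {}
    for key in client_state_dicts[0]:
        agg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(client_state_dicts, num_samples))
    return agg
```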
  • The above step of pruning a first model based on the second loss function and the constraint condition may include: performing random pruning at least once, until the model obtained after pruning the first model satisfies the constraint condition.
  • In other words, a first model is randomly pruned based on the second loss function; if the constraint condition is satisfied, the pruned model is output; if not, the random pruning step is repeated until the constraint condition is met.
  • A third aspect of the present application provides a federated learning method, which can be applied to model pruning scenarios.
  • The method can be executed by a server (a cloud server or an edge server, etc.), or by a server component (such as a processor, a chip, or a chip system). It can be understood as an operation of aggregation first, followed by constrained pruning.
  • The method includes: sending a neural network model to a plurality of downstream devices, the neural network model including a plurality of substructures, each substructure including at least two neurons; receiving a plurality of first models from the plurality of downstream devices, the first models being obtained by training the neural network model, where the loss function used in the training process is called the first loss function; aggregating the plurality of first models to obtain a second model; and pruning the second model based on a loss function (hereinafter referred to as the second loss function) and a constraint condition, where the second loss function is used to indicate that substructures of the second model are to be pruned, and the constraint condition is used to constrain the accuracy of the second model after pruning to be not lower than its accuracy before pruning.
  • The above step of pruning the second model based on the second loss function and the constraint condition may include: performing random pruning at least once, until the model obtained after pruning the second model satisfies the constraint condition.
  • In other words, the second model is randomly pruned based on the second loss function; if the constraint condition is satisfied, the pruned second model is output; if not, the random pruning step is repeated until the constraint condition is met.
  • When the server prunes the model using the method provided in the embodiments of the present application, it does not need training data to adjust the model in order to preserve its accuracy. That is, the server can prune the model without using the clients' training data while still guaranteeing model accuracy. This avoids transmitting client training data to the upstream device during pruning, which protects the clients' data privacy.
  • The substructure may be a channel, a feature map, a network layer, a subnetwork of the neural network model, or another predefined network structure composed of multiple neurons; when the neural network model is a convolutional neural network, the substructure can also be a convolution kernel. In short, a substructure can be regarded as a functional whole, and pruning a substructure means pruning all the neurons it includes. The method provided in this application prunes each substructure as a whole, which is more efficient than pruning neurons one by one, and the resulting model structure is also more concise.
  • The above-mentioned plurality of first models are trained according to the first loss function, and the loss function used for pruning is called the second loss function; the constraint condition is specifically used to constrain the included angle between the descending direction of the first loss function and the descending direction of the second loss function to be less than or equal to 90 degrees.
  • The descending direction of the first loss function may be the gradient direction obtained by differentiating the first loss function, and the descending direction of the second loss function may be the gradient direction obtained by differentiating the second loss function.
  • The above constraint may specifically constrain the value of the first loss function of the model after pruning to be less than or equal to the value of the first loss function of the model before pruning.
  • In other words, the models before and after pruning are used to predict the same data, and the first loss function is then used to measure the accuracy of each: the smaller the value of the first loss function, the higher the accuracy of the model.
  • The comparison of accuracy may thus be made using the values of the first loss function of the model before and after pruning.
  • Alternatively, an evaluation method different from the first loss function may be used to compare the accuracy of the model before and after pruning; this application does not limit the specific evaluation method.
  • The foregoing steps may further include: sending the pruned models to the multiple downstream devices.
  • This possible implementation can be applied to a scenario where a cloud server or an edge server performs pruning and aggregation and then sends the pruned model to multiple downstream devices.
  • Pruning the model before sending it downstream reduces the communication burden on the one hand, and reduces the storage and processing requirements on the downstream devices on the other.
  • The foregoing steps may further include: sending the pruned model to an upstream device.
  • This possible implementation can be applied to a scenario where an edge server performs pruning and aggregation and then sends the pruned model to the upstream device, so that the upstream server can continue to aggregate and prune the model and integrate information from more client devices.
  • The above-mentioned second loss function includes a first sparse item, and the first sparse item is related to the weight of at least one substructure in the plurality of substructures.
  • The first sparse item in the second loss function treats each substructure as a whole, so pruning removes entire network structures such as channels, convolution kernels, feature maps, and network layers, rather than individual neurons. This greatly improves the efficiency of pruning, and the resulting models are more refined and lightweight.
  • The above-mentioned second loss function further includes a difference item, where the difference item indicates the difference between the models before and after pruning.
  • In this way, the degree of difference between the models before and after pruning can be constrained to a certain extent, and the similarity of the models before and after pruning can be guaranteed.
  • The above step of respectively pruning the multiple first models based on the second loss function and the constraint condition may include: calculating an update coefficient based on the constraint condition, where the update coefficient is used to adjust the direction of the first sparse item; using the update coefficient to update the first sparse item in the second loss function to obtain a third loss function, where the third loss function includes the difference item and a second sparse item, and the second sparse item is obtained from the update coefficient and the first sparse item; and pruning each model based on the third loss function.
  • In this way, the pruning direction can be adjusted by introducing the update coefficient, so that each pruned first model satisfies the constraint condition.
  • Pruning in this case can also be understood as directional pruning.
  • The above-mentioned third loss function can be written as:

    $L_3 = \|W_n - V_n\|_2^2 + \lambda \sum_i s_i \|W_n^{(i)}\|_2$

    where $\|\cdot\|_2$ is the L2 norm, $V_n$ and $W_n$ are the parameters of the model before and after pruning respectively, $\lambda$ is a hyperparameter used to adjust the weight of the first sparse item, $s_i$ is the update coefficient, and $W_n^{(i)}$ are the parameters of the i-th substructure.
  • The first sparse item is updated by the update coefficient to obtain the second sparse item.
  • The updated second sparse item enables directional pruning of the model, ensuring that model accuracy is not lost after pruning.
  • The above training data includes image data, audio data, or text data. It can be understood that these three types are only examples; in practical applications, the specific form of the training data differs according to the input of the neural network model, which is not limited here.
  • The above-mentioned neural network model may be used to classify and/or identify image data. It can be understood that, in practical applications, the neural network model can also be used for prediction, encoding, decoding, etc., which are not specifically limited here.
  • The above steps of receiving, pruning, aggregating, and sending can be performed once or multiple times, which can be set according to specific needs.
  • In this way, a model that better matches user expectations can be obtained, and pruning is completed without loss of model accuracy, saving storage and communication costs.
  • A fourth aspect of the present application provides a model processing device, which can be applied to model training and pruning scenarios.
  • The model processing device can be a client, and includes: an acquisition unit, configured to acquire training data including label values; a training unit, configured to use the training data as input and train a neural network model according to a first loss function to obtain a first model, where the first model includes a plurality of substructures and each substructure includes at least two neurons; and a pruning unit, configured to prune the first model based on a second loss function and a constraint condition to obtain a second model, where the second loss function is used to indicate that at least one substructure in the plurality of substructures is to be pruned, the constraint condition is used to constrain the accuracy of the second model to be not lower than the accuracy of the first model, and the accuracy indicates the degree of difference between the output value of the model and the label value.
  • The first loss function can also be understood as a data loss function, which is mainly used to evaluate the accuracy of the model while the model is being trained on data.
  • The second loss function can be understood as a sparse loss function, which is mainly used to make the model sparse (that is, for pruning).
  • Each unit of the model processing device provided in the fourth aspect may be configured to implement the method in any possible implementation of the first aspect.
  • A fifth aspect of the present application provides an upstream device, which can be applied to model training and pruning scenarios, federated learning scenarios, and so on.
  • The upstream device can be a cloud server or an edge server in a federated learning scenario, and includes: a sending unit, configured to send a neural network model to multiple downstream devices, the neural network model including multiple substructures, each substructure including at least two neurons; a receiving unit, configured to receive multiple first models from the multiple downstream devices, the first models being obtained by training the neural network model, where the loss function used in the training process is called the first loss function; a pruning unit, configured to respectively prune the multiple first models based on a loss function (subsequently called the second loss function) and a constraint condition, where the second loss function is used to indicate that substructures of the first models are to be pruned, and the constraint condition is used to constrain the accuracy of each first model after pruning to be not lower than its accuracy before pruning; and an aggregation unit, configured to aggregate the pruned first models to obtain a second model.
  • Each unit of the upstream device provided in the fifth aspect may be configured to implement the method in any possible implementation of the second aspect.
  • A sixth aspect of the present application provides an upstream device, which can be applied to model training and pruning scenarios, federated learning scenarios, and so on.
  • The upstream device can be a cloud server or an edge server in a federated learning scenario, and includes: a sending unit, configured to send a neural network model to multiple downstream devices, the neural network model including multiple substructures, each substructure including at least two neurons; a receiving unit, configured to receive multiple first models from the multiple downstream devices, the first models being obtained by training the neural network model, where the loss function used in the training process is called the first loss function; an aggregation unit, configured to aggregate the multiple first models to obtain a second model; and a pruning unit, configured to prune the second model based on a loss function (hereinafter referred to as the second loss function) and a constraint condition, where the second loss function is used to indicate that substructures of the second model are to be pruned, and the constraint condition is used to constrain the accuracy of the second model after pruning to be not lower than its accuracy before pruning.
  • Each unit of the upstream device provided in the sixth aspect may be configured to implement the method in any possible implementation of the third aspect.
  • A seventh aspect of the present application provides an electronic device, including a processor coupled with a memory, where the memory is used to store programs or instructions; when the programs or instructions are executed by the processor, the electronic device implements the method in any possible implementation of the first, second, or third aspect.
  • An eighth aspect of the present application provides a computer-readable medium on which a computer program or instructions are stored; when the computer program or instructions are run on a computer, the computer executes the method in any possible implementation of the first, second, or third aspect.
  • A ninth aspect of the present application provides a computer program product; when the computer program product is executed on a computer, the computer executes the method in any possible implementation of the first, second, or third aspect.
  • Fig. 1 is a schematic structural diagram of the main framework of artificial intelligence;
  • Fig. 2 is a schematic diagram of the architecture of a federated learning system provided by the present application;
  • Fig. 3 is a schematic diagram of the architecture of another federated learning system provided by the present application;
  • Fig. 4 is a schematic diagram of the architecture of another federated learning system provided by the present application;
  • Fig. 5 is a schematic diagram of the architecture of another federated learning system provided by the present application;
  • Fig. 6 is a schematic flowchart of the federated learning method provided by the present application;
  • Fig. 7A and Figs. 8-10 are schematic diagrams of the pruning direction in the pruning process provided by the present application;
  • Fig. 7B is a schematic structural diagram of the model before and after pruning provided by the present application;
  • Fig. 11 is another schematic flowchart of the federated learning method provided by the present application;
  • Fig. 12 is a schematic flowchart of the model processing method provided by the present application;
  • Fig. 13 is a schematic structural diagram of the model processing device provided by the present application;
  • Fig. 14 is a schematic structural diagram of the upstream device provided by the present application;
  • Fig. 15 is another schematic structural diagram of the upstream device provided by the present application;
  • Fig. 16 is another schematic structural diagram of the model processing device provided by the present application;
  • Fig. 17 is another schematic structural diagram of the upstream device provided by the present application.
  • Embodiments of the present application provide a model processing method, a federated learning method, and related devices.
  • Pruning the first model under a constraint based on the data loss function is equivalent to providing a direction for the pruning of the first model, so that the accuracy of the second model obtained by pruning is not lower than that of the first model. This reduces the subsequent fine-tuning steps needed to adjust model accuracy, improving the efficiency of the pruning process while preserving the accuracy of the pruned model.
  • Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • The following illustrates this artificial intelligence framework along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provided and processed by technology) to the industrial ecology of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through a basic platform.
  • Computing power is provided by smart chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), and other hardware acceleration chips.
  • The basic platform includes a distributed computing framework, networks, and other related platform assurance and support, which can include cloud storage and computing, interconnection networks, and more.
  • For example, sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for computation.
  • Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and so on.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to reasoning control strategies; the typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, sorting, and prediction.
  • After the above data processing, some general capabilities can be formed based on the results, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they package the overall artificial intelligence solution, commercializing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and so on.
  • The embodiments of this application can be applied on the client or in the cloud, and can also be applied to training machine learning models used in various federated learning scenarios; the trained machine learning models can be applied in the above-mentioned fields.
  • The processing objects of the trained machine learning model can be image samples, discrete data samples, text samples, voice samples, and so on, which are not exhaustively listed here.
  • The machine learning model can be specifically expressed as a neural network, a linear model, or another type of machine learning model; correspondingly, the multiple modules that make up the machine learning model can be expressed as neural network modules, existing model modules, or modules of other types of machine learning models, which are not exhaustively listed here.
  • In the following, only the case where the machine learning model is a neural network is used as an example; the case where the machine learning model is of a type other than a neural network can be understood by analogy and will not be repeated in the embodiments of the present application.
  • The embodiments of the present application can be applied to clients, the cloud, federated learning, and so on, mainly for training and pruning neural networks, and therefore involve many concepts related to neural networks.
  • For ease of understanding, the related terms and concepts of neural networks that may be involved in the embodiments of the present application are first introduced below.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and its output can be expressed by formula (1-1):

    $h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$    (1-1)

    where $W_s$ is the weight of $x_s$ and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces nonlinearity into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be, for example, a sigmoid function.
  • A neural network is a network formed by connecting multiple such single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field; the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers.
  • Dividing a DNN according to the positions of its layers, the layers can be grouped into three categories: the input layer, the intermediate layers, and the output layer.
  • Generally, the first layer is the input layer, the last layer is the output layer, and all the layers in between are intermediate layers, also called hidden layers.
  • The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • The work of each layer can be expressed as the linear relational expression

    $\vec{y} = \alpha(W \cdot \vec{x} + \vec{b})$

    where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias (offset) vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function.
  • Each layer simply performs this operation on its input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and bias vectors $\vec{b}$ is also large.
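  • The per-layer expression above, written out as a minimal sketch (the shapes are chosen arbitrarily for illustration):

```python
import torch

def layer_forward(x, W, b):
    """One DNN layer: y = alpha(W @ x + b), here with alpha = sigmoid."""
    return torch.sigmoid(W @ x + b)

x = torch.randn(4)            # input vector x
W = torch.randn(3, 4)         # weight matrix: 4 inputs -> 3 outputs
b = torch.randn(3)            # bias vector b
y = layer_forward(x, W, b)    # output vector y, shape [3]
```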
  • These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. In a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W^3_{24}$: the superscript 3 represents the layer number of the coefficient, and the subscripts correspond to the output index 2 (third layer) and the input index 4 (second layer).
  • In general, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^L_{jk}$.
  • Note that the input layer has no $W$ parameters.
  • More intermediate layers make the network better able to describe complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", meaning it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of the many layers).
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure.
  • A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and this feature extractor can be regarded as a filter.
  • A convolutional layer refers to a neuron layer in the convolutional neural network that performs convolution processing on the input signal; in a convolutional layer, a neuron may be connected only to some neurons of adjacent layers.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel.
  • Sharing weights can be understood as a way to extract image information that is independent of position.
  • The convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network the convolution kernel can obtain reasonable weights through learning.
  • A direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while also reducing the risk of overfitting.
  • A recurrent neural network (RNN) is a network in which the current output of a sequence is also related to previous outputs; concretely, the network remembers previous information, saves it in the network's internal state, and applies it to the calculation of the current output.
  • The loss function may generally be, for example, a mean squared error, cross-entropy, logarithmic, or exponential loss function.
  • For example, the mean squared error loss function can be defined as $L = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, where $y_i$ is the label value and $\hat{y}_i$ is the model's output. A specific loss function can be selected according to the actual application scenario.
  • During training, a neural network can use the error back propagation (BP) algorithm to correct the parameters of the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-passing the input signal to the output produces an error loss, and the parameters of the initial model are updated by back-propagating the error loss information, making the error loss converge.
  • The back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • When the client performs model training, it can train the global model through the loss function and the BP algorithm to obtain the trained global model (see the sketch below).
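  • The forward/backward loop described above, as a minimal PyTorch sketch (the model, data, and hyperparameters are placeholders, not the patent's):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()                   # plays the role of the data loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, labels = torch.randn(64, 10), torch.randint(0, 2, (64,))
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x), labels)   # forward pass produces the error loss
    loss.backward()                    # back-propagate the error loss information
    opt.step()                         # update the parameters so the loss converges
```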
  • Federated learning is a distributed machine learning approach in which multiple clients (such as mobile devices or edge servers) and a server cooperate to complete model training and algorithm updates on the premise that data does not leave its domain, so as to obtain a trained global model. It can be understood that, during machine learning, each participant can perform joint modeling with the help of data from other parties, and no party needs to share its data resources; that is, joint data training is carried out while the data stays local, establishing a shared machine learning model.
  • The embodiments of the present application can be applied to a model processing device (such as a client or a cloud) or to a federated learning system.
  • FIG. 2 is a schematic diagram of the architecture of a federated learning system provided by this application.
  • The system may include multiple servers, which may establish connections with each other, that is, the servers may also communicate with each other.
  • Each server can communicate with one or more clients, and the clients can be deployed in various devices, for example in mobile terminals or servers; as shown in Figure 2: client 1, client 2, ..., client N-1, client N, and so on.
  • Servers, and servers and clients, can interact through a communication network of any communication mechanism or communication standard.
  • The communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • The communication network may include a wireless network, a wired network, or a combination of the two.
  • The wireless network includes but is not limited to any one or combination of: a fifth-generation mobile communication technology (5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long-range (LoRa) wireless communication, and near field communication (NFC).
  • The wired network may include an optical fiber communication network or a network composed of coaxial cables.
  • The client can be deployed in various servers or terminals.
  • The client mentioned below can also refer to the server or terminal on which the client software program is deployed.
  • The terminal can be a mobile terminal or a fixedly installed terminal; for example, the terminal may specifically be a mobile phone, a tablet, a personal computer (PC), a smart band, a speaker, a TV, a smart watch, or another terminal.
  • During federated learning, each server can send the model to be trained to the clients that have established connections with it; a client can use locally stored training samples to train the model and feed data such as the parameters of the trained model back to the server. After the server receives one or more trained models fed back by one or more clients, it can prune the received models and aggregate the data of the pruned models to obtain aggregated data, which is equivalent to an aggregated model. After a stop condition is met, the final model can be output to complete federated learning.
  • An intermediate server (referred to as an edge server in this application) is generally introduced between the server and the clients to form a multi-layer architecture, namely a client-edge server-cloud server architecture, in which the edge server reduces the transmission delay between the client and the federated learning system.
  • The federated learning system to which the federated learning method provided in this application can be applied may include various topologies; for example, the federated learning system may include an architecture of two or more layers. Some possible architectures are illustrated below.
  • FIG. 3 is a schematic structural diagram of a federated learning system provided by this application, in which the federated learning system is a two-layer server-client architecture.
  • A server can directly establish connections with one or more clients, and the server sends the global model to the one or more clients that have established connections with it.
  • In one case, a client uses locally stored training samples to train the received global model and feeds the trained global model back to the server; the server performs pruning based on the received trained global models and updates the locally stored global model to obtain the final global model.
  • In another case, a client uses locally stored training samples to train and prune the received global model and feeds the pruned global model back to the server; the server updates the locally stored global model based on the received trained and pruned models to obtain the final global model.
  • FIG. 4 is a schematic structural diagram of another federated learning system provided by this application, in which the federated learning system includes one or more cloud servers, one or more edge servers, and one or more clients, forming a three-tier cloud server-edge server-client architecture.
  • One or more edge servers are connected to a cloud server, and one or more clients are connected to each edge server.
  • The cloud server sends the locally saved global model to the edge servers, and each edge server then sends the global model to the clients connected to it.
  • In one case, a client uses locally stored training samples to train the received global model and feeds the trained global model back to the edge server; the edge server prunes the received trained global models, uses the pruned global model to update the locally stored global model, and feeds the updated global model back to the cloud server to complete federated learning.
  • In another case, a client uses locally stored training samples to train and prune the received global model and feeds the trained and pruned global model back to the edge server; the edge server updates the locally stored global model based on the received models and feeds the updated global model back to the cloud server to complete federated learning.
  • In yet another case, a client uses locally stored training samples to train the received global model and feeds the trained global model back to the edge server; the edge server updates the locally stored global model according to the received trained global models and feeds the updated global model back to the cloud server, and the cloud server prunes the received global model to obtain the pruned global model and complete federated learning.
  • In other words, the process of model pruning can be done on the client, on the edge server, or on the cloud server.
  • The pruning process can also be performed at multiple links: for example, the client prunes the model after training and then sends it to the edge server; the edge server also prunes when aggregating the models and then sends the result to the cloud server. This is not limited here.
  • FIG. 5 is a schematic structural diagram of another federated learning system provided by this application, in which the federated learning system includes an architecture of more than three layers: one layer includes one or more cloud servers, and multiple edge servers form an architecture of two or more layers, for example with one or more upstream edge servers each connected to one or more downstream edge servers.
  • Each edge server in the last layer formed by the edge servers is connected to one or more clients, and the clients form the bottom layer of the architecture.
  • The most upstream cloud server sends the latest locally stored global model to the edge servers of the next layer, and the edge servers then pass the global model down layer by layer until it reaches the clients.
  • A client uses locally stored training samples to train and prune the received global model and feeds the trained and pruned global model back to the edge server of the layer above; that edge server updates its locally stored global model based on the received trained global models and uploads the updated global model to the edge server of the layer above it, and so on, until the second-layer edge server uploads its updated global model to the cloud server, which updates the local global model based on the received global models to obtain the final global model and complete federated learning.
  • The pruning process can be performed at any layer of the federated learning system, which is not limited here.
  • In the embodiments of the present application, the direction in which data is transmitted toward the cloud server is called upstream, and the direction in which data is transmitted toward the client is called downstream. For example, in Figure 3, the server is the upstream device of the client, and the client is the downstream device of the server.
  • Similarly, the cloud server can be called the upstream device of an edge server, and the client can be called the downstream device of an edge server, and so on.
  • Depending on whether the pruning step is performed by an upstream device (cloud server or edge server) or by a client, two situations can be distinguished, which are described below.
  • Situation one: the upstream device performs the pruning step.
  • As shown in FIG. 6, an embodiment of the federated learning method provided by this application includes steps 601 to 608.
  • Step 601: the upstream device sends the neural network model to the client.
  • Correspondingly, the client receives the neural network model sent by the upstream device.
  • The upstream device in this embodiment of the present application may be a server in the federated learning systems shown in FIGS. 2-5.
  • For example, the upstream device may be any one of the multiple servers shown in FIG. 2, any server in the two-tier architecture shown in FIG. 3, any one of the cloud servers or edge servers shown in FIG. 4, or any one of the cloud servers or edge servers shown in FIG. 5.
  • The number of clients can be one or more; if the upstream device has established connections with multiple clients, it can send the neural network model to each client.
  • The above-mentioned neural network model can be a model stored locally on the upstream device, such as a global model stored locally on the cloud server, or a model that the upstream device saved locally, or used to update its locally stored model, after receiving it from another server.
  • Specifically, the upstream device can send the structural parameters of the neural network model (such as the width, depth, or convolution kernel size of the neural network) or the initial weight parameters to the client.
  • The upstream device can also send training configuration parameters to the client, such as the learning rate, the number of epochs, or the categories in the security algorithm, so that the client can use these training configuration parameters to train the neural network model.
  • In one case, the neural network model may be a global model stored on the cloud server; the global model stored on the cloud server is referred to as the cloud-side model hereinafter.
  • In another case, the neural network model can be a local model saved on an edge server, also called an edge server model: after receiving the model issued by an upper-layer edge server or the cloud server, the edge server uses the received model as the edge server model, or updates the existing edge server model to obtain a new edge server model, and sends the new edge server model (that is, the neural network model) to the client.
  • Alternatively, the upstream device may directly deliver the edge server model (or called the neural network model) to the client.
  • the neural network model mentioned may specifically include convolutional neural networks (CNN), deep convolutional neural network (deep convolutional neural networks, DCNN), recurrent neural network (recurrent neural network, RNN) and other neural networks, the specific model to be learned can be determined according to the actual application scenario, which is not limited in this application.
  • the upstream device may actively send the neural network model to the connected client, or may send the neural network model to the client at the request of the client.
  • For example, when the upstream device is an edge server, the client can send a request message to the edge server to request to participate in federated learning; after the edge server confirms that the client is allowed to participate in federated learning, it can deliver the neural network model to the client.
  • When the upstream device is a cloud server, the client can send a request message to the cloud server to request to participate in federated learning; after receiving the request message and confirming that the client is allowed to participate in federated learning, the cloud server can send the stored cloud-side model to the edge server, and the edge server updates its local network model according to the received model to obtain the neural network model and sends the neural network model to the client.
  • Step 602: the client uses the training data as input and trains the neural network model according to the first loss function to obtain the first model.
  • After the client receives the neural network model sent by the upstream device, it can update the locally stored device-side model based on the received model, for example by replacement or weighted fusion, to obtain a new neural network model, and then train the new neural network model using the training data with label values and the first loss function, so as to obtain the first model.
  • The training data in the embodiment of the present application may have various types or forms, which are related to the scenario where the model is applied. For example, the specific form of the training data can be audio data (e.g., for a speech recognition model); if the role of the model is image classification, the specific form of the training data can be image data; if the role of the model is to predict speech, the specific form of the training data can be text data. It can be understood that the above pairings are just examples and are not necessarily a one-to-one relationship: the training data for a classification model can also be image data or text data; if the role of the model is to recognize the voice corresponding to an image, the specific form of the training data can be image data; for a movie recommendation model, the training data can be word vectors corresponding to movies; and so on.
  • The above-mentioned training data may also include data of different modalities at the same time. For example, the training data may include image/video data collected by a camera as well as voice/text data issued by a user. The specific form or type of the training data is not limited in the embodiment of the present application.
  • In the training process, the client can use locally stored training samples to train the neural network model (or the above-mentioned new neural network model) to obtain the first model.
  • For example, the client can be deployed in a mobile terminal, which can collect a large amount of data during operation; the client can use the collected data as training samples to perform personalized training on the neural network model to obtain a client-personalized model.
  • The process for a client (taking a first client as an example) to train the neural network model may specifically include: using the training data as the input of the neural network model, and training the neural network model with the goal of reducing the value of the first loss function, so as to obtain the first model. The first loss function is used to indicate the difference between the output value of the neural network model and the label value.
  • In other words, the neural network is trained using the training data and an optimization algorithm to obtain the first model. The first loss function can be understood as a data loss function (referred to as the training loss). The first loss function in the embodiment of the present application may be a mean square error loss, a cross-entropy loss, or another function that can measure the difference between the output value of the neural network model and the label value (or real value).
  • The above-mentioned optimization algorithm can be gradient descent, Newton's method, adaptive moment estimation, or another optimization algorithm that can be used in machine learning; the details are not limited here, and gradient descent is used as an example below.
  • Formula one (the gradient-descent update, reconstructed here from the surrounding definitions) can be written as $V_{n+1} = V_n - \gamma\,\nabla f(V_n; Z_{n+1})$, where $\gamma$ is the learning rate of the gradient-descent optimization algorithm (the step size of each update) and $f$ is the first loss function (its specific form may be the above-mentioned mean square error loss, cross-entropy loss, etc.).
  • The above formula one is just an example of a gradient algorithm; the gradient algorithm can also take other forms, which are not specifically limited here.
  • During training, all or part (for example, slices) of the training data may be used; when slices are used, each iteration uses one slice (batch) of data.
  • In formula two (the batch-gradient computation, reconstructed here from the surrounding definitions), the gradient can be written as $\nabla f(V_n; Z_{n+1}) = \frac{1}{|Z_{n+1}|}\sum_{(x,y)\in Z_{n+1}} \nabla_{V}\, f\big(V_n; (x,y)\big)$, where $Z_{n+1}=(x_{n+1},y_{n+1})$ denotes a set of training data, $x_{n+1}$ represents the input data for neural network model training, and $y_{n+1}$ indicates the real label (or label value) corresponding to the input data.
  • The above formula two is just an example of calculating the gradient; the gradient may also be calculated in other ways, which are not specifically limited here.
  • The first model in the embodiment of the present application may be the neural network model at any point in the above training process, or the model obtained after the value of the first loss function falls below a first threshold. In other words, the first model can be the neural network model during training, or a model obtained after training based on the client's local data set, which is not limited here.
  • Step 603: the client sends the first model to the upstream device, and correspondingly, the upstream device receives the first model sent by the client.
  • Specifically, the client sends the first model, or information about the first model such as weight parameters and gradient parameters, to the upstream device.
  • Step 604: the upstream device prunes the first model based on the second loss function and the constraint condition to obtain the second model.
  • During pruning, the substructures of the first model may be determined first, and the first model may then be pruned at the level of substructures.
  • A substructure includes at least two neurons and can be set according to actual needs: it can be a channel, a feature map, a network layer, or a subnetwork of the neural network model, or a predefined network structure composed of multiple neurons; when the neural network model is a convolutional neural network, the substructure can also be a convolution kernel.
  • In short, a substructure can be regarded as a functional whole: pruning a substructure means pruning all the neurons included in that substructure. By pruning model substructures, the model can be compressed at the level of the model structure, which facilitates underlying hardware acceleration.
  • The above-mentioned second loss function can be understood as a sparse loss function (sparse loss for short). The second loss function includes a difference term and a first sparse term, where the difference term represents the difference between the parameters of the first model and those of the second model, and the first sparse term is used to prune at least one of the multiple substructures of the first model. The constraint condition is used to constrain the accuracy of the second model to be no lower than the accuracy of the first model, where accuracy indicates the degree of difference between the model's output values and the label values.
  • Formula three (a form of the second loss function, reconstructed here from the surrounding definitions) can be written as $\|V_n - W_n\|_2^2 + \lambda \sum_i \|W_n^i\|_2$, where $n$ is the number of iterations and is a positive integer, $\|\cdot\|_2$ is the $L_2$ norm, $V_n$ are the parameters of the first model, $W_n$ are the parameters of the second model, $W_n^i$ are the parameters of the $i$-th substructure of the second model, and $\lambda$ is a hyperparameter used to adjust the weight of the first sparse term; in the embodiment of the present application the hyperparameter $\lambda$ can take any non-negative real number.
  • The above-mentioned second loss function is just an example; in practical applications, the second loss function may take other forms. For example, the $L_2$ norm in the first sparse term can be replaced by the $L_0$ norm, the $L_1$ norm, an approximation of the $L_0$ norm, an approximation of the $L_1$ norm, a mixed norm of $L_0$ and $L_p$, a mixed norm of $L_1$ and $L_p$, or any other function that can guide variable sparsity. The difference term can be replaced by any other function that measures the similarity or distance between two variables, such as the Euclidean distance, Mahalanobis distance, mutual information, cosine similarity, inner product, or a norm. The choice of the difference term and the first sparse term can be adapted to the actual application scenario and is not limited here.
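  • The following is a hedged sketch of computing a second loss function of the form above (difference term plus group-wise first sparse term), assuming the parameters of each substructure are available as separate tensors; the names and the grouping are illustrative assumptions.

```python
# Sketch of the second (sparse) loss: ||V_n - W_n||_2^2 + lam * sum_i ||W_n^i||_2,
# where each group i is one substructure (e.g., one convolution kernel/channel).
import torch

def sparse_loss(V: list[torch.Tensor], W: list[torch.Tensor], lam: float) -> torch.Tensor:
    """V, W: per-substructure parameter tensors of the first/second model."""
    diff = sum(torch.sum((v - w) ** 2) for v, w in zip(V, W))   # difference term
    sparse = sum(torch.norm(w, p=2) for w in W)                 # first sparse term (group L2)
    return diff + lam * sparse
```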
  • When the upstream device receives multiple first models, it can prune the multiple first models separately using the same sparse loss function and constraint condition, in which case the resulting second models correspond one-to-one to the first models. Alternatively, the upstream device can group the multiple first models according to substructure type (the sparse loss functions of different groups may be the same or different) and prune the first models of the same group together; or the upstream device may combine all received first models and prune them together. The pruning manner for multiple first models is not specifically limited here. The specific pruning process is described below taking one first model as an example.
  • The pruning direction of the first model in the embodiment of the present application can be understood as the descent direction of $W_n$ computed from the second loss function, i.e., the gradient direction obtained by differentiating the second loss function with respect to $W_n$; the training data direction of the first model can be understood as the descent direction of $V_n$ computed from the first loss function, i.e., the gradient direction obtained by differentiating the first loss function with respect to $V_n$.
  • The first method is to prune the first model by introducing an update coefficient $s_i$: an update coefficient is calculated based on the constraint condition and is used to adjust the direction of the first sparse term; the first sparse term of the second loss function is updated with the update coefficient to obtain a third loss function; and the first model is then pruned based on the third loss function to obtain the second model.
  • Specifically, a subspace may be determined according to the constraint condition, such that the accuracy of the second model within the subspace is the same as the accuracy of the first model.
  • A form of the third loss function (formula four, reconstructed here from the surrounding definitions) is $\|V_n - W_n\|_2^2 + \lambda \sum_i s_i \|W_n^i\|_2$, where $s_i$ is the update coefficient and the constraint condition is met by adjusting $s_i$; for the remaining symbols, refer to the foregoing description of the second loss function, and details are not repeated here.
  • The above formula four is only an example of the third loss function; in practical applications it can be set according to the description of the aforementioned second loss function and is not specifically limited here.
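  • As an illustration, the sketch below evaluates a third loss function of this form, taking the per-substructure update coefficients $s_i$ as given inputs; how they are computed from the constraint condition is only described at a high level in the surrounding text.

```python
# Sketch of the third loss (formula four): the first sparse term is rescaled per
# substructure by an update coefficient s_i, giving the second sparse term
# lam * sum_i s_i * ||W_n^i||_2. The s values are assumed to be precomputed.
import torch

def third_loss(V: list[torch.Tensor], W: list[torch.Tensor],
               s: list[float], lam: float) -> torch.Tensor:
    """V, W: per-substructure tensors; s: per-substructure update coefficients."""
    diff = sum(torch.sum((v - w) ** 2) for v, w in zip(V, W))        # difference term
    directed = sum(si * torch.norm(w, p=2) for si, w in zip(s, W))   # second sparse term
    return diff + lam * directed
```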
  • Suppose the first model parameter $V_n$ includes three substructures, namely $V_1$, $V_2$ and $V_3$. For the convenience of showing the pruning direction in FIG. 7A, $V_n$ is first grouped, assuming it is divided into two groups (which can be understood as two substructures): $V_a$ and $V_b$. Pruning the first model parameter $V_n$ can then be understood as pruning $V_a$ and/or $V_b$. FIG. 7A is described taking the pruning of $V_a$ as an example, that is, pruning the first model until $V_a$ becomes 0. The descent direction of the second loss function is the pruning direction before correction shown in FIG. 7A.
  • $E(\cdot)$ can be understood as an intra-group normalization operator. The Hessian matrix of the first loss function has many eigenvalues close to 0, and perturbing the model parameters along the directions corresponding to these eigenvalues hardly changes the accuracy of the model. $P_0$ shown in FIG. 7A represents the subspace generated by these directions (in FIG. 7A, $P_0$ is drawn as a plane as an example), in which the accuracy of the first model is the same as the accuracy of the second model. $\Pi_0$ represents the projection operator onto the subspace $P_0$, and $s_a$ is the update coefficient $s_i$ corresponding to group $a$, calculated so that the constraint condition is satisfied (for example, from the projection $\Pi_0$ of the pruning direction onto $P_0$).
  • As noted above, the substructure can be a channel, a feature map, a network layer, or a subnetwork of the neural network model, or another predefined network structure composed of multiple neurons; when the neural network model is a convolutional neural network, the substructure can also be a convolution kernel. Pruning a substructure means pruning all the neurons included in that substructure.
  • FIG. 7B is described taking a substructure that includes 2 neurons as an example; it can be understood that a substructure may include more or fewer neurons. It can be seen from FIG. 7B that the model after pruning has two fewer substructures than the model before pruning. Of course, FIG. 7B is only intended to describe the model changes before and after pruning more intuitively, and the number of pruned substructures can be one or more, which is not limited here.
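  • As a concrete illustration of substructure pruning, the sketch below zeroes one convolution kernel (one output channel) of a convolutional layer, i.e., all neurons of that substructure at once; physically removing the channel, which would also shrink the next layer's input, is what enables the hardware acceleration mentioned above. The layer sizes and the channel index are arbitrary examples.

```python
# Pruning one substructure = one convolution kernel (output channel) as a whole.
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
channel_to_prune = 5                       # index of the substructure to prune

with torch.no_grad():
    conv.weight[channel_to_prune].zero_()  # all weights of this kernel -> 0
    if conv.bias is not None:
        conv.bias[channel_to_prune] = 0.0  # and its bias
```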
  • $V_a$ can also be pruned by adjusting $V_b$ through $s_i$, so that the predicted training data direction $V'_n$ after pruning points along the corrected pruning direction; pruning the first model along this direction has little impact on the accuracy of the first model. For details, please refer to FIG. 8.
  • The second method is to correct the pruning direction of the first model based on the constraint condition to obtain a corrected pruning direction, and to prune the first model based on the corrected pruning direction, so that a better second model (i.e., the pruned model) can be obtained.
  • In this method, another form of the gradient algorithm in step 602 can be used (formula six), in which a term $g(n,\cdot)$ is added to the gradient-descent update to adjust the pruning direction, where $n$ is the number of iterations, $Z_{n+1}$ is the $n$-th set of training data, $\gamma$ is the learning rate of the gradient-descent optimization algorithm (the step size of each update), and $f$ is the first loss function (its specific form may be the above-mentioned mean square error loss, cross-entropy loss, etc.).
  • In formula seven, which defines $g(n,\cdot)$, $c$ and $\mu$ are two hyperparameters that control the strength of the sparse penalty, with $\mu\in(0.5,1]$, and $i$ represents the $i$-th substructure.
  • The above formula six and formula seven are just one example of solving for a better/optimal second model; there may be other ways in practical applications, which are not specifically limited here. For example, the above formula seven can be replaced by formula eight, where $(\cdot)_+$ indicates that only values greater than 0 are kept and values less than 0 are set to zero.
  • Based on the subspace $P_0$ determined by the constraint condition, the pruning direction of the first model can be projected into the subspace to obtain the corrected pruning direction, and the first model can be pruned according to the corrected pruning direction; this ensures that the accuracy of the first model before pruning and the accuracy of the second model after pruning are the same. As shown in FIG. 7A, the pruning direction before correction is projected onto the subspace $P_0$ to obtain the corrected pruning direction.
  • Alternatively, the pruning direction of the first model can be mirrored across the subspace to obtain the corrected pruning direction, and the first model is pruned according to the corrected pruning direction; this ensures that the accuracy of the first model before pruning and the accuracy of the second model after pruning are similar. In this case, the value of the first loss function of the second model may even be smaller than the value of the first loss function of the first model. As shown in FIG. 9, the pruning direction before correction is mirrored across the subspace $P_0$ to obtain the mirrored pruning direction (that is, the corrected pruning direction).
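  • The following numeric sketch illustrates both corrections, assuming an orthonormal basis Q for the subspace P0 (spanned by the near-zero-eigenvalue directions of the Hessian) is available; obtaining such a basis is itself non-trivial and is not shown here.

```python
# Projection onto / mirroring across the "flat" subspace P0 = span(columns of Q).
import numpy as np

def project(d: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Pi_0(d): component of the pruning direction d lying inside P0."""
    return Q @ (Q.T @ d)           # Q assumed orthonormal, shape (dim, k)

def mirror(d: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Reflect d across P0: keep the in-plane part, flip the orthogonal part."""
    p = project(d, Q)
    return 2 * p - d               # p - (d - p)
```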
  • In addition, the included angle between the training data direction of the first model and the pruning direction of the first model can be determined first. If the angle is obtuse (that is, the pruning direction opposes the data training direction, which is why the model needs to be fine-tuned after pruning in the prior art), the pruning direction of the first model is adjusted so that the included angle between the corrected pruning direction and the training data direction of the first model is acute or right. Pruning the first model according to the corrected pruning direction ensures that the accuracy of the first model before pruning and the accuracy of the second model after pruning are similar; that is, the value of the first loss function of the second model is less than or equal to the value of the first loss function of the first model.
  • In other words, the included angle between the pruning direction before correction and the data training direction of the first model may be obtuse, meaning the two directions conflict; by adjusting the pruning direction into the range of directions that form an acute or right angle with the data training direction of the first model, the subsequent steps of fine-tuning the model are reduced.
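  • A minimal sketch of this angle test follows; the correction shown (removing the component that opposes the training data direction) is one illustrative way to make the angle right rather than obtuse, not necessarily the exact adjustment used in this application.

```python
# Angle test between the pruning direction and the training-data direction.
import numpy as np

def correct_direction(prune_dir: np.ndarray, data_dir: np.ndarray) -> np.ndarray:
    if np.dot(prune_dir, data_dir) >= 0.0:        # acute or right angle: keep as-is
        return prune_dir
    u = data_dir / np.linalg.norm(data_dir)
    return prune_dir - np.dot(prune_dir, u) * u   # drop the opposing component -> right angle
```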
  • The third method: when the first model is pruned based on the second loss function and the constraint condition, the first model can also be pruned at random repeatedly until the pruned second model satisfies the constraint condition, etc., which is not limited here.
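  • A sketch of this random-pruning loop follows; evaluate() and the per-substructure pruning callables are hypothetical helpers, and the constraint is checked here as "first-loss value not increased", one way of reading the accuracy constraint.

```python
# Randomly prune substructures and keep the result only if the constraint holds.
import copy
import random

def random_prune(model, substructures, evaluate, max_tries: int = 100):
    """substructures: callables that each zero out one substructure of a model;
    evaluate: returns the first-loss value of a model on held-out data."""
    base = evaluate(model)                     # first-loss value of the first model
    for _ in range(max_tries):
        cand = copy.deepcopy(model)
        idx = random.randrange(len(substructures))
        substructures[idx](cand)               # prune one randomly chosen substructure
        if evaluate(cand) <= base:             # constraint: accuracy not reduced
            return cand                        # a valid second model
    return model                               # fall back to the unpruned model
```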
  • Step 605: the upstream device aggregates multiple second models to obtain a third model. This step is optional.
  • If the upstream device receives multiple second models, it may aggregate the multiple second models to obtain the third model and take the third model as the global model.
  • The third model can then be used as the neural network model in step 601 to repeatedly perform steps 601 to 605; steps 601 to 605 count as one iteration, and steps 601 to 605 in this embodiment of the present application may be performed multiple times.
  • In practical applications, a stop condition for steps 601 to 605 (which can also be understood as a stop condition for the pruning updates) can be set; the stop condition can be, for example, that the number of cycles or the sparsity of the pruned model reaches a certain threshold, which is not specifically limited here.
  • The above-mentioned aggregation method may be to take a weighted average of the multiple second models, or to take a plain average of the multiple second models, etc., which is not specifically limited here.
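  • A sketch of such an aggregation follows, averaging the state dicts of the second models with per-client weights (e.g., proportional to local data size, an assumption borrowed from standard federated averaging).

```python
# Weighted averaging of model parameters (step 605).
import torch

def aggregate(state_dicts: list[dict], weights: list[float]) -> dict:
    """state_dicts: parameters of the second models; weights: per-client weights."""
    total = sum(weights)
    out = {}
    for key in state_dicts[0]:
        out[key] = sum(w * sd[key].float() for sd, w in zip(state_dicts, weights)) / total
    return out   # parameters of the third (global) model
```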
  • In general, the accuracy of the aggregated third model is higher than the accuracy of the first model; if the data of the clients' models are asynchronous, the accuracy of the aggregated third model may be lower than the accuracy of the first model.
  • The upstream device may also prune the third model through the loss function and constraint condition to obtain a fourth model, and may send the fourth model to an edge server or the client. If the upstream device is an edge server, the edge server may also send the fourth model to the cloud server, so that the upper-layer server can further process the fourth model.
  • Step 606: determine whether the first preset condition is met; if yes, the training ends; if not, repeat the aforementioned steps with the third model as the neural network model. This step is optional.
  • Specifically, the upstream device may judge whether the first preset condition is met. If it is met, the training process of the model ends; afterwards, steps 607 and 608 may be performed, or the third model may be sent to an upstream edge server or cloud server.
  • If the first preset condition is not met, the third model is used as the neural network model in step 601, and steps 601 to 606 shown in FIG. 6 are executed again (which can be understood as one more iteration). That is, the upstream device sends the third model to the clients; each client uses its local data set to train the third model to obtain a fourth model and sends the fourth model to the upstream device; the upstream device receives the multiple fourth models sent by the multiple clients, prunes the multiple fourth models based on the sparse loss function and constraint condition to obtain multiple fifth models, aggregates the multiple fifth models to obtain a sixth model, and then judges whether the first preset condition is satisfied. If so, the training ends; if not, the sixth model is used as the neural network model of the first iteration (or the third model of the second iteration) and the steps shown in FIG. 6 are repeated until the first preset condition is satisfied.
  • If the first preset condition is not met, the upstream device may also send first indication information to the client indicating that the training is not over, so that the client can determine whether to continue training the model according to the first indication information.
  • The first preset condition may be, for example, that the third model converges, that the number of cycles of steps 601 to 605 reaches a threshold, or that the accuracy of the global model reaches a threshold, which is not specifically limited here.
  • Step 607: the upstream device sends the third model and second indication information to the client. This step is optional.
  • If the training ends, the upstream device sends the third model and the second indication information to the client, where the second indication information is used to indicate that the training process of the third model has ended.
  • Step 608: the client uses the third model to perform inference according to the second indication information. This step is optional.
  • The client learns from the second indication information that the training process of the third model has ended, and uses the third model for inference.
  • In this embodiment, the client uses local data to train the neural network model to obtain the first model and sends the first model to the upstream device, and the upstream device then prunes the first model according to the constraint condition. Because the upstream device considers the constraint condition during the pruning process, the accuracy of the second model after pruning is higher than or equal to that of the first model, which reduces the subsequent step of adjusting model accuracy through fine-tuning and thus improves the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
  • In addition, by pruning model substructures, the model can be compressed at the level of the model structure, which facilitates underlying hardware acceleration, reduces the size of the model, and reduces the storage and computing overhead of the client.
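  • Putting the steps together, the following sketch shows one round of the FIG. 6 flow; train, prune_with_constraint, and aggregate are placeholders for steps 601 to 605 described above, not a concrete transport or API.

```python
# One federated round where the upstream device prunes (FIG. 6 flow).
def federated_round(server_model, clients, prune_with_constraint, aggregate):
    first_models = []
    for client in clients:
        local = client.train(server_model)        # steps 601-602: deliver + local training
        first_models.append(local)                # step 603: upload the first model
    second_models = [prune_with_constraint(m) for m in first_models]  # step 604
    return aggregate(second_models)               # step 605: third (global) model
```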
  • Situation 2: the client performs the pruning step.
  • Referring to FIG. 11, another embodiment of the federated learning method provided by this application includes steps 1101 to 1108.
  • Step 1101: the upstream device sends the neural network model to the client, and correspondingly, the client receives the neural network model sent by the upstream device.
  • Step 1102: the client uses the training data as input and trains the neural network model according to the first loss function to obtain the first model.
  • Step 1101 and step 1102 in this embodiment are similar to step 601 and step 602 in the foregoing embodiment shown in FIG. 6 , and will not be repeated here.
  • Step 1103 the client prunes the first model based on the second loss function and constraints to obtain the second model.
  • Step 1103 performed by the client in this embodiment is similar to step 604 performed by the upstream device in the foregoing embodiment shown in FIG. 6 , and will not be repeated here.
  • Step 1102 and step 1103 can be regarded as one iterative process. After the client obtains the second model, it can judge whether the second preset condition is satisfied; if it is satisfied, step 1104 is executed.
  • If it is not satisfied, the obtained model is used as the neural network model in step 1102, and steps 1102 and 1103 are executed again (which can be understood as one more iteration): the training data is used as input, the model is trained according to the first loss function, and the trained model is pruned based on the constraint condition to obtain a new pruned model; it is then judged again whether the second preset condition is satisfied. If so, step 1104 is executed; if not, steps 1102 and 1103 are repeated with the newly pruned model until the second preset condition is met.
  • The second preset condition may be, for example, that the model converges, that the number of cycles of steps 1102 and 1103 reaches a threshold, or that the accuracy of the model reaches a threshold, which is not specifically limited here.
  • Alternatively, step 1103 alone can be regarded as an iterative process: after the client obtains the second model, it can judge whether the third preset condition is satisfied; if so, step 1104 is executed; if not, the obtained model is used as the neural network model in step 1102 and the process is repeated until the third preset condition is met.
  • The third preset condition may be, for example, that the model converges, that the number of cycles of step 1102 reaches a threshold, or that the accuracy of the model reaches a threshold, which is not specifically limited here.
  • Step 1104: the client sends the second model to the upstream device, and correspondingly, the upstream device receives the second model sent by the client.
  • Specifically, the client sends the second model, or information about the second model such as weight parameters and gradient parameters, to the upstream device.
  • Step 1105 the upstream device aggregates multiple second models to obtain a third model.
  • Step 1105 in this embodiment is similar to step 605 in the foregoing embodiment shown in FIG. 6 , and will not be repeated here.
  • Step 1106: determine whether the first preset condition is satisfied; if yes, the training ends; if not, repeat the aforementioned steps with the third model as the neural network model. This step is optional.
  • If the training ends, step 1107 and step 1108 may be performed, or the third model may be sent to an upstream edge server or cloud server; there is no limitation on the steps performed after the training ends.
  • If the first preset condition is not satisfied, the third model is used as the neural network model in step 1101, and the steps shown in FIG. 11 are executed again (which can be understood as one more iteration). That is, the upstream device sends the third model to the clients; each client uses its local data set to train the third model to obtain a fourth model, prunes the fourth model based on the sparse loss function and constraint condition to obtain a fifth model, and sends the fifth model to the upstream device; the upstream device receives the multiple fifth models sent by the multiple clients and aggregates them to obtain a sixth model, and then judges whether the first preset condition is satisfied. If so, the training ends; if not, the sixth model is used as the neural network model of the first iteration (or the third model of the second iteration) and the steps shown in FIG. 11 are repeated until the first preset condition is satisfied.
  • If the first preset condition is not satisfied, the upstream device may also send first indication information to the client indicating that the training is not over, so that the client can determine whether to continue training the model according to the first indication information.
  • The first preset condition may be, for example, that the third model converges, that the number of cycles of steps 1101 to 1105 reaches a threshold, or that the accuracy of the global model reaches a threshold, which is not limited here.
  • Step 1107 the upstream device sends second indication information to the client. This step is optional.
  • Step 1108 the client uses the third model to perform inference according to the second indication information. This step is optional.
  • Step 1107 and step 1108 in this embodiment are similar to steps 607 and 608 in the foregoing embodiment shown in FIG. 6 , and will not be repeated here.
  • the main difference between this embodiment and the embodiment shown in FIG. 6 is that the pruning step in the embodiment shown in FIG. 6 is performed by an upstream device, while the pruning step in this embodiment is performed by a client.
  • In this embodiment, the client uses local data to train the neural network model to obtain the first model, prunes the first model according to the constraint condition to obtain the second model, and sends the second model to the upstream device; the upstream device then aggregates the received second models to obtain the global model.
  • Because the client considers the constraint condition during the pruning process, the accuracy of the second model after pruning is higher than or equal to that of the first model, which reduces the subsequent step of adjusting model accuracy through fine-tuning and thus improves the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
  • In addition, by pruning model substructures, the model can be compressed at the level of the model structure, which facilitates underlying hardware acceleration, reduces the size of the model, and reduces the storage and computing overhead of the client.
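  • For contrast with the FIG. 6 flow, a sketch of one round of this client-side-pruning variant follows; as before, the client methods and aggregate are placeholders for steps 1102 to 1105, not a concrete API.

```python
# One federated round where the client prunes (FIG. 11 flow): only the
# already-compressed second model travels upstream, trimming uplink traffic.
def federated_round_client_prunes(server_model, clients, aggregate):
    second_models = []
    for client in clients:
        first = client.train(server_model)            # step 1102: local training
        second = client.prune_with_constraint(first)  # step 1103: local pruning
        second_models.append(second)                  # step 1104: upload second model
    return aggregate(second_models)                   # step 1105: third (global) model
```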
  • Referring to FIG. 12, the model processing method can be executed by a model processing device (such as a client), and can also be executed by a component of the model processing device (such as a processor, a chip, or a chip system); the model processing device may be a cloud server or a client. This embodiment includes steps 1201 to 1203.
  • Step 1201: acquire training data including label values.
  • The training data including label values may be stored in another device such as a server, in which case the model processing device obtains the training data through that device; the training data may also be collected by the model processing device during operation, which is not specifically limited here.
  • Step 1202: using the training data as input, train the neural network model according to the first loss function to obtain the first model.
  • Step 1202 performed by the model processing device in this embodiment is similar to step 602 performed by the client in the foregoing embodiment shown in FIG. 6, and will not be repeated here.
  • Step 1203: prune the first model based on the second loss function and the constraint condition to obtain a second model.
  • Step 1203 performed by the model processing device in this embodiment is similar to step 604 performed by the upstream device in the foregoing embodiment shown in FIG. 6, and will not be repeated here.
  • In this embodiment, the model processing device trains the neural network model using local data to obtain the first model, and then prunes the first model according to the constraint condition to obtain the second model. Because the model processing device considers the constraint condition during the pruning process, the accuracy of the second model after pruning is higher than or equal to that of the first model; this can also be understood as pruning that does not increase the training loss, which reduces the subsequent model fine-tuning process.
  • The model processing method and federated learning method in the embodiment of the present application are described above; the model processing device and upstream device in the embodiment of the present application are described below. Referring to FIG. 13, an embodiment of the model processing device in the embodiment of the present application includes:
  • An acquisition unit 1301, configured to acquire training data including label values
  • the training unit 1302 is configured to use the training data as input, train the neural network model according to the first loss function, and obtain the first model, the first model includes a plurality of substructures, and each substructure in the plurality of substructures includes at least two neurons;
  • a pruning unit 1303, configured to prune the first model based on a second loss function and a constraint condition to obtain a second model, where the second loss function is used to indicate that at least one substructure of the multiple substructures is to be pruned, the constraint condition is used to constrain the accuracy of the second model to be no lower than the accuracy of the first model, and the accuracy indicates the degree of difference between the output value of the model and the label value.
  • Optionally, the model processing device also includes:
  • a receiving unit 1304, configured to receive the neural network model sent by the upstream device
  • a sending unit 1305, configured to send the second model to an upstream device.
  • The operations performed by each unit in the model processing device are similar to the steps and related descriptions performed by the client in FIGS. 2 to 5 and FIG. 11, or by the model processing device in the embodiment shown in FIG. 12, and will not be repeated here.
  • In this embodiment, the pruning unit 1303 considers the constraint condition based on the data loss function, which is equivalent to providing a direction for the pruning of the first model, so that the accuracy of the second model obtained by pruning is not lower than the accuracy of the first model; this reduces the subsequent steps of adjusting model accuracy through fine-tuning, thereby improving the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
  • Referring to FIG. 14, an embodiment of the upstream device in the embodiment of the present application is described; the upstream device may be the aforementioned cloud server or the aforementioned edge server. The upstream device includes:
  • the sending unit 1401 is configured to send the neural network model to multiple downstream devices, the neural network model includes multiple substructures, and each substructure in the multiple substructures includes at least two neurons;
  • a receiving unit 1402 configured to receive multiple first models from multiple downstream devices, where the multiple first models are trained by neural network models;
  • a pruning unit 1403, configured to prune the multiple first models respectively based on a loss function and a constraint condition, where the loss function is used to indicate that the substructures of the multiple first models are to be pruned, and the constraint condition is used to constrain the accuracy of each first model after pruning to be no lower than its accuracy before pruning;
  • the aggregation unit 1404 is configured to aggregate the multiple pruned first models to obtain a second model.
  • each unit in the upstream device is similar to the steps and related descriptions performed by the cloud server or the edge server in the embodiments shown in FIGS. 2 to 11 , and will not be repeated here.
  • In this embodiment, the client uses local data to train the neural network model to obtain the first model and sends the first model to the upstream device, and the pruning unit 1403 prunes the first model according to the constraint condition.
  • Because the pruning unit 1403 considers the constraint condition during the pruning process, the accuracy of the first model before and after pruning is approximately equal, which can also be understood as pruning that does not increase the training loss; this reduces the subsequent steps of adjusting model accuracy through fine-tuning, thereby improving the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
  • In addition, by pruning model substructures, the pruning unit 1403 can compress the model at the level of the model structure, which facilitates underlying hardware acceleration, reduces the size of the model, and reduces the storage and computing overhead of the client.
  • Referring to FIG. 15, another embodiment of the upstream device in the embodiment of the present application is described; the upstream device may be the aforementioned cloud server or the aforementioned edge server. The upstream device includes:
  • a sending unit 1501 configured to send a neural network model to multiple downstream devices, where the neural network model includes multiple substructures, and each substructure in the multiple substructures includes at least two neurons;
  • a receiving unit 1502 configured to receive a plurality of first models from the plurality of downstream devices, the plurality of first models are trained by the neural network model;
  • an aggregation unit 1503 configured to aggregate the multiple first models to obtain a second model
  • a pruning unit 1504, configured to prune the second model based on a loss function and a constraint condition, where the loss function is used to indicate that the substructures of the second model are to be pruned, and the constraint condition is used to constrain the accuracy of the second model after pruning to be no lower than its accuracy before pruning.
  • each unit in the upstream device is similar to the steps and related descriptions performed by the cloud server or the edge server in the embodiments shown in FIGS. 2 to 11 , and will not be repeated here.
  • In this embodiment, the client uses local data to train the neural network model to obtain the first model and sends the first model to the upstream device; the aggregation unit 1503 aggregates the received first models to obtain the second model, and the pruning unit 1504 prunes the second model according to the constraint condition.
  • Because the pruning unit 1504 considers the constraint condition during the pruning process, the accuracy of the second model before and after pruning is approximately equal, which can also be understood as pruning that does not increase the training loss; this reduces the subsequent steps of adjusting model accuracy through fine-tuning, thereby improving the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
  • In addition, by pruning model substructures, the pruning unit 1504 can compress the model at the level of the model structure, which facilitates underlying hardware acceleration, reduces the size of the model, and reduces the storage and computing overhead of the client.
  • The embodiment of the present application also provides a model processing device. As shown in FIG. 16, for convenience of illustration, only the parts related to the embodiment of the present application are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiment of the present application (i.e., the steps performed by the client in the aforementioned FIGS. 2 to 11, or by the model processing device in the embodiment shown in FIG. 12, and the related descriptions).
  • The model processing device can be any terminal device, including a mobile phone, a tablet computer, etc. The following takes the model processing device being a client, and the client being a mobile phone, as an example.
  • FIG. 16 is a block diagram showing a partial structure of the mobile phone serving as the model processing device provided by the embodiment of the present application.
  • Referring to FIG. 16, the mobile phone includes components such as a radio frequency (RF) circuit 1610, a memory 1620, an input unit 1630, a display unit 1640, a sensor 1650, an audio circuit 1660, a wireless fidelity (WiFi) module 1670, a processor 1680, and a power supply 1690.
  • The RF circuit 1610 can be used for sending and receiving information, or for receiving and sending signals during a call; in particular, after receiving downlink information from a base station, it passes the information to the processor 1680 for processing, and it sends designed uplink data to the base station.
  • Generally, the RF circuit 1610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • In addition, the RF circuit 1610 may also communicate with networks and other devices via wireless communication.
  • The above-mentioned wireless communication can use any communication standard or protocol, including but not limited to the global system of mobile communication (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), e-mail, short message service (SMS), etc.
  • The memory 1620 can be used to store software programs and modules, and the processor 1680 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1620.
  • The memory 1620 can mainly include a program storage area and a data storage area, where the program storage area can store an operating system, at least one application program required by a function (such as a sound playback function, an image playback function, etc.), and the data storage area can store data created according to the use of the mobile phone (such as audio data, a phonebook, etc.).
  • In addition, the memory 1620 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 1630 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the mobile phone.
  • the input unit 1630 may include a touch panel 1631 and other input devices 1632 .
  • The touch panel 1631, also referred to as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1631 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
  • the touch panel 1631 may include two parts, a touch detection device and a touch controller.
  • Specifically, the touch detection device detects the user's touch orientation and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1680, and can receive and execute commands sent by the processor 1680.
  • the touch panel 1631 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 1630 may also include other input devices 1632 .
  • other input devices 1632 may include but not limited to one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), trackball, mouse, joystick, and the like.
  • the display unit 1640 may be used to display information input by or provided to the user and various menus of the mobile phone.
  • the display unit 1640 may include a display panel 1641.
  • the display panel 1641 may be configured in the form of a liquid crystal display (liquid crystal display, LCD) or an organic light-emitting diode (OLED).
  • Further, the touch panel 1631 can cover the display panel 1641; when the touch panel 1631 detects a touch operation on or near it, it passes the operation to the processor 1680 to determine the type of the touch event, and the processor 1680 then provides a corresponding visual output on the display panel 1641 according to the type of the touch event.
  • Although in FIG. 16 the touch panel 1631 and the display panel 1641 are shown as two independent components realizing the input and output functions of the mobile phone, in some embodiments the touch panel 1631 and the display panel 1641 can be integrated to realize the input and output functions of the mobile phone.
  • The mobile phone may also include at least one sensor 1650, such as a light sensor, a motion sensor, and other sensors.
  • Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 1641 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1641 and/or the backlight when the mobile phone is moved to the ear.
  • As one kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that identify the mobile phone's posture (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration), vibration-recognition-related functions (such as a pedometer and tapping), etc. As for other sensors that can also be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, infrared sensor, IMU, or SLAM sensor, details are not repeated here.
  • The audio circuit 1660, the speaker 1661, and the microphone 1662 can provide an audio interface between the user and the mobile phone.
  • On one hand, the audio circuit 1660 can transmit the electrical signal converted from received audio data to the speaker 1661, and the speaker 1661 converts it into a sound signal for output; on the other hand, the microphone 1662 converts a collected sound signal into an electrical signal, which the audio circuit 1660 receives and converts into audio data; after the audio data is processed by the processor 1680, it is sent, for example, to another mobile phone through the RF circuit 1610, or output to the memory 1620 for further processing.
  • WiFi is a short-distance wireless transmission technology. Through the WiFi module 1670, the mobile phone can help users send and receive e-mails, browse web pages, access streaming media, etc.; it provides users with wireless broadband Internet access.
  • Although FIG. 16 shows the WiFi module 1670, it can be understood that it is not an essential component of the mobile phone.
  • The processor 1680 is the control center of the mobile phone: it uses various interfaces and lines to connect the various parts of the entire mobile phone, and executes the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1620 and calling the data stored in the memory 1620, so as to monitor the mobile phone as a whole.
  • Optionally, the processor 1680 may include one or more processing units; preferably, the processor 1680 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may also not be integrated into the processor 1680.
  • The mobile phone also includes a power supply 1690 (such as a battery) for supplying power to the various components. Preferably, the power supply can be logically connected to the processor 1680 through a power management system, so that functions such as charging, discharging, and power consumption management are realized through the power management system.
  • the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.
  • the processor 1680 included in the mobile phone can execute the functions of the client in FIG. 2 to FIG. 5 and FIG. 11 or the model processing device in the embodiment shown in FIG. 12 , which will not be repeated here.
  • Referring to FIG. 17, the upstream device may include a processor 1701, a memory 1702, and a communication interface 1703.
  • the processor 1701, memory 1702 and communication interface 1703 are interconnected by wires.
  • program instructions and data are stored in the memory 1702 .
  • the memory 1702 stores program instructions and data corresponding to the steps executed by the cloud server or the edge server in the embodiments shown in FIGS. 2 to 6 or 11 .
  • the processor 1701 is configured to execute the steps executed by the cloud server or the edge server in the embodiments shown in FIGS. 2 to 6 or FIG. 11 .
  • the communication interface 1703 can be used to receive and send data, and is used to execute the steps related to acquisition, sending, and receiving of the cloud server or edge server in the embodiment shown in FIG. 2 to FIG. 6 or FIG. 11 .
  • the upstream device may include more or fewer components than those shown in FIG. 17 , which is only an example in the present application and not limited thereto.
  • In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are only illustrative. For example, the division of units is only a logical function division, and in actual implementation there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place, or may be distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be fully or partially realized by software, hardware, firmware or any combination thereof.
  • When the integrated units are implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.


Abstract

A model processing method is provided, which can be applied to model training and pruning scenarios. The method can be executed by a client, or by a component of the client (such as a processor, a chip, or a chip system), and includes: acquiring training data including label values (1201); using the training data as input, training a neural network model according to a first loss function to obtain a first model (1202); and pruning the first model based on a second loss function and a constraint condition to obtain a second model (1203), where the constraint condition is used to constrain the accuracy of the second model to be no lower than the accuracy of the first model. In the process of pruning the first model, considering a constraint condition based on the data loss function is equivalent to providing a direction for the pruning of the first model, so that the accuracy of the second model obtained by pruning is no lower than the accuracy of the first model; this reduces the subsequent steps of adjusting model accuracy through fine-tuning, thereby improving the efficiency of the model pruning process while ensuring the accuracy of the pruned model.

Description

A model processing method, federated learning method, and related devices
This application claims priority to Chinese Patent Application No. 202110763965.6, filed with the Chinese Patent Office on July 6, 2021 and entitled "Model processing method, federated learning method, and related devices", which is incorporated herein by reference in its entirety.
Summary of the invention
The embodiments of this application provide a model processing method and related devices, which can be used in combination with a federated learning method. In the process of pruning a first model, a constraint condition based on a data loss function is considered, which is equivalent to constraining the pruning direction of the first model, so that the accuracy of the second model obtained by pruning is no lower than the accuracy of the first model; this reduces the subsequent steps of adjusting model accuracy through fine-tuning, thereby improving the efficiency of the model pruning process while ensuring the accuracy of the pruned model.
A first aspect of this application provides a model processing method, which can be applied to model training and pruning scenarios. The method can be executed by a model processing device (for example, a client), or by a component of the client (such as a processor, a chip, or a chip system), and includes: acquiring training data including label values; using the training data as input, training a neural network model according to a first loss function to obtain a first model, where the first model includes multiple substructures and each of the multiple substructures includes at least two neurons; and pruning the first model based on a second loss function and a constraint condition to obtain a second model, where the second loss function is used to indicate that at least one of the multiple substructures is to be pruned, the constraint condition is used to constrain the accuracy of the second model to be no lower than the accuracy of the first model, and the accuracy indicates the degree of difference between the output value of the model and the label value.
The first loss function can be understood as a data loss function, mainly used to evaluate the accuracy of the model in the process of training the model with data. The second loss function can be understood as a sparse loss function, mainly used to sparsify (or prune) the model. A substructure can be a channel, a feature map, a network layer, or a subnetwork of the neural network model, or a predefined network structure composed of multiple neurons; when the neural network model is a convolutional neural network, the substructure can also be a convolution kernel. In short, a substructure can be regarded as a functional whole, and pruning a substructure means pruning all the neurons included in that substructure.
In the embodiments of this application, in the process of pruning the first model, a constraint condition based on the data loss function is considered, which is equivalent to constraining the pruning direction of the first model, so that the accuracy of the second model obtained by pruning is no lower than the accuracy of the first model, reducing the subsequent steps of adjusting model accuracy through fine-tuning. Moreover, pruning substructures is more efficient than pruning neurons one by one, so the efficiency of the model pruning process is improved while the accuracy of the pruned model is ensured, and the resulting model structure is also more compact.
Optionally, in a possible implementation of the first aspect, the constraint condition is specifically used to constrain the angle between the descent direction of the first loss function and the descent direction of the second loss function to be less than or equal to 90 degrees, where the descent direction of the first loss function can be the gradient direction obtained by differentiating the first loss function, and the descent direction of the second loss function can be the gradient direction obtained by differentiating the second loss function. In this possible implementation, keeping the angle between the two descent directions no greater than 90 degrees ensures that the accuracy of the second model after pruning does not drop compared with that of the first model before pruning, reducing the subsequent steps of fine-tuning the model's accuracy.
Optionally, in a possible implementation of the first aspect, the constraint condition is specifically used to constrain the value of the first loss function of the second model to be less than or equal to the value of the first loss function of the first model. In other words, the first model and the second model are used to make predictions on the same data, and the first loss function is then used to measure the accuracy of the two models; the smaller the value of the first loss function, the higher the accuracy of the corresponding model. In this possible implementation, accuracy is determined by comparing the value of the first loss function of the second model with that of the first model; of course, an evaluation method different from the first loss function can also be used to compare the accuracy of the first model and the second model, and this application does not limit the specific evaluation method.
Optionally, in a possible implementation of the first aspect, the second loss function includes a first sparse term, and the first sparse term is related to the weights of at least one of the multiple substructures. In this possible implementation, when pruning the first model, the first sparse term in the second loss function treats each substructure as a whole, so what is pruned is always a network structure such as a channel, convolution kernel, feature map, or network layer rather than an individual neuron, which greatly improves the efficiency of pruning and yields a more refined and lightweight model.
Optionally, in a possible implementation of the first aspect, the second loss function also includes a difference term, and the difference term indicates the difference between the first model and the second model. In this possible implementation, adding a difference term to the second loss function can, to a certain extent, constrain the degree of difference between the models before and after pruning, ensuring the similarity of the models before and after pruning and thereby the accuracy of the pruned model.
Optionally, in a possible implementation of the first aspect, the step of pruning the first model based on the second loss function and the constraint condition to obtain the second model includes: calculating an update coefficient based on the constraint condition, where the update coefficient is used to adjust the direction of the first sparse term; updating the first sparse term in the second loss function with the update coefficient to obtain a third loss function, where the third loss function includes the difference term and a second sparse term, and the second sparse term is obtained by updating the first sparse term with the update coefficient; and pruning the first model based on the third loss function to obtain the second model. In this possible implementation, introducing the update coefficient makes it possible to adjust the pruning direction so that the pruned second model satisfies the constraint condition; pruning in this case can also be understood as directional pruning.
Optionally, in a possible implementation of the first aspect, the third loss function includes (reconstructed here from the surrounding definitions):

$$\|V_n - W_n\|_2^2 + \lambda \sum_i s_i \|W_n^i\|_2$$

where $\|\cdot\|_2$ is the $L_2$ norm, $V_n$ are the parameters of the first model, $W_n$ are the parameters of the second model, $\lambda$ is a hyperparameter used to adjust the weight of the first sparse term, $s_i$ is the update coefficient, adjusted so as to satisfy the constraint condition, and $W_n^i$ are the parameters of the $i$-th substructure in the second model.
In this possible implementation, the second sparse term is obtained by updating the first sparse term with the update coefficient, and the updated second sparse term can prune the model directionally, ensuring that the accuracy of the pruned model is not lost.
可选地,在第一方面的一种可能的实现方式中,上述步骤:基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型,包括:基于第二损失函数对第一模型进行至少一次随机剪枝,直至剪枝第一模型后得到的第二模型满足约束条件。具体可以是基于第二损失函数对第一模型进行随机剪枝,以得到第二模型;若满足约束条件,则输出第二模型;若不满足约束条件,则重复基于第二损失函数对第一模型进行随机剪枝的步骤直至满足约束条件。
该种可能的实现方式中,可以通过随机剪枝加上约束条件判断的方式进行剪枝,只有满足约束条件的情况下,才输出剪枝后的模型。这种方式不需要使用数据来微调剪枝后的模型,通用性更高。
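下面给出该"随机剪枝+约束条件判断"流程的一个示意性Python草图,其中 random_prune 与 data_loss 为示例性假设的函数(分别代表一次随机剪枝和第一损失函数的计算),并非本申请限定的实现:

```python
import copy
from typing import Any, Callable

def prune_until_constraint(model_v: Any,
                           random_prune: Callable[[Any], Any],
                           data_loss: Callable[[Any], float],
                           max_trials: int = 100) -> Any:
    """基于第二损失函数对第一模型反复随机剪枝, 直至满足约束条件。

    约束条件: 剪枝后模型(第二模型)的第一损失函数的值
    小于或等于剪枝前模型(第一模型)的值。
    """
    baseline = data_loss(model_v)  # 第一模型的第一损失函数值
    for _ in range(max_trials):
        candidate = random_prune(copy.deepcopy(model_v))  # 一次随机剪枝
        if data_loss(candidate) <= baseline:  # 满足约束条件则输出第二模型
            return candidate
    return model_v  # 超过尝试上限时退回未剪枝的模型(示例性处理)
```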
可选地,在第一方面的一种可能的实现方式中,上述方法应用于联邦学习系统中的客户端,用来训练神经网络模型的数据是客户端本地的数据,例如客户端的传感器采集的数据或客户端的程序应用等运行过程中生成的数据等,该方法还包括:接收上游设备发送的神经网络模型;向上游设备发送第二模型。上游设备是可以与客户端通信的服务器等设备。
该种可能的实现方式中,该方法可以应用于联邦学习场景,一方面通过引入约束条件对模型子结构进行剪枝,可以帮助上游设备筛选出精度不损失、结构简化的且用于聚合的多个模型,减少了上行链路(即客户端到上游设备的通信链路)的通信负担。
可选地,在第一方面的一种可能的实现方式中,上述的训练数据包括:图像数据、音频数据或者文本数据等。可以理解的是,上述三种只是对训练数据的举例,在实际应用中,根据神经网络模型的所处理的任务类型不同,训练数据的具体形式不同,此处不做限定。
可选地,在第一方面的一种可能的实现方式中,上述的神经网络模型用于对图像数据进行分类和/或识别等。可以理解的是,在实际应用中,神经网络模型还可以用于目标检测、信息推荐、语音识别、文字识别、问答任务、人机游戏等,具体此处不做限定。
该种可能的实现方式中,对于应用于任何场景(例如:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等)的神经网络模型,都可以适用于本申请实施例提供的剪枝方法,有利于提升神经网络模型的剪枝效率,在减少神经网络占用的存储空间外,还可以保证神经网络模型的精度。
在一种可能的实现方式中,上述模型训练与剪枝的步骤可以是一次也可以是多次,具体根据需要设置,该情况下可以得到更加符合用户预期的模型,在保证模型精度不损失的情况下完成剪枝,节省存储和通信成本。
本申请第二方面提供了一种联邦学习方法,可以应用于模型剪枝场景,该方法可以由上游设备(云服务器或边服务器等)执行,还可以由上游设备的部件(例如处理器、芯片或芯片系统等)执行,该方法可以理解为是先约束剪枝,后聚合的操作。该方法包括:向多个下游设备发送神经网络模型,神经网络模型包括多个子结构,每个子结构包括至少两个神经元;接收来自多个下游设备的多个第一模型,多个第一模型由神经网络模型训练得到,其中,训练过程中使用的损失函数可以称为第一损失函数;基于损失函数(后续称为第二损失函数)以及约束条件对多个第一模型分别进行剪枝,其中,第二损失函数用于指示对多个第一模型的子结构进行剪枝,约束条件用于约束每个第一模型剪枝后的精度不低于剪枝前的精度;将剪枝后的多个第一模型进行聚合,以得到第二模型。
可选地,在第二方面的一种可能的实现方式中,上述步骤:基于第二损失函数以及约束条件对第一模型进行剪枝,包括:基于第二损失函数对第一模型进行至少一次随机剪枝,直至剪枝第一模型后得到的模型满足约束条件。具体可以是基于第二损失函数对第一模型进行随机剪枝,以得到剪枝后的第一模型;若满足约束条件,则输出剪枝后的第一模型;若不满足约束条件,则重复基于第二损失函数对第一模型进行随机剪枝的步骤直至满足约束条件。
本申请第三方面提供了一种联邦学习方法,可以应用于模型剪枝场景,该方法可以由服务器(云服务器或边服务器等)执行,还可以由服务器的部件(例如处理器、芯片或芯片系统等)执行,该方法可以理解为是先聚合,后约束剪枝的操作。该方法包括:向多个下游设备发送神经网络模型,神经网络模型包括多个子结构,每个子结构包括至少两个神经元;接收来自多个下游设备的多个第一模型,多个第一模型由神经网络模型训练得到,其中,训练过程中使用的损失函数可以称为第一损失函数;将多个第一模型进行聚合,以得到第二模型;基于损失函数(后续称为第二损失函数)以及约束条件对第二模型进行剪枝,其中,第二损失函数用于指示对第二模型的子结构进行剪枝,约束条件用于约束第二模型剪枝后的精度不低于剪枝前的精度。
可选地,在第三方面的一种可能的实现方式中,上述步骤:基于第二损失函数以及约束条件对第二模型进行剪枝,包括:基于第二损失函数对第二模型进行至少一次随机剪枝,直至剪枝第二模型后得到的模型满足约束条件。具体可以是基于第二损失函数对第二模型进行随机剪枝,以得到剪枝后的模型;若满足约束条件,则输出剪枝后的第二模型;若不满足约束条件,则重复基于第二损失函数对第二模型进行随机剪枝的步骤直至满足约束条件。
在第二方面/第三方面提供的实现方式中,服务器使用本申请实施例提供的方法对模型进行剪枝之后,不需要再利用训练数据来调整模型以保证模型精度,也就是说,服务器可以在不使用客户端的训练数据的情况下对模型进行剪枝,且保证模型的精度,这样就避免了剪枝时要将客户端的训练数据传输到上游设备,可以保护客户端的数据隐私。在上述第二方面/第三方面提供的实现方式中,子结构可以是神经网络模型的通道、特征图、网络层、子网络、或者预定义的由多个神经元组成的其他网络结构;当神经网络模型是卷积神经网络时,子结构还可以是卷积核。总之,一个子结构可以看作一个功能整体,在剪枝时对子结构进行剪枝,是指将该子结构包括的所有神经元都进行剪枝。本申请提供的方法将每个子结构作为一个整体进行剪枝,比对神经元逐个剪枝效率更高,得到的模型结构也更简洁。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的多个第一模型是根据第一损失函数训练得到,剪枝所使用的损失函数称为第二损失函数,约束条件具体用于约束第一损失函数的下降方向与第二损失函数的下降方向之间的夹角小于或等于90度。其中,第一损失函数的下降方向可以是对第一损失函数求导得到的梯度方向,第二损失函数的下降方向可以是对第二损失函数求导得到的梯度方向。
该种可能的实现方式中,通过调整第一损失函数的下降方向与第二损失函数的下降方向的夹角小于或等于90度,可以保证剪枝前后的第一模型的精度不会有所下降,减少了后续微调模型精度的步骤。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的约束条件具体用于约束剪枝后的模型的第一损失函数的值小于或等于剪枝前模型的第一损失函数的值。换句话说,用剪枝前后的模型对相同的数据做预测,再用第一损失函数衡量剪枝前后模型的精度,对应第一损失函数的值越小的模型的精度越高。
该种可能的实现方式中,具体精度的确定方式可以是用剪枝前后的模型的第一损失函数的值进行确定。当然,也可以使用与第一损失函数不同的评价方式来比较剪枝前后模型的精度,本申请不限定具体的评价方式。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述步骤还包括:向所述多个下游设备发送剪枝后的模型。
该种可能的实现方式中,可以应用于云服务器或边服务器进行剪枝、聚合的场景,剪枝聚合之后,向多个下游设备发送剪枝后的模型。以使下游设备使用剪枝后的模型进行推理,或者对剪枝后的模型进行再训练。将模型剪枝后再发送给下游设备,一方面可以减少通信负担,另一方面可以降低对下游设备的存储空间、处理能力的要求。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述步骤还包括:向上游设备发送剪枝后的模型。
该种可能的实现方式中,可以应用于边服务器进行剪枝、聚合的场景,剪枝聚合之后,向上游设备发送剪枝后的模型。以使上游服务器继续对模型进行聚合、剪枝等操作,以综合来自更多客户端设备的信息。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的第二损失函数包括第一稀疏项,第一稀疏项与多个子结构中的至少一个子结构的权重相关。
该种可能的实现方式中,在对模型进行剪枝时,第二损失函数中的第一稀疏项是将子结构作为一个整体进行处理,所以剪枝的时候被剪掉的都是通道、卷积核、特征图、网络层等网络结构,而不是单个的神经元,大大提高了剪枝的效率,得到的模型也更加精炼、轻量。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的第二损失函数还包括差异项,差异项指示剪枝前后模型的差异。
该种可能的实现方式中,通过在第二损失函数增加差异项,可以一定程度上约束剪枝前后的模型的差异程度,保证剪枝前后模型的相似性。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述步骤:基于第二损失函数以及约束条件对多个第一模型分别进行剪枝,包括:基于约束条件计算更新系数,更新系数用于调整第一稀疏项的方向;使用更新系数更新第二损失函数中的第一稀疏项,以得到第三损失函数,第三损失函数包括差异项与第二稀疏项,第二稀疏项基于更新系数与第一稀疏项更新得到;基于第三损失函数对模型进行剪枝。
该种可能的实现方式中,通过引入更新系数可以调整剪枝方向,从而使得剪枝后的第一模型满足约束条件。该情况下的剪枝也可以理解为是定向剪枝。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的第三损失函数包括:
$$\mathcal{L}(W_{n})=\left\|W_{n}-V_{n}\right\|_{2}^{2}+\lambda\sum_{i}s_{i}\left\|W_{n}^{(i)}\right\|_{2}$$

其中,$\|\cdot\|_{2}$ 为 $L_{2}$ 范数,$V_{n}$ 与 $W_{n}$ 分别为剪枝前后模型的参数,λ为超参数,用于调节第一稀疏项的权重;$s_{i}$ 为更新系数,通过调节 $s_{i}$ 以满足约束条件,$W_{n}^{(i)}$ 为剪枝后模型中的第i个子结构的参数。
该种可能的实现方式中,通过更新系数更新第一稀疏项得到第二稀疏项,且更新后的第二稀疏项可以对模型进行定向剪枝,保证剪枝后模型的精度不损失。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的训练数据包括:图像数据、音频数据或者文本数据等。可以理解的是,上述三种只是对训练数据的举例,在实际应用中,根据神经网络模型的输入不同,训练数据的具体形式不同,此处不做限定。
可选地,在第二方面/第三方面的一种可能的实现方式中,上述的神经网络模型用于对图像数据进行分类和/或识别等。可以理解的是,在实际应用中,神经网络模型还可以用于预测、编码、解码等,具体此处不做限定。
在一种可能的实现方式中,上述接收、剪枝、聚合、发送的步骤可以是一次也可以是多次,具体根据需要设置,该情况下可以得到更加符合用户预期的模型,在保证模型精度不损失的情况下完成剪枝,节省存储和通信成本。
本申请第四方面提供了一种模型处理设备,可以应用于模型训练与剪枝场景,该模型处理设备可以是客户端,该模型处理设备包括:获取单元,用于获取包括标签值的训练数据;训练单元,用于以训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型,第一模型包括多个子结构,每个子结构包括至少两个神经元;剪枝单元,用于基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型,第二损失函数用于指示将多个子结构中的至少一个子结构进行剪枝,约束条件用于约束第二模型的精度不低于第一模型的精度,精度指示模型的输出值与标签值之间的差异程度。其中,第一损失函数也可以理解为是数据损失函数,主要用于在使用数据训练模型的过程中评估模型的精度。第二损失函数可以理解为是稀疏损失函数,主要用于对模型进行稀疏(或称为剪枝)。
可选地,上述第四方面提供的模型处理设备的各个单元可以被配置为用于实现前述第一方面的任意可能的实现方式中的方法。
本申请第五方面提供了一种上游设备,可以应用于模型训练与剪枝场景、联邦学习场景等,该上游设备可以是联邦学习场景中的云服务器或边服务器,该上游设备包括:发送单元, 用于向多个下游设备发送神经网络模型,神经网络模型包括多个子结构,每个子结构包括至少两个神经元;接收单元,用于接收来自多个下游设备的多个第一模型,多个第一模型由神经网络模型训练得到,其中,训练过程中使用的损失函数可以称为第一损失函数;剪枝单元,用于基于损失函数(后续称为第二损失函数)以及约束条件对多个第一模型分别进行剪枝,其中,第二损失函数用于指示对多个第一模型的子结构进行剪枝,约束条件用于约束每个第一模型剪枝后的精度不低于剪枝前的精度;聚合单元,用于将剪枝后的多个第一模型进行聚合,以得到第二模型。
可选地,上述第五方面提供的上游设备的各个单元可以被配置为用于实现前述第二方面的任意可能的实现方式中的方法。
本申请第六方面提供了一种上游设备,可以应用于模型训练与剪枝场景、联邦学习场景等,该上游设备可以是联邦学习场景中的云服务器或边服务器,该上游设备包括:发送单元,用于向多个下游设备发送神经网络模型,神经网络模型包括多个子结构,每个子结构包括至少两个神经元;接收单元,用于接收来自多个下游设备的多个第一模型,多个第一模型由神经网络模型训练得到,其中,训练过程中使用的损失函数可以称为第一损失函数;聚合单元,用于将多个第一模型进行聚合,以得到第二模型;剪枝单元,用于基于损失函数(后续称为第二损失函数)以及约束条件对第二模型进行剪枝,其中,第二损失函数用于指示对第二模型的子结构进行剪枝,约束条件用于约束第二模型剪枝后的精度不低于剪枝前的精度。
可选地,上述第六方面提供的上游设备的各个单元可以被配置为用于实现前述第三方面的任意可能的实现方式中的方法。
本申请第七方面提供了一种电子设备,包括:处理器,处理器与存储器耦合,存储器用于存储程序或指令,当程序或指令被处理器执行时,使得该电子设备实现上述第一方面、第二方面、第三方面的任意可能的实现方式中的方法。
本申请第八方面提供了一种计算机可读介质,其上存储有计算机程序或指令,当计算机程序或指令在计算机上运行时,使得计算机执行前述第一方面、第二方面、第三方面的任意可能的实现方式中的方法。
本申请第九方面提供了一种计算机程序产品,该计算机程序产品在计算机上执行时,使得计算机执行前述第一方面、第二方面或第三方面的任意可能的实现方式中的方法。
上述第四方面、第五方面、第六方面、第七方面、第八方面、第九方面的任一种可能的实现方式所带来的技术效果可参考前面第一方面、第二方面、第三方面中对应的实现方式所带来的技术效果,此处不再赘述。
附图说明
图1为人工智能主体框架的一种结构示意图;
图2为本申请提供的一种联邦学习系统的架构示意图;
图3为本申请提供的另一种联邦学习系统的架构示意图;
图4为本申请提供的另一种联邦学习系统的架构示意图;
图5为本申请提供的另一种联邦学习系统的架构示意图;
图6为本申请提供的联邦学习方法的一个流程示意图;
图7A、图8-图10为本申请提供的剪枝过程中剪枝方向的几种示意图;
图7B为本申请提供的剪枝前后模型的结构示意图;
图11为本申请提供的联邦学习方法的另一个流程示意图;
图12为本申请提供的模型处理方法一个流程示意图;
图13为本申请提供的模型处理设备的一个结构示意图;
图14为本申请提供的上游设备的一个结构示意图;
图15为本申请提供的上游设备的另一个结构示意图;
图16为本申请提供的模型处理设备的另一个结构示意图;
图17为本申请提供的上游设备的另一个结构示意图。
具体实施方式
本申请实施例提供了一种模型处理方法、联邦学习方法及相关设备。在对第一模型进行剪枝的过程中,考虑基于数据损失函数的约束条件,相当于为第一模型的剪枝提供一个方向,使得剪枝得到的第二模型的精度不低于第一模型的精度,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从"智能信息链"(水平轴)和"IT价值链"(垂直轴)两个维度对上述人工智能主体框架进行阐述。其中,"智能信息链"反映从数据的获取到处理的一系列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了"数据—信息—知识—智慧"的凝练过程。"IT价值链"从人工智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(如中央处理器(central processing unit,CPU)、神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。
本申请实施例可以应用于客户端、也可以应用于云端,还可以应用于对各种应用领域的联邦学习场景中采用的机器学习模型进行训练,训练后的机器学习模型可以应用于上述各种应用领域中以实现分类、回归或其他功能,训练后的机器学习模型的处理对象可以为图像样本、离散数据样本、文本样本或语音样本等,此处不做穷举。其中机器学习模型具体可以表现为神经网络、线性模型或其他类型的机器学习模型等,对应的,组成机器学习模型的多个模块具体可以表现为神经网络模块、线性模型模块或组成其他类型的机器学习模型的模块等,此处不做穷举。在后续实施例中,仅以机器学习模型表现为神经网络为例进行说明,当机器学习模型表现为除神经网络之外的其他类型时可以类推理解,本申请实施例中不再赘述。
本申请实施例可以应用于客户端、云端、或联邦学习等,主要是对神经网络进行训练与剪枝,因此涉及了大量神经网络的相关应用。为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
1、神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以 $x_{s}$ 和截距1为输入的运算单元,该运算单元的输出可以如公式(1-1)所示:

$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)\qquad(1-1)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
2、深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层中间层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,中间层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是中间层,或者称为隐藏层。在未进行剪枝或压缩的神经网络中,层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,其每一层可以表示为线性关系表达式:$\vec{y}=\alpha(W\vec{x}+\vec{b})$,其中,$\vec{x}$ 是输入向量,$\vec{y}$ 是输出向量,$\vec{b}$ 是偏移向量或者称为偏置参数,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量 $\vec{x}$ 经过如此简单的操作得到输出向量 $\vec{y}$。由于DNN层数多,系数W和偏移向量 $\vec{b}$ 的数量也比较多。这些参数在DNN中的定义如下所述:以系数w为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为 $W_{24}^{3}$,上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。

综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为 $W_{jk}^{L}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的中间层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,"容量"也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的过程也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
3、卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
4、循环神经网络(recurrent neural network,RNN)
在传统的神经网络模型中,层与层之间是全连接的,每层内的节点之间是无连接的。但是这种普通的神经网络对于很多问题是无法解决的。比如,预测句子的下一个单词是什么,因为一个句子中前后单词并不是独立的,一般需要用到前面的单词。循环神经网络指的是一个序列当前的输出与之前的输出也有关。具体的表现形式为网络会对前面的信息进行记忆,保存在网络的内部状态中,并应用于当前输出的计算中。
5、损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义"如何比较预测值和目标值之间的差异",这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。该损失函数通常可以包括均方误差、交叉熵、对数、指数等损失函数。例如,可以使用均方误差作为损失函数,定义为

$$MSE=\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}-\hat{y}_{i}\right)^{2}$$

其中,m为一次计算所使用的样本数量,$y_{i}$ 为标签值,$\hat{y}_{i}$ 为模型的预测值。
具体可以根据实际应用场景选择具体的损失函数。
6、反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
在本申请中,客户端在进行模型训练时,即可通过损失函数或者通过BP算法来对全局模型进行训练,以得到训练后的全局模型。
7、联邦学习(federated learning,FL)
一种分布式机器学习算法,通过多个客户端,如移动设备或边缘服务器,和服务器在数据不出域的前提下,协作式完成模型训练和算法更新,以得到训练后的全局模型。可以理解为,在进行机器学习的过程中,各参与方可借助其他方数据进行联合建模。各方无需共享数据资源,即数据不出本地的情况下,进行数据联合训练,建立共享的机器学习模型。
首先,本申请实施例可以应用于模型处理设备(例如客户端、云端)或联邦学习系统。下面先对本申请提供的联邦学习系统进行介绍。
参阅图2,本申请提供的一种联邦学习系统的架构示意图。该系统(或者也可以简称为集群)中可以包括多个服务器,该多个服务器之间可以互相建立连接,即各个服务器之间也可以进行通信。每个服务器可以和一个或者多个客户端通信,客户端可以部署于各种设备中,如部署于移动终端或者服务器等,如图2中所示出的客户端1、客户端2、…、客户端N-1以及客户端N等。
具体地,服务器之间或者服务器与客户端之间,可以通过任何通信机制/通信标准的通信网络进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。具体地,该通信网络可以包括无线网络、有线网络或者无线网络与有线网络的组合等。该无线网络包括但不限于:第五代移动通信技术(5th-Generation,5G)系统,长期演进(long term evolution,LTE)系统、全球移动通信系统(global system for mobile communication,GSM)或码分多址(code division multiple access,CDMA)网络、宽带码分多址(wideband code division multiple access,WCDMA)网络、无线保真(wireless fidelity,WiFi)、蓝牙(bluetooth)、紫蜂协议(Zigbee)、射频识别技术(radio frequency identification,RFID)、远程(Long Range,Lora)无线通信、近距 离无线通信(near field communication,NFC)中的任意一种或多种的组合。该有线网络可以包括光纤通信网络或同轴电缆组成的网络等。
通常,客户端可以部署于各种服务器或者终端中,以下所提及的客户端也可以是指部署了客户端软件程序的服务器或者终端,该终端可以包括移动终端或者固定安装的终端等,例如,该终端具体可以包括手机、平板、个人计算机(personal computer,PC)、智能手环、音响、电视、智能手表或其他终端等。
在进行联邦学习时,每个服务器可以向与其建立了连接的客户端下发待训练的模型,客户端可以使用本地存储的训练样本对该模型进行训练,并将训练后的模型的参数等数据反馈至服务器,服务器在接收到一个或者多个客户端反馈的训练后的一个或者多个模型之后,可以对收到的一个或者多个模型进行剪枝,并对剪枝后的一个或者多个模型的数据进行聚合,以得到聚合后的数据,相当于聚合后的模型。在满足停止条件之后,即可输出最终的模型,完成联邦学习。
通常,为解决客户端和服务器之间距离较远而导致的传输时延大的问题,一般在服务器和客户端之间引入中间层服务器(本申请称为边服务器),形成多层架构,即客户端-边服务器-云服务器的架构,从而通过边服务器来减少客户端和联邦学习系统之间的传输时延。
具体地,本申请提供的联邦学习方法可以应用的联邦学习系统可以包括多种拓扑关系,如联邦学习系统可以包括两层或者两层以上的架构,下面对一些可能的架构进行示例性介绍。
一、两层架构
如图3所示,本申请提供的一种联邦学习系统的结构示意图。
其中,该联邦学习系统内包括服务器-客户端形成的两层架构。服务器可以直接与一个或者多个客户端直接建立连接。
在联邦学习的过程中,服务器向与其建立了连接的一个或者多个客户端下发全局模型。
一种可能实现的方式中,客户端使用本地存储的训练样本对接收到的全局模型进行训练,并将训练后的全局模型反馈至服务器,服务器基于接收到的训练后的全局模型进行剪枝,并对本地存储的全局模型进行更新,以得到最终的全局模型。
另一种可能实现的方式中,客户端使用本地存储的训练样本对接收到的全局模型进行训练与剪枝,并将剪枝后的全局模型反馈至服务器,服务器基于接收到的训练与剪枝后的全局模型对本地存储的全局模型进行更新,以得到最终的全局模型。
二、三层架构
如图4所示,本申请提供的一种联邦学习系统的结构示意图。
其中,联邦学习系统中包括了一个或多个云服务器、一个或多个边服务器以及一个或者多个客户端,形成云服务器-边服务器-客户端三层架构。
在该系统中,一个或者多个边服务器接入云服务器,一个或者多个客户端接入边服务器。
在进行联邦学习的过程中,云服务器将本地保存的全局模型下发给边服务器,然后边服务器将该全局模型下发给与其连接的客户端。
一种可能实现的方式中,客户端使用本地存储的训练样本对接收到的全局模型进行训练,并将训练后的全局模型反馈给边服务器,边服务器对接收到的训练后的全局模型进行剪枝,并用剪枝后的全局模型对本地存储的全局模型进行更新,并将边服务器更新后的全局模型再反馈至云服务器,完成联邦学习。
另一种可能实现的方式中,客户端使用本地存储的训练样本对接收到的全局模型进行训练与剪枝,并将训练与剪枝后的全局模型反馈给边服务器,边服务器根据接收到的训练后的全局模型对本地存储的全局模型进行更新,并将边服务器更新后的全局模型再反馈至云服务器,完成联邦学习。
一种可能实现的方式中,客户端使用本地存储的训练样本对接收到的全局模型进行训练,并将训练后的全局模型反馈给边服务器,边服务器根据接收到的训练后的全局模型对本地存储的全局模型进行更新,并将边服务器更新后的全局模型再反馈至云服务器,云服务器再对接收到的全局模型进行剪枝,得到剪枝后的全局模型,完成联邦学习。
模型剪枝的过程可以在客户端,也可以在边服务器,还可以在云服务器,除了上面举例的只在客户端或边服务器或云服务器进行模型剪枝的方式,在联邦学习的过程中,也可以在多个环节都进行剪枝的过程,例如客户端训练模型时进行剪枝后再发送给边服务器,边服务器聚合模型时也进行剪枝,再发给云服务器处理,具体此处不做限定。
三、三层以上架构
如图5所示,本申请提供的另一种联邦学习系统的结构示意图。
其中,该联邦学习系统中包括了三层以上内的架构,其中一层包括一个或者多个云服务器,多个边服务器形成两层或者两层以上的架构,如一个或者多个上游边服务器组成一层架构,每个上游边服务器与一个或者多个下游边服务器连接。边服务器形成的最后一层架构中的每个边服务器和一个或者多个客户端连接,从而客户端形成一层架构。
在联邦学习的过程中,最上游的云服务器将本地存储的最新的全局模型下发给下一层的边服务器,随后边服务器向下一层逐层下发全局模型,直至下发至客户端。客户端在接收到边服务器下发的全局模型之后,使用本地存储的训练样本对接收到的全局模型进行训练与剪枝,并将训练与剪枝后的全局模型反馈给上一层的边服务器,然后上一层边服务器基于接收到的训练后的全局模型对本地存储的全局模型进行更新之后,即可将更新后的全局模型上传至更上一层的边服务器,以此类推,直到第二层边服务器将更新后的全局模型上传至云服务器,云服务器基于接收到的全局模型更新本地的全局模型,以得到最终的全局模型,完成联邦学习。可以理解的是,这里仅以客户端对模型进行训练与剪枝为例进行说明,与上述三层架构类似,剪枝的过程可以在联邦学习系统中的任意一层,具体此处不做限定。
需要说明的是,在本申请中,针对联邦学习架构中的每个设备,将向云服务器传输数据的方向称为上游,将向客户端传输数据的方向称为下游,例如,如图3中所示,服务器是客户端的上游设备,客户端是服务器的下游设备,如图4所示,云服务器可以称为边服务器的上游设备,客户端可以称为边服务器的下游设备等,以此类推。
另外,对本申请实施例中的模型(例如:神经网络模型、第一模型、第二模型、第三模型等等)所应用的场景做下简单介绍,该模型可以应用于前述的智能终端、智能交通、智能医疗、自动驾驶、智慧城市等任何需要神经网络模型对文本、图像或语音等输入数据进行分类、识别、预测、推荐、翻译、编码、解码等场景。
前述对本申请提供的联邦学习系统以及模型的应用场景进行了介绍,下面对该联邦学习系统中各个设备执行的详细步骤进行介绍。
本申请实施例中,在联邦学习系统下,根据剪枝步骤是由上游设备(云服务器或边服务器)还是由客户端执行可以分为两种情况,下面分别描述:
第一种,上游设备执行剪枝步骤。
参阅图6,本申请实施例提供的联邦学习方法一个实施例,该实施例包括步骤601至步骤608。
步骤601,上游设备向客户端发送神经网络模型。相应的,客户端接收上游设备发送的神经网络模型。
本申请实施例中的上游设备可以是前述图2-图5中的联邦学习系统中的服务器。例如,该上游设备可以是如前述图2中所示出的多个服务器中的任意一个服务器,也可以是前述图3中所示出的两层架构中的任意一个服务器,也可以是如图4中所示的云服务器或者边服务器中的任意一个,还可以是如图5中所述示出的云服务器或者边服务器中的任意一个。该客户端的数量可以是一个或者多个,若上游设备与多个客户端建立了连接,则上游设备可以向每个客户端发送神经网络模型。
其中,上述的神经网络模型可以是上游设备本地存储的模型,如云服务器本地存储的全局模型,或者上游设备可以接收到其他服务器发送的模型之后,将接收到的模型保存在本地或者更新本地存储的模型。具体地,上游设备可以向客户端发送神经网络模型的结构参数(如神经网络的宽度、深度或者卷积核大小等)或者初始权重参数等,可选地,上游设备还可以向客户端发送训练配置参数,如学习率、epoch数量或者安全算法中类别等参数,以使最终进行训练的客户端可以使用该训练配置参数来对神经网络模型进行训练。
例如,当上游设备为云服务器时,该神经网络模型可以是云服务器上保存的全局模型,为便于区分,以下将云服务器上保存的全局模型称为云侧模型。
又例如,当该上游设备是边服务器时,该神经网络模型可以是边服务器上保存的本地模型,或者称为边服务器模型,在边服务器接收到上一层边服务器或者云服务器下发的模型之后,使用接收到的模型作为边服务器模型或更新已有的边服务器模型,以得到新的边服务器模型,并向客户端发送新的边服务器模型(即神经网络模型)。还需要说明的是,当上游设备是边服务器时,上游设备可以直接向客户端下发边服务器模型(或者称为神经网络模型)。
在本申请实施例中,所提及的神经网络模型,如第一模型、第二模型或者第三模型等等,具体可以包括卷积神经网络(convolutional neural networks,CNN),深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)等神经网络,具体可以根据实际应用场景确定待学习的模型,本申请对此并不作限定。
可选地,上游设备可以主动向与其连接的客户端发送神经网络模型,也可以是在客户端的请求下向客户端发送神经网络模型。例如,若上游设备为边服务器,客户端可以向边服务器发送请求消息,以请求参与联邦学习,边服务器在接收到请求消息之后,若确认允许该客户端参与联邦学习,则可以向该客户端下发神经网络模型。又例如,若上游设备为云服务器,客户端可以向云服务器发送请求消息,以请求参与联邦学习,云服务器在接收到该请求消息,并确认允许该客户端参与联邦学习,则可以将本地存储的云侧模型下发给边服务器,边服务器根据接收到的模型更新本地的网络模型得到神经网络模型,并将神经网络模型下发给客户端。
步骤602,客户端以训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型。
客户端在接收到上游设备发送的神经网络模型之后,即可基于该神经网络模型更新本地存储的端侧模型,例如通过替换、加权融合等方式得到新的神经网络模型。并使用带标签值的训练数据与第一损失函数对该新的神经网络模型进行训练,从而得到第一模型。
本申请实施例中的训练数据可以有多种类型或形式,具体与模型所应用的场景相关。例如:当模型的作用是音频识别,则训练数据的具体形式可以是音频数据等。又例如:当模型的作用是图像分类,则训练数据的具体形式可以是图像数据等。再例如:当模型的作用是预测语音,则训练数据的具体形式可以是文本数据等。可以理解的是,上述几种情况只是举例,并且并不一定是一一对应的关系,例如对于音频识别,训练数据的具体形式还可以是图像数据或文本数据等(例如:若应用于教育领域中的看图播放语音场景,则模型的作用是识别图像对应的语音,则训练数据的具体形式可以是图像数据),在实际应用中,还有其他的场景,例如:当模型应用于电影推荐场景,则训练数据可以是电影对应的词向量等。在一些应用场景,上述训练数据还可以同时包括不同模态的数据,比如在自动驾驶场景,训练数据可以包括摄像头采集的图像/视频数据,还可以包括用户发出指示的语音/文本数据等。本申请实施例中对于训练数据的具体形式或类型不做限定。
客户端可以使用本地保存的训练样本对神经网络模型(或者是上述的新的神经网络模型)进行训练,以得到第一模型。例如,客户端可以部署于移动终端中,该移动终端在运行过程中可以采集到大量数据,客户端可以将采集到的数据作为训练样本,从而对神经网络模型进行个性化的训练,以得到客户端的个性化模型。
其中,客户端(以其中一个第一客户端为例)对神经网络模型进行训练的过程具体可以包括:以训练数据作为神经网络模型的输入,以减小第一损失函数的值为目标对神经网络模型进行训练,以得到第一模型。第一损失函数用于指示神经网络模型的输出值与标签值之间的差异。进一步的,使用训练数据与优化算法对神经网络进行训练,以得到第一模型。该第一损失函数可以理解为是数据损失函数(简称训练loss)。
本申请实施例中的第一损失函数可以是均方误差损失,也可以是交叉熵损失等可以用来衡量神经网络模型输出值与标签值(或真实值)之间差异的函数。
上述的优化算法可以是梯度下降方法,也可以是牛顿法,还可以是自适应矩估计法等可用于机器学习中的优化算法,具体此处不做限定,下面以梯度算法为例进行描述。
可选地,梯度算法的一种具体形式如下:
公式一:$\hat{v}_{i}^{n}=v_{i}^{n}-\gamma\nabla f\left(v_{i}^{n}\right)$

其中,$v_{i}^{n}$ 与 $\hat{v}_{i}^{n}$ 分别表示联邦学习第n轮训练过程中更新前与更新后的神经网络模型参数,γ是梯度下降优化算法的学习率或每一步更新的步长。f为第一损失函数(具体形式可以是上述的均方误差损失、交叉熵损失等),$\nabla f$ 为f的梯度,例如对f求导得到。
可选地,上述公式一只是一种梯度算法的举例,在实际应用中,梯度算法还可以是其他 类型的公式,具体此处不做限定。
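作为公式一的一个最小示意实现,下面用NumPy手写一步梯度下降,其中的损失函数与梯度均为简化的示例性假设,并非本申请限定的实现:

```python
import numpy as np

def sgd_step(v: np.ndarray, grad: np.ndarray, gamma: float) -> np.ndarray:
    """公式一的一步更新: v_hat = v - gamma * grad_f(v)。"""
    return v - gamma * grad

# 用法示例: 对 f(v)=||v||^2/2 做一步更新, 其梯度恰为 v 本身
v = np.array([1.0, -2.0, 3.0])
print(sgd_step(v, grad=v, gamma=0.1))  # [ 0.9 -1.8  2.7]
```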
另外,在上述训练过程中,可以采用全部或部分(例如:切片)的训练数据。一般情况下会采用切片的方式,每次迭代使用一个切片(batch)的数据 $Z_{n}$ 来计算损失函数,更新梯度值:

公式二:$\nabla f\left(v_{i}^{n}\right)=\frac{1}{\left|Z_{n}\right|}\sum_{(x,y)\in Z_{n}}\nabla f\left(v_{i}^{n};x,y\right)$

其中,$Z_{n}$ 表示第i个客户端存储的数据集(即上述的训练数据)中的某一个切片的数据集合,$(x,y)$ 为 $Z_{n}$ 中的一组训练数据,x表示用于神经网络模型训练的输入数据,y表示输入数据x对应的真实标签(或标签值)。
可选地,上述公式二只是一种计算梯度的举例,在实际应用中,计算梯度还可以是其他类型的公式,具体此处不做限定。
本申请实施例中的第一模型可以是上述训练过程中的神经网络模型,也可以是第一损失函数的值小于第一阈值后的第一模型,换句话说,第一模型可以是上述训练过程中的神经网络模型,也可以是基于客户端的本地数据集训练结束后得到的模型,具体此处不做限定。
步骤603,客户端向上游设备发送第一模型。相应的,上游设备接收客户端发送的第一模型。
可选地,客户端向上游设备发送第一模型或者是第一模型的信息,例如权重参数、梯度参数等。相应的,上游设备接收客户端发送的第一模型。
步骤604,上游设备基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型。
上游设备接收客户端发送的第一模型之后,可以基于第二损失函数与约束条件对第一模型进行剪枝,以得到第二模型。
可选地,在接收到客户端发送的第一模型之后,可以先确定第一模型的子结构,再对第一模型进行子结构上的剪枝。该子结构包括至少两个神经元,且该子结构可以根据实际需要设置,子结构可以是神经网络模型的通道、特征图、网络层、子网络、或者预定义的由多个神经元组成的其他网络结构;当神经网络模型是卷积神经网络时,子结构还可以是卷积核。总之,一个子结构可以看作一个功能整体,在剪枝时对子结构进行剪枝,是指将该子结构包括的所有神经元都进行剪枝。通过对模型子结构的剪枝,能够从模型结构上对模型进行压缩,便于底层硬件加速的实现。
上述的第二损失函数可以理解为是稀疏损失函数(简称稀疏loss),该第二损失函数包括差异项与第一稀疏项,其中,差异项用于表示第一模型与第二模型之间参数的差异。第一稀疏项用于将第一模型的多个子结构中的至少一个子结构进行剪枝。约束条件用于约束第二模型的精度不低于第一模型的精度,该精度指示模型的输出值与标签值之间的差异程度。
可选地,第二损失函数的一种具体形式如下:
公式三:$\mathcal{L}(W_{n})=\left\|W_{n}-V_{n}\right\|_{2}^{2}+\lambda\sum_{i}\left\|W_{n}^{(i)}\right\|_{2}$

其中,n是迭代的次数,n为正整数,$\left\|W_{n}-V_{n}\right\|_{2}^{2}$ 为差异项,$\lambda\sum_{i}\left\|W_{n}^{(i)}\right\|_{2}$ 为第一稀疏项。$\|\cdot\|_{2}$ 为 $L_{2}$ 范数,$V_{n}$ 为第一模型的参数,$W_{n}$ 为第二模型的参数,λ为超参数,用于调节第一稀疏项的权重,本申请实施例中的超参数可以取任意非负实数。例如,当λ=0时,表示稀疏项的权重为0,即训练过程中不要求子结构稀疏,通常可以适用于第一模型较小,传输通信成本较小,无需要求第一模型子结构稀疏的场景。$W_{n}^{(i)}$ 为第二模型中第i个子结构。
可以理解的是,上述的第二损失函数只是一种举例,在实际应用中,还可以有其他形式的第二损失函数,例如,第一稀疏项中的L 2范数可以更换为L 0范数、L 1范数、L 0范数的近似、L 1范数的近似、L 0与L p混合范数、L 1与L p混合范数等可以用于引导变量稀疏性的函数。又例如,差异项可以替换为欧氏距离、马氏距离、互信息、余弦相似度、内积或者范数等其他任何衡量两个变量相似度或距离的函数。差异项与第一稀疏项的选择具体可以根据实际应用场景适配,具体此处不做限定。
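为直观说明公式三的计算方式,下面给出一个示意性的Python草图:差异项取剪枝前后参数差的 $L_{2}$ 范数平方,第一稀疏项对每个子结构的参数取组 $L_{2}$ 范数后求和。其中子结构的划分方式与λ的取值均为示例性假设,并非本申请限定的实现:

```python
import numpy as np
from typing import List

def sparse_loss(w_groups: List[np.ndarray],
                v_groups: List[np.ndarray],
                lam: float) -> float:
    """公式三的示意实现: ||W - V||_2^2 + lam * sum_i ||W^(i)||_2。

    w_groups / v_groups: 按子结构(如通道、卷积核)分组的
    剪枝后/剪枝前模型参数, 两个列表一一对应。
    """
    diff = sum(float(np.sum((w - v) ** 2))           # 差异项
               for w, v in zip(w_groups, v_groups))
    sparsity = sum(float(np.linalg.norm(w.ravel()))  # 第一稀疏项(组L2范数)
                   for w in w_groups)
    return diff + lam * sparsity

# 用法示例: 两个子结构, lam=0.1, 第二个子结构已被整组置零(剪枝)
v = [np.ones(4), np.ones(2)]
w = [np.ones(4) * 0.9, np.zeros(2)]
print(sparse_loss(w, v, lam=0.1))
```

若在每个组范数之前再乘以更新系数 $s_{i}$,即可得到下文公式四所示第三损失函数的示意计算。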
可选地,若客户端的数量为多个,即上游设备接收的第一模型的数量为多个,则上游设备可以用相同的稀疏损失函数与约束条件对多个第一模型分别进行剪枝,以得到多个第二模型,多个第二模型与多个第一模型一一对应;当然,上游设备也可以根据子结构的类型对多个第一模型进行分组,每组对应的稀疏损失函数可以相同或不同,然后将同一组的多个第一模型结合起来进行剪枝;或者,上游设备也可以将所接收到的所有第一模型结合起来一起进行剪枝。对于多个第一模型的剪枝方式,具体此处不做限定。下面以一个第一模型为例描述具体的剪枝过程。
本申请实施例中的第一模型的剪枝方向可以理解为是对第二损失函数计算得到的Wn的下降方向,第一模型的训练数据方向可以理解为是对第一损失函数计算得到Vn的下降方向。例如,对于梯度下降来说,第一模型的剪枝方向可以理解为是对第二损失函数求导得到的Wn的梯度方向,第一模型的训练数据方向可以理解为是对第一损失函数求导得到Vn的梯度方向。
本申请实施例中,基于第二损失函数与约束条件对第一模型进行剪枝的方式有多种,下面举例描述:
第一种,通过引入更新系数s i的方式对第一模型进行剪枝。
一种可能实现的方式中,基于约束条件计算更新系数,该更新系数用于调整第一稀疏项的方向。使用更新系数更新第二损失函数中的第一稀疏项,以得到第三损失函数,该第三损失函数包括差异项与第二稀疏项,第二稀疏项基于更新系数与第一稀疏项更新得到。获取第三损失函数之后,基于第三损失函数对第一模型进行剪枝,以得到第二模型。具体的,可以根据约束条件确定子空间,该子空间内的第二模型精度与第一模型精度相同。
可选地,第三损失函数的一种形式具体如下:
公式四:$\mathcal{L}(W_{n})=\left\|W_{n}-V_{n}\right\|_{2}^{2}+\lambda\sum_{i}s_{i}\left\|W_{n}^{(i)}\right\|_{2}$

其中,$s_{i}$ 为所述更新系数,通过调节 $s_{i}$ 以满足所述约束条件。其余描述可参考前述关于第二损失函数的描述,具体此处不再赘述。
可选地,上述公式四只是第三损失函数的一种举例,在实际应用中可以根据如前述第二损失函数的描述所设置,具体此处不做限定。
为了更直观的理解s i,下面结合附图进行描述。
示例性的,请参阅图7A,假设第一模型参数 $V_{n}$ 包括三个子结构,分别为 $V_{1}$、$V_{2}$ 以及 $V_{3}$,为了方便在图7A中示出剪枝方向,先对 $V_{n}$ 进行分组,假设分为两组(或者理解为2个子结构):a与b。其中,$a=\{1,2\}$,$b=\{3\}$。这样,对第一模型参数 $V_{n}$ 进行剪枝,可以理解为是对 $V_{a}$ 和/或 $V_{b}$ 进行剪枝。图7A以对 $V_{a}$ 进行剪枝为例进行描述,即对第一模型进行剪枝直至 $V_{a}$ 变为0。其中,第二损失函数的下降方向即图7A中校正前的剪枝方向。
上述举例用数学表达式表示如下:

$V_{n}:=(V_{1},V_{2},V_{3})$;

$a=\{1,2\}$,$b=\{3\}$;

$V_{a}=(V_{1},V_{2})$;$V_{b}=(V_{3})$;

(此处若干分组归一化公式在原文中以图像形式给出,未能恢复。)

其中,E()可以理解为是组内归一化算子。另外,海森(Hessian)矩阵 $\nabla^{2}f$ 有很多近似0的特征值,在这些特征值对应的方向上对模型参数加扰动几乎不会改变模型的精度,如图7A所示的P0表示这些方向生成的子空间(图7A中以P0是平面为例),该平面内第一模型的精度与第二模型的精度相同。$\Pi_{0}$ 表示投影到子空间P0的投影算子,$s_{a}$ 为a组对应的 $s_{i}$,计算方式可参考下述公式:

公式五:(原文中以图像形式给出,未能恢复。)
可以理解的是,上述计算s i的公式只是一种示例,实际应用中,还可以有其他类型的公式,具体此处不做限定。
示例性的,以第一模型包括输入层、隐藏层1-3、输出层为例,展示剪枝前的模型(即第一模型)与剪枝后的模型(即第二模型)对比图,可以参考图7B,其中,子结构可以是神经网络模型的通道、特征图、网络层、子网络、或者预定义的由多个神经元组成的其他网络结构;当神经网络模型是卷积神经网络时,子结构还可以是卷积核。总之,一个子结构可以看作一个功能整体,在剪枝时对子结构进行剪枝,是指将该子结构包括的所有神经元都进行剪枝。图7B仅以一个子结构包括2个神经元为例进行描述。可以理解的是,一个子结构可以包括更多或更少的神经元。从图7B可以看出,剪枝后的模型相较于剪枝前的模型减少了两个子结构。当然,图7B只是为了更加直观的描述剪枝前后模型的变化,剪枝子结构的数量可以是一个或多个,具体此处不做限定。
另一种可能实现的方式中,还可以通过 $s_{i}$ 调整 $V_{b}$ 对 $V_{a}$ 进行剪枝,进而使得预测的训练数据方向 $V'_{n}$ 在进行剪枝后得到的方向为矫正后的剪枝方向,基于该方向对第一模型进行剪枝,对第一模型的精度影响较小。示例性的,请参阅图8。
第二种,基于约束条件校正第一模型的剪枝方向,得到校正后的剪枝方向,并基于校正后的剪枝方向对第一模型进行剪枝。
本申请实施例中基于约束条件校正第一模型的剪枝方向的方式有多种,下面分别描述:
1、基于约束条件确定第一模型的剪枝方向。
除了上述第一种方式中引入更新系数 $s_{i}$ 之外,还可以通过下述的方式确定较优的第二模型(即剪枝后的模型):
可选地,假设一边训练一边进行剪枝,则步骤602中梯度算法的另一种形式可以如下所示:

公式六:$V_{n+1}=V_{n}-\gamma\nabla f\left(V_{n};Z_{n+1}\right)$

公式七:$W_{n+1}^{(i)}=\left(1-\frac{\gamma\,g(n,\gamma)}{\left\|V_{n+1}^{(i)}\right\|_{2}}\right)V_{n+1}^{(i)}$

其中,n为迭代次数,$Z_{n+1}$ 为第n组训练数据,且 $Z_{n+1}\in Z$,γ是梯度下降优化算法的学习率或每一步更新的步长。f为第一损失函数(具体形式可以是上述的均方误差损失、交叉熵损失等)。$\nabla f$ 为f的梯度,例如对f求导得到。其余参数的解释可以参考前述步骤602中对于梯度算法的解释,此处不再赘述。g(n,γ)为调节剪枝方向的函数,c、μ是控制稀疏惩罚强度的两个超参数,μ∈(0.5,1],i表示第i个子结构,$V_{n+1}^{(i)}$ 为 $V_{n+1}$ 中第i个子结构的参数。

换句话说,使用训练数据Z更新 $V_{n}$ 得到第一模型 $V_{n+1}$,并基于 $V_{n+1}$ 更新 $W_{n+1}$,再用 $W_{n+1}$ 替换上述梯度算法中的 $V_{n}$,从而不断对第一模型进行剪枝直至得到满足实际需要的第二模型(例如迭代次数达到阈值或第二模型精度/准确度达到阈值等)。
可选地,上述公式六与公式七只是求解较优/最优第二模型的一种举例,在实际应用中可以有其他方式,具体此处不做限定。例如:上述的公式七可以替换为下述公式八:

公式八:$W_{n+1}^{(i)}=\left(1-\frac{\gamma\,g(n,\gamma)}{\left\|V_{n+1}^{(i)}\right\|_{2}}\right)_{+}V_{n+1}^{(i)}$

其中,$(\cdot)_{+}$ 表示只取其中大于0的数,小于0的数置零。
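"一边训练一边剪枝"的过程可以用下面的示意性Python草图表示,该草图按上文重构的公式六与公式八的形式书写,属于既定假设下的示例(分组方式、梯度计算与 g(n,γ) 的取值均为示例性假设),并非本申请限定的实现:

```python
import numpy as np
from typing import Callable, List

def train_and_prune_step(w_groups: List[np.ndarray],
                         grad_groups: Callable[[List[np.ndarray]], List[np.ndarray]],
                         gamma: float, g_n: float) -> List[np.ndarray]:
    """一步"训练+剪枝": 公式六做梯度更新, 公式八做组软阈值剪枝。"""
    grads = grad_groups(w_groups)                              # 第一损失函数的梯度
    v_next = [w - gamma * g for w, g in zip(w_groups, grads)]  # 公式六
    w_next = []
    for v in v_next:
        norm = float(np.linalg.norm(v.ravel()))
        scale = max(0.0, 1.0 - gamma * g_n / norm) if norm > 0 else 0.0  # ()_+
        w_next.append(scale * v)  # 公式八: 组范数过小的子结构被整组置零(即被剪枝)
    return w_next

# 用法示例: 两个子结构, 损失 f(w)=||w||^2/2 的梯度恰为 w 本身
w = [np.array([1.0, 1.0]), np.array([0.01, 0.01])]
w = train_and_prune_step(w, grad_groups=lambda ws: ws, gamma=0.1, g_n=0.5)
print(w)  # 第二个子结构的组范数小于 gamma*g_n, 被整组置零
```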
2、确定第一模型精度与第二模型精度一致的子空间,基于子空间校正第一模型的剪枝方向。
基于约束条件确定第一模型精度与第二模型精度相同(或一致)的子空间(如上述描述的P0,此处不再赘述),并基于子空间校正第一模型的剪枝方向。
本申请实施例中,基于子空间校正第一模型的剪枝方向的方式有多种,下面分别描述:
2.1、将第一模型的剪枝方向投影到子空间得到校正后的剪枝方向。
可选地,可以将第一模型的剪枝方向投影到子空间得到校正后的剪枝方向,第一模型根据校正后的剪枝方向进行剪枝,可以保证剪枝前的第一模型精度与剪枝后的第二模型精度相同。
示例性的,如图7A所示,确定子空间P0之后,将校正前的剪枝方向投影至子空间P0,从而得到如图7A所示的校正后的剪枝方向。
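该投影校正可以用如下示意性的NumPy草图表示:假设子空间P0由一组标准正交基(矩阵P的列)张成,则校正后的剪枝方向即原方向在该子空间上的正交投影,对应文中的投影算子 $\Pi_{0}$;基的来源(例如海森矩阵近似0特征值对应的特征向量)在此作为输入假设给出:

```python
import numpy as np

def project_direction(d: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """将剪枝方向 d 投影到子空间 P0, basis 的列为 P0 的标准正交基。

    投影算子为 basis @ basis.T, 对应文中的 Π0。
    """
    return basis @ (basis.T @ d)

# 用法示例: P0 为 xy 平面, 剪枝方向含 z 分量, 投影后 z 分量被去除
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # 标准正交基
d = np.array([0.3, -0.2, 0.8])
print(project_direction(d, P))  # [ 0.3 -0.2  0. ]
```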
2.2、第一模型的剪枝方向基于子空间做镜像得到校正后的剪枝方向。
可选地,确定子空间后,可以基于子空间做镜像得到校正后的剪枝方向,第一模型根据校正后的剪枝方向进行剪枝,可以保证剪枝前的第一模型精度与剪枝后的第二模型精度相近。或者说第二模型的第一损失函数的值小于第一模型的第一损失函数的值。
示例性的,如图9所示,确定子空间P0之后,基于子空间P0对校正前的剪枝方向进行镜像处理,从而得到如图9所示镜像后的剪枝方向(即修正后的剪枝方向)。
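类似地,基于子空间做镜像可以写成如下示意性草图:先求方向在子空间上的投影,再以子空间为"镜面"翻转法向分量(同样假设子空间由标准正交基矩阵P的列张成,属于示例性假设):

```python
import numpy as np

def mirror_direction(d: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """以子空间 P0 为镜面对方向 d 做镜像: 2*P@P.T@d - d。"""
    return 2.0 * (basis @ (basis.T @ d)) - d

# 用法示例: 镜像保留子空间内分量, 翻转法向分量
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
d = np.array([0.3, -0.2, 0.8])
print(mirror_direction(d, P))  # [ 0.3 -0.2 -0.8]
```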
3、若第一模型的剪枝方向与第一模型的训练数据方向之间的夹角为钝角,调整至锐角。
可选地,在对第一模型剪枝之前,可以先确定第一模型的训练数据方向与第一模型的剪枝方向之间的夹角,若该夹角为钝角(即剪枝方向与数据训练方向相反,这也是现有技术中对模型剪枝后需要对模型进行微调的原因),则调整第一模型的剪枝方向以满足校正后的剪枝方向与第一模型的训练数据方向之间的夹角为锐角或直角。再根据校正后的剪枝方向对第一模型进行剪枝,可以保证剪枝前的第一模型精度与剪枝后的第二模型精度相近。或者说第二模型的第一损失函数的值小于或等于第一模型的第一损失函数的值。
示例性的,如图10所示,校正前的剪枝方向可能与第一模型的数据训练方向之间的夹角为钝角,即两个方向不一致。为了保证剪枝方向与数据训练方向在一个大的方向上一致,减少后续对模型进行微调的步骤。将剪枝方向调整到可以与第一模型的数据训练方向之间夹角为锐角或直角的方向范围内。
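钝角调整的一种常见做法是去掉剪枝方向中与数据训练方向相反的分量,使两者夹角变为直角;下面的草图仅为示例性说明,并非本申请限定的具体调整算法:

```python
import numpy as np

def correct_obtuse(prune_dir: np.ndarray, data_dir: np.ndarray) -> np.ndarray:
    """若剪枝方向与训练数据方向夹角为钝角, 去除相反分量。"""
    dot = float(np.dot(prune_dir, data_dir))
    if dot >= 0.0:                     # 已是锐角或直角, 无需调整
        return prune_dir
    unit = data_dir / np.linalg.norm(data_dir)
    return prune_dir - float(np.dot(prune_dir, unit)) * unit  # 调整后夹角为直角

# 用法示例: 钝角方向被调整为与训练方向正交的方向
print(correct_obtuse(np.array([-1.0, 1.0]), np.array([1.0, 0.0])))  # [0. 1.]
```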
可以理解的是,基于第二损失函数与约束条件对第一模型进行剪枝的方式有多种,上述几种只是举例说明,在实际应用中,基于第二损失函数与约束条件对第一模型进行剪枝还可以有其他方式,例如:可以重复随机优化第一模型,直至剪枝后的第二模型满足约束条件等,具体此处不做限定。
步骤605,上游设备聚合多个第二模型,得到第三模型。本步骤是可选的。
上游设备基于步骤604获取多个第二模型之后,可以聚合多个第二模型,得到第三模型。并将第三模型作为全局模型。
可选地,将第三模型作为步骤601中的神经网络模型重复执行步骤601至步骤605。换句话说,步骤601至步骤605算一次迭代,本申请实施例中的步骤601至步骤605可以执行多次。进一步的,若步骤601至步骤605循环执行,可以设置步骤601至步骤605的停止条件(也可以理解为是剪枝更新的停止条件),该停止条件可以是循环次数、剪枝后模型的稀疏程度达到某个阈值等,具体此处不做限定。
本申请实施例中,上述聚合的方式可以是求多个第二模型的加权平均,也可以是求多个第二模型的平均等,具体此处不做限定。
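加权平均聚合的一个示意性草图如下(按各客户端样本量加权,权重的设置为示例性假设,并非本申请限定的聚合方式):

```python
import numpy as np
from typing import List

def aggregate(models: List[List[np.ndarray]],
              weights: List[float]) -> List[np.ndarray]:
    """对多个第二模型逐参数做加权平均, 得到第三模型。"""
    total = float(sum(weights))
    n_params = len(models[0])
    return [sum(w * m[k] for m, w in zip(models, weights)) / total
            for k in range(n_params)]

# 用法示例: 两个客户端模型, 按样本量 100/300 加权
m1 = [np.array([1.0, 2.0])]
m2 = [np.array([3.0, 4.0])]
print(aggregate([m1, m2], weights=[100, 300]))  # [array([2.5, 3.5])]
```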
可选地,若每个客户端的模型的数据是同分布的,则聚合得到的第三模型的精度高于第一模型的精度,若每个客户端的模型的数据是非同分布的,则聚合得到的第三模型的精度可能低于第一模型的精度。
可选地,上游设备聚合多个第二模型得到第三模型之后,还可以通过损失函数与约束条件对第三模型进行剪枝,得到第四模型。若上游设备为云服务器,则上游设备可以向边服务器或客户端发送第四模型。若上游设备为边服务器,则边服务器可以向云服务发送第四模型,以便于上层服务器对第四模型进行处理。
步骤606,是否满足第一预设条件,若是,训练结束。若否,以第三模型作为神经网络模型重复执行前述步骤。本步骤是可选的。
可选地,上游设备获取第三模型之后,可以判断是否满足第一预设条件,若是(满足),则模型的训练过程结束。训练过程结束后,可以执行步骤607与步骤608,或者可以向上游的边服务器或云服务器发送第三模型。本申请实施例中对于训练结束后执行的步骤不做限定。
若否(不满足),将第三模型作为步骤601中的神经网络模型重复执行图6所示步骤601至步骤606的步骤(或者理解为一次迭代)。即上游设备向客户端发送第三模型,客户端使用本地数据集训练第三模型得到第四模型,并向上游设备发送第四模型。上游设备接收多个客户端发送的多个第四模型,上游设备基于稀疏损失函数与约束条件对多个第四模型进行剪枝,得到多个第五模型。并聚合多个第五模型以得到第六模型,再判断是否满足第一预设条件,若满足,训练结束。若不满足,将第六模型作为上述第一次迭代的神经网络模型或第二次迭代的第三模型重复执行图6所示的步骤,直至满足第一预设条件。
可选地,若不满足第一预设条件,迭代过程中,上游设备还可以向客户端发送用于指示训练未结束的第一指示信息,以便于客户端根据第一指示信息确定继续训练模型。
其中,第一预设条件可以是第三模型收敛、步骤601至步骤605的循环次数达到阈值、全局模型准确率达到阈值等,具体此处不做限定。
步骤607,上游设备向客户端发送第三模型与第二指示信息。本步骤是可选的。
可选地,若满足第一预设条件,上游设备向客户端发送第三模型与第二指示信息,该第二指示信息用于指示第三模型的训练过程结束。
步骤608,客户端根据第二指示信息使用第三模型进行推理。本步骤是可选的。
可选地,客户端接收到第三模型与第二指示信息之后,根据第二指示信息可以获知第三模型的训练过程已结束,并使用第三模型进行推理。
本实施例中,在联邦学习场景下,客户端通过本地数据对神经网络模型进行训练得到第一模型,并向上游设备发送该第一模型。上游设备再根据约束条件对第一模型进行剪枝。一方面,上游设备在剪枝过程中考虑约束条件,使得剪枝后的第二模型精度高于或等于第一模型,也可以理解为是不会增加训练损失的剪枝,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。另一方面,通过对模型子结构的剪枝,能够从模型结构上对模型进行压缩,便于底层硬件加速的实现。并且减小了模型体积,降低了客户端的存储和计算开销。
第二种,客户端执行剪枝步骤。
参阅图11,本申请实施例提供的联邦学习方法的另一个实施例,该实施例包括步骤1101至步骤1108。
步骤1101,上游设备向客户端发送神经网络模型。相应的,客户端接收上游设备发送的神经网络模型。
步骤1102,客户端以训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型。
本实施例中的步骤1101与步骤1102与前述图6所示实施例中的步骤601与步骤602类似,此处不再赘述。
步骤1103,客户端基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型。
本实施例中客户端执行的步骤1103与前述图6所示实施例中上游设备执行的步骤604类似,此处不再赘述。
可选地,可以将步骤1102与步骤1103视为一次迭代过程。客户端获取第二模型之后,可以判断是否满足第二预设条件,若是(满足),则执行步骤1104。若否(不满足),将第二模型作为步骤1102中的神经网络模型重复执行步骤1102与步骤1103(或者理解为一次迭代)。即以训练数据为输入,根据第一损失函数训练第二模型,得到训练后的第二模型。并基于约束条件对训练后的第二模型进行剪枝得到第七模型,再判断是否满足第二预设条件,若满足,执行步骤1104。若不满足,将第七模型作为上述第一次迭代的神经网络模型或第二次迭代的第二模型重复执行步骤1102与步骤1103,直至满足第二预设条件。
其中,第二预设条件可以是模型收敛、步骤1102与步骤1103的循环次数达到阈值、模型准确率达到阈值等,具体此处不做限定。
可选地,也可以将步骤1103视为一次迭代过程。客户端获取第二模型之后,可以判断是否满足第三预设条件,若是(满足),则执行步骤1104。若否(不满足),将第二模型作为步骤1103中的第一模型重复执行步骤1103,直至满足第三预设条件。
其中,第三预设条件可以是模型收敛、步骤1103的循环次数达到阈值、模型准确率达到阈值等,具体此处不做限定。
步骤1104,客户端向上游设备发送第二模型。相应的,上游设备接收客户端发送的第二模型。
可选地,客户端向上游设备发送第二模型或者是第二模型的信息,例如权重参数、梯度参数等。相应的,上游设备接收客户端发送的第二模型。
步骤1105,上游设备聚合多个第二模型,得到第三模型。
本实施例中的步骤1105与前述图6所示实施例中的步骤605类似,此处不再赘述。
步骤1106,是否满足第一预设条件,若是,训练结束。若否,以第三模型作为神经网络模型重复执行前述步骤。本步骤是可选的。
可选地,上游设备获取第三模型之后,可以判断是否满足第一预设条件,若是(满足),则训练结束。训练过程结束后,可以执行步骤1107与步骤1108,或者可以向上游的边服务器或云服务器发送第三模型。本申请实施例中对于训练结束后执行的步骤不做限定。
若否(不满足),将第三模型作为步骤1101中的神经网络模型重复执行图11所示的步骤(或者理解为一次迭代)。即上游设备向客户端发送第三模型,客户端使用本地数据集训练第三模型得到第四模型,并基于稀疏损失函数与约束条件对第四模型进行剪枝,得到第五模型。向上游设备发送第五模型。上游设备接收多个客户端发送的多个第五模型,并聚合多个第五模型以得到第六模型,再判断是否满足第一预设条件,若满足,训练结束。若不满足,将第六模型作为上述第一次迭代的神经网络模型或第二次迭代的第三模型重复执行图11所示的步骤,直至满足第一预设条件。
可选地,若不满足第一预设条件,迭代过程中,上游设备还可以向客户端发送用于指示训练未结束的第一指示信息,以便于客户端根据第一指示信息确定继续训练模型。
其中,第一预设条件可以是第三模型收敛、步骤1101至步骤1105的循环次数达到阈值、全局模型准确率达到阈值等,具体此处不做限定。
步骤1107,上游设备向客户端发送第三模型与第二指示信息。本步骤是可选的。
步骤1108,客户端根据第二指示信息使用第三模型进行推理。本步骤是可选的。
本实施例中的步骤1107、步骤1108与前述图6所示实施例中的步骤607、步骤608类似,此处不再赘述。
该实施例与前述图6所示的实施例的区别主要是,图6所示实施例中的剪枝步骤由上游设备执行,本实施例中的剪枝步骤由客户端执行。
本实施例中,在联邦学习场景下,客户端通过本地数据对神经网络模型进行训练得到第一模型,再根据约束条件对第一模型进行剪枝得到第二模型,并向上游设备发送第二模型,上游设备根据第二模型进行聚合得到全局模型。一方面,上游设备在剪枝过程中考虑约束条件,使得剪枝后的第二模型精度高于或等于第一模型,也可以理解为是不会增加训练损失的剪枝,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。另一方面,通过对模型子结构的剪枝,能够从模型结构上对模型进行压缩,便于底层硬件加速的实现。并且减小了模型体积,降低了客户端的存储和计算开销。
上述对本申请实施例中的方法应用于联邦学习场景进行了描述,下面对本申请实施例还提供的一种模型处理方法进行描述。请参阅图12,本申请实施例提供的模型处理方法的一个实施例,该方法可以由模型处理设备(例如客户端)执行,也可以由模型处理设备的部件(例如处理器、芯片、或芯片系统等)执行,其中,模型处理设备可以是云服务器或客户端,该实施例包括步骤1201至步骤1203。
步骤1201,获取包括标签值的训练数据。
本申请实施例中包括标签值的训练数据可以存储在服务器等其他设备中,模型处理设备通过服务器等其他设备获取该训练数据。也可以通过模型处理设备在运行过程中采集得到,具体此处不做限定。
步骤1202,以训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型。
本实施例中模型处理设备执行的步骤1202与前述图6所示实施例中客户端执行的步骤602类似,此处不再赘述。
步骤1203,基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型。
本实施例中模型处理设备执行的步骤1203与前述图6所示实施例中上游设备执行的步骤604类似,此处不再赘述。
本实施例中,模型处理设备通过本地数据对神经网络模型进行训练得到第一模型,再根据约束条件对第一模型进行剪枝得到第二模型。模型处理设备在剪枝过程中考虑约束条件,使得剪枝后的第二模型精度高于或等于第一模型,也可以理解为是不会增加训练损失的剪枝,减少后续模型微调的过程。
上面对本申请实施例中的模型处理方法与联邦学习方法进行了描述,下面对本申请实施例中的模型处理设备与上游设备进行描述,请参阅图13,本申请实施例中模型处理设备的一个实施例包括:
获取单元1301,用于获取包括标签值的训练数据;
训练单元1302,用于以训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型,第一模型包括多个子结构,多个子结构中的每个子结构包括至少两个神经元;
剪枝单元1303,用于基于第二损失函数以及约束条件对第一模型进行剪枝,以得到第二模型,第二损失函数用于指示将多个子结构中的至少一个子结构进行剪枝,约束条件用于约束第二模型的精度不低于第一模型的精度,精度指示模型的输出值与标签值之间的差异程度。
可选地,模型处理设备还包括:
接收单元1304,用于接收上游设备发送的神经网络模型;
发送单元1305,用于向上游设备发送第二模型。
本实施例中,模型处理设备中各单元所执行的操作与前述图2至图5或图11中客户端或图12所示实施例中模型处理设备执行的步骤、相关描述类似,此处不再赘述。
本实施例中,剪枝单元1303在对第一模型进行剪枝的过程中,考虑基于数据损失函数的约束条件,相当于为第一模型的剪枝提供一个方向,使得剪枝得到的第二模型的精度不低于第一模型的精度,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。
请参阅图14,本申请实施例中上游设备的一个实施例,该上游设备可以是前述的云服务器,也可以是前述的边服务器,该上游设备包括:
发送单元1401,用于向多个下游设备发送神经网络模型,神经网络模型包括多个子结构, 多个子结构中的每个子结构包括至少两个神经元;
接收单元1402,用于接收来自多个下游设备的多个第一模型,多个第一模型由神经网络模型训练得到;
剪枝单元1403,用于基于损失函数以及约束条件对多个第一模型分别进行剪枝,其中,损失函数用于指示对多个第一模型的子结构进行剪枝,约束条件用于约束每个第一模型剪枝后的精度不低于剪枝前的精度;
聚合单元1404,用于将剪枝后的多个第一模型进行聚合,以得到第二模型。
本实施例中,上游设备中各单元所执行的操作与前述图2至图11所示实施例中云服务器或边服务器执行的步骤、相关描述类似,此处不再赘述。
本实施例中,在联邦学习场景下,客户端通过本地数据对神经网络模型进行训练得到第一模型,并向上游设备发送该第一模型。剪枝单元1403再根据约束条件对第一模型进行剪枝。一方面,剪枝单元1403在剪枝过程中考虑约束条件,使得剪枝前后第一模型的精度近似,也可以理解为是不会增加训练损失的剪枝,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。另一方面,剪枝单元1403通过对模型子结构的剪枝,能够从模型结构上对模型进行压缩,便于底层硬件加速的实现。并且减小了模型体积,降低了客户端的存储和计算开销。
请参阅图15,本申请实施例中上游设备的另一个实施例,该上游设备可以是前述的云服务器,也可以是前述的边服务器,该上游设备包括:
发送单元1501,用于向多个下游设备发送神经网络模型,所述神经网络模型包括多个子结构,多个子结构中的每个子结构包括至少两个神经元;
接收单元1502,用于接收来自所述多个下游设备的多个第一模型,所述多个第一模型由所述神经网络模型训练得到;
聚合单元1503,用于将所述多个第一模型进行聚合,以得到第二模型;
剪枝单元1504,用于基于损失函数以及约束条件对所述第二模型进行剪枝,其中,所述损失函数用于指示对所述第二模型的所述子结构进行剪枝,所述约束条件用于约束所述第二模型剪枝后的精度不低于剪枝前的精度。
本实施例中,上游设备中各单元所执行的操作与前述图2至图11所示实施例中云服务器或边服务器执行的步骤、相关描述类似,此处不再赘述。
本实施例中,在联邦学习场景下,客户端通过本地数据对神经网络模型进行训练得到第一模型,并向上游设备发送该第一模型。剪枝单元1504再根据约束条件对第二模型进行剪枝。一方面,剪枝单元1504在剪枝过程中考虑约束条件,使得剪枝前后的第二模型精度近似,也可以理解为是不会增加训练损失的剪枝,减少后续通过微调调整模型精度的步骤,从而在保证剪枝后模型的精度的同时提升模型剪枝过程的效率。另一方面,剪枝单元1504通过对模型子结构的剪枝,能够从模型结构上对模型进行压缩,便于底层硬件加速的实现。并且减小了模型体积,降低了客户端的存储和计算开销。
本申请实施例还提供了一种模型处理设备,如图16所示,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请实施例方法部分(即前述图2至图11中客户端或图12所示实施例中模型处理设备执行的步骤与相关描述类似)。该模型处理设备可以为手机、平板电脑等任意终端设备,以模型处理设备是客户端,客户端是手机为例:
图16示出的是与本申请实施例提供的模型处理设备-手机的部分结构的框图。参考图16,手机包括:射频(radio frequency,RF)电路1610、存储器1620、输入单元1630、显示单元1640、传感器1650、音频电路1660、无线保真(wireless fidelity,WiFi)模块1670、处理器1680、以及电源1690等部件。本领域技术人员可以理解,图16中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图16对手机的各个构成部件进行具体的介绍:
RF电路1610可用于收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器1680处理;另外,将涉及上行的数据发送给基站。通常,RF电路1610包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(low noise amplifier,LNA)、双工器等。此外,RF电路1610还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(global system of mobile communication,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、长期演进(long term evolution,LTE)、电子邮件、短消息服务(short messaging service,SMS)等。
存储器1620可用于存储软件程序以及模块,处理器1680通过运行存储在存储器1620的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1620可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1620可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元1630可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元1630可包括触控面板1631以及其他输入设备1632。触控面板1631,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1631上或在触控面板1631附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板1631可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1680,并能接收处理器1680发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1631。除了触控面板1631,输入单元1630还可以包括其他输入设备1632。具体地,其他输入设备1632可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元1640可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元1640可包括显示面板1641,可选的,可以采用液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)等形式来配置显示面板1641。进一步的,触控面板1631可覆盖显示面板1641,当触控面板1631检测到在其上或附近的触摸操作后,传送给处理器1680以确定触摸事件的类型,随后处理器1680根据触摸事件的类型在显示面板1641上提供相应的视觉输出。虽然在图16中,触控面板1631与显示面板1641是作为两个独立的部件来实现手机的输入和输出功能,但是在某些实施例中,可以将触控面板1631与显示面板1641集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器1650,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1641的亮度,接近传感器可在手机移动到耳边时,关闭显示面板1641和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线、IMU、SLAM传感器等其他传感器,在此不再赘述。
音频电路1660、扬声器1661,传声器1662可提供用户与手机之间的音频接口。音频电路1660可将接收到的音频数据转换为电信号,传输到扬声器1661,由扬声器1661转换为声音信号输出;另一方面,传声器1662将收集的声音信号转换为电信号,由音频电路1660接收后转换为音频数据,再将音频数据输出至处理器1680处理后,经RF电路1610以发送给比如另一手机,或者将音频数据输出至存储器1620以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块1670可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图16示出了WiFi模块1670,但是可以理解的是,其并不属于手机的必须构成。
处理器1680是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1620内的软件程序和/或模块,以及调用存储在存储器1620内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1680可包括一个或多个处理单元;优选的,处理器1680可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1680中。
手机还包括给各个部件供电的电源1690(比如电池),优选的,电源可以通过电源管理系统与处理器1680逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本申请实施例中,该手机所包括的处理器1680可以执行前述图2至图5、图11中客户端或图12所示实施例中模型处理设备的功能,此处不再赘述。
参阅图17,本申请提供的另一种上游设备的结构示意图。该上游设备可以包括处理器1701、存储器1702和通信接口1703。该处理器1701、存储器1702和通信接口1703通过线路互联。其中,存储器1702中存储有程序指令和数据。
存储器1702中存储了前述图2至图6或图11所示实施例中云服务器或边服务器执行的步骤对应的程序指令以及数据。
处理器1701,用于执行前述图2至图6或图11所示实施例中云服务器或边服务器执行的步骤。
通信接口1703可以用于进行数据的接收和发送,用于执行前述图2至图6或图11所示实施例中云服务器或边服务器与获取、发送、接收相关的步骤。
一种实现方式中,上游设备可以包括相对于图17更多或更少的部件,本申请对此仅仅是示例性说明,并不作限定。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。
当使用软件实现所述集成的单元时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。

Claims (24)

  1. 一种模型处理方法,其特征在于,所述方法包括:
    获取包括标签值的训练数据;
    以所述训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型,所述第一模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    基于第二损失函数以及约束条件对所述第一模型进行剪枝,以得到第二模型,所述第二损失函数用于指示将所述多个子结构中的至少一个子结构进行剪枝,所述约束条件用于约束所述第二模型的精度不低于所述第一模型的精度,所述精度指示模型的输出值与所述标签值之间的差异程度。
  2. 根据权利要求1所述的方法,其特征在于,所述约束条件具体用于约束所述第一损失函数的下降方向与所述第二损失函数的下降方向之间的夹角小于或等于90度。
  3. 根据权利要求1所述的方法,其特征在于,所述约束条件具体用于约束所述第二模型的所述第一损失函数的值小于或等于所述第一模型的所述第一损失函数的值。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述第二损失函数包括第一稀疏项,所述第一稀疏项与所述多个子结构中的至少一个子结构的权重相关。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述基于第二损失函数以及约束条件对所述第一模型进行剪枝,以得到第二模型,包括:
    基于所述第二损失函数对所述第一模型进行至少一次随机剪枝,直至剪枝所述第一模型后得到的所述第二模型满足所述约束条件。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法应用于客户端,所述训练数据是所述客户端本地的数据,所述方法还包括:
    接收上游设备发送的所述神经网络模型;
    向所述上游设备发送所述第二模型。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述训练数据包括:图像数据,音频数据或者文本数据。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述神经网络模型用于对图像数据进行分类和/或识别。
  9. 一种联邦学习方法,其特征在于,所述方法包括:
    向多个下游设备发送神经网络模型,所述神经网络模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    接收来自所述多个下游设备的多个第一模型,所述多个第一模型由所述神经网络模型训练得到;
    基于损失函数以及约束条件对所述多个第一模型分别进行剪枝,其中,所述损失函数用于指示对所述多个第一模型的所述子结构进行剪枝,所述约束条件用于约束每个所述第一模型剪枝后的精度不低于剪枝前的精度;
    将剪枝后的所述多个第一模型进行聚合,以得到第二模型。
  10. 一种联邦学习方法,其特征在于,所述方法包括:
    向多个下游设备发送神经网络模型,所述神经网络模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    接收来自所述多个下游设备的多个第一模型,所述多个第一模型由所述神经网络模型训练得到;
    将所述多个第一模型进行聚合,以得到第二模型;
    基于损失函数以及约束条件对所述第二模型进行剪枝,其中,所述损失函数用于指示对所述第二模型的所述子结构进行剪枝,所述约束条件用于约束所述第二模型剪枝后的精度不低于剪枝前的精度。
  11. 一种模型处理设备,其特征在于,所述设备包括:
    获取单元,用于获取包括标签值的训练数据;
    训练单元,用于以所述训练数据为输入,根据第一损失函数训练神经网络模型,得到第一模型,所述第一模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    剪枝单元,用于基于第二损失函数以及约束条件对所述第一模型进行剪枝,以得到第二模型,所述第二损失函数用于指示将所述多个子结构中的至少一个子结构进行剪枝,所述约束条件用于约束所述第二模型的精度不低于所述第一模型的精度,所述精度指示模型的输出值与所述标签值之间的差异程度。
  12. 根据权利要求11所述的设备,其特征在于,所述约束条件具体用于约束所述第一损失函数的下降方向与所述第二损失函数的下降方向之间的夹角小于或等于90度。
  13. 根据权利要求11所述的设备,其特征在于,所述约束条件具体用于约束所述第二模型的所述第一损失函数的值小于或等于所述第一模型的所述第一损失函数的值。
  14. 根据权利要求11至13中任一项所述的设备,其特征在于,所述第二损失函数包括第一稀疏项,所述第一稀疏项与所述多个子结构中的至少一个子结构的权重相关。
  15. 根据权利要求11至14中任一项所述的设备,其特征在于,所述剪枝单元,具体用于基于所述第二损失函数对所述第一模型进行至少一次随机剪枝,直至剪枝所述第一模型后得到的所述第二模型满足所述约束条件。
  16. 根据权利要求11至15中任一项所述的设备,其特征在于,所述模型处理设备为客户端,所述训练数据是所述客户端本地的数据,所述模型处理设备还包括:
    接收单元,用于接收上游设备发送的所述神经网络模型;
    发送单元,用于向所述上游设备发送所述第二模型。
  17. 根据权利要求11至16中任一项所述的设备,其特征在于,所述训练数据包括:图像数据,音频数据或者文本数据。
  18. 根据权利要求11至17中任一项所述的设备,其特征在于,所述神经网络模型用于对图像数据进行分类和/或识别。
  19. 一种上游设备,其特征在于,所述上游设备应用于联邦学习方法,所述上游设备包括:
    发送单元,用于向多个下游设备发送神经网络模型,所述神经网络模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    接收单元,用于接收来自所述多个下游设备的多个第一模型,所述多个第一模型由所述神经网络模型训练得到;
    剪枝单元,用于基于损失函数以及约束条件对所述多个第一模型分别进行剪枝,其中,所述损失函数用于指示对所述多个第一模型的所述子结构进行剪枝,所述约束条件用于约束每个所述第一模型剪枝后的精度不低于剪枝前的精度;
    聚合单元,用于将剪枝后的所述多个第一模型进行聚合,以得到第二模型。
  20. 一种上游设备,其特征在于,所述上游设备应用于联邦学习方法,所述上游设备包括:
    发送单元,用于向多个下游设备发送神经网络模型,所述神经网络模型包括多个子结构,所述多个子结构中的每个子结构包括至少两个神经元;
    接收单元,用于接收来自所述多个下游设备的多个第一模型,所述多个第一模型由所述神经网络模型训练得到;
    聚合单元,用于将所述多个第一模型进行聚合,以得到第二模型;
    剪枝单元,用于基于损失函数以及约束条件对所述第二模型进行剪枝,其中,所述损失函数用于指示对所述第二模型的所述子结构进行剪枝,所述约束条件用于约束所述第二模型剪枝后的精度不低于剪枝前的精度。
  21. 一种电子设备,其特征在于,包括:处理器,所述处理器与存储器耦合,所述存储器用于存储程序或指令,当所述程序或指令被所述处理器执行时,使得所述电子设备执行如权利要求1至8中任一项所述的方法。
  22. 一种电子设备,其特征在于,包括:处理器,所述处理器与存储器耦合,所述存储器用于存储程序或指令,当所述程序或指令被所述处理器执行时,使得所述电子设备执行如权利要求9或10所述的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,所述指令在计算机上执行时,使得所述计算机执行如权利要求1至10中任一项所述的方法。
  24. 一种计算机程序产品,其特征在于,所述计算机程序产品在计算机上执行时,使得所述计算机执行如权利要求1至10中任一项所述的方法。
PCT/CN2022/100682 2021-07-06 2022-06-23 一种模型处理方法、联邦学习方法及相关设备 WO2023279975A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110763965.6A CN113469340A (zh) 2021-07-06 2021-07-06 一种模型处理方法、联邦学习方法及相关设备
CN202110763965.6 2021-07-06

Publications (1)

Publication Number Publication Date
WO2023279975A1 true WO2023279975A1 (zh) 2023-01-12

Family

ID=77878843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100682 WO2023279975A1 (zh) 2021-07-06 2022-06-23 一种模型处理方法、联邦学习方法及相关设备

Country Status (2)

Country Link
CN (1) CN113469340A (zh)
WO (1) WO2023279975A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469340A (zh) * 2021-07-06 2021-10-01 华为技术有限公司 一种模型处理方法、联邦学习方法及相关设备
CN114580632A (zh) * 2022-03-07 2022-06-03 腾讯科技(深圳)有限公司 模型优化方法和装置、计算设备及存储介质
CN114492847B (zh) * 2022-04-18 2022-06-24 奥罗科技(天津)有限公司 一种高效个性化联邦学习系统和方法
CN115170917B (zh) * 2022-06-20 2023-11-07 美的集团(上海)有限公司 图像处理方法、电子设备及存储介质
CN115115064B (zh) * 2022-07-11 2023-09-05 山东大学 一种半异步联邦学习方法及系统
CN118012596A (zh) * 2022-10-29 2024-05-10 华为技术有限公司 一种联邦学习方法及装置
CN118101501B (zh) * 2024-04-23 2024-07-05 山东大学 一种工业物联网异构联邦学习的通信方法和系统


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046915A1 (en) * 2016-08-12 2018-02-15 Beijing Deephi Intelligence Technology Co., Ltd. Compression of deep neural networks with proper use of mask
CN110874550A (zh) * 2018-08-31 2020-03-10 华为技术有限公司 数据处理方法、装置、设备和系统
CN112101487A (zh) * 2020-11-17 2020-12-18 深圳感臻科技有限公司 一种细粒度识别模型的压缩方法和设备
CN112396179A (zh) * 2020-11-20 2021-02-23 浙江工业大学 一种基于通道梯度剪枝的柔性深度学习网络模型压缩方法
CN113065636A (zh) * 2021-02-27 2021-07-02 华为技术有限公司 一种卷积神经网络的剪枝处理方法、数据处理方法及设备
CN113469340A (zh) * 2021-07-06 2021-10-01 华为技术有限公司 一种模型处理方法、联邦学习方法及相关设备

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116148193A (zh) * 2023-04-18 2023-05-23 天津中科谱光信息技术有限公司 水质监测方法、装置、设备及存储介质
CN116148193B (zh) * 2023-04-18 2023-07-18 天津中科谱光信息技术有限公司 水质监测方法、装置、设备及存储介质
CN116484922A (zh) * 2023-04-23 2023-07-25 深圳大学 一种联邦学习方法、系统、设备及存储介质
CN116484922B (zh) * 2023-04-23 2024-02-06 深圳大学 一种联邦学习方法、系统、设备及存储介质
CN116797829A (zh) * 2023-06-13 2023-09-22 北京百度网讯科技有限公司 一种模型生成方法、图像分类方法、装置、设备及介质
CN117910536A (zh) * 2024-03-19 2024-04-19 浪潮电子信息产业股份有限公司 文本生成方法及其模型梯度剪枝方法、装置、设备、介质
CN117910536B (zh) * 2024-03-19 2024-06-07 浪潮电子信息产业股份有限公司 文本生成方法及其模型梯度剪枝方法、装置、设备、介质

Also Published As

Publication number Publication date
CN113469340A (zh) 2021-10-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22836729

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22836729

Country of ref document: EP

Kind code of ref document: A1