WO2022111403A1 - Machine learning method, device, and system - Google Patents

Machine learning method, device, and system Download PDF

Info

Publication number
WO2022111403A1
Authority
WO
WIPO (PCT)
Prior art keywords: fusion, parameters, local, node, parameter
Prior art date
Application number
PCT/CN2021/132048
Other languages
French (fr)
Chinese (zh)
Inventor
张朝阳
杨禹志
于天航
李榕
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022111403A1 publication Critical patent/WO2022111403A1/en

Classifications

    • G06N3/045 Combinations of networks
    • G06F18/00 Pattern recognition
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N20/00 Machine learning
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V30/32 Digital ink

Definitions

  • the present application relates to the technical field of machine learning, and in particular, to a machine learning method, apparatus and system.
  • in the field of artificial intelligence (AI), computing nodes (such as base stations) can perform machine learning based on data collected by each local node (such as mobile phones).
  • the base station can deliver the model that meets the preset conditions obtained through machine learning to each mobile phone, so that each mobile phone can adjust its own communication process according to the model and realize intelligent communication.
  • each local node needs to upload all the data collected by it to the computing node, so that the computing node can perform training based on the data.
  • the amount of data being transferred can be very large. Since each local node needs to transmit data to the computing node, the pressure of data transmission between the local node and the computing node is relatively high.
  • since the data is directly transmitted by the local node to the central node, it may include content related to user privacy, which also makes that private information insecure.
  • the embodiments of the present application provide a machine learning method, device and system, which can significantly reduce the amount of data transmission in the machine learning process, reducing the proportion of time spent on data transmission in the entire learning process and thereby effectively improving the efficiency of machine learning.
  • in a first aspect, a machine learning method is provided. The method is applied to a sub-node in which a binary neural network (BNN) model is set, and includes: the sub-node performs BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; the sub-node sends a first message to the central node, where the first message includes the local model parameters.
  • a scheme combining BNN with a distributed machine learning architecture is provided.
  • the BNN-based binarized machine learning can be performed locally, and the parameters of the local neural network model obtained thereby can be binarized parameters.
  • sending the binarized model parameters to the central node can significantly reduce the required data transmission bandwidth compared with directly sending a high-precision neural network model or high-precision model parameters to the central node, so the transmission time is correspondingly reduced. It is understandable that the child nodes do not perform machine learning while data is being transmitted. Therefore, reducing the time consumed by data transmission can significantly increase the proportion of time spent on machine learning in the entire learning process, thereby improving learning efficiency.
  • the local model parameters included in the first message are binarized local model parameters. Based on this scheme, the form of the local model parameters during transmission is constrained.
  • the transmitted local model parameters are binarized parameters, that is, they contain only +1 and -1 values, so each parameter occupies 1 bit of transmission bandwidth. Under the traditional FL architecture, by contrast, high-precision data must be transmitted, and a single high-precision element may require 16 bits or more of transmission bandwidth, so a model parameter matrix containing many such elements consumes far more transmission resources. Therefore, the solution provided by this example can effectively reduce the demand for data transmission resources and shorten the transmission time, thereby improving the learning efficiency of the system, as illustrated by the sketch below.
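  • To make the bandwidth comparison above concrete, the following Python sketch (an illustration only; the helper names are assumptions, not terms from the patent) packs a vector of ±1 parameters into one bit per element and compares the payload size with a 32-bit floating-point representation of the same vector.

```python
import numpy as np

def pack_binary_params(params: np.ndarray) -> bytes:
    """Pack a vector of +1/-1 parameters into 1 bit per element."""
    bits = (params > 0).astype(np.uint8)                  # +1 -> 1, -1 -> 0
    return np.packbits(bits).tobytes()

def unpack_binary_params(payload: bytes, n: int) -> np.ndarray:
    """Recover the +1/-1 vector from the packed payload."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))[:n]
    return bits.astype(np.float32) * 2.0 - 1.0            # 1 -> +1, 0 -> -1

if __name__ == "__main__":
    w = np.where(np.random.randn(10_000) > 0, 1.0, -1.0).astype(np.float32)
    packed = pack_binary_params(w)
    print(len(packed), "bytes as 1-bit parameters vs", w.nbytes, "bytes as 32-bit floats")
    assert np.array_equal(unpack_binary_params(packed, w.size), w)
```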
  • the method further includes: the child node receives fusion parameters from the central node, and the fusion parameters are obtained by the central node through fusion according to local model parameters.
  • the child node updates the local model parameters according to the fusion parameters to obtain the updated local model parameters.
  • a method for the sub-nodes to update the model by receiving information is presented.
  • the central node can perform data aggregation processing.
  • the model parameters transmitted by each sub-node can be obtained, and these model parameters can be fused to obtain model parameters with higher adaptability to each sub-node, such as the above-mentioned fusion parameters.
  • the sub-nodes can obtain the fusion parameters from the central node, so that local fusion can be performed according to the fusion parameters in combination with the local parameters, thereby realizing the updating of the local neural network model.
  • the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters. Based on this scheme, different forms of fusion parameters are given.
  • the fusion parameter may be a binarized model parameter.
  • the fusion parameter may be a high-precision model parameter.
  • the child nodes can transmit the binarized model parameters to the central node.
  • the central node can perform fusion according to multiple binarized model parameters (for example, average weighting according to weights), thereby obtaining corresponding fusion parameters.
  • the model parameters after the fusion process no longer consist only of +1 and -1 elements; that is, the fusion model obtained after the weighted average should be high-precision model parameters.
  • the binarized fusion parameters obtained by binarizing the high-precision parameters are used for downloading, so that the data download rate is faster, and the sub-nodes can also complete the update of the local model faster.
  • when receiving binarized fusion parameters, local fusion can be performed according to the local fusion method corresponding to binarized fusion parameters, and when receiving high-precision fusion parameters, local fusion is performed according to the local fusion method corresponding to high-precision fusion parameters, as in the sketch below.
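  • The patent leaves the exact local fusion rule open. As one hedged possibility only, the sketch below blends the received fusion parameters (binarized or high-precision) into the sub-node's high-precision copy with a mixing coefficient and then refreshes the binarized working copy; the coefficient, the mapping of zero to +1, and the function name are illustrative assumptions.

```python
import numpy as np

def local_fusion(local_hp: np.ndarray, fusion: np.ndarray, alpha: float = 0.5):
    """Blend received fusion parameters into the local high-precision parameters.

    `fusion` may hold +1/-1 values (binarized fusion parameters) or real
    values (high-precision fusion parameters); the same rule covers both.
    """
    updated_hp = (1.0 - alpha) * local_hp + alpha * fusion
    updated_bin = np.where(updated_hp > 0, 1.0, -1.0)      # refreshed binarized parameters
    return updated_hp, updated_bin
```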
  • the first message further includes: accuracy information corresponding to the local model parameters.
  • the accuracy information is obtained by the sub-node according to the local model parameters and the test data set. Based on this solution, a specific content of the first message is provided.
  • the child node may send the accuracy information corresponding to the learning result to the central node, so that the central node can determine the accuracy of the system according to the accuracy information reported by each sub-node, and then determine whether to issue binarized fusion parameters or high-precision fusion parameters.
  • the accuracy information may also be obtained by the child node through verification according to the local model parameters and the verification data set.
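  • As a small illustration of how the accuracy information in the first message could be produced, the sketch below evaluates the local model on a held-out test (or verification) data set; `predict_fn`, the data arrays and the message layout shown in the comment are placeholders, not elements defined by the patent.

```python
import numpy as np

def local_accuracy(predict_fn, test_x: np.ndarray, test_y: np.ndarray) -> float:
    """Fraction of test samples the local (binarized) model classifies correctly."""
    return float(np.mean(predict_fn(test_x) == test_y))

# Illustrative first message carrying both items (structure is hypothetical):
# first_message = {"local_params": packed_params,
#                  "accuracy": local_accuracy(model.predict, test_x, test_y)}
```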
  • the method further includes: the child node continues machine learning based on the BNN according to the updated local model parameters.
  • the child node updates the local model according to the fusion parameters.
  • the child node can continue to perform the second or subsequent rounds of learning on the local model based on the existing data set or in combination with a newly added data set, and repeat the method of the above example until the learning result converges and the machine learning is completed.
  • the updated local model can be used to guide the current business of the child node, such as predicting the direction of data, etc.
  • in a second aspect, a machine learning method is provided. The method is applied to a central node and includes: the central node receives N first messages from N child nodes respectively, where the first messages include local model parameters, the local model parameters are binarized local model parameters, and N is an integer greater than or equal to 1. The central node fuses the local model parameters included in the N first messages to obtain the fusion parameters. The central node sends a second message to M child nodes, where the second message includes the fusion parameters, and M is an integer greater than or equal to 1.
  • the central node can receive local model parameters from multiple sub-nodes, and perform fusion based on these local model parameters, thereby obtaining a fusion model with stronger adaptability.
  • the central node can distribute the fusion model to each sub-node, so that the sub-nodes can perform local fusion according to the fusion model to complete a round of learning.
  • the model parameters received by the central node from the child nodes may be binarized model parameters. It can be understood that the data volume of the binarized model parameters is significantly smaller than the data volume of ordinary model parameters (such as high-precision model parameters), so the uploading process is more efficient.
  • the obtained fusion parameters can be adapted to the data set type corresponding to each sub-node, so it has more accurate and adaptable features.
  • there may be some reference sub-nodes among the N sub-nodes and these sub-nodes can be used to provide local model parameters, but do not need fusion parameters from the central node.
  • the nodes that need to update the local model according to the fusion parameters may not be included in the N child nodes. Therefore, in some implementations, the M child nodes may also include child nodes that are not among the N child nodes, or the M child nodes may be a part of the N child nodes. The specific determination of the M sub-nodes can be flexibly configured according to the actual implementation process.
  • the central node fuses N local model parameters to obtain the fusion parameters, including: the central node performs a weighted average on the N local model parameters to obtain the fusion parameters.
  • a scheme for obtaining fusion parameters is provided.
  • the central node can process the N local model parameters through a simple weighted average. The weight in the weighted average can be determined according to the size of the input data set used in the local training process that produced the local model parameters.
  • the central node can obtain the size of the data set used in the current round of learning from each child node, and can also obtain the size of the data set used by each child node in the local learning process from other nodes.
  • the central node may also adjust the weights in combination with other factors. For example, for some frequently used sub-nodes, their weights can be appropriately increased, while for sub-nodes whose neural network models are used less often, the corresponding weights can be appropriately decreased (see the weighted-average sketch below).
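  • The weighted-average fusion described above could look like the following sketch, where the weights are proportional to each sub-node's local data set size and any extra per-node adjustment can be folded into those weights; whether to binarize the result before download is left as a flag. This is an illustrative implementation, not the patent's exact procedure.

```python
import numpy as np

def fuse_local_params(local_params, dataset_sizes, binarize_download=False):
    """Weighted average of N binarized local model parameters.

    local_params: list of np.ndarray, each holding +1/-1 parameters from one sub-node.
    dataset_sizes: list of int, the size of each sub-node's local data set.
    """
    weights = np.asarray(dataset_sizes, dtype=np.float64)
    weights /= weights.sum()
    fused = sum(w * p for w, p in zip(weights, local_params))   # high-precision fusion parameters
    if binarize_download:                                       # optional binarized download form
        fused = np.where(fused > 0, 1.0, -1.0)
    return fused
```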
  • the fusion parameter included in the second message is a high-precision fusion parameter, or the fusion parameter included in the second message is a binarized fusion parameter.
  • the central node may directly send the high-precision parameters obtained after the fusion process to the child nodes through the second message.
  • the central node can also binarize the fused high-precision parameters and then send them to the child nodes through the second message. It can be understood that when the data transmission rate needs to be improved, this can be achieved by issuing binarized fusion parameters, and when the accuracy needs to be improved, this can be achieved by issuing high-precision fusion parameters.
  • before the central node sends the second message to the M sub-nodes, the method further includes: the central node determines system accuracy information according to the N first messages, and the central node determines, according to the system accuracy information, whether the fusion parameters included in the second message are high-precision fusion parameters or binarized fusion parameters. Based on this scheme, a mechanism is provided for the central node to switch between issuing high-precision fusion parameters and binarized fusion parameters. In this example, the central node may determine whether to issue high-precision or binarized fusion parameters according to the system accuracy information.
  • for example, when the system accuracy is low, the binarized fusion parameters can be sent, and when the system accuracy is high, the high-precision fusion parameters can be sent.
  • the system accuracy may be determined according to the accuracy of each sub-node, or may be determined by the central node spontaneously verifying according to the model parameters of each sub-node.
  • the first message further includes: accuracy information.
  • the accuracy information corresponds to the accuracy obtained by verifying the local model parameters included in the first message at the corresponding child node.
  • the central node determining the system accuracy information according to the N first messages includes: the central node determining the system accuracy information according to the accuracy information included in the N messages. Based on the solution, a method for the central node to determine the system accuracy information is provided.
  • each sub-node can send the verification accuracy of the model parameters obtained in this round of learning to the central node, and the central node can determine the system accuracy according to the accuracy uploaded by each sub-node, and then adjust the form of the delivered fusion parameters accordingly.
  • when the central node determines that the system accuracy is less than or equal to the first threshold, the central node determines that the fusion parameters are binarized fusion parameters and can issue binarized fusion parameters, thereby increasing the data transmission rate.
  • when the central node determines that the system accuracy is greater than or equal to the second threshold, it considers that the learning in the current system is close to convergence and the adjustment space of the model parameters is small, so data transmission with higher accuracy is required. Therefore, in this case the central node determines that the fusion parameters are high-precision fusion parameters and can issue high-precision fusion parameters, thereby improving the accuracy of the model parameters.
  • the first threshold and the second threshold may be preset, and in different implementations, the first threshold and the second threshold may be the same or different.
  • when the central node sends the second message to the M sub-nodes: when the number of iteration rounds is less than or equal to the third threshold, the central node sends a second message including the binarized fusion parameters to the M sub-nodes; when the number of iteration rounds is greater than or equal to the fourth threshold, the central node sends a second message including the high-precision fusion parameters to the M sub-nodes.
  • another mechanism is provided for the central node to determine the form of the fusion parameters to be issued. In this example, the central node may determine the form of delivering the fusion parameter according to the number of iteration rounds.
  • when the number of iteration rounds is small, that is, less than or equal to the third threshold, the central node can consider that the current state should focus on improving data transmission efficiency, so it can choose to issue binarized fusion parameters to increase the data transfer rate.
  • when the number of iteration rounds is large, that is, greater than or equal to the fourth threshold, the central node can consider that the current state should prioritize accuracy, so it can choose to issue high-precision fusion parameters to improve the accuracy of the local fusion process. Both selection mechanisms are sketched below.
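  • The two selection mechanisms above (system accuracy versus a threshold, or iteration round versus a threshold) can be sketched as a single decision helper. The threshold values, and the collapsing of the first/second and third/fourth thresholds into one value each, are assumptions made only for illustration.

```python
from typing import Optional

def choose_fusion_form(system_accuracy: Optional[float] = None,
                       iteration_round: Optional[int] = None,
                       accuracy_threshold: float = 0.9,
                       round_threshold: int = 50) -> str:
    """Decide whether to download binarized or high-precision fusion parameters."""
    if system_accuracy is not None:
        # low accuracy -> favour transmission rate; high accuracy -> favour precision
        return "binarized" if system_accuracy <= accuracy_threshold else "high_precision"
    if iteration_round is not None:
        # early rounds -> favour transmission rate; late rounds -> favour precision
        return "binarized" if iteration_round <= round_threshold else "high_precision"
    return "binarized"  # default when no accuracy or round information is available
```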
  • the central node sends the second message to the M child nodes through broadcasting.
  • a method for the central node to deliver the second message is provided.
  • the central node can deliver the second message in the form of broadcasting, without the need to deliver the second message to each child node separately. It can be understood that the content of the data delivered to each child node is similar, therefore, the data can be delivered to each child node at the same time in the form of broadcasting.
  • whether the transmission carries binarized fusion parameters or high-precision fusion parameters, the broadcast form of transmission will not affect information security.
  • in a third aspect, a machine learning apparatus is provided. The apparatus can be applied to a sub-node in which a binary neural network (BNN) model is set, and the apparatus includes: an acquisition unit, configured to perform BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set.
  • the sending unit is configured to send a first message to the central node, where the first message includes local model parameters.
  • the local model parameters included in the first message are binarized local model parameters.
  • the apparatus further includes: a receiving unit, configured to receive fusion parameters from the central node, where the fusion parameters are obtained by the central node fusion according to local model parameters.
  • the fusion unit is used to fuse according to the fusion parameters and the local model parameters to obtain the updated local model parameters.
  • the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters.
  • the first message further includes: accuracy information corresponding to the local model parameters.
  • the acquisition unit is further configured to obtain the accuracy information through verification according to the local model parameters and the test data set.
  • the apparatus further includes: a learning unit for continuing machine learning based on the BNN according to the updated local model parameters.
  • in a fourth aspect, a machine learning apparatus is provided, applied to a central node. The apparatus includes: a receiving unit, configured to receive N first messages from N sub-nodes respectively, where the first messages include local model parameters, the local model parameters are binarized local model parameters, and N is an integer greater than or equal to 1.
  • the fusion unit is configured to fuse the local model parameters included in the N first messages to obtain fusion parameters.
  • a sending unit configured to send a second message to the M sub-nodes, where the second message includes a fusion parameter, and M is a positive integer greater than or equal to 1.
  • the fusion unit is specifically used to perform a weighted average of N local model parameters to obtain fusion parameters.
  • the fusion parameter included in the second message is a high-precision fusion parameter, or the fusion parameter included in the second message is a binarized fusion parameter.
  • the apparatus further includes: a determining unit, configured to determine system accuracy information according to the N first messages, and to determine, according to the system accuracy information, whether the fusion parameters included in the second message are high-precision fusion parameters or binarized fusion parameters.
  • the first message further includes: accuracy information.
  • the accuracy information corresponds to the accuracy obtained by verifying the local model parameters included in the first message at the corresponding child node.
  • the determining unit is specifically configured to determine the system accuracy information according to the accuracy information included in the N messages.
  • the determining unit is configured to determine that the fusion parameter is a binarized fusion parameter when the system accuracy information is less than or equal to the first threshold.
  • the determining unit is further configured to determine that the fusion parameter is a high-precision fusion parameter when the system accuracy information is greater than or equal to the second threshold.
  • the sending unit is configured to send the second message including the binarized fusion parameter to the M sub-nodes when the number of iteration rounds is less than or equal to the third threshold.
  • the sending unit is further configured to send a second message including a high-precision fusion parameter to the M sub-nodes when the number of iteration rounds is greater than or equal to the fourth threshold.
  • the sending unit is specifically configured to send the second message to the M sub-nodes through broadcasting.
  • in a fifth aspect, a child node is provided, and the child node may include one or more processors and one or more memories.
  • One or more memories are coupled to the one or more processors, and the one or more memories store computer instructions.
  • when the child node executes the computer instructions, the child node is caused to perform the machine learning method of any one of the first aspect and its possible designs.
  • in a sixth aspect, a central node is provided, and the central node may include one or more processors and one or more memories.
  • One or more memories are coupled to the one or more processors, and the one or more memories store computer instructions.
  • when the central node executes the computer instructions, the central node is caused to perform the machine learning method of any one of the second aspect and its possible designs.
  • in a seventh aspect, a machine learning system is provided, which includes one or more sub-nodes as provided in the fifth aspect and one or more central nodes as provided in the sixth aspect.
  • in an eighth aspect, a chip system is provided, which includes an interface circuit and a processor; the interface circuit and the processor are interconnected through a line; the interface circuit is used to receive a signal from a memory and send the signal to the processor, and the signal includes computer instructions stored in the memory.
  • when the processor executes the computer instructions, the chip system executes the machine learning method described in any one of the above first aspect and its various possible designs, or executes the machine learning method described in any one of the above second aspect and its various possible designs.
  • in a ninth aspect, a computer-readable storage medium is provided, which includes computer instructions. When the computer instructions are executed, the machine learning method described in any one of the above first aspect and its various possible designs is executed, or the machine learning method described in any one of the above second aspect and its various possible designs is executed.
  • in a tenth aspect, a computer program product is provided. The computer program product includes instructions, and when the computer program product runs on a computer, the computer can execute, according to the instructions, the machine learning method of any one of the above first aspect and its various possible designs.
  • FIG. 1 is a schematic diagram of an implementation of machine learning in a communication process
  • FIG. 2 is a schematic working diagram of an FL architecture
  • FIG. 3 is a schematic diagram of a comparison between a BNN and an ordinary neural network based on high-precision parameters
  • FIG. 4 is a schematic diagram of the composition of a machine learning system according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the composition of another machine learning system provided by an embodiment of the present application.
  • FIG. 6 is a schematic working logic diagram of a machine learning system provided by an embodiment of the present application.
  • FIG. 7 is a schematic working logic diagram of another machine learning system provided by an embodiment of the present application.
  • FIG. 8 is a schematic working logic diagram of another machine learning system provided by an embodiment of the present application.
  • FIG. 9 is a schematic logical diagram of a machine learning method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a comparison of simulation results provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a comparison of another simulation result provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a comparison of another simulation result provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the composition of a machine learning apparatus provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the composition of another machine learning apparatus provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the composition of a child node according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of the composition of a chip system provided by an embodiment of the present application.
  • FIG. 17 is a schematic diagram of the composition of a central node according to an embodiment of the present application.
  • FIG. 18 is a schematic diagram of the composition of another chip system provided by an embodiment of the present application.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations or descriptions. Any embodiment or design described in the embodiments of the present application as “exemplary” or “such as” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present the related concepts in a specific manner.
  • first and second are only used for description purposes, and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features.
  • a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • plural means two or more.
  • the meaning of the term “at least one” refers to one or more, and the meaning of the term “plurality” in this application refers to two or more.
  • a plurality of second messages refers to two or more second messages.
  • the size of the sequence number of each process does not imply the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • determining B according to A does not mean that B is only determined according to A, and B may also be determined according to A and/or other information.
  • the term “if” may be interpreted to mean “when” or “upon” or “in response to determining” or “in response to detecting.”
  • the phrases “if it is determined" or “if a [statement or event] is detected” can be interpreted to mean “when determining" or “in response to determining... ” or “on detection of [recited condition or event]” or “in response to detection of [recited condition or event]”.
  • references throughout the specification to “one embodiment,” “an embodiment,” and “one possible implementation” mean that a particular feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of the present application.
  • appearances of “in one embodiment” or “in an embodiment” or “one possible implementation” in various places throughout this specification are not necessarily referring to the same embodiment.
  • the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
  • the connection mentioned in the embodiments of the present application may be a direct connection, an indirect connection, a wired connection, or a wireless connection; that is, the embodiments of the present application do not limit the connection method.
  • the local node can collect relevant data, and upload the data to the computing node respectively, so that the computing node can learn based on the data, thereby obtaining the corresponding training model.
  • the computing node can issue the training model to each local node, so that each local node can predict and guide its work in the communication process according to the training model.
  • FIG. 1 it is an implementation example of machine learning in a communication process.
  • three local nodes perform machine learning through computing nodes as an example.
  • the local node 1 can upload the data set 1 composed of the collected data to the computing node.
  • the local node 2 can upload the data set 2 composed of the collected data to the computing node.
  • the local node 3 can upload the data set 3 composed of the collected data to the computing node.
  • Compute nodes can perform machine learning based on these datasets (eg dataset 1 - dataset 3).
  • a basic neural network model can be preset in the computing node, and the basic neural network model can be iteratively trained according to data set 1 to data set 3; the model parameters (such as weights and biases) of the basic neural network model are optimized to obtain iteratively converged model parameters, thereby completing a round of machine learning.
  • the computing node can deliver the iteratively converged model parameters to each local node. For example, send model parameters to local node 1, local node 2 and local node 3.
  • the local node may also be preset with a model of the same type as the basic neural network model in the computing node.
  • the local node can update the locally maintained model according to the received model parameters, thereby obtaining the training model after machine learning.
  • the local node can predict and guide its work based on the trained model.
  • the corresponding training model can be used to judge and predict the corresponding parameters according to the above scheme, thereby greatly improving the working performance of the local node.
  • each local node needs to send the data set collected by it to the computing node respectively.
  • the data volume of the data sets that the computing node needs to collect is very large, which places a great burden on the communication link between the local nodes and the computing node while the data sets are being transmitted.
  • since the data set is directly sent to the computing node, when some private information of the user is included in the data set, that private information will be directly exposed to the communication link and the computing node, resulting in an information privacy hazard.
  • a Federated Learning (FL) architecture with a distributed architecture can be used to couple machine learning and communication, reduce the amount of data transmission, and at the same time properly protect information privacy.
  • in the FL architecture, a central node (or called a central server) and multiple sub-nodes can be provided.
  • the number of child nodes can range from several to several thousand according to different tasks and data distribution.
  • a neural network model can be preset locally.
  • the neural network model in each child node is the same.
  • the child nodes can obtain the corresponding dataset to learn in the neural network model.
  • each child node performs several iterations (usually once for all data in the dataset), and then uploads the local model with converged model parameters to the central node.
  • the central node will weight and average all the sent local models according to the proportion of the data volume of each child node (this process may also be called fusion of training models), thereby obtaining the fusion training model. Then, the central node can send the obtained training model to all sub-nodes, so that the sub-nodes can continue to train according to the new data set according to the fused training model, or directly use it for calculation and prediction of related scenarios. It should be noted that, in different implementations, the transmission of the training model between the child nodes and the central node may be to directly transmit all the data of the training model, or it may be to transmit only the parameters of the training model.
  • the architecture includes three sub-nodes (such as local node 1, local node 2 and local node 3) and one central node as an example.
  • for local node 1, relevant data can be collected to obtain data set 1.
  • the local training model corresponding to the data set 1 may be stored in the local node 1 .
  • the local node 1 can input the data set 1 into the local training model, and perform local training, thereby obtaining the converged local training model parameter 1 (eg, marked as W1).
  • for local node 2 and local node 3, processing similar to that of local node 1 can also be performed, and the corresponding local training model parameters 2 (eg, identified as W2) and local training model parameters 3 (eg, identified as W3) can be obtained. Understandably, since the local training model parameters are obtained through learning, they are strongly correlated with the input data set. For example, when the input data sets are different, the obtained local training model parameters may also be different.
  • the three local nodes can respectively send the acquired local training model parameters (such as W1-W3) to the central node.
  • the central node can fuse the obtained W1-W3, and then obtain the fused training model parameters (such as W0).
  • the central node can deliver the fused training model parameters to the local node 1, the local node 2 and the local node 3 respectively.
  • Each local node can update the local training model according to the received W0.
  • the data set is processed locally without being sent to the central node, so that the information security of the data set can be guaranteed.
  • the data volume of the training model or the training model parameters is significantly smaller than the data volume of the data set itself, the data transmission pressure between the child nodes and the central node can be effectively reduced by transmitting the training model.
  • the training efficiency and data transmission volume under the FL architecture still cannot meet the needs of all scenarios. Take the child node as the mobile phone and the central node as the base station as an example. There is information exchange between the mobile phone and the base station at any time.
  • under the FL architecture, although only the training model or the training model parameters need to be transmitted, a single parameter often requires a transmission bandwidth of 16 bits or more, and a set of training model parameters includes many parameters, so the requirement on transmission bandwidth is still very high. Further, during the transmission of the training model (or training model parameters), the mobile phone does not continue local training. Therefore, because the transmission of the training model (or training model parameters) takes a long time, the training efficiency of the entire FL architecture will be low.
  • the solutions provided by the embodiments of the present application can combine a binarized data processing solution and a distributed neural network learning solution to achieve the effect of reducing the pressure on data transmission while improving the efficiency of machine learning.
  • the neural network applying the binarized data processing scheme may also be referred to as a binarized neural network (Binary Neural Network, BNN).
  • the node that sends the data needs to quantize the data before transmission. For example, take a child node sending data to the central node as an example.
  • the child node can quantize the data to be sent into a sequence consisting of 0s and 1s, and then transmit the sequence through the uplink data transmission channel.
  • after quantization, the corresponding data may not be completely equivalent to the data before quantization.
  • the wider the bit width of the quantized sequence corresponding to a piece of data to be transmitted (such as so-called full-precision data), the higher the precision of the parameters obtained after quantization.
  • quantized data with a wider quantization bit width may be referred to as high-precision data, or high-precision parameters.
  • high-precision data or high-precision parameters may refer to data with a sequence bit width greater than or equal to 32 bits.
  • the parameters of the neural network consist of binarized parameters of +1 and -1.
  • the parameters of the training model are identified by the binarized parameters. Therefore, when learning BNN, compared with model learning based on high-precision parameters without binarization, it can effectively reduce the amount of calculation when the neural network is used for deduction, and accelerate the convergence of the learning process.
  • BNN can also reduce the amount of storage required to store the parameters of the neural network, thereby reducing the amount of communication required to send the entire neural network.
  • FIG. 3 is a schematic diagram of a comparison between a BNN and an ordinary neural network based on high-precision parameters.
  • the corresponding high-precision parameters can be W1, W2, and W3 respectively.
  • the corresponding binarization parameters can be Wb1, Wb2, and Wb3 respectively.
  • W1 and Wb1 as examples.
  • the corresponding relationship between W1 and Wb1 can be as follows: W1 can be converted into corresponding Wb1 through binarization conversion.
  • the binarization conversion may be: for any element in the calculation matrix corresponding to W1, if the element is greater than 0, the element corresponding to the corresponding position in the Wb1 matrix is denoted as +1. Correspondingly, if the element is less than 0, the element corresponding to the corresponding position in the Wb1 matrix is marked as -1.
  • the corresponding W1 can be obtained through gradient accumulation. It can be understood that in a typical BNN learning process, the binarized parameters can be used for forward calculation and gradient calculation, and the gradients are accumulated on the corresponding high-precision parameters. When a sufficiently large gradient is accumulated on the high-precision parameters, the binarization parameters jump.
  • the BNN performs the above process multiple times through iteration, gradually updating the parameters, and finally converges after enough iterations. Therefore, when using a learned BNN, it is only necessary to use the binarized parameters for deduction, and the final output is the deduction result of the BNN. The binarization conversion and flipping behaviour are illustrated in the sketch below.
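  • The binarization conversion and sign-flip behaviour described above can be illustrated with a few lines of Python; mapping an exact zero to +1 is an assumption, since the description only covers elements greater than or less than 0, and the numbers below are arbitrary.

```python
import numpy as np

def binarize(w_hp: np.ndarray) -> np.ndarray:
    """Binarization conversion: elements > 0 become +1, elements < 0 become -1."""
    return np.where(w_hp > 0, 1.0, -1.0)

w_hp = np.array([0.7, -0.2, 0.05])
w_b = binarize(w_hp)                       # array([ 1., -1.,  1.])
w_hp -= 0.1 * np.array([0.0, 0.0, 1.0])    # a large enough accumulated gradient ...
w_b_new = binarize(w_hp)                   # ... flips the third binarized parameter to -1
```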
  • the following example illustrates BNN learning in combination with an actual scenario, taking an image classification problem and the stochastic gradient descent method as an example. Let bs be the batch size of each learning step, let the binarized parameters be W_i^b, and let the high-precision parameters be W_i, where the subscript i represents the user index.
  • in each iteration, the child node draws bs samples (x, y) from the local data set and computes loss = lossfunc(L(W_i^b, x), y), where L represents the structure of the neural network used, W_i^b is the current binarized parameter, lossfunc(·,·) is the loss function of the neural network, and loss is the final loss value.
  • after calculating the loss, the child node performs back-propagation to calculate the gradient of the binarized parameters, that is, grad = ∂loss/∂W_i^b, and the gradient is accumulated onto the high-precision parameters, W_i ← W_i − η·grad, where η is the learning rate, which can be preset before the calculation. Finally, if a certain high-precision parameter changes sign in this iteration, the corresponding binarized parameter's sign is also flipped (eg, flipped from +1 to -1). After each such iteration, another bs samples are drawn from the remaining data (if fewer than bs remain, all of them are drawn), and the iteration is repeated. A round of learning ends when all data have been drawn and data extraction cannot continue.
  • the convergence of the method can be determined by comparing the current loss value with the loss values of the previous one or several iterations. For example, after this loss calculation, if the difference between this loss value and the previous three loss values is within a preset range, the method is considered to have converged.
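  • The iteration described above can be put together into a minimal, self-contained numpy sketch. It is not the patent's algorithm: a single binarized linear layer stands in for the network structure L, a hinge loss stands in for lossfunc, and the data are synthetic; only the flow (forward pass with W_i^b, gradient accumulation on W_i, sign flips) follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic binary-classification data standing in for the local data set
d, n, bs, eta = 20, 512, 32, 0.01            # feature dimension, samples, batch size bs, learning rate eta
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))      # labels in {-1, +1}

W = 0.1 * rng.standard_normal(d)             # high-precision parameters W_i
Wb = np.where(W > 0, 1.0, -1.0)              # binarized parameters W_i^b

for epoch in range(20):                      # each epoch corresponds to one round of local learning
    order = rng.permutation(n)
    for start in range(0, n, bs):            # draw bs samples until the data set is exhausted
        idx = order[start:start + bs]
        xb, yb = X[idx], y[idx]
        margin = yb * (xb @ Wb)              # forward pass uses the binarized parameters W_i^b
        loss = np.mean(np.maximum(0.0, 1.0 - margin))
        active = (margin < 1.0).astype(float)
        grad = -(active * yb) @ xb / len(idx)    # gradient of the hinge loss w.r.t. W_i^b
        W -= eta * grad                      # accumulate on the high-precision copy: W_i <- W_i - eta*grad
        flip = np.sign(W) != Wb              # a sign change on W_i flips the binarized parameter
        Wb[flip] = -Wb[flip]
    if epoch % 5 == 0:
        print(f"epoch {epoch}: last-batch loss {loss:.3f}")

print("training accuracy:", np.mean(np.sign(X @ Wb) == y))
```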
  • BNN has the characteristics of fast convergence and small data transmission.
  • however, if the BNN is directly applied to an existing distributed-learning-based system (such as the FL architecture), since all data transmission is carried out in binarized form, the learning accuracy of the entire system may become too low to be usable.
  • therefore, the above-mentioned BNN can be used in a learning framework based on a distributed machine learning system such as FL. With the machine learning method provided by the embodiments of the present application, the data transmission pressure of the entire machine learning system is significantly relieved, and at the same time the learning efficiency of the entire machine learning system is improved in combination with the needs of different scenarios.
  • local training in each sub-node can be performed locally based on a binarized neural network model.
  • the neural network model can be selected flexibly.
  • the neural network model can have network structures such as a fully connected network and a convolutional neural network.
  • the machine learning method provided by the embodiments of the present application can be applied to wireless communication systems including 3G/4G/5G/6G, or satellite communication.
  • the wireless communication system is usually composed of cells, each cell includes a base station (Base Station, BS), and the base station provides communication services to multiple mobile stations (Mobile Station, MS).
  • the base station includes a BBU (Baseband Unit) and an RRU (Remote Radio Unit).
  • the BBU and RRU can be placed in different places, for example, the RRU is far away and placed in an area with high traffic volume, and the BBU is placed in the central computer room.
  • BBU and RRU can also be placed in the same computer room.
  • the BBU and RRU can also be different components under one rack.
  • the wireless communication systems mentioned in the solution of the present invention include but are not limited to: Narrow Band-Internet of Things (NB-IoT), Global System for Mobile Communications (GSM), Enhanced Data Rate for GSM Evolution (EDGE), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access 2000 (CDMA2000), Time Division-Synchronization Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), and the eMBB, URLLC and eMTC scenarios of the 5G mobile communication system.
  • the base station is a device deployed in a radio access network to provide a wireless communication function for an MS.
  • the base stations may include various forms of macro base stations, micro base stations (also called small cells), relay stations, access points, and the like.
  • the names of devices with base station functions may be different.
  • for example, in LTE systems it is called an evolved NodeB (evolved NodeB, eNB or eNodeB), and in 3rd Generation (3G) systems it is called a Node B (NodeB).
  • the above-mentioned apparatuses for providing wireless communication functions for MSs are collectively referred to as network equipment or base stations or BSs.
  • the MS involved in the solution of the present invention may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem with wireless communication capabilities.
  • the MS may also be referred to as a terminal (terminal), and the MS may be a subscriber unit (subscriber unit), a cellular phone (cellular phone), a smart phone (smart phone), a wireless data card, a personal digital assistant (Personal Digital Assistant, PDA), a computer, a tablet computer, a wireless modem (modem), a handheld device (handset), a laptop computer (laptop computer), a machine type communication (Machine Type Communication, MTC) terminal, etc.
  • the machine learning system may include a central node and multiple sub-nodes (eg, sub-node 1-sub-node N).
  • the child node may also be called a local node.
  • the central node can communicate with each sub-node in a wired or wireless manner.
  • the central node can receive the local training results uploaded by each sub-node in the uplink transmission channel.
  • the local training result may include parameters of the local training model, or the local training model itself.
  • the central node may deliver the merged local training model parameters or the local training model itself to each sub-node through a broadcast or downlink transmission channel.
  • the transmission of local training model parameters or local training model can be realized by transmitting the binarized parameters corresponding to each parameter, or by directly transmitting the high-precision parameters corresponding to each parameter.
  • the parameters of the local training model may include one or more of parameters such as the weights and biases of the neural network, and the parameters of the local training model may also be their corresponding gradients.
  • the machine learning system may include a base station and N mobile phones (such as mobile phone 1 - mobile phone N).
  • the same basic training model may be pre-stored in the N mobile phones and base stations.
  • the mobile phone 1 can collect data of the corresponding scene, thereby forming the corresponding data set 1 .
  • Mobile phone 1 can input the data set 1 into the basic training model for local training. It can be understood that, in this embodiment of the present application, the mobile phone can perform local training through the BNN.
  • the mobile phone 1 can input the data set 1 into the basic training model, and obtain the local model parameter 1 according to the machine learning method of high-precision parameters.
  • the local model parameter 1 may be a high precision parameter.
  • the local model parameters 1 may include high precision weights and biases.
  • the mobile phone 1 can perform reverse deduction based on high-precision weights and biases, thereby completing the learning of a part of the data in the data set 1.
  • the mobile phone 1 can perform binarization conversion on the corresponding high-precision weights and biases to obtain corresponding binarization parameters.
  • the mobile phone 1 can perform training and learning based on the binarization parameter. Since the data volume of the binarization parameters is significantly smaller than the high-precision data volume, the mobile phone 1 can quickly acquire and complete the local training to acquire the converged weights and biases. Since the local training is based on the binarized parameters, the weight and bias results obtained by the mobile phone 1 can be the binarized parameters.
  • the binarized parameters obtained by local training of each mobile phone may be referred to as local parameters.
  • the binarized weights and biases obtained by mobile phone 1 after one round of learning can be called local parameters 1.
  • the binarized weights and biases obtained by mobile phone 2 after one round of learning can be called local parameters 2.
  • the binarized weights and biases obtained by the mobile phone N after one round of learning can be called local parameters N.
  • Each mobile phone can send its corresponding local parameters to the base station respectively.
  • the base station may fuse the acquired N local parameters (eg, local parameter 1-local parameter N) to obtain a normalized fusion parameter. Then, the base station can distribute the fusion parameters to each mobile phone respectively. After receiving the fusion parameters, the mobile phone can update the local basic training model accordingly, and perform the next round of learning or directly use it for data prediction in actual scenarios.
  • the following describes the processing and transmission of data in the sub-nodes and the central node by taking a machine learning system including three sub-nodes as an example.
  • a central fusion module may be set in the central node, and a learning module and a local fusion module may be set in each sub-node.
  • the learning module in the child node may be used to perform local training on the data set to obtain the binarized local parameters.
  • Subnode 1 can send the local parameter to the central node.
  • the child node 2 and the child node 3 may also send their corresponding local parameters to the central node.
  • the central fusion module in the central node can be used to fuse all received local parameters to obtain fusion parameters.
  • the central node may deliver the fusion parameter to the sub-node 1 to the sub-node 3 respectively.
  • the local fusion module can be used to update the local training model according to the received fusion parameters, so as to obtain a local training model based on the fusion parameters.
  • the local fusion module may be an independent module that is not included in the child node.
  • the central node only needs to send the fusion parameters to the local fusion module, and the local fusion module can be used to update the local training model and distribute the updated local training model to child node 1 to child node 3 respectively.
  • the performance requirements for the sub-nodes can be reduced, and at the same time, since the central node only needs to send the fusion parameters to the local fusion module, the signaling overhead of the central node can be reduced.
  • the local fusion module may also be set in some sub-nodes.
  • the local fusion module is integrated in the sub-node 3, and the local fusion modules of the sub-node 1 and the sub-node 2 can be set independently of the sub-nodes.
  • the central node can send the fusion parameters to the local fusion modules corresponding to the sub-node 1 and the sub-node 2, and the sub-node 3 respectively.
  • the local fusion modules corresponding to the sub-node 1 and the sub-node 2 can deliver the local training model updated based on the fusion parameters to the sub-node 1 and the sub-node 2.
  • the local fusion module integrated therein can be used to update the local training model according to the received fusion parameters, thereby obtaining the updated local training model.
  • The composition of the learning system shown in FIG. 6, FIG. 7 and FIG. 8 in this example is only an example, and in other implementations of the present application, the system may also include multiple independently configured local fusion modules.
  • Take a system configured with 5 sub-nodes (eg, sub-node 1 to sub-node 5) and 3 local fusion modules (eg, local fusion module 1 to local fusion module 3) as an example.
  • local fusion module 1 can provide local fusion services for sub-node 1 and sub-node 2
  • local fusion module 2 can provide local fusion services for sub-node 3 and sub-node 4
  • local fusion module 3 can provide local fusion services for sub-node 5.
  • the corresponding relationship between the local fusion module and the child nodes can also be reconfigured.
  • For example, after the reconfiguration, sub-node 5 may be provided with local fusion services by a different local fusion module.
  • local fusion module 3 can provide local fusion services for sub-node 4.
  • Alternatively, only one or some of the three local fusion modules may provide local fusion services to the sub-nodes.
  • For example, local fusion module 1 can provide local fusion services for sub-node 1 and sub-node 3, local fusion module 2 can provide local fusion services for sub-node 2, sub-node 5, and sub-node 4, and local fusion module 3 can be in a dormant state such as sleep.
  • FIG. 9 shows a logical schematic diagram of a machine learning method provided by an embodiment of the present application. As shown in Figure 9, the method may include:
  • S901: the child node 1 performs local learning.
  • S902: the child node 1 acquires local parameters.
  • S903: the child node 1 sends the local parameters to the central node.
  • For the other child nodes (eg, child node 2 to child node N), the above-mentioned S901-S903 can also be executed respectively, so that the central node can obtain N local parameters.
  • the local parameter may be a binarized local parameter.
  • S904: the central node fuses the N local parameters.
  • S905: the central node obtains fusion parameters.
  • S906: the central node sends the fusion parameters to the child node 1.
  • S907: the child node 1 updates the local model parameters according to the fusion parameters.
  • For the other sub-nodes, the central node can also execute the above S906 correspondingly, and the corresponding sub-nodes can also execute the above S907, so as to update their local training models.
  • Since the sub-nodes can send binarized local parameters to the central node, it is not necessary to send high-precision local parameters with a large amount of data to the central node, which can significantly reduce the communication pressure between the sub-nodes and the central node. Since the amount of transmitted data is small, the transmission time is correspondingly reduced, which increases the proportion of local training in the entire learning time and thereby improves learning efficiency.
  • the central node can send the fusion parameters to each sub-node by broadcasting, so that the central node does not need to send the fusion parameters to each sub-node one by one, thereby saving signaling overhead at the central node.
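  • As an illustration of the flow of S901-S907, the following is a minimal Python/NumPy sketch of one learning round. The node interface (local_train, high_precision_w) and the fusion callbacks are illustrative assumptions and are not taken from the embodiment itself.

```python
import numpy as np

def binarize(w):
    # Map high-precision weights to {-1, +1}; zeros are broken randomly.
    s = np.sign(w)
    zeros = (s == 0)
    s[zeros] = np.random.choice([-1.0, 1.0], size=np.count_nonzero(zeros))
    return s

def one_round(nodes, central_fuse, local_fuse):
    """One learning round: S901-S903 upload, S904-S905 central fusion, S906-S907 update."""
    uploads = []
    for node in nodes:
        node.local_train()                               # S901: local BNN training
        uploads.append(binarize(node.high_precision_w))  # S902-S903: binarized upload
    fused = central_fuse(uploads)                        # S904-S905: central fusion
    for node in nodes:                                   # S906: broadcast fusion parameters
        node.high_precision_w = local_fuse(node.high_precision_w, fused)  # S907
```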
  • In order to adapt to the learning needs of different scenarios, the embodiment of the present application further provides three modes for the method shown in FIG. 9, so that in scenarios with different requirements on learning efficiency and learning accuracy, the machine learning system can select the corresponding mode to achieve rapid convergence or high-accuracy learning.
  • Mode 1: binarized parameters are used when uploading model parameters, and high-precision parameters are used when the central node model is downloaded.
  • Since high-precision parameters are used for the download, the child nodes can update the local training model more accurately. Since binarized parameters are used for the upload, the overall learning efficiency is still significantly higher than that of the existing FL architecture, which uses high-precision parameters for both uplink and downlink. This mode can be used in scenarios that have certain requirements for both learning efficiency and learning accuracy.
  • Mode 2: binarized parameters are used for both uploading and downloading model parameters.
  • In this mode, the model parameters uploaded and downloaded are all binarized, so the data transmission pressure of the system is minimal. This mode can be used in scenarios that require high learning efficiency.
  • Mode 3: high-precision parameters are used for both uploading and downloading model parameters.
  • Since high-precision parameters are used for the download, the child nodes can more accurately update and obtain the local training model. This mode can be used in scenarios that require high learning accuracy.
  • the sub-node trains a local model based on the existing model and local data, and uploads the resulting local parameters to the central server.
  • After the central server receives the local parameters uploaded from all child nodes, it executes the central model fusion method to obtain the central parameters and distributes them to all child nodes in the form of a broadcast.
  • After receiving the central parameters, the child node immediately fuses them with the local high-precision parameters W_i using the local model fusion method.
  • The sub-node then re-binarizes the fused parameters and starts the next round of local training.
  • the central model fusion method and the local model fusion method differ according to the parameter transmission mode (eg mode 1-mode 3).
  • If the amount of data on each node is not equal, then, because of the large number of system nodes, the fused result can still be considered to reflect the proportion of positive or negative cumulative gradients in the data of all nodes. For node i, taking into account the proportion of its local data in the total, each node can be treated as if the total number of nodes and the amount of data per node were the same; the equivalent total number of nodes M' is then different for each node, but this does not affect the specific implementation details.
  • the following descriptions are given by taking the same size of each child node data set as an example. It can be understood that, in other implementation manners of the present application, the size of each sub-node data set may also be different. When the size of the data set is inconsistent, the calculation process is similar, which will not be repeated here.
  • After the local fusion module receives the fusion parameters, it performs the local fusion calculation. Because the high-precision parameters of all child nodes are accumulated based on their local data sets, they are strongly correlated, and because of the large number of nodes, it can be assumed that the high-precision parameters of all nodes are sampled values of approximately the same normal distribution. Further, it is assumed that the covariance matrix of this normal distribution is a diagonal matrix, that is, parameters at different positions are uncorrelated. Therefore, it can be assumed that each child node parameter satisfies W_i,j ~ N(μ_j, σ_j), where W_i,j represents the value of the j-th parameter of the i-th node.
  • the problem is thus transformed into how, under the above known conditions, to estimate the distribution parameters according to W_i,j and the received fusion parameter.
  • the local high-precision parameters of each node are likely to be different, and when estimated completely independently, the results will also be different.
  • all subscripts are omitted, and all symbols represent the parameters corresponding to the jth parameter of the ith node.
  • the problem is constructed as:
  • the optimization objective is an increasing and then decreasing function of y.
  • the value of this function is only related to ρ and M. Since M can be known in advance, the optimal solution can be drawn in advance as a relationship curve between μ and ρ.
  • this relationship can then be fitted with a specific function and saved locally, to reduce the complexity of solving the optimization problem.
  • In this example, a logarithmic function is used to perform a least-squares fit on this curve, where ρ is in the range [0,1] and c>0 ensures that the function is well defined.
  • For a given M, this fitted function can be used as an approximate expression for μ and stored locally. The solution of the original problem when W>0 is then obtained from it:
  • Here α>1 represents a magnification factor, which is a parameter selected in advance, generally between 1.5 and 2.5. (For the sake of system stability, in order to make the system converge more smoothly, it should be ensured that, as the computation proceeds, W tends to stay away from 0 but does not become too large. Since the absolute magnitude of the high-precision parameters is of little significance while their relative magnitude matters, all parameters can be proportionally enlarged and then limited.)
  • the local fusion module can update the local training model according to the fusion parameters obtained.
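  • Since the original formulas are not reproduced above, the following Python sketch only illustrates one way the described precomputation and magnification could be organized: tabulate a relation between the sign proportion ρ and the mean μ (for simplicity the large-M limit Φ(μ/σ) is used here), fit it with an assumed logarithmic form by least squares, and apply the magnification factor α with a limiting step at run time. The full estimator in the embodiment also involves the local high-precision parameter, whose exact role is not reproduced here; all names and the fitting form are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def log_model(rho, c1, c2, c):
    # Assumed logarithmic fitting form; c > 0 keeps the argument positive on [0, 1].
    return c1 * np.log(rho + c) + c2

def fit_rho_to_mu(sigma=1.0):
    """Tabulate the mapping from the expected proportion of +1 signs (rho) to the
    mean mu of the assumed normal distribution, then least-squares fit a log curve."""
    mus = np.linspace(-3 * sigma, 3 * sigma, 200)
    rhos = norm.cdf(mus / sigma)          # large-M limit: E[rho] = Phi(mu / sigma)
    params, _ = curve_fit(log_model, rhos, mus, p0=[1.0, 0.0, 0.1],
                          bounds=([-np.inf, -np.inf, 1e-6], np.inf))
    return params

def local_estimate(rho_received, fit_params, alpha=2.0, limit=1.0):
    # alpha in [1.5, 2.5] magnifies the estimate; a limiting step keeps it bounded.
    mu_hat = log_model(rho_received, *fit_params)
    return np.clip(alpha * mu_hat, -limit, limit)
```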
  • After the central server receives the binarized parameters uploaded by all sub-nodes, it sums them and takes the sign to obtain the central parameters. If the result is exactly 0, a -1 or +1 is issued randomly. In this case, the meaning of the central parameter is that, for a given position, if the binarized parameter is positive for the larger proportion of all nodes, it takes +1; otherwise, it takes -1. This parameter therefore only reflects a general overall trend and contains less information.
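  • A minimal sketch of the sum-and-sign central fusion just described, with a random -1 or +1 issued when the sum is exactly 0 (names are illustrative):

```python
import numpy as np

def central_fuse_sign(binary_uploads):
    """binary_uploads: list of {-1, +1} arrays uploaded by the sub-nodes."""
    total = np.sum(binary_uploads, axis=0)
    fused = np.sign(total)
    ties = (fused == 0)
    # A parameter whose positive and negative votes cancel exactly gets a random -1 or +1.
    fused[ties] = np.random.choice([-1.0, 1.0], size=np.count_nonzero(ties))
    return fused
```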
  • After the child node receives the central parameters, the local fusion calculation uses a simple linear fusion method, where sign(·) represents the sign function and β, which is between 0 and 1, is a parameter selected in advance.
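  • The exact linear rule is not reproduced above; one plausible reading, in which the received ±1 central parameters are blended with the local high-precision parameters, is sketched below (the formula is an assumption):

```python
def local_fuse_linear(w_local, w_central_sign, beta=0.5):
    # Assumed form of the linear fusion: blend the local high-precision parameters
    # with the received {-1, +1} central parameters; beta in (0, 1) is chosen in advance.
    return (1.0 - beta) * w_local + beta * w_central_sign
```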
  • After the central server receives the high-precision parameters uploaded by all the sub-nodes, it averages them to obtain the central parameters. In this case, the central parameter represents the weighted average of the cumulative gradients calculated by all child nodes according to their local data sets.
  • After receiving the central parameters, the child node directly updates the local high-precision parameters with them.
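  • For completeness, a corresponding sketch for this mode, in which the central server averages the high-precision uploads and each sub-node simply adopts the result (names are illustrative):

```python
import numpy as np

def central_fuse_average(hp_uploads):
    # Average the high-precision parameters uploaded by all sub-nodes.
    return np.mean(hp_uploads, axis=0)

def local_fuse_replace(w_local, w_central):
    # The sub-node directly adopts the central parameters as its new local parameters.
    return w_central
```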
  • This mode is basically the same as the fusion method under the traditional FL framework; the only difference is that the local model is a BNN, so the complexity of forward computation and inference during training is lower than that of ordinary neural networks.
  • any one of the above mode 1, mode 2, or mode 3 may be selected as the transmission mode, so as to obtain the corresponding beneficial effects. It should be noted that no matter whether mode 1, mode 2, or mode 3 is adopted, since a BNN is used for local training at the child nodes, results can be obtained iteratively faster than with the existing FL architecture.
  • the above-mentioned mode 1 and mode 2, or mode 1 and mode 3, or mode 2 and mode 3, or mode 1, mode 2, and mode 3 can also be combined to implement the method shown in FIG. 9.
  • For example, mode 2 can be used at the beginning of learning.
  • As learning progresses, mode 1 can be used to continue learning, thereby appropriately improving the accuracy of the parameters.
  • In the later stage, mode 3 can be used to continue learning, thereby obtaining the result with the highest accuracy.
  • Table 1 shows a correspondence between accuracy and mode selection.
  • When the system accuracy is low (eg, not higher than the lower threshold in Table 1), the central node determines to continue to use mode 2 for learning.
  • When the accuracy falls in the intermediate range, the central node can determine to use mode 1 for learning.
  • When the accuracy is high (eg, not lower than the higher threshold), the central node can determine to learn through mode 3.
  • the accuracy may be calculated by the central node according to the accuracy uploaded by each sub-node.
  • the central node can calculate the system accuracy based on the uploaded accuracies, and determine the transmission mode according to the correspondence shown in Table 1. It should be noted that 0.65/0.8 in Table 1 are only examples of threshold settings; in other implementations of the present application, the thresholds may be set to other values, or adjusted flexibly according to the environment.
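  • Assuming Table 1 maps low accuracy to mode 2, intermediate accuracy to mode 1, and high accuracy to mode 3, with the example thresholds 0.65 and 0.8, the selection logic could be sketched as follows:

```python
def select_mode_by_accuracy(system_accuracy, low=0.65, high=0.8):
    """Example mapping from system accuracy to the parameter transmission mode."""
    if system_accuracy <= low:
        return 2   # favor fast convergence: binarized uplink and downlink
    if system_accuracy >= high:
        return 3   # favor accuracy: high-precision uplink and downlink
    return 1       # intermediate: binarized uplink, high-precision downlink
```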
  • at each sub-node, the local training model corresponding to the local parameters can be verified using the test set stored therein, thereby obtaining the corresponding accuracy, which is then sent to the central node.
  • the operation of verifying the acquisition accuracy may also be completed at the central node.
  • the training model corresponding to the local node may be stored in the central node.
  • the central node can update the training model according to the local parameters, and verify the accuracy corresponding to the local parameters based on the updated training model and the test set stored in the central node. In this way, the corresponding accuracy can be obtained.
  • the central node can also obtain the corresponding accuracy.
  • the central node can calculate the accuracy of the corresponding system based on the accuracy corresponding to each local parameter, and then determine the data transmission mode.
  • the accuracy may also be obtained by the central node by verifying, with the test set or the verification data set, the training model updated with the fusion parameters after central fusion.
  • the method for determining the accuracy may be flexibly determined, which is not limited in this embodiment of the present application.
  • the central node may also determine the transmission mode according to other methods. For example, the central node may determine the transmission mode according to the number of iteration rounds N. Table 2 shows a possible correspondence between the number of iteration rounds N and the transmission mode.
  • when the number of iteration rounds is within 5, the central node can determine that, in the current learning, the need to improve the convergence speed is higher, so mode 2 can be used for learning.
  • when the number of iteration rounds is between 5 and 50, the central node can determine that the accuracy needs to be appropriately improved in the current learning, and then adopt mode 1 to learn.
  • when the number of iteration rounds is greater than 50, the central node can consider that the learning is about to end and that parameter transmission needs to be performed with the highest accuracy, that is, mode 3 is used for learning.
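  • The round-based rule of Table 2 can be sketched in the same way (the boundary values 5 and 50 are taken from the text; the exact table boundaries are otherwise assumed):

```python
def select_mode_by_round(iteration_round):
    """Example mapping from the iteration round to the parameter transmission mode."""
    if iteration_round <= 5:
        return 2   # early rounds: prioritize convergence speed
    if iteration_round > 50:
        return 3   # final rounds: prioritize accuracy
    return 1       # middle rounds: balance efficiency and accuracy
```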
  • the central node may instruct each sub-node to adjust the parameter transmission mode.
  • three modes can be indicated by 2 bits, for example, 00 indicates that the next round parameter transmission mode is mode 1, 01 indicates mode 2, and 10 indicates mode 3.
  • the parameter transmission mode field can be delivered together with the central fusion model, or delivered through a dedicated control channel.
  • a 4-layer convolutional neural network consisting of two convolutional layers and two fully connected layers is used.
  • the training set in the MNIST dataset is evenly distributed on 100 child nodes.
  • Each node has a total of 600 pairs of data, including 60 pairs of each type of data, and the test set is only stored on the central server.
  • the final training-set-related results are the mean of the results of the 100 nodes, and the test-set-related results are computed by the central server based on the binarized local parameters. High-precision parameters are quantized with 32 bits.
  • the specific structure is: a 3*3*16 convolutional layer, a normalization layer, a 2*2 max pooling layer, a tanh activation function, a 3*3*16 convolutional layer, a normalization layer, a 2*2 max pooling layer, a tanh activation function, a 784*100 fully connected layer, a normalization layer, a tanh activation function, a 100*10 fully connected layer, a softmax activation function, and finally a cross-entropy loss function.
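  • For reference, the layer sequence above could be written as follows in PyTorch. This is only a high-precision rendering of the layer order; the binarization of weights and activations used in the actual BNN training is omitted, and the convolutional padding is assumed so that the flattened feature size matches the 784-input fully connected layer.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 3*3*16 convolutional layer
    nn.BatchNorm2d(16),                           # normalization layer
    nn.MaxPool2d(2),                              # 2*2 max pooling layer
    nn.Tanh(),                                    # tanh activation
    nn.Conv2d(16, 16, kernel_size=3, padding=1),  # 3*3*16 convolutional layer
    nn.BatchNorm2d(16),
    nn.MaxPool2d(2),
    nn.Tanh(),
    nn.Flatten(),                                 # 7*7*16 = 784 features for a 28*28 MNIST input
    nn.Linear(784, 100),                          # 784*100 fully connected layer
    nn.BatchNorm1d(100),
    nn.Tanh(),
    nn.Linear(100, 10),                           # 100*10 fully connected layer
)
loss_fn = nn.CrossEntropyLoss()                   # softmax + cross-entropy loss
```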
  • the initial value of the learning rate ⁇ is 0.05, and then decreases every 30 iterations to 0.02, 0.01, 0.005, and 0.002.
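  • The stepwise learning-rate schedule can be expressed as a small helper (values taken from the text; the indexing convention is assumed):

```python
def learning_rate(iteration):
    # Starts at 0.05 and drops every 30 iterations to 0.02, 0.01, 0.005, 0.002.
    schedule = [0.05, 0.02, 0.01, 0.005, 0.002]
    return schedule[min(iteration // 30, len(schedule) - 1)]
```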
  • FIG. 11 shows the curve of the accuracy rate of the test set changing with time when the present invention is applied to the MNIST handwritten digit recognition data set.
  • centralized training means that all data is collected at a central node for training, and this curve is used as a baseline for comparison. It can be seen that, in terms of test set accuracy, both mode 1 and mode 3 can come close to the centralized result. Mode 3 has a slight advantage in training effect, but mode 1 requires only a small amount of communication per iteration, so its communication cost is much lower than that of mode 3; moreover, mode 3 is not very stable.
  • Although the final performance of mode 2 is poorer, it is highly competitive in the early stage of training due to the extremely low amount of communication required; as training progresses, however, its performance cannot match that of the first two modes.
  • mode 1 in the present invention is suitable for most practical situations
  • mode 2 is suitable for application in the early stage of training, or when communication resources are very tight and the requirements for learning effect are very low
  • mode 3 is suitable for the later stage of training, to fine-tune a well-trained model.
  • Table 3 gives a comparison of the amount of computation and communication per child node required to achieve 90% and 95% accuracy on the test set for the first time.
  • the total number of parameters of the system is 82242.
  • When the system works in mode 1, it needs to upload 10.04 KB and download 66.70 KB of data in each iteration; when it works in mode 2, it needs to upload and download 10.04 KB of data each time (since mode 2 cannot reach the 95% accuracy rate, the corresponding entry is left blank); when working in mode 3, 321.26 KB of data needs to be uploaded and downloaded each time.
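  • These per-iteration figures are consistent with simple bit counting over the 82242 parameters, assuming 1 bit per binarized parameter and 32 bits per high-precision parameter:

$$82242 \times 1\,\text{bit} \approx 10.04\,\text{KB}, \qquad 82242 \times 32\,\text{bits} \approx 321.26\,\text{KB}.$$

The 66.70 KB downlink figure for mode 1 lies between these two values, indicating that the high-precision download is not transmitted as a plain 32-bit representation.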
  • the communication amount required for each iteration of the mode 1 method in the present invention is greatly reduced, which brings an order-of-magnitude reduction in the total time required for distributed machine learning tasks.
  • a 4-layer convolutional neural network consisting of two convolutional layers and two fully connected layers is used.
  • the network structure is the same as that in Section 2.4.1.
  • the initial value of the learning rate is 0.02, and it then decreases every 30 iterations to 0.01, 0.005, and 0.002.
  • the training set in the MNIST data set is unevenly distributed on 100 sub-nodes.
  • the specific distribution method is as follows: first, the data set is divided into 10 parts according to the type, and then each part is divided into 100 parts to obtain 1000 sub-data sets. The 1000 subdatasets are randomly assigned to 100 child nodes, and each child node is randomly assigned to 10 subdatasets. The test set is only saved on the central server.
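  • The partitioning just described can be reproduced with a few lines of Python (an index-based sketch over the MNIST labels; names are illustrative):

```python
import numpy as np

def partition_non_iid(labels, num_nodes=100, shards_per_node=10,
                      num_classes=10, shards_per_class=100, seed=0):
    """Split sample indices by class into 10 parts, cut each class into 100 shards
    (1000 shards in total), then randomly hand 10 shards to each of the 100 sub-nodes."""
    rng = np.random.default_rng(seed)
    shards = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        shards.extend(np.array_split(idx, shards_per_class))  # 100 shards per class
    order = rng.permutation(len(shards))                       # shuffle the 1000 shards
    return [np.concatenate([shards[j] for j in
                            order[i * shards_per_node:(i + 1) * shards_per_node]])
            for i in range(num_nodes)]
```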
  • FIG. 12 is a curve showing the change of the accuracy rate of the test set when the present invention is applied to the non-IID MNIST handwritten digit recognition data set.
  • the mode switching in the hybrid mode is judged by the number of iteration rounds. Initially, mode 2 is used for training. After 5 iterations, it is changed to mode 1, and then it is changed to mode 3 after 50 iterations. In this case, due to the non-IID characteristics of the dataset, the system will suffer a certain performance loss. It can be seen that the mixed mode can also achieve better results in the end.
  • Table 4 shows the communication and computational costs required to achieve a certain accuracy for 5 consecutive times for the three modes and the hybrid mode.
  • each functional module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that, the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • a machine learning device 1300 is provided in an embodiment of the present application.
  • the device can be applied to a sub-node, and a binary neural network model BNN is set in the sub-node.
  • the device includes: an obtaining unit 1301, configured to perform BNN-based machine learning on the collected local data set to obtain local model parameters corresponding to the local data set.
  • the sending unit 1302 is configured to send a first message to the central node, where the first message includes local model parameters.
  • the local model parameters included in the first message are binarized local model parameters.
  • the apparatus further includes: a receiving unit 1303, configured to receive fusion parameters from the central node, where the fusion parameters are obtained by the central node by fusion according to local model parameters.
  • the fusion unit 1304 is configured to fuse the fusion parameters and the local model parameters to obtain updated local model parameters.
  • the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters.
  • the first message further includes: accuracy information corresponding to the local model parameters.
  • the obtaining unit 1301 is further configured to verify and obtain the accuracy information according to the local model parameters and the test data set.
  • the apparatus further includes: a learning unit 1305, configured to continue machine learning based on the BNN according to the updated local model parameters.
  • a machine learning apparatus 1400 is provided in an embodiment of the present application.
  • the apparatus is applied to a central node.
  • the apparatus includes: a receiving unit 1401, configured to receive N first messages from N child nodes, respectively.
  • the message includes local model parameters, which are binarized local model parameters.
  • N is an integer greater than or equal to 1.
  • the fusion unit 1402 is configured to fuse the local model parameters included in the N first messages to obtain fusion parameters.
  • the sending unit 1403 is configured to send a second message to the M sub-nodes, where the second message includes a fusion parameter, and M is a positive integer greater than or equal to 1.
  • the M child nodes are included in the N child nodes.
  • the fusion unit 1402 is specifically configured to perform a weighted average on the N local model parameters to obtain the fusion parameters.
  • the fusion parameter included in the second message is a high-precision fusion parameter, or the fusion parameter included in the second message is a binarized fusion parameter.
  • the apparatus further includes: a determining unit 1404, configured to determine the system accuracy information according to the N first messages; the central node determines, according to the system accuracy information, whether the fusion parameter included in the second message is a high-precision fusion parameter or a binarized fusion parameter.
  • the first message further includes: accuracy information.
  • the accuracy information corresponds to the accuracy obtained by verifying the local model parameters included in the first message at the corresponding child node.
  • the determining unit 1404 is specifically configured to determine the system accuracy information according to the accuracy information included in the N messages.
  • the determining unit 1404 is configured to determine that the fusion parameter is a binarized fusion parameter when the system accuracy information is less than or equal to the first threshold.
  • the determining unit 1404 is further configured to determine that the fusion parameter is a high-precision fusion parameter when the system accuracy information is greater than or equal to the second threshold.
  • the sending unit 1403 is configured to send the second message including the binarized fusion parameter to the M sub-nodes when the number of iteration rounds is less than or equal to the third threshold.
  • the sending unit 1403 is further configured to send a second message including a high-precision fusion parameter to the M sub-nodes when the number of iteration rounds is greater than or equal to the fourth threshold.
  • the sending unit 1403 is specifically configured to send the second message to the M sub-nodes through broadcasting.
  • FIG. 15 shows a schematic composition diagram of a child node 1500 .
  • the child node 1500 may include: a processor 1501 and a memory 1502 .
  • the memory 1502 is used to store computer-executable instructions. Exemplarily, in some embodiments, when the processor 1501 executes the instructions stored in the memory 1502, the child node 1500 may be caused to execute the data processing method shown in any of the foregoing embodiments.
  • FIG. 16 shows a schematic diagram of the composition of a chip system 1600 .
  • the chip system may be applied to any sub-node involved in the embodiments of this application.
  • the chip system 1600 may include: a processor 1601 and a communication interface 1602, which are used to support related devices to implement the functions involved in the above embodiments.
  • the chip system further includes a memory for storing necessary program instructions and data of the child nodes.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the communication interface 1602 may also be referred to as an interface circuit.
  • FIG. 17 shows a schematic diagram of the composition of a central node 1700 .
  • the central node 1700 may include: a processor 1701 and a memory 1702 .
  • the memory 1702 is used to store computer-executable instructions.
  • the processor 1701 executes the instructions stored in the memory 1702
  • the central node 1700 can be caused to execute the data processing method shown in any one of the foregoing embodiments.
  • FIG. 18 shows a schematic composition diagram of a chip system 1800 .
  • the chip system may be applied to any central node involved in the embodiments of this application.
  • the chip system 1800 may include: a processor 1801 and a communication interface 1802, which are used to support related devices to implement the functions involved in the above embodiments.
  • the chip system further includes a memory for storing necessary program instructions and data of the central node.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the communication interface 1802 may also be referred to as an interface circuit.
  • the functions or actions or operations or steps in the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • When implemented using a software program, the above may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (eg, coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (eg, infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive (SSD)), and the like.

Abstract

Embodiments of the present application relate to the technical field of machine learning. Disclosed are a machine learning method, device, and system, which are capable of significantly reducing the amount of data transmitted during machine learning, thereby reducing the proportion of time consumed by data transmission in the entire learning process and effectively improving the efficiency of machine learning. The specific solution comprises: a sub-node performs BNN-based machine learning on a collected local data set, so as to obtain local model parameters corresponding to the local data set; and the sub-node sends a first message to a central node, the first message comprising the local model parameters.

Description

A machine learning method, apparatus and system
This application claims priority to the Chinese patent application No. 202011365069.6, entitled "A Machine Learning Method, Device and System", filed with the State Intellectual Property Office on November 27, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of machine learning, and in particular, to a machine learning method, apparatus and system.
Background
With the development of artificial intelligence (AI) technology, research on the combination of neural networks and communication systems has received extensive attention. For example, in scenarios such as intelligent communication, the Internet of Things, the Internet of Vehicles, and smart cities, machine learning can be used to improve the flexibility of the network.
Illustratively, take intelligent communication as an example. Through machine learning, a computing node (such as a base station) can perform machine learning based on the data uploaded by each local node (such as a mobile phone), combined with the neural network model set therein. The base station can deliver a model that meets preset conditions, obtained through machine learning, to each mobile phone, so that each mobile phone can adjust its own communication process according to the model to realize intelligent communication.
It can be seen that, in the process of machine learning based on the combination of a neural network and a communication system, each local node needs to upload all the data it collects to the computing node, so that the computing node can perform training based on the data. Depending on the scenario, the amount of data to be transmitted may be very large. Since each local node needs to transmit data to the computing node, the data transmission pressure between the local nodes and the computing node is relatively high. In addition, since the data is transmitted directly from the local nodes to the central node, it may include content related to users' private information, which also makes that private information insecure.
Summary of the Invention
The embodiments of the present application provide a machine learning method, apparatus and system, which can significantly reduce the amount of data transmitted in the machine learning process, thereby reducing the proportion of time consumed by data transmission in the entire learning process and effectively improving the efficiency of machine learning.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a machine learning method is provided. The method is applied to a sub-node in which a binarized neural network model (BNN) is set, and the method includes: the sub-node performs BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; and the sub-node sends a first message to a central node, where the first message includes the local model parameters.
Based on this solution, a scheme combining a BNN with a distributed machine learning architecture is provided. In this example, BNN-based binarized machine learning can be performed locally, and the parameters of the local neural network model obtained thereby can be binarized parameters. Compared with directly sending a high-precision neural network model or model parameters to the central node, sending binarized model parameters to the central node can significantly reduce the requirement on data transmission bandwidth, and since the amount of data to be transmitted is significantly smaller, the transmission time is correspondingly reduced. It can be understood that the sub-node does not perform machine learning during data transmission; therefore, reducing the data transmission time can significantly increase the proportion of time spent on machine learning in the entire learning process, thereby improving learning efficiency.
In a possible design, the local model parameters included in the first message are binarized local model parameters. Based on this solution, the form of the local model parameters during transmission is specified. In this example, the transmitted local model parameters are binarized parameters, that is, they include only the two values +1 and -1; obviously, each parameter then corresponds to only 1 bit of bandwidth. In contrast, data transmission under the conventional FL architecture is based on high-precision data, and a single high-precision element may correspond to a transmission bandwidth of 16 bits or more, so transmitting a model parameter matrix that includes many high-precision elements consumes more transmission resources. Therefore, the solution provided in this example can effectively reduce the demand for data transmission resources and shorten the transmission time, thereby improving the learning efficiency of the system.
In a possible design, the method further includes: the sub-node receives fusion parameters from the central node, where the fusion parameters are obtained by the central node through fusion of local model parameters; and the sub-node updates the local model parameters according to the fusion parameters to obtain updated local model parameters. Based on this solution, a method is provided for a sub-node to update its model by receiving information. It can be understood that, in a distributed learning architecture, data can be aggregated at the central node; for example, the central node can obtain the model parameters transmitted by each sub-node and fuse them, thereby obtaining highly adaptable model parameters (such as the above fusion parameters) that fit the situation of each sub-node. The sub-node can obtain the fusion parameters from the central node and then perform local fusion according to the fusion parameters in combination with the local parameters, thereby updating the local neural network model.
In a possible design, the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters. Based on this solution, different forms of the fusion parameters are given. For example, in some implementations, the fusion parameters may be binarized model parameters; in other implementations, the fusion parameters may be high-precision model parameters. It can be understood that, in the present application, the sub-nodes can transmit binarized model parameters to the central node, and the central node can fuse multiple binarized model parameters (for example, by weighted averaging), thereby obtaining the corresponding fusion parameters. Taking weighted averaging as an example of the fusion processing, the model parameters obtained after fusion obviously do not only include the elements +1 and -1; that is, the fusion model obtained after weighted averaging consists of high-precision model parameters. In the present application, according to different scenarios, the fusion parameters can be delivered either as the directly obtained high-precision parameters, so that the sub-nodes can update the local model more accurately according to the high-precision fusion parameters, or as binarized fusion parameters obtained by binarizing the high-precision parameters, so that the data can be downloaded faster and the sub-nodes can complete the update of the local model more quickly. For a sub-node, when receiving binarized fusion parameters, local fusion can be performed according to the local fusion method corresponding to binarized fusion parameters; when receiving high-precision fusion parameters, local fusion can be performed according to the local fusion method corresponding to high-precision fusion parameters. For the specific implementation, refer to the description in the embodiments, which is not repeated here.
In a possible design, the first message further includes accuracy information corresponding to the local model parameters, where the accuracy information is obtained by the sub-node through verification according to the local model parameters and a test data set. Based on this solution, specific content of the first message is provided. In this example, after completing a round of local learning, the sub-node can send the accuracy information corresponding to the learning result to the central node, so that the central node can determine the system accuracy according to the accuracy information reported by each sub-node, and then determine whether to deliver binarized fusion parameters or high-precision fusion parameters. It should be noted that, in other implementations of the present application, the accuracy information may also be obtained by the sub-node through verification according to the local model parameters and a verification data set.
In a possible design, the method further includes: the sub-node continues machine learning based on the BNN according to the updated local model parameters. Based on this solution, an example is provided of what the sub-node does after updating the local model according to the fusion parameters. In this example, after updating the local model parameters, the sub-node can continue the second or subsequent rounds of learning on the local model based on the existing data set, or in combination with a newly added data set, and repeat the method in the above example until the learning result converges, thereby completing the machine learning. It should be noted that, in other implementations of the present application, regardless of whether the learning result converges, the updated local model can be used to guide the current service of the sub-node, for example, to predict data trends.
In a second aspect, a machine learning method is provided. The method is applied to a central node and includes: the central node receives N first messages respectively from N sub-nodes, where each first message includes local model parameters, the local model parameters are binarized local model parameters, and N is an integer greater than or equal to 1; the central node fuses the local model parameters included in the N first messages to obtain fusion parameters; and the central node sends a second message to M sub-nodes, where the second message includes the fusion parameters and M is a positive integer greater than or equal to 1.
Based on this solution, the central node can receive local model parameters from multiple sub-nodes and perform fusion based on these local model parameters, thereby obtaining a fusion model with stronger adaptability. The central node can deliver the fusion model to each sub-node, so that the sub-nodes can perform local fusion according to the fusion model to complete a round of learning. Compared with the distributed architecture in the existing FL framework, in this example, the model parameters received by the central node from the sub-nodes can be binarized model parameters. It can be understood that the data volume of binarized model parameters is significantly smaller than that of ordinary model parameters (such as high-precision model parameters), so the uploading process is more efficient. The fusion parameters obtained after the central node fuses the local model parameters can be adapted to the data set types of the sub-nodes, and are therefore more accurate and more adaptable. It should be noted that, in some implementations of the present application, some of the N sub-nodes may be reference sub-nodes, which can be used to provide local model parameters but do not need the fusion parameters from the central node, while nodes that need to update their local models according to the fusion parameters may not be included in the N sub-nodes. Therefore, in some implementations, the M sub-nodes may also include sub-nodes that are not among the N sub-nodes, or the M sub-nodes may be a part of the N sub-nodes. The specific determination of the M sub-nodes can be flexibly configured in the actual implementation.
In a possible design, the central node fusing the N local model parameters to obtain the fusion parameters includes: the central node performs a weighted average on the N local model parameters to obtain the fusion parameters. Based on this solution, a way of obtaining the fusion parameters is provided. In this example, the central node can process the N local parameter models through a simple weighted average, where the weights can be determined according to the size of the data set input during local training of each local parameter model. The central node can obtain, from each sub-node, the size of the data set used in the current round of learning, or obtain from other nodes the size of the data set used by each sub-node during local learning. Of course, in some other implementations of the present application, the central node may also adjust the weights in combination with other factors; for example, the weight of some frequently used sub-nodes may be appropriately increased, while the weight of sub-nodes whose neural network model is used less may be appropriately decreased.
In a possible design, the fusion parameters included in the second message are high-precision fusion parameters, or the fusion parameters included in the second message are binarized fusion parameters. Based on this solution, an example of how the central node delivers the fusion parameters is provided. In this example, the central node may directly deliver the high-precision parameters obtained after fusion to the sub-nodes through the second message, or may binarize the high-precision parameters obtained after fusion and then deliver them to the sub-nodes through the second message. It can be understood that, when the data transmission rate needs to be improved, binarized fusion parameters can be delivered, and when the accuracy needs to be improved, high-precision fusion parameters can be delivered.
In a possible design, before the central node sends the second message to the M sub-nodes, the method further includes: the central node determines system accuracy information according to the N first messages, and the central node determines, according to the system accuracy information, whether the fusion parameters included in the second message are high-precision fusion parameters or binarized fusion parameters. Based on this solution, a mechanism is provided for the central node to adjust between delivering high-precision fusion parameters and binarized fusion parameters. In this example, the central node can determine, according to the system accuracy information, whether to deliver high-precision fusion parameters or binarized fusion parameters; for example, binarized fusion parameters can be delivered when the system accuracy is low, and high-precision fusion parameters can be delivered when the system accuracy is high. The system accuracy can be determined according to the accuracy of each sub-node, or can be determined by the central node itself through verification according to the model parameters of each sub-node.
In a possible design, the first message further includes accuracy information, where the accuracy information corresponds to the accuracy obtained by verifying, at the corresponding sub-node, the local model parameters included in the first message. The central node determining the system accuracy information according to the N first messages includes: the central node determines the system accuracy information according to the accuracy information included in the N messages. Based on this solution, a method is provided for the central node to determine the system accuracy information. In this example, each sub-node can send to the central node the accuracy obtained by verifying the model parameters acquired in the current round of learning, and the central node can determine the system accuracy according to the accuracy uploaded by each sub-node, and then adjust the form of the delivered fusion parameters accordingly.
In a possible design, when the system accuracy information is less than or equal to a first threshold, the central node determines that the fusion parameters are binarized fusion parameters; when the system accuracy information is greater than or equal to a second threshold, the central node determines that the fusion parameters are high-precision fusion parameters. Based on this solution, a specific example is provided of how the central node determines the form of the delivered fusion parameters. In this example, when the central node determines that the system accuracy is less than or equal to the first threshold, it considers that learning in the current system is at a preliminary stage and the model parameters still have a large adjustment space, so high-precision data transmission is not needed and the improvement of data transmission efficiency should take priority; therefore, the central node can deliver binarized fusion parameters, thereby increasing the data transmission rate. Correspondingly, when the central node determines that the system accuracy is greater than or equal to the second threshold, it considers that learning in the current system is close to convergence and the adjustment space of the model parameters is small, so data transmission with higher precision is needed; therefore, the central node can deliver high-precision fusion parameters, thereby improving the accuracy of the model parameters. The first threshold and the second threshold may be preset, and in different implementations they may be the same or different.
In a possible design, the central node sending the second message to the M sub-nodes includes: when the number of iteration rounds is less than or equal to a third threshold, the central node sends a second message including binarized fusion parameters to the M sub-nodes; when the number of iteration rounds is greater than or equal to a fourth threshold, the central node sends a second message including high-precision fusion parameters to the M sub-nodes. Based on this solution, another mechanism is provided for the central node to determine the form of the delivered fusion parameters. In this example, the central node can determine the form of the delivered fusion parameters according to the number of iteration rounds. For example, when the number of iteration rounds is small, that is, less than or equal to the third threshold, the central node may consider that the improvement of data transmission efficiency should take priority in the current state, and may therefore choose to deliver binarized fusion parameters to increase the data transmission rate. Correspondingly, when the number of iteration rounds is large, that is, greater than or equal to the fourth threshold, the central node may consider that accuracy should take priority in the current state, and may therefore choose to deliver high-precision fusion parameters to improve the accuracy of the local fusion process.
In a possible design, the central node sends the second message to the M sub-nodes by broadcasting. Based on this solution, a way for the central node to deliver the second message is provided. In this example, the central node can deliver the second message in the form of a broadcast without delivering it to each sub-node separately. It can be understood that the content delivered to each sub-node is similar, so the data can be delivered to all sub-nodes at the same time by broadcasting. Moreover, since what is transmitted is binarized fusion parameters or high-precision fusion parameters, the broadcast form of transmission does not affect information security.
In a third aspect, a machine learning apparatus is provided. The apparatus can be applied to a sub-node in which a binarized neural network model (BNN) is set, and the apparatus includes: an obtaining unit, configured to perform BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; and a sending unit, configured to send a first message to a central node, where the first message includes the local model parameters.
In a possible design, the local model parameters included in the first message are binarized local model parameters.
In a possible design, the apparatus further includes: a receiving unit, configured to receive fusion parameters from the central node, where the fusion parameters are obtained by the central node through fusion of local model parameters; and a fusion unit, configured to perform fusion according to the fusion parameters and the local model parameters to obtain updated local model parameters.
In a possible design, the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters.
In a possible design, the first message further includes accuracy information corresponding to the local model parameters. The obtaining unit is further configured to verify and obtain the accuracy information according to the local model parameters and a test data set.
In a possible design, the apparatus further includes: a learning unit, configured to continue machine learning based on the BNN according to the updated local model parameters.
第四方面,提供一种机器学习装置,该装置应用于中心节点,装置包括:接收单元,用于接收分别来自N个子节点的N个第一消息,第一消息包括本地模型参数,本地模型参数是二值化的本地模型参数。N为大于或等于1的整数。融合单元,用于对N个第一消息包括的本地模型参数进行融合,获取融合参数。发送单元,用于向M个子节点发送第二消息,第二消息包括融合参数,M为大于或等于1的正整数。In a fourth aspect, a machine learning device is provided, the device is applied to a central node, and the device includes: a receiving unit, configured to receive N first messages from N sub-nodes respectively, the first messages include local model parameters, local model parameters are the binarized local model parameters. N is an integer greater than or equal to 1. The fusion unit is configured to fuse the local model parameters included in the N first messages to obtain fusion parameters. A sending unit, configured to send a second message to the M sub-nodes, where the second message includes a fusion parameter, and M is a positive integer greater than or equal to 1.
在一种可能的设计中,融合单元,具体用于对N个本地模型参数,进行加权平均,获取融合参数。In a possible design, the fusion unit is specifically used to perform a weighted average of N local model parameters to obtain fusion parameters.
在一种可能的设计中,第二消息包括的融合参数是高精度的融合参数,或者,第二消息包括的融合参数是二值化的融合参数。In a possible design, the fusion parameter included in the second message is a high-precision fusion parameter, or the fusion parameter included in the second message is a binarized fusion parameter.
在一种可能的设计中，该装置还包括：确定单元，用于根据N个第一消息，确定系统准确度信息，中心节点根据系统准确度信息，确定第二消息中包括的融合参数为高精度的融合参数或者二值化的融合参数。In a possible design, the apparatus further includes: a determining unit, configured to determine system accuracy information according to the N first messages, where the central node determines, according to the system accuracy information, whether the fusion parameter included in the second message is a high-precision fusion parameter or a binarized fusion parameter.
在一种可能的设计中,第一消息还包括:准确度信息。准确度信息与第一消息中包括的本地模型参数在对应子节点处校验获取的准确度对应。确定单元,具体用于根据N个消息中包括的准确度信息,确定系统准确度信息。In a possible design, the first message further includes: accuracy information. The accuracy information corresponds to the accuracy obtained by verifying the local model parameters included in the first message at the corresponding child node. The determining unit is specifically configured to determine the system accuracy information according to the accuracy information included in the N messages.
在一种可能的设计中,确定单元,用于在系统准确度信息小于或等于第一阈值时,确定融合参数为二值化的融合参数。确定单元,还用于在系统准确度信息大于或等于第二阈值时,确定融合参数为高精度的融合参数。In a possible design, the determining unit is configured to determine that the fusion parameter is a binarized fusion parameter when the system accuracy information is less than or equal to the first threshold. The determining unit is further configured to determine that the fusion parameter is a high-precision fusion parameter when the system accuracy information is greater than or equal to the second threshold.
在一种可能的设计中,发送单元,用于在迭代轮数小于或等于第三阈值时,向M个子节点发送包括二值化的融合参数的第二消息。发送单元,还用于在迭代轮数大于或等于第四阈值时,向M个子节点发送包括高精度的融合参数的第二消息。In a possible design, the sending unit is configured to send the second message including the binarized fusion parameter to the M sub-nodes when the number of iteration rounds is less than or equal to the third threshold. The sending unit is further configured to send a second message including a high-precision fusion parameter to the M sub-nodes when the number of iteration rounds is greater than or equal to the fourth threshold.
在一种可能的设计中,发送单元,具体用于通过广播,向M个子节点发送第二消息。In a possible design, the sending unit is specifically configured to send the second message to the M sub-nodes through broadcasting.
第五方面,提供一种子节点,该子节点可以包括一个或多个处理器和一个或多个存储器。一个或多个存储器与一个或多个处理器耦合,一个或多个存储器存储有计算机指令。当一个或多个处理器执行计算机指令时,使得子节点执行如第一方面及其可能的设计中任一项所述的机器学习方法。In a fifth aspect, a child node is provided, the child node may include one or more processors and one or more memories. One or more memories are coupled to the one or more processors, and the one or more memories store computer instructions. When one or more processors execute the computer instructions, the child nodes are caused to perform the machine learning method of any of the first aspect and possible designs thereof.
第六方面,提供一种中心节点,该中心节点可以包括一个或多个处理器和一个或多个存储器。一个或多个存储器与一个或多个处理器耦合,一个或多个存储器存储有计算机指令。当一个或多个处理器执行计算机指令时,使得中心节点执行如第二方面及其可能的设计中任一项所述的机器学习方法。In a sixth aspect, a central node is provided, and the central node may include one or more processors and one or more memories. One or more memories are coupled to the one or more processors, and the one or more memories store computer instructions. When the one or more processors execute the computer instructions, the central node is caused to perform the machine learning method of any one of the second aspect and possible designs thereof.
第七方面,提供一种机器学习系统,机器学习系统包括一个或多个第五方面提供的 子节点,以及一个或多个如第六方面提供的中心节点。In a seventh aspect, a machine learning system is provided, the machine learning system includes one or more sub-nodes provided in the fifth aspect, and one or more central nodes as provided in the sixth aspect.
第八方面，提供一种芯片系统，芯片系统包括接口电路和处理器；接口电路和处理器通过线路互联；接口电路用于从存储器接收信号，并向处理器发送信号，信号包括存储器中存储的计算机指令；当处理器执行计算机指令时，芯片系统执行如上述第一方面以及各种可能的设计中任一种所述的机器学习方法，或者，执行如上述第二方面以及各种可能的设计中任一种所述的机器学习方法。In an eighth aspect, a chip system is provided. The chip system includes an interface circuit and a processor that are interconnected through a line. The interface circuit is configured to receive a signal from a memory and send the signal to the processor, where the signal includes computer instructions stored in the memory. When the processor executes the computer instructions, the chip system performs the machine learning method described in any one of the first aspect and its possible designs, or performs the machine learning method described in any one of the second aspect and its possible designs.
第九方面，提供一种计算机可读存储介质，计算机可读存储介质包括计算机指令，当计算机指令运行时，执行如上述第一方面以及各种可能的设计中任一种所述的机器学习方法，或者，执行如上述第二方面以及各种可能的设计中任一种所述的机器学习方法。In a ninth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes computer instructions, and when the computer instructions are run, the machine learning method described in any one of the first aspect and its possible designs is performed, or the machine learning method described in any one of the second aspect and its possible designs is performed.
第十方面，提供一种计算机程序产品，计算机程序产品中包括指令，当计算机程序产品在计算机上运行时，使得计算机可以根据指令执行如上述第一方面以及各种可能的设计中任一种所述的机器学习方法，或者，执行如上述第二方面以及各种可能的设计中任一种所述的机器学习方法。In a tenth aspect, a computer program product is provided. The computer program product includes instructions, and when the computer program product runs on a computer, the computer is enabled to perform, according to the instructions, the machine learning method described in any one of the first aspect and its possible designs, or the machine learning method described in any one of the second aspect and its possible designs.
应当理解的是，上述第三方面，第四方面，第五方面，第六方面，第七方面，第八方面，第九方面以及第十方面提供的技术方案，其技术特征均可对应到第一方面及其可能的设计中，或者第二方面及其可能的设计中提供的机器学习方法，因此能够达到的有益效果类似，此处不再赘述。It should be understood that the technical features of the technical solutions provided in the third to tenth aspects can all correspond to the machine learning methods provided in the first aspect and its possible designs, or in the second aspect and its possible designs; therefore, similar beneficial effects can be achieved, and details are not repeated here.
附图说明Description of drawings
图1为一种机器学习在通信过程中的实现示意图；FIG. 1 is a schematic diagram of an implementation of machine learning in a communication process;
图2为一种FL架构的工作示意图；FIG. 2 is a schematic working diagram of an FL architecture;
图3为一种BNN与基于高精度参数的普通神经网络的对比示意图；FIG. 3 is a schematic diagram of a comparison between a BNN and an ordinary neural network based on high-precision parameters;
图4为本申请实施例提供的一种机器学习系统的组成示意图;FIG. 4 is a schematic diagram of the composition of a machine learning system according to an embodiment of the present application;
图5为本申请实施例提供的又一种机器学习系统的组成示意图;FIG. 5 is a schematic diagram of the composition of another machine learning system provided by an embodiment of the present application;
图6为本申请实施例提供的一种机器学习系统的工作逻辑示意图;FIG. 6 is a schematic working logic diagram of a machine learning system provided by an embodiment of the present application;
图7为本申请实施例提供的又一种机器学习系统的工作逻辑示意图;FIG. 7 is a schematic working logic diagram of another machine learning system provided by an embodiment of the present application;
图8为本申请实施例提供的又一种机器学习系统的工作逻辑示意图;FIG. 8 is a schematic working logic diagram of another machine learning system provided by an embodiment of the present application;
图9为本申请实施例提供的一种机器学习方法的逻辑示意图;FIG. 9 is a schematic logical diagram of a machine learning method provided by an embodiment of the present application;
图10为本申请实施例提供的一种仿真结果的对比示意图;10 is a schematic diagram of a comparison of simulation results provided by an embodiment of the present application;
图11为本申请实施例提供的又一种仿真结果的对比示意图;11 is a schematic diagram of a comparison of another simulation result provided by an embodiment of the present application;
图12为本申请实施例提供的又一种仿真结果的对比示意图;FIG. 12 is a schematic diagram of a comparison of another simulation result provided by an embodiment of the present application;
图13为本申请实施例提供的一种机器学习装置的组成示意图;FIG. 13 is a schematic diagram of the composition of a machine learning apparatus provided by an embodiment of the present application;
图14为本申请实施例提供的又一种机器学习装置的组成示意图;FIG. 14 is a schematic diagram of the composition of another machine learning apparatus provided by an embodiment of the present application;
图15为本申请实施例提供的一种子节点的组成示意图;FIG. 15 is a schematic diagram of the composition of a child node according to an embodiment of the present application;
图16为本申请实施例提供的一种芯片系统的组成示意图;FIG. 16 is a schematic diagram of the composition of a chip system provided by an embodiment of the present application;
图17为本申请实施例提供的一种中心节点的组成示意图;FIG. 17 is a schematic diagram of the composition of a central node according to an embodiment of the present application;
图18为本申请实施例提供的又一种芯片系统的组成示意图。FIG. 18 is a schematic diagram of the composition of another chip system provided by an embodiment of the present application.
具体实施方式Detailed Description of Embodiments
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例 如”等词旨在以具体方式呈现相关概念。In the embodiments of the present application, words such as "exemplary" or "for example" are used to represent examples, illustrations or illustrations. Any embodiments or designs described in the embodiments of the present application as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present the related concepts in a specific manner.
在本申请的实施例中,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本申请的描述中,除非另有说明,“多个”的含义是两个或两个以上。In the embodiments of the present application, the terms "first" and "second" are only used for description purposes, and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as "first" or "second" may expressly or implicitly include one or more of that feature. In the description of this application, unless stated otherwise, "plurality" means two or more.
本申请中术语“至少一个”的含义是指一个或多个,本申请中术语“多个”的含义是指两个或两个以上,例如,多个第二报文是指两个或两个以上的第二报文。In this application, the meaning of the term "at least one" refers to one or more, and the meaning of the term "plurality" in this application refers to two or more. For example, a plurality of second messages refers to two or more more than one second message.
应理解，在本文中对各种所述示例的描述中所使用的术语只是为了描述特定示例，而并非旨在进行限制。如在对各种所述示例的描述和所附权利要求书中所使用的那样，单数形式“一个（“a”，“an”）”和“该”旨在也包括复数形式，除非上下文另外明确地指示。It is to be understood that the terminology used in describing the various described examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
还应理解，本文中所使用的术语“和/或”是指并且涵盖相关联的所列出的项目中的一个或多个项目的任何和全部可能的组合。术语“和/或”，是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本申请中的字符“/”，一般表示前后关联对象是一种“或”的关系。It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B can represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" in this application generally indicates an "or" relationship between the associated objects.
还应理解，在本申请的各个实施例中，各个过程的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本申请实施例的实施过程构成任何限定。It should also be understood that, in the embodiments of the present application, the sequence numbers of the processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。It should be understood that determining B according to A does not mean that B is only determined according to A, and B may also be determined according to A and/or other information.
还应理解,术语“包括”(也称“includes”、“including”、“comprises”和/或“comprising”)当在本说明书中使用时指定存在所陈述的特征、整数、步骤、操作、元素、和/或部件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元素、部件、和/或其分组。It will also be understood that the term "includes" (also referred to as "includes", "including", "comprises" and/or "comprising") when used in this specification designates the presence of stated features, integers, steps, operations, elements , and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groupings thereof.
还应理解,术语“如果”可被解释为意指“当...时”(“when”或“upon”)或“响应于确定”或“响应于检测到”。类似地,根据上下文,短语“如果确定...”或“如果检测到[所陈述的条件或事件]”可被解释为意指“在确定...时”或“响应于确定...”或“在检测到[所陈述的条件或事件]时”或“响应于检测到[所陈述的条件或事件]”。It should also be understood that the term "if" may be interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting." Similarly, depending on the context, the phrases "if it is determined..." or "if a [statement or event] is detected" can be interpreted to mean "when determining..." or "in response to determining... ” or “on detection of [recited condition or event]” or “in response to detection of [recited condition or event]”.
应理解，说明书通篇中提到的“一个实施例”、“一实施例”、“一种可能的实现方式”意味着与实施例或实现方式有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此，在整个说明书各处出现的“在一个实施例中”或“在一实施例中”、“一种可能的实现方式”未必一定指相同的实施例。此外，这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。It should be understood that references throughout the specification to "one embodiment", "an embodiment", and "one possible implementation" mean that a particular feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of the present application. Thus, appearances of "in one embodiment", "in an embodiment", or "one possible implementation" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
还应理解，本申请实施例中提到的“连接”，可以是直接连接，也可以是间接连接，可以是有线连接，也可以是无线连接，也就是说，本申请实施例对设备之间的连接方式不作限定。It should also be understood that the "connection" mentioned in the embodiments of the present application may be a direct connection or an indirect connection, and may be a wired connection or a wireless connection; that is, the connection manner between devices is not limited in the embodiments of the present application.
以下对本申请实施例提供的方案进行详细说明。The solutions provided in the embodiments of the present application will be described in detail below.
由于基于神经网络的机器学习与通信系统的耦合可以显著提升通信过程的灵活性,因此,神经网络在通信过程中的使用机制对于通信效果影响显著。Since the coupling of neural network-based machine learning and communication systems can significantly improve the flexibility of the communication process, the use mechanism of neural networks in the communication process has a significant impact on the communication effect.
目前的方案中,本地节点可以采集相关数据,并将这些数据分别上传给计算节点, 以便计算节点基于这些数据进行学习,由此获取对应的训练模型。计算节点可以将训练模型下发给各个本地节点,以使得各个本地节点可以根据训练模型,对其在通信过程中的工作进行预测指导。In the current solution, the local node can collect relevant data, and upload the data to the computing node respectively, so that the computing node can learn based on the data, thereby obtaining the corresponding training model. The computing node can issue the training model to each local node, so that each local node can predict and guide its work in the communication process according to the training model.
示例性的,结合图1,为一种机器学习在通信过程中的实现示例。其中,以3个本地节点通过计算节点进行机器学习为例。如图1所示,本地节点1可以将采集获取的数据组成的数据集1上传给计算节点。类似的,本地节点2可以将采集获取的数据组成的数据集2上传给计算节点。本地节点3可以将采集获取的数据组成的数据集3上传给计算节点。计算节点可以根据这些数据集(如数据集1-数据集3)进行机器学习。比如,计算节点中可以预设有基础神经网络模型,根据数据集1-数据集3对基础神经网络模型进行迭代学习,优化基础神经网络模型的各个模型参数(如权重和偏置等),获取迭代收敛的模型参数,由此完成一轮机器学习。此后,计算节点可以将迭代收敛的模型参数下发给各个本地节点。比如,将模型参数发送给本地节点1,本地节点2以及本地节点3。需要说明的是,在本示例中,本地节点也可预设有与计算节点中基础神经网络模型相同类型的模型。本地节点可以根据接收到的模型参数,对本地维护的模型进行更新,由此获取经过机器学习后的训练模型。Exemplarily, with reference to FIG. 1 , it is an implementation example of machine learning in a communication process. Among them, three local nodes perform machine learning through computing nodes as an example. As shown in FIG. 1 , the local node 1 can upload the data set 1 composed of the collected data to the computing node. Similarly, the local node 2 can upload the data set 2 composed of the collected data to the computing node. The local node 3 can upload the data set 3 composed of the collected data to the computing node. Compute nodes can perform machine learning based on these datasets (eg dataset 1 - dataset 3). For example, a basic neural network model can be preset in the computing node, and the basic neural network model can be iteratively learned according to data set 1 to data set 3, and various model parameters (such as weight and bias, etc.) of the basic neural network model are optimized to obtain Iterate over the converged model parameters, thereby completing a round of machine learning. After that, the computing node can deliver the iteratively converged model parameters to each local node. For example, send model parameters to local node 1, local node 2 and local node 3. It should be noted that, in this example, the local node may also be preset with a model of the same type as the basic neural network model in the computing node. The local node can update the locally maintained model according to the received model parameters, thereby obtaining the training model after machine learning.
这样,本地节点就可以根据该训练模型,对其工作进行预测指导。比如,在涉及边缘计算的过程中,车联网,自动驾驶,以及对用户输入习惯的预测等场景下,都可以根据上述方案,采用对应的训练模型对相应的参数进行判断预测,由此大幅提升本地节点的工作性能。In this way, the local node can predict and guide its work based on the trained model. For example, in the process involving edge computing, in scenarios such as Internet of Vehicles, automatic driving, and prediction of user input habits, the corresponding training model can be used to judge and predict the corresponding parameters according to the above scheme, thereby greatly improving The working performance of the local node.
可以看到，如图1所示的方案中，各个本地节点都需要将其采集的数据集分别发送给计算节点。为了使得机器学习的结果足够精确，一般而言，计算节点所需要收集的数据集的数据量是非常大的，这就使得本地节点在向计算节点传输数据集的过程中，会对二者之间的通信链路造成很大的负担。另外，由于数据集是直接被发送给计算节点的，因此，在数据集中包括用户的一些隐私信息时，就会导致隐私信息被直接暴露在通信链路以及计算节点中，由此造成信息隐私性的隐患。It can be seen that, in the solution shown in FIG. 1, each local node needs to send its collected data set to the computing node. To make the machine learning result sufficiently accurate, the amount of data that the computing node needs to collect is generally very large, so transmitting the data sets from the local nodes to the computing node places a heavy burden on the communication links between them. In addition, since the data sets are sent directly to the computing node, any private user information contained in a data set is directly exposed on the communication link and at the computing node, which creates an information privacy risk.
为了解决上述问题,目前,可以采用具有分布式架构的联邦学习(Federated Learning,FL)架构进行机器学习和通信的结耦,降低数据传输量,同时对信息隐私起到适当的保护。In order to solve the above problems, at present, a Federated Learning (FL) architecture with a distributed architecture can be used to couple machine learning and communication, reduce the amount of data transmission, and at the same time properly protect information privacy.
在FL架构中,可以设置有中心节点(或称为中心服务器)以及子节点。其中,根据任务和数据分布不同,子节点数可以为几个到几千个不等。对于每一个子节点,本地都可以预设有神经网络模型。各个子节点中的神经网络模型相同。子节点可以获取对应的数据集在神经网络模型中进行学习。在学习过程中,每个子节点都进行若干次迭代(通常为遍历数据集中所有数据各一次),然后将具有收敛的模型参数的本地模型上传到中心节点。中心节点会将所有发来的本地模型按照各子节点的数据量比例进行加权求均值(该过程也可称为训练模型的融合),由此获取融合后的训练模型。接着,中心节点可以把得到的训练模型下发给所有子节点,以便子节点可以根据融合后的训练模型继续根据新的数据集进行训练,或者,直接用于进行相关场景的计算以及预测。需要说明都是,在不同的实现中,子节点与中心节点之间的训练模型的传输,可以是直接传输训练模型的所有数据,也可以是只对训练模型的参数进行传输即可。In the FL architecture, a central node (or called a central server) and sub-nodes can be provided. Among them, the number of child nodes can range from several to several thousand according to different tasks and data distribution. For each child node, a neural network model can be preset locally. The neural network model in each child node is the same. The child nodes can obtain the corresponding dataset to learn in the neural network model. During the learning process, each child node performs several iterations (usually once for all data in the dataset), and then uploads the local model with converged model parameters to the central node. The central node will weight and average all the sent local models according to the proportion of the data volume of each child node (this process may also be called fusion of training models), thereby obtaining the fusion training model. Then, the central node can send the obtained training model to all sub-nodes, so that the sub-nodes can continue to train according to the new data set according to the fused training model, or directly use it for calculation and prediction of related scenarios. It should be noted that, in different implementations, the transmission of the training model between the child nodes and the central node may be to directly transmit all the data of the training model, or it may be to transmit only the parameters of the training model.
示例性的,结合图2,为一种FL架构的工作示意。其中,以该架构中包括3个子节点(如本地节点1,本地节点2以及本地节点3),1个中心节点为例。对于本地节点1, 可以采集相关数据,以获取数据集1。在本地节点1中可以存储有与数据集1对应的本地训练模型。本地节点1可以将数据集1输入到本地训练模型中,进行本地训练,由此就可以获取收敛后的本地训练模型参数1(如标识为W1)。类似的,在其他本地节点中,也可进行如上述本地节点1类似的处理,并获取对应的本地训练模型参数2(如标识为W2)和本地训练模型参数3(如标识为W3)。可以理解的是,由于本地训练模型参数的学习获取,与输入的数据集强相关。比如,在输入的数据集不同时,得到的本地训练模型参数也可能不同。3个本地节点可以分别将获取的本地训练模型参数(如W1-W3)发送给中心节点。中心节点可以对获取的W1-W3进行融合,进而得到融合后的训练模型参数(如W0)。中心节点可以将该融合后的训练模型参数分别下发给本地节点1、本地节点2以及本地节点3。各个本地节点就可以根据接收到的W0,对本地训练模型进行更新。Exemplarily, with reference to FIG. 2 , it is a working schematic diagram of a FL architecture. Wherein, the architecture includes three sub-nodes (such as local node 1, local node 2 and local node 3) and one central node as an example. For local node 1, relevant data can be collected to obtain data set 1. The local training model corresponding to the data set 1 may be stored in the local node 1 . The local node 1 can input the data set 1 into the local training model, and perform local training, thereby obtaining the converged local training model parameter 1 (eg, marked as W1). Similarly, in other local nodes, processing similar to the above local node 1 can also be performed, and corresponding local training model parameters 2 (eg, identified as W2) and local training model parameters 3 (eg, identified as W3) can be obtained. Understandably, due to the learned acquisition of locally trained model parameters, there is a strong correlation with the input dataset. For example, when the input data sets are different, the obtained local training model parameters may also be different. The three local nodes can respectively send the acquired local training model parameters (such as W1-W3) to the central node. The central node can fuse the obtained W1-W3, and then obtain the fused training model parameters (such as W0). The central node can deliver the fused training model parameters to the local node 1, the local node 2 and the local node 3 respectively. Each local node can update the local training model according to the received W0.
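To make the fusion step above concrete, the following is a minimal sketch of weighted parameter fusion at the central node, assuming each local node i reports its trained parameters together with its data-set size; the function and variable names (fuse_local_parameters, data_sizes) are illustrative and not taken from the application.

```python
# Minimal sketch of weighted fusion of local training model parameters,
# assuming weights proportional to each node's data volume.
import numpy as np

def fuse_local_parameters(local_params, data_sizes):
    """Weighted average of local model parameters, weighted by data-set size."""
    total = sum(data_sizes)
    weights = [n / total for n in data_sizes]
    # Element-wise weighted sum over all local parameter vectors.
    return sum(w * p for w, p in zip(weights, local_params))

# Example: three local nodes (W1-W3) with different amounts of training data.
W1, W2, W3 = np.array([0.2, -0.5]), np.array([0.1, -0.4]), np.array([0.3, -0.6])
W0 = fuse_local_parameters([W1, W2, W3], data_sizes=[100, 200, 100])
```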
可以看到,在FL架构中,数据集在本地就被处理,而不需要被发送给中心节点,这样就可以保证数据集的信息安全得以保证。同时,由于训练模型或者训练模型参数的数据量显著小于数据集本身的数据量,因此,通过传输训练模型,能够有效地降低子节点与中心节点之间的数据传输压力。It can be seen that in the FL architecture, the data set is processed locally without being sent to the central node, so that the information security of the data set can be guaranteed. At the same time, since the data volume of the training model or the training model parameters is significantly smaller than the data volume of the data set itself, the data transmission pressure between the child nodes and the central node can be effectively reduced by transmitting the training model.
但是，随着通信技术的发展，FL架构下的训练效率和数据传输量依然不能满足所有场景下的需求。以子节点为手机，中心节点为基站为例。手机和基站之间随时都进行着信息的交互。在FL架构下，虽然只需传输学习训练模型或者训练模型参数，但是，由于一个参数往往需要16比特（bit）或更高的传输带宽，而一组训练模型参数包括多个参数，因此，对于传输带宽的要求依然很高。进一步的，手机在进行训练模型（或训练模型参数）的传输过程中，不会继续进行本地训练，因此，由于训练模型（或训练模型参数）传输时间长就会导致整个FL架构的训练效率较低。However, with the development of communication technologies, the training efficiency and the data transmission volume under the FL architecture still cannot meet the requirements of all scenarios. Take the child node being a mobile phone and the central node being a base station as an example. The mobile phone and the base station exchange information at all times. Under the FL architecture, although only the training model or the training model parameters need to be transmitted, one parameter often requires 16 bits or more of transmission bandwidth, and a set of training model parameters includes many parameters, so the transmission bandwidth requirement is still high. Further, the mobile phone does not continue local training while the training model (or the training model parameters) is being transmitted; therefore, the long transmission time of the training model (or the training model parameters) leads to relatively low training efficiency of the entire FL architecture.
为了解决上述问题,本申请实施例提供的方案,能够结合二值化的数据处理方案,以及分布式的神经网络学习方案,达到在提升机器学习效率的同时,降低对于数据传输压力的效果。In order to solve the above problems, the solutions provided by the embodiments of the present application can combine a binarized data processing solution and a distributed neural network learning solution to achieve the effect of reducing the pressure on data transmission while improving the efficiency of machine learning.
首先,对本申请所涉及的二值化的数据处理方案进行说明。需要说明的是,本申请中,应用二值化的数据处理方案的神经网络也可称为二值化神经网络(Binary Neural Network,BNN)。First, the binarized data processing scheme according to the present application will be described. It should be noted that, in this application, the neural network applying the binarized data processing scheme may also be referred to as a binarized neural network (Binary Neural Network, BNN).
需要说明的是,数据在进行传输之前(如在子节点和中心节点之间的传输之前),发出数据的节点需要将数据进行量化才能传输。比如,以子节点将数据发送给中心节点为例。子节点可以将需要发送的数据量化为由0或1组成的序列,然后通过上行数据传输通道传输该序列。可以理解的是,在数据量化后,其对应的数据与量化前的数据可能不完全对等。量化的过程中,一个需要传输的数据(如称为全精度的数据)对应到量化后的序列位宽越宽,则量化后获取的参数的精度越高。在本申请实施例中,可以将量化位宽较宽的量化数据称为高精度数据,或者高精度参数。在一些实现中,高精度数据或者高精度参数可以是指序列位宽大于或等于32比特的数据。It should be noted that, before the data is transmitted (for example, before the transmission between the child node and the central node), the node that sends the data needs to quantify the data before transmission. For example, take the child node sending data to the central node as an example. The child node can quantify the data to be sent into a sequence consisting of 0 or 1, and then transmit the sequence through the uplink data transmission channel. It can be understood that after the data is quantized, the corresponding data may not be completely equivalent to the data before the quantization. In the process of quantization, the wider the bit width of a sequence after quantization corresponds to a data to be transmitted (such as data called full precision), the higher the precision of the parameters obtained after quantization. In this embodiment of the present application, quantized data with a wider quantization bit width may be referred to as high-precision data, or high-precision parameters. In some implementations, high-precision data or high-precision parameters may refer to data with a sequence bit width greater than or equal to 32 bits.
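As an illustration of the quantization described above, the short sketch below maps a full-precision value to a fixed-width sequence of 0s and 1s before transmission; the uniform quantizer, the value range, and the chosen bit widths are assumptions used only to show that a wider sequence gives higher precision.

```python
# Toy illustration: quantize a full-precision value into a 0/1 bit sequence.
def quantize_to_bits(value, bits=16, lo=-1.0, hi=1.0):
    """Uniformly quantize `value` in [lo, hi] and return its bit string."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    code = round((clipped - lo) / (hi - lo) * levels)
    return format(code, f"0{bits}b")

print(quantize_to_bits(0.37))          # 16-bit sequence: finer quantization
print(quantize_to_bits(0.37, bits=4))  # coarser 4-bit sequence: lower precision
```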
不同于基于高精度参数的神经网络,在BNN中,神经网络的参数由+1和-1的二值化参数组成。在各个子节点中,进行神经网络的学习(或称为训练)过程中,用二值化的参数标识训练模型的参数。因此,在对BNN进行学习时,相较于基于未进行二值化处理的高精度参数的模型学习,能够达到有效减少神经网络用于推演时的计算量、使得学习过程加快收敛的效果。同时,BNN也可以减少储存神经网络参数所需的储存量、进而减少发送整个神经网络所需的通信量。Unlike neural networks based on high-precision parameters, in BNN, the parameters of the neural network consist of binarized parameters of +1 and -1. In each child node, during the learning (or training) process of the neural network, the parameters of the training model are identified by the binarized parameters. Therefore, when learning BNN, compared with model learning based on high-precision parameters without binarization, it can effectively reduce the amount of calculation when the neural network is used for deduction, and accelerate the convergence of the learning process. At the same time, BNN can also reduce the amount of storage required to store the parameters of the neural network, thereby reducing the amount of communication required to send the entire neural network.
示例性的,图3为一种BNN与基于高精度参数的普通神经网络的对比示意。如图3所示,在普通神经网络中,在进行3次迭代计算的情况下,分别对应的高精度参数可以为W1、W2以及W3。将该过程对应到BNN中,那么,在进行3次迭代计算的情况下,分别对应的二值化参数可以为Wb1、Wb2以及Wb3。对于同样的迭代计算过程,以W1和Wb1为例。W1和Wb1的对应关系可以为,W1通过二值化转换,即可得到对应的Wb1。比如,该二值化转换可以为:对于W1对应计算矩阵中的任意一个元素,如果该元素大于0,则对应Wb1矩阵中对应位置的元素记为+1。对应的,如果该元素小于0,则对应Wb1矩阵中对应位置的元素记为-1。而对于Wb1,可以通过梯度累积获取对应的W1。可以理解的是,在一个典型的BNN学习过程中,可以使用二值化参数进行前向计算和梯度计算,并在对应的高精度参数上累积梯度。当高精度参数上累积了足够大的梯度时,二值化参数就会发生跳变。BNN通过迭代进行多次以上过程,逐步更新参数,并在足够多次迭代后最终收敛。因此,在使用一个学习好的BNN时,只需要使用二值化参数进行推演,最终的输出即为BNN的推演结果。Exemplarily, FIG. 3 is a schematic diagram of a comparison between a BNN and an ordinary neural network based on high-precision parameters. As shown in Fig. 3, in a common neural network, in the case of performing three iterations of calculation, the corresponding high-precision parameters can be W1, W2, and W3 respectively. Corresponding this process to BNN, then, in the case of performing three iterations of calculation, the corresponding binarization parameters can be Wb1, Wb2, and Wb3 respectively. For the same iterative calculation process, take W1 and Wb1 as examples. The corresponding relationship between W1 and Wb1 can be as follows: W1 can be converted into corresponding Wb1 through binarization conversion. For example, the binarization conversion may be: for any element in the calculation matrix corresponding to W1, if the element is greater than 0, the element corresponding to the corresponding position in the Wb1 matrix is denoted as +1. Correspondingly, if the element is less than 0, the element corresponding to the corresponding position in the Wb1 matrix is marked as -1. For Wb1, the corresponding W1 can be obtained through gradient accumulation. It can be understood that in a typical BNN learning process, the binarized parameters can be used for forward calculation and gradient calculation, and the gradients are accumulated on the corresponding high-precision parameters. When a sufficiently large gradient is accumulated on the high-precision parameters, the binarization parameters jump. BNN performs the above process multiple times through iteration, gradually updating the parameters, and finally converges after enough iterations. Therefore, when using a learned BNN, it is only necessary to use the binarized parameters for deduction, and the final output is the deduction result of the BNN.
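The binarization conversion and the sign-flip behaviour described above can be sketched as follows; the helper name binarize and the concrete numbers are illustrative only.

```python
# Sketch of the binarization rule (element > 0 -> +1, element < 0 -> -1) and of the
# sign flip that occurs when gradients accumulated on the high-precision parameter
# change its sign.
import numpy as np

def binarize(w):
    """Map every element of a high-precision parameter matrix to +1 or -1 by its sign."""
    return np.where(w > 0, 1.0, -1.0)

W1 = np.array([[0.3, -0.7], [0.1, -0.2]])    # high-precision parameters
Wb1 = binarize(W1)                           # [[+1, -1], [+1, -1]]

# Accumulate a gradient on W1; if an entry crosses zero, its binary value flips.
eta = 0.5
grad = np.array([[1.0, 0.0], [0.0, -0.1]])
W1 -= eta * grad                             # entry (0, 0) becomes -0.2
Wb1 = binarize(W1)                           # its binary value flips from +1 to -1
```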
以下结合实际场景，对BNN的学习进行举例说明。以图像分类问题和随机梯度下降方法为例。记bs为每次学习的批数据大小，二值化参数为$W_i^b$，高精度参数为$W_i$。其中，下角标i表示的是用户index。在进行学习时，每次以不放回的方式从数据集$(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)$中抽取bs个数据，记为$(x'_1,y'_1),(x'_2,y'_2),\ldots,(x'_{bs},y'_{bs})$，N为数据集中的数据数量。然后，子节点可以计算

$$loss = \frac{1}{bs}\sum_{j=1}^{bs} \mathrm{lossfunc}\big(L(W_i^b, x'_j),\, y'_j\big)$$

在该loss计算公式中，L表示使用的神经网络的结构，$W_i^b$为当前的二值化参数，lossfunc(·,·)为神经网络的损失函数，loss为最后的损失值。在计算出loss后，子节点进行反向传播算出二值化参数的梯度值，即

$$grad = \frac{\partial\, loss}{\partial W_i^b}$$

并将梯度累积到高精度参数上，$W_i \leftarrow W_i - \eta\, grad$，其中η为学习率，该学习率可以是在进行计算之前预先设置的。最后，如果某个高精度参数在本次迭代中符号改变，则对应的二值化参数符号也翻转（比如，从+1翻转为-1）。在每次上述迭代结束后，从余下的数据集中继续抽取bs个数据（不足则全部抽取），重复上述迭代。直到全部数据均已被抽取过，无法继续抽取数据时，一轮学习结束。在二值神经网络学习过程中，需要多次重复这一过程，直到方法收敛。在本示例中，方法的收敛可以根据loss的计算结果与前1次或前几次的计算结果相对比确定。比如，在本次loss计算后，如果本次loss的计算结果与前3次的计算结果差值均在预设的范围之内，那么认为方法收敛。
The following illustrates BNN learning with an example based on a practical scenario, taking an image classification problem and the stochastic gradient descent method as an example. Let bs denote the batch size of each learning step, $W_i^b$ the binarized parameters, and $W_i$ the high-precision parameters, where the subscript i denotes the user index. During learning, bs samples are drawn each time without replacement from the data set $(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)$, denoted $(x'_1,y'_1),(x'_2,y'_2),\ldots,(x'_{bs},y'_{bs})$, where N is the number of samples in the data set. The child node can then compute

$$loss = \frac{1}{bs}\sum_{j=1}^{bs} \mathrm{lossfunc}\big(L(W_i^b, x'_j),\, y'_j\big)$$

In this loss formula, L denotes the structure of the neural network used, $W_i^b$ denotes the current binarized parameters, lossfunc(·,·) is the loss function of the neural network, and loss is the resulting loss value. After the loss is computed, the child node back-propagates to obtain the gradient with respect to the binarized parameters, that is,

$$grad = \frac{\partial\, loss}{\partial W_i^b}$$

and accumulates the gradient on the high-precision parameters, $W_i \leftarrow W_i - \eta\, grad$, where η is the learning rate, which may be preset before the computation. Finally, if the sign of a high-precision parameter changes in this iteration, the sign of the corresponding binarized parameter is also flipped (for example, from +1 to -1). After each such iteration, another bs samples are drawn from the remaining data (all remaining samples if fewer than bs are left), and the iteration is repeated. When all data have been drawn and no more data can be drawn, one round of learning ends. In binary neural network learning, this process needs to be repeated many times until the method converges. In this example, convergence can be determined by comparing the current loss value with the loss values of the previous one or several computations; for example, if the differences between the current loss and each of the previous three loss values are all within a preset range, the method is considered to have converged.
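A self-contained sketch of one such local learning round is given below. The tiny one-layer model, the squared-error loss, and all variable names are assumptions chosen only to illustrate the loop structure described above (forward pass and gradient on the binarized parameters, gradient accumulation on the high-precision parameters, sign flips, and batching without replacement); it is not the implementation from the application.

```python
# Simplified sketch of one local BNN learning round on a child node.
import numpy as np

rng = np.random.default_rng(0)
N, d, bs, eta = 64, 8, 16, 0.1
X = rng.normal(size=(N, d))                       # local data set x_1..x_N
y = (X @ rng.normal(size=d) > 0).astype(float)    # labels y_1..y_N

W = rng.normal(size=d) * 0.1                      # high-precision parameters W_i
Wb = np.sign(W)                                   # binarized parameters W_i^b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

order = rng.permutation(N)                        # draw batches without replacement
for start in range(0, N, bs):
    idx = order[start:start + bs]
    xb, yb = X[idx], y[idx]
    z = xb @ Wb                                   # forward pass with binarized parameters
    pred = sigmoid(z)
    loss = np.mean((pred - yb) ** 2)              # lossfunc(L(W_i^b, x'), y')
    # Gradient of the loss with respect to the binarized parameters.
    grad = (2.0 / len(idx)) * xb.T @ ((pred - yb) * pred * (1.0 - pred))
    W -= eta * grad                               # accumulate the gradient on W_i
    Wb = np.sign(W)                               # sign changes flip the binary parameters
```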
可以看到,BNN相较于普通的神经网络,具有快速收敛,数据传输量小的特点。但是,目前尚未有将BNN使用在基于分布式机器学习系统的方案。而如果直接将BNN应用在现有的基于分布式学习的系统(如FL架构)中,由于所有数据传输都是以二值化的形式进行的,因此会导致整个系统的学习准确度过低而无法使用的情况。It can be seen that compared with ordinary neural networks, BNN has the characteristics of fast convergence and small data transmission. However, there is currently no solution to using BNN in distributed machine learning systems. However, if BNN is directly applied to the existing distributed learning-based system (such as FL architecture), since all data transmission is carried out in the form of binarization, the learning accuracy of the entire system will be too low and Unusable situation.
在本申请实施例提供的机器学习方法中,可以将上述BNN使用在诸如FL等基于分布式机器学习系统的学习框架下,并根据本申请实施例提供的机器学习方法,使得整个FL框架下的机器学习系统的数据传输压力得到显著缓解,同时结合不同场景的需求,提升整个机器学习系统下的学习效率。需要说明的是,在本申请中,各个子节点中的本地训练都可以进行基于二值化的神经网络模型进行本地训练。在不同的使用场景下,该神经网络模型可以灵活选取,比如,该神经网络模型可以具有全连接网络、卷积神经网络等 网络结构。In the machine learning method provided by the embodiment of the present application, the above-mentioned BNN can be used in a learning framework based on a distributed machine learning system such as FL, and according to the machine learning method provided by the embodiment of the present application, the entire FL framework can be used in the machine learning method. The data transmission pressure of the machine learning system is significantly relieved, and at the same time, the learning efficiency of the entire machine learning system is improved by combining the needs of different scenarios. It should be noted that, in this application, local training in each sub-node can be performed locally based on a binarized neural network model. In different usage scenarios, the neural network model can be selected flexibly. For example, the neural network model can have network structures such as a fully connected network and a convolutional neural network.
本申请实施例提供的机器学习方法,可以应用于包括3G/4G/5G/6G,或者卫星通信等无线通信系统中。The machine learning method provided by the embodiments of the present application can be applied to wireless communication systems including 3G/4G/5G/6G, or satellite communication.
其中,无线通信系统通常由小区组成,每个小区包含一个基站(Base Station,BS),基站向多个移动台(Mobile Station,MS)提供通信服务。其中基站包含BBU(Baseband Unit,中文:基带单元)和RRU(Remote Radio Unit,中文:远端射频单元)。BBU和RRU可以放置在不同的地方,例如:RRU拉远,放置于高话务量的区域,BBU放置于中心机房。BBU和RRU也可以放置在同一机房。BBU和RRU也可以为一个机架下的不同部件。The wireless communication system is usually composed of cells, each cell includes a base station (Base Station, BS), and the base station provides communication services to multiple mobile stations (Mobile Station, MS). The base station includes BBU (Baseband Unit, Chinese: Baseband Unit) and RRU (Remote Radio Unit, Chinese: Remote Radio Unit). The BBU and RRU can be placed in different places, for example, the RRU is far away and placed in an area with high traffic volume, and the BBU is placed in the central computer room. BBU and RRU can also be placed in the same computer room. The BBU and RRU can also be different components under one rack.
需要说明的是，本发明方案提及的无线通信系统包括但不限于：窄带物联网系统（Narrow Band-Internet of Things，NB-IoT）、全球移动通信系统（Global System for Mobile Communications，GSM）、增强型数据速率GSM演进系统（Enhanced Data rate for GSM Evolution，EDGE）、宽带码分多址系统（Wideband Code Division Multiple Access，WCDMA）、码分多址2000系统（Code Division Multiple Access，CDMA2000）、时分同步码分多址系统（Time Division-Synchronization Code Division Multiple Access，TD-SCDMA），长期演进系统（Long Term Evolution，LTE）以及下一代5G移动通信系统的三大应用场景eMBB，URLLC和eMTC。It should be noted that the wireless communication systems mentioned in the solutions of the present invention include but are not limited to: the Narrow Band-Internet of Things (NB-IoT) system, the Global System for Mobile Communications (GSM), the Enhanced Data rate for GSM Evolution (EDGE) system, the Wideband Code Division Multiple Access (WCDMA) system, the Code Division Multiple Access 2000 (CDMA2000) system, the Time Division-Synchronization Code Division Multiple Access (TD-SCDMA) system, the Long Term Evolution (LTE) system, and the three major application scenarios of the next-generation 5G mobile communication system: eMBB, URLLC, and eMTC.
在本示例中,基站是一种部署在无线接入网中为MS提供无线通信功能的装置。述基站可以包括各种形式的宏基站,微基站(也称为小站),中继站,接入点等。在采用不同的无线接入技术的系统中,具备基站功能的设备的名称可能会有所不同,例如,在LTE系统中,称为演进的节点B(evolved NodeB,eNB或者eNodeB),在第三代(3rd Generation,3G)系统中,称为节点B(Node B)等。为方便描述,本申请所有实施例中,上述为MS提供无线通信功能的装置统称为网络设备或基站或BS。In this example, the base station is a device deployed in a radio access network to provide a wireless communication function for an MS. The base stations may include various forms of macro base stations, micro base stations (also called small cells), relay stations, access points, and the like. In systems using different radio access technologies, the names of devices with base station functions may be different. For example, in LTE systems, it is called an evolved NodeB (evolved NodeB, eNB or eNodeB). In the 3rd Generation (3G) system, it is called a Node B (Node B) and so on. For convenience of description, in all the embodiments of this application, the above-mentioned apparatuses for providing wireless communication functions for MSs are collectively referred to as network equipment or base stations or BSs.
本发明方案中所涉及到的MS可以包括各种具有无线通信功能的手持设备、车载设备、可穿戴设备、计算设备或连接到无线调制解调器的其它处理设备。所述MS也可以称为终端(terminal),还MS可以是用户单元(subscriber unit)、蜂窝电话(cellular phone)、智能手机(smart phone)、无线数据卡、个人数字助理(Personal Digital Assistant,PDA)电脑、平板型电脑、无线调制解调器(modem)、手持设备(handset)、膝上型电脑(laptop computer)、机器类型通信(Machine Type Communication,MTC)终端等。The MS involved in the solution of the present invention may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem with wireless communication capabilities. The MS may also be referred to as a terminal (terminal), and the MS may be a subscriber unit (subscriber unit), a cellular phone (cellular phone), a smart phone (smart phone), a wireless data card, a personal digital assistant (Personal Digital Assistant, PDA) ) computer, tablet computer, wireless modem (modem), handheld device (handset), laptop computer (laptop computer), machine type communication (Machine Type Communication, MTC) terminal, etc.
请参考图4,为本申请实施例提供的一种机器学习系统的组成。如图4所示,该机器学习系统中可以包括中心节点,以及多个子节点(如子节点1-子节点N)。其中,子节点也可称为本地节点。中心节点可以与各个子节点通过有线或无线的方式进行通信。比如,中心节点可以接收各个子节点在上行传输通道中上传的本地训练结果。其中,该本地训练结果可以包括本地训练模型参数,或者本地训练模型本身。又如,中心节点可以将融合后的本地训练模型参数或者本地训练模型本身通过广播或者下行传输通道下发给各个子节点。为了便于对本申请实施例提供的方案进行说明,以下以在子节点和中心节点传输的数据对应于本地训练模型参数,以及融合训练模型参数为例。在不同的场景中,对于本地训练模型参数或者本地训练模型的传输,可以是通过传输各个参数对应的二值化参数实现的,也可以是通过直接传输各个参数对应的高精度参数实现的。需要说明的是,该本地训练模型参数可以包括参数可以为神经网络的权重、偏置等参数中第一项或多项,该本地训练模型参数也可以为它们对应的梯度。Please refer to FIG. 4 , which shows the composition of a machine learning system provided by an embodiment of the present application. As shown in FIG. 4 , the machine learning system may include a central node and multiple sub-nodes (eg, sub-node 1-sub-node N). The child node may also be called a local node. The central node can communicate with each sub-node in a wired or wireless manner. For example, the central node can receive the local training results uploaded by each sub-node in the uplink transmission channel. Wherein, the local training result may include parameters of the local training model, or the local training model itself. For another example, the central node may deliver the merged local training model parameters or the local training model itself to each sub-node through a broadcast or downlink transmission channel. In order to facilitate the description of the solutions provided by the embodiments of the present application, the following takes the data transmitted at the child node and the central node corresponding to the local training model parameters and the fusion training model parameters as an example. In different scenarios, the transmission of local training model parameters or local training model can be realized by transmitting the binarized parameters corresponding to each parameter, or by directly transmitting the high-precision parameters corresponding to each parameter. It should be noted that the parameters of the local training model may include the first item or multiple parameters of the parameters such as the weight and bias of the neural network, and the parameters of the local training model may also be their corresponding gradients.
示例性的,以中心节点为基站,与该中心节点结耦的子节点为手机为例。如图5所 示,该机器学习系统中可以包括一个基站,以及N个手机(如手机1-手机N)。N个手机与基站中可以预先存储有相同的基础训练模型。手机1可以采集对应场景的数据,由此形成对应的数据集1。手机1可以将该数据集1输入到基础训练模型中进行本地训练。可以理解的是,本申请实施例中,手机可以通过BNN进行本地训练。比如,手机1可以将该数据集1输入到基础训练模型中,按照高精度参数的机器学习方法,获取本地模型参数1。该本地模型参数1可以是高精度参数。在一些实现中,该本地模型参数1可以包括高精度的权重以及偏置。手机1可以基于高精度的权重和偏置进行反向推演,由此完成数据集1中一部分数据的学习。接着,手机1可以将对应的高精度的权重和偏置经过二值化转换,获取对应的二值化参数。在对于后续数据的学习过程中,手机1可以基于该二值化参数,进行训练学习。由于二值化参数的数据量显著的小于高精度的数据量,因此手机1可以快速地获取完成本地训练,以获取收敛后的权重和偏置。由于是基于二值化参数进行的本地训练,因此,手机1获取的权重和偏置结果可以为二值化的参数。Exemplarily, take the central node as the base station and the child node coupled with the central node as the mobile phone as an example. As shown in Figure 5, the machine learning system may include a base station and N mobile phones (such as mobile phone 1 - mobile phone N). The same basic training model may be pre-stored in the N mobile phones and base stations. The mobile phone 1 can collect data of the corresponding scene, thereby forming the corresponding data set 1 . Mobile phone 1 can input the data set 1 into the basic training model for local training. It can be understood that, in this embodiment of the present application, the mobile phone can perform local training through the BNN. For example, the mobile phone 1 can input the data set 1 into the basic training model, and obtain the local model parameter 1 according to the machine learning method of high-precision parameters. The local model parameter 1 may be a high precision parameter. In some implementations, the local model parameters 1 may include high precision weights and biases. The mobile phone 1 can perform reverse deduction based on high-precision weights and biases, thereby completing the learning of a part of the data in the data set 1. Next, the mobile phone 1 can perform binarization conversion on the corresponding high-precision weights and biases to obtain corresponding binarization parameters. In the learning process of the subsequent data, the mobile phone 1 can perform training and learning based on the binarization parameter. Since the data volume of the binarization parameters is significantly smaller than the high-precision data volume, the mobile phone 1 can quickly acquire and complete the local training to acquire the converged weights and biases. Since the local training is based on the binarized parameters, the weight and bias results obtained by the mobile phone 1 can be the binarized parameters.
类似于手机1中的处理,其他手机,如手机2-手机N也可分别进行类似的本地训练,以获得对应的二值化参数。本申请中,可以将各个手机在本地训练获取的二值化参数称为本地参数。比如,手机1经过1轮学习后获取的二值化的权重和偏置可以称为本地参数1。手机2经过1轮学习后获取的二值化的权重和偏置可以称为本地参数2。手机N经过1轮学习后获取的二值化的权重和偏置可以称为本地参数N。Similar to the processing in mobile phone 1, other mobile phones, such as mobile phone 2-mobile phone N, can also perform similar local training respectively to obtain corresponding binarization parameters. In this application, the binarized parameters obtained by local training of each mobile phone may be referred to as local parameters. For example, the binarized weights and biases obtained by mobile phone 1 after one round of learning can be called local parameters 1. The binarized weights and biases obtained by mobile phone 2 after one round of learning can be called local parameters 2. The binarized weights and biases obtained by the mobile phone N after one round of learning can be called local parameters N.
各个手机可以将其对应的本地参数分别发送给基站。基站可以将获取的N个本地参数(如本地参数1-本地参数N)进行融合,以获取归一的融合参数。接着,基站可以将该融合参数分别下发给各个手机,手机在接收到融合参数之后可以据此更新本地的基础训练模型,并进行下一轮学习或者直接用于实际场景中的数据预测。Each mobile phone can send its corresponding local parameters to the base station respectively. The base station may fuse the acquired N local parameters (eg, local parameter 1-local parameter N) to obtain a normalized fusion parameter. Then, the base station can distribute the fusion parameters to each mobile phone respectively. After receiving the fusion parameters, the mobile phone can update the local basic training model accordingly, and perform the next round of learning or directly use it for data prediction in actual scenarios.
结合图4,以下以机器学习系统中包括3个子节点,对数据在子节点与中心节点中的处理与传输进行示例性的说明。With reference to FIG. 4 , the following describes the processing and transmission of data in the sub-nodes and the central node by including three sub-nodes in the machine learning system.
请参考图6。在中心节点中可以设置有中心融合模块,在各个子节点中可以设置有学习模块和本地融合模块。Please refer to Figure 6. A central fusion module may be set in the central node, and a learning module and a local fusion module may be set in each sub-node.
以子节点1为例。子节点在执行本申请实施例提供的机器学习方法时,其中的学习模块可以用于对数据集进行本地训练,以获取二值化的本地参数。子节点1可以将该本地参数发送给中心节点。类似的,子节点2和子节点3也可以将其分别对应的本地参数发送给中心节点。中心节点中的中心融合模块,可以用于将接收到的所有本地参数进行融合,以获取融合参数。接着,中心节点可以将该融合参数分别下发给子节点1-子节点3。在子节点1中,本地融合模块可以用于根据接收到的融合参数,对本地训练模型进行更新,以获取基于融合参数的本地训练模型。类似的,在子节点2中,本地融合模块可以用于根据接收到的融合参数,对本地训练模型进行更新,以获取基于融合参数的本地训练模型。在子节点3中,本地融合模块可以用于根据接收到的融合参数,对本地训练模型进行更新,以获取基于融合参数的本地训练模型。Take child node 1 as an example. When the child node executes the machine learning method provided in the embodiment of the present application, the learning module in the child node may be used to perform local training on the data set to obtain the binarized local parameters. Subnode 1 can send the local parameter to the central node. Similarly, the child node 2 and the child node 3 may also send their corresponding local parameters to the central node. The central fusion module in the central node can be used to fuse all received local parameters to obtain fusion parameters. Next, the central node may deliver the fusion parameter to the sub-node 1 to the sub-node 3 respectively. In sub-node 1, the local fusion module can be used to update the local training model according to the received fusion parameters, so as to obtain a local training model based on the fusion parameters. Similarly, in sub-node 2, the local fusion module can be used to update the local training model according to the received fusion parameters, so as to obtain the local training model based on the fusion parameters. In sub-node 3, the local fusion module can be used to update the local training model according to the received fusion parameters, so as to obtain the local training model based on the fusion parameters.
在一些实现方式中,该本地融合模块可以是不包括在子节点中,而是独立的模块。例如,结合图7,在本地融合模块独立于各个子节点时,则中心节点可以将融合参数只发送给本地融合模块,本地融合模块可以用于对本地训练模型进行更新,并将更新后的本地训练模型分别下发给子节点1-子节点3。由此,可以降低对子节点的性能要求,同时由于中心节点只需要将融合参数发送给本地融合模块,因此可以降低中心节点的信令开销。In some implementations, the local fusion module may be an independent module that is not included in the child node. For example, referring to Fig. 7, when the local fusion module is independent of each sub-node, the central node can only send the fusion parameters to the local fusion module, and the local fusion module can be used to update the local training model, and the updated local The training model is distributed to child node 1 to child node 3 respectively. In this way, the performance requirements for the sub-nodes can be reduced, and at the same time, since the central node only needs to send the fusion parameters to the local fusion module, the signaling overhead of the central node can be reduced.
在本申请的另一些实现方式中,本地融合模块也可是设置在部分子节点中的。比如,结合图8。其中,子节点3中集成有本地融合模块,而子节点1和子节点2的本地融合模块可以独立于子节点而单独设置的。在该架构下,中心节点可以在获取融合参数后,分别将该融合参数发送给与子节点1和子节点2对应的本地融合模块,以及子节点3。这样,子节点1和子节点2对应的本地融合模块可以将基于融合参数更新的本地训练模型下发给子节点1和子节点2。而对于子节点3,可以根据接收到的融合参数,使用其中集成的本地融合模块更新本地训练模型,由此获取更新后的本地训练模型。In other implementation manners of the present application, the local fusion module may also be set in some sub-nodes. For example, in conjunction with Figure 8. Wherein, the local fusion module is integrated in the sub-node 3, and the local fusion modules of the sub-node 1 and the sub-node 2 can be set independently of the sub-nodes. Under this architecture, after acquiring the fusion parameters, the central node can send the fusion parameters to the local fusion modules corresponding to the sub-node 1 and the sub-node 2, and the sub-node 3 respectively. In this way, the local fusion modules corresponding to the sub-node 1 and the sub-node 2 can deliver the local training model updated based on the fusion parameters to the sub-node 1 and the sub-node 2. As for the sub-node 3, the local fusion module integrated therein can be used to update the local training model according to the received fusion parameters, thereby obtaining the updated local training model.
理解的是,在本示例中示出如图6、图7以及图8的及其学习系统的组成仅为一种示例,在本申请的另一些实现中,该系统中也可以包括多个独立配置的本地融合模块。比如,以在系统中配置有5个子节点(如子节点1-子节点5),以及3个本地融合模块(如本地融合模块1-本地融合模块3)为例。在一些场景下,本地融合模块1可以为子节点1和子节点2提供本地融合服务,本地融合模块2可以为子节点3和子节点4提供本地融合服务,本地融合模块3可以为子节点5提供本地融合服务。在另一些场景下,本地融合模块与子节点的对应关系也可以进行重新配置,比如,本地融合模块1可以为子节点1和子节点3提供本地融合服务,本地融合模块2可以为子节点2、子节点5提供本地融合服务,本地融合模块3可以为子节点4提供本地融合服务。当然,在一些场景下,也可以通过3个本地融合模块中的一个或部分向子节点提供本地融合服务,比如本地融合模块1可以为子节点1和子节点3提供本地融合服务,本地融合模块2可以为子节点2、子节点5以及子节点4提供本地融合服务,本地融合模块3则可以处于休眠等不工作的状态。It should be understood that the composition of the learning system shown in FIG. 6 , FIG. 7 and FIG. 8 in this example is only an example, and in other implementations of the present application, the system may also include multiple independent Configured local fusion module. For example, take the system configured with 5 sub-nodes (eg, sub-node 1-sub-node 5) and 3 local fusion modules (eg, local fusion module 1-local fusion module 3) as an example. In some scenarios, local fusion module 1 can provide local fusion services for sub-node 1 and sub-node 2, local fusion module 2 can provide local fusion services for sub-node 3 and sub-node 4, and local fusion module 3 can provide local fusion services for sub-node 5 Fusion Services. In other scenarios, the corresponding relationship between the local fusion module and the child nodes can also be reconfigured. Sub-node 5 provides local fusion service, and local fusion module 3 can provide local fusion service for sub-node 4. Of course, in some scenarios, one or part of the three local fusion modules can also provide local fusion services to sub-nodes. For example, local fusion module 1 can provide local fusion services for sub-node 1 and sub-node 3, and local fusion module 2 Sub-node 2, sub-node 5, and sub-node 4 can be provided with local fusion services, and local fusion module 3 can be in a dormant state such as sleep.
为了便于说明,以下以本地融合模块集成在子节点中为例。图9示出了本申请实施例提供的一种机器学习方法的逻辑示意。如图9所示,该方法可以包括:For the convenience of description, the following takes the integration of the local fusion module in the sub-node as an example. FIG. 9 shows a logical schematic diagram of a machine learning method provided by an embodiment of the present application. As shown in Figure 9, the method may include:
S901、子节点1进行本地学习。S901, the child node 1 performs local learning.
S902、子节点1获取本地参数。S902, the child node 1 acquires local parameters.
S903、子节点1将本地参数发送给中心节点。S903, the child node 1 sends the local parameter to the central node.
可以理解的是,结合上述说明,对于机器学习系统中的其他子节点,也可分别执行上述S901-S903,这样中心节点就可以获取N个本地参数。其中,在一些实现中,该本地参数可以为二值化的本地参数。It can be understood that, in combination with the above description, for other sub-nodes in the machine learning system, the above-mentioned S901-S903 can also be executed respectively, so that the central node can obtain N local parameters. Wherein, in some implementations, the local parameter may be a binarized local parameter.
S904、中心节点对N个本地参数进行融合。S904, the central node fuses the N local parameters.
S905、中心节点获取融合参数。S905, the central node obtains fusion parameters.
S906、中心节点将融合参数发送给子节点1。S906, the central node sends the fusion parameter to the child node 1.
S907、子节点1根据融合参数,更新本地模型参数。S907, the child node 1 updates the local model parameters according to the fusion parameters.
可以理解的是,对于机器学习系统中的其他子节点,中心节点也可对应执行上述S906,而对应的子节点也可执行上述S907,以便对其本地训练模型进行更新。It can be understood that, for other sub-nodes in the machine learning system, the central node can also execute the above S906 correspondingly, and the corresponding sub-nodes can also execute the above S907, so as to update its local training model.
可以看到,本申请实施例提供的机器学习方法,由于子节点可以将二值化的本地参数发送给中心节点,而不需要将具有较大数据量的高精度本地参数发送给中心节点,因此能够显著地降低子节点与中心节点之间的通信压力。由于传输的数据量很少,因此耗时也相应降低,由此能够提升本地训练在整个学习时长的占比,由此提高学习效率。另外,在本申请的一些实现方式中,中心节点可以通过广播的方式将融合参数下发给各个子节点,这样中心节点就可以不需一一向各个子节点发送融合参数,由此节省中心节点的信令开销。It can be seen that, in the machine learning method provided by the embodiments of the present application, since the sub-nodes can send the binarized local parameters to the central node, it is not necessary to send the high-precision local parameters with a large amount of data to the central node, so It can significantly reduce the communication pressure between the child nodes and the central node. Since the amount of transmitted data is small, the time-consuming is correspondingly reduced, which can increase the proportion of local training in the entire learning time, thereby improving the learning efficiency. In addition, in some implementations of the present application, the central node can send the fusion parameters to each sub-node by broadcasting, so that the central node does not need to send the fusion parameters to each sub-node one by one, thereby saving the central node signaling overhead.
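As a schematic illustration of steps S901 to S907, the sketch below runs one round with an in-memory exchange between objects standing in for the child nodes and the central node; the class names, the stubbed local learning, and the simple averaging used for the local update are assumptions for illustration, since the specific fusion rules depend on the mode used.

```python
# Schematic one-round message flow for S901-S907 (illustrative only).
import numpy as np

class ChildNode:
    def __init__(self, dim, rng):
        self.W = rng.normal(size=dim)            # high-precision local parameters
    def local_learning(self):                    # S901/S902: local BNN learning (stubbed)
        return np.sign(self.W)                   # binarized local parameters
    def update(self, fused):                     # S907: fuse received parameters locally
        self.W = 0.5 * (self.W + fused)          # assumed local fusion rule

class CentralNode:
    def fuse(self, local_params):                # S904/S905: fuse N local parameters
        return np.mean(local_params, axis=0)

rng = np.random.default_rng(1)
children = [ChildNode(4, rng) for _ in range(3)]
center = CentralNode()
uploads = [c.local_learning() for c in children]  # S903: first messages (binarized)
fused = center.fuse(uploads)                      # S905: fusion parameter
for c in children:                                # S906: second message to each child
    c.update(fused)                               # S907: update local model parameters
```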
It should be noted that, in order to adapt to the learning requirements of different scenarios, the embodiments of this application further provide three modes for the method shown in FIG. 9, so that the machine learning system can select the corresponding mode in scenarios with different requirements on learning efficiency and learning accuracy, thereby achieving fast convergence or high-accuracy learning.
Mode 1: binarized parameters are used when the model parameters are uploaded, and high-precision parameters are used when the central node delivers the model.
Because high-precision parameters are used for the downlink, the child nodes can update and obtain the local training model more accurately. Because binarized parameters are used for the uplink, the overall learning efficiency remains significantly higher than that of the existing FL architecture in which high-precision parameters are used in both uplink and downlink. This mode can be applied in scenarios that have requirements on both learning efficiency and learning accuracy.
Mode 2: binarized parameters are used when the model parameters are both uploaded and delivered.
Because both the uploaded and the delivered model parameters are binarized, the data transmission pressure on the system is the smallest. This mode can be applied in scenarios with high requirements on learning efficiency.
Mode 3: high-precision parameters are used when the model parameters are both uploaded and delivered.
Because high-precision parameters are used for the downlink, the child nodes can update and obtain the local training model more accurately. This mode can be applied in scenarios with high requirements on learning accuracy.
It should be understood that, with reference to FIG. 6 and FIG. 9, in each round of iteration, each child node trains a local model based on the existing model and local data and uploads it to the central server. For each child node, let W_i^b (i = 1, 2, ..., M, where M is the number of child nodes) denote all the binarized parameters of the i-th child node. After the central server receives the local parameters W_1^b, ..., W_M^b uploaded by all child nodes, it executes the central model fusion method to obtain the central-side parameters, and delivers them to all child nodes in the form of a broadcast. Upon receiving the central-side parameters, a child node immediately uses the local model fusion method to fuse them with its local high-precision parameters W_i. After fusion, the child node performs binary quantization again to obtain a new W_i^b and then starts the next round of local training. The central model fusion method and the local model fusion method differ according to the parameter transmission mode (for example, mode 1 to mode 3).
The specific calculation processes of central fusion and local fusion under each mode are described below.
1. For central fusion in mode 1:
After the central server receives the binarized parameters uploaded by all child nodes, it computes their weighted average to obtain the central-side parameter. It should be noted that when the amount of data on every node is equal, this fusion formula reduces to the simple mean of the binarized parameters over the M nodes. In this case the value of the central-side parameter itself has no direct physical meaning, but when M is known it reflects, for a given parameter position, how many nodes have a positive or a negative high-precision parameter. For example, when the number of nodes is M = 100 and the value received from the central server at some position is -0.2, this means that at this position 60 child nodes have a negative high-precision parameter and uploaded -1, while the remaining 40 child nodes have a positive high-precision parameter and uploaded +1. When the amount of data on the nodes is unequal, because the system contains many nodes, the central-side parameter can be regarded as reflecting the proportion of positive or negative accumulated gradients over the data of all nodes. Considering that the local data of node i accounts for a certain fraction of the total data, each node can be treated as if there were an equivalent total number of nodes M' with equal data amounts per node. Although the equivalent M' then differs from node to node, this does not affect the specific implementation details. To simplify the calculation and the explanation of the derivation of the fusion formula, the following description assumes that the data sets of all child nodes have the same size. It can be understood that in other implementations of this application the sizes of the child-node data sets may differ; when the sizes are inconsistent, the calculation process is similar and is not repeated here.
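By way of illustration only, a minimal sketch of this mode 1 central fusion is given below. The published fusion formula appears as an image, so the exact notation is not reproduced; the sketch assumes weights proportional to each node's data-set size, reducing to a plain mean when the sizes are equal, and includes the -0.2 worked example from the text as a check. All names are illustrative.

```python
import numpy as np

def central_fusion_mode1(binary_params, data_sizes=None):
    """Mode 1 central fusion: weighted average of the +/-1 uploads.

    binary_params: list of M arrays with entries in {-1, +1}.
    data_sizes: optional per-node data-set sizes; equal sizes assumed if None.
    """
    W = np.stack(binary_params)              # shape (M, num_params)
    if data_sizes is None:
        return W.mean(axis=0)                # equal-data case: simple mean
    w = np.asarray(data_sizes, dtype=float)
    w = w / w.sum()                          # weights n_i / n
    return np.tensordot(w, W, axes=1)

# Worked check of the example above: M = 100, 60 nodes upload -1 and 40 upload +1
uploads = [np.array([-1.0])] * 60 + [np.array([+1.0])] * 40
print(central_fusion_mode1(uploads))         # -> [-0.2]
```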
2. For local fusion in mode 1:
After receiving the central-side parameter, the local fusion module performs the local fusion calculation. Because the high-precision parameters of all child nodes are accumulated from their local data sets, they are strongly correlated; and because the number of nodes is large, it can be assumed that the high-precision parameters of all nodes are approximately samples of the same normal distribution. It is further assumed that the covariance matrix of this normal distribution is diagonal, that is, the values of parameters at different positions are uncorrelated. Therefore, each child-node parameter can be assumed to satisfy W_{i,j} ~ N(μ_j, σ_j), where W_{i,j} denotes the value of the j-th parameter of the i-th node. The problem then becomes: under the above known conditions, how to estimate the value of μ_j from W_{i,j} and the corresponding central-side value, and how to use the estimated value, taken as an estimate of the mean of W_{k,j} (k = 1, 2, ..., M) over all nodes, to replace the original W_{i,j}. Note that the local high-precision parameters of the nodes are very likely to differ, so when the estimation is carried out completely independently the results will also differ. In the analysis and solution of this problem, for brevity, all subscripts are omitted and every symbol refers to the quantity associated with the j-th parameter of the i-th node.
From the way the central-side parameter is computed in this mode, it can be found that the number of child nodes with W > 0 can be deduced from the received central-side value (with equal data amounts it equals M(1 + v)/2, where v is the received value at this position). First, assume that the local W > 0, and let θ, with 0 ≤ θ ≤ 1, denote the proportion of nodes, among all nodes other than this one, whose W > 0. Then, among the remaining M - 1 nodes, there are (M - 1)θ nodes with W_{k,j} > 0. This is equivalent to the following situation: among a group of M observation samples of some normal distribution N(μ, σ), there are (M - 1)θ positive observations of unknown specific value, (M - 1)(1 - θ) negative observations of unknown specific value, and one known positive observation W. The mean μ of this normal distribution can then be regarded as the mean, over all child nodes, associated with the local observation W > 0, which to some extent reflects the magnitude of the global accumulated gradient at this position. Therefore, the maximum-likelihood principle can be used to find the maximum-likelihood estimate of this mean.
The problem is constructed as the maximum-likelihood estimation shown in formula image appb-000023, in which the referenced function, defined for x ~ N(0, 1), denotes the upper quantile (tail) function of the standard normal distribution. Taking the logarithm of this expression yields the equivalent problem shown in formula image appb-000024. Performing the parameter substitution y = μ/σ and x = W/σ, and noting that the specific value of W does not affect the optimization objective, the problem can be transformed again into the form shown in formula image appb-000025.
To solve the above optimization problem, x is optimized first. Because of the constraint W > 0, there must be x > 0. Extracting all terms related to x yields the sub-problem shown in formula image appb-000026. It can easily be solved by taking the derivative, giving the closed-form value of x shown in formula image appb-000027. Substituting this value back into the original problem, the problem can be equivalently transformed into the form shown in formula image appb-000028.
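For readability, one way of writing out the estimation problem described above is sketched below. The published formulas appear only as images, so this reconstruction is assembled from the verbal description alone; the exact notation, constant terms and normalization of the original may differ.

```latex
% Sketch of the likelihood described above (an assumption, not a copy of the
% published formula images). One exact observation W>0, (M-1)\theta observations
% known only to be positive and (M-1)(1-\theta) known only to be negative,
% all i.i.d. N(\mu,\sigma^2), with Q(\alpha)=P\{x>\alpha \mid x\sim N(0,1)\}:
\max_{\mu,\,\sigma>0}\;
  \frac{1}{\sigma}\,\phi\!\left(\frac{W-\mu}{\sigma}\right)
  Q\!\left(-\frac{\mu}{\sigma}\right)^{(M-1)\theta}
  Q\!\left(\frac{\mu}{\sigma}\right)^{(M-1)(1-\theta)} .
% Taking logarithms, dropping terms that do not depend on the variables, and
% substituting y=\mu/\sigma, x=W/\sigma gives
\max_{x>0,\;y}\;\;\ln x-\frac{(x-y)^2}{2}
  +(M-1)\theta\,\ln Q(-y)+(M-1)(1-\theta)\,\ln Q(y) .
% Optimizing over x alone (setting 1/x-(x-y)=0) gives
\hat{x}=\frac{y+\sqrt{y^{2}+4}}{2},
% which, substituted back, leaves a one-dimensional objective in y whose maximizer
% \hat{y}(\theta) is the curve that the text fits with a logarithmic function.
```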
The optimization objective is a function of y that first increases and then decreases. The value of this function depends only on θ and M. Since M can be known in advance, the curve relating the optimal solution ŷ to θ can be drawn in advance, this relationship can be fitted with a specific function, and the fit can be stored locally, which reduces the complexity of solving the optimization problem. In this embodiment, a logarithmic function containing a constant c (see formula images appb-000030 and appb-000031) is used to perform a least-squares fit of this curve, where θ takes values in [0, 1] and c > 0 is imposed to ensure that the function is defined. This fitted function can then serve, for the given M, as an approximate expression of ŷ as a function of θ, and is stored locally. The solution of the original problem when W > 0 is then obtained as shown in formula image appb-000033.
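As an illustration of this offline fitting step, the sketch below numerically maximizes the reduced objective from the reconstruction above for M = 100 and fits the resulting ŷ(θ) curve by least squares. Both the use of that reduced objective and the fitted form a·ln(θ + c) + b are assumptions, since the original expressions are given only as formula images; all names and starting values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar, curve_fit
from scipy.stats import norm

M = 100

def objective(y, theta):
    # Reduced one-dimensional objective in y (assumed form, see the sketch above).
    x_hat = (y + np.sqrt(y**2 + 4)) / 2
    logQ = norm.logsf                     # log of the upper-tail probability Q(.)
    return (np.log(x_hat) - (x_hat - y)**2 / 2
            + (M - 1) * theta * logQ(-y)
            + (M - 1) * (1 - theta) * logQ(y))

thetas = np.linspace(0.001, 0.999, 200)
y_opt = np.array([minimize_scalar(lambda y: -objective(y, t),
                                  bounds=(-10, 10), method="bounded").x
                  for t in thetas])

# Least-squares fit with an assumed logarithmic form a*ln(theta + c) + b, c > 0.
def log_model(theta, a, b, c):
    return a * np.log(theta + c) + b

(a, b, c), _ = curve_fit(log_model, thetas, y_opt, p0=(1.0, 0.0, 0.1),
                         bounds=([-np.inf, -np.inf, 1e-6], np.inf))
print(a, b, c)   # coefficients stored locally and reused during local fusion
```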
Considering the case W < 0 and carrying out the same derivation, the solution of the original problem is obtained as shown in formula image appb-000034.
In the local fusion process, for each W, the value of ŷ is first computed from the logarithmic approximate expression obtained above; the corresponding intermediate quantities (formula images appb-000036 to appb-000038) are then computed in turn; and finally the resulting value (formula image appb-000039) is used to replace W. Here α > 1 denotes a magnification factor, a parameter selected in advance, which generally takes a value between 1.5 and 2.5. (For the sake of system stability, in order to make the system converge more stably, it should be ensured that, as the computation proceeds, W always tends to move away from 0, but not by too much. Since the absolute magnitude of the high-precision parameters carries little meaning while their relative magnitude is important, all parameters can be scaled up proportionally and then clipped.) clamp(·) denotes the clipping operation: clamp(x, a, b) = a when x < a, clamp(x, a, b) = x when a ≤ x ≤ b, and clamp(x, a, b) = b when x > b.
Finally, the local fusion formula is as shown in formula image appb-000040. According to this local fusion formula, the local fusion module can update the local training model based on the received fusion parameters.
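A per-parameter sketch of this mode 1 local fusion is given below. Because the intermediate formulas are published only as images, several details are assumptions made for illustration: how θ is recovered from the broadcast average, the closed form of x̂ taken from the reconstruction above, the handling of the negative case by symmetry, and the clipping range [-1, 1]. The coefficients a, b, c are those of the locally stored logarithmic fit.

```python
import numpy as np

def local_fusion_mode1(W_local, w_avg, a, b, c, M=100, alpha=2.0, clip=1.0):
    """Mode 1 local fusion for a single parameter position (illustrative sketch).

    W_local: local high-precision value W at this position.
    w_avg:   broadcast central average of the +/-1 uploads at this position.
    a, b, c: coefficients of the stored fit y_hat = a*ln(theta + c) + b.
    """
    n_pos = M * (1.0 + w_avg) / 2.0               # nodes that uploaded +1 (equal-data case)
    if W_local > 0:
        theta = (n_pos - 1) / (M - 1)             # positive fraction among the other nodes
    else:
        theta = ((M - n_pos) - 1) / (M - 1)       # negative case, by symmetry (assumed)
    theta = min(max(theta, 0.0), 1.0)             # guard against edge cases
    y_mag = a * np.log(theta + c) + b
    y_hat = y_mag if W_local > 0 else -y_mag
    x_hat = (abs(y_hat) + np.sqrt(y_hat ** 2 + 4.0)) / 2.0   # from the derivation sketch
    sigma_hat = abs(W_local) / x_hat              # since x = W / sigma
    mu_hat = y_hat * sigma_hat                    # estimated global mean at this position
    return float(np.clip(alpha * mu_hat, -clip, clip))       # magnify by alpha, then clamp
```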
3. For central fusion in mode 2:
After the central server receives the binarized parameters uploaded by all child nodes, it sums them and takes the sign, obtaining the central-side parameter as the sign of the sum; if this result happens to be 0, either -1 or +1 is delivered at random. In this case, the meaning of the central-side value is as follows: for a given parameter, if the binarized parameter is positive over a larger proportion of all node data, the value +1 is taken, and otherwise -1 is taken. At this time, this parameter only reflects a rough overall trend and carries relatively little information.
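A minimal sketch of this sign-of-sum fusion, including the random tie-break described above, is given below; function and variable names are illustrative.

```python
import numpy as np

def central_fusion_mode2(binary_params, rng=np.random.default_rng()):
    """Mode 2 central fusion: sum the +/-1 uploads and keep only the sign.
    Positions where the sum is exactly 0 are broken randomly with -1 or +1."""
    s = np.sum(np.stack(binary_params), axis=0).astype(float)
    out = np.sign(s)
    ties = (out == 0)
    out[ties] = rng.choice([-1.0, 1.0], size=int(ties.sum()))
    return out
```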
4. For local fusion in mode 2:
After receiving the broadcast value, the child node performs the local fusion calculation using a simple linear fusion method (formula images appb-000044 and appb-000045), where sign(·) denotes the sign function and β, a parameter selected in advance, lies between 0 and 1.
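The exact linear combination is given only as a formula image; the sketch below shows one plausible reading that is consistent with the description (a β-weighted blend of the broadcast value with the sign of the local high-precision parameter). The specific form is an assumption for illustration only.

```python
def local_fusion_mode2(W_local, w_broadcast, beta=0.3):
    """Mode 2 local fusion (assumed form): blend the broadcast +/-1 value with
    the sign of the local high-precision parameter, weighted by beta in (0, 1).
    beta = 0.3 is the value used in the simulations reported later."""
    local_sign = 1.0 if W_local >= 0 else -1.0
    return (1.0 - beta) * local_sign + beta * w_broadcast
```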
5. For central fusion in mode 3:
After the central server receives the high-precision parameters uploaded by all child nodes, it averages them to obtain the central-side parameter. In this case, the meaning of the central-side parameter is the weighted-average accumulated gradient calculated by all child nodes from their local data sets.
6. For local fusion in mode 3:
After receiving the central-side parameter, the child node directly uses it to update its local high-precision parameters, that is, the local high-precision parameters are replaced by the received value.
This mode is basically the same as the fusion method under the traditional FL framework; the only difference is that the local model is a BNN, so the complexity of the forward computation during training and of inference is lower than that of an ordinary neural network.
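For completeness, a minimal sketch of the two mode 3 steps described above (a plain average on the central side, direct replacement on the child side); parameter names are illustrative.

```python
import numpy as np

def central_fusion_mode3(high_precision_params):
    """Mode 3 central fusion: plain average of the high-precision uploads."""
    return np.mean(np.stack(high_precision_params), axis=0)

def local_fusion_mode3(W_local, W_central):
    """Mode 3 local fusion: the broadcast central parameters simply replace
    the local high-precision parameters."""
    return np.array(W_central, copy=True)
```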
Based on the above description of items 1 to 6, it can be seen that, for different transmission modes, the corresponding local fusion and central fusion methods can be used so that the corresponding learning computation proceeds smoothly.
As explained above, in different implementation scenarios, any one of mode 1, mode 2 or mode 3 can be selected as the transmission mode in the method shown in FIG. 9, so as to obtain the corresponding beneficial effects. It should be noted that, regardless of whether mode 1, mode 2 or mode 3 is adopted, since the child nodes perform their local training with a BNN, results can be obtained through iteration faster than with the existing FL architecture.
In other implementations of this application, the method shown in FIG. 9 can also be implemented by combining mode 1 and mode 2, or mode 1 and mode 3, or mode 2 and mode 3, or mode 1, mode 2 and mode 3.
For example, mode 2 can be used at the beginning of learning. In this way, although training results of high accuracy cannot be obtained, the first few rounds of learning can converge quickly. It can be understood that, in general, a learning process requires multiple rounds of learning to complete. During the first few rounds, the parameters in the model will very likely change in the subsequent learning process, so the accuracy requirements for these rounds are not high; if their convergence speed can be improved, this contributes considerably to the overall learning efficiency. When the accuracy reaches a certain level, mode 1 can be used to continue learning, thereby appropriately improving the accuracy of the parameters. When the accuracy has improved further, mode 3 can be used to continue learning, thereby obtaining the most accurate result.
As an example, Table 1 shows a correspondence between accuracy and mode selection.
Table 1
Accuracy                      Parameter transmission mode
Accuracy < 0.65               Mode 2
0.65 ≤ Accuracy < 0.8         Mode 1
Accuracy ≥ 0.8                Mode 3
As shown in Table 1, when the accuracy is less than 0.65, the central node determines to continue learning in mode 2. When the accuracy lies between 0.65 and 0.8, the central node can determine to learn in mode 1. When the accuracy is greater than or equal to 0.8, the central node can determine to learn in mode 3.
The accuracy may be calculated by the central node from the accuracies uploaded by the child nodes. For example, the central node may compute the accuracy of the system according to the relationship shown in formula image appb-000052 and determine the transmission mode according to the correspondence shown in Table 1. It should be noted that the values 0.65 and 0.8 in Table 1 are merely examples of threshold settings; in other implementations of this application, these thresholds may be set to other values or adjusted flexibly according to the environment.
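For illustration, a sketch of this accuracy-driven mode selection is given below. The aggregation of the per-node accuracies is shown in the source only as a formula image, so a (data-size-weighted) mean is assumed here; the 0.65 and 0.8 thresholds are the example values from Table 1, and all names are illustrative.

```python
def select_mode_by_accuracy(node_accuracies, node_data_sizes=None,
                            low=0.65, high=0.8):
    """Pick the parameter transmission mode from the reported accuracies."""
    if node_data_sizes is None:
        system_acc = sum(node_accuracies) / len(node_accuracies)
    else:
        total = sum(node_data_sizes)
        system_acc = sum(a * n for a, n in zip(node_accuracies, node_data_sizes)) / total
    if system_acc < low:
        return 2   # fast convergence: binarized uplink and downlink
    if system_acc < high:
        return 1   # binarized uplink, high-precision downlink
    return 3       # high-precision uplink and downlink
```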
It should be noted that each child node can verify the local training model corresponding to its local parameters with the test set stored on it, thereby obtaining the corresponding accuracy, and send this accuracy to the central node. In other implementations of this application, the verification operation for obtaining the accuracy may also be completed at the central node. For example, a training model corresponding to each local node may be stored in the central node. After receiving the local parameters sent by a local node, the central node can update the training model according to these local parameters and, based on the updated training model and the test set stored in the central node, verify the accuracy corresponding to these local parameters, thereby obtaining the corresponding accuracy. Similarly, for the local parameters uploaded by other nodes, the central node can also obtain the corresponding accuracies. The central node can then calculate the system accuracy based on the accuracies corresponding to the individual local parameters, and further determine the data transmission mode.
In other implementations of this application, the accuracy may also be obtained by the central node through verification based on the fusion parameters after central fusion and the training model, in combination with a test set or a validation data set. In different implementations, the method for determining the accuracy can be chosen flexibly, which is not limited in the embodiments of this application.
In the above description, selecting the transmission mode according to accuracy is used as an example. In other implementations of this application, the central node may also determine the transmission mode by other methods. For example, the central node may determine the transmission mode according to the iteration round number N. Table 2 shows one possible correspondence between the iteration round number N and the transmission mode.
Table 2
Iteration round number N     Parameter transmission mode
N ≤ 5                        Mode 2
5 < N ≤ 50                   Mode 1
N > 50                       Mode 3
As shown in Table 2, when the number of iteration rounds is within 5, the central node can determine that the current learning has a higher demand for improving convergence speed, and mode 2 can be used. When the number of iteration rounds is greater than 5 and within 50, the central node can determine that the accuracy needs to be appropriately improved in the current learning, and mode 1 is then used. When the number of iteration rounds is greater than 50, the central node can consider that the learning is about to end and the parameters need to be transmitted with the highest precision, that is, mode 3 is used.
It should be noted that, in some implementations of this application, when switching among the three modes occurs, the central node may instruct the child nodes to adjust the parameter transmission mode. For example, a parameter transmission mode field may be added for this indication; the three modes can be indicated with 2 bits, e.g., 00 indicates that the transmission mode for the next round is mode 1, 01 indicates mode 2, and 10 indicates mode 3. The parameter transmission mode field may be delivered together with the central fusion model, or delivered over a dedicated control channel.
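A minimal sketch of the 2-bit mode field described above, using the example mapping from the text; the handling of the unused codepoint 11 and all names are illustrative.

```python
# 2-bit encoding of the parameter transmission mode field
# (00 -> mode 1, 01 -> mode 2, 10 -> mode 3; 11 unused in the example mapping).
MODE_TO_BITS = {1: 0b00, 2: 0b01, 3: 0b10}
BITS_TO_MODE = {v: k for k, v in MODE_TO_BITS.items()}

def encode_mode_field(mode: int) -> int:
    return MODE_TO_BITS[mode]

def decode_mode_field(bits: int) -> int:
    return BITS_TO_MODE[bits & 0b11]

assert decode_mode_field(encode_mode_field(3)) == 3
```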
From the above description of the solution, it can be understood that, in the machine learning system shown in FIG. 4 to FIG. 8, using the machine learning method shown in FIG. 9 can significantly increase the learning efficiency of the system. Combined with the selective use of mode 1, mode 2 and mode 3, the corresponding mode can be adaptively selected according to the different requirements on accuracy and convergence speed in the current learning process, so as to obtain the optimal learning result.
To illustrate the effects that can be achieved by the solutions provided in the embodiments of this application, these effects are described below by way of example in combination with simulation data.
Taking MNIST handwritten digit recognition as an example, a 4-layer convolutional neural network consisting of two convolutional layers and two fully connected layers is used in the simulation. The training set of the MNIST data set is evenly distributed over 100 child nodes; each node has 600 pairs of data in total, with 60 pairs of each class, and the test set is kept only on the central server. The final training-set results are the mean of the results of the 100 nodes, and the test-set results are obtained by the central server based on the binarized local parameters. High-precision parameters are quantized with 32 bits.
A 4-layer convolutional neural network is used for training, with the following structure: 3*3*16 convolutional layer, normalization layer, 2*2 max-pooling layer, tanh activation; 3*3*16 convolutional layer, normalization layer, 2*2 max-pooling layer, tanh activation; 784*100 fully connected layer, normalization layer, tanh activation; 100*10 fully connected layer, softmax activation; finally the cross-entropy loss function is used. The Adam gradient update method is used; the initial value of the learning rate η is 0.05, which is subsequently reduced every 30 iterations to 0.02, 0.01, 0.005 and 0.002 in turn.
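As an illustration of the network just described, a sketch in PyTorch is given below (the source does not name a framework). The normalization type, padding, and the omission of the weight/activation binarization machinery of the BNN are assumptions; only the layer sizes, the tanh/softmax activations, the cross-entropy loss and the Adam optimizer with initial learning rate 0.05 come from the text.

```python
import torch
import torch.nn as nn

class MnistBackboneSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16),
            nn.MaxPool2d(2), nn.Tanh(),              # 28x28 -> 14x14
            nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16),
            nn.MaxPool2d(2), nn.Tanh(),              # 14x14 -> 7x7, and 7*7*16 = 784
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 100), nn.BatchNorm1d(100), nn.Tanh(),
            nn.Linear(100, 10),                       # softmax is folded into the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = MnistBackboneSketch()
criterion = nn.CrossEntropyLoss()                     # cross-entropy loss, as in the text
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
```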
For the local fusion method of mode 1, assuming M = 100, the use of a logarithmic function to fit the ŷ(θ) curve is verified: the ŷ(θ) curve and the approximate logarithmic curve obtained from the fit are both plotted, and the result is shown in FIG. 10. It can be seen that the approximate curve produced by the local fusion method of mode 1 (whose fitted expression is shown in formula images appb-000055 and appb-000056) essentially coincides with the true curve. Therefore, the local fusion method of mode 1 described above can approximate the real situation well.
FIG. 11 shows the curves of test-set accuracy over time when the invention is applied to the MNIST handwritten digit recognition data set. Centralized training, in which all data are collected at one central node for training, is used as the baseline curve for comparison. It can be seen that, in terms of test-set accuracy, both mode 1 and mode 3 achieve an effect close to the centralized one; mode 3 has a slight advantage in training effect, but mode 1 requires very little communication per iteration, so its communication cost is far smaller than that of mode 3, and mode 3 is not very stable. In mode 2, although the final performance is poorer, the extremely low communication requirement makes it highly competitive in the early stage of training; as training deepens, its performance cannot be compared with the first two modes. In general, mode 1 of the invention is suitable for most practical situations; mode 2 is suitable for the early stage of training, or when communication resources are very tight and the requirements on learning effect are very low; and mode 3 is suitable for fine-tuning an essentially trained model at a later stage.
Table 3 below compares the computation and communication required per child node for the test-set accuracy to reach 90% and 95% for the first time. The results obtained with α = 1.5 and β = 0.3 are taken as the results of mode 1 and mode 2, respectively. The computation amount is counted such that one forward computation plus one back-propagation on a child node counts as 1; each round of child-node training requires 10 forward computations and back-propagations, that is, a computation amount of 10.
Table 3
(The detailed figures of Table 3 are provided as table image PCTCN2021132048-appb-000057.)
It can be seen that the total number of parameters in the system is 82242. When the system works in mode 1, 10.04 KB must be uploaded and 66.70 KB downloaded in each round; when it works in mode 2, 10.04 KB must be uploaded and 10.04 KB downloaded in each round; the mode 2 entry for 95% is left blank because mode 2 cannot reach 95% accuracy; when it works in mode 3, 321.26 KB must be uploaded and 321.26 KB downloaded in each round. Compared with the training mode of ordinary federated learning (mode 3), the mode 1 method of the present invention requires far less communication per iteration; the method greatly reduces the communication volume of the distributed machine learning system and can therefore substantially reduce the total time required for a distributed machine learning task.
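The per-round upload figures quoted above can be checked directly from the parameter count; the short computation below reproduces the 1-bit and 32-bit sizes (the 66.70 KB mode 1 downlink figure is not re-derived here, since the text does not state how that payload is encoded).

```python
# 82242 parameters encoded with 1 bit each vs 32 bits each.
num_params = 82242
one_bit_kb = num_params / 8 / 1024              # ~10.04 KB, matches the binarized upload
thirty_two_bit_kb = num_params * 32 / 8 / 1024  # ~321.26 KB, matches mode 3
print(round(one_bit_kb, 2), round(thirty_two_bit_kb, 2))
```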
In the scenario where the data sets on the child nodes are not independent and identically distributed (non-IID), the simulation results of the machine learning method provided by the embodiments of this application are as follows.
Again taking MNIST handwritten digit recognition as an example, a 4-layer convolutional neural network consisting of two convolutional layers and two fully connected layers is used in the simulation; the network structure is the same as in Section 2.4.1, and the initial value of the learning rate η is 0.02, subsequently reduced every 30 iterations to 0.01, 0.005 and 0.002 in turn. The training set of the MNIST data set is distributed unevenly over 100 child nodes. The specific distribution method is as follows: the data set is first divided into 10 parts by class, each part is then divided equally into 100 shares, giving 1000 sub-data sets; these 1000 sub-data sets are randomly assigned to the 100 child nodes, with each child node randomly assigned 10 sub-data sets. The test set is kept only on the central server.
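For illustration, the non-IID partition just described can be sketched as follows; the function name, the seed and the return format are illustrative choices.

```python
import numpy as np

def partition_non_iid(labels, num_nodes=100, shards_per_node=10, num_classes=10,
                      seed=0):
    """Split the training set by class into 10 parts, split each part into 100
    shards (1000 shards in total), then give each of the 100 nodes 10 randomly
    chosen shards, following the description above."""
    rng = np.random.default_rng(seed)
    shards = []
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        shards.extend(np.array_split(idx, 100))   # 100 shards of this class
    order = rng.permutation(len(shards))           # shuffle the 1000 shards
    return [np.concatenate([shards[j] for j in
                            order[i * shards_per_node:(i + 1) * shards_per_node]])
            for i in range(num_nodes)]
```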
FIG. 12 shows the curves of test-set accuracy when the invention is applied to the non-IID MNIST handwritten digit recognition data set. In the hybrid mode, mode switching is decided by the iteration round number: training initially uses mode 2, switches to mode 1 after 5 iterations, and switches to mode 3 after a further 50 iterations. In this case, because of the non-IID nature of the data set, the system suffers a certain performance loss. It can be seen that the hybrid mode can also eventually achieve a good result.
Table 4 gives the communication and computation costs required for the three modes and the hybrid mode to reach a given accuracy five consecutive times.
Table 4
(The detailed figures of Table 4 are provided as table image PCTCN2021132048-appb-000058.)
Combining the simulation results of Table 4, it can be seen that it is difficult to meet the 85% accuracy requirement using mode 1 or mode 2 alone, whereas the hybrid mode can meet it, and its communication overhead is more advantageous than that of mode 3.
The foregoing mainly describes the solutions provided by the embodiments of this application from the perspective of the child nodes and the central node. To implement the above functions, corresponding hardware structures and/or software modules for performing the functions are included. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
In the embodiments of this application, the devices involved may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is schematic and is merely a logical functional division; there may be other division manners in actual implementation.
Referring to FIG. 13, an embodiment of this application provides a machine learning apparatus 1300. The apparatus may be applied to a child node in which a binarized neural network model (BNN) is provided. The apparatus includes: an obtaining unit 1301, configured to perform BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; and a sending unit 1302, configured to send a first message to the central node, the first message including the local model parameters.
In a possible design, the local model parameters included in the first message are binarized local model parameters.
In a possible design, the apparatus further includes: a receiving unit 1303, configured to receive fusion parameters from the central node, the fusion parameters being obtained by the central node through fusion based on the local model parameters; and a fusion unit 1304, configured to fuse the fusion parameters and the local model parameters to obtain updated local model parameters.
In a possible design, the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters.
In a possible design, the first message further includes accuracy information corresponding to the local model parameters, and the obtaining unit 1301 is further configured to verify and obtain the accuracy information according to the local model parameters and a test data set.
In a possible design, the apparatus further includes a learning unit 1305, configured to continue machine learning based on the BNN according to the updated local model parameters.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
Referring to FIG. 14, an embodiment of this application provides a machine learning apparatus 1400. The apparatus is applied to a central node and includes: a receiving unit 1401, configured to receive N first messages respectively from N child nodes, the first messages including local model parameters, the local model parameters being binarized local model parameters, and N being an integer greater than or equal to 1; a fusion unit 1402, configured to fuse the local model parameters included in the N first messages to obtain fusion parameters; and a sending unit 1403, configured to send a second message to M child nodes, the second message including the fusion parameters, M being a positive integer greater than or equal to 1. The M child nodes are included in the N child nodes.
In a possible design, the fusion unit 1402 is specifically configured to perform a weighted average on the N local model parameters to obtain the fusion parameters.
In a possible design, the fusion parameters included in the second message are high-precision fusion parameters, or the fusion parameters included in the second message are binarized fusion parameters.
In a possible design, the apparatus further includes a determining unit 1404, configured to determine system accuracy information according to the N first messages, so that the central node determines, according to the system accuracy information, whether the fusion parameters included in the second message are high-precision fusion parameters or binarized fusion parameters.
In a possible design, the first message further includes accuracy information, the accuracy information corresponding to the accuracy obtained by verifying, at the corresponding child node, the local model parameters included in the first message; and the determining unit 1404 is specifically configured to determine the system accuracy information according to the accuracy information included in the N messages.
In a possible design, the determining unit 1404 is configured to determine that the fusion parameters are binarized fusion parameters when the system accuracy information is less than or equal to a first threshold, and to determine that the fusion parameters are high-precision fusion parameters when the system accuracy information is greater than or equal to a second threshold.
In a possible design, the sending unit 1403 is configured to send a second message including binarized fusion parameters to the M child nodes when the number of iteration rounds is less than or equal to a third threshold, and to send a second message including high-precision fusion parameters to the M child nodes when the number of iteration rounds is greater than or equal to a fourth threshold.
In a possible design, the sending unit 1403 is specifically configured to send the second message to the M child nodes by broadcast.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
FIG. 15 is a schematic diagram of the composition of a child node 1500. As shown in FIG. 15, the child node 1500 may include a processor 1501 and a memory 1502. The memory 1502 is configured to store computer-executable instructions. For example, in some embodiments, when the processor 1501 executes the instructions stored in the memory 1502, the child node 1500 can be caused to perform the data processing method shown in any of the above embodiments.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
FIG. 16 is a schematic diagram of the composition of a chip system 1600. The chip system may be applied to any child node involved in the embodiments of this application. The chip system 1600 may include a processor 1601 and a communication interface 1602, which are configured to support the related device in implementing the functions involved in the above embodiments. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for the child node. The chip system may consist of chips, or may include chips and other discrete devices. It should be noted that, in some implementations of this application, the communication interface 1602 may also be referred to as an interface circuit.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
FIG. 17 is a schematic diagram of the composition of a central node 1700. As shown in FIG. 17, the central node 1700 may include a processor 1701 and a memory 1702. The memory 1702 is configured to store computer-executable instructions. For example, in some embodiments, when the processor 1701 executes the instructions stored in the memory 1702, the central node 1700 can be caused to perform the data processing method shown in any of the above embodiments.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
FIG. 18 is a schematic diagram of the composition of a chip system 1800. The chip system may be applied to any central node involved in the embodiments of this application. The chip system 1800 may include a processor 1801 and a communication interface 1802, which are configured to support the related device in implementing the functions involved in the above embodiments. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for the central node. The chip system may consist of chips, or may include chips and other discrete devices. It should be noted that, in some implementations of this application, the communication interface 1802 may also be referred to as an interface circuit.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and is not repeated here.
The functions, actions, operations or steps in the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented with a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive (SSD)), or the like.
Although this application has been described with reference to specific features and embodiments thereof, it is apparent that various modifications and combinations may be made without departing from the spirit and scope of this application. Accordingly, the specification and the accompanying drawings are merely exemplary descriptions of this application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of this application. Obviously, a person skilled in the art can make various changes and modifications to this application without departing from its spirit and scope. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.

Claims (36)

1. A machine learning method, wherein the method is applied to a child node, the child node being provided with a binarized neural network model (BNN), and the method comprises:
performing, by the child node, BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; and
sending, by the child node, a first message to a central node, the first message comprising the local model parameters.
2. The method according to claim 1, wherein the local model parameters included in the first message are binarized local model parameters.
3. The method according to claim 1 or 2, wherein the method further comprises:
receiving, by the child node, fusion parameters from the central node, the fusion parameters being obtained by the central node through fusion based on the local model parameters; and
performing, by the child node, fusion processing based on the fusion parameters and the local model parameters to obtain updated local model parameters.
4. The method according to claim 3, wherein the fusion parameters are binarized model parameters, or the fusion parameters are high-precision model parameters.
5. The method according to any one of claims 1-4, wherein the first message further comprises:
accuracy information corresponding to the local model parameters,
wherein the accuracy information is obtained by the child node through verification based on the local model parameters and a test data set.
6. The method according to any one of claims 1-5, wherein the method further comprises:
continuing, by the child node, machine learning based on the BNN according to the updated local model parameters.
7. A machine learning method, wherein the method is applied to a central node, and the method comprises:
receiving, by the central node, N first messages respectively from N child nodes, the first messages comprising local model parameters, the local model parameters being binarized local model parameters, and N being an integer greater than or equal to 1;
fusing, by the central node, the local model parameters included in the N first messages to obtain fusion parameters; and
sending, by the central node, a second message to M child nodes, the second message comprising the fusion parameters, M being a positive integer greater than or equal to 1.
8. The method according to claim 7, wherein the fusing, by the central node, of the N local model parameters to obtain the fusion parameters comprises:
performing, by the central node, a weighted average on the N local model parameters to obtain the fusion parameters.
9. The method according to claim 7 or 8, wherein the fusion parameters included in the second message are high-precision fusion parameters, or
the fusion parameters included in the second message are binarized fusion parameters.
10. The method according to any one of claims 7-9, wherein, before the sending, by the central node, of the second message to the M child nodes, the method further comprises:
determining, by the central node, system accuracy information according to the N first messages; and
determining, by the central node according to the system accuracy information, whether the fusion parameters included in the second message are high-precision fusion parameters or binarized fusion parameters.
11. The method according to claim 10, wherein the first message further comprises accuracy information, the accuracy information corresponding to the accuracy obtained by verifying, at the corresponding child node, the local model parameters included in the first message; and
the determining, by the central node, of the system accuracy information according to the N first messages comprises:
determining, by the central node, the system accuracy information according to the accuracy information included in the N messages.
12. The method according to claim 10 or 11, wherein, when the system accuracy information is less than or equal to a first threshold, the central node determines that the fusion parameters are binarized fusion parameters; and
when the system accuracy information is greater than or equal to a second threshold, the central node determines that the fusion parameters are high-precision fusion parameters.
13. The method according to any one of claims 7-9, wherein the sending, by the central node, of the second message to the M child nodes comprises:
when the number of iteration rounds is less than or equal to a third threshold, sending, by the central node, a second message comprising binarized fusion parameters to the M child nodes; and
when the number of iteration rounds is greater than or equal to a fourth threshold, sending, by the central node, a second message comprising high-precision fusion parameters to the M child nodes.
14. The method according to any one of claims 7-13, wherein the central node sends the second message to the M child nodes by broadcast.
  15. A machine learning apparatus, wherein the apparatus is applied to a child node, the child node is provided with a binarized neural network (BNN) model, and the apparatus comprises: an obtaining unit, configured to perform BNN-based machine learning on a collected local data set to obtain local model parameters corresponding to the local data set; and a sending unit, configured to send a first message to a central node, wherein the first message includes the local model parameters.
  16. The apparatus according to claim 15, wherein the local model parameters included in the first message are binarized local model parameters.
  17. The apparatus according to claim 15 or 16, wherein the apparatus further comprises: a receiving unit, configured to receive a fusion parameter from the central node, wherein the fusion parameter is obtained by the central node through fusion based on the local model parameters; and
    a fusion unit, configured to perform fusion according to the fusion parameter and the local model parameters to obtain updated local model parameters.
  18. The apparatus according to claim 17, wherein the fusion parameter is a binarized model parameter, or the fusion parameter is a high-precision model parameter.
  19. The apparatus according to any one of claims 15 to 18, wherein the first message further comprises accuracy information corresponding to the local model parameters; and
    the obtaining unit is further configured to obtain the accuracy information through verification according to the local model parameters and a test data set.
  20. The apparatus according to any one of claims 15 to 19, wherein the apparatus further comprises: a learning unit, configured to continue BNN-based machine learning according to the updated local model parameters.
  21. A machine learning apparatus, wherein the apparatus is applied to a central node, and the apparatus comprises: a receiving unit, configured to receive N first messages respectively from N child nodes, wherein each first message includes local model parameters, the local model parameters are binarized local model parameters, and N is an integer greater than or equal to 1;
    a fusion unit, configured to fuse the local model parameters included in the N first messages to obtain a fusion parameter; and
    a sending unit, configured to send a second message to M child nodes, wherein the second message includes the fusion parameter, and M is a positive integer greater than or equal to 1.
  22. The apparatus according to claim 21, wherein the fusion unit is specifically configured to perform weighted averaging on the N local model parameters to obtain the fusion parameter.
  23. The apparatus according to claim 21 or 22, wherein the fusion parameter included in the second message is a high-precision fusion parameter, or the fusion parameter included in the second message is a binarized fusion parameter.
  24. The apparatus according to any one of claims 21 to 23, wherein the apparatus further comprises: a determining unit, configured to determine system accuracy information according to the N first messages, wherein the central node determines, according to the system accuracy information, that the fusion parameter included in the first message is a high-precision fusion parameter or a binarized fusion parameter.
  25. The apparatus according to claim 24, wherein the first message further comprises accuracy information, and the accuracy information corresponds to accuracy obtained by verifying, at the corresponding child node, the local model parameters included in the first message; and the determining unit is specifically configured to determine the system accuracy information according to the accuracy information included in the N messages.
  26. The apparatus according to claim 24 or 25, wherein the determining unit is configured to determine, when the system accuracy information is less than or equal to a first threshold, that the fusion parameter is a binarized fusion parameter; and the determining unit is further configured to determine, when the system accuracy information is greater than or equal to a second threshold, that the fusion parameter is a high-precision fusion parameter.
  27. The apparatus according to any one of claims 21 to 23, wherein the sending unit is configured to send, when a quantity of iteration rounds is less than or equal to a third threshold, the second message including a binarized fusion parameter to the M child nodes; and the sending unit is further configured to send, when the quantity of iteration rounds is greater than or equal to a fourth threshold, the second message including a high-precision fusion parameter to the M child nodes.
  28. The apparatus according to any one of claims 21 to 27, wherein the sending unit is specifically configured to send the second message to the M child nodes by broadcasting.
  29. A machine learning system, wherein the machine learning system comprises one or more machine learning apparatuses according to any one of claims 15 to 20, and one or more machine learning apparatuses according to any one of claims 21 to 28.
  30. A child node, wherein the child node comprises one or more processors and one or more memories, the one or more memories are coupled to the one or more processors, and the one or more memories store computer instructions; and
    when the one or more processors execute the computer instructions, the child node is caused to perform the machine learning method according to any one of claims 1 to 6.
  31. A central node, wherein the central node comprises one or more processors and one or more memories, the one or more memories are coupled to the one or more processors, and the one or more memories store computer instructions; and
    when the one or more processors execute the computer instructions, the central node is caused to perform the machine learning method according to any one of claims 7 to 14.
  32. A machine learning system, wherein the machine learning system comprises one or more central nodes according to claim 31 and one or more child nodes according to claim 30.
  33. A chip system, wherein the chip system comprises an interface circuit and a processor, the interface circuit and the processor are interconnected through a line, and the interface circuit is configured to receive a signal from a memory and send the signal to the processor, wherein the signal includes computer instructions stored in the memory; and when the processor executes the computer instructions, the chip system performs the machine learning method according to any one of claims 1 to 6, or the chip system performs the machine learning method according to any one of claims 7 to 14.
  34. A computer-readable storage medium, wherein the computer-readable storage medium comprises computer instructions, and when the computer instructions are run, the machine learning method according to any one of claims 1 to 6 is performed, or the machine learning method according to any one of claims 7 to 14 is performed.
  35. A computer program, wherein when the computer program runs on a computer, the machine learning method according to any one of claims 1 to 6 is performed, or the machine learning method according to any one of claims 7 to 14 is performed.
  36. A computer program product, wherein the computer program product comprises instructions, and when the computer program product runs on a computer, the machine learning method according to any one of claims 1 to 6 is performed, or the machine learning method according to any one of claims 7 to 14 is performed.
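
For illustration only, the child-node behaviour recited in claims 15 to 20 can be sketched as a short Python routine. This is a minimal sketch under assumed choices, not the claimed implementation: the sign-based binarization, the helper names child_node_round, local_train and evaluate, and the dictionary layout of the first message are assumptions introduced for the example.

```python
import numpy as np

def binarize(params):
    # Map real-valued parameters to {+1, -1}; sign-based binarization is an
    # assumed choice, the claims only require "binarized" parameters.
    return np.where(np.asarray(params, dtype=float) >= 0.0, 1.0, -1.0)

def child_node_round(local_params, local_data, test_data, local_train, evaluate):
    """One child-node round in the sense of claims 15-20 (sketch only).

    local_train and evaluate stand in for the BNN training and validation
    routines of whatever model is deployed; they are placeholders, not part
    of the claims.
    """
    # Obtaining unit: BNN-based machine learning on the collected local data set.
    local_params = local_train(local_params, local_data)

    # Binarize the learned parameters before reporting them (claim 16).
    binary_params = binarize(local_params)

    # Verify against a test data set to obtain accuracy information (claim 19).
    accuracy = evaluate(binary_params, test_data)

    # Sending unit: the first message carries the parameters and the accuracy.
    return {"local_model_params": binary_params, "accuracy": accuracy}
```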
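
Similarly, the central-node behaviour recited in claims 21 to 28 (weighted-average fusion, accuracy-driven selection between binarized and high-precision fusion parameters, and broadcasting of the second message) can be sketched as follows. Averaging the reported accuracies into the system accuracy information and the behaviour between the first and second thresholds are assumptions; the claims leave both open.

```python
import numpy as np

def fuse(local_params_list, weights=None):
    # Fusion unit: weighted average of the N binarized local model parameters
    # (claim 22); uniform weights are used if none are supplied.
    stacked = np.stack([np.asarray(p, dtype=float) for p in local_params_list])
    return np.average(stacked, axis=0, weights=weights)

def select_fusion_param(fused, system_accuracy, first_threshold, second_threshold):
    # Determining unit (claims 24-26): low system accuracy -> binarized fusion
    # parameter, high system accuracy -> high-precision fusion parameter.
    if system_accuracy <= first_threshold:
        return np.where(fused >= 0.0, 1.0, -1.0)   # binarized fusion parameter
    if system_accuracy >= second_threshold:
        return fused                               # high-precision fusion parameter
    return fused  # behaviour between the two thresholds is not fixed by the claims

def central_node_round(first_messages, first_threshold, second_threshold):
    """One central-node round in the sense of claims 21-28 (sketch only)."""
    # System accuracy information derived from the reported accuracies (claim 25);
    # taking the mean is an assumption, the claims do not fix the aggregation rule.
    system_accuracy = float(np.mean([m["accuracy"] for m in first_messages]))
    fused = fuse([m["local_model_params"] for m in first_messages])
    payload = select_fusion_param(fused, system_accuracy,
                                  first_threshold, second_threshold)
    # Sending unit: the second message is broadcast to the M child nodes (claim 28).
    return {"fusion_param": payload}
```

A plausible reading of the accuracy- and round-based switching in claims 12, 13, 26 and 27 is a bandwidth/precision trade-off: binarized fusion parameters keep the downlink payload near one bit per weight while the system is still coarse, and high-precision fusion parameters refine the model later; the claims themselves do not state this motivation.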

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011365069.6A CN114548356A (en) 2020-11-27 2020-11-27 Machine learning method, device and system
CN202011365069.6 2020-11-27

Publications (1)

Publication Number Publication Date
WO2022111403A1 (en) 2022-06-02

Family

ID=81668422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/132048 WO2022111403A1 (en) 2020-11-27 2021-11-22 Machine learning method, device, and system

Country Status (2)

Country Link
CN (1) CN114548356A (en)
WO (1) WO2022111403A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932124A (en) * 2018-06-26 2018-12-04 Oppo广东移动通信有限公司 neural network model compression method, device, terminal device and storage medium
US10152676B1 (en) * 2013-11-22 2018-12-11 Amazon Technologies, Inc. Distributed training of models using stochastic gradient descent
WO2019117646A1 (en) * 2017-12-15 2019-06-20 한국전자통신연구원 Method and device for providing compression and transmission of training parameters in distributed processing environment
CN110084378A (en) * 2019-05-07 2019-08-02 南京大学 A kind of distributed machines learning method based on local learning strategy
CN110795477A (en) * 2019-09-20 2020-02-14 平安科技(深圳)有限公司 Data training method, device and system
CN111709533A (en) * 2020-08-19 2020-09-25 腾讯科技(深圳)有限公司 Distributed training method and device of machine learning model and computer equipment

Also Published As

Publication number Publication date
CN114548356A (en) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21896904
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21896904
    Country of ref document: EP
    Kind code of ref document: A1