WO2022082742A1 - Model training method and device, server, terminal, and storage medium - Google Patents


Info

Publication number
WO2022082742A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
model
model parameters
terminal
terminals
Application number
PCT/CN2020/123292
Other languages
French (fr)
Chinese (zh)
Inventor
牟勤
洪伟
赵中原
熊可欣
Original Assignee
北京小米移动软件有限公司
北京邮电大学
Application filed by 北京小米移动软件有限公司 and 北京邮电大学
Priority to PCT/CN2020/123292 priority Critical patent/WO2022082742A1/en
Priority to CN202080002976.6A priority patent/CN114667523A/en
Publication of WO2022082742A1 publication Critical patent/WO2022082742A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • the present disclosure relates to the field of communication technologies, and in particular, to a model training method, device, server, terminal and storage medium.
  • Meta-learning is a learning method that uses past knowledge and experience to guide the learning of new tasks. Meta-learning has the ability to learn to learn.
  • Centralized meta-learning is a meta-learning scheme whose steps usually include: the server collects data from each terminal and integrates the data to generate a training data set; the server randomly initializes a set of model parameters as global model parameters; in each round of training, a set of tasks is extracted from the training data set, where a set of tasks includes multiple tasks and each task includes a support set and a query set; the support sets of the extracted tasks are used to perform local updates, obtaining locally updated model parameters; the query sets of the extracted tasks are used to test the locally updated model parameters, obtaining test gradients; the average value of the test gradients over the tasks is determined, and the gradient descent method is used to update the global model; the above process is repeated until the model converges, and the resulting meta-model is distributed to the terminals, where each terminal uses local data to fine-tune the model to obtain an adaptive update model. A minimal runnable sketch of this loop is given below.
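The following is a minimal, runnable sketch of the centralized meta-learning procedure just described, assuming a one-dimensional model with a mean-squared loss so the gradients can be written explicitly; the function names, loss, and hyperparameters are illustrative and not part of the disclosure.

```python
import random
import numpy as np

def local_update(theta, support, alpha=0.1):
    # inner update on a task's support set: one gradient step on L = mean((theta - x)^2)
    return theta - alpha * 2 * np.mean(theta - support)

def test_gradient(theta, query):
    # gradient of the same loss evaluated on the task's query set
    return 2 * np.mean(theta - query)

# each task is a (support set, query set) pair drawn around a task-specific center c
tasks = [(np.random.randn(5) + c, np.random.randn(5) + c) for c in range(4)]
theta = np.random.randn()              # randomly initialized global model parameter
for _ in range(100):                   # "repeat until the model converges" in the scheme
    batch = random.sample(tasks, 2)    # extract a set of tasks for this round
    grads = [test_gradient(local_update(theta, s), q) for s, q in batch]
    theta -= 0.05 * np.mean(grads)     # update the global model with the average test gradient
# theta is now the meta-model distributed to the terminals for local fine-tuning
```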
  • the embodiments of the present disclosure provide a model training method, apparatus, server, terminal and storage medium, which can save transmission bandwidth and save computing resources of the server.
  • the technical solution is as follows:
  • a model training method is provided, comprising: receiving data distribution information of multiple terminals, where the data distribution information includes categories of data and the number of samples included in each category; selecting a training data set that conforms to the data distribution information of the multiple terminals; performing model training based on the training data set to obtain model parameters; sending the model parameters to at least some of the terminals; receiving a training result obtained by the at least some terminals training the model parameters; and updating the model parameters based on the training results to obtain global model parameters.
  • a model training method is provided, comprising: sending data distribution information of a terminal, where the data distribution information includes categories of data and the number of samples included in each category; receiving model parameters, where the model parameters are obtained by training a training data set selected by the server based on the data distribution information; training the model parameters to obtain a training result; and sending the training result, where the training result is used to globally update the model parameters to obtain global model parameters.
  • a model training apparatus comprising:
  • a receiving module configured to receive data distribution information of multiple terminals, where the data distribution information includes data categories and the number of samples included in each category;
  • a selection module configured to select a training data set that conforms to the data distribution information of the multiple terminals
  • a model training module configured to perform model training based on the training data set to obtain model parameters
  • a sending module configured to send the model parameters to at least some of the terminals
  • the receiving module is further configured to receive a training result obtained by training the model parameters by the at least part of the terminals;
  • the model training module is further configured to update the model parameters based on the training results of the at least part of the terminals to obtain global model parameters.
  • a model training apparatus comprising:
  • a sending module configured to send data distribution information of the terminal, where the data distribution information includes data categories and the number of samples included in each category;
  • a receiving module configured to receive model parameters, the model parameters are obtained by training a training data set selected by the server based on the data distribution information;
  • a model training module configured to train the model parameters to obtain a training result
  • the sending module is further configured to send the training result, where the training result is used to globally update the model parameters to obtain global model parameters.
  • a server is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to load and execute the executable instructions to implement the aforementioned model training method.
  • a terminal is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to load and execute the executable instructions to implement the aforementioned model training method.
  • a computer-readable storage medium is provided; when the instructions in the computer-readable storage medium are executed by a processor, the aforementioned model training method can be executed.
  • the server receives the data distribution information sent by the terminals, summarizes the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; each terminal then uploads its training result to the server for a global update. In this process, the data distribution information, model parameters, and so on are transmitted between the terminal and the server, but the data of the terminal is not directly transmitted, so the bandwidth occupation is small; moreover, through distributed training on the terminals, the consumption of server computing resources is low.
  • FIG. 1 shows a block diagram of a model training system provided by an exemplary embodiment of the present disclosure
  • FIG. 2 is a flowchart of a model training method according to an exemplary embodiment
  • FIG. 3 is a flowchart of a model training method according to an exemplary embodiment
  • FIG. 4 is a flowchart of a model training method according to an exemplary embodiment
  • FIG. 5 is a flowchart showing a connection establishment process according to an exemplary embodiment
  • FIG. 6 is a flow chart of an initialization model parameter training process according to an exemplary embodiment
  • FIG. 7 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment
  • FIG. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment
  • FIG. 9 is a block diagram of a terminal according to an exemplary embodiment.
  • Fig. 10 is a block diagram of a server according to an exemplary embodiment.
  • FIG. 1 shows a block diagram of a model training system provided by an exemplary embodiment of the present disclosure.
  • the model training system may include: a network side 12 and a terminal 13 .
  • the network side 12 includes a server 120, and the server 120 communicates with the terminal 13 through a wireless channel.
  • the server 120 may belong to a functional unit of a network-side device, and the network-side device may be a base station, which is a device deployed in an access network to provide a wireless communication function for a terminal.
  • the terminal 13 is a terminal accessing a network side device, and the network side device coordinates each terminal to participate in distributed collaborative learning.
  • the base station may include various forms of macro base stations, micro base stations, relay stations, access points, and so on. In systems using different radio access technologies, the names of devices with base station functions may be different; for example, in 5G New Radio (NR) systems they are called gNodeBs or gNBs. With the evolution of communication technology, the name "base station" may change. For convenience of description, hereinafter, the above-mentioned apparatuses for providing wireless communication functions for terminals are collectively referred to as network-side devices.
  • the terminal 13 may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to the wireless modem, as well as various forms of user equipment, mobile stations (Mobile Station, MS), terminal and so on.
  • the network-side device 120 and the terminal 13 communicate with each other through a certain air interface technology, such as a Uu interface.
  • the server 120 collects the data of each terminal for centralized meta-learning (or called centralized training).
  • on the one hand, this method requires data transmission and occupies a large bandwidth; on the other hand, all training work is completed by the server, so the training time is long and the server's computing resource consumption is large. Moreover, meta-learning does not pursue an optimal global model, but rather aims to train a well-initialized model that can quickly adapt to new tasks. Therefore, the lengthy model convergence period in the centralized meta-learning solution brings little performance improvement, and the model training efficiency is low.
  • edge terminals accessing the network are often unable to upload their data (small-sample data) to the server for training due to data privacy, security issues, and the like.
  • however, the data of these edge terminals often contains a large amount of information that is important for improving model performance, and centralized meta-learning schemes cannot fully utilize the data of edge terminals.
  • the centralized meta-learning scheme indiscriminately uses server data as training data, and the server data is weakly correlated with the terminal, so the trained model has weak generalization ability for the task of the terminal.
  • model training system and business scenarios described in the embodiments of the present disclosure are for the purpose of illustrating the technical solutions of the embodiments of the present disclosure more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of the present disclosure.
  • the technical solutions provided by the embodiments of the present disclosure are also applicable to similar technical problems.
  • Fig. 2 is a flowchart of a model training method according to an exemplary embodiment. Referring to Figure 2, the method includes the following steps:
  • step 101 the server receives data distribution information of multiple terminals.
  • the data distribution information includes categories of data and the number of samples included in each category.
  • the data category is used to classify the data in the data set. For example, when training a picture classification model, the pictures in the data set can be divided into multiple categories according to the classification requirements, such as people, plants, and landscapes each constituting a category.
  • the number of samples included in each category refers to the amount of data of each category, for example, the number of pictures whose category is a person.
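As an illustration, the data distribution information for the picture-classification example could be encoded as a simple mapping from category to sample count; the structure and numbers below are assumptions for illustration only.

```python
# hypothetical encoding of one terminal's data distribution information
data_distribution_info = {
    "person": 1200,     # category -> number of samples of that category
    "plant": 450,
    "landscape": 800,
}
```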
  • step 102 the server selects a training data set that conforms to the data distribution information of the multiple terminals.
  • after receiving the data distribution information of the multiple terminals, the server selects, according to the data distribution information, data with the same categories and numbers of samples as those included in the data distribution information to form a training data set.
  • step 103 the server performs model training based on the training data set to obtain model parameters.
  • step 104 the server sends the model parameters to at least some of the terminals.
  • the server can select the terminals that meet the distributed training requirements based on the user scheduling information of the terminals, and let these terminals participate in the distributed training.
  • step 105 the server receives a training result obtained by training the model parameters by the at least some terminals.
  • step 106 the server updates the model parameters based on the training results of the at least part of the terminals to obtain global model parameters.
  • the training results are reported to the server, and the server completes the global update based on the training results of each terminal to obtain the global model parameters.
  • the global model parameters, on the one hand, achieve the training target, and on the other hand, because the training results of at least some of the terminals are integrated, are suitable for the at least some terminals mentioned above.
  • the server receives the data distribution information sent by the terminals, summarizes the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; the terminals then upload their training results to the server for a global update. In this process, the data distribution information, model parameters, and so on are transmitted between the terminal and the server, but the data of the terminal is not directly transmitted, so the bandwidth occupation is small; moreover, through distributed training on the terminals, the consumption of server computing resources is small, the training period is short, and the training efficiency is high.
  • because the server selects data distributed the same as the terminals' data to form the training set during preliminary training, the correlation between the model and the terminals is strengthened, and the model has strong generalization ability. In addition, the distributed training process is directly participated in by the terminals, so a terminal does not need to upload its own data, and even edge terminals can participate; the data of edge terminals can thus be used to improve the performance of the learning model, ensuring that the training scheme makes full use of edge-terminal data.
  • the solution for distributed collaborative learning using data distribution characteristics provided by the embodiments of the present disclosure is suitable for training meta-models with strong generalization capabilities, such as model training for tasks such as deep learning and image processing.
  • in some embodiments, receiving the data distribution information of multiple terminals includes: receiving the data distribution information transmitted by each of the multiple terminals through Radio Resource Control (RRC) signaling.
  • when transmitting the data distribution information, the terminal and the server may first establish an RRC connection, and in the process of establishing the RRC connection, transmit the above-mentioned data distribution information through RRC signaling. In this way, the uploading process of the data distribution information can be simplified.
  • in some embodiments, selecting a training data set that conforms to the data distribution information of the multiple terminals includes: combining the data distribution information of the multiple terminals to obtain total data distribution information; and extracting, from the local data of the server, data whose distribution conforms to the total data distribution information to obtain the training data set.
  • the data distribution information of terminal 1 includes: ⁇ type A, sample size a1; type B, sample size b ⁇ ; the data distribution information of terminal 2 includes: ⁇ type A, sample size a2; type C, sample size c ⁇ ; Then, the total data distribution information includes: ⁇ type A, sample size a1+a2; type B, sample size b; type C, sample size c ⁇ .
  • the server selects the data type and sample size according to ⁇ type A, sample size a1+a2; type B, sample size b; type C, sample size c ⁇ to form a training data set.
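A minimal sketch of this combination step, matching the {type, sample size} example above (the terminal dictionaries and numbers are illustrative):

```python
from collections import Counter

def combine_distributions(per_terminal_infos):
    # total data distribution information: sample sizes of shared categories are summed
    total = Counter()
    for info in per_terminal_infos:
        total.update(info)
    return dict(total)

terminal_1 = {"A": 100, "B": 40}   # a1 = 100, b = 40
terminal_2 = {"A": 60, "C": 25}    # a2 = 60, c = 25
print(combine_distributions([terminal_1, terminal_2]))  # {'A': 160, 'B': 40, 'C': 25}
```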
  • in some embodiments, the model parameters include initialization model parameters, and performing model training based on the training data set to obtain the model parameters includes: using the training data set to perform model training to obtain the initialization model parameters.
  • in some embodiments, the model parameters include intermediate model parameters, and performing model training based on the training data set to obtain the model parameters includes: using the training data set to perform model training to obtain initialization model parameters; and iteratively updating the initialization model parameters to obtain the intermediate model parameters.
  • on the one hand, the server performs model training according to the training data set to obtain the initialization model parameters and then sends them to the terminals, which can save the training time of the terminals; on the other hand, the server updates the initialization model parameters based on the training results of the terminals to obtain the intermediate model parameters and then sends the intermediate model parameters to the terminals, so that the terminals can train on the basis of the intermediate model parameters, thereby speeding up the entire model training process.
  • the intermediate model parameters are obtained by the server through iterative updating based on the training results uploaded by the terminals.
  • in some embodiments, sending the model parameters to at least some of the multiple terminals includes: receiving user scheduling information of each of the multiple terminals; determining, based on the user scheduling information, whether each of the multiple terminals meets the distributed training requirement; and sending the model parameters to the terminals that meet the distributed training requirement.
  • the user scheduling information includes at least one of the following parameters:
  • the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the performance requirements of the learning model, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
  • the data amount of the data in the terminal may be obtained based on the data distribution information uploaded by the terminal, that is, the sum of the sample sizes of various types of data in the data distribution information.
  • the similarity between the data distribution and the total data distribution information refers to the difference between the categories included in a terminal and the categories in the total data distribution information, for example, the ratio of the number of categories included in the terminal to the number of categories in the total data distribution information, and the ratio of the number of samples of a category in the terminal to the number of samples of the corresponding category in the total data distribution information; the similarity is obtained by combining the above two ratios.
  • the communication status may include Channel Quality Indication (CQI).
  • computing capability can include computing speed and the device's surplus computing power. Computing speed refers to the number of operations performed per second, and surplus computing power refers to the percentage of computing power that can be allocated to model training.
  • the server sets a threshold range for meeting the distributed training requirements; when the user scheduling information of a terminal falls within the threshold range, the terminal meets the distributed training requirements, as sketched below.
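A hedged sketch of such a selection rule, where a terminal participates only when every reported scheduling parameter falls inside the server-side threshold range; the field names and threshold values are assumptions, not taken from the disclosure:

```python
# hypothetical threshold ranges set by the server for the distributed training requirements
thresholds = {"data_volume": (500, float("inf")), "cqi": (7, 15), "spare_compute": (0.2, 1.0)}

def meets_training_requirements(scheduling_info, thresholds):
    # the terminal qualifies only if each parameter lies within its threshold range
    return all(lo <= scheduling_info[k] <= hi for k, (lo, hi) in thresholds.items())

candidates = [
    {"id": 1, "data_volume": 900, "cqi": 10, "spare_compute": 0.5},
    {"id": 2, "data_volume": 200, "cqi": 12, "spare_compute": 0.8},  # too little data
]
selected = [t["id"] for t in candidates if meets_training_requirements(t, thresholds)]
print(selected)  # [1]
```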
  • in some embodiments, sending the model parameters to the terminals that meet the distributed training requirements among the multiple terminals includes: determining data transmission parameters based on the data volume of the model parameters and the communication status of the terminals that meet the distributed training requirements; and sending the model parameters to those terminals according to the data transmission parameters.
  • the data transmission parameters include parameters such as modulation mode and code rate.
  • when the data amount of the model parameters differs and the communication status of the terminal differs, different modulation modes and code rates can be selected for transmission, so that the selected modulation mode and code rate match the amount of data to be transmitted and the communication status of the terminal, thereby achieving a better transmission effect.
  • the data volume of the model parameters is related, on the one hand, to the size of the model: the larger the model, the larger the data volume of the model parameters; on the other hand, it is also related to the precision of each model parameter: the higher the precision, the larger the data volume.
  • the precision of a model parameter may refer to the number of digits retained after the decimal point; the higher the precision and the more digits retained after the decimal point, the larger the amount of data occupied by the model parameter.
  • in some embodiments, the training result includes a gradient value, where the gradient value is obtained by the terminal testing the trained model parameters after training them; or the training result includes a model update parameter, where the model update parameter is a model parameter obtained after the terminal trains the model parameters.
  • the training result of a terminal falls into two cases: one is the gradient value obtained by testing after training is completed, and the other is the model update parameters obtained from model training alone, without testing.
  • the reason for these two cases is that the data volume in different terminals differs. For example, when the data volume in a terminal is large, the data in the terminal can form both a support set and a query set; in this case, the terminal can first use the support set for model training and then use the query set for model testing. When the data volume in a terminal is small, the data in the terminal can only form a support set; in this case, the terminal uses the support set for model training, and the model testing is completed by the server.
  • whether the data amount in a terminal counts as large or small can be determined by comparing it with a threshold: larger than the threshold is large, and smaller than the threshold is small.
  • the threshold may be determined based on the data volume of multiple terminals, for example, may be a quantile of the data volume of multiple terminals. For example, if the data volume of 80% of users reaches 1000, the threshold is set to 1000.
  • the threshold may be determined by the server based on the data distribution information of each terminal, and then notified to each terminal.
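For instance, the quantile-based threshold in the example above can be computed as follows; the data volumes are illustrative, and the 20th percentile realizes "the data volume of 80% of users reaches the threshold":

```python
import numpy as np

data_volumes = np.array([3000, 1500, 1200, 1000, 1000, 800, 2500, 1100, 1300, 900])
threshold = np.percentile(data_volumes, 20)   # 80% of terminals are at or above this value
forms_query_set = data_volumes >= threshold   # below the threshold: support set only
```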
  • in some embodiments, the training result of each terminal in the at least some terminals includes gradient values, and updating the model parameters to obtain the global model parameters includes: iteratively updating the model parameters using a gradient descent method based on the average value of the gradient values of the at least some terminals.
  • in this case, each terminal in the at least some terminals has a large amount of data and can form both a support set and a query set; therefore, each of these terminals reports its gradient value to the server, so that the server can complete the iterative update of the model parameters.
  • in some embodiments, the training result of at least one terminal in the at least some terminals includes model update parameters, and updating the model parameters to obtain the global model parameters includes: selecting a query set that conforms to the data distribution information of the first terminal, where the first terminal is a terminal whose training result includes model update parameters; testing the model update parameters of the first terminal based on the query set to obtain gradient values; and iteratively updating the model parameters using a gradient descent method to obtain the global model parameters.
  • in this case, some terminals have a small amount of data and cannot form both a support set and a query set; these terminals only report model update parameters to the server, the server extracts a query set from its local data for model testing, and the gradient values obtained from the test are then used to complete the iterative update of the model parameters.
  • in some embodiments, using a gradient descent method to iteratively update the model parameters to obtain the global model parameters includes: iteratively updating the model parameters based on the average value of the first gradient values of the at least some terminals; determining whether the average value of the first gradient values is within a threshold range; in response to the average value not being within the threshold range, sending the intermediate model parameters resulting from the iterative update to the at least some terminals; and iteratively updating the intermediate model parameters using the average value of the second gradient values of the at least some terminals, where the second gradient values are gradient values obtained by the terminals testing the intermediate model parameters after training them.
  • in some embodiments, the method further includes: in response to the average value of the first gradient values of the at least some terminals being within the threshold range, sending the global model parameters resulting from the iterative update of the model parameters to the at least some terminals, where the global model parameters are used by the terminals for adaptive updating.
  • the updating of the model parameters is usually a multi-round distributed training process: the at least some terminals performing model training once and reporting their respective training results constitutes one round of distributed training, and the server performs a global update based on those training results. The server can determine whether the requirements of distributed training are met based on the average value of the gradient values corresponding to the training results of these terminals. If the requirements are met, the globally updated model requires no further distributed training and can be used by users after adaptive training; if the requirements are not met, the intermediate model parameters are sent to the terminals as the basis for the next round of terminal training, and the next round is performed on the basis of the intermediate model parameters.
  • the server monitors the effect of the distributed model training and stops learning when the model accuracy meets the requirements, without requiring training until the model converges.
  • This training method greatly improves the training efficiency.
  • the global model parameters will subsequently be adaptively updated by each terminal, so that each terminal obtains a more personalized model, which ensures that the model used by the terminal better matches the terminal's task requirements and improves model performance.
  • the adaptive updating of the terminal may refer to the terminal using its local data to update based on the global model parameters, so that the model parameters meet the requirements of the terminal.
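Putting the round structure together, the following is a control-flow sketch of the server side of this multi-round process, assuming a quadratic toy objective so each terminal's reported gradient is simply a noisy copy of the current parameters; the stand-in gradient collection, learning rate, and threshold g0 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_gradients(theta, terminals):
    # stand-in for one round of distributed training: each terminal trains locally
    # and reports a test gradient (here, a noisy gradient of L = ||theta||^2 / 2)
    return [theta + 0.001 * rng.standard_normal(theta.shape) for _ in terminals]

theta = rng.standard_normal(4)        # current global/intermediate model parameters
beta, g0 = 0.5, 0.01                  # global learning rate and gradient threshold
for round_ in range(200):             # multi-round distributed training
    grads = collect_gradients(theta, range(5))
    avg_grad = np.mean(grads, axis=0)
    theta -= beta * avg_grad          # server-side global update (gradient descent)
    if np.linalg.norm(avg_grad) < g0: # requirements met: distribute the global parameters
        break                         # otherwise, the intermediate parameters go out again
```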
  • Fig. 3 is a flow chart of a model training method according to an exemplary embodiment. Referring to Figure 3, the method includes the following steps:
  • step 201 the terminal sends data distribution information of the terminal, where the data distribution information includes data types and the number of samples included in each type.
  • the terminal counts the number of local samples of each data category, generates data distribution information, and sends it to the server.
  • step 202 the terminal receives model parameters, where the model parameters are obtained by training a training data set selected by the server based on the data distribution information.
  • the model parameters here can be either initial model parameters or intermediate model parameters.
  • step 203 the terminal trains the model parameters to obtain a training result.
  • step 204 the terminal sends the training result, and the training result is used to globally update the model parameters to obtain global model parameters.
  • the terminal sends its own data distribution information to the server; the server summarizes the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; the terminal then uploads its training result to the server for a global update. In this process, the data distribution information, model parameters, and so on are transmitted between the terminal and the server, but the data of the terminal is not directly transmitted, so the bandwidth consumption is small; moreover, through distributed training on the terminals, the server's computing resource consumption is small.
  • in some embodiments, sending the data distribution information of the terminal includes: sending the data distribution information through RRC signaling.
  • in some embodiments, the model parameters include initialization model parameters, and receiving the model parameters includes: receiving the initialization model parameters, where the initialization model parameters are obtained by the server training with the training data set selected based on the data distribution information.
  • in some embodiments, the model parameters include intermediate model parameters, and receiving the model parameters includes: receiving the intermediate model parameters, where the intermediate model parameters are obtained by the server iteratively updating the initialization model parameters.
  • in some embodiments, the training result includes a gradient value, where the gradient value is obtained by testing the trained model parameters after the model parameters are trained; or the training result includes a model update parameter, where the model update parameter is a model parameter obtained after training the model parameters.
  • in some embodiments, the training result includes model update parameters, and sending the training result includes: determining data transmission parameters based on the data volume of the model update parameters and the communication status of the terminal; and sending the model update parameters to the server according to the data transmission parameters.
  • in some embodiments, the method further includes: sending user scheduling information, where the user scheduling information includes at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the performance requirements of the learning model, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
  • in some embodiments, the method further includes: receiving the global model parameters; and adaptively updating the global model parameters.
  • Fig. 4 is a flow chart of a model training method according to an exemplary embodiment. Referring to Figure 4, the method includes the following steps:
  • step 301 the server and the terminal establish an RRC connection.
  • the process of establishing an RRC connection between the server and the terminal may refer to FIG. 5, and the steps are as follows:
  • Step 3011 The terminal sends RRC connection establishment request signaling to the server, where the request signaling is used to request establishment of an RRC connection with the server.
  • Correspondingly, the server receives the RRC connection establishment request signaling.
  • Step 3012 The server sends RRC connection establishment signaling to the terminal, where the RRC connection establishment signaling is used to notify the terminal that the server agrees to establish the RRC connection.
  • Correspondingly, the terminal receives the RRC connection establishment signaling.
  • Step 3013 The terminal sends RRC connection establishment complete signaling to the server, where the signaling is used to notify the server that the RRC connection establishment is complete.
  • Correspondingly, the server receives the RRC connection establishment complete signaling.
  • the signaling transmission and reception in the above-mentioned RRC connection establishment process is performed by the network communication module of the terminal and the network communication module of the server.
  • the network communication modules of the terminal and the server can be composed of two parts: a sending module and a receiving module.
  • step 302 the terminal sends the data distribution information to the server; the server receives the data distribution information sent by the terminal.
  • the data distribution information includes data categories and the number of samples included in each category.
  • step 301 and step 302 may be performed in any order; for example, the data distribution information may be transmitted during the process of establishing the RRC connection between the server and the terminal, that is, the server receives the data distribution information transmitted by the terminal through RRC signaling.
  • for example, the server receives the data distribution information carried by the terminal in the RRC connection establishment complete signaling.
  • step 303 the server combines the data distribution information of the multiple terminals to obtain total data distribution information.
  • the data distribution information of terminal 1 includes: ⁇ type A, sample size a1; type B, sample size b ⁇ ; the data distribution information of terminal 2 includes: ⁇ type A, sample size a2; type C, sample size c ⁇ ; Then, the total data distribution information includes: ⁇ type A, sample size a1+a2; type B, sample size b; type C, sample size c ⁇ .
  • step 304 the server extracts data whose distribution conforms to the total data distribution information from the local data of the server to obtain the training data set.
  • the server selects the data type and sample size according to ⁇ type A, sample size a1+a2; type B, sample size b; type C, sample size c ⁇ obtained by combining in step 303 to form a training data set.
  • step 305 the server uses the training data set to perform model training to obtain the initialization model parameters.
  • Step 3051 The server randomly initializes a set of model parameters.
  • Step 3052 The server extracts a batch of tasks from the training data set, and each task includes a support set and a query set.
  • the total data distribution information is denoted as $P$, and the server's local data is denoted as $D_s$. Data whose distribution conforms to the total data distribution information $P$ is extracted from the local data $D_s$ to generate a training data set, denoted as $\tilde{D}_s$. The server extracts data from the training data set $\tilde{D}_s$ to generate several tasks $T_i$, each containing a support set and a query set, denoted as $S_{T_i}$ and $Q_{T_i}$ respectively.
  • Step 3053 The server uses the support set of each task for training, and calculates the model loss and gradient to obtain the updated model parameters on each task.
  • the server can use the gradient descent method to obtain the updated model parameters, which can be expressed as the following formula (1): $\theta'_i = \theta - \alpha \nabla_{\theta} L_{T_i}(f_{\theta}; S_{T_i})$, where $\theta'_i$ represents the updated model parameters on the $i$-th task, $\theta$ represents the set of initialized model parameters, $\alpha$ represents the learning rate of a single task, $L$ represents the loss function of the model on the support set, $f$ represents the model, and $T_i$ represents the $i$-th task.
  • Step 3054 The server uses the query set of each task to calculate the test loss and gradient for updating the model parameters.
  • Step 3055 The server summarizes the gradients on each task, updates the randomly initialized model parameters, and obtains the initialized model parameters.
  • the server computes, using the query set of each task, the test loss and gradient for updating the model parameters, and sums and averages the gradients over the tasks.
  • the global model parameters are updated with the average gradient value, which can be expressed as the following formula (2): $\theta \leftarrow \theta - \frac{\beta}{N} \nabla_{\theta} \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta'_i}; Q_{T_i})$, where $\beta$ represents the global learning rate, $N$ represents the number of tasks used in this round of training, and $p(T)$ represents the set of tasks used in this round of training.
  • each step can be performed by the model training module of the server; in step 3052, the training data set of the server can be stored in the data processing and storage module of the server, and the model training module can perform signaling interaction with the data processing and storage module to extract a batch of tasks.
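To make formulas (1) and (2) concrete, the following sketch implements the inner and outer updates for a least-squares linear model, using the first-order approximation in which the query gradient is taken at the task-updated parameters; the model choice and the first-order treatment are assumptions, since the disclosure does not fix either.

```python
import numpy as np

def inner_update(theta, S_X, S_y, alpha):
    # formula (1): theta'_i = theta - alpha * grad of the support-set loss at theta
    grad = 2 * S_X.T @ (S_X @ theta - S_y) / len(S_y)
    return theta - alpha * grad

def meta_update(theta, tasks, alpha, beta):
    # formula (2): theta <- theta - (beta / N) * sum of query-set gradients at theta'_i
    grads = []
    for S_X, S_y, Q_X, Q_y in tasks:
        theta_i = inner_update(theta, S_X, S_y, alpha)
        grads.append(2 * Q_X.T @ (Q_X @ theta_i - Q_y) / len(Q_y))
    return theta - beta * np.mean(grads, axis=0)

# toy usage: four related linear tasks sharing a base weight vector
rng = np.random.default_rng(1)
X, w = rng.standard_normal((20, 3)), rng.standard_normal(3)
tasks = [(X, X @ (w + 0.1 * k), X, X @ (w + 0.1 * k)) for k in range(4)]
theta = np.zeros(3)
for _ in range(50):
    theta = meta_update(theta, tasks, alpha=0.05, beta=0.1)
```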
  • step 306 the terminal sends the user scheduling information to the server; the server receives the user scheduling information sent by the terminal.
  • the user scheduling information includes at least one of the following parameters: data volume of data in the terminal, similarity between data distribution and total data distribution information, communication status, computing power, and learning model performance requirements, and the total data distribution information is: It is obtained by combining the data distribution information of the multiple terminals.
  • step 306 and step 302 may be performed simultaneously, that is, the terminal sends the user scheduling information to the server when transmitting the data distribution information, that is, the user scheduling information may also be transmitted through RRC signaling.
  • in some embodiments, the user scheduling information may only include the communication status, the computing capability, and the performance requirements of the learning model; the data volume of the data in the terminal and the similarity between the data distribution and the total data distribution information may be determined by the server based on the data distribution information.
  • each parameter in the user scheduling information may be sent to the server by the terminal together, or may be sent to the server in sequence.
  • the communication condition usually includes the CQI, and the CQI needs to be obtained by the terminal through measurement. Therefore, the method may further include: before step 306, the terminal performs CQI measurement.
  • the user scheduling information is acquired by the user management module in the terminal, and sent to the network communication module of the server through the network communication module of the terminal, and the network communication module of the server transmits it to the user management module of the server.
  • when the network communication module and the user management module in the above-mentioned terminal or server transmit the user scheduling information, a new signaling may be used, the function of which is to carry the user scheduling information.
  • step 307 the server determines whether each of the multiple terminals meets the distributed training requirement based on the user scheduling information of each of the multiple terminals.
  • the server sets a threshold range for meeting the distributed training requirements; when the user scheduling information of a terminal falls within the threshold range, the terminal meets the distributed training requirements.
  • terminals among the multiple terminals other than those selected above as meeting the distributed training requirements do not participate in this round of training.
  • step 308 the server sends the initial model parameters to the terminal that meets the distributed training requirement among the multiple terminals.
  • the terminal receives initial model parameters.
  • if the terminal in steps 301-307 belongs to the terminals that meet the distributed training requirements, the terminal participates in steps 308-314; if it does not, the terminal does not participate in steps 308-314.
  • This embodiment is described by taking as an example that the terminal in step 301 to step 307 belongs to a terminal that meets the distributed training requirement.
  • when transmitting the initialization model parameters, the server first determines the data transmission parameters based on the data volume of the initialization model parameters and the communication status of the terminal, and then sends the initialization model parameters to the terminal according to the data transmission parameters.
  • determining the data transmission parameters may be performed by a transmission control module in the server. After the transmission control module determines the data transmission parameters, it may control the network communication module to send the initialization model parameters according to the above data transmission parameters.
  • the data transmission parameters include parameters such as modulation mode and code rate.
  • when the data amount of the model parameters differs and the communication status of the terminal differs, different modulation modes and code rates can be selected for transmission, so that the selected modulation mode and code rate match the amount of data to be transmitted and the communication status of the terminal, thereby achieving a better transmission effect.
  • the server encapsulates the initialization model parameters according to the above data transmission scheme.
  • the server sends the packaged data packet of initializing model parameters to the terminal.
  • the terminal decapsulates the data packet after receiving it.
  • the terminal confirms the correctness of the received data packet based on the decapsulated data.
  • the terminal feeds back a message to the server, informing the server that the terminal has correctly received the initialization model parameters.
  • verifying the correctness of the data packet and generating the feedback message is performed by the transmission control module in the terminal, and the receiving and sending processes are performed by the network communication module.
  • the data volume of the model parameters is related, on the one hand, to the size of the model: the larger the model, the larger the data volume of the model parameters; on the other hand, it is also related to the precision of each model parameter: the higher the precision, the larger the data volume.
  • the precision of a model parameter may refer to the number of digits retained after the decimal point; the higher the precision and the more digits retained after the decimal point, the larger the amount of data occupied by the model parameter.
  • step 309 the terminal trains the initial model parameters to obtain a training result.
  • the training result of the terminal falls into two cases: one is the gradient value obtained by testing after training is completed, and the other is the model update parameters obtained from model training alone, without testing.
  • the reason for these two cases is that the data volume in different terminals differs. For example, when the data volume in a terminal is large, the data in the terminal can form both a support set and a query set; in this case, the terminal can first use the support set for model training and then use the query set for model testing. When the data volume in a terminal is small, the data in the terminal can only form a support set; in this case, the terminal uses the support set for model training, and the model testing is completed by the server.
  • whether the data amount in a terminal counts as large or small can be determined by comparing it with a threshold: larger than the threshold is large, and smaller than the threshold is small.
  • the threshold may be determined based on the data volume of multiple terminals, for example, may be a quantile of the data volume of multiple terminals. For example, if the data volume of 80% of users reaches 1000, the threshold is set to 1000.
  • the threshold may be determined by the server based on the data distribution information of each terminal, and then notified to each terminal. The terminal can determine whether to generate a query set based on the threshold and its own data volume.
  • the terminal uses the support set to update the initialization model parameters by gradient descent to obtain the model update parameters, which can be expressed as the following formula (3): $\theta_{u_i} = \theta - \alpha \nabla_{\theta} L(f_{\theta}; S_{u_i})$, where $\theta_{u_i}$ represents the model update parameters of the $i$-th terminal, $\theta$ the received initialization model parameters, and $S_{u_i}$ the support set in the $i$-th terminal.
  • the terminal uses the query set to test the model update parameters and calculates the test loss and gradient value, which can be expressed as the following formula (4): $g_{u_i} = \nabla_{\theta_{u_i}} L(f_{\theta_{u_i}}; Q_{u_i})$, where $g_{u_i}$ represents the test gradient of the model update parameters of the $i$-th terminal and $Q_{u_i}$ represents the query set in the training set of the $i$-th terminal.
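A sketch of this terminal-side step under the same toy linear model as before (an assumption): formula (3) updates the received parameters on the support set, and formula (4) computes the test gradient on the query set when one exists, reproducing the two reporting cases described earlier.

```python
import numpy as np

def terminal_train(theta, S_X, S_y, Q_X=None, Q_y=None, alpha=0.01):
    # formula (3): update the received parameters on the support set
    theta_ui = theta - alpha * 2 * S_X.T @ (S_X @ theta - S_y) / len(S_y)
    if Q_X is None:
        return theta_ui, None   # small data volume: report the model update parameters
    # formula (4): test gradient g_ui on the query set
    g_ui = 2 * Q_X.T @ (Q_X @ theta_ui - Q_y) / len(Q_y)
    return theta_ui, g_ui       # large data volume: report the test gradient
```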
  • step 310 the terminal sends the training result to the server; the server receives the training result sent by the terminal.
  • the manner in which the terminal sends the training result can be similar to the manner in which the server sends the initialization model parameters in step 308: the data transmission parameters are first determined, and the training result is then sent according to the data transmission parameters.
  • step 311 the server updates the model parameters based on the training results of at least some of the terminals.
  • when the updated model parameters do not meet the requirements, step 312 is performed; when the updated model parameters meet the requirements, step 313 is performed.
  • At least some of the terminals here refer to the terminals that participate in the training and meet the distributed training requirements.
  • the server may obtain the average value of the gradient values of at least some of the terminals based on their training results. If the average value of the gradient values of the at least some terminals is within the threshold range (e.g., less than the set value), the updated model parameters meet the requirements; otherwise, the updated model parameters do not meet the requirements.
  • step 311 may include:
  • the server uses a gradient descent method to iteratively update the model parameters based on the average value of the gradient values of the at least part of the terminals.
  • step 311 may include:
  • the server selects a query set that conforms to the data distribution information of the first terminal, where the first terminal is a terminal whose training result includes model update parameters;
  • the server tests the model update parameters of the first terminal based on the query set to obtain a gradient value
  • the server uses a gradient descent method to iteratively update the model parameters based on the average value of the gradient values of the at least part of the terminals.
  • the server determines whether a query set needs to be generated for the terminal according to the data volume of each terminal.
  • the server uses the average value of the gradient values of the at least some terminals to update the model parameters by gradient descent, which can be expressed as the following formula (5): $\theta \leftarrow \theta - \beta \cdot \frac{1}{M}\sum_{i=1}^{M} g_{u_i}$, where $M$ represents the number of terminals that meet the distributed training requirements, that is, the number of terminals participating in the distributed training; the average gradient is then compared with the aforementioned threshold value (set value), denoted $g_0$, to decide whether another round is needed.
  • step 311 can be executed by the model update module in the server; during execution, the module needs to interact with the data processing and storage module in the server to obtain the data for generating a query set for the terminal, and a new signaling can be used in the interaction process.
  • step 312 the server sends the intermediate model parameters whose model parameters are iteratively updated to the at least part of the terminals; the terminal receives the intermediate model parameters sent by the server.
  • the terminal After receiving the intermediate model parameters sent by the server, the terminal trains the intermediate model parameters to obtain a training result, and then repeats steps 310 and 311 to iteratively update.
  • step 313 the server sends the global model parameters whose model parameters are iteratively updated to the at least part of the terminals; the terminals receive the global model parameters sent by the server.
  • step 314 the terminal adaptively updates the global model parameters.
  • the terminal uses the support set to test the global model parameters, calculates the test loss and gradient, and performs a gradient descent update to obtain the adaptive model, which can be expressed as the following formula (7): $\theta^{*}_{u_i} = \theta^{*} - \alpha \nabla_{\theta^{*}} L(f_{\theta^{*}}; S_{u_i})$, where $\theta^{*}_{u_i}$ is the adaptive update model of the $i$-th terminal, $\theta^{*}$ is the global model parameters, and $S_{u_i}$ is the local data of the $i$-th terminal used for the adaptive update.
  • the aforementioned steps 309 and 314 may be performed by the model updating module in the terminal, which needs to interact with the data processing and storage module in the terminal during the execution of the above steps to obtain data to generate a support set, a query set, and the like.
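As a final illustration, the adaptive update of step 314 can be sketched as a few local gradient steps from the received global parameters, again under the toy linear model used above (an assumption):

```python
import numpy as np

def adaptive_update(theta_global, X_local, y_local, alpha=0.01, steps=5):
    # formula (7): fine-tune the global parameters on the terminal's local data
    theta = theta_global.copy()
    for _ in range(steps):
        grad = 2 * X_local.T @ (X_local @ theta - y_local) / len(y_local)
        theta -= alpha * grad
    return theta   # the adaptive (personalized) model of this terminal
```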
  • Fig. 7 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment.
  • the apparatus has the function of implementing the server in the above method embodiment, and the function may be implemented by hardware, or by executing corresponding software in hardware.
  • the apparatus includes: a receiving module 501 , a selecting module 502 , a model training module 503 and a sending module 504 .
  • the receiving module 501 is configured to receive data distribution information of multiple terminals, where the data distribution information includes data categories and the number of samples included in each category;
  • a selection module 502 is configured to select a training data set that conforms to the data distribution information of the multiple terminals
  • the model training module 503 is configured to perform model training based on the training data set to obtain model parameters
  • a sending module 504 configured to send the model parameters to at least some of the multiple terminals
  • the receiving module 501 is further configured to receive a training result obtained by training the model parameters by the at least part of the terminals;
  • the model training module 503 is further configured to update the model parameters based on the training results of the at least part of the terminals to obtain global model parameters.
  • the receiving module 501 is configured to receive the data distribution information transmitted by each of the multiple terminals through RRC signaling.
  • the selection module 502 is configured to combine the data distribution information of the multiple terminals to obtain total data distribution information; extract data whose distribution conforms to the total data distribution information from the local data of the server. , to obtain the training data set.
  • in some embodiments, the model parameters include initialization model parameters, and the model training module 503 is configured to use the training data set to perform model training to obtain the initialization model parameters; or the model parameters include intermediate model parameters, and the model training module 503 is configured to perform model training by using the training data set to obtain initialization model parameters, and to iteratively update the initialization model parameters to obtain the intermediate model parameters.
  • the receiving module 501 is further configured to receive user scheduling information of each terminal in the multiple terminals;
  • the apparatus further includes: a determination module 505, configured to determine whether each of the multiple terminals meets the distributed training requirement based on user scheduling information of each of the multiple terminals;
  • the sending module 504 is configured to send the model parameters to a terminal that meets the distributed training requirement among the multiple terminals.
  • the user scheduling information includes at least one of the following parameters:
  • the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the performance requirements of the learning model, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
  • the determining module 505 is further configured to determine data transmission parameters based on the data volume of the model parameters and the communication status of the terminals that meet the distributed training requirements;
  • the sending module 504 is configured to send the model parameter to the terminal that meets the distributed training requirement according to the data transmission parameter.
  • the training result includes a gradient value
  • the gradient value is a gradient value obtained by testing the trained model parameters after the terminal trains the model parameters
  • the training result includes a model update parameter
  • the model update parameter is a model parameter obtained after the terminal trains the model parameter.
  • the training result of each terminal in the at least part of the terminals includes gradient values
  • the model training module 503 is configured to use a gradient descent method to iteratively update the model parameters based on the average value of the gradient values of the at least part of the terminals to obtain global model parameters.
  • the training result of at least one terminal in the at least part of the terminals includes model update parameters
  • the selecting module 502 is configured to select a query set that conforms to the data distribution information of the first terminal, where the first terminal is a terminal whose training result includes model update parameters;
  • the model training module 503 is configured to test the model update parameters of the first terminal based on the query set to obtain gradient values, and to iteratively update the model parameters using a gradient descent method to obtain global model parameters.
  • in some embodiments, the model training module 503 is configured to: iteratively update the model parameters using a gradient descent method based on the average value of the first gradient values of the at least some terminals; determine whether the average value of the first gradient values of the at least some terminals is within the threshold range; in response to the average value of the first gradient values not being within the threshold range, send the intermediate model parameters resulting from the iterative update of the model parameters to the at least some terminals; and iteratively update the intermediate model parameters using the average value of the second gradient values of the at least some terminals, where the second gradient values are gradient values obtained by the terminals testing the intermediate model parameters after training them.
  • the sending module 504 is further configured to, in response to the average value of the first gradient values of the at least part of the terminals being within a threshold range, send the global model parameters after the iterative update of the model parameters to all the terminals. at least some of the terminals, the global model parameters are used for adaptive updating of the terminals.
  • Fig. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment.
  • the apparatus has the function of realizing the terminal in the above method embodiment, and the function may be realized by hardware, or by executing corresponding software in hardware.
  • the apparatus includes: a sending module 601, a receiving module 602 and a model training module 603.
  • the sending module 601 is configured to send data distribution information of the terminal, where the data distribution information includes data categories and the number of samples included in each category;
  • the receiving module 602 is configured to receive model parameters, where the model parameters are obtained by training a training data set selected by the server based on the data distribution information;
  • a model training module 603, configured to train the model parameters to obtain a training result
  • the sending module 601 is further configured to send the training result, where the training result is used to globally update the model parameters to obtain global model parameters.
  • the sending module 601 is configured to send the data distribution information through RRC signaling.
  • the model parameters include initialization model parameters
  • the receiving module 602 is configured to receive the initialization model parameters, where the initialization model parameters are obtained by the server through training with the training data set selected based on the data distribution information;
  • the model parameters include intermediate model parameters
  • the receiving module 602 is configured to receive the intermediate model parameters, where the intermediate model parameters are obtained by iteratively updating the initialization model parameters by the server.
  • the training result includes a gradient value
  • the gradient value is a gradient value obtained by testing the trained model parameters after the model parameters are trained
  • the training result includes a model update parameter
  • the model update parameter is a model parameter obtained after training the model parameter.
  • when the training result includes model update parameters, the apparatus further includes: a determining module 604, configured to determine data transmission parameters based on the data volume of the model update parameters and the communication status of the terminal;
  • the sending module 601 is configured to send the model update parameter to the server according to the data transmission parameter.
  • the sending module 601 is further configured to send user scheduling information, where the user scheduling information includes at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirements, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
  • the receiving module 602 is further configured to receive global model parameters
  • the model training module 603 is further configured to adaptively update the global model parameters.
  • FIG. 9 is a block diagram of a terminal 700 according to an exemplary embodiment.
  • the terminal 700 may include: a processor 701 , a receiver 702 , a transmitter 703 , a memory 704 and a bus 705 .
  • the processor 701 includes one or more processing cores, and the processor 701 executes various functional applications and information processing by running software programs and modules.
  • the receiver 702 and the transmitter 703 may be implemented as a communication component, which may be a communication chip.
  • Memory 704 is connected to processor 701 via bus 705 .
  • the memory 704 may be configured to store at least one instruction, and the processor 701 may be configured to execute the at least one instruction, so as to implement various steps in the foregoing method embodiments.
  • memory 704 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, including but not limited to a magnetic or optical disk, Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), magnetic memory, flash memory, and Programmable Read-Only Memory (PROM).
  • a computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the model training method provided by each of the above method embodiments.
  • FIG. 10 is a block diagram of a server 800 according to an exemplary embodiment.
  • the server 800 may include: a processor 801 , a receiver 802 , a transmitter 803 and a memory 804 .
  • the receiver 802, the transmitter 803 and the memory 804 are respectively connected to the processor 801 through a bus.
  • the processor 801 includes one or more processing cores, and the processor 801 executes the method executed by the server in the model training method provided by the embodiment of the present disclosure by running software programs and modules.
  • Memory 804 may be used to store software programs and modules. Specifically, the memory 804 can store the operating system 8041 and an application module 8042 required for at least one function.
  • the receiver 802 is used for receiving communication data sent by other devices, and the transmitter 803 is used for sending communication data to other devices.
  • a computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the model training method provided by each of the above method embodiments.
  • An exemplary embodiment of the present disclosure also provides a model training system, where the model training system includes a terminal and a server.
  • the terminal is the terminal provided by the embodiment shown in FIG. 9 .
  • the server is the server provided by the embodiment shown in FIG. 10 .


Abstract

The present disclosure relates to a model training method and device, a server, a terminal, and a storage medium, which belong to the technical field of communications. The method comprises: receiving data distribution information of multiple terminals, the data distribution information comprising data categories and the number of samples included in each category; selecting a training data set corresponding to the data distribution information of the multiple terminals; performing model training on the basis of the training data set, so as to obtain model parameters; sending the model parameters to at least part of the multiple terminals; receiving a training result obtained after the at least part of the multiple terminals have trained the model parameters; and updating the model parameters on the basis of the training result of the at least part of the multiple terminals, so as to obtain global model parameters.

Description

Model training method, device, server, terminal and storage medium

Technical Field
The present disclosure relates to the field of communication technologies, and in particular, to a model training method, device, server, terminal and storage medium.
Background
Meta-learning (Meta Learning) is a learning method that uses past knowledge and experience to guide the learning of new tasks; it has the ability to learn to learn (Learning to learn).

Centralized meta-learning is one meta-learning scheme. Its steps usually include: the server collects data from each terminal and integrates the data to generate a training data set; the server randomly initializes a set of model parameters as the global model parameters; each training round extracts a group of tasks from the training data set, where a group of tasks includes multiple tasks and each task includes a support set and a query set; the support sets of the extracted tasks are used for local updates to obtain locally updated model parameters; the query sets of the extracted tasks are used to test the locally updated model parameters to obtain test gradients; the average of the test gradients over the tasks is determined, and the global model is updated by gradient descent; the above process is repeated until the model converges, and the resulting meta-model is distributed to each terminal; each terminal fine-tunes the model with local data to obtain an adaptively updated model.
Summary of the Invention
The embodiments of the present disclosure provide a model training method, apparatus, server, terminal and storage medium, which can save transmission bandwidth and save computing resources of the server. The technical solutions are as follows:
According to an aspect of the embodiments of the present disclosure, a model training method is provided, the method including:

receiving data distribution information of multiple terminals, the data distribution information including the categories of data and the number of samples contained in each category;

selecting a training data set that conforms to the data distribution information of the multiple terminals;

performing model training based on the training data set to obtain model parameters;

sending the model parameters to at least some of the multiple terminals;

receiving training results obtained by the at least some terminals training the model parameters;

updating the model parameters based on the training results of the at least some terminals to obtain global model parameters.

According to another aspect of the embodiments of the present disclosure, a model training method is provided, the method including:

sending data distribution information of a terminal, the data distribution information including the categories of data and the number of samples contained in each category;

receiving model parameters, the model parameters being obtained by training on a training data set selected by a server based on the data distribution information;

training the model parameters to obtain a training result;

sending the training result, the training result being used to globally update the model parameters to obtain global model parameters.
According to another aspect of the embodiments of the present disclosure, a model training apparatus is provided, the apparatus including:

a receiving module, configured to receive data distribution information of multiple terminals, the data distribution information including the categories of data and the number of samples contained in each category;

a selection module, configured to select a training data set that conforms to the data distribution information of the multiple terminals;

a model training module, configured to perform model training based on the training data set to obtain model parameters;

a sending module, configured to send the model parameters to at least some of the multiple terminals;

the receiving module being further configured to receive training results obtained by the at least some terminals training the model parameters;

the model training module being further configured to update the model parameters based on the training results of the at least some terminals to obtain global model parameters.

According to another aspect of the embodiments of the present disclosure, a model training apparatus is provided, the apparatus including:

a sending module, configured to send data distribution information of a terminal, the data distribution information including the categories of data and the number of samples contained in each category;

a receiving module, configured to receive model parameters, the model parameters being obtained by training on a training data set selected by a server based on the data distribution information;

a model training module, configured to train the model parameters to obtain a training result;

the sending module being further configured to send the training result, the training result being used to globally update the model parameters to obtain global model parameters.
According to another aspect of the embodiments of the present disclosure, a server is provided, the server including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to load and execute the executable instructions to implement the foregoing model training method.

According to another aspect of the embodiments of the present disclosure, a terminal is provided, the terminal including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to load and execute the executable instructions to implement the foregoing model training method.

According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when the instructions in the computer-readable storage medium are executed by a processor, the foregoing model training method can be performed.
In the embodiments of the present disclosure, the server receives the data distribution information sent by the terminals, aggregates the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; the terminals then upload their training results to the server for a global update. In this process, what is transmitted between the terminals and the server is data distribution information, model parameters and the like, rather than the terminals' data itself, so the bandwidth occupation is small; moreover, through distributed training on the terminals, the server's computing resource consumption is small.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
FIG. 1 is a block diagram of a model training system provided by an exemplary embodiment of the present disclosure;

FIG. 2 is a flowchart of a model training method according to an exemplary embodiment;

FIG. 3 is a flowchart of a model training method according to an exemplary embodiment;

FIG. 4 is a flowchart of a model training method according to an exemplary embodiment;

FIG. 5 is a flowchart of a connection establishment process according to an exemplary embodiment;

FIG. 6 is a flowchart of an initialization model parameter training process according to an exemplary embodiment;

FIG. 7 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment;

FIG. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment;

FIG. 9 is a block diagram of a terminal according to an exemplary embodiment;

FIG. 10 is a block diagram of a server according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
FIG. 1 is a block diagram of a model training system provided by an exemplary embodiment of the present disclosure. As shown in FIG. 1, the model training system may include a network side 12 and a terminal 13.
The network side 12 includes a server 120, and the server 120 communicates with the terminal 13 through a wireless channel. In the embodiments of the present disclosure, the server 120 may be a functional unit of a network-side device, and the network-side device may be a base station, i.e., an apparatus deployed in an access network to provide a wireless communication function for terminals. The terminal 13 is a terminal that accesses the network-side device, and the network-side device coordinates the terminals to participate in distributed collaborative learning.

The base station may take various forms, such as a macro base station, a micro base station, a relay station, or an access point. In systems using different radio access technologies, the name of a device with base station functions may differ; in a 5G New Radio (NR) system it is called a gNodeB or gNB. As communication technology evolves, the name "base station" may change. For convenience of description, the above apparatuses that provide wireless communication functions for terminals are collectively referred to as network-side devices below.

The terminal 13 may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, other processing devices connected to a wireless modem, as well as various forms of user equipment, mobile stations (Mobile Station, MS), terminals, and so on. For convenience of description, the devices mentioned above are collectively referred to as terminals. The access network device 120 and the terminal 13 communicate with each other through an air interface technology, for example a Uu interface.
In the related art, the server 120 collects the data of each terminal for centralized meta-learning (also called centralized training). On the one hand, this approach requires data transmission and occupies a large bandwidth; on the other hand, all the training work is done by the server, so training takes a long time and consumes considerable server computing resources. Moreover, meta-learning does not pursue an optimal global model, but rather a well-initialized model that can quickly adapt to new tasks; therefore, the lengthy model convergence period of the centralized meta-learning scheme brings little performance gain, and the model training efficiency is low.

In addition, in the related art, many edge terminals accessing the network cannot upload their data (small-sample data) to the server for training due to data privacy, security and similar concerns. The data of these edge terminals often contains a large amount of information that is very important for improving the performance of the learning model, yet the centralized meta-learning scheme cannot make full use of the data of edge terminals. Furthermore, the centralized meta-learning scheme indiscriminately uses server data as training data; since the server data is weakly correlated with the terminals, the trained model generalizes poorly to the terminals' tasks.

The model training system and service scenarios described in the embodiments of the present disclosure are intended to explain the technical solutions of the embodiments of the present disclosure more clearly and do not limit them. Those of ordinary skill in the art will appreciate that, with the evolution of the model training system and the emergence of new service scenarios, the technical solutions provided by the embodiments of the present disclosure are equally applicable to similar technical problems.
FIG. 2 is a flowchart of a model training method according to an exemplary embodiment. Referring to FIG. 2, the method includes the following steps:
In step 101, the server receives data distribution information of multiple terminals.

The data distribution information includes the categories of data and the number of samples contained in each category.
The data categories are used to classify the data in a data set. For example, when training a picture classification model, the pictures in the data set can be divided into multiple categories according to the classification requirements, such as people, plants and landscapes, each of which is one category. The number of samples contained in each category refers to the amount of data of that category, for example, the number of pictures whose category is "people".
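For concreteness, a minimal sketch of what such a report could look like, assuming a simple category-to-count mapping (the field names below are illustrative assumptions, not prescribed by the disclosure):

```python
# Hypothetical shape of one terminal's data distribution information:
# each key is a data category, each value is the number of local samples.
data_distribution_info = {
    "people": 1200,
    "plants": 300,
    "landscapes": 450,
}

total_samples = sum(data_distribution_info.values())  # the terminal's data volume
```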
In step 102, the server selects a training data set that conforms to the data distribution information of the multiple terminals.

After receiving the data distribution information of the multiple terminals, the server selects, according to that information, data whose categories and sample counts are the same as those included in the data distribution information, to form the training data set.
In step 103, the server performs model training based on the training data set to obtain model parameters.

In step 104, the server sends the model parameters to at least some of the multiple terminals.

Based on the user scheduling information of the terminals, the server can select the terminals that meet the distributed training requirements and let these terminals participate in distributed training.

In step 105, the server receives training results obtained by the at least some terminals training the model parameters.

In step 106, the server updates the model parameters based on the training results of the at least some terminals to obtain global model parameters.

After training the model parameters provided by the server, a terminal reports its training result to the server, and the server can complete the global update based on the training results of the terminals to obtain the global model parameters. The global model parameters, on the one hand, achieve the training objective and, on the other hand, are suitable for the above at least some terminals because the training results of those terminals have been integrated.

In the embodiments of the present disclosure, the server receives the data distribution information sent by the terminals, aggregates the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; the terminals then upload their training results to the server for a global update. In this process, what is transmitted between the terminals and the server is data distribution information, model parameters and the like, rather than the terminals' data itself, so the bandwidth occupation is small. Through distributed training on the terminals, the server's computing resource consumption is small, the training period is short, and the training efficiency is high. Meanwhile, since the server selects data with the same distribution as the terminals to form the training set for preliminary training, the correlation between the model and the terminals is strengthened and the model generalizes well. In addition, the terminals participate directly in the distributed training process without uploading their own data, so even edge terminals can participate, allowing the data of edge terminals to contribute to improving the performance of the learning model and ensuring that the training scheme makes full use of edge terminal data.

The solution provided by the embodiments of the present disclosure, which performs distributed collaborative learning using data distribution characteristics, is suitable for training meta-models with strong generalization ability, for example model training for tasks such as deep learning and image processing.
Optionally, receiving the data distribution information of the multiple terminals includes:

receiving the data distribution information transmitted by each of the multiple terminals through Radio Resource Control (RRC) signaling.

In the embodiments of the present disclosure, when transmitting the data distribution information, the terminal and the server may first establish an RRC connection and transmit the data distribution information through RRC signaling during the RRC connection establishment process. In this way, the uploading of the data distribution information is simplified.
Optionally, selecting a training data set that conforms to the data distribution information of the multiple terminals includes:

combining the data distribution information of the multiple terminals to obtain total data distribution information;

extracting, from the local data of the server, data whose distribution conforms to the total data distribution information to obtain the training data set.

For example, the data distribution information of terminal 1 includes {type A, sample size a1; type B, sample size b}, and the data distribution information of terminal 2 includes {type A, sample size a2; type C, sample size c}; then the total data distribution information includes {type A, sample size a1+a2; type B, sample size b; type C, sample size c}. The server selects data types and sample sizes according to {type A, sample size a1+a2; type B, sample size b; type C, sample size c} to form the training data set.
Optionally, the model parameters include initialization model parameters, and performing model training based on the training data set to obtain model parameters includes:

performing model training with the training data set to obtain the initialization model parameters.

Alternatively, the model parameters include intermediate model parameters, and performing model training based on the training data set to obtain model parameters includes:

performing model training with the training data set to obtain initialization model parameters;

iteratively updating the initialization model parameters to obtain the intermediate model parameters.

In the embodiments of the present disclosure, on the one hand, the server performs model training based on the training data set to obtain initialization model parameters and then delivers the initialization model parameters to the terminals, which saves terminal training time; on the other hand, the server updates the initialization model parameters based on the training results of the terminals to obtain intermediate model parameters and then sends the intermediate model parameters to the terminals, so that the terminals can train on the basis of the intermediate model parameters, which speeds up the whole model training process.

The intermediate model parameters are obtained by the server through iterative updates based on the training results uploaded by the terminals.
Optionally, sending the model parameters to at least some of the multiple terminals includes:

receiving user scheduling information of each of the multiple terminals;

determining, based on the user scheduling information of each of the multiple terminals, whether each of the multiple terminals meets the distributed training requirements;

sending the model parameters to the terminals, among the multiple terminals, that meet the distributed training requirements.
Exemplarily, the user scheduling information includes at least one of the following parameters:

the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirements, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.

The data volume of the data in the terminal can be obtained from the data distribution information uploaded by the terminal, i.e., the sum of the sample sizes of the various categories in the data distribution information. The similarity between the data distribution and the total data distribution information refers to the difference between the categories a terminal includes and the categories in the total data distribution information, for example the ratio of the number of categories included in the terminal to the number of categories in the total data distribution information, and the ratio of the number of samples of each category in the terminal to the number of samples of the corresponding category in the total data distribution information; the two ratios are combined to obtain the difference. The communication status may include Channel Quality Indication (CQI) information. The computing capability may include the computing speed and the device's spare computing power: the computing speed refers to the number of computations per second (computations/s), and the spare computing power refers to the percentage of computing power that can be allocated to model training. The learning model performance requirements include a task preference and an accuracy requirement. The task preference can be represented by the probability characteristics of the tasks likely to be executed locally; taking a classification task as an example, it is expressed by the probability of each data category appearing in the task: P = {p(category 1), p(category 2), ...}. Exemplarily, the accuracy requirement may be: model accuracy > 90%.
Exemplarily, for each parameter in the user scheduling information, the server sets a threshold range that meets the distributed training requirements; when every parameter of a terminal falls within the corresponding threshold range, the terminal meets the distributed training requirements.
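As an illustration of this per-parameter check (the parameter names and threshold values below are assumptions for the sketch, not values from the disclosure), the eligibility test reduces to requiring every reported value to fall inside its configured range:

```python
# Hypothetical threshold ranges configured on the server, one per
# user scheduling parameter: (lower bound, upper bound).
REQUIREMENTS = {
    "data_volume": (1000, float("inf")),  # at least 1000 local samples
    "similarity": (0.5, 1.0),             # close enough to the total distribution
    "cqi": (7, 15),                       # acceptable channel quality
    "spare_compute": (0.2, 1.0),          # fraction of compute available
}

def meets_distributed_training_requirements(scheduling_info):
    """A terminal qualifies only if every parameter is within its range."""
    return all(lo <= scheduling_info[name] <= hi
               for name, (lo, hi) in REQUIREMENTS.items())
```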
Optionally, sending the model parameters to the terminals, among the multiple terminals, that meet the distributed training requirements includes:

determining data transmission parameters based on the data volume of the model parameters and the communication status of the terminals that meet the distributed training requirements;

sending the model parameters to the terminals that meet the distributed training requirements according to the data transmission parameters.

Here, the data transmission parameters include parameters such as the modulation scheme and the code rate. When the data volume of the model parameters differs, or the communication status of the terminal differs, different modulation schemes and code rates can be selected for transmission, so that the selected modulation scheme and code rate match the amount of data currently to be transmitted and the communication status of the terminal, thereby achieving a better transmission effect.
The data volume of the model parameters is related, on the one hand, to the model size: the larger the model, the larger the data volume of the model parameters. On the other hand, it is also related to the precision of each model parameter: the higher the precision, the larger the data volume of the model parameters. Here, the precision of a model parameter may refer to the number of digits retained after the decimal point; the higher the precision of the model parameters, the more digits are retained after the decimal point, and the larger the amount of data the model parameters occupy.
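A back-of-the-envelope sketch of the two factors, reading "precision" as the byte width stored per parameter (one plausible interpretation; the disclosure speaks of decimal digits):

```python
def model_payload_bytes(num_params, bytes_per_param):
    """Payload of one parameter transfer: model size x per-parameter precision."""
    return num_params * bytes_per_param

model_payload_bytes(1_000_000, 4)  # ~4 MB at 32-bit precision
model_payload_bytes(1_000_000, 2)  # ~2 MB if the precision is halved to 16-bit
```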
Optionally, the training result includes a gradient value, the gradient value being obtained by the terminal, after training the model parameters, by testing the trained model parameters;

alternatively, the training result includes model update parameters, the model update parameters being the model parameters obtained after the terminal trains the model parameters.

In the embodiments of the present disclosure, there are two possibilities for the training result of a terminal: one is the gradient value obtained by testing after training is completed, and the other is the model update parameters obtained by model training alone, without testing. These two cases exist because the data volumes of the terminals differ. For example, when the data volume in a terminal is large, the data in the terminal can form both a support set and a query set; the terminal can first use the support set for model training and then use the query set for model testing. When the data volume in a terminal is small, the data in the terminal can only form a support set; the terminal uses the support set for model training, while the model testing is done by the server.
Here, whether the data volume in a terminal is large or small can be determined by comparison with a threshold: larger than the threshold means large, smaller means small. The threshold can be determined based on the data volumes of the multiple terminals, for example as a quantile of those data volumes; e.g., if 80% of users have a data volume of at least 1000, the threshold is set to 1000. The threshold can be determined by the server based on the data distribution information of each terminal and then notified to each terminal.
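A small sketch of the quantile rule just described, assuming numpy is available (the 80% figure is the example from the text):

```python
import numpy as np

def data_volume_threshold(volumes, covered_fraction=0.8):
    """Choose the threshold so that covered_fraction of terminals meet it;
    e.g. if 80% of terminals hold at least 1000 samples, this returns ~1000."""
    return float(np.quantile(volumes, 1.0 - covered_fraction))
```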
Optionally, when the training result of each of the at least some terminals includes a gradient value,

updating the model parameters based on the training results of the at least some terminals to obtain global model parameters includes:

iteratively updating the model parameters by gradient descent based on the average of the gradient values of the at least some terminals to obtain the global model parameters.

In this case, each of the at least some terminals has a relatively large amount of data and can form both a support set and a query set; therefore, each of these terminals reports a gradient value to the server, which makes it convenient for the server to complete the iterative update of the model parameters.
Optionally, when the training result of at least one of the at least some terminals includes model update parameters,

updating the model parameters based on the training results of the at least some terminals to obtain global model parameters includes:

selecting a query set that conforms to the data distribution information of a first terminal, the first terminal being a terminal whose training result includes model update parameters;

testing the model update parameters of the first terminal based on the query set to obtain a gradient value;

iteratively updating the model parameters by gradient descent based on the average of the gradient values of the at least some terminals to obtain the global model parameters.
In this case, some terminals have too little data to form both a support set and a query set; therefore, these terminals only report model update parameters to the server, and the server locally extracts a query set for model testing and then uses the gradient values obtained from the test to complete the iterative update of the model parameters.
Optionally, iteratively updating the model parameters by gradient descent based on the average of the gradient values of the at least some terminals to obtain the global model parameters includes:

iteratively updating the model parameters by gradient descent based on the average of the first gradient values of the at least some terminals;

determining whether the average of the first gradient values of the at least some terminals is within a threshold range;

in response to the average of the first gradient values of the at least some terminals not being within the threshold range, sending the intermediate model parameters, obtained by iteratively updating the model parameters, to the at least some terminals;

iteratively updating the intermediate model parameters with the average of the second gradient values of the at least some terminals, where a second gradient value is a gradient value obtained by a terminal, after training the intermediate model parameters, by testing the trained intermediate model parameters.
Optionally, the method further includes:

in response to the average of the first gradient values of the at least some terminals being within the threshold range, sending the global model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, the global model parameters being used by the terminals for adaptive updating.
In the embodiments of the present disclosure, updating the model parameters is usually a multi-round distributed training process; that is, one round of distributed training consists of the at least some terminals each performing one model training pass and reporting their respective training results. After a round of distributed training ends, on the one hand, the server performs a global update based on the training results; on the other hand, based on the average of the gradient values corresponding to these training results, it is determined whether the requirements of distributed training have been met. If the requirements are met, the globally updated model is taken as the global update model, which requires no further distributed training and can be put to use after adaptive training by the users; if the requirements are not met, the globally updated intermediate model parameters are sent to the terminals as the basis for the terminals' next round of training, which proceeds on the basis of these intermediate model parameters.

In this implementation, the server monitors the effect of the distributed model training and stops learning when the model accuracy meets the requirements, without requiring training until the model converges. This training approach greatly improves training efficiency; meanwhile, the global model parameters are subsequently updated adaptively by each terminal, so that each terminal obtains a more personalized model, ensuring that the model a terminal uses better fits its task requirements and guaranteeing model performance.

Here, a terminal performing adaptive updating may mean that, on the basis of the global model parameters, the terminal updates the model with its own local data so that the model parameters meet the terminal's requirements.
It should be noted that the foregoing steps 101 to 106 and the above optional steps can be combined arbitrarily.
FIG. 3 is a flowchart of a model training method according to an exemplary embodiment. Referring to FIG. 3, the method includes the following steps:
In step 201, the terminal sends its data distribution information, the data distribution information including the categories of data and the number of samples contained in each category.

The terminal counts the number of local samples of each data category, generates the data distribution information, and sends it to the server.

In step 202, the terminal receives model parameters, the model parameters being obtained by training on a training data set selected by the server based on the data distribution information.

The model parameters here may be either initialization model parameters or intermediate model parameters.

In step 203, the terminal trains the model parameters to obtain a training result.

In step 204, the terminal sends the training result, the training result being used to globally update the model parameters to obtain global model parameters.
In the embodiments of the present disclosure, the terminal sends its own data distribution information to the server; the server aggregates the data distribution information of each terminal, selects data with the same distribution to form a training set for preliminary training, and then sends the model parameters to the terminals for distributed training; the terminals then upload the training results to the server for a global update. In this process, what is transmitted between the terminal and the server is data distribution information, model parameters and the like rather than the terminal's data itself, so the bandwidth occupation is small; moreover, through distributed training on the terminals, the server's computing resource consumption is small.
Optionally, sending the data distribution information of the terminal includes:

sending the data distribution information through RRC signaling.
Optionally, the model parameters include initialization model parameters, and receiving the model parameters includes:

receiving the initialization model parameters, the initialization model parameters being obtained by the server through training on the training data set selected using the data distribution information;

alternatively, the model parameters include intermediate model parameters, and receiving the model parameters includes:

receiving the intermediate model parameters, the intermediate model parameters being obtained by the server by iteratively updating the initialization model parameters.
Optionally, the training result includes a gradient value, the gradient value being obtained, after the model parameters are trained, by testing the trained model parameters;

alternatively, the training result includes model update parameters, the model update parameters being the model parameters obtained after training the model parameters.
Optionally, when the training result includes model update parameters,

sending the training result includes:

determining data transmission parameters based on the data volume of the model update parameters and the communication status of the terminal;

sending the model update parameters to the server according to the data transmission parameters.
Optionally, the method further includes:

sending user scheduling information, the user scheduling information including at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirements, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
Optionally, the method further includes:

receiving global model parameters;

adaptively updating the global model parameters.
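A terminal-side sketch of this adaptive update, assuming a PyTorch model and a hypothetical `local_loader` yielding (x, y) batches of the terminal's own data:

```python
import torch
from torch import nn

def adaptive_update(global_model: nn.Module, local_loader, lr=0.01, steps=5):
    """Personalize the received global (meta) parameters with a few
    gradient steps on local data only; nothing is uploaded."""
    optimizer = torch.optim.SGD(global_model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for x, y in local_loader:
            optimizer.zero_grad()
            loss_fn(global_model(x), y).backward()
            optimizer.step()
    return global_model
```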
It should be noted that the foregoing steps 201 to 204 and the above optional steps can be combined arbitrarily.
FIG. 4 is a flowchart of a model training method according to an exemplary embodiment. Referring to FIG. 4, the method includes the following steps:
In step 301, the server and the terminal establish an RRC connection.
Exemplarily, the process of establishing an RRC connection between the server and the terminal may refer to FIG. 5, with the following steps:

Step 3011: the terminal sends RRC connection request signaling to the server, the RRC connection request signaling requesting the establishment of an RRC connection with the server. Correspondingly, the server receives the RRC connection request signaling.

Step 3012: the server sends RRC connection setup signaling to the terminal, the RRC connection setup signaling being used to notify the terminal that the server agrees to establish the RRC connection. Correspondingly, the terminal receives the RRC connection setup signaling.

Step 3013: the terminal sends RRC connection setup complete signaling to the server, the RRC connection setup complete signaling being used to notify the server that the RRC connection establishment is complete. Correspondingly, the server receives the RRC connection setup complete signaling.

The signaling transmission and reception in the above RRC connection establishment process is performed by the network communication module of the terminal and the network communication module of the server; each of these network communication modules can consist of a sending module and a receiving module.
In step 302, the terminal sends data distribution information to the server, and the server receives the data distribution information sent by the terminal.

The data distribution information includes the categories of data and the number of samples contained in each category.

In the embodiments of the present disclosure, step 301 and step 302 need not occur in a fixed order; for example, the data distribution information may be transmitted during the process of establishing the RRC connection between the server and the terminal, that is, the server receives the data distribution information transmitted by the terminal through RRC signaling, for example through the RRC connection setup complete signaling.
In step 303, the server combines the data distribution information of the multiple terminals to obtain total data distribution information.

For example, the data distribution information of terminal 1 includes {type A, sample size a1; type B, sample size b}, and the data distribution information of terminal 2 includes {type A, sample size a2; type C, sample size c}; then the total data distribution information includes {type A, sample size a1+a2; type B, sample size b; type C, sample size c}.

In step 304, the server extracts, from its local data, data whose distribution conforms to the total data distribution information to obtain the training data set.

Exemplarily, the server selects data types and sample sizes according to the {type A, sample size a1+a2; type B, sample size b; type C, sample size c} obtained in step 303 to form the training data set.
In step 305, the server performs model training with the training data set to obtain the initialization model parameters.
Exemplarily, the process by which the server trains to obtain the initialization model parameters may refer to FIG. 6, with the following steps:

Step 3051: the server randomly initializes a set of model parameters.

Step 3052: the server extracts a batch of tasks from the training data set, each task including a support set and a query set.
Exemplarily, denote the total data distribution information as P and the server's local data as $D_s$. Data is extracted from the local data according to the total data distribution information P to generate the training data set, denoted $\hat{D}$. The server extracts data from $\hat{D}$ to generate several tasks, each task containing a support set and a query set, denoted $D_i^{sup}$ and $D_i^{qry}$ respectively.
Step 3053: the server performs training with the support set of each task, and calculates the model loss and gradient, to obtain the updated model parameters on each task.
Exemplarily, the server may obtain the updated model parameters by using the gradient descent method, which may be expressed as the following formula (1):

$$\theta'_i = \theta - \alpha \nabla_\theta L_{T_i}\big(f_\theta; D^{s}_{T_i}\big) \qquad (1)$$

where $\theta'_i$ denotes the updated model parameters on the i-th task, $\theta$ denotes the initialized set of model parameters, $\alpha$ denotes the learning rate of a single task, $\nabla_\theta$ denotes the derivative, $L$ denotes the loss function of the model on the support set, $f$ denotes the model, $T_i$ denotes the i-th task, and $D^{s}_{T_i}$ denotes the support set of the i-th task.
Step 3054: the server calculates the test loss and gradient of the updated model parameters by using the query set of each task.
Step 3055: the server aggregates the gradients on the tasks, updates the randomly initialized model parameters, and obtains the initialization model parameters.
Exemplarily, the server calculates the test loss and gradient of the updated model parameters by using the query set of each task, and sums and averages the gradients over the tasks. The global model parameters are then updated with the average gradient value by using the gradient descent method, which may be expressed as the following formula (2):

$$\theta \leftarrow \theta - \frac{\beta}{N} \sum_{T_i \sim p(T)} \nabla_\theta L_{T_i}\big(f_{\theta'_i}; D^{q}_{T_i}\big) \qquad (2)$$

where $\beta$ denotes the global learning rate, $N$ denotes the number of tasks used in this round of training, $p(T)$ denotes the set of tasks used in this round of training, and $D^{q}_{T_i}$ denotes the query set of the i-th task.
In the above process, each step may be performed by the model training module of the server. In step 3052, the training data set of the server may be stored in the data processing and storage module of the server, and the model training module may perform signaling interaction with the data processing and storage module to extract a batch of tasks.
In step 306, the terminal sends user scheduling information to the server, and the server receives the user scheduling information sent by the terminal.
The user scheduling information includes at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirement, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
In this embodiment of the present disclosure, step 306 and step 302 may be performed simultaneously, that is, the terminal sends the user scheduling information to the server together with the data distribution information; in other words, the user scheduling information may also be transmitted through RRC signaling.
In one implementation of this embodiment of the present disclosure, the user scheduling information may include only the communication status, the computing capability and the learning model performance requirement, while the data volume of the data in the terminal and the similarity between the data distribution and the total data distribution information may be determined by the server based on the data distribution information.
In this embodiment of the present disclosure, the parameters in the user scheduling information may be sent to the server by the terminal together, or may be sent to the server in sequence.
Among these parameters, the communication status usually includes the CQI, and the CQI needs to be obtained by the terminal through measurement. Therefore, the method may further include: before step 306, the terminal performs CQI measurement.
In this embodiment of the present disclosure, the user scheduling information is acquired by the user management module in the terminal and sent through the network communication module of the terminal to the network communication module of the server, which then transfers it to the user management module of the server. When the network communication module and the user management module in the terminal or the server transfer the user scheduling information, a new signaling may be used, whose sole function is to carry the user scheduling information.
In step 307, the server determines, based on the user scheduling information of each of the multiple terminals, whether each of the multiple terminals meets the distributed training requirement.
Exemplarily, for each parameter in the user scheduling information, the server sets a threshold range corresponding to the distributed training requirement. When every parameter of a terminal falls within the corresponding threshold range, the terminal meets the distributed training requirement.
Terminals among the multiple terminals other than those selected above as meeting the distributed training requirement do not participate in this round of training.
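A minimal sketch of the per-parameter range check in step 307; the parameter keys and the example ranges are assumptions for illustration only:

```python
def eligible(scheduling_info, thresholds):
    """A terminal qualifies only if every reported parameter falls
    inside the server-configured threshold range."""
    return all(lo <= scheduling_info[key] <= hi
               for key, (lo, hi) in thresholds.items())

# Illustrative ranges and reports.
thresholds = {"data_volume": (1000, float("inf")),
              "cqi": (7, 15),
              "similarity": (0.8, 1.0)}
reported = [{"data_volume": 5000, "cqi": 9, "similarity": 0.9},
            {"data_volume": 200, "cqi": 3, "similarity": 0.5}]
participants = [r for r in reported if eligible(r, thresholds)]  # keeps only the first
```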
In step 308, the server sends the initialization model parameters to the terminals, among the multiple terminals, that meet the distributed training requirement, and the terminal receives the initialization model parameters.
In this step, if the terminal in steps 301 to 307 is a terminal that meets the distributed training requirement, the terminal participates in steps 308 to 314; if the terminal in steps 301 to 307 is not a terminal that meets the distributed training requirement, the terminal does not participate in steps 308 to 314. This embodiment is described by taking as an example the case where the terminal in steps 301 to 307 meets the distributed training requirement.
Exemplarily, when transmitting the initialization model parameters, the server first determines data transmission parameters based on the data volume of the initialization model parameters and the communication status of the terminal, and then sends the initialization model parameters to the terminal according to the data transmission parameters. Here, determining the data transmission parameters may be performed by the transmission control module in the server; after the transmission control module determines the data transmission parameters, it may control the network communication module to send the initialization model parameters according to the data transmission parameters.
Here, the data transmission parameters include parameters such as the modulation scheme and the code rate. When the data volume of the model parameters differs or the communication status of the terminal differs, different modulation schemes and code rates may be selected for the transmission, so that the selected modulation scheme and code rate match the amount of data to be transmitted and the communication status of the terminal, thereby achieving a better transmission effect.
For example, the server encapsulates and packages the initialization model parameters according to the above data transmission scheme, and sends the packaged data packet of the initialization model parameters to the terminal. After receiving the data packet, the terminal decapsulates it, confirms the correctness of the received data packet based on the decapsulated data, and then feeds back a message to the server, informing the server that the terminal has correctly received the initialization model parameters.
In the above process, on the terminal side, verifying the correctness of the data packet and generating the feedback message are performed by the transmission control module in the terminal, and the receiving and sending processes are performed by the network communication module.
The data volume of the model parameters is related, on the one hand, to the model size: the larger the model, the larger the data volume of the model parameters. On the other hand, it is also related to the precision of each model parameter: the higher the precision, the larger the data volume. The precision of a model parameter may refer to the number of digits kept after the decimal point; the higher the precision and the more digits kept, the larger the amount of data occupied by the model parameters.
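The following sketch makes both relationships concrete: the payload grows with the parameter count and the per-parameter precision, and a modulation scheme and code rate are then chosen to match the channel. The CQI-to-modulation table is a deployment-specific assumption, not taken from the patent:

```python
def payload_bytes(num_params, bits_per_param):
    # Data volume grows with model size and with per-parameter precision.
    return num_params * bits_per_param // 8

def pick_transmission(cqi):
    # Illustrative CQI-to-(modulation, code rate) mapping.
    if cqi >= 10:
        return "64QAM", 0.75
    if cqi >= 7:
        return "16QAM", 0.50
    return "QPSK", 0.33

size = payload_bytes(num_params=1_000_000, bits_per_param=16)  # 2,000,000 bytes
modulation, code_rate = pick_transmission(cqi=9)               # ("16QAM", 0.50)
```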
In step 309, the terminal trains the initialization model parameters to obtain a training result.
In this embodiment of the present disclosure, the training result of the terminal may take two forms: one is the gradient value obtained through testing after the training is completed, and the other is the model update parameters obtained through model training alone, without testing. These two cases exist because the data volumes in the terminals differ. For example, when the data volume in the terminal is large, the data in the terminal can form both a support set and a query set; in this case the terminal may first perform model training with the support set and then perform model testing with the query set. When the data volume in the terminal is small, the data in the terminal can only form a support set; in this case the terminal performs model training with the support set, and the model testing is completed by the server.
Here, whether the data volume in the terminal is large or small may be determined by comparison with a threshold: larger than the threshold means large, smaller means small. The threshold may be determined based on the data volumes of the multiple terminals, for example, as a quantile of those data volumes: if 80% of the users hold at least 1000 samples, the threshold is set to 1000. The threshold may be determined by the server based on the data distribution information of the terminals and then notified to the terminals. A terminal may compare this threshold with its own data volume to determine whether to generate a query set.
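A minimal sketch of how such a quantile-based threshold could be computed on the server; the 20% quantile mirrors the "80% of users reach 1000" example and is otherwise an assumption:

```python
import numpy as np

def query_set_threshold(data_volumes, quantile=0.2):
    """If 80% of terminals hold at least this many samples, the 20th
    percentile reproduces the '1000' example from the text."""
    return float(np.quantile(data_volumes, quantile))

volumes = [1200, 3000, 900, 1000, 2500]
threshold = query_set_threshold(volumes)
builds_query_set = [v >= threshold for v in volumes]  # per-terminal decision
```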
Exemplarily, the terminal performs a gradient descent update on the initialization model parameters by using the support set, to obtain the model update parameters, which may be expressed as the following formula (3):

$$\theta_{ui} = \theta - \alpha \nabla_\theta L\big(f_\theta; D^{s}_{ui}\big) \qquad (3)$$

where $\theta_{ui}$ denotes the model update parameters of the i-th terminal, and $D^{s}_{ui}$ denotes the support set in the i-th terminal.
If a query set exists in the terminal, the terminal tests the model update parameters with the query set and calculates the test loss and gradient value, which may be expressed as the following formula (4):

$$g_{ui} = \nabla_{\theta_{ui}} L\big(f_{\theta_{ui}}; D^{q}_{ui}\big) \qquad (4)$$

where $g_{ui}$ denotes the test gradient of the model update parameters of the i-th terminal, and $D^{q}_{ui}$ denotes the query set in the training set of the i-th terminal.
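A terminal-side sketch covering both cases of step 309, reusing the illustrative grad_loss callable from the server sketch; with only a support set the terminal returns updated parameters (formula (3)), and with a query set it also returns the test gradient (formula (4)):

```python
def local_training(theta, grad_loss, support, query=None, alpha=0.01):
    """Step 309 sketch. grad_loss(theta, data) returns dL/dtheta."""
    theta_u = theta - alpha * grad_loss(theta, support)  # formula (3)
    if query is None:
        return theta_u, None                 # server performs the testing
    return theta_u, grad_loss(theta_u, query)            # formula (4)
```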
In step 310, the terminal sends the training result to the server, and the server receives the training result sent by the terminal.
When the terminal sends the training result, if the model update parameters are sent, this may be done in the manner in which the server sends the initialization model parameters in step 308, that is, the data transmission parameters are first determined and the transmission is then performed according to the data transmission parameters. By extension, in this embodiment of the present disclosure, whenever model parameters need to be transmitted between the terminal and the server, the transmission is performed by first determining the data transmission parameters and then transmitting according to the data transmission parameters.
In step 311, the server updates the model parameters based on the training results of at least some of the terminals. When the updated model parameters do not meet the requirement, step 312 is performed; when the updated model parameters meet the requirement, step 313 is performed.
Here, the at least some terminals are the terminals that participate in the training and meet the distributed training requirement. The server may obtain, based on the training results of these terminals, the average value of the gradient values of the at least some terminals. If the average value of the gradient values of the at least some terminals is within the threshold range (for example, smaller than a set value), the updated model parameters meet the requirement; otherwise, the updated model parameters do not meet the requirement.
Exemplarily, when the training result of each of the at least some terminals includes a gradient value, step 311 may include:
the server iteratively updates the model parameters by using the gradient descent method based on the average value of the gradient values of the at least some terminals.
Exemplarily, when the training result of at least one of the at least some terminals includes model update parameters, step 311 may include:
the server selects a query set that conforms to the data distribution information of the first terminal, where the first terminal is a terminal whose training result includes model update parameters;
the server tests the model update parameters of the first terminal based on the query set, to obtain a gradient value;
the server iteratively updates the model parameters by using the gradient descent method based on the average value of the gradient values of the at least some terminals.
In this step, the server determines, according to the data volume of each terminal, whether a query set needs to be generated for the terminal.
In this embodiment of the present disclosure, the server performs a gradient descent update on the model parameters by using the average value of the gradient values of the at least some terminals, which may be expressed as the following formula (5):

$$\theta \leftarrow \theta - \frac{\beta}{M} \sum_{i=1}^{M} g_{ui} \qquad (5)$$

where $M$ denotes the number of terminals that meet the distributed training requirement, that is, the number of terminals participating in the distributed training.
Whether the updated model parameters meet the requirement may be judged according to the following formula (6):

$$\Big\| \frac{1}{M} \sum_{i=1}^{M} g_{ui} \Big\| < g_0 \qquad (6)$$

where $g_0$ denotes the aforementioned threshold (set value).
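A minimal sketch of the aggregation in step 311: average the participating terminals' gradients, apply formula (5), and check convergence per formula (6). Names and default values are illustrative:

```python
import numpy as np

def global_update(theta, terminal_grads, beta=0.001, g0=1e-3):
    """terminal_grads holds one gradient array per participating terminal."""
    g_avg = np.mean(terminal_grads, axis=0)
    theta = theta - beta * g_avg             # formula (5)
    converged = np.linalg.norm(g_avg) < g0   # formula (6)
    return theta, converged
```

When converged is False, the updated parameters are sent out as intermediate model parameters (step 312) and another round follows; when True, they are sent as the global model parameters (step 313).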
Step 311 may be performed by the model update module in the server. In the process of performing the above steps, this module needs to interact with the data processing and storage module in the server to acquire the data for generating a query set for a terminal. In this interaction, a newly added signaling may be used to instruct the data processing and storage module to provide the data for generating the query set.
In step 312, the server sends the intermediate model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, and the terminals receive the intermediate model parameters sent by the server.
After receiving the intermediate model parameters sent by the server, the terminal trains the intermediate model parameters to obtain a training result, and then steps 310 and 311 are repeated for iterative updating.
In step 313, the server sends the global model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, and the terminals receive the global model parameters sent by the server.
In the above steps, between the server and the terminal, only the data distribution information, the user scheduling information and the like can be transmitted through RRC signaling; the subsequent model parameters, training results and the like are transmitted as service data due to their large data volume.
In step 314, the terminal adaptively updates the global model parameters.
In this embodiment of the present disclosure, the terminal tests the global model parameters by using its local support set, calculates the test loss and gradient, and performs a gradient descent update to obtain the adaptive model, which may be expressed as the following formula (7):

$$\Phi_{ui}(\theta) = \theta - \alpha \nabla_\theta L\big(f_\theta; D^{q}_{ui}\big) \qquad (7)$$

where $\Phi_{ui}(\theta)$ denotes the adaptive update model of the i-th terminal, and $D^{q}_{ui}$ denotes the query set in the test set of the i-th terminal.
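A one-line sketch of the adaptive update of step 314, under the same illustrative grad_loss assumption as the earlier sketches:

```python
def adaptive_update(theta_global, grad_loss, local_data, alpha=0.01):
    # One local gradient step on the terminal's own data (formula (7)).
    return theta_global - alpha * grad_loss(theta_global, local_data)
```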
The foregoing steps 309 and 314 may be performed by the model update module in the terminal. In the process of performing these steps, this module needs to interact with the data processing and storage module in the terminal to acquire data for generating the support set, the query set, and the like.
Figure 7 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment. The apparatus has the function of implementing the server in the above method embodiments, and the function may be implemented by hardware, or by hardware executing corresponding software. As shown in Figure 7, the apparatus includes: a receiving module 501, a selecting module 502, a model training module 503 and a sending module 504.
The receiving module 501 is configured to receive data distribution information of multiple terminals, where the data distribution information includes the categories of the data and the number of samples included in each category.
The selecting module 502 is configured to select a training data set that conforms to the data distribution information of the multiple terminals.
The model training module 503 is configured to perform model training based on the training data set, to obtain model parameters.
The sending module 504 is configured to send the model parameters to at least some of the multiple terminals.
The receiving module 501 is further configured to receive training results obtained by the at least some terminals training the model parameters.
The model training module 503 is further configured to update the model parameters based on the training results of the at least some terminals, to obtain global model parameters.
Optionally, the receiving module 501 is configured to receive the data distribution information transmitted by each of the multiple terminals through RRC signaling.
Optionally, the selecting module 502 is configured to combine the data distribution information of the multiple terminals to obtain total data distribution information, and to extract, from the local data of the server, data whose distribution conforms to the total data distribution information, to obtain the training data set.
Optionally, the model parameters include initialization model parameters, and the model training module 503 is configured to perform model training by using the training data set, to obtain the initialization model parameters; or, the model parameters include intermediate model parameters, and the model training module 503 is configured to perform model training by using the training data set to obtain initialization model parameters, and to iteratively update the initialization model parameters to obtain the intermediate model parameters.
Optionally, the receiving module 501 is further configured to receive user scheduling information of each of the multiple terminals;
the apparatus further includes: a determining module 505, configured to determine, based on the user scheduling information of each of the multiple terminals, whether each of the multiple terminals meets the distributed training requirement;
the sending module 504 is configured to send the model parameters to the terminals, among the multiple terminals, that meet the distributed training requirement.
Optionally, the user scheduling information includes at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirement, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
Optionally, the determining module 505 is further configured to determine data transmission parameters based on the data volume of the model parameters and the communication status of the terminals that meet the distributed training requirement;
the sending module 504 is configured to send the model parameters to the terminals that meet the distributed training requirement according to the data transmission parameters.
Optionally, the training result includes a gradient value, where the gradient value is a gradient value obtained by the terminal, after training the model parameters, through testing the trained model parameters; or, the training result includes model update parameters, where the model update parameters are the model parameters obtained after the terminal trains the model parameters.
Optionally, when the training result of each of the at least some terminals includes a gradient value, the model training module 503 is configured to iteratively update the model parameters by using the gradient descent method based on the average value of the gradient values of the at least some terminals, to obtain the global model parameters.
Optionally, when the training result of at least one of the at least some terminals includes model update parameters, the selecting module 502 is configured to select a query set that conforms to the data distribution information of the first terminal, where the first terminal is a terminal whose training result includes model update parameters; and the model training module 503 is configured to test the model update parameters of the first terminal based on the query set, to obtain a gradient value, and to iteratively update the model parameters by using the gradient descent method based on the average value of the gradient values of the at least some terminals, to obtain the global model parameters.
Optionally, the model training module 503 is configured to iteratively update the model parameters by using the gradient descent method based on the average value of first gradient values of the at least some terminals; determine whether the average value of the first gradient values of the at least some terminals is within a threshold range; in response to the average value of the first gradient values of the at least some terminals being not within the threshold range, send the intermediate model parameters, obtained by iteratively updating the model parameters, to the at least some terminals; and iteratively update the intermediate model parameters by using the average value of second gradient values of the at least some terminals, where the second gradient values are gradient values obtained by the terminals, after training the intermediate model parameters, through testing the trained intermediate model parameters.
Optionally, the sending module 504 is further configured to, in response to the average value of the first gradient values of the at least some terminals being within the threshold range, send the global model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, where the global model parameters are used for adaptive updating by the terminals.
Figure 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment. The apparatus has the function of implementing the terminal in the above method embodiments, and the function may be implemented by hardware, or by hardware executing corresponding software. As shown in Figure 8, the apparatus includes: a sending module 601, a receiving module 602 and a model training module 603.
The sending module 601 is configured to send data distribution information of the terminal, where the data distribution information includes the categories of the data and the number of samples included in each category.
The receiving module 602 is configured to receive model parameters, where the model parameters are obtained through training on a training data set selected by the server based on the data distribution information.
The model training module 603 is configured to train the model parameters to obtain a training result.
The sending module 601 is further configured to send the training result, where the training result is used to globally update the model parameters to obtain global model parameters.
Optionally, the sending module 601 is configured to send the data distribution information through RRC signaling.
Optionally, the model parameters include initialization model parameters, and the receiving module 602 is configured to receive the initialization model parameters, where the initialization model parameters are obtained by the server through training on the training data set selected by using the data distribution information; or, the model parameters include intermediate model parameters, and the receiving module 602 is configured to receive the intermediate model parameters, where the intermediate model parameters are obtained by the server through iteratively updating the initialization model parameters.
Optionally, the training result includes a gradient value, where the gradient value is a gradient value obtained, after the model parameters are trained, through testing the trained model parameters; or, the training result includes model update parameters, where the model update parameters are the model parameters obtained after training the model parameters.
Optionally, when the training result includes model update parameters, the apparatus further includes: a determining module 604, configured to determine data transmission parameters based on the data volume of the model update parameters and the communication status of the terminal; and the sending module 601 is configured to send the model update parameters to the server according to the data transmission parameters.
Optionally, the sending module 601 is further configured to send user scheduling information, where the user scheduling information includes at least one of the following parameters: the data volume of the data in the terminal, the similarity between the data distribution and the total data distribution information, the communication status, the computing capability, and the learning model performance requirement, where the total data distribution information is obtained by combining the data distribution information of the multiple terminals.
Optionally, the receiving module 602 is further configured to receive global model parameters, and the model training module 603 is further configured to adaptively update the global model parameters.
Figure 9 is a block diagram of a terminal 700 according to an exemplary embodiment. The terminal 700 may include: a processor 701, a receiver 702, a transmitter 703, a memory 704 and a bus 705.
The processor 701 includes one or more processing cores, and the processor 701 executes various functional applications and performs information processing by running software programs and modules.
The receiver 702 and the transmitter 703 may be implemented as one communication component, which may be a communication chip.
The memory 704 is connected to the processor 701 through the bus 705.
The memory 704 may be configured to store at least one instruction, and the processor 701 is configured to execute the at least one instruction to implement the steps in the above method embodiments.
In addition, the memory 704 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, including but not limited to: a magnetic disk or an optical disc, an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a static random access memory (SRAM), a read-only memory (ROM), a magnetic memory, a flash memory, or a programmable read-only memory (PROM).
In an exemplary embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the model training method provided by the above method embodiments.
Figure 10 is a block diagram of a server 800 according to an exemplary embodiment. The server 800 may include: a processor 801, a receiver 802, a transmitter 803 and a memory 804. The receiver 802, the transmitter 803 and the memory 804 are each connected to the processor 801 through a bus.
The processor 801 includes one or more processing cores, and the processor 801 executes, by running software programs and modules, the method performed by the server in the model training method provided by the embodiments of the present disclosure. The memory 804 may be configured to store software programs and modules. Specifically, the memory 804 may store an operating system 8041 and an application program module 8042 required for at least one function. The receiver 802 is configured to receive communication data sent by other devices, and the transmitter 803 is configured to send communication data to other devices.
In an exemplary embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the model training method provided by the above method embodiments.
An exemplary embodiment of the present disclosure further provides a model training system. The model training system includes a terminal and a server. The terminal is the terminal provided by the embodiment shown in Figure 9, and the server is the server provided by the embodiment shown in Figure 10.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (41)

  1. A model training method, wherein the method comprises:
    receiving data distribution information of multiple terminals, the data distribution information comprising categories of data and the number of samples included in each category;
    selecting a training data set that conforms to the data distribution information of the multiple terminals;
    performing model training based on the training data set to obtain model parameters;
    sending the model parameters to at least some of the multiple terminals;
    receiving training results obtained by the at least some terminals training the model parameters;
    updating the model parameters based on the training results of the at least some terminals to obtain global model parameters.
  2. The method according to claim 1, wherein receiving data distribution information of multiple terminals comprises:
    receiving the data distribution information transmitted by each of the multiple terminals through RRC signaling.
  3. The method according to claim 1, wherein selecting a training data set that conforms to the data distribution information of the multiple terminals comprises:
    combining the data distribution information of the multiple terminals to obtain total data distribution information;
    extracting, from local data of a server, data whose distribution conforms to the total data distribution information, to obtain the training data set.
  4. The method according to claim 1, wherein the model parameters comprise initialization model parameters, and performing model training based on the training data set to obtain model parameters comprises:
    performing model training by using the training data set to obtain the initialization model parameters;
    or, the model parameters comprise intermediate model parameters, and performing model training based on the training data set to obtain model parameters comprises:
    performing model training by using the training data set to obtain initialization model parameters;
    iteratively updating the initialization model parameters to obtain the intermediate model parameters.
  5. The method according to claim 1, wherein sending the model parameters to at least some of the multiple terminals comprises:
    receiving user scheduling information of each of the multiple terminals;
    determining, based on the user scheduling information of each of the multiple terminals, whether each of the multiple terminals meets a distributed training requirement;
    sending the model parameters to terminals, among the multiple terminals, that meet the distributed training requirement.
  6. The method according to claim 5, wherein the user scheduling information comprises at least one of the following parameters:
    a data volume of data in the terminal, a similarity between the data distribution and total data distribution information, a communication status, a computing capability, and a learning model performance requirement, the total data distribution information being obtained by combining the data distribution information of the multiple terminals.
  7. The method according to claim 5, wherein sending the model parameters to terminals, among the multiple terminals, that meet the distributed training requirement comprises:
    determining data transmission parameters based on a data volume of the model parameters and the communication status of the terminals that meet the distributed training requirement;
    sending the model parameters to the terminals that meet the distributed training requirement according to the data transmission parameters.
  8. The method according to any one of claims 1 to 7, wherein the training result comprises a gradient value, the gradient value being a gradient value obtained by the terminal, after training the model parameters, through testing the trained model parameters;
    or, the training result comprises model update parameters, the model update parameters being model parameters obtained after the terminal trains the model parameters.
  9. The method according to claim 8, wherein, when the training result of each of the at least some terminals comprises a gradient value,
    updating the model parameters based on the training results of the at least some terminals to obtain global model parameters comprises:
    iteratively updating the model parameters by using a gradient descent method based on an average value of the gradient values of the at least some terminals, to obtain the global model parameters.
  10. The method according to claim 8, wherein, when the training result of at least one of the at least some terminals comprises model update parameters,
    updating the model parameters based on the training results of the at least some terminals to obtain global model parameters comprises:
    selecting a query set that conforms to data distribution information of a first terminal, the first terminal being a terminal whose training result comprises model update parameters;
    testing the model update parameters of the first terminal based on the query set to obtain a gradient value;
    iteratively updating the model parameters by using a gradient descent method based on an average value of the gradient values of the at least some terminals, to obtain the global model parameters.
  11. The method according to claim 9 or 10, wherein iteratively updating the model parameters by using a gradient descent method based on an average value of the gradient values of the at least some terminals, to obtain global model parameters, comprises:
    iteratively updating the model parameters by using the gradient descent method based on an average value of first gradient values of the at least some terminals;
    determining whether the average value of the first gradient values of the at least some terminals is within a threshold range;
    in response to the average value of the first gradient values of the at least some terminals being not within the threshold range, sending intermediate model parameters, obtained by iteratively updating the model parameters, to the at least some terminals;
    iteratively updating the intermediate model parameters by using an average value of second gradient values of the at least some terminals, the second gradient values being gradient values obtained by the terminals, after training the intermediate model parameters, through testing the trained intermediate model parameters.
  12. The method according to claim 11, wherein the method further comprises:
    in response to the average value of the first gradient values of the at least some terminals being within the threshold range, sending global model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, the global model parameters being used for adaptive updating by the terminals.
  13. A model training method, wherein the method comprises:
    sending data distribution information of a terminal, the data distribution information comprising categories of data and the number of samples included in each category;
    receiving model parameters, the model parameters being obtained through training on a training data set selected by a server based on the data distribution information;
    training the model parameters to obtain a training result;
    sending the training result, the training result being used to globally update the model parameters to obtain global model parameters.
  14. The method according to claim 13, wherein sending data distribution information of a terminal comprises:
    sending the data distribution information through RRC signaling.
  15. The method according to claim 13, wherein the model parameters comprise initialization model parameters, and receiving model parameters comprises:
    receiving the initialization model parameters, the initialization model parameters being obtained by the server through training on the training data set selected by using the data distribution information;
    or, the model parameters comprise intermediate model parameters, and receiving model parameters comprises:
    receiving the intermediate model parameters, the intermediate model parameters being obtained by the server through iteratively updating initialization model parameters.
  16. The method according to claim 15, wherein the training result comprises a gradient value, the gradient value being a gradient value obtained, after the model parameters are trained, through testing the trained model parameters;
    or, the training result comprises model update parameters, the model update parameters being model parameters obtained after training the model parameters.
  17. The method according to claim 16, wherein, when the training result comprises model update parameters,
    sending the training result comprises:
    determining data transmission parameters based on a data volume of the model update parameters and a communication status of the terminal;
    sending the model update parameters to the server according to the data transmission parameters.
  18. The method according to claim 13, wherein the method further comprises:
    sending user scheduling information, the user scheduling information comprising at least one of the following parameters: a data volume of data in the terminal, a similarity between the data distribution and total data distribution information, a communication status, a computing capability, and a learning model performance requirement, the total data distribution information being obtained by combining the data distribution information of multiple terminals.
  19. The method according to any one of claims 13 to 18, wherein the method further comprises:
    receiving global model parameters;
    adaptively updating the global model parameters.
  20. A model training apparatus, wherein the apparatus comprises:
    a receiving module, configured to receive data distribution information of multiple terminals, the data distribution information comprising categories of data and the number of samples included in each category;
    a selecting module, configured to select a training data set that conforms to the data distribution information of the multiple terminals;
    a model training module, configured to perform model training based on the training data set to obtain model parameters;
    a sending module, configured to send the model parameters to at least some of the multiple terminals;
    the receiving module being further configured to receive training results obtained by the at least some terminals training the model parameters;
    the model training module being further configured to update the model parameters based on the training results of the at least some terminals to obtain global model parameters.
  21. The apparatus according to claim 20, wherein the receiving module is configured to receive the data distribution information transmitted by each of the multiple terminals through RRC signaling.
  22. The apparatus according to claim 20, wherein the selecting module is configured to combine the data distribution information of the multiple terminals to obtain total data distribution information, and to extract, from local data of a server, data whose distribution conforms to the total data distribution information, to obtain the training data set.
  23. The apparatus according to claim 20, wherein the model parameters comprise initialization model parameters, and the model training module is configured to perform model training by using the training data set to obtain the initialization model parameters;
    or, the model parameters comprise intermediate model parameters, and the model training module is configured to perform model training by using the training data set to obtain initialization model parameters, and to iteratively update the initialization model parameters to obtain the intermediate model parameters.
  24. The apparatus according to claim 20, wherein the receiving module is further configured to receive user scheduling information of each of the multiple terminals;
    the apparatus further comprises: a determining module, configured to determine, based on the user scheduling information of each of the multiple terminals, whether each of the multiple terminals meets a distributed training requirement;
    the sending module is configured to send the model parameters to terminals, among the multiple terminals, that meet the distributed training requirement.
  25. The apparatus according to claim 24, wherein the user scheduling information comprises at least one of the following parameters:
    a data volume of data in the terminal, a similarity between the data distribution and total data distribution information, a communication status, a computing capability, and a learning model performance requirement, the total data distribution information being obtained by combining the data distribution information of the multiple terminals.
  26. The apparatus according to claim 24, wherein the determining module is further configured to determine data transmission parameters based on a data volume of the model parameters and the communication status of the terminals that meet the distributed training requirement;
    the sending module is configured to send the model parameters to the terminals that meet the distributed training requirement according to the data transmission parameters.
  27. The apparatus according to any one of claims 20 to 26, wherein the training result includes a gradient value, the gradient value being obtained by the terminal, after training the model parameters, by testing the trained model parameters;
    or, the training result includes model update parameters, the model update parameters being the model parameters obtained after the terminal trains the model parameters.
  28. The apparatus according to claim 27, wherein when the training result of each of the at least some terminals includes a gradient value,
    the model training module is configured to iteratively update the model parameters using a gradient descent method based on the average of the gradient values of the at least some terminals, to obtain the global model parameters.
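A minimal sketch of the global update in claim 28, assuming each terminal's test gradient arrives as a NumPy array of the same shape as the model parameters and that the learning rate is a free hyperparameter (names are illustrative):

```python
import numpy as np

def global_update(model_params, terminal_gradients, lr=0.01):
    # terminal_gradients: one test gradient per terminal.
    avg_grad = np.mean(terminal_gradients, axis=0)  # average over terminals
    return model_params - lr * avg_grad             # one gradient descent step
```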
  29. The apparatus according to claim 27, wherein when the training result of at least one of the at least some terminals includes model update parameters,
    the selection module is configured to select a query set conforming to the data distribution information of a first terminal, the first terminal being a terminal whose training result includes model update parameters;
    the model training module is configured to test the model update parameters of the first terminal based on the query set to obtain a gradient value, and to iteratively update the model parameters using a gradient descent method based on the average of the gradient values of the at least some terminals, to obtain the global model parameters.
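Where a terminal uploads model update parameters instead of a gradient, claim 29 first converts them into a gradient by testing on a query set matching that terminal's distribution. A sketch under the assumption that a loss-gradient callable is available for the model (its exact form is not specified in the text):

```python
import numpy as np

def gradient_from_update(updated_params, query_set, loss_grad_fn):
    # loss_grad_fn(params, sample) -> gradient of the loss on one query sample.
    grads = [loss_grad_fn(updated_params, sample) for sample in query_set]
    return np.mean(grads, axis=0)  # test gradient for this terminal
```

The resulting gradient can then be averaged with the other terminals' gradients and fed into the same global update step sketched after claim 28.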
  30. The apparatus according to claim 28 or 29, wherein the model training module is configured to: iteratively update the model parameters using a gradient descent method based on the average of first gradient values of the at least some terminals; determine whether the average of the first gradient values of the at least some terminals is within a threshold range; and in response to the average of the first gradient values of the at least some terminals not being within the threshold range, send intermediate model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, and iteratively update the intermediate model parameters using the average of second gradient values of the at least some terminals; where a second gradient value is a gradient value obtained by the terminal, after training the intermediate model parameters, by testing the trained intermediate model parameters.
  31. The apparatus according to claim 30, wherein the sending module is further configured to, in response to the average of the first gradient values of the at least some terminals being within the threshold range, send global model parameters, obtained by iteratively updating the model parameters, to the at least some terminals, the global model parameters being used by the terminals for adaptive updating.
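Claims 30 and 31 together describe a threshold-controlled loop: keep distributing intermediate parameters and collecting fresh test gradients until the average gradient falls within the threshold range, then distribute the result as the global model. A sketch that reads "within the threshold range" as a bound on the gradient norm (an assumption; the claims do not fix the test) and stands in for all transport with callables:

```python
import numpy as np

def train_until_converged(params, send_to_terminals, collect_gradients,
                          lr=0.01, threshold=1e-3, max_rounds=100):
    for _ in range(max_rounds):
        send_to_terminals(params)        # distribute current (intermediate) parameters
        grads = collect_gradients()      # terminals train them and return test gradients
        avg_grad = np.mean(grads, axis=0)
        params = params - lr * avg_grad  # gradient descent update
        if np.linalg.norm(avg_grad) <= threshold:
            break                        # within the threshold range: stop iterating
    send_to_terminals(params)            # final global parameters, for adaptive updating
    return params
```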
  32. A model training apparatus, wherein the apparatus includes:
    a sending module configured to send data distribution information of a terminal, the data distribution information including categories of data and the number of samples included in each category;
    a receiving module configured to receive model parameters, the model parameters being obtained by training on a training data set selected by a server based on the data distribution information;
    a model training module configured to train the model parameters to obtain a training result;
    the sending module being further configured to send the training result, the training result being used to globally update the model parameters to obtain global model parameters.
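The terminal-side counterpart in claim 32 is a four-step exchange. A schematic sketch in which every callable is an assumed stand-in for transport and local training code not given in the text:

```python
def terminal_round(report_distribution, receive_params, local_train, send_result):
    report_distribution()         # 1. send the terminal's data distribution information
    params = receive_params()     # 2. receive model parameters trained by the server
    result = local_train(params)  # 3. train locally; yields a gradient or updated parameters
    send_result(result)           # 4. upload the training result for the global update
```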
  33. The apparatus according to claim 32, wherein the sending module is configured to send the data distribution information through RRC signaling.
  34. The apparatus according to claim 32, wherein the model parameters include initialization model parameters, and the receiving module is configured to receive the initialization model parameters, the initialization model parameters being obtained by the server through training on the training data set selected based on the data distribution information;
    or, the model parameters include intermediate model parameters, and the receiving module is configured to receive the intermediate model parameters, the intermediate model parameters being obtained by the server by iteratively updating initialization model parameters.
  35. The apparatus according to claim 34, wherein the training result includes a gradient value, the gradient value being obtained, after the model parameters are trained, by testing the trained model parameters;
    or, the training result includes model update parameters, the model update parameters being the model parameters obtained after training the model parameters.
  36. The apparatus according to claim 35, wherein when the training result includes model update parameters,
    the apparatus further includes a determining module configured to determine data transmission parameters based on the data volume of the model update parameters and the communication status of the terminal;
    the sending module is configured to send the model update parameters to the server according to the data transmission parameters.
  37. The apparatus according to claim 32, wherein the sending module is further configured to send user scheduling information, the user scheduling information including at least one of the following parameters: the data volume of data in the terminal, the similarity between the data distribution and total data distribution information, the communication status, the computing capability, and the learning model performance requirement, where the total data distribution information is obtained by combining the data distribution information of multiple terminals.
  38. The apparatus according to any one of claims 32 to 37, wherein the receiving module is further configured to receive global model parameters;
    the model training module is further configured to adaptively update the global model parameters.
  39. A server, wherein the server includes:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to load and execute the executable instructions to implement the model training method according to any one of claims 1 to 12.
  40. A terminal, wherein the terminal includes:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to load and execute the executable instructions to implement the model training method according to any one of claims 13 to 19.
  41. A computer-readable storage medium, wherein when instructions in the computer-readable storage medium are executed by a processor, the model training method according to any one of claims 1 to 12, or the model training method according to any one of claims 13 to 19, can be executed.
PCT/CN2020/123292 2020-10-23 2020-10-23 Model training method and device, server, terminal, and storage medium WO2022082742A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/123292 WO2022082742A1 (en) 2020-10-23 2020-10-23 Model training method and device, server, terminal, and storage medium
CN202080002976.6A CN114667523A (en) 2020-10-23 2020-10-23 Model training method, device, server, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/123292 WO2022082742A1 (en) 2020-10-23 2020-10-23 Model training method and device, server, terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2022082742A1 (en)

Family

ID=81291178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123292 WO2022082742A1 (en) 2020-10-23 2020-10-23 Model training method and device, server, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN114667523A (en)
WO (1) WO2022082742A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190042956A1 (en) * 2018-02-09 2019-02-07 Intel Corporation Automatic configurable sequence similarity inference system
CN109716346A (en) * 2016-07-18 2019-05-03 河谷生物组学有限责任公司 Distributed machines learning system, device and method
CN110956202A (en) * 2019-11-13 2020-04-03 重庆大学 Image training method, system, medium and intelligent device based on distributed learning
CN111444848A (en) * 2020-03-27 2020-07-24 广州英码信息科技有限公司 Specific scene model upgrading method and system based on federal learning
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal

Also Published As

Publication number Publication date
CN114667523A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
WO2021243619A1 (en) Information transmission method and apparatus, and communication device and storage medium
KR102630605B1 (en) Communication methods and related devices
Kamoun et al. Joint resource allocation and offloading strategies in cloud enabled cellular networks
CN111869303A (en) Resource scheduling method, device, communication equipment and storage medium
WO2020187004A1 (en) Scheduling method and apparatus in communication system, and storage medium
US20230409962A1 (en) Sampling user equipments for federated learning model collection
CN115208812B (en) Service processing method and device, equipment and computer readable storage medium
WO2022104799A1 (en) Training method, training apparatus, and storage medium
WO2022099512A1 (en) Data processing method and apparatus, communication device, and storage medium
CN114097259A (en) Communication processing method, communication processing device and storage medium
US12041480B2 (en) User equipment and wireless communication method for neural network computation
CN115087036A (en) Background data transmission strategy configuration method and device
WO2023240572A1 (en) Information transmission method and apparatus, and communication device and storage medium
CN113692052A (en) Network edge machine learning training method
WO2022082742A1 (en) Model training method and device, server, terminal, and storage medium
WO2023082280A1 (en) Model updating method for wireless channel processing, apparatus, terminal, and medium
WO2021103947A1 (en) Scheduling method and device
WO2022133689A1 (en) Model transmission method, model transmission device, and storage medium
US11277715B2 (en) Transmit multicast frame
CN109361431B (en) Slice scheduling method and system
CN114270934A (en) Method, device, communication equipment and storage medium for controlling data transmission rate
US20240089742A1 (en) Data transmission method and related apparatus
CN113556247B (en) Multi-layer parameter distributed data transmission method, device and readable medium
US20230354296A1 (en) Node scheduling method and apparatus
US11877168B2 (en) Radio frame analysis system, radio frame analysis method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20958324; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20958324; Country of ref document: EP; Kind code of ref document: A1)