CN116739111A - Training method, device, equipment and medium for joint learning model - Google Patents

Training method, device, equipment and medium for joint learning model

Info

Publication number
CN116739111A
CN116739111A (application CN202310752325.4A)
Authority
CN
China
Prior art keywords
parameter values, super, group, hyper, parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310752325.4A
Other languages
Chinese (zh)
Inventor
赵凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinao Xinzhi Technology Co ltd
Original Assignee
Xinao Xinzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinao Xinzhi Technology Co ltd filed Critical Xinao Xinzhi Technology Co ltd
Priority to CN202310752325.4A priority Critical patent/CN116739111A/en
Publication of CN116739111A publication Critical patent/CN116739111A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiment of the application provides a training method, a device, equipment and a medium for a joint learning model. In the embodiment of the application, a server updates the probability value corresponding to each group of hyper-parameter values according to a Fedex algorithm and a total loss value; generates a group of candidate hyper-parameter values and the probability value corresponding to the group of candidate hyper-parameter values according to each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values; and replaces the currently stored group of hyper-parameter values having the minimum probability value with the group of candidate hyper-parameter values, thereby realizing the optimization of the hyper-parameter values. In the embodiment of the application, directly and randomly selecting one group of hyper-parameters as the hyper-parameters of the joint learning is avoided; instead, new candidate hyper-parameter values are generated to replace existing hyper-parameter values according to the training effects corresponding to different hyper-parameters, so that the optimal hyper-parameters can be selected and the training effect and the prediction capability of the model are improved.

Description

Training method, device, equipment and medium for joint learning model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for training a joint learning model.
Background
Complex industrial scenarios pose greater challenges than other internet scenarios. In the detection of problems with users' gas flues, variable human and environmental factors create many difficulties for the hyper-parameter tuning of joint learning.
In the related art, a set of hyper-parameter values is randomly selected from a hyper-parameter value database by random sampling, and an untrained model is configured with this set of hyper-parameter values. Moreover, the hyper-parameter values are not adjusted during model training, so the hyper-parameter values of the finally trained model are still the randomly selected values. In joint learning, the setting of the model hyper-parameters directly affects the final training result; randomly selected hyper-parameters may affect both the running time of the model and its prediction capability, resulting in poor quality of the trained model.
Disclosure of Invention
The application provides a method, a device, equipment and a medium for training a joint learning model, which are used for solving the problems in the prior art that randomly selected hyper-parameter values may affect the running time of the model and its prediction capability, resulting in poor quality of the trained model.
The embodiment of the application provides a joint learning model training method which is applied to a server and comprises the following steps:
for each round of training, the following operations are performed:
determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
Further, if the current training is a first-round training, before determining a set of target super-parameter values according to the probability value corresponding to each set of currently stored super-parameter values, the method further includes:
randomly selecting preset number groups of hyper-parameter values from a hyper-parameter value database, and determining the preset number groups of hyper-parameter values as each group of currently stored hyper-parameter values;
and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
Further, updating the probability value corresponding to each set of super-parameter values according to the Fedex algorithm and the total loss value includes:
inputting each group of super-parameter values, probability values corresponding to each group of super-parameter values and the total loss value into a program constructed based on a Fedex algorithm;
and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
Further, the generated model is designed based on a Bayesian algorithm.
Further, replacing the currently stored set of super-parameter values having the smallest probability value with the set of candidate super-parameter values comprises:
sequencing each group of super-parameter values according to the probability value corresponding to each group of super-parameter values, and determining a super-parameter value queue;
deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
Further, the method further comprises:
aiming at each round of training, if the number of training rounds of the joint learning model reaches a preset quantity threshold, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model; or if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
The embodiment of the application also provides a joint learning model training device which is applied to the server and comprises:
the determining module is used for determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and enabling each sub-model to train by adopting the group of target hyper-parameter values;
the processing module is used for determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value;
the super parameter value optimizing module is used for inputting each group of super parameter values and the updated probability value of each group of super parameter values into the generating model to obtain a group of candidate super parameter values output by the generating model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
Further, the determining module is further configured to randomly select a preset number of sets of hyper-parameter values from the hyper-parameter value database if the current training is a first-round training, and determine the preset number of sets of hyper-parameter values as each set of hyper-parameter values currently stored; and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
The embodiment of the application also provides electronic equipment, which comprises a processor, wherein the processor is used for realizing the steps of the joint learning model training method when executing the computer program stored in the memory.
The embodiment of the application also provides a computer readable storage medium, which stores a computer program, the computer program implementing the steps of the joint learning model training method according to any one of the above when being executed by a processor.
In an embodiment of the present application, for each round of training, the server performs the following operations: determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model and the probability values corresponding to the group of candidate hyper-parameter values; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a training process of a joint learning model according to an embodiment of the present application;
FIG. 2 is a Flora algorithm-based super-parameter tuning process provided in the prior art;
FIG. 3 is a schematic diagram of joint learning provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a hyper-parameter value replacement process according to an embodiment of the application;
FIG. 5 is a super parameter tuning process according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training device for a joint learning model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to improve training effect and prediction capability of a joint learning model, the embodiment of the application provides a joint learning model training method, device, equipment and medium.
In an embodiment of the present application, for each round of training, the following operations are performed: determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model and the probability values corresponding to the group of candidate hyper-parameter values; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
Example 1:
fig. 1 is a schematic diagram of a training process of a joint learning model according to an embodiment of the present application, where the process includes:
for each round of training, the following operations are performed:
s101: and determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values.
The joint learning model training method provided by the embodiment of the application is applied to a server.
Joint learning is a machine learning setting in which the sub-models of a plurality of clients jointly train a model under the coordination of the server while keeping the training data decentralized and dispersed. For example, in a user gas flue problem detection scenario, pictures taken one by one by enterprise personnel, whose content is a gas stove and a gas flue, are used; joint learning jointly trains a model by combining related pictures from different areas, and the joint model obtained through training, together with a handheld terminal device, can complete the intelligent real-time detection task of replacing manual inspection with an algorithm.
In the joint learning process, the model needs to be configured with hyper-parameters, and once the hyper-parameters are configured, their values are not modified during training. However, the selection of the hyper-parameter values directly affects the accuracy of the joint model obtained by the joint learning training. Based on this, in order to improve the quality of the optimal hyper-parameter values that are finally determined, in the embodiment of the present application, a plurality of sets of hyper-parameter values and the probability value corresponding to each set of hyper-parameter values are stored in the server. The server may sample according to the probability value corresponding to each set of hyper-parameter values to determine a set of target hyper-parameter values.
The server sends the set of target hyper-parameter values to each sub-model in the joint learning, so that each sub-model configures its local hyper-parameters with the set of target hyper-parameter values and carries out model training after the configuration is completed.
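A minimal sketch of this probability-based sampling and distribution step is given below. The use of Python/NumPy and the names such as sample_target_hyperparams are illustrative assumptions; the application does not prescribe an implementation.

```python
import numpy as np

def sample_target_hyperparams(hyperparam_sets, probabilities, rng=None):
    """Select one set of target hyper-parameter values according to the
    probability value currently stored for each set."""
    rng = rng or np.random.default_rng()
    p = np.asarray(probabilities, dtype=float)
    p = p / p.sum()  # keep a valid probability distribution
    idx = rng.choice(len(hyperparam_sets), p=p)
    return idx, hyperparam_sets[idx]

# Example: three stored sets of (learning_rate, batch_size, local_epochs).
stored_sets = [(0.10, 32, 1), (0.01, 64, 2), (0.001, 128, 4)]
stored_probs = [0.5, 0.3, 0.2]
idx, target_set = sample_target_hyperparams(stored_sets, stored_probs)
# `target_set` is then sent to every client sub-model, which configures its
# local hyper-parameters with it and trains.
```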
S102: determining a total loss value according to each sub-loss value corresponding to each sub-model; and if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value.
In the embodiment of the application, after each sub-model is trained with the set of target hyper-parameter values, each sub-model sends the sub-loss value obtained by training to the server. The server determines a total loss value for the joint learning based on each sub-loss value. If the total loss value is greater than the preset threshold, the probability value corresponding to each group of super-parameter values can be updated according to the Fedex algorithm and the total loss value.
Specifically, in the embodiment of the present application, the server calculates the sum of the sub-loss values and determines this sum as the total loss value. If the server determines that the total loss value is greater than the preset threshold, the server calls a procedure corresponding to the Fedex algorithm, and updates the probability value corresponding to each group of super-parameter values through the Fedex algorithm and the total loss value.
Specifically, the server takes a hyper-parameter value as x and its probability value as y, fits a proxy function (for example, a Gaussian process model) to the (x, y) pairs, evaluates the point set with an acquisition function, and derives a new candidate hyper-parameter value.
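A highly simplified sketch of this proxy-function idea follows. It assumes a one-dimensional hyper-parameter, a Gaussian-process surrogate from scikit-learn and an upper-confidence-bound acquisition rule; these choices and all names are illustrative assumptions, not the implementation prescribed by the application.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_candidate(x_observed, y_observed, search_grid):
    """Fit a Gaussian-process proxy to (x, y) pairs and return the grid point
    with the best acquisition value (predicted mean plus uncertainty bonus)."""
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(np.asarray(x_observed).reshape(-1, 1), np.asarray(y_observed))
    mean, std = gp.predict(np.asarray(search_grid).reshape(-1, 1), return_std=True)
    acquisition = mean + 1.0 * std  # favour high predicted probability and high uncertainty
    return float(np.asarray(search_grid)[int(np.argmax(acquisition))])

# x: a hyper-parameter value (e.g. a learning rate), y: its current probability value.
x_obs = [0.10, 0.01, 0.001]
y_obs = [0.20, 0.55, 0.25]
candidate = propose_candidate(x_obs, y_obs, np.logspace(-4, 0, 50))
```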
S103: inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
In complex scenarios, the joint learning problem is more complex, and time, computation cost and learning performance are all important, so traditional hyper-parameter tuning algorithms are difficult to apply to joint learning, and existing joint learning hyper-parameter tuning algorithms are not perfect. For example, the hyper-parameter tuning algorithm based on the Fedex algorithm selects multiple groups of hyper-parameter values at one time and cannot acquire other groups of hyper-parameter values in the middle of training, so the finally obtained hyper-parameter value configuration is not guaranteed to be optimal; it is only the optimal value among the multiple groups of hyper-parameter values selected at one time. The Flora algorithm in the related art outperforms the Fedex algorithm, but it sacrifices a great deal of time and computation cost in exchange for a lower communication cost, and cannot meet the requirement of hyper-parameter tuning in complex joint learning scenarios.
FIG. 2 shows a process of super-parameter tuning based on the Flora algorithm in the prior art. As shown in FIG. 2, the process includes:
s201: the server initializes N sets of hyper-parameter values and corresponding probability values.
S202: and sampling N groups of super-parameter values according to the probability values.
S203: distributing the sampled set of hyper-parameter values to each sub-model training.
S204: each sub-model uploads the training results to the server.
S205: the server updates and evaluates the super parameter set, if the evaluation passes, the training is ended, and if the evaluation does not pass, the S206 is executed.
S206: and the server updates the probability value corresponding to the super parameter value according to the training result, and executes S202.
Based on this, in the embodiment of the present application, after the server updates the probability value corresponding to each set of hyper-parameter values, the server inputs each set of hyper-parameter values and the updated probability value of each set of hyper-parameter values into the generation model, and the generation model generates a new set of candidate hyper-parameter values according to them. The server replaces the currently stored set of hyper-parameter values having the minimum probability value with the set of candidate hyper-parameter values, and re-executes the above operations according to the probability value corresponding to each set of replaced hyper-parameter values.
FIG. 3 is a schematic diagram of joint learning provided in the embodiment of the present application. As shown in FIG. 3, the server determines a candidate hyper-parameter value by using the Fedex algorithm, and replaces the hyper-parameter value with the minimum probability value by the candidate hyper-parameter value. The server selects one set from the multiple sets of hyper-parameter values as the target hyper-parameter values and distributes them to each sub-model (Client1, Client2, ..., Client n); each sub-model trains on its local samples (Local Training) and sends its training result (Valid) to the server.
In the embodiment of the application, directly and randomly selecting one set of super-parameters as the super-parameters of the joint learning is avoided; instead, the optimal super-parameters are selected according to the training effects corresponding to different super-parameters, so that the training effect and the prediction capability of the model are improved.
Example 2:
in order to select the optimal hyper-parameter value, based on the above embodiment, in the embodiment of the present application, if the current training is the first training, before determining a set of target hyper-parameter values according to the probability value corresponding to each set of currently stored hyper-parameter values, the method further includes:
randomly selecting preset number groups of hyper-parameter values from a hyper-parameter value database, and determining the preset number groups of hyper-parameter values as each group of currently stored hyper-parameter values;
and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
In the embodiment of the application, before training starts, the server randomly selects a preset number of groups of super-parameter values from the super-parameter value database, and determines the preset number of groups of super-parameter values as each group of super-parameter values currently stored.
Specifically, in the embodiment of the present application, if the current training is the first-round training, the server randomly selects a preset number of sets of hyper-parameter values from the hyper-parameter value database. The server acquires the stored initial probability value and determines the initial probability value as the probability value corresponding to each group of super-parameter values. Typically, the sum of the initial probability values of all sets of hyper-parameter values is 1.
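A brief sketch of this initialization step is shown below; representing the database as a plain list and using a uniform initial distribution are assumptions, since the application only states that a pre-stored initial probability value is used and that the values sum to 1.

```python
import random

def initialize_hyperparam_pool(hyperparam_database, preset_number, seed=None):
    """Randomly draw `preset_number` sets from the hyper-parameter value database
    and attach the pre-stored initial probability value to each set."""
    rng = random.Random(seed)
    stored_sets = rng.sample(hyperparam_database, preset_number)
    initial_prob = 1.0 / preset_number   # assumed uniform so the values sum to 1
    stored_probs = [initial_prob] * preset_number
    return stored_sets, stored_probs
```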
Example 3:
in order to update the probability value of each set of super parameter values, in the embodiments of the present application, updating the probability value corresponding to each set of super parameter values according to the Fedex algorithm and the total loss value includes:
inputting each group of super-parameter values, probability values corresponding to each group of super-parameter values and the total loss value into a program constructed based on a Fedex algorithm;
and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
In the embodiment of the application, the server can update the probability value corresponding to each group of super-parameter values based on the program constructed by the Fedex algorithm.
Specifically, a program constructed based on the Fedex algorithm is stored in the server. The server inputs each group of hyper-parameter values, the probability value corresponding to each group of hyper-parameter values and the total loss value into the program, and acquires the updated probability value corresponding to each group of hyper-parameter values output by the program.
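The application treats this program as a black box. One plausible form of what it computes is the exponentiated-gradient style update used by FedEx-type federated hyper-parameter tuning, sketched below; the step size, the baseline and the names are illustrative assumptions rather than the program's actual contents.

```python
import numpy as np

def fedex_style_probability_update(probs, sampled_idx, total_loss,
                                   step_size=0.1, baseline=0.0):
    """Exponentiated-gradient style update: the set that was actually sampled
    receives feedback proportional to (total_loss - baseline), importance-weighted
    by its sampling probability; the distribution is then renormalized."""
    probs = np.asarray(probs, dtype=float)
    grad = np.zeros_like(probs)
    grad[sampled_idx] = (total_loss - baseline) / max(probs[sampled_idx], 1e-12)
    new_probs = probs * np.exp(-step_size * grad)   # higher loss -> lower probability
    return new_probs / new_probs.sum()
```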
In order to improve the training effect and the prediction capability of the joint learning model, the generated model is designed based on a Bayesian algorithm in the embodiment of the application.
In the embodiment of the present application, when the server obtains a new set of candidate hyper-parameter values based on the generated model, the generated model may be a model designed based on a Bayesian algorithm.
In addition, in the embodiment of the application, the generated model can also be a model designed by adopting a particle swarm algorithm, a genetic algorithm or an evolutionary algorithm.
Example 4:
in order to improve the training effect and the prediction capability of the joint learning model, based on the above embodiments, in the embodiment of the present application, replacing the currently stored set of super-parameter values having the smallest probability value with the set of candidate super-parameter values includes:
sequencing each group of super-parameter values according to the probability value corresponding to each group of super-parameter values, and determining a super-parameter value queue;
deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
In the embodiment of the application, when the server replaces the currently stored set of super-parameter values having the smallest probability value, the server sorts each group of super-parameter values according to the probability value corresponding to each group of super-parameter values to determine a super-parameter value queue.
Then, the server deletes the last group of the super parameter values in the super parameter value queue, and adds the candidate super parameter values to the super parameter value queue.
Specifically, in the embodiment of the present application, the server sorts each group of super-parameter values according to the probability value corresponding to each group of super-parameter values and determines a super-parameter value queue. The server keeps the probability value corresponding to each position in the super-parameter value queue, adds the candidate super-parameter values at the first position, and determines the probability value corresponding to the first position as the probability value corresponding to the candidate super-parameter values. The server deletes the last group of super-parameter values in the super-parameter value queue and shifts the other super-parameter values in the queue back by one position in order.
FIG. 4 is a schematic diagram of a process for replacing a super-parameter value according to an embodiment of the present application. As shown in FIG. 4, the server sorts each group of super-parameter values according to the probability value corresponding to each group of super-parameter values to determine a super-parameter value queue, adds the candidate super-parameter values at the first position, shifts the other super-parameter values in the queue back by one position in order, and deletes the last group of super-parameter values in the queue.
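The queue manipulation described above can be sketched as follows. Keeping the probability value attached to each queue position is taken from the description of FIG. 4; the data structures and names are assumptions.

```python
def replace_lowest_probability_set(hyperparam_sets, probs, candidate_set):
    """Sort the stored sets by probability (descending), insert the candidate set
    at the head, drop the last (lowest-probability) set, and keep the probability
    value attached to each queue position unchanged."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    queue = [hyperparam_sets[i] for i in order]
    position_probs = sorted(probs, reverse=True)   # probability attached to each position
    queue = [candidate_set] + queue[:-1]           # head insertion, tail deletion
    return queue, position_probs

sets = [(0.10, 32, 1), (0.01, 64, 2), (0.001, 128, 4)]
probs = [0.2, 0.5, 0.3]
new_sets, new_probs = replace_lowest_probability_set(sets, probs, (0.005, 64, 3))
# new_sets  == [(0.005, 64, 3), (0.01, 64, 2), (0.001, 128, 4)]
# new_probs == [0.5, 0.3, 0.2]
```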
Example 5:
on the basis of the foregoing embodiments, in an embodiment of the present application, the method further includes:
aiming at each round of training, if the number of training rounds of the joint learning model reaches a preset quantity threshold, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model; or if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
In the embodiment of the application, when the joint learning model is trained, aiming at each training round, if the training round number of the joint learning model reaches a preset quantity threshold value, determining that the training of the joint learning model is completed, and determining the set of target hyper-parameter values as the optimal hyper-parameter values of the joint learning model.
Or, in the embodiment of the application, for each round of training, if the total loss value corresponding to the round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter value of the joint learning model.
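A compact sketch of the two stopping conditions described above follows; whether the count of below-threshold rounds must be consecutive is not stated in the application, so the reset on a high-loss round is an assumption.

```python
def training_finished(round_idx, max_rounds,
                      total_loss, loss_threshold,
                      below_threshold_count, required_count):
    """Return (finished, updated_below_threshold_count) for one training round."""
    if round_idx >= max_rounds:                      # preset quantity threshold reached
        return True, below_threshold_count
    if total_loss <= loss_threshold:
        below_threshold_count += 1                   # another round at/below the threshold
        if below_threshold_count >= required_count:  # preset number-of-times threshold reached
            return True, below_threshold_count
    else:
        below_threshold_count = 0                    # assumed: the count is consecutive
    return False, below_threshold_count
```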
Fig. 5 is a process of super parameter tuning provided in an embodiment of the present application, as shown in fig. 5, where the process includes:
s501: the server initializes N sets of hyper-parameter values and corresponding probability values.
S502: and sampling N groups of super-parameter values according to the probability values.
S503: distributing the sampled set of hyper-parameter values to each sub-model training.
S504: each sub-model uploads the training results to the server.
S505: the server updates and evaluates the super parameter set, if the evaluation passes, the training is ended, and if the evaluation does not pass, the step S506 is executed.
S506: and the server updates the probability value corresponding to the super-parameter value according to the training result.
S507: the server determines candidate hyper-parameter values according to the generated model, updates the N groups of hyper-parameter values by using the candidate hyper-parameter values, and executes S502 according to the generated model.
Example 6:
fig. 6 is a schematic structural diagram of a training device for a joint learning model according to an embodiment of the present application, where the device includes:
the determining module 601 is configured to determine a set of target hyper-parameter values according to the probability value corresponding to each set of currently stored hyper-parameter values, and enable each sub-model to train with the set of target hyper-parameter values;
a processing module 603, configured to determine a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value;
the super parameter value optimizing module 602 is configured to input each set of super parameter values and the probability value updated by each set of super parameter values into a generating model, and obtain a set of candidate super parameter values output by the generating model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
In a possible implementation manner, the determining module 601 is further configured to randomly select a preset number of sets of hyper-parameter values from a hyper-parameter value database if the current training is a first-round training, and determine the preset number of sets of hyper-parameter values as each set of hyper-parameter values currently stored; and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
In a possible implementation manner, the processing module 603 is specifically configured to input the each set of hyper-parameter values, the probability value corresponding to each set of hyper-parameter values, and the total loss value into a program constructed based on a Fedex algorithm; and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
In one possible implementation, the generated model is a model designed based on a Bayesian algorithm.
In a possible implementation manner, the super-parameter value optimizing module 602 is configured to sort the super-parameter values of each group according to the probability value corresponding to each super-parameter value of each group, and determine a super-parameter value queue; deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
In a possible implementation manner, the processing module 603 is further configured to, for each round of training, determine that training of the joint learning model is completed if the number of training rounds of the joint learning model reaches a preset number threshold, and determine the set of target hyper-parameter values as the optimal hyper-parameter values of the joint learning model; or, if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determine that the training of the joint learning model is completed, and determine the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
Example 7:
on the basis of the foregoing embodiments, an electronic device is further provided in the embodiments of the present application, and fig. 7 is a schematic structural diagram of an electronic device provided in the embodiments of the present application, as shown in fig. 7, including: the processor 71, the communication interface 72, the memory 73 and the communication bus 74, wherein the processor 71, the communication interface 72 and the memory 73 complete communication with each other through the communication bus 74;
the memory 73 has stored therein a computer program which, when executed by the processor 71, causes the processor 71 to perform the steps of:
for each round of training, the following operations are performed:
determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
In one possible implementation, the processor is further configured to:
randomly selecting preset number groups of hyper-parameter values from a hyper-parameter value database, and determining the preset number groups of hyper-parameter values as each group of currently stored hyper-parameter values;
and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
In one possible implementation, the processor is further configured to:
inputting each group of super-parameter values, probability values corresponding to each group of super-parameter values and the total loss value into a program constructed based on a Fedex algorithm;
and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
In one possible implementation, the processor is further configured to:
sequencing each group of super-parameter values according to the probability value corresponding to each group of super-parameter values, and determining a super-parameter value queue;
deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
In one possible implementation, the processor is further configured to:
aiming at each round of training, if the number of training rounds of the joint learning model reaches a preset quantity threshold, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model; or if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
Because the principle of the electronic device for solving the problem is similar to that of the joint learning model training method, the implementation of the electronic device can refer to the embodiment of the method, and the repetition is omitted.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus. The communication interface 72 is used for communication between the above-described electronic device and other devices. The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
Example 8:
on the basis of the above embodiments, the embodiments of the present application further provide a computer readable storage medium, in which a computer program executable by a processor is stored, which when executed on the processor causes the processor to implement the steps of:
for each round of training, the following operations are performed:
determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
In a possible implementation manner, if the current training is the first training, before determining a set of target hyper-parameter values according to the probability value corresponding to each set of currently stored hyper-parameter values, the method further includes:
randomly selecting preset number groups of hyper-parameter values from a hyper-parameter value database, and determining the preset number groups of hyper-parameter values as each group of currently stored hyper-parameter values;
and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
In one possible implementation manner, the updating the probability value corresponding to each set of super-parameter values according to the Fedex algorithm and the total loss value includes:
inputting each group of super-parameter values, probability values corresponding to each group of super-parameter values and the total loss value into a program constructed based on a Fedex algorithm;
and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
In one possible implementation, the generated model is a model designed based on a Bayesian algorithm.
In one possible implementation, replacing the currently stored set of hyper-parameter values having the smallest probability value with the set of candidate hyper-parameter values includes:
sequencing each group of super-parameter values according to the probability value corresponding to each group of super-parameter values, and determining a super-parameter value queue;
deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
In one possible embodiment, the method further comprises:
aiming at each round of training, if the number of training rounds of the joint learning model reaches a preset quantity threshold, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model; or if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
Since the principle by which the computer readable storage medium solves the problem is similar to that of the joint learning model training method, the implementation of the computer readable storage medium can refer to the embodiment of the method, and repeated descriptions are omitted.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A joint learning model training method applied to a server, the method comprising:
for each round of training, the following operations are performed:
determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and training each sub-model by adopting the group of target hyper-parameter values; determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value; inputting each group of hyper-parameter values and the updated probability value of each group of hyper-parameter values into a generation model, and obtaining a group of candidate hyper-parameter values output by the generation model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
2. The method of claim 1, wherein if the current training is a first round of training, before determining a set of target hyper-parameter values according to the probability value corresponding to each set of currently stored hyper-parameter values, the method further comprises:
randomly selecting preset number groups of hyper-parameter values from a hyper-parameter value database, and determining the preset number groups of hyper-parameter values as each group of currently stored hyper-parameter values;
and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
3. The method of claim 1, wherein updating the probability value corresponding to each set of super-parameter values according to the Fedex algorithm and the total loss value comprises:
inputting each group of super-parameter values, probability values corresponding to each group of super-parameter values and the total loss value into a program constructed based on a Fedex algorithm;
and acquiring updated probability values corresponding to each group of hyper-parameter values output by the program.
4. The method according to claim 1, wherein the generated model is a model designed based on a Bayesian algorithm.
5. The method of claim 1, wherein replacing the currently stored set of hyper-parameter values having the smallest probability value with the set of candidate hyper-parameter values comprises:
sequencing each group of super-parameter values according to the probability value corresponding to each group of super-parameter values, and determining a super-parameter value queue;
deleting the last group of the super parameter values in the super parameter value queue, and adding the candidate super parameter values into the super parameter value queue.
6. The method according to claim 1, wherein the method further comprises:
aiming at each round of training, if the number of training rounds of the joint learning model reaches a preset quantity threshold, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model; or if the total loss value corresponding to this round of training does not exceed the preset threshold value and the number of times that the total loss value does not exceed the preset threshold value reaches the preset number of times threshold value, determining that the training of the joint learning model is completed, and determining the set of target super-parameter values as the optimal super-parameter values of the joint learning model.
7. A joint learning model training apparatus for use with a server, the apparatus comprising:
the determining module is used for determining a group of target hyper-parameter values according to the probability value corresponding to each group of currently stored hyper-parameter values, and enabling each sub-model to train by adopting the group of target hyper-parameter values;
the processing module is used for determining a total loss value according to each sub-loss value corresponding to each sub-model; if the total loss value is larger than a preset threshold value, updating the probability value corresponding to each group of super-parameter values according to a Fedex algorithm and the total loss value;
the super parameter value optimizing module is used for inputting each group of super parameter values and the updated probability value of each group of super parameter values into the generating model to obtain a group of candidate super parameter values output by the generating model; replacing a group of super-parameter values with the minimum probability value stored currently by adopting the group of candidate super-parameter values; and re-executing the operation according to the probability value corresponding to each group of the replaced currently stored super-parameter values.
8. The apparatus of claim 7, wherein the determining module is further configured to randomly select a preset number of sets of hyper-parameter values from a hyper-parameter value database if the current training is a first round of training, and determine the preset number of sets of hyper-parameter values as each set of hyper-parameter values currently stored; and acquiring a pre-stored initial probability value, and determining the initial probability value as a probability value corresponding to each group of super-parameter values.
9. An electronic device comprising a processor for implementing the steps of the joint learning model training method of any of claims 1-6 when executing a computer program stored in a memory.
10. A computer readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the joint learning model training method according to any one of claims 1-6.
CN202310752325.4A 2023-06-25 2023-06-25 Training method, device, equipment and medium for joint learning model Pending CN116739111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310752325.4A CN116739111A (en) 2023-06-25 2023-06-25 Training method, device, equipment and medium for joint learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310752325.4A CN116739111A (en) 2023-06-25 2023-06-25 Training method, device, equipment and medium for joint learning model

Publications (1)

Publication Number Publication Date
CN116739111A 2023-09-12

Family

ID=87902593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310752325.4A Pending CN116739111A (en) 2023-06-25 2023-06-25 Training method, device, equipment and medium for joint learning model

Country Status (1)

Country Link
CN (1) CN116739111A (en)

Similar Documents

Publication Publication Date Title
CN110807109A (en) Data enhancement strategy generation method, data enhancement method and device
CN108564326B (en) Order prediction method and device, computer readable medium and logistics system
CN110619423A (en) Multitask prediction method and device, electronic equipment and storage medium
WO2016107426A1 (en) Systems and methods to adaptively select execution modes
CN111340233B (en) Training method and device of machine learning model, and sample processing method and device
CN112990478B (en) Federal learning data processing system
CN110826725A (en) Intelligent agent reinforcement learning method, device and system based on cognition, computer equipment and storage medium
CN111222553B (en) Training data processing method and device of machine learning model and computer equipment
CN114428748B (en) Simulation test method and system for real service scene
CN115730947A (en) Bank customer loss prediction method and device
CN107515876B (en) Feature model generation and application method and device
CN114970345A (en) Short-term load prediction model construction method, device, equipment and readable storage medium
CN114547917A (en) Simulation prediction method, device, equipment and storage medium
CN109190757B (en) Task processing method, device, equipment and computer readable storage medium
CN110909888A (en) Method, device and equipment for constructing generic decision tree and readable storage medium
CN112437022B (en) Network traffic identification method, device and computer storage medium
CN114116995A (en) Session recommendation method, system and medium based on enhanced graph neural network
CN107979606B (en) Self-adaptive distributed intelligent decision-making method
CN116739111A (en) Training method, device, equipment and medium for joint learning model
CN116756536A (en) Data identification method, model training method, device, equipment and storage medium
CN114692888A (en) System parameter processing method, device, equipment and storage medium
CN116540546A (en) Recommendation method, system, equipment and medium for control parameters of process control system
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
CN114978765A (en) Big data processing method serving information attack defense and AI attack defense system
CN114443970A (en) Artificial intelligence and big data based digital content pushing method and AI system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination