CN112070205A - Multi-loss model obtaining method and device - Google Patents

Multi-loss model obtaining method and device

Info

Publication number
CN112070205A
CN112070205A (application CN202010754299.5A)
Authority
CN
China
Prior art keywords
loss
weight
sampling
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010754299.5A
Other languages
Chinese (zh)
Inventor
徐航
张耕维
李震国
梁小丹
黎嘉伟
陈翼翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010754299.5A priority Critical patent/CN112070205A/en
Publication of CN112070205A publication Critical patent/CN112070205A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the application relates to the field of artificial intelligence and provides a method and a device for obtaining a multi-loss model, which are used for dynamically adjusting the loss weighting values during multi-loss model training so that the multi-loss model obtains a better training result. During multi-loss model training, a first loss value output by the sub-network in a first iteration period is obtained; alternative weights are then generated according to the first loss value; the first loss value and the alternative weights are input into a weight prediction model, which outputs a first weight parameter of the sub-network, where the first weight parameter is used for training in a second iteration period, the second iteration period being the next iteration period after the first iteration period; finally, the parameters of the multi-loss model are updated according to the first weight parameter.

Description

Multi-loss model obtaining method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to a multi-loss model obtaining method and device.
Background
Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
Various functions of AI, such as natural language processing, image processing and speech processing, are currently typically implemented through neural networks. With the development of AI technology, the functions of AI have gradually diversified. Taking the panorama segmentation task as an example, it includes multiple subtasks such as instance segmentation, classification, regression, instance mask prediction, and semantic segmentation, and each subtask has its own optimization objective. In the model training process corresponding to the panorama segmentation task, each subtask corresponds to a sub-network during optimization, each sub-network corresponds to a plurality of loss values, and the neural network parameters are then optimized through a back-propagation algorithm according to the loss values; the finally output model is therefore also called a multi-loss model. In the training of a multi-loss model, the competitive balance between the sub-networks determines the final training effect of the multi-loss model. For example, in the neural network training corresponding to the panorama segmentation task, the detection task and the segmentation task compete with each other, which may result in a poor effect for a certain subtask; the training trade-off among multiple tasks therefore needs to be optimized automatically. Meanwhile, in a multi-task application scenario, each sub-network is highly sensitive to the loss weighting values.
Existing methods generally use sampling or similar techniques to search for and optimize hyper-parameters, but cannot optimize the loss weighting values in a targeted manner. Therefore, how to better optimize the loss weighting values during multi-loss model training is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a method and a device for obtaining a multi-loss model, which are used for dynamically adjusting the loss weighting values during multi-loss model training, so that the multi-loss model obtains a better training result.
In a first aspect, an embodiment of the present application provides a multi-loss model obtaining method, which is applied to a scenario of obtaining a multi-loss model including at least one sub-network (i.e., a sub-task). In the multi-loss model training process, in a first iteration period, the multi-loss model obtaining device obtains a first loss value output by the sub-network; the multi-loss model obtaining device then generates candidate weights according to the first loss value; next, the multi-loss model obtaining device inputs the first loss value and the candidate weights into a weight prediction model and outputs a first weight parameter of the sub-network, where the first weight parameter is used for training in a second iteration period, the second iteration period being the next iteration period after the first iteration period; finally, the multi-loss model obtaining device updates the parameters of the multi-loss model according to the first weight parameter.
It is to be understood that the above describes only one iteration process in the training process, and the first iteration cycle and the second iteration cycle are also only used for distinguishing the iteration cycles and are not used for limiting a specific iteration cycle in the training process. In a specific application, the multi-loss model training process has a plurality of the above iterative processes. After the training iteration cycle of the multi-loss model is completed (i.e., the search process corresponding to the model is completed), the multi-loss model obtaining device outputs the trained multi-loss model.
In this embodiment, during the training of the multi-loss model, optimized weight values are updated for the plurality of loss values in each iteration period; at the same time, the effectiveness of the weights is explored by exploiting parallel computation, and well-performing network parameters and weights are inherited during training, so that an optimal trained model is obtained within a single training run.
Optionally, the specific operation of the multi-loss model obtaining device in generating the candidate weights according to the first loss value is as follows: after obtaining the first loss value, the multi-loss model obtaining device determines a sampling space of the first loss value, and then samples in the sampling space to obtain the candidate weights.
In this embodiment, the sampling space includes, but is not limited to, a log-normal distribution of the first loss values or a log-uniform distribution of the first loss values.
When the sampling space is the lognormal distribution of the first loss values, the multi-loss model obtaining device determines the lognormal distribution as the sampling space; it then acquires the sampling variance of the sampling space and calculates the sampling mean of the sampling space, where the sampling variance is a preset parameter; finally, it samples in the sampling space according to the sampling mean and the sampling variance to obtain the candidate weights. In this embodiment, the candidate weights are generated from a sampling space and a sampling mean derived from the first loss values, so that the loss values and the weights are associated with each other, yielding a targeted, high-quality weighting scheme. Meanwhile, because a lognormal distribution is adopted, the candidate weights obtained by sampling are guaranteed to be greater than zero while the good generalization of the lognormal distribution is retained.
Alternatively, the number of candidate weight groups may be determined according to the number of first loss values (which is also equal to the number of sub-networks), i.e., the number of candidate weight groups is 2 to the power of N, where N is the number of first loss values. When the number of first loss values is small, the value of N may be set in this embodiment to ensure that enough candidate weights can be selected. By selecting sufficient candidate weights, this embodiment achieves the purpose of selecting high-quality weight parameters.
Optionally, the multi-loss model obtaining apparatus may also update the sampling mean in real time during the training process. The specific operation is as follows: the multi-loss model obtaining device calculates the sampling mean by using a first formula and the first loss values, wherein the first formula is as follows:
μ = (1/n) Σ_{i=1}^{n} c_i
wherein μ is used to represent the sampling mean, c is used to represent the first loss values, i is used to represent the index of a first loss value, and n is used to represent the number of the first loss values. In this embodiment, the sampling mean is updated according to the loss values, so that the loss values and the weights can be associated more closely in real time, implementing a higher-quality weighting scheme.
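As an illustration only (not part of the claimed method), the following minimal sketch, assuming Python with NumPy and hypothetical variable names, shows one way a sampling mean derived from the first loss values and a preset sampling variance could be used to draw candidate weight groups from a lognormal sampling space:

```python
import numpy as np

def sample_candidate_weights(loss_values, sampling_sigma=0.5, rng=None):
    """Draw 2**N candidate weight groups from a lognormal sampling space whose
    location is derived from the first loss values (N = number of loss values)."""
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(loss_values, dtype=float)
    n = losses.size
    mu = losses.mean()              # sampling mean: (1/n) * sum_i c_i
    num_groups = 2 ** n             # number of candidate weight groups
    # Lognormal sampling keeps every candidate weight strictly greater than zero.
    # Using log(mu) as the location parameter is an assumption of this sketch.
    return rng.lognormal(mean=np.log(mu), sigma=sampling_sigma, size=(num_groups, n))

# Two first loss values -> 4 candidate weight groups, e.g. (A1, B1) ... (A4, B4).
candidates = sample_candidate_weights([0.8, 1.6])
print(candidates.shape)             # (4, 2)
```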
Optionally, after the multi-loss model obtaining device generates the candidate weights, the candidate weights may be further filtered to obtain the weight parameters used by the sub-network for training in the second iteration cycle. The specific operation is as follows: the multi-loss model obtaining device inputs the first loss values and the candidate weights into the weight prediction model to calculate the expected score corresponding to each candidate weight, where the expected score is used to evaluate the training effect of the candidate weight; the expected scores are then sorted in descending order, and the top K candidate weights are output as the first weight parameters corresponding to the sub-networks, where K is the number of sub-networks. In this embodiment, better weight parameters are selected by predicting the training effect, so that the sampling efficiency and the quality of the weight parameters can be improved.
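Purely as an illustrative sketch (assuming Python with NumPy; predict_score is a hypothetical stand-in for the weight prediction model and is not the claimed model itself), the expected-score ranking and top-K selection described above could look as follows:

```python
import numpy as np

def select_top_k(loss_values, candidates, predict_score, k):
    """Score each candidate weight group with the weight prediction model and
    return the k groups with the highest expected score, in descending order."""
    scores = np.array([predict_score(loss_values, w) for w in candidates])
    order = np.argsort(scores)[::-1]        # descending expected scores
    return candidates[order[:k]], scores[order[:k]]

# Dummy scorer standing in for the weight prediction model (illustration only).
predict_score = lambda losses, w: -abs(float(np.dot(losses, w)) - 1.0)
losses = np.array([0.8, 1.6])
candidates = np.array([[1.0, 0.5], [0.5, 0.25], [2.0, 1.0], [0.9, 0.4]])
best_weights, best_scores = select_top_k(losses, candidates, predict_score, k=1)
```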
Optionally, the weight prediction model may further be updated according to historical training information, where the historical training information includes the loss values, weight parameters and training effect corresponding to the first iteration cycle, and the loss values, weight parameters and training effects corresponding to the iteration cycles before the first iteration cycle. In this embodiment, the weight prediction model is built using historical information from the training process as training data, which can effectively improve the sampling efficiency and the quality of the generated weights.
Optionally, the multi-loss model obtaining device may further perform initial training with a small number of training samples at the initial stage of training to obtain initialization training parameters, where the initialization training parameters include initialization weights, a sampling variance, and initial parameters of the weight prediction model; training of the multi-loss model is then started according to the initialization training parameters. It can be understood that, in this embodiment, the multi-loss model obtaining apparatus may perform initialization training with a small number of samples, or the user may directly set the corresponding initialization data in the initialization stage; the specific manner is not limited herein.
In a second aspect, an embodiment of the present application provides a multi-loss model obtaining apparatus, where the apparatus includes at least one functional module configured to implement the method described in the first aspect or any one of the possible implementations of the first aspect, and the functional modules may communicate with each other to implement the method steps.
In a third aspect, an embodiment of the present application provides a multiple loss model obtaining apparatus, including a memory and a processor; the memory stores programmable instructions; the processor is configured to invoke the programmable instructions to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a multiple loss model obtaining apparatus, including a processor; the processor is configured to be coupled to the memory, read instructions in the memory, and execute the method described in the first aspect or any one of the possible implementation manners of the first aspect according to the instructions. Optionally, the memory is an internal memory of the apparatus, and may also be an external memory of the apparatus. Alternatively, the apparatus may be a dedicated chip for training the multi-loss model, and the memory may be integrated in the dedicated chip or may be independent from the dedicated chip.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method described in the first aspect or any one of the possible implementation manners of the first aspect.
In a sixth aspect, the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
In a seventh aspect, an embodiment of the present application provides a multiple loss model obtaining apparatus, where the apparatus includes: an obtaining module, configured to obtain a first loss value corresponding to the sub-network, where the first loss value is a loss value output by a first iteration cycle in the multi-task training scenario; a generating module, configured to generate an alternative weight according to the first loss value; a screening module, configured to input the first loss value and the candidate weights into a weight prediction model, and output a first weight parameter of the subnetwork, where the first weight parameter is used for training a second iteration cycle, and the second iteration cycle is a next iteration cycle of the first iteration cycle; and the first updating module is used for updating the parameters of the multi-loss model according to the first weight parameters.
Optionally, the generating module is specifically configured to determine a sampling space of the first loss value; sampling in the sampling space to obtain the alternative weights.
Optionally, the sampling space includes a lognormal distribution of the first loss values or a log-uniform distribution of the first loss values.
Optionally, when the sampling space is lognormal distribution of the first loss value, the generating module is specifically configured to obtain a sampling variance of the sampling space, and calculate a sampling mean of the sampling space, where the sampling variance is a preset parameter; sampling in the sampling space according to the sampling mean and the sampling variance to obtain the alternative weights.
Optionally, the generating module is specifically configured to calculate the sampling mean value by using a first formula and the first loss value; wherein the first formula is:
μ = (1/n) Σ_{i=1}^{n} c_i
wherein μ is used to represent the sampling mean, c is used to represent the first loss values, i is used to represent the index of a first loss value, and n is used to represent the number of the first loss values.
Optionally, the screening module is specifically configured to input the first loss value and the candidate weights into the weight prediction model to calculate the expected score corresponding to each candidate weight, where the expected score is used to evaluate the training effect of the candidate weight; and to sort the expected scores in descending order and select the top K candidate weights for output as the first weight parameter, where K is the number of the sub-networks.
Optionally, the apparatus further comprises: the initialization module is used for acquiring initialization training parameters, wherein the initialization training parameters comprise initialization weights, sampling variances and initial parameters of the weight prediction model; and starting the multi-loss model training according to the initialized training parameters.
Optionally, the apparatus further comprises: and the second updating module is used for updating the weight prediction model according to historical training information, wherein the historical training information comprises the first iteration period, loss values before the first iteration period, weight parameters corresponding to the loss values and training effects.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence agent framework provided by an embodiment of the present application;
fig. 2 is a schematic diagram of an application environment according to an embodiment of the present application;
FIG. 3 is a diagram of a system architecture according to an embodiment of the present application;
FIG. 4 is a diagram of a training system architecture provided in accordance with an embodiment of the present application;
FIG. 5 is a diagram illustrating an embodiment of a multi-loss model obtaining method according to an embodiment of the present application;
FIG. 6 is a schematic workflow diagram of a multi-loss model obtaining method in an embodiment of the present application;
FIG. 7 is a graph comparing experimental results of a multiple loss model in the examples of the present application;
FIG. 8 is a graph comparing results of another experiment of the multiple loss model in the example of the present application;
FIG. 9 is a graph comparing results of another experiment of the multiple loss model in the example of the present application;
FIG. 10 is a graph comparing results of another experiment of the multiple loss model in the example of the present application;
FIG. 11 is a comparison graph of the visualization effect of the multi-loss model in the embodiment of the present application;
FIG. 12 is a schematic diagram of an embodiment of a multi-loss model obtaining apparatus in the embodiment of the present application;
fig. 13 is a schematic diagram of another embodiment of a multiple-loss model obtaining apparatus in an embodiment of the present application;
FIG. 14 is a system block diagram of a multi-loss model acquisition apparatus according to an embodiment of the present application;
fig. 15 is a schematic diagram of another embodiment of the multiple loss model obtaining apparatus in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application are described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments. As can be known to those skilled in the art, with the advent of new application scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of the steps appearing in the present application does not mean that the steps in the method flow have to be executed in the chronological/logical order indicated by the naming or numbering, and the named or numbered process steps may be executed in a modified order depending on the technical purpose to be achieved, as long as the same or similar technical effects are achieved. The division of the units presented in this application is a logical division, and in practical applications, there may be another division, for example, multiple units may be combined or integrated into another system, or some features may be omitted, or not executed, and in addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, and the indirect coupling or communication connection between the units may be in an electrical or other similar form, which is not limited in this application. Furthermore, the units or sub-units described as the separate parts may or may not be physically separate, may or may not be physical units, or may be distributed in a plurality of circuit units, and some or all of the units may be selected according to actual needs to achieve the purpose of the present disclosure.
For a better understanding of the embodiments of the present application, the concepts involved are first explained below:
hyper-parametric optimization (HPO): the method is characterized in that an algorithm is used for optimizing the hyper-parameters in the existing model or algorithm. The hyper-parameters are generally set models or algorithm settings, and when a hyper-parameter optimization algorithm is used, appropriate hyper-parameters can be given for a specific model without manual intervention, so that the performance of the model is exerted to the maximum. Hyper-parameter optimization is an important component of automated machine learning.
Automatic machine learning (AutoML): refers to designing a series of advanced control systems to operate machine learning models so that the models can automatically learn appropriate parameters and configurations without human intervention. In deep-neural-network-based learning models, automatic machine learning mainly comprises network architecture search and global parameter setting. Network architecture search is a recent research hotspot; it enables a computer to generate the neural network architecture best suited to the problem according to the data, and is characterized by high training complexity and large performance gains.
Panorama Segmentation (Panoptic Segmentation): panorama segmentation is a recently proposed image parsing task. The task combines two core computer vision tasks, Instance Segmentation and Semantic Segmentation, and requires an algorithm to address the pain points of both tasks simultaneously.
Loss value (Loss): for neural network training, the loss values between the network outputs and the labels are typically calculated, and the neural network parameters are then optimized by a back-propagation algorithm. A model for a task such as panorama segmentation generally includes the losses of a plurality of subtasks; the final loss value of the network is obtained through weighted summation, and network training is generally sensitive to the weighting values (a brief illustrative sketch follows these definitions).
Multilayer Perceptron (Multi-Layer Perceptron, MLP): a simple artificial neural network, generally comprising an input layer, a hidden layer and an output layer. The input layer is usually represented by a vector, and the network parameters are optimized by a back-propagation algorithm.
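To make the Loss and MLP concepts above concrete, here is a minimal, purely illustrative sketch assuming PyTorch; the toy network, its dimensions and the weighting values are hypothetical and are not taken from this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy network with two sub-task heads on a shared trunk.
trunk = nn.Linear(8, 16)
head_reg = nn.Linear(16, 1)   # e.g. a regression sub-task
head_cls = nn.Linear(16, 3)   # e.g. a classification sub-task

x = torch.randn(4, 8)
y_reg = torch.randn(4, 1)
y_cls = torch.randint(0, 3, (4,))

feat = torch.relu(trunk(x))
loss_reg = F.mse_loss(head_reg(feat), y_reg)
loss_cls = F.cross_entropy(head_cls(feat), y_cls)

# Final loss of the network: weighted summation of the sub-task losses.
w_reg, w_cls = 1.0, 0.5
total_loss = w_reg * loss_reg + w_cls * loss_cls
total_loss.backward()         # back-propagation optimizes the network parameters

# A multilayer perceptron (MLP): input layer, one hidden layer, output layer.
mlp = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
```

Changing w_reg and w_cls changes which sub-task dominates the gradients, which is the sensitivity that the embodiments of the application aim to handle automatically.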
FIG. 1 shows a schematic diagram of an artificial intelligence body framework that describes the overall workflow of an artificial intelligence system, applicable to the general artificial intelligence field requirements.
The artificial intelligence topic framework described above is set forth below in terms of two dimensions, the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes starting from data acquisition, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies for providing and processing it) up to the industrial ecology of the system.
(1) Infrastructure:
the infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent medical treatment, intelligent security, automatic driving, safe city, intelligent terminals, and the like.
The multi-loss model provided by the embodiment of the application can be applied to various multi-task scenarios. In one exemplary scenario, the multi-task scenario may be a computer vision scenario, including the panorama segmentation scenario in the unmanned-vehicle perception system shown in fig. 2. In the panorama segmentation task, the network structure may include classification, regression and instance mask losses in each of 3 stages, plus 1 semantic segmentation loss, so that 10 losses in total need to be optimized and tuned simultaneously. That is, in a panorama segmentation scenario, instance segmentation is performed on each target object in the acquired picture to obtain an instance mask for each object (for example, each car, person, and the like), and at the same time the neural network needs to output a semantic segmentation result for the background (such as sky and ground). As shown in fig. 2, (a) is a scene picture collected by the unmanned-vehicle perception system, and (b) is the picture output after panorama segmentation.
In this embodiment, a computer vision scenario is taken as an example for explanation. As shown in fig. 3, an embodiment of the present application provides a system architecture 300. The data collection device 360 is configured to obtain sample data and the loss values generated by training and store them in the database 330, and the training device 320 generates the target model/rule 301 based on the sample data maintained in the database 330 and the loss values generated by training. The following describes in more detail how the training device 320 obtains the target model/rule 301 based on the sample data and the loss values generated by training: the target model/rule 301 can adaptively adjust the weight parameters corresponding to the loss values, and during training the effectiveness of the weights is explored by exploiting parallel computation and well-performing network parameters and weights are inherited, thereby achieving an optimal trained model within a single training run.
As shown in fig. 4, the embodiment of the present application provides a training system 400 for a multi-loss model, which includes a weight generation module 401, a search module 402, and a weight prediction updating module 403. The weight generation module 401 includes weight sampling 4011 and weight prediction 4012. The search module 402 outputs the loss values of each iteration period to the weight generation module 401; the weight sampling 4011 of the weight generation module 401 then determines a sampling space and a sampling mean from the loss values, and samples in the sampling space according to the sampling mean and a preset sampling variance to generate candidate weights. Next, the weight prediction 4012 predicts the expected score corresponding to each candidate weight, and the candidate weights with the top-ranked expected scores are selected as the weight parameters for the next stage. The search module 402 adds the weight parameters output by the weight prediction 4012 to the training process, and the search module 402 trains at least one sub-network. It is understood that the training process of the search module 402 further involves other network parameters, which are not limited herein. The above process is repeated to realize iterative training. Meanwhile, the search module 402 uses the loss values output by each training iteration, the weight parameters corresponding to the loss values, and the training effect as the input of the weight prediction updating module 403, which updates the model corresponding to the weight prediction 4012 through an MLP, thereby improving its prediction capability.
An embodiment of the present application provides a method for acquiring a multiple loss model, specifically referring to fig. 5, the method for acquiring a multiple loss model includes the following steps:
501. A first loss value corresponding to the sub-network is acquired, wherein the first loss value is the loss value output in the first iteration cycle.
In the model training process, the multi-loss model obtaining device obtains a first loss value output by the training of the at least one sub-network in a first iteration period. For example, assume that the training process of the multi-loss model includes two sub-networks, each sub-network corresponds to 10 loss values, the total number of iterations is 2000, and the first iteration period is the 100th iteration; the multi-loss model obtaining device then obtains the 20 loss values output at the 100th iteration (i.e., these 20 loss values serve as the first loss values).
502. An alternative weight is generated from the first loss value.
In this embodiment, the specific operation of the multi-loss model obtaining apparatus to generate the candidate weight according to the first loss value is as follows: the multi-loss model obtaining device determines a sampling space corresponding to the first loss value, and then samples in the sampling space to obtain the candidate weight.
It is understood that, in the present embodiment, the sampling space corresponding to the first loss value includes, but is not limited to, a lognormal distribution of the first loss value or a logarithmically uniform distribution of the first loss value. Meanwhile, the parameters sampled by the multi-loss model acquisition device are different for different sampling spaces.
When the sampling space is lognormal distribution of the first loss value, the specific operation of the multi-loss model obtaining device for generating the alternative weight according to the first loss value is as follows:
the multi-loss model obtaining device determines the log-normal distribution of the first loss value; acquiring a sampling variance, and calculating the sampling mean value, wherein the sampling variance is a parameter preset by a user according to an empirical value in an initialization stage; then, taking a numerical value set in the lognormal distribution of the first loss value as a sampling space; and finally, adopting and selecting the alternative weight in the sampling space according to the sampling variance and the sampling mean. It is understood that in order to ensure enough candidate weights, the candidate weights may be selected according to the number of the first loss values, and the number of the candidate weights may be set to be an N-th power group of 2 in the embodiment, where N is the number of the first loss values. For example, if the training process of the multi-loss model includes 2 loss values, the number of candidate weights may be 4 groups, which are (a1, B1), (a2, B2), (A3, B3), and (a4, B4).
In this embodiment, the multi-loss model obtaining apparatus may calculate the sampling mean value by using a first formula and the first loss value;
wherein the first formula is:
μ = (1/n) Σ_{i=1}^{n} c_i
wherein μ is used to represent the sampling mean, c is used to represent the first loss values, i is used to represent the index of a first loss value, and n is used to represent the number of the first loss values.
When the sampling space is the log-uniform distribution of the first loss values, the multi-loss model obtaining device samples in the sampling space according to the sampling probability to obtain the candidate weights.
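For the log-uniform case, a minimal sketch (again assuming Python with NumPy; the scale bounds are hypothetical) of how such sampling could be realized is:

```python
import numpy as np

def sample_log_uniform(loss_values, low_scale=0.1, high_scale=10.0, rng=None):
    """Draw candidate weight groups uniformly in log-space around the loss values,
    so that the sampled weights remain strictly positive."""
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(loss_values, dtype=float)
    n = losses.size
    log_low, log_high = np.log(losses * low_scale), np.log(losses * high_scale)
    return np.exp(rng.uniform(log_low, log_high, size=(2 ** n, n)))

candidates = sample_log_uniform([0.8, 1.6])   # 4 groups of 2 weights each
```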
503. Inputting the first loss value and the alternative weight into a weight prediction model, and outputting a first weight parameter corresponding to the sub-network, wherein the first weight parameter is used for training a second iteration cycle, and the second iteration cycle is a next iteration cycle of the first iteration cycle.
In this embodiment, the first loss values correspond to multiple groups of candidate weights. The multi-loss model obtaining device inputs the first loss values and each group of candidate weights into the weight prediction model in turn, and calculates the expected score of each group of candidate weights, i.e., predicts the training effect that would be obtained after training with each group of candidate weights. The expected scores are sorted in descending order, and the top K candidate weight groups are finally selected as the first weight parameters, where K is the number of sub-networks trained in parallel in the multi-task training scenario. Assume that the training process of the multi-loss model includes two loss values corresponding to the same sub-network, so the candidate weights may be 4 groups, namely (A1, B1), (A2, B2), (A3, B3) and (A4, B4). The expected scores corresponding to each group of candidate weights are obtained through the weight prediction model; assume that they are C1, C2, C3 and C4, respectively, decreasing in the order C1, C2, C3, C4. Since the number of sub-networks is 1, the candidate weight group with expected score C1 is selected as the first weight parameter of the sub-network. In this embodiment, when the number of loss values is small, the number of candidate weights may also be set by the user, as long as enough candidate weights are available.
In this embodiment, the weight prediction model may also be updated according to the historical training information, that is, the weight prediction model is trained with the historical training information so as to obtain a better weight prediction model. The historical training information comprises the loss values of the previous iteration cycles, the weight parameters corresponding to the loss values, and the training results. In this embodiment, the weight prediction model may be a simple neural network model, such as an MLP.
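As a purely illustrative sketch, assuming PyTorch and a hypothetical MLP architecture and training loop (this embodiment mandates neither), the weight prediction model could be refreshed on the historical (loss values, weight parameters, training effect) records roughly as follows:

```python
import torch
import torch.nn as nn

def update_weight_predictor(predictor, history, epochs=50, lr=1e-3):
    """history: list of (loss_values, weight_params, training_effect) tuples
    gathered from the current and earlier iteration cycles."""
    x = torch.tensor([list(l) + list(w) for l, w, _ in history], dtype=torch.float32)
    y = torch.tensor([[effect] for _, _, effect in history], dtype=torch.float32)
    optimizer = torch.optim.Adam(predictor.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(predictor(x), y)   # regress the training effect
        loss.backward()
        optimizer.step()
    return predictor

# A simple MLP predictor: (loss values, candidate weights) -> expected score.
n_losses = 2
predictor = nn.Sequential(nn.Linear(2 * n_losses, 64), nn.ReLU(), nn.Linear(64, 1))
history = [([0.8, 1.6], [1.0, 0.5], 0.42), ([0.7, 1.4], [0.9, 0.4], 0.45)]
update_weight_predictor(predictor, history)
```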
504. The parameters of the multi-loss model are updated according to the first weight parameters.
The multi-loss model obtaining device uses the first weight parameters to update and replace the weight parameters of the first iteration period, and then combines the updated network parameters with the first weight parameters as the training parameters of the multi-loss model for training in the second iteration period.
In this embodiment, after the training iteration cycle of the multi-loss model is completed, the training result of the multi-loss model is evaluated. And outputting the multi-loss model after the training result achieves the expected effect.
The following describes a method for acquiring a multi-loss model in an embodiment of the present application with a specific training procedure, specifically referring to fig. 6:
When the multi-loss model training starts, the weight generator and the network parameters are initialized, and the initialized weight parameters are generated by the weight generator. Parallel model training is performed using the initialization weight parameters and the initialized network parameters, where n sub-networks are trained in parallel and each training stage is a single training phase (e.g., 2000 iterations). After the single training phase of each sub-network is completed, each model is evaluated, and the model performance (i.e., the training effect), the loss values, and the weight parameters corresponding to the loss values are output. It is then determined whether the search stage of the multi-loss model is completed; if not, the update stage is entered. In the update stage, new weight parameters are generated in the manner shown in fig. 5 above. The weight parameters of the last iteration period are updated with the new weight parameters, and the network parameters are updated. Parallel model training is then performed using the updated weight parameters and the updated network parameters. After the single training phase of each sub-network is completed, each model is again evaluated, and the model performance, loss values, and corresponding weight parameters are output. Whether the search stage of the multi-loss model is completed is determined again; if not, the process returns to the update stage; if so, the optimal model is output.
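The workflow of fig. 6 can be outlined by the following self-contained toy sketch (Python with NumPy; every quantity, the toy training dynamics and the dummy predictor are hypothetical placeholders for the modules described above, not the actual implementation):

```python
import numpy as np
from dataclasses import dataclass

rng = np.random.default_rng(0)

@dataclass
class PhaseResult:
    losses: np.ndarray       # loss values at the end of the single training phase
    weights: np.ndarray      # weight parameters used in this phase
    performance: float       # model performance (training effect)
    params: np.ndarray       # network parameters after this phase

def train_single_phase(params, weights):
    """Stand-in for one single training phase of a sub-network (toy dynamics)."""
    new_params = params - 0.1 * weights            # placeholder parameter update
    losses = np.abs(new_params) + 0.1
    return PhaseResult(losses, weights, float(-losses.sum()), new_params)

def search(num_cycles=5, n_losses=2, num_parallel=2):
    params = rng.normal(size=n_losses)                          # initialized network parameters
    weights = [np.ones(n_losses) for _ in range(num_parallel)]  # initialized weight parameters
    for cycle in range(num_cycles):
        # Parallel model training followed by model performance evaluation.
        results = [train_single_phase(params, w) for w in weights]
        best = max(results, key=lambda r: r.performance)
        params = best.params                                    # inherit the best parameters
        if cycle == num_cycles - 1:
            return params                                       # search stage completed
        # Update stage: generate new weight parameters as in fig. 5 and keep the top ones.
        mu = best.losses.mean()
        candidates = rng.lognormal(np.log(mu), 0.5, size=(2 ** n_losses, n_losses))
        scores = [-abs(c.sum() - 1.0) for c in candidates]      # dummy weight predictor
        top = np.argsort(scores)[::-1][:num_parallel]
        weights = [candidates[i] for i in top]

final_params = search()
```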
While fig. 5 to 6 describe the general scenario of the multi-loss model, in a specific panorama segmentation experiment, assume that the panorama segmentation data set includes 80 target object categories and 53 background categories, corresponding to about 118,000 training pictures and 5,000 test pictures (also referred to as the COCO data set); the panoptic quality (PQ) is used as the overall measure, and the panoptic quality of the foreground and the background (PQth, PQst) is also considered. The model output by the model obtaining method in the embodiment of the application achieves the best effect; compared with the model performance obtained without tuning or with manual tuning, the performance of the model is greatly improved. The experimental results are shown in fig. 7 to 9. As can be seen from the results shown in fig. 7 to 9, on the data set of the embodiment of the present application, the model obtained by the obtaining method of the present application achieves the best overall PQ on the test set relative to the manually tuned model; moreover, the best balance between foreground and background is obtained, without the imbalance that appears in other methods. The model obtained by the method can reach 47.5% PQ. On the validation set, because the weights are dynamically adjusted during training by the method, the model obtained by the method is greatly improved (by 2.4%).
Assume that the panorama segmentation data set contains 100 target object categories and 50 background categories, corresponding to about 20,000 training pictures and 2,000 test pictures (also referred to as the ADE20K data set). The panoptic quality (PQ) is again used as the overall measure, while the foreground and background panoptic quality (PQth, PQst) are also considered. The model output by the model obtaining method in the embodiment of the application achieves the best effect; compared with the model performance obtained without tuning or with manual tuning, the performance of the model is greatly improved. The experimental results are shown in fig. 10. In fig. 10, the two "Our Baseline" entries indicate the baseline without weight-parameter tuning and the baseline with manually adjusted weight parameters, respectively, and "Our Baseline + Ada-Segment" indicates the result obtained with the obtaining method of the present application; it can be seen that the model obtained with the obtaining method achieves a substantial improvement.
In the visualization comparison shown in fig. 11, it can be seen that the model output by the obtaining method of the embodiment of the present application achieves a better effect. "Image" indicates the original picture, "Ground Truth" indicates the reference segmentation of the picture, and the remaining outputs indicate the picture segmentation results output by the model obtained with the obtaining method of the embodiment of the present application.
In the embodiment of the application, during the training of the multi-loss model, optimized weight values are updated for the loss values of the plurality of tasks in each iteration period; meanwhile, the effectiveness of the weights is explored by exploiting parallel computation, and well-performing network parameters and weights are inherited during training, so that an optimal trained model is obtained within a single training run.
The method for obtaining a multi-loss model in the embodiment of the present application has been described above; the apparatus for obtaining a multi-loss model in the embodiment of the present application is described below. Referring specifically to fig. 12, an embodiment of the apparatus 1200 for obtaining a multi-loss model in the embodiment of the present application includes: an obtaining module 1201, configured to obtain a first loss value corresponding to the sub-network, where the first loss value is the loss value output by a first iteration cycle in the multi-task training scenario;
a generating module 1202, configured to generate an alternative weight according to the first loss value;
a screening module 1203, configured to input the first loss value and the candidate weights into a weight prediction model, and output a first weight parameter of the subnetwork, where the first weight parameter is used for training a second iteration cycle, and the second iteration cycle is a next iteration cycle of the first iteration cycle;
a first updating module 1204, configured to update parameters of the multiple loss model according to the first weighting parameter.
The obtaining module 1201 is configured to perform the step 501 of the foregoing embodiment, that is, obtain the first loss value corresponding to the sub-network.
A generating module 1202, configured to perform step 502 of the foregoing embodiment, that is, generating an alternative weight according to the first loss value.
The screening module 1203 is configured to perform step 503 of the foregoing embodiment, that is, input the first loss value and the candidate weights into a weight prediction model, and output a first weight parameter of the sub-network.
A first updating module 1204, configured to perform the above-mentioned step 504 of the embodiment, that is, updating the training parameters of the multiple loss model according to the first weight parameters.
An embodiment of the present application provides a multi-loss model obtaining apparatus, as shown in fig. 13, the apparatus 1300 includes: at least one processor 1301, memory 1302, at least one communication bus 1303, at least one network interface 1304, or other user interface 1305. The communication bus 1303 is used to implement connection communication between these components.
Memory 1302 may include both read-only memory and random access memory, and provides instructions and data to processor 1301. A portion of the memory 1302 may also include non-volatile random access memory (NVRAM).
In some embodiments, memory 1302 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:
an operating system 13021 containing various system programs, such as a framework layer, a core library layer, a driver layer, etc. shown in fig. 14, for implementing various basic services and processing hardware-based tasks;
the application module 13022 contains various application programs, such as a desktop (launcher), a Media Player (Media Player), a Browser (Browser), etc., shown in fig. 14, for implementing various application services.
In the embodiment of the present application, the processor 1301 is configured to implement the method described in the embodiment of the present application corresponding to fig. 5 and fig. 6 by calling a program or an instruction stored in the memory 1302.
An embodiment of the present application provides a multiple loss model obtaining apparatus 1500, as shown in fig. 15, where the apparatus 1500 includes: a first processor 1501, a memory 1502, a transceiver 1503, a second processor 1504, a communication bus 1505. A communication bus 1505 is used to enable connective communication between these components.
The transceiver 1503 is used for data transmission with the outside.
The memory 1502 may include read-only memory and random access memory, and provides instructions and data to the first processor 1501 and the second processor 1504. A portion of the memory 1502 may also include non-volatile random access memory (NVRAM), such as RAM, ROM, EEPROM, CD-ROM, optical disks, hard disks, magnetic storage devices, and the like; the memory 1502 may be used to store one or more of computer program instructions, pre-set parameters, data resulting from computer intermediate operations, and the like.
The first processor 1501 and the second processor 1504 may be a Central Processing Unit (CPU), or digital processing units, etc.
In this embodiment, optionally, the first processor 1501 includes an on-chip memory, such as a TCM, a Cache, and an SRAM, where instructions are stored in the on-chip memory, and the first processor 1501 is coupled to the on-chip memory and is configured to implement the method described in this embodiment corresponding to fig. 5 and 6, or the first processor 1501 is coupled to the on-chip memory and is configured to call instructions in the on-chip memory and is coupled to the memory 1502 to obtain data, so as to implement the method described in this embodiment corresponding to fig. 5 and 6. In practice, the first processor 1501 may be a separately sold chip or may be integrated on a chip that includes the first processor 1501.
Optionally, the second processor 1504 is configured to implement the methods described in the embodiments of the present application corresponding to fig. 5 and 6 by calling a program or instructions stored in the memory 1502.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Each module of the embodiment of the present application can also implement other method steps described in the embodiment of the present application corresponding to fig. 5, and details are not described here again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (21)

1. A method for obtaining a multi-loss model, wherein the multi-loss model comprises at least one sub-network, and the method comprises:
acquiring a first loss value corresponding to the sub-network, wherein the first loss value is a loss value output by the sub-network in a first iteration cycle during acquisition of the multi-loss model;
generating alternative weights according to the first loss value;
inputting the first loss value and the alternative weights into a weight prediction model, and outputting a first weight parameter of the at least one sub-network, wherein the first weight parameter is used for training in a second iteration cycle, and the second iteration cycle is a next iteration cycle of the first iteration cycle;
and updating the parameters of the multi-loss model according to the first weight parameters.
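Read as an algorithm, claim 1 specifies one weight update per iteration cycle: collect the first loss values, generate alternative weights, let a weight prediction model output the first weight parameter, and use it for the next cycle. The minimal Python sketch below illustrates only this data flow; the function names, the toy scorer, and the lognormal sampling choice are illustrative assumptions and are not taken from the patent text.

```python
import numpy as np

def generate_alternative_weights(losses, n_candidates=8, sigma=0.5, rng=None):
    # Stand-in for the "generate alternative weights" step; claims 2 to 5 refine
    # this into sampling from a lognormal space built on the loss values.
    rng = np.random.default_rng() if rng is None else rng
    mu = float(np.mean(np.log(losses)))            # assumed aggregation of the loss values
    return rng.lognormal(mean=mu, sigma=sigma, size=(n_candidates, len(losses)))

def predict_first_weights(losses, candidates):
    # Stand-in for the weight prediction model: score each candidate weight vector
    # and return the best one as the first weight parameter.
    scores = -np.std(candidates * losses, axis=1)  # toy expected score
    return candidates[int(np.argmax(scores))]

def one_iteration_cycle(losses):
    """One iteration cycle of the method of claim 1 (data flow only)."""
    losses = np.asarray(losses, dtype=float)       # first loss values of the sub-networks
    candidates = generate_alternative_weights(losses)
    first_weights = predict_first_weights(losses, candidates)
    return first_weights                           # used for training in the second iteration cycle

print(one_iteration_cycle([0.9, 0.4, 1.7]))
```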
2. The method of claim 1, wherein generating the alternative weights from the first loss value comprises:
determining a sampling space of the first loss value;
sampling in the sampling space to obtain the alternative weights.
3. The method of claim 2, wherein the sampling space comprises a lognormal distribution of the first loss values or a loguniform distribution of the first loss values.
4. The method of claim 3, wherein when the sampling space is a lognormal distribution of the first loss values, the sampling in the sampling space to obtain the alternative weights comprises:
acquiring the sampling variance of the sampling space, and calculating the sampling mean of the sampling space, wherein the sampling variance is a preset parameter;
sampling in the sampling space according to the sampling mean and the sampling variance to obtain the alternative weights.
5. The method of claim 4, wherein the calculating the sampling mean of the sampling space comprises:
calculating the sampling mean by using a first formula and the first loss value;
wherein the first formula is:
$\mu = \frac{1}{n}\sum_{i=1}^{n} \log c_i$
wherein μ is used to represent the sampling mean, c is used to represent the first loss value, i is used to represent an index of the first loss value, and n is used to represent the number of first loss values.
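As an illustration of claims 2 to 5, the sketch below samples alternative weights from a lognormal space built on the first loss values: the sampling variance is a preset parameter and the sampling mean is computed from the loss values. Reading the first formula as the mean of the log loss values is an assumption based on the symbol definitions in claim 5, and the 2^N candidate count follows claim 7.

```python
import numpy as np

def sample_alternative_weights(first_losses, sigma=0.2, n_candidates=None, seed=0):
    """Sample alternative weights from a lognormal sampling space over the first loss values."""
    c = np.asarray(first_losses, dtype=float)   # first loss values c_1 ... c_n
    n = c.size
    mu = float(np.log(c).mean())                # assumed reading of the first formula
    if n_candidates is None:
        n_candidates = 2 ** n                   # claim 7: 2^N alternative weights for N loss values
    rng = np.random.default_rng(seed)
    # sigma is the preset sampling variance of claim 4, used here as the lognormal scale.
    return rng.lognormal(mean=mu, sigma=sigma, size=(n_candidates, n))

candidates = sample_alternative_weights([0.8, 1.5, 0.3])
print(candidates.shape)   # (8, 3): one weight per sub-network in each candidate
```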
6. The method according to any one of claims 1 to 5, wherein the inputting the first loss value and the alternative weights into a weight prediction model and outputting a first weight parameter of the at least one sub-network comprises:
inputting the first loss value and the alternative weights into the weight prediction model to calculate an expected score corresponding to each alternative weight, wherein the expected score is used for evaluating a training effect of the alternative weight;
and sorting the expected scores in descending order, and outputting the top K alternative weights as the first weight parameter, wherein K is the number of the sub-networks.
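The selection in claim 6 can be illustrated as follows: each alternative weight receives an expected score from the weight prediction model, the scores are sorted in descending order, and the top K alternatives are output as the first weight parameter, K being the number of sub-networks. The toy scorer below is a placeholder; the claims do not fix the form of the weight prediction model.

```python
import numpy as np

def select_first_weight_parameters(first_losses, candidates, score_fn, k):
    """Score alternative weights and output the top-K as the first weight parameter."""
    losses = np.asarray(first_losses, dtype=float)
    expected_scores = np.array([score_fn(losses, w) for w in candidates])
    order = np.argsort(expected_scores)[::-1]       # descending order of expected scores
    return candidates[order[:k]], expected_scores[order[:k]]

def toy_score(losses, weights):
    # Placeholder scorer: favours candidates whose weighted losses are balanced.
    return -float(np.std(losses * weights))

rng = np.random.default_rng(1)
cands = rng.lognormal(mean=0.0, sigma=0.3, size=(8, 3))
top_weights, top_scores = select_first_weight_parameters([0.8, 1.5, 0.3], cands, toy_score, k=3)
print(top_weights.shape, top_scores)
```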
7. The method according to any one of claims 1 to 6, wherein the number of the alternative weights is 2 to the power of N, and N is the number of first loss values.
8. The method according to any one of claims 1 to 7, further comprising:
acquiring initialization training parameters, wherein the initialization training parameters comprise initialization weight, sampling variance and initial parameters of the weight prediction model;
and starting the training of the multi-loss model according to the initialized training parameters.
9. The method according to any one of claims 1 to 8, further comprising:
and updating the weight prediction model according to historical training information, wherein the historical training information comprises the first iteration cycle, loss values before the first iteration cycle, weight parameters corresponding to the loss values, and training effects.
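Claim 9 updates the weight prediction model from historical training information (the iteration cycle, earlier loss values, the weight parameters used with them, and the resulting training effects). The sketch below assumes the predictor is a simple least-squares scorer over (loss, weight) features; the model family is an illustrative assumption, not part of the claims.

```python
import numpy as np

class WeightPredictorUpdater:
    """Stores historical training information and refits a least-squares expected-score model."""

    def __init__(self):
        self.features, self.effects = [], []

    def record(self, losses, weights, training_effect):
        # One history entry: loss values, the weight parameters used, the observed training effect.
        self.features.append(np.concatenate([np.asarray(losses, float),
                                             np.asarray(weights, float)]))
        self.effects.append(float(training_effect))

    def refit(self):
        # Updated parameters of the weight prediction model from the accumulated history.
        X = np.stack(self.features)
        y = np.asarray(self.effects)
        coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
        return coef

updater = WeightPredictorUpdater()
updater.record([0.8, 1.5], [1.0, 0.5], training_effect=0.71)
updater.record([0.6, 1.2], [0.8, 0.9], training_effect=0.74)
print(updater.refit())
```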
10. A multi-loss model acquisition apparatus, applied to acquisition of a multi-loss model comprising at least one sub-network, the apparatus comprising:
an obtaining module, configured to acquire a first loss value corresponding to the sub-network, wherein the first loss value is a loss value output by the sub-network in a first iteration cycle during acquisition of the multi-loss model;
a generating module, configured to generate an alternative weight according to the first loss value;
a screening module, configured to input the first loss value and the alternative weights into a weight prediction model, and output a first weight parameter of the sub-network, wherein the first weight parameter is used for training in a second iteration cycle, and the second iteration cycle is a next iteration cycle of the first iteration cycle;
and the first updating module is used for updating the parameters of the multi-loss model according to the first weight parameters.
11. The apparatus according to claim 10, wherein the generating module is specifically configured to determine a sampling space of the first loss values; sampling in the sampling space to obtain the alternative weights.
12. The apparatus of claim 11, wherein the sampling space comprises a lognormal distribution of the first loss values or a loguniform distribution of the first loss values.
13. The apparatus according to claim 12, wherein when the sampling space is a lognormal distribution of the first loss value, the generating module is specifically configured to obtain a sampling variance of the sampling space, and calculate a sampling mean of the sampling space, where the sampling variance is a preset parameter; sampling in the sampling space according to the sampling mean and the sampling variance to obtain the alternative weights.
14. The apparatus according to claim 13, wherein the generating module is specifically configured to calculate the sampling mean by using a first formula and the first loss value;
wherein the first formula is:
$\mu = \frac{1}{n}\sum_{i=1}^{n} \log c_i$
wherein μ is used to represent the sampling mean, c is used to represent the first loss value, i is used to represent an index of the first loss value, and n is used to represent the number of first loss values.
15. The apparatus according to any one of claims 10 to 14, wherein the screening module is specifically configured to input the first loss value and the alternative weights into the weight prediction model to calculate an expected score corresponding to each alternative weight, wherein the expected score is used to evaluate a training effect of the alternative weight;
and sort the expected scores in descending order and output the top K alternative weights as the first weight parameter, wherein K is the number of the sub-networks.
16. The apparatus according to any one of claims 10 to 15, wherein the number of the alternative weights is 2 to the power of N, and N is the number of first loss values.
17. The apparatus of any one of claims 10 to 16, further comprising:
the initialization module is used for acquiring initialization training parameters, wherein the initialization training parameters comprise initialization weights, sampling variances and initial parameters of the weight prediction model;
and starting the training of the multi-loss model according to the initialized training parameters.
18. The apparatus of any one of claims 10 to 16, further comprising:
and the second updating module is used for updating the weight prediction model according to historical training information, wherein the historical training information comprises the first iteration cycle, loss values before the first iteration cycle, weight parameters corresponding to the loss values, and training effects.
19. A training apparatus for a multi-loss model, wherein the multi-loss model comprises at least one sub-network, and the apparatus comprises: a memory, configured to store program instructions; and at least one processor, wherein when the program instructions are invoked by the at least one processor, the multi-loss model acquisition method of any one of claims 1 to 9 is implemented.
20. A computer-readable storage medium having stored thereon instructions which, when executed on a processor, implement the multiple loss model acquisition method of any of claims 1-9.
21. A training system for a multiple loss model, comprising:
the device comprises a weight generation module, a search module and a weight prediction updating module;
the weight generation module comprises a weight sampling module and a weight prediction module;
the weight sampling module performs sampling according to a first loss value output by the sub-network in a first iteration cycle to obtain alternative weights;
the weight prediction module outputs a first weight parameter of the sub-network in a second iteration cycle according to the first loss value and the alternative weight, wherein the second iteration cycle is the next iteration cycle of the first iteration cycle;
the search module carries out training of a second iteration cycle according to the first weight parameter and the network parameter corresponding to the sub-network and outputs a training effect and a second loss value corresponding to the sub-network;
and the weight prediction updating module updates the parameters of the weight prediction module according to the second loss value and the training effect.
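To show how the modules of claim 21 could interact across iteration cycles, the loop below wires a weight sampling module, a weight prediction module, a search module and a weight prediction updating module together. Only the data flow follows the claim; the toy scorer, the simulated training step and the update rule are placeholders introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_sampling_module(losses, sigma=0.3, n_candidates=8):
    # Sample alternative weights around the current loss values (assumed lognormal space).
    mu = float(np.log(np.asarray(losses, dtype=float)).mean())
    return rng.lognormal(mu, sigma, size=(n_candidates, len(losses)))

def weight_prediction_module(losses, candidates, scorer):
    # Output the first weight parameter used in the next iteration cycle.
    scores = np.array([scorer(losses, w) for w in candidates])
    return candidates[int(np.argmax(scores))]

def search_module(weights, losses):
    # Placeholder "training" of one iteration cycle: losses shrink a little and an effect is reported.
    new_losses = np.asarray(losses, dtype=float) * (0.95 + 0.05 * rng.random(len(losses)))
    training_effect = -float(np.sum(weights * new_losses))
    return new_losses, training_effect

def weight_prediction_updating_module(history):
    # Placeholder update: prefer weight patterns close to the best-scoring one seen so far.
    best = max(history, key=lambda h: h[2])[1]
    return lambda losses, w: -float(np.linalg.norm(w - best))

scorer = lambda losses, w: -float(np.std(np.asarray(losses) * w))
losses, history = [0.9, 1.6, 0.4], []
for cycle in range(3):                       # first, second, ... iteration cycles
    candidates = weight_sampling_module(losses)
    first_weights = weight_prediction_module(losses, candidates, scorer)
    losses, effect = search_module(first_weights, losses)
    history.append((cycle, first_weights, effect))
    scorer = weight_prediction_updating_module(history)
print(np.round(losses, 3), round(history[-1][2], 3))
```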
CN202010754299.5A 2020-07-30 2020-07-30 Multi-loss model obtaining method and device Pending CN112070205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010754299.5A CN112070205A (en) 2020-07-30 2020-07-30 Multi-loss model obtaining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010754299.5A CN112070205A (en) 2020-07-30 2020-07-30 Multi-loss model obtaining method and device

Publications (1)

Publication Number Publication Date
CN112070205A true CN112070205A (en) 2020-12-11

Family

ID=73657327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010754299.5A Pending CN112070205A (en) 2020-07-30 2020-07-30 Multi-loss model obtaining method and device

Country Status (1)

Country Link
CN (1) CN112070205A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190251333A1 (en) * 2017-06-02 2019-08-15 Tencent Technology (Shenzhen) Company Limited Face detection training method and apparatus, and electronic device
CN108416440A (en) * 2018-03-20 2018-08-17 上海未来伙伴机器人有限公司 A kind of training method of neural network, object identification method and device
CN110956260A (en) * 2018-09-27 2020-04-03 瑞士电信公司 System and method for neural architecture search
CN110889450A (en) * 2019-11-27 2020-03-17 腾讯科技(深圳)有限公司 Method and device for super-parameter tuning and model building

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580720A (en) * 2020-12-18 2021-03-30 华为技术有限公司 Model training method and device
CN112669078A (en) * 2020-12-30 2021-04-16 上海众源网络有限公司 Behavior prediction model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2022083536A1 (en) Neural network construction method and apparatus
CN112445823A (en) Searching method of neural network structure, image processing method and device
CN111507378A (en) Method and apparatus for training image processing model
WO2022068623A1 (en) Model training method and related device
EP4145351A1 (en) Neural network construction method and system
CN111797983A (en) Neural network construction method and device
CN113505883A (en) Neural network training method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN114997412A (en) Recommendation method, training method and device
CN111340190A (en) Method and device for constructing network structure, and image generation method and device
CN113592060A (en) Neural network optimization method and device
CN111428854A (en) Structure searching method and structure searching device
CN113408570A (en) Image category identification method and device based on model distillation, storage medium and terminal
CN116187391A (en) Neural network model processing method and device
JP2022530868A (en) Target object attribute prediction method based on machine learning, related equipment and computer programs
CN114091554A (en) Training set processing method and device
CN113379045B (en) Data enhancement method and device
CN112070205A (en) Multi-loss model obtaining method and device
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
Hong et al. Multi-objective magnitude-based pruning for latency-aware deep neural network compression
CN112446462A (en) Generation method and device of target neural network model
CN113407820B (en) Method for processing data by using model, related system and storage medium
CN114140841A (en) Point cloud data processing method, neural network training method and related equipment
WO2024067113A1 (en) Action prediction method and related device thereof
CN113656563A (en) Neural network searching method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination