CN113822281A - Apparatus, method and storage medium for multi-objective optimization

Info

Publication number
CN113822281A
Authority
CN
China
Prior art keywords
model
loss function
loss
sub
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010565659.7A
Other languages
Chinese (zh)
Inventor
孙利
汪留安
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN202010565659.7A priority Critical patent/CN113822281A/en
Priority to JP2021097645A priority patent/JP2022002087A/en
Publication of CN113822281A publication Critical patent/CN113822281A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an apparatus, method, and storage medium for multi-objective optimization of a model. According to one embodiment of the disclosure, the apparatus comprises: a memory storing instructions; and a processor configured to fetch the instructions from the memory and execute them to: determine a model loss function of the model; determine a multi-objective optimization function of the model based on the model loss function and a speed index of the model; and solve the multi-objective optimization function to determine a selected model that meets predetermined requirements. The model comprises a plurality of sub-models for completing an image task; the model loss function is a weighted sum of the sub-model loss functions of the plurality of sub-models; and the loss weight of each sub-model loss function in the weighted sum is determined in an iteratively updated manner based on a training sample set. The beneficial effects of the method, apparatus, and storage medium of the disclosure include at least that a model with good comprehensive performance can be screened out.

Description

Apparatus, method and storage medium for multi-objective optimization
Technical Field
The present disclosure relates generally to image processing, and more particularly, to an apparatus, method, and storage medium for multi-objective optimization of models for accomplishing image tasks.
Background
Recently, Deep Neural Networks (DNNs) have been adopted in many artificial intelligence applications. In many areas, DNNs can exceed the accuracy of human decision-making. The superior performance of DNNs stems from their ability to extract deep features from raw input data using statistical learning methods over large amounts of data, thereby obtaining an effective representation of the input space. This differs from earlier methods that used manually extracted features or expert-designed rules.
Accomplishing image tasks such as classification, localization, and/or segmentation is a common application scenario for DNN-based artificial intelligence models. When an image task is performed, a loss function is typically used to measure the accuracy performance of the model that completes the image task. In general, the loss function evaluates how much the predicted values of the model differ from the actual values: the smaller the loss function, the better the accuracy performance of the model.
Disclosure of Invention
A brief summary of the disclosure is provided below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the present disclosure, an apparatus for multi-objective optimization of a model for accomplishing an image task is provided. The device includes: a memory storing instructions; and a processor configured to fetch instructions from the memory and execute the instructions to: determining a model loss function of the model; determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and solving the multi-objective optimization function to determine a selected model that meets predetermined requirements; wherein the model comprises a plurality of sub-models for completing the image task; the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models; the loss weight for the weighted sum of each sub-model loss function is determined in an iterative update manner based on a training sample set; and the predetermined requirements include: the speed index of the selected model is not less than a preset speed threshold value; and the value of the model loss function at convergence for the selected model is not greater than the predetermined loss threshold.
According to one aspect of the present disclosure, a method for multi-objective optimization of a model for accomplishing an image task is provided. The method comprises the following steps: determining a model loss function of the model; determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and solving the multi-objective optimization function to determine a selected model that meets predetermined requirements; wherein the model comprises a plurality of sub-models for completing the image task; the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models; the loss weight for the weighted sum of each sub-model loss function is determined in an iterative update manner based on a training sample set; and the predetermined requirements include: the speed index of the selected model is not less than a preset speed threshold value; and the value of the model loss function at convergence for the selected model is not greater than the predetermined loss threshold.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having a program stored thereon. The program is for multi-objective optimization of a model for performing an image task, and the program is such that when the program is executed by a processor it effects: determining a model loss function of the model; determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and solving the multi-objective optimization function to determine a selected model that meets predetermined requirements; wherein the model comprises a plurality of sub-models for completing the image task; the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models; the loss weight for the weighted sum of each sub-model loss function is determined in an iterative update manner based on a training sample set; and the predetermined requirements include: the speed index of the selected model is not less than a preset speed threshold value; and the value of the model loss function at convergence for the selected model is not greater than the predetermined loss threshold.
The beneficial effects of the method, the device and the storage medium of the disclosure at least comprise: the model with good comprehensive performance can be screened out.
It should be noted that the above effects are not necessarily restrictive. Any one of the effects described in the present specification or other effects that can be grasped from the present specification can be achieved using or instead of the above-described effects.
Drawings
The above and other objects, features and advantages of the present disclosure will be more readily understood from the following description of embodiments thereof with reference to the accompanying drawings. The drawings are only for the purpose of illustrating the principles of the disclosure. The dimensions and relative positioning of the elements in the figures are not necessarily drawn to scale. Like reference numerals may denote like features. In the drawings:
FIG. 1 illustrates an exemplary block diagram of an apparatus for multi-objective optimization of a model to complete an image task according to one embodiment of the present disclosure;
FIG. 2 illustrates an exemplary flow diagram of a method for multi-objective optimization of a model to complete an image task according to one embodiment of the present disclosure;
FIG. 3 illustrates an exemplary flow diagram of a method for determining a model loss function according to one embodiment of the present disclosure;
FIG. 4 shows a schematic variation of a loss function with a speed indicator;
FIG. 5 illustrates an exemplary block diagram of an apparatus for multi-objective optimization of a model to complete an image task according to one embodiment of the present disclosure;
FIG. 6 illustrates an exemplary flow diagram of a method for completing an image task according to one embodiment of the present disclosure;
FIG. 7 illustrates an exemplary block diagram of an apparatus for completing an image task according to one embodiment of the present disclosure; and
fig. 8 shows an exemplary block diagram of an information processing apparatus according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another.
Here, it should be further noted that, in order to avoid obscuring the present disclosure with unnecessary details, only the device structure closely related to the scheme according to the present disclosure is shown in the drawings, and other details not so related to the present disclosure are omitted.
It is to be understood that the disclosure, described below with reference to the drawings, is not limited to the described embodiments. In this context, where feasible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted in an embodiment.
As will be appreciated by one skilled in the art, aspects of the exemplary embodiments may be embodied as a system, method, or computer program product. Thus, aspects of the exemplary embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware portions, all of which may generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the illustrative embodiments may take the form of a computer program product embodied on one or more computer-readable media having computer-readable program code embodied thereon. The computer program may be distributed, for example, over a computer network, or it may be located on one or more remote servers or embedded in the memory of the device.
Computer program code for carrying out operations for aspects of the exemplary embodiments disclosed herein may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
One aspect of the present disclosure relates to an apparatus for multi-objective optimization of a model for accomplishing an image task. The image task involves processing the original input image and outputting valuable processing results. The processing result may be: the region of the object of interest in the image and/or the type of the object of interest. The image task may be completed by, for example, classifying, positioning, segmenting, etc. the image. The apparatus will be described below.
FIG. 1 illustrates an exemplary block diagram of an apparatus 100 for multi-objective optimization of a model for accomplishing an image task according to one embodiment of the present disclosure. The apparatus 100 includes a memory 101 and a processor 103. Memory 101 stores instructions. The processor 103 is capable of fetching instructions from the memory 101 and executing the fetched instructions to perform various operations to achieve multi-objective optimization of the model Mz. The model Mz is designed to accomplish the image task. After the model structure of the model Mz is determined, the model is trained based on a training sample set, and various parameters of the model Mz (e.g., specific values of matrix elements of a convolution kernel) can be determined. The trained model Mz can be used for processing the test image and outputting a processing result, so that an image task is completed.
In the present disclosure, multi-objective optimization may include optimization of the accuracy and operating speed of the model Mz. The selected model given after the multi-objective optimization can meet the predetermined requirements. The predetermined requirements include: the speed index of the selected model is not less than a predetermined speed threshold; and the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold. The speed index characterizes the speed at which the model completes the image task: the larger the speed index, the less time is required to process one image. The multi-objective optimization may also include, but is not limited to, optimization of the accuracy performance of the model, the memory footprint of the model, and/or the dictionary scale of the model, where the dictionary scale refers to the size of the parameter file finally output for the trained neural network model, and the memory footprint refers to the memory overhead actually occupied by the trained neural network when it is deployed in an actual engineering task.
In other words, the above instructions are for implementing a method for multi-objective optimization of a model for performing an image task. One aspect of the present disclosure relates to a method for multi-objective optimization of a model for accomplishing an image task. Thus, more specific details of the apparatus 100 may be found in the following description of a method for multi-objective optimization of a model for performing an image task.
FIG. 2 illustrates an exemplary flow diagram of a method 200 for multi-objective optimization of a model for completing an image task according to one embodiment of the present disclosure.
In operation S201, a model loss function Lz of the model Mz is determined. The model loss function Lz can characterize the accuracy performance of the model Mz: the smaller Lz is, the better the accuracy performance. In the present disclosure, to accomplish the image task, the model Mz includes a plurality of (at least two) sub-models: M[1], M[2], …, M[i], …, M[maxi]. The multiple sub-models constitute a sub-model set {M[i]} (the set comprises a plurality of elements; for simplicity only one representative element M[i] is shown here). For example, with 3 sub-models, sub-model M[1] is used to complete a segmentation task, sub-model M[2] a localization task, and sub-model M[3] a classification task. That is, the plurality of sub-models may include at least two of a classification model, a localization model, and a segmentation model. An appropriate loss function Li can be determined according to the function of each sub-model M[i]. Referring to equation 1, in the present disclosure, the model loss function Lz is a weighted sum of the sub-model loss functions Li of each sub-model M[i] of the plurality of sub-models, wherein the loss weight of the sub-model loss function Li is denoted ρi. A conventional loss function can be selected for each sub-model according to its function, and is not described in detail herein.
Lz = ρ1·L1 + ρ2·L2 + … + ρmaxi·Lmaxi = Σi ρi·Li (1)
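By way of a minimal Python sketch (the loss values, weights, and function names here are hypothetical, not taken from the patent), equation 1 amounts to:

```python
def model_loss(sub_losses, loss_weights):
    """Model loss of equation 1: Lz = sum over i of rho_i * L_i."""
    assert len(sub_losses) == len(loss_weights)
    return sum(rho * L for rho, L in zip(loss_weights, sub_losses))

# Three hypothetical sub-models (segmentation, localization, classification):
Lz = model_loss(sub_losses=[0.42, 1.7, 0.08], loss_weights=[1.0, 0.5, 2.0])
```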
For any sub-model M[i], there are a plurality of candidate sub-model structures, denoted m[i][α]. m[i][α] is determined by a sequence of operations Oi(0), Oi(1), …, Oi(j), …, Oi(max_j), where α is the structure index of the sub-model M[i]. For example, if the sub-model M[i] is selected as a model based on a deep neural network having n layers (e.g., convolutional layers, pooling layers, etc.), m[i][α] is determined by a sequence of operations Oi(0), Oi(1), …, Oi(n-1). Each sub-model may be a deep neural network model that includes convolution operations. The combinations of the candidate sub-model structures of the sub-models constitute a model search space for the model Mz, from which a selected model having a selected model structure satisfying a plurality of optimization objectives can be searched. For example, for the case where the model Mz contains 3 sub-models, if M[1] has 100 candidate structures, M[2] has 400, and M[3] has 900, the model search space of the model Mz contains 100 × 400 × 900 = 36,000,000 candidate models, corresponding to 36,000,000 candidate model structures. A candidate model can be denoted Mz[k] and its candidate model structure mz[k], which can be expressed as mz[k] = O(k). In one embodiment, two sub-models in the model Mz may share a partial structure, where feasible. For example, the same convolution operation using the same convolution kernel is included in both the first and second sub-models. This can reduce memory overhead and lower the hardware requirements of the device running the model without reducing model performance.
In operation S203, a multi-objective optimization function J is determined. Specifically, a multi-objective optimization function J of the model Mz is determined based on the model loss function Lz and the velocity index V of the model Mz.
In operation S205, the multi-objective optimization function J is solved. Specifically, the multi-objective optimization function J is solved to determine a selected model Mz[k_s] of the model Mz that satisfies predetermined requirements, wherein the selected model Mz[k_s] has the model structure mz[k_s]. The predetermined requirements include: the speed index V of the selected model Mz[k_s] is not less than a predetermined speed threshold Vth, and the model loss function Lz at convergence of the selected model Mz[k_s] is not more than a predetermined loss threshold Th_loss.
In one embodiment, the speed index V is associated with the delay time of the model Mz. For example, the speed index V is inversely proportional to the delay time of the model Mz, and the delay time of the model Mz is related to the model structure.
In one embodiment, the speed index V is proportional to the inverse of the sum Sc of the sizes of the output tensors of all convolution operations of the model Mz. For example: for a tensor of size 10 × 10 × 10, the number of components (i.e., the size of the tensor) is 1000; if the model Mz has 12 convolution operations and the output tensor of each convolution operation has size 1000, V may be set to 1/12000.
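Under that assumption, the speed index can be sketched as follows (the shapes and function name are illustrative):

```python
from math import prod

def speed_index(conv_output_shapes):
    """Speed index V = 1 / Sc, where Sc is the total size of all convolution output tensors."""
    sc = sum(prod(shape) for shape in conv_output_shapes)
    return 1.0 / sc

# 12 convolution operations, each outputting a 10 x 10 x 10 tensor (1000 components):
V = speed_index([(10, 10, 10)] * 12)  # V == 1/12000
```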
In one embodiment, the predetermined requirements further include: in the case where the model Mz is a selected model Mz[k_s] employing a selected model structure, the memory footprint A_mem of the selected model Mz[k_s] is not greater than a predetermined memory footprint threshold Th_mem. The situation where two sub-models share a partial structure can be taken into account when determining the memory footprint of the model.
The multi-objective optimization function is related to a model loss function. Therefore, it is important to determine a reasonable expression of the model loss function. A method of determining the model loss function is described below with reference to fig. 3.
FIG. 3 illustrates an exemplary flow diagram of a method 300 for determining a model loss function according to one embodiment of the present disclosure.
In operation S301, the convergence range of each sub-model is estimated. Specifically, a representative sub-model structure can be selected for each sub-model (for example, a candidate sub-model structure of intermediate complexity), and each sub-model is trained based on the training sample set. During training of a sub-model based on the training sample set, the variation range of the loss determined while the loss function of the sub-model converges is the convergence range of that sub-model. For example: the value (or 0) of the corresponding loss function when all prediction results of the sub-model are wrong can be selected as the right end point of the convergence range, and the value (or 0) of the corresponding loss function when the accuracy of the prediction results of the sub-model is highest can be selected as the left end point.
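A minimal sketch of reading such a convergence range off a recorded loss history (recording the loss once per training step is an illustrative choice):

```python
def convergence_range(loss_history):
    """Convergence range of a sub-model: the variation range of the loss
    recorded while the sub-model loss function converges during training."""
    return (min(loss_history), max(loss_history))
```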
In operation S303, a value range adjustment coefficient is determined for each sub-model. Specifically, the estimated convergence ranges of the sub-models may differ significantly, which is disadvantageous for determining a suitable model loss function. A respective value range adjustment coefficient is therefore determined for each sub-model loss function based on its convergence range. Each coefficient Ci is set such that the value ranges of the products Ci·Li overlap, where Ci is the value range adjustment coefficient of sub-model M[i] and the product Ci·Li is referred to below as the adjustment loss function. For example, the coefficients may be set such that the adjustment loss functions of all maxi sub-models vary in the same order of magnitude and over the same value range (e.g., all vary within [0, 100]), or such that the right end points of their variation ranges are the same, or such that the variation ranges have the same median, where the median refers to the middle value of the variation range.
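One possible choice of coefficients, assuming each convergence range starts near 0 and aiming at a common target range of [0, 100] (this scaling rule is an illustrative assumption, not prescribed above):

```python
def range_adjust_coefficients(convergence_ranges, target_right=100.0):
    """Choose Ci so that each adjustment loss Ci * Li varies over roughly [0, target_right]."""
    return [target_right / right for (_left, right) in convergence_ranges]

# Sub-model losses converging over very different ranges:
C = range_adjust_coefficients([(0.0, 2.0), (0.0, 500.0), (0.0, 8.0)])
# C == [50.0, 0.2, 12.5]; the products Ci * Li now all vary over about [0, 100]
```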
In the present disclosure, the loss weight ρi of the sub-model loss function of each sub-model M[i] is the product of the respective value range adjustment coefficient Ci and a corresponding intermediate loss weight ρ′i (see equation 2).
ρi = Ci·ρ′i (2)
In operation S305, the model is trained. Specifically, the model is trained based on the model loss function and a specified model structure mz[k_sp] of the model. In the first training, each intermediate loss weight ρ′i may be set to 1 or another equal positive number. The specified model structure is some combination of the sub-model structures of the sub-models; for example, it may be the model structure corresponding to a model of intermediate complexity.
In operation S307, after the model is trained and the model loss function has converged, it is determined whether the loss given by the model loss function is equal to or less than a predetermined loss threshold Lth.
In the event that the determination is yes, the method 300 ends.
In the case where the determination result is no, operation S309 is performed.
In operation S309, at least one intermediate loss weight is adjusted. Specifically, in the case where the loss given by the model loss function is greater than the predetermined loss threshold Lth, the intermediate loss weight of at least one of the plurality of sub-models is adjusted according to the products Ci·ρ′i·Li of the corresponding intermediate loss weights, the corresponding value range adjustment coefficients, and the corresponding sub-model loss functions. For example: for the sub-model corresponding to the larger of these products, the intermediate loss weight of that sub-model is increased (e.g., the intermediate loss weight is updated to two, five, ten, or another multiple of the previous intermediate loss weight); alternatively, for the sub-model corresponding to the smaller of these products, the intermediate loss weight of that sub-model is reduced (e.g., the intermediate loss weight is updated to 0.9, 0.5, 0.1, or another multiple of the previous intermediate loss weight). The intermediate loss weight of one sub-model may be increased while the intermediate loss weight of another sub-model is decreased.
That is, in method 300, the loss weights of the sub-models are determined in an iteratively updated manner. During the iterative updating, the specified model structure mz[k_sp] is specified at the first training and is neither changed nor reselected in subsequent adjustment rounds.
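Operations S305 through S309 can be sketched as one loop as follows; train_fn, the doubling and halving factors, and max_rounds are illustrative assumptions rather than prescribed values:

```python
def determine_loss_weights(C, Lth, train_fn, max_rounds=10):
    """Iteratively update the intermediate loss weights rho'_i (operations S305-S309).

    C        -- value range adjustment coefficients Ci
    Lth      -- predetermined loss threshold
    train_fn -- trains the model with loss weights rho_i = Ci * rho'_i on the
                specified model structure and returns the converged losses Li
    """
    rho_prime = [1.0] * len(C)                 # first training: equal intermediate weights
    for _ in range(max_rounds):
        L = train_fn([c * rp for c, rp in zip(C, rho_prime)])
        products = [c * rp * l for c, rp, l in zip(C, rho_prime, L)]
        if sum(products) <= Lth:               # converged model loss is small enough
            break
        rho_prime[products.index(max(products))] *= 2.0   # raise the weight of the largest term
        rho_prime[products.index(min(products))] *= 0.5   # lower the weight of the smallest term
    return [c * rp for c, rp in zip(C, rho_prime)]
```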
After the loss weights are determined, a multi-objective optimization function can be constructed based on the expression of the model loss function. In one embodiment, the multi-objective optimization function J may be constructed as shown in equation 3.
J = Min G(Lz, V) (3)
where G(Lz, V) = Lz·V^b.
The exponent b is a constant. A preferred value for b may be determined empirically or by a Pareto optimization method. Considering that the Pareto optimization method is a conventional technique, it is not described herein in detail.
Both the model loss function Lz and the speed index V are related to the model structure, so the selected model Mz[k_s] meeting the predetermined requirements is determined by solving the multi-objective optimization function J. The resulting selected model Mz[k_s] satisfies: the speed index of the selected model Mz[k_s] is not less than the predetermined speed threshold.
The method of solving for G may be a gradient descent method. Thus, a selected model Mz[k_s] satisfying the predetermined requirements is screened from a candidate model set {Mz[k]} (the candidate model set includes a plurality of elements; for simplicity only one representative element Mz[k] is shown here).
An intuitive solution is to select a specific model according to the distribution of the values of the loss function over the candidate models. FIG. 4 shows a schematic variation of the loss function with the speed index, in which the abscissa is the inverse of the speed index, i.e., the larger the abscissa, the longer it takes to complete an image task. Each point represents a candidate model structure mz[k]. When the corresponding points (1/V, Lz) of all elements of the candidate model set {Mz[k]} have been determined, the model corresponding to the point having an abscissa less than or equal to 1/Vth and the minimum Lz may be selected as the selected model Mz[k_s].
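A sketch of this selection over a hypothetical candidate set:

```python
def select_model(candidates, Vth, Th_loss):
    """Pick the candidate with the smallest converged loss Lz subject to V >= Vth.

    candidates -- list of (model_id, Lz_at_convergence, V) tuples
    """
    feasible = [c for c in candidates if c[2] >= Vth and c[1] <= Th_loss]
    if not feasible:
        return None  # no candidate meets the predetermined requirements
    return min(feasible, key=lambda c: c[1])

# Hypothetical candidate set {Mz[k]}:
picked = select_model([("mz[0]", 0.31, 1 / 9000), ("mz[1]", 0.22, 1 / 15000)],
                      Vth=1 / 12000, Th_loss=0.4)
# picked == ("mz[0]", 0.31, 1/9000): mz[1] is more accurate but too slow
```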
The selected model Mz[k_s] can also be determined with a solution based on the differentiable architecture search method (DARTS method for short) of document 1.
Document 1: liu, Hanxiao, Karen Simony, and Yiming Yang. "Darts: differentiated architecture search." arXiv preprint arXiv:1806.09055 (2018).
In the DARTS method, O(k) is the differentiable model framework operation combining function (i.e., the continuous search space), and kmn is the transition probability from option m to option n. A gradient descent algorithm may be used to simultaneously optimize the mixed transition probabilities of O(k) and the weight parameters ω of the model, minimizing G(Lz(ω), V(k)) so that the resulting sub-network (namely Mz[k_s]) minimizes the value of the multi-objective optimization function. The formula is expressed as:
ω′(k) = argmin_ω G(ω, k), with V(k′) ≥ Minimum_Speed,
where the meaning of the formula is: find the weight parameters ω′(k) that minimize the loss (argmin_ω G(ω, k)) for each structure combination sequence k; the optimal structure combination sequence is denoted k′ and must satisfy the minimum speed requirement.
The selected model Mz[k_s] can also be determined using the combination of an RNN-based framework sampling network and a reinforcement learning method (RL method for short) of document 2.
Document 2: pham, Hieu, et al, "effective neural architecture search via parameter sharing," arXiv preprinting arXiv:1802.03268 (2018).
In the reinforcement learning method, O(k) can be parameterized by an RNN, denoted Ctrl(θ), whose output sequence value is k.
In the RL method, the reward function is set as follows:
R(ω, k) = Max_G_value − G(Lz(ω), V(k))
where Max_G_value is the maximum value of G(Lz(ω), V(k)) encountered when calculating the reward. For example, Max_G_value may be computed when b is estimated using the Pareto algorithm.
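A sketch of this reward (G as in equation 3; b and Max_G_value are assumed to have been estimated beforehand, e.g. via the Pareto procedure):

```python
def reward(Lz, V, b, max_g_value):
    """RL reward R(omega, k) = Max_G_value - G(Lz, V), with G(Lz, V) = Lz * V**b."""
    return max_g_value - Lz * V ** b
```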
In the RL method, the target expectation function is set as follows:
J′ = E_P(k; ω, θ)[R(ω, k)] (maximized expectation)
A policy gradient algorithm may be used to solve for the particular RNN that maximizes the expectation J′ over the networks sampled by the RNN, subject to V(k′) ≥ Minimum_Speed for any sampled network structure operation sequence k′.
The apparatus for multi-objective optimization of models for accomplishing image tasks of the present disclosure can also be configured as a modular structure. Such an arrangement will be described below with reference to FIG. 5.
FIG. 5 illustrates an exemplary block diagram of an apparatus 500 for multi-objective optimization of a model for completing an image task according to one embodiment of the present disclosure. The apparatus 500 includes a model loss function determination unit 501, a multi-objective optimization function determination unit 503, and a solving unit 505. The model loss function determination unit 501 is configured to determine a model loss function Lz of the model Mz. The multi-objective optimization function determination unit 503 is configured to determine a multi-objective optimization function J of the model Mz based on the model loss function Lz and the speed index V of the model. The solving unit 505 is used to solve the multi-objective optimization function J to determine a selected model Mz[k_s] that satisfies predetermined requirements. The model Mz includes a plurality of sub-models for completing the image task. The model loss function Lz is a weighted sum of the sub-model loss functions of each of the plurality of sub-models. The loss weight of each sub-model loss function in the weighted sum is determined in an iteratively updated manner based on a set of training samples. The predetermined requirements include: the speed index of the selected model is not less than a predetermined speed threshold; and the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold. Further details of the apparatus 500 may be found in the description of the method 200.
One aspect of the disclosure also discloses a method of completing an image task. This method is described below exemplarily with reference to fig. 6.
FIG. 6 illustrates an exemplary flow diagram of a method 600 for completing an image task according to one embodiment of the present disclosure.
In operation S601, a model Mz (i.e., the selected model Mz[k_s]) is determined using the method 200 for multi-objective optimization of models that accomplish image tasks of the present disclosure. The determined model Mz has a selected model structure and model parameters. The determined model meets the predetermined requirements, which include at least the requirements regarding the speed index and the model loss function as described above.
In operation S603, an image task is completed using the determined model Mz with respect to the test image.
One aspect of the present disclosure relates to an apparatus for performing an image task. The apparatus is described below by way of example with reference to fig. 7.
FIG. 7 illustrates an exemplary block diagram of an apparatus 700 for completing an image task according to one embodiment of the present disclosure.
The apparatus 700 includes a model determination unit 701 and an image task completion unit 703. The model determination unit 701 is used to determine the model Mz (i.e., the selected model Mz[k_s]) using the method 200 for multi-objective optimization of models completing an image task of the present disclosure. The determined model Mz has a selected model structure and model parameters. The determined model meets the predetermined requirements, which include at least the requirements regarding the speed index and the model loss function as described above. The image task completion unit 703 is configured to complete an image task using the determined model Mz on a test image.
One aspect of the present disclosure provides a computer-readable storage medium having a program stored thereon. The program is for multi-objective optimization of a model for performing an image task, and the program is such that when the program is executed by a processor it effects: determining a model loss function of the model; determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and solving the multi-objective optimization function to determine a selected model that meets predetermined requirements; wherein the model comprises a plurality of sub-models for completing the image task; the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models; the loss weight for the weighted sum of each sub-model loss function is determined in an iterative update manner based on a training sample set; and the predetermined requirements include: the speed index of the selected model is not less than a preset speed threshold value; and the value of the model loss function at convergence for the selected model is not greater than the predetermined loss threshold.
According to one aspect of the disclosure, an apparatus for multi-objective optimization of a model for performing an image task is also provided. The apparatus includes a circuit. The circuit is configured to: determining a model loss function of the model; determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and solving the multi-objective optimization function to determine a selected model that meets predetermined requirements; wherein the model comprises a plurality of sub-models for completing the image task; the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models; the loss weight for the weighted sum of each sub-model loss function is determined in an iterative update manner based on a training sample set; and the predetermined requirements include: the speed index of the selected model is not less than a preset speed threshold value; and the value of the model loss function at convergence for the selected model is not greater than the predetermined loss threshold.
According to an aspect of the present disclosure, there is also provided an information processing apparatus for completing an image task. Fig. 8 is an exemplary block diagram of an information processing apparatus 800 according to one embodiment of the present disclosure. In fig. 8, a Central Processing Unit (CPU)801 performs various processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 to a Random Access Memory (RAM) 803. The RAM 803 also stores data and the like necessary when the CPU 801 executes various processes, as necessary.
The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input portion 806 including a soft keyboard and the like; an output portion 807 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage portion 808 such as a hard disk; and a communication section 809 including a network interface card such as a LAN card, a modem, and the like. The communication section 809 performs communication processing via a network such as the internet, a local area network, a mobile network, or a combination thereof.
A drive 810 is also connected to the input/output interface 805 as necessary. A removable medium 811 such as a semiconductor memory or the like is mounted on the drive 810 as needed, so that the program read therefrom is mounted on the storage portion 808 as needed.
The CPU 801 may run a program for implementing multi-objective optimization of models for performing image tasks according to the present disclosure, or a program for performing image tasks.
According to the scheme of the present disclosure, for a model that includes multiple sub-models and completes an image task, a model meeting multiple objectives can be obtained, and a model with good comprehensive performance can be screened out. This is beneficial to improving the user experience and the efficiency of completing the image task; the most suitable model can even be selected according to the hardware (such as the memory size). By using the value range adjustment coefficients and the intermediate loss weights, the contribution proportion of each sub-model to the value of the model loss function is reasonably configured, which is favorable for obtaining the optimal model.
While the invention has been described in terms of specific embodiments thereof, it will be appreciated that those skilled in the art will be able to devise various modifications (including combinations and substitutions of features between the embodiments, where appropriate), improvements and equivalents of the invention within the spirit and scope of the appended claims. Such modifications, improvements and equivalents are also intended to be included within the scope of the present invention.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
Furthermore, the methods of the embodiments of the present invention are not limited to being performed in the time sequence described in the specification or shown in the drawings, and may be performed in other time sequences, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
Supplementary note
1. An apparatus for multi-objective optimization of a model for performing an image task, the apparatus comprising:
a memory storing instructions; and
a processor configured to fetch the instructions from the memory and execute the instructions to:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models;
the loss weight for the weighted sum for each sub-model loss function is determined in an iteratively updated manner based on a training sample set; and is
The predetermined requirements include:
the speed index of the selected model is not less than a preset speed threshold; and
the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.
2. The apparatus of supplementary note 1, wherein each of the plurality of submodels is a deep neural network model containing convolution operations.
3. The apparatus according to supplementary note 2, wherein the speed index is associated with a delay time of the model.
4. The apparatus according to supplementary note 2, wherein the velocity index is an inverse number of a sum of magnitudes of output tensors of all convolution operations of the model.
5. The apparatus of supplementary note 1, wherein the plurality of sub-models includes at least two of a classification model, a localization model, and a segmentation model.
6. The apparatus according to supplementary note 1, wherein the predetermined requirements further include: in the case where the model is a specified model adopting the selected model structure, the memory footprint of the specified model is not greater than a predetermined memory footprint threshold.
7. The apparatus of supplementary note 1, wherein determining the loss weight for each sub-model loss function comprises estimating a convergence range for each sub-model loss function based on the training sample set; and is
The convergence range of the loss function of each submodel is the variation range of the loss determined in the convergence process of the loss function of the corresponding submodel during the training of the corresponding submodel based on the training sample set.
8. The apparatus of supplementary note 7, wherein determining the loss weight for each sub-model loss function comprises determining a corresponding value range adjustment coefficient for each sub-model loss function based on a convergence value range for each sub-model loss function;
each respective value range adjustment coefficient is set such that the value range of each respective value range adjustment coefficient overlaps with the value range of the product of each respective sub-model loss function; and is
The loss weight of each sub-model is the product of the corresponding value range adjustment coefficient and the corresponding intermediate loss weight.
9. The apparatus of supplementary note 8, wherein determining the loss weight for each submodel loss function comprises:
training the model based on a model loss function and a specified model structure of the model;
determining whether the loss given by the model loss function is equal to or less than a predetermined threshold value under the condition that the model loss function converges after the model is trained; and
in the case where the loss given by the model loss function is greater than the predetermined threshold, adjusting the intermediate loss weight of at least one sub-model of the plurality of sub-models according to the products of the corresponding intermediate loss weights, the corresponding value range adjustment coefficients, and the corresponding sub-model loss functions.
10. The apparatus of supplementary note 9, wherein adjusting the intermediate loss weight of at least one of the plurality of submodels comprises:
reducing the intermediate loss weight of the sub-model corresponding to the smaller of the products of the corresponding value range adjustment coefficients and the corresponding sub-model loss functions.
11. The apparatus of supplementary note 9, wherein adjusting the intermediate loss weight of at least one of the plurality of submodels comprises:
increasing the intermediate loss weight of the sub-model corresponding to the larger of the products of the corresponding value range adjustment coefficients and the corresponding sub-model loss functions.
12. A method for multi-objective optimization of a model for performing an image task, the method comprising:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models;
the loss weight for the weighted sum for each sub-model loss function is determined in an iteratively updated manner based on a training sample set; and is
The predetermined requirements include:
the speed index of the selected model is not less than a preset speed threshold; and
the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.
13. The method of supplementary note 12, wherein each of the plurality of submodels is a deep neural network model that includes convolution operations.
14. The method of supplementary note 12, wherein the velocity indicator is an inverse of a sum of magnitudes of output tensors of all convolution operations of the model.
15. The method of supplementary note 12, wherein determining the loss weight for each sub-model loss function comprises estimating a convergence range for each sub-model loss function based on the set of training samples; and is
The convergence range of the loss function of each submodel is the variation range of the loss determined in the convergence process of the loss function of the corresponding submodel during the training of the corresponding submodel based on the training sample set.
16. The method of supplementary note 15, wherein determining the loss weight for each sub-model loss function comprises determining a corresponding value range adjustment coefficient for each sub-model loss function based on a convergence value range for each sub-model loss function;
each respective value range adjustment coefficient is set such that the value range of each respective value range adjustment coefficient overlaps with the value range of the product of each respective sub-model loss function; and is
The loss weight of each sub-model is the product of the corresponding value range adjustment coefficient and the corresponding intermediate loss weight.
17. The method of supplementary note 16, wherein determining the loss weight for each submodel loss function comprises:
training the model based on a model loss function and a specified model structure of the model;
determining whether the loss given by the model loss function is equal to or less than a predetermined threshold value under the condition that the model loss function converges after the model is trained; and
in the case where the loss given by the model loss function is greater than the predetermined threshold, adjusting the intermediate loss weight of at least one sub-model of the plurality of sub-models according to the products of the corresponding intermediate loss weights, the corresponding value range adjustment coefficients, and the corresponding sub-model loss functions.
18. The method of supplementary note 17, wherein adjusting the intermediate loss weight of at least one of the plurality of submodels comprises:
reducing the intermediate loss weight of the sub-model corresponding to the smaller of the products of the corresponding value range adjustment coefficients and the corresponding sub-model loss functions.
19. The method of supplementary note 17, wherein adjusting the intermediate loss weight of at least one of the plurality of submodels comprises:
increasing the intermediate loss weight of the sub-model corresponding to the larger of the products of the corresponding value range adjustment coefficients and the corresponding sub-model loss functions.
20. A computer-readable storage medium on which a program is stored, the program being for multi-objective optimization of a model for performing an image task, and the program being such that when executed by a processor it carries out:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models;
the loss weight for the weighted sum for each sub-model loss function is determined in an iteratively updated manner based on a training sample set; and is
The predetermined requirements include:
the speed index of the selected model is not less than a preset speed threshold; and
the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.

Claims (10)

1. An apparatus for multi-objective optimization of a model for performing an image task, the apparatus comprising:
a memory storing instructions; and
a processor configured to fetch the instructions from the memory and execute the instructions to:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of sub-model loss functions of each of the plurality of sub-models;
the loss weight for the weighted sum for each sub-model loss function is determined in an iteratively updated manner based on a training sample set; and is
The predetermined requirements include:
the velocity indicator of the selected model is not less than a predetermined velocity threshold, an
The value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.
2. The apparatus of claim 1, wherein each of the plurality of submodels is a deep neural network model that includes convolution operations.
3. The apparatus of claim 2, wherein the velocity indicator is proportional to an inverse of a sum of magnitudes of output tensors of all convolution operations of the model.
4. The apparatus of claim 1, wherein the plurality of sub-models comprises at least two of a classification model, a localization model, and a segmentation model.
5. The apparatus of claim 1, wherein the predetermined requirements further comprise: in the case where the model is a selected model employing the selected model structure, the memory footprint of the selected model is not greater than a predetermined memory footprint threshold.
6. The apparatus of claim 1, wherein determining the loss weight for each sub-model loss function comprises estimating a convergence range for each sub-model loss function based on the set of training samples; and is
The convergence range of the loss function of each submodel is the variation range of the loss determined in the convergence process of the loss function of the corresponding submodel during the training of the corresponding submodel based on the training sample set.
7. The apparatus of claim 6, wherein determining the loss weight for each sub-model loss function comprises determining a respective range adjustment coefficient for each sub-model loss function based on a convergence range for each sub-model loss function;
each respective value range adjustment coefficient is set such that the value range of each respective value range adjustment coefficient overlaps with the value range of the product of each respective sub-model loss function; and is
The loss weight of each sub-model is the product of the corresponding value range adjustment coefficient and the corresponding intermediate loss weight.
8. The apparatus of claim 7, wherein determining a loss weight for each submodel loss function comprises:
training the model based on a model loss function and a specified model structure of the model;
determining whether the loss given by the model loss function is equal to or less than a predetermined threshold value under the condition that the model loss function converges after the model is trained; and
in the case where the loss given by the model loss function is greater than the predetermined threshold, adjusting the intermediate loss weight of at least one sub-model of the plurality of sub-models according to the products of the corresponding intermediate loss weights, the corresponding value range adjustment coefficients, and the corresponding sub-model loss functions.
9. A method for multi-objective optimization of a model for performing an image task, the method comprising:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of the sub-model loss functions of the plurality of sub-models;
the loss weight of each sub-model loss function in the weighted sum is determined in an iteratively updated manner based on a training sample set; and
the predetermined requirements include:
the speed index of the selected model is not less than a predetermined speed threshold, and
the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.
10. A computer-readable storage medium storing a program for multi-objective optimization of a model for performing an image task, the program, when executed by a processor, carrying out:
determining a model loss function for the model;
determining a multi-objective optimization function of the model based on the model loss function and the speed index of the model; and
solving the multi-objective optimization function to determine a selected model that meets predetermined requirements;
wherein the model comprises a plurality of sub-models for completing the image task;
the model loss function is a weighted sum of the sub-model loss functions of the plurality of sub-models;
the loss weight of each sub-model loss function in the weighted sum is determined in an iteratively updated manner based on a training sample set; and
the predetermined requirements include:
the speed index of the selected model is not less than a predetermined speed threshold, and
the value of the model loss function at convergence for the selected model is not greater than a predetermined loss threshold.
CN202010565659.7A 2020-06-19 2020-06-19 Apparatus, method and storage medium for multi-objective optimization Pending CN113822281A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010565659.7A CN113822281A (en) 2020-06-19 2020-06-19 Apparatus, method and storage medium for multi-objective optimization
JP2021097645A JP2022002087A (en) 2020-06-19 2021-06-10 Apparatus, method, and storage media for multi-target optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010565659.7A CN113822281A (en) 2020-06-19 2020-06-19 Apparatus, method and storage medium for multi-objective optimization

Publications (1)

Publication Number Publication Date
CN113822281A true CN113822281A (en) 2021-12-21

Family

ID=78924308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010565659.7A Pending CN113822281A (en) 2020-06-19 2020-06-19 Apparatus, method and storage medium for multi-objective optimization

Country Status (2)

Country Link
JP (1) JP2022002087A (en)
CN (1) CN113822281A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197561A (en) * 2017-12-29 2018-06-22 北京智慧眼科技股份有限公司 Human face recognition model optimal control method, device, equipment and storage medium
CN109657793A (en) * 2018-12-26 2019-04-19 广州小狗机器人技术有限公司 Model training method and device, storage medium and electronic equipment
CN109784391A (en) * 2019-01-04 2019-05-21 杭州比智科技有限公司 Sample mask method and device based on multi-model
CN110135521A (en) * 2019-05-28 2019-08-16 陕西何止网络科技有限公司 Pole-piece pole-ear defects detection model, detection method and system based on convolutional neural networks
US20190294731A1 (en) * 2018-03-26 2019-09-26 Microsoft Technology Licensing, Llc Search query dispatcher using machine learning
CN110533172A (en) * 2019-08-30 2019-12-03 联想(北京)有限公司 Calculation method, computing device, electronic equipment and computer readable storage medium
CN110852439A (en) * 2019-11-20 2020-02-28 字节跳动有限公司 Neural network model compression and acceleration method, data processing method and device
CN110929869A (en) * 2019-12-05 2020-03-27 同盾控股有限公司 Attention model training method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HANXIAO LIU ET AL.: "DARTS: DIFFERENTIABLE ARCHITECTURE SEARCH", arXiv, 23 April 2019 (2019-04-23), pages 1-13, XP055824757, DOI: 10.1117/1.JEI.30.1.013012 *
HIEU PHAM ET AL.: "Efficient Neural Architecture Search via Parameter Sharing", arXiv, 12 February 2018 (2018-02-12), pages 1-11 *

Also Published As

Publication number Publication date
JP2022002087A (en) 2022-01-06

Similar Documents

Publication Publication Date Title
US11423311B2 (en) Automatic tuning of artificial neural networks
US10891109B2 (en) Arithmetic processor, arithmetic processing apparatus including arithmetic processor, information processing apparatus including arithmetic processing apparatus, and control method for arithmetic processing apparatus
CN110969251B (en) Neural network model quantification method and device based on label-free data
KR20160143505A (en) METHOD AND SYSTEM FOR reducing computations in a neural network
US11354579B2 (en) Dynamic multi-layer execution for artificial intelligence modeling
US20210019151A1 (en) Executing large artificial intelligence models on memory-constrained devices
CN111008693A (en) Network model construction method, system and medium based on data compression
US11450096B2 (en) Systems and methods for progressive learning for machine-learned models to optimize training speed
US20240185025A1 (en) Flexible Parameter Sharing for Multi-Task Learning
CN114078195A (en) Training method of classification model, search method and device of hyper-parameters
CN112149809A (en) Model hyper-parameter determination method and device, calculation device and medium
CN111325222A (en) Image normalization processing method and device and storage medium
CN114781650B (en) Data processing method, device, equipment and storage medium
CN114580281A (en) Model quantization method, apparatus, device, storage medium, and program product
Lee et al. DNN compression by ADMM-based joint pruning
US11914672B2 (en) Method of neural architecture search using continuous action reinforcement learning
JP2022165395A (en) Method for optimizing neural network model and method for providing graphical user interface for neural network model
ElAraby et al. OAMIP: optimizing ANN architectures using mixed-integer programming
CN114462591A (en) Inference method for dynamic quantitative neural network
CN117034090A (en) Model parameter adjustment and model application methods, devices, equipment and media
CN113822281A (en) Apparatus, method and storage medium for multi-objective optimization
CN114830137A (en) Method and system for generating a predictive model
WO2023154583A1 (en) Method and apparatus for machine learning-based radio frequency (rf) front-end calibration
JP2022088341A (en) Apparatus learning device and method
US11410036B2 (en) Arithmetic processing apparatus, control method, and non-transitory computer-readable recording medium having stored therein control program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination