CN113159275A - Network training method, image processing method, device, equipment and storage medium

Info

Publication number: CN113159275A
Application number: CN202110245295.9A
Authority: CN
Inventors: 刘李洋; 李艺; 旷章辉; 陈益民; 张伟
Original assignee: Shenzhen Sensetime Technology Co Ltd
Current assignee: Shenzhen Sensetime Technology Co Ltd
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2021-07-23

Abstract

The present disclosure relates to a network training method, an image processing method, an apparatus, a device, and a storage medium. The network training method includes: inputting a sample image into an image processing network in an (i-1)-th state to obtain a plurality of first processing results of the sample image, where i is a positive integer; determining, according to the labeling information of the sample image and the plurality of first processing results, the network loss of each branch network and the gradient information of each network loss with respect to the backbone network; determining a target weighting coefficient of each network loss according to the gradient information; obtaining the total loss of the image processing network according to the network losses and the target weighting coefficients of the network losses; and training the image processing network in the (i-1)-th state according to the total loss to obtain the image processing network in the i-th state. Embodiments of the disclosure can achieve balanced learning among different image processing tasks and improve network learning performance and efficiency.

Description

Network training method, image processing method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a network training method, an image processing method, an apparatus, a device, and a storage medium.

Background

Multi-task learning (MTL) exploits the correlation among tasks by sharing features at different levels across the tasks, with the aim of reducing network parameters, speeding up testing, and improving task performance.

In multi-task learning, because the loss functions of different tasks differ, learning may become unbalanced among the tasks during training: some tasks achieve a good learning effect, while other tasks are sacrificed.

Disclosure of Invention

The present disclosure provides a technical solution for network training and image processing.

According to an aspect of the present disclosure, there is provided a network training method, including: inputting a sample image into an image processing network in an i-1 th state to obtain a plurality of first processing results of the sample image, wherein the image processing network comprises a main network and a plurality of branch networks, the main network is used for extracting image characteristics, the branch networks are used for outputting the processing results of each image processing task based on the image characteristics, and i is a positive integer; respectively determining the network loss of each branch network and the gradient information of each network loss to the main network according to the labeling information of the sample image and the plurality of first processing results; determining a target weighting coefficient of each network loss according to the gradient information; obtaining the total loss of the image processing network according to the network losses and the target weighting coefficients of the network losses; and training the image processing network in the i-1 th state according to the total loss to obtain the image processing network in the i-th state.

In a possible implementation manner, the determining a target weighting coefficient of each network loss according to the gradient information includes: determining overall gradient information according to the gradient information and the weighting parameters of the gradient information; and under the condition that the gradient components of the overall gradient information in the directions of the gradient information are the same, determining the parameter values corresponding to the weighting parameters of the gradient information as target weighting coefficients of the network losses.

In one possible implementation, the sum of the target weighting coefficients of the respective network losses is 1.

In one possible implementation, training the image processing network in the i-1 th state according to the total loss includes: updating the network parameters of the backbone network in the i-1 th state according to the total loss; and/or correspondingly updating the network parameters of each branch network in the i-1 th state according to the network losses.

In a possible implementation manner, the correspondingly updating the network parameters of the respective branch networks in the (i-1)-th state according to the respective network losses includes: determining the scaling loss corresponding to each network loss according to each network loss and the scaling parameter of each network loss in the (i-1)-th state; and correspondingly updating the network parameters of each branch network in the (i-1)-th state and the scaling parameters in the (i-1)-th state according to each scaling loss, to obtain each branch network in the i-th state and the scaling parameters in the i-th state.

In a possible implementation manner, the obtaining the total loss of the image processing network according to the respective network losses and the target weighting coefficients of the respective network losses includes: performing a weighted summation of the plurality of network losses according to the target weighting coefficient of each network loss to obtain the total loss.

In a possible implementation manner, the separately determining gradient information of each network loss with respect to the backbone network includes: separately determining the gradient information of each network loss with respect to the image features extracted by the backbone network.

In one possible implementation, the method further includes: determining the image processing network in the i-th state as a trained image processing network in a case where the image processing network in the i-th state meets a training condition.

In one possible implementation, the image processing task includes at least two of image recognition, image classification, image segmentation, and keypoint detection.

According to an aspect of the present disclosure, there is provided an image processing method including: inputting an image to be processed into an image processing network to obtain a plurality of second processing results of the image, wherein the image processing network is obtained by training according to the network training method described above.

According to an aspect of the present disclosure, there is provided a network training apparatus, including:

the processing module 101 is configured to input a sample image into an image processing network in an i-1 th state to obtain a plurality of first processing results of the sample image, where the image processing network includes a main network and a plurality of branch networks, the main network is configured to extract image features, the branch networks are configured to output processing results of each image processing task based on the image features, and i is a positive integer;

a gradient information determining module 102, configured to determine, according to the labeling information of the sample image and the plurality of first processing results, network loss of each branch network and gradient information of each network loss to the main network respectively;

a weighting coefficient determining module 103, configured to determine a target weighting coefficient of each network loss according to the gradient information;

a total loss determining module 104, configured to obtain a total loss of the image processing network according to the network losses and the target weighting coefficients of the network losses;

and the training module 105 is configured to train the image processing network in the i-1 th state according to the total loss to obtain the image processing network in the i-th state.

In a possible implementation manner, the weighting factor determining module 103 includes: the overall gradient information determining submodule is used for determining overall gradient information according to the gradient information and the weighting parameters of the gradient information; and the weighting coefficient determining submodule is used for determining the parameter value corresponding to the weighting parameter of each piece of gradient information as the target weighting coefficient of each network loss under the condition that the gradient components of the overall gradient information in the direction of each piece of gradient information are the same.

In one possible implementation, the apparatus further includes: and the determining module is used for determining the image processing network in the ith state as the trained image processing network under the condition that the image processing network in the ith state meets the training condition.

According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

In embodiments of the disclosure, the target weighting coefficient of each network loss can be determined, in each training iteration, according to the gradient information of each network loss with respect to the backbone network. When the image processing network is then trained according to the network losses of the plurality of branch networks and the corresponding target weighting coefficients, its parameters are updated according to the weighted losses, which achieves balanced learning among different image processing tasks and improves network learning performance and efficiency.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a flow diagram of a network training method according to an embodiment of the present disclosure.

Fig. 2 shows a schematic diagram of gradient components in accordance with an embodiment of the present disclosure.

Fig. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure.

Fig. 4 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.

Fig. 5 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.

It should be understood that the terms "first," "second," and "third," etc. in the claims, description, and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Fig. 1 shows a flowchart of a network training method according to an embodiment of the present disclosure. As shown in Fig. 1, the network training method includes:

in step S11, the sample images in the training set are input into an image processing network in the i-1 th state, so as to obtain a plurality of first processing results of the sample images, where the image processing network includes a main network and a plurality of branch networks, the main network is used to extract image features, the branch networks are used to output processing results of each image processing task based on the image features, and i is a positive integer;

in step S12, determining the network loss of each branch network and the gradient information of each network loss with respect to the main network, respectively, according to the labeling information of the sample image and the plurality of first processing results;

in step S13, determining a target weighting coefficient for each network loss based on the gradient information;

in step S14, obtaining the total loss of the image processing network according to each network loss and the target weighting coefficient of each network loss;

in step S15, the image processing network in the i-1 th state is trained according to the total loss, and the image processing network in the i-th state is obtained.
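As an illustrative sketch only, steps S11 to S15 can be traced on a toy network with a linear backbone and two linear branch networks. The function name `train_step`, the squared-error losses, the plain gradient-descent update, and the use of gradients taken with respect to the shared features are all assumptions made for the example and are not prescribed by the disclosure. For two tasks, requiring equal gradient components together with coefficients that sum to 1 reduces to weights inversely proportional to the gradient norms, which is what the sketch uses in S13.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_step(W, heads, x, targets, lr=0.1):
    """One illustrative iteration for a toy network with two branch networks.

    W       -- backbone weights (rows of a matrix); shared features are z = W x
    heads   -- one weight vector per branch; branch n outputs heads[n] . z
    targets -- one scalar label per branch (stand-in for labeling information)
    """
    # S11: forward pass through the backbone and both branches.
    z = [dot(row, x) for row in W]
    outputs = [dot(a, z) for a in heads]

    # S12: squared-error loss per branch and its gradient w.r.t. the features z.
    errors = [y - t for y, t in zip(outputs, targets)]
    losses = [0.5 * e * e for e in errors]
    grads = [[e * a_k for a_k in a] for e, a in zip(errors, heads)]

    # S13: two-task special case of "equal gradient components, weights sum to 1":
    # alpha_1 * ||g_1|| = alpha_2 * ||g_2|| and alpha_1 + alpha_2 = 1.
    n1, n2 = (math.sqrt(dot(g, g)) for g in grads)
    if n1 + n2 == 0.0:
        alphas = [0.5, 0.5]  # both gradients vanish; any weights would do
    else:
        alphas = [n2 / (n1 + n2), n1 / (n1 + n2)]

    # S14: total loss as the weighted sum of the branch losses.
    total = sum(al * loss for al, loss in zip(alphas, losses))

    # S15: the backbone is updated with the gradient of the total loss (weights
    # treated as constants); each branch is updated with its own loss.
    g = [sum(al * gn[k] for al, gn in zip(alphas, grads)) for k in range(len(z))]
    new_W = [[w - lr * g[k] * x_j for w, x_j in zip(row, x)]
             for k, row in enumerate(W)]
    new_heads = [[a_k - lr * e * z_k for a_k, z_k in zip(a, z)]
                 for e, a in zip(errors, heads)]
    return new_W, new_heads, losses, alphas, total
```

Calling `train_step` repeatedly on its own outputs corresponds to moving from the (i-1)-th state to the i-th state; a real implementation would rely on an automatic differentiation framework rather than hand-written gradients.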

In one possible implementation, the network training method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer readable instruction stored in a memory. Alternatively, the method may be performed by a server.

In one possible implementation, in step S11, a training set for training the image processing network may include a number of sample images. A sample image may be an annotated image, where the annotation information of the sample image is obtained by annotating the sample image for the different image processing tasks. For example, the annotation information of a vehicle image may include: license plate color: yellow; license plate type: coach car; license plate number: YGS 980; and so on. The labeling may be done manually, for example, and the embodiment of the present disclosure is not limited thereto.

In a possible implementation manner, the image processing network in the i-1 th state may refer to the image processing network after the i-1 th parameter update (or i-1 st iterative training). The image processing network in the 0 th state (i.e. the image processing network when i is 1) may refer to an initial image processing network or an untrained image processing network.

In a possible implementation manner, the image processing network may include a main network and a plurality of branch networks, so that multi-task learning may be implemented, and different branch networks may correspond to different image processing tasks. The image processing task comprises at least two of image recognition, image classification, image segmentation and key point detection.

For example, the same license plate image may be subjected to image processing tasks such as a license plate color recognition task, a license plate type classification task, and a license plate number recognition task. It should be understood that the plurality of first processing results of the sample image may refer to results output by each branch network of the image processing network, such as license plate color, license plate type, license plate number, and the like.

In a possible implementation manner, the backbone network may adopt a network structure such as a residual network, or a residual network combined with a feature pyramid network, so as to extract image features of the input image, that is, shared features. It should be understood that the image features extracted by the backbone network include the feature information required by each image processing task. The embodiments of the present disclosure do not limit the network structure of the backbone network.

In a possible implementation manner, each branch network may adopt a network structure including, for example, at least one pooling layer, a fully connected layer, and a softmax layer. The network structure of the branch network may be set according to the actual image processing task, which is not limited in this embodiment of the present disclosure. It should be understood that the loss function, or the annotation information, corresponding to each branch network may be different, so that different image processing tasks can be implemented based on the shared features. The number of branch networks may be set according to the actual image processing task requirements, and the embodiment of the present disclosure is not limited thereto.
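The structure described above, one backbone whose features are shared by several task-specific branches, can be sketched as a small container type. This is a minimal sketch under stated assumptions: the names `MultiTaskNetwork`, `mean_pool_backbone`, and `make_classifier`, the task names, and the toy weights are all hypothetical, and the stand-in backbone merely averages pixel values rather than being a residual network.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]

@dataclass
class MultiTaskNetwork:
    """A shared backbone plus one branch network per image processing task."""
    backbone: Callable[[List[float]], Features]
    branches: Dict[str, Callable[[Features], List[float]]]

    def forward(self, image: List[float]) -> Dict[str, List[float]]:
        features = self.backbone(image)  # shared features, computed once
        return {task: branch(features) for task, branch in self.branches.items()}

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_classifier(weights: List[List[float]]) -> Callable[[Features], List[float]]:
    """Branch made of a fully connected layer followed by a softmax layer."""
    def branch(features: Features) -> List[float]:
        scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
        return softmax(scores)
    return branch

def mean_pool_backbone(image: List[float]) -> Features:
    """Hypothetical stand-in backbone: average each half of the pixel list."""
    half = len(image) // 2
    return [sum(image[:half]) / half, sum(image[half:]) / (len(image) - half)]

net = MultiTaskNetwork(
    backbone=mean_pool_backbone,
    branches={
        "plate_color": make_classifier([[1.0, 0.0], [0.0, 1.0]]),
        "plate_type": make_classifier([[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]]),
    },
)
results = net.forward([0.2, 0.4, 0.6, 0.8])  # one probability vector per task
```

Each entry of `results` plays the role of one first processing result; adding a task only requires adding a branch, while the backbone stays shared.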

In a possible implementation manner, in step S12, the network losses of the plurality of branch networks may be obtained from the labeling information of the sample image and the plurality of first processing results, using the loss function corresponding to each image processing task. The gradient information of each network loss with respect to the backbone network can be understood as the partial derivatives of that network loss. The gradient information may be calculated in a manner known in the art (e.g., back propagation), and the embodiment of the present disclosure is not limited thereto.

It will be appreciated that the gradient is the vector of partial derivatives of a function; its direction is the direction in which the function increases most quickly at a given point, and the opposite direction is the one in which the function decreases most quickly. For each image processing task, the gradient of the corresponding loss function gives the optimal update direction of the shared network parameters for that task. Because there is only one set of shared network parameters, the gradients in the different directions can be weighted and summed to obtain a single target update direction.

In one possible implementation manner, in step S13, the target weighting coefficient of each network loss may be determined according to the gradient information corresponding to each network loss. To this end, a weighting parameter may be introduced for each piece of gradient information, giving overall gradient information in the form of a weighted sum. The sum of the weighting parameters is then constrained to be 1, and the gradient components (projections) of the overall gradient information in the directions of the respective pieces of gradient information are constrained to be equal. Solving these constraints yields the specific parameter values of the weighting parameters, that is, the weighting coefficients of the network losses for the current training iteration.

In the embodiment of the present disclosure, constraining the gradient components of the overall gradient information in the directions of the respective pieces of gradient information to be equal means that, when the network parameters of the backbone network are updated, the update step sizes in those directions are equal. In other words, the image processing tasks learn at the same speed, so that balanced learning among different image processing tasks is achieved. The step size can be understood as a learning rate.

In one possible implementation manner, in step S14, obtaining the total loss of the image processing network according to the network losses and the target weighting coefficients of the network losses may include: performing a weighted summation of the plurality of network losses according to the target weighting coefficient of each network loss to obtain the total loss of the image processing network; or performing a weighted average of the plurality of network losses according to the target weighting coefficient of each network loss to obtain the total loss of the image processing network. The manner of determining the total loss can be set according to actual requirements, and the embodiment of the disclosure is not limited thereto.
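The two options in step S14 differ only in whether the weighted sum is divided by the sum of the coefficients. A minimal sketch, with the function name `total_loss` and the `average` flag chosen here purely for illustration:

```python
def total_loss(losses, alphas, average=False):
    """Combine per-branch losses using their target weighting coefficients.

    average=False -> weighted summation:  sum_n alpha_n * L_n
    average=True  -> weighted average:    sum_n alpha_n * L_n / sum_n alpha_n
    """
    if len(losses) != len(alphas):
        raise ValueError("need exactly one weighting coefficient per network loss")
    weighted = sum(a * loss for a, loss in zip(alphas, losses))
    return weighted / sum(alphas) if average else weighted
```

When the coefficients sum to 1 the two variants coincide; they differ only if the coefficients are constrained to some other sum.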

In a possible implementation manner, in step S15, training the image processing network in the (i-1)-th state according to the total loss to obtain the image processing network in the i-th state may be done by updating the network parameters of the backbone network and of each branch network in the (i-1)-th state according to the total loss, using back propagation, gradient descent, or the like. Alternatively, the network parameters of the backbone network in the (i-1)-th state may be updated according to the total loss, and the network parameters of each branch network in the (i-1)-th state may be correspondingly updated according to each network loss, to obtain the image processing network in the i-th state. The specific updating method can be set according to actual requirements, and the embodiment of the present disclosure is not limited thereto. In this way, balanced training of the image processing network can be achieved based on the weighted total loss, and the network parameters of the image processing network can be updated with a single back propagation pass, so that the network training time can be greatly reduced and the network learning performance and efficiency are improved.

It should be understood that the network is usually trained for more than one iteration. After the image processing network in the i-th state is obtained, it may be determined whether the image processing network in the i-th state meets a preset training condition (e.g., the total loss has converged or a preset number of iterations has been reached). If the training condition is met, the trained image processing network is obtained; if the training condition is not met, the (i+1)-th training iteration is performed according to the network training method, until an image processing network meeting the training condition is obtained.
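The iteration from the (i-1)-th state to the i-th state, together with the stopping check, can be sketched as a generic loop. The helper name `train_until`, the tolerance-based convergence test, and the toy step function are assumptions for illustration; the disclosure only requires some preset training condition.

```python
def train_until(step_fn, state, max_iters=1000, tol=1e-6):
    """Repeat training iterations until a training condition is met.

    step_fn   -- maps the (i-1)-th state to (i-th state, total loss)
    max_iters -- preset number of iterations
    tol       -- total loss is treated as converged when it changes by less than tol
    Returns the final state and the number of iterations performed.
    """
    previous_total = None
    for i in range(1, max_iters + 1):
        state, total = step_fn(state)
        if previous_total is not None and abs(previous_total - total) < tol:
            return state, i  # convergence condition met
        previous_total = total
    return state, max_iters  # iteration budget exhausted

def toy_step(x):
    """Hypothetical stand-in for one training iteration: halve a scalar parameter."""
    new_x = 0.5 * x
    return new_x, new_x ** 2  # (new state, total loss)

final_state, iterations = train_until(toy_step, 1.0)
```

In practice `step_fn` would wrap one full pass of steps S11 to S15 over a batch of sample images.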

In a possible implementation manner, in step S13, the determining the target weighting factor of each network loss according to the gradient information includes:

determining overall gradient information according to the gradient information and the weighting parameters of the gradient information;

and under the condition that the gradient components of the overall gradient information in the direction of each gradient information are the same, determining the parameter value corresponding to the weighting parameter of each gradient information as the target weighting coefficient of each network loss.

In one possible implementation, the overall gradient information may be expressed as

$$g = \sum_{n=1}^{N} \alpha_n g_n$$

and the gradient information of each network loss with respect to the backbone network may be expressed as

$$g_n = \nabla L_n$$

where n denotes the n-th image processing task (i.e., the n-th branch network), $L_n$ denotes the network loss of the n-th image processing task, $g_n$ denotes the gradient information of the n-th network loss, $\alpha_n$ denotes the weighting parameter of the n-th gradient information, $\nabla$ denotes taking the partial derivatives of $L_n$, and N denotes the total number of image processing tasks.

In one possible implementation, as described above, the gradient component of the overall gradient information in the direction of each piece of gradient information may be understood as the projection of the overall gradient information onto that direction. Fig. 2 shows a schematic diagram of gradient components according to an embodiment of the present disclosure. As shown in Fig. 2, $g_1$, $g_2$ and $g_3$ denote the gradient information of three network losses, $g$ denotes the overall gradient information of the three pieces of gradient information, and $a_1$, $a_2$ and $a_3$ denote the gradient components (projections) of the overall gradient information in the directions of the three pieces of gradient information, where the direction of each arrow represents the direction of the gradient and the length of each arrow represents the step size of the gradient.

In one possible implementation, the gradient component of the overall gradient information in the direction of each piece of gradient information may be determined according to the unit-norm vector of that gradient information. The norm of the n-th gradient information can be expressed as $\|g_n\|$, the unit-norm vector of the n-th gradient information can be expressed as $u_n = g_n / \|g_n\|$, and the gradient component of the overall gradient information in the direction of each piece of gradient information can be expressed as

$$g\,u_n^{T}$$

where T denotes transpose.
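The projection described above is an inner product between the overall gradient and each unit-norm vector. A minimal sketch in plain Python follows; the function names are chosen for illustration and gradients are represented as flat lists of floats.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit_vector(g):
    """Unit-norm vector u_n = g_n / ||g_n||."""
    norm = math.sqrt(dot(g, g))
    if norm == 0.0:
        raise ValueError("a zero gradient has no direction")
    return [x / norm for x in g]

def gradient_components(grads, alphas):
    """Projection of the overall gradient g = sum_n alpha_n g_n onto each u_n."""
    dim = len(grads[0])
    overall = [sum(a * g[k] for a, g in zip(alphas, grads)) for k in range(dim)]
    return [dot(overall, unit_vector(g)) for g in grads]

grads = [[1.0, 0.0], [0.0, 2.0]]
unbalanced = gradient_components(grads, [0.5, 0.5])            # components differ
balanced = gradient_components(grads, [2.0 / 3.0, 1.0 / 3.0])  # components agree
```

With equal weights the task with the larger gradient norm receives the larger component; the second set of weights equalizes the two components.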

In one possible implementation, the gradient components of the overall gradient information in the directions of the respective pieces of gradient information being the same means that formula 1 holds:

$$g\,u_1^{T} = g\,u_n^{T}, \quad n = 2, \dots, N \qquad (1)$$

By solving formula 1, the parameter values of the weighting parameters of the respective pieces of gradient information, that is, the target weighting coefficients of the respective network losses, can be obtained. Here, $u_1$ may denote the unit-norm vector of the 1st gradient information. It should be understood that $u_1$ may correspond to any one of the plurality of pieces of gradient information, and $u_n$ may correspond to the pieces of gradient information other than that one.

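Considering that exact target weighting coefficients may not be obtainable from formula 1 alone, in one possible implementation the sum of the weighting parameters of the gradient information may be constrained to be 1, i.e., $\sum_n \alpha_n = 1$. Combining this constraint with formula 1 gives the system of equations

$$\begin{cases} g\,u_1^{T} = g\,u_n^{T}, & n = 2, \dots, N \\ \sum_{n=1}^{N} \alpha_n = 1 \end{cases}$$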

The parameter values of the weighting parameters are obtained as follows:

where

$c = [1, \dots, 1]$.

It should be understood that, since the sum of the weighting parameters of the gradient information is constrained to be 1, the resulting target weighting coefficients of the network losses also sum to 1.

It should be noted that constraining the sum of the weighting parameters of the gradient information to be 1 is one possible implementation manner provided by the embodiment of the present disclosure. The disclosure is not limited thereto; those skilled in the art can constrain the sum of the weighting parameters to be 10, 100, or another positive integer according to actual needs in order to obtain specific parameter values of the weighting parameters.
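As a hedged illustration, the equal-component condition and the sum constraint can be written as N linear equations in the N weighting parameters and solved numerically. The sketch below does not reproduce the closed-form expression of the disclosure; it assembles the equations directly and solves them with a small Gaussian elimination. The function names `solve_linear` and `target_weights` and the `coeff_sum` parameter are assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system, e.g. gradients sharing one direction")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def target_weights(grads, coeff_sum=1.0):
    """Weights alpha_n such that g = sum_n alpha_n g_n has equal components along
    every g_n and the weights add up to coeff_sum (gradients must be non-zero)."""
    units = [[x / math.sqrt(dot(g, g)) for x in g] for g in grads]
    n = len(grads)
    A, b = [], []
    for k in range(1, n):
        # g . (u_1 - u_k) = 0, expanded as sum_m alpha_m * g_m . (u_1 - u_k) = 0
        diff = [p - q for p, q in zip(units[0], units[k])]
        A.append([dot(g_m, diff) for g_m in grads])
        b.append(0.0)
    A.append([1.0] * n)  # sum constraint on the weighting parameters
    b.append(coeff_sum)
    return solve_linear(A, b)

alphas = target_weights([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]])
```

For two tasks the solution reduces to weights inversely proportional to the gradient norms, so the task with the smaller gradient receives the larger weight. Passing a different `coeff_sum` rescales all weights by the same factor.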

In the embodiment of the present disclosure, because the determined target weighting coefficients of the respective network losses are determined under the condition that the gradient components of the overall gradient information in the directions of the respective gradient information are the same, when the network parameters of the backbone network are updated according to the overall losses, the update step lengths in the directions of the respective gradient information are equal, and thus, balanced learning between different image processing tasks is achieved.

In one possible implementation, in step S15, training the image processing network in the (i-1)-th state according to the total loss may include: updating the network parameters of the backbone network in the (i-1)-th state according to the total loss; and/or correspondingly updating the network parameters of each branch network in the (i-1)-th state according to each network loss. In this way, balanced learning of the image processing network can be carried out effectively.

In a possible implementation manner, the network parameters of the main network can be updated according to the total loss and the network parameters of each branch network can be updated correspondingly according to each network loss by adopting a mode of back propagation, gradient descent and the like. The network parameter updating method can be set according to actual requirements, and the embodiment of the disclosure is not limited.

In a possible implementation, training the image processing network in the (i-1)-th state according to the total loss may further include: updating the network parameters of the backbone network in the (i-1)-th state and of each branch network in the (i-1)-th state according to the total loss. The embodiment of the present disclosure does not limit which updating mode is adopted.

It should be understood that, the network parameters of the backbone network in the i-1 th state are updated, and the network parameters of the branch networks in the i-1 th state are updated, so that the backbone network in the i-th state and the branch networks in the i-th state are obtained correspondingly, that is, the image processing network in the i-th state is obtained.

It can be known that, the parameter quantity of the main network is larger than the parameter quantity of the branch networks, so according to the embodiment of the present disclosure, the total loss determined based on the target weighting coefficient is more emphasized on the balance learning of the main network, and the balance learning among the branch networks is less emphasized, so that when the image processing network is subjected to the balance training based on the total loss, the training speed can be increased while the balance learning effect of the main network is more improved.

Considering that although the parameter amount of the branch network is smaller than that of the main network, if the gradient balance learning of the main network can be concerned, the loss balance learning among the branch networks can be concerned, so that more balanced network training can be realized, and the network learning performance and the training effect can be further improved.

In a possible implementation, correspondingly updating the network parameters of each branch network in the (i-1)-th state according to each network loss may include: determining the scaling loss corresponding to each network loss according to that network loss and its scaling parameter in the (i-1)-th state; and correspondingly updating, according to each scaling loss, the network parameters of each branch network in the (i-1)-th state and the scaling parameter in the (i-1)-th state, to obtain each branch network in the i-th state and the scaling parameter in the i-th state. In this way, loss-balanced learning among the branch networks can be achieved based on the scaling loss of each branch network, and a better training effect can be obtained when the scaling losses and the total loss are used together to train the image processing network.

In a possible implementation, the scaling parameter in the (i-1)-th state can be understood as a hyper-parameter of each network loss in the (i-1)-th state. It should be understood that this hyper-parameter, like the network parameters, can be trained, that is, learned automatically. The initial value of the scaling parameter may be set to 0 or to any other value. During iterative training, the scaling parameter in the i-th state, obtained by updating the scaling parameter in the (i-1)-th state according to each scaling loss, is used to determine the scaling loss in the next training iteration. The scaling parameter may be updated by gradient descent and/or regularization, which is not limited by the present disclosure.

In one possible implementation, the scaling penalty may be expressed as

where e denotes the natural constant, s_n denotes the scaling parameter of the n-th network loss, and L_n denotes the n-th network loss. In this way, the scaling loss in each training iteration tends toward the same constant (for example, 1), so that loss-balanced learning between different image processing tasks can be achieved.
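The expression itself is not reproduced in the text above. As a hedged illustration only, the sketch below assumes one form that is consistent with the symbols e, s_n and L_n, namely e^(-s_n) * L_n + s_n; this form is an assumption made for the example, not a statement of the disclosed formula. Under this assumed form, gradient descent on s_n drives the scaled term e^(-s_n) * L_n toward 1 regardless of the magnitude of L_n. The function names are hypothetical.

```python
import math


def scaling_loss(loss_n, s_n):
    # ASSUMED form (not taken from the source): e^(-s_n) * L_n + s_n.
    return math.exp(-s_n) * loss_n + s_n


def update_scale(loss_n, s_n, lr=0.1):
    # Derivative of the assumed scaling loss with respect to s_n
    # is 1 - e^(-s_n) * L_n; take one gradient descent step.
    grad = 1.0 - math.exp(-s_n) * loss_n
    return s_n - lr * grad


def fit_scale(loss_n, steps=500, s_init=0.0):
    # Repeatedly update s_n while holding the network loss fixed.
    s = s_init
    for _ in range(steps):
        s = update_scale(loss_n, s)
    return s


# A large and a small network loss end up with the same scaled value.
s_large = fit_scale(5.0)
s_small = fit_scale(0.2)
scaled_large = math.exp(-s_large) * 5.0
scaled_small = math.exp(-s_small) * 0.2
```

In an actual training loop the network loss would change between iterations; it is held fixed here only to make the behaviour of the scaling parameter easy to see.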

It should be understood that the above scaling loss is one specific implementation provided by the embodiments of the present disclosure. The present disclosure is not limited to this implementation: any scaling loss determined based on the scaling parameter falls within the protection scope of the present disclosure.

In a possible implementation, the network parameters of the entire image processing network (the main network and the branch networks) may also be updated by performing back propagation multiple times through the entire image processing network using the multiple scaling losses. Alternatively, the network parameters of each branch network may be updated directly with the corresponding network loss rather than with the scaling loss. The embodiments of the present disclosure do not limit the parameter updating method of the image processing network.

In the embodiment of the disclosure, not only gradient balance learning of the backbone network based on the total loss is realized, but also loss balance learning among the branch networks is realized, so that more balanced network training is integrally realized, and an image processing network with better training effect is obtained.

In a possible implementation, updating the network parameters of the backbone network in the (i-1)-th state according to the total loss may further be based on the scaling parameters and include: performing a weighted summation of the plurality of scaled network losses according to the weighting coefficient corresponding to each scaled network loss, to obtain the total loss of the image processing network; and updating the network parameters of the backbone network in the (i-1)-th state according to the total loss;

wherein the plurality of scaled network losses may be determined from the scaling parameters s_n corresponding to the respective network losses, and may be expressed, for example, as

The weighting coefficients corresponding to the scaled network losses may be determined in the same manner as the target weighting coefficients determined in step S13, that is, the weighting coefficients corresponding to the scaled network losses may be determined according to the gradient information of the respective scaled network losses with respect to the backbone network.

By the method, the scaling parameters and the weighting coefficients can be combined and used to realize gradient balance learning and loss balance learning, so that the training effect of the image processing network and the performance of the image processing network are integrally improved.

In one possible implementation, obtaining the total loss of the image processing network according to the respective network losses and the target weighting coefficients of the respective network losses includes: performing a weighted summation of the multiple network losses according to the target weighting coefficient of each network loss, to obtain the total loss. Each network loss here may be a scaled network loss.
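The weighted summation can be written in a few lines. This is a minimal sketch; the function name `total_loss` and the example values are hypothetical, and the inputs may be either plain or scaled network losses.

```python
def total_loss(network_losses, target_weights):
    # Weighted sum of per-task losses: sum over n of alpha_n * L_n.
    if len(network_losses) != len(target_weights):
        raise ValueError("one weighting coefficient is needed per network loss")
    return sum(w * loss for w, loss in zip(target_weights, network_losses))


# Two tasks with losses 2.0 and 4.0 and coefficients 0.25 and 0.75.
example_total = total_loss([2.0, 4.0], [0.25, 0.75])
```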

According to the embodiment of the disclosure, not only can the balance training of the network parameters of the main network be realized in the training of the image processing network, but also the balance training among the network parameters of a plurality of branch networks can be realized, so as to realize the balance training of the image processing network together.

As described above, the gradient information of each network loss with respect to the backbone network can be determined by back propagation. Because the backbone network has a large number of parameters, computing each piece of gradient information with respect to the network parameters of the backbone network requires a back propagation pass through every network layer of the entire image processing network, which is computationally expensive, consumes substantial computing resources, and is time-consuming and inefficient.

In a possible implementation, separately determining the gradient information of each network loss with respect to the backbone network may include: separately determining the gradient information of each network loss with respect to the image features extracted by the backbone network.

The image features extracted by the backbone network may include the feature map output by the last network layer of the backbone network, which may also be referred to as the shared feature map. In this way, when determining the gradient information of each network loss with respect to the image features extracted by the backbone network, back propagation only needs to run from the output layer of each branch network to the output of the last network layer of the backbone network to obtain the gradient information for the image features, which greatly reduces the computation required to determine the gradient information, saves computing resources, and improves training efficiency.

It should be understood that determining gradient information means computing a gradient, that is, differentiating a function. The quantity with respect to which the derivative is taken may differ: the gradient may be computed with respect to the network parameters of a network or with respect to the image features output by the network, and in either case it may be computed by back propagation.

Since the network parameters of the backbone network act on the obtained image features, the gradient information of the image features and the gradient information of the network parameters of the backbone network have the same effect when used to determine the target weighting coefficients. In the embodiments of the present disclosure, determining the gradient information of the image features avoids a complete back propagation pass through the whole image processing network, which reduces the computation required to determine the gradient information, saves computing resources, and improves training efficiency.
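The difference between the two gradient targets can be sketched on a toy model. The example assumes a single linear backbone layer f = W x, a linear branch with output a_n · f, and a squared-error loss; these modelling choices and all function names are assumptions made for illustration. The feature gradient needs only the branch, while the parameter gradient additionally requires propagating through the backbone.

```python
def backbone_features(W, x):
    # Shared feature vector f = W x for a single linear backbone layer.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def feature_gradient(a_n, f, y_n):
    # Gradient of L_n = (a_n . f - y_n)^2 with respect to the shared
    # features f. Only the branch weights a_n are involved.
    out = sum(a * fi for a, fi in zip(a_n, f))
    return [2.0 * (out - y_n) * a for a in a_n]


def parameter_gradient(grad_f, x):
    # Chain rule through the linear backbone: dL/dW[i][j] = dL/df[i] * x[j].
    # This extra step is what the feature-level gradient avoids.
    return [[g * x_j for x_j in x] for g in grad_f]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
x = [1.0, 2.0, 3.0]
f = backbone_features(W, x)
grad_f = feature_gradient([0.5, -1.0], f, 2.0)
grad_W = parameter_gradient(grad_f, x)
```

For a deep backbone the parameter gradient would require this chain-rule step for every layer, whereas the feature gradient has the size of the shared feature map only.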

In a possible implementation manner, the embodiment of the present disclosure may also determine gradient information of each network loss with respect to a network parameter of the backbone network, that is, obtain a gradient for the network parameter, which is not limited in the embodiment of the present disclosure.

In one possible implementation, the method further includes: determining the image processing network in the i-th state as the trained image processing network when the image processing network in the i-th state meets the training condition. In this way, the trained image processing network can perform multi-task image processing efficiently and accurately.

The training condition may include, for example: the total loss reaching a minimum, the total loss converging, a preset number of iterations being reached, or the accuracy of the image processing network meeting a requirement. The condition can be set according to actual requirements and is not limited by the embodiments of the present disclosure.
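A check for such training conditions might look like the following sketch. The function name `training_done`, the tolerance `tol`, and the way convergence is measured (difference between the last two total losses) are assumptions for illustration rather than requirements of the disclosure.

```python
def training_done(loss_history, iteration, max_iterations,
                  tol=1e-4, accuracy=None, target_accuracy=None):
    # Stop when the preset number of iterations has been reached.
    if iteration >= max_iterations:
        return True
    # Stop when a measured accuracy meets the required accuracy.
    if accuracy is not None and target_accuracy is not None:
        if accuracy >= target_accuracy:
            return True
    # Treat the total loss as converged when it barely changes
    # between two consecutive iterations.
    if len(loss_history) >= 2:
        if abs(loss_history[-1] - loss_history[-2]) < tol:
            return True
    return False
```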

The trained image processing network obtained by the network training method of the embodiments of the present disclosure can be deployed on various mobile terminals or servers and used for multi-task image processing, such as multi-attribute recognition of license plates, multi-attribute recognition of human faces, and the like.

According to an embodiment of the present disclosure, there is also provided an image processing method including: inputting an image to be processed into an image processing network to obtain a plurality of second processing results of the image, wherein the image processing network is obtained by training according to the above network training method.

That is, the image processing network obtained by the network training method can be used for performing multi-task image processing on an image to be processed to obtain a plurality of second processing results. For example, when the image to be processed is a license plate image, the plurality of second processing results may include a license plate color, a license plate number, a license plate type, and the like.

According to the network training method of the embodiments of the present disclosure, the weighting coefficients of the different task loss functions in multi-task learning can be solved automatically. Minimizing the weighted loss reduces the cost of manual trial and error and balances the learning of each task, and the relevance between tasks can be exploited to improve the performance of each task while improving training and testing efficiency.

It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from their principles and logic; for reasons of space, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.

In addition, the present disclosure also provides a network training apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any network training method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method sections, which are not repeated here.

Fig. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure, as shown in fig. 3, the apparatus comprising:

the processing module is used for inputting a sample image into an image processing network in an i-1 th state to obtain a plurality of first processing results of the sample image, the image processing network comprises a main network and a plurality of branch networks, the main network is used for extracting image characteristics, the branch networks are used for outputting the processing results of each image processing task based on the image characteristics, and i is a positive integer; a gradient information determining module, configured to determine, according to the labeling information of the sample image and the plurality of first processing results, network loss of each branch network and gradient information of each network loss to the main network, respectively; the weighting coefficient determining module is used for determining a target weighting coefficient of each network loss according to the gradient information; the overall loss determining module is used for obtaining the overall loss of the image processing network according to the network losses and the target weighting coefficients of the network losses; and the training module is used for training the image processing network in the i-1 th state according to the total loss to obtain the image processing network in the i-th state.

In a possible implementation, the weighting coefficient determining module includes: an overall gradient information determining submodule, configured to determine overall gradient information according to each piece of gradient information and the weighting parameter of each piece of gradient information; and a weighting coefficient determining submodule, configured to determine the parameter value corresponding to the weighting parameter of each piece of gradient information as the target weighting coefficient of each network loss under the condition that the gradient components of the overall gradient information in the direction of each piece of gradient information are the same.
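The condition that the overall gradient has the same component along every individual gradient direction can be turned into a small linear system. The sketch below is one possible reading, under stated assumptions: the overall gradient is taken as the weighted sum of the individual gradients, the weights are normalized to sum to 1 (a choice made here, not stated in the text above), and the system is solved by Gaussian elimination. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("gradients are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def balanced_weights(grads):
    # Unit vector along each per-task gradient (gradients must be non-zero).
    units = [[gi / math.sqrt(dot(g, g)) for gi in g] for g in grads]
    # Row n: component of sum_m w_m * g_m along direction u_n.
    A = [[dot(g_m, u_n) for g_m in grads] for u_n in units]
    # Require every component to equal the same constant (here 1).
    w = solve_linear(A, [1.0] * len(grads))
    total = sum(w)
    return [wi / total for wi in w]   # ASSUMED normalization: sum to 1


grads = [[2.0, 0.0], [1.0, 1.0]]
weights = balanced_weights(grads)
```

This sketch does not handle parallel or zero gradients, and it does not constrain the weights to be non-negative; a practical implementation would need to decide how to treat those cases.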

According to an embodiment of the present disclosure, there is also provided an image processing apparatus including: the image processing module is used for inputting the image to be processed into an image processing network to obtain a plurality of second processing results of the image, wherein the image processing network is obtained by training according to the network training device.

In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.

Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.

An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

The embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the network training method and/or the image processing method provided in any one of the above embodiments.

The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the network training method and/or the image processing method provided in any of the above embodiments.

The electronic device may be provided as a terminal, server, or other form of device.

Fig. 4 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to fig. 4, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.

Fig. 5 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 5, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.

The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer program product may be embodied in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).

Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technology found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method of network training, comprising:

inputting a sample image into an image processing network in an (i-1)-th state to obtain a plurality of first processing results of the sample image, wherein the image processing network comprises a main network and a plurality of branch networks, the main network is used for extracting image features, each branch network is used for outputting a processing result of a respective image processing task based on the image features, and i is a positive integer;

respectively determining, according to the labeling information of the sample image and the plurality of first processing results, the network loss of each branch network and the gradient information of each network loss with respect to the main network;

determining target weighting coefficients of the network losses according to the gradient information;

obtaining the overall loss of the image processing network according to the network losses and the target weighting coefficients of the network losses;

and training the image processing network in the (i-1)-th state according to the overall loss to obtain the image processing network in the i-th state.
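As an illustration only, the steps of claim 1 can be traced on a toy scalar network in Python. Everything below is an assumption made for the sketch rather than something the claims specify: the one-parameter main network `a`, the linear branch networks `heads`, the squared-error losses, the learning rate `lr`, and in particular the inverse-gradient-magnitude weighting, which merely stands in for whatever rule an embodiment uses to derive the target weighting coefficients from the gradient information.

```python
def train_step(a, heads, x, targets, lr=0.1):
    """One iteration taking a toy network from state i-1 to state i.

    a       : scalar parameter of the (hypothetical) main network, feature z = a * x
    heads   : one scalar parameter per branch network, output y_k = b_k * z
    targets : labeling information, one target per image processing task
    """
    z = a * x                                    # shared feature from the main network
    outputs = [b * z for b in heads]             # first processing results, one per task
    residuals = [y - t for y, t in zip(outputs, targets)]
    losses = [0.5 * r * r for r in residuals]    # network loss of each branch network

    # Gradient of each network loss with respect to the main-network parameter a.
    grads_a = [r * b * x for r, b in zip(residuals, heads)]

    # Stand-in weighting rule (an assumption, not the claimed rule):
    # weights inversely proportional to gradient magnitude, normalised to sum to 1.
    eps = 1e-12
    inverse = [1.0 / (abs(g) + eps) for g in grads_a]
    weights = [v / sum(inverse) for v in inverse]

    # Overall loss as the weighted sum of the network losses.
    overall = sum(w * l for w, l in zip(weights, losses))

    # The main network follows the gradient of the overall loss (weights held constant);
    # each branch network follows the gradient of its own network loss.
    new_a = a - lr * sum(w * g for w, g in zip(weights, grads_a))
    new_heads = [b - lr * r * z for b, r in zip(heads, residuals)]
    return new_a, new_heads, losses, weights, overall


# Usage: two tasks whose losses differ by almost an order of magnitude.
a, heads, losses, weights, overall = train_step(1.0, [1.0, 1.0], 1.0, [2.0, 4.0])
print(losses, weights, overall, a, heads)
```

In this sketch the task with the larger gradient receives the smaller weight, so neither task dominates the update of the shared parameter.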

2. The method of claim 1, wherein determining the target weighting coefficient of each network loss according to the gradient information comprises:

in a case where the gradient components of the overall gradient information in the directions of the respective gradient information are equal, determining the parameter values of the weighting parameters of the respective gradient information as the target weighting coefficients of the respective network losses.

3. The method according to claim 1 or 2, wherein the sum of the target weighting coefficients of the respective network losses is 1.
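One possible reading of claims 2 and 3 can be sketched numerically. The sketch assumes that the overall gradient information is the weighted sum of the per-loss gradients, that the "gradient component in a direction" is the projection onto the unit vector of that gradient, and that the weights sum to 1. Under those assumptions the weights are the solution of a small linear system. The function names (`balanced_weights`, `solve_linear`, `projections`) are hypothetical.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the gradients may be degenerate")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def balanced_weights(grads):
    """Weights w with sum(w) = 1 such that g = sum_k w_k * grads[k]
    has the same projection onto every unit vector grads[k] / |grads[k]|."""
    n = len(grads)
    units = [[x / math.sqrt(dot(g, g)) for x in g] for g in grads]
    A, b = [[1.0] * n], [1.0]                      # constraint: weights sum to 1
    for k in range(1, n):
        diff = [p - q for p, q in zip(units[0], units[k])]
        A.append([dot(g, diff) for g in grads])    # g . (u_0 - u_k) = 0
        b.append(0.0)
    return solve_linear(A, b)


def projections(grads, weights):
    """Projection of the weighted-sum gradient onto each gradient direction."""
    dim = len(grads[0])
    total = [sum(w * g[i] for w, g in zip(weights, grads)) for i in range(dim)]
    return [dot(total, g) / math.sqrt(dot(g, g)) for g in grads]


# Usage: two orthogonal gradients whose magnitudes differ by a factor of 2.
grads = [[1.0, 0.0], [0.0, 2.0]]
w = balanced_weights(grads)
print(w, projections(grads, w))
```

With non-orthogonal gradients the solved weights can become negative; whether and how an embodiment constrains them is not stated in the claims.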

4. The method of any of claims 1-3, wherein training the image processing network in the (i-1)-th state according to the overall loss comprises:

updating the network parameters of the main network in the (i-1)-th state according to the overall loss;

and/or updating the network parameters of each branch network in the (i-1)-th state according to the corresponding network loss.

5. The method according to claim 4, wherein updating the network parameters of each branch network in the (i-1)-th state according to the corresponding network loss comprises:

determining a scaling loss corresponding to each network loss according to the network loss and a scaling parameter of the network loss in the (i-1)-th state;

and updating, according to each scaling loss, the network parameters of the corresponding branch network in the (i-1)-th state and the scaling parameter in the (i-1)-th state, to obtain each branch network in the i-th state and the scaling parameter in the i-th state.
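The claim does not fix the form of the scaling loss. As a minimal sketch, assume a learnable scaling parameter `s` per task and the form exp(s) * L - s; this form is an assumption chosen for illustration, as are the toy branch parameter `theta`, the squared-error loss, and the learning rate. Its derivative with respect to `s` is exp(s) * L - 1, so for a fixed loss L the stationary point is s = -ln L, where the scaled term exp(s) * L equals 1 regardless of the raw magnitude of L.

```python
import math


def scaled_loss(loss, s):
    """Assumed scaling loss: exp(s) * loss - s, with s a learnable scaling parameter."""
    return math.exp(s) * loss - s


def joint_step(theta, s, target, lr=0.1):
    """Update a toy branch parameter and its scaling parameter from the scaling loss.

    The branch is a single scalar theta with network loss 0.5 * (theta - target)**2.
    Both theta and s take one gradient-descent step on the scaling loss.
    """
    loss = 0.5 * (theta - target) ** 2
    grad_theta = math.exp(s) * (theta - target)   # d(scaled loss) / d(theta)
    grad_s = math.exp(s) * loss - 1.0             # d(scaled loss) / d(s)
    return theta - lr * grad_theta, s - lr * grad_s


# Usage: a few joint updates of the branch parameter and the scaling parameter.
theta, s = 0.0, 0.0
for _ in range(5):
    theta, s = joint_step(theta, s, target=2.0)
print(theta, s)
```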

6. The method according to any one of claims 1-5, wherein obtaining the overall loss of the image processing network according to the network losses and the target weighting coefficients of the network losses comprises:

performing a weighted summation of the network losses according to the target weighting coefficient of each network loss, to obtain the overall loss.
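In symbols, the weighted summation can be written as follows. The notation is assumed here for illustration: K is the number of branch networks, L_k is the k-th network loss, α_k is its target weighting coefficient, and θ denotes the main-network parameters. The normalization constraint is the one recited in claim 3, and the gradient identity holds when the coefficients are treated as constants during differentiation.

```latex
L_{\text{overall}} = \sum_{k=1}^{K} \alpha_k L_k,
\qquad \sum_{k=1}^{K} \alpha_k = 1,
\qquad \nabla_{\theta} L_{\text{overall}} = \sum_{k=1}^{K} \alpha_k \nabla_{\theta} L_k .
```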

7. The method of claim 1, wherein respectively determining the gradient information of each network loss with respect to the main network comprises:

respectively determining gradient information of each network loss with respect to the image features extracted by the main network.
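To make the distinction concrete, the following sketch computes the gradient of a branch loss with respect to the shared feature vector rather than with respect to any main-network parameter. The linear branch `w`, the squared-error loss, and the function names are assumptions for illustration; a finite-difference check is included to confirm the analytic expression.

```python
def head_loss(w, z, t):
    """Squared-error loss of a toy linear branch: 0.5 * (w . z - t)**2."""
    r = sum(wi * zi for wi, zi in zip(w, z)) - t
    return 0.5 * r * r


def feature_gradient(w, z, t):
    """Analytic gradient of head_loss with respect to the feature vector z."""
    r = sum(wi * zi for wi, zi in zip(w, z)) - t
    return [r * wi for wi in w]


def numeric_gradient(w, z, t, h=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(z)):
        z_plus, z_minus = z[:], z[:]
        z_plus[i] += h
        z_minus[i] -= h
        grads.append((head_loss(w, z_plus, t) - head_loss(w, z_minus, t)) / (2 * h))
    return grads


# Usage: one shared feature vector, two branches, one feature gradient per task.
z = [1.0, 2.0]
branches = [([0.5, -1.0], 1.0), ([2.0, 0.0], 0.5)]
for w, t in branches:
    print(feature_gradient(w, z, t), numeric_gradient(w, z, t))
```

Each task yields one vector of the same size as the feature, obtained by backpropagating through its branch network only, which is presumably cheaper than computing a gradient over all main-network parameters for every task.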

8. The method according to any one of claims 1-7, further comprising:

determining the image processing network in the i-th state as the trained image processing network in a case where the image processing network in the i-th state satisfies a training condition.

9. The method according to any of claims 1-8, wherein the image processing tasks include at least two of image recognition, image classification, image segmentation, and keypoint detection.

10. An image processing method, comprising:

inputting an image to be processed into an image processing network to obtain a plurality of second processing results of the image, wherein the image processing network is trained according to the network training method of any one of claims 1 to 9.

11. A network training apparatus, comprising:

a processing module, configured to input a sample image into an image processing network in an (i-1)-th state to obtain a plurality of first processing results of the sample image, wherein the image processing network comprises a main network and a plurality of branch networks, the main network is used for extracting image features, each branch network is used for outputting a processing result of a respective image processing task based on the image features, and i is a positive integer;

a gradient information determining module, configured to respectively determine, according to the labeling information of the sample image and the plurality of first processing results, the network loss of each branch network and the gradient information of each network loss with respect to the main network;

a weighting coefficient determining module, configured to determine, according to the gradient information, a target weighting coefficient of each network loss;

an overall loss determining module, configured to obtain the overall loss of the image processing network according to the network losses and the target weighting coefficients of the network losses;

and a training module, configured to train the image processing network in the (i-1)-th state according to the overall loss to obtain the image processing network in the i-th state.

12. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 10.

13. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 10.