CN114202026A - Multitask model training method and device and multitask processing method and device - Google Patents


Info

Publication number
CN114202026A
CN114202026A
Authority
CN
China
Prior art keywords
initial
network
image
branch
multitask
Prior art date
Legal status
Pending
Application number
CN202111508235.8A
Other languages
Chinese (zh)
Inventor
张宸鸣
钟开
张通滨
杨建忠
卢振
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111508235.8A
Publication of CN114202026A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The disclosure provides a multitask model training method and device, relating to computer vision, deep learning, and related technical fields. The specific implementation scheme is as follows: acquire a training sample set comprising at least one type of initial image; acquire a pre-established multitask network in which a general feature extractor is connected to each initial branch network through a branch node; select initial images from the training sample set and input them into the general feature extractor to obtain feature maps corresponding to the selected initial images; for each obtained feature map, input the feature map into the initial branch network corresponding to that feature map's identification element; and acquire the gradient value of each initial branch network at the branch node, adjust the loss weight value of the corresponding initial branch network based on that gradient value, and obtain the multitask model in response to the multitask network satisfying the training completion condition. This embodiment balances the training effects of multiple tasks.

Description

Multitask model training method and device and multitask processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, in particular to computer vision, deep learning, and related fields, and specifically to a multitask model training method and apparatus, a multitask processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the development of AI (Artificial Intelligence) technology, mobile-terminal hardware has improved in performance and dropped in price, and visual algorithms can compensate for limited positioning accuracy on the mobile terminal. For example, lane-level navigation and other functions requiring high-precision positioning can be realized on the mobile terminal, which is of great significance for automatic driving and map data production. Because the computing power of the mobile terminal is limited, more identification elements need to be identified while still ensuring algorithm execution efficiency, so a multitask model is a necessary scheme. When training a multitask model, the training intensity of the different tasks must be balanced so that the multitask model achieves the best effect on each task.
Disclosure of Invention
The present disclosure provides a multitask model training method and apparatus, a multitask processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to a first aspect, there is provided a multitask model training method, the method comprising: acquiring a training sample set comprising at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification element; acquiring a pre-established multitask network, wherein the multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node; and performing the following training steps: selecting initial images from the training sample set and inputting the selected initial images into the general feature extractor to obtain feature maps in one-to-one correspondence with the selected initial images, wherein the identification elements of the selected initial images collectively correspond to all of the initial branch networks; for each of the obtained feature maps, inputting the feature map into the initial branch network corresponding to that feature map's identification element; acquiring the gradient value of each initial branch network at the branch node and adjusting the loss weight value of the corresponding initial branch network based on that gradient value; and obtaining the multitask model in response to the multitask network satisfying a training completion condition.
According to a second aspect, there is provided a multitasking method, the method comprising: acquiring an image to be processed; and inputting the image to be processed into the multitask model generated by adopting the method described in any one implementation mode of the first aspect to obtain a multitask processing result of the image to be processed.
According to a third aspect, there is provided a multitask model training device, the device comprising: a sample acquisition unit configured to acquire a training sample set comprising at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification element; a network acquisition unit configured to acquire a pre-established multitask network, wherein the multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node; an image selection unit configured to select initial images from the training sample set and input the selected initial images into the general feature extractor to obtain feature maps in one-to-one correspondence with the selected initial images, wherein the identification elements of the selected initial images collectively correspond to all of the initial branch networks; a feature input unit configured to input, for each of the obtained feature maps, the feature map into the initial branch network corresponding to that feature map's identification element; a gradient adjustment unit configured to acquire the gradient value of each initial branch network at the branch node and adjust the loss weight value of the corresponding initial branch network based on that gradient value; and a model obtaining unit configured to obtain the multitask model when the multitask network satisfies a training completion condition.
According to a fourth aspect, there is also provided a multitasking device comprising: an acquisition unit configured to acquire an image to be processed; and the input unit is configured to input the image to be processed into the multitask model generated by adopting the device according to any one of the implementation modes of the third aspect, and obtain a multitask processing result of the image to be processed.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described in any one of the implementations of the first aspect or the second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementations of the first or second aspect.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first or second aspect.
The multitask model training method and device provided by the embodiments of the disclosure proceed as follows. First, a training sample set comprising at least one type of initial image is acquired, wherein each type of initial image is labeled with at least one type of identification element. Second, a pre-established multitask network is acquired, wherein the multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node. Third, initial images are selected from the training sample set and input into the general feature extractor to obtain feature maps in one-to-one correspondence with the selected initial images, wherein the identification elements of the selected initial images collectively correspond to all of the initial branch networks. Fourth, each of the obtained feature maps is input into the initial branch network corresponding to that feature map's identification element. Finally, the gradient value of each initial branch network is acquired at the branch node, the loss weight value of the corresponding initial branch network is adjusted based on that gradient value, and the multitask model is obtained in response to the multitask network satisfying the training completion condition. Thus, by building a multitask network from a general feature extractor and initial branch networks, and by using the gradient of each initial branch network to adjust that branch's loss weight value during multitask training, a gradient-balanced multitask model is obtained: the training effects of the multiple tasks are balanced, and the multitask model can achieve the best effect on each task.
The multitasking method and device provided by the embodiments of the disclosure acquire an image to be processed and input it into the multitask model generated by the multitask model training method of the above embodiment, obtaining a multitask processing result for the image. Because the multitask model is built from a general feature extractor and a plurality of initial branch networks, the processing effect of each task is balanced and the efficiency of multitask processing is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of one embodiment of a multitask model training method according to the present disclosure;
FIG. 2 is a schematic diagram of an architecture for training a multitasking network in an embodiment of the present disclosure;
FIG. 3 is a flow diagram for one embodiment of a multitasking method according to the present disclosure;
FIG. 4 is a schematic block diagram of one embodiment of a multitask model training device according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of a multitasking device according to the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a multitasking model training method or a multitasking method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A traditional multitask model with multiple identification elements requires a fully labeled data set, or else multiple separate task models must be used. On this basis, the present disclosure provides a multitask model training method based on gradient balance. Fig. 1 shows a flow 100 of an embodiment of the multitask model training method of the present disclosure, which includes the following steps:
Step 101, a training sample set comprising at least one type of initial image is obtained.
In this embodiment, the execution subject of the multitask model training method may obtain the training sample set in various manners. For example, the execution subject may obtain a training sample set stored on a database server through a wired or wireless connection. As another example, the execution subject may obtain a training sample set collected by a terminal by communicating with that terminal.
Here, the training sample set may include at least one type of initial image. The initial images are divided into types according to the tasks they implement when training the multitask model, and each type of initial image may carry sample labels for the identification elements of at least one type of task. For example, one type of initial image in the training sample set may be used for a target detection task, another type for semantic segmentation and keypoint detection, and further types for other visual tasks, which are not enumerated here.
In this embodiment, each type of initial image is labeled with at least one type of identification element. An identification element is a detection target to be processed by the multitask model; the detection target may be a person, an object, a scene, and so on in the image, and a multitask model generally has more than two detection targets. Labeling the identification elements on the initial images allows the multitask network to determine the identification elements correctly and provides ground-truth information about the detection targets for multitask model training. When an initial image is used to implement one task, one type of identification element is labeled on it; when it is used to implement two or more tasks, two or more types of identification elements are labeled on it. As shown in Fig. 2, the class A images are labeled with identification element 1 (not shown in the figure) and identification element 2 (not shown in the figure) and can implement two tasks, while the class B images are labeled with identification element 3 (not shown in the figure) and can implement one task.
In this embodiment, acquiring a training sample set comprising at least one type of initial image includes: using an independent data read-in module for the images of each type of identification element, performing independent data preprocessing according to the requirements of each task, and handling the different label formats. For example, tasks such as semantic segmentation, target detection, and keypoint detection require different preprocessing of the images of their respective identification elements, so the images are completely decoupled by identification element to obtain the training sample set.
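The decoupled read-in modules described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, sample fields, and element types are hypothetical stand-ins for per-task preprocessing routines.

```python
# Illustrative sketch: each identification-element type gets its own data
# reader and preprocessing routine, so the sample pipelines are fully
# decoupled. All names and fields here are hypothetical.

def preprocess_detection(sample):
    # Detection labels are kept as bounding boxes.
    return {"image": sample["image"], "boxes": sample["boxes"]}

def preprocess_segmentation(sample):
    # Segmentation labels are kept as per-pixel masks.
    return {"image": sample["image"], "mask": sample["mask"]}

# Independent read-in module per identification element.
READERS = {
    "detection": preprocess_detection,
    "segmentation": preprocess_segmentation,
}

def build_training_set(raw_samples):
    """Route each raw sample to the reader for its element type."""
    return [READERS[s["element"]](s) for s in raw_samples]

samples = [
    {"element": "detection", "image": "img_a", "boxes": [(0, 0, 4, 4)]},
    {"element": "segmentation", "image": "img_b", "mask": [[0, 1], [1, 0]]},
]
train_set = build_training_set(samples)
```

Because each reader only knows its own label format, adding a new task means adding one entry to the reader table without touching the others, which is the complete decoupling the passage describes.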
In the technical scheme of the disclosure, the processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the initial image and the identification element are executed after authorization, and the processes conform to relevant laws and regulations.
Step 102, a pre-established multitask network is obtained.
The multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements; the general feature extractor is connected to each initial branch network through a branch node.
In this embodiment, the general feature extractor maps an input image into a high-dimensional feature space to obtain a feature map of the input image; this feature map contains the features of all identification elements of the input image, and the general feature extractor is a network shared by all initial branch networks. The initial branch networks correspond to the tasks and identification elements of the multitask model: the number of initial branch networks equals the number of tasks of the multitask model, which also equals the number of identification element types. Based on the feature map of the input image, each initial branch network processes its corresponding identification element in that feature map to obtain the task processing result for that identification element.
In this embodiment, the general feature extractor extracts all features of the input initial images to obtain their feature maps; the feature maps are split according to their identification elements, the feature map of each identification element is input into the initial branch network corresponding to that element, and each initial branch network produces a task processing result for its feature map.
Depending on the scenes to which the multitask model is adapted, the initial branch networks in the multitask network may have different network structures. For example, in an automatic driving scene the multitask model needs to perform lane line segmentation, pedestrian detection, and the like, so the multitask network needs a structure that includes a semantic segmentation network and a target detection network.
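The shared-extractor-plus-branches architecture described above can be sketched in a few lines. This is a hedged stand-in, not the patent's network: the "extractor" and "branches" are plain functions rather than neural networks, and all names are hypothetical, but the data flow (one shared forward pass feeding independent branch heads) matches the description.

```python
# Hypothetical skeleton of the multitask network: one shared (general)
# feature extractor feeding independent branch networks via a branch node.

def general_feature_extractor(image):
    # Stand-in for a shared backbone: map the image to a "feature map".
    return [pixel * 0.5 for pixel in image]

def detection_branch(feature_map):
    return {"task": "detection", "score": sum(feature_map)}

def segmentation_branch(feature_map):
    return {"task": "segmentation", "score": max(feature_map)}

# One branch per identification element, as in Fig. 2.
BRANCHES = {"detection": detection_branch, "segmentation": segmentation_branch}

def forward(image, elements):
    """Run the shared extractor once, then each branch for the image's elements."""
    feature_map = general_feature_extractor(image)  # shared computation
    return {e: BRANCHES[e](feature_map) for e in elements}

# A class-A image labeled with both elements passes through both branches.
out = forward([1.0, 2.0, 3.0], ["detection", "segmentation"])
```

The extractor runs once per image regardless of how many branches consume its output, which is the efficiency argument for sharing the backbone on a compute-limited mobile terminal.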
In some optional implementations of this embodiment, the initial branch networks may include any two of: a semantic segmentation network, a target detection network, and a keypoint detection network.
In this embodiment, the semantic segmentation network takes the feature map of some raw data (e.g., a planar image) as input and converts it into a mask with highlighted regions of interest, in which each pixel of the image is assigned to a category according to the object of interest it belongs to. Whereas a conventional semantic segmentation network treats the extracted feature map as an intermediate link, the semantic segmentation network of this embodiment directly takes the feature map of the initial image as input.
In this embodiment, the target detection network takes the feature map of some raw data as input, finds all targets of interest in the feature map, and determines their positions and sizes. Whereas a conventional target detection network treats the extracted feature map as an intermediate link, the target detection network of this embodiment directly takes the feature map of the initial image as input.
In this embodiment, the keypoint detection network takes the feature map of some raw data as input, finds all keypoints of interest in the feature map, and determines the positional relationships between the keypoints. Whereas a conventional keypoint detection network treats the extracted feature map as an intermediate link, the keypoint detection network of this embodiment directly takes the feature map of the initial image as input.
In this optional implementation, multiple kinds of initial branch networks are provided based on the tasks of the multitask model, offering several options for realizing the initial branch networks and improving the diversity of initial branch network configurations.
Step 103, selecting initial images from the training sample set, inputting the selected initial images into the general feature extractor, and obtaining feature maps in one-to-one correspondence with the selected initial images.
The identification elements of the selected initial images collectively correspond to all of the initial branch networks.
In this embodiment, the execution subject may select initial images from the training sample set obtained in step 101 and perform the training steps from step 103 to step 106. This application does not limit the selection manner or the number of initial images. For example, in one iteration of training, one type of initial image labeled with at least two identification elements may be randomly selected; or two types of initial images, each labeled with one type of identification element, may be randomly selected. The loss value of the multitask network is then calculated from the label information of the identification elements of the selected initial images, and the parameters of the multitask network are adjusted.
In this embodiment, the general feature extractor is mainly used to map the selected initial images into a high-dimensional feature space to obtain high-dimensional features. The general feature extractor may be an encoder; for example, it may consist of two DNN layers of 512 dimensions each.
In this embodiment, a feature map corresponds to each type of initial image. For example, if one type of initial image is input, the general feature extractor outputs the feature map of that type; if multiple types of initial images are input, the general feature extractor outputs the feature maps of the multiple types. As shown in Fig. 2, inputting a class A image yields a class A feature map, and inputting a class B image yields a class B feature map.
The manner of obtaining feature maps in one-to-one correspondence with the selected initial images differs with the number of types selected. In some optional implementations of this embodiment, inputting the selected initial images into the general feature extractor to obtain the corresponding feature maps includes: in response to multiple types of initial images being selected, stacking the multiple types of initial images and inputting them into the general feature extractor to obtain the feature map output by the general feature extractor; and splitting the output feature map according to the types of the initial images to obtain the feature maps corresponding to the respective types.
In this optional implementation, when multiple types of initial images are selected, they are stacked and input into the general feature extractor simultaneously, and the extractor's output is split according to the image types. The feature map corresponding to each type of initial image is thus obtained efficiently, ensuring effective subsequent training of each initial branch network.
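The stack-then-split step above can be illustrated with a toy extractor. This is an assumption-laden sketch: the "extractor" is a stand-in that just transforms each image, and the per-class bookkeeping shows only the splitting logic, not real tensor shapes.

```python
# Sketch of the batching trick described above: multiple classes of initial
# images are stacked into one input, pushed through the extractor in a single
# pass, and the output is split back per class. Names are illustrative.

def extractor(batch):
    # Stand-in backbone applied image-by-image.
    return [[v + 1 for v in image] for image in batch]

def extract_and_split(images_by_class):
    """images_by_class: dict mapping class name -> list of images."""
    order = list(images_by_class)
    stacked = [img for cls in order for img in images_by_class[cls]]
    features = extractor(stacked)          # one shared forward pass
    # Split the feature maps back according to each class's image count.
    split, start = {}, 0
    for cls in order:
        n = len(images_by_class[cls])
        split[cls] = features[start:start + n]
        start += n
    return split

maps = extract_and_split({"A": [[1, 2], [3, 4]], "B": [[5, 6]]})
```

The key invariant is that the split uses the same ordering and counts as the stacking, so each class's feature maps can be routed to the right initial branch network afterwards.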
Optionally, inputting the selected initial images into the general feature extractor to obtain the corresponding feature maps includes: in response to only one type of initial image being selected, inputting the initial images into the general feature extractor to obtain the feature map output by the general feature extractor, where this feature map serves as input to all of the initial branch networks.
In this optional implementation, the feature map corresponding to the initial image can be used as the input of every initial branch network, so that each initial branch network can process its own identification element within that feature map.
Step 104, for each of the obtained feature maps, inputting the feature map into the initial branch network corresponding to that feature map's identification element.
In this embodiment, each iteration of training the multitask network obtains feature maps from the general feature extractor, splits them according to the types of the initial images, and inputs each feature map into the initial branch network corresponding to that initial image's identification element. For example, in Fig. 2, if identification element 1 and identification element 2 are labeled on the class A images, the class A feature map corresponding to the class A images is obtained from the general feature extractor and input into both initial branch network 1 and initial branch network 2. If identification element 3 is labeled on the class B images, the class B feature map corresponding to the class B images is obtained from the general feature extractor and input into initial branch network 3.
As shown in Fig. 2, the initial images corresponding to different identification elements are combined and input into the general feature extractor of the multitask network, which performs feature extraction on them. The extracted feature maps are split by identification element and passed to the independent initial branch networks, each of which performs independent identification element processing. This processing may be multi-category target detection, or multiple target recognition tasks such as detection, semantic segmentation, and keypoint detection. Because different initial branch networks have independent loss functions, the labels for each identification element can be produced without considering the predictions of the other initial branch networks.
Step 105, acquiring the gradient value of each initial branch network at the branch node, and adjusting the loss weight value of the corresponding initial branch network based on that gradient value.
In this embodiment, given the structure of the multitask network, independent loss functions may be set for the different initial branch networks, and each loss function may be evaluated in every iteration of training. As shown in Fig. 2, the loss function of initial branch network 1 can be calculated independently for identification element 1 processed by that branch, yielding the loss value of initial branch network 1, and the parameters of the multitask network can be adjusted based on that loss value.
A gradient is a vector: the directional derivative of a loss function at a point attains its maximum along the gradient direction, i.e., the loss function changes fastest, and at the greatest rate, along that direction at that point. In deep learning, the main task of a neural network is to find the optimal network parameters (weights and biases) during learning, i.e., the parameters that minimize the loss function. In general, however, the loss function is complicated and has many parameters, so the minimizing point cannot be written down directly. Finding the minimum (or a value as small as possible) via the gradient is the gradient descent method. To make the loss function of each initial branch network decrease as quickly as possible, a gradient descent algorithm can update the parameters of the multitask network along the negative gradient direction.
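The gradient descent idea above reduces to a one-line update rule, shown here on a deliberately simple one-parameter loss (a textbook illustration, not the patent's optimizer; the loss, learning rate, and iteration count are arbitrary choices):

```python
# Minimal illustration of gradient descent: the parameter is updated along
# the negative gradient direction, driving the loss toward its minimum.

def loss(w):
    return (w - 3.0) ** 2        # quadratic loss with minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)       # dL/dw

w, lr = 0.0, 0.1                 # initial parameter and learning rate
for _ in range(100):
    w -= lr * grad(w)            # step along the negative gradient
```

After 100 steps w has converged to roughly 3.0, the minimizer; the same update, applied per parameter of the multitask network, is what the shared training loop performs.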
Because the initial branch networks share most of the network structure (the general feature extractor in Fig. 2), different initial branch networks affect the feature activations of the general feature extractor, and conflicts can arise. To solve this problem, the loss weights of the different initial branch networks can be adjusted according to each branch's gradient in the general feature extractor. The training of the multitask network comprises multiple iterations; in each iteration, the gradient values at the branch node connecting the general feature extractor to the initial branch networks can be collected with a monitoring tool, and from these the gradient value corresponding to each initial branch network's loss function can be separated out.
In this embodiment, a loss value of the multitask network's loss function is calculated once per iteration of training. The loss value of the multitask network is the sum of the product values of all initial branch networks, where the product value of each initial branch network is its loss value multiplied by its loss weight value. Adjusting the loss weight value of an initial branch network adjusts that branch's proportion in the multitask network: the larger the loss weight value, the larger the branch's share of the training.
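The weighted sum just described can be written directly. The branch names and numbers below are hypothetical; only the formula (total loss = sum of loss weight times branch loss) comes from the passage:

```python
# Sketch of the overall loss described above: the multitask loss is the sum
# of each branch's loss multiplied by that branch's loss weight, so raising
# a weight raises that branch's share of the training signal.

def multitask_loss(branch_losses, loss_weights):
    """branch_losses, loss_weights: dicts keyed by branch name."""
    return sum(loss_weights[name] * branch_losses[name] for name in branch_losses)

losses = {"branch_1": 0.8, "branch_2": 0.5, "branch_3": 1.2}
weights = {"branch_1": 1.0, "branch_2": 2.0, "branch_3": 0.5}
total = multitask_loss(losses, weights)
```

Here branch_2's doubled weight makes its 0.5 loss contribute as much as a 1.0 loss would, which is exactly the lever the gradient-balancing step manipulates.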
In some optional implementations of this embodiment, the adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network includes: in response to the gradient value of the current initial branch network in all the initial branch networks in the current iteration training period being greater than the gradient values of the other initial branch networks, setting the loss weight value of the current initial branch network to be less than the loss weight values of the other initial branch networks in the next iteration training period.
In this optional implementation manner, the iterative training period refers to a time period during which all initial branch networks in the multi-task network complete the computation of the loss function and the adjustment of the parameters in the current iterative training.
In this optional implementation, the gradient of each initial branch network is monitored during the current iteration training period, and the loss weight value of the current initial branch network is adjusted for the next iteration training period; this provides an adjustment means for multitask network training and ensures its reliability.
Optionally, the adjusting the loss weight value of each initial branch network based on the gradient value of the initial branch network further includes: and in response to the gradient values of the current initial branch network in all the initial branch networks in the multi-iteration training period being greater than the gradient values of the other initial branch networks, setting the loss weight value of the current initial branch network to be less than the loss weight values of the other initial branch networks after the multi-iteration training period.
In this optional implementation manner, after the gradient value of the current initial branch network is monitored for a plurality of iteration cycles, the loss weight value of the current initial branch network is adjusted, so as to provide a reliable basis for the stable training of the multitask network.
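The adjustment rule of these implementations can be sketched as follows. The shrink factor is an illustrative assumption; the embodiment only requires that the branch whose gradient dominates end up with a loss weight below those of the other branches in the following period:

```python
# Sketch of gradient-balanced weight adjustment: the branch with the
# largest gradient value in this period gets a loss weight smaller
# than every other branch's weight for the next period.

def adjust_weights(grad_values, weights, shrink=0.5):
    largest = max(grad_values, key=grad_values.get)
    new_weights = dict(weights)
    # smallest weight among the other branches
    floor = min(w for k, w in weights.items() if k != largest)
    # place the dominant branch strictly below that floor
    new_weights[largest] = min(weights[largest] * shrink, floor * shrink)
    return new_weights

w = adjust_weights({"seg": 5.0, "det": 1.0}, {"seg": 1.0, "det": 1.0})
```

Here the segmentation branch dominated the shared gradient, so its weight is lowered, reducing its pull on the general feature extractor in the next iteration period.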
Step 106, in response to the multitask network satisfying a training completion condition, obtaining the multitask model.
In this embodiment, whether the multitask network meets the training completion condition or not can be detected through the loss value of the multitask network, and after the multitask network meets the training completion condition, the trained multitask model is obtained.
In this embodiment, the training completion condition includes at least one of the following: the number of training iterations of the multitask network reaches a predetermined iteration threshold, or the loss value of the multitask network is smaller than a predetermined loss value threshold. The predetermined iteration threshold is an empirical value based on the loss value of the multitask network. For example, the predetermined iteration threshold of the multitask network may be 10,000 iterations, and the predetermined loss value threshold may be 0.02.
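A minimal sketch of the completion check, using the example thresholds given above (10,000 iterations and a loss value threshold of 0.02):

```python
# Training is complete when either condition holds: the iteration
# count reaches the threshold, or the loss drops below the threshold.

def training_complete(iteration, loss, max_iters=10_000, loss_threshold=0.02):
    return iteration >= max_iters or loss < loss_threshold

done_by_loss = training_complete(iteration=500, loss=0.01)      # loss condition
done_by_iters = training_complete(iteration=10_000, loss=0.5)   # iteration condition
not_done = training_complete(iteration=500, loss=0.5)           # neither condition
```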
Optionally, in this embodiment, in response to the multitask network not satisfying the training completion condition, the relevant parameters in the multitask network are adjusted so that the loss value of the multitask network converges, and training steps 103 to 106 continue to be performed based on the adjusted multitask network.
In this optional implementation, when the multitask network does not satisfy the training completion condition, the relevant parameters of the multitask network are adjusted, which helps the loss value of the multitask network converge.
In this embodiment, if the training is not completed, the loss value of the multitask network can be converged by adjusting the parameters of the multitask network. Specifically, adjusting the relevant parameters in the multitasking network so that the loss value of the multitasking network converges comprises: by executing steps 103 to 106, the parameters of any initial branch network in the multitasking network or the loss weight value of any initial branch network are repeatedly adjusted, so that the loss value of the multitasking network converges.
Optionally, in each iteration process, parameters of more than two initial branch networks may also be adjusted simultaneously, so as to ensure that the loss value of the multitask network gradually becomes smaller until stable.
The multitask model training method provided by the embodiment of the disclosure proceeds as follows: firstly, a training sample set including at least one type of initial image is obtained, each type of initial image being labeled with at least one type of identification element; secondly, a pre-established multitask network is acquired, where the multitask network comprises a general feature extractor and initial branch networks corresponding one to one to the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node; thirdly, initial images are selected from the training sample set and input into the general feature extractor to obtain feature maps corresponding one to one to the selected initial images, where the identification elements of the selected initial images correspond to all the initial branch networks; next, each of the obtained feature maps is input into the initial branch network corresponding to its identification element; finally, the gradient value of each initial branch network at the branch node is collected, the loss weight value of the corresponding initial branch network is adjusted based on the gradient value of each initial branch network, and the multitask model is obtained in response to the multitask network satisfying the training completion condition. Therefore, by setting up a multitask network of a general feature extractor and initial branch networks, and by using the gradient of each initial branch network to adjust its loss weight value during multitask training, a gradient-balanced multitask model is obtained, the training effects of the plural tasks are balanced, and the multitask model can achieve the best effect on each task.
In another embodiment of the present disclosure, the method for training a multitask model further includes: acquiring a newly added image, wherein the newly added image is marked with at least one type of newly added elements; adding a newly-added branch network corresponding to all newly-added elements in the multitask model so that the general feature extractor is also connected with the newly-added branch network through branch nodes; the following new training steps are performed: selecting a newly added image and an initial image, and inputting the selected newly added image and the selected initial image into a general feature extractor simultaneously to obtain a new feature map; inputting the feature maps of the correspondingly selected initial images split from the new feature maps into each initial branch network in sequence; inputting the feature map of the newly added image which is split from the new feature map and correspondingly selected into the newly added branch network; and acquiring gradient values of each initial branch network and each newly added branch network in the branch node, and adjusting the loss weight value of the corresponding initial branch network and/or the newly added branch network based on the gradient values of each initial branch network and the newly added branch network.
In this embodiment, the newly added image is an image different from the initial image, and the newly added image is labeled with labeling information of the newly added element, so as to provide truth value information for the newly added element, thereby facilitating training of the newly added branch network; the newly added branch network is a network that is different from each of the initial branch networks, and the newly added branch network can perform a task different from each of the initial branch networks.
In this embodiment, after the newly added image is input into the general feature extractor, a feature map corresponding to the newly added image is correspondingly generated, and the feature map corresponding to the newly added image is input into the newly added branch network, which is convenient for training of the newly added branch network.
In this embodiment, when the multitask model added with the newly added branch network meets the training completion condition, the new multitask model is obtained, and the new multitask model is added with the newly added branch network relative to the multitask model, so that new task processing can be realized.
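The extension and removal steps above can be sketched with a toy model in which the shared extractor and the branch heads are plain callables. All names and the dict-based structure are illustrative assumptions, not the patent's implementation; real branches would be neural networks:

```python
# Toy multitask model: one shared general feature extractor feeding
# per-task branch heads. Branches can be added or removed without
# touching the extractor or the other branches.

class MultitaskModel:
    def __init__(self, extractor, branches):
        self.extractor = extractor      # shared general feature extractor
        self.branches = dict(branches)  # task name -> branch head

    def add_branch(self, name, head):
        self.branches[name] = head      # connect a newly added branch

    def remove_branch(self, name):
        del self.branches[name]         # other branches are unaffected

    def forward(self, image):
        features = self.extractor(image)          # one shared forward pass
        return {name: head(features) for name, head in self.branches.items()}

model = MultitaskModel(
    extractor=lambda x: x * 2,
    branches={"seg": lambda f: f + 1, "det": lambda f: f - 1},
)
model.add_branch("keypoints", lambda f: f * 10)   # newly added branch network
out = model.forward(3)
# out == {"seg": 7, "det": 5, "keypoints": 60}
```

Because each head consumes the same shared features, adding the new branch requires labeling only its own data and training its head (plus re-balancing the weights), while removing a branch leaves the remaining task outputs unchanged.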
The multitask model training method provided by the embodiment of the disclosure can flexibly extend a generated multitask model with new elements: only the newly added data set corresponding to the newly added elements needs to be labeled and a newly added branch network added, so that different elements can be extended at will.
In another embodiment of the present disclosure, the multi-tasking model training method may further include: one or more initial branching networks in the multitasking model are removed.
The multi-task model training method provided by the embodiment of the disclosure can remove one or more initial branch networks from the trained multi-task model, and after the initial branch networks are removed, the performance of a new multi-task model is not affected, so that the expandability of the multi-task model is ensured.
Optionally, after a newly added branch network is added to the new multitask model, in another optional implementation manner of this embodiment, the multitask model training method may further include: one or more newly added branch networks in the new multitasking model are removed.
Further, based on the multi-task model training method provided by the embodiment, the disclosure also provides an embodiment of a multi-task processing method, and the multi-task processing method disclosed by the disclosure combines the artificial intelligence fields of computer vision, deep learning and the like.
Referring to fig. 3, a flowchart 300 of an embodiment of a multitasking method according to the present disclosure is shown, the multitasking method provided by the present embodiment includes the following steps:
step 301, acquiring an image to be processed.
In this embodiment, the image to be processed may be an image including people, objects, scenery, and other information, and different task processing results can be obtained by processing the image through the multitask model. The execution body of the multitask processing method may acquire the image to be processed in various ways. For example, the execution body may obtain an image to be processed stored in a database server through a wired or wireless connection. For another example, the execution body may also receive, in real time, the image to be processed acquired by a terminal or other device.
In this embodiment, the acquired image to be processed may have an identification element or may not have an identification element, when the image to be processed has an identification element, the identification element on the image to be processed may be of one type or may also be of multiple types, each type of identification element corresponds to one task, and the identification element may be effectively identified based on the general feature extractor in the multitask model and the initial branch network corresponding to the identification element, so as to obtain a task processing result for the identification element.
When the image to be processed does not have the identification element, the multitask model can directly give the task processing result of the undetected identification element.
In this embodiment, the identification element corresponds to a task, for example, in the target detection task, the identification element is a target corresponding to the target detection task, and the target may be a person, an object, or the like in the image to be processed; in the semantic segmentation task, the identification elements are pixel categories of different objects in the image to be processed to be labeled in the semantic segmentation task.
Step 302, inputting the image to be processed into the multitask model to obtain the multitask processing result of the image to be processed.
In this embodiment, the execution body may input the image to be processed obtained in step 301 into the multitask model to obtain the multitask processing result of the image to be processed. It should be noted that the multitask processing result is the result of performing multiple kinds of task processing on the image to be processed, and based on the structure of the multitask model, obtaining all these results in one pass improves the efficiency of processing all the tasks.
In this embodiment, the multitask model may be obtained by training using the method described in the embodiment of fig. 1, and the specific training process may refer to the description related to the embodiment of fig. 1, which is not described herein again.
In this embodiment, the multitask processing results of the image to be processed are determined by the initial branch networks and/or the newly added branch networks of the multitask model, and the total number of initial branch networks and newly added branch networks in the multitask model equals the number of multitask processing results of the image to be processed. For example, if the multitask model has only initial branch networks and there are two of them, there are two multitask processing results of the image to be processed. For another example, if the multitask model includes two initial branch networks and three newly added branch networks, there are five multitask processing results of the image to be processed.
In some optional implementations of this embodiment, the initial branching network of the multitasking model includes: any two or more of the semantic segmentation network, the target detection network and the key point detection network, and the multitasking result comprises the following steps: at least two items of semantic segmentation results, target detection results and key point detection results of targets in the image to be processed.
In this optional implementation, the multitask processing results obtained are different based on the difference of the tasks of the multitask model, and after the multitask model processes the image to be processed, the expression forms of the obtained multitask processing results may also be different.
In the optional implementation mode, the multi-task processing result expression form is set based on the tasks of the multi-task model, so that various optional modes are provided for representing the multi-task processing result, and the diversity of the multi-task model for processing the image to be processed is improved.
The multitasking method provided by the embodiment of the disclosure acquires an image to be processed; and inputting the image to be processed into the multitask model generated by adopting the multitask model training method of the embodiment to obtain a multitask processing result of the image to be processed. Therefore, the multi-task model generated by the universal feature extractor and the plurality of initial branch networks is adopted, the task processing effect of each task can be balanced, and the efficiency of multi-task processing is improved.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a multitask model training device, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the multitask model training device 400 provided in this embodiment includes: a sample acquiring unit 401, a network acquiring unit 402, an image selecting unit 403, a feature inputting unit 404, a gradient adjusting unit 405, and a model obtaining unit 406. The sample acquiring unit 401 may be configured to acquire a training sample set including at least one type of initial image, where each type of initial image is labeled with at least one type of identification element. The network acquiring unit 402 may be configured to acquire a pre-established multitasking network, where the multitasking network includes a general feature extractor and initial branch networks corresponding to the identification elements, and the general feature extractor is connected to each of the initial branch networks through a branch node. The image selecting unit 403 may be configured to select an initial image from the training sample set, input the selected initial image into the general feature extractor, and obtain a feature map corresponding to the selected initial image, where the identification elements of the selected initial image correspond to all the initial branch networks. The feature input unit 404 may be configured to input, for each of the obtained feature maps, the feature map into an initial branch network corresponding to the identification element of the feature map. The gradient adjustment unit 405 may be configured to collect gradient values of each initial branch network in the branch node, and adjust the loss weight values of the corresponding initial branch networks based on the gradient values of each initial branch network. The model obtaining unit 406 may be configured to obtain the multitask model when the multitask network satisfies the training completion condition.
In the present embodiment, in the multitask model training device 400: the detailed processing of the sample obtaining unit 401, the network obtaining unit 402, the image selecting unit 403, the feature input unit 404, the gradient adjusting unit 405, and the model obtaining unit 406 and the technical effects thereof may refer to the related descriptions of step 101, step 102, step 103, step 104, step 105, and step 106 in the corresponding embodiment of fig. 1, which are not described herein again.
In some optional implementations of the present embodiment, the image selecting unit 403 includes: a superposition module (not shown in the figure) and an obtaining module (not shown in the figure). The superposition module may be configured to, in response to the selected initial images being of multiple types, superpose and input the multiple types of initial images into the general feature extractor to obtain a feature map output by the general feature extractor. The obtaining module may be configured to split the feature map output by the general feature extractor according to the types of the multiple types of initial images to obtain the feature maps corresponding to the multiple types of initial images.
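The superposition module and the obtaining module described above can be sketched as follows, with plain lists standing in for real image tensors (the type-keyed layout and the helper name are illustrative assumptions):

```python
# Sketch: stack images of several types into one batch for a single
# shared forward pass, then split the output feature maps back by type.

def superpose_and_split(images_by_type, extractor):
    types, batch = [], []
    for t, imgs in images_by_type.items():
        types.append((t, len(imgs)))   # remember each type's count
        batch.extend(imgs)             # superpose into one batch
    features = [extractor(x) for x in batch]  # one shared extractor pass
    split, start = {}, 0
    for t, n in types:                 # split back by recorded counts
        split[t] = features[start:start + n]
        start += n
    return split

maps = superpose_and_split(
    {"road": [1, 2], "sign": [3]},
    extractor=lambda x: x * 10,
)
# maps == {"road": [10, 20], "sign": [30]}
```

Running all image types through the extractor in a single batch amortizes the shared computation, while the per-type split lets each feature map reach only its matching branch network.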
In some optional implementations of this embodiment, the apparatus 400 further includes: a newly added acquiring unit (not shown in the figure), a network adding unit (not shown in the figure), a newly added selecting unit (not shown in the figure), an initial input unit (not shown in the figure), a newly added input unit (not shown in the figure), and a newly added adjusting unit (not shown in the figure). The newly added acquiring unit may be configured to acquire a newly added image, where the newly added image is labeled with at least one type of newly added element. The network adding unit may be configured to add a newly added branch network corresponding to all the newly added elements in the multitask model, so that the general feature extractor is also connected with the newly added branch network through a branch node. The newly added selecting unit may be configured to select a newly added image and an initial image, and input the selected newly added image and the selected initial image into the general feature extractor simultaneously to obtain a new feature map. The initial input unit may be configured to sequentially input the feature maps, split from the new feature map and corresponding to the selected initial images, into the respective initial branch networks. The newly added input unit may be configured to input the feature map, split from the new feature map and corresponding to the selected newly added image, into the newly added branch network. The newly added adjusting unit may be configured to collect the gradient values of each initial branch network and the newly added branch network at the branch nodes, and adjust the loss weight values of the corresponding initial branch network and/or the newly added branch network based on those gradient values.
In some optional implementations of this embodiment, the initial branch network includes: any two or more of a semantic segmentation network, a target detection network and a key point detection network.
In some optional implementations of the present embodiment, the gradient adjustment unit 405 is further configured to: in response to the gradient value of the current initial branch network in all the initial branch networks in the current iteration training period being greater than the gradient values of the other initial branch networks, setting the loss weight value of the current initial branch network to be less than the loss weight values of the other initial branch networks in the next iteration training period.
In some optional implementations of the present embodiment, the apparatus 400 further includes: a removal unit (not shown in the figure). Wherein the removing unit may be configured to remove one or more initial branch networks in the multitasking model.
In the multitask model training device provided by the embodiment of the present disclosure, first, the sample acquiring unit 401 acquires a training sample set including at least one type of initial image, where each type of initial image is labeled with at least one type of identification element; secondly, the network acquiring unit 402 acquires a pre-established multitask network, where the multitask network includes a general feature extractor and initial branch networks corresponding one to one to the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node; thirdly, the image selecting unit 403 selects initial images from the training sample set and inputs them into the general feature extractor to obtain feature maps corresponding one to one to the selected initial images, where the identification elements of the selected initial images correspond to all the initial branch networks; next, for each of the obtained feature maps, the feature input unit 404 inputs the feature map into the initial branch network corresponding to the identification element of the feature map; then, the gradient adjustment unit 405 collects the gradient value of each initial branch network at the branch node and adjusts the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network; finally, the model obtaining unit 406 obtains the multitask model in response to the multitask network satisfying the training completion condition.
Therefore, by setting the general feature extractor and the multi-task network of the initial branch networks and adopting the gradient of each initial branch network to adjust the loss weight value of the initial branch network during multi-task training, the multi-task model based on gradient balance is obtained, the training effect of a plurality of tasks is balanced, and the multi-task model can achieve the best effect on each task.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a multitasking device, which corresponds to the embodiment of the method shown in fig. 3, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the multitasking device 500 provided by the present embodiment includes: an acquisition unit 501 and an input unit 502. The acquiring unit 501 may be configured to acquire an image to be processed. The input unit 502 may be configured to input the image to be processed into the multitasking model generated by the apparatus as described in the embodiment of fig. 3, and obtain a multitasking result of the image to be processed.
In the present embodiment, in the multitasking device 500: the detailed processing of the obtaining unit 501 and the input unit 502 and the technical effects thereof can refer to the related descriptions of step 301 and step 302 in the corresponding embodiment of fig. 3, which are not repeated herein.
In some optional implementations of this embodiment, the multitasking result includes: at least two items of semantic segmentation results, target detection results and key point detection results of targets in the image to be processed.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the multitask model training method or the multitask processing method. For example, in some embodiments, the multitask model training method or the multitask processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the multitask model training method or the multitask processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the multitask model training method or the multitask processing method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable multitasking model training device, multitasking device, or the like, such that the program codes, when executed by the processor or controller, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of multitask model training, the method comprising:
acquiring a training sample set comprising at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification element;
acquiring a pre-established multitask network, wherein the multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements, and the general feature extractor is connected with each initial branch network through a branch node;
the following training steps are performed:
selecting initial images from the training sample set and inputting the selected initial images into the general feature extractor to obtain feature maps in one-to-one correspondence with the selected initial images, wherein the identification elements of the selected initial images cover all of the initial branch networks;
for each of the obtained feature maps, inputting the feature map into the initial branch network corresponding to the identification element of that feature map;
and acquiring the gradient value of each initial branch network at the branch node, adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network, and, in response to the multitask network satisfying a training completion condition, obtaining the multitask model.
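The training step of claim 1 can be sketched in plain Python. All names here (the extractor and branch callables, the per-branch gradient values read at the branch node) are illustrative assumptions, and the inverse-gradient weighting is one plausible reading of the adjustment rule, not the patent's actual implementation:

```python
# Illustrative sketch of the claim-1 training step: a shared feature
# extractor feeds per-element branch networks, and each branch's loss
# weight is adjusted from the gradient magnitude observed at the
# branch node. All names are hypothetical.

def adjust_loss_weights(gradient_values):
    """Give branches with larger gradients smaller loss weights,
    so no single task dominates the shared extractor."""
    inverses = {name: 1.0 / g for name, g in gradient_values.items()}
    total = sum(inverses.values())
    return {name: inv / total for name, inv in inverses.items()}

def training_step(batch, extractor, branches, loss_weights):
    """One iteration: shared features -> per-branch losses -> weighted sum."""
    feature_maps = {name: extractor(img) for name, img in batch.items()}
    losses, gradients = {}, {}
    for name, branch in branches.items():
        # each branch is assumed to return (loss, gradient-at-branch-node)
        loss, grad = branch(feature_maps[name])
        losses[name] = loss_weights[name] * loss
        gradients[name] = grad
    total_loss = sum(losses.values())
    next_weights = adjust_loss_weights(gradients)  # weights for next period
    return total_loss, next_weights
```

In a real network the gradient values would be read with framework hooks at the shared branch node; the toy callables above stand in for that machinery.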
2. The method according to claim 1, wherein the inputting the selected initial images into the general feature extractor to obtain the feature maps in one-to-one correspondence with the selected initial images comprises:
in response to the selected initial images being of multiple types, superimposing the multiple types of initial images and inputting them into the general feature extractor to obtain a feature map output by the general feature extractor;
and splitting the feature map output by the general feature extractor according to the types of the multiple types of initial images to obtain the feature maps corresponding to the respective types of initial images.
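The stack-then-split scheme of claim 2 can be sketched as follows; the per-type dictionary layout and the element-wise extractor are illustrative assumptions standing in for a batched forward pass:

```python
# Sketch of claim 2: images of several types are stacked into one batch
# so the general feature extractor runs once, and its output is then
# split back per type using the per-type counts.

def batch_and_split(images_by_type, extractor):
    """Run one shared extractor pass over all types, then split by type."""
    order = list(images_by_type)                    # remember type order
    stacked = [img for t in order for img in images_by_type[t]]
    features = [extractor(img) for img in stacked]  # one shared pass
    split, offset = {}, 0
    for t in order:
        n = len(images_by_type[t])                  # count drives the split
        split[t] = features[offset:offset + n]
        offset += n
    return split
```

With tensors, the stacking and splitting would be a batch-dimension concatenation and slicing; the list version keeps the bookkeeping visible.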
3. The method of claim 1, further comprising:
acquiring a newly added image, wherein the newly added image is labeled with at least one type of newly added element;
adding, to the multitask model, a newly added branch network corresponding to all of the newly added elements, so that the general feature extractor is also connected with the newly added branch network through the branch node;
the following new training steps are performed:
selecting a newly added image and an initial image, and inputting the selected newly added image and the selected initial image simultaneously into the general feature extractor to obtain a new feature map;
inputting the feature maps, split from the new feature map, that correspond to the selected initial images into the respective initial branch networks;
inputting the feature map, split from the new feature map, that corresponds to the selected newly added image into the newly added branch network;
and acquiring the gradient values of each initial branch network and the newly added branch network at the branch node, and adjusting the loss weight value of the corresponding initial branch network and/or the newly added branch network based on those gradient values.
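The extension mechanism of claims 3 and 6 — attaching a new branch at the shared branch node and later removing branches that are no longer needed — can be sketched with a simple registry; the dict-based model class is a hypothetical stand-in for the actual network:

```python
# Sketch of claims 3 and 6: a trained multitask model keeps its existing
# branches and gains (or sheds) branch heads at the same branch node.

class MultitaskModel:
    def __init__(self, extractor, branches):
        self.extractor = extractor
        self.branches = dict(branches)      # element name -> branch head

    def add_branch(self, name, branch):
        """Attach a newly added branch for a new element type (claim 3)."""
        self.branches[name] = branch

    def remove_branch(self, name):
        """Drop a branch that is no longer needed (claim 6)."""
        del self.branches[name]

    def forward(self, image):
        """Shared features fan out to every registered branch."""
        features = self.extractor(image)
        return {name: head(features) for name, head in self.branches.items()}
```

Because every head consumes the same shared features, adding or removing a head never touches the general feature extractor, which is the point of branching at a single node.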
4. The method of claim 1, wherein the initial branch network comprises: any two of a semantic segmentation network, a target detection network and a key point detection network.
5. The method of claim 1, wherein the adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network comprises:
in response to the gradient value of a current initial branch network, among all the initial branch networks, being greater than the gradient values of the other initial branch networks in the current iteration training period, setting the loss weight value of the current initial branch network in the next iteration training period to be less than the loss weight values of the other initial branch networks.
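One way to satisfy the claim-5 constraint — the branch with the largest gradient in the current period gets a weight strictly smaller than every other branch's in the next period — is an inverse-rank scheme. This particular scheme is an illustrative assumption; the claim only fixes the ordering, not the formula:

```python
# Minimal sketch of the claim-5 rule: rank branches by their gradient
# value at the branch node and assign next-period loss weights in the
# opposite order, normalized to sum to 1.

def next_period_weights(gradients):
    """Assign next-period loss weights by inverse gradient rank."""
    ranked = sorted(gradients, key=gradients.get)       # smallest grad first
    # smallest gradient -> largest raw weight, largest gradient -> raw 1
    raw = {name: len(ranked) - i for i, name in enumerate(ranked)}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}
```

A rank-based rule is insensitive to the absolute gradient scale, which matters when branch losses (e.g. segmentation vs. detection) live on very different scales.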
6. The method of claim 1, further comprising:
removing one or more initial branching networks in the multitasking model.
7. A method of multitasking, the method comprising:
acquiring an image to be processed;
inputting the image to be processed into a multitask model generated by adopting the method according to any one of claims 1-6, and outputting a multitask processing result of the image to be processed.
8. The method of claim 7, wherein the multitasking result comprises:
at least two items of semantic segmentation results, target detection results and key point detection results of targets in the image to be processed.
9. A multitask model training device, said device comprising:
a sample acquisition unit configured to acquire a training sample set comprising at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification element;
a network acquisition unit configured to acquire a pre-established multitask network, wherein the multitask network comprises a general feature extractor and initial branch networks in one-to-one correspondence with the types of identification elements, and the general feature extractor is connected with each initial branch network through a branch node;
an image selecting unit configured to select initial images from the training sample set and input the selected initial images into the general feature extractor to obtain feature maps in one-to-one correspondence with the selected initial images, wherein the identification elements of the selected initial images cover all of the initial branch networks;
a feature input unit configured to, for each of the obtained feature maps, input the feature map into the initial branch network corresponding to the identification element of that feature map;
a gradient adjustment unit configured to acquire the gradient value of each initial branch network at the branch node and adjust the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network;
a model obtaining unit configured to obtain a multitask model when the multitask network satisfies a training completion condition.
10. The apparatus of claim 9, wherein the image selecting unit comprises:
a superposition module configured to, in response to the selected initial images being of multiple types, superimpose the multiple types of initial images and input them into the general feature extractor to obtain a feature map output by the general feature extractor;
and an obtaining module configured to split the feature map output by the general feature extractor according to the types of the multiple types of initial images to obtain the feature maps corresponding to the respective types of initial images.
11. The apparatus of claim 9, the apparatus further comprising:
a newly added acquisition unit configured to acquire a newly added image, the newly added image being labeled with at least one type of newly added element;
a network adding unit configured to add, to the multitask model, a newly added branch network corresponding to all of the newly added elements, so that the general feature extractor is also connected with the newly added branch network through the branch node;
a newly added selecting unit configured to select a newly added image and an initial image, and input the selected newly added image and the selected initial image simultaneously into the general feature extractor to obtain a new feature map;
an initial input unit configured to input the feature maps, split from the new feature map, that correspond to the selected initial images into the respective initial branch networks;
a newly added input unit configured to input the feature map, split from the new feature map, that corresponds to the selected newly added image into the newly added branch network;
and a newly added adjusting unit configured to acquire the gradient values of each initial branch network and the newly added branch network at the branch node, and adjust the loss weight value of the corresponding initial branch network and/or the newly added branch network based on those gradient values.
12. The apparatus of claim 9, wherein the initial branch network comprises: any two of a semantic segmentation network, a target detection network and a key point detection network.
13. The apparatus of claim 9, wherein the gradient adjustment unit is further configured to:
in response to the gradient value of a current initial branch network, among all the initial branch networks, being greater than the gradient values of the other initial branch networks in the current iteration training period, setting the loss weight value of the current initial branch network in the next iteration training period to be less than the loss weight values of the other initial branch networks.
14. The apparatus of claim 9, the apparatus further comprising:
a removal unit configured to remove one or more initial branching networks in the multitasking model.
15. A multitasking device, said device comprising:
an acquisition unit configured to acquire an image to be processed;
an input unit configured to input the image to be processed into a multitask model generated by using the apparatus according to any one of claims 9 to 14, and output a multitask processing result of the image to be processed.
16. The apparatus of claim 15, wherein the multitasking result comprises:
at least two items of semantic segmentation results, target detection results and key point detection results of targets in the image to be processed.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
CN202111508235.8A 2021-12-10 2021-12-10 Multitask model training method and device and multitask processing method and device Pending CN114202026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111508235.8A CN114202026A (en) 2021-12-10 2021-12-10 Multitask model training method and device and multitask processing method and device

Publications (1)

Publication Number Publication Date
CN114202026A 2022-03-18

Family

ID=80652203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111508235.8A Pending CN114202026A (en) 2021-12-10 2021-12-10 Multitask model training method and device and multitask processing method and device

Country Status (1)

Country Link
CN (1) CN114202026A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114439A (en) * 2022-08-30 2022-09-27 北京百度网讯科技有限公司 Method and device for multi-task model reasoning and multi-task information processing



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination