CN113610150A - Model training method, object classification method and device, and electronic equipment


Info

Publication number: CN113610150A
Application number: CN202110896590.0A
Authority: CN (China)
Prior art keywords: task, classification, network, target, model
Other languages: Chinese (zh)
Other versions: CN113610150B (granted)
Inventor: Dai Bing (戴兵)
Assignee (current and original): Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Granted; Active

Classifications

    • G06F18/214 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a model training method, an object classification method and device, and electronic equipment, and relates to the field of deep learning, in particular to model training. The specific implementation scheme is as follows: a sample set for model training is obtained; the sample set comprises sample objects of each classification task of a multi-task model, the multi-task model comprises a first-class network and a second-class network, and the second-class network is the network structure other than the first-class network; the multi-task model is trained by using each sample object included in the sample set and the task identifier corresponding to each sample object. During training, each sample object is used to train the network parameters of the second-class network and the network parameters of the first-class network for the corresponding task, where the corresponding task is the classification task identified by the task identifier corresponding to the sample object. With the disclosed scheme, the number of network parameters that need to be trained in the multi-task model can be reduced.

Description

Model training method, object classification method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of deep learning, in particular to the field of model training, and specifically relates to a method for model training, an object classification method, an object classification device and electronic equipment.
Background
The multitask model is a model that can simultaneously implement a plurality of classification tasks for an object. For example, in image classification, multiple image classifications can be simultaneously performed for one image by a multitask model.
In the related art, a multitask model is obtained mainly through training in a hard parameter sharing mode. However, the hard parameter sharing mode results in a multitask model with a large number of network parameters that need to be trained.
Disclosure of Invention
The disclosure provides a model training method, an object classification method and device, and electronic equipment for reducing the network parameters that need to be trained in a multi-task model.
According to an aspect of the present disclosure, there is provided a method of model training, comprising:
obtaining a sample set for model training; the sample set comprises a sample object of each classification task of a multitask model, the multitask model comprises a first network and a second network, and the first network comprises a full connection layer and a normalization layer in a terminal feature extraction layer; the second type of network is a network structure except the first type of network;
training the multi-task model by using each sample object included in the sample set and the task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is the identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-class network and the network parameters of the first-class network for the corresponding task, and the corresponding task is a classification task with a task identifier corresponding to the sample object.
According to another aspect of the present disclosure, there is provided an object classification method including:
acquiring a target object to be classified;
performing multi-task classification on the target object based on a pre-trained target multi-task model to obtain a classification result of each classification task;
the target multi-task model is a model obtained by training with any one of the above model training methods.
According to another aspect of the present disclosure, there is provided an apparatus for model training, including:
the sample set acquisition module is used for acquiring a sample set for model training; the sample set comprises a sample object of each classification task of a multitask model, the multitask model comprises a first network and a second network, and the first network comprises a full connection layer and a normalization layer in a terminal feature extraction layer; the second type of network is a network structure except the first type of network;
the model training module is used for training the multi-task model by utilizing each sample object included in the sample set and the task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is the identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-class network and the network parameters of the first-class network for the corresponding task, and the corresponding task is a classification task with a task identifier corresponding to the sample object.
According to another aspect of the present disclosure, there is provided an object classification apparatus including:
the object acquisition module is used for acquiring a target object to be classified;
the object classification module is used for carrying out multi-task classification on the target object based on a pre-trained target multi-task model to obtain a classification result of each classification task;
the target multi-task model is a model obtained by training with any one of the above model training apparatuses.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of model training or a method of object classification.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method of model training or an object classification method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method of model training or an object classification method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a diagram of a multitask model trained by hard parameter sharing in the related art;
fig. 2 is a schematic diagram of a ResNet50 network in the related art;
FIG. 3 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a multitasking model provided according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a method of model training or a method of object classification in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the continuous development of deep learning, object classification in deep learning also becomes an important algorithm task at present.
In practice, it is often necessary to perform multiple parallel and unrelated classification tasks. If each classification task requires its own object classification model, training and deploying these object classification models greatly increases the workload and lowers deployment efficiency.
A multitask model, which is a model capable of simultaneously implementing a plurality of classification tasks on an object, is an effective means for solving these problems. For example, in image classification, multiple image classifications can be simultaneously performed for one image by a multitask model.
In the related art, a multitask model is obtained mainly through training in a hard parameter sharing mode. In brief, in such a multitask model all classification tasks share the bottom feature extraction layers, while each classification task has its own terminal feature extraction layer and fully connected layer. Fig. 1 is a schematic diagram of a multitask model trained by hard parameter sharing in the related art. In Fig. 1, the three tasks Task A, Task B and Task C share the bottom feature extraction layers (the shared layers in Fig. 1), and each task has its own terminal feature extraction layer and fully connected layer on top (the task-specific layers in the figure), forming a structure with a shared bottom and independent tops.
Illustratively, fig. 2 is a schematic diagram of a ResNet50 network in the related art. Input is the input layer, which receives the object to be processed; the stem layer is the initial layer, comprising a convolution layer and a pooling layer; layer1, layer2, layer3 and layer4 form the main structure of ResNet50, where layer4 is composed of the last 3 bottleneck blocks of ResNet50. Each layer contains several bottleneck structures; a bottleneck structure is shown in the right half of fig. 2 and is composed of three conv + bn + relu stages, where conv denotes a convolution layer, bn denotes a batch normalization layer, and relu is the activation function layer with the expression f(x) = max(0, x). GAP (Global Average Pooling) denotes a global pooling layer that outputs a fixed 2048-dimensional vector; FC is a fully connected layer.
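For illustration, the bottleneck structure described above can be sketched in PyTorch as three conv + bn + relu stages with a residual connection. This is a minimal sketch; the channel sizes and the projection shortcut are simplifying assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified ResNet bottleneck: three conv + bn + relu stages plus a residual add."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)  # f(x) = max(0, x)
        # Projection shortcut when channel counts differ (simplifying assumption).
        self.shortcut = (nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
                         if in_channels != out_channels else nn.Identity())

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))
```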
If the classical ResNet50 is used as the base network of the multitasking model, the hard parameter sharing approach is as follows: task A, task B and task C share the bottom of the ResNet50 network, namely the input, stem layer, layer1, layer2 and layer3, while each task independently owns its own layer4, GAP and fully connected layer.
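A hedged sketch of this hard-parameter-sharing layout, assuming a PyTorch/torchvision ResNet50 backbone; the per-task class counts and module organization below are illustrative assumptions, not taken from the patent.

```python
import copy
import torch.nn as nn
from torchvision.models import resnet50

class HardSharedMultiTask(nn.Module):
    """Hard parameter sharing: stem + layer1-3 shared, layer4/GAP/FC per task."""
    def __init__(self, num_classes_per_task=(10, 5, 3)):
        super().__init__()
        backbone = resnet50(weights=None)  # torchvision >= 0.13 API (assumption)
        # Shared bottom: stem + layer1..layer3.
        self.shared = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                    backbone.maxpool, backbone.layer1,
                                    backbone.layer2, backbone.layer3)
        # Task-specific top: an independent layer4, GAP and FC per task.
        self.task_layer4 = nn.ModuleList(
            [copy.deepcopy(backbone.layer4) for _ in num_classes_per_task])
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.task_fc = nn.ModuleList(
            [nn.Linear(2048, n) for n in num_classes_per_task])

    def forward(self, x):
        feat = self.shared(x)                      # shared bottom features
        outputs = []
        for layer4, fc in zip(self.task_layer4, self.task_fc):
            h = self.gap(layer4(feat)).flatten(1)  # task-specific layer4 + GAP
            outputs.append(fc(h))                  # task-specific FC
        return outputs  # one logit tensor per task
```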
However, in a neural network model the terminal feature extraction layer usually contains the most network parameters, and in a multi-task classification model each classification task has its own independent terminal feature extraction layer, so the hard parameter sharing mode requires many network parameters to be trained. As a result, a large amount of memory is occupied during training, and deploying the trained model also occupies more memory.
In the related art, there is also a scheme in which all feature extraction layers are shared by the classification tasks and each classification task has only an independent fully connected layer. However, since all network parameters of the feature extraction layers are shared, when one classification task reaches its optimal classification the other classification tasks cannot, i.e., the classification tasks cannot converge at the same time.
In order to solve the above technical problems in the related art, embodiments of the present disclosure provide a method for model training.
It should be noted that, in a specific application, the method for model training provided by the embodiment of the present disclosure may be applied to various electronic devices, such as personal computers, servers, and other devices with data processing capability. In addition, it is understood that the model training method provided by the embodiments of the present disclosure may be implemented by software, hardware, or a combination of software and hardware.
The method for model training provided by the embodiment of the disclosure may include:
obtaining a sample set for model training; the sample set comprises a sample object of each classification task of the multi-task model, the multi-task model comprises a first network and a second network, and the first network comprises a full connection layer and a normalization layer in a terminal feature extraction layer; the second type of network is a network structure except the first type of network;
training a multi-task model by using each sample object included in the sample set and the task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is the identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-class network and the network parameters of the first-class network aiming at the corresponding task, and the corresponding task is a classification task with a task identifier corresponding to the sample object.
In the above scheme provided by the present disclosure, during training each sample object is used to train the network parameters of the second-class network and the network parameters of the first-class network for the classification task identified by the task identifier corresponding to that sample object. In the trained multi-task model, the network parameters of the second-class network are therefore shared by all classification tasks, while the network parameters of the first-class network belong to individual classification tasks. Moreover, because within the terminal feature extraction layer the first-class network only comprises the normalization layer, the network parameters of the terminal feature extraction layer other than those of the normalization layer are shared by all classification tasks, which reduces the number of network parameters unique to each classification task. Therefore, the scheme provided by the disclosure can reduce the network parameters that need to be trained in the multitask model, and further reduce the memory occupied when training and deploying the multi-task model.
Furthermore, each classification task has unique parameters in the normalization layer in the feature extraction layer at the tail end, so that the problem of training conflict among the classification tasks can be avoided.
A method for training a model according to an embodiment of the present disclosure is described below with reference to the accompanying drawings.
As shown in fig. 3, an embodiment of the present disclosure provides a method for model training, which may include the following steps:
s301, obtaining a sample set for model training; the sample set comprises a sample object of each classification task of the multi-task model, the multi-task model comprises a first network and a second network, and the first network comprises a full connection layer and a normalization layer in a terminal feature extraction layer; the second type of network is a network structure except the first type of network;
network parameters in a first type of network in the multi-task model belong to different classification tasks. For example, the classification task of the multitask model includes a classification task 1, a classification task 2, and a classification task 3, and the first-class network of the multitask model includes a network parameter 1 for the classification task 1, a network parameter 2 for the classification task 2, and a network parameter 3 for the classification task 3. The network parameters in the second type network of the multitask model are shared by all classification tasks, namely only one network parameter is included, and the network parameters can be used for all the classification tasks.
The first-class network comprises the fully connected layer and the normalization layer within the terminal feature extraction layer, while the other network layers of the terminal feature extraction layer, such as the convolution layers and activation function layers, belong to the second-class network. A feature extraction layer here refers to a structural stage into which different network architectures are divided. Taking the ResNet50 network as an example, its feature extraction layers include layer1, layer2, layer3 and layer4, each of which contains multiple convolution layers, activation function layers and normalization layers. In the embodiment of the present disclosure, layer4 is the terminal feature extraction layer, which may also be referred to as the highest feature extraction layer.
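For illustration, the split between the two network classes can be sketched with a task-conditional normalization layer: within the terminal feature extraction layer only the normalization parameters are duplicated per classification task, while the convolution and activation layers stay shared. This is a minimal PyTorch sketch under those assumptions, not the patent's own implementation.

```python
import torch.nn as nn

class TaskConditionalBN2d(nn.Module):
    """One BatchNorm2d per classification task; all other parameters stay shared."""
    def __init__(self, num_features, num_tasks):
        super().__init__()
        self.bns = nn.ModuleList([nn.BatchNorm2d(num_features) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        # task_id selects the normalization parameters of the current classification task.
        return self.bns[task_id](x)
```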
The acquired sample set may be pre-established. In order to train the multitask model, the acquired sample set needs to contain sample objects for each classification task of the multitask model. For example, the classification tasks of the multi-task model include classification task 1 and classification task 2, and the sample set includes sample object 1 and sample object 2 for classification task 1, and sample object 3 and sample object 4 for classification task 2, where sample objects 1 and 2 are used to train the multi-task model to implement classification task 1, and sample objects 3 and 4 are used to train the multi-task model to implement classification task 2.
It should be noted that the multitasking model may be a model for classifying objects such as images and audio. Corresponding to the multitasking model, the sample object may be an image, audio, or the like. For example, if the multitask model is a multitask image classification model, such as a model constructed based on a CNN (Convolutional Neural Network), the sample object may be a sample image.
Each classification task of the multi-task model is a classification task for the same kind of object, such as a plurality of classification tasks for classifying images. The classification tasks may be parallel, unrelated tasks; for example, classification task 1 classifies the color of an image while classification task 2 classifies the objects contained in the image. The classification tasks may also be coarse-grained and fine-grained tasks; for example, classification task 1 classifies the objects contained in an image while classification task 2 classifies the buildings contained in the image.
S302, training a multi-task model by using each sample object included in the sample set and a task identifier corresponding to each sample object; wherein, the task identifier corresponding to each sample object is the identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-class network and the network parameters of the first-class network aiming at the corresponding task, and the corresponding task is a classification task with a task identifier corresponding to the sample object.
Wherein, the sample objects contained in the sample set correspond to the task identifiers. Each task identifies a classification task that characterizes the sample object to which it belongs. Optionally, the corresponding relationship between each sample object and the task identifier may be implemented based on the naming of the sample object. For example, if the task number of classification task 1 is 1, the sample objects belonging to classification task 1 may be named 1-xx, such as 1-01, 1-02, etc. Optionally, the task identifier may be a task ID (Identity Document) of the task.
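For illustration only, a sample record might bundle an object path, a calibration label and a task identifier; the field layout below is hypothetical and not prescribed by the disclosure.

```python
# Hypothetical sample records: (image_path, calibration_label, task_id)
samples = [
    ("data/1-01.jpg", 2, 1),  # belongs to classification task 1
    ("data/1-02.jpg", 0, 1),
    ("data/2-01.jpg", 4, 2),  # belongs to classification task 2
]
```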
After the sample set is obtained, the multi-task model can be trained by using each sample object included in the sample set and the task identifier corresponding to each sample object.
In the process of training the multitask model, after each sample object is input into the multitask model, the sample object is only used to train the network parameters of the second-class network and the network parameters of the first-class network for its corresponding task; it is not used to train the network parameters of the first-class network for any other task.
For example, the classification task of the multitask model includes classification task 1 and classification task 2, and the sample set includes sample object 1 and sample object 2 of classification task 1, and sample object 3 and sample object 4 of classification task 2. When the sample object 1 is input to the multitask classification model, the sample object 1 is only used for training the network parameters of the second-class network, and the network parameters of the first-class network for the classification task 1, but not for training the network parameters of the first-class network for the classification task 2.
According to the scheme provided by the disclosure, because within the terminal feature extraction layer the first-class network only comprises the normalization layer, the network parameters of the feature extraction layers other than those of the normalization layer are shared by all classification tasks, which reduces the number of network parameters unique to each classification task. Therefore, the scheme provided by the disclosure can reduce the network parameters that need to be trained in the multitask model. Meanwhile, since each classification task has its own parameters in the normalization layer of the terminal feature extraction layer, the problem of training conflicts among the classification tasks can be avoided.
Based on the embodiment of fig. 3, as shown in fig. 4, a method for model training provided by another embodiment of the present disclosure, the above S302, may include steps S3021 to S3024:
S3021, selecting a target sample object from the sample set;
for example, the target sample object is selected from the sample set in a random manner, or the target sample object is selected from the sample set in a preset non-random selection manner, which will be specifically described in the following embodiments and will not be described herein again.
Optionally, in an implementation manner, in order to fully utilize each sample object in the sample set to train the target classification model, the sample object obtained each time may be recorded, so that an unused sample object may be obtained from the sample set as the target sample object when the sample object is obtained.
S3022, inputting the selected target sample object and the corresponding task identifier into a multi-task model, so that the multi-task model classifies the target sample object based on the network parameters of the second type of network and the network parameters of the first type of network for the designated task to obtain a classification result; the designated task is a classification task with a task identifier corresponding to the target sample object;
and determining the classification task to which the target sample object belongs according to the task identifier corresponding to the target sample object. Thus, the target sample object and corresponding task identification may be input into the multitask model.
After receiving the target sample object and the corresponding task identifier, the multitask model may determine a network parameter for processing the target sample in the first-class network, that is, a network parameter of the first-class network for the designated task, and may further classify the target sample object based on the network parameter of the second-class network and the network parameter of the first-class network for the designated task, so as to obtain a classification result.
S3023, adjusting the network parameters of the second-class network and the network parameters of the first-class network for the designated task based on the difference between the obtained classification result and the calibration result of the target sample object;
after the classification result of the target sample object is obtained, a difference between the classification result and the calibration result of the target sample object, which is also referred to as a model loss, may be calculated.
For example, the target sample object is a sample object for classification task 1, and classification task 1 is used to classify objects into class 1, class 2, and class 3. The target sample object is calibrated to be of class 3. Inputting the target sample object into a multitask model, classifying the target sample object by the multitask model based on the network parameters of the second type of network and the network parameters of the first type of network aiming at the classification task 1, and obtaining a classification result as follows: the probability of class 1 is 20%, the probability of class 2 is 10%, and the probability of class 3 is 70%, then the difference between the calibration result and the classification result can be calculated as: 30%, or 0.3.
After the difference between the classification result and the calibration result of the target sample object is determined, the network parameters of the second-class network and the network parameters of the first-class network for the designated task may be adjusted based on this difference.
For a neural network model, the larger the difference, the larger the adjustment made to the parameters to be adjusted. The network parameters of the second-class network and the network parameters of the first-class network for the designated task in the multitask model can therefore be adjusted, in light of the actual situation and requirements, based on the difference from the calibration result of the target sample object.
Optionally, in an implementation manner, a predetermined parameter adjustment manner may be adopted to adjust the network parameters of the second-class network and the network parameters of the first-class network for the designated task in the multitask model. Illustratively, the predetermined parameter adjustment manner may be stochastic gradient descent, batch gradient descent, or the like.
S3024, judging whether all the sample objects in the sample set are selected; if not, returning to execute the step S3021; otherwise, training is finished.
Optionally, in an implementation manner, after the parameters in the multitask model are adjusted, it is further required to determine whether all the sample objects in the sample set have been selected, and if there are sample objects in the sample set that have not been selected, the step of selecting the target sample object from the sample set may be returned to, that is, the step S3021 is returned to until all the sample objects in the sample set have been selected. If all the sample objects in the sample set have been selected, the training is finished.
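Steps S3021 to S3024 amount to a short training loop. The sketch below assumes a model whose forward pass takes a task identifier and routes through the shared parameters plus the first-class parameters of that task only; the loss function, optimizer and the form of the sample records are assumptions, not prescribed by the disclosure.

```python
import torch
import torch.nn as nn

def train_multitask(model, sample_set, optimizer, device="cpu"):
    """S3021-S3024: iterate until every sample object in the sample set has been selected."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    # sample_set is assumed to yield (image tensor, label tensor, task_id) triples.
    for image, label, task_id in sample_set:          # S3021: select a target sample object
        image, label = image.to(device), label.to(device)
        logits = model(image.unsqueeze(0), task_id)   # S3022: classify for the designated task
        loss = criterion(logits, label.unsqueeze(0))  # S3023: difference from calibration result
        optimizer.zero_grad()
        loss.backward()   # only shared params and this task's first-class params get gradients
        optimizer.step()
    # S3024: the loop ends once all sample objects have been selected.
```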
With the above scheme, the network parameters that need to be trained in the multi-task model can be reduced, and the problem of training conflicts among the classification tasks can be solved. Furthermore, because the target sample object is used to adjust the network parameters of the second-class network and the network parameters of the first-class network for the designated task, the trained multi-task model can execute the designated task.
Optionally, considering that different sample objects require different network parameters to be adjusted during multi-task model training, for each classification task the network parameters of the second-class network and the network parameters of the first-class network for that classification task may first be adjusted based on a plurality of sample objects of that classification task before moving on to the network parameters of another classification task, which prevents the multitask model from frequently switching the network parameters that need to be trained.
Based on this, the method for model training provided in another embodiment of the present disclosure, the above S3021, may include:
selecting a plurality of target sample objects of a target classification task from the sample set; the target classification task is any one of a plurality of classification tasks.
For any classification task, after the classification task is determined as a target classification task, a plurality of target sample objects of the target classification task can be selected from a sample set, and then the target sample objects are sequentially input into a multi-task model so as to adjust network parameters of a second type of network in the multi-task model and network parameters of the target classification task of a first type of network.
At this time, before returning to the step S3021 of selecting the target sample object from the sample set, the method further includes:
from the plurality of classification tasks, a new target classification task is determined.
Wherein the new target classification task may be randomly determined. Optionally, in an implementation manner, the determining a new target classification task from the plurality of classification tasks may include:
and determining a new target classification task from the plurality of classification tasks according to the task selection mode in turn.
For example, the classification tasks of the multitask model include classification task 1, classification task 2 and classification task 3. Classification task 1 is first determined as the target classification task, then classification task 2, then classification task 3; after one round, classification task 1 is determined as the target classification task again, and so on, until training is finished.
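A minimal sketch of determining target classification tasks in turn, assuming the sample set is grouped by task; itertools.cycle is one way to realize the round-robin order, and the batching and stopping condition are simplifying assumptions.

```python
import itertools

def round_robin_batches(samples_by_task, batch_size):
    """Yield (task_id, batch) pairs, rotating through the classification tasks in turn."""
    task_ids = sorted(samples_by_task)
    iterators = {t: iter(samples_by_task[t]) for t in task_ids}
    for task_id in itertools.cycle(task_ids):
        batch = list(itertools.islice(iterators[task_id], batch_size))
        if not batch:
            return  # simplification: stop once a task runs out of unused sample objects
        yield task_id, batch
```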
By selecting the tasks in turn, the network parameters in the second type network in the multi-task model can be prevented from being biased to a certain classification task, so that the accuracy of multi-task model classification can be improved.
With the above scheme, the network parameters that need to be trained in the multi-task model can be reduced, and the problem of training conflicts among the classification tasks can be solved. Furthermore, frequent switching of the network parameters that need to be trained in the multitask model can be avoided.
Optionally, in an embodiment, the first-class network may further include a global pooling layer arranged between the fully connected layer and the terminal feature extraction layer. The terminal feature extraction layer feeds its output feature vectors into the global pooling layer, and the pooled feature vectors are then input into the fully connected layer.
Optionally, in an embodiment, the first-class network may further include a first pooling layer and a feature fusion layer arranged between the fully connected layer and the terminal feature extraction layer, while the second-class network includes a second pooling layer connected to a bottom feature extraction layer; the feature fusion layer is used to fuse the features output by the first pooling layer and the second pooling layer and to input the fused features into the fully connected layer.
The second pooling layer may feed the feature information extracted by the bottom feature extraction layers into the feature fusion layer, and the feature fusion layer fuses the features from the first pooling layer and the second pooling layer and inputs the fused features into the fully connected layer. In this way, the fully connected layer can classify objects based on both the feature information extracted by the bottom feature extraction layers and the feature information extracted by the high-level feature extraction layer, which can improve the classification effect of the multitask model.
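Consistent with the Fig. 6 example later in this description, the fusion can be sketched as pooling a lower-level feature map and concatenating it with the pooled terminal feature map before the fully connected layer; the channel sizes (512 and 2048) follow the ResNet50 example, and everything else below is an assumption.

```python
import torch
import torch.nn as nn

class FusedHead(nn.Module):
    """Fuse pooled bottom-layer and terminal features, then classify with a per-task FC."""
    def __init__(self, low_channels=512, high_channels=2048, num_classes=10):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)        # plays the role of both pooling layers
        self.fc = nn.Linear(low_channels + high_channels, num_classes)

    def forward(self, low_feat, high_feat):
        low = self.gap(low_feat).flatten(1)        # e.g. layer2 output -> 512-d
        high = self.gap(high_feat).flatten(1)      # layer4 output -> 2048-d
        fused = torch.cat([low, high], dim=1)      # feature fusion layer -> 2560-d
        return self.fc(fused)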
As shown in fig. 5, an embodiment of the present disclosure provides an object classification method, which may include the following steps:
s501, acquiring a target object to be classified;
after the multi-task model trained by the model training method provided by the invention is deployed, the multi-task model can realize object classification. At this time, the target object to be classified may be acquired.
And S502, performing multi-task classification on the target object based on a pre-trained target multi-task model to obtain a classification result of each classification task.
After the target object is obtained, multi-task classification can be performed on the target object based on a pre-trained target multi-task model, and classification results of all classification tasks are obtained.
For example, each classification task of the target multitask model 1 includes a classification task 1, a classification task 2 and a classification task 3. Then the target object can be classified in a multi-task manner by using the target multi-task model 1, and the classification result of the classification task 1, the classification result of the classification task 2, and the classification result of the classification task 3 are obtained.
Optionally, in an implementation manner, the target object may be input into the target multitask model, so that the target multitask model performs multitask classification on the target object based on the network parameters of the second-class network and the network parameters of the first-class network for each classification task, and a classification result of each classification task is obtained.
After the target object is obtained, the target multitask model may first process it based on the network parameters of the second-class network to obtain an intermediate processing result. The task identifiers of the classification tasks are then set for the intermediate processing result in turn; each time a task identifier is set, the intermediate processing result is processed with the network parameters of the first-class network for the classification task corresponding to the currently set task identifier, so as to obtain the classification result of that classification task.
The object processing may be processing such as feature extraction, feature pooling, and feature classification.
The object processing on the target object based on the network parameters of the second-class network may be feature extraction on the target object with the bottom feature extraction layers, and the intermediate processing result may be the bottom-layer features extracted by the bottom feature extraction layers of the multitasking model.
After the bottom-layer features are extracted, the task identifier of any one of the classification tasks can be set for them. The network parameters of the first-class network for the classification task corresponding to the currently set task identifier are then used to perform feature extraction, feature pooling, feature classification and similar processing on the bottom-layer features, so as to obtain the classification result of that classification task.
And after the classification result is obtained, continuing to set the task identifier of the next classification task for the extracted bottom layer features, and repeating the process until the classification results of all classification tasks are obtained.
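The flow of S501 and S502 can be sketched as running the shared second-class network once and then the task-specific first-class parameters once per task identifier. The method names forward_shared and forward_task are assumptions about how such a model could be organized, not an API from the disclosure.

```python
import torch

@torch.no_grad()
def classify_all_tasks(model, target_object, task_ids):
    """Run the shared network once, then each task's specific parameters in turn."""
    model.eval()
    shared_feat = model.forward_shared(target_object)   # intermediate processing result
    results = {}
    for task_id in task_ids:                             # set each task identifier in turn
        results[task_id] = model.forward_task(shared_feat, task_id)
    return results
```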
According to the scheme provided by the disclosure, once the target object is acquired, multi-task classification can be performed on it and the classification result of each classification task obtained, so the target object can be classified quickly.
In order to better understand the scheme provided by the present disclosure, a scenario in which a multitask model for image classification is built on a ResNet50 network, as shown in fig. 6, is taken as an example. The second-class network in this multitasking model comprises the stem layer, layer1, layer2 and layer3 of ResNet50 as well as the conv and relu layers in layer4, and the first-class network comprises the bn layers in layer4, the GAP layer and the FC layer.
Assume there are now 4 image classification tasks. The 4 image classification tasks share the stem layer and the bottom feature extraction layers layer1, layer2 and layer3 of ResNet50, as well as the conv and relu layers in layer4; only bn is not shared. In layer4, each image classification task has its own bn parameters in the bn layers, and its own network parameters in the GAP layer and the FC layer. The image classification tasks can therefore converge simultaneously, and the number of bn parameters is small. As a result, training and deployment require almost no additional memory, inference and deployment become much more convenient, and more image classification tasks can be supported. When an image classification task needs to be added, only a small number of bn parameters corresponding to that task need to be added in the bn layers, so the scheme has good extensibility. In addition, the scheme is also applicable to other classification tasks whose networks contain bn structures.
When a multitask model based on the ResNet50 network needs to be trained, each sample object in the sample set needs to carry a task identifier (such as a task ID) in addition to its image path and calibration result (such as a label). After the bottom-layer features of a sample object are extracted by the bottom feature extraction layers, the bn parameters to be used in the bn layers of layer4, i.e., those belonging to the image classification task indicated by the task identifier, can be determined from the task identifier corresponding to the sample object. For example, when image classification task 0 is trained, only the bn parameters with the suffix _0 in the bn layers of layer4, such as bn1_0, bn2_0 and bn3_0, are used, while the bn parameters belonging to other image classification tasks in the bn layers of layer4 are not used.
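To make the Fig. 6 example concrete, the sketch below shows how a bottleneck inside layer4 could keep its convolutions shared while holding one set of bn parameters per image classification task (the analogue of bn1_0, bn1_1, ...). The structure is an illustrative assumption, not code from the patent, and the residual branch is omitted for brevity.

```python
import torch.nn as nn

class TaskBNBottleneck(nn.Module):
    """Bottleneck whose convolutions are shared and whose BN layers are per-task."""
    def __init__(self, in_ch, mid_ch, out_ch, num_tasks):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # bn1_t, bn2_t, bn3_t for every task t: the only task-specific parameters here.
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(mid_ch) for _ in range(num_tasks)])
        self.bn2 = nn.ModuleList([nn.BatchNorm2d(mid_ch) for _ in range(num_tasks)])
        self.bn3 = nn.ModuleList([nn.BatchNorm2d(out_ch) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        out = self.relu(self.bn1[task_id](self.conv1(x)))
        out = self.relu(self.bn2[task_id](self.conv2(out)))
        out = self.bn3[task_id](self.conv3(out))
        return self.relu(out)  # residual branch omitted for brevity
```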
When classifying target images, a target image may be input into the multitask model based on the ResNet50 network. After passing through the stem layer, layer1, layer2 and layer3, a 14 × 14 × 1024 feature map N is output. When passing through layer4, the task identifier of an image classification task is set, and the bn parameters corresponding to that task identifier in the bn layers of layer4 are used for processing. After layer4 finishes processing, its result is output through the corresponding GAP and FC layers. Because there are 4 image classification tasks, the task identifiers of the 4 image classification tasks need to be set in turn and the above process performed 4 times, thereby obtaining the classification results of the 4 image classification tasks.
Further, as shown in fig. 6, the output features of layer2 may be passed through GAP to obtain 512-dimensional features, which are concatenated with the 2048-dimensional features obtained by passing the layer4 output through GAP, yielding 2560-dimensional features. In this way the lower-level features of layer2 are also utilized, which can further improve the multi-task classification effect.
According to an embodiment of the present disclosure, as shown in fig. 7, the present disclosure further provides an apparatus for model training, the apparatus including:
a sample set obtaining module 701, configured to obtain a sample set for model training; the sample set comprises a sample object of each classification task of the multi-task model, the multi-task model comprises a first network and a second network, and the first network comprises a full connection layer and a normalization layer in a terminal feature extraction layer; the second type of network is a network structure except the first type of network;
a model training module 702, configured to train a multi-task model by using each sample object included in the sample set and a task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is the identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-class network and the network parameters of the first-class network aiming at the corresponding task, and the corresponding task is a classification task with a task identifier corresponding to the sample object.
Optionally, the model training module includes:
the object selection submodule is used for selecting a target sample object from the sample set;
the object classification submodule is used for inputting the selected target sample object and the corresponding task identifier into the multitask model so that the multitask model classifies the target sample object based on the network parameters of the second type of network and the network parameters of the first type of network aiming at the specified task to obtain a classification result; the designated task is a classification task with a task identifier corresponding to the target sample object;
the network parameter adjusting submodule is used for adjusting the network parameters of the second-class network and the network parameters of the first-class network for the designated task based on the difference between the obtained classification result and the calibration result of the target sample object, and for triggering the object selection submodule again until all the sample objects in the sample set have been selected.
Optionally, the object selection sub-module is further configured to select a plurality of target sample objects of the target classification task from the sample set; the target classification task is any one of a plurality of classification tasks;
the network parameter adjustment submodule also comprises:
and the task determination submodule is used for determining a new target classification task from the plurality of classification tasks before returning to the execution object selection submodule.
Optionally, the task determining sub-module is further configured to determine a new target classification task from the multiple classification tasks according to a task selection manner in turn.
Optionally, the first-class network further includes a first pooling layer and a feature fusion layer arranged between the fully connected layer and the terminal feature extraction layer; the second-class network includes a second pooling layer connected to a bottom feature extraction layer; and the feature fusion layer is used to fuse the features output by the first pooling layer and the second pooling layer and to input the fused features into the fully connected layer.
In the above scheme provided by the present disclosure, during training each sample object is used to train the network parameters of the second-class network and the network parameters of the first-class network for the classification task identified by the task identifier corresponding to that sample object. In the trained multi-task model, the network parameters of the second-class network are therefore shared by all classification tasks, while the network parameters of the first-class network belong to individual classification tasks. Moreover, because within the terminal feature extraction layer the first-class network only comprises the normalization layer, the network parameters of the terminal feature extraction layer other than those of the normalization layer are shared by all classification tasks, which reduces the number of network parameters unique to each classification task. Therefore, the scheme provided by the disclosure can reduce the network parameters that need to be trained in the multitask model.
Furthermore, each classification task has unique parameters in the normalization layer in the feature extraction layer at the tail end, so that the problem of training conflict among the classification tasks can be avoided.
According to an embodiment of the object classification method of the present disclosure, as shown in fig. 8, the present disclosure also provides an object classification apparatus, the apparatus including:
an object obtaining module 801, configured to obtain a target object to be classified;
an object classification module 802, configured to perform multi-task classification on a target object based on a pre-trained target multi-task model to obtain a classification result of each classification task;
the target multi-task model is a model obtained by training with the model training apparatus provided by the present disclosure.
Optionally, the object classification module includes:
and the object input sub-module is used for inputting the target object into the target multitask model so as to enable the target multitask model to carry out multitask classification on the target object based on the network parameters of the second type network and the network parameters of the first type network aiming at each classification task to obtain the classification result of each classification task.
Optionally, the target multitask model is further configured to process the target object based on the network parameters of the second-class network to obtain an intermediate processing result; and, each time a task identifier is set for the intermediate processing result, to process the intermediate processing result with the network parameters of the first-class network for the classification task corresponding to the currently set task identifier, so as to obtain the classification result of that classification task.
According to the scheme provided by the disclosure, once the target object is acquired, multi-task classification can be performed on it and the classification result of each classification task obtained, so the target object can be classified quickly.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
An embodiment of the present disclosure provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of model training or an object classification method.
The disclosed embodiments provide a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method of model training or an object classification method.
Embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements a method of model training or an object classification method.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902 and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 901 performs the respective methods and processes described above, such as a method of model training or an object classification method. For example, in some embodiments, the method of model training or the object classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above described method of model training or object classification method may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform a method of model training or an object classification method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of model training, comprising:
obtaining a sample set for model training; the sample set comprises a sample object of each classification task of a multitask model, the multitask model comprises a first-type network and a second-type network, and the first-type network comprises a fully connected layer and a normalization layer in a terminal feature extraction layer; the second-type network is the network structure of the multitask model other than the first-type network;
training the multi-task model by using each sample object included in the sample set and the task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is an identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-type network and the network parameters of the first-type network for the corresponding task, the corresponding task being the classification task identified by the task identifier corresponding to the sample object.
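By way of non-limiting illustration only, the network split recited in claim 1 can be sketched in PyTorch as a shared backbone (second-type network) plus one normalization layer and fully connected layer per classification task (first-type network). All module names, layer sizes, and task identifiers below are hypothetical and are not taken from the disclosure:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Sketch: shared second-type backbone plus per-task first-type heads."""

    def __init__(self, task_num_classes: dict, feat_dim: int = 512):
        super().__init__()
        # Second-type network: parameters shared by every classification task.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # First-type network: a normalization layer and a fully connected
        # layer trained separately for each classification task.
        self.task_heads = nn.ModuleDict({
            task_id: nn.Sequential(nn.BatchNorm1d(feat_dim),
                                   nn.Linear(feat_dim, num_classes))
            for task_id, num_classes in task_num_classes.items()
        })

    def forward(self, x: torch.Tensor, task_id: str) -> torch.Tensor:
        features = self.backbone(x)                 # shared parameters
        return self.task_heads[task_id](features)   # parameters of the designated task
```

For example, model = MultiTaskModel({"task_a": 10, "task_b": 5}) would build one shared backbone and two independent task heads.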
2. The method according to claim 1, wherein the training the multitask model by using each sample object included in the sample set and a task identifier corresponding to each sample object comprises:
selecting a target sample object from the sample set;
inputting the selected target sample object and the corresponding task identifier into the multitask model, so that the multitask model classifies the target sample object based on the network parameters of the second-type network and the network parameters of the first-type network for a designated task to obtain a classification result; the designated task is the classification task identified by the task identifier corresponding to the target sample object;
adjusting the network parameters of the second-type network and the network parameters of the first-type network for the designated task, based on a difference between the obtained classification result and a calibration result of the target sample object; and returning to the step of selecting a target sample object from the sample set until all sample objects in the sample set have been selected.
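A minimal sketch of the per-step parameter update described in claim 2, assuming the hypothetical MultiTaskModel above: only the shared parameters and the designated task's head are handed to the optimizer, so the heads of the other classification tasks are left untouched.

```python
import torch.nn.functional as F
from torch.optim import SGD

def train_step(model, images, labels, task_id, lr=0.01):
    # Gather the second-type (shared) parameters and the first-type
    # parameters of the designated task only.
    params = list(model.backbone.parameters()) + \
             list(model.task_heads[task_id].parameters())
    optimizer = SGD(params, lr=lr)  # stateless SGD, so rebuilding it per step is harmless

    logits = model(images, task_id)            # classification result
    loss = F.cross_entropy(logits, labels)     # difference from the calibration result
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```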
3. The method of claim 2, wherein said selecting a target sample object from said sample set comprises:
selecting a plurality of target sample objects of a target classification task from the sample set; the target classification task is any one of a plurality of classification tasks;
before returning to the step of selecting a target sample object from the sample set, the method further comprises:
determining a new target classification task from the plurality of classification tasks.
4. The method of claim 3, wherein said determining a new target classification task from a plurality of classification tasks comprises:
determining a new target classification task from the plurality of classification tasks in a manner of selecting the tasks in turn.
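The "selecting tasks in turn" of claim 4 can be read as round-robin scheduling over the task identifiers; a small sketch of that reading (the task names are hypothetical):

```python
import itertools

def tasks_in_turn(task_ids):
    """Yield task identifiers cyclically: t1, t2, ..., tn, t1, t2, ..."""
    yield from itertools.cycle(task_ids)

picker = tasks_in_turn(["task_a", "task_b", "task_c"])
first_task = next(picker)    # "task_a"
second_task = next(picker)   # "task_b"
```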
5. The method according to any one of claims 1-4, wherein the first-type network further comprises: a first pooling layer and a feature fusion layer arranged between the fully connected layer and the terminal feature extraction layer; the second-type network comprises a second pooling layer connected to the terminal feature extraction layer; and the feature fusion layer is configured to fuse features of the first pooling layer and the second pooling layer and input the fused features into the fully connected layer.
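One possible realization of the layout in claim 5, again as a hedged sketch with hypothetical names: the shared second pooling layer and the per-task first pooling layer both consume the terminal feature map, and the feature fusion layer is approximated here by simple concatenation before the fully connected layer.

```python
import torch
import torch.nn as nn

class TaskHeadWithFusion(nn.Module):
    """Sketch of a per-task head following the layout of claim 5."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.first_pool = nn.AdaptiveMaxPool2d(1)        # first pooling layer (per task)
        self.norm = nn.BatchNorm1d(2 * channels)         # normalization layer (per task)
        self.fc = nn.Linear(2 * channels, num_classes)   # fully connected layer (per task)

    def forward(self, terminal_features, shared_pooled):
        # terminal_features: output of the terminal feature extraction layer, (N, C, H, W)
        # shared_pooled: output of the shared second pooling layer, (N, C)
        task_pooled = self.first_pool(terminal_features).flatten(1)
        fused = torch.cat([task_pooled, shared_pooled], dim=1)  # feature fusion layer
        return self.fc(self.norm(fused))

# The shared second pooling layer (second-type network) could simply be:
second_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
```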
6. An object classification method, comprising:
acquiring a target object to be classified;
performing multi-task classification on the target object based on a pre-trained target multi-task model to obtain a classification result of each classification task;
wherein the target multitask model is a model trained by the method according to any one of claims 1-5.
7. The method of claim 6, wherein the performing multi-task classification on the target object based on the pre-trained target multitask model to obtain a classification result of each classification task comprises:
inputting the target object into the target multitask model, so that the target multitask model performs multi-task classification on the target object based on the network parameters of the second-type network and the network parameters of the first-type network for each classification task, to obtain a classification result of each classification task.
8. The method of claim 7, wherein the target multitask model performing multi-task classification on the target object based on the network parameters of the second-type network and the network parameters of the first-type network for each classification task to obtain a classification result of each classification task comprises:
performing, by the target multitask model, object processing on the target object based on the network parameters of the second-type network to obtain an intermediate processing result;
and after each task identifier is set for the intermediate processing result, performing object processing on the intermediate processing result by using the network parameters of the first-type network for the classification task corresponding to the currently set task identifier, to obtain a classification result of the classification task corresponding to the currently set task identifier.
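For the inference path of claims 7 and 8, the intermediate processing result of the shared second-type network can be computed once and then routed through every task head in turn; a sketch under the same hypothetical MultiTaskModel interface:

```python
import torch

@torch.no_grad()
def classify_all_tasks(model, image_batch):
    """Return one classification result per task for the given target objects."""
    model.eval()
    intermediate = model.backbone(image_batch)   # second-type network, run once
    results = {}
    for task_id, head in model.task_heads.items():
        # Setting the task identifier selects that task's first-type parameters.
        results[task_id] = head(intermediate).argmax(dim=1)
    return results
```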
9. An apparatus for model training, comprising:
the sample set acquisition module is used for acquiring a sample set for model training; the sample set comprises a sample object of each classification task of a multitask model, the multitask model comprises a first-type network and a second-type network, and the first-type network comprises a fully connected layer and a normalization layer in a terminal feature extraction layer; the second-type network is the network structure of the multitask model other than the first-type network;
the model training module is used for training the multi-task model by utilizing each sample object included in the sample set and the task identifier corresponding to each sample object;
wherein, the task identifier corresponding to each sample object is an identifier of the classification task to which the sample object belongs; in the training process, each sample object is used for training the network parameters of the second-type network and the network parameters of the first-type network for the corresponding task, the corresponding task being the classification task identified by the task identifier corresponding to the sample object.
10. The apparatus of claim 9, wherein the model training module comprises:
an object selection submodule for selecting a target sample object from the sample set;
the object classification submodule is used for inputting the selected target sample object and the corresponding task identifier into the multitask model so that the multitask model classifies the target sample object based on the network parameters of the second type of network and the network parameters of the first type of network aiming at the specified task to obtain a classification result; the designated task is a classification task with a task identifier corresponding to the target sample object;
a network parameter adjusting submodule, configured to adjust a network parameter of the second type network and a network parameter of the designated task of the first type network based on a difference between the obtained classification result and the calibration result of the target sample object; and returning to execute the object selection submodule until all the sample objects in the sample set are selected.
11. The apparatus of claim 10, wherein the object selection sub-module is further configured to select a plurality of target sample objects of a target classification task from the sample set; the target classification task is any one of a plurality of classification tasks;
the network parameter adjusting submodule further includes:
and the task determination submodule is used for determining a new target classification task from the plurality of classification tasks before returning to the execution of the object selection submodule.
12. The apparatus of claim 11, wherein the task determination sub-module is further configured to determine a new target classification task from the plurality of classification tasks in a manner of selecting tasks in turn.
13. The apparatus according to any one of claims 9-12, wherein the first-type network further comprises: a first pooling layer and a feature fusion layer arranged between the fully connected layer and the terminal feature extraction layer; the second-type network comprises a second pooling layer connected to the terminal feature extraction layer; and the feature fusion layer is configured to fuse features of the first pooling layer and the second pooling layer and input the fused features into the fully connected layer.
14. An object classification apparatus comprising:
the object acquisition module is used for acquiring a target object to be classified;
the object classification module is used for carrying out multi-task classification on the target object based on a pre-trained target multi-task model to obtain a classification result of each classification task;
wherein the target multitask model is a model trained by the apparatus according to any one of claims 9-13.
15. The apparatus of claim 14, wherein the object classification module comprises:
and the object input sub-module is used for inputting the target object into a target multitask model so as to enable the target multitask model to carry out multitask classification on the target object based on the network parameters of the second type network and the network parameters of the first type network aiming at each classification task to obtain the classification result of each classification task.
16. The apparatus of claim 15, wherein the target multitasking model is further configured to perform object processing on the target object based on network parameters of a second type of network to obtain an intermediate processing result; and after the task identification is set for each intermediate processing result, carrying out object processing on the intermediate processing result by using the network parameters of the first-class network aiming at the classification task corresponding to the currently set task identification to obtain the classification result of the classification task corresponding to the currently set task identification.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or 6-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5 or 6-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-5 or 6-8.
CN202110896590.0A 2021-08-05 2021-08-05 Model training method, object classification device and electronic equipment Active CN113610150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110896590.0A CN113610150B (en) 2021-08-05 2021-08-05 Model training method, object classification device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113610150A true CN113610150A (en) 2021-11-05
CN113610150B CN113610150B (en) 2023-07-25

Family

ID=78307121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110896590.0A Active CN113610150B (en) 2021-08-05 2021-08-05 Model training method, object classification device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113610150B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019100998A1 (en) * 2017-11-24 2019-05-31 腾讯科技(深圳)有限公司 Voice signal processing model training method, electronic device, and storage medium
CN112596868A (en) * 2020-11-27 2021-04-02 出门问问(武汉)信息科技有限公司 Model training method and device
CN112559007A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and electronic equipment
CN112561077A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Training method and device of multi-task model and electronic equipment
CN112527383A (en) * 2020-12-15 2021-03-19 北京百度网讯科技有限公司 Method, apparatus, device, medium, and program for generating multitask model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TANG YIPING; WANG LIRAN; HE XIA; CHEN PENG; YUAN GONGPING: "Research on Tongue Image Classification Based on Multi-task Convolutional Neural Network", Computer Science, no. 12 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860405A (en) * 2022-05-17 2022-08-05 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and storage medium
CN114860405B (en) * 2022-05-17 2023-01-31 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant