CN113609951B - Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium

Info

Publication number: CN113609951B
Application number: CN202110873991.4A
Authority: CN (China)
Prior art keywords: target, target detection, channel, detection model, training
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN113609951A
Inventors: 王云浩, 陈松, 张滨, 辛颖, 冯原, 王晓迪, 龙翔, 贾壮, 彭岩, 郑弘晖, 李超, 谷祎, 韩树民
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority application: CN202110873991.4A (published as CN113609951A; granted as CN113609951B)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The disclosure provides a training method and a target detection method for a target detection model, together with corresponding devices, equipment and media. It relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and can be used in smart city and intelligent traffic scenarios. The specific implementation scheme is as follows: extracting initial convolution features of a first sample picture of a first category by using a feature extraction layer of a first target detection model; determining a target channel to be enhanced in the initial convolution features by using a feature enhancement layer of the first target detection model, and enhancing the target channel to obtain target convolution features; obtaining a target detection result for the first sample picture based on the target convolution features; and training the first target detection model based on a first loss value corresponding to the target detection result. After the first target detection model obtained by this training method is trained again on a small number of sample pictures of a novel category, it still has a high generalized detection capability for pictures of that novel category.

Description

Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning techniques, which are particularly useful in smart cities and intelligent traffic scenarios.
Background
Deep-learning-based target detection, as a major driver of the rapid development of artificial intelligence, has begun to be deployed in fields such as industry, remote sensing, agriculture and autonomous driving. The vast majority of current training processes for target detection models are based on a large amount of sample data and require massive image data as support.
However, training a target detection model on a large amount of sample data has various drawbacks. For example, acquiring a large amount of sample data consumes excessive time and labor, which increases the training cost of the target detection model; alternatively, the amount of sample data for some picture categories is too small for the target detection model to be trained to the expected effect.
Disclosure of Invention
The disclosure provides a training method and a target detection method for a target detection model, together with corresponding devices, equipment and media.
According to a first aspect of the present disclosure, there is provided a training method of a target detection model, including:
Inputting a first sample picture of a first category into a feature extraction layer of a first target detection model, and extracting initial convolution features by using the feature extraction layer;
inputting the initial convolution features into a feature enhancement layer of the first target detection model, determining a target channel to be enhanced in the initial convolution features by using the feature enhancement layer, and enhancing the target channel to obtain target convolution features;
obtaining a target detection result for the first sample picture based on the target convolution features;
determining a first loss value corresponding to the target detection result of the first sample picture, and training the first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold.
According to a second aspect of the present disclosure, there is provided a target detection method including:
inputting a picture to be detected of a second category into a second target detection model obtained by the training method provided according to the first aspect of the present disclosure;
and outputting a target detection result aiming at the picture to be detected by using the second target detection model.
According to a third aspect of the present disclosure, there is provided a training apparatus of an object detection model, comprising:
the feature extraction module is used for inputting a first sample picture of a first category into a feature extraction layer of the first target detection model, and extracting initial convolution features by using the feature extraction layer;
the feature enhancement module is used for inputting the initial convolution features into a feature enhancement layer of the first target detection model, determining a target channel to be enhanced in the initial convolution features by using the feature enhancement layer, and enhancing the target channel to obtain target convolution features;
the first target detection module is used for acquiring a target detection result for the first sample picture based on the target convolution features;
the first model training module is used for determining a first loss value corresponding to the target detection result of the first sample picture, and training the first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold.
According to a fourth aspect of the present disclosure, there is provided an object detection apparatus including:
the picture input module is used for inputting a picture to be detected of a second category into a second target detection model obtained by the training method provided by the first aspect of the present disclosure;
and the result output module is used for outputting a target detection result for the picture to be detected by using the second target detection model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method provided in the first aspect of the present disclosure.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object detection method provided in the second aspect of the present disclosure.
According to a seventh aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the training method provided by the first aspect of the present disclosure.
According to an eighth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the object detection method provided in the second aspect.
According to a ninth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the training method provided by the first aspect of the present disclosure.
According to a tenth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the object detection method provided by the second aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
The beneficial effects that this disclosure provided technical scheme brought are:
In the technical scheme of the disclosure, the first target detection model is provided with a feature enhancement layer, and after the first target detection model is trained, the feature enhancement layer has the capability of enhancing at least some channels in the convolution features based on the characteristics of the picture. This channel enhancement capability ensures that, after being trained again on a small number of sample pictures of a novel category, the first target detection model still has a high generalized detection capability for pictures of that category. That is, when the amount of sample data for the novel category is small, training of the first target detection model can be continued on a small number of sample pictures of the novel category so that it acquires good target detection capability for them; this significantly reduces the amount of sample data required to train a target detection model applicable to pictures of the novel category and greatly reduces the training cost of the target detection model.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 shows a schematic block diagram of an object detection model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a training method of a target detection model according to an embodiment of the disclosure;
FIG. 3 is a flow chart of another method for training a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a training method of a further object detection model according to an embodiment of the present disclosure;
fig. 5 shows a flowchart of a target detection method according to an embodiment of the disclosure;
FIG. 6 is a first schematic structural diagram of a training apparatus for a target detection model provided in an embodiment of the present disclosure;
FIG. 7 is a second schematic structural diagram of a training apparatus for a target detection model according to an embodiment of the disclosure;
fig. 8 is a schematic structural diagram of an object detection device according to an embodiment of the present disclosure;
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Deep-learning-based target detection, as a major driver of the rapid development of artificial intelligence, has begun to be deployed in fields such as industry, remote sensing, agriculture and autonomous driving. The vast majority of current training processes for target detection models are based on a large amount of sample data and require massive image data as support.
However, training a target detection model on a large amount of sample data has various drawbacks. For example, acquiring a large amount of sample data consumes excessive time and labor, which increases the training cost of the target detection model; alternatively, the amount of sample data for some picture categories is too small for the target detection model to be trained to the expected effect.
The training method, target detection method, devices, equipment and media provided by the embodiments of the present disclosure aim to solve at least one of the above technical problems in the prior art.
Fig. 1 shows a schematic structural diagram of an object detection model provided by an embodiment of the present disclosure, and as shown in fig. 1, the object detection model includes at least a feature extraction layer and a feature enhancement layer. After the target detection model is trained, a target detection task may be performed, for example, the position of the target object in the picture may be detected using the target detection model.
In the embodiments of the present disclosure, pictures may be divided into different categories according to the target objects to be detected that they contain; for example, pictures whose target objects are scenery and pictures whose target objects are animals belong to different categories. For ease of understanding and identification, these two picture categories are referred to as a first category and a second category respectively. It should be understood that the first category and the second category do not refer to any specific picture category.
The object detection model may be divided into a first object detection model and a second object detection model, where the first object detection model is suitable for an object detection task of a first class of pictures, and the second object detection model is suitable for an object detection task of a second class of pictures, that is, the first object detection model and the second object detection model are respectively used for object detection tasks of different picture classes. The first target detection model is obtained by training a first class of sample pictures; after the training of the first target detection model is completed, a small amount of sample pictures of the second category can be used for continuing to train the first target detection model, and after the training is completed, the second target detection model can be obtained. The training process of the first object detection model and the second object detection model will be described below.
Fig. 2 is a schematic flow chart of a training method of an object detection model according to an embodiment of the disclosure, where the method is used for training a first object detection model, and as shown in fig. 2, the method may mainly include the following steps:
S210: Inputting the first sample picture of the first category into the feature extraction layer of the first target detection model, and extracting initial convolution features using the feature extraction layer.
In an embodiment of the present disclosure, the feature extraction layer includes at least one layer of convolution, and after the first sample image is subjected to at least one layer of convolution processing, a convolution feature of the first sample image may be obtained.
For ease of understanding and description, the disclosed embodiments refer to the convolution features extracted by the feature extraction layer as initial convolution features. It will be appreciated that the initial convolution feature corresponds to a plurality of channels, and that the feature expression capabilities of different channels may be different.
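By way of illustration only (this sketch is not part of the patented scheme, and all layer sizes and names are assumptions), a feature extraction layer of the kind described above can be approximated in PyTorch as a small stack of convolutions whose output is a multi-channel initial convolution feature:

```python
# Minimal sketch of a feature extraction layer: stacked convolutions mapping a
# sample picture to multi-channel initial convolution features. The layer
# sizes (3 -> 64 -> 128 channels) are illustrative assumptions.
import torch
import torch.nn as nn

feature_extraction = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),    # first convolution layer
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # second convolution layer
    nn.ReLU(),
)

first_sample_picture = torch.randn(1, 3, 224, 224)   # dummy first-category sample
initial_conv_features = feature_extraction(first_sample_picture)
print(initial_conv_features.shape)  # torch.Size([1, 128, 224, 224]): 128 channels
```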
S220: Inputting the initial convolution features into the feature enhancement layer of the first target detection model, determining a target channel to be enhanced in the initial convolution features using the feature enhancement layer, and enhancing the target channel to obtain the target convolution features.
As previously described, the initial convolution features correspond to a plurality of channels, and the feature expression capabilities of different channels may differ. In this step, a channel with weak expression capability may be selected as a target channel to be enhanced, based on the strength of each channel's feature expression capability. After the target channel is determined, its feature expression capability can be enhanced in a preset manner; for example, the values of at least some elements in the matrix of the target channel may be increased.
In the embodiment of the present disclosure, when determining a target channel to be enhanced, a feature enhancement layer may be utilized to calculate a first average value of elements in a sub-matrix of each channel in an initial convolution feature; and determining the channel with the corresponding first average value smaller than a first preset value as a target channel to be enhanced in the initial convolution characteristic.
In the embodiment of the disclosure, when the target channel is enhanced, the values of elements in the submatrix of the target channel can be increased based on the submatrix of the target channel and the submatrix of the channel adjacent to the target channel.
S230: Acquiring a target detection result for the first sample picture based on the target convolution features.
In the embodiments of the present disclosure, candidate regions that may contain the target object may be generated based on the target convolution features, together with a probability score that each candidate region contains the target object; the target detection result is then determined based on the probability score corresponding to each candidate region.
S240: determining a first loss value corresponding to a target detection result of the first sample picture, and training a first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold.
The specific value of the first preset loss threshold may be determined according to actual design requirements. The first target detection model is trained based on the first loss value so that the first loss value reaches the first preset loss threshold; once it does, this first training process can end, and at this point the trained first target detection model can output accurate target detection results for the first sample pictures.
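As a rough sketch of this stopping criterion (the model, loss function and data loader here are placeholders, not components defined by the patent), training can proceed until the first loss value falls to the first preset loss threshold:

```python
# Hedged sketch of S240: optimize until the first loss value reaches the first
# preset loss threshold. `model`, `detection_loss` and `loader` are assumed
# placeholders; `max_steps` guards against non-convergence.
import torch

def train_until_threshold(model, detection_loss, loader,
                          loss_threshold=0.05, lr=1e-3, max_steps=100_000):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    step = 0
    while step < max_steps:
        for picture, annotation in loader:
            prediction = model(picture)                   # target detection result
            loss = detection_loss(prediction, annotation)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if loss.item() <= loss_threshold or step >= max_steps:
                return model                              # training process ends
    return model
```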
In the training method of the target detection model provided by the embodiments of the present disclosure, the first target detection model is provided with a feature enhancement layer, and after the first target detection model is trained, the feature enhancement layer has the capability of enhancing at least some channels in the convolution features based on the characteristics of the picture. This channel enhancement capability ensures that, after being trained again on a small number of sample pictures of a novel category, the first target detection model still has a high generalized detection capability for pictures of that category. That is, when the amount of sample data for the novel category is small, training of the first target detection model can be continued on a small number of sample pictures of the novel category so that it acquires good target detection capability for them; this significantly reduces the amount of sample data required to train a target detection model applicable to pictures of the novel category and greatly reduces the training cost of the target detection model.
Fig. 3 is a flowchart of another training method of a target detection model according to an embodiment of the disclosure. The method is used for training the first target detection model and, as shown in fig. 3, may mainly include the following steps:
S310: Inputting the first sample picture of the first category into the feature extraction layer of the first target detection model, and extracting initial convolution features using the feature extraction layer.
In an embodiment of the present disclosure, the feature extraction layer includes at least one layer of convolution, and after the first sample image is subjected to at least one layer of convolution processing, a convolution feature of the first sample image may be obtained.
For ease of understanding and description, the embodiments of the present disclosure refer to the convolution features extracted by the feature extraction layer as initial convolution features. It will be appreciated that the initial convolution features correspond to a plurality of channels and may take the form of a matrix, with each channel corresponding to a sub-matrix of that matrix. The values of the elements in each channel's sub-matrix can represent the strength of that channel's feature expression capability, and the feature expression capabilities of different channels may differ.
S320: the initial convolution features are input to a feature enhancement layer of a first target detection model, and a first average value of elements in a submatrix of each channel in the initial convolution features is calculated by using the feature enhancement layer.
For ease of understanding and description, the embodiments of the present disclosure define the average value of the elements in each channel's sub-matrix as the first average value. It will be appreciated that summing the values of all elements in a channel's sub-matrix and dividing by the total number of elements in the sub-matrix yields the first average value of that channel.
As described above, the values of the elements in each channel's sub-matrix may represent the strength of that channel's feature expression capability, so the average value of the elements in a channel's sub-matrix reflects the channel's feature expression capability as a whole; in general, the larger the average value, the stronger the channel's feature expression capability.
S330: Determining the channels whose first average value is smaller than the first preset value as the target channels to be enhanced in the initial convolution features.
In the embodiments of the present disclosure, the specific value of the first preset value may be determined according to actual design requirements. When a channel's first average value is smaller than the first preset value, that channel may be determined to be a target channel to be enhanced in the initial convolution features.
As described above, the average value of the elements in each channel's sub-matrix reflects the channel's overall feature expression capability. With the first preset value set as a reference for the strength of a channel's feature expression capability, the target channels to be enhanced can be determined quickly by comparing each channel's average value with the first preset value.
For ease of understanding and description, the embodiments of the present disclosure define the average value of all elements in the matrix of the initial convolution features as the second average value. Optionally, the first preset value may be this second average value. Using the average of all elements of the initial convolution features as the reference for channel feature expression capability fully accounts for the influence of the sample picture's own characteristics, ensuring that the selected target channels are more objective and accurate.
When determining the target channel, the feature enhancement layer may be used to calculate the second average value of the elements in the matrix of the initial convolution features. It will be appreciated that summing the values of all elements in the matrix of the initial convolution features and dividing by the total number of elements in the matrix yields the second average value.
After the second average value is obtained, the magnitudes of the first average value and the second average value corresponding to each channel can be compared, and the channel with the corresponding first average value smaller than the second average value can be determined as the target channel to be enhanced in the initial convolution characteristic. That is, when the first average value corresponding to a channel is smaller than the second average value, the channel may be determined to be the target channel to be enhanced in the initial convolution feature.
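The selection rule described in this step can be sketched as follows, under the assumption that the initial convolution features form a [C, H, W] tensor; the function name is illustrative, not from the patent:

```python
# Sketch of target-channel selection: the first average is the per-channel mean
# over each sub-matrix, the second average is the mean over the whole matrix,
# and channels whose first average is below the second average are selected.
import torch

def select_target_channels(initial_conv_features: torch.Tensor) -> torch.Tensor:
    # initial_conv_features: [C, H, W]
    first_averages = initial_conv_features.mean(dim=(1, 2))  # one value per channel
    second_average = initial_conv_features.mean()            # over all elements
    return torch.nonzero(first_averages < second_average).flatten()

features = torch.randn(8, 16, 16)
print(select_target_channels(features))  # indices of weakly expressed channels
```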
S340: Increasing the values of the elements in the sub-matrix of the target channel, based on the sub-matrix of the target channel and the sub-matrix of the channel adjacent to it, to obtain the target convolution features.
In the embodiments of the present disclosure, a preset operation may be applied to the sub-matrix of the target channel and the sub-matrix of the channel adjacent to it, and the result is used as the sub-matrix of the enhanced target channel. It will be appreciated that, in general, the preset operation should ensure that the values of at least some elements in the enhanced target channel's sub-matrix are increased compared with their original values. Because the element values of the target channel's sub-matrix are increased based on the sub-matrix of an adjacent channel, abrupt changes in element values are avoided and the enhancement effect is smoother.
Optionally, the values of the elements in the sub-matrix of the target channel are increased by the following Equation 1:

C_iA = w1 × C_i + w2 × C_(i+1)    (Equation 1)

In Equation 1, C_iA is the sub-matrix of the enhanced target channel, C_i is the sub-matrix of the target channel to be enhanced, C_(i+1) is the sub-matrix of the channel adjacent to the target channel, w1 and w2 are trainable weight values, and i represents the sequence number of the target channel.
It should be noted that the initial values of w1 and w2 may be set according to actual design requirements; w1 and w2 are parameters of the model, and their values may change during training. One purpose of training the first target detection model is for the model to learn to set appropriate values of w1 and w2, based on the initial convolution features of a picture, so as to obtain the desired target convolution features.
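A minimal sketch of Equation 1 as a trainable layer is given below. Note that the patent does not specify how the adjacent channel C_(i+1) is chosen for the last channel; the wrap-around used here, like the initial weight values, is an assumption:

```python
# Sketch of the enhancement rule C_iA = w1*C_i + w2*C_(i+1) with trainable
# weights w1 and w2. The initial values (1.0, 0.5) and the wrap-around
# neighbour for the last channel are assumptions.
import torch
import torch.nn as nn

class ChannelEnhancement(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.tensor(1.0))  # trainable weight value
        self.w2 = nn.Parameter(torch.tensor(0.5))  # trainable weight value

    def forward(self, features: torch.Tensor, target_idx: torch.Tensor) -> torch.Tensor:
        # features: [C, H, W]; target_idx: indices i of the target channels
        enhanced = features.clone()
        neighbour_idx = (target_idx + 1) % features.shape[0]  # C_(i+1), assumed wrap
        enhanced[target_idx] = (self.w1 * features[target_idx]
                                + self.w2 * features[neighbour_idx])
        return enhanced
```

Because w1 and w2 are registered as parameters, gradients from the detection loss flow back into them, which matches the statement above that their values may change during training.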
S350: Acquiring a target detection result for the first sample picture based on the target convolution features.
In the embodiments of the present disclosure, candidate regions that may contain the target object may be generated based on the target convolution features, together with a probability score that each candidate region contains the target object; the target detection result is then determined based on the probability score corresponding to each candidate region.
Specifically, as shown in fig. 1, after candidate regions that may contain the target object are generated based on the target convolution features, the candidate regions and the target convolution features may be pooled (e.g., ROI pooling) and then fed into two fully connected layers, yielding a plurality of classification probabilities, each with a corresponding regression boundary. According to a preset probability threshold, the regression boundaries of the qualifying classification probabilities are mapped onto the first sample picture to obtain the target detection result for the first sample picture.
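As an informal sketch of this head (the box-generation and regression branches are simplified away; all sizes and the probability threshold are assumptions), ROI pooling followed by two fully connected layers can be written as:

```python
# Rough sketch of S230/S350: pool each candidate region from the target
# convolution features, score it with two fully connected layers, and keep
# regions whose classification probability exceeds a preset threshold.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

num_classes, pool_size = 4, 7
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * pool_size * pool_size, 256),  # first fully connected layer
    nn.ReLU(),
    nn.Linear(256, num_classes),                  # second fully connected layer
)

target_conv_features = torch.randn(1, 128, 56, 56)
# candidate regions as [batch_index, x1, y1, x2, y2]
candidate_regions = torch.tensor([[0.0, 4.0, 4.0, 30.0, 30.0]])
pooled = roi_pool(target_conv_features, candidate_regions, output_size=pool_size)
probabilities = fc_head(pooled).softmax(dim=1)
keep = probabilities.max(dim=1).values > 0.5      # preset probability threshold
print(probabilities, keep)
```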
S360: determining a first loss value corresponding to a target detection result of the first sample picture, and training a first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold.
The specific value of the first preset loss threshold may be determined according to actual design requirements. The first target detection model is trained based on the first loss value so that the first loss value reaches the first preset loss threshold; once it does, the training process can end, and at this point the trained first target detection model can output accurate target detection results for the first sample pictures.
After training of the first target detection model on sample pictures of the first category is completed, training may continue on a small number of sample pictures of the second category; for example, the following step S410 may be performed after step S240 or step S360. After the first target detection model has been trained on the sample pictures of the second category, the second target detection model is obtained. The process of training the second target detection model is described as follows:
Fig. 4 is a flowchart of another training method of a target detection model according to an embodiment of the present disclosure. The method continues training the first target detection model after step S240 or step S360 so as to obtain the second target detection model; as shown in fig. 4, it may mainly include the following steps:
S410: Inputting the second sample picture of the second category into the trained first target detection model, and outputting a target detection result for the second sample picture using the first target detection model.
It should be noted that, before S410, training of the first object detection model is already completed by using the sample picture of the first category.
In the embodiment of the disclosure, a second sample picture of a second category may be input to a feature extraction layer of the first target detection model, and initial convolution features are extracted by using the feature extraction layer; inputting the initial convolution characteristics to a characteristic enhancement layer of a first target detection model, determining a target channel to be enhanced in the initial convolution characteristics by using the characteristic enhancement layer, and enhancing the target channel to obtain target convolution characteristics; and acquiring a target detection result aiming at the second sample picture based on the target convolution characteristic.
It is understood that the specific process of step S410 is substantially the same as that of steps S210 to S230 and steps S310 to S350, except for the category of sample pictures used; for the details of step S410, reference may be made to the corresponding descriptions above, which are not repeated here.
S420: determining a second loss value corresponding to the target detection result of the second sample picture, training the first target detection model based on the second loss value, and ending training when the second loss value reaches a second preset loss threshold value to obtain a second target detection model suitable for a target detection task of the second class of pictures.
The specific value of the second preset loss threshold may be determined according to actual design requirements. The first target detection model is trained based on the second loss value so that the second loss value reaches the second preset loss threshold; once it does, the training process can end, and the first target detection model can then serve as the second target detection model applicable to target detection tasks on pictures of the second category.
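A hedged sketch of this second training stage is given below; all names are illustrative, and the threshold and learning rate are assumptions rather than values from the patent:

```python
# Sketch of S420: continue training the already-trained first model on a small
# number of second-category samples until the second loss value reaches the
# second preset loss threshold; the result serves as the second model.
import torch

def finetune_to_second_model(first_model, detection_loss, few_shot_loader,
                             second_loss_threshold=0.08, lr=1e-4, max_epochs=200):
    optimizer = torch.optim.SGD(first_model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for picture, annotation in few_shot_loader:
            loss = detection_loss(first_model(picture), annotation)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() <= second_loss_threshold:
                return first_model  # usable as the second target detection model
    return first_model
```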
As described above, the first target detection model is provided with a feature enhancement layer, and after the first target detection model is trained, the feature enhancement layer has the capability of enhancing at least some channels in the convolution features based on the characteristics of the picture. This channel enhancement capability ensures that, after being trained on a small number of sample pictures of the second category, the first target detection model still has a high generalized detection capability for pictures of the second category. That is, the second target detection model, obtained by continuing to train the first target detection model on a small number of sample pictures of the second category, has good target detection capability for pictures of that category; this significantly reduces the amount of sample data required to train a target detection model applicable to pictures of the second category and greatly reduces the training cost of the target detection model.
Fig. 5 shows a flow chart of a target detection method according to an embodiment of the disclosure, as shown in fig. 5, the method may mainly include the following steps:
S510: Inputting the picture to be detected of the second category into the trained second target detection model.
S520: Outputting a target detection result for the picture to be detected using the second target detection model.
In the embodiments of the present disclosure, the picture to be detected of the second category can be input to the feature extraction layer of the second target detection model, and initial convolution features are extracted using the feature extraction layer; the initial convolution features are input to the feature enhancement layer of the second target detection model, a target channel to be enhanced in the initial convolution features is determined using the feature enhancement layer, and the target channel is enhanced to obtain the target convolution features; and a target detection result for the picture to be detected is acquired based on the target convolution features.
Optionally, when determining a target channel to be enhanced in the initial convolution feature by using the feature enhancement layer, calculating a first average value of each element in the submatrix of each channel in the initial convolution feature by using the feature enhancement layer; and determining the channel with the corresponding first average value smaller than a first preset value as a target channel to be enhanced in the initial convolution characteristic.
Optionally, the first preset value is a second average value of each element in the matrix of the initial convolution feature. When a channel with the corresponding first average value smaller than a first preset value is determined as a target channel to be enhanced in the initial convolution feature, a feature enhancement layer can be utilized to calculate a second average value of each element in a matrix of the initial convolution feature; and determining the corresponding channel with the first average value smaller than the second average value as a target channel to be enhanced in the initial convolution characteristic.
In enhancing the target channel, the values of the elements in the sub-matrix of the target channel may be increased based on the sub-matrix of the target channel and the sub-matrix of the channel adjacent to the target channel.
Optionally, the values of the elements in the sub-matrix of the target channel may be increased by the following formula:

C_iA = w1 × C_i + w2 × C_(i+1)

where C_iA is the sub-matrix of the enhanced target channel, C_i is the sub-matrix of the target channel to be enhanced, C_(i+1) is the sub-matrix of the channel adjacent to the target channel, w1 and w2 are trainable weight values, and i represents the sequence number of the target channel.
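Wiring the pieces sketched earlier together, inference with the second model might look as follows; the composition reuses the hypothetical `select_target_channels` and `ChannelEnhancement` helpers from the sketches above and is an assumption about the wiring, not the patent's definitive architecture:

```python
# End-to-end inference sketch for S510/S520: extract initial convolution
# features, enhance the weak channels, and decode a detection result.
import torch
import torch.nn as nn

class SecondTargetDetectionModel(nn.Module):
    def __init__(self, backbone: nn.Module, enhancement: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone        # feature extraction layer
        self.enhancement = enhancement  # feature enhancement layer
        self.head = head                # detection head

    def forward(self, picture: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(picture)[0]            # initial convolution features [C, H, W]
        targets = select_target_channels(feats)      # target channels to be enhanced
        enhanced = self.enhancement(feats, targets)  # target convolution features
        return self.head(enhanced.unsqueeze(0))      # target detection result
```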
Based on the same principle as the above training method of the target detection model, fig. 6 shows a first schematic structural diagram of a training apparatus for a target detection model according to an embodiment of the disclosure, and fig. 7 shows a second schematic structural diagram. As shown in fig. 6, the training apparatus 600 for the target detection model includes a feature extraction module 610, a feature enhancement module 620, a first target detection module 630, and a first model training module 640.
The feature extraction module 610 is configured to input a first sample picture of a first type to a feature extraction layer of a first object detection model, and extract initial convolution features using the feature extraction layer.
The feature enhancement module 620 is configured to input the initial convolution feature to a feature enhancement layer of the first target detection model, determine a target channel to be enhanced in the initial convolution feature by using the feature enhancement layer, and enhance the target channel to obtain the target convolution feature.
The first target detection module 630 is configured to obtain a target detection result for the first sample picture based on the target convolution feature.
The first model training module 640 is configured to determine a first loss value corresponding to the target detection result of the first sample picture, and train the first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold.
In the training device for the target detection model provided by the embodiments of the present disclosure, the first target detection model is provided with a feature enhancement layer, and after the first target detection model is trained, the feature enhancement layer has the capability of enhancing at least some channels in the convolution features based on the characteristics of the picture. This channel enhancement capability ensures that, after being trained again on a small number of sample pictures of a novel category, the first target detection model still has a high generalized detection capability for pictures of that category. That is, when the amount of sample data for the novel category is small, training of the first target detection model can be continued on a small number of sample pictures of the novel category so that it acquires good target detection capability for them; this significantly reduces the amount of sample data required to train a target detection model applicable to pictures of the novel category and greatly reduces the training cost of the target detection model.
In the embodiment of the present disclosure, the feature enhancement module 620 is specifically configured to, when configured to determine, using the feature enhancement layer, a target channel to be enhanced in the initial convolution feature:
calculating a first average value of each element in the submatrix of each channel in the initial convolution characteristic by using the characteristic enhancement layer;
and determining the channel with the corresponding first average value smaller than a first preset value as a target channel to be enhanced in the initial convolution characteristic.
In the embodiment of the disclosure, the first preset value is a second average value of each element in the matrix of the initial convolution feature; the feature enhancement module 620, when determining a channel with a corresponding first average value smaller than a first preset value as a target channel to be enhanced in the initial convolution feature, is specifically configured to:
calculating a second average value of each element in the matrix of the initial convolution feature by using the feature enhancement layer;
and determining the corresponding channel with the first average value smaller than the second average value as a target channel to be enhanced in the initial convolution characteristic.
In the embodiment of the present disclosure, the feature enhancement module 620, when used to enhance the target channel, is specifically used to: based on the sub-matrix of the target channel and the sub-matrix of the channel adjacent to the target channel, the values of the elements in the sub-matrix of the target channel are increased.
In the disclosed embodiments, the values of the elements in the sub-matrix of the target channel are increased by the following formula:

C_iA = w1 × C_i + w2 × C_(i+1)

where C_iA is the sub-matrix of the enhanced target channel, C_i is the sub-matrix of the target channel to be enhanced, C_(i+1) is the sub-matrix of the channel adjacent to the target channel, and w1 and w2 are trainable weight values.
In this embodiment, as shown in fig. 7, the training apparatus 600 for the object detection model further includes a second object detection module 650 and a second model training module 660.
The second object detection module 650 is configured to input a second sample picture of the second category to the first object detection model, and output an object detection result for the second sample picture using the first object detection model.
The second model training module 660 is configured to determine a second loss value corresponding to the target detection result of the second sample picture, train the first target detection model based on the second loss value, and end training when the second loss value reaches a second preset loss threshold value, so as to obtain a second target detection model applicable to the target detection task of the second class of picture.
As described above, the first target detection model is provided with a feature enhancement layer, and after the first target detection model is trained, the feature enhancement layer has the capability of enhancing at least some channels in the convolution features based on the characteristics of the picture. This channel enhancement capability ensures that, after being trained on a small number of sample pictures of the second category, the first target detection model still has a high generalized detection capability for pictures of the second category. That is, the second target detection model, obtained by continuing to train the first target detection model on a small number of sample pictures of the second category, has good target detection capability for pictures of that category; this significantly reduces the amount of sample data required to train a target detection model applicable to pictures of the second category and greatly reduces the training cost of the target detection model.
It can be understood that the above modules of the training apparatus for the target detection model in the embodiments of the present disclosure have the functions needed to implement the corresponding steps of the training method of the target detection model. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. The modules may be software and/or hardware, and each module may be implemented separately or by integrating multiple modules. For a functional description of each module of the training apparatus, reference may be made to the corresponding description of the training method above, which is not repeated here.
Based on the same principle as the above-described object detection method, fig. 8 shows a schematic structural diagram of an object detection device according to an embodiment of the present disclosure. As shown in fig. 8, the object detection apparatus 800 includes a picture input module 810 and a result output module 820.
The picture input module 810 is configured to input a picture to be detected of the second category into a second target detection model obtained by the training method described above.
The result output module 820 is configured to output a target detection result for the picture to be detected using the second target detection model.
It will be appreciated that the above modules of the target detection apparatus in the embodiments of the present disclosure have the functions needed to implement the corresponding steps of the target detection method. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. The modules may be software and/or hardware, and each module may be implemented separately or by integrating multiple modules. For a functional description of each module of the target detection apparatus, reference may be made to the corresponding description of the target detection method above, which is not repeated here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure; it should be understood that the electronic device may be used to implement at least one of the training method and the target detection method of the target detection model of the embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the methods and processes described above, for example, at least one of the training method and the target detection method of the target detection model. For example, in some embodiments, at least one of the training method of the target detection model and the target detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the target detection model described above, or one or more steps of the target detection method described above, may be performed. In other embodiments, the computing unit 901 may be configured to perform at least one of the training method and the target detection method of the target detection model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A training method of a target detection model, comprising:
inputting a first sample picture of a first category to a feature extraction layer of a first target detection model, and extracting initial convolution features by using the feature extraction layer;
inputting the initial convolution feature to a feature enhancement layer of the first target detection model, determining a target channel to be enhanced in the initial convolution feature by using the feature enhancement layer, and enhancing the target channel to obtain a target convolution feature;
acquiring a target detection result for the first sample picture based on the target convolution feature;
determining a first loss value corresponding to a target detection result of the first sample picture, and training the first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold;
wherein the determining, by using the feature enhancement layer, a target channel to be enhanced in the initial convolution feature comprises:
calculating a first average value of elements in a sub-matrix of each channel in the initial convolution feature by using the feature enhancement layer;
determining a channel whose corresponding first average value is smaller than a first preset value as the target channel to be enhanced in the initial convolution feature;
wherein enhancing the target channel comprises:
based on the sub-matrix of the target channel and the sub-matrix of the channel adjacent to the target channel, increasing the values of the elements in the sub-matrix of the target channel.
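For illustration only, the channel-selection and enhancement logic recited in claim 1 could be sketched as follows in PyTorch. The function name enhance_channels, the argument names, and the default weights are hypothetical; treating channel i+1 as the adjacent channel (and therefore skipping the last channel) is an assumption the claim leaves open.

```python
import torch

def enhance_channels(feat: torch.Tensor, threshold: float,
                     w1: float = 1.0, w2: float = 0.5) -> torch.Tensor:
    """Enhance weak channels of an initial convolution feature of shape (C, H, W)."""
    enhanced = feat.clone()
    # First average value: the mean of the elements in each channel's sub-matrix.
    channel_means = feat.mean(dim=(1, 2))
    for i in range(feat.shape[0] - 1):  # last channel skipped: no "next" neighbour here
        if channel_means[i] < threshold:
            # Raise the weak channel using a weighted sum with its neighbour's sub-matrix.
            enhanced[i] = w1 * feat[i] + w2 * feat[i + 1]
    return enhanced
```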
2. The method of claim 1, wherein the first preset value is a second average value of elements in a matrix of the initial convolution feature;
wherein determining a channel whose corresponding first average value is smaller than the first preset value as the target channel to be enhanced in the initial convolution feature comprises:
calculating a second average value of the elements in the matrix of the initial convolution feature by using the feature enhancement layer;
and determining a channel whose corresponding first average value is smaller than the second average value as the target channel to be enhanced in the initial convolution feature.
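Under the same hypothetical sketch, claim 2 simply replaces the fixed threshold with the second average value computed over the whole feature map:

```python
feat = torch.randn(256, 32, 32)      # example initial convolution feature (C, H, W)
second_average = feat.mean().item()  # mean over every element: the "second average"
enhanced = enhance_channels(feat, second_average)
```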
3. The method of claim 1, wherein the values of the elements in the sub-matrix of the target channel are increased by the formula:
C_iA = w1 × C_i + w2 × C_(i+1)
wherein C_iA is the sub-matrix of the target channel after enhancement, C_i is the sub-matrix of the target channel to be enhanced, C_(i+1) is the sub-matrix of the channel adjacent to the target channel, w1 and w2 are trainable weight values, and i represents the sequence number of the target channel.
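Because w1 and w2 are trainable, the formula lends itself to expression as a small network layer. The sketch below (class name ChannelEnhancement is hypothetical) is one possible vectorized reading, not the patent's implementation; wrapping the last channel around to the first via torch.roll, and initializing w1 and w2 to 1.0 and 0.5, are our assumptions.

```python
import torch
import torch.nn as nn

class ChannelEnhancement(nn.Module):
    """One possible reading of C_iA = w1 * C_i + w2 * C_(i+1) with learned weights."""

    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.tensor(1.0))  # trained jointly with the detector
        self.w2 = nn.Parameter(torch.tensor(0.5))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (N, C, H, W). A channel is enhanced when its mean (the "first
        # average") falls below the global mean (the claim-2 form of the threshold).
        channel_means = feat.mean(dim=(2, 3), keepdim=True)   # (N, C, 1, 1)
        global_mean = feat.mean(dim=(1, 2, 3), keepdim=True)  # (N, 1, 1, 1)
        mask = (channel_means < global_mean).to(feat.dtype)
        neighbour = torch.roll(feat, shifts=-1, dims=1)       # C_(i+1), wrapping
        return mask * (self.w1 * feat + self.w2 * neighbour) + (1 - mask) * feat
```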
4. The method according to any one of claims 1 to 3, further comprising, after the training of the first target detection model based on the first loss value until the first loss value reaches the first preset loss threshold:
inputting a second sample picture of a second category into the first target detection model, and outputting a target detection result for the second sample picture by using the first target detection model;
determining a second loss value corresponding to the target detection result of the second sample picture, training the first target detection model based on the second loss value, and ending the training when the second loss value reaches a second preset loss threshold, to obtain a second target detection model applicable to a target detection task for pictures of the second category.
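A hedged sketch of the two-stage schedule in claim 4, assuming placeholder objects (model, detection_loss, base_loader, novel_loader, and the two loss thresholds) that the claims do not define:

```python
import torch

def train_until(model, loader, loss_fn, threshold, lr=1e-3):
    """Train until the loss reaches the preset threshold, then return the model."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    while True:  # a practical loop would also cap the number of epochs
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
            if loss.item() <= threshold:
                return model  # first/second preset loss threshold reached

# Stage 1: train on first-category (base) samples; Stage 2: fine-tune on the
# small second-category (novel) set to obtain the second target detection model.
first_model = train_until(model, base_loader, detection_loss, first_threshold)
second_model = train_until(first_model, novel_loader, detection_loss, second_threshold)
```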
5. A target detection method comprising:
inputting a picture to be detected of a second category into a second target detection model obtained by the training method according to claim 4;
and outputting a target detection result for the picture to be detected by using the second target detection model.
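Inference with the fine-tuned model is then a single forward pass; second_model, preprocess, and picture below are placeholders:

```python
import torch

second_model.eval()
with torch.no_grad():
    # Batch of one: the second-category picture to be detected.
    detections = second_model(preprocess(picture).unsqueeze(0))
# detections would carry the boxes/labels/scores for the novel category.
```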
6. A training device for a target detection model, comprising:
the feature extraction module is used for inputting a first sample picture of a first category into a feature extraction layer of a first target detection model, and extracting initial convolution features by using the feature extraction layer;
the feature enhancement module is used for inputting the initial convolution feature to a feature enhancement layer of the first target detection model, determining a target channel to be enhanced in the initial convolution feature by using the feature enhancement layer, and enhancing the target channel to obtain a target convolution feature;
the first target detection module is used for acquiring a target detection result for the first sample picture based on the target convolution feature;
the first model training module is used for determining a first loss value corresponding to a target detection result of the first sample picture, and training the first target detection model based on the first loss value until the first loss value reaches a first preset loss threshold value;
wherein, when determining, by using the feature enhancement layer, the target channel to be enhanced in the initial convolution feature, the feature enhancement module is specifically configured to:
calculating a first average value of elements in a sub-matrix of each channel in the initial convolution feature by using the feature enhancement layer;
determining a channel whose corresponding first average value is smaller than a first preset value as the target channel to be enhanced in the initial convolution feature;
wherein, when enhancing the target channel, the feature enhancement module is specifically configured to:
based on the sub-matrix of the target channel and the sub-matrix of the channel adjacent to the target channel, increasing the values of the elements in the sub-matrix of the target channel.
7. The apparatus of claim 6, wherein the first preset value is a second average value of the elements in the matrix of the initial convolution feature;
the feature enhancement module is specifically configured to, when determining, as a target channel to be enhanced in the initial convolution feature, a channel corresponding to the first average value being smaller than a first preset value:
calculating a second average value of the elements in the matrix of the initial convolution feature by using the feature enhancement layer;
and determining a channel whose corresponding first average value is smaller than the second average value as the target channel to be enhanced in the initial convolution feature.
8. The apparatus of claim 6, wherein the values of the elements in the sub-matrix of the target channel are increased by the formula:
C_iA = w1 × C_i + w2 × C_(i+1)
wherein C_iA is the sub-matrix of the target channel after enhancement, C_i is the sub-matrix of the target channel to be enhanced, C_(i+1) is the sub-matrix of the channel adjacent to the target channel, w1 and w2 are trainable weight values, and i represents the sequence number of the target channel.
9. The apparatus of claim 6, further comprising:
the second target detection module is used for inputting a second sample picture of a second category into the first target detection model obtained by the training method according to any one of claims 1 to 3, and outputting a target detection result for the second sample picture by using the first target detection model;
and the second model training module is used for determining a second loss value corresponding to the target detection result of the second sample picture, training the first target detection model based on the second loss value, and ending the training when the second loss value reaches a second preset loss threshold, to obtain a second target detection model applicable to a target detection task for pictures of the second category.
10. An object detection apparatus comprising:
the picture input module is used for inputting a picture to be detected of a second category into a second target detection model obtained by the training method according to claim 4;
and the result output module is used for outputting a target detection result for the picture to be detected by using the second target detection model.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of claim 5.
13. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of claim 5.
CN202110873991.4A 2021-07-30 2021-07-30 Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium Active CN113609951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873991.4A CN113609951B (en) 2021-07-30 2021-07-30 Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113609951A CN113609951A (en) 2021-11-05
CN113609951B true CN113609951B (en) 2023-11-24

Family

ID=78338846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873991.4A Active CN113609951B (en) 2021-07-30 2021-07-30 Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113609951B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875787B (en) * 2018-05-23 2020-07-14 北京市商汤科技开发有限公司 Image recognition method and device, computer equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378222A (en) * 2019-06-14 2019-10-25 安徽南瑞继远电网技术有限公司 A kind of vibration damper on power transmission line target detection and defect identification method and device
WO2020253416A1 (en) * 2019-06-17 2020-12-24 华为技术有限公司 Object detection method and device, and computer storage medium
CN111444807A (en) * 2020-03-19 2020-07-24 北京迈格威科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN111401380A (en) * 2020-03-24 2020-07-10 北京工业大学 RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization
CN111597945A (en) * 2020-05-11 2020-08-28 济南博观智能科技有限公司 Target detection method, device, equipment and medium
CN111880157A (en) * 2020-08-06 2020-11-03 中国人民解放军海军航空大学 Method and system for detecting target in radar image
CN111931877A (en) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium
CN112508924A (en) * 2020-12-15 2021-03-16 桂林电子科技大学 Small target detection and identification method, device, system and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Small-Scale Pedestrian Detection Based on Deep Neural Network";Bing Han 等;《IEEE》;全文 *
基于YOLO和图像增强的海洋动物目标检测;贾振卿;刘雪峰;;电子测量技术(14);全文 *
跨深度的卷积特征增强目标检测算法;王若霄;徐智勇;张建林;;计算机工程与设计(07);全文 *

Also Published As

Publication number Publication date
CN113609951A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN112801164B (en) Training method, device, equipment and storage medium of target detection model
CN113674421B (en) 3D target detection method, model training method, related device and electronic equipment
CN113360711B (en) Model training and executing method, device, equipment and medium for video understanding task
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN113361710B (en) Student model training method, picture processing device and electronic equipment
CN113205041B (en) Structured information extraction method, device, equipment and storage medium
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN112907552A (en) Robustness detection method, device and program product for image processing model
CN115147680B (en) Pre-training method, device and equipment for target detection model
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN116468112B (en) Training method and device of target detection model, electronic equipment and storage medium
CN115273148B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN113609951B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN116402914A (en) Method, device and product for determining stylized image generation model
CN113792876B (en) Backbone network generation method, device, equipment and storage medium
CN113792804B (en) Training method of image recognition model, image recognition method, device and equipment
CN116152702A (en) Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle
CN113379592B (en) Processing method and device for sensitive area in picture and electronic equipment
CN112927319B (en) Model training method, image processing method, device, equipment and storage medium
CN114202026A (en) Multitask model training method and device and multitask processing method and device
CN114973333B (en) Character interaction detection method, device, equipment and storage medium
CN113408592B (en) Feature point matching method, device, electronic equipment and computer readable storage medium
CN114581746B (en) Object detection method, device, equipment and medium
CN113378773B (en) Gesture recognition method, gesture recognition device, gesture recognition apparatus, gesture recognition storage medium, and gesture recognition program product
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant