CN116612295A - Feature map generation method, training method and device of target detection model - Google Patents

Feature map generation method, training method and device of target detection model

Info

Publication number
CN116612295A
CN116612295A (application CN202310431499.0A)
Authority
CN
China
Prior art keywords
candidate region
sample image
feature map
anchor point
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310431499.0A
Other languages
Chinese (zh)
Inventor
陈阳
李弼
希滕
张刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202310431499.0A
Publication: CN116612295A
Legal status: Pending


Classifications

    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a feature map generation method and a training method and device of a target detection model, and relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, and image processing. The specific implementation scheme is as follows: partitioning a candidate region of a sample image; randomly determining anchor points in the sub-regions obtained by partitioning the candidate region; determining a feature value of each anchor point according to the position information of the anchor point in its sub-region and the mapping position of the candidate region in a first feature map, where the first feature map is a feature map of the sample image; and generating a second feature map of the candidate region according to the feature values of the anchor points, where the second feature map is used for predicting the category of an object in the candidate region. This enriches the feature expression and thereby realizes data enhancement at the feature level, simplifies the processing of sample images and the model training flow, and improves the performance and generalization ability of the trained target detection model.

Description

Feature map generation method, training method and device of target detection model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, and image processing, and specifically to a feature map generation method, a training method of a target detection model, and a training device of the target detection model.
Background
In a visual task such as target detection, the more image samples are available for training, the better the trained target detection model performs and the stronger its generalization ability. In practice, however, the number of samples is often insufficient or the sample quality is not good enough, so data enhancement must be applied to the samples to improve the performance of the model.
Disclosure of Invention
The disclosure provides a feature map generation method, a training method of a target detection model and a training device of the target detection model.
According to an aspect of the present disclosure, there is provided a feature map generation method, including: partitioning a candidate region of a sample image; randomly determining anchor points in the sub-regions obtained by partitioning the candidate region; determining a feature value of each anchor point according to the position information of the anchor point in its sub-region and the mapping position of the candidate region in a first feature map, where the first feature map is a feature map of the sample image; and generating a second feature map of the candidate region according to the feature values of the anchor points, where the second feature map is used for predicting the category of an object in the candidate region.
According to another aspect of the present disclosure, there is provided a training method of a target detection model, the method including: acquiring a sample image and labeling information of the sample image; inputting the sample image into the target detection model to obtain a prediction result of the sample image, where, in the process of obtaining the prediction result, the target detection model obtains a second feature map of a candidate region in the sample image according to the feature map generation method of any one of the embodiments, and the second feature map is used for obtaining the prediction result of the sample image; and adjusting parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image to obtain a trained target detection model.
According to another aspect of the present disclosure, there is provided a target detection method including: acquiring an image to be detected; inputting the image to be detected into a trained target detection model to obtain a detection result of the image to be detected, wherein the trained target detection model is obtained by training by adopting the training method in any embodiment.
According to another aspect of the present disclosure, there is provided a feature map generating apparatus, including: a partitioning unit configured to partition a candidate region of a sample image; a first determining unit configured to randomly determine anchor points in the sub-regions obtained by partitioning the candidate region; a second determining unit configured to determine the feature value of each anchor point according to the position information of the anchor point in its sub-region and the mapping position of the candidate region in a first feature map, where the first feature map is a feature map of the sample image; and a generating unit configured to generate a second feature map of the candidate region according to the feature values of the anchor points, where the second feature map is used for predicting the category of an object in the candidate region.
According to another aspect of the present disclosure, there is provided a training apparatus of a target detection model, the apparatus including: a sample acquisition unit configured to acquire a sample image and labeling information of the sample image; a result prediction unit configured to input the sample image into the target detection model to obtain a prediction result of the sample image, where, in the process of obtaining the prediction result, the target detection model obtains a second feature map of a candidate region in the sample image according to the feature map generation method of any one of the embodiments, and the second feature map is used for obtaining the prediction result of the sample image; and an adjusting unit configured to adjust parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image to obtain a trained target detection model.
According to another aspect of the present disclosure, there is provided an object detection apparatus including: the image acquisition unit is used for acquiring an image to be detected; the detection unit is used for inputting the image to be detected into a trained target detection model to obtain a detection result of the image to be detected, wherein the trained target detection model is obtained by training by adopting the training method in any embodiment.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
The feature map generation method, the training method of the target detection model, and the training device provided by the embodiments of the disclosure work by partitioning a candidate region of a sample image; randomly determining anchor points in the sub-regions obtained by partitioning the candidate region; determining a feature value of each anchor point according to the position information of the anchor point in its sub-region and the mapping position of the candidate region in a first feature map, where the first feature map is a feature map of the sample image; and generating a second feature map of the candidate region according to the feature values of the anchor points, where the second feature map is used for predicting the category of an object in the candidate region. Because the positions of the anchor points are determined randomly, different anchor points can be obtained in each parameter-adjustment step during training of the target detection model, so more of the feature information in the first feature map can be used for training and the feature expression is enriched. Data enhancement is thus realized at the feature level, the processing of sample images and the model training flow are simplified, and the performance and generalization ability of the trained target detection model can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of a feature map generation method provided in accordance with an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a feature map generation method in the related art;
FIG. 3 is a schematic diagram of a feature map generation method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a feature map generation method provided in accordance with yet another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a feature map generation method provided in accordance with a further embodiment of the present disclosure;
FIG. 6 is a flow chart of a training method for a target detection model provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a flow chart of a method for object detection provided in accordance with an embodiment of the present disclosure;
FIG. 8 is a block diagram of a feature map generating apparatus provided according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a training apparatus for an object detection model provided in accordance with an embodiment of the present disclosure;
FIG. 10 is a block diagram of an object detection device provided in accordance with an embodiment of the present disclosure;
fig. 11 is a block diagram of an electronic device for implementing a feature map generation method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Deep learning is currently a hot research topic in the field of artificial intelligence and shows excellent performance in image classification, target detection, semantic segmentation, and other fields. In visual tasks, the more image samples available, the better the trained model generally performs and the stronger its generalization ability. In practice, however, the number of samples is often insufficient or the sample quality is not good enough, so data enhancement must be applied to the samples to increase the amount of data; to a certain extent this also improves sample quality and thus model performance. In target detection in particular, data enhancement is a critical link that plays a decisive role in the detection performance of the model. How to realize data enhancement reasonably and efficiently to improve the performance of the target detection model is therefore a research hotspot.
In the related art, data enhancement in target detection generally uses one of the following schemes: 1. applying pixel-level perturbations in RGB (Red Green Blue) space, such as brightness and contrast adjustments; 2. applying scale transformations to the whole image, such as randomly resizing or rotating it; 3. fusing target regions of images, for example with the MixUp method.
In these schemes, however, data enhancement is first applied to the image samples, and the augmented samples are then input into the target detection model for training, which makes the process complex.
To solve at least one of the above problems, embodiments of the present disclosure provide a feature map generation method, a training method of a target detection model, and corresponding devices, which work by partitioning a candidate region of a sample image; randomly determining anchor points in the sub-regions obtained by partitioning the candidate region; determining a feature value of each anchor point according to the position information of the anchor point in its sub-region and the mapping position of the candidate region in a first feature map, where the first feature map is a feature map of the sample image; and generating a second feature map of the candidate region according to the feature values of the anchor points, where the second feature map is used for predicting the category of an object in the candidate region. Because the positions of the anchor points are determined randomly, different anchor points can be obtained in each parameter-adjustment step during training, so more of the feature information in the first feature map can be used for training and the feature expression is enriched. Data enhancement is thus realized at the feature level, the processing of sample images and the model training flow are simplified, and the performance and generalization ability of the trained target detection model can be improved.
The present disclosure is now fully described with reference to the accompanying drawings. Fig. 1 is a flowchart of a feature map generating method according to an embodiment of the present disclosure. Referring to fig. 1, an embodiment of the present disclosure provides a feature map generating method 100, where the method 100 includes the following steps S101 to S104.
Step S101, partitioning a candidate region of the sample image.
Step S102, anchor points are randomly determined in the sub-regions obtained by the candidate region partitioning.
Step S103, determining the characteristic value of the anchor point according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first characteristic map, wherein the first characteristic map is the characteristic map of the sample image.
Step S104, generating a second feature map of the candidate region according to the feature value of the anchor point, wherein the second feature map is used for predicting the category of the object in the candidate region.
The method 100 may be used in the training process of a target detection model. The target detection model can identify multiple objects in a picture and can also locate different objects. It may be a one-stage target detection model, a two-stage target detection model, or the like.
Taking a two-stage target detection model as an example, the model can be roughly divided into two stages. When a sample image is input into the target detection model, the first stage extracts object regions, i.e., obtains the candidate regions (Proposals) of the sample image; a first feature map of the sample image can also be obtained in this stage. The second stage classifies and recognizes the objects.
A candidate region is a relatively small region that may contain a region to be identified or classified; it appears as a selection box in the sample image that frames an object of interest the image may contain. The first feature map is the feature map corresponding to the sample image and records the feature information of the sample image. The candidate regions and the first feature map are both intermediate-layer outputs of the target detection model. It will be appreciated that the sizes of the candidate regions generated from a sample image are not fixed, and the method 100 generates a second feature map of a specific, fixed size for each candidate region regardless of the region's size. The specific size may be predetermined, for example 2 × 2 or 7 × 7.
After a candidate region is obtained in step S101, it may be partitioned into a plurality of sub-regions. The number of sub-regions matches the size of the second feature map to be generated: for example, for a 2 × 2 second feature map, the candidate region is divided into 2 × 2 sub-regions, which may be of equal or unequal size.
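As an illustrative sketch (not the patent's actual implementation), the partitioning in step S101 can be expressed as splitting the region's extent at even fractions of its width and height; the function name and coordinate convention here are assumptions:

```python
def partition_region(x0, y0, x1, y1, n=2):
    """Split a candidate region (x0, y0, x1, y1) into an n x n grid of
    sub-regions. Edges are placed at even fractions of the width and
    height, and fractional coordinates are kept rather than rounded."""
    xs = [x0 + (x1 - x0) * i / n for i in range(n + 1)]
    ys = [y0 + (y1 - y0) * j / n for j in range(n + 1)]
    # Row-major list of sub-region boxes.
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(n) for i in range(n)]

# A 20.78 x 20.78 mapped candidate region split into 2 x 2 sub-regions.
subs = partition_region(0.0, 0.0, 20.78, 20.78, n=2)
```

Each tuple is one sub-region; with n = 2 this yields the four sub-regions backing a 2 × 2 second feature map.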
In step S102, an anchor point is randomly determined in each sub-region. The location of the anchor point is generated randomly by the system and is not fixed; after the anchor point is generated, its position information in the sub-region, i.e., its coordinates, can be obtained.
The mapping position of the candidate region in the first feature map may be obtained by mapping the candidate region onto the first feature map. As an example of the mapping process, suppose the sample image is an 800 × 800 picture with a 665 × 665 candidate region in the middle (the region frames a dog). After the image passes through the feature-extraction stage of the target detection model, whose downsampling stride is 32, the side lengths of both the scaled sample image (i.e., the first feature map) and the scaled initial candidate region are 1/32 of the input: the first feature map becomes 25 × 25, and the candidate region becomes 20.78 × 20.78 (keeping the decimal places). The candidate region is still centered in the first feature map. After the candidate region is mapped onto the first feature map, its positional relationship and proportion with respect to the first feature map remain the same as those with respect to the sample image. In addition, the step of mapping the candidate region to the first feature map may precede step S101.
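The arithmetic of this mapping example can be checked with a small sketch; the stride and box coordinates below come from the example in the text, while the function itself is only an illustration:

```python
def map_to_feature_map(box, stride=32):
    """Scale a box from input-image coordinates onto the first feature
    map by dividing by the downsampling stride; fractional coordinates
    are preserved (no rounding), so position and proportion are kept."""
    return tuple(c / stride for c in box)

# 800 x 800 image -> 25 x 25 first feature map.
image = map_to_feature_map((0.0, 0.0, 800.0, 800.0))
# A centred 665 x 665 candidate region has corners at 67.5 and 732.5.
region = map_to_feature_map((67.5, 67.5, 732.5, 732.5))
width = region[2] - region[0]   # 665 / 32 = 20.78125, i.e. about 20.78
```

Keeping the fractional side length (20.78125 rather than a rounded 20 or 21) is exactly what preserves the region's position and proportion on the feature map.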
In step S103, the feature value of the anchor point is determined according to the position information of the anchor point in the sub-area and the mapping position of the candidate area in the first feature map.
In step S104, the feature values of the sub-regions may be determined by the feature values of the anchor points in the sub-regions, thereby generating a second feature map.
The second feature map may be used to predict the class of the object in the candidate region, and may also be used to predict the confidence in that class and the position information of the object. For example, if a dog is within the candidate region, the prediction includes the confidence (probability) that the object is a dog and the position coordinates of the dog in the sample image.
It will be appreciated that the method 100 may be implemented in the ROI Align (Region of Interest Align) network of the target detection model, and that the generated second feature map may be converted into a discrete-point feature set corresponding to the candidate region. The discrete-point feature set can be input into the fully connected layers of the target detection model for classification, regression, and so on, to obtain the classification score (confidence) and position information (localization coordinates) of the detected object; the details can be set according to the actual situation.
Fig. 2 is a schematic diagram of a feature map generation method in the related art. Referring to fig. 2, which shows the mapping position of a candidate region in the first feature map, dashed box 210 represents the first feature map, which contains a plurality of grids 211. Solid box 220 represents a candidate region partitioned into 2 × 2 sub-regions 221. Black dots 222 represent the anchor points in the sub-regions, and white dots 212 represent the feature values at the corner points of a grid. The related art adopts the ROI Align scheme: when generating the second feature map of candidate region 220, the midpoint of each sub-region 221 is selected as its anchor point 222, and the feature value of anchor point 222 is then calculated from the feature values 212 of the four corner points of the grid corresponding to the anchor point. Across the many parameter-adjustment steps of model training, the same four corner points are used every time to calculate the anchor point's feature value, so little of the feature information in the first feature map is used.
Fig. 3 is a schematic diagram of a feature map generation method according to an embodiment of the disclosure. Referring to fig. 3, which shows the mapping position of a candidate region in the first feature map, dashed box 310 represents the first feature map, which contains a plurality of grids 311. Solid box 320 represents a candidate region partitioned into 2 × 2 sub-regions 321. Black dots 322 represent the anchor points in the sub-regions, and white dots 312 represent the feature values at the corner points of a grid.
After candidate region 320 is partitioned into 2 × 2 sub-regions 321, an anchor point 322 is randomly generated within each sub-region 321; take the upper-left sub-region 321 as an example. According to the position information of anchor point 322 in the sub-region and the mapping position of the candidate region in the first feature map, the grid 311 corresponding to anchor point 322 is found; the feature value of anchor point 322 is then calculated from the feature values 312 of the four vertices of grid 311 and used as the feature value of the upper-left sub-region 321. In this way the feature values of the four sub-regions of candidate region 320 can be determined, i.e., the second feature map of the candidate region is determined.
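Calculating an anchor point's feature value from the four surrounding corner values is bilinear interpolation, as in standard ROI Align. The sketch below assumes the first feature map is a plain 2-D array of scalars (a single channel), which is a simplification of a real multi-channel feature map:

```python
import math

def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate a feature value at fractional (x, y).
    fmap is a 2-D list indexed as fmap[row][col]; the result is a
    weighted sum of the four grid-corner values surrounding (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    y1 = min(y0 + 1, len(fmap) - 1)
    dx, dy = x - x0, y - y0
    return (fmap[y0][x0] * (1 - dx) * (1 - dy)
            + fmap[y0][x1] * dx * (1 - dy)
            + fmap[y1][x0] * (1 - dx) * dy
            + fmap[y1][x1] * dx * dy)
```

An anchor at the exact center of a grid cell receives the plain average of the four corner values; as the anchor moves, the weights shift toward the nearer corners.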
It will be appreciated that during the training of the target detection model, a sample image is input, candidate regions are generated from it, the second feature map of each candidate region is calculated by the method 100, and the feature map is used to predict the category of the object in the candidate region. The loss function of the target detection model is then calculated; when the loss function does not satisfy the stopping condition, the model parameters are adjusted, and the steps of generating candidate regions, calculating second feature maps, predicting categories, and calculating the loss function are repeated. That is, the training process is repeated until the loss function satisfies the condition.
As shown in fig. 3, the upper-left sub-region 321 overlaps 6 grids 311 in the first feature map (i.e., the features of the sub-region are related to those 6 grids). If the anchor point were selected in the manner of fig. 2, its feature value would always be related to the same one of those 6 grids. With random selection, across repeated parameter adjustments each anchor point can fall into any of the 6 grids, i.e., over many iterations of the model, the corner-point feature values of all 6 grids can be used to calculate anchor-point feature values. More features of the first feature map are thus used, and data enhancement is realized at the feature level.
In the related art, by contrast, the midpoint of the sub-region is used as the anchor point: in every repetition of model training, each sub-region selects the same anchor point, whose position information never changes, so the grid corresponding to the anchor point in the first feature map is fixed. The features of the first feature map used when calculating the anchor point's feature value therefore never change (the feature values of the four corner points 212 of the grid in fig. 2), and only a single, fixed set of features of the first feature map is used.
In this embodiment, by selecting anchor points randomly, anchor points at different positions within a sub-region are chosen across the parameter-adjustment steps. Since the grid in the first feature map corresponding to each anchor point may differ, more grids, and thus more features of the first feature map, can be used during model training. This enriches the feature expression, realizes feature-level data enhancement, and improves the performance of the model. Moreover, unlike the related-art approaches, no data enhancement of the sample image itself is needed in advance, so the flow is simpler and the model training process is simplified.
Fig. 4 is a schematic diagram of a feature map generation method according to another embodiment of the present disclosure. Referring to fig. 4, in some embodiments, randomly determining anchor points in the sub-regions obtained by partitioning the candidate region in step S102 may include: dividing each sub-region into a plurality of cells; and randomly determining an anchor point in each of the plurality of cells.
As in fig. 4, candidate region 420 is partitioned into 2 × 2 sub-regions 421, and each sub-region 421 may then be divided into 4 cells 423. For example, the sub-region 421 in the upper-left corner of fig. 4 is equally divided by thick dashed lines into 2 × 2 cells 423, and an anchor point 422 may be randomly determined in each cell 423.
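A minimal sketch of this cell-wise random sampling follows; the function name and the use of Python's `random` module are illustrative assumptions, not the patent's implementation:

```python
import random

def random_anchors(x0, y0, x1, y1, cells=2, rng=None):
    """Divide the sub-region (x0, y0, x1, y1) into cells x cells equal
    units and draw one uniformly random anchor point inside each unit.
    A fresh call gives fresh anchors, so every parameter-adjustment
    step of training can touch different grids of the feature map."""
    rng = rng or random.Random()
    w = (x1 - x0) / cells
    h = (y1 - y0) / cells
    return [(x0 + (i + rng.random()) * w, y0 + (j + rng.random()) * h)
            for j in range(cells) for i in range(cells)]

# Four anchors for one sub-region, one per cell of a 2 x 2 division.
anchors = random_anchors(0.0, 0.0, 10.0, 10.0, cells=2,
                         rng=random.Random(0))
```

Constraining each anchor to its own cell keeps the samples spread over the whole sub-region, while the random offset within the cell provides the variation across training iterations.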
It will be appreciated that the sub-region may be left undivided (fig. 3), i.e., one anchor point is randomly selected in the whole sub-region. Alternatively, the sub-region may be divided into a plurality of cells (fig. 4), i.e., several anchor points are randomly selected across the sub-region. Since different cells overlap different grids 411 of the first feature map 410, dividing into cells makes it easier for the anchor points to fall randomly and uniformly into each grid related to the sub-region, so more features of the first feature map can be used and the feature expression is further enriched.
With continued reference to fig. 4, in some embodiments, in step S104, generating a second feature map of the candidate region according to the feature values of the anchor points may include the following steps: taking the characteristic value of the anchor point as the characteristic value of the unit where the anchor point is located, and averaging the characteristic values of a plurality of units in the subarea to obtain the characteristic value of the subarea; and generating a second feature map of the candidate region according to the feature values of the sub-regions in the candidate region.
Taking the upper left sub-region 421 in fig. 4 as an example, after an anchor point 422 is randomly determined in each cell 423, the grid 411 in the first feature map 410 corresponding to the anchor point 422 may be determined according to the location information of the anchor point and the mapping location of the candidate region in the first feature map, and the feature values of the four corner points 412 of the grid 411 are then used to calculate the feature value of the anchor point 422.
The calculated characteristic value of the anchor point 422 is then used as the characteristic value of the unit in which the anchor point is located. In this way, the respective characteristic values of the plurality of cells in each sub-region can be determined. For example, the four cells of the sub-region 421 in the upper left hand corner of fig. 4 have respective eigenvalues. Then, the characteristic value of the sub-area 421 may be calculated by an averaging operation of the characteristic values of the four units.
After the feature values of the four sub-regions of the candidate region 420 are determined, a 2 × 2 second feature map can be obtained.
Of course, in other embodiments, the maximum value of the feature values of the plurality of units in the sub-area may also be taken as the feature value of the sub-area.
By integrating the feature values of the cells in each sub-region to determine the sub-region feature values, and thereby generating the second feature map of the candidate region, the second feature map can better represent the features of the candidate region, which in turn improves the accuracy of subsequent prediction results.
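The aggregation of per-cell anchor values into sub-region values can be sketched as follows (a hypothetical helper `second_feature_map`; the `"max"` mode corresponds to the maximum-value alternative mentioned above):

```python
import numpy as np

def second_feature_map(anchor_values, mode="avg"):
    """Aggregate per-anchor feature values into one value per sub-region.

    anchor_values: array of shape (rows, cols, n_cells) holding the
    interpolated feature value of every anchor, grouped by sub-region.
    Returns the (rows, cols) second feature map of the candidate region.
    """
    v = np.asarray(anchor_values, dtype=float)
    # average the cells of each sub-region (or take their maximum)
    return v.max(axis=-1) if mode == "max" else v.mean(axis=-1)
```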
With continued reference to fig. 3, in some embodiments, in step S103, a feature value of the anchor point is determined according to the location information of the anchor point in the sub-area and the mapping location of the candidate area in the first feature map, including the following steps one and two.
Step one, determining a target grid corresponding to the anchor point from a plurality of grids contained in the first feature map according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first feature map.
And step two, determining the characteristic value of the anchor point by utilizing a bilinear interpolation algorithm according to the characteristic information of the target grid.
As shown in fig. 3, the first feature map 310 may include 8×8 grids, and the feature information of each grid includes feature values of four corner points of the grid.
After the anchor point 322 is randomly selected, since the mapping position of the candidate region 320 in the first feature map 310 is known and the position of the anchor point 322 in its sub-region is also known, the position of the anchor point 322 relative to the first feature map 310 can be obtained, and thus the target grid 311 corresponding to the anchor point 322, that is, the grid in which the anchor point 322 is located, can be determined.
The feature values of the anchor points 322 may then be calculated using bilinear interpolation using the feature information of the target grid 311, i.e., the feature values at its four corner points 312.
The bilinear interpolation algorithm calculates the feature value of the anchor point from 4 points in the first feature map; it is both accurate and fast, which improves the computation speed of the model and the accuracy of the prediction results.
Also, as in fig. 4, the anchor point 422 in each cell may find its corresponding target mesh 411 in the first feature map 410, and then calculate the feature value of the anchor point 422 using the feature values of the four corner points 412 of the target mesh.
Of course, besides the bilinear interpolation algorithm, the feature value of the anchor point may also be calculated using other interpolation algorithms, such as bicubic interpolation or the nearest-neighbor method.
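A minimal implementation of the bilinear interpolation used in step two might look like this (a hypothetical helper `bilinear`; the anchor coordinates are assumed to be expressed in the first feature map's corner-point grid):

```python
import numpy as np

def bilinear(corners, x, y):
    """Interpolate a grid of corner feature values at point (x, y).

    corners: H x W array whose entries are the feature values at the
    integer grid corner points; (x, y) lies in the same coordinates.
    """
    f = np.asarray(corners, dtype=float)
    # clamp so the 2x2 corner neighbourhood stays inside the grid
    x0 = min(max(int(np.floor(x)), 0), f.shape[1] - 2)
    y0 = min(max(int(np.floor(y)), 0), f.shape[0] - 2)
    dx, dy = x - x0, y - y0
    # weight each corner by the area of the opposite sub-rectangle
    return (f[y0, x0] * (1 - dx) * (1 - dy)
            + f[y0, x0 + 1] * dx * (1 - dy)
            + f[y0 + 1, x0] * (1 - dx) * dy
            + f[y0 + 1, x0 + 1] * dx * dy)
```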
Fig. 5 is a schematic diagram of a feature map generating method according to another embodiment of the present disclosure, referring to fig. 5, on the basis of the above embodiment, before the step S101 of partitioning the candidate region of the sample image, the method further includes: the size of the candidate region is randomly enlarged.
It will be appreciated that the candidate region may be a rectangular frame; after the candidate region is obtained, the size of the rectangular frame may be enlarged in a random manner, that is, by a randomly chosen enlargement factor. A second feature map is then generated using the enlarged candidate region.
After the candidate region is generated at one stage of the object detection model, the candidate region may be randomly enlarged, the solid line box 520 in fig. 5 represents the enlarged candidate region, the thick dotted line box 530 represents the candidate region before enlargement, after the candidate region is randomly enlarged, the enlarged candidate region may be partitioned, the anchor point may be randomly determined in the sub-region, and then the feature value of the anchor point may be calculated, thereby generating the second feature map of the enlarged candidate region.
It can be appreciated that the expanded candidate region covers a larger range in the first feature map 510, so that when the feature value of the anchor point 522 is calculated, some grids outside the original candidate region 530 are also utilized. Contextual semantic information can thus be taken into account, that is, the features of the background region outside the candidate region are comprehensively utilized during prediction, which helps improve the accuracy of result prediction.
For example, if a candidate region in the image to be detected frames a shadow, it may be difficult to accurately predict which object the shadow belongs to from the shadow alone; but by enlarging the candidate region, the content of the background region outside it can be used to accurately attribute the shadow, for example to a person or a building.
In some embodiments, randomly expanding the size of the candidate region may include the following sub-steps one through three.
And step one, randomly generating an expansion coefficient, wherein the expansion coefficient is larger than 1.
Step two, determining the target length and the target width of the enlarged candidate region according to the expansion coefficient, the length of the candidate region and the width of the candidate region;
and thirdly, expanding the candidate region according to the target length and the target width under the condition that the central position of the candidate region is kept unchanged.
The expansion coefficient may be a random number randomly generated by the system, which should take any value greater than 1 in order to allow expansion of the candidate region.
The expansion coefficient may be an expansion coefficient of the length and width of the candidate region, that is, the target length of the expanded candidate region=the length of the original candidate region×the expansion coefficient, and the target width of the expanded candidate region=the width of the original candidate region×the expansion coefficient.
Then, with the center position of the candidate region kept unchanged, the candidate region is expanded outward, that is, its length and width are changed to the target length and target width respectively. This realizes the expansion of the candidate region in a way that is simple and easy to implement.
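The three sub-steps above can be sketched as follows (a hypothetical helper `expand_box`; drawing the random coefficient from (1, 2) is an illustrative assumption, since the disclosure only requires a coefficient greater than 1):

```python
import random

def expand_box(box, factor=None, rng=random):
    """Expand a (x0, y0, x1, y1) candidate box about its center.

    factor: expansion coefficient greater than 1; if None, a random
    coefficient in (1, 2) is drawn as one possible randomization.
    """
    x0, y0, x1, y1 = box
    k = factor if factor is not None else 1.0 + rng.random()
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0      # center stays fixed
    hw, hh = (x1 - x0) * k / 2.0, (y1 - y0) * k / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)
```

The target length is the original length times the coefficient and likewise for the width, while the returned box keeps the original center, matching sub-steps two and three.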
It can also be understood that, in the training process of the target detection model, because parameters are adjusted many times and the expansion coefficient randomly obtained for each candidate region differs between adjustments, more contextual semantic information can be utilized, realizing data enhancement at the feature level and enriching feature expression.
With continued reference to FIG. 5, in one particular embodiment, feature-level data enhancement may be implemented in two ways.
(1) Random perturbation and interpolation of regional features
Take as an example an object detection model that is a two-stage object detector, such as the Faster Region-based Convolutional Neural Network (Faster R-CNN). After a candidate region (proposal) is acquired in the first stage, the features corresponding to the candidate region need to be aggregated: the candidate region features are divided into a plurality of sub-regions, one or more anchor points are selected from each sub-region in a uniformly distributed manner, and the feature values of the corresponding anchor points are acquired by bilinear interpolation. In this embodiment, during the training stage of the target detection model, the anchor points within the sub-regions are randomly perturbed, so that different weights are applied to the features at each corner point of the grid according to the position information, enriching feature expression.
(2) Randomly expanding candidate region boundaries
In order to further improve feature richness, the boundary of the candidate region is further randomly and outwards expanded while the anchor point is randomly disturbed. Under the condition of expanding in a proper range, the operation can better utilize the context semantic information of the candidate region and enrich the feature expression.
The target detection data enhancement method based on regional feature interpolation can be used in a two-stage object detector to enhance data at the feature level and improve detection model performance. The feature enhancement method of this embodiment innovatively enhances data at the feature level, makes fuller use of the contextual semantic information of the target region corresponding to the feature map, and more effectively improves target detection performance.
FIG. 6 is a flow chart of a training method for a target detection model provided in accordance with an embodiment of the present disclosure; referring to fig. 6, the embodiment of the disclosure further provides a training method 600 of the target detection model, where the method 600 includes the following steps S601 to S603.
Step S601, obtaining a sample image and labeling information of the sample image.
Step S602, inputting a sample image into a target detection model to obtain a prediction result of the sample image; and in the process of obtaining the prediction result of the sample image, the target detection model is used for obtaining a second feature map of the candidate region in the sample image according to the feature map generating method according to any one of the embodiments, and the second feature map is used for obtaining the prediction result of the sample image.
And step S603, adjusting parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image to obtain a trained target detection model.
The object detection model is used for identifying a plurality of objects in a picture, and different objects can be positioned.
The sample image may be a picture for training the target detection model, and the labeling information of the sample image may include a true category of the object in each candidate region in the sample image, true position information of the object, and the like.
The sample image is input into the target detection model to obtain a prediction result. It can be appreciated that the target detection model may first obtain candidate regions of the sample image from the sample image, then obtain the second feature map corresponding to each candidate region by the feature map generating method described in any of the foregoing embodiments, and finally obtain the prediction result using the second feature map.
The prediction result may include a prediction category for each object contained in the sample image, a confidence (classification score) of the prediction category, and predicted position information (positioning coordinates, i.e., position coordinates of the object in the sample image) of the object.
And then calculating a loss function by using the labeling information and the prediction result, and adjusting the parameters of the target detection model under the condition that the loss function does not meet the convergence condition. And then repeatedly executing the step S602 until the loss function meets the convergence condition to obtain a trained target detection model. The trained target detection model can be used for detecting the image to be detected to obtain the category of the object in the image to be detected, the confidence of the category and the position information of the object.
It can be understood that, because the feature map generating method of any of the above embodiments is adopted in the process of outputting the prediction result, anchor points at different positions in the sub-region can be randomly selected during parameter adjustment of the model. Since the grids in the first feature map corresponding to each anchor point may differ, more grids, that is, more features in the first feature map, can be utilized during model training, which enriches feature expression at the feature level, realizes feature-level data enhancement, and further improves the performance of the model. Moreover, there is no need to enhance the sample image itself in advance as in the related art, so the flow is simpler and the model training process is simplified.
In some embodiments, the object detection model includes a first intermediate layer, a second intermediate layer, and a third intermediate layer; in step S602, inputting the sample image into the target detection model to obtain a prediction result of the sample image may include: inputting the sample image into the first intermediate layer to obtain a candidate region of the sample image and a first feature map of the sample image; inputting the candidate region and the first feature map into a second intermediate layer to obtain a second feature map of the candidate region; and inputting the second characteristic diagram into a third intermediate layer to obtain a prediction result of the sample image.
The object detection model may include a first intermediate layer, a second intermediate layer, and a third intermediate layer, wherein at least one intermediate layer may be comprised of one or more neural networks.
It will be appreciated that the first intermediate layer may comprise a backbone (backbone) network and a region candidate network (Region Proposal Network, RPN), the sample image being input to the backbone network to obtain a first feature map of the sample image, and the first feature map being input to the region candidate network to obtain candidate regions in the sample image, the candidate regions typically not being uniform in size.
The second intermediate layer may include an ROI alignment network, and may be used to perform the feature map generating method described in any of the foregoing embodiments, that is: partitioning the candidate region of the sample image; randomly determining anchor points in the sub-regions obtained by partitioning the candidate region; determining the feature value of each anchor point according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first feature map; and generating a second feature map of the candidate region according to the feature values of the anchor points.
In some embodiments, the candidate region and the first feature map are input into the second intermediate layer. The second intermediate layer may map the candidate region onto the first feature map, use the coordinates of the candidate region to find the features of the corresponding region in the first feature map, and use the ROI alignment network to sample points (select anchor points) on the features of candidate regions of various shapes, thereby generating second feature maps that all have the same size.
The third intermediate layer may include a plurality of fully connected layers that may implement classification and frame regression using the second feature map to obtain a prediction result. For example, the third intermediate layer may convert the second feature map into a discrete point feature set, and since the sizes of the second feature maps are the same, the lengths of the discrete point feature sets corresponding to the obtained candidate regions are the same, and then the discrete point feature sets are input into the plurality of fully connected layers, so that the prediction result may be obtained.
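As an illustrative sketch of why equal-sized second feature maps matter (all shapes and the toy linear head below are assumptions, not the disclosed network), each map can be flattened into an equal-length feature vector and passed through a shared fully connected layer for classification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Every candidate region yields a second feature map of the same size
# (here 2 x 2), so flattening gives equal-length feature vectors that
# one shared fully connected head can consume.
second_maps = rng.random((5, 2, 2))            # 5 candidate regions
features = second_maps.reshape(5, -1)          # shape (5, 4)

n_classes = 3
W = rng.standard_normal((4, n_classes))        # toy fully connected layer
logits = features @ W
# softmax over classes gives a classification score per candidate region
scores = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pred_classes = scores.argmax(axis=1)
```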
And then calculating a loss function by using the labeling information and the prediction result, and adjusting parameters of the target detection model under the condition that the loss function does not meet the convergence condition, for example, adjusting parameters of the main network, the area candidate network and a plurality of full connection layers until the loss function meets the convergence condition so as to obtain the trained target detection model.
In this embodiment, the second intermediate layer may be used to enhance data on the feature layer, thereby improving performance of the target detection model and simplifying the training process.
FIG. 7 is a flow chart of a method for object detection provided in accordance with an embodiment of the present disclosure; referring to fig. 7, the present embodiment provides a target detection method 700, which includes the following steps S701 and S702.
Step S701, acquiring an image to be measured.
Step S702, inputting the image to be detected into a trained target detection model to obtain a detection result of the image to be detected, wherein the trained target detection model is obtained by training the training method of the target detection model in any embodiment.
The image to be detected may be a picture which is required to be subjected to target detection. The trained target detection model may be obtained by training the target detection model training method provided in any of the above embodiments. The detection result may include a category of an object included in the image to be detected, a confidence of the category, position information of the object, and the like, and the accuracy is high.
It can be appreciated that the randomly selected anchor points and the randomly enlarged candidate region sizes in the feature map generating method are used only during model training. After training is finished, that is, during target detection on the image to be detected, the anchor point position may revert to the conventional ROI alignment scheme, namely the midpoint of the sub-region (or of a cell within it), and the candidate region need not be randomly enlarged for accurate prediction.
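The switch between random anchors during training and conventional midpoint anchors at inference might be sketched as follows (a hypothetical helper `anchor_position`):

```python
import random

def anchor_position(cell, training, rng=random):
    """Anchor placement for one cell (x0, y0, x1, y1).

    During training the anchor is drawn uniformly inside the cell; at
    inference it reverts to the conventional ROI-Align midpoint.
    """
    x0, y0, x1, y1 = cell
    if not training:
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    return (x0 + rng.random() * (x1 - x0),
            y0 + rng.random() * (y1 - y0))
```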
FIG. 8 is a block diagram of a feature map generating apparatus provided according to an embodiment of the present disclosure; referring to fig. 8, an embodiment of the disclosure provides a feature map generating apparatus 800, where the apparatus 800 includes the following units.
A partitioning unit 801 for partitioning a candidate region of the sample image.
A first determining unit 802 is configured to determine an anchor point randomly in a sub-area obtained by partitioning the candidate area.
A second determining unit 803, configured to determine a feature value of the anchor point according to the location information of the anchor point in the sub-area and the mapping location of the candidate area in the first feature map, where the first feature map is a feature map of the sample image.
The generating unit 804 is configured to generate a second feature map of the candidate region according to the feature value of the anchor point, where the second feature map is used to predict a class of the object in the candidate region.
In some embodiments, the first determining unit 802 is further configured to: dividing the subareas into a plurality of units; an anchor point is randomly determined in each of the plurality of cells.
In some embodiments, the generating unit 804 is further configured to: taking the characteristic value of the anchor point as the characteristic value of the unit where the anchor point is located, and averaging the characteristic values of a plurality of units in the subarea to obtain the characteristic value of the subarea; and generating a second feature map of the candidate region according to the feature values of the sub-regions in the candidate region.
In some embodiments, the second determining unit 803 is further configured to: determining a target grid corresponding to the anchor point from a plurality of grids contained in the first feature map according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first feature map; and determining the characteristic value of the anchor point by using a bilinear interpolation algorithm according to the characteristic information of the target grid.
In some embodiments, before the partitioning unit 801 is configured to partition the candidate region of the sample image, the apparatus further comprises: and an enlarging unit for randomly enlarging the size of the candidate region.
In some embodiments, the expansion unit is further to: randomly generating an expansion coefficient, wherein the expansion coefficient is larger than 1; determining the target length and the target width of the enlarged candidate region according to the expansion coefficient, the length of the candidate region and the width of the candidate region; the candidate region is enlarged according to the target length and the target width while keeping the center position of the candidate region unchanged.
The feature map generating apparatus provided by this embodiment partitions the candidate region of the sample image; randomly determines anchor points in the sub-regions obtained by partitioning the candidate region; determines the feature value of each anchor point according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in a first feature map, the first feature map being a feature map of the sample image; and generates a second feature map of the candidate region according to the feature values of the anchor points, the second feature map being used to predict the category of the object in the candidate region. Because the anchor point positions are randomly determined, different anchor points can be obtained during each parameter adjustment in the training of the target detection model, so that more feature information in the first feature map can be utilized for training and feature expression is enriched. Data enhancement is thus realized at the feature level, the processing of sample pictures and the model training process are simplified, and the performance and generalization ability of the trained target detection model can be improved.
FIG. 9 is a block diagram of a training apparatus for an object detection model provided in accordance with an embodiment of the present disclosure; referring to fig. 9, an embodiment of the disclosure provides a training apparatus 900 for a target detection model, where the apparatus 900 includes the following units.
A sample acquiring unit 901, configured to acquire a sample image and labeling information of the sample image;
a result prediction unit 902, configured to input a sample image into the target detection model, and obtain a prediction result of the sample image; and in the process of obtaining the prediction result of the sample image, the target detection model is used for obtaining a second feature map of the candidate region in the sample image according to the feature map generating method according to any one of the embodiments, and the second feature map is used for obtaining the prediction result of the sample image.
The adjusting unit 903 is configured to adjust parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image, so as to obtain a trained target detection model.
In some embodiments, the object detection model includes a first intermediate layer, a second intermediate layer, and a third intermediate layer; the result prediction unit 902 is further configured to: inputting the sample image into the first intermediate layer to obtain a candidate region of the sample image and a first feature map of the sample image; inputting the candidate region and the first feature map into a second intermediate layer to obtain a second feature map of the candidate region; and inputting the second characteristic diagram into a third intermediate layer to obtain a prediction result of the sample image.
FIG. 10 is a block diagram of an object detection device provided in accordance with an embodiment of the present disclosure; referring to fig. 10, an embodiment of the disclosure provides an object detection apparatus 1000, which includes the following units.
An image acquisition unit 1001 is used for acquiring an image to be measured.
The detection unit 1002 is configured to input an image to be detected into a trained target detection model to obtain a detection result of the image to be detected, where the trained target detection model is obtained by training the training method of the target detection model described in any one of the embodiments.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the related user personal information all conform to the regulations of related laws and regulations, and the public sequence is not violated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
An embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments described above.
The disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments described above.
The disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the embodiments described above.
Fig. 11 is a block diagram of an electronic device for implementing a feature map generation method of an embodiment of the present disclosure. Referring to FIG. 11, an electronic device is intended to represent various forms of digital computers, such as laptops, desktops, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 1101 performs the respective methods and processes described above, such as a feature map generation method, a training method of an object detection model, and an object detection method. For example, in some embodiments, the feature map generation method, the training method of the object detection model, and the object detection method may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the feature map generation method, the training method of the object detection model, and the object detection method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the feature map generation method, the training method of the target detection model, and the target detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A feature map generation method, the method comprising:
partitioning a candidate region of the sample image;
randomly determining an anchor point in a sub-region obtained by partitioning the candidate region;
determining a feature value of the anchor point according to position information of the anchor point in the sub-region and a mapping position of the candidate region in a first feature map, wherein the first feature map is a feature map of the sample image;
and generating a second feature map of the candidate region according to the feature value of the anchor point, wherein the second feature map is used for predicting the category of an object in the candidate region.
2. The method of claim 1, wherein randomly determining anchor points in the sub-region partitioned by the candidate region comprises:
dividing the sub-region into a plurality of cells;
an anchor point is randomly determined in each of the plurality of cells.
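As an illustrative aid rather than part of the claimed subject matter, the sampling scheme of claim 2 (splitting a sub-region into cells and drawing one random anchor point per cell) can be sketched as follows; the 2x2 cell layout, the coordinate convention, and the function name are illustrative assumptions, not limitations of the claims:

```python
import random

def random_anchors(sub_x, sub_y, sub_w, sub_h, n_cols=2, n_rows=2):
    """Split a sub-region (top-left corner sub_x, sub_y; size sub_w x sub_h)
    into n_cols x n_rows cells and draw one uniformly random anchor point
    per cell; returns a list of (x, y) positions in image coordinates."""
    cell_w = sub_w / n_cols
    cell_h = sub_h / n_rows
    anchors = []
    for r in range(n_rows):
        for c in range(n_cols):
            # anchor is uniform within its own cell, not within the whole sub-region
            ax = sub_x + c * cell_w + random.uniform(0, cell_w)
            ay = sub_y + r * cell_h + random.uniform(0, cell_h)
            anchors.append((ax, ay))
    return anchors
```

Because the anchor position is re-drawn at random, repeated passes over the same candidate region sample slightly different feature values, which is the source of the randomness exploited during training.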
3. The method of claim 2, wherein generating a second feature map of the candidate region from feature values of the anchor point comprises:
taking the feature value of the anchor point as the feature value of the cell where the anchor point is located, and averaging the feature values of the cells in the sub-region to obtain a feature value of the sub-region;
and generating a second feature map of the candidate region according to the feature values of the sub-regions in the candidate region.
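The pooling of claim 3, in which each cell takes its anchor's feature value and each sub-region takes the mean of its cells, may be sketched as below; the list-based representation and the row-major ordering of sub-regions are illustrative assumptions:

```python
def pool_candidate_region(anchor_values_per_subregion):
    """anchor_values_per_subregion: one entry per sub-region (row-major),
    each a list of per-cell anchor feature values. Per claim 3, each
    sub-region's feature value is the mean of its cell values."""
    return [sum(vals) / len(vals) for vals in anchor_values_per_subregion]

def to_second_feature_map(subregion_values, n_cols):
    """Arrange the flat list of per-sub-region values into a 2-D second
    feature map with n_cols sub-regions per row."""
    return [subregion_values[i:i + n_cols]
            for i in range(0, len(subregion_values), n_cols)]
```

For example, two sub-regions with cell values [1, 3] and [2, 4] pool to [2.0, 3.0], which is then reshaped into the grid that forms the second feature map of the candidate region.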
4. A method according to any of claims 1-3, wherein determining the feature value of the anchor point from the location information of the anchor point in the sub-region and the mapped location of the candidate region in the first feature map comprises:
determining a target grid corresponding to the anchor point from a plurality of grids contained in the first feature map according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first feature map;
and determining the feature value of the anchor point by using a bilinear interpolation algorithm according to the feature information of the target grid.
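For claim 4, the anchor point's feature value is read off the first feature map by bilinear interpolation over the surrounding grid positions. A minimal sketch, under the illustrative assumptions that the anchor has already been mapped into continuous feature-map coordinates and that the map is a single-channel 2-D array:

```python
def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate feature_map (a 2-D list indexed [row][col])
    at the continuous position (x, y) in grid coordinates."""
    h, w = len(feature_map), len(feature_map[0])
    x0 = min(max(int(x), 0), w - 2)  # top-left corner of the target grid, clamped
    y0 = min(max(int(y), 0), h - 2)
    dx, dy = x - x0, y - y0
    # interpolate along x on the two rows, then along y between them
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x0 + 1] * dx
    bot = feature_map[y0 + 1][x0] * (1 - dx) + feature_map[y0 + 1][x0 + 1] * dx
    return top * (1 - dy) + bot * dy
```

Because the anchor falls at a continuous position rather than on an integer grid point, interpolation avoids the quantization error that nearest-neighbor sampling would introduce.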
5. The method according to any of claims 1-4, wherein, prior to partitioning the candidate region of the sample image, the method further comprises:
randomly expanding the size of the candidate region.
6. The method of claim 5, wherein randomly expanding the size of the candidate region comprises:
randomly generating an expansion coefficient, the expansion coefficient being greater than 1;
determining the target length and the target width of the candidate region after expansion according to the expansion coefficient, the length of the candidate region and the width of the candidate region;
and expanding the candidate region according to the target length and the target width under the condition that the central position of the candidate region is kept unchanged.
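The random expansion of claim 6 can be sketched as follows; the coefficient range, the (center, width, height) box representation, and the choice to scale length and width by the same coefficient are illustrative assumptions:

```python
import random

def expand_region(cx, cy, w, h, coef_range=(1.0, 1.5)):
    """Randomly expand a candidate region about its center, per claim 6.
    The region is (center_x, center_y, width, height); the expansion
    coefficient must exceed 1, so the box never shrinks and the center
    position is left unchanged."""
    k = random.uniform(*coef_range)
    if k <= 1.0:           # guard: the claim requires a coefficient greater than 1
        k = 1.0 + 1e-6
    return cx, cy, w * k, h * k
```

Expanding the box before partitioning lets the sampled anchors also cover context just outside the original proposal, which is one plausible motivation for this step.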
7. A method of training a target detection model, the method comprising:
acquiring a sample image and labeling information of the sample image;
inputting the sample image into a target detection model to obtain a prediction result of the sample image; and in the process of obtaining the prediction result of the sample image, the target detection model is used for obtaining a second feature map of the candidate region in the sample image according to the method of any one of claims 1-6, and the second feature map is used for obtaining the prediction result of the sample image;
and adjusting parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image to obtain a trained target detection model.
8. The method of claim 7, wherein the object detection model includes a first intermediate layer, a second intermediate layer, and a third intermediate layer;
inputting the sample image into a target detection model to obtain a prediction result of the sample image, wherein the method comprises the following steps:
inputting the sample image into the first intermediate layer to obtain a candidate region of the sample image and a first feature map of the sample image;
inputting the candidate region and the first feature map into the second intermediate layer to obtain a second feature map of the candidate region;
and inputting the second feature map into the third intermediate layer to obtain a prediction result of the sample image.
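The three-layer forward pass of claim 8 amounts to composing the intermediate layers in sequence. A minimal sketch in which each layer is an opaque callable; the internals of the layers are not specified by the claim and are not assumed here:

```python
def forward(sample_image, first_layer, second_layer, third_layer):
    """Forward pass through the three intermediate layers of claim 8.
    Each *_layer argument is a callable standing in for the real
    sub-network (e.g., backbone + region proposal, the feature-map
    sampler of claims 1-6, and the prediction head)."""
    # first layer: candidate region(s) and first feature map of the sample image
    candidate_region, first_feature_map = first_layer(sample_image)
    # second layer: second feature map of the candidate region
    second_feature_map = second_layer(candidate_region, first_feature_map)
    # third layer: prediction result of the sample image
    return third_layer(second_feature_map)
```

During training, the returned prediction would be compared against the labeling information of the sample image to compute a loss and adjust the model parameters, as claim 7 describes.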
9. A target detection method comprising:
acquiring an image to be detected;
inputting the image to be detected into a trained target detection model to obtain a detection result of the image to be detected, wherein the trained target detection model is obtained by training according to the method of claim 7 or 8.
10. A feature map generation apparatus, the apparatus comprising:
a partitioning unit, configured to partition a candidate region of a sample image;
a first determining unit, configured to randomly determine an anchor point in a sub-region obtained by partitioning the candidate region;
a second determining unit, configured to determine a feature value of the anchor point according to position information of the anchor point in the sub-region and a mapping position of the candidate region in a first feature map, wherein the first feature map is a feature map of the sample image;
and a generating unit, configured to generate a second feature map of the candidate region according to the feature value of the anchor point, wherein the second feature map is used for predicting the category of an object in the candidate region.
11. The apparatus of claim 10, wherein the first determining unit is further configured to:
dividing the sub-region into a plurality of cells;
an anchor point is randomly determined in each of the plurality of cells.
12. The apparatus of claim 11, wherein the generating unit is further configured to:
taking the feature value of the anchor point as the feature value of the cell where the anchor point is located, and averaging the feature values of the cells in the sub-region to obtain a feature value of the sub-region;
and generating a second feature map of the candidate region according to the feature values of the sub-regions in the candidate region.
13. The apparatus according to any of claims 10-12, wherein the second determining unit is further configured to:
determining a target grid corresponding to the anchor point from a plurality of grids contained in the first feature map according to the position information of the anchor point in the sub-region and the mapping position of the candidate region in the first feature map;
and determining the feature value of the anchor point by using a bilinear interpolation algorithm according to the feature information of the target grid.
14. The apparatus according to any of claims 10-13, further comprising:
an expansion unit, configured to randomly expand the size of the candidate region before the partitioning unit partitions the candidate region of the sample image.
15. The apparatus of claim 14, wherein the expansion unit is further configured to:
randomly generating an expansion coefficient, the expansion coefficient being greater than 1;
determining the target length and the target width of the candidate region after expansion according to the expansion coefficient, the length of the candidate region and the width of the candidate region;
and expanding the candidate region according to the target length and the target width under the condition that the central position of the candidate region is kept unchanged.
16. A training apparatus for a target detection model, the apparatus comprising:
the sample acquisition unit is used for acquiring a sample image and labeling information of the sample image;
the result prediction unit is used for inputting the sample image into a target detection model to obtain a prediction result of the sample image; and in the process of obtaining the prediction result of the sample image, the target detection model is used for obtaining a second feature map of the candidate region in the sample image according to the method of any one of claims 1-6, and the second feature map is used for obtaining the prediction result of the sample image;
and the adjusting unit is used for adjusting the parameters of the target detection model according to the prediction result of the sample image and the labeling information of the sample image to obtain a trained target detection model.
17. The apparatus of claim 16, wherein the object detection model comprises a first intermediate layer, a second intermediate layer, and a third intermediate layer;
the result prediction unit is further configured to:
inputting the sample image into the first intermediate layer to obtain a candidate region of the sample image and a first feature map of the sample image;
inputting the candidate region and the first feature map into the second intermediate layer to obtain a second feature map of the candidate region;
and inputting the second feature map into the third intermediate layer to obtain a prediction result of the sample image.
18. An object detection apparatus comprising:
the image acquisition unit is used for acquiring an image to be detected;
the detection unit is used for inputting the image to be detected into a trained target detection model to obtain a detection result of the image to be detected, wherein the trained target detection model is obtained by training according to the method of claim 7 or 8.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202310431499.0A 2023-04-20 2023-04-20 Feature map generation method, training method and device of target detection model Pending CN116612295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310431499.0A CN116612295A (en) 2023-04-20 2023-04-20 Feature map generation method, training method and device of target detection model

Publications (1)

Publication Number Publication Date
CN116612295A (en) 2023-08-18

Family

ID=87673758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310431499.0A Pending CN116612295A (en) 2023-04-20 2023-04-20 Feature map generation method, training method and device of target detection model

Country Status (1)

Country Link
CN (1) CN116612295A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination