CN109978043B - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
CN109978043B
Authority
CN
China
Prior art keywords
feature vector
sample
dimension
target
feature
Prior art date
Legal status
Active
Application number
CN201910209571.9A
Other languages
Chinese (zh)
Other versions
CN109978043A (en)
Inventor
刘萌萌
Current Assignee
Hangzhou H3C Technologies Co Ltd
Original Assignee
Hangzhou H3C Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou H3C Technologies Co Ltd filed Critical Hangzhou H3C Technologies Co Ltd
Priority to CN201910209571.9A
Publication of CN109978043A
Application granted
Publication of CN109978043B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method and device. The method includes: acquiring a plurality of different first feature vectors of an image to be detected and second feature vectors matched with them; obtaining a first target detection result corresponding to each first feature vector based on a first target detection model, and obtaining a second target detection result corresponding to each second feature vector based on a second target detection model; and determining the target detection result of the image to be detected according to the first target detection result and the second target detection result. Because the first size of the image cutting frame corresponding to a first feature vector differs from the second size of the image cutting frame corresponding to its matching second feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size and the fourth size corresponding to the second feature vector with the smallest dimension size is smaller than a set threshold, the first target detection result and the second target detection result form a complementary relationship, which improves the precision of the target detection result.

Description

Target detection method and device
Technical Field
The application relates to the technical field of machine vision, in particular to a target detection method and device.
Background
The YOLO target detection model treats target detection as a regression problem: a single end-to-end neural network maps the input image to be detected directly to the output target positions and classes. When YOLO detects targets in an image to be detected, the image is cut into a plurality of grids (grid cells) according to a certain image cutting frame size; if the center of a target falls into a grid cell, that grid cell is responsible for predicting the target. Based on several prior boxes of different sizes, each grid cell predicts the same number of regression boxes (bounding boxes), and the coordinates of these bounding boxes are the coordinates of the target in the image to be detected.
However, after YOLO cuts the image to be detected into a plurality of grids, targets whose center points fall on the edges of the grid cells may be missed, so the target detection result for the image to be detected carries a certain error.
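A minimal sketch (not part of the patent text) of how a target center is mapped to a YOLO-style grid cell; the image size and grid count are assumed example values. It illustrates why a center lying near a cell border is fragile: a tiny localization shift moves it into the neighbouring cell, which is the missed-detection situation described above.

import math

def grid_cell(center_x, center_y, image_size=416, grid_count=13):
    """Return the (column, row) of the grid cell responsible for a target center."""
    cell_size = image_size / grid_count          # e.g. 416 / 13 = 32 pixels per cell
    return int(center_x // cell_size), int(center_y // cell_size)

# A center almost exactly on a cell border (x = 64) flips between two cells
# under a sub-pixel shift, so edge-located centers are prone to being missed.
print(grid_cell(63.9, 100))  # (1, 3)
print(grid_cell(64.1, 100))  # (2, 3)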
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide a target detection method and apparatus that reduce the error introduced into the target detection result of an image to be detected by missed detections at grid edges, thereby improving the accuracy of target detection.
In a first aspect, an embodiment of the present application provides a target detection method, including:
acquiring a plurality of different first feature vectors of an image to be detected and second feature vectors respectively matched with the first feature vectors; the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension size among the plurality of second feature vectors is smaller than a set threshold; the dimension size refers to the number of elements corresponding to the target dimension;
for each first feature vector, performing target detection on the image to be detected based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector and a prior frame corresponding to the first target detection model to obtain a first target detection result corresponding to the first feature vector,
for each second feature vector, performing target detection on the image to be detected based on a second target detection model corresponding to the second feature vector, an image cutting frame corresponding to the second feature vector and a prior frame corresponding to the second target detection model to obtain a second target detection result corresponding to the second feature vector;
and determining the target detection result of the image to be detected according to the first target detection result and the second target detection result.
In a second aspect, an embodiment of the present application further provides an object detection apparatus, where the apparatus includes:
the acquisition module is used for acquiring a plurality of different first feature vectors of the image to be detected and second feature vectors respectively matched with the first feature vectors; the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size in the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension size in the plurality of second feature vectors is smaller than a set threshold; the dimension size refers to the number of elements corresponding to the target dimension;
a first identification module, configured to perform, for each first feature vector, target detection on the to-be-detected image based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector, and a prior frame corresponding to the first target detection model, to obtain a first target detection result corresponding to the first feature vector, and,
the second identification module is used for carrying out target detection on the image to be detected based on a second target detection model corresponding to each second feature vector, an image cutting frame corresponding to the second feature vector and a prior frame corresponding to the second target detection model aiming at each second feature vector to obtain a second target detection result corresponding to the second feature vector;
and the third identification module is used for determining the target detection result of the image to be detected according to the first target detection result and the second target detection result.
In a third aspect, an embodiment of the present application further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
In the target detection method provided by the embodiments of the present application, after a plurality of different first feature vectors of an image to be detected and the second feature vectors matched with them are obtained, target detection is performed on the image to be detected using a first target detection model, the image cutting frame corresponding to the first feature vector and the prior boxes corresponding to the first target detection model, to obtain a first target detection result; and target detection is performed on the image to be detected using a second target detection model, the image cutting frame corresponding to the second feature vector and the prior boxes corresponding to the second target detection model, to obtain a second target detection result. Because the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to its matching second feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension size among the plurality of second feature vectors is smaller than the set threshold, if the center of a target falls on an edge position of the first grid, it falls on a position other than an edge position in the second grid, so even if the first target detection model misses the target, the second target detection model can detect it; conversely, if the center of a target falls on an edge position of the second grid, it falls on a position other than an edge position in the first grid, so even if the second target detection model misses the target, the first target detection model detects it. The first target detection result and the second target detection result thus form a complementary relationship, which improves the accuracy of the target detection result.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a flowchart illustrating a target detection method provided in an embodiment of the present application;
fig. 2 is a flowchart illustrating a specific method for obtaining a first feature vector in a target detection method provided in an embodiment of the present application;
fig. 3 is a structural diagram of a neural network for acquiring a first feature vector and a second feature vector in the target detection method provided in the embodiment of the present application;
fig. 4 is a structural diagram illustrating another neural network for acquiring a first feature vector and a second feature vector in the target detection method provided in the embodiment of the present application;
fig. 5 is a flowchart illustrating a specific method for obtaining a first target detection model and a second target detection model in the target detection method provided in the embodiment of the present application;
FIG. 6 is a schematic diagram of an object detection apparatus provided in an embodiment of the present application;
fig. 7 shows a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiments, the target detection method disclosed in the embodiments of the present application is first described in detail. The execution subject of the target detection method provided in the embodiments of the present application is generally a computer device with computing capability. In addition, the first target detection model and the second target detection model in the embodiments of the present application may be YOLO, or may be other types of target detection models. These models share the following characteristic: the image to be processed is cut with an image cutting frame to form a plurality of grids, and the position information of the target in the image to be detected is obtained by regression based on each grid and preset prior boxes.
Example one
Referring to fig. 1, which is a flowchart of a target detection method provided in an embodiment of the present application, the method includes steps S101 to S104, where:
s101: and acquiring a plurality of different first characteristic vectors of the image to be detected and second characteristic vectors respectively matched with the first characteristic vectors.
The first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference value between the third size corresponding to the first feature vector with the smallest dimension among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension among the plurality of second feature vectors is smaller than a set threshold value; the dimension size refers to the number of elements corresponding to the target dimension.
In a specific implementation, the plurality of different first feature vectors means that the first dimensions of the different first feature vectors are different in dimension size, and the second dimensions are also different in dimension size.
Illustratively, there are three first feature vectors, whose dimension sizes in the first dimension are 13, 26 and 52, and whose dimension sizes in the second dimension are 13, 26 and 52, respectively. The three first feature vectors are therefore: 13*13*75, 26*26*75 and 52*52*75.
Similarly, the different second feature vectors refer to different first dimensions and different second dimensions of the second feature vectors.
Illustratively, there are three second feature vectors, whose dimension sizes in the first dimension are 14, 28 and 56, and whose dimension sizes in the second dimension are 14, 28 and 56, respectively. The three second feature vectors are therefore: 14*14*75, 28*28*75 and 56*56*75.
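A small sketch of the example shapes above, checking the condition from the claims that the smallest dimension sizes of the two groups differ by less than a set threshold. The threshold value of 2 is an assumption for illustration only.

first_feature_vectors  = [(13, 13, 75), (26, 26, 75), (52, 52, 75)]
second_feature_vectors = [(14, 14, 75), (28, 28, 75), (56, 56, 75)]

# Difference between the smallest first-dimension sizes of the two groups
# (here m = 14 - 13 = 1), which must stay below the set threshold.
m = min(s[0] for s in second_feature_vectors) - min(s[0] for s in first_feature_vectors)
assert m < 2   # assumed threshold value for illustration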
The first feature vector and the second feature vector of the image to be detected can be obtained through the feature extraction network.
Specifically, the image to be detected can be represented as an image vector; if the image to be detected is a two-channel image, such as a binary image, the corresponding image vector is a two-dimensional vector; and if the image to be detected is a three-channel image, the corresponding image vector is a three-dimensional vector. The embodiments of the present application describe the target detection method by taking the image vector as a three-dimensional vector as an example. In the embodiments of the present application, the target dimension includes a first dimension and a second dimension.
The feature extraction network comprises a plurality of feature extraction layers. After the image vector of the image to be detected is input into the feature extraction network, if a feature extraction layer is the first feature extraction layer in the network, it performs a convolution operation on the image vector of the image to be detected to obtain the feature vector corresponding to that layer, and passes this feature vector to the next feature extraction layer.
If a feature extraction layer is not the first feature extraction layer in the feature extraction network, it performs a convolution operation on the feature vector passed to it by the previous feature extraction layer to obtain the feature vector corresponding to that layer; and if it is not the last feature extraction layer, it passes the obtained feature vector to the next feature extraction layer.
N feature extraction layers in the feature extraction network are taken as target feature extraction layers, and the feature vector corresponding to each target feature extraction layer is an intermediate feature vector. The number of target feature extraction layers equals the number of first feature vectors, each intermediate feature vector corresponds to one first feature vector, and the dimension sizes of the intermediate feature vectors in the first dimension and the second dimension decrease in the front-to-back order of the target feature extraction layers.
For example, if there are three target feature extraction layers, the corresponding intermediate feature vectors, in the order of the target feature extraction layers, are: 52*52*256, 26*26*512 and 13*13*512.
After intermediate feature vectors respectively corresponding to the target feature extraction layers are obtained from each target feature extraction layer, a first feature vector and a second feature vector corresponding to each intermediate feature vector are determined according to each intermediate feature vector.
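A hedged sketch of such a feature extraction network. The patent does not fix a specific backbone, so this is a minimal stand-in whose three stages play the role of the target feature extraction layers; the channel counts and input size are assumed example values chosen so the spatial sizes match the 52/26/13 example above.

import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Minimal stand-in for a feature extraction network with three target
    feature extraction layers producing intermediate feature vectors A3, A2, A1."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())    # /2
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())  # /4
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU()) # /8

    def forward(self, x):
        a3 = self.stage1(x)   # largest spatial size (52x52 for a 104x104 input)
        a2 = self.stage2(a3)  # 26x26
        a1 = self.stage3(a2)  # smallest spatial size (13x13), last target feature extraction layer
        return a1, a2, a3

backbone = TinyBackbone()
a1, a2, a3 = backbone(torch.randn(1, 3, 104, 104))
print(a1.shape, a2.shape, a3.shape)  # spatial sizes decrease from a3 to a1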
Here, a first feature vector and its matching second feature vector may be derived from the same intermediate feature vector. The reason is that the intermediate feature vectors carry the feature information of the image to be processed, and different intermediate feature vectors carry different feature information. Since the complementary relationship between the first target detection result and the second target detection result is built on a first feature vector and its matching second feature vector, deriving both from the same intermediate feature vector keeps the feature information they carry relatively close, which reduces the influence of such differences on the two detection results.
In addition, obtaining the first feature vector and its matching second feature vector from the same intermediate feature vector reduces the computation of the feature extraction layers, increases the operation speed, and reduces the difficulty of model training.
It is noted that the first feature vector and its matching second feature vector may be derived based on different intermediate feature vectors. At this time, two feature extraction networks may be provided, which extract an intermediate feature vector corresponding to the first feature vector and an intermediate feature vector corresponding to the second feature vector, respectively. Or the same feature extraction network is used for respectively extracting different intermediate feature vectors for the first feature vector and the second feature vector matched with the first feature vector; at this time, the first feature vector and the second feature vector matched with the first feature vector correspond to different target feature extraction layers respectively.
The present application is described only for the case where a first feature vector and its matching second feature vector are obtained from the same intermediate feature vector.
In order to obtain the first feature vector and the second feature vector from the same intermediate feature vector while keeping the computation simple, either the dimension sizes of the first and second dimensions of the intermediate feature vector are the same as those of the corresponding first feature vector, or the dimension sizes of the first and second dimensions of the intermediate feature vector are the same as those of the corresponding second feature vector.
For the case where the dimension sizes of the first and second dimensions of the intermediate feature vector are the same as those of the corresponding first feature vector, the first feature vector can be obtained in the following manner:
(1) For the intermediate feature vector A1 corresponding to the last target feature extraction layer M1, obtain the first feature vector B1 corresponding to this intermediate feature vector according to the intermediate feature vector A1.
Here, since the dimension sizes of the first and second dimensions of the intermediate feature vector are the same as those of the corresponding first feature vector, it is only necessary to perform a convolution operation, using at least one convolution layer, on the intermediate feature vector A1 corresponding to the last target feature extraction layer M1, thereby transforming the intermediate feature vector A1 and obtaining the first feature vector B1 corresponding to A1.
(2) For the intermediate feature vector Ai obtained by a target feature extraction layer Mi that is not the last layer, obtain the first feature vector Bi corresponding to Ai based on the intermediate feature vector Ai and the intermediate feature vectors A1~A(i-1) obtained by the target feature extraction layers M1~M(i-1) that follow Mi.
Here, the target feature extraction layer Mi refers to the i-th target feature extraction layer counted from the last target feature extraction layer of the feature extraction network, where i is an integer greater than 1.
Here, obtaining the first feature vector Bi corresponding to the intermediate feature vector Ai from the intermediate feature vector Ai of the target feature extraction layer Mi and the intermediate feature vectors A1~A(i-1) obtained by the subsequent target feature extraction layers M1~M(i-1) may adopt, but is not limited to, either of the following manners I and II.
I: When the first feature vector corresponding to the intermediate feature vector of the current target feature extraction layer (any target feature extraction layer other than the last one) is obtained based on that intermediate feature vector and the intermediate feature vectors obtained by the target feature extraction layers after it, a direct relationship is established between the intermediate feature vector of the current target feature extraction layer and the intermediate feature vectors obtained by the target feature extraction layers after it.
Illustratively, dimension conversion is performed, using different convolution layers, on the intermediate feature vectors A1~A(i-1) obtained by the target feature extraction layers M1~M(i-1) after the target feature extraction layer Mi, so that the converted feature vectors A'1~A'(i-1) have the same dimension sizes in the first dimension and the second dimension as the intermediate feature vector Ai obtained by the target feature extraction layer Mi; the intermediate feature vector Ai and the feature vectors A'1~A'(i-1) are then superimposed to obtain the first feature vector Bi corresponding to the intermediate feature vector Ai.
II: When the first feature vector corresponding to the intermediate feature vector of the current target feature extraction layer is obtained based on that intermediate feature vector and the intermediate feature vectors obtained by the target feature extraction layers after it, an indirect relationship is established between the intermediate feature vector of the current target feature extraction layer and the intermediate feature vectors of the target feature extraction layers that are not adjacent to it, and a direct relationship is established between the intermediate feature vector of the current target feature extraction layer and the intermediate feature vector of the adjacent next target feature extraction layer.
Illustratively, referring to fig. 2, an embodiment of the present application provides a neural network for obtaining each first feature vector, including: a feature extraction network and a dimension transformation network.
Assume that the target feature extraction layers in the feature extraction network include M1~M3, where M1~M3 have increasing dimension sizes in the first dimension and the second dimension. The intermediate feature vectors corresponding to the target feature extraction layers M1~M3 are A1~A3, respectively.
M1 is the last target feature extraction layer; the two convolution layers N11 and N12 perform convolution operations on it to obtain the first feature vector B1 corresponding to A1. The convolution layer N11 performs a convolution operation on the intermediate feature vector A1 to obtain a feature vector C1.
In order to obtain the first feature vector B2 corresponding to the intermediate feature vector A2, the feature vector C1 is first up-sampled to obtain the transformed feature vector C'1; the dimension sizes of the first and second dimensions of C'1 are then the same as those of the intermediate feature vector A2. The intermediate feature vector A2 and the transformed feature vector C'1 are superimposed (superposition layer P1) to obtain the feature vector D2, and the superimposed feature vector D2 is convolved by the two convolution layers N21 and N22 to obtain the first feature vector B2 corresponding to the intermediate feature vector A2. The convolution layer N21 performs a convolution operation on the feature vector D2 to obtain a feature vector C2.
In order to obtain the first feature vector B3 corresponding to the intermediate feature vector A3, the feature vector C2 is first up-sampled to obtain the transformed feature vector C'2; the dimension sizes of the first and second dimensions of C'2 are then the same as those of the intermediate feature vector A3. The intermediate feature vector A3 and the transformed feature vector C'2 are superimposed (superposition layer P2) to obtain the feature vector D3, and the superimposed feature vector D3 is convolved by the two convolution layers N31 and N32 to obtain the first feature vector B3 corresponding to the intermediate feature vector A3.
The obtained first feature vectors B1, B2 and B3 are sent to the three first target detection models Q1 to Q3, respectively, and the first target detection results corresponding to the first target detection models Q1 to Q3 are obtained, respectively.
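A hedged sketch of the dimension transformation network described around Fig. 2. The channel counts, kernel sizes, nearest-neighbour upsampling, and the use of channel concatenation for the superposition layers are assumptions (the patent only states that the vectors are converted and superimposed); it follows the TinyBackbone sketch above, where A1, A2, A3 have 256, 128 and 64 channels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DimensionTransform(nn.Module):
    """Produces B1, B2, B3 from the intermediate feature vectors A1 (smallest),
    A2 and A3, mirroring the N11/N12, P1, N21/N22, P2, N31/N32 structure."""
    def __init__(self, c1=256, c2=128, c3=64, out_c=75):
        super().__init__()
        self.n11 = nn.Conv2d(c1, 128, 1)
        self.n12 = nn.Conv2d(128, out_c, 1)
        self.n21 = nn.Conv2d(c2 + 128, 128, 1)
        self.n22 = nn.Conv2d(128, out_c, 1)
        self.n31 = nn.Conv2d(c3 + 128, 128, 1)
        self.n32 = nn.Conv2d(128, out_c, 1)

    def forward(self, a1, a2, a3):
        c1 = self.n11(a1)
        b1 = self.n12(c1)
        d2 = torch.cat([a2, F.interpolate(c1, scale_factor=2)], dim=1)  # superposition layer P1
        c2 = self.n21(d2)
        b2 = self.n22(c2)
        d3 = torch.cat([a3, F.interpolate(c2, scale_factor=2)], dim=1)  # superposition layer P2
        b3 = self.n32(self.n31(d3))
        return b1, b2, b3

transform = DimensionTransform()
b1, b2, b3 = transform(torch.randn(1, 256, 13, 13), torch.randn(1, 128, 26, 26), torch.randn(1, 64, 52, 52))
print(b1.shape, b2.shape, b3.shape)  # 13x13x75, 26x26x75, 52x52x75 feature vectors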
It should be noted here that feature extraction networks and dimension transformation networks of other structures may also be designed according to actual needs. For example, the number of convolution layers is increased, or the number of target extraction layers and the number of corresponding first feature vectors and first target detection models are increased.
For the case where the dimension sizes of the first and second dimensions of the intermediate feature vector are the same as those of the corresponding first feature vector, the second feature vector can be obtained in the following manners:
(1) The second feature vectors matched with the respective first feature vectors may be obtained in a manner similar to that used for the first feature vectors.
Illustratively, as can be seen in fig. 3 below, there is provided a neural network for obtaining a first feature vector and a second feature vector, comprising: a feature extraction network, a first dimension transformation network, and a second dimension transformation network.
Specifically, the process of obtaining the first eigenvector and the second eigenvector through the first dimension transformation network and the second dimension transformation network is similar to the embodiment corresponding to fig. 2, and is not repeated here.
(2) Because the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to its matching second feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size and the fourth size corresponding to the second feature vector with the smallest dimension size is smaller than the set threshold, in another case the second feature vector cannot be derived directly from the intermediate feature vector.
In order to obtain a second feature vector from the intermediate feature vector, it is necessary to perform a filling operation on each dimension of the intermediate feature vector to generate target intermediate feature vectors corresponding to each intermediate feature vector;
generating second feature vectors corresponding to the target intermediate feature vectors according to the target intermediate feature vectors;
the dimension sizes of the first dimension and the second dimension of the target intermediate feature vector and the corresponding second feature vector are the same.
For example, referring to fig. 4 below, another neural network for obtaining a first feature vector and a second feature vector is provided, which includes: the system comprises a feature extraction network, a first dimension transformation network, a second dimension transformation network and a dimension filling network.
The dimension filling network comprises a plurality of dimension filling modules for performing dimension filling. The dimension filling module has one-to-one correspondence with the intermediate feature vectors. After each intermediate feature vector enters a corresponding dimension filling module, completing dimension filling to obtain a target intermediate feature vector corresponding to each intermediate feature vector; and the target intermediate feature vectors enter a second dimension transformation network to obtain second feature vectors respectively corresponding to the second target detection networks.
Specifically, the filling dimension size can be obtained in the following manner.
Assume the dimension sizes of the intermediate feature vector are W × H, the dimensions of the convolution kernel are F1 × F2, the top filling dimension is UP, the bottom filling dimension is DP, the left filling dimension is LP, the right filling dimension is RP, and the moving step length of the convolution kernel is S. Then the dimension sizes Wnew × Hnew after convolution satisfy the following relational expressions:
Wnew = (W - F1 + LP + RP) / S
Hnew = (H - F2 + UP + DP) / S
Based on the two formulas, assume the dimension sizes of the three intermediate feature vectors obtained by the feature extraction network in the first and second dimensions are 13 × 13, 26 × 26 and 52 × 52, respectively, and the dimension sizes of the second feature vectors to be obtained in the first and second dimensions are 14 × 14, 28 × 28 and 56 × 56, respectively. Assuming the dimensions of the convolution kernel are 3 × 3 and the step length is 1, substituting these values into the above formulas gives the three filling dimension sizes:
LP+RP=UP+DP=4;
LP+RP=UP+DP=5;
LP+RP=UP+DP=7。
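A short check of the three filling totals, using the relation as reconstructed above (which matches the listed values) with the assumed 3 × 3 kernel and step length 1.

def total_padding(w, w_new, kernel=3, stride=1):
    # From Wnew = (W - F1 + LP + RP) / S, so LP + RP = Wnew * S - W + F1
    return w_new * stride - w + kernel

for w, w_new in [(13, 14), (26, 28), (52, 56)]:
    print(w, "->", w_new, ": LP + RP = UP + DP =", total_padding(w, w_new))
# 13 -> 14 : LP + RP = UP + DP = 4
# 26 -> 28 : LP + RP = UP + DP = 5
# 52 -> 56 : LP + RP = UP + DP = 7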
Following the foregoing S101, after the first feature vector and the second feature vector are obtained, the target detection method provided in the embodiments of the present application further includes:
s102: and for each first feature vector, performing target detection on the image to be detected based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector and a prior frame corresponding to the first target detection model to obtain a first target detection result corresponding to the first feature vector.
S103: and for each second feature vector, performing target detection on the image to be detected based on a second target detection model corresponding to the second feature vector, an image cutting frame corresponding to the second feature vector and a prior frame corresponding to the second target detection model to obtain a second target detection result corresponding to the second feature vector.
Wherein, the above S102 and S103 have no execution sequence; the execution may be synchronous or asynchronous, but S104 described below is executed after the execution of both S102 and S103 is completed.
Specifically, the present application takes YOLO as a first target detection model as an example, and describes the process of obtaining a first target detection result and a second target detection result respectively.
The process of obtaining the first target detection result includes:
a 1: through the above S101, three first feature vectors are obtained, which are: f11(feature map 1): 13 x 75; f2: 26 x 75; f3: 52*52*75.
It should be noted here that the dimension size of each first feature vector in each dimension may be adjusted according to actual needs, as long as the dimension sizes of the three first feature vectors in the first and second dimensions follow, from F11 to F13, the pattern n × n, 2n × 2n, 4n × 4n. The dimension sizes of the first feature vectors in the third dimension are the same.
b 1: for the first feature vector F11, perform: inputting the first feature vector F11 into a first target detection model corresponding to the first feature vector F11, wherein the first target detection model divides the image to be detected into a plurality of grids according to the first size of the corresponding image cutting frame, and the number of the grids is: 13 x 13; each grid is responsible for obtaining target information of a target center point falling on the grid through regression of a group of prior frames with large dimensions. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the first target detection result corresponding to the first feature vector F11.
c 1: for the first feature vector F12, perform: inputting the first feature vector F12 into a first target detection model corresponding to the first feature vector F12, wherein the first target detection model divides the image to be detected into a plurality of grids according to the first size of the corresponding image cutting frame, and the number of the grids is: 26 x 26; each grid is responsible for obtaining target information of a target center point falling on the grid through the prior frame regression of a group of dimension sizes. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the first target detection result corresponding to the first feature vector F12.
d 1: for the first feature vector F13, perform: inputting the first feature vector F13 into a first target detection model corresponding to the first feature vector F13, wherein the first target detection model divides the image to be detected into a plurality of grids according to the first size of the corresponding image cutting frame, and the number of the grids is: 52 x 52; each grid is responsible for obtaining target information of a target center point falling on the grid through regression of a set of priori frames with small dimensions. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the first target detection result corresponding to the first feature vector F13.
Wherein, the b 1-d 1 do not have a sequential execution order.
In the above b1 to d1, the large dimension, the medium dimension, and the small dimension are relative to each other. In a specific implementation process, the dimension size of the image cutting frame is not preset, but the number of the image to be detected to be cut into a plurality of grids according to the image cutting frame is preset. The number of meshes is consistent with the product of the dimension size values of the first dimension and the second dimension of the corresponding first feature vector. And during segmentation, segmentation is carried out according to the dimension size numerical values of the first dimension and the second dimension corresponding to the first feature vector. For example, if the dimension of the first feature vector is 13 × 75, the corresponding grid number is 13 × 13; if the dimension of a feature vector is 17 × 90, the corresponding number of meshes is 17 × 17. The smaller the number of the corresponding grids is, the larger the area corresponding to the grids is, so that the first target detection result can be obtained by the prior frame regression with the larger dimension size. The following cases b 2-d 2 are similar and thus will not be described again.
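A brief sketch of the relationship just described: the grid count follows from a first feature vector's first and second dimension sizes, and coarser grids are paired with larger prior boxes. The prior-box (width, height) values here are assumed example values; the patent does not fix concrete prior-box sizes.

feature_vector_shapes = [(13, 13, 75), (26, 26, 75), (52, 52, 75)]
prior_box_groups = [
    [(116, 90), (156, 198), (373, 326)],   # for the 13x13 grid (large targets)
    [(30, 61), (62, 45), (59, 119)],       # for the 26x26 grid (medium targets)
    [(10, 13), (16, 30), (33, 23)],        # for the 52x52 grid (small targets)
]
for (h, w, _), priors in zip(feature_vector_shapes, prior_box_groups):
    print(f"{h}x{w} grid -> {h * w} cells, prior boxes: {priors}")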
The process of obtaining the second target detection result includes:
a 2: through the above S101, three first feature vectors are obtained, which are: f21(feature map 1): 14 x 75; f2: 28 x 75; f3: 56*56*75.
It should be noted here that the dimension size of each second feature vector in each dimension may also be adjusted according to actual needs, as long as the dimension sizes of the three second feature vectors in the first and second dimensions follow, from F21 to F23, the pattern (n + m) × (n + m), 2(n + m) × 2(n + m), 4(n + m) × 4(n + m). The dimension sizes of the second feature vectors in the third dimension are the same. Here m is smaller than a preset threshold value.
b 2: for the second feature vector F21, perform: inputting the second feature vector F21 into a second target detection model corresponding to the second feature vector F21, wherein the second target detection model divides the image to be detected into a plurality of grids according to the second size of the corresponding image cutting frame, and the number of the grids is: 14 x 14; each grid is responsible for obtaining target information of a target center point falling on the grid through regression of a group of prior frames with large dimensions. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the second target detection result corresponding to the second feature vector F21.
c 2: for the second feature vector F22, perform: inputting the second feature vector F22 into a second target detection model corresponding to the second feature vector F22, wherein the second target detection model divides the image to be detected into a plurality of grids according to the second size of the corresponding image cutting frame, and the number of the grids is: 28 x 28; each grid is responsible for obtaining target information of a target center point falling on the grid through the prior frame regression of a group of dimension sizes. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the second target detection result corresponding to the second feature vector F22.
d 2: for the second feature vector F23, perform: inputting the second feature vector F23 into a second target detection model corresponding to the second feature vector F3, wherein the second target detection model divides the image to be detected into a plurality of grids according to the second size of the corresponding image cutting frame, and the number of the grids is: 56 x 56; each grid is responsible for obtaining target information of a target center point falling on the grid through regression of a set of priori frames with small dimensions. The regression result includes the position coordinates of the target in the image to be detected and the classification of the target, i.e., the second target detection result corresponding to the second feature vector F3.
The b 2-d 2 do not have a sequential execution order.
Here, it should be noted that m is the difference between the size corresponding to the first feature vector with the smallest dimension size and the size corresponding to the second feature vector with the smallest dimension size; this difference may be set according to actual needs and, besides 1, may be set to 0.5, 1.6, 2, 2.3, or the like.
In a specific implementation, the prior boxes corresponding to a first feature vector and its matching second feature vector are the same. The prior boxes corresponding to the first feature vector and the matching second feature vector may also be set to be different; however, since the difference between the third size corresponding to the first feature vector with the smallest dimension size and the fourth size corresponding to the second feature vector with the smallest dimension size is smaller than the set threshold, the grid dimension sizes corresponding to the first feature vector and its matching second feature vector are relatively close, and so the corresponding prior boxes may also be set to be close.
Following the above S102 and S103, after the first target detection result and the second target detection result are obtained, the target detection method provided in the embodiments of the present application further includes:
s104: and determining a target detection result of the image to be detected according to the first target detection result and the second target detection result.
In a specific implementation, duplicate checking may be performed on the first target detection result and the second target detection result to obtain correct, non-duplicated target detection results for all targets in the image to be detected. The target detection result includes: the position coordinates of the target in the image to be detected and the name or category of the target.
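A hedged sketch of one plausible duplicate-checking step for merging the two sets of detections; the patent only states that duplicates are removed, so IoU-based suppression and its threshold are assumed concrete choices, not the patent's prescribed method.

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_detections(first_results, second_results, iou_threshold=0.5):
    """Keep every detection from the first results, and add a second-model detection
    only if it does not heavily overlap a same-class detection already kept."""
    merged = list(first_results)
    for box, cls in second_results:
        if all(c != cls or iou(box, b) < iou_threshold for b, c in merged):
            merged.append((box, cls))
    return merged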
In the target detection method provided by the embodiments of the present application, after a plurality of different first feature vectors of an image to be detected and the second feature vectors matched with them are obtained, target detection is performed on the image to be detected using a first target detection model, the image cutting frame corresponding to the first feature vector and the prior boxes corresponding to the first target detection model, to obtain a first target detection result; and target detection is performed on the image to be detected using a second target detection model, the image cutting frame corresponding to the second feature vector and the prior boxes corresponding to the second target detection model, to obtain a second target detection result. Because the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to its matching second feature vector, and the difference between the third size corresponding to the first feature vector with the smallest dimension size among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension size among the plurality of second feature vectors is smaller than the set threshold, if the center of a target falls on an edge position of the first grid, it falls on a position other than an edge position in the second grid, so even if the first target detection model misses the target, the second target detection model can detect it; conversely, if the center of a target falls on an edge position of the second grid, it falls on a position other than an edge position in the first grid, so even if the second target detection model misses the target, the first target detection model detects it. The first target detection result and the second target detection result thus form a complementary relationship, which improves the accuracy of the target detection result.
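A numeric illustration of the complementarity argument above, under an assumed image width of 416 pixels: an x-coordinate lying exactly on a boundary of a 13 × 13 grid falls well inside a 14 × 14 grid cell.

image_size = 416
x = 416 / 13 * 4          # exactly on a boundary of the 13-cell grid (128.0)
for n in (13, 14):
    cell = image_size / n
    offset = (x % cell) / cell   # relative position inside the responsible cell
    print(f"{n}x{n} grid: offset within cell = {offset:.2f}")
# 13x13 grid: offset within cell = 0.00  (on the edge)
# 14x14 grid: offset within cell = 0.31  (interior)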
Example two
Referring to fig. 5, an embodiment of the present application further provides a specific method for obtaining a first target detection model and a second target detection model, where the method includes:
s501: and acquiring a plurality of sample images and target marking information corresponding to each sample image.
The target labeling information is the manually annotated position of each target in the sample image and the name or classification corresponding to the target.
S502: aiming at each sample image, acquiring a plurality of different first sample characteristic vectors of the sample image and second sample characteristic vectors respectively matched with the first sample characteristic vectors; the first sample size of the image cutting frame corresponding to the first sample feature vector is different from the second sample size of the image cutting frame corresponding to the second sample feature vector matched with the first sample feature vector, and the difference value between the third sample size corresponding to the first sample feature vector with the smallest dimension among the plurality of first sample feature vectors and the fourth sample size corresponding to the second sample feature vector with the smallest dimension among the plurality of second sample feature vectors is smaller than the set threshold value.
Here, the first sample feature vectors and the second sample feature vectors respectively matching the respective first sample feature vectors may be obtained in the following manner:
inputting the sample image into a basic feature extraction network, and acquiring the sample intermediate feature vector corresponding to each target feature extraction layer from a plurality of target feature extraction layers in the basic feature extraction network; the dimension sizes of the sample intermediate feature vectors in the first and second dimensions decrease in the front-to-back order of the target feature extraction layers;
determining a first sample feature vector and a second sample feature vector corresponding to each sample intermediate feature vector according to each sample intermediate feature vector; the first sample feature vector and the matched second sample feature vector correspond to the same target feature extraction layer;
the dimension sizes of the sample intermediate feature vector and the first dimension and the second dimension corresponding to the first sample feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding second sample feature vector are the same.
A more detailed obtaining manner of the first sample feature vector and the second sample feature vector is similar to the specific obtaining manner of the first feature vector and the second feature vector, please refer to the description of the first embodiment specifically, and will not be described again here.
S503: and for each first sample feature vector, performing target detection on the sample image based on a first basic detection model corresponding to the first sample feature vector, an image cutting frame corresponding to the first sample feature vector and a prior frame corresponding to the first basic detection model to obtain a first sample target detection result corresponding to the first sample feature vector.
Here, the specific obtaining manner of the first sample target detection result is similar to the obtaining manner of the first target detection result; please refer to the description of the first embodiment, which is not repeated here.
S504: and for each second sample feature vector, performing target detection on the sample image based on a second basic detection model corresponding to the second sample feature vector, an image cutting frame corresponding to the second sample feature vector and a prior frame corresponding to the second basic detection model to obtain a second sample target detection result corresponding to the second sample feature vector.
Here, the specific obtaining manner of the second sample target detection result is similar to the obtaining manner of the second target detection result, and please refer to the description of the first embodiment, which is not repeated herein.
Wherein, the above S503 and S504 have no execution sequence.
S505: and determining a sample target detection result of the sample image according to the first sample target detection result and the second sample target detection result.
Here, the specific obtaining manner of the sample target detection result is similar to the obtaining manner of the target detection result, and please refer to the description of the first embodiment, which is not repeated herein.
S506: and training the first basic detection model and the second basic detection model according to the sample target detection result and the target labeling information corresponding to each sample image to obtain the first target detection model and the second target detection model.
In a specific implementation, training the first basic detection model and the second basic detection model according to the sample target detection result and the target labeling information corresponding to each sample image means adjusting the parameters of the first basic detection model and the second basic detection model according to the error between the sample target detection result and the target labeling information corresponding to each sample image. The adjustment process makes this error smaller and smaller in the overall trend, that is, it makes the sample target detection result corresponding to each sample image more and more consistent with the target labeling information.
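As a sketch of the parameter adjustment just described (Python/PyTorch assumed), one training step could look as follows; the model objects, the optimizer, `detection_loss` and the data are placeholders not defined by the patent, and `merge_detections` refers to the sketch shown after S505.

```python
# Sketch of one S503-S506 training step: detect with both basic detection
# models, fuse the results, measure the error against the target labeling
# information, and adjust the parameters of both models by gradient descent.
import torch

def train_step(first_basic_model, second_basic_model, optimizer,
               sample_image, target_labels, detection_loss):
    first_result = first_basic_model(sample_image)                  # S503
    second_result = second_basic_model(sample_image)                # S504
    sample_result = merge_detections(first_result, second_result)   # S505 (sketch above)
    loss = detection_loss(sample_result, target_labels)             # error vs. labeling info
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # S506: the error shrinks in the overall trend over steps
    return loss.item()
```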
In addition, this embodiment further includes: training the basic feature extraction network according to the sample target detection result and the target labeling information corresponding to each sample image to obtain the feature extraction network.
Here, the first target detection model, the second target detection model, and the feature extraction network are obtained by synchronous training.
In addition, this embodiment further includes: training a basic dimension transformation network according to the sample target detection result and the target labeling information corresponding to each sample image to obtain the dimension transformation network described in the first embodiment.
Through the model training process, a first target detection model, a second target detection model, a feature extraction network and a dimension transformation network in the embodiment of the application are obtained.
In addition, the dimension transformation network and the feature extraction network may also be obtained by training in advance.
The first target detection model, the second target detection model, the feature extraction network and the dimension transformation network form a target detection model, so that the precision of the target detection result can be improved.
Based on the same inventive concept, an embodiment of the present application further provides a target detection apparatus corresponding to the target detection method. Since the principle by which the apparatus solves the problem is similar to that of the target detection method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
EXAMPLE III
Referring to fig. 6, which is a schematic view of a target detection apparatus provided in the third embodiment of the present application, the apparatus includes: an acquisition module 61, a first identification module 62, a second identification module 63 and a third identification module 64; wherein,
the obtaining module 61 is configured to obtain a plurality of different first feature vectors of an image to be detected, and second feature vectors respectively matched with the first feature vectors; the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference value between the third size corresponding to the first feature vector with the smallest dimension among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension among the plurality of second feature vectors is smaller than a set threshold value; the dimension size refers to the number of elements corresponding to the target dimension;
a first identification module 62, configured to, for each first feature vector, perform target detection on the to-be-detected image based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector, and a prior frame corresponding to the first target detection model, to obtain a first target detection result corresponding to the first feature vector, and
a second identification module 63, configured to perform, for each second feature vector, target detection on the image to be detected based on a second target detection model corresponding to the second feature vector, an image cutting frame corresponding to the second feature vector, and a prior frame corresponding to the second target detection model, so as to obtain a second target detection result corresponding to the second feature vector;
and a third identification module 64, configured to determine a target detection result of the to-be-detected image according to the first target detection result and the second target detection result.
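Purely to make the module split of fig. 6 concrete, the sketch below mirrors the four modules as a small Python class; the constructor arguments stand in for the modules' internals, which are not specified here, and all names are illustrative assumptions.

```python
# Illustrative sketch of the apparatus in fig. 6: the acquisition module feeds
# the two identification modules, whose outputs the third module combines.
class TargetDetectionApparatus:
    def __init__(self, acquisition, first_identification,
                 second_identification, third_identification):
        self.acquisition = acquisition                        # module 61
        self.first_identification = first_identification      # module 62
        self.second_identification = second_identification    # module 63
        self.third_identification = third_identification      # module 64

    def detect(self, image_to_be_detected):
        first_vectors, second_vectors = self.acquisition(image_to_be_detected)
        first_results = [self.first_identification(v) for v in first_vectors]
        second_results = [self.second_identification(v) for v in second_vectors]
        return self.third_identification(first_results, second_results)
```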
In an alternative embodiment, the first feature vector and the matching second feature vector correspond to the same prior frame.
In an alternative embodiment, the target dimension includes a first dimension and a second dimension; the obtaining module 61 is configured to obtain a plurality of different first feature vectors of the image to be detected and second feature vectors respectively matched with the first feature vectors in the following manner:
inputting the image to be detected into a pre-trained feature extraction network, and acquiring intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the feature extraction network; the dimension sizes of the intermediate feature vectors in the first dimension and the second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first feature vector and the second feature vector corresponding to each intermediate feature vector according to each intermediate feature vector; the first feature vector and the matched second feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding second feature vector are the same.
In an optional implementation manner, for the case where the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same, the obtaining module 61 is configured to determine the first feature vector corresponding to each intermediate feature vector in the following manner:
aiming at the intermediate feature vector corresponding to the last layer of the target feature extraction layer, acquiring a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector;
and aiming at the intermediate feature vector obtained by the non-last layer of the target feature extraction layer, obtaining a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector and the intermediate feature vectors obtained by other target feature extraction layers behind the non-last layer of the target feature extraction layer.
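As an illustration of the two cases above (the last layer used directly, earlier layers combined with the deeper layers behind them), a minimal PyTorch sketch follows. Fusing by nearest-neighbour upsampling plus channel concatenation is an assumption of the example; the description only requires that deeper intermediate feature vectors contribute.

```python
# Sketch: build the first feature vectors so each keeps the spatial size of its
# own intermediate feature vector, with deeper layers folded in top-down.
import torch
import torch.nn.functional as F

def build_first_feature_vectors(intermediate):
    """`intermediate`: feature maps ordered front-to-back (largest spatial size first)."""
    first_vectors = [None] * len(intermediate)
    first_vectors[-1] = intermediate[-1]                # last layer: used as-is
    for i in range(len(intermediate) - 2, -1, -1):      # walk backwards through layers
        deeper = first_vectors[i + 1]
        upsampled = F.interpolate(deeper, size=intermediate[i].shape[-2:],
                                  mode="nearest")
        # same first/second dimension sizes as intermediate[i]
        first_vectors[i] = torch.cat([intermediate[i], upsampled], dim=1)
    return first_vectors
```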
In an optional implementation manner, for the case where the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same, the obtaining module 61 is configured to obtain the second feature vector corresponding to each intermediate feature vector in the following manner:
padding each intermediate feature vector to generate target intermediate feature vectors respectively corresponding to the intermediate feature vectors;
generating a second feature vector corresponding to each target intermediate feature vector according to each target intermediate feature vector;
the dimension sizes of the first dimension and the second dimension of the target intermediate feature vector and the corresponding second feature vector are the same.
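A short sketch of the padding step follows, under the assumption of zero padding of one cell on each side; the amount and kind of padding are illustrative only and are not fixed by this description.

```python
# Sketch: pad each intermediate feature vector in the first and second
# dimensions to obtain the target intermediate feature vectors; the second
# feature vectors then share the padded spatial size, which shifts the cut
# grid relative to the first branch and helps cover grid-edge regions.
import torch.nn.functional as F

def pad_intermediate(intermediate, pad=1):
    # pad tuple is (left, right, top, bottom) for the last two dimensions
    return [F.pad(f, (pad, pad, pad, pad)) for f in intermediate]
```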
In an optional implementation, the first target detection model and the second target detection model are both YOLO target detection models.
In an alternative embodiment, the apparatus further includes: a model training module 65, configured to obtain the first target detection model and the second target detection model in the following manner:
acquiring a plurality of sample images and target labeling information corresponding to each sample image;
aiming at each sample image, acquiring a plurality of different first sample characteristic vectors of the sample image and second sample characteristic vectors respectively matched with the first sample characteristic vectors; the first sample size of the image cutting frame corresponding to the first sample feature vector is different from the second sample size of the image cutting frame corresponding to the second sample feature vector matched with the first sample feature vector, and the difference value between the third sample size corresponding to the first sample feature vector with the smallest dimension among the plurality of first sample feature vectors and the fourth sample size corresponding to the second sample feature vector with the smallest dimension among the plurality of second sample feature vectors is smaller than the set threshold value;
for each first sample feature vector, performing target detection on the sample image based on a first basic detection model corresponding to the first sample feature vector, an image cutting frame corresponding to the first sample feature vector and a prior frame corresponding to the first basic detection model to obtain a first sample target detection result corresponding to the first sample feature vector, and
for each second sample feature vector, performing target detection on the sample image based on a second basic detection model corresponding to the second sample feature vector, an image cutting frame corresponding to the second sample feature vector and a prior frame corresponding to the second basic detection model to obtain a second sample target detection result corresponding to the second sample feature vector;
determining a sample target detection result of the sample image according to the first sample target detection result and the second sample target detection result;
and training the first basic detection model and the second basic detection model according to a sample target detection result and target labeling information corresponding to each sample image to obtain a first target detection model and a second target detection model.
In an alternative embodiment, the model training module 65 is configured to obtain, for each sample image, a plurality of different first sample feature vectors of the sample image, and second sample feature vectors respectively matched with the first sample feature vectors in the following manner:
inputting the sample image into a basic feature extraction network, and acquiring sample intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the basic feature extraction network; dimension sizes of the sample intermediate feature vectors in a first dimension and a second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first sample feature vector and the second sample feature vector corresponding to each sample intermediate feature vector according to each sample intermediate feature vector; the first sample feature vector and the matched second sample feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding first sample feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding second sample feature vector are the same.
In an alternative embodiment, the model training module 65 is further configured to: train the basic feature extraction network according to a sample target detection result and target labeling information corresponding to each sample image to obtain the feature extraction network.
EXAMPLE IV
An embodiment of the present application further provides a computer device 700. As shown in fig. 7, which is a schematic structural diagram of the computer device 700 provided in the embodiment of the present application, the computer device 700 includes:
a processor 71, a memory 72, and a bus 73. The memory 72 is used for storing execution instructions and includes a memory 721 and an external memory 722. The memory 721, also referred to as an internal memory, is used for temporarily storing operation data in the processor 71 and data exchanged with the external memory 722 such as a hard disk; the processor 71 exchanges data with the external memory 722 through the memory 721. When the computer device 700 runs, the processor 71 communicates with the memory 72 through the bus 73, so that the processor 71 executes the following instructions in a user mode:
acquiring a plurality of different first feature vectors of an image to be detected and second feature vectors respectively matched with the first feature vectors; the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference value between the third size corresponding to the first feature vector with the smallest dimension among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension among the plurality of second feature vectors is smaller than a set threshold value; the dimension size refers to the number of elements corresponding to the target dimension;
for each first feature vector, performing target detection on the image to be detected based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector and a prior frame corresponding to the first target detection model to obtain a first target detection result corresponding to the first feature vector, and
for each second feature vector, performing target detection on the image to be detected based on a second target detection model corresponding to the second feature vector, an image cutting frame corresponding to the second feature vector and a prior frame corresponding to the second target detection model to obtain a second target detection result corresponding to the second feature vector;
and determining the target detection result of the image to be detected according to the first target detection result and the second target detection result.
In one possible implementation, the processor 71 executes an instruction in which the first feature vector and the matching second feature vector correspond to the same prior box.
In one possible embodiment, processor 71 executes instructions in which the target dimension includes a first dimension and a second dimension; acquiring a plurality of different first feature vectors of the image to be detected, and second feature vectors respectively matched with the first feature vectors, includes:
inputting the image to be detected into a pre-trained feature extraction network, and acquiring intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the feature extraction network; the dimension sizes of the intermediate feature vectors in the first dimension and the second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first feature vector and the second feature vector corresponding to each intermediate feature vector according to each intermediate feature vector; the first feature vector and the matched second feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding second feature vector are the same.
In one possible embodiment, the instruction executed by the processor 71 to determine, according to each intermediate feature vector, the first feature vector corresponding to each intermediate feature vector when the intermediate feature vector and the corresponding first feature vector have the same dimension size in both the first dimension and the second dimension, includes:
aiming at the intermediate feature vector corresponding to the last layer of the target feature extraction layer, acquiring a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector;
and aiming at the intermediate feature vector obtained by the non-last layer of the target feature extraction layer, obtaining a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector and the intermediate feature vectors obtained by other target feature extraction layers behind the non-last layer of the target feature extraction layer.
In one possible embodiment, the instruction executed by the processor 71 to obtain, according to each intermediate feature vector, a second feature vector corresponding to each intermediate feature vector when the intermediate feature vector and the corresponding first feature vector have the same dimension size in both the first dimension and the second dimension includes:
padding each intermediate feature vector to generate target intermediate feature vectors respectively corresponding to the intermediate feature vectors;
generating a second feature vector corresponding to each target intermediate feature vector according to each target intermediate feature vector;
the dimension sizes of the first dimension and the second dimension of the target intermediate feature vector and the corresponding second feature vector are the same.
In one possible implementation, processor 71 executes instructions in which the first target detection model and the second target detection model are both YOLO target detection models.
In one possible embodiment, the instructions executed by the processor 71 obtain the first target detection model and the second target detection model in the following manner:
acquiring a plurality of sample images and target labeling information corresponding to each sample image;
aiming at each sample image, acquiring a plurality of different first sample feature vectors of the sample image and second sample feature vectors respectively matched with the first sample feature vectors; the first sample size of the image cutting frame corresponding to the first sample feature vector is different from the second sample size of the image cutting frame corresponding to the second sample feature vector matched with the first sample feature vector, and the difference value between the third sample size corresponding to the first sample feature vector with the smallest dimension and the fourth sample size corresponding to the second sample feature vector with the smallest dimension is smaller than the set threshold value;
for each first sample feature vector, performing target detection on the sample image based on a first basic detection model corresponding to the first sample feature vector, an image cutting frame corresponding to the first sample feature vector and a prior frame corresponding to the first basic detection model to obtain a first sample target detection result corresponding to the first sample feature vector, and
for each second sample feature vector, performing target detection on the sample image based on a second basic detection model corresponding to the second sample feature vector, an image cutting frame corresponding to the second sample feature vector and a prior frame corresponding to the second basic detection model to obtain a second sample target detection result corresponding to the second sample feature vector;
determining a sample target detection result of the sample image according to the first sample target detection result and the second sample target detection result;
and training the first basic detection model and the second basic detection model according to a sample target detection result and target labeling information corresponding to each sample image to obtain a first target detection model and a second target detection model.
In one possible embodiment, the instructions executed by the processor 71 to obtain, for each sample image, a plurality of different first sample feature vectors of the sample image and second sample feature vectors respectively matched with the first sample feature vectors include:
inputting the sample image into a basic feature extraction network, and acquiring sample intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the basic feature extraction network; dimension sizes of the sample intermediate feature vectors in a first dimension and a second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first sample feature vector and the second sample feature vector corresponding to each sample intermediate feature vector according to each sample intermediate feature vector; the first sample feature vector and the matched second sample feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding first sample feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding second sample feature vector are the same.
In a possible implementation, the instructions executed by the processor 71 further include: training the basic feature extraction network according to a sample target detection result and target labeling information corresponding to each sample image to obtain the feature extraction network.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the target detection method are performed.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above target detection method can be executed, so that the problem of a large error in the target detection result of the image to be detected caused by missed detection at grid edge portions can be reduced, and the target detection accuracy can be improved.
The computer program product of the target detection method provided in the embodiment of the present application includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative; for example, the division of the units is only one logical division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.

In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes or equivalent substitutions of some technical features, within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of object detection, comprising:
acquiring a plurality of different first feature vectors of an image to be detected and second feature vectors respectively matched with the first feature vectors; the first size of the image cutting frame corresponding to the first feature vector is different from the second size of the image cutting frame corresponding to the second feature vector matched with the first feature vector, and the difference value between the third size corresponding to the first feature vector with the smallest dimension among the plurality of first feature vectors and the fourth size corresponding to the second feature vector with the smallest dimension among the plurality of second feature vectors is smaller than a set threshold value; the dimension size refers to the number of elements corresponding to the target dimension;
for each first feature vector, performing target detection on the image to be detected based on a first target detection model corresponding to the first feature vector, an image cutting frame corresponding to the first feature vector and a prior frame corresponding to the first target detection model to obtain a first target detection result corresponding to the first feature vector, and
for each second feature vector, performing target detection on the image to be detected based on a second target detection model corresponding to the second feature vector, an image cutting frame corresponding to the second feature vector and a prior frame corresponding to the second target detection model to obtain a second target detection result corresponding to the second feature vector;
and determining the target detection result of the image to be detected according to the first target detection result and the second target detection result.
2. The method of claim 1, wherein the first feature vector corresponds to the same prior frame as the matched second feature vector.
3. The object detection method of claim 1, wherein the target dimension includes a first dimension and a second dimension; the acquiring a plurality of different first feature vectors of the image to be detected and second feature vectors respectively matched with the first feature vectors comprises:
inputting the image to be detected into a pre-trained feature extraction network, and acquiring intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the feature extraction network; the dimension sizes of the intermediate feature vectors in the first dimension and the second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first feature vector corresponding to each intermediate feature vector and a matched second feature vector according to each intermediate feature vector; the first feature vector and the matched second feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding second feature vector are the same.
4. The object detection method according to claim 3, wherein, for the case where the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same, determining the first feature vector corresponding to each intermediate feature vector according to each intermediate feature vector comprises:
aiming at the intermediate feature vector corresponding to the last layer of the target feature extraction layer, acquiring a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector;
and aiming at the intermediate feature vector obtained by the non-last layer of the target feature extraction layer, obtaining a first feature vector corresponding to the intermediate feature vector according to the intermediate feature vector and the intermediate feature vectors obtained by other target feature extraction layers behind the non-last layer of the target feature extraction layer.
5. The object detection method according to claim 3, wherein, for the case where the dimension sizes of the first dimension and the second dimension of the intermediate feature vector and the corresponding first feature vector are the same, obtaining a second feature vector corresponding to each intermediate feature vector according to each intermediate feature vector comprises:
padding each intermediate feature vector to generate target intermediate feature vectors respectively corresponding to the intermediate feature vectors;
generating a second feature vector corresponding to each target intermediate feature vector according to each target intermediate feature vector;
the dimension sizes of the first dimension and the second dimension of the target intermediate feature vector and the corresponding second feature vector are the same.
6. The object detection method of claim 1, wherein the first object detection model and the second object detection model are both YOLO object detection models.
7. The object detection method according to claim 1, wherein the first object detection model and the second object detection model are obtained by:
acquiring a plurality of sample images and target labeling information corresponding to each sample image;
aiming at each sample image, acquiring a plurality of different first sample characteristic vectors of the sample image and second sample characteristic vectors respectively matched with the first sample characteristic vectors; the first sample size of the image cutting frame corresponding to the first sample feature vector is different from the second sample size of the image cutting frame corresponding to the second sample feature vector matched with the first sample feature vector, and the difference value between the third sample size corresponding to the first sample feature vector with the smallest dimension among the plurality of first sample feature vectors and the fourth sample size corresponding to the second sample feature vector with the smallest dimension among the plurality of second sample feature vectors is smaller than the set threshold value;
for each first sample feature vector, performing target detection on the sample image based on a first basic detection model corresponding to the first sample feature vector, an image cutting frame corresponding to the first sample feature vector and a prior frame corresponding to the first basic detection model to obtain a first sample target detection result corresponding to the first sample feature vector, and
for each second sample feature vector, performing target detection on the sample image based on a second basic detection model corresponding to the second sample feature vector, an image cutting frame corresponding to the second sample feature vector and a prior frame corresponding to the second basic detection model to obtain a second sample target detection result corresponding to the second sample feature vector;
determining a sample target detection result of the sample image according to the first sample target detection result and the second sample target detection result;
and training the first basic detection model and the second basic detection model according to a sample target detection result and target labeling information corresponding to each sample image to obtain a first target detection model and a second target detection model.
8. The object detection method of claim 7, wherein the target dimension includes a first dimension and a second dimension; the acquiring, for each sample image, a plurality of different first sample feature vectors of the sample image and second sample feature vectors respectively matched with the respective first sample feature vectors includes:
inputting the sample image into a basic feature extraction network, and acquiring sample intermediate feature vectors corresponding to the target feature extraction layers from a plurality of target feature extraction layers in the basic feature extraction network; dimension sizes of the sample intermediate feature vectors in the first dimension and the second dimension are decreased in a front-back sequence of the target feature extraction layer;
determining the first sample feature vector corresponding to each sample intermediate feature vector and a matched second sample feature vector according to each sample intermediate feature vector; the first sample feature vector and the matched second sample feature vector correspond to the same target feature extraction layer;
the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding first sample feature vector are the same; or the dimension sizes of the first dimension and the second dimension of the sample intermediate feature vector and the corresponding second sample feature vector are the same.
9. The object detection method of claim 8, further comprising:
training the basic feature extraction network according to a sample target detection result and target labeling information corresponding to each sample image to obtain the feature extraction network.
10. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when a computer device is running, the machine-readable instructions when executed by the processor performing the steps of the object detection method of any one of claims 1 to 9.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the object detection method as claimed in any one of the claims 1 to 9.
CN201910209571.9A 2019-03-19 2019-03-19 Target detection method and device Active CN109978043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910209571.9A CN109978043B (en) 2019-03-19 2019-03-19 Target detection method and device

Publications (2)

Publication Number Publication Date
CN109978043A CN109978043A (en) 2019-07-05
CN109978043B true CN109978043B (en) 2020-11-10

Family

ID=67079564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910209571.9A Active CN109978043B (en) 2019-03-19 2019-03-19 Target detection method and device

Country Status (1)

Country Link
CN (1) CN109978043B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609794B (en) * 2019-09-12 2023-04-28 中国联合网络通信集团有限公司 Page detection method and device
CN112989992B (en) * 2021-03-09 2023-12-15 阿波罗智联(北京)科技有限公司 Target detection method and device, road side equipment and cloud control platform

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107529665A (en) * 2017-07-06 2018-01-02 新华三技术有限公司 Car tracing method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN106971178A (en) * 2017-05-11 2017-07-21 北京旷视科技有限公司 Pedestrian detection and the method and device recognized again
CN108875763A (en) * 2017-05-17 2018-11-23 北京旷视科技有限公司 Object detection method and object detecting device
CN107563392A (en) * 2017-09-07 2018-01-09 西安电子科技大学 The YOLO object detection methods accelerated using OpenCL
CN108875600A (en) * 2018-05-31 2018-11-23 银江股份有限公司 A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN109325504A (en) * 2018-09-07 2019-02-12 中国农业大学 A kind of underwater sea cucumber recognition methods and system


Also Published As

Publication number Publication date
CN109978043A (en) 2019-07-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant