CN111767935A - Target detection method and device and electronic equipment - Google Patents


Info

Publication number: CN111767935A (granted as CN111767935B)
Authority: CN (China)
Application number: CN201911056322.7A
Other languages: Chinese (zh)
Inventors: 张凯, 谭文明, 李哲暘, 石大虎
Original and current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority claimed from application CN201911056322.7A
Legal status: Granted; active
(Legal status, assignee list, and priority date are listed by Google Patents as assumptions, not legal conclusions; Google has not performed a legal analysis and makes no representation as to their accuracy.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a target detection method, a target detection device, and an electronic device. The method comprises the following steps: acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications; for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature; fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification; scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications; and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications. The accuracy of target detection can thereby be improved.

Description

Target detection method and device and electronic equipment
Technical Field
The present invention relates to the field of machine vision, and in particular, to a target detection method, device and electronic device.
Background
Based on machine vision technology, the computer can automatically recognize the target existing in the image (hereinafter referred to as target detection), and perform corresponding processing for the target. For example, in video surveillance, a computer may identify and monitor people in an image.
In the related art, image features of an image may be extracted through a plurality of successive down-sampling operations, and feature regression may be performed on the extracted features to determine the position of each target in the image as the detection result. However, if down-sampling is performed many times, a large amount of texture information is lost during the down-sampling; if it is performed few times, the obtained image features are shallow and contain little semantic information. The image features extracted in the related art therefore struggle to accurately express the image; that is, the image features are not accurate enough, and the accuracy of a detection result determined from such inaccurate image features is correspondingly low.
Disclosure of Invention
The embodiment of the invention aims to provide a target detection method, a target detection device and electronic equipment, so as to improve the accuracy of a target detection result. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a target detection method, the method comprising:
acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
In a possible embodiment, the scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature includes:
repeatedly executing the following operations until the mapping image characteristics are obtained:
scaling the initial image feature to the next down-sampling magnification, wherein, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification;
if the next down-sampling magnification is the target down-sampling magnification, the zoomed image feature is used as the mapping image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
In a possible embodiment, the fusing the scaled image features with the initial image features at the next down-sampling magnification to obtain new initial image features includes:
and fusing the scaled image features with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features, wherein, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image features belong, in the direction away from the target down-sampling magnification.
In a possible embodiment, the scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications includes:
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
In a possible embodiment, the scaling, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, the initial image feature to the target down-sampling magnification to obtain a mapped image feature includes:
and scaling the initial image features to the target down-sampling magnification by one-time scaling aiming at each initial image feature under other down-sampling magnifications except the target down-sampling magnification to obtain the mapping image features.
In a second aspect of the present invention, there is provided an object detection apparatus, the apparatus comprising:
the feature extraction module is used for acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
the feature mapping module is used for scaling, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, the initial image feature to the target down-sampling magnification to obtain mapped image features;
the feature fusion module is used for fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain fused image features at the target down-sampling magnification;
the reverse transfer module is used for scaling the fused image features at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications;
and the feature regression module is used for performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image feature is obtained:
scaling the initial image feature to the next down-sampling magnification, wherein, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification;
if the next down-sampling magnification is the target down-sampling magnification, the zoomed image feature is used as the mapping image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
In a possible embodiment, the feature mapping module is specifically configured to fuse the scaled image features with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features, wherein, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image features belong, in the direction away from the target down-sampling magnification.
In a possible embodiment, the feature fusion module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification by one-time scaling to obtain a mapped image feature.
In a third aspect of the present invention, there is provided an electronic device comprising:
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, having stored therein a computer program which, when executed by a processor, performs the method steps of any of the above-mentioned first aspects.
According to the target detection method, target detection device, and electronic device provided by the embodiments of the invention, the initial image features at a plurality of down-sampling magnifications can be scaled to the same target down-sampling magnification and fused to obtain a fused image feature with rich semantic and texture information, and the fused image feature is then scaled to the other down-sampling magnifications, so that the fused image features at each down-sampling magnification better express the features of the image to be processed, and a detection result determined from these fused image features is more accurate. Of course, not all of the advantages described above need to be achieved at the same time by any one product or method embodying the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
fig. 2a is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2b is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2c is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention, which may include:
s101, acquiring a plurality of initial image characteristics of an image to be processed under a plurality of down-sampling magnifications.
The plurality of down-sampling magnifications may differ across application scenarios; for example, they may be 8 times, 16 times, 32 times, and 64 times down-sampling. Since the principle of target detection is the same regardless of the particular magnifications, the following description takes 8, 16, 32, and 64 times down-sampling as an example; other cases follow the same principle and are not repeated.
The initial image features of the image to be processed at the plurality of down-sampling magnifications can be obtained using a feature extraction network with multiple convolution layers. For example, the initial image feature at 16 times down-sampling may be obtained by 2-times down-sampling of the initial image feature at 8 times down-sampling, and the 2-times down-sampling may be implemented as a convolution with stride 2.
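As an illustrative sketch only (not the patent's implementation), the pyramid of initial features can be built by repeated 2-times down-sampling of the previous level; a stride-2 average pooling stands in here for the stride-2 convolution mentioned above, and the names `downsample_2x` and `extract_pyramid` are hypothetical:

```python
import numpy as np

def downsample_2x(feat):
    # 2x down-sampling stand-in: average pooling with stride 2.
    # The patent realizes this with a stride-2 convolution in a learned network.
    h, w, c = feat.shape
    feat = feat[:h - h % 2, :w - w % 2]
    return feat.reshape(feat.shape[0] // 2, 2, feat.shape[1] // 2, 2, c).mean(axis=(1, 3))

def extract_pyramid(image, rates=(8, 16, 32, 64)):
    # Return {magnification: feature}; each level is obtained by further
    # 2x down-sampling of the previous level, mirroring how the feature at
    # 16x is obtained from the feature at 8x.
    feat, rate, pyramid = image, 1, {}
    for target in rates:
        while rate < target:
            feat = downsample_2x(feat)
            rate *= 2
        pyramid[rate] = feat
    return pyramid

img = np.random.rand(64, 64, 3)
pyr = extract_pyramid(img)
```

For a 64 x 64 input, the levels come out at 8 x 8, 4 x 4, 2 x 2, and 1 x 1 spatial size respectively.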
S102, aiming at each initial image feature under other down-sampling magnifications except the target down-sampling magnification, the initial image feature is zoomed to the target down-sampling magnification to obtain the mapping image feature.
The target down-sampling magnification is one of the plurality of down-sampling magnifications and is set according to actual needs or user experience. For example, it may be the smallest of the plurality of down-sampling magnifications, or the largest, or neither the smallest nor the largest; this embodiment does not limit it.
For any other down-sampling magnification, if the other down-sampling magnification is greater than the target down-sampling magnification, the initial image features of the other down-sampling magnification need to be up-sampled to scale the initial image features of the other down-sampling magnification to the target down-sampling magnification. If the other down-sampling magnification is smaller than the target down-sampling magnification, the initial image features of the other down-sampling magnification need to be down-sampled so as to scale the initial image features of the other down-sampling magnification to the target down-sampling magnification.
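The choice of scaling direction described above can be sketched as follows; nearest-neighbour repetition and average pooling are stand-ins for the learned up- and down-sampling layers, and all names are illustrative:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for a learned up-sampling layer.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def downsample_2x(feat):
    # 2x average pooling; stand-in for a stride-2 convolution.
    h, w, c = feat.shape
    return feat[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def scale_to(feat, from_rate, to_rate):
    # A magnification greater than the target means the feature map is smaller
    # than the target's, so it must be up-sampled; a smaller magnification
    # means it must be down-sampled (cf. S102).
    while from_rate > to_rate:
        feat = upsample_2x(feat)
        from_rate //= 2
    while from_rate < to_rate:
        feat = downsample_2x(feat)
        from_rate *= 2
    return feat

feat_32x = np.ones((2, 2, 3))          # feature of a 64-pixel-wide image at 32x
mapped = scale_to(feat_32x, from_rate=32, to_rate=8)
```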
S103, fusing the initial image features and the mapping image features under the target down-sampling magnification to obtain fused image features under the target down-sampling magnification.
The initial image feature at the target down-sampling magnification and each mapped image feature can each be regarded as a matrix, so the fused image feature is obtained by adding the elements at corresponding positions through an element-wise addition operation.
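A minimal sketch of this element-wise fusion, treating each aligned feature map as a numpy array (the name `fuse` is illustrative):

```python
import numpy as np

def fuse(features):
    # Element-wise fusion: all inputs must already be at the same
    # down-sampling magnification; elements at corresponding positions
    # are summed, as described for S103.
    return np.sum(np.stack(features, axis=0), axis=0)

a = np.ones((4, 4, 3))
b = 2 * np.ones((4, 4, 3))
fused = fuse([a, b])   # every element is 1 + 2
```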
And S104, scaling the fused image features under the target down-sampling magnification to each other down-sampling magnification to obtain the fused image features under each other down-sampling magnification.
It can be understood that the fused image feature at the target down-sampling magnification is obtained by scaling the initial image features at all sampling magnifications to the uniform target down-sampling magnification and fusing them. Because the inputs come from both higher down-sampling magnifications, whose features are deeper and semantically richer, and lower down-sampling magnifications, whose features retain more texture, the fused image feature carries rich semantic and texture information at the same time. The fused image features at the other down-sampling magnifications, obtained by scaling from the fused image feature at the target down-sampling magnification, likewise carry richer semantic and texture features.
In one possible embodiment, the fused image feature at the target down-sampling magnification may be scaled to each of other down-sampling magnifications, and for each of the other down-sampling magnifications, the image feature scaled to the other down-sampling magnification is fused with the initial image feature at the other down-sampling magnification to obtain the fused image feature at the other down-sampling magnification.
For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. The fused image feature obtained at 8 times down-sampling can be scaled to 16, 32, and 64 times down-sampling respectively. The image feature scaled to 16 times down-sampling is fused with the initial image feature at 16 times down-sampling to obtain the fused image feature at 16 times down-sampling; the image feature scaled to 32 times down-sampling is fused with the initial image feature at 32 times down-sampling to obtain the fused image feature at 32 times down-sampling; and the image feature scaled to 64 times down-sampling is fused with the initial image feature at 64 times down-sampling to obtain the fused image feature at 64 times down-sampling.
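The worked example of S104 above can be sketched as follows, under the assumption that every other magnification is coarser than the 8-times target, so only repeated 2-times down-sampling is needed; the pooling is a stand-in for a learned stride-2 convolution, and `propagate_back` is a hypothetical name:

```python
import numpy as np

def downsample_2x(feat):
    # 2x average pooling; stand-in for a stride-2 convolution.
    h, w, c = feat.shape
    return feat[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def propagate_back(fused_target, initial, target=8, rates=(8, 16, 32, 64)):
    # Scale the fused feature at the target magnification to every other
    # magnification and add it to that magnification's initial feature.
    fused = {target: fused_target}
    for rate in rates:
        if rate == target:
            continue
        f, r = fused_target, target
        while r < rate:                   # down-sample toward the coarser level
            f = downsample_2x(f)
            r *= 2
        fused[rate] = initial[rate] + f   # element-wise fusion with the initial feature
    return fused

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (16, 32, 64)}
fused_8x = 5 * np.ones((8, 8, 1))
fused_all = propagate_back(fused_8x, initial)
```

With constant inputs, averaging preserves the value 5, so every non-target level ends up holding 5 + 1 at the matching spatial size.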
And S105, performing target detection on the image to be processed based on the fusion image characteristics of the image to be processed under the multiple down-sampling magnifications.
By performing feature regression on the fused image features at the plurality of down-sampling magnifications, the targets present in the image to be processed and their positions can be determined. The position of a target may be represented as a bounding box placed at the target's location in the image to be processed.
By adopting this embodiment, the initial image features at a plurality of down-sampling magnifications can be scaled to the same target down-sampling magnification and fused to obtain a fused image feature with rich semantic and texture information, and the fused image feature is then scaled to the other down-sampling magnifications, so that the fused image features at each down-sampling magnification better express the features of the image to be processed, and a detection result determined from these fused image features is more accurate.
The manner in which the initial image features at the other down-sampling magnifications are scaled to the target down-sampling magnification is described below. In one possible embodiment, each initial image feature at a down-sampling magnification other than the target down-sampling magnification may be scaled to the target down-sampling magnification in a single scaling step to obtain the mapped image feature. In other possible embodiments, each such initial image feature may be scaled to the target down-sampling magnification through multiple scaling steps.
For example, in one possible embodiment, the following operations may be repeatedly performed until the mapped image features are obtained:
scaling the initial image feature to a next down-sampling magnification. And if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image characteristics as the mapping image characteristics. And if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
Here, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification. For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. Then the next down-sampling magnification of 16 times down-sampling is 8 times down-sampling, the next down-sampling magnification of 32 times down-sampling is 16 times down-sampling, and the next down-sampling magnification of 64 times down-sampling is 32 times down-sampling.
For the initial image feature at 16 times down-sampling, the initial image feature may be scaled to 8 times down-sampling to obtain the mapped image feature.
For the initial image feature under 32 times of downsampling, the initial image feature may be scaled to 16 times of downsampling and fused with the initial image feature under 16 times of downsampling to obtain a new initial image feature. The new initial image feature is the initial image feature at 16 times down-sampling, so reference can be made to the previous description of the initial image feature at 16 times down-sampling.
For the initial image feature at 64 times down-sampling, the initial image feature may be scaled to 32 times down-sampling and fused with the initial image feature at 32 times down-sampling to obtain a new initial image feature. The new initial image feature is the initial image feature at 32 times down-sampling, so the preceding description of the initial image feature at 32 times down-sampling applies.
An implementation of the above flow can be seen in the network structure shown in fig. 2a, where C3 represents the image feature at 8 times down-sampling, C4 the image feature at 16 times down-sampling, C5 the image feature at 32 times down-sampling, and C6 the image feature at 64 times down-sampling. The cells in the leftmost column of the network can be regarded as the input, namely the initial image feature at each down-sampling magnification. The horizontal arrows indicate convolution processing, for example a convolution with a 1 × 1 kernel and stride 1. The arrows pointing to the upper right indicate up-sampling. Taking the unit in the first row and second column of fig. 2a as an example, its inputs are the convolution-processed initial image feature at 8 times down-sampling and the up-sampled initial image feature at 16 times down-sampling; the unit fuses the two input image features to obtain a new initial image feature at 8 times down-sampling.
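The progressive cascade of the last few paragraphs (the fig. 2a style flow) can be sketched as follows; the 1 × 1 convolutions are omitted, 2-times nearest-neighbour up-sampling stands in for the learned up-sampling between adjacent magnifications, and `cascade_to_target` is a hypothetical name:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for the learned up-sampling.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def cascade_to_target(initial, rates=(8, 16, 32, 64), target=8):
    # Each coarse feature is moved one magnification toward the target at a
    # time, and fused (added) into the next level's initial feature before
    # moving on, so 64x flows into 32x, then into 16x, then into 8x.
    feats = dict(initial)                 # magnification -> (H, W, C) array
    for rate in sorted(rates, reverse=True):
        if rate == target:
            break
        nxt = rate // 2                   # next magnification toward the target
        feats[nxt] = feats[nxt] + upsample_2x(feats[rate])
    return feats[target]

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (8, 16, 32, 64)}
mapped = cascade_to_target(initial)
```

With all-ones inputs the running sum grows by one at each of the three fusion steps, so the result at 8 times down-sampling is uniformly 4.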
In yet another possible embodiment, the following operations may be repeatedly performed until the mapped image features are obtained:
scaling the initial image feature to the next down-sampling magnification. If the next down-sampling magnification is the target down-sampling magnification, the scaled image features are used as the mapped image features. If the next down-sampling magnification is not the target down-sampling magnification, the scaled image features are fused with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features.
Here, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction away from the target down-sampling magnification. For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. Then the previous down-sampling magnification of 8 times down-sampling is 16 times down-sampling, the previous down-sampling magnification of 16 times down-sampling is 32 times down-sampling, and the previous down-sampling magnification of 32 times down-sampling is 64 times down-sampling.
An implementation may be as shown in fig. 2b, where the upward arrows represent up-sampling. Taking the unit in the first row and second column of fig. 2b as an example, its inputs are the convolution-processed initial image feature at 8 times down-sampling, the up-sampled initial image feature at 16 times down-sampling, and the initial image feature at 32 times down-sampling scaled through 16 times down-sampling to 8 times down-sampling; the unit fuses the three input image features to obtain a new initial image feature at 8 times down-sampling.
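A sketch of the fig. 2b variant, under my reading of the description above: in addition to the running cascade, each level also receives the raw initial feature from the previous (coarser) magnification, up-sampled twice. Convolutions are again omitted, and all names are illustrative:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for the learned up-sampling.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def cascade_dense(initial, rates=(8, 16, 32, 64), target=8):
    # Like the fig. 2a cascade, but each level additionally fuses the raw
    # initial feature from two levels up (the "previous" magnification),
    # up-sampled through the intermediate level.
    feats = dict(initial)
    order = sorted(rates, reverse=True)          # 64, 32, 16, 8
    for i, rate in enumerate(order):
        if rate == target:
            break
        nxt = rate // 2
        new = feats[nxt] + upsample_2x(feats[rate])
        if i > 0:                                # skip input from the previous magnification
            prev = order[i - 1]                  # e.g. 64x while processing 32x
            new = new + upsample_2x(upsample_2x(initial[prev]))
        feats[nxt] = new
    return feats[target]

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (8, 16, 32, 64)}
mapped = cascade_dense(initial)
```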
For the embodiment in which each initial image feature at a down-sampling magnification other than the target down-sampling magnification is scaled to the target down-sampling magnification in a single scaling step to obtain the mapped image feature, see fig. 2c. For the notation in fig. 2c, refer to the descriptions of fig. 2a and fig. 2b, which are not repeated here.
It can be understood that, compared with the structures shown in fig. 2a and 2b, the structure shown in fig. 2c is simpler and can effectively reduce the computation required for target detection, while the structures shown in fig. 2a and 2b mix the initial image features at different down-sampling magnifications more thoroughly, so the resulting fused image features are more accurate. Any of the structures in fig. 2a, fig. 2b, or fig. 2c may be selected according to actual requirements, and other structures may also be used to implement target detection; this embodiment does not limit the choice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention, which may include:
the feature extraction module 301 is configured to obtain multiple initial image features of an image to be processed at multiple down-sampling magnifications;
a feature mapping module 302, configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
the feature fusion module 303 is configured to fuse the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
a reverse transfer module 304, configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
And the feature regression module 305 is configured to perform target detection on the image to be processed based on the fusion image features of the image to be processed at multiple down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
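The repeated operation described above can be sketched as a loop. The helper functions, the factor-of-2 spacing between adjacent magnifications, and addition as the fusion operation are illustrative assumptions:

```python
import numpy as np

def upsample2x(f):
    return f.repeat(2, axis=-2).repeat(2, axis=-1)

def downsample2x(f):
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

def map_to_target(feats, start, target):
    """Repeatedly scale one magnification step toward the target; if the step
    does not reach the target, fuse with the initial feature at that step."""
    mags = sorted(feats)                       # e.g. [8, 16, 32]
    f, m = feats[start], start
    while m != target:
        i = mags.index(m)
        nxt = mags[i - 1] if target < m else mags[i + 1]
        f = upsample2x(f) if nxt < m else downsample2x(f)  # adjacent mags differ by 2x
        if nxt != target:
            f = f + feats[nxt]                 # new initial feature at magnification nxt
        m = nxt
    return f                                   # mapped image feature at the target

feats = {8: np.ones((4, 8, 8)), 16: np.ones((4, 4, 4)), 32: np.ones((4, 2, 2))}
mapped = map_to_target(feats, start=32, target=8)
print(mapped.shape)
```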
In a possible embodiment, the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
In a possible embodiment, the reverse transfer module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and, for each of the other down-sampling magnifications, fuse the image feature scaled to that down-sampling magnification with the initial image feature at that down-sampling magnification to obtain the fused image feature at that down-sampling magnification.
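A minimal sketch of this scale-back-and-fuse step, assuming the target is the smallest magnification, factor-of-2 spacing between adjacent magnifications, and addition as the fusion operation:

```python
import numpy as np

def downsample2x(f):
    # 2x2 average pooling: one magnification step downward
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

def reverse_transfer(fused_target, feats, target):
    """Scale the fused feature at the target magnification back to every other
    magnification and fuse it with that magnification's initial feature."""
    out = {target: fused_target}
    for mag, init in feats.items():
        if mag == target:
            continue
        f, m = fused_target, target
        while m != mag:
            f = downsample2x(f)
            m *= 2
        out[mag] = f + init      # fused image feature at magnification mag
    return out

feats = {8: np.zeros((4, 8, 8)), 16: np.zeros((4, 4, 4)), 32: np.zeros((4, 2, 2))}
fused = reverse_transfer(np.ones((4, 8, 8)), feats, target=8)
print({m: f.shape for m, f in fused.items()})
```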
In a possible embodiment, the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, which may include:
a memory 401 for storing a computer program;
the processor 402, when executing the program stored in the memory 401, implements the following steps:
acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
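Taken together, the steps performed by the processor can be sketched end to end. The magnifications, the nearest-neighbour scaling, and the additive fusion are all stand-ins, and the detection head itself is omitted:

```python
import numpy as np

def up2(f):
    return f.repeat(2, axis=-2).repeat(2, axis=-1)

def down2(f):
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

# step 1: stand-in initial features at several down-sampling magnifications
feats = {8: np.ones((4, 8, 8)), 16: np.ones((4, 4, 4)), 32: np.ones((4, 2, 2))}
target = 8    # target down-sampling magnification (assumed the smallest)

# step 2: map every non-target feature to the target magnification
def to_target(f, m):
    while m > target:
        f, m = up2(f), m // 2
    return f

mapped = [to_target(f, m) for m, f in feats.items() if m != target]

# step 3: fuse the initial and mapped features at the target magnification
fused_t = feats[target] + sum(mapped)

# step 4: scale the fused feature back to the other magnifications and fuse
fused = {target: fused_t}
for m in feats:
    if m != target:
        f, k = fused_t, target
        while k < m:
            f, k = down2(f), k * 2
        fused[m] = f + feats[m]

# step 5: a detection head would now regress boxes/classes from every
# entry of `fused` (omitted here)
print({m: f.shape for m, f in fused.items()})
```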
In one possible embodiment, scaling the initial image feature to a target down-sampling magnification to obtain a mapped image feature comprises:
repeatedly executing the following operations until the mapping image characteristics are obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
In one possible embodiment, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature includes:
fusing the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
In one possible embodiment, scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications includes:
scaling the fused image features under the target down-sampling magnification to each other down-sampling magnification;
and for each of the other down-sampling magnifications, fusing the image feature scaled to that down-sampling magnification with the initial image feature at that down-sampling magnification to obtain the fused image feature at that down-sampling magnification.
In one possible embodiment, scaling each initial image feature at a down-sampling magnification other than the target down-sampling magnification to the target down-sampling magnification to obtain a mapped image feature includes:
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
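The one-time scaling variant (fig. 2c) reduces to a single interpolation whose factor is the ratio of the two magnifications; nearest-neighbour repetition stands in here for whatever scaling operator is actually used:

```python
import numpy as np

def map_direct(f, from_mag, to_mag):
    # single scaling straight to the target: factor = ratio of magnifications
    factor = from_mag // to_mag
    return f.repeat(factor, axis=-2).repeat(factor, axis=-1)

f32 = np.ones((4, 2, 2))                          # initial feature at 32x down-sampling
mapped = map_direct(f32, from_mag=32, to_mag=8)   # one 4x upsampling, no intermediate stops
print(mapped.shape)                               # (4, 8, 8)
```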
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute any one of the object detection methods in the above embodiments.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform any of the object detection methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method of object detection, the method comprising:
acquiring a plurality of initial image characteristics of an image to be processed under a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image features and the mapping image features under the target down-sampling magnification to obtain fused image features under the target down-sampling magnification;
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image features under each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fusion image characteristics of the image to be processed under the plurality of down-sampling magnifications.
2. The method of claim 1, wherein scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature comprises:
repeatedly performing the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
3. The method of claim 2, wherein fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature comprises:
fusing the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
4. The method of claim 1, wherein scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications comprises:
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
5. The method of claim 1, wherein scaling each initial image feature at a down-sampling magnification other than a target down-sampling magnification to the target down-sampling magnification to obtain a mapped image feature comprises:
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
6. An object detection apparatus, characterized in that the apparatus comprises:
the feature extraction module is used for acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
the feature mapping module is used for, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
the feature fusion module is used for fusing the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
the reverse transfer module is used for scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
and the feature regression module is used for performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
7. The apparatus of claim 6, wherein the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image features are obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
8. The apparatus according to claim 7, wherein the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
9. The apparatus according to claim 6, wherein the reverse transfer module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
10. The apparatus according to claim 6, wherein the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification by one scaling to obtain a mapped image feature.
11. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.
CN201911056322.7A 2019-10-31 2019-10-31 Target detection method and device and electronic equipment Active CN111767935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111767935A true CN111767935A (en) 2020-10-13
CN111767935B CN111767935B (en) 2023-09-05

Family

ID=72718411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911056322.7A Active CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111767935B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
US20190156144A1 (en) * 2017-02-23 2019-05-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190156144A1 (en) * 2017-02-23 2019-05-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAO YANG, ET AL: "Semantic segmentation via highly fused convolutional network with multiple soft cost functions", COGNITIVE SYSTEMS RESEARCH, vol. 53, pages 20 - 30, XP085543583, DOI: 10.1016/j.cogsys.2018.04.004 *
XIONG CHANGZHEN, ET AL: "Weakly supervised semantic segmentation algorithm based on multi-model ensemble", JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS, vol. 31, no. 5, pages 800 - 807 *

Also Published As

Publication number Publication date
CN111767935B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
JP6902611B2 (en) Object detection methods, neural network training methods, equipment and electronics
CN110544214A (en) Image restoration method and device and electronic equipment
US20200175062A1 (en) Image retrieval method and apparatus, and electronic device
CN107885796B (en) Information recommendation method, device and equipment
CN106649681B (en) Data processing method, device and equipment
CN110298858B (en) Image clipping method and device
CN107609437A (en) A kind of targeted graphical code recognition methods and device
US20190114711A1 (en) Financial analysis system and method for unstructured text data
CN107066519A (en) A kind of task detection method and device
CN108694574B (en) Resource transfer channel processing method, device and equipment
WO2020173136A1 (en) Method and apparatus for monitoring application system, device, and storage medium
CN112560957A (en) Neural network training and detecting method, device and equipment
CN111145202B (en) Model generation method, image processing method, device, equipment and storage medium
CN109189677B (en) Test method and device for updating state of variable value
CN111835536A (en) Flow prediction method and device
US11164056B2 (en) Method and system for applying barcode, and server
CN111767935B (en) Target detection method and device and electronic equipment
CN115689061B (en) Wind power ultra-short term power prediction method and related equipment
CN111767934B (en) Image recognition method and device and electronic equipment
CN115661564A (en) Training method and device of image processing model, electronic equipment and storage medium
CN111551499B (en) Method and device for measuring sugar content of fruit, computer equipment and storage medium
CN114066544A (en) Method, device and storage medium for showing product system architecture
US9536013B2 (en) Method and apparatus for web browsing of handheld device
CN110032498B (en) Prediction method for user APP behaviors
CN115424082B (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant