CN111767935B - Target detection method and device and electronic equipment - Google Patents

Target detection method and device and electronic equipment

Info

Publication number
CN111767935B
CN111767935B
Authority
CN
China
Prior art keywords
downsampling
magnification
target
image features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911056322.7A
Other languages
Chinese (zh)
Other versions
CN111767935A (en)
Inventor
张凯
谭文明
李哲暘
石大虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201911056322.7A
Publication of CN111767935A
Application granted
Publication of CN111767935B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a target detection method, a target detection apparatus, and an electronic device. The method comprises: acquiring initial image features of an image to be processed at a plurality of downsampling magnifications; for each initial image feature at a downsampling magnification other than a target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature; fusing the initial image feature at the target downsampling magnification with the mapped image features to obtain a fused image feature at the target downsampling magnification; scaling the fused image feature at the target downsampling magnification to each of the other downsampling magnifications to obtain a fused image feature at each of the other downsampling magnifications; and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications. The accuracy of target detection can thereby be improved.

Description

Target detection method and device and electronic equipment
Technical Field
The present invention relates to the field of machine vision, and in particular, to a target detection method, apparatus, and electronic device.
Background
Based on machine vision technology, a computer can automatically identify objects present in an image (hereinafter referred to as target detection) and process those objects accordingly. For example, in video surveillance, a computer may identify the people in an image and monitor them.
In the related art, image features of an image may be extracted through a series of successive downsampling operations, and feature regression may be performed on the extracted image features to determine the positions of the targets in the image as the detection result. However, if many downsampling operations are applied, much texture information is lost in the process; if only a few are applied, the resulting image features are shallow and contain little semantic information. The image features extracted in the related art therefore struggle to express the image accurately, and the accuracy of detection results obtained from such inaccurate image features is correspondingly low.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a target detection method, a target detection apparatus, and an electronic device, so as to improve the accuracy of target detection results. The specific technical solution is as follows:
in a first aspect of the present invention, there is provided a target detection method, the method comprising:
acquiring initial image features of an image to be processed at a plurality of downsampling magnifications;
for each initial image feature at a downsampling magnification other than a target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature;
fusing the initial image feature at the target downsampling magnification with the mapped image features to obtain a fused image feature at the target downsampling magnification;
scaling the fused image feature at the target downsampling magnification to each of the other downsampling magnifications to obtain a fused image feature at each of the other downsampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications.
In a possible embodiment, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature comprises:
repeatedly performing the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, where the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the magnification one step closer to the target downsampling magnification from the magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
In one possible embodiment, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature comprises:
fusing the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the last downsampling magnification scaled to the next downsampling magnification, to obtain a new initial image feature, where the last downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the magnification one step further from the target downsampling magnification than the magnification to which the initial image feature belongs.
In a possible embodiment, scaling the fused image feature at the target downsampling magnification to each of the other downsampling magnifications to obtain the fused image feature at each of the other downsampling magnifications comprises:
scaling the fused image feature at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, fusing the image feature scaled to that magnification with the initial image feature at that magnification to obtain the fused image feature at that magnification.
In one possible embodiment, scaling, for each initial image feature at a downsampling magnification other than the target downsampling magnification, the initial image feature to the target downsampling magnification to obtain the mapped image feature comprises:
for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
In a second aspect of the present invention, there is provided an object detection apparatus, the apparatus comprising:
a feature extraction module, configured to acquire initial image features of an image to be processed at a plurality of downsampling magnifications;
a feature mapping module, configured to scale, for each initial image feature at a downsampling magnification other than a target downsampling magnification, the initial image feature to the target downsampling magnification to obtain a mapped image feature;
a feature fusion module, configured to fuse the initial image feature at the target downsampling magnification with the mapped image features to obtain a fused image feature at the target downsampling magnification, and to scale the fused image feature to each of the other downsampling magnifications to obtain a fused image feature at each of the other downsampling magnifications;
and a feature regression module, configured to perform target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, where the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the magnification one step closer to the target downsampling magnification from the magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
In one possible embodiment, the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the last downsampling magnification scaled to the next downsampling magnification, to obtain a new initial image feature, where the last downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the magnification one step further from the target downsampling magnification than the magnification to which the initial image feature belongs.
In a possible embodiment, the feature fusion module is specifically configured to scale the fused image feature at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, to fuse the image feature scaled to that magnification with the initial image feature at that magnification to obtain the fused image feature at that magnification.
In a possible embodiment, the feature mapping module is specifically configured to scale, for each initial image feature at a downsampling magnification other than the target downsampling magnification, the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
In a third aspect of the present invention, there is provided an electronic apparatus comprising:
a memory for storing a computer program;
a processor, configured to implement the method steps of any of the above first aspects when executing the program stored in the memory.
In a fourth aspect of the invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method steps of any of the first aspects described above.
According to the target detection method, apparatus, and electronic device provided by embodiments of the invention, the initial image features at the plurality of downsampling magnifications are scaled to the same target downsampling magnification and fused, yielding a fused image feature rich in both semantic and texture information; this fused feature is then scaled back to the other downsampling magnifications, so that the fused image feature at every downsampling magnification expresses the image to be processed more faithfully. The detection results determined from these fused features are therefore more accurate. Of course, it is not necessary for any product or method practicing the invention to achieve all of the advantages described above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
fig. 2a is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2b is another schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2c is yet another schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an object detection device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention, which may include:
s101, acquiring a plurality of initial image features of an image to be processed under a plurality of downsampling multiplying powers.
The plurality of downsampling magnifications may differ depending on the application scenario; they may be, for example, 8-fold, 16-fold, 32-fold, and 64-fold downsampling. Since the principle of target detection is the same regardless of the particular magnifications, the following description takes the case where the magnifications are 8-fold, 16-fold, 32-fold, and 64-fold downsampling; the other cases follow analogously and are not described again.
The initial image features of the image to be processed at the plurality of downsampling magnifications may be extracted using a feature extraction network with a plurality of convolutional layers. For example, the 16-fold downsampled initial image feature may be obtained by 2-fold downsampling of the 8-fold downsampled initial image feature, and the 2-fold downsampling may be implemented as a convolution with a stride of 2.
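As a concrete illustration of S101, the pyramid of initial features can be sketched with plain numpy. This is a minimal sketch, not the patent's network: a 2x2 average pooling stands in for the learned stride-2 convolution, and the function names (`downsample2x`, `build_pyramid`) are illustrative only.

```python
import numpy as np

def downsample2x(feat):
    # 2-fold downsampling via 2x2 average pooling; a stand-in for the
    # stride-2 convolution described in the text (no learned weights here)
    h, w = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(feat_8x, num_levels=4):
    # each further level (16x, 32x, 64x) is a 2-fold downsampling
    # of the previous one
    feats = [feat_8x]
    for _ in range(num_levels - 1):
        feats.append(downsample2x(feats[-1]))
    return feats

# a 64x64 map standing in for the 8x-downsampled feature of a 512x512 image
c3 = np.arange(64 * 64, dtype=float).reshape(64, 64)
c3, c4, c5, c6 = build_pyramid(c3)
print([f.shape for f in (c3, c4, c5, c6)])
# → [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Each halving of spatial resolution corresponds to doubling the downsampling magnification, which is why four levels give the 8x/16x/32x/64x pyramid used in the running example.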
S102, for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature.
The target downsampling magnification is one of the plurality of downsampling magnifications and is set according to actual requirements or user experience; it may be the smallest of the plurality of magnifications, the largest, or any magnification between the smallest and the largest.
For any other downsampling magnification, if that magnification is greater than the target downsampling magnification, the initial image feature at that magnification is scaled to the target downsampling magnification by upsampling it; if that magnification is smaller than the target downsampling magnification, the initial image feature is scaled to the target downsampling magnification by downsampling it.
S103, fusing the initial image feature at the target downsampling magnification with the mapped image features to obtain the fused image feature at the target downsampling magnification.
Each mapped image feature may be regarded as a matrix, so the elements at corresponding positions of the mapped image features and of the initial image feature at the target magnification may be added by element-wise (elementwise) addition to obtain the fused image feature.
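The mapping of S102 and the element-wise fusion of S103 can be sketched as follows; this is again a minimal sketch, with nearest-neighbor repetition standing in for a learned upsampling layer, and the names C3-C6 following the notation used for fig. 2a later in the text.

```python
import numpy as np

def upsample2x(feat):
    # nearest-neighbor upsampling; a stand-in for a learned upsampling layer
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def to_target(feat, steps):
    # scale a feature `steps` octaves toward the (smaller) target magnification
    for _ in range(steps):
        feat = upsample2x(feat)
    return feat

# initial features at 8x (target), 16x, 32x, 64x downsampling
c3 = np.full((64, 64), 1.0)
c4 = np.full((32, 32), 2.0)
c5 = np.full((16, 16), 3.0)
c6 = np.full((8, 8), 4.0)

# S102: map each non-target feature to the 8x grid;
# S103: fuse by element-wise addition with the initial 8x feature
fused_8x = c3 + to_target(c4, 1) + to_target(c5, 2) + to_target(c6, 3)
print(fused_8x.shape, fused_8x[0, 0])  # → (64, 64) 10.0
```

Once all features share the 8x grid, fusion reduces to a plain matrix addition, matching the elementwise-addition description above.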
S104, scaling the fused image feature at the target downsampling magnification to each of the other downsampling magnifications to obtain the fused image feature at each of the other downsampling magnifications.
It can be understood that the fused image feature at the target downsampling magnification is obtained by scaling the initial image features at all of the sampling magnifications to the uniform target downsampling magnification and fusing them. Because those magnifications include both higher and lower ones, the fused feature carries rich semantic features and texture features; the fused image features at the other downsampling magnifications, obtained by scaling the fused feature at the target magnification, likewise carry rich semantic and texture features.
In one possible embodiment, the fused image features at the target downsampling magnification may be scaled to various other downsampling magnifications, and for each other downsampling magnification, the image features scaled to the other downsampling magnifications are fused with the initial image features of the other downsampling magnifications to obtain the fused image features at the other downsampling magnifications.
Illustratively, suppose the plurality of downsampling magnifications are 8-fold, 16-fold, 32-fold, and 64-fold downsampling, and the target downsampling magnification is 8-fold. The fused image feature at 8-fold downsampling may be scaled to 16-fold, 32-fold, and 64-fold downsampling respectively. The image feature scaled to 16-fold downsampling is fused with the initial image feature at 16-fold downsampling to obtain the fused image feature at 16-fold downsampling; the image feature scaled to 32-fold downsampling is fused with the initial image feature at 32-fold downsampling to obtain the fused image feature at 32-fold downsampling; and the image feature scaled to 64-fold downsampling is fused with the initial image feature at 64-fold downsampling to obtain the fused image feature at 64-fold downsampling.
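The redistribution of S104 can be sketched likewise. Scaling the fused 8-fold feature down one octave at a time is one plausible reading of "scaling to each of the other magnifications", not necessarily the patent's exact implementation; the average pooling again stands in for a stride-2 convolution.

```python
import numpy as np

def downsample2x(feat):
    # 2x2 average pooling as a stand-in for a stride-2 convolution
    h, w = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def distribute(fused_target, initial_feats):
    # scale the fused 8x feature down one octave at a time and fuse
    # (element-wise add) with the initial feature at each magnification
    out = [fused_target]
    cur = fused_target
    for init in initial_feats:  # initial features at 16x, 32x, 64x
        cur = downsample2x(cur)
        out.append(cur + init)
    return out

fused_8x = np.full((64, 64), 10.0)
inits = [np.ones((32, 32)), np.ones((16, 16)), np.ones((8, 8))]
p3, p4, p5, p6 = distribute(fused_8x, inits)
print([p.shape for p in (p3, p4, p5, p6)], p4[0, 0])
# → [(64, 64), (32, 32), (16, 16), (8, 8)] 11.0
```

The output is one fused feature per magnification, which is exactly the input the feature-regression step S105 expects.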
S105, performing target detection on the image to be processed based on the fusion image characteristics of the image to be processed under a plurality of downsampling multiplying powers.
Feature regression may be performed on the fused image features at the plurality of downsampling magnifications to determine the targets present in the image to be processed and their positions. The position of a target may be represented by placing a target box at the location of the target in the image to be processed.
With this embodiment, the initial image features at the plurality of downsampling magnifications are scaled to the same target downsampling magnification and fused to obtain a fused image feature rich in semantic and texture information, and the fused feature is then scaled to the other downsampling magnifications, so that the fused image feature at each downsampling magnification better expresses the image to be processed. The accuracy of the detection results determined from these fused features is therefore higher.
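As a toy illustration of the feature-regression step S105, one can imagine each cell of a fused feature map carrying a confidence score, with high-scoring cells turned into boxes in original-image coordinates via the downsampling magnification. The head below is entirely hypothetical (the patent does not specify the regression head); the fixed box size and threshold are illustrative assumptions.

```python
import numpy as np

def cells_to_boxes(score_map, downsample, threshold=0.5, box_size=32):
    # hypothetical minimal "regression": treat each high-scoring cell of a
    # fused feature map as a detected target centered at that cell, mapped
    # back to original-image coordinates via the downsampling magnification
    boxes = []
    for (i, j), s in np.ndenumerate(score_map):
        if s >= threshold:
            cy, cx = (i + 0.5) * downsample, (j + 0.5) * downsample
            boxes.append((cx - box_size / 2, cy - box_size / 2,
                          cx + box_size / 2, cy + box_size / 2, float(s)))
    return boxes

scores = np.zeros((8, 8))
scores[2, 3] = 0.9          # one confident cell in a 64x-downsampled map
boxes = cells_to_boxes(scores, downsample=64)
print(boxes)
# → [(208.0, 144.0, 240.0, 176.0, 0.9)]
```

The point of the sketch is only the coordinate mapping: a cell at index (i, j) of a map at magnification d corresponds to a region centered at roughly ((j + 0.5) * d, (i + 0.5) * d) in the original image.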
The manner in which the initial image features at the other downsampling magnifications are scaled to the target downsampling magnification is described below. In one possible embodiment, each initial image feature at a downsampling magnification other than the target downsampling magnification may be scaled to the target downsampling magnification in a single scaling step to obtain the mapped image feature. In other possible embodiments, it may be scaled to the target downsampling magnification through multiple scaling steps.
For example, in one possible embodiment, the following operations may be repeated until the mapped image features are obtained:
the initial image feature is scaled to the next downsampling magnification. And if the next downsampling multiplying power is the target downsampling multiplying power, taking the zoomed image characteristic as a mapping image characteristic. If the next downsampling magnification is not the target downsampling magnification, fusing the zoomed image features with the initial image features under the next downsampling magnification to obtain new initial image features.
When the next downsampling magnification is a plurality of different downsampling magnifications which are ordered according to the size, the next downsampling magnification which is close to the target downsampling magnification from the downsampling magnification to which the initial image feature belongs. Taking the example that the plurality of downsampling magnifications are 8 times downsampling, 16 times downsampling, 32 times downsampling and 64 times downsampling respectively, the target downsampling magnification is 8 times downsampling magnification. The downsampling magnification of 16 times downsampling is 8 times downsampling, the downsampling magnification of 32 times downsampling is 16 times downsampling, and the downsampling magnification of 64 times downsampling is 32 times downsampling.
For an initial image feature at 16 times downsampling, it may be that the initial image feature is scaled to 8 times downsampling to yield a mapped image feature.
For an initial image feature at 32 times downsampling, the initial image feature may be scaled to 16 times downsampling and fused with the initial image feature at 16 times downsampling to obtain a new initial image feature. The new initial image feature is an initial image feature at 16 times down-sampling, so reference can be made to the description of the initial image feature at 16 times down-sampling previously described.
For the initial image feature at 64-fold downsampling, the initial image feature may be scaled to 32-fold downsampling and fused with the initial image feature at 32-fold downsampling to obtain a new initial image feature. The new initial image feature is an initial image feature at 32-fold downsampling, so reference can be made to the foregoing description of the initial image feature at 32-fold downsampling.
An implementation of the above procedure can be seen in the network structure shown in fig. 2a, where C3 denotes the image feature at 8-fold downsampling, C4 at 16-fold, C5 at 32-fold, and C6 at 64-fold. The leftmost column of cells in the network may be regarded as the input, namely the initial image feature at each sampling magnification. A horizontal arrow indicates a convolution, for example a convolution with a 1x1 kernel and a stride of 1; an arrow pointing to the upper right indicates upsampling. Taking the cell in the first row, second column of fig. 2a as an example, its inputs are the convolved initial image feature at 8-fold downsampling and the upsampled initial image feature at 16-fold downsampling; the cell fuses these two input image features to obtain the new initial image feature at 8-fold downsampling.
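The fig. 2a-style progressive mapping can be sketched as a loop that repeatedly scales the coarsest feature one step toward the target magnification and fuses it with the initial feature there. In this sketch the final fusion at the target magnification (S103) is folded into the same loop, and nearest-neighbor repetition again stands in for the learned upsampling.

```python
import numpy as np

def upsample2x(feat):
    # nearest-neighbor upsampling, standing in for a learned layer
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def cascade_to_target(feats_coarse_to_fine):
    # fig. 2a style: start from the coarsest feature (64x), repeatedly
    # scale to the next magnification and fuse with the initial feature
    # there, until the target magnification (8x) is reached
    cur = feats_coarse_to_fine[0]
    for init in feats_coarse_to_fine[1:]:
        cur = upsample2x(cur) + init  # scale one octave, then fuse
    return cur

c6 = np.full((8, 8), 4.0)    # 64x downsampling
c5 = np.full((16, 16), 3.0)  # 32x
c4 = np.full((32, 32), 2.0)  # 16x
c3 = np.full((64, 64), 1.0)  # 8x (target)
fused = cascade_to_target([c6, c5, c4, c3])
print(fused.shape, fused[0, 0])  # → (64, 64) 10.0
```

Compared with the one-step mapping of fig. 2c, each intermediate magnification here contributes its initial feature along the way, which is the fuller mixing the text attributes to the fig. 2a and 2b structures.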
In yet another possible embodiment, the following operations may be repeated until the mapped image features are obtained:
the initial image feature is scaled to the next downsampling magnification. And if the next downsampling multiplying power is the target downsampling multiplying power, taking the zoomed image characteristic as a mapping image characteristic. If the next downsampling magnification is not the target downsampling magnification, the zoomed image features are fused with the initial image features under the next downsampling magnification and the initial image features of the last downsampling magnification zoomed to the next downsampling magnification, and then new initial image features are obtained.
When the last downsampling magnification is a plurality of different downsampling magnifications which are ordered according to the size, starting from the downsampling magnification to which the initial image feature belongs to the next downsampling magnification far away from the target downsampling magnification. Taking the example that the plurality of downsampling magnifications are 8 times downsampling, 16 times downsampling, 32 times downsampling and 64 times downsampling respectively, the target downsampling magnification is 8 times downsampling magnification. The last downsampling magnification of 8 times downsampling is 16 times downsampling, the downsampling magnification of 16 times downsampling is 32 times downsampling, and the downsampling magnification of 32 times downsampling is 64 times downsampling.
An implementation may be as shown in fig. 2b, where the up arrow represents the upsampling process. Taking the unit of the first row and the second column in fig. 2b as an example, the input of the unit is the initial image feature under 8 times downsampling through convolution processing, the initial image feature under 16 times downsampling through upsampling processing, and the image feature of scaling the initial image feature under 32 times downsampling to the image feature under 16 times downsampling and then scaling to 8 times downsampling multiplying power, and the unit fuses the two input image features to obtain the new initial image feature under 8 times downsampling.
Scaling each initial image feature at a downsampling magnification other than the target downsampling magnification to the target downsampling magnification in a single scaling step to obtain the mapped image feature corresponds to the structure shown in fig. 2c. For the meanings of the symbols in fig. 2c, reference may be made to the foregoing descriptions of figs. 2a and 2b, which are not repeated here.
It can be appreciated that the structure shown in fig. 2c is simpler than those shown in figs. 2a and 2b and can effectively reduce the computation required for target detection, while the structures of figs. 2a and 2b allow the initial image features at different downsampling magnifications to mix more fully, making the resulting fused image features more accurate. Any of the structures in figs. 2a, 2b, or 2c may be selected according to actual requirements, or another structure may be used to implement target detection; this embodiment imposes no limitation on the choice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an object detection device according to an embodiment of the present invention, which may include:
the feature extraction module 301 is configured to obtain a plurality of initial image features of an image to be processed under a plurality of downsampling magnifications;
the feature mapping module 302 is configured to scale, for each initial image feature of the downsampling magnifications other than the target downsampling magnifications, the initial image feature to the target downsampling magnifications to obtain a mapped image feature;
the feature fusion module 303 is configured to fuse the initial image feature at the target downsampling magnification with the mapped image features to obtain the fused image feature at the target downsampling magnification;
the inverse transfer module 304 is configured to scale the fused image feature at the target downsampling magnification to each of the other downsampling magnifications to obtain the fused image feature at each of the other downsampling magnifications;
The feature regression module 305 is configured to perform object detection on the image to be processed based on the features of the fused image of the image to be processed under a plurality of downsampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until a mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, where the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step closer to the target downsampling magnification than the downsampling magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
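The stepwise mapping loop described above can be sketched as follows (illustrative Python, not from the patent; the helper names, the factor-of-2 magnification spacing, summation as the fusion, and the assumption that the target is the largest magnification are all mine):

```python
import numpy as np

def pool2x(f):
    # Downsample an (H, W) feature map by 2 using average pooling.
    h, w = f.shape
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cascade_to_target(initial, mags, target):
    # Step the smallest-magnification feature toward the target, one
    # magnification at a time; at every intermediate magnification, fuse
    # the stepped feature with the initial feature already there.
    mag = mags[0]
    feat = initial[mag]
    while True:
        mag *= 2                     # next magnification, closer to target
        feat = pool2x(feat)          # scale the feature to it
        if mag == target:
            return feat              # reached the target: mapped feature
        feat = feat + initial[mag]   # otherwise fuse and keep stepping

mags = [4, 8, 16, 32]
initial = {m: np.ones((64 // m, 64 // m)) for m in mags}
mapped = cascade_to_target(initial, mags, target=32)
print(mapped.shape)  # (2, 2)
```

Note that, exactly as in the text, the scaled feature becomes the mapped feature as soon as the next magnification equals the target; the fusion with the target's own initial feature is left to the separate fusion step.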
In one possible embodiment, the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the previous downsampling magnification scaled to the next downsampling magnification, to obtain a new initial image feature, where the previous downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step farther from the target downsampling magnification than the downsampling magnification to which the initial image feature belongs.
In one possible embodiment, the inverse transfer module is specifically configured to scale the fused image features at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, fuse the image features scaled to that downsampling magnification with the initial image features at that downsampling magnification, to obtain the fused image features at that downsampling magnification.
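This back-propagation of the fused feature can be sketched as follows (hedged numpy illustration, not the patent's implementation; helper names are invented, fusion is summation, and the target is assumed to be the largest magnification):

```python
import numpy as np

def up2x(f):
    # Upsample an (H, W) feature map by 2 with nearest-neighbour copies.
    return f.repeat(2, axis=0).repeat(2, axis=1)

def reverse_transfer(fused_at_target, target, initial):
    # Scale the fused feature at the target magnification back to every
    # other magnification and fuse (sum) it with the initial feature there.
    out = {}
    for mag, init in initial.items():
        if mag == target:
            continue
        feat, m = fused_at_target, target
        while m > mag:               # target is coarser: upsample toward mag
            feat, m = up2x(feat), m // 2
        out[mag] = feat + init       # fuse with the initial feature
    return out

initial = {m: np.zeros((32 // m, 32 // m)) for m in (4, 8, 16)}
fused16 = np.ones((2, 2))            # fused feature at target magnification 16
others = reverse_transfer(fused16, 16, initial)
print(sorted(others))                # [4, 8]
```

After this step, every downsampling magnification carries a fused feature that has seen information from all scales, which is what the detection head consumes.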
In one possible embodiment, the feature mapping module is specifically configured to, for each initial image feature at a downsampling magnification other than the target downsampling magnification, scale the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
The embodiment of the invention also provides an electronic device, as shown in fig. 4, which may include:
a memory 401 for storing a computer program;
a processor 402, configured to implement the following steps when executing the program stored in the memory 401:
acquiring a plurality of initial image features of an image to be processed at a plurality of downsampling magnifications;
for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature;
fusing the initial image features at the target downsampling magnification with the mapped image features to obtain fused image features at the target downsampling magnification;
scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications to obtain fused image features at each of the other downsampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications.
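The five steps above can be tied together in one end-to-end toy sketch (illustrative only: plain average pooling stands in for a backbone network, magnifications differ by factors of 2, fusion is summation, and the detection head of the final step is omitted; none of the names come from the patent):

```python
import numpy as np

def pool2x(f):
    # Downsample an (H, W) feature map by 2 using average pooling.
    h, w = f.shape
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2x(f):
    # Upsample an (H, W) feature map by 2 with nearest-neighbour copies.
    return f.repeat(2, axis=0).repeat(2, axis=1)

def detect_pipeline(image, mags, target):
    # Step 1: initial features per downsampling magnification.
    feats, f, scale = {}, image, 1
    while scale < max(mags):
        f, scale = pool2x(f), scale * 2
        if scale in mags:
            feats[scale] = f
    # Step 2: map every other feature to the target magnification.
    mapped = []
    for m, feat in feats.items():
        if m == target:
            continue
        g, s = feat, m
        while s < target:
            g, s = pool2x(g), s * 2
        while s > target:
            g, s = up2x(g), s // 2
        mapped.append(g)
    # Step 3: fuse at the target magnification.
    fused_t = feats[target] + sum(mapped)
    # Step 4: scale the fused feature back to every other magnification
    # and fuse it with the initial feature there.
    fused = {target: fused_t}
    for m in mags:
        if m == target:
            continue
        g, s = fused_t, target
        while s > m:
            g, s = up2x(g), s // 2
        while s < m:
            g, s = pool2x(g), s * 2
        fused[m] = g + feats[m]
    # Step 5 (running a detection head on `fused`) is omitted.
    return fused

image = np.random.rand(32, 32)
fused = detect_pipeline(image, mags=[4, 8, 16], target=16)
print({m: f.shape for m, f in sorted(fused.items())})
```

The returned dictionary holds one fused multi-scale feature map per downsampling magnification, which is the input the target detection step would consume.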
In one possible embodiment, scaling the initial image feature to the target downsampling magnification to obtain the mapped image feature includes:
repeatedly performing the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, where the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step closer to the target downsampling magnification than the downsampling magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
In one possible embodiment, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature includes:
fusing the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the previous downsampling magnification scaled to the next downsampling magnification, to obtain the new initial image feature, where the previous downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step farther from the target downsampling magnification than the downsampling magnification to which the initial image feature belongs.
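The three-way fusion above can be sketched in one step (a hedged numpy illustration with invented names; fusion is summation and magnifications are assumed to differ by factors of 2):

```python
import numpy as np

def pool2x(f):
    # Downsample an (H, W) feature map by 2 using average pooling.
    h, w = f.shape
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def three_way_step(stepped, initial, mag):
    # Fuse, at magnification `mag`: (a) the feature stepped here from the
    # cascade, (b) the initial feature at `mag`, and (c) the previous
    # magnification's initial feature (`mag // 2`) scaled to `mag`.
    prev = pool2x(initial[mag // 2])
    return stepped + initial[mag] + prev

# Toy initial features at magnifications 2, 4, 8 for a 16x16 image,
# filled with their own magnification value so the sum is easy to follow.
initial = {m: np.full((16 // m, 16 // m), float(m)) for m in (2, 4, 8)}
stepped = pool2x(initial[2])          # feature stepped from mag 2 to mag 4
new_feat = three_way_step(stepped, initial, mag=4)
print(new_feat.shape)  # (4, 4)
```

Compared with the two-way fusion of the basic loop, each intermediate magnification additionally receives the raw feature from the previous magnification, giving the denser mixing attributed to the fig. 2a/2b structures.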
In one possible embodiment, scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications to obtain the fused image features at each of the other downsampling magnifications includes:
scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, fusing the image features scaled to that downsampling magnification with the initial image features at that downsampling magnification, to obtain the fused image features at that downsampling magnification.
In one possible embodiment, for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain the mapped image feature includes:
for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
The memory mentioned in the electronic device may include a random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform any of the target detection methods of the above embodiments.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the target detection methods of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in an interrelated manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are described relatively briefly, since they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method of target detection, the method comprising:
acquiring a plurality of initial image features of an image to be processed at a plurality of downsampling magnifications;
for each initial image feature at a downsampling magnification other than a target downsampling magnification, scaling the initial image feature to the target downsampling magnification to obtain a mapped image feature;
fusing the initial image features at the target downsampling magnification with the mapped image features to obtain fused image features at the target downsampling magnification;
scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications to obtain fused image features at each of the other downsampling magnifications;
performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications;
wherein scaling the initial image feature to the target downsampling magnification to obtain the mapped image feature comprises:
repeatedly performing the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, wherein the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step closer to the target downsampling magnification than the downsampling magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
2. The method of claim 1, wherein fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature comprises:
fusing the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the previous downsampling magnification scaled to the next downsampling magnification, to obtain the new initial image feature, wherein the previous downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step farther from the target downsampling magnification than the downsampling magnification to which the initial image feature belongs.
3. The method according to claim 1, wherein scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications to obtain the fused image features at each of the other downsampling magnifications comprises:
scaling the fused image features at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, fusing the image features scaled to that downsampling magnification with the initial image features at that downsampling magnification, to obtain the fused image features at that downsampling magnification.
4. The method of claim 1, wherein scaling the initial image feature to the target downsampling magnification, for each initial image feature at a downsampling magnification other than the target downsampling magnification, comprises:
for each initial image feature at a downsampling magnification other than the target downsampling magnification, scaling the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
5. An object detection device, the device comprising:
the feature extraction module, configured to acquire a plurality of initial image features of an image to be processed at a plurality of downsampling magnifications;
the feature mapping module, configured to scale, for each initial image feature at a downsampling magnification other than a target downsampling magnification, the initial image feature to the target downsampling magnification to obtain a mapped image feature;
the feature fusion module, configured to fuse the initial image features at the target downsampling magnification with the mapped image features to obtain fused image features at the target downsampling magnification;
the inverse transfer module, configured to scale the fused image features at the target downsampling magnification to each of the other downsampling magnifications, to obtain the fused image features at each of the other downsampling magnifications;
the feature regression module, configured to perform target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of downsampling magnifications;
wherein the feature mapping module is specifically configured to repeatedly perform the following operations until a mapped image feature is obtained:
scaling the initial image feature to a next downsampling magnification, wherein the next downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step closer to the target downsampling magnification than the downsampling magnification to which the initial image feature belongs;
if the next downsampling magnification is the target downsampling magnification, taking the scaled image feature as the mapped image feature;
and if the next downsampling magnification is not the target downsampling magnification, fusing the scaled image feature with the initial image feature at the next downsampling magnification to obtain a new initial image feature.
6. The apparatus of claim 5, wherein the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next downsampling magnification and with the initial image feature of the previous downsampling magnification scaled to the next downsampling magnification, to obtain a new initial image feature, wherein the previous downsampling magnification is, when the plurality of different downsampling magnifications are ordered by size, the downsampling magnification one step farther from the target downsampling magnification than the downsampling magnification to which the initial image feature belongs.
7. The apparatus according to claim 5, wherein the inverse transfer module is specifically configured to scale the fused image features at the target downsampling magnification to each of the other downsampling magnifications;
and, for each of the other downsampling magnifications, fuse the image features scaled to that downsampling magnification with the initial image features at that downsampling magnification, to obtain the fused image features at that downsampling magnification.
8. The apparatus according to claim 5, wherein the feature mapping module is specifically configured to, for each initial image feature at a downsampling magnification other than the target downsampling magnification, scale the initial image feature to the target downsampling magnification in a single scaling step to obtain the mapped image feature.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor, configured to carry out the method steps of any one of claims 1-4 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-4.
CN201911056322.7A 2019-10-31 2019-10-31 Target detection method and device and electronic equipment Active CN111767935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111767935A CN111767935A (en) 2020-10-13
CN111767935B true CN111767935B (en) 2023-09-05

Family

ID=72718411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911056322.7A Active CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111767935B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229455B (en) * 2017-02-23 2020-10-16 北京市商汤科技开发有限公司 Object detection method, neural network training method and device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Weakly supervised semantic segmentation algorithm based on multi-model ensemble; Xiong Changzhen et al.; Journal of Computer-Aided Design & Computer Graphics, Vol. 31, No. 5, pp. 800-807 *

Also Published As

Publication number Publication date
CN111767935A (en) 2020-10-13

Similar Documents

Publication Publication Date Title
JP6902611B2 (en) Object detection methods, neural network training methods, equipment and electronics
CN110544214A (en) Image restoration method and device and electronic equipment
CN112668480B (en) Head attitude angle detection method and device, electronic equipment and storage medium
CN110083475B (en) Abnormal data detection method and device
CN109753641B (en) Method and device for changing object position, electronic equipment and storage medium
CN109165657A (en) A kind of image feature detection method and device based on improvement SIFT
CN109615171B (en) Feature threshold determining method and device and problem object determining method and device
CN108648156B (en) Method and device for marking stray points in point cloud data, electronic equipment and storage medium
CN112183627A (en) Method for generating predicted density map network and vehicle annual inspection mark number detection method
CN117197781B (en) Traffic sign recognition method and device, storage medium and electronic equipment
CN111767935B (en) Target detection method and device and electronic equipment
CN108280135B (en) Method and device for realizing visualization of data structure and electronic equipment
US11164056B2 (en) Method and system for applying barcode, and server
CN111079061B (en) Data processing method and electronic equipment
CN111767934B (en) Image recognition method and device and electronic equipment
CN108664550B (en) Funnel analysis method and device for user behavior data
CN108959415B (en) Abnormal dimension positioning method and device and electronic equipment
CN111583166A (en) Image fusion network model construction and training method and device
CN112016571B (en) Feature extraction method and device based on attention mechanism and electronic equipment
CN115689061A (en) Wind power ultra-short term power prediction method and related equipment
CN116204838A (en) Abnormal service identification method and device, storage medium and electronic equipment
CN114549945A (en) Remote sensing image change detection method and related device
CN114882024B (en) Target object defect detection method and device, electronic equipment and storage medium
CN112330711B (en) Model generation method, information extraction device and electronic equipment
CN111461352B (en) Model training method, service node identification device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant