CN111767935A - Target detection method and device and electronic equipment - Google Patents


Info

Publication number: CN111767935A (granted as CN111767935B)
Authority: CN (China)
Application number: CN201911056322.7A
Other languages: Chinese (zh)
Inventors: 张凯, 谭文明, 李哲暘, 石大虎
Original and current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority claimed from application CN201911056322.7A
Legal status: Granted; active
(Legal status, assignee list, and priority date are listed by Google Patents as assumptions, not legal conclusions; Google has not performed a legal analysis and makes no representation as to their accuracy.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a target detection method, a target detection device, and an electronic device. The method comprises the following steps: acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications; for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature; fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification; scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications; and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications. The accuracy of target detection can thereby be improved.

Description

Target detection method and device and electronic equipment
Technical Field
The present invention relates to the field of machine vision, and in particular, to a target detection method, device and electronic device.
Background
Based on machine vision technology, the computer can automatically recognize the target existing in the image (hereinafter referred to as target detection), and perform corresponding processing for the target. For example, in video surveillance, a computer may identify and monitor people in an image.
In the related art, image features of an image may be extracted through a plurality of successive down-sampling operations, and feature regression may be performed on the extracted features to determine the position of each target in the image as the detection result. However, if down-sampling is performed many times, a large amount of texture information is lost during the down-sampling; if it is performed few times, the obtained image features are shallow and contain little semantic information. The image features extracted in the related art therefore struggle to accurately express the image; that is, the image features are not accurate enough, and the accuracy of a detection result determined from such inaccurate image features is correspondingly low.
Disclosure of Invention
The embodiment of the invention aims to provide a target detection method, a target detection device and electronic equipment, so as to improve the accuracy of a target detection result. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a target detection method, the method comprising:
acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
In a possible embodiment, the scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature includes:
repeatedly executing the following operations until the mapping image characteristics are obtained:
scaling the initial image feature to the next down-sampling magnification, wherein, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification;
if the next down-sampling magnification is the target down-sampling magnification, the zoomed image feature is used as the mapping image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
In a possible embodiment, the fusing the scaled image features with the initial image features at the next down-sampling magnification to obtain new initial image features includes:
and fusing the scaled image features with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features, wherein, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image features belong, in the direction away from the target down-sampling magnification.
In a possible embodiment, the scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications includes:
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
In a possible embodiment, the scaling, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, the initial image feature to the target down-sampling magnification to obtain a mapped image feature includes:
and scaling the initial image features to the target down-sampling magnification by one-time scaling aiming at each initial image feature under other down-sampling magnifications except the target down-sampling magnification to obtain the mapping image features.
In a second aspect of the present invention, there is provided an object detection apparatus, the apparatus comprising:
the feature extraction module is used for acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
the feature mapping module is used for scaling, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, the initial image feature to the target down-sampling magnification to obtain mapped image features;
the feature fusion module is used for fusing the initial image features and the mapped image features at the target down-sampling magnification to obtain fused image features at the target down-sampling magnification;
the reverse transfer module is used for scaling the fused image features at the target down-sampling magnification to each of the other down-sampling magnifications to obtain fused image features at each of the other down-sampling magnifications;
and the feature regression module is used for performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image feature is obtained:
scaling the initial image feature to the next down-sampling magnification, wherein, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification;
if the next down-sampling magnification is the target down-sampling magnification, the zoomed image feature is used as the mapping image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
In a possible embodiment, the feature mapping module is specifically configured to fuse the scaled image features with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features, wherein, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image features belong, in the direction away from the target down-sampling magnification.
In a possible embodiment, the feature fusion module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification by one-time scaling to obtain a mapped image feature.
In a third aspect of the present invention, there is provided an electronic device comprising:
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, having stored therein a computer program which, when executed by a processor, performs the method steps of any of the above-mentioned first aspects.
According to the target detection method, target detection device, and electronic device provided by the embodiments of the invention, the initial image features at a plurality of down-sampling magnifications can be scaled to the same target down-sampling magnification and fused to obtain a fused image feature with rich semantic and texture information, and the fused image feature is then scaled to the other down-sampling magnifications, so that the fused image features at each down-sampling magnification better express the features of the image to be processed, and a detection result determined from these fused image features is more accurate. Of course, not all of the advantages described above need to be achieved at the same time by any one product or method embodying the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
fig. 2a is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2b is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
fig. 2c is a schematic structural diagram of a feature fusion network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention, which may include:
s101, acquiring a plurality of initial image characteristics of an image to be processed under a plurality of down-sampling magnifications.
The plurality of down-sampling magnifications may differ across application scenarios; for example, they may be 8 times, 16 times, 32 times, and 64 times down-sampling. Since the principle of target detection is the same regardless of the particular magnifications, the following description takes 8, 16, 32, and 64 times down-sampling as an example; other cases follow the same principle and are not repeated.
The initial image features of the image to be processed at the plurality of down-sampling magnifications can be obtained using a feature extraction network with multiple convolution layers. For example, the initial image feature at 16 times down-sampling may be obtained by 2-times down-sampling of the initial image feature at 8 times down-sampling, and the 2-times down-sampling may be implemented as a convolution with stride 2.
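As an illustrative sketch only (not the patent's implementation), the pyramid of initial features can be built by repeated 2-times down-sampling of the previous level; a stride-2 average pooling stands in here for the stride-2 convolution mentioned above, and the names `downsample_2x` and `extract_pyramid` are hypothetical:

```python
import numpy as np

def downsample_2x(feat):
    # 2x down-sampling stand-in: average pooling with stride 2.
    # The patent realizes this with a stride-2 convolution in a learned network.
    h, w, c = feat.shape
    feat = feat[:h - h % 2, :w - w % 2]
    return feat.reshape(feat.shape[0] // 2, 2, feat.shape[1] // 2, 2, c).mean(axis=(1, 3))

def extract_pyramid(image, rates=(8, 16, 32, 64)):
    # Return {magnification: feature}; each level is obtained by further
    # 2x down-sampling of the previous level, mirroring how the feature at
    # 16x is obtained from the feature at 8x.
    feat, rate, pyramid = image, 1, {}
    for target in rates:
        while rate < target:
            feat = downsample_2x(feat)
            rate *= 2
        pyramid[rate] = feat
    return pyramid

img = np.random.rand(64, 64, 3)
pyr = extract_pyramid(img)
```

For a 64 x 64 input, the levels come out at 8 x 8, 4 x 4, 2 x 2, and 1 x 1 spatial size respectively.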
S102, aiming at each initial image feature under other down-sampling magnifications except the target down-sampling magnification, the initial image feature is zoomed to the target down-sampling magnification to obtain the mapping image feature.
The target down-sampling magnification is one of the plurality of down-sampling magnifications and is set according to actual needs or user experience. For example, it may be the smallest of the plurality of down-sampling magnifications, or the largest, or neither the smallest nor the largest; this embodiment does not limit it.
For any other down-sampling magnification, if the other down-sampling magnification is greater than the target down-sampling magnification, the initial image features of the other down-sampling magnification need to be up-sampled to scale the initial image features of the other down-sampling magnification to the target down-sampling magnification. If the other down-sampling magnification is smaller than the target down-sampling magnification, the initial image features of the other down-sampling magnification need to be down-sampled so as to scale the initial image features of the other down-sampling magnification to the target down-sampling magnification.
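The choice of scaling direction described above can be sketched as follows; nearest-neighbour repetition and average pooling are stand-ins for the learned up- and down-sampling layers, and all names are illustrative:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for a learned up-sampling layer.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def downsample_2x(feat):
    # 2x average pooling; stand-in for a stride-2 convolution.
    h, w, c = feat.shape
    return feat[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def scale_to(feat, from_rate, to_rate):
    # A magnification greater than the target means the feature map is smaller
    # than the target's, so it must be up-sampled; a smaller magnification
    # means it must be down-sampled (cf. S102).
    while from_rate > to_rate:
        feat = upsample_2x(feat)
        from_rate //= 2
    while from_rate < to_rate:
        feat = downsample_2x(feat)
        from_rate *= 2
    return feat

feat_32x = np.ones((2, 2, 3))          # feature of a 64-pixel-wide image at 32x
mapped = scale_to(feat_32x, from_rate=32, to_rate=8)
```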
S103, fusing the initial image features and the mapping image features under the target down-sampling magnification to obtain fused image features under the target down-sampling magnification.
The initial image feature at the target down-sampling magnification and each mapped image feature can each be regarded as a matrix, so the fused image feature is obtained by adding the elements at corresponding positions through an element-wise addition operation.
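A minimal sketch of this element-wise fusion, treating each aligned feature map as a numpy array (the name `fuse` is illustrative):

```python
import numpy as np

def fuse(features):
    # Element-wise fusion: all inputs must already be at the same
    # down-sampling magnification; elements at corresponding positions
    # are summed, as described for S103.
    return np.sum(np.stack(features, axis=0), axis=0)

a = np.ones((4, 4, 3))
b = 2 * np.ones((4, 4, 3))
fused = fuse([a, b])   # every element is 1 + 2
```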
And S104, scaling the fused image features under the target down-sampling magnification to each other down-sampling magnification to obtain the fused image features under each other down-sampling magnification.
It can be understood that the fused image feature at the target down-sampling magnification is obtained by scaling the initial image features at all sampling magnifications to the uniform target down-sampling magnification and fusing them. Because the inputs come from both higher down-sampling magnifications, whose features are deeper and semantically richer, and lower down-sampling magnifications, whose features retain more texture, the fused image feature carries rich semantic and texture information at the same time. The fused image features at the other down-sampling magnifications, obtained by scaling from the fused image feature at the target down-sampling magnification, likewise carry richer semantic and texture features.
In one possible embodiment, the fused image feature at the target down-sampling magnification may be scaled to each of other down-sampling magnifications, and for each of the other down-sampling magnifications, the image feature scaled to the other down-sampling magnification is fused with the initial image feature at the other down-sampling magnification to obtain the fused image feature at the other down-sampling magnification.
For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. The fused image feature obtained at 8 times down-sampling can be scaled to 16, 32, and 64 times down-sampling respectively. The image feature scaled to 16 times down-sampling is fused with the initial image feature at 16 times down-sampling to obtain the fused image feature at 16 times down-sampling; the image feature scaled to 32 times down-sampling is fused with the initial image feature at 32 times down-sampling to obtain the fused image feature at 32 times down-sampling; and the image feature scaled to 64 times down-sampling is fused with the initial image feature at 64 times down-sampling to obtain the fused image feature at 64 times down-sampling.
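The worked example of S104 above can be sketched as follows, under the assumption that every other magnification is coarser than the 8-times target, so only repeated 2-times down-sampling is needed; the pooling is a stand-in for a learned stride-2 convolution, and `propagate_back` is a hypothetical name:

```python
import numpy as np

def downsample_2x(feat):
    # 2x average pooling; stand-in for a stride-2 convolution.
    h, w, c = feat.shape
    return feat[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def propagate_back(fused_target, initial, target=8, rates=(8, 16, 32, 64)):
    # Scale the fused feature at the target magnification to every other
    # magnification and add it to that magnification's initial feature.
    fused = {target: fused_target}
    for rate in rates:
        if rate == target:
            continue
        f, r = fused_target, target
        while r < rate:                   # down-sample toward the coarser level
            f = downsample_2x(f)
            r *= 2
        fused[rate] = initial[rate] + f   # element-wise fusion with the initial feature
    return fused

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (16, 32, 64)}
fused_8x = 5 * np.ones((8, 8, 1))
fused_all = propagate_back(fused_8x, initial)
```

With constant inputs, averaging preserves the value 5, so every non-target level ends up holding 5 + 1 at the matching spatial size.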
And S105, performing target detection on the image to be processed based on the fusion image characteristics of the image to be processed under the multiple down-sampling magnifications.
By performing feature regression on the fused image features at the plurality of down-sampling magnifications, the targets present in the image to be processed and their positions can be determined. The position of a target may be represented as a bounding box placed at the target's location in the image to be processed.
By adopting this embodiment, the initial image features at a plurality of down-sampling magnifications can be scaled to the same target down-sampling magnification and fused to obtain a fused image feature with rich semantic and texture information, and the fused image feature is then scaled to the other down-sampling magnifications, so that the fused image features at each down-sampling magnification better express the features of the image to be processed, and a detection result determined from these fused image features is more accurate.
The manner in which the initial image features at the other down-sampling magnifications are scaled to the target down-sampling magnification is described below. In one possible embodiment, each initial image feature at a down-sampling magnification other than the target down-sampling magnification may be scaled to the target down-sampling magnification in a single scaling step to obtain the mapped image feature. In other possible embodiments, each such initial image feature may be scaled to the target down-sampling magnification through multiple scaling steps.
For example, in one possible embodiment, the following operations may be repeatedly performed until the mapped image features are obtained:
scaling the initial image feature to a next down-sampling magnification. And if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image characteristics as the mapping image characteristics. And if the next down-sampling magnification is not the target down-sampling magnification, fusing the zoomed image features with the initial image features under the next down-sampling magnification to obtain new initial image features.
Here, when the plurality of different down-sampling magnifications are sorted by size, the next down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction of the target down-sampling magnification. For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. Then the next down-sampling magnification of 16 times down-sampling is 8 times down-sampling, the next down-sampling magnification of 32 times down-sampling is 16 times down-sampling, and the next down-sampling magnification of 64 times down-sampling is 32 times down-sampling.
For the initial image feature at 16 times down-sampling, the initial image feature may be scaled to 8 times down-sampling to obtain the mapped image feature.
For the initial image feature under 32 times of downsampling, the initial image feature may be scaled to 16 times of downsampling and fused with the initial image feature under 16 times of downsampling to obtain a new initial image feature. The new initial image feature is the initial image feature at 16 times down-sampling, so reference can be made to the previous description of the initial image feature at 16 times down-sampling.
For the initial image feature at 64 times down-sampling, the initial image feature may be scaled to 32 times down-sampling and fused with the initial image feature at 32 times down-sampling to obtain a new initial image feature. The new initial image feature is the initial image feature at 32 times down-sampling, so the preceding description of the initial image feature at 32 times down-sampling applies.
An implementation of the above flow can be seen in the network structure shown in fig. 2a, where C3 represents the image feature at 8 times down-sampling, C4 the image feature at 16 times down-sampling, C5 the image feature at 32 times down-sampling, and C6 the image feature at 64 times down-sampling. The cells in the leftmost column of the network can be regarded as the input, namely the initial image feature at each down-sampling magnification. The horizontal arrows indicate convolution processing, for example a convolution with a 1 × 1 kernel and stride 1. The arrows pointing to the upper right indicate up-sampling. Taking the unit in the first row and second column of fig. 2a as an example, its inputs are the convolution-processed initial image feature at 8 times down-sampling and the up-sampled initial image feature at 16 times down-sampling; the unit fuses the two input image features to obtain a new initial image feature at 8 times down-sampling.
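The progressive cascade of the last few paragraphs (the fig. 2a style flow) can be sketched as follows; the 1 × 1 convolutions are omitted, 2-times nearest-neighbour up-sampling stands in for the learned up-sampling between adjacent magnifications, and `cascade_to_target` is a hypothetical name:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for the learned up-sampling.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def cascade_to_target(initial, rates=(8, 16, 32, 64), target=8):
    # Each coarse feature is moved one magnification toward the target at a
    # time, and fused (added) into the next level's initial feature before
    # moving on, so 64x flows into 32x, then into 16x, then into 8x.
    feats = dict(initial)                 # magnification -> (H, W, C) array
    for rate in sorted(rates, reverse=True):
        if rate == target:
            break
        nxt = rate // 2                   # next magnification toward the target
        feats[nxt] = feats[nxt] + upsample_2x(feats[rate])
    return feats[target]

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (8, 16, 32, 64)}
mapped = cascade_to_target(initial)
```

With all-ones inputs the running sum grows by one at each of the three fusion steps, so the result at 8 times down-sampling is uniformly 4.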
In yet another possible embodiment, the following operations may be repeatedly performed until the mapped image features are obtained:
scaling the initial image feature to the next down-sampling magnification. If the next down-sampling magnification is the target down-sampling magnification, the scaled image features are used as the mapped image features. If the next down-sampling magnification is not the target down-sampling magnification, the scaled image features are fused with the initial image features at the next down-sampling magnification and with the initial image features scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain new initial image features.
Here, when the plurality of different down-sampling magnifications are sorted by size, the previous down-sampling magnification is the down-sampling magnification adjacent to the one to which the initial image feature belongs, in the direction away from the target down-sampling magnification. For example, suppose the plurality of down-sampling magnifications are 8, 16, 32, and 64 times down-sampling, and the target down-sampling magnification is 8 times down-sampling. Then the previous down-sampling magnification of 8 times down-sampling is 16 times down-sampling, the previous down-sampling magnification of 16 times down-sampling is 32 times down-sampling, and the previous down-sampling magnification of 32 times down-sampling is 64 times down-sampling.
An implementation may be as shown in fig. 2b, where the upward arrows represent up-sampling. Taking the unit in the first row and second column of fig. 2b as an example, its inputs are the convolution-processed initial image feature at 8 times down-sampling, the up-sampled initial image feature at 16 times down-sampling, and the initial image feature at 32 times down-sampling scaled through 16 times down-sampling to 8 times down-sampling; the unit fuses the three input image features to obtain a new initial image feature at 8 times down-sampling.
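A sketch of the fig. 2b variant, under my reading of the description above: in addition to the running cascade, each level also receives the raw initial feature from the previous (coarser) magnification, up-sampled twice. Convolutions are again omitted, and all names are illustrative:

```python
import numpy as np

def upsample_2x(feat):
    # Nearest-neighbour 2x up-sampling; stand-in for the learned up-sampling.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def cascade_dense(initial, rates=(8, 16, 32, 64), target=8):
    # Like the fig. 2a cascade, but each level additionally fuses the raw
    # initial feature from two levels up (the "previous" magnification),
    # up-sampled through the intermediate level.
    feats = dict(initial)
    order = sorted(rates, reverse=True)          # 64, 32, 16, 8
    for i, rate in enumerate(order):
        if rate == target:
            break
        nxt = rate // 2
        new = feats[nxt] + upsample_2x(feats[rate])
        if i > 0:                                # skip input from the previous magnification
            prev = order[i - 1]                  # e.g. 64x while processing 32x
            new = new + upsample_2x(upsample_2x(initial[prev]))
        feats[nxt] = new
    return feats[target]

initial = {r: np.ones((64 // r, 64 // r, 1)) for r in (8, 16, 32, 64)}
mapped = cascade_dense(initial)
```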
For the embodiment in which each initial image feature at a down-sampling magnification other than the target down-sampling magnification is scaled to the target down-sampling magnification in a single scaling step to obtain the mapped image feature, see fig. 2c. For the notation in fig. 2c, refer to the descriptions of fig. 2a and fig. 2b, which are not repeated here.
It can be understood that, compared with the structures shown in fig. 2a and 2b, the structure shown in fig. 2c is simpler and can effectively reduce the computation required for target detection, while the structures shown in fig. 2a and 2b mix the initial image features at different down-sampling magnifications more thoroughly, so the resulting fused image features are more accurate. Any of the structures in fig. 2a, fig. 2b, or fig. 2c may be selected according to actual requirements, and other structures may also be used to implement target detection; this embodiment does not limit the choice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention, which may include:
the feature extraction module 301 is configured to obtain multiple initial image features of an image to be processed at multiple down-sampling magnifications;
a feature mapping module 302, configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
the feature fusion module 303 is configured to fuse the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
a reverse transfer module 304, configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
And the feature regression module 305 is configured to perform target detection on the image to be processed based on the fusion image features of the image to be processed at multiple down-sampling magnifications.
In a possible embodiment, the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
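The repeated operation described above can be sketched as a loop. The helper functions, the factor-of-2 spacing between adjacent magnifications, and addition as the fusion operation are illustrative assumptions:

```python
import numpy as np

def upsample2x(f):
    return f.repeat(2, axis=-2).repeat(2, axis=-1)

def downsample2x(f):
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

def map_to_target(feats, start, target):
    """Repeatedly scale one magnification step toward the target; if the step
    does not reach the target, fuse with the initial feature at that step."""
    mags = sorted(feats)                       # e.g. [8, 16, 32]
    f, m = feats[start], start
    while m != target:
        i = mags.index(m)
        nxt = mags[i - 1] if target < m else mags[i + 1]
        f = upsample2x(f) if nxt < m else downsample2x(f)  # adjacent mags differ by 2x
        if nxt != target:
            f = f + feats[nxt]                 # new initial feature at magnification nxt
        m = nxt
    return f                                   # mapped image feature at the target

feats = {8: np.ones((4, 8, 8)), 16: np.ones((4, 4, 4)), 32: np.ones((4, 2, 2))}
mapped = map_to_target(feats, start=32, target=8)
print(mapped.shape)
```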
In a possible embodiment, the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
In a possible embodiment, the reverse transfer module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and, for each of the other down-sampling magnifications, fuse the image feature scaled to that down-sampling magnification with the initial image feature at that down-sampling magnification to obtain the fused image feature at that down-sampling magnification.
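A minimal sketch of this scale-back-and-fuse step, assuming the target is the smallest magnification, factor-of-2 spacing between adjacent magnifications, and addition as the fusion operation:

```python
import numpy as np

def downsample2x(f):
    # 2x2 average pooling: one magnification step downward
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

def reverse_transfer(fused_target, feats, target):
    """Scale the fused feature at the target magnification back to every other
    magnification and fuse it with that magnification's initial feature."""
    out = {target: fused_target}
    for mag, init in feats.items():
        if mag == target:
            continue
        f, m = fused_target, target
        while m != mag:
            f = downsample2x(f)
            m *= 2
        out[mag] = f + init      # fused image feature at magnification mag
    return out

feats = {8: np.zeros((4, 8, 8)), 16: np.zeros((4, 4, 4)), 32: np.zeros((4, 2, 2))}
fused = reverse_transfer(np.ones((4, 8, 8)), feats, target=8)
print({m: f.shape for m, f in fused.items()})
```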
In a possible embodiment, the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, which may include:
a memory 401 for storing a computer program;
the processor 402, when executing the program stored in the memory 401, implements the following steps:
acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
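Taken together, the steps performed by the processor can be sketched end to end. The magnifications, the nearest-neighbour scaling, and the additive fusion are all stand-ins, and the detection head itself is omitted:

```python
import numpy as np

def up2(f):
    return f.repeat(2, axis=-2).repeat(2, axis=-1)

def down2(f):
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    return f.reshape(*f.shape[:-2], h, 2, w, 2).mean(axis=(-3, -1))

# step 1: stand-in initial features at several down-sampling magnifications
feats = {8: np.ones((4, 8, 8)), 16: np.ones((4, 4, 4)), 32: np.ones((4, 2, 2))}
target = 8    # target down-sampling magnification (assumed the smallest)

# step 2: map every non-target feature to the target magnification
def to_target(f, m):
    while m > target:
        f, m = up2(f), m // 2
    return f

mapped = [to_target(f, m) for m, f in feats.items() if m != target]

# step 3: fuse the initial and mapped features at the target magnification
fused_t = feats[target] + sum(mapped)

# step 4: scale the fused feature back to the other magnifications and fuse
fused = {target: fused_t}
for m in feats:
    if m != target:
        f, k = fused_t, target
        while k < m:
            f, k = down2(f), k * 2
        fused[m] = f + feats[m]

# step 5: a detection head would now regress boxes/classes from every
# entry of `fused` (omitted here)
print({m: f.shape for m, f in fused.items()})
```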
In one possible embodiment, scaling the initial image feature to a target down-sampling magnification to obtain a mapped image feature comprises:
repeatedly executing the following operations until the mapping image characteristics are obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
In one possible embodiment, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature includes:
fusing the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
In one possible embodiment, scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications includes:
scaling the fused image features under the target down-sampling magnification to each other down-sampling magnification;
and for each of the other down-sampling magnifications, fusing the image feature scaled to that down-sampling magnification with the initial image feature at that down-sampling magnification to obtain the fused image feature at that down-sampling magnification.
In one possible embodiment, scaling each initial image feature at a down-sampling magnification other than the target down-sampling magnification to the target down-sampling magnification to obtain a mapped image feature includes:
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
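The one-time scaling variant (fig. 2c) reduces to a single interpolation whose factor is the ratio of the two magnifications; nearest-neighbour repetition stands in here for whatever scaling operator is actually used:

```python
import numpy as np

def map_direct(f, from_mag, to_mag):
    # single scaling straight to the target: factor = ratio of magnifications
    factor = from_mag // to_mag
    return f.repeat(factor, axis=-2).repeat(factor, axis=-1)

f32 = np.ones((4, 2, 2))                          # initial feature at 32x down-sampling
mapped = map_direct(f32, from_mag=32, to_mag=8)   # one 4x upsampling, no intermediate stops
print(mapped.shape)                               # (4, 8, 8)
```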
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute any one of the object detection methods in the above embodiments.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform any of the object detection methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method of object detection, the method comprising:
acquiring a plurality of initial image characteristics of an image to be processed under a plurality of down-sampling magnifications;
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
fusing the initial image features and the mapping image features under the target down-sampling magnification to obtain fused image features under the target down-sampling magnification;
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image features under each of the other down-sampling magnifications;
and performing target detection on the image to be processed based on the fusion image characteristics of the image to be processed under the plurality of down-sampling magnifications.
2. The method of claim 1, wherein scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature comprises:
repeatedly performing the following operations until the mapped image feature is obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
3. The method of claim 2, wherein fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature comprises:
fusing the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
4. The method of claim 1, wherein scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications comprises:
scaling the fused image features under the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
5. The method of claim 1, wherein scaling each initial image feature at a down-sampling magnification other than a target down-sampling magnification to the target down-sampling magnification to obtain a mapped image feature comprises:
for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification in a single scaling step to obtain the mapped image feature.
6. An object detection apparatus, characterized in that the apparatus comprises:
the feature extraction module is used for acquiring a plurality of initial image features of an image to be processed at a plurality of down-sampling magnifications;
the feature mapping module is used for, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scaling the initial image feature to the target down-sampling magnification to obtain a mapped image feature;
the feature fusion module is used for fusing the initial image feature and the mapped image features at the target down-sampling magnification to obtain a fused image feature at the target down-sampling magnification;
the reverse transfer module is used for scaling the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications to obtain the fused image feature at each of the other down-sampling magnifications;
and the feature regression module is used for performing target detection on the image to be processed based on the fused image features of the image to be processed at the plurality of down-sampling magnifications.
7. The apparatus of claim 6, wherein the feature mapping module is specifically configured to repeatedly perform the following operations until the mapped image features are obtained:
scaling the initial image feature to a next down-sampling magnification, wherein the next down-sampling magnification is the down-sampling magnification that is one step closer to the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size;
if the next down-sampling magnification is the target down-sampling magnification, taking the scaled image feature as the mapped image feature;
and if the next down-sampling magnification is not the target down-sampling magnification, fusing the scaled image feature with the initial image feature at the next down-sampling magnification to obtain a new initial image feature.
8. The apparatus according to claim 7, wherein the feature mapping module is specifically configured to fuse the scaled image feature with the initial image feature at the next down-sampling magnification and with the initial image feature scaled from the previous down-sampling magnification to the next down-sampling magnification, to obtain a new initial image feature, wherein the previous down-sampling magnification is the down-sampling magnification that is one step farther from the target down-sampling magnification than the down-sampling magnification to which the initial image feature belongs, when the plurality of different down-sampling magnifications are sorted by size.
9. The apparatus according to claim 6, wherein the reverse transfer module is specifically configured to scale the fused image feature at the target down-sampling magnification to each of the other down-sampling magnifications;
and for each of the other down-sampling magnifications, fusing the image features scaled to the other down-sampling magnifications with the initial image features of the other down-sampling magnifications to obtain fused image features under the other down-sampling magnifications.
10. The apparatus according to claim 6, wherein the feature mapping module is specifically configured to, for each initial image feature at a down-sampling magnification other than the target down-sampling magnification, scale the initial image feature to the target down-sampling magnification by one scaling to obtain a mapped image feature.
11. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.
CN201911056322.7A 2019-10-31 2019-10-31 Target detection method and device and electronic equipment Active CN111767935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911056322.7A CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111767935A true CN111767935A (en) 2020-10-13
CN111767935B CN111767935B (en) 2023-09-05

Family

ID=72718411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911056322.7A Active CN111767935B (en) 2019-10-31 2019-10-31 Target detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111767935B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
US20190156144A1 (en) * 2017-02-23 2019-05-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190156144A1 (en) * 2017-02-23 2019-05-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
WO2019196718A1 (en) * 2018-04-10 2019-10-17 阿里巴巴集团控股有限公司 Element image generation method, device and system
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109948524A (en) * 2019-03-18 2019-06-28 北京航空航天大学 A kind of vehicular traffic density estimation method based on space base monitoring
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAO YANG, ET AL: "Semantic segmentation via highly fused convolutional network with multiple soft cost functions", COGNITIVE SYSTEMS RESEARCH, vol. 53, pages 20 - 30, XP085543583, DOI: 10.1016/j.cogsys.2018.04.004 *
XIONG CHANGZHEN, ET AL: "Weakly supervised semantic segmentation algorithm based on multi-model ensemble", JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS, vol. 31, no. 5, pages 800 - 807 *

Also Published As

Publication number Publication date
CN111767935B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
JP6902611B2 (en) Object detection methods, neural network training methods, equipment and electronics
CN110544214A (en) Image restoration method and device and electronic equipment
US20200175062A1 (en) Image retrieval method and apparatus, and electronic device
CN107885796B (en) Information recommendation method, device and equipment
CN106649681B (en) Data processing method, device and equipment
CN110298858B (en) Image clipping method and device
CN107609437A (en) A kind of targeted graphical code recognition methods and device
US20190114711A1 (en) Financial analysis system and method for unstructured text data
CN107066519A (en) A kind of task detection method and device
CN108694574B (en) Resource transfer channel processing method, device and equipment
WO2020173136A1 (en) Method and apparatus for monitoring application system, device, and storage medium
CN112560957A (en) Neural network training and detecting method, device and equipment
CN111145202B (en) Model generation method, image processing method, device, equipment and storage medium
CN109189677B (en) Test method and device for updating state of variable value
CN111835536A (en) Flow prediction method and device
US11164056B2 (en) Method and system for applying barcode, and server
CN111767935B (en) Target detection method and device and electronic equipment
CN115689061B (en) Wind power ultra-short term power prediction method and related equipment
CN111767934B (en) Image recognition method and device and electronic equipment
CN115661564A (en) Training method and device of image processing model, electronic equipment and storage medium
CN111551499B (en) Method and device for measuring sugar content of fruit, computer equipment and storage medium
CN114066544A (en) Method, device and storage medium for showing product system architecture
US9536013B2 (en) Method and apparatus for web browsing of handheld device
CN110032498B (en) Prediction method for user APP behaviors
CN115424082B (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant