CN113221925B - Target detection method and device based on multi-scale image - Google Patents


Info

Publication number
CN113221925B
Authority
CN
China
Prior art keywords
image
feature map
resolution
target detection
image reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110679907.5A
Other languages
Chinese (zh)
Other versions
CN113221925A (en
Inventor
单纯
王曦
宫英慧
周彦哲
李金泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110679907.5A priority Critical patent/CN113221925B/en
Publication of CN113221925A publication Critical patent/CN113221925A/en
Application granted granted Critical
Publication of CN113221925B publication Critical patent/CN113221925B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformation in the plane of the image
    • G06T 3/40 - Scaling the whole image or part thereof
    • G06T 3/4053 - Super resolution, i.e. output image resolution higher than sensor resolution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Abstract

The invention provides a target detection method and device based on multi-scale images. The method comprises the steps of: inputting an original image to obtain candidate regions; acquiring an original feature map of each candidate region; comparing the original feature map of the candidate region against a preset resolution and, when it is below the preset resolution, inputting it into an image reconstruction network model for image enhancement; and inputting the enhanced image features together with the original feature map of the candidate region into YOLOv3 for target detection and classification. In the scheme of the invention, the output of a trained image reconstruction network is used to strengthen the target detection network's performance on low-resolution images, the detection of small targets is emphasized, and the detection effect is good.

Description

Target detection method and device based on multi-scale image
Technical Field
The invention relates to the technical field of computer vision, in particular to a target detection method and device based on multi-scale images.
Background
Matching and detection of image targets has always been an important problem in the field of computer vision. Target detection technology has a wide range of applications, so developing effective, accurate and widely applicable detection algorithms is particularly important. In the process of target detection, four classical errors are usually encountered: (a) classification errors, where the class is misidentified; (b) localization errors, where only part of the object body is localized; (c) recognition errors caused by occlusion; and (d) small-target errors, where the area occupied by the target is too small for its features to be recognized effectively, leading to misclassification.
In recent years, deep-learning-based image target detection algorithms have achieved breakthrough progress: detection with convolutional neural networks has greatly improved accuracy. For the above errors, many excellent algorithms have been proposed to optimize target detection and improve its accuracy and speed. Improvements to target detection algorithms mainly fall into the following categories: (1) improvements to the model infrastructure, i.e. the structure of the deep network, such as deepening the backbone network; (2) improvements to the features, where adding contextual information and multi-scale information to the features is currently a popular approach that improves the detection of small targets; and (3) improvements to data augmentation, which is the simplest and most effective way to improve model robustness and reduce overfitting. In addition, target detection algorithms are mainly improved at three stages: (1) the image processing stage, (2) the detection stage, and (3) the classification stage.
According to their structure, deep-learning-based target detection algorithms fall into two main categories: regression-based detection algorithms and region-proposal-based detection algorithms. Regression-based algorithms, such as YOLO, SSD, RetinaNet and RefineDet, obtain results through a single pass of regression and multi-class classification on features extracted by a backbone network. Region-proposal-based algorithms, such as R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, R-FCN and FPN, detect in two stages: the first stage performs coarse regression and classification on initial anchor boxes over the features extracted from the image to obtain proposal boxes; the second stage performs further regression and classification on these proposals to obtain the results. All network outputs then undergo post-processing operations such as non-maximum suppression and boundary clipping, and finally the resulting detection boxes are drawn on the original image to complete the detection.
However, for target scale variation both categories of algorithms rely entirely on varying the scales of the anchors, and therefore cannot handle scale variation in target detection well, especially the detection of small targets.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a target detection method and device based on multi-scale images, addressing the technical problem that target detection in the prior art cannot handle scale variation well, especially the detection of small targets.
According to a first aspect of the present invention, there is provided a method for multi-scale image-based object detection, the method comprising the steps of:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement;
step S104: inputting the image features after image enhancement and the original feature map of the candidate region into YOLOv3 for target detection and classification.
According to a second aspect of the present invention, there is provided a multi-scale image-based object detection apparatus, the apparatus comprising:
a candidate region acquisition module: for inputting an original image to obtain a candidate region;
an original feature map acquisition module: for obtaining an original feature map of the candidate region;
an image enhancement module: for comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement;
a target detection module: for inputting the image features after image enhancement and the original feature map of the candidate region into YOLOv3 for target detection and classification.
According to a third aspect of the present invention, there is provided a multi-scale image-based object detection system, comprising:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the instructions are stored by the memory, and loaded and executed by the processor to perform the multi-scale image-based object detection method as described above.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having a plurality of instructions stored therein; the instructions are used for loading and executing the multi-scale image-based target detection method by the processor.
According to the scheme of the invention, improved algorithms are proposed from the perspective of multi-scale features: multi-scale feature expression is realized through methods such as feature fusion and feature enhancement, and a new end-to-end network structure is proposed for this purpose: objects in low-resolution images are detected through the cooperative learning of two deep neural networks, an image reconstruction network (IRN) and a target detection network. First, the target detection network is trained; second, the image reconstruction network, assisted by the target detection network, enhances low-resolution images into high-resolution images; finally, the output of the trained image reconstruction network is used to strengthen the target detection performance of the target detection network on low-resolution images. The scheme of the invention emphasizes the detection of small targets, and low-resolution pictures can be detected by means of the IRN.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to make the technical solutions of the present invention practical in accordance with the contents of the specification, the following detailed description is given of preferred embodiments of the present invention with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of a multi-scale image-based target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall structure of a multi-scale image-based target detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an overall structure of an image reconstruction network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of image reconstruction for an image reconstruction network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an upsampling process according to an embodiment of the present invention;
FIG. 6 is a schematic view of a downsampling according to one embodiment of the present invention;
fig. 7 is a block diagram of a multi-scale image-based object detection apparatus according to an embodiment of the present invention.
Detailed Description
First, a flow of a multi-scale image-based target detection method according to an embodiment of the present invention is described with reference to fig. 1. As shown in fig. 1-2, the method comprises the steps of:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement;
step S104: inputting the image features after image enhancement and the original feature map of the candidate region into YOLOv3 for target detection and classification.
The step S101: an original image is input to obtain a candidate region; in the present embodiment, the candidate region is obtained by a region proposal network (RPN).
The step S102: obtaining the original feature map of the candidate region; in this embodiment, the original feature map of the candidate region is obtained by RoI pooling (RoIPooling).
The step S103: comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement, wherein:
original feature maps of candidate regions at or above the preset resolution are left unprocessed.
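Steps S101 to S103 amount to routing candidate-region feature maps by resolution; the routing can be sketched as follows (a minimal sketch: the threshold value, array shapes and function names are illustrative assumptions, not values from the patent):

```python
import numpy as np

PRESET_RESOLUTION = (32, 32)  # assumed (H, W) threshold; the patent leaves it unspecified

def route_feature_maps(feature_maps):
    """Split candidate-region feature maps into those sent to the image
    reconstruction network (below the preset resolution) and those passed
    through unchanged to the detector."""
    to_enhance, passthrough = [], []
    for fm in feature_maps:
        h, w = fm.shape[-2], fm.shape[-1]
        if h < PRESET_RESOLUTION[0] or w < PRESET_RESOLUTION[1]:
            to_enhance.append(fm)    # goes to the image reconstruction network
        else:
            passthrough.append(fm)   # kept as-is for YOLOv3
    return to_enhance, passthrough

low = np.zeros((3, 16, 16))   # below the threshold -> enhance
high = np.zeros((3, 64, 64))  # at/above the threshold -> pass through
enh, keep = route_feature_maps([low, high])
```

Both lists are then fed to the detector in step S104, the first after enhancement by the IRN.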
In this embodiment, as shown in figs. 3 to 4, the image reconstruction network model comprises an image reconstruction network (IRN) and a target detection network. The input of the image reconstruction network is an image whose resolution is lower than the preset resolution, its output is a reconstructed image RLR, and the pixel size of the reconstructed image RLR is the same as that of the feature map HR obtained through the up-sampling operation of the image reconstruction network. With the reconstructed image RLR as the input of the target detection network, a loss is calculated based on the reconstructed image RLR and the feature map HR, and the parameters of the image reconstruction network are adjusted accordingly.
The image reconstruction network comprises a plurality of convolution layers and a plurality of branches of different levels. An input original feature map below the preset resolution passes through the convolution operations of the plurality of convolution layers, and the resulting feature vector is input into the lowest-level branch. Each branch comprises a plurality of sampling blocks, each comprising an up-sampling block and a down-sampling block; through the sampling blocks, the transmitted features are enhanced at a certain ratio during the forward propagation of each branch. For each branch of the plurality of branches: each sampling block transmits the branch's up-sampled features to the corresponding sampling block in the branches of higher level than itself, and transmits the branch's down-sampled features to the corresponding sampling block in the branches of lower level than itself.
In this embodiment, the upsampling operation of the upsampling block and the downsampling operation of the downsampling block may be performed concurrently.
The image reconstruction network adopting the structure has the advantages that:
(1) The overall architecture starts from the low-resolution feature map as the first stage and gradually adds low-to-high-resolution operations to form more stages, connecting subnetworks of different resolutions in parallel.
(2) Multi-scale fusion is performed: the high-resolution representation is boosted with the help of low-resolution representations of the same depth and similar level, i.e. each subnetwork repeatedly receives information from the other parallel subnetworks.
For 4x upscaling, a total of three branches are provided in this embodiment, and the size of the feature map remains unchanged during the forward propagation of each branch. The three branches differ, but information is exchanged between them. For example, in the forward pass, the lowest branch in the figure, branch 1, expands its feature map through an up-sampling block comprising three units (as shown in fig. 5) and passes it to branches 2 and 3, while branch 2 also passes its feature map through a down-sampling block (as shown in fig. 6) and sends the reduced feature map to branch 1. In this embodiment, the up-sampling operations of the up-sampling blocks and the down-sampling operations of the down-sampling blocks may be performed at the same stage.
In this embodiment, feature maps below the threshold are input into the network for image reconstruction: although three different branches are provided, each branch performs up-sampling and down-sampling in parallel to obtain target features at different pixel scales, while convolution provides automatic denoising and feature enhancement; the communication between branches takes the form of feature fusion, finally yielding a single fused feature map. The loss is calculated using the obtained enhanced low-resolution image (RLR) and the target detection result of the up-sampled feature map (HR).
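A minimal sketch of this repeated cross-branch exchange, with nearest-neighbour resizing and average pooling standing in for the learned up- and down-sampling blocks (the branch count, the 2x scale ratio between adjacent branches and all names are illustrative assumptions):

```python
import numpy as np

def up2(x):
    # nearest-neighbour 2x upsampling (placeholder for an up-sampling block)
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def down2(x):
    # 2x2 average pooling (placeholder for a down-sampling block)
    h, w = x.shape[-2] // 2, x.shape[-1] // 2
    return x[..., :2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(-3, -1))

def exchange(branches):
    """One fusion step: every branch receives resized features from every
    other branch, so each subnetwork repeatedly accepts information from
    its parallel subnetworks."""
    fused = []
    for i, b in enumerate(branches):
        acc = b.copy()
        for j, other in enumerate(branches):
            if j == i:
                continue
            x = other
            while x.shape[-1] < b.shape[-1]:   # lower-level branch -> upsample
                x = up2(x)
            while x.shape[-1] > b.shape[-1]:   # higher-level branch -> downsample
                x = down2(x)
            acc = acc + x                      # feature fusion by pixel-wise sum
        fused.append(acc)
    return fused

b1, b2, b3 = np.ones((8, 8)), np.ones((16, 16)), np.ones((32, 32))
f1, f2, f3 = exchange([b1, b2, b3])  # each branch keeps its own resolution
```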
As shown in fig. 5, the up-sampling block consists of a first sub-pixel convolution unit, a first convolution unit and a second sub-pixel convolution unit. The low-resolution feature map L0 passes through the sub-pixel convolution of the first sub-pixel convolution unit to generate a high-resolution feature map H0; the high-resolution feature map H0 is converted back into a low-resolution feature map L1 by the convolution operation of the first convolution unit; L1 is subtracted from L0 pixel by pixel to obtain the difference between the low-resolution feature maps; the low-resolution feature map L1 passes through the sub-pixel convolution of the second sub-pixel convolution unit to generate a high-resolution feature map H1; and the two high-resolution feature maps H0 and H1 are added pixel by pixel to output the high-resolution feature map HR.
As shown in fig. 6, the down-sampling block consists of a first convolution unit, a first sub-pixel convolution unit and a second convolution unit. The high-resolution feature map H0' is convolved by the first convolution unit to generate a low-resolution feature map L0'; the low-resolution feature map L0' is converted back into a high-resolution feature map H1' by the sub-pixel convolution of the first sub-pixel convolution unit; H0' and H1' are added pixel by pixel for fusion; the high-resolution feature map H1' generates a low-resolution feature map L1' through the convolution of the second convolution unit; and the two low-resolution feature maps L0' and L1' are subtracted pixel by pixel to output the low-resolution feature map LR.
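The dataflow of the two sampling blocks can be sketched in NumPy, with parameter-free pixel shuffles standing in for the learned sub-pixel convolution and convolution units (an illustrative assumption: with these placeholders the subtraction branches are exactly zero, whereas learned units would yield non-trivial residuals):

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c, h, w = x.shape
    C = c // (r * r)
    return x.reshape(C, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(C, h * r, w * r)

def pixel_unshuffle(x, r=2):
    """Inverse rearrangement: (C, H*r, W*r) -> (C*r*r, H, W)."""
    C, H, W = x.shape
    h, w = H // r, W // r
    return x.reshape(C, h, r, w, r).transpose(0, 2, 4, 1, 3).reshape(C * r * r, h, w)

def up_block(L0, r=2):
    """Up-sampling block of fig. 5, with placeholder units."""
    H0 = pixel_shuffle(L0, r)     # first sub-pixel convolution unit
    L1 = pixel_unshuffle(H0, r)   # first convolution unit, back to low resolution
    diff = L0 - L1                # pixel-wise LR difference described in fig. 5
    H1 = pixel_shuffle(L1, r)     # second sub-pixel convolution unit
    return H0 + H1, diff          # pixel-wise sum -> HR

def down_block(H0, r=2):
    """Down-sampling block of fig. 6, with placeholder units."""
    L0 = pixel_unshuffle(H0, r)   # first convolution unit
    H1 = pixel_shuffle(L0, r)     # first sub-pixel convolution unit
    fused = H0 + H1               # pixel-wise HR fusion described in fig. 6
    L1 = pixel_unshuffle(H1, r)   # second convolution unit
    return L0 - L1, fused         # pixel-wise difference -> LR

lr = np.arange(64, dtype=float).reshape(4, 4, 4)
hr, lr_diff = up_block(lr)        # hr: (1, 8, 8)
out_lr, hr_fused = down_block(hr) # out_lr: (4, 4, 4)
```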
The loss function of the Image Reconstruction Network (IRN) is:
the task of IRN is to reconstruct high resolution images from low resolution images, and it is important to design an appropriate loss function in order to obtain the desired enhancement effect. Since our ultimate goal is to improve the accuracy of object detection, we wish to focus on the information related to the object to reconstruct a high resolution image. Based on the typical reconstruction loss in super-resolution, we add three auxiliary loss functions that play a secondary role in reconstructing the image.
RLoss is the error between the RLR image generated by the image reconstruction network and the HR output by the upsampling module.
Eloss-the edges between RLR and HR were extracted separately using the classical Sobel operator and then the average of the pixel differences was calculated.
Ploss extracts perceptual features from Frozen layers in the object recognition network, respectively, and then calculates the perceptual loss using the euclidean distance between the two extracted feature vectors.
4. Total loss of image reconstruction network:
Total Loss=w 1 RLoss+w 2 ELoss+w 3 PLoss
wherein w 1 ,w 2 ,w 3 The weight coefficient is specifically set according to experiments.
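The three loss terms can be sketched in NumPy as follows (a single-channel sketch: the Sobel edge loss and Euclidean perceptual loss follow the description above, while the default weights and the mean-squared form of RLoss are illustrative assumptions):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, k):
    # naive 3x3 'valid' convolution, enough for a single-channel sketch
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def sobel_edges(img):
    gx, gy = conv2d_valid(img, SOBEL_X), conv2d_valid(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def r_loss(rlr, hr):            # RLoss: pixel-wise reconstruction error
    return float(np.mean((rlr - hr) ** 2))

def e_loss(rlr, hr):            # ELoss: mean difference of Sobel edge maps
    return float(np.mean(np.abs(sobel_edges(rlr) - sobel_edges(hr))))

def p_loss(feat_rlr, feat_hr):  # PLoss: Euclidean distance of perceptual features
    return float(np.linalg.norm(feat_rlr - feat_hr))

def total_loss(rlr, hr, feat_rlr, feat_hr, w1=1.0, w2=0.1, w3=0.1):
    # w1, w2, w3 are set experimentally; these defaults are purely illustrative
    return w1 * r_loss(rlr, hr) + w2 * e_loss(rlr, hr) + w3 * p_loss(feat_rlr, feat_hr)

img, feat = np.ones((8, 8)), np.zeros(4)
same = total_loss(img, img, feat, feat)           # identical inputs -> 0.0
diff = total_loss(img, np.zeros((8, 8)), feat, feat)
```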
As shown in fig. 3, the training process of the image reconstruction network model includes:
step S301: training the target detection network with the HR images output by the up-sampling module, keeping the parameters of certain layers of YOLOv3 unchanged;
in this embodiment, the parameters of some layers of YOLOv3 are kept unchanged to retain the general feature-extraction capability of the target detection network, and this network is used to guide the image reconstruction network (IRN).
Step S302: fixing the parameters of the target detection network, and training the image reconstruction network (IRN) in a supervised manner using the training samples and the target detection network;
in this embodiment, the training samples are images whose resolution is below the preset threshold, taken as low-resolution images (LR). The output of the image reconstruction network is a reconstructed image (RLR) with the same pixel size as HR; the reconstruction loss and the edge loss are calculated from the difference between the RLR and HR images, and the total detection loss is calculated by the target detection network with the RLR image as input. By using the reconstruction loss, the image reconstruction network (IRN) focuses on information useful for target detection when reconstructing images. The total loss of the image reconstruction network is Total Loss = w1·RLoss + w2·ELoss + w3·PLoss.
Step S303: training the target detection network using reconstructed images (RLR) generated by the Image Reconstruction Network (IRN), at which stage the parameters of the Image Reconstruction Network (IRN) are fixed and the parameters of the target detection network are trained.
In this embodiment, none of the layers of the target detection network are frozen, so as to strengthen its target detection capability. After training is complete, the whole pipeline can be applied to new LR images: an LR image is input into the IRN to generate a reconstructed image, which is then fed into the target detection network, and the final result is predicted by the target detection network.
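The three training stages can be summarised in a small framework-free sketch (all names, layer counts and the 'frozen' bookkeeping are illustrative assumptions; in practice the flags would control gradient updates of the real IRN and YOLOv3 parameters):

```python
def make_net(name, n_layers):
    # a network is modelled only as a named list of per-layer 'frozen' flags
    return {"name": name, "frozen": [False] * n_layers}

def freeze(net, layers=None):
    idx = range(len(net["frozen"])) if layers is None else layers
    for i in idx:
        net["frozen"][i] = True

def unfreeze(net):
    net["frozen"] = [False] * len(net["frozen"])

def training_schedule(irn, detector, backbone_layers):
    log = []
    # Stage 1 (S301): train the detector on HR images; keep some YOLOv3 layers fixed
    unfreeze(detector)
    freeze(detector, backbone_layers)
    log.append("stage1: train detector, backbone partially frozen")
    # Stage 2 (S302): fix the detector, train the IRN under its guidance
    freeze(detector)
    unfreeze(irn)
    log.append("stage2: train IRN, detector frozen")
    # Stage 3 (S303): fix the IRN, fine-tune the full detector on RLR images
    freeze(irn)
    unfreeze(detector)
    log.append("stage3: train detector on RLR, IRN frozen")
    return log

irn = make_net("IRN", 4)
det = make_net("YOLOv3", 6)
history = training_schedule(irn, det, backbone_layers=[0, 1])
```

After the schedule ends, the IRN is fully frozen and the detector is fully trainable, matching the final state described above.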
An embodiment of the present invention further provides a target detection apparatus based on a multi-scale image, and as shown in fig. 7, the apparatus includes:
a candidate region acquisition module: for inputting an original image to obtain a candidate region;
an original feature map acquisition module: for obtaining an original feature map of the candidate region;
an image enhancement module: for comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement;
a target detection module: for inputting the image features after image enhancement and the original feature map of the candidate region into YOLOv3 for target detection and classification.
The embodiment of the invention further provides a target detection system based on multi-scale images, which comprises the following steps:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
wherein the instructions are stored by the memory, and loaded and executed by the processor to perform the multi-scale image-based object detection method as described above.
Embodiments of the present invention further provide a computer-readable storage medium having a plurality of instructions stored therein; the instructions are used for loading and executing the multi-scale image-based target detection method by the processor.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a physical server, a network cloud server, etc., running for example a Ubuntu operating system) to perform some steps of the method according to various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.

Claims (8)

1. A target detection method based on multi-scale images is characterized by comprising the following steps:
step S101: inputting an original image to obtain a candidate region;
step S102: acquiring an original feature map of the candidate region;
step S103: comparing the original feature map of the candidate region with a preset resolution and, if it is lower than the preset resolution, inputting it into an image reconstruction network model for image enhancement;
step S104: inputting the image features after image enhancement and the original feature map of the candidate region into YOLOv3 for target detection and classification;
the image reconstruction network model comprises an image reconstruction network and a target detection network, wherein the input of the image reconstruction network is an image whose resolution is lower than the preset resolution, the output of the image reconstruction network is a reconstructed image RLR, and the pixel size of the reconstructed image RLR is the same as that of the feature map HR obtained through the up-sampling operation of the image reconstruction network; with the reconstructed image RLR as the input of the target detection network, a loss is calculated based on the reconstructed image RLR and the feature map HR, so as to adjust the parameters of the image reconstruction network;
the image reconstruction network comprises a plurality of convolution layers and a plurality of branches of different levels; an input original feature map below the preset resolution passes through the convolution operations of the plurality of convolution layers, and the resulting feature vector is input into the lowest-level branch; each branch comprises a plurality of sampling blocks, each comprising an up-sampling block and a down-sampling block, and through the sampling blocks the transmitted features are enhanced at a certain ratio during the forward propagation of each branch; for each branch of the plurality of branches: each sampling block transmits the branch's up-sampled features to the corresponding sampling block in the branches of higher level than itself; and each sampling block transmits the branch's down-sampled features to the corresponding sampling block in the branches of lower level than itself.
2. The multi-scale image-based target detection method of claim 1, wherein the up-sampling block consists of a first sub-pixel convolution unit, a first convolution unit and a second sub-pixel convolution unit; the low-resolution feature map L0 passes through the sub-pixel convolution of the first sub-pixel convolution unit to generate a high-resolution feature map H0; the high-resolution feature map H0 is converted into a low-resolution feature map L1 by the convolution operation of the first convolution unit; L1 is subtracted from L0 pixel by pixel to obtain the difference between the low-resolution feature maps; the low-resolution feature map L1 generates a high-resolution feature map H1 through the sub-pixel convolution of the second sub-pixel convolution unit; and the two high-resolution feature maps H0 and H1 are added pixel by pixel to output the feature map HR.
3. The multi-scale image based object detection method of claim 2, wherein the down-sampling block consists of a first convolution unit, a first sub-pixel convolution unit and a second convolution unit; a high-resolution feature map H0' is convolved by the first convolution unit to generate a low-resolution feature map L0'; the low-resolution feature map L0' is converted into a high-resolution feature map H1' by the sub-pixel convolution of the first sub-pixel convolution unit; H0' and H1' are added pixel by pixel for fusion; the high-resolution feature map H1' is convolved by the second convolution unit to generate a low-resolution feature map L1', and the two low-resolution feature maps L0' and L1' are subtracted pixel by pixel to output a feature map LR.
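The projection blocks of claims 2 and 3 can be sketched as follows. This is an interpretation, not the claimed implementation: sub-pixel convolution is approximated by channel replication plus a pixel-shuffle rearrangement (which collapses to nearest-neighbour up-sampling), the convolution units by block averaging, and the wording of the claims is read in the style of back-projection networks, where the low-resolution residual feeds the second sub-pixel unit and the fused high-resolution map feeds the second convolution unit:

```python
import numpy as np

def subpixel_up(x, r=2):
    """Sub-pixel convolution stand-in: replicate channels, then pixel-shuffle
    (C, H, W) -> (C, H*r, W*r).  With identical channel copies this reduces
    to nearest-neighbour up-sampling."""
    c, h, w = x.shape
    y = np.repeat(x, r * r, axis=0).reshape(c, r, r, h, w)
    y = y.transpose(0, 3, 1, 4, 2)        # (c, h, r, w, r)
    return y.reshape(c, h * r, w * r)

def conv_down(x, r=2):
    """Strided-convolution stand-in: block averaging."""
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).mean(axis=(2, 4))

def up_block(L0, r=2):
    """Up-sampling block of claim 2 (residual reading)."""
    H0 = subpixel_up(L0, r)     # L0 -> H0 (first sub-pixel convolution unit)
    L1 = conv_down(H0, r)       # H0 -> L1 (first convolution unit)
    e = L0 - L1                 # pixel-wise low-resolution difference
    H1 = subpixel_up(e, r)      # residual -> H1 (second sub-pixel convolution unit)
    return H0 + H1              # pixel-wise fusion, output feature map HR

def down_block(H0, r=2):
    """Down-sampling block of claim 3 (fused map feeds the second conv unit)."""
    L0 = conv_down(H0, r)       # H0' -> L0' (first convolution unit)
    H1 = subpixel_up(L0, r)     # L0' -> H1' (first sub-pixel convolution unit)
    F = H0 + H1                 # pixel-wise high-resolution fusion
    L1 = conv_down(F, r)        # fused map -> L1' (second convolution unit)
    return L0 - L1              # pixel-wise difference, output feature map LR
```

Note that with these fixed toy operators the up-block residual vanishes (block averaging exactly inverts nearest-neighbour up-sampling); learned convolutions would not collapse this way.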
4. The multi-scale image based object detection method according to claim 3, wherein the loss function of the image reconstruction network (IRN) is:

Total Loss = w1·RLoss + w2·ELoss + w3·PLoss

wherein w1, w2 and w3 are weight coefficients; RLoss is the error between the RLR image generated by the image reconstruction network and HR; ELoss is obtained by extracting the edges of RLR and HR separately with the Sobel operator and then computing the mean of the pixel-wise differences; PLoss is the perceptual loss, computed by extracting perceptual features of RLR and HR from the frozen layers of the object recognition network and then taking the Euclidean distance between the two extracted feature vectors.
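A minimal NumPy sketch of this three-term loss, under stated assumptions: RLoss is taken as mean-squared error (the claim only says "error"), the Sobel magnitudes are compared by mean absolute difference, and `feat_rlr`/`feat_hr` stand in for the perceptual features that would in practice come from frozen layers of the detection network:

```python
import numpy as np

SOBEL_X = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, k):
    """Minimal 'valid'-mode 2-D correlation used to apply the Sobel kernels."""
    h, w = img.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

def edge_map(img):
    """Sobel gradient magnitude, as used for ELoss."""
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def total_loss(rlr, hr, feat_rlr, feat_hr, w=(1.0, 1.0, 1.0)):
    rloss = np.mean((rlr - hr) ** 2)                       # RLoss (MSE assumed)
    eloss = np.mean(np.abs(edge_map(rlr) - edge_map(hr)))  # ELoss: Sobel edge difference
    ploss = np.linalg.norm(feat_rlr - feat_hr)             # PLoss: Euclidean feature distance
    return w[0] * rloss + w[1] * eloss + w[2] * ploss
```

Because the Sobel kernels sum to zero, a constant brightness offset between RLR and HR changes RLoss but leaves ELoss untouched, which is what makes the edge term a complementary signal.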
5. The multi-scale image based target detection method of claim 4, wherein the training process of the image reconstruction network model is as follows:
step S301: training the target detection network with the HR images output by the up-sampling module, while keeping the parameters of certain layers of YOLOv3 unchanged;
step S302: fixing the parameters of the target detection network, and training the image reconstruction network in a supervised manner using training samples and the target detection network;
step S303: training the target detection network with the reconstructed images RLR generated by the image reconstruction network; at this stage, the parameters of the image reconstruction network are fixed and the parameters of the target detection network are trained.
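The alternating freeze/train schedule of steps S301-S303 can be summarised in a small lookup, where "irn" and "detector" are illustrative names for the two parameter groups (the partial freezing of individual YOLOv3 layers in S301 is not modelled by this coarse sketch):

```python
def trainable_groups(stage):
    """Return which parameter groups are updated at each training stage.

    Stage 1 (S301): train the detector on HR images (some YOLOv3 layers
    additionally stay frozen).  Stage 2 (S302): fix the detector, train the
    image reconstruction network (IRN).  Stage 3 (S303): fix the IRN and
    fine-tune the detector on the reconstructed RLR images.
    """
    schedule = {
        1: {"irn": False, "detector": True},
        2: {"irn": True,  "detector": False},
        3: {"irn": False, "detector": True},
    }
    return schedule[stage]
```

In a framework such as PyTorch, each stage would translate to toggling `requires_grad` on the corresponding parameter group before resuming optimisation.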
6. An apparatus for object detection based on multi-scale images, the apparatus comprising:
a candidate region acquisition module, configured to input an original image and obtain candidate regions;
an original feature map acquisition module, configured to obtain an original feature map of each candidate region;
an image enhancement module, configured to compare the original feature map of the candidate region with a preset resolution, and to input an original feature map whose resolution is lower than the preset resolution into the image reconstruction network model for image enhancement;
a target detection module, configured to input the image features after image enhancement, together with the original feature map of the candidate region, into YOLOv3 for target detection and classification;
the image reconstruction network model comprises an image reconstruction network and a target detection network, wherein the input of the image reconstruction network is an image whose resolution is lower than the preset resolution, and its output is a reconstructed image RLR whose pixel size is the same as that of the HR image output by the image reconstruction network; the reconstructed image RLR is taken as the input of the target detection network, and a loss is calculated from the reconstructed image RLR and the feature map HR obtained through the up-sampling operation of the image reconstruction network, so as to adjust the parameters of the image reconstruction network;
the image reconstruction network comprises a plurality of convolution layers and a plurality of branches at different levels; an input original feature map whose resolution is lower than the preset resolution is passed through the convolution operations of the plurality of convolution layers to obtain a feature vector, which is input to the branch at the lowest level; each branch comprises a plurality of sampling blocks, each sampling block comprising an up-sampling block and a down-sampling block, and through the sampling blocks the features transmitted along each branch are enhanced at a certain scale during forward propagation; for each branch of the plurality of branches: each sampling block transmits the up-sampled features of the branch to the corresponding sampling block in each branch at a higher level than itself, and transmits the down-sampled features of the branch to the corresponding sampling block in each branch at a lower level than itself.
7. A multi-scale image based object detection system, comprising:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
wherein the plurality of instructions are stored by the memory, and are loaded and executed by the processor to perform the multi-scale image based object detection method according to any one of claims 1-5.
8. A computer-readable storage medium having a plurality of instructions stored therein, the plurality of instructions being loaded and executed by a processor to perform the multi-scale image based object detection method according to any one of claims 1-5.
CN202110679907.5A 2021-06-18 2021-06-18 Target detection method and device based on multi-scale image Active CN113221925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110679907.5A CN113221925B (en) 2021-06-18 2021-06-18 Target detection method and device based on multi-scale image


Publications (2)

Publication Number Publication Date
CN113221925A CN113221925A (en) 2021-08-06
CN113221925B true CN113221925B (en) 2022-11-11

Family

ID=77080572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110679907.5A Active CN113221925B (en) 2021-06-18 2021-06-18 Target detection method and device based on multi-scale image

Country Status (1)

Country Link
CN (1) CN113221925B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI779784B (en) * 2021-08-19 2022-10-01 中華電信股份有限公司 Feature analysis system, method and computer readable medium thereof
CN115601357B (en) * 2022-11-29 2023-05-26 南京航空航天大学 Stamping part surface defect detection method based on small sample
CN115937794B (en) * 2023-03-08 2023-08-15 成都须弥云图建筑设计有限公司 Small target object detection method and device, electronic equipment and storage medium
CN117197756B (en) * 2023-11-03 2024-02-27 深圳金三立视频科技股份有限公司 Hidden danger area intrusion detection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443172A (en) * 2019-07-25 2019-11-12 北京科技大学 A kind of object detection method and system based on super-resolution and model compression
CN111626208A (en) * 2020-05-27 2020-09-04 北京百度网讯科技有限公司 Method and apparatus for detecting small targets
CN112597887A (en) * 2020-12-22 2021-04-02 深圳集智数字科技有限公司 Target identification method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345449B (en) * 2018-07-17 2020-11-10 西安交通大学 Image super-resolution and non-uniform blur removing method based on fusion network
CN112446826A (en) * 2019-09-03 2021-03-05 联咏科技股份有限公司 Method and device for image super-resolution, image enhancement and model training


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method; Haipeng Zhao; Sensors; 2020-03-27; pp. 1-18 *
Real Time Object Detection Using YOLOv3; Omkar Masurekar et al.; International Research Journal of Engineering and Technology (IRJET); 2020-03-31; vol. 7, no. 3; pp. 3764-3768 *
An Improved YOLOv3 Method for Dynamic Small-Target Detection; Cui Yanpeng et al.; Journal of Xidian University; 2020-06-30; vol. 47, no. 3; pp. 1-7 *
Ship Target Detection in High-Resolution Remote Sensing Images Based on a Feature Pyramid Model; Zhou Hui et al.; Journal of Dalian Maritime University; 2019-11-15; vol. 45, no. 4; pp. 131-138 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant