CN116258943A - Image plane area detection method based on edge information - Google Patents
Image plane area detection method based on edge information
- Publication number
- CN116258943A (application CN202310279248.5A)
- Authority
- CN
- China
- Prior art keywords
- features
- level
- image plane
- network
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image plane area detection method based on edge information. A neural network first extracts the backbone network features of the image at each level. Multi-level context features are then extracted from these per-level features by up-sampling and inter-layer connections, while multi-level edge features are extracted from the intermediate-level features by an iterative optimization network layer that brings the features of different levels to the expected feature dimension. The multi-level context features and multi-level edge features are fused pairwise, level by level, and the fused features are finally provided to the target model of a downstream image plane detection task to be trained, so as to improve the performance of the target model on that task.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an image plane area detection method based on edge information.
Background
Planar regions in a scene provide important information for many vision-based applications, including computer vision, stereoscopic vision, and robotic vision. Planar region detection is one of the important challenges in computer vision and pattern recognition, where the aim is to give machines a human-like ability to perceive high-level scene structure. After all planes have been extracted from a single image, users can select the planes of interest and build efficient and attractive applications on these planar regions. For example, users may decorate walls with favorite textures, and advertisers may use less informative areas (e.g., tables, walls, and boards) in promotional videos to market their products more effectively. In addition, planar features are key cues that allow autonomous robots to perceive the surrounding environment and build maps from camera views.
Compared with traditional object detection and segmentation, plane detection is more constrained and more challenging. On the one hand, planes have no predefined classes, so planes of any class must be segmented; on the other hand, the boundaries of planar regions are difficult to define because the structural information within a scene, such as plane normal vectors, is highly abstract. With the advent of deep neural networks, the analysis of planar regions has become possible thanks to the ability of convolutional neural networks to learn high-level feature representations of images. The paper "Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding" published by Yu et al. at CVPR 2019 (1029-1037) and the paper "PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image" published by Liu et al. at CVPR 2019 (4450-4459) each designed a CNN-based architecture to analyze planar regions. The former uses a two-stage bottom-up network architecture; the latter uses Mask R-CNN to generate an arbitrary number of planes and designs a refinement module that integrates the features of all planes to further refine the prediction. However, because planes are class-agnostic, the segmentation masks predicted by these methods are inaccurate, and small planes are difficult to detect.
Recently, edge information has been shown to help models learn more discriminative features in salient object detection, scene segmentation, and parsing. "Self-correction for human parsing", published in TPAMI by Li et al. in 2021, constructs an additional edge network to estimate object edges and proposes a self-correction strategy to remove false labels. However, it optimizes only the features of the edge region, and edge-region pixels account for a very small share of the image, about 1%.
Disclosure of Invention
The invention aims to remedy the shortcomings of existing planar region detection methods and provides an image plane area detection method based on edge information that can improve performance on downstream image plane analysis tasks.
The invention is realized by the following technical scheme:
An image plane area detection method based on edge information comprises the following steps:
step 1, extracting backbone network features of each level from an image by using a neural network;
step 2, extracting multi-level context features from the features of each level obtained in step 1 by up-sampling and inter-layer connections;
step 3, extracting multi-level edge features of the image from the intermediate-level features obtained in step 1 by means of an iterative optimization network layer, which brings the features of different levels to the expected feature dimension;
step 4, fusing the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 pairwise, level by level, while bringing context features and edge features of originally different resolutions to the same resolution;
step 5, providing the fused features to the target model of a downstream image plane detection task to be trained, so as to improve the performance of the target model on that task.
In the above technical solution, step 1 adopts a ResNet network structure, which outputs the features of each level.
In the above technical solution, in step 2, an FPN network is used to assist the ResNet neural network, and multi-level context features are extracted from the multi-level backbone network features by up-sampling and inter-layer connections;
$Py_i(\cdot)$ ($i = 2, 3, 4, 5, 6$) denotes the functions of the different levels of the FPN network; the output of each level of the ResNet neural network is processed by these functions to obtain the image context features output by each level of the FPN network;
Up denotes upsampling.
In the above technical solution, in step 3, the features of three intermediate levels output by the ResNet network are selected, and the multi-level edge features of the image are extracted by an iterative optimization network layer. The features of the three intermediate levels are input to an iterative optimization network layer composed of several channel smoothing reduction modules, which smoothly reduces the number of channels of the input features by a factor of 2 at a time until all input features have 256 channels, thereby obtaining the edge features.
In the above technical solution, the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 are sent to a resolution adaptive module and fused pairwise, level by level;
the resolution adaptive module is formed by combining a group of adaptive convolution kernels, and the specific process is defined as follows:
$Tr_4 = \mathrm{Conv1}(\cdot)$
where $Tr_i$ denotes the pairwise fusion operation of the $i$-th level and is implemented as a combination of convolution layers; $\mathrm{Conv3}(\cdot)$ and $\mathrm{Conv1}(\cdot)$ denote convolution operations with kernel sizes 3 and 1, respectively; and $\circ$ denotes the composition of multiple convolution layers.
In the above technical solution, in step 5, a mixed loss of the target model is calculated when training the target model. The mixed loss contains two terms: an edge loss and the loss of the original image plane area detection model, where the original image plane area detection model refers to the neural network used in step 1.
The present invention also provides a computer readable storage medium storing a computer program which when executed implements the steps of the above method.
The advantages and beneficial effects of the invention are as follows:
The invention extracts edge features and context features with a multi-level convolutional neural network, which yields two kinds of plane features at different resolutions; the designed recurrent convolution layer iteratively optimizes the receptive fields of the features according to their resolutions. Feature fusion is then performed: guided by the resolution adaptive fusion operation and by plane edge supervision, edge features and context features of different levels are aggregated into multi-level plane features and provided to any downstream image plane analysis model, thereby improving the performance of the target model on downstream image plane analysis tasks. Experimental results show that the method effectively improves model performance on several image plane analysis tasks.
Drawings
Fig. 1 is a flowchart of an image plane area detection method based on edge information according to the present invention.
Fig. 2 visualizes the three discriminative features extracted by the present invention, taking the PlaneRCNN planar region detection model as an example target model.
Fig. 3 shows the prediction results of the present invention on the downstream plane analysis task of plane segmentation.
Fig. 4 shows the prediction results of the present invention on the downstream plane analysis task of depth estimation.
Fig. 5 shows the prediction results of the present invention on the downstream plane analysis task of 3D reconstruction.
Those of ordinary skill in the art can derive other relevant drawings from the above figures without undue effort.
Detailed Description
To help those skilled in the art better understand the solution of the present invention, the solution is described below with reference to specific embodiments.
An image plane area detection method based on edge information, referring to Fig. 1, comprises the following steps:
Step 1, extracting backbone network features of each level from the image by using a ResNet neural network.
ResNet is a common multi-scale neural network structure containing 5 blocks. Each block is denoted $block_i$ ($i = 1, 2, \ldots, 5$) according to the level it belongs to, so that each block corresponds to one level and outputs the features of that level.
Step 2, an FPN (Feature Pyramid Network) is used to assist the ResNet neural network, and multi-level context features are extracted from the multi-level backbone network features by up-sampling and inter-layer connections.
$Py_i(\cdot)$ ($i = 2, 3, 4, 5, 6$) denotes the functions of the different levels of the FPN network. The output of each level of the ResNet neural network is processed by these functions to obtain the image context features output by each level of the FPN network.
Up denotes upsampling.
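As a rough illustration of the up-sampling and inter-layer connection in step 2, the sketch below follows the commonly described FPN top-down pathway: each level adds its own feature map to the 2x up-sampled map from the next coarser level. The formula used by the invention is not reproduced in this text, so this form is an assumption. The helper names `upsample2x`, `add_maps` and `top_down` are hypothetical, feature maps are single-channel nested lists, and the lateral 1x1 convolutions of a real FPN are omitted.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x up-sampling of a 2D map (the 'Up' operation)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with identical shape."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("maps must have the same shape")
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(laterals: List[Grid]) -> List[Grid]:
    """Merge maps ordered fine-to-coarse; returns context maps in the same order.

    Assumed form: P_coarsest = C_coarsest, P_i = C_i + Up(P_{i+1}).
    """
    outputs = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        outputs.append(add_maps(lateral, upsample2x(outputs[-1])))
    return outputs[::-1]


# Toy backbone maps at three resolutions (4x4, 2x2, 1x1).
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down([c3, c4, c5])
```

In this toy run every entry of `p4` combines its own level with the coarser one, and `p3` accumulates information from all three levels, which is the sense in which the pyramid outputs carry multi-level context.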
Step 3, extracting multi-level edge features.
To reduce the computational cost, the multi-level edge feature extraction shares the per-level backbone network features extracted by ResNet. Because the lower levels have small receptive fields and the higher levels can lose much edge detail, the invention selects the features of three intermediate levels output by the ResNet network. The multi-level edge features of the image are extracted by an iterative optimization network layer, which brings the features of different levels to the expected feature dimension.
The features of the three intermediate levels are input to an iterative optimization network layer composed of several channel smoothing reduction modules. This layer smoothly reduces the number of channels of the input features by a factor of 2 at a time until all input features have 256 channels, thereby obtaining the edge features.
Here $Ed_1$ denotes the iterative optimization network layer, which optimizes the edge features repeatedly until their feature dimension matches the predefined feature dimension (256 in this embodiment, although other dimensions are also possible).
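The channel reduction described above can be sketched as a simple schedule: halve the channel count until the target of 256 is reached. The function name `halving_schedule` is hypothetical, and the example widths 512, 1024 and 2048 are only illustrative (they are the usual stage widths of ResNet-50; the text does not state which widths the three intermediate levels have).

```python
from typing import List


def halving_schedule(channels: int, target: int = 256) -> List[int]:
    """Channel counts seen while halving `channels` down to `target`.

    Each halving step stands for one channel smoothing reduction module.
    """
    if channels < target:
        raise ValueError("input already has fewer channels than the target")
    schedule = [channels]
    while schedule[-1] > target:
        schedule.append(schedule[-1] // 2)
    if schedule[-1] != target:
        raise ValueError("target is not reachable by repeated halving")
    return schedule


# Illustrative widths only; the number of modules is len(schedule) - 1.
schedules = {c: halving_schedule(c) for c in (512, 1024, 2048)}
```

Under these illustrative widths the three inputs need one, two and three reduction modules respectively, so wider features are reduced gradually instead of being projected to 256 channels in a single step.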
Step 4, fusing the multi-level context features and the multi-level edge features.
Detecting an image plane, especially a small one, requires attention to both its edges and its main region. The context features obtained in step 2 represent the main region of the image plane, and the edge features obtained in step 3 represent its edge region. The multi-level context features and the multi-level edge features are therefore sent to the resolution adaptive module and fused pairwise, level by level, with context features and edge features of originally different resolutions brought to the same resolution.
The resolution adaptive module is formed by combining a group of adaptive convolution kernels, and the specific process is defined as follows:
$Tr_4 = \mathrm{Conv1}(\cdot)$
where $Tr_i$ denotes the pairwise fusion operation of the $i$-th level and is implemented as a combination of convolution layers; $\mathrm{Conv3}(\cdot)$ and $\mathrm{Conv1}(\cdot)$ denote convolution operations with kernel sizes 3 and 1, respectively; and $\circ$ denotes the composition of multiple convolution layers.
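To see how a composition of convolution layers can bring two feature maps to the same resolution, the sketch below tracks only spatial sizes, using the standard convolution output-size formula. Only $Tr_4 = \mathrm{Conv1}(\cdot)$ is stated in this text, so the stride-2 setting for Conv3, the padding values, and the example compositions are all assumptions made for illustration; the names `conv_out_size` and `composed_out_size` are hypothetical.

```python
from typing import List, Tuple

# One convolution layer described as (kernel_size, stride, padding).
ConvSpec = Tuple[int, int, int]

CONV1: ConvSpec = (1, 1, 0)       # 1x1 convolution: resolution unchanged
CONV3_DOWN: ConvSpec = (3, 2, 1)  # 3x3 convolution, stride 2 (assumed): halves resolution


def conv_out_size(size: int, spec: ConvSpec) -> int:
    """Spatial output size of one convolution (dilation 1)."""
    kernel, stride, padding = spec
    return (size + 2 * padding - kernel) // stride + 1


def composed_out_size(size: int, layers: List[ConvSpec]) -> int:
    """Output size after applying the layers in order (a composition)."""
    for spec in layers:
        size = conv_out_size(size, spec)
    return size


same = composed_out_size(64, [CONV1])                              # 64
half = composed_out_size(64, [CONV3_DOWN, CONV1])                  # 32
quarter = composed_out_size(64, [CONV3_DOWN, CONV3_DOWN, CONV1])   # 16
```

Under these assumptions, a level whose edge map already matches the context map needs only a 1x1 convolution, while a finer edge map can be matched to a coarser context level by stacking strided 3x3 convolutions before the 1x1 convolution.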
Step 5, the edge features and context features of different levels are fused into plane features at 5 levels, and the fused plane features of each level are provided to the target model of the downstream image plane detection task to be trained, so as to improve its performance on that task. Because the fused plane features of each level have the same scales and dimensions as the features generated by the target model to be trained (the ResNet model in this embodiment), the newly fused features can easily be fed into an existing target model.
Fig. 2 visualizes the three discriminative features extracted by the present invention, using the PlaneRCNN planar region detection model as an example target model.
The downstream image plane detection task may be plane segmentation, image depth estimation, image 3D plane reconstruction, and the like. Figs. 3-5 show the prediction results of the method of the invention for plane segmentation, image depth estimation, and image 3D plane reconstruction, respectively.
Further, when using the method proposed by the invention, the mixed loss of the target model must be calculated. The mixed loss contains two terms: an edge loss, which helps the model learn boundary information, and the loss of the original image plane area detection model (the original image plane area detection model refers to the network model used in step 1). Formally, the mixed loss is defined as:
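The defining equation is not reproduced in this text. A form consistent with the surrounding description, a weighted sum of the two terms using the hyperparameters $\lambda_e$ and $\lambda_d$, would be the following; the symbols $\mathcal{L}$, $\mathcal{L}_e$ (edge loss) and $\mathcal{L}_d$ (original detection loss) are assumed names.

```latex
\mathcal{L} = \lambda_e \, \mathcal{L}_e + \lambda_d \, \mathcal{L}_d
```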
Here, the two terms are the edge loss and the loss of the original image plane area detection, and the ratio between them is adjusted by the hyperparameters $\lambda_e$ and $\lambda_d$. The edge loss is intended to make the model learn the edge information of planes, which benefits downstream modules. In particular, this loss should explicitly preserve the edges of the true planar regions, and is defined as follows:
Here, the loss is computed from the value of the $i$-th pixel in the predicted edge map; $N$ denotes the number of pixels in the edge map, $N_+$ denotes the number of edge-region pixels in the true planar regions, and $N_-$ denotes the number of non-edge-region pixels in the true planar regions.
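The edge loss equation is not reproduced in this text. The description of $N$, $N_+$ and $N_-$ matches a class-balanced binary cross-entropy of the kind commonly used for edge detection, so the sketch below assumes that form: edge pixels are weighted by $N_-/N$ and non-edge pixels by $N_+/N$, so that the rare edge pixels are not drowned out. The function name `balanced_edge_loss`, the clamping constant `eps`, and the exact weighting are assumptions, not the definition given by the invention.

```python
import math
from typing import Sequence


def balanced_edge_loss(pred: Sequence[float],
                       target: Sequence[int],
                       eps: float = 1e-7) -> float:
    """Class-balanced binary cross-entropy over a flattened edge map.

    pred:   predicted edge probabilities in [0, 1], one per pixel.
    target: ground-truth labels, 1 for edge pixels and 0 otherwise.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    n_pos = sum(1 for t in target if t == 1)  # N+: edge pixels
    n_neg = n - n_pos                         # N-: non-edge pixels
    w_pos = n_neg / n                         # large weight for rare edge pixels
    w_neg = n_pos / n                         # small weight for common non-edge pixels
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        if t == 1:
            loss -= w_pos * math.log(p)
        else:
            loss -= w_neg * math.log(1.0 - p)
    return loss


# One edge pixel among four; a confident prediction versus an uncertain one.
labels = [1, 0, 0, 0]
good = balanced_edge_loss([0.9, 0.1, 0.1, 0.1], labels)
poor = balanced_edge_loss([0.5, 0.5, 0.5, 0.5], labels)
```

With about 1% of pixels on edges, as noted in the Background, an unweighted loss could be minimized by predicting no edges at all; the weighting in this assumed form is one common way to counter that.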
The planar region detection loss is calculated according to the specific planar region detection method used.
The foregoing has described exemplary embodiments of the invention. It should be understood that those skilled in the art may make any simple variations, modifications, or other equivalent substitutions without departing from the spirit of the invention.
Claims (7)
1. An image plane area detection method based on edge information, characterized by comprising the following steps:
step 1, extracting backbone network features of each level from an image by using a neural network;
step 2, extracting multi-level context features from the features of each level obtained in step 1 by up-sampling and inter-layer connections;
step 3, extracting multi-level edge features of the image from the intermediate-level features obtained in step 1 by means of an iterative optimization network layer, which brings the features of different levels to the expected feature dimension;
step 4, fusing the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 pairwise, level by level, while bringing context features and edge features of originally different resolutions to the same resolution;
step 5, providing the fused features to the target model of a downstream image plane detection task to be trained, so as to improve the performance of the target model on that task.
3. The image plane area detection method based on edge information according to claim 2, wherein: in step 2, an FPN network is used to assist the ResNet neural network, and multi-level context features are extracted from the multi-level backbone network features by up-sampling and inter-layer connections;
$Py_i(\cdot)$ ($i = 2, 3, 4, 5, 6$) denotes the functions of the different levels of the FPN network; the output of each level of the ResNet neural network is processed by these functions to obtain the image context features output by each level of the FPN network;
Up denotes upsampling.
4. The image plane area detection method based on edge information according to claim 3, wherein: in step 3, the features of three intermediate levels output by the ResNet network are selected, and the multi-level edge features of the image are extracted by an iterative optimization network layer; the features of the three intermediate levels are input to an iterative optimization network layer composed of several channel smoothing reduction modules, which smoothly reduces the number of channels of the input features by a factor of 2 at a time until all input features have 256 channels, thereby obtaining the edge features.
5. The image plane area detection method based on edge information according to claim 4, wherein: the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 are sent to a resolution adaptive module and fused pairwise, level by level;
the resolution adaptive module is formed by combining a group of adaptive convolution kernels, and the process is defined as follows:
$Tr_4 = \mathrm{Conv1}(\cdot)$
where $Tr_i$ denotes the pairwise fusion operation of the $i$-th level and is implemented as a combination of convolution layers; $\mathrm{Conv3}(\cdot)$ and $\mathrm{Conv1}(\cdot)$ denote convolution operations with kernel sizes 3 and 1, respectively; and $\circ$ denotes the composition of multiple convolution layers.
6. The image plane area detection method based on edge information according to claim 1, wherein: in step 5, a mixed loss of the target model is calculated when training the target model, the mixed loss containing two terms, namely an edge loss and the loss of the original image plane area detection model, where the original image plane area detection model refers to the neural network used in step 1.
7. A computer readable storage medium, characterized in that it stores a computer program which, when executed, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310279248.5A CN116258943A (en) | 2023-03-21 | 2023-03-21 | Image plane area detection method based on edge information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310279248.5A CN116258943A (en) | 2023-03-21 | 2023-03-21 | Image plane area detection method based on edge information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116258943A true CN116258943A (en) | 2023-06-13 |
Family
ID=86687957
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310279248.5A Pending CN116258943A (en) | 2023-03-21 | 2023-03-21 | Image plane area detection method based on edge information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116258943A (en) |
-
2023
- 2023-03-21 CN CN202310279248.5A patent/CN116258943A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shivakumar et al. | Dfusenet: Deep fusion of rgb and sparse depth information for image guided dense depth completion | |
CN110443842B (en) | Depth map prediction method based on visual angle fusion | |
CN110176027B (en) | Video target tracking method, device, equipment and storage medium | |
CN104952083B (en) | A kind of saliency detection method based on the modeling of conspicuousness target background | |
CN112184585B (en) | Image completion method and system based on semantic edge fusion | |
CN112750201B (en) | Three-dimensional reconstruction method, related device and equipment | |
Xiao et al. | Single image dehazing based on learning of haze layers | |
US12019706B2 (en) | Data augmentation for object detection via differential neural rendering | |
CN112651423A (en) | Intelligent vision system | |
Zheng et al. | T-net: Deep stacked scale-iteration network for image dehazing | |
CN113393434A (en) | RGB-D significance detection method based on asymmetric double-current network architecture | |
Li et al. | Hierarchical opacity propagation for image matting | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN112149526A (en) | Lane line detection method and system based on long-distance information fusion | |
Oliveira et al. | A novel Genetic Algorithms and SURF-Based approach for image retargeting | |
Fujii et al. | RGB-D image inpainting using generative adversarial network with a late fusion approach | |
CN116740261A (en) | Image reconstruction method and device and training method and device of image reconstruction model | |
US20230005107A1 (en) | Multi-task text inpainting of digital images | |
CN114373110A (en) | Method and device for detecting target of input image and related products | |
Wang et al. | Perception-guided multi-channel visual feature fusion for image retargeting | |
CN117612153A (en) | Three-dimensional target identification and positioning method based on image and point cloud information completion | |
Li et al. | SPN2D-GAN: semantic prior based night-to-day image-to-image translation | |
KR101592087B1 (en) | Method for generating saliency map based background location and medium for recording the same | |
CN113628349B (en) | AR navigation method, device and readable storage medium based on scene content adaptation | |
EP4047547A1 (en) | Method and system for removing scene text from images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||