CN116258943A

CN116258943A - Image plane area detection method based on edge information

Info

Publication number: CN116258943A
Application number: CN202310279248.5A
Authority: CN
Inventors: 杨巨峰; 张知诚
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2023-03-21
Filing date: 2023-03-21
Publication date: 2023-06-13

Abstract

The invention discloses an image plane area detection method based on edge information. First, a neural network extracts the basic skeleton (backbone) network features of each level of an image. Multi-level context features are then extracted from these features by upsampling and inter-layer connections, while multi-level edge features are extracted from the intermediate-level features by an iterative optimization network layer that optimizes features of different levels to the expected feature dimension. The multi-level context features and the multi-level edge features are fused pairwise, level by level, and the fused features are finally provided to the target model of the downstream image plane detection task to be trained, so as to improve the performance of the target model on that task.

Description

Image plane area detection method based on edge information

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an image plane area detection method based on edge information.

Background

Planar regions in a scene provide important information for many vision-based applications, including computer vision, stereo vision, and robot vision. Planar region detection is one of the important challenges in computer vision, pattern recognition and related fields, with the goal of giving machines a human-like ability to perceive high-level scene structure. After all planes have been extracted from a single image, users can select the planes of interest to them and design efficient and attractive applications around these planar regions. For example, people may decorate walls with favorite textures, or advertisers may use less informative areas (e.g., tables, walls, and boards) in promotional videos to market their products more effectively. In addition, planar features are key cues that allow autonomous robots to perceive the surrounding environment and build maps from camera views.

Unlike traditional object detection and segmentation, plane detection is more constrained and more challenging. On the one hand, planes do not belong to a fixed set of predefined classes, so planes of any category must be segmented; on the other hand, the boundaries of planar regions are difficult to define because they depend on highly abstract structural information in the scene, such as plane normal vectors. With the advent of deep neural networks, the analysis of planar regions has become possible, because convolutional neural networks can learn high-level feature representations of images. The paper "Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding" by Yu et al. (CVPR 2019, pp. 1029-1037) and the paper "PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image" by Liu et al. (CVPR 2019, pp. 4450-4459) designed CNN-based architectures to analyze planar regions. The former uses a two-stage bottom-up network architecture; the latter uses Mask R-CNN to generate an arbitrary number of planes and a refinement module that integrates the features of all planes to further refine the predictions. However, because planes are class-agnostic, the segmentation masks predicted by these methods are not accurate, and small planes are difficult to detect.

Recently, edge information has been shown to help models learn more discriminative features in salient object detection, scene segmentation and scene parsing. "Self-Correction for Human Parsing" by Li et al. (TPAMI, 2021) constructs an additional edge network to estimate object edges and proposes a self-correction strategy to remove incorrect labels. However, it only optimizes the features of edge regions, and edge pixels account for only a very small fraction of the image, about 1%.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing plane area detection methods by providing an image plane area detection method based on edge information that can improve the performance of downstream image plane analysis tasks.

The invention is realized by the following technical scheme:

An image plane area detection method based on edge information comprises the following steps:

step 1, extracting the basic skeleton network features of each level from an image by using a neural network;

step 2, extracting multi-level context features from the features of each level obtained in step 1 by upsampling and inter-layer connections;

step 3, extracting multi-level edge features of the image from the intermediate-level features obtained in step 1 by means of an iterative optimization network layer, which optimizes the features of different levels to the expected feature dimension;

step 4, fusing the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 pairwise, level by level, while bringing context features and edge features of different resolutions to the same resolution;

step 5, providing the fused features to the target model of the downstream image plane detection task to be trained, so as to improve the performance of the target model on the downstream image plane detection task.

In the above technical solution, in step 1, a ResNet network structure is adopted, and the features output at each level are expressed as

In the above technical solution, in step 2, an FPN network is used to assist the ResNet neural network, and multi-level context features are extracted from the multi-level basic skeleton network features by upsampling and inter-layer connections;

Py_i(·) (i = 2, 3, 4, 5, 6) denotes the functions of the different levels of the FPN network, and

denotes the output of each level of the FPN network. Processing the output of each level of the ResNet neural network in this way yields the context features of the image output by each level of the FPN network:

Up represents upsampling.
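The top-down pathway of step 2 (upsampling plus inter-layer connections) can be illustrated on toy one-dimensional maps. This is a minimal sketch assuming the standard FPN formulation P_i = lateral(C_i) + Up(P_{i+1}) with nearest-neighbour upsampling; the 1×1 lateral convolution is replaced by an identity, the 3×3 smoothing convolution is omitted, and P6 is taken by stride-2 subsampling of P5 as in common Mask R-CNN FPN variants. The function names are hypothetical and not taken from the patent.

```python
# Toy 1-D "feature maps": each level is a list of floats whose length halves
# from one level to the next. Real FPN maps are C x H x W tensors.

def upsample2x(feature):
    """Nearest-neighbour upsampling by a factor of 2 (the 'Up' operation)."""
    out = []
    for v in feature:
        out.extend([v, v])
    return out

def lateral(feature):
    """Hypothetical stand-in for the 1x1 lateral convolution (identity here)."""
    return list(feature)

def top_down(backbone):
    """backbone: levels ordered fine -> coarse, e.g. [C2, C3, C4, C5].
    Returns [P2, P3, P4, P5] with P_i = lateral(C_i) + Up(P_{i+1})."""
    merged = [lateral(backbone[-1])]          # coarsest level has no top-down input
    for c in reversed(backbone[:-1]):
        up = upsample2x(merged[0])
        merged.insert(0, [a + b for a, b in zip(lateral(c), up)])
    return merged

def extra_level(p5):
    """Assumed P6: stride-2 subsampling of P5."""
    return p5[::2]

c2, c3, c4, c5 = [1] * 8, [1, 2, 3, 4], [10, 20], [100]
p2, p3, p4, p5 = top_down([c2, c3, c4, c5])
p6 = extra_level(p5)
print(p3)  # [111, 112, 123, 124]
```

Each coarser level's values are propagated to the finer levels, which is how high-level context reaches the high-resolution outputs.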

In the above technical solution, in step 3, three intermediate-level features output by the ResNet network are selected

to extract multi-level edge features of the image through the iterative optimization network layer. The features of the three intermediate levels

are used as input to an iterative optimization network layer composed of several channel smooth-reduction modules. The iterative optimization network layer smoothly reduces the channel number of each input feature by a factor of 2 at each step until the channel number of every input feature has been reduced to 256, thereby obtaining the edge features

In the above technical solution, in step 4, the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 are fed into a resolution-adaptive module and fused pairwise, level by level;

the resolution adaptive module is formed by combining a group of adaptive convolution kernels, and the specific process is defined as follows:

$Tr_4 = \mathrm{Conv1}(\cdot)$

Tr_i denotes the pairwise fusion operation of the i-th level and is implemented as a combination of convolution layers; Conv3(·) and Conv1(·) denote convolution operations with kernel sizes 3 and 1, respectively; and ∘ denotes the composition of multiple convolutional layers.
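Only Tr_4 = Conv1(·) survives explicitly in this text. The sketch below shows one hypothetical way a resolution-adaptive module could choose its composed convolution sequence: a single 1×1 convolution when both maps already share a resolution, stride-2 3×3 convolutions to reach a coarser level, and upsampling followed by 3×3 convolutions to reach a finer level. The op names, the stride-based rule and the example strides are illustrative assumptions, not the patent's definition of Tr_i.

```python
def fusion_ops(src_stride, dst_stride):
    """Hypothetical op sequence moving a map from src_stride to dst_stride.
    Strides are assumed to be powers of two (4, 8, 16, ...)."""
    if src_stride == dst_stride:
        return ["Conv1"]                        # same resolution: 1x1 conv only
    if dst_stride > src_stride:                 # target is coarser: downsample
        k = (dst_stride // src_stride).bit_length() - 1
        return ["Conv3(stride=2)"] * k
    k = (src_stride // dst_stride).bit_length() - 1
    return ["Up", "Conv3"] * k                  # target is finer: upsample, then smooth

def apply_ops(ops, size):
    """Track the spatial side length of a square map through the op sequence."""
    for op in ops:
        if op == "Conv3(stride=2)":
            size = (size + 1) // 2              # 3x3 conv, padding 1, stride 2
        elif op == "Up":
            size *= 2                           # 2x upsampling
        # Conv1 and stride-1 Conv3 with padding 1 keep the size unchanged
    return size

print(fusion_ops(4, 16))                # ['Conv3(stride=2)', 'Conv3(stride=2)']
print(apply_ops(fusion_ops(4, 16), 200))  # 50
```

The composition of several such layers plays the role of the ∘ operator: the number of layers grows with the resolution gap between the two features being fused.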

In the above technical solution, in step 5, when the target model is trained, a mixed loss of the target model is calculated. The mixed loss contains two terms, an edge loss and the loss of the original image plane area detection model, where the original image plane area detection model refers to the neural network used in step 1.

The present invention also provides a computer readable storage medium storing a computer program which when executed implements the steps of the above method.

The invention has the advantages and beneficial effects that:

According to the invention, edge features and context features are extracted by multi-level convolutional neural networks: the two kinds of plane features, at different resolutions, are extracted separately by the multi-level convolutional network, and the designed iterative convolutional layer progressively refines the receptive fields of the features according to their resolutions. The invention then performs feature fusion: guided by the novel resolution-adaptive fusion operation and by plane edge supervision, edge features and context features of different levels are aggregated into multi-level plane features, which are provided to any downstream image plane analysis model, thereby improving the performance of the target model on downstream image plane analysis tasks. Experimental results show that the method can effectively improve model performance on several image plane analysis tasks.

Drawings

Fig. 1 is a flowchart of an image plane area detection method based on edge information according to the present invention.

Fig. 2 visualizes three discriminative features extracted by the present invention, taking the PlaneRCNN planar region detection model as an example target model.

Fig. 3 visualizes the predictions of the present invention on the downstream plane analysis task of plane segmentation.

Fig. 4 visualizes the predictions of the present invention on the downstream plane analysis task of depth estimation.

Fig. 5 visualizes the predictions of the present invention on the downstream plane analysis task of 3D reconstruction.

Those of ordinary skill in the art can derive other relevant drawings from the above figures without undue burden.

Detailed Description

To help those skilled in the art better understand the solution of the present invention, the solution is described below with reference to specific embodiments.

An image plane area detection method based on edge information, referring to fig. 1, comprises the following steps:

Step 1: extract the basic skeleton network features of each level from the image by using a ResNet neural network.

ResNet is a common multi-scale neural network structure that internally contains 5 blocks. Each block is denoted block_i (i = 1, 2, ..., 5) according to the level to which it belongs, each block corresponds to one level, and the features output by each block (i.e., each level) are expressed as

Step 2: use an FPN (Feature Pyramid Network) to assist the ResNet neural network, and extract multi-level context features from the multi-level basic skeleton network features by upsampling and inter-layer connections.


Up represents upsampling.

Step 3: extract multi-level edge features.

To reduce computational cost, multi-level edge feature extraction shares the basic skeleton network features of each level extracted by ResNet. Because low-level features have small receptive fields while high-level features lose much of their edge detail, the invention selects the features of the three intermediate levels output by the ResNet network

and extracts the multi-level edge features of the image through the iterative optimization network layer, which optimizes the features of different levels to the desired feature dimension.
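The text does not state the ResNet depth or which three levels count as intermediate. The sketch below assumes ResNet-50, whose five blocks output 64, 256, 512, 1024 and 2048 channels at cumulative strides 2, 4, 8, 16 and 32, and treats block_2 to block_4 as the intermediate levels; both choices are assumptions made for illustration. It computes the (C, H, W) shape of each block_i output, which shows why intermediate levels balance receptive field against spatial detail.

```python
# Assumed ResNet-50 configuration: (block name, output channels, cumulative stride).
RESNET50_BLOCKS = [
    ("block_1", 64, 2),      # 7x7 conv, stride 2
    ("block_2", 256, 4),     # max-pool + first residual stage
    ("block_3", 512, 8),
    ("block_4", 1024, 16),
    ("block_5", 2048, 32),
]

# Hypothetical choice of the three intermediate levels used for edge features.
INTERMEDIATE = ("block_2", "block_3", "block_4")

def block_shapes(height, width, blocks=RESNET50_BLOCKS):
    """(C, H, W) of each block output, assuming height and width divisible by 32."""
    return {name: (c, height // s, width // s) for name, c, s in blocks}

shapes = block_shapes(640, 480)
print(shapes["block_4"])                          # (1024, 40, 30)
print([shapes[b][0] for b in INTERMEDIATE])       # [256, 512, 1024]
```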

The features of the three intermediate levels

are taken as the input of this step. The process is described as follows:

Ed_1 denotes the iterative optimization network layer, which optimizes the edge features repeatedly until their feature dimension matches the predefined feature dimension (256 in this embodiment, although other dimensions are also possible).
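The channel schedule of the iterative optimization network layer can be sketched as repeated halving down to the target dimension. This is a minimal illustration under stated assumptions: the inputs have 256 × 2^k channels (for example 256, 512 and 1024 for the intermediate levels of a ResNet-50), and in a real network each halving step would be a learned module, such as a 3×3 convolution with normalization and activation, which the text does not specify. The function name is hypothetical.

```python
def halving_schedule(in_channels, target=256):
    """Channel counts visited when repeatedly halving in_channels down to target.
    Assumes in_channels == target * 2**k for some integer k >= 0."""
    if in_channels < target or in_channels % target != 0:
        raise ValueError("in_channels must be a multiple of target and >= target")
    ratio = in_channels // target
    if ratio & (ratio - 1):                      # ratio must be a power of two
        raise ValueError("in_channels / target must be a power of two")
    steps = [in_channels]
    while steps[-1] > target:
        steps.append(steps[-1] // 2)             # one channel smooth-reduction module
    return steps

for c in (256, 512, 1024):
    print(c, halving_schedule(c))
# 256 [256]
# 512 [512, 256]
# 1024 [1024, 512, 256]
```

Halving gradually, rather than projecting straight to 256 channels, is what the text calls smooth reduction; the number of modules per input grows with its channel count.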

Step 4: to detect image planes, especially small ones, attention must be paid to both the edges and the main area of each plane. The context features obtained in step 2 represent the main area of the image plane, and the edge features obtained in step 3 represent its edge area. Therefore, the multi-level context features obtained in step 2 and the multi-level edge features obtained in step 3 are fed into the resolution-adaptive module and fused pairwise, level by level, bringing context features and edge features that originally have different resolutions to the same resolution.

$Tr_4 = \mathrm{Conv1}(\cdot)$

Tr_i denotes the pairwise fusion operation of the i-th level and is implemented as a combination of convolution layers; Conv3(·) and Conv1(·) denote convolution operations with kernel sizes 3 and 1, respectively; and ∘ denotes the composition of multiple convolutional layers.

Step 5: the edge features and context features of different levels are fused into plane features of 5 levels, and the fused plane features of each level are provided to the target model of the downstream image plane detection task to be trained, so as to improve the performance of the target model on that task. Because the fused plane features of each level have the same size and dimension as the features generated by the target model to be trained (the ResNet model in this embodiment), the newly fused features can easily be fed into the existing target model.
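Feeding the fused features into an existing target model only requires that every level matches the shape the model already expects. A minimal sketch, assuming an FPN-style target model with 256 channels at stride 2^i for levels i = 2 to 6 and an input whose sides are divisible by 64; the function names are hypothetical.

```python
def fpn_level_shapes(height, width, channels=256, levels=range(2, 7)):
    """Expected (C, H, W) per level, assuming stride 2**i at level i."""
    return {i: (channels, height // 2 ** i, width // 2 ** i) for i in levels}

def mismatched_levels(fused, expected):
    """Levels whose fused feature shape differs from what the target model expects."""
    return sorted(lvl for lvl in expected if fused.get(lvl) != expected[lvl])

expected = fpn_level_shapes(512, 640)
fused = dict(expected)                 # fused features with matching shapes
print(mismatched_levels(fused, expected))   # []
fused[3] = (128, 64, 80)               # wrong channel count at level 3
print(mismatched_levels(fused, expected))   # [3]
```

When the list is empty, the fused features can replace the target model's own pyramid features without changing any downstream heads.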

Fig. 2 visualizes three discriminative features extracted by the present invention, with the PlaneRCNN planar region detection model as the example target model.

The downstream image plane detection task may be plane segmentation, image depth estimation, image 3D plane reconstruction, and the like. Figs. 3-5 show, respectively, the plane segmentation, image depth estimation and image 3D plane reconstruction predictions obtained with the method of the invention.

Further, when the proposed method is used, the mixed loss of the target model must be calculated. The mixed loss generally contains two terms: an edge loss that helps the model learn boundary information, and the loss of the original image plane area detection model (the original image plane area detection model refers to the network model used in step 1). Formally, the mixed loss is defined as:

where

denotes the edge loss, and

denotes the detection loss of the original image plane area detection model; the balance between the two terms is adjusted by the hyperparameters λ_e and λ_d. For the edge loss, the model is expected to learn the edge information of planes so as to benefit the downstream modules. Specifically, this loss should explicitly preserve the edges of the ground-truth planar regions, and it is defined as follows:

where

denotes the value of the i-th pixel in the predicted edge map, N denotes the number of pixels in the edge map, N+ denotes the number of edge-region pixels of the ground-truth planar regions, and N− denotes the number of non-edge-region pixels of the ground-truth planar regions.
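The edge-loss formula itself did not survive in this text. Given the symbols defined above (N, N+ and N−), a common choice is the class-balanced binary cross-entropy used in HED-style edge detectors, in which the rare edge pixels are weighted by N−/N and the non-edge pixels by N+/N. The sketch below implements that form over a flattened edge map; it is an assumption, not the patent's confirmed definition, and the function name is hypothetical.

```python
import math

def balanced_edge_bce(pred, target, eps=1e-7):
    """Assumed class-balanced BCE over a flattened edge map.
    pred: predicted edge probabilities; target: 0/1 ground-truth edge labels."""
    n = len(pred)                      # N: pixels in the edge map
    n_pos = sum(target)                # N+: edge pixels
    n_neg = n - n_pos                  # N-: non-edge pixels
    w_pos = n_neg / n                  # rare edge pixels receive the larger weight
    w_neg = n_pos / n
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        if y:
            total -= w_pos * math.log(p)
        else:
            total -= w_neg * math.log(1.0 - p)
    return total / n

target = [1, 0, 0, 0]
missed_edge = balanced_edge_bce([0.5, 0.0, 0.0, 0.0], target)
false_edge = balanced_edge_bce([1.0, 0.5, 0.0, 0.0], target)
print(missed_edge > false_edge)  # True
```

Since edge pixels are only about 1% of an image, an unweighted loss would let the model ignore them; the balancing makes an uncertain edge pixel cost more than an equally uncertain background pixel.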

The planar area detection loss is calculated according to the specific planar area detection method used.
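Once both terms are available, the mixed loss combines them with the hyperparameters λ_e and λ_d. The sketch below assumes the usual weighted-sum form implied by that description; the default weights of 1.0 are placeholders rather than values from the source, and the detection loss is whatever the chosen plane detection method returns.

```python
def mixed_loss(edge_loss, detection_loss, lambda_e=1.0, lambda_d=1.0):
    """Assumed weighted sum: lambda_e * edge loss + lambda_d * detection loss.
    The default weights are placeholders, not values from the source."""
    return lambda_e * edge_loss + lambda_d * detection_loss

print(mixed_loss(0.5, 2.0, lambda_e=2.0, lambda_d=0.5))  # 2.0
```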

The foregoing describes exemplary embodiments of the invention. It should be understood that those skilled in the art may make simple variations, modifications or other equivalent arrangements without creative effort and without departing from the spirit of the invention.

Claims

1. An image plane area detection method based on edge information is characterized by comprising the following steps:

2. The image plane area detection method based on edge information according to claim 1, wherein: in step 1, the ResNet network structure is adopted, and the features output at each level are expressed as

3. The image plane area detection method based on edge information according to claim 2, wherein: in step 2, an FPN network is used to assist the ResNet neural network, and multi-level context features are extracted from the basic skeleton network features of several levels by upsampling and inter-layer connections;

Up represents upsampling.

4. The image plane area detection method based on edge information according to claim 3, wherein: in step 3, three intermediate-level features output by the ResNet network are selected

and used as input to an iterative optimization network layer composed of several channel smooth-reduction modules; the iterative optimization network layer smoothly reduces the channel number of each input feature by a factor of 2 at each step until the channel number of every input feature has been reduced to 256, thereby obtaining the edge features

5. The image plane area detection method based on edge information according to claim 4, wherein: sending the multi-level context features obtained in the step 2 and the multi-level edge features obtained in the step 3 into a resolution self-adaptive module for carrying out layer-by-layer pairwise fusion;

the resolution self-adaptive module is formed by combining a group of self-adaptive convolution kernels, and the process is defined as follows:


$Tr_4 = \mathrm{Conv1}(\cdot)$

Tr_i denotes the pairwise fusion operation of the i-th level and is implemented as a combination of convolution layers; Conv3(·) and Conv1(·) denote convolution operations with kernel sizes 3 and 1, respectively; and ∘ denotes the combined operation of multiple convolutional layers.

6. The image plane area detection method based on edge information according to claim 1, wherein: in step 5, when the target model is trained, a mixed loss of the target model is calculated; the mixed loss contains two terms, an edge loss and the loss of the original image plane area detection model, where the original image plane area detection model refers to the neural network used in step 1.

7. A computer readable storage medium, characterized in that a computer program is stored, which computer program, when executed, implements the steps of the method according to any of claims 1 to 6.