CN114972422A - Image sequence motion occlusion detection method and device, memory and processor - Google Patents
- Publication number
- CN114972422A (application CN202210491032.0A)
- Authority
- CN
- China
- Prior art keywords
- occlusion
- boundary
- feature map
- motion
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 30
- 238000000034 method Methods 0.000 claims abstract description 36
- 238000003062 neural network model Methods 0.000 claims abstract description 31
- 230000011218 segmentation Effects 0.000 claims abstract description 28
- 230000003287 optical effect Effects 0.000 claims abstract description 27
- 230000006870 function Effects 0.000 claims abstract description 15
- 238000013507 mapping Methods 0.000 claims description 12
- 238000012545 processing Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 238000004458 analytical method Methods 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 4
- 238000005070 sampling Methods 0.000 claims description 4
- 230000008569 process Effects 0.000 abstract description 5
- 238000009825 accumulation Methods 0.000 abstract description 3
- 230000000694 effects Effects 0.000 abstract description 3
- 238000001994 activation Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000004590 computer program Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The application discloses an image sequence motion occlusion detection method, device, memory and processor. The method comprises: acquiring any two consecutive frames of images; acquiring the dense optical flow field and the motion boundary region between the two frames; and analyzing the dense optical flow field and the motion boundary region as inputs to a semantic segmentation deep neural network model to obtain the occlusion detection result output by the model. The semantic segmentation deep neural network model adopts a multilayer accumulated loss function based on occlusion-boundary spatial information weights, embedding the spatial correlation of neighborhood pixels at the occlusion boundary into the learning process, so that the network model converges on details such as the motion occlusion boundary; the constructed network model is thus suited to motion occlusion detection and yields occlusion detection results with clear boundaries.
Description
Technical Field
The application relates to moving image sequence processing technology, and in particular to a moving image sequence occlusion detection method based on a semantic segmentation deep neural network architecture.
Background
Image sequence motion occlusion refers to the phenomenon in which a portion of pixels is visible in one frame of an image sequence and not visible in another. Detecting it is an important task in image processing and computer vision research: by detecting occlusion regions between different objects and scenes, or between different parts of objects, in an image sequence, it guides other computer vision tasks such as optical flow estimation, image registration, target segmentation and target tracking toward accurate computation. The research results are widely applied in military technology, medical image processing and analysis, aerospace, satellite cloud image analysis and other fields.
Traditional image sequence motion occlusion detection methods compare forward and backward motion estimates using motion symmetry, or detect occlusion by building models of geometric constraints, matching constraints and the like; however, when facing complex scenes or complex motion, these methods suffer from blurred occlusion regions and occlusion boundaries.
Disclosure of Invention
The embodiment of the application provides a method, a device, a memory and a processor for detecting motion occlusion of an image sequence, so as to at least solve the technical problem of blurred occlusion regions and occlusion boundaries in image sequence motion occlusion detection.
According to an aspect of the present application, there is provided an image sequence motion occlusion detection method, including:
acquiring any two continuous frames of images;
acquiring a dense optical flow field and a motion boundary area between the two frames of images;
analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
Further, in the present invention, D(σ) is obtained by the following formula:
wherein:
d_1(x) is the distance from pixels in the occlusion boundary region to the occlusion boundary;
d_2(x) is the distance from a point to the occlusion boundary region within the search window.
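The formula image for D(σ) is likewise not reproduced. Given that d_1(x) and d_2(x) are distances and σ is a search window radius, a Gaussian fall-off of the kind used for the U-Net boundary weight is one consistent reading (an assumption, not the patent's verbatim formula):

```latex
D(\sigma) = \exp\!\left( -\,\frac{\bigl(d_1(x) + d_2(x)\bigr)^2}{2\sigma^2} \right)
```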
Further, in the present invention, the occlusion boundary region is obtained by:
obtaining the occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion area from the real occlusion map to obtain the occlusion boundary region.
Further, in the present invention, the loss value of the semantic segmentation deep neural network model is the weighted sum of the per-layer losses, L = Σ_k ω_k L_k,
wherein ω_k represents the weight of each layer's occlusion prediction map.
Further, in the present invention, ω_k is taken to be the same for each layer.
Further, in the present invention, the structure of each layer of the decoder is as follows:
4 deconvolution modules stacked in succession, each deconvolution module sequentially performing one 4 × 4 deconvolution operation and two 7 × 7 convolution operations to obtain the feature map after deconvolution, with one normalization and one activation applied after each convolution operation;
a splicing module, which concatenates the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after deconvolution, and the up-sampled occlusion feature map from the preceding decoder layer into a spliced feature map, and performs a 3 × 3 convolution on the spliced feature map to generate the occlusion feature map; the occlusion feature map is then up-sampled to double resolution to serve as the up-sampled occlusion feature map of the next decoding layer;
in the first layer of the decoder, the splicing module obtains the spliced feature map by concatenating only the encoder feature map with the feature map output by the deconvolution modules.
Further, in the present invention, the acquiring a motion boundary region between the two images includes:
detecting the motion boundary of the dense optical flow field with an edge detector;
and expanding the motion boundary of the dense optical flow field with a dilation mask to obtain the motion boundary region.
In a second aspect of the present application, there is provided an image sequence motion occlusion detection apparatus, including:
the first acquisition module is used for acquiring any two continuous frames of images;
the second acquisition module is used for acquiring a dense optical flow field and a motion boundary between the two frames of images;
the analysis output module is used for analyzing the dense optical flow field and the motion boundary as input by utilizing a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
In a third aspect of the present application, there is provided a memory for storing software for performing the method of the first aspect of the present application.
In a fourth aspect of the present application, there is provided a processor for running software that performs the method of the first aspect of the present application.
Beneficial effects:
the application provides an image sequence motion occlusion detection method, which comprises the steps of obtaining two continuous frames of images; acquiring a dense optical flow field and a motion boundary area between the two frames of images; and analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model. In the semantic segmentation depth neural network model, a multilayer accumulation loss function based on the information weight of the occlusion boundary space is adopted, and the neighborhood pixel space correlation of the occlusion boundary is embedded into the learning process, so that the network model can be converged to the details such as the motion occlusion boundary, and the like, and the constructed network model is suitable for motion occlusion detection and obtains the occlusion detection effect with clear boundary.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments of the application are intended to be illustrative of the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for detecting occlusion due to motion of an image sequence according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a semantic segmentation deep neural network model according to an embodiment of the present application;
FIG. 3 is the first frame of the bamboo_1 image sequence in the MPI_Sintel dataset;
FIG. 4 is the second frame of the bamboo_1 image sequence in the MPI_Sintel dataset;
FIG. 5 is the occlusion map of the bamboo_1 image sequence in the MPI_Sintel dataset computed by the method according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the application provides an image sequence motion occlusion detection method, which creatively treats motion occlusion as semantic information between image sequences, constructs an occlusion detection neural network module using the encoder-decoder structure of a semantic segmentation deep neural network model, analyzes the occlusion information in the optical flow field of the image sequence, and designs a loss function better fitted to motion occlusion scenes, thereby achieving accurate detection of motion occlusion.
As shown in fig. 1, a method for detecting occlusion due to motion in an image sequence according to an embodiment of the present invention is shown, and the method includes the following steps:
s102, acquiring any two continuous frames of images.
As shown in fig. 3 and fig. 4, the two pictures provided by the embodiment of the present application are the frame_0043 and frame_0044 images of the bamboo_1 image sequence in the MPI_Sintel dataset, selected as the first frame image and the second frame image; the drawings present the grayscale versions of these two frames.
And S104, acquiring a dense optical flow field and a motion boundary area between the two frames of images.
In this embodiment, an optical flow convolutional neural network is used to estimate the dense optical flow field; a Sobel edge detector then detects the motion boundary of the dense optical flow field; finally, the motion boundary of the dense optical flow field is expanded with an h × h dilation mask to obtain the motion boundary region.
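A minimal sketch of this step, assuming the dense flow field is already available (synthetic here, in place of the optical flow network) and using SciPy's Sobel and binary-dilation operators as stand-ins for the patent's edge detector and h × h expansion mask; the threshold and mask size are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def motion_boundary_region(flow, thresh=1.0, h=5):
    """Detect motion boundaries of a dense optical flow field and
    dilate them into a motion boundary region.

    flow : (H, W, 2) array of per-pixel (u, v) displacements.
    Returns a boolean (H, W) mask of the motion boundary region.
    """
    # Sobel gradients of both flow components (stand-in for the
    # Sobel edge detector applied to the flow field)
    gx_u = ndimage.sobel(flow[..., 0], axis=1)
    gy_u = ndimage.sobel(flow[..., 0], axis=0)
    gx_v = ndimage.sobel(flow[..., 1], axis=1)
    gy_v = ndimage.sobel(flow[..., 1], axis=0)
    edge_strength = np.sqrt(gx_u**2 + gy_u**2 + gx_v**2 + gy_v**2)
    boundary = edge_strength > thresh
    # Expand the detected boundary with an h x h dilation mask
    return ndimage.binary_dilation(boundary, structure=np.ones((h, h)))

# Synthetic flow: left half static, right half moving 5 px to the right
flow = np.zeros((32, 32, 2))
flow[:, 16:, 0] = 5.0
region = motion_boundary_region(flow)
```

In a full pipeline the `flow` array would come from the optical flow network; here the step boundary between the static and moving halves is what the Sobel operator picks up.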
And S106, analyzing the dense optical flow field and the motion boundary as input by using a semantic segmentation depth neural network model, and obtaining an occlusion detection result output by the semantic segmentation depth neural network model.
In this embodiment, as shown in fig. 2, the input feature channels of the semantic segmentation deep neural network model are selected as 3.
Wherein the structure of each layer of the encoder is as follows:
4 convolution modules stacked in succession, each configured to perform one 3 × 3 convolution operation, followed by one normalization and one activation;
a pooling module, which performs a 2 × 2 max pooling operation.
Under this encoder structure, the number of feature channels doubles with each down-sampling, being 16, 32, 64, 128 and 256 respectively.
Wherein, the structure of each layer of the decoder is as follows:
4 deconvolution modules stacked in succession, each deconvolution module sequentially performing one 4 × 4 deconvolution operation and two 7 × 7 convolution operations to obtain the feature map after deconvolution, with one normalization and one activation applied after each convolution operation;
a splicing module, which concatenates the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after deconvolution, and the up-sampled occlusion feature map from the preceding decoder layer into a spliced feature map, and performs a 3 × 3 convolution on the spliced feature map to generate the occlusion feature map; the occlusion feature map is then up-sampled to double resolution to serve as the up-sampled occlusion feature map of the next decoding layer;
in the first layer of the decoder, the splicing module obtains the spliced feature map by concatenating only the encoder feature map with the feature map output by the deconvolution modules.
Each deconvolution module operation reduces the number of channels, to 256, 128, 64, 32, 16 and 1 respectively; the last layer does not introduce a deconvolution module, and the single-channel occlusion feature map is generated by a 3 × 3 convolution.
Occlusion detection is a binary semantic segmentation problem, and a loss function based on binary cross entropy is usually employed to train the neural network. However, motion-occluded pixels in an image sequence generally exhibit obvious sample skew: when non-occluded pixels greatly outnumber occluded pixels, the network loss value cannot well reflect the accuracy of occluded-pixel detection. Meanwhile, the designed network needs to converge well on details such as the motion occlusion boundary. Based on these two considerations, this embodiment designs a multilayer accumulated loss function based on occlusion-boundary spatial information weights. Specifically, the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
in the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes a real number domain;
k x a predicted value in each channel of the occlusion feature map input for the last layer of the decoder;
a(k x ) Indicating that k is to be signed by a Sigmoid function x Mapping to the (0, 1) interval to form an activation value of a shielding mapping value;
o (x) represents an occlusion label for each pixel x, taking 0 or 1, for distinguishing whether it is occluded or not;
o is an occlusion area, and B is an occlusion boundary area;
ω 0 (x) Is the occlusion region weight;
ω b an initial weight of the occlusion boundary region;
d (σ) is a distance function based on the search window radius σ.
In the present embodiment, D(σ) is obtained by the following formula:
wherein:
d_1(x) is the distance from pixels in the occlusion boundary region to the occlusion boundary;
d_2(x) is the distance from a point to the occlusion boundary region within the search window.
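The formula images for L_k and D(σ) are not reproduced in this text, so the exact combination of the listed parameters is not fixed here. The sketch below assumes a U-Net-style weighted binary cross-entropy in which ω_0(x) is a base weight and a boundary term ω_b · D(σ), with an assumed D(σ) = exp(−(d_1 + d_2)² / (2σ²)), boosts pixels in the occlusion boundary region B; all names and the exact form are illustrative assumptions:

```python
import numpy as np

def layer_loss(k_x, o, omega0, omega_b, d1, d2, sigma, in_B):
    """Hypothetical per-layer loss L_k (assumed form, see lead-in):
    a boundary-weighted binary cross-entropy over all pixels x in Omega."""
    a = 1.0 / (1.0 + np.exp(-k_x))                     # Sigmoid activation a(k_x)
    D = np.exp(-((d1 + d2) ** 2) / (2.0 * sigma**2))   # assumed D(sigma)
    w = omega0 + in_B * omega_b * D                    # per-pixel weight map
    eps = 1e-7                                         # numerical safety for log
    bce = -(o * np.log(a + eps) + (1.0 - o) * np.log(1.0 - a + eps))
    return float(np.sum(w * bce))

# Toy 4x4 example: right half occluded, boundary between columns 1 and 2
o = np.zeros((4, 4)); o[:, 2:] = 1.0
k_x = np.where(o > 0, 4.0, -4.0)            # confident, correct logits
in_B = np.zeros((4, 4)); in_B[:, 1:3] = 1.0  # occlusion boundary region B
d1 = np.zeros((4, 4)); d2 = np.zeros((4, 4))
loss = layer_loss(k_x, o, omega0=1.0, omega_b=5.0, d1=d1, d2=d2,
                  sigma=3.0, in_B=in_B)
```

Pixels inside B receive the extra ω_b · D(σ) weight, so mistakes near the occlusion boundary cost more than mistakes elsewhere, which is the stated purpose of the weighting.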
Based on a semantic segmentation deep neural network architecture, the method improves the accuracy of the neural network model in detecting occlusion regions and occlusion boundaries by introducing the motion boundary input and designing a multilayer accumulated loss function based on occlusion-boundary spatial information weights; it achieves higher computational precision and better adaptability to complex scenes and complex-motion image sequences, and can be effectively applied to visual tasks of image sequence motion analysis.
In this embodiment, the occlusion boundary region is obtained by the following method:
obtaining the occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion area from the real occlusion map to obtain the occlusion boundary region.
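The three steps above can be sketched directly with morphological operators, implemented literally as stated; the erosion-based boundary extraction and the h × h mask size are assumptions, since the source's formula images are not reproduced here:

```python
import numpy as np
from scipy import ndimage

def occlusion_boundary_region(occ_true, h=3):
    """Derive the occlusion boundary region from a real (ground-truth)
    occlusion map, following the three stated steps literally.

    occ_true : boolean (H, W) ground-truth occlusion map.
    h        : assumed size of the dilation mask.
    """
    # Step 1: occlusion boundary = occluded pixels with a non-occluded neighbor
    boundary = occ_true & ~ndimage.binary_erosion(occ_true)
    # Step 2: mask-dilate the boundary into an expanded occlusion area
    expanded = ndimage.binary_dilation(boundary, structure=np.ones((h, h)))
    # Step 3: subtract the expanded area from the real occlusion map
    return occ_true & ~expanded

# Toy example: a 6x6 occluded block inside a 12x12 image
occ = np.zeros((12, 12), dtype=bool)
occ[3:9, 3:9] = True
B = occlusion_boundary_region(occ, h=3)
```

Read literally, the result consists of the occluded pixels lying outside the dilated boundary band; the patent's omitted formulas would fix the intended convention precisely.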
This embodiment adopts supervised learning; the real occlusion map is the learning target and is likewise obtained from the MPI_Sintel dataset. In the embodiment of the application, the occlusion boundary region is derived from the real occlusion map and acts on the distribution of the weights, so that the method can express the occlusion boundary clearly.
In this embodiment, the real occlusion map is down-sampled according to the size of each layer's occlusion prediction map, the loss function defined above is applied, and the loss value of the semantic segmentation deep neural network model is finally obtained as the weighted sum of the per-layer losses, L = Σ_k ω_k L_k,
wherein ω_k represents the weight of each layer's occlusion prediction map.
In this embodiment, ω_k is taken to be the same, 0.5, for each layer.
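A sketch of the multilayer accumulation, assuming nearest-neighbor down-sampling of the real occlusion map to each prediction scale and a plain binary cross-entropy standing in for the per-layer loss L_k defined above; the names and the simple stand-in loss are illustrative:

```python
import numpy as np

def bce(pred_logit, target):
    """Plain binary cross-entropy, a stand-in for the per-layer loss L_k."""
    a = 1.0 / (1.0 + np.exp(-pred_logit))
    eps = 1e-7
    return float(np.mean(-(target * np.log(a + eps)
                           + (1.0 - target) * np.log(1.0 - a + eps))))

def total_loss(layer_logits, occ_true, omega_k=0.5):
    """Accumulate L = sum_k omega_k * L_k, down-sampling the real
    occlusion map to each layer's prediction size (nearest neighbor);
    omega_k = 0.5 for every layer, as in this embodiment."""
    total = 0.0
    for logits in layer_logits:
        factor = occ_true.shape[0] // logits.shape[0]
        target = occ_true[::factor, ::factor].astype(float)
        total += omega_k * bce(logits, target)
    return total

occ_true = np.zeros((16, 16), dtype=bool)
occ_true[:, 8:] = True
# Hypothetical per-layer predictions at 1/4, 1/2 and full resolution
layer_logits = [np.zeros((4, 4)), np.zeros((8, 8)), np.zeros((16, 16))]
L = total_loss(layer_logits, occ_true)
```

With all-zero logits every pixel is predicted at 0.5, so each layer contributes ln 2 and the accumulated loss is 0.5 × 3 × ln 2.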
As the occlusion detection result in fig. 5 shows, the method improves the accuracy of image sequence motion occlusion detection, attains higher motion occlusion detection precision for complex scenes and complex-motion image sequences, and has broad application prospects in fields such as medical segmentation and video surveillance.
According to a second aspect of the present application, there is provided an image sequence motion occlusion detection apparatus, comprising:
the first acquisition module is used for acquiring any two continuous frames of images;
the second acquisition module is used for acquiring a dense optical flow field and a motion boundary area between the two frames of images;
the analysis output module is used for analyzing the dense optical flow field and the motion boundary region as input by utilizing a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the meaning of each parameter is as follows:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
According to yet another aspect of the application, a processor is provided for running the software that performs the image sequence motion occlusion detection method.
According to yet another aspect of the present application, a memory is provided for storing software for executing the method for detecting occlusion in motion of an image sequence.
It should be noted that, the method for detecting motion occlusion in an image sequence executed by the software is the same as the method for detecting motion occlusion in an image sequence described above, and is not described herein again.
In this embodiment, an electronic device is provided, comprising a memory in which a computer program is stored and a processor configured to run the computer program to perform the method in the above embodiments.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks, and corresponding steps may be implemented by different modules.
The programs described above may be run on a processor or may also be stored in memory (or referred to as computer-readable media), which includes volatile and non-volatile, removable and non-removable media that implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
Claims (10)
1. An image sequence motion occlusion detection method, characterized by comprising the following steps:
acquiring any two continuous frame images;
acquiring a dense optical flow field and a motion boundary area between the two frames of images;
analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
3. The method of claim 1, wherein: the occlusion boundary area is obtained by the following method:
obtaining an occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion region from the real occlusion image to obtain the occlusion boundary region.
5. The method of claim 4, wherein: the ω_k is taken to be the same for each layer.
6. The method according to claim 5, characterized in that each layer of the decoder is structured as follows:
four successively stacked deconvolution modules, each of which sequentially performs one 4×4 deconvolution operation and two 7×7 convolution operations to obtain a feature map after the deconvolution operation, with a normalization and an activation applied after each convolution operation;
a concatenation module for concatenating the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after the deconvolution operation, and the up-sampled occlusion feature map produced by the preceding decoder layer, to obtain a concatenated feature map, and for performing a 3×3 convolution on the concatenated feature map to generate an occlusion feature map; this occlusion feature map is doubled in resolution by an up-sampling operation to serve as the up-sampled occlusion feature map for the next decoder layer;
when the concatenation module of the first decoder layer performs concatenation, it concatenates the encoder feature map with the feature map output by the deconvolution module to obtain the concatenated feature map.
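A minimal PyTorch sketch of one decoder layer as described in claim 6; the claim fixes only the kernel sizes and the processing order, so the channel widths, ReLU activation, BatchNorm normalization, and bilinear up-sampling below are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderLayer(nn.Module):
    """One decoder layer: a 4x4 deconvolution, two 7x7 convolutions (each
    followed by normalization and activation), then concatenation with the
    encoder skip feature and the up-sampled occlusion map from the previous
    layer, and a 3x3 convolution producing an occlusion feature map."""

    def __init__(self, in_ch, skip_ch, out_ch, occ_ch=1):
        super().__init__()
        # 4x4 deconvolution that doubles the spatial resolution
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.conv1 = nn.Conv2d(out_ch, out_ch, 7, padding=3)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 7, padding=3)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 3x3 convolution over [encoder skip, deconv output, up-sampled occlusion map]
        self.occ_head = nn.Conv2d(skip_ch + out_ch + occ_ch, occ_ch, 3, padding=1)

    def forward(self, x, skip, prev_occ):
        x = self.deconv(x)                    # one 4x4 deconvolution operation
        x = F.relu(self.bn1(self.conv1(x)))   # 7x7 conv + normalization + activation
        x = F.relu(self.bn2(self.conv2(x)))   # second 7x7 conv + norm + activation
        # double the resolution of the previous layer's occlusion feature map
        up_occ = F.interpolate(prev_occ, scale_factor=2,
                               mode="bilinear", align_corners=False)
        occ = self.occ_head(torch.cat([skip, x, up_occ], dim=1))
        return x, occ                         # occ is up-sampled for the next layer
```

A first-layer variant would simply drop `prev_occ` from the concatenation, as in the last paragraph of the claim.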
7. The method according to any one of claims 1 to 6, characterized in that acquiring the motion boundary region between the two frame images comprises:
detecting the motion boundary of the dense optical flow field with an edge detector;
and dilating the motion boundary of the dense optical flow field with a dilation mask to obtain the motion boundary region.
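The two steps of claim 7 can be sketched as follows. The claim does not name a particular edge detector, so a threshold on the flow-gradient magnitude stands in for it here, and the 3×3 dilation mask is likewise an assumption:

```python
import numpy as np

def motion_boundary_region(flow, edge_thresh=1.0):
    """Detect motion boundaries of a dense optical flow field with an edge
    detector, then dilate them into a motion boundary region.

    flow : (H, W, 2) array of per-pixel (u, v) optical flow vectors.
    """
    # edge strength: summed gradient magnitude of both flow components
    gy_u, gx_u = np.gradient(flow[..., 0])
    gy_v, gx_v = np.gradient(flow[..., 1])
    edge_strength = np.hypot(gx_u, gy_u) + np.hypot(gx_v, gy_v)
    boundary = (edge_strength > edge_thresh).astype(np.uint8)
    # dilate the detected boundary with a 3x3 mask into a boundary region
    h, w = boundary.shape
    padded = np.pad(boundary, 1)
    region = np.zeros_like(boundary)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            region = np.maximum(region, padded[dy:dy + h, dx:dx + w])
    return region
```

A flow field with a sharp discontinuity (e.g. a moving object against a static background) yields a thin boundary that the dilation widens into a band.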
8. An image sequence motion occlusion detection device, characterized by comprising:
a first acquisition module for acquiring any two consecutive frame images;
a second acquisition module for acquiring a dense optical flow field and a motion boundary region between the two frame images;
an analysis output module for analyzing the dense optical flow field and the motion boundary region as inputs by using a semantic segmentation deep neural network model, to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein a loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
where the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real-number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes the activation value obtained by mapping k_x to the (0, 1) interval to form an occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking the value 0 or 1;
O is the occlusion region and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
d(σ) is a distance function based on the search window radius σ.
9. A memory for storing software, characterized in that the software is adapted to perform the method of any one of claims 1 to 7.
10. A processor for running software, characterized in that the software is adapted to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210491032.0A CN114972422A (en) | 2022-05-07 | 2022-05-07 | Image sequence motion occlusion detection method and device, memory and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114972422A true CN114972422A (en) | 2022-08-30 |
Family
ID=82980963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210491032.0A Pending CN114972422A (en) | 2022-05-07 | 2022-05-07 | Image sequence motion occlusion detection method and device, memory and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114972422A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200084427A1 (en) * | 2018-09-12 | 2020-03-12 | Nvidia Corporation | Scene flow estimation using shared features |
CN110992367A (en) * | 2019-10-31 | 2020-04-10 | 北京交通大学 | Method for performing semantic segmentation on image with shielding area |
CN111401308A (en) * | 2020-04-08 | 2020-07-10 | 蚌埠学院 | Fish behavior video identification method based on optical flow effect |
CN112347852A (en) * | 2020-10-10 | 2021-02-09 | 上海交通大学 | Target tracking and semantic segmentation method and device for sports video and plug-in |
CN113888604A (en) * | 2021-09-27 | 2022-01-04 | 安徽清新互联信息科技有限公司 | Target tracking method based on depth optical flow |
US20220101539A1 (en) * | 2020-09-30 | 2022-03-31 | Qualcomm Incorporated | Sparse optical flow estimation |
Non-Patent Citations (2)
Title |
---|
LIU Yu et al.: "Better Dense Trajectories by Motion in Videos", IEEE Transactions on Cybernetics, vol. 49, no. 1, 28 November 2017, pages 159-170, XP011700745, DOI: 10.1109/TCYB.2017.2769097 * |
GE Liyue et al.: "Variational Optical Flow Computation Method Based on Motion-Optimized Semantic Segmentation", Pattern Recognition and Artificial Intelligence, vol. 34, no. 7, 15 July 2021, pages 631-645 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dolson et al. | Upsampling range data in dynamic environments | |
US8755563B2 (en) | Target detecting method and apparatus | |
JP2008518331A (en) | Understanding video content through real-time video motion analysis | |
CN112329702B (en) | Method and device for rapid face density prediction and face detection, electronic equipment and storage medium | |
CN109300151B (en) | Image processing method and device and electronic equipment | |
CN109377499B (en) | Pixel-level object segmentation method and device | |
US20170018106A1 (en) | Method and device for processing a picture | |
CN111311611B (en) | Real-time three-dimensional large-scene multi-object instance segmentation method | |
CN111986472B (en) | Vehicle speed determining method and vehicle | |
WO2016120132A1 (en) | Method and apparatus for generating an initial superpixel label map for an image | |
CN111382647B (en) | Picture processing method, device, equipment and storage medium | |
CN111415300A (en) | Splicing method and system for panoramic image | |
CN113269722A (en) | Training method for generating countermeasure network and high-resolution image reconstruction method | |
Patil et al. | End-to-end recurrent generative adversarial network for traffic and surveillance applications | |
Kim et al. | High-quality depth map up-sampling robust to edge noise of range sensors | |
CN112465029A (en) | Instance tracking method and device | |
CN112906614A (en) | Pedestrian re-identification method and device based on attention guidance and storage medium | |
US9659372B2 (en) | Video disparity estimate space-time refinement method and codec | |
CN111260686B (en) | Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window | |
CN111881914A (en) | License plate character segmentation method and system based on self-learning threshold | |
Lee et al. | Integrating wavelet transformation with Markov random field analysis for the depth estimation of light‐field images | |
CN116468968A (en) | Astronomical image small target detection method integrating attention mechanism | |
CN114972422A (en) | Image sequence motion occlusion detection method and device, memory and processor | |
EP4235492A1 (en) | A computer-implemented method, data processing apparatus and computer program for object detection | |
Dong et al. | Monocular visual-IMU odometry using multi-channel image patch exemplars |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||