CN114972422A - Image sequence motion occlusion detection method and device, memory and processor - Google Patents
- Publication number
- CN114972422A (application CN202210491032.0A)
- Authority
- CN
- China
- Prior art keywords
- occlusion
- boundary
- feature map
- motion
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 30
- 238000000034 method Methods 0.000 claims abstract description 36
- 238000003062 neural network model Methods 0.000 claims abstract description 31
- 230000011218 segmentation Effects 0.000 claims abstract description 28
- 230000003287 optical effect Effects 0.000 claims abstract description 27
- 230000006870 function Effects 0.000 claims abstract description 15
- 238000013507 mapping Methods 0.000 claims description 12
- 238000012545 processing Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 238000004458 analytical method Methods 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 4
- 238000005070 sampling Methods 0.000 claims description 4
- 230000008569 process Effects 0.000 abstract description 5
- 238000009825 accumulation Methods 0.000 abstract description 3
- 230000000694 effects Effects 0.000 abstract description 3
- 238000001994 activation Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000004590 computer program Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The application discloses an image sequence motion occlusion detection method, device, memory and processor. The method comprises: acquiring any two consecutive frames of images; acquiring the dense optical flow field and the motion boundary region between the two frames; and analyzing the dense optical flow field and the motion boundary region as inputs to a semantic segmentation deep neural network model to obtain the occlusion detection result output by the model. The semantic segmentation deep neural network model adopts a multilayer accumulated loss function based on occlusion-boundary spatial information weights, embedding the spatial correlation of neighborhood pixels at the occlusion boundary into the learning process, so that the network model converges on details such as the motion occlusion boundary; the constructed network model is thus suited to motion occlusion detection and yields occlusion detection results with clear boundaries.
Description
Technical Field
The application relates to moving image sequence processing technology, and in particular to a moving image sequence occlusion detection method based on a semantic segmentation deep neural network architecture.
Background
Image sequence motion occlusion refers to the phenomenon in which a portion of pixels is visible in one frame of an image sequence and not visible in another. Detecting it is an important task in image processing and computer vision research: by detecting occlusion regions between different objects and scenes, or between different parts of objects, in an image sequence, it guides other computer vision tasks such as optical flow estimation, image registration, target segmentation and target tracking toward accurate computation. The research results are widely applied in military technology, medical image processing and analysis, aerospace, satellite cloud image analysis and other fields.
Traditional image sequence motion occlusion detection methods compare forward and backward motion estimates using motion symmetry, or detect occlusion by building models of geometric constraints, matching constraints and the like; however, when facing complex scenes or complex motion, these methods suffer from blurred occlusion regions and occlusion boundaries.
Disclosure of Invention
The embodiment of the application provides a method, a device, a memory and a processor for detecting motion occlusion of an image sequence, so as to at least solve the technical problem of blurred occlusion regions and occlusion boundaries in image sequence motion occlusion detection.
According to an aspect of the present application, there is provided an image sequence motion occlusion detection method, including:
acquiring any two continuous frames of images;
acquiring a dense optical flow field and a motion boundary area between the two frames of images;
analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
Further, in the present invention, D(σ) is obtained by the following formula:
wherein:
d_1(x) is the distance from pixels in the occlusion boundary region to the occlusion boundary;
d_2(x) is the distance from a point to the occlusion boundary region within the search window.
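The formula image for D(σ) is likewise not reproduced. Given that d_1(x) and d_2(x) are distances and σ is a search window radius, a Gaussian fall-off of the kind used for the U-Net boundary weight is one consistent reading (an assumption, not the patent's verbatim formula):

```latex
D(\sigma) = \exp\!\left( -\,\frac{\bigl(d_1(x) + d_2(x)\bigr)^2}{2\sigma^2} \right)
```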
Further, in the present invention, the occlusion boundary region is obtained by:
obtaining the occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion area from the real occlusion map to obtain the occlusion boundary region.
Further, in the present invention, the loss value of the semantic segmentation deep neural network model is the weighted sum of the per-layer losses, L = Σ_k ω_k L_k,
wherein ω_k represents the weight of each layer's occlusion prediction map.
Further, in the present invention, ω_k is taken to be the same for each layer.
Further, in the present invention, the structure of each layer of the decoder is as follows:
4 deconvolution modules stacked in succession, each deconvolution module sequentially performing one 4 × 4 deconvolution operation and two 7 × 7 convolution operations to obtain the feature map after deconvolution, with one normalization and one activation applied after each convolution operation;
a splicing module, which concatenates the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after deconvolution, and the up-sampled occlusion feature map from the preceding decoder layer into a spliced feature map, and performs a 3 × 3 convolution on the spliced feature map to generate the occlusion feature map; the occlusion feature map is then up-sampled to double resolution to serve as the up-sampled occlusion feature map of the next decoding layer;
in the first layer of the decoder, the splicing module obtains the spliced feature map by concatenating only the encoder feature map with the feature map output by the deconvolution modules.
Further, in the present invention, the acquiring a motion boundary region between the two images includes:
detecting the motion boundary of the dense optical flow field with an edge detector;
and expanding the motion boundary of the dense optical flow field with a dilation mask to obtain the motion boundary region.
In a second aspect of the present application, there is provided an image sequence motion occlusion detection apparatus, including:
the first acquisition module is used for acquiring any two continuous frames of images;
the second acquisition module is used for acquiring a dense optical flow field and a motion boundary between the two frames of images;
the analysis output module is used for analyzing the dense optical flow field and the motion boundary as input by utilizing a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
In a third aspect of the present application, there is provided a memory for storing software for performing the method of the first aspect of the present application.
In a fourth aspect of the present application, there is provided a processor for running software that performs the method of the first aspect of the present application.
Beneficial effects:
the application provides an image sequence motion occlusion detection method, which comprises the steps of obtaining two continuous frames of images; acquiring a dense optical flow field and a motion boundary area between the two frames of images; and analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model. In the semantic segmentation depth neural network model, a multilayer accumulation loss function based on the information weight of the occlusion boundary space is adopted, and the neighborhood pixel space correlation of the occlusion boundary is embedded into the learning process, so that the network model can be converged to the details such as the motion occlusion boundary, and the like, and the constructed network model is suitable for motion occlusion detection and obtains the occlusion detection effect with clear boundary.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments of the application are intended to be illustrative of the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for detecting occlusion due to motion of an image sequence according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a semantic segmentation deep neural network model according to an embodiment of the present application;
FIG. 3 is the first frame of the bamboo_1 image sequence in the MPI_Sintel dataset;
FIG. 4 is the second frame of the bamboo_1 image sequence in the MPI_Sintel dataset;
FIG. 5 is the occlusion map of the bamboo_1 image sequence in the MPI_Sintel dataset computed by the method according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the application provides an image sequence motion occlusion detection method, which creatively treats motion occlusion as semantic information between image sequences, constructs an occlusion detection neural network module using the encoder-decoder structure of a semantic segmentation deep neural network model, analyzes the occlusion information in the optical flow field of the image sequence, and designs a loss function better fitted to motion occlusion scenes, thereby achieving accurate detection of motion occlusion.
As shown in fig. 1, a method for detecting occlusion due to motion in an image sequence according to an embodiment of the present invention is shown, and the method includes the following steps:
s102, acquiring any two continuous frames of images.
As shown in fig. 3 and fig. 4, the two pictures provided by the embodiment of the present application are the frame_0043 and frame_0044 images of the bamboo_1 image sequence in the MPI_Sintel dataset, selected as the first frame image and the second frame image; the drawings present the grayscale versions of these two frames.
And S104, acquiring a dense optical flow field and a motion boundary area between the two frames of images.
In this embodiment, an optical flow convolutional neural network is used to estimate the dense optical flow field; a Sobel edge detector then detects the motion boundary of the dense optical flow field; finally, the motion boundary of the dense optical flow field is expanded with an h × h dilation mask to obtain the motion boundary region.
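A minimal sketch of this step, assuming the dense flow field is already available (synthetic here, in place of the optical flow network) and using SciPy's Sobel and binary-dilation operators as stand-ins for the patent's edge detector and h × h expansion mask; the threshold and mask size are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def motion_boundary_region(flow, thresh=1.0, h=5):
    """Detect motion boundaries of a dense optical flow field and
    dilate them into a motion boundary region.

    flow : (H, W, 2) array of per-pixel (u, v) displacements.
    Returns a boolean (H, W) mask of the motion boundary region.
    """
    # Sobel gradients of both flow components (stand-in for the
    # Sobel edge detector applied to the flow field)
    gx_u = ndimage.sobel(flow[..., 0], axis=1)
    gy_u = ndimage.sobel(flow[..., 0], axis=0)
    gx_v = ndimage.sobel(flow[..., 1], axis=1)
    gy_v = ndimage.sobel(flow[..., 1], axis=0)
    edge_strength = np.sqrt(gx_u**2 + gy_u**2 + gx_v**2 + gy_v**2)
    boundary = edge_strength > thresh
    # Expand the detected boundary with an h x h dilation mask
    return ndimage.binary_dilation(boundary, structure=np.ones((h, h)))

# Synthetic flow: left half static, right half moving 5 px to the right
flow = np.zeros((32, 32, 2))
flow[:, 16:, 0] = 5.0
region = motion_boundary_region(flow)
```

In a full pipeline the `flow` array would come from the optical flow network; here the step boundary between the static and moving halves is what the Sobel operator picks up.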
And S106, analyzing the dense optical flow field and the motion boundary as input by using a semantic segmentation depth neural network model, and obtaining an occlusion detection result output by the semantic segmentation depth neural network model.
In this embodiment, as shown in fig. 2, the input feature channels of the semantic segmentation deep neural network model are selected as 3.
Wherein the structure of each layer of the encoder is as follows:
4 convolution modules stacked in succession, each configured to perform one 3 × 3 convolution operation, followed by one normalization and one activation;
a pooling module, which performs a 2 × 2 max pooling operation.
Under this encoder structure, the number of feature channels doubles with each down-sampling, being 16, 32, 64, 128 and 256 respectively.
Wherein, the structure of each layer of the decoder is as follows:
4 deconvolution modules stacked in succession, each deconvolution module sequentially performing one 4 × 4 deconvolution operation and two 7 × 7 convolution operations to obtain the feature map after deconvolution, with one normalization and one activation applied after each convolution operation;
a splicing module, which concatenates the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after deconvolution, and the up-sampled occlusion feature map from the preceding decoder layer into a spliced feature map, and performs a 3 × 3 convolution on the spliced feature map to generate the occlusion feature map; the occlusion feature map is then up-sampled to double resolution to serve as the up-sampled occlusion feature map of the next decoding layer;
in the first layer of the decoder, the splicing module obtains the spliced feature map by concatenating only the encoder feature map with the feature map output by the deconvolution modules.
Each deconvolution module operation reduces the number of channels, to 256, 128, 64, 32, 16 and 1 respectively; the last layer does not introduce a deconvolution module, and the single-channel occlusion feature map is generated by a 3 × 3 convolution.
Occlusion detection is a binary semantic segmentation problem, and a loss function based on binary cross entropy is usually employed to train the neural network. However, motion-occluded pixels in an image sequence generally exhibit obvious sample skew: when non-occluded pixels greatly outnumber occluded pixels, the network loss value cannot well reflect the accuracy of occluded-pixel detection. Meanwhile, the designed network needs to converge well on details such as the motion occlusion boundary. Based on these two considerations, this embodiment designs a multilayer accumulated loss function based on occlusion-boundary spatial information weights. Specifically, the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
in the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes a real number domain;
k x a predicted value in each channel of the occlusion feature map input for the last layer of the decoder;
a(k x ) Indicating that k is to be signed by a Sigmoid function x Mapping to the (0, 1) interval to form an activation value of a shielding mapping value;
o (x) represents an occlusion label for each pixel x, taking 0 or 1, for distinguishing whether it is occluded or not;
o is an occlusion area, and B is an occlusion boundary area;
ω 0 (x) Is the occlusion region weight;
ω b an initial weight of the occlusion boundary region;
d (σ) is a distance function based on the search window radius σ.
In the present embodiment, D(σ) is obtained by the following formula:
wherein:
d_1(x) is the distance from pixels in the occlusion boundary region to the occlusion boundary;
d_2(x) is the distance from a point to the occlusion boundary region within the search window.
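The formula images for L_k and D(σ) are not reproduced in this text, so the exact combination of the listed parameters is not fixed here. The sketch below assumes a U-Net-style weighted binary cross-entropy in which ω_0(x) is a base weight and a boundary term ω_b · D(σ), with an assumed D(σ) = exp(−(d_1 + d_2)² / (2σ²)), boosts pixels in the occlusion boundary region B; all names and the exact form are illustrative assumptions:

```python
import numpy as np

def layer_loss(k_x, o, omega0, omega_b, d1, d2, sigma, in_B):
    """Hypothetical per-layer loss L_k (assumed form, see lead-in):
    a boundary-weighted binary cross-entropy over all pixels x in Omega."""
    a = 1.0 / (1.0 + np.exp(-k_x))                     # Sigmoid activation a(k_x)
    D = np.exp(-((d1 + d2) ** 2) / (2.0 * sigma**2))   # assumed D(sigma)
    w = omega0 + in_B * omega_b * D                    # per-pixel weight map
    eps = 1e-7                                         # numerical safety for log
    bce = -(o * np.log(a + eps) + (1.0 - o) * np.log(1.0 - a + eps))
    return float(np.sum(w * bce))

# Toy 4x4 example: right half occluded, boundary between columns 1 and 2
o = np.zeros((4, 4)); o[:, 2:] = 1.0
k_x = np.where(o > 0, 4.0, -4.0)            # confident, correct logits
in_B = np.zeros((4, 4)); in_B[:, 1:3] = 1.0  # occlusion boundary region B
d1 = np.zeros((4, 4)); d2 = np.zeros((4, 4))
loss = layer_loss(k_x, o, omega0=1.0, omega_b=5.0, d1=d1, d2=d2,
                  sigma=3.0, in_B=in_B)
```

Pixels inside B receive the extra ω_b · D(σ) weight, so mistakes near the occlusion boundary cost more than mistakes elsewhere, which is the stated purpose of the weighting.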
Based on a semantic segmentation deep neural network architecture, the method improves the accuracy of the neural network model in detecting occlusion regions and occlusion boundaries by introducing the motion boundary input and designing a multilayer accumulated loss function based on occlusion-boundary spatial information weights; it achieves higher computational precision and better adaptability to complex scenes and complex-motion image sequences, and can be effectively applied to visual tasks of image sequence motion analysis.
In this embodiment, the occlusion boundary region is obtained by the following method:
obtaining the occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion area from the real occlusion map to obtain the occlusion boundary region.
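The three steps above can be sketched directly with morphological operators, implemented literally as stated; the erosion-based boundary extraction and the h × h mask size are assumptions, since the source's formula images are not reproduced here:

```python
import numpy as np
from scipy import ndimage

def occlusion_boundary_region(occ_true, h=3):
    """Derive the occlusion boundary region from a real (ground-truth)
    occlusion map, following the three stated steps literally.

    occ_true : boolean (H, W) ground-truth occlusion map.
    h        : assumed size of the dilation mask.
    """
    # Step 1: occlusion boundary = occluded pixels with a non-occluded neighbor
    boundary = occ_true & ~ndimage.binary_erosion(occ_true)
    # Step 2: mask-dilate the boundary into an expanded occlusion area
    expanded = ndimage.binary_dilation(boundary, structure=np.ones((h, h)))
    # Step 3: subtract the expanded area from the real occlusion map
    return occ_true & ~expanded

# Toy example: a 6x6 occluded block inside a 12x12 image
occ = np.zeros((12, 12), dtype=bool)
occ[3:9, 3:9] = True
B = occlusion_boundary_region(occ, h=3)
```

Read literally, the result consists of the occluded pixels lying outside the dilated boundary band; the patent's omitted formulas would fix the intended convention precisely.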
This embodiment adopts supervised learning; the real occlusion map is the learning target and is likewise obtained from the MPI_Sintel dataset. In the embodiment of the application, the occlusion boundary region is derived from the real occlusion map and acts on the distribution of the weights, so that the method can express the occlusion boundary clearly.
In this embodiment, the real occlusion map is down-sampled according to the size of each layer's occlusion prediction map, the loss function defined above is applied, and the loss value of the semantic segmentation deep neural network model is finally obtained as the weighted sum of the per-layer losses, L = Σ_k ω_k L_k,
wherein ω_k represents the weight of each layer's occlusion prediction map.
In this embodiment, ω_k is taken to be the same, 0.5, for each layer.
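A sketch of the multilayer accumulation, assuming nearest-neighbor down-sampling of the real occlusion map to each prediction scale and a plain binary cross-entropy standing in for the per-layer loss L_k defined above; the names and the simple stand-in loss are illustrative:

```python
import numpy as np

def bce(pred_logit, target):
    """Plain binary cross-entropy, a stand-in for the per-layer loss L_k."""
    a = 1.0 / (1.0 + np.exp(-pred_logit))
    eps = 1e-7
    return float(np.mean(-(target * np.log(a + eps)
                           + (1.0 - target) * np.log(1.0 - a + eps))))

def total_loss(layer_logits, occ_true, omega_k=0.5):
    """Accumulate L = sum_k omega_k * L_k, down-sampling the real
    occlusion map to each layer's prediction size (nearest neighbor);
    omega_k = 0.5 for every layer, as in this embodiment."""
    total = 0.0
    for logits in layer_logits:
        factor = occ_true.shape[0] // logits.shape[0]
        target = occ_true[::factor, ::factor].astype(float)
        total += omega_k * bce(logits, target)
    return total

occ_true = np.zeros((16, 16), dtype=bool)
occ_true[:, 8:] = True
# Hypothetical per-layer predictions at 1/4, 1/2 and full resolution
layer_logits = [np.zeros((4, 4)), np.zeros((8, 8)), np.zeros((16, 16))]
L = total_loss(layer_logits, occ_true)
```

With all-zero logits every pixel is predicted at 0.5, so each layer contributes ln 2 and the accumulated loss is 0.5 × 3 × ln 2.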
As the occlusion detection result in fig. 5 shows, the method improves the accuracy of image sequence motion occlusion detection, attains higher motion occlusion detection precision for complex scenes and complex-motion image sequences, and has broad application prospects in fields such as medical segmentation and video surveillance.
According to a second aspect of the present application, there is provided an image sequence motion occlusion detection apparatus, comprising:
the first acquisition module is used for acquiring any two continuous frames of images;
the second acquisition module is used for acquiring a dense optical flow field and a motion boundary area between the two frames of images;
the analysis output module is used for analyzing the dense optical flow field and the motion boundary region as input by utilizing a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the meaning of each parameter is as follows:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
According to yet another aspect of the application, a processor is provided for running the software that performs the image sequence motion occlusion detection method.
According to yet another aspect of the present application, a memory is provided for storing software for executing the method for detecting occlusion in motion of an image sequence.
It should be noted that, the method for detecting motion occlusion in an image sequence executed by the software is the same as the method for detecting motion occlusion in an image sequence described above, and is not described herein again.
In this embodiment, an electronic device is provided, comprising a memory in which a computer program is stored and a processor configured to run the computer program to perform the method in the above embodiments.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks, and corresponding steps may be implemented by different modules.
The programs described above may be run on a processor or may also be stored in memory (or referred to as computer-readable media), which includes volatile and non-volatile, removable and non-removable media that implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
Claims (10)
1. An image sequence motion occlusion detection method, characterized by comprising the following steps:
acquiring any two continuous frame images;
acquiring a dense optical flow field and a motion boundary area between the two frames of images;
analyzing the dense optical flow field and the motion boundary region as input by using a semantic segmentation deep neural network model to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein the loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
In the above formula, the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes mapping k_x to the (0, 1) interval to form the activation value of the occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking 0 or 1;
O is the occlusion region, and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
D(σ) is a distance function based on the search window radius σ.
3. The method of claim 1, wherein: the occlusion boundary area is obtained by the following method:
obtaining an occlusion boundary from the real occlusion map;
performing mask dilation on the occlusion boundary to obtain an expanded occlusion area;
and subtracting the expanded occlusion region from the real occlusion image to obtain the occlusion boundary region.
5. The method of claim 4, wherein: the ω_k is taken to be the same for each layer.
6. The method according to claim 5, characterized in that each layer of the decoder is structured as follows:
four successively stacked deconvolution modules, each of which sequentially performs one 4×4 deconvolution operation and two 7×7 convolution operations to obtain a feature map after the deconvolution operation, with a normalization and an activation applied after each convolution operation;
a concatenation module for concatenating the feature map generated by the corresponding encoder layer, the feature map obtained by this decoder layer after the deconvolution operation, and the up-sampled occlusion feature map produced by the preceding decoder layer, to obtain a concatenated feature map, and for performing a 3×3 convolution on the concatenated feature map to generate an occlusion feature map; this occlusion feature map is doubled in resolution by an up-sampling operation to serve as the up-sampled occlusion feature map for the next decoder layer;
when the concatenation module of the first decoder layer performs concatenation, it concatenates the encoder feature map with the feature map output by the deconvolution module to obtain the concatenated feature map.
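A minimal PyTorch sketch of one decoder layer as described in claim 6; the claim fixes only the kernel sizes and the processing order, so the channel widths, ReLU activation, BatchNorm normalization, and bilinear up-sampling below are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderLayer(nn.Module):
    """One decoder layer: a 4x4 deconvolution, two 7x7 convolutions (each
    followed by normalization and activation), then concatenation with the
    encoder skip feature and the up-sampled occlusion map from the previous
    layer, and a 3x3 convolution producing an occlusion feature map."""

    def __init__(self, in_ch, skip_ch, out_ch, occ_ch=1):
        super().__init__()
        # 4x4 deconvolution that doubles the spatial resolution
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.conv1 = nn.Conv2d(out_ch, out_ch, 7, padding=3)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 7, padding=3)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 3x3 convolution over [encoder skip, deconv output, up-sampled occlusion map]
        self.occ_head = nn.Conv2d(skip_ch + out_ch + occ_ch, occ_ch, 3, padding=1)

    def forward(self, x, skip, prev_occ):
        x = self.deconv(x)                    # one 4x4 deconvolution operation
        x = F.relu(self.bn1(self.conv1(x)))   # 7x7 conv + normalization + activation
        x = F.relu(self.bn2(self.conv2(x)))   # second 7x7 conv + norm + activation
        # double the resolution of the previous layer's occlusion feature map
        up_occ = F.interpolate(prev_occ, scale_factor=2,
                               mode="bilinear", align_corners=False)
        occ = self.occ_head(torch.cat([skip, x, up_occ], dim=1))
        return x, occ                         # occ is up-sampled for the next layer
```

A first-layer variant would simply drop `prev_occ` from the concatenation, as in the last paragraph of the claim.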
7. The method according to any one of claims 1 to 6, characterized in that acquiring the motion boundary region between the two frame images comprises:
detecting the motion boundary of the dense optical flow field with an edge detector;
and dilating the motion boundary of the dense optical flow field with a dilation mask to obtain the motion boundary region.
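The two steps of claim 7 can be sketched as follows. The claim does not name a particular edge detector, so a threshold on the flow-gradient magnitude stands in for it here, and the 3×3 dilation mask is likewise an assumption:

```python
import numpy as np

def motion_boundary_region(flow, edge_thresh=1.0):
    """Detect motion boundaries of a dense optical flow field with an edge
    detector, then dilate them into a motion boundary region.

    flow : (H, W, 2) array of per-pixel (u, v) optical flow vectors.
    """
    # edge strength: summed gradient magnitude of both flow components
    gy_u, gx_u = np.gradient(flow[..., 0])
    gy_v, gx_v = np.gradient(flow[..., 1])
    edge_strength = np.hypot(gx_u, gy_u) + np.hypot(gx_v, gy_v)
    boundary = (edge_strength > edge_thresh).astype(np.uint8)
    # dilate the detected boundary with a 3x3 mask into a boundary region
    h, w = boundary.shape
    padded = np.pad(boundary, 1)
    region = np.zeros_like(boundary)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            region = np.maximum(region, padded[dy:dy + h, dx:dx + w])
    return region
```

A flow field with a sharp discontinuity (e.g. a moving object against a static background) yields a thin boundary that the dilation widens into a band.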
8. An image sequence motion occlusion detection device, characterized by comprising:
a first acquisition module for acquiring any two consecutive frame images;
a second acquisition module for acquiring a dense optical flow field and a motion boundary region between the two frame images;
an analysis output module for analyzing the dense optical flow field and the motion boundary region as inputs by using a semantic segmentation deep neural network model, to obtain an occlusion detection result output by the semantic segmentation deep neural network model;
wherein a loss value L_k at the k-th layer of the decoder in the semantic segmentation deep neural network model is as follows:
where the parameters have the following meanings:
x denotes pixel coordinates, and Ω denotes the real-number domain;
k_x is the predicted value in each channel of the occlusion feature map input to the last layer of the decoder;
a(k_x) denotes the activation value obtained by mapping k_x to the (0, 1) interval to form an occlusion mapping value;
o(x) denotes the occlusion label of each pixel x, taking the value 0 or 1;
O is the occlusion region and B is the occlusion boundary region;
ω_0(x) is the occlusion region weight;
ω_b is the initial weight of the occlusion boundary region;
d(σ) is a distance function based on the search window radius σ.
9. A memory for storing software, characterized in that the software is adapted to perform the method of any one of claims 1 to 7.
10. A processor for running software, characterized in that the software is adapted to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210491032.0A CN114972422A (en) | 2022-05-07 | 2022-05-07 | Image sequence motion occlusion detection method and device, memory and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114972422A true CN114972422A (en) | 2022-08-30 |
Family
ID=82980963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210491032.0A Pending CN114972422A (en) | 2022-05-07 | 2022-05-07 | Image sequence motion occlusion detection method and device, memory and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114972422A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200084427A1 (en) * | 2018-09-12 | 2020-03-12 | Nvidia Corporation | Scene flow estimation using shared features |
CN110992367A (en) * | 2019-10-31 | 2020-04-10 | 北京交通大学 | Method for performing semantic segmentation on image with shielding area |
CN111401308A (en) * | 2020-04-08 | 2020-07-10 | 蚌埠学院 | Fish behavior video identification method based on optical flow effect |
CN112347852A (en) * | 2020-10-10 | 2021-02-09 | 上海交通大学 | Target tracking and semantic segmentation method and device for sports video and plug-in |
CN113888604A (en) * | 2021-09-27 | 2022-01-04 | 安徽清新互联信息科技有限公司 | Target tracking method based on depth optical flow |
US20220101539A1 (en) * | 2020-09-30 | 2022-03-31 | Qualcomm Incorporated | Sparse optical flow estimation |
Non-Patent Citations (2)
Title |
---|
LIU Yu et al.: "Better Dense Trajectories by Motion in Videos", IEEE Transactions on Cybernetics, vol. 49, no. 1, 28 November 2017, pages 159-170, XP011700745, DOI: 10.1109/TCYB.2017.2769097 * |
GE Liyue et al.: "Variational Optical Flow Computation Method Based on Motion-Optimized Semantic Segmentation", Pattern Recognition and Artificial Intelligence, vol. 34, no. 7, 15 July 2021, pages 631-645 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dolson et al. | Upsampling range data in dynamic environments | |
US8755563B2 (en) | Target detecting method and apparatus | |
JP2008518331A (en) | Understanding video content through real-time video motion analysis | |
CN112329702B (en) | Method and device for rapid face density prediction and face detection, electronic equipment and storage medium | |
CN109300151B (en) | Image processing method and device and electronic equipment | |
CN109377499B (en) | Pixel-level object segmentation method and device | |
US20170018106A1 (en) | Method and device for processing a picture | |
CN111311611B (en) | Real-time three-dimensional large-scene multi-object instance segmentation method | |
CN111986472B (en) | Vehicle speed determining method and vehicle | |
WO2016120132A1 (en) | Method and apparatus for generating an initial superpixel label map for an image | |
CN111382647B (en) | Picture processing method, device, equipment and storage medium | |
CN111415300A (en) | Splicing method and system for panoramic image | |
CN113269722A (en) | Training method for generating countermeasure network and high-resolution image reconstruction method | |
Patil et al. | End-to-end recurrent generative adversarial network for traffic and surveillance applications | |
Kim et al. | High-quality depth map up-sampling robust to edge noise of range sensors | |
CN112465029A (en) | Instance tracking method and device | |
CN112906614A (en) | Pedestrian re-identification method and device based on attention guidance and storage medium | |
US9659372B2 (en) | Video disparity estimate space-time refinement method and codec | |
CN111260686B (en) | Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window | |
CN111881914A (en) | License plate character segmentation method and system based on self-learning threshold | |
Lee et al. | Integrating wavelet transformation with Markov random field analysis for the depth estimation of light‐field images | |
CN116468968A (en) | Astronomical image small target detection method integrating attention mechanism | |
CN114972422A (en) | Image sequence motion occlusion detection method and device, memory and processor | |
EP4235492A1 (en) | A computer-implemented method, data processing apparatus and computer program for object detection | |
Dong et al. | Monocular visual-IMU odometry using multi-channel image patch exemplars |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||