CN112529931B - Method and system for foreground segmentation - Google Patents

Method and system for foreground segmentation

Info

Publication number
CN112529931B
CN112529931B (application CN202011539304.7A)
Authority
CN
China
Prior art keywords
video frame
optical flow
segmentation
foreground segmentation
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011539304.7A
Other languages
Chinese (zh)
Other versions
CN112529931A (en)
Inventor
Liang Dong
Wei Zongqi
Geng Qixiang
Sun Han
Zhang Liyan
Liu Ningzhong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202011539304.7A priority Critical patent/CN112529931B/en
Publication of CN112529931A publication Critical patent/CN112529931A/en
Application granted granted Critical
Publication of CN112529931B publication Critical patent/CN112529931B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a foreground segmentation method and system. The method comprises the following steps: acquiring a current video frame; acquiring a first video frame, a second video frame and a third video frame before the current moment; generating a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame; inputting the current video frame and the layered optical flow map into a layered optical flow attention model to obtain a foreground segmentation matrix; and visualizing the foreground segmentation matrix to obtain a segmentation result. With the method and system provided by the invention, segmentation accuracy is maintained while cross-scene foreground segmentation is achieved.

Description

Method and system for foreground segmentation
Technical Field
The invention relates to the technical field of scene segmentation, in particular to a method and a system for foreground segmentation.
Background
Monitoring systems typically integrate a variety of tasks such as object tracking, re-identification and abnormal event alerting, which in turn rely on segmentation. Video foreground segmentation is a subtask of segmentation that aims to identify moving objects (i.e. the foreground) in a scene and plays an important role in a monitoring system. Current foreground segmentation models fall into traditional background-subtraction models and deep-learning-based segmentation models. Traditional background-subtraction models include Pfinder background reconstruction, foreground segmentation using KDE (Kernel Density Estimation), and the ViBe (Visual Background Extractor) model; they learn the background of a video scene and obtain the moving targets by differencing against it. Such methods cannot acquire the background of an untrained scene, so they perform poorly in the cross-scene foreground segmentation task.
Deep-learning-based segmentation models include DeepLabv3+ (Deep Labeling Model v3+), PSPNet (Pyramid Scene Parsing Network) and STAM (spatio-temporal attention model). Modern semantic or instance segmentation methods (i.e. DeepLabv3+ and PSPNet) segment the foreground by providing semantic annotations for the entire scene. However, in the cross-scene foreground segmentation task, a large number of annotations are required to adapt to different scenes and the neural network must be retrained; otherwise the foreground (especially small foreground objects) is segmented incorrectly.
In addition, this approach ignores the motion properties of the foreground, so it has difficulty distinguishing moving foreground objects. To address this, STAM fuses optical flow information and, using the optical flow of adjacent video frames, obtains the foreground segmentation result directly with an end-to-end model. However, although STAM greatly improves foreground segmentation, its result depends only on instantaneous motion to represent the whole moving object, so the segmented result suffers from holes and remains insufficient for guiding cross-scene foreground segmentation.
Therefore, how to improve the segmentation accuracy of the cross-scene foreground segmentation is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a foreground segmentation method and a foreground segmentation system, which can ensure segmentation accuracy while realizing cross-scene foreground segmentation.
In order to achieve the above object, the present invention provides the following solutions:
a method of foreground segmentation, comprising:
acquiring a current video frame;
acquiring a first video frame, a second video frame and a third video frame before the current moment;
generating a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame;
inputting the current video frame and the layered optical flow map into a layered optical flow attention model to obtain a foreground segmentation matrix; the layered optical flow attention model is obtained by training a video frame encoder, an optical flow encoder and a decoder by using an intra-class scale loss function; the intra-class scale loss function is obtained by multiplying a focal loss function by a loss adjustment parameter based on the target area;
and carrying out visualization processing on the foreground segmentation matrix to obtain a segmentation result.
Optionally, the generating of a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame specifically includes:
determining the optical flow of the current video frame relative to the first video frame to obtain first optical flow information;
determining the optical flow of the current video frame relative to the second video frame to obtain second optical flow information;
determining the optical flow of the current video frame relative to the third video frame to obtain third optical flow information;
and inputting the first optical flow information into an R channel of a blank picture, inputting the second optical flow information into a G channel of the blank picture, and inputting the third optical flow information into a B channel of the blank picture to generate the layered optical flow map.
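As an illustrative (non-limiting) sketch of this channel-stacking step, the following Python snippet assumes that each optical flow field is an H×W×2 array of per-pixel displacements and that the per-pixel flow magnitude, scaled by an assumed maximum magnitude, is what is written into each colour channel; the function names and the scaling constant are assumptions introduced only for illustration.

```python
import numpy as np

def flow_to_channel(flow, max_mag=20.0):
    """Collapse an H x W x 2 optical flow field into one 8-bit channel.

    Assumption: the per-pixel flow magnitude (not the raw dx/dy pair) is what
    is written into each colour channel, scaled by an assumed maximum
    magnitude max_mag.
    """
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    mag = np.clip(mag / max_mag, 0.0, 1.0)
    return (mag * 255).astype(np.uint8)

def build_layered_flow_map(flow1, flow2, flow3):
    """Stack three flow fields into the R, G and B channels of a blank picture."""
    h, w = flow1.shape[:2]
    layered = np.zeros((h, w, 3), dtype=np.uint8)   # the "blank picture"
    layered[..., 0] = flow_to_channel(flow1)        # R <- flow vs. first previous frame
    layered[..., 1] = flow_to_channel(flow2)        # G <- flow vs. second previous frame
    layered[..., 2] = flow_to_channel(flow3)        # B <- flow vs. third previous frame
    return layered

# toy usage with random flows
if __name__ == "__main__":
    flows = [np.random.randn(256, 256, 2).astype(np.float32) for _ in range(3)]
    print(build_layered_flow_map(*flows).shape)     # (256, 256, 3)
```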
Optionally, before the inputting of the current video frame and the layered optical flow map into the layered optical flow attention model to obtain the foreground segmentation matrix, the method further includes:
carrying out normalization processing on the current video frame and the layered optical flow map.
Optionally, the step of training the hierarchical optical flow attention model specifically includes:
and training a video frame encoder, an optical flow encoder and a decoder by using the CDNet2014 data set as a training set and utilizing the intra-class scale loss function to obtain the hierarchical optical flow attention model.
Optionally, one pass of training the video frame encoder, the optical flow encoder and the decoder includes:
selecting a group of training data from the CDNet2014 data set, wherein the training data comprises video frames and the layered optical flow maps and true values corresponding to the video frames;
inputting the selected video frames into a video frame encoder to obtain video frame characteristics;
inputting the selected layered optical flow diagram into an optical flow encoder to obtain layered optical flow characteristics;
inputting the video frame features and the layered optical flow features into the decoder for training to obtain a foreground segmentation matrix;
and calculating the loss of the foreground segmentation matrix and updating the parameters of the video frame encoder, the optical flow encoder and the decoder according to the true value and the intra-class scale loss function.
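The one training pass described above can be sketched as follows. In this illustrative PyTorch-style snippet, frame_encoder, flow_encoder, decoder and cis_loss are placeholder interfaces standing in for the components named above; their internal structure is not specified here.

```python
import torch

def train_one_step(frame_encoder, flow_encoder, decoder, optimizer,
                   frame, layered_flow, truth, cis_loss):
    """One training pass: encode frame and layered flow, decode, compute the
    intra-class scale loss against the ground truth, and update all three parts.

    frame_encoder / flow_encoder / decoder / cis_loss are assumed interfaces,
    not the exact implementation disclosed here.
    """
    optimizer.zero_grad()
    frame_feat = frame_encoder(frame)          # video frame features
    flow_feat = flow_encoder(layered_flow)     # layered optical flow features
    pred = decoder(frame_feat, flow_feat)      # foreground segmentation matrix
    loss = cis_loss(pred, truth)               # intra-class scale loss
    loss.backward()                            # back-propagate
    optimizer.step()                           # update encoder/decoder parameters
    return loss.item()
```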
Optionally, the intra-class scale loss function is formulated as:

$Loss_{CIS} = \beta \cdot Loss_{focal}$,

where $Loss_{focal} = -\alpha\,(1-p)^{\gamma}\log(p)$ when $y = 1$, $Loss_{focal} = -(1-\alpha)\,p^{\gamma}\log(1-p)$ when $y = 0$, and $\beta = t^{\,50 \cdot s(fg)}$;

wherein $Loss_{CIS}$ is the intra-class scale loss function, $\alpha$ is the balance factor, $\gamma$ is the difficulty factor, $p$ is the probability value of the model prediction result, $y$ is the true value ($y = 1$ represents the foreground and $y = 0$ represents the background), $\beta$ is the loss adjustment parameter based on the target area, $t$ is a weight coefficient, $fg$ is the moving target, and $s(fg)$ is the ratio of the moving target in the scene true value.
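A compact PyTorch-style sketch of this loss is given below as one possible reading of the formula, not as the claimed implementation: the area factor β = t^(50·s(fg)) follows the description above, and the values of α, γ and t are assumed hyper-parameters.

```python
import torch

def cis_loss(pred, truth, alpha=0.25, gamma=2.0, t=0.5, scale=50.0, eps=1e-6):
    """Intra-class scale (CIS) loss: focal loss scaled by an area-based factor beta.

    pred  : predicted foreground probability p, shape (N, 1, H, W), values in (0, 1)
    truth : binary ground truth y, same shape (1 = foreground, 0 = background)
    beta = t ** (scale * s(fg)), where s(fg) is the foreground ratio in the truth;
    alpha, gamma, t and the constant 50 are assumed hyper-parameter values.
    """
    pred = pred.clamp(eps, 1.0 - eps)
    # focal loss term
    focal = -(alpha * truth * (1 - pred) ** gamma * torch.log(pred)
              + (1 - alpha) * (1 - truth) * pred ** gamma * torch.log(1 - pred))
    # s(fg): ratio of moving-target pixels in the scene ground truth
    s_fg = truth.float().mean()
    beta = t ** (scale * s_fg)      # large foreground -> small beta, and vice versa
    return (beta * focal).mean()
```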
Optionally, the performing visualization processing on the foreground segmentation matrix to obtain a segmentation result specifically includes:
multiplying the foreground segmentation matrix by 255 to obtain an expanded foreground segmentation matrix;
and according to the threshold value of the segmentation pixel, carrying out binarization processing on the expanded foreground segmentation matrix to obtain a segmentation result.
Optionally, the binarizing processing is performed on the expanded foreground segmentation matrix according to the segmentation pixel threshold value to obtain a segmentation result, which specifically includes:
updating all elements larger than the threshold value of the segmentation pixels in the expanded foreground segmentation matrix to 255 to obtain a first updated foreground segmentation matrix;
updating elements smaller than or equal to the threshold value of the segmentation pixels in the foreground segmentation matrix after the first updating to 0 to obtain a foreground segmentation matrix after the second updating;
and identifying all elements equal to 255 in the foreground segmentation matrix after the second updating as the foreground, and identifying all elements equal to 0 in the foreground segmentation matrix after the second updating as the background to obtain a segmentation result.
Optionally, the segmentation pixel threshold is 15.
A system of foreground segmentation, the system comprising:
the current video frame acquisition module is used for acquiring a current video frame;
the second video frame acquisition module is used for acquiring a first video frame, a second video frame and a third video frame before the current moment;
the layered light flow graph generating module is used for generating a layered light flow graph according to the current video frame, the first video frame, the second video frame and the third video frame;
the foreground segmentation matrix determining module is used for inputting the current video frame and the layered optical flow to the layered optical flow attention model to obtain a foreground segmentation matrix; the layered optical flow attention model is obtained by training a video frame encoder, an optical flow decoder and a decoder by using an intra-class scale loss function; the intra-class scale loss function is obtained by multiplying a loss adjustment parameter based on a target area on the basis of a focus loss function;
and the visualization processing module is used for carrying out visualization processing on the foreground segmentation matrix to obtain a segmentation result.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a foreground segmentation method and a foreground segmentation system, which are characterized in that on the basis of a focus loss function, an intra-class scale loss function is obtained by multiplying a loss adjustment parameter based on a target area, a CDNet2014 data set and the intra-class scale loss function are used for training a video frame encoder, an optical flow decoder and a decoder to obtain a layered optical flow attention model (Hierarchical Optical Flow Attention Model, HOFAM), a current video frame and a layered optical flow are input into the layered optical flow attention model to obtain a foreground segmentation matrix, and the foreground segmentation matrix is subjected to visual processing to obtain a segmentation result. According to the invention, by acquiring the optical flow information of the multi-frame video frame before the current moment, the moving target of the video information is used as the focus point of foreground segmentation, the step of scene-crossing foreground segmentation is simplified on the basis of improving the precision of small-area foreground segmentation, the problem of holes in the scene-crossing segmentation is solved, and the precision of scene-crossing foreground segmentation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the foreground segmentation method provided in an embodiment of the present invention;
FIG. 2 is a graph comparing segmentation results obtained with layered optical flow information against those obtained with single-layer and double-layer optical flow in an embodiment of the present invention;
FIG. 3 is a graph showing a comparison of visual segmentation results of a hierarchical optical flow attention model and an existing model under different scenes provided in an embodiment of the present invention;
FIG. 4 is a graph showing a comparison of visual segmentation results of a hierarchical optical flow attention model and an existing model under different scenes provided in an embodiment of the present invention;
FIG. 5 is a graph comparing visualization results obtained with the focal loss and the regression loss functions in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a hierarchical optical flow attention model according to an embodiment of the present invention;
FIG. 7 is a flow chart of a hierarchical optical flow attention model training process provided in an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a foreground segmentation system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a foreground segmentation method and a foreground segmentation system, which can ensure segmentation accuracy while realizing cross-scene foreground segmentation.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Examples
Fig. 1 is a flowchart of foreground segmentation provided in an embodiment of the present invention, and as shown in fig. 1, the present invention provides a foreground segmentation method, where the method includes:
step 101: the current video frame is acquired.
Step 102: and acquiring a first video frame, a second video frame and a third video frame before the current moment.
Step 103: generating a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame. A comparison of segmentation results using layered, single-layer and double-layer optical flow is shown in fig. 2. Specifically, the current video frame, the first video frame, the second video frame and the third video frame are input into a SelFlow model to obtain the layered optical flow map.
Step 103, specifically includes:
and determining the optical flow of the current video frame relative to the first video frame to obtain first optical flow information.
And determining the optical flow of the current video frame relative to the second video frame to obtain second optical flow information.
And determining the optical flow of the current video frame relative to the third video frame to obtain third optical flow information.
And inputting the first optical flow information into the R channel of the blank picture, inputting the second optical flow information into the G channel of the blank picture, and inputting the third optical flow information into the B channel of the blank picture to generate the layered optical flow map.
Step 104: inputting the current video frame and the layered optical flow map into the layered optical flow attention model to obtain a foreground segmentation matrix. The layered optical flow attention model is obtained by training a video frame encoder, an optical flow encoder and a decoder with an intra-class scale (Class-In Scale, CIS) loss function; the intra-class scale loss function is obtained by multiplying the focal loss function by a loss adjustment parameter based on the target area. Comparisons of the visual segmentation results of the hierarchical optical flow attention model and existing models in different scenes are shown in figs. 3-4. Before step 104, the method also includes: normalizing the current video frame and the layered optical flow map.
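As an illustrative sketch of this inference step (with hofam standing in for the trained layered optical flow attention model and division by 255 as an assumed normalization scheme), the forward pass can be written as:

```python
import numpy as np
import torch

def segment_foreground(hofam, frame_rgb, layered_flow_rgb):
    """Run one inference pass of the layered optical flow attention model.

    frame_rgb, layered_flow_rgb : H x W x 3 uint8 arrays.
    hofam : a trained model taking (frame, layered flow map) and returning the
            foreground segmentation matrix; its interface here is assumed.
    """
    def to_tensor(img):
        # assumed normalization: scale pixel values to [0, 1], layout N x C x H x W
        x = torch.from_numpy(img.astype(np.float32) / 255.0)
        return x.permute(2, 0, 1).unsqueeze(0)

    with torch.no_grad():
        seg_matrix = hofam(to_tensor(frame_rgb), to_tensor(layered_flow_rgb))
    return seg_matrix.squeeze().cpu().numpy()   # normalized H x W matrix
```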
The training step of the hierarchical optical flow attention model specifically comprises the following steps:
the CDNet2014 data set is used as a training set, and a video frame encoder, an optical flow decoder and a decoder are trained by using intra-class scale loss functions, so that a layered optical flow attention model is obtained. The CDNet2014 data set is not only a vehicle-related data set, but also contains various scenes (including various scenes such as scene jitter, dynamic background and infrared) of the monitoring video, and is a main stream data set in foreground segmentation. The CDNet2014 dataset includes a video frame set, a layered optical flow atlas, and a truth set.
The process of training the video frame encoder, the optical flow encoder and the decoder once comprises: selecting a group of training data from the CDNet2014 data set, wherein the training data comprises video frames and the layered optical flow maps and true values corresponding to the video frames; inputting the selected video frames into the video frame encoder to obtain video frame features; inputting the selected layered optical flow maps into the optical flow encoder to obtain layered optical flow features; inputting the video frame features and the layered optical flow features into the decoder for training to obtain a foreground segmentation matrix; and, based on the true values and the intra-class scale loss function, calculating the loss of the foreground segmentation matrix and updating the parameters of the video frame encoder, the optical flow encoder and the decoder.
Training stops when the number of iterations reaches a preset value, yielding the layered optical flow attention model.
In conventional foreground segmentation, the model is often trained with a regression loss (L1 loss) or a focal loss function; as the visualization comparison of the focal loss and the regression loss in FIG. 5 shows, a model trained with the focal loss handles the class imbalance between foreground and background better. The focal loss function is:

$Loss_{focal} = -\alpha\,(1-p)^{\gamma}\log(p)$ when $y = 1$, and $Loss_{focal} = -(1-\alpha)\,p^{\gamma}\log(1-p)$ when $y = 0$,

where $Loss_{focal}$ denotes the focal loss.
However, a model trained with the focal loss performs well only on large target objects (foreground with a large area); it is deficient on small target objects (foreground with a small area) and may even miss them entirely. The invention therefore proposes the intra-class scale loss, which is used as the loss function relating the ground-truth image to the prediction result during training; the model is then trained through back-propagation.
Specifically, the intra-class scale loss is formulated as:

$Loss_{CIS} = \beta \cdot Loss_{focal}$, with $\beta = t^{\,50 \cdot s(fg)}$,

wherein $Loss_{CIS}$ is the intra-class scale loss function, $\alpha$ is the balance factor, $\gamma$ is the difficulty factor, $p$ is the probability value of the model prediction result, $y$ is the true value ($y = 1$ represents the foreground, $y = 0$ the background), $\beta$ is the loss adjustment parameter based on the target area, $t$ is a weight coefficient, $fg$ is the moving target, and $s(fg)$ is the ratio of the moving target in the scene true value. The larger the foreground target area, the smaller $\beta$ becomes, down-weighting the loss of large targets; the smaller the foreground target area, the larger $\beta$ becomes, up-weighting the loss of small targets. The intra-class scale loss function thus uses the scale of the target as the reference for its adjustment; the adjustment exponent is $50 \cdot s(fg)$, where 50 is a parameter obtained by repeated sampling over the training scenes, and the formulation also takes into account the case where no target is present in the scene.
FIG. 6 is a schematic diagram of the hierarchical optical flow attention model according to an embodiment of the present invention, and FIG. 7 is a flowchart of its training process. As shown in FIGS. 6-7, the layered optical flow is computed as follows: first, three video frames are extracted before the current moment T, namely the 1st previous frame τ1, the 5th previous frame τ2 and the 10th previous frame τ3; optical flow is computed between each of these frames and the video frame at the current moment T (the current video frame), yielding long- and short-range optical flows Op(τ1), Op(τ2) and Op(τ3) with different motion characteristics; the flows Op(τ1), Op(τ2) and Op(τ3) are then written into the R channel, G channel and B channel of the same blank picture, respectively, to obtain the layered optical flow map.
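The frame selection and flow computation just described can be sketched as follows; estimate_flow is a placeholder for an optical flow estimator such as SelFlow (its real interface is not reproduced here), and the offsets of 1, 5 and 10 frames follow the description above.

```python
import numpy as np

def layered_flow_at(frames, T, estimate_flow, offsets=(1, 5, 10)):
    """Build the layered optical flow map for the frame at index T.

    frames        : sequence of H x W x 3 frames
    estimate_flow : placeholder callable (frame_prev, frame_cur) -> H x W flow
                    magnitude scaled to 0-255 (stand-in for a SelFlow-style estimator)
    offsets       : frames taken 1, 5 and 10 steps before the current moment T
    """
    current = frames[T]
    h, w = current.shape[:2]
    layered = np.zeros((h, w, 3), dtype=np.uint8)     # blank picture
    for channel, off in enumerate(offsets):           # R, G, B <- short ... long flow
        layered[..., channel] = estimate_flow(frames[T - off], current)
    return layered
```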
The hierarchical optical flow attention model (comprising the trained video frame encoder, optical flow encoder and decoder) works as follows. The normalized current video frame is input to the video frame encoder, which extracts video frame features by applying a series of convolution operations to the current video frame matrix. The normalized layered optical flow map is then input to the optical flow encoder, which extracts the required optical flow features by applying a series of convolution operations to the layered optical flow matrix. The extracted video frame features and optical flow features are fed to the decoder, which performs decoding convolution operations on them; an attention module in the decoder combines the optical flow features with the video frame features to obtain the motion information (i.e. the foreground), highlights this motion information in the video frame part, and outputs the end-to-end foreground segmentation feature result (the foreground segmentation matrix) for the video frame. The foreground segmentation matrix is a normalized matrix, the foreground segmentation result is a single-channel picture (256×256 pixels), and a visualization step is still needed to go from the feature result output by the model to the final segmentation result (a picture).
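Since the exact layer configuration is not spelled out here, the following PyTorch sketch should be read as one possible shape of such a model rather than the disclosed network: two small convolutional encoders and a decoder whose attention module re-weights the video frame features with the optical flow features before producing the segmentation matrix; the channel counts, depths and attention form are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class HOFAMSketch(nn.Module):
    """Illustrative encoder-encoder-decoder with a simple attention module.

    An assumption-laden sketch: channel counts, depths and the attention form
    are placeholders, not the architecture disclosed in this description.
    """
    def __init__(self):
        super().__init__()
        self.frame_encoder = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.flow_encoder = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.attention = nn.Sequential(nn.Conv2d(64, 64, 1), nn.Sigmoid())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, frame, layered_flow):
        f = self.frame_encoder(frame)             # video frame features
        o = self.flow_encoder(layered_flow)       # layered optical flow features
        attn = self.attention(o)                  # motion attention from optical flow
        fused = f * attn + f                      # highlight moving regions in frame features
        return self.decoder(fused)                # foreground segmentation matrix

# quick shape check
if __name__ == "__main__":
    m = HOFAMSketch()
    x = torch.zeros(1, 3, 256, 256)
    print(m(x, x).shape)    # torch.Size([1, 1, 256, 256])
```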
Step 105: and carrying out visual processing on the foreground segmentation matrix to obtain a segmentation result.
Step 105 specifically includes: multiplying the foreground segmentation matrix by 255 to obtain an expanded foreground segmentation matrix; and according to the threshold value of the segmentation pixel, performing binarization processing on the expanded foreground segmentation matrix to obtain a segmentation result.
Binarizing the expanded foreground segmentation matrix according to the segmentation pixel threshold to obtain a segmentation result specifically comprises: updating elements larger than the segmentation pixel threshold in the expanded foreground segmentation matrix to 255 to obtain a first updated foreground segmentation matrix; updating elements smaller than or equal to the segmentation pixel threshold in the first updated foreground segmentation matrix to 0 to obtain a second updated foreground segmentation matrix; and identifying all elements equal to 255 in the second updated foreground segmentation matrix as foreground and all elements equal to 0 as background, so as to obtain the segmentation result. The segmentation pixel threshold is 15.
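This visualization step can be shown directly as code; the sketch below follows the description above (multiply by 255, then binarize at the segmentation pixel threshold of 15) using plain NumPy.

```python
import numpy as np

def visualize_segmentation(seg_matrix, threshold=15):
    """Turn the normalized foreground segmentation matrix into a binary mask image.

    seg_matrix : H x W matrix with values in [0, 1] output by the model
    threshold  : segmentation pixel threshold (15 per the description above)
    """
    expanded = seg_matrix * 255.0                       # expand the normalized matrix
    mask = np.where(expanded > threshold, 255, 0).astype(np.uint8)
    # elements equal to 255 are foreground, elements equal to 0 are background
    return mask

# toy usage
if __name__ == "__main__":
    seg = np.random.rand(256, 256)
    print(np.unique(visualize_segmentation(seg)))       # [  0 255]
```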
Fig. 8 is a schematic structural diagram of a foreground segmentation system according to an embodiment of the present invention, where, as shown in fig. 8, the foreground segmentation system according to the present invention includes:
the current video frame acquisition module 201 is configured to acquire a current video frame.
The second video frame acquisition module 202 is configured to acquire a first video frame, a second video frame, and a third video frame before the current time.
The layered optical flow map generating module 203 is configured to generate a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame.
The layered optical flow map generating module 203 specifically includes:
and the first optical flow information determining unit is used for determining the optical flow of the current video frame relative to the first video frame to obtain first optical flow information.
And the second optical flow information determining unit is used for determining the optical flow of the current video frame relative to the second video frame to obtain second optical flow information.
And the third optical flow information determining unit is used for determining the optical flow of the current video frame relative to the third video frame to obtain third optical flow information.
The layered optical flow map generating unit is used for inputting the first optical flow information into the R channel of the blank picture, inputting the second optical flow information into the G channel of the blank picture, and inputting the third optical flow information into the B channel of the blank picture to generate the layered optical flow map.
The foreground segmentation matrix determining module 204 is configured to input the current video frame and the layered optical flow map into the layered optical flow attention model to obtain a foreground segmentation matrix; the hierarchical optical flow attention model is obtained by training a video frame encoder, an optical flow encoder and a decoder with an intra-class scale loss function; the intra-class scale loss function is obtained by multiplying the focal loss function by a loss adjustment parameter based on the target area.
The foreground segmentation matrix determining module 204 specifically includes a training unit for the hierarchical optical flow attention model, configured to train the video frame encoder, the optical flow encoder and the decoder with the CDNet2014 data set as the training set and the intra-class scale loss function, so as to obtain the hierarchical optical flow attention model.
The training unit of the hierarchical optical flow attention model performs one training pass on the video frame encoder, the optical flow encoder and the decoder as follows: selecting a group of training data from the CDNet2014 data set, wherein the training data comprises video frames and the layered optical flow maps and true values corresponding to the video frames; inputting the selected video frames into the video frame encoder to obtain video frame features; inputting the selected layered optical flow maps into the optical flow encoder to obtain layered optical flow features; inputting the video frame features and the layered optical flow features into the decoder for training to obtain a foreground segmentation matrix; and, based on the true values and the intra-class scale loss function, calculating the loss of the foreground segmentation matrix and updating the parameters of the video frame encoder, the optical flow encoder and the decoder.
The intra-class scale loss is calculated using the following formula:

$Loss_{CIS} = \beta \cdot Loss_{focal}$, where $Loss_{focal} = -\alpha\,(1-p)^{\gamma}\log(p)$ when $y = 1$, $Loss_{focal} = -(1-\alpha)\,p^{\gamma}\log(1-p)$ when $y = 0$, and $\beta = t^{\,50 \cdot s(fg)}$;

wherein $Loss_{CIS}$ is the intra-class scale loss function, $\alpha$ is the balance factor, $\gamma$ is the difficulty factor, $p$ is the probability value of the model prediction result, $y$ is the true value ($y = 1$ represents the foreground, $y = 0$ the background), $\beta$ is the loss adjustment parameter based on the target area, $t$ is a weight coefficient, $fg$ is the moving target, and $s(fg)$ is the ratio of the moving target in the scene true value.
And the visualization processing module 205 is configured to perform visualization processing on the foreground segmentation matrix to obtain a segmentation result.
The visualization processing module 205 specifically includes:
and the foreground segmentation matrix expansion unit is used for multiplying the foreground segmentation matrix by 255 to obtain an expanded foreground segmentation matrix.
And the binarization processing unit is used for carrying out binarization processing on the expanded foreground segmentation matrix according to the segmentation pixel threshold value to obtain a segmentation result.
The binarization processing unit specifically comprises:
and the foreground segmentation matrix first updating subunit is used for updating all elements larger than the segmentation pixel threshold value in the expanded foreground segmentation matrix to 255 to obtain a foreground segmentation matrix after the first updating.
And the second updating subunit of the foreground segmentation matrix is used for updating elements, smaller than or equal to the threshold value of the segmentation pixels, in the foreground segmentation matrix after the first updating to 0 to obtain the foreground segmentation matrix after the second updating.
And the segmentation subunit is used for identifying all elements equal to 255 in the second updated foreground segmentation matrix as the foreground, and identifying all elements equal to 0 in the second updated foreground segmentation matrix as the background, so as to obtain the segmentation result. The segmentation pixel threshold is 15.
In addition, the foreground segmentation system provided by the invention further comprises a normalization processing module, which is used for normalizing the current video frame and the layered optical flow map.
The invention provides a foreground segmentation method and system, introducing an intra-class scale loss function and a layered optical flow attention model. First, foreground segmentation using layered optical flow information significantly alleviates the hole problem (as can be seen from fig. 2). Second, in cross-scene segmentation the result of the layered optical flow attention model is almost identical to the true value; compared with DeepLabv3+, PSPNet and STAM, it shows a clearly better segmentation effect (as can be seen from figs. 3-4). Third, training the model with the intra-class scale loss function solves the prior-art problem that small targets cannot be segmented, so segmentation accuracy is ensured while cross-scene foreground segmentation is achieved.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. A person of ordinary skill in the art may modify the specific embodiments and the scope of application in light of these teachings. In summary, the content of this specification should not be construed as limiting the invention.

Claims (7)

1. A method of foreground segmentation, the method comprising:
acquiring a current video frame;
acquiring a first video frame, a second video frame and a third video frame before the current moment;
generating a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame;
inputting the current video frame and the layered optical flow map into a layered optical flow attention model to obtain a foreground segmentation matrix; the layered optical flow attention model is obtained by training a video frame encoder, an optical flow encoder and a decoder by using an intra-class scale loss function; the intra-class scale loss function is obtained by multiplying a focal loss function by a loss adjustment parameter based on the target area;
performing visual processing on the foreground segmentation matrix to obtain a segmentation result;
the step of training the hierarchical optical flow attention model specifically comprises the following steps:
training a video frame encoder, an optical flow encoder and a decoder by using the CDNet2014 data set as a training set and utilizing the intra-class scale loss function to obtain the layered optical flow attention model;
a process for training the video frame encoder, the optical flow encoder and the decoder, comprising:
selecting a group of training data from the CDNet2014 data set, wherein the training data comprises video frames and the layered optical flow maps and true values corresponding to the video frames;
inputting the selected video frames into a video frame encoder to obtain video frame characteristics;
inputting the selected layered optical flow diagram into an optical flow encoder to obtain layered optical flow characteristics;
inputting video frame features and the layered optical flow features into the decoder for training to obtain a foreground segmentation matrix;
calculating a loss of the foreground segmentation matrix and updating parameters of the video frame encoder, the optical flow encoder and the decoder according to the true values and the intra-class scale loss function;
the intra-class scale loss is formulated as:

$Loss_{CIS} = \beta \cdot Loss_{focal}$, where $Loss_{focal} = -\alpha\,(1-p)^{\gamma}\log(p)$ when $y = 1$, $Loss_{focal} = -(1-\alpha)\,p^{\gamma}\log(1-p)$ when $y = 0$, and $\beta = t^{\,50 \cdot s(fg)}$;

wherein $Loss_{CIS}$ is the intra-class scale loss function, $\alpha$ is the balance factor, $\gamma$ is the difficulty factor, $p$ is the probability value of the model prediction result, $y$ is the true value ($y = 1$ represents the foreground and $y = 0$ represents the background), $\beta$ is the loss adjustment parameter based on the target area, $t$ is a weight coefficient, $fg$ is the moving target, and $s(fg)$ is the ratio of the moving target in the scene true value.
2. The method according to claim 1, wherein generating a layered optical flow map from the current video frame, the first video frame, the second video frame and the third video frame specifically comprises:
determining the optical flow of the current video frame relative to the first video frame to obtain first optical flow information;
determining the optical flow of the current video frame relative to the second video frame to obtain second optical flow information;
determining the optical flow of the current video frame relative to the third video frame to obtain third optical flow information;
and inputting the first optical flow information into an R channel of the blank picture, inputting the second optical flow information into a G channel of the blank picture, and inputting the third optical flow information into a B channel of the blank picture to generate the layered optical flow map.
3. The method of foreground segmentation of claim 1, further comprising, before said inputting the current video frame and the layered optical flow map into the layered optical flow attention model to obtain the foreground segmentation matrix:
carrying out normalization processing on the current video frame and the layered optical flow map.
4. A method of foreground segmentation according to claim 3, wherein the performing a visualization process on the foreground segmentation matrix to obtain a segmentation result specifically comprises:
multiplying the foreground segmentation matrix by 255 to obtain an expanded foreground segmentation matrix;
and according to the threshold value of the segmentation pixel, carrying out binarization processing on the expanded foreground segmentation matrix to obtain a segmentation result.
5. The method for foreground segmentation according to claim 4, wherein the binarizing the expanded foreground segmentation matrix according to the segmentation pixel threshold value to obtain a segmentation result specifically comprises:
updating all elements larger than the threshold value of the segmentation pixels in the expanded foreground segmentation matrix to 255 to obtain a first updated foreground segmentation matrix;
updating elements smaller than or equal to the threshold value of the segmentation pixels in the foreground segmentation matrix after the first updating to 0 to obtain a foreground segmentation matrix after the second updating;
and identifying all elements equal to 255 in the foreground segmentation matrix after the second updating as the foreground, and identifying all elements equal to 0 in the foreground segmentation matrix after the second updating as the background to obtain a segmentation result.
6. The method of foreground segmentation of claim 5, wherein the segmentation pixel threshold is 15.
7. A system for foreground segmentation, characterized in that the system applies a method for foreground segmentation as claimed in any one of claims 1-6, the system comprising:
the current video frame acquisition module is used for acquiring a current video frame;
the second video frame acquisition module is used for acquiring a first video frame, a second video frame and a third video frame before the current moment;
the layered optical flow map generating module is used for generating a layered optical flow map according to the current video frame, the first video frame, the second video frame and the third video frame;
the foreground segmentation matrix determining module is used for inputting the current video frame and the layered optical flow map into the layered optical flow attention model to obtain a foreground segmentation matrix; the layered optical flow attention model is obtained by training a video frame encoder, an optical flow encoder and a decoder by using an intra-class scale loss function; the intra-class scale loss function is obtained by multiplying a focal loss function by a loss adjustment parameter based on the target area;
and the visualization processing module is used for carrying out visualization processing on the foreground segmentation matrix to obtain a segmentation result.
CN202011539304.7A 2020-12-23 2020-12-23 Method and system for foreground segmentation Active CN112529931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011539304.7A CN112529931B (en) 2020-12-23 2020-12-23 Method and system for foreground segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011539304.7A CN112529931B (en) 2020-12-23 2020-12-23 Method and system for foreground segmentation

Publications (2)

Publication Number Publication Date
CN112529931A CN112529931A (en) 2021-03-19
CN112529931B true CN112529931B (en) 2024-04-12

Family

ID=74975909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011539304.7A Active CN112529931B (en) 2020-12-23 2020-12-23 Method and system for foreground segmentation

Country Status (1)

Country Link
CN (1) CN112529931B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744306B (en) * 2021-06-08 2023-07-21 电子科技大学 Video target segmentation method based on time sequence content perception attention mechanism
CN113505737B (en) * 2021-07-26 2024-07-02 浙江大华技术股份有限公司 Method and device for determining foreground image, storage medium and electronic device
WO2024021016A1 (en) 2022-07-29 2024-02-01 宁德时代新能源科技股份有限公司 Measurement method and measurement apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280831A (en) * 2018-02-02 2018-07-13 南昌航空大学 A kind of acquisition methods and system of image sequence light stream
CN109766828A (en) * 2019-01-08 2019-05-17 重庆同济同枥信息技术有限公司 A kind of vehicle target dividing method, device and communication equipment
CN110147763A (en) * 2019-05-20 2019-08-20 哈尔滨工业大学 Video semanteme dividing method based on convolutional neural networks
CN110738682A (en) * 2019-10-23 2020-01-31 南京航空航天大学 foreground segmentation method and system
CN110866938A (en) * 2019-11-21 2020-03-06 北京理工大学 Full-automatic video moving object segmentation method
CN111489372A (en) * 2020-03-11 2020-08-04 天津大学 Video foreground and background separation method based on cascade convolution neural network
CN111860162A (en) * 2020-06-17 2020-10-30 上海交通大学 Video crowd counting system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040032906A1 (en) * 2002-08-19 2004-02-19 Lillig Thomas M. Foreground segmentation for digital video

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280831A (en) * 2018-02-02 2018-07-13 南昌航空大学 A kind of acquisition methods and system of image sequence light stream
CN109766828A (en) * 2019-01-08 2019-05-17 重庆同济同枥信息技术有限公司 A kind of vehicle target dividing method, device and communication equipment
CN110147763A (en) * 2019-05-20 2019-08-20 哈尔滨工业大学 Video semanteme dividing method based on convolutional neural networks
CN110738682A (en) * 2019-10-23 2020-01-31 南京航空航天大学 foreground segmentation method and system
CN110866938A (en) * 2019-11-21 2020-03-06 北京理工大学 Full-automatic video moving object segmentation method
CN111489372A (en) * 2020-03-11 2020-08-04 天津大学 Video foreground and background separation method based on cascade convolution neural network
CN111860162A (en) * 2020-06-17 2020-10-30 上海交通大学 Video crowd counting system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Detection and segmentation technology for intelligent vehicle applications based on dynamic feature fusion; Shu Xinyin; Wang Ping; Computer Engineering and Design; 2020-10-16 (No. 10); full text *
Video foreground-background separation based on a spatio-temporal aware cascaded neural network; Yang Jingyu; Shi Wen; Li Kun; Song Xiaolin; Yue Huanjing; Journal of Tianjin University (Science and Technology); 2020-04-27 (No. 06); full text *
A survey of video semantic segmentation based on deep learning; Han Lili; Meng Zhaohui; Computer Systems & Applications; 2019-12-15 (No. 12); full text *

Also Published As

Publication number Publication date
CN112529931A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112529931B (en) Method and system for foreground segmentation
CN111768388B (en) Product surface defect detection method and system based on positive sample reference
CN107909638B (en) Rendering method, medium, system and electronic device of virtual object
CN110648310A (en) Weak supervision casting defect identification method based on attention mechanism
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN110020658B (en) Salient object detection method based on multitask deep learning
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN112508099A (en) Method and device for detecting target in real time
CN114897738A (en) Image blind restoration method based on semantic inconsistency detection
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN116030018A (en) Incoming material qualification inspection system and method for door processing
CN115240035A (en) Semi-supervised target detection model training method, device, equipment and storage medium
CN113850135A (en) Dynamic gesture recognition method and system based on time shift frame
CN110147724B (en) Method, apparatus, device, and medium for detecting text region in video
CN116342474A (en) Wafer surface defect detection method
CN114863182A (en) Image classification method, and training method and device of image classification model
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN113807185A (en) Data processing method and device
CN113283396A (en) Target object class detection method and device, computer equipment and storage medium
CN114596244A (en) Infrared image identification method and system based on visual processing and multi-feature fusion
WO2023032665A1 (en) Label generation method, model generation method, label generation device, label generation program, model generation device, and model generation program
CN112396126B (en) Target detection method and system based on detection trunk and local feature optimization
CN114882469A (en) Traffic sign detection method and system based on DL-SSD model
CN114022938A (en) Method, device, equipment and storage medium for visual element identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant