CN112907621A - Moving object extraction method based on difference and semantic information fusion - Google Patents

Moving object extraction method based on difference and semantic information fusion Download PDF

Info

Publication number
CN112907621A
Authority
CN
China
Prior art keywords
difference
target
semantic information
frame
moving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011439962.9A
Other languages
Chinese (zh)
Other versions
CN112907621B (en)
Inventor
谢巍
卢永辉
周延
许练濠
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011439962.9A priority Critical patent/CN112907621B/en
Publication of CN112907621A publication Critical patent/CN112907621A/en
Application granted granted Critical
Publication of CN112907621B publication Critical patent/CN112907621B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a moving target extraction method based on difference and semantic information fusion, which mainly comprises the following steps: (1) acquiring an N-frame image sequence from monitoring equipment; (2) calculating difference information between image frames by using an inter-frame difference method on two image frames that are N frames apart; (3) extracting semantic information in the image by using a trained instance segmentation model based on a convolutional neural network, the semantic information comprising the target class and a pixel mask; (4) combining the difference information and the semantic information through a fusion algorithm to extract the moving targets in the image. The method introduces strong semantic information through the convolutional-neural-network-based instance segmentation model and, combined with the difference information obtained by the inter-frame difference method, can extract moving targets in the image well. The method is simple to implement and has good robustness.

Description

Moving object extraction method based on difference and semantic information fusion
Technical Field
The invention relates to the field of digital image processing and computer vision, in particular to a moving object extraction method based on difference and semantic information fusion.
Background
Analysis of moving objects has long been one of the important research topics in computer vision and is widely applied in production and daily life. The most common application scenario is the analysis of surveillance video, which usually pays little attention to static targets but must focus on moving targets, because moving targets are likely to have an important influence on production and living activities in the current monitored scene. At present, moving targets are mainly analyzed with the background difference (background subtraction) method and the inter-frame difference method (Jodoin, Pierre-Marc, et al. Comparative study of background subtraction algorithms [J]. Journal of Electronic Imaging, 2010, 19(3): 033003). The background difference method needs to model the background, but the same modeling approach is difficult to apply across many scenarios, and building the background model and updating it in subsequent processing require a large amount of computation, which makes the method cumbersome to use.
Thanks to the rapid development of deep learning and the exponential growth of computing hardware, convolutional neural networks have achieved remarkable success in all application fields of computer vision. A convolutional neural network has millions of parameters, which enables it to extract feature information at various levels in an image, including low-level texture information and high-level semantic information. The low-level texture information is the basis on which the network correctly predicts and extracts the high-level semantic information, while the high-level semantic information gives the network's predictions continuity and integrity. However, most convolutional neural networks currently make predictions on a single picture, so it is difficult for them to acquire motion information in the image (Braham M, Piérard S, Van Droogenbroeck M. Semantic Background Subtraction. IEEE International Conference on Image Processing, 2018).
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a moving object extraction method based on difference and semantic information fusion. The method comprises the steps of obtaining semantic information in an image by using a convolutional neural network, obtaining difference information in the image by using an interframe difference method, combining the semantic information and the difference information by using a fusion algorithm, and finally extracting a moving target in the image. The method combines the semantic information extracted by the convolutional neural network and the motion information obtained by the interframe difference method, can well extract the motion target in the scene, and has simple implementation method and good robustness.
A moving object extraction method based on difference and semantic information fusion comprises the following steps:
s1, acquiring an N-frame image sequence from the monitoring equipment;
S2, calculating difference information between image frames by using an inter-frame difference method on two image frames that are N frames apart;
S3, extracting semantic information in the image by using a trained instance segmentation model based on a convolutional neural network, wherein the semantic information comprises a target class and a pixel mask;
and S4, combining the difference information and the semantic information through a fusion algorithm, and extracting the moving object in the image.
Preferably, the image frames acquired in step S1 are all 24-bit RGB images of 3 channels, the length of the image sequence needs to be kept at N, and all the image frames need to be filtered by a gaussian kernel, which is expressed by the following expression:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
Preferably, in the step S1, when the image sequence is initialized, all image frames in the sequence adopt the 1st frame; as the number of acquired frames increases, replacement begins and proceeds on a first-in-first-out (FIFO) basis, i.e., the T-th frame replaces the (T-N)-th frame, the (T+1)-th frame replaces the (T-N+1)-th frame, and so on, with T > N.
The length N of the image sequence in step S1 is adjusted according to a specific application scenario.
Preferably, the input of the inter-frame difference method applied in step S2 is the T-N frame and the T-th frame, and the inter-frame difference method specifically includes the following steps:
s21, converting the two frame images from the RGB images into gray-scale images respectively, wherein the conversion formula is as follows:
gray=0.299*R+0.587*G+0.114*B
wherein R, G, B represent the three color channels of a color image, respectively;
s22, subtracting the gray values of the corresponding positions of the two frames of images, and then taking the absolute value to obtain a difference result, wherein the formula is as follows:
dif(x,y)=|src(x,y)-dst(x,y)|
where src (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the reference frame, and dst (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the current frame.
Preferably, the input size of the convolutional neural network used in step S3 is 416 × 416 × 3;
The convolutional neural network comprises a backbone network, a feature pyramid network and 3 detection heads, wherein the backbone network consists of: a basic convolutional layer, a down-sampling layer, a residual module, a down-sampling layer, 2 residual modules connected in series, a down-sampling layer, 4 residual modules connected in series, a down-sampling layer and 2 residual modules connected in series;
the structure of the feature pyramid network (FPN, Feature Pyramid Network) from bottom to top is: a cascade (concatenation) structure, a basic convolutional layer, an upsampling layer, a cascade structure, a basic convolutional layer, an upsampling layer and a cascade structure; the backbone network and the feature pyramid network are combined through 3 lateral connections, each lateral connection consisting of a basic convolutional layer, and the feature map sizes must match when connecting, i.e. feature maps of the same size from the backbone network and the feature pyramid network are joined by one lateral connection;
the 3 detection heads have the same structure and parameters, and each detection head comprises two prediction branches; one prediction branch corresponds to target class prediction, predicting on each grid of the feature map the target class that the current grid is responsible for, with output dimension S × C, where S denotes the feature map size of the current detection head and C denotes the total number of classes to be predicted, this branch consisting of basic convolutional layers; the other prediction branch corresponds to target pixel mask prediction, predicting on each grid of the feature map the position mask of the target that the current grid is responsible for, with output dimension H × W × S², where S denotes the feature map size of the current detection head and H and W denote the height and width of the input picture, respectively, this branch consisting of an upsampling layer and a basic convolutional layer.
Preferably, the upsampling layer in the FPN structure is a resize function by using a nearest neighbor difference method;
the upsampling layer in the branch is masked at the position of the detection header by a Transposed convolution (Transposed convolution).
Preferably, the fusion algorithm in step S4 includes the following steps:
s41, performing morphological filtering on the difference result, and then performing binarization to obtain a moving pixel mask;
s42, performing channel separation on the target mask in the segmentation result, and performing binarization to obtain a target pixel mask;
s43, respectively calculating the proportion P of the moving pixels in each target pixel mask;
S44, comparing the moving-pixel ratio P with a preset ratio threshold T; if P > T, the current target is judged to be a moving target;
in the step S41, the morphological filtering is an opening (OPEN) operation, i.e., an erosion (ERODE) operation followed by a dilation (DILATE) operation;
binarization is performed after the morphological filtering, with a value thresh close to 0 selected as the binarization threshold; the motion pixel mask obtained after binarization contains only two values, 0 and 255 (8-bit unsigned integers), and the binarization formula is as follows:
bin(x, y) = 255, if dif(x, y) > thresh
bin(x, y) = 0, otherwise
where dif (x, y) denotes the difference result obtained in step S22.
Preferably, the channel separation operation in step S42 is to divide all target pixel masks predicted by the model one by one in the channel dimension to obtain a single-channel grayscale target pixel mask;
binarization is performed after channel separation with a threshold of 1, and the target mask obtained after binarization contains only two values, 0 and 255 (8-bit unsigned integers).
Preferably, the step of calculating the ratio P of the moving pixels in the target pixel mask in step S43 is as follows:
Firstly, calculate the number n₁ of pixels whose value is 255 in the target pixel mask;
Secondly, calculate the number n₂ of pixels whose value is 255 in both the target pixel mask and the motion pixel mask;
Finally, calculate the moving-pixel ratio P in the target pixel mask by the following formula:
P = n₂ / n₁
Preferably, in the step S44, the tolerance of the algorithm to environmental noise can be adjusted via the moving-pixel ratio threshold T in the target pixel mask: the larger T is, the less sensitive the algorithm is to environmental noise. If P > T, the current target is determined to be a moving target; otherwise, it is a non-moving target.
Compared with the prior art, the invention has the following advantages and effects:
(1) the invention relates to a moving target extraction method based on difference and semantic information fusion, which comprises the steps of firstly obtaining an N-frame image sequence from monitoring equipment; calculating motion pixels in the image by using an inter-frame difference method according to the two frames of images with the interval of N to obtain a difference result; segmenting target object pixels in the image by using the trained convolutional neural network model to obtain segmentation results, wherein the segmentation results comprise a target category and a pixel mask; and combining the difference result and the segmentation result through a fusion algorithm, wherein the result is the extracted moving target. Therefore, the method and the device can acquire accurate motion information and semantic information from the image at the same time, can well extract the motion target in the scene, and provide effective technical support for guaranteeing the safety of production and living activities in the scene.
(2) The method combines the artificial intelligence technology, utilizes the convolutional neural network to carry out target segmentation on the input RGB image, can obtain the pixel-level accurate position mask of the target, and has stronger semantic information, continuity and integrity.
(3) In the method, the inter-frame difference method compensates for the inability of the convolutional neural network to acquire motion information in the image; it is simple to implement, computationally light, and conveniently and accurately acquires pixel-level motion information in the image.
(4) The method combines the motion information acquired by the interframe difference method and the semantic information acquired by the convolutional neural network through a fusion algorithm, so that the method has the advantages of interframe difference and convolutional neural network, not only ensures the effect of extracting the moving target, but also is simple to implement and has good robustness.
Drawings
FIG. 1 is a flow chart of a moving object extraction method based on a convolutional neural network and interframe difference according to the present embodiment;
FIG. 2 is a flow chart of the fusion algorithm of the present embodiment;
FIG. 3 is a schematic diagram illustrating the effect of the interframe difference method in this embodiment;
FIG. 4 is a diagram of a convolutional neural network structure employed for segmentation in the present embodiment;
FIG. 5 is a diagram of a detection branch structure of a convolutional neural network employed for segmentation in the present embodiment;
fig. 6 is a schematic diagram of the moving object extraction result in the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses a moving object extraction method based on difference and semantic information fusion, as shown in fig. 1, comprising the following steps:
s1, acquiring an N-frame image sequence from the zoom dome camera;
The image frames obtained from the zoom dome camera are all 3-channel, 24-bit RGB images, and the length of the image sequence needs to be kept at N. In order to reduce the effect of noise on the subsequent steps, all acquired image frames are first Gaussian-filtered. The Gaussian kernel expression used is as follows:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
When the image sequence is initialized, all image frames in the sequence take the 1st frame. As more frames are acquired, replacement begins and follows a first-in-first-out principle: the t-th frame (t > N) replaces the (t-N)-th frame, the (t+1)-th frame replaces the (t-N+1)-th frame, and so on.
The length N of the image sequence may be adjusted according to the specific application scenario. When the moving speed of the moving target to be extracted is large (≥ 50 pixels per frame), N is set between 1 and 5; when the moving speed is small (< 50 pixels per frame), N is set between 5 and 15. A common value is N = 5.
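By way of illustration only, the following minimal Python/OpenCV sketch shows one possible way to maintain the N-frame sequence with Gaussian pre-filtering and first-in-first-out replacement; the deque container, kernel size and σ are assumptions rather than part of the described embodiment.

```python
from collections import deque
import cv2

N = 5                       # sequence length; a common value per the description
KSIZE, SIGMA = (5, 5), 1.0  # assumed Gaussian kernel size and sigma

frame_buffer = deque(maxlen=N)  # FIFO: appending when full drops the oldest frame

def push_frame(frame_bgr):
    """Gaussian-filter an incoming 24-bit colour frame and insert it into the sequence."""
    smoothed = cv2.GaussianBlur(frame_bgr, KSIZE, SIGMA)
    if not frame_buffer:
        # at initialization, fill the whole sequence with the 1st frame
        frame_buffer.extend([smoothed] * N)
    else:
        frame_buffer.append(smoothed)       # frame t replaces frame t-N (first in, first out)
    # return the (t-N)-th and t-th frames for the inter-frame difference step
    return frame_buffer[0], frame_buffer[-1]
```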
S2, calculating difference information between image frames by using an inter-frame difference method according to two image frames with an interval of N;
The inputs of the inter-frame difference method are the (t-N)-th frame and the t-th frame. The inter-frame difference method specifically comprises the following steps:
s21, converting the two frame images from the RGB images into gray-scale images respectively, wherein the conversion formula is as follows:
gray=0.299*R+0.587*G+0.114*B
wherein R, G, B represent the three color channels of a color image, respectively.
And S22, subtracting the gray values of the corresponding positions of the two frames of images, and then taking the absolute value to obtain a difference result. The implementation formula is as follows:
dif(x,y)=|src(x,y)-dst(x,y)|
where src (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the reference frame, and dst (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the current frame.
When N is 1, the effect of the inter-frame difference method is schematically shown in fig. 3.
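As a non-authoritative sketch of steps S21–S22, the grayscale conversion and absolute difference can be expressed with OpenCV as follows; cv2.cvtColor applies essentially the 0.299/0.587/0.114 weighting given above, and the frame arguments are placeholders.

```python
import cv2

def frame_difference(ref_bgr, cur_bgr):
    """Inter-frame difference between the reference ((t-N)-th) frame and the current (t-th) frame."""
    # S21: convert to grayscale, gray = 0.299*R + 0.587*G + 0.114*B
    src = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2GRAY)
    dst = cv2.cvtColor(cur_bgr, cv2.COLOR_BGR2GRAY)
    # S22: dif(x, y) = |src(x, y) - dst(x, y)|
    dif = cv2.absdiff(src, dst)
    return dif
```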
S3, extracting semantic information in the image by using the trained example segmentation model based on the convolutional neural network, wherein the semantic information comprises a target class and a pixel mask;
The convolutional neural network shown in fig. 4 uses an input size of 416 × 416 × 3. Its basic building blocks are the basic convolutional layer (Basic conv), the residual module (Res), the down-sampling layer (Down sample), the up-sampling layer (Up sample) and the cascade (concatenation) structure.
The convolutional neural network comprises a backbone network, a feature pyramid network and 3 detection heads (Head), wherein the backbone network consists of: a basic convolutional layer, a down-sampling layer, a residual module, a down-sampling layer, 2 residual modules connected in series, a down-sampling layer, 4 residual modules connected in series, a down-sampling layer and 2 residual modules connected in series;
The structure of the feature pyramid network (FPN, Feature Pyramid Network) from bottom to top (from the smaller feature map to the larger feature map) is: a cascade (concatenation) structure, a basic convolutional layer, an upsampling layer, a cascade structure, a basic convolutional layer, an upsampling layer and a cascade structure. The backbone network and the feature pyramid network are combined through 3 lateral connections, each consisting of a basic convolutional layer; the feature map sizes must match when connecting, i.e. feature maps of the same size from the backbone network and the feature pyramid network are joined by one lateral connection.
The 3 detection heads have the same structure and parameters. Each detection head comprises two prediction branches; one of the prediction branches corresponds to target category prediction, the target category of which the current grid is responsible for prediction is predicted on each grid of the characteristic diagram, the output dimension (Class) is S multiplied by C, wherein S represents the size of the characteristic diagram of the current detection head, C represents the total category number needing prediction, and the prediction branch is composed of a basic convolutional layer; the other prediction branch corresponds to target pixel Mask prediction, a position Mask of a current grid responsible for predicting a target is predicted on each grid of the characteristic diagram, and the output dimension (Mask) is H multiplied by W multiplied by S2Where S denotes a feature map size of a current detection head, H and W denote a height and a width of an input picture, respectively, and the prediction branch is composed of an upsampled layer and a base convolutional layer.
The upsampling layers are implemented in two ways: the upsampling layers in the FPN structure are implemented by a resize function (nearest-neighbor interpolation), while the upsampling layer in the mask branch of the detection head is implemented by transposed convolution (Transposed Convolution).
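The complete network of fig. 4 and fig. 5 is not reproduced here; the following PyTorch sketch only illustrates the shape of a single detection head as described above — a class branch built from basic convolutions that predicts the class scores on each grid cell, and a mask branch that upsamples by transposed convolution to the input resolution and outputs one H × W mask per grid cell. The channel counts, layer counts and the composition of the basic convolutional layer are assumptions.

```python
import torch
import torch.nn as nn

class BasicConv(nn.Module):
    """Basic convolutional layer: conv + batch norm + LeakyReLU (an assumed composition)."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class DetectionHead(nn.Module):
    """One of the 3 heads: a class branch and a position-mask branch."""
    def __init__(self, in_ch, S, C, input_size=416):
        super().__init__()
        scale = input_size // S                        # upsampling factor from S x S to H x W
        self.cls_branch = nn.Sequential(               # predicts C class scores per grid cell
            BasicConv(in_ch, in_ch),
            nn.Conv2d(in_ch, C, kernel_size=1),
        )
        self.mask_branch = nn.Sequential(              # transposed-conv upsampling + basic conv
            nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=scale, stride=scale),
            BasicConv(in_ch // 2, in_ch // 2),
            nn.Conv2d(in_ch // 2, S * S, kernel_size=1),   # one H x W mask per grid cell
        )
    def forward(self, feat):                           # feat: (B, in_ch, S, S)
        cls = self.cls_branch(feat)                    # (B, C, S, S)
        mask = self.mask_branch(feat)                  # (B, S*S, H, W)
        return cls, mask

# example: the 13 x 13 head of a 416 x 416 input (channel count assumed)
head = DetectionHead(in_ch=64, S=13, C=80, input_size=416)
cls, mask = head(torch.randn(1, 64, 13, 13))
print(cls.shape, mask.shape)   # (1, 80, 13, 13) and (1, 169, 416, 416)
```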
And S4, combining the difference information and the semantic information through a fusion algorithm, and extracting the moving object in the image.
As shown in fig. 2, the fusion algorithm specifically includes the following steps:
s41, performing morphological filtering on the difference result, and then performing binarization to obtain a moving pixel mask;
s42, performing channel separation on the target mask in the segmentation result, and performing binarization to obtain a target pixel mask;
s43, respectively calculating the proportion P of the moving pixels in each target pixel mask;
S44, comparing P with a given ratio threshold T; if P > T, the current target is judged to be moving.
The fusion algorithm screens out non-moving targets using the motion information obtained by the inter-frame difference, so that the result contains only the position mask of each moving target in the current image.
The morphological filtering in step S41 is an opening (OPEN) operation, which reduces the effect of environmental noise on the result. The threshold for the binarization operation is generally selected to be a value close to 0, since slight pixel value variations caused by environmental noise need to be ignored; a common value is 5. The motion pixel mask obtained after binarization contains only two values: 0 and 255. The binarization formula for thresh = 5 is as follows:
bin(x, y) = 255, if dif(x, y) > 5
bin(x, y) = 0, otherwise
where dif (x, y) denotes the difference result obtained in step S22.
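A minimal sketch of step S41 with the values given above (opening followed by binarization with thresh = 5); the 3 × 3 structuring element is an assumed choice.

```python
import cv2

def motion_pixel_mask(dif, thresh=5, ksize=3):
    """Step S41: morphological opening (erosion then dilation) of the difference result,
    followed by binarization into a 0/255, 8-bit motion pixel mask."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))   # assumed 3x3 kernel
    opened = cv2.morphologyEx(dif, cv2.MORPH_OPEN, kernel)               # erode, then dilate
    _, motion_mask = cv2.threshold(opened, thresh, 255, cv2.THRESH_BINARY)
    return motion_mask   # uint8 mask containing only 0 and 255
```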
The channel separation operation in S42 separates all the target masks predicted by the model one by one to obtain single-channel grayscale target masks. The threshold of the binarization operation is 1, and the target pixel mask obtained after binarization contains only two values: 0 and 255.
The calculation method of the moving pixel proportion P in the target pixel mask in S43 is as follows:
Firstly, calculate the number n₁ of pixels whose value is 255 in the target pixel mask;
Secondly, calculate the number n₂ of pixels whose value is 255 in both the target pixel mask and the motion pixel mask;
Finally, calculate the moving-pixel ratio P in the target pixel mask by the following formula:
P = n₂ / n₁
In S44, the tolerance of the algorithm to environmental noise can be adjusted via the moving-pixel ratio threshold T in the target pixel mask: the larger T is, the less sensitive the algorithm is to environmental noise. A common value is T = 0.1. If P > T, the current target is judged to be a moving target; otherwise, it is a non-moving target. Combined with the class branch of the network, the class information of the current moving target is obtained as well. When N = 1, a schematic diagram of the moving object extraction result of the method is shown in fig. 6.
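Steps S42–S44 can be sketched as follows, assuming the network's mask output has already been split into an array of single-channel grayscale masks of shape (number of targets, H, W) in the 0–255 range; each mask is binarized with threshold 1, the moving-pixel ratio P = n₂/n₁ is computed against the motion mask from step S41, and targets with P > T (T = 0.1) are kept as moving targets.

```python
import numpy as np

def extract_moving_targets(target_masks, motion_mask, T=0.1):
    """Steps S42-S44: screen the segmented targets with the motion pixel mask.
    target_masks: (num_targets, H, W) single-channel grayscale masks (assumed 0-255 range),
    motion_mask:  (H, W) binary mask (0 / 255) from step S41.
    Returns the indices of the targets judged to be moving."""
    moving = []
    for idx, gray_mask in enumerate(target_masks):       # channel separation: one mask at a time
        target_bin = np.where(gray_mask > 1, 255, 0).astype(np.uint8)   # binarize, threshold 1
        n1 = np.count_nonzero(target_bin == 255)         # pixels belonging to the target
        if n1 == 0:
            continue
        n2 = np.count_nonzero((target_bin == 255) & (motion_mask == 255))  # target pixels that moved
        P = n2 / n1                                      # moving-pixel ratio inside the target mask
        if P > T:                                        # P > T  ->  moving target
            moving.append(idx)
    return moving
```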
The method is simple to implement yet highly robust; it can extract moving targets in the scene well and provides accurate and stable results for subsequent computer vision tasks.
The invention may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the invention may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The invention has been described above in connection with the accompanying drawings, but it is not limited in its application to the details of construction and the arrangements set forth in the foregoing description; insubstantial modifications of the inventive concept and arrangement, or direct application of the inventive concept and arrangement to other applications without modification, are all intended to fall within the scope of protection of the invention.

Claims (10)

1. A moving object extraction method based on difference and semantic information fusion is characterized by comprising the following steps:
s1, acquiring an N-frame image sequence from the monitoring equipment;
S2, calculating difference information between image frames by using an inter-frame difference method on two image frames that are N frames apart;
S3, extracting semantic information in the image by using a trained instance segmentation model based on a convolutional neural network, wherein the semantic information comprises a target class and a pixel mask;
and S4, combining the difference information and the semantic information through a fusion algorithm, and extracting the moving object in the image.
2. The method for extracting moving object based on difference and semantic information fusion as claimed in claim 1, wherein the image frames obtained in step S1 are all 24-bit RGB images of 3 channels, the length of the image sequence needs to be kept as N, all the image frames need to be filtered by gaussian kernel, and the gaussian kernel expression is as follows:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
3. The method for extracting moving objects based on difference and semantic information fusion as claimed in claim 2, wherein in step S1, when the image sequence is initialized, all image frames in the sequence adopt the 1st frame; as the number of acquired frames increases, replacement begins and proceeds on a first-in-first-out (FIFO) basis, i.e., the T-th frame replaces the (T-N)-th frame, the (T+1)-th frame replaces the (T-N+1)-th frame, and so on, with T > N.
The length N of the image sequence in step S1 is adjusted according to a specific application scenario.
4. The method for extracting a moving object based on difference and semantic information fusion as claimed in claim 3, wherein the inputs of the inter-frame difference method applied in step S2 are the (T-N)-th frame and the T-th frame, and the inter-frame difference method specifically comprises the following steps:
s21, converting the two frame images from the RGB images into gray-scale images respectively, wherein the conversion formula is as follows:
gray=0.299*R+0.587*G+0.114*B
wherein R, G, B represent the three color channels of a color image, respectively;
s22, subtracting the gray values of the corresponding positions of the two frames of images, and then taking the absolute value to obtain a difference result, wherein the formula is as follows:
dif(x,y)=|src(x,y)-dst(x,y)|
where src (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the reference frame, and dst (x, y) represents a pixel value at coordinate (x, y) on the grayscale map of the current frame.
5. The method for extracting a moving object based on difference and semantic information fusion as claimed in claim 4, wherein the input size of the convolutional neural network used in step S3 is 416 × 416 × 3;
the convolutional neural network comprises a backbone network, a feature pyramid network and 3 detection heads, wherein the backbone network consists of: a basic convolutional layer, a down-sampling layer, a residual module, a down-sampling layer, 2 residual modules connected in series, a down-sampling layer, 4 residual modules connected in series, a down-sampling layer and 2 residual modules connected in series;
the structure of the feature pyramid network (FPN, Feature Pyramid Network) from bottom to top is: a cascade (concatenation) structure, a basic convolutional layer, an upsampling layer, a cascade structure, a basic convolutional layer, an upsampling layer and a cascade structure; the backbone network and the feature pyramid network are combined through 3 lateral connections, each lateral connection consisting of a basic convolutional layer, and the feature map sizes must match when connecting, i.e. feature maps of the same size from the backbone network and the feature pyramid network are joined by one lateral connection;
the 3 detection heads have the same structure and parameters, and each detection head comprises two prediction branches; one prediction branch corresponds to target class prediction, predicting on each grid of the feature map the target class that the current grid is responsible for, with output dimension S × C, where S denotes the feature map size of the current detection head and C denotes the total number of classes to be predicted, this branch consisting of basic convolutional layers; the other prediction branch corresponds to target pixel mask prediction, predicting on each grid of the feature map the position mask of the target that the current grid is responsible for, with output dimension H × W × S², where S denotes the feature map size of the current detection head and H and W denote the height and width of the input picture, respectively, this branch consisting of an upsampling layer and a basic convolutional layer.
6. The method for extracting a moving object based on difference and semantic information fusion as claimed in claim 5, wherein the upsampling layer in the FPN structure is implemented by a resize function using nearest-neighbor interpolation;
the upsampling layer in the position-mask branch of the detection head is implemented by transposed convolution (Transposed Convolution).
7. The method for extracting moving object based on difference and semantic information fusion as claimed in claim 6, wherein the fusion algorithm in step S4 includes the following steps:
s41, performing morphological filtering on the difference result, and then performing binarization to obtain a moving pixel mask;
s42, performing channel separation on the target mask in the segmentation result, and performing binarization to obtain a target pixel mask;
s43, respectively calculating the proportion P of the moving pixels in each target pixel mask;
S44, comparing the moving-pixel ratio P with a preset ratio threshold T; if P > T, the current target is judged to be a moving target;
in the step S41, the morphological filtering is an opening (OPEN) operation, i.e., an erosion (ERODE) operation followed by a dilation (DILATE) operation;
binarization is performed after the morphological filtering, with a value thresh close to 0 selected as the binarization threshold; the motion pixel mask obtained after binarization contains only two values, 0 and 255 (8-bit unsigned integers), and the binarization formula is as follows:
bin(x, y) = 255, if dif(x, y) > thresh
bin(x, y) = 0, otherwise
where dif (x, y) denotes the difference result obtained in step S22.
8. The method for extracting a moving object based on difference and semantic information fusion as claimed in claim 7, wherein the channel separation operation in step S42 separates all target pixel masks predicted by the model one by one along the channel dimension to obtain single-channel grayscale target pixel masks;
binarization is performed after channel separation with a threshold of 1, and the target mask obtained after binarization contains only two values, 0 and 255 (8-bit unsigned integers).
9. The method for extracting a moving object based on difference and semantic information fusion according to claim 8, wherein the step of calculating the ratio P of moving pixels in the mask of target pixels in step S43 is as follows:
Firstly, calculate the number n₁ of pixels whose value is 255 in the target pixel mask;
Secondly, calculate the number n₂ of pixels whose value is 255 in both the target pixel mask and the motion pixel mask;
Finally, calculate the moving-pixel ratio P in the target pixel mask by the following formula:
P = n₂ / n₁
10. The method for extracting a moving object based on difference and semantic information fusion according to claim 9, wherein in step S44, the tolerance of the algorithm to environmental noise can be adjusted via the moving-pixel ratio threshold T in the target pixel mask: the larger T is, the less sensitive the algorithm is to environmental noise; if P > T, the current target is determined to be a moving target; otherwise, it is a non-moving target.
CN202011439962.9A 2021-02-24 2021-02-24 Moving object extraction method based on difference and semantic information fusion Active CN112907621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011439962.9A CN112907621B (en) 2021-02-24 2021-02-24 Moving object extraction method based on difference and semantic information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011439962.9A CN112907621B (en) 2021-02-24 2021-02-24 Moving object extraction method based on difference and semantic information fusion

Publications (2)

Publication Number Publication Date
CN112907621A true CN112907621A (en) 2021-06-04
CN112907621B CN112907621B (en) 2023-02-14

Family

ID=76111423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011439962.9A Active CN112907621B (en) 2021-02-24 2021-02-24 Moving object extraction method based on difference and semantic information fusion

Country Status (1)

Country Link
CN (1) CN112907621B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421231A (en) * 2021-06-08 2021-09-21 杭州海康威视数字技术股份有限公司 Bleeding point detection method, device and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184552A (en) * 2011-05-11 2011-09-14 上海理工大学 Moving target detecting method based on differential fusion and image edge information
CN110782477A (en) * 2019-10-10 2020-02-11 重庆第二师范学院 Moving target rapid detection method based on sequence image and computer vision system
US20200051250A1 (en) * 2018-08-08 2020-02-13 Beihang University Target tracking method and device oriented to airborne-based monitoring scenarios
CN111626090A (en) * 2020-03-03 2020-09-04 湖南理工学院 Moving target detection method based on depth frame difference convolutional neural network
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111862143A (en) * 2020-07-13 2020-10-30 郑州信大先进技术研究院 Automatic river bank collapse monitoring method
CN112381835A (en) * 2020-10-29 2021-02-19 中国农业大学 Crop leaf segmentation method and device based on convolutional neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184552A (en) * 2011-05-11 2011-09-14 上海理工大学 Moving target detecting method based on differential fusion and image edge information
US20200051250A1 (en) * 2018-08-08 2020-02-13 Beihang University Target tracking method and device oriented to airborne-based monitoring scenarios
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110782477A (en) * 2019-10-10 2020-02-11 重庆第二师范学院 Moving target rapid detection method based on sequence image and computer vision system
CN111626090A (en) * 2020-03-03 2020-09-04 湖南理工学院 Moving target detection method based on depth frame difference convolutional neural network
CN111862143A (en) * 2020-07-13 2020-10-30 郑州信大先进技术研究院 Automatic river bank collapse monitoring method
CN112381835A (en) * 2020-10-29 2021-02-19 中国农业大学 Crop leaf segmentation method and device based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴骞: "Research on action recognition algorithms fusing spatio-temporal difference information" (融合时空差分信息的动作识别算法研究), China Excellent Doctoral and Master's Dissertations Full-text Database (Master's) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421231A (en) * 2021-06-08 2021-09-21 杭州海康威视数字技术股份有限公司 Bleeding point detection method, device and system
CN113421231B (en) * 2021-06-08 2023-02-28 杭州海康威视数字技术股份有限公司 Bleeding point detection method, device and system

Also Published As

Publication number Publication date
CN112907621B (en) 2023-02-14

Similar Documents

Publication Publication Date Title
CN111340844B (en) Multi-scale characteristic optical flow learning calculation method based on self-attention mechanism
CN108510451B (en) Method for reconstructing license plate based on double-layer convolutional neural network
CN112464807A (en) Video motion recognition method and device, electronic equipment and storage medium
CN111402146A (en) Image processing method and image processing apparatus
CN110807384A (en) Small target detection method and system under low visibility
CN112562255B (en) Intelligent image detection method for cable channel smoke and fire conditions in low-light-level environment
CN112614136A (en) Infrared small target real-time instance segmentation method and device
Hiraiwa et al. An FPGA based embedded vision system for real-time motion segmentation
CN116152591B (en) Model training method, infrared small target detection method and device and electronic equipment
CN113409355A (en) Moving target identification system and method based on FPGA
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN112907621B (en) Moving object extraction method based on difference and semantic information fusion
CN115937794A (en) Small target object detection method and device, electronic equipment and storage medium
CN114266952A (en) Real-time semantic segmentation method based on deep supervision
CN112766028A (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN113936034A (en) Apparent motion combined weak and small moving object detection method combined with interframe light stream
Yeswanth et al. Sovereign critique network (SCN) based super-resolution for chest X-rays images
EP4248657A1 (en) Methods and systems for low light media enhancement
CN116110095A (en) Training method of face filtering model, face recognition method and device
US20230394632A1 (en) Method and image processing device for improving signal-to-noise ratio of image frame sequences
CN111797761B (en) Three-stage smoke detection system, method and readable medium
CN113065650B (en) Multichannel neural network instance separation method based on long-term memory learning
Philip Background subtraction algorithm for moving object detection using denoising architecture in FPGA
CN115100409A (en) Video portrait segmentation algorithm based on twin network
CN114565764A (en) Port panorama sensing system based on ship instance segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant