CN105741322B - Visual field region segmentation method based on video feature-level fusion - Google Patents

Visual field region segmentation method based on video feature-level fusion

Info

Publication number
CN105741322B
CN105741322B
Authority
CN
China
Prior art keywords
pixel
video
color
feature
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610072608.4A
Other languages
Chinese (zh)
Other versions
CN105741322A (en)
Inventor
Zhang Rui (张睿)
Tong Yujuan (童玉娟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quzhou University
Original Assignee
Quzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quzhou University filed Critical Quzhou University
Priority to CN201610072608.4A priority Critical patent/CN105741322B/en
Publication of CN105741322A publication Critical patent/CN105741322A/en
Application granted granted Critical
Publication of CN105741322B publication Critical patent/CN105741322B/en


Abstract

The invention discloses a visual field region segmentation method based on video feature-level fusion. The method comprises the following steps: calculating the color feature of each pixel in a video; calculating the dynamics feature of each pixel in the video; calculating the texture feature of each pixel in the video; performing feature-level fusion of the dynamics, color, and texture features of each pixel in the video, and segmenting the visual field in the video into regions according to the fused features. The invention jointly exploits the dynamics feature of video pixels in the time dimension and their color and texture features in the spatial dimensions, improving the validity and correctness of visual field region segmentation.

Description

Visual field region segmentation method based on video feature-level fusion
Technical field
The present invention relates to the field of video analysis and processing technology, and more particularly to a visual field region segmentation method based on video feature-level fusion.
Background technology
With the continuous maturing of video technology and the decline of its cost, video analysis and processing techniques have been widely applied in many areas of scientific research, production, and social life. Segmenting the visual field presented in a video into regions helps to extract valuable information from the video, and is thus an important video analysis and processing technique.
At present, region segmentation of the visual field in video mainly borrows from image region segmentation techniques. Common image region segmentation techniques include methods based on color features, methods based on texture features, and methods based on shape features. Clearly, directly grafting image region segmentation techniques onto video objects ignores the rich dynamic features contained in video and the temporal variability of video content, which inevitably impairs the validity and correctness of visual field region segmentation.
Summary of the invention
The purpose of the present invention is to overcome the technical problem that existing visual field region segmentation methods ignore the rich dynamic features contained in video and the temporal variability of video content, which leaves the validity and correctness of visual field region segmentation insufficient, and to provide a visual field region segmentation method based on video feature-level fusion. The method jointly exploits the dynamics feature of video pixels in the time dimension and their color and texture features in the spatial dimensions, improving the validity and correctness of visual field region segmentation.
To solve the above problems, the present invention is achieved through the following technical scheme:
The visual field region segmentation method based on video feature-level fusion of the present invention comprises the following steps:
S1: Calculate the color feature of each pixel in the video;
S2: Calculate the dynamics feature of each pixel in the video;
S3: Calculate the texture feature of each pixel in the video;
S4: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video, and segment the visual field in the video into regions according to the fused features.
In this technical scheme, the visual field region segmentation method based on video feature-level fusion jointly exploits the dynamics feature of video pixels in the time dimension and their color and texture features in the spatial dimensions, thereby making use of the joint spatio-temporal distribution of visual information in the video. This overcomes the inability of image region segmentation methods, when grafted onto video, to exploit the temporal dynamics of visual information. The method is suitable for visual field region segmentation of fixed-view color videos of various resolutions.
Preferably, the step S1 includes the following steps:
S11: Generate for each pixel of the video a color feature vector based on the RGB color space; the RGB color feature vector is as follows:
f1(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t)
Wherein, R(i,j)|t denotes the pixel value on the red channel of the pixel at coordinate (i,j) in frame t of the video, G(i,j)|t denotes the pixel value on the green channel of the pixel at coordinate (i,j) in frame t, and B(i,j)|t denotes the pixel value on the blue channel of the pixel at coordinate (i,j) in frame t;
S12: Convert the video from the RGB color space to the HSV color space;
S13: Generate for each pixel of the video a color feature vector based on the HSV color space; the HSV color feature vector is as follows:
f2(i,j)|t = (H(i,j)|t, S(i,j)|t, V(i,j)|t)
Wherein, H(i,j)|t denotes the pixel value on the hue channel of the pixel at coordinate (i,j) in frame t of the video, S(i,j)|t denotes the pixel value on the saturation channel of the pixel at coordinate (i,j) in frame t, and V(i,j)|t denotes the pixel value on the value (brightness) channel of the pixel at coordinate (i,j) in frame t;
S14: Concatenate the RGB-based color feature vector with the HSV-based color feature vector to generate a color feature vector based on the dual color spaces, expressed as follows:
f3(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t, H(i,j)|t, S(i,j)|t, V(i,j)|t).
Preferably, the step S2 includes the following steps:
S21: Convert the video into a grayscale video;
S22: Build a background model for each pixel in the grayscale video;
S23: Count the number of significant gray-value changes occurring at each pixel in the grayscale video. A significant gray-value change is defined as a gray-value change at a pixel whose amplitude exceeds the normal gray-value variation range set by the background model at that pixel;
S24: Calculate the dynamics of each pixel in the grayscale video. The dynamics of a pixel is computed as:
D(i,j)|t = Ψ(i,j)|t / t
Wherein, Ψ(i,j)|t denotes the number of significant gray-value changes occurring at the pixel at coordinate (i,j) in the grayscale video from the start frame to frame t, and D(i,j)|t denotes the frequency of significant gray-value changes at the pixel at coordinate (i,j) from the start frame to frame t, i.e., the dynamics of the pixel at coordinate (i,j).
Preferably, the step S3 includes the following steps:
S31: Convert the video into a grayscale video, calculate with the original LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 1st texture eigenvalue W1(i,j)|t of the pixel;
S32: Calculate with the circular LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 2nd texture eigenvalue W2(i,j)|t of the pixel;
S33: Combine the 1st and 2nd texture eigenvalues of the pixel at coordinate (i,j) in frame t of the grayscale video into the texture feature vector of the pixel, i.e.: f4(i,j)|t = (W1(i,j)|t, W2(i,j)|t).
Preferably, the step S4 includes the following steps:
S41: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video to obtain a fusion feature vector;
S42: Perform automatic cluster analysis on the fusion feature vectors of all pixels in frame t using a clustering method;
S43: Assign all pixels whose fusion feature vectors are clustered into the same class to the same region, thereby completing the region segmentation of the visual field in the video.
The substantial effect of the invention is as follows: it jointly exploits multiple visual features of the video in the time and spatial dimensions, namely the pixel dynamics feature in the time dimension together with the color and texture features in the spatial dimensions. Because the pixel dynamics feature is a key piece of information that static images do not possess, the method overcomes the insufficient validity and correctness that result from applying image region segmentation methods to region segmentation of video.
Description of the drawings
Fig. 1 is the workflow diagram of the present invention;
Fig. 2 is a schematic diagram of the original LBP operator in step S31;
Fig. 3 is a schematic diagram of the circular LBP operator in step S32.
Specific embodiments
The technical solutions of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Embodiment: The visual field region segmentation method based on video feature-level fusion of this embodiment, as shown in Fig. 1, comprises the following steps:
S1: Calculate the color feature of each pixel in the video;
S2: Calculate the dynamics feature of each pixel in the video;
S3: Calculate the texture feature of each pixel in the video;
S4: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video, and segment the visual field in the video into regions according to the fused features.
Step S1 includes the following steps:
S11: Generate for each pixel of the video a color feature vector based on the RGB color space; the RGB color feature vector is as follows:
f1(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t)
Wherein, R(i,j)|t denotes the pixel value on the red channel of the pixel at coordinate (i,j) in frame t of the video, G(i,j)|t denotes the pixel value on the green channel of the pixel at coordinate (i,j) in frame t, and B(i,j)|t denotes the pixel value on the blue channel of the pixel at coordinate (i,j) in frame t;
S12: Convert the video from the RGB color space to the HSV color space;
S13: Generate for each pixel of the video a color feature vector based on the HSV color space; the HSV color feature vector is as follows:
f2(i,j)|t = (H(i,j)|t, S(i,j)|t, V(i,j)|t)
Wherein, H(i,j)|t denotes the pixel value on the hue channel of the pixel at coordinate (i,j) in frame t of the video, S(i,j)|t denotes the pixel value on the saturation channel of the pixel at coordinate (i,j) in frame t, and V(i,j)|t denotes the pixel value on the value (brightness) channel of the pixel at coordinate (i,j) in frame t;
S14: Concatenate the RGB-based color feature vector with the HSV-based color feature vector to generate a color feature vector based on the dual color spaces, expressed as follows:
f3(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t, H(i,j)|t, S(i,j)|t, V(i,j)|t).
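As an illustrative sketch of steps S11 to S14 (not part of the patent text), the following Python code builds the dual-color-space feature vector for every pixel of one frame; the use of OpenCV and the helper name color_features are assumptions made for illustration:

import cv2
import numpy as np

def color_features(frame_bgr):
    """Illustrative sketch of S11-S14: per-pixel dual-color-space feature f3.

    frame_bgr: one video frame as an HxWx3 uint8 array (OpenCV BGR order).
    Returns an HxWx6 float32 array holding (R, G, B, H, S, V) per pixel.
    """
    # S11: RGB color feature vector f1 = (R, G, B)
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    # S12/S13: convert to the HSV color space and take f2 = (H, S, V)
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # S14: concatenate f1 and f2 into f3 = (R, G, B, H, S, V)
    return np.concatenate([rgb, hsv], axis=2).astype(np.float32)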
Step S2 includes the following steps:
S21: Perform grayscale processing on the video to convert it into a grayscale video;
S22: Build a background model for each pixel in the grayscale video;
S23: Count the number of significant gray-value changes occurring at each pixel in the grayscale video. A significant gray-value change is defined as a gray-value change at a pixel whose amplitude exceeds the normal gray-value variation range set by the background model at that pixel; that is, each time the gray-value change amplitude at a pixel exceeds the normal gray-value variation range set by the background model at that pixel, the significant gray-value change count of that pixel is incremented by 1;
S24: Calculate the dynamics of each pixel in the grayscale video. The dynamics of a pixel is computed as:
D(i,j)|t = Ψ(i,j)|t / t
Wherein, Ψ(i,j)|t denotes the number of significant gray-value changes occurring at the pixel at coordinate (i,j) in the grayscale video from the start frame to frame t, and D(i,j)|t denotes the frequency of significant gray-value changes at the pixel at coordinate (i,j) from the start frame to frame t, i.e., the dynamics of the pixel at coordinate (i,j). The dynamics of a pixel refers to the frequency with which significant gray-value changes occur at that pixel: low dynamics indicates that the scene at that pixel changes little in the video, while high dynamics indicates that the scene at that pixel changes greatly.
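The patent does not prescribe a particular background model. As a minimal sketch of steps S21 to S24, the code below maintains a running mean and standard deviation per pixel (an assumed background model, chosen only for illustration) to count significant gray-value changes and derive the dynamics D(i,j)|t = Ψ(i,j)|t / t:

import cv2
import numpy as np

def dynamics_feature(frames_bgr, k=2.5, alpha=0.05):
    """Illustrative sketch of S21-S24 under an assumed running-Gaussian
    background model: returns the dynamics map D(i,j)|t = psi(i,j)|t / t.

    frames_bgr: sequence of HxWx3 uint8 frames (frames 1..t of the video).
    k:          a change is 'significant' when its amplitude exceeds k
                standard deviations of the background model (assumption).
    alpha:      learning rate of the running background statistics.
    """
    mean, var, count, t = None, None, None, 0
    for frame in frames_bgr:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # S21
        t += 1
        if mean is None:  # S22: initialize the per-pixel background model
            mean = gray.copy()
            var = np.full_like(gray, 15.0 ** 2)
            count = np.zeros_like(gray)
            continue
        # S23: a change beyond the model's normal range counts as significant
        significant = np.abs(gray - mean) > k * np.sqrt(var)
        count += significant.astype(np.float32)
        # adapt the background statistics to the new frame
        mean = (1.0 - alpha) * mean + alpha * gray
        var = (1.0 - alpha) * var + alpha * (gray - mean) ** 2
    # S24: dynamics = frequency of significant gray-value changes
    return count / max(t, 1)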
Step S3 includes the following steps:
S31: Convert the video into a grayscale video, calculate with the original LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 1st texture eigenvalue W1(i,j)|t of the pixel; the original LBP operator is shown in Fig. 2;
S32: Calculate with the circular LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 2nd texture eigenvalue W2(i,j)|t of the pixel; the circular LBP operator is shown in Fig. 3;
S33: Combine the 1st and 2nd texture eigenvalues of the pixel at coordinate (i,j) in frame t of the grayscale video into the texture feature vector of the pixel, i.e.: f4(i,j)|t = (W1(i,j)|t, W2(i,j)|t).
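As a sketch of steps S31 to S33, the following code computes the two LBP texture values per pixel with scikit-image. Since the patent defines the two operators only through Figs. 2 and 3, the neighborhood parameters (P=8, R=1 for the original operator; P=16, R=2 for the circular operator) are assumptions:

import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def texture_features(frame_bgr):
    """Illustrative sketch of S31-S33: per-pixel texture vector f4 = (W1, W2)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # S31: 1st texture eigenvalue W1 from the original 3x3 LBP operator (P=8, R=1)
    w1 = local_binary_pattern(gray, P=8, R=1, method="default")
    # S32: 2nd texture eigenvalue W2 from a circular LBP operator (P=16, R=2 assumed)
    w2 = local_binary_pattern(gray, P=16, R=2, method="default")
    # S33: combine the two eigenvalues into f4(i,j)|t = (W1, W2)
    return np.stack([w1, w2], axis=2).astype(np.float32)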
Step S4 includes the following steps:
S41: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video to obtain the fusion feature vector:
f(i,j)|t = (D(i,j)|t, R(i,j)|t, G(i,j)|t, B(i,j)|t, H(i,j)|t, S(i,j)|t, V(i,j)|t, W1(i,j)|t, W2(i,j)|t);
S42: Perform automatic cluster analysis on the fusion feature vectors f(i,j)|t of all pixels in frame t using a clustering method;
S43: Assign all pixels whose fusion feature vectors are clustered into the same class to the same region, thereby completing the region segmentation of the visual field in the video.
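The patent leaves the choice of clustering method open. The sketch below fuses the three per-pixel feature maps into the 9-dimensional vector f(i,j)|t and clusters all pixels of frame t with k-means; the use of scikit-learn's KMeans and the number of regions are assumptions:

import numpy as np
from sklearn.cluster import KMeans

def segment_frame(color_feat, dyn_feat, tex_feat, n_regions=4):
    """Illustrative sketch of S41-S43: fuse per-pixel features and cluster.

    color_feat: HxWx6 array from color_features()
    dyn_feat:   HxW   array from dynamics_feature()
    tex_feat:   HxWx2 array from texture_features()
    Returns an HxW label map; pixels sharing a label form one region.
    """
    h, w = dyn_feat.shape
    # S41: fusion feature vector f = (D, R, G, B, H, S, V, W1, W2)
    fused = np.concatenate([dyn_feat[..., None], color_feat, tex_feat], axis=2)
    # S42: automatic cluster analysis over all pixels of frame t
    flat = fused.reshape(-1, fused.shape[2])
    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(flat)
    # S43: pixels whose fused vectors fall into the same class form one region
    return labels.reshape(h, w)

In practice the nine channels would likely need normalization before clustering, since D(i,j)|t lies in [0,1] while the color channels span [0,255]; this detail is not specified in the patent.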
The dynamics of a pixel refers to the frequency with which significant gray-value changes occur at that pixel: low dynamics indicates that the scene at the pixel changes little in the video, while high dynamics indicates that it changes greatly. By using the visual field region segmentation method based on video feature-level fusion, the dynamics feature of video pixels in the time dimension and their color and texture features in the spatial dimensions are jointly exploited, making use of the joint spatio-temporal distribution of visual information in the video. This overcomes the inability of image region segmentation methods, when grafted onto video, to exploit the temporal dynamics of visual information. The method is suitable for visual field region segmentation of fixed-view color videos of various resolutions.
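Tying the sketches above together, a hypothetical end-to-end run over a fixed-view color video could look as follows (the file name and the choice of four regions are illustrative):

import cv2

# Read all frames of a fixed-view color video (file name is illustrative)
cap = cv2.VideoCapture("scene.avi")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# S1-S4 for the last frame t: per-pixel features, fused and clustered
d = dynamics_feature(frames)       # dynamics on the time dimension
c = color_features(frames[-1])     # dual-color-space feature on frame t
x = texture_features(frames[-1])   # LBP texture feature on frame t
regions = segment_frame(c, d, x, n_regions=4)
print(regions.shape, regions.max() + 1, "regions")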

Claims (4)

1. A visual field region segmentation method based on video feature-level fusion, characterized by comprising the following steps:
S1: Calculate the color feature of each pixel in a video;
S2: Calculate the dynamics feature of each pixel in the video;
S3: Calculate the texture feature of each pixel in the video;
S4: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video, and segment the visual field in the video into regions according to the fused features;
The step S2 includes the following steps:
S21: Convert the video into a grayscale video;
S22: Build a background model for each pixel in the grayscale video;
S23: Count the number of significant gray-value changes occurring at each pixel in the grayscale video. A significant gray-value change is defined as a gray-value change at a pixel whose amplitude exceeds the normal gray-value variation range set by the background model at that pixel;
S24: Calculate the dynamics of each pixel in the grayscale video. The dynamics of a pixel is computed as:
D(i,j)|t = Ψ(i,j)|t / t
Wherein, Ψ(i,j)|t denotes the number of significant gray-value changes occurring at the pixel at coordinate (i,j) in the grayscale video from the start frame to frame t, and D(i,j)|t denotes the frequency of significant gray-value changes at the pixel at coordinate (i,j) from the start frame to frame t, i.e., the dynamics of the pixel at coordinate (i,j).
2. The visual field region segmentation method based on video feature-level fusion according to claim 1, characterized in that the step S1 comprises the following steps:
S11: Generate for each pixel of the video a color feature vector based on the RGB color space; the RGB color feature vector is as follows:
f1(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t)
Wherein, R(i,j)|t denotes the pixel value on the red channel of the pixel at coordinate (i,j) in frame t of the video, G(i,j)|t denotes the pixel value on the green channel of the pixel at coordinate (i,j) in frame t, and B(i,j)|t denotes the pixel value on the blue channel of the pixel at coordinate (i,j) in frame t;
S12: Convert the video from the RGB color space to the HSV color space;
S13: Generate for each pixel of the video a color feature vector based on the HSV color space; the HSV color feature vector is as follows:
f2(i,j)|t = (H(i,j)|t, S(i,j)|t, V(i,j)|t)
Wherein, H(i,j)|t denotes the pixel value on the hue channel of the pixel at coordinate (i,j) in frame t of the video, S(i,j)|t denotes the pixel value on the saturation channel of the pixel at coordinate (i,j) in frame t, and V(i,j)|t denotes the pixel value on the value (brightness) channel of the pixel at coordinate (i,j) in frame t;
S14: Concatenate the RGB-based color feature vector with the HSV-based color feature vector to generate a color feature vector based on the dual color spaces, expressed as follows:
f3(i,j)|t = (R(i,j)|t, G(i,j)|t, B(i,j)|t, H(i,j)|t, S(i,j)|t, V(i,j)|t).
3. The visual field region segmentation method based on video feature-level fusion according to claim 1 or 2, characterized in that the step S3 comprises the following steps:
S31: Convert the video into a grayscale video, calculate with the original LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 1st texture eigenvalue W1(i,j)|t of the pixel;
S32: Calculate with the circular LBP operator the LBP texture value of the pixel at coordinate (i,j) in frame t of the grayscale video, and take it as the 2nd texture eigenvalue W2(i,j)|t of the pixel;
S33: Combine the 1st and 2nd texture eigenvalues of the pixel at coordinate (i,j) in frame t of the grayscale video into the texture feature vector of the pixel, i.e.: f4(i,j)|t = (W1(i,j)|t, W2(i,j)|t).
4. The visual field region segmentation method based on video feature-level fusion according to claim 1 or 2, characterized in that the step S4 comprises the following steps:
S41: Perform feature-level fusion of the dynamics, color, and texture features of each pixel in the video to obtain a fusion feature vector;
S42: Perform automatic cluster analysis on the fusion feature vectors of all pixels in frame t using a clustering method;
S43: Assign all pixels whose fusion feature vectors are clustered into the same class to the same region, thereby completing the region segmentation of the visual field in the video.
CN201610072608.4A 2016-02-01 2016-02-01 Visual field region segmentation method based on video feature-level fusion Expired - Fee Related CN105741322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610072608.4A CN105741322B (en) 2016-02-01 2016-02-01 Visual field region segmentation method based on video feature-level fusion


Publications (2)

Publication Number Publication Date
CN105741322A CN105741322A (en) 2016-07-06
CN105741322B 2018-08-03

Family

ID=56242193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610072608.4A Expired - Fee Related CN105741322B (en) 2016-02-01 2016-02-01 Visual field region segmentation method based on video feature-level fusion

Country Status (1)

Country Link
CN (1) CN105741322B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509830B (en) * 2017-02-28 2020-12-01 华为技术有限公司 Video data processing method and device
CN110807398A (en) * 2019-10-28 2020-02-18 衢州学院 Method and device for dividing field area
CN110910398B (en) * 2019-10-28 2021-07-20 衢州学院 Video complex scene region segmentation method and device based on decision layer fusion
CN110826445B (en) * 2019-10-28 2021-04-23 衢州学院 Method and device for detecting specific target area in colorless scene video
CN110866460B (en) * 2019-10-28 2020-11-27 衢州学院 Method and device for detecting specific target area in complex scene video
CN110827293B (en) * 2019-10-28 2020-09-08 衢州学院 Method and device for segmenting achromatic scene area based on decision-making layer fusion
CN110826446B (en) * 2019-10-28 2020-08-21 衢州学院 Method and device for segmenting field of view region of texture-free scene video
CN110910399B (en) * 2019-10-28 2020-09-15 衢州学院 Non-texture scene region segmentation method and device based on decision layer fusion
CN110796073B (en) * 2019-10-28 2021-05-25 衢州学院 Method and device for detecting specific target area in non-texture scene video
CN110807783B (en) * 2019-10-28 2023-07-18 衢州学院 Efficient visual field region segmentation method and device for achromatic long video
CN111028262A (en) * 2019-12-06 2020-04-17 衢州学院 Multi-channel composite high-definition high-speed video background modeling method


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101316328A (en) * 2007-05-29 2008-12-03 中国科学院计算技术研究所 News anchor lens detection method based on space-time strip pattern analysis
CN102426583A (en) * 2011-10-10 2012-04-25 北京工业大学 Chinese medicine tongue manifestation retrieval method based on image content analysis
CN102915544A (en) * 2012-09-20 2013-02-06 武汉大学 Video image motion target extracting method based on pattern detection and color segmentation
CN105118049A (en) * 2015-07-22 2015-12-02 东南大学 Image segmentation method based on super pixel clustering

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on SOFM-based video object segmentation algorithms; Kang Dahui; China Master's Theses Full-text Database, Information Science and Technology; 2006-10-15 (No. 10, 2006); pp. 22-30 *
Color image segmentation of wear particles based on color features and texture features; Guo Hengguang et al.; Lubrication Engineering (润滑与密封); 2013-06-30; Vol. 38, No. 6; pp. 94-97 *
Feature extraction and classification of multi-target complex mining images; Wang Lansha; China Master's Theses Full-text Database, Information Science and Technology; 2012-05-15 (No. 05, 2012); pp. 24-25 *

Also Published As

Publication number Publication date
CN105741322A (en) 2016-07-06


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2018-08-03
Termination date: 2021-02-01
Termination date: 20210201