CN113822161B - Dynamic hand detection method and device for fusion of skin color and three-level background updating model - Google Patents
Publication number: CN113822161B (application CN202110966932.1A)
Authority: CN (China)
Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a dynamic hand detection method and device that fuse skin color with a three-level background update model. The method comprises the following steps: preprocessing the video stream and dividing it into blocks; detecting the motion region with the three-level background update model to obtain a motion detection result, where the motion detection process updates background models at three levels (pixel level, region level, and frame level); extracting from the color image a rectangular area containing the change region, detecting the skin color region with a CbCr-I channel Gaussian mixture model, and applying morphological operations to obtain a skin color detection result; taking the intersection of the skin color detection result and the motion detection result to obtain the skin-color motion region; and finding, in the skin color detection result, the connected region containing the skin-color motion region, which is taken to be the hand region. The proposed three-level background update model overcomes the drawbacks of Gaussian-mixture-model-based background subtraction, namely its heavy computation and poor adaptability to large illumination changes.
Description
Technical Field
The invention relates to the fields of human-computer interaction and robotics, and in particular to a dynamic hand detection method and device fusing skin color with a three-level background update model.
Background
Gesture recognition has emerged as a natural form of human-computer interaction, and hand detection/localization is the first step of gesture recognition as well as the basis of gesture tracking, gesture recognition, behavior understanding, and the like. The quality of hand detection strongly affects such subsequent processing, yet hand detection under complex backgrounds and varying illumination remains a challenging task, mainly for two reasons. On the one hand, the human hand has many joints and is a non-rigid moving object, so hand detection amounts to detecting a deformable moving target, which is considerably harder than detecting a rigid one. On the other hand, dynamic changes in the background image (e.g., illumination and weather) further increase the difficulty of hand detection.
Some researchers combine skin color and motion information to detect hand regions in video. The motion region is usually detected with the inter-frame difference method or a background subtraction method, an important class of which is based on the Gaussian mixture model. The inter-frame difference method generally detects only the outline of the motion region rather than the complete hand region, and "holes" often appear inside the hand regions it detects. Background subtraction based on the Gaussian mixture model can obtain a relatively complete hand motion region, but it is only suited to moving-target detection under small changes in lighting and weather; its background modeling process is complex, its computation heavy, and its adaptability to large environmental changes poor.
Disclosure of Invention
The aim of the invention is to provide a dynamic hand detection method and device fusing skin color with a three-level background update model, so that during human-robot gesture interaction in unstructured environments, hand detection can be achieved at low computational cost on ordinary hardware.
The invention is realized as follows: a dynamic hand detection method fusing skin color with a three-level background update model comprises the following steps:
a. preprocessing each frame of image in a video stream, dividing the image into a plurality of equal small blocks, taking the pixel mean value of the small blocks as the pixel value of the small blocks, and taking each small block as one pixel;
b. detecting a motion region by using a three-level background updating model to obtain a motion detection result, wherein the motion detection result is specifically as follows:
b-1, constructing an initial background image by using an inter-frame difference method;
b-2, detecting the change region with the inter-frame difference method, performing morphological operations, and calculating the ratio r_c of the number of foreground pixels to the total number of pixels in the image;
b-3, comparing r_c with the thresholds rt_m and rt_c: if rt_m < r_c < rt_c, execute step b-5; otherwise execute step b-4; rt_m is an empirical threshold indicating that a moving object is present in the image, and rt_c is an empirical threshold indicating that a sudden illumination change has occurred;
b-4, updating the background image by using the current frame, and executing the step b-2 for the next frame image;
b-5, removing small connected regions whose area is less than a fixed proportion of the area of the largest connected region, and marking the change regions detected by the inter-frame difference method with rectangular boxes; the whole image is thereby divided into a change region and a static region;
b-6, for pixels in the change region, detecting the motion region with the improved Gaussian mixture model; for pixels in the static region, on the one hand treating them as background, and on the other hand updating the parameters of Gaussian components with non-zero weight using formula (5);
w_{i,k} = (1 − α) · w_{i,k-1} (5)
in formula (5), w_{i,k} is the weight of the ith Gaussian component in the kth frame, w_{i,k-1} is the weight of the ith Gaussian component in the (k-1)th frame, and α is the adaptive learning rate;
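The frame-level branch of steps b-2 to b-4 can be sketched in Python as follows: r_c is computed from the binary change mask produced by the inter-frame difference, then compared against the two empirical thresholds. The rt_m and rt_c values below are illustrative placeholders, not values from the patent:

```python
import numpy as np

def frame_level_decision(change_mask, rt_m=0.005, rt_c=0.6):
    """Frame-level branch of the three-level model (steps b-2 to b-4).

    r_c is the foreground ratio of the inter-frame difference mask. Below rt_m
    there is no moving object; above rt_c a global illumination change is
    assumed and the background is rebuilt from the current frame (b-4).
    Threshold values here are illustrative, not the patent's.
    """
    r_c = change_mask.mean()          # ratio of foreground pixels to all pixels
    if r_c <= rt_m:
        return "no_motion"            # nothing to segment; keep background
    if r_c >= rt_c:
        return "reset_background"     # sudden illumination change: frame-level update
    return "segment_motion"           # normal case: region/pixel-level processing
```
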
c. intercepting a rectangular area containing a change area in a corresponding color image in a video stream, detecting a skin color area by using a CbCr-I channel Gaussian mixture model, and performing morphological operation to obtain a skin color detection result;
d. taking the intersection of the skin color detection result obtained in step c and the motion detection result obtained in step b, obtaining the skin-color motion region;
e. finding, in the skin color detection result obtained in step c, the connected region containing the skin-color motion region obtained in step d; that region is the hand region.
The pixels in the change region in step b-6 are detected with the improved Gaussian mixture model, with the following specific steps:
b-61, for a pixel in the change region, initialize n_f = 0, where n_f is the number of Gaussian components of that pixel matched by its pixel value; a pixel value matches a Gaussian component of that pixel if formula (3) below is satisfied:
|f_k(x,y) − μ_{i,k-1}(x,y)| ≤ a · σ_{i,k-1}(x,y) (3)
in formula (3), μ_{i,k-1}(x,y) and σ_{i,k-1}(x,y) are the mean and standard deviation of the ith Gaussian component of pixel (x,y) in the (k-1)th frame, and a is a matching constant;
b-62, for each Gaussian component with non-zero weight, judge by formula (3) whether the pixel matches the Gaussian component; if it matches, update the parameters of the Gaussian component with formula (4), and increment n_f by 1 for every successfully matched component; if it does not match, update the parameters of the Gaussian component with formula (5);
in formula (4), d is an empirical coefficient, ρ is the parameter update rate, μ_{i,k-1} is the mean of the ith Gaussian component in the (k-1)th frame, μ_{i,k} is the mean of the ith Gaussian component in the kth frame, and f_k is the pixel value of the pixel in the kth frame;
b-63, sorting all Gaussian components according to the ratio of the weight to the variance from large to small;
b-64, if the weight of a certain Gaussian component is more than 0 and less than the initial weight and the variance is less than the initial variance, deleting the Gaussian component;
b-65, judge whether n_f equals 0: if yes, execute step b-66; if not, the pixel is a background pixel;
b-66, judge whether the weight of the last Gaussian component is 0: if yes, execute step b-67; if not, assign the pixel value to the mean of the last Gaussian component, and set its weight and variance to the initial weight and initial variance respectively;
b-67, searching a Gaussian component with the weight of 0 appearing for the first time from the first Gaussian component, assigning the pixel value to the mean value of the Gaussian component, and respectively assigning the weight and variance of the Gaussian component as an initial weight and an initial variance;
b-68, normalizing the weight of each Gaussian component;
b-69, for each Gaussian component with non-zero weight, judge by formula (3) whether it matches the pixel value: if any component matches, the pixel is a background pixel; if none matches, the pixel is a foreground pixel.
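Steps b-61 to b-69 can be sketched in Python as a per-pixel update. This is a simplified version: it applies the match test of formula (3) to all active components, uses the standard weight decay of formula (5) and a standard weight/mean update in place of formula (4), and reuses a zero-weight slot when no component matches; the component deletion of step b-64 and the re-check of step b-69 are omitted, and all parameter values are illustrative:

```python
import numpy as np

def update_pixel_gmm(f, w, mu, sigma, a=2.5, alpha=0.01, rho=0.05,
                     w0=0.05, var0=30.0):
    """Simplified per-pixel update of the improved mixture model (b-61..b-69).

    w, mu, sigma are length-K arrays for one pixel's Gaussian components.
    All matched components are updated, not just the first match; unmatched
    active components only decay their weight. Parameters are illustrative.
    Returns (is_foreground, w, mu, sigma); arrays are updated in place.
    """
    active = w > 0
    matched = active & (np.abs(f - mu) <= a * sigma)    # match test, formula (3)
    w[active & ~matched] *= (1.0 - alpha)               # weight decay, formula (5)
    w[matched] = (1.0 - alpha) * w[matched] + alpha     # matched-weight update
    mu[matched] = (1.0 - rho) * mu[matched] + rho * f   # mean update (cf. formula (4))
    is_fg = not matched.any()
    if is_fg:
        # no component matched: reuse the first zero-weight slot, else the last one
        idx = int(np.argmax(w == 0)) if (w == 0).any() else len(w) - 1
        w[idx], mu[idx], sigma[idx] = w0, float(f), np.sqrt(var0)
    w /= w.sum()                                        # weight normalization, b-68
    return is_fg, w, mu, sigma
```

In the full algorithm the components are additionally kept sorted by weight-to-variance ratio (b-63) and redundant components are deleted (b-64).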
Step b-1 constructs the initial background image with the inter-frame difference method, with the following specific steps:
b-11, detect the change region according to formula (1) with the inter-frame difference method, perform a morphological closing operation, and calculate the number N_c of pixels in the connected regions;
M_k(x,y) = 1, if |f_k(x,y) − f_{k-1}(x,y)| > T_h; otherwise M_k(x,y) = 0 (1)
T_h = T_a + b · (1/N) · Σ_{(x,y)} |f_k(x,y) − f_{k-1}(x,y)| (2)
in formulas (1) and (2), f_k(x,y) and f_{k-1}(x,y) are the pixel values of the kth and (k-1)th frame images respectively, T_a is an empirical threshold, b is an empirical coefficient, N is the total number of pixels in the image, and M_k(x,y) is the binary image obtained after differencing;
b-12, judge whether N_c satisfies the condition: if so, execute step b-13; if not, execute step b-14;
b-13, frame all connected regions with rectangular boxes; in the binarized background image bg_b constructed from the current frame, assign 0 to all pixel values inside the rectangular boxes and 1 to all pixel values outside them;
b-14, assign 0 to all pixel values of the binarized background image bg_b constructed from the current frame;
b-15, take the intersection of the binarized background image bg_b constructed from the current frame and the current frame image, obtaining the background image bg_t constructed from the current frame;
b-16, accumulate the background images constructed from successive frames until every pixel in the image has been assigned a background value at least 4 times, then compute the per-pixel average to obtain the initial background image.
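A minimal Python sketch of the initial-background construction follows. It keeps the per-pixel accumulate-and-average logic of steps b-15 and b-16 but replaces the rectangular-box masking of steps b-12 to b-14 with a direct per-pixel mask; threshold values are illustrative:

```python
import numpy as np

def build_initial_background(frames, T_a=15.0, b=2.0, min_hits=4):
    """Initial-background construction via inter-frame differencing (b-11..b-16).

    Pixels that the adaptive-threshold frame difference marks as changing are
    excluded from each frame's contribution; the rest are averaged until every
    pixel has been observed as background at least min_hits times. The
    connected-area test and bounding boxes of the patent are simplified to a
    per-pixel mask; T_a and b are illustrative values.
    """
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    cnt = np.zeros(frames[0].shape, dtype=np.int64)
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(np.float64) - prev.astype(np.float64))
        T_h = T_a + b * diff.mean()          # adaptive threshold, formula (2)
        bg_mask = diff <= T_h                # True where the pixel looks static
        acc[bg_mask] += cur[bg_mask]
        cnt[bg_mask] += 1
        if cnt.min() >= min_hits:            # every pixel seen as background enough
            break
    return acc / np.maximum(cnt, 1)
```
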
The dynamic hand detection device fusing skin color with the three-level background update model, corresponding to the above method, comprises a motion detection module and a skin color detection module. The motion detection module detects the motion region in the video captured by the camera, specifically using the three-level background update model of step b; the skin color detection module detects skin color regions, specifically with the CbCr-I channel Gaussian mixture model followed by morphological operations.
The invention provides a dynamic hand detection method combining skin color and a three-level background updating model. The skin color detection adopts a proposed CbCr-I channel Gaussian mixture model, and the motion detection adopts a proposed three-level background updating model. The three-level background updating model solves the problems of large calculated amount, poor adaptability to large change of illumination and the like of a background difference method based on a Gaussian mixture model.
The invention has the advantages that:
according to the dynamic hand detection method and device for fusing the skin color and the three-level background updating model, the operation requirements under the specific application scene of investigation and rescue are oriented according to the characteristics of the leg-arm fusion multi-legged robot, and the method and device can adapt to larger changes of environmental illumination.
The method and the device for detecting the dynamic hand fused with the skin color and the three-level background updating model not only greatly reduce the calculated amount of the Gaussian mixture model, but also improve the detection effect of the change area.
The three-level background updating model solves the problems of large calculated amount, poor adaptability to larger changes of illumination and the like of a background difference method based on a Gaussian mixture model.
Drawings
FIG. 1 is an overall flow chart of a dynamic hand detection method of the present invention where skin tone is fused with a three level background update model.
FIG. 2 is a flow chart of a motion detection method based on a three-level background update model in the present invention.
Fig. 3 is a flow chart of a method of constructing an initial background using an inter-frame difference method according to the present invention.
Fig. 4 is a flow chart of a method of motion detection for a change region in accordance with the present invention.
Fig. 5 is a partial sample of the self-built database of the present invention.
Fig. 6 is an initial background image constructed using the inter-frame difference method of the present invention.
Fig. 7 is an exemplary diagram of a process for motion detection of a conference room scene in accordance with the present invention.
Fig. 8 is an exemplary diagram of a process for motion detection of a laboratory scene in accordance with the present invention.
Fig. 9 is a diagram of a comparative example of motion detection for a conference room scene using the present invention and prior art methods.
Fig. 10 is a diagram of a comparative example of motion detection for laboratory scenes using the present invention and prior art methods.
Fig. 11 is a diagram illustrating an example of a process of hand detection for a conference room scene according to the present invention.
Fig. 12 is a result of the present invention performing hand detection on a conference room scene.
Fig. 13 is an exemplary diagram of a process for hand detection of a laboratory scene in accordance with the present invention.
Fig. 14 is a result of the present invention performing hand detection on a laboratory scene.
Detailed Description
The invention provides a dynamic hand detection method and device that detect the hand region using skin color and motion information, fusing a skin color model with a three-level background update model, as shown in Fig. 1. Skin color detection uses a CbCr-I channel Gaussian mixture model, and motion detection uses the three-level background update model proposed by the invention, as shown in Fig. 2, where the three levels of the background model are the frame level, the region level, and the pixel level. The three-level background update model improves Gaussian-mixture-model-based background modeling to address the heavy computation and poor adaptability to large illumination changes of the traditional Gaussian mixture model.
When the traditional Gaussian mixture model is used for background modeling, the same fixed number of Gaussian distributions must be established for every pixel of every frame, and the Gaussian mixture models of all pixels must be updated; the computation is heavy, the hardware requirements are high, and the real-time requirements of human-robot gesture interaction cannot be met. In addition, background subtraction cannot cope with large illumination changes and is generally used in fixed scenes, where the background stays basically unchanged apart from small variations such as swaying leaves. But a robot may move between different scenes and encounter large changes in lighting. To solve these problems, the invention improves Gaussian-mixture-model-based background modeling in the following respects:
1. reducing the amount of computation
The traditional Gaussian mixture model establishes the same fixed number of Gaussian functions for each pixel to fit the pixel value, standard deviation, mean value, weight and the like are required to be updated, and operations such as sequencing, updating and the like are also required to be performed on the Gaussian distribution of each pixel, so that the calculation amount is large, and the interaction between a person and a robot is not facilitated. The invention is mainly aimed at gesture videos comprising human hands and human faces at the same time, and the hands are only small areas in the images. In these gesture video images, only some areas often have moving objects passing through, and other areas are only backgrounds, especially hands are smaller moving objects in the images, while the traditional gaussian modeling method uses the same number of gaussian distribution fitting backgrounds, so that calculation redundancy is caused, and waste of system resources is caused.
To reduce the amount of computation, the invention introduces the idea of blocking, embodied in two ways. First, the image is divided into many identical small blocks, for example 2×2 blocks, and the average pixel value within each block replaces the pixel values in that block, which reduces the amount of computation. Second, in the self-built dynamic gesture library, each image contains a face and a hand; the hand occupies only a small area, and a large proportion of the image is background. The traditional Gaussian modeling method fits the background pixel values with the same fixed number of Gaussian distributions, which is highly redundant for backgrounds that change little over time. For example, when a person is far from the camera, the background is unchanged and the hand is a small area of the image; with traditional Gaussian modeling, regions that are consistently judged as background still occupy a large number of Gaussian components throughout the modeling process, wasting system resources. The blocking idea therefore also divides the image into two parts: the change region detected by the inter-frame difference method, and the region the inter-frame difference method judges to be background. For the change region, a background model is built with the improved Gaussian mixture model. For the region judged as background, only the parameters of the existing Gaussian distributions are updated: no new Gaussian distributions are added, and no judgement is made as to whether pixels in that region belong to the background.
In order to reduce the amount of computation, besides introducing the blocking idea, pixels are also divided into two types: single-mode pixel points and multi-mode pixel points. The invention assumes that a single-mode pixel can fit the pixel value using a gaussian distribution, while a multi-mode pixel can use a gaussian mixture model for background modeling. When the environment changes, the state of the pixel point may change, that is, the pixel point may be converted between a single mode and a multi-mode, so that the invention introduces a conversion mechanism between the single mode and the multi-mode.
It is assumed that each pixel in the initial background image can fit its pixel value with a single Gaussian distribution; therefore, the pixel value of the initial background image is taken as the mean of the single Gaussian component in the initial background model, the weight of that component is set to 1, and its variance is given a large value.
Assuming that the pixels of the change region can fit their pixel values using a plurality of gaussian distributions, a gaussian mixture model is built for these pixels and transformed using the three-level background update model proposed by the present invention.
2. Eliminating sudden global illumination changes
The background subtraction method based on the Gaussian mixture model is suited to moving-target detection under small changes in lighting and weather; it is sensitive to sudden changes in global brightness and copes poorly with abrupt scene changes.
The traditional Gaussian mixture model may misjudge large areas of background pixels as foreground under a sudden illumination change. To address this, the invention uses the inter-frame difference method to judge whether a large-area illumination change has occurred in the surrounding environment. If a sudden illumination change occurs, a frame-level update is performed and the change region is detected with the inter-frame difference method; if not, the motion region is detected with the improved Gaussian mixture model.
3. Construction of an initial background image
In order to extract a relatively clean initial background on the premise that a moving object exists, the invention constructs an initialization background by using a background image obtained by an inter-frame difference method, and provides a novel initial background extraction method.
4. In the background model updating process, a Gaussian degradation mechanism is added
After the background is matched and updated, checking whether redundant Gaussian components exist, and if so, removing the redundant Gaussian components to save the system overhead.
5. Updating parameters of all matching gaussian components
After the conventional gaussian mixture model is matched to a gaussian component, the matching is stopped, resulting in that the parameters of some gaussian components matched to the pixel are not updated properly. Therefore, the motion detection algorithm proposed by the present invention updates the parameters of all matched gaussian components, not just the earliest matched gaussian component.
6. To better accommodate the scene changes, an adaptive update rate is used.
The flow chart of the motion detection method based on the three-level background updating model is shown in fig. 2, and the specific calculation steps are shown in table 1. Wherein, the step of constructing the initial background by using the inter-frame difference method is shown in table 2, and the flow chart is shown in fig. 3.
Table 1 motion detection algorithm based on three-level background update model
Table 2 construction of initial background algorithm using interframe difference method
The motion detection method based on the three-level background model first uses the inter-frame difference method to preliminarily detect the change region and then divides the whole image into two parts: the change region and the static region. The change-region motion detection algorithm is shown in Table 3, with the flow chart in Fig. 4. Pixels in the static region, as judged by the inter-frame difference method, are considered to belong to the background; no further judgement of whether they belong to the background is made and no new Gaussian components are added, while the parameters of the Gaussian components with non-zero weight are updated according to formula (5) and the weights are normalized.
Table 3 change area motion detection algorithm
The inter-frame difference method is used to preliminarily detect the change region. To better adapt to illumination changes, an adaptive threshold T_h is applied after differencing adjacent frames.
Assume {f_k(x,y)}, k = 1, 2, ..., is the video sequence, where f_k(x,y) and f_{k-1}(x,y) denote the kth and (k-1)th frame images. Adjacent frames are differenced using the gray-image pixel values as follows:
M_k(x,y) = 1, if |f_k(x,y) − f_{k-1}(x,y)| > T_h; otherwise M_k(x,y) = 0 (1)
T_h = T_a + b · (1/N) · Σ_{(x,y)} |f_k(x,y) − f_{k-1}(x,y)| (2)
where T_a is an empirical threshold, b is an empirical coefficient, N is the total number of pixels in the image (i.e., the number of blocks after blocking), and M_k(x,y) is the binary image obtained after differencing. Because the ambient light may vary, the second term of T_h is deliberately added to accommodate environmental change: if the ambient light changes little, this term is small, even approaching zero; if the ambient light changes considerably, the term is correspondingly large. Adding this term to the threshold therefore strengthens the adaptability of the inter-frame difference method to illumination changes.
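The adaptive-threshold inter-frame difference can be sketched in Python as follows; T_a and b are illustrative values, not the patent's:

```python
import numpy as np

def frame_difference(f_k, f_km1, T_a=15.0, b=2.0):
    """Adaptive-threshold inter-frame difference (formulas (1) and (2)).

    The extra term b * mean(|diff|) grows with global illumination change, so
    a scene-wide brightness shift raises the threshold instead of flooding the
    foreground mask. T_a and b are illustrative, not trained values.
    """
    diff = np.abs(f_k.astype(np.float64) - f_km1.astype(np.float64))
    T_h = T_a + b * diff.mean()              # adaptive threshold, formula (2)
    return (diff > T_h).astype(np.uint8)     # binary change mask M_k, formula (1)
```

A global 10-gray-level brightness shift, for example, lifts T_h above the per-pixel differences and yields an empty mask, whereas a small moving patch on a static background is still detected.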
The matching and updating strategy based on the Gaussian mixture model is as follows:
when a certain frame of image is acquired, the pixel values belonging to the change area in the image are matched with the Gaussian components of the pixels. If a certain pixel value and a certain gaussian component of the pixel satisfy the formula (3), the pixel is considered to be matched with the gaussian component, i.e. the pixel is considered to be a background.
|f_k(x,y) − μ_{i,k-1}(x,y)| ≤ a · σ_{i,k-1}(x,y) (3)
where μ_{i,k-1}(x,y) and σ_{i,k-1}(x,y) are the mean and standard deviation of the ith Gaussian component of pixel (x,y) in the (k-1)th frame, and a is a matching constant, typically chosen as 2.5.
If a pixel successfully matches a Gaussian component of that pixel, the parameters of the Gaussian component are updated according to formula (4); for Gaussian components that the pixel does not match, the weight is updated according to formula (5).
Here α is the adaptive learning rate, d is an empirical coefficient, r_c is the ratio of the number of foreground pixels detected by the inter-frame difference method to the total number of pixels, and ρ is the parameter update rate; w_{i,k} and w_{i,k-1} are the weights of the ith Gaussian component in the kth and (k-1)th frames, μ_{i,k-1} and μ_{i,k} are the means of the ith Gaussian component in the (k-1)th and kth frames, and f_k is the pixel value of the pixel in the kth frame.
w_{i,k} = (1 − α) · w_{i,k-1} (5)
If a pixel value does not match any Gaussian component of that pixel (i.e., n_f = 0), check whether the weight of the last Gaussian component is 0 (the Gaussian components are kept sorted in descending order of the weight-to-variance ratio, so subsequent operations simply follow that order). If the weight of the last Gaussian component is not 0, then every preceding component also has non-zero weight, and only the last component needs to be updated: the current pixel value is assigned to its mean, and its weight and variance are set to the initial weight and initial variance respectively. If the weight of the last Gaussian component is 0, then starting from the first component, find the first component whose weight is 0 and update its parameters in the same way: assign the current pixel value to its mean and set its weight and variance to the initial weight and initial variance. Then normalize the weights of all Gaussian components and, for each component with non-zero weight, judge by formula (3) whether it matches the pixel; if any component matches, the pixel is considered a background pixel, and if none of the non-zero-weight components matches, the pixel is considered a foreground pixel.
Referring to fig. 1, the invention simultaneously uses skin color and motion information to detect hand regions, and the proposed dynamic hand detection method integrating a skin color model and a three-level background update model mainly comprises the following steps:
step 1: a video stream is input.
Step 2: preprocessing an input video stream, specifically: the input video stream is median filtered and the color image is converted into a gray scale map.
Step 3: in order to reduce the calculation amount, the gray image obtained in the step 2 is equally divided into a plurality of equal small blocks, and the small blocks are replaced by the small block average value. A small block is subsequently treated as a pixel. For example, an image is divided into 2×2 small blocks, which corresponds to a small block containing 4 pixels, and an average value of the 4 pixels is used as a pixel value of the 2×2 small block. Thus, the calculated amount can be reduced to 1/4 of the original calculated amount.
Step 4: the motion area is detected by using the three-level background updating model provided by the invention, and a motion detection result is obtained. The motion detection process comprises updating three hierarchical background models of pixel level, region level and frame level. See in particular the description of tables 1-3 and figures 2-4 above.
Step 5: and intercepting a rectangular area containing a change area in the color image, detecting a skin color area by using a CbCr-I channel Gaussian mixture model, and performing morphological operation (including closing operation and expansion operation) to obtain a skin color detection result.
Step 6: and acquiring an intersection of the skin color detection result and the movement detection result, thereby obtaining a skin color movement region.
Step 7: and (5) finding out the connected region where the skin color movement region obtained in the step (6) is located in the skin color detection result obtained in the step (5), and considering the region as a hand region, thereby achieving the purpose of removing the face and the skin color-like background.
The dynamic hand detection device corresponding to the method comprises a motion detection module and a skin color detection module. The motion detection module detects motion regions in the video captured by the camera; specifically, it uses the three-level background update model proposed by the invention. The skin color detection module detects skin color regions; specifically, it uses the CbCr-I channel Gaussian mixture model and performs morphological operations (closing and dilation).
To verify the correctness of the proposed motion detection and hand detection methods, a dynamic gesture library was built in-house and related experiments were carried out.
The dynamic gesture library contains gestures in two scenarios (conference room and laboratory). Each scenario includes eight gestures, and each gesture distinguishes between the left hand and the right hand. Each gesture of each hand was recorded at distances of 0.6 m, 1 m and 1.4 m between the user and the camera. Because the gestures are intended for interaction with a leg-arm hybrid multi-legged robot, they were captured with the notebook computer that controls the robot; the captured resolution is 640×480 pixels at a frame rate of 15 frames per second. Some samples are shown in fig. 5.
A. Motion detection
The motion detection experiment of the invention comprises two scenes: conference rooms and laboratories. The initial backgrounds constructed according to the initial background construction method provided by the invention are shown in fig. 6, and the conference room and laboratory scene motion detection results are shown in fig. 7 and 8 respectively. In fig. 7, (a) is an original image of a conference room scene, (b) is an inter-frame difference method result, (c) is a preliminarily detected change region, and (d) is a hand movement detection result. In fig. 8, (a) is an original image of a laboratory scene, (b) is an interframe difference method result, (c) is a preliminarily detected change region, and (d) is a hand movement detection result. The accuracy of the motion detection method according to the present invention can be seen from the detection results of fig. 7 and 8.
The motion detection method provided by the invention is compared with the inter-frame difference method and the background subtraction method based on a Gaussian mixture model in the conference room and laboratory scenes, with results shown in figs. 9 and 10 respectively. In fig. 9, (a) is the original image of the conference room scene, (b) is the motion detection result of the inter-frame difference method, (c) is the hand motion detection result of the proposed method, and (d) is the motion detection result of the background subtraction method based on a Gaussian mixture model. In fig. 10, (a) is the original image of the laboratory scene, (b) is the motion detection result of the inter-frame difference method, (c) is the hand motion detection result of the proposed method, and (d) is the motion detection result of the background subtraction method based on a Gaussian mixture model. As figs. 9 and 10 show, the inter-frame difference method can only detect the outline of a moving object, and "cavities" easily appear inside it, whereas the proposed method detects the hand motion region more completely.
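The inter-frame difference baseline compared above can be sketched as follows (a minimal illustration; the threshold value is an assumption). It also reproduces the "cavity" effect: a uniformly colored moving object only changes intensity at its leading and trailing edges, so its interior does not fire.

```python
import numpy as np

def frame_diff(prev, curr, T=25):
    """Binary change mask from two consecutive grayscale frames.
    Only pixels whose intensity changed by more than T fire, which is why
    a flat-intensity moving object leaves a hollow outline ('cavity')."""
    return (np.abs(curr.astype(int) - prev.astype(int)) > T).astype(np.uint8)

# A flat-intensity square moving right by 2 px: only the uncovered and newly
# covered columns change; the overlapping interior stays identical.
prev = np.zeros((10, 10), dtype=np.uint8); prev[2:8, 1:7] = 200
curr = np.zeros((10, 10), dtype=np.uint8); curr[2:8, 3:9] = 200
mask = frame_diff(prev, curr)
```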
Compared with the background subtraction method based on a Gaussian mixture model, the proposed method on the one hand greatly reduces the amount of computation. This is because most of the image belongs to the background, while the foreground occupies only a small part of it. The proposed method applies the improved Gaussian mixture model only to the change region to detect hand motion; for regions judged by the inter-frame difference method to be background, it only updates the existing Gaussian parameters, without adding new Gaussian components or testing whether the pixels belong to the foreground. For example, at initialization each pixel is initialized as a single Gaussian model; if certain pixels are consistently judged to be background by the inter-frame difference method, only their Gaussian parameters are updated, no Gaussian components are added, and no foreground test is performed, which reduces the amount of computation. On the other hand, for the change region, the improved Gaussian mixture model improves the detection effect to some extent.
B. Hand detection
The hand detection results in the conference room and laboratory scenarios are shown in figs. 11-14. In fig. 11, (a) is the original image of the conference room scene, (b) is the inter-frame difference result, (c) is the detected change region, (d) is the motion detection result, (e) is the skin color detection result within the change region, (f) is the skin color motion region, and (g) is the hand detection result. In fig. 12, (a)-(h) are hand detection results in the conference room scenario. In fig. 13, (a) is the original image of the laboratory scene, (b) is the inter-frame difference result, (c) is the detected change region, (d) is the motion detection result, (e) is the skin color detection result within the change region, (f) is the skin color motion region, and (g) is the hand detection result. In fig. 14, (a)-(h) are hand detection results in the laboratory scenario. The correctness of the hand detection method of the present invention can be seen from figs. 11-14.
Claims (6)
1. A dynamic hand detection method for fusing skin color and three-level background update model is characterized by comprising the following steps:
a. preprocessing each frame of image in a video stream, dividing the image into a plurality of equal small blocks, taking the pixel mean value of the small blocks as the pixel value of the small blocks, and taking each small block as one pixel;
b. detecting a motion region by using a three-level background updating model to obtain a motion detection result, wherein the motion detection result is specifically as follows:
b-1, constructing an initial background image by using an inter-frame difference method;
b-2, detecting a change region by using the inter-frame difference method, performing morphological operations, and calculating the ratio r_c of the number of foreground pixels to the total number of pixels in the image;
b-3, judging the relationship between r_c, rt_m and rt_c: if rt_m < r_c < rt_c, executing step b-5; otherwise executing step b-4; rt_m is an empirical threshold indicating the presence of a moving object in the image, and rt_c is an empirical threshold indicating a sudden illumination change;
b-4, updating the background image by using the current frame, and executing the step b-2 for the next frame image;
b-5, removing small connected regions whose area is smaller than a fixed proportion of the area of the largest connected region, and marking the change regions detected by the inter-frame difference method with rectangular boxes; at this point the whole image is divided into a change region and a static region;
b-6, for pixels in the change region, detecting the motion region using the improved Gaussian mixture model; for pixels in the static region, on the one hand treating them as background, and on the other hand updating the parameters of the Gaussian components with non-zero weight using formula (5);
w_{i,k} = (1 − α)·w_{i,k-1}    (5)
in formula (5), w_{i,k} is the weight of the i-th Gaussian component in frame k, w_{i,k-1} is the weight of the i-th Gaussian component in frame k−1, and α is the adaptive learning rate;
c. intercepting a rectangular area containing a change area in a corresponding color image in a video stream, detecting a skin color area by using a CbCr-I channel Gaussian mixture model, and performing morphological operation to obtain a skin color detection result;
d. acquiring the intersection of the skin color detection result obtained in step c and the motion detection result obtained in step b, thereby obtaining the skin color motion region;
e. finding, in the skin color detection result obtained in step c, the connected region in which the skin color motion region obtained in step d is located; this region is the hand region.
2. The method for dynamic hand detection with fusion of skin tone and three-level background update model according to claim 1, wherein the step b-6 uses an improved gaussian mixture model for detecting motion areas for pixels in the change area, comprising the following steps:
b-61, for a certain pixel in the change region, initializing n_f = 0, where n_f denotes the number of Gaussian components of the pixel that the pixel value matches; the pixel value matches the i-th Gaussian component of the pixel when the following formula (3) is satisfied:

|f_k(x, y) − μ_{i,k-1}(x, y)| ≤ a·σ_{i,k-1}(x, y)    (3)
in the formula (3), the amino acid sequence of the compound,and->Mean and standard deviation of the ith gaussian component of the k-1 frame pixel (x, y), respectively; a is a matching constant;
b-62, for each Gaussian component with non-zero weight, judging through formula (3) whether the pixel matches it; if it matches, updating the parameters of that Gaussian component using formula (4), with n_f incremented by 1 for every Gaussian component successfully matched; if it does not match, updating the parameters of that Gaussian component using formula (5);
in formula (4), d is an empirical coefficient, ρ is the parameter update rate, μ_{i,k-1} is the mean of the i-th Gaussian component in frame k−1, μ_{i,k} is the mean of the i-th Gaussian component in frame k, and f_k is the pixel value of the pixel in frame k;
b-63, sorting all Gaussian components according to the ratio of the weight to the variance from large to small;
b-64, if the weight of a certain Gaussian component is more than 0 and less than the initial weight and the variance is less than the initial variance, deleting the Gaussian component;
b-65, judging whether n_f is equal to 0; if yes, executing step b-66; if not, the pixel is a background pixel;
b-66, judging whether the weight of the last Gaussian component is 0, and if so, executing a step b-67; if not, assigning the pixel value to the mean value of the last Gaussian component, and respectively assigning the weight and variance of the last Gaussian component as an initial weight and an initial variance;
b-67, searching a Gaussian component with the weight of 0 appearing for the first time from the first Gaussian component, assigning the pixel value to the mean value of the Gaussian component, and respectively assigning the weight and variance of the Gaussian component as an initial weight and an initial variance;
b-68, normalizing the weight of each Gaussian component;
b-69, judging whether the Gaussian component with the weight of not 0 is matched with the pixel value according to the formula (3), and if so, judging that the pixel is a background pixel; if not, the pixel is a foreground pixel.
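Steps b-61 to b-69 can be sketched per pixel as follows (a minimal illustration in Python; α, ρ, a and the initial weight/variance are assumed values, and since formula (4) is not reproduced in the text, the matched-component update below is written in the standard mixture-of-Gaussians form rather than the patent's exact formula):

```python
import math

ALPHA, RHO, A = 0.01, 0.05, 2.5   # assumed learning rate, update rate, match constant
W0, VAR0 = 0.05, 30.0             # assumed initial weight and variance

def update_pixel(components, f):
    """One per-pixel step of steps b-61..b-69. `components` is a list of
    [weight, mean, variance], kept sorted by weight/variance ratio."""
    n_f = 0                                              # b-61
    for c in components:
        if c[0] == 0:
            continue
        w, mu, var = c
        if abs(f - mu) <= A * math.sqrt(var):            # formula (3): match test
            n_f += 1                                     # b-62: matched
            c[0] = (1 - ALPHA) * w + ALPHA               # standard MOG form of (4)
            c[1] = (1 - RHO) * mu + RHO * f
            c[2] = (1 - RHO) * var + RHO * (f - c[1]) ** 2
        else:
            c[0] = (1 - ALPHA) * w                       # formula (5): weight decay
    components.sort(key=lambda c: c[0] / c[2], reverse=True)   # b-63
    for c in components:                                 # b-64: prune weak components
        if 0 < c[0] < W0 and c[2] < VAR0:
            c[0] = 0.0
    if n_f > 0:                                          # b-65
        return components, True                          # background pixel
    # b-66 / b-67: re-initialize one component with the current pixel value
    if components[-1][0] != 0:
        idx = len(components) - 1
    else:
        idx = next(i for i, c in enumerate(components) if c[0] == 0)
    components[idx] = [W0, f, VAR0]
    total = sum(c[0] for c in components)                # b-68: normalize weights
    for c in components:
        c[0] /= total
    # b-69: re-test the non-zero components (as the text is written, the freshly
    # re-initialized component matches its own pixel value)
    bg = any(c[0] > 0 and abs(f - c[1]) <= A * math.sqrt(c[2]) for c in components)
    return components, bg
```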
3. The method for dynamic hand detection with fusion of skin tone and three-level background update model according to claim 2, wherein a in formula (3) is 2.5.
4. The method for dynamic hand detection with fusion of skin color and three-level background update model according to claim 1, wherein the specific steps of constructing the initial background image by the inter-frame difference method in step b-1 are as follows:
b-11, detecting the change region according to formula (1) by the inter-frame difference method, performing a morphological closing operation, and calculating the number N_c of pixels in the connected regions;
In the above two formulas, f_k(x, y) and f_{k-1}(x, y) respectively denote the pixel values of the k-th and (k−1)-th frame images, T_a is an empirical threshold, b is an empirical coefficient, N denotes the total number of pixels in the image, and M_k(x, y) is the binary image obtained after differencing;
b-12, judging whether N_c satisfies the condition of formula (2); if so, executing step b-13; if not, executing step b-14;
b-13, framing all connected regions with rectangular boxes and constructing the binarized background image bg_b from the current frame: the pixel values inside the rectangular regions are all assigned 0, and the pixel values outside the rectangular regions are all assigned 1;
b-14, in the binarized background image bg_b constructed from the current frame, all pixel values are assigned 0;
b-15, taking the intersection of the binarized background image bg_b constructed from the current frame and the current frame image, to obtain the background image bg_t constructed from the current frame;
b-16, accumulating the background images constructed from adjacent frames until every pixel in the image has contributed to a background at least 4 times, then calculating the average value of each pixel, thereby obtaining the initial background image.
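Steps b-11 to b-16 can be sketched as follows (a minimal illustration; the per-pixel change mask stands in for the rectangular-box masking of connected regions, and the threshold value and function name are assumptions):

```python
import numpy as np

def build_initial_background(frames, T=25, min_hits=4):
    """Accumulate backgrounds from consecutive frames: pixels flagged as
    changed by frame differencing are excluded (bg_b = 0 there), and the
    per-pixel average is taken once every pixel has contributed a
    background value at least `min_hits` times."""
    h, w = frames[0].shape
    acc = np.zeros((h, w), dtype=float)
    hits = np.zeros((h, w), dtype=int)
    for prev, curr in zip(frames, frames[1:]):
        changed = np.abs(curr.astype(int) - prev.astype(int)) > T
        bg_b = (~changed).astype(float)          # binarized background mask
        acc += bg_b * curr                       # bg_t: mask intersected with frame
        hits += bg_b.astype(int)
        if (hits >= min_hits).all():
            return acc / hits                    # per-pixel mean background
    return None                                  # not enough static evidence yet
```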
5. The method for dynamic hand detection with fusion of skin tone and three-level background update model according to claim 1, wherein the small blocks in step a are 2 x 2 areas.
6. A dynamic hand detection device integrating skin color and a three-level background updating model is characterized by comprising a motion detection module and a skin color detection module; the motion detection module is used for detecting a motion area in the video acquired by the camera, and specifically, the motion area is detected by using the three-level background updating model in claim 1; the skin color detection module is used for detecting skin color areas, specifically, detecting skin color areas by using a CbCr-I channel Gaussian mixture model, and performing morphological operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110966932.1A CN113822161B (en) | 2021-08-23 | 2021-08-23 | Dynamic hand detection method and device for fusion of skin color and three-level background updating model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822161A CN113822161A (en) | 2021-12-21 |
CN113822161B true CN113822161B (en) | 2023-07-25 |
Family
ID=78913458
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115294456A (en) * | 2022-08-23 | 2022-11-04 | 山东巍然智能科技有限公司 | Building lightening project detection method, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452005A (en) * | 2017-08-10 | 2017-12-08 | 中国矿业大学(北京) | A kind of moving target detecting method of jointing edge frame difference and gauss hybrid models |
CN107507221A (en) * | 2017-07-28 | 2017-12-22 | 天津大学 | With reference to frame difference method and the moving object detection and tracking method of mixed Gauss model |
WO2018058854A1 (en) * | 2016-09-30 | 2018-04-05 | 北京大学深圳研究生院 | Video background removal method |
WO2018130016A1 (en) * | 2017-01-10 | 2018-07-19 | 哈尔滨工业大学深圳研究生院 | Parking detection method and device based on monitoring video |
Non-Patent Citations (1)
Title |
---|
Research Progress on Robot Vision-Based Gesture Interaction Technology; Qi Jing; Xu Kun; Ding Xilun; Robot (Issue 04); full text *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
20231115 | TR01 | Transfer of patent right | Patentee after: Handan Maoxuan Building Materials Trading Co., Ltd., No. 12, Unit 3, Building 62, No. 18 Qianjin Street, Fuxing District, Handan City, Hebei Province, 056000. Patentee before: HEBEI University, No. 54 East 180 Road, Baoding, Hebei, 071002.