CN107180224B - Finger motion detection and positioning method based on space-time filtering and joint space Kmeans


Info

Publication number
CN107180224B
Authority
CN
China
Prior art keywords
space
background
kmeans
class
moving target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710231824.3A
Other languages
Chinese (zh)
Other versions
CN107180224A (en)
Inventor
韦岗
梁舒
马碧云
李增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710231824.3A priority Critical patent/CN107180224B/en
Publication of CN107180224A publication Critical patent/CN107180224A/en
Application granted granted Critical
Publication of CN107180224B publication Critical patent/CN107180224B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a finger motion detection and positioning method based on space-time filtering and joint space Kmeans. Ten labels of different colors (excluding black and white) are first attached to the fingers of a player, and a video of the player playing a keyboard instrument is shot. A finger moving target is then detected in each input video frame by space-time filtering, and the spatial filtering result is fed back to guide dynamic background updating. The finger moving target is positioned with joint space Kmeans, in which the cluster number and the initial class centers are determined adaptively from the statistical characteristics of the R, G, B histograms. The method thereby realizes fingering recognition and recording with low computational complexity, fast convergence, high positioning accuracy and good real-time performance.

Description

Finger motion detection and positioning method based on space-time filtering and joint space Kmeans
Technical Field
The invention relates to the technical fields of visual monitoring, digital image processing and the like, in particular to a finger motion detection and positioning method based on space-time filtering and joint space Kmeans.
Background
Proper fingering is critical for a piano (or other keyboard instrument) player to play flexibly and interpret music. Good fingering reflects the player's understanding of the style of a piece and the content of the work; it also saves energy and time and improves playing efficiency. Although playing fingering follows general rules, the fingering of different pieces is not fixed, which makes it harder for beginners to practice fingering and to imitate the fingering of accomplished musicians. Recording fingering manually not only demands considerable musical training but is also time-consuming and labor-intensive. Automatic, intelligent fingering recognition by machine is therefore a necessary trend for fingering research and learning.
The key to fingering recognition is the organic combination of moving target detection and moving target positioning.
Commonly used moving object detection methods include: background modeling, frame differencing, and optical flow.
1) Background modeling: a static scene without intruding objects is assumed to exhibit regular characteristics that can be described by a statistical model, for example a weighted mixture of component models. Once the background model is known, an intruding object can be detected by marking the portions of the scene image that do not conform to the model. Common background modeling methods include the single Gaussian model, the Gaussian mixture model, and kernel density estimation. Although these methods obtain relatively accurate moving target regions, the amount of computation is large, the speed is slow, and they are sensitive to illumination and background changes.
2) Frame differencing: motion regions in the image are extracted from the temporal differences between adjacent frames. The frame difference method is fast and stable, but when the fingers move slowly, the overlapping part of the moving target pixels in two adjacent frames cannot be detected.
3) Optical flow: motion detection is performed using the time-varying optical flow characteristics of a moving object. No background modeling is required, and an independent moving object can be detected even when no prior information about the scene is available. However, the computation is complex, special hardware is usually required, real-time requirements are difficult to meet, and motion boundaries, motion occlusion and multiple motions (including transparent and semi-transparent motion) remain bottlenecks of the optical flow method.
Moving target positioning methods, meanwhile, are typically based on edge detection. Edge detection replaces simplified positioning information with an accurate representation of the target contour, but it loses a large amount of information when the fingering is complicated or when the labels of two or more fingers overlap, and may even merge two moving targets into one. Because edge detection can only localize and cannot classify, fingers cannot be matched correctly to the detected contours. Edge detection is also strongly affected by the background and, lacking any filtering capability, may detect noise points that interfere with finger positioning.
In the application scenario of playing fingering recognition, existing moving target detection and positioning methods therefore suffer from various problems: sensitivity to illumination and background changes, missed detection of slow-moving targets, high computational cost, and positioning errors under label overlap and noise. The invention provides a finger motion detection and positioning method based on space-time filtering and joint space Kmeans, which realizes fingering identification by analyzing videos of players playing the piano (or another keyboard instrument). Moving target detection adopts space-time filtering, which overcomes the influence of illumination and background changes and effectively avoids missing slow-moving targets; moving target positioning uses joint space Kmeans, which fully exploits the statistical characteristics of the images for adaptive decisions and improves positioning and clustering accuracy.
Disclosure of Invention
The invention aims to overcome the defects of existing moving target detection and positioning methods when applied to the playing fingering recognition scene, and provides a finger motion detection and positioning method based on space-time filtering and joint space Kmeans.
To achieve this purpose, the finger motion detection and positioning method based on space-time filtering and joint space Kmeans comprises three modules: labeling and video shooting, moving object detection, and moving object positioning.
The labeling and video shooting module generates the video file processed by the subsequent modules: ten labels of different colors (excluding black and white) are first attached to the fingers of the player, and the playing process is shot as a video while the player plays the piano normally.
The moving object detection module is used for detecting the moving target and adopts a space-time filtering method. Spatial filtering is first applied to the input video frame to obtain an accurate moving target region. The spatial filtering result is then fed back to guide the spatial recombination of the temporal band-pass filtering result and the temporal low-pass filtering result at the foreground (finger movement) positions and the background positions, completing the dynamic background update; this overcomes the influences of illumination change, camera shake and background change, and effectively avoids missing the moving target when the fingers move slowly. The finger moving target detection result is then converted from RGB space to YCrCb and HSV space for band-pass filtering, skin color and shadow are removed, and the labels are extracted by foreground thresholding.
The specific implementation steps of moving object detection are shown in fig. 2.
Step 1: spatial filtering, comprising the steps of:
1.1 Search for the moving target region. The current input video frame and the background image are compared pixel by pixel in the spatial domain to find the moving target region.
1.2 Determine foreground and background. Pixels in the moving target region keep the values of the corresponding positions in the current input frame, and pixels in the background region are set to white ((255, 255, 255) in RGB space).
1.3 Feed back foreground and background. The foreground (moving target region) and background are fed back for the background update of the next frame, as illustrated by the sketch after this list.
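A minimal Python sketch of steps 1.1-1.3, assuming a summed absolute channel difference with threshold `thresh` as the pixel-wise comparison; the patent only specifies a pixel-by-pixel spatial-domain comparison against the background image, so the threshold value is an assumption.

```python
import numpy as np

def spatial_filter(frame, background, thresh=30):
    """Pixel-wise spatial filtering (steps 1.1-1.3): keep the input pixel
    wherever the frame deviates from the background (assumed moving target),
    paint everything else white. `thresh` is an assumed value."""
    # Per-pixel absolute difference, summed over the R, G, B channels
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32)).sum(axis=2)
    foreground_mask = diff > thresh                  # True where motion is assumed
    result = np.full_like(frame, 255)                # background -> white (255, 255, 255)
    result[foreground_mask] = frame[foreground_mask] # foreground keeps frame values
    return result, foreground_mask
```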
Step 2: dynamic background update, comprising the steps of:
2.1 Spatial filtering result feedback. The previous spatial filtering result is fed back to guide the dynamic background update. Judge whether the current input video frame is the 2nd frame: if so, the background is not updated and the first frame image is used directly as the background; otherwise, proceed to the next operation.
2.2 Spatial domain recombination. The temporal band-pass filtering result and the temporal low-pass filtering result are recombined in the spatial domain at the foreground (finger movement) positions and the background positions to complete the background update; a sketch follows.
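The recombination of step 2.2 might look as follows, assuming a running average with mixing rate `alpha` as the temporal low-pass filter and retention of the previous background at foreground positions; the patent does not give the filter coefficients, so both choices are assumptions.

```python
import numpy as np

def update_background(background, frame, foreground_mask, alpha=0.05):
    """Dynamic background update by spatial-domain recombination (step 2.2).

    Assumptions: the temporal low-pass is a running average with rate `alpha`;
    at foreground positions the previous background is kept so the moving
    fingers never leak into the model."""
    low_pass = ((1.0 - alpha) * background.astype(np.float32)
                + alpha * frame.astype(np.float32))  # temporal low-pass result
    new_bg = background.astype(np.float32)           # start from previous background
    bg_mask = ~foreground_mask
    new_bg[bg_mask] = low_pass[bg_mask]              # update only at background positions
    return new_bg.astype(np.uint8)
```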
Step 3: label extraction, comprising the steps of:
3.1 Remove skin color. Convert RGB space to YCrCb space and judge whether the coordinates (Cr, Cb) fall inside the skin color ellipse model. Any pixel inside the ellipse is set to white.
3.2 Remove shadow. Convert RGB space to HSV space and apply band-pass filtering to the V component histogram.
3.3 Judge the label. In HSV space, compute the foreground average threshold of the S component, and set to white those pixels of the extracted moving target whose S component is below the foreground average saturation threshold. A sketch of the three steps follows.
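Steps 3.1-3.3 could be sketched with OpenCV as below; the ellipse parameters (cr0, cb0, a, b), the V cut-off standing in for the V-histogram band-pass, and the saturation scale are illustrative assumptions, since the patent specifies only the colour spaces and the decision rules.

```python
import cv2
import numpy as np

def extract_labels(target_rgb, cr0=150.0, cb0=115.0, a=20.0, b=15.0,
                   v_low=60, s_scale=1.0):
    """Label extraction sketch (steps 3.1-3.3); all numeric constants are
    assumed, not taken from the patent."""
    img = target_rgb.copy()

    # 3.1 remove skin: whiten pixels whose (Cr, Cb) falls inside the ellipse
    ycrcb = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    skin = ((cr - cr0) / a) ** 2 + ((cb - cb0) / b) ** 2 <= 1.0
    img[skin] = 255

    # 3.2 remove shadow: whiten the darkest pixels (lowest V values)
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    img[hsv[..., 2] < v_low] = 255

    # 3.3 keep labels: whiten foreground pixels below the mean foreground saturation
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    fg = np.any(img != 255, axis=2)
    if fg.any():
        s_mean = hsv[..., 1][fg].mean()
        img[(hsv[..., 1] < s_scale * s_mean) & fg] = 255
    return img
```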
The moving target positioning module is used for positioning the moving target and adopts a joint space Kmeans method. Joint space Kmeans not only positions but also classifies, so that different fingers are matched correctly to label classes, and positioning errors caused by color overlap, fingering complexity and noise interference are effectively avoided. First, the peaks of the low-pass-filtered R, G, B component histograms are detected and the cluster number K is determined adaptively, making classification more accurate and intelligent. The clusters are then initialized adaptively from the histogram statistics, which avoids falling into local optima, accelerates iterative convergence, and improves the efficiency and accuracy of the algorithm. Joining the color space (R, G, B) and the geometric space (x, y) into a 5-dimensional Kmeans fully exploits the prior knowledge that pixels of the same color lie at similar positions, improving clustering and positioning accuracy. Random perturbation and simulated annealing of the cluster centers improve the stability of the algorithm while avoiding local optima. Finally, the clustering result is classified and positioned, and the position of the fingers on the keyboard is determined for each frame, thereby obtaining the player's fingering.
The specific implementation steps of moving target positioning are shown in fig. 3.
Step 1: joint spatial adaptive Kmeans comprising the steps of:
1.1 statistics R, G, B histogram characteristics. And (4) performing low-pass filtering on the three component histograms of the moving object detection result R, G, B, and judging the peak of the histogram in a self-adaptive manner.
1.2 adaptively determining the clustering number K. The maximum number of peaks of the R, G, B histogram is taken as the cluster number of the joint space Kmeans.
1.3 adaptive clustering initialization. The cluster center is initialized with the R, G, B histogram peak locations.
1.4 iterate until convergence. The following operations are repeated until convergence: (a) class centers for the K classes are calculated, respectively. The class center of the kth (K is more than or equal to 1 and less than or equal to K) is the mean vector of the 5-dimensional observation (R, G, B, x, y) vectors in the kth class. (b) Each observation is assigned to the class in which the closest class center is located (euclidean distance is used to define "closest").
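Assuming the moving target pixels are the non-white pixels of the detection result, steps 1.1-1.4 might be implemented as in this sketch; the smoothing window and peak-prominence settings stand in for the unspecified low-pass filter and peak decision.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import find_peaks

def adaptive_kmeans(target_rgb, max_iter=100):
    """Joint-space adaptive Kmeans sketch (steps 1.1-1.4)."""
    fg = np.any(target_rgb != 255, axis=2)        # non-white = moving target
    ys, xs = np.nonzero(fg)
    rgb = target_rgb[fg].astype(np.float64)
    obs = np.column_stack([rgb, xs, ys])          # 5-D (R, G, B, x, y) observations

    # 1.1 low-pass filter each channel histogram and detect its peaks
    peaks_by_channel = []
    for c in range(3):
        hist, _ = np.histogram(rgb[:, c], bins=256, range=(0, 256))
        smooth = uniform_filter1d(hist.astype(float), size=9)      # assumed window
        pk, _ = find_peaks(smooth, prominence=smooth.max() * 0.05) # assumed prominence
        peaks_by_channel.append(pk)

    # 1.2 K = maximum peak count among the R, G, B histograms
    ch = int(np.argmax([len(p) for p in peaks_by_channel]))
    peaks = peaks_by_channel[ch]
    K = len(peaks)

    # 1.3 initialise each class centre from the observation nearest a peak
    centers = np.array([obs[np.argmin(np.abs(rgb[:, ch] - p))] for p in peaks])

    # 1.4 Lloyd iterations until the centres are stable (Euclidean distance)
    for _ in range(max_iter):
        d = ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        new_centers = np.array([obs[assign == k].mean(axis=0)
                                if np.any(assign == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign, obs
```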
Step 2: random perturbation and simulated annealing, comprising the steps of:
2.1 Compute the 5-dimensional perturbation radius of each class. The distance from each class center to the farthest point of that class is taken as the perturbation radius r_k (a five-dimensional vector; K is the number of clusters).
2.2 Random perturbation. Take a random number random_0 between -1 and 1 and perturb the class centers by r_k * random_0. Take the perturbed class centers as new initial class centers and run joint space adaptive Kmeans again. Compute the difference ΔJ = J' - J between the new objective function J' and the current objective function J. If ΔJ < 0, accept the new solution as the current solution and update the perturbation radius. The objective function is the Kmeans objective function.
2.3 Simulated annealing. Modify the random number participating in the perturbation to random_0 * a^(-t), where a is the annealing rate (a > 1) and t is the number of annealing steps, then continue with the operations of 2.1 and 2.2. A sketch of this step follows.
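A sketch of steps 2.1-2.3, continuing from the previous sketch. Here `run_kmeans(obs, init_centers)` is a hypothetical helper that reruns the Lloyd iterations of step 1.4 from the given initial centres; the annealing rate `a`, the iteration cap, and the accept-only-on-improvement rule are assumptions consistent with the text.

```python
import numpy as np

def kmeans_objective(obs, centers, assign):
    """Kmeans objective J: sum of squared distances to the assigned centres."""
    return float(((obs - centers[assign]) ** 2).sum())

def perturb_and_anneal(obs, centers, assign, run_kmeans,
                       a=1.5, max_anneals=20, seed=0):
    """Random perturbation + simulated annealing sketch (steps 2.1-2.3)."""
    rng = np.random.default_rng(seed)
    J = kmeans_objective(obs, centers, assign)
    for t in range(max_anneals):
        # 2.1 per-class 5-D perturbation radius: farthest member deviation
        radius = np.array([np.abs(obs[assign == k] - centers[k]).max(axis=0)
                           if np.any(assign == k) else np.zeros(obs.shape[1])
                           for k in range(len(centers))])
        # 2.2 perturb by r_k * random_0, random_0 in [-1, 1],
        # damped by a**(-t) as annealing proceeds (2.3)
        random_0 = rng.uniform(-1.0, 1.0, size=centers.shape)
        new_centers, new_assign = run_kmeans(obs, centers + radius * random_0 * a ** (-t))
        J_new = kmeans_objective(obs, new_centers, new_assign)
        if J_new - J < 0:                        # dJ = J' - J < 0: accept new solution
            centers, assign, J = new_centers, new_assign, J_new
    return centers, assign
```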
Step 3: fingering identification, comprising the steps of:
3.1 Locate the moving targets. Determine the position of each finger on the keyboard in every video frame from the coordinates of the joint space adaptive Kmeans cluster centers, thereby obtaining the fingering.
3.2 Fingering output. The fingering of every video frame is stored uniformly in a csv file for subsequent fingering learning and research, for example as sketched below.
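Storing the result could be as simple as this sketch; the column layout is an assumption, as the patent only states that the fingering of each video frame is saved uniformly to csv.

```python
import csv

def save_fingering(rows, path="fingering.csv"):
    """Store the per-frame fingering uniformly in a csv file (step 3.2).

    `rows` is an iterable of (frame_index, finger_label, key_position)
    tuples; the column names are assumed, not from the patent."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "finger", "key"])
        writer.writerows(rows)
```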
Compared with the prior art, the invention has the following advantages and technical effects:
1) Compared with common moving target detection techniques, the method overcomes the influences of illumination change, camera shake and background change, effectively avoids background degradation and missed detection of the moving target during slow finger movement, and facilitates detection and extraction of the moving target.
2) The invention adopts spatial filtering in moving target detection, determining the moving target by comparing the input video frame with the background image pixel by pixel in the spatial domain, which yields a more accurate moving target region than common moving target detection techniques.
3) The invention uses joint space adaptive Kmeans in moving target positioning, combining positioning with adaptive clustering, so that different fingers are matched correctly to label classes and positioning errors caused by color overlap, fingering complexity and noise interference are effectively avoided. Joining the color space (R, G, B) and the geometric space (x, y) into a 5-dimensional Kmeans fully exploits the prior knowledge that pixels of the same color lie at similar positions, improving clustering and positioning accuracy.
4) In the joint space Kmeans positioning, the cluster number K is determined adaptively as the maximum number of peaks of the low-pass-filtered R, G, B component histograms, making classification more accurate and intelligent. Initializing the clusters adaptively from the histogram statistics avoids falling into local optima, accelerates iterative convergence, and improves the efficiency and accuracy of the algorithm.
In conclusion, the method overcomes the defects of existing moving target detection and positioning methods in the playing fingering recognition scene, is insensitive to illumination and background changes, has low computational complexity, fast convergence, high positioning accuracy and good real-time performance, and, with appropriate modification, can be widely applied to gesture recognition and other fields.
Drawings
FIG. 1 is a general flow chart of a finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans according to the present invention;
FIG. 2 is a flow chart of a moving object detection module according to the present invention;
FIG. 3 is a flow chart of the moving object locating module according to the present invention.
Detailed Description
The invention first attaches ten labels of different colors (excluding black and white) to the fingers of the player and shoots a video of the player playing the piano.
Moving object detection is then performed on the video. First, the finger motion region is determined from the input video frames by space-time filtering and the labels are extracted. Spatial filtering detects the moving target by comparing the input video frame with the background image pixel by pixel in the spatial domain, yielding an accurate moving target region. The spatial filtering result is then fed back to guide the dynamic background update, so that the updated background stays closest to the background of the next input frame; this effectively avoids background degradation and facilitates detection and extraction of the moving target. In YCrCb space, the projection of skin information onto the CrCb plane follows an approximately elliptical distribution, so skin pixels on the finger moving target are removed by judging whether the coordinates (Cr, Cb) fall inside the skin color ellipse model. In HSV space, the value V represents the brightness of a color: the darker the color, the smaller V. Shadows have the lowest brightness relative to the other parts of the finger moving target and can be removed by band-pass filtering the V component histogram. S represents the saturation of a color: the purer and more vivid the color, the higher S. The labels have the highest saturation relative to the other parts of the finger moving target and can be extracted by thresholding against the foreground average saturation.
Finally, motion positioning is performed on the video. The finger moving targets are classified and positioned with joint space Kmeans, realizing fingering identification and recording. Kmeans clusters around K centers in the space, assigning each object to its closest center; the cluster centers are updated iteratively, the error decreases continually, and the algorithm converges when the error no longer changes. In this moving target positioning task, the optimal solution lies near the R, G, B values of the 10 label colors. Therefore the clusters are initialized adaptively from the peaks of the low-pass-filtered R, G, B component histograms, so that the initial Kmeans centers are close to the optimal solution, which accelerates iterative convergence and improves algorithm efficiency; unlike randomly initialized Kmeans, which may converge to a local rather than global optimum, this adaptive initialization avoids local optima. Meanwhile, joining the color space (R, G, B) and the geometric space (x, y) into a 5-dimensional Kmeans fully exploits the prior knowledge that pixels of the same color lie at similar positions, improving clustering and positioning accuracy. The simulated annealing Kmeans algorithm is a heuristic, asymptotically convergent iterative algorithm that has been proven in theory to converge to the global optimum with probability 1. Random perturbation and simulated annealing of the cluster centers therefore improve the stability of the algorithm while avoiding local optima.
The invention organically combines machine learning, digital signal processing and related methods to realize finger motion detection and positioning based on space-time filtering and joint space Kmeans. The invention is described in further detail below with reference to the detailed description and the accompanying drawings, but the embodiments of the invention are not limited thereto.
Fig. 1 shows a specific embodiment of the invention, which mainly comprises three modules: labeling and video shooting, moving object detection, and moving object positioning. Ten labels of different colors (excluding black and white) are first attached to the fingers of the player, and a video of the player playing the piano is shot. A finger moving target region is then detected in the input video frames by space-time filtering and the labels are extracted; the finger moving targets are classified and positioned with joint space Kmeans, realizing fingering identification and recording.
The labeling and video shooting module generates the video file processed by the subsequent modules: ten labels of different colors (excluding black and white) are first attached to the fingers of the player, and the playing process is shot as a video while the player plays the piano normally.
The moving object detection module is used for detecting the moving target and adopts a space-time filtering method. Spatial filtering is first applied to the input video frame to obtain an accurate moving target region. The spatial filtering result is then fed back to guide the spatial recombination of the temporal band-pass filtering result and the temporal low-pass filtering result at the foreground (finger movement) positions and the background positions, completing the dynamic background update; this overcomes the influences of illumination change, camera shake and background change, and effectively avoids missing the moving target when the fingers move slowly. The finger moving target detection result is then converted from RGB space to YCrCb and HSV space for band-pass filtering, skin color and shadow are removed, and the labels are extracted by foreground thresholding.
The specific implementation steps of moving object detection are shown in fig. 2.
Step 1: spatial filtering, comprising the steps of:
1.1 Search for the moving target region. The current input video frame and the background image are compared pixel by pixel in the spatial domain to find the moving target region.
1.2 Determine foreground and background. Pixels in the moving target region keep the values of the corresponding positions in the current input frame, and pixels in the background region are set to white.
1.3 Feed back foreground and background. The foreground (moving target region) and background are fed back for the background update of the next frame.
Step 2: dynamic background update, comprising the steps of:
2.1 Spatial filtering result feedback. The previous spatial filtering result is fed back to guide the dynamic background update. Judge whether the current input video frame is the 2nd frame: if so, the background is not updated and the first frame image is used directly as the background; otherwise, proceed to the next operation.
2.2 Spatial domain recombination. The temporal band-pass filtering result and the temporal low-pass filtering result are recombined in the spatial domain at the foreground (finger movement) positions and the background positions to complete the background update.
Step 3: label extraction, comprising the steps of:
3.1 Remove skin color. Convert RGB space to YCrCb space and judge whether the coordinates (Cr, Cb) fall inside the skin color ellipse model. Any pixel inside the ellipse is set to white.
3.2 Remove shadow. Convert RGB space to HSV space and apply band-pass filtering to the V component histogram.
3.3 Judge the label. In HSV space, compute the foreground average threshold of the S component, and set to white those pixels of the extracted moving target whose S component is below the foreground average saturation threshold.
The moving target positioning module is used for positioning the moving target and adopts a joint space Kmeans method. Joint space Kmeans not only positions but also classifies, so that different fingers are matched correctly to label classes, and positioning errors caused by color overlap, fingering complexity and noise interference are effectively avoided. First, the peaks of the low-pass-filtered R, G, B component histograms are detected and the cluster number K is determined adaptively, making classification more accurate and intelligent. The clusters are then initialized adaptively from the histogram statistics, which avoids falling into local optima, accelerates iterative convergence, and improves the efficiency and accuracy of the algorithm. Joining the color space (R, G, B) and the geometric space (x, y) into a 5-dimensional Kmeans fully exploits the prior knowledge that pixels of the same color lie at similar positions, improving clustering and positioning accuracy. Random perturbation and simulated annealing of the cluster centers improve the stability of the algorithm while avoiding local optima. Finally, the clustering result is classified and positioned, and the position of the fingers on the keyboard is determined for each frame, thereby obtaining the player's fingering.
The specific implementation steps of moving target positioning are shown in fig. 3.
Step 1: joint spatial adaptive Kmeans comprising the steps of:
1.1 statistics R, G, B histogram characteristics. And (4) performing low-pass filtering on the three component histograms of the moving object detection result R, G, B, and judging the peak of the histogram in a self-adaptive manner.
1.2 adaptively determining the clustering number K. The maximum number of peaks in the R, G, B histogram is taken as the cluster number of the joint space Kmeans.
1.3 adaptive clustering initialization. The cluster center is initialized with the R, G, B histogram peak locations.
1.4 iterate until convergence. The following operations are repeated until convergence: (a) class centers for the K classes are calculated, respectively. The class center of the kth (K is more than or equal to 1 and less than or equal to K) is the mean vector of the 5-dimensional observation (R, G, B, x, y) vectors in the kth class. (b) Each observation is assigned to the class in which the closest class center is located.
Step 2: random perturbation and simulated annealing, comprising the steps of:
2.1 Compute the 5-dimensional perturbation radius of each class. The distance from each class center to the farthest point of that class is taken as the perturbation radius r_k (a five-dimensional vector; K is the number of clusters).
2.2 Random perturbation. Take a random number random_0 between -1 and 1 and perturb the class centers by r_k * random_0. Take the perturbed class centers as new initial class centers and run joint space adaptive Kmeans again. Compute the difference ΔJ = J' - J between the new objective function J' and the current objective function J. If ΔJ < 0, accept the new solution as the current solution and update the perturbation radius.
2.3 Simulated annealing. Modify the random number participating in the perturbation to random_0 * a^(-t), where a is the annealing rate (a > 1) and t is the number of annealing steps, then carry out the operations of 2.1 and 2.2 again.
Step 3: fingering identification, comprising the steps of:
3.1 Locate the moving targets. Determine the position of each finger on the keyboard in every video frame from the coordinates of the joint space adaptive Kmeans cluster centers, thereby obtaining the fingering.
3.2 Fingering output. The fingering of every video frame is stored uniformly in a csv file for subsequent fingering learning and research.
The invention can thus be realized with the described effects, and with appropriate modification the embodiments can be widely applied to gesture recognition and other fields.

Claims (1)

1. A finger motion detection and positioning method based on space-time filtering and joint space adaptive Kmeans, characterized in that the method is realized by a labeling and video shooting module, a moving object detection module and a moving object positioning module;
the labeling and video shooting module is used for generating the video file processed by the subsequent modules, and comprises: first attaching ten labels of different colors, except black and white, to the fingers of a player, and shooting the playing process as a video while the player plays the piano normally;
the moving object detection module is used for detecting a moving object, and the specific implementation steps comprise:
step 1, spatial filtering, comprising the following steps:
1.1 searching for the moving target region: comparing the current input video frame with the background image pixel by pixel in the spatial domain to find the moving target region;
1.2 determining foreground and background: keeping, for pixels in the moving target area, the values of the corresponding positions of the current input video frame, and setting pixels of the background area to white, wherein white is (255, 255, 255) in RGB space;
1.3 feeding back foreground and background: the foreground, namely the moving target area, and the background are fed back for the background update of the next frame;
step 2, dynamic background updating, comprising the following steps:
2.1 spatial filtering result feedback: the previous spatial filtering result is fed back to guide the dynamic background update, and whether the current input video frame is the 2nd frame image is judged; if the current input frame is the 2nd frame, the background is not updated and the first frame image is used directly as the background; if the current input frame is not the 2nd frame, the next operation is performed;
2.2 spatial domain recombination: performing spatial-domain recombination of the temporal band-pass filtering result and the temporal low-pass filtering result at the foreground, namely finger movement, positions and at the background positions, so as to complete the background update;
step 3: extracting the label, comprising the steps of:
3.1 removing skin color: converting RGB space to YCrCb space and judging whether the coordinates (Cr, Cb) fall inside the skin color ellipse model; if a pixel falls inside the skin color ellipse model, that pixel is set to white;
3.2 removing shadow: converting RGB space to HSV space and applying band-pass filtering to the V component histogram;
3.3 judging the label: in HSV space, computing the foreground average threshold of the S component, and setting to white those pixels of the extracted moving target whose S component is below the foreground average saturation threshold;
the moving target positioning module is used for positioning the moving target and is realized with joint space adaptive Kmeans, the specific implementation steps comprising:
step 1, joint space adaptive Kmeans, comprising the steps of:
1.1 computing R, G, B histogram statistics: performing low-pass filtering on the three component histograms (R, G, B) of the moving target detection result, and adaptively detecting the histogram peaks;
1.2 adaptively determining the cluster number K: taking the maximum number of peaks among the R, G, B histograms as the cluster number of the joint space adaptive Kmeans;
1.3 adaptive cluster initialization: initializing the cluster centers from the R, G, B histogram peak positions;
1.4 iterating until convergence: repeating the following operations until convergence: (a) respectively computing the class centers of the K classes, the center of the k-th class being the mean vector of the 5-dimensional observation vectors (R, G, B, x, y) in the k-th class, with 1 ≤ k ≤ K; (b) assigning each observation to the class whose center is closest, distance being measured by the Euclidean distance;
step 2: random perturbation and simulated annealing, comprising the steps of:
2.1 computing the 5-dimensional perturbation radius of each class: taking the distance from the class center of each class to the farthest point of that class as the perturbation radius r_k, r_k being a five-dimensional vector, K being the number of clusters;
2.2 random perturbation: taking a random number random_0 between -1 and 1 and perturbing the class centers by r_k * random_0; taking the perturbed class centers as new initial class centers, performing joint space adaptive Kmeans again, and computing the difference ΔJ = J' - J between the new objective function J' and the current objective function J; if ΔJ < 0, accepting the new solution as the current solution, updating the perturbation radius, and entering step 3; otherwise entering step 2.3;
2.3 simulated annealing: modifying the random number participating in the perturbation to random_0 * a^(-t), where a is the annealing rate, a > 1, and t is the number of annealing steps, and continuing with the operations of 2.1 and 2.2;
step 3: fingering identification, comprising the steps of:
3.1 locating the moving target: determining the position of each finger on the keyboard in every video frame from the coordinates of the joint space adaptive Kmeans cluster centers, so as to obtain the fingering;
3.2 fingering output: storing the fingering of every video frame uniformly in a csv file for subsequent fingering learning and research.
CN201710231824.3A 2017-04-10 2017-04-10 Finger motion detection and positioning method based on space-time filtering and joint space Kmeans Expired - Fee Related CN107180224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710231824.3A CN107180224B (en) 2017-04-10 2017-04-10 Finger motion detection and positioning method based on space-time filtering and joint space Kmeans

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710231824.3A CN107180224B (en) 2017-04-10 2017-04-10 Finger motion detection and positioning method based on space-time filtering and joint space Kmeans

Publications (2)

Publication Number Publication Date
CN107180224A CN107180224A (en) 2017-09-19
CN107180224B 2020-06-19

Family

ID=59830915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710231824.3A Expired - Fee Related CN107180224B (en) 2017-04-10 2017-04-10 Finger motion detection and positioning method based on space-time filtering and joint space Kmeans

Country Status (1)

Country Link
CN (1) CN107180224B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201909139TA (en) 2017-12-22 2019-10-30 Beijing Sensetime Technology Development Co Ltd Methods and apparatuses for recognizing dynamic gesture, and control methods and apparatuses using gesture interaction
CN109960980B (en) * 2017-12-22 2022-03-15 北京市商汤科技开发有限公司 Dynamic gesture recognition method and device
CN109063781B (en) * 2018-08-14 2021-12-03 浙江理工大学 Design method of fuzzy image fabric imitating natural color function and form
CN109451634B (en) * 2018-10-19 2020-11-03 厦门理工学院 Gesture-based electric lamp control method and intelligent electric lamp system thereof
CN111105398A (en) * 2019-12-19 2020-05-05 昆明能讯科技有限责任公司 Transmission line component crack detection method based on visible light image data
WO2022052941A1 (en) * 2020-09-09 2022-03-17 桂林智神信息技术股份有限公司 Intelligent identification method and system for giving assistance with piano teaching, and intelligent piano training method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102368290A (en) * 2011-09-02 2012-03-07 华南理工大学 Hand gesture identification method based on finger advanced characteristic
CN105335711A (en) * 2015-10-22 2016-02-17 华南理工大学 Fingertip detection method in complex environment

Also Published As

Publication number Publication date
CN107180224A (en) 2017-09-19


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200619