CN107180224A - Finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans

Info

Publication number: CN107180224A
Application number: CN201710231824.3A
Authority: CN
Inventors: 韦岗; 梁舒; 马碧云; 李增
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2017-04-10
Filing date: 2017-04-10
Publication date: 2017-09-19
Anticipated expiration: 2037-04-10
Also published as: CN107180224B

Abstract

The invention discloses a finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans. Labels in ten different colors (excluding black and white) are first attached to the player's fingers, and a video of the fingers playing a keyboard instrument is recorded. For each input video frame, the moving finger targets are detected by spatio-temporal filtering, with the spatial filtering result fed back to guide the dynamic background update. The moving finger targets are then localized with joint space Kmeans, in which the number of clusters and the initial cluster centers are determined adaptively from the statistics of the R, G and B histograms. The method thereby provides fingering recognition and recording with low computational complexity, fast convergence, high localization accuracy and good real-time performance.

Description

Finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans

Technical field

The invention relates to technical fields such as visual monitoring and digital image processing, and in particular to a finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans.

Background technology

Correct fingering is essential for a piano (or other keyboard instrument) player, both for fluent performance and for interpreting the music. Good fingering reflects the player's understanding and interpretation of the composer's style and of the content of the work, and it also saves time and effort and makes playing more efficient. Although fingering follows general rules, it varies from piece to piece, which makes fingering practice hard for beginners and makes a virtuoso's fingering hard to imitate. Recording fingering by hand requires considerable musical training and is time-consuming and laborious. Automatic, intelligent machine recognition of fingering has therefore become an inevitable trend in fingering research and study.

The key to fingering recognition is the combination of moving-target detection and moving-target localization.

Conventional moving-target detection methods include background modeling, frame differencing and optical flow.

1) Background modeling: A static scene without intruding objects is assumed to have certain regular properties, and the background is modeled by a statistical model or a weighted mixture of statistical models. Once the background model is known, intruding objects can be detected by marking the parts of the scene image that do not fit it. Common background modeling methods include the single Gaussian model, the Gaussian mixture model and kernel density estimation. These methods yield fairly accurate moving-target regions, but they are computationally expensive, relatively slow, and sensitive to changes in illumination and background.

2) Frame differencing: Moving regions are extracted from the temporal difference between adjacent frames. Frame differencing is fast and fairly stable, but when a finger moves slowly the target pixels in two adjacent frames nearly coincide, and the overlapping part cannot be detected.

3) Optical flow: Motion is detected from the time-varying optical flow of the moving target. No background model is needed, and independently moving objects can be detected even when no information about the scene is available in advance. However, the computation is complex and requires special hardware, so real-time requirements are hard to meet. Motion boundaries, motion occlusion and multiple motions (including transparent and translucent motion) are further bottlenecks of optical flow.

Moving-target localization, meanwhile, is usually based on edge detection. Edge detection replaces simplified position information with an accurate target contour, but when the fingering is complex or the labels of two or more fingers overlap, it loses a great deal of information and may even treat two moving targets as one. Because edge detection can only localize and cannot classify, the fingers cannot be matched correctly to the detected contours. Edge detection is also strongly affected by the background and has no filtering function, so noise points are detected as well and interfere with finger localization.

In the fingering-recognition scenario, the above detection and localization methods therefore suffer from several problems, for example missed detection of slow-moving targets, incorrect matching of fingers to positions because classification is difficult, and severe noise interference. To address them, the invention proposes a finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans, which recognizes fingering by analyzing a video of the player's fingers playing the piano (or another keyboard instrument). Moving-target detection uses spatio-temporal filtering, which overcomes the effects of illumination and background changes and effectively avoids missed detection of slow-moving targets. Moving-target localization uses joint space Kmeans, which makes full use of the statistical properties of the image for adaptive decisions and improves the accuracy of localization and clustering.

Summary of the invention

The purpose of the method is to overcome the shortcomings of existing moving-target detection and localization methods when applied to fingering recognition, by proposing a finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans.

To achieve this purpose, the finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans consists of three modules: labeling and video capture, moving-target detection, and moving-target localization.

The labeling and video capture module generates the video file processed by the subsequent modules. Labels in ten different colors (excluding black and white) are first attached to the player's fingers, and the playing is then recorded on video while the player plays the piano normally.

The moving-target detection module detects moving targets by spatio-temporal filtering. Spatial filtering is first applied to the input video frame to obtain an accurate moving-target region. The spatial filtering result is then fed back to guide the spatial recombination of the temporal band-pass filtering result and the temporal low-pass filtering result at the foreground (finger motion) positions and the background positions, which completes the dynamic background update. This overcomes the effects of illumination changes, camera shake and background changes, and effectively avoids missed detection when the fingers move slowly. The finger-motion detection result is converted from the RGB color space to the YCrCb and HSV color spaces and band-pass filtered to remove skin color and shadow, and the labels are extracted by a foreground threshold decision.

The specific implementation steps of moving-target detection are shown in Fig. 2.

Step 1: Spatial filtering, comprising the following steps:

1.1 Search for the moving-target region. The current input video frame is compared with the background image point by point, pixel by pixel, in the spatial domain to find the moving-target region.

1.2 Determine foreground and background. Pixels in the moving-target region are set to the pixels at the corresponding positions of the current input frame; pixels in the background region are set to white (in RGB space, white is (255, 255, 255)).

1.3 Feed back foreground and background. The foreground (moving-target region) and the background are fed back for the background update of the next frame.
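As an illustration of steps 1.1 and 1.2, the following minimal sketch compares a frame with the background pixel by pixel and paints non-moving pixels white. The text does not state how two pixels are compared, so the metric (largest absolute channel difference) and the `threshold` value are assumptions, and the function name `spatial_filter` is hypothetical.

```python
WHITE = (255, 255, 255)  # background marker from step 1.2


def spatial_filter(frame, background, threshold=30):
    """Compare frame and background pixel by pixel.

    frame, background: lists of rows of (R, G, B) tuples of equal size.
    threshold: hypothetical per-channel difference above which a pixel
    counts as moving (the source gives neither a value nor a metric).
    Returns (foreground_image, mask).
    """
    foreground, mask = [], []
    for frame_row, bg_row in zip(frame, background):
        fg_row, mask_row = [], []
        for pixel, bg_pixel in zip(frame_row, bg_row):
            # Assumed metric: largest absolute channel difference.
            diff = max(abs(p - b) for p, b in zip(pixel, bg_pixel))
            moving = diff > threshold
            # Step 1.2: keep the frame pixel in the moving region,
            # paint everything else white.
            fg_row.append(pixel if moving else WHITE)
            mask_row.append(moving)
        foreground.append(fg_row)
        mask.append(mask_row)
    return foreground, mask
```

The returned mask is what step 1.3 would feed back to the background update for the next frame.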

Step 2: Dynamic background update, comprising the following steps:

2.1 Feed back the spatial filtering result. The previous spatial filtering result is fed back to guide the dynamic background update. Determine whether the current input video frame is the 2nd frame. If it is, the background is not updated and the first frame is used directly as the background; otherwise, proceed to the next step.

2.2 Spatial recombination. The temporal band-pass filtering result and the temporal low-pass filtering result are recombined spatially at the foreground (finger motion) positions and the background positions to complete the background update.
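The text does not specify the temporal filters used in step 2.2, so the sketch below is only a simplified stand-in for mask-guided recombination, not the patented filter pair. At background positions it applies an exponential running average as the temporal low-pass filter, and at foreground positions it simply keeps the previous background value in place of the unspecified band-pass branch. The name `update_background` and the parameter `alpha` are hypothetical.

```python
def update_background(background, frame, fg_mask, frame_index, alpha=0.1):
    """Mask-guided background update (simplified stand-in for step 2).

    background, frame: lists of rows of (R, G, B) tuples.
    fg_mask: rows of booleans fed back from the previous spatial filtering.
    frame_index: 1-based index of the current input frame.
    alpha: hypothetical smoothing factor of the running average.
    """
    if frame_index == 2:
        # Step 2.1: for the 2nd frame the first frame is used unchanged.
        return [list(row) for row in background]
    new_background = []
    for bg_row, frame_row, mask_row in zip(background, frame, fg_mask):
        row = []
        for bg_pixel, pixel, moving in zip(bg_row, frame_row, mask_row):
            if moving:
                # Assumption: foreground positions keep the old background.
                row.append(bg_pixel)
            else:
                # Assumed low-pass branch: exponential running average.
                row.append(tuple((1 - alpha) * b + alpha * p
                                 for b, p in zip(bg_pixel, pixel)))
        new_background.append(row)
    return new_background
```

Keeping moving pixels out of the average is one simple way to stop slow-moving fingers from being absorbed into the background, which is the degradation the feedback is meant to prevent.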

Step 3: Label extraction, comprising the following steps:

3.1 Remove skin color. Convert from RGB space to YCrCb space and determine whether the coordinates (Cr, Cb) fall within the elliptical skin-color model. If a pixel falls within the model, set it to white.

3.2 Remove shadow. Convert from RGB space to HSV space and apply band-pass filtering to the histogram of the V component.

3.3 Identify labels. In HSV space, compute the foreground mean threshold of the S component, and set to white those pixels of the extracted moving target whose S component is below this foreground saturation mean threshold.
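Step 3.3 can be sketched with the standard-library `colorsys` module. The sketch assumes that white pixels mark the background (as set in step 1.2) and are excluded from the mean, and that the threshold is the plain arithmetic mean of S over the remaining pixels; the helper names are hypothetical.

```python
import colorsys

WHITE = (255, 255, 255)


def saturation(pixel):
    """Return the HSV saturation (0..1) of an 8-bit (R, G, B) pixel."""
    r, g, b = (channel / 255.0 for channel in pixel)
    return colorsys.rgb_to_hsv(r, g, b)[1]


def extract_labels(image):
    """Set foreground pixels with below-mean saturation to white."""
    foreground = [p for row in image for p in row if p != WHITE]
    if not foreground:
        return [list(row) for row in image]
    # Foreground saturation mean threshold (assumed: arithmetic mean).
    mean_s = sum(saturation(p) for p in foreground) / len(foreground)
    return [[p if p != WHITE and saturation(p) >= mean_s else WHITE
             for p in row]
            for row in image]
```

A strongly colored label pixel such as pure red has saturation 1.0 and survives, while a pale, skin-like pixel falls below the mean and is painted white.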

The moving-target localization module localizes the moving targets using joint space Kmeans. Joint space Kmeans can classify as well as localize, so different fingers are matched correctly to their labels and localization errors caused by overlapping colors, complex fingering and noise points are effectively avoided. First, the peaks of the low-pass-filtered histograms of the R, G and B components are identified and the number of clusters K is determined adaptively, which makes the classification more accurate and intelligent. Next, the clusters are initialized adaptively from the statistical properties of the histograms, which helps avoid local optima, speeds up convergence of the iteration and improves the efficiency and accuracy of the algorithm. Performing 5-dimensional Kmeans jointly over the color space (R, G, B) and the geometric space (x, y) makes full use of the prior knowledge that pixels of the same color lie close together, improving the accuracy of clustering and localization. The cluster centers are further subjected to random perturbation and simulated annealing, which improves the stability of the algorithm while avoiding local optima as far as possible. Finally, the clustering result is classified and localized to determine the position of each finger on the keyboard in every frame, which yields the player's fingering.

The specific implementation steps of moving-target localization are shown in Fig. 3.

Step 1: Joint space adaptive Kmeans, comprising the following steps:

1.1 Compute the R, G, B histogram statistics. The histograms of the R, G and B components of the moving-target detection result are low-pass filtered and their peaks are identified adaptively.

1.2 Adaptively determine the number of clusters K. The largest peak count among the R, G and B histograms is taken as the number of clusters for joint space Kmeans.

1.3 Adaptive cluster initialization. The cluster centers are initialized from the peak positions of the R, G and B histograms.

1.4 Iterate until convergence. Repeat the following operations until convergence: (a) Compute the centers of the K classes. The center of the k-th class (1 ≤ k ≤ K) is the mean vector of the 5-dimensional observation vectors (R, G, B, x, y) in that class. (b) Assign each observation to the class whose center is nearest, where "nearest" is defined by Euclidean distance.
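The iteration in step 1.4 can be sketched as follows, taking the initial centers (for example from step 1.3) as an argument. Because initial centers are given, the loop assigns first and then recomputes the means, which is the same alternation as (a) and (b). The source does not say whether color and position are rescaled before the distance is computed, so the sketch uses the raw 5-dimensional vectors; the function name and `max_iter` are hypothetical.

```python
import math


def kmeans_5d(points, centers, max_iter=100):
    """Cluster 5-D observations (R, G, B, x, y) around given initial centers.

    Returns (centers, labels), where labels[i] is the class of points[i].
    """
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # (b) Assign each observation to the nearest center (Euclidean).
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # (a) Recompute each class center as the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty class keeps its previous center
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels
```

The last two components of each returned center give the (x, y) position of a label, which is what the later localization step relies on.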

Step 2: Random perturbation and simulated annealing, comprising the following steps:

2.1 Compute the 5-dimensional perturbation radius of each class. For each class, the distance from the class center to the farthest point of that class is taken as the perturbation radius r_K (a five-dimensional vector; K is the number of clusters).

2.2 Random perturbation. Draw a random number random₀ between -1 and 1 and perturb the class center by r_K*random₀. Using the perturbed class centers as the new initial class centers, run joint space adaptive Kmeans again. Compute the difference between the new objective function and the current objective function, ΔJ = J' - J. If ΔJ < 0, accept the new solution as the current solution and update the perturbation radius. The objective function is the Kmeans objective function.

2.3 Simulated annealing. The random number used in the perturbation is changed to random₀*a^-t, where a is the annealing speed, a > 1, and t is the number of annealing iterations; then continue with steps 2.1 and 2.2.
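A minimal sketch of steps 2.1 to 2.3 follows; a compact Kmeans routine is included so that it runs on its own. Several details are assumptions rather than statements from the text: the 5-dimensional radius is taken as the per-dimension absolute offset from the center to the farthest member, a fresh random number is drawn in each round and scaled by a^-t, the loop runs for a fixed number of `rounds`, and the objective J is the usual Kmeans sum of squared distances.

```python
import math
import random


def kmeans(points, centers, max_iter=100):
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def objective(points, centers, labels):
    """Kmeans objective J: sum of squared distances to assigned centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def perturbation_radius(points, center, labels, k):
    """Step 2.1 (assumed form): per-dimension offset to the farthest member."""
    members = [p for p, lab in zip(points, labels) if lab == k]
    if not members:
        return (0.0,) * len(center)
    farthest = max(members, key=lambda p: math.dist(p, center))
    return tuple(abs(f - c) for f, c in zip(farthest, center))


def anneal(points, init_centers, a=1.5, rounds=10, seed=0):
    rng = random.Random(seed)
    centers, labels = kmeans(points, init_centers)
    current_j = objective(points, centers, labels)
    for t in range(rounds):
        trial = []
        for k, center in enumerate(centers):
            radius = perturbation_radius(points, center, labels, k)
            # Steps 2.2 and 2.3: random number in [-1, 1] damped by a^-t.
            step = rng.uniform(-1.0, 1.0) * a ** (-t)
            trial.append(tuple(c + r * step for c, r in zip(center, radius)))
        new_centers, new_labels = kmeans(points, trial)
        new_j = objective(points, new_centers, new_labels)
        if new_j - current_j < 0:  # accept only if delta J < 0
            centers, labels, current_j = new_centers, new_labels, new_j
    return centers, labels, current_j
```

Because a new solution is accepted only when ΔJ < 0, the returned objective can never exceed that of the unperturbed Kmeans run.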

Step 3: Fingering recognition, comprising the following steps:

3.1 Moving-target localization. From the coordinates of the joint space adaptive Kmeans cluster centers, determine the position of each finger on the keyboard in every video frame, thereby obtaining the fingering.

3.2 Fingering output. The fingering of every video frame is stored together in a CSV file for subsequent fingering study and research.
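Step 3.2 only states that the per-frame fingering is stored in CSV. The sketch below shows one possible layout using Python's `csv` module; the column names (`frame`, `finger`, `x`, `y`) and the finger identifiers are hypothetical and not taken from the source.

```python
import csv
import io


def write_fingering(rows, stream):
    """Write (frame, finger, x, y) records as CSV to a text stream.

    rows: iterable of tuples; x and y are cluster-center coordinates.
    """
    writer = csv.writer(stream)
    writer.writerow(["frame", "finger", "x", "y"])  # hypothetical header
    for frame, finger, x, y in rows:
        writer.writerow([frame, finger, round(x, 1), round(y, 1)])


# Usage with an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_fingering([(1, "R1", 101.04, 51.0)], buffer)
```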

Compared with the prior art, the invention has the following advantages and technical effects:

1) In moving-target detection, the invention feeds the spatial filtering result back to guide the dynamic background update, so that the updated background is as close as possible to the background of the frame input to the spatial filter. Compared with conventional moving-target detection techniques, this overcomes the effects of illumination changes, camera shake and background changes, effectively prevents background degradation and missed detection when the fingers move slowly, and facilitates the detection and extraction of moving targets.

2) In moving-target detection, the invention uses spatial filtering: the moving target is determined by a point-by-point, pixel-by-pixel spatial comparison of the input video frame with the background image, which yields a more accurate moving-target region than conventional moving-target detection techniques.

3) In moving-target localization, the invention uses joint space adaptive Kmeans, which combines localization with adaptive clustering, so that different fingers are matched correctly to their labels and localization errors caused by overlapping colors, complex fingering and noise points are effectively avoided. 5-dimensional Kmeans over the joint color space (R, G, B) and geometric space (x, y) makes full use of the prior knowledge that pixels of the same color lie close together, improving the accuracy of clustering and localization.

4) In the joint space Kmeans method used for moving-target localization, the number of clusters K is determined adaptively as the largest peak count among the low-pass-filtered histograms of the R, G and B components, which makes the classification more accurate and intelligent. Initializing the clusters adaptively from the statistical properties of the histograms helps avoid local optima, speeds up convergence of the iteration and improves the efficiency and accuracy of the algorithm.

In summary, the invention overcomes the shortcomings of existing moving-target detection and localization methods when applied to fingering recognition. Its advantages include insensitivity to illumination and background changes, low computational complexity, fast convergence, high localization accuracy and good real-time performance; with suitable modification it can also be widely applied to gesture recognition and other fields.

Brief description of the drawings

Fig. 1 is the overall flow chart of the finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans according to the invention;

Fig. 2 is the flow chart of the moving-target detection module of the invention;

Fig. 3 is the flow chart of the moving-target localization module of the invention.

Embodiment

The invention first attaches labels in ten different colors (excluding black and white) to the player's fingers and records a video of the player's fingers playing the piano.

The video then undergoes moving-target detection. For each input video frame, the finger motion region is determined by spatio-temporal filtering and the labels are extracted. Spatial filtering detects the moving target by comparing the input frame with the background image point by point, pixel by pixel, which yields a more accurate moving-target region. The spatial filtering result is then fed back to guide the dynamic background update, so that the updated background is as close as possible to the background of the frame input to the spatial filter; this effectively prevents background degradation and facilitates detection and extraction of the moving target. In YCrCb space, the projection of skin color onto the two-dimensional CrCb plane is approximately elliptically distributed, so skin pixels in the finger-motion target are removed by testing whether the coordinates (Cr, Cb) fall within the elliptical skin-color model. Shadow pixels are inevitably included when the moving target is extracted. In HSV space, the value V represents brightness: the darker the pixel, the smaller V. Relative to the rest of the finger-motion target, shadow has the lowest brightness, so it can be removed by band-pass filtering the histogram of the V component. S represents color saturation: the deeper and more vivid the color, the higher the saturation. Relative to the rest of the finger-motion target, the labels have the highest saturation, so they can be extracted by a decision against the foreground saturation mean threshold.
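The elliptical skin test described above can be sketched as follows. The RGB to YCrCb conversion uses the widely used BT.601-style constants (the same ones documented for OpenCV's RGB to YCrCb conversion). The source gives no ellipse parameters, so `center`, `axes` and `angle` must be supplied by the caller and the values shown in the usage lines are placeholders, not calibrated skin statistics.

```python
import math


def rgb_to_ycrcb(pixel):
    """Convert an 8-bit (R, G, B) pixel to (Y, Cr, Cb)."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 0.713 * (r - y) + 128
    cb = 0.564 * (b - y) + 128
    return y, cr, cb


def in_skin_ellipse(cr, cb, center, axes, angle):
    """Return True if (Cr, Cb) lies inside a rotated ellipse.

    center: (Cr0, Cb0) of the ellipse; axes: (a, b) semi-axis lengths;
    angle: rotation in radians. All three are caller-supplied assumptions.
    """
    dx, dy = cr - center[0], cb - center[1]
    # Rotate the offset into the ellipse's own axes.
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    return (u / axes[0]) ** 2 + (v / axes[1]) ** 2 <= 1.0


# Usage with placeholder ellipse parameters (not from the source).
_, cr, cb = rgb_to_ycrcb((220, 170, 150))
is_skin = in_skin_ellipse(cr, cb, center=(150.0, 110.0),
                          axes=(20.0, 10.0), angle=0.0)
```

Pixels for which the test returns True would be set to white, as in step 3.1.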

Finally, the video undergoes moving-target localization. The finger-motion targets are classified and localized with joint space Kmeans, which provides fingering recognition and recording. Kmeans clusters around K points in the space, assigning each object to the nearest center. The cluster centers are updated iteratively so that the error decreases steadily; when the error no longer changes, the algorithm has converged to an optimal solution. In moving-target localization, the optimal solution lies near the R, G, B values of the ten label colors. The peaks of the low-pass-filtered histograms of the R, G and B components are therefore used to initialize the clusters adaptively, which places the initial Kmeans centers closer to the optimal solution, speeds up convergence and improves the efficiency of the algorithm. Unlike randomly initialized Kmeans, which may reach a local rather than a global optimum, adaptive initialization helps avoid local optima. At the same time, 5-dimensional Kmeans over the joint color space (R, G, B) and geometric space (x, y) makes full use of the prior knowledge that pixels of the same color lie close together, improving the accuracy of clustering and localization. Simulated-annealing Kmeans is a heuristic iterative algorithm with asymptotic convergence; it has been proven in theory to converge to the global optimum with probability 1. The cluster centers are therefore subjected to random perturbation and simulated annealing, which improves the stability of the algorithm while avoiding local optima.
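The adaptive choice of K from histogram peaks can be sketched as below. The text does not name the low-pass filter, so a moving average of hypothetical half-width 2 is assumed, and a peak is taken to be a run of equal values strictly higher than both neighbors. How per-channel peak positions are combined into 5-dimensional initial centers is not specified in the source and is left out of the sketch.

```python
def channel_histogram(pixels, channel, bins=256):
    """Count 8-bit values of one channel over foreground pixels."""
    hist = [0] * bins
    for pixel in pixels:
        hist[pixel[channel]] += 1
    return hist


def low_pass(hist, half_width=2):
    """Assumed low-pass filter: centered moving average."""
    out = []
    for i in range(len(hist)):
        window = hist[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def count_peaks(hist):
    """Count local maxima, treating a flat run of equal values as one peak."""
    padded = [0.0] + list(hist) + [0.0]
    peaks = 0
    i = 1
    while i < len(padded) - 1:
        j = i
        while j + 1 < len(padded) - 1 and padded[j + 1] == padded[i]:
            j += 1  # extend over the plateau
        if padded[i] > padded[i - 1] and padded[i] > padded[j + 1]:
            peaks += 1
        i = j + 1
    return peaks


def choose_k(pixels):
    """K = largest peak count among the smoothed R, G and B histograms."""
    return max(count_peaks(low_pass(channel_histogram(pixels, ch)))
               for ch in range(3))
```

For foreground pixels drawn from two label colors that differ only in the red channel, the red histogram shows two peaks and the other channels one each, so K is 2.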

The invention combines methods from machine learning and digital signal processing, and achieves finger motion detection and localization on the basis of spatio-temporal filtering and joint space Kmeans. The invention is described in further detail below with reference to specific implementation steps and the accompanying drawings, but its implementation is not limited thereto.

Fig. 1 shows one embodiment of the invention, which mainly comprises three modules: labeling and video capture, moving-target detection, and moving-target localization. Labels in ten different colors (excluding black and white) are first attached to the player's fingers, and a video of the fingers playing the piano is recorded. Then, for each input video frame, the finger-motion target region is detected by spatio-temporal filtering and the labels are extracted; the finger-motion targets are classified and localized with joint space Kmeans, which provides fingering recognition and recording.

The invention can be well implemented as described above and achieves the aforementioned effects; with suitable modification, the present example can be widely applied to gesture recognition and other fields.

Claims

1. A finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans, characterized by comprising: first attaching labels in ten different colors other than black and white to the player's fingers, and recording a video of the player's fingers playing a keyboard instrument; then, for each input video frame, determining the finger motion region by spatio-temporal filtering, feeding the spatial filtering result back to guide the dynamic background update, converting the finger-motion detection result from RGB space to YCrCb and HSV space for band-pass filtering to remove skin color and shadow, and extracting the labels by a foreground threshold decision; and finally localizing the finger-motion targets with joint space Kmeans, in which the number of peaks of the low-pass-filtered R, G, B histograms is identified adaptively to determine the number of clusters K, the cluster centers are initialized adaptively from the statistical properties of the histograms, and the clustering result is subjected to random perturbation and simulated annealing, thereby providing fingering recognition and recording.

2. The finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans according to claim 1, characterized in that it is implemented by a labeling and video capture module, a moving-target detection module and a moving-target localization module;

the labeling and video capture module generates the video file processed by the subsequent modules: labels in ten different colors other than black and white are first attached to the player's fingers, and the playing is then recorded on video while the player plays the piano normally;

the moving-target detection module detects moving targets by spatio-temporal filtering: spatial filtering is first applied to the input video frame to obtain an accurate moving-target region; the spatial filtering result is then fed back to guide the spatial recombination of the temporal band-pass filtering result and the temporal low-pass filtering result at the foreground (i.e. finger motion) positions and the background positions, completing the dynamic background update, overcoming the effects of illumination changes, camera shake and background changes, and avoiding missed detection when the fingers move slowly; the finger-motion detection result is converted from RGB space to YCrCb and HSV space and band-pass filtered to remove skin color and shadow, and the labels are extracted by a foreground threshold decision;

the moving-target localization module localizes the moving targets using joint space Kmeans: first, the peaks of the low-pass-filtered histograms of the R, G and B components are identified and the number of clusters K is determined adaptively, making the classification more accurate and intelligent; the clusters are then initialized adaptively from the statistical properties of the histograms; 5-dimensional Kmeans is performed jointly over the color space (R, G, B) and the geometric space (x, y), making full use of the prior knowledge that pixels of the same color lie close together and improving the accuracy of clustering and localization; the cluster centers are subjected to random perturbation and simulated annealing, improving the stability of the algorithm while avoiding local optima; finally, the clustering result is classified and localized to determine the position of each finger on the keyboard in every frame, thereby obtaining the player's fingering.

3. The finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans according to claim 1, characterized in that the specific implementation steps of the moving-target detection comprise:

Step 1: Spatial filtering, comprising the following steps:

1.1 search for the moving-target region: the current input video frame is compared with the background image point by point, pixel by pixel, in the spatial domain to find the moving-target region;

1.2 determine foreground and background: pixels in the moving-target region are set to the pixels at the corresponding positions of the current input frame, and pixels in the background region are set to white, where white in RGB space is (255, 255, 255);

1.3 feed back foreground and background: the foreground, i.e. the moving-target region, and the background are fed back for the background update of the next frame;

Step 2: Dynamic background update, comprising the following steps:

2.1 feed back the spatial filtering result: the previous spatial filtering result is fed back to guide the dynamic background update; determine whether the current input video frame is the 2nd frame; if it is, the background is not updated and the first frame is used directly as the background; otherwise, proceed to the next step;

2.2 spatial recombination: the temporal band-pass filtering result and the temporal low-pass filtering result are recombined spatially at the foreground (i.e. finger motion) positions and the background positions to complete the background update;

Step 3: Label extraction, comprising the following steps:

3.1 remove skin color: convert from RGB space to YCrCb space and determine whether the coordinates (Cr, Cb) fall within the elliptical skin-color model; if a pixel falls within the model, set it to white;

3.2 remove shadow: convert from RGB space to HSV space and apply band-pass filtering to the histogram of the V component;

3.3 identify labels: in HSV space, compute the foreground mean threshold of the S component, and set to white those pixels of the extracted moving target whose S component is below the foreground saturation mean threshold.

4. The finger motion detection and localization method based on spatio-temporal filtering and joint space Kmeans according to claim 1, characterized in that the specific implementation steps of the moving-target localization comprise:

Step 1: Joint space adaptive Kmeans, comprising the following steps:

1.1 compute the R, G, B histogram statistics: the histograms of the R, G and B components of the moving-target detection result are low-pass filtered and their peaks are identified adaptively;

1.2 adaptively determine the number of clusters K: the largest peak count among the R, G and B histograms is taken as the number of clusters for joint space Kmeans;

1.3 adaptive cluster initialization: the cluster centers are initialized from the peak positions of the R, G and B histograms;

1.4 iterate until convergence: repeat the following operations until convergence: (a) compute the centers of the K classes, the center of the k-th class being the mean vector of the 5-dimensional observation vectors (R, G, B, x, y) in that class, 1 ≤ k ≤ K; (b) assign each observation to the class whose center is nearest, where nearest is determined by Euclidean distance;

Step 2: Random perturbation and simulated annealing, comprising the following steps:

2.1 compute the 5-dimensional perturbation radius of each class: for each class, the distance from the class center to the farthest point of that class is taken as the perturbation radius r_K, where r_K is a five-dimensional vector and K is the number of clusters;

2.2 random perturbation: draw a random number random₀ between -1 and 1 and perturb the class center by r_K*random₀; using the perturbed class centers as the new initial class centers, run joint space adaptive Kmeans again, and compute the difference ΔJ = J' - J between the new objective function J' and the current objective function J; if ΔJ < 0, accept the new solution as the current solution, update the perturbation radius and proceed to step 3; otherwise proceed to step 2.3;

2.3 simulated annealing: the random number used in the perturbation is changed to random₀*a^-t, where a is the annealing speed, a > 1, and t is the number of annealing iterations; continue with steps 2.1 and 2.2;

Step 3: Fingering recognition, comprising the following steps:

3.1 moving-target localization: from the coordinates of the joint space adaptive Kmeans cluster centers, determine the position of each finger on the keyboard in every video frame, thereby obtaining the fingering;

3.2 fingering output: the fingering of every video frame is stored together in a CSV file for subsequent fingering study and research.