CN102096471A - Human-computer interaction method based on machine vision - Google Patents

Human-computer interaction method based on machine vision

Info

Publication number
CN102096471A
CN102096471A (application CN201110040613A)
Authority
CN
China
Prior art keywords
image
feature object
tone
pixel
saturation degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110040613
Other languages
Chinese (zh)
Other versions
CN102096471B (en)
Inventor
骆威 (Luo Wei)
肖平 (Xiao Ping)
孙敬飞 (Sun Jingfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Funing Science And Technology Pioneer Park Co ltd
Guangdong Gaohang Intellectual Property Operation Co ltd
Original Assignee
Vtron Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vtron Technologies Ltd
Priority to CN 201110040613
Publication of CN102096471A
Application granted
Publication of CN102096471B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of machine vision, and in particular to a human-computer interaction method based on machine vision, comprising the following steps: an image acquisition step of continuously acquiring image data through an image acquisition device to update a data cache; an image denoising step of reading data from the data cache and filtering it to remove random noise introduced into the image; a feature object extraction step of detecting a feature object in the denoised image and identifying it with a distinct mark; and a gesture recognition step of analyzing the motion trajectory of the feature object extracted in the feature object extraction step and executing corresponding machine instructions according to the result of the trajectory analysis. The method provides a complete system solution that allows people to interact with a machine in real time, overcomes the application limitations of existing touch technology, and makes human-machine interaction more natural.

Description

Human-computer interaction method based on machine vision
Technical field
The present invention relates to the technical field of machine vision, and in particular to a human-computer interaction method based on machine vision.
Background technology
With the development of human-computer interaction technology, multi-touch has become an application focus. Apple has introduced multi-touch into its newly released iPad, and large-sized electronic whiteboards have also added support for multi-touch. As the mouse era recedes from people's daily lives, the naturalness of interaction becomes particularly important, and interaction technology is gradually evolving from the touch era toward a more natural mode of interaction that meets ergonomic requirements.
Apple's ergonomic research shows that human-computer interaction should not be carried out on a vertical screen. Practical operating experience leads to a similar conclusion, for two main reasons: first, holding an arm up to operate for a long time causes obvious fatigue; second, prolonged finger contact with the screen noticeably increases friction and causes discomfort. For these reasons, it is necessary to study a more natural mode of interaction that meets ergonomic requirements.
In addition, current multi-touch technology has the following application limitation: the operator must stand in front of the interactive device, and a tangible object must be in contact with the device, for interaction to proceed normally.
Summary of the invention
The present invention provides a human-computer interaction method based on machine vision, in order to overcome the application limitations of current multi-touch technology and to develop a more natural mode of human-computer interaction that better meets ergonomic requirements.
The present invention is realized by the following technical scheme:
A human-computer interaction method based on machine vision, the method comprising:
an image acquisition step of continuously acquiring image data through an image acquisition device to update a data cache;
an image denoising step of reading data from the data cache and filtering it to remove random noise introduced into the image;
a feature object extraction step of detecting a feature object in the image denoised by the image denoising step and identifying it with a distinct mark;
a gesture recognition step of analyzing the motion trajectory of the feature object extracted by the feature object extraction step and executing corresponding machine instructions according to the trajectory analysis result.
As a preferred scheme, the image is a color image.
As a preferred scheme, the concrete steps of the image denoising are as follows:
For every point in the image, the red, green and blue component values are processed as follows:
For each point P with coordinates (x, y), let the pixel value of P be written (R(x,y), G(x,y), B(x,y)), denoting the red, green and blue component values of P respectively; the filtering of P is then expressed as:

$$R'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}R(u,v),\qquad G'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}G(u,v),\qquad B'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}B(u,v)$$

where $W_{N}(x,y)$ is the N x N mask window centered on P, $R'(x,y)$, $G'(x,y)$ and $B'(x,y)$ denote the filtered values of the red, green and blue components of P, and N is a predefined natural number greater than 1.
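By way of illustration only, the following is a minimal NumPy sketch of such an N x N zone-averaging filter; the function name, the odd mask size, and the replicated-border handling are assumptions of the sketch, not details fixed by the invention.

```python
import numpy as np

def zone_average(channel: np.ndarray, n: int = 3) -> np.ndarray:
    """Mean-filter one colour channel with an n-by-n mask (n > 1, odd assumed).

    The border is replicated so the output has the same shape as the input.
    """
    assert n > 1 and n % 2 == 1
    pad = n // 2
    padded = np.pad(channel.astype(np.float64), pad, mode="edge")
    out = np.zeros(channel.shape, dtype=np.float64)
    h, w = channel.shape
    for di in range(-pad, pad + 1):          # sum the n*n shifted copies
        for dj in range(-pad, pad + 1):
            out += padded[pad + di : pad + di + h, pad + dj : pad + dj + w]
    return out / (n * n)                     # divide by N^2 as in the formula

# Applied separately to each of the R, G and B components:
# r_f, g_f, b_f = (zone_average(c) for c in (r, g, b))
```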
As a preferred scheme, the concrete steps of the feature object extraction step are as follows:
a background model establishment step of building, from the denoised image, a background model based on the hue and saturation of each pixel;
a feature object detection step of detecting the feature object in the image once the background model is obtained;
a feature object marking step of marking the feature object detected by the feature object detection step.
As a further preferred scheme:
The concrete steps of the background model establishment step are as follows:
First convert the denoised image from RGB space to HSV space;
Accumulate the hue and saturation components of each pixel over the M frames preceding the current frame, and compute for each pixel the hue mean $\mu_H$ and the saturation mean $\mu_S$, where M is a natural number greater than 1;
Accumulate the hue and saturation components of each pixel over the M frame-difference images preceding the current frame, and compute for each pixel the hue mean $\Delta_H$ and the saturation mean $\Delta_S$ of the difference images, a difference image being the difference of corresponding pixels in two successive frames;
For each pixel, establish a statistical model centered on its own $\mu_H$ and $\mu_S$, with fluctuation ranges of a times $\Delta_H$ and b times $\Delta_S$ above and below; then:
the background model hue range of each pixel is $[\mu_H - a\,\Delta_H,\ \mu_H + a\,\Delta_H]$;
the background model saturation range of each pixel is $[\mu_S - b\,\Delta_S,\ \mu_S + b\,\Delta_S]$;
where a and b are natural numbers greater than 1.
The concrete steps of the feature object detection step are:
Convert the current frame from RGB space to HSV space, and compute the hue and saturation components of each pixel;
For each pixel of the current frame, judge:
whether its hue component lies within the background model hue range of the corresponding pixel;
whether its saturation component lies within the background model saturation range of the corresponding pixel;
If the hue component of the pixel lies outside the background model hue range of the corresponding pixel and its saturation component lies outside the background model saturation range of the corresponding pixel, the pixel is judged to belong to the feature object; otherwise it is judged to be a background pixel.
The concrete steps of the feature object marking step are as follows:
Divide the hue range into at least one hue interval and assign a unique hue mark to each interval; judge the hue of each pixel classified as a feature object and assign it the hue mark of the interval its hue falls into; pixels classified as background are assigned the background mark.
As a still further preferred scheme:
In the feature object marking step, the hue of each pixel of the current frame is normalized to the range [0, 180], and the hue range is divided equally into 3 hue intervals: the first hue interval is [0, 60], the second is [60, 120], and the third is [120, 180].
As a still further preferred scheme, the gesture recognition step comprises:
a trajectory initialization step, specifically comprising:
computing the barycentric coordinates of the feature object formed by the pixels carrying the same hue mark in K consecutive frames; if the distances between the barycentric coordinates of the feature object over the K frames are less than a given initiation threshold, recording the barycentric coordinates of the feature object in the current frame as the starting coordinates of the feature object's motion;
a trajectory tracking step, specifically comprising:
if the distance between the barycentric coordinates of feature objects carrying the same hue mark in adjacent frames is less than a given association threshold, associating those barycentric coordinates across the adjacent frames;
a trajectory recognition step, specifically comprising:
executing the corresponding operation according to the relation between the position of the feature object in the current frame and the feature object position obtained at initialization;
a trajectory termination step, specifically comprising:
judging the trajectory to be finished if the current frame satisfies a trajectory termination condition.
As a still further preferred scheme, the trajectory termination condition is satisfied when:
the number of feature object marks in the current frame is inconsistent with that in the previous frame; or
some feature object in the current frame has a barycentric distance greater than the given association threshold from the feature object carrying the same hue mark in the previous frame; or
there is no feature object in the current frame.
The number of marks here means the number of feature objects carrying distinct hue marks in the frame under consideration.
Beneficial effects of the present invention:
At present multi-touch technology is increasingly common, but the naturalness of interaction still needs improvement. Ergonomic studies show that interaction should not be performed by touching a vertical screen, for two main reasons: first, holding an arm up to operate for a long time causes obvious fatigue; second, prolonged finger contact with the screen noticeably increases friction and causes discomfort. In addition, current multi-touch technology has the application limitation that the operator must stand in front of the interactive device and a tangible object must contact the device for interaction to proceed. To overcome this limitation and develop a more natural human-computer interaction system that better meets ergonomic requirements, the invention discloses a contactless gesture recognition method and system based on machine vision. The system acquires image data through an image capture device, removes the random noise introduced by ambient light using zone-averaging filtering, and establishes a background model under a static background, so that feature objects can be extracted from the image sequence easily and accurately. To make subsequent feature object tracking more stable, a feature object marking method is introduced; an initialization method for the start of an operation and constraints for trajectory tracking are then given, the corresponding recognition is performed on the resulting trajectory, and finally three criteria for judging the end of an operation are provided. The invention gives a complete system solution that enables real-time interaction with a machine, overcomes the application limitations of current touch technology, and makes human-machine interaction more natural.
Description of drawings
Fig. 1 is the execution flowchart of the system;
Fig. 2 is a schematic diagram of the system hardware;
Fig. 3 is the filter mask used for image denoising;
Fig. 4 is the execution flow of the embodiment.
Embodiment
Embodiments of the invention are described in detail below. It should be noted that the described embodiments are intended to aid understanding of the invention and do not limit it in any way.
As shown in Fig. 1, the technical scheme of the present invention is as follows:
Step 1: image acquisition: continuously acquire image data through the image acquisition device to update the data cache; the acquired image is a color image and is used to judge the user's operating gestures;
Step 2: image denoising: read data from the data cache and filter it to remove random noise introduced into the image;
To eliminate the influence of ambient light on the acquired images, the present invention filters the original image by zone averaging, removing the influence of noise on feature object extraction under different illumination models. The whole image is filtered with the mask shown in Fig. 3. Let a point P in the image have coordinates (x, y) and pixel value (R(x,y), G(x,y), B(x,y)), denoting the red, green and blue component values of P respectively; the filtering of P is then expressed as:

$$R'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}R(u,v)$$

and analogously for $G'(x,y)$ and $B'(x,y)$, where $W_{N}(x,y)$ is the N x N window of the Fig. 3 mask centered on P, and $R'(x,y)$, $G'(x,y)$ and $B'(x,y)$ denote the filtered red, green and blue components of P. The filtered image shows a clear reduction of the random noise introduced by the illumination model.
Step 3: feature object extraction: detect the feature object in the denoised image and identify it with a distinct mark; this step is completed in the following sub-steps:
Step 3.1: establish the background model;
To ensure that the feature object can be extracted correctly, the present invention first establishes a background model, as follows:
Step 1: convert the denoised image from RGB space to HSV space;
Step 2: accumulate the H (hue) and S (saturation) components of each pixel over the first 100 frames, and compute for each pixel the means $\mu_H$ and $\mu_S$;
Step 3: accumulate the H and S components of each pixel over the first 100 frame-difference images, and compute their means $\Delta_H$ and $\Delta_S$ respectively;
Step 4: for each pixel, establish a statistical model centered on its own $\mu_H$ and $\mu_S$, with fluctuation ranges of 5 times $\Delta_H$ and 4 times $\Delta_S$ above and below; the ranges of H and S for each pixel in the background model are then:

$$H \in [\mu_H - 5\,\Delta_H,\ \mu_H + 5\,\Delta_H], \qquad S \in [\mu_S - 4\,\Delta_S,\ \mu_S + 4\,\Delta_S]$$
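A minimal sketch of this background-modelling step, assuming OpenCV and NumPy, might look as follows. It relies on OpenCV's 8-bit HSV convention (hue already stored in [0, 180)), forms the difference images from consecutive pairs of the same frames, and the function name and return format are illustrative only.

```python
import cv2
import numpy as np

def build_background_model(frames, a=5, b=4):
    """Per-pixel H/S statistical model from feature-free frames (embodiment: 100).

    Returns (h_lo, h_hi, s_lo, s_hi), each an array the size of one frame.
    """
    hs = [cv2.cvtColor(f, cv2.COLOR_BGR2HSV)[:, :, :2].astype(np.float64)
          for f in frames]
    stack = np.stack(hs)                    # shape (M, height, width, 2)
    mean = stack.mean(axis=0)               # per-pixel means mu_H, mu_S
    diffs = np.abs(stack[1:] - stack[:-1])  # frame-difference images
    dmean = diffs.mean(axis=0)              # per-pixel means Delta_H, Delta_S
    h_lo = mean[..., 0] - a * dmean[..., 0]
    h_hi = mean[..., 0] + a * dmean[..., 0]
    s_lo = mean[..., 1] - b * dmean[..., 1]
    s_hi = mean[..., 1] + b * dmean[..., 1]
    return h_lo, h_hi, s_lo, s_hi
```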
Step 3.2: feature object detection;
After the background model is obtained, the moving feature object in the image must be detected. Because the background model of the present invention is based on the hue (H) and saturation (S) of each pixel, the method computes, for the image containing the scene, the hue and saturation of each pixel and checks whether the values lie within the ranges given by the corresponding pixel of the background model. This is done in the following two steps:
Step 1: convert the current image from RGB space to HSV space, and compute the H and S values of each pixel;
Step 2: for each pixel of the current image, judge whether its H and S values lie within the ranges of the corresponding pixel in the background model; if the H and S values of the pixel are not within the given ranges, the pixel is judged to belong to the feature object, otherwise it is judged to be a background pixel.
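Continuing under the same assumptions, the two-step detection can be sketched as below, reusing the model returned by the background-model sketch above; a pixel is declared a feature pixel only when both its H and S fall outside the per-pixel background ranges, as stated in Step 2.

```python
import cv2
import numpy as np

def detect_feature_pixels(frame, model):
    """Boolean mask of feature-object pixels for one BGR frame."""
    h_lo, h_hi, s_lo, s_hi = model
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV).astype(np.float64)
    h, s = hsv[..., 0], hsv[..., 1]
    out_h = (h < h_lo) | (h > h_hi)   # hue outside the background range
    out_s = (s < s_lo) | (s > s_hi)   # saturation outside the background range
    return out_h & out_s              # both outside -> feature object pixel
```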
Step 3.3: feature object marking;
To make subsequent trajectory tracking more accurate, the present invention performs the following operations on the image to be detected:
1. Mark the extracted feature object as follows:
Judge the hue of each feature object pixel: if the hue H lies in [0, 60) the pixel is marked "1"; in [60, 120), "2"; in [120, 180), "3". Before the comparison, the hue range has been normalized to the interval [0, 180). In mathematical form:

$$\text{mark}(x,y)=\begin{cases}1, & 0 \le H(x,y) < 60\\ 2, & 60 \le H(x,y) < 120\\ 3, & 120 \le H(x,y) < 180\end{cases}$$

2. Pixels judged to belong to the background area are marked "0".
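A sketch of this marking rule under the same OpenCV assumption (8-bit hue already spans [0, 180), so no separate normalisation is applied here); the function name is illustrative.

```python
import numpy as np

def mark_pixels(hue: np.ndarray, feature_mask: np.ndarray) -> np.ndarray:
    """Assign hue marks 1/2/3 to feature pixels; background pixels stay 0."""
    marks = np.zeros(hue.shape, dtype=np.uint8)
    marks[feature_mask & (hue < 60)] = 1
    marks[feature_mask & (hue >= 60) & (hue < 120)] = 2
    marks[feature_mask & (hue >= 120)] = 3
    return marks
```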
Step 4: gesture recognition: analyze the motion trajectory of the feature object and execute the corresponding machine instructions according to the trajectory analysis result; this step is completed in the following four sub-steps:
Step 4.1: trajectory initialization;
Compute the barycentric coordinates of the feature objects carrying the same non-"0" mark in three consecutive frames; if the distances between these barycentric coordinates over the three frames are less than a given threshold, record the position as the starting position of the feature object's motion.
Step 4.2: trajectory tracking;
Since only the trajectory of the motion needs to be analyzed, the present invention first computes the barycentric coordinates of the pixels carrying the same non-"0" mark, and represents the corresponding feature object by those coordinates.
Furthermore, each feature object may present a different hue, so multiple marks may appear. For the various marks to be associated correctly between adjacent frames, the following two conditions must be satisfied:
Condition 1: only centroids carrying the same mark in adjacent frames are associated;
Condition 2: the distance between centroids carrying the same mark in adjacent frames must be less than a given threshold.
Step 4.3: trajectory recognition;
Execute the corresponding operation according to the relation between the position of the feature object in the current frame and the feature object position obtained at initialization. Each operation is defined by a mathematical expression, so during operation it suffices to substitute the obtained position coordinates into each expression and evaluate it to obtain the result.
Step 4.4: trajectory termination.
The present invention defines three criteria for judging that a trajectory has ended:
1. the number of feature object marks in the current frame is inconsistent with that in the previous frame;
2. some feature object in the current frame is at a distance greater than the given threshold from the feature object carrying the same mark in the previous frame;
3. there is no feature object in the current frame.
The number of marks here means the number of feature objects carrying distinct hue marks in the frame under consideration. For example, if the previous frame has 4 marked feature objects and the current frame has only 2, then clearly two trajectories have ended.
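For illustration, the centroid computation and the three termination tests might be sketched as follows; the dictionary-of-centroids representation and the helper names are assumptions of the sketch, and the default maximum distance of 200 is taken from the embodiment described below.

```python
import numpy as np

def centroid(marks: np.ndarray, label: int):
    """Barycentre (x, y) of all pixels carrying the given hue mark, or None."""
    ys, xs = np.nonzero(marks == label)
    if xs.size == 0:
        return None
    return np.array([xs.mean(), ys.mean()])

def centroids(marks: np.ndarray) -> dict:
    """Map each non-zero mark present in the frame to its centroid."""
    return {int(l): centroid(marks, int(l)) for l in np.unique(marks) if l != 0}

def track_ended(prev: dict, cur: dict, max_dist: float = 200.0) -> bool:
    """The three termination criteria: no feature object, the sets of marks
    differ, or a same-mark centroid jumped farther than max_dist."""
    if not cur:
        return True
    if set(prev) != set(cur):
        return True
    return any(np.linalg.norm(prev[k] - cur[k]) > max_dist for k in cur)
```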
In the specific embodiment shown in Fig. 2, the required hardware is an image capture device and a computer. The image capture device communicates with the computer through a USB interface, with an acquisition frame rate of 30 fps and a resolution of 640 x 480 pixels; the computer continuously obtains the video sequence from the capture device via USB and then processes each frame accordingly.
To make the embodiment clear and understandable, the feature objects adopted are two pens of different colors. The computer analyzes the image sequence collected by the capture device, judges the motion trajectories of the feature objects, and then executes the corresponding application. The concrete implementation is as follows:
System initialization:
Step 1: filter each acquired frame by zone averaging to remove random noise;
In this embodiment the frame rate of the image capture device is 30 fps and the image resolution is 640 x 480.
Step 2: initialize the system, using the first 100 frames, which contain no feature object, to establish the background model;
The main purpose of system initialization is to establish the background model so that the feature object can be extracted more accurately; the video frames used to build it should therefore contain no feature object. This embodiment uses the first 100 frames to establish a static background model. The number of frames needed to build the background model can of course be chosen according to the conditions on site, but to guarantee the reliability of the model it should not be fewer than 100.
Program execution:
Step 3: obtain a new frame, filter the current frame by zone averaging, and remove random noise;
Step 4: detect whether the current frame contains a feature object; if a feature object is detected, mark it and go to step 5; otherwise repeat step 4 until the current video frame contains a feature object;
In this embodiment the feature objects are two pens of different colors. To prevent the extracted feature objects from including spurious objects introduced by the motion of a human hand or other objects, this embodiment restricts the hue fluctuation range around the two pen colors, so that the method of the present invention marks the pens of the two colors "1" and "2" respectively, while all other regions are marked as background, "0";
Step 5: initialize the gesture motion; if initialization succeeds, go to step 6, otherwise go to step 3;
The concrete initialization operation of this embodiment computes the barycentric coordinates of feature objects carrying the same mark in two consecutive frames; if the Euclidean distance between the two barycentric coordinates is less than the threshold 50, initialization is complete, otherwise re-initialization is required.
Step 6: analyze the trajectory collected from the gesture motion and execute the corresponding program in real time;
Three simple gestures are defined in this embodiment; the corresponding application is executed according to the relation between the position of the feature object in the current frame and its position at gesture initialization. The application in this embodiment operates on an image, and the three gestures are defined as image zoom, image rotation and image translation. Image zoom and rotation require two feature objects, while image translation requires only one.
Step 7: judge whether the gesture motion has ended; if it has, go to step 3, otherwise go to step 6;
This embodiment uses the three criteria of the present invention to judge that a gesture has ended. For the second trajectory-termination criterion of the invention, the maximum distance is set to 200 in this embodiment: if the Euclidean distance between any pair of feature objects carrying the same mark in two adjacent frames exceeds 200, the trajectory ends.
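Tying the embodiment together, a skeleton main loop under the same assumptions (an OpenCV USB capture, the sketch functions defined earlier, a 3 x 3 averaging mask, and the embodiment's 100 background frames and thresholds of 50 and 200) might read:

```python
import cv2

cap = cv2.VideoCapture(0)                       # USB capture device
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)          # embodiment: 640 x 480 at 30 fps
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# System initialisation: background model from 100 feature-free frames.
frames = []
while len(frames) < 100:
    ok, frame = cap.read()
    if ok:
        frames.append(cv2.blur(frame, (3, 3)))  # zone-averaging denoising
model = build_background_model(frames)          # sketch defined earlier

prev = {}
while True:                                     # program execution, steps 3-7
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.blur(frame, (3, 3))             # step 3: denoise the new frame
    mask = detect_feature_pixels(frame, model)  # step 4: detect feature pixels
    hue = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[..., 0]
    marks = mark_pixels(hue, mask)              # step 4: hue marks, else 0
    cur = centroids(marks)
    if prev and track_ended(prev, cur):         # step 7: gesture finished?
        prev = {}
        continue
    # steps 5-6: initialise when same-mark centroids in consecutive frames are
    # closer than 50 px, then map tracked positions to zoom/rotate/translate.
    prev = cur
```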
This embodiment achieved excellent real-time performance in system testing. It must be stated once more that this embodiment is only an illustration of the present invention and does not restrict its content in any way.

Claims (8)

1. A human-computer interaction method based on machine vision, characterized in that the method comprises:
an image acquisition step of continuously acquiring image data through an image acquisition device to update a data cache;
an image denoising step of reading data from the data cache and filtering it to remove random noise introduced into the image;
a feature object extraction step of detecting a feature object in the denoised image and identifying it with a distinct mark;
a gesture recognition step of analyzing the motion trajectory of the feature object extracted by the feature object extraction step and executing corresponding machine instructions according to the trajectory analysis result.
2. The interaction method according to claim 1, characterized in that the image is a color image.
3. The interaction method according to claim 1, characterized in that the concrete steps of the image denoising are as follows:
for every point in the image, the red, green and blue component values are processed as follows:
for each point P with coordinates (x, y), let the pixel value of P be written (R(x,y), G(x,y), B(x,y)), denoting the red, green and blue component values of P respectively; the filtering of P is then expressed as:

$$R'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}R(u,v),\qquad G'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}G(u,v),\qquad B'(x,y)=\frac{1}{N^{2}}\sum_{(u,v)\in W_{N}(x,y)}B(u,v)$$

where $W_{N}(x,y)$ is the N x N mask window centered on P, $R'(x,y)$, $G'(x,y)$ and $B'(x,y)$ denote the filtered values of the red, green and blue components of P, and N is a predefined natural number greater than 1.
4. The interaction method according to claim 1, characterized in that the concrete steps of the feature object extraction step are as follows:
a background model establishment step of building, from the denoised image, a background model based on the hue and saturation of each pixel;
a feature object detection step of detecting the feature object in the image once the background model is obtained;
a feature object marking step of marking the feature object detected by the feature object detection step.
5. The interaction method according to claim 4, characterized in that:
the concrete steps of the background model establishment step are as follows:
first convert the denoised image from RGB space to HSV space;
accumulate the hue and saturation components of each pixel over the M frames preceding the current frame, and compute for each pixel the hue mean $\mu_H$ and the saturation mean $\mu_S$, where M is a natural number greater than 1;
accumulate the hue and saturation components of each pixel over the M frame-difference images preceding the current frame, and compute for each pixel the hue mean $\Delta_H$ and the saturation mean $\Delta_S$ of the difference images;
for each pixel, establish a statistical model centered on its own $\mu_H$ and $\mu_S$, with fluctuation ranges of a times $\Delta_H$ and b times $\Delta_S$ above and below; then:
the background model hue range of each pixel is $[\mu_H - a\,\Delta_H,\ \mu_H + a\,\Delta_H]$;
the background model saturation range of each pixel is $[\mu_S - b\,\Delta_S,\ \mu_S + b\,\Delta_S]$;
where a and b are natural numbers greater than 1;
the concrete steps of the feature object detection step are:
convert the current frame from RGB space to HSV space, and compute the hue and saturation components of each pixel;
for each pixel of the current frame, judge:
whether its hue component lies within the background model hue range of the corresponding pixel;
whether its saturation component lies within the background model saturation range of the corresponding pixel;
if the hue component of the pixel lies outside the background model hue range of the corresponding pixel and its saturation component lies outside the background model saturation range of the corresponding pixel, the pixel is judged to belong to the feature object, otherwise it is judged to be a background pixel;
the concrete steps of the feature object marking step are as follows:
divide the hue range into at least one hue interval and assign a unique hue mark to each interval; judge the hue of each pixel classified as a feature object and assign it the hue mark of the interval its hue falls into; pixels classified as background are assigned the background mark.
6. The interaction method according to claim 5, characterized in that:
in the feature object marking step, the hue of each pixel of the current frame is normalized to the range [0, 180], and the hue range is divided equally into 3 hue intervals: the first hue interval is [0, 60], the second is [60, 120], and the third is [120, 180].
7. The interaction method according to claim 5, characterized in that the gesture recognition step comprises:
a trajectory initialization step, specifically comprising:
computing the barycentric coordinates of the feature object formed by the pixels carrying the same hue mark in K consecutive frames; if the distances between the barycentric coordinates of the feature object over the K frames are less than a given initiation threshold, recording the barycentric coordinates of the feature object in the current frame as the starting coordinates of the feature object's motion;
a trajectory tracking step, specifically comprising:
if the distance between the barycentric coordinates of feature objects carrying the same hue mark in adjacent frames is less than a given association threshold, associating those barycentric coordinates across the adjacent frames;
a trajectory recognition step, specifically comprising:
executing the corresponding operation according to the relation between the position of the feature object in the current frame and the feature object position obtained at initialization;
a trajectory termination step, specifically comprising:
judging the trajectory to be finished if the current frame satisfies a trajectory termination condition.
8. The interaction method according to claim 7, characterized in that the trajectory termination condition is satisfied when:
the number of feature object marks in the current frame is inconsistent with that in the previous frame; or
some feature object in the current frame has a barycentric distance greater than the given association threshold from the feature object carrying the same hue mark in the previous frame; or
there is no feature object in the current frame.
CN 201110040613 2011-02-18 2011-02-18 Human-computer interaction method based on machine vision Expired - Fee Related CN102096471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110040613 CN102096471B (en) 2011-02-18 2011-02-18 Human-computer interaction method based on machine vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110040613 CN102096471B (en) 2011-02-18 2011-02-18 Human-computer interaction method based on machine vision

Publications (2)

Publication Number Publication Date
CN102096471A 2011-06-15
CN102096471B CN102096471B (en) 2013-04-10

Family

ID=44129592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110040613 Expired - Fee Related CN102096471B (en) 2011-02-18 2011-02-18 Human-computer interaction method based on machine vision

Country Status (1)

Country Link
CN (1) CN102096471B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622225A (en) * 2012-02-24 2012-08-01 合肥工业大学 Multipoint touch application program development method supporting user defined gestures
CN102842139A (en) * 2012-07-19 2012-12-26 电子科技大学 Method for calculating target locus
CN103034333A (en) * 2012-12-18 2013-04-10 福建华映显示科技有限公司 Gesture recognition device and gesture recognition method
CN103207985A (en) * 2012-01-13 2013-07-17 索尼公司 Image Processing Device, Method Thereof, And Program
CN103295029A (en) * 2013-05-21 2013-09-11 深圳Tcl新技术有限公司 Interaction method and device of gesture control terminal
CN104122995A (en) * 2014-05-28 2014-10-29 重庆大学 Method for simulating car race game steering wheel by utilizing visual information
CN104461321A (en) * 2014-12-10 2015-03-25 上海天奕达电子科技有限公司 Method and device for gestural inputting of instructions
CN104463250A (en) * 2014-12-12 2015-03-25 广东工业大学 Sign language recognition translation method based on Davinci technology
CN104898841A (en) * 2015-05-29 2015-09-09 青岛海信医疗设备股份有限公司 Medical image display control method and control system
CN106296676A (en) * 2016-08-04 2017-01-04 合肥景昇信息科技有限公司 The object positioning method that view-based access control model is mutual
CN111652842A (en) * 2020-04-26 2020-09-11 佛山读图科技有限公司 Real-time visual detection method and system for high-speed penicillin bottle capping production line
CN112306232A (en) * 2020-09-18 2021-02-02 济南大学 Method for reconstructing motion trail of object in real time
CN112906563A (en) * 2021-02-19 2021-06-04 山东英信计算机技术有限公司 Dynamic gesture recognition method, device and system and readable storage medium


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1218936A (en) * 1997-09-26 1999-06-09 松下电器产业株式会社 Hand gesture identifying device
WO2005104010A2 (en) * 2004-04-15 2005-11-03 Gesture Tek, Inc. Tracking bimanual movements
CN1881994A (en) * 2006-05-18 2006-12-20 北京中星微电子有限公司 Method and apparatus for hand-written input and gesture recognition of mobile apparatus

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207985A (en) * 2012-01-13 2013-07-17 索尼公司 Image Processing Device, Method Thereof, And Program
CN102622225B (en) * 2012-02-24 2015-01-14 合肥工业大学 Multipoint touch application program development method supporting user defined gestures
CN102622225A (en) * 2012-02-24 2012-08-01 合肥工业大学 Multipoint touch application program development method supporting user defined gestures
CN102842139B (en) * 2012-07-19 2015-09-02 电子科技大学 A kind of acquiring method of target trajectory
CN102842139A (en) * 2012-07-19 2012-12-26 电子科技大学 Method for calculating target locus
CN103034333A (en) * 2012-12-18 2013-04-10 福建华映显示科技有限公司 Gesture recognition device and gesture recognition method
CN103295029A (en) * 2013-05-21 2013-09-11 深圳Tcl新技术有限公司 Interaction method and device of gesture control terminal
CN104122995A (en) * 2014-05-28 2014-10-29 重庆大学 Method for simulating car race game steering wheel by utilizing visual information
CN104461321B (en) * 2014-12-10 2017-10-13 上海卓易科技股份有限公司 A kind of method and apparatus of gesture input instruction
CN104461321A (en) * 2014-12-10 2015-03-25 上海天奕达电子科技有限公司 Method and device for gestural inputting of instructions
CN104463250A (en) * 2014-12-12 2015-03-25 广东工业大学 Sign language recognition translation method based on Davinci technology
CN104463250B (en) * 2014-12-12 2017-10-27 广东工业大学 A kind of Sign Language Recognition interpretation method based on Davinci technology
CN104898841A (en) * 2015-05-29 2015-09-09 青岛海信医疗设备股份有限公司 Medical image display control method and control system
CN106296676A (en) * 2016-08-04 2017-01-04 合肥景昇信息科技有限公司 The object positioning method that view-based access control model is mutual
CN111652842A (en) * 2020-04-26 2020-09-11 佛山读图科技有限公司 Real-time visual detection method and system for high-speed penicillin bottle capping production line
CN111652842B (en) * 2020-04-26 2021-05-11 佛山读图科技有限公司 Real-time visual detection method and system for high-speed penicillin bottle capping production line
CN112306232A (en) * 2020-09-18 2021-02-02 济南大学 Method for reconstructing motion trail of object in real time
CN112906563A (en) * 2021-02-19 2021-06-04 山东英信计算机技术有限公司 Dynamic gesture recognition method, device and system and readable storage medium

Also Published As

Publication number Publication date
CN102096471B (en) 2013-04-10

Similar Documents

Publication Publication Date Title
CN102096471B (en) Human-computer interaction method based on machine vision
CN103941866B (en) Three-dimensional gesture recognizing method based on Kinect depth image
CN106598227B (en) Gesture identification method based on Leap Motion and Kinect
CN105718878B (en) The aerial hand-written and aerial exchange method in the first visual angle based on concatenated convolutional neural network
JP6079832B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
CN103218605B (en) A kind of fast human-eye positioning method based on integral projection and rim detection
Wu et al. Robust fingertip detection in a complex environment
US20130120250A1 (en) Gesture recognition system and method
CN103207709A (en) Multi-touch system and method
CN103995595A (en) Game somatosensory control method based on hand gestures
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
Yin et al. Toward natural interaction in the real world: Real-time gesture recognition
Hongyong et al. Finger tracking and gesture recognition with kinect
CN104992192A (en) Visual motion tracking telekinetic handwriting system
CN105242776A (en) Control method for intelligent glasses and intelligent glasses
KR102052449B1 (en) System for virtual mouse and method therefor
RU2552192C2 (en) Method and system for man-machine interaction based on gestures and machine readable carrier to this end
CN103810480B (en) Method for detecting gesture based on RGB-D image
CN108255352A (en) Multiple point touching realization method and system in a kind of projection interactive system
Vidhate et al. Virtual paint application by hand gesture recognition system
CN112199015A (en) Intelligent interaction all-in-one machine and writing method and device thereof
Ghodichor et al. Virtual mouse using hand gesture and color detection
CN104699243A (en) Method for realizing disembodied virtual mouse based on monocular vision
Ying et al. Fingertip detection and tracking using 2D and 3D information
Kim et al. Interactive image segmentation using semi-transparent wearable glasses

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 233 Kezhu Road, Guangzhou High-tech Industrial Development Zone, Guangdong Province, 510670

Patentee after: VTRON GROUP Co.,Ltd.

Address before: No. 6 Caipin Road, High-tech Industrial Development Zone, Guangzhou, Guangdong 510663, China

Patentee before: VTRON TECHNOLOGIES Ltd.

CP03 Change of name, title or address
TR01 Transfer of patent right

Effective date of registration: 20201222

Address after: Unit 2414-2416, main building, no.371, Wushan Road, Tianhe District, Guangzhou City, Guangdong Province

Patentee after: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd.

Address before: 233 Kezhu Road, Guangzhou hi tech Industrial Development Zone, Guangdong 510670

Patentee before: VTRON GROUP Co.,Ltd.

Effective date of registration: 20201222

Address after: 224400 No.8 Huayuan Road, Funing Economic Development Zone, Yancheng City, Jiangsu Province

Patentee after: Funing science and Technology Pioneer Park Co.,Ltd.

Address before: Unit 2414-2416, main building, no.371, Wushan Road, Tianhe District, Guangzhou City, Guangdong Province

Patentee before: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130410

CF01 Termination of patent right due to non-payment of annual fee