CN103677274B - Interactive projection method and system based on active vision - Google Patents


Info

Publication number
CN103677274B (application CN201310724304.8A)
Authority
CN
China
Prior art keywords
skin, colour, training, color, prior probability
Prior art date
Legal status: Active
Application number
CN201310724304.8A
Other languages
Chinese (zh)
Other versions
CN103677274A (en)
Inventor
沈三明 (Shen Sanming)
Current Assignee
Vtron Group Co Ltd
Original Assignee
Vtron Technologies Ltd
Priority date
Filing date
Publication date
Application filed by Vtron Technologies Ltd
Priority to CN201310724304.8A
Publication of CN103677274A
Application granted
Publication of CN103677274B


Abstract

The invention discloses an interactive projection method and system based on active vision. In the method, a projection device projects the image to be interacted with onto an arbitrary plane, a capture device captures the user's operations on the projected picture, and the captured information is transferred to a processing device for analysis, so that the corresponding user operation is obtained and human-computer interaction is realized. The processing device of the present invention uses offline training, which improves detection efficiency. For training, a new training mechanism is proposed that greatly reduces manual operation. For detection, a skin-color model updated in real time is adopted, in which the current model depends only on the preceding few frames; this effectively removes the influence of illumination on skin color, improves the efficiency of the algorithm, and fully meets real-time requirements.

Description

Interactive projection method and system based on active vision
Technical field
The present invention relates to the technical field of computer vision, and more particularly to an interactive projection method and system based on active vision, in which the picture to be interacted with responds to the corresponding user operation and is projected by the projection device onto a projection region for the user to view.
Background technology
Although multimodal human-computer interaction technologies that integrate vision, hearing, touch, smell, taste and so on are being applied more and more widely, the two hands, as an important action and perception model in virtual reality systems, still play an irreplaceable role. At present, the touch screen, as one of the newest computer input devices, is the simplest and most convenient mode of human-computer interaction; it gives multimedia a brand-new look and is a highly attractive new class of interactive multimedia equipment. With the progress of science and technology, projectors have become ever easier to use: any plane can be turned into a display screen, and projectors are widely applied in training conferences, classroom instruction, cinemas and the like.
Cameras and projectors have gradually entered the lives of ordinary people, and automatic gesture recognition using a projector and a camera has become a current research hotspot: through automatic recognition of gestures, better human-computer interaction is achieved, making projection more convenient to use. Existing vision-based projection interactive systems are substantially based either on auxiliary light projection or on fingertip location, but most current schemes have weak real-time performance and low robustness.
Patent 200910190517.0 discloses a finger-based projection interactive method. That invention extracts the contour of the finger from information such as its color and shape in the video stream, records the finger's movement track, then compares the track with the instructions in a predefined instruction database to judge which operational instruction the track belongs to, thereby achieving human-computer interaction. Patent 200910197516.9 discloses a finger identification method in an interactive demonstration system, used to determine the user's operation behavior through finger recognition in a camera-and-projector-based interactive demonstration system. That invention is based on image processing: according to the finger posture features during the user's painting operations, and taking geometric position information as the main identification cue, the image captured by the camera is analyzed and processed to identify the finger. In an interactive projection system, in order to recognize gestures, the hand region must first be segmented from the picture, and segmentation of the hand region has always been a difficult point. Because of the projector's illumination, a person's arm may present different colors, and the projected picture itself may contain images of hands, both of which make arm segmentation difficult.
Summary of the invention
To overcome at least one of the defects (deficiencies) of the prior art described above, the present invention first provides an interactive projection method based on active vision with high recognition efficiency and accuracy.
A further object of the present invention is to propose an interactive projection system based on active vision.
To achieve these goals, the technical scheme of the present invention is as follows:
In an interactive projection method based on active vision, a projection device projects the image to be interacted with onto an arbitrary plane, a capture device captures the user's operations on the projected picture, and the captured information is transferred to a processing device for analysis, so that the corresponding user operation is obtained and human-computer interaction is realized.
The processing device analyzes the information as follows: a color space insensitive to illumination is selected, Bayesian estimation is used to extract the hand region, the hand region is tracked and the fingertip position is located, and ranging is then used to judge whether the finger touches the projection screen, so as to determine whether human-computer interaction takes place;
Bayesian estimation is used to extract the hand region in the following concrete manner:
The probabilities P(s), P(c), P(c|s) and P(s|c) are obtained by offline training, where P(s) is the prior probability of skin during training, P(c) is the prior probability of each color during training, P(c|s) is the prior probability that a skin pixel has the value c, and P(s|c) is the probability, obtained after training, that a pixel whose value is c is skin:
P(s|c) = P(c|s)·P(s)/P(c)    (1)
A hysteresis threshold Tmax, applied to P(s|c), is obtained during offline training; it is derived from the probability distribution graph, namely from the interval of the probability distribution;
An adaptive method is then used for skin detection to extract the hand region: the currently detected skin points are judged according to the skin regions identified in the most recent w frames. The prior probabilities over the most recent w frames are Pw(s), Pw(c) and Pw(c|s), where Pw(s) is the prior probability of skin in the most recent w frames, Pw(c) is the prior probability of each color in the most recent w frames, and Pw(c|s) is the prior probability that a skin pixel in the most recent w frames has the value c. The probability P'(s|c) that the current pixel is detected as skin is:
P'(s|c) = γ·P(s|c) + (1-γ)·Pw(s|c)    (2)
where Pw(s|c) is obtained by formula (1), i.e. Pw(s|c) = Pw(c|s)·Pw(s)/Pw(c), and γ is a control coefficient related to the training set of the training stage;
When P'(s|c) > Tmax, the current pixel is detected as belonging to a skin region, and the hand region is thereby extracted; otherwise the pixel does not belong to the skin region.
There are many existing hand-region extraction methods; the most common are based on hand color, on hand shape, or on color together with physical calibration. The color spaces currently used for skin detection mainly include RGB, normalized RGB, HSV, YCrCb, YUV and so on. In this method, a color space insensitive to illumination is used for skin detection, so that detection is highly robust. Given a color space, the simplest way to judge which colors constitute skin is to add a restrictive condition to the chosen space, namely the hysteresis threshold Tmax. This hysteresis threshold is drawn from experience: given a series of skin-region images, the distribution of skin regions can be obtained.
When extracting the hand region, this method uses offline Bayesian learning, and subsequently uses a non-Bayesian framework for tracking. Compared with existing algorithms, this method has the following advantages: (1) the distribution of skin color is obtained by offline learning, which greatly increases detection speed; (2) a skin-color model that changes in real time is adopted, i.e. the model depends only on the preceding few frames, so even without a complex skin model, and even under changing illumination, skin regions can be identified robustly and effectively; (3) the algorithm is highly efficient and can run in real time.
A system applying the described interactive projection method based on active vision comprises a projection device for projecting the image to be interacted with onto an arbitrary plane, a capture device for capturing the user's operations on the projected picture, and a processing device for analyzing the information.
The processing device analyzes the information as follows: a color space insensitive to illumination is selected, Bayesian estimation is used to extract the hand region, the hand region is tracked and the fingertip position is located, and ranging is then used to judge whether the finger touches the projection screen, so as to determine whether human-computer interaction takes place;
Bayesian estimation is used to extract the hand region in the following concrete manner:
The probabilities P(s), P(c), P(c|s) and P(s|c) are obtained by offline training, where P(s) is the prior probability of skin during training, P(c) is the prior probability of each color during training, P(c|s) is the prior probability that a skin pixel has the value c, and P(s|c) is the probability, obtained after training, that a pixel whose value is c is skin:
P(s|c) = P(c|s)·P(s)/P(c)    (1)
A hysteresis threshold Tmax, applied to P(s|c), is obtained during offline training;
An adaptive method is then used for skin detection to extract the hand region: the currently detected skin points are judged according to the skin regions identified in the most recent w frames. The prior probabilities over the most recent w frames are Pw(s), Pw(c) and Pw(c|s), where Pw(s) is the prior probability of skin in the most recent w frames, Pw(c) is the prior probability of each color in the most recent w frames, and Pw(c|s) is the prior probability that a skin pixel in the most recent w frames has the value c. The probability P'(s|c) that the current pixel is detected as skin is:
P'(s|c) = γ·P(s|c) + (1-γ)·Pw(s|c)    (2)
where Pw(s|c) is obtained by formula (1), i.e. Pw(s|c) = Pw(c|s)·Pw(s)/Pw(c), and γ is a control coefficient related to the training set of the training stage;
When P'(s|c) > Tmax, the current pixel is detected as belonging to a skin region, and the hand region is thereby extracted; otherwise the pixel does not belong to the skin region.
Compared with the prior art, the technical scheme of the present invention provides the following benefits: the present invention uses offline training, which improves detection efficiency; a new training mechanism is proposed that greatly reduces manual operation; and during detection a model updated in real time is adopted, in which the current skin-color model depends only on the preceding few frames, effectively removing the influence of illumination on skin color. Moreover, the algorithm proposed in this patent is highly efficient and fully meets real-time requirements.
Brief description of the drawings
Fig. 1 is a schematic diagram of the interactive projection system of the present invention.
Fig. 2 is a flow chart of the processing device of the present invention analyzing and processing information.
Fig. 3 is a schematic diagram of the triangulation measuring principle.
Detailed description of the invention
The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent.
In order to better describe the present embodiment, some parts of the drawings are omitted, enlarged or reduced, and do not represent the size of the actual product.
For those skilled in the art, some known structures in the drawings, and their explanations, may be omitted.
Fig. 1 is a schematic diagram of the projection system of the present invention, comprising a projector, two cameras and a computing device. The main function of the projector is to project the picture onto an arbitrary plane. The cameras capture the projected picture and transmit it to the computing device. The computing device analyzes the data collected by the cameras; the flow chart of this analysis is shown in Fig. 2.
The first step is to extract the human hand from the complex background, i.e. to extract the corresponding hand part from the whole image. This involves two problems: image segmentation and judging the hand region. Image segmentation is generally low-level feature extraction, mainly making use of the hand's geometric information, color information and motion information. The geometric information includes the shape and contour of the hand; the motion information refers to the hand's movement track. Accurate extraction of the hand region lays the foundation for locating the fingertip, and can generally be realized with methods such as grey-level histogram analysis, edge-detection operators and difference methods. In the present invention, in order to remove the influence of the projector's illumination, a method based on Bayesian estimation is used for arm extraction. After the hand's position is found, the hand region is tracked, and the fingertip must also be located accurately. There are many fingertip-localization methods, such as special markers, edge analysis and Hough circle transformation.
Finally, the distance from the camera to the finger must be calculated to judge whether the finger touches the projection screen. This system mainly uses triangulation, i.e. a three-dimensional reconstruction method based on binocular vision, to calculate this distance.
In the present embodiment, the method of Bayesian estimation is used for skin-color segmentation:
There are many existing detection methods for the hand region; the most common are based on hand color, on hand shape, or on color together with physical calibration. The biggest problem of current algorithms is sensitivity to illumination, followed by algorithmic efficiency. The color spaces currently used for skin detection mainly include RGB, normalized RGB, HSV, YCrCb, YUV and so on. When carrying out skin detection, a color space insensitive to illumination should be preferred, so that detection is relatively robust. Given a color space, the simplest way to judge which colors constitute skin is to add a restrictive condition to the chosen space, namely a hysteresis threshold. This threshold is drawn from experience: given a series of skin-region images, the distribution of skin regions can be obtained. The present embodiment uses offline Bayesian learning to detect skin regions, and subsequently uses a non-Bayesian framework for tracking. Compared with existing algorithms, the present invention has the following advantages: (1) the distribution of skin color is obtained by offline learning, which greatly increases detection speed; (2) a skin-color model that changes in real time is adopted, i.e. the model depends only on the preceding few frames, so even without a complex skin model, and even under changing illumination, skin regions can be identified robustly and effectively; (3) the algorithm is highly efficient and can run in real time.
1.1 Skin-color detection
Skin-color detection mainly includes the following parts: (a) estimating the probability that a given pixel belongs to skin; (b) obtaining the hysteresis threshold Tmax from the probability distribution graph. Skin detection adopts the Bayesian method and mainly comprises an iterative offline training process and an adaptive detection process.
A. Training and detection mechanism
First, a series of pictures including skin regions is given, and the skin parts must be selected manually in the pictures; the color space adopted is YUV 4:2:2. However, when the present embodiment actually trains and recognizes, the Y component is not used, mainly for the following two reasons: (1) the Y component is related to the brightness of the pixel, so removing it effectively reduces the influence of illumination on detection; (2) after removing the Y component, the dimensionality of the image is reduced relative to YUV, so the efficiency of the whole process is greatly improved.
Assume a pixel is I(x, y) and its value is c = c(x, y). The training process mainly calculates the following values: (1) the prior probability P(s) of skin; (2) the prior probability P(c) with which each color occurs in the training data; (3) the prior probability P(c|s) that a skin pixel has the value c. After training, the probability P(s|c) that a pixel whose value is c is skin is obtained by Bayes' rule:
P(s|c) = P(c|s)·P(s)/P(c)    (1)
The hysteresis threshold Tmax can thus be determined. In the concrete implementation, when the probability that a certain pixel belongs to skin is greater than Tmax, that point is judged to be skin.
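The training computation can be sketched as follows. This is a non-authoritative illustration: the patent does not specify the histogram binning, so full-resolution 256 x 256 (U, V) histograms are assumed, and the helper name is hypothetical.

```python
import numpy as np

def train_skin_priors(uv_images, skin_masks):
    """Offline training of P(s), P(c), P(c|s) and P(s|c) from labelled images.

    uv_images : list of HxWx2 uint8 arrays (U and V channels)
    skin_masks: list of HxW bool arrays, True where a pixel was marked as skin
    """
    hist_all = np.zeros((256, 256))   # occurrences of each color c
    hist_skin = np.zeros((256, 256))  # occurrences of each color c among skin pixels
    n_all = n_skin = 0
    for uv, mask in zip(uv_images, skin_masks):
        u, v = uv[..., 0].ravel(), uv[..., 1].ravel()
        np.add.at(hist_all, (u, v), 1)
        np.add.at(hist_skin, (uv[..., 0][mask], uv[..., 1][mask]), 1)
        n_all += u.size
        n_skin += int(mask.sum())
    p_s = n_skin / n_all                      # P(s): prior probability of skin
    p_c = hist_all / n_all                    # P(c): prior probability of each color
    p_c_given_s = hist_skin / max(n_skin, 1)  # P(c|s): color distribution of skin
    # Bayes' rule, formula (1): P(s|c) = P(c|s) * P(s) / P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        p_s_given_c = np.where(p_c > 0, p_c_given_s * p_s / p_c, 0.0)
    return p_s, p_c, p_c_given_s, p_s_given_c
```

The hysteresis threshold Tmax can then be read off the distribution of p_s_given_c over the labelled skin pixels, for example as a percentile, although the patent does not fix the exact rule.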
B. Simplified offline training
Training is mainly completed offline, so it does not affect online detection efficiency. However, obtaining sufficient training data is very time-consuming. To solve this problem, the present embodiment uses an adaptive training process, sketched below. First, training is performed with one small data set; then skin regions are identified in a large amount of data using the hysteresis threshold, and the prior probabilities P(s), P(c) and P(c|s) are updated in real time, so that the continually updated threshold separates the skin regions in the picture from the non-skin regions. If the classifier produces a wrong result, manual intervention is needed to correct the mistake, but the method can still complete most of the required work. If a more accurate result is wanted, more training data can be input; once the training result meets the demand, training can stop immediately.
C. Adaptive skin-color detection
Even with the UV model, some erroneous recognition results are still obtained when illumination changes continually. To solve this problem, the current skin points must be judged according to the skin regions identified in the previous frames. Therefore, this patent adopts two groups of prior probabilities: the offline training priors P(s), P(c) and P(c|s), and the priors Pw(s), Pw(c), Pw(c|s) of the most recent w frames. The priors of the most recent w frames reflect the current skin-color state and adapt better to the current illumination. Skin color is then defined by the following formula:
P'(s|c) = γ·P(s|c) + (1-γ)·Pw(s|c)    (2)
where P(s|c) and Pw(s|c) are both obtained from formula (1), using the priors of the whole training set and of the most recent w frames respectively. γ is a control coefficient related to the training set of the training stage. When P'(s|c) > Tmax, the current pixel is detected as belonging to a skin region, and the hand region is thereby extracted; otherwise the pixel does not belong to the skin region.
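Applied per frame, formula (2) reduces to two table lookups and a comparison. A minimal sketch, assuming the offline table P(s|c) and the recent-frame table Pw(s|c) are maintained as above and that γ and Tmax come from the training stage:

```python
import numpy as np

def detect_skin(uv_frame, p_sc_offline, p_sc_recent, gamma, t_max):
    """Adaptive skin detection, formula (2):
    P'(s|c) = gamma * P(s|c) + (1 - gamma) * Pw(s|c);
    a pixel is skin when P'(s|c) exceeds the hysteresis threshold Tmax.
    """
    u, v = uv_frame[..., 0], uv_frame[..., 1]
    p_blend = gamma * p_sc_offline[u, v] + (1.0 - gamma) * p_sc_recent[u, v]
    return p_blend > t_max  # boolean skin mask for the frame
```

Rebuilding p_sc_recent from the last w frames' detections keeps the model locked to the current illumination, which is the point of the two-prior design.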
1.2 Skin-color tracking
Since only the hand region is tracked, i.e. single-target tracking, the present embodiment adopts the CamShift algorithm, which is highly developed and widely applied in motion tracking and image segmentation. It performs the meanShift operation on every frame of the video, meanShift being a variable-step gradient ascent algorithm, and takes the result of the previous frame, i.e. the size and center of the search window, as the initial value of the meanShift search window in the next frame. Iterating in this way realizes tracking of the target. The algorithm proceeds as follows:
(1) initialize the search window; (2) calculate the color probability distribution of the search window; (3) run the meanShift algorithm to obtain the new size and location of the search window; (4) in the next frame of the video image, reinitialize the size and location of the search window with the values from step (3), then jump to step (2) and continue.
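These four steps map directly onto OpenCV's CamShift. The sketch below assumes the per-pixel skin probability P'(s|c) serves as the color probability distribution of step (2); variable names are illustrative:

```python
import cv2
import numpy as np

def track_hand(video, init_window, p_sc):
    """Track the hand with CamShift, seeding each frame's search window
    with the previous frame's result (steps (1)-(4) above).

    init_window: (x, y, w, h) from the initial skin detection
    p_sc       : 256x256 table of skin probabilities indexed by (U, V)
    """
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    window = init_window                                 # step (1)
    while True:
        ok, frame = video.read()
        if not ok:
            break
        yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
        prob = (p_sc[yuv[..., 1], yuv[..., 2]] * 255).astype(np.uint8)  # step (2)
        box, window = cv2.CamShift(prob, window, term)   # steps (3)-(4)
        yield box, window
```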
1.3 Finding the fingertip
After the arm is extracted, the next task is to find the fingertip's position in the picture. There are many methods for this, for example computing the approximate K-curvature of the contour and locating the fingertip where the K-curvature takes an extreme value. The present invention finds the fingertip by the following steps (see the sketch after this list):
1) find the largest contour and fill it, obtaining an arm foreground image free of noise;
2) calculate the convex hull of the contour;
3) calculate the centroid of the arm, and find several candidate points of maximum curvature on the convex hull;
4) take the candidate point farthest from the centroid as the fingertip.
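Steps 1) to 4) can be sketched with OpenCV as follows. For brevity the convex-hull vertices stand in for the maximum-curvature candidate points of step 3), an approximation rather than the patent's exact rule:

```python
import cv2
import numpy as np

def find_fingertip(skin_mask):
    """Fingertip = the hull point farthest from the arm centroid."""
    contours, _ = cv2.findContours(skin_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    arm = max(contours, key=cv2.contourArea)                  # step 1): largest contour
    foreground = np.zeros_like(skin_mask, dtype=np.uint8)
    cv2.drawContours(foreground, [arm], -1, 255, cv2.FILLED)  # noise-free arm image
    hull = cv2.convexHull(arm).reshape(-1, 2)                 # step 2): convex hull
    m = cv2.moments(arm)                                      # step 3): arm centroid
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    d = np.hypot(hull[:, 0] - cx, hull[:, 1] - cy)
    return tuple(hull[int(np.argmax(d))])                     # step 4): farthest point
```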
1.4 Ranging
After the fingertip is located, its distance to the camera must be obtained. According to the principles of stereo geometry, the projector and cameras must first be calibrated; an accurate and simple calibration process is the key to three-dimensional measurement with a projector-camera active vision system. Camera calibration methods are already very mature, the most commonly used being Zhang Zhengyou's method, which requires only a planar chessboard. As long as the correspondence between the three-dimensional coordinate points and the two-dimensional points of the projected image is found, the internal and external parameters of the projector can be solved.
In the present embodiment the projector is calibrated with the following steps:
S1. Calibrate the cameras;
S2. Prepare a whiteboard with a paper chessboard pasted on it;
S3. Control the projector to project a chessboard onto the whiteboard;
S4. Extract the corner points of the two chessboards respectively;
S5. Calculate the plane of the whiteboard from the corners of the paper chessboard;
S6. Use the calibrated cameras to calculate the three-dimensional coordinates of the corners of the projected chessboard;
S7. Combine the three-dimensional corner coordinates with the projector's original projected image to calculate the internal and external parameters of the projector.
Through the above steps, the internal and external parameters of the projector are obtained, and triangulation can then be used to calculate the distance from the fingertip to the cameras, as shown in Fig. 3. There, P is the observation point, Ol and Or are the two cameras, T is the distance (baseline) between the two cameras, Z is the perpendicular distance from the cameras to the observation point, f is the focal length of the cameras, and xl and xr are the horizontal coordinates of the observation point in the two images, measured relative to the horizontal coordinates of the respective image centers.
Using similar triangles, Z is easily derived, as shown in Fig. 3:

(T - (xl - xr)) / (Z - f) = T / Z  ⇒  Z = f·T/(xl - xr)    (4)
Using the principle of triangulation, the distance from the cameras to the projection screen can also be calculated easily, from which the distance of the finger from the screen is derived. If the distance of the finger from the screen is less than a certain specific threshold, a click event is considered to have occurred. With the fingertip's position on the screen and the earlier geometric calibration, the mouse cursor can be positioned at the fingertip and a mouse-click event simulated, realizing human-computer interaction and thus achieving the purpose of turning any projection plane into a touch screen.
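Formula (4) and the click test then reduce to a few lines. In this sketch the threshold value and its unit are assumptions, and xl, xr are the fingertip's horizontal image coordinates in the left and right views, measured relative to the image centers:

```python
def fingertip_depth(f: float, T: float, xl: float, xr: float) -> float:
    """Depth from stereo disparity, formula (4): Z = f * T / (xl - xr)."""
    disparity = xl - xr
    if disparity == 0:  # point at infinity; no finite depth
        return float("inf")
    return f * T / disparity

def is_click(z_fingertip: float, z_screen: float, threshold: float = 5.0) -> bool:
    """A click fires when the fingertip comes within `threshold` of the
    projection screen (the threshold value and units are illustrative)."""
    return abs(z_screen - z_fingertip) < threshold
```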
The present invention uses the method of Bayesian estimation to segment the arm, and integrates a motion tracking system into the interactive projection system, improving the efficiency of the algorithm. A new data-training mechanism is also proposed, and the skin-color model is updated in real time during skin detection, so that both the robustness and the efficiency of the algorithm are greatly improved. The invention has been verified by many tests, and the results show that the system is fast, stable and reliable.
Obviously, the above embodiment of the present invention is only an example for clearly demonstrating the invention, and is not a restriction on its embodiments. For those of ordinary skill in the field, changes of other forms can also be made on the basis of the above description. It is neither necessary nor possible to exhaustively list all the embodiments here. Any amendment, equivalent substitution and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (10)

1. An interactive projection method based on active vision, in which a projection device projects the image to be interacted with onto an arbitrary plane, a capture device captures the user's operations on the projected picture, and the captured information is transferred to a processing device for analysis, so that the corresponding user operation is obtained and human-computer interaction is realized,
characterized in that the processing device analyzes the information as follows: a color space insensitive to illumination is selected, Bayesian estimation is used to extract the hand region, the hand region is tracked and the fingertip position is located, and ranging is then used to judge whether the finger touches the projection screen, so as to determine whether human-computer interaction takes place;
Bayesian estimation is used to extract the hand region in the following concrete manner:
The probabilities P(s), P(c), P(c|s) and P(s|c) are obtained by offline training, where P(s) is the prior probability of skin during training, P(c) is the prior probability of each color during training, P(c|s) is the prior probability that a skin pixel has the value c, and P(s|c) is the probability, obtained after training, that a pixel whose value is c is skin:
P(s|c) = P(c|s)·P(s)/P(c)    (1)
A hysteresis threshold Tmax, applied to P(s|c), is obtained during offline training;
An adaptive method is then used for skin detection to extract the hand region: the currently detected skin points are judged according to the skin regions identified in the most recent w frames. The prior probabilities over the most recent w frames are Pw(s), Pw(c) and Pw(c|s), where Pw(s) is the prior probability of skin in the most recent w frames, Pw(c) is the prior probability of each color in the most recent w frames, and Pw(c|s) is the prior probability that a skin pixel in the most recent w frames has the value c. The probability P'(s|c) that the current pixel is detected as skin is:
P'(s|c) = γ·P(s|c) + (1-γ)·Pw(s|c)    (2)
where Pw(s|c) is obtained by formula (1), i.e. Pw(s|c) = Pw(c|s)·Pw(s)/Pw(c), and γ is a control coefficient related to the training set of the training stage;
When P'(s|c) > Tmax, the current pixel is detected as belonging to a skin region, and the hand region is thereby extracted; otherwise the pixel does not belong to the skin region.
2. The interactive projection method based on active vision according to claim 1, wherein the color space adopted is YUV 4:2:2.
3. The interactive projection method based on active vision according to claim 2, wherein the Y component of the YUV 4:2:2 color space is not used.
4. The interactive projection method based on active vision according to claim 1, wherein the concrete process of the offline training is: one small data set is selected for training; skin regions are then identified in a large amount of data using the hysteresis threshold, the prior probabilities P(s), P(c) and P(c|s) are updated in real time, and the continually updated hysteresis threshold separates the skin regions in the picture from the non-skin regions.
5. The interactive projection method based on active vision according to claim 1, wherein the tracking of the hand region is implemented with the CamShift algorithm: the result of the previous frame, i.e. the size and center of the search window, is taken as the initial value of the meanShift search window in the next frame; iterating in this way realizes tracking of the target.
6. The interactive projection method based on active vision according to claim 5, wherein the specific implementation of tracking the hand region is:
1) initialize the search window;
2) calculate the color probability distribution of the search window;
3) run the meanShift algorithm to obtain the size and location of the search window;
4) in the next frame of the video, reinitialize the size and location of the search window with the values from step 3), then jump to step 2) and continue.
7. The interactive projection method based on active vision according to claim 1, wherein the fingertip position is located as follows:
a) find the largest contour and fill it, obtaining an arm foreground image free of noise;
b) calculate the convex hull of the contour;
c) calculate the centroid of the arm, and find several candidate points of maximum curvature on the convex hull;
d) take the candidate point farthest from the centroid as the fingertip.
8. The interactive projection method based on active vision according to claim 1, wherein the ranging measures the distance from the fingertip to the camera in the following concrete manner: the projector and cameras are calibrated, the projector being calibrated as follows:
S1. Calibrate the cameras;
S2. Prepare a whiteboard with a paper chessboard pasted on it;
S3. Control the projector to project a chessboard onto the whiteboard;
S4. Extract the corner points of the two chessboards respectively;
S5. Calculate the plane of the whiteboard from the corners of the paper chessboard;
S6. Use the calibrated cameras to calculate the three-dimensional coordinates of the corners of the projected chessboard;
S7. Combine the three-dimensional corner coordinates with the projector's original projected image to calculate the internal and external parameters of the projector;
S8. Use triangulation to calculate the distance Z from the fingertip to the camera; when the distance Z is less than a certain specific threshold, a click event is considered to have occurred.
9. A system applying the interactive projection method based on active vision according to any one of claims 1 to 8, comprising a projection device for projecting the image to be interacted with onto an arbitrary plane, a capture device for capturing the user's operations on the projected picture, and a processing device for analyzing the information,
characterized in that the processing device analyzes the information as follows: a color space insensitive to illumination is selected, Bayesian estimation is used to extract the hand region, the hand region is tracked and the fingertip position is located, and ranging is then used to judge whether the finger touches the projection screen, so as to determine whether human-computer interaction takes place;
Bayesian estimation is used to extract the hand region in the following concrete manner:
The probabilities P(s), P(c), P(c|s) and P(s|c) are obtained by offline training, where P(s) is the prior probability of skin during training, P(c) is the prior probability of each color during training, P(c|s) is the prior probability that a skin pixel has the value c, and P(s|c) is the probability, obtained after training, that a pixel whose value is c is skin:
P(s|c) = P(c|s)·P(s)/P(c)    (1)
A hysteresis threshold Tmax, applied to P(s|c), is obtained during offline training;
An adaptive method is then used for skin detection to extract the hand region: the currently detected skin points are judged according to the skin regions identified in the most recent w frames. The prior probabilities over the most recent w frames are Pw(s), Pw(c) and Pw(c|s), where Pw(s) is the prior probability of skin in the most recent w frames, Pw(c) is the prior probability of each color in the most recent w frames, and Pw(c|s) is the prior probability that a skin pixel in the most recent w frames has the value c. The probability P'(s|c) that the current pixel is detected as skin is:
P'(s|c) = γ·P(s|c) + (1-γ)·Pw(s|c)    (2)
where Pw(s|c) is obtained by formula (1), i.e. Pw(s|c) = Pw(c|s)·Pw(s)/Pw(c), and γ is a control coefficient related to the training set of the training stage;
When P'(s|c) > Tmax, the current pixel is detected as belonging to a skin region, and the hand region is thereby extracted; otherwise the pixel does not belong to the skin region.
10. The system according to claim 9, characterized in that the capture device comprises two cameras.
CN201310724304.8A 2013-12-24 2013-12-24 Interactive projection method and system based on active vision Active CN103677274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310724304.8A CN103677274B (en) 2013-12-24 2013-12-24 Interactive projection method and system based on active vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310724304.8A CN103677274B (en) 2013-12-24 2013-12-24 Interactive projection method and system based on active vision

Publications (2)

Publication Number Publication Date
CN103677274A CN103677274A (en) 2014-03-26
CN103677274B true CN103677274B (en) 2016-08-24

Family

ID=50315082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310724304.8A Active CN103677274B (en) 2013-12-24 2013-12-24 Interactive projection method and system based on active vision

Country Status (1)

Country Link
CN (1) CN103677274B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978012B 2014-04-03 2018-03-16 华为技术有限公司 Pointing interaction method, apparatus and system
CN104202547B * 2014-08-27 2017-10-10 广东威创视讯科技股份有限公司 Method for extracting a target object from a projected picture, and projection interaction method and system thereof
CN104331158B * 2014-10-29 2018-05-25 山东大学 Gesture-controlled human-computer interaction method and device
CN104778460B * 2015-04-23 2018-05-04 福州大学 Monocular gesture recognition method under complex background and illumination
CN105371784A * 2015-12-24 2016-03-02 吉林大学 Machine-vision-based holographic human-machine interaction system for automotive inspection
CN105869166B * 2016-03-29 2018-07-10 北方工业大学 Human motion recognition method and system based on binocular vision
CN106204604B * 2016-04-29 2019-04-02 北京仁光科技有限公司 Projection touch display apparatus and interaction method thereof
CN107343184A * 2016-05-03 2017-11-10 中兴通讯股份有限公司 Projector processing method, device and terminal
CN106991417A * 2017-04-25 2017-07-28 华南理工大学 Visual projection interaction system and interaction method based on pattern recognition
JP6898021B2 (en) * 2018-03-07 2021-07-07 Necソリューションイノベータ株式会社 Operation input device, operation input method, and program
CN109190357B (en) * 2018-08-30 2021-08-06 袁精侠 Gesture verification code implementation method for man-machine verification by only utilizing cache resources
CN109410274B (en) * 2018-10-08 2022-03-15 武汉工程大学 Method for positioning typical non-cooperative target key points in real time under high frame frequency condition
CN109934105B (en) * 2019-01-30 2022-12-16 华南理工大学 Virtual elevator interaction system and method based on deep learning
CN109683719B (en) * 2019-01-30 2021-10-22 华南理工大学 Visual projection interaction method based on YOLOv3


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430760A (en) * 2008-11-18 2009-05-13 北方工业大学 Human face super-resolution processing method based on linear and Bayesian probability mixed model
CN102298694A (en) * 2011-06-21 2011-12-28 广东爱科数字科技有限公司 Man-machine interaction identification system applied to remote information service

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Bayesian skin/non-skin color classifier using non-parametric density estimation; Chai D, et al.; Proceedings of the 2003 International Symposium on …; 20030628; full text *
A deformable gesture tracking method under skin-color interference; Liu Yujin et al.; Computer Engineering and Applications; 20091211 (No. 35); full text *
Research on moving object detection technology based on machine vision; Wang Xingjie; China Masters' Theses Full-text Database; 20130715; full text *

Also Published As

Publication number Publication date
CN103677274A (en) 2014-03-26

Similar Documents

Publication Publication Date Title
CN103677274B (en) Interactive projection method and system based on active vision
US8401248B1 (en) Method and system for measuring emotional and attentional response to dynamic digital media content
CN103941866B (en) Three-dimensional gesture recognizing method based on Kinect depth image
CN104639840B (en) Image processing apparatus and image processing method
CN108446585A (en) Method for tracking target, device, computer equipment and storage medium
CN109684925B (en) Depth image-based human face living body detection method and device
CN111539273A (en) Traffic video background modeling method and system
US20120287122A1 (en) Virtual apparel fitting system and method
Niu et al. Tactic analysis based on real-world ball trajectory in soccer video
CN102831382A (en) Face tracking apparatus and method
Tan et al. Dynamic hand gesture recognition using motion trajectories and key frames
KR20110013200A (en) Identifying method of human attitude and apparatus of the same
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN103598870A (en) Optometry method based on depth-image gesture recognition
CN102831439A (en) Gesture tracking method and gesture tracking system
CN104202547A (en) Method for extracting target object in projection picture, projection interaction method and system thereof
CN109145803A (en) Gesture identification method and device, electronic equipment, computer readable storage medium
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN111161313A (en) Multi-target tracking method and device in video stream
CN109934127A (en) Pedestrian's recognition and tracking method based on video image and wireless signal
CN111400423B (en) Smart city CIM three-dimensional vehicle pose modeling system based on multi-view geometry
CN113435336A (en) Running intelligent timing system and method based on artificial intelligence
EP4136570A1 (en) Artificial intelligence and computer vision powered driving-performance assessment
CN110517285B (en) Large-scene minimum target tracking based on motion estimation ME-CNN network
CN103761011A (en) Method, system and computing device of virtual touch screen

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 233, Kezhu Road, High-tech Industrial Development Zone, Guangzhou, Guangdong Province, 510670

Patentee after: Vtron Group Co., Ltd.

Address before: No. 6, Color Road, High-tech Industrial Development Zone, Guangzhou, Guangdong Province, 510663

Patentee before: Vtron Technologies Ltd. (Guangdong Weichuangshixun Science and Technology Co., Ltd.)
