Human-computer interaction method based on visual tracking and gesture recognition
Technical field
The invention belongs to the technical field of artificial intelligence, and more particularly relates to a human-computer interaction method based on visual tracking and gesture recognition.
Background art
Advances in technology have brought interaction between people and computers ever closer to a natural mode of communication, the "natural interaction" that people vigorously advocate. Touch technology, as a convenient mode of human-computer interaction, has been adopted in numerous fields: besides portable personal digital products, it is widely used in information appliances, public information systems, electronic games, office automation equipment and industrial equipment. With touch technology, a user needs only to touch the text or icons on a screen with a gesture to interact with a computer, making the interaction between people and machines more intuitive and convenient.
However, existing touch technology requires a person to physically contact the screen in order to complete the interaction. Such contact-based touch technology cannot achieve natural interaction with the screen when the user is away from it; the screen must instead be controlled through devices such as remote controllers, which cannot provide a good human-computer interaction experience. Touch technology therefore has limitations in artificial intelligence applications. Visual tracking technology uses changes of gaze in place of hand motion on the touch screen, so that the user can still locate an arbitrary region of the screen from a distance. Operating a touch screen through eye movement eliminates many steps and accelerates the development and realization of human-centered intelligent interaction. At present, however, this technology is limited to applications such as eye trackers and face recognition, and has not yet been applied in the field of touch technology.
Summary of the invention
In view of the above defects or improvement requirements of the prior art, the invention provides a human-computer interaction method based on visual tracking and gesture recognition. Its objective is to achieve visual tracking on any screen with display characteristics, such as a computer liquid crystal display, an ordinary liquid crystal screen, a projector screen or a giant display, and to realize a human-computer interaction mode in which the screen is controlled without contact.
To achieve the above objective, according to one aspect of the present invention, there is provided a human-computer interaction method based on visual tracking and gesture recognition, comprising the following steps:
(1) mounting an infrared light source, a zoom high-definition camera for visual tracking, and a plurality of high-definition cameras for gesture recognition on the screen frame;
(2) acquiring a facial image with the zoom high-definition camera, and performing face contour extraction on the acquired facial image;
(3) calculating the pixel coordinates (ueL, veL) and (ueR, veR) of the centers of the left and right pupils in the facial contour obtained in step (2);
(4) calculating the projection matrices Mel and Mer of the left and right pupils according to the pixel coordinates of the left and right pupil centers in the facial contour and the coordinates of the four corners of the screen;
(5) calculating the physical coordinates of the left and right pupils on the screen from the projection matrices Mel and Mer obtained in step (4) and the center pixel coordinates of the left and right pupils, wherein (Xer, Yer) represents the physical coordinates of the right pupil on the screen and (Xel, Yel) represents the physical coordinates of the left pupil on the screen; the region corresponding to these physical coordinates is the region in which the user performs gesture operations;
(6) performing parameter calibration, according to the principle of binocular vision, on the screen on which the high-definition cameras are placed, so as to obtain the projection matrices Ml and Mr of the left and right high-definition cameras respectively;
(7) acquiring images of the user's gesture touching the screen with the high-definition cameras, and preprocessing the acquired images to obtain the imaging coordinates (u1F, v1F) of the user's gesture on the left high-definition camera and the imaging coordinates (u2F, v2F) on the right high-definition camera;
(8) obtaining the three-dimensional spatial coordinates (xf, yf, zf) of the user's gesture on the screen by the projection equations, according to the imaging coordinates (u1F, v1F) of the gesture on the left high-definition camera, the imaging coordinates (u2F, v2F) on the right high-definition camera, the projection matrix Ml of the left high-definition camera and the projection matrix Mr of the right high-definition camera, wherein the gesture operation is performed in the region corresponding to the physical coordinates obtained in step (5);
(9) judging whether the coordinate zf obtained in step (8) is less than a threshold γ; if zf is less than γ, it is determined that the user's gesture constitutes a click action, and the three-dimensional spatial coordinates (xf, yf, zf) of the fingertip are output through a USB interface; otherwise the process ends.
Preferably, step (2) comprises the following sub-steps:
(2-1) acquiring a facial image with the zoom high-definition camera, and denoising the acquired facial image with a mask method;
(2-2) applying the Sobel operator to perform a gradient transform on the pixels of the facial image, so as to obtain the facial contour.
Preferably, step (3) specifically comprises: obtaining, with the Sobel operator, the left and right pixel coordinates uLeL and uHeL and the upper and lower pixel coordinates vLeL and vHeL of the left pupil in the facial contour obtained in step (2); the center pixel coordinates (ueL, veL) of the left pupil are then ((uLeL+uHeL)/2, (vLeL+vHeL)/2), and the center pixel coordinates (ueR, veR) of the right pupil are ((uLeR+uHeR)/2, (vLeR+vHeR)/2), wherein uLeR and uHeR are the left and right pixel coordinates of the right pupil, and vLeR and vHeR are the upper and lower pixel coordinates of the right pupil.
Preferably, step (6) specifically comprises: calibrating the screen with Zhang Zhengyou's calibration method, so as to obtain the pixel coordinates (u1m, v1m) and (u2m, v2m) of each calibration marker on the left and right high-definition cameras, wherein m is the index of the calibration point and (xm, ym, zm) are the physical coordinates of the circular calibration points, and obtaining the projection matrix Ml of the left high-definition camera and the projection matrix Mr of the right high-definition camera respectively from the corresponding projection equations.
Preferably, step (7) specifically comprises the following sub-steps:
(7-1) acquiring images of the user's gesture touching the screen with the left and right high-definition cameras respectively, and subtracting the pixels at corresponding points of the acquired image and an initialization frame, so as to form a new image;
(7-2) denoising the new image obtained in step (7-1);
(7-3) applying the Sobel operator to perform a gradient transform on the pixels of the image, so as to obtain an edge detection map;
(7-4) performing K-curvature discrimination on the pixels of the left and right high-definition cameras according to the edge detection map obtained in step (7-3), so as to obtain the imaging coordinates (u1F, v1F) and (u2F, v2F) of the user's gesture on the left and right high-definition cameras.
According to another aspect of the present invention, there is provided a human-computer interaction method based on visual tracking and gesture recognition, comprising the following steps:
(1) mounting an infrared light source, a zoom high-definition camera for visual tracking, and a plurality of high-definition cameras for gesture recognition on the screen frame;
(2) acquiring a facial image with the zoom high-definition camera, and performing face contour extraction on the acquired facial image;
(3) calculating the pixel coordinates (ueL, veL) and (ueR, veR) of the centers of the left and right pupils in the facial contour obtained in step (2);
(4) calculating the projection matrices Mel and Mer of the left and right pupils according to the pixel coordinates of the left and right pupil centers in the facial contour and the coordinates of the four corners of the screen;
(5) calculating the physical coordinates of the left and right pupils on the screen from the projection matrices Mel and Mer obtained in step (4) and the center pixel coordinates of the left and right pupils, wherein (Xer, Yer) represents the physical coordinates of the right pupil on the screen and (Xel, Yel) represents the physical coordinates of the left pupil on the screen; the region corresponding to these physical coordinates is the region in which the user performs gesture operations;
(6) performing parameter calibration, according to the principle of binocular vision, on the screen on which the high-definition cameras are placed, so as to obtain the projection matrices Ml and Mr of the left and right high-definition cameras respectively;
(7) acquiring images of the user's gesture touching the screen with the high-definition cameras, and preprocessing the acquired images to obtain the imaging coordinates (u1F, v1F) of the user's gesture on the left high-definition camera and the imaging coordinates (u2F, v2F) on the right high-definition camera;
(8) when the user slides on the screen, obtaining the three-dimensional spatial coordinates (xf1, yf1, zf1) of the fingertip in the first frame on the screen by the projection equations, according to the imaging coordinates (u1F, v1F) of the user's gesture on the left high-definition camera, the imaging coordinates (u2F, v2F) on the right high-definition camera, the projection matrix Ml of the left high-definition camera and the projection matrix Mr of the right high-definition camera, wherein the gesture operation is performed in the region corresponding to the physical coordinates obtained in step (5);
(9) repeating step (8) to obtain the three-dimensional spatial coordinates (xf2, yf2, zf2), ..., (xfD, yfD, zfD) of the fingertip in the subsequent D-1 frames, wherein D represents the number of fingertip image frames acquired while the user slides on the screen, thereby obtaining the sliding trajectory of the gesture on the screen; the trajectory is output through a USB interface.
In general, compared with the prior art, the above technical scheme conceived by the present invention can achieve the following beneficial effects:
(1) the present invention achieves visual tracking positioning and contactless touch on any screen, including a liquid crystal display, a projector screen or other screens;
(2) the present invention is simple to use, accurate in positioning, and easy to install.
Brief description of the drawings
Fig. 1 is a flow chart of the human-computer interaction method based on visual tracking and gesture recognition of the present invention.
Fig. 2 is a schematic diagram of face contour detection in the present invention.
Fig. 3 is a schematic diagram of visual tracking in the present invention.
Fig. 4 is an outline drawing of the device used for gesture recognition in the present invention.
Fig. 5 is a front view of the present invention.
Fig. 6 is a side view of the screen of the present invention.
Fig. 7 is a schematic diagram of the calibration marker of the present invention.
Fig. 8 is a schematic diagram of a gesture touch click in the present invention.
Fig. 9 is a schematic diagram of a gesture slide in the present invention.
Detailed description of the invention
In order to make the objectives, technical schemes and advantages of the present invention clearer, the present invention is further described below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. Furthermore, the technical features involved in the embodiments of the invention described below can be combined with each other as long as they do not conflict.
As shown in Fig. 1, the human-computer interaction method based on visual tracking and gesture recognition of the present invention comprises the following steps:
(1) mounting an infrared light source, a zoom high-definition camera for visual tracking, and a plurality of high-definition cameras for gesture recognition at arbitrary positions on the screen frame. In the present embodiment, the zoom high-definition camera for visual tracking has 10x zoom, a resolution of 720P, a frame rate of 60 frames per second and a lens angle of 110°; the infrared light source emits infrared light with a wavelength of 800 nm to 1200 nm; the high-definition cameras for gesture recognition have a frame rate of 60 frames per second, a resolution of 720P and a lens angle of 110°. The screen may be of any size or form, including a liquid crystal display, a projector screen or other screens. The cameras are placed at arbitrary positions on the left and right of the screen frame. As shown in Figs. 4-6, by way of example, the infrared light source, the zoom high-definition camera and the high-definition cameras are installed at the center of the upper frame of the screen. In the present embodiment two high-definition cameras and one zoom high-definition camera are used, but it should be understood that the number of cameras of the present invention is not limited thereto. The infrared light source serves as an auxiliary light source.
(2) acquiring a facial image with the zoom high-definition camera, and performing face contour extraction on the acquired facial image. As shown in Fig. 2, this step comprises the following sub-steps:
(2-1) acquiring a facial image with the zoom high-definition camera, and denoising the acquired facial image with a mask method. Specifically, a 3*3 mask with weights w1 to w9 is first established. Assuming that the pixel at a certain point of the acquired facial image is aj,k, where j and k denote the position of the point in the image, then a'j,k = aj-1,k-1w1 + aj-1,kw2 + ... + aj,kw5 + ... + aj+1,kw8 + aj+1,k+1w9, whereby the new denoised pixel value a'j,k is obtained.
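The mask method of sub-step (2-1) is a 3*3 weighted-average filter. The following Python sketch illustrates it; the uniform 1/9 weights are an assumption for illustration, since the embodiment's exact values of w1 to w9 are not reproduced in this text:

```python
def denoise_3x3(img, w):
    """Apply a 3*3 mask w (row-major list of nine weights w1..w9) to img.

    img is a list of rows of gray values; border pixels are copied
    unchanged, the simplest border handling for the mask method.
    """
    h, wd = len(img), len(img[0])
    out = [row[:] for row in img]
    for j in range(1, h - 1):
        for k in range(1, wd - 1):
            out[j][k] = sum(img[j + dj][k + dk] * w[(dj + 1) * 3 + dk + 1]
                            for dj in (-1, 0, 1) for dk in (-1, 0, 1))
    return out

# Uniform averaging mask -- an assumption for illustration; the exact
# weights w1..w9 used in the embodiment are not given in this text.
w_avg = [1 / 9.0] * 9
```

For a smoothing mask the nine weights should sum to 1, so that flat regions of the facial image are left unchanged.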
(2-2) performing edge detection on the denoised facial image, namely applying the Sobel operator to perform a gradient transform on the pixels of the facial image so as to obtain the facial contour. Specifically, the Sobel operators are set as the transverse gradient operator Sh = [-1 0 1; -2 0 2; -1 0 1] and the longitudinal gradient operator Sv = [-1 -2 -1; 0 0 0; 1 2 1]; the facial image is convolved with Sh and Sv respectively to obtain the gradient maps of the facial image in the transverse and longitudinal directions;
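Sub-step (2-2) uses the standard Sobel kernels. A minimal Python sketch of the gradient transform, not the patented implementation itself:

```python
SH = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # transverse gradient operator Sh
SV = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # longitudinal gradient operator Sv

def sobel(img, kernel):
    """Gradient transform: correlate img with a 3*3 Sobel kernel.

    Returns the gradient map for interior pixels; the one-pixel
    border is left at zero.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for j in range(1, h - 1):
        for k in range(1, w - 1):
            out[j][k] = sum(img[j + dj][k + dk] * kernel[dj + 1][dk + 1]
                            for dj in (-1, 0, 1) for dk in (-1, 0, 1))
    return out
```

Combining the transverse and longitudinal gradient magnitudes (for example |Gh| + |Gv|) and thresholding yields the contour pixels.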
(3) calculating the pixel coordinates of the centers of the left and right pupils in the facial contour obtained in step (2). As shown in Fig. 3, this step specifically comprises: applying the Sobel operator of sub-step (2-2) again to the facial contour obtained in step (2), the left and right pixel coordinates of the left pupil are obtained as uLeL and uHeL, and its upper and lower pixel coordinates as vLeL and vHeL; the center pixel coordinates (ueL, veL) of the left pupil are therefore ((uLeL+uHeL)/2, (vLeL+vHeL)/2). Similarly, the center pixel coordinates (ueR, veR) of the right pupil are ((uLeR+uHeR)/2, (vLeR+vHeR)/2), wherein uLeR and uHeR are the left and right pixel coordinates of the right pupil, and vLeR and vHeR are the upper and lower pixel coordinates of the right pupil.
(4) calculating the projection matrices Mel and Mer of the left and right pupils according to the pixel coordinates of the left and right pupil centers in the facial contour and the coordinates of the four corners of the screen, as shown in Fig. 4. Specifically: first, when the user gazes at the upper left corner of the screen (with coordinates (xA, yA, 0)), the pixel coordinates of the left and right pupil centers on the high-definition camera, obtained by step (3), are (u1eL, v1eL) and (u1eR, v1eR); similarly, when gazing at the upper right corner (with coordinates (xB, yB, 0)), the pixel coordinates of the left and right pupil centers are (u2eL, v2eL) and (u2eR, v2eR); when gazing at the lower left corner (with coordinates (xC, yC, 0)), they are (u3eL, v3eL) and (u3eR, v3eR); and when gazing at the lower right corner (with coordinates (xD, yD, 0)), they are (u4eL, v4eL) and (u4eR, v4eR);
Then, according to the principle of binocular vision, each fixation satisfies the projection equation s·(ueL, veL, 1)T = Mel·(x, y, 1)T, where s is a scale factor. Substituting the coordinates of the four screen corners, namely the upper left corner (xA, yA), the upper right corner (xB, yB), the lower left corner (xC, yC) and the lower right corner (xD, yD), into the right-hand side of this equation, and the corresponding pixel coordinates of the left pupil, namely (u1eL, v1eL), (u2eL, v2eL), (u3eL, v3eL) and (u4eL, v4eL), into the left-hand side, the simultaneous equations are solved to calculate the projection matrix Mel of the left pupil. The projection matrix Mer of the right pupil is obtained in the same manner.
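Step (4) amounts to fitting a 3*3 planar homography from four correspondences between screen corners and pupil-center pixels. Below is a sketch of that solve in plain Python, under the assumption that the matrix is normalized so its last entry is 1 (the text leaves the parametrization implicit); the solver and the point values in the example are illustrative only:

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def homography(screen_pts, pixel_pts):
    """Estimate M (3*3, M[2][2] = 1) with s*(u, v, 1)^T = M*(x, y, 1)^T
    from four (x, y) screen corners and the matching (u, v) pupil pixels."""
    A, b = [], []
    for (x, y), (u, v) in zip(screen_pts, pixel_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]
```

Four non-collinear correspondences give exactly the eight equations needed for the eight unknown entries, which is why gazing at the four screen corners suffices.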
(5) calculating the physical coordinates of the left and right pupils on the screen from the projection matrices Mel and Mer obtained in step (4) and the center pixel coordinates of the left and right pupils; the region corresponding to these physical coordinates is the region in which the user performs gesture operations. Specifically, the physical coordinates of the left and right pupils on the screen are calculated by the above principle of binocular vision, wherein (Xer, Yer) represents the physical coordinates of the right pupil on the screen and (Xel, Yel) represents those of the left pupil. When the gaze falls on different regions of the screen, the dashed boxes shown in Fig. 3 are displayed, completing the positioning and tracking of the gaze on the screen. The region corresponding to the physical coordinates obtained in this step is exactly the operating region of the user's gesture in subsequent step (8).
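Once the pupil projection matrix is known, the mapping of step (5) is a perspective division. A sketch, taking the matrix in the pixel-to-screen direction (if it was fitted in the opposite direction, its inverse would be applied); the example matrix is hypothetical:

```python
def pixel_to_screen(M, u, v):
    """Map a pupil-center pixel (u, v) through a 3*3 projection matrix M
    (normalized so M[2][2] = 1) to physical screen coordinates (X, Y)."""
    w = M[2][0] * u + M[2][1] * v + M[2][2]
    X = (M[0][0] * u + M[0][1] * v + M[0][2]) / w
    Y = (M[1][0] * u + M[1][1] * v + M[1][2]) / w
    return X, Y

# Hypothetical matrix: a pure scaling from pixel units to screen units.
Mel = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
Xel, Yel = pixel_to_screen(Mel, 640, 360)  # gaze lands at (320.0, 180.0)
```

Averaging the left- and right-pupil results (Xel, Yel) and (Xer, Yer) is one plausible way to obtain a single gaze point for the operating region.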
(6) performing parameter calibration, according to the principle of binocular vision, on the screen on which the high-definition cameras are placed, so as to obtain the projection matrices Ml and Mr of the left and right high-definition cameras respectively. Specifically, using the calibration marker shown in Fig. 7, the screen is calibrated with Zhang Zhengyou's calibration method to obtain the pixel coordinates (u1m, v1m) and (u2m, v2m) of each calibration point on the left and right high-definition cameras, wherein m is the index of the calibration point (9 points are used, as shown in Fig. 7) and (xm, ym, zm) are the physical coordinates of the circular calibration points shown in Fig. 7. The projection matrix Ml of the left high-definition camera and the projection matrix Mr of the right high-definition camera are then obtained respectively from the corresponding projection equations, yielding the final projection matrices Ml and Mr.
(7) acquiring images of the user's gesture touching the screen with the high-definition cameras, and preprocessing the acquired images, including image subtraction, image denoising, edge extraction and fingertip or pen-tip image recognition based on K-curvature discrimination, so as to obtain the imaging coordinates (u1F, v1F) of the user's gesture on the left high-definition camera and the imaging coordinates (u2F, v2F) on the right high-definition camera. As shown in Fig. 8, this step specifically comprises the following sub-steps:
(7-1) acquiring images of the user's gesture touching the screen with the left and right high-definition cameras respectively, and subtracting the pixels at corresponding points of the acquired image and an initialization frame, so as to form a new image;
(7-2) denoising the new image obtained in step (7-1); the denoising process is identical to that of sub-step (2-1) and is not repeated here;
(7-3) performing edge detection on the denoised image, namely applying the Sobel operator to perform a gradient transform on the pixels of the image so as to obtain an edge detection map; the edge detection process is identical to that of sub-step (2-2) and is not repeated here;
(7-4) performing K-curvature discrimination on the pixels of the left and right high-definition cameras according to the edge detection map obtained in step (7-3), so as to obtain the imaging coordinates of the user's gesture on the left and right high-definition cameras. Specifically, the edge image of the gesture is extracted from the edge detection map obtained in (7-3), and each edge point is examined in turn. For an edge point P, the point K steps along the edge in the clockwise direction is denoted P1, and the point K steps in the counterclockwise direction is denoted P2; the K-curvature value α is computed from the two vectors from P to P1 and from P to P2. When α is greater than 0 and greater than a set threshold β (whose value ranges between 0.5 and 1), the pixel coordinates of the current point are the imaging coordinates (u1F, v1F) of the user's gesture on the left high-definition camera. The same process applied to the right camera yields the pixel coordinates (u2F, v2F) of the user's gesture on the right high-definition camera;
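The K-curvature discrimination of sub-step (7-4) can be sketched as follows. Because the formula for α is garbled in this text, the common form is assumed here: the cosine of the angle between the two K-step vectors, which is near 1 at a sharp fingertip and matches the stated 0.5-to-1 range of β:

```python
import math

def k_curvature_tips(contour, K=2, beta=0.5):
    """Flag edge points whose K-curvature alpha exceeds the threshold beta.

    contour: ordered, closed list of (u, v) edge points.
    For each point P, P1 lies K steps clockwise along the edge and P2
    lies K steps counterclockwise; alpha is taken as the cosine of the
    angle between the vectors P->P1 and P->P2 (an assumed form).  On a
    straight edge the two vectors oppose each other (alpha near -1);
    at a sharp fingertip they nearly coincide (alpha near +1).
    """
    n = len(contour)
    tips = []
    for i in range(n):
        pu, pv = contour[i]
        au, av = contour[(i + K) % n]
        bu, bv = contour[(i - K) % n]
        v1 = (au - pu, av - pv)
        v2 = (bu - pu, bv - pv)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue
        alpha = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        if alpha > 0 and alpha > beta:
            tips.append((pu, pv))
    return tips
```

Taking the flagged point with the largest α (or the flagged point closest to the screen) then gives the single fingertip pixel (u1F, v1F).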
(8) obtaining the three-dimensional spatial coordinates (xf, yf, zf) of the user's gesture on the screen by the corresponding projection equations, according to the imaging coordinates (u1F, v1F) of the gesture on the left high-definition camera, the imaging coordinates (u2F, v2F) on the right high-definition camera, the projection matrix Ml of the left high-definition camera and the projection matrix Mr of the right high-definition camera, wherein the gesture operation is performed in the region corresponding to the physical coordinates obtained in step (5). Solving the two matrix equations yields the three-dimensional spatial coordinates (xf, yf, zf) of the gesture, thereby completing the three-dimensional imaging and positioning of the user's gesture.
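Step (8) is a linear two-view triangulation: each camera's 3*4 projection matrix together with the fingertip's image coordinates gives two equations in (xf, yf, zf), and the four equations from the two views are solved in least squares. A sketch with hypothetical projection matrices:

```python
def triangulate(Ml, Mr, uv_l, uv_r):
    """Linear least-squares triangulation of a fingertip from two views.

    Ml, Mr: 3*4 projection matrices of the left/right cameras.
    uv_l, uv_r: the fingertip's imaging coordinates in each view.
    """
    A, b = [], []
    for P, (u, v) in ((Ml, uv_l), (Mr, uv_r)):
        for row, s in ((0, u), (1, v)):
            A.append([P[row][c] - s * P[2][c] for c in range(3)])
            b.append(s * P[2][3] - P[row][3])
    # Normal equations (A^T A) x = A^T b, solved by 3x3 elimination.
    AtA = [[sum(A[r][i] * A[r][j] for r in range(4)) for j in range(3)]
           for i in range(3)]
    Atb = [sum(A[r][i] * b[r] for r in range(4)) for i in range(3)]
    M = [AtA[i] + [Atb[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return tuple(x)

# Hypothetical rectified cameras: identical, with a unit baseline in x.
Ml = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
Mr = [[1.0, 0, 0, -1.0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
xyz = triangulate(Ml, Mr, (0.25, 0.1), (-0.25, 0.1))  # ~ (0.5, 0.2, 2.0)
```

With the real matrices from step (6), the same solve recovers the fingertip's position and its distance zf from the screen plane.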
It should be noted that in this step the user's gesture operation touches the screen in a click manner.
(9) judging whether the coordinate zf obtained in step (8) is less than a threshold γ, wherein the value of γ is proportional to the length of the screen; if zf is less than γ, it is determined that the user's gesture constitutes a click action, and the three-dimensional spatial coordinates (xf, yf, zf) of the fingertip are output through a USB interface; otherwise the process ends.
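The click decision of step (9) then reduces to a single depth comparison; the proportionality factor between γ and the screen length used below is a hypothetical example, not a value given in the text:

```python
def detect_click(fingertip, gamma):
    """Step (9): report a click when the fingertip depth z_f relative to
    the screen plane falls below the threshold gamma."""
    xf, yf, zf = fingertip
    return zf < gamma

# gamma scales with the screen length; the 1% factor is hypothetical.
gamma = 0.01 * 500.0  # e.g. a 500 mm screen -> gamma = 5 mm
```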
As shown in Fig. 9, when the user touches the screen in a sliding manner, the steps of the human-computer interaction method based on visual tracking and gesture recognition of the present invention are essentially identical to those of the click mode described above, the only difference being that step (9) above is replaced by:
obtaining the three-dimensional spatial coordinates (xf1, yf1, zf1), (xf2, yf2, zf2), ..., (xfD, yfD, zfD) of the fingertip over D consecutive frames, wherein D, a positive integer, represents the number of fingertip image frames acquired while the user slides on the screen, thereby obtaining the sliding trajectory of the gesture on the screen; the trajectory is output through a USB interface, thus realizing the recognition of the sliding gesture.
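The slide-mode variant accumulates the fingertip coordinates frame by frame. A sketch, with the restriction to the gaze region of step (5) made explicit as an assumption:

```python
def collect_trajectory(frames, region):
    """Accumulate the fingertip coordinates of D consecutive frames into
    the on-screen sliding trajectory.

    frames: list of (x, y, z) fingertip coordinates, one per frame.
    region: (xmin, ymin, xmax, ymax) -- the gaze region from step (5);
    restricting the trajectory to it is an assumption made explicit
    here, since the gesture is stated to occur inside that region.
    """
    xmin, ymin, xmax, ymax = region
    return [(x, y, z) for (x, y, z) in frames
            if xmin <= x <= xmax and ymin <= y <= ymax]
```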
Those skilled in the art will readily understand that the foregoing is only the preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.