CN103336575A

CN103336575A - Man-machine interaction intelligent glasses system and interaction method

Info

Publication number: CN103336575A
Application number: CN2013102634399A
Authority: CN
Inventors: 费树培; 谢耀钦
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Shen Tech Advanced Cci Capital Ltd; Suzhou Zhongke Advanced Technology Research Institute Co Ltd
Priority date: 2013-06-27
Filing date: 2013-06-27
Publication date: 2013-10-02
Anticipated expiration: 2033-06-27
Also published as: CN103336575B

Abstract

The invention discloses a human-machine interaction smart glasses system. The system comprises a pair of see-through smart glasses whose lenses transmit visible light while superimposing image information on the wearer's real field of view. Two cameras and an infrared LED (light-emitting diode) are mounted on the smart glasses. The two cameras are mounted in parallel, symmetrically at the left and right ends of the glasses, and the infrared LED is mounted at the center. Together, the cameras and the infrared LED form a three-dimensional motion capture system that captures the movement trajectories and coordinates of objects within a certain three-dimensional range. By capturing and interpreting the wearer's finger movements, the system enables human-machine interaction with the smart glasses.

Description

Human-machine interaction smart glasses system and interaction method

Technical field

The present invention relates to the field of artificial intelligence, and in particular to a human-machine interaction smart glasses system and interaction method.

Background technology

Human-machine interaction has long been one of the key problems that electronic devices must solve. A device can properly meet its users' needs only if it responds well to their control signals.

Smart glasses is the general term for glasses that, like a smartphone, have an independent operating system on which the user can install software, games and other programs from software vendors. They can be controlled by voice or gestures to add calendar entries, navigate with maps, interact with friends, take photos and videos, and make video calls, and they can access wireless networks through a mobile communication network.

Google Glass is a pair of "augmented reality" glasses unveiled by Google in April 2012. It offers functions similar to a smartphone: under voice control it can take photos, make video calls, give directions, browse the web, and handle text messages and e-mail. At present, however, interaction with smart glasses remains largely limited to voice, and no better interaction mode yet lets smart glasses respond to a person's command signals.

Although speech recognition can, within limits, let smart glasses respond to a person's control signals, it leaves the following problems unsolved:

Most current human-machine interaction relies on body movements to send commands to a computer: controlling games with the body, sending signals with a mouse, tapping a touch screen with a finger, and so on. Because its input is limited, voice interaction serves only as a supplement to these techniques and cannot fully solve the problem of human-computer interaction.

Speech recognition can only send single commands to a computer intermittently, by voice. It cannot support command interaction as fast and precise as a mouse, so it cannot meet the requirement for fast, accurate control input.

In existing electronic devices, speech recognition has not become the main means of human-machine interaction and remains a supplement to other techniques. For the same reason, it cannot satisfy the interaction needs of smart glasses and can serve only as an auxiliary way of interacting with them.

Prolonged use of speech recognition also tires the user: if long inputs must all be spoken, anyone will feel fatigued after a while. For this reason as well, speech recognition cannot serve as the main interaction method for smart glasses.

Moreover, when see-through smart glasses present a screen containing several command buttons in the user's field of view, voice interaction cannot click the required button quickly and accurately, because the user must speak each command. In such cases an interaction similar to a mouse click or a finger tap better meets the need.

It is therefore necessary to change the existing design and provide a fast and accurate method of human-machine interaction with smart glasses. This is the purpose of the present invention.

Summary of the invention

To address the shortcomings of interacting with smart glasses through speech recognition, embodiments of the invention provide a human-machine interaction smart glasses system that lets the smart glasses respond to a person's control signals and perform specific operations.

To achieve this object, the technical solution adopted by the present invention is a human-machine interaction smart glasses system comprising:

Smart glasses: the smart glasses are see-through smart glasses whose lenses transmit visible light while superimposing image information on the wearer's real field of view;

Cameras and infrared LED: two cameras and one infrared LED are mounted on the smart glasses. The two cameras, a left camera and a right camera, are mounted in parallel and symmetrically at the left and right ends of the glasses, and the infrared LED is mounted at the center of the glasses;

The two cameras and the infrared LED of the smart glasses form a three-dimensional motion capture system that captures the movement trajectories and coordinates of objects within a certain three-dimensional range.

Preferably, the smart glasses system includes a three-dimensional motion capture coordinate system and an image frame coordinate system. The former is used to analyze finger motion, and the latter to analyze the image frame displayed by the smart glasses.

Preferably, both coordinate systems take the center of the infrared LED as the origin O and the plane through O parallel to the image frame as the OXY plane. The resulting right-handed coordinate system OXYZ serves as the coordinate system of both the three-dimensional motion capture system and the image frame.

Another object of the present invention is to provide an interaction method for this smart glasses system.

To achieve this object, the technical solution adopted by the present invention is an interaction method for the human-machine interaction smart glasses system, comprising the following steps:

(1) A finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

(2) The infrared LED emits infrared light, which is scattered by the finger when it strikes it;

(3) When the two cameras detect the scattered infrared light, they produce two scattering images related to the finger's movement trajectory;

(4) A stereo vision algorithm is applied to the two scattering images to compute the three-dimensional coordinates of the finger;

(5) Whether a button has been selected is determined from the finger's three-dimensional coordinates;

(6) According to the selection, the smart glasses perform a finger-click operation on the image frame they display.

Specifically, step (4) comprises: reading the X and Y coordinate ranges of the fingertip in the image frame from the two scattering images, and then computing the Z coordinate of the fingertip with a binocular stereo vision algorithm.

Specifically, the binocular stereo vision algorithm computes the Z coordinate of the fingertip as follows. Let l₁ and l₂ denote the two cameras, B the distance between their optical centers, and f the focal length of the cameras. Let P(x, y, z) be the unknown spatial point at which the fingertip lies, with three-dimensional coordinates (x, y, z), and let P₁(x₁, y₁) and P₂(x₂, y₂) be the image points of P on the image planes of the two cameras. The disparity of P is d = x₁ - x₂, and from the geometry the depth of P is:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .
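The depth formula above translates directly into code. The sketch below is a minimal illustration, assuming the focal length is expressed in pixels (so it cancels the pixel units of the disparity) and the baseline in metres; the function name and the numbers in the usage line are hypothetical, not taken from the patent.

```python
def depth_from_disparity(f_px: float, baseline: float, x1_px: float, x2_px: float) -> float:
    """Depth z of a point seen by two parallel cameras: z = f * B / d.

    f_px     -- focal length in pixels (assumed equal for both cameras)
    baseline -- distance B between the two optical centers, in metres
    x1_px    -- horizontal image coordinate of the point in the left camera
    x2_px    -- horizontal image coordinate of the same point in the right camera
    """
    d = x1_px - x2_px  # disparity d = x1 - x2
    if d <= 0:
        # Zero or negative disparity means a point at infinity or a wrong match.
        raise ValueError(f"disparity must be positive, got {d}")
    return f_px * baseline / d


if __name__ == "__main__":
    # Hypothetical values: f = 800 px, B = 0.12 m, disparity 240 px, giving z of about 0.4 m.
    print(depth_from_disparity(800.0, 0.12, 400.0, 160.0))
```

Rejecting a non-positive disparity is a design choice: with each x measured from its own camera's principal point, a real point in front of both cameras always yields d > 0.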

Specifically, step (5) comprises: if, in the X and Y directions, the three-dimensional coordinate range of the fingertip approaches the X and Y coordinate range of any button in the image frame, the fingertip is positioned above that button;

Specifically, if the Z coordinate of the fingertip's three-dimensional coordinate range exceeds the Z coordinate of the image frame, the button has been selected.
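The two tests of step (5) can be sketched for a single fingertip point as below. This is an illustration under stated assumptions: the patent does not quantify "approaches", so a hypothetical tolerance `tol` stands in for it, and `Button` and `classify_fingertip` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Button:
    # Hypothetical button description: X/Y extent in the shared OXYZ frame.
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def classify_fingertip(x: float, y: float, z: float, button: Button,
                       frame_z: float, tol: float = 0.0) -> str:
    """Return "none", "hover" or "select" for one fingertip point.

    "Approaches" is modelled as lying within `tol` of the button's X/Y extent
    (an assumption); "select" requires the fingertip Z to exceed the frame's Z.
    """
    over_button = (button.x_min - tol <= x <= button.x_max + tol and
                   button.y_min - tol <= y <= button.y_max + tol)
    if not over_button:
        return "none"
    return "select" if z > frame_z else "hover"
```

For a button spanning X in [0, 0.02] and Y in [0, 0.01] with the frame at Z = 0.35, a fingertip at (0.01, 0.005, 0.30) hovers, and pushing it to Z = 0.36 selects the button.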

Specifically, step (5) further comprises: when the fingertip corresponds to more than one spatial coordinate point, computing a weighted average z̄ of the Z coordinates of those points. When z̄ is greater than the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses; if, in addition, the X and Y coordinate ranges of the fingertip overlap those of any button in the image frame, the finger is judged to have clicked that button.
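The weighted-average test can be sketched as follows. The patent does not say how the points are weighted, so the weights here are an assumption (for example, the infrared intensity of each point); X/Y "overlap" is modelled as closed-interval intersection, and all names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the shared OXYZ frame
Range = Tuple[float, float]         # closed interval (min, max)


def weighted_mean_z(points: Sequence[Point], weights: Sequence[float]) -> float:
    """Weighted average Z (z_bar) of the points attributed to one fingertip."""
    if not points or len(points) != len(weights):
        raise ValueError("need at least one point and exactly one weight per point")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p[2] for p, w in zip(points, weights)) / total


def ranges_overlap(a: Range, b: Range) -> bool:
    # Closed intervals overlap unless one lies entirely to one side of the other.
    return a[0] <= b[1] and b[0] <= a[1]


def is_click(points: Sequence[Point], weights: Sequence[float],
             button_x: Range, button_y: Range, frame_z: float) -> bool:
    """True if z_bar exceeds the frame's Z and the fingertip's X/Y ranges overlap the button's."""
    z_bar = weighted_mean_z(points, weights)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    over_button = (ranges_overlap((min(xs), max(xs)), button_x) and
                   ranges_overlap((min(ys), max(ys)), button_y))
    return z_bar > frame_z and over_button
```

Averaging several points damps the noise of any single stereo match, so one outlying depth value is less likely to trigger a spurious click.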

Specifically, fingertip recognition is not limited to a single finger: any one or more fingertips can be recognized, so that interaction with the smart glasses supports multi-touch.
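Multi-touch amounts to running the click test once per recognized fingertip. In the sketch below, each fingertip is assumed to be already reduced to one (x, y, z) point (for example its weighted average), and buttons are hypothetical named X/Y rectangles.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # hypothetical button extent: x_min, x_max, y_min, y_max


def clicked_buttons(fingertips: List[Tuple[float, float, float]],
                    buttons: Dict[str, Rect], frame_z: float) -> List[str]:
    """Names of the buttons clicked by any fingertip in one captured frame.

    A button counts as clicked when some fingertip lies inside its X/Y extent
    and that fingertip's Z exceeds the image frame's Z.
    """
    hits = []
    for name, (x0, x1, y0, y1) in buttons.items():
        for (x, y, z) in fingertips:
            if z > frame_z and x0 <= x <= x1 and y0 <= y <= y1:
                hits.append(name)
                break  # one touching fingertip is enough for this button
    return hits
```

With buttons "ok" and "cancel" side by side, two fingertips pushed past the frame at the same time report both buttons in a single frame, which is the multi-touch behaviour described above.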

Compared with existing smart glasses, embodiments of the invention have the following advantages:

1. Unlike the speech recognition currently used to interact with smart glasses, the embodiments are operated in much the same way as today's touch screens, which matches popular usage habits.

2. The embodiments provide a multi-touch interaction mode similar to a touch screen that lets a person enter several control signals at once, giving fast and precise interaction, whereas current speech recognition inputs only one signal at a time, slowly and inefficiently.

3. The interaction mode of the embodiments is three-dimensional, so it offers more ways of interacting than speech recognition and can input more control signals.

4. The interaction mode of the embodiments does not cause significant fatigue from prolonged use, because interacting with the smart glasses only requires moving a finger, whereas speech recognition requires the user to speak constantly;

5. The interaction mode of the embodiments is based on human body language, which conveys a person's control needs to the computer better and faster and achieves more efficient human-machine interaction.

Description of drawings

Fig. 1 is an external view of the smart glasses provided by an embodiment of the invention;

Fig. 2 is a front view of the smart glasses provided by an embodiment of the invention;

Fig. 3 is a schematic diagram of the interaction principle of an embodiment of the invention;

Fig. 4 is a diagram of the basic principle of binocular stereo vision used in an embodiment of the invention;

Fig. 5 is a schematic diagram of interaction with a two-dimensional image frame that presents three-dimensional depth, according to an embodiment of the invention.

Embodiment

To make the objects, technical solutions and advantages of the present invention clearer, the invention is further described below with reference to embodiments. The embodiments described here are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

One object of the embodiments is to provide a human-machine interaction smart glasses system that lets smart glasses respond to a person's control signals and perform specific operations. To this end, an embodiment provides a human-machine interaction smart glasses system which, as shown in Fig. 1, comprises:

Smart glasses: the smart glasses are see-through smart glasses whose lenses transmit visible light while superimposing image information on the wearer's real field of view. Wearing the glasses, a person sees the surroundings and also the image information they display; the effect is equivalent to seeing a screen of fixed size floating in space in front of the eyes.

Cameras and infrared LED: as shown in Fig. 2, two cameras and one infrared LED 1 are mounted on the smart glasses. The two cameras, left camera 2 and right camera 3, are mounted in parallel and symmetrically at the left and right ends of the glasses, and the infrared LED 1 is mounted at the center of the glasses.

The two cameras 2 and 3 and the infrared LED 1 form a three-dimensional motion capture system that captures the movement trajectories and coordinates of objects within a three-dimensional range. When a person moves a finger, its range of motion lies inside the capture space of this system. In use, the infrared LED emits infrared light; when the light strikes the finger it is scattered and detected by both cameras, yielding two scattering images related to the finger's movement trajectory. Applying a stereo vision algorithm to these two images gives the three-dimensional coordinates of the finger.
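The patent does not specify how the fingertip is located within each scattering image. A minimal sketch, assuming the finger is the only strong infrared reflector in view, is to threshold the frame and take the intensity-weighted centroid of the bright pixels; running it on the left and right images would give the x₁ and x₂ used for the disparity. The threshold and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale infrared frame: rows of pixel intensities


def bright_spot_centroid(img: Image, threshold: int) -> Optional[Tuple[float, float]]:
    """Intensity-weighted centroid (x, y), in pixels, of all pixels above `threshold`.

    Returns None when no pixel exceeds the threshold (no finger in range).
    """
    sx = sy = total = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v > threshold:
                sx += x * v
                sy += y * v
                total += v
    if total == 0:
        return None
    return sx / total, sy / total
```

Treating every bright pixel as one blob is a simplification; recognizing several fingertips, as the multi-touch mode requires, would need the bright pixels to be split into connected regions first.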

The basic idea of the invention is to capture and interpret a person's finger motion so as to achieve human-machine interaction with the smart glasses. The interaction resembles tapping a touch screen with a finger. Today, a touch screen device responds to a click in two steps:

The user's fingertip moves above the button to be clicked;

The user presses the button.

In essence, implementing these two steps requires computing the three-dimensional coordinates of the fingertip relative to the touch screen, in the X, Y and Z directions. Finger-click interaction between a person and the smart glasses in this embodiment relies on the same process. As shown in Fig. 3, the smart glasses system includes two coordinate systems: a three-dimensional motion capture coordinate system, used to analyze finger motion, and an image frame coordinate system, used to analyze the image frame displayed by the smart glasses.

Specifically, the three-dimensional motion capture coordinate system and the image frame coordinate system share the same origin and axes, both taking the center of the infrared LED as origin. Consequently, the X and Y coordinates of the finger captured in the motion capture coordinate system map directly to the finger's coordinates in the image frame coordinate system.

Specifically, as shown in Fig. 3, the center of the infrared LED is taken as origin O and the plane through O parallel to the image frame as the OXY plane. The resulting right-handed coordinate system OXYZ serves as the coordinate system of both the three-dimensional motion capture system and the image frame. In Fig. 3, plane ABCD is the image frame, and the regular four-sided pyramid drawn in dashed lines represents the capture range of the three-dimensional motion capture system.
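Whether a fingertip lies inside the pyramid-shaped capture range can be checked as below. This is a sketch under assumptions not stated in the patent: the apex is placed at O, the axis runs along +Z toward the image frame, and the half-angles and depth limit are hypothetical parameters. Separate X and Y half-angles allow a rectangular base; equal values give the square-based pyramid of Fig. 3.

```python
import math


def in_capture_range(x: float, y: float, z: float,
                     half_angle_x_deg: float, half_angle_y_deg: float,
                     z_max: float) -> bool:
    """True if (x, y, z) lies inside a right pyramid with apex at O, axis along +Z,
    the given half-angles in the XZ and YZ planes, and depth limit z_max."""
    if not 0.0 < z <= z_max:
        return False  # behind the glasses or beyond the usable depth
    # At depth z the pyramid's cross-section is a rectangle of half-widths z * tan(angle).
    return (abs(x) <= z * math.tan(math.radians(half_angle_x_deg)) and
            abs(y) <= z * math.tan(math.radians(half_angle_y_deg)))
```

Gating on this check before running the stereo computation avoids evaluating points that at least one camera cannot see.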

When the image frame shows buttons that must be clicked to complete an interaction, the person's finger enters the capture range of the three-dimensional motion capture system. The infrared LED emits infrared light, which is scattered when it strikes the finger. Part of the scattered light enters the cameras at the two ends of the glasses, and the cameras output two scattering images. Applying the corresponding stereo vision algorithm to these images yields the three-dimensional coordinate range of the fingertip in the coordinate system OXYZ. If, in the X and Y directions, this range approaches the X and Y coordinate range of a button in the image frame, the fingertip is positioned above that button; if the Z coordinate of the range exceeds the Z coordinate of the image frame, the button has been selected.

In this way a person can click directly on the image frame displayed by the smart glasses, performing finger-click operations on it just as on a touch screen. This realizes body-language-based human-machine interaction with the smart glasses and lets them respond to a person's control signals quickly and accurately.

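Accordingly, the interaction method of this embodiment comprises the following steps:

(1) When the image frame displayed by the smart glasses shows a button that must be clicked to complete an interaction, the finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

(2) The infrared LED emits infrared light, which is scattered by the finger when it strikes it;

(3) When the two cameras detect the scattered infrared light, they produce two scattering images related to the finger's movement trajectory;

(4) A stereo vision algorithm is applied to the two scattering images to compute the three-dimensional coordinates of the finger;

(5) Whether a button has been selected is determined from the finger's three-dimensional coordinates;

(6) According to the selection, the smart glasses respond to the person's control signal and perform a finger-click operation on the image frame they display.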

Because the coordinate system of the three-dimensional motion capture system is identical to that of the image frame, the X and Y coordinate ranges of the fingertip in the image frame can be read directly from the two scattering images obtained by the cameras, while the Z coordinate of the fingertip is computed by the binocular stereo vision algorithm. That is, step (4) comprises: reading the X and Y coordinate ranges of the fingertip in the image frame from the two scattering images, and then computing the Z coordinate of the fingertip with a binocular stereo vision algorithm.

The three-dimensional motion capture system computes the Z coordinate of the fingertip as follows. Fig. 4 shows the basic principle of the binocular stereo vision algorithm, which is based on parallax. In the figure, l₁ and l₂ are two cameras placed in parallel; in the present invention they are the left camera and the right camera, respectively. B is the distance between the two optical centers, and f is the focal length of the cameras. P(x, y, z) is the spatial point to be found, with three-dimensional coordinates (x, y, z), and P₁(x₁, y₁) and P₂(x₂, y₂) are its image points on the image planes of the two cameras. The disparity of P is d = x₁ - x₂, and from the geometry the depth of P is:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .
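The depth formula follows from similar triangles. As a sketch, place the left optical center at the origin of a camera-centered frame with the right optical center at distance B along X, and measure x₁ and x₂ from each camera's principal point. This frame is an assumption for the derivation; moving the origin to the infrared LED midway between the cameras shifts X by B/2 but leaves z unchanged.

x_{1} = \frac{f x}{z}, \qquad x_{2} = \frac{f (x - B)}{z} \quad \Rightarrow \quad d = x_{1} - x_{2} = \frac{f B}{z} \quad \Rightarrow \quad z = f \frac{B}{d} .

With hypothetical values f = 800 px, B = 0.12 m and a measured disparity d = 240 px:

z = 800 \times \frac{0.12}{240} = 0.4 \ \text{m} .

Because z is inversely proportional to d, a one-pixel disparity error produces a larger depth error for distant points than for near ones, which suits a system whose targets stay within arm's reach.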

Because the fingertip corresponds to more than one spatial coordinate point, the Z coordinates of these points are weighted and averaged to give an average coordinate z̄. When z̄ is greater than the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses. If, at the same time, the X and Y coordinate ranges of the fingertip essentially coincide with those of a button in the image frame, the finger is considered to have clicked that button, and the smart glasses perform the corresponding operation, thereby achieving human-machine interaction with the smart glasses.

Thus, in step (4), the binocular stereo vision algorithm computes the Z coordinate of the fingertip as follows: l₁ and l₂ denote the two cameras, B the distance between their optical centers, and f the focal length of the cameras; P(x, y, z) is the unknown spatial point at which the fingertip lies, with three-dimensional coordinates (x, y, z); P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras; the disparity of P is d = x₁ - x₂, and from the geometry the depth of P is:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .

Step (5) comprises: if, in the X and Y directions, the three-dimensional coordinate range of the fingertip approaches the X and Y coordinate range of any button in the image frame, the fingertip is positioned above that button; if the Z coordinate of the fingertip's coordinate range exceeds the Z coordinate of the image frame, the button has been selected.

Step (5) further comprises: when the fingertip corresponds to more than one spatial coordinate point, computing a weighted average z̄ of the Z coordinates of those points. When z̄ is greater than the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses; if, at the same time, the X and Y coordinate ranges of the fingertip overlap those of any button in the image frame, the finger is judged to have clicked that button.

In embodiments of the invention, fingertip recognition is not limited to one finger: up to all ten fingertips can be recognized, enabling multi-touch interaction with the smart glasses.

Furthermore, although current smart glasses display only a two-dimensional image frame, in certain applications that frame can convey a three-dimensional effect (comparable to today's glasses-free 3D). If the person then needs to interact with different parts of the 3D image, the fingertip must reach images at different distances. As shown in Fig. 5, the newly added two-dimensional planes represent the planes to be clicked with the fingertip. Although these planes actually appear within the image frame ABCD, to the eye they appear to lie at different distances. Because the three-dimensional motion capture system measures the fingertip's distance from the origin, the system can respond to clicks on the different planes.
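Clicking on planes that appear at different depths can be sketched as a hit test that compares the fingertip's measured Z with each plane's perceived depth. The tolerance `depth_tol`, the rule of choosing the plane closest in depth, and the `VirtualPlane` fields are assumptions for illustration; the patent states only that the measured distance lets the system tell the planes apart.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualPlane:
    # Hypothetical description of one clickable plane in a stereoscopic image frame.
    name: str
    z: float  # perceived depth of the plane, measured from the origin O
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def touched_plane(x: float, y: float, z: float, planes: List[VirtualPlane],
                  depth_tol: float) -> Optional[str]:
    """Name of the plane the fingertip is touching, or None.

    A plane qualifies when the fingertip is inside its X/Y extent and within
    depth_tol of its perceived depth; among qualifying planes the closest in depth wins.
    """
    best = None
    best_err = depth_tol
    for p in planes:
        if p.x_min <= x <= p.x_max and p.y_min <= y <= p.y_max:
            err = abs(z - p.z)
            if err <= best_err:
                best, best_err = p.name, err
    return best
```

Matching on closeness rather than on "Z exceeds the plane" keeps a fingertip that is reaching for a far plane from registering clicks on every nearer plane it passes through on the way.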

Adding interaction with two-dimensional images that have three-dimensional depth means the interaction method of the invention is not confined to a two-dimensional plane. Compared with the planar interaction of today's touch screens, it is three-dimensional and can therefore provide more interaction signals.

From the above description of the embodiments, those skilled in the art will understand that the dimensions and values disclosed herein are not intended to be strictly limited to the exact values stated. Rather, unless otherwise specified, each such dimension or value is intended to mean both the stated value and a functionally equivalent range around it.

The above are only preferred embodiments of the invention. Those skilled in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.

Claims

1. A human-machine interaction smart glasses system, characterized by comprising the following parts:

Cameras and infrared LED: two cameras and one infrared LED are mounted on the smart glasses; the two cameras, a left camera and a right camera, are mounted in parallel and symmetrically at the left and right ends of the smart glasses, and the infrared LED is mounted at the center of the smart glasses;

The two cameras and the infrared LED of the smart glasses form a three-dimensional motion capture system for capturing the movement trajectories and coordinates of objects within a certain three-dimensional range.

2. The human-machine interaction smart glasses system according to claim 1, characterized in that the smart glasses system comprises a three-dimensional motion capture coordinate system and an image frame coordinate system, wherein the three-dimensional motion capture coordinate system is used to analyze finger motion and the image frame coordinate system is used to analyze the image frame displayed by the smart glasses.

3. The human-machine interaction smart glasses system according to claim 2, characterized in that the three-dimensional motion capture coordinate system and the image frame coordinate system both take the center of the infrared LED as origin O and the plane through O parallel to the image frame as the OXY plane, forming a right-handed coordinate system OXYZ that serves as both the three-dimensional motion capture coordinate system and the image frame coordinate system.

4. An interaction method for a human-machine interaction smart glasses system, characterized by comprising the following steps:

(1) a finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

5. The interaction method for a human-machine interaction smart glasses system according to claim 4, characterized in that step (4) comprises: reading the X and Y coordinate ranges of the fingertip in the image frame from the two scattering images, and then computing the Z coordinate of the fingertip with a binocular stereo vision algorithm.

6. The interaction method for a human-machine interaction smart glasses system according to claim 5, characterized in that the binocular stereo vision algorithm computes the Z coordinate of the fingertip as follows: l₁ and l₂ denote the two cameras, B the distance between their optical centers, and f the focal length of the cameras; P(x, y, z) is the unknown spatial point at which the fingertip lies, with three-dimensional coordinates (x, y, z); P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras; the disparity of P is d = x₁ - x₂, and from the geometry the depth of P is:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .

7. The interaction method for a human-machine interaction smart glasses system according to claim 5 or 6, characterized in that step (5) comprises: if, in the X and Y directions, the three-dimensional coordinate range of the fingertip approaches the X and Y coordinate range of any button in the image frame, the fingertip is positioned above that button.

8. The interaction method for a human-machine interaction smart glasses system according to claim 7, characterized in that step (5) comprises: if the Z coordinate of the fingertip's three-dimensional coordinate range exceeds the Z coordinate of the image frame, the button has been selected.

9. The interaction method for a human-machine interaction smart glasses system according to claim 8, characterized in that step (5) further comprises: when the fingertip corresponds to more than one spatial coordinate point, computing a weighted average z̄ of the Z coordinates of those points; when z̄ is greater than the Z coordinate of the image frame in the coordinate system, judging that the fingertip has touched the image frame displayed by the smart glasses, and, when the X and Y coordinate ranges of the fingertip overlap those of any button in the image frame, judging that the finger has clicked that button.

10. The interaction method for a human-machine interaction smart glasses system according to claim 5, characterized in that any one or more fingertips can be recognized, enabling multi-touch interaction with the smart glasses.