CN103336575B

CN103336575B - Smart glasses system for human-computer interaction, and interaction method

Info

Publication number: CN103336575B
Application number: CN201310263439.9A
Authority: CN
Inventors: 费树培; 谢耀钦
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Shen Tech Advanced Cci Capital Ltd; Suzhou Zhongke Advanced Technology Research Institute Co Ltd
Priority date: 2013-06-27
Filing date: 2013-06-27
Publication date: 2016-06-29
Anticipated expiration: 2033-06-27
Also published as: CN103336575A

Abstract

The invention discloses a smart glasses system for human-computer interaction. The system includes smart glasses of the see-through type: visible light passes through the lenses while image information is superimposed on the wearer's real field of view. Two cameras and one infrared LED are mounted on the smart glasses; the two cameras are mounted in parallel and symmetrically at the left and right ends of the glasses, and the infrared LED is mounted at the center. Together, the two cameras and the infrared LED form a three-dimensional motion capture system that captures the motion trajectory and coordinates of an object within a certain three-dimensional space. The invention captures and evaluates the motion of a person's finger in order to realize human-computer interaction with the smart glasses.

Description

Smart glasses system for human-computer interaction, and interaction method

Technical field

The present invention relates to the field of artificial intelligence, and in particular to a smart glasses system for human-computer interaction and an interaction method.

Background technology

Human-computer interaction has always been one of the key problems that electronic devices need to solve. Only when a device responds well to a person's control signals can it properly meet the user's needs.

Smart glasses is the general term for glasses that, like a smartphone, have an independent operating system; allow the user to install software, games, and other programs provided by software vendors; can be controlled by voice or gesture to add calendar entries, navigate with maps, interact with friends, take photos and videos, and make video calls; and can access wireless networks through a mobile communication network.

Google Glass is a pair of "augmented reality" glasses announced by Google in April 2012. It offers functions similar to those of a smartphone: by voice control it can take photos, make video calls, give directions, browse the web, and handle text messages and e-mail. At present, however, human-computer interaction with smart glasses is largely confined to voice interaction; no better interaction mode is available for making smart glasses respond to a person's commands.

Although speech recognition can, within limits, let smart glasses respond to a person's control signals, it leaves the following problems unsolved:

Current human-computer interaction mostly relies on body movement to send commands to a computer: moving a mouse with the hand, controlling a game with the body, or tapping a touch screen with a finger. Because of the limitations of its input signal, voice interaction serves only as a supplement to these techniques and cannot fully solve the problem of interaction between people and computers.

Speech recognition can only send commands to the computer one at a time and intermittently, through spoken commands. It cannot give people the rapid and precise command interaction that a mouse provides, so it cannot meet the requirement for fast and accurate control input.

In existing electronic devices, speech recognition has not become the main technique for human-computer interaction and serves only as a supplement to other techniques. For the same reason, speech recognition cannot meet the interaction needs of smart glasses and can serve only as a supplementary technique for interacting with them.

Speech recognition also tends to cause fatigue in prolonged use. If long input sequences depend on the user speaking continuously, anyone will tire after extended use. Speech recognition therefore cannot serve as the main human-computer interaction method, and the same holds for smart glasses.

With see-through smart glasses, when the glasses present a screen containing several command buttons in the wearer's field of view, voice interaction cannot select the desired button quickly and accurately, because the wearer has to speak each command. In this situation, a mode similar to a mouse click or a finger tap meets the interaction requirement better.

For these reasons, the existing design needs to be changed to provide a method for fast and accurate human-computer interaction with smart glasses. That is the purpose of the present invention.

Summary of the invention

In view of the shortcomings of using speech recognition for human-computer interaction with smart glasses, an embodiment of the present invention provides a smart glasses system for human-computer interaction that lets the smart glasses respond to a person's control signals and perform specific operations.

To achieve the above purpose, the technical solution adopted by the present invention is a smart glasses system capable of human-computer interaction, comprising:

Smart glasses: the smart glasses are of the see-through type, so that visible light passes through the lenses while image information is superimposed on the wearer's real field of view;

Cameras and infrared LED: two cameras and one infrared LED are mounted on the smart glasses. The two cameras, a left camera and a right camera, are mounted in parallel and symmetrically at the left and right ends of the smart glasses, and the infrared LED is mounted at the center of the smart glasses;

The two cameras and the infrared LED of the smart glasses form a three-dimensional motion capture system that captures the motion trajectory and coordinates of an object within a certain three-dimensional space.

Preferably, the smart glasses system includes a three-dimensional motion capture coordinate system and an image frame coordinate system, wherein the three-dimensional motion capture coordinate system is used to analyze finger motion and the image frame coordinate system is used to analyze the image frame displayed by the smart glasses.

Preferably, the three-dimensional motion capture coordinate system and the image frame coordinate system both take the center of the infrared LED as the origin O and the plane that passes through O parallel to the image frame as the OXY plane, establishing a right-handed coordinate system OXYZ that serves as the coordinate system of both the three-dimensional motion capture system and the image frame.

Another object of the present invention is to provide an interaction method for the smart glasses system.

To achieve the above purpose, the technical solution adopted by the present invention is an interaction method for the smart glasses system for human-computer interaction, comprising the following steps:

(1) a finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

(2) the infrared LED emits infrared light, which is scattered by the finger when it strikes the finger;

(3) after detecting the scattered infrared light, the two cameras generate two scattering images related to the motion trajectory of the finger;

(4) a stereo vision algorithm is applied to the two scattering images to calculate the three-dimensional spatial coordinates of the finger;

(5) whether a button has been selected is determined from the three-dimensional spatial coordinates of the finger;

(6) according to the selection, the smart glasses carry out the finger click operation on the image frame they display.

Specifically, step (4) comprises: reading, from the two scattering images, the X and Y coordinate ranges of the fingertip in the image frame, and then calculating the Z coordinate of the fingertip with a binocular stereo vision algorithm.

Specifically, the Z coordinate of the fingertip is calculated with the binocular stereo vision algorithm as follows: l₁ and l₂ denote the two cameras, B is the distance between the optical centers of the two cameras, and f is the focal length of the cameras. P(x, y, z) is the spatial point of the fingertip to be determined, with (x, y, z) its three-dimensional spatial coordinates. P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras. The disparity of P is d = x₁ - x₂, and the depth of P follows from the geometry as:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .
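The depth relation above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name `depth_from_disparity` and the numeric values are hypothetical, and the sketch assumes that f and the image coordinates x₁, x₂ share one unit (for example pixels), so that z comes out in the unit of B.

```python
def depth_from_disparity(f: float, baseline: float, x1: float, x2: float) -> float:
    """Return the depth z = f * B / (x1 - x2) for two parallel cameras.

    f        -- focal length of the cameras (same unit as x1 and x2)
    baseline -- distance B between the two optical centers
    x1, x2   -- x-coordinates of the image points P1 and P2
    """
    d = x1 - x2  # disparity of the point P
    if d <= 0:
        # Assumption of this sketch: a point in front of both cameras
        # always produces a positive disparity.
        raise ValueError("disparity must be positive")
    return f * baseline / d


if __name__ == "__main__":
    # Hypothetical values: f = 500 px, B = 120 mm, x1 = 260 px, x2 = 60 px.
    print(depth_from_disparity(500.0, 120.0, 260.0, 60.0))
```

Because z is inversely proportional to d, a nearby fingertip produces a large disparity and a distant one a small disparity, so the depth resolution of such a system is best close to the glasses.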

Specifically, step (5) comprises: if the X and Y ranges of the fingertip's three-dimensional spatial coordinates are close to the X and Y coordinate ranges of any button in the image frame, the fingertip is located above that button;

Specifically, if the Z coordinate of the fingertip's three-dimensional spatial coordinate range exceeds the Z coordinate of the image frame, the button has been selected.

Specifically, step (5) further comprises: when the fingertip corresponds to more than one spatial coordinate point, taking the Z coordinates of those points and weighting them to obtain an average coordinate z̄. When z̄ exceeds the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses; when, in addition, the X and Y coordinate ranges of the fingertip coincide with the X and Y coordinate ranges of any button in the image frame, the finger is judged to have clicked that button.

Specifically, fingertip recognition is not limited to one finger: any one or more fingertips can be recognized, so that human-computer interaction with the smart glasses can support multi-touch.

Compared with existing smart glasses, the embodiments of the present invention have the following advantages:

1. Compared with the speech recognition currently used for interaction with smart glasses, the embodiment closely resembles the way touch screens are used today and matches users' habits.

2. The embodiment provides a touch-screen-like multi-touch interaction mode that lets a user input several control signals at once, making interaction fast and accurate, whereas current speech recognition inputs one signal at a time, which is slow and inefficient.

3. The interaction mode provided by the embodiment is three-dimensional, so it offers more ways to interact than current speech recognition and can input more control signals.

4. The interaction mode provided by the embodiment does not suffer from serious fatigue in prolonged use, because the user only needs to move a finger, whereas speech recognition requires the user to speak constantly.

5. The interaction mode provided by the embodiment is based on body language, which conveys the user's control intent to the computer better and faster and enables more efficient human-computer interaction.

Brief description of the drawings

Fig. 1 is an external view of the smart glasses provided by an embodiment of the present invention;

Fig. 2 is a front view of the smart glasses provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the interaction principle provided by an embodiment of the present invention;

Fig. 4 is a diagram of the basic principle of binocular stereo vision provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of interaction with a two-dimensional image frame that has three-dimensional depth of field, provided by an embodiment of the present invention.

Detailed description of the invention

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the embodiments described here are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

One purpose of the embodiments of the present invention is to provide a smart glasses system for human-computer interaction that lets the smart glasses respond to a person's control signals and perform specific operations. To this end, an embodiment of the present invention provides a smart glasses system for human-computer interaction which, as shown in Fig. 1, comprises:

Smart glasses: the smart glasses are of the see-through type, so that visible light passes through the lenses while image information is superimposed on the wearer's real field of view. After putting on the smart glasses, the wearer sees the image information displayed by the glasses in addition to the surroundings; the effect is equivalent to viewing a screen of a certain size in the space before the eyes.

Cameras and infrared LED: as shown in Fig. 2, two cameras and one infrared LED 1 are mounted on the smart glasses. The two cameras, a left camera 2 and a right camera 3, are mounted in parallel and symmetrically at the left and right ends of the smart glasses, and the infrared LED 1 is mounted at the center of the smart glasses.

The two cameras 2, 3 and the infrared LED 1 of the smart glasses form a three-dimensional motion capture system that captures the motion trajectory and coordinates of an object within a three-dimensional space. When the user moves a finger, the range of motion of the finger lies within the capture space of this system. In use, the infrared LED emits infrared light. When the light strikes the user's finger it is scattered and can be detected by the two cameras, yielding two scattering images related to the motion trajectory of the finger. Applying a stereo vision algorithm to these two images gives the three-dimensional spatial coordinates of the finger.

The basic idea of the present invention is to capture and evaluate the motion of the user's finger and thereby interact with the smart glasses. The interaction resembles tapping a touch screen with a finger. At present, a touch-screen device responds to a click by the user in two steps:

The user's fingertip moves above the button to be clicked;

The user presses the button.

In essence, both steps require calculating the three-dimensional coordinates of the user's fingertip relative to the touch screen, that is, its X, Y, and Z coordinates. In the embodiments of the present invention, finger-click interaction between the user and the smart glasses is based on the same process. The implementation is shown in Fig. 3. The smart glasses system includes two coordinate systems: a three-dimensional motion capture coordinate system, used to analyze finger motion, and an image frame coordinate system, used to analyze the image frame displayed by the smart glasses.

Specifically, the origins and coordinate axes of the three-dimensional motion capture coordinate system and the image frame coordinate system coincide, with the center of the infrared LED as the common origin. The X and Y coordinates of the finger captured in the three-dimensional motion capture coordinate system therefore map directly to the finger's coordinates in the image frame coordinate system.

Specifically, as shown in Fig. 3, the center of the infrared LED is taken as the origin O and the plane that passes through O parallel to the image frame is taken as the OXY plane, establishing a right-handed coordinate system OXYZ that serves as the coordinate system of the three-dimensional motion capture system and, at the same time, of the image frame. In Fig. 3, plane ABCD is the image frame, and the regular rectangular pyramid drawn in dashed lines represents the capture range of the three-dimensional motion capture system.

When clickable buttons appear in the image frame and the user needs to click one to complete an interaction, the user's finger enters the capture range of the three-dimensional motion capture system. The infrared LED emits infrared light, which is scattered when it strikes the finger, and part of the scattered light enters the cameras at the two ends of the smart glasses. The cameras output two scattering images, from which a stereo vision algorithm calculates the three-dimensional coordinate range of the fingertip in the coordinate system OXYZ. If the X and Y ranges of the fingertip's three-dimensional coordinates are close to the X and Y coordinate ranges of a button in the image frame, the fingertip is located above that button; if the Z coordinate of the fingertip's three-dimensional coordinate range exceeds the Z coordinate of the image frame, the button has been selected. In this way the user can click directly on the image frame displayed by the smart glasses, performing finger click operations on it just as on a touch screen. The user thus interacts with the smart glasses through body language, and the smart glasses respond to the user's control signals quickly and accurately.
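The hover and selection test described in this paragraph can be sketched as follows. The sketch makes simplifying assumptions that are not part of the patent: the fingertip is reduced to a single point (x, y, z) in OXYZ, each button is an axis-aligned rectangle, and "close to" is taken to mean that the point lies inside the rectangle. The names `Button` and `classify_fingertip` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Button:
    """Axis-aligned button region in the image frame (hypothetical layout)."""
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def classify_fingertip(
    x: float, y: float, z: float, screen_z: float, buttons: Iterable[Button]
) -> Tuple[Optional[str], str]:
    """Return (button name, state), where state is 'none', 'hover' or 'selected'.

    'hover'    -- the fingertip's X, Y fall inside a button's X, Y range
    'selected' -- additionally, Z exceeds the Z coordinate of the image frame
    """
    for button in buttons:
        if button.contains(x, y):
            state = "selected" if z > screen_z else "hover"
            return button.name, state
    return None, "none"


if __name__ == "__main__":
    ok = Button("ok", 0.0, 10.0, 0.0, 5.0)
    print(classify_fingertip(3.0, 2.0, 250.0, 300.0, [ok]))  # in front of the frame
    print(classify_fingertip(3.0, 2.0, 310.0, 300.0, [ok]))  # past the frame
```

Keeping the X, Y test separate from the Z test mirrors the two touch-screen steps named earlier: first move over the button, then press it.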

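(1) when a clickable button appears in the image frame displayed by the smart glasses and a click is needed to complete an interaction, a finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

(2) the infrared LED emits infrared light, which is scattered by the finger when it strikes the finger;

(3) after detecting the scattered infrared light, the two cameras generate two scattering images related to the motion trajectory of the finger;

(4) a stereo vision algorithm is applied to the two scattering images to calculate the three-dimensional spatial coordinates of the finger;

(5) whether a button has been selected is determined from the three-dimensional spatial coordinates of the finger;

(6) according to the selection, the smart glasses respond to the user's control signal and carry out the finger click operation on the image frame they display.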

Because the coordinate system of the three-dimensional motion capture system and that of the image frame are identical, the X and Y coordinate ranges of the fingertip in the image frame can be read directly from the two scattering images obtained by the cameras, and the Z coordinate of the fingertip can be calculated with the binocular stereo vision algorithm. That is, step (4) comprises: reading, from the two scattering images, the X and Y coordinate ranges of the fingertip in the image frame, and then calculating the Z coordinate of the fingertip with a binocular stereo vision algorithm.

The three-dimensional motion capture system calculates the Z coordinate of the fingertip as follows. Fig. 4 shows the basic principle of the binocular stereo vision algorithm. Binocular stereo vision is based on the parallax principle. In the figure, l₁ and l₂ are two cameras placed in parallel; in the present invention they represent the left camera and the right camera respectively. B is the distance between the optical centers of the two cameras, and f is the focal length of the cameras. P(x, y, z) is the spatial point to be determined, with (x, y, z) its three-dimensional coordinates. P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras. The disparity of P is d = x₁ - x₂, and the depth of P follows from the geometry as:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .

A fingertip corresponds to more than one spatial coordinate point, so the Z coordinates of these points are weighted to obtain an average coordinate z̄. When z̄ exceeds the Z coordinate of the image frame in the coordinate system, the fingertip can be judged to have touched the image frame displayed by the smart glasses. If, at the same time, the X and Y coordinate ranges of the fingertip substantially coincide with the X and Y coordinate ranges of a button in the image frame, the finger is considered to have clicked that button, and the smart glasses carry out the corresponding operation, thereby realizing human-computer interaction with the smart glasses.
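The weighted averaging of the fingertip's Z coordinates can be sketched as below. The patent does not state how the weights are chosen, so this sketch assumes equal weights by default and lets the caller pass others; the names `weighted_mean_z` and `fingertip_touches_frame` are hypothetical.

```python
from typing import Optional, Sequence


def weighted_mean_z(
    z_values: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted average of the Z coordinates of the fingertip's points.

    Assumption: equal weights when none are given; the source text does not
    specify a weighting scheme.
    """
    if not z_values:
        raise ValueError("at least one Z coordinate is required")
    if weights is None:
        weights = [1.0] * len(z_values)
    if len(weights) != len(z_values):
        raise ValueError("weights and z_values must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * z for w, z in zip(weights, z_values)) / total


def fingertip_touches_frame(
    z_values: Sequence[float],
    frame_z: float,
    weights: Optional[Sequence[float]] = None,
) -> bool:
    """True when the averaged Z exceeds the Z coordinate of the image frame."""
    return weighted_mean_z(z_values, weights) > frame_z


if __name__ == "__main__":
    points = [300.0, 310.0, 320.0]  # hypothetical Z values of one fingertip
    print(weighted_mean_z(points))
    print(fingertip_touches_frame(points, frame_z=305.0))
```

Averaging over several points makes the touch decision less sensitive to a single noisy depth estimate than testing one point alone.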

Therefore, in step (4), the Z coordinate of the fingertip is calculated with the binocular stereo vision algorithm as follows: l₁ and l₂ denote the two cameras, B is the distance between the optical centers of the two cameras, and f is the focal length of the cameras. P(x, y, z) is the spatial point of the fingertip to be determined, with (x, y, z) its three-dimensional spatial coordinates. P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras. The disparity of P is d = x₁ - x₂, and the depth of P follows from the geometry as:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .
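The geometric relation behind this formula can be written out by similar triangles. The derivation below is a sketch under assumptions not stated in the patent: both cameras are ideal pinhole cameras with parallel optical axes along Z, their optical centers lie at (-B/2, 0, 0) and (+B/2, 0, 0) in OXYZ (symmetric about the infrared LED at the origin O), and x₁, x₂, y₁, y₂ are measured from each camera's principal point.

```latex
\begin{aligned}
x_1 &= f\,\frac{x + B/2}{z}, \qquad
x_2 = f\,\frac{x - B/2}{z}, \qquad
y_1 = y_2 = f\,\frac{y}{z}, \\[4pt]
d &= x_1 - x_2 = \frac{fB}{z}
\quad\Longrightarrow\quad
z = f\,\frac{B}{d}, \\[4pt]
x &= \frac{z\,(x_1 + x_2)}{2f} = \frac{B\,(x_1 + x_2)}{2d}, \qquad
y = \frac{z\,y_1}{f} = \frac{B\,y_1}{d}.
\end{aligned}
```

Under these assumptions the same disparity d that gives the depth z also gives x and y, so all three coordinates of P follow from the two image points.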

Step (5) comprises: if the X and Y ranges of the fingertip's three-dimensional spatial coordinates are close to the X and Y coordinate ranges of any button in the image frame, the fingertip is located above that button; if the Z coordinate of the fingertip's three-dimensional spatial coordinate range exceeds the Z coordinate of the image frame, the button has been selected.

Further, step (5) also comprises: when the fingertip corresponds to more than one spatial coordinate point, taking the Z coordinates of those points and weighting them to obtain an average coordinate z̄. When z̄ exceeds the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses; if, at the same time, the X and Y coordinate ranges of the fingertip coincide with the X and Y coordinate ranges of any button in the image frame, the finger is judged to have clicked that button.

In the embodiments of the present invention, fingertip recognition is not limited to one finger; the fingertips of all ten fingers can be recognized, so that human-computer interaction with the smart glasses can support multi-touch.
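A multi-touch variant of the click test might look like the sketch below. It assumes, beyond what the patent states, that each recognized fingertip has already been reduced to one (x, y, z) point and that buttons are axis-aligned rectangles; `pressed_buttons` and the `Rect` alias are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# (x_min, x_max, y_min, y_max) of a button in the image frame
Rect = Tuple[float, float, float, float]


def pressed_buttons(
    fingertips: Iterable[Tuple[float, float, float]],
    buttons: Dict[str, Rect],
    frame_z: float,
) -> List[str]:
    """Return the names of all buttons pressed by any fingertip.

    A fingertip presses a button when its Z exceeds the image frame's Z and
    its X, Y fall inside the button's rectangle. Each button is reported
    once, in the order in which it was first pressed.
    """
    pressed: List[str] = []
    for x, y, z in fingertips:
        if z <= frame_z:
            continue  # this fingertip has not reached the image frame
        for name, (x_min, x_max, y_min, y_max) in buttons.items():
            if x_min <= x <= x_max and y_min <= y <= y_max:
                if name not in pressed:
                    pressed.append(name)
                break  # a fingertip presses at most one button
    return pressed


if __name__ == "__main__":
    layout = {"a": (0.0, 10.0, 0.0, 10.0), "b": (20.0, 30.0, 0.0, 10.0)}
    tips = [(5.0, 5.0, 310.0), (25.0, 5.0, 320.0), (25.0, 5.0, 100.0)]
    print(pressed_buttons(tips, layout, frame_z=300.0))
```

Evaluating every fingertip independently is what allows several control signals to be entered at once.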

Furthermore, although the smart glasses currently display only a two-dimensional image frame, on particular occasions this two-dimensional image frame can convey a three-dimensional effect (the viewing effect is comparable to current glasses-free 3D). If the user then needs to interact with different parts of the three-dimensional image, the fingertip must be able to touch image content at different distances. As shown in Fig. 5, the several newly added two-dimensional planes represent planes to be clicked with the fingertip. Although these planes are in fact rendered within the image frame ABCD, to the human eye they appear as planes at different distances from the eye. Because the three-dimensional motion capture system can measure the distance from the fingertip to the origin of the coordinate system, clicks of the fingertip on the different planes can each be responded to.

Adding interaction with two-dimensional pictures that have three-dimensional depth of field means the interaction method provided by the invention is not confined to a two-dimensional plane. Compared with the two-dimensional planar interaction of current touch screens, the method is three-dimensional and therefore provides more interaction signals.

From the above description of the embodiments, those skilled in the art will understand that the dimensions and values disclosed herein are not intended to be strictly limited to the exact values stated. Unless otherwise specified, each such dimension and value is intended to mean both the stated value and a functionally equivalent range around that value.

The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
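Responding to clicks on planes at different apparent distances can be sketched as a lookup of the deepest plane the fingertip has passed. This is an illustration under assumptions: each virtual plane is described only by a Z coordinate in OXYZ, the fingertip's Z is used in place of its distance from the origin, and the names `deepest_plane_reached` and the plane labels are hypothetical.

```python
from typing import Dict, Optional


def deepest_plane_reached(
    fingertip_z: float, plane_depths: Dict[str, float]
) -> Optional[str]:
    """Return the name of the farthest virtual plane the fingertip has passed.

    plane_depths maps a plane name to the Z coordinate at which that plane
    appears to the wearer. None is returned when no plane has been reached.
    """
    reached = [
        (depth, name)
        for name, depth in plane_depths.items()
        if fingertip_z > depth
    ]
    if not reached:
        return None
    return max(reached)[1]  # the plane with the largest Z among those passed


if __name__ == "__main__":
    planes = {"near": 200.0, "mid": 300.0, "far": 400.0}  # hypothetical depths
    print(deepest_plane_reached(350.0, planes))
```

A real system would probably combine this with the X, Y test so that only the plane under the fingertip is considered; that refinement is left out here for brevity.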

Claims

1. A smart glasses system for human-computer interaction, characterized in that it comprises the following parts:

Cameras and infrared LED: two cameras and one infrared LED are mounted on the smart glasses; the two cameras, a left camera and a right camera, are mounted in parallel and symmetrically at the left and right ends of the smart glasses, and the infrared LED is mounted at the center of the smart glasses;

The two cameras and the infrared LED of the smart glasses form a three-dimensional motion capture system that captures the motion trajectory and coordinates of an object within a certain three-dimensional space;

The smart glasses system includes a three-dimensional motion capture coordinate system and an image frame coordinate system, wherein the three-dimensional motion capture coordinate system is used to analyze finger motion and the image frame coordinate system is used to analyze the image frame displayed by the smart glasses;

The three-dimensional motion capture coordinate system and the image frame coordinate system both take the center of the infrared LED as the origin O and the plane that passes through O parallel to the image frame as the OXY plane, establishing a right-handed coordinate system OXYZ that serves as both the three-dimensional motion capture coordinate system and the image frame coordinate system.

2. An interaction method for a smart glasses system for human-computer interaction, characterized in that it comprises the following steps:

(1) a finger enters the capture range of the three-dimensional motion capture system formed by the two cameras and the infrared LED mounted on the smart glasses;

(5) whether a button has been selected is determined from the three-dimensional spatial coordinates of the finger; if the X and Y ranges of the fingertip's three-dimensional spatial coordinates substantially coincide with the X and Y coordinate ranges of any button in the image frame, the fingertip is located above that button;

3. The interaction method for a smart glasses system for human-computer interaction as claimed in claim 2, characterized in that step (4) comprises: reading, from the two scattering images, the X and Y coordinate ranges of the fingertip in the image frame, and then calculating the Z coordinate of the fingertip with a binocular stereo vision algorithm.

4. The interaction method for a smart glasses system for human-computer interaction as claimed in claim 3, characterized in that the Z coordinate of the fingertip is calculated with the binocular stereo vision algorithm as follows: l₁ and l₂ denote the two cameras, B is the distance between the optical centers of the two cameras, and f is the focal length of the cameras; P(x, y, z) is the spatial point of the fingertip to be determined, with (x, y, z) its three-dimensional spatial coordinates; P₁(x₁, y₁) and P₂(x₂, y₂) are the image points of P on the image planes of the two cameras; the disparity of P is d = x₁ - x₂, and the depth of P follows from the geometry as:

z = f \frac{B}{x_{1} - x_{2}} = f \frac{B}{d} .

5. The interaction method for a smart glasses system for human-computer interaction as claimed in claim 2, characterized in that step (5) comprises: when the Z coordinate of the fingertip's three-dimensional spatial coordinate range exceeds the Z coordinate of the image frame, the button has been selected.

6. The interaction method for a smart glasses system for human-computer interaction as claimed in claim 5, characterized in that step (5) further comprises: when the fingertip corresponds to more than one spatial coordinate point, taking the Z coordinates of those points and weighting them to obtain an average coordinate z̄; when z̄ exceeds the Z coordinate of the image frame in the coordinate system, the fingertip is judged to have touched the image frame displayed by the smart glasses; when, in addition, the X and Y coordinate ranges of the fingertip coincide with the X and Y coordinate ranges of any button in the image frame, the finger is judged to have clicked that button.

7. The interaction method for a smart glasses system for human-computer interaction as claimed in claim 3, characterized in that any one or more fingertips can be recognized, so that human-computer interaction with the smart glasses can support multi-touch.