CN211293894U - Mid-air handwriting interaction device - Google Patents

Mid-air handwriting interaction device

Info

Publication number
CN211293894U
Authority
CN
China
Prior art keywords
module
gesture recognition
cnn
data preprocessing
handwriting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201922072434.3U
Other languages
Chinese (zh)
Inventor
张鑫
方瑞妍
林宏辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201922072434.3U
Application granted
Publication of CN211293894U
Legal status: Active
Anticipated expiration

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The utility model discloses a mid-air handwriting interaction device, which comprises a smart glasses module, a data preprocessing module, a gesture recognition module, a fingertip detection module and a mid-air handwriting interaction module. The smart glasses module captures an egocentric RGB video stream with its RGB camera and wirelessly transmits the encoded video stream to the data preprocessing module over wifi; the data preprocessing module converts the color space from RGB to HSV and enhances the edge features of the data; the gesture recognition module uses a CNN to recognize different gestures; the fingertip detection module is implemented as a dual-cascade CNN that performs bounding-box selection and finger key-point detection; the mid-air handwriting interaction module carries out different writing operations according to the gesture recognition and fingertip detection results. The utility model enables convenient, natural, efficient and accurate writing operations in the air in a variety of spaces, controlled by simple user gestures.

Description

Mid-air handwriting interaction device
Technical Field
The utility model relates to the technical fields of computer vision recognition, deep learning and artificial intelligence, and in particular to a mid-air handwriting interaction device.
Background
In traditional human-computer interaction, the interaction between people and devices often depends on fixed means such as a remote control, a keyboard or a mouse. Although these traditional interaction modes are simple, their convenience and interaction experience are relatively poor, and it is difficult for them to meet people's needs in all respects.
Gesture recognition is a natural and intuitive mode of human-computer interaction and a research hotspot in the current field. Mid-air handwriting is a comfortable, natural and contactless means of human-computer interaction: it allows the user to write freely in the air and provides an intuitive, vivid, natural and convenient interaction experience.
Vision-based gesture interaction integrates techniques from disciplines such as image processing, machine vision and pattern recognition. It is a current research hotspot in the field of human-computer interaction and has very broad application prospects in fields such as sign language recognition and smart homes. Fingertip detection is one of the key technologies for realizing gesture interaction devices. The dynamic diversity of backgrounds, the instability of ambient light, the individual characteristics of hand shapes and the randomness of fingertip movement still make accurate detection and positioning of fingertips a challenging task.
In recent years, the rapid development of deep learning has provided new ideas for modeling the hand. Deep learning has proven very effective for computer vision problems, and a great deal of research on deep-learning-based gesture recognition and classification has been published. Nevertheless, few researchers have explored how to detect fingertips in plain RGB images without depth information, which for deep learning is a regression problem. Our task is therefore to detect the key points of the fingers, including the fingertip and the index finger joints, using RGB images taken by an egocentric motion camera.
Although mid-air handwriting devices exist in the prior art, they still suffer from challenging problems such as high price, low practicability, low recognition accuracy and low operating efficiency.
SUMMARY OF THE UTILITY MODEL
The utility model aims to overcome the above-mentioned defects in the prior art by providing a mid-air handwriting interaction device.
The purpose of the utility model is achieved by the following technical solution:
A mid-air handwriting interaction device, the device comprising: a smart glasses module, a data preprocessing module, a gesture recognition module, a fingertip detection module and a mid-air handwriting interaction module;
the smart glasses module captures an egocentric RGB video stream with its RGB camera and wirelessly transmits the encoded video stream to the data preprocessing module through its wifi unit;
the data preprocessing module, the gesture recognition module, the fingertip detection module and the handwriting interaction module run on a PC or a high-performance laptop equipped with a wifi unit;
the data preprocessing module preprocesses the received RGB video stream and sends the processed result to the fingertip detection module and the gesture recognition module. The preprocessing proceeds as follows: the video stream data is converted from the RGB to the HSV color space, and edges are enhanced based on the HSV data;
the gesture recognition module uses a CNN (Convolutional Neural Network) to recognize different gestures, covering four gesture classes: character selection, confirmation that writing is finished, clearing for rewriting, and invalid;
the fingertip detection module consists of a dual-cascade CNN formed by cascading a first-stage CNN module and a second-stage CNN module, wherein the first-stage CNN module performs bounding-box selection and the second-stage CNN module performs finger key-point detection;
the mid-air handwriting interaction module carries out different writing operations according to the outputs of the gesture recognition module and the fingertip detection module, realizing natural, convenient, efficient and accurate writing interaction between the user and the device.
Furthermore, the smart glasses module is formed by mounting a mobile camera on matching glasses, which improves the user's interactive experience, freedom, real-time performance and comfort; the acquired video stream data can be split into frames of RGB image data and transmitted to the data preprocessing module.
Further, the data preprocessing module converts the color space from RGB to HSV, performs edge enhancement separately on the three converted HSV channels, and merges the edge-enhanced result (EEE) with the original HSV data into a 6-channel HSVEEE representation that serves as the input of the network.
Further, the gesture recognition module uses the CNN to classify four gestures: character selection, confirmation that writing is finished, clearing for rewriting, and invalid.
Furthermore, the fingertip detection module consists of a dual-cascade CNN module that realizes stable, efficient and accurate detection and calibration of the finger key points; the dual-cascade CNN module is formed by cascading a first-stage CNN module and a second-stage CNN module.
Further, the first-stage CNN module roughly locates the bounding box of the hand region in the frame.
Further, the input of the second-stage CNN module is the output of the first-stage CNN module.
Compared with the prior art, the utility model has the following advantages and effects:
1. Compared with traditional modes, the utility model adopts a contactless form of human-computer interaction in which the user writes in mid-air. This interaction mode is simple and natural, easy to learn and remember, offers large free space, high flexibility and high interaction efficiency, and lets the user write in the air without any handwriting tool.
2. Compared with other indoor gesture interaction devices, the utility model offers a higher degree of freedom, fewer restrictions, a better interactive experience, higher accuracy and a relatively lower cost.
3. The utility model functions effectively under different complex backgrounds with relatively high recognition accuracy and relatively small error. On this basis, it enables the user to carry out writing interaction simply, conveniently and efficiently in a variety of spaces, which makes it easier to popularize.
Drawings
Fig. 1 is a structural diagram of a mid-air handwriting interaction device according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating the operation of the mid-air handwriting interaction device according to an embodiment of the present invention;
fig. 3 is a general workflow diagram of the gesture recognition module in the mid-air handwriting interaction device according to an embodiment of the present invention;
fig. 4 is a general workflow diagram of the fingertip detection module in the mid-air handwriting interaction device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. Based on the embodiments in the present invention, all other embodiments obtained by a person skilled in the art without creative work belong to the protection scope of the present invention.
It is noted that the word "connected" or "connecting" does not only encompass the direct connection of two entities, but also the indirect connection via other entities with beneficial and improved effects. The terms "equal," "same," "simultaneous," or other similar terms, are not limited to the absolute equality or identity of mathematical terms, but may be similar in engineering sense or within an acceptable error range when practicing the rights of this patent. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In various embodiments of the present invention, the expressions "A or B", "A or/and B" and "at least one of A or/and B" include any or all combinations of the words listed together. For example, the expression "A or B" or "at least one of A or/and B" may include A, may include B, or may include both A and B.
The term "comprises" or "comprising" used in various embodiments of the present invention indicates the presence of the disclosed functions, operations or elements, and does not limit the addition of one or more functions, operations or elements. Furthermore, as used in various embodiments of the present invention, the terms "comprises," "comprising," "includes," "including," "has," "having" and their derivatives are intended to refer only to the particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combination of the foregoing.
Examples
Fig. 1 is a structural diagram of a specific example of the mid-air handwriting interaction device of the utility model. In this embodiment, the mid-air handwriting interaction device comprises a smart glasses module 1, a data preprocessing module 2, a gesture recognition module 3, a fingertip detection module 4 and a mid-air handwriting interaction module 5.
The smart glasses module 1 is formed by mounting a mobile camera on matching glasses; video stream data is captured with the mobile camera as an egocentric viewpoint, and the module is connected to the data preprocessing module 2. The data preprocessing module 2 is connected to the smart glasses module 1, the gesture recognition module 3 and the fingertip detection module 4; it converts the color space from RGB to HSV and enhances the edges and data from HSV to HSVEEE, thereby improving the precision of the device, and it transmits the preprocessed data to the dual-cascade CNN of the fingertip detection module 4 and to the gesture recognition module 3. The gesture recognition module 3 uses a CNN to classify and recognize the character selection, writing-completion confirmation and clear-and-rewrite gestures. The fingertip detection module 4 consists of a dual-cascade CNN formed by cascading a first-stage CNN module 41 and a second-stage CNN module 42: the first-stage CNN module 41 roughly locates the bounding box of the hand region in the frame and outputs the normalized coordinates of its upper-left and lower-right corners; the input of the second-stage CNN module 42 is the output of the first-stage CNN module 41, the input image is resized to a fixed size of 99x99, and the normalized index fingertip coordinates are output for finger key-point detection. The mid-air handwriting interaction module 5 carries out different writing operations according to the outputs of the gesture recognition module 3 and the fingertip detection module 4, realizing natural, convenient, efficient and accurate writing interaction between the user and the device.
The mid-air handwriting interaction device provided by the utility model can control different handwriting operations according to the user's different gestures.
The smart glasses module 1 is formed by mounting a mobile camera on matching glasses, which improves the user's interactive experience, freedom, flexibility, real-time performance and comfort at a relatively low cost; the smart glasses module 1 can extract the acquired video stream as RGB image frames and transmit them to the data preprocessing module 2, as sketched below.
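As a non-limiting illustration of this step, the following sketch shows how the PC side could read the wifi video stream from the smart glasses and split it into RGB frames; the stream URL and the use of OpenCV's VideoCapture over an MJPEG-style endpoint are assumptions, since the patent does not specify the codec or transport.

    import cv2

    # Read the encoded wifi video stream from the smart glasses and split
    # it into RGB frames for the data preprocessing module.
    STREAM_URL = "http://192.168.1.10:8080/video"  # hypothetical glasses endpoint

    cap = cv2.VideoCapture(STREAM_URL)
    while cap.isOpened():
        ok, frame_bgr = cap.read()  # one decoded frame (OpenCV delivers BGR)
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        # ... hand frame_rgb to the data preprocessing module
    cap.release()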
The data preprocessing module 2 converts the color space from RGB to HSV, performs edge enhancement separately on the three converted HSV channels using the Laplace algorithm, and merges the edge-enhanced result (EEE) with the original HSV data into a 6-channel HSVEEE representation that serves as the input of the network, as sketched below.
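A minimal sketch of this preprocessing step follows, assuming OpenCV and a 3x3 Laplacian kernel; the patent names the Laplace algorithm but does not fix the kernel size or the exact form of the edge-enhanced channels, so the absolute Laplacian response is used here as the EEE channels.

    import cv2
    import numpy as np

    def preprocess_hsveee(frame_rgb: np.ndarray) -> np.ndarray:
        """Build the 6-channel HSVEEE network input: the three HSV channels
        plus a Laplacian edge enhancement (EEE) of each channel."""
        hsv = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2HSV)
        channels = list(cv2.split(hsv))
        # Absolute Laplacian response per channel; the 3x3 kernel is an assumption.
        edges = [cv2.convertScaleAbs(cv2.Laplacian(c, cv2.CV_16S, ksize=3))
                 for c in channels]
        return np.dstack(channels + edges)  # shape (H, W, 6)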
The gesture recognition module 3 classifies four gestures through the CNN: character selection, confirmation that writing is finished, clearing for rewriting, and invalid. The CNN in the gesture recognition module 3 is implemented with ResNet-18 and is used to extract hand features and perform gesture matching.
The data preprocessing for the gesture recognition module 3 is performed at a reduced frame rate, which improves the running speed of the device while maintaining high accuracy, thereby realizing efficient operation of the device.
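The following sketch illustrates one way the gesture classifier could be built, assuming a torchvision ResNet-18 whose first convolution is widened to accept the 6-channel HSVEEE input; the patent specifies ResNet-18 and the four gesture classes, while the input adaptation and the image size are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    NUM_CLASSES = 4  # select character, confirm finished, clear/rewrite, invalid

    def build_gesture_cnn() -> nn.Module:
        """ResNet-18 gesture classifier; the first convolution is replaced
        so the network accepts the 6-channel HSVEEE input (an assumption)."""
        model = resnet18(weights=None)
        model.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3,
                                bias=False)
        model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
        return model

    # Classify only every k-th frame, matching the reduced-frame-rate
    # processing described above.
    model = build_gesture_cnn().eval()
    with torch.no_grad():
        frame = torch.randn(1, 6, 224, 224)  # dummy stand-in for one HSVEEE frame
        gesture_id = model(frame).argmax(dim=1).item()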
The fingertip detection module 4 consists of a dual-cascade CNN module that realizes stable, efficient and accurate detection and calibration of the finger key points; the dual-cascade CNN module is formed by cascading a first-stage CNN module 41 and a second-stage CNN module 42.
The first-stage CNN module 41 learns hand features and roughly locates the bounding box of the hand region in the frame; it uses smaller kernels, a smaller number of outputs and more detailed connections, which reduces background interference, improves precision and raises operating efficiency.
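As an illustration of how the first stage's output could feed the second stage, the following hypothetical helper crops the hand region from the frame using the normalized upper-left and lower-right corners described above; the corner order and the clamping to [0, 1] are assumptions.

    import numpy as np

    def crop_hand(frame: np.ndarray, bbox_norm: np.ndarray) -> np.ndarray:
        """Crop the hand region selected by the first-stage CNN, given the
        normalized (x1, y1, x2, y2) upper-left and lower-right corners."""
        h, w = frame.shape[:2]
        x1, y1, x2, y2 = np.clip(bbox_norm, 0.0, 1.0)
        return frame[int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]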
The second-stage CNN module comprises six convolutional layers with a pooling layer after every pair of convolutional layers; its input is the output of the first-stage CNN module, and its output is the normalized index fingertip coordinates.
The second-stage CNN module adds an extra fully connected branch; by combining the two fully connected branches, the second-stage CNN module achieves better results.
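The sketch below shows one possible realization of the second-stage CNN under the stated constraints: a 99x99 input, six convolutional layers with a pooling layer after every pair, an extra fully connected branch combined with the main branch, and normalized fingertip coordinates as output. The channel widths, the 6-channel input and the combination of the branches by averaging are assumptions not fixed by the patent.

    import torch
    import torch.nn as nn

    class SecondStageCNN(nn.Module):
        """Six convolutional layers with a pooling layer after every pair,
        plus an extra fully connected branch combined with the main branch;
        outputs normalized index fingertip coordinates."""
        def __init__(self):
            super().__init__()
            def pair(c_in, c_out):
                return nn.Sequential(
                    nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2))
            self.features = nn.Sequential(pair(6, 32), pair(32, 64),
                                          pair(64, 128))
            flat = 128 * 12 * 12  # 99 -> 49 -> 24 -> 12 after three poolings
            self.main_fc = nn.Sequential(nn.Linear(flat, 256), nn.ReLU(),
                                         nn.Linear(256, 2))
            self.branch_fc = nn.Linear(flat, 2)  # the extra FC branch

        def forward(self, x):
            f = self.features(x).flatten(1)
            # Average the two branches; the sigmoid keeps the fingertip
            # coordinates normalized to [0, 1].
            return torch.sigmoid((self.main_fc(f) + self.branch_fc(f)) / 2)

    # The hand crop from the first stage is resized to the fixed 99x99 input.
    tip_xy = SecondStageCNN()(torch.randn(1, 6, 99, 99))  # dummy input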
The mid-air handwriting interaction module is connected to the gesture recognition module and the fingertip detection module and realizes the writing operations of writing, selecting characters, confirming that writing is finished, and clearing for rewriting, thereby realizing natural, convenient, efficient and accurate writing interaction between the user and the device.
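To make the interaction logic concrete, the following hypothetical dispatch sketch maps the recognized gesture class and the detected fingertip position to the writing operations named above; the class ordering and the rule that the stroke grows while no command gesture is shown are assumptions.

    # Gesture class order is an assumption.
    SELECT, CONFIRM, CLEAR, INVALID = range(4)

    class AirWriter:
        """Maps gesture classes and fingertip positions to writing operations."""
        def __init__(self):
            self.stroke = []     # fingertip points of the character in progress
            self.committed = []  # finished characters

        def handle(self, gesture: int, tip_xy: tuple) -> None:
            if gesture == INVALID:        # no command gesture: keep writing
                self.stroke.append(tip_xy)
            elif gesture == SELECT:
                pass                      # choose among candidate characters
            elif gesture == CONFIRM:      # writing finished: commit the stroke
                self.committed.append(self.stroke)
                self.stroke = []
            elif gesture == CLEAR:        # clear and rewrite
                self.stroke = []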
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.

Claims (3)

1. A mid-air handwriting interaction device, characterized by comprising a smart glasses module, a data preprocessing module, a gesture recognition module, a fingertip detection module and a mid-air handwriting interaction module;
the smart glasses module comprises an RGB camera, a wifi unit, a processor unit and a battery unit, wherein the RGB video stream collected by the RGB camera is transmitted to the connected processor unit for compression coding and is then wirelessly transmitted by the wifi unit to the data preprocessing module, and the battery unit supplies power to the smart glasses module;
the data preprocessing module, the gesture recognition module, the fingertip detection module and the handwriting interaction module run on a PC or a high-performance laptop equipped with a wifi unit;
the data preprocessing module preprocesses the received RGB video stream and sends the processed result to the fingertip detection module and the gesture recognition module;
the gesture recognition module uses a CNN to recognize different gestures;
the fingertip detection module consists of a dual-cascade CNN formed by cascading a first-stage CNN module and a second-stage CNN module, wherein the first-stage CNN module performs bounding-box selection and the second-stage CNN module performs finger key-point detection;
the mid-air handwriting interaction module is connected to the gesture recognition module and the fingertip detection module so as to realize different writing operations.
2. The mid-air handwriting interaction device according to claim 1, wherein the smart glasses module realizes an egocentric RGB image capture viewing angle by fixing a mobile camera in the middle of the glasses frame, the wifi unit is disposed in the left temple of the glasses, and the processor unit and the battery unit are disposed in the right temple of the glasses.
3. The mid-air handwriting interaction device according to claim 1, wherein the gesture recognition module classifies four gestures, namely character selection, confirmation that writing is finished, clearing for rewriting, and invalid.
CN201922072434.3U 2019-11-27 2019-11-27 Mid-air handwriting interaction device Active CN211293894U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201922072434.3U CN211293894U (en) 2019-11-27 2019-11-27 Mid-air handwriting interaction device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201922072434.3U CN211293894U (en) 2019-11-27 2019-11-27 Mid-air handwriting interaction device

Publications (1)

Publication Number Publication Date
CN211293894U true CN211293894U (en) 2020-08-18

Family

ID=72019001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201922072434.3U Active CN211293894U (en) 2019-11-27 2019-11-27 Hand-written interaction device in air

Country Status (1)

Country Link
CN (1) CN211293894U (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113253837A (en) * 2021-04-01 2021-08-13 Zuoyebang Education Technology (Beijing) Co., Ltd. Air writing method and device, online live broadcast system and computer equipment
CN113052112A (en) * 2021-04-02 2021-06-29 North China University of Technology Gesture action recognition interaction system and method based on hybrid neural network
CN113052112B (en) * 2021-04-02 2023-06-02 North China University of Technology Gesture motion recognition interaction system and method based on hybrid neural network

Similar Documents

Publication Publication Date Title
CN102854983B (en) A kind of man-machine interaction method based on gesture identification
Almeida et al. Feature extraction in Brazilian Sign Language Recognition based on phonological structure and using RGB-D sensors
Garg et al. Vision based hand gesture recognition
Murthy et al. A review of vision based hand gestures recognition
CN109145802B (en) Kinect-based multi-person gesture man-machine interaction method and device
CN103135753A (en) Gesture input method and system
CN110569817A (en) system and method for realizing gesture recognition based on vision
CN211293894U (en) Mid-air handwriting interaction device
CN109919077A (en) Gesture recognition method, device, medium and calculating equipment
CN113723327A (en) Real-time Chinese sign language recognition interactive system based on deep learning
Qi et al. Approach to hand posture recognition based on hand shape features for human–robot interaction
Gajjar et al. Hand gesture real time paint tool-box: Machine learning approach
Niranjani et al. System application control based on Hand gesture using Deep learning
Banerjee et al. A review on artificial intelligence based sign language recognition techniques
Jadhav et al. Hand gesture recognition system to control slide show navigation
CN111665934A (en) Gesture recognition system and method based on ZYNQ software and hardware coprocessing
Basha et al. Speaking system to mute people using hand gestures
Khan et al. Computer vision based mouse control using object detection and marker motion tracking
Thomas et al. A comprehensive review on vision based hand gesture recognition technology
Xu et al. Skeleton guided conflict-free hand gesture recognition for robot control
Lei et al. Applications of hand gestures recognition in industrial robots: a review
Dhamanskar et al. Human computer interaction using hand gestures and voice
Cui et al. Research on gesture recognition method based on computer vision technology
Chaudhary Finger-stylus for non touch-enable systems
Dudhapachare et al. Voice Guided, Gesture Controlled Virtual Mouse

Legal Events

Date Code Title Description
GR01 Patent grant