CN110837301A - Data glove for gesture recognition and gesture recognition method - Google Patents

Data glove for gesture recognition and gesture recognition method

Info

Publication number
CN110837301A
CN110837301A (application CN201911123406.8A)
Authority
CN
China
Prior art keywords
gesture recognition
neural network
network model
local
glove
Prior art date
Legal status
Pending
Application number
CN201911123406.8A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee
Beijing Huayan Mutual Entertainment Technology Co Ltd
Original Assignee
Beijing Huayan Mutual Entertainment Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Huayan Mutual Entertainment Technology Co Ltd filed Critical Beijing Huayan Mutual Entertainment Technology Co Ltd
Priority to CN201911123406.8A priority Critical patent/CN110837301A/en
Publication of CN110837301A publication Critical patent/CN110837301A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/014: Hand-worn input/output arrangements, e.g. data gloves
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Abstract

The invention discloses a data glove for gesture recognition and a gesture recognition method. The data glove comprises a hand-shaped silicone sensor array and an elastic textile fabric. The gesture recognition method comprises: obtaining a training data set; training the parameter values of the target parameters of a neural network model with the training data set to obtain a trained neural network model; and inputting the readings of the target local stretch sensors into the trained neural network model for prediction to obtain target gesture parameters, from which the target hand pose is determined. The data glove provided by the invention is light and thin, comfortable to wear, requires no external optical device, is inexpensive, and provides high pose reconstruction accuracy, solving the prior-art problems of large size and low reconstruction accuracy caused by the need for external optical or inertial devices.

Description

Data glove for gesture recognition and gesture recognition method
Technical Field
The invention relates to the field of gloves, in particular to a data glove for gesture recognition and a gesture recognition method.
Background
The hand is the primary means by which we manipulate objects and communicate with each other. Many applications, such as games, robotics, biomechanical analysis, rehabilitation, and emerging human-machine interaction modalities such as augmented and virtual reality (AR/VR), rely heavily on precise methods for recovering the full-hand pose.
Existing hand pose reconstruction methods are based either on externally placed visual sensors or on sensors embedded in data gloves. Most gloves use three types of sensors: IMUs (inertial measurement units), bending (flex) sensors, and strain (stretch) sensors. However, these prior-art hand pose reconstruction methods still have certain problems:
(1) Vision-based methods. Many vision-based hand pose estimation methods in computer vision build on camera tracking, such as the marker-based MOCAP approach (e.g., Vicon), which requires expensive infrastructure and markers placed on the user. An externally mounted camera must keep the entire hand visible in the image, a limitation that is a practical obstacle for many applications, particularly those with severe occlusion, such as interacting with objects, wearing gloves or other clothing, or working in cluttered environments. Such methods are therefore limited to applications with controlled environments.
(2) IMU sensor gloves. An IMU combines a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer; the main drawback of IMU gloves for hand pose estimation is that the units are rigid and bulky compared with the size of human fingers.
(3) Flex sensor gloves. Flex sensors have been applied very successfully in commercial products such as the CyberGlove, the VPL glove, the 5DT glove, and more recently the ManusVR glove. Such gloves typically carry 5 to 22 sensors and achieve low hand pose reconstruction accuracy, while the human hand has at least 25 degrees of freedom. Larger sensing elements are difficult to place, often complicate the glove design, which in turn increases manufacturing cost, and may hinder dexterity and natural hand motion.
(4) Strain sensor gloves. Most current strain sensor gloves are resistive, using piezoresistive materials, elastic conductive yarns, or channels of conductive fluid; however, many resistive sensors suffer from hysteresis.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a data glove for gesture recognition and a gesture recognition method, so as to solve the problems of large glove size and low reconstruction accuracy caused by the need for external optical or inertial devices in the related art.
In a first aspect, an embodiment of the present invention provides a data glove for gesture recognition, including:
a hand-shaped silicone sensor array comprising a first protective layer, a first conductive layer, a dielectric layer, a second conductive layer, and a second protective layer arranged in sequence, wherein the first and second conductive layers are formed from a mixture of silicone and carbon black and are arranged to overlap, forming a plurality of overlap regions; the overlap regions form a plurality of local capacitors that serve as local stretch sensors, and when the shape of a local stretch sensor changes, its capacitance changes accordingly;
an elastic textile fabric composed of a plurality of textile components custom-cut with a laser cutter, the textile components being attached to the hand-shaped silicone sensor array and closed to form a wearable glove.
Furthermore, there are 44 local stretch sensors, located respectively on each finger, on each main finger joint, between the fingers, and on the back of the hand.
Further, the thickness of the first conductive layer and the second conductive layer is 220 μm.
Further, both sides of each local stretch sensor are provided with thin cuts with circular ends.
Further, the elastic textile fabric is composed of 1 palm textile part, 3 flap textile parts and 5 finger textile parts.
In a second aspect, an embodiment of the present invention provides a method for performing gesture recognition using any one of the data gloves for gesture recognition described above, the method including:
acquiring a training data set, wherein the training data set comprises a local stretch sensor reading and a gesture parameter corresponding to the local stretch sensor reading;
training parameter values of target parameters of a neural network model by using the training data set to obtain the trained neural network model;
and inputting the reading of the target local stretching sensor into the trained neural network model for prediction to obtain a target gesture parameter, and determining the target hand posture according to the target gesture parameter.
Further, prior to the step of training parameter values of target parameters of a neural network model using the training data set, the method further comprises:
rejecting abnormal data in the training data set;
normalizing the local stretch sensor readings.
Further, the step of normalizing the local stretch sensor readings comprises:
obtaining a maximum and a minimum of the local stretch sensor readings using a median filter;
normalizing the local stretch sensor readings to the range -1 to 1 using the maximum and minimum values.
Further, the neural network model is a U-Net convolutional neural network model, and the step of training parameter values of target parameters of the neural network model using the training data set includes:
constructing a U-Net convolutional neural network model, and training the parameter values of the target parameters of the U-Net convolutional neural network model using the training data set.
In a third aspect, an embodiment of the present invention provides a storage medium including a stored program, where the program performs any one of the above-mentioned methods.
In a fourth aspect, an embodiment of the present invention provides a processor, where the processor is configured to execute a program, where the program executes any one of the above methods.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including: one or more processors, memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the above-described methods.
Advantageous effects
The data glove for gesture recognition provided by the invention is light and thin, comfortable to wear, requires no external optical device, is inexpensive, and provides high pose reconstruction accuracy. Because the hand-shaped silicone sensor array and the elastic textile fabric can stretch and deform, the data glove can be put on and taken off easily without sacrificing a snug fit, and adapts well to a wide range of hand sizes.
Compared with vision-based methods, our gesture recognition relies only on the internal sensor readings and, once trained, requires no additional external infrastructure, opening the door to use cases where traditional motion capture methods are not applicable.
In contrast to flex sensor gloves, our design contains only one layer of silicone composite, and the number of sensing elements is limited only by the available surface area and the wiring space for the connecting traces.
According to the gesture recognition method provided by the invention, geometric and topological prior knowledge is built into the neural network model, which regularizes the learning process and improves reconstruction performance; removing outliers and applying min-max normalization improves the quality of the training data; and training a U-Net convolutional neural network model makes the predicted poses visually more stable.
Drawings
FIG. 1 is a schematic diagram of a data glove for gesture recognition according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the hand-shaped silicone sensor array according to an embodiment of the invention;
FIG. 3 is the pattern of the second conductive layer according to an embodiment of the present invention;
FIG. 4 is the pattern of the first conductive layer according to an embodiment of the present invention;
FIG. 5 is a schematic view of the local stretch sensor layout according to an embodiment of the invention;
FIG. 6 is a graph of the number of sensor units versus the average error according to an embodiment of the present invention;
FIG. 7 is a schematic representation of the elastic textile fabric structure according to an embodiment of the present invention;
FIG. 8 is a schematic view of the elastic textile fabric aligned with the hand-shaped silicone sensor array according to an embodiment of the present invention;
FIG. 9 shows the finished glove after the elastic textile fabric is attached and closed around the hand-shaped silicone sensor array, in accordance with an embodiment of the present invention;
FIG. 10 is a flow diagram of a method of gesture recognition according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the mapping of the local stretch sensors to a two-dimensional grid in accordance with an embodiment of the invention;
FIG. 12 is a schematic diagram of the mapping of pose parameters to a two-dimensional grid in accordance with an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In an implementation of the present invention, there is provided a data glove for gesture recognition, the data glove comprising:
a hand-shaped silicone sensor array 10, the hand-shaped silicone sensor array 10 comprising a first protective layer 11, a first conductive layer 12, a dielectric layer 13, a second conductive layer 14, and a second protective layer 15 arranged in sequence, wherein the first conductive layer 12 and the second conductive layer 14 are formed from a mixture of silicone and carbon black and are arranged to overlap, forming a plurality of overlap regions; the overlap regions form a plurality of local capacitors that serve as local stretch sensors 15, and when the shape of a local stretch sensor 15 changes, its capacitance changes accordingly.
An elastic textile fabric 20 composed of a plurality of textile components custom-cut with a laser cutter, the textile components being attached to the hand-shaped silicone sensor array 10 and closed to form a wearable glove.
The present invention is based on two observations made by the inventors: (1) it has recently become possible to produce soft, stretchable sensor arrays entirely from silicone; and (2) modern data-driven techniques can map the resulting sensor readings directly to hand poses, without an intermediate skeletal calibration.
Fig. 1 shows a schematic view of a data glove for gesture recognition according to an embodiment of the present invention. As shown in fig. 1, the glove is composed of a stretchable hand-shaped silicone sensor array 10 and a thin, custom-made elastic textile fabric 20. The hand-shaped silicone sensor array 10 consists of 5 layers; the first conductive layer 12 and the second conductive layer 14 carry embedded conductive trace patterns which, where they overlap, form local capacitors, called local stretch sensors 15, as shown in fig. 2.
To avoid the hysteresis of resistive sensors, we use capacitive local stretch sensors 15. These exploit shape variability: any change in shape, such as the width w, the length l, or the inter-plate distance d, changes the capacitance

C = ε_r ε_0 A / d = ε_r ε_0 l w / d,

where ε_r and ε_0 are constants. The plate area can therefore be estimated by continuously measuring the capacitance:

A(C) = sqrt(C V / (ε_r ε_0)),

provided that the volume V of the dielectric is conserved, i.e. A · d = V = const.
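As a worked example of the formulas above, the following sketch computes the plate capacitance and recovers the plate area from a measured capacitance under volume conservation; the plate dimensions and the relative permittivity of the silicone dielectric are illustrative assumptions, not values from the patent.

```python
# Worked sketch of the capacitance model above (illustrative values, SI units).
import math

EPS_0 = 8.854e-12   # vacuum permittivity, F/m
EPS_R = 2.8         # assumed relative permittivity of the silicone dielectric

def capacitance(length, width, gap):
    """Parallel-plate capacitance C = eps_r * eps_0 * l * w / d."""
    return EPS_R * EPS_0 * length * width / gap

def area_from_capacitance(c, volume):
    """Invert C = eps_r*eps_0*A^2/V (volume conservation A*d = V) for A."""
    return math.sqrt(c * volume / (EPS_R * EPS_0))

l, w, d = 0.010, 0.004, 0.0002      # 10 mm x 4 mm plates, 0.2 mm gap (assumed)
v = l * w * d                       # conserved dielectric volume
c = capacitance(l, w, d)
print(area_from_capacitance(c, v))  # recovers A = l*w = 4.0e-05 m^2
```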
Traditionally, capacitive strain sensors are fabricated separately, each connected to its own pair of conductive traces for readout. To increase the number of sensors in a given area, we instead arrange the traces in a grid structure. A local capacitor, also called a sensor cell, is formed wherever two traces overlap, and each pair of traces overlaps at most once. The number of conductors required is thus the sum of the grid's row and column counts rather than their product. This space-saving design allows us to place up to 7 sensors on a thin object such as a finger, and the 44 sensors on our glove need only 27 wires in 2 layers, compared with 45 for a non-matrix approach, a reduction of 40%.
The matrix layout means that the sensor cells cannot be read out directly. We therefore introduce a time-multiplexed readout scheme in which, for each measurement, a voltage is applied to a subset of the traces while the remaining traces are grounded. This forms a temporary (composite) capacitor whose capacitance is measured. The measured composite capacitance values C_m are linearly related to the desired individual capacitance values C_c:

M C_c = C_m

M is a rectangular matrix whose rows encode all possible measurement combinations, converting the sensor cell capacitances C_c into the measured composite capacitances C_m. The rows of M are formed by iterating over pairs consisting of one top-layer trace and one bottom-layer trace connected as source electrodes, with all remaining traces connected as ground electrodes. Our glove layout has 15 traces on the bottom layer and 12 traces on the top layer, giving 180 = 15 × 12 rows. Each row corresponds to one measured value, so 180 measurement combinations are needed for the 44 sensor units in our glove design. The linear system above is overdetermined by design (for better robustness) and is solved in the least-squares sense. We obtain each capacitance by measuring its charge time.
We require only repeatable readings; they need not correspond directly to physical stretch values. This allows us to use lower charging resistances (47 kΩ and 220 kΩ) and thus improve the readout rate. Furthermore, we do not solve for C_c after every complete combined measurement cycle (180 updates); instead, we solve once every 16 updates. We found experimentally that this scheme gives good sensor readout performance, and that solving more frequently hurts the frame rate because of the microcontroller-to-host communication bottleneck. Our readout scheme achieves a capture rate of about 60 Hz. To filter noise in the readings, we filter C_m over the last five frames before solving for C_c.
The data glove provided by the embodiment of the invention senses the local stretch applied to the embedded silicone sensors by measuring their change in capacitance. These stretch-driven sensors are small, flexible, and low cost.
The readings C_c can be fed to a deep neural network for training; the network outputs hand poses, which an application can then query, for example to render hands in virtual reality or to perform collision detection with virtual objects for interaction. In our live experiments, the predicted hand pose is additionally smoothed with a filter.
To cover the degrees of freedom of a full hand, we use 44 local stretch sensors 15 to make a complete glove, as shown in fig. 5, enabling continuous full-hand pose estimation; our sensor design contains almost three times as many local stretch sensors 15 as the closest comparable design.
Fig. 3 shows the pattern of the second conductive layer 14, fig. 4 shows the pattern of the first conductive layer 12, and fig. 5 shows the layout of the local stretch sensors 15 formed where the first conductive layer 12 and the second conductive layer 14 overlap. We designed this layout manually, adding the sensors in stages: (i) longer sensors corresponding directly to the major joints of the fingers (21-24, 32-36, 40-42) and thumb (0, 20); (ii) abduction sensors between the fingers (16, 25-27); (iii) vertical sensors on the fingers (8-9, 29-31, 37-39, 43) and thumb (1, 28); and (iv) a regular grid of horizontal (2, 4, 7, 10, 17-19) and vertical (3, 5, 6, 11-15) sensors on the back of the hand. Fig. 6 plots the number of sensor units against the average error; as the number of sensors increases, the average reconstruction error of the acquisition session decreases, from 8.67 degrees with only 14 sensors covering the main joints to 6.75 degrees with all 44 sensors of the full glove, a large improvement in reconstruction accuracy. Finally, the sensors are connected by two layers of wires such that each pair of connected traces (from different layers) overlaps at most once. In determining the final sensor positions, we considered reducing lead length and avoiding stretch absorption by nearby cuts; for these reasons, for example, the sensors on the knuckles (32-36, 40-42) are not centered but leave some margin.
When the glove is worn on the user's hand, the sensors are slightly pre-stretched, which gives good sensitivity for finger abduction. It is therefore crucial to manufacture the sensor array in the rest pose shown in fig. 5; in particular, the fingers must be parallel with no gaps between them.
To enhance wearing comfort, thin cuts 16 with circular ends can be added by laser cutting on two sides of each rectangular sensor, as shown in fig. 5; the cuts 16 are placed above and below or to the left and right of a sensor and improve comfort by increasing ventilation. Since they reduce the resistance to stretching, they also have a slight but positive effect on the readings, making each sensor more sensitive to stretch parallel to its cuts. For example, sensors 21, 33, and 40, located above the index finger joints, are less sensitive to changes in finger volume, while sensors such as 43, 37, and 29 are primarily sensitive to changes in finger volume or diameter (e.g., due to muscle distension). In fig. 5, sensors more sensitive to vertical stretch are shown in dark colors (e.g., 40, 41, 42) and sensors more sensitive to horizontal stretch in light colors (e.g., 29, 37, 43).
Our glove can be made using only the tools of a modern fabrication lab. It is manufactured in a two-stage process: first we make a soft silicone sensor array covering the back of the hand, then we make an elastic textile fabric 20 consisting of several custom textile components cut with a laser cutter; we attach the textile components to the silicone sheet and close them to form a soft wearable glove.
The hand-shaped silicone sensor array 10 may be produced layer by layer using the following steps.
First, we cast an insulating base layer on a glass sheet, controlling the thickness by applying tape to the edges of the glass. Next, a conductive layer of RTV 4420 silicone mixed with carbon black (a conductive powder) is cast directly onto the first layer. The negative of the pattern shown in fig. 4 is then removed by repeated laser etching (5 passes), leaving the conductive traces on the base layer. A layer of pure silicone dielectric is then cast, followed by another conductive layer, which is likewise etched (fig. 3). Finally, an insulating shielding layer is added.
The thickness of each conductive layer is preferably 220 μm, which allows the required trace width to be only 2 mm. To expose the connection pads at the bottom of the sensor, the pads are covered with thin tape before casting (for the last three layers); the tape is removed before curing in the oven.
The laser cutter parameters in the etching step were power 30, speed 40, and PPI 500; a Trotec Speedy 300 laser cutter can be used. Using higher power during etching can cause the silicone to cross-link with the substrate glass, making the sensor difficult to peel off. After each complete etch cycle, the sensor was carefully wiped with a towel and isopropyl alcohol to remove dust residue.
After each casting step, the sensor was cured in an oven at 90 °C for 20 minutes. Before oven curing, the sensor must rest for 15 minutes to let the solvent evaporate; otherwise, bubbles may form during curing as solvent evaporates from the interior while the top of the layer has already cured.
Finally, the sensor was cut into a hand shape with the laser cutter. Accurate alignment of the etching and cutting steps in the laser cutter is critical to avoid cutting into the sensors, as this can cause short circuits between the conductive layers. The total thickness of our sensor is 0.85 mm.
The silicone sensor array by itself is not wearable: it is not easy to attach it firmly to the hand, and gluing two pieces of silicone together is a difficult task. Putting on or taking off such a glove would also be cumbersome because of its high friction and tightness. We attempted to attach the sensors to a standard glove, but finding the proper alignment with the centers of the major joints is challenging and hard to achieve with the required robustness and repeatability.
We therefore propose a simpler and more effective solution: cutting customized textile pattern components with a laser cutter. As shown in fig. 7, the elastic textile fabric 20 can be composed of 1 large palm component 21, 5 finger components 23, and 3 flap components 22 for attachment; these can be attached to the silicone sensor while it rests on a flat surface. First, a PET mask covering the sensors and the cuts is placed over the array; then everything is coated with silicone adhesive; finally, the mask is carefully removed and the textile components are placed and firmly fixed. Fig. 8 shows the elastic textile fabric 20 aligned with the flat hand-shaped silicone sensor array 10, and fig. 9 shows the finished glove after the fabric 20 is closed around the array 10 with textile glue. We close the different textile parts with HT 2 textile glue and seal the seams with an electric iron. We use a 0.35 mm thick, highly elastic textile (80% polyamide and 20% elastane). Finally, we attach a wrist strap with a Velcro fastener to improve tightness and ensure repeatable alignment of the sensor units with the joints. The final data glove, shown schematically in fig. 1, is easy to wear, unobtrusive, easy to manufacture, and inexpensive: a thin glove formed by 44 individual stretch sensors on a silicone sensor array attached to an elastic fabric. It weighs only 50 g and is only 1.2 mm thick, so it can be worn comfortably even for long periods and adapts well to many hand sizes.
The data glove for gesture recognition provided by the embodiment of the invention is light and thin, comfortable to wear, requires no external optical device, is inexpensive, and provides high pose reconstruction accuracy. Because the hand-shaped silicone sensor array 10 and the elastic textile fabric 20 can stretch and deform, the data glove can be put on and taken off easily without sacrificing a snug fit, and adapts well to a wide range of hand sizes.
In an implementation of the present invention, there is also provided a method of gesture recognition using any one of the above data gloves for gesture recognition; as shown in fig. 10, the method includes the following steps:
Step S100: acquire a training data set, wherein the training data set comprises readings of the local stretch sensors 15 and the gesture parameters corresponding to those readings;

Step S102: train the parameter values of the target parameters of a neural network model using the training data set, to obtain a trained neural network model;

Step S104: input the readings of the target local stretch sensors 15 into the trained neural network model for prediction to obtain target gesture parameters, and determine the target hand pose from the target gesture parameters.
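For illustration, the following is a minimal end-to-end sketch of steps S100 to S104 in PyTorch; it uses random placeholder tensors and a toy fully connected model rather than the U-Net described below, purely to show the train/predict data flow.

```python
# Minimal sketch of steps S100-S104 (placeholder data, toy model).
import torch
from torch.utils.data import DataLoader, TensorDataset

# Step S100: training set of sensor readings and corresponding gesture parameters
readings = torch.randn(1000, 44)   # 44 local stretch sensor readings per frame
poses = torch.randn(1000, 25)      # 25 gesture (pose) parameters per frame
loader = DataLoader(TensorDataset(readings, poses), batch_size=64, shuffle=True)

# Step S102: train the target parameters of a neural network model
model = torch.nn.Sequential(
    torch.nn.Linear(44, 64), torch.nn.ReLU(), torch.nn.Linear(64, 25))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for x, y in loader:
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.step()

# Step S104: predict target gesture parameters from new sensor readings
with torch.no_grad():
    target_pose = model(torch.randn(1, 44))
```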
Pose reconstruction is a complex task because the local stretch sensors 15 are not in one-to-one correspondence with the degrees of freedom of the hand. The inventors found that a data representation based on prior knowledge of geometric neighborhoods and spatial correspondences lets the neural network relate the joints of the human hand to the input and output domains more efficiently. Obtaining a large and diverse training data set for hand pose estimation is a well-known challenge due to the lack of practical acquisition methods. While this is particularly severe for (2D) image-based methods (which use no instrumentation), we observe that our data glove design is so unobtrusive that it is effectively invisible to a depth camera. This lets us capture training data effectively using inexpensive, off-the-shelf hand-tracking systems.
To improve data quality, the method is calibrated by removing outliers and applying min-max normalization; that is, the input to the neural network model is processed and mapped sensor data. We remove frames that are likely outliers by detecting finger collisions, since these represent infeasible poses: we filter out frames with a collision energy greater than 80, which indicates that the estimated pose is probably unnatural and erroneous. This filter removes only about 2% of the data. Ideally, the readings of each sensor should be normalized so that they are insensitive to hand size. We observed that the minimum and maximum magnitudes of each sensor's readings are fixed once the glove is worn, and these can be used to normalize the sensor data; per-sensor min-max calibration is thus a reasonable trade-off between cost and accuracy. The key is to find the minimum and maximum values after the glove is put on. In practice, we propose a short calibration phase in which the user freely explores various extreme poses to find each sensor's maximum and minimum, which we then use to normalize the sensor data to the range [-1, 1]. To make this process more robust, we extract the minima and maxima with a median filter (over 20 frames). This simple calibration method is very effective in practice, because the compliance and snug fit of our data glove provide correct alignment.
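A sketch of this calibration in NumPy follows; the 20-frame median window and the [-1, 1] target range come from the text, while the function names, the data layout (frames × 44 sensors), and the random placeholder recordings are illustrative assumptions.

```python
# Minimal sketch of the min-max calibration described above.
import numpy as np

def rolling_median(x, k=20):
    """Median over a sliding window of up to k frames, per sensor channel."""
    return np.array([np.median(x[max(0, i - k + 1): i + 1], axis=0)
                     for i in range(len(x))])

def calibrate(exploration):
    """Per-sensor min/max from a short free-exploration recording."""
    filtered = rolling_median(exploration)      # robust against spikes
    return filtered.min(axis=0), filtered.max(axis=0)

def normalize(raw, lo, hi):
    """Map each sensor's readings to [-1, 1] using its own min/max."""
    return 2.0 * (raw - lo) / (hi - lo) - 1.0

exploration = np.random.rand(1200, 44)          # ~20 s at 60 Hz (placeholder)
lo, hi = calibrate(exploration)
normalized = normalize(np.random.rand(300, 44), lo, hi)
```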
Our experiments use a large data set captured from 10 people (unless noted otherwise), covering a wide variety of hand sizes and shapes: hand lengths from 17 to 20.5 cm, widths from 9 to 11 cm, and aspect ratios from 1.6 to 2.1. For each person, we captured five sessions using our data collection setup, each lasting approximately 300 seconds. In three of the five sessions, the participant kept the glove on continuously, while between the other two sessions the glove was removed. We refer to these two regimes as intra-session and inter-session, respectively. To encourage participants to fully explore the space of hand poses, we presented a printed library of example poses during each recording session.
For the n frames in the training data, the input to our regression model, X ∈ R^(n×44), consists of the readings of the 44 stretch sensors, and the target output, Y ∈ R^(n×25), consists of the 25 gesture parameters covering the complete pose degrees of freedom of the hand.
The spatial correspondence between input and output features is conveniently taken into account: meaningful feature ordering and organization makes the learning task easier. For example, the group of sensors near the thumb (sensor units 0, 1, 2, 11, 16, 20, 28 in fig. 5), taken together, should have a higher influence on the thumb's prediction. At the same time, some high-level gestures, such as a fist, produce more uniform sensor activations that require global encoding, so it is difficult to define these interdependencies a priori. A fully connected network could in theory learn this global-local structure, but in practice that demands large model capacity, much training data, and extensive hyperparameter tuning. Instead, we build this geometric and topological prior knowledge directly into the network structure to regularize the learning process and improve reconstruction performance.
We use a convolutional neural network (CNN) with two-dimensional grid representations as input and regression target. More specifically, as shown in figs. 11-12, we organize the input and output data in 5 × 5 matrices. Fig. 11 shows how we map the readings of the local stretch sensors 15 to a two-dimensional grid, and fig. 12 shows how we map the pose parameters to a two-dimensional grid; each mapping captures the spatial relationships. We use one matrix to organize the output but two matrices to organize the input, because at each sensing location there are two sensors, measuring horizontal and vertical stretch. For example, sensors 29 and 33 are both located around the index finger joint, but each captures a different stretch direction.
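The sketch below illustrates this packing of the 44 readings into two 5 × 5 input channels; the index tables are placeholders, since the actual cell-to-sensor assignment follows fig. 11.

```python
# Illustrative packing of 44 sensor readings into two 5x5 grids: one channel
# for horizontally sensitive sensors, one for vertically sensitive ones.
import numpy as np

H_IDX = np.full((5, 5), -1, dtype=int)   # horizontal-stretch sensor id per cell
V_IDX = np.full((5, 5), -1, dtype=int)   # vertical-stretch sensor id per cell
H_IDX[0, 0], V_IDX[0, 0] = 29, 33        # e.g. the two sensors at one knuckle

def to_grid(readings):
    """readings: (44,) normalized values -> (2, 5, 5) network input."""
    grid = np.zeros((2, 5, 5))
    for ch, idx in enumerate((H_IDX, V_IDX)):
        mask = idx >= 0                   # cells that actually hold a sensor
        grid[ch][mask] = readings[idx[mask]]
    return grid

grid_input = to_grid(np.random.uniform(-1, 1, 44))
```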
We use the U-Net network architecture to convert the organized sensor readings into hand pose parameters. The downsampling and upsampling paths of the network encode global information, while the symmetric skip connections between encoder and decoder preserve local correspondences.
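A minimal U-Net-style sketch for these 5 × 5 grids follows; the layer widths and depth are assumptions (the embodiment only states that a roughly 13M-parameter U-Net was used), so this shows the encoder-skip-decoder structure rather than the actual architecture.

```python
# Tiny U-Net-style sketch for (2, 5, 5) sensor grids -> (1, 5, 5) pose grid.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())   # 5x5 -> 3x3
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1), nn.ReLU())  # 3x3 -> 5x5
        self.dec = nn.Conv2d(32, 1, 3, padding=1)  # 32 = 16 (skip) + 16 (upsampled)

    def forward(self, x):                          # x: (B, 2, 5, 5)
        e = self.enc(x)                            # (B, 16, 5, 5)
        d = self.down(e)                           # (B, 32, 3, 3) global context
        u = self.up(d)                             # (B, 16, 5, 5)
        return self.dec(torch.cat([e, u], dim=1))  # skip connection, then decode
```

A forward pass such as `TinyUNet()(torch.randn(1, 2, 5, 5))` yields a (1, 1, 5, 5) pose grid whose 25 entries correspond to the pose parameters of fig. 12.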
We use the L_2 loss for the regression task of converting the two-dimensional sensor data into a pose:

L_2 = || ŷ − y ||²,

where ŷ is the predicted value and y is the target pose parameter.
Experiments show that the U-Net network model has better performance than other network architectures.
To evaluate the gesture recognition method provided by this embodiment of the invention, we envision a standard usage scenario for our method in which the proposed neural network is trained only once, preferably on a large dataset containing samples from many different hands. A new user then only needs to perform an on-the-fly calibration of less than one minute before using the glove for interaction.
In our experiments we evaluated a generic model, which is trained on the data of all participants except one, whose data serves as test data, and a personalized model, which is trained on the data of a single person only and therefore allows more accurate pose reconstruction. In all experiments we used a medium-size glove (20 × 12.5 cm); although it comes in only one size, it accommodates a wide variety of hands.
For the personalized model, we ran experiments with two kinds of data: intra-session, where training and test data come from the same wearing session, and inter-session, where the glove is removed between training and test sessions. For the former, we use two sessions to predict the third. For the latter, we use three intra-sessions and one inter-session to predict another inter-session. Intra-session samples generally perform better than inter-session samples, because the positional alignment of the glove stays consistent during continuous wearing. The Intra and Inter columns in Table 1 show the average angular reconstruction errors for 10 different hands: on average, the intra-session error is 5.8 degrees and the inter-session error is 6.2 degrees. The small difference indicates that our data glove provides consistent calibration across sessions, even when the glove is removed between sessions.
Table 1 average pose angular error (in degrees) during training for people of different hand sizes and aspect ratios.
[Table 1 is provided as an image in the original publication.]
In Table 1, the Size column lists the volume of the hand's bounding box in cm³, and the second column lists the bounding box dimensions in cm. We report different scenarios: personalized models trained (1) on the same session as the test session or (2) with the glove removed between the two sessions; generic models trained on the sessions of the 9 other participants (leave-one-out), (3) using per-sensor min-max values obtained from the training data, and (4) using individualized real-time min-max calibration; and (5) the generic model fine-tuned with a short (300 seconds) personal training recording. External hardware refers to the depth camera and GPU required for capturing and processing training data.
For many applications, an angular reconstruction error of 7.6 degrees is satisfactory. To further improve reconstruction quality with minimal individualized data, we fine-tune the generic model on unseen data: we load the network parameters from the pre-trained generic model and then further optimize all network parameters with a small learning rate of 1 × 10⁻⁶ and a batch size of 64, which helps avoid catastrophic forgetting. The results are shown in column (5) of Table 1; the performance is comparable to the personalized model but requires a much smaller time investment.
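Building on the TinyUNet sketch above, the following illustrates this fine-tuning step; the checkpoint file name and the placeholder tensors are hypothetical, while the learning rate and batch size follow the values stated in the text.

```python
# Hedged sketch of fine-tuning the generic model on a short personal recording.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = TinyUNet()                                     # sketched earlier
model.load_state_dict(torch.load("generic_model.pt"))  # hypothetical checkpoint

# ~300 seconds of personal data at 60 Hz -> about 18k frames (placeholders)
xs, ys = torch.randn(18000, 2, 5, 5), torch.randn(18000, 1, 5, 5)
loader = DataLoader(TensorDataset(xs, ys), batch_size=64, shuffle=True)

opt = torch.optim.Adam(model.parameters(), lr=1e-6)    # small LR avoids forgetting
for x, y in loader:
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()  # L2 regression loss
    opt.step()
```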
The generic model is crucial for practical applications aimed at a broad and diverse audience, since training a personalized model is very time-consuming (2 hours or more) and requires additional equipment (a depth camera and a GPU). Without calibration, the per-sensor minimum and maximum can be obtained from the training data of all users and applied to normalize both training and test data. Columns (3) and (4) of Table 1 show the effectiveness of our calibration method: the average pose reconstruction error after calibration is 7.6 degrees.
Our method supports five standard application scenarios, summarized in table 1:
(1) A personalized model used within a session provides the best performance, but the glove must remain on between training and use.
(2) If the user has access to a depth camera, personal training data (20 minutes) can be captured and used to train a personalized model in about 2 hours.
(3) If there is no time or ability to train and calibrate (e.g., in a rehabilitation environment), our generic model can be used in conjunction with the min-max values for each sensor extracted from the training set.
(4) By first exploring gestures to collect personal minimum maxima on the fly, and then using these values to normalize the sensor data, the accuracy of the generic model can be significantly improved in less than a minute of calibration time.
(5) One compromise between approaches (2) and (4) is to capture only 5 minutes of personal training data and fine tune the generic model for approximately 15 minutes.
Options (3) and (4) require only gloves and a pre-trained model, while other options require a depth camera and a GPU to train or fine tune the model. We consider (4) the most practical scenario, but applications that require higher precision may benefit from custom models (2) or (5). In fact, all of our models capture hand gestures well.
To illustrate the advantages of a dense local sensor array, we conducted an ablation study on the number of sensor units, simulating glove designs with fewer sensors. The results show that more sensors yield higher reconstruction accuracy, with the average error decreasing by 28% from 14 sensors to 44 sensors.
Our training, validation, and test data sets for the personalized model contain 85K, 10K, and 15K samples, respectively; for the non-personalized model, they contain 800K, 90K, and 120K samples. To study the necessity of such a large training set, we progressively and randomly removed parts of the training data; the experiments confirm that using less training data increases the average session error.
To obtain higher reconstruction accuracy, we compared five types of network models on the personalized and generic models, shown in Tables 2 and 3: two one-dimensional baselines (FCN, LSTM) and three two-dimensional architectures (ResNet, U-Net, and a conditional generative adversarial network, CGAN). The two-dimensional networks generally converge faster and reach lower reconstruction errors. The performance of the FCN is unsatisfactory, especially with a less diverse training set, as in the personalized-model case. The LSTM obtains smoother results with higher reconstruction accuracy than the FCN, but it tends to over-smooth certain high-frequency poses, such as two fingers touching. Among the three two-dimensional networks, ResNet improves markedly over the FCN baseline but still leaves room for improvement; both U-Net and CGAN reach very high reconstruction accuracy. In our experiments, the predicted poses of U-Net are visually more stable than those of CGAN, so we used a 13M-parameter U-Net for all other experiments; it yields the smallest errors for both the personalized and generic models, while networks with fewer than 3M parameters show increased errors. For comparison, we also trained an SVM on the data of Table 3, which gives a higher but still acceptable error of 7.8 degrees.
Table 2 comparison of different network models on personality models.
[Table 2 is provided as an image in the original publication.]
Table 3 comparison of different network models on a generic model.
[Table 3 is provided as an image in the original publication.]
According to the gesture recognition method provided by the embodiment of the invention, geometric and topological prior knowledge is built into the neural network model, which regularizes the learning process and improves reconstruction performance; removing outliers and applying min-max normalization improves the quality of the training data; and training a U-Net convolutional neural network model makes the predicted poses visually more stable.
In practice of the invention, there is also provided a storage medium comprising a stored program, wherein the program performs any of the above-described methods.
In the implementation of the present invention, a processor is further provided, where the processor is configured to execute a program, where the program executes any one of the above methods.
In an implementation of the present invention, there is also provided an electronic device, including: one or more processors, memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the above-described methods.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (9)

1. A data glove for gesture recognition, characterized by comprising:
a hand-shaped silicone sensor array comprising a first protective layer, a first conductive layer, a dielectric layer, a second conductive layer, and a second protective layer arranged in sequence, wherein the first and second conductive layers are formed from a mixture of silicone and carbon black and are arranged to overlap, forming a plurality of overlap regions; the overlap regions form a plurality of local capacitors that serve as local stretch sensors, and when the shape of a local stretch sensor changes, its capacitance changes accordingly;
an elastic textile fabric composed of a plurality of textile components custom-cut with a laser cutter, the textile components being attached to the hand-shaped silicone sensor array and closed to form a wearable glove.
2. The data glove for gesture recognition according to claim 1, wherein there are 44 local stretch sensors, located respectively on each finger, on each main finger joint, between the fingers, and on the back of the hand.
3. The data glove for gesture recognition according to claim 1, wherein the first and second conductive layers have a thickness of 220 μm.
4. The data glove for gesture recognition according to claim 1, wherein both sides of each local stretch sensor are provided with thin cuts with circular ends.
5. The data glove for gesture recognition according to claim 1, wherein the elastic textile fabric is composed of 1 palm textile component, 3 flap textile components, and 5 finger textile components.
6. A method of gesture recognition using the data glove for gesture recognition of any of claims 1 to 5, comprising:
acquiring a training data set, wherein the training data set comprises a local stretch sensor reading and a gesture parameter corresponding to the local stretch sensor reading;
training parameter values of target parameters of a neural network model by using the training data set to obtain the trained neural network model;
and inputting the reading of the target local stretching sensor into the trained neural network model for prediction to obtain a target gesture parameter, and determining the target hand posture according to the target gesture parameter.
7. The method of claim 6, wherein prior to the step of training parameter values for target parameters of a neural network model using the training data set, the method further comprises:
rejecting abnormal data in the training data set;
normalizing the local stretch sensor readings.
8. The method of claim 7, wherein normalizing the local stretch sensor readings comprises:
obtaining a maximum and a minimum of the local stretch sensor readings using a median filter;
normalizing the local stretch sensor readings to the range -1 to 1 using the maximum and minimum values.
9. The method of claim 8, wherein the neural network model is a U-Net convolutional neural network model, and the step of training parameter values of target parameters of the neural network model using the training data set comprises:
constructing a U-Net convolutional neural network model, and training the parameter values of the target parameters of the U-Net convolutional neural network model using the training data set.
CN201911123406.8A 2019-11-16 2019-11-16 Data glove for gesture recognition and gesture recognition method Pending CN110837301A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911123406.8A CN110837301A (en) 2019-11-16 2019-11-16 Data glove for gesture recognition and gesture recognition method

Publications (1)

Publication Number Publication Date
CN110837301A true CN110837301A (en) 2020-02-25

Family

ID=69576575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911123406.8A Pending CN110837301A (en) 2019-11-16 2019-11-16 Data glove for gesture recognition and gesture recognition method

Country Status (1)

Country Link
CN (1) CN110837301A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN206431578U (en) * 2017-02-13 2017-08-22 世优(北京)科技有限公司 Data glove and virtual reality system
CN211241839U (en) * 2019-11-16 2020-08-14 北京华严互娱科技有限公司 Data glove for gesture recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Oliver Glauser et al.: "Interactive Hand Pose Estimation using a Stretch-Sensing Soft Glove", ACM Transactions on Graphics *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112295206A (en) * 2020-10-29 2021-02-02 成都方德尔科技有限公司 Virtual presentation system
CN113298297A (en) * 2021-05-10 2021-08-24 内蒙古工业大学 Wind power output power prediction method based on isolated forest and WGAN network
CN113537489A (en) * 2021-07-09 2021-10-22 厦门大学 Elbow angle prediction method, terminal device and storage medium
CN113537489B (en) * 2021-07-09 2024-03-19 厦门大学 Elbow angle prediction method, terminal equipment and storage medium
WO2023231728A1 (en) * 2022-05-31 2023-12-07 人工智能设计研究所有限公司 System for human-computer interaction and textile for human-computer interaction

Similar Documents

Publication Publication Date Title
CN110837301A (en) Data glove for gesture recognition and gesture recognition method
Glauser et al. Interactive hand pose estimation using a stretch-sensing soft glove
US11586287B2 (en) Object tracking device
Kim et al. Deep full-body motion network for a soft wearable motion sensing suit
US11144121B2 (en) Wearable interactive user interface
Xue et al. Multimodal human hand motion sensing and analysis—A review
Han A low-cost visual motion data glove as an input device to interpret human hand gestures
Liu et al. A glove-based system for studying hand-object manipulation via joint pose and force sensing
CN111095167A (en) Armband for tracking hand movements using electrical impedance measurements
Fang et al. Development of a wearable device for motion capturing based on magnetic and inertial measurement units
Tognetti et al. Body segment position reconstruction and posture classification by smart textiles
CN111752393A (en) Wearable intelligent glove
CN113424133A (en) Capacitive touch system
CN108444436A (en) A kind of hand gestures measuring system and method based on flexible large deformation sensor
CN211241839U (en) Data glove for gesture recognition
Liu et al. A reconfigurable data glove for reconstructing physical and virtual grasps
Wang et al. SmartHand: Towards embedded smart hands for prosthetic and robotic applications
Endo Application of robotic manipulability indices to evaluate thumb performance during smartphone touch operations
CN109214295B (en) Gesture recognition method based on data fusion of Kinect v2 and Leap Motion
Sumioka et al. Wearable tactile sensor suit for natural body dynamics extraction: case study on posture prediction based on physical reservoir computing
Vicente et al. Calibration of kinematic body sensor networks: Kinect-based gauging of data gloves “in the wild”
CN113093948B (en) Flexible touch type mobile robot operation input device and interaction method
Kashiwagi et al. Measuring grasp posture using an embedded camera
CN111610857A (en) Gloves with interactive installation is felt to VR body
Ding et al. CST Framework: A Robust and Portable Finger Motion Tracking Framework

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200225)