WO2019173678A1 - Optimal hand pose tracking using a flexible electronics-based sensing glove and machine learning - Google Patents

Optimal hand pose tracking using a flexible electronics-based sensing glove and machine learning

Info

Publication number
WO2019173678A1
Authority
WO
WIPO (PCT)
Prior art keywords
strain
data
hand
glove
machine
Prior art date
Application number
PCT/US2019/021293
Other languages
French (fr)
Inventor
Erhan Arisoy
Livio Dalloro
Levent Burak Kara
Juan L. Aparicio Ojea
Wentai ZHANG
Nurcan GECER ULU
Jonelle YU
Fangcheng ZHU
Yifang ZHU
Burak Ozdoganlar
Kadri Bugra OZUTEMIZ
Carmel Majidi
Original Assignee
Siemens Aktiengesellschaft
Carnegie Mellon University
Application filed by Siemens Aktiengesellschaft and Carnegie Mellon University
Publication of WO2019173678A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/014Hand-worn input/output arrangements, e.g. data gloves

Definitions

  • the following disclosure relates to tracking a hand pose using flexible electronics.
  • Hand gestures of human operators may be tracked.
  • the gestures may be used to control a robot or to study or prevent workplace injuries.
  • a human operator may control a robotic arm by moving their hand or by performing a predetermined gesture with their hand.
  • the movements of the operator during work are tracked and used to guide ergonomic changes to a workplace. Movement of the operator’s hands may also be correlated with the quality outcomes from a process.
  • the hand gestures may be tracked using visual sensing.
  • an external camera records the hand gestures.
  • the operator may wear a glove incorporating sensors for tracking the hand gestures.
  • the prevailing glove-based techniques commonly use bulky gloves that are cumbersome to wear while working.
  • the operator holds a device containing the sensors for tracking the hand gestures.
  • the preferred embodiments described below include methods, systems, instructions, and computer readable media for tracking a hand pose using flexible electronics.
  • a method for identifying a measured hand pose.
  • Measured strain data from a plurality of strain sensors arranged on a glove is received and applied to a machine-learned model learned on training strain data and one or more associated training hand poses.
  • the measured hand pose is determined based on the application of the measured strain data to the machine-learned model.
  • the measured hand pose is output.
  • a method for training a machine learning model to determine a hand pose. Strain sensor data and a plurality of hand poses associated with the strain sensor data are stored. The machine learning model is trained to determine the hand pose based on the strain sensor data and the plurality of hand poses associated with the strain sensor data.
  • a method for determining an optimal arrangement of a plurality of strain sensors. First strain data and second strain data from the plurality of strain sensors arranged on a first glove, a first plurality of hand poses associated with the first strain data, and a second plurality of hand poses associated with the second strain data are stored. The first strain data from the plurality of strain sensors is grouped in different combinations. A plurality of machine learning classifiers is trained to determine hand poses.
  • the training is based on the different combinations of the first strain data and the first plurality of hand poses associated with the first strain data. At least one machine learning classifier of the plurality of machine learning classifiers is trained based on each different combination of the first strain data.
  • the second strain data is applied to the plurality of machine-learned classifiers to determine a third plurality of hand poses.
  • the second plurality of hand poses is compared to the third plurality of hand poses determined by the plurality of machine-learned classifiers.
  • An accuracy of each of the plurality of machine-learned classifiers is determined based on the comparing.
  • a machine-learned classifier of the plurality of machine-learned classifiers having a highest accuracy is selected.
  • a second glove with strain sensors arranged on the glove at locations according to the combination of the first strain data used to train the machine-learned classifier having the highest accuracy is built.
  • the systems or methods may alternatively or additionally include any combination of one or more of the following aspects or features.
  • the plurality of strain sensors may be arranged on the glove at locations determined by a most accurate machine-learned classifier of a plurality of machine-learned classifiers trained on different combinations of strain sensor placements.
  • the measured hand pose may include one or more joint angles and a number of the one or more joint angles may be larger than a number of the plurality of strain sensors.
  • the one or more associated training hand poses may have been extracted from image data recorded contemporaneously with the training strain data.
  • the training strain data may have been recorded while the glove is positioned in one or more predetermined gestures.
  • the measured strain data may be collected during a movement of the glove and the measured hand pose may include the one or more joint angles throughout the movement.
  • the method may include generating a control command based on the measured hand pose and controlling a machine based on the control command.
  • the plurality of strain sensors may be flexible Eutectic Gallium Indium-based strain sensors.
  • the method may include generating an audial representation, a visual representation, or an audiovisual representation of the measured hand pose.
  • the hand pose may include a plurality of joint angles and a number of the plurality of joint angles may be larger than a number of the plurality of strain sensors that generated the strain sensor data.
  • the plurality of hand poses associated with the strain sensor data may be predetermined based on second joint angles extracted from data recorded by a depth camera.
  • the method may include recording image data contemporaneous with the strain sensor data and extracting the plurality of joint angles from the image data.
  • each combination of the first strain data may include different numbers and arrangements of strain data from the plurality of strain sensors.
  • each combination of the first strain data may include strain data from at least three strain sensors.
  • the machine learning classifiers may be support vector machine models.
  • the method may include recording the first strain data, adjusting a position of the glove, and recording the second strain data with the adjusted position of the glove.
  • recording the first strain data may be performed while the glove is positioned in one or more predetermined gestures.
  • two or more machine learning classifiers of the plurality of machine learning classifiers may be trained with strain data and associated hand poses from each combination of the first strain data.
  • Figure 1 illustrates one embodiment of flexible electronics
  • Figure 2 illustrates joints of a hand
  • Figure 3 illustrates one embodiment of a method for determining an optimal arrangement of a plurality of strain sensors
  • Figure 4 shows an example set of predetermined hand gestures
  • Figure 5 illustrates one embodiment of a method for training a machine learning model to determine a hand pose
  • Figure 6a illustrates one wireframe representation of a hand
  • Figure 6b illustrates another wireframe representation of a hand
  • Figure 7 is a flow chart diagram of an embodiment of a method for training a machine-learning model to determine a hand pose
  • Figure 8 illustrates one embodiment of a method for identifying a hand pose
  • Figure 9 is a schematic representation of controlling a machine with gestures.
  • Figure 10 is a block diagram of one embodiment of a system for determining an optimal arrangement of a plurality of strain sensors, training a machine learning model to determine a hand pose, and/or identifying a hand pose.
  • Hand poses may be tracked with a wearable device.
  • a glove has two strain sensors disposed at each joint of the hand for determining the angle of the joint.
  • a glove normally needs 20 sensors and may be cumbersome to wear while working.
  • a glove having many sensors may be heavy and reduce a length of time that the operator may work before resting.
  • accelerometers may be attached to a glove to measure 12 predefined movements of a single finger.
  • the glove has at least six accelerometers, which may make the glove very bulky or cumbersome to wear while working.
  • Vision-based approaches use a camera external to the hand to detect a hand pose.
  • RGB, infrared, or depth cameras may be used to determine the positions of fingers or fingertips of the hand.
  • vision-based approaches require that the camera follow the operator, which may be impractical where the operator moves between locations while working. Additionally, vision-based approaches may not produce accurate hand poses when the hand is obscured, for example, by thick gloves or when an object is held in the hand. Further, external light sources may negatively impact the quality of images taken by the cameras, reducing the accuracy of the identified hand pose.
  • Flexible and soft electronics may allow for a lightweight system for detecting a hand pose.
  • flexible strain sensors may be constructed from liquid metals such as Eutectic Gallium Indium (EGaIn), Galinstan, or ionic liquids.
  • the soft and flexible strain sensors may be attached to a glove to measure joint angles of the hand.
  • a small number of strain sensors attached at optimized locations on the glove may be used to predict a higher number of joint angles of the hand.
  • Using fewer strain sensors may allow for a cheaper and lighter glove that may, with the aid of a computer, recognize hand poses more rapidly.
  • the recognized gestures may be used to train or control robots using human movements as input for a human-machine interface.
  • the gestures may be aggregated to track and avoid injuries due to repeated movements in the workplace and to measure defects for quality control. Further, the gestures may be used to track in training and rehabilitation of workers, athletes and patients.
  • a machine-learning model may be created to relate the measured strain from a plurality of strain sensors arranged on a glove with a hand pose.
  • strain data may be measured for a minimum set of known hand poses and used to train the machine-learning model.
  • the machine learned model may be used to determine a hand pose that was not used to train the machine learning model. That is, during deployment, strain data may be acquired from the hand pose using multiple strain sensors arranged on a glove worn on the hand.
  • the strain data may be interpreted using the machine-learned model to determine the hand pose.
  • the hand pose may be characterized by 14 joint angles of the hand. To determine these angles from the strain data, the strain data may be applied to the machine learned model.
  • the machine learned model may accept as input strain data and may output the 14 joint angles.
  • the machine learned model may be trained off-line using a set of 'training' hand poses where both the strain data and the associated joint angles may be acquired using the strain sensors and optical (or any other mode of) tracking. This allows the machine learning model to learn a mapping from the strain data to the joint angles. Strain data acquired during deployment may be applied to the machine-learned model to get the joint angles.
  • the output may be the predicted joint angle data.
  • An optimal arrangement of strain sensors on the glove may be determined.
  • a minimum number of strain sensors and their placement on the glove may be determined to enable complete identification of any hand pose a person can have.
  • the strains associated with all possible hand poses are first determined using the machine-learning-based model derived above.
  • an inverse approach is used, where the strain data from a smaller set of sensors is applied to the machine learned model to identify a hand pose. All combinations of sensors are tested, and the smallest set of sensors (and the associated locations) that can determine all hand poses with high accuracy is identified as the optimal number and placement of sensors.
  • An accuracy of each combination of sensors may be determined based on comparison of hand-pose determination using strain data from all the sensors with respect to the reduced set of sensors.
  • an additional sensor may be integrated into the glove to also indicate the spatial position, orientation, and/or acceleration of the hand.
  • the additional sensor may be located on the back of the hand. Information from the additional sensor may be output in addition to the hand pose to provide a complete spatial description of the hand position, orientation, and pose.
  • Figure 1 illustrates one embodiment of flexible electronics.
  • Strain sensors 101 may be connected via interconnects 103 to a controller 105.
  • the strain sensors 101 and interconnects 103 may be disposed on a glove 107.
  • Strain data measured by the strain sensors 101 may be transmitted from the glove via a cable 109. More or fewer components may be provided. For example, though ten sensors 101 are shown, any number of sensors 101 may be used. In another example, the cable 109 may not be present.
  • the controller 105 is located remotely from the glove 107.
  • the strain sensors 101 may be strain gauges.
  • the strain sensors 101 may be flexible strain sensors.
  • the sensors 101 may be formed by printing one or more layers of material. In some cases, the sensors may be formed by micro contact printing or direct wire printing.
  • One example of the strain sensors 101 is the Omega model KFH-20-120-C1-11L1M2R strain gauge.
  • An electrical property of the sensors 101 may change when the sensor 101 is bent. Measuring an extent of the change in the electrical property may indicate an angle of the bend.
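As a rough illustration of how a change in an electrical property maps to a bend, a resistive strain sensor's fractional resistance change is commonly related to strain through a gauge factor. The sketch below assumes illustrative values (gauge factor 2.0, 120-ohm baseline) and function names that are not from the disclosure.

```python
# Minimal sketch: convert a measured resistance into strain via a nominal gauge
# factor, strain = (dR / R0) / GF. GF = 2.0 and R0 = 120 ohms are illustrative only.
def resistance_to_strain(r_measured, r_baseline=120.0, gauge_factor=2.0):
    delta_r = r_measured - r_baseline
    return (delta_r / r_baseline) / gauge_factor

print(resistance_to_strain(121.2))  # a 1% resistance rise -> strain of 0.005
```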
  • the sensors 101 may be a part of or attached to the glove 107.
  • the sensors 101 may be integrally formed with the glove 107 (e.g., dispensed directly onto or into the glove fabric).
  • the sensors 101 may be attached to the glove 107 with adhesive. Double sided tape may be used to secure the sensors 101 to the glove 107.
  • the sensors 101 are mechanically joined to the glove 107 via a slot, clip, or fastener.
  • the sensors 101 may be made from EGaIn or another material.
  • the sensors 101 may be EGaIn electrodes embedded in elastomer.
  • the sensors may be flexible liquid metal or liquid salt-solution based strain sensors.
  • EGaIn alloys are nontoxic and may rapidly oxidize in air to form a self-sealing outer layer.
  • both tension and compression may be measured by the strain sensors.
  • the glove 107 may be fitted to the hand of a user. With the fingers of the hand bent, marks may be made on the glove above each knuckle or at other joints of interest. With the hand laid flat, the sensors 101 may be secured to the glove at the marked locations corresponding to the knuckles or other joints. By doing so, the sensors 101 remain neutral (e.g. are not in significant tension or compression) when the hand is in a flat, relaxed state. The sensors 101 will undergo tension or compression when the hand or fingers are moved and generate a signal.
  • the interconnects (or leads) 103 are conductive paths that electrically connect the sensors 101 to other elements.
  • the interconnects 103 may form an electrical connection between the sensors 101 and the controller 105 or the cable 109.
  • the interconnects 103 may be flexible. Flexibility may allow for the interconnects 103 and the sensors 101 to conform to a contour of the glove 107.
  • the interconnects 103 may be formed from the same material as the sensors 101 or from another material.
  • interconnects 103 may be made from EGaIn or similar liquid conductors and embedded in an elastomer.
  • the controller 105 may receive data from the sensors 101 via the interconnects 103. Additionally or alternatively, the controller 105 may communicate with the sensors 101 via a wireless link (e.g., Bluetooth or WiFi). The controller 105 may send the sensor data to a remote computer, or an on-board microcomputer may be incorporated with the controller 105. In some cases, the controller 105 may send the sensor data via the cable 109. In other cases, the controller 105 may send the data via a wireless link.
  • the controller 105 may include an amplifier microcircuit so that the signal from the sensors 101 generates a detectable signal.
  • the glove 107 may be made from flexible material.
  • the glove 107 is flexible to conform to a hand of a user or operator.
  • the glove 107 may be made from latex, spandex, or other suitable materials.
  • One example of the glove 107 is the ULINE Microflex Diamond Grip.
  • the cable 109 may carry data from the controller 105 or sensors 101 to a remote computer or vice versa.
  • the cable 109 may be a ribbon cable.
  • the cable 109 may also provide power to the sensors 101 and the controller 105.
  • the cable 109 may not be present or installed at all times. For example, the cable 109 may be removed when the glove 107 is in use or installed on a hand.
  • the glove 107 may include additional sensors other than the strain sensors 101 disposed about the joints.
  • another sensor may be disposed on the glove 107 and may measure additional information about the hand.
  • the sensor may be an accelerometer, strain sensor, or another sensor.
  • the additional information may be a spatial position, acceleration, or an orientation of the hand.
  • Figure 2 illustrates joints 201 of a hand 203.
  • the finger joints 201 are labeled 1-14.
  • the joints 201 may be organized in a different order than shown.
  • the hand 203 may be a right or left hand of a user or operator.
  • Angles of the finger joints 201 may be measured by strain sensors.
  • the strain sensors 101 of Figure 1 may measure the joint angles for one or more of the joints 201. By using only 14 knuckle angles to model a hand pose, essential statistics such as finger bending may be reconstructed while other information, such as finger orientation and bone length, are not included in the model. Having fewer strain sensors 101 may reduce cost, weight, and complexity of both the construction of the glove and the interpretation of the measured data.
  • Figure 3 illustrates one embodiment of a method for determining an optimal arrangement of a plurality of strain sensors. More, fewer, or different acts may be performed. For example, acts 301, 303, 305, and 319 may be omitted. The acts may be performed in a different order than shown. For example, act 303 may proceed from act 301.
  • first strain data is recorded.
  • the strain data may be generated by one or more strain sensors, such as the sensors 101 of Figure 1 or sensors at the fourteen joints noted in Figure 2.
  • the sensors may be disposed on a glove and generate data indicating an angle or bend of the sensors.
  • the sensors may be disposed on the glove about a joint of the hand so that as the hand (e.g., a finger of the hand) is bent, the sensor is also bent. In some cases, there are ten sensors on the glove for measuring ten angles. More or fewer sensors may be used.
  • the glove with sensors arranged in locations without or prior to optimization may be referred to as a first-generation glove.
  • a glove with sensors arranged in optimized locations (e.g. based on data from a first- generation glove) may be referred to as a second-generation glove.
  • the strain data may be recorded while the glove is positioned in one or more predetermined gestures.
  • a user may perform one or more gestures with their hand while wearing the glove as the strain sensors generate data indicating an angle or bend of the strain sensors.
  • the sensor data is recorded while the user or the glove is positioned in a one or more predetermined gestures.
  • the predetermined gestures may be the first thirteen letters of the American Sign Language (ASL) alphabet. The letters are shown in Figure 4.
  • the user may hold their hand or the glove in one predetermined gesture (e.g. make one ASL letter) for an amount of time and then proceed to hold their hand or the glove in another predetermined gesture (e.g., another ASL letter).
  • the strain data is recorded for each held position and/or during use including transition between held positions.
  • the strain data may be preprocessed to remove the transition period between predetermined gestures.
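One way to realize this preprocessing, sketched below under assumed conditions (a fixed sample rate, a fixed hold time per gesture, gestures performed back to back), is to keep only the middle portion of each held gesture and discard the samples nearest the transitions. The function and parameter names are illustrative, not from the disclosure.

```python
import numpy as np

# Minimal sketch: segment a strain recording into one block per held gesture and
# trim a fraction of samples from each end of the block to drop transition periods.
def segment_holds(strain, sample_rate_hz, hold_s, n_gestures, trim_frac=0.2):
    """strain: (n_samples, n_sensors) recorded while n_gestures are held in turn.
    Returns a list of (samples, n_sensors) blocks, one per gesture."""
    samples_per_hold = int(sample_rate_hz * hold_s)
    trim = int(samples_per_hold * trim_frac)
    blocks = []
    for g in range(n_gestures):
        start = g * samples_per_hold
        blocks.append(strain[start + trim : start + samples_per_hold - trim])
    return blocks

# Example: 100 Hz sampling, 10 s per gesture, 13 gestures
# blocks = segment_holds(recorded_strain, 100, 10.0, 13)
```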
  • In act 303, the position of the glove is adjusted. After the first strain data is recorded, the glove may be removed and worn again by a user. By adjusting the position of the glove before acquiring the second strain data, the locations of the strain sensors relative to the hand may change slightly.
  • second strain data is recorded.
  • the second strain data may be recorded in a similar manner as the first strain data in act 301.
  • the second strain data may be recorded while the glove is positioned in the one or more predetermined gestures.
  • the second strain data may be recorded while a user wearing the glove signs the first thirteen ASL letters.
  • the second strain data may be recorded with an adjusted position of the glove.
  • the second strain data may be used to test a machine-learning classifier, for example in acts 311, 313, and 315.
  • the first and second strain data may be stored.
  • hand poses associated with the strain data may be stored.
  • the predetermined gestures performed while the first strain data was recorded may be stored.
  • a hand pose (e.g., a predetermined gesture or an ASL letter) may be associated with the strain data.
  • the strain data and the associated hand poses may be stored in a memory, for example, the memory 1005 of Figure 10.
  • the strain data includes data from one or more strain sensors.
  • the strain sensors (and the associated strain data and hand poses) may be grouped into different combinations of data from one or more sensors. For example, strain data from 3, 4, or more strain sensors may be grouped together. As another example, ten different groupings are provided where each grouping is formed from a different combination of strain sensors and the corresponding strain data.
  • machine learning (“ML”) classifiers are trained.
  • the machine learning classifiers may be support vector machine models.
  • a machine learning classifier may be a multi-class support vector machine model with 10-fold cross validation.
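A sketch of such a classifier is shown below using scikit-learn, which is an assumption; the disclosure does not name a library. `X` holds one row of strain readings per sample and `y` holds integer gesture labels (e.g., 1 through 13 for poses P1 through P13).

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Minimal sketch: a multi-class SVM pose classifier evaluated with 10-fold
# cross-validation on strain readings and gesture labels.
def evaluate_pose_classifier(X, y):
    clf = SVC(kernel="rbf", decision_function_shape="ovr")  # one-vs-rest multi-class
    scores = cross_val_score(clf, X, y, cv=10)              # 10-fold cross-validation
    return scores.mean()
```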
  • the machine learning classifiers may be trained using the strain data to determine a hand pose based on input strain sensor data.
  • the machine learning classifiers are trained with the first strain data but not the second strain data.
  • a different machine learning classifier may be trained for each different combination of strain sensors using the corresponding strain sensor data.
  • one or more machine learning classifiers may be trained on a combination of strain sensor data of three different strain sensors and the associated hand poses.
  • Multiple machine learning classifiers may be trained for each combination. Training multiple (e.g., three) classifiers for each combination of strain sensors may help reduce any variance in performance of the classifiers due to initialization bias. The performance of the classifiers trained on the same combination may be averaged.
  • the combination of strain sensors that most accurately predicts the hand poses based on the strain data from those strain sensors may be determined. For example, three machine learning classifiers may be trained on the same combination of sensors, strain data, and associated hand poses. This combination may represent a minimum number of sensors necessary to accurately predict the hand poses.
  • the machine learning classifiers may be part of the machine learning model trained in Figure 5. Additionally or alternatively, the machine learning classifiers may be part of the machine learned model of Figure 8. The machine learning classifiers may be retrained or updated based on the strain data and hand poses of act 505 of Figure 5. Additionally or alternatively, the machine learning classifier may be updated based on the strain data of act 801.
  • the second strain data is applied to the machine-learned classifiers.
  • a plurality of hand poses may be generated by the machine- learned classifiers.
  • the second strain data may be used to test the ability of the machine-learned classifiers in determining a hand pose that matches the hand pose associated with the second strain data.
  • the machine learning classifier was trained based on strain data from a particular combination of strain sensors. Second strain data from the same combination of sensors as the machine-learned classifier was trained on may be applied to the machine-learned classifier.
  • the hand poses associated with the second strain data are compared to the hand poses generated by the machine-learned classifiers.
  • the poses determined by the machine-learned classifiers may be compared with the known pose for the data in order to determine an accuracy for the machine-learned classifiers.
  • accuracies of the machine-learned classifiers may be evaluated. For example, the accuracy may be expressed as a root mean squared error. The accuracy may be based on how many hand poses were determined correctly when compared to the hand poses associated with the second strain data. Where multiple machine-learned classifiers were trained based on the same combination of sensors, the performance of the classifiers may be averaged to determine an accuracy for the combination overall.
  • In act 317, the most accurate machine-learned classifier may be selected. By comparing the accuracy of different classifiers, the most informative sensor combinations may be determined by choosing the combination with the highest accuracy.
  • the selection may be constrained or influenced, such as weighting based on number of strain sensors in the combination to select a workable but fewer number where the greater number of strain sensors only provide incremental increase in accuracy.
  • the combination of sensors resulting in a highest accuracy may represent a minimum number of sensors needed to accurately predict the hand poses.
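The selection procedure described above can be sketched as an exhaustive search over sensor subsets: train a classifier per subset on the first-session strain data, score it on the second-session data recorded after re-donning the glove, and keep the most accurate subset. scikit-learn, the RBF kernel, and the variable names are assumptions; the step of averaging several classifiers per combination is omitted here because a default SVC fit is deterministic.

```python
from itertools import combinations
from sklearn.svm import SVC

# Minimal sketch of the sensor-subset search: the best-scoring subset of sensor
# columns indicates where sensors go on the second-generation glove.
def select_sensor_subset(X_first, y_first, X_second, y_second, min_sensors=3):
    """X_* : (n_samples, n_sensors) numpy arrays; y_* : gesture labels."""
    n_sensors = X_first.shape[1]
    best_acc, best_subset = 0.0, None
    for k in range(min_sensors, n_sensors + 1):
        for subset in combinations(range(n_sensors), k):
            cols = list(subset)
            clf = SVC(kernel="rbf").fit(X_first[:, cols], y_first)
            acc = clf.score(X_second[:, cols], y_second)
            if acc > best_acc:
                best_acc, best_subset = acc, subset
    return best_subset, best_acc
```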
  • a second-generation glove may be built.
  • the sensors on the second-generation glove may be arranged at locations according to the combination of sensors and sensor data that resulted in the most accurate or selected machine learning classifier.
  • the most accurate configuration of three strain gauges uses strain data from sensors disposed at joints of the thumb, middle finger, and pinky finger, labeled joints 1, 6, and 13, respectively, in Figure 2.
  • a combination of strain sensor data from three strain sensors arranged at those locations may be the minimum number of sensors needed to accurately predict the hand poses.
  • the second-generation glove may be the first-generation glove with some strain sensors removed.
  • the second-generation glove may be independently manufactured, such as mass producing copies of the second- generation glove.
  • Figure 4 shows an example set of predetermined hand gestures 401.
  • the hand gestures 401 are labeled P1-P13, corresponding to the first thirteen letters of the ASL alphabet, A through M. Hand poses for additional, different, or fewer letters in the ASL alphabet may be used. Many other hand gestures for communication may also be used.
  • gestures may be used, such as hand poses used as an interface for controlling a robot, or hand poses that are used in manufacturing and assembly, or any other processes that require human labor or robots.
  • the gesture P0 corresponds to a hand lying flat. This may be used as a calibration point to zero all sensor readings before recording the sensor data. Any other predetermined gesture may be used as a calibration gesture.
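A zeroing step of this kind might look like the following, assuming the flat-hand P0 reading is captured at the start of a session; the function and variable names are illustrative.

```python
import numpy as np

# Minimal sketch: subtract the mean flat-hand (P0) reading from every sensor channel
# so that later gesture readings are expressed relative to the relaxed hand.
def zero_readings(strain, p0_block):
    """strain: (n_samples, n_sensors); p0_block: samples recorded during gesture P0."""
    baseline = np.asarray(p0_block).mean(axis=0)
    return strain - baseline
```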
  • the predetermined gestures 401 may be performed while strain sensor data is acquired. For example, the predetermined gestures 401 may be performed during acts 301 and 303 of Figure 3. A user or operator may perform the predetermined gestures 401 one at a time and hold each gesture for a period of time. For example, each gesture may be held for 10 seconds. Holding a gesture may allow for strain data from the strain sensors to stabilize. In some cases, a camera may record image data of the glove or hand while the predetermined gestures are performed.
  • the image data may be recorded during act 501 of Figure 5.
  • Figure 5 illustrates one embodiment of a method for training a machine learning model to determine a hand pose.
  • a vision-based approach is used for training. More, fewer, or different acts may be performed. For example, acts 501 and 503 may be omitted.
  • the machine learning model may be trained to map lower dimension strain sensor data to higher dimensional joint angle data.
  • sensor data may be acquired using a glove with an optimized strain sensor layout, for example according to the acts of Figure 3.
  • a glove with a non-optimized strain sensor layout may be used.
  • While the optimal sensor locations on the glove may be determined using discrete predetermined gestures, a machine learning model may be trained using sensor data, hand poses, joint angles, and image data from continuous or extended movements of the glove.
  • image data indicating the pose of a hand wearing the glove is recorded.
  • a camera may be used to capture images of the glove or hand while the predetermined gestures are performed.
  • the camera is a depth camera, such as a Leap Motion camera, that captures three-dimensional (3D) images.
  • the Leap Motion or another depth camera may directly record the 3D coordinates of all the joints on a hand in real time.
  • the camera is an RGB camera that captures two- dimensional (2D) images.
  • the camera is a stereo camera that captures 3D images.
  • the image data may be captured simultaneously or contemporaneously with the recording of the strain sensor data.
  • the captured image data may be used to identify joint angles of the hand.
  • the gestures may be held for a period of time or performed slowly so that the joint angles extracted from the images are consistent and stable.
  • joint angles are extracted from the image data.
  • the joint angles may be determined using vector geometry. For example, vectors may be drawn that follow each segment of a finger or another portion of the hand or glove. The angles between adjacent vectors may be determined as the joint angles.
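A sketch of that computation is shown below using 3D joint coordinates such as those reported by a depth camera; the convention of measuring the angle between the two segments meeting at a joint, and the function name, are illustrative choices.

```python
import numpy as np

# Minimal sketch: the angle at a joint from the two bone segments that meet there.
def joint_angle(p_prev, p_joint, p_next):
    """Angle (degrees) at p_joint formed by segments p_prev->p_joint and p_joint->p_next."""
    v1 = np.asarray(p_prev) - np.asarray(p_joint)
    v2 = np.asarray(p_next) - np.asarray(p_joint)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Example: a right-angle bend at a knuckle
# joint_angle([0, 0, 0], [1, 0, 0], [1, 1, 0]) -> 90.0
```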
  • the joints for which joint angles are determined may correspond to the joints 201 of Figure 2.
  • the joint angles may serve as ground truth for mapping strain sensor data to a hand pose.
  • the number of joint angles may be larger than the number of strain sensors disposed on the glove.
  • the glove may use an arrangement and number of strain sensors as determined by the acts of Figure 3 or in another manner so that a lesser number of strain sensors may be used to predict or determine a hand pose having a larger number of joint angles.
  • joint angles are determined only for joints with strain sensors.
  • strain data and hand poses are stored.
  • the strain data and associated hand poses may be stored in a memory, for example, the memory 1005 of Figure 10.
  • the hand poses may be extracted from image data.
  • the hand pose may include joint angles extracted from image data.
  • the machine learning model may be trained to predict a hand pose including the joint angles based on input strain data.
  • the hand pose may include one or more predetermined gestures extracted from the image data.
  • the hand pose may include predetermined gestures determined based on the extracted joint angles.
  • a machine learning model is trained based on the strain sensor data and the hand poses associated with the strain sensor data.
  • the machine learning model may have two hidden layers.
  • the machine learning model may incorporate a feed-forward neural network to map the strain data to the hand pose.
  • the machine-learned model may accept as input strain sensor data and output a hand pose for the hand or glove for which the strain data was recorded.
  • the hand pose may include more joint angles than the number of strain sensors that generated the strain sensor data.
  • the machine-learned model may determine a hand pose including ten joint angles based on strain data from three strain sensors.
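A sketch of such a mapping is given below using scikit-learn's MLPRegressor with two hidden layers; the library, the layer sizes, and the variable names are assumptions rather than details from the disclosure.

```python
from sklearn.neural_network import MLPRegressor

# Minimal sketch: regress a full set of joint angles (e.g., 14 per sample) from a
# smaller number of strain channels (e.g., 3 per sample).
def train_pose_regressor(X_strain, Y_angles):
    """X_strain: (n_samples, n_sensors); Y_angles: (n_samples, n_joint_angles)."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(X_strain, Y_angles)
    return model

# predicted_angles = train_pose_regressor(X_strain, Y_angles).predict(new_strain)
```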
  • the hand pose may be a representation of the skeleton or segments of the hand. Such a representation is shown in Figure 6b.
  • the machine-learned model may more accurately determine the hand pose and joint angles than applying linear regression or quadratic regression to the strain data.
  • the machine learning model may be used in Figure 8 to determine a hand pose.
  • the machine learning model may be used in acts 803 and/or 805.
  • the machine learning model may include the machine learning classifier of Figure 3. The machine learning classifier may be retrained or updated based on the strain data and hand poses of act 505.
  • Figures 6a and 6b illustrate two wireframe representations of a hand.
  • the first representation 601 includes nodes 605 and links 607 of the hand.
  • the second representation 603 also includes nodes 605 and links 607 of the hand.
  • the first representation 601 may be based on image data from a camera. For example, coordinate data from 3D images from a depth camera may be used to determine the location of the nodes 605 and the links 607 joining the nodes 605.
  • the first representation 601 may include additional information such as the angles between fingers represented by the links 607.
  • the links 607 may be at an angle relative to an axis of the hand instead of being aligned with the axis.
  • the second representation 603 may be an output of a machine- learned model.
  • the second representation may include the joint angles between links 607 through the nodes 605.
  • the second representation 603 may lack additional information about the angle between neighboring links (e.g. angles between fingers) that may be included in the first representation 601.
  • Other information such as bone or link 607 length, palm orientation, or base knuckle location may be predetermined and not generated by the machine-learned model.
  • the second representation 603 may be used to recognize a predetermined gesture, track a user's or operator's hand movements, or control a machine, for example.
  • the nodes 605 may correspond to joints or endpoints of the hand.
  • the nodes 605 may represent knuckles and fingertips.
  • the locations of the nodes 605 may be extracted from image data.
  • one or more coordinates of the nodes 605 may be preset or predetermined and not determined by the machine-learned model.
  • the links 607 may correspond to the bones between the nodes 605.
  • the lengths of the links 607 may be extracted from image data.
  • the lengths of the links 607 may be preset or predetermined and not determined by the machine-learned model.
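To illustrate how predicted joint angles combine with preset link lengths into a wireframe like the second representation 603, the sketch below chains the links of a single finger in a plane; the base position, lengths, and angles are illustrative values, not data from the disclosure.

```python
import numpy as np

# Minimal sketch (planar, single finger): node positions from preset link lengths and
# predicted bend angles, chaining each link onto the previous one.
def finger_nodes(base, bone_lengths, joint_angles_deg):
    nodes = [np.asarray(base, dtype=float)]
    heading = 0.0  # accumulated direction of the current link, in degrees
    for length, bend in zip(bone_lengths, joint_angles_deg):
        heading += bend
        step = length * np.array([np.cos(np.radians(heading)), np.sin(np.radians(heading))])
        nodes.append(nodes[-1] + step)
    return np.vstack(nodes)

# Example: three links of a finger, each bent 20 degrees at its joint
# finger_nodes(base=(0, 0), bone_lengths=[4.0, 2.5, 1.5], joint_angles_deg=[20, 20, 20])
```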
  • FIG. 7 is a flow chart diagram of an embodiment of a method for training a machine-learning model to determine a hand pose.
  • Sensors 701 on a glove 703 may generate sensor data and transmit the data over a connection 705 to a computer 707.
  • the sensor data may be processed 709 and form part of a training dataset 711 for training a machine learning model 713.
  • image data 715 may be captured by a camera 717 and hand poses 719 may be extracted from the image data 715.
  • the hand poses 719 may form part of the training data set 711 for the machine learning model 713.
  • the machine learning model may learn to output a hand pose 721 based on the strain data.
  • the sensors 701 may be attached to the glove and generate data based on a bend or angle of the sensors 701.
  • the sensors 701 may be the strain sensors 101 of Figure 1.
  • the sensors 701 may generate strain data as the glove is held in one or more predetermined or unknown gestures or during a movement of the glove.
  • the sensors 701 may be arranged on the glove according to the acts of Figure 3 or have other arrangements (e.g., arrangement of Figure 1). Though ten strain sensors 701 are shown, fewer sensors 701 in different arrangements on the glove 703 may be used.
  • the glove 703 may be a flexible glove worn by a user.
  • the glove 703 may be the glove 107 of Figure 1.
  • the glove may support the sensors 701.
  • the sensors 701 may be adhered to the glove 703 or secured with a fastener.
  • connection 705 may be a wired or wireless connection to the computer 707. In some cases, the connection 705 is formed by the cable 109.
  • connection 705 is formed by a Bluetooth connection or other wireless connection.
  • the connection 705 may transmit the data from the sensors 701 to the computer 707 for processing 709.
  • the computer 707 may be a general purpose or specialized computer.
  • the computer may be the computing system 1001 of Figure 10.
  • the computer 707 may communicate with the sensors 701 via the connection 705.
  • the computer 707 may receive the strain sensor data from the sensors 701.
  • the computer 707 or a processor of the computer 707 may be configured to perform the acts of Figures 3, 5, and 8.
  • the processing 709 may include dividing the sensor data into blocks. For example, a user may perform one or more predetermined gestures, holding the glove 703 in a position for a period of time during each gesture while sensor data is recorded from the sensors 701. During the processing 709, the sensor data may be divided so that each block includes the sensor data corresponding to one or more predetermined gestures.
  • One or more hand poses 719 (e.g. including one or more predetermined gestures) may be associated with each block. Additionally or alternatively, the sensor data may be recorded during a movement of the glove 703 and divided into blocks.
  • Hand poses 719 may be associated with each block. In some cases, the hand poses 719 include joint angles of the hand or glove. In this way the joint angles may be associated with each block of sensor data. The joint angles may be extracted from image data.
  • the output of the processing 709 may form part of the training dataset 711.
  • the training dataset 711 may include data from the strain sensors 701 and associated hand poses 719.
  • the training dataset 711 may be used to train the machine learning model 713 to output a hand pose 721 based on input strain sensor data.
  • the machine learning model 713 may include one or more of linear regression, quadratic regression, and feed-forward neural networks.
  • the machine learning model 713 may be trained to map from strain sensor data from a smaller number of strain sensors 701 to a larger number of joint angles in a hand pose 721.
  • the image data 715 may include visual information about the glove 703.
  • the image data 715 may be recorded or acquired by the camera 717 as the glove 703 is held in one or more predetermined gestures or during a movement of the glove 703.
  • the image data 715 may be a 3D representation of the glove 703.
  • the camera 717 may be a RGB, stereo, depth, or other type of camera.
  • the camera 717 may be a Leap Motion depth camera.
  • the camera 717 may capture visual information about the glove 703 as the glove is held in one or more predetermined gestures or during a movement of the glove 703.
  • the hand poses 719 may be extracted from image data 715 captured by the camera 717.
  • the computer 707 or another processor may extract one or more predetermined gestures or joint angles from the image data 715.
  • the predetermined gestures may be recognized from the image data by matching the image data to the predetermined gestures.
  • Vector geometry may be used to extract the joint angles from a 3D image 715 generated by a depth camera 717.
  • the ML model 713 may output a pose from input strain sensor data.
  • the output hand pose 721 may include one or more gestures or joint angles determined by the machine learning model 713 based on input strain sensor data.
  • the hand pose 721 may be determined according to the acts of Figure 8.
  • the number of joint angles in the hand pose 721 may be greater than the number of strain sensors 703 that generated the input strain sensor data.
  • the hand pose may be the representation 603 of Figure 6b.
  • Figure 8 illustrates one embodiment of a method for identifying a hand pose. More, fewer, or different acts may be performed.
  • acts 809, 811, and 813 may be omitted.
  • the acts may be performed in any order.
  • act 811 may proceed from act 813.
  • Strain sensors located at a limited number of optimum joint positions (e.g., as determined by the acts of Figure 3) or another arrangement of strain sensors record data of a glove or hand in motion.
  • a machine learning model (e.g., one trained using the acts of Figure 5) interprets the recorded strain data.
  • the strain sensor data may be mapped to fourteen joint angles, for example, of the hand or glove.
  • the result is that the total hand skeleton (or a representation of the hand or glove, for example, as shown in Figure 6b) may be reconstructed based on data from several or any number of strain sensors.
  • strain data is received.
  • the strain data may be generated by strain sensors on a glove.
  • the strain data may be referred to as measured strain data.
  • the location and number of the strain sensors on the glove may be determined by a machine-learned classifier.
  • the machine-learned classifiers trained in Figure 3 may determine an arrangement of strain sensors on the glove based on a highest accuracy in predicting hand poses.
  • the strain data may be collected or generated during a movement of the hand or glove.
  • the strain sensors may measure the strain on the glove as a hand wearing the glove performs a hand pose.
  • the strain sensors may be EGaIn sensors.
  • the strain data is applied to a machine-learned model.
  • the machine-learned model may be learned on a set of training strain data and associated training hand poses.
  • the training hand poses may be extracted from image data recorded simultaneously or contemporaneously with the training strain data.
  • the training strain data may be recorded while the glove is positioned in one or more predetermined gestures or during a movement of the glove.
  • the set of training strain data may be recorded while a user wearing the glove signs one or more ASL letters.
  • the set of training strain data may be recorded as a user wearing the glove moves their hand, such as by bending fingers, moving a wrist, or other motions.
  • the movement of the glove while the training strain data is recorded may be referred to as the training hand poses.
  • the machine-learned model may have been trained according to the acts of Figure 5. Additionally or alternatively, the machine- learned model of act 803 may include the machine learning classifier of Figure 3 that is trained to determine an optimal arrangement of a plurality of strain sensors. The machine learning classifier may be retrained or updated based on the strain data of act 801.
  • a hand pose is determined.
  • the hand pose may be determined based on applying the measured strain sensor data to the machine-learned model.
  • the hand pose to be determined based on the measured strain sensor data may be referred to as a measured hand pose.
  • the hand pose includes one or more of a predetermined gesture of the glove.
  • the hand pose includes one or more joint angles of the hand.
  • a wireframe representation may represent the one or more joint angles, for example, as shown in Figure 6b.
  • the number of strain sensors on the glove that generated the measured strain data may be less than the number of joint angles in the hand pose. In this way, the machine-learned model is able to map the lower dimension strain sensor data to the higher dimension joint angle and hand pose. For example, data from three strain sensors may be used to predict fourteen joint angles.
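At deployment, each frame of measured strain can be pushed through the trained model to recover the higher-dimensional joint angles, as sketched below; `model` is assumed to be a regressor like the one sketched earlier, and the names are illustrative.

```python
import numpy as np

# Minimal sketch: map a stream of strain frames (n_frames, n_sensors), e.g. 3 sensors,
# to joint-angle frames (n_frames, n_joint_angles), e.g. 14 angles per frame.
def strain_stream_to_pose(model, strain_frames):
    return model.predict(np.asarray(strain_frames))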
  • the joint angles may vary in time.
  • the hand pose including the joint angles, may represent the movement of the glove or hand over time.
  • the hand pose is output.
  • the display 1011 of Figure 10 may output the hand pose.
  • the hand pose may be output for modelling.
  • a model of the hand of the user may be constructed based on the hand pose.
  • the model may be the representation 603 of Figure 6b.
  • the hand pose may be output in real time, in batches, or in another way.
  • the hand pose may be output as the hand pose is determined in response to receiving strain data in act 801.
  • the hand pose may be output along with information from an additional sensor on the glove.
  • the additional information may indicate a spatial position, orientation, and/or acceleration of the hand. Together with the hand pose, the additional information may provide a complete spatial description of the hand position, orientation, and pose.
  • a control command is generated.
  • the control command may be based on the hand pose.
  • the hand pose may be used to guide or instruct a machine to perform an action.
  • a robot, drone or other machine may follow the command.
  • the command may be chosen from a set of predetermined control commands.
  • the hand pose may correspond to one or more of the predetermined control commands.
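A table lookup is one simple way to realize the mapping from a recognized pose to a predetermined control command, as sketched below; the pose labels and command names are illustrative (the clenched fist and open hand follow the Figure 9 examples).

```python
# Minimal sketch: map recognized hand-pose labels to predetermined control commands.
COMMANDS = {
    "closed_fist": "STOP",   # Figure 9: clenched fist stops the machine
    "open_hand": "RUN",      # Figure 9: open hand makes the machine run
}

def command_for_pose(pose_label, default="HOLD"):
    """Return the command for a pose; unrecognized poses leave the machine unchanged."""
    return COMMANDS.get(pose_label, default)
```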
  • a machine is controlled based on the control command.
  • the machine may perform an action in response to the control command.
  • a drone may raise, lower, hover, land, rotate, move forward, move backwards, or perform another action in response to the control command.
  • a robot may position a tool based on the control command.
  • a representation of the hand pose is generated.
  • the representation may be an audial representation, a visual representation, or an audiovisual representation of the hand pose.
  • the representation may be a written description of the hand gestures based on an alphabet.
  • the alphabet may be the ASL alphabet.
  • the written description may include one or more letters or words indicated by the hand pose.
  • the representation may include one or more predetermined pose interpretations.
  • the predetermined pose interpretations may include a written, graphical, or visual label of the hand pose.
  • the pose interpretation may be descriptive.
  • the pose interpretation may label a hand pose as open palm, waving, closed fist, or another interpretation.
  • the representation may include a spoken representation of the hand pose.
  • the speech may be human or computer synthesized speech.
  • the speech may be prerecorded.
  • the speech may be based on an alphabet such as the ASL alphabet.
  • the speech may vocalize one or more letters or words indicated by the hand pose.
  • the speech may include a vocalization of one or more predetermined pose interpretations.
  • the representation may include audial and visual information.
  • the representation may include a written or graphical representation of a letter, word, or pose interpretation of the hand pose with a speech vocalization of the letter, word, or pose interpretation.
  • the representation may be output by the display 1011 of Figure 10.
  • Figure 9 is a schematic representation of using one or more gestures 901, 903 to control a machine 911.
  • the gestures 901, 903 may be recorded by strain gauges 905 on a glove 907 worn by a user or operator. Data from the strain gauges may be input to a machine-learned model 909.
  • the machine-learned model 909 may determine a hand pose based on the sensor data from the strain sensors 905 and output a command to the machine 911.
  • the first gesture 901 may be made by a user according to one or more predetermined gestures to control the machine 911.
  • the first gesture 901 may be a clenched fist which corresponds to a command to stop the machine 911.
  • the gesture 901 may be measured by the strain sensors 905.
  • the second gesture 903 may be made by a user according to one or more predetermined gestures to control the machine 911.
  • the second gesture 903 may be an open hand which corresponds to a command to make the machine 911 run.
  • the gesture 903 may be measured by the strain sensors 905.
  • the strain sensors 905 may generate data based on a movement or gesture of the glove 907.
  • the sensors 905 may be the sensors 101 of Figure 1.
  • the glove 907 may be worn by the user or operator and support one or more of the sensors 905.
  • the glove 907 may be the glove 107 of Figure 1.
  • the machine-learned model 909 may accept as input the strain sensor data from the sensors 905 and output a hand pose or command. In some cases, the machine-learned model 909 may be trained according to the acts of Figure 5. The machine-learned model 909 may generate a hand pose based on the strain data. The hand pose may correspond to one or more control commands for the machine 911. The machine-learned model or a processor (e.g., the processor 1003 of Figure 10) may determine the one or more control commands based on the hand pose. The control command may be output to the machine 911 to control the machine 911.
  • the machine 911 may be controlled based on a control command determined from the hand pose and sensor data. For example, a drone may raise, lower, hover, land, or perform another action in response to the control command. Though a flying drone is shown, any type of machine 911 may be controlled based on the control command. In another example, a robot may position a tool based on the control command.
  • Figure 10 is a block diagram of one embodiment of a computing system 1001 for determining an optimal arrangement of a plurality of strain sensors, training a machine learning model to determine a hand pose, and identifying a hand pose.
  • the computing system 1001 may include a processor 1003 coupled with a memory 1005 and in communication with strain sensors 1007, a camera 1009, and a display 1011.
  • the computing system 1001 performs the acts of Figures 3, 5, 8, or other acts.
  • the processor 1003 may be a general purpose or application specific processor.
  • the processor may be configured to apply sensor data to machine-learned classifiers and models. Based on the applying, the processor may be configured to determine an optimal arrangement of sensors or to determine a hand pose including one or more predetermined gestures or joint angles.
  • the memory 1005 may be a non-transitory computer readable storage medium.
  • the memory 1005 may be configured to store instructions that cause the processor to perform an operation.
  • the memory 1005 may store instructions that, when executed by the processor 1003, cause the processor 1003 to perform one or more acts of Figure 3, Figure 5, or Figure 8.
  • the memory 1005 may be configured to store sensor data, image data, associated hand poses, joint angles, and machine-learned classifiers and models.
  • the instructions for implementing the processes, methods, and/or techniques discussed herein are provided on non-transitory computer- readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, or other computer readable storage media.
  • Non-transitory computer readable storage media include various types of volatile and nonvolatile storage media.
  • the memory 1005 may also be configured to store the training dataset for machine learning classifiers and models.
  • the strain sensors 1007 may generate data based on a bend or flex in the sensor.
  • the strain sensors 1007 may be the strain sensors 101 of Figure 1.
  • the strain sensors 1007 may be in communication with the processor 1003 via the cable 109 of Figure 1 or the connection 705 of Figure 7.
  • the camera 1009 may generate image data.
  • the camera 1009 may generate image data of a glove supporting the strain sensors 1007 while the glove is held in one or more predetermined gestures or during a movement of the glove.
  • the camera 1009 may be a depth camera, such as a Leap Motion camera, an RGB camera, a stereo camera, or another type of camera.
  • Image data from the camera 1009 may be stored in the memory 1005.
  • the camera 1009 may be the camera 717 of Figure 7.
  • the display 1011 may be configured to accept user input and to display audiovisual information to the user.
  • the display 1011 may include a screen configured to present the audiovisual information.
  • the display 1011 may present the determined hand pose.
  • the hand representations 601, 603 of Figure 6a and Figure 6b may be displayed using the screen.
  • the display 1011 may include a user input device.
  • the user may input information relating to the combination of strain sensors 1007 to be used in determining the hand pose.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

Strain sensors may be arranged on a glove at locations determined by a most accurate machine-learned classifier of a plurality of machine-learned classifiers trained on different combinations of strain sensor placements. Measured strain data from a plurality of strain sensors arranged on a glove may be applied to a machine-learned model learned on training strain data and associated training hand poses to determine a measured hand pose. The measured hand pose may be output.

Description

OPTIMAL HAND POSE TRACKING USING A FLEXIBLE ELECTRONICS- BASED SENSING GLOVE AND MACHINE LEARNING
PRIORITY CLAIM
[0001] This application claims priority to U.S. provisional application serial number 62/640,906, filed 9 March 2018, U.S. provisional application serial number 62/640,875, filed 9 March 2018, U.S. provisional application serial number 62/641,609, filed 12 March 2018, and U.S. provisional application serial number 62/641,697, filed 12 March 2018, which are entirely incorporated by reference.
FIELD
[0002] The following disclosure relates to tracking a hand pose using flexible electronics.
BACKGROUND
[0003] Hand gestures of human operators may be tracked. The gestures may be used to control a robot or to study or prevent workplace injuries. For example, a human operator may control a robotic arm by moving their hand or by performing a predetermined gesture with their hand. In another example, the movements of the operator during work are tracked and used to guide ergonomic changes to a workplace. Movement of the operator’s hands may also be correlated with the quality outcomes from a process.
[0004] In some cases, the hand gestures may be tracked using visual sensing. For example, an external camera records the hand gestures. In other cases, the operator may wear a glove incorporating sensors for tracking the hand gestures. The prevailing glove-based techniques commonly use bulky gloves that are cumbersome to wear while working. In still other cases, the operator holds a device containing the sensors for tracking the hand gestures.
SUMMARY
[0005] By way of introduction, the preferred embodiments described below include methods, systems, instructions, and computer readable media for tracking a hand pose using flexible electronics.
[0006] In a first aspect, a method is provided for identifying a measured hand pose. Measured strain data from a plurality of strain sensors arranged on a glove is received and applied to a machine-learned model learned on training strain data and one or more associated training hand poses. The measured hand pose is determined based on the application of the measured strain data to the machine-learned model. The measured hand pose is output.
[0007] In a second aspect, a method is provided for training a machine learning model to determine a hand pose. Strain sensor data and a plurality of hand poses associated with the strain sensor data are stored. The machine learning model is trained to determine the hand pose based on the strain sensor data and the plurality of hand poses associated with the strain sensor data.
[0008] In a third aspect, a method is provided for determining an optimal arrangement of a plurality of strain sensors. First strain data and second strain data from the plurality of strain sensors arranged on a first glove, a first plurality of hand poses associated with the first strain data, and a second plurality of hand poses associated with the second strain data are stored. The first strain data from the plurality of strain sensors is grouped in different combinations. A plurality of machine learning classifiers is trained to
determine hand poses. The training is based on the different combinations of the first strain data and the first plurality of hand poses associated with the first strain data. At least one machine learning classifier of the plurality of machine learning classifiers is trained based on each different combination of the first strain data. The second strain data is applied to the plurality of machine-learned classifiers to determine a third plurality of hand poses. The second plurality of hand poses is compared to the third plurality of hand poses determined by the plurality of machine-learned classifiers. An accuracy of each of the plurality of machine-learned classifiers is determined based on the comparing. A machine-learned classifier of the plurality of machine-learned classifiers having a highest accuracy is selected. A second glove with strain sensors arranged on the glove at locations according to the combination of the first strain data used to train the machine-learned classifier having the highest accuracy is built.
[0009] In connection with any of the aforementioned aspects (including, for instance, those set forth above in the Summary), the systems or methods may alternatively or additionally include any combination of one or more of the following aspects or features. In the method for identifying a measured hand pose, the plurality of strain sensors may be arranged on the glove at locations determined by a most accurate machine-learned classifier of a plurality of machine-learned classifiers trained on different combinations of strain sensor placements. In the method, the measured hand pose may include one or more joint angles and a number of the one or more joint angles may be larger than a number of the plurality of strain sensors. In the method, the one or more associated training hand poses may have been extracted from image data recorded contemporaneously with the training strain data. In the method, the training strain data may have been recorded while the glove is positioned in one or more predetermined gestures. In the method, the measured strain data may be collected during a movement of the glove and the measured hand pose may include the one or more joint angles throughout the
movement of the glove. The method may include generating a control command based on the measured hand pose and controlling a machine based on the control command. In the method, the plurality of strain sensors may be flexible Eutectic Gallium Indium-based strain sensors. The method may include generating an audial representation, a visual representation, or an audiovisual representation of the measured hand pose.
[0010] In the method for training a machine learning model to determine a hand pose, the hand pose may include a plurality of joint angles and a number of the plurality of joint angles may be larger than a number of the plurality of strain sensors that generated the strain sensor data. In the method, the plurality of hand poses associated with the strain sensor data may be predetermined based on second joint angles extracted from data recorded by a depth camera. The method may include recording image data contemporaneous with the strain sensor data and extracting the plurality of joint angles from the image data.
[0011] In the method for determining an optimal arrangement of a plurality of strain sensors, each combination of the first strain data may include different numbers and arrangements of strain data from the plurality of strain sensors. In the method, each combination of the first strain data may include strain data from at least three strain sensors. In the method, the machine learning classifiers may be support vector machine models. The method may include recording the first strain data, adjusting a position of the glove, and recording the second strain data with the adjusted position of the glove. In the method, recording the first strain data may be performed while the glove is positioned in one or more predetermined gestures. In the method, two or more machine learning classifiers of the plurality of machine learning classifiers may be trained with strain data and associated hand poses from each combination of the first strain data.
[0012] The present invention is defined by the following claims, and nothing in this section should be taken as a limitation on those claims. Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments and may be later claimed independently or in combination.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The components and the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
[0014] Figure 1 illustrates one embodiment of flexible electronics;
[0015] Figure 2 illustrates joints of a hand;
[0016] Figure 3 illustrates one embodiment of a method for determining an optimal arrangement of a plurality of strain sensors;
[0017] Figure 4 shows an example set of predetermined hand gestures;
[0018] Figure 5 illustrates one embodiment of a method for training a machine learning model to determine a hand pose;
[0019] Figure 6a illustrates one wireframe representation of a hand;
[0020] Figure 6b illustrates another wireframe representation of a hand;
[0021] Figure 7 is a flow chart diagram of an embodiment of a method for training a machine-learning model to determine a hand pose;
[0022] Figure 8 illustrates one embodiment of a method for identifying a hand pose;
[0023] Figure 9 is a schematic representation of controlling a machine with gestures; and
[0024] Figure 10 is a block diagram of one embodiment of a system for determining an optimal arrangement of a plurality of strain sensors, training a machine learning model to determine a hand pose, and/or identifying a hand pose.
DETAILED DESCRIPTION OF THE DRAWINGS AND PRESENTLY PREFERRED EMBODIMENTS
[0025] Hand poses may be tracked with a wearable device. In one approach, a glove has two strain sensors disposed at each joint of the hand for determining the angle of the joint. However, to determine all ten joint angles for the five fingers of a hand, a glove normally needs 20 sensors and may be cumbersome to wear while working. A glove having many sensors may be heavy and reduce a length of time that the operator may work before resting.
[0026] In another approach, accelerometers may be attached to a glove to measure 12 predefined movements of a single finger. In order to measure movements of each finger and the hand as a whole, the glove has at least six accelerometers, which may make the glove very bulky or cumbersome to wear while working.
[0027] Vision-based approaches use a camera external to the hand to detect a hand pose. For example, RGB, infrared, or depth cameras may be used to determine the positions of fingers or fingertips of the hand. However, vision-based approaches require that the camera follow the operator, which may be impractical where the operator moves between locations while working. Additionally, vision-based approaches may not produce accurate hand poses when the hand is obscured, for example, by thick gloves or when an object is held in the hand. Further, external light sources may negatively impact the quality of images taken by the cameras, reducing the accuracy of the identified hand pose.
[0028] Flexible and soft electronics, as disclosed below, may allow for a lightweight system for detecting a hand pose. For example, flexible strain sensors may be constructed from liquid metals such as Eutectic Gallium Indium (EGaIn), Galinstan, or ionic liquids. The soft and flexible strain sensors may be attached to a glove to measure joint angles of the hand. A small number of strain sensors attached at optimized locations on the glove may be used to predict a higher number of joint angles of the hand. Using fewer strain sensors may allow for a cheaper and lighter glove that may, with the aid of a computer, recognize hand poses more rapidly. The recognized gestures may be used to train or control robots using human movements as input for a human-machine interface (e.g., to avoid dirtying a screen by touching it), and to ensure workers' hands are out of danger zones. The gestures may be aggregated to track and avoid injuries due to repeated movements in the workplace and to measure defects for quality control. Further, the gestures may be used for tracking in training and rehabilitation of workers, athletes, and patients.
[0029] As discussed below, a machine-learning model may be created to relate the measured strain from a plurality of strain sensors arranged on a glove with a hand pose. For this purpose, strain data may be measured for a minimum set of known hand poses and used to train the machine-learning model. The machine-learned model may be used to determine a hand pose that was not used to train the machine learning model. That is, during deployment, strain data may be acquired from the hand pose using multiple strain sensors arranged on a glove worn on the hand. The strain data may be interpreted using the machine-learned model to determine the hand pose. The hand pose may be characterized by 14 joint angles of the hand. To determine these angles from the strain data, the strain data may be applied to the machine-learned model. The machine-learned model may accept as input strain data and may output the 14 joint angles. In some cases, the machine-learned model may be trained off-line using a set of 'training' hand poses where both the strain data and the associated joint angles may be acquired using the strain sensors and optical (or any other mode of) tracking. This allows the machine learning model to learn a mapping from the strain data to the joint angles. Strain data acquired during deployment may be applied to the machine-learned model to get the joint angles. The process may be represented as a regression model y = f(x), where f has been learned by the machine-learned model, where x is the strain data (e.g., 10-dimensional if 10 sensors are used), and where y is the hand pose joint angle data (e.g., a 14-dimensional vector). The output may be the predicted joint angle data.
[0030] An optimal arrangement of strain sensors on the glove may be determined. A minimum number of strain sensors and their placement on the glove may be determined to enable complete identification of any hand pose a person can have. For this purpose, the strains associated with all possible hand poses are first determined using the derived machine-learning-based model above. Next, an inverse approach is used, where the strain data from a smaller set of sensors is applied to the machine-learned model to identify a hand pose. All combinations of sensors are tested, and the smallest set of sensors (and the associated locations) that can determine all hand poses with high accuracy is identified as the optimal number and placement of sensors.
An accuracy of each combination of sensors may be determined by comparing hand-pose determination using strain data from all of the sensors with hand-pose determination using the reduced set of sensors.
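By way of illustration only, the combinatorial search described above may be sketched as follows; the data layout, helper names, and use of scikit-learn are assumptions of this sketch and not specifics of the disclosure. X_train and X_test are assumed to be NumPy arrays with one column per strain sensor, and y_train and y_test hold the associated gesture labels:

```python
# Illustrative sketch of the sensor-combination search described above.
from itertools import combinations

from sklearn.svm import SVC


def combination_accuracy(X_train, y_train, X_test, y_test, cols):
    """Train a classifier on one combination of sensor columns and score it."""
    clf = SVC()
    clf.fit(X_train[:, cols], y_train)
    return clf.score(X_test[:, cols], y_test)


def search_sensor_combinations(X_train, y_train, X_test, y_test, min_sensors=3):
    """Score every combination of at least min_sensors strain sensors."""
    n_sensors = X_train.shape[1]
    results = {}
    for k in range(min_sensors, n_sensors + 1):
        for cols in combinations(range(n_sensors), k):
            results[cols] = combination_accuracy(
                X_train, y_train, X_test, y_test, list(cols))
    return results
```

The best-scoring combination may then fix the sensor locations for a second-generation glove, as described for act 319 below.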
[0031] In some cases, an additional sensor may be integrated into the glove to also indicate the spatial position, orientation, and/or acceleration of the hand. For example, the additional sensor may be located on the back of the hand. Information from the additional sensor may be output in addition to the hand pose to provide a complete spatial description of the hand position, orientation, and pose.
[0032] Figure 1 illustrates one embodiment of flexible electronics. Strain sensors 101 may be connected via interconnects 103 to a controller 105. The strain sensors 101 and interconnects 103 may be disposed on a glove 107. Strain data measured by the strain sensors 101 may be transmitted from the glove via a cable 109. More or fewer components may be provided. For example, though ten sensors 101 are shown, any number of sensors 101 may be used. In another example, the cable 109 may not be present. In a further example, the controller 105 is located remotely from the glove 107.
[0033] The strain sensors 101 may be strain gauges. The strain sensors 101 may be flexible strain sensors. The sensors 101 may be formed by printing one or more layers of material. In some cases, the sensors may be formed by micro contact printing or direct wire printing. One example of the strain sensors 101 is the Omega model KFH-20-120-C1-11L1M2R strain gauge. An electrical property of the sensors 101 may change when the sensor 101 is bent. Measuring an extent of the change in the electrical property may indicate an angle of the bend.
[0034] The sensors 101 may be a part of or attached to the glove 107. For example, the sensors 101 may be integrally formed with the glove 107 (e.g., dispensed directly onto or into the glove fabric). In another example, the sensors 101 may be attached to the glove 107 with adhesive. Double sided tape may be used to secure the sensors 101 to the glove 107. In a further example, the sensors 101 are mechanically joined to the glove 107 via a slot, clip, or fastener.
[0035] In some cases, the sensors 101 may be made from EGaIn or another material. For example, the sensors 101 may be EGaIn electrodes embedded in elastomer. In another example, the sensors are flexible liquid metal or liquid salt-solution-based strain sensors. In contrast to mercury and lead-based low-melting-point solders, EGaIn alloys are nontoxic and may rapidly oxidize in air to form a self-sealing outer layer. EGaIn alloys are orders of magnitude more conductive (σ = 3×10⁶ S/m) than ionic solutions or conductive inks, allowing for the construction of stretchable circuit wiring for elastomer-sealed electronics with EGaIn. Other materials, such as Galinstan or ionic liquids, may be used in place of EGaIn. Elastomer may be selected for its mechanical properties (e.g., flexibility) and durability. By making a multi-layer sensor, both tension and compression may be measured by the strain sensors.
[0036] During initial setup of the sensors 101, the glove 107 may be fitted to the hand of a user. With the fingers of the hand bent, marks may be made on the glove above each knuckle or at other joints of interest. With the hand laid flat, the sensors 101 may be secured to the glove at the marked locations corresponding to the knuckles or other joints. By doing so, the sensors 101 remain neutral (e.g., are not in significant tension or compression) when the hand is in a flat, relaxed state. The sensors 101 will undergo tension or compression when the hand or fingers are moved and generate a signal.
[0037] The interconnects (or leads) 103 are conductive paths that electrically connect the sensors 101 to other elements. For example, the interconnects 103 may form an electrical connection between the sensors 101 and the controller 105 or the cable 109. The interconnects 103 may be flexible. Flexibility may allow for the interconnects 103 and the sensors 101 to conform to a contour of the glove 107. The interconnects 103 may be formed from the same material as the sensors 101 or from another material. For example, interconnects 103 may be made from EGaIn or similar liquid conductors and embedded in an elastomer.
[0038] The controller 105 may receive data from the sensors 101 via the interconnects 103. Additionally or alternatively, the controller 105 may communicate with the sensors 101 via a wireless link (e.g., Bluetooth or WiFi). The controller 105 may send the sensor data to a remote computer, or an on-board microcomputer may be incorporated with the controller 105. In some cases, the controller 105 may send the sensor data via the cable 109. In other cases, the controller 105 may send the data via a wireless link.
[0039] Because the change in electrical resistance of the sensors 101 due to bending is minimal, the controller 105 may include an amplifier microcircuit so that the output from the sensors 101 produces a detectable signal.
[0040] The glove 107 may be made from flexible material. For example, the glove 107 is flexible to conform to a hand of a user or operator. In some cases, the glove 107 may be made from latex, spandex, or other suitable materials. One example of the glove 107 is the ULINE Microflex Diamond Grip. [0041] The cable 109 may carry data from the controller 105 or sensors 101 to a remote computer or vice versa. The cable 109 may be a ribbon cable. In some cases, the cable 109 may also provide power to the sensors 101 and the controller 105. With a wireless glove 107, the cable 109 may not be present or installed at all times. For example, the cable 109 may be removed when the glove 107 is in use or installed on a hand.
[0042] In some cases, the glove 107 may include additional sensors other than the strain sensors 101 disposed about the joints. For example, another sensor may be disposed on the glove 107 and may measure additional information about the hand. The sensor may be an accelerometer, strain sensor, or another sensor. The additional information may be a spatial position, acceleration, or an orientation of the hand.
[0043] Figure 2 illustrates joints 201 of a hand 203. The finger joints 201 are labeled 1-14. The joints 201 may be organized in a different order than shown. The hand 203 may be a right or left hand of a user or operator.
[0044] Angles of the finger joints 201 may be measured by strain sensors. For example, the strain sensors 101 of Figure 1 may measure the joint angles for one or more of the joints 201. By using only 14 knuckle angles to model a hand pose, essential statistics such as finger bending may be reconstructed while other information, such as finger orientation and bone length, is not included in the model. Having fewer strain sensors 101 may reduce cost, weight, and complexity of both the construction of the glove and the interpretation of the measured data.
[0045] Figure 3 illustrates one embodiment of a method for determining an optimal arrangement of a plurality of strain sensors. More, fewer, or different acts may be performed. For example, acts 301, 303, 305, and 319 may be omitted. The acts may be performed in a different order than shown. For example, act 303 may proceed from act 301.
[0046] In act 301, first strain data is recorded. The strain data may be generated by one or more strain sensors, such as the sensors 101 of Figure 1 or sensors at the fourteen joints noted in Figure 2. The sensors may be disposed on a glove and generate data indicating an angle or bend of the sensors. The sensors may be disposed on the glove about a joint of the hand so that as the hand (e.g., a finger of the hand) is bent, the sensor is also bent. In some cases, there are ten sensors on the glove for measuring ten angles. More or fewer sensors may be used.
[0047] The glove with sensors arranged in locations without or prior to optimization may be referred to as a first-generation glove. A glove with sensors arranged in optimized locations (e.g., based on data from a first-generation glove) may be referred to as a second-generation glove.
[0048] The strain data may be recorded while the glove is positioned in one or more predetermined gestures. For example, a user may perform one or more gestures with their hand while wearing the glove as the strain sensors generate data indicating an angle or bend of the strain sensors. In some cases, the sensor data is recorded while the user or the glove is positioned in one or more predetermined gestures. For example, the predetermined gestures may be the first thirteen letters of the American Sign Language (ASL) alphabet. The letters are shown in Figure 4. The user may hold their hand or the glove in one predetermined gesture (e.g., make one ASL letter) for an amount of time and then proceed to hold their hand or the glove in another predetermined gesture (e.g., another ASL letter). The strain data is recorded for each held position and/or during use including transition between held positions. The strain data may be preprocessed to remove the transition period between predetermined gestures.
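As a rough illustration of this preprocessing, a continuous strain recording might be split into per-gesture blocks and the transition periods trimmed; the sampling rate, hold duration, and trim fraction below are illustrative assumptions only:

```python
import numpy as np


def split_into_gesture_blocks(strain_stream, n_gestures, sample_rate_hz=100,
                              hold_seconds=10.0, trim_fraction=0.2):
    """Split a continuous strain recording into one block per held gesture.

    The first and last trim_fraction of every block is dropped to remove the
    transition between gestures.  All timing values are assumptions.
    """
    strain_stream = np.asarray(strain_stream)
    samples_per_gesture = int(sample_rate_hz * hold_seconds)
    blocks = []
    for i in range(n_gestures):
        start = i * samples_per_gesture
        block = strain_stream[start:start + samples_per_gesture]
        trim = int(len(block) * trim_fraction)
        blocks.append(block[trim:len(block) - trim])
    return blocks
```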
[0049] In act 303, the position of the glove is adjusted. After the first strain data is recorded, the glove may be removed and worn again by a user. By adjusting the position of the glove before acquiring the second strain data, the locations of the strain sensors relative to the hand may change slightly.
[0050] In act 305, second strain data is recorded. The second strain data may be recorded in a similar manner as the first strain data in act 301. The second strain data may be recorded while the glove is positioned in the one or more predetermined gestures. For example, the second strain data may be recorded while a user wearing the glove signs the first thirteen ASL letters. In some cases, the second strain data may be recorded with an adjusted position of the glove. In some cases, the second strain data may be used to test a machine-learning classifier, for example in acts 311, 313, and 315.
[0051] In act 307, the first and second strain data may be stored.
Additionally, hand poses associated with the strain data may be stored. For example, the predetermined gestures performed while the first strain data was recorded may be stored. In this way, a hand pose (e.g. a predetermined gesture or an ASL letter) may be associated with the strain data of a given time. The strain data and the associated hand poses may be stored in a memory, for example, the memory 1005 of Figure 10.
[0052] The strain data includes data from one or more strain sensors. The strain sensors (and the associated strain data and hand poses) may be grouped into different combinations of data from one or more sensors. For example, strain data from 3, 4, or more strain sensors may be grouped together. As another example, ten different groupings are provided where each grouping is formed from a different combination of strain sensors and the corresponding strain data.
[0053] In act 309, machine learning (“ML”) classifiers are trained. The machine learning classifiers may be support vector machine models. For example, a machine learning classifier may be a multi-class support vector machine model with 10-fold cross validation. The machine learning classifiers may be trained using the strain data to determine a hand pose based on input strain sensor data. In some cases, the machine learning classifiers are trained with the first strain data but not the second strain data. A different machine learning classifier may be trained for each different combination of strain sensors using the corresponding strain sensor data. For example, one or more machine learning classifiers may be trained on a combination of strain sensor data of three different strain sensors and the associated hand poses.
[0054] Multiple machine learning classifiers may be trained for each combination. Training multiple (e.g., three) classifiers for each combination of strain sensors may help reduce any variance in performance of the classifiers due to initialization bias. The performance of the classifiers trained on the same combination may be averaged.
[0055] By training one or more classifiers on each combination of different strain sensors, the combination of strain sensors that most accurately predicts the hand poses based on the strain data from those strain sensors may be determined. For example, three machine learning classifiers may be trained on the same combination of sensors, strain data, and associated hand poses. This combination may represent a minimum number of sensors necessary to accurately predict the hand poses.
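For illustration, a multi-class support vector machine with 10-fold cross-validation could be evaluated for one sensor combination as sketched below, repeating with reshuffled data and averaging as described above; scikit-learn, the kernel choice, and the assumption that X and y are NumPy arrays are all assumptions of this sketch:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def average_cv_accuracy(X, y, n_repeats=3):
    """Average 10-fold cross-validated accuracy over several repeats."""
    scores = []
    for seed in range(n_repeats):
        # Reshuffle the samples between repeats so the folds differ.
        order = np.random.RandomState(seed).permutation(len(y))
        clf = SVC(kernel="rbf", decision_function_shape="ovr")
        scores.append(cross_val_score(clf, X[order], y[order], cv=10).mean())
    return float(np.mean(scores))
```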
[0056] In some cases, the machine learning classifiers may be part of the machine learning model trained in Figure 5. Additionally or alternatively, the machine learning classifiers may be part of the machine learned model of Figure 8. The machine learning classifiers may be retrained or updated based on the strain data and hand poses of act 505 of Figure 5. Additionally or alternatively, the machine learning classifier may be updated based on the strain data of act 801.
[0057] In act 311, the second strain data is applied to the machine-learned classifiers. By applying the second data to the plurality of machine-learned classifiers, a plurality of hand poses may be generated by the machine-learned classifiers. The second strain data may be used to test the ability of the machine-learned classifiers in determining a hand pose that matches the hand pose associated with the second strain data. In some cases, the machine learning classifier was trained based on strain data from a particular combination of strain sensors. Second strain data from the same combination of sensors as the machine-learned classifier was trained on may be applied to the machine-learned classifier.
[0058] In act 313, the hand poses associated with the second strain data are compared to the hand poses generated by the machine-learned classifiers. The poses determined by the machine-learned classifiers may be compared with the known pose for the data in order to determine an accuracy for the machine-learned classifiers.
[0059] In act 315, accuracies of the machine-learned classifiers may be evaluated. For example, the accuracy may be expressed as a root mean squared error. The accuracy may be based on how many hand poses were determined correctly when compared to the hand poses associated with the second strain data. Where multiple machine-learned classifiers were trained based on the same combination of sensors, the performance of the classifiers may be averaged to determine an accuracy for the combination overall.
[0060] In act 317, the most accurate machine-learned classifier may be selected. By comparing the accuracy of different classifiers, the most informative sensor combinations may be determined by choosing the combination with the highest accuracy. The selection may be constrained or influenced, such as by weighting based on the number of strain sensors in a combination, to select a workable but smaller number of sensors where a greater number of strain sensors provides only an incremental increase in accuracy. The combination of sensors resulting in a highest accuracy may represent a minimum number of sensors needed to accurately predict the hand poses.
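A small sketch of the selection in act 317, with an optional per-sensor penalty so that additional sensors are kept only when they provide more than an incremental accuracy gain; the penalty value and the results mapping (as produced by a combination search like the one sketched earlier) are illustrative assumptions:

```python
def select_best_combination(results, penalty_per_sensor=0.0):
    """Pick the sensor combination with the highest (optionally penalized) accuracy.

    results maps a tuple of sensor indices to its averaged accuracy.
    """
    def score(item):
        cols, accuracy = item
        return accuracy - penalty_per_sensor * len(cols)

    best_cols, best_accuracy = max(results.items(), key=score)
    return best_cols, best_accuracy
```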
[0061] In act 319, a second-generation glove may be built. The sensors on the second-generation glove may be arranged at locations according to the combination of sensors and sensor data that resulted in the most accurate or selected machine learning classifier. In some cases, the most accurate configuration of three strain gauges uses strain data from sensors disposed at joints of the thumb, middle finger, and pinky finger, labeled joints 1, 6, and 13, respectively, in Figure 2. A combination of strain sensor data from three strain sensors arranged at those locations may be the minimum number of sensors needed to accurately predict the hand poses.
[0062] The second-generation glove may be the first-generation glove with some strain sensors removed. The second-generation glove may be independently manufactured, such as mass producing copies of the second-generation glove.
[0063] Figure 4 shows an example set of predetermined hand gestures 401. The hand gestures 401 are labeled P1-P13, corresponding to the first thirteen letters of the ASL alphabet, A through M. Hand poses for additional, different, or fewer letters in the ASL alphabet may be used. Many other hand gestures for communication may also be used.
[0064] Other gestures may be used, such as hand poses used as an interface for controlling a robot, or hand poses that are used in manufacturing and assembly, or any other processes that require human labor or robots.
[0065] The gesture P0 corresponds to a hand lying flat. This may be used as a calibration point to zero all sensor readings before recording the sensor data. Any other predetermined gesture may be used as a calibration gesture.
[0066] The predetermined gestures 401 may be performed while strain sensor data is acquired. For example, the predetermined gestures 401 may be performed during acts 301 and 305 of Figure 3. A user or operator may perform the predetermined gestures 401 one at a time and hold each gesture for a period of time. For example, each gesture may be held for 10 seconds. Holding a gesture may allow for strain data from the strain sensors to stabilize. In some cases, a camera may record image data of the
predetermined gestures being performed. For example, the image data may be recorded during act 501 of Figure 5.
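A minimal sketch of the flat-hand (P0) calibration described above, in which the reading recorded while the hand lies flat is used as a per-sensor baseline and subtracted from later readings; the array shapes are assumptions of this sketch:

```python
import numpy as np


def zero_calibrate(flat_hand_readings, live_readings):
    """Subtract the flat-hand (P0) baseline from subsequent strain readings.

    flat_hand_readings: (samples x n_sensors) recorded while the hand lies flat.
    live_readings:      (samples x n_sensors) recorded during gestures.
    """
    baseline = np.asarray(flat_hand_readings).mean(axis=0)
    return np.asarray(live_readings) - baseline
```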
[0067] Figure 5 illustrates one embodiment of a method for training a machine learning model to determine a hand pose. In this embodiment, a vision-based approach is used for training. More, fewer, or different acts may be performed. For example, acts 501 and 503 may be omitted.
[0068] The machine learning model may be trained to map lower dimension strain sensor data to higher dimensional joint angle data. In some cases, sensor data may be acquired using a glove with an optimized strain sensor layout, for example according to the acts of Figure 3. In other cases, a glove with a non-optimized strain sensor layout may be used. In some cases, though the optimal sensor locations on the glove may be determined using discrete predetermined gestures, a machine learning model may be trained using sensor data, hand poses, joint angles, and image data from continuous or extended movements of the glove.
[0069] In act 501, image data indicating the pose of a hand wearing the glove is recorded. A camera may be used to capture images of the glove or hand while the predetermined gestures are performed. In one example, the camera is a depth camera, such as a Leap Motion camera, that captures three-dimensional (3D) images. The Leap Motion or another depth camera may directly record the 3D coordinates of all the joints on a hand in real time.
In another example, the camera is an RGB camera that captures two-dimensional (2D) images. In a further example, the camera is a stereo camera that captures 3D images. The image data may be captured simultaneously or contemporaneously with the recording of the strain sensor data. The captured image data may be used to identify joint angles of the hand. The gestures may be held for a period of time or performed slowly so that the joint angles extracted from the images are consistent and stable.
[0070] In act 503, joint angles are extracted from the image data. The joint angles may be determined using vector geometry. For example, vectors may be drawn that follow each segment of a finger or another portion of the hand or glove. The angles between adjacent vectors may be determined as the joint angles. The joints for which joint angles are determined may correspond to the joints 201 of Figure 2. The joint angles may serve as ground truth for mapping strain sensor data to a hand pose.
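The vector-geometry step may be illustrated as follows, assuming the depth camera provides 3D coordinates for three consecutive joints along a finger; the function name and coordinate layout are assumptions of this sketch:

```python
import numpy as np


def joint_angle(p_prev, p_joint, p_next):
    """Bend angle (degrees) at p_joint between two adjacent finger segments.

    Each vector follows one bone segment, so a straight finger gives ~0 degrees.
    """
    v1 = np.asarray(p_joint) - np.asarray(p_prev)   # proximal segment direction
    v2 = np.asarray(p_next) - np.asarray(p_joint)   # distal segment direction
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```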
[0071] The number of joint angles may be larger than the number of strain sensors disposed on the glove. The glove may use an arrangement and number of strain sensors as determined by the acts of Figure 3 or in another manner so that a lesser number of strain sensors may be used to predict or determine a hand pose having a larger number of joint angles. Alternatively, joint angles are determined only for joints with strain sensors.
[0072] In act 505, strain data and hand poses are stored. The strain data and associated hand poses may be stored in a memory, for example, the memory 1005 of Figure 10. The hand poses may be extracted from image data. For example, the hand pose may include joint angles extracted from image data. In this way, the machine learning model may be trained to predict a hand pose including the joint angles based on input strain data. In another example, the hand pose may include one or more predetermined gestures extracted from the image data. In a further example, the hand pose may include predetermined gestures determined based on the extracted joint angles.
[0073] In act 507, a machine learning model is trained based on the strain sensor data and the hand poses associated with the strain sensor data. The machine learning model may have double hidden layers. In some cases, the machine learning model may incorporate a feed-forward neural network to map the strain data to the hand pose. The machine-learned model may accept as input strain sensor data and output a hand pose for the hand or glove for which the strain data was recorded. The hand pose may include more joint angles than the number of strain sensors that generated the strain sensor data. For example, the machine-learned model may determine a hand pose including ten joint angles based on strain data from three strain sensors. The hand pose may be a representation of the skeleton or segments of the hand. Such a representation is shown in Figure 6b.
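One hedged realization of such a model is a feed-forward network with two hidden layers that maps the low-dimensional strain vector to the higher-dimensional joint-angle vector; the layer widths, iteration count, and use of scikit-learn are illustrative assumptions rather than specifics of the disclosure:

```python
from sklearn.neural_network import MLPRegressor

# X: (samples x n_sensors) strain data, e.g., from 3 or 10 sensors.
# Y: (samples x n_joint_angles) joint angles, e.g., 10 or 14 angles extracted
#    from contemporaneously recorded depth-camera images.
def train_pose_regressor(X, Y):
    """Fit a feed-forward network with double hidden layers mapping strain to joint angles."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(X, Y)
    return model
```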
[0074] The machine-learned model may more accurately determine the hand pose and joint angles than applying linear regression or quadratic regression to the strain data.
[0075] In some cases, the machine learning model may be used in Figure 8 to determine a hand pose. For example, the machine learning model may be used in acts 803 and/or 805. Additionally or alternatively, the machine learning model may include the machine learning classifier of Figure 3. The machine learning classifier may be retrained or updated based on the strain data and hand poses of act 505.
[0076] Figures 6a and 6b illustrate two wireframe representations of a hand. The first representation 601 includes nodes 605 and links 607 of the hand. The second representation 603 also includes nodes 605 and links 607 of the hand.
[0077] The first representation 601 may be based on image data from a camera. For example, coordinate data from 3D images from a depth camera may be used to determine the location of the nodes 605 and the links 607 joining the nodes 605. The first representation 601 may include additional information such as the angles between fingers represented by the links 607. For example, the links 607 may be at an angle relative to an axis of the hand instead of being aligned with the axis.
[0078] The second representation 603 may be an output of a machine-learned model. The second representation may include the joint angles between links 607 through the nodes 605. In some cases, the second representation 603 may lack additional information about the angle between neighboring links (e.g., angles between fingers) that may be included in the first representation 601. Other information such as bone or link 607 length, palm orientation, or base knuckle location may be predetermined and not generated by the machine-learned model. Despite including less information, the second representation 603 may be used to recognize a predetermined gesture, track a user's or operator's hand movements, or control a machine, for example.
[0079] The nodes 605 may correspond to joints or endpoints of the hand. For example, the nodes 605 may represent knuckles and fingertips. For the first representation 601, the locations of the nodes 605 may be extracted from image data. For the second representation, one or more coordinates of the nodes 605 may be preset or predetermined and not determined by the machine-learned model.
[0080] The links 607 may correspond to the bones between the nodes 605. For the first representation 601, the lengths of the links 607 may be extracted from image data. For the second representation 603, the length of the links 607 may be preset or predetermined and not determined by the machine-learned model.
[0081] Figure 7 is a flow chart diagram of an embodiment of a method for training a machine-learning model to determine a hand pose. Sensors 701 on a glove 703 may generate sensor data and transmit the data over a connection 705 to a computer 707. The sensor data may be processed 709 and form part of a training dataset 711 for training a machine learning model 713. Meanwhile, image data 715 may be captured by a camera 717 and hand poses 719 may be extracted from the image data 715. In some cases, the hand poses 719 may form part of the training dataset 711 for the machine learning model 713. The machine learning model may learn to output a hand pose 721 based on the strain data.
[0082] The sensors 701 may be attached to the glove and generate data based on a bend or angle of the sensors 701. The sensors 701 may be the strain sensors 101 of Figure 1. The sensors 701 may generate strain data as the glove is held in one or more predetermined or unknown gestures or during a movement of the glove. The sensors 701 may be arranged on the glove according to the acts of Figure 3 or have other arrangements (e.g., the arrangement of Figure 1). Though ten strain sensors 701 are shown, fewer sensors 701 in different arrangements on the glove 703 may be used.
[0083] The glove 703 may be a flexible glove worn by a user. For example, the glove 703 may be the glove 107 of Figure 1. The glove may support the sensors 701. For example, the sensors 701 may be adhered to the glove 703 or secured with a fastener.
[0084] The connection 705 may be a wired or wireless connection to the computer 707. In some cases, the connection 705 is formed by the cable 109.
In some other cases, the connection 705 is formed by a Bluetooth connection or other wireless connection. The connection 705 may transmit the data from the sensors 701 to the computer 707 for processing 709.
[0085] The computer 707 may be a general purpose or specialized computer. For example, the computer may be the computing system 1001 of Figure 10. The computer 707 may communicate with the sensors 701 via the connection 705. For example, the computer 707 may receive the strain sensor data from the sensors 701. In some cases, the computer 707 or a processor of the computer 707 may be configured to perform the acts of Figures 3, 5, and 8.
[0086] The processing 709 may include dividing the sensor data into blocks. For example, a user may perform one or more predetermined gestures, holding the glove 703 in a position for a period of time during each gesture while sensor data is recorded from the sensors 701. During the processing 709, the sensor data may be divided so that each block includes the sensor data corresponding to one or more predetermined gestures. One or more hand poses 719 (e.g., including one or more predetermined gestures) may be associated with each block. Additionally or alternatively, the sensor data may be recorded during a movement of the glove 703 and divided into blocks. Hand poses 719 may be associated with each block. In some cases, the hand poses 719 include joint angles of the hand or glove. In this way, the joint angles may be associated with each block of sensor data. The joint angles may be extracted from image data. The output of the processing 709 may form part of the training dataset 711.
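A minimal sketch of assembling the training dataset 711 from the processed blocks; reducing each block to its per-sensor mean is an illustrative choice, and the helper names and array layouts are assumptions:

```python
import numpy as np


def build_training_dataset(strain_blocks, joint_angle_labels):
    """Pair each block of strain samples with its associated joint angles.

    strain_blocks:      list of (samples x n_sensors) arrays, one per block.
    joint_angle_labels: list of 1-D joint-angle vectors, one per block.
    Returns X (blocks x n_sensors) and Y (blocks x n_joint_angles).
    """
    X = np.vstack([np.asarray(block).mean(axis=0) for block in strain_blocks])
    Y = np.vstack([np.asarray(angles) for angles in joint_angle_labels])
    return X, Y
```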
[0087] The training dataset 711 may include data from the strain sensors 701 and associated hand poses 719. The training dataset 711 may be used to train the machine learning model 713 to output a hand pose 721 based on input strain sensor data. [0088] The machine learning model 713 may include one or more of regression, quadratic regression, and feed forward neural networks. The machine learning model 713 may be trained to map from strain sensor data from a smaller number of strain sensors 701 to a larger number of joint angles in a hand pose 721.
[0089] The image data 715 may include visual information about the glove 703. The image data 715 may be recorded or acquired by the camera 717 as the glove 703 is held in one or more predetermined gestures or during a movement of the glove 703. In some cases, the image data 715 may be a 3D representation of the glove 703.
[0090] The camera 717 may be an RGB, stereo, depth, or other type of camera. For example, the camera 717 may be a Leap Motion depth camera. The camera 717 may capture visual information about the glove 703 as the glove is held in one or more predetermined gestures or during a movement of the glove 703.
[0091] The hand poses 719 may be extracted from image data 715 captured by the camera 717. For example, the computer 707 or another processor may extract one or more predetermined gestures or joint angles from the image data 715. The predetermined gestures may be recognized from the image data by matching the image data to the predetermined gestures. Vector geometry may be used to extract the joint angles from a 3D image 715 generated by a depth camera 717.
[0092] Once trained, the ML model 713 may output a pose from input strain sensor data. The output hand pose 721 may include one or more gestures or joint angles determined by the machine learning model 713 based on input strain sensor data. The hand pose 721 may be determined according to the acts of Figure 8. The number of joint angles in the hand pose 721 may be greater than the number of strain sensors 701 that generated the input strain sensor data. The hand pose may be the representation 603 of Figure 6b.
[0093] Figure 8 illustrates one embodiment of a method for identifying a hand pose. More, fewer, or different acts may be performed. For example, acts 809, 811, and 813 may be omitted. The acts may be performed in any order. For example, act 811 may proceed from act 813.
[0094] Strain sensors located at a limited number of optimum joint positions (e.g., as determined by the acts of Figure 3) or other arrangement of strain sensors record data of a glove or hand in motion. By training a machine learning model (e.g., using the acts of Figure 5) using image data from a vision-based depth camera, the strain sensor data may be mapped to fourteen joint angles, for example, of the hand or glove. The result is that the total hand skeleton (or a representation of the hand or glove, for example, as shown in Figure 6b) may be reconstructed based on data from several or any number of strain sensors.
[0095] In act 801, strain data is received. The strain data may be generated by strain sensors on a glove. In some cases, the strain data may be referred to as measured strain data. The location and number of the strain sensors on the glove may be determined by a machine-learned classifier. For example, the machine-learned classifier trained in Figure 3 may determine an arrangement of strain sensors on the glove based on a highest accuracy in predicting hand poses. The strain data may be collected or generated during a movement of the hand or glove. For example, the strain sensors may measure the strain on the glove as a hand wearing the glove performs a hand pose. In some cases, the strain sensors may be EGaIn sensors.
[0096] In act 803, the strain data is applied to a machine-learned model. The machine-learned model may be learned on a set of training strain data and associated training hand poses. In some cases, the training hand poses may be extracted from image data recorded simultaneously or
contemporaneously with the recording of the training set of strain data. The training strain data may be recorded while the glove is positioned in one or more predetermined gestures or during a movement of the glove. For example, the set of training strain data may be recorded while a user wearing the glove signs one or more ASL letters. Additionally or alternatively, the set of training strain data may be recorded as a user wearing the glove moves their hand, such as by bending fingers, moving a wrist, or other motions. The movement of the glove while the training strain data is recorded may be referred to as the training hand poses.
[0097] In some cases, the machine-learned model may have been trained according to the acts of Figure 5. Additionally or alternatively, the machine-learned model of act 803 may include the machine learning classifier of Figure 3 that is trained to determine an optimal arrangement of a plurality of strain sensors. The machine learning classifier may be retrained or updated based on the strain data of act 801.
[0098] In act 805, a hand pose is determined. The hand pose may be determined based on applying the measured strain sensor data to the machine-learned model. The hand pose to be determined based on the measured strain sensor data may be referred to as a measured hand pose. In some cases, the hand pose includes one or more predetermined gestures of the glove. In other cases, the hand pose includes one or more joint angles of the hand. A wireframe representation may represent the one or more joint angles, for example, as shown in Figure 6b. The number of strain sensors on the glove that generated the measured strain data may be less than the number of joint angles in the hand pose. In this way, the machine-learned model is able to map the lower-dimensional strain sensor data to the higher-dimensional joint angle and hand pose representation. For example, data from three strain sensors may be used to predict fourteen joint angles.
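At deployment, determining the measured hand pose reduces to one prediction call on the trained model; a brief usage sketch follows, where the regressor from the training sketch above and the sensor and angle counts are assumptions:

```python
import numpy as np


def measure_hand_pose(model, strain_sample):
    """Apply one measured strain sample to the trained model to get joint angles.

    model is a trained regressor such as the one sketched for act 507;
    strain_sample is a 1-D array with one value per strain sensor on the glove.
    """
    strain_sample = np.asarray(strain_sample, dtype=float).reshape(1, -1)
    return model.predict(strain_sample)[0]  # e.g., a 14-element joint-angle vector
```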
[0099] The joint angles may vary in time. For example, where the strain data is generated during a movement of the glove or hand, the hand pose, including the joint angles, may represent the movement of the glove or hand over time.
[00100] In act 807, the hand pose is output. The display 1011 of Figure 10 may output the hand pose. The hand pose may be output for modelling.
For example, a model of the hand of the user may be constructed based on the hand pose. The model may be the representation 603 of Figure 6b. The hand pose may be output in real time, in batches, or in another way. For example, the hand pose may be output as the hand pose is determined in response to receiving strain data in act 801. In some cases, the hand pose may be output along with information from an additional sensor on the glove. The additional information may indicate a spatial position, orientation, and/or acceleration of the hand. Together with the hand pose, the additional information may provide a complete spatial description of the hand position, orientation, and pose.
[00101] In act 809, a control command is generated. The control command may be based on the hand pose. For example, the hand pose may be used to guide or instruct a machine to perform an action. A robot, drone or other machine may follow the command. The command may be chosen from a set of predetermined control commands. For example, the hand pose may correspond to one or more of the predetermined control commands.
[00102] In act 811, a machine is controlled based on the control command. The machine may perform an action in response to the control command. For example, a drone may raise, lower, hover, land, rotate, move forward, move backwards, or perform another action in response to the control command. In another example, a robot may position a tool based on the control command.
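By way of a hedged illustration of acts 809 and 811, a recognized gesture might be looked up in a table of predetermined control commands and forwarded to the machine; the gesture labels, commands, and send_command interface below are hypothetical:

```python
# Hypothetical mapping from recognized gestures to predetermined commands.
GESTURE_COMMANDS = {
    "closed_fist": "STOP",
    "open_palm": "RUN",
}


def control_machine(gesture_label, send_command):
    """Translate a recognized hand pose into a control command and send it."""
    command = GESTURE_COMMANDS.get(gesture_label)
    if command is not None:
        send_command(command)  # e.g., transmit to a robot or drone controller
    return command
```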
[00103] In act 813, a representation of the hand pose is generated. The representation may be an audial representation, a visual representation, or an audiovisual representation of the hand pose. For example, the representation may be a written description of the hand gestures based on an alphabet. The alphabet may be the ASL alphabet. The written description may include one or more letters or words indicated by the hand pose. In another example, the representation may include one or more predetermined pose interpretations. The predetermined pose interpretations may include a written, graphical, or visual label of the hand pose. The pose interpretation may be descriptive. For example, the pose interpretation may label a hand pose as open palm, waving, closed fist, or another interpretation. In a further example, the representation may include a spoken representation of the hand pose. The speech may be human or computer synthesized speech. The speech may be prerecorded. The speech may be based on an alphabet such as the ASL alphabet. For example, the speech may vocalize one or more letters or words indicated by the hand pose. In another example, the speech may include a vocalization of one or more predetermined pose interpretations. In some cases, the representation may include audial and visual information. For example, the representation may include a written or graphical representation of a letter, word, or pose interpretation of the hand pose with a speech vocalization of the letter, word, or pose interpretation. The representation may be output by the display 1011 of Figure 10.
[00104] Figure 9 is a schematic representation of using one or more gestures 901, 903 to control a machine 911. The gestures 901, 903 may be recorded by strain gauges 905 on a glove 907 worn by a user or operator. Data from the strain gauges may be input to a machine-learned model 909. The machine-learned model 909 may determine a hand pose based on the sensor data from the strain sensors 905 and output a command to the machine 911.
[00105] The first gesture 901 may be made by a user according to one or more predetermined gestures to control the machine 911. For example, the first gesture 901 may be a clenched fist which corresponds to a command to stop the machine 911. The gesture 901 may be measured by the strain sensors 905.
[00106] The second gesture 903 may be made by a user according to one or more predetermined gestures to control the machine 911. For example, the second gesture 903 may be an open hand which corresponds to a command to make the machine 911 run. The gesture 903 may be measured by the strain sensors 905.
[00107] The strain sensors 905 may generate data based on a movement or gesture of the glove 907. The sensors 905 may be the sensors 101 of Figure 1.
[00108] The glove 907 may be worn by the user or operator and support one or more of the sensors 905. The glove 907 may be the glove 107 of Figure 1.
[00109] The machine-learned model 909 may accept as input the strain sensor data from the sensors 905 and output a hand pose or command. In some cases, the machine-learned model 909 may be trained according to the acts of Figure 5. The machine-learned model 909 may generate a hand pose based on the strain data. The hand pose may correspond to one or more control commands for the machine 911. The machine-learned model or a processor (e.g., the processor 1003 of Figure 10) may determine the one or more control commands based on the hand pose. The control command may be output to the machine 911 to control the machine 911.
[00110] The machine 911 may be controlled based on a control command determined from the hand pose and sensor data. For example, a drone may raise, lower, hover, land, or perform another action in response to the control command. Though a flying drone is shown, any type of machine 911 may be controlled based on the control command. In another example, a robot may position a tool based on the control command.
[00111] Figure 10 is a block diagram of one embodiment of a computing system 1001 for determining an optimal arrangement of a plurality of strain sensors, training a machine learning model to determine a hand pose, and identifying a hand pose. The computing system 1001 may include a processor 1003 coupled with a memory 1005 and in communication with strain sensors 1007, a camera 1009, and a display 1011. The computing system 1001 performs the acts of Figures 3, 5, 8, or other acts.
[00112] The processor 1003 may be a general purpose or application specific processor. The processor may be configured to apply sensor data to machine-learned classifiers and models. Based on the applying, the processor may be configured to determine an optimal arrangement of sensors or to determine a hand pose including one or more predetermined gestures or joint angles.
[00113] The memory 1005 may be a non-transitory computer readable storage medium. The memory 1005 may be configured to store instructions that cause the processor to perform an operation. For example, the memory 1005 may store instructions that, when executed by the processor 1003, cause the processor 1003 to perform one or more acts of Figure 3, Figure 5, or Figure 8. The memory 1005 may be configured to store sensor data, image data, associated hand poses, joint angles, and machine-learned classifiers and models. The instructions for implementing the processes, methods, and/or techniques discussed herein are provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, or other computer readable storage media. Non-transitory computer readable storage media include various types of volatile and nonvolatile storage media. The memory 1005 may also be configured to store the training dataset for machine learning classifiers and models.
[00114] The strain sensors 1007 may generate data based on a bend or flex in the sensor. The strain sensors 1007 may be the strain sensors 101 of Figure 1. The strain sensors 1007 may be in communication with the processor 1003 via the cable 109 of Figure 1 or the connection 705 of Figure 7.
[00115] The camera 1009 may generate image data. For example, the camera 1009 may generate image data of a glove supporting the strain sensors 1007 while the glove is held in one or more predetermined gestures or during a movement of the glove. The camera 1009 may be a depth camera, such as a Leap Motion camera, an RGB camera, a stereo camera, or another type of camera. Image data from the camera 1009 may be stored in the memory 1005. The camera 1009 may be the camera 717 of Figure 7.
[00116] The display 1011 may be configured to accept user input and to display audiovisual information to the user. In some cases, the display 1011 may include a screen configured to present the audiovisual information. For example, the display 1011 may present the determined hand pose,
predetermined gesture, or joint angles using the screen. In another example, the hand representations 601 , 603 of Figure 6a and Figure 6b may be displayed using the screen. The display 1011 may include a user input device. In some cases, the user may input information relating to the combination of strain sensors 1007 to be used in determining the hand pose.
[00117] While the invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention.
It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.

Claims

I (WE) CLAIM:
1. A method for identifying a measured hand pose, the method comprising:
receiving, by a processor, measured strain data from a plurality of strain sensors arranged on a glove;
applying, by the processor, the measured strain data to a machine-learned model learned on training strain data and one or more associated training hand poses;
determining, by the processor, the measured hand pose based on the application of the measured strain data to the machine-learned model; and
outputting, by the processor, the measured hand pose.
2. The method of claim 1, wherein the plurality of strain sensors are arranged on the glove at locations determined by a most accurate machine-learned classifier of a plurality of machine-learned classifiers trained on different combinations of strain sensor placements.
3. The method of claim 1, wherein the measured hand pose includes one or more joint angles, and
wherein a number of the one or more joint angles is larger than a number of the plurality of strain sensors.
4. The method of claim 3, wherein the one or more associated training hand poses were extracted from image data recorded contemporaneously with the training strain data.
5. The method of claim 4, wherein the training strain data was recorded while the glove was positioned in one or more predetermined gestures.
6. The method of claim 3, wherein the measured strain data is collected during a movement of the glove, and wherein the measured hand pose includes the one or more joint angles throughout the movement of the glove.
7. The method of claim 1, further comprising:
generating, by the processor, a control command based on the measured hand pose; and
controlling, by the processor, a machine based on the control command.
8. The method of claim 1, wherein the plurality of strain sensors are flexible Eutectic Gallium Indium-based strain sensors.
9. The method of claim 1, further comprising:
generating, by the processor, an audial representation, a visual representation, or an audiovisual representation of the measured hand pose.
10. A method of training a machine learning model to determine a hand pose, the method comprising:
storing, in a memory, strain sensor data and a plurality of hand poses associated with the strain sensor data; and
training with machine learning, by a processor, the machine learning model to determine the hand pose based on the strain sensor data and the plurality of hand poses associated with the strain sensor data.
11. The method of claim 10, wherein the hand pose includes a plurality of joint angles, and
wherein a number of the plurality of joint angles is larger than a number of the plurality of strain sensors that generated the strain sensor data.
12. The method of claim 11, wherein the training hand poses associated with the training strain data are predetermined based on second joint angles extracted from data recorded by a depth camera.
13. The method of claim 10, further comprising:
recording, by a depth camera, image data contemporaneous with the strain sensor data; and
extracting, by the processor, the plurality of joint angles from the image data.
14. A method for determining an optimal arrangement of a plurality of strain sensors, the method comprising:
storing, in a memory, first strain data and second strain data from the plurality of strain sensors arranged on a first glove, a first plurality of hand poses associated with the first strain data, and a second plurality of hand poses associated with the second strain data, wherein the first strain data from the plurality of strain sensors is grouped in different combinations;
training with machine learning, by a processor, a plurality of machine learning classifiers to determine hand poses, the training based on the different combinations of the first strain data and the first plurality of hand poses associated with the first strain data, wherein at least one machine learning classifier of the plurality of machine learning classifiers is trained based on each different combination of the first strain data;
applying, by the processor, the second strain data to the plurality of machine-learned classifiers to determine a third plurality of hand poses;
comparing, by the processor, the second plurality of hand poses to the third plurality of hand poses determined by the plurality of machine-learned classifiers;
determining, by the processor, an accuracy of each of the plurality of machine-learned classifiers based on the comparing;
selecting, by the processor, a machine-learned classifier of the plurality of machine-learned classifiers having a highest accuracy; and
building a second glove with strain sensors arranged on the second glove at locations according to the combination of the first strain data used to train the machine-learned classifier having the highest accuracy.
15. The method of claim 14, wherein each combination of the first strain data includes different numbers and arrangements of strain data from the plurality of strain sensors.
16. The method of claim 15, wherein each combination of the first strain data includes strain data from at least three strain sensors.
17. The method of claim 14, wherein the machine learning classifiers are support vector machine models.
18. The method of claim 14, further comprising:
recording the first strain data;
adjusting a position of the glove; and
recording the second strain data with the adjusted position of the glove.
19. The method of claim 18, wherein recording the first strain data is performed while the glove is positioned in one or more predetermined gestures.
20. The method of claim 14, wherein two or more machine learning classifiers of the plurality of machine learning classifiers are trained with strain data and associated hand poses from each combination of the first strain data.
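By way of illustration only, the placement-selection procedure recited in claims 14-20 could be approximated with off-the-shelf tooling as sketched below; the use of scikit-learn's SVC, the held-out test split, and accuracy as the comparison metric are assumptions rather than part of the claimed method.

```python
# Illustrative sketch of claims 14-20: train one support vector machine classifier
# per combination of sensor channels (claim 17), score each on held-out strain data,
# and keep the combination with the highest accuracy for building the second glove.
from itertools import combinations
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def select_sensor_combination(X_train, y_train, X_test, y_test, min_sensors=3):
    """X_*: (samples, n_sensors) strain arrays; y_*: hand-pose labels."""
    n_sensors = X_train.shape[1]
    best_combo, best_acc = None, -1.0
    for k in range(min_sensors, n_sensors + 1):
        for combo in combinations(range(n_sensors), k):
            cols = list(combo)
            clf = SVC().fit(X_train[:, cols], y_train)          # one classifier per combination
            acc = accuracy_score(y_test, clf.predict(X_test[:, cols]))
            if acc > best_acc:
                best_combo, best_acc = combo, acc
    return best_combo, best_acc  # place sensors at best_combo on the second glove
```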
PCT/US2019/021293 2018-03-09 2019-03-08 Optimal hand pose tracking using a flexible electronics-based sensing glove and machine learning WO2019173678A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201862640875P 2018-03-09 2018-03-09
US201862640906P 2018-03-09 2018-03-09
US62/640,906 2018-03-09
US62/640,875 2018-03-09
US201862641697P 2018-03-12 2018-03-12
US201862641609P 2018-03-12 2018-03-12
US62/641,609 2018-03-12
US62/641,697 2018-03-12

Publications (1)

Publication Number Publication Date
WO2019173678A1 true WO2019173678A1 (en) 2019-09-12

Family

ID=65955268

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/021293 WO2019173678A1 (en) 2018-03-09 2019-03-08 Optimal hand pose tracking using a flexible electronics-based sensing glove and machine learning

Country Status (1)

Country Link
WO (1) WO2019173678A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6035274A (en) * 1988-10-14 2000-03-07 Board Of Trustees Of The Leland Stanford Junior University Strain-sensing goniometers, systems and recognition algorithms
US20100023314A1 (en) * 2006-08-13 2010-01-28 Jose Hernandez-Rebollar ASL Glove with 3-Axis Accelerometers
US20150309582A1 (en) * 2014-02-28 2015-10-29 Vikas Gupta Gesture operated wrist mounted camera system
US20170249561A1 (en) * 2016-02-29 2017-08-31 GM Global Technology Operations LLC Robot learning via human-demonstration of tasks with force and position objectives
US20170262056A1 (en) * 2016-03-11 2017-09-14 Sony Interactive Entertainment Inc. Selection of optimally positioned sensors in a glove interface object

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIMON VAMPLEW: "Recognition of sign language gestures using neural networks", NEUROPSYCHOLOGICAL TRENDS, 8 July 1996 (1996-07-08), pages 31 - 41, XP055586981, Retrieved from the Internet <URL:http://www.icdvrat.org/1996/papers/1996_04.pdf> [retrieved on 20190508] *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3815856A1 (en) * 2019-11-04 2021-05-05 Skan Ag Arrangement for monitoring state and movement in an aseptic working chamber of a container
WO2021087628A1 (en) * 2019-11-04 2021-05-14 Skan Ag Arrangement for monitoring state and sequence of movement in an aseptic work chamber of a containment
CN114630736A (en) * 2019-11-04 2022-06-14 斯康股份公司 Device for monitoring the state and sequence of movements in a sterile working chamber of a protective housing
CN114630736B (en) * 2019-11-04 2024-06-25 斯康股份公司 Device for monitoring the state and movement sequence in a sterile working chamber of a protective housing
WO2022120614A1 (en) * 2020-12-09 2022-06-16 Robert Bosch Gmbh Apparatus and method for applying a sensor patch onto a hand-shaped carrier structure
CN112971772A (en) * 2021-03-12 2021-06-18 哈尔滨工业大学 Hand multi-movement mode recognition system based on palm multi-mode information
WO2023191338A1 (en) * 2022-03-31 2023-10-05 한국전자기술연구원 Multi-channel soft-sensor-based interface device, and operating method
WO2024248958A1 (en) * 2023-05-30 2024-12-05 Qualcomm Incorporated Power-efficient, performance-efficient, and context-adaptive pose tracking
WO2025077256A1 (en) * 2023-10-12 2025-04-17 深圳市韶音科技有限公司 Glove system
US20250123685A1 (en) * 2023-10-12 2025-04-17 Shenzhen Shokz Co., Ltd. Methods and systems for determining correspondence relationships between sensor data and finger joint angles
WO2025077259A1 (en) * 2023-10-12 2025-04-17 深圳市韶音科技有限公司 Method and system for determining correspondence between sensor data and finger joint angles
WO2025077258A1 (en) * 2023-10-12 2025-04-17 深圳市韶音科技有限公司 Method and system for determining angle of finger joint

Similar Documents

Publication Publication Date Title
WO2019173678A1 (en) Optimal hand pose tracking using a flexible electronics-based sensing glove and machine learning
Liu et al. High-fidelity grasping in virtual reality using a glove-based system
Xue et al. Multimodal human hand motion sensing and analysis—A review
US11474593B2 (en) Tracking user movements to control a skeleton model in a computer system
Luzhnica et al. A sliding window approach to natural hand gesture recognition using a custom data glove
US10976863B1 (en) Calibration of inertial measurement units in alignment with a skeleton model to control a computer system based on determination of orientation of an inertial measurement unit from an image of a portion of a user
US11079860B2 (en) Kinematic chain motion predictions using results from multiple approaches combined via an artificial neural network
US11175729B2 (en) Orientation determination based on both images and inertial measurement units
CN110914022A (en) System and method for direct teaching of robots
CN113341564A (en) a computer input device
CN115576426A (en) Hand interaction method for mixed reality flight simulator
Lu et al. Surface following using deep reinforcement learning and a GelSight tactile sensor
Yu et al. Precise robotic needle-threading with tactile perception and reinforcement learning
Lee et al. DeepTouch: Enabling touch interaction in underwater environments by learning touch-induced inertial motions
RU2670649C9 (en) Method of manufacturing virtual reality gloves (options)
Amatya et al. Real time kinect based robotic arm manipulation with five degree of freedom
US20230286159A1 (en) Remote control system
US20250020523A1 (en) Optical Tactile Sensor
Zhang et al. Hand Pose Estimation with Mems-Ultrasonic Sensors
Bierbaum et al. Haptic exploration for 3d shape reconstruction using five-finger hands
Khoa et al. Development of Data Gloves for Humanoid Robot Hand Simulation and Hand Posture Recognition
Bahrami et al. ASTRA Glove: A Wearable Tracking Device for “Accurate Sensing and Tracking of Realtime Articulations”
CN120134325B (en) Double-stage object grabbing method and system based on visual touch feature fusion
Wu et al. Vision-based, low-cost, soft robotic tongs for shareable and reproducible tactile learning
Falco et al. Improvement of human hand motion observation by exploiting contact force measurements

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19714266

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19714266

Country of ref document: EP

Kind code of ref document: A1