WO2019080203A1 - Gesture recognition method and system for a robot, and robot - Google Patents

Gesture recognition method and system for a robot, and robot

Info

Publication number
WO2019080203A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
sample set
module
neural network
picture
Prior art date
Application number
PCT/CN2017/111185
Other languages
English (en)
French (fr)
Inventor
谢阳阳
Original Assignee
南京阿凡达机器人科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京阿凡达机器人科技有限公司
Publication of WO2019080203A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the invention relates to the field of artificial intelligence and image processing, and in particular to a gesture recognition method and system for a robot, and to a robot.
  • gesture recognition is an important way of human-computer interaction, and its research and development affects the naturalness and flexibility of human-computer interaction.
  • gesture recognition by conventional image processing techniques and machine learning methods generally includes steps such as gesture segmentation, gesture analysis, and gesture recognition. Such methods are generally only suitable for recognition against a simple, uniform background.
  • however, gestures usually appear in complex environments, with complex backgrounds, excessive or insufficient light, and varying distances to the gesture. Under such circumstances, machine learning methods are prone to misjudgment, and screening must then be completed manually, which defeats the purpose of intelligent detection.
  • the present invention provides a more intelligent gesture recognition method and system that enables a robot to better recognize gestures and perform corresponding work.
  • the invention provides a gesture recognition method and system for a robot, and a robot, in which the recognition results of the Adaboost cascade gesture detector are filtered by a gesture recognition convolutional neural network, so that gestures can be accurately recognized against a complex background. The technical solution is as follows:
  • a gesture recognition method for a robot includes: pre-collecting pictures that contain different gestures and pictures that contain no gestures, to obtain a sample picture set; generating a detection sample set and a filtered sample set from the sample picture set; training an Adaboost cascaded gesture detector from the detection sample set, and a gesture recognition convolutional neural network from the filtered sample set; and recognizing a gesture in the picture to be detected with the Adaboost cascading gesture detector to obtain a gesture recognition result, then filtering that result with the gesture recognition convolutional neural network to obtain the correct gesture recognition result.
  • the recognition results of an Adaboost cascade gesture detector trained on gesture samples are not very accurate in a complex environment and may include misjudgments; filtering these results with the gesture recognition convolutional neural network automatically screens out the correct recognition results, making the robot more intelligent at work.
  • generating the detection sample set and the filtered sample set from the sample picture set means: selecting pictures corresponding to the gesture to be trained from the sample picture set as a gesture sample set; screening out of the gesture sample set the pictures that meet the preset sample requirements, to obtain a filtered gesture sample set; marking the gesture position in each picture of the filtered gesture sample set and cropping the marked gesture region to a preset specification, to form the detection positive sample set; taking the pictures in the sample picture set that contain no gesture, the pictures containing flesh-color samples, and the pictures containing other gestures as the detection negative sample set; combining the detection positive sample set and the detection negative sample set into the detection sample set; using the detection positive sample set as the filtered positive sample set; and cropping the pictures that contain no gesture and the pictures containing flesh-color samples to the preset specification, to obtain the filtered negative sample set.
  • in this way, the selected samples better match the training requirements and are of higher quality, so the trained Adaboost cascade gesture detector and gesture recognition convolutional neural network recognize gestures more precisely.
  • training the Adaboost cascading gesture detector from the detection sample set means: calculating, for each detection sample, its corresponding rectangular feature set; training multiple weak classifiers from the rectangular feature sets of all detection samples; and selecting, according to the Adaboost algorithm, the weak classifiers with low false-positive rates to form multiple strong classifiers, combined as follows:
  • f(x) = Σ_{m=1}^{M} α_m G_m(x), where M is the number of iterations (i.e. the number of weak classifiers obtained), α_m is the weight of each weak classifier, G_m(x) is a weak classifier, and f(x) is a strong classifier.
  • the plurality of strong classifiers are combined into an Adaboost cascade gesture detector.
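The weighted combination above can be sketched in a few lines. The decision stumps and error rates below are hypothetical stand-ins for trained weak classifiers, not values from the patent; the weights α_m follow the standard Adaboost rule α_m = ½ ln((1 − e_m)/e_m).

```python
import math

# Hypothetical decision stumps standing in for trained weak classifiers
# G_m(x): each votes +1 ("gesture") or -1 ("not gesture") from a scalar
# feature value.
def g1(x): return 1 if x > 0.2 else -1
def g2(x): return 1 if x > 0.5 else -1
def g3(x): return 1 if x < 0.9 else -1

def strong_classifier(weak_classifiers, alphas, x):
    """f(x) = sum_m alpha_m * G_m(x); the final label is sign(f(x))."""
    score = sum(a * g(x) for g, a in zip(weak_classifiers, alphas))
    return 1 if score > 0 else -1

# Weights derived from assumed weak-classifier error rates e_m.
alphas = [0.5 * math.log((1 - e) / e) for e in (0.3, 0.2, 0.4)]
weak = [g1, g2, g3]

print(strong_classifier(weak, alphas, 0.6))   # all three stumps agree → 1
print(strong_classifier(weak, alphas, 0.1))   # g1 and g2 outvote g3 → -1
```

Lower-error weak classifiers receive larger weights, so a single reliable stump can outvote several weaker ones.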
  • this scheme obtains multiple different weak classifiers through gesture sample training and combines them into different strong classifiers. The resulting Adaboost cascade gesture detector is used for the preliminary recognition of gestures and has high recognition accuracy and strong recognition ability.
  • training the gesture recognition convolutional neural network specifically means: preprocessing the filtered sample set by sample enhancement and normalization; dividing the filtered sample set into a training sample set, a verification sample set, and a test sample set; initializing a lightweight neural network S-LeNet, which is optimized from LeNet by replacing a fully connected layer with a convolutional layer and a downsampling layer and by reducing the number of convolution kernels; and training the gesture recognition convolutional neural network on the S-LeNet using the training, verification, and test sample sets.
  • dividing the filtered sample set into a training sample set, a verification sample set, and a test sample set makes it possible to train a gesture recognition convolutional neural network with high recognition accuracy.
  • a recognition result is obtained, and the gesture recognition result is filtered by the gesture recognition convolutional neural network to obtain the correct gesture recognition result.
  • the cascaded Adaboost classifier detects each frame to be detected and obtains multiple gesture classification pictures; the gesture classification pictures are adjusted to the preset specification to obtain adjusted gesture classification pictures; the adjusted pictures are input into the gesture recognition convolutional neural network and filtered in a multi-threaded manner; if an adjusted gesture classification picture contains a gesture, it is saved and displayed, otherwise it is filtered out.
  • when a gesture is recognized, it is first identified by the Adaboost cascade gesture detector, yielding multiple gesture recognition results; since these results are not accurate enough, they are all input to the gesture recognition convolutional neural network for filtering. The filtering is performed in a multi-threaded manner, which maximizes filtering efficiency and greatly reduces processing time; after filtering, a higher-precision gesture classification result is obtained.
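The multi-threaded filtering step can be sketched as below. The `cnn_confirms_gesture` predicate is a hypothetical stand-in for the trained CNN (here it just inspects a label field); a real pipeline would run inference on each cropped picture.

```python
from concurrent.futures import ThreadPoolExecutor

def cnn_confirms_gesture(crop):
    # Stand-in for the gesture-recognition CNN: "label" is a
    # hypothetical field used only for this illustration.
    return crop["label"] == "hand"

def filter_candidates(crops, predicate, workers=4):
    """Run the CNN filter over all detector candidates in parallel
    threads, keeping only crops the network confirms as gestures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = list(pool.map(predicate, crops))  # order preserved
    return [c for c, ok in zip(crops, verdicts) if ok]

candidates = [{"id": 1, "label": "hand"},
              {"id": 2, "label": "face"},   # detector false positive
              {"id": 3, "label": "hand"}]
kept = filter_candidates(candidates, cnn_confirms_gesture)
print([c["id"] for c in kept])   # → [1, 3]
```

`ThreadPoolExecutor.map` returns results in submission order, so the surviving crops keep their original ordering.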
  • a gesture recognition system for a robot includes: a picture acquisition module, configured to pre-collect pictures that contain different gestures and pictures that contain no gestures, to obtain a sample picture set; a detector training module, configured to train an Adaboost cascading gesture detector from the manually generated detection sample set; a neural network training module, configured to obtain a gesture recognition convolutional neural network from the manually generated filtered sample set; and a gesture recognition module, electrically connected to the detector training module and the neural network training module, configured to recognize a gesture in the picture to be detected with the Adaboost cascading gesture detector to obtain a gesture recognition result, and to filter that result with the gesture recognition convolutional neural network to obtain the correct gesture recognition result.
  • the detector training module includes: a calculation sub-module, configured to calculate, from the detection sample set, the rectangular feature set corresponding to each detection sample; a weak classifier training sub-module, connected to the calculation sub-module and used to train multiple weak classifiers from the rectangular feature sets of all detection samples; and a strong classifier training sub-module, electrically connected to the weak classifier training sub-module and used to select, according to the Adaboost algorithm, a number of weak classifiers with low false-positive rates and combine them into multiple strong classifiers, as follows:
  • f(x) = Σ_{m=1}^{M} α_m G_m(x), where M is the number of iterations (i.e. the number of weak classifiers obtained), α_m is the weight of each weak classifier, G_m(x) is a weak classifier, and f(x) is a strong classifier.
  • a detector training sub-module is electrically connected to the strong classifier training sub-module, and is used to combine the multiple strong classifiers into an Adaboost cascade gesture detector.
  • the finally trained Adaboost cascade gesture detector, built from the strong classifiers, is used for the preliminary recognition of gestures and has high recognition accuracy and strong recognition ability.
  • the neural network training module includes: a processing sub-module, which preprocesses the filtered sample set by sample enhancement and normalization, and is further used to initialize a lightweight neural network S-LeNet, which is optimized from LeNet by replacing a fully connected layer with a convolutional layer and a downsampling layer and by reducing the number of convolution kernels; a sample classification sub-module, configured to divide the filtered sample set in a preset ratio into a training sample set, a verification sample set, and a test sample set; and a neural network training sub-module, electrically coupled to the processing sub-module and the sample classification sub-module, which trains the gesture recognition convolutional neural network on the S-LeNet using the training, verification, and test sample sets.
  • dividing the filtered sample set into a training sample set, a verification sample set, and a test sample set makes it possible to train a gesture recognition convolutional neural network with high recognition accuracy.
  • the gesture recognition module includes: a detection sub-module, which detects each frame to be detected using the cascaded Adaboost classifier to obtain multiple gesture classification pictures; a picture adjustment sub-module, electrically connected to the detection sub-module and configured to resize the gesture classification pictures to the preset specification, obtaining adjusted gesture classification pictures; and a filtering sub-module, a storage sub-module, and a display sub-module electrically connected in sequence. The filtering sub-module is electrically connected to the picture adjustment sub-module and configured to input the adjusted gesture classification pictures into the gesture recognition convolutional neural network for filtering; if an adjusted gesture classification picture contains a gesture, the storage sub-module saves it and the display sub-module displays it; otherwise, the filtering sub-module filters it out.
  • when a gesture is recognized, it is first identified by the Adaboost cascade gesture detector, yielding multiple gesture recognition results; since these results are not accurate enough, they are input into the gesture recognition convolutional neural network for filtering. The network filters in a multi-threaded manner, which maximizes filtering efficiency and greatly reduces processing time; after filtering, a higher-precision gesture classification result is obtained.
  • the invention trains a gesture recognition convolutional neural network that further filters the results recognized by the Adaboost cascade gesture detector, eliminating wrong recognition results and improving the accuracy of gesture recognition.
  • the present invention provides an improved S-LeNet neural network structure.
  • the network size is reduced as much as possible while ensuring the accuracy.
  • a convolutional layer and a downsampling layer are used in place of one fully connected layer; the parameters of the fully connected layer account for a large proportion of the overall network parameters, so replacing it with a convolutional layer and a downsampling layer effectively reduces the network parameters and increases the network's feature extraction ability.
  • the present invention also reduces the number of convolution kernels as much as possible, improving the recognition efficiency of the gesture recognition convolutional neural network; the trained network has fast processing speed, higher recognition accuracy, and better recognition effect.
  • FIG. 1 is a flow chart of an embodiment of a gesture recognition method for a robot according to the present invention
  • FIG. 2 is a flow chart of another embodiment of a gesture recognition method for a robot according to the present invention.
  • Figure 3 is a flow chart of sample preparation in the present invention.
  • FIG. 4 is a flow chart of another embodiment of a gesture recognition method for a robot according to the present invention.
  • FIG. 6 is a flowchart of training of a gesture recognition convolutional neural network in the present invention.
  • FIG. 7 is a flow chart of another embodiment of a gesture recognition method for a robot according to the present invention.
  • Figure 8 is a structural diagram of an S-LeNet neural network in the present invention.
  • Figure 9 is a flow chart of gesture recognition in the present invention.
  • FIG. 11 is a schematic structural diagram of a gesture recognition system of a robot according to the present invention.
  • FIG. 12 is another schematic structural diagram of a gesture recognition system of a robot according to the present invention.
  • FIG. 13 is another schematic structural diagram of a gesture recognition system of a robot according to the present invention.
  • 1-picture acquisition module, 2-detector training module, 21-calculation sub-module, 22-weak classifier training sub-module, 23-strong classifier training sub-module, 24-detector training sub-module, 3-neural network training module, 31-processing sub-module, 32-sample classification sub-module, 33-neural network training sub-module, 4-gesture recognition module, 41-detection sub-module, 42-picture adjustment sub-module, 43-filtering sub-module, 44-storage sub-module, 45-display sub-module.
  • the present invention provides an embodiment of a gesture recognition method for a robot, including:
  • the gesture in the picture to be detected is recognized by the Adaboost cascading gesture detector to obtain a gesture recognition result, and the gesture recognition result is filtered by the gesture recognition convolutional neural network to obtain the correct gesture recognition result.
  • if the recognition process only uses the Adaboost cascade gesture detector, the accuracy of the recognition result is not high enough; in a complicated environment, such as a complicated background or changing light, the recognition accuracy drops greatly and the recognized gesture results often contain errors. Therefore, the present invention provides a gesture recognition convolutional neural network with deep learning ability to filter out the wrong recognition results and further improve the recognition accuracy.
  • the present invention provides an embodiment of a gesture recognition method for a robot, including:
  • the detection sample set includes the detection positive sample set and the detection negative sample set.
  • the filtered sample set includes the filtered positive sample set and the filtered negative sample set.
  • the embodiment specifically describes the production process of the sample.
  • the sample production process is shown in FIG. 3. First, picture samples containing different gestures are collected, together with pictures without gestures, then sorted and stored by gesture to obtain a gesture sample set for each gesture to be trained.
  • the pictures are then screened to meet the preset sample requirements.
  • the preset sample requirements include a clear gesture image, a complete gesture in the picture, and the like. After that, the gesture position in each gesture sample is manually marked and cropped, and the crop is transformed to a specified size, such as 40x40 pixels, to serve as a positive sample for training both algorithms.
  • the present invention trains different classifiers for different gestures.
  • the specific sample production sub-steps are as follows: the above-mentioned cropped sample is used as a positive sample set; and a negative sample picture containing no gesture is collected.
  • the negative sample picture should contain the flesh color sample.
  • the flesh color sample refers to the sample containing the color of the human skin.
  • samples of other gestures are also used as negative samples; for example, when training the scissors gesture, samples of the fist and the open palm ("rock" and "cloth") are used in the Adaboost negative sample set.
  • the negative samples required to train the cascade Adaboost do not need to be transformed to a specified size.
  • the Adaboost positive sample set and the Adaboost negative sample set together form the Adaboost sample set.
  • the invention only judges hand or non-hand, so only a simple classification is needed; the sample production sub-steps are as follows: the Adaboost positive sample set is used as the positive sample set of the gesture recognition convolutional neural network; negative sample pictures containing no gestures are collected, and should contain flesh-color samples; from the collected negative samples, a number of negative patches of a specified size, such as 40x40 pixels, are cropped.
  • the gesture sample production in this embodiment is mainly done manually; the screened samples are precise and relatively standardized, so the Adaboost classifier and the gesture recognition convolutional neural network trained from them achieve relatively high recognition accuracy.
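The "transform to a specified size, such as 40x40 pixels" step can be sketched with a nearest-neighbour resize on a plain 2-D list; a real pipeline would use an image library, and the patch values below are illustrative.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D pixel grid to a preset
    specification (e.g. 40x40), as required for the positive samples."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

patch = [[1, 2],
         [3, 4]]
print(resize_nearest(patch, 4, 4))
# → [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Every cropped gesture region, whatever its original size, ends up on the same fixed grid, so the two training algorithms see uniformly shaped positive samples.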
  • the present invention provides an embodiment of a gesture recognition method for a robot, including:
  • f(x) = Σ_{m=1}^{M} α_m G_m(x), where M is the number of iterations (i.e. the number of weak classifiers obtained), α_m is the weight of each weak classifier, G_m(x) is a weak classifier, and f(x) is a strong classifier;
  • the plurality of strong classifiers are combined into an Adaboost cascade gesture detector.
  • the S-LeNet is a neural network optimized from LeNet; the optimization specifically uses a convolutional layer and a downsampling layer in place of a fully connected layer of LeNet, and reduces the number of convolution kernels;
  • the gesture recognition convolutional neural network is trained with the training sample set to obtain a training accuracy; when the training accuracy reaches the first preset expected value, the next step is performed; otherwise, the parameters of the S-LeNet neural network are adjusted and training continues until the training accuracy reaches the first preset expected value;
  • the trained gesture recognition convolutional neural network is verified with the verification sample set to obtain a verification accuracy; when the verification accuracy reaches the second preset expected value, the next step is performed; otherwise, the parameters of the S-LeNet neural network are adjusted and the network is retrained and verified until the verification accuracy reaches the second preset expected value;
  • the trained gesture recognition convolutional neural network is tested with the test sample set to obtain a test accuracy; when the test accuracy reaches the third preset expected value, the training is stopped and the trained gesture recognition convolutional neural network is obtained; otherwise, the parameters of the S-LeNet neural network are adjusted and the network is retrained, verified, and tested until the test accuracy reaches the third preset expected value.
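Each of the three stages above is the same "repeat until the accuracy reaches its preset expected value" loop. A minimal sketch, with a simulated accuracy sequence standing in for real training rounds:

```python
def train_until(run_round, target, max_rounds=100):
    """Repeatedly train/evaluate; continue (after adjusting parameters)
    until the accuracy reaches its preset expected value, as in the
    training, verification, and test stages above."""
    for round_no in range(1, max_rounds + 1):
        accuracy = run_round()
        if accuracy >= target:
            return round_no, accuracy
    raise RuntimeError("accuracy never reached the preset expected value")

# Simulated accuracies standing in for successive training rounds.
accs = iter([0.71, 0.84, 0.92, 0.97])
rounds, acc = train_until(lambda: next(accs), target=0.90)
print(rounds, acc)   # → 3 0.92
```

The same loop is instantiated three times with the first, second, and third preset expected values, each wrapping a different evaluation (train, verify, test).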
  • the gesture in the picture to be detected is recognized by the Adaboost cascading gesture detector to obtain a gesture recognition result, and the gesture recognition result is filtered by the gesture recognition convolutional neural network to obtain the correct gesture recognition result.
  • this embodiment further describes how to train the Adaboost cascading gesture detector and the gesture recognition convolutional neural network.
  • the flow of training the Adaboost cascade gesture detector in this embodiment is shown in FIG. 5.
  • the Adaboost cascade gesture detector is composed of multiple strong classifiers, and each strong classifier is composed of multiple weak classifiers; the weak classifiers are trained before the Adaboost cascading. Different classifiers are trained from samples of different gestures, with multiple layers of classifiers per gesture, and the classifiers are combined for gesture detection and recognition.
  • the strong classifier training process for each gesture (taking the single "scissors" gesture as an example; the process is the same for other gestures) is as follows:
  • the Adaboost algorithm is used to select the optimal weak classifiers to form a strong classifier.
  • the Adaboost cascade gesture detector training method provided by the embodiment can make the Adaboost cascade gesture detector have higher recognition precision and better recognition effect, and reduce the processing task of the gesture recognition convolutional neural network.
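The cascade structure just described can be sketched as a chain of stages that a candidate window must pass in full. Each stage stands in for one trained strong classifier; the scalar scores and thresholds below are hypothetical.

```python
def make_stage(threshold):
    """One strong-classifier stage: accept a window if its score clears
    the stage threshold. Scores are plain numbers for illustration."""
    return lambda score: score >= threshold

def cascade_accepts(stages, score):
    """A window is accepted only if every stage accepts it; early,
    cheap stages reject most non-gesture windows before later ones run."""
    return all(stage(score) for stage in stages)

stages = [make_stage(0.2), make_stage(0.5), make_stage(0.8)]
print(cascade_accepts(stages, 0.9))   # passes all three stages → True
print(cascade_accepts(stages, 0.6))   # rejected at the third stage → False
```

Because `all()` short-circuits, a window rejected by an early stage never reaches the later, more expensive stages, which is what makes the cascade fast on frames full of background.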
  • the process of training the gesture recognition convolutional neural network in this embodiment is shown in FIG. 6; the specific training process is as follows:
  • the test sample set is used to test the trained gesture recognition convolutional neural network and obtain a test accuracy; when the test accuracy reaches the third preset expected value, the training is stopped and the trained gesture recognition convolutional neural network is obtained; otherwise, the parameters of the S-LeNet neural network are adjusted and the network is retrained, verified, and tested until the test accuracy reaches the third preset expected value.
  • the recognition rate of the gesture recognition convolutional neural network is very high, and it can accurately judge the recognition results of the Adaboost cascade gesture detector, achieving an intelligent recognition effect.
  • the present invention provides an embodiment of a gesture recognition method for a robot, including:
  • a number of weak classifiers with low false-positive rates are selected from the trained weak classifiers to form multiple strong classifiers, combined as follows:
  • f(x) = Σ_{m=1}^{M} α_m G_m(x), where M is the number of iterations (i.e. the number of weak classifiers obtained), α_m is the weight of each weak classifier, G_m(x) is a weak classifier, and f(x) is a strong classifier.
  • the plurality of strong classifiers are combined into an Adaboost cascade gesture detector.
  • the S-LeNet is a neural network optimized from LeNet; the optimization specifically uses a convolutional layer and a downsampling layer in place of a fully connected layer of LeNet, and reduces the number of convolution kernels;
  • the adjusted gesture classification picture is input into the gesture recognition convolutional neural network, and is filtered in a multi-thread manner. If the adjusted gesture classification picture includes a gesture, the adjusted gesture classification picture is saved and displayed. Otherwise, the adjusted gesture classification picture is filtered.
  • the S-LeNet neural network is specifically:
  • the input layer receives the input filtered sample
  • each convolution kernel in the first convolutional layer detects, by a convolution operation, a specific feature in each sample of the input filtered sample set, and obtains the first convolution feature set corresponding to each gesture; the convolution operation is x_j^l = f(Σ_i x_i^{l-1} * w_ij^l + b_j^l), where * is the two-dimensional discrete convolution operator, b is an additive bias, w_ij is a convolution kernel, x is the input feature map, and f(·) is the activation function;
  • the first activation function layer retains the feature of the first convolution feature set that meets the activation function requirement by nonlinear transformation, deletes the feature that does not meet the activation function requirement, and obtains the processed first processing feature set;
  • the first downsampling layer performs aggregation statistics on the first processing feature set, obtaining the first statistical feature set corresponding to each gesture; the statistic is computed as x_j^l = f(β_j^l · down(x_j^{l-1}) + b_j^l), where β is the multiplicative bias, down(·) is the downsampling function, b is the additive bias, and f(·) is the activation function;
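A minimal pure-Python sketch of the two operations just defined: a valid 2-D convolution (implemented, as in most CNN frameworks, as cross-correlation) with an additive bias, and 2x2 mean-pooling as the aggregation statistic down(·). The identity is used as the activation f(·), and the small kernel and feature map are illustrative.

```python
def conv2d_valid(x, w, b=0):
    """y[i][j] = sum_{a,c} x[i+a][j+c] * w[a][c] + b  (identity activation)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + c] * w[a][c]
                 for a in range(kh) for c in range(kw)) + b
             for j in range(out_w)]
            for i in range(out_h)]

def downsample_mean2x2(x):
    """down(): aggregate each non-overlapping 2x2 block by its mean."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]

fmap = conv2d_valid([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]],
                    [[1, 0],
                     [0, 1]])
print(fmap)                       # → [[6, 8], [12, 14]]
print(downsample_mean2x2(fmap))   # → [[10.0]]
```

The pooling step halves each spatial dimension, which is why each downsampling layer in the network shrinks the feature maps passed to the next convolution.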
  • the second convolution layer performs a convolution operation on the first statistical feature set obtained by the aggregation of the first downsampling layer to obtain a second convolution feature set;
  • the second activation function layer retains the feature of the second convolution feature set that meets the activation function requirement by nonlinear transformation, deletes the feature that does not meet the activation function requirement, and obtains the processed second processing feature set;
  • the second downsampling layer performs aggregation statistics on the second processing feature set, and obtains a second statistical feature set corresponding to the aggregation statistics corresponding to each gesture recognition convolutional neural network gesture;
  • the third convolution layer performs a convolution operation on the second statistical feature set obtained by the aggregation of the second downsampling layer to obtain a third convolution feature set;
  • the third activation function layer retains the feature of the third convolution feature set that meets the activation function requirement by nonlinear transformation, deletes the feature that does not meet the activation function requirement, and obtains the processed third processing feature set;
  • the third downsampling layer performs aggregation statistics on the third processing feature set, and obtains a third statistical feature set after the aggregated statistics corresponding to each gesture recognition convolutional neural network gesture;
  • All the neuron nodes in the fully connected layer are connected to all the feature points in the third statistical feature set corresponding to each gesture recognition convolutional neural network gesture output by the third downsampling layer, and the output function is:
  • h(x) = f(w·x + b), where:
  • x is the input of the fully connected layer
  • h(x) is the output of the fully connected layer
  • w is the weight
  • b is the additive bias
  • f(·) is the activation function
  • the output of the fully connected layer is used as an input sample, and a K classifier is calculated by the SOFTMAX output layer; the K classifier is a K-dimensional vector, and the calculation method is:
  • p(y=j|x) = exp(θ_j·x) / Σ_l exp(θ_l·x), where:
  • x is the input sample
  • y is the output
  • p(y=j|x) is the probability that the sample is judged to be a certain category j
  • θ_1, …, θ_K are the model parameters; the denominator Σ_l exp(θ_l·x) is the normalization function, which normalizes the probability distribution such that the sum of all the probabilities is 1.
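The SOFTMAX calculation can be sketched as follows. The parameter vectors `theta` and the 3-class toy input are hypothetical values for illustration only, not trained parameters from the patent.

```python
import math

def softmax_classifier(theta, x):
    """p(y=j|x) = exp(theta_j . x) / sum_l exp(theta_l . x): returns a
    K-dimensional probability vector whose entries sum to 1."""
    scores = [sum(t_i * x_i for t_i, x_i in zip(t, x)) for t in theta]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)                     # the normalization function
    return [e / total for e in exps]

# Hypothetical parameters for K = 3 gesture categories (e.g. fist, scissors, cloth)
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = softmax_classifier(theta, x=[2.0, 0.1])
```

Subtracting the maximum score before exponentiating does not change the result but avoids overflow for large activations.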
  • the embodiment describes the structure of the S-LeNet neural network in the gesture recognition convolutional neural network.
  • the existing LeNet structure consists of two convolutional layers, two downsampling layers, two fully connected layers, and one output layer;
  • the network used in the present invention comprises three convolutional layers, three downsampling layers, one fully connected layer and one output layer, as shown in FIG. 8;
  • the present invention uses a convolutional layer and a downsampling layer instead of one fully connected layer;
  • because the parameters of the fully connected layer account for a large proportion of the overall network parameters, replacing it with a convolutional layer and a downsampling layer can effectively reduce the network parameters while increasing the network's feature-extraction capability;
  • the number of convolution kernels is also reduced: the more convolution kernels, the more parameters and the longer the forward-propagation time, so the number of convolution kernels is reduced as far as possible while network accuracy is maintained.
  • the first convolutional layer, the second convolutional layer, and the third convolutional layer have the same structure and function; each convolution kernel detects a specific feature at all positions of the input feature map, realizing weight sharing on the same input feature map;
  • convolution operations are performed using different convolution kernels; after the convolutional layer, the activation function layer preserves the important parts of the features by nonlinear transformation and maps out the redundant parts, while improving the ability to characterize features; common activation functions are sigmoid, Tanh, and ReLU; the downsampling layer then performs aggregation statistics on the feature map obtained by convolution, which makes it easier to describe high-dimensional images;
  • this aggregation operation is downsampling;
  • the downsampling operation retains the features of the high-resolution feature map while reducing the resolution of the output feature map; all the neuron nodes of the fully connected layer are connected to all the neuron nodes in the feature map output by the upper layer, and the output layer then calculates and outputs a vector of K dimensions.
  • after training, each gesture corresponds to a K-dimensional vector:
  • the fist corresponds to a K-dimensional vector {a_k};
  • the scissors correspond to a K-dimensional vector {b_k};
  • the cloth corresponds to a K-dimensional vector {c_k}.
  • the process of gesture recognition can refer to the flowchart shown in FIG. 9;
  • the Adaboost cascade gesture detector is used to detect each frame to be detected, and multiple gesture classification pictures are obtained;
  • the multiple gesture classification pictures are then cropped according to the preset specification, such as a 40×40 pixel size, to obtain gesture classification pictures that can be recognized by the gesture recognition convolutional neural network; the cropped gesture classification pictures are input into the gesture recognition convolutional neural network in a multithreaded manner;
  • a K-dimensional vector is obtained for each picture, and the obtained K-dimensional vector is compared with the previously trained K-dimensional vectors, thereby recognizing the gesture; for example, if the K-dimensional vector obtained for a recognized gesture is very close to the K-dimensional vector {a_k} corresponding to the fist, the recognized gesture can be judged to be a fist. If it is recognized that the picture contains a gesture, the picture is saved and displayed; otherwise, the picture is filtered out. As shown in FIG. 10, the result detected by the Adaboost cascade gesture detector is the three black frames, but because the background is complicated, the detected result is not very accurate; the result retained after filtering by the gesture recognition convolutional neural network is displayed in a white frame.
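The detect-then-filter flow described above can be sketched as follows. `adaboost_detect` and `cnn_contains_gesture` are hypothetical stand-ins for the trained detector and the trained network, and `crop_and_resize` is a placeholder; none of these names come from the patent.

```python
def crop_and_resize(frame, region, size):
    # Placeholder: a real system would crop `region` out of `frame` and
    # rescale it to `size` (e.g. 40x40 pixels) before feeding the CNN.
    return region

def recognize_gestures(frame, adaboost_detect, cnn_contains_gesture, size=(40, 40)):
    """Detect candidate gesture regions with the cascade detector, crop each
    candidate to the preset specification, then keep only the candidates the
    CNN confirms as gestures (filtering the detector's false positives)."""
    candidates = adaboost_detect(frame)            # possibly noisy detections
    confirmed = []
    for region in candidates:
        patch = crop_and_resize(frame, region, size)
        if cnn_contains_gesture(patch):
            confirmed.append(region)
    return confirmed

# Toy run: the detector proposes three regions; the CNN keeps two of them.
detections = recognize_gestures(
    frame=None,
    adaboost_detect=lambda f: ["fist@(10,10)", "noise@(50,80)", "cloth@(120,40)"],
    cnn_contains_gesture=lambda p: not p.startswith("noise"),
)
```

In the patent's design the CNN filtering is run in a multithreaded manner over the candidate patches; the sequential loop here is a simplification.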
  • the present invention provides an embodiment of a gesture recognition system for a robot, including:
  • the picture collection module 1 is configured to pre-collect pictures containing different gestures and containing no gestures to obtain a sample picture set;
  • the detector training module 2 is configured to train an Adaboost cascade gesture detector according to the artificially produced detection sample set;
  • the neural network training module 3 is configured to train the gesture recognition convolutional neural network according to the artificially produced filtered sample set;
  • the gesture recognition module 4 is electrically connected to the detector training module 2 and the neural network training module 3, respectively, and is configured to identify gestures in the picture to be detected by using the Adaboost cascade gesture detector to obtain a gesture recognition result, and to filter the gesture recognition result by using the gesture recognition convolutional neural network to obtain a correct gesture recognition result.
  • the robot can be fitted with a plurality of cameras, which collect pictures containing different gestures and containing no gestures to obtain a sample picture set; the sample picture set is then processed manually to create a detection sample set and a filtered sample set;
  • the detector training module and the neural network training module then train the Adaboost cascade gesture detector and the gesture recognition convolutional neural network, respectively;
  • the gesture recognition module first uses the Adaboost cascade gesture detector to perform a preliminary recognition of the gesture and obtain multiple results; because the recognition accuracy of the Adaboost cascade gesture detector in a complex environment is not high enough, the results obtained may include erroneous results; therefore, the gesture recognition convolutional neural network is used to filter the results, the correct results are screened out and displayed on the screen, and the identification process is completed.
  • the present invention provides an embodiment of a robot gesture recognition system. Based on the previous embodiment, the embodiment includes:
  • the picture collection module 1 is configured to pre-collect pictures containing different gestures and no gestures to obtain a sample picture set;
  • the detector training module 2 is configured to train an Adaboost cascade gesture detector according to the artificially generated detection sample set
  • the detector training module 2 includes:
  • a calculation sub-module 21 configured to calculate, according to the detection sample set, a rectangular feature set corresponding to each detection sample
  • the weak classifier training sub-module 22 is electrically connected to the computing sub-module, and is configured to train a plurality of weak classifiers according to the corresponding rectangular feature sets of all the detected samples;
  • the strong classifier training sub-module 23 is electrically connected to the weak classifier training sub-module, and is configured to select, from the multiple weak classifiers according to the Adaboost algorithm, several weak classifiers with a low false-positive rate to form multiple strong classifiers;
  • the combination is:
  • f(x) = Σ (m = 1 to M) α_m·G_m(x), where:
  • M is the number of iterations, that is, the number of weak classifiers obtained;
  • α_m is the weight of each weak classifier;
  • G_m(x) is a weak classifier, and f(x) is a strong classifier;
  • the detector training sub-module 24 is electrically connected to the strong classifier training sub-module for combining the plurality of strong classifiers into an Adaboost cascade gesture detector.
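The weighted combination f(x) = Σ α_m·G_m(x) can be sketched as follows. The decision-stump weak classifiers and the weights `alphas` are toy illustrative values, not classifiers trained by the patent's method.

```python
def strong_classify(weak_classifiers, alphas, x):
    """f(x) = sum_m alpha_m * G_m(x); the sign of f(x) gives the final label
    (+1 for gesture, -1 for no gesture)."""
    f_x = sum(alpha * g(x) for g, alpha in zip(weak_classifiers, alphas))
    return 1 if f_x >= 0 else -1

# Toy weak classifiers G_m: decision stumps over a 2-feature input, each
# returning +1 (gesture) or -1 (no gesture).
stumps = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.3 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]
alphas = [0.6, 0.4, 0.8]  # weights alpha_m as Adaboost would assign (illustrative)

label = strong_classify(stumps, alphas, x=(0.7, 0.6))
```

Adaboost assigns larger α_m to weak classifiers with lower error, so a few accurate stumps can outvote many mediocre ones in the weighted sum.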
  • the neural network training module 3 is configured to train the gesture recognition convolutional neural network according to the artificially generated filtered sample set
  • the neural network training module 3 includes:
  • the processing sub-module 31 is configured to preprocess the gesture recognition convolutional neural network sample set by sample augmentation and normalization methods;
  • a sample classification sub-module 32 configured to divide the filtered sample set into a training sample set, a verification sample set, and a test sample set according to a preset ratio
  • the processing sub-module 31 is further configured to initialize a lightweight neural network S-LeNet, the S-LeNet being a neural network obtained by optimizing LeNet, the optimization specifically being the use of a convolutional layer and a downsampling layer instead of a fully connected layer of LeNet, and a reduction of the number of convolution kernels;
  • a neural network training sub-module 33 electrically coupled to the processing sub-module 31 and the sample classification sub-module 32, using the training sample set, the verification sample set, and the test through the S-LeNet neural network
  • the sample set training results in a gesture recognition convolutional neural network.
  • the gesture recognition module 4 is electrically connected to the detector training module 2 and the neural network training module 3, respectively, for identifying the collected gesture according to the Adaboost cascade gesture detector and the gesture recognition convolutional neural network. image.
  • this embodiment further describes how to train the Adaboost cascade gesture detector and the gesture recognition convolutional neural network;
  • the Adaboost cascade gesture detector is composed of multiple strong classifiers, and each strong classifier is composed of multiple weak classifiers; therefore, multiple weak classifiers are trained before the cascaded Adaboost is obtained; different classifiers are trained on samples of different gestures, multiple layers of classifiers are trained for each gesture, and they are combined for gesture detection and recognition.
  • the strong classifier training process for each gesture is as follows:
  • the calculation sub-module takes the detection sample set as an input, and calculates and obtains a rectangular feature set under a given rectangular feature prototype;
  • the weak classifier training sub-module takes a rectangular feature set as an input, determines a threshold according to a given weak learning algorithm, and trains the weak classifier;
  • the strong classifier training sub-module takes the weak classifier as input. According to the detection rate and the false positive rate, the Adaboost algorithm is used to select the optimal weak classifiers to form a strong classifier.
  • the detector training sub-module takes a strong classifier as an input and combines into an Adaboost cascade gesture detector;
  • the Adaboost cascade gesture detector training method provided by the embodiment can make the Adaboost cascade gesture detector have higher recognition precision and better recognition effect, and reduce the processing task of the gesture recognition convolutional neural network.
  • the training of the gesture recognition convolutional neural network is specifically as follows: the processing sub-module preprocesses the gesture filtering sample set by sample augmentation and normalization methods to improve sample diversity and accelerate the convergence of the network; the filtered sample set is manually divided according to a preset ratio, such as 6:2:2, into a training sample set, a verification sample set, and a test sample set; and the processing sub-module initializes the parameters of the S-LeNet neural network in the gesture recognition convolutional neural network;
  • the neural network training sub-module uses the training sample set to train the gesture recognition convolutional neural network and obtains a training accuracy rate;
  • when the training accuracy rate reaches the first preset expected value, the next step is performed; otherwise, the parameters of the S-LeNet neural network continue to be trained until the training accuracy reaches the first preset expected value;
  • the neural network training sub-module uses the verification sample set to verify the trained gesture recognition convolutional neural network and obtains a verification accuracy rate; when the verification accuracy rate reaches the second preset expected value, the next step is performed; otherwise, the parameters of the S-LeNet neural network are adjusted and the network is retrained and re-verified until the verification accuracy reaches the second preset expected value;
  • the neural network training sub-module uses the test sample set to test the trained gesture recognition convolutional neural network and obtains a test accuracy rate; when the test accuracy reaches the third preset expected value, the training is stopped and the trained gesture recognition convolutional neural network is obtained; otherwise, the parameters of the S-LeNet neural network are adjusted and the network is retrained, re-verified, and re-tested until the test accuracy reaches the third preset expected value;
  • the recognition rate of the trained gesture recognition convolutional neural network is very high, and the results recognized by the Adaboost cascade gesture detector can be accurately judged, achieving the effect of intelligent recognition.
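The three-stage loop above (train until the training threshold, then verify, then test, adjusting and retraining on any failure) can be sketched as follows. `train_epoch`, `evaluate`, and the threshold values are stand-ins for the actual training machinery, not details from the patent.

```python
def train_with_thresholds(train_epoch, evaluate, thresholds, max_rounds=100):
    """Keep training until the training, verification, and test accuracies each
    reach their preset expected values; return the number of rounds used, or
    None if max_rounds is exhausted."""
    t_train, t_val, t_test = thresholds
    for round_no in range(1, max_rounds + 1):
        train_epoch()
        if evaluate("train") < t_train:
            continue                     # keep training with current parameters
        if evaluate("val") < t_val:
            continue                     # adjust parameters / retrain, then re-verify
        if evaluate("test") >= t_test:
            return round_no              # all three expected values reached
    return None

# Toy model whose accuracy improves by 0.1 per training round on every split.
state = {"acc": 0.0}
rounds = train_with_thresholds(
    train_epoch=lambda: state.__setitem__("acc", state["acc"] + 0.1),
    evaluate=lambda split: state["acc"],
    thresholds=(0.65, 0.65, 0.65),
)
```

Checking the thresholds in order mirrors the patent's staged flow: a network only reaches the verification check after the training check passes, and only reaches the test check after verification passes.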
  • the present invention provides an embodiment of a gesture recognition system for a robot, including:
  • the picture collection module 1 is configured to pre-collect pictures containing different gestures and no gestures to obtain a sample picture set;
  • the detector training module 2 is configured to train an Adaboost cascade gesture detector according to the artificially generated detection sample set
  • the detector training module 2 includes:
  • a calculation sub-module 21 configured to calculate, according to the detection sample set, a rectangular feature set corresponding to each detection sample
  • the weak classifier training sub-module 22 is electrically connected to the computing sub-module, and is configured to train a plurality of weak classifiers according to the corresponding rectangular feature sets of all the detected samples;
  • the strong classifier training sub-module 23 is electrically connected to the weak classifier training sub-module, and is configured to select, from the multiple weak classifiers according to the Adaboost algorithm, several weak classifiers with a low false-positive rate to form multiple strong classifiers;
  • the combination is:
  • f(x) = Σ (m = 1 to M) α_m·G_m(x), where:
  • M is the number of iterations, that is, the number of weak classifiers obtained;
  • α_m is the weight of each weak classifier;
  • G_m(x) is a weak classifier, and f(x) is a strong classifier;
  • the detector training sub-module 24 is electrically connected to the strong classifier training sub-module for combining the plurality of strong classifiers into an Adaboost cascade gesture detector.
  • the neural network training module 3 is configured to train the gesture recognition convolutional neural network according to the artificially generated filtered sample set
  • the neural network training module 3 includes:
  • the processing sub-module 31 is configured to preprocess the gesture recognition convolutional neural network sample set by sample augmentation and normalization methods;
  • a sample classification sub-module 32 configured to divide the filtered sample set into a training sample set, a verification sample set, and a test sample set according to a preset ratio
  • the processing sub-module 31 is further configured to initialize a lightweight neural network S-LeNet, the S-LeNet being a neural network obtained by optimizing LeNet, the optimization specifically being the use of a convolutional layer and a downsampling layer instead of a fully connected layer of LeNet, and a reduction of the number of convolution kernels;
  • a neural network training sub-module 33 electrically coupled to the processing sub-module 31 and the sample classification sub-module 32, using the training sample set, the verification sample set, and the test through the S-LeNet neural network
  • the sample set training results in a gesture recognition convolutional neural network.
  • the gesture recognition module 4 is electrically connected to the detector training module 2 and the neural network training module 3, respectively, for identifying the collected gesture according to the Adaboost cascade gesture detector and the gesture recognition convolutional neural network. image.
  • the gesture recognition module 4 includes:
  • the detection sub-module 41 detects each collected frame of the picture to be detected by using the cascaded Adaboost classifier, and obtains multiple gesture classification pictures;
  • the picture adjustment sub-module 42 is electrically connected to the detection sub-module 41, and is configured to adjust the size of the multiple gesture classification pictures according to the preset specification to obtain adjusted gesture classification pictures;
  • the filter sub-module 43, the storage sub-module 44, and the display sub-module 45 are electrically connected in sequence, and the filter sub-module 43 is electrically connected to the picture adjustment sub-module 42 and configured to input the adjusted gesture classification pictures into the gesture recognition convolutional neural network for filtering; if an adjusted gesture classification picture contains a gesture, the adjusted gesture classification picture is saved by the storage sub-module 44 and displayed by the display sub-module 45; otherwise, the filter sub-module 43 filters out the adjusted gesture classification picture.
  • the S-LeNet neural network trained by the neural network training sub-module 33 includes:
  • An input layer configured to receive the input filtered sample
  • each convolution kernel in the first convolutional layer respectively detects, by a convolution operation, a specific feature corresponding to each filtered sample in the input filtered sample set, to obtain the first convolution feature set corresponding to each gesture recognition convolutional neural network gesture, and the convolution operation is:
  • x_j = f(Σ_i x_i * w_ij + b), where:
  • * is the two-dimensional discrete convolution operator
  • b is an offset
  • w_ij is a convolution kernel
  • x is an input feature map
  • f(·) is an activation function
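The convolution operation x_j = f(Σ_i x_i * w_ij + b) can be sketched as follows. The 3×3 input, the 2×2 kernel, the zero offset, and the ReLU activation are illustrative values, not the patent's trained parameters; as is conventional in CNN implementations, the "convolution" is computed as cross-correlation.

```python
def conv2d_valid(x, w):
    """Two-dimensional discrete convolution (valid mode, no padding),
    implemented as cross-correlation as in typical CNN code."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + di][j + dj] * w[di][dj] for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def conv_layer(x, w, b, f=lambda v: max(v, 0.0)):
    """x_out = f(x * w + b) with a ReLU activation f(.)"""
    y = conv2d_valid(x, w)
    return [[f(v + b) for v in row] for row in y]

x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [2.0, 0.0, 1.0]]
w = [[1.0, -1.0],
     [-1.0, 1.0]]
out = conv_layer(x, w, b=0.0)  # 3x3 input with a 2x2 kernel -> 2x2 feature map
```

Sliding the same kernel over every position of the input feature map is exactly the weight sharing the text describes: one set of kernel weights detects one specific feature everywhere in the map.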
  • the first activation function layer retains the feature of the first convolution feature set that meets the activation function requirement by nonlinear transformation, deletes the feature that does not meet the activation function requirement, and obtains the processed first processing feature set;
  • the first downsampling layer performs aggregation statistics on the first processing feature set to obtain a first statistical feature set corresponding to each gesture recognition convolutional neural network gesture, and the statistical method is:
  • x_j = f(β·down(x_j) + b), where:
  • β is the multiplicative bias
  • down(·) is the downsampling function
  • b is the additive bias
  • f(·) is the activation function
  • a second convolution layer performing a convolution operation on the first statistical feature set of the aggregated statistics obtained by the first downsampling layer to obtain a second convolution feature set;
  • a second activation function layer retaining, by the nonlinear transformation, a feature that meets an activation function requirement in the second convolution feature set, deleting a feature that does not meet an activation function requirement, and obtaining a processed second processing feature set;
  • the second downsampling layer performs aggregation statistics on the second processing feature set to obtain a second statistical feature set corresponding to the aggregation statistics corresponding to each gesture recognition convolutional neural network gesture;
  • a third convolution layer performing a convolution operation on the second statistical feature set of the aggregated statistics obtained by the second downsampling layer to obtain a third convolution feature set;
  • the third activation function layer retains the feature of the third convolution feature set that meets the activation function requirement by nonlinear transformation, deletes the feature that does not meet the activation function requirement, and obtains the processed third processing feature set;
  • the third downsampling layer performs aggregation statistics on the third processing feature set to obtain a third statistical feature set after the aggregated statistics corresponding to each gesture recognition convolutional neural network gesture;
  • a fully connected layer, all the neuron nodes of which are connected to all the feature points in the third statistical feature set output by the third downsampling layer, with the output function:
  • h(x) = f(w·x + b), where:
  • x is the input of the fully connected layer
  • h(x) is the output of the fully connected layer
  • w is the weight
  • b is the additive bias
  • f(·) is the activation function
  • An output layer configured to use the output of the fully connected layer as an input sample and calculate a K classifier, wherein the K classifier is a K-dimensional vector, and the calculation method is:
  • p(y=j|x) = exp(θ_j·x) / Σ_l exp(θ_l·x), where:
  • x is the input sample
  • y is the output
  • p(y=j|x) is the probability that the sample is judged to be a certain category j
  • θ_1, …, θ_K are the model parameters; the denominator Σ_l exp(θ_l·x) is the normalization function, which normalizes the probability distribution such that the sum of all the probabilities is 1.
  • the current LeNet neural network structure consists of two convolutional layers, two downsampling layers, two fully connected layers, and one output layer;
  • to enable gesture detection to run on mobile, embedded, and similar platforms, the network size is reduced as much as possible while accuracy is guaranteed;
  • the network used in the present invention comprises three convolutional layers, three downsampling layers, one fully connected layer and one output layer, as shown in FIG. 8;
  • the present invention uses a convolutional layer and a downsampling layer instead of one fully connected layer;
  • because the parameters of the fully connected layer account for a large proportion of the overall network parameters, replacing it with a convolutional layer and a downsampling layer can effectively reduce the network parameters while increasing the network's feature-extraction capability;
  • the number of convolution kernels is also reduced: the more convolution kernels, the more parameters and the longer the forward-propagation time, so the number of convolution kernels is reduced as far as possible while network accuracy is maintained.
  • the first convolutional layer, the second convolutional layer, and the third convolutional layer have the same structure and function; each convolution kernel detects a specific feature at all positions of the input feature map, realizing weight sharing on the same input feature map;
  • convolution operations are performed using different convolution kernels; after the convolutional layer, the activation function layer preserves the important parts of the features by nonlinear transformation and maps out the redundant parts, while improving the ability to characterize features; common activation functions are sigmoid, Tanh, and ReLU; the downsampling layer then performs aggregation statistics on the feature map obtained by convolution, which makes it easier to describe high-dimensional images;
  • this aggregation operation is downsampling;
  • the downsampling operation retains the features of the high-resolution feature map while reducing the resolution of the output feature map; all the neuron nodes of the fully connected layer are connected to all the neuron nodes in the feature map output by the upper layer, and the output layer then calculates and outputs a vector of K dimensions.
  • after training, each gesture corresponds to a K-dimensional vector:
  • the fist corresponds to a K-dimensional vector {a_k};
  • the scissors correspond to a K-dimensional vector {b_k};
  • the cloth corresponds to a K-dimensional vector {c_k}.
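Comparing a network output with the trained per-gesture K-dimensional vectors, as described above, amounts to a nearest-vector lookup; the sketch below uses Euclidean distance, and the reference vectors {a_k}, {b_k}, {c_k} are made-up illustrative values, not vectors trained by the patent's network.

```python
import math

def nearest_gesture(output_vector, reference_vectors):
    """Compare the network's K-dimensional output with each trained gesture
    vector and return the gesture whose vector is closest."""
    def dist(u, v):
        return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))
    return min(reference_vectors,
               key=lambda name: dist(output_vector, reference_vectors[name]))

# Illustrative trained K-dimensional vectors (K = 3) for the three gestures.
references = {
    "fist":     [0.9, 0.05, 0.05],   # {a_k}
    "scissors": [0.05, 0.9, 0.05],   # {b_k}
    "cloth":    [0.05, 0.05, 0.9],   # {c_k}
}
gesture = nearest_gesture([0.8, 0.1, 0.1], references)
```

An output that is "very close" to {a_k}, in the text's words, therefore resolves to the fist.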
  • the Adaboost cascade gesture detector is first used to detect each frame to be detected, obtaining multiple gesture classification pictures; the multiple gesture classification pictures are then cropped according to the preset specification, such as a 40×40 pixel size, to obtain gesture classification pictures that can be recognized by the gesture recognition convolutional neural network; the cropped gesture classification pictures are input into the gesture recognition convolutional neural network, and recognition and filtering are performed in a multithreaded manner;
  • the gesture recognition convolutional neural network recognizes the gesture; through the above steps, a K-dimensional vector is obtained, and the obtained K-dimensional vector is compared with the pre-trained K-dimensional vectors to identify the gesture. If it is recognized that the picture contains a gesture, the picture is saved and displayed; otherwise, the picture is filtered out.
  • a robot integrates any of the gesture recognition systems of the above embodiments.


Abstract

The present invention discloses a gesture recognition method and system for a robot, and a robot, including: pre-collecting pictures containing different gestures and pictures containing no gesture to obtain a sample picture set; producing a detection sample set and a filtering sample set from the sample picture set; training an Adaboost cascade gesture detector from the detection sample set; training a gesture recognition convolutional neural network from the filtering sample set; recognizing the collected gesture pictures with the Adaboost cascade gesture detector to obtain a gesture recognition result, and filtering the gesture recognition result with the gesture recognition convolutional neural network to obtain a correct gesture recognition result. According to the present invention, the results recognized by the Adaboost cascade gesture detector can be filtered by the gesture recognition convolutional neural network, and gestures can be accurately recognized against complex backgrounds.

Description

Gesture recognition method and system for a robot, and robot
This application claims priority to Chinese patent application No. 201711006447.X, filed on October 25, 2017 and entitled "Gesture recognition method and system for a robot, and robot", the entire contents of which are incorporated herein.
Technical Field
The present invention relates to the fields of artificial intelligence and image processing, and in particular to a gesture recognition method and system for a robot, and a robot.
Background Art
With the development of science and technology, intelligent robots are used more and more in daily life and industrial production. In the process of making robots intelligent, gesture recognition, as an important mode of human-computer interaction, influences through its research and development the naturalness and flexibility of human-computer interaction.
At present there are many service robots that can recognize user instructions from the user's gestures and help people complete many tasks. The gesture recognition workflow implemented with conventional image processing techniques and machine learning methods usually includes steps such as gesture segmentation, gesture analysis, and gesture recognition. This approach is usually suited to recognition against a single background; in real applications, however, gestures usually occur in complex environments, for example with complex backgrounds, light that is too bright or too dim, or varying distances between the gesture and the capture device. In complex environments, machine learning methods are prone to misjudgment, screening then has to be completed manually, and the goal of intelligent detection cannot be satisfied.
Therefore, the present invention provides a more intelligent gesture recognition method and system, enabling a robot to recognize gestures better and complete the corresponding work.
Summary of the Invention
The gesture recognition method and system for a robot, and the robot, provided by the present invention can filter the results recognized by an Adaboost cascade gesture detector through a gesture recognition convolutional neural network and accurately recognize gestures against complex backgrounds. The technical solution is as follows:
A gesture recognition method for a robot includes: pre-collecting pictures containing different gestures and pictures containing no gesture to obtain a sample picture set; producing a detection sample set and a filtering sample set from the sample picture set; training an Adaboost cascade gesture detector from the detection sample set; training a gesture recognition convolutional neural network from the filtering sample set; recognizing gestures in a picture to be detected with the Adaboost cascade gesture detector to obtain a gesture recognition result, and filtering the gesture recognition result with the gesture recognition convolutional neural network to obtain a correct gesture recognition result.
With this solution, the recognition results that the Adaboost cascade gesture detector, trained on gesture samples, obtains when recognizing gestures in complex environments are not very accurate and include some misjudged results; filtering the recognition results through the gesture recognition convolutional neural network can automatically screen out the correct recognition results, making the robot more intelligent at work.
Preferably, producing the detection sample set and the filtering sample set from the sample picture set is specifically: screening out of the sample picture set the pictures corresponding to the gestures to be trained, as a gesture sample set; screening out of the gesture sample set the pictures that meet preset sample requirements, to obtain a screened gesture sample set; marking the gesture position in each picture of the screened gesture sample set, cropping the pictures with marked gestures according to a preset specification, and using them as a detection positive sample set; using the pictures in the sample picture set that contain no gesture, the pictures that contain flesh-colored samples, and the pictures that contain other gestures as a detection negative sample set; combining the detection positive sample set and the detection negative sample set into the detection sample set; using the detection positive sample set as a filtering positive sample set; cropping the pictures in the sample picture set that contain no gesture and the pictures that contain flesh-colored samples according to the preset specification to obtain a filtering negative sample set; and combining the filtering positive sample set and the filtering negative sample set into the filtering sample set.
With the gesture sample production method provided by this solution, manual screening makes the screened samples better meet the training requirements and improves sample quality, so that the results of the trained Adaboost cascade gesture detector and gesture recognition convolutional neural network when recognizing gestures are more precise.
Preferably, training the Adaboost cascade gesture detector from the detection sample set is specifically: calculating, from the detection sample set, the rectangular feature set corresponding to each detection sample; training multiple weak classifiers from the rectangular feature sets corresponding to all the detection samples; and screening out of the multiple weak classifiers, according to the Adaboost algorithm, several weak classifiers with a low misjudgment rate to form multiple strong classifiers, combined as:
f(x) = Σ (m = 1 to M) α_m·G_m(x)
where M is the number of iterations, that is, the number of weak classifiers obtained; α_m is the weight of each weak classifier; G_m(x) is a weak classifier and f(x) is a strong classifier; and combining the multiple strong classifiers into the Adaboost cascade gesture detector.
With this solution, multiple different weak classifiers can be trained from gesture samples and combined into different strong classifiers, and the finally trained Adaboost cascade gesture detector, used for preliminary gesture recognition, has high recognition precision, strong recognition capability, and a high recognition accuracy rate.
Preferably, training the gesture recognition convolutional neural network is specifically: preprocessing the filtering sample set by sample augmentation and normalization methods; dividing the filtering sample set into a training sample set, a verification sample set, and a test sample set according to a preset ratio; initializing a lightweight neural network S-LeNet, the S-LeNet being a neural network obtained by optimizing LeNet, the optimization specifically being the use of a convolutional layer and a downsampling layer to replace a fully connected layer of LeNet, and a reduction of the number of convolution kernels; and training, through the S-LeNet neural network, the gesture recognition convolutional neural network using the training sample set, the verification sample set, and the test sample set.
Preprocessing the filtering sample set can improve the diversity of the filtering sample set, improve precision, and accelerate the convergence of the network; dividing the filtering sample set into a training sample set, a verification sample set, and a test sample set yields a gesture recognition convolutional neural network with a high recognition rate.
Preferably, recognizing the collected gesture pictures with the Adaboost cascade gesture detector to obtain recognition results, and filtering the gesture recognition results with the gesture recognition convolutional neural network to obtain correct gesture recognition results, is specifically: detecting each collected frame of the picture to be detected with the cascade Adaboost classifier to obtain multiple gesture classification pictures; adjusting the size of the multiple gesture classification pictures according to the preset specification to obtain adjusted gesture classification pictures; and inputting the adjusted gesture classification pictures into the gesture recognition convolutional neural network and filtering in a multithreaded manner; if an adjusted gesture classification picture contains a gesture, the adjusted gesture classification picture is saved and displayed, and otherwise it is filtered out.
With this solution, when recognizing gestures, a preliminary recognition is first performed with the Adaboost cascade gesture detector to obtain multiple gesture recognition results; however, the results obtained are not precise enough, so all the recognition results can then be input into the gesture recognition convolutional neural network for filtering. The gesture recognition convolutional neural network filters in a multithreaded manner, which maximizes filtering efficiency and greatly reduces processing time; after filtering, gesture classification results with higher precision are obtained.
A gesture recognition system for a robot includes: a picture collection module, configured to pre-collect pictures containing different gestures and pictures containing no gesture to obtain a sample picture set; a detector training module, configured to train an Adaboost cascade gesture detector from a manually produced detection sample set; a neural network training module, configured to train a gesture recognition convolutional neural network from a manually produced filtering sample set; and a gesture recognition module, electrically connected to the detector training module and the neural network training module respectively, configured to recognize gestures in a picture to be detected with the Adaboost cascade gesture detector to obtain a gesture recognition result, and to filter the gesture recognition result with the gesture recognition convolutional neural network to obtain a correct gesture recognition result.
With this solution, the recognition results that the Adaboost cascade gesture detector, trained on gesture samples, obtains when recognizing gestures in complex environments are not very accurate and include some misjudged results; filtering the recognition results through the gesture recognition convolutional neural network can automatically screen out the correct recognition results, making the robot more intelligent at work.
Preferably, the detector training module includes: a calculation sub-module, configured to calculate, from the detection sample set, the rectangular feature set corresponding to each detection sample; a weak classifier training sub-module, electrically connected to the calculation sub-module and configured to train multiple weak classifiers from the rectangular feature sets corresponding to all the detection samples; a strong classifier training sub-module, electrically connected to the weak classifier training sub-module and configured to screen out of the multiple weak classifiers, according to the Adaboost algorithm, several weak classifiers with a low misjudgment rate to form multiple strong classifiers, combined as:
f(x) = Σ (m = 1 to M) α_m·G_m(x)
where M is the number of iterations, that is, the number of weak classifiers obtained; α_m is the weight of each weak classifier; G_m(x) is a weak classifier and f(x) is a strong classifier; and a detector training sub-module, electrically connected to the strong classifier training sub-module and configured to combine the multiple strong classifiers into the Adaboost cascade gesture detector.
With this solution, multiple different weak classifiers can be trained from gesture samples and combined into different strong classifiers, and the finally trained Adaboost cascade gesture detector, used for preliminary gesture recognition, has high recognition precision, strong recognition capability, and a high recognition accuracy rate.
Preferably, the neural network training module includes: a processing sub-module, which preprocesses the gesture recognition convolutional neural network sample set by sample augmentation and normalization methods; a sample classification sub-module, configured to divide the filtering sample set into a training sample set, a verification sample set, and a test sample set according to a preset ratio; the processing sub-module being further configured to initialize a lightweight neural network S-LeNet, the S-LeNet being a neural network obtained by optimizing LeNet, the optimization specifically being the use of a convolutional layer and a downsampling layer to replace a fully connected layer of LeNet, and a reduction of the number of convolution kernels; and a neural network training sub-module, electrically connected to the processing sub-module and the sample classification sub-module, which trains, through the S-LeNet neural network, the gesture recognition convolutional neural network using the training sample set, the verification sample set, and the test sample set.
Preprocessing the filtering sample set can improve the diversity of the filtering sample set, improve precision, and accelerate the convergence of the network; dividing the filtering sample set into a training sample set, a verification sample set, and a test sample set yields a gesture recognition convolutional neural network with a high recognition rate.
Preferably, the gesture recognition module includes: a detection sub-module, which detects each collected frame of the picture to be detected with the cascade Adaboost classifier to obtain multiple gesture classification pictures; a picture adjustment sub-module, electrically connected to the detection sub-module and configured to adjust the size of the multiple gesture classification pictures according to the preset specification to obtain adjusted gesture classification pictures; and a filter sub-module, a storage sub-module, and a display sub-module electrically connected in sequence, the filter sub-module being electrically connected to the picture adjustment sub-module and configured to input the adjusted gesture classification pictures into the gesture recognition convolutional neural network for filtering; if an adjusted gesture classification picture contains a gesture, the adjusted gesture classification picture is saved by the storage sub-module and displayed by the display sub-module, and otherwise the filter sub-module filters out the adjusted gesture classification picture.
With this solution, when recognizing gestures, a preliminary recognition is first performed with the Adaboost cascade gesture detector to obtain multiple gesture recognition results; however, the results obtained are not precise enough, so all the recognition results can then be input into the gesture recognition convolutional neural network for filtering. The gesture recognition convolutional neural network filters in a multithreaded manner, which maximizes filtering efficiency and greatly reduces processing time; after filtering, gesture classification results with higher precision are obtained.
A robot integrates the robot gesture recognition system described above.
According to the present invention, at least one of the following beneficial effects can be achieved:
1. The accuracy rate of gesture recognition can be improved, and gestures are recognized more precisely. In the past, the Adaboost cascade gesture detector was used alone to recognize gestures, but changes in the environment affect the recognition result, so that the accuracy rate of the recognition result is not high. The present invention trains a gesture recognition convolutional neural network that further screens and filters the results recognized by the Adaboost cascade gesture detector and removes erroneous recognition results, improving the accuracy rate of gesture recognition.
2. The present invention provides an improved S-LeNet neural network structure. To enable gesture detection to run on mobile, embedded, and similar platforms, the network size is reduced as much as possible while accuracy is guaranteed, and a convolutional layer and a downsampling layer are used to replace one fully connected layer; because the parameters of the fully connected layer account for a large proportion of the overall network parameters, replacing it with a convolutional layer and a downsampling layer can effectively reduce the network parameters while also increasing the network's feature-extraction capability. Since more convolution kernels mean more parameters and a longer forward-propagation time, the present invention reduces the number of convolution kernels as much as possible while guaranteeing network accuracy, improving the recognition efficiency of the gesture recognition convolutional neural network; the gesture recognition convolutional neural network thus trained is fast in processing and higher in recognition accuracy, achieving a better recognition effect.
Brief Description of the Drawings
The preferred embodiments are described below in a clear and easily understandable manner with reference to the accompanying drawings, further explaining the above characteristics, technical features, and advantages of the gesture recognition method and system for a robot and the robot, and the ways of implementing them.
FIG. 1 is a flowchart of one embodiment of the gesture recognition method for a robot of the present invention;
FIG. 2 is a flowchart of another embodiment of the gesture recognition method for a robot of the present invention;
FIG. 3 is a flowchart of sample production in the present invention;
FIG. 4 is a flowchart of another embodiment of the gesture recognition method for a robot of the present invention;
FIG. 5 is a flowchart of training the Adaboost cascade gesture detector in the present invention;
FIG. 6 is a flowchart of training the gesture recognition convolutional neural network in the present invention;
FIG. 7 is a flowchart of another embodiment of the gesture recognition method for a robot of the present invention;
FIG. 8 is a structural diagram of the S-LeNet neural network in the present invention;
FIG. 9 is a flowchart of gesture recognition in the present invention;
FIG. 10 is a diagram of the effect of the gesture recognition convolutional neural network of the present invention filtering misjudgments of the Adaboost classifier;
FIG. 11 is a structural schematic diagram of the gesture recognition system for a robot of the present invention;
FIG. 12 is another structural schematic diagram of the gesture recognition system for a robot of the present invention;
FIG. 13 is another structural schematic diagram of the gesture recognition system for a robot of the present invention.
Description of reference numerals:
1 - picture collection module; 2 - detector training module; 21 - calculation sub-module; 22 - weak classifier training sub-module; 23 - strong classifier training sub-module; 24 - detector training sub-module; 3 - neural network training module; 31 - processing sub-module; 32 - sample classification sub-module; 33 - neural network training sub-module; 4 - gesture recognition module; 41 - detection sub-module; 42 - picture adjustment sub-module; 43 - filter sub-module; 44 - storage sub-module; 45 - display sub-module.
具体实施方式
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对照附图说明本发明的具体实施方式。显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图,并获得其他的实施方式。
为使图面简洁,各图中只示意性地表示出了与本发明相关的部分,它们并不代表其作为产品的实际结构。另外,以使图面简洁便于理解,在有些图中具有相同结构或功能的部件,仅示意性地绘示了其中的一个,或仅标出了其中的一个。在本文中,“一个”不仅表示“仅此一个”,也可以表示“多于一个”的情形。
如图1所示,本发明提供了一种机器人的手势识别方法的一个实施例,包括:
预先采集包含不同手势及不包含手势的图片,得到样本图片集;
根据所述样本图片集制作检测样本集、过滤样本集;
根据所述检测样本集,训练得到Adaboost级联手势检测器;根据所述过滤样本集,训练得到手势识别卷积神经网络;
通过所述Adaboost级联手势检测器在待检测图片中识别手势,得到手势识别结果,通过所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
具体的,在以往的技术中,识别手势的过程只利用了Adaboost级联手势检测器,这样识别得到的结果精确度不够高,而且在例如背景复杂、光线变化等复杂的环境下,识别的正确率会大大降低,识别到的手势结果往往有错误的结果,因此,本发明提供了一种具有深度学习能力的手势识别卷积神经网络,用于过滤掉错误的识别结果,进一步提高识别的正确率。在训练Adaboost级联手势检测器和手势识别卷积神经网络之前,需要预先采集包含不同手势及不包含手势的图片,将之制作成检测样本集、过滤样本集,再通过检测样本集、过滤样本集训练对应的Adaboost级联手势检测器和手势识别卷积神经网络。
如图2所示,本发明提供了一种机器人的手势识别方法的一个实施例,包括:
预先采集包含不同手势及不包含手势的图片,得到样本图片集;
从所述样本图片集中筛选出需要训练的手势对应的图片,作为手势样本训练集;
从所述手势样本训练集中筛选出符合预设样本要求的图片,得到筛选后的手势样本训练集;
对筛选后的手势样本训练集中的每张图片中手势位置进行标记,并对标记过手势的图片按照预设规格进行裁剪,并作为检测正样本集;
将所述样本图片集中不包含手势的图片、包含肉色样本的图片,以及含有其他手势的图片作为检测负样本集;
所述检测样本集包括所述检测正样本集和所述检测负样本集。
将所述检测正样本集作为过滤正样本集;
将所述样本图片集中不包含手势的图片,以及包含肉色样本的图片按照所述预设规格进行裁剪,得到过滤负样本集。
所述过滤样本集包括所述过滤正样本集和所述过滤负样本集。
根据所述检测样本集,训练得到Adaboost级联手势检测器;根据所述过滤样本集,训练得到手势识别卷积神经网络;
根据所述Adaboost级联手势检测器识别采集到的手势图片,得到手势识别结果,根据所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
具体的,本实施例具体阐述了样本的制作过程。样本制作流程如图3所示,首先采集包含不同手势的图片样本,同时也采集不包含手势的图片,并按手势进行分类存储,得到需要训练的手势对应的手势样本集;其次再从手势样本集中筛选出符合预设样本要求的图片,手势的预设样本要求包括手势图片清晰,图片中的手势完整等等;之后,人工分别标记手势样本集中手势的位置并剪裁,将其变换到指定大小,如40×40像素点大小,作为训练两种算法的正样本。
本发明在训练级联Adaboost检测器时,是针对不同的手势,训练不同的分类器,具体样本制作子步骤如下:将上述经过裁剪的样本作为检测正样本集;收集不包含手势的负样本图片,负样本图片中应包含肉色样本,肉色样本是指包含人体皮肤颜色的样本,不同手势的样本也作为负样本,如训练剪刀手检测分类器时,拳头、布的样本作为Adaboost算法负样本集使用。训练级联Adaboost所需的负样本的大小不需要变换成规定大小。将Adaboost算法正样本集和Adaboost算法负样本集作为Adaboost算法样本集。
本发明使用手势识别卷积神经网络进行误判过滤时,只判断手或非手,因此只需进行简单分类即可,具体的样本制作子步骤如下:将Adaboost算法正样本集作为手势识别卷积神经网络正样本集;收集不包含手势的负样本图片,负样本图片中应包含肉色样本;将收集到的负样本裁剪出若干个指定大小的负样本,如40×40像素点大小。
本实施例中的手势样本制作主要通过人工来完成,筛选样本的精度较高,样本比较规范,制作得到的样本训练得到的Adaboost分类器和手势识别卷积神经网络识别度会比较高。
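作为参考,"将负样本图片裁剪出若干个指定大小(如40×40)的图块"这一步可以用如下最小示意代码表达(Python;其中图片用嵌套列表表示、裁剪数量与随机种子均为示意假设,并非本发明的既定实现):

```python
import random

def crop_patches(image, patch_size=40, num_patches=4, seed=0):
    """从一张图片(二维列表,元素为像素值)中随机裁剪出若干
    patch_size x patch_size 的图块,作为负样本使用。"""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(num_patches):
        top = rng.randint(0, h - patch_size)
        left = rng.randint(0, w - patch_size)
        patch = [row[left:left + patch_size]
                 for row in image[top:top + patch_size]]
        patches.append(patch)
    return patches

# 构造一张 100x100 的灰度图做演示(像素值为示意)
img = [[(r * 100 + c) % 256 for c in range(100)] for r in range(100)]
patches = crop_patches(img, patch_size=40, num_patches=4)
```

真实场景中,这里的嵌套列表应替换为对采集到的图片文件的批量读取与裁剪。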
如图4所示,本发明提供了一种机器人的手势识别方法的一个实施例,包括:
预先采集包含不同手势及不包含手势的图片,得到样本图片集;
根据所述样本图片集制作检测样本集、过滤样本集;
根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
根据Adaboost算法在多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
f(x) = Σ_{m=1}^{M} α_m G_m(x)
其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
将所述多个强分类器组合成Adaboost级联手势检测器。
通过样本增强、归一化方法预处理所述过滤样本集;
将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络;
优选的,根据所述S-LeNet神经网络,使用所述训练样本集对手势识别卷积神经网络进行训练,得到训练准确率,当所述训练准确率达到第一预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数继续训练,直到所述训练准确率达到第一预设期望值;
优选的,根据所述S-LeNet神经网络,使用所述验证样本集对训练得到的手势识别卷积神经网络进行验证,得到验证准确率,当所述验证准确率达到第二预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数重新训练并验证,直到所述验证准确率达到所述第二预设期望 值;
优选的,根据所述S-LeNet神经网络,使用所述测试样本集对训练得到的手势识别卷积神经网络进行测试,得到测试准确率,当所述测试准确率达到第三预设期望值时,停止训练,得到训练后的所述手势识别卷积神经网络,否则,调整所述S-LeNet神经网络的参数重新训练、验证及测试,直到所述测试准确率达到所述第三预设期望值。
通过所述Adaboost级联手势检测器在待检测图片中识别手势,得到手势识别结果,通过所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
具体的,本实施例对如何训练Adaboost级联手势检测器以及手势识别卷积神经网络进行了进一步的说明。
本实施例中训练Adaboost级联手势检测器的流程如图5所示,Adaboost级联手势检测器由多个强分类器组成,强分类器又由多个弱分类器组成,因此在得到级联Adaboost前,先训练多个弱分类器。根据不同手势的样本训练不同的分类器,每种手势训练多层不同分类器,并组合用于手势检测和识别。每个手势(例如,单个剪刀手,其他手势训练流程相同)的强分类器训练流程如下:
1、以检测样本集作为输入,在给定的矩形特征原型下,计算并获得矩形特征集;
2、以矩形特征集作为输入,根据给定的弱学习算法,确定阈值,训练弱分类器;
3、以弱分类器作为输入,根据检测率和误判率,使用Adaboost算法挑选最优的几个弱分类器组成强分类器;
4、以强分类器作为输入,组合成Adaboost级联手势检测器。
通过本实施例提供的Adaboost级联手势检测器训练方法,能够使Adaboost级联手势检测器有较高的识别精度和较好的识别效果,减小手势识别卷积神经网络的处理任务。
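上述"将若干弱分类器加权组合为强分类器"的方式 f(x) = Σ α_m G_m(x),可以用如下示意代码表达(Python;其中弱分类器的阈值与误判率均为示意假设,权值按 AdaBoost 常用公式 α_m = ½ln((1−e_m)/e_m) 计算):

```python
import math

def alpha(err):
    """根据弱分类器的误判率计算其权值 α_m = 1/2 * ln((1-e)/e)。"""
    return 0.5 * math.log((1.0 - err) / err)

def strong_classify(x, weak_clfs, errors):
    """f(x) = Σ_m α_m * G_m(x):弱分类器输出 +1/-1,按加权和的符号判类。"""
    score = sum(alpha(e) * g(x) for g, e in zip(weak_clfs, errors))
    return 1 if score > 0 else -1

# 三个示意弱分类器:对单个特征值做阈值判断(阈值为假设)
g1 = lambda x: 1 if x > 0.3 else -1
g2 = lambda x: 1 if x > 0.5 else -1
g3 = lambda x: 1 if x > 0.7 else -1
errors = [0.3, 0.2, 0.4]   # 各弱分类器在训练集上的误判率(示意值)

label = strong_classify(0.6, [g1, g2, g3], errors)
```

可以看到,误判率低的弱分类器获得更大的权值,从而在组合判决中起主导作用。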
本实施例中训练手势识别卷积神经网络的流程如图6所示,具体训练流程如下:
1、通过样本增强、归一化方法预处理手势过滤样本集,以提高样本的多样性和加速网络的收敛;
2、将所述过滤样本集按预设比例,如6∶2∶2分割为训练样本集、验证样本集和测试样本集;
3、初始化所述手势识别卷积神经网络中S-LeNet神经网络的参数;
4、使用所述训练样本集对手势识别卷积神经网络进行训练,得到训练准确率,当所述训练准确率达到第一预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数继续训练,直到所述训练准确率达到第一预设期望值;
5、使用所述验证样本集对训练得到的手势识别卷积神经网络进行验证,得到验证准确率,当所述验证准确率达到第二预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数重新训练并验证,直到所述验证准确率达到所述第二预设期望值;
6、使用所述测试样本集对训练得到的手势识别卷积神经网络进行测试,得到测试准确率,当所述测试准确率达到第三预设期望值时,停止训练,得到训练后的所述手势识别卷积神经网络,否则,调整所述S-LeNet神经网络的参数重新训练、验证及测试,直到所述测试准确率达到所述第三预设期望值。
通过本实施例提供的手势识别卷积神经网络训练方法,训练得到的手势识别卷积神经网络识别率非常高,能够将Adaboost级联手势检测器识别的结果进行准确的判定,达到智能化识别的效果。
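步骤4~6中"准确率未达到预设期望值则调参继续训练"的控制流程,可以用如下最小示意代码表达(Python;其中 eval_fn 用一个模拟的收敛曲线代替真实的训练与评估过程,期望值与最大轮数均为示意假设):

```python
def train_until(target_acc, eval_fn, max_rounds=50):
    """反复训练直到评估准确率达到预设期望值;达不到则继续下一轮。
    eval_fn(r) 返回第 r 轮的准确率,这里用模拟函数代替真实训练。"""
    for r in range(1, max_rounds + 1):
        acc = eval_fn(r)
        if acc >= target_acc:
            return r, acc
    raise RuntimeError("在 max_rounds 内未达到期望准确率")

# 模拟:准确率随训练轮数逐渐收敛到 0.99(收敛曲线为示意)
simulated = lambda r: 0.99 - 0.5 / r

round_hit, acc = train_until(0.90, simulated)
```

真实实现中,eval_fn 对应"用验证样本集/测试样本集评估当前网络",调参动作发生在两轮评估之间。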
如图7所示,本发明提供了一种机器人的手势识别方法的一个实施例,包括:
预先采集包含不同手势及不包含手势的图片,得到样本图片集;
根据所述样本图片集制作检测样本集、过滤样本集;
根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
根据Adaboost算法在多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
f(x) = Σ_{m=1}^{M} α_m G_m(x)
其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
将所述多个强分类器组合成Adaboost级联手势检测器。
通过样本增强、归一化方法预处理所述过滤样本集;
将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络;
使用级联Adaboost分类器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;
将多个手势分类图片按照所述预设规格调整图片大小,得到调整后的手势分类图片;
将调整后的手势分类图片输入手势识别卷积神经网络中,以多线程的方式进行过滤,若所述调整后的手势分类图片中包含手势,则保存并显示所述调整后的手势分类图片,否则,过滤所述调整后的手势分类图片。
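其中"以多线程的方式进行过滤"的做法,可以用 Python 标准库的线程池示意如下(cnn_filter 中的均值阈值规则仅为占位假设,真实实现应在此处调用训练好的手势识别卷积神经网络):

```python
from concurrent.futures import ThreadPoolExecutor

def cnn_filter(patch):
    """示意的过滤函数:真实实现中应调用手势识别卷积神经网络,
    返回该图块是否包含手势;此处用"均值过阈值"这一假设规则代替。"""
    return sum(patch) / len(patch) > 0.5

def filter_candidates(patches, workers=4):
    """以多线程方式对 Adaboost 初检得到的多个手势分类图片逐一过滤,
    只保留被判定为包含手势的图块。"""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keep_flags = list(pool.map(cnn_filter, patches))
    return [p for p, keep in zip(patches, keep_flags) if keep]

# 三个示意候选图块(已展平为一维数值列表)
candidates = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.6, 0.7, 0.5]]
kept = filter_candidates(candidates)
```

用线程池对多个候选窗口并行过滤,正对应文中"最大程度地提高过滤效率、降低处理时间"的设计意图。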
优选的,所述S-LeNet神经网络具体为:
输入层接收输入的所述过滤样本;
第一卷积层中的每个卷积核通过卷积操作,分别检测输入的所述过滤样本集中每个过滤样本对应的特定特征,得到每个手势识别卷积神经网络手势对应的第一卷积特征集,其卷积操作方式为:
X = f(x * w_ij + b)
其中,*为二维离散卷积运算符,b为偏置,w_ij为卷积核,x为输入特征图,f(·)为激活函数;
第一激活函数层通过非线性变换保留所述第一卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第一处理特征集;
第一下采样层对所述第一处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第一统计特征集,其统计方法为:
x=f(β·down(x)+b)
其中,β为乘性偏置,down()为下采样函数,b为加性偏置,f(·)为激活函数;
第二卷积层对所述第一下采样层得到的聚合统计后的所述第一统计特征集进行卷积操作,得到第二卷积特征集;
第二激活函数层通过非线性变换保留所述第二卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第二处理特征集;
第二下采样层对所述第二处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第二统计特征集;
第三卷积层对所述第二下采样层得到的聚合统计后的所述第二统计特征集进行卷积操作,得到第三卷积特征集;
第三激活函数层通过非线性变换保留所述第三卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第三处理特征集;
第三下采样层对所述第三处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第三统计特征集;
全连接层中所有神经元节点与所述第三下采样层输出的每个手势识别卷积神经网络手势对应的第三统计特征集中所有的特征点相互连接,其输出函数为:
h(x)=f(w·x+b)
式中:x为全连接层的输入;h(x)为全连接层的输出;w为权值;b为加性偏置;f(·)为激活函数;
所述全连接层的输出作为输入样本,通过SOFTMAX输出层计算得到K类分类器,所述K类分类器为K维向量,其计算方法为:
p(y=j|x) = exp(θ_j^T x) / Σ_{l=1}^{K} exp(θ_l^T x),j = 1, 2, …, K
式中:x为输入样本,y为输出,p(y=j|x)为将样本判定为某个类别j的概率;θ_1, θ_2, …, θ_K为模型参数;Σ_{l=1}^{K} exp(θ_l^T x)为归一化项,对概率分布进行归一化,使得所有概率之和为1。
具体的,本实施例对手势识别卷积神经网络中的S-LeNet神经网络的结构进行了阐述。一般来说,现有的LeNet结构包含两个卷积层、两个降采样层、两个全连接层和一个输出层,为了使手势识别在移动端和嵌入式等平台上运行,在保证准确率的同时尽可能的降低网络大小,本发明使用的网络包含三个卷积层,三个降采样层、一个全连接层和一个输出层,如图8所示。本发明使用卷积层和降采样层来代替一个全连接层。全连接层的参数占整体网络参数比重较大,换成卷积层和降采样层,能够有效的降低网络参数,同时还能增加网络特征提取的能力。本实施例中,还降低卷积核的个数,卷积核个数越多,参数越多,前向传播时间越长,因此在保证网络准确率的同时尽可能的降低卷积核个数。
第一卷积层、第二卷积层、第三卷积层的结构功能都一样,其中的每个卷积核检测输入特征图所有位置上的特定特征,实现了同一输入特征图上的权值共享。为了提取输入特征图不同的特征,则使用不同的卷积核进行卷积操作;手势识别卷积神经网络样本在通过卷积层后,通过非线性变换保留特征中重要的部分并映射出来,去除特征中冗余的部分,同时提高特征的表征能力;常见激活函数有sigmoid、Tanh和Relu等;然后再经过降采样层。对卷积得到的特征图进行聚合统计,从而更加方便的描述高维图片,这种聚合操作就是下采样。下采样操作在降低了输出特征图分辨率的同时,依旧较好的保持着高分辨率特征图描述的特征;全连接层的所有神经元节点,都与上一层输出的特征图中所有的神经元节点互相连接,然后再通过输出层计算,输出一个K维的向量。在训练每个手势对应的手势识别卷积神经网络的时候,每一个手势训练之后都能得到对应的一个K维向量,如拳头对应的是一个K维向量{a_k},剪刀对应的是一个K维向量{b_k},布对应的是一个K维向量{c_k}。
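为帮助理解上述卷积、激活、下采样与SOFTMAX各层的计算,下面给出一个纯 Python 的最小前向传播示意(输入尺寸、卷积核与全连接权值均为示意假设,与本发明实际训练得到的 S-LeNet 参数无关):

```python
import math

def conv2d(x, w, b):
    """X = f(x * w + b) 中的卷积部分:二维离散卷积(valid 模式)加偏置。"""
    kh, kw = len(w), len(w[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = sum(x[i + di][j + dj] * w[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s + b)
        out.append(row)
    return out

def relu(m):
    """激活函数层:非线性变换,保留大于零的特征。"""
    return [[v if v > 0 else 0.0 for v in row] for row in m]

def mean_pool(x, k=2):
    """下采样层的 down():对 k×k 邻域做聚合统计(此处取均值)。"""
    return [[sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, len(x[0]) - k + 1, k)]
            for i in range(0, len(x) - k + 1, k)]

def softmax(z):
    """SOFTMAX 输出层:归一化,使各类概率之和为 1。"""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# 最小前向示例:5×5 输入 → 3×3 卷积 → ReLU → 2×2 均值池化 → 全连接 → softmax
x = [[(i + j) % 3 - 1 for j in range(5)] for i in range(5)]   # 输入特征图(示意)
w = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]                        # 卷积核取值为示意
feat = mean_pool(relu(conv2d(x, w, 0.1)))
flat = [v for row in feat for v in row]                        # 展平后接全连接层
W_fc = [[0.5], [-0.5], [1.0]]                                  # K=3 类的全连接权值(示意)
logits = [row[0] * flat[0] for row in W_fc]
probs = softmax(logits)
```

该示意只是把各层公式按定义逐项实现,真实网络中每层有多个卷积核与特征图,且参数由训练得到。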
手势识别的过程可以参照图9所示的流程图,首先使用Adaboost级联手势检测器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;然后将多个手势分类图片按照所述预设规格,如40×40像素大小进行裁剪,得到手势识别卷积神经网络能够识别的手势分类图片;将裁剪后的手势分类图片输入手势识别卷积神经网络中,以多线程的方式进行识别过滤。手势识别卷积神经网络在识别手势的时候,通过上述的步骤,将得到一个K维向量,将得到的K维向量和预先训练得到的K维向量进行比对,由此来识别手势,比如识别手势得到的K维向量与拳头对应的K维向量{a_k}非常接近,则可以判断识别到的手势为拳头。若识别到图片中包含手势,则保存并显示此图片,否则,过滤此图片。如图10所示,Adaboost级联手势检测器检测到的结果为三个黑框,但是由于背景比较复杂,检测到的结果不是很精确,经过手势识别卷积神经网络过滤之后将识别到的结果用白色框显示出来。
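上述"将网络输出的K维向量与各手势预先训练得到的K维向量比对"的判定,可以示意如下(Python;模板向量的取值与 K=3 均为示意假设):

```python
def euclid(a, b):
    """两个 K 维向量之间的欧氏距离。"""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_gesture(vec, templates):
    """将网络输出的 K 维向量与各手势预先训练得到的 K 维向量比对,
    取距离最近的手势作为识别结果。"""
    return min(templates, key=lambda name: euclid(vec, templates[name]))

# 各手势对应的预训练 K 维向量(取值为示意)
templates = {
    "拳头": [0.9, 0.05, 0.05],   # {a_k}
    "剪刀": [0.05, 0.9, 0.05],   # {b_k}
    "布":   [0.05, 0.05, 0.9],   # {c_k}
}
result = match_gesture([0.8, 0.1, 0.1], templates)
```

这里用欧氏距离刻画"非常接近";也可以换用余弦相似度等其他度量,不影响整体流程。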
如图11所示,本发明提供了一种机器人的手势识别系统的一个实施例,包括:
图片采集模块1,用于预先采集包含不同手势及不包含手势的图片,得到样本图片集;
检测器训练模块2,用于根据人工制作的检测样本集,训练得到Adaboost级联手势检测器;
神经网络训练模块3,用于根据人工制作的过滤样本集,训练得到手势识别卷积神经网络;
手势识别模块4,分别与所述检测器训练模块2和所述神经网络训练模块3电连接,用于通过所述Adaboost级联手势检测器在待检测图片中识别手势,得到手势识别结果,通过所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
具体的,机器人可以安装多个摄像头,这些摄像头采集包含不同手势及不包含手势的图片,得到样本图片集;之后人工处理样本图片集,制作得到检测样本集、过滤样本集,之后通过检测器训练模块和神经网络训练模块分别训练得到Adaboost级联手势检测器和手势识别卷积神经网络。在机器人识别手势的时候,通过手势识别模块,先使用Adaboost级联手势检测器对手势进行一个初步识别,得到多个结果。由于Adaboost级联手势检测器在复杂环境下识别的精度不够高,得到的结果有可能会出现错误的结果,因此,再使用手势识别卷积神经网络对得到的结果进行过滤,筛选出正确的结果,并显示在屏幕上,完成识别过程。
如图12所示,本发明提供了一种机器人的手势识别系统的一个实施例,在上一实施例的基础上,本实施例包括:
图片采集模块1,用于预先采集包含不同手势及不包含手势的图片,得到样本图片集;
检测器训练模块2,用于根据人工制作的检测样本集,训练得到Adaboost级联手势检测器;
检测器训练模块2包括:
计算子模块21,用于根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
弱分类器训练子模块22,与所述计算子模块电连接,用于根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
强分类器训练子模块23,与所述弱分类器训练子模块电连接,用于根据Adaboost算法在多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
f(x) = Σ_{m=1}^{M} α_m G_m(x)
其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
检测器训练子模块24,与所述强分类器训练子模块电连接,用于将所述多个强分类器组合成Adaboost级联手势检测器。
神经网络训练模块3,用于根据人工制作的过滤样本集,训练得到手势识别卷积神经网络;
神经网络训练模块3包括:
处理子模块31,通过样本增强、归一化方法预处理手势识别卷积神经网络样本集;
样本分类子模块32,用于将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
处理子模块31还用于初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
神经网络训练子模块33,与所述处理子模块31和所述样本分类子模块32电连接,通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络。
手势识别模块4,分别与所述检测器训练模块2和所述神经网络训练模块3电连接,用于根据所述Adaboost级联手势检测器和所述手势识别卷积神经网络识别采集到的手势图片。
具体的,本实施例对如何训练Adaboost级联手势检测器以及手势识别卷积神经网络进行了进一步的说明。
本实施例中,Adaboost级联手势检测器由多个强分类器组成,强分类器又由多个弱分类器组成,因此在得到级联Adaboost前,先训练多个弱分类器。根据不同手势的样本训练不同的分类器,每种手势训练多层不同分类器,并组合用于手势检测和识别。每个手势的强分类器训练流程如下:
首先,计算子模块以检测样本集作为输入,在给定的矩形特征原型下,计算并获得矩形特征集;
弱分类器训练子模块以矩形特征集作为输入,根据给定的弱学习算法,确定阈值,训练弱分类器;
强分类器训练子模块以弱分类器作为输入,根据检测率和误判率,使用Adaboost算法挑选最优的几个弱分类器组成强分类器;
检测器训练子模块以强分类器作为输入,组合成Adaboost级联手势检测器;
通过本实施例提供的Adaboost级联手势检测器训练方法,能够使Adaboost级联手势检测器有较高的识别精度和较好的识别效果,减小手势识别卷积神经网络的处理任务。
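将多个强分类器组合成级联检测器后,候选窗口需逐级通过、任一级拒绝即提前丢弃,从而快速排除大量非手势窗口。这一"级联"判定过程可示意如下(Python;各级阈值与窗口得分均为示意假设):

```python
def make_stage(threshold):
    """一个级联阶段:将强分类器得分与该级阈值比较(得分函数为示意)。"""
    return lambda score: score >= threshold

def cascade_detect(score, stages):
    """级联检测:候选窗口必须依次通过每一级强分类器,
    任何一级拒绝即提前返回,不再计算后续各级。"""
    for stage in stages:
        if not stage(score):
            return False
    return True

# 三级级联,阈值逐级提高(阈值为示意假设)
stages = [make_stage(0.2), make_stage(0.5), make_stage(0.8)]
```

前面的级别阈值低、计算快,用于粗筛;越靠后的级别越严格,这正是级联结构效率高的原因。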
训练手势识别卷积神经网络具体如下:处理子模块通过样本增强、归一化方法预处理手势过滤样本集,以提高样本的多样性和加速网络的收敛;人工将所述过滤样本集按预设比例,如6∶2∶2分割为训练样本集、验证样本集和测试样本集;处理子模块初始化所述手势识别卷积神经网络中S-LeNet神经网络的参数;
神经网络训练子模块使用所述训练样本集对手势识别卷积神经网络进行训练,得到训练准确率,当所述训练准确率达到第一预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数继续训练,直到所述训练准确率达到第一预设期望值;
神经网络训练子模块使用所述验证样本集对训练得到的手势识别卷积神经网络进行验证,得到验证准确率,当所述验证准确率达到第二预设期望值时,执行下一步骤,否则,调整所述S-LeNet神经网络的参数重新训练并验证,直到所述验证准确率达到所述第二预设期望值;
神经网络训练子模块使用所述测试样本集对训练得到的手势识别卷积神经网络进行测试,得到测试准确率,当所述测试准确率达到第三预设期望值时,停止训练,得到训练后的所述手势识别卷积神经网络,否则,调整所述S-LeNet神经网络的参数重新训练、验证及测试,直到所述测试准确率达到所述第三预设期望值。
通过本实施例提供的手势识别卷积神经网络训练方法,训练得到的手势识别卷积神经网络识别率非常高,能够将Adaboost级联手势检测器识别的结果进行准确的判定,达到智能化识别的效果。
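其中"按预设比例(如6∶2∶2)将过滤样本集分割为训练样本集、验证样本集和测试样本集"的做法可以示意如下(Python;样本此处用整数编号代替真实图片,比例与随机种子均为示意假设):

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """按预设比例(此处假设为 6:2:2)将过滤样本集随机打乱后
    分割为训练样本集、验证样本集和测试样本集。"""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```

先随机打乱再切分,可以避免同一批次采集的相似样本集中落入某一个子集。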
如图13所示,本发明提供了一种机器人的手势识别系统的一个实施例,包括:
图片采集模块1,用于预先采集包含不同手势及不包含手势的图片,得到样本图片集;
检测器训练模块2,用于根据人工制作的检测样本集,训练得到Adaboost级联手势检测器;
检测器训练模块2包括:
计算子模块21,用于根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
弱分类器训练子模块22,与所述计算子模块电连接,用于根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
强分类器训练子模块23,与所述弱分类器训练子模块电连接,用于根据Adaboost算法在多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
f(x) = Σ_{m=1}^{M} α_m G_m(x)
其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
检测器训练子模块24,与所述强分类器训练子模块电连接,用于将所述多个强分类器组合成Adaboost级联手势检测器。
神经网络训练模块3,用于根据人工制作的过滤样本集,训练得到手势识别卷积神经网络;
神经网络训练模块3包括:
处理子模块31,通过样本增强、归一化方法预处理手势识别卷积神经网络样本集;
样本分类子模块32,用于将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
处理子模块31还用于初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
神经网络训练子模块33,与所述处理子模块31和所述样本分类子模块32电连接,通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络。
手势识别模块4,分别与所述检测器训练模块2和所述神经网络训练模块3电连接,用于根据所述Adaboost级联手势检测器和所述手势识别卷积神经网络识别采集到的手势图片。
手势识别模块4包括:
检测子模块41,使用级联Adaboost分类器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;
图片调整子模块42,与所述检测子模块41电连接,用于将多个手势分类图片按照所述预设规格调整图片大小,得到调整后的手势分类图片;
依次电连接的过滤子模块43、储存子模块44和显示子模块45,所述过滤子模块43与所述图片调整子模块42电连接,用于将所述调整后的手势分类图片输入手势识别卷积神经网络中进行过滤,若所述调整后的手势分类图片中包含手势,则通过所述储存子模块保存所述调整后的手势分类图片,并通过显示子模块45显示所述调整后的手势分类图片,否则,通过所述过滤子模块过滤所述调整后的手势分类图片。
优选的,所述神经网络训练子模块33训练得到的S-LeNet神经网络包括:
输入层,用于接收输入的所述过滤样本;
第一卷积层,所述第一卷积层中的每个卷积核通过卷积操作,分别检测输入的所述过滤样本集中每个过滤样本对应的特定特征,得到每个手势识别卷积神经网络手势对应的第一卷积特征集,其卷积操作方式为:
X = f(x * w_ij + b)
其中,*为二维离散卷积运算符,b为偏置,w_ij为卷积核,x为输入特征图,f(·)为激活函数;
第一激活函数层,通过非线性变换保留所述第一卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第一处理特征集;
第一下采样层,对所述第一处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第一统计特征集,其统计方法为:
x=f(β·down(x)+b)
其中,β为乘性偏置,down()为下采样函数,b为加性偏置,f(·)为激活函数;
第二卷积层,对所述第一下采样层得到的聚合统计后的所述第一统计特征集进行卷积操作,得到第二卷积特征集;
第二激活函数层,通过非线性变换保留所述第二卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第二处理特征集;
第二下采样层,对所述第二处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第二统计特征集;
第三卷积层,对所述第二下采样层得到的聚合统计后的所述第二统计特征集进行卷积操作,得到第三卷积特征集;
第三激活函数层,通过非线性变换保留所述第三卷积特征集中符合激活函数要求的特征,删除不符合激活函数要求的特征,得到处理后的第三处理特征集;
第三下采样层,对所述第三处理特征集进行聚合统计,得到每个手势识别卷积神经网络手势对应的聚合统计后的第三统计特征集;
全连接层,所述全连接层中所有神经元节点与所述第三下采样层输出的每个手势识别卷积神经网络手势对应的第三统计特征集中所有的特征点相互连接,其输出函数为:
h(x)=f(w·x+b)
式中:x为全连接层的输入;h(x)为全连接层的输出;w为权值;b为加性偏置;f(·)为激活函数;
输出层,用于将所述全连接层的输出作为输入样本,计算得到K类分类器,所述K类分类器为K维向量,其计算方法为:
p(y=j|x) = exp(θ_j^T x) / Σ_{l=1}^{K} exp(θ_l^T x),j = 1, 2, …, K
式中:x为输入样本,y为输出,p(y=j|x)为将样本判定为某个类别j的概率;θ_1, θ_2, …, θ_K为模型参数;Σ_{l=1}^{K} exp(θ_l^T x)为归一化项,对概率分布进行归一化,使得所有概率之和为1。
一般来说,当前的LeNet神经网络结构包含两个卷积层、两个降采样层、两个全连接层和一个输出层,为了使手势识别在移动端和嵌入式等平台上运行,在保证准确率的同时尽可能的降低网络大小,本发明使用的网络包含三个卷积层,三个降采样层、一个全连接层和一个输出层,如图8所示。本发明使用卷积层和降采样层来代替一个全连接层。全连接层的参数占整体网络参数比重较大,换成卷积层和降采样层,能够有效的降低网络参数,同时还能增加网络特征提取的能力。本实施例中,还降低卷积核的个数,卷积核个数越多,参数越多,前向传播时间越长,因此在保证网络准确率的同时尽可能的降低卷积核个数。
第一卷积层、第二卷积层、第三卷积层的结构功能都一样,其中的每个卷积核检测输入特征图所有位置上的特定特征,实现了同一输入特征图上的权值共享。为了提取输入特征图不同的特征,则使用不同的卷积核进行卷积操作;手势识别卷积神经网络样本在通过卷积层后,通过非线性变换保留特征中重要的部分并映射出来,去除特征中冗余的部分,同时提高特征的表征能力;常见激活函数有sigmoid、Tanh和Relu等;然后再经过降采样层。对卷积得到的特征图进行聚合统计,从而更加方便的描述高维图片,这种聚合操作就是下采样。下采样操作在降低了输出特征图分辨率的同时,依旧较好的保持着高分辨率特征图描述的特征;全连接层的所有神经元节点,都与上一层输出的特征图中所有的神经元节点互相连接,然后再通过输出层计算,输出一个K维的向量。在训练每个手势对应的手势识别卷积神经网络的时候,每一个手势训练之后都能得到对应的一个K维向量,如拳头对应的是一个K维向量{a_k},剪刀对应的是一个K维向量{b_k},布对应的是一个K维向量{c_k}。
在对手势进行识别时,首先使用Adaboost级联手势检测器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;然后将多个手势分类图片按照所述预设规格,如40×40像素大小进行裁剪,得到手势识别卷积神经网络能够识别的手势分类图片;将裁剪后的手势分类图片输入手势识别卷积神经网络中,以多线程的方式进行识别过滤。手势识别卷积神经网络在识别手势的时候,通过上述的步骤,将得到一个K维向量,将得到的K维向量和预先训练得到的K维向量进行比对,由此来识别手势。若识别到图片中包含手势,则保存并显示此图片,否则,过滤此图片。
在本发明的另一个实施例中,一种机器人,集成有上述各实施例中任一手势识别系统。
应当说明的是,上述实施例均可根据需要自由组合。以上所述仅是本发明的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本发明的保护范围。

Claims (10)

  1. 一种机器人的手势识别方法,其特征在于,包括:
    预先采集包含不同手势及不包含手势的图片,得到样本图片集;
    根据所述样本图片集制作检测样本集、过滤样本集;
    根据所述检测样本集,训练得到Adaboost级联手势检测器;根据所述过滤样本集,训练得到手势识别卷积神经网络;
    通过所述Adaboost级联手势检测器在待检测图片中识别手势,得到手势识别结果,通过所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
  2. 如权利要求1所述的一种机器人的手势识别方法,其特征在于,根据所述样本图片集制作检测样本集、过滤样本集具体为:
    从所述样本图片集中筛选出需要训练的手势对应的图片,作为手势样本集;
    从所述手势样本集中筛选出符合预设样本要求的图片,得到筛选后的手势样本集;
    对筛选后的手势样本集中的每张图片中手势位置进行标记,并对标记过手势的图片按照预设规格进行裁剪,作为检测正样本集;
    将所述样本图片集中不包含手势的图片、包含肉色样本的图片,以及含有其他手势的图片作为检测负样本集;
    将所述检测正样本集和所述检测负样本集组合成所述检测样本集;
    将所述检测正样本集作为过滤正样本集;
    将所述样本图片集中不包含手势的图片,以及包含肉色样本的图片按照所述预设规格进行裁剪,得到过滤负样本集;
    所述过滤正样本集和所述过滤负样本集组合成所述过滤样本集。
  3. 如权利要求1所述的一种机器人的手势识别方法,其特征在于,所述根据所述检测样本集,训练得到Adaboost级联手势检测器具体为:
    根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
    根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
    根据Adaboost算法在所述多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
    f(x) = Σ_{m=1}^{M} α_m G_m(x)
    其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
    将所述多个强分类器组合成Adaboost级联手势检测器。
  4. 如权利要求1所述的一种机器人的手势识别方法,其特征在于,训练得到手势识别卷积神经网络具体为:
    通过样本增强、归一化方法预处理所述过滤样本集;
    将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
    初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
    通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络。
  5. 如权利要求1~4中任一项所述的一种机器人的手势识别方法,其特征在于,根据所述Adaboost级联手势检测器识别采集到的手势图片,得到手势识别结果,根据所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果具体为:
    使用级联Adaboost分类器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;
    将多个手势分类图片按照所述预设规格调整图片大小,得到调整后的手势分类图片;
    将调整后的手势分类图片输入手势识别卷积神经网络中,以多线程的方式进行过滤,若所述调整后的手势分类图片中包含手势,则保存并显示所述调整后的手势分类图片,否则,过滤所述调整后的手势分类图片。
  6. 一种机器人的手势识别系统,其特征在于,包括:
    图片采集模块,用于预先采集包含不同手势及不包含手势的图片,得到样本图片集;
    检测器训练模块,用于根据人工制作的检测样本集,训练得到Adaboost级联手势检测器;
    神经网络训练模块,用于根据人工制作的过滤样本集,训练得到手势识别卷积神经网络;
    手势识别模块,分别与所述检测器训练模块和所述神经网络训练模块电连接,用于通过所述Adaboost级联手势检测器在待检测图片中识别手势,得到手势识别结果,通过所述手势识别卷积神经网络对所述手势识别结果进行过滤,得到正确的手势识别结果。
  7. 如权利要求6所述的一种机器人的手势识别系统,其特征在于,所述检测器训练模块包括:
    计算子模块,用于根据所述检测样本集,计算得到每个检测样本对应的矩形特征集;
    弱分类器训练子模块,与所述计算子模块电连接,用于根据所有检测样本分别对应的矩形特征集,训练得到多个弱分类器;
    强分类器训练子模块,与所述弱分类器训练子模块电连接,用于根据Adaboost算法在多个弱分类器中筛选出误判率低的若干个弱分类器组成多个强分类器,其组合方式为:
    f(x) = Σ_{m=1}^{M} α_m G_m(x)
    其中,M为迭代次数,即得到的弱分类器的个数;α_m为每个弱分类器的权值;G_m(x)为弱分类器,f(x)为强分类器;
    检测器训练子模块,与所述强分类器训练子模块电连接,用于将所述多个强分类器组合成Adaboost级联手势检测器。
  8. 如权利要求6所述的一种机器人的手势识别系统,其特征在于,所述神经网络训练模块包括:
    处理子模块,通过样本增强、归一化方法预处理手势识别卷积神经网络样本集;
    样本分类子模块,用于将所述过滤样本集按预设比例分割为训练样本集、验证样本集和测试样本集;
    处理子模块还用于初始化轻量化神经网络S-LeNet,所述S-LeNet为对LeNet进行优化后的神经网络,所述优化具体为使用卷积层和降采样层来代替LeNet的全连接层,以及降低卷积核的个数;
    神经网络训练子模块,与所述处理子模块和所述样本分类子模块电连接,通过所述S-LeNet神经网络,使用所述训练样本集、所述验证样本集和所述测试样本集训练得到手势识别卷积神经网络。
  9. 如权利要求6~8任一项所述的一种机器人的手势识别系统,其特征在于,所述手势识别模块包括:
    检测子模块,使用级联Adaboost分类器对采集到的每一帧待检测图片进行检测,得到多个手势分类图片;
    图片调整子模块,与所述检测子模块电连接,用于将多个手势分类图片按照所述预设规格调整图片大小,得到调整后的手势分类图片;
    依次电连接的过滤子模块、储存子模块和显示子模块,所述过滤子模块与所述图片调整子模块电连接,用于将所述调整后的手势分类图片输入手势识别卷积神经网络中进行过滤,若所述调整后的手势分类图片中包含手势,则通过所述储存子模块保存所述调整后的手势分类图片,并通过显示子模块显示所述调整后的手势分类图片,否则,通过所述过滤子模块过滤所述调整后的手势分类图片。
  10. 一种机器人,其特征在于,集成有如权利要求6~9中任一项所述的一种机器人的手势识别系统。
PCT/CN2017/111185 2017-10-25 2017-11-15 一种机器人的手势识别方法、系统及机器人 WO2019080203A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711006447.XA CN107729854A (zh) 2017-10-25 2017-10-25 一种机器人的手势识别方法、系统及机器人
CN201711006447.X 2017-10-25

Publications (1)

Publication Number Publication Date
WO2019080203A1 true WO2019080203A1 (zh) 2019-05-02

Family

ID=61213476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/111185 WO2019080203A1 (zh) 2017-10-25 2017-11-15 一种机器人的手势识别方法、系统及机器人

Country Status (2)

Country Link
CN (1) CN107729854A (zh)
WO (1) WO2019080203A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629288B (zh) * 2018-04-09 2020-05-19 华中科技大学 一种手势识别模型训练方法、手势识别方法及系统
CN109446961B (zh) * 2018-10-19 2020-10-30 北京达佳互联信息技术有限公司 姿势检测方法、装置、设备及存储介质
CN109740738B (zh) * 2018-12-29 2022-12-16 腾讯科技(深圳)有限公司 一种神经网络模型训练方法、装置、设备和介质
CN111367415B (zh) * 2020-03-17 2024-01-23 北京明略软件系统有限公司 一种设备的控制方法、装置、计算机设备和介质
CN111401261B (zh) * 2020-03-18 2022-06-10 金陵科技学院 基于gan-cnn框架的机器人手势识别方法
CN111582235B (zh) * 2020-05-26 2023-04-07 瑞纳智能设备股份有限公司 用于实时监控站内异常事件的警报方法、系统及设备
CN111563483B (zh) * 2020-06-22 2024-06-11 武汉芯昌科技有限公司 一种基于精简lenet5模型的图像识别方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404845A (zh) * 2014-09-15 2016-03-16 腾讯科技(深圳)有限公司 图片处理方法及装置
CN106485214A (zh) * 2016-09-28 2017-03-08 天津工业大学 一种基于卷积神经网络的眼睛和嘴部状态识别方法
CN106600595A (zh) * 2016-12-21 2017-04-26 厦门可睿特信息科技有限公司 一种基于人工智能算法的人体特征尺寸自动测量方法
CN107179683A (zh) * 2017-04-01 2017-09-19 浙江工业大学 一种基于神经网络的交互机器人智能运动检测与控制方法


Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726248B2 (en) 2018-02-01 2020-07-28 Ford Global Technologies, Llc Validating gesture recognition capabilities of automated systems
GB2572472A (en) * 2018-02-01 2019-10-02 Ford Global Tech Llc Validating gesture recognition capabilities of automated systems
GB2572472B (en) * 2018-02-01 2021-02-17 Ford Global Tech Llc Validating gesture recognition capabilities of automated systems
CN110222645A (zh) * 2019-06-10 2019-09-10 济南大学 一种手势误识特征发现方法
CN110222645B (zh) * 2019-06-10 2022-09-27 济南大学 一种手势误识特征发现方法
CN110348417A (zh) * 2019-07-17 2019-10-18 济南大学 一种深度手势识别算法的优化方法
CN110348417B (zh) * 2019-07-17 2022-09-30 济南大学 一种深度手势识别算法的优化方法
CN112764349A (zh) * 2019-11-01 2021-05-07 佛山市云米电器科技有限公司 晾衣架控制方法、晾衣架、系统及存储介质
CN111160114A (zh) * 2019-12-10 2020-05-15 深圳数联天下智能科技有限公司 手势识别方法、装置、设备及计算机可读存储介质
CN111160114B (zh) * 2019-12-10 2024-03-19 深圳数联天下智能科技有限公司 手势识别方法、装置、设备及计算机可读存储介质
CN111428639A (zh) * 2020-03-24 2020-07-17 京东方科技集团股份有限公司 手势识别模型的训练方法、手势识别方法及装置
CN112163447B (zh) * 2020-08-18 2022-04-08 桂林电子科技大学 基于Attention和SqueezeNet的多任务实时手势检测和识别方法
CN112163447A (zh) * 2020-08-18 2021-01-01 桂林电子科技大学 基于Attention和SqueezeNet的多任务实时手势检测和识别方法
CN112132192A (zh) * 2020-09-07 2020-12-25 北京海益同展信息科技有限公司 一种模型训练方法、装置、电子设备及存储介质
CN112053354A (zh) * 2020-09-15 2020-12-08 上海应用技术大学 一种轨道板裂缝检测方法
CN112053354B (zh) * 2020-09-15 2024-01-30 上海应用技术大学 一种轨道板裂缝检测方法
CN113297956B (zh) * 2021-05-22 2023-12-08 温州大学 一种基于视觉的手势识别方法及系统
CN113297956A (zh) * 2021-05-22 2021-08-24 温州大学 一种基于视觉的手势识别方法及系统
CN113837263A (zh) * 2021-09-18 2021-12-24 浙江理工大学 基于特征融合注意力模块和特征选择的手势图像分类方法
CN113837263B (zh) * 2021-09-18 2023-09-26 浙江理工大学 基于特征融合注意力模块和特征选择的手势图像分类方法
CN113934302A (zh) * 2021-10-21 2022-01-14 燕山大学 基于SeNet和门控时序卷积网络的肌电手势识别方法
CN113934302B (zh) * 2021-10-21 2024-02-06 燕山大学 基于SeNet和门控时序卷积网络的肌电手势识别方法
CN113945566A (zh) * 2021-11-16 2022-01-18 南京华鼎纳米技术研究院有限公司 一种滤网失效检测方法
CN114740970A (zh) * 2022-02-23 2022-07-12 广东工业大学 一种基于联邦学习的毫米波手势识别方法及系统
CN114740970B (zh) * 2022-02-23 2024-05-24 广东工业大学 一种基于联邦学习的毫米波手势识别方法及系统
CN116766213A (zh) * 2023-08-24 2023-09-19 烟台大学 一种基于图像处理的仿生手控制方法、系统和设备
CN116766213B (zh) * 2023-08-24 2023-11-03 烟台大学 一种基于图像处理的仿生手控制方法、系统和设备

Also Published As

Publication number Publication date
CN107729854A (zh) 2018-02-23

Similar Documents

Publication Publication Date Title
WO2019080203A1 (zh) 一种机器人的手势识别方法、系统及机器人
US11681418B2 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
CN110298266B (zh) 基于多尺度感受野特征融合的深度神经网络目标检测方法
Zhang et al. Research on face detection technology based on MTCNN
CN109344701B (zh) 一种基于Kinect的动态手势识别方法
CN106960195B (zh) 一种基于深度学习的人群计数方法及装置
CN107133616B (zh) 一种基于深度学习的无分割字符定位与识别方法
CN108717524B (zh) 一种基于双摄手机和人工智能系统的手势识别系统
CN108520226B (zh) 一种基于躯体分解和显著性检测的行人重识别方法
US8792722B2 (en) Hand gesture detection
WO2017096753A1 (zh) 人脸关键点跟踪方法、终端和非易失性计算机可读存储介质
CN107273832B (zh) 基于积分通道特征与卷积神经网络的车牌识别方法及系统
CN112801146B (zh) 一种目标检测方法及系统
CN107909081B (zh) 一种深度学习中图像数据集的快速获取和快速标定方法
CN108647625A (zh) 一种表情识别方法及装置
CN104504366A (zh) 基于光流特征的笑脸识别系统及方法
CN110781829A (zh) 一种轻量级深度学习的智慧营业厅人脸识别方法
CN111401145B (zh) 一种基于深度学习与ds证据理论的可见光虹膜识别方法
CN109190456B (zh) 基于聚合通道特征和灰度共生矩阵的多特征融合俯视行人检测方法
CN108734200B (zh) 基于bing特征的人体目标视觉检测方法和装置
CN104143081A (zh) 基于嘴部特征的笑脸识别系统及方法
CN106529441B (zh) 基于模糊边界分片的深度动作图人体行为识别方法
CN114332942A (zh) 基于改进YOLOv3的夜间红外行人检测方法及系统
CN113487610A (zh) 疱疹图像识别方法、装置、计算机设备和存储介质
WO2020119624A1 (zh) 一种基于深度学习的类别敏感型边缘检测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17930034

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17930034

Country of ref document: EP

Kind code of ref document: A1