CN108717524B - Gesture recognition system based on double-camera mobile phone and artificial intelligence system - Google Patents
- Publication number
- CN108717524B CN201810402470.9A
- Authority
- CN
- China
- Prior art keywords
- image
- gesture
- depth
- neural network
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/2163—Partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a gesture recognition system based on a double-camera mobile phone and an artificial intelligence system, which recognizes human gestures by combining a double-camera mobile phone with machine learning. An image acquisition module acquires and stores two original images produced by the different viewing angles of the two cameras, namely the color images of the left and right cameras together with an image containing depth information. An image preprocessing module intercepts the gesture area from the original image and obtains a depth image of that area. A neural network training module trains a deep neural network on the acquired depth images to obtain a neural network system with a recognition accuracy above 92%. A gesture detection and recognition module returns a gesture recognition result for the input gesture image to be recognized. Compared with the prior art, the added depth information describes the gesture more accurately, so the recognition accuracy is higher.
Description
Technical Field
The invention relates to computer image processing and artificial intelligence, and in particular to a system and method for gesture recognition that acquires 3D images through binocular stereo vision.
Background
Human-computer interaction refers to the way a human converses with a machine. From the original keyboard and mouse to today's cameras and various sensors, it has undergone great innovation and development. With the continuous development of VR technology, recognition of motion-based interaction has become a new focus. Capturing a user's hand gestures and recognizing them reliably remains a complex problem.
With the continuous development of mobile phone software and hardware, dual cameras are becoming standard on mainstream phones. A phone with dual cameras provides better telephoto performance, and the cooperation of the two lenses brings camera-like background blurring, which works well for portrait photography. Moreover, binocular stereo vision with the two cameras enables images and video with a 3D effect and yields depth image data of the scene, so that the 3D data can be applied to other specific scenarios.
The field of machine learning has improved and evolved continuously since 2006. In image processing, convolutional neural networks have achieved substantial practical results. A supervised deep learning model, the CNN (convolutional neural network), reduces the number of parameters through spatial mechanisms such as weight sharing and downsampling; fewer parameters mean fewer local minima, so training is more likely to find a good local optimum, improving the recognition rate.
Disclosure of Invention
Based on the prior art, the invention provides a gesture recognition system utilizing a double-camera mobile phone and an artificial intelligence system, which is used as a novel man-machine interaction means.
The invention discloses a gesture recognition system based on a double-camera mobile phone and an artificial intelligence system, which recognizes human gestures by combining a double-camera mobile phone with machine learning, and comprises an image acquisition module, an image preprocessing module, a neural network training module and a gesture detection and recognition module; wherein:
the image acquisition module 100 is configured to acquire and store two different original images, which are generated due to different viewing angles of the cameras, including color images of the left and right cameras and an image including depth information;
the image preprocessing module 200 is configured to intercept a gesture area from an original image, and obtain a depth image of the gesture area;
the neural network training module 300 is configured to train a deep neural network on the acquired depth images to obtain a neural network system;
the gesture detection and recognition module 400 is configured to return a gesture recognition result for the input gesture image to be recognized;
the image acquisition module 100 is used to acquire the JPG image data of the two cameras simultaneously. The JPG file comprises 3 parts: the color image shot by the left camera, the color image shot by the right camera, and the depth image obtained by preprocessing. JPG segmentation is then performed according to the JPG file format: taking 0xFFD8 as the JPG file header and 0xFFDA as the SOS (start-of-scan) marker, the storage segments of the left and right camera images are extracted and stored separately; the depth image segment starts at 0x0065646f6600 (the ASCII characters of this hexadecimal string spell the 'edof' flag) and is likewise extracted and stored separately;
the image preprocessing module 200 acquires the image with depth information from the original image and intercepts the gesture area in the depth image with a threshold segmentation method; the corresponding gesture area is intercepted from the color image as the preliminary gesture segmentation result. The color image is converted from RGB space to HSV space, and the color information is clustered with the kmeans machine learning clustering method into 3 classes: a background-white class, a gesture-area class and an other-areas class. From the classified gesture-area pixels, the pixel mean and variance are computed, and a threshold segmentation method uses them to intercept the accurate gesture area in the color image. This accurate color-image gesture area is then used to cut out the gesture area of the depth image, giving the final depth gesture image. Finally, the depth gesture images are transformed and expanded, enlarging the training data set to roughly 30000 depth images;
the neural network training module 300 trains a neural network on the gesture-region depth maps obtained by the image preprocessing module. The network has 4 layers. The first layer is a convolutional layer with 16 convolution kernels of 5 × 5 followed by one 2 × 2 maximum-value subsampling operation; a grayscale map of size 72 × 96 is input, and 16 feature maps of 36 × 48 are output. The second layer is a convolutional layer with 32 convolution kernels of 5 × 5 followed by one 2 × 2 maximum-value subsampling operation; the 16 input feature maps of 36 × 48 yield 32 feature maps of 18 × 24. The third layer is a fully connected layer: the 32 output feature maps of 18 × 24 are fully connected to 512 output neurons. The fourth layer is a softmax layer: the 512 input neurons feed 9 output neurons representing the numbers 1-9, and the maximum output item is taken as the recognition result;
the gesture detection and recognition module 400 obtains a gesture depth map through the image preprocessing module and inputs it into the trained neural network to obtain the prediction result.
Compared with the traditional approach of recognizing from color images alone, the invention starts from the depth image and, exploiting the particularity of gesture images, recognizes gestures with depth information. The depth information describes the gesture more accurately and hence yields higher recognition accuracy.
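Taken together, the four modules form a linear pipeline from the raw dual-camera JPG to a recognized digit. The following skeleton is illustrative scaffolding only; every function name and body here is the editor's, not taken from the patent:

```python
# Illustrative skeleton of the four-module pipeline described above.
# Each stage is a placeholder for the processing detailed later in the
# specification; all names and return values are hypothetical.

def acquire(raw_jpg: bytes) -> dict:
    """Image acquisition (100): split the dual-camera JPG into left
    color, right color, and depth segments."""
    return {"left": b"", "right": b"", "depth": b""}

def preprocess(images: dict):
    """Image preprocessing (200): intercept the gesture area and
    return the gesture-region depth image."""
    return images["depth"]

def train(depth_images: list):
    """Neural network training (300): fit the deep network on the
    expanded depth-image training set."""
    return lambda img: 1  # stand-in for a trained classifier

def recognize(classifier, depth_image) -> int:
    """Gesture detection and recognition (400): return the digit 1-9."""
    return classifier(depth_image)
```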
Drawings
FIG. 1 is a functional block diagram of a gesture recognition system based on a dual-camera phone and an artificial intelligence system according to the present invention;
FIG. 2 is a schematic flow diagram of an image acquisition module;
FIG. 3 is a schematic flow diagram of an image pre-processing module;
FIG. 4 is a schematic diagram of a result of preliminary gesture segmentation of a depth image, (4-1) is an original depth image, (4-2) is a depth image of a gesture area captured by a threshold segmentation method, and (4-3) is a depth image after gray scale stretching;
FIG. 5 is a diagram illustrating effects of the embodiment; (5-1) is an original left camera color image, (5-2) is a color image after preliminary gesture-area segmentation, (5-3) is the cut and scaled color image, (5-4) is the color image after blurring, (5-5) is the color image after accurate gesture-area segmentation, (5-6) is the depth image after accurate gesture-area segmentation, and (5-7) is the depth image after gray-scale inversion;
FIG. 6 is a schematic overall flow chart of a gesture recognition method based on a dual-camera mobile phone and an artificial intelligence system according to the present invention;
FIG. 7 is a diagram of a neural network model for depth images according to the present invention;
FIG. 8 is a schematic diagram of an original captured image, (8-1) is the original image, (8-2) is the left camera image, (8-3) is the right camera image, and (8-4) is the depth image;
FIG. 9 is a schematic view of depth images of 1-9 gestures;
FIG. 10 is a diagram illustrating the expansion effects for a number-1 gesture. (10-1) is the original figure, (10-2) to (10-4) are scaling effects, (10-5) is a boundary-enhancement effect, (10-6) and (10-7) are maximum- and minimum-filter effects, (10-8) to (10-10) are rotation effects, (10-11) is a sharpening effect, and (10-12) is a softening effect.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a functional block diagram of a gesture recognition system based on a dual-camera phone and an artificial intelligence system according to the present invention. The system includes an image acquisition module 100, an image preprocessing module 200, a neural network training module 300, and a gesture detection and recognition module 400.
The image acquisition module exploits the fact that, when the two rear cameras of a dual-camera phone shoot simultaneously, the positional difference of the cameras produces different viewing angles, yielding two slightly different images; it also acquires the scene depth-information image the phone generates using the binocular stereo vision principle. The image preprocessing module 200 intercepts the preliminarily segmented gesture region from the depth image with a threshold method, segments the same gesture region in the corresponding color image, partitions the image regions with a clustering method to obtain an accurate gesture image region, and removes the other regions from the depth image accordingly to obtain a more accurate depth image of the gesture region. The neural network training module 300 collects about 2500 depth images with manually labelled correct results in advance, applies image filtering such as flipping, blurring, sharpening, line smoothing and boundary enhancement to expand the samples to a training set of about 30000 images, and trains a deep neural network on this set to obtain a neural network system with a recognition accuracy above 92%. The gesture detection and recognition module 400 uses the built 4-layer neural network system: the gesture image to be recognized is processed by the image preprocessing module, fed into the network, and the gesture recognition result is returned. The module can be integrated into various mobile, PC and web apps.
In an actual gesture recognition application, the user takes a picture with the camera; the program acquires and preprocesses the picture, inputs it into the neural network, obtains the network's prediction, and feeds the recognition result back to the user.
Fig. 2 shows a flow chart of the image capturing module 100 according to the present invention. The image acquisition process specifically comprises the following processing:
Step 1003: export the depth image, completing the image acquisition. The export operation proceeds as follows:
an image acquisition process:
(1) color image acquisition
1. Check the JPG header 0xFFD8 identifier
2. Search for the first 0xFFDA identifier
3. Obtain the SOS format segment length
4. Intercept the SOS format segment and output it to an x_n_rgb.jpg file, where x is the hand-labelled gesture recognition result (a number 1-9) and n is the picture serial number.
(2) Depth image acquisition
1. Check the JPG header 0xFFD8 identifier
2. Search for the 0x0065646f6600 (ASCII 'edof') identifier
3. Obtain the edof format segment length
4. Intercept the edof format segment and output it to an x_n_dep. file, where x is the hand-labelled gesture recognition result (a number 1-9) and n is the picture serial number.
5. Flip the image so that the orientation of the depth image is consistent with the color image.
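The acquisition steps above amount to scanning the file for JPEG markers. A minimal sketch follows (Python; segment-length handling and file output are omitted, and a plain substring search stands in for proper JPEG segment parsing):

```python
# Sketch of the marker-based splitting described above: the color data
# runs from the JPEG SOI marker (0xFFD8) through the scan data following
# the SOS marker (0xFFDA); the depth payload starts after the
# 0x00 'edof' 0x00 flag. Offsets and error handling are simplified.

SOI = b"\xff\xd8"         # JPEG start-of-image header
SOS = b"\xff\xda"         # JPEG start-of-scan marker
EDOF = b"\x00edof\x00"    # 0x0065646f6600 -> depth-segment flag

def split_dual_camera_jpg(data: bytes):
    if not data.startswith(SOI):
        raise ValueError("not a JPG file: missing 0xFFD8 header")
    if data.find(SOS) < 0:
        raise ValueError("missing 0xFFDA SOS marker")
    edof = data.find(EDOF)
    if edof < 0:
        raise ValueError("missing edof depth flag")
    color = data[:edof]               # left/right color stream(s)
    depth = data[edof + len(EDOF):]   # raw depth payload
    return color, depth
```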
Fig. 3 is a flow chart of the image preprocessing module 200 of the present invention. The preprocessing flow comprises several key steps: preliminary segmentation of the depth-image gesture area with a threshold method, preliminary segmentation of the color-image gesture area, acquisition of hand-region pixel characteristics with kmeans, accurate segmentation of the color-image gesture area with a threshold method, accurate segmentation of the depth-image gesture area, and image transformation and expansion. Specifically:
depth image preliminary gesture segmentation:
1. Acquire the gray-value histogram of the image, i.e. count the number of pixels at each gray value, for use in subsequent processing.
2. Obtain the largest gray value in the histogram whose pixel count also exceeds 0.1% of the total pixels of the picture. The larger the gray value, the closer the object was to the camera when shot; given the particularity of gesture images, this gray value can be regarded as the part of the hand closest to the camera. Using 0.1% as a threshold effectively prevents single noise pixels from being mistaken for valid objects; the value was obtained through repeated tests, filters well, and eliminates isolated noise pixels.
3. Given the particularity of gesture images, 30 is taken as the threshold for gesture-area segmentation: after the maximum gray value is obtained, pixels more than 30 gray levels away from it are filtered out. Experiments show that in close-range shots the pixel gray value changes by about 30 for every 20 cm of distance from the camera; since the depth of field of a hand gesture does not exceed 20 cm, the point closest to the camera is the starting point of the hand region, and anything more than 20 cm further away is filtered out as background.
4. Gray-stretch the filtered image to enlarge the contrast of the depth values: the 30-value band is stretched to the 0-255 pixel-value range, increasing the differences in depth information and easing comparison.
5. Temporarily store the image as the result of the preliminary depth-image gesture segmentation, as shown in FIG. 4.
The original depth image is shown in 4-1, the threshold-segmented depth image in 4-2, and the result after gray stretching in 4-3: after the preliminary segmentation process, the effective gesture area has been filtered out of the depth image, with obvious depth-information contrast.
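The preliminary depth segmentation steps above can be sketched as follows (NumPy; the 0.1% pixel-count floor and the 30-gray-level band follow the description, while the function name is illustrative):

```python
import numpy as np

def segment_depth_gesture(depth: np.ndarray,
                          min_frac: float = 0.001,  # 0.1% pixel-count floor
                          band: int = 30) -> np.ndarray:
    """Preliminary gesture segmentation of an 8-bit depth image: find the
    largest gray value whose pixel count exceeds 0.1% of the image, keep
    the 30-value band near it (the hand region), and stretch that band
    to the full 0-255 range."""
    hist = np.bincount(depth.ravel(), minlength=256)
    floor = depth.size * min_frac
    valid = np.nonzero(hist > floor)[0]
    peak = int(valid.max())       # nearest point = start of the hand region
    low = peak - band
    mask = depth > low            # within 30 gray levels of the peak
    out = np.zeros_like(depth)
    # gray-stretch the 30-value band to the full 0-255 range
    out[mask] = ((depth[mask] - low) * (255.0 / band)).clip(0, 255).astype(np.uint8)
    return out
```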
kmeans acquires pixel characteristics of a human hand region:
1. Cut and scale the original color picture to reduce its pixel count and the amount of computation. In practice, gestures are generally concentrated near the center of the picture and only occasionally deviate slightly. Therefore a strip 10 pixels wide is cut from each of the four sides, and the length and width are then reduced proportionally by a factor of 40 and scaled to 72 × 96, so that the picture shrinks to 6912 pixels in total, greatly reducing the computation for pixel clustering and for the neural network.
2. Blur the picture twice so that the color transitions between different regions become smoother, effectively reducing the influence of bright or dark spots; the result is shown in fig. 5-3.
3. Convert the pixel values from RGB to HSV using the RGB-to-HSV color-space conversion, then apply kmeans cluster analysis to the processed pixels. Kmeans is an unsupervised machine learning clustering algorithm whose main objective is to automatically group similar samples into one class. In this step the HSV-space image data are clustered into 3 classes, ideally a background-white class, a gesture-area class and an other-areas class. By default kmeans starts from 3 random initial centroids; for segmenting the preliminarily segmented color image, the invention instead adopts 3 customized initial centroids: white (HSV: 0°, 0%, 100%), black (HSV: 0°, 0%, 0%) and the average yellow-skin color (HSV: 60°, 90%, 60%). With customized initial centroids, the centroids of the 3 final clusters stay closer to the initial ones and the clustering result is more accurate. After kmeans clustering, the picture pixels fall into 3 classes, and the class whose initial centroid was the average yellow-skin color contains the finally segmented hand-region pixels; its centroid value represents the color characteristic of the hand region.
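The clustering step can be sketched with a compact k-means seeded with the three customized centroids (a simplified NumPy illustration; plain Euclidean distance is used, whereas a full implementation would treat hue as circular):

```python
import numpy as np

# Customized initial centroids from the description (H in degrees, S/V in %):
INIT = np.array([
    [0.0,  0.0, 100.0],   # white  -> background class
    [0.0,  0.0,   0.0],   # black  -> other-areas class
    [60.0, 90.0,  60.0],  # average yellow-skin color -> hand class
])

def kmeans_hsv(pixels: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Minimal k-means over HSV pixels (N x 3), seeded with the three
    customized centroids above; returns each pixel's cluster label
    (index 2 = hand class, given the seeding). Euclidean distance only;
    hue wrap-around is ignored in this sketch."""
    centers = INIT.copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # distance of every pixel to every centroid
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(3):  # recompute centroids of non-empty clusters
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return labels
```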
Step 2004: accurately segment the color-image gesture area with a threshold method. The color characteristic of the hand region, i.e. the centroid of the skin pixel class, is obtained from step 2003. Segmentation then thresholds around this centroid: pixels whose HSV values lie within 15°, 10% and 10% of the centroid are taken as hand-region pixels, and all other pixels are set to white. This threshold was obtained through repeated tests and segments well; the result is shown in FIGS. 5-4. A minimum filter is then applied to eliminate isolated noise points, giving the final segmentation result shown in fig. 5-5, where the white area is the background and the middle area is the gesture.
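A sketch of this accurate-segmentation step (NumPy; the (15°, 10%, 10%) tolerance follows the description, and the minimum filter is realized here as a 3 × 3 erosion of the binary mask — an editorial simplification of filtering the picture itself):

```python
import numpy as np

def refine_hand_mask(hsv: np.ndarray, centroid,
                     tol=(15.0, 10.0, 10.0)) -> np.ndarray:
    """Keep pixels whose HSV values lie within (15 deg, 10%, 10%) of the
    hand-class centroid, then apply a 3x3 minimum filter to drop isolated
    noise pixels. hsv is H x W x 3 (H in degrees, S/V in percent)."""
    diff = np.abs(hsv - np.asarray(centroid))
    mask = np.all(diff <= np.asarray(tol), axis=-1)
    # 3x3 minimum filter on the binary mask (erosion): a pixel survives
    # only if its whole 3x3 neighborhood consists of hand pixels.
    padded = np.pad(mask, 1, mode="constant")
    win = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return win.min(axis=(2, 3)).astype(bool)
```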
Step 2006: transform and expand the final depth gesture image. Because the number of acquired images is small, the neural-network training set would otherwise be too small and cause large recognition errors; scaling, filtering, flipping and other processing are therefore applied to increase the number of images and enlarge the training set. Fig. 10 shows a depth gesture image and its expansions: FIG. 10-1 is the original depth gesture image, 10-2 a 105% scaled image, 10-3 a 110% scaled image, 10-4 a 115% scaled image, 10-5 a boundary-enhancement-filtered image, 10-6 a maximum-filter image, 10-7 a minimum-filter image, 10-8 an image rotated 90 degrees counterclockwise, 10-9 rotated 180 degrees, 10-10 rotated 270 degrees, 10-11 a sharpened image, and 10-12 a smoothed image.
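Part of this expansion can be sketched as follows (NumPy; only rotation and scaling are shown, with nearest-neighbour resampling standing in for whatever interpolation the original implementation used, and the filter-based variants omitted for brevity):

```python
import numpy as np

def expand_gesture_image(img: np.ndarray) -> list:
    """Expand one depth gesture image: rotations by 90/180/270 degrees
    and rescales to 105%/110%/115% with nearest-neighbour sampling,
    center-cropped back to the original size. The patent additionally
    uses sharpening, smoothing, boundary-enhancement and max/min
    filtering, which are not shown here."""
    h, w = img.shape
    out = [img]
    for k in (1, 2, 3):                 # counterclockwise rotations
        out.append(np.rot90(img, k))
    for s in (1.05, 1.10, 1.15):        # enlarge, then crop to h x w
        ys = (np.arange(int(h * s)) / s).astype(int).clip(0, h - 1)
        xs = (np.arange(int(w * s)) / s).astype(int).clip(0, w - 1)
        big = img[np.ix_(ys, xs)]       # nearest-neighbour upscale
        y0 = (big.shape[0] - h) // 2
        x0 = (big.shape[1] - w) // 2
        out.append(big[y0:y0 + h, x0:x0 + w])
    return out
```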
The gesture-area depth maps produced by the preprocessing flow are used as the input of the neural network training module to train the network. As shown in fig. 9, the invention collected the 9 gestures representing the numbers 1-9 from 10 testers, about 2500 images in total; after preprocessing and image expansion, 30000 gesture depth images were obtained as the training set of the neural network system.
Fig. 7 shows the neural network structure designed for the present invention. The system uses a 4-layer network comprising 2 convolutional layers with max pooling, 1 fully connected layer, and 1 softmax output layer:
The first layer is a convolutional layer with 16 convolution kernels of 5 × 5 and one 2 × 2 maximum subsampling operation. Each input is a single-channel gray-value matrix of size 72 × 96; the convolution stride is 1 and the padding keeps the original image size. The pooling operation takes the 2 × 2 maximum, and the output is 16 matrices of 36 × 48.
The second layer is a convolutional layer with 32 convolution kernels of 5 × 5 and one 2 × 2 maximum subsampling operation. The input is the output of the previous layer, 16 matrices of 36 × 48; the convolution stride is 1 and the padding keeps the image size. The pooling operation takes the 2 × 2 maximum, and the output is 32 matrices of 18 × 24.
The third layer is a fully connected layer: the 32 output matrices of 18 × 24 are fully connected to 512 output neurons.
The fourth layer is a softmax layer: the 512 input neurons feed 9 output neurons representing the numbers 1-9, and the item with the largest output is taken as the recognition result.
The training method uses a cross-entropy cost model: cross-entropy describes how inefficient the predictions are for describing the truth, and thereby measures the prediction results. The network is trained with the adaptive moment estimation method (Adam), a first-order optimization algorithm that can replace the traditional stochastic gradient descent process and iteratively updates the neural network weights from the training data. The main implementation code of the neural network is as follows:
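The implementation code itself is not reproduced in this text. As an illustration only, the layer arithmetic of the 4-layer structure can be checked with a NumPy forward pass using random weights (a shape-checking sketch, not the patent's TensorFlow code; cross-entropy training with Adam is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_same(x, kernels):
    """'same'-padded 5x5 convolution. x: H x W x Cin; kernels: Cout x Cin x 5 x 5."""
    pad = np.pad(x, ((2, 2), (2, 2), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(pad, (5, 5), axis=(0, 1))
    return np.einsum("hwcij,ocij->hwo", win, kernels)  # -> H x W x Cout

def maxpool2(x):
    """2x2 maximum subsampling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def forward(img):
    """img: 72 x 96 grayscale; returns the 9 softmax probabilities."""
    x = img[:, :, None].astype(float)
    x = maxpool2(np.maximum(conv_same(x, rng.normal(size=(16, 1, 5, 5))), 0))   # 36 x 48 x 16
    x = maxpool2(np.maximum(conv_same(x, rng.normal(size=(32, 16, 5, 5))), 0))  # 18 x 24 x 32
    x = np.maximum(x.reshape(-1) @ rng.normal(size=(18 * 24 * 32, 512)), 0)     # fully connected
    z = x @ rng.normal(size=(512, 9))                                           # softmax layer
    e = np.exp(z - z.max())
    return e / e.sum()
```

Running `forward` on a 72 × 96 image reproduces exactly the feature-map sizes stated above (16 maps of 36 × 48, 32 maps of 18 × 24, 512 neurons, 9 outputs).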
After 2000 training steps, the recognition accuracy of the neural network model reached 92.53%. The training-process output follows, where step is the training step count, train_accuracy the training-set recognition accuracy, and test accuracy the test-set recognition accuracy:
step 0,train_accuracy 0.26
test accuracy 0.113806
test accuracy 0.402985
test accuracy 0.518657
test accuracy 0.636194
test accuracy 0.695895
step 500,train_accuracy 0.76
test accuracy 0.729478
step 600,train_accuracy 0.76
test accuracy 0.744403
step 700,train_accuracy 0.82
test accuracy 0.785448
step 800,train_accuracy 0.72
test accuracy 0.800373
step 900,train_accuracy 0.9
test accuracy 0.820895
step 1000,train_accuracy 0.96
test accuracy 0.845149
step 1100,train_accuracy 0.96
test accuracy 0.867537
step 1200,train_accuracy 0.98
test accuracy 0.873134
step 1300,train_accuracy 0.94
test accuracy 0.876866
step 1400,train_accuracy 0.98
test accuracy 0.882463
step 1500,train_accuracy 0.94
test accuracy 0.882463
step 1600,train_accuracy 0.96
test accuracy 0.897388
step 1700,train_accuracy 0.96
test accuracy 0.902985
step 1800,train_accuracy 0.96
test accuracy 0.893657
step 1900,train_accuracy 0.96
test accuracy 0.916045
step 2000,train_accuracy 0.98
test accuracy 0.925373
The gesture detection and recognition module 400 feeds the gesture image, after processing by the preprocessing module, into the trained neural network and returns a gesture recognition result. The module can be combined with practical applications: for example, integrated into a camera, where the user takes a picture, the program acquires and preprocesses the picture, feeds the result into the neural network, and returns the network's prediction to the user as the recognition result.
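A minimal sketch of that prediction step; the `model` callable and the 72×96 input shape follow the network description above, but the interface itself is our assumption, not the patent's code:

```python
import numpy as np

def predict_gesture(depth_image, model):
    """Return the recognized digit (1-9) for a preprocessed 72x96 depth map.

    `model` stands in for the trained network: any callable mapping a
    (72, 96) array to 9 class scores (a hypothetical interface).
    """
    assert depth_image.shape == (72, 96)
    scores = model(depth_image)
    # output classes 0..8 correspond to digits 1..9
    return int(np.argmax(scores)) + 1

# toy stand-in "model" that always favors class index 4, i.e. digit 5
dummy = lambda img: np.eye(9)[4]
print(predict_gesture(np.zeros((72, 96)), dummy))  # 5
```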
At present, applications of dual-camera phones in the 3D field are very rare, and the invention creatively applies the dual-camera phone to fields other than photography. The method applies only to phones running the Android platform; Android currently does not expose an interface for acquiring dual-camera data, the data of a single camera cannot be acquired independently, and the development interfaces of individual phone manufacturers must be used. The invention implements a dual-camera data acquisition scheme using the open interface of the Huawei phone series.
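The combined JPG produced by such an acquisition scheme (per claim 1 below: two JPEG images followed by a depth segment tagged "edof") could be split roughly as follows; this is a simplified sketch of the marker handling described in the claim, not Huawei's actual format specification:

```python
def split_dual_camera_jpg(data: bytes):
    """Split a combined dual-camera JPG into (left, right, depth) byte blobs.

    Assumes the layout stated in claim 1: two JPEG images (each opening
    with the SOI marker 0xFFD8) followed by a depth segment introduced
    by the ASCII tag "edof" (in real files the tag is padded with 0x00
    bytes, per the claim's 0x0065646f6600 sequence).
    """
    SOI = b"\xff\xd8"   # JPEG start-of-image marker
    EDOF = b"edof"      # depth-segment tag, hex 65 64 6f 66

    first = data.find(SOI)
    second = data.find(SOI, first + 2)
    depth_at = data.find(EDOF)
    if min(first, second, depth_at) < 0:
        raise ValueError("expected markers not found")
    left = data[first:second]
    right = data[second:depth_at]
    depth = data[depth_at + len(EDOF):]
    return left, right, depth
```

In practice a robust parser would walk the JPEG marker segments rather than scan for raw byte patterns, since the tag bytes could in principle occur inside compressed image data.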
Because in the depth image the gesture region differs markedly from the background (the hand sits at a clearly different distance from the camera than everything else), a threshold segmentation method yields a good crop of the gesture image region.
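A minimal sketch of such a threshold cut on a depth map; the depth window values are illustrative, since the patent does not give concrete thresholds:

```python
import numpy as np

def cut_gesture_region(depth, near=0.0, far=600.0):
    """Keep pixels whose depth falls inside [near, far] (the hand's range),
    zeroing everything farther away (the background).

    The [near, far] window is an assumption for illustration; the patent
    only states that the hand is clearly closer to the camera.
    """
    mask = (depth >= near) & (depth <= far)
    return np.where(mask, depth, 0), mask

# toy depth map in millimeters; the right column plays the background
depth = np.array([[100., 120., 2000.],
                  [110., 130., 1800.]])
gesture, mask = cut_gesture_region(depth)
print(int(mask.sum()))  # 4 foreground pixels survive the cut
```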
Then, in the image preprocessing stage, the gesture region is extracted from the image, the depth image is extracted, and about 30000 gesture depth images are obtained through data augmentation to serve as the training set of the neural network system;
the method comprises the steps of utilizing an open source neural network framework (transducer flow) to realize a deep convolutional neural network, and utilizing a training sample set to train to obtain a neural network system with the identification accuracy rate of more than 92%;
the application stage is identified by utilizing the deep neural network trained in the previous stage to be applied to specific practice. And after the user shoots gesture images through the two cameras, inputting the gesture images into the system. The system carries out the same preprocessing stage on the image, and then inputs the image into a neural network system for recognition to obtain a recognition result.
Claims (1)
1. A gesture recognition system based on a dual-camera mobile phone and an artificial intelligence system, which realizes recognition of human gestures using a dual-camera mobile phone and machine learning, characterized by comprising an image acquisition module (100), an image preprocessing module (200), a neural network training module (300) and a gesture recognition module (400); wherein:
the image acquisition module is used for acquiring and storing two different original images produced by the different camera viewing angles, including the color images of the left and right cameras and an image containing depth information;
the image preprocessing module is used for intercepting a gesture area from an original image and acquiring a depth image of the gesture area;
the neural network training module is used for training on the acquired depth images with a deep neural network to obtain a neural network system;
the gesture detection and recognition module is used for returning a gesture recognition result according to gesture image input information needing to be recognized;
acquiring JPG image data of the two cameras using the image acquisition module (100); the JPG image comprises 3 parts: the color image shot by the left camera, the color image shot by the right camera, and the depth image obtained by preprocessing; JPG image segmentation is then performed according to the JPG file format: with the JPG file header 0xFFD8 and the SOS marker segment 0xFFDA, the storage segments corresponding to the left and right camera images are extracted and stored separately; the depth image segment then begins with 0x0065646f6600, the ASCII characters of whose hexadecimal string spell the edof flag, and is extracted and stored;
acquiring an image with depth information from the original image using the image preprocessing module (200), and cropping the gesture region in the depth image with a threshold segmentation method; cropping the corresponding gesture region from the color image as a preliminary gesture segmentation result; converting the color image from RGB space to HSV space and clustering the image's color information with the k-means machine learning clustering method, clustering the HSV-space image data into 3 classes: the white background class, the gesture region class, and other regions; after the pixels classified as the gesture region are obtained, computing their mean and variance, and cropping the corresponding precise gesture region in the color image by threshold segmentation according to that mean and variance; cropping the depth-image gesture region using the precise color-image gesture region to obtain the final depth gesture image; expanding the final depth gesture image by transformation, enhancing the training data set to about 30000 or more depth images;
performing neural network training on the gesture-region depth map obtained by the image preprocessing module using the neural network training module (300), wherein the neural network consists of 4 layers: the first layer is a convolutional layer comprising 16 5×5 convolution kernels and one 2×2 max-pooling subsampling kernel, mapping the input 72×96 gray-scale image to 16 feature maps of 36×48; the second layer is a convolutional layer mapping the input 32 gray-scale maps of size 36×48, through 32 5×5 convolution kernels and one 2×2 max-pooling kernel, to 64 feature maps of 18×24; the third layer is a fully connected layer, connecting the 64 output feature maps of 18×24 to 512 output neurons; the fourth layer is a softmax layer, mapping the 512 input neurons to 9 output neurons representing the digits 1-9, the item with the largest output being taken as the recognition result;
and using the gesture detection and recognition module (400), the gesture depth map preprocessed by the image preprocessing module is input into the neural network to obtain a prediction result.
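The k-means clustering step of the preprocessing claim can be sketched as a plain NumPy k-means over pixel rows; the initialization and iteration count are our choices, and the RGB-to-HSV conversion is omitted:

```python
import numpy as np

def kmeans(points, k=3, iters=20):
    """Cluster pixel rows (e.g. HSV triples) into k groups; with k=3
    these play the roles of background, gesture region, and other areas
    as in the claim."""
    # simple deterministic init: pick centers spread evenly over the input
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each pixel to its nearest center
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# three well-separated synthetic pixel groups cluster cleanly
pts = np.vstack([np.full((10, 3), v) for v in (0.0, 5.0, 10.0)])
labels, _ = kmeans(pts)
print(len(set(labels.tolist())))  # 3
```

A production pipeline would likely use a library implementation (e.g. scikit-learn's KMeans) with k-means++ initialization rather than this fixed-index init.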
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810402470.9A CN108717524B (en) | 2018-04-28 | 2018-04-28 | Gesture recognition system based on double-camera mobile phone and artificial intelligence system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108717524A CN108717524A (en) | 2018-10-30 |
CN108717524B true CN108717524B (en) | 2022-05-06 |
Family
ID=63899399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810402470.9A Expired - Fee Related CN108717524B (en) | 2018-04-28 | 2018-04-28 | Gesture recognition system based on double-camera mobile phone and artificial intelligence system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108717524B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767488A (en) * | 2019-01-23 | 2019-05-17 | 广东康云科技有限公司 | Three-dimensional modeling method and system based on artificial intelligence |
CN109948483B (en) * | 2019-03-07 | 2022-03-15 | 武汉大学 | Character interaction relation recognition method based on actions and facial expressions |
CN110322546A (en) * | 2019-05-14 | 2019-10-11 | 广东康云科技有限公司 | Substation's three-dimensional digital modeling method, system, device and storage medium |
CN110322545A (en) * | 2019-05-14 | 2019-10-11 | 广东康云科技有限公司 | Campus three-dimensional digital modeling method, system, device and storage medium |
CN110322544A (en) * | 2019-05-14 | 2019-10-11 | 广东康云科技有限公司 | A kind of visualization of 3 d scanning modeling method, system, equipment and storage medium |
CN110141232B (en) * | 2019-06-11 | 2020-10-27 | 中国科学技术大学 | Data enhancement method for robust electromyographic signal identification |
CN110348323B (en) * | 2019-06-19 | 2022-12-16 | 广东工业大学 | Wearable device gesture recognition method based on neural network optimization |
CN111079530A (en) * | 2019-11-12 | 2020-04-28 | 青岛大学 | Mature strawberry identification method |
CN111429156A (en) * | 2020-03-26 | 2020-07-17 | 北京九歌创艺文化艺术有限公司 | Artificial intelligence recognition system for mobile phone and application thereof |
CN113553877B (en) * | 2020-04-07 | 2023-05-30 | 舜宇光学(浙江)研究院有限公司 | Depth gesture recognition method and system and electronic equipment thereof |
CN115147672A (en) * | 2021-03-31 | 2022-10-04 | 广东高云半导体科技股份有限公司 | Artificial intelligence system and method for identifying object types |
CN113408443B (en) * | 2021-06-24 | 2022-07-05 | 齐鲁工业大学 | Gesture posture prediction method and system based on multi-view images |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710418A (en) * | 2009-12-22 | 2010-05-19 | 上海大学 | Interactive mode image partitioning method based on geodesic distance |
CN104050682A (en) * | 2014-07-09 | 2014-09-17 | 武汉科技大学 | Image segmentation method fusing color and depth information |
CN105825494A (en) * | 2015-08-31 | 2016-08-03 | 维沃移动通信有限公司 | Image processing method and mobile terminal |
CN107300976A (en) * | 2017-08-11 | 2017-10-27 | 五邑大学 | A kind of gesture identification household audio and video system and its operation method |
CN107563333A (en) * | 2017-09-05 | 2018-01-09 | 广州大学 | A kind of binocular vision gesture identification method and device based on ranging auxiliary |
CN107622257A (en) * | 2017-10-13 | 2018-01-23 | 深圳市未来媒体技术研究院 | A kind of neural network training method and three-dimension gesture Attitude estimation method |
CN107766842A (en) * | 2017-11-10 | 2018-03-06 | 济南大学 | A kind of gesture identification method and its application |
Non-Patent Citations (3)
Title |
---|
A Probabilistic Combination of CNN and RNN Estimates for Hand Gesture Based Interaction in Car;Aditya Tewari et al.;2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct);2017-10-30;full text *
Dynamic gesture recognition based on a Kinect sensor;Yu Xu;China Masters' Theses Full-text Database, Information Science and Technology;2014-09-15;full text *
Research on image segmentation algorithms combining depth information;Pi Zhiming;China Doctoral Dissertations Full-text Database, Information Science and Technology;2013-10-15;full text *
Also Published As
Publication number | Publication date |
---|---|
CN108717524A (en) | 2018-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108717524B (en) | Gesture recognition system based on double-camera mobile phone and artificial intelligence system | |
KR102102161B1 (en) | Method, apparatus and computer program for extracting representative feature of object in image | |
WO2019128507A1 (en) | Image processing method and apparatus, storage medium and electronic device | |
CN106056064B (en) | A kind of face identification method and face identification device | |
CN109284738B (en) | Irregular face correction method and system | |
CN110929569B (en) | Face recognition method, device, equipment and storage medium | |
CA3100642A1 (en) | Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning | |
WO2019080203A1 (en) | Gesture recognition method and system for robot, and robot | |
CN112037320B (en) | Image processing method, device, equipment and computer readable storage medium | |
CN111989689A (en) | Method for identifying objects within an image and mobile device for performing the method | |
CN110929593A (en) | Real-time significance pedestrian detection method based on detail distinguishing and distinguishing | |
CN109190456B (en) | Multi-feature fusion overlook pedestrian detection method based on aggregated channel features and gray level co-occurrence matrix | |
CN110032932B (en) | Human body posture identification method based on video processing and decision tree set threshold | |
CN111967319B (en) | Living body detection method, device, equipment and storage medium based on infrared and visible light | |
CN110674759A (en) | Monocular face in-vivo detection method, device and equipment based on depth map | |
CN110046544A (en) | Digital gesture identification method based on convolutional neural networks | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN109816694A (en) | Method for tracking target, device and electronic equipment | |
CN110991349A (en) | Lightweight vehicle attribute identification method based on metric learning | |
CN112633221A (en) | Face direction detection method and related device | |
CN111209873A (en) | High-precision face key point positioning method and system based on deep learning | |
CN112418032A (en) | Human behavior recognition method and device, electronic equipment and storage medium | |
CN117252926B (en) | Mobile phone shell auxiliary material intelligent assembly control system based on visual positioning | |
CN116630828B (en) | Unmanned aerial vehicle remote sensing information acquisition system and method based on terrain environment adaptation | |
CN113128428A (en) | Depth map prediction-based in vivo detection method and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220506 |