CN108717524A - Gesture recognition system and method based on a dual-camera mobile phone and an artificial intelligence system - Google Patents

Gesture recognition system and method based on a dual-camera mobile phone and an artificial intelligence system Download PDF

Info

Publication number
CN108717524A
CN108717524A (application CN201810402470.9A; granted as CN108717524B)
Authority
CN
China
Prior art keywords
image
gesture
depth
module
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810402470.9A
Other languages
Chinese (zh)
Other versions
CN108717524B (en)
Inventor
邓琨
孟昭鹏
郑岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810402470.9A priority Critical patent/CN108717524B/en
Publication of CN108717524A publication Critical patent/CN108717524A/en
Application granted granted Critical
Publication of CN108717524B publication Critical patent/CN108717524B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a gesture recognition system and method based on a dual-camera mobile phone and an artificial intelligence system, which uses the dual-camera phone and machine learning to recognize human hand gestures. The image acquisition module obtains the two slightly different original images produced by the differing viewpoints of the cameras, including the color images of the left and right cameras and an image containing depth information, and saves them. The image preprocessing module crops the gesture region from the original image to obtain a gesture-region depth image. The neural network training module trains a deep neural network on the acquired depth images, obtaining a neural network system whose recognition accuracy reaches 92% or more. The gesture inspection and recognition module takes a gesture image to be recognized as input and returns the gesture recognition result. Compared with the prior art, the invention adds depth information, which carries more precise gesture information and therefore yields higher recognition accuracy.

Description

Gesture recognition system and method based on a dual-camera mobile phone and an artificial intelligence system
Technical field
The present invention relates to computer image processing and artificial intelligence, and in particular to a system and gesture recognition method that obtains 3D images through binocular stereo vision to perform gesture recognition.
Background technology
Human-computer interaction refers to the dialogue between people and machines. From the original keyboard and mouse to today's cameras and various sensors, it has undergone enormous innovation and development. With the continuous development of VR technology, action recognition has become a new trend. How to capture and recognize a user's gestures is a complex art.
With the continuous development of mobile-phone hardware and software, dual cameras are becoming standard on mainstream phones. A phone with dual cameras offers better telephoto performance, and the two lenses working together can also provide effects such as background blur, which works well for portrait photography. Moreover, using the binocular stereo vision of the dual cameras, images and video with 3D effect can be produced and depth image data of the scene obtained, so that the 3D data can be applied to other specific scenarios.
The field of machine learning has been continuously improving and developing since 2006. In image processing, convolutional neural networks have achieved enormous practical success. Supervised deep learning models such as the CNN (convolutional neural network) use spatial techniques such as weight sharing and down-sampling to reduce the number of parameters, which reduces the number of local minima and helps training find a good local optimum, thereby improving the recognition rate and achieving good results.
Invention content
Based on the prior art, the present invention proposes a gesture recognition system and method using a dual-camera phone and an artificial intelligence system, as a novel means of human-computer interaction. The invention photographs a gesture with the phone's dual cameras, trains a deep neural network on the depth gesture images extracted from the dual cameras, and after processing returns the gesture recognition result, i.e. the meaning of the gesture.
The gesture recognition system of the present invention, based on a dual-camera mobile phone and an artificial intelligence system, uses a dual-camera phone and machine learning to recognize human gestures. The system comprises an image acquisition module, an image preprocessing module, a neural network training module and a gesture recognition module, wherein:

The image acquisition module (100) obtains the two slightly different original images produced by the differing viewpoints of the cameras, including the color images of the left and right cameras and an image containing depth information, and saves them;

The image preprocessing module (200) crops the gesture region from the original image to obtain a gesture-region depth image;

The neural network training module (300) trains a deep neural network on the acquired depth images to obtain a neural network system;

The gesture inspection and recognition module (400) takes a gesture image to be recognized as input and returns the gesture recognition result.
The gesture recognition method of the present invention, based on a dual-camera mobile phone and an artificial intelligence system, specifically comprises the following steps:
Using the image acquisition module (100), the JPG image data of the two cameras is acquired simultaneously. The JPG image contains three parts: the color image shot by the left camera, the color image shot by the right camera, and the depth image obtained by preprocessing. The JPG image is then split according to the JPG file format specification: 0xFFD8 is the JPG file header and 0xFFDA marks an SOA format segment; the stored segments corresponding to the left and right camera images are extracted and saved separately. The depth-image segment begins with 0x0065646f6600, whose ASCII character representation is "edof", and is extracted and saved separately.
Using the image preprocessing module (200), a threshold method is applied to the image containing depth information, obtained from the original image, to crop the gesture region from the depth image; the corresponding gesture region is cropped from the color image as the preliminary gesture segmentation result. The color information of the image is transformed from RGB space to HSV space, and the kmeans machine learning classification method clusters the color information of the image: the HSV-space image data is clustered into 3 classes, ideally yielding one class for the white background, one for the gesture region and one for the remaining regions. After the gesture-region pixels are classified, the mean and variance of the pixels are computed, and the threshold method crops the corresponding precise gesture region from the color image according to the mean and variance. The precise gesture region of the color image is then used to crop the depth-image gesture region, yielding the final depth gesture image. The final depth gesture images are transformed and augmented to enlarge the training data set to about 30,000 or more depth images.
The gesture-region depth maps obtained by the image preprocessing module are used by the neural network training module (300) for neural network training. The neural network consists of 4 layers. The first layer is a convolutional layer with 16 5*5 convolution kernels and one 2*2 max-pooling sub-sampling kernel, mapping an input grayscale image of size 72*96 to 16 feature maps of 36*48. The second layer is a convolutional layer with 32 5*5 convolution kernels and one 2*2 max-pooling sub-sampling kernel, mapping the 16 input feature maps of 36*48 to 32 feature maps of 18*24. The third layer is a fully connected layer, connecting the 32 output feature maps of 18*24 to 512 output neurons. The fourth layer is a softmax layer with 512 input neurons and 9 output neurons representing the nine digits 1 to 9; the item with the largest output is taken as the recognition result.
Using the gesture inspection and recognition module (400), the gesture depth map preprocessed by the above image preprocessing module is input into the neural network layers to obtain the prediction result.
Compared with traditional image recognition technology based on color images, the present invention starts from the depth image and, combined with the particularities of gesture recognition images, performs gesture recognition using depth information. Depth information carries more precise gesture information and therefore yields higher recognition accuracy.
Description of the drawings
Fig. 1 is a functional block diagram of the gesture recognition system of the present invention based on a dual-camera mobile phone and an artificial intelligence system;

Fig. 2 is a schematic flow chart of the image acquisition module;

Fig. 3 is a schematic flow chart of the image preprocessing module;

Fig. 4 shows the result of preliminary gesture segmentation of a depth image: (4-1) is the original depth image, (4-2) is the gesture-region depth map obtained by threshold segmentation, and (4-3) is the depth map after grayscale stretching;

Fig. 5 shows the effects of the embodiment: (5-1) is the original left-camera color image, (5-2) is the preliminarily segmented color image of the gesture region, (5-3) is the color image after cropping and scaling, (5-4) is the color image after blurring, (5-5) is the color image of the precise gesture-region segmentation, (5-6) is the depth image of the precise gesture-region segmentation, and (5-7) is the depth image after grayscale inversion;

Fig. 6 is a schematic overall flow chart of the gesture recognition method of the present invention based on a dual-camera mobile phone and an artificial intelligence system;

Fig. 7 is the depth-image neural network model of the present invention;

Fig. 8 shows the originally acquired images: (8-1) is the original image, (8-2) is the left-camera image, (8-3) is the right-camera image, and (8-4) is the depth image;

Fig. 9 shows the depth images of the nine gestures for digits 1 to 9;

Fig. 10 illustrates the augmentation effects, taking the gesture for digit 1 as an example: (10-1) is the original image, (10-2) to (10-4) are cropping effects, (10-5) is a contour-outline effect, (10-6) and (10-7) are maximum- and minimum-filter effects, (10-8) to (10-10) are rotation effects, (10-11) is a sharpening effect, and (10-12) is a smoothing effect.
Specific implementation mode
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in Figure 1, the functional block diagram of the gesture recognition system of the present invention based on a dual-camera mobile phone and an artificial intelligence system. The system includes an image acquisition module (100), an image preprocessing module (200), a neural network training module (300) and a gesture inspection and recognition module (400).
The image acquisition module mainly uses the two rear cameras of a dual-camera phone to take a picture simultaneously. Because the different camera positions produce different viewing angles, two slightly different images are obtained, along with the scene depth image that the camera generates using the principle of binocular stereo vision. The image preprocessing module 200 crops the preliminarily segmented gesture region from the depth image using a threshold method, then segments the identical gesture region from the corresponding color image; a clustering method is used to segment the image regions and obtain a precise gesture-image region, which in turn removes the other image regions from the depth image, yielding a more precise depth image of the gesture region. The neural network training module 300 takes about 2,500 pre-collected depth images manually labeled with the correct result, and applies image filtering operations such as flipping, blurring, sharpening, smoothing and boundary enhancement to the sample images to expand the number of images, strengthening the neural network training set to about 30,000 training samples; a deep neural network is trained on this training set to obtain a neural network system whose recognition accuracy reaches 92% or more. The gesture inspection and recognition module 400 uses the constructed 4-layer artificial neural network system: the gesture image to be recognized is processed by the image preprocessing module and input into the neural network, which returns the gesture recognition result. The module can be integrated into various mobile app, PC and web applications. In a practical gesture recognition application, the user takes a picture with the camera; the program performs the above acquisition and preprocessing on the photographed image, then inputs it into the neural network to obtain the prediction, and feeds the recognition result back to the user.
As shown in Fig. 2, the flow chart of the image acquisition module 100 of the present invention. The image acquisition flow specifically includes the following processing:
Step 1001, camera acquisition: through the phone's large-aperture photo interface, the JPG image data of both cameras is obtained simultaneously. The JPG image contains three parts: the color images shot individually by the two cameras, and the depth image information preprocessed by the camera program. In the depth image, darker pixels represent objects closer to the camera and lighter pixels represent objects farther from the camera, as shown in Fig. 8, where 8-1 is the original JPG image, 8-2 is the left-camera color image, 8-3 is the right-camera color image and 8-4 is the depth image;
Step 1002, the JPG image is split. According to the JPG file format specification, in the image file 0xFFD8 identifies the JPG file header, 0xFFDA identifies the head of an SOA format segment, and 0xFFD9 or 0xFFD8 marks the end of a format segment. The two images shot individually by the left and right cameras are stored in two SOA format segments of the JPG file. Precise color-image gesture segmentation needs the image of only one camera, so any one SOA segment can be extracted from the source JPG file; by default the present invention selects the stored segment of the left-camera image and saves it as a new JPG file for precise gesture segmentation. Similarly, the depth-image format segment begins with 0x0065646f6600 (ASCII representation "edof"), with 0xFFD9 or 0xFFD8 as segment end; the edof stored segment is extracted from the source JPG file and saved as a new JPG file;
Step 1003, the depth image is exported, completing image acquisition. The export procedure is as follows:

Image acquisition process:

(1) Color image acquisition

1. Check the JPG file header identifier 0xFFD8
2. Search for the first 0xFFDA identifier
3. Obtain the length of the SOA format segment
4. Crop the SOA format segment and output it to the file x_n_rgb.jpg, where x is the manually labeled gesture recognition result (a digit 1 to 9) and n is the picture sequence number.

(2) Depth image acquisition

1. Check the JPG file header identifier 0xFFD8
2. Search for the 0x0065646f6600 identifier (ASCII code "edof")
3. Obtain the length of the edof format segment
4. Crop the edof format segment and output it to the file x_n_dep.jpg, where x is the manually labeled gesture recognition result (a digit 1 to 9) and n is the picture sequence number.
5. Flip the image so that the orientation of the depth image matches that of the color image.
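The marker scan in the two procedures above can be sketched as follows. This is a simplified illustration, not the patent's own code: the container layout of a real dual-camera JPG (and the exact placement of the 0xFFDA and "edof" segments) is assumed from the description, and segment-length handling and output-file naming are omitted.

```python
JPG_HEADER = b"\xff\xd8"
SOA_MARKER = b"\xff\xda"        # segment-start marker named "SOA" in the description
EDOF_MARKER = b"\x00edof\x00"   # 0x0065646f6600, start of the depth-image segment

def split_dual_jpg(data: bytes):
    """Return (left_color, depth) byte segments from a combined dual-camera JPG."""
    if not data.startswith(JPG_HEADER):
        raise ValueError("not a JPG file: missing 0xFFD8 header")
    first_soa = data.find(SOA_MARKER)
    if first_soa < 0:
        raise ValueError("no SOA (0xFFDA) segment found")
    second_soa = data.find(SOA_MARKER, first_soa + 2)
    edof = data.find(EDOF_MARKER)
    if edof < 0:
        raise ValueError("no edof depth segment found")
    # left-camera image: from the first SOA segment up to the next segment
    left_end = second_soa if second_soa > 0 else edof
    left_color = data[first_soa:left_end]
    depth = data[edof:]
    return left_color, depth
```

In the described workflow the two returned segments would then be written out as x_n_rgb.jpg and x_n_dep.jpg respectively.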
As shown in Fig. 3, the flow chart of the image preprocessing module 200 of the present invention. The image preprocessing flow specifically includes several important steps: preliminary segmentation of the depth-image gesture region using a threshold method, preliminary segmentation of the color-image gesture region, obtaining hand-region pixel features with kmeans, precise segmentation of the color-image gesture region using a threshold method, precise segmentation of the depth-image gesture region, and image transformation and augmentation, as described below:
Step 2001, preliminary segmentation of the depth-image gesture region using a threshold method. The depth image information of the gesture, obtained earlier by the image acquisition module, is preliminarily cropped to the gesture region using a threshold method. The procedure is as follows:

Preliminary gesture segmentation of the depth image:

1. Obtain the grayscale histogram statistics of the image, i.e. count the number of pixels at each pixel value, for use in subsequent processing.

2. Obtain the maximum gray value present in the histogram statistics whose pixel count exceeds 0.1% of the total number of pixels in the picture. A larger gray value represents an object closer to the camera when the picture was taken; given the particularity of gesture images, this gray value can be regarded as the part of the hand nearest the camera. Taking 0.1% as the threshold effectively prevents isolated noise pixels from being misidentified as a valid object; the value was obtained through repeated experiments and has a good filtering effect, effectively eliminating individual noise pixels.

3. Given the particularity of gesture images, 30 is taken as the threshold for gesture-region segmentation: after the maximum gray value is obtained, values more than 30 below it are filtered out. Experiments show that in close-range photography the subject is often about 20 cm from the camera, over which pixel gray values change by roughly 30. It is therefore assumed that the depth of field of a hand gesture does not exceed 20 cm: the point nearest the camera is the starting point of the hand region, and points more than 20 cm beyond it belong to the background, so the non-gesture regions of the image are filtered out.

4. Apply grayscale stretching to the filtered image to expand the depth-value contrast: the 30-value pixel space is stretched to the 0-255 pixel space, increasing the variation of the depth information and making comparison easier.

5. Save the image temporarily as the preliminary gesture segmentation result of the depth image, as shown in Fig. 4.
The original depth image is shown in 4-1 and the depth image after threshold segmentation in 4-2; after the grayscale stretching process, the effect shown in 4-3 is obtained. It can be seen that after the preliminary gesture segmentation of the depth image, the valid gesture region has been preliminarily filtered out and the depth-information contrast is clear.
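Steps 1-4 of the preliminary segmentation above can be sketched in a few lines of numpy. This is a minimal sketch under the description's assumptions (8-bit depth map, larger gray value means closer to the camera, a 30-level band, a 0.1% noise floor); the function name and return values are illustrative, not from the patent.

```python
import numpy as np

def segment_depth_gesture(depth: np.ndarray, band: int = 30, min_frac: float = 0.001):
    """Preliminary gesture segmentation of an 8-bit depth map."""
    hist = np.bincount(depth.ravel(), minlength=256)
    # largest gray value held by more than 0.1% of all pixels: nearest hand point
    valid = np.nonzero(hist > depth.size * min_frac)[0]
    near = int(valid.max())
    low = max(near - band, 0)          # anything >30 levels behind is background
    mask = (depth >= low) & (depth <= near)
    out = np.zeros_like(depth)
    # stretch the 30-level band to the full 0-255 range
    stretched = (depth[mask].astype(np.float64) - low) * (255.0 / band)
    out[mask] = stretched.astype(depth.dtype)
    return out, mask
```

The returned mask is the preliminary gesture region and out the grayscale-stretched depth map corresponding to Fig. 4-3.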
Step 2002, preliminary segmentation of the color-image gesture region. Owing to the background or camera shake, the gesture-region depth map obtained by the threshold method may contain much noise, for example the arm, other parts of the body, or other irrelevant background pixels. Further denoising therefore uses the acquired color image. The difference between the left and right camera images has little influence on gesture segmentation, so by default the present invention processes the color image acquired by the left camera. Using the preliminary gesture segmentation result produced in step 2001, the region of the color image corresponding to the preliminary gesture region of the depth image is cropped, i.e. the corresponding gesture region is segmented from the color image. The original color image is shown in Fig. 5-1 and the color image after the preliminary gesture segmentation step in Fig. 5-2.
Step 2003, obtaining hand-region pixel features with kmeans. After preliminary gesture segmentation, other irrelevant regions still need to be filtered out: as shown in Fig. 5-2 above, the cropped gesture-region image may contain sleeves, shadows and other partial noise regions. Using the color information of the color image, the hand region can therefore be further precisely segmented from the perspective of the difference between skin color and background. The present invention processes the color information in HSV color space. HSV locates a color by the values H (hue), S (saturation) and V (value, i.e. brightness). Hue ranges from 0 to 360 degrees and indicates the category of the color: red is 0 degrees, green 120 degrees and blue 240 degrees. Saturation ranges from 0% to 100% and indicates the vividness of the color: gray has 0% saturation and a pure color 100%. Value ranges from 0% to 100% and indicates the brightness of the color: 0% is black, 100% is white, and values in between express how bright or dark each color is. Compared with RGB space, HSV space expresses the brightness, hue and vividness of colors much more intuitively and makes comparisons between colors easier; in HSV space, regions of different colors differ more strongly and are easier to distinguish. The kmeans machine learning classification method is used to cluster the color information of the image and obtain the average h, s, v values of the gesture region: the HSV-space image data is clustered into 3 classes, ideally yielding one class for the white background, one for the gesture region and one for the remaining regions. The detailed implementation steps are as follows:
kmeans obtains hand-region pixel features:

1. Crop and scale the original color picture to reduce the number of pixels and the amount of computation. In practice the gesture is generally concentrated near the center of the picture, with slight deviations in a few cases. A strip 10 pixels wide is therefore cropped from each of the four sides of the picture, after which the image's length and width are each scaled down by a factor of 40, yielding a size of 72*96; the total picture is thus reduced to 6,912 pixels, greatly reducing the computation of the pixel clustering and the neural network.

2. Apply two blurring operations to the picture to make the color transitions between regions smoother, effectively reducing the influence of local bright or dark spots; the result is shown in Fig. 5-3.

3. Convert the picture's pixel values from RGB to HSV using the RGB-to-HSV color-space transformation rule, then apply the kmeans clustering method to the processed picture pixels. kmeans is an unsupervised machine learning clustering algorithm whose main purpose is to automatically group similar samples into one class. This step attempts to cluster the HSV-space image data into 3 classes, ideally yielding one class for the white background, one for the gesture region and one for the remaining regions. By default, the kmeans algorithm clusters from 3 random initial centroids; given the characteristics of the preliminarily segmented color image, the present invention uses 3 customized initial centroids: white (hsv: 0°, 0%, 100%), black (hsv: 0°, 0%, 0%) and an average Asian skin tone (hsv: 60°, 90%, 60%). With 3 customized initial centroids, the centroids of the 3 final pixel classes stay closer to the initial centroids and the clustering result is more accurate. After kmeans clustering, the picture's pixels are divided into 3 classes, and the class whose initial centroid was the average skin tone contains exactly the finally segmented hand-region pixels; its centroid value represents the color feature of the hand region.
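The clustering in step 3 can be sketched with a minimal kmeans in plain numpy, seeded with the three customized centroids from the description. This is an illustration, not the patent's code: it uses an unweighted Euclidean distance over (h, s, v) with hue on a 0-360 scale and s, v on 0-1, which over-weights hue; a production version would normalize or weight the channels.

```python
import numpy as np

def kmeans_skin(pixels_hsv: np.ndarray, iters: int = 10):
    """Minimal 3-class kmeans with the customized initial centroids of step 2003.
    pixels_hsv: (N, 3) array with h in [0, 360], s and v in [0, 1]."""
    centroids = np.array([[0.0, 0.0, 1.0],    # white background
                          [0.0, 0.0, 0.0],    # black / dark regions
                          [60.0, 0.9, 0.6]])  # assumed average skin tone
    for _ in range(iters):
        # assign each pixel to its nearest centroid, then recompute centroids
        d = np.linalg.norm(pixels_hsv[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(3):
            if np.any(labels == k):
                centroids[k] = pixels_hsv[labels == k].mean(axis=0)
    return labels, centroids   # class 2 holds the hand-region pixels
```

The centroid of class 2 then serves as the hand-region color feature that step 2004 thresholds against.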
Step 2004, precise segmentation of the color-image gesture region using a threshold method. Step 2003 yields the color feature of the hand region, i.e. the centroid of the skin-pixel class. With this class centroid, the threshold method is applied: pixels whose HSV values differ from the centroid by no more than (15°, 10%, 10%) are taken as hand-region pixels, and all other pixels are set to white. This threshold was obtained through repeated experiments and gives a good segmentation effect; the resulting segmentation is shown in Fig. 5-4. The picture is then filtered with a minimum filter to eliminate isolated noise points, giving the final segmentation result shown in Fig. 5-5, where the white area is the background and the central region is the gesture region.
Step 2005, precise segmentation of the depth-image gesture region. Using the precise color-image gesture segmentation result from step 2004, the corresponding region of the depth image is cropped and the other parts are set to a white background, yielding the precise depth-image gesture segmentation result shown in Fig. 5-6. The picture's gray values are then inverted for the convenience of the neural network computation; the result is shown in Fig. 5-7.
Step 2006, transformation and augmentation of the final depth gesture images. Because the number of acquired images is insufficient, the neural network training set would be too small and would easily cause large recognition errors; the number of images is therefore increased by scaling, filtering and rotating the pictures, enlarging the training set. Fig. 10 lists a depth gesture image and the images after augmentation: 10-1 is the original depth gesture image, 10-2 to 10-4 are images scaled to 105%, 110% and 115%, 10-5 is the image after boundary-enhancement filtering, 10-6 and 10-7 are the images after maximum and minimum filtering, 10-8 to 10-10 are the images rotated counterclockwise by 90°, 180° and 270°, 10-11 is the image after sharpening filtering, and 10-12 is the image after smoothing filtering.
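A few of the Fig. 10 variants (the three rotations plus 3*3 maximum and minimum filtering) can be produced with plain numpy as below. This is a hedged sketch: scaling, sharpening and boundary enhancement are omitted, and the 3*3 rank filter with edge padding is an assumption about the filter sizes, which the description does not specify.

```python
import numpy as np

def augment(depth_img: np.ndarray):
    """Generate five of the Fig. 10 variants of a depth gesture image."""
    def rank_filter(img, take_max):
        # 3x3 max/min filter via nine shifted views of an edge-padded copy
        p = np.pad(img, 1, mode="edge")
        stack = np.stack([p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                          for dy in range(3) for dx in range(3)])
        return stack.max(axis=0) if take_max else stack.min(axis=0)
    out = [np.rot90(depth_img, k) for k in (1, 2, 3)]   # 90/180/270 CCW rotations
    out.append(rank_filter(depth_img, True))            # maximum filter (Fig. 10-6)
    out.append(rank_filter(depth_img, False))           # minimum filter (Fig. 10-7)
    return out
```

Applying such transforms to each of the ~2,500 labeled images yields the ~30,000-image training set mentioned below.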
The gesture-region depth maps obtained by the above image preprocessing flow serve as the input to the neural network training module for training the network. As shown in Fig. 9, the present invention collected the 9 gestures representing digits 1 to 9 from 10 testers, about 2,500 pictures in total; after the image preprocessing flow, image augmentation yields about 30,000 depth gesture images, which are used as the training set for the following neural network system.
As shown in Fig. 7, the neural network structure designed by the present invention. The neural network system uses a 4-layer network structure, comprising in total 2 convolutional layers with max-pooling operations, 1 fully connected layer and 1 softmax output layer:
The first layer is a convolutional layer with 16 5*5 convolution kernels and one 2*2 max-pooling sub-sampling operation. Each input is a 72*96 grayscale value matrix of the same size as the depth image; the convolution stride parameter is 1 and the padding parameter keeps the output the same size as the original image. The pooling operation uses 2*2 max pooling, and the output is 16 matrices of 36*48.

The second layer is a convolutional layer with 32 5*5 convolution kernels and one 2*2 max-pooling sub-sampling kernel. Its input is the output of the previous layer, 16 matrices of 36*48; the convolution stride parameter is 1 and the padding parameter keeps the size unchanged. The pooling operation uses 2*2 max pooling, and the output is 32 matrices of 18*24.

The third layer is a fully connected layer, which fully connects the 32 output matrices of 18*24 to 512 output neurons;

The fourth layer is a softmax layer with 512 input neurons and 9 output neurons representing the nine digits 1 to 9; the item with the largest output is taken as the recognition result.
Training uses cross-entropy to evaluate the model; cross-entropy describes the divergence from the true labels and is used to score the prediction results. The network is trained with the adaptive moment estimation method (Adam), a first-order optimizer that can replace the traditional stochastic gradient descent procedure and iteratively updates the neural network weights from the training data. The main neural network code is as follows:
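The original code listing is not reproduced in this text. Under the stated architecture, a TensorFlow/Keras reconstruction might look like the sketch below; the ReLU activations and the exact loss/label encoding are assumptions, since the text only specifies the layer sizes, cross-entropy, and Adam.

```python
import tensorflow as tf

def build_model():
    """Four-layer CNN from the description: 2 conv+maxpool layers,
    1 fully connected layer, and 1 softmax output layer."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(72, 96, 1)),                 # 72*96 depth image
        tf.keras.layers.Conv2D(16, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),                   # -> 16 maps of 36*48
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),                   # -> 32 maps of 18*24
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),     # fully connected
        tf.keras.layers.Dense(9, activation="softmax"),    # digits 1..9
    ])
    # Cross-entropy loss and the Adam optimizer, as in the description.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The recognized digit is then the argmax of the 9 softmax outputs plus one.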
After running this code, the neural network model reached a recognition accuracy of 92.53% after 2,000 training steps. The training-process output is shown below, where step is the training step count, train_accuracy is the recognition accuracy on the training batch, and test accuracy is the recognition accuracy on the test set:
step 0,train_accuracy 0.26
test accuracy 0.113806
step 100,train_accuracy 0.42
test accuracy 0.402985
step 200,train_accuracy 0.42
test accuracy 0.518657
step 300,train_accuracy 0.82
test accuracy 0.636194
step 400,train_accuracy 0.68
test accuracy 0.695895
step 500,train_accuracy 0.76
test accuracy 0.729478
step 600,train_accuracy 0.76
test accuracy 0.744403
step 700,train_accuracy 0.82
test accuracy 0.785448
step 800,train_accuracy 0.72
test accuracy 0.800373
step 900,train_accuracy 0.9
test accuracy 0.820895
step 1000,train_accuracy 0.96
test accuracy 0.845149
step 1100,train_accuracy 0.96
test accuracy 0.867537
step 1200,train_accuracy 0.98
test accuracy 0.873134
step 1300,train_accuracy 0.94
test accuracy 0.876866
step 1400,train_accuracy 0.98
test accuracy 0.882463
step 1500,train_accuracy 0.94
test accuracy 0.882463
step 1600,train_accuracy 0.96
test accuracy 0.897388
step 1700,train_accuracy 0.96
test accuracy 0.902985
step 1800,train_accuracy 0.96
test accuracy 0.893657
step 1900,train_accuracy 0.96
test accuracy 0.916045
step 2000,train_accuracy 0.98
test accuracy 0.925373
The gesture recognition module 400 feeds the gesture image, after processing by the preprocessing module, into the trained neural network and returns the gesture recognition result. The module can be combined with practical applications, for example integrated into a camera: the user photographs a gesture, the program applies the acquisition and preprocessing steps above to the photo, inputs the result to the neural network, and feeds the network's prediction back to the user as the recognition result.
Applications that use dual-camera phones for 3D are currently very rare; the invention creatively applies dual-camera phones to fields beyond photography. The present implementation targets phones on the Android platform. At present the Android platform does not expose an open interface for dual-camera data acquisition, and the data of a single camera cannot be obtained directly, so the development interfaces of individual phone manufacturers must be used. The invention implements dual-camera data acquisition through the open interface of Huawei's phone series.
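For illustration, the marker-based splitting of the combined dual-camera JPG (file header 0xFFD8, start-of-scan 0xFFDA, and a depth slice beginning with 0x0065646f6600, whose ASCII characters spell the marker edof, as recited in claim 2) could be sketched as follows. This simplified version splits on raw byte patterns (0xFFD8 start, 0xFFD9 end) rather than fully parsing the JPEG segment structure, and is a hypothetical sketch rather than the vendor's format specification:

```python
EDOF_MARK = bytes.fromhex("0065646f6600")  # ASCII: \x00edof\x00

def split_dual_jpg(data: bytes):
    """Split a combined dual-camera file into its embedded JPEG streams
    (each delimited by the 0xFFD8 header and 0xFFD9 end marker) and the
    depth slice that begins with the edof marker."""
    # The depth quantization slice starts at the edof marker.
    depth_at = data.find(EDOF_MARK)
    depth = data[depth_at:] if depth_at != -1 else b""
    body = data[:depth_at] if depth_at != -1 else data
    # Collect each embedded JPEG stream from the remaining bytes.
    images, start = [], body.find(b"\xff\xd8")
    while start != -1:
        end = body.find(b"\xff\xd9", start)
        if end == -1:
            break
        images.append(body[start:end + 2])
        start = body.find(b"\xff\xd8", end)
    return images, depth
```

A robust implementation would walk the JPEG segment structure instead, since these byte patterns can also occur inside compressed scan data.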
Because the gesture region in the depth image differs markedly from the background, i.e. the gesture and the other parts of the scene lie at clearly different distances from the camera, threshold segmentation achieves a good interception of the gesture image region.
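A minimal NumPy sketch of this threshold segmentation; the threshold value and the bounding-box crop are assumptions for illustration, not the patent's exact procedure:

```python
import numpy as np

def crop_gesture_region(depth, threshold=120):
    """Keep pixels closer than `threshold` (the hand is nearer to the
    camera, hence has smaller depth) and return the bounding-box crop
    of the near-camera region in the depth map."""
    mask = depth < threshold
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                   # no gesture found
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    return depth[top:bottom + 1, left:right + 1]
```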
Nine gestures representing the predefined digits 1 to 9 are used; about 2,500 sample images were collected from 10 different testers under different background environments;
in the subsequent image preprocessing stage, the gesture region is extracted from each image, the corresponding depth image is extracted, and image augmentation yields a training set of about 30,000 depth gesture images for the neural network system;
using the open-source neural network framework TensorFlow, a deep convolutional neural network is implemented and trained on the sample set, yielding a neural network system whose recognition accuracy exceeds 92%;
the recognition application stage applies the deep neural network trained in the previous stage to concrete practice: after the user captures a gesture image with the dual cameras, it is input to the system, which applies the same preprocessing stage to the image, feeds the result into the neural network system for recognition, and obtains the recognition result.
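The k-means clustering of HSV color information into three classes, used within the preprocessing stage summarized above, could be sketched with scikit-learn; the function name and parameters here are illustrative, not the patent's code:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_hsv_pixels(hsv_img, k=3):
    """Cluster the pixels of an HSV image into k classes
    (ideally: background, gesture region, other regions)."""
    pixels = hsv_img.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Return a per-pixel label map with the same height and width.
    return km.labels_.reshape(hsv_img.shape[:2])
```

The cluster whose pixel statistics (mean and variance) match the gesture would then be kept and used to refine the color-image gesture region.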

Claims (2)

1. A gesture recognition system based on a dual-camera mobile phone and an artificial intelligence system, which uses a dual-camera mobile phone and machine learning to recognize human gestures, characterized in that the system comprises an image acquisition module (100), an image preprocessing module (200), a neural network training module (300), and a gesture recognition module (400); wherein:
the image acquisition module obtains and stores the two original images that differ because of the difference in camera viewing angle, comprising the color images of the left and right cameras and an image containing depth information;
the image preprocessing module intercepts the gesture region from the original images to obtain a gesture-region depth image;
the neural network training module trains a deep neural network on the acquired depth images to obtain a neural network system;
the gesture recognition module returns a gesture recognition result according to the input gesture image to be recognized.
2. A gesture recognition method based on a dual-camera mobile phone and an artificial intelligence system, characterized in that the method specifically comprises the following steps:
acquiring JPG image data from the two cameras simultaneously with the image acquisition module (100); the JPG image contains three parts, namely the color image captured by the left camera, the color image captured by the right camera, and the depth image obtained by preprocessing; the JPG image is then segmented according to the JPG file format: 0xFFD8 is the JPG file header and 0xFFDA is the start-of-scan (SOS) marker, from which the fragments storing the left and right camera images are extracted and saved separately; the depth quantization image slice begins with 0x0065646f6600 and is extracted and saved on its own; the ASCII characters of this hexadecimal string spell the marker "edof";
obtaining the image containing depth information from the original image with the image preprocessing module (200), and intercepting the gesture region in the depth image by threshold segmentation; intercepting the corresponding gesture region from the color image as a preliminary gesture segmentation result; converting the color information of the depth image from RGB space to HSV space and clustering the image's color information with the k-means machine learning classification method into 3 classes, ideally obtaining a white background class, a gesture-region class, and a class for the other regions; after obtaining the classified pixels of the gesture region, computing the pixel mean and variance and intercepting the corresponding accurate gesture region in the color image by thresholding on the mean and variance; cutting the gesture region of the depth image with the accurate color-image gesture region to obtain the final depth gesture image; and applying transformation-based augmentation to the final depth gesture images to enlarge the training data set to about 30,000 or more depth images;
training the neural network with the neural network training module (300) on the gesture-region depth maps produced by the image preprocessing module, the network consisting of 4 layers: the first layer is a convolutional layer with 16 5*5 kernels and one 2*2 max-pooling operation, mapping the 72*96 input gray-scale image to 16 feature maps of 36*48; the second layer is a convolutional layer with 32 5*5 kernels and one 2*2 max-pooling operation, mapping the 16 input feature maps of 36*48 to 32 feature maps of 18*24; the third layer is a fully connected layer that connects the 32 output feature maps of 18*24 to 512 output neurons; the fourth layer is a softmax layer with 512 input neurons and 9 output neurons representing the 9 digits 1 to 9, the largest output being taken as the recognition result;
inputting the gesture depth map preprocessed by the image preprocessing module into the neural network with the gesture recognition module (400) to obtain the prediction result.
CN201810402470.9A 2018-04-28 2018-04-28 Gesture recognition system based on double-camera mobile phone and artificial intelligence system Expired - Fee Related CN108717524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810402470.9A CN108717524B (en) 2018-04-28 2018-04-28 Gesture recognition system based on double-camera mobile phone and artificial intelligence system

Publications (2)

Publication Number Publication Date
CN108717524A true CN108717524A (en) 2018-10-30
CN108717524B CN108717524B (en) 2022-05-06

Family

ID=63899399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810402470.9A Expired - Fee Related CN108717524B (en) 2018-04-28 2018-04-28 Gesture recognition system based on double-camera mobile phone and artificial intelligence system

Country Status (1)

Country Link
CN (1) CN108717524B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767488A (en) * 2019-01-23 2019-05-17 Guangdong Kangyun Technology Co., Ltd. Three-dimensional modeling method and system based on artificial intelligence
CN109948483A (en) * 2019-03-07 2019-06-28 Wuhan University Person interaction relation recognition method based on actions and facial expressions
CN110141232A (en) * 2019-06-11 2019-08-20 University of Science and Technology of China Data enhancement method for robust electromyographic signal recognition
CN110322544A (en) * 2019-05-14 2019-10-11 Guangdong Kangyun Technology Co., Ltd. Visual 3D scanning modeling method, system, equipment and storage medium
CN110322545A (en) * 2019-05-14 2019-10-11 Guangdong Kangyun Technology Co., Ltd. Campus three-dimensional digital modeling method, system, device and storage medium
CN110322546A (en) * 2019-05-14 2019-10-11 Guangdong Kangyun Technology Co., Ltd. Substation three-dimensional digital modeling method, system, device and storage medium
CN110348323A (en) * 2019-06-19 2019-10-18 Guangdong University of Technology Wearable device gesture recognition method based on neural network optimization
CN111079530A (en) * 2019-11-12 2020-04-28 Qingdao University Mature strawberry identification method
CN111429156A (en) * 2020-03-26 2020-07-17 Beijing Jiuge Chuangyi Culture and Art Co., Ltd. Artificial intelligence recognition system for mobile phones and application thereof
CN113408443A (en) * 2021-06-24 2021-09-17 Qilu University of Technology Gesture posture prediction method and system based on multi-view images
CN113553877A (en) * 2020-04-07 2021-10-26 Sunny Optical (Zhejiang) Research Institute Co., Ltd. Depth gesture recognition method and system and electronic equipment
CN115147672A (en) * 2021-03-31 2022-10-04 Guangdong Gowin Semiconductor Technology Co., Ltd. Artificial intelligence system and method for identifying object types

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710418A (en) * 2009-12-22 2010-05-19 Shanghai University Interactive image segmentation method based on geodesic distance
CN104050682A (en) * 2014-07-09 2014-09-17 Wuhan University of Science and Technology Image segmentation method fusing color and depth information
CN105825494A (en) * 2015-08-31 2016-08-03 Vivo Mobile Communication Co., Ltd. Image processing method and mobile terminal
CN107300976A (en) * 2017-08-11 2017-10-27 Wuyi University Gesture-recognition home audio-video system and operation method thereof
CN107563333A (en) * 2017-09-05 2018-01-09 Guangzhou University Binocular vision gesture recognition method and device based on ranging assistance
CN107622257A (en) * 2017-10-13 2018-01-23 Shenzhen Institute of Future Media Technology Neural network training method and three-dimensional gesture pose estimation method
CN107766842A (en) * 2017-11-10 2018-03-06 University of Jinan Gesture recognition method and application thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ADITYA TEWARI ET AL.: "A Probabilistic Combination of CNN and RNN Estimates for Hand Gesture Based Interaction in Car", 《2017 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR-ADJUNCT)》 *
YU Xu: "Dynamic Gesture Recognition Based on the Kinect Sensor", China Masters' Theses Full-text Database, Information Science and Technology *
PI Zhiming: "Research on Image Segmentation Algorithms Combining Depth Information", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN108717524A (en) It is a kind of based on double gesture recognition systems and method for taking the photograph mobile phone and artificial intelligence system
KR102102161B1 (en) Method, apparatus and computer program for extracting representative feature of object in image
CN109636886B (en) Image processing method and device, storage medium and electronic device
CN107016405B (en) A kind of pest image classification method based on classification prediction convolutional neural networks
CN104866868B (en) Metal coins recognition methods based on deep neural network and device
CN107194371B (en) User concentration degree identification method and system based on hierarchical convolutional neural network
CN107423707A (en) A kind of face Emotion identification method based under complex environment
WO2018049084A1 (en) Methods and systems for human imperceptible computerized color transfer
CN109886153B (en) Real-time face detection method based on deep convolutional neural network
CN108734719A (en) Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks
CN109173263A (en) A kind of image processing method and device
CN109657612B (en) Quality sorting system based on facial image features and application method thereof
CN110991349B (en) Lightweight vehicle attribute identification method based on metric learning
CN107545536A (en) The image processing method and image processing system of a kind of intelligent terminal
CN110032925A (en) A kind of images of gestures segmentation and recognition methods based on improvement capsule network and algorithm
CN110046574A (en) Safety cap based on deep learning wears recognition methods and equipment
CN104021384B (en) A kind of face identification method and device
CN107066916A (en) Scene Semantics dividing method based on deconvolution neutral net
CN109815920A (en) Gesture identification method based on convolutional neural networks and confrontation convolutional neural networks
CN113724354B (en) Gray image coloring method based on reference picture color style
CN110309707A (en) A kind of recognition methods of the coffee drupe maturity based on deep learning
CN109360179A (en) A kind of image interfusion method, device and readable storage medium storing program for executing
CN108229515A (en) Object classification method and device, the electronic equipment of high spectrum image
Nuanmeesri A hybrid deep learning and optimized machine learning approach for rose leaf disease classification
CN110135237A (en) A kind of gesture identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220506