CN110348417A - Optimization method for a deep gesture recognition algorithm - Google Patents

Optimization method for a deep gesture recognition algorithm Download PDF

Info

Publication number
CN110348417A
CN110348417A
Authority
CN
China
Prior art keywords
data set
sample
data
file
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910646658.2A
Other languages
Chinese (zh)
Other versions
CN110348417B (en)
Inventor
徐涛
冯志全
杨晓晖
曹爱增
于杰
刁心怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN201910646658.2A priority Critical patent/CN110348417B/en
Publication of CN110348417A publication Critical patent/CN110348417A/en
Application granted granted Critical
Publication of CN110348417B publication Critical patent/CN110348417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses an optimization method for a deep gesture recognition algorithm, relating to the field of human-computer interaction. The method comprises: establishing a unified dataset storage structure and unified dataset files; loading the dataset samples into memory according to a configuration file; handing the loaded dataset samples to a complete, unified training module; optimizing the neural network architecture; configuring the adjustable parameters and the behavior of the training module according to the configuration information and starting training; saving the model and the experimental data after training according to the configuration file; and loading the data generated during training into a drawing module to plot the training-process curves. By establishing a unified dataset storage structure and unified dataset files, the method can be migrated between different projects; by setting and optimizing the network parameters of several classical networks and improving them for actual use, it increases training speed, improves training results, and reduces model size.

Description

Optimization method for a deep gesture recognition algorithm
Technical field
This application relates to the field of human-computer interaction, and in particular to an optimization method for a deep gesture recognition algorithm.
Background technique
The hand is one of the most flexible parts of the human body, and gestures let us convey information very conveniently. Gesture recognition therefore plays a vital role in the new generation of human-computer interaction. For example, gesture operation enables hand-eye coordination in VR for games and virtual assembly training, and recognizing the operation represented by a gesture allows remote interaction with a computer without mouse or keyboard. At present, gesture recognition is mostly performed with machine learning methods, which involves training and optimizing a gesture recognition model. Gesture recognition algorithms based on common RGB cameras generally have harsh illumination requirements and are sensitive to body regions whose color is similar to the hand, which easily degrades recognition performance. Since depth gesture images reduce the influence of illumination changes on recognition to a certain extent, gesture recognition algorithms based on depth gesture images have received more attention in recent years.
Current deep gesture recognition is mainly based on deep learning. Such methods rest on machine learning or deep convolutional networks: feature-extraction operations such as convolution and pooling extract local and global features from every part of the image, then the fully connected layers of the convolutional neural network or other machine learning methods integrate and re-extract the features to obtain the recognition result. At present, the performance of gesture recognition methods based on conventional machine learning or deep learning on large datasets is unsatisfactory. The hand-designed features used by conventional machine learning methods cannot characterize gestures well. For deep learning methods, different gesture recognition approaches optimize the network by different means, which causes large differences in experimental results; the training procedures and code quality vary widely, and different training strategies also cause large differences in experimental effect.
"Radio Engineering", issue 07 of 2019, in the article "Gesture motion recognition based on deep neural networks", describes a gesture motion recognition method combining convolutional neural networks with human surface electromyography (sEMG) signals. The method acquires the data signal from surface EMG, which to a certain extent addresses incomplete and noisy sample collection, and using a convolutional neural network, which finds features automatically, avoids the recognition bottleneck of manual features. However, acquiring surface EMG requires a specific device: although it guarantees the accuracy of data acquisition, it slows acquisition down and increases its difficulty.
"Electronic Engineering Design", issue 02 of 2019, in "Static gesture recognition based on an elliptical skin-color model and deep learning", describes a deep learning static gesture recognition method based on an elliptical skin-color model. The method mainly exploits the fact that, after a nonlinear transformation, skin color presents an obvious elliptical distribution in the image, and uses this property to process the samples. The benefit is that the input image is obtained directly from the camera, requiring no additional wearable acquisition device, while the convolutional neural network guarantees that the extracted features do not hit the bottleneck of manually extracted features. However, the method depends heavily on ambient light: changes of ambient light interfere strongly with the recognition result, and colors similar to skin color also interfere strongly during recognition.
"Information & Communications", issue 01 of 2019, in "Gesture recognition based on deep convolutional neural networks", proposes a gesture recognition method using a deep convolutional neural network on RGB images. Gesture samples are captured with a common RGB camera against a black background, which avoids interference from ambient light and color to a certain extent, but also makes sample collection more complicated and cumbersome. Moreover, because the deep neural network process is opaque, experiments become harder to reproduce, the experimental optimization means are not unified, and variable control is lacking.
"Automation Technology and Application", issue 04 of 2019, in "Gesture recognition and classification based on Kinect depth images", proposes a gesture recognition method based on Kinect depth images. Using Kinect instead of an RGB camera for gesture data acquisition solves the strong influence of ambient light and color change on RGB-based gesture samples, but the recognition algorithm in that article uses morphological features; such features perform less than ideally on complex gesture tasks, and their noise resistance and recognition results are inferior to convolutional neural networks.
"Computer Science", issue 12 of 2018, in "A survey of dynamic gesture recognition based on depth information", proposes a dynamic gesture recognition technique based on depth data. Kinect acquires the image depth information, which guarantees that the collected gestures are not affected by ambient light and surrounding colors, increasing the reliability of sample features. Meanwhile, a depth-threshold segmentation method guarantees that the hand region can be cut out and retained without producing excessive noise. However, the hand-designed local and global features used there in place of features learned by a neural network mean that noise and other factors interfere with the image recognition result.
In conclusion, existing gesture recognition methods use diverse sample collection means, but the truly efficient and effective one is to acquire gesture samples from devices that support depth acquisition, such as Kinect. Existing gesture recognition algorithms are mainly based on machine learning or deep learning. Algorithms based on machine learning mainly use hand-designed features; this makes the algorithm heavily dependent on manually set features and unable to learn feature differences automatically from differences in the input image, so the algorithm is relatively sensitive to perturbations of subtle features, reducing its robustness. Deep learning methods load large datasets effectively, but the description of model optimization and experimental setup is unclear, making experiments harder to reproduce, and the many evaluation curves drawn for the results make experiment analysis harder.
Summary of the invention
In order to solve the above technical problems, the application proposes the following technical solution:
In a first aspect, an embodiment of the present application provides an optimization method for a deep gesture recognition algorithm, the method comprising:
S1: establish a unified dataset storage structure and unified dataset files;
S2: load the dataset samples into memory according to a configuration file;
S3: hand the loaded dataset samples to a complete, unified training module;
S4: optimize the neural network architecture;
S5: configure the adjustable parameters and the behavior of the training module according to the configuration information and start training;
S6: save the model and the experimental data after training according to the configuration file;
S7: load the data generated during training into a drawing module to plot the training-process curves.
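The flow of steps S1-S7 above can be sketched as a pipeline of function calls. Everything below is illustrative scaffolding, not the patent's code: the function names and the order-recording trick are assumptions made only to show the control flow.

```python
# Illustrative sketch of the S1-S7 pipeline; each stub stands in for a real module
# and records its step so the overall ordering is visible. All names are assumed.
calls = []

def build_dataset_files(config):        calls.append("S1"); return ["train.json", "test.json"]
def load_samples(config, files):        calls.append("S2"); return [("label", "path")]
def hand_to_training_module(samples):   calls.append("S3"); return {"samples": samples}
def optimize_network(name):             calls.append("S4"); return f"{name}-optimized"
def train(module, net, config):         calls.append("S5"); return {"loss": [0.9, 0.5]}
def save_results(net, history, config): calls.append("S6")
def draw_curves(history):               calls.append("S7")

def optimization_pipeline(config):
    files = build_dataset_files(config)          # S1: unified storage + dataset files
    samples = load_samples(config, files)        # S2: load per configuration file
    module = hand_to_training_module(samples)    # S3: unified training module
    net = optimize_network(config["network"])    # S4: optimize the architecture
    history = train(module, net, config)         # S5: configure parameters, train
    save_results(net, history, config)           # S6: save model + experiment data
    draw_curves(history)                         # S7: plot training curves
    return history

optimization_pipeline({"network": "Lenet-5"})
print(calls)  # ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7']
```

The point of the sketch is only that each stage consumes the previous stage's output, which is what makes the modules swappable between projects.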
With the above implementation, a unified dataset storage structure and unified dataset files are established, so the data loading process is more unified and clear and can be migrated between different projects. The optimized network structure runs faster and its accuracy improves over the unoptimized one. The training and use of the network are more unified, guaranteeing that training speed is not affected while memory is well controlled, and the important data and outputs of the training process are preserved. Unified feature-data acquisition makes control-variable analysis between models possible, e.g. of performance, accuracy, and error rate. By setting and optimizing the network parameters of several classical networks and improving them for actual use, training speed increases, training results improve, and model size shrinks.
With reference to the first aspect, in a first possible implementation of the first aspect, S1 comprises:
S11: arrange dataset files of the same class under one folder, name each folder with its class number, and place all class folders under one root directory;
S12: write the root directory of the dataset samples, the operating-system path separator, the sub-folders to scan, the allowed file types, the uniform sample scaling size, and the arrangement of training set, test set and dataset into a configuration file;
S13: read the configuration file automatically, scan each sub-folder that needs scanning in turn, build a dataset sample path dictionary from the scan, and perform an availability check and preprocessing on the file pointed to by each path in the dictionary;
S14: generate the corresponding training set files automatically from the configuration file and the dataset sample path dictionary.
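Steps S11-S13 can be sketched as follows. The configuration keys, the use of a plain dict for the configuration file, and the temporary fake dataset tree are all assumptions for illustration; the patent does not specify a configuration format.

```python
import json
import os
import tempfile

# Hypothetical configuration mirroring S12: root directory, path separator,
# sub-folders to scan, allowed file types, scaling size, split ratio.
config = {
    "root": None,                 # filled in below with a temporary directory
    "separator": os.sep,
    "scan_subfolders": ["0", "1"],
    "allowed_types": [".png", ".jpg"],
    "resize_to": [64, 64],
    "test_ratio": 0.2,
}

def scan_dataset(cfg):
    """S13 sketch: scan each class sub-folder and build a path dictionary
    keyed by class label, filtering by the allowed file types."""
    paths = {}
    for label in cfg["scan_subfolders"]:
        folder = os.path.join(cfg["root"], label)
        paths[label] = sorted(
            os.path.join(folder, f)
            for f in os.listdir(folder)
            if os.path.splitext(f)[1].lower() in cfg["allowed_types"]
        )
    return paths

# Build a tiny fake dataset tree (S11 layout: root/<class>/<sample>) to exercise the scanner.
tmp = tempfile.mkdtemp()
config["root"] = tmp
for label in config["scan_subfolders"]:
    os.makedirs(os.path.join(tmp, label))
    for i in range(3):
        open(os.path.join(tmp, label, f"s{i}.png"), "w").close()
    open(os.path.join(tmp, label, "notes.txt"), "w").close()  # filtered out by type

sample_paths = scan_dataset(config)
print(json.dumps({k: len(v) for k, v in sample_paths.items()}))  # {"0": 3, "1": 3}
```

Keeping the class label as the folder name, as in S11, is what lets the scanner recover labels without any annotation file.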
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, S13 comprises:
S131: scan the dataset sample files according to the dataset root directory and the dataset directories to scan in the configuration file, and filter the scan results by the allowed file types read from the configuration file;
S132: compose the scanned dataset sample files into a dataset sample path dictionary using the operating-system directory separator in the configuration file;
S133: perform an image availability check on each path in the dataset sample path dictionary and attempt image preprocessing;
S134: check whether step S133 completed smoothly; if not, delete the problematic path from the dataset sample path dictionary.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, S133 comprises:
S1331: take a path from the dataset sample path dictionary;
S1332: check whether the file can be opened by the image-processing framework;
S1333: average the pixel array of the image and check whether it is a pure black or pure white image;
S1334: compute a bounding box for the image, extracting the bounding box of the data sample with a bounding-box (OBB) algorithm;
S1335: extract the pixel content inside the bounding box and resize it according to the configuration file;
S1336: place the resized content in the center of a new image;
S1337: overwrite the image at the source path with the preprocessed image.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, S14 comprises:
S141: delete from the dataset sample path dictionary the paths that failed the availability check or could not be preprocessed;
S142: read the configuration information to determine the organization of the dataset;
S143: check whether a cross-validation dataset is used: if not, perform step S144; if so, perform step S145;
S144: generate the non-cross-validation test set and training set dictionaries according to the configuration file;
S145: generate the cross-validation test set, training set and validation set dictionaries according to the configuration file;
S146: write the generated dataset dictionaries to file.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, S144 comprises:
S1441: read all sample paths of each class of sample;
S1442: randomly select the number of samples specified in the configuration file from all sample paths, and randomly partition the selected sample paths according to the ratio configured in the configuration file;
S1443: combine the randomly selected sample groups of all scanned sample classes to generate one big test set and one big training set.
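A minimal sketch of the S1441-S1443 split, assuming per-class sampling with a fixed seed for reproducibility (the seed, the `(label, path)` record shape, and the parameter names are assumptions, not the patent's API):

```python
import random

# Sketch of S1441-S1443: per class, draw a configured number of sample paths at
# random, split them by a configured ratio, and merge all classes into one big
# training set and one big test set.
def split_non_cv(path_dict, per_class, test_ratio, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(path_dict.items()):
        chosen = rng.sample(paths, per_class)        # S1442: random selection
        n_test = int(per_class * test_ratio)
        test += [(label, p) for p in chosen[:n_test]]
        train += [(label, p) for p in chosen[n_test:]]
    return train, test                               # S1443: one big set of each

paths = {c: [f"{c}/{i}.png" for i in range(10)] for c in ("0", "1")}
train, test = split_non_cv(paths, per_class=10, test_ratio=0.2)
print(len(train), len(test))  # 16 4
```

Splitting per class before merging, as the steps describe, keeps the class proportions of the training and test sets equal to the configured ratio.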
With reference to the fourth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, S145 comprises:
S1451: read all sample paths of each class of sample;
S1452: randomly select the number of samples specified in the configuration file from all sample paths, and randomly partition the selected sample paths according to the ratio configured in the configuration file;
S1453: compute the per-fold quantity of each class of sample under K-fold partitioning;
S1454: divide the samples into K parts according to the result of S1453 and number them;
S1455: merge the numbered parts according to the K-fold split numbers of S1454, forming K dataset dictionaries.
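The K-fold construction in S1451-S1455 can be sketched as below. The shuffle seed and the dict-of-lists fold layout are assumptions; the essential point from the steps is that each class is cut into K numbered parts and parts with the same number are merged across classes.

```python
import random

# Sketch of S1451-S1455: shuffle each class, compute the per-fold count (S1453),
# cut each class into K numbered parts (S1454), and merge same-numbered parts
# across classes into K dataset dictionaries (S1455).
def kfold_dicts(path_dict, k, seed=0):
    rng = random.Random(seed)
    folds = [{} for _ in range(k)]
    for label, paths in path_dict.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        per_fold = len(shuffled) // k                      # S1453
        for i in range(k):                                 # S1454-S1455
            folds[i][label] = shuffled[i * per_fold:(i + 1) * per_fold]
    return folds

paths = {c: [f"{c}/{i}.png" for i in range(10)] for c in ("0", "1")}
folds = kfold_dicts(paths, k=5)
print(len(folds), [len(f["0"]) for f in folds])  # 5 [2, 2, 2, 2, 2]
```

Because every fold receives the same per-class count, each of the K dataset dictionaries is stratified by class, which is what makes the later leave-one-fold-out rotation (S24) fair.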
With reference to the first aspect, in a seventh possible implementation of the first aspect, S2 comprises:
S21: determine from the configuration information whether a cross-validation dataset file or a non-cross-validation dataset file is used;
S22: if a non-cross-validation dataset file is used, perform step S23; if a cross-validation dataset file is used, perform step S24;
S23: read the contents of the non-cross-validation training set and test set files, parse the class information in the path information, and generate two dictionaries containing the respective data, including label, path and image data;
S24: when reading cross-validation dataset files, automatically let one dataset at a time be the test set with the rest as the training set, generate two dictionaries containing the respective data, including label, path and image data, and wait for the data to be loaded into memory;
S25: if the configuration information contains a setting to load data directly, load all data into memory directly; otherwise reload the data whenever it is required.
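The eager-versus-lazy behavior of S25 can be sketched with a small loader class. The class name, `load_fn` hook, and cache layout are assumptions standing in for real image decoding:

```python
# Sketch of S25: a loader that, per configuration, either eagerly loads every
# sample into memory up front or lazily decodes each sample on first access.
class DatasetLoader:
    def __init__(self, records, eager, load_fn):
        self.records = records            # list of (label, path) pairs
        self.load_fn = load_fn
        self.cache = {}
        if eager:                         # S25: "load directly" configuration
            for _, path in records:
                self.cache[path] = load_fn(path)

    def get(self, path):                  # lazy branch: decode when required
        if path not in self.cache:
            self.cache[path] = self.load_fn(path)
        return self.cache[path]

loads = []
fake_decode = lambda p: loads.append(p) or f"pixels({p})"  # stand-in for image decoding
lazy = DatasetLoader([("0", "a.png"), ("1", "b.png")], eager=False, load_fn=fake_decode)
print(len(loads))             # 0  -> nothing decoded yet
print(lazy.get("a.png"))      # pixels(a.png)
print(len(loads))             # 1
```

Eager loading trades start-up time and memory for faster epochs; the lazy branch is what the steps call reloading data "when required".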
With reference to the first aspect, in an eighth possible implementation of the first aspect, S4 comprises:
S41: judge the network type used: if Lenet-5, perform step S42; if Alexnet, perform step S43; if VGG-16, perform step S44; if GoogLenet, perform step S45;
S42: replace the Sigmoid activation function with the ReLU activation function, modify the size of the input feature maps, adjust the number of input features of the FC layers appropriately, and change the pooling method to max pooling;
S43: remove the LRN layers, modify the size of the input feature maps, and adjust the number of input features of the FC layers appropriately;
S44: replace Batch Normalization with Dropout, modify the size of the input feature maps, and adjust the number of input features of the FC layers appropriately;
S45: split the 5x5 pooling layer into two 3x3 pooling layers, increase the frequency of Batch Normalization, modify the size of the input feature maps, and adjust the number of input features of the FC layers appropriately;
S46: remove the Softmax layer.
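One of the S45 changes can be checked numerically: for max pooling with stride 1, a 5x5 pooling layer and two stacked 3x3 pooling layers produce identical outputs, since both compute the maximum over the same 5x5 receptive field. The demonstration below assumes stride 1; with other strides the two layouts are not exactly equivalent, so this is a sanity check of the idea rather than of the patent's exact layers.

```python
import random

# Stride-1 max pooling over a 2-D list of numbers; window size k.
def max_pool(img, k):
    h, w = len(img), len(img[0])
    return [[max(img[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

random.seed(1)
img = [[random.randint(0, 9) for _ in range(8)] for _ in range(8)]

# One 5x5 pool equals two stacked 3x3 pools: max over a 5x5 window either way.
print(max_pool(img, 5) == max_pool(max_pool(img, 3), 3))  # True
```

The same stacking trick is what motivates replacing large windows with small ones in convolutional layers as well, where it additionally reduces parameters.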
With reference to the first aspect, in a ninth possible implementation of the first aspect, S5 comprises:
S51: read the configuration information;
S52: set the learning rate according to the configuration information;
S53: set the number of training epochs according to the configuration information;
S54: set the model accuracy verification frequency according to the configuration information;
S55: set the training-process tracker according to the configuration information;
S56: set the batch size according to the configuration information;
S57: set the loss function according to the configuration information;
S58: set the gradient-descent optimizer according to the configuration information;
S59: set the working state of the module according to the configuration information;
S510: check the working state of the module: if in training state, perform steps S511-S517; if in use state, perform steps S518-S520;
S511: automatically check the data loading state, and load the needed data if lazy loading is used;
S512: automatically pack the dataset into a type the deep learning framework can recognize;
S513: automatically feed the data into the deep learning framework for training, automatically detecting whether GPU acceleration can be used;
S514: automatically run the training-process tracker;
S515: automatically run model verification;
S516: automatically recycle the temporary variables generated by steps S512 and S513 and clean up memory;
S517: automatically detect whether training is over; if not, repeat steps S514, S515, S516 and S517; if training is over, perform step S522;
S518: load data automatically;
S519: load the model automatically;
S520: automatically predict with the model, automatically detecting whether GPU acceleration can be used;
S521: automatically detect whether all input data has been predicted; if not, repeat steps S518, S519, S520 and S521;
S522: return the operation result or operation state.
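The configuration stage S51-S59 and the S510 dispatch can be sketched framework-free. All key names and default values below are assumptions for illustration; a real implementation would hand these settings to the deep learning framework.

```python
# Sketch of S51-S59: read the adjustable training parameters from the
# configuration information with defaults, and validate the module's working
# state used by the S510 dispatch.
def build_trainer_settings(cfg):
    settings = {
        "learning_rate":  cfg.get("learning_rate", 0.001),   # S52
        "epochs":         cfg.get("epochs", 50),             # S53
        "validate_every": cfg.get("validate_every", 1),      # S54
        "batch_size":     cfg.get("batch_size", 32),         # S56
        "loss":           cfg.get("loss", "cross_entropy"),  # S57
        "optimizer":      cfg.get("optimizer", "sgd"),       # S58
        "mode":           cfg.get("mode", "train"),          # S59
    }
    if settings["mode"] not in ("train", "use"):             # S510: train vs use state
        raise ValueError("mode must be 'train' or 'use'")
    return settings

s = build_trainer_settings({"learning_rate": 0.01, "mode": "train"})
print(s["learning_rate"], s["batch_size"])  # 0.01 32
```

Centralizing every adjustable parameter in one configuration object is what makes the training module's behavior reproducible across experiments, which is the uniformity the summary claims.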
With reference to the first aspect, in a tenth possible implementation of the first aspect, S7 comprises:
S71: load the tracking data of the training-process tracker;
S72: automatically format the tracking data;
S73: automatically check whether the tracking data is complete and its correspondences are correct;
S74: automatically map the data;
S75: automatically select the drawing line colors;
S76: automatically add the drawing coordinate axes and legend;
S77: automatically draw the figure.
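The data-side half of this chain (S72-S74) can be sketched without a plotting library. The series names and the dict layout of the tracker output are assumptions; S75-S77 would hand the resulting point lists to whatever drawing backend is configured.

```python
# Sketch of S72-S74: check that every tracked series has one value per recorded
# epoch (S73), then map each series to (x, y) pairs ready for drawing (S74).
def prepare_curves(track):
    epochs = track["epoch"]
    for name, series in track.items():                     # S73: completeness check
        if len(series) != len(epochs):
            raise ValueError(f"series '{name}': {len(series)} points, "
                             f"expected {len(epochs)}")
    return {name: list(zip(epochs, series))                # S74: map to plot points
            for name, series in track.items() if name != "epoch"}

track = {"epoch": [1, 2, 3], "loss": [0.9, 0.5, 0.3], "accuracy": [0.5, 0.7, 0.8]}
curves = prepare_curves(track)
print(curves["loss"][0])  # (1, 0.9)
```

Failing early on a length mismatch is the practical content of S73: a curve drawn from misaligned series would silently misreport the experiment.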
Detailed description of the invention
Fig. 1 is a flow diagram of an optimization method for a deep gesture recognition algorithm provided by the embodiments of the present application;
Fig. 2 is the network structure of Lenet-5 after the optimization provided by the embodiments of the present application;
Fig. 3 is the network structure of Alexnet after the optimization provided by the embodiments of the present application;
Fig. 4 is the network structure of VGG after the optimization provided by the embodiments of the present application;
Fig. 5 is the network structure of GoogLenet after the optimization provided by the embodiments of the present application.
Specific embodiment
The present solution is described below with specific embodiments and with reference to the accompanying drawings.
Fig. 1 is a flow diagram of an optimization method for a deep gesture recognition algorithm provided by the embodiments of the present application; referring to Fig. 1, the method comprises:
S1: establish a unified dataset storage structure and unified dataset files.
Step S1 in this embodiment is specifically: S11: arrange dataset files of the same class under one folder, name each folder with its class number, and place all class folders under one root directory. S12: write the root directory of the dataset samples, the operating-system path separator, the sub-folders to scan, the allowed file types, the uniform sample scaling size, and the arrangement of training set, test set and dataset into a configuration file. S13: read the configuration file automatically, scan each sub-folder that needs scanning in turn, build a dataset sample path dictionary from the scan, and perform an availability check and preprocessing on the file pointed to by each path in the dictionary. S14: generate the corresponding training set files automatically from the configuration file and the dataset sample path dictionary.
Specifically, step S13 comprises: S131: scan the dataset sample files according to the dataset root directory and the dataset directories to scan in the configuration file, and filter the scan results by the allowed file types read from the configuration file. S132: compose the scanned dataset sample files into a dataset sample path dictionary using the operating-system directory separator in the configuration file. S133: perform an image availability check on each path in the dataset sample path dictionary and attempt image preprocessing. S134: check whether step S133 completed smoothly; if not, delete the problematic path from the dataset sample path dictionary.
Step S133 comprises: S1331: take a path from the dataset sample path dictionary. S1332: check whether the file can be opened by the image-processing framework. S1333: average the pixel array of the image and check whether it is a pure black or pure white image. S1334: compute a bounding box for the image, extracting the bounding box of the data sample with a bounding-box (OBB) algorithm. S1335: extract the pixel content inside the bounding box and resize it according to the configuration file. S1336: place the resized content in the center of a new image. S1337: overwrite the image at the source path with the preprocessed image.
Step S14 comprises: S141: delete from the dataset sample path dictionary the paths that failed the availability check or could not be preprocessed. S142: read the configuration information to determine the organization of the dataset. S143: check whether a cross-validation dataset is used: if not, perform step S144; if so, perform step S145. S144: generate the non-cross-validation test set and training set dictionaries according to the configuration file. S145: generate the cross-validation test set, training set and validation set dictionaries according to the configuration file. S146: write the generated dataset dictionaries to file.
Step S144 comprises: S1441: read all sample paths of each class of sample. S1442: randomly select the number of samples specified in the configuration file from all sample paths, and randomly partition the selected sample paths according to the ratio configured in the configuration file. S1443: combine the randomly selected sample groups of all scanned sample classes to generate one big test set and one big training set.
Step S145 comprises: S1451: read all sample paths of each class of sample. S1452: randomly select the number of samples specified in the configuration file from all sample paths, and randomly partition the selected sample paths according to the ratio configured in the configuration file. S1453: compute the per-fold quantity of each class of sample under K-fold partitioning. S1454: divide the samples into K parts according to the result of S1453 and number them. S1455: merge the numbered parts according to the K-fold split numbers of S1454, forming K dataset dictionaries.
S2: load the dataset samples into memory according to the configuration file.
Step S2 in this embodiment is specifically: S21: determine from the configuration information whether a cross-validation dataset file or a non-cross-validation dataset file is used. S22: if a non-cross-validation dataset file is used, perform step S23; if a cross-validation dataset file is used, perform step S24. S23: read the contents of the non-cross-validation training set and test set files, parse the class information in the path information, and generate two dictionaries containing the respective data, including label, path and image data. S24: when reading cross-validation dataset files, automatically let one dataset at a time be the test set with the rest as the training set, generate two dictionaries containing the respective data, including label, path and image data, and wait for the data to be loaded into memory. S25: if the configuration information contains a setting to load data directly, load all data into memory directly; otherwise reload the data whenever it is required.
S3: handing the loaded dataset samples to a complete, unified training module.
S4: optimizing the neural network architecture.
Step S4 in this embodiment specifically comprises: S41: determining the network type in use: if LeNet-5 is used, performing step S42; if AlexNet is used, performing step S43; if VGG-16 is used, performing step S44; if GoogLeNet is used, performing step S45. S42: replacing the Sigmoid activation function with the ReLU activation function, modifying the size of the input feature map, appropriately adjusting the number of input features of the FC layers, and changing the pooling method to max pooling; Fig. 2 shows the LeNet-5 network structure after optimization. S43: removing the LRN layers, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers; Fig. 3 shows the AlexNet network structure after optimization. S44: replacing Batch Normalization with Dropout, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers; Fig. 4 shows the VGG network structure after optimization. S45: splitting the 5x5 pooling layer into two 3x3 pooling layers, increasing the frequency of Batch Normalization, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers; Fig. 5 shows the GoogLeNet network structure after optimization. S46: removing the Softmax layer.
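The two LeNet-5 substitutions of step S42 — ReLU in place of Sigmoid, max pooling in place of the original pooling — can be expressed framework-independently in a few lines of plain Python (operating on lists rather than tensors, purely for illustration):

```python
def relu(x):
    """ReLU over a flat list of activations, which step S42 substitutes
    for the Sigmoid activation in LeNet-5."""
    return [max(v, 0.0) for v in x]

def max_pool2d(fm, k=2):
    """k x k max pooling over a 2-D feature map (a list of rows), the
    pooling method step S42 switches to; H and W must be divisible by k."""
    h, w = len(fm), len(fm[0])
    return [[max(fm[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, w, k)]
            for i in range(0, h, k)]
```

In a real framework these correspond to swapping the activation and pooling layer types in the model definition; the gain comes from ReLU avoiding Sigmoid's vanishing gradients and max pooling keeping the strongest response per window.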
S5: configuring the adjustable parameters of the training process and the behavior of the training module according to the configuration information, and starting training.
In this embodiment, step S5 specifically comprises: S51: reading the configuration information. S52: setting the learning rate according to the configuration information. S53: setting the number of training epochs according to the configuration information. S54: setting the model accuracy verification frequency according to the configuration information. S55: setting the training process tracker according to the configuration information. S56: setting the batch size according to the configuration information. S57: setting the loss function according to the configuration information. S58: setting the gradient descent optimizer according to the configuration information. S59: setting the working state of the module according to the configuration information. S510: checking the working state of the module: if in training state, performing steps S511-S517; if in use state, performing steps S518-S520. S511: automatically checking the data loading state and, in the case of lazy loading, loading the required data. S512: automatically packaging the dataset into a type the deep learning framework can recognize. S513: automatically feeding the data into the deep learning framework for training, automatically detecting whether GPU acceleration can be used. S514: automatically running the training process tracker. S515: automatically running model verification. S516: automatically recycling the temporary variables generated in steps S512 and S513 and cleaning up memory. S517: automatically detecting whether training has ended; if not, repeating steps S514, S515, S516, and S517; if training has ended, performing step S522. S518: automatically loading data. S519: automatically loading the model. S520: automatically making predictions with the model, automatically detecting whether GPU acceleration can be used. S521: automatically detecting whether all input data has been predicted; if not, repeating steps S518, S519, S520, and S521. S522: returning the operation result or operation state.
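The shape of the configured training loop (steps S51-S517) can be sketched as below; the configuration keys, the `train_step` and `validate` callbacks, and the tracker representation are all assumptions, since the patent leaves the concrete interfaces open:

```python
def run_training(config, train_step, validate, tracker=None):
    """Configured training loop sketch: train for the configured number
    of epochs, record each epoch's loss with an optional tracker, and
    validate at the configured frequency."""
    history = []
    for epoch in range(1, config["epochs"] + 1):
        loss = train_step(config["learning_rate"], config["batch_size"])
        if tracker is not None:
            tracker.append((epoch, loss))       # process tracker (S514)
        if epoch % config["val_every"] == 0:    # accuracy check (S515)
            history.append((epoch, validate()))
    return history
```

Keeping every knob in `config` is what lets one unified loop serve all four network variants: only the configuration file changes between experiments, not the training code.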
S6: saving the model and the experimental data after training according to the configuration file.
S7: loading the data generated during training into the drawing module to plot the training process curves.
In this embodiment, step S7 specifically comprises: S71: loading the tracking data of the training process tracker. S72: automatically formatting the tracking data. S73: automatically checking whether the tracking data is complete and its correspondences are correct. S74: automatically mapping the data. S75: automatically selecting the colors of the plotted lines. S76: automatically adding the coordinate axes and the legend. S77: automatically drawing the figure.
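Steps S71-S74 — formatting the tracker output and checking it is complete before plotting — might look like the following; the record layout (one dict per epoch with an `epoch` key plus metric keys) is an assumption:

```python
def format_tracking_data(records):
    """Check the tracker records are complete and consistent (S73), then
    map them into per-metric (epoch, value) series ready for plotting
    (S72, S74)."""
    series = {}
    for rec in records:
        if "epoch" not in rec:
            raise ValueError("incomplete tracking record: %r" % (rec,))
        for key, value in rec.items():
            if key != "epoch":
                series.setdefault(key, []).append((rec["epoch"], value))
    for key, points in series.items():
        if len(points) != len(records):  # a metric missing in some epoch
            raise ValueError("metric %s missing in some records" % key)
    return series
```

Each returned series can then be handed directly to a plotting library as the x/y data of one curve, with the metric name as its legend label (S75-S77).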
The application is illustrated by a schematic example that establishes a depth gesture dataset. The dataset is built by the following procedure: classify the gesture data, sort the depth data samples by type, and place them uniformly under one folder. Write into the program configuration file the sub-folders to be scanned, the operating-system path separator, the file suffixes to be filtered, the image preprocessing parameters, and the dataset form. Call the dataset generation function and wait for the program to scan the folders, filter the files, perform image validity checking, and preprocess the images according to the configuration file. Obtain the final dataset sample path dictionary file produced after filtering, validity checking, and preprocessing.
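The bounding-box preprocessing applied during dataset generation (steps S1334-S1336 of the method) can be sketched in plain Python on a nested-list image, leaving out the actual resize step; the function names are illustrative:

```python
def bounding_box(img):
    """Axis-aligned bounding box of non-zero pixels (cf. S1334);
    returns (row0, col0, row1, col1) with exclusive end bounds."""
    rows = [i for i, row in enumerate(img) if any(row)]
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    if not rows:
        return None  # blank image: nothing to crop
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

def crop_and_center(img, out_h, out_w):
    """Crop to the bounding box and paste the crop into the centre of a
    blank out_h x out_w canvas (cf. S1335-S1336, without resizing)."""
    box = bounding_box(img)
    if box is None:
        return [[0] * out_w for _ in range(out_h)]
    r0, c0, r1, c1 = box
    crop = [row[c0:c1] for row in img[r0:r1]]
    canvas = [[0] * out_w for _ in range(out_h)]
    top = (out_h - len(crop)) // 2
    left = (out_w - len(crop[0])) // 2
    for i, row in enumerate(crop):
        canvas[top + i][left:left + len(row)] = row
    return canvas
```

Centering the cropped hand region this way normalizes hand position across samples before the image is scaled to the network's input size.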
The dataset is imported into the training module and training is started by the following procedure: select the network to be trained. Pass in the parameter list the learning rate, the number of training epochs, the model accuracy verification frequency, the worker threads, the tracker configuration, the model saving parameters, the batch size, the loss function, the optimizer, the trace file name, the model name, and the dataset state (whether it is a cross-validation dataset). Instantiate the training module, pass in the parameter list and the prototype network, call the entry function, and wait for training and the saving of the process data and the model to complete automatically. Obtain the training process data and the trained model.
Model effect assessment comprises the following procedure: load the training process data; separate the various types of data in the file; pass the data to the graphics module; set the plot data and basic plot information using chained calls; execute the drawing function and wait for drawing to complete; obtain the analysis chart.
Using the trained network proceeds as follows: load the trained model; load the data to be predicted; put the data to be predicted into the parameter dictionary and pass it to the training module; instantiate the training module and call the entry function; wait for prediction to complete; obtain the prediction result.
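The prediction path (cf. steps S518-S521 of the training module) reduces to feeding the inputs through the loaded model batch by batch until everything has been predicted; a minimal sketch, where `model` stands in for the loaded network as a callable:

```python
def predict_all(model, samples, batch_size=32):
    """Run the loaded model over all samples in batches, looping until
    every input has been predicted (cf. S518-S521)."""
    results = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        results.extend(model(batch))  # one forward pass per batch
    return results
```

Batching keeps memory bounded for large prediction sets, mirroring the memory-control concerns discussed below.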
Depth gesture recognition takes the loading and memory footprint of large datasets into account: for very large datasets, lazy loading and more aggressive memory recycling control the program's memory usage, preventing big data from occupying excessive memory and slowing down training. The applicant proposes network parameter settings and optimizations for several traditional networks, optimizing and improving them for actual usage conditions, which increases training speed, improves training results, and reduces model size.
The applicant proposes a general, unified CNN training and application method that encapsulates the traditional training process and automatically judges and handles the majority of cases. This guarantees a uniform training environment and uniform training rules in the code, and it also ensures that the variables affecting the experimental results are well controlled during training. It also saves the time spent writing training code, letting researchers devote their time to research on the algorithm itself. It further provides flexible configuration: although the process is unified, the flexibility of network training itself is not affected.
In summary, this embodiment provides an optimization method for a depth gesture recognition algorithm. It establishes a unified dataset storage configuration and unified dataset files, so that the data loading process is more uniform and clear and can be migrated between projects. The optimized network structures run faster, with accuracy improved over the networks before optimization. The network training and usage process is more unified, guaranteeing that training speed is not affected while memory is well controlled, and the important data and output results of the training process are preserved. These unified characteristics make it possible to compare performance between models and to carry out controlled-variable analysis of accuracy, error rate, and similar metrics. The network parameter settings and optimizations for several traditional networks, combined with actual usage conditions, improve the traditional networks, increase training speed, improve training results, and reduce model size.
Of course, the above description is not limited to the examples given; technical features of the application not described herein can be implemented by or with the prior art and are not detailed here. The above embodiments and drawings are merely intended to illustrate the technical solutions of the application and are not a limitation of it; the application has been described in detail only with reference to preferred embodiments. Those of ordinary skill in the art should appreciate that variations, modifications, additions, or substitutions made within the essential scope of the application, without departing from its spirit, also belong to the protection scope of the claims of the application.

Claims (10)

1. An optimization method for a depth gesture recognition algorithm, characterized in that the method comprises:
S1: establishing a unified dataset storage configuration and unified dataset files;
S2: loading the dataset samples into memory according to the configuration file;
S3: handing the loaded dataset samples to a complete, unified training module;
S4: optimizing the neural network architecture;
S5: configuring the adjustable parameters of the training process and the behavior of the training module according to the configuration information, and starting training;
S6: saving the model and the experimental data after training according to the configuration file;
S7: loading the data generated during training into the drawing module to plot the training process curves.
2. The optimization method for a depth gesture recognition algorithm according to claim 1, characterized in that S1 comprises:
S11: arranging dataset files of the same class under one folder, using the class number as the folder name, and placing the folders of all classes uniformly under one root directory;
S12: writing the root directory of the dataset samples, the operating-system path separator, the sub-folders to be scanned, the permitted file types, the uniform scaling size of the data samples, the division into training set and test set, and the organizational form of the dataset into a configuration file;
S13: automatically reading the configuration file, scanning each sub-folder to be scanned in turn, assembling the scanned dataset sample paths into a dataset sample path dictionary, and performing availability detection and preprocessing on the file pointed to by each path in the dictionary;
S14: automatically generating the corresponding training set file according to the configuration file and the dataset sample path dictionary.
3. The optimization method for a depth gesture recognition algorithm according to claim 2, characterized in that S13 comprises:
S131: scanning the dataset sample files according to the dataset root directory and the dataset directories to be scanned in the configuration file, while reading the permitted file types in the configuration file and filtering the scan results;
S132: assembling the scanned dataset sample files into a dataset sample path dictionary according to the operating-system directory separator in the configuration file;
S133: performing image availability detection on each path in the dataset sample path dictionary and attempting image preprocessing;
S134: detecting whether step S133 can proceed smoothly; if not, deleting the problematic path from the dataset sample path dictionary.
4. The optimization method for a depth gesture recognition algorithm according to claim 3, characterized in that S133 comprises:
S1331: obtaining a path from the dataset sample path dictionary;
S1332: detecting whether the file can be opened by the image processing framework;
S1333: averaging the pixel array of the image and detecting whether it is a pure black or pure white image;
S1334: calculating a bounding box for the image, extracting the bounding box of the data sample using an oriented bounding box (OBB) algorithm;
S1335: extracting the pixel content within the bounding box and resizing it according to the configuration file;
S1336: placing the resized content in the centre of a new image;
S1337: overwriting the image at the source path with the preprocessed image.
5. The optimization method for a depth gesture recognition algorithm according to claim 4, characterized in that S14 comprises:
S141: deleting from the dataset sample path dictionary the paths that failed the availability check or could not be preprocessed;
S142: reading the configuration information to determine the organizational form of the dataset;
S143: checking whether a cross-validation dataset is used: if not, performing step S144; if a cross-validation dataset is used, performing step S145;
S144: generating the non-cross-validation test set and training set dictionaries according to the configuration file;
S145: generating the cross-validation test set, training set, and validation set dictionaries according to the configuration file;
S146: writing the generated dataset dictionaries to files.
6. The optimization method for a depth gesture recognition algorithm according to claim 5, characterized in that S144 comprises:
S1441: reading all sample paths of each class of samples;
S1442: randomly selecting the number of samples specified in the configuration file from all the sample paths, and randomly splitting the extracted sample paths according to the ratio configured in the configuration file;
S1443: combining the sample groups randomly selected from all scanned sample classes to generate one large test set and one large training set.
7. The optimization method for a depth gesture recognition algorithm according to claim 5, characterized in that S145 comprises:
S1451: reading all sample paths of each class of samples;
S1452: randomly selecting the number of samples specified in the configuration file from all the sample paths, and randomly splitting the extracted sample paths according to the ratio configured in the configuration file;
S1453: calculating, for each class of samples, the quantity per fold after K-fold partitioning;
S1454: dividing the samples into K parts according to the result of S1453, and numbering them;
S1455: merging the numbered dataset dictionaries according to the K-fold partition numbers of S1454, forming K dataset dictionaries.
8. The optimization method for a depth gesture recognition algorithm according to claim 1, characterized in that S2 comprises:
S21: determining from the configuration information whether a cross-validation dataset file or a non-cross-validation dataset file is used;
S22: if a non-cross-validation dataset file is used, performing step S23; if a cross-validation dataset file is used, performing step S24;
S23: reading the contents of the non-cross-validation training set file and test set file, parsing the class information in the path information, and generating two dictionaries containing the respective data, including label, path, image data, and other information;
S24: when reading cross-validation dataset files, automatically letting one dataset in turn serve as the test set while the rest serve as the training set, generating two dictionaries containing the respective data, including label, path, image data, and other information, with the data waiting to be loaded into memory;
S25: if the configuration information contains a setting for loading data directly, loading all the data into memory at once; if no such setting is present, loading the data only when it is required.
9. The optimization method for a depth gesture recognition algorithm according to claim 1, characterized in that S4 comprises:
S41: determining the network type in use: if LeNet-5 is used, performing step S42; if AlexNet is used, performing step S43; if VGG-16 is used, performing step S44; if GoogLeNet is used, performing step S45;
S42: replacing the Sigmoid activation function with the ReLU activation function, modifying the size of the input feature map, appropriately adjusting the number of input features of the FC layers, and changing the pooling method to max pooling;
S43: removing the LRN layers, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers;
S44: replacing Batch Normalization with Dropout, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers;
S45: splitting the 5x5 pooling layer into two 3x3 pooling layers, increasing the frequency of Batch Normalization, modifying the size of the input feature map, and appropriately adjusting the number of input features of the FC layers;
S46: removing the Softmax layer.
10. The optimization method for a depth gesture recognition algorithm according to claim 1, characterized in that S5 comprises:
S51: reading the configuration information;
S52: setting the learning rate according to the configuration information;
S53: setting the number of training epochs according to the configuration information;
S54: setting the model accuracy verification frequency according to the configuration information;
S55: setting the training process tracker according to the configuration information;
S56: setting the batch size according to the configuration information;
S57: setting the loss function according to the configuration information;
S58: setting the gradient descent optimizer according to the configuration information;
S59: setting the working state of the module according to the configuration information;
S510: checking the working state of the module: if in training state, performing steps S511-S517; if in use state, performing steps S518-S520;
S511: automatically checking the data loading state and, in the case of lazy loading, loading the required data;
S512: automatically packaging the dataset into a type the deep learning framework can recognize;
S513: automatically feeding the data into the deep learning framework for training, automatically detecting whether GPU acceleration can be used;
S514: automatically running the training process tracker;
S515: automatically running model verification;
S516: automatically recycling the temporary variables generated in steps S512 and S513 and cleaning up memory;
S517: automatically detecting whether training has ended; if not, repeating steps S514, S515, S516, and S517; if training has ended, performing step S522;
S518: automatically loading data;
S519: automatically loading the model;
S520: automatically making predictions with the model, automatically detecting whether GPU acceleration can be used;
S521: automatically detecting whether all input data has been predicted; if not, repeating steps S518, S519, S520, and S521;
S522: returning the operation result or operation state.
CN201910646658.2A 2019-07-17 2019-07-17 Optimization method of depth gesture recognition algorithm Active CN110348417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910646658.2A CN110348417B (en) 2019-07-17 2019-07-17 Optimization method of depth gesture recognition algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910646658.2A CN110348417B (en) 2019-07-17 2019-07-17 Optimization method of depth gesture recognition algorithm

Publications (2)

Publication Number Publication Date
CN110348417A true CN110348417A (en) 2019-10-18
CN110348417B CN110348417B (en) 2022-09-30

Family

ID=68175603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910646658.2A Active CN110348417B (en) 2019-07-17 2019-07-17 Optimization method of depth gesture recognition algorithm

Country Status (1)

Country Link
CN (1) CN110348417B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN112508193A (en) * 2021-02-02 2021-03-16 江西科技学院 Deep learning platform
CN117351016A (en) * 2023-12-05 2024-01-05 菲特(天津)检测技术有限公司 Post-processing optimization method and device for improving accuracy of defect detection model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052884A (en) * 2017-12-01 2018-05-18 华南理工大学 A kind of gesture identification method based on improvement residual error neutral net
CN109559298A (en) * 2018-11-14 2019-04-02 电子科技大学中山学院 Emulsion pump defect detection method based on deep learning
WO2019080203A1 (en) * 2017-10-25 2019-05-02 南京阿凡达机器人科技有限公司 Gesture recognition method and system for robot, and robot

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019080203A1 (en) * 2017-10-25 2019-05-02 南京阿凡达机器人科技有限公司 Gesture recognition method and system for robot, and robot
CN108052884A (en) * 2017-12-01 2018-05-18 华南理工大学 A kind of gesture identification method based on improvement residual error neutral net
CN109559298A (en) * 2018-11-14 2019-04-02 电子科技大学中山学院 Emulsion pump defect detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Xinxin et al.: "Research on the Application of Convolutional Neural Networks Based on Transfer Learning", Computer Knowledge and Technology *
Yang Wenbin et al.: "A Gesture Recognition Method Based on Convolutional Neural Networks", Journal of Anhui Polytechnic University *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN112508193A (en) * 2021-02-02 2021-03-16 江西科技学院 Deep learning platform
CN112508193B (en) * 2021-02-02 2021-05-07 江西科技学院 Deep learning platform
CN117351016A (en) * 2023-12-05 2024-01-05 菲特(天津)检测技术有限公司 Post-processing optimization method and device for improving accuracy of defect detection model
CN117351016B (en) * 2023-12-05 2024-02-06 菲特(天津)检测技术有限公司 Post-processing optimization method and device for improving accuracy of defect detection model

Also Published As

Publication number Publication date
CN110348417B (en) 2022-09-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant