CN110175551A - Sign language recognition method - Google Patents

Sign language recognition method

Info

Publication number
CN110175551A
CN110175551A (application CN201910426216.7A); granted as CN110175551B
Authority
CN
China
Prior art keywords
pooling
sign language
layer
frame
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910426216.7A
Other languages
Chinese (zh)
Other versions
CN110175551B (en)
Inventor
Zhang Shujun (张淑军)
Zhang Qun (张群)
Li Hui (李辉)
Wang Chuanxu (王传旭)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology filed Critical Qingdao University of Science and Technology
Priority to CN201910426216.7A
Publication of CN110175551A
Application granted
Publication of CN110175551B
Active legal status (current)
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses a sign language recognition method, comprising: applying a frequency-domain transform to the video sequence corresponding to a sign language video to obtain the phase information of the images; feeding the phase information and the video sequence into a C3D convolutional neural network for a first convolution followed by fusion, forming the feature information; feeding the feature information into a deep convolutional neural network for a second convolution and pooling, executing an adaptive learning pooling algorithm during pooling to select a target feature vector, and feeding it into a fully connected layer that outputs the classification result. The invention integrates the frequency-domain transform into the deep learning algorithm: the transform extracts the phase information of the sign language video, which assists the RGB spatial information before the deep learning network generates the sign language features, so the features thus obtained are more essential and accurate. By adding an adaptive learning pooling algorithm to the pooling layers of the 3D convolutional neural network model, more abstract, higher-level video features can be mined from the sign language video, yielding more accurate classification results.

Description

Sign language recognition method
Technical field
The invention belongs to the field of video recognition technology, and specifically relates to a method for sign language semantic recognition.
Background art
In today's era of rapid development of computer technology, human-computer interaction has received extensive attention and achieved notable research results; the field mainly covers facial expression recognition, action recognition, and sign language recognition. Sign language is the main means of communication between deaf people and the hearing, but for hearing people who have never been trained in sign language, little can be understood beyond a few simple gestures, and the true intent of a deaf person cannot be fundamentally grasped, which makes communication between the deaf and the hearing difficult. At the same time, sign language recognition can also assist in the education and teaching of people with disabilities, helping to ensure their normal life and study.
Traditional sign language recognition methods require the deaf person to wear a data glove fitted with multiple sensors; the glove captures the limb movement trajectory, from which intelligible semantics are generated. At present, most action recognition methods built on the original 3D convolutional neural network model suffer from low sign language recognition accuracy on small datasets, heavy computation, a tendency to overfit, and poor generality.
Chinese invention patent application CN107506712A discloses a human behavior recognition method based on a 3D deep convolutional network. It improves the standard 3-dimensional convolutional network C3D by introducing multi-stage pooling, so that video clips of arbitrary resolution and duration can be used for feature extraction, from which the final classification result is obtained. However, the C3D network structure used in this method is rather shallow, its recognition precision on large-scale datasets is low, and it struggles to extract optimal feature information.
Chinese invention patent application CN107679491A discloses a sign language recognition method using a 3D convolutional neural network that fuses multi-modal data: features are extracted from gesture infrared images and contour images along both the spatial and temporal dimensions, and the outputs of the two networks operating on the different data formats are fused for the final sign language classification. However, the network input requires a somatosensory device to additionally capture the infrared and contour images, making the processing of the input data complicated, and recognition of fine-grained movements with larger fluctuations is poor.
Chinese invention patent application CN104281853A discloses a behavior recognition method based on a 3D convolutional neural network: optical flow information is combined with the input as multi-channel data and fed into the network for separate feature extraction, final behavior classification is carried out by a fully connected layer, and the whole process is divided into an offline training stage and an online recognition stage. This method enables online recognition, but its requirements on the dataset are excessive, and the reliance on optical flow information makes the computation complicated and the recognition efficiency unimpressive.
Summary of the invention
The purpose of the present invention is to provide a sign language recognition method, intended to solve the problems of suboptimal feature extraction and low recognition accuracy in existing sign language recognition methods.
In order to solve the above technical problems, the present invention is achieved by the following scheme:
A sign language recognition method, comprising the following process:
forming a video sequence X from a sign language video;
performing image processing based on a frequency-domain transform on the video sequence X to extract phase information;
feeding the phase information and the video sequence X separately into a C3D convolutional neural network for a first convolution, and performing weighted fusion of the features obtained after convolution to form the fused feature information;
feeding the fused feature information into a 3D ResNets deep convolutional neural network for a second convolution and pooling, executing an adaptive learning pooling algorithm during pooling to select the target feature vector, and feeding it into the fully connected layers of the 3D ResNets deep convolutional neural network, which output the classification result.
Compared with the prior art, the advantages and positive effects of the present invention are: the sign language recognition method integrates the frequency-domain transform into the deep learning algorithm; the transform extracts the phase information of the sign language video, which is fed into the deep learning algorithm to generate the feature information, and the features thus obtained are more essential and accurate. In addition, the present invention improves the 3D convolutional neural network model by adding an adaptive learning pooling algorithm to the pooling layers of the network model, so that more abstract, higher-level video features can be mined from the sign language video, yielding more accurate classification results and a markedly higher sign language recognition accuracy.
Other features and advantages of the invention will become clearer after the detailed description of its embodiments is read in conjunction with the figures.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the figures needed for the embodiments are briefly introduced below. Evidently, the figures described below cover only some embodiments of the invention; those of ordinary skill in the art can derive other figures from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the proposed sign language recognition method;
Fig. 2 is a structure diagram of an embodiment of the 3D ResNets deep convolutional neural network;
Fig. 3 is an example of dimensionality reduction of a feature matrix using the adaptive learning pooling algorithm.
Specific embodiments
Specific embodiments of the present invention are described in detail below with reference to the figures.
The sign language recognition method of this embodiment mainly comprises two stages:
(1) Feature encoding stage based on the frequency-domain transform
The frequency-domain transform is combined with deep learning: the phase information of the sign language video is extracted by the frequency-domain transform; then, the phase information and the sign language video data are fed separately into a C3D convolutional neural network for a first convolution, and the features obtained after convolution are fused with weights, forming the fused feature information.
(2) Feature decoding stage based on the improved 3D ResNets deep convolutional neural network
The fused feature information formed in the first stage is sent into the improved deep convolutional neural network (3D ResNets), where convolution kernels of different scales apply a second convolution to the temporal information at different temporal positions; then the adaptive learning pooling algorithm proposed in this embodiment reduces the dimensionality of the feature matrix obtained by the second convolution, selecting a more abstract, higher-level target feature vector that is fed into the fully connected layers to obtain a more accurate classification result.
With reference to Fig. 1, the detailed flow of the sign language recognition method of this embodiment is described below.
S1: form the video sequence X from the sign language video.
This step can be designed as the following sub-steps:
S101: cut the sign language video into frames.
The original RGB sign language video is cut into N image frames, with N preferably greater than or equal to 34. Given the characteristics of the Chinese Sign Language dataset, where the video corresponding to each semantic unit is short, cutting each sign language video into 34 frames is appropriate for that dataset.
S102: preprocess the image frames.
Considering that the first few and last few frames of each sign language video are usually frozen or background frames, a data preprocessing step is preferably performed after frame cutting to reduce the computation of subsequent steps, screening out the useful image frames, also called key frames. As a preferred design, of the N image frames generated by frame cutting, the first f frames and the last f frames are rejected as redundant frames, and only the intermediate image frames are retained as key frames, with f ≤ 5.
For the Chinese Sign Language dataset, the first 5 and last 5 of the 34 cut frames can be removed, retaining the middle 24 frames as key frames.
S103: divide the key frames into n segments in temporal order.
As a preferred design, n = 3, i.e., the preprocessed key frames are divided into three segments in temporal order.
S104: randomly select m consecutive image frames from each segment to form the video sequence X.
In this embodiment, 8 consecutive image frames are preferably selected at random from each segment, forming the video sequence X = (x1, x2, …, xn), where xi denotes the m image frames of the i-th segment, i = 1, 2, …, n.
If the 34 image frames produced by frame cutting are not preprocessed to remove redundant frames, 11 consecutive image frames can instead be selected at random from each segment to form the video sequence X.
Of course, when frame cutting produces more than 34 image frames, when more than 24 key frames remain after removing redundant frames, or when the key frames are divided into fewer than 3 equal segments, more than 8 consecutive image frames may be selected at random from each segment to form the video sequence X. The whole sampling procedure is sketched in code below.
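As an illustration only, step S1 can be sketched in Python as follows (the helper name build_video_sequence and the use of OpenCV for frame extraction are assumptions, not part of the patent):

```python
import random
import cv2  # assumed available for reading video frames

def build_video_sequence(video_path, n_frames=34, f=5, n_segments=3, m=8):
    """Cut, trim, segment, and sample a sign language video (step S1)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()

    # S102: reject the first f and last f frames as redundant (static/background).
    key_frames = frames[f:len(frames) - f]          # e.g. 34 -> 24 key frames

    # S103: divide the key frames into n segments in temporal order.
    seg_len = len(key_frames) // n_segments
    sequence = []
    for i in range(n_segments):
        segment = key_frames[i * seg_len:(i + 1) * seg_len]
        # S104: randomly pick m consecutive frames within each segment.
        start = random.randint(0, len(segment) - m)
        sequence.append(segment[start:start + m])
    return sequence                                  # X = (x1, ..., xn)
```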
S2: perform frequency-domain image processing on the video sequence X to extract the image phase information.
Among the many frequency-domain transforms, the Gabor transform has better locality and orientation selectivity than the Fourier transform, along with good anti-interference ability. Moreover, for the sign language recognition task, when the spatial position within a video frame changes, the amplitude of the Gabor feature changes relatively little, whereas the phase changes at a certain rate as the position changes; relative to the amplitude, the Gabor phase information therefore better represents the abstract characteristics of the behavior itself and carries more meaning.
In summary, considering the characteristics of sign language video, this embodiment preferably uses the Gabor transform among the frequency-domain transforms to extract the phase information of the video sequence X, so that all the information of the signal is provided as a whole while the severity of signal change at any local time can also be described, optimizing the sign language behavior features. There are many methods for computing Gabor phase information, and combining any of them with a deep learning network falls in principle within the scope of the invention; however, to reduce the data dimensionality and the amount of computation, this embodiment preferably uses the Local Gabor Phase Difference Pattern (LGPDP) proposed in [Guo Y, Xu Z. Local Gabor Phase Difference Pattern for Face Recognition. The 19th International Conference on Pattern Recognition, IEEE, 2008: 1-4] to extract the phase information of the image frames after the Gabor transform. Other improved algorithms based on LGPDP are equally applicable.
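The quadrature Gabor filtering underlying this step can be sketched as follows; this is a simplified illustration that computes only the raw per-pixel Gabor phase, omitting the local phase-difference encoding that LGPDP performs on top of it (filter parameters are illustrative):

```python
import cv2
import numpy as np

def gabor_phase(gray, ksize=15, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5):
    """Per-pixel Gabor phase of a grayscale frame (simplified, pre-LGPDP)."""
    # Even (psi = 0) and odd (psi = pi/2) kernels form a quadrature pair.
    k_even = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, 0)
    k_odd = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, np.pi / 2)
    real = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, k_even)
    imag = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, k_odd)
    # The phase varies predictably with spatial position while the amplitude
    # stays comparatively stable, which is why phase is preferred here.
    return np.arctan2(imag, real)
```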
S3: feed the video sequence X and the extracted phase information separately into the C3D convolutional neural network for a first convolution.
In this embodiment, the video sequence X and the extracted phase information are preferably first fed into a conventional C3D convolutional neural network model for one round of convolution processing, generating the first-convolution feature information.
S4: perform weighted fusion of the feature information obtained after the first convolution to form the fused feature information.
In this embodiment, a traditional weighted fusion algorithm can be used to fuse the feature information processed by the C3D convolutional neural network, forming the fused feature matrix.
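Since the patent does not spell out the fusion weights, the following PyTorch sketch assumes a single learnable convex combination of the two first-convolution feature maps (the class name and the sigmoid gating are illustrative):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Weighted fusion of the RGB-stream and phase-stream C3D features."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.0))  # learnable fusion logit

    def forward(self, feat_rgb, feat_phase):
        # Both inputs: (batch, channels, time, height, width) feature maps.
        a = torch.sigmoid(self.alpha)
        return a * feat_rgb + (1 - a) * feat_phase
```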
S5: feed the fused feature information into the 3D ResNets deep convolutional neural network for the second convolution and pooling, selecting the target feature vector.
To obtain more accurate video features, this embodiment improves the 3D ResNets deep convolutional neural network by introducing an adaptive learning pooling algorithm based on the weighted cross-covariance matrix, which reduces the dimensionality of the feature matrix produced by convolution and selects a more abstract, higher-level target feature vector.
As a preferred design, this embodiment uses a 19-layer 3D ResNets deep convolutional neural network comprising 1 data input layer, 8 3D convolutional layers with kernels of different scales, 8 pooling layers, and two fully connected layers. As shown in Fig. 2, the 8 3D convolutional layers and 8 pooling layers preferably alternate (a code sketch of this stack follows the layer description), where:
C1-C8 are the 8 3D convolutional layers; each has 3 × 3 × 3 convolution kernels, with the number of kernels increasing from 64 to 512 so that more varied high-level features are generated from combinations of low-level ones. After the convolutional layers, the features of the two information streams are fused.
S1-S8 are the 8 pooling layers, each performing dimensionality reduction with the adaptive learning pooling algorithm. The second pooling layer S2, the sixth pooling layer S6, the seventh pooling layer S7, and the eighth pooling layer S8 use a 2 × 2 × 2 window to downsample the temporal and spatial dimensions simultaneously; the other pooling layers S1, S3, S4, and S5 use a 1 × 2 × 2 window and downsample only the spatial dimensions.
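The alternating convolution/pooling stack can be sketched as follows; the per-layer channel widths are an assumption (the patent states only the 64-to-512 progression), and plain max pooling stands in for the adaptive learning pooling detailed later:

```python
import torch.nn as nn

def make_backbone(in_ch=3):
    """C1-C8 with interleaved pooling layers S1-S8 (a minimal sketch)."""
    channels = [64, 64, 128, 128, 256, 256, 512, 512]   # assumed widths for C1..C8
    temporal_pools = {2, 6, 7, 8}                       # S2, S6, S7, S8 pool time too
    layers = []
    for i, out_ch in enumerate(channels, start=1):
        layers += [
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),  # 3x3x3 kernels
            nn.BatchNorm3d(out_ch),                              # BN after each conv
            nn.ELU(inplace=True),                                # ELU activation
            nn.MaxPool3d((2, 2, 2) if i in temporal_pools else (1, 2, 2)),
        ]
        in_ch = out_ch
    return nn.Sequential(*layers)
```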
The 3D convolutional layers of this embodiment preferably apply kernels of different scales as the second convolution over the temporal information at different temporal positions, then aggregate the convolution features of each temporal position along the time dimension to reduce the computation of the network structure. As a preferred design, a 1 × 1 kernel can first reduce the dimensionality of the feature matrix supplied by the data input layer, which helps shrink the model parameters and normalize the sizes of different features. Then, convolutions with kernels of different scales are applied separately to the temporal information at different temporal positions, for example 3 × 3 and 5 × 5 kernels convolving the mid- and high-level video features; the convolution information of each temporal position is then fused with weights, and the aggregated feature matrix is fed into the pooling layer for adaptive feature pooling.
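Read this way, the multi-scale step resembles an inception-style block. The sketch below assumes a 1 × 1 × 1 channel reduction followed by parallel 3 × 3 × 3 and 5 × 5 × 5 branches fused with learnable weights; the exact branch layout is not given in the patent:

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv3d(in_ch, out_ch, kernel_size=1)  # 1x1 dimension reduction
        self.branch3 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv3d(out_ch, out_ch, kernel_size=5, padding=2)
        self.w = nn.Parameter(torch.ones(2))                   # per-branch fusion weights

    def forward(self, x):
        x = self.reduce(x)
        w = torch.softmax(self.w, dim=0)
        # Weighted fusion of the multi-scale convolution information.
        return w[0] * self.branch3(x) + w[1] * self.branch5(x)
```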
This embodiment improves the pooling algorithm executed by each pooling layer, proposing an adaptive learning pooling algorithm, as shown in Fig. 3. First, the cross-covariance matrix of the aggregated feature matrix is computed, and a dimensionality reduction is applied to it, giving the feature vector up to the current time. Then the importance of each frame is obtained, the pooled feature vector of each frame is computed, different weights are assigned according to the ranking of importance, and the feature vector with the largest weight is chosen as the target feature vector.
The detailed flow of the proposed adaptive learning pooling algorithm is described below (a code sketch follows the steps):
S501: from the feature matrix Fn obtained after the 3D convolution and fusion, compute its cross-covariance matrix Qn.
S502: apply a conventional pooling algorithm to the cross-covariance matrix Qn for dimensionality reduction, forming the reduced feature vector.
S503: compute the importance βt+1 of the reduced feature vector at frame t+1 as βt+1 = fp(φ(xt+1)), where fp is the prediction function in the perceptron algorithm, and φ(xt+1) denotes the reduced feature vector of the video sequence X accumulated from frame 1 through frame t+1.
S504: compute the weight ω of the feature vector at frame t+1 from its importance, a higher importance yielding a larger weight.
S505: repeat steps S503-S504 to compute the weight of the feature vector at every frame.
S506: sort the weights of the frame-wise feature vectors computed in step S505 from high to low; the higher the weight, the more useful information the frame contains.
S507: select the feature vector with the largest weight as the target feature vector and feed it into the fully connected layer.
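A compact sketch of S501-S507 follows. The patent describes the cross-covariance computation and the perceptron prediction function fp only abstractly, so both are simplified here: a per-frame channel covariance and a single linear layer standing in for fp:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLearningPool(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f_p = nn.Linear(channels, 1)  # stand-in for the perceptron predictor fp

    def forward(self, feats):
        # feats: (T, C, H, W), per-frame feature matrices Fn after aggregation.
        T, C, H, W = feats.shape
        flat = feats.reshape(T, C, H * W)
        # S501: covariance matrix Qn of each frame's feature matrix.
        centered = flat - flat.mean(dim=2, keepdim=True)
        cov = centered @ centered.transpose(1, 2) / (H * W - 1)  # (T, C, C)
        # S502: conventional pooling reduces each covariance to a vector.
        vec = cov.mean(dim=2)                                    # (T, C)
        # S503: importance beta from the feature accumulated up to each frame.
        steps = torch.arange(1, T + 1, device=vec.device).unsqueeze(1)
        phi = torch.cumsum(vec, dim=0) / steps                   # running mean as phi(x_t)
        beta = self.f_p(phi).squeeze(1)                          # (T,)
        # S504-S506: weights omega ranked by importance; higher = more useful.
        omega = F.softmax(beta, dim=0)
        # S507: the highest-weight vector becomes the target feature vector.
        return vec[omega.argmax()]
```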
In this embodiment, the data sent into each 3D convolutional layer is a feature matrix, and after convolution and pooling each pooling layer yields one target feature vector. The target feature vectors obtained by the pooling layers are fed into the fully connected layers to obtain a more accurate classification result. To prevent problems such as gradient explosion or vanishing in the deep network, a BN layer is preferably added after each 3D convolutional layer, and a dropout operation is performed in each fully connected layer.
S6: feed the selected target feature vector into the fully connected layers to obtain the final classification result.
The 3D ResNets deep convolutional neural network of this embodiment preferably has two fully connected layers, as shown in Fig. 2, where:
FC1 is the first fully connected layer and preferably contains 512 neurons; the feature vector output by the eighth pooling layer S8 is connected to the 512 neurons of FC1 and converted in this layer into a 512-dimensional feature vector. A dropout layer is used between the eighth pooling layer S8 and the first fully connected layer FC1, dropping part of the network units with probability 0.5, and a transfer learning algorithm freezes part of the connections between S8 and FC1 with probability 0.1.
FC2 is the second fully connected layer and also the dense output layer; it contains as many neurons as there are classes in the classification result, e.g. 6 neurons. Each neuron in FC2 is fully connected to the 512 neurons of FC1, and classification is finally performed via Softmax regression, outputting the classification result of the sign language class.
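The two fully connected layers can be sketched as follows (sizes as described above; the 0.1-probability connection freezing is a training-time detail and is omitted from this sketch):

```python
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, in_dim, num_classes=6):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)           # dropout between pooling layer S8 and FC1
        self.fc1 = nn.Linear(in_dim, 512)       # FC1: 512 neurons
        self.act = nn.ELU(inplace=True)         # ELU activation on FC1
        self.fc2 = nn.Linear(512, num_classes)  # FC2: dense output layer

    def forward(self, x):
        x = self.act(self.fc1(self.drop(x)))
        return self.fc2(x)                      # logits; Softmax is applied in the loss
```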
As a preferred design, in the 3D ResNets deep convolutional neural network the 3D convolutional layers and the first fully connected layer FC1 preferably use ELU as the activation function, improving the performance of the deep network. The second fully connected layer FC2 preferably uses Softmax as the activation function; the optimization function is preferably SGD, and the loss function is preferably the sum of the multi-class cross-entropy and the error of the adaptive learning pooling algorithm, i.e., the loss function can be written as:
L(X, Y) = lcro(x, y) + μ·lB(τ);
where L(X, Y) is the loss function, lcro(x, y) is the multi-class cross-entropy, lB(τ) is the error of the adaptive learning pooling algorithm, and μ is a hyperparameter. Since the loss function, the multi-class cross-entropy, and the error of the pooling algorithm are prior art, the meaning of the parameters in each function is known to those skilled in the art and is not described in detail in this embodiment.
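In code, the combined objective is simply the cross-entropy plus the weighted pooling error; in this sketch lB is assumed to be reported by the pooling module, and the value of μ is a tunable hyperparameter:

```python
import torch.nn.functional as F

def total_loss(logits, labels, pooling_error, mu=0.1):
    l_cro = F.cross_entropy(logits, labels)  # multi-class cross-entropy lcro(x, y)
    return l_cro + mu * pooling_error        # L(X, Y) = lcro + mu * lB
```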
The classification result output by the fully connected layers of the 3D ResNets deep convolutional neural network is thus the recognized sign language meaning.
The sign language recognition method of this embodiment can be divided into a training stage and a test stage. The training stage trains with steps S1-S6 above. Beforehand, the weights of the whole network structure are initialized, preferably using the public benchmark action recognition dataset Kinetics to initialize the weights of the 3D ResNets deep convolutional neural network, so that the initialization is well adapted to the sign language recognition task. Then, during training, a transfer learning strategy is applied to the whole network structure: the convolutional layers are frozen while the final fully connected layer is trained continuously, making the final classification more accurate. In addition, the initial learning rate is set to 0.001 and, as training progresses, is gradually decreased by a factor of ten after each iteration interval, with the adjustment stopping at iteration 2000; the accuracy of the whole network gradually stabilizes by around 2000 iterations. Momentum is set to 0.9, and after 30,000 iterations the last network model is loaded for the test stage.
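The schedule above maps onto standard PyTorch tools as follows (a sketch: the decay interval is an assumption, since the patent states only the factor of ten and the 2000-iteration horizon):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 6)  # placeholder for the full network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Decay the learning rate by 1/10 at fixed intervals up to iteration 2000.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.1)
```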
In the test stage, the Chinese Sign Language dataset can be selected as the data source, and all tests are carried out on this dataset.
The sign language recognition method of the invention integrates the frequency-domain transform into the deep learning algorithm, using Gabor phase information with good recognition properties to assist the RGB spatial information of the sign language video; combining the extracted phase information with the deep learning process yields more essential and accurate sign language behavior features. The improved 19-layer deep convolutional neural network mines more abstract, higher-level video features from the original video; capturing video-level features at different temporal positions with kernels of different scales not only reduces the computation but also makes full use of the raw information in the video, better adapting to sign language recognition against complex backgrounds. Finally, the adaptive learning pooling algorithm reduces the dimensionality of the feature matrix obtained by convolution, giving more accurate classification results and improving the accuracy of sign language recognition.
Of course, the above embodiments merely illustrate the technical solutions of the invention and do not limit them. Although the invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in each embodiment can still be modified, or some of their technical features replaced by equivalents, without departing from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. A sign language recognition method, characterized by comprising:
forming a video sequence X from a sign language video;
performing image processing based on a frequency-domain transform on the video sequence X to extract phase information;
feeding the phase information and the video sequence X separately into a C3D convolutional neural network for a first convolution, and performing weighted fusion of the features obtained after convolution to form fused feature information;
feeding the fused feature information into a 3D ResNets deep convolutional neural network for a second convolution and pooling, executing an adaptive learning pooling algorithm during pooling to select a target feature vector, and feeding it into the fully connected layers of the 3D ResNets deep convolutional neural network, which output the classification result.
2. The sign language recognition method according to claim 1, characterized in that the adaptive learning pooling algorithm comprises:
computing, from the feature matrix Fn generated after the second convolution, its cross-covariance matrix Qn;
pooling the cross-covariance matrix Qn for dimensionality reduction to form the reduced feature vector;
computing the importance βt+1 of the reduced feature vector at frame t+1 as βt+1 = fp(φ(xt+1)), where fp is the prediction function in the perceptron algorithm, and φ(xt+1) denotes the reduced feature vector of the video sequence X accumulated from frame 1 through frame t+1;
computing the weight ω of the feature vector at frame t+1 according to its importance;
computing the weight of the feature vector at every frame and selecting the feature vector with the largest weight as the target feature vector.
3. The sign language recognition method according to claim 1, characterized in that forming the video sequence X comprises:
cutting the sign language video into frames;
dividing the image frames corresponding to the sign language video into n segments in temporal order;
randomly selecting m consecutive image frames from each segment to form the video sequence X = (x1, x2, …, xn), where xi denotes the m image frames of the i-th segment.
4. The sign language recognition method according to claim 3, characterized in that forming the video sequence X specifically comprises:
cutting each sign language video into N frames, N ≥ 34, rejecting the first f frames and the last f frames as redundant frames, f ≤ 5, and retaining the intermediate key frames;
dividing the intermediate key frames into three segments in temporal order;
randomly selecting at least 8 consecutive image frames from each segment to form the video sequence X.
5. The sign language recognition method according to claim 1, characterized in that, in the process of extracting phase information based on the frequency-domain transform, the phase information of the image frames is extracted using a Gabor transform.
6. The sign language recognition method according to any one of claims 1 to 5, characterized in that, in the 3D ResNets deep convolutional neural network, the 3D convolutional layers apply kernels of different scales as the second convolution over the temporal information at different temporal positions, then aggregate the convolution features of each temporal position along the time dimension; the feature matrix formed after the second convolution is fed into the pooling layers, where the adaptive learning pooling algorithm performs dimensionality reduction to select the target feature vector.
7. The sign language recognition method according to claim 6, characterized in that the 3D ResNets deep convolutional neural network comprises 8 3D convolutional layers and 8 pooling layers, the 8 3D convolutional layers and 8 pooling layers alternating, wherein:
each 3D convolutional layer has 3 × 3 × 3 convolution kernels, the number of kernels increasing from 64 to 512, and after the convolutional layers the features of the two streams of information are fused;
each pooling layer performs dimensionality reduction using the adaptive learning pooling algorithm, wherein the second, sixth, seventh, and eighth pooling layers use a 2 × 2 × 2 window to downsample the temporal and spatial dimensions simultaneously, and the other pooling layers use a 1 × 2 × 2 window to downsample only the spatial dimensions.
8. The sign language recognition method according to claim 7, characterized in that a BN layer is added after each 3D convolutional layer.
9. The sign language recognition method according to claim 7, characterized in that the 3D ResNets deep convolutional neural network further comprises a data input layer and two fully connected layers, wherein:
the first fully connected layer contains 512 neurons, in which the feature vector output by the eighth pooling layer is converted into a 512-dimensional feature vector; a dropout layer is used between the eighth pooling layer and the first fully connected layer, dropping part of the network units with probability 0.5, and a transfer learning algorithm freezes part of the connections between the eighth pooling layer and the first fully connected layer with probability 0.1;
the second fully connected layer is the dense output layer and contains as many neurons as there are classes in the classification result; each neuron in the second fully connected layer is fully connected to the 512 neurons of the first fully connected layer, and classification is finally performed via a classifier, outputting the classification result of the sign language class.
10. The sign language recognition method according to claim 9, characterized in that the 3D convolutional layers and the first fully connected layer use ELU as the activation function, the second fully connected layer uses Softmax as the activation function, the optimization function is SGD, and the loss function is the sum of the multi-class cross-entropy and the error of the adaptive learning pooling algorithm.
CN201910426216.7A (priority and filing date 2019-05-21): Sign language recognition method. Active; granted as CN110175551B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910426216.7A CN110175551B (en) 2019-05-21 2019-05-21 Sign language recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910426216.7A CN110175551B (en) 2019-05-21 2019-05-21 Sign language recognition method

Publications (2)

Publication Number Publication Date
CN110175551A 2019-08-27
CN110175551B 2023-01-10

Family

ID=67691821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910426216.7A Active CN110175551B (en) 2019-05-21 2019-05-21 Sign language recognition method

Country Status (1)

Country Link
CN (1) CN110175551B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5901246A (en) * 1995-06-06 1999-05-04 Hoffberg; Steven M. Ergonomic man-machine interface incorporating adaptive pattern recognition based control system
CN104376306A (en) * 2014-11-19 2015-02-25 天津大学 Optical fiber sensing system invasion identification and classification method and classifier based on filter bank
CN105654037A (en) * 2015-12-21 2016-06-08 浙江大学 Myoelectric signal gesture recognition method based on depth learning and feature images
CN107767405A (en) * 2017-09-29 2018-03-06 华中科技大学 A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking
CN107845390A (en) * 2017-09-21 2018-03-27 太原理工大学 A kind of Emotional speech recognition system based on PCNN sound spectrograph Fusion Features
CN109409276A (en) * 2018-10-19 2019-03-01 大连理工大学 A kind of stalwartness sign language feature extracting method
JP2019074478A (en) * 2017-10-18 2019-05-16 沖電気工業株式会社 Identification device, identification method and program

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126493A (en) * 2019-12-25 2020-05-08 东软睿驰汽车技术(沈阳)有限公司 Deep learning model training method and device, electronic equipment and storage medium
CN111126493B (en) * 2019-12-25 2023-08-01 东软睿驰汽车技术(沈阳)有限公司 Training method and device for deep learning model, electronic equipment and storage medium
CN111339837B (en) * 2020-02-08 2022-05-03 河北工业大学 Continuous sign language recognition method
CN111339837A (en) * 2020-02-08 2020-06-26 河北工业大学 Continuous sign language recognition method
CN111310701A (en) * 2020-02-27 2020-06-19 腾讯科技(深圳)有限公司 Gesture recognition method, device, equipment and storage medium
CN111310701B (en) * 2020-02-27 2023-02-10 腾讯科技(深圳)有限公司 Gesture recognition method, device, equipment and storage medium
US11227151B2 (en) 2020-03-05 2022-01-18 King Fahd University Of Petroleum And Minerals Methods and systems for computerized recognition of hand gestures
CN111507275A (en) * 2020-04-20 2020-08-07 北京理工大学 Video data time sequence information extraction method and device based on deep learning
CN111507275B (en) * 2020-04-20 2023-10-10 北京理工大学 Video data time sequence information extraction method and device based on deep learning
CN112464816A (en) * 2020-11-27 2021-03-09 南京特殊教育师范学院 Local sign language identification method and device based on secondary transfer learning
CN113378722A (en) * 2021-06-11 2021-09-10 西安电子科技大学 Behavior identification method and system based on 3D convolution and multilevel semantic information fusion
CN113378722B (en) * 2021-06-11 2023-04-07 西安电子科技大学 Behavior identification method and system based on 3D convolution and multilevel semantic information fusion
CN116343342A (en) * 2023-05-30 2023-06-27 山东海量信息技术研究院 Sign language recognition method, system, device, electronic equipment and readable storage medium
CN116343342B (en) * 2023-05-30 2023-08-04 山东海量信息技术研究院 Sign language recognition method, system, device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN110175551B (en) 2023-01-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant