CN109409276A - A robust sign language feature extraction method - Google Patents
- Publication number
- CN109409276A (application CN201811218298.8A)
- Authority
- CN
- China
- Prior art keywords
- sign language
- network
- feature
- network based
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Abstract
The present invention provides a robust sign language feature extraction method that can extract robust and highly discriminative sign language features from sign language actions performed in multiple scenes; it belongs to the fields of healthcare and information technology. The method adopts a deep adversarial network architecture comprising three sub-networks: a feature extraction network based on a deep convolutional structure, a scene recognition network based on a fully connected structure, and a classification network based on a sparse representation structure. By simultaneously minimizing the sign language action classification error and maximizing the scene information estimation error, the method guarantees both the distinguishability of the extracted sign language features between different sign language actions and their consistency across different scenes, so that a sign language recognition system using the method can work across scenes. The present invention will improve the performance of sign language recognition systems in practical multi-scene use and provide the conditions for effective communication between hearing-impaired people and others.
Description
Technical field
The invention belongs to the fields of healthcare and information technology and relates to a robust sign language feature extraction method that can extract robust and highly discriminative sign language features from sign language actions performed in multiple scenes. The method extracts sign language features through a deep adversarial network, ensuring the distinguishability of the features between different sign language actions and their consistency across different scenes, so that a sign language recognition system using the method can work across scenes.
Background technique
Sign language is the language of daily communication for hearing-impaired people and plays an important role in promoting communication among them. However, sign language actions are complex and hard to memorize, which makes sign language difficult to master. A sign language recognition system can automatically recognize sign language actions and thus effectively promote communication between hearing-impaired people and others.
At present, researchers have designed a variety of methods to realize sign language recognition and have carried out useful explorations:
Huang Aifa et al. (reference: Huang Aifa, Xu Xiangmin, Xing Xiaofen, Li Zhaohai, Ni Haomiao. A finger-spelling recognition method based on Leap Motion [P]. Chinese invention patent, application number CN201510254098.8, 2015.) proposed realizing sign language recognition with a motion-sensing device: sign language information is collected by the somatosensory device, sign language features are extracted, and recognition is then realized by template matching.
Hu Zhangfang et al. (reference: Hu Zhangfang, Luo Yuan, Zhang Yi, Yang Lin, Xi Bing. A static finger-spelling recognition system and method based on the Kinect sensor [P]. Chinese invention patent, application number CN201410191394.3, 2014.) proposed capturing human sign language action video with a Kinect camera and then recognizing sign language actions using image processing and pattern recognition methods.
Most of the above methods design sign language recognition systems with pattern recognition techniques, whose core is to extract discriminative features from the acquired signals. However, such pattern recognition methods often fail across scenes. For example, a sign language recognition system trained in an office shows a significantly lower recognition rate when used at home. This is essentially because cross-scene capability is not considered during sign language feature extraction.
In view of this, the present invention extracts sign language features through a deep adversarial network, ensuring the distinguishability of the features between different sign language actions and their consistency across different scenes, so that a sign language recognition system using the method can work across scenes. The present invention will improve the performance of sign language recognition systems in actual operation and thus effectively promote communication between hearing-impaired people and others.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art and provide a method that can extract robust and highly discriminative sign language features from sign language actions performed in multiple scenes. Compared with the prior art, the method of the invention enables a sign language recognition system trained in one scene to maintain good recognition performance when working in a new scene.
Technical solution of the present invention:
A robust sign language feature extraction method, mainly completed by the cooperation of a feature extraction network based on a deep convolutional structure, a scene recognition network based on a fully connected structure, and a classification network based on a sparse representation structure. Its working procedure comprises two stages: offline network parameter calculation and online sign language feature extraction. The offline stage uses known ground-truth sign language actions and solves for all network parameters by minimizing a cost function; the online stage extracts robust sign language features from the currently input sensor information. The details are as follows:
1) Offline network parameter calculation stage
(1.1) A person performs a known sign language action in a known scene. The sign language recognition sensor feeds the collected sign language action information into the feature extraction network based on the deep convolutional structure; the feature extraction network outputs the extracted sign language features and passes them to the scene recognition network based on the fully connected structure and the classification network based on the sparse representation structure.
(1.2) The scene recognition network based on the fully connected structure performs a deep analysis of the sign language features extracted by the feature extraction network, and recognizes and outputs the scene information corresponding to the sign language action.
(1.3) The classification network based on the sparse representation structure classifies the sign language features extracted by the feature extraction network and outputs the recognized sign language action category.
(1.4) The cost function is calculated from the recognized sign language action scene information and sign language action category together with the known true scene information and true sign language action category.
(1.5) Based on the error backpropagation algorithm, the parameters of the feature extraction network based on the deep convolutional structure, the scene recognition network based on the fully connected structure, and the classification network based on the sparse representation structure are solved by minimizing the cost function.
(1.6) Steps (1.1) to (1.5) are repeated until all network parameters remain unchanged; the offline network parameter calculation stage is then complete.
2) Online sign language feature extraction stage: a person performs an unknown sign language action in an unknown scene; the sign language recognition sensor feeds the collected information into the feature extraction network based on the deep convolutional structure, which outputs the extracted sign language features.
The sensor data is a frequency-time two-dimensional matrix composed of the signal amplitude and phase information acquired by a wireless receiver.
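The sensor representation described above, amplitude and phase arranged as a frequency-time matrix, might be assembled as follows. This is an illustrative sketch only: the complex channel samples, the subcarrier count, and the convention of stacking amplitude rows over phase rows are assumptions, not details given in the patent.

```python
import numpy as np

n_freq, n_time = 30, 200   # hypothetical subcarrier and time-sample counts

# Hypothetical complex channel samples from a wireless receiver.
rng = np.random.default_rng(1)
csi = rng.standard_normal((n_freq, n_time)) + 1j * rng.standard_normal((n_freq, n_time))

amplitude = np.abs(csi)    # signal amplitude per (frequency, time) cell
phase = np.angle(csi)      # signal phase per (frequency, time) cell

# Stack amplitude rows over phase rows: a 60 x 200 frequency-time matrix,
# matching the input size used in the embodiment below.
x = np.vstack([amplitude, phase])
print(x.shape)  # (60, 200)
```

The stacked matrix can then be fed to the feature extraction network like an ordinary single-channel image.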
The feature extraction network based on the deep convolutional structure comprises 3 to 5 layers, each of which performs three operations: convolution, pooling, and a nonlinear activation function.
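One layer of the kind just described, convolution followed by pooling and a nonlinear activation, can be sketched in plain NumPy. The kernel values, the 3×3 kernel size, 2×2 max pooling, and ReLU are taken from the embodiment later in this document; everything else is a hypothetical illustration, not the patented implementation.

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid 2-D convolution (cross-correlation, as usual in deep learning)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool2x2(x):
    """2x2 max pooling with stride 2 (trailing odd row/column dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def relu(x):
    """Nonlinear activation: negative values clipped to zero."""
    return np.maximum(x, 0.0)

def conv_layer(x, k):
    """One feature-extraction layer: convolution, pooling, activation."""
    return relu(max_pool2x2(conv2d_valid(x, k)))

x = np.random.randn(60, 200)   # frequency-time input matrix (embodiment size)
k = np.random.randn(3, 3)      # hypothetical 3x3 kernel
y = conv_layer(x, k)
print(y.shape)                 # (29, 99)
```

Stacking 3 to 5 such layers, as the text specifies, yields the deep convolutional feature extractor.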
The scene recognition network based on the fully connected structure comprises 3 layers; each layer is fully connected to the next and performs a nonlinear activation operation.
The classification network based on the sparse representation structure comprises an input layer and an output layer connected by a fully connected structure; the value of each output unit is limited to between 0 and 1, and a sparsity constraint is imposed on the output layer to ensure that only one output unit is active.
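A minimal sketch of that sparse output behavior follows. It assumes, since the patent does not say, that the 0-to-1 range comes from a softmax and that the sparsity constraint is realized by keeping only the strongest unit active; the score values are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Squash raw scores into (0, 1); the outputs sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparse_output(z):
    """Softmax followed by a hard sparsity step: exactly one unit stays active."""
    p = softmax(z)
    out = np.zeros_like(p)
    out[np.argmax(p)] = 1.0
    return out

scores = np.array([0.2, 2.5, -1.0, 0.7])  # hypothetical pre-activation scores
y = sparse_output(scores)
print(y)  # [0. 1. 0. 0.]
```

The index of the single active unit is the recognized sign language action category.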
The cost function equals the sign language action classification estimation error minus the scene information estimation error. The cost function is minimized with the Adam algorithm, and all network parameters are solved by the error backpropagation algorithm.
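The adversarial cost above can be sketched numerically. Cross-entropy is used here as the error measure and the one-hot targets and predictions are invented for illustration (the patent specifies neither); minimizing this quantity drives the classification error down while driving the scene error up, which is what makes the features scene-invariant.

```python
import numpy as np

def cross_entropy(pred, target):
    """Cross-entropy between a predicted distribution and a one-hot target."""
    return -float(np.sum(target * np.log(pred + 1e-12)))

# Hypothetical network outputs for one training sample.
class_pred = np.array([0.7, 0.1, 0.1, 0.1])   # 4 sign classes, for illustration
scene_pred = np.array([0.3, 0.4, 0.3])        # 3 scenes, for illustration
class_true = np.array([1.0, 0.0, 0.0, 0.0])
scene_true = np.array([0.0, 1.0, 0.0])

# Cost = classification estimation error minus scene estimation error.
cost = cross_entropy(class_pred, class_true) - cross_entropy(scene_pred, scene_true)
print(round(cost, 4))
```

A negative cost here means the sign class is already easier to predict from the features than the scene is, the desired direction of training.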
Beneficial effects of the present invention: the invention provides a robust sign language feature extraction method that extracts sign language features through a deep adversarial network, ensuring the distinguishability of the features between different sign language actions and their consistency across different scenes, so that a sign language recognition system using the method can work across scenes. The invention will improve the performance of sign language recognition systems in practical multi-scene use and provide the conditions for effective communication between hearing-impaired people and others.
Detailed description of the invention
Fig. 1 is a functional block diagram of the system structure of the method of the present invention.
Specific embodiment
The specific implementation of the invention is elaborated below with reference to the technical solution and the accompanying drawing.
The embodiment uses the system structure shown in Fig. 1. The system is composed as follows: the feature extraction network based on the deep convolutional structure has 3 layers, each performing a 3×3 convolution, a 2×2 pooling operation, and a ReLU activation; the scene recognition network based on the fully connected structure has 3 layers, each fully connected to the next and applying a ReLU activation for the nonlinear operation; the classification network based on the sparse representation structure has 2 layers, with the input and output layers fully connected, each output unit limited to values between 0 and 1, and a sparsity constraint imposed on the output layer to ensure that only one output unit is active. Sign language is recognized wirelessly in 5 scenes with a total of 30 sign language actions. The sensor data is a 60×200 frequency-time two-dimensional matrix composed of the signal amplitude and phase information acquired by a wireless receiver. The feature extraction network based on the deep convolutional structure takes the frequency-time matrix as input and outputs a 64×1 sign language feature vector; this feature vector is fed into the scene recognition network based on the fully connected structure, which outputs a 5×1 sign language action scene vector, and into the classification network based on the sparse representation structure, which outputs a 30×1 sign language action category vector. In the offline network parameter calculation stage, the network parameters are solved by minimizing the cost function with the Adam algorithm; in the online sign language feature extraction stage, the feature extraction network based on the deep convolutional structure extracts sign language features directly from the frequency-time two-dimensional matrix of the current sign language action.
Tests show that in cross-scene applications, the accuracy of a sign language recognition system adopting the robust sign language feature extraction method of the invention is significantly improved.
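The dimensions given in the embodiment can be checked with a shape-only walkthrough. This sketch assumes 'same'-padded 3×3 convolutions (the patent does not state the padding), so each of the 3 layers only halves the spatial size via 2×2 pooling; the final fully connected maps are stand-in random matrices, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((60, 200))       # frequency-time input matrix

# Three conv layers; with 'same' 3x3 convolution assumed, only the 2x2
# pooling changes the spatial size (odd trailing rows/columns are dropped).
shape = x.shape
for _ in range(3):
    shape = (shape[0] // 2, shape[1] // 2)
print(shape)                             # (7, 25): 60x200 -> 30x100 -> 15x50 -> 7x25

# Hypothetical projection of the flattened maps to the 64-dim sign feature.
W_feat = rng.standard_normal((64, shape[0] * shape[1]))
feature = W_feat @ rng.standard_normal(shape[0] * shape[1])

W_scene = rng.standard_normal((5, 64))   # scene recognition head: 5 scenes
W_class = rng.standard_normal((30, 64))  # classification head: 30 sign actions
print(feature.shape, (W_scene @ feature).shape, (W_class @ feature).shape)
```

The walkthrough confirms the embodiment's 64×1 feature, 5×1 scene vector, and 30×1 category vector are mutually consistent under these assumptions.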
Claims (1)
1. A robust sign language feature extraction method, characterized in that the method is mainly completed by the cooperation of a feature extraction network based on a deep convolutional structure, a scene recognition network based on a fully connected structure, and a classification network based on a sparse representation structure; its working procedure comprises two stages, offline network parameter calculation and online sign language feature extraction, wherein the offline stage uses known ground-truth sign language actions and solves for all network parameters by minimizing a cost function, and the online stage extracts robust sign language features from the currently input sensor information, specifically as follows:
1) offline network parameter calculation stage:
(1.1) a person performs a known sign language action in a known scene; the sign language recognition sensor feeds the collected sign language action information into the feature extraction network based on the deep convolutional structure, which outputs the extracted sign language features and passes them to the scene recognition network based on the fully connected structure and the classification network based on the sparse representation structure;
(1.2) the scene recognition network based on the fully connected structure performs a deep analysis of the sign language features extracted by the feature extraction network, and recognizes and outputs the scene information corresponding to the sign language action;
(1.3) the classification network based on the sparse representation structure classifies the sign language features extracted by the feature extraction network and outputs the recognized sign language action category;
(1.4) the cost function is calculated from the recognized sign language action scene information and sign language action category together with the known true scene information and true sign language action category;
(1.5) based on the error backpropagation algorithm, the parameters of the feature extraction network based on the deep convolutional structure, the scene recognition network based on the fully connected structure, and the classification network based on the sparse representation structure are solved by minimizing the cost function;
(1.6) steps (1.1) to (1.5) are repeated until all network parameters remain unchanged, completing the offline network parameter calculation stage;
2) online sign language feature extraction stage: a person performs an unknown sign language action in an unknown scene; the sign language recognition sensor feeds the collected information into the feature extraction network based on the deep convolutional structure, which outputs the extracted sign language features;
the sensor data is a frequency-time two-dimensional matrix composed of the signal amplitude and phase information acquired by a wireless receiver;
the feature extraction network based on the deep convolutional structure comprises 3 to 5 layers, each of which performs three operations: convolution, pooling, and a nonlinear activation function;
the scene recognition network based on the fully connected structure comprises 3 layers, each fully connected to the next and performing a nonlinear activation operation;
the classification network based on the sparse representation structure comprises an input layer and an output layer connected by a fully connected structure; the value of each output unit is limited to between 0 and 1, and a sparsity constraint is imposed on the output layer to ensure that only one output unit is active;
the cost function equals the sign language action classification estimation error minus the scene information estimation error; the cost function is minimized with the Adam algorithm, and all network parameters are solved by the error backpropagation algorithm.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811218298.8A | 2018-10-19 | 2018-10-19 | A robust sign language feature extraction method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN109409276A | 2019-03-01 |
Family
ID=65467747
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811218298.8A (Withdrawn, published as CN109409276A) | A robust sign language feature extraction method | 2018-10-19 | 2018-10-19 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN109409276A (en) |
Citations (9)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1204826A | 1998-04-18 | 1999-01-13 | 茹家佑 | Method for deaf-mute speech-learning dialogue and synchronized pronunciation feedback device |
| CN105205475A | 2015-10-20 | 2015-12-30 | 北京工业大学 | Dynamic gesture recognition method |
| CN105989336A | 2015-02-13 | 2016-10-05 | 中国科学院西安光学精密机械研究所 | Scene recognition method based on weighted deconvolution deep network learning |
| CN106685546A | 2016-12-29 | 2017-05-17 | 深圳天珑无线科技有限公司 | Wireless human body sensing method and server |
| US20170220923A1 | 2016-02-02 | 2017-08-03 | Samsung Electronics Co., Ltd. | Gesture classification apparatus and method using EMG signal |
| CN107679491A | 2017-09-29 | 2018-02-09 | 华中师范大学 | 3D convolutional neural network sign language recognition method fusing multimodal data |
| CN107871136A | 2017-03-22 | 2018-04-03 | 中山大学 | Image recognition method using convolutional neural networks based on sparse random pooling |
| CN107886064A | 2017-11-06 | 2018-04-06 | 安徽大学 | Scene-adaptive face recognition method based on convolutional neural networks |
| CN108491077A | 2018-03-19 | 2018-09-04 | 浙江大学 | Surface EMG gesture recognition method based on multi-stream divide-and-conquer convolutional neural networks |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110175551A | 2019-05-21 | 2019-08-27 | 青岛科技大学 | A sign language recognition method |
| CN110175551B | 2019-05-21 | 2023-01-10 | 青岛科技大学 | Sign language recognition method |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20190301 |