CN110705339A - C-C3D-based sign language identification method - Google Patents


Info

Publication number
CN110705339A
CN110705339A (application CN201910303476.5A)
Authority
CN
China
Prior art keywords
network
time sequence
sub
candidate
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910303476.5A
Other languages
Chinese (zh)
Inventor
赵宏伟 (Zhao Hongwei)
张卫山 (Zhang Weishan)
刘霞 (Liu Xia)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN201910303476.5A priority Critical patent/CN110705339A/en
Publication of CN110705339A publication Critical patent/CN110705339A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a sign language recognition method based on C-C3D. A C3D network is taken as the main feature extraction network and improved: a temporal candidate box is defined by a pair of corner points, a variable-length three-dimensional convolution kernel is designed, and sign language gestures are recognized in a temporal candidate box classification and regression sub-network.

Description

C-C3D-based sign language identification method
Technical Field
The invention relates to the field of deep learning target detection and behavior recognition, in particular to a C-C3D-based sign language recognition method.
Background
The C-C3D-based sign language recognition method builds on deep learning techniques for target detection and behavior recognition. The techniques closest to the present invention are:
(1) Deep learning: with the rapid development of deep learning, new approaches have emerged for solving many practical problems. The strength of the convolutional neural network lies in its multilayer structure, which learns features automatically and at multiple levels: shallower convolutional layers have smaller receptive fields and learn features of local regions, while deeper convolutional layers have larger receptive fields and learn more abstract features. These abstract features are less sensitive to the size, position, and orientation of the target, which helps improve recognition performance. The network adapts well to geometric transformation, deformation, illumination, and similar variations of the target, effectively overcoming the recognition difficulty caused by variable target appearance. It extracts and analyzes features automatically from the data input into the network and therefore has strong generality and generalization capability.
(2) C3D network: the C3D network can be used to extract spatio-temporal features from video. Built on 3D convolution operations, the C3D network has 8 convolution layers and 4 pooling layers. All convolution kernels have size 3 × 3 × 3 with stride 1 × 1 × 1. In order not to shrink the temporal dimension too early, the first pooling layer has kernel size and stride 1 × 2 × 2, while all other pooling layers have kernel size 2 × 2 × 2 and stride 2 × 2 × 2. Finally, the network produces its output after two fully connected layers and a softmax layer. The input size of the network is 3 × 16 × 112 × 112, i.e., 16 frames are input at a time. Compared with a 2D network, a 3D network extracts features better, and paired with only a simple classifier it can outperform most existing algorithms. However, the C3D network can only process fixed-length video, and it can only classify a video; it cannot detect targets within the video.
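As a rough illustration of the pooling schedule described above, the shape of a clip can be traced through the pooling layers with a short calculation. This sketch simply follows the kernel and stride sizes quoted in this description; it is not a reference implementation of C3D:

```python
# Sketch: trace the (frames, height, width) size of a C3D-style input through
# the pooling schedule described above: first pool 1x2x2, remaining pools 2x2x2.

def shape_after_pools(frames, height, width, pools):
    """Apply each pooling stride by integer division; return the final shape."""
    t, h, w = frames, height, width
    for (pt, ph, pw) in pools:
        t, h, w = t // pt, h // ph, w // pw
    return t, h, w

# Pooling strides as stated in the text (first layer keeps the temporal length).
C3D_POOLS = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]

if __name__ == "__main__":
    # Standard C3D clip input: 16 frames of 112 x 112.
    print(shape_after_pools(16, 112, 112, C3D_POOLS))  # (2, 7, 7)
```

Note how the temporal dimension collapses from 16 to 2: doubling the input length would double it, which is why a fixed-input C3D cannot directly handle videos of arbitrary length.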
In order to make full use of the advantages of deep learning and remedy the shortcomings of C3D for sign language motion recognition, the C3D network is improved into a corner-point-based three-dimensional convolutional neural network, C-C3D, and a sign language recognition method based on C-C3D is provided.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention designs a corner-point-based three-dimensional convolutional neural network, C-C3D, and provides a sign language recognition method based on C-C3D.
The technical scheme of the invention is as follows:
Step (1): in the feature extraction sub-network of C-C3D, a C3D network is taken as the backbone; a video of arbitrary length is taken as input, and a feature map of the original video is obtained after a series of convolution, pooling, and activation operations in the feature extraction network;
Step (2): in the temporal candidate box extraction sub-network of C-C3D, a variable-length three-dimensional convolution is designed based on actual data, virtual data, and task characteristics; the position of a candidate box is determined by a pair of corner points, and candidate temporal segments that may contain a target are extracted;
Step (3): the temporal candidate box classification and regression sub-network of C-C3D selects candidate regions from the temporal candidate box extraction sub-network, extracts a fixed-size feature from each selected candidate region, and on the basis of this feature performs category judgment and temporal box regression on the candidate region;
Step (4): the temporal candidate box extraction sub-network and the classification and regression sub-network are combined, classification and regression are unified, and a joint loss function is designed.
The invention has the beneficial effects that:
(1) during candidate region extraction, the method determines the region where a gesture may exist through a pair of corner points, and designs a variable-length three-dimensional convolution;
(2) a new action recognition network, C-C3D, is designed, which can process videos of variable length and analyze the meaning of the signed gestures in the video.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a simplified model diagram of the C-C3D-based sign language recognition method according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
As shown in FIG. 1, the model of the C-C3D-based sign language recognition method is divided into three parts: a feature extraction sub-network, a temporal candidate box extraction sub-network, and a candidate box classification and regression sub-network.
The specific flow of the C-C3D-based sign language recognition method is described in detail below:
Step (1): in the feature extraction sub-network of C-C3D, a C3D network is taken as the backbone; a video of arbitrary length is taken as input, and a feature map of the original video is obtained after a series of convolution, pooling, and activation operations in the feature extraction network;
Step (2): in the temporal candidate box extraction sub-network of C-C3D, a variable-length three-dimensional convolution is designed based on actual data, virtual data, and task characteristics; the position of a candidate box is determined by a pair of corner points, and candidate temporal segments that may contain a target are extracted;
Step (3): the temporal candidate box classification and regression sub-network of C-C3D selects candidate regions from the temporal candidate box extraction sub-network, extracts a fixed-size feature from each selected candidate region, and on the basis of this feature performs category judgment and temporal box regression on the candidate region;
Step (4): the temporal candidate box extraction sub-network and the classification and regression sub-network are combined, classification and regression are unified, and a joint loss function is designed.
The C-C3D-based sign language recognition method provided by the invention takes a C3D network as the main feature extraction network, defines a temporal candidate box through a pair of corner points, designs a variable-length three-dimensional convolution kernel, and recognizes sign language gestures in the temporal candidate box classification and regression sub-network.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (1)

1. A C-C3D-based sign language recognition method, characterized in that a C3D network is used as the main feature extraction network, a temporal candidate box is defined through a pair of corner points, a variable-length three-dimensional convolution kernel is designed, and sign language gestures are recognized in a temporal candidate box classification and regression sub-network, the method comprising the following steps:
Step (1): in the feature extraction sub-network of C-C3D, a C3D network is taken as the backbone; a video of arbitrary length is taken as input, and a feature map of the original video is obtained after a series of convolution, pooling, and activation operations in the feature extraction network;
Step (2): in the temporal candidate box extraction sub-network of C-C3D, a variable-length three-dimensional convolution is designed based on actual data, virtual data, and task characteristics; the position of a candidate box is determined by a pair of corner points, and candidate temporal segments that may contain a target are extracted;
Step (3): the temporal candidate box classification and regression sub-network of C-C3D selects candidate regions from the temporal candidate box extraction sub-network, extracts a fixed-size feature from each selected candidate region, and on the basis of this feature performs category judgment and temporal box regression on the candidate region;
Step (4): the temporal candidate box extraction sub-network and the classification and regression sub-network are combined, and classification and regression jointly form a loss function.
CN201910303476.5A 2019-04-15 2019-04-15 C-C3D-based sign language identification method Pending CN110705339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910303476.5A CN110705339A (en) 2019-04-15 2019-04-15 C-C3D-based sign language identification method


Publications (1)

Publication Number Publication Date
CN110705339A (en) 2020-01-17

Family

ID=69193119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910303476.5A Pending CN110705339A (en) 2019-04-15 2019-04-15 C-C3D-based sign language identification method

Country Status (1)

Country Link
CN (1) CN110705339A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104021373A (en) * 2014-05-27 2014-09-03 江苏大学 Semi-supervised speech feature variable factor decomposition method
US10110738B1 (en) * 2016-08-19 2018-10-23 Symantec Corporation Systems and methods for detecting illegitimate voice calls
CN108399380A (en) * 2018-02-12 2018-08-14 北京工业大学 A kind of video actions detection method based on Three dimensional convolution and Faster RCNN
CN109448307A (en) * 2018-11-12 2019-03-08 哈工大机器人(岳阳)军民融合研究院 A kind of recognition methods of fire disaster target and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hei Law et al., "CornerNet: Detecting Objects as Paired Keypoints", arXiv:1808.01244v2 [cs.CV] *
Huijuan Xu et al., "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection", Proceedings of the IEEE International Conference on Computer Vision *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178344A (en) * 2020-04-15 2020-05-19 中国人民解放军国防科技大学 Multi-scale time sequence behavior identification method
CN113255570A (en) * 2021-06-15 2021-08-13 成都考拉悠然科技有限公司 Sequential action detection method for sensing video clip relation
CN113255570B (en) * 2021-06-15 2021-09-24 成都考拉悠然科技有限公司 Sequential action detection method for sensing video clip relation

Similar Documents

Publication Publication Date Title
CN109741331B (en) Image foreground object segmentation method
JP2022548438A (en) Defect detection method and related apparatus, equipment, storage medium, and computer program product
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN108345892B (en) Method, device and equipment for detecting significance of stereo image and storage medium
CN109145872B (en) CFAR and Fast-RCNN fusion-based SAR image ship target detection method
CN106897673B (en) Retinex algorithm and convolutional neural network-based pedestrian re-identification method
CN106022375B A clothes fashion recognition method based on Hu invariant moments and support vector machines
CN106127196A Facial expression classification and recognition method based on dynamic texture features
CN107066916B (en) Scene semantic segmentation method based on deconvolution neural network
CN109543548A (en) A kind of face identification method, device and storage medium
CN110569782A (en) Target detection method based on deep learning
CN109299303B (en) Hand-drawn sketch retrieval method based on deformable convolution and depth network
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN105931241A (en) Automatic marking method for natural scene image
CN110751619A (en) Insulator defect detection method
CN102136074B (en) Man-machine interface (MMI) based wood image texture analyzing and identifying method
CN105893941B (en) A kind of facial expression recognizing method based on area image
CN110705339A (en) C-C3D-based sign language identification method
Hu et al. RGB-D image multi-target detection method based on 3D DSF R-CNN
EP2790130A1 (en) Method for object recognition
CN116109678A (en) Method and system for tracking target based on context self-attention learning depth network
Song et al. Depth-aware saliency detection using discriminative saliency fusion
Van Hoai et al. Feeding Convolutional Neural Network by hand-crafted features based on Enhanced Neighbor-Center Different Image for color texture classification
CN104504715A (en) Image segmentation method based on local quaternion-moment characteristic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200117