CN112668543A - Isolated word sign language recognition method based on hand model perception - Google Patents
Isolated word sign language recognition method based on hand model perception
- Publication number
- CN112668543A (application CN202110016997.XA)
- Authority
- CN
- China
- Prior art keywords
- hand
- sequence
- model
- joint point
- sign language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses an isolated word sign language recognition method based on hand model perception, which comprises the following steps: a hand sequence cropped from a sign language video is converted by a visual encoder into a latent semantic representation containing the hand state; a hand-model-aware decoder, working in a model-aware manner, then maps the latent semantic representation to a three-dimensional hand mesh and obtains the position of each hand joint; finally, an inference module refines the result into a spatio-temporal representation of each hand joint, which is classified to recognize the word corresponding to the hand sequence. The method fuses model-driven and data-driven approaches and introduces a hand-shape prior, improving the recognition accuracy of the system; it can also visualize the intermediate result (the three-dimensional hand mesh), enhancing the interpretability of the framework.
Description
Technical Field
The invention relates to the technical field of sign language recognition, and in particular to an isolated word sign language recognition method based on hand model perception.
Background
According to 2020 data from the World Health Organization (WHO), about 466 million people worldwide have disabling hearing loss, accounting for over 5% of the global population. Within the hearing-impaired population, the most common communication medium is sign language. Sign language is a visual language with its own linguistic characteristics. It conveys semantic information mainly through manual features (hand shape, hand movement, position, etc.), assisted by fine-grained non-manual features (facial expression, lip patterns, etc.).
To bridge the communication gap between hearing and deaf people, sign language recognition has been proposed and widely studied. It converts an input sign language video into the corresponding text by computer algorithms. Isolated word sign language recognition is the basic task among these: it recognizes an input sign language video as the single word the video depicts. The general recognition process first extracts a representation from the input sign language video, then transforms the representation into a probability vector, and takes the category with the maximum probability as the recognition result.
The hand plays a dominant role in sign language expression, yet occupies only a small spatial region and is highly articulated. Compared with the body and face, hands have similar appearance across signers and fewer locally discriminative features. In sign language video, hands often exhibit motion blur and self-occlusion, against complex backgrounds.
Early work typically employed hand-crafted features to describe gestures. With recent advances in deep learning and hardware computing power, deep-learning-based sign language recognition systems have gradually become dominant. A typical pipeline extracts a representation with a convolutional neural network (CNN), converts it into a probability vector through a fully connected layer and a Softmax layer, and takes the category with the maximum probability as the recognition result. Some recent work crops out the hands as an additional auxiliary branch and achieves some performance gains. These deep-learning-based methods all follow a data-driven paradigm in which features are learned under the supervision of video category labels. However, purely data-driven sign language recognition has two problems: limited interpretability, and easy overfitting on limited training data. Since labeling sign language data requires professional knowledge, existing sign language datasets have fewer samples per category than action recognition datasets, so the recognition accuracy of existing schemes still leaves room for improvement.
Disclosure of Invention
The invention aims to provide an isolated word sign language recognition method based on hand model perception, which can improve the recognition accuracy of a system and enhance the interpretability of a recognition framework.
The purpose of the invention is realized by the following technical scheme:
An isolated word sign language recognition method based on hand model perception comprises the following steps:
for a hand sequence cropped from a sign language video, converting the hand sequence into a latent semantic representation containing the hand state through a visual encoder; then, working in a model-aware manner through a hand-model-aware decoder, mapping the latent semantic representation to a three-dimensional hand mesh and obtaining the position of each hand joint; and finally, refining through an inference module to obtain the spatio-temporal representation of each hand joint, and classifying to recognize the word corresponding to the hand sequence.
As can be seen from the above technical scheme, the invention fuses model-driven and data-driven approaches and introduces a hand-shape prior, improving the recognition accuracy of the system; the intermediate result (the three-dimensional hand mesh) can be visualized, enhancing the interpretability of the framework.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a frame diagram of an isolated word sign language recognition method based on hand model perception according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the technical problems in the prior art, the embodiment of the invention provides an isolated word sign language recognition method based on hand model perception, which fuses model-driven and data-driven approaches, introduces a hand model prior, improves the recognition accuracy of the system and enhances its interpretability. Fig. 1 shows the framework of the method. The main recognition process is as follows: a hand sequence cropped from the sign language video is converted by a visual encoder into a latent semantic representation containing the hand state; then, working in a model-aware manner, a hand-model-aware decoder maps the latent semantic representation to a three-dimensional hand mesh and obtains the position of each hand joint; finally, an inference module refines the result into a spatio-temporal representation of each hand joint, which is classified to recognize the word corresponding to the hand sequence. A compact sketch of this pipeline is given below.
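The three-stage flow can be sketched in code as follows; the module interfaces and tensor shapes are illustrative assumptions, not specifics of the embodiment:

```python
import torch

def recognize(hand_seq, encoder, mano_decoder, inference_module):
    """Hand-model-aware recognition pipeline (illustrative sketch).

    hand_seq: (T, 3, H, W) tensor of RGB hand crops from a sign video.
    """
    # 1) Visual encoder: per-frame latent hand state (pose theta, shape beta)
    #    plus weak-perspective camera parameters c = (c_r, c_o, c_s).
    theta, beta, cam = encoder(hand_seq)          # e.g. (T, 48), (T, 10), (T, 4)
    # 2) Model-aware decoder: latent state -> 3D hand mesh and joints; the
    #    MANO prior implicitly filters out implausible poses.
    mesh, joints3d = mano_decoder(theta, beta)    # (T, 778, 3), (T, 21, 3)
    # 3) Inference module: spatio-temporal graph reasoning over the joint
    #    sequence, followed by video-level classification.
    logits = inference_module(joints3d)           # (num_words,)
    return int(logits.argmax())                   # index of the recognized word
```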
For ease of understanding, the various components of the recognition framework and the corresponding training and testing process are described in detail below in conjunction with the framework diagram shown in FIG. 1.
Firstly, the framework structure.
1. Visual Encoder.
In the embodiment of the invention, the input of the visual encoder is a hand sequence of T frames cropped from the sign language video, V′ = {v_t}_{t=1}^T. The visual encoder converts the hand sequence V′ into a latent semantic representation, denoted as:

{(θ_t, β_t, c_t)}_{t=1}^T = E(V′)

where E(·) denotes the visual encoder; v_t is the hand image at time t, and T is the length of the hand sequence; θ and β represent the hand state, namely the representations of hand pose and hand shape respectively; c_r, c_o and c_s are the components of the camera parameter c, indicating rotation, translation and scale respectively.
In the embodiment of the invention, the hand sequence V′ is an RGB hand sequence; it can be cropped from the sign language video in a conventional manner, and the datasets involved in the training and testing stages are likewise hand sequences cropped from sign language videos.
Illustratively, the visual encoder may be implemented by appending fully connected layers to the end of a ResNet.
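A minimal sketch of such an encoder, assuming a ResNet-18 backbone and illustrative output dimensions (48-d pose, 10-d shape, 4 camera parameters, as is common in MANO-based pipelines; these dimensions are assumptions, not specified by the embodiment):

```python
import torch
import torch.nn as nn
import torchvision

class VisualEncoder(nn.Module):
    """ResNet backbone plus a fully connected head predicting the per-frame
    hand state (theta, beta) and camera parameters (c_r, c_o, c_s)."""

    def __init__(self, pose_dim=48, shape_dim=10, cam_dim=4):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()              # keep the 512-d pooled feature
        self.backbone = backbone
        self.head = nn.Linear(512, pose_dim + shape_dim + cam_dim)
        self.dims = (pose_dim, shape_dim, cam_dim)

    def forward(self, frames):                   # frames: (T, 3, H, W)
        feat = self.backbone(frames)             # (T, 512)
        theta, beta, cam = torch.split(self.head(feat), self.dims, dim=-1)
        return theta, beta, cam
```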
2. Hand Model-aware Decoder.
The hand-model-aware decoder implements the mapping from the latent semantic feature vector to a compact pose representation in a model-aware manner. It constrains the distribution of plausible poses through a pre-encoded hand prior, implicitly filtering out unreasonable poses during the mapping. As a result, it generates more compact and highly reliable hand poses, which reduces the optimization difficulty for the downstream inference module.
In the embodiment of the present invention, the hand-model-aware decoder is a statistical hand model; for example, the differentiable MANO hand model can be used as the hand-model-aware decoder.
The hand-model-aware decoder is learned in advance from a large number of high-quality hand scans, from which a hand template T̄ is obtained; in this way the hand prior is encoded. At the same time, it establishes a compact mapping for describing hands, from the low-dimensional semantic vector (the latent feature vector) to a high-dimensional triangular hand mesh (containing 778 vertices and 1,538 faces).
The mapping process of the hand model aware decoder is represented as:
M(β,θ)=W(T(β,θ),J(β),θ,W′)
where T(β, θ) = T̄ + B_S(β) + B_P(θ) denotes the hand template T̄ corrected by the blend functions B_S(·) and B_P(·), driven by the hand shape and pose representations β and θ; W′ is the blend weight; J(β) gives the locations of the hand joints provided by the hand-model-aware decoder for shape β; W(·) denotes the linear blend skinning algorithm; and M(β, θ) is the resulting three-dimensional hand mesh (3D Mesh).
Meanwhile, the more compact three-dimensional hand joint (3D Joint) positions can be read off by linear interpolation of the relevant mesh vertices. Since the MANO hand model provides only 16 hand joints, 5 fingertips are additionally extracted from the three-dimensional hand mesh, yielding 21 hand joints in total.
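A sketch of this extraction, assuming a differentiable MANO layer whose forward pass returns the 778-vertex mesh together with the 16 model joints; the fingertip vertex indices below are illustrative placeholders whose exact values depend on the mesh topology used:

```python
import torch

# Vertex indices of the five fingertips on the 778-vertex MANO mesh
# (thumb..pinky). Assumed values -- verify against the actual mesh topology.
FINGERTIP_VERTS = [745, 317, 444, 556, 673]

def extract_21_joints(mano_layer, theta, beta):
    """Map the latent hand state to a 3D mesh and 21 joint positions.

    mano_layer is assumed to be a differentiable MANO model returning
    (vertices: (T, 778, 3), joints: (T, 16, 3)).
    """
    verts, joints16 = mano_layer(theta, beta)
    tips = verts[:, FINGERTIP_VERTS, :]                 # (T, 5, 3) fingertips
    return verts, torch.cat([joints16, tips], dim=1)    # mesh, (T, 21, 3) joints
```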
The hand-model-aware decoder can also expose its intermediate result, i.e. the reconstructed three-dimensional hand mesh, which enhances the interpretability of the framework.
3. Inference Module.
The three-dimensional pose sequence (consisting of the three-dimensional hand joint positions in the T hand images) predicted by the hand-model-aware decoder may still contain some unsatisfactory results. The inference module is used to further refine the spatio-temporal representation of the hand pose. Through adaptive attention computation, the inference module captures the most discriminative cues and performs video-level classification.
A hand pose sequence is structured data with natural physical connections between the joints, so it can be naturally organized as a spatio-temporal graph. In embodiments of the invention, a graph convolutional network (GCN), which has proven effective at processing graph-structured data, is used, after which video-level classification is performed by a classification output layer.
Denote the hand joint position sequence output by the hand-model-aware decoder as J_3D. The corresponding undirected spatio-temporal graph G(V, E) is defined by a node set V and an edge set E, where the node set V contains all hand joint positions and the edge set E contains intra-frame and inter-frame connections, i.e. the physical connections between hand joints within a frame and the connections of the same joint across time. The adjacency matrix Ā obtained from the edge set E, together with the identity matrix I, is used in the graph convolutional layers, and the graph convolution is expressed as:

f_out = Σ_k W_k f_in (T_k ⊙ M),  T_k = D_k^(−1/2) A_k D_k^(−1/2)

where k indexes the group to which a neighborhood node belongs, and W_k is the convolution kernel weight; Ā + I is decomposed into K sub-matrices, i.e. Ā + I = Σ_k A_k, each sub-matrix A_k representing one group of connections after decomposition; T_k is an intermediate variable computed from the normalization matrix D_k, whose entries satisfy D_k^(mm) = Σ_n A_k^(mn) (m and n index the rows and columns of D_k); M is a learnable attention weight; and ⊙ denotes the Hadamard product. Information about the hand joints is propagated along the edges, yielding the spatio-temporal representation of each hand joint (containing not only position information but also certain semantic information). Further, taking the Hadamard product of A_k with the attention weight M, which is learnable and initialized to all ones, helps the network capture discriminative cues.
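A sketch of one graph convolutional layer implementing the decomposed-adjacency formulation above; the tensor shapes and degree-normalization details are assumptions consistent with common ST-GCN practice:

```python
import torch
import torch.nn as nn

class HandGraphConv(nn.Module):
    """One spatial graph convolution: f_out = sum_k W_k f_in (T_k ⊙ M),
    where T_k = D_k^(-1/2) A_k D_k^(-1/2) normalizes the k-th sub-adjacency."""

    def __init__(self, adj_subsets, in_ch, out_ch):
        super().__init__()
        # adj_subsets: (K, N, N) decomposition of A_bar + I into K groups.
        norm = []
        for A_k in adj_subsets:
            d = A_k.sum(dim=1).clamp(min=1e-6)           # node degrees D_k^(mm)
            D_inv_sqrt = torch.diag(d.pow(-0.5))
            norm.append(D_inv_sqrt @ A_k @ D_inv_sqrt)   # T_k
        self.register_buffer('Tk', torch.stack(norm))    # (K, N, N)
        # Learnable attention mask M, initialized to all ones (Hadamard factor).
        self.M = nn.Parameter(torch.ones_like(self.Tk))
        self.K, self.out_ch = len(norm), out_ch
        self.conv = nn.Conv2d(in_ch, out_ch * self.K, kernel_size=1)  # W_k

    def forward(self, x):                                # x: (B, C, T, N)
        B, _, T, N = x.shape
        x = self.conv(x).view(B, self.K, self.out_ch, T, N)
        # Propagate joint information along masked, normalized edges.
        return torch.einsum('bkctn,knm->bctm', x, self.Tk * self.M)
```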
In the embodiment of the invention, after several stacked graph convolutional layers, a classification output layer performs the classification, thereby recognizing the word corresponding to the hand sequence.
Secondly, model training.
In the embodiment of the invention, the visual encoder, the hand-model-aware decoder and the inference module together form the recognition model. Since sign language datasets carry no hand pose annotations, in the training stage (Training Stage), besides the cross-entropy classification loss L_cls (Classification Loss), corresponding loss functions (weakly supervised losses based on the spatial and temporal relations of the intermediate hand poses) are designed on the outputs of each stage to guide the learning of the intermediate pose representation. In the training stage, the overall loss function of the recognition model is expressed as:

L = L_cls + λ_spa·L_spa + λ_tem·L_tem + λ_reg·L_reg

where L_cls denotes the cross-entropy classification loss of the inference module; L_spa and L_tem denote the spatial and temporal consistency losses on the hand joint positions obtained by the hand-model-aware decoder; L_reg denotes the regularization loss on the hand state in the latent semantic representation obtained by the visual encoder; and λ_spa, λ_tem and λ_reg are the weighting factors of the corresponding losses.
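A sketch of this overall objective; the λ and w_β values are illustrative placeholders, and spatial_consistency_loss / temporal_consistency_loss refer to the sketches given with the corresponding losses below:

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, j2d_proj, j2d_pseudo, conf, j3d, theta, beta,
               lambda_spa=1.0, lambda_tem=1.0, lambda_reg=0.01, w_beta=10.0):
    """L = L_cls + lambda_spa*L_spa + lambda_tem*L_tem + lambda_reg*L_reg.
    All weight values here are illustrative, not taken from the patent."""
    l_cls = F.cross_entropy(logits, labels)                        # L_cls
    l_spa = spatial_consistency_loss(j2d_proj, j2d_pseudo, conf)   # sketch below
    l_tem = temporal_consistency_loss(j3d)                         # sketch below
    # Regularization of the latent hand state: L_reg = ||theta||^2 + w_beta*||beta||^2.
    l_reg = theta.pow(2).sum(dim=-1).mean() + w_beta * beta.pow(2).sum(dim=-1).mean()
    return l_cls + lambda_spa * l_spa + lambda_tem * l_tem + lambda_reg * l_reg
```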
The training process itself can be carried out in a conventional manner based on the total loss function and the parameters of the recognition model.
1. Regularization Loss.
To ensure that the hand model works properly and generates plausible hand meshes, a regularization loss is used to further constrain the magnitudes of some of the latent features. The regularization loss L_reg is expressed as:

L_reg = ‖θ‖₂² + w_β·‖β‖₂²

where w_β is a weighting factor.
2. Spatial Consistency Loss.
In the embodiment of the invention, based on a weak-perspective camera model, the three-dimensional pose sequence predicted by the hand-model-aware decoder is mapped to two-dimensional space using the camera parameters output by the visual encoder. The mapping process is expressed as:

Ĵ_2D = c_s · Π(c_r J_3D) + c_o

where Π(·) denotes orthographic projection, and Ĵ_2D is the sequence of two-dimensional positions obtained by mapping the hand joint positions J_3D output by the hand-model-aware decoder with the camera parameters.

Meanwhile, a two-dimensional hand joint position sequence J_2D (2D Joints) is extracted in advance from the hand sequence by a two-dimensional pose detector (2D Hand Pose Detector) and used as a pseudo label, pulling the mapped result Ĵ_2D towards consistency with it. The spatial consistency loss L_spa is expressed as:

L_spa = (1/(N·T)) Σ_{t=1}^{T} Σ_{j=1}^{N} 1(c(t,j) ≥ ε) · ‖Ĵ_2D(t,j) − J_2D(t,j)‖₂²

where N is the total number of hand joints (e.g. N = 21); T is the length of the hand sequence; (t, j) indexes the j-th hand joint at time t; c(t, j) is the confidence of the pre-extracted j-th hand joint position at time t — if c(t, j) is greater than or equal to the threshold ε, that joint position participates in the computation of the spatial consistency loss L_spa, and otherwise it does not; and 1(·) denotes the indicator function.
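A sketch of this confidence-gated spatial loss, assuming the projected joints Ĵ_2D and the detector pseudo-labels J_2D are given as (T, N, 2) tensors with per-joint confidences of shape (T, N); the threshold value is an assumption:

```python
import torch

def spatial_consistency_loss(j2d_proj, j2d_pseudo, conf, eps=0.5):
    """L_spa: squared distance between projected 3D joints and 2D pseudo-labels,
    counting only detections with confidence c(t, j) >= eps (eps assumed)."""
    mask = (conf >= eps).float()                       # indicator 1(c(t,j) >= eps)
    err = (j2d_proj - j2d_pseudo).pow(2).sum(dim=-1)   # (T, N) squared 2D errors
    return (mask * err).sum() / err.numel()            # normalize by N*T
```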
3. Temporal Consistency Loss.
To avoid prediction jitter, the temporal consistency of the predicted three-dimensional joints is further constrained. During signing, different hand joints usually move at different speeds: joints closer to the palm usually move more slowly. The hand joints are therefore divided into three groups {S_i | i = 0, 1, 2}, corresponding to the palm, middle and end joint sets respectively. The temporal consistency loss L_tem is expressed as:

L_tem = Σ_{i=0}^{2} α_i Σ_{t=1}^{T−1} Σ_{j∈S_i} ‖J_3D(t+1, j) − J_3D(t, j)‖₂²

where J_3D denotes the hand joint position sequence output by the hand-model-aware decoder; (t, j) indexes the j-th hand joint at time t; S_i is a hand joint set; and α_i is the penalty weight predefined for set S_i — sets with slower motion are given larger penalty weights.
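A sketch of this grouped temporal smoothness term; the joint grouping and α_i values are illustrative assumptions (the embodiment specifies only that slower groups receive larger weights):

```python
import torch

def temporal_consistency_loss(j3d, groups=None, alphas=(4.0, 2.0, 1.0)):
    """L_tem: penalize frame-to-frame displacement of predicted 3D joints
    (T, N, 3), with larger weights for slower groups (palm > middle > end)."""
    if groups is None:
        # Illustrative partition of the 21 joints into palm / middle / end sets.
        groups = [list(range(0, 6)), list(range(6, 14)), list(range(14, 21))]
    vel = j3d[1:] - j3d[:-1]                           # (T-1, N, 3) displacements
    loss = j3d.new_zeros(())
    for a, idx in zip(alphas, groups):
        loss = loss + a * vel[:, idx].pow(2).sum(dim=-1).mean()
    return loss
```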
Thirdly, testing.
The testing stage (Testing Stage) follows the same main flow as the training stage; the main difference is that the testing stage needs neither the camera parameters nor the loss computation. The main flow of the testing stage is: input the cropped hand video sequence, obtain the latent semantic representation of the hand state through the visual encoder, obtain the corresponding three-dimensional hand mesh through the hand-model-aware decoder, and finally refine through the inference module to obtain the spatio-temporal representation of each hand joint, on which video-level classification is performed to output the corresponding word.
As shown on the right side of Fig. 1, for an input hand sequence the classification output layer of the inference module yields the probabilities of the candidate words, and the category with the maximum probability is selected.
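At test time this amounts to a softmax over the inference module's logits followed by an argmax, e.g.:

```python
import torch

def classify(logits, vocab):
    """Select the word with maximum probability (right side of Fig. 1)."""
    probs = torch.softmax(logits, dim=-1)
    idx = int(probs.argmax())
    return vocab[idx], float(probs[idx])
```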
According to the scheme of the embodiment of the invention, model-driven and data-driven approaches can be fused and a hand-shape prior introduced, improving the recognition accuracy of the system; the intermediate result can be visualized, enhancing the interpretability of the framework.
Through the above description of the embodiments, it will be clear to those skilled in the art that the above embodiments can be implemented by software, or by software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (a CD-ROM, USB flash drive, removable hard disk, etc.) and includes several instructions for enabling a computing device (a personal computer, server, network device, etc.) to execute the methods of the embodiments of the present invention.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to perform all or part of the above described functions.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. An isolated word sign language recognition method based on hand model perception, characterized by comprising the following steps:
for a hand sequence cropped from a sign language video, converting the hand sequence into a latent semantic representation containing the hand state through a visual encoder; then, working in a model-aware manner through a hand-model-aware decoder, mapping the latent semantic representation to a three-dimensional hand mesh and obtaining the position of each hand joint; and finally, refining through an inference module to obtain the spatio-temporal representation of each hand joint, and classifying to recognize the word corresponding to the hand sequence.
2. The isolated word sign language recognition method based on hand model perception according to claim 1, wherein the input of the visual encoder is a hand sequence V′ = {v_t}_{t=1}^T cropped from the sign language video, and the visual encoder converts the hand sequence V′ into a latent semantic representation, denoted as:

{(θ_t, β_t, c_t)}_{t=1}^T = E(V′)

where E(·) denotes the visual encoder; v_t is the hand image at time t, and T is the length of the hand sequence; θ and β represent the hand state, namely the representations of hand pose and hand shape respectively; c_r, c_o and c_s are camera parameters indicating rotation, translation and scale respectively.
3. The isolated word sign language recognition method based on hand model perception according to claim 1, wherein
the hand-model-aware decoder is a statistical model, learned in advance from hand scan data, and its mapping process is expressed as:
M(β,θ)=W(T(β,θ),J(β),θ,W′)
where T(β, θ) = T̄ + B_S(β) + B_P(θ) denotes the pre-learned hand template T̄ corrected by the blend functions B_S(·) and B_P(·), driven by the hand pose and shape representations θ and β, which represent the hand state; W′ is the blend weight; W(·) denotes the linear blend skinning algorithm; M(β, θ) is the three-dimensional hand mesh; J(β) gives the locations of the hand joints provided by the hand-model-aware decoder;
and the hand joint positions, comprising a plurality of hand joints and 5 fingertip points, are obtained from the three-dimensional hand mesh M(β, θ).
4. The isolated word sign language recognition method based on hand model perception according to claim 1, wherein the inference module comprises graph convolutional network layers and a classification output layer;
the hand joint position sequence output by the hand-model-aware decoder is denoted as J_3D; the corresponding undirected spatio-temporal graph G(V, E) is defined by a node set V and an edge set E, where the node set V contains all hand joint positions and the edge set E contains intra-frame and inter-frame connections, i.e. the physical connections between hand joints within a frame and the connections of the same joint across time; the adjacency matrix Ā obtained from the edge set E, together with the identity matrix I, is used in the graph convolutional layers, and the graph convolution is expressed as:

f_out = Σ_k W_k f_in (T_k ⊙ M),  T_k = D_k^(−1/2) A_k D_k^(−1/2)

where k indexes the group to which a neighborhood node belongs, and W_k is the convolution kernel weight; Ā + I is decomposed into K sub-matrices, i.e. Ā + I = Σ_k A_k, each sub-matrix A_k representing one group of connections after decomposition; T_k is an intermediate variable computed from the normalization matrix D_k, whose entries satisfy D_k^(mm) = Σ_n A_k^(mn), m and n being the row and column indices of D_k; M is a learnable attention weight; and ⊙ denotes the Hadamard product;
information about the hand joints is propagated along the edges, thereby obtaining the spatio-temporal representation of each hand joint;
and after several stacked graph convolutional layers, the classification output layer performs classification, thereby recognizing the word corresponding to the hand sequence.
5. The isolated word sign language recognition method based on hand model perception according to any one of claims 1 to 4, wherein the visual encoder, the hand-model-aware decoder and the inference module together form a recognition model, and in the training stage the overall loss function of the recognition model is expressed as:

L = L_cls + λ_spa·L_spa + λ_tem·L_tem + λ_reg·L_reg

where L_cls denotes the cross-entropy classification loss of the inference module; L_spa and L_tem denote the spatial and temporal consistency losses on the hand joint positions obtained by the hand-model-aware decoder; L_reg denotes the regularization loss on the hand state in the latent semantic representation obtained by the visual encoder; and λ_spa, λ_tem and λ_reg are the weighting factors of the corresponding losses.
7. The isolated word sign language recognition method based on hand model perception according to claim 5, wherein the spatial consistency loss L_spa is expressed as:

L_spa = (1/(N·T)) Σ_{t=1}^{T} Σ_{j=1}^{N} 1(c(t,j) ≥ ε) · ‖Ĵ_2D(t,j) − J_2D(t,j)‖₂²

where N is the total number of hand joints; T is the length of the hand sequence; Ĵ_2D is the sequence of positions obtained by mapping the hand joint positions J_3D output by the hand-model-aware decoder to two-dimensional space using the camera parameters; J_2D is the two-dimensional hand joint sequence extracted in advance from the hand sequence and used as a pseudo label; (t, j) indexes the j-th hand joint at time t; c(t, j) is the confidence of the pre-extracted j-th hand joint position at time t, and if c(t, j) is greater than or equal to the threshold ε, that joint position participates in the computation of the spatial consistency loss L_spa, otherwise it does not; 1(·) denotes the indicator function;

and the process of mapping the hand joint positions J_3D output by the hand-model-aware decoder to two-dimensional space using the camera parameters is expressed as:

Ĵ_2D = c_s · Π(c_r J_3D) + c_o

where Π(·) denotes orthographic projection, and c_r, c_o and c_s are camera parameters indicating rotation, translation and scale respectively.
8. The isolated word sign language recognition method based on hand model perception according to claim 5, wherein the temporal consistency loss L_tem is expressed as:

L_tem = Σ_{i=0}^{2} α_i Σ_{t=1}^{T−1} Σ_{j∈S_i} ‖J_3D(t+1, j) − J_3D(t, j)‖₂²

where J_3D denotes the hand joint position sequence output by the hand-model-aware decoder; (t, j) indexes the j-th hand joint at time t; S_i is a hand joint set, and {S_i | i = 0, 1, 2} corresponds to the palm, middle and end joint sets respectively; α_i is the penalty weight predefined for set S_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110016997.XA CN112668543B (en) | 2021-01-07 | 2021-01-07 | Isolated word sign language recognition method based on hand model perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110016997.XA CN112668543B (en) | 2021-01-07 | 2021-01-07 | Isolated word sign language recognition method based on hand model perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112668543A (en) | 2021-04-16
CN112668543B CN112668543B (en) | 2022-07-15 |
Family
ID=75413421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110016997.XA Active CN112668543B (en) | 2021-01-07 | 2021-01-07 | Isolated word sign language recognition method based on hand model perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112668543B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239835A (en) * | 2021-05-20 | 2021-08-10 | 中国科学技术大学 | Model-aware gesture migration method |
CN113239834A (en) * | 2021-05-20 | 2021-08-10 | 中国科学技术大学 | Sign language recognition system capable of pre-training sign model perception representation |
2021-01-07: Application CN202110016997.XA filed in China; granted as CN112668543B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7418390B1 (en) * | 2000-11-20 | 2008-08-26 | Yahoo! Inc. | Multi-language system for online communications |
CN111145865A (en) * | 2019-12-26 | 2020-05-12 | 中国科学院合肥物质科学研究院 | Vision-based hand fine motion training guidance system and method |
CN111325099A (en) * | 2020-01-21 | 2020-06-23 | 南京邮电大学 | Sign language identification method and system based on double-current space-time diagram convolutional neural network |
CN111832468A (en) * | 2020-07-09 | 2020-10-27 | 平安科技(深圳)有限公司 | Gesture recognition method and device based on biological recognition, computer equipment and medium |
Non-Patent Citations (2)
Title |
---|
TAORAN LE et al.: "A novel chipless RFID-based stretchable and wearable hand gesture sensor", 2015 European Microwave Conference (EuMC) *
LI Guoyou et al.: "Improvement and implementation of a Kinect-based dynamic gesture recognition algorithm", High Technology Letters *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239835A (en) * | 2021-05-20 | 2021-08-10 | 中国科学技术大学 | Model-aware gesture migration method |
CN113239834A (en) * | 2021-05-20 | 2021-08-10 | 中国科学技术大学 | Sign language recognition system capable of pre-training sign model perception representation |
CN113239835B (en) * | 2021-05-20 | 2022-07-15 | 中国科学技术大学 | Model-aware gesture migration method |
CN113239834B (en) * | 2021-05-20 | 2022-07-15 | 中国科学技术大学 | Sign language recognition system capable of pre-training sign model perception representation |
Also Published As
Publication number | Publication date |
---|---|
CN112668543B (en) | 2022-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Towards natural and accurate future motion prediction of humans and animals | |
Zellinger et al. | Robust unsupervised domain adaptation for neural networks via moment alignment | |
Niculae et al. | A regularized framework for sparse and structured neural attention | |
CN107578014B (en) | Information processing apparatus and method | |
CN109948475B (en) | Human body action recognition method based on skeleton features and deep learning | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN111368993A (en) | Data processing method and related equipment | |
CN111539941B (en) | Parkinson's disease leg flexibility task evaluation method and system, storage medium and terminal | |
CN112561064A (en) | Knowledge base completion method based on OWKBC model | |
CN112668543B (en) | Isolated word sign language recognition method based on hand model perception | |
CN110321805B (en) | Dynamic expression recognition method based on time sequence relation reasoning | |
CN110968235B (en) | Signal processing device and related product | |
Halvardsson et al. | Interpretation of swedish sign language using convolutional neural networks and transfer learning | |
Irfan et al. | Enhancing learning classifier systems through convolutional autoencoder to classify underwater images | |
CN113780059A (en) | Continuous sign language identification method based on multiple feature points | |
Kwolek et al. | Recognition of JSL fingerspelling using deep convolutional neural networks | |
CN113436224B (en) | Intelligent image clipping method and device based on explicit composition rule modeling | |
CN114241606A (en) | Character interaction detection method based on adaptive set learning prediction | |
Jiang et al. | Cross-level reinforced attention network for person re-identification | |
CN114882493A (en) | Three-dimensional hand posture estimation and recognition method based on image sequence | |
CN109409246B (en) | Sparse coding-based accelerated robust feature bimodal gesture intention understanding method | |
Chen et al. | MSTP-net: Multiscale spatio-temporal parallel networks for human motion prediction | |
CN114581829A (en) | Continuous sign language identification method based on reinforcement learning, electronic equipment and storage medium | |
CN110555401B (en) | Self-adaptive emotion expression system and method based on expression recognition | |
Yu et al. | Multi-activity 3D human motion recognition and tracking in composite motion model with synthesized transition bridges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |