CN110992783A - Sign language translation method and translation equipment based on machine learning - Google Patents

Sign language translation method and translation equipment based on machine learning

Info

Publication number
CN110992783A
Authority
CN
China
Prior art keywords
sign language
information
hearing-impaired person
initial image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911039201.1A
Other languages
Chinese (zh)
Inventor
黄昌正
周言明
陈曦
王帅威
陈永乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Yilian Interation Information Technology Co ltd
Original Assignee
Dongguan Yilian Interation Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Yilian Interation Information Technology Co ltd filed Critical Dongguan Yilian Interation Information Technology Co ltd
Priority to CN201911039201.1A
Publication of CN110992783A
Legal status: Pending

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The embodiment of the invention relates to the technical field of machine learning, and discloses a sign language translation method and translation equipment based on machine learning. The method comprises the following steps: controlling a depth camera to shoot an initial image; recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework; matching with an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information; intelligently combining the character phrases into character sentences; and outputting the character sentences corresponding to the sign language information. In this way, the sign language actions made by the hearing-impaired person are accurately translated into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.

Description

Sign language translation method and translation equipment based on machine learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a sign language translation method and translation equipment based on machine learning.
Background
As a visual language, sign language can help deaf-mute people express their own thoughts and establishes a communication channel between deaf-mute people and hearing people, helping deaf-mute people integrate into society.
However, unlike spoken languages such as Chinese and English, sign language is rarely mastered outside the deaf-mute community and people engaged in related work; ordinary people who have never been exposed to sign language have difficulty understanding the actual meaning of the signs made by deaf-mute people, so deaf-mute people face great obstacles in communicating in society. Although a number of translation devices currently exist on the market, methods that recognize sign language actions by looking them up in a stored sign language phrase database have a low recognition rate and are inconvenient to use.
Disclosure of Invention
The embodiment of the invention discloses a sign language translation method and translation equipment based on machine learning, which can accurately translate the sign language actions made by a hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
The embodiment of the invention discloses a sign language translation method based on machine learning in a first aspect, which comprises the following steps:
controlling a depth camera to shoot an initial image;
recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
intelligently combining the plurality of character phrases into character sentences;
and outputting the character sentence corresponding to the sign language information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, before the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework, the method further includes:
recognizing a face image of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
and if so, executing the step of adopting a continuous gesture recognition framework to recognize the sign language information of the hearing impaired person in the initial image.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework includes:
extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
extracting motion transformation information corresponding to each gesture information by adopting a three-dimensional convolution network to serve as the dynamic sign language information of the initial image;
and integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the recognizing sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework and before obtaining a plurality of text phrases corresponding to the sign language information by using algorithm matching model matching, the method further includes:
determining a regional characteristic matched with the sign language information;
and acquiring an algorithm matching model corresponding to the region characteristics.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
collecting audio information of a speaker;
identifying character information corresponding to the audio information;
processing the text information corresponding to the audio information into a plurality of text phrases;
matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and outputting the sign language animation corresponding to the plurality of character phrases.
A second aspect of the embodiments of the present invention discloses a translation apparatus, including:
the shooting unit is used for controlling the depth camera to shoot an initial image;
the sign language recognition unit is used for recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
the phrase matching unit is used for matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
the phrase combination unit is used for intelligently combining the plurality of character phrases into character sentences;
and the character output unit is used for outputting the character sentences corresponding to the sign language information.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the face recognition unit is used for recognizing a face image of the hearing-impaired person in the initial image before the sign language recognition unit adopts a continuous gesture recognition framework to recognize sign language information of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
the sign language detection unit is used for detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
the sign language identification unit is specifically configured to identify sign language information of the hearing-impaired person in the initial image by using a continuous gesture identification framework when the sign language detection unit detects that the hearing-impaired person performs sign language expression.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the sign language recognition unit includes:
the two-dimensional convolution subunit is used for extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
the three-dimensional convolution subunit is used for extracting action transformation information corresponding to each gesture information by adopting a three-dimensional convolution network, and the action transformation information is used as the dynamic sign language information of the initial image;
and the data synthesis subunit is used for synthesizing the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the feature recognition unit is used for recognizing regional features matched with the sign language information after the sign language recognition unit recognizes the sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework and before the phrase matching unit obtains a plurality of character phrases corresponding to the sign language information by adopting algorithm matching model matching;
and the model selection unit is used for selecting the algorithm matching model corresponding to the regional characteristics for matching the phrases.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the audio acquisition unit is used for acquiring the audio information of a speaker;
the audio conversion unit is used for identifying the character information corresponding to the audio information;
the word processing unit is used for processing the word information corresponding to the audio information into a plurality of word groups;
the sign language matching unit is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and the sign language output unit is used for outputting sign language animations corresponding to the plurality of character phrases.
A third aspect of the embodiments of the present invention discloses a translation apparatus, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute part of the steps of the sign language translation method based on machine learning disclosed by the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute all or part of the steps of the machine learning based sign language translation method disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the depth camera is controlled to shoot an initial image; recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework; matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information; intelligently combining a plurality of character phrases into character sentences; and outputting the character sentence corresponding to the sign language information. Therefore, sign language information of the hearing-impaired people is recognized by adopting the continuous gesture recognition framework, and the sign language information is converted into text sentences for output, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired people, and the communication of the hearing-impaired people in the society is facilitated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart diagram of a sign language translation method based on machine learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for machine learning-based sign language translation according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another translation apparatus disclosed in the embodiments of the present invention;
fig. 5 is a schematic structural diagram of another translation apparatus disclosed in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", "third" and "fourth" etc. in the description and claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a sign language translation method and translation equipment based on machine learning, which can accurately translate the sign language actions made by a hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example one
Referring to fig. 1, as shown in fig. 1, a sign language translation method based on machine learning according to an embodiment of the present invention may include the following steps.
101. And controlling the depth camera to shoot an initial image.
In the embodiment of the invention, the translation equipment used for translating sign language is provided with a depth camera, and the depth camera shoots a depth image including the hearing-impaired person as the initial image; compared with an ordinary camera, a depth camera additionally acquires depth information of the shot object, namely the position and size of the shot object in a three-dimensional coordinate system.
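As a brief illustration of how a depth pixel yields the 3-D position information mentioned above, the snippet below back-projects a pixel using pinhole-camera intrinsics; the intrinsic values and the placeholder depth frame are assumptions made for demonstration, not parameters of any particular depth camera or of the patent's equipment.

```python
import numpy as np

# Assumed pinhole intrinsics of the depth camera (fx, fy: focal lengths; cx, cy: principal point).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

def pixel_to_3d(u: int, v: int, depth_mm: float) -> np.ndarray:
    """Back-project image pixel (u, v) with measured depth into camera-space XYZ (millimetres)."""
    z = depth_mm
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

depth_frame = np.full((480, 640), 1500.0)            # placeholder depth image: everything 1.5 m away
print(pixel_to_3d(400, 260, depth_frame[260, 400]))  # 3-D position of one pixel on the subject
```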
102. And recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework.
In the embodiment of the invention, the sign language information of the hearing-impaired person in the initial image is recognized, wherein the sign language information comprises static sign language information, such as the hearing-impaired person's body posture information and gesture information, and dynamic sign language information produced when the hearing-impaired person changes gesture actions.
As an optional implementation manner, a two-dimensional convolution network is adopted to extract a plurality of pieces of body posture information and gesture information included in the initial image as the static sign language information of the initial image; a three-dimensional convolution network is adopted to extract the motion transformation information corresponding to each piece of gesture information as the dynamic sign language information of the initial image; and the static sign language information and the dynamic sign language information are integrated to obtain the sign language information of the hearing-impaired person in the initial image. Specifically, the sign language information is recognized with a machine learning framework such as the LS-HAN continuous gesture recognition framework. A traditional gesture recognition method needs to segment the initial image in time beforehand, split it into a number of frame images and then perform gesture recognition on those images, which consumes a large amount of time; in addition, if the time segmentation is inaccurate, frames in which the hearing-impaired person is transitioning between gestures may be cut, causing gestures to be misjudged and affecting the subsequent translation steps. Therefore, the LS-HAN continuous gesture recognition framework in the machine learning algorithm is adopted to continuously recognize the gesture actions made by the hearing-impaired person in the initial image: a two-dimensional convolution network extracts a plurality of pieces of body posture information (sitting posture, standing posture, head posture and the like) and gesture information (positions of the arms, palms and fingers) of the hearing-impaired person in the initial image as static sign language information, a three-dimensional convolution network extracts the motion transformation information corresponding to each piece of gesture information (the transition motion produced when the hearing-impaired person changes from the static sign currently being made to another static sign) as dynamic sign language information, and the static sign language information and the dynamic sign language information are then integrated to obtain the sign language information of the hearing-impaired person in the initial image. In this way the continuous gesture recognition framework does not need to perform tedious time segmentation and frame-by-frame recognition, which speeds up gesture recognition, and because the transition between each sign language action and its neighbours can be clearly distinguished during continuous recognition, the recognition accuracy of the gesture actions is very high.
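A minimal PyTorch sketch of the two-branch idea described above is given below. It assumes a small 2D-CNN for per-frame (static) features and a 3D-CNN for motion (dynamic) features whose outputs are concatenated; it is an illustration only and not a reproduction of the actual LS-HAN architecture.

```python
import torch
import torch.nn as nn

class TwoStreamSignFeatures(nn.Module):
    """Illustrative 2D + 3D convolutional feature extractor (not the real LS-HAN)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # 2D branch: static cues (body posture, hand shape) from each frame.
        self.static_net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        # 3D branch: motion/transition cues across the frame sequence.
        self.dynamic_net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        static = self.static_net(frames).view(b, t, -1).mean(dim=1)  # static sign language info
        dynamic = self.dynamic_net(clip)                              # dynamic sign language info
        return torch.cat([static, dynamic], dim=1)                    # integrated sign language info

features = TwoStreamSignFeatures()(torch.randn(1, 3, 16, 112, 112))
```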
In the embodiment of the invention, the data volume of a depth image is huge, which places high requirements on the processing speed and data bandwidth of the translation equipment; if a traditional processor is used to process the depth image, real-time translation of the sign language information into character information cannot be guaranteed.
As an optional implementation manner, the embodiment of the present invention employs an NPU (embedded neural network processor) to process the depth image; compared with a traditional processor, the NPU has a very high processing rate for large amounts of multimedia data and can recognize continuous depth images in real time.
103. And matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information.
In the embodiment of the invention, after machine learning on a large number of matched pairs of sign language information and character phrases, the algorithm matching model can match the corresponding character phrases according to the characteristics of the static sign language information and the dynamic sign language information, thereby preliminarily converting the sign language information into character information.
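As a rough illustration of such a learned matcher, the snippet below fits a nearest-neighbour classifier on feature/phrase pairs; scikit-learn, the toy feature vectors and the phrase labels are assumptions chosen for demonstration, not the patent's actual algorithm matching model or training data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: feature vectors extracted from sign clips, paired with character phrases.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 256))                 # e.g. fused static+dynamic features
phrases = rng.choice(["hello", "thanks", "go", "school"], size=300)

matcher = KNeighborsClassifier(n_neighbors=5)
matcher.fit(features, phrases)                         # "machine learning on matched pairs"

new_sign = rng.normal(size=(1, 256))                   # features for a newly recognised sign
print(matcher.predict(new_sign)[0])                    # -> the matched character phrase
```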
As an optional implementation manner, after step 102 recognizes the sign language information of the hearing-impaired person in the initial image with the continuous gesture recognition framework, and before step 103 matches the sign language information to a plurality of character phrases with the algorithm matching model, a regional characteristic matched with the sign language information is determined and the algorithm matching model corresponding to that regional characteristic is acquired. Specifically, like spoken language, sign language has different forms of expression in different countries and regions; for example, a sign language action expressed as "1 month" in the south is expressed as "January" in the north, so regional factors need to be considered during sign language translation in order to translate the sign language information into the matching character phrases accurately. The embodiment of the invention is provided with a plurality of algorithm matching models corresponding to different regions, and after the sign language information is obtained with the continuous gesture recognition framework, the algorithm matching model corresponding to the regional characteristics can be acquired according to regional characteristics in the sign language information, such as distinctive gesture information in the static sign language information or distinctive motion transformation information in the dynamic sign language information. By selecting the algorithm matching model that matches the regional characteristics of the sign language information, ambiguity in the translated character phrases caused by regional differences in sign language can be avoided.
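A schematic sketch of region-aware phrase matching is shown below. The model registry, the region classifier and the similarity lookup are illustrative assumptions, not the patent's concrete algorithm matching models or regional-feature detector.

```python
import numpy as np

# Hypothetical registry: one phrase-matching model per region.
REGION_MODELS = {
    "south": {"phrases": ["1 month", "eat", "thanks"], "embeddings": np.random.rand(3, 256)},
    "north": {"phrases": ["January", "eat", "thanks"], "embeddings": np.random.rand(3, 256)},
}

def detect_region(sign_features: np.ndarray) -> str:
    """Placeholder: infer the regional characteristic from distinctive gesture/motion features."""
    return "south" if sign_features.mean() < 0.5 else "north"

def match_phrases(sign_features: np.ndarray, top_k: int = 3) -> list[str]:
    """Match sign-language features to character phrases with the region-specific model."""
    model = REGION_MODELS[detect_region(sign_features)]
    scores = model["embeddings"] @ sign_features          # similarity scores (unnormalised)
    best = np.argsort(scores)[::-1][:top_k]
    return [model["phrases"][i] for i in best]

print(match_phrases(np.random.rand(256)))
```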
104. And intelligently combining a plurality of character phrases into character sentences.
In the embodiment of the invention, the translated character phrases are relatively simple and, taken on their own, do not conform to the usage habits of ordinary people.
As an alternative, unlike a spoken language with its numerous grammatical rules and sentence patterns, sign language is usually expressed by a number of simple phrases corresponding to gesture actions, so translating a series of signs yields a collection of character phrases rather than a sentence with precise wording and strict sentence structure. Therefore, the embodiment of the invention further combines the translated character phrases intelligently according to the usage rules of sign language, for example by inserting prepositions and other connecting words between the character phrases, so that the character phrases are combined into character sentences that ordinary people can understand accurately.
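A toy sketch of this phrase-combination step is given below; the connective-insertion rules are invented for illustration and are far simpler than the intelligent combination the patent actually envisages.

```python
# Hypothetical rules: connective to insert between certain phrase pairs.
CONNECTIVES = {
    ("I", "school"): "go to",
    ("school", "study"): "to",
}

def combine_phrases(phrases: list[str]) -> str:
    """Combine translated character phrases into a readable sentence."""
    words: list[str] = []
    for prev, cur in zip([None] + phrases[:-1], phrases):
        connective = CONNECTIVES.get((prev, cur))
        if connective:
            words.append(connective)
        words.append(cur)
    return " ".join(words) + "."

print(combine_phrases(["I", "school", "study"]))  # -> "I go to school to study."
```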
105. And outputting the character sentence corresponding to the sign language information.
In the embodiment of the invention, the character sentences are output to the party communicating with the hearing-impaired person.
As an optional implementation manner, after the character phrases corresponding to the sign language information are obtained by matching and are intelligently combined into character sentences, the character sentences are output in real time on a display medium such as the display screen of the translation equipment, so that the ordinary person communicating with the hearing-impaired person can understand the accurate meaning of the sign language made by the hearing-impaired person in real time, which facilitates the hearing-impaired person's communication in society. It can be understood that the character sentences may also be output to the party communicating with the hearing-impaired person in the form of audio or the like.
Therefore, by implementing the machine learning-based sign language translation method described in fig. 1, the sign language actions made by the hearing-impaired person can be accurately translated into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example two
Referring to fig. 2, fig. 2 is a schematic flow chart of another machine learning-based sign language translation method disclosed in the embodiment of the present invention, which may include the following steps.
201. And controlling the depth camera to shoot an initial image.
202. And detecting whether the hearing-impaired person carries out sign language expression or not.
In the embodiment of the invention, the depth camera of the translation equipment faces the hearing-impaired person and shoots a depth image of the hearing-impaired person; in order to ensure that the hearing-impaired person is accurately located and that the hearing-impaired person's gesture actions are recognized, preliminary positioning detection of the hearing-impaired person needs to be carried out.
As an optional implementation manner, before the sign language information of the hearing-impaired person in the initial image is recognized with the continuous gesture recognition framework, a face image of the hearing-impaired person is recognized in the initial image and the position information of the hearing-impaired person is determined from the face image; whether the hearing-impaired person is performing sign language expression is detected according to the position information; if so, the process goes to step 203. Specifically, while the depth image of the hearing-impaired person is being shot, people other than the hearing-impaired person may appear in the depth image; in order to accurately acquire and recognize the sign language information of the hearing-impaired person and avoid the actions of other people interfering with the sign language translation process, the face and position information of the hearing-impaired person need to be detected. The face data of the hearing-impaired person can be entered into the translation equipment in advance; the face image of the hearing-impaired person is recognized in the initial image from this face data, the trunk and limbs of the hearing-impaired person are recognized from the recognized face image, and the position information of the hearing-impaired person is thereby determined, locating the hearing-impaired person in the depth image. Then, according to the position information of the hearing-impaired person in the depth image, motion detection is used to detect whether the hearing-impaired person is making gesture actions, that is, performing sign language expression; when it is detected that the hearing-impaired person is performing sign language expression, the process goes to step 203 and the continuous gesture recognition framework is invoked to recognize the sign language information. Detecting the position information and actions of the hearing-impaired person in this way ensures that the sign language information of the hearing-impaired person is acquired accurately; and because the continuous gesture recognition framework is invoked only after sign language expression by the hearing-impaired person is detected, interfering actions are prevented from triggering mistranslation and power consumption is saved.
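The gating logic can be sketched roughly with OpenCV as follows. The Haar-cascade face detector and the simple frame-difference motion test are stand-ins chosen for illustration, not the face recognition and motion detection the patent actually prescribes; the cascade file is the one shipped with OpenCV.

```python
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def signer_is_signing(prev_gray: np.ndarray, gray: np.ndarray,
                      motion_thresh: float = 8.0) -> bool:
    """Return True only if a face is found and its body region shows motion."""
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False                       # no hearing-impaired person located
    x, y, w, h = faces[0]
    # Approximate torso/arm region below and around the detected face.
    roi = (slice(y, min(y + 4 * h, gray.shape[0])),
           slice(max(x - w, 0), min(x + 2 * w, gray.shape[1])))
    diff = cv2.absdiff(gray[roi], prev_gray[roi])
    return float(diff.mean()) > motion_thresh   # motion -> sign language expression
```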
203. And recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework.
204. And matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information.
205. And intelligently combining a plurality of character phrases into character sentences.
206. And outputting the character sentence corresponding to the sign language information.
207. And translating the audio information of the speaker into sign language animation and outputting the sign language animation.
In the embodiment of the invention, the translation equipment can translate the sign language information of the hearing-impaired person into character sentences and output them, and can also translate the audio information of the speaker into sign language animation, thereby achieving two-way translation in a scene where the hearing-impaired person communicates with ordinary people.
As an alternative embodiment, audio information of the speaker is collected; the character information corresponding to the audio information is recognized; the character information is processed into a plurality of character phrases; sign language animations corresponding to the character phrases are obtained by matching with an algorithm matching model; and the sign language animations corresponding to the character phrases are output. Specifically, the translation equipment can acquire the sign language information of the hearing-impaired person through the depth camera and translate it into corresponding character sentences for ordinary people to view; conversely, the translation equipment can also collect the audio information of the speaker, translate it into sign language animations that the hearing-impaired person can understand, and output them to the hearing-impaired person. The translation equipment collects the audio information of the speaker, the processor recognizes the character information corresponding to the audio information and divides it into a plurality of character phrases with clear meanings, the algorithm matching model is then invoked to match the sign language animations corresponding to the character phrases, and the sign language animations are output on the display screen on the hearing-impaired person's side of the translation equipment, so that the hearing-impaired person can view the sign language animation corresponding to the audio information in real time after the speaker speaks, realizing two-way barrier-free communication between the hearing-impaired person and ordinary people.
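The reverse direction can be sketched as a small pipeline; the speech-to-text call, the phrase segmenter and the animation lookup table below are all hypothetical placeholders for components the patent leaves unspecified.

```python
# Hypothetical animation library: phrase -> animation clip file.
SIGN_ANIMATIONS = {"hello": "hello.anim", "where": "where.anim", "toilet": "toilet.anim"}

def speech_to_text(audio: bytes) -> str:
    """Placeholder for an ASR engine recognising the speaker's audio."""
    return "hello where toilet"

def split_into_phrases(text: str) -> list[str]:
    """Placeholder segmentation of recognised text into clearly ideographic phrases."""
    return text.split()

def translate_audio_to_sign(audio: bytes) -> list[str]:
    """Audio -> character information -> character phrases -> sign language animation clips."""
    phrases = split_into_phrases(speech_to_text(audio))
    return [SIGN_ANIMATIONS[p] for p in phrases if p in SIGN_ANIMATIONS]

print(translate_audio_to_sign(b""))  # -> ['hello.anim', 'where.anim', 'toilet.anim']
```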
Therefore, by implementing the machine learning-based sign language translation method described in fig. 2, the position information and sign language information of the hearing-impaired person can be accurately recognized, preventing irrelevant images from interfering with the translation process; and the audio information of a speaker communicating with the hearing-impaired person can be translated into sign language animations and output to the hearing-impaired person, realizing real-time, two-way, barrier-free communication between the hearing-impaired person and ordinary people.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a translation device according to an embodiment of the present invention. The translation device may include:
a shooting unit 301, configured to control the depth camera to shoot an initial image;
a sign language recognition unit 302, configured to recognize sign language information of a hearing-impaired person in the initial image by using a continuous gesture recognition framework;
the phrase matching unit 303 is configured to match a plurality of character phrases corresponding to the sign language information by using an algorithm matching model;
a phrase combination unit 304, configured to intelligently combine a plurality of character phrases into character sentences;
a character output unit 305, configured to output the character sentences corresponding to the sign language information;
the feature recognition unit 306 is configured to recognize a region feature matched with the sign language information after the sign language recognition unit 302 recognizes the sign language information of the hearing-impaired person in the initial image by using a continuous gesture recognition frame and before the phrase matching unit 303 matches a plurality of character phrases corresponding to the sign language information by using an algorithm matching model;
a model selection unit 307, configured to select the algorithm matching model corresponding to the regional characteristics for phrase matching;
the sign language recognition unit 302 specifically includes:
a two-dimensional convolution subunit 3021, configured to extract, by using a two-dimensional convolution network, a plurality of body posture information and a plurality of gesture information included in the initial image, as static sign language information of the initial image;
the three-dimensional convolution subunit 3022 is configured to extract motion transformation information corresponding to each gesture information by using a three-dimensional convolution network, and use the motion transformation information as dynamic sign language information of the initial image;
and the data synthesis subunit 3023 is configured to synthesize the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image.
In the embodiment of the present invention, the sign language recognition unit 302 is configured to recognize the sign language information in the initial image captured by the shooting unit 301, the phrase matching unit 303 and the phrase combination unit 304 translate the sign language information into character sentences, and the character output unit 305 outputs the character sentences.
As an optional implementation manner, the two-dimensional convolution subunit 3021 uses a two-dimensional convolution network to extract the plurality of pieces of body posture information and gesture information included in the initial image as the static sign language information of the initial image; the three-dimensional convolution subunit 3022 uses a three-dimensional convolution network to extract the motion transformation information corresponding to each piece of gesture information as the dynamic sign language information of the initial image; and the data synthesis subunit 3023 integrates the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image. Specifically, the translation equipment recognizes the sign language information with a machine learning framework such as the LS-HAN continuous gesture recognition framework. A traditional gesture recognition method needs to segment the initial image in time beforehand, split it into a number of frame images and then perform gesture recognition on those images, which consumes a great deal of time; in addition, if the time segmentation is inaccurate, frames in which the hearing-impaired person is transitioning between gestures may be cut, causing gestures to be misjudged and affecting the subsequent translation steps. Therefore, the sign language recognition unit 302 uses the LS-HAN continuous gesture recognition framework in the machine learning algorithm to continuously recognize the gesture actions made by the hearing-impaired person in the initial image: the two-dimensional convolution subunit 3021 can use a two-dimensional convolution network to extract a plurality of pieces of body posture information (sitting posture, standing posture, head posture and the like) and gesture information (positions of the arms, palms, fingers and the like) of the hearing-impaired person in the initial image as static sign language information, the three-dimensional convolution subunit 3022 can use a three-dimensional convolution network to extract the motion transformation information corresponding to each piece of gesture information (the transition motion produced when the hearing-impaired person changes from the static sign currently being made to another static sign) as dynamic sign language information, and the data synthesis subunit 3023 then integrates the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image. In this way the continuous gesture recognition framework does not need to perform tedious time segmentation and frame-by-frame recognition, which speeds up gesture recognition, and because the transition between each sign language action and its neighbours can be clearly distinguished during continuous recognition, the recognition accuracy of the gesture actions is very high.
As an optional implementation manner, the sign language recognition unit 302 uses an NPU (embedded neural network processor) to process the depth image; compared with a traditional processor, the NPU has a very high processing rate for large amounts of multimedia data and can recognize continuous depth images in real time.
As an optional implementation manner, after the sign language recognition unit 302 recognizes the sign language information of the hearing-impaired person in the initial image with the continuous gesture recognition framework, and before the phrase matching unit 303 matches the sign language information to a plurality of character phrases with the algorithm matching model, the feature recognition unit 306 determines a regional characteristic matched with the sign language information, and the model selection unit 307 acquires the algorithm matching model corresponding to that regional characteristic. Specifically, like spoken language, sign language has different forms of expression in different countries and regions; for example, a sign language action expressed as "1 month" in the south is expressed as "January" in the north, so regional factors need to be considered during sign language translation in order to translate the sign language information into the matching character phrases accurately. The translation device is provided with a plurality of algorithm matching models corresponding to different regions, and after the sign language information is obtained with the continuous gesture recognition framework, the model selection unit 307 can acquire the algorithm matching model corresponding to the regional characteristics according to regional characteristics in the sign language information, such as distinctive gesture information in the static sign language information or distinctive motion transformation information in the dynamic sign language information. By selecting the algorithm matching model that matches the regional characteristics of the sign language information, ambiguity in the translated character phrases caused by regional differences in sign language can be avoided.
As an alternative, unlike a spoken language with its numerous grammatical rules and sentence patterns, sign language is usually expressed by a number of simple phrases corresponding to gesture actions, so translating a series of signs yields a collection of character phrases rather than a sentence with precise wording and strict sentence structure. Therefore, the phrase combination unit 304 further combines the translated character phrases intelligently according to the usage rules of sign language, for example by inserting prepositions and other connecting words between the character phrases, so that the character phrases are combined into character sentences that ordinary people can understand accurately.
As an optional implementation manner, after the phrase matching unit 303 obtains the character phrases corresponding to the sign language information by matching and the phrase combination unit 304 intelligently combines the character phrases into character sentences, the character output unit 305 outputs the character sentences in real time on a display medium such as the display screen of the translation device, so that the ordinary person communicating with the hearing-impaired person can understand the accurate meaning of the sign language made by the hearing-impaired person in real time, which facilitates the hearing-impaired person's communication in society. It can be understood that the character sentences may also be output to the party communicating with the hearing-impaired person in the form of audio or the like.
Therefore, the translation device described in fig. 3 can accurately translate the sign language actions made by the hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of another translation device according to an embodiment of the present invention. The translation device further comprises:
a face recognition unit 308, configured to recognize a face image of the hearing-impaired person in the initial image before the sign language recognition unit 302 recognizes sign language information of the hearing-impaired person in the initial image by using a continuous gesture recognition framework, and determine position information of the hearing-impaired person according to the face image;
a sign language detecting unit 309 for detecting whether the hearing-impaired person performs sign language expression or not based on the position information;
the sign language identification unit 302 is specifically configured to identify sign language information of the hearing-impaired person in the initial image by using a continuous gesture identification framework when the sign language detection unit 309 detects that the hearing-impaired person performs sign language expression;
an audio acquisition unit 310, configured to collect the audio information of a speaker;
an audio conversion unit 311, configured to identify the character information corresponding to the audio information;
a word processing unit 312, configured to process the character information corresponding to the audio information into a plurality of character phrases;
the sign language matching unit 313 is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to a plurality of character phrases;
and the sign language output unit 314 is configured to output a sign language animation corresponding to the plurality of text phrases.
In the embodiment of the present invention, the face recognition unit 308 is configured to determine the position information of the hearing-impaired person according to the face image, the sign language detection unit 309 detects whether the hearing-impaired person is performing sign language expression according to the position information, and when the hearing-impaired person is performing sign language expression, the sign language recognition unit 302 is triggered to recognize the sign language information of the hearing-impaired person; the audio acquisition unit 310 collects audio information and the audio conversion unit 311 converts it into character information, which is processed into the corresponding sign language animations by the word processing unit 312 and the sign language matching unit 313 and output by the sign language output unit 314.
As an alternative implementation, before the sign language recognition unit 302 uses the continuous gesture recognition framework to recognize the sign language information of the hearing-impaired person in the initial image, the face recognition unit 308 recognizes the face image of the hearing-impaired person in the initial image and determines the position information of the hearing-impaired person from the face image; the sign language detection unit 309 detects whether the hearing-impaired person is performing sign language expression according to the position information; if so, the sign language recognition unit 302 is triggered. Specifically, while the depth image of the hearing-impaired person is being shot, people other than the hearing-impaired person may appear in the depth image; in order to accurately acquire and recognize the sign language information of the hearing-impaired person and avoid the actions of other people interfering with the sign language translation process, the face and position information of the hearing-impaired person need to be detected. Here, the face data of the hearing-impaired person may be entered into the translation device in advance; the face recognition unit 308 recognizes the face image of the hearing-impaired person in the initial image from this face data, recognizes the trunk and limbs of the hearing-impaired person from the recognized face image, determines the position information of the hearing-impaired person, and thereby locates the hearing-impaired person in the depth image. Further, the sign language detection unit 309 uses motion detection, according to the position information of the hearing-impaired person in the depth image, to detect whether the hearing-impaired person is making gesture actions, that is, performing sign language expression; when it is detected that the hearing-impaired person is performing sign language expression, the sign language recognition unit 302 is triggered and the continuous gesture recognition framework is invoked to recognize the sign language information. Detecting the position information and actions of the hearing-impaired person in this way ensures that the sign language information of the hearing-impaired person is acquired accurately; and because the continuous gesture recognition framework is invoked only after sign language expression by the hearing-impaired person is detected, interfering actions are prevented from triggering mistranslation and power consumption is saved.
As an alternative embodiment, the audio acquisition unit 310 collects the audio information of the speaker; the audio conversion unit 311 recognizes the character information corresponding to the audio information; the word processing unit 312 processes the character information into a plurality of character phrases; the sign language matching unit 313 matches the character phrases against the algorithm matching model to obtain the corresponding sign language animations; and the sign language output unit 314 outputs the sign language animations corresponding to the character phrases. Specifically, the audio acquisition unit 310 collects the audio information of the speaker, and the audio conversion unit 311, the word processing unit 312 and the sign language matching unit 313 translate the audio information into sign language animations that the hearing-impaired person can understand, which are output to the hearing-impaired person by the sign language output unit 314: the audio conversion unit 311 recognizes the character information corresponding to the audio information, the word processing unit 312 divides the character information into a plurality of character phrases with clear meanings, the sign language matching unit 313 invokes the algorithm matching model to match the sign language animations corresponding to the character phrases, and the sign language output unit 314 outputs the sign language animations on the display screen on the hearing-impaired person's side of the translation device, so that the hearing-impaired person can view the sign language animation corresponding to the audio information in real time after the speaker speaks, realizing two-way barrier-free communication between the hearing-impaired person and ordinary people.
Therefore, by implementing the translation device described in fig. 4, the position information and sign language information of the hearing-impaired person can be accurately recognized, preventing irrelevant images from interfering with the translation process; and the audio information of a speaker communicating with the hearing-impaired person can be translated into sign language animations and output to the hearing-impaired person, realizing real-time, two-way, barrier-free communication between the hearing-impaired person and ordinary people.
EXAMPLE five
Referring to fig. 5, fig. 5 is a schematic structural diagram of another translation device according to an embodiment of the present invention. As shown in fig. 5, the translation apparatus may include:
a memory 501 in which executable program code is stored;
a processor 502 coupled to the memory 501;
the processor 502 calls the executable program code stored in the memory 501 to execute a part of the steps of any one of the sign language translation methods based on machine learning shown in fig. 1 to 2.
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program, wherein the computer program enables a computer to execute all or part of the steps of any one of the machine learning-based sign language translation methods shown in the figures 1-2.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing related hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical memory, magnetic disk memory, magnetic tape memory, or any other computer-readable medium which can be used to carry or store data.
The machine learning-based sign language translation method and the translation device disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A sign language translation method based on machine learning is characterized by comprising the following steps:
controlling a depth camera to shoot an initial image;
recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
intelligently combining the plurality of character phrases into character sentences;
and outputting the character sentence corresponding to the sign language information.
2. The method of claim 1, wherein prior to said recognizing sign language information of the hearing impaired person in the initial image using the continuous gesture recognition framework, the method further comprises:
recognizing a face image of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
and if so, executing the step of adopting a continuous gesture recognition framework to recognize the sign language information of the hearing impaired person in the initial image.
3. The method of claim 1, wherein the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework comprises:
extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
extracting motion transformation information corresponding to each gesture information by adopting a three-dimensional convolution network to serve as the dynamic sign language information of the initial image;
and integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
4. The method according to claim 1, wherein after the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework and before obtaining a plurality of word phrases corresponding to the sign language information by using the algorithm matching model, the method further comprises:
determining a regional feature matching the sign language information;
and acquiring an algorithm matching model corresponding to the regional feature.
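
Claim 4 selects the phrase-matching model according to a regional feature of the signing, since regional sign language variants differ in vocabulary. The snippet below is an illustrative sketch only; the region labels, the registry, and the detect_region and load_model helpers are assumptions made here, not details from the patent.

# Hypothetical registry mapping regional features to matching-model artifacts.
MODELS_BY_REGION = {
    "region_a": "matching_model_region_a.bin",
    "region_b": "matching_model_region_b.bin",
}

def select_matching_model(sign_info, detect_region, load_model, default="region_a"):
    # Determine the regional feature that matches the recognized sign language information.
    region = detect_region(sign_info)
    # Acquire the algorithm matching model corresponding to that regional feature,
    # falling back to a default model when the region is unknown.
    path = MODELS_BY_REGION.get(region, MODELS_BY_REGION[default])
    return load_model(path)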
5. The method according to any one of claims 1 to 4, further comprising:
collecting audio information of a speaker;
identifying character information corresponding to the audio information;
processing the character information corresponding to the audio information into a plurality of character phrases;
matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and outputting the sign language animation corresponding to the plurality of character phrases.
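
The reverse direction of claim 5 (speech to sign language animation) can be summarized, under the same caveat that all helper names here are hypothetical, as:

def speech_to_sign_animation(audio, transcribe, split_into_phrases, animation_for):
    # Identify the character information corresponding to the collected audio.
    text = transcribe(audio)
    # Process the character information into a plurality of character phrases.
    phrases = split_into_phrases(text)
    # Match each phrase to its sign language animation and return them in order for output.
    return [animation_for(phrase) for phrase in phrases]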
6. A translation device, characterized by comprising:
the shooting unit is used for controlling the depth camera to shoot an initial image;
the sign language recognition unit is used for recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
the phrase matching unit is used for matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
the phrase combination unit is used for intelligently combining the plurality of character phrases into character sentences;
and the character output unit is used for outputting the character sentences corresponding to the sign language information.
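
Claim 6 decomposes the translation device into cooperating units. One possible, purely illustrative way to mirror that decomposition in software is an object that holds one collaborator per claimed unit and chains them in the order of claim 1; none of these attribute or method names come from the patent.

class TranslationDevice:
    def __init__(self, shooting_unit, sign_recognition_unit,
                 phrase_matching_unit, phrase_combination_unit, text_output_unit):
        # One attribute per unit named in claim 6; all concrete types are hypothetical.
        self.shooting_unit = shooting_unit
        self.sign_recognition_unit = sign_recognition_unit
        self.phrase_matching_unit = phrase_matching_unit
        self.phrase_combination_unit = phrase_combination_unit
        self.text_output_unit = text_output_unit

    def translate_once(self):
        # Run one capture-recognize-match-combine-output cycle.
        image = self.shooting_unit.capture()
        sign_info = self.sign_recognition_unit.recognize(image)
        phrases = self.phrase_matching_unit.match(sign_info)
        sentence = self.phrase_combination_unit.combine(phrases)
        self.text_output_unit.output(sentence)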
7. The translation device according to claim 6, further comprising:
the face recognition unit is used for recognizing a face image of the hearing-impaired person in the initial image before the sign language recognition unit adopts a continuous gesture recognition framework to recognize sign language information of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
the sign language detection unit is used for detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
the sign language recognition unit is specifically configured to recognize sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework when the sign language detection unit detects that the hearing-impaired person is performing sign language expression.
8. The translation device according to claim 6, wherein the sign language recognition unit comprises:
the two-dimensional convolution subunit is used for extracting, by using a two-dimensional convolutional network, a plurality of pieces of body posture information and a plurality of pieces of gesture information included in the initial image as static sign language information of the initial image;
the three-dimensional convolution subunit is used for extracting, by using a three-dimensional convolutional network, motion transformation information corresponding to each piece of gesture information as dynamic sign language information of the initial image;
and the data synthesis subunit is used for integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image.
9. The translation device according to claim 6, further comprising:
the feature recognition unit is used for recognizing a regional feature matching the sign language information after the sign language recognition unit recognizes the sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework and before the phrase matching unit obtains the plurality of character phrases corresponding to the sign language information by using the algorithm matching model;
and the model selection unit is used for selecting the algorithm matching model corresponding to the regional feature for phrase matching.
10. The translation device according to any one of claims 6 to 9, further comprising:
the audio acquisition unit is used for acquiring the audio information of a speaker;
the audio conversion unit is used for identifying the character information corresponding to the audio information;
the word processing unit is used for processing the character information corresponding to the audio information into a plurality of character phrases;
the sign language matching unit is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and the sign language output unit is used for outputting sign language animations corresponding to the plurality of character phrases.
CN201911039201.1A 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning Pending CN110992783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911039201.1A CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911039201.1A CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Publications (1)

Publication Number Publication Date
CN110992783A true CN110992783A (en) 2020-04-10

Family

ID=70082541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911039201.1A Pending CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Country Status (1)

Country Link
CN (1) CN110992783A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236986A (en) * 2010-05-06 2011-11-09 鸿富锦精密工业(深圳)有限公司 Sign language translation system, device and method
US20180129295A1 (en) * 2016-08-15 2018-05-10 Purple Communications, Inc. Gesture-based control and usage of video relay systems
CN107785017A (en) * 2016-08-24 2018-03-09 南京乐朋电子科技有限公司 A kind of interactive system based on Sign Language Recognition
KR101839244B1 (en) * 2016-12-13 2018-03-15 한밭대학교 산학협력단 Sigh language assisting system expressing feelings
CN107977611A (en) * 2017-11-20 2018-05-01 深圳天珑无线科技有限公司 Word conversion method, terminal and computer-readable recording medium
CN108256458A (en) * 2018-01-04 2018-07-06 东北大学 A kind of two-way real-time translation system and method for deaf person's nature sign language
CN108877408A (en) * 2018-06-25 2018-11-23 贵州东仪医疗器械有限公司 Sign language translation device and method
CN108960126A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Method, apparatus, equipment and the system of sign language interpreter
CN108960158A (en) * 2018-07-09 2018-12-07 珠海格力电器股份有限公司 A kind of system and method for intelligent sign language translation
CN109637291A (en) * 2018-12-27 2019-04-16 深圳市赛亿科技开发有限公司 A kind of sign language interpretation method and system
CN110008839A (en) * 2019-03-08 2019-07-12 西安研硕信息技术有限公司 A kind of intelligent sign language interactive system and method for adaptive gesture identification
CN109993130A (en) * 2019-04-04 2019-07-09 哈尔滨拓博科技有限公司 One kind being based on depth image dynamic sign language semantics recognition system and method
CN110070065A (en) * 2019-04-30 2019-07-30 李冠津 The sign language systems and the means of communication of view-based access control model and speech-sound intelligent
CN110348420A (en) * 2019-07-18 2019-10-18 腾讯科技(深圳)有限公司 Sign Language Recognition Method, device, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li: "Video-based Sign Language Recognition without Temporal Segmentation", 32nd AAAI Conference on Artificial Intelligence *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488744A (en) * 2020-04-12 2020-08-04 北京花兰德科技咨询服务有限公司 Multi-modal language information AI translation method, system and terminal
CN112256827A (en) * 2020-10-20 2021-01-22 平安科技(深圳)有限公司 Sign language translation method and device, computer equipment and storage medium
CN114120770A (en) * 2021-03-24 2022-03-01 张银合 Barrier-free communication method for hearing-impaired people
WO2022226919A1 (en) * 2021-04-29 2022-11-03 华为技术有限公司 Method for communicating with passenger, and related device
CN113407034A (en) * 2021-07-09 2021-09-17 呜啦啦(广州)科技有限公司 Sign language inter-translation method and system
CN113407034B (en) * 2021-07-09 2023-05-26 呜啦啦(广州)科技有限公司 Sign language inter-translation method and system
WO2023007213A1 (en) * 2021-07-27 2023-02-02 Telefonaktiebolaget Lm Ericsson (Publ) Translating sensory communications
CN116386149A (en) * 2023-06-05 2023-07-04 果不其然无障碍科技(苏州)有限公司 Sign language information processing method and system
CN116386149B (en) * 2023-06-05 2023-08-22 果不其然无障碍科技(苏州)有限公司 Sign language information processing method and system

Similar Documents

Publication Publication Date Title
CN110992783A (en) Sign language translation method and translation equipment based on machine learning
US11847426B2 (en) Computer vision based sign language interpreter
US10692480B2 (en) System and method of reading environment sound enhancement based on image processing and semantic analysis
CN112088402A (en) Joint neural network for speaker recognition
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
Madhuri et al. Vision-based sign language translation device
Kour et al. Sign language recognition using image processing
CN114401438A (en) Video generation method and device for virtual digital person, storage medium and terminal
TW201937344A (en) Smart robot and man-machine interaction method
CN110796101A (en) Face recognition method and system of embedded platform
CN109993130A (en) One kind being based on depth image dynamic sign language semantics recognition system and method
KR101187600B1 (en) Speech Recognition Device and Speech Recognition Method using 3D Real-time Lip Feature Point based on Stereo Camera
Shinde et al. Real time two way communication approach for hearing impaired and dumb person based on image processing
CN112749646A (en) Interactive point-reading system based on gesture recognition
Ivanko et al. Automatic lip-reading of hearing impaired people
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
KR20190121593A (en) Sign language recognition system
JP7370050B2 (en) Lip reading device and method
KR20200001902A (en) Method and system for generating learning data of sign language recognition artificial neural network, and system for generating modified animation
Ivanko et al. A novel task-oriented approach toward automated lip-reading system implementation
KR102377767B1 (en) Handwriting and arm movement learning-based sign language translation system and method
Tang et al. Multimodal emotion recognition (MER) system
Mattos et al. Towards view-independent viseme recognition based on CNNs and synthetic data
Talea et al. Automatic visual speech segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410