CN110992783A - Sign language translation method and translation equipment based on machine learning - Google Patents

Sign language translation method and translation equipment based on machine learning

Info

Publication number
CN110992783A
Authority
CN
China
Prior art keywords
sign language
information
hearing-impaired person
initial image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911039201.1A
Other languages
Chinese (zh)
Inventor
黄昌正
周言明
陈曦
王帅威
陈永乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Yilian Interation Information Technology Co ltd
Original Assignee
Dongguan Yilian Interation Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Yilian Interation Information Technology Co ltd filed Critical Dongguan Yilian Interation Information Technology Co ltd
Priority to CN201911039201.1A
Publication of CN110992783A
Legal status: Pending

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The embodiment of the invention relates to the technical field of machine learning, and discloses a sign language translation method and translation equipment based on machine learning. The method comprises the following steps: controlling a depth camera to shoot an initial image; recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework; matching with an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information; intelligently combining the character phrases into character sentences; and outputting the character sentences corresponding to the sign language information. In this way, the sign language actions made by the hearing-impaired person are accurately translated into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.

Description

Sign language translation method and translation equipment based on machine learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a sign language translation method and translation equipment based on machine learning.
Background
As a visual language, sign language can help deaf-mute people express their own thoughts and establishes a communication channel between deaf-mute people and hearing people, helping deaf-mute people integrate into society.
However, unlike spoken languages such as Chinese and English, sign language is rarely mastered outside the deaf-mute community and people engaged in related work; ordinary people who have never been exposed to sign language have difficulty understanding the actual meaning of the signs made by deaf-mute people, so deaf-mute people face great obstacles in communicating in society. Although a number of translation devices currently exist on the market, methods that recognize sign language actions by looking them up in a stored sign language phrase database have a low recognition rate and are inconvenient to use.
Disclosure of Invention
The embodiment of the invention discloses a sign language translation method and translation equipment based on machine learning, which can accurately translate the sign language actions made by a hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
The embodiment of the invention discloses a sign language translation method based on machine learning in a first aspect, which comprises the following steps:
controlling a depth camera to shoot an initial image;
recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
intelligently combining the plurality of character phrases into character sentences;
and outputting the character sentence corresponding to the sign language information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, before the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework, the method further includes:
recognizing a face image of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
and if so, executing the step of adopting a continuous gesture recognition framework to recognize the sign language information of the hearing impaired person in the initial image.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework includes:
extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
extracting motion transformation information corresponding to each gesture information by adopting a three-dimensional convolution network to serve as the dynamic sign language information of the initial image;
and integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the recognizing sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework and before obtaining a plurality of text phrases corresponding to the sign language information by using algorithm matching model matching, the method further includes:
determining a regional characteristic matched with the sign language information;
and acquiring an algorithm matching model corresponding to the region characteristics.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
collecting audio information of a speaker;
identifying character information corresponding to the audio information;
processing the text information corresponding to the audio information into a plurality of text phrases;
matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and outputting the sign language animation corresponding to the plurality of character phrases.
A second aspect of the embodiments of the present invention discloses a translation apparatus, including:
the shooting unit is used for controlling the depth camera to shoot an initial image;
the sign language recognition unit is used for recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
the phrase matching unit is used for matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
the phrase combination unit is used for intelligently combining the plurality of character phrases into character sentences;
and the character output unit is used for outputting the character sentences corresponding to the sign language information.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the face recognition unit is used for recognizing a face image of the hearing-impaired person in the initial image before the sign language recognition unit adopts a continuous gesture recognition framework to recognize sign language information of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
the sign language detection unit is used for detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
the sign language identification unit is specifically configured to identify sign language information of the hearing-impaired person in the initial image by using a continuous gesture identification framework when the sign language detection unit detects that the hearing-impaired person performs sign language expression.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the sign language recognition unit includes:
the two-dimensional convolution subunit is used for extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
the three-dimensional convolution subunit is used for extracting action transformation information corresponding to each gesture information by adopting a three-dimensional convolution network, and the action transformation information is used as the dynamic sign language information of the initial image;
and the data synthesis subunit is used for synthesizing the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the feature recognition unit is used for recognizing regional features matched with the sign language information after the sign language recognition unit recognizes the sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework and before the phrase matching unit obtains a plurality of character phrases corresponding to the sign language information by adopting algorithm matching model matching;
and the model selection unit is used for selecting the algorithm matching model corresponding to the regional characteristics for matching the phrases.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the translation apparatus further includes:
the audio acquisition unit is used for acquiring the audio information of a speaker;
the audio conversion unit is used for identifying the character information corresponding to the audio information;
the word processing unit is used for processing the word information corresponding to the audio information into a plurality of word groups;
the sign language matching unit is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and the sign language output unit is used for outputting sign language animations corresponding to the plurality of character phrases.
A third aspect of the embodiments of the present invention discloses a translation apparatus, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute part of the steps of the sign language translation method based on machine learning disclosed by the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute all or part of the steps of the machine learning based sign language translation method disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the depth camera is controlled to shoot an initial image; recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework; matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information; intelligently combining a plurality of character phrases into character sentences; and outputting the character sentence corresponding to the sign language information. Therefore, sign language information of the hearing-impaired people is recognized by adopting the continuous gesture recognition framework, and the sign language information is converted into text sentences for output, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired people, and the communication of the hearing-impaired people in the society is facilitated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart diagram of a sign language translation method based on machine learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for machine learning-based sign language translation according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another translation apparatus disclosed in the embodiments of the present invention;
fig. 5 is a schematic structural diagram of another translation apparatus disclosed in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", "third" and "fourth" etc. in the description and claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a sign language translation method and translation equipment based on machine learning, which can accurately translate the sign language actions made by a hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example one
Referring to fig. 1, as shown in fig. 1, a sign language translation method based on machine learning according to an embodiment of the present invention may include the following steps.
101. And controlling the depth camera to shoot an initial image.
In the embodiment of the invention, the translation equipment used for translating sign language is provided with a depth camera, and the depth camera shoots a depth image including the hearing-impaired person as the initial image; compared with an ordinary camera, a depth camera additionally acquires depth information of the shot object, namely the position and size of the shot object in a three-dimensional coordinate system.
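As a brief illustration of how a depth pixel yields the 3-D position information mentioned above, the snippet below back-projects a pixel using pinhole-camera intrinsics; the intrinsic values and the placeholder depth frame are assumptions made for demonstration, not parameters of any particular depth camera or of the patent's equipment.

```python
import numpy as np

# Assumed pinhole intrinsics of the depth camera (fx, fy: focal lengths; cx, cy: principal point).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

def pixel_to_3d(u: int, v: int, depth_mm: float) -> np.ndarray:
    """Back-project image pixel (u, v) with measured depth into camera-space XYZ (millimetres)."""
    z = depth_mm
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

depth_frame = np.full((480, 640), 1500.0)            # placeholder depth image: everything 1.5 m away
print(pixel_to_3d(400, 260, depth_frame[260, 400]))  # 3-D position of one pixel on the subject
```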
102. And recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework.
In the embodiment of the invention, the sign language information of the hearing-impaired person in the initial image is recognized, wherein the sign language information comprises static sign language information, such as the hearing-impaired person's body posture information and gesture information, and dynamic sign language information produced when the hearing-impaired person changes gesture actions.
As an optional implementation manner, a two-dimensional convolution network is adopted to extract a plurality of pieces of body posture information and gesture information included in the initial image as the static sign language information of the initial image; a three-dimensional convolution network is adopted to extract the motion transformation information corresponding to each piece of gesture information as the dynamic sign language information of the initial image; and the static sign language information and the dynamic sign language information are integrated to obtain the sign language information of the hearing-impaired person in the initial image. Specifically, the sign language information is recognized with a machine learning framework such as the LS-HAN continuous gesture recognition framework. A traditional gesture recognition method needs to segment the initial image in time beforehand, split it into a number of frame images and then perform gesture recognition on those images, which consumes a large amount of time; in addition, if the time segmentation is inaccurate, frames in which the hearing-impaired person is transitioning between gestures may be cut, causing gestures to be misjudged and affecting the subsequent translation steps. Therefore, the LS-HAN continuous gesture recognition framework in the machine learning algorithm is adopted to continuously recognize the gesture actions made by the hearing-impaired person in the initial image: a two-dimensional convolution network extracts a plurality of pieces of body posture information (sitting posture, standing posture, head posture and the like) and gesture information (positions of the arms, palms and fingers) of the hearing-impaired person in the initial image as static sign language information, a three-dimensional convolution network extracts the motion transformation information corresponding to each piece of gesture information (the transition motion produced when the hearing-impaired person changes from the static sign currently being made to another static sign) as dynamic sign language information, and the static sign language information and the dynamic sign language information are then integrated to obtain the sign language information of the hearing-impaired person in the initial image. In this way the continuous gesture recognition framework does not need to perform tedious time segmentation and frame-by-frame recognition, which speeds up gesture recognition, and because the transition between each sign language action and its neighbours can be clearly distinguished during continuous recognition, the recognition accuracy of the gesture actions is very high.
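A minimal PyTorch sketch of the two-branch idea described above is given below. It assumes a small 2D-CNN for per-frame (static) features and a 3D-CNN for motion (dynamic) features whose outputs are concatenated; it is an illustration only and not a reproduction of the actual LS-HAN architecture.

```python
import torch
import torch.nn as nn

class TwoStreamSignFeatures(nn.Module):
    """Illustrative 2D + 3D convolutional feature extractor (not the real LS-HAN)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # 2D branch: static cues (body posture, hand shape) from each frame.
        self.static_net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        # 3D branch: motion/transition cues across the frame sequence.
        self.dynamic_net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        static = self.static_net(frames).view(b, t, -1).mean(dim=1)  # static sign language info
        dynamic = self.dynamic_net(clip)                              # dynamic sign language info
        return torch.cat([static, dynamic], dim=1)                    # integrated sign language info

features = TwoStreamSignFeatures()(torch.randn(1, 3, 16, 112, 112))
```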
In the embodiment of the invention, the data volume of a depth image is huge, which places high requirements on the processing speed and data bandwidth of the translation equipment; if a traditional processor is used to process the depth image, real-time translation of the sign language information into character information cannot be guaranteed.
As an optional implementation manner, the embodiment of the present invention employs an NPU (embedded neural network processor) to process the depth image; compared with a traditional processor, the NPU has a very high processing rate for large amounts of multimedia data and can recognize continuous depth images in real time.
103. And matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information.
In the embodiment of the invention, after machine learning on a large number of matched pairs of sign language information and character phrases, the algorithm matching model can match the corresponding character phrases according to the characteristics of the static sign language information and the dynamic sign language information, thereby preliminarily converting the sign language information into character information.
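As a rough illustration of such a learned matcher, the snippet below fits a nearest-neighbour classifier on feature/phrase pairs; scikit-learn, the toy feature vectors and the phrase labels are assumptions chosen for demonstration, not the patent's actual algorithm matching model or training data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: feature vectors extracted from sign clips, paired with character phrases.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 256))                 # e.g. fused static+dynamic features
phrases = rng.choice(["hello", "thanks", "go", "school"], size=300)

matcher = KNeighborsClassifier(n_neighbors=5)
matcher.fit(features, phrases)                         # "machine learning on matched pairs"

new_sign = rng.normal(size=(1, 256))                   # features for a newly recognised sign
print(matcher.predict(new_sign)[0])                    # -> the matched character phrase
```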
As an optional implementation manner, after step 102 recognizes the sign language information of the hearing-impaired person in the initial image with the continuous gesture recognition framework, and before step 103 matches the sign language information to a plurality of character phrases with the algorithm matching model, a regional characteristic matched with the sign language information is determined and the algorithm matching model corresponding to that regional characteristic is acquired. Specifically, like spoken language, sign language has different forms of expression in different countries and regions; for example, a sign language action expressed as "1 month" in the south is expressed as "January" in the north, so regional factors need to be considered during sign language translation in order to translate the sign language information into the matching character phrases accurately. The embodiment of the invention is provided with a plurality of algorithm matching models corresponding to different regions, and after the sign language information is obtained with the continuous gesture recognition framework, the algorithm matching model corresponding to the regional characteristics can be acquired according to regional characteristics in the sign language information, such as distinctive gesture information in the static sign language information or distinctive motion transformation information in the dynamic sign language information. By selecting the algorithm matching model that matches the regional characteristics of the sign language information, ambiguity in the translated character phrases caused by regional differences in sign language can be avoided.
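A schematic sketch of region-aware phrase matching is shown below. The model registry, the region classifier and the similarity lookup are illustrative assumptions, not the patent's concrete algorithm matching models or regional-feature detector.

```python
import numpy as np

# Hypothetical registry: one phrase-matching model per region.
REGION_MODELS = {
    "south": {"phrases": ["1 month", "eat", "thanks"], "embeddings": np.random.rand(3, 256)},
    "north": {"phrases": ["January", "eat", "thanks"], "embeddings": np.random.rand(3, 256)},
}

def detect_region(sign_features: np.ndarray) -> str:
    """Placeholder: infer the regional characteristic from distinctive gesture/motion features."""
    return "south" if sign_features.mean() < 0.5 else "north"

def match_phrases(sign_features: np.ndarray, top_k: int = 3) -> list[str]:
    """Match sign-language features to character phrases with the region-specific model."""
    model = REGION_MODELS[detect_region(sign_features)]
    scores = model["embeddings"] @ sign_features          # similarity scores (unnormalised)
    best = np.argsort(scores)[::-1][:top_k]
    return [model["phrases"][i] for i in best]

print(match_phrases(np.random.rand(256)))
```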
104. And intelligently combining a plurality of character phrases into character sentences.
In the embodiment of the invention, the translated character phrases are relatively simple and, taken on their own, do not conform to the usage habits of ordinary people.
As an alternative, unlike a spoken language with its numerous grammatical rules and sentence patterns, sign language is usually expressed by a number of simple phrases corresponding to gesture actions, so translating a series of signs yields a collection of character phrases rather than a sentence with precise wording and strict sentence structure. Therefore, the embodiment of the invention further combines the translated character phrases intelligently according to the usage rules of sign language, for example by inserting prepositions and other connecting words between the character phrases, so that the character phrases are combined into character sentences that ordinary people can understand accurately.
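A toy sketch of this phrase-combination step is given below; the connective-insertion rules are invented for illustration and are far simpler than the intelligent combination the patent actually envisages.

```python
# Hypothetical rules: connective to insert between certain phrase pairs.
CONNECTIVES = {
    ("I", "school"): "go to",
    ("school", "study"): "to",
}

def combine_phrases(phrases: list[str]) -> str:
    """Combine translated character phrases into a readable sentence."""
    words: list[str] = []
    for prev, cur in zip([None] + phrases[:-1], phrases):
        connective = CONNECTIVES.get((prev, cur))
        if connective:
            words.append(connective)
        words.append(cur)
    return " ".join(words) + "."

print(combine_phrases(["I", "school", "study"]))  # -> "I go to school to study."
```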
105. And outputting the character sentence corresponding to the sign language information.
In the embodiment of the invention, the character sentences are output to the party communicating with the hearing-impaired person.
As an optional implementation manner, after the character phrases corresponding to the sign language information are obtained by matching and are intelligently combined into character sentences, the character sentences are output in real time on a display medium such as the display screen of the translation equipment, so that the ordinary person communicating with the hearing-impaired person can understand the accurate meaning of the sign language made by the hearing-impaired person in real time, which facilitates the hearing-impaired person's communication in society. It can be understood that the character sentences may also be output to the party communicating with the hearing-impaired person in the form of audio or the like.
Therefore, by implementing the machine learning-based sign language translation method described in fig. 1, the sign language actions made by the hearing-impaired person can be accurately translated into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example two
Referring to fig. 2, fig. 2 is a schematic flow chart of another machine learning-based sign language translation method disclosed in the embodiment of the present invention, which may include the following steps.
201. And controlling the depth camera to shoot an initial image.
202. And detecting whether the hearing-impaired person carries out sign language expression or not.
In the embodiment of the invention, the depth camera of the translation equipment faces the hearing-impaired person and shoots a depth image of the hearing-impaired person; in order to ensure that the hearing-impaired person is accurately located and that the hearing-impaired person's gesture actions are recognized, preliminary positioning detection of the hearing-impaired person needs to be carried out.
As an optional implementation manner, before the sign language information of the hearing-impaired person in the initial image is recognized with the continuous gesture recognition framework, a face image of the hearing-impaired person is recognized in the initial image and the position information of the hearing-impaired person is determined from the face image; whether the hearing-impaired person is performing sign language expression is detected according to the position information; if so, the process goes to step 203. Specifically, while the depth image of the hearing-impaired person is being shot, people other than the hearing-impaired person may appear in the depth image; in order to accurately acquire and recognize the sign language information of the hearing-impaired person and avoid the actions of other people interfering with the sign language translation process, the face and position information of the hearing-impaired person need to be detected. The face data of the hearing-impaired person can be entered into the translation equipment in advance; the face image of the hearing-impaired person is recognized in the initial image from this face data, the trunk and limbs of the hearing-impaired person are recognized from the recognized face image, and the position information of the hearing-impaired person is thereby determined, locating the hearing-impaired person in the depth image. Then, according to the position information of the hearing-impaired person in the depth image, motion detection is used to detect whether the hearing-impaired person is making gesture actions, that is, performing sign language expression; when it is detected that the hearing-impaired person is performing sign language expression, the process goes to step 203 and the continuous gesture recognition framework is invoked to recognize the sign language information. Detecting the position information and actions of the hearing-impaired person in this way ensures that the sign language information of the hearing-impaired person is acquired accurately; and because the continuous gesture recognition framework is invoked only after sign language expression by the hearing-impaired person is detected, interfering actions are prevented from triggering mistranslation and power consumption is saved.
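The gating logic can be sketched roughly with OpenCV as follows. The Haar-cascade face detector and the simple frame-difference motion test are stand-ins chosen for illustration, not the face recognition and motion detection the patent actually prescribes; the cascade file is the one shipped with OpenCV.

```python
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def signer_is_signing(prev_gray: np.ndarray, gray: np.ndarray,
                      motion_thresh: float = 8.0) -> bool:
    """Return True only if a face is found and its body region shows motion."""
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False                       # no hearing-impaired person located
    x, y, w, h = faces[0]
    # Approximate torso/arm region below and around the detected face.
    roi = (slice(y, min(y + 4 * h, gray.shape[0])),
           slice(max(x - w, 0), min(x + 2 * w, gray.shape[1])))
    diff = cv2.absdiff(gray[roi], prev_gray[roi])
    return float(diff.mean()) > motion_thresh   # motion -> sign language expression
```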
203. And recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework.
204. And matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information.
205. And intelligently combining a plurality of character phrases into character sentences.
206. And outputting the character sentence corresponding to the sign language information.
207. And translating the audio information of the speaker into sign language animation and outputting the sign language animation.
In the embodiment of the invention, the translation equipment can translate the sign language information of the hearing-impaired person into character sentences and output them, and can also translate the audio information of the speaker into sign language animation, thereby achieving two-way translation in a scene where the hearing-impaired person communicates with ordinary people.
As an alternative embodiment, audio information of the speaker is collected; the character information corresponding to the audio information is recognized; the character information is processed into a plurality of character phrases; sign language animations corresponding to the character phrases are obtained by matching with an algorithm matching model; and the sign language animations corresponding to the character phrases are output. Specifically, the translation equipment can acquire the sign language information of the hearing-impaired person through the depth camera and translate it into corresponding character sentences for ordinary people to view; conversely, the translation equipment can also collect the audio information of the speaker, translate it into sign language animations that the hearing-impaired person can understand, and output them to the hearing-impaired person. The translation equipment collects the audio information of the speaker, the processor recognizes the character information corresponding to the audio information and divides it into a plurality of character phrases with clear meanings, the algorithm matching model is then invoked to match the sign language animations corresponding to the character phrases, and the sign language animations are output on the display screen on the hearing-impaired person's side of the translation equipment, so that the hearing-impaired person can view the sign language animation corresponding to the audio information in real time after the speaker speaks, realizing two-way barrier-free communication between the hearing-impaired person and ordinary people.
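The reverse direction can be sketched as a small pipeline; the speech-to-text call, the phrase segmenter and the animation lookup table below are all hypothetical placeholders for components the patent leaves unspecified.

```python
# Hypothetical animation library: phrase -> animation clip file.
SIGN_ANIMATIONS = {"hello": "hello.anim", "where": "where.anim", "toilet": "toilet.anim"}

def speech_to_text(audio: bytes) -> str:
    """Placeholder for an ASR engine recognising the speaker's audio."""
    return "hello where toilet"

def split_into_phrases(text: str) -> list[str]:
    """Placeholder segmentation of recognised text into clearly ideographic phrases."""
    return text.split()

def translate_audio_to_sign(audio: bytes) -> list[str]:
    """Audio -> character information -> character phrases -> sign language animation clips."""
    phrases = split_into_phrases(speech_to_text(audio))
    return [SIGN_ANIMATIONS[p] for p in phrases if p in SIGN_ANIMATIONS]

print(translate_audio_to_sign(b""))  # -> ['hello.anim', 'where.anim', 'toilet.anim']
```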
Therefore, by implementing the machine learning-based sign language translation method described in fig. 2, the position information and sign language information of the hearing-impaired person can be accurately recognized, preventing irrelevant images from interfering with the translation process; and the audio information of a speaker communicating with the hearing-impaired person can be translated into sign language animations and output to the hearing-impaired person, realizing real-time, two-way, barrier-free communication between the hearing-impaired person and ordinary people.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a translation device according to an embodiment of the present invention. The translation device may include:
a shooting unit 301, configured to control the depth camera to shoot an initial image;
a sign language recognition unit 302, configured to recognize sign language information of a hearing-impaired person in the initial image by using a continuous gesture recognition framework;
the phrase matching unit 303 is configured to match a plurality of character phrases corresponding to the sign language information by using an algorithm matching model;
a phrase combination unit 304, configured to intelligently combine a plurality of character phrases into character sentences;
a character output unit 305, configured to output the character sentences corresponding to the sign language information;
the feature recognition unit 306 is configured to recognize a region feature matched with the sign language information after the sign language recognition unit 302 recognizes the sign language information of the hearing-impaired person in the initial image by using a continuous gesture recognition frame and before the phrase matching unit 303 matches a plurality of character phrases corresponding to the sign language information by using an algorithm matching model;
a model selection unit 307, configured to select the algorithm matching model corresponding to the regional characteristics for phrase matching;
the sign language recognition unit 302 specifically includes:
a two-dimensional convolution subunit 3021, configured to extract, by using a two-dimensional convolution network, a plurality of body posture information and a plurality of gesture information included in the initial image, as static sign language information of the initial image;
the three-dimensional convolution subunit 3022 is configured to extract motion transformation information corresponding to each gesture information by using a three-dimensional convolution network, and use the motion transformation information as dynamic sign language information of the initial image;
and the data synthesis subunit 3023 is configured to synthesize the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image.
In the embodiment of the present invention, the sign language recognition unit 302 is configured to recognize the sign language information in the initial image captured by the shooting unit 301, the phrase matching unit 303 and the phrase combination unit 304 translate the sign language information into character sentences, and the character output unit 305 outputs the character sentences.
As an optional implementation manner, the two-dimensional convolution subunit 3021 uses a two-dimensional convolution network to extract the plurality of pieces of body posture information and gesture information included in the initial image as the static sign language information of the initial image; the three-dimensional convolution subunit 3022 uses a three-dimensional convolution network to extract the motion transformation information corresponding to each piece of gesture information as the dynamic sign language information of the initial image; and the data synthesis subunit 3023 integrates the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image. Specifically, the translation equipment recognizes the sign language information with a machine learning framework such as the LS-HAN continuous gesture recognition framework. A traditional gesture recognition method needs to segment the initial image in time beforehand, split it into a number of frame images and then perform gesture recognition on those images, which consumes a great deal of time; in addition, if the time segmentation is inaccurate, frames in which the hearing-impaired person is transitioning between gestures may be cut, causing gestures to be misjudged and affecting the subsequent translation steps. Therefore, the sign language recognition unit 302 uses the LS-HAN continuous gesture recognition framework in the machine learning algorithm to continuously recognize the gesture actions made by the hearing-impaired person in the initial image: the two-dimensional convolution subunit 3021 can use a two-dimensional convolution network to extract a plurality of pieces of body posture information (sitting posture, standing posture, head posture and the like) and gesture information (positions of the arms, palms, fingers and the like) of the hearing-impaired person in the initial image as static sign language information, the three-dimensional convolution subunit 3022 can use a three-dimensional convolution network to extract the motion transformation information corresponding to each piece of gesture information (the transition motion produced when the hearing-impaired person changes from the static sign currently being made to another static sign) as dynamic sign language information, and the data synthesis subunit 3023 then integrates the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image. In this way the continuous gesture recognition framework does not need to perform tedious time segmentation and frame-by-frame recognition, which speeds up gesture recognition, and because the transition between each sign language action and its neighbours can be clearly distinguished during continuous recognition, the recognition accuracy of the gesture actions is very high.
As an optional implementation manner, the sign language recognition unit 302 uses an NPU (embedded neural network processor) to process the depth image; compared with a traditional processor, the NPU has a very high processing rate for large amounts of multimedia data and can recognize continuous depth images in real time.
As an optional implementation manner, after the sign language recognition unit 302 recognizes the sign language information of the hearing-impaired person in the initial image with the continuous gesture recognition framework, and before the phrase matching unit 303 matches the sign language information to a plurality of character phrases with the algorithm matching model, the feature recognition unit 306 determines a regional characteristic matched with the sign language information, and the model selection unit 307 acquires the algorithm matching model corresponding to that regional characteristic. Specifically, like spoken language, sign language has different forms of expression in different countries and regions; for example, a sign language action expressed as "1 month" in the south is expressed as "January" in the north, so regional factors need to be considered during sign language translation in order to translate the sign language information into the matching character phrases accurately. The translation device is provided with a plurality of algorithm matching models corresponding to different regions, and after the sign language information is obtained with the continuous gesture recognition framework, the model selection unit 307 can acquire the algorithm matching model corresponding to the regional characteristics according to regional characteristics in the sign language information, such as distinctive gesture information in the static sign language information or distinctive motion transformation information in the dynamic sign language information. By selecting the algorithm matching model that matches the regional characteristics of the sign language information, ambiguity in the translated character phrases caused by regional differences in sign language can be avoided.
As an alternative, unlike a spoken language with its numerous grammatical rules and sentence patterns, sign language is usually expressed by a number of simple phrases corresponding to gesture actions, so translating a series of signs yields a collection of character phrases rather than a sentence with precise wording and strict sentence structure. Therefore, the phrase combination unit 304 further combines the translated character phrases intelligently according to the usage rules of sign language, for example by inserting prepositions and other connecting words between the character phrases, so that the character phrases are combined into character sentences that ordinary people can understand accurately.
As an optional implementation manner, after the phrase matching unit 303 obtains the character phrases corresponding to the sign language information by matching and the phrase combination unit 304 intelligently combines the character phrases into character sentences, the character output unit 305 outputs the character sentences in real time on a display medium such as the display screen of the translation device, so that the ordinary person communicating with the hearing-impaired person can understand the accurate meaning of the sign language made by the hearing-impaired person in real time, which facilitates the hearing-impaired person's communication in society. It can be understood that the character sentences may also be output to the party communicating with the hearing-impaired person in the form of audio or the like.
Therefore, the translation device described in fig. 3 can accurately translate the sign language actions made by the hearing-impaired person into character information in real time, so that ordinary people can understand the meaning of the sign language made by the hearing-impaired person, thereby facilitating the hearing-impaired person's communication in society.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of another translation device according to an embodiment of the present invention. The translation device further comprises:
a face recognition unit 308, configured to recognize a face image of the hearing-impaired person in the initial image before the sign language recognition unit 302 recognizes sign language information of the hearing-impaired person in the initial image by using a continuous gesture recognition framework, and determine position information of the hearing-impaired person according to the face image;
a sign language detecting unit 309 for detecting whether the hearing-impaired person performs sign language expression or not based on the position information;
the sign language identification unit 302 is specifically configured to identify sign language information of the hearing-impaired person in the initial image by using a continuous gesture identification framework when the sign language detection unit 309 detects that the hearing-impaired person performs sign language expression;
an audio acquisition unit 310, configured to collect the audio information of a speaker;
an audio conversion unit 311, configured to identify the character information corresponding to the audio information;
a word processing unit 312, configured to process the character information corresponding to the audio information into a plurality of character phrases;
the sign language matching unit 313 is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to a plurality of character phrases;
and the sign language output unit 314 is configured to output a sign language animation corresponding to the plurality of text phrases.
In the embodiment of the present invention, the face recognition unit 308 is configured to determine the position information of the hearing-impaired person according to the face image, the sign language detection unit 309 detects whether the hearing-impaired person is performing sign language expression according to the position information, and when the hearing-impaired person is performing sign language expression, the sign language recognition unit 302 is triggered to recognize the sign language information of the hearing-impaired person; the audio acquisition unit 310 collects audio information and the audio conversion unit 311 converts it into character information, which is processed into the corresponding sign language animations by the word processing unit 312 and the sign language matching unit 313 and output by the sign language output unit 314.
As an alternative implementation, before the sign language recognition unit 302 uses the continuous gesture recognition framework to recognize the sign language information of the hearing-impaired person in the initial image, the face recognition unit 308 recognizes the face image of the hearing-impaired person in the initial image and determines the position information of the hearing-impaired person from the face image; the sign language detection unit 309 detects whether the hearing-impaired person is performing sign language expression according to the position information; if so, the sign language recognition unit 302 is triggered. Specifically, while the depth image of the hearing-impaired person is being shot, people other than the hearing-impaired person may appear in the depth image; in order to accurately acquire and recognize the sign language information of the hearing-impaired person and avoid the actions of other people interfering with the sign language translation process, the face and position information of the hearing-impaired person need to be detected. Here, the face data of the hearing-impaired person may be entered into the translation device in advance; the face recognition unit 308 recognizes the face image of the hearing-impaired person in the initial image from this face data, recognizes the trunk and limbs of the hearing-impaired person from the recognized face image, determines the position information of the hearing-impaired person, and thereby locates the hearing-impaired person in the depth image. Further, the sign language detection unit 309 uses motion detection, according to the position information of the hearing-impaired person in the depth image, to detect whether the hearing-impaired person is making gesture actions, that is, performing sign language expression; when it is detected that the hearing-impaired person is performing sign language expression, the sign language recognition unit 302 is triggered and the continuous gesture recognition framework is invoked to recognize the sign language information. Detecting the position information and actions of the hearing-impaired person in this way ensures that the sign language information of the hearing-impaired person is acquired accurately; and because the continuous gesture recognition framework is invoked only after sign language expression by the hearing-impaired person is detected, interfering actions are prevented from triggering mistranslation and power consumption is saved.
As an alternative embodiment, the audio acquisition unit 310 collects the audio information of the speaker; the audio conversion unit 311 recognizes the character information corresponding to the audio information; the word processing unit 312 processes the character information into a plurality of character phrases; the sign language matching unit 313 matches the character phrases against the algorithm matching model to obtain the corresponding sign language animations; and the sign language output unit 314 outputs the sign language animations corresponding to the character phrases. Specifically, the audio acquisition unit 310 collects the audio information of the speaker, and the audio conversion unit 311, the word processing unit 312 and the sign language matching unit 313 translate the audio information into sign language animations that the hearing-impaired person can understand, which are output to the hearing-impaired person by the sign language output unit 314: the audio conversion unit 311 recognizes the character information corresponding to the audio information, the word processing unit 312 divides the character information into a plurality of character phrases with clear meanings, the sign language matching unit 313 invokes the algorithm matching model to match the sign language animations corresponding to the character phrases, and the sign language output unit 314 outputs the sign language animations on the display screen on the hearing-impaired person's side of the translation device, so that the hearing-impaired person can view the sign language animation corresponding to the audio information in real time after the speaker speaks, realizing two-way barrier-free communication between the hearing-impaired person and ordinary people.
Therefore, by implementing the translation device described in fig. 4, the position information and sign language information of the hearing-impaired person can be accurately recognized, preventing irrelevant images from interfering with the translation process; and the audio information of a speaker communicating with the hearing-impaired person can be translated into sign language animations and output to the hearing-impaired person, realizing real-time, two-way, barrier-free communication between the hearing-impaired person and ordinary people.
EXAMPLE five
Referring to fig. 5, fig. 5 is a schematic structural diagram of another translation device according to an embodiment of the present invention. As shown in fig. 5, the translation apparatus may include:
a memory 501 in which executable program code is stored;
a processor 502 coupled to the memory 501;
the processor 502 calls the executable program code stored in the memory 501 to execute a part of the steps of any one of the sign language translation methods based on machine learning shown in fig. 1 to 2.
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program, wherein the computer program enables a computer to execute all or part of the steps of any one of the machine learning-based sign language translation methods shown in the figures 1-2.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing related hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical memory, magnetic disk memory, magnetic tape memory, or any other computer-readable medium which can be used to carry or store data.
The machine learning-based sign language translation method and the translation device disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A sign language translation method based on machine learning is characterized by comprising the following steps:
controlling a depth camera to shoot an initial image;
recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
intelligently combining the plurality of character phrases into character sentences;
and outputting the character sentence corresponding to the sign language information.
2. The method of claim 1, wherein prior to said recognizing sign language information of the hearing impaired person in the initial image using the continuous gesture recognition framework, the method further comprises:
recognizing a face image of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
and if so, executing the step of adopting a continuous gesture recognition framework to recognize the sign language information of the hearing impaired person in the initial image.
3. The method of claim 1, wherein the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework comprises:
extracting a plurality of body posture information and a plurality of gesture information included in the initial image by adopting a two-dimensional convolution network to be used as static sign language information of the initial image;
extracting motion transformation information corresponding to each gesture information by adopting a three-dimensional convolution network to serve as the dynamic sign language information of the initial image;
and integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired people in the initial image.
4. The method according to claim 1, wherein after the recognizing sign language information of the hearing impaired person in the initial image by using the continuous gesture recognition framework and before obtaining a plurality of word phrases corresponding to the sign language information by using the algorithm matching model, the method further comprises:
determining a regional feature matching the sign language information;
and acquiring an algorithm matching model corresponding to the regional feature.
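
Claim 4 selects the phrase-matching model according to a regional feature of the signing, since regional sign language variants differ in vocabulary. The snippet below is an illustrative sketch only; the region labels, the registry, and the detect_region and load_model helpers are assumptions made here, not details from the patent.

# Hypothetical registry mapping regional features to matching-model artifacts.
MODELS_BY_REGION = {
    "region_a": "matching_model_region_a.bin",
    "region_b": "matching_model_region_b.bin",
}

def select_matching_model(sign_info, detect_region, load_model, default="region_a"):
    # Determine the regional feature that matches the recognized sign language information.
    region = detect_region(sign_info)
    # Acquire the algorithm matching model corresponding to that regional feature,
    # falling back to a default model when the region is unknown.
    path = MODELS_BY_REGION.get(region, MODELS_BY_REGION[default])
    return load_model(path)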
5. The method according to any one of claims 1 to 4, further comprising:
collecting audio information of a speaker;
identifying character information corresponding to the audio information;
processing the character information corresponding to the audio information into a plurality of character phrases;
matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and outputting the sign language animation corresponding to the plurality of character phrases.
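
The reverse direction of claim 5 (speech to sign language animation) can be summarized, under the same caveat that all helper names here are hypothetical, as:

def speech_to_sign_animation(audio, transcribe, split_into_phrases, animation_for):
    # Identify the character information corresponding to the collected audio.
    text = transcribe(audio)
    # Process the character information into a plurality of character phrases.
    phrases = split_into_phrases(text)
    # Match each phrase to its sign language animation and return them in order for output.
    return [animation_for(phrase) for phrase in phrases]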
6. A translation device, characterized by comprising:
the shooting unit is used for controlling the depth camera to shoot an initial image;
the sign language recognition unit is used for recognizing sign language information of the hearing-impaired person in the initial image by adopting a continuous gesture recognition framework;
the phrase matching unit is used for matching by adopting an algorithm matching model to obtain a plurality of character phrases corresponding to the sign language information;
the phrase combination unit is used for intelligently combining the plurality of character phrases into character sentences;
and the character output unit is used for outputting the character sentences corresponding to the sign language information.
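
Claim 6 decomposes the translation device into cooperating units. One possible, purely illustrative way to mirror that decomposition in software is an object that holds one collaborator per claimed unit and chains them in the order of claim 1; none of these attribute or method names come from the patent.

class TranslationDevice:
    def __init__(self, shooting_unit, sign_recognition_unit,
                 phrase_matching_unit, phrase_combination_unit, text_output_unit):
        # One attribute per unit named in claim 6; all concrete types are hypothetical.
        self.shooting_unit = shooting_unit
        self.sign_recognition_unit = sign_recognition_unit
        self.phrase_matching_unit = phrase_matching_unit
        self.phrase_combination_unit = phrase_combination_unit
        self.text_output_unit = text_output_unit

    def translate_once(self):
        # Run one capture-recognize-match-combine-output cycle.
        image = self.shooting_unit.capture()
        sign_info = self.sign_recognition_unit.recognize(image)
        phrases = self.phrase_matching_unit.match(sign_info)
        sentence = self.phrase_combination_unit.combine(phrases)
        self.text_output_unit.output(sentence)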
7. The translation device according to claim 6, further comprising:
the face recognition unit is used for recognizing a face image of the hearing-impaired person in the initial image before the sign language recognition unit adopts a continuous gesture recognition framework to recognize sign language information of the hearing-impaired person in the initial image, and determining position information of the hearing-impaired person according to the face image;
the sign language detection unit is used for detecting whether the hearing-impaired person carries out sign language expression or not according to the position information;
the sign language recognition unit is specifically configured to recognize sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework when the sign language detection unit detects that the hearing-impaired person is performing sign language expression.
8. The translation device according to claim 6, wherein the sign language recognition unit comprises:
the two-dimensional convolution subunit is used for extracting, by using a two-dimensional convolutional network, a plurality of pieces of body posture information and a plurality of pieces of gesture information included in the initial image as static sign language information of the initial image;
the three-dimensional convolution subunit is used for extracting, by using a three-dimensional convolutional network, motion transformation information corresponding to each piece of gesture information as dynamic sign language information of the initial image;
and the data synthesis subunit is used for integrating the static sign language information and the dynamic sign language information to obtain the sign language information of the hearing-impaired person in the initial image.
9. The translation device according to claim 6, further comprising:
the feature recognition unit is used for recognizing a regional feature matching the sign language information after the sign language recognition unit recognizes the sign language information of the hearing-impaired person in the initial image by using the continuous gesture recognition framework and before the phrase matching unit obtains the plurality of character phrases corresponding to the sign language information by using the algorithm matching model;
and the model selection unit is used for selecting the algorithm matching model corresponding to the regional feature for phrase matching.
10. The translation device according to any one of claims 6 to 9, further comprising:
the audio acquisition unit is used for acquiring the audio information of a speaker;
the audio conversion unit is used for identifying the character information corresponding to the audio information;
the word processing unit is used for processing the character information corresponding to the audio information into a plurality of character phrases;
the sign language matching unit is used for matching by adopting an algorithm matching model to obtain sign language animations corresponding to the plurality of character phrases;
and the sign language output unit is used for outputting sign language animations corresponding to the plurality of character phrases.
CN201911039201.1A 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning Pending CN110992783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911039201.1A CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911039201.1A CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Publications (1)

Publication Number Publication Date
CN110992783A true CN110992783A (en) 2020-04-10

Family

ID=70082541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911039201.1A Pending CN110992783A (en) 2019-10-29 2019-10-29 Sign language translation method and translation equipment based on machine learning

Country Status (1)

Country Link
CN (1) CN110992783A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236986A (en) * 2010-05-06 2011-11-09 鸿富锦精密工业(深圳)有限公司 Sign language translation system, device and method
US20180129295A1 (en) * 2016-08-15 2018-05-10 Purple Communications, Inc. Gesture-based control and usage of video relay systems
CN107785017A (en) * 2016-08-24 2018-03-09 南京乐朋电子科技有限公司 A kind of interactive system based on Sign Language Recognition
KR101839244B1 (en) * 2016-12-13 2018-03-15 한밭대학교 산학협력단 Sigh language assisting system expressing feelings
CN107977611A (en) * 2017-11-20 2018-05-01 深圳天珑无线科技有限公司 Word conversion method, terminal and computer-readable recording medium
CN108256458A (en) * 2018-01-04 2018-07-06 东北大学 A kind of two-way real-time translation system and method for deaf person's nature sign language
CN108877408A (en) * 2018-06-25 2018-11-23 贵州东仪医疗器械有限公司 Sign language translation device and method
CN108960126A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Method, apparatus, equipment and the system of sign language interpreter
CN108960158A (en) * 2018-07-09 2018-12-07 珠海格力电器股份有限公司 A kind of system and method for intelligent sign language translation
CN109637291A (en) * 2018-12-27 2019-04-16 深圳市赛亿科技开发有限公司 A kind of sign language interpretation method and system
CN110008839A (en) * 2019-03-08 2019-07-12 西安研硕信息技术有限公司 A kind of intelligent sign language interactive system and method for adaptive gesture identification
CN109993130A (en) * 2019-04-04 2019-07-09 哈尔滨拓博科技有限公司 One kind being based on depth image dynamic sign language semantics recognition system and method
CN110070065A (en) * 2019-04-30 2019-07-30 李冠津 The sign language systems and the means of communication of view-based access control model and speech-sound intelligent
CN110348420A (en) * 2019-07-18 2019-10-18 腾讯科技(深圳)有限公司 Sign Language Recognition Method, device, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li: "Video-based Sign Language Recognition without Temporal Segmentation", 32nd AAAI Conference on Artificial Intelligence *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488744A (en) * 2020-04-12 2020-08-04 北京花兰德科技咨询服务有限公司 Multi-modal language information AI translation method, system and terminal
CN112256827A (en) * 2020-10-20 2021-01-22 平安科技(深圳)有限公司 Sign language translation method and device, computer equipment and storage medium
CN114120770A (en) * 2021-03-24 2022-03-01 张银合 Barrier-free communication method for hearing-impaired people
WO2022226919A1 (en) * 2021-04-29 2022-11-03 华为技术有限公司 Method for communicating with passenger, and related device
CN113407034A (en) * 2021-07-09 2021-09-17 呜啦啦(广州)科技有限公司 Sign language inter-translation method and system
CN113407034B (en) * 2021-07-09 2023-05-26 呜啦啦(广州)科技有限公司 Sign language inter-translation method and system
WO2023007213A1 (en) * 2021-07-27 2023-02-02 Telefonaktiebolaget Lm Ericsson (Publ) Translating sensory communications
CN116386149A (en) * 2023-06-05 2023-07-04 果不其然无障碍科技(苏州)有限公司 Sign language information processing method and system
CN116386149B (en) * 2023-06-05 2023-08-22 果不其然无障碍科技(苏州)有限公司 Sign language information processing method and system

Similar Documents

Publication Publication Date Title
CN110992783A (en) Sign language translation method and translation equipment based on machine learning
US11847426B2 (en) Computer vision based sign language interpreter
US10692480B2 (en) System and method of reading environment sound enhancement based on image processing and semantic analysis
CN112088402A (en) Joint neural network for speaker recognition
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
Madhuri et al. Vision-based sign language translation device
Kour et al. Sign language recognition using image processing
CN114401438A (en) Video generation method and device for virtual digital person, storage medium and terminal
TW201937344A (en) Smart robot and man-machine interaction method
CN110796101A (en) Face recognition method and system of embedded platform
CN109993130A (en) One kind being based on depth image dynamic sign language semantics recognition system and method
KR101187600B1 (en) Speech Recognition Device and Speech Recognition Method using 3D Real-time Lip Feature Point based on Stereo Camera
Shinde et al. Real time two way communication approach for hearing impaired and dumb person based on image processing
CN112749646A (en) Interactive point-reading system based on gesture recognition
Ivanko et al. Automatic lip-reading of hearing impaired people
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
KR20190121593A (en) Sign language recognition system
JP7370050B2 (en) Lip reading device and method
KR20200001902A (en) Method and system for generating learning data of sign language recognition artificial neural network, and system for generating modified animation
Ivanko et al. A novel task-oriented approach toward automated lip-reading system implementation
KR102377767B1 (en) Handwriting and arm movement learning-based sign language translation system and method
Tang et al. Multimodal emotion recognition (MER) system
Mattos et al. Towards view-independent viseme recognition based on CNNs and synthetic data
Talea et al. Automatic visual speech segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410