AU2021101804A4 - A system for translating sign language into speech and vice- versa - Google Patents

A system for translating sign language into speech and vice- versa Download PDF

Info

Publication number
AU2021101804A4
Authority
AU
Australia
Prior art keywords
gesture
unit
sign language
deaf
hand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021101804A
Inventor
Sandhya Makkar
HariKumar Pallathadka
Shalini Puri
Ashanta Ranjan Routray
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2021101804A priority Critical patent/AU2021101804A4/en
Application granted granted Critical
Publication of AU2021101804A4 publication Critical patent/AU2021101804A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/009 Teaching or communicating with deaf persons

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A SYSTEM FOR TRANSLATING SIGN LANGUAGE INTO SPEECH AND VICE VERSA

The present disclosure envisages a system (100) for translating sign language of a deaf/dumb person into speech. An image capturing device (102) captures video of a deaf/dumb person speaking sign language, in real time, to generate a corresponding live image data stream. An image processing unit (104) extracts features of gesture of a left and right hand of the deaf/dumb person from said captured live image data stream. A feature matching unit (112) matches extracted features of gesture of the left and right hand with the stored dataset having gesture information in a first repository (110). A feature recognition unit (114) recognizes the gesture information on the basis of temporal and spatial hand gesture variation. An audio unit (116) outputs the gesture information into a voice audio via an audio device. The system (100) also converts speech into sign language, thereby facilitating bilateral live communication between a deaf/dumb person and a normal person. (FIG. 1 will be the reference figure)

FIG. 1: Block diagram of a system for translating sign language into speech

Description

A SYSTEM FOR TRANSLATING SIGN LANGUAGE INTO SPEECH AND VICE VERSA

TECHNICAL FIELD
[0001] The present disclosure relates to the field of digital image processing, particularly video processing in real time.
BACKGROUND
[0002] The background information herein below relates to the present disclosure but is not necessarily prior art. Around the world there are 466 million deaf and dumb people, of whom 34 million are children, and the WHO estimates that this number will increase to 900 million by 2050. Hearing loss may result from genetic causes and complications at birth. Usually, deaf and dumb people find it difficult to interact with normal persons since they are not able to speak or hear and, hence, they are unable to share their emotions with the normal person. Many times the expressions of deaf and dumb people are wrongly interpreted by the normal person. Therefore, these deaf/dumb people tend to withdraw and miss many opportunities, such as jobs.
[0003] Many communication platforms are available for deaf and dumb people where they can interact with normal persons. These platforms may include wearing gloves having sensors and tracking the movements of the hands to identify hand gestures and interpret the sign language. However, deaf and dumb people cannot afford these platforms because of their high cost and low efficiency in converting sign language into text or speech. In conventional approaches, a video file having sign language is processed to send the output in the form of audio or text, so that the normal person may come to know the expression of the deaf/dumb person using sign language, but a delay is observed in converting the video file into audio.
[0004] Efforts have been made in the related prior art to provide different solutions for sign language recognition. For example, Chinese Patent No. CN103136986B discloses a sign language recognition method comprising the following steps: acquiring an image comprising a marked region; identifying the attitude of the marked region; generating a control instruction corresponding to the attitude; and converting the control instruction into natural language information. A sign language recognition system is additionally provided, and the above method and system can improve recognition accuracy. However, the prior art fails to provide a system that is capable of providing bilateral communication between disabled people and normal persons.
[0005] In some embodiments, the numbers expressing quantities or dimensions of items, and so forth, used to describe and claim certain embodiments of the invention are to be understood as being modified in some instances by the term "about." Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements.
[0006] As used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise.
OBJECTS OF THE INVENTION
[0007] It is an object of the present disclosure to provide a system for translating sign language into speech.
[0008] It is an object of the present disclosure to provide a method that improves user security.
[0009] It is an object of the present disclosure to provide a system for translating a speech into sign language.
[0010] It is an object of the present disclosure to provide a system for translating sign language that is highly efficient and cost effective.
[0011] Other objects of the present disclosure will become more apparent from the following description, which is not intended to limit the scope of the present disclosure.
SUMMARY
[0012] The present concept envisages a system for translating sign language into speech and vice versa. The system for translating sign language of a deaf/dumb person into speech comprises an image capturing device, an image processing unit, a first repository, a feature matching unit, a feature recognition unit, and an audio unit. The image capturing device is configured to capture video of the deaf/dumb person speaking sign language, in real-time, and is further configured to generate a corresponding live image data stream based on the captured video. The image processing unit is configured to receive the captured live image data stream from the image capturing device and is further configured to extract features of gesture of a left and right hand of the deaf/dumb person. The first repository is configured to store a dataset having a list of gestures and corresponding gesture information. The feature matching unit is configured to cooperate with the image processing unit to receive the extracted features of gesture of the left and right hand and is further configured to match extracted features of gesture of the left and right hand with the stored dataset having gesture information in the first repository. The feature recognition unit is configured to receive matched gesture information from the feature matching unit to recognize the gesture information on the basis of temporal and spatial hand gesture variation. The audio unit is configured to receive recognized gesture information from the feature recognition unit and is further configured to output the gesture information into a voice audio via an audio device. The image processing unit, the feature matching unit and the feature recognition unit are implemented using one or more processor(s).
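By way of a purely illustrative, non-limiting example, the sign-to-speech flow described above can be sketched in Python as a simple loop wiring the units (102-116) together. The component callables named below (capture_frames, extract_hand_features, match_gesture, recognize_gesture, speak) are hypothetical placeholders for the respective units and are not part of the present disclosure.

```python
# Minimal, assumed orchestration of the sign-to-speech pipeline (units 102-116).
# Every callable passed in is a hypothetical stand-in for a unit described above.

def translate_sign_to_speech(capture_frames, extract_hand_features,
                             match_gesture, recognize_gesture, speak,
                             window_size=30):
    """Camera -> feature extraction -> matching -> recognition -> audio output."""
    matched_labels = []                          # per-frame matches over a temporal window
    for frame in capture_frames():               # image capturing device (102)
        features = extract_hand_features(frame)  # image processing unit (104)
        if features is None:                     # no hands detected in this frame
            continue
        matched_labels.append(match_gesture(features))         # feature matching unit (112)
        if len(matched_labels) >= window_size:
            gesture_info = recognize_gesture(matched_labels)   # feature recognition unit (114)
            if gesture_info is not None:
                speak(gesture_info)              # audio unit (116)
            matched_labels.clear()               # start the next gesture window
```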
[0013] In an embodiment, the image processing unit includes a detection unit and a hand feature extraction unit. The detection unit is configured to identify a region of interest within the received captured live image data stream from the image capturing device, wherein the region of interest within the received captured live image data stream includes video of coordinated left and right hand of the deaf/dumb person. The hand feature extraction unit is configured to receive the identified video of coordinated left and right hand and is further configured to extract temporal and spatial feature of gesture of the left and right hand.
[0014] In another embodiment, the image capturing device is selected from the group consisting of a still camera, IP camera, a 3D camera, an infrared camera, an image capturing sensor, a digital camera, and a CCD camera.
[0015] In still another embodiment, the gesture information in the first repository includes a plurality of classes related to hand gesture used for sign language, hand gesture for alphabets, hand gesture for numbers, hand gesture for words, hand gesture for expressions and the like.
[0016] In yet another embodiment, the feature recognition unit is configured to verify and recognize the correct gesture information on the basis of tracking of left and right hand, movement's occlusion, and position of hands on the basis of the temporal and spatial hand gesture variation by employing segmentation techniques.
[0017] The system for translating speech of a user into sign language comprises an input unit, a speech to text convertor, a second repository, a gesture matching unit, and a display unit. The input unit is configured to accept input in the form of voice commands from the user. The speech to text convertor cooperates with the input unit to receive the voice commands and is further configured to convert the voice commands into text commands, in real time. The second repository is configured to store a second dataset having a list of gestures information associated with a text. The gesture matching unit is configured to extract a suitable gesture based on the received text commands by crawling through the stored list of gestures information via a crawler when the received text commands matches with the stored text in the second repository. The display unit is configured to display the corresponding gesture information received from the gesture matching unit. The speech to text convertor and the gesture matching unit are implemented using one or more processor(s).
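As a purely illustrative counterpart to the previous sketch, the speech-to-sign flow (units 118-126) can be outlined as follows; next_text_command, find_gesture and show_gif are hypothetical placeholders for the units described above, not part of the claimed subject matter.

```python
# Minimal, assumed orchestration of the speech-to-sign pipeline (units 118-126).

def translate_speech_to_sign(next_text_command, find_gesture, show_gif):
    """Microphone and speech-to-text -> gesture lookup -> GIF display."""
    while True:                      # continuous live mode (sketch only)
        text = next_text_command()   # input unit (118) + speech to text convertor (120)
        if not text:
            continue
        gifs = find_gesture(text)    # gesture matching unit (124) over repository (122)
        if gifs:
            for gif in gifs:
                show_gif(gif)        # display unit (126)
```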
[0018] In an embodiment, the system provides a bilateral communication to the deaf/dumb person speaking sign language and the user, in continuous live mode.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
[0020] FIG. 1 illustrates a block diagram of a system for translating sign language into speech, in real time, in accordance with the present disclosure.
[0021] FIG. 2 illustrates a block diagram of the system for translating speech into sign language, in real time, in accordance with the present disclosure.
[0022] Other objects, advantages and novel features of the invention will become apparent from the following detailed description of the present embodiment when taken in conjunction with the accompanying drawings.
DETAILED DESCRIPTION
[0023] Aspects of the present disclosure relate to a system for translating sign language into speech and vice versa. It is to be understood that the foregoing description is only illustrative of the present invention, and it is not intended that the invention be limited thereto. Many other specific embodiments of the present invention will be apparent to one skilled in the art from the foregoing disclosure. All substitutions, alterations and modifications of the present invention which come within the scope of the following claims, and to which the present invention is readily susceptible without departing from the spirit of the invention, are contemplated. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
[0024] Various methods described herein may be practiced by combining one or more machine readable storage media containing the code/instructions according to the present invention with appropriate standard device hardware to execute the instructions contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (say, servers) (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, devices, routines, subroutines, or subparts of a computer program product.
[0025] If the specification states a component or feature "may", "can", "could", or "might" be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[0026] Various terms as used herein are shown below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
[0027] Referring to FIG. 1, the present disclosure envisages a system 100 for translating sign language into speech, in real time. The system 100 comprises an image capturing device 102, an image processing unit 104, a first repository 110, a feature matching unit 112, a feature recognition unit 114, and an audio unit 116.
[0028] In an aspect, the image capturing device 102 is configured to capture video of a deaf/dumb person speaking sign language, in real-time. Further, the image capturing device 102 is configured to generate a corresponding live image data stream based on the captured video. In an embodiment, the image capturing device 102 is selected from the group consisting of a camera, IP camera, a 3D camera, an infrared camera, an image capturing sensor, a digital camera, and a CCD camera.
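In one purely illustrative software sketch (not part of the claimed subject matter), such an image capturing device 102 could be exposed to the rest of the system as a generator of live frames using the third-party OpenCV package; the package name cv2 and the default device index 0 are assumptions.

```python
import cv2  # third-party OpenCV package, assumed to be installed

def live_image_stream(device_index=0):
    """Yield frames from a camera as the live image data stream of unit 102."""
    capture = cv2.VideoCapture(device_index)
    if not capture.isOpened():
        raise RuntimeError("Could not open the image capturing device")
    try:
        while True:
            ok, frame = capture.read()   # one BGR frame as a NumPy array
            if not ok:
                break
            yield frame
    finally:
        capture.release()
```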
[0029] In another aspect of the present invention, the image processing unit 104 is configured to cooperate with the image capturing device 102 to receive the captured live image data stream. The image processing unit 104 includes a detection unit 106 and a hand feature extraction unit 108. The detection unit 106 is configured to identify a region of interest within the received captured live image data stream. The region of interest within the received captured live image data stream is the identified video of coordinated left and right hand of the deaf/dumb person. The hand feature extraction unit 108 is configured to receive the identified video of coordinated left and right hand and is further configured to extract temporal and spatial feature of gesture of the left and right hand.
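As a hedged illustration only, the detection unit 106 and the hand feature extraction unit 108 could be approximated with an off-the-shelf hand-landmark detector; the sketch below assumes the third-party MediaPipe Hands package (the module path mp.solutions.hands and its parameters are assumptions, and other detectors or segmentation approaches could equally be used).

```python
import mediapipe as mp  # third-party MediaPipe package, assumed to be installed

# Hand-landmark detector acting as an assumed stand-in for the detection unit (106).
_hands = mp.solutions.hands.Hands(static_image_mode=False,
                                  max_num_hands=2,
                                  min_detection_confidence=0.5)

def extract_hand_features(frame_rgb):
    """Return spatial hand features (landmark coordinates) for up to two hands,
    or None when no hand region of interest is detected (units 106/108).
    Temporal features are obtained by collecting these per-frame features."""
    result = _hands.process(frame_rgb)          # expects an RGB image array
    if not result.multi_hand_landmarks:
        return None
    features = []
    for hand in result.multi_hand_landmarks:    # hand feature extraction unit (108)
        features.append([(lm.x, lm.y, lm.z) for lm in hand.landmark])
    return features
```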
[0030] In another aspect of the present invention, the first repository 110 is configured to store a dataset having a list of gestures and corresponding gesture information. In another embodiment, the gesture information may include a plurality of classes related to gesture used for sign language, like hand gesture for alphabets, hand gesture for numbers, hand gesture for words, hand gesture for expressions and the like. In another embodiment, the first repository 110 is configured to store a dataset having a list of gestures and corresponding gesture information in each and every language.
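For illustration, the first repository 110 could be realised as a simple dataset file keyed by gesture class; the JSON layout shown in the comment below is an assumption used only to make the sketch concrete.

```python
import json

class GestureRepository:
    """A minimal first repository (110): stores reference gesture features and
    their gesture information, grouped into classes such as 'alphabets',
    'numbers', 'words' and 'expressions' (file layout assumed)."""

    def __init__(self, dataset_path):
        with open(dataset_path, "r", encoding="utf-8") as fh:
            # Assumed layout:
            # {"alphabets": [{"label": "A", "features": [...]}, ...], ...}
            self._dataset = json.load(fh)

    def entries(self, gesture_class=None):
        """Iterate over (label, reference_features) pairs, optionally by class."""
        classes = [gesture_class] if gesture_class else self._dataset.keys()
        for cls in classes:
            for entry in self._dataset.get(cls, []):
                yield entry["label"], entry["features"]
```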
[0031] In another aspect of the present invention, the feature matching unit 112 is configured to cooperate with the hand feature extraction unit 108 and the first repository 110. The feature matching unit 112 is configured to receive the extracted feature of gesture of the left and right hand and is further configured to match with the stored dataset having gesture information. After matching with suitable gesture information, the feature matching unit 112 is configured to transmit the gesture information to the feature recognition unit 114. The feature recognition unit 114 is configured to verify and recognize the correct gesture information on the basis of tracking of left and right hand, movement's occlusion, and position of hands on the basis of the temporal and spatial hand gesture variation by employing segmentation techniques. Further, the feature recognition unit 114 is configured to transmit the temporal and spatial gesture information to the audio unit 116. In another embodiment, the feature recognition unit 114 is configured to recognize the gesture information in any language.
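One possible, greatly simplified reading of the matching and recognition steps is a nearest-neighbour comparison of flattened landmark features followed by agreement over a temporal window; the sketch below is an assumption and does not reproduce the occlusion handling or segmentation techniques referred to above. The distance threshold and agreement ratio are illustrative values only.

```python
import numpy as np

def match_gesture(frame_features, repository, threshold=0.25):
    """Feature matching unit (112), sketched as a nearest-neighbour match of one
    frame's hand features against the stored dataset (threshold is assumed)."""
    query = np.asarray(frame_features, dtype=float).ravel()
    best_label, best_distance = None, float("inf")
    for label, reference in repository.entries():
        ref = np.asarray(reference, dtype=float).ravel()
        if ref.shape != query.shape:          # skip entries for a different hand count
            continue
        distance = np.linalg.norm(query - ref)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= threshold else None

def recognize_gesture(window_labels, min_agreement=0.8):
    """Feature recognition unit (114), sketched as a temporal consistency check:
    accept a gesture only if the per-frame matches agree over the window."""
    labels = [label for label in window_labels if label is not None]
    if not labels:
        return None
    top = max(set(labels), key=labels.count)
    return top if labels.count(top) / len(labels) >= min_agreement else None
```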
[0032] In yet another aspect of the present invention, the audio unit 116 is configured to receive gesture information and is further configured to output the gesture information into a voice audio via an audio device. In an embodiment, the audio device can be a speaker. In an embodiment, the audio unit 116 includes a text to speech converter configured to convert the gesture information into the voice audio.
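For illustration, the text to speech converter of the audio unit 116 could be sketched with the third-party pyttsx3 package; the package name and its offline behaviour are assumptions, and any other text-to-speech engine could serve the same role.

```python
import pyttsx3  # third-party offline text-to-speech package, assumed to be installed

_engine = pyttsx3.init()

def speak(gesture_information):
    """Audio unit (116): convert recognized gesture information into voice audio."""
    _engine.say(str(gesture_information))
    _engine.runAndWait()   # blocks until the utterance has been played on the speaker
```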
[0033] In an embodiment, the image processing unit 104, the feature matching unit 112 and the feature recognition unit 114 are implemented using one or more processor(s).
[0034] Referring to FIG. 2, the present disclosure envisages a system 100 for translating speech into sign language, in real time. The system 100 comprises an input unit 118, a speech to text convertor 120, a second repository 122, a gesture matching unit 124 and a display unit 126.
[0035] In yet another aspect of the present invention, the input unit 118 is configured to accept input in the form of voice commands from a user and is further configured to transmit the voice commands to the speech to text convertor 120. In an embodiment, the input unit 118 can be a microphone. The speech to text convertor 120 is configured to recognize the voice commands given by the user, and is further configured to convert the voice commands into text commands. In an embodiment, the speech to text convertor 120 is configured to detect the voice commands to generate the text commands, and send the text commands to the gesture matching unit 124.
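As a purely illustrative sketch, the microphone input unit 118 and the speech to text convertor 120 could be combined using the third-party SpeechRecognition package; the module name speech_recognition, the recognize_google backend and its need for network access are assumptions, not part of the disclosure.

```python
import speech_recognition as sr  # third-party SpeechRecognition package, assumed installed

_recognizer = sr.Recognizer()

def listen_and_transcribe():
    """Input unit (118) plus speech to text convertor (120): capture one voice
    command from the microphone and return it as a text command."""
    with sr.Microphone() as source:               # microphone acting as the input unit
        _recognizer.adjust_for_ambient_noise(source)
        audio = _recognizer.listen(source)
    try:
        return _recognizer.recognize_google(audio)  # assumed online recognizer backend
    except sr.UnknownValueError:                    # speech could not be understood
        return ""
```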
[0036] In yet another aspect of the present invention, the second repository 122 is configured to store a second dataset having a list of gestures information associated with a text. The gesture matching unit 124 is configured to cooperate with the speech to text convertor 120 and the second repository 122. The gesture matching unit 124 is configured to receive the text commands and is configured to search a suitable gesture associated with the text by crawling through the stored list of gestures information via a crawler (not shown in the figure) and is further configured to extract the gesture when the received text commands matches with the stored text.
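For illustration only, the second repository 122 and the gesture matching unit 124 could be sketched as a crawl over a stored text-to-GIF mapping; the dictionary layout and the word-by-word fallback below are assumptions made to keep the example self-contained.

```python
class GestureMatcher:
    """Second repository (122) and gesture matching unit (124), sketched as a
    crawl over a stored mapping from text to sign-language GIF files."""

    def __init__(self, text_to_gif):
        # Assumed layout, e.g. {"hello": "gifs/hello.gif", "thank you": "gifs/thank_you.gif"}
        self._text_to_gif = {key.lower(): path for key, path in text_to_gif.items()}

    def find(self, text_command):
        """Return a list of GIF paths for the text command, or None if no match."""
        text = text_command.strip().lower()
        if text in self._text_to_gif:            # whole-phrase match
            return [self._text_to_gif[text]]
        gifs = []
        for word in text.split():                # crawl the stored entries word by word
            if word in self._text_to_gif:
                gifs.append(self._text_to_gif[word])
        return gifs or None
```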
[0037] The display unit 126 is configured to cooperate with the gesture matching unit 124 to receive the gesture information and is further configured to display the corresponding gesture information in the form of a Graphics Interchange Format (GIF).
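One possible, assumed rendering choice for the display unit 126 is simply to hand the matched GIF to the operating system's default viewer; an embedded GUI widget would be an equally valid alternative.

```python
import pathlib
import webbrowser

def show_gesture_gif(gif_path):
    """Display unit (126): open the matched sign-language GIF in the default viewer."""
    uri = pathlib.Path(gif_path).resolve().as_uri()
    webbrowser.open(uri)
```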
[0038] Hence, the system 100 is configured to provide a bilateral communication to the deaf/dumb person speaking sign language and the user (normal person) by converting a live sign language input of deaf/dumb person into a voice output and converting voice command from the normal person into GIF of sign language, in continuous live mode, thereby providing lively interaction between the deaf and dumb person and the normal person without any delay.
[0039] In yet another aspect of the present invention, the speech to text convertor 120 and the gesture matching unit 124 are implemented using one or more processor(s). Advantageously, the processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
[0040] The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope and spirit of the disclosure and appended claims. For example, due to the nature of software, functions described above can be implemented using software executed by a processor, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations.
[0041] Thus, the scope of the present disclosure is defined by the appended claims and includes both combinations and sub-combinations of the various features described hereinabove as well as variations and modifications thereof, which would occur to persons skilled in the art upon reading the foregoing description.

Claims (7)

We Claim:
1. A system (100) for translating sign language of a deaf/dumb person into speech, said system (100) comprising:
an image capturing device (102) configured to capture video of the deaf/dumb person speaking sign language, in real-time, and further configured to generate a corresponding live image data streambased on the captured video;
an image processing unit (104) configured to receive said captured live image data stream from said image capturing device (102) and further configured to extract features of gesture of a left and right hand of the deaf/dumb person, wherein said image processing unit (104), said feature matching unit (112) and said feature recognition unit (114) are implemented using one or more processor(s);
a first repository (110) configured to store a dataset having a list of gestures and corresponding gesture information;
a feature matching unit (112) configured to cooperate with said image processing unit (104) to receive said extracted features of gesture of the left and right hand and further configured to match extracted features of gesture of the left and right hand with the stored dataset having gesture information in said first repository (110);
a feature recognition unit (114) configured to receive matched gesture information from said feature matching unit (112) to recognize the gesture information on the basis of temporal and spatial hand gesture variation; and
an audio unit (116) configured to receive recognized gesture information from said feature recognition unit (114) and further configured to output the gesture information into a voice audio via an audio device.
2. The system as claimed in claim 1, wherein said image processing unit
(104) includes:
a detection unit (106) configured to identify a region of interest within the received captured live image data stream from said image capturing device (102), wherein region of interest within the received captured live image data stream includes video of coordinated left and right hand of the deaf/dumb person; and
a hand feature extraction unit (108) configured to receive the identified video of coordinated left and right hand and further configured to extract temporal and spatial feature of gesture of the left and right hand.
3. The system (100) as claimed in claim 1, wherein said image capturing device (102) is selected from the group consisting of a still camera, IP camera, a 3D camera, an infrared camera, an image capturing sensor, a digital camera, and a CCD camera.
4. The system (100) as claimed in claim 1, wherein said gesture information in said first repository (110) includes a plurality of classes related to hand gesture used for sign language, hand gesture for alphabets, hand gesture for numbers, hand gesture for words, hand gesture for expressions and the like.
5. The system (100) as claimed in claim 1, wherein said feature recognition unit (114) is configured to verify and recognize the correct gesture information on the basis of tracking of left and right hand, movements occlusion, and position of hands on the basis ofthe temporal and spatial hand gesture variation by employing segmentation techniques.
6. The system (100) for translating speech into sign language, said system (100) comprising: an input unit (118) configured to accept input in the form of voice commands from a user;
a speech to text convertor (120) cooperating with said input unit (118) to receive said voice commands and further configured to convert said voice commands into text commands, in real time;
a second repository (122) configured to store a second dataset having a list of gestures information associated with a text;
a gesture matching unit (124) configured to extract a suitable gesture based on the received text commands by crawling through the stored list of gestures information via a crawler when said received text commands matches with the stored text in said second repository (122); and
a display unit (126) configured to display the corresponding gesture information received from said gesture matching unit (124).
7. The system (100) as claimed in claim 6, wherein the system (100) provides a bilateral communication to the deaf/dumb person speaking sign language and the user, in continuous live mode.
AU2021101804A 2021-04-08 2021-04-08 A system for translating sign language into speech and vice- versa Ceased AU2021101804A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021101804A AU2021101804A4 (en) 2021-04-08 2021-04-08 A system for translating sign language into speech and vice- versa

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021101804A AU2021101804A4 (en) 2021-04-08 2021-04-08 A system for translating sign language into speech and vice- versa

Publications (1)

Publication Number Publication Date
AU2021101804A4 true AU2021101804A4 (en) 2021-07-22

Family

ID=76858408

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021101804A Ceased AU2021101804A4 (en) 2021-04-08 2021-04-08 A system for translating sign language into speech and vice- versa

Country Status (1)

Country Link
AU (1) AU2021101804A4 (en)


Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry