WO2021218750A1 - System and method for translating sign language - Google Patents

System and method for translating sign language

Info

Publication number
WO2021218750A1
WO2021218750A1 (PCT/CN2021/088829, CN2021088829W)
Authority
WO
WIPO (PCT)
Prior art keywords
language
input
sign language
sign
user device
Prior art date
Application number
PCT/CN2021/088829
Other languages
English (en)
Inventor
Ravi Ranjan SWARNKAR
Abhishek GOGIA
Nitin Misra
Rahul Kumar TRIPATHI
Manoj Kumar
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date: 2020-04-30
Filing date: 2021-04-22
Publication date: 2021-11-04
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2021218750A1


Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 - Teaching, or communicating with, the blind, deaf or mute
    • G09B21/009 - Teaching or communicating with deaf persons
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 - Concept to speech synthesisers; Generation of natural phrases from machine-based concepts

Definitions

  • the present invention relates to the field of sign language translation, and particularly to a system and a method for translating sign language, and a user device thereof.
  • Sign language facilitates communication for people with speech or hearing impairments. Such persons with speech or hearing impairments use signs to communicate with the outside world.
  • the signs may be a combination of hand gestures, body movements, touch, facial expressions, and other forms of communication.
  • Just as spoken language differs from region to region, there are several types of sign language, and each is considerably different from the others.
  • the sign languages also differ from region to region. For instance, American Sign Language is different from German Sign Language. Often, a person may have difficulty understanding a sign language user due to, for instance, lack of fluency in the language.
  • the present invention is directed towards a system and a method for translating sign language.
  • a first aspect of the present disclosure relates to a method for translating sign-language.
  • the method begins with receiving, at an input unit, an input from a user of a user device. Subsequently, the input unit identifies a language type associated with the input.
  • the sign language engine then receives a region selection for the user device. Further, the sign language engine selects a sign language associated with the identified region selection.
  • a language processor translates the input into the selected sign language.
  • the system comprises a memory unit, an input unit, a sign language engine and a language processor.
  • the input unit is connected to the memory unit, said input unit is configured to receive an input from a user of a user device and to identify a language type associated with the input.
  • the sign language engine is connected to the memory unit and the input unit, said sign language engine is configured to receive a region selection for the user device, and to select a sign language associated with the identified region selection.
  • the language processor is connected to the memory unit, the input unit and the sign language engine, said language processor is configured to translate the input into the selected sign language.
  • the user device comprises a memory unit, an input unit, a sign language engine and a language processor.
  • the input unit is connected to the memory unit, said input unit is configured to receive an input from a user of a user device and to identify a language type associated with the input.
  • the sign language engine is connected to the memory unit and the input unit, said sign language engine is configured to receive a region selection for the user device, and to select a sign language associated with the identified region selection.
  • the language processor is connected to the memory unit, the input unit and the sign language engine, said language processor is configured to translate the input into the selected sign language.
  • FIG. 1 illustrates an architecture of a system [100] for translating sign language, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 2 illustrates an exemplary method flow diagram depicting method [200] for translating sign language, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 3 illustrates an exemplary signal flow diagram for a method [300] of translating sign language, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 4 illustrates an exemplary signal flow diagram for a method [400] of translating sign language into a natural language output, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 5 illustrates an exemplary implementation [500] of the system and method for translating sign language on a user device, in accordance with exemplary embodiments of the present disclosure.
  • FIG. 6 illustrates another exemplary implementation [600] of the system and method for translating sign language on a user device, in accordance with exemplary embodiments of the present disclosure.
  • circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail.
  • well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
  • individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged.
  • a process is terminated when its operations are completed but could have additional steps not included in a figure.
  • a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
  • to the extent that the terms "includes," "has," "contains," and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive, in a manner similar to the term "comprising" as an open transition word, without precluding any additional or other elements.
  • the disclosed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter.
  • a "processor" or "processing unit" includes one or more processors, wherein a processor refers to any logic circuitry for processing instructions.
  • a processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc.
  • the processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit [106] is a hardware processor.
  • an "input/output (I/O) unit" includes one or more data input devices and one or more data output devices.
  • An I/O unit may process commands and data to control I/O devices.
  • a data input device may comprise input means for inputting character data, for example, a keyboard with a plurality of keys.
  • a data output device may comprise output means for representing characters in response to the inputted data, for example, a display unit for displaying such characters.
  • the input device and the output device may also be combined into a single unit, for example, a display unit with a virtual keyboard.
  • a "memory unit" or "memory" refers to a machine or computer readable medium including any mechanism for storing information in a form readable by a computer or similar machine.
  • a computer-readable medium includes read-only memory ("ROM"), random access memory ("RAM"), magnetic-disk storage media, optical storage media, flash memory devices or other types of machine-accessible storage media.
  • the user device is capable of receiving and/or transmitting one or more parameters, performing one or more functions, communicating with other user devices and transmitting data to the other user devices.
  • the user device may have a processor, a display, a memory unit, a battery and an input-means such as a hard keypad and/or a soft keypad.
  • the user device may be capable of operating on any radio access technology including, but not limited to, IP-enabled communication, ZigBee, Bluetooth, Bluetooth Low Energy, Near Field Communication, Z-Wave, etc.
  • the user device may operate at all seven layers of the ISO reference model, and also works on the application layer along with the network, session and presentation layers, with any additional features of a touch screen, apps ecosystem, physical and biometric security, etc.
  • the user devices may include, but are not limited to, a mobile phone, smartphone, virtual reality (VR) device, augmented reality (AR) device, pager, laptop, general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other device as may be obvious to a person skilled in the art.
  • the user device may also have an interface which typically includes a display with or without keypad including a set of alpha-numeric (ITU-T type) keys that may be real keys or virtual keys.
  • the input interface also comprises touch/acoustic/video components for touch/sound/video input and output.
  • the output interface may comprise a microphone, a speaker, camera and additionally audio/video I/O ports in an accessories interface, wherein the speaker normally serves to provide acoustic output in the form of human speech, ring signals, music, etc.
  • the present invention provides a method and system for translating natural language input into sign language, and for translating sign language into a natural language input.
  • the present invention aims to facilitate bi-directional communication between an ordinary person and a person with hearing or speech disability.
  • the present invention encompasses, through the proposed solution, that an ordinary person may direct the camera of his user device towards the specially abled person and record that person's actions. These actions are translated, in real time, into text or voice in a localized language so that the ordinary person can understand them easily.
  • the ordinary person may type or speak into the user device, and the text/speech input is translated into a series of sign-language GIFs or videos which the person with special ability may understand easily, thus providing a two-way communication.
  • the solution of the present invention also takes into consideration the different sign languages used in different regions around the world.
  • the present invention also encompasses utilizing the localized languages for processing text or speech input provided by an ordinary person.
  • Referring to FIG. 1, an architecture of a system [100] for translating sign language is disclosed in accordance with exemplary embodiments of the present invention.
  • the system comprises at least a memory unit [112] , input unit [102] , a sign language engine [104] and a language processor [106] , an output unit [108] and a positioning system [110] , all components connected to each other.
  • the system of the present invention also encompasses various other hardware and logical components that may be required to work the invention and obvious to a person skilled in the art. All the components of the system [100] are assumed to be connected to each other unless otherwise indicated. The connections, however, have not been shown in the figures for the purpose of clarity.
  • the input unit [102] is connected to the memory unit [112], the sign language engine [104], the language processor [106], the output unit [108] and the positioning system [110].
  • the input unit [102] is configured to receive an input from a user of a user device. For instance, a user may provide to the user device one of a text, a speech or a video input, using a touch-based display, a microphone, and a camera present at the user device. Particularly, if the user is a specially abled person communicating with an ordinary person, the ordinary person may place the camera of the user device in front of the specially abled person, and record a video of the actions of the specially abled person.
  • the input unit [102] is further configured to receive a video input from the user of the user device wherein the video input comprises of sign language.
  • the input unit [102] is also configured to identify a language type associated with the input.
  • the present invention encompasses that the input unit [102] comprises a natural language processor for identifying the language type associated with the input. For instance, if a user speaks into the user device, the input unit [102] determines whether the language of speech is English, French, German, Hindi, etc.
  • natural language processing involves the lexical, syntactic (grammatical) , and semantic domain analysis of the user input using both statistical observations of the various surface forms and a broader interpretation of the relationships and dependencies among words, phrases, and concepts, to identify a type of language.
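By way of illustration only, the language-type check described above could be approximated by scoring the input tokens against small pre-stored word lists; in the sketch below the word lists, the scoring rule and the English fallback are hypothetical assumptions, not part of the invention.

```python
# Minimal sketch of language-type identification by vocabulary overlap.
# The per-language word lists and the fallback are illustrative assumptions.

PRESTORED_WORDS = {
    "English": {"hi", "how", "are", "you", "good", "afternoon", "please"},
    "French": {"bonjour", "comment", "vous", "merci", "bonne"},
    "German": {"hallo", "wie", "geht", "danke", "guten"},
    "Hindi": {"namaste", "aap", "kaise", "dhanyavaad"},
}

def identify_language_type(text: str) -> str:
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in PRESTORED_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "English"  # default when nothing matches

print(identify_language_type("Hi, how are you"))  # -> English
```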
  • the sign language engine [104] is connected to the memory unit [112] , the input unit [102] , the language processor [106] , the output unit [108] and the positioning system [110] .
  • the sign language engine [104] is configured to receive a region selection for the user device.
  • the region selection may be received, at the sign language engine [104] , from either a positioning system [110] (say, GPS on the mobile device) of the user device automatically or from the user of the user device manually.
  • the sign language engine [104] is also configured to select a sign language associated with the identified region selection. For instance, if the region selection received at the sign language engine [104] is Asia Pacific (APAC), the country is India, and the city is Delhi, the sign language engine [104] may determine that the sign language is Indian Sign Language.
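As a non-limiting sketch of this region-based selection (and of giving a manual selection priority over the positioning system, as discussed later for the method), a simple lookup could be keyed on the region reported by GPS; the mapping and region keys below are illustrative assumptions.

```python
# Sketch of sign-language selection from a region selection. The region
# keys, the mapping and the default are illustrative assumptions.

REGION_TO_SIGN_LANGUAGE = {
    ("APAC", "India"): "Indian Sign Language",
    ("APAC", "Vietnam"): "Vietnamese Sign Language",
    ("NA", "United States"): "American Sign Language",
    ("EU", "Germany"): "German Sign Language",
}

def select_sign_language(gps_region, manual_region=None):
    # A manual selection by the user takes priority over the GPS-derived
    # region, e.g. an American user travelling in Germany may keep
    # "NA / United States" and still receive American Sign Language.
    region = manual_region or gps_region
    return REGION_TO_SIGN_LANGUAGE.get(region, "American Sign Language")

print(select_sign_language(("EU", "Germany"), manual_region=("NA", "United States")))
# -> American Sign Language
```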
  • the present invention encompasses that the sign language engine [104] is also configured to identify the sign language associated with the video input.
  • the sign language engine [104] may extract one or more frames of the video input, wherein the video input comprises at least one frame.
  • the sign language engine [104] then identifies the one or more frames with at least one sign language identifier. Thereafter, the sign language engine [104] matches the at least one sign language identifier against a linguistic database comprising at least one sign language identifier for one or more sign languages.
  • the input video is firstly processed frame by frame by the sign language engine [104] .
  • the difference in frames is calculated and the highlighted frames are chosen. From these highlighted frames, the sign-language actions are identified which are then matched with the adaptive library of the engine to produce the most accurate and sensible sentences for the ordinary person in natural language format.
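One possible realisation of this frame-difference step is sketched below, assuming the OpenCV library is available; the difference threshold is an illustrative value, not one specified by the invention.

```python
# Sketch of frame-by-frame processing: frames whose mean absolute pixel
# difference from the previous frame exceeds a threshold are kept as
# "highlighted" key frames for sign recognition.
import cv2

def highlighted_frames(video_path: str, threshold: float = 12.0):
    capture = cv2.VideoCapture(video_path)
    key_frames, previous = [], None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if previous is not None and cv2.absdiff(gray, previous).mean() > threshold:
            key_frames.append(frame)
        previous = gray
    capture.release()
    return key_frames
```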
  • the language processor [106] is connected to the memory unit [112] , the sign language engine [104] , the input unit [102] , the output unit [108] and the positioning system [110] .
  • the language processor [106] is configured to translate the input into the selected sign language.
  • the language processor [106] is further configured to divide the input into one or more segments, and match the one or more segments of the input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word. Thereafter, the language processor [106] correlates the one or more sign language identifiers of the one or more segments.
  • for instance, if the input from an ordinary user is the speech "Hi, a very good afternoon. Please tell me your query.", the language processor [106] divides the speech into meaningful segments, say "hi", "good afternoon", "tell me" and "your query". The language processor [106] then searches the one or more segments in the linguistic database and identifies a sign language identifier corresponding to the one or more segments. The language processor may then correlate the one or more sign language identifiers to form a meaningful sentence. In this regard, the present invention encompasses that the language processor is also configured to generate a video of the correlation of the sign language identifiers. The same may be understood easily from Fig. 6.
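A minimal sketch of this encode direction follows; the linguistic-database contents, the GIF file names and the greedy longest-match segmentation are illustrative assumptions rather than the invention's prescribed implementation.

```python
# Sketch: divide a natural-language input into segments, look each segment
# up in a linguistic database of phrase -> sign-language identifier, and
# correlate the identifiers into an ordered sequence (e.g. GIFs to play).

LINGUISTIC_DB = {
    "hi": "sign_hi.gif",
    "good afternoon": "sign_good_afternoon.gif",
    "tell me": "sign_tell_me.gif",
    "your query": "sign_your_query.gif",
}

def translate_to_sign(text: str):
    remaining = text.lower().strip(" .,!?")
    identifiers = []
    while remaining:
        # Greedy longest-match segmentation against the database phrases.
        match = max((p for p in LINGUISTIC_DB if remaining.startswith(p)),
                    key=len, default=None)
        if match is None:
            # Skip words with no sign entry (e.g. "a", "very", "please").
            remaining = remaining.split(" ", 1)[1] if " " in remaining else ""
            continue
        identifiers.append(LINGUISTIC_DB[match])
        remaining = remaining[len(match):].lstrip(" ,.")
    return identifiers

print(translate_to_sign("Hi, a very good afternoon. Please tell me your query."))
# -> ['sign_hi.gif', 'sign_good_afternoon.gif', 'sign_tell_me.gif', 'sign_your_query.gif']
```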
  • the language processor [106] is configured to translate the video input into a natural language output, wherein the language type of the natural language output is based on the region selection.
  • the language processor [106] is further configured to match the at least one sign language identifier of the video input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word. Thereafter, the language processor [106] correlates the one or more natural language identifiers associated with the at least one sign language identifier.
  • the language processor matches the two or more expressions in the linguistic database, identifies corresponding natural language identifiers, and correlates the natural language identifiers to form a meaningful sentence.
  • the present invention encompasses that the language processor is also configured to generate text or speech output of the correlation of the natural language identifiers. The same may be understood easily from Fig. 5.
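For the decode direction, a corresponding sketch (again with an illustrative, hypothetical database of sign identifiers and words) could look like this:

```python
# Sketch: look up sign-language identifiers recovered from the video in the
# linguistic database and correlate the corresponding natural-language words
# into a sentence that can be shown as text or spoken aloud.

SIGN_TO_WORD = {
    "sign_hello": "hello",
    "sign_i": "I",
    "sign_need": "need",
    "sign_help": "help",
}

def translate_to_natural_language(sign_identifiers):
    words = [SIGN_TO_WORD[s] for s in sign_identifiers if s in SIGN_TO_WORD]
    return " ".join(words)

print(translate_to_natural_language(["sign_hello", "sign_i", "sign_need", "sign_help"]))
# -> hello I need help
```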
  • the output unit [108] is connected to the memory unit [112] , the sign language engine [104] , the language processor [106] , the input unit [102] and the positioning system [110] .
  • the output unit [108] is configured to provide the translation of the input. For instance, when the language processor [106] translates natural language into sign language, the output unit [108] may be configured to display a GIF or video of the translation on a display of the user device. In another instance, when the language processor [106] translates sign language into natural language, the output unit [108] may be configured to display text output on the display of the user device, or may be configured to provide speech output via the speaker of the user device.
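For the speech branch of the output unit, a minimal sketch assuming the offline pyttsx3 text-to-speech library (the invention does not prescribe any particular speech synthesiser) is shown below.

```python
# Sketch of speech output for the translated natural-language text,
# assuming the pyttsx3 text-to-speech library is installed.
import pyttsx3

def speak(text: str) -> None:
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

speak("hello I need help")
```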
  • the memory unit [112] of the user device is connected to the input unit [102] , the sign language engine [104] , the language processor [106] , the output unit [108] and the positioning system [110] .
  • the memory unit [112] is configured to store a linguistic database comprising the sign language identifiers and the corresponding natural language identifiers for the one or more languages that may exist in the world.
  • the present invention encompasses implementation using machine learning, whereby the system may keep expanding its capabilities of learning new sign languages and natural languages.
  • the present invention also encompasses that the system [100] is implemented at a user device.
  • the user device comprises a memory unit [112] , input unit [102] , a sign language engine [104] and a language processor [106] , an output unit [108] and a positioning system [110] , all components connected to each other.
  • the user device of the present invention also encompasses various other hardware and logical components that may be required to work the invention and obvious to a person skilled in the art.
  • the input unit [102] of the user device is connected to the memory unit [112], the sign language engine [104], the language processor [106], the output unit [108] and the positioning system [110].
  • the input unit [102] is configured to receive an input from a user of a user device. For instance, a user may provide to the user device one of a text, a speech or a video input, using a touch-based display, a microphone, and a camera present at the user device.
  • the present invention encompasses that the input unit [102] of the user device is further configured to receive a video input from the user of the user device wherein the video input comprises of sign language.
  • the input unit [102] of the user device is also configured to identify a language type associated with the input.
  • the present invention encompasses that the input unit [102] comprises a natural language processor for identifying the language type associated with the input.
  • the sign language engine [104] of the user device is connected to the memory unit [112] , the input unit [102] , the language processor [106] , the output unit [108] and the positioning system [110] .
  • the sign language engine [104] is configured to receive a region selection for the user device.
  • the region selection may be received, at the sign language engine [104] , from either a positioning system [110] (say, GPS on the mobile device) of the user device automatically or from the user of the user device manually.
  • the sign language engine [104] of the user device is also configured to select a sign language associated with the identified region selection.
  • the present invention encompasses that the sign language engine [104] of the user device is also configured to identify the sign language associated with the video input.
  • the sign language engine [104] of the user device may extract one or more frames of the video input, wherein the video input comprises at least one frame.
  • the sign language engine [104] of the user device then identifies the one or more frames with at least one sign language identifier. Thereafter, the sign language engine [104] of the user device matches the at least one sign language identifier against a linguistic database comprising at least one sign language identifier for one or more sign languages.
  • the language processor [106] of the user device is connected to the memory unit [112] , the sign language engine [104] , the input unit [102] , the output unit [108] and the positioning system [110] .
  • the language processor [106] of the user device is configured to translate the input into the selected sign language.
  • the language processor [106] of the user device is further configured to divide the input into one or more segments, and match the one or more segments of the input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word. Thereafter, the language processor [106] correlates the one or more sign language identifiers of the one or more segments.
  • the present invention encompasses that the language processor is also configured to generate a video of the correlation of the sign language identifiers. The same may be understood easily from Fig. 6.
  • in an event the video input comprises sign language, the language processor [106] of the user device is configured to translate the video input into a natural language output, wherein the language type of the natural language output is based on the region selection.
  • the language processor [106] of the user device is further configured to match the at least one sign language identifier of the video input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word. Thereafter, the language processor [106] of the user device correlates the one or more natural language identifiers associated with the at least one sign language identifier.
  • the present invention encompasses that the language processor is also configured to generate text or speech output of the correlation of the natural language identifiers. The same may be understood easily from Fig. 5.
  • the output unit [108] of the user device is connected to the memory unit [112] , the sign language engine [104] , the language processor [106] , the input unit [102] and the positioning system [110] .
  • the output unit [108] is configured to provide the translation of the input.
  • the output unit [108] may be configured to display text output on the display of the user device, or may be configured to provide speech output via the speaker of the user device.
  • the memory unit [112] of the user device is connected to the input unit [102] , the sign language engine [104] , the language processor [106] , the output unit [108] and the positioning system [110] .
  • the memory unit [112] of the user device is configured to store a linguistic database comprising the sign language identifiers and the corresponding natural language identifiers for the one or more languages that may exist in the world.
  • the present invention encompasses implementation using machine learning at the user device, whereby the system may keep expanding its capabilities of learning new sign languages and natural languages.
  • FIG. 2 illustrates an exemplary method flow diagram depicting method [200] for translating sign language, in accordance with exemplary embodiments of the present disclosure.
  • the method [200] begins at step 202, when a user starts operating the user device and starts providing an input to the user device.
  • the method begins upon receiving, at the user device, a user input request to initiate the conversion of natural language to sign language or vice versa.
  • the method [200] comprises receiving, at an input unit [102] , an input from a user of a user device.
  • a user may provide to the user device one of a text, a speech or a video input, using a touch-based display, a microphone, and a camera present at the user device.
  • the method comprises identifying, by the input unit [102], a language type associated with the input. For instance, if a user speaks into the user device, the input unit [102] determines whether the language of speech is English, French, German, Hindi, etc. This step may include processing the input to identify one or more words or tokens in the input and identifying the language associated with said words based on pre-stored data. For instance, if the user says, "Hi, how are you", the words 'Hi', 'how', 'are' and 'you' are extracted from the input, and based on pre-stored data it is determined that all the identified words belong to the English language. Thus, in this instance, the language type determined is English.
  • the method encompasses receiving, at a sign language engine [104] , a region selection for the user device.
  • the sign language engine [104] is configured to receive a region selection for the user device.
  • the region selection may be received, at the sign language engine [104] , from either a positioning system [110] (say, GPS on the mobile device) of the user device automatically or from the user of the user device manually.
  • the invention encompasses prioritising the user's manual input over the GPS data, so that if an American user whose preferred language is English is currently travelling in Germany, the region selection remains America, as the user would prefer to receive the conversion in English rather than German.
  • at step [210], the method comprises selecting, by the sign language engine [104], a sign language associated with the identified region selection.
  • for instance, if the region selection received at the sign language engine [104] is Asia Pacific (APAC), the country is India, and the city is Delhi, the sign language engine [104] may determine that the sign language is Indian Sign Language.
  • the present invention also encompasses that the sign language engine [104] may identify the sign language based on the language type of the input.
  • the method encompasses translating, by a language processor [106] , the input into the selected sign language.
  • the language processor [106] is further configured to divide the input into one or more segments, and match the one or more segments of the input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word. Thereafter, the language processor [106] correlates the one or more sign language identifiers of the one or more segments.
  • the language processor [106] then divides the speech into meaningful segments, say "hi", "good afternoon", "tell me" and "your query".
  • the language processor [106] may then search the one or more segments in the linguistic database, and find a sign language identifier corresponding to the one or more segments.
  • the language processor may then correlate the one or more sign language identifiers to form a meaningful sentence.
  • the present invention encompasses that the language processor is also configured to generate a video of the correlation of the sign language identifiers. The same may be understood easily from Fig. 6.
  • the method encompasses providing, at an output unit [108] , the translation of the input.
  • the output unit [108] may be configured to display a GIF or video of the translation on a display of the user device.
  • the present invention encompasses that method [200] comprises receiving, at the input unit [102], a video input from the user of the user device, wherein the video input comprises sign language.
  • the ordinary person may place the camera of the user device in front of the specially abled person, and record a video of the actions of the specially-abled person.
  • the present invention encompasses identifying, by the sign language engine [104] , the sign language associated with the video input.
  • for instance, if the region selection received at the sign language engine [104] is Asia Pacific (APAC), the country is India, and the city is Delhi, the sign language engine [104] may determine that the sign language is Indian Sign Language.
  • the sign language engine [104] may extract one or more frames of the video input, wherein the video input comprises at least one frame.
  • the sign language engine [104] then identifies the one or more frames with at least one sign language identifier. Thereafter, the sign language engine [104] matches the at least one sign language identifier against a linguistic database comprising at least one sign language identifier for one or more sign languages.
  • the method of the present invention comprises translating, by a language processor [106] , the video input into a natural language output, wherein the language type of the natural language output is based on the region selection.
  • the language processor [106] matches the at least one sign language identifier of the video input in a linguistic database comprising at least one natural language word and at least one sign language identifier associated with each of the at least one natural language word.
  • the language processor [106] correlates the one or more natural language identifiers associated with the at least one sign language identifier.
  • the language processor matches the two or more expressions in the linguistic database, identifies corresponding natural language identifiers, and correlates the natural language identifiers to form a meaningful sentence.
  • the language processor is also configured to generate text or speech output of the correlation of the natural language identifiers
  • the output unit [108] may be configured to display text output on the display of the user device or may be configured to provide speech output via the speaker of the user device.
  • FIG. 3 illustrates an exemplary signal flow diagram for a method [300] of translating sign language, in accordance with exemplary embodiments of the present disclosure.
  • the present invention encompasses, at step 302, receiving, at an input unit, an input from a user of a user device.
  • the invention encompasses identifying a language type associated with the input and a region selection for the user device.
  • the sign language engine is connected to the linguistic database that comprises one or more sign languages [312A, 312B, ..., 312N] and one or more natural languages [314A, 314B, ..., 314N] .
  • the system may either encode or decode the input.
  • if the input is text or speech from an ordinary person, the system encodes the input by translating it into sign language using the method [200] of the present invention.
  • if the input is video from a specially abled person, the system decodes the input by translating the sign language input into a natural language output using the method [200] of the present invention.
  • the present invention encompasses that the system may comprise two modes of operation. In an exemplary mode A, the system may allow an ordinary user to communicate with the specially abled user. In another exemplary mode B, the system may allow a specially abled user to communicate with an ordinary user.
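Putting the two modes together, a dispatch sketch that reuses the earlier illustrative functions (translate_to_sign, highlighted_frames, translate_to_natural_language) might look as follows; recognise_signs is a hypothetical placeholder for a frame-to-identifier classifier and is not defined by the invention.

```python
# Sketch of the two exemplary modes of operation.

def recognise_signs(frames):
    # Hypothetical placeholder: a real system would classify each highlighted
    # frame against the sign-language identifiers in the linguistic database.
    return []

def process_input(user_input, input_kind: str):
    if input_kind in ("text", "speech"):
        # Mode A: ordinary person -> specially abled person (encode).
        return translate_to_sign(user_input)
    if input_kind == "video":
        # Mode B: specially abled person -> ordinary person (decode).
        return translate_to_natural_language(recognise_signs(highlighted_frames(user_input)))
    raise ValueError(f"unsupported input kind: {input_kind}")
```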
  • FIG. 4 illustrates an exemplary signal flow diagram for a method [400] of translating sign language into a natural language output, in accordance with exemplary embodiments of the present disclosure. Particularly, it describes an exemplary method of decoding the sign language into a natural language output.
  • the present invention encompasses that the input unit [404] is further configured to receive a video input from the user of the user device, wherein the video input comprises sign language.
  • the input video is processed [406] frame by frame by the sign language engine [104] .
  • the frame detection [408] calculates the difference between frames, and the highlighted frames are chosen. From these highlighted frames, the sign-language actions are identified, which are then matched [410] against the adaptive linguistic library of the engine to produce [416] the most accurate and sensible sentences for the ordinary person in natural language format.
  • the linguistic database comprises one or more sign languages [412A, 412B, ..., 412N] and one or more natural languages [414A, 414B, ..., 414N].
  • FIG. 5 illustrates an exemplary implementation [500] of the system and method for translating sign language on a user device, in accordance with exemplary embodiments of the present disclosure.
  • Fig. 5 presents a scenario where a hearing- and speech-impaired person wishes to communicate with an ordinary person.
  • when the specially abled person starts expressing in sign language, the ordinary person captures the expression of the specially abled person using the camera of the user device.
  • the video is processed frame by frame at the user device to provide a meaningful output for the ordinary person.
  • the output can be in the form of text or speech or sign language.
  • the regions and languages for input and output are pre-decided so as to provide the communication in accordance with different sign languages being used throughout the world.
  • FIG. 6 illustrates another exemplary implementation [600] of the system and method for translating sign language on a user device, in accordance with exemplary embodiments of the present disclosure.
  • Fig. 6 represents the response from an ordinary person to a specially abled person.
  • the ordinary person can type, speak or use another set of sign-language gestures to provide an input to the user device.
  • This input is converted to sign language output in the form of GIFs or animated videos in a format which is understood by the specially abled person.
  • the solution provided by the present invention effectively solves the problem of effective communication between an ordinary person and a person with hearing or speech disability. It provides a technical advancement over existing solutions by providing a method and system for translating natural language input into sign language, and vice versa.
  • the present invention also provides the advantage of automatic identification of region of a user, and a local language and a sign language associated with a user’s region.
  • the present invention provides the technical effect of converting, in real time, a natural language speech input into a sign language output (preferably, a video output), and also converts, in real time, a sign language input (preferably, a video input) into a natural language output.
  • the present invention may be implemented in any type of communication technology, where a system [100] may be conversing with a specially abled user. While the implementation of the solution of the present invention has been discussed with reference to only a few usages, the invention may also be used in many other applications that may be known to a person skilled in the art, all of which are objectives of the present invention.
  • the interface, module, memory, database, processor and component depicted in the figures and described herein may be present in the form of hardware, software, or a combination thereof.
  • the connections shown between these components/modules/interfaces in the system [100] are exemplary, and any components/modules/interfaces in the system [100] may interact with each other through various logical links and/or physical links. Further, the components/modules/interfaces may be connected in other possible ways.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

A method (200) and a system (100) for translating sign language are disclosed. The method (200) comprises receiving, at an input unit (102), an input from a user of a user device. Subsequently, the input unit (102) identifies a language type associated with the input. Next, a sign language engine (104) receives a region selection for the user device. The sign language engine (104) then selects a sign language associated with the identified region selection. Thereafter, a language processor (106) translates the input into the selected sign language. Finally, an output unit (108) provides the translation of the input.
PCT/CN2021/088829 2020-04-30 2021-04-22 System and method for translating sign language WO2021218750A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN20204101859.5 2020-04-30
IN202041018595 2020-04-30

Publications (1)

Publication Number Publication Date
WO2021218750A1 (fr) 2021-11-04

Family

ID=78374225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088829 WO2021218750A1 (fr) 2020-04-30 2021-04-22 Système et procédé de traduction de langue des signes

Country Status (1)

Country Link
WO (1) WO2021218750A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030069997A1 (en) * 2001-08-31 2003-04-10 Philip Bravin Multi modal communications system
CN1464433A (zh) * 2002-06-17 2003-12-31 中国科学院计算技术研究所 通过中间模式语言进行手语翻译的方法
US20070043503A1 (en) * 2005-08-18 2007-02-22 Oesterling Christopher L Navigation system for hearing-impaired operators
US20180293986A1 (en) * 2017-04-11 2018-10-11 Sharat Chandra Musham Worn device for conversational-speed multimedial translation between two individuals and verbal representation for its wearer
US20190130176A1 (en) * 2017-11-01 2019-05-02 Sorenson Ip Holdings Llc Performing artificial intelligence sign language translation services in a video relay service environment



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21797035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21797035

Country of ref document: EP

Kind code of ref document: A1