GB2594081A - Gesture recognition systems and methods of its use

Publication number
GB2594081A
GB2594081A
Authority
GB
United Kingdom
Prior art keywords
gesture
gestures
recipe
user
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2005568.7A
Other versions
GB202005568D0 (en)
Inventor
Bright Rebecca
Gadgil Swapnil
Current Assignee
Therapy Box Ltd
Original Assignee
Therapy Box Ltd
Priority date
Filing date
Publication date
Application filed by Therapy Box Ltd
Priority to GB2005568.7A
Publication of GB202005568D0
Publication of GB2594081A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61FFILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
    • A61F4/00Methods or devices enabling patients or disabled persons to operate an apparatus or a device not forming part of the body 
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00Teaching, or communicating with, the blind, deaf or mute
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems

Abstract

A gesture recognition system allows a first user to set up a plurality of gesture recipes, each gesture recipe comprising at least one gesture, and to allocate these recipes to a grid of at least one cell, where each cell comprises a message. A gesture capturing device captures gestures as set up by the user, selects at least one cell position in the grid based on the gesture, and determines at least one message from the selected cell position so that the determined message can be output to the user. Also disclosed is a gesture recognition system having: a database storing a plurality of gesture recipes, triggers, pre-defined gestures and messages, wherein each of the gesture recipes comprises a combination of at least one gesture; a gesture capturing device to capture gestures of a first user; a trigger detection device to match each gesture with the stored plurality of triggers to determine whether the gesture corresponds to a trigger; and a recipe determining device to retrieve a gesture recipe relevant to at least one trigger and retrieve a message corresponding to the recipe, the message then being converted using a text-to-speech converter to output the relevant speech message.

Description

GESTURE RECOGNITION SYSTEMS AND METHODS OF ITS USE
TECHNICAL FIELD
[0001] The present disclosure relates to gesture recognition. In particular, the present disclosure relates to systems and methods for gesture recognition.
BACKGROUND
[0002] Any references to methods, apparatus or documents of the prior art are not to be taken as constituting any evidence or admission that they formed, or form part of the common general knowledge.
[0003] There are many people in the world suffering from one disease or another, or from a physical disability. Some people with a major physical disability, and people with ALS, may have limited movements and gestures. Further, for people with ALS or another major physical disability, the only movements available for communicating with others may be eye movements or some facial movements.
[0004] Currently, there exist systems and methods for using gestures, such as those created by the movement of a hand, as input. For example, there exist handwriting recognition systems or apparatuses that can interpret a first user's gesture made through a stylus or pen as input. Also, there are systems that provide a first user with wiring, props or other devices in order to track the first user's hand movements using optical sensors.
[0005] It would be attractive, practically and commercially, to be able to provide gesture recognition techniques to assist the people with ALS or other major physical disability to communicate with others or express themselves.
[0006] It is an object of the present invention to overcome or ameliorate the above discussed disadvantages of the prior art, or at least offer a useful alternative.
SUMMARY
[0007] An embodiment of the present disclosure provides a gesture recognition system configured to be accessed via a computing device. The system includes a recipe set up device configured to enable a first user to: set up a plurality of gesture recipes that work for the first user, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture; and set up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of pre-defined gestures to a cell position in the at least one grid. Each cell of the at least one grid may include a message. The system also includes a gesture capturing device configured to continuously capture one or more gestures of the first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured. The system also includes a gesture processing device configured to: analyse the captured one or more gestures to determine if the one or more gestures correspond to a gesture recipe of the plurality of gesture recipes, wherein the first user needs to provide the one or more gestures within a pre-defined time frame for them to be considered a gesture recipe; select at least one cell position in the at least one grid based on the determined gesture recipe; and determine at least one message from the selected at least one cell position. The system also includes an output device configured to present the at least one message to the first user.
[0008] According to an aspect of the present disclosure, the gesture recognition system also includes a text-to-speech convertor configured to convert the at least one message into a speech message prior to presenting the at least one message to the first user. The at least one message may include a text message, a key shortcut, and so forth.
[0009] Further, the gesture recognition system may include a database configured to store the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, the at least one grid, and a plurality of messages.
[0010] According to another aspect of the present disclosure, the output device is configured to present the speech message aloud via a speaker of the computing device.
[0011] According to another aspect of the present disclosure, the first user sets up the plurality of gesture recipes based on the plurality of pre-defined gestures, wherein each of the gesture recipes comprises two or more of the plurality of pre-defined gestures.
[0012] According to another aspect of the present disclosure, the plurality of pre-defined gestures comprises blink both eyes, blink left eye, blink right eye, mouth open, tongue out, smile, frown, eyebrows up, and cheek puff.
[0013] According to another aspect of the present disclosure, the gesture capturing device comprises a true depth camera system.
[0014] According to another aspect of the present disclosure, the gesture processing device is further configured to process the captured one or more gestures based on at least one of two trigger modes comprising a direct phrase trigger mode and a cell position trigger mode.
[0015] According to another aspect of the present disclosure, for the direct trigger mode, the gesture processing device is configured to enable a second user to: identify an appropriate recipe comprising at least one gesture for the first user; and allocate a phrase from a plurality of phrases stored in the database.
[0016] According to yet another aspect of the present disclosure, for the cell position trigger mode, the gesture processing device is configured to: set up a gesture recipe; and allocate the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message.
[0017] Another embodiment of the present disclosure provides a gesture recognition system configured to be accessed via a computing device, comprising a database configured to store a plurality of gesture recipes, a plurality of triggers, a plurality of pre-defined gestures, and a plurality of messages, wherein each of the plurality of gesture recipes includes a combination of at least one gesture. The system also includes a gesture capturing device configured to capture one or more gestures of a first user, wherein the gesture capturing device waits for a predefined time after each gesture of the one or more gestures is captured. The system also includes a trigger detection device configured to match each gesture of the one or more gestures with the stored plurality of triggers to determine if the gesture corresponds to at least one trigger. The system also includes a recipe determining device configured to: retrieve at least one gesture recipe relevant to the at least one trigger from the stored plurality of gesture recipes; and retrieve a message corresponding to the at least one gesture recipe from the database. The system also includes a text-to-speech convertor configured to convert the message into a speech message; and an output device configured to present the speech message corresponding to the at least one recipe.
[0018] Another embodiment of the present disclosure provides a method for recognising gestures by using a gesture recognition system. The method includes: enabling, by a recipe set up device, a first user to set up a plurality of gesture recipes that work for the first user, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture; enabling, by the recipe set up device, the first user to set up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of pre-defined gestures to a cell position in the at least one grid, wherein each cell of the at least one grid comprises a message; capturing, by a gesture capturing device, one or more gestures of the first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured; analysing, by a gesture processing device, the captured one or more gestures to determine if the one or more gestures correspond to a gesture recipe of the plurality of gesture recipes, wherein the first user needs to provide the one or more gestures within a pre-defined time frame for them to be considered a gesture recipe; selecting, by the gesture processing device, at least one cell position in the at least one grid based on the determined gesture recipe; determining, by the gesture processing device, at least one message from the selected at least one cell position; and presenting, by an output device, the at least one message to the first user.
[0019] The method may also include converting, by a text-to-speech convertor, the at least one message comprising a text message into a speech message prior to presenting the at least one message to the user.
[0020] In some embodiments, the method may also include storing, by a database, the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, the grid, and a plurality of messages.
[0021] According to an aspect of the present disclosure, the method may also include presenting the speech message aloud via a speaker.
[0022] According to another aspect of the present disclosure, the first user sets up the plurality of gesture recipes based on the plurality of pre-defined gestures, wherein each of the gesture recipes comprises two or more of the plurality of pre-defined gestures.
[0023] According to another aspect of the present disclosure, the plurality of pre-defined gestures comprises blink both eyes, blink left eye, blink right eye, mouth open, tongue out, smile, frown, eyebrows up, and cheek puff.
[0024] According to another aspect of the present disclosure, the method may also include processing, by the gesture processing device, the captured one or more gestures based on a trigger mode comprising a direct phrase trigger mode and a cell position trigger mode.
[0025] According to another aspect of the present disclosure, the method may also include, for the direct phrase trigger mode, enabling, by the gesture processing device, a second user to: identify an appropriate recipe comprising at least one gesture for the first user; and allocate a phrase from a plurality of phrases stored in the database.
[0026] According to another aspect of the present disclosure, the method may also include for the cell position trigger mode: setting up, by the gesture processing device, a gesture recipe; and allocating, by the gesture processing device, the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message.
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] The illustrated embodiments of the disclosed subject matter will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the disclosed subject matter as claimed herein.
[0028] Figures 1A-1B are schematic diagrams illustrating exemplary environments, where various embodiments of the present disclosure may function;
[0029] Figure 2 is a block diagram illustrating various system elements of an exemplary gesture recognition system, in accordance with an embodiment of the present disclosure;
[0030] Figures 3A-3B are a flowchart diagram illustrating a method for recognising gestures by using the exemplary gesture recognition system of Figure 2, in accordance with an embodiment of the present disclosure; and
[0031] Figures 4A-4C are screenshots illustrating various embodiments of the gesture recognition system of Figure 2, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0032] The following detailed description is made with reference to the figures. Exemplary embodiments are described to illustrate the disclosure, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a number of equivalent variations in the description that follows.
[0033] The functional units described in this specification have been labelled as devices or modules. A device or a module may be implemented in programmable hardware devices such as processors, digital signal processors, central processing units, field programmable gate arrays, programmable array logic, programmable logic devices, cloud processing systems, or the like. The devices or modules may also be implemented in software for execution by various types of processors. An identified device or module may include executable code and may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, procedure, function, or other construct. Nevertheless, the executable of an identified device/module need not be physically located together but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the device and achieve the stated purpose of the device/module.
[0034] Indeed, an executable code of a device or module could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices. Similarly, operational data may be identified and illustrated herein within the device and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set or may be distributed over different locations including over different storage devices, and may exist, at least partially, as electronic signals on a system or network.
[0035] Reference throughout this specification to "a select embodiment," "one embodiment," or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosed subject matter. Thus, appearances of the phrases "a select embodiment," "in one embodiment," or "in an embodiment" in various places throughout this specification are not necessarily referring to the same embodiment.
[0036] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, to provide a thorough understanding of embodiments of the disclosed subject matter. One skilled in the relevant art will recognize, however, that the disclosed subject matter can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the disclosed subject matter.
[0037] In accordance with the exemplary embodiments, the disclosed computer programs or modules can be executed in many exemplary ways, such as an application that is resident in the memory of a device or as a hosted application that is being executed on a server and communicating with the device application or browser via a number of standard protocols, such as TCP/IP, HTTP, XML, SOAP, REST, JSON and other sufficient protocols. The disclosed computer programs can be written in exemplary programming languages that execute from memory on the device or from a hosted server, such as BASIC, COBOL, C, C++, Java, Pascal, or scripting languages such as JavaScript, Python, Ruby, PHP, Perl or other sufficient programming languages.
[0038] Some of the disclosed embodiments include or otherwise involve data transfer over a network, such as communicating various inputs or files over the network. The network may include, for example, one or more of the Internet, Wide Area Networks (WANs), Local Area Networks (LANs), analog or digital wired and wireless telephone networks (e.g., a PSTN, Integrated Services Digital Network (ISDN), a cellular network, and Digital Subscriber Line (xDSL)), radio, television, cable, satellite, and/or any other delivery or tunnelling mechanism for carrying data. The network may include multiple networks or sub-networks, each of which may include, for example, a wired or wireless data pathway. The network may include a circuit-switched voice network, a packet-switched data network, any other network able to carry electronic communications, or a cellular telephone network configured to enable exchange of text or SMS messages.
[0039] Examples of the network include, but are not limited to, a personal area network (PAN), a storage area network (SAN), a home area network (HAN), a campus area network (CAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a virtual private network (VPN), an enterprise private network (EPN), Internet, a global area network (GAN), and so forth.
[0040] As used herein, the term "gesture recognition system" refers to a system that can be a single device or a combination of multiple devices configured to process one or more gestures of a first user and speak out a message by recognizing the one or more gestures. The gesture recognition system may include hardware, software, firmware, or a combination of these.
[0041] As used herein, the term "first user" may refer to a person that accesses the gesture recognition system by using a computing device.
[0042] As used herein, the term "computing device" refers to a device configured to enable a user to access the gesture recognition system. Examples of the computing device include, but are not limited to, computers, smart phones, laptops, smart televisions, servers, tablet computers, and so forth.
[0043] Figures 1A-1B are schematic diagrams illustrating exemplary environments 100A-100B, where various embodiments of the present disclosure may function. As shown in Figure 1A, the environment 100A primarily includes a first user 102 and a computing device 104 associated with the first user 102. The computing device 104 includes a gesture recognition system 106 configured to recognise gesture(s) of the first user 102, determine a message corresponding to the gestures, and present the message as a speech message to the first user 102. The message may be a text message. In some embodiments, the message may be an audio/video message. In some embodiments, the first user 102 may download and install the gesture recognition system 106 as a mobile application on the computing device 104. In some embodiments, the first user 102 may access the gesture recognition system 106 (hereinafter referred to as the system 106) by typing a web address such as a uniform resource locator (URL) into a web browser application on the computing device 104.
[0044] As shown in Figure 1B, the environment 100B includes the first user 102, a second user 108, the computing device 104 associated with the first user 102, and the gesture recognition system 106 present in a network 110. In some embodiments, the network 110 may be a cloud network in which the system 106 is present.
[0045] The gesture recognition system 106 may enable the first user 102 to set up a plurality of gesture recipes that work for the first user. Each gesture recipe of the plurality of gesture recipes may include a combination of at least one gesture or multiple gestures. Further, the system 106 may enable the first user 102 to set up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of pre-defined gestures to a cell position in a grid. Each cell of the grid comprises a message. The system 106 may continuously capture one or more gestures of the first user 102. The system 106 may wait for a pre-defined time after each gesture of the one or more gestures is captured. For example, the system 106 may wait for 5 seconds after receiving a first gesture and may also display a countdown of 5 seconds to the first user 102. If a second gesture in a gesture recipe is not received within five seconds, the system may clear the first gesture. In some embodiments, the system 106 may use a camera of the computing device 104 for capturing the one or more gestures.
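The capture-and-wait behaviour described above can be sketched as follows. This is a minimal illustration in Python assuming a simple in-memory buffer; the class and method names (`GestureBuffer`, `add`) are illustrative, not part of the disclosure.

```python
class GestureBuffer:
    """Accumulates captured gestures into a candidate gesture recipe.

    After each captured gesture the system waits a pre-defined time
    (5 seconds in the example above) for the next one, and clears the
    partially built recipe if the next gesture arrives too late.
    """

    def __init__(self, window=5):
        self.window = window        # pre-defined wait, in seconds
        self.gestures = []          # the partially built recipe
        self._last_time = None

    def add(self, timestamp, gesture):
        # If the gap since the previous gesture exceeds the window,
        # the partial recipe is cleared before the new gesture is kept.
        if self._last_time is not None and timestamp - self._last_time > self.window:
            self.gestures = []
        self.gestures.append(gesture)
        self._last_time = timestamp
        return list(self.gestures)
```

For example, two gestures 2 seconds apart build up one candidate recipe, while a gesture arriving 8 seconds after the previous one starts a fresh buffer.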
[0046] The system 106 may also analyse the captured one or more gestures to determine if the one or more gestures corresponds to a gesture recipe of the plurality of gesture recipes, wherein the first user 102 needs to provide the one or more gestures within a pre-defined time frame to be considered as a gesture recipe. The system 106 then may select at least one cell position in the at least one grid based on the determined gesture recipe.
[0047] The system 106 may also determine at least one message from the selected at least one cell position corresponding to the determined gesture recipe. The system 106 may convert the at least one message comprising a text message into a speech message and present the speech message to the first user 102 via a speaker. The speaker may be a speaker of the computing device 104 or a speaker of another device in communication with the computing device 104. In some embodiments, the message includes an audio message that may be directly presented to the user 102 via the speaker.
[0048] In some embodiments, the system 106 may store the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, and the plurality of messages, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture. Further, the system may capture the one or more gestures of the first user 102. The system 106 may match each gesture of the one or more gestures with the stored plurality of triggers to determine if the gesture corresponds to at least one trigger. The system 106 may also retrieve at least one gesture recipe relevant to the at least one trigger from the stored plurality of gesture recipes. Further, the system 106 may also retrieve a message corresponding to the at least one gesture recipe and convert the message into a speech message. The system 106 then may present the speech message to the first user 102.
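The trigger detection and recipe determination steps just described can be sketched as a simple lookup pipeline. The stored triggers, recipes, and messages below are invented examples, and `process_gestures` is an assumed name rather than the actual implementation.

```python
# Stored triggers: gestures that can start or form part of a recipe.
TRIGGERS = {"blink both eyes", "mouth open", "smile"}

# Each gesture recipe is a combination of at least one gesture,
# keyed here as a tuple for lookup.
GESTURE_RECIPES = {
    ("blink both eyes", "smile"): "greeting",
    ("mouth open",): "hunger",
}

MESSAGES = {
    "greeting": "Hello, how are you?",
    "hunger": "I would like something to eat.",
}

def process_gestures(captured):
    # 1. Trigger detection: keep only gestures matching a stored trigger.
    triggering = tuple(g for g in captured if g in TRIGGERS)
    # 2. Recipe determination: find a stored recipe matching the triggers.
    recipe = GESTURE_RECIPES.get(triggering)
    if recipe is None:
        return None  # no matching recipe: no message is presented
    # 3. Retrieve the message; the text-to-speech convertor would then
    #    turn this text into a speech message for the first user.
    return MESSAGES[recipe]
```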
[0049] The system 106 may allow two modes of gesture processing, i.e. two trigger modes comprising a direct phrase trigger mode and a cell position trigger mode. In some embodiments, in the direct phrase trigger mode, the system 106 may allow the second user 108 to identify an appropriate recipe, i.e. the appropriate combination of gestures, for the first user 102 and then allocate a phrase (i.e. a message) from the phrases or the plurality of messages stored in the system 106.
[0050] For the cell position trigger mode, the system 106 is configured to: set up a gesture recipe; and allocate the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message. The message may be a text message or a picture message.
[0051] Figure 2 is a block diagram illustrating various system elements of an exemplary gesture recognition system, in accordance with an embodiment of the present disclosure. The gesture recognition system 200 includes a database 204, a recipe set up device 206, a gesture capturing device 208, a gesture processing device 210, a text-to-speech convertor 212, a trigger detection device 214, a recipe determining device 216, and an output device 218. Further, the devices 204-218 may be connected to each other via a bus 220.
[0052] The recipe set up device 206 is configured to enable a first user to set up a plurality of gesture recipes that work for the first user, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture. In some embodiments, the first user sets up the plurality of gesture recipes based on the plurality of pre-defined gestures, wherein each of the gesture recipes comprises two or more of the plurality of pre-defined gestures. Non-limiting examples of the plurality of pre-defined gestures include blink both eyes, blink left eye, blink right eye, mouth open, tongue out, smile, frown, eyebrows up, and cheek puff. The first user can also define his or her own gestures, which may be stored in the database 204 for future reference.
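Recipe set up against the pre-defined gesture list can be sketched as below, assuming (per the embodiment above) that a recipe combines two or more pre-defined gestures and is validated before being stored; `set_up_recipe` is an illustrative name.

```python
# The pre-defined gestures listed in the disclosure.
PRE_DEFINED_GESTURES = {
    "blink both eyes", "blink left eye", "blink right eye", "mouth open",
    "tongue out", "smile", "frown", "eyebrows up", "cheek puff",
}

def set_up_recipe(gestures):
    """Validate a candidate recipe and return it in storable form."""
    if len(gestures) < 2:
        raise ValueError("in this embodiment a recipe combines two or more gestures")
    unknown = [g for g in gestures if g not in PRE_DEFINED_GESTURES]
    if unknown:
        raise ValueError(f"not pre-defined gestures: {unknown}")
    return tuple(gestures)  # stored, e.g., as a key in the database
```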
[0053] In some embodiments, the recipe set up device 206 may be configured to enable the first user to set up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of pre-defined gestures to a cell position in the at least one grid. Each cell of the at least one grid comprises a message. In some embodiments, the message may be a category leading to another grid.
Each cell of the grid may be a terminal cell (i.e. one including a message) or a category cell comprising a category that leads to a new grid containing messages or sub-categories. A grid may therefore nest further grids within its cells. For example, in a 3 x 3 grid the cells may be A1, A2, A3, B1, B2, B3, C1, C2, C3. Each of these cells can be a folder or a terminal cell. A folder is typically a category that leads to a new grid (also 3 x 3, for example) with either further category folders (e.g. for a sub-category) or terminal cells (e.g. final messages to be spoken aloud with the app's text-to-speech). The first user 102 may set up a Grid1, a Grid2, and a Grid3 as shown below:

Grid1:
        A                   B                    C
  1     Category: Food      Category: Family     Category: Place
  2     Category:           Category:            Category:
  3     Category:           Category:            Category:

Grid2:
        A                      B                   C
  1     Category: Countries    Category: Cities    Category: Shops
  2     Category:              Category:           Category:
  3     Category:              Category:           Category:

Grid3:
        A              B              C
  1     London         Manchester     Birmingham
  2     Edinburgh      Category:      Category:
  3     Category:      Category:      Category:

[0054] The gesture capturing device 208 is configured to continuously capture one or more gestures of the first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured. In some embodiments, the gesture capturing device 208 may include a true depth camera system. In some embodiments, the camera of the computing device 104 may be used for capturing one or more live gestures of the first user 102, as discussed with reference to Figure 1A.
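The terminal-cell/category-folder hierarchy of Grid1 to Grid3 above can be modelled, for illustration, as nested dictionaries. The cell encoding and the `is_terminal` helper are assumptions made for this sketch, not the patent's implementation.

```python
# Hypothetical sketch of the grid hierarchy: a terminal cell holds a
# message; a category cell names its category and points at a sub-grid.

grid3 = {
    "A1": {"message": "London"},
    "B1": {"message": "Manchester"},
    "C1": {"message": "Birmingham"},
    "A2": {"message": "Edinburgh"},
}
grid2 = {"B1": {"category": "Cities", "grid": grid3}}
grid1 = {"C1": {"category": "Place", "grid": grid2}}

def is_terminal(cell: dict) -> bool:
    """A terminal cell carries a message; a folder carries a sub-grid."""
    return "message" in cell

print(is_terminal(grid1["C1"]))  # False: C1 is the 'Place' category folder
print(grid1["C1"]["grid"]["B1"]["grid"]["A1"]["message"])  # London
```

Descending C1 → B1 → A1 through the nested dictionaries mirrors the three-level navigation described for the grids above.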
[0055] The gesture processing device 210 is configured to analyse the captured one or more gestures of the first user to determine whether the one or more gestures correspond to a gesture recipe of the plurality of gesture recipes, wherein the first user needs to provide the one or more gestures within a pre-defined time frame for them to be considered a gesture recipe. The gesture processing device 210 is also configured to select at least one cell position in the at least one grid based on the determined gesture recipe. Further, the gesture processing device 210 is configured to determine at least one message from the selected at least one cell position corresponding to the determined gesture recipe. In some embodiments, the gesture processing device 210 may not be able to determine at least one message from the selected at least one cell position corresponding to the determined gesture recipe. In such embodiments, no message is presented to the first user.
[0056] In some embodiments, the gesture processing device 210 is configured to process the captured one or more gestures based on at least one of two trigger modes comprising a direct phrase trigger mode and a cell position trigger mode. In some embodiments, for the direct phrase trigger mode, the gesture processing device 210 is configured to enable a second user to: identify an appropriate recipe comprising at least one gesture for the first user; and allocate a phrase from a plurality of phrases stored in the database. In other embodiments, for the cell position trigger mode, the gesture processing device is configured to: set up a gesture recipe; and allocate the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message.
[0057] The text-to-speech convertor 212 is configured to convert the at least one message comprising a text message into a speech message prior to presenting the at least one message to the first user. In some embodiments, the trigger detection device 214 is configured to match each gesture of the one or more gestures with the stored plurality of triggers to determine if the gesture corresponds to at least one trigger.
[0058] The recipe determining device 216 may be configured to retrieve at least one gesture recipe relevant to the at least one trigger from the stored plurality of gesture recipes. Further, the recipe determining device 216 is configured to retrieve a message corresponding to the at least one gesture recipe from the database.

[0059] The output device 218 is configured to present the at least one message or the speech message to the first user. In some embodiments, the output device 218 is configured to present the speech message aloud via a speaker of the computing device. In some embodiments, no message may be presented to the first user.
[0060] The database 204 is configured to store the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, the at least one grid, and a plurality of messages, speech messages, user information, and so forth.
[0061] In some embodiments, the gesture recognition system 202 may provide features such as, but not limited to, optimised head tracking, an emoji keyboard, grid layouts, and the gesture speak feature that recognizes the one or more gestures of the first user and speaks out the recognized gestures as a speech message.
[0062] In some embodiments, the database 204 of the gesture recognition system 202 includes a phrase bank to save commonly used content and emotes (non-speech sounds). Further, the system 202 may provide alternative access methods such as, but not limited to, switch access, whole screen as switch, head tracking, and so forth for the user 102 to access the system 202.
[0063] Figures 3A-3B are a flowchart illustrating a method for recognising gestures using the exemplary gesture recognition system 202 of Figure 2, in accordance with an embodiment of the present disclosure. As discussed with reference to Figure 1A, the first user 102 accesses the gesture recognition system 106 (or 202) on the computing device 104.
[0064] At step 302, the system 202 enables a first user, such as the first user 102, to set up a plurality of gesture recipes that work for the first user. Each of the plurality of gesture recipes may include a combination of at least one gesture. In some embodiments, the recipe set up device 206 enables the first user 102 to set up the plurality of gesture recipes.
[0065] Then at step 304, the first user 102 sets up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of pre-defined gestures to a cell position in the at least one grid. Each cell of the at least one grid may include a message. The message may be a text message, an audio message, a video message, or a combination of these. In some embodiments, the recipe set up device 206 enables the first user 102 to set up the at least one grid. Each cell of the grid may be a terminal cell (i.e. one including a message) or a category cell comprising a category that leads to a new grid containing messages or sub-categories. For example, in a 3 x 3 grid the cells may be A1, A2, A3, B1, B2, B3, C1, C2, C3. Each of these cells can be a folder or a terminal cell. A folder is typically a category that leads to a new grid (also 3 x 3, for example) with either further category folders (e.g. for a sub-category) or terminal cells (e.g. final messages to be spoken aloud with the app's text-to-speech). The user 102 may set up a Grid1, a Grid2, and a Grid3 as shown below:

Grid1:
        A                   B                    C
  1     Category: Food      Category: Family     Category: Place
  2     Category:           Category:            Category:
  3     Category:           Category:            Category:

Grid2:
        A                      B                   C
  1     Category: Countries    Category: Cities    Category: Shops
  2     Category:              Category:           Category:
  3     Category:              Category:           Category:

Grid3:
        A              B              C
  1     London         Manchester     Birmingham
  2     Edinburgh      Category:      Category:
  3     Category:      Category:      Category:

[0066] Further, the system 202 (i.e. the recipe set up device 206) enables the first user 102 to set up a gesture recipe and allocate it to a position on a grid (a specific cell). In this case, for example:

A1 = blink + blink + blink
B1 = cheek puff + cheek puff + cheek puff
C1 = mouth open + mouth open + mouth open
A2 = blink + cheek puff + cheek puff
B2 = cheek puff + mouth open

[0067] So, to trigger a message "London" or "I live in London", the user 102 may first need to trigger C1 (via the C1 recipe), then the B1 folder on the second-level screen, and finally A1 on the third screen.
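The cell-position triggering of paragraphs [0066]-[0067] can be illustrated with a small resolver that applies each gesture recipe in turn, descending through category folders until a terminal cell's message is reached. The `RECIPES` table and `resolve` function below are hypothetical stand-ins for the recipe set up device 206 and gesture processing device 210.

```python
# Illustrative sketch: each gesture recipe selects a cell position;
# selecting a category cell descends into its grid, and selecting a
# terminal cell yields the message to be spoken.

RECIPES = {  # recipe -> cell position, as in the example above
    ("blink",) * 3: "A1",
    ("cheek puff",) * 3: "B1",
    ("mouth open",) * 3: "C1",
}

grid3 = {"A1": {"message": "London"}}
grid2 = {"B1": {"category": "Cities", "grid": grid3}}
grid1 = {"C1": {"category": "Place", "grid": grid2}}

def resolve(start_grid: dict, recipe_sequence: list):
    """Apply each recipe in turn; descend through category folders and
    return the message at the final terminal cell (or None)."""
    grid = start_grid
    for recipe in recipe_sequence:
        cell = grid.get(RECIPES.get(recipe))
        if cell is None:
            return None          # no cell allocated: no message presented
        if "message" in cell:
            return cell["message"]
        grid = cell["grid"]      # category folder: move to the next grid
    return None

# C1 (Place) -> B1 (Cities) -> A1 (London), mirroring paragraph [0067]
seq = [("mouth open",) * 3, ("cheek puff",) * 3, ("blink",) * 3]
print(resolve(grid1, seq))  # London
```

Returning `None` when no cell is allocated matches the behaviour in paragraph [0055], where no message is presented if none can be determined.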
[0068] In some embodiments, a confusability matrix may allow users to understand which gestures are not being reliably distinguished from one another and as such should either be avoided or not used alongside the gesture that is too similar. The system 106 may set up the confusability matrix; alternatively, the first user 102 may set up the confusability matrix. The confusability matrix is a way of understanding which gestures are perceived as too similar and therefore might accidentally trigger the wrong message. For example, rounded mouth and smile might be too similar, so the first user 102 should instead use eyebrow raise and rounded mouth to make it less likely to trigger the wrong cell.
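A confusability matrix of the kind described can be sketched as a symmetric lookup of pairwise confusion scores. The scores and threshold below are invented for illustration; a real matrix would be derived from recognition data.

```python
# Hypothetical confusability matrix: pairs whose confusion score exceeds
# a threshold should not be used together in the same recipe set.
# frozenset keys make the lookup symmetric (order of the pair is ignored).

CONFUSABILITY = {
    frozenset({"rounded mouth", "smile"}): 0.8,         # too similar
    frozenset({"rounded mouth", "eyebrow raise"}): 0.1,  # easily distinguished
    frozenset({"blink left eye", "blink right eye"}): 0.4,
}

def too_similar(g1: str, g2: str, threshold: float = 0.5) -> bool:
    """Flag a gesture pair that risks triggering the wrong cell."""
    return CONFUSABILITY.get(frozenset({g1, g2}), 0.0) >= threshold

print(too_similar("rounded mouth", "smile"))          # True
print(too_similar("rounded mouth", "eyebrow raise"))  # False
```

A recipe editor could consult such a check when the user allocates gestures, warning before two confusable gestures end up distinguishing adjacent cells.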
[0069] Then at step 306, the system 106 may capture one or more gestures of the first user 102. In some embodiments, the gesture capturing device 208 may capture the one or more gestures of the first user 102. Further, the gesture capturing device 208 may wait for a pre-defined time after each gesture of the one or more gestures is captured. In some embodiments, a timer countdown of the pre-defined time is shown to the user 102 via a display screen of the computing device 104.
[0070] Then at step 308, the system 202 analyses the captured one or more gestures to determine if the one or more gestures correspond to a gesture recipe of the plurality of gesture recipes. In some embodiments, the gesture processing device 210 analyses the captured one or more gestures to determine if the one or more gestures correspond to a gesture recipe of the plurality of gesture recipes.
[0071] Then at step 310, the system 202 selects at least one cell position in the at least one grid based on the determined gesture recipe. For example, if the one or more gestures comprise blink + blink + blink, then the system selects A1 from Grid1 as shown above. In some embodiments, the gesture processing device 210 selects the at least one cell position in the at least one grid based on the determined gesture recipe.
[0072] At step 312, the system 202 determines a message from the selected at least one cell position. In some embodiments, the gesture processing device 210 determines the message from the selected at least one cell position. For example, for the gesture recipe comprising blink + blink + blink, the cell A1 is selected from Grid1; A1 includes a message, i.e. the category Food, which may lead to Grid2.
[0073] At step 314, the system 202 converts the message into a speech message. In some embodiments, the text-to-speech convertor 212 converts the message into the speech message.

[0074] Thereafter at step 316, the system 202 presents the speech message to the first user 102. In some embodiments, the output device 218 presents the speech message to the first user 102 via a speaker. The speaker may be an inbuilt speaker of the computing device 104.
[0075] Figures 4A-4C are screenshots illustrating various interfaces 400A-400C according to various embodiments of the gesture recognition system of Figure 2, in accordance with an embodiment of the present disclosure. As discussed with reference to the Figure 1A, the first user 102 may access the gesture recognition system 106 via the computing device 104 as a mobile application. The gesture recognition system 106 may display one or more interfaces to allow the user 102 to access one or more features of the gesture recognition system 106.
[0076] As shown in the Figure 4A, the interface 400A includes a gesture speak option 402 that may be selected by the first user 102, and a pick category option 404 showing various categories such as, but not limited to, favourites, chat, questions, things I like, help and so forth. Further, the interface 400A includes a number of triggers 406A-406N and a pick phrase window 408. The number of triggers 406A-406N may include gestures that will trigger a gesture recipe.
[0077] As shown in Figure 4B, the interface 400B includes the pick phrase window 408 showing a number of phrases 412 determined based on the trigger. The system 202 determines the trigger based on the one or more gestures of the first user 102 captured by the system 202. The trigger detection device 214 may match each gesture of the one or more gestures with the stored plurality of triggers to determine if the gesture corresponds to at least one trigger. Based on the determined trigger, the recipe determining device 216 retrieves a gesture recipe relevant to the trigger from the stored plurality of gesture recipes, and accordingly a phrase (i.e. a message) corresponding to the at least one gesture recipe is retrieved from the database 204 and displayed in the pick phrase window 408.
[0078] In some embodiments, the first user 102 may need to perform the predetermined steps in the prepared sequence within the pre-defined time limit and have them recognised.

[0079] As shown in Figures 4B-4C, in the main keyboard 416 or any other screen, such as the folder pages, the first user 102 may need to complete the gesture recipe comprising the one or more gestures, i.e. blink, mouth open, and frown, within the set time to trigger that phrase. As shown in Figure 4C, a visual prompt is placed on the screen to identify that a gesture has been recognised and shows a timer countdown 414 to when it will clear if another gesture in that gesture recipe has not been identified.
[0080] In some embodiments, the gesture speak function is turned on in the interface 400A by selecting the gesture speak option 402 in settings. The first user 102 sets up gesture recipes that would work for them. These may be combinations of face gestures (including eye blink, eyebrow raise, etc.). The first user 102 allocates a gesture recipe to a cell position in a grid. In one example embodiment, the first user 102 records (prior to onset of the disability) his/her voice for each text message corresponding to a cell position so that, during use, the first user's own voice is heard when a text message is determined. The first user 102 goes out of settings and returns to the main keyboard 416 or phrases page. The keyboard 416 may be an emoji (or emotes) keyboard. The first user 102 makes the sequence of face gestures that equals a gesture recipe. The gesture recipe acts as a way of selecting the cell position. The cell contains a message which is converted into a speech message and spoken aloud using the output device (or the text-to-speech device in the mobile application).
[0081] In some embodiments, the system 202 may use or process the gesture recipes in two ways: direct phrase trigger and cell position trigger. The direct phrase trigger may be set up where a person, such as the second user 108, identifies the appropriate recipe (i.e. the appropriate combination of gestures for the first user 102) and then allocates a phrase (or message) from the phrase bank stored in the database 204. The cell position trigger may enable a gesture recipe to be set up and then allocated to a position on a grid (a specific cell). For example, in a 3x3 grid, it may be for A1, A2, A3, B1, B2, B3, C1, C2, C3. Each of these cells can be a folder or a terminal cell. A folder is likely to be a category that leads to a new grid, which may also be a 3x3 grid, with either further category folders (e.g. for a sub-category) or terminal cells including final messages to be spoken aloud with the text-to-speech convertor 212 and the output device 218.
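The two trigger modes can be contrasted with a minimal dispatcher. The phrase bank contents and recipe tables below are hypothetical stand-ins for the database 204, and the dispatcher itself is an assumption made for the sketch.

```python
# Illustrative sketch of the two trigger modes: a completed recipe either
# maps directly to a phrase (direct phrase trigger) or to a cell position
# on the current grid (cell position trigger).

PHRASE_BANK = {("smile", "smile"): "Hello, how are you?"}   # direct phrase trigger
CELL_RECIPES = {("blink", "blink", "blink"): "A1"}          # cell position trigger

def handle_recipe(recipe: tuple):
    """Return ('phrase', text) or ('cell', position), or None if the
    recipe is allocated in neither mode."""
    if recipe in PHRASE_BANK:            # direct phrase trigger mode
        return ("phrase", PHRASE_BANK[recipe])
    if recipe in CELL_RECIPES:           # cell position trigger mode
        return ("cell", CELL_RECIPES[recipe])
    return None

print(handle_recipe(("smile", "smile")))           # ('phrase', 'Hello, how are you?')
print(handle_recipe(("blink", "blink", "blink")))  # ('cell', 'A1')
```

A cell result would then be resolved against the current grid (descending into a folder or speaking a terminal message), while a phrase result is spoken immediately.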
[0082] In some embodiments, the system 202 receives a gesture from the first user 102 and is configured to detect and check whether it is a trigger corresponding to a gesture recipe (i.e. whether the trigger is a gesture speak trigger). If the trigger is a gesture speak trigger, then the system may perform a phrase action, i.e. searching for a phrase corresponding to the gesture recipe in the database 204 and presenting it to the first user 102. Otherwise, the system 202 may check whether the trigger is the start of any gesture recipe. If yes, the system may perform the steps of a recipe algorithm; otherwise the system ignores the gesture. The recipe algorithm includes one or more phases or steps. In phase 1 (step 1), the system 202 retrieves all gesture recipes starting with the detected trigger and starts a timer countdown of a pre-defined time, such as 5 seconds. If no other trigger or gesture is received from the first user 102 within the pre-defined time, the recipe is cancelled. Otherwise, in phase 2 (step 2), when a trigger (gesture) is received in position 2, the system 202 retrieves, from the relevant gesture recipes, those with that trigger in position 2. If no gesture recipe is retrieved from the relevant gesture recipes, the process of detecting one or more gestures starts again. If another trigger is detected, the system retrieves all recipes from the remaining relevant recipes with that trigger in position 3, and so on. After a recipe's triggers have all been processed, the phrase action is performed. The phrase action includes determining one or more messages or phrases from the phrase bank or the database 204 and presenting the message as a speech message to the first user.
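The phased recipe algorithm of this paragraph can be sketched as a small matcher that narrows the candidate recipes position by position and cancels on timeout. The 5-second window follows the example in the text; the class name and recipe contents are invented for illustration.

```python
# Illustrative sketch of the recipe algorithm in paragraph [0082]: on each
# gesture, filter the candidate recipes by the trigger at the current
# position; cancel if no gesture arrives within the pre-defined window.

RECIPES = {
    ("blink", "mouth open", "frown"): "Hello",
    ("blink", "blink", "blink"): "Yes",
}
TIMEOUT = 5.0  # seconds, as in the example

class RecipeMatcher:
    def __init__(self):
        self.reset()

    def reset(self):
        self.candidates, self.position, self.deadline = dict(RECIPES), 0, None

    def on_gesture(self, gesture: str, now: float):
        if self.deadline is not None and now > self.deadline:
            self.reset()                   # timer expired: recipe cancelled
        # keep only recipes with this trigger at the current position
        self.candidates = {r: m for r, m in self.candidates.items()
                           if len(r) > self.position and r[self.position] == gesture}
        if not self.candidates:
            self.reset()                   # not the start of any recipe: ignore
            return None
        self.position += 1
        self.deadline = now + TIMEOUT
        for r, message in self.candidates.items():
            if len(r) == self.position:    # recipe complete
                self.reset()
                return message             # phrase action
        return None

m = RecipeMatcher()
print(m.on_gesture("blink", 0.0))       # None (two candidate recipes remain)
print(m.on_gesture("mouth open", 1.0))  # None (one candidate remains)
print(m.on_gesture("frown", 2.0))       # Hello
```

Passing the clock in explicitly keeps the sketch deterministic; a real implementation would read a monotonic clock and drive the on-screen countdown from the same deadline.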
[0083] The system 202 may be useful for people with limited movements, any physical disability, ALS, autism, MND, cerebral palsy, stroke, laryngectomy, and so forth to communicate with others.
[0084] In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. The term "comprises" and its variations, such as "comprising" and "comprised of" is used throughout in an inclusive sense and not to the exclusion of any additional features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprises preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted by those skilled in the art.
[0085] Any embodiment of the invention is meant to be illustrative only and is not meant to be limiting to the invention. Therefore, it should be appreciated that various other changes and modifications can be made to any embodiment described without departing from the spirit and scope of the invention.

Claims (18)

What is claimed is:

1. A gesture recognition system configured to be accessed via a computing device, comprising: a recipe set up device configured to enable a first user to: set up a plurality of gesture recipes that works for the first user, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture; set up at least one grid comprising one or more cells by allocating a predefined gesture of a plurality of pre-defined gestures to a cell position in the at least one grid, wherein each cell of the at least one grid comprises a message; a gesture capturing device configured to continuously capture one or more gestures of the first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured; a gesture processing device configured to: analyse the captured one or more gestures to determine if the one or more gestures corresponds to a gesture recipe of the plurality of gesture recipes, wherein the first user needs to provide the one or more gestures within a pre-defined time frame to be considered as a gesture recipe; select at least one cell position in the at least one grid based on the determined gesture recipe; and determine at least one message from the selected at least one cell position; and an output device configured to present the at least one message to the first user.

2. The gesture recognition system of claim 1 further comprising: a text-to-speech convertor configured to convert the at least one message into a speech message prior to presenting the at least one message to the first user; and a database configured to store the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, the at least one grid, and a plurality of messages.
3. The gesture recognition system of claim 2, wherein the output device is configured to present the speech message aloud via a speaker of the computing device.
4. The gesture recognition system of claim 2, wherein the first user sets up the plurality of gesture recipes based on the plurality of pre-defined gestures, wherein each of the gesture recipe comprises two or more of the plurality of pre-defined gestures.
5. The gesture recognition system of claim 4, wherein the plurality of pre-defined gestures comprises blink both eyes, blink left eye, blink right eye, mouth open, tongue out, smile, frown, eyebrows up, and cheek puff.
6. The gesture recognition system of claim 1, wherein the gesture capturing device comprises a true depth camera system.
7. The gesture recognition system of claim 1, wherein the gesture processing device is further configured to process the captured one or more gestures based on at least one of two trigger modes comprising a direct phrase trigger mode and a cell position trigger mode.
8. The gesture recognition system of claim 7, wherein for the direct trigger mode, the gesture processing device is configured to enable a second user to: identify an appropriate recipe comprising at least one gesture for the first user; and allocate a phrase from a plurality of phrases stored in the database.
9. The gesture recognition system of claim 7, wherein for the cell position trigger mode, the gesture processing device is configured to: set up a gesture recipe, and allocate the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message.
10. A gesture recognition system configured to be accessed via a computing device, comprising: a database configured to store a plurality of gesture recipes, a plurality of triggers, a plurality of pre-defined gestures, and a plurality of messages, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture; a gesture capturing device configured to capture one or more gestures of a first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured; a trigger detection device configured to match each gesture of the one or more gestures with the stored plurality of triggers to determine if the gesture corresponds to at least one trigger; a recipe determining device configured to: retrieve at least one gesture recipe relevant to the at least one trigger from the stored plurality of gesture recipes; and retrieve a message corresponding to the at least one gesture recipe from the database; a text-to-speech convertor configured to convert the message into a speech message, and an output device configured to present the speech message corresponding to the at least one recipe.
11. A method for recognising gestures by using a gesture recognition system, comprising: enabling, by a recipe set up device, a first user to set up a plurality of gesture recipes that works for the first user, wherein each of the plurality of gesture recipes comprises a combination of at least one gesture; enabling, by the recipe set up device, the first user to set up at least one grid comprising one or more cells by allocating a pre-defined gesture of a plurality of predefined gestures to a cell position in the at least one grid, wherein each cell of the at least one grid comprises a message; capturing, by a gesture capturing device, one or more gestures of the first user, wherein the gesture capturing device waits for a pre-defined time after each gesture of the one or more gestures is captured; analysing, by a gesture processing device, the captured one or more gestures to determine if the one or more gestures corresponds to a gesture recipe of the plurality of gesture recipes, wherein the first user needs to provide the one or more gestures within a pre-defined time frame to be considered as a gesture recipe; selecting, by the gesture processing device, at least one cell position in the at least one grid based on the determined gesture recipe; determining, by the gesture processing device, at least one message from the selected at least one cell position; and presenting, by an output device, the at least one message to the first user.
12. The method of claim 11 further comprising: converting, by a text-to-speech convertor, the at least one message comprising a text message into a speech message prior to presenting the at least one message to the user; and storing, by a database, the plurality of gesture recipes, a plurality of triggers, the plurality of pre-defined gestures, the grid, and a plurality of messages.
13. The method of claim 12, further comprising presenting the speech message aloud via a speaker.
14. The method of claim 12, wherein the first user sets up the plurality of gesture recipes based on the plurality of pre-defined gestures, wherein each of the gesture recipe comprises two or more of the plurality of pre-defined gestures.
15. The method of claim 14, wherein the plurality of pre-defined gestures comprises blink both eyes, blink left eye, blink right eye, mouth open, tongue out, smile, frown, eyebrows up, and cheek puff.
16. The method of claim 11, further comprising processing, by the gesture processing device, the captured one or more gestures based on a trigger mode comprising a direct phrase trigger mode and a cell position trigger mode.
17. The method of claim 16 further comprising, for the direct trigger mode, enabling, by the gesture processing device, a second user to: identify an appropriate recipe comprising at least one gesture for the first user; and allocate a phrase from a plurality of phrases stored in the database.
18. The method of claim 16 further comprising, for the cell position trigger mode: setting up, by the gesture processing device, a gesture recipe; and allocating, by the gesture processing device, the gesture recipe to a position on a specific cell of a grid of a plurality of cells, wherein each of the plurality of cells comprises at least one of a terminal cell and a category folder comprising a category leading to a new grid, wherein the new grid comprises one or more cells comprising at least one of another terminal cell and another category folder, wherein the terminal cell and the another terminal cell comprises at least one message.
GB2005568.7A 2020-04-16 2020-04-16 Gesture recognition systems and methods of its use Pending GB2594081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2005568.7A GB2594081A (en) 2020-04-16 2020-04-16 Gesture recognition systems and methods of its use

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2005568.7A GB2594081A (en) 2020-04-16 2020-04-16 Gesture recognition systems and methods of its use

Publications (2)

Publication Number Publication Date
GB202005568D0 GB202005568D0 (en) 2020-06-03
GB2594081A true GB2594081A (en) 2021-10-20

Family

ID=70860042

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2005568.7A Pending GB2594081A (en) 2020-04-16 2020-04-16 Gesture recognition systems and methods of its use

Country Status (1)

Country Link
GB (1) GB2594081A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113315871B (en) * 2021-05-25 2022-11-22 广州三星通信技术研究有限公司 Mobile terminal and operating method thereof
CN115484391B (en) * 2021-06-16 2023-12-12 荣耀终端有限公司 Shooting method and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180095539A1 (en) * 2016-10-03 2018-04-05 Microsoft Technology Licensing, Llc Automated e-tran application


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An IP.com Prior Art Database Technical Disclosure, 5 March 2019, ip.com, MingHung Heish, "Gesture-to-Speak", pages 1-8. *

Also Published As

Publication number Publication date
GB202005568D0 (en) 2020-06-03
