WO2021190267A1 - System and method for providing computer aided memorization of text - Google Patents
System and method for providing computer aided memorization of text
- Publication number
- WO2021190267A1 (PCT/CN2021/079074)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- received
- expected
- user device
- user
- Prior art date: 2020-03-25
Classifications
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/009—Teaching or communicating with deaf persons
Description
- This disclosure relates to the field of computer technology, and particularly to a system and method for providing computer-aided memorization of text.
- The present invention relates generally to methods and systems for providing computer-aided memorization of text.
- The system receives audio data generated by a user (e.g., speech, music, etc.).
- The received audio data is converted to received text data in real time (i.e., as it is received), and the received text is displayed on a display device as it is received.
- The system identifies an expected text related to the received audio data, which may be indicated by a user or determined automatically. Once an expected text has been identified, the received text is compared to that expected text in order to identify one or more discrepancies between the received text and the expected text. The one or more discrepancies are then indicated within the received text displayed upon the display device.
- One embodiment of the disclosure is directed to a method performed by a user device.
- The method comprises receiving audio data from a user of the user device, converting the audio data into a received text, comparing the received text to an expected text to determine one or more discrepancies between the received text and the expected text, presenting, on a display of the user device and as the received text is translated, the received text, and indicating, in association with the received text, the one or more discrepancies between the received text and the expected text.
- Another embodiment of the disclosure is directed to a system comprising a processor; and a memory including instructions that, when executed with the processor, cause the system to at least receive audio data from a user of a user device, convert the audio data into a received text, compare the received text to an expected text to determine one or more discrepancies between the received text and the expected text, present, on a display as the received text is translated, the received text, and indicate, in association with the received text, the one or more discrepancies between the received text and the expected text.
- Yet another embodiment of the disclosure is directed to a non-transitory computer-readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least receive audio data from a user of a user device, convert the audio data into a received text, compare the received text to an expected text to determine one or more discrepancies between the received text and the expected text, present, on a display as the received text is translated, the received text, and indicate, in association with the received text, the one or more discrepancies between the received text and the expected text.
- Embodiments of the present disclosure involve methods and systems that provide computer-aided memorization of text in a manner that has advantages over conventional techniques. More particularly, providing real-time visual feedback in response to receiving a user’s audio performance enables the user to quickly correct errors and prevents the forming of incorrect habits. Moreover, the visual feedback is less intrusive than audio feedback, in that users can choose to ignore the visual feedback if they prefer. Additionally, visual information including highlights, error corrections, and associated pictures can aid the memorization task, making the system more effective than conventional systems.
- FIG. 1 depicts an illustrative example of a system for providing computer-aided memorization of text in accordance with at least some embodiments;
- FIG. 2 depicts a system architecture for a system that provides computer-aided memorization of text in accordance with at least some embodiments;
- FIG. 3 is a simplified flowchart illustrating a method of providing computer-aided memorization of text according to an embodiment of the present invention;
- FIG. 4 depicts some illustrative examples of features that may be implemented in accordance with embodiments described herein;
- FIG. 5 depicts techniques for annotating text with images in order to aid in text memorization in accordance with at least some embodiments;
- FIG. 6 depicts some example graphical user interfaces demonstrating example features that may be implemented in accordance with embodiments described herein;
- FIG. 7 illustrates a flow diagram depicting a process for providing computer-aided memorization of text in accordance with at least some embodiments; and
- FIG. 8 illustrates examples of components of a computer system according to certain embodiments.
- The present invention relates generally to methods and systems for providing computer-aided memorization of text. More particularly, embodiments of the present invention provide methods and systems in which real-time visual feedback is provided to a user of a user device in response to receiving audio data from that user.
- The audio data received from the user is converted into a received text and printed onto a display of the user device.
- The system identifies an expected text that corresponds to the received text.
- The received text is then compared to the identified expected text using language processing techniques in order to identify a number of discrepancies between the received text and the expected text. The discrepancies are then indicated within the text printed onto the display of the user device.
- FIG. 1 depicts an illustrative example of a system for providing computer-aided memorization of text in accordance with at least some embodiments.
- A user device 102 may be used to receive audio input from a user as well as to present feedback to that audio input.
- The user device, in some cases, may be in communication with a mobile application server 104, which may be further connected to a network 106.
- The user device 102 represents a suitable computing device that includes one or more graphical processing units (GPUs), one or more general-purpose processors (GPPs), and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure.
- User device 102 can be any of smart glasses, a smartphone, a tablet, a laptop, a personal computer, a gaming console, or a smart television.
- The user device 102 additionally includes at least one input sensor 108, such as a microphone, which is capable of obtaining audio input from a user.
- The user device 102 may also include additional input sensors such as a camera, a gyroscope, or an accelerometer.
- The at least one input sensor 108 of the user device 102 may be used to capture audio data 110.
- The audio data 110 may include audible data provided by a user of the user device 102.
- Audio data 110 may include a recording of speech provided by the user.
- Audio data may include a recording of a musical instrument being played by the user.
- Audio data 110 may be processed on the fly without saving the audio data 110 to a file.
- Audio data 110 may be captured in any suitable file format.
- Audio data 110 may be captured as a .wav file, an .mp3 file, a .wma file, or any other suitable audio file format.
- One or more audio recognition techniques are then used to convert the audio data 110 into text data representing received text 112.
- One skilled in the art would appreciate that a number of audio recognition techniques are available in the art that are capable of being used to convert audio data 110 into received text 112.
- The audio data 110 is converted into received text 112 as it is received (i.e., in real time).
- Although the text is referred to as including words, the text may instead include musical notes.
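- As a concrete illustration of this conversion step, the following is a minimal sketch in Python using the open-source SpeechRecognition package. The choice of recognizer is an assumption made for illustration; the disclosure does not name a particular speech recognition technique.

```python
# Minimal sketch of the speech-to-text step (assumption: the
# SpeechRecognition package; the disclosure names no specific recognizer).
import speech_recognition as sr

recognizer = sr.Recognizer()

def capture_received_text() -> str:
    """Capture one utterance from the microphone (input sensor 108)
    and return it as received text 112."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)      # audio data 110
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:               # nothing intelligible heard
        return ""
```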
- A delta, or difference, between the received text 112 and the expected text 114 may include a number of discrepancies, each of which is a variance between specific content of the texts. In some cases, a discrepancy may be an extra or missing word or words. In some cases, a discrepancy may be detected where a word was used in the received text 112 that is different from the corresponding word in the expected text 114.
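- In code, word-level discrepancy detection of this kind can be sketched with difflib from the Python standard library, treating the extra, missing, and differing words described above as the three discrepancy types:

```python
# Sketch of discrepancy detection between received text 112 and
# expected text 114 using a word-level diff.
from difflib import SequenceMatcher

def find_discrepancies(received: str, expected: str) -> list:
    r, e = received.lower().split(), expected.lower().split()
    discrepancies = []
    for op, e1, e2, r1, r2 in SequenceMatcher(a=e, b=r).get_opcodes():
        if op == "delete":       # words in the expected text the user omitted
            discrepancies.append(("missing", e[e1:e2]))
        elif op == "insert":     # extra words not in the expected text
            discrepancies.append(("extra", r[r1:r2]))
        elif op == "replace":    # a different word was used
            discrepancies.append(("different", e[e1:e2], r[r1:r2]))
    return discrepancies
```

For example, find_discrepancies("four score and eight years", "four score and seven years ago") would report a "different" entry for seven/eight and a "missing" entry for "ago".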
- The expected text 114 is retrieved from a data store 118 within the user device. In some embodiments, the expected text is selected by a user before the audio data 110 is obtained.
- The received text 112 is compared to a number of different potential expected texts stored in the data store 118 in order to identify the expected text 114 as a closest match to the received text 112.
- The expected text 114 is received from a mobile application server 104 in communication with the user device 102.
- The mobile application server 104 may provide one or more text data files to the user device 102 to be used in embodiments described herein.
- The one or more text data files 116 may be retrieved from a network (e.g., the Internet).
- The mobile application server 104 may include any computing device capable of providing back-end support for the computer-aided memorization application as described herein. In some embodiments, this may involve identifying and providing text files to the user device 102. For example, the user may indicate a specific text that he or she wishes to memorize, and the mobile application server 104 may retrieve a text file associated with that text and provide the text file to the user device 102. In some embodiments, the mobile application server 104 may receive text from a user device 102, identify a closest matching text from a database or the network 106, retrieve a text file associated with that closest matching text, and provide the text file to the user device 102 as the expected text 114.
- The mobile application server 104 may receive text from the user device 102 that states "four score and seven years ago." In this illustrative example, the mobile application server 104 may identify, from the received text, that the Gettysburg Address is the closest matching text and is hence likely the intended memorization target of the user. In this example, the mobile application server 104 may be configured to retrieve a text file for the Gettysburg Address and to provide that text file to the user device 102.
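- A minimal sketch of that closest-match lookup, assuming candidate texts are held in a simple dictionary keyed by title:

```python
# Sketch of identifying the expected text 114 as the closest match to
# received text among candidate texts (data store 118 / text data files 116).
from difflib import SequenceMatcher

def closest_expected_text(received: str, candidates: dict) -> str:
    """candidates maps title -> full text; returns the best-matching title."""
    def score(full_text: str) -> float:
        # compare against an opening portion of similar word length
        prefix = " ".join(full_text.lower().split()[:len(received.split())])
        return SequenceMatcher(None, received.lower(), prefix).ratio()
    return max(candidates, key=lambda title: score(candidates[title]))

# e.g., closest_expected_text("four score and seven years ago", library)
# would select the Gettysburg Address, assuming `library` contains it.
```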
- The user device 102 is configured to receive audio input from a user as the user provides that input, display received text corresponding to the audio input on a display of the user device 102, identify one or more discrepancies between the received text and an expected text, and display those discrepancies along with the displayed received text. In some embodiments, this is done in real time as the user speaks. This allows a user wishing to memorize a particular text to recite the text from memory and receive immediate feedback on their recollection of the text.
- For clarity, a certain number of components are shown in FIG. 1. It is understood, however, that embodiments of the invention may include more than one of each component. In addition, some embodiments of the invention may include fewer than or greater than all of the components shown in FIG. 1. In addition, the components in FIG. 1 may communicate via any suitable communication medium (including the Internet), using any suitable communication protocol.
- FIG. 2 depicts a system architecture for a system that provides computer-aided memorization of text in accordance with at least some embodiments.
- A user device 202 may be in communication with a number of other components, including at least a mobile application server 204.
- The mobile application server 204 may perform at least a portion of the processing functions required by a mobile application installed upon the user device.
- The user device 202 and mobile application server 204 may be examples of the user device 102 and mobile application server 104, respectively, described with respect to FIG. 1.
- A user device 202 may be any suitable electronic device that is capable of providing at least a portion of the capabilities described herein.
- The user device 202 may be any electronic device capable of capturing audio data from a user and/or presenting a stream of corresponding text on a display.
- A user device may be capable of establishing a communication session with another electronic device (e.g., mobile application server 204) and transmitting/receiving data from that electronic device.
- A user device may include the ability to download and/or execute mobile applications.
- User devices may include mobile communication devices as well as personal computers and thin-client devices.
- A user device may be a set of smart glasses, a smartphone, a personal data assistant (PDA), or any other suitable handheld device.
- The user device can be implemented as a self-contained unit with various components (e.g., input sensors, one or more processors, memory, etc.) integrated into the user device.
- Outputs of various components might remain inside a self-contained unit that defines a user device.
- The user device 202 may include at least one memory 206 and one or more processing units (or processor(s)) 208.
- The processor(s) 208 may be implemented as appropriate in hardware, computer-executable instructions, firmware, or combinations thereof.
- Computer-executable instruction or firmware implementations of the processor(s) 208 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described.
- The user device 202 may also include one or more input sensors 210 for receiving user and/or environmental input. There may be a variety of input sensors 210 capable of detecting user or environmental input, such as an accelerometer, a camera device, a depth sensor, a microphone, a global positioning system (GPS) receiver, etc.
- The memory 206 may store program instructions that are loadable and executable on the processor(s) 208, as well as data generated during the execution of these programs.
- The memory 206 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.).
- The user device 202 may also include additional storage 212, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
- The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
- The memory 206 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), or ROM.
- The memory 206 may include an operating system 214 and one or more application programs or services for implementing the features disclosed herein, including at least a mobile application 216.
- The memory 206 may also include application data 218, which provides information to be generated by and/or consumed by the mobile application 216.
- The application data 218 may be stored in a database.
- A mobile application may be any set of computer-executable instructions installed upon, and executed from, a user device 202.
- Mobile applications may be installed on a user device by a manufacturer of the user device or by another entity.
- The mobile application 216 may cause a user device to establish a communication session with a mobile application server 204, which provides backend support for the mobile application 216.
- A mobile application server 204 may maintain account information associated with a particular user device and/or user.
- A user may be required to log into an account for the mobile application in order to access functionality provided by the mobile application 216.
- The mobile application 216 is configured to receive audio input provided by a user (e.g., speech) and to present information regarding discrepancies in the audio input to the user. More particularly, the mobile application 216 is configured to obtain audio data from a user, display received text corresponding to the audio input on a display of the user device 202, identify one or more discrepancies between the received text and an expected text, and display those discrepancies along with the displayed received text.
- The mobile application 216 may receive output from the input sensors 210 and generate an audio file based upon that output. Using this information, the mobile application 216 may generate a text file. For example, the mobile application 216 may use a speech-to-text conversion application to convert the received audio into corresponding text. The received text may then be compared to expected text in order to identify one or more discrepancies between the received text and the expected text. The mobile application 216 then causes the received text to be displayed within a graphical user interface (GUI) of the mobile application 216 along with an indication of the identified one or more discrepancies. The mobile application 216 may provide a user with the ability to finish reciting their recollection of the expected text or to repeat a portion of the expected text that included discrepancies.
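- Tying these pieces together, a rough sketch of the mobile application's main loop might look as follows; capture_received_text and find_discrepancies are the illustrative helpers sketched earlier, and display_correction is a hypothetical UI callback, not an API named by the disclosure:

```python
# Rough sketch of the recite-compare-display loop of mobile application 216.
def recitation_loop(expected: str, display_correction) -> None:
    expected_words = expected.split()
    position = 0                              # progress through expected text
    while position < len(expected_words):
        received = capture_received_text()    # audio -> received text
        if not received:
            continue                          # keep listening
        step = len(received.split())
        window = " ".join(expected_words[position:position + step])
        for discrepancy in find_discrepancies(received, window):
            display_correction(discrepancy)   # render within the GUI
        position += step
```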
- The user device 202 may also contain communications interface(s) 220 that enable the user device 202 to communicate with any other suitable electronic devices.
- The communication interface 220 may enable the user device 202 to communicate with other electronic devices on a network (e.g., on a private network).
- The user device 202 may include a BLUETOOTH™ wireless communication module, which allows it to communicate with another electronic device.
- The user device 202 may also include input/output (I/O) device(s) and/or ports 222, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
- The user device 202 may communicate with the mobile application server 204 via a communication network.
- The communication network may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks.
- The communication network may comprise multiple different networks.
- The user device 202 may utilize a wireless local area network (WLAN) to communicate with a wireless router, which may then route the communication over a public network (e.g., the Internet) to the mobile application server 204.
- The mobile application server 204 may be any computing device or plurality of computing devices configured to perform one or more calculations on behalf of the mobile application 216 on the user device 202.
- The mobile application 216 may be in periodic communication with the mobile application server 204.
- The mobile application 216 may receive updates, push notifications, or other instructions from the mobile application server 204.
- The mobile application 216 and mobile application server 204 may utilize a proprietary encryption and/or decryption scheme to secure communications between the two.
- The mobile application server 204 may be executed by one or more virtual machines implemented in a hosted computing environment.
- The hosted computing environment may include one or more rapidly provisioned and released computing resources, which computing resources may include computing, networking, and/or storage devices.
- A hosted computing environment may also be referred to as a cloud-computing environment.
- The mobile application server 204 may include at least one memory 224 and one or more processing units (or processor(s)) 226.
- The processor(s) 226 may be implemented as appropriate in hardware, computer-executable instructions, firmware, or combinations thereof.
- Computer-executable instruction or firmware implementations of the processor(s) 226 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described.
- The memory 224 may store program instructions that are loadable and executable on the processor(s) 226, as well as data generated during the execution of these programs.
- The memory 224 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.).
- The mobile application server 204 may also include additional storage 228, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
- The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
- The memory 224 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), or ROM.
- The memory 224 may include an operating system 230 and one or more application programs or services for implementing the features disclosed herein, including at least a module for analyzing text to identify discrepancies between text files (text analysis module 232).
- The memory 224 may also include account data 234, which provides information associated with user accounts maintained by the described system, as well as text file data 236, which maintains text files for which memorization may be requested. At least some of the text files stored in text file data 236 may be stored in relation to a particular user account. In some embodiments, one or more of the account data 234 or the text file data 236 may be stored in a database.
- The memory 224 and the additional storage 228, both removable and non-removable, are examples of computer-readable storage media.
- Computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
- Modules may refer to programming modules executed by computing systems (e.g., processors) that are installed on and/or executed from the mobile application server 204.
- The mobile application server 204 may also contain communications connection(s) 238 that allow the mobile application server 204 to communicate with a stored database, another computing device or server, user terminals, and/or other components of the described system.
- The mobile application server 204 may also include input/output (I/O) device(s) and/or ports 240, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
- The memory 224 may include the text analysis module 232, the database containing account data 234, and/or the database containing text file data 236.
- The text analysis module 232 may be configured to, in conjunction with the processors 226, analyze received text in order to determine one or more discrepancies between the received text and an expected text. In some embodiments, this may be a letter-by-letter comparison of words and sentences. However, this may also be an approximate matching, in which strings that match approximately rather than exactly are considered a match. This may be done using one or more natural language processing (NLP) techniques, such as parsing, to analyze the words and sentences. Parsing might result in the creation of a parse tree revealing the syntactic relationships between words, whose semantics can then be analyzed using semantic analysis.
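- As an illustration of the parsing step, a dependency parse such as the one produced by spaCy (an assumption; the disclosure names no NLP library) yields the syntactic relations that semantic analysis can then operate on:

```python
# Sketch of producing parse-tree relations for a sentence with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")   # small English pipeline

def parse_relations(sentence: str) -> list:
    """Return (word, syntactic relation, head word) triples."""
    doc = nlp(sentence)
    return [(token.text, token.dep_, token.head.text) for token in doc]

# parse_relations("the system receives audio data")
# -> e.g., ("system", "nsubj", "receives"), ("data", "dobj", "receives"), ...
```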
- The text analysis module 232 may compare received text from the user device 202 to an expected text in order to provide an indication of detected discrepancies back to the user device 202. In some embodiments, the text analysis module 232 may compare received text from the user device 202 to a number of available texts from the text file data 236 in order to identify a file that includes an expected text, which may then be provided back to the user device 202.
- One or more of the text files stored within the text file database 236 may be associated with a particular user and/or account. For example, a user who is planning to give a speech may upload a text file for that speech to the text file database 236 of the mobile application server 204. In some cases, that text file may then be pushed to the user’s user device 202. In another example, a user may highlight a portion of a religious text that he or she wishes to memorize. In that example, an indication of the portion of the religious text highlighted by the user may be stored in relation to that user. Additionally, it should be noted that some speech-to-text applications may need to be trained on each particular user that wishes to utilize the application. Accordingly, a user may be required to go through an onboarding or training process in some situations in order to use the described system.
- FIG. 3 is a simplified flowchart illustrating a method of providing computer-aided memorization of text according to an embodiment of the present invention.
- The flow is described in connection with a computer system that is an example of the computer systems described herein.
- Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
- The computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
- Each programmable module in combination with the processor represents a means for performing a respective operation(s). While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
- The process 300 begins at 302 when an initial received text is obtained.
- The received text is obtained by converting received audio data into text data using one or more speech processing techniques.
- At 304, the process 300 involves identifying and retrieving an expected text.
- An expected text may be identified by virtue of having been selected by a user. For example, the user may elect to memorize a particular speech.
- That received text may then be compared to the user-elected speech.
- The process may then involve identifying the elected speech as the expected text if some portion of the received text matches text from a portion of the elected speech.
- The process 300 may instead involve identifying the expected text by identifying a closest matching text file from some set of expected text files.
- Text within several text files may be compared to the received text in order to identify a closest matching text file from the several text files.
- Suitable text comparison techniques are capable of being used at this step.
- At 306, the process 300 involves receiving additional text as that text is converted from audio data.
- Additional text is streamed from a user device to a mobile application server as it is received and converted from audio.
- The received additional text is processed on the user device.
- The received text is presented on a display of the user device as that text is received. Accordingly, as a user speaks, he or she would see their words printed onto a display device in real time.
- At 308, the process 300 involves processing the additional text by comparing the additional text to corresponding text within the expected text to determine whether the two match.
- The additional text is compared to text within the expected text in order to identify one or more discrepancies (i.e., differences) between the two.
- Additional and/or omitted words may be identified as discrepancies.
- A discrepancy may also be detected where a word was used in the received text that is different from the corresponding word in the expected text.
- The particular discrepancies identified are assessed in order to determine their severity.
- The process may involve performing natural language processing in order to determine whether the discrepancy is an important one (i.e., one that would change the meaning of the text or make it unclear). If no discrepancy is identified, or the severity of any detected discrepancy is below some predetermined threshold, then the received text is considered to match the expected text. Otherwise, the process involves providing a correction at 310.
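- One possible severity heuristic is sketched below, under the assumption that discrepancies limited to filler or function words rarely change the meaning; the word list and threshold are illustrative values, not taken from the disclosure:

```python
# Sketch of scoring discrepancy severity and applying a threshold.
FILLER_WORDS = {"a", "an", "the", "and", "or", "um", "uh", "er"}
SEVERITY_THRESHOLD = 0.5   # assumed tunable cutoff

def severity(discrepancy) -> float:
    """discrepancy is a tuple like ("missing", [words]) from the diff step."""
    words = [w for group in discrepancy[1:] for w in group]
    if all(w in FILLER_WORDS for w in words):
        return 0.1    # unlikely to change the meaning of the text
    return 1.0        # a content word was changed, added, or omitted

def matches_expected(discrepancies) -> bool:
    return all(severity(d) < SEVERITY_THRESHOLD for d in discrepancies)
```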
- At 310, the process 300 involves providing a correction to the user.
- An appropriate correction is identified based on the type of discrepancy detected.
- A set of rules may be maintained that indicates what type of correction should be provided for each available type of discrepancy that may be detected. For example, if the user includes an additional word that is not in the expected text, that word, when presented upon a display of the user device, may be struck out (e.g., have a line through it). Alternatively, if the user fails to include a word that is in the expected text, that word may be inserted into the presented text on the display but in a different font, color, or style to indicate that it is one that the user missed. Examples of such corrections are provided elsewhere in this disclosure, such as with respect to FIG. 6.
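- Such a rules table might be sketched as follows, with markdown-like markup standing in for whatever styling the GUI actually applies (the tags are assumptions):

```python
# Sketch of mapping each discrepancy type to a display correction.
def render_correction(discrepancy) -> str:
    kind = discrepancy[0]
    if kind == "extra":        # extra spoken word: struck out
        return "~~" + " ".join(discrepancy[1]) + "~~"
    if kind == "missing":      # omitted word: inserted in a distinct style
        return "[" + " ".join(discrepancy[1]) + "]"
    if kind == "different":    # wrong word: strike it and show the correction
        expected_words, received_words = discrepancy[1], discrepancy[2]
        return ("~~" + " ".join(received_words) + "~~ ["
                + " ".join(expected_words) + "]")
    return ""
```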
- At 312, the process 300 involves determining whether a recitation related to the portion of expected text in which the discrepancy was detected should be repeated. In some embodiments, this determination is made based on the behavior of the user. For example, if the user continues recollecting text that follows the discrepancy, then the process does not repeat and instead proceeds to the next portion of expected text. However, if the process next receives text that matches the portion of the expected text that included the discrepancy, then the process returns to 306 and repeats by comparing the received additional text to that portion of the expected text. In some embodiments, whether the process repeats at step 312 may be determined based on a preference elected by the user.
- At 314, the process 300 involves determining whether the end of the expected text has been reached. This may involve determining that the latest received text from the user matches text included at the end of a text file associated with the expected text. If the end of the expected text has not been reached, the process may continue to repeat by monitoring for, and processing, additional text from 306. If, however, the process 300 does determine that the end of the expected text has been reached, then the process 300 may be concluded at 316.
- The user may be provided statistics upon reaching the end of the expected text. For example, the user may be provided metrics related to the number and/or severity of discrepancies identified throughout the process 300. Such metrics may be provided in any format. For example, the user may be provided with an overall percentage that represents the degree to which their recollection correctly matches the expected text.
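- The overall-percentage metric could be computed with a sequence-similarity ratio; a minimal sketch:

```python
# Sketch of the end-of-recitation score: percent similarity between the
# full recollection and the expected text, at word granularity.
from difflib import SequenceMatcher

def recollection_score(received: str, expected: str) -> float:
    ratio = SequenceMatcher(None,
                            received.lower().split(),
                            expected.lower().split()).ratio()
    return round(100 * ratio, 1)    # e.g., 97.3 means 97.3% correct
```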
- FIG. 3 provides a particular method of providing computer-aided memorization of text according to an embodiment of the present invention.
- Other sequences of steps may also be performed according to alternative embodiments.
- Alternative embodiments of the present invention may perform the steps outlined above in a different order.
- The individual steps illustrated in FIG. 3 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
- Additional steps may be added or removed depending on the particular application.
- One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
- FIG. 4 depicts some illustrative examples of features that may be implemented in accordance with embodiments described herein.
- In FIG. 4, at least some features of the system and methods described herein are depicted as being implemented on a user device 402.
- Depicted on the user device 402 is received text 404, along with corrections 406 of discrepancies detected between the received text and an expected text.
- Corrections 406 may be presented in real time as a user audibly recites their recollection of the expected text.
- The user device 402 may include a camera device capable of obtaining images of an environment 408 in which the user device 402 is located. The captured image information may then be presented on a display of the user device 402.
- The user device 402 may be a set of smart glasses or another device that includes a transparent/semi-transparent display that allows the user to view the environment 408 through the display.
- The received text 404 may be presented so that it overlays at least a portion of the user’s view of the environment 408 on the display. This advantageously allows a user of the user device 402 to utilize the described system while paying attention to their environment. This is especially useful for users who wish to use the system on a set of smart glasses while traveling.
- The user device 402 may obtain gesture information from the captured image information. For example, the user device 402 may, upon capturing an image of the environment 408, identify a portion of the image information as including a hand 410 using one or more object recognition techniques. One skilled in the art would recognize that a number of object recognition techniques may be used to accomplish this. Upon identifying a hand 410 within the environment 408, the system may monitor the hand 410 to determine whether a user of the user device 402 has performed a gesture. To do this, one or more actions performed by the hand 410 may be compared to gestures stored in a gesture repository. If the hand 410 is determined to have performed some gesture, then an action associated with that gesture may be executed.
- The user may move their hand in front of the camera so that it moves from the right side of the display to the left side of the display.
- The system may determine, from the gesture repository, that the movement is associated with a “go back” gesture and may subsequently execute an action to revert to an earlier portion of the expected text. This advantageously allows a user of the user device 402 to provide commands to the described system while it is in use, since voice commands may not be available.
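- A sketch of that gesture matching, assuming the object-recognition step reduces the tracked hand 410 to a series of normalized (x, y) centroids; the swipe thresholds are illustrative assumptions:

```python
# Sketch of matching tracked hand motion against a gesture repository.
GESTURES = {
    "go_back": lambda dx, dy: dx < -0.3 and abs(dy) < 0.1,  # right-to-left swipe
    "skip":    lambda dx, dy: dx > 0.3 and abs(dy) < 0.1,   # left-to-right swipe
}

def detect_gesture(track: list):
    """track: chronological (x, y) hand centroids in screen coordinates."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    for name, matches in GESTURES.items():
        if matches(dx, dy):
            return name     # e.g., "go_back" reverts to earlier expected text
    return None
```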
- The user device 402 may detect objects within the environment that might contrast with text overlaid on the display of the user device 402 and may adjust the text appropriately.
- The system might perform an analysis of the environment 408 using a camera or other sensors on the device and then automatically adjust the text color and/or intensity for a portion of text according to the actual background color/brightness at the location of the text.
- A light source 412 (e.g., a light or window) may be detected within the environment 408.
- A dark object 414 may be detected within the environment 408.
- Text corresponding to a location of the object 412 and/or 414 may be adjusted in order to make the text more visible without obscuring the background.
- A portion of an image of the object 412 and/or 414 may be removed or covered such that the text is visible, as illustrated at 416.
- The text may be bolded or otherwise emphasized.
- A color of the text may be changed to one that contrasts with colors in the environment 408.
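- A sketch of that per-region adjustment, assuming the camera frame arrives as a grayscale numpy array and each piece of overlaid text has a bounding region:

```python
# Sketch of choosing a text style from the background brightness behind it.
import numpy as np

def text_style_for_region(frame: np.ndarray, region: tuple) -> str:
    """frame: grayscale image (0-255); region: (x, y, width, height)."""
    x, y, w, h = region
    luminance = frame[y:y + h, x:x + w].mean() / 255.0
    # bright background (e.g., light source 412) -> dark, bolded text;
    # dark background (e.g., object 414) -> light text
    return "black-bold" if luminance > 0.5 else "white"
```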
- FIG. 5 depicts techniques for annotating text with images in order to aid in text memorization in accordance with at least some embodiments.
- The system may annotate an expected text 502 with mnemonic devices such as images (e.g., Graphics Interchange Format (GIF) images or icons).
- The system may use language processing techniques to determine a subject and/or context for a portion 504 of the expected text 502.
- The system may then identify one or more mnemonic devices, such as an image 506, which relate to the determined subject and/or context.
- The mnemonic devices may be associated with the expected text 502 at the corresponding portion 504.
- The expected text 502 may include metadata that stores information related to the expected text 502. This metadata may include an indication of images 506 (or locations at which images 506 can be accessed) as well as indications of the portion 504 of the expected text 502 that the images 506 are to be associated with.
- The system may monitor the user’s progress through the recitation of the expected text 502 and may present the annotated mnemonic devices at appropriate times. In some embodiments, this involves displaying an image 506 when a user has reached the portion 504 of the expected text 502. In some embodiments, this involves displaying the image 506 upon detecting a pause at the related portion 504 of the expected text 502, where the pause is a failure to receive text within some predetermined amount of time. In some embodiments, this involves displaying an image 506 when a user has incorrectly recited the portion 504 of the expected text 502.
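- The annotation metadata and its pause-triggered display might be sketched as below; the image location and the five-second pause threshold are illustrative assumptions:

```python
# Sketch of mnemonic metadata for expected text 502 and its display logic.
import time

MNEMONICS = {   # word offset of portion 504 -> location of image 506
    12: "https://example.com/images/portion-504.gif",
}
PAUSE_SECONDS = 5.0   # assumed threshold for "failure to receive text"

def maybe_show_mnemonic(position: int, last_input_time: float,
                        show_image) -> None:
    """show_image is a hypothetical display hook on the user device."""
    paused = (time.time() - last_input_time) > PAUSE_SECONDS
    if paused and position in MNEMONICS:
        show_image(MNEMONICS[position])
```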
- FIG. 6 depicts some example graphical user interfaces (GUIs) demonstrating example features that may be implemented in accordance with embodiments described herein.
- The GUIs of FIG. 6 depict a number of scenarios 602, 604, and 606. Each of the scenarios 602, 604, and 606 is depicted as a sequential series of GUI representations (A-C).
- Depicted at the series of GUI representations 602 is an example scenario in which a discrepancy is identified between a received text and an expected text because the received text has omitted one or more portions of text included within the expected text.
- Audio data provided by a user is converted into received text data and presented via the GUI.
- The received text presented via the GUI is updated to include more recent received text as audio data continues to be received.
- The received text presented at 602(B) is compared to the expected text to identify discrepancy 608, which is text included within the expected text that was omitted from the received text.
- The received text presented at 602(B) is then modified to indicate the discrepancy 608 at 602(C).
- The discrepancy 608 may be presented as text that has been inserted into the received text.
- The color, intensity, size, and/or formatting of the text may be modified to indicate that it is a discrepancy of omitted text.
- Depicted at the series of GUI representations 604 is an example scenario in which a discrepancy is identified between a received text and an expected text because the received text has one or more portions of text that do not match those included within the expected text.
- Audio data provided by a user is converted into received text data and presented via the GUI.
- The received text presented via the GUI is updated to include more recent received text as audio data continues to be received.
- The received text presented at 604(B) is compared to the expected text to identify discrepancy 610, which is text that differs from corresponding text within the expected text.
- The received text presented at 604(B) is then modified to indicate the discrepancy 610 at 604(C).
- The discrepancy 610 may be presented as text that does not belong within the received text.
- The appearance of the text may be modified to indicate that it is a discrepancy of differing or additional text.
- The text of discrepancy 610 may appear as struck-out text, with the correct text also presented.
- Mnemonic devices may be associated with various portions of an expected text. Depicted at the series of GUI representations 606 is an example scenario in which such a mnemonic device is used.
- Audio data provided by a user is converted into received text data and presented via the GUI.
- The received text presented via the GUI would normally be updated to include more recent received text as audio data continues to be received.
- A mnemonic device 612 may be presented along with the most recent received text.
- A mnemonic device 612 may be presented automatically upon reaching a location within the expected text associated with the mnemonic device 612.
- The mnemonic device 612 may be presented upon detecting a pause or break in the received text (e.g., no additional received text is received for some threshold period of time). Once additional received text is received that matches the portion of the expected text associated with the mnemonic device 612, the mnemonic device 612 may be removed from presentation and replaced by the received text, as depicted at 606(C).
- FIG. 7 illustrates a flow diagram depicting a process for providing computer-aided memorization of text in accordance with at least some embodiments.
- The process 700 depicted in FIG. 7 may be performed by a user device (e.g., user device 202 of FIG. 2), which may be in communication with a mobile application server (e.g., mobile application server 204 of FIG. 2).
- The process 700 begins at 702, when audio data is received from a user at a user device.
- The audio data includes words spoken by the user of the user device or music played by the user via a musical instrument.
- The process 700 involves converting the received audio data to received text data.
- One skilled in the art would appreciate that a number of audio recognition techniques are available in the art that are capable of being used to convert audio data into received text.
- The process 700 involves comparing the received text data to an expected text data to identify one or more discrepancies.
- Discrepancies are each variances between specific content of the received text data and the expected text data.
- A discrepancy may be an extra or missing word or words.
- A discrepancy may be detected where a word was used in the received text data that is different from the corresponding word in the expected text data.
- The process 700 involves presenting the received text as it is converted from audio data.
- The method further involves performing one or more language processing techniques on the expected text, identifying a mnemonic device (e.g., an image) associated with the expected text based on a result of the one or more language processing techniques, and displaying the mnemonic device associated with the expected text on a display of the user device.
- The method further involves capturing environment data using one or more sensors of the user device, and adjusting a text style in which the received text is presented based on the environment data.
- The process 700 involves indicating the identified discrepancies within the presented text.
- The one or more discrepancies between the received text and the expected text are indicated via highlighting or cross-out.
- The method further involves presenting, for each of the one or more discrepancies between the received text and the expected text, a correction that includes a portion of the expected text corresponding to that discrepancy.
- FIG. 8 illustrates examples of components of a computer system 800 according to certain embodiments.
- The computer system 800 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 800, the computer system 800 can also be distributed.
- The computer system 800 includes at least a processor 802, a memory 804, a storage device 806, input/output peripherals (I/O) 808, communication peripherals 810, and an interface bus 812.
- The interface bus 812 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 800.
- The memory 804 and the storage device 806 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example FLASH™ memory, and other tangible storage media. Any of such computer-readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
- The memory 804 and the storage device 806 also include computer-readable signal media.
- A computer-readable signal medium includes a propagated data signal with computer-readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
- A computer-readable signal medium includes any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 800.
- The memory 804 includes an operating system, programs, and applications.
- The processor 802 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
- The memory 804 and/or the processor 802 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
- The I/O peripherals 808 include user interfaces, such as a keyboard, screen (e.g., a touch screen), microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
- The I/O peripherals 808 are connected to the processor 802 through any of the ports coupled to the interface bus 812.
- The communication peripherals 810 are configured to facilitate communication between the computer system 800 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
- A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
- Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
- Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
- The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
- The use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
- The use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180016189.1A CN115136223A (zh) | 2020-03-25 | 2021-03-04 | System and method for providing computer-aided memorization of text
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062994597P | 2020-03-25 | 2020-03-25 | |
US62/994,597 | 2020-03-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021190267A1 (en) | 2021-09-30 |
Family
ID=77890926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/079074 WO2021190267A1 (en) | 2020-03-25 | 2021-03-04 | System and method for providing computer aided memorization of text |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115136223A (zh) |
WO (1) | WO2021190267A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115050349A (zh) * | 2022-06-14 | 2022-09-13 | Douyin Vision (Beijing) Co., Ltd. | Method, apparatus, device, and medium for converting text to audio
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160254000A1 (en) * | 2015-02-27 | 2016-09-01 | Kevin Thomas Thalanany | Automated Digital Agreement Attestation & Electronic Signature Execution via Speech-Recognition |
CN107222490A (zh) * | 2017-06-19 | 2017-09-29 | Guangzhou Xunfei Zunhong Information Technology Co., Ltd. | Voice verification method
CN109448455A (zh) * | 2018-12-20 | 2019-03-08 | Guangdong Genius Technology Co., Ltd. | Recitation method with real-time error correction, and home tutoring device
CN109614971A (zh) * | 2018-12-05 | 2019-04-12 | Shandong University of Political Science and Law | Comparison-type document examination instrument
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003150291A (ja) * | 2001-11-14 | 2003-05-23 | Oki Electric Ind Co Ltd | Screen display control method and device for portable terminal
US20130230833A1 (en) * | 2012-03-05 | 2013-09-05 | Vitrepixel Holdings, Llc | Method for Enhanced Memorization and Retention of Consecutive Text
CN105630447B (zh) * | 2015-12-24 | 2019-04-16 | Xiaomi Technology Co., Ltd. | Method and device for adjusting text display
CN108389440A (zh) * | 2018-03-15 | 2018-08-10 | Guangdong Genius Technology Co., Ltd. | Microphone-based voice playback method, apparatus, and voice playback device
CN110309350B (zh) * | 2018-03-21 | 2023-09-01 | Tencent Technology (Shenzhen) Co., Ltd. | Method, system, apparatus, medium, and electronic device for processing recitation tasks
CN109448460A (zh) * | 2018-12-17 | 2019-03-08 | Guangdong Genius Technology Co., Ltd. | Recitation detection method and user equipment
CN110310086B (zh) * | 2019-06-06 | 2022-04-05 | Anhui Toycloud Technology Co., Ltd. | Assisted recitation reminder method, device, and storage medium
CN110413955B (zh) * | 2019-07-30 | 2023-04-07 | Beijing Xiaomi Mobile Software Co., Ltd. | Font weight adjustment method, apparatus, terminal, and storage medium
2021
- 2021-03-04 WO PCT/CN2021/079074 patent/WO2021190267A1/en active Application Filing
- 2021-03-04 CN CN202180016189.1A patent/CN115136223A/zh active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160254000A1 (en) * | 2015-02-27 | 2016-09-01 | Kevin Thomas Thalanany | Automated Digital Agreement Attestation & Electronic Signature Execution via Speech-Recognition |
CN107222490A (zh) * | 2017-06-19 | 2017-09-29 | Guangzhou Xunfei Zunhong Information Technology Co., Ltd. | Voice verification method
CN109614971A (zh) * | 2018-12-05 | 2019-04-12 | Shandong University of Political Science and Law | Comparison-type document examination instrument
CN109448455A (zh) * | 2018-12-20 | 2019-03-08 | Guangdong Genius Technology Co., Ltd. | Recitation method with real-time error correction, and home tutoring device
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115050349A (zh) * | 2022-06-14 | 2022-09-13 | Douyin Vision (Beijing) Co., Ltd. | Method, apparatus, device, and medium for converting text to audio
CN115050349B (zh) * | 2022-06-14 | 2024-06-11 | Douyin Vision Co., Ltd. | Method, apparatus, device, and medium for converting text to audio
Also Published As
Publication number | Publication date |
---|---|
CN115136223A (zh) | 2022-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102347398B1 (ko) | Actionable content displayed on a touchscreen | |
US11176141B2 (en) | Preserving emotion of user input | |
US11749276B2 (en) | Voice assistant-enabled web application or web page | |
US10970678B2 (en) | Conference information accumulating apparatus, method, and computer program product | |
CN110473525B (zh) | Method and apparatus for acquiring speech training samples | |
US20120166522A1 (en) | Supporting intelligent user interface interactions | |
JP6150268B2 (ja) | Word registration device and computer program therefor | |
CN110969012A (zh) | Text error correction method, apparatus, storage medium, and electronic device | |
US10950240B2 (en) | Information processing device and information processing method | |
US11514893B2 (en) | Voice context-aware content manipulation | |
CN111860000A (zh) | Text translation editing method, apparatus, electronic device, and storage medium | |
WO2021190267A1 (en) | System and method for providing computer aided memorization of text | |
US20170004859A1 (en) | User created textbook | |
CN110286776A (zh) | Input method, apparatus, electronic device, and storage medium for character combination information | |
US20210134177A1 (en) | System and method for displaying voice-animated multimedia content | |
US20130179165A1 (en) | Dynamic presentation aid | |
US11238754B2 (en) | Editing tool for math equations | |
US11741302B1 (en) | Automated artificial intelligence driven readability scoring techniques | |
WO2023235018A1 (en) | Automatic content generation | |
CN108509057B (zh) | Input method and related device | |
KR102618311B1 (ko) | Method and apparatus for providing English conversation lecture content | |
US20240282303A1 (en) | Automated customization engine | |
US20230082325A1 (en) | Utterance end detection apparatus, control method, and non-transitory storage medium | |
US20240095448A1 (en) | Automatic guidance to interactive entity matching natural language input | |
US20230385320A1 (en) | Automatic content generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21775789; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21775789; Country of ref document: EP; Kind code of ref document: A1 |