WO2012152290A1 - Mobile literacy device - Google Patents

Mobile literacy device

Info

Publication number
WO2012152290A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
unit
reading
handwriting
literacy
Prior art date
Application number
PCT/EG2011/000011
Other languages
English (en)
Inventor
Mohsen Abdel-Razik Ali Rashwan
Sherif Mahdy Abdou ESSAWY
Original Assignee
Mohsen Abdel-Razik Ali Rashwan
Essawy Sherif Mahdy Abdou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mohsen Abdel-Razik Ali Rashwan, Essawy Sherif Mahdy Abdou filed Critical Mohsen Abdel-Razik Ali Rashwan
Publication of WO2012152290A1

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B11/00 - Teaching hand-writing, shorthand, drawing, or painting
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B17/00 - Teaching reading
    • G09B17/003 - Teaching reading: electrically operated apparatus or devices
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 - Teaching not covered by other main groups of this subclass
    • G09B19/06 - Foreign languages
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/04 - Electrically-operated educational appliances with audible presentation of the material to be studied
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 - Adaptation
    • G10L15/07 - Adaptation to the speaker
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Definitions

  • a device for helping a person learn to read and write displays a plurality of character sets, such that the character sets in the proper order spell a word in a given language. Each character set is associated with a unique color relative to the other character sets in the plurality.
  • the device further displays a set of items (e.g., lines or boxes) representing character locations, such that each item is associated with an image representing the sound of one of the character sets and a color matching the color associated with the one of the character sets.
  • the colors associated with the items representing the character locations indicate the order of the character sets to spell the word.
  • the device may be a flashcard, which may be made for use with a dry erase marker.
  • Various learning games can be played that use the device to make learning easy and fun.
  • a multimedia computerized literacy system providing for and enhancing the teaching of Literacy Skills to enable people to function more effectively in schools, society, and the workplace.
  • the design of the system is based on the identification and understanding of the structures, contents, and strategies which underlie prose, document, and quantitative literacy.
  • the system's design is based, in part, on the Knowledge Model Procedure, which builds upon a unique taxonomy of document structures, contents, and strategies in a way that facilitates the transfer of Literacy Skills across the wide array of document types, as well as across quantitative and prose Literacy Skills.
  • the computerized literacy system through a series of structure and use lessons, provides students with the skills to perform document, prose, and quantitative tasks of increasing complexity.
  • the system uses specially designed exercise and practice tasks to enhance students' abilities to perform literacy tasks commonly required of adolescents and adults in modern society.
  • a literacy system provides teaching for reading and writing skills.
  • the literacy system may include exercises for teaching visual sequencing, motor skills, phonology, semantics, syntax, and text.
  • the literacy system may have a pre-reading section, which includes exercises for developing visual sequencing skills and motor skills prior to teaching the skills of reading and writing.
  • the device reduces the required in-class time from several months to a few weeks.
  • the student can take long intermediate breaks for self-study and practice, which optimizes the time required for the regular classes.
  • the device provides an alternative for full-time workers, who can study at any time and in any place according to their needs.
  • the device helps reach the far-flung areas of the country and assists in bridging the divide between the large cities and the distant small village communities.
  • the device provides the privacy of self-study, specifically for adult learners who may be reluctant to join regular classrooms and make reading and writing errors in front of other students.
  • the device can greatly help reduce the rate of relapse into illiteracy.
  • the student will receive regular messages that are downloaded to his device.
  • the new practice materials can be selected according to the student's preferences, such as new lessons, recent news, vocational guides, public services, and job announcements.
  • This device consists of a small screen to display the educational materials and lessons.
  • This device includes a microphone to record the student's reading. The device screen is also touch-enabled to capture handwriting input.
  • the device includes a set of control keys as shown in figure (1).
  • the functions of the keys are:
  • Reading Practice: In these practice questions, the student selects the correct answer from multiple-choice answers using the movement arrow keys and then presses enter. These practice examples are associated with recorded messages that explain to the student what is required to solve the practice question, in case the student cannot yet read the instructions.
  • the reading practice examples support self-study: the student reads an example displayed on the device screen, and the application then verifies his reading using Automatic Speech Recognition (ASR) technology.
  • ASR Automatic Speech Recognition
  • the system uses Hidden Markov Models (HMMs) to produce a score for the student's reading. If the score is above a certain threshold, the reading is judged correct.
  • the system produces feedback messages to inform the user of his performance, either confirming a correct reading or identifying his reading errors, with guiding instructions for avoiding them in future trials.
  • the system calculates a confidence measure associated with the score and produces feedback messages to the user only in cases of high confidence, to reduce mistaken feedback messages. Initially, the system records a sample of the user's voice to adapt the system models to the acoustic characteristics of the user.
  • the following sections include detailed descriptions of each of the system components:
  • a Hidden Markov Model is a network of states connected by directed transition branches.
  • an HMM-based speech recognizer uses HMMs to model the production of speech sounds.
  • the HMM recognizer represents each type of phone in a language by a phone model made up of a handful of connected states.
  • Each state in an HMM has an associated probability distribution of the acoustic features which are produced while in the state.
  • the output distributions may be Gaussian distributions, or weighted mixtures of Gaussian distributions, etc.
  • Each transition branch in an HMM has a transition probability indicating the probability of transiting from the branch's source state to its destination state. All transition probabilities out of any given state, including any self-transition probability, sum to one.
  • the output and transition probability distributions for all states in an HMM are established from training data using standard HMM training algorithms, such as the well-known forward-backward (Baum-Welch) algorithm.
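As an illustration only (not part of the patent text), the following sketch shows how such a phone model with Gaussian output distributions and a row-stochastic transition matrix could be represented; the class name, toy parameters, and feature dimensionality are all assumptions.

```python
import numpy as np

class PhoneHMM:
    """Minimal left-to-right phone model: a handful of states, each with a
    Gaussian output distribution and self/forward transition probabilities."""

    def __init__(self, means, variances, transitions):
        self.means = np.asarray(means)          # one mean vector per state
        self.variances = np.asarray(variances)  # diagonal covariances per state
        self.A = np.asarray(transitions)        # A[i, j] = P(next state j | state i)
        # all transition probabilities out of a state, including the
        # self-transition, must sum to one
        assert np.allclose(self.A.sum(axis=1), 1.0)

    def log_output_prob(self, state, frame):
        """Log-likelihood of one acoustic feature frame under one state's Gaussian."""
        mean, var = self.means[state], self.variances[state]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (frame - mean) ** 2 / var)

# toy 3-state phone model over 2-dimensional acoustic features
phone = PhoneHMM(
    means=[[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]],
    variances=[[1.0, 1.0]] * 3,
    transitions=[[0.6, 0.4, 0.0],   # state 0: stay or move forward
                 [0.0, 0.7, 0.3],
                 [0.0, 0.0, 1.0]],  # final state absorbs
)
print(phone.log_output_prob(0, np.array([0.1, -0.2])))
```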
  • the HMM recognizer models every spoken sentence as having been produced by traversing a path through the states within the HMMs. In general, a frame of acoustic features is produced at each time-step along this path. The path identifies the sequence of states traversed. The path also identifies the duration of time spent in each state of the sequence, thereby defining the time-duration of each phone and each word of a sentence. Put another way, the path describes an "alignment" of the sequence of frames with a corresponding sequence of states of the HMMs.
  • An HMM search engine within the HMM recognizer computes a maximum likelihood path.
  • the maximum likelihood path is a path through the hidden Markov models with the maximum likelihood of generating the acoustic feature sequence extracted from the speech of the user.
  • the maximum likelihood path includes the sequence of states traversed and the duration of time spent in each state.
  • the maximum likelihood path defines an acoustic segmentation of the acoustic features into a sequence of phones.
  • the acoustic segmentation is a subset of the path information, including time boundaries and the phone-type labels of the sequence of phones.
  • the HMM search engine computes the maximum likelihood path through its HMMs according to a standard pruning HMM search algorithm that uses the well-known Viterbi search method.
  • the sequence of spoken words from the speaker is known in advance by the pronunciation evaluation system.
  • Using the known word sequence as an additional constraint can reduce recognition and segmentation errors and also reduce the amount of computation required by the HMM engine.
  • the input speech is composed of its constituent words, which are in turn broken down into constituent phones, which are themselves broken down into constituent states.
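The sketch below illustrates, under simplifying assumptions, how a Viterbi-style forced alignment of acoustic frames against the known left-to-right state sequence of a sentence might be computed; transition log-probabilities are omitted for brevity and the observation scores are random toy values.

```python
import numpy as np

def viterbi_align(log_obs):
    """Forced alignment of T acoustic frames to the S states of a known
    left-to-right state sequence (the concatenated phone states of the known
    sentence). log_obs[t, s] is the log-likelihood of frame t under state s.
    Returns the maximum-likelihood state index for every frame."""
    T, S = log_obs.shape
    NEG = -1e30
    delta = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_obs[0, 0]               # the path must start in state 0
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s]                        # remain in state s
            move = delta[t - 1, s - 1] if s > 0 else NEG  # advance from s - 1
            if stay >= move:
                delta[t, s], back[t, s] = stay + log_obs[t, s], s
            else:
                delta[t, s], back[t, s] = move + log_obs[t, s], s - 1
    path = [S - 1]                            # the path must end in the last state
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return list(reversed(path))               # state index per frame, t = 0..T-1

# toy example: align 6 random frames to a 3-state sequence
rng = np.random.default_rng(0)
print(viterbi_align(rng.normal(size=(6, 3))))
```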
  • the goal of this phase is to collect some utterances from the user and use this data to adapt the system's acoustic models to match the user's speech characteristics.
  • the user enrolment process can be summarized in the following steps:
  • Step 1: Collect a few common sentences from the user in order to select the cluster nearest to the user's voice in the acoustic space. This nearest cluster model will be used as the reference model for that user.
  • Step 2: Prompt the user to utter phrases and test them with the reference model selected in Step 1. If the system decides that an utterance is free of pronunciation errors, add it to the group of data used for speaker adaptation.
  • Step 3: Continue until the amount of collected adaptation data is sufficient to produce acceptable performance, then apply an incremental speaker adaptation technique to transform the reference models to the speaker's domain.
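A minimal sketch of the enrolment logic described in Steps 1 to 3 follows; the function names, the scoring and error-detection callbacks, and the 60-second data target are hypothetical placeholders, not values taken from the patent.

```python
def select_reference_cluster(enrolment_sentences, cluster_models, score_fn):
    """Step 1 (sketch): pick the speaker cluster whose acoustic model scores the
    user's enrolment sentences highest; score_fn(model, utterance) stands in for
    the HMM log-likelihood computation."""
    totals = {name: sum(score_fn(model, utt) for utt in enrolment_sentences)
              for name, model in cluster_models.items()}
    return max(totals, key=totals.get)

def collect_adaptation_data(prompted_utterances, reference_model,
                            is_error_free, target_seconds=60.0):
    """Steps 2 and 3 (sketch): keep only utterances judged free of pronunciation
    errors until enough adaptation data has been collected; an incremental
    adaptation step would then transform the reference models using this data.
    prompted_utterances is a list of (utterance, duration_in_seconds) pairs."""
    kept, collected = [], 0.0
    for utterance, duration in prompted_utterances:
        if is_error_free(reference_model, utterance):
            kept.append(utterance)
            collected += duration
        if collected >= target_seconds:
            break
    return kept
```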
  • the system analyzes the user's reading samples by segmenting them into their constituent phonemes according to the reference sentence under test, using the adapted HMM models that match the user's acoustic characteristics.
  • Figure (5) shows this process.
  • All phoneme segments produced in the previous step are associated with a statistical score that represents the similarity between the phone segment and its representative HMM model.
  • the user's reading evaluation is estimated from the average score over all phone segments in the read sentence. If the score is above a specific threshold, the reading is considered a correct trial; if it is below the threshold, it is rejected.
  • the decision threshold is selected based on experimental results for a large number of users.
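The accept/reject decision can be illustrated as follows; the example scores and threshold are made-up values, since the patent only states that the threshold is chosen experimentally.

```python
def evaluate_reading(phone_scores, threshold):
    """Accept/reject sketch: phone_scores are the per-segment similarity scores
    produced by the adapted HMMs; the threshold is assumed to have been chosen
    from experiments over many users."""
    average = sum(phone_scores) / len(phone_scores)
    return ("correct" if average >= threshold else "rejected"), average

decision, average = evaluate_reading([-3.2, -2.8, -4.1, -3.0], threshold=-3.5)
print(decision, round(average, 2))
```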
  • a confidence scoring algorithm is implemented. This algorithm receives the n-best decoded word sequences from the decoder and then analyzes their scores.
  • the first alternative path model is used as the competing decode model to calculate the confidence score, based on the likelihood ratio shown in the following equation:

    CS = (λ_best - λ_alt) / N, with N = E - S + 1

  • N is the number of frames of the hypothesized phone
  • S is the start frame
  • E is the end frame
  • λ_best is the hypothesized (best) path score over frames S to E
  • λ_alt is the score of the first-alternative (competing) path over the same frames
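A small sketch of this per-segment confidence computation, assuming the path scores are summed log-likelihoods over the segment's frames as in the reconstructed formula above; the function and argument names are illustrative.

```python
def confidence_score(best_path_score, alt_path_score, start_frame, end_frame):
    """Frame-normalised log-likelihood ratio between the hypothesised (best)
    path and the first-alternative (competing) path over one phone segment."""
    n_frames = end_frame - start_frame + 1      # N = E - S + 1
    return (best_path_score - alt_path_score) / n_frames

print(confidence_score(best_path_score=-120.0, alt_path_score=-150.0,
                       start_frame=10, end_frame=24))
```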
  • This layer analyzes the results from the speech recognizer, together with user-selectable options, to produce useful feedback messages for the user.
  • the feedback response is designed based on the confidence score that was calculated by the speech recognizer.
  • the present system handles this issue with one of several scenarios, chosen according to the confidence score value.
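Because the individual scenarios are not enumerated here, the following sketch only illustrates the general idea of branching on the confidence value; the thresholds and message texts are assumptions.

```python
def feedback_message(reading_accepted, confidence, high=0.5, low=0.1):
    """Illustrative feedback policy: give definite feedback only when the
    recognizer's confidence is high, and hedge or ask for a repeat otherwise."""
    if confidence >= high:
        if reading_accepted:
            return "Well done, the reading is correct."
        return "There were reading errors; listen to the example and try again."
    if confidence >= low:
        return "The reading could not be judged reliably; please try once more."
    return "The recording was unclear; please speak closer to the microphone and repeat."

print(feedback_message(reading_accepted=False, confidence=0.7))
```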
  • the writing practice examples support self-study: the application displays an animation of a writing example, the student imitates the example by handwriting on the device screen, and the system then verifies his handwriting using handwriting verification technology.
  • the tool displays a transparent image of a handwritten training sample. The user writes over this transparent image.
  • the tool sets specific control points on this transparent image, as shown in figure (6). These points are not visible to the user, but they are used to track his handwriting.
  • the function of the tool is to track the student's performance and measure four factors.
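As a hedged illustration of tracking handwriting against the hidden control points, the sketch below reports, for each control point, the distance to the nearest pen sample; the four measured factors themselves are not enumerated in the text, so distance is used purely as an example.

```python
import math

def control_point_deviations(pen_trace, control_points):
    """For each hidden control point of the transparent training image, report
    the distance to the nearest sample of the user's pen trace."""
    deviations = []
    for cx, cy in control_points:
        nearest = min(math.hypot(px - cx, py - cy) for px, py in pen_trace)
        deviations.append(nearest)
    return deviations

trace = [(0.0, 0.0), (1.0, 0.9), (2.1, 2.0)]    # sampled pen positions
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # hidden control points
print([round(d, 2) for d in control_point_deviations(trace, points)])
```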
  • This invention provides a device for teaching literacy students the basic reading and writing skills and the language rules.
  • the device can also be useful for early-grade students in elementary schools.
  • Another important use of the device is teaching the Arabic language to non-native speakers.
  • the system models can be modified to match the teaching of the different spoken dialects of the Arabic language, or of any other foreign language.
  • the device is not designed to be a complete substitute for the class teacher, but an assistant that saves in-class effort by providing an educational device for extra training at any time and in any place.
  • Brief description of the drawings:
  • FIG. 1 shows the outer layout of the device. It consists of: 101 speakers to hear the reference reading and the corrective audio feedback; 102 display; 103 microphone for the user to enter his response; 104 navigation buttons; 105 lesson selection button to select the lesson to be learnt; 106 example selection button to select the verse to be practiced; and 107 an enter button to confirm a selection, start recording, or start handwriting.
  • FIG. 2 shows the internal layout of the device. It consists of: 201 the speech output controller; 202 the display controller; 203 Random Access Memory (RAM) to temporarily store results; 204 Read Only Memory (ROM) to store the system's permanent settings; 205 Digital Signal Processor (DSP) to perform the needed system operations; 206 the speech input controller; 207 the main controller to control all system operations; and 208 Electrically Erasable Programmable Read-Only Memory (EEPROM) to store user settings and profile.
  • RAM Random Access Memory
  • ROM Read Only Memory
  • DSP Digital Signal Processor
  • EEPROM Electrically Erasable Programmable Read-Only Memory
  • FIG. 3 shows the block diagram of the user enrolment phase.
  • First, the 305 cluster transform is selected out of the 304 user clusters.
  • The 308 user input is applied to the 307 verification HMM with the 306 enrollment phase lattices to generate the 309 phonetic recognition and segmentation of the 308 user input.
  • The 310 phonetic error detector decides whether the 308 user input is correct or not. Correct utterances are stored in the 315 correct files database and entered into 314 user adaptation to create the 313 user transform.
  • FIG. 4 shows the block diagram of the pronunciation variants generator block. The 401 reading text and 402 reading symbols are entered into the 403 events engine, producing 404 features for each character. The 405 transcription engine then outputs the 406 phonetic transcription of the input 401 reading text. The 410 lattice generator takes the 409 lattice generation rules and the 408 pronunciation patterns produced by the 407 pattern engine to produce a 411 searchable lattice.
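As an illustration of the searchable-lattice idea (not the patent's actual engines or rule formats), the sketch below expands each word of the reading text into alternative phonetic transcriptions; the variant table and example transcriptions are hypothetical.

```python
def build_lattice(reading_text, variant_table):
    """Expand each word of the reading text into its alternative phonetic
    transcriptions; the recognizer may then match the user's speech against
    any path through the alternatives. Words missing from the (hypothetical)
    variant table fall back to a single literal entry."""
    return [variant_table.get(word, [[word]]) for word in reading_text.split()]

variants = {"kitab": [["k", "i", "t", "a", "b"], ["k", "i", "t", "aa", "b"]]}
print(build_lattice("kitab jadid", variants))
```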
  • FIG. 5 shows the block diagram of the whole system, used to judge a user-selected example in a user-selected mode.
  • the 411 searchable lattice is generated using the 504 pronunciation variants generator (which was described in figure 4).
  • the 307 verification HMM takes the 313 user transform (generated in the user enrollment phase described in figure 3) and the 505 acoustic features that were generated by the 504 feature extraction block, to produce the 507 phonetic recognition and segmentation of the 508 user-selected example, which is passed to the 509 confidence layer.
  • the 511 phoneme duration analysis layer gets decisions on recitation rules related to phoneme duration, then the 512 user feedback generator produces the 513 corrective feedback to the user depending on his 510 configuration and preferences.
  • FIG. 6 shows a sample for handwriting training.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to a mobile literacy device. The innovation in this invention is the use of interactive educational methods. These methods are based on automatic speech recognition and automatic handwriting recognition technologies in order to evaluate the literacy student's sample practice readings and handwriting, and to give him feedback on his errors and on his performance progress, in a manner similar to a class teacher. This invention stimulates the students' self-learning abilities, which makes the device an effective tool when the class teacher is not available.
PCT/EG2011/000011 2011-05-11 2011-05-11 Mobile literacy device WO2012152290A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EG2011050739 2011-05-11
EG2011050739 2011-05-11

Publications (1)

Publication Number Publication Date
WO2012152290A1 2012-11-15

Family

ID=44628555

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EG2011/000011 WO2012152290A1 (fr) 2011-05-11 2011-05-11 Mobile literacy device

Country Status (1)

Country Link
WO (1) WO2012152290A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983178A (en) * 1997-12-10 1999-11-09 Atr Interpreting Telecommunications Research Laboratories Speaker clustering apparatus based on feature quantities of vocal-tract configuration and speech recognition apparatus therewith
US6224383B1 (en) * 1999-03-25 2001-05-01 Planetlingo, Inc. Method and system for computer assisted natural language instruction with distracters
WO2000072290A1 (fr) * 1999-05-21 2000-11-30 Revelation Computing Pty Limited Apparatus for learning to write
GB2388239A (en) * 2002-05-03 2003-11-05 Yehouda Harpaz Hand-writing practising system
WO2006122361A1 (fr) * 2005-05-17 2006-11-23 Swinburne University Of Technology Personal learning system
US20090083288A1 (en) * 2007-09-21 2009-03-26 Neurolanguage Corporation Community Based Internet Language Training Providing Flexible Content Delivery

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10311296B2 (en) 2016-02-15 2019-06-04 Samsung Electronics Co., Ltd. Method of providing handwriting style correction function and electronic device adapted thereto
US20170330479A1 (en) * 2016-05-11 2017-11-16 OgStar Reading, LLC Interactive Multisensory Learning Process and Tutorial Device
US11417234B2 (en) * 2016-05-11 2022-08-16 OgStar Reading, LLC Interactive multisensory learning process and tutorial device
US20220028390A1 (en) * 2020-07-23 2022-01-27 Pozotron Inc. Systems and methods for scripted audio production
WO2022020422A1 (fr) * 2020-07-23 2022-01-27 Pozotron Inc. Systems and methods for scripted audio production
US11875797B2 (en) 2020-07-23 2024-01-16 Pozotron Inc. Systems and methods for scripted audio production

Similar Documents

Publication Publication Date Title
Agarwal et al. A review of tools and techniques for computer aided pronunciation training (CAPT) in English
Wik et al. Embodied conversational agents in computer assisted language learning
Ehsani et al. Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm
Saz et al. Tools and technologies for computer-aided speech and language therapy
US7280964B2 (en) Method of recognizing spoken language with recognition of language color
Tsubota et al. Practical use of English pronunciation system for Japanese students in the CALL classroom
CN109697988B (zh) A speech evaluation method and device
US20070003913A1 (en) Educational verbo-visualizer interface system
US9520068B2 (en) Sentence level analysis in a reading tutor
Cucchiarini et al. Second language learners' spoken discourse: Practice and corrective feedback through automatic speech recognition
Inoue et al. A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances.
Vaquero et al. VOCALIZA: An application for computer-aided speech therapy in Spanish language
LaRocca et al. On the path to 2X learning: Exploring the possibilities of advanced speech recognition
McKay Authenticity in the language teaching curriculum
Liao et al. A prototype of an adaptive Chinese pronunciation training system
CN109697975B (zh) A speech evaluation method and device
WO2012152290A1 (fr) Mobile literacy device
Hönig Automatic assessment of prosody in second language learning
Price et al. Assessment of emerging reading skills in young native speakers and language learners
Delmonte Exploring speech technologies for language learning
Bai Pronunciation Tutor for Deaf Children based on ASR
Altalmas et al. Lips tracking identification of a correct Quranic letters pronunciation for Tajweed teaching and learning
Utami et al. Improving students’ English pronunciation competence by using shadowing technique
van Doremalen Developing automatic speech recognition-enabled language learning applications: from theory to practice
Zhang et al. Cognitive state classification in a spoken tutorial dialogue system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11732354

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11732354

Country of ref document: EP

Kind code of ref document: A1