EP3979239A1 - Method and apparatus for the automatic assessment of speech and language skills
- Publication number
- EP3979239A1 (application EP20200164.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- response
- user
- speech segment
- stimulus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G10L15/26 (Physics; acoustics; speech analysis or recognition): speech to text systems
- G06F40/20 (Physics; computing; electric digital data processing; handling natural language data): natural language analysis
- G09B19/04 (Physics; education; teaching not covered by other main groups of this subclass): speaking
- G09B19/06 (Physics; education; teaching not covered by other main groups of this subclass): foreign languages
- G09B5/02 (Physics; education; electrically-operated educational appliances): with visual presentation of the material to be studied, e.g. using film strip
Definitions
- the present invention relates to a method and system for the automatic assessment of speech and language skills.
- Speech and language skills refer to a range of skills that are assessed when determining if speech skills, in particular those of children, are progressing in line with expected targets for children of comparable ages. Speech and language skills are roughly grouped into four categories, namely: articulation, phonology, expressive language and receptive language.
- a method of automatically assessing speech of a user comprising eliciting a response from the user based on the presentation of a stimulus to a user, the stimulus comprising an image and an associated instruction and the response comprising a speech segment; processing the speech segment using a speech recognition engine and a natural language processing engine to identify a characteristic of the speech segment; and scoring the elicited response based on the identified characteristic.
- This automated method and system provides increased efficiency and accuracy in the automatic assessment of speech and language skills.
- the methods, apparatus and systems described herein serve to remove subjectivity from assessment by providing a standard and trainable system.
- An expanded set of data models as described herein is easily scalable to provide a large set of data points.
- predefined prompts or stimuli are customised to the delivery mechanisms thus improving the efficiencies of the system.
- These customised stimuli enable personalised assessments to be generated while still providing benchmarked results. By combining multiple machine learning techniques, higher accuracy and efficiency are achieved.
- Scoring the elicited response may comprise comparing the elicited response with an expected response to the stimulus.
- Processing the speech segment using a speech recognition engine may comprise extracting a target word from the speech segment; processing the target word to determine a set of phonetic components for each target word; comparing each phonetic component of the set of phonetic components with a machine learning data model for the predefined image; and characterising each component based on the comparison.
- Characterising each component may comprise characterising each component as correct, omitted or substituted.
- Characterising each component may further comprise comparing each phonetic component with machine learning data models for a set of predefined similar target words, the predefined similar target words having a number of features in common with the target.
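The comparison and characterisation steps above can be sketched as a sequence alignment between the expected and produced phone sequences. This is an illustrative simplification only: a plain edit-distance alignment stands in for the machine learning data model the claims describe.

```python
from difflib import SequenceMatcher

def characterise_phones(expected, produced):
    """Label each expected phonetic component as 'correct', 'omitted'
    or 'substituted' by aligning the produced phones against the
    expected ones (a stand-in for the model-based comparison)."""
    labels = {}
    sm = SequenceMatcher(a=expected, b=produced)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        for k, i in enumerate(range(i1, i2)):
            if op == "equal":
                labels[i] = "correct"
            elif op == "replace" and j1 + k < j2:
                labels[i] = "substituted"
            else:  # deleted, or a replace with no produced phone left over
                labels[i] = "omitted"
    return [labels.get(i, "omitted") for i in range(len(expected))]

# target "guitar" /G IH T AA R/, spoken with /G/ dropped and /T/ -> /D/
print(characterise_phones(["G", "IH", "T", "AA", "R"],
                          ["IH", "D", "AA", "R"]))
# -> ['omitted', 'correct', 'substituted', 'correct', 'correct']
```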
- Processing the speech segment using the natural language processing engine to identify the characteristic may comprise: assessing the speech segment to determine a plurality of speech-related skills, said speech-related skills extracted from the list of characteristics, including determining whether a user identified an action associated with the instruction; determining a coherence of a description of the image; determining whether the user response was grammatically correct; determining whether the response demonstrated an understanding of the instruction; determining whether the user identified one or more action words associated with the stimulus; or determining whether the user identified a correct preposition.
- the method may further comprise differentiating between speech components and non-speech components of the speech segment to identify a supplemental feature and wherein scoring the elicited response is further based on the supplemental feature.
- the supplemental features may comprise a duration of utterance, a number of pauses or a number of speech segments.
- the supplemental features may comprise an acoustic feature selected from the list including spectral features, voice quality features, prosodic features, duration of speech segment, number of speech segments or voice onset timing.
- the method may further comprise eliciting a second response from the user based on a predefined audible instruction, the second response comprising a selection of an image from a predefined set of images displayed on the screen in response to the audible instruction, and metadata associated with the selection of the image.
- Assessing speech of the user may be further based on the second response.
- the screen may comprise a touch screen.
- the method of the present embodiment may further comprise generating a report outlining the scoring of the user in response to the stimulus.
- an apparatus for automatically assessing speech of a user comprising means for eliciting a response from the user based on the presentation of a stimulus to a user, the stimulus comprising an image and an associated instruction and the response comprising a speech segment; means for processing the speech segment using a speech recognition engine and a natural language processing engine to identify a characteristic of the speech segment; and means for scoring the elicited response based on the identified characteristic.
- Means for scoring the elicited response may comprise means for comparing the elicited response with an expected response to the stimulus.
- Means for processing the speech segment using a speech recognition engine may comprise means for extracting a target word from the speech segment; means for processing the target word to determine a set of phonetic components for each target word; means for comparing each phonetic component of the set of phonetic components with a machine learning data model for the predefined image; and characterising each component based on the comparison.
- Means for characterising each component may comprise means for characterising each component as correct, omitted or substituted.
- Means for characterising each component may further comprise means for comparing each phonetic component with machine learning data models for a set of predefined similar target words, the predefined similar target words having a number of features in common with the target word.
- Means for processing the speech segment using the natural language processing engine to identify the characteristic may comprise: means for assessing the speech segment to determine a plurality of speech-related skills, said speech-related skills extracted from the list of characteristics, including means for determining whether a user identified an action associated with the instruction; means for determining a coherence of a description of the image; means for determining whether the user response was grammatically correct; means for determining whether the response demonstrated an understanding of the instruction; means for determining whether the user identified one or more action words associated with the stimulus; or means for determining whether the user identified a correct preposition.
- the apparatus may further comprise means for differentiating between speech components and non-speech components of the speech segment to identify a supplemental feature and wherein scoring the elicited response is further based on the supplemental feature.
- the apparatus may further comprise means for eliciting a second response from the user based on a predefined audible instruction, the second response comprising a selection of an image from a predefined set of images displayed on the screen in response to the audible instruction, and metadata associated with the selection of the image.
- the screen may comprise a touch screen.
- the apparatus may further comprise means for generating a report outlining the scoring of the user in response to the stimulus.
- the means for eliciting a response from the user based on the presentation of a stimulus to a user comprising an image and an associated instruction and the response comprising a speech segment; means for processing the speech segment using a speech recognition engine and a natural language processing engine to identify a characteristic of the speech segment; and means for scoring the elicited response based on the identified characteristic may comprise a processing module.
- the apparatus may further comprise a memory module.
- the apparatus may further comprise a graphical user interface.
- the apparatus may further comprise a touch screen interface.
- the apparatus may further comprise an audio interface.
- the apparatus may further comprise a microphone.
- the audio interface may comprise a speaker.
- a computer program comprising program instructions for causing a computer to carry out the above method, which may be embodied on a record medium, carrier signal or read-only memory.
- the user may be a child. It is desired to elicit a predetermined response to facilitate a comparative analysis of the response of the user. Predetermined responses allow for expected norms and comparisons with these expected norms.
- the stimulus provided may include an image and an associated instruction. By providing an image and associated instruction, articulation, phonology, expressive language and receptive language skills can be assessed.
- the associated instruction may include a question asking "Why is the dog wet?".
- the expected response would be "because he jumped into the pool".
- Another example may include providing a plurality of images of bottles of different sizes and asking a child to choose the largest bottle of water.
- the choice can be verbal or on a touch screen.
- a further example may also include an image of a child falling over.
- the expected response would include target words such as "falling" or "tripping". Expanding on this to determine a range of skills, the expected response required may be "she is falling".
- the stimulus may be provided as any combination of audio and image based instructions.
- the audio stimulus may be provided using a speaker or through headphones or other audio delivery mechanism.
- the image based instructions may be provided on a graphical user interface displayed on a screen. In one configuration, which is shown as an example only, four separate images 101, 102, 103, 104 are displayed on a screen 100.
- a written command 105 may be displayed at the bottom of the picture. This command may be designed to elicit a verbal response, a touch response, or a combination of both from the user. For example, if the screen is a touch screen, the user may be required to touch a picture and verbalise what they see, for example "a small dog".
- Language is usually divided into expressive skills (the ability to express needs and feelings, describe actions and events, structure language and use grammar correctly) and receptive skills (the ability to understand words and language spoken to them). To provide an assessment of speech skills it is necessary to consider both the expressive and receptive skills.
- the response both speech and non-speech is elicited to specifically target clinical speech and language skills.
- the response to specific stimuli is processed to determine whether a user, for example a child, has successfully mastered speech skills.
- the elicited samples are processed, 202, using automated algorithms as described herein and in further detail below.
- using automated algorithms or data models 303, for example a speech recognition engine and a natural language processing engine, characteristics of the speech segment and, where applicable, the non-speech segment are identified.
- a single characteristic may be used, or a combination of one or more characteristics, in the assessment of language skills.
- the assessment of "achieving a specific language skill" will be based on a combination of the above features. This may be the combination of skills which achieves the highest correlation with the assessment of a clinical speech therapist.
- the elicited response is then scored, 203, based on the characteristic or characteristics.
- scoring is a confidence score achieved by comparing the characteristics with one or more thresholds.
- a confidence score of 100 would indicate an exact match.
- a confidence score of 0 would indicate that no matches were found.
- a score of 100 would be labelled as correct and 0 would be labelled as omitted.
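A minimal sketch of this confidence-to-label mapping follows. The text fixes only the end points (100 maps to correct, 0 to omitted); the two intermediate threshold values below are assumptions for illustration.

```python
def label_from_confidence(score, correct_threshold=80, omitted_threshold=20):
    """Map a 0-100 confidence score to a phone label. The threshold
    values are illustrative, not taken from the method itself."""
    if score >= correct_threshold:
        return "correct"
    if score <= omitted_threshold:
        return "omitted"
    # mid-range scores suggest the closest match was a different phone
    return "substituted"

print(label_from_confidence(100))  # correct
print(label_from_confidence(0))    # omitted
```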
- Scoring is implemented using a machine learning algorithm.
- This machine learning algorithm is trained using a corpus of verified training data, 302.
- the corpus 302 is made up of responses elicited in response to the same combination of image and instruction and which have been verified as correct samples. New stimuli/prompt formats are processed in response to the training corpus.
- a first characteristic of an elicited response may be whether or not a target speech skill has been achieved. In one configuration, this can be determined by comparing a confidence score from test examples to a training model and identifying the presence of a target word in the response. It will be appreciated that a combination of scores based on a plurality of characteristics may be used to determine if any particular speech skill has been achieved. Scoring may be based on the target word only, or on the target word and an additional combination of identified skills. For example, a plurality of words (cat/bat/mat) may be scored for comparative confidence scores. It will be appreciated that statistical models may be built using the confidence scores obtained from the training data.
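The comparative scoring of confusable words (cat/bat/mat) might be sketched as below. The per-word confidence values are hypothetical inputs that would come from the trained speech model, not from any real recogniser run.

```python
def comparative_score(target, confidences):
    """Decide whether the target word scored highest among a set of
    confusable words, and report its strongest competitor."""
    best = max(confidences, key=confidences.get)
    competitors = {w: c for w, c in confidences.items() if w != target}
    return {
        "achieved": best == target,
        "closest_competitor": max(competitors, key=competitors.get),
    }

# hypothetical recogniser confidences for a response to the target "cat"
result = comparative_score("cat", {"cat": 0.81, "bat": 0.74, "mat": 0.12})
print(result)  # {'achieved': True, 'closest_competitor': 'bat'}
```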
- Mathematical models may be developed in accordance with Figure 4.
- Prompts are designed to elicit specific responses, 401a, 401b.
- a training corpus of speech samples is extracted 402a, 402b. These samples comprise speech and additional data components. Speech data is labelled at both a word level and a phonetic level. In addition the presence of phonetic processes may also be labelled, 403a.
- the training corpus for the algorithms of the present application may be a corpus of audio files of examples of single words collected over time from a number of users of the system. Each of these single words is considered a target word. Each target word is broken down into a plurality of phonetic components, e.g. guitar /G IH T AA R/. It will be appreciated that this may be in accordance with the phonetic alphabet used in speech recognition (ARPAbet). Each phonetic component is labelled, 403a.
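The word-to-phone decomposition and per-phone labelling can be sketched with a toy ARPAbet lexicon. A real system would use a full pronunciation dictionary (for example CMUdict); the entries and labels below are illustrative only.

```python
# Toy ARPAbet lexicon standing in for a full pronunciation dictionary.
LEXICON = {
    "guitar": ["G", "IH", "T", "AA", "R"],
    "water":  ["W", "AO", "T", "ER"],
}

def labelled_example(target_word, per_phone_labels):
    """Pair each phonetic component of a target word with its label,
    as in the labelling step 403a."""
    phones = LEXICON[target_word]
    if len(per_phone_labels) != len(phones):
        raise ValueError("one label is needed per phonetic component")
    return list(zip(phones, per_phone_labels))

example = labelled_example(
    "guitar", ["Omitted", "Correct", "Correct", "Correct", "Correct"])
print(example)
```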
- Example labels in a phonological assessment may include "Successful" or "Correct", "Omitted" and/or "Substituted".
- Each audio file is also labelled to determine whether or not speech skills have been achieved, 403b.
- Speech recognition results for speech data, phone level/word level are also determined 405a using natural language processing techniques.
- acoustic features are determined 405b.
- the mathematical model (speech model) is trained based on the obtained speech recognition results and the corresponding labelled speech data, 406a.
- the language feature model is trained based on the training data and corresponding labels determined from the obtained acoustic features and the natural language processing.
- the provision of a predetermined stimulus to a user means that a controlled set of results are expected.
- the speech recognition engine and the natural language processing engine are looking for particular target words and phraseology in the audio sample.
- a first image displayed to a user of the system may be of a child who has fallen and is crying.
- the associated instruction may be a question asking "Why is the child crying?"
- the user provides a response comprising a speech segment, the response being "fell".
- the speech segment or sample is processed using the speech processing engine and the natural language processing engine to determine a plurality of speech skills or characteristics of the speech segment, for example, use of sentence structure, tense, verb, etc.
- the result of the processing would score the response, identifying that there was no sentence, just a single word, and that no preposition was used, but that the tense was correct.
- the response is therefore scored, but the score would indicate that the expected skill or response is not achieved. Accordingly, the score would be compared to an acquisition table to determine whether the demonstrated characteristics are age appropriate; for example, if a user is aged three and has not achieved the full expected response, the score may nonetheless be appropriate for that age.
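The acquisition-table lookup might look like the sketch below. The table entries are assumptions for illustration (the /sh/ row follows the 4.5-year example given in the text); real norms would come from clinical reference data.

```python
# Illustrative acquisition table: age (in years) by which each skill or
# sound is expected to be mastered. Entries are assumed examples.
ACQUISITION_TABLE = {
    "sh": 4.5,
    "full_sentence_response": 4.0,
}

def age_appropriate(skill, achieved, age):
    """A missed skill is still age appropriate if the child is younger
    than the expected acquisition age for that skill."""
    if achieved:
        return True
    return age < ACQUISITION_TABLE[skill]

# a 3.5-year-old who cannot yet say /sh/ is within expected norms
print(age_appropriate("sh", achieved=False, age=3.5))  # True
```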
- this process may be repeated a plurality of times to determine an average result. Multiple scores may be required to determine an accurate assessment.
- target words may be compared to similar words where there is a likelihood of confusion. This allows the score to be double checked.
- a confusion matrix may be provided for comparison of similar words which may be confused with the target sample.
- a speech segment or sample is obtained 501.
- the speech segment is then processed.
- Phonological assessment is based on an assessment of acoustic features 502 and also speech recognition 503.
- the analysis of the combination of acoustic and speech features provides characteristics of language skills, both receptive and expressive. Recognising the speech comprises recognising words or phrases included in the speech sample.
- Acoustic features include, but are not limited to, spectral features (Mel-frequency cepstral coefficients), prosodic features (pitch, energy), voice quality features (jitter and shimmer) and temporal acoustic features (voice onset timing, duration of phonation, etc.).
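Of the voice quality features listed, jitter and shimmer are commonly computed as relative average perturbations over consecutive pitch cycles. The sketch below assumes per-cycle periods and peak amplitudes have already been produced by an upstream pitch tracker; it is not taken from the method itself.

```python
def jitter_shimmer(periods_s, amplitudes):
    """Local jitter (period perturbation) and shimmer (amplitude
    perturbation), each computed as the mean absolute difference
    between consecutive cycles divided by the overall mean."""
    def relative_average_difference(xs):
        diffs = [abs(b - a) for a, b in zip(xs, xs[1:])]
        return (sum(diffs) / len(diffs)) / (sum(xs) / len(xs))
    return (relative_average_difference(periods_s),
            relative_average_difference(amplitudes))

# a perfectly steady voice has zero jitter and zero shimmer
print(jitter_shimmer([0.005] * 10, [1.0] * 10))  # (0.0, 0.0)
```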
- the combination of speech recognition and acoustic feature recognition allows articulation errors, for example those arising through limitations in motor or cognitive processes, to be taken into account. It will be appreciated that the recognition of articulation disorders is a further advantage of the present invention. Articulation errors include, for example, a lisp. The data models can be trained using such articulation errors to ensure their correct identification. It will be appreciated that speech recognition alone can be used; however, the results of the phonological assessment can be improved through the combination of speech recognition and acoustic features as shown in Figure 5.
- the output of the language skills component is scored, 504, as outlined above.
- Speech recognition may be supplemented with a measure of fluency and/or utterance duration to improve the accuracy of scoring.
- These two measures may be calculated using signal processing techniques, such as frequency analysis, to differentiate between speech and non-speech, including but not limited to: duration of utterance, number of pauses and number of speech segments.
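A short-time energy gate is one simple way to make the speech/non-speech split described above. The frame length and energy threshold below are illustrative assumptions, and a real system might use frequency-domain analysis instead.

```python
import math

def utterance_measures(signal, rate, frame_ms=25, threshold=0.02):
    """Differentiate speech from non-speech frames by short-time
    energy, then derive duration of utterance, number of pauses and
    number of speech segments."""
    frame = int(rate * frame_ms / 1000)
    voiced = []
    for i in range(len(signal) // frame):
        chunk = signal[i * frame:(i + 1) * frame]
        voiced.append(sum(x * x for x in chunk) / frame > threshold)
    segments = pauses = 0
    in_speech_before = prev = False
    for v in voiced:
        if v and not prev:           # a new speech segment starts here
            segments += 1
            if in_speech_before:     # the gap before it was a pause
                pauses += 1
            in_speech_before = True
        prev = v
    return {
        "duration_s": sum(voiced) * frame / rate,
        "speech_segments": segments,
        "pauses": pauses,
    }

# synthetic check: two 0.5 s tone bursts separated by 0.3 s of silence
rate = 8000
burst = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
audio = burst + [0.0] * (3 * rate // 10) + burst
print(utterance_measures(audio, rate))
```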
- Speech recognition alone, or the combination of words and phones, can equally be used to determine 506 a score which again can be compared to expected norms.
- An example of such a norm: if a child cannot say /sh/ correctly but is 3.5 years old, this is within expectations, as they are not expected to fully achieve this sound until 4.5 years.
- the speech recognition scores can also be supplemented with confusion scores based on comparisons with predefined similar words.
- Some sounds require analysis of additional acoustic features to achieve acceptable accuracies. Articulation errors occur from a failure to achieve the correct sound, through limitations in motor or cognitive processes. Sounds that are affected by articulation disorders will not be recognized by a traditional ASR as they will most likely not occur in the training data or the pronunciation dictionary. It will be appreciated that using the mechanisms proposed herein additional models may be trained to identify these articulation disorders.
- Any electronic device with a graphical user interface can be used to display an image.
- This device may have associated audio output.
- an audio output may be separately provided from a speaker, a mobile device or other audio output device.
- the electronic device may comprise a microphone or other audio collection mechanism.
- the microphone may also be provided through a separate device, distinct from the computing device, for example an audio recorder, or a mobile communication device.
- the graphical user interface may comprise a touch screen.
- the assessment of language skills may also incorporate touchscreen responses to stimuli designed to assess receptive skills.
- Additional notepad functionality may also be provided. This notepad functionality may be provided on the touch screen. Alternatively the notepad functionality may be digital paper or an additional touch screen or the like.
- the number of features may vary depending on the skill being assessed; data analysis will determine the optimum combination of features to achieve the highest accuracy when scoring whether a child has achieved a speech skill.
- the embodiments in the invention described with reference to the drawings comprise a computer apparatus and/or processes performed in a computer apparatus.
- the invention also extends to computer programs, particularly computer programs stored on or in a carrier adapted to bring the invention into practice.
- the program may be in the form of source code, object code, or a code intermediate source and object code, such as in partially compiled form or in any other form suitable for use in the implementation of the method according to the invention.
- the carrier may comprise a storage medium such as ROM, e.g. CD ROM, or magnetic recording medium, e.g. a floppy disk or hard disk.
- the carrier may be an electrical or optical signal which may be transmitted via an electrical or an optical cable or by radio or other means.
- the above-described embodiments of the present technology can be implemented in any of numerous ways.
- the embodiments may be implemented using hardware, software or a combination thereof.
- the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
- any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions.
- the one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
- one implementation of the embodiments of the present technology comprises at least one computer-readable storage medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, a flash drive, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs the above-discussed functions of the embodiments of the present technology.
- the computer-readable storage medium can be transportable such that the program stored thereon can be loaded onto any computer resource to implement the aspects of the present technology discussed herein.
- the reference to a computer program which, when executed, performs the above-discussed functions is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the technology.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20200164.0A EP3979239A1 (fr) | 2020-10-05 | 2020-10-05 | Method and apparatus for the automatic assessment of speech and language skills
Publications (1)
Publication Number | Publication Date |
---|---|
EP3979239A1 (fr) | 2022-04-06 |
Family
ID=72752392
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20200164.0A Withdrawn EP3979239A1 (fr) | Method and apparatus for the automatic assessment of speech and language skills | 2020-10-05 | 2020-10-05 |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP3979239A1 (fr) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998014934A1 (fr) * | 1996-10-02 | 1998-04-09 | Sri International | Method and system for automatic text-independent assessment of pronunciation for language learning |
US20110213610A1 (en) * | 2010-03-01 | 2011-09-01 | Lei Chen | Processor Implemented Systems and Methods for Measuring Syntactic Complexity on Spontaneous Non-Native Speech Data by Using Structural Event Detection |
- 2020-10-05: EP application EP20200164.0A filed, published as EP3979239A1 (not active: withdrawn)
Non-Patent Citations (1)
Title |
---|
ELIMAT AMAL KHALIL ET AL: "Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation", THE JALT CALL JOURNAL, vol. 10, no. 1, 1 April 2014, pages 21-47, XP055782301, ISSN: 1832-4215, DOI: 10.29140/jaltcall.v10n1.166 * |
Legal Events
- PUAI: Public reference made under Article 153(3) EPC to a published international application that has entered the European phase (original code: 0009012)
- STAA: Status: the application has been published
- AK: Designated contracting states (kind code of ref document: A1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
- STAA: Status: the application is deemed to be withdrawn
- 18D: Application deemed to be withdrawn, effective date 2022-10-07