NL2008809C2

NL2008809C2 - Automated system for training oral language proficiency.

Info

Publication number: NL2008809C2
Application number: NL2008809A
Authority: NL
Inventors: Wilhelmus Albertus Johannes Strik; Catia Cucchiarini
Original assignee: Stichting Katholieke Universiteit
Priority date: 2012-05-14
Filing date: 2012-05-14
Publication date: 2013-11-18
Also published as: WO2013172707A4; WO2013172707A2; WO2013172707A3

Description

Automated system for training oral language proficiency

FIELD OF THE INVENTION

The present invention is in the field of automated systems and methods for training of oral language proficiency.

BACKGROUND OF THE INVENTION

As a result of increasing globalization there is a growing demand from the education and business communities for people who speak foreign languages well. Intelligible pronunciation/speech in a second language (L2) is regarded as important for e.g. successful interaction and social acceptance. However, an important problem is that oral proficiency training requires so much time, feedback and practice that very often it cannot be sufficiently provided in traditional language classes. For instance, Dutch students have problems with different aspects of English, especially the sound system, such as different words sounding similar. Besides pronunciation they often also have problems with grammar, vocabulary and sentence structure.

Typically a state of the art high-end computer program does not specifically address oral proficiency skills in a second language and cannot be used to support and improve language learning anytime and anywhere. Advanced, dedicated technology is not available to make this possible. This leads to a lack of appropriate feedback and remedial exercises for a learner.

There are some rudimentary programs that do use computer assisted learning applications with automatic speech recognition (ASR), but these only provide right/wrong feedback, and they generally do not provide control or checks.

This kind of technology is not advanced yet. Feedback on pronunciation of a user may be provided through waveforms.

Various documents recite systems for improvement/training of oral proficiency skills and the like.

Some recite a sort of speech recognition system. Such does not relate to improvement/training of oral language proficiency. Further, such systems typically do not take a level of proficiency and/or accents or the like into account.


Some recite "schemes" for processing oral input, some recite hearing systems, some recite generic learning, and some recite formalistic approaches. Typically these are very general, and do not provide adequate details. Typical prior art documents do not provide reliable and reproducible results, for instance because underlying systems are undeveloped or poorly developed.

Typically prior art systems comprise only one or a few of the technologies (modules) necessary to perform adequately, e.g. in terms of teaching ability. Typically prior art systems relate to one aspect of language learning only. Typically there is no or limited correction of errors in oral proficiency skills. If a comparison can be made, typically a correction would relate to identifying if pronunciation is "wrong" or "correct", i.e. there is no underlying system for identifying further details, which details may be improved or which may be sufficient. Some prior art systems focus on only one item, such as improving pronunciation of vowels. Therefore a prior art system is typically also not capable of handling an accent of a user. For instance, for a given language pair, such as the Dutch-English language pair, there is no technology that can automatically handle non-native English with different (degrees of) Dutch accents.

There is therefore a need for the technology to be optimized for a specific language pair, i.e. Dutch-English.

Not only should a system on one hand be able to recognize all the (English) input, although spoken with many different (Dutch) accents, but on the other hand a system should also be able to detect pronunciation errors.

Preferably all modules to detect errors need to be developed in order to provide adequate performance, and further these should be combined in an optimal way (design) to obtain a system that is suitable for practicing (English) oral proficiency skills. This requires not only the technology (the separate modules), which in itself is very challenging, but also a mix of expertise including knowledge about language acquisition, language teaching, software design, etc. The prior art systems do not meet all these requirements.

Also, typical prior art systems cannot be operated in real time, thereby making interactive learning virtually impossible.

Further, users consider that a program must take the level of the user into account and automatically offer new exercises.

The present invention therefore relates to a system and a method for automatic improvement of oral proficiency skills, which overcomes one or more of the above disadvantages, without jeopardizing functionality and advantages.

SUMMARY OF THE INVENTION

The present invention relates in a first aspect to an automated system for improvement of oral language proficiency according to claim 1.

In the present application the phrase "oral language proficiency" relates amongst others to communication per se, such as posing a (simple) question, obtaining an answer and interpreting the answer, morphology of words, syntax of a (simple) sentence, pronunciation, e.g. using correct phonemes, and skills attributed thereto.

In general it is noted that Automatic Speech Recognition (ASR) is already quite challenging for native speech, but it is even more challenging for non-native speech, since non-native speech deviates substantially from native speech: the sounds, lexicon, and grammar differ. These three aspects are directly related to three main components or knowledge sources of the present ASR system: acoustic models, lexicon, and language model. These components may use a probabilistic algorithm in order to identify one or more probable occurrences. The components are adapted and optimized in view of input, e.g. dialect, probability, context, etc. Further a sort of decoder is provided for interaction between the components, communication with an outside world, etc.

In developing this technology, the present inventors took into account what is possible and what is not yet possible with state-of-the-art ASR. Since automatic recognition of all unconstrained, spontaneous non-native speech is not yet possible, exercises in the present system have been constrained in such a way that they elicit speech that can be handled automatically with ASR, but are still suitable for language learning.

In an academic context, there are a limited number of research groups world-wide that carry out research in this field. The present inventors are leading experts in this line of research and have been involved in research and related activities for many years. First of all, none of the other groups develop speech technology for language learning specifically for Dutch, neither for Dutch as a first language (L1) nor as a second language (L2). Furthermore, the technology developed by these academic sites is generally intended for research alone, not for commercial or practical purposes. Finally, the present technology differs in many ways from others.

The present inventors have carried out a considerable body of research into applying speech technology, e.g. speech recognition technology, to language learning and testing, specifically to learning Dutch as a second language (DL2), i.e. by foreigners that are in the Netherlands and want to learn Dutch. In this case L2 is Dutch, and L1 can be many different languages. It is noted that since people with different L1s differ in the way they speak an L2 (here Dutch), research carried out on Dutch was quite challenging, especially compared to a fixed language pair such as L1 = Dutch and L2 = English. The present technology and products extend to other (combinations of) languages, e.g. French, German, or Spanish for Dutch students or Dutch (as L2) for students from other countries.

It has been found that to a large extent the present technology can be ported from one language to the other. Thereto studies were performed.

Desktop research was carried out to see whether information on porting speech technology between languages was available. Porting speech technology is taken to mean taking speech technology, e.g. speech recognition technology that was developed for recognizing speech in a first language (L1), and then applying it for recognizing speech in a second language. No relevant information was found on porting speech technology for use by non-native speakers. In addition, two pilot experiments were carried out on the feasibility of porting specific speech technology modules from Dutch to English (both as L2, as target language).

A first experiment concerned porting the present technology developed for detecting errors in the pronunciation of sounds from Dutch to English. First of all, speech recordings of Dutch students speaking English were collected and annotated in various ways: what was said (the words), how it was said (phonetic transcriptions providing information on how these words were pronounced), and also which sounds were pronounced correctly and which ones incorrectly. In addition the inventors implemented a so-called computer phonetic alphabet for English, an English lexicon, and acoustic models for non-native English. All these resources and information were used to develop and optimize speech technology for detecting errors in sounds in non-native English speech. An important aspect of this work was the identification of classifiers, i.e. parameters that define particular errors made by non-native speakers when speaking a foreign language, in this example Dutch people speaking English. This required a lot of manual work. In addition, experts on English (learning) from Radboud in'to Languages and the English Language Department thereof were often consulted.

A second experiment concerned porting the technology developed for detecting errors in prosody, e.g. intonation and word stress, from Dutch to English. Detecting word stress errors proved to be more complex than detecting errors in the pronunciation of sounds. Nevertheless, the experiences were similar to those in pilot experiment 1. Also in this case recordings and annotations were needed. Speech recordings can be the same as those used for pronunciation error detection, as the speech material was carefully designed in such a way that it was suitable for both purposes. However, additional annotations were necessary that indicate syllabification, word stress, and whether the words were pronounced with correct word stress or not. It has been found experimentally that this also required a lot of manual work. Besides the data (mainly recordings and annotations) mentioned above and the expertise of several persons, this also requires software for making recordings, annotations and analyses, and for training classifiers for word stress error detection. The adjustments made in the software (in going from e.g. Dutch to English) were limited, and the expertise needed for carrying out the work described above was available with the present inventors. However, for every new language (pair) new data was collected and annotated. In addition, the material is then used to train, test, and optimize classifiers.
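The word stress annotations described above (syllabification, the expected stress position, and a correct/incorrect judgement) can be pictured with a small sketch. Everything in it is an assumption made for illustration: the record layout, the `duration_s` and `energy` features, and the naive prominence heuristic are hypothetical and stand in for the trained classifiers, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str          # orthographic form of the syllable
    duration_s: float  # measured duration in seconds (hypothetical feature)
    energy: float      # mean energy in arbitrary units (hypothetical feature)


@dataclass
class AnnotatedWord:
    syllables: List[Syllable]  # syllabification of the word
    lexical_stress: int        # index of the syllable that should carry stress


def realized_stress(word: AnnotatedWord) -> int:
    """Return the index of the most prominent syllable.

    Prominence is a naive product of duration and energy; a trained
    classifier would replace this heuristic.
    """
    scores = [s.duration_s * s.energy for s in word.syllables]
    return scores.index(max(scores))


def has_stress_error(word: AnnotatedWord) -> bool:
    """True when the most prominent syllable is not the expected one."""
    return realized_stress(word) != word.lexical_stress


# The noun "record" expects stress on the first syllable (index 0).
record = AnnotatedWord(
    syllables=[Syllable("re", 0.12, 0.4), Syllable("cord", 0.30, 0.9)],
    lexical_stress=0,
)
print(has_stress_error(record))  # True: the second syllable is more prominent
```

In such a set-up the manual annotations would serve as the reference against which the automatic decision is trained and tested.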

To summarize, it has been shown to be feasible to develop these classifiers, and the tools (software) and expertise acquired for one language have been shown to be very useful in developing the technology for other languages. This has also been shown to speed up developing technology for new languages (pairs), i.e. the amount of time needed to develop technology for other languages gradually diminishes. The main costs then relate to collecting and annotating the speech recordings. The present invention provides very specific, detailed and accurate feedback at sound level.

The present system is provided with a means for receiving input. The input is typically provided by the user, such as in the form of spoken language. Spoken language is typically provided within an exercise, such as by reading out loud a word, a sentence and the like. Therefore the means typically relates to one or more microphones, directed to receive input. The one or more microphones may be part of a further apparatus, such as a computer, a mobile phone, etc.

The present system is further provided with a processor for processing input and providing output, such as a CPU of a computer, a mobile phone, and the like. The processor may further comprise software, for performing one or more of detecting errors, determining input, providing output, reducing noise, improving signal to noise ratio, etc.

The present system may further comprise a first means for determining input, such as first phase speech recognition software, which software typically determines input in a tolerant mode, e.g. globally checking given (or actual) input versus required (target) input. The system may further comprise a second means for determining input, such as second phase speech recognition software comprising a pronunciation quality evaluation unit for processing input to determine potential difference between target pronunciation and actual pronunciation, which unit functions in a detailed and strict manner.

The manner may depend on the level of the user.
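The division of labour between the first (tolerant) and second (strict) means can be sketched as follows. This is a minimal illustration under stated assumptions, not the recognizer itself: it presumes that a recognizer has already produced word and phone sequences, and the function names, the `min_ratio` value, and the phone symbols are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def phase_one_accept(target_words: List[str], recognized_words: List[str],
                     min_ratio: float = 0.6) -> bool:
    """Tolerant check: is the utterance roughly the intended one?"""
    ratio = SequenceMatcher(None, target_words, recognized_words).ratio()
    return ratio >= min_ratio


def phase_two_errors(target_phones: List[str],
                     actual_phones: List[str]) -> List[Tuple[str, str]]:
    """Strict check: list every (expected, produced) stretch that differs."""
    matcher = SequenceMatcher(None, target_phones, actual_phones, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((" ".join(target_phones[i1:i2]),
                           " ".join(actual_phones[j1:j2])))
    return errors


target = ["the", "weather", "is", "nice"]
# One word is missing, but the tolerant phase still accepts the attempt.
print(phase_one_accept(target, ["the", "weather", "nice"]))  # True

# "the weather" with both "th" sounds replaced by "d" (hypothetical symbols).
print(phase_two_errors(["dh", "ax", "w", "eh", "dh", "er"],
                       ["d", "ax", "w", "eh", "d", "er"]))
# [('dh', 'd'), ('dh', 'd')]
```

Only input that passes the tolerant phase would be handed to the strict phase, so that detailed feedback is given on the utterance the learner actually intended.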


The system may further comprise various error detectors. These detectors relate to one or more of sounds and phonemes, lexicon, grammar, and prosody. Examples are a pronunciation error detector, a prosody error detector, e.g. a word stress error detector and an intonation error detector, a respiration error detector, a formant error detector, and a grammar error detector, e.g. a morphology error detector and a syntax error detector, an interaction error detector, and a lexicon error detector. Typically these detectors are optimized, e.g. in view of first and second language, such as Dutch. Also these detectors may be provided in a training environment, such as the present My Pronunciation Coach® (MPC).

It is noted that various elements of the present system may be located within one location, even within one apparatus, such as a computer, wherein e.g. software is loaded on memory, or located at different locations, such as on the internet, on a mobile phone, on a computer, at a learning center, and combinations thereof. Within e.g. a combination a first element may function as a client to a further element, an element may function as a server, etc.

A formant is considered as a concentration of acoustic energy around a particular frequency in the speech wave. Formants represent vowel sounds well. It has been established that formant frequencies change over the duration of a syllable. It is noted that formants depend on the person speaking, e.g. a man typically has a different set of formants than a woman. Once the present system identifies the formants, it can correct for specific errors therein, by e.g. providing feedback to that end.
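A speaker-dependent formant check of this kind might look like the sketch below. It assumes that the first two formant frequencies (F1, F2) of a vowel have already been measured; the reference values, the speaker groups, and the 15% tolerance are illustrative placeholders and are not taken from the present description.

```python
from typing import Dict, Tuple

# (F1, F2) in Hz for one vowel, per speaker group; illustrative placeholders only.
REFERENCE_FORMANTS: Dict[str, Tuple[float, float]] = {
    "male": (270.0, 2290.0),
    "female": (310.0, 2790.0),
}


def formant_deviation(measured: Tuple[float, float], group: str) -> Tuple[float, ...]:
    """Relative deviation of measured F1 and F2 from the group reference."""
    reference = REFERENCE_FORMANTS[group]
    return tuple((m - r) / r for m, r in zip(measured, reference))


def formant_feedback(measured: Tuple[float, float], group: str,
                     tolerance: float = 0.15) -> str:
    """Return "ok" or a short hint naming the formants outside the tolerance."""
    d1, d2 = formant_deviation(measured, group)
    hints = []
    if abs(d1) > tolerance:
        hints.append("F1 too high" if d1 > 0 else "F1 too low")
    if abs(d2) > tolerance:
        hints.append("F2 too high" if d2 > 0 else "F2 too low")
    return ", ".join(hints) if hints else "ok"


print(formant_feedback((280.0, 2250.0), "male"))  # ok
print(formant_feedback((400.0, 1800.0), "male"))  # F1 too high, F2 too low
```

Keeping a separate reference per speaker group reflects the remark that formants differ between speakers; a practical system would need a finer-grained normalization than two groups.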

The present invention relates to a system comprising a sophisticated multi-component computer program, including various technologies needed, that students can use to practice a second language, such as English, and specifically pronunciation thereof. It has been developed over a long period of time, based on scientific insights and technology. The present invention also relates to a product comprising said system. Such a system may be referred to as a computer assisted language learning (CALL) or computer assisted oral proficiency training (CAPT) system. The present system may include the following functionalities, for which the required technology or modules (M1-M4) are then incorporated:

M1. non-native automatic speech recognition (ASR): e.g. recognize first to second language, e.g. Dutch-English;
M2. pronunciation error detection (PED): detect errors in the sounds (phonemes) produced;
M3. word stress error detection (WED); and
M4. intonation error detection (IED).

It is noted that within the present system all of the above, and optional further functionality, are carried out automatically, for e.g. non-native English with different (degrees of) Dutch accents, or non-native Dutch speakers. There is no need for a teacher or tutor.

The present invention fulfills a need for Computer Assisted Language Learning (CALL) applications that make use of Automatic Speech Recognition (ASR). CALL provides a private, stress-free environment in which students can access virtually unlimited input, practice at their own pace and, through the integration of ASR, receive individualized, instantaneous feedback anytime and anywhere. In an example the system is intended for Dutch students that want to learn English (fixed language pair).

The present invention makes use of unique, advanced ASR technology for accurate pronunciation error detection, developed by experts operating at the forefront of international research. This allows the system to offer new functionalities such as detailed and accurate phone-specific corrective feedback and related remedial exercises, which are not yet offered by other products, and certainly not with the degree of precision that is required for effective oral proficiency training and that the present technology can achieve. The present invention provides enabling technology modules that can be integrated into existing educational applications and courses. The present speech recognition and error analysis technology may be accessible through an application programming interface which connects via web services. The present invention provides an application that customers can use to develop courses. Customers can easily create courses with the authoring tools supplied in the framework. The framework application is built upon the technology module and available as software and as a service.

The present invention relates amongst others to a complete course based upon content from Radboud in'to Languages. This ready-made course can be used by organizations to improve the learners' pronunciation skills. The course is modular and at present suited for levels from A2 to B2 (according to the Common European Framework of Reference for Languages, CEFR). Further, the course, being interactive, can be adapted within its present framework to a need of a client, e.g. in terms of level.

The present invention provides products and services that generate leads, strengthen client relations (customer satisfaction) and improve the center of expertise. It also relates to advice on policies and didactics: information and advice on necessity, added value and didactic applicability of ASR-based CALL. It further relates to implementation guidance/project management: well-planned and structured guidance to ensure organization-wide use of the products in line with strategic and didactic objectives of a client. It also relates to training: to stimulate acceptance and use, and to transfer knowledge and experience. As noted, the present invention may be integrated into a client's ICT infrastructure.

The present invention provides a unique product-market combination. The market can be divided into various segments, e.g. the segments 1-4 being further detailed below.

1. Conventional education

Language teaching in conventional education institutions is typically based on core objectives, end terms or qualification profiles, which are (legally) embedded in a curriculum. Schoolbooks and digital courses from publishers are commonly used. For this market segment, a ready-made course is an interesting product; preferably courses based upon the methods from publishers the school uses. Additionally, the present framework will allow institutions to develop their own pronunciation courses.

2. Commercial language centers

These centers will be able to use the present technology by integrating it in their own educational applications and courses. Additionally, the present framework will allow them to develop their own courses.

3. Publishers

The present framework allows them to develop pronunciation courses that link up to their methods. Additionally, they have the option of offering content that users of the inventors' My Pronunciation Coach (MPC) framework can assemble into pronunciation courses. An example of this would be a publisher supplying lists of words that a teacher can assemble into a course within the framework application.

4. Integration partners

Modular technology will be most interesting for this market segment, e.g. because of the possibilities for integration within suppliers' applications and courses. Additionally, the functionality of their existing applications can be extended by linking them to the MPC framework. For instance, a supplier of test software integrates the MPC technology to introduce new question types.

The present invention also relates to a so-called Software Development Kit: specifications and documentation of APIs and web services enabling third parties to develop their own tools and extensions, which can be plugged into the present framework. Such further relates to certifying, promoting and distributing add-ons within a user community. In addition, educational language games are developed.

Unique product features are e.g. personalized, accurate feedback on individual words and sounds, adaptive learning combined with remedial exercises, and individualized progress reports. The present system provides a "coach" for improving English proficiency, i.e. "an automatic coach that listens."

Within the present system an ASR software package SPRAAK has been used. It is freely available for non-profit research, and can also be used for commercial applications.

The present system allows, however, for a switch to another speech recognizer system. Such another speech recognizer system can be implemented in a straightforward manner into the present system.

Typically pronunciation errors are detected at the level of individual words and sounds with a high degree of accuracy, thereby e.g. providing appropriate feedback and exercises.

The present system can be optimized for any language pair, following a procedure developed thereto. For instance for a specific language pair, Dutch-English, such a procedure is implemented, using expertise and data obtained for this language pair. Using expertise, data, and technology for other languages (esp. foreign-Dutch) the procedure is repeated.

Typically the present method and system use a two-step approach, in which e.g. Dutch-English is recognized in a tolerant manner, typically relating to verification of (intended) expression, that is without overstressing (minor) mistakes and reflecting thereon, and in a further step pronunciation errors are detected, which is done in a strict way, wherein strictness may vary depending on level of expertise of a user. The level of expertise may be characterized according to the Common European Framework of Reference for Languages (CEFR: from A1, low level, to C2, advanced level).
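How strictness might vary with the CEFR level of a user can be illustrated with a short sketch. The linear mapping, the bounds `lowest` and `highest`, and the per-phone scores are assumptions made for this example only; the description does not state how strictness is actually derived from the level.

```python
from typing import Dict, List

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def strictness_threshold(level: str, lowest: float = 0.5,
                         highest: float = 0.9) -> float:
    """Map a CEFR level to a minimum acceptable pronunciation score.

    Linear interpolation between `lowest` (A1) and `highest` (C2);
    both bounds are hypothetical.
    """
    index = CEFR_LEVELS.index(level)  # raises ValueError for unknown levels
    step = (highest - lowest) / (len(CEFR_LEVELS) - 1)
    return lowest + index * step


def flag_errors(phone_scores: Dict[str, float], level: str) -> List[str]:
    """Return the phones whose score falls below the level's threshold."""
    threshold = strictness_threshold(level)
    return [phone for phone, score in phone_scores.items() if score < threshold]


# Hypothetical per-phone quality scores between 0 (poor) and 1 (native-like).
scores = {"th": 0.55, "ae": 0.80, "r": 0.95}
print(flag_errors(scores, "A1"))  # []
print(flag_errors(scores, "C2"))  # ['th', 'ae']
```

The same utterance thus yields no corrective feedback for a beginner and two flagged sounds for an advanced learner.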

Experimentally modules were developed and tested in isolation, and later combined into a complete, suitable CALL system using a mix of expertise (as mentioned above).

Above various problems were mentioned. Regarding three problems thereof the following is noted:

There is no CALL system for e.g. the Dutch-English language pair. In an example the present system is optimized for this specific language pair (Dutch-English).

It is not known to use a two-step procedure as indicated above in a CALL system, nor for other language pairs.

Some of the modules M1-M4 (see above) seem to have been developed, albeit typically for different language pairs, but not for Dutch-English. However these have not been implemented into one combined system. Further they do not provide the present functionality. Also the prior art systems do not make use of a mix of expertise, such as mentioned above, and further expertise provided by tutors and language professionals, e.g. relating to learning algorithms.

The present invention has been applied to amongst others native Dutch (especially word stress and intonation), and non-native Dutch users, i.e. foreigners (with many different nationalities) learning Dutch. The latter are referred to as foreign-Dutch. For foreign-Dutch, a complete system has been developed and tested, and it has been established that it is suitable and effective for language learners in general. It has been found that the technology and expertise acquired for the foreign-Dutch case are in principle transferable to any other language pair, such as localized Dutch-English.

The present system targets amongst others correction of prosody (appropriate emphasis and inflection), deficits in rate of articulation, intensity, formant and phonation (control of vocal folds for appropriate voice quality and valving of airway). These treatments may involve exercises to increase strength and control over articulator muscles, and using alternate speaking techniques to increase speaker intelligibility.

The present system may be accessible on internet, on a hard disk of a computer, on a DVD, a CD-ROM, etc.

The system may be used in Computer Assisted Language Learning (CALL), in Computer Assisted Learning (CAL), in Computer Assisted (aided) Instruction (CAI), in Computer Assisted Pronunciation Training (CAPT), in improving any language proficiency, etc. A lack of proficiency may be caused by a human body deficiency, such as caused by an accident, being present from birth, etc.

Thereby the present invention provides a solution to one or more of the above mentioned problems, by providing an extended system, comprising various functionalities, wherein the functionalities are further optimized with respect to each other, thereby further improving functionality and user friendliness.

Advantages of the present description are detailed throughout the description.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates in a first aspect to a system for automatic improvement of oral language proficiency according to claim 1.

In an example the present invention relates to a system wherein input and/or output are in a second language and the user is native in a first language, wherein the first and second language are selected from Indo-European languages, such as Spanish, English, Hindi, Portuguese, Bengali, Russian, German, Marathi, French, Italian, Punjabi, Urdu, Dutch, German, French, Spanish, Italian, Sino-Tibetan languages, such as Chinese, Austro-Asiatic languages, Austronesian languages, Altaic languages, such as wherein the first and second language are Dutch and English, Dutch and German, Dutch and Spanish, Dutch and Chinese, German and English, French and English, Chinese and English, preferably wherein the second language is a foreign language such as English, and vice versa, wherein the first and second language are optionally the same, such as Dutch and Dutch.

It is noted that the present system allows for the first and second language to be the same, e.g. Dutch and Dutch, or to be different.

In an example the first language may be selected from Dutch, German, French, Spanish, Italian, Polish, Chinese, Japanese, Korean, Afrikaans, and English.

In an example the second language may be selected from Dutch, German, French, Spanish, Italian, Polish, Chinese, Japanese, Korean, Afrikaans, and English.

Also varieties of the above languages may be selected, such as British English, American English, Australian English, Canadian English, New Zealand English, Indian English, etc.

Further, the present system is also adapted to process dialects, such as Dutch dialects and varieties, such as wherein the pronunciation quality evaluation unit is adapted for one or more varieties and/or dialects, such as British English, Limburgs, Brabants, Gronings, and Drents. Clearly such can only be achieved after gathering data, analyzing data, ordering data, etc. as described throughout the description.

Likewise the system may also be used to learn a language, not being a mother language, e.g. a foreign language, when staying abroad, such as when immigrating, for study, for work, etc. For example, the present system may be used to learn Dutch as a second language for e.g. people from Turkey, Morocco, Suriname, the Dutch Antilles, Germany, Great Britain, Poland, etc. As such the present invention is widely applicable.

In an example the present invention relates to a system wherein the pronunciation quality evaluation unit comprises software, wherein the software is preferably stored on a computer.

In an example the present invention relates to a system further comprising one or more of a language model, a lexicon, a phoneme model, one or more thresholds, one or more probability criteria, one or more random number generators, a level adjustment set-up, and a decoder, wherein the decoder may comprise the previous elements.
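How a decoder may combine a phoneme (acoustic) model, a lexicon and a language model is commonly written in the standard textbook formulation of statistical speech recognition. It is given here as an assumption for orientation; the present description does not itself state a formula.

```latex
\hat{W} = \arg\max_{W} \; P(X \mid W)\, P(W)
```

Here $X$ denotes the observed acoustic features of the input, $W$ a candidate word sequence, $P(X \mid W)$ the likelihood supplied by the phoneme models via the pronunciations in the lexicon, and $P(W)$ the prior probability supplied by the language model. Thresholds and probability criteria would then act on the scores of the selected hypothesis $\hat{W}$.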

In an example the present invention relates to a system further comprising one or more of a reference set of parameters, a fine-tuning mechanism, a self-learning algorithm, a self-improvement algorithm, and a selection means for selecting criteria. The parameters may for instance relate to one or more classifiers, as well as to (implementing) algorithms, e.g. for determining a probability.

In an example the present invention relates to a system further comprising a database, wherein data is stored for one or more of pronunciation, word stress, intonation, and phoneme segmentation. It is noted that the present database comprises an extensive amount of data, gathered throughout the years.

In an example a reference set of parameters or classifiers relates for instance to selecting relevant parameters first, then identifying a sub-set thereof to be taken into account, and further identifying a cutoff value, for instance below which no action is taken. Values and parameters may vary in view of a level of a user. For instance, in Europe users are categorized from A1, being the lowest level, towards C2, being the most advanced level. Based on the level of the user, feedback, use of parameters etc. may e.g. be more or less stringent, that is in view of objectives. As such the present system may be fine-tuned to the level of a user. Also input from professionals is taken into account.
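The sequence of selecting relevant parameters, keeping a sub-set, and applying a cutoff can be sketched as below. The parameter names, relevance weights, evidence values and cutoffs are hypothetical and chosen purely for illustration; the actual classifiers and their training are not specified in the description.

```python
from typing import Dict, List


def select_parameters(relevance: Dict[str, float], max_parameters: int) -> List[str]:
    """Rank parameters by relevance and keep the most relevant sub-set."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[:max_parameters]


def needs_action(evidence: Dict[str, float], selected: List[str],
                 cutoff: float) -> bool:
    """Average the error evidence over the selected parameters.

    Below the cutoff no action (no corrective feedback) is taken.
    """
    mean_evidence = sum(evidence[name] for name in selected) / len(selected)
    return mean_evidence >= cutoff


# Hypothetical relevance of each parameter for one error classifier.
relevance = {"duration": 0.7, "f1": 0.9, "f2": 0.8, "pitch": 0.2}
selected = select_parameters(relevance, max_parameters=2)
print(selected)  # ['f1', 'f2']

# Hypothetical error evidence per parameter for one spoken sound.
evidence = {"duration": 0.9, "f1": 0.4, "f2": 0.6, "pitch": 0.1}
print(needs_action(evidence, selected, cutoff=0.6))   # False
print(needs_action(evidence, selected, cutoff=0.45))  # True
```

Choosing a lower cutoff corresponds to more stringent feedback, so the cutoff is a natural place to take the level of the user into account.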


As such various levels, e.g. in view of the above parameters, may be distinguished, such as for advanced learners and for beginners. Feedback is provided at an expected level.

The present database is filled with a huge amount of information, such as spoken sentences, models, etc. Further, the data is organized, e.g. automatically by self-learning software, such as probabilistic software.

Even further, input from professionals in the field, e.g. tutors, is incorporated in the database.

The present invention relates in a second aspect to a method for automatic improvement of oral language proficiency according to claim 8.

It is noted that feedback may be provided in various ways, as indicated. Such may depend on optimal efficiency of a learning method. Feedback may be in a form wherein the input is fed back, but also in a form wherein improved pronunciation is provided, such as by repeating (part of) the input, however in an optimized manner.

The present invention relates in a third aspect to a system according to the invention and/or a method according to the invention for improving a non-mother language. Such is especially relevant for immigrants, e.g. working in a country, and children thereof. It is noted that children of immigrants speak another language e.g. at home than that of the country they are living in. As such the other language can be considered as a second language, wherein especially the children are less proficient. Therefore there is a need for a system or method for improving the non-mother language.

The present invention relates in a fourth aspect to a system according to the invention and/or a method according to the invention for use in medicine.

Specifically the present invention may be used to improve speech of a patient. It is noted that speech may be hampered for various reasons, some of which are given below.

The present system may be amended slightly in order to improve treatment results. For instance the number of repetitions may be altered, typically increased. Also a larger number of similar exercises may be provided to a patient in need thereof. Even further the first means for determining input may be set more tolerant, e.g. in that input is more often accepted as sufficient. Even further the second means for determining input may also be set more tolerant. A person using the present system for treatment may have different objectives, e.g. less stringent in certain aspects and more stringent in other aspects. Even further an intermediate means for determining input may be provided. If it is expected that a patient will recover relatively slowly, such a further third means may provide an intermediate level to be reached.

For example, dysarthria is a motor speech disorder believed to result from (neurological) injury of a motor component of a motor-speech system and is amongst others characterized by poor articulation of phonemes. Any of the speech subsystems (such as respiration, phonation, formant, prosody, and articulation) can be affected, leading to impairments in intelligibility, audibility, naturalness, and efficiency of vocal communication. It is noted that one or more of the speech subsystems may also be improved by the present invention.

In the case of neurological injury due to damage in the central or peripheral nervous system such may result in e.g. weakness, paralysis, and lack of coordination of the above motor-speech system, producing e.g. dysarthria. These effects in turn hinder for example control over tongue, throat, lips or lungs; swallowing problems (dysphagia) are also often present. In this respect the present invention is also aimed at improving e.g. weakness, paralysis and coordination.

It is noted that typically the term dysarthria does not include speech disorders from structural abnormalities, such as cleft palate, and must not be confused with apraxia, which refers to problems in the planning and programming aspect of the motor-speech system. It is noted that other speech disorders, such as the ones mentioned before, may also be improved by the present invention.

As such the present invention is also aimed at improving functionality of e.g. the nervous system, e.g. in terms of restoring nerve paths. For instance, this concerns functionality of cranial nerves that control muscles, e.g. relating to the trigeminal nerve's motor branch, the facial nerve, the glossopharyngeal nerve, the vagus nerve, and the hypoglossal nerve.

The present invention is also aimed at improving functionality relating to specific dysarthrias, such as spastic, flaccid, ataxic, unilateral upper motor neuron, hyperkinetic and hypokinetic, such as in Huntington's disease or Parkinsonism, and mixed dysarthrias. The above disorders may be of severe and mild nature. It is noted that dysarthria patients are often diagnosed as having 'mixed' dysarthria. Neural damage resulting in dysarthria is rarely confined to one part of the nervous system; for example, multiple strokes, traumatic brain injury, and some kinds of degenerative illnesses often damage many different sectors of the nervous system, causing mixed dysarthrias.

It is noted that dysarthria may sometimes also affect a single system. Severity ranges from occasional articulation difficulties to verbal speech that is completely unintelligible.

Individuals with dysarthria may encounter difficulties relating to e.g. pitch, vocal quality, speed, volume, breath control, strength, range, timing, steadiness and tone. Examples of specific effects include irregular breakdown of articulation, distorted vowels, a continuous breathy voice, monopitch, word flow without pauses, and hypernasality. Such may also be the case for a user in a second language as an immigrant.

It is noted that causes of dysarthria and the like can be many, such as Huntington's disease, Parkinsonism, Niemann-Pick disease, ataxia, ALS, trauma, thrombosis, injury, embolic stroke, etc.

Articulation problems resulting from dysarthria are treated by speech language pathologists, using a variety of techniques, however not using the present system.

The present system is also aimed at use for improving eating performance, such as by improving control of organs, such as tongue, lips, swallowing, etc. If control of these organs is improved it becomes easier to e.g. eat. Such problems may for instance specifically occur with elderly people. The present system may further be supported by (slightly) changing characteristics of food, such as palatability, viscosity, etc., making it easier for a person to take in food. Various side effects, also optionally present as such, such as intelligibility, audibility, naturalness, and/or efficiency of vocal communication may be improved by the present system.

The present system and method may be used in more recent techniques based on the principles of motor learning (PML). Further devices may support speech, such as Augmentative and Alternative Communication (AAC) devices that make coping with a dysarthria easier, which may include speech synthesis and text-based telephones.

EXAMPLES

The invention is further detailed by the accompanying examples, which are exemplary and explanatory of nature and are not limiting the scope of the invention. To the person skilled in the art it may be clear that many variants, being obvious or not, may be conceivable falling within the scope of protection, defined by the present claims.

In an example the present invention provides an overview of the most important pronunciation errors and progression in time. Tables 1 (vowels) and 2 (consonants) show in column 1 the British English (RP) sound, in column 2 the condition or the context in which the error occurs, in column 3 the sound often pronounced by Dutch speakers, and in column 4 some example words. If there is no condition specified, the error can be applied to all conditions.

Table 1: Vowel Errors in Dutch English Pronunciation

RP | Condition | Dutch | Example
1 /ɪə/ | before [illegible] | /i/ (tien) | beer, idea
2 /æ/ | + fortis consonant | /ɛ/ (pet) | bat
  | + lenis consonant | [illegible] | bad
3 /ʌ/ | spelling with o | /ɔ/ ([illegible]) | other
  | spelling with u | /ʏ/ (bus) | bus
  | [illegible] | /ə/ (bedacht) | unwise
4 /uː/ | | /u/ (soep) | soup
5 /ʊ/ | | /u/ (goed) | good

Table 2: Consonant Errors in Dutch English Pronunciation

RP | Condition | Dutch | Example
1 /b/ | word-final | /p/ (Rob) | [illegible]
2 /d/ | word-final | /t/ (bad) | bad
3 /g/ | word-final | /k/ (lik) | big
4 [pʰ] | initial | /p/ (pak) | pack
  [tʰ] | [illegible] | /t/ (tak) | [illegible]
  [kʰ] | [illegible] | /k/ (kat) | cup
5 /tʃ/ | | /ʃ/ (sjaal) | chips
6 /dʒ/ | | /tʃ/ | bridge
7 /dʒ/ | | /ʃ/ (sjaal) | jam
8 /w/ | | /ʋ/ (wie) | [illegible]
9 /ð/ | | /d/ (dak) | the, this
10 /ð/ | | /s/ (sap) | booth
11 /θ/ | | /s/ (sap) | thirty, three
  | | /t/ (tap) |
12 [illegible] | | /ʃ/ (sjaal) | socks
13 /z/ | word-final | /s/ (sap) | jazz

In an example a solution is provided for Dutch students who have problems with different aspects of the English sound system, for instance with final devoicing, aspiration, dental fricatives and the pronunciation of some vowels.

Examples of these errors are provided in tables 1 and 2. Examples for consonant errors in Table 2: final devoicing errors in rows 1, 2, and 3, e.g. the last sound of the word 'bad' being pronounced as /t/; aspiration errors in row 4, e.g. the first sound of the word 'cap' being pronounced as /k/; and dental fricative errors in row 9, e.g. the first sound of the word 'this' being pronounced as /d/. A frequent vowel error is shown in row 3 of table 1, the first sound of the word 'unwise' being pronounced as /ə/.
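
As an illustration of how a final devoicing error from rows 1 to 3 of Table 2 might be flagged, the following minimal sketch compares a target phoneme sequence with the realized one. It assumes that both sequences are already available as lists of ASCII phoneme labels of equal length; the function name and the labels are hypothetical and are not taken from the present system.

```python
# Word-final voiced plosives and the voiceless sounds that Dutch speakers
# often produce instead (rows 1-3 of Table 2).
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k"}


def detect_final_devoicing(target, realized):
    """Return True if the last target phoneme was realized devoiced."""
    if not target or len(target) != len(realized):
        raise ValueError("sequences must be non-empty and of equal length")
    # .get() yields None for finals that cannot be devoiced, which never
    # equals a realized phoneme label.
    return FINAL_DEVOICING.get(target[-1]) == realized[-1]
```

For the word 'bad', a realization ending in /t/ would be flagged, whereas a realization ending in /d/ would not.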

In a way, the example relates to a (scientific and development) process wherein an inventory of frequent errors made by Dutch people learning English is made, an inventory of existing technology for handling non-native speech (for speech recognition, assessment, error detection, etc.) is made, feasibility of porting technology from one language to the other is investigated, pilot experiments are conducted, such as porting the present technology developed for detecting pronunciation errors in Dutch to English, and further pilot experiments are conducted, such as porting the present technology developed for detecting word stress errors in Dutch to English. Therein typically input is first determined tolerantly, and thereafter more strictly. It is noted that a further similar sub-division may be provided by the present system, and even further that application of determination may vary throughout use of the present system, e.g. at one point being (somewhat) more tolerant, then being (somewhat) more strict, then even more strict, and then somewhat less strict. Other components mentioned above are ported as well, at a later stage.

In the above feasibility study an inventory was drawn up of errors that should be addressed in a training program. The selected errors are based on research data and on Radboud in'to Languages' teaching experience throughout the Netherlands. The relevance of the selected errors is regarded as dependent not only on the effect a mispronunciation can have on intelligibility, but also on a possible negative attitude a Dutch English pronunciation can evoke.

The above survey of existing technology for handling non-native speech has revealed that research on ASR for non-native speech is carried out by a limited number of academic sites world-wide. On the market there are few products that employ ASR, usually with limited functionalities and for different target groups than those addressed by the present invention. There are some products on the market that make use of speech technology; some products even purport to employ ASR, although this cannot always be ascertained. However, the present survey of tests and demos has made clear that most of these products do not really make use of ASR and certainly do not use the present advanced ASR technology for error detection that allows providing feedback on the correctness of e.g. individual sounds, prosody, etc.

The above feasibility study has shown that expertise acquired in developing classifiers and tools (software) for one language is very useful in developing similar tools for other languages, and this also speeds up the development of the present technology for those new languages, i.e. the amount of time needed will gradually diminish. A main effort therein is collecting and annotating the speech recordings required.

The present system makes it possible to optimize for a given language pair by focusing on errors made by learners, optimizing technology for detecting these errors, providing suitable exercises for practicing the problematic aspects, etc. The tasks language learners have to perform are, for instance, reading utterances aloud, listening to utterances produced by the present system and then repeating (producing) these utterances, and so-called shadowing (i.e. listening to utterances and repeating them while they are produced, with only a short delay). The level of difficulty of these tasks will gradually increase and adapt to the proficiency level of the student. For these tasks it is known what the students should say; however, since what they actually produce could be different, the technology is able to verify whether the learner was making a serious attempt to produce the utterance in the task or whether (s)he was trying to fool the system. To this end utterance verification algorithms are employed. In all cases mentioned above, the technology should be able to cope with English spoken with e.g. many different Dutch accents at different levels. This is a difficult and challenging task that requires dedicated technology optimized for this specific goal.
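
The idea of utterance verification can be sketched as follows. The description does not disclose the algorithms employed; the sketch below substitutes a simple word-overlap measure (longest common subsequence between the prompt and the recognized words), and the function name and the threshold value are hypothetical. An actual system would more likely rely on confidence measures from the speech recognizer.

```python
def verify_utterance(prompt, recognized, min_overlap=0.5):
    """Accept an attempt if enough prompt words were recognized, in order."""
    target = prompt.lower().split()
    heard = recognized.lower().split()
    # Longest common subsequence of words, by dynamic programming.
    lcs = [[0] * (len(heard) + 1) for _ in range(len(target) + 1)]
    for i, t in enumerate(target, 1):
        for j, h in enumerate(heard, 1):
            if t == h:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    # An empty prompt is rejected; this also avoids a division by zero.
    return bool(target) and lcs[-1][-1] / len(target) >= min_overlap
```

In this sketch an attempt with one wrong word in a six-word prompt is still accepted as a serious attempt, so that it can be passed on to stricter pronunciation assessment, whereas an unrelated utterance is rejected.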

With the present system students can practice their pronunciation in English: they produce utterances, the system assesses their pronunciation, checks whether sounds were pronounced incorrectly, provides feedback on errors detected, and suggests appropriate exercises for improvement. Dedicated technology is developed to perform these tasks automatically through a computer program. Since the system has to cope with e.g. English spoken with a whole range of e.g. Dutch accents, this has been a challenging task requiring innovative technology, developed and optimized for this specific task. The system can be web-based, providing students an opportunity to use it anytime and anywhere they want. The present system relates to a high-end product that educational institutions teaching e.g. English can use to support teachers in providing feedback to their e.g. Dutch students on their pronunciation of the English language.

The present technology relates amongst others to error detection at word level, such as speech verification, 10 segment error detection, API-definition, and XML-representation, and error detection at utterance level, such as speech verification, segment error detection, API-definition, XML-representation.
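
An XML representation of detected errors could, for instance, look like the output of the sketch below. The element and attribute names are hypothetical; the description mentions an XML representation but does not define its schema.

```python
import xml.etree.ElementTree as ET


def errors_to_xml(utterance, errors):
    """Serialize detected segment errors to an XML string.

    utterance: the prompt text
    errors: iterable of (word, target phoneme, realized phoneme) tuples
    """
    # Keyword arguments become XML attributes (hypothetical names).
    root = ET.Element("utterance", text=utterance)
    for word, target, realized in errors:
        ET.SubElement(root, "error", word=word, target=target, realized=realized)
    return ET.tostring(root, encoding="unicode")
```

Such a string can be parsed again with `ET.fromstring`, so that error detection and feedback generation could exchange results through a defined interface.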

Claims

1. An automated system for training a user's oral language proficiency in a second language, comprising: a) at least one means of reception, such as a microphone, b) at least one of i) a first means for determining input in a tolerant manner, such as first phase speech recognition software, ii) a second means for determining input in a strict manner, such as second phase speech recognition software, comprising a pronunciation quality evaluation unit for processing input on the basis of any difference between target pronunciation and actual pronunciation, iii) a pronunciation error detector, iv) a word stress error detector, v) a morphology error detector, vi) a syntax error detector, vii) an interaction error detector, viii) an intonation error detector, ix) a respiratory error detector, and x) a formant error detector, c) a processor for processing input and providing output, such as a computer, and d) at least one means for providing output, such as a speaker for providing audio feedback, and a monitor for providing visual feedback.

2. System as claimed in claim 1, wherein input and/or output are in a second language and the user has his own first language, wherein the first and second language are selected from Indo-European languages, such as Spanish, English, Hindi, Portuguese, Bengali, Russian, German, Marathi, French, Italian, Punjabi, Urdu, and Dutch, Sino-Tibetan languages, such as Chinese, Austroasiatic languages, Austronesian languages, and Altaic languages, such as wherein the first and second languages are Dutch and English, Dutch and German, Dutch and Spanish, Dutch and Chinese, German and English, French and English, Chinese and English, preferably wherein the second language is English, and vice versa, wherein the first and second languages are possibly the same, such as Dutch and Dutch.

3. A system according to claim 1 or 2, wherein the pronunciation quality evaluation unit is adapted for one or more varieties and/or dialects, such as British English, American English, Australian English, Canadian English, New Zealand English, Indian English, Limburgish, Brabant, Gronings, and Drents.

4. System as claimed in any of the claims 1-3, wherein the pronunciation quality evaluation unit comprises software, wherein the software is preferably stored in a computer.

5. System as claimed in any of the claims 1-4, further comprising one or more of a language model, a word list, a phoneme model, one or more threshold values, one or more probability criteria, one or more random number generators, a level control setting, and a decoder.

6. System as claimed in any of the claims 1-5, further comprising one or more of a reference set of parameters, a fine-tuning mechanism, a self-learning algorithm, a self-improving algorithm, and a selection means for selecting criteria.

7. The system of claim 6, further comprising a database, wherein data is stored for one or more of pronunciation, word stress, intonation, and phoneme segmentation.

8. Method for automatically improving oral language proficiency, comprising the steps of: a) providing input to a microphone, b) processing the input with speech recognition software, c) wherein preferably a computer for processing input and output is used, d) providing feedback, such as audio feedback through a loudspeaker, visual feedback through a monitor, and e) providing automatic feedback aimed at pronunciation enhancement by a pronunciation quality evaluation unit.

9. The method of claim 8, further comprising determining input in a tolerant manner.

10. A system according to any of claims 1-7 and/or a method according to any of claims 8-9 for improving a non-native language.

11. System according to any of claims 1-7 and 10 and/or a method according to any of claims 8-9 for use in medicine, such as in clinical or pre-clinical care.

12. A system or method according to claim 11 for treating dysarthria caused, for example, by CVA, a brain tumor, an accident, ALS (Amyotrophic Lateral Sclerosis), a neurological disorder such as Parkinson's disease, or a disorder associated with the motor nervous system, such as in speech therapy, for improving eating performance, for improving control over organs, such as the tongue, or for improving the intelligibility, audibility, naturalness, and/or efficiency of vocal communication.