WO2015062465A1 - Real-time spoken language evaluation system and method on a mobile device - Google Patents
Real-time spoken language evaluation system and method on a mobile device
- Publication number
- WO2015062465A1 (PCT/CN2014/089644)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- evaluated
- text data
- speech
- pronunciation score
- Prior art date
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Definitions
- the present invention relates to the field of computer technologies, and in particular, to a real-time spoken language evaluation system and method on a mobile device.
- Most of the existing spoken language evaluation systems use a computer as a client.
- the user records through a microphone connected to the computer.
- the audio data is transmitted to the server through the network, and is evaluated by an algorithm running on the server.
- the evaluation algorithms all run on the server side, where computing resources (CPU, memory, and storage) are relatively plentiful.
- when the client of the evaluation system is migrated to a mobile device, the following solution is mostly adopted: the mobile client collects the voice data, the voice data is transmitted to the server through the network, the spoken language evaluation algorithm runs on the server, and the evaluation result is returned to the mobile client through the network.
- the present invention has been made to provide a real-time spoken language evaluation system and method on a mobile device that overcomes, or at least partially solves, the above problems. By running the spoken language evaluation system on the mobile device itself, the system's reliance on the network is reduced, the traffic cost of message transmission between the mobile device and the server is cut, and users receive instant oral evaluation feedback, so that they can practice speaking anytime and anywhere, improving the user experience.
- a real-time spoken language evaluation system on a mobile device comprising: an acquisition module, configured to collect voice data of a voice to be evaluated, the voice to be evaluated including the voice of at least one character or character string; a recognition module, configured to recognize the voice data collected by the acquisition module as text data; a matching module, configured to match the text data recognized by the recognition module against the text data of the voice samples in the voice sample library to obtain a matching result; and an evaluation module, configured to obtain and output, according to a predefined evaluation strategy and the matching result obtained by the matching module, a pronunciation score of at least one character or character string in the voice to be evaluated, and/or a pronunciation score of the voice to be evaluated.
- system further includes: a display module, configured to display text data of the voice samples in the voice sample library;
- the collecting module is further configured to collect the voice data input by the user according to the text data of the voice sample in the voice sample library displayed by the display module, as the voice to be evaluated.
- the system further includes: a score comparison module, configured to compare the pronunciation score of the voice to be evaluated output by the evaluation module, and/or the pronunciation score of at least one character or character string in the voice to be evaluated, with a predefined pronunciation score threshold; and a marking module, configured to mark, in the text data displayed by the display module, the text data whose pronunciation score is lower than the predefined pronunciation score threshold when the pronunciation score of the voice to be evaluated falls below that threshold; and/or to mark, in the text data displayed by the display module, the characters or character strings whose pronunciation scores are lower than the predefined pronunciation score threshold when the pronunciation scores of those characters or character strings fall below that threshold.
- the matching module is further configured to perform a matching calculation on the text data recognized by the recognition module and the text data of the voice samples in the voice sample library according to the Levenshtein edit distance algorithm, to obtain a matching result.
- the predefined evaluation strategy is: when the recognized text data matches the text data of a voice sample in the voice sample library, the posterior probability of each character or character string, determined from the voice data, is taken as the pronunciation score of that character or character string in the speech to be evaluated; and the average of the pronunciation scores of all characters or character strings in the speech to be evaluated is taken as the pronunciation score of the speech to be evaluated.
- the system further includes: a storage module, configured to store the voice sample library, where the voice sample library includes at least one voice sample.
- a real-time spoken language evaluation method on a mobile device includes: collecting voice data of a voice to be evaluated, the voice to be evaluated including the voice of at least one character or character string; recognizing the collected voice data as text data; matching the recognized text data with the text data of the voice samples in the voice sample library to obtain a matching result; and obtaining and outputting, according to a predefined evaluation strategy and the matching result, a pronunciation score of at least one character or character string in the speech to be evaluated, and/or a pronunciation score of the speech to be evaluated.
- the method further includes: displaying text data of the voice sample in the voice sample library;
- the step of collecting the voice data of the voice to be evaluated is: collecting the voice data input by the user according to the displayed text data of the voice sample in the voice sample library, as the voice to be evaluated.
- the method further includes: comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold; when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the displayed text data the text data whose pronunciation score is lower than that threshold; and/or, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the displayed text data the characters or character strings whose pronunciation scores are lower than that threshold.
- the step of matching the recognized text data with the text data of the voice samples in the voice sample library to obtain a matching result is: performing a matching calculation between the recognized text data and the text data of the voice samples in the voice sample library according to the Levenshtein edit distance algorithm, to obtain the matching result.
- the voice data of the voice to be evaluated is collected by the real-time spoken language evaluation system on the mobile device; the collected voice data is then recognized as text data; the recognized text data is matched with the text data of the voice samples in the voice sample library to obtain a matching result; and, according to the predefined evaluation strategy and the matching result, the pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, is obtained and output.
- reducing the system's dependence on the network lowers the traffic cost of messaging between the mobile device and the server, and gives users instant evaluation feedback, so that the spoken language evaluation system can be used to practice speaking anytime and anywhere.
- FIG. 1 is a block diagram schematically showing the structure of a real-time spoken language evaluation system 100 on a mobile device according to an embodiment of the present invention.
- FIG. 2 schematically illustrates a flow diagram of a real-time spoken language evaluation method 200 on a mobile device in accordance with an embodiment of the present invention.
- means for carrying out a specified function are intended to cover any means of performing that function, including, for example, (a) a combination of circuit elements that performs the function, or (b) software in any form, including firmware, microcode, and the like, combined with appropriate circuitry for executing that software to perform the function.
- the functions provided by the various modules are combined in the manner set forth in the claims, and it should be understood that any module, component, or element that can provide these functions is equivalent to the modules defined in the claims.
- the real-time spoken language evaluation system 100 on the mobile device may mainly include: an acquisition module 110, an identification module 130, a matching module 150, and an evaluation module 170. It should be understood that the connection relationships between the modules represented in FIG. 1 are only an example; those skilled in the art can fully adopt other connection relationships, as long as each module can implement the functions of the present invention under such a connection relationship.
- the functions of the respective modules can be realized by using dedicated hardware or hardware capable of performing processing in combination with appropriate software.
- Such hardware may include application-specific integrated circuits (ASICs), various other circuits, various processors, and the like.
- this functionality may be provided by a single dedicated processor, a single shared processor, or multiple independent processors, some of which may be shared.
- the term "processor" should not be understood to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage devices.
- the collecting module 110 is configured to collect the voice data of the voice to be evaluated, where the voice to be evaluated includes the voice of at least one character or character string.
- the voice to be evaluated may include any one or a combination of Chinese words, English words, and Arabic numerals; it should of course be understood that the language type of the voice to be evaluated is not limited in the embodiment of the present invention.
- the acquisition module 110 is responsible for recording the voice to be evaluated and saving the voice data of the voice to be evaluated.
- the collection module 110 can be an existing microphone, and the user can input the voice to be evaluated to the system 100 through the microphone.
- the content of the speech to be evaluated may be the following English sentence: "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."
- the system 100 converts the voice data of the voice to be evaluated into an audio file in a .wav format through the collection module 110, where the WAV format is a sound waveform file format. It should be understood that the specific structure of the acquisition module 110 is not limited in the embodiment of the present invention.
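The conversion described above can be sketched with Python's standard `wave` module; the function name, the 16 kHz sample rate, and the 16-bit mono format are illustrative assumptions, since the patent only specifies the .wav container.

```python
# Sketch of saving collected voice data as a .wav file, as the acquisition
# step does. Sample rate and sample format here are assumptions.
import io
import wave

def save_wav(pcm_data: bytes, path_or_buf, sample_rate: int = 16000) -> None:
    """Write raw 16-bit mono PCM data as a WAV (sound waveform) file."""
    with wave.open(path_or_buf, "wb") as wf:
        wf.setnchannels(1)         # mono
        wf.setsampwidth(2)         # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_data)

buf = io.BytesIO()
save_wav(b"\x00\x00" * 16000, buf)   # one second of silence
assert len(buf.getvalue()) == 44 + 32000  # 44-byte canonical WAV header + data
```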
- the identification module 130 is configured to identify the voice data collected by the collection module 110 as text data.
- the voice data of the voice to be evaluated exemplified above can be identified by the recognition module 130 as the following text data: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.
- the recognition module 130 adopts a speech recognition model, which is a Hidden Markov Model (HMM) with a mixed Gaussian distribution as an output probability distribution.
- the identification module 130 can identify the voice data collected by the acquisition module 110 as text data by using a fixed point operation.
- the fixed-point operation is performed in the following manner, and of course, it is not limited to this:
- Method 1 In the existing speech recognition algorithm, there are many floating-point operations, and fixed-point DSP can be used.
- the fixed-point DSP performs integer arithmetic or fractional operation.
- the numerical format does not include an exponent field.
- the fixed-point DSP (with a 16-bit or 24-bit data width) emulates floating-point operations, and floating-point numbers are converted to fixed-point numbers through number scaling (calibration) methods.
- the scaling of the number is to determine the position of the decimal point in the fixed point number.
- the Q notation is a commonly used calibration method.
- the representation mechanism is: let the fixed-point number be x and the floating-point number be y; then the conversion relationship between a Q-format fixed-point number and a floating-point number is x = y × 2^Q (and conversely y = x / 2^Q).
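A minimal sketch of the standard Q-notation scaling (here Q15, matching a 16-bit data width); the helper names are illustrative, not from the patent:

```python
# Q-format fixed-point conversion: a float y maps to the integer
# x = round(y * 2**Q), and back via y = x / 2**Q.
Q = 15  # number of fractional bits in the Q15 format

def float_to_fixed(y: float, q: int = Q) -> int:
    """Convert a floating-point number to a Q-format fixed-point integer."""
    return int(round(y * (1 << q)))

def fixed_to_float(x: int, q: int = Q) -> float:
    """Convert a Q-format fixed-point integer back to floating point."""
    return x / (1 << q)

x = float_to_fixed(0.5)        # 0.5 * 32768 = 16384
assert x == 16384
assert abs(fixed_to_float(x) - 0.5) < 1e-9
```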
- Method 2: (1) define and simplify the algorithm structure; (2) determine the key variables and functions that need to be quantized; (3) collect statistical information on the key variables; (4) determine accurate representations of the key variables; and (5) determine fixed-point formats for the remaining variables.
- a fixed-point operation can be used instead of a general floating-point operation, and integer numbers are used instead of general floating-point numbers to represent the output probability of the recognition result. Because fixed-point operations, relative to floating-point operations, do not need to define a large number of parameters, the identification module 130 can complete the recognition process while occupying fewer system resources (CPU, memory, and storage). It will of course be understood that the specific type of recognition model employed by the recognition module 130 for character recognition is not limited in the embodiment of the present invention.
- the matching module 150 is configured to match the text data recognized by the recognition module 130 with the text data of the voice samples in the voice sample library to obtain a matching result.
- the text data of the voice sample in the voice sample library in the embodiment of the present invention may be text data pre-stored in the voice sample library; for example, the following text data is predefined and stored in the voice sample library: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.
- the matching module 150 is further configured to perform a matching calculation on the text data recognized by the recognition module 130 and the text data of the voice samples in the voice sample library according to the Levenshtein edit distance algorithm, to obtain a matching result.
- the matching result may include: the text data identified by the recognition module 130 matches the text data of a voice sample in the voice sample library, or the text data identified by the recognition module 130 does not match the text data of the voice sample in the voice sample library. It is of course understood that the matching algorithm employed by the matching module 150 is not limited in the embodiment of the present invention.
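A minimal sketch of the Levenshtein-based matching step named above, computed at the word level via dynamic programming; the tokenization (whitespace split on uppercase text) is an illustrative assumption:

```python
# Word-level Levenshtein (edit) distance between the recognized word
# sequence and the reference text of the voice sample.
def levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

recognized = "WELCOME TO LIU LI SHUO".split()
reference  = "WELCOME TO LIU LI SHUO".split()
assert levenshtein(recognized, reference) == 0   # distance 0 means a match
assert levenshtein("WELCOM TO".split(), "WELCOME TO".split()) == 1
```

A distance of zero (or below a small tolerance) would correspond to the "matched" outcome; any larger distance to "not matched".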
- the evaluation module 170 is configured to obtain and output, according to a predefined evaluation strategy and the matching result obtained by the matching module 150, a pronunciation score of at least one character or character string in the voice to be evaluated, and/or the pronunciation score of the voice to be evaluated.
- the predefined evaluation strategy is: when the recognized text data matches the text data of a voice sample in the voice sample library, the posterior probability of a character or character string in the text data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech to be evaluated is taken as the pronunciation score of the speech to be evaluated.
- the posterior probability of the character or the character string obtained based on the voice data is p (between 0 and 1), and the pronunciation score of the character or the character string is p × 100.
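The scoring rule above can be sketched directly; the posterior values below are illustrative, since computing real posteriors requires the HMM recognizer:

```python
# Evaluation strategy sketch: each word's pronunciation score is its
# posterior probability p (0..1) scaled to 0..100, and the sentence
# score is the average of the word scores.
def word_scores(posteriors: dict[str, float]) -> dict[str, float]:
    return {word: p * 100 for word, p in posteriors.items()}

def sentence_score(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)

posteriors = {"WELCOME": 0.92, "TO": 0.85, "LIU": 0.55, "LI": 0.80, "SHUO": 0.88}
scores = word_scores(posteriors)
assert scores["WELCOME"] == 92.0
assert sentence_score(scores) == 80.0   # (92+85+55+80+88)/5
```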
- the evaluation module 170 can obtain the pronunciation score of the entire English sentence "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo.", and/or obtain the pronunciation score of each word in the above English sentence. That is, a unigram language model composed of the sentence's words can be used in the embodiment of the present invention.
- the real-time spoken language rating system 100 on the mobile device may also include one or more optional modules to implement additional functionality; however, these optional modules are not indispensable for the purposes of the present invention, and the real-time spoken language rating system 100 on a mobile device in accordance with an embodiment of the present invention may fully accomplish the objectives of the present invention without them. Although these optional modules are not shown in FIG. 1, their connection relationships with the above modules can be easily derived by those skilled in the art in accordance with the following teachings.
- the system 100 further includes: a display module, configured to display text data of the voice samples in the voice sample library, for example, displaying the following English sentence "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo";
- the collecting module 110 is further configured to collect voice data that is input by the user according to the text data of the voice sample in the voice sample library displayed by the display module.
- the acquisition module 110 collects voice data of the following English sentence "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo".
- system 100 further includes: a score comparison module and a markup module, wherein
- the score comparison module is configured to compare the pronunciation score of the speech to be evaluated output by the evaluation module 170, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold;
- optionally, the predefined pronunciation score threshold may be set to 60 points, although it is understood that the specific value is not limited in the embodiment of the present invention.
- the marking module is configured to mark, in the text data displayed by the display module, the text data whose pronunciation score is lower than the predefined pronunciation score threshold when the pronunciation score of the speech to be evaluated is lower than that threshold; and/or to mark, in the text data displayed by the display module, the characters or character strings whose pronunciation scores are lower than the predefined pronunciation score threshold when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than that threshold.
- if the score comparison module finds that the pronunciation score of "Welcome" is lower than the predefined pronunciation score threshold, "Welcome" may be marked in the entire English sentence; optionally, the color of "Welcome" may be set to red.
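The comparison-and-marking step can be sketched as follows; the function name and the 60-point threshold follow the example above, while the returned word list standing in for "set the color to red" is an assumption about the display step:

```python
# Score comparison and marking sketch: words whose pronunciation score
# falls below the predefined threshold are flagged for highlighting.
THRESHOLD = 60.0  # predefined pronunciation score threshold (60 points)

def mark_low_scores(scores: dict[str, float],
                    threshold: float = THRESHOLD) -> list[str]:
    """Return the words to be marked (score below threshold)."""
    return [word for word, s in scores.items() if s < threshold]

scores = {"Welcome": 55.0, "to": 90.0, "Liu": 85.0, "Li": 88.0, "shuo": 92.0}
assert mark_low_scores(scores) == ["Welcome"]   # only "Welcome" is below 60
```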
- the system 100 further includes: a storage module, configured to store the voice sample library, wherein the voice sample library includes at least one voice sample, for example a voice sample whose content is: "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."
- By implementing the spoken language evaluation system on the mobile device client, the mobile device not only reduces its dependence on the network and the traffic cost of message transmission between the mobile device and the server, but also provides the user with instant spoken language evaluation feedback, so that the user can practice oral English with the spoken language evaluation system anytime and anywhere.
- In accordance with the real-time spoken language evaluation system 100 on a mobile device described above, the present invention also provides a real-time spoken language evaluation method 200 on a mobile device.
- a flow diagram of a real-time spoken language evaluation method 200 on a mobile device in accordance with an embodiment of the present invention is schematically illustrated.
- the method 200 includes steps S210, S230, S250, and S270.
- the method 200 begins in step S210, in which voice data of a voice to be evaluated is collected.
- the voice to be evaluated includes the voice of at least one character or character string.
- the voice to be evaluated may include any one or a combination of Chinese words, English words, and Arabic numerals; it may be understood that the language type of the speech to be evaluated is not limited in the embodiment of the present invention.
- the user can input the voice to be evaluated to the system 100 through the microphone.
- the content of the speech to be evaluated may be the following English sentence: "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo.”
- the system 100 converts the voice data of the voice to be evaluated into an audio file in the .wav format through the collection module 110 and saves it, where WAV is a sound waveform file format.
- in step S230, the collected voice data is recognized as text data. That is, the voice data of the voice to be evaluated exemplified above can be recognized through step S230 as the following text data: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.
- the speech recognition model is a Hidden Markov Model (HMM) with a mixed Gaussian distribution as its output probability distribution. In the embodiment of the present invention, fixed-point operations are used instead of general floating-point operations, and integer numbers are used instead of general floating-point numbers to represent the output probability of the recognition result. It will of course be understood that the specific type of recognition model employed for character recognition is not limited in embodiments of the present invention.
- in step S250, the recognized text data is matched with the text data of the speech samples in the speech sample library to obtain a matching result.
- the text data of the voice sample in the voice sample library in the embodiment of the present invention may be text data pre-stored in the voice sample library; for example, the following text data is predefined and stored in the voice sample library: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.
- a matching calculation may be performed on the recognized text data and the text data of the voice samples in the voice sample library according to the Levenshtein edit distance algorithm, to obtain a matching result.
- the matching result includes: the recognized text data matches the text data of a speech sample in the speech sample library, or the recognized text data does not match the text data of the speech sample in the speech sample library.
- the matching algorithm employed is not limited in the embodiments of the present invention.
- in step S270, based on the predefined evaluation strategy and the matching result, a pronunciation score of at least one character or character string in the speech to be evaluated, and/or a pronunciation score of the speech to be evaluated, is obtained and output.
- the predefined evaluation strategy is: when the recognized text data matches the text data of a voice sample in the voice sample library, the posterior probability of a character or character string in the text data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech to be evaluated is taken as the pronunciation score of the speech to be evaluated.
- the posterior probability of the character or the character string obtained based on the voice data is p (between 0 and 1), and the pronunciation score of the character or the character string is p × 100.
- the pronunciation score of the entire English sentence "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo." can be obtained, and/or the pronunciation score of each word in the English sentence. That is, a unigram language model composed of the sentence's words can be used in the embodiment of the present invention.
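Steps S210 through S270 can be tied together in an end-to-end sketch. The recognizer and posterior estimator are stubbed out, since the patent does not fix a particular implementation, and an exact word-sequence comparison stands in for the Levenshtein matching for brevity; all function names are illustrative.

```python
# End-to-end sketch of the method 200: recognize (S230), match against the
# sample text (S250), then score words and the sentence (S270).
def evaluate(audio: bytes, reference: str) -> dict:
    recognized = recognize(audio)                      # S230: speech -> text
    matched = recognized.split() == reference.split()  # S250: match vs sample
    if not matched:
        return {"matched": False}
    posteriors = posterior_probs(audio, recognized)    # per-word p in [0, 1]
    word_scores = {w: p * 100 for w, p in posteriors.items()}  # S270
    return {"matched": True,
            "word_scores": word_scores,
            "sentence_score": sum(word_scores.values()) / len(word_scores)}

# Stubs standing in for the HMM recognizer and posterior computation.
def recognize(audio: bytes) -> str:
    return "WELCOME TO LIU LI SHUO"

def posterior_probs(audio: bytes, text: str) -> dict[str, float]:
    return {w: 0.9 for w in text.split()}

result = evaluate(b"...", "WELCOME TO LIU LI SHUO")
assert result["matched"] is True
assert result["sentence_score"] == 90.0
```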
- the real-time spoken language evaluation method 200 on the mobile device may further comprise one or more optional steps to implement additional functions; however, these optional steps are not indispensable for the purposes of the present invention, and the real-time spoken language evaluation method 200 on a mobile device according to an embodiment of the present invention can fully achieve the object of the present invention without them.
- These optional steps are not shown in FIG. 2, but their order of execution relative to the steps described above can be readily derived by those skilled in the art in light of the teachings below. It should be noted that these optional steps, and their order of execution relative to the above steps, may be selected according to actual needs, unless otherwise specified.
- the method 200 further includes: displaying the text data of the voice sample in the voice sample library, for example, displaying the following English sentence: "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo.";
- the step of collecting the voice data of the voice to be evaluated is: collecting the voice data input by the user according to the displayed text data of the voice sample in the voice sample library, as the voice to be evaluated.
- the method 200 further includes: comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold.
- the predefined pronunciation score threshold may be set to 60 points, although it is understood that the specific values are not limited in the embodiments of the present invention.
- when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, the text data whose pronunciation score is lower than the predefined pronunciation score threshold is marked in the displayed text data; and/or, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, the character or character string whose pronunciation score is lower than the predefined pronunciation score threshold is marked in the displayed text data.
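The threshold comparison and marking behaviour described above can be sketched as follows. The 60-point threshold comes from the description; the marking convention (wrapping low-scoring words in asterisks) and the example scores are illustrative assumptions, not part of the patent:

```python
PRONUNCIATION_THRESHOLD = 60  # example threshold value given in the description

def mark_low_scores(word_scores, threshold=PRONUNCIATION_THRESHOLD):
    """Return the displayed text with sub-threshold words marked (here: wrapped in asterisks)."""
    marked = []
    for word, score in word_scores:
        marked.append(f"*{word}*" if score < threshold else word)
    return " ".join(marked)

# Hypothetical per-word scores for the example sentence fragment
scores = [("Welcome", 92.0), ("to", 88.0), ("Liu", 75.0), ("Li", 80.0), ("shuo", 55.0)]
print(mark_low_scores(scores))  # prints "Welcome to Liu Li *shuo*"
```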
- modules in the devices of the various embodiments can be adaptively changed and placed in one or more devices different from those of the embodiment.
- Several of the modules in the embodiments may be combined into one module, unit, or component, and they may likewise be divided into a plurality of sub-modules, sub-units, or sub-components. Except where such features and/or processes are mutually exclusive, all of the steps of any method disclosed in this specification, or all of the modules of any device, may be combined in any combination.
- Each feature disclosed in this specification can be replaced by an alternative feature serving the same, equivalent, or similar purpose, unless stated otherwise.
- the various device embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
- a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the modules in accordance with embodiments of the present invention.
- the invention can also be implemented as a device program (e.g., a computer program and a computer program product) for performing the methods described herein.
- the word “comprising” or “comprises” does not exclude the presence of elements or steps not listed in a claim, and the use of the terms “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several different modules, or by means of a suitably programmed computer or processor. In a device claim enumerating several modules, several of these can be implemented by one and the same hardware module.
- the use of the terms “first”, “second”, “third”, etc., does not denote any order; these terms may be interpreted as names.
- the terms “connected,” “coupled,” and the like, when used in this specification, are defined as operatively connected in any desired form, e.g., mechanically, electronically, digitally, analogically, directly or indirectly, connected through software, through hardware, or the like.
Claims (10)
- A real-time spoken language evaluation system (100) on a mobile device, comprising: a collection module (110) for collecting voice data of speech to be evaluated, the speech to be evaluated including the speech of at least one character or character string; a recognition module (130) for recognizing the voice data collected by the collection module (110) as text data; a matching module (150) for matching the text data recognized by the recognition module (130) against the text data of voice samples in a voice sample library to obtain a matching result; and an evaluation module (170) for obtaining and outputting, according to a predefined evaluation strategy and the matching result obtained by the matching module (150), the pronunciation score of at least one character or character string in the speech to be evaluated, and/or the pronunciation score of the speech to be evaluated.
- The system of claim 1, further comprising: a display module for displaying the text data of the voice samples in the voice sample library; wherein the collection module (110) is further configured to collect, as the speech to be evaluated, the voice data input by a user reading the text data of the voice samples in the voice sample library displayed by the display module.
- The system of claim 2, further comprising: a score comparison module for comparing the pronunciation score of the speech to be evaluated output by the evaluation module (170), and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold; and a marking module for, when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the text data displayed by the display module the text data whose pronunciation score is lower than the predefined pronunciation score threshold, and/or, when the pronunciation score of a character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the text data displayed by the display module the character or character string whose pronunciation score is lower than the predefined pronunciation score threshold.
- The system of claim 1, wherein the matching module (150) is further configured to perform a matching calculation between the text data recognized by the recognition module (130) and the text data of the voice samples in the voice sample library according to the Levenshtein Distance edit distance algorithm, to obtain the matching result.
- The system of any one of claims 1 to 4, wherein the predefined evaluation strategy is: when the recognized text data matches the text data of a voice sample in the voice sample library, taking the posterior probability of a character or character string in the text data recognized from the voice data as the pronunciation score of that character or character string in the speech to be evaluated, and taking the average of the pronunciation scores of all characters or character strings in the speech to be evaluated as the pronunciation score of the speech to be evaluated.
- The system of any one of claims 1 to 4, further comprising: a storage module for storing the voice sample library, the voice sample library including at least one voice sample.
- A real-time spoken language evaluation method (200) on a terminal device, comprising: collecting voice data of speech to be evaluated, the speech to be evaluated including the speech of at least one character or character string (S210); recognizing the collected voice data as text data (S230); matching the recognized text data against the text data of voice samples in a voice sample library to obtain a matching result (S250); and obtaining and outputting, according to a predefined evaluation strategy and the matching result, the pronunciation score of at least one character or character string in the speech to be evaluated, and/or the pronunciation score of the speech to be evaluated (S270).
- The method of claim 7, further comprising, before the step of collecting the voice data of the speech to be evaluated (S210): displaying the text data of the voice samples in the voice sample library; wherein the step of collecting the voice data of the speech to be evaluated (S210) is: collecting, as the speech to be evaluated, the voice data input by a user reading the displayed text data of the voice samples in the voice sample library.
- The method of claim 8, further comprising: comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold; when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the displayed text data the text data whose pronunciation score is lower than the predefined pronunciation score threshold; and/or, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, marking in the displayed text data the character or character string whose pronunciation score is lower than the predefined pronunciation score threshold.
- The method of any one of claims 7 to 9, wherein the step of matching the recognized text data against the text data of the voice samples in the voice sample library to obtain the matching result is: performing a matching calculation between the recognized text data and the text data of the voice samples in the voice sample library according to the Levenshtein Distance edit distance algorithm, to obtain the matching result.
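Claims 4 and 10 name the Levenshtein edit distance algorithm for the matching step. A standard dynamic-programming implementation of that distance is sketched below, applied at word level purely as an illustration (the word-level granularity and the example sentences are assumptions, not specified by the claims):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # cost of deleting all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # cost of inserting all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

# Match recognized text against a sample's text at word level (illustrative)
sample = "Welcome to Liu Li shuo".split()
recognized = "Welcome to Liu Li show".split()
print(levenshtein(sample, recognized))  # prints 1 (one word substituted)
```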
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016550920A JP6541673B2 (ja) | 2013-10-30 | 2014-10-28 | モバイル機器におけるリアルタイム音声評価システム及び方法 |
US15/033,210 US20160253923A1 (en) | 2013-10-30 | 2014-10-28 | Real-time spoken language assessment system and method on mobile devices |
EP14859160.5A EP3065119A4 (en) | 2013-10-30 | 2014-10-28 | Real-time oral english evaluation system and method on mobile device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310524873.8 | 2013-10-30 | ||
CN201310524873.8A CN104599680B (zh) | 2013-10-30 | 2013-10-30 | 移动设备上的实时口语评价系统及方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015062465A1 true WO2015062465A1 (zh) | 2015-05-07 |
Family
ID=53003339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/089644 WO2015062465A1 (zh) | 2013-10-30 | 2014-10-28 | 移动设备上的实时口语评价系统及方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20160253923A1 (zh) |
EP (1) | EP3065119A4 (zh) |
JP (1) | JP6541673B2 (zh) |
CN (1) | CN104599680B (zh) |
WO (1) | WO2015062465A1 (zh) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9911410B2 (en) * | 2015-08-19 | 2018-03-06 | International Business Machines Corporation | Adaptation of speech recognition |
CN105513612A (zh) * | 2015-12-02 | 2016-04-20 | 广东小天才科技有限公司 | 语言词汇的音频处理方法及装置 |
JP7028179B2 (ja) * | 2016-09-29 | 2022-03-02 | 日本電気株式会社 | 情報処理装置、情報処理方法およびコンピュータ・プログラム |
CN108154735A (zh) * | 2016-12-06 | 2018-06-12 | 爱天教育科技(北京)有限公司 | 英语口语测评方法及装置 |
CN107578778A (zh) * | 2017-08-16 | 2018-01-12 | 南京高讯信息科技有限公司 | 一种口语评分的方法 |
CN108053839B (zh) * | 2017-12-11 | 2021-12-21 | 广东小天才科技有限公司 | 一种语言练习成果的展示方法及麦克风设备 |
CN108831212B (zh) * | 2018-06-28 | 2020-10-23 | 深圳语易教育科技有限公司 | 一种口语教学辅助装置及方法 |
CN109493852A (zh) * | 2018-12-11 | 2019-03-19 | 北京搜狗科技发展有限公司 | 一种语音识别的评测方法及装置 |
US11640767B1 (en) * | 2019-03-28 | 2023-05-02 | Emily Anna Bridges | System and method for vocal training |
CN110349583A (zh) * | 2019-07-15 | 2019-10-18 | 高磊 | 一种基于语音识别的游戏教育方法及系统 |
CN110634471B (zh) * | 2019-09-21 | 2020-10-02 | 龙马智芯(珠海横琴)科技有限公司 | 一种语音质检方法、装置、电子设备和存储介质 |
CN110797049B (zh) * | 2019-10-17 | 2022-06-07 | 科大讯飞股份有限公司 | 一种语音评测方法及相关装置 |
CN110827794B (zh) * | 2019-12-06 | 2022-06-07 | 科大讯飞股份有限公司 | 语音识别中间结果的质量评测方法和装置 |
CN111415684B (zh) * | 2020-03-18 | 2023-12-22 | 歌尔微电子股份有限公司 | 语音模组的测试方法、装置及计算机可读存储介质 |
WO2022003104A1 (en) * | 2020-07-01 | 2022-01-06 | Iliescu Alexandru | System and method for interactive and handsfree language learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002050803A2 (en) * | 2000-12-18 | 2002-06-27 | Digispeech Marketing Ltd. | Method of providing language instruction and a language instruction system |
US20090087822A1 (en) * | 2007-10-02 | 2009-04-02 | Neurolanguage Corporation | Computer-based language training work plan creation with specialized english materials |
CN101551947A (zh) * | 2008-06-11 | 2009-10-07 | 俞凯 | 辅助口语语言学习的计算机系统 |
CN101551952A (zh) * | 2009-05-21 | 2009-10-07 | 无敌科技(西安)有限公司 | 发音评测装置及其方法 |
CN101739869A (zh) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | 一种基于先验知识的发音评估与诊断系统 |
CN102800314A (zh) * | 2012-07-17 | 2012-11-28 | 广东外语外贸大学 | 具有反馈指导的英语句子识别与评价系统及其方法 |
CN103065626A (zh) * | 2012-12-20 | 2013-04-24 | 中国科学院声学研究所 | 英语口语考试系统中的朗读题自动评分方法和设备 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002175095A (ja) * | 2000-12-08 | 2002-06-21 | Tdk Corp | 発音学習システム |
JP2006133521A (ja) * | 2004-11-05 | 2006-05-25 | Kotoba No Kabe Wo Koete:Kk | 語学学習機 |
US8272874B2 (en) * | 2004-11-22 | 2012-09-25 | Bravobrava L.L.C. | System and method for assisting language learning |
JP2006208644A (ja) * | 2005-01-27 | 2006-08-10 | Toppan Printing Co Ltd | 語学会話力測定サーバシステム及び語学会話力測定方法 |
JP4165898B2 (ja) * | 2005-06-15 | 2008-10-15 | 学校法人早稲田大学 | 文章評価装置及び文章評価プログラム |
JP2007148170A (ja) * | 2005-11-29 | 2007-06-14 | Cai Media Kyodo Kaihatsu:Kk | 外国語学習支援システム |
GB2458461A (en) * | 2008-03-17 | 2009-09-23 | Kai Yu | Spoken language learning system |
CN101246685B (zh) * | 2008-03-17 | 2011-03-30 | 清华大学 | 计算机辅助语言学习系统中的发音质量评价方法 |
JP2010282058A (ja) * | 2009-06-05 | 2010-12-16 | Tokyobay Communication Co Ltd | 外国語学習補助方法及び装置 |
US9361908B2 (en) * | 2011-07-28 | 2016-06-07 | Educational Testing Service | Computer-implemented systems and methods for scoring concatenated speech responses |
CA2923003C (en) * | 2012-09-06 | 2021-09-07 | Rosetta Stone Ltd. | A method and system for reading fluency training |
- 2013
- 2013-10-30 CN CN201310524873.8A patent/CN104599680B/zh active Active
- 2014
- 2014-10-28 JP JP2016550920A patent/JP6541673B2/ja active Active
- 2014-10-28 WO PCT/CN2014/089644 patent/WO2015062465A1/zh active Application Filing
- 2014-10-28 EP EP14859160.5A patent/EP3065119A4/en not_active Withdrawn
- 2014-10-28 US US15/033,210 patent/US20160253923A1/en not_active Abandoned
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002050803A2 (en) * | 2000-12-18 | 2002-06-27 | Digispeech Marketing Ltd. | Method of providing language instruction and a language instruction system |
US20090087822A1 (en) * | 2007-10-02 | 2009-04-02 | Neurolanguage Corporation | Computer-based language training work plan creation with specialized english materials |
CN101551947A (zh) * | 2008-06-11 | 2009-10-07 | 俞凯 | 辅助口语语言学习的计算机系统 |
CN101739869A (zh) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | 一种基于先验知识的发音评估与诊断系统 |
CN101551952A (zh) * | 2009-05-21 | 2009-10-07 | 无敌科技(西安)有限公司 | 发音评测装置及其方法 |
CN102800314A (zh) * | 2012-07-17 | 2012-11-28 | 广东外语外贸大学 | 具有反馈指导的英语句子识别与评价系统及其方法 |
CN103065626A (zh) * | 2012-12-20 | 2013-04-24 | 中国科学院声学研究所 | 英语口语考试系统中的朗读题自动评分方法和设备 |
Also Published As
Publication number | Publication date |
---|---|
CN104599680B (zh) | 2019-11-26 |
EP3065119A4 (en) | 2017-04-19 |
JP2016536652A (ja) | 2016-11-24 |
US20160253923A1 (en) | 2016-09-01 |
JP6541673B2 (ja) | 2019-07-10 |
CN104599680A (zh) | 2015-05-06 |
EP3065119A1 (en) | 2016-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14859160 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016550920 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15033210 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REEP | Request for entry into the european phase |
Ref document number: 2014859160 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2014859160 Country of ref document: EP |