CN109378015B - Voice learning system and method - Google Patents


Info

Publication number
CN109378015B
CN109378015B
Authority
CN
China
Prior art keywords
voice
module
learning
sub
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811445093.3A
Other languages
Chinese (zh)
Other versions
CN109378015A (en)
Inventor
程冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201811445093.3A
Publication of CN109378015A
Application granted
Publication of CN109378015B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04817 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, using icons
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied

Abstract

The application belongs to the technical field of sound processing, and particularly relates to a voice learning system and a voice learning method. Existing voice learning systems leave learners unable to autonomously correct voice errors. The application provides a voice learning system comprising a corpus production unit, wherein the corpus production unit is connected with a voice learning unit, and the voice learning unit is connected with a voice testing unit; the corpus production unit comprises a voice acquisition module, the voice acquisition module is connected with a voice processing module, and the voice processing module is connected with a video editing module; the voice learning unit comprises a database module, wherein the database module comprises a login sub-module, the login sub-module is connected with an icon display sub-module, and the icon display sub-module is connected with a pronunciation sub-module; the voice test unit comprises a voice icon test module, the voice icon test module is connected with a pronunciation test module, and the pronunciation test module is connected with a correct/incorrect judgment module. The voice learning system enables learners to correct their voice errors.

Description

Voice learning system and method
Technical Field
The application belongs to the technical field of sound processing, and particularly relates to a voice learning system and a voice learning method.
Background
Speech is the sound of a language and the carrier of the language symbol system; a language depends on speech to perform its social function. Speech is sound produced by the human vocal organs that carries a meaning-distinguishing function, and it is the most direct symbolic record of thinking activity. Thus, although speech is sound, it is essentially different from sound in general. Speech learning is the basis of language learning. Studies have shown that infants gradually lose sensitivity to non-native speech sounds after 12 months of age, which creates an obstacle to later foreign-language speech learning. Because learners are insensitive to non-native speech sounds, they cannot fully take in phonetic information by ear, so the brains of American and Chinese listeners, for example, perceive the same English speech differently. Meanwhile, the language environment available to a foreign-language learner cannot compare with that of a native-language learner, so the phonetic categories established in the foreign-language learner's brain diverge considerably from native ones. Thus, from the learner's own perspective, the difficulty of speech learning lies in the brain's reduced sensitivity to foreign speech sounds; from the perspective of the external learning environment, it lies in speech input that fails to cooperate effectively with the brain's speech perception.
Existing voice learning systems only evaluate a learner's pronunciation and do not train the learner's speech perception, so the learner cannot autonomously correct voice errors.
Disclosure of Invention
1. Technical problem to be solved
Existing voice learning systems only evaluate a learner's pronunciation and do not train the learner's speech perception, so the learner cannot autonomously correct voice errors. To address this problem, the application provides a voice learning system and a voice learning method.
2. Technical proposal
In order to achieve the above objective, the present application provides a voice learning system, which includes a corpus production unit, wherein the corpus production unit is connected with a voice learning unit, and the voice learning unit is connected with a voice testing unit;
the corpus production unit comprises a voice acquisition module, wherein the voice acquisition module is connected with a voice processing module, and the voice processing module is connected with a video editing module; the voice acquisition module is used for acquiring natural sound recordings; the voice processing module is used for amplifying the spectral features in the voice to different degrees and producing the corpus; the video editing module is used for editing the speech video together with the processed speech to synthesize different video clips;
the voice learning unit comprises a database module, wherein the database module comprises a login sub-module, the login sub-module is connected with an icon display sub-module, and the icon display sub-module is connected with a pronunciation sub-module; the login sub-module is used for setting a login account and a password; the icon display sub-module is used for displaying voice element icons; the pronunciation sub-module is used for playing a word containing the voice element after the voice element icon is clicked, while displaying the speaker's mouth shape;
the voice test unit comprises a voice icon test module, the voice icon test module is connected with a pronunciation test module, and the pronunciation test module is connected with a correct/incorrect judgment module; the voice icon test module is used for displaying the voice element icons under test; the pronunciation test module is used for playing voice material for the learner to identify; and the correct/incorrect judgment module is used for judging and recording whether the learner's identification of the played voice is correct.
Optionally, the voice processing module includes a MATLAB-based sound processing sub-module, the MATLAB-based sound processing sub-module including a formant frequency difference expander, a pitch synchronous splicer, a frequency separator, a bandwidth separator, and a gap separator; the MATLAB-based sound processing sub-module also includes a sound analyzer and a sound synthesizer.
Optionally, the video editing module includes a format processing sub-module and a frame rate processing sub-module.
Optionally, the icon display submodule includes a first display icon and a second display icon, and the voice element in the first display icon and the voice element in the second display icon are similar in pronunciation.
Optionally, the pronunciation sub-module includes a number of learning levels.
The application also provides a voice learning method, which comprises the following steps:
step 1, amplifying the spectral features of the sound;
step 2, pairing the voice materials having different degrees of spectral amplification with a plurality of speakers to form learning materials of different levels;
step 3, testing the learner when each level of learning is completed;
step 4, if the learner passes the test, entering the next level of learning; otherwise, continuing to learn the level that was not passed;
step 5, repeating step 3 and step 4 until the voice is mastered.
Optionally, the spectral features in step 1 are amplified to 3 different degrees: 300%, 208%, and 144%.
Optionally, in step 2, several speakers pronounce the words containing the phonetic element.
Optionally, the test in step 3 is a random test on words that have not been learned.
Optionally, in step 4, the test is passed when the accuracy reaches 90% or more.
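To make steps 1 and 2 concrete, the sketch below builds the graded material from the three amplification degrees plus the natural recording. It is an illustration only; the amplify hook standing in for the spectral-feature expander is an assumption, not part of the application.

    # Sketch of the corpus grading in steps 1-2: three spectral-amplification
    # degrees plus natural speech give four grades; `amplify` is an assumed hook.
    AMPLIFICATIONS = [3.00, 2.08, 1.44, 1.00]   # 300%, 208%, 144%, natural speech

    def build_grades(recordings_by_speaker, amplify):
        """Return one corpus grade per amplification degree, keyed by speaker."""
        grades = []
        for factor in AMPLIFICATIONS:
            grade = {speaker: [amplify(clip, factor) for clip in clips]
                     for speaker, clips in recordings_by_speaker.items()}
            grades.append(grade)
        return grades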
3. Advantageous effects
Compared with the prior art, the voice learning system and the voice learning method of the application have the following beneficial effects:
In the voice learning system of the application, the corpus production unit amplifies the spectral features of speech and produces the corpus; after the voice elements are learned through the voice learning unit, the learning results are checked by the voice testing unit, further consolidating the learning effect and thereby achieving the purpose of correcting voice errors. By contrasting the native-language and foreign-language learning processes, the system creates a language learning and application environment that conforms to the brain's cognitive rules and helps foreign-language learners establish phonetic categories similar to native ones, thereby alleviating the learner's foreign accent.
Drawings
FIG. 1 is a schematic diagram of a speech learning system of the present application;
FIG. 2 is a schematic diagram of the corpus fabrication unit of the present application;
FIG. 3 is a schematic diagram of the principles of the speech learning unit of the present application;
FIG. 4 is a schematic diagram of the speech test unit of the present application;
In the figures: 1 - corpus production unit; 2 - voice learning unit; 3 - voice test unit; 4 - voice acquisition module; 5 - voice processing module; 6 - video editing module; 7 - database module; 8 - login sub-module; 9 - icon display sub-module; 10 - pronunciation sub-module; 11 - voice icon test module; 12 - pronunciation test module; 13 - correct/incorrect judgment module; 14 - MATLAB-based sound processing sub-module; 15 - formant frequency difference expander; 16 - pitch synchronous splicer; 17 - frequency separator; 18 - bandwidth separator; 19 - gap separator; 20 - sound analyzer; 21 - sound synthesizer; 22 - format processing sub-module; 23 - frame rate processing sub-module.
Detailed Description
Hereinafter, specific embodiments of the present application are described in detail with reference to the accompanying drawings, from which those skilled in the art can clearly understand and practice the present application. Features from various embodiments may be combined, or certain features replaced, to obtain new or preferred implementations without departing from the principles of the present application.
Referring to figs. 1 to 4, the application provides a voice learning system, which comprises a corpus production unit 1, wherein the corpus production unit 1 is connected with a voice learning unit 2, and the voice learning unit 2 is connected with a voice test unit 3;
the corpus production unit 1 comprises a voice acquisition module 4, wherein the voice acquisition module 4 is connected with a voice processing module 5, and the voice processing module 5 is connected with a video editing module 6; the voice acquisition module 4 is used for acquiring natural sound recordings; the voice processing module 5 is used for amplifying the spectral features in the voice to different degrees and producing the corpus; the video editing module 6 is used for editing the speech video together with the processed speech to synthesize different video clips;
the voice learning unit 2 comprises a database module 7, wherein the database module 7 comprises a login sub-module 8, the login sub-module 8 is connected with an icon display sub-module 9, and the icon display sub-module 9 is connected with a pronunciation sub-module 10; the login sub-module 8 is used for setting a login account and a password; the icon display sub-module 9 is used for displaying voice element icons; the pronunciation sub-module 10 is configured to play a word containing a voice element after its icon is clicked, while displaying the speaker's mouth shape;
the voice test unit 3 comprises a voice icon test module 11, the voice icon test module 11 is connected with a pronunciation test module 12, and the pronunciation test module 12 is connected with a correct/incorrect judgment module 13; the voice icon test module 11 is used for displaying the voice element icons under test; the pronunciation test module 12 is configured to play voice material for the learner to identify; the correct/incorrect judgment module 13 is used for judging and recording whether the learner's identification is correct.
The correct/incorrect judgment module 13 is conventional technology and is used only to record accuracy.
Optionally, the speech processing module 5 comprises a MATLAB-based sound processing sub-module 14, the MATLAB-based sound processing sub-module 14 comprising a formant frequency difference expander 15, a pitch synchronization splicer 16, a frequency separator 17, a bandwidth separator 18 and a gap separator 19; the MATLAB-based sound processing sub-module 14 includes a sound analyzer 20 and a sound synthesizer 21.
Optionally, the video editing module 6 includes a format processing sub-module 22 and a frame rate processing sub-module 23.
Optionally, the icon display sub-module 9 includes a first display icon and a second display icon, and the voice element in the first display icon and the voice element in the second display icon are similar in pronunciation.
Optionally, the pronunciation sub-module 10 includes several learning levels.
The application provides a voice learning method, which comprises the following steps:
step 1, amplifying the spectral features of the sound;
step 2, pairing the voice materials having different degrees of spectral amplification with a plurality of speakers to form learning materials of different levels;
step 3, testing the learner when each level of learning is completed;
step 4, if the learner passes the test, entering the next level of learning; otherwise, continuing to learn the level that was not passed;
step 5, repeating step 3 and step 4 until the voice is mastered.
Optionally, the spectral features in step 1 are amplified to 3 different degrees: 300%, 208%, and 144%.
Optionally, in step 2, several speakers pronounce the words containing the phonetic element.
Optionally, the test in step 3 is a random test on words that have not been learned.
Optionally, in step 4, the test is passed when the accuracy reaches 90% or more.
Examples
First, the acoustic features that importantly distinguish the target contrast sounds are exaggerated. For each pair of sounds to be learned, the physical parameters for processing the natural recordings must be determined according to the acoustic features that distinguish the two sounds.
The natural recording is obtained by the voice acquisition module 4 in the corpus production unit 1 and then transmitted to the voice processing module 5; the MATLAB-based sound processing sub-module 14 amplifies the spectral features in the speech to 3 different degrees, namely 300%, 208%, and 144%, and four grades of learning corpus are then produced together with the original speech. For example, for the English /r/-/l/ pair, the 3 parameters are the F3 frequency separation, F3 bandwidth and F3 transition time. During synthesis, the formant frequency difference of /r/-/l/ is amplified by the formant frequency difference expander 15 and the F3 bandwidth is reduced; exaggeration of the /r/-/l/ temporal features is added by the pitch synchronous splicer 16 using a time-warping technique. As another example, for the English vowel pair /i/-/ɪ/, the frequency separator 17, bandwidth separator 18 and gap separator 19 separate the frequencies and bandwidths of F1 and F2 and adjust the gap between F1 and F2.
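As an illustration of the temporal exaggeration, the sketch below slows only an assumed transition region of a word recording. This is a stand-in, not the patent's implementation: librosa's phase-vocoder time stretch replaces the pitch synchronous splicer 16, and the file name and region boundaries are illustrative assumptions.

    # Hedged sketch: exaggerating the temporal cues of one region of a recording.
    import numpy as np
    import librosa
    import soundfile as sf

    y, sr = librosa.load("rock.wav", sr=None)    # hypothetical recording of "rock"
    t0, t1 = int(0.05 * sr), int(0.20 * sr)      # assumed F3-transition region
    head, transition, tail = y[:t0], y[t0:t1], y[t1:]

    # rate < 1 lengthens the segment; 1/3 gives 300% of the original duration.
    slow = librosa.effects.time_stretch(transition, rate=1.0 / 3.0)
    sf.write("rock_exaggerated.wav", np.concatenate([head, slow, tail]), sr)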
The sub-module "LPC Analysis and Synthesis of Speech" in the MATLAB-based sound processing sub-module 14 is used in this production. LPC stands for Linear Predictive Coding. The sub-module includes the sound analyzer 20 and the sound synthesizer 21, with which new sounds can be analyzed and synthesized. (See the DSP System Toolbox™ functionality available at the MATLAB command line.)
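For illustration, the sketch below performs the same analyze-modify-synthesize cycle in Python: LPC analysis by the autocorrelation method, a shift of the pole pair corresponding to one formant, and resynthesis from the residual. It is a minimal single-frame sketch under stated assumptions (a synthetic frame, a guessed target frequency and scaling factor), not the patent's MATLAB implementation.

    # Minimal LPC analyse-modify-synthesise sketch (NumPy/SciPy), standing in for
    # sub-module 14. Single frame only; a real system would use overlap-add.
    import numpy as np
    from scipy.signal import lfilter

    def lpc(frame, order):
        """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
        n = len(frame)
        r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
            k = -acc / err
            a_prev = a.copy()
            for j in range(1, i):
                a[j] = a_prev[j] + k * a_prev[i - j]
            a[i] = k
            err *= 1.0 - k * k
        return a

    def scale_formant(a, sr, f_target, factor, bw_max=500.0):
        """Scale the frequency of the LPC pole pair nearest f_target by factor."""
        roots = np.roots(a)
        pos = roots[np.imag(roots) > 1e-8]             # one of each conjugate pair
        rest = roots[np.abs(np.imag(roots)) <= 1e-8]   # real poles stay unchanged
        freqs = np.angle(pos) * sr / (2 * np.pi)
        bws = -np.log(np.abs(pos)) * sr / np.pi
        cand = np.where(bws < bw_max)[0]               # sharp poles look like formants
        if cand.size == 0:
            cand = np.arange(len(pos))
        idx = cand[np.argmin(np.abs(freqs[cand] - f_target))]
        ang = min(np.angle(pos[idx]) * factor, 0.95 * np.pi)  # stay below Nyquist
        pos = pos.copy()
        pos[idx] = np.abs(pos[idx]) * np.exp(1j * ang)
        return np.real(np.poly(np.concatenate([pos, np.conj(pos), rest])))

    # Demo on a synthetic frame (a placeholder for a windowed speech frame).
    sr = 16000
    frame = np.random.randn(480) * np.hanning(480)
    a = lpc(frame, order=16)
    residual = lfilter(a, [1.0], frame)                        # analysis
    a_mod = scale_formant(a, sr, f_target=2500.0, factor=1.2)  # raise "F3" ~20%
    modified = lfilter([1.0], a_mod, residual)                 # synthesis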
After sound processing is finished, Final Cut Pro 7 is used for video editing; the video editing module comprises the format processing sub-module 22 and the frame rate processing sub-module 23, so different formats and frame rates can be mixed and matched on one timeline. The video of the sound is processed by synchronizing slowed videos of different versions with time-stretched audio tracks, and the processed video and sound are then edited and synthesized into different video clips, which serve as the corpus for producing the learning software.
The produced corpus is transmitted to the voice learning unit 2. Voice learning is currently divided into 7 levels, as shown in the table below. After each level of learning there is a corresponding level test. The learner registers and sets a password through the login sub-module 8 in the database module 7; after entering the set user name and password, the learner can continue the previous learning and testing, and the database module 7 stores the learner's progress. After logging in, first-level learning begins: two icons for similar voice elements appear on the interface, namely a first display icon and a second display icon, for example an /i/ icon and an /ɪ/ icon. After one icon is clicked, a word containing that voice element is played, accompanied by a video of the speaker's mouth shape. The entire learning process takes about 2 to 3 hours, depending on the learner's progress. Each time a learner completes a level, he must take a test containing 10 words that have not been learned. Only when the test accuracy reaches 90% or more can the learner enter the next level of study; otherwise the learner repeats the study and test of the current level. If the learner takes the test of the same level a second time, he enters the next level directly, no longer limited by the accuracy requirement. And so on through all seven levels until the end.
Table: the number of words used and the degree of phonetic material exaggeration for each level
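The pass/advance rules just described can be summarized in a short sketch; the study and ask callbacks, which would present the corpus and collect the learner's answers, are assumed placeholders rather than parts of the patent.

    # Sketch of the level progression: a 10-word test of unlearned words after
    # each level, >= 90% to advance, and automatic advancement after the second
    # attempt at the same level. `study` and `ask` are assumed hooks.
    import random

    NUM_LEVELS = 7
    WORDS_PER_TEST = 10
    PASS_RATE = 0.90

    def passes_test(level, unlearned_words, ask):
        words = random.sample(unlearned_words, WORDS_PER_TEST)
        correct = sum(1 for w in words if ask(level, w))  # True when answered correctly
        return correct / WORDS_PER_TEST >= PASS_RATE

    def run_course(unlearned_by_level, study, ask):
        for level in range(1, NUM_LEVELS + 1):
            attempts = 0
            while True:
                study(level)                       # learning phase of this level
                attempts += 1
                if passes_test(level, unlearned_by_level[level], ask):
                    break                          # >= 90% correct: advance
                if attempts >= 2:
                    break                          # second test: advance regardless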
The voice learning system of the application has the following main features: 1) the spectral features of sound are amplified; 2) as the learning level increases, the degree of spectral amplification gradually decreases until natural speech is restored, while the number of speakers gradually increases; 3) pronunciation mouth-shape animations of the speakers are provided; 4) multiple speech contexts containing the target phonemes are provided; 5) the learner controls the learning speed and process autonomously, without any judgment response being required. The acoustic corpus used in the learning software consists of real monosyllabic words spoken by 4 native speakers (2 men and 2 women).
The voice learning system uses the Microsoft Access program to realize a fully functional database structure, which handles the large amount of data generated by all learners in the pre-learning test, the post-learning test and all learning stages, and supports access, search, reporting, analysis and similar operations. The learning stimuli use real English vocabulary, and the software interface icons adopt the International Phonetic Alphabet. The system also supports registration of user names and passwords: each learner can set a login account and password in the database, so online remote learning, as well as large-scale learning and research, can be realized.
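As an illustration of such a learner database, the sketch below uses Python's built-in sqlite3 in place of Microsoft Access; the schema and all table and column names are assumptions, not the patent's actual structure.

    # Illustrative learner-progress store; sqlite3 stands in for Microsoft Access.
    import sqlite3

    conn = sqlite3.connect("speech_learning.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS learners (
        id       INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL,
        password TEXT NOT NULL            -- a real system would store a hash
    );
    CREATE TABLE IF NOT EXISTS test_results (
        learner_id INTEGER REFERENCES learners(id),
        level      INTEGER NOT NULL,      -- 1..7
        correct    INTEGER NOT NULL,      -- out of the 10 test words
        passed     INTEGER NOT NULL,      -- 1 if accuracy >= 90%
        taken_at   TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """)

    def record_result(learner_id, level, correct):
        """Store one level test; passing requires 9 of the 10 words (>= 90%)."""
        conn.execute(
            "INSERT INTO test_results (learner_id, level, correct, passed) "
            "VALUES (?, ?, ?, ?)",
            (learner_id, level, correct, int(correct >= 9)),
        )
        conn.commit()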
Research has found that the speech features a mother uses when speaking to her infant help the infant distinguish phonetic units more easily and perceive the key voice elements that distinguish word meanings in the mother's speech. The application imitates these features: it exaggerates the acoustic features of natural speech and produces a corpus suited to the learner's brain perception, stimulating the nervous system that has lost sensitivity to non-native speech to reopen and comprehensively receive phonetic information, thereby helping the learner improve perceptual performance and alleviate accent problems.
In the voice learning system of the application, the corpus production unit amplifies the spectral features of speech to produce the learning corpus; after the voice elements are learned through the voice learning unit, the learning results are checked by the voice testing unit, further consolidating the learning effect and thereby correcting voice errors. By contrasting the native-language and foreign-language learning processes, the system creates a language learning and application environment that conforms to the brain's cognitive rules and helps foreign-language learners establish phonetic categories similar to native ones, thereby alleviating the learner's foreign accent.
Although the present application has been described with reference to particular embodiments, those skilled in the art will appreciate that many modifications are possible within the principles and scope of the disclosure. The scope of the application is defined by the appended claims, which are intended to cover all modifications within the literal meaning or range of equivalents of the technical features of the claims.

Claims (9)

1. A voice learning system, characterized in that it comprises a corpus production unit, a voice learning unit and a voice testing unit, wherein the corpus production unit is connected with the voice learning unit, and the voice learning unit is connected with the voice testing unit;
the corpus production unit comprises a voice acquisition module, wherein the voice acquisition module is connected with a voice processing module, and the voice processing module is connected with a video editing module; the voice acquisition module is used for acquiring natural sound recordings; the voice processing module is used for amplifying the spectral features in the voice to different degrees and producing the corpus; the video editing module is used for editing the speech video together with the processed speech to synthesize different video clips;
the voice learning unit comprises a database module, wherein the database module comprises a login sub-module, the login sub-module is connected with an icon display sub-module, and the icon display sub-module is connected with a pronunciation sub-module; the login sub-module is used for setting a login account and a password; the icon display sub-module is used for displaying voice element icons; the pronunciation sub-module is used for playing words containing the voice elements after the voice element icons are clicked, while displaying the speakers' mouth shapes;
the voice test unit comprises a voice icon test module, the voice icon test module is connected with a pronunciation test module, and the pronunciation test module is connected with a correct/incorrect judgment module; the voice icon test module is used for displaying the voice element icons under test; the pronunciation test module is used for playing voice material for the learner to identify; the correct/incorrect judgment module is used for judging and recording whether the learner's identification of the played voice is correct; the voice processing module comprises a MATLAB-based sound processing sub-module, and the MATLAB-based sound processing sub-module is used for amplifying the spectral features in the voice to 3 different degrees, namely 300%, 208%, and 144%.
2. The voice learning system of claim 1, wherein: the MATLAB-based sound processing sub-module comprises a formant frequency difference expander, a pitch synchronous splicer, a frequency separator, a bandwidth separator and a gap separator; the MATLAB-based sound processing sub-module includes a sound analyzer and a sound synthesizer.
3. The voice learning system of claim 1, wherein: the video editing module comprises a format processing sub-module and a frame rate processing sub-module.
4. The voice learning system of claim 1, wherein: the icon display sub-module comprises a first display icon and a second display icon, and the voice element of the first display icon and the voice element of the second display icon are similar in pronunciation.
5. The voice learning system of any one of claims 1-4, wherein: the pronunciation sub-module includes a number of learning levels.
6. A voice learning method using the voice learning system according to any one of claims 1 to 5, characterized in that the method comprises the following steps:
step 1, amplifying the spectral features of the sound;
step 2, pairing the voice materials having different degrees of spectral amplification with a plurality of speakers to form learning materials of different levels;
step 3, testing the learner when each level of learning is completed;
step 4, if the learner passes the test, entering the next level of learning; otherwise, continuing to learn the level that was not passed;
step 5, repeating step 3 and step 4 until the voice is mastered;
the voice acquisition module is used for acquiring natural sound recordings; the voice processing module is used for expanding the frequency spectrum characteristics in voice to different degrees, making corpus, acquiring natural sound recordings through the voice acquisition module in the corpus making unit, transmitting the natural sound recordings to the voice processing module, amplifying the frequency spectrum characteristics in voice to different degrees through the MATLAB-based voice processing sub-module, respectively 300%,208% and 144%, and then making four-level learning corpus together with the original voice; english voice/r-l/pair, 3 parameters are F3 separation frequency, F3 bandwidth and F3 transition time; during the synthesis, the formant frequency difference of/r-l/is amplified by a formant frequency difference amplifier and the F3 bandwidth is reduced.
7. The voice learning method of claim 6, wherein: in step 2, several speakers pronounce the words containing the voice elements.
8. The voice learning method of claim 6, wherein: the test in step 3 is a random test on words that have not been learned.
9. The voice learning method of claim 6, wherein: in step 4, the test is passed when the accuracy reaches 90% or more.
CN201811445093.3A 2018-11-29 2018-11-29 Voice learning system and method Active CN109378015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811445093.3A CN109378015B (en) 2018-11-29 2018-11-29 Voice learning system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811445093.3A CN109378015B (en) 2018-11-29 2018-11-29 Voice learning system and method

Publications (2)

Publication Number Publication Date
CN109378015A CN109378015A (en) 2019-02-22
CN109378015B true CN109378015B (en) 2023-07-25

Family

ID=65375100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811445093.3A Active CN109378015B (en) 2018-11-29 2018-11-29 Voice learning system and method

Country Status (1)

Country Link
CN (1) CN109378015B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109346058A (en) * 2018-11-29 2019-02-15 西安交通大学 A kind of speech acoustics feature expansion system
CN109922371B (en) * 2019-03-11 2021-07-09 海信视像科技股份有限公司 Natural language processing method, apparatus and storage medium
CN110288972B (en) * 2019-08-07 2021-08-13 北京新唐思创教育科技有限公司 Speech synthesis model training method, speech synthesis method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007017572A (en) * 2005-07-06 2007-01-25 Rion Co Ltd Foreign language learning device and method, and recording media and program for foreign language learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003280699A (en) * 2002-03-22 2003-10-02 Futek Electronics Co Ltd Digital sound recording and reproducing device
KR20110011852A * 2009-07-29 2011-02-09 박인규 Studying mat
CN101958060A (en) * 2009-08-28 2011-01-26 陈美含 English spelling instant technical tool
CN103413550A (en) * 2013-08-30 2013-11-27 苏州跨界软件科技有限公司 Man-machine interactive language learning system and method
EP2924676A1 (en) * 2014-03-25 2015-09-30 Oticon A/s Hearing-based adaptive training systems
CN106792346A (en) * 2016-11-14 2017-05-31 广东小天才科技有限公司 Audio regulation method and device in a kind of instructional video
CN209388701U (en) * 2018-11-29 2019-09-13 西安交通大学 A kind of language learning system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
M. Vikram. Continuous Automatic Speech Recognition System Using MapReduce Framework. 2017 IEEE 7th International Advance Computing Conference (IACC), 2017. *
Cheng Bing (程冰). Research on the neural mechanisms of speech learning and its application in correcting foreign accents. Foreign Language Education (外语教学), 2017. *

Also Published As

Publication number Publication date
CN109378015A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
Chandrasekaran et al. Individual variability in cue-weighting and lexical tone learning
Flege et al. Auditory and categorical effects on cross‐language vowel perception
Escudero et al. Spanish listeners’ perception of American and Southern British English vowels
Weinberger et al. The Speech Accent Archive: towards a typology of English accents
CN109378015B (en) Voice learning system and method
Tremblay Is second language lexical access prosodically constrained? Processing of word stress by French Canadian second language learners of English
Thorin et al. Perception and production in interaction during non-native speech category learning
Vicsi et al. A multimedia, multilingual teaching and training system for children with speech disorders
Lachs et al. Cross-modal source information and spoken word recognition.
Lin et al. End-to-end articulatory modeling for dysarthric articulatory attribute detection
Alexander et al. Specificity and generalization in perceptual adaptation to accented speech
Zhang et al. Adjustment of cue weighting in speech by speakers and listeners: Evidence from amplitude and duration modifications of Mandarin Chinese tone
Ahmed et al. AusKidTalk: an auditory-visual corpus of 3-to 12-year-old Australian children's speech
WO2017008075A1 (en) Systems and methods for human speech training
Munro et al. Evaluating individual variability in foreign accent comprehension
CN209388701U (en) A kind of language learning system
Aoyama et al. An acoustic analysis of American English liquids by adults and children: Native English speakers and native Japanese speakers of English
Sell et al. Perceptual susceptibility to acoustic manipulations in speaker discrimination
Bryant The development of segmental phonology in a mixed language environment: A case study from Northern East Cree
Davidson et al. The effect of word learning on the perception of non-native consonant sequences
Kewley-Port et al. Speaker-dependent speech recognition as the basis for a speech training aid
CN209388698U (en) A kind of speech acoustics feature expansion system
Qin et al. The perception of speech and non-speech tones by tone and non-tone language listeners
Ullah et al. Voice Onset Time of Mankiyali Language: An Acoustic Analysis.
TWI806703B (en) Auxiliary method and system for voice correction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant