CN113506572A - Portable real-time feedback language learning system - Google Patents
- Publication number
- CN113506572A (application number CN202110774465.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- language
- module
- pronunciation
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
- G10L2025/906—Pitch tracking
Abstract
The invention discloses a portable real-time feedback language learning system, belonging to the technical field of language learning. The device intelligently analyzes whether the user's pronunciation needs correction according to the user's learning target and usage occasion, and provides teaching guidance. The invention comprises a display module, a voice transmission module, a control module and a language library module. The device converts the collected voice signal into a digital signal after voice recognition processing, transmits it to the controller, performs harmonic extraction analysis using a sliding-window discrete Fourier transform method, displays the voice signal waveform (the corresponding tone intensity, tone length and pitch) in real time, compares it with the corresponding sentence waveform in the language library on the basis of residual theory, and judges whether the pronunciation is correct. The device is easy to carry and lets non-native learners and users with non-standard pronunciation address their pronunciation problems according to their own needs.
Description
Technical Field
The invention relates to the technical field of language learning, in particular to a portable real-time feedback language learning system.
Background
As cooperation among the countries of the world deepens, communication hindered by language barriers remains a persistent difficulty. Most current language learning depends on teachers professionally engaged in language education; methods for autonomous learning are few and heavily constrained, and without guidance from professionals, errors easily arise during language learning.
Current intelligent learning software offers only a single mode of spoken-pronunciation error correction and cannot adapt to users' differing language-learning requirements. A portable real-time feedback language learning system is therefore proposed to address these shortcomings of existing language learning methods and applications.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the problems occurring in the existing language learning system.
Therefore, an object of the present invention is to provide a portable real-time feedback language learning system that implements a language learning function in portable use, promptly feeds back the parts of pronunciation the user needs to improve, and provides correct pronunciation teaching.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
a portable real-time feedback language learning system comprises a display module, a control module, a voice library module and a language transmission module;
the display module: used for information interaction between the user and the system; the interactive information includes the selected learning language, target learning sentences, the selected learning stage and the language analysis report;
the display module connects to display devices (including industrial display screens, mobile phones, iPads, computers, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-machine information interaction;
the control module: receives the requirements input through the display module, applies the sliding-window discrete Fourier transform method to the digital signal supplied by the language transmission module to output tone intensity and pitch waveforms, compares them with the corpus waveforms, and provides a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts the collected voice signals into digital signals and sends them to the controller module, and the voice broadcast module broadcasts the correct pronunciation of the target sentence;
the voice library module comprises a multi-language library holding, for each sentence, standard pronunciation features such as pitch, tone intensity and tone color, together with word pronunciation techniques and example analyses;
Step 1: the user selects a learning language, a target learning stage and related sentences;
Step 2: match the target sentences in the voice information base, output them through the voice recognition module of the control module to the voice broadcast module, and perform demonstration teaching;
Step 3: collect the user's pronunciation and input it to the control module as a digital signal through the language transmission module;
Step 4: perform harmonic extraction analysis of the pronunciation content by pitch, tone color and tone intensity using the sliding-window discrete Fourier transform method in the control module;
Step 5: compare the extracted pitch, tone color and tone intensity of the user with the standard pronunciations in the information base, and output residuals for tone length, pitch and tone intensity on the basis of residual theory;
Step 6: generate a report based on the residuals of Step 5, compare each pronunciation index with the residual threshold set for the current learning stage, and mark the parts whose residual exceeds the set threshold as non-standard pronunciation to be displayed in the display module;
Step 7: transmit to the user, through the display module and the voice broadcast module, the standard audio, pronunciation method and example teaching from the standard voice library for the parts with non-standard pronunciation.
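Steps 3 to 5 above can be sketched in miniature. The following Python fragment derives the three quantities the system compares, pitch, tone intensity and tone length (duration), from a single sampled utterance. The function names, the bare FFT-peak pitch estimate and the parameter values are illustrative assumptions, not the patent's actual implementation, which uses the sliding-window DFT described later:

```python
import numpy as np

def extract_features(samples, rate=16000):
    """Estimate pitch, intensity and duration of one utterance.

    A plain FFT-peak estimate stands in for the patent's sliding-window
    DFT; all names here are illustrative.
    """
    x = np.asarray(samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    pitch = freqs[np.argmax(spectrum[1:]) + 1]   # strongest bin, skipping DC
    intensity = float(np.sqrt(np.mean(x ** 2)))  # RMS level ("tone intensity")
    duration = len(x) / rate                     # seconds ("tone length")
    return pitch, intensity, duration

rate = 16000
t = np.arange(rate) / rate                       # one second of audio
tone = np.sin(2 * np.pi * 220 * t)               # 220 Hz test tone
pitch, intensity, duration = extract_features(tone, rate)
```

For a pure 220 Hz tone this recovers the pitch exactly; real speech would require framing and a more robust pitch tracker.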
The step of S4 further includes:
a portable feedback language learning system converts the collected voice signal into a digital signal after voice recognition processing, transmits it to the controller, performs harmonic extraction analysis using the sliding-window discrete Fourier transform method, and displays the voice signal waveform (i.e., the corresponding tone intensity, tone length and pitch) in the display module in real time.
The step of S4 further includes:
using the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuations, the collected voice signal is discretized and divided into n signal segments; a sliding window is then applied, and the signal segments are taken out and Fourier-transformed to observe the frequency components of the signal within each segment, reconstructing the amplitude and phase information of the harmonic signal.
The step of S4 further includes:
any periodic function can be represented by an infinite series of sine and cosine functions, namely:

u(t) = c_0 + \sum_{n=1}^{\infty} M_n \sin(n\omega_1 t + \varphi_n) = c_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\omega_1 t + B_n \sin n\omega_1 t \right)

according to the Euler formula:

e^{j\theta} = \cos\theta + j\sin\theta

where c_0 is the constant DC component; \omega_1 is the angular frequency of the fundamental of the periodic function; \varphi_n is the initial phase of each harmonic; M_n is the amplitude of the n-th term of the trigonometric series (the n-th harmonic amplitude for n >= 2, and the fundamental amplitude for n = 1); A_n is the cosine coefficient of the n-th harmonic; and B_n is the sine coefficient of the n-th harmonic.

From the parity of the trigonometric functions, the equation is equivalent to:

u(t) = \sum_{n=-\infty}^{+\infty} c_n e^{j n \omega_1 t}, \qquad c_n = \frac{A_n - j B_n}{2}
the above equation is a complex exponential form of a fourier series expansion, according to the definition of the fourier transform. As the upgrade of the fourier series, the fourier transform can realize the conversion of the signal in the time domain and the frequency domain, and can decompose the signal through time-frequency conversion to convert a continuous periodic component into a discrete spectral component in the frequency domain. Mathematically, for satisfying u (t) e L2(R) a continuous-time signal u (t) whose continuous fourier transform can be defined as:
the inverse fourier transform of X (ω) is:
In the practical signal analysis of the voice correction system, by the Fourier series principle, let a band-limited periodic voice signal u(t) have period T, with its frequency band running from the fundamental angular frequency \omega up to N_{max}\omega; its Fourier series expression is:

u(t) = c_0 + \sum_{n=1}^{N_{max}} M_n \sin(n\omega t + \varphi_n)

The amplitude M_n and initial phase angle \varphi_n of each harmonic are obtained from its cosine coefficient A_n and sine coefficient B_n:

M_n = \sqrt{A_n^2 + B_n^2}, \qquad \varphi_n = \arctan\frac{A_n}{B_n}
according to the expression of Fourier transform, the spectrum analysis of the speech harmonic signal can be realized through the Fourier transform. In practice, the fourier series method is mostly implemented by a digital processing method, i.e., discrete fourier transform. For the discrete fourier transform, its arguments in both the time and frequency domains are discrete. For a finite long voice signal u (N) which is processed into a discrete time domain signal through sampling and A/D conversion, a sampling time window is formed by taking N sampling data as a group, and discrete Fourier transform is performed, namely:
the step of S4 further includes:
the effect of the discrete fourier transform is in fact to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u (T) of the voice correction system, the period is T, and a discrete Fourier transform expression corresponding to the formula is as follows:
the step of S4 further includes:
aiming at the requirement of high real-time performance of a voice correction system, a sliding window discrete Fourier transform algorithm is adopted;
and carrying out iterative updating on a sampling time window formed by the N sampling data, and adding new real-time sampling data to replace the original part to carry out analysis and detection on the voice harmonic signals.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic.
The sine factor and cosine factor can be written as follows:
Within a period, every time the data is updated, the new iteration value is written back into the storage space of the old iteration value. This makes the method highly practical where harmonic analysis has strict real-time requirements, and it is therefore applied here to real-time voice monitoring and correction.
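The update-in-place iteration can be sketched as follows. Because the cos/sin basis is periodic in the absolute sample index, the departing and arriving samples share one basis value, so each coefficient is corrected in place rather than recomputed over the whole window. The class name and test signal are illustrative assumptions:

```python
import numpy as np

class SlidingHarmonic:
    """Track the n-th harmonic coefficients A_n, B_n of an N-sample
    window, updating them in place as each new sample replaces the
    oldest one (the sliding-window DFT iteration)."""

    def __init__(self, samples, n):
        self.N = len(samples)
        self.n = n
        self.window = list(samples)
        self.m = self.N - 1                      # absolute index of newest sample
        k = np.arange(self.N)
        self.A = (2.0 / self.N) * float(np.sum(samples * np.cos(2 * np.pi * n * k / self.N)))
        self.B = (2.0 / self.N) * float(np.sum(samples * np.sin(2 * np.pi * n * k / self.N)))

    def push(self, sample):
        oldest = self.window.pop(0)              # sample leaving the window
        self.window.append(sample)               # new real-time sample
        self.m += 1
        c = 2 * np.pi * self.n * self.m / self.N
        self.A += (2.0 / self.N) * (sample - oldest) * np.cos(c)
        self.B += (2.0 / self.N) * (sample - oldest) * np.sin(c)
        return float(np.hypot(self.A, self.B))   # current amplitude M_n

N, n = 32, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(N + 5)
sh = SlidingHarmonic(x[:N], n)
for s in x[N:]:
    M = sh.push(s)
# reference: coefficients computed directly over the final window x[5:37]
k = np.arange(5, N + 5)
A_ref = (2.0 / N) * np.sum(x[5:] * np.cos(2 * np.pi * n * k / N))
B_ref = (2.0 / N) * np.sum(x[5:] * np.sin(2 * np.pi * n * k / N))
```

Each `push` costs O(1) per tracked harmonic, which is what makes the sliding window attractive for real-time monitoring.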
The step of S5 further includes:
the part comparing the user's actual pronunciation with the harmonics of the voice library;

the pitch and tone intensity of each word of the user's pronunciation are compared with the standard voice library on the basis of residual theory, where the residual sequence is:

e(k) = \hat{y}(k) - y(k)

where \hat{y} is the user's actual output and y is the output of the standard voice library.

The mean square of the residual sequence is:

\varepsilon = \frac{1}{N} \sum_{k=1}^{N} e(k)^2

where N is the number of samples.

The discriminant for deciding whether the pronunciation needs continued learning, by threshold detection on whether the residual value exceeds the threshold set for the target learning stage, is:

\varepsilon > \varepsilon_0

where \varepsilon_0 is the detection threshold, the decisive factor in judging whether the pronunciation deviates from the standard; if the residual value exceeds the threshold, learning must continue.
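A minimal sketch of this residual check follows; the threshold value eps0 = 0.05 and all names are illustrative assumptions (the patent sets the threshold per learning stage):

```python
import numpy as np

def residual_mean_square(user, standard):
    """Mean square of the residual sequence e(k) = user(k) - standard(k)."""
    e = np.asarray(user, dtype=float) - np.asarray(standard, dtype=float)
    return float(np.mean(e ** 2))

def needs_more_practice(user, standard, eps0=0.05):
    """Threshold detection: a residual measure above eps0 means keep learning."""
    return residual_mean_square(user, standard) > eps0

standard = np.sin(2 * np.pi * np.arange(100) / 100)  # reference waveform
good = standard * 0.99                               # close to the reference
bad = standard * 0.2                                 # far from the reference
```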
The step of S1 further includes:
the user inputs the target learning language and learning stage according to his or her own learning situation, and the system sets a different fault-tolerance threshold for each learning stage, i.e., users with lower spoken-language requirements are given a larger fault-tolerance threshold;
the step of S2 further includes:
the language library comprises standard pronunciations of various countries and dialects of various regions, and a user can select a target learning language according to different requirements.
Compared with the prior art, the invention has the following beneficial effects: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonic extraction analysis is performed with the sliding-window discrete Fourier transform method; the voice signal waveform (i.e., the corresponding tone intensity, tone length and pitch) is displayed in real time; and the waveform is compared with the corresponding sentence waveform in the language library on the basis of residual theory to judge whether the pronunciation is correct. The device is easy to carry and lets non-native learners and users with non-standard pronunciation address their pronunciation problems according to their own needs.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a block diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 2 is a three-dimensional schematic diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 4 is a time domain signal digital processing flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a sliding window discrete fourier transform data iteration process of a portable self-feedback language learning system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the invention may be practiced in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited to the specific embodiments disclosed below.
Next, the present invention is described in detail with reference to the drawings. For convenience of illustration, the drawings are only examples and are not drawn to a uniform scale; they should not limit the scope of the invention. The actual product additionally has the three dimensions of length, width and depth.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
The structural schematic diagram of the invention is shown in FIG. 1; the structure comprises a display module, a control module, a voice library module and a language transmission module. Each module is described below:
the display module: used for information interaction between the user and the system; the interactive information includes the selected learning language, target learning sentences, the selected learning stage and the language analysis report;
the display module connects to display devices (including industrial display screens, mobile phones, iPads, computers, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-machine information interaction;
the control module: receives the requirements input through the display module, applies the sliding-window discrete Fourier transform method to the digital signal supplied by the language transmission module to output tone intensity and pitch waveforms, compares them with the corpus waveforms, and provides a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts the collected voice signals into digital signals and sends them to the controller module, and the voice broadcast module broadcasts the correct pronunciation of the target sentence;
the voice library module comprises a multi-language library holding, for each sentence, standard pronunciation features such as pitch, tone intensity and tone color, together with word pronunciation techniques and example analyses;
FIG. 3 is a flowchart of the portable self-feedback language learning system provided in this embodiment; it is divided into 7 steps:
Step 1: the user selects a learning language, a target learning stage and related sentences;
the user inputs the target learning language and learning stage according to his or her own learning situation, and the system sets a different fault-tolerance threshold for each learning stage, i.e., users with lower spoken-language requirements are given a larger fault-tolerance threshold;
Step 2: match the target sentences in the voice information base, output them through the voice recognition module of the control module to the voice broadcast module, and perform demonstration teaching;
Step 3: collect the user's pronunciation; the voice conversion module A/D-converts the collected voice signal into a digital signal and sends it to the controller module. The digital processing flow of the time-domain signal in this conversion is shown in FIG. 4;
Step 4: perform harmonic extraction analysis of the pronunciation content by pitch, tone color and tone intensity using the sliding-window discrete Fourier transform method;
FIG. 5 is a schematic diagram of the data iteration process of the sliding-window discrete Fourier transform of the portable self-feedback language learning system provided in this embodiment.
Any periodic function can be represented by an infinite series of sine and cosine functions, namely:

u(t) = c_0 + \sum_{n=1}^{\infty} M_n \sin(n\omega_1 t + \varphi_n) = c_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\omega_1 t + B_n \sin n\omega_1 t \right)

according to the Euler formula:

e^{j\theta} = \cos\theta + j\sin\theta

where c_0 is the constant DC component; \omega_1 is the angular frequency of the fundamental of the periodic function; \varphi_n is the initial phase of each harmonic; M_n is the amplitude of the n-th term of the trigonometric series (the n-th harmonic amplitude for n >= 2, and the fundamental amplitude for n = 1); A_n is the cosine coefficient of the n-th harmonic; and B_n is the sine coefficient of the n-th harmonic. From the parity of the trigonometric functions, the equation is equivalent to:

u(t) = \sum_{n=-\infty}^{+\infty} c_n e^{j n \omega_1 t}, \qquad c_n = \frac{A_n - j B_n}{2}

The above equation is the complex exponential form of the Fourier series expansion, consistent with the definition of the Fourier transform. As a generalization of the Fourier series, the Fourier transform converts a signal between the time domain and the frequency domain; by time-frequency conversion it decomposes the signal, turning continuous periodic components into discrete spectral components in the frequency domain. Mathematically, for a continuous-time signal u(t) satisfying u(t) \in L^2(\mathbb{R}), its continuous Fourier transform can be defined as:

X(\omega) = \int_{-\infty}^{+\infty} u(t) e^{-j\omega t} \, dt

The inverse Fourier transform of X(\omega) is:

u(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega) e^{j\omega t} \, d\omega

In the practical signal analysis of the voice correction system, by the Fourier series principle, let a band-limited periodic voice signal u(t) have period T, with its frequency band running from the fundamental angular frequency \omega up to N_{max}\omega; its Fourier series expression is:

u(t) = c_0 + \sum_{n=1}^{N_{max}} M_n \sin(n\omega t + \varphi_n)

The amplitude M_n and initial phase angle \varphi_n of each harmonic are obtained from its cosine coefficient A_n and sine coefficient B_n:

M_n = \sqrt{A_n^2 + B_n^2}, \qquad \varphi_n = \arctan\frac{A_n}{B_n}

According to the expression of the Fourier transform, spectrum analysis of the voice harmonic signal can be realized through the Fourier transform. In practice, the Fourier series method is mostly implemented digitally, i.e., as the discrete Fourier transform, whose arguments in both the time and frequency domains are discrete. For a finite-length voice signal u(n), processed into a discrete time-domain signal by sampling and A/D conversion, a sampling time window is formed from each group of N sampled data points and the discrete Fourier transform is performed, namely:

X(m) = \sum_{k=0}^{N-1} u(k\tau) e^{-j 2\pi k m / N}, \qquad m = 0, 1, \dots, N-1

where \tau is the sampling interval.
the step of S4 further includes:
the effect of the discrete fourier transform is in fact to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u (T) of the voice correction system, the period is T, and the corresponding discrete Fourier transform expression is as follows:
the step of S4 further includes:
aiming at the requirement of high real-time performance of a voice correction system, a sliding window discrete Fourier transform algorithm is adopted;
and carrying out iterative updating on a sampling time window formed by the N sampling data, and adding new real-time sampling data to replace the original part to carry out analysis and detection on the voice harmonic signals.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic. The corresponding nth harmonic information is the same as the formula.
The sine factor and cosine factor can be written as follows:
in a period, every time data updating is carried out, the obtained new iteration value needs to be placed in the storage space of the old iteration value again, and the method has strong practicability in the situation of higher real-time requirement in the harmonic wave analysis process. Therefore, the device is applied to real-time voice monitoring and correction.
At Step 5 shown in FIG. 3, the extracted pitch, tone color and tone intensity of the user are compared with the standard pronunciations in the information base, and residuals for tone length, pitch and tone intensity are output on the basis of residual theory.

The pitch and tone intensity of each word of the user's pronunciation are compared with the standard voice library on the basis of residual theory, where the residual sequence is:

e(k) = \hat{y}(k) - y(k)

where \hat{y} is the user's actual output and y is the output of the standard voice library.

The mean square of the residual sequence is:

\varepsilon = \frac{1}{N} \sum_{k=1}^{N} e(k)^2

where N is the number of samples.

Step 6: generate a report based on the residuals of Step 5, compare each pronunciation index with the residual threshold set for the current learning stage, and mark the parts whose residual exceeds the set threshold as non-standard pronunciation to be displayed in the display module.

The discriminant for deciding whether the pronunciation needs continued learning, by threshold detection on whether the residual value exceeds the threshold set for the target learning stage, is:

\varepsilon > \varepsilon_0

where \varepsilon_0 is the detection threshold, the decisive factor in judging whether the pronunciation deviates from the standard; if the residual value exceeds the threshold, learning must continue.
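Combining the per-stage fault-tolerance thresholds of Step 1 with this discriminant, the stage-dependent check might be sketched as below. All threshold values and names are assumptions; the patent states only that lower spoken-language requirements correspond to larger thresholds:

```python
# Illustrative per-stage detection thresholds eps0 (the values are assumptions)
STAGE_THRESHOLDS = {"beginner": 0.30, "intermediate": 0.15, "advanced": 0.05}

def flag_segments(residual_mse_per_word, stage):
    """Return the words whose residual mean square exceeds the threshold
    of the chosen learning stage; these are the non-standard parts
    routed back to the display and voice broadcast modules."""
    eps0 = STAGE_THRESHOLDS[stage]
    return [word for word, mse in residual_mse_per_word.items() if mse > eps0]

scores = {"hello": 0.02, "world": 0.20}   # assumed per-word residual measures
```

An advanced learner would be flagged on "world" here, while a beginner's looser threshold accepts both words.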
Step 7: transmit to the user, through the display module and the voice broadcast module, the standard audio, pronunciation method and example teaching from the standard voice library for the parts with non-standard pronunciation.
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments of the invention may be used in any combination, provided that no structural conflict exists, and the combinations are not exhaustively described in this specification merely for the sake of brevity and resource conservation. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (6)
1. A portable real-time feedback language learning system, characterized in that it comprises a display module, a control module, a voice library module and a language transmission module;
the display module: used for information interaction between the user and the system; it is connected to a display device via a USB interface or Bluetooth and, together with corresponding software, realizes human-machine information interaction; the interactive information includes learning-language selection, target learning sentences, learning-stage selection and language analysis reports;
the control module: transmits control files to and from a host computer through a USB interface, executes the requirements input via the display module, applies the sliding-window discrete Fourier transform method to the digital signal from the language transmission module to obtain tone intensity and pitch waveforms, compares them with the standard corpus waveforms using residual theory, and outputs the result to the display module;
the language transmission module: performs A/D conversion of the collected voice signal into a digital signal and sends it to the control module; the control module in turn outputs speech to play the correct pronunciation of the target sentence;
the voice library module: a universal language library downloaded from an upper computer through a USB module; it contains the standard pronunciations (pitch, intensity and tone) of the standard languages in a multilingual corpus, the corresponding learning stages, the pronunciation skills of each language, and example analyses.
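For illustration only (not part of the claims), the four-module division above can be sketched as minimal Python interfaces; all class names, sample values and the 0.05 tolerance below are hypothetical, not from the patent:

```python
class LanguageTransmissionModule:
    """A/D-converts collected voice into a digital sample sequence."""
    def capture(self):
        # Placeholder: a real device would sample the microphone here.
        return [0.0, 0.1, -0.1]

class VoiceLibraryModule:
    """Holds standard pronunciations (pitch, intensity, tone) per sentence."""
    def __init__(self):
        self.standard = {"hello": [0.0, 0.1, 0.0]}
    def lookup(self, sentence):
        return self.standard[sentence]

class ControlModule:
    """Compares captured speech with the standard and returns residuals."""
    def analyze(self, samples, standard):
        return [s - t for s, t in zip(samples, standard)]

class DisplayModule:
    """Shows the analysis verdict to the user."""
    def show(self, residuals):
        # 0.05 is an arbitrary illustrative tolerance.
        return "nonstandard" if any(abs(r) > 0.05 for r in residuals) else "ok"

mic = LanguageTransmissionModule()
library = VoiceLibraryModule()
control = ControlModule()
display = DisplayModule()
residuals = control.analyze(mic.capture(), library.lookup("hello"))
verdict = display.show(residuals)
```

The design point is the one the claim makes: acquisition, storage, comparison and presentation are separated, so any module (e.g. the display) can be swapped without touching the analysis.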
2. A portable real-time feedback language learning system according to claim 1, wherein the system executes the following steps:
Step 1: the user selects a learning language, a target learning stage and the related sentences;
Step 2: the target sentences are matched in the voice information base and output through the voice recognition module of the control module to the voice broadcast module for demonstration teaching;
Step 3: the user's pronunciation is collected and input to the control module in digital form through the language transmission module;
Step 4: harmonic extraction and analysis of the learner's pronunciation, by pitch, tone and intensity, is carried out in the control module using a sliding-window discrete Fourier transform;
Step 5: the extracted pitch, tone and intensity of the user are compared with the corresponding standard pronunciation in the language library, and the residuals of duration, pitch and intensity are each displayed on the basis of residual theory;
Step 6: a report is generated from the residuals of step 5; each pronunciation index is compared with the residual threshold set for the current learning stage, and any part whose residual exceeds the set threshold is defined as a nonstandard pronunciation part and displayed in the display module;
Step 7: for the nonstandard pronunciation parts, the corresponding standard audio, pronunciation method and example teaching from the standard voice library are transmitted to the user through the display module and the voice broadcast module.
3. A portable real-time feedback language learning system according to claim 1, wherein: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonics are extracted and analyzed by a sliding-window discrete Fourier transform, and the voice signal waveform, i.e. the corresponding intensity, duration and pitch, is displayed in real time in the display module.
4. A portable real-time feedback language learning system according to claim 3, wherein: exploiting the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuation, the collected voice signal is discretized into n signal segments; sliding-window processing then takes each segment out for a Fourier transform, so that the frequency components of the signal within the segment can be observed and the amplitude and phase information of the harmonic signals reconstructed.
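As an illustration of the segmentation-and-transform described in claims 3 and 4 (not the patent's code; the window length and hop size are arbitrary choices), a sliding-window DFT can be sketched with NumPy: each windowed segment is Fourier-transformed and its per-bin amplitude and phase recovered:

```python
import numpy as np

def sliding_window_dft(signal, window_len, hop):
    """Split the sampled voice signal into overlapping segments and
    Fourier-transform each one, so the frequency components of every
    segment can be observed as the window slides along the signal."""
    spectra = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = signal[start:start + window_len]
        spectrum = np.fft.rfft(segment)
        # Amplitude and phase of each harmonic bin in this segment
        spectra.append((np.abs(spectrum), np.angle(spectrum)))
    return spectra

# Example: one second of a 440 Hz test tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
frames = sliding_window_dft(tone, window_len=512, hop=256)
amp, phase = frames[0]
peak_bin = int(np.argmax(amp))
peak_freq = peak_bin * fs / 512   # FFT bin index -> frequency in Hz
```

On this test tone the peak-amplitude bin of the first frame maps back to 440 Hz within one bin width (fs/512, about 15.6 Hz), which is the frequency resolution the window length buys.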
5. A portable real-time feedback language learning system according to claim 2, wherein: a harmonic comparison part compares the pitch, tone and intensity of each word of the user's pronunciation with the standard voice library on the basis of residual theory, the residual sequence being expressed as:
e(k) = ŷ(k) − y(k)
where ŷ(k) is the actual output of the user and y(k) is the output of the standard speech library.
The mean square sum of the residual sequence is:
ε = (1/N) Σ e(k)², k = 1, …, N
where N is the number of samples.
The discriminant for judging, from whether the residual value exceeds the threshold set for the target learning stage, whether the pronunciation needs continued learning is:
ε > ε₀: continue learning; ε ≤ ε₀: pass,
where ε₀ is the detection threshold, the decisive factor in judging whether the pronunciation is nonstandard; if the residual value exceeds the threshold, continued learning is required.
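The three expressions in this claim can be checked numerically; the sketch below (the pitch values are hypothetical, not from the patent) computes the residual sequence e(k), its mean square, and the threshold discriminant:

```python
import numpy as np

def needs_more_practice(user_seq, standard_seq, eps0):
    """Residual discriminant from claim 5: e(k) = yhat(k) - y(k),
    with the mean square of the residuals compared against the
    detection threshold eps0."""
    e = np.asarray(user_seq, dtype=float) - np.asarray(standard_seq, dtype=float)
    mean_square = float(np.mean(e ** 2))   # (1/N) * sum of e(k)^2
    return mean_square, mean_square > eps0

# Hypothetical per-word pitch values (Hz): user vs. standard library
user = [210.0, 225.0, 198.0]
standard = [200.0, 220.0, 205.0]
ms, again = needs_more_practice(user, standard, eps0=10.0)
```

Here e = [10, 5, −7], so the mean square is (100 + 25 + 49)/3 = 58, which exceeds the threshold of 10, so this pronunciation would be flagged for continued learning.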
6. A portable real-time feedback language learning system according to claim 5, wherein: the detection threshold sets different tolerance values for the different learning stages input by the user, and the range of the residual comparison threshold differs accordingly.
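A minimal sketch of the stage-dependent tolerance in this claim (the stage names and threshold values are hypothetical, not from the patent): the detection threshold ε₀ is looked up by learning stage, so the same mean-square residual can pass at an early stage yet require continued practice at an advanced one.

```python
# Hypothetical stage-specific detection thresholds (claim 6):
# stricter (smaller) eps0 at more advanced stages.
STAGE_THRESHOLDS = {"beginner": 50.0, "intermediate": 20.0, "advanced": 5.0}

def continue_learning(mean_square_residual, stage):
    """True if the mean-square residual exceeds the detection
    threshold for the user's current learning stage."""
    eps0 = STAGE_THRESHOLDS[stage]
    return mean_square_residual > eps0

ms = 30.0  # mean-square residual from the claim-5 comparison
beginner = continue_learning(ms, "beginner")   # 30 > 50 is False: passes
advanced = continue_learning(ms, "advanced")   # 30 > 5 is True: keep practicing
```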
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110774465.2A CN113506572A (en) | 2021-07-08 | 2021-07-08 | Portable real-time feedback language learning system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113506572A true CN113506572A (en) | 2021-10-15 |
Family
ID=78012276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110774465.2A Pending CN113506572A (en) | 2021-07-08 | 2021-07-08 | Portable real-time feedback language learning system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113506572A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116453543A (en) * | 2023-03-31 | 2023-07-18 | 华南师范大学 | Teaching language specification analysis method and system based on voice recognition |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0135864A2 (en) * | 1983-09-28 | 1985-04-03 | International Business Machines Corporation | System and method for automatically testing integrated circuit memory arrays on different memory array testers |
US20060084047A1 (en) * | 2004-10-20 | 2006-04-20 | Inventec Corporation | System and method of segmented language learning |
CN102394015A (en) * | 2011-03-25 | 2012-03-28 | 黄进明 | Speech learning machine and player with bilingual model |
US20160019379A1 (en) * | 2014-07-21 | 2016-01-21 | Green Grade Solutions Ltd. | E-Learning Utilizing Remote Proctoring and Analytical Metrics Captured During Training and Testing |
CN106205634A (en) * | 2016-07-14 | 2016-12-07 | 东北电力大学 | A kind of spoken English in college level study and test system and method |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107818164A (en) * | 2017-11-02 | 2018-03-20 | 东北师范大学 | A kind of intelligent answer method and its system |
CN109545244A (en) * | 2019-01-29 | 2019-03-29 | 北京猎户星空科技有限公司 | Speech evaluating method, device, electronic equipment and storage medium |
EP3503074A1 (en) * | 2016-08-17 | 2019-06-26 | Kainuma, Ken-ichi | Language learning system and language learning program |
CN111899576A (en) * | 2020-07-23 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Control method and device for pronunciation test application, storage medium and electronic equipment |
CN112530459A (en) * | 2020-11-27 | 2021-03-19 | 珠海读书郎网络教育有限公司 | Mouth shape correction method and mouth shape correction system |
CN112599115A (en) * | 2020-11-19 | 2021-04-02 | 上海电机学院 | Spoken language evaluation system and method thereof |
Non-Patent Citations (3)
Title |
---|
LUIZA OROSANU,ET AL.: "Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning", 《2012 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT)》 * |
刘育雁 (Liu Yuyan): "A study of the acquisition of 'ranhou' by preparatory students in China", 《Journal of Changchun Education Institute》 *
罗刚峰 (Luo Gangfeng) et al.: "Design of an automatic scoring system for spoken English tests based on sequence matching", 《Automation & Instrumentation》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754778B (en) | Text speech synthesis method and device and computer equipment | |
CN109523989B (en) | Speech synthesis method, speech synthesis device, storage medium, and electronic apparatus | |
JP6312942B2 (en) | Language model generation apparatus, language model generation method and program thereof | |
US20230317055A1 (en) | Method, apparatus, storage medium and electronic device for speech synthesis | |
CN109256152A (en) | Speech assessment method and device, electronic equipment, storage medium | |
KR20170041105A (en) | Apparatus and method for calculating acoustic score in speech recognition, apparatus and method for learning acoustic model | |
KR20170034227A (en) | Apparatus and method for speech recognition, apparatus and method for learning transformation parameter | |
CN111312209A (en) | Text-to-speech conversion processing method and device and electronic equipment | |
CN111354343B (en) | Voice wake-up model generation method and device and electronic equipment | |
CN111563390B (en) | Text generation method and device and electronic equipment | |
CN110797010A (en) | Question-answer scoring method, device, equipment and storage medium based on artificial intelligence | |
CN111489735B (en) | Voice recognition model training method and device | |
EP3929768A1 (en) | Method and apparatus for generating triple sample, electronic device and computer storage medium | |
CN108597538B (en) | Evaluation method and system of speech synthesis system | |
CN112217947B (en) | Method, system, equipment and storage medium for transcribing text by customer service telephone voice | |
US20220358956A1 (en) | Audio onset detection method and apparatus | |
CN112349289B (en) | Voice recognition method, device, equipment and storage medium | |
CN111339758A (en) | Text error correction method and system based on deep learning model | |
CN114330371A (en) | Session intention identification method and device based on prompt learning and electronic equipment | |
CN113506572A (en) | Portable real-time feedback language learning system | |
JP2021179590A (en) | Accent detection method, device and non-temporary storage medium | |
CN112309409A (en) | Audio correction method and related device | |
CN112346696A (en) | Speech comparison of virtual assistants | |
EP3822813A1 (en) | Similarity processing method, apparatus, server and storage medium | |
CN1308908C (en) | Transformation from characters to sound for synthesizing text paragraph pronunciation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||