CN113506572A - Portable real-time feedback language learning system - Google Patents

Portable real-time feedback language learning system

Info

Publication number
CN113506572A
Authority
CN
China
Prior art keywords: voice, language, module, pronunciation, learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110774465.2A
Other languages
Chinese (zh)
Inventor
刘育雁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Northeast Normal University
Original Assignee
Northeast Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Normal University
Priority to CN202110774465.2A
Publication of CN113506572A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/18: the extracted parameters being spectral information of each sub-band
    • G10L25/45: characterised by the type of analysis window
    • G10L25/48: specially adapted for particular use
    • G10L25/51: specially adapted for particular use for comparison or discrimination
    • G10L25/90: Pitch determination of speech signals
    • G10L2025/906: Pitch tracking

Abstract

The invention discloses a portable real-time feedback language learning system in the technical field of language learning. The device intelligently analyzes, according to the user's learning target and application context, whether the user's pronunciation needs correction, and provides teaching guidance. The system comprises a display module, a voice transmission module, a control module and a language library module. The device converts collected voice signals into digital signals after voice recognition processing, transmits them to the controller, performs harmonic extraction and analysis using a sliding-window discrete Fourier transform, displays the voice signal waveform (the corresponding intensity, duration and pitch) in real time, compares it with the waveform of the corresponding sentence in the language library based on residual theory, and judges whether the pronunciation is correct. The device is easy to carry and can help non-native learners and users with nonstandard pronunciation address pronunciation problems according to their needs.

Description

Portable real-time feedback language learning system
Technical Field
The invention relates to the technical field of language learning, in particular to a portable real-time feedback language learning system.
Background
As cooperation between countries deepens, communication barriers caused by language differences remain a persistent problem. Most current language learning relies on teaching by professional language teachers; autonomous learning methods are few and limited, and without guidance from professionals, errors easily arise in language learning.
Current intelligent learning software offers only a single mode of spoken-pronunciation error correction and cannot adapt to users' different language learning requirements. A portable real-time feedback voice learning system is therefore proposed to address the shortcomings of existing language learning methods and applications.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the problems occurring in the existing language learning system.
Therefore, an object of the present invention is to provide a portable real-time feedback language learning system, which can implement a language learning function in a portable use situation, can feed back a pronunciation part that a user needs to improve in time, and can provide correct pronunciation teaching.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
a portable real-time feedback language learning system comprises a display module, a communication module, a control module, a voice library module and a language transmission module;
the display module: used for information interaction between the user and the system; the interactive information includes the learning-language selection, target learning sentences, learning-stage selection and the language analysis report;
the display module is connected with a display device (an industrial display screen, a mobile phone, an iPad, a computer, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-computer interaction;
the control module: receives the requirements input through the display module, applies a sliding-window discrete Fourier transform to the digital signal input by the language transmission module to output the intensity and pitch waveforms and compare them with the corpus waveforms, and is provided with a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts them into digital signals and sends them to the controller module, and the voice broadcast module plays the correct pronunciation of the target sentence;
the voice library module: comprises a multilingual corpus with the standard pronunciation (pitch, intensity, timbre) of each sentence, word pronunciation techniques and example analyses;
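The voice library module is, in essence, a per-language store of reference contours and teaching notes for each sentence. A minimal sketch of such a structure follows; all class names and fields are hypothetical illustrations, since the patent does not specify an implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a voice-library entry: each sentence stores standard
# pitch, intensity and timbre references plus pronunciation tips.
@dataclass
class SentenceEntry:
    text: str
    pitch: list                 # standard pitch contour (e.g. Hz per frame)
    intensity: list             # standard intensity contour (e.g. dB per frame)
    timbre: list                # spectral-envelope reference
    pronunciation_tips: str = ""

@dataclass
class VoiceLibrary:
    language: str
    entries: dict = field(default_factory=dict)  # sentence id -> SentenceEntry

    def lookup(self, sentence_id: str) -> SentenceEntry:
        return self.entries[sentence_id]

lib = VoiceLibrary(language="en-US")
lib.entries["s1"] = SentenceEntry("hello", [120.0, 125.0], [60.0, 62.0],
                                  [0.1, 0.2], "stress the first syllable")
```

A lookup then returns the full reference bundle that the comparison steps below would consume.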
Step 1: the user selects the learning language, the target learning stage and the relevant sentences;
Step 2: the target sentence is matched in the voice information base and output through the voice recognition module of the control module to the voice broadcast module for demonstration teaching;
Step 3: the user's pronunciation is collected and input to the control module as a digital signal through the language recognition module;
Step 4: in the control module, harmonic extraction and analysis of the pronunciation with respect to pitch, timbre and intensity is performed using a sliding-window discrete Fourier transform;
Step 5: the extracted pitch, timbre and intensity of the user are compared with the standard pronunciation in the information base, and the duration, pitch and intensity are output based on residual theory;
Step 6: a report is generated from the residuals of Step 5; each pronunciation index is compared with the residual threshold set for the current learning stage, and any part whose residual exceeds the threshold is marked as nonstandard pronunciation and shown on the display module;
Step 7: for the nonstandard parts, the standard audio, pronunciation method and example teaching in the standard voice database are delivered to the user through the display module and the voice broadcast module.
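The seven steps above can be condensed into a small end-to-end sketch. This is an illustrative reduction, not the patent's implementation: a plain DFT-bin feature extractor stands in for the sliding-window harmonic analysis of Step 4, and all function names and the threshold value are hypothetical:

```python
import numpy as np

def extract_features(signal, n_harmonics=3):
    # Stand-in for the harmonic extraction of Step 4: magnitudes of the
    # first few DFT bins of the sampled signal.
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum[1:n_harmonics + 1]

def residual_score(user_feat, ref_feat):
    # Step 5: mean square of the residual sequence e(k) = user - reference.
    e = user_feat - ref_feat
    return float(np.mean(e ** 2))

def evaluate(user_signal, ref_signal, threshold):
    # Steps 4-6: extract features, compare with the standard, and flag the
    # pronunciation as nonstandard when the residual exceeds the threshold.
    score = residual_score(extract_features(user_signal),
                           extract_features(ref_signal))
    return {"score": score, "needs_practice": score > threshold}

t = np.linspace(0, 1, 200, endpoint=False)
reference = np.sin(2 * np.pi * 1 * t)                 # "standard" waveform
matching = evaluate(reference, reference, threshold=1e-6)
mismatched = evaluate(np.sin(2 * np.pi * 2 * t), reference, threshold=1e-6)
```

A matching signal yields a near-zero score and passes, while a signal with its energy in a different harmonic is flagged for further practice.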
The step of S4 further includes:
the portable feedback language learning system converts the collected voice signal into a digital signal after voice recognition processing, transmits it to the controller, performs harmonic extraction and analysis using a sliding-window discrete Fourier transform, and displays the voice signal waveform (i.e. the corresponding intensity, duration and pitch) on the display module in real time.
The step of S4 further includes:
exploiting the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuations, the collected voice signal is discretized into n signal segments; a sliding window is then applied, and each segment is taken out and Fourier-transformed to observe the frequency components of the signal within it, reconstructing the amplitude and phase information of the harmonic signal.
The step of S4 further includes:
any periodic function can be represented by an infinite series of sine and cosine functions, namely:

u(t) = c_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\omega_1 t + B_n \sin n\omega_1 t \right) = c_0 + \sum_{n=1}^{\infty} M_n \sin(n\omega_1 t + \varphi_n)

according to the Euler formula:

e^{\pm j\theta} = \cos\theta \pm j\sin\theta

where c_0 is the constant DC component; \omega_1 is the angular frequency of the fundamental of the periodic function; \varphi_n is the initial phase of each harmonic; M_n is the amplitude of each trigonometric term (the harmonic amplitudes for n \ge 2, and the fundamental amplitude for n = 1); A_n is the cosine coefficient and B_n the sine coefficient of the nth harmonic.
By the parity of the trigonometric functions, the series is equivalent to:

u(t) = \sum_{n=-\infty}^{+\infty} c_n e^{j n \omega_1 t}
The above equation is the complex-exponential form of the Fourier series expansion, which leads to the definition of the Fourier transform. As a generalization of the Fourier series, the Fourier transform converts a signal between the time domain and the frequency domain, decomposing a continuous periodic component into discrete spectral components in the frequency domain. Mathematically, for a continuous-time signal u(t) satisfying u(t) \in L^2(\mathbb{R}), its continuous Fourier transform is defined as:

X(\omega) = \int_{-\infty}^{+\infty} u(t) e^{-j\omega t} \, dt

The inverse Fourier transform of X(\omega) is:

u(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega) e^{j\omega t} \, d\omega
In the signal analysis actually used by the voice correction system, by the Fourier series principle, a band-limited periodic voice signal u(t) with period T and frequency band from the fundamental angular frequency \omega_1 to N_{\max}\omega_1 has the Fourier transform:

X(\omega) = 2\pi \sum_{n=-N_{\max}}^{N_{\max}} c_n \, \delta(\omega - n\omega_1)
From the cosine coefficient A_n and the sine coefficient B_n of each harmonic, its amplitude M_n and initial phase \varphi_n can be obtained:

M_n = \sqrt{A_n^2 + B_n^2}

\varphi_n = \arctan\frac{A_n}{B_n}
By the Fourier transform expression, spectral analysis of the speech harmonic signal can be realized through the Fourier transform. In practice, the Fourier series method is mostly implemented digitally, i.e. by the discrete Fourier transform, whose arguments in both the time and frequency domains are discrete. For a finite-length voice signal u(n) that has been turned into a discrete time-domain signal by sampling and A/D conversion, a sampling time window of N samples is formed and the discrete Fourier transform is computed, namely:

X(k) = \sum_{n=0}^{N-1} u(n) \, e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \dots, N-1
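As a check on the windowed DFT described above, the transform can be written out directly from its definition and compared against a library FFT. This sketch is illustrative (not from the patent) and uses a cosine with three cycles in a 32-sample window:

```python
import numpy as np

def dft(u):
    # Direct N-point DFT: X(k) = sum_{n=0}^{N-1} u(n) * exp(-j*2*pi*k*n/N)
    N = len(u)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)  # DFT matrix
    return W @ u

# 3 cycles of a cosine in a 32-sample window: energy lands in bins 3 and 29
# (the conjugate pair), each with magnitude N/2 = 16.
u = np.cos(2 * np.pi * 3 * np.arange(32) / 32)
X = dft(u)
```

The result agrees with `np.fft.fft`, confirming the sign and normalization conventions used in the formula.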
the step of S4 further includes:
the effect of the discrete Fourier transform is, in fact, to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u(t) of the voice correction system with period T, sampled at interval \tau = T/N, the discrete Fourier coefficients corresponding to the above formula are:

A_n = \frac{2}{N} \sum_{k=0}^{N-1} u(k\tau) \cos\frac{2\pi nk}{N}

B_n = \frac{2}{N} \sum_{k=0}^{N-1} u(k\tau) \sin\frac{2\pi nk}{N}

where k = 0, 1, 2, \dots, N-1.
the step of S4 further includes:
to meet the high real-time requirement of the voice correction system, a sliding-window discrete Fourier transform algorithm is adopted;
the sampling time window of N samples is updated iteratively: new real-time samples are added to replace the oldest ones, and the voice harmonic signal is analyzed and detected continuously.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
Figure BDA0003154095140000055
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic.
The sine factor and cosine factor can be written as follows:
Figure BDA0003154095140000061
Figure BDA0003154095140000062
Within each period, every time the data are updated, the newly obtained iteration value is written back into the storage space of the old one. This gives the method strong practicability when the harmonic analysis must meet strict real-time requirements, so the device can be applied to real-time voice monitoring and correction.
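The iterative update can be illustrated with the standard sliding-DFT recurrence, in which each new sample updates a single frequency bin from the previous window instead of recomputing the whole transform. This is a minimal sketch of that general technique (our own formulation, not the patent's code), verified against a direct DFT of the final window:

```python
import numpy as np

def sdft_update(X_k, k, N, x_old, x_new):
    # Standard sliding-DFT recurrence for bin k of an N-point window:
    #   X_k <- (X_k + x_new - x_old) * exp(j*2*pi*k/N)
    # x_old leaves the window, x_new enters it.
    return (X_k + x_new - x_old) * np.exp(2j * np.pi * k / N)

N, k = 16, 2
x = np.random.default_rng(0).standard_normal(N + 5)
X_k = np.fft.fft(x[:N])[k]            # DFT bin k of the initial window x[0:N]
for m in range(5):                    # slide the window forward 5 samples
    X_k = sdft_update(X_k, k, N, x[m], x[m + N])
direct = np.fft.fft(x[5:5 + N])[k]    # direct DFT of the final window x[5:21]
```

Each slide costs one complex multiply and two additions per tracked bin, which is what makes the approach attractive under real-time constraints.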
The step of S5 further includes:
the actual pronunciation of the user is compared with the harmonics of the voice library;
based on residual theory, the pitch and intensity of each word of the user's pronunciation are compared with the standard voice library; the residual sequence is:

e(k) = \hat{y}(k) - y(k), \quad k = 1, 2, \dots, N

where \hat{y}(k) is the actual output of the user and y(k) is the output of the standard voice library.
Mean square sum of residual sequences:
Figure BDA0003154095140000065
where N is the number of samples.
The discriminant for discriminating whether the pronunciation needs to continue learning the threshold detection according to whether the residual value exceeds the target learning set threshold is as follows:
Figure BDA0003154095140000066
wherein epsilon0In order to detect the threshold, the threshold is a decisive factor for judging whether the system has a fault, and if the residual value exceeds the threshold, the learning is required to be continued.
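The discriminant above fits in a few lines of code. The residual measure follows the mean-square definition given earlier; the pitch contours and the threshold value here are purely illustrative:

```python
import numpy as np

def needs_more_practice(user, standard, eps0):
    # Residual sequence e(k) = user(k) - standard(k), then its mean square J;
    # pronunciation is flagged when J exceeds the stage threshold eps0.
    e = np.asarray(user) - np.asarray(standard)
    j = float(np.mean(e ** 2))
    return j > eps0

standard_pitch = [110.0, 115.0, 120.0, 118.0]   # hypothetical reference contour
close = [111.0, 114.0, 121.0, 117.0]            # within tolerance (J = 1.0)
off = [130.0, 140.0, 150.0, 148.0]              # clearly off-target
```

A larger eps0 corresponds to the looser tolerance granted to beginners, as described for the learning-stage selection.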
The step of S1 further includes:
the user inputs the target learning language and learning stage according to his or her own situation, and the system sets different tolerance thresholds for different learning stages, i.e. the tolerance threshold is larger for users with lower spoken-language requirements;
the step of S2 further includes:
the language library contains standard pronunciations of many countries and of regional dialects, and the user can select the target learning language according to different needs.
Compared with the prior art, the invention has the following beneficial effects: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonic extraction and analysis is performed using a sliding-window discrete Fourier transform; the voice signal waveform (i.e. the corresponding intensity, duration and pitch) is displayed in real time and compared, based on residual theory, with the waveform of the corresponding sentence in the language library to judge whether the pronunciation is correct. The device is easy to carry and can help non-native learners and users with nonstandard pronunciation address pronunciation problems according to their needs.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a block diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 2 is a three-dimensional schematic diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 4 is a time domain signal digital processing flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a sliding window discrete fourier transform data iteration process of a portable self-feedback language learning system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention, but the invention may be practiced in ways other than those described here; those skilled in the art can make similar generalizations without departing from its spirit, and the invention is therefore not limited to the specific embodiments disclosed below.
Next, the invention is described in detail with reference to the drawings. For convenience of illustration, the cross-sectional views of the device structure are not drawn to a uniform scale and are only examples, which should not limit the scope of the invention; the three dimensions of length, width and depth should be included in actual fabrication.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
The structural schematic diagram of the invention is shown in FIG. 1. The structure comprises a display module, a control module, a voice library module and a language transmission module. Each module is described below:
the display module: used for information interaction between the user and the system; the interactive information includes the learning-language selection, target learning sentences, learning-stage selection and the language analysis report;
the display module is connected with a display device (an industrial display screen, a mobile phone, an iPad, a computer, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-computer interaction;
the control module: receives the requirements input through the display module, applies a sliding-window discrete Fourier transform to the digital signal input by the language transmission module to output the intensity and pitch waveforms and compare them with the corpus waveforms, and is provided with a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts them into digital signals and sends them to the controller module, and the voice broadcast module plays the correct pronunciation of the target sentence;
the voice library module: comprises a multilingual corpus with the standard pronunciation (pitch, intensity, timbre) of each sentence, word pronunciation techniques and example analyses;
FIG. 3 is a flowchart of the portable self-feedback language learning system provided in this embodiment. The procedure is divided into 7 steps:
Step 1: the user selects the learning language, the target learning stage and the relevant sentences;
the user inputs the target learning language and learning stage according to his or her own situation, and the system sets different tolerance thresholds for different learning stages, i.e. the tolerance threshold is larger for users with lower spoken-language requirements;
Step 2: the target sentence is matched in the voice information base and output through the voice recognition module of the control module to the voice broadcast module for demonstration teaching;
Step 3: the user's pronunciation is collected; the voice conversion module A/D-converts the collected voice signal into a digital signal and sends it to the controller module; the digital processing flow of the time-domain signal in this conversion is shown in FIG. 4;
Step 4: harmonic extraction and analysis of the pronunciation with respect to pitch, timbre and intensity is performed using a sliding-window discrete Fourier transform;
FIG. 5 is a schematic diagram of the data iteration process of the sliding-window discrete Fourier transform of the portable self-feedback language learning system provided in this embodiment.
The method can be represented by an infinite series of sine and cosine functions for any periodic function, namely:
Figure BDA0003154095140000091
according to the euler formula:
Figure BDA0003154095140000101
wherein, c0Is a constant direct current constant; omega1An angular frequency representing a fundamental frequency of the periodic function;
Figure BDA0003154095140000102
representing the initial phase of each harmonic; mnRepresenting the amplitude of each level of trigonometric function, wherein n is more than or equal to 2 and is each harmonic amplitude, and when n is 1, the amplitude is the fundamental wave amplitude; a. thenCosine coefficients representing the nth harmonic; b isnRepresenting the sine coefficients of the nth harmonic. Parity according to trigonometric functionsAlternatively, the equation may be equivalent to:
Figure BDA0003154095140000103
The above equation is the complex-exponential form of the Fourier series expansion, which leads to the definition of the Fourier transform. As a generalization of the Fourier series, the Fourier transform converts a signal between the time domain and the frequency domain, decomposing a continuous periodic component into discrete spectral components in the frequency domain. Mathematically, for a continuous-time signal u(t) satisfying u(t) \in L^2(\mathbb{R}), its continuous Fourier transform is defined as:

X(\omega) = \int_{-\infty}^{+\infty} u(t) e^{-j\omega t} \, dt

The inverse Fourier transform of X(\omega) is:

u(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega) e^{j\omega t} \, d\omega
In the signal analysis actually used by the voice correction system, by the Fourier series principle, a band-limited periodic voice signal u(t) with period T and frequency band from the fundamental angular frequency \omega_1 to N_{\max}\omega_1 has the Fourier transform:

X(\omega) = 2\pi \sum_{n=-N_{\max}}^{N_{\max}} c_n \, \delta(\omega - n\omega_1)
From the cosine coefficient A_n and the sine coefficient B_n of each harmonic, its amplitude M_n and initial phase \varphi_n can be obtained:

M_n = \sqrt{A_n^2 + B_n^2}

\varphi_n = \arctan\frac{A_n}{B_n}
By the Fourier transform expression, spectral analysis of the speech harmonic signal can be realized through the Fourier transform. In practice, the Fourier series method is mostly implemented digitally, i.e. by the discrete Fourier transform, whose arguments in both the time and frequency domains are discrete. For a finite-length voice signal u(n) that has been turned into a discrete time-domain signal by sampling and A/D conversion, a sampling time window of N samples is formed and the discrete Fourier transform is computed, namely:

X(k) = \sum_{n=0}^{N-1} u(n) \, e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \dots, N-1
the step of S4 further includes:
the effect of the discrete Fourier transform is, in fact, to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u(t) of the voice correction system with period T, sampled at interval \tau = T/N, the discrete Fourier coefficients corresponding to the above formula are:

A_n = \frac{2}{N} \sum_{k=0}^{N-1} u(k\tau) \cos\frac{2\pi nk}{N}

B_n = \frac{2}{N} \sum_{k=0}^{N-1} u(k\tau) \sin\frac{2\pi nk}{N}

where k = 0, 1, 2, \dots, N-1.
the step of S4 further includes:
to meet the high real-time requirement of the voice correction system, a sliding-window discrete Fourier transform algorithm is adopted;
the sampling time window of N samples is updated iteratively: new real-time samples are added to replace the oldest ones, and the voice harmonic signal is analyzed and detected continuously.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
Figure BDA0003154095140000121
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic. The corresponding nth harmonic information is the same as the formula.
The sine factor and cosine factor can be written as follows:
Figure BDA0003154095140000122
Figure BDA0003154095140000123
Within each period, every time the data are updated, the newly obtained iteration value is written back into the storage space of the old one. This gives the method strong practicability when the harmonic analysis must meet strict real-time requirements, so the device can be applied to real-time voice monitoring and correction.
Comparing the extracted pitch, tone and tone intensity of the user with the standard pronunciation of the information base, and outputting the duration, pitch and tone intensity by the base and residual error theory respectively at Step5 shown in FIG. 1;
based on residual theory, the pitch and intensity of each word of the user's pronunciation are compared with the standard voice library; the residual sequence is:

e(k) = \hat{y}(k) - y(k), \quad k = 1, 2, \dots, N

where \hat{y}(k) is the actual output of the user and y(k) is the output of the standard voice library.
Mean square sum of residual sequences:
Figure BDA0003154095140000126
where N is the number of samples.
Step 6: a report is generated from the residuals of Step 5; each pronunciation index is compared with the residual threshold set for the current learning stage, and any part whose residual exceeds the threshold is marked as nonstandard pronunciation and shown on the display module;
the discriminant for deciding, according to whether the residual exceeds the threshold set for the target learning stage, whether the pronunciation requires continued learning is:
$$\begin{cases} J \le \varepsilon_0, & \text{pronunciation acceptable} \\ J > \varepsilon_0, & \text{continued learning required} \end{cases}$$
where $J$ is the mean square of the residuals and $\varepsilon_0$ is the detection threshold, the decisive factor in judging whether the pronunciation is up to standard; if the residual exceeds the threshold, the learner must continue practicing.
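A hedged sketch of the Step 6 report logic, comparing each pronunciation index's residual against the stage-specific threshold; the dictionary interface and names here are assumptions for illustration:

```python
def pronunciation_report(residuals, thresholds):
    """For each pronunciation index (e.g. duration, pitch, intensity),
    flag it as needing continued learning when its mean-square residual
    exceeds the detection threshold set for the current learning stage."""
    report = {}
    for name, J in residuals.items():
        eps0 = thresholds[name]  # stage-specific threshold for this index
        report[name] = "continue learning" if J > eps0 else "acceptable"
    return report
```

A stricter learning stage simply supplies smaller threshold values, so the same comparison yields more parts flagged for practice.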
Step 7: for the parts with non-standard pronunciation, the standard audio from the standard voice library, the pronunciation method and example teaching are transmitted to the user through the display module and the voice broadcast module.
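The steps above (harmonic extraction, residual comparison, threshold decision) can be tied together in a minimal end-to-end sketch; all names and the amplitude-spectrum comparison shown here are illustrative assumptions, not the patent's actual interfaces:

```python
import numpy as np

def analyze_pronunciation(user_audio, standard_audio, window_size, harmonics, eps0):
    """Illustrative sketch of Steps 4-6: extract harmonic amplitudes of
    the learner's and the library's audio with a windowed DFT, form the
    residuals of the amplitude spectra, and decide from the mean-square
    residual whether more practice is needed."""
    def harmonic_amplitudes(x):
        x = np.asarray(x[:window_size], dtype=float)
        k = np.arange(window_size)
        amps = []
        for n in harmonics:
            A = (2.0 / window_size) * np.dot(x, np.cos(2 * np.pi * n * k / window_size))
            B = (2.0 / window_size) * np.dot(x, np.sin(2 * np.pi * n * k / window_size))
            amps.append(np.hypot(A, B))  # amplitude of the nth harmonic
        return np.array(amps)

    r = harmonic_amplitudes(user_audio) - harmonic_amplitudes(standard_audio)
    J = float(np.mean(r ** 2))           # mean-square residual
    return J, J > eps0                   # (residual, needs more practice?)
```

Identical user and reference audio yields a zero residual and no practice flag; an amplitude mismatch on any tracked harmonic raises the residual above the threshold.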
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for its elements without departing from the scope of the invention. In particular, the features of the disclosed embodiments may be used in any combination provided no structural conflict exists; for brevity, such combinations are not described exhaustively in this specification. Therefore, the invention is not limited to the particular embodiments disclosed, but includes all embodiments falling within the scope of the appended claims.

Claims (6)

1. A portable real-time feedback language learning system, characterized in that: the system comprises a display module, a control module, a voice library module and a language transmission module;
the display module: the system is used for realizing information interaction between a user and the system, is connected with display equipment in a USB interface or Bluetooth connection mode, and can realize man-machine information interaction by matching with corresponding software, wherein the interactive information comprises learning language selection, target learning sentences, learning stage selection and language analysis reports;
the control module: the method comprises the steps of carrying out control file transmission with an upper computer through a USB interface, executing the input requirement of a display module, comparing a tone intensity tone pitch output waveform of a digital quantity signal input by a language transmission module with a corpus standard waveform by a sliding window discrete Fourier transform method by using a residual error theory, and outputting the waveform to the display module;
the language transmission module: A/D-converts the collected voice signal into a digital signal and sends it to the control module, and the control module outputs voice to play the correct pronunciation of the target sentence;
the voice library module: a universal language library downloaded from an upper computer through a USB module, containing the standard pronunciations (pitch, intensity and tone) of the standard languages of a multi-language library for the different learning stages, together with pronunciation skills of each language and example analyses.
2. A portable real-time feedback language learning system according to claim 1, wherein: the system execution comprises the following steps:
step 1: the user selects the learning language, the target learning stage and the related sentences;
step 2: the target sentences are matched in the voice information base and output, through the voice recognition module of the control module, to the voice broadcast module for demonstration teaching;
step 3: the user's pronunciation content is collected and input, in digital form, to the control module through the language transmission module;
step 4: harmonic extraction and analysis of the learner's pronunciation content, by pitch, tone and intensity, is carried out in the control module using a sliding-window discrete Fourier transform method;
step 5: the extracted pitch, tone and intensity of the user are compared with the corresponding standard pronunciation in the language library, and the residuals of duration, pitch and intensity are displayed based on residual theory;
step 6: a report is generated from the residuals of step 5, each pronunciation index is compared with the residual threshold set for the current learning stage, and any part whose residual exceeds the threshold is defined as non-standard pronunciation and displayed in the display module;
step 7: for the parts with non-standard pronunciation, the standard audio from the standard voice library, the pronunciation method and example teaching are transmitted to the user through the display module and the voice broadcast module.
3. A portable real-time feedback language learning system according to claim 1, wherein: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonics are extracted and analyzed using a sliding-window discrete Fourier transform method, and the voice signal waveform, i.e. the corresponding intensity, duration and pitch, is displayed in the display module in real time.
4. A portable real-time feedback language learning system according to claim 3, wherein: exploiting the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuation, the collected voice signal is discretized into n signal segments; a sliding window is then applied and each segment is taken out for Fourier transform, so that the frequency components of the signal within the segment can be observed and the amplitude and phase information of the harmonic signal reconstructed.
5. A portable real-time feedback language learning system according to claim 2, wherein: in the harmonic comparison part, the pitch and intensity of each word of the user's pronunciation are compared with the standard voice library based on residual theory, the residual sequence being expressed as:
$$r(k) = \hat{y}(k) - y(k)$$
where $\hat{y}(k)$ is the actual output of the user and $y(k)$ is the output of the standard voice library.
The mean square of the residual sequence is:
$$J = \frac{1}{N}\sum_{k=1}^{N} r(k)^{2}$$
where N is the number of samples.
The discriminant for deciding, according to whether the residual exceeds the threshold set for the target learning stage, whether the pronunciation requires continued learning is:
$$\begin{cases} J \le \varepsilon_0, & \text{pronunciation acceptable} \\ J > \varepsilon_0, & \text{continued learning required} \end{cases}$$
where $J$ is the mean square of the residuals and $\varepsilon_0$ is the detection threshold, the decisive factor in judging whether the pronunciation is up to standard; if the residual exceeds the threshold, continued learning is required.
6. A portable real-time feedback language learning system according to claim 5, wherein: the detection threshold is set to a different tolerance for each learning stage input by the user, so that the range of the residual comparison threshold differs accordingly.
CN202110774465.2A 2021-07-08 2021-07-08 Portable real-time feedback language learning system Pending CN113506572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110774465.2A CN113506572A (en) 2021-07-08 2021-07-08 Portable real-time feedback language learning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110774465.2A CN113506572A (en) 2021-07-08 2021-07-08 Portable real-time feedback language learning system

Publications (1)

Publication Number Publication Date
CN113506572A true CN113506572A (en) 2021-10-15

Family

ID=78012276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110774465.2A Pending CN113506572A (en) 2021-07-08 2021-07-08 Portable real-time feedback language learning system

Country Status (1)

Country Link
CN (1) CN113506572A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453543A (en) * 2023-03-31 2023-07-18 华南师范大学 Teaching language specification analysis method and system based on voice recognition

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0135864A2 (en) * 1983-09-28 1985-04-03 International Business Machines Corporation System and method for automatically testing integrated circuit memory arrays on different memory array testers
US20060084047A1 (en) * 2004-10-20 2006-04-20 Inventec Corporation System and method of segmented language learning
CN102394015A (en) * 2011-03-25 2012-03-28 黄进明 Speech learning machine and player with bilingual model
US20160019379A1 (en) * 2014-07-21 2016-01-21 Green Grade Solutions Ltd. E-Learning Utilizing Remote Proctoring and Analytical Metrics Captured During Training and Testing
CN106205634A (en) * 2016-07-14 2016-12-07 东北电力大学 A kind of spoken English in college level study and test system and method
CN107316638A (en) * 2017-06-28 2017-11-03 北京粉笔未来科技有限公司 A kind of poem recites evaluating method and system, a kind of terminal and storage medium
CN107818164A (en) * 2017-11-02 2018-03-20 东北师范大学 A kind of intelligent answer method and its system
CN109545244A (en) * 2019-01-29 2019-03-29 北京猎户星空科技有限公司 Speech evaluating method, device, electronic equipment and storage medium
EP3503074A1 (en) * 2016-08-17 2019-06-26 Kainuma, Ken-ichi Language learning system and language learning program
CN111899576A (en) * 2020-07-23 2020-11-06 腾讯科技(深圳)有限公司 Control method and device for pronunciation test application, storage medium and electronic equipment
CN112530459A (en) * 2020-11-27 2021-03-19 珠海读书郎网络教育有限公司 Mouth shape correction method and mouth shape correction system
CN112599115A (en) * 2020-11-19 2021-04-02 上海电机学院 Spoken language evaluation system and method thereof


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LUIZA OROSANU, ET AL.: "Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning", 2012 IEEE Spoken Language Technology Workshop (SLT) *
LIU Yuyan (刘育雁): "A study on the acquisition of '然后' by pre-university international students in China", Journal of Changchun Education Institute *
LUO Gangfeng (罗刚峰) et al.: "Design of an automatic scoring system for spoken English tests based on sequence matching", Automation & Instrumentation *


Similar Documents

Publication Publication Date Title
CN109754778B (en) Text speech synthesis method and device and computer equipment
CN109523989B (en) Speech synthesis method, speech synthesis device, storage medium, and electronic apparatus
JP6312942B2 (en) Language model generation apparatus, language model generation method and program thereof
US20230317055A1 (en) Method, apparatus, storage medium and electronic device for speech synthesis
CN109256152A (en) Speech assessment method and device, electronic equipment, storage medium
KR20170041105A (en) Apparatus and method for calculating acoustic score in speech recognition, apparatus and method for learning acoustic model
KR20170034227A (en) Apparatus and method for speech recognition, apparatus and method for learning transformation parameter
CN111312209A (en) Text-to-speech conversion processing method and device and electronic equipment
CN111354343B (en) Voice wake-up model generation method and device and electronic equipment
CN111563390B (en) Text generation method and device and electronic equipment
CN110797010A (en) Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
CN111489735B (en) Voice recognition model training method and device
EP3929768A1 (en) Method and apparatus for generating triple sample, electronic device and computer storage medium
CN108597538B (en) Evaluation method and system of speech synthesis system
CN112217947B (en) Method, system, equipment and storage medium for transcribing text by customer service telephone voice
US20220358956A1 (en) Audio onset detection method and apparatus
CN112349289B (en) Voice recognition method, device, equipment and storage medium
CN111339758A (en) Text error correction method and system based on deep learning model
CN114330371A (en) Session intention identification method and device based on prompt learning and electronic equipment
CN113506572A (en) Portable real-time feedback language learning system
JP2021179590A (en) Accent detection method, device and non-temporary storage medium
CN112309409A (en) Audio correction method and related device
CN112346696A (en) Speech comparison of virtual assistants
EP3822813A1 (en) Similarity processing method, apparatus, server and storage medium
CN1308908C (en) Transformation from characters to sound for synthesizing text paragraph pronunciation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination