CN113506572A - Portable real-time feedback language learning system - Google Patents
- Publication number
- CN113506572A (application number CN202110774465.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- language
- module
- pronunciation
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
- G10L2025/906—Pitch tracking
Abstract
The invention discloses a portable real-time feedback language learning system, belonging to the technical field of language learning. The device intelligently analyzes whether the user's pronunciation needs correction according to the user's learning target and usage occasion, and provides teaching guidance. The invention comprises a display module, a voice transmission module, a control module and a language library module. The device converts the collected voice signal into a digital signal after voice recognition processing, transmits it to the controller, performs harmonic extraction analysis using a sliding-window discrete Fourier transform method, displays the voice signal waveform (the corresponding tone intensity, tone length and pitch) in real time, compares it with the corresponding sentence waveform in the language library on the basis of residual theory, and judges whether the pronunciation is correct. The device is easy to carry and lets non-native learners and users with non-standard pronunciation address their pronunciation problems according to their own needs.
Description
Technical Field
The invention relates to the technical field of language learning, in particular to a portable real-time feedback language learning system.
Background
As cooperation among the countries of the world deepens, communication hindered by language barriers remains a persistent difficulty. Most current language learning depends on teachers professionally engaged in language education; methods for autonomous learning are few and heavily constrained, and without guidance from professionals, errors easily arise during language learning.
Current intelligent learning software offers only a single mode of spoken-pronunciation error correction and cannot adapt to users' differing language-learning requirements. A portable real-time feedback language learning system is therefore proposed to address these shortcomings of existing language learning methods and applications.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the problems occurring in the existing language learning system.
Therefore, an object of the present invention is to provide a portable real-time feedback language learning system that implements a language learning function in portable use, promptly feeds back the parts of pronunciation the user needs to improve, and provides correct pronunciation teaching.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
a portable real-time feedback language learning system comprises a display module, a control module, a voice library module and a language transmission module;
the display module: used for information interaction between the user and the system; the interactive information includes the selected learning language, target learning sentences, the selected learning stage and the language analysis report;
the display module connects to display devices (including industrial display screens, mobile phones, iPads, computers, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-machine information interaction;
the control module: receives the requirements input through the display module, applies the sliding-window discrete Fourier transform method to the digital signal supplied by the language transmission module to output tone intensity and pitch waveforms, compares them with the corpus waveforms, and provides a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts the collected voice signals into digital signals and sends them to the controller module, and the voice broadcast module broadcasts the correct pronunciation of the target sentence;
the voice library module comprises a multi-language library holding, for each sentence, standard pronunciation features such as pitch, tone intensity and tone color, together with word pronunciation techniques and example analyses;
Step 1: the user selects a learning language, a target learning stage and related sentences;
Step 2: match the target sentences in the voice information base, output them through the voice recognition module of the control module to the voice broadcast module, and perform demonstration teaching;
Step 3: collect the user's pronunciation and input it to the control module as a digital signal through the language transmission module;
Step 4: perform harmonic extraction analysis of the pronunciation content by pitch, tone color and tone intensity using the sliding-window discrete Fourier transform method in the control module;
Step 5: compare the extracted pitch, tone color and tone intensity of the user with the standard pronunciations in the information base, and output residuals for tone length, pitch and tone intensity on the basis of residual theory;
Step 6: generate a report based on the residuals of Step 5, compare each pronunciation index with the residual threshold set for the current learning stage, and mark the parts whose residual exceeds the set threshold as non-standard pronunciation to be displayed in the display module;
Step 7: transmit to the user, through the display module and the voice broadcast module, the standard audio, pronunciation method and example teaching from the standard voice library for the parts with non-standard pronunciation.
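Steps 3 to 5 above can be sketched in miniature. The following Python fragment derives the three quantities the system compares, pitch, tone intensity and tone length (duration), from a single sampled utterance. The function names, the bare FFT-peak pitch estimate and the parameter values are illustrative assumptions, not the patent's actual implementation, which uses the sliding-window DFT described later:

```python
import numpy as np

def extract_features(samples, rate=16000):
    """Estimate pitch, intensity and duration of one utterance.

    A plain FFT-peak estimate stands in for the patent's sliding-window
    DFT; all names here are illustrative.
    """
    x = np.asarray(samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    pitch = freqs[np.argmax(spectrum[1:]) + 1]   # strongest bin, skipping DC
    intensity = float(np.sqrt(np.mean(x ** 2)))  # RMS level ("tone intensity")
    duration = len(x) / rate                     # seconds ("tone length")
    return pitch, intensity, duration

rate = 16000
t = np.arange(rate) / rate                       # one second of audio
tone = np.sin(2 * np.pi * 220 * t)               # 220 Hz test tone
pitch, intensity, duration = extract_features(tone, rate)
```

For a pure 220 Hz tone this recovers the pitch exactly; real speech would require framing and a more robust pitch tracker.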
The step of S4 further includes:
a portable feedback language learning system converts the collected voice signal into a digital signal after voice recognition processing, transmits it to the controller, performs harmonic extraction analysis using the sliding-window discrete Fourier transform method, and displays the voice signal waveform (i.e., the corresponding tone intensity, tone length and pitch) in the display module in real time.
The step of S4 further includes:
using the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuations, the collected voice signal is discretized and divided into n signal segments; a sliding window is then applied, and the signal segments are taken out and Fourier-transformed to observe the frequency components of the signal within each segment, reconstructing the amplitude and phase information of the harmonic signal.
The step of S4 further includes:
any periodic function can be represented by an infinite series of sine and cosine functions, namely:

u(t) = c_0 + \sum_{n=1}^{\infty} M_n \sin(n\omega_1 t + \varphi_n) = c_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\omega_1 t + B_n \sin n\omega_1 t \right)

according to the Euler formula:

e^{j\theta} = \cos\theta + j\sin\theta

where c_0 is the constant DC component; \omega_1 is the angular frequency of the fundamental of the periodic function; \varphi_n is the initial phase of each harmonic; M_n is the amplitude of the n-th term of the trigonometric series (the n-th harmonic amplitude for n >= 2, and the fundamental amplitude for n = 1); A_n is the cosine coefficient of the n-th harmonic; and B_n is the sine coefficient of the n-th harmonic.

From the parity of the trigonometric functions, the equation is equivalent to:

u(t) = \sum_{n=-\infty}^{+\infty} c_n e^{j n \omega_1 t}, \qquad c_n = \frac{A_n - j B_n}{2}
the above equation is a complex exponential form of a fourier series expansion, according to the definition of the fourier transform. As the upgrade of the fourier series, the fourier transform can realize the conversion of the signal in the time domain and the frequency domain, and can decompose the signal through time-frequency conversion to convert a continuous periodic component into a discrete spectral component in the frequency domain. Mathematically, for satisfying u (t) e L2(R) a continuous-time signal u (t) whose continuous fourier transform can be defined as:
the inverse fourier transform of X (ω) is:
In the practical signal analysis of the voice correction system, by the Fourier series principle, let a band-limited periodic voice signal u(t) have period T, with its frequency band running from the fundamental angular frequency \omega up to N_{max}\omega; its Fourier series expression is:

u(t) = c_0 + \sum_{n=1}^{N_{max}} M_n \sin(n\omega t + \varphi_n)

The amplitude M_n and initial phase angle \varphi_n of each harmonic are obtained from its cosine coefficient A_n and sine coefficient B_n:

M_n = \sqrt{A_n^2 + B_n^2}, \qquad \varphi_n = \arctan\frac{A_n}{B_n}
according to the expression of Fourier transform, the spectrum analysis of the speech harmonic signal can be realized through the Fourier transform. In practice, the fourier series method is mostly implemented by a digital processing method, i.e., discrete fourier transform. For the discrete fourier transform, its arguments in both the time and frequency domains are discrete. For a finite long voice signal u (N) which is processed into a discrete time domain signal through sampling and A/D conversion, a sampling time window is formed by taking N sampling data as a group, and discrete Fourier transform is performed, namely:
the step of S4 further includes:
the effect of the discrete fourier transform is in fact to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u (T) of the voice correction system, the period is T, and a discrete Fourier transform expression corresponding to the formula is as follows:
the step of S4 further includes:
aiming at the requirement of high real-time performance of a voice correction system, a sliding window discrete Fourier transform algorithm is adopted;
and carrying out iterative updating on a sampling time window formed by the N sampling data, and adding new real-time sampling data to replace the original part to carry out analysis and detection on the voice harmonic signals.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic.
The sine factor and cosine factor can be written as follows:
Within a period, every time the data is updated, the new iteration value is written back into the storage space of the old iteration value. This makes the method highly practical where harmonic analysis has strict real-time requirements, and it is therefore applied here to real-time voice monitoring and correction.
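The update-in-place iteration can be sketched as follows. Because the cos/sin basis is periodic in the absolute sample index, the departing and arriving samples share one basis value, so each coefficient is corrected in place rather than recomputed over the whole window. The class name and test signal are illustrative assumptions:

```python
import numpy as np

class SlidingHarmonic:
    """Track the n-th harmonic coefficients A_n, B_n of an N-sample
    window, updating them in place as each new sample replaces the
    oldest one (the sliding-window DFT iteration)."""

    def __init__(self, samples, n):
        self.N = len(samples)
        self.n = n
        self.window = list(samples)
        self.m = self.N - 1                      # absolute index of newest sample
        k = np.arange(self.N)
        self.A = (2.0 / self.N) * float(np.sum(samples * np.cos(2 * np.pi * n * k / self.N)))
        self.B = (2.0 / self.N) * float(np.sum(samples * np.sin(2 * np.pi * n * k / self.N)))

    def push(self, sample):
        oldest = self.window.pop(0)              # sample leaving the window
        self.window.append(sample)               # new real-time sample
        self.m += 1
        c = 2 * np.pi * self.n * self.m / self.N
        self.A += (2.0 / self.N) * (sample - oldest) * np.cos(c)
        self.B += (2.0 / self.N) * (sample - oldest) * np.sin(c)
        return float(np.hypot(self.A, self.B))   # current amplitude M_n

N, n = 32, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(N + 5)
sh = SlidingHarmonic(x[:N], n)
for s in x[N:]:
    M = sh.push(s)
# reference: coefficients computed directly over the final window x[5:37]
k = np.arange(5, N + 5)
A_ref = (2.0 / N) * np.sum(x[5:] * np.cos(2 * np.pi * n * k / N))
B_ref = (2.0 / N) * np.sum(x[5:] * np.sin(2 * np.pi * n * k / N))
```

Each `push` costs O(1) per tracked harmonic, which is what makes the sliding window attractive for real-time monitoring.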
The step of S5 further includes:
the part comparing the user's actual pronunciation with the harmonics of the voice library;

the pitch and tone intensity of each word of the user's pronunciation are compared with the standard voice library on the basis of residual theory, where the residual sequence is:

e(k) = \hat{y}(k) - y(k)

where \hat{y} is the user's actual output and y is the output of the standard voice library.

The mean square of the residual sequence is:

\varepsilon = \frac{1}{N} \sum_{k=1}^{N} e(k)^2

where N is the number of samples.

The discriminant for deciding whether the pronunciation needs continued learning, by threshold detection on whether the residual value exceeds the threshold set for the target learning stage, is:

\varepsilon > \varepsilon_0

where \varepsilon_0 is the detection threshold, the decisive factor in judging whether the pronunciation deviates from the standard; if the residual value exceeds the threshold, learning must continue.
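A minimal sketch of this residual check follows; the threshold value eps0 = 0.05 and all names are illustrative assumptions (the patent sets the threshold per learning stage):

```python
import numpy as np

def residual_mean_square(user, standard):
    """Mean square of the residual sequence e(k) = user(k) - standard(k)."""
    e = np.asarray(user, dtype=float) - np.asarray(standard, dtype=float)
    return float(np.mean(e ** 2))

def needs_more_practice(user, standard, eps0=0.05):
    """Threshold detection: a residual measure above eps0 means keep learning."""
    return residual_mean_square(user, standard) > eps0

standard = np.sin(2 * np.pi * np.arange(100) / 100)  # reference waveform
good = standard * 0.99                               # close to the reference
bad = standard * 0.2                                 # far from the reference
```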
The step of S1 further includes:
the user inputs the target learning language and learning stage according to his or her own learning situation, and the system sets a different fault-tolerance threshold for each learning stage, i.e., users with lower spoken-language requirements are given a larger fault-tolerance threshold;
the step of S2 further includes:
the language library comprises standard pronunciations of various countries and dialects of various regions, and a user can select a target learning language according to different requirements.
Compared with the prior art, the invention has the following beneficial effects: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonic extraction analysis is performed with the sliding-window discrete Fourier transform method; the voice signal waveform (i.e., the corresponding tone intensity, tone length and pitch) is displayed in real time; and the waveform is compared with the corresponding sentence waveform in the language library on the basis of residual theory to judge whether the pronunciation is correct. The device is easy to carry and lets non-native learners and users with non-standard pronunciation address their pronunciation problems according to their own needs.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a block diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 2 is a three-dimensional schematic diagram of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
FIG. 4 is a time domain signal digital processing flow chart of a portable self-feedback language learning system according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a sliding window discrete fourier transform data iteration process of a portable self-feedback language learning system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the invention may be practiced in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited to the specific embodiments disclosed below.
Next, the present invention is described in detail with reference to the drawings. For convenience of illustration, the drawings are only examples and are not drawn to a uniform scale; they should not limit the scope of the invention. The actual product additionally has the three dimensions of length, width and depth.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
The structural schematic diagram of the invention is shown in FIG. 1; the structure comprises a display module, a control module, a voice library module and a language transmission module. Each module is described below:
the display module: used for information interaction between the user and the system; the interactive information includes the selected learning language, target learning sentences, the selected learning stage and the language analysis report;
the display module connects to display devices (including industrial display screens, mobile phones, iPads, computers, etc.) via a USB interface or Bluetooth and, together with a corresponding app, realizes human-machine information interaction;
the control module: receives the requirements input through the display module, applies the sliding-window discrete Fourier transform method to the digital signal supplied by the language transmission module to output tone intensity and pitch waveforms, compares them with the corpus waveforms, and provides a USB interface for transmission to a host computer;
the language transmission module: comprises a voice broadcast module, a voice conversion module and a microphone module; the microphone module collects voice signals, the voice conversion module A/D-converts the collected voice signals into digital signals and sends them to the controller module, and the voice broadcast module broadcasts the correct pronunciation of the target sentence;
the voice library module comprises a multi-language library holding, for each sentence, standard pronunciation features such as pitch, tone intensity and tone color, together with word pronunciation techniques and example analyses;
FIG. 3 is a flowchart of the portable self-feedback language learning system provided in this embodiment; it is divided into 7 steps:
Step 1: the user selects a learning language, a target learning stage and related sentences;
the user inputs the target learning language and learning stage according to his or her own learning situation, and the system sets a different fault-tolerance threshold for each learning stage, i.e., users with lower spoken-language requirements are given a larger fault-tolerance threshold;
Step 2: match the target sentences in the voice information base, output them through the voice recognition module of the control module to the voice broadcast module, and perform demonstration teaching;
Step 3: collect the user's pronunciation; the voice conversion module A/D-converts the collected voice signal into a digital signal and sends it to the controller module. The digital processing flow of the time-domain signal in this conversion is shown in FIG. 4;
Step 4: perform harmonic extraction analysis of the pronunciation content by pitch, tone color and tone intensity using the sliding-window discrete Fourier transform method;
FIG. 5 is a schematic diagram of the data iteration process of the sliding-window discrete Fourier transform of the portable self-feedback language learning system provided in this embodiment.
Any periodic function can be represented by an infinite series of sine and cosine functions, namely:

u(t) = c_0 + \sum_{n=1}^{\infty} M_n \sin(n\omega_1 t + \varphi_n) = c_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\omega_1 t + B_n \sin n\omega_1 t \right)

according to the Euler formula:

e^{j\theta} = \cos\theta + j\sin\theta

where c_0 is the constant DC component; \omega_1 is the angular frequency of the fundamental of the periodic function; \varphi_n is the initial phase of each harmonic; M_n is the amplitude of the n-th term of the trigonometric series (the n-th harmonic amplitude for n >= 2, and the fundamental amplitude for n = 1); A_n is the cosine coefficient of the n-th harmonic; and B_n is the sine coefficient of the n-th harmonic. From the parity of the trigonometric functions, the equation is equivalent to:

u(t) = \sum_{n=-\infty}^{+\infty} c_n e^{j n \omega_1 t}, \qquad c_n = \frac{A_n - j B_n}{2}

The above equation is the complex exponential form of the Fourier series expansion, consistent with the definition of the Fourier transform. As a generalization of the Fourier series, the Fourier transform converts a signal between the time domain and the frequency domain; by time-frequency conversion it decomposes the signal, turning continuous periodic components into discrete spectral components in the frequency domain. Mathematically, for a continuous-time signal u(t) satisfying u(t) \in L^2(\mathbb{R}), its continuous Fourier transform can be defined as:

X(\omega) = \int_{-\infty}^{+\infty} u(t) e^{-j\omega t} \, dt

The inverse Fourier transform of X(\omega) is:

u(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega) e^{j\omega t} \, d\omega

In the practical signal analysis of the voice correction system, by the Fourier series principle, let a band-limited periodic voice signal u(t) have period T, with its frequency band running from the fundamental angular frequency \omega up to N_{max}\omega; its Fourier series expression is:

u(t) = c_0 + \sum_{n=1}^{N_{max}} M_n \sin(n\omega t + \varphi_n)

The amplitude M_n and initial phase angle \varphi_n of each harmonic are obtained from its cosine coefficient A_n and sine coefficient B_n:

M_n = \sqrt{A_n^2 + B_n^2}, \qquad \varphi_n = \arctan\frac{A_n}{B_n}

According to the expression of the Fourier transform, spectrum analysis of the voice harmonic signal can be realized through the Fourier transform. In practice, the Fourier series method is mostly implemented digitally, i.e., as the discrete Fourier transform, whose arguments in both the time and frequency domains are discrete. For a finite-length voice signal u(n), processed into a discrete time-domain signal by sampling and A/D conversion, a sampling time window is formed from each group of N sampled data points and the discrete Fourier transform is performed, namely:

X(m) = \sum_{k=0}^{N-1} u(k\tau) e^{-j 2\pi k m / N}, \qquad m = 0, 1, \dots, N-1

where \tau is the sampling interval.
the step of S4 further includes:
the effect of the discrete fourier transform is in fact to discretize a finite-length sequence in the frequency domain;
for a harmonic signal u (T) of the voice correction system, the period is T, and the corresponding discrete Fourier transform expression is as follows:
the step of S4 further includes:
aiming at the requirement of high real-time performance of a voice correction system, a sliding window discrete Fourier transform algorithm is adopted;
and carrying out iterative updating on a sampling time window formed by the N sampling data, and adding new real-time sampling data to replace the original part to carry out analysis and detection on the voice harmonic signals.
According to the formula, the extracted nth-order harmonic component of the speech signal of its corresponding sliding window fourier can be expressed as:
in the formula, NnewRepresents the latest sampling point, un(k τ) represents the sample data at time k. A. thenRepresents the n-th harmonic cosine coefficient, BnRepresenting the sine coefficients of the nth harmonic. The corresponding nth harmonic information is the same as the formula.
The sine factor and cosine factor can be written as follows:
in a period, every time data updating is carried out, the obtained new iteration value needs to be placed in the storage space of the old iteration value again, and the method has strong practicability in the situation of higher real-time requirement in the harmonic wave analysis process. Therefore, the device is applied to real-time voice monitoring and correction.
At Step 5 shown in FIG. 3, the extracted pitch, tone color and tone intensity of the user are compared with the standard pronunciations in the information base, and residuals for tone length, pitch and tone intensity are output on the basis of residual theory.

The pitch and tone intensity of each word of the user's pronunciation are compared with the standard voice library on the basis of residual theory, where the residual sequence is:

e(k) = \hat{y}(k) - y(k)

where \hat{y} is the user's actual output and y is the output of the standard voice library.

The mean square of the residual sequence is:

\varepsilon = \frac{1}{N} \sum_{k=1}^{N} e(k)^2

where N is the number of samples.

Step 6: generate a report based on the residuals of Step 5, compare each pronunciation index with the residual threshold set for the current learning stage, and mark the parts whose residual exceeds the set threshold as non-standard pronunciation to be displayed in the display module.

The discriminant for deciding whether the pronunciation needs continued learning, by threshold detection on whether the residual value exceeds the threshold set for the target learning stage, is:

\varepsilon > \varepsilon_0

where \varepsilon_0 is the detection threshold, the decisive factor in judging whether the pronunciation deviates from the standard; if the residual value exceeds the threshold, learning must continue.
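Combining the per-stage fault-tolerance thresholds of Step 1 with this discriminant, the stage-dependent check might be sketched as below. All threshold values and names are assumptions; the patent states only that lower spoken-language requirements correspond to larger thresholds:

```python
# Illustrative per-stage detection thresholds eps0 (the values are assumptions)
STAGE_THRESHOLDS = {"beginner": 0.30, "intermediate": 0.15, "advanced": 0.05}

def flag_segments(residual_mse_per_word, stage):
    """Return the words whose residual mean square exceeds the threshold
    of the chosen learning stage; these are the non-standard parts
    routed back to the display and voice broadcast modules."""
    eps0 = STAGE_THRESHOLDS[stage]
    return [word for word, mse in residual_mse_per_word.items() if mse > eps0]

scores = {"hello": 0.02, "world": 0.20}   # assumed per-word residual measures
```

An advanced learner would be flagged on "world" here, while a beginner's looser threshold accepts both words.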
Step 7: transmit to the user, through the display module and the voice broadcast module, the standard audio, pronunciation method and example teaching from the standard voice library for the parts with non-standard pronunciation.
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments of the invention may be used in any combination, provided that no structural conflict exists, and the combinations are not exhaustively described in this specification merely for the sake of brevity and resource conservation. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (6)
1. A portable real-time feedback language learning system, characterized in that it comprises a display module, a control module, a voice library module and a language transmission module;
the display module: used for information interaction between the user and the system; it is connected to a display device via a USB interface or Bluetooth and, together with corresponding software, realizes human-machine information interaction; the interactive information includes learning-language selection, target learning sentences, learning-stage selection and language analysis reports;
the control module: transmits control files to and from a host computer through a USB interface, executes the requirements input via the display module, applies the sliding-window discrete Fourier transform method to the digital signal from the language transmission module to obtain tone intensity and pitch waveforms, compares them with the standard corpus waveforms using residual theory, and outputs the result to the display module;
the language transmission module: performs A/D conversion of the collected voice signal into a digital signal and sends it to the control module; the control module in turn outputs speech to play the correct pronunciation of the target sentence;
the voice library module: a universal language library downloaded from an upper computer through a USB module; it contains the standard pronunciations (pitch, intensity and tone) of the standard languages in a multilingual corpus, the corresponding learning stages, the pronunciation skills of each language, and example analyses.
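For illustration only (not part of the claims), the four-module division above can be sketched as minimal Python interfaces; all class names, sample values and the 0.05 tolerance below are hypothetical, not from the patent:

```python
class LanguageTransmissionModule:
    """A/D-converts collected voice into a digital sample sequence."""
    def capture(self):
        # Placeholder: a real device would sample the microphone here.
        return [0.0, 0.1, -0.1]

class VoiceLibraryModule:
    """Holds standard pronunciations (pitch, intensity, tone) per sentence."""
    def __init__(self):
        self.standard = {"hello": [0.0, 0.1, 0.0]}
    def lookup(self, sentence):
        return self.standard[sentence]

class ControlModule:
    """Compares captured speech with the standard and returns residuals."""
    def analyze(self, samples, standard):
        return [s - t for s, t in zip(samples, standard)]

class DisplayModule:
    """Shows the analysis verdict to the user."""
    def show(self, residuals):
        # 0.05 is an arbitrary illustrative tolerance.
        return "nonstandard" if any(abs(r) > 0.05 for r in residuals) else "ok"

mic = LanguageTransmissionModule()
library = VoiceLibraryModule()
control = ControlModule()
display = DisplayModule()
residuals = control.analyze(mic.capture(), library.lookup("hello"))
verdict = display.show(residuals)
```

The design point is the one the claim makes: acquisition, storage, comparison and presentation are separated, so any module (e.g. the display) can be swapped without touching the analysis.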
2. A portable real-time feedback language learning system according to claim 1, wherein the system executes the following steps:
Step 1: the user selects a learning language, a target learning stage and the related sentences;
Step 2: the target sentences are matched in the voice information base and output through the voice recognition module of the control module to the voice broadcast module for demonstration teaching;
Step 3: the user's pronunciation is collected and input to the control module in digital form through the language transmission module;
Step 4: harmonic extraction and analysis of the learner's pronunciation, by pitch, tone and intensity, is carried out in the control module using a sliding-window discrete Fourier transform;
Step 5: the extracted pitch, tone and intensity of the user are compared with the corresponding standard pronunciation in the language library, and the residuals of duration, pitch and intensity are each displayed on the basis of residual theory;
Step 6: a report is generated from the residuals of step 5; each pronunciation index is compared with the residual threshold set for the current learning stage, and any part whose residual exceeds the set threshold is defined as a nonstandard pronunciation part and displayed in the display module;
Step 7: for the nonstandard pronunciation parts, the corresponding standard audio, pronunciation method and example teaching from the standard voice library are transmitted to the user through the display module and the voice broadcast module.
3. A portable real-time feedback language learning system according to claim 1, wherein: the collected voice signal is converted into a digital signal after voice recognition processing and transmitted to the controller; harmonics are extracted and analyzed by a sliding-window discrete Fourier transform, and the voice signal waveform, i.e. the corresponding intensity, duration and pitch, is displayed in real time in the display module.
4. A portable real-time feedback language learning system according to claim 3, wherein: exploiting the sensitivity of the sliding-window discrete Fourier transform algorithm to frequency fluctuation, the collected voice signal is discretized into n signal segments; sliding-window processing then takes each segment out for a Fourier transform, so that the frequency components of the signal within the segment can be observed and the amplitude and phase information of the harmonic signals reconstructed.
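As an illustration of the segmentation-and-transform described in claims 3 and 4 (not the patent's code; the window length and hop size are arbitrary choices), a sliding-window DFT can be sketched with NumPy: each windowed segment is Fourier-transformed and its per-bin amplitude and phase recovered:

```python
import numpy as np

def sliding_window_dft(signal, window_len, hop):
    """Split the sampled voice signal into overlapping segments and
    Fourier-transform each one, so the frequency components of every
    segment can be observed as the window slides along the signal."""
    spectra = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = signal[start:start + window_len]
        spectrum = np.fft.rfft(segment)
        # Amplitude and phase of each harmonic bin in this segment
        spectra.append((np.abs(spectrum), np.angle(spectrum)))
    return spectra

# Example: one second of a 440 Hz test tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
frames = sliding_window_dft(tone, window_len=512, hop=256)
amp, phase = frames[0]
peak_bin = int(np.argmax(amp))
peak_freq = peak_bin * fs / 512   # FFT bin index -> frequency in Hz
```

On this test tone the peak-amplitude bin of the first frame maps back to 440 Hz within one bin width (fs/512, about 15.6 Hz), which is the frequency resolution the window length buys.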
5. A portable real-time feedback language learning system according to claim 2, wherein: a harmonic comparison part compares the pitch, tone and intensity of each word of the user's pronunciation with the standard voice library on the basis of residual theory, the residual sequence being expressed as:
e(k) = ŷ(k) − y(k)
where ŷ(k) is the actual output of the user and y(k) is the output of the standard speech library.
The mean square sum of the residual sequence is:
ε = (1/N) Σ e(k)², k = 1, …, N
where N is the number of samples.
The discriminant for judging, from whether the residual value exceeds the threshold set for the target learning stage, whether the pronunciation needs continued learning is:
ε > ε₀: continue learning; ε ≤ ε₀: pass,
where ε₀ is the detection threshold, the decisive factor in judging whether the pronunciation is nonstandard; if the residual value exceeds the threshold, continued learning is required.
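The three expressions in this claim can be checked numerically; the sketch below (the pitch values are hypothetical, not from the patent) computes the residual sequence e(k), its mean square, and the threshold discriminant:

```python
import numpy as np

def needs_more_practice(user_seq, standard_seq, eps0):
    """Residual discriminant from claim 5: e(k) = yhat(k) - y(k),
    with the mean square of the residuals compared against the
    detection threshold eps0."""
    e = np.asarray(user_seq, dtype=float) - np.asarray(standard_seq, dtype=float)
    mean_square = float(np.mean(e ** 2))   # (1/N) * sum of e(k)^2
    return mean_square, mean_square > eps0

# Hypothetical per-word pitch values (Hz): user vs. standard library
user = [210.0, 225.0, 198.0]
standard = [200.0, 220.0, 205.0]
ms, again = needs_more_practice(user, standard, eps0=10.0)
```

Here e = [10, 5, −7], so the mean square is (100 + 25 + 49)/3 = 58, which exceeds the threshold of 10, so this pronunciation would be flagged for continued learning.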
6. A portable real-time feedback language learning system according to claim 5, wherein: the detection threshold sets different tolerance values for the different learning stages input by the user, and the range of the residual comparison threshold differs accordingly.
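A minimal sketch of the stage-dependent tolerance in this claim (the stage names and threshold values are hypothetical, not from the patent): the detection threshold ε₀ is looked up by learning stage, so the same mean-square residual can pass at an early stage yet require continued practice at an advanced one.

```python
# Hypothetical stage-specific detection thresholds (claim 6):
# stricter (smaller) eps0 at more advanced stages.
STAGE_THRESHOLDS = {"beginner": 50.0, "intermediate": 20.0, "advanced": 5.0}

def continue_learning(mean_square_residual, stage):
    """True if the mean-square residual exceeds the detection
    threshold for the user's current learning stage."""
    eps0 = STAGE_THRESHOLDS[stage]
    return mean_square_residual > eps0

ms = 30.0  # mean-square residual from the claim-5 comparison
beginner = continue_learning(ms, "beginner")   # 30 > 50 is False: passes
advanced = continue_learning(ms, "advanced")   # 30 > 5 is True: keep practicing
```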
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110774465.2A CN113506572A (en) | 2021-07-08 | 2021-07-08 | Portable real-time feedback language learning system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113506572A true CN113506572A (en) | 2021-10-15 |
Family
ID=78012276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110774465.2A Pending CN113506572A (en) | 2021-07-08 | 2021-07-08 | Portable real-time feedback language learning system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113506572A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116453543A (en) * | 2023-03-31 | 2023-07-18 | 华南师范大学 | Teaching language specification analysis method and system based on voice recognition |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0135864A2 (en) * | 1983-09-28 | 1985-04-03 | International Business Machines Corporation | System and method for automatically testing integrated circuit memory arrays on different memory array testers |
US20060084047A1 (en) * | 2004-10-20 | 2006-04-20 | Inventec Corporation | System and method of segmented language learning |
CN102394015A (en) * | 2011-03-25 | 2012-03-28 | 黄进明 | Speech learning machine and player with bilingual model |
US20160019379A1 (en) * | 2014-07-21 | 2016-01-21 | Green Grade Solutions Ltd. | E-Learning Utilizing Remote Proctoring and Analytical Metrics Captured During Training and Testing |
CN106205634A (en) * | 2016-07-14 | 2016-12-07 | 东北电力大学 | A kind of spoken English in college level study and test system and method |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107818164A (en) * | 2017-11-02 | 2018-03-20 | 东北师范大学 | A kind of intelligent answer method and its system |
CN109545244A (en) * | 2019-01-29 | 2019-03-29 | 北京猎户星空科技有限公司 | Speech evaluating method, device, electronic equipment and storage medium |
EP3503074A1 (en) * | 2016-08-17 | 2019-06-26 | Kainuma, Ken-ichi | Language learning system and language learning program |
CN111899576A (en) * | 2020-07-23 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Control method and device for pronunciation test application, storage medium and electronic equipment |
CN112530459A (en) * | 2020-11-27 | 2021-03-19 | 珠海读书郎网络教育有限公司 | Mouth shape correction method and mouth shape correction system |
CN112599115A (en) * | 2020-11-19 | 2021-04-02 | 上海电机学院 | Spoken language evaluation system and method thereof |
Non-Patent Citations (3)
Title |
---|
LUIZA OROSANU,ET AL.: "Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning", 《2012 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT)》 * |
刘育雁 (Liu Yuyan): "A study of the acquisition of 'ranhou' by preparatory students in China", 《Journal of Changchun Education Institute》 *
罗刚峰 (Luo Gangfeng) et al.: "Design of an automatic scoring system for spoken English tests based on sequence matching", 《Automation & Instrumentation》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754778B (en) | Text speech synthesis method and device and computer equipment | |
CN109523989B (en) | Speech synthesis method, speech synthesis device, storage medium, and electronic apparatus | |
JP6312942B2 (en) | Language model generation apparatus, language model generation method and program thereof | |
US20230317055A1 (en) | Method, apparatus, storage medium and electronic device for speech synthesis | |
CN109256152A (en) | Speech assessment method and device, electronic equipment, storage medium | |
KR20170041105A (en) | Apparatus and method for calculating acoustic score in speech recognition, apparatus and method for learning acoustic model | |
KR20170034227A (en) | Apparatus and method for speech recognition, apparatus and method for learning transformation parameter | |
CN111312209A (en) | Text-to-speech conversion processing method and device and electronic equipment | |
CN111354343B (en) | Voice wake-up model generation method and device and electronic equipment | |
CN111563390B (en) | Text generation method and device and electronic equipment | |
CN110797010A (en) | Question-answer scoring method, device, equipment and storage medium based on artificial intelligence | |
CN111489735B (en) | Voice recognition model training method and device | |
EP3929768A1 (en) | Method and apparatus for generating triple sample, electronic device and computer storage medium | |
CN108597538B (en) | Evaluation method and system of speech synthesis system | |
CN112217947B (en) | Method, system, equipment and storage medium for transcribing text by customer service telephone voice | |
US20220358956A1 (en) | Audio onset detection method and apparatus | |
CN112349289B (en) | Voice recognition method, device, equipment and storage medium | |
CN111339758A (en) | Text error correction method and system based on deep learning model | |
CN114330371A (en) | Session intention identification method and device based on prompt learning and electronic equipment | |
CN113506572A (en) | Portable real-time feedback language learning system | |
JP2021179590A (en) | Accent detection method, device and non-temporary storage medium | |
CN112309409A (en) | Audio correction method and related device | |
CN112346696A (en) | Speech comparison of virtual assistants | |
EP3822813A1 (en) | Similarity processing method, apparatus, server and storage medium | |
CN1308908C (en) | Transformation from characters to sound for synthesizing text paragraph pronunciation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||