CN112863263B - Korean pronunciation correction system based on big data mining technology - Google Patents

Korean pronunciation correction system based on big data mining technology

Info

Publication number
CN112863263B
CN112863263B (application CN202110060609.8A)
Authority
CN
China
Prior art keywords
pronunciation
tongue
korean
signal
audio
Prior art date
Legal status
Active
Application number
CN202110060609.8A
Other languages
Chinese (zh)
Other versions
CN112863263A (en)
Inventor
金清子
Current Assignee
Jilin Agricultural Science and Technology College
Original Assignee
Jilin Agricultural Science and Technology College
Priority date
Filing date
Publication date
Application filed by Jilin Agricultural Science and Technology College
Priority to CN202110060609.8A
Publication of CN112863263A
Application granted granted Critical
Publication of CN112863263B


Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
    • G09B19/00: Teaching not covered by other main groups of this subclass
    • G09B19/06: Foreign languages
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/90: Pitch determination of speech signals
    • G10L2025/906: Pitch tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to a Korean pronunciation correction system based on big data mining technology. It uses sensors to detect formant frequencies and the position changes of the tongue and chin during pronunciation in order to determine the chin articulation parameters related to pitch. It also uses magnetic resonance imaging and electropalatography data to capture the three-dimensional vocal tract geometry of approximant consonants during pronunciation, and guides dynamic adjustment of the learner's lower jaw, tongue, and throat movements by comparing the actual phoneme string against the standard pronunciation.

Description

Korean pronunciation correction system based on big data mining technology
Technical Field
The invention relates to the field of language learning, in particular to a Korean pronunciation correction system based on a big data mining technology.
Background Art
For historical reasons, Korean has been strongly influenced by Chinese, so the two languages share many similarities. This similarity greatly helps Korean speakers learn Chinese, but it also causes substantial negative transfer. Although many Korean pronunciations resemble Chinese ones, most obviously in Sino-Korean words, the articulation methods and articulation positions actually differ considerably. These differences leave Korean students with many hard-to-overcome difficulties when learning Chinese and complicate the teaching of Chinese phonetics to Korean speakers. It is therefore necessary to study the consonant differences between Chinese and Korean pronunciation and to develop corresponding teaching strategies.
Consonants are sounds formed by significant obstruction of the airflow at the place of articulation. Chinese and Korean consonants differ in articulation method, articulation position, and articulation strength. The consonant systems of Mandarin Chinese and Korean do not correspond one-to-one. Some sounds exist in Mandarin but not in Korean, such as f [f]; some sounds appear to share the same place and manner of articulation but are in fact pronounced differently, such as ㄱ, ㅋ versus g, k; still other sounds exist in Korean but not in Chinese, such as the lax consonants of Korean, which the Chinese consonant system lacks. Korean also has tense consonants, which are distinguished from the lax ones by a stronger airflow. Meanwhile, the Korean consonant system contains the glottal sound ㅎ, the nasal ㅇ, and the flap ㄹ. These three sounds do not exist in Chinese and behave in characteristically Korean ways: the nasal ㅇ is not pronounced at the beginning of a syllable, the glottal ㅎ resembles the h sound, and the flap ㄹ is pronounced similarly to an r sound when it occurs as a syllable final.
In the learning process, learners often depend heavily on their native language. In general, learners tend to approach the second language from the native language, substituting similar native sounds for target-language sounds or reasoning about the target language in native-language terms, and this causes errors. (1) Similar sounds cause errors: Mandarin and Korean are inherently similar and substitution is common, as with the approximations described above, for example replacing g, k with the similar sounds ㄱ, ㅋ, which produces errors. (2) Sounds absent from the native language are replaced by native sounds, for example the glottal ㅎ is substituted for h, or ㄹ is substituted for l or r. (3) The sound changes of Korean itself cause errors. Likewise, learning Mandarin through the phonological habits of the native language causes errors.
In summary, understanding the relationship between articulation characteristics and acoustic signals is crucial to solving the pronunciation error problem.
Disclosure of Invention
The invention provides a Korean pronunciation correction system based on big data mining technology, which detects and automatically corrects spoken Korean pronunciation errors and provides technical support for students learning Korean.
A Korean pronunciation correction system based on big data mining technology comprises an audio signal acquisition module, a data analysis module, a correction module, a control module, a terminal module, and a cloud module. The signal transmission device comprises a vocal cord vibration sensor and an electromagnetic sensor. The electromagnetic sensor captures the movement of the tongue and chin for speech recognition; it is a wearable permanent-magnet tracer whose movement is tracked wirelessly by a magnetic sensor array, while ultrasonic imaging measures the coordinates and curvature position of the tongue to represent the tongue during speech. Meanwhile, the formant frequencies of vowels in the articulation model are estimated from the combination of the lower jaw, tongue, and throat. The data analysis module optimizes the first two formants of Korean vowels and consonants, in the following specific steps:
S1. For vowels, the first formant is denoted F₁; for vowel production its value is inversely proportional to the tongue height h:

F₁ ∝ 1/h

The second formant is denoted F₂; for vowel production its value is inversely proportional to the horizontal advancement l of the tongue:

F₂ ∝ 1/l

The mouth is treated as a tubular model acting as a resonator, and modifying the model gives:

F₁ = β₁ · c / h

F₂ = β₂ · c / l

where β₁ and β₂ are the constants that best fit the formant response of the tongue-based vowel articulation system, β₁, β₂ ∈ ℝ, and c is the speed of sound, c = 340 m/s;
S2. Determine the values of β₁ and β₂. These values are computed from the formant values of the oral system measured experimentally with the permanent-magnet tracer. To improve accuracy, a loss function between the formants estimated by the model and those of the measured tongue articulation system is computed, using the mean squared error:

Loss(β₁, β₂) = (1/N) · Σᵢ (F̂ᵢ − Fᵢ)², i = 1…N

where F̂ᵢ is the model estimate and Fᵢ the measured formant over N samples. The partial derivatives of the loss function are computed and used to update the current values of β₁ and β₂:

β₁ ← β₁ − α · ∂Loss/∂β₁

β₂ ← β₂ − α · ∂Loss/∂β₂

with learning rate α;
S3. The first formants of the lax, tense, and aspirated consonants, and likewise their second formants, are expressed by formulas that appear only as images in the original document. In these formulas, γ₁ and γ₂ are the constants that best fit the formant response of the tongue-based consonant articulation system, c is the speed of sound, B is the burst release time, and Duration is the duration of the pronunciation;
S4. Cascading the simplified tongue-based oral system with the laryngeal system yields the computation formula of the vocal tract system. The transfer function of the vocal tract formant frequencies is written V(z)ₖ, and the transfer functions of the formant frequencies of the laryngeal system and the tongue are written L(z)ₖ and O(z)ₖ, with the cascade

V(z)ₖ = L(z)ₖ · O(z)ₖ

[the expanded transfer-function expressions are reproduced only as images in the original]. A₁ and A₂ denote the formant frequencies of the laryngeal and tongue articulation systems respectively, T denotes the duration of each formant, z denotes the bandwidth of the formant, and Fᵢₖ takes different values for different i and k;
S5. The correction module acquires the formant frequencies and the position changes of the tongue and chin through the sensors to determine the chin articulation parameter related to pitch; during pronunciation it performs acoustic and electromyographic analysis, captures the three-dimensional vocal tract geometry of the approximant consonants using magnetic resonance imaging and electropalatography data, and guides dynamic adjustment of the learner's lower jaw, tongue, and throat movements according to the actual phoneme string and the standard pronunciation.
Furthermore, introducing error-elimination calculation enables high-precision correction of spoken pronunciation. Data processing and error calculation are performed first, as follows:

[error-calculation formula reproduced only as an image in the original]

where E is the error threshold, H is the extreme value of the vibration trough, C is the effective period law of the audio, D is a constant frequency parameter, and PAH is the standard amplitude of Korean speech;

the collected spoken Korean utterances are normalized:

[normalization formula reproduced only as an image in the original]

where η_E is the discrete function value in the Korean pronunciation process, n is the weight of the discrete function value, T is the hop count between two audio nodes, and dᵢⱼ is the shortest path between audio node i and node j;

the pronunciation is corrected as follows:

Vᵢ = R · Uᵢ · (Aᵀ S⁻¹)⁻¹

where Aᵀ is the natural skewness of the audio, a parameter for measuring the note; S⁻¹ is the combination of audio attributes, a function parameter of audio proofreading; R is the lifting weight of the high-grade audio; Uᵢ is the measurement of the audio; and Vᵢ is the audio error protection limit.
Further, the vocal cord vibration sensor comprises a speech signal acquisition sensor array, and the frequency-domain feature detection of the Korean speech signal is v(t, θ):

v(t, θ) = Σᵢ ωᵢ(θ) · xᵢ*(t), i = 1…M

where ωᵢ(θ) is the instantaneous time-domain signal weighting vector of the i-th Korean pronunciation output, xᵢ(t) is the instantaneous time-domain signal component of the Korean pronunciation output, θ is a speech signal parameter, * denotes the conjugation operator, and the sensors are indexed up to a maximum of M;
The speech signals are time-domain matched and filtered using an adaptive beamforming method. The frequency-domain characteristic of the output signal is:

V(t, θ) = xᴴ(t) ω(θ)

where H denotes the complex conjugate transpose. The weight vector and the components of the instantaneous time-domain signal of the Korean speech output can be expressed as:

x(t) = [x₁(t), x₂(t), …, x_M(t)]ᵀ

ω(θ) = [ω₁(θ), ω₂(θ), …, ω_M(θ)]ᵀ
Combining adaptive filtering with blind source separation, the speech signal is decomposed to obtain the FM components of Korean speech detection, output as:

Tₘ(θ) = (m − 1) · T₀(θ)

where T₀(θ) denotes the initial FM component. Combined with the sensor array signal processing method, the signal model for detecting Korean pronunciation errors is obtained [model formula reproduced only as an image in the original], where gₘ is a calculation coefficient and nₘ(t) is an auxiliary parameter.
Furthermore, the audio signal acquisition module comprises a signal transmission device, an audio signal modulator, a demodulator and a voice acquisition device.
Furthermore, the audio signal modulator modulates a low-frequency digital signal into a high-frequency digital signal for transmission using digital signal processing. The modulator and demodulator are used as a pair: the modulator converts the digital signal into a high-frequency signal for transmission, and the demodulator restores the digital signal to the original signal.
Further, the demodulator recovers the low-frequency digital signal that was modulated into the high-frequency digital signal.
Furthermore, the control module consists of a program counter, an instruction register, an instruction decoder, a time sequence generator and an operation controller and is used for issuing commands and coordinating and commanding the operation of the whole system.
Further, the terminal module comprises a client UI module and a visualization module, and the client UI module is suitable for collecting terminal user information.
Further, the cloud module comprises a signal receiving module and a database of standard Korean pronunciation and of the oral and laryngeal systems.
During pronunciation, the sensors detect the formant frequencies and the position changes of the tongue and chin to determine the chin articulation parameters related to pitch. Acoustic and electromyographic analysis is performed during pronunciation, magnetic resonance imaging and electropalatography data are used to capture the three-dimensional vocal tract geometry of the approximant consonants, and the movements of the learner's lower jaw, tongue, and throat are dynamically adjusted and guided according to the actual phoneme string and the standard pronunciation.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The Korean pronunciation error correction system is mainly used to recognize spoken Korean pronunciation and to detect and automatically correct spoken pronunciation errors. Spoken pronunciation is the first step in learning Korean and the foundation of all Korean study. The first problem in learning Korean is memorizing words, and the first task in memorizing a word is remembering its pronunciation. Correct spoken pronunciation habits can greatly improve listening comprehension; conversely, learners with idiosyncratic pronunciations may fail to understand the correct pronunciation of others even in sentences built from familiar words, which hinders spoken Korean interaction. Accurate Korean pronunciation is therefore very important for students' listening ability.
The system hardware architecture is constructed according to the requirements of the automatic Korean spoken pronunciation error correction system and comprises an audio signal acquisition module, a data analysis module, a correction module, a control module, a terminal module, and a cloud module.
The audio signal modulator is a device that modulates a low-frequency digital signal into a high-frequency digital signal by digital signal processing and transmits it. The modulator is usually used together with a demodulator: the modulator converts the digital signal into a high-frequency signal for transmission, and the demodulator restores the digital signal to the original signal. A demodulator is a device that uses digital signal processing to recover the low-frequency digital signal modulated into the high-frequency digital signal. The main function of the voice collector is to collect spoken Korean pronunciation. The controller is the master device that changes the wiring of the main circuit or control circuit in a preset sequence to control the starting, speed regulation, braking, and reversing of a motor; here it mainly comprises a program counter, an instruction register, an instruction decoder, a timing generator, and an operation controller. It issues commands and coordinates and directs the operation of the entire system, acting as the "decision-making center".
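To illustrate the modulator/demodulator pairing described above, the following toy sketch shifts a low-frequency signal onto a high-frequency carrier and coherently demodulates it back down. The patent does not specify a modulation scheme; the amplitude modulation, sampling rate, carrier frequency, and filter length here are all illustrative assumptions.

```python
import numpy as np

FS = 48000                                   # sampling rate (Hz), assumed
t = np.arange(0, 0.05, 1 / FS)
baseband = np.sin(2 * np.pi * 200 * t)       # low-frequency signal (200 Hz)

fc = 8000.0                                  # carrier frequency (Hz), assumed
passband = baseband * np.cos(2 * np.pi * fc * t)       # modulation onto the carrier

# Coherent demodulation: mix with the carrier again, then remove the
# 2*fc image with a crude 1 ms moving-average low-pass filter.
mixed = passband * 2 * np.cos(2 * np.pi * fc * t)
kernel = np.ones(48) / 48                    # 48 samples = 1 ms at 48 kHz
recovered = np.convolve(mixed, kernel, mode="same")

# Compare away from the filter's edge effects; a small residual error
# remains because the moving-average filter is deliberately crude.
err = np.max(np.abs(recovered[200:-200] - baseband[200:-200]))
print(f"max in-band reconstruction error: {err:.3f}")
```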
A traditional spoken-language voice correction system extracts features of the spoken speech signal and recognizes information with signal processing methods, comparing an extracted voiceprint image against a standard voiceprint, but it does not perform correction based on the articulation mechanism. The present invention studies the speech system and lets the user perceive and detect the muscle movement patterns of their own vocal organs (including lips, chin, tongue, and teeth) during vocalization through a signal transmission device arranged on a neck strap, so as to correct and adjust the vocalization. The speech system records the activity of the articulation system (including facial muscles), detects the synthesis of speech signals using electromagnetic signals, and determines the acoustic properties of the articulation map by describing the articulation trajectories of the mandible, lips, tongue body, and tongue tip.
The vocal cord vibration device is positioned at the larynx and captures sensor signals, which are sent to the control system to detect the periodic vibrations associated with the utterance. Meanwhile, the electromagnetic sensor is attached to the face and records pulses, while the tongue-and-ear interface is a wearable system that captures the movements of the tongue and chin for speech recognition.
In the present invention, the tongue's characteristics in vowel production are considered the primary factor in producing speech through the mouth. A wearable permanent-magnet tracer is fixed on the tongue and tracked wirelessly by a magnetic sensor array; the wearable system is physically non-invasive. Ultrasonic imaging measures the coordinates of the tongue and its curvature position to represent the tongue during speech, while the formant frequencies of vowels in the articulation model are estimated from the combination of mandible, tongue, and larynx. Vowel formant frequency values were statistically derived from recorded voices of ten thousand Korean speakers and associated with their tongue curvatures obtained by ultrasonically analyzing the resonance mechanism of the oral vocal tract system. From the relationship between tongue coordinates and formant frequencies it is concluded that the first formant frequency depends on the height of the tongue and the second formant on the horizontal advancement of the tongue.
During pronunciation, the sensors detect the formant frequencies and the position changes of the tongue and chin to determine the chin articulation parameters related to pitch. Acoustic and electromyographic analysis is performed during pronunciation, and magnetic resonance imaging and electropalatography data are used to capture the three-dimensional vocal tract geometry of the approximant consonants.
The first formant is inversely proportional to tongue height, and the second formant frequency is related to the size of the front cavity, i.e. the degree of tongue advancement, based on the displayed tongue and lip positions. Formant frequencies are speaker dependent and vary with gender and age. The invention derives an optimized statistical formula for vowel formant frequencies from accumulated vowel results and extends it to consonants; all of the analysis is based on mapping tongue motion during vowel and consonant articulation. The proposed tongue-based oral cavity statistical model is coupled with the laryngeal model and compared in detail with the speech generated by the vocal tract model. The algorithm is based on a formant expression and is applicable to vowel and consonant production across age groups and genders.
The invention provides an optimized statistical relationship for the first two formants of Korean vowels and consonants, defines an age- and gender-independent speech production system using human tongue motion, and couples the tongue articulation system to a known laryngeal model.
When the vocal cords close suddenly, a pulse-like excitation in the vibration source causes the glottis to close; at this stage the subglottal and supraglottal regions are separated, the effective length of the vocal tract is reduced, and resonance is produced only by the supraglottal portion. This variation in vocal tract length changes the dominant resonances of the spectrum. Extracting the resonance frequencies and their associated bandwidths accurately is difficult because, as the vocal tract shape changes, these frequencies and bandwidths vary continuously, not only across pitch periods but also within a pitch period (i.e. from the closed phase to the open phase of the glottis); the estimation of resonance bandwidth must therefore be done carefully on short speech segments. When the speech spectrum is decomposed into amplitude and phase components, the prominent resonance locations and their associated bandwidths are called formants. During vowel production, the first two formants of the oral system are inversely proportional to tongue height and tongue advancement, respectively. Statistical estimation is performed by mapping tongue direction features using a vocal tract synthesizer and vowel space theory. The vocal tract shape and the vowel quadrilateral are displayed in pairs representing each vowel. In vowel space theory the pattern is a quadrilateral in which the horizontal axis l represents tongue advancement (front, central, back), describing how far forward the tongue is raised during vowel articulation, and the sloping axis h represents tongue height (close, mid, open).
The first formant, denoted F₁, is for vowel production inversely proportional to the tongue height h:

F₁ ∝ 1/h

The second formant, denoted F₂, is for vowel production inversely proportional to the horizontal advancement l of the tongue:

F₂ ∝ 1/l

The mouth is considered a tubular model and assumed to be a resonator. Correcting the model gives:

F₁ = β₁ · c / h

F₂ = β₂ · c / l

where β₁ and β₂ are the constants that best fit the formant response of the tongue-based vowel articulation system, β₁, β₂ ∈ ℝ, and c is the speed of sound, c = 340 m/s.
The next step is to determine the values of β₁ and β₂. These are computed from the formant values of the oral system measured experimentally with the permanent-magnet tracer. To improve accuracy, a loss function between the formants estimated by the model and those of the measured tongue articulation system is computed, using the mean squared error:

Loss(β₁, β₂) = (1/N) · Σᵢ (F̂ᵢ − Fᵢ)², i = 1…N

where F̂ᵢ is the model estimate and Fᵢ the measured formant over N samples. The partial derivatives of the loss function are computed and used to update the current values of β₁ and β₂:

β₁ ← β₁ − α · ∂Loss/∂β₁

β₂ ← β₂ − α · ∂Loss/∂β₂

with learning rate α.
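A minimal sketch of the gradient-descent fit just described, shown for β₁ alone. The measured (h, F₁) pairs and the learning rate are hypothetical; the loss is the mean squared error between the model estimate β₁·c/h and the measured formant.

```python
import numpy as np

C = 340.0                                            # speed of sound (m/s)
h = np.array([0.018, 0.022, 0.027, 0.033])           # tongue heights (m), hypothetical
f1_meas = np.array([760.0, 620.0, 500.0, 410.0])     # measured F1 (Hz), hypothetical

beta = 0.01                                          # initial guess for beta1
lr = 2e-9                                            # learning rate alpha (assumed)
for _ in range(1000):
    f1_est = beta * C / h                            # model prediction F1 = beta1*c/h
    err = f1_est - f1_meas
    grad = np.mean(2.0 * err * C / h)                # d(MSE)/d(beta1)
    beta -= lr * grad                                # gradient-descent update

mse = np.mean((beta * C / h - f1_meas) ** 2)
print(f"fitted beta1 = {beta:.4f}, final MSE = {mse:.2f}")
# beta2 is fitted in the same way from (advancement l, F2) pairs.
```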
For consonant articulation, the position and movement of the tongue are represented by the relationship between the tongue height h and the horizontal advancement l of the consonant. Analogously to vowels, a relationship is established between the tongue height h of the consonant quadrilateral and the horizontal advancement l of the tongue. A statistical formula for the consonant oral cavity formants is obtained and optimized by gradient descent. Consonants are described and distinguished by a phonation-and-manner system, on the basis of which they divide into three distinct groups: lax (loose), tense (tight), and aspirated. In terms of the acoustic properties of consonants, the first and second formants are affected by the size of the constriction, the manner of articulation (tongue height) and bursting (sudden release of air), the position of the tongue, voicing, and tongue advancement.
The first formants of the relaxing tone, the relaxing tone and the air supply tone are respectively expressed as follows:
Figure BDA0002902200860000091
Figure BDA0002902200860000092
Figure BDA0002902200860000093
the second formants of the relaxing tone, the relaxing tone and the air supply tone are respectively expressed as follows:
Figure BDA0002902200860000094
Figure BDA0002902200860000095
Figure BDA0002902200860000096
in the formula, gamma1、γ2Is the closest constant value of the provided tongue consonant pronunciation system formant response, c is the speed of sound, B is the burst release time, and Duration is the Duration of pronunciation.
After the formants of the complete set of vowels and consonants are established, the invention proposes a new method of quantifying speech intelligibility using the above results, and shows that the first two formants of the tongue articulation system differ.
The vocal tract model comprises the lungs (the glottal source), the larynx, and an oral cavity modeled as a single duct. The lungs act as the power source, supplying airflow to the larynx. The larynx regulates the airflow from the lungs and provides a periodic or noisy airflow source. The output thus provides a modulated airflow by spectrally shaping the source. A computation formula for the vocal tract system is developed by cascading the simplified tongue-based oral system (the tongue articulation system) with the laryngeal system. The transfer function of the vocal tract formant frequencies is written V(z)ₖ, and the transfer functions of the formant frequencies of the laryngeal system and the tongue are written L(z)ₖ and O(z)ₖ, with the cascade

V(z)ₖ = L(z)ₖ · O(z)ₖ

[the expanded transfer-function expressions are reproduced only as images in the original]. A₁ and A₂ denote the formant frequencies of the laryngeal and tongue articulation systems respectively, T denotes the duration of each formant, z denotes the bandwidth of the formant, and Fᵢₖ takes different values for different i and k.
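A sketch of the cascade V(z)ₖ = L(z)ₖ·O(z)ₖ, under the assumption that each of the laryngeal and oral sections behaves as a standard second-order digital resonator evaluated on the unit circle. The resonator form, sampling rate, and section frequencies and bandwidths are assumptions standing in for the expressions that the original gives only as images.

```python
import numpy as np

FS = 16000.0  # sampling rate (Hz), assumed

def resonator(z, f_hz, bw_hz, gain=1.0):
    """Second-order all-pole resonator H(z) = gain / (1 - 2 r cos(w) z^-1 + r^2 z^-2)."""
    r = np.exp(-np.pi * bw_hz / FS)       # pole radius set by the bandwidth
    w = 2.0 * np.pi * f_hz / FS           # pole angle set by the centre frequency
    return gain / (1.0 - 2.0 * r * np.cos(w) / z + (r ** 2) / (z ** 2))

def vocal_tract(z):
    # L(z): laryngeal section; O(z): tongue/oral section (frequencies assumed).
    L = resonator(z, f_hz=500.0, bw_hz=60.0)
    O = resonator(z, f_hz=1500.0, bw_hz=90.0)
    return L * O                          # cascade: V(z) = L(z) * O(z)

# Magnitude response on the unit circle z = exp(j*2*pi*f/FS).
for f in (250.0, 500.0, 1000.0, 1500.0, 3000.0):
    z = np.exp(1j * 2.0 * np.pi * f / FS)
    print(f"{f:7.1f} Hz  |V| = {abs(vocal_tract(z)):8.3f}")
```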
In addition, the formant bandwidths obtained by short-time processing can approximate the instantaneous bandwidth of each formant, so formants can be extracted from the instantaneous bandwidth as well as from the amplitude component. The formant bandwidth is determined by decomposing the speech signal through a bank of bandpass filters and then demodulating each band to obtain an amplitude envelope and an instantaneous frequency signal. The bandwidths of the formants are then extracted from these instantaneous frequency signals using an energy separation algorithm; the bandwidth values are normalized with respect to their maximum and plotted as histogram curves, and the bandwidth at the dominant resonance frequency of the spectral response is extracted from short speech segments to highlight the variation of bandwidth across vowel and consonant segments.
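The following sketch illustrates that pipeline on a synthetic resonance: band-pass filtering, demodulation to an amplitude envelope and instantaneous frequency, and a bandwidth estimate from the spread of the instantaneous frequency. Hilbert demodulation (via SciPy, assumed available) stands in for the energy separation algorithm; all parameter values are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 16000

def band_instfreq(x, lo, hi):
    """Amplitude envelope and instantaneous frequency (Hz) of one band."""
    sos = butter(4, [lo, hi], btype="band", fs=FS, output="sos")
    band = sosfiltfilt(sos, x)
    analytic = hilbert(band)
    env = np.abs(analytic)                      # amplitude envelope
    phase = np.unwrap(np.angle(analytic))
    inst_f = np.diff(phase) * FS / (2 * np.pi)  # instantaneous frequency
    return env, inst_f

# Synthetic "formant": a 500 Hz tone whose frequency wobbles by +/- 40 Hz.
t = np.arange(0, 0.2, 1 / FS)
x = np.sin(2 * np.pi * 500 * t + 4 * np.sin(2 * np.pi * 10 * t))

env, inst_f = band_instfreq(x, 300, 700)
w = env[:-1] / env[:-1].sum()                   # envelope-weighted statistics
mean_f = np.sum(w * inst_f)
bw = 2 * np.sqrt(np.sum(w * (inst_f - mean_f) ** 2))  # spread as bandwidth proxy
print(f"centre ~{mean_f:.0f} Hz, bandwidth proxy ~{bw:.0f} Hz")
```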
The vocal cord vibration sensor comprises a speech signal acquisition sensor array, and the frequency-domain feature detection of the Korean speech signal is v(t, θ):

v(t, θ) = Σᵢ ωᵢ(θ) · xᵢ*(t), i = 1…M

where ωᵢ(θ) is the instantaneous time-domain signal weighting vector of the i-th Korean pronunciation output, xᵢ(t) is the instantaneous time-domain signal component of the Korean pronunciation output, θ is a speech signal parameter, * denotes the conjugation operator, and the sensors are indexed up to a maximum of M.
And performing time domain matching and filtering on the voice signals by adopting an adaptive beam forming method. The frequency domain characteristics of the output signal are as follows:
V(t,θ)=xH(t)ω(θ)
in the formula, H represents a complex conjugate transpose.
The weight vector and components of the instantaneous time-domain signal of the korean speech output can be expressed as:
x(t)=[x1(t),x2(t),…,xM(t)]T
ω(θ)=[ω1(θ),ω2(θ),…,ωM(θ)]T
Combining adaptive filtering with blind source separation, the speech signal is decomposed to obtain the FM components of Korean speech detection, output as:

Tₘ(θ) = (m − 1) · T₀(θ)

where T₀(θ) denotes the initial FM component. Combined with the sensor array signal processing method, the signal model for detecting Korean pronunciation errors is obtained [model formula reproduced only as an image in the original], where gₘ is a calculation coefficient and nₘ(t) is an auxiliary parameter.
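A numeric sketch of the array output V(t, θ) = xᴴ(t)ω(θ) using fixed delay-and-sum steering weights. The array geometry, look direction, test tone, and noise level are assumptions, since the patent does not specify how the adaptive weights are computed.

```python
import numpy as np

np.random.seed(0)
M, FS = 8, 16000                     # number of sensors and sampling rate (assumed)
d, c = 0.04, 340.0                   # sensor spacing (m) and speed of sound (m/s)
theta = np.deg2rad(20.0)             # assumed look direction

t = np.arange(0, 0.01, 1.0 / FS)
delays = np.arange(M) * d * np.sin(theta) / c      # per-sensor delays T_m(theta)
f0 = 300.0                                         # test tone (Hz)

# x_m(t): delayed copies of the source plus noise n_m(t).
x = np.exp(1j * 2 * np.pi * f0 * (t[None, :] - delays[:, None]))
x += 0.1 * (np.random.randn(M, t.size) + 1j * np.random.randn(M, t.size))

# Delay-and-sum steering weights w(theta); V = w^H x(t) is the complex
# conjugate of the text's x^H(t) w(theta) and has the same magnitude.
w = np.exp(-1j * 2 * np.pi * f0 * delays) / M
V = w.conj() @ x
print("beamformed output power:", np.mean(np.abs(V) ** 2))
```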
Speech error detection
After the learner pronounces following the system's prompt, the system combines the standard pronunciation dictionary with the pronunciation rules to form a phoneme detection network. Meanwhile, the formant frequencies and the position changes of the tongue and jaw are obtained through the sensors to determine a jaw articulation parameter related to pitch; during pronunciation, acoustic and electromyographic analysis is performed, the three-dimensional vocal tract geometry of the approximant consonants is captured using magnetic resonance imaging and electropalatography data, and the movements of the learner's lower jaw, tongue, and throat are dynamically adjusted and guided according to the actual phoneme string and the standard pronunciation.
Introducing error-elimination calculation enables high-precision correction of spoken pronunciation. Data processing and error calculation are performed first, as follows:

[error-calculation formula reproduced only as an image in the original]

where E is the error, H is the error threshold, B is the extreme value of the vibration trough, C is the effective period law of the audio, D is the constant frequency parameter, and PAH is the standard amplitude of Korean speech.

By the above method, the collected spoken Korean utterances are "normalized":

[normalization formula reproduced only as an image in the original]

where η_E is the discrete function value in the Korean pronunciation process, n is the weight of the discrete function value, T is the hop count between two audio nodes, and dᵢⱼ is the shortest path between audio node i and node j.

The pronunciation is corrected as follows:

Vᵢ = R · Uᵢ · (Aᵀ S⁻¹)⁻¹

where Aᵀ is the natural skewness of the audio, a parameter for measuring the note; S⁻¹ is the combination of audio attributes, a function parameter of audio proofreading; R is the lifting weight of the high-grade audio; Uᵢ is the measurement of the audio; and Vᵢ is the audio error protection limit.
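A purely numeric illustration of the correction rule Vᵢ = R·Uᵢ·(AᵀS⁻¹)⁻¹, treating every quantity as a scalar so that the final inverse is a reciprocal. All sample values are hypothetical, not calibrated constants from the patent.

```python
# Scalar sketch of the correction rule V_i = R * U_i * (A^T S^-1)^-1.

def correction_limit(R, U_i, A_T, S_inv):
    """Audio error-protection limit V_i for one audio measurement U_i."""
    return R * U_i / (A_T * S_inv)   # scalar reciprocal of (A^T S^-1)

# R: lifting weight of the high-grade audio; A_T: natural skewness of
# the audio; S_inv: combined audio-attribute parameter (all hypothetical).
print(correction_limit(R=1.2, U_i=0.85, A_T=0.9, S_inv=1.1))
```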
Through the study of the vocal tract and oral cavity models, spoken Korean pronunciation errors are automatically corrected on the basis of articulation phonemes, providing technical support for students learning Korean.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent changes and modifications made to the above embodiment according to the technical spirit of the present invention are within the scope of the technical solution of the present invention.

Claims (8)

1. A Korean pronunciation correction system based on big data mining technology, characterized by comprising an audio signal acquisition module, a data analysis module, a correction module, a control module, a terminal module, and a cloud module, wherein a signal transmission device comprises a vocal cord vibration sensor and an electromagnetic sensor; the electromagnetic sensor captures the movement of the tongue and chin for speech recognition and is a wearable permanent-magnet tracer whose movement is tracked wirelessly by a magnetic sensor array; ultrasonic imaging measurement of the coordinates and curvature position of the tongue represents the tongue during speech; meanwhile, the formant frequencies of vowels in the articulation model are estimated from the combination of the lower jaw, tongue, and throat; and the data analysis module optimizes the first two formants of Korean vowels and consonants, in the following specific steps:
S1. For vowels, the first formant is denoted F₁; for vowel production its value is inversely proportional to the tongue height h:

F₁ ∝ 1/h

The second formant is denoted F₂; for vowel production its value is inversely proportional to the horizontal advancement l of the tongue:

F₂ ∝ 1/l

The mouth is treated as a tubular model acting as a resonator, and modifying the model gives:

F₁ = β₁ · c / h

F₂ = β₂ · c / l

where β₁ and β₂ are the constants that best fit the formant response of the tongue-based vowel articulation system, β₁, β₂ ∈ ℝ, and c is the speed of sound, c = 340 m/s;
S2. Determine the values of β₁ and β₂. These values are computed from the formant values of the oral system measured experimentally with the permanent-magnet tracer. To improve accuracy, a loss function between the formants estimated by the model and those of the measured tongue articulation system is computed, using the mean squared error:

Loss(β₁, β₂) = (1/N) · Σᵢ (F̂ᵢ − Fᵢ)², i = 1…N

where F̂ᵢ is the model estimate and Fᵢ the measured formant over N samples. The partial derivatives of the loss function are computed and used to update the current values of β₁ and β₂:

β₁ ← β₁ − α · ∂Loss/∂β₁

β₂ ← β₂ − α · ∂Loss/∂β₂

with learning rate α;
S3. The first formants of the lax, tense, and aspirated consonants, and likewise their second formants, are expressed by formulas that appear only as images in the original document. In these formulas, γ₁ and γ₂ are the constants that best fit the formant response of the tongue-based consonant articulation system, c is the speed of sound, B is the burst release time, and Duration is the duration of the pronunciation;
S4. Cascading the simplified tongue-based oral system with the laryngeal system yields the computation formula of the vocal tract system; the transfer function of the vocal tract formant frequencies is written V(z)ₖ, and the transfer functions of the formant frequencies of the laryngeal system and the tongue are written L(z)ₖ and O(z)ₖ, with the cascade

V(z)ₖ = L(z)ₖ · O(z)ₖ

[the expanded transfer-function expressions are reproduced only as images in the original]; A₁ and A₂ denote the formant frequencies of the laryngeal and tongue articulation systems respectively, T denotes the duration of each formant, z denotes the bandwidth of the formant, and Fᵢₖ takes different values for different i and k;
S5. the correction module acquires the formant frequencies and the position changes of the tongue and chin through the sensors to determine the chin articulation parameter related to pitch; during pronunciation, acoustic and electromyographic analysis is performed, the three-dimensional vocal tract geometry of the approximant consonants is captured using magnetic resonance imaging and electropalatography data, and the movements of the learner's lower jaw, tongue, and throat are dynamically adjusted and guided according to the actual phoneme string and the standard pronunciation;
the audio signal acquisition module comprises the signal transmission device, an audio signal modulator, a demodulator, and a voice acquisition device.
2. The Korean pronunciation correction system based on big data mining technology of claim 1, wherein introducing error-elimination calculation enables high-precision correction of spoken pronunciation; data processing and error calculation are performed first, as follows:

[error-calculation formula reproduced only as an image in the original]

where E is the error threshold, H is the extreme value of the vibration trough, C is the effective period law of the audio, D is a constant frequency parameter, and PAH is the standard amplitude of Korean speech;
combining adaptive filtering with blind source separation, the speech signal is decomposed to obtain the FM components of Korean speech detection, output as:

Tₘ(θ) = (m − 1) · T₀(θ)

where T₀(θ) denotes the initial FM component and Tₘ(θ) is the FM component;
the collected spoken Korean pronunciation is normalized:

[normalization formula reproduced only as an image in the original]

where η_E is the discrete function value in the Korean pronunciation process, n is the weight of the discrete function value, T is the hop count between two audio nodes, and dᵢⱼ is the shortest path between audio node i and node j;
the pronunciation is corrected as follows:

Vᵢ = R · Uᵢ · (Aᵀ S⁻¹)⁻¹

where Aᵀ is the natural skewness of the audio, a parameter for measuring the note; S⁻¹ is the combination of audio attributes, a function parameter of audio proofreading; R is the lifting weight of the high-grade audio; Uᵢ is the measurement of the audio; and Vᵢ is the audio error protection limit.
3. The system of claim 1, wherein the vocal cord vibration sensor comprises a speech signal acquisition sensor array, and the frequency-domain feature detection of the Korean speech signal is v(t, θ):

v(t, θ) = Σᵢ ωᵢ(θ) · xᵢ*(t), i = 1…M

where ωᵢ(θ) is the instantaneous time-domain signal weighting vector of the i-th Korean pronunciation output, xᵢ(t) is the instantaneous time-domain signal component of the Korean pronunciation output, θ is a speech signal parameter, * denotes the conjugation operator, and the sensors are indexed up to a maximum of M;
time-domain matching and filtering of the speech signals are carried out using an adaptive beamforming method, and the frequency-domain characteristic of the output signal is:

V(t, θ) = xᴴ(t) ω(θ)

where H denotes the complex conjugate transpose; the weight vector and the components of the instantaneous time-domain signal of the Korean speech output can be expressed as:

x(t) = [x₁(t), x₂(t), …, x_M(t)]ᵀ

ω(θ) = [ω₁(θ), ω₂(θ), …, ω_M(θ)]ᵀ
combined with the sensor array signal processing method, the signal model for detecting Korean pronunciation errors is obtained [model formula reproduced only as an image in the original], where gₘ is a calculation coefficient and nₘ(t) is an auxiliary parameter.
4. The system of claim 3, wherein the audio signal modulator modulates a low-frequency digital signal into a high-frequency digital signal by digital signal processing and transmits it; the modulator and the demodulator are used as a pair, the modulator converting the digital signal into a high-frequency signal for transmission and the demodulator restoring the digital signal to the original signal.
5. The system of claim 4, wherein the demodulator recovers a low frequency digital signal modulated in a high frequency digital signal.
6. The korean pronunciation correction system based on big data mining technology as claimed in claim 1, wherein the control module is composed of a program counter, an instruction register, an instruction decoder, a timing generator and an operation controller for issuing commands, coordinating and directing the operation of the whole system.
7. The system of claim 1, wherein the terminal module comprises a client UI module, a visualization module, and the client UI module is adapted to collect information of a terminal user.
8. The system of claim 1, wherein the cloud module comprises a signal receiving module, and the cloud module comprises a standard pronunciation for korean and a database of an oral system and a throat system.
CN202110060609.8A 2021-01-18 2021-01-18 Korean pronunciation correction system based on big data mining technology Active CN112863263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110060609.8A CN112863263B (en) 2021-01-18 2021-01-18 Korean pronunciation correction system based on big data mining technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110060609.8A CN112863263B (en) 2021-01-18 2021-01-18 Korean pronunciation correction system based on big data mining technology

Publications (2)

Publication Number Publication Date
CN112863263A (en) 2021-05-28
CN112863263B (en) 2021-12-07

Family

Family ID: 76005979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110060609.8A Active CN112863263B (en) 2021-01-18 2021-01-18 Korean pronunciation correction system based on big data mining technology

Country Status (1)

Country Link
CN (1) CN112863263B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150024180A (en) * 2013-08-26 2015-03-06 주식회사 셀리이노베이션스 Pronunciation correction apparatus and method
CN104732977B (en) * 2015-03-09 2018-05-11 广东外语外贸大学 A kind of online spoken language pronunciation quality evaluating method and system
CN105261246B (en) * 2015-12-02 2018-06-05 武汉慧人信息科技有限公司 A kind of Oral English Practice error correction system based on big data digging technology
KR20180115599A (en) * 2017-04-13 2018-10-23 인하대학교 산학협력단 The Guidance and Feedback System for the Improvement of Speech Production and Recognition of its Intention Using Derencephalus Action
KR20190066314A (en) * 2017-12-05 2019-06-13 순천향대학교 산학협력단 Pronunciation and vocal practice device and method for deaf and dumb person
CN108922563B (en) * 2018-06-17 2019-09-24 海南大学 Based on the visual verbal learning antidote of deviation organ morphology behavior
CN112185186B (en) * 2020-09-30 2022-07-01 北京有竹居网络技术有限公司 Pronunciation correction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112863263A (en) 2021-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant