CN103310273A - Method for articulating Chinese vowels with tones and based on DIVA model - Google Patents

Method for articulating Chinese vowels with tones and based on DIVA model

Info

Publication number
CN103310273A
CN103310273A CN2013102611289A CN201310261128A
Authority
CN
China
Prior art keywords
vowel
chinese
voice
diva
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102611289A
Other languages
Chinese (zh)
Inventor
张少白 (Zhang Shaobai)
纪艳春 (Ji Yanchun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN2013102611289A priority Critical patent/CN103310273A/en
Publication of CN103310273A publication Critical patent/CN103310273A/en
Pending legal-status Critical Current

Abstract

The invention discloses a method for articulating tonal Chinese vowels based on the DIVA model. The method collects Chinese vowels in each of the four tones, extracts the fundamental-frequency (F0) sequence of each tone with a time-domain autocorrelation algorithm, obtains the first three formant frequencies with linear predictive coding (LPC), and inputs the F0 sequence and the first three formant frequencies of a tonal Chinese vowel into the DIVA neural network model for training and learning of that vowel. An unused cell in the vowel map is activated to represent the vowel. For a vowel that has already been learned, the user only needs to activate the corresponding vowel map cell, and the activated cell generates the corresponding speech through the feedforward and feedback subsystems of the model. The method realizes the generation and acquisition of tonal Chinese vowel articulation and is of significance for research in artificial intelligence and on speech disorders in medicine.

Description

Method for articulating tonal Chinese vowels based on the DIVA model
Technical field
The present invention relates to a method of articulation, and more specifically to a method for articulating tonal Chinese vowels based on the DIVA neural network model.
Background technology
Simulating and describing, at the level of neuroanatomy and neuropsychology, the functions of the brain regions involved in speech production and comprehension is the main goal pursued by recent artificial speech synthesis systems. Speech production and acquisition is a cognitive process involving many brain regions. It comprises a hierarchy extending from the organization of sentences and phrases according to syntax and grammar down to phoneme production, and it requires neural network models of the interaction between the sensory and motor regions of the brain during articulation. In 1994, Guenther of the speech laboratory at the Massachusetts Institute of Technology (MIT) first proposed a model for generating words, syllables, or phonemes and for controlling the simulated vocal-tract movements of speech production and acquisition; the model has been continually updated and improved since. It is composed of feedforward and feedback control subsystems and specifically comprises: a vocal-tract model; position and direction vectors of the articulators; planned position and planned direction vectors; speech-sound neural ensembles; and transformation (mapping) learning and control mechanisms. The learning process of the model is essentially as follows: each time a new speech sound is presented to the model as a sample to learn, an unused cell in the speech-sound map is activated to represent that sound. Once the sound has been learned, activating its map cell produces the corresponding speech through the feedforward and feedback subsystems of the model.
At its present stage, the DIVA model addresses the production and acquisition of English speech. Improving the DIVA model to realize correct pronunciation of tonal Chinese vowels, and thereby to simulate and describe the functions of the brain regions involved in speech production and comprehension, is necessary for the study of Chinese pronunciation.
Compared with Western languages and other Asian languages, Chinese has distinctive phonetic characteristics. An isolated Chinese word consists of a single syllable carrying one of four tones, so the same syllable can bear four tones and thus four different meanings. The tone of a Chinese syllable is realized on the vowel, which ties the vowels closely to the four tones; the tone itself is a variation of the pitch period over the course of the syllable.
Tone is a principal feature of standard Chinese, and its realization directly determines the quality of Chinese pronunciation. Timbre, duration, intensity, and fundamental frequency are the four elements of sound: every sound contains all four, no sound can exist lacking any one of them, and they are objective attributes of sound. The pitch period is the time between adjacent glottal closure instants during voiced speech, and its reciprocal is the fundamental frequency (F0). F0 is the main carrier of tone. Although other voice characteristics are also recognized as important cues for distinguishing tone, to date F0 remains, in speech engineering applications, the most general, most important, and indispensable discriminating factor for any tone system, and the best feature for distinguishing Chinese tones. The detection and estimation of F0 is an important part of speech signal processing, all the more so for Chinese, because Chinese tone is mainly reflected in the time-varying F0 trajectory of the vowel. This work adds control of F0 to the DIVA model so as to simulate the generation and acquisition of Chinese tone.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for articulating tonal Chinese vowels based on the DIVA model, which modifies the movement instructions of the articulators by changing the fundamental frequency and the formant frequencies, and thereby generates tonal vowels and the corresponding brain activation regions.
To solve the above technical problem, the method for articulating tonal Chinese vowels based on the DIVA (Directions Into Velocities of Articulators) model of the present invention uses a time-domain autocorrelation algorithm to extract the tone fundamental frequency of a Chinese vowel and LPC (Linear Predictive Coding) to extract its first three formant frequencies, modifies the articulator movement instructions by changing the fundamental frequency and the formant frequencies, and thereby generates the tonal vowel and the corresponding brain activation regions. The method comprises the following steps:
Step 1: collect tonal Chinese vowels, and obtain the first three formant frequencies and the fundamental-frequency sequence of each tonal Chinese vowel;
Step 2: input the first three formant frequencies and the fundamental-frequency sequence of a tonal Chinese vowel into the DIVA neural network model and train the model on this vowel; an unused cell in the speech-sound map is activated to represent the sound, and for a sound that has already been learned only the corresponding map cell needs to be activated;
Step 3: the activated cell of the speech-sound map produces the corresponding speech through the feedforward and feedback subsystems of the model.
Preferably, in step 1 of the method for articulating tonal Chinese vowels based on the DIVA model of the present invention, the formant frequencies are obtained by linear prediction (LPC) or by the cepstrum method.
Preferably, in step 1 of the method for articulating tonal Chinese vowels based on the DIVA model of the present invention, the fundamental-frequency sequence is obtained by the time-domain autocorrelation algorithm.
The method for articulating tonal Chinese vowels based on the DIVA model of the present invention uses the DIVA neural network model to simulate and describe, at the level of neuroanatomy and neuropsychology, the functions involved in generating and acquiring tonal Chinese vowels. It realizes the generation and acquisition of tonal Chinese vowel pronunciation and is significant for research and development in artificial intelligence and on speech disorders in medicine.
Description of drawings
Fig. 1 is a structural diagram of the existing DIVA neural network model.
Fig. 2 is the flowchart for solving for the fundamental frequencies of the four tones.
Fig. 3 shows the four-tone fundamental-frequency sequences of Chinese vowels extracted with the time-domain autocorrelation algorithm.
Fig. 4 is the flowchart for obtaining the formant frequencies of Chinese vowels with the LPC method.
Fig. 5 shows the first three formant frequency values of Chinese vowels obtained with LPC.
Embodiment
The method for articulating tonal Chinese vowels based on the DIVA model of the present invention uses a time-domain autocorrelation algorithm to extract the tone fundamental frequency of a Chinese vowel and LPC to extract its first three formant frequencies, modifies the articulator movement instructions by changing the fundamental frequency and the formant frequencies, and thereby generates the tonal vowel and the corresponding brain activation regions. The method comprises the following steps:
Step 1: collect tonal Chinese vowels, and obtain the first three formant frequencies and the fundamental-frequency sequence of each tonal Chinese vowel;
Step 2: input the first three formant frequencies and the fundamental-frequency sequence of a tonal Chinese vowel into the DIVA neural network model and train the model on this vowel; an unused cell in the speech-sound map is activated to represent the sound, and for a sound that has already been learned only the corresponding map cell needs to be activated;
Step 3: the activated cell of the speech-sound map produces the corresponding speech through the feedforward and feedback subsystems of the model.
The principal features of a speech signal are its two most basic parameters: the fundamental frequency and the formants. The fundamental frequency is the frequency of vocal-fold vibration during the production of voiced sounds. Many fundamental-frequency extraction methods exist; this work adopts the time-domain correlation method, which estimates the fundamental frequency from the short-time autocorrelation function. Its advantages are that its performance degrades relatively little under noise and that it is fairly simple to implement, so it is widely used in practice.
Time-domain autocorrelation algorithm: from the assumption that the characteristics of a speech signal vary slowly in time, the following short-time autocorrelation function can be derived. The short-time autocorrelation function of the signal {x(n)}, with analysis window w(n), is:
R_n(k) = Σ_{m=−∞}^{∞} x(m)·w(n−m)·x(m+k)·w(n−(m+k)) = Σ_{m=−∞}^{∞} x(m)·x(m+k)·h_k(n−m),
where h_k(n) = w(n)·w(n−k).
The main idea of the algorithm is as follows: using the short-time autocorrelation computation above, take every 20 ms of the speech signal as one frame, obtaining the sequence x(n). If the period is T, then the signal shifted by T samples coincides with the original signal. Let the original signal be x(n) and the signal shifted by k samples be x(n+k).
By the Cauchy-Schwarz inequality:
Σ x(n)·x(n+k) ≤ Σ x(n)·x(n),
with equality if and only if k = T, 2T, 3T, …, nT.
At the first maximum of Σ x(n)·x(n+k), the corresponding k is the period T. Of course, since a speech signal is not exactly periodic and adjacent periods differ slightly, maxima may occur at k = T, 2T, 3T, …, nT, so the first, second, third, … largest peaks must be compared; if the peak values are roughly equal, take the one with the smallest k. The k at the maximum is then the period of the speech signal, and the fundamental frequency of the frame is the reciprocal of the period, 1/k.
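The peak-picking procedure described above can be sketched as follows. This is an illustrative implementation, not the patent's code; the 60-400 Hz search range is an assumed plausible F0 interval for speech, used to exclude the lag-0 maximum and sub-harmonic peaks.

```python
import numpy as np

def pitch_autocorr(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate the pitch period of one frame from its autocorrelation peak.

    Returns (lag_in_samples, f0_in_hz). The f0_min/f0_max search bounds are
    an assumption (a plausible F0 range for speech), not from the patent.
    """
    frame = np.asarray(frame, dtype=float) - np.mean(frame)  # remove DC offset
    # R(k) = sum_n x(n) x(n+k); keep non-negative lags only
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    k_lo = int(fs / f0_max)                       # shortest admissible period
    k_hi = min(int(fs / f0_min), len(r) - 1)      # longest admissible period
    k = k_lo + int(np.argmax(r[k_lo:k_hi + 1]))   # lag of the largest peak
    return k, fs / k

# Toy check: a synthetic 125 Hz "voiced" frame sampled at 8 kHz
fs = 8000
t = np.arange(640) / fs                           # 80 ms, several pitch periods
x = np.sin(2 * np.pi * 125 * t) + 0.3 * np.sin(2 * np.pi * 250 * t)
k, f0 = pitch_autocorr(x, fs)                     # expect k near fs/125 = 64
```

Because the true period (64 samples) falls inside the admissible lag range while lag 0 does not, the largest remaining autocorrelation peak lands on the first period rather than a multiple of it.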
A formant is defined by the impulse response of the vocal tract: if the vocal tract is regarded as a resonant cavity, the formants are the resonance frequencies of that cavity. This work adopts a formant extraction method based on linear prediction (LPC). Studies of speech synthesis show that the first three formants are the most important for representing voiced signals. In the Chinese four tones, the vocal-fold vibration frequency that determines the pitch period changes over the course of the tone, but the oral-cavity shape that determines the vowel quality remains almost constant, so the formant frequencies change very little.
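As a sketch of the LPC-based formant idea (not the patent's own implementation): fit an all-pole model with the Levinson-Durbin recursion, then read resonance frequencies off the angles of the prediction polynomial's complex roots. The test signal below is a synthetic AR(2) process with one known resonance near 700 Hz; the 400 Hz bandwidth threshold and the 90 Hz lower cutoff are common heuristics, assumed here.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients a[0..order] (a[0] = 1) via the autocorrelation
    method and the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])  # prediction error term
        k = -acc / err                               # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def formants(a, fs, max_bw=400.0):
    """Resonance frequencies from the roots of the LPC polynomial,
    keeping only sharp resonances (bandwidth below max_bw)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-3]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    return np.sort(freqs[(freqs > 90.0) & (bws < max_bw)])

# Synthesize an AR(2) signal with a single resonance at 700 Hz (fs = 8 kHz)
fs, f_res, bw = 8000, 700.0, 100.0
radius = np.exp(-np.pi * bw / fs)
theta = 2 * np.pi * f_res / fs
a_true = np.array([1.0, -2 * radius * np.cos(theta), radius ** 2])
rng = np.random.default_rng(0)
e = rng.standard_normal(8000)
x = np.zeros_like(e)
for n in range(len(x)):                           # drive the all-pole filter
    x[n] = e[n]
    if n >= 1:
        x[n] -= a_true[1] * x[n - 1]
    if n >= 2:
        x[n] -= a_true[2] * x[n - 2]

f_est = formants(lpc(x, 2), fs)                   # expect one value near 700 Hz
```

For real vowels one would use a higher LPC order (roughly fs in kHz plus 2) and pick the first three surviving resonances as F1-F3.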
As shown in Fig. 1, the new version of the DIVA model is also called the Simulink DIVA model, because it runs in the Simulink environment of Matlab. Each block in Fig. 1 represents a neural ensemble corresponding to a neuroanatomical region of the brain during articulation, and the arrows represent the direction of the motor commands.
DIVA consists of feedforward and feedback control subsystems. The feedforward control subsystem is used for speech production; the feedback control subsystem is responsible for the learning of speech. In the feedforward subsystem, cells in the speech-sound map are activated, each cell representing one target phoneme. The speech-sound map of the left ventral premotor cortex acts as a target space formed from the fundamental frequency, the first three formant frequencies, and the corresponding somatosensory targets: the fundamental frequency and first three formant frequencies characterizing a word or vowel are given to the DIVA model as input, activating the corresponding cell. Once a cell is activated, the corresponding target signal is converted, directly or via the cerebellum, into movement instructions for the articulators (that is, for the Maeda articulatory model, which describes the shape of the vocal tract with eight parameters: tongue position, jaw position, tongue shape, tongue-tip shape, lip height, lip protrusion, larynx height, and glottal constriction; simply setting these eight parameters yields the various speech sounds). This process is the auditory-to-articulator mapping in the motor cortex. The movement instructions in the articulator velocity and position maps comprise the directions and positions of the articulators, and the articulatory model uses these instructions to simulate the vocal-tract shape and generate the corresponding speech. Physiologically, the feedforward control subsystem simulates the mapping from premotor cortex to motor cortex. The feedback control subsystem comprises somatosensory and auditory feedback: the former returns the current vocal-tract state to the somatosensory cortex, while the latter collects the audible signal and passes it to higher-order auditory cortex. Somatosensory and auditory feedback are assumed to reside in the inferior parietal cortex and the superior temporal gyrus, respectively. Somatosensory and auditory errors are obtained as the difference between the target and the current state in the relevant cortical areas, and the error signals correct the movement instructions of the articulators. Together, the feedforward and feedback control subsystems enable the DIVA model to correctly simulate the generation and acquisition of speech.
During simulation, the DIVA model produces a numeric sequence representing the regions and levels of brain activity; analyzed with the SPM (Statistical Parametric Mapping) 2 toolbox, these output data display the activated regions of the brain.
As shown in Fig. 2, to solve for the fundamental frequencies of the four tones of a Chinese vowel, the tonal Chinese monophthongs a, o, e, i, u, ü are first passed through a 400 Hz low-pass filter, sampled at 8 kHz, and quantized to 8 bits; one frame is taken every 20 ms, and the start and end points are determined from the frame sequence. The autocorrelation algorithm is then used to extract the fundamental frequency; interpolation smoothing guarantees the continuity of the data; and the pitch-period sequence of the tonal Chinese vowel is obtained, the reciprocal of the period being the fundamental frequency.
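The interpolation-smoothing step can be illustrated as follows (an illustrative sketch; the patent does not specify the exact filter): a width-3 median filter removes isolated pitch-tracking errors such as octave jumps, and linear interpolation fills frames where no pitch was found, keeping the F0 contour continuous.

```python
import numpy as np

def smooth_f0(track):
    """Median-filter (width 3) an F0 track, then linearly interpolate
    frames marked unvoiced (F0 = 0) so the contour stays continuous."""
    f0 = np.asarray(track, dtype=float)
    med = f0.copy()
    for i in range(1, len(f0) - 1):
        med[i] = np.median(f0[i - 1:i + 2])       # kill isolated outliers
    voiced = med > 0
    idx = np.arange(len(med))
    if voiced.any() and not voiced.all():
        med[~voiced] = np.interp(idx[~voiced], idx[voiced], med[voiced])
    return med

# One octave error (234 = 2 x 117) and one dropped frame (0)
track = [118, 117, 234, 117, 0, 120, 122, 150, 195, 200]
clean = smooth_f0(track)
```

After smoothing, the octave jump and the dropped frame are both replaced by values consistent with their neighbors, so the contour handed to the model has no spurious discontinuities.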
As shown in Fig. 3, the fundamental frequencies of the four tones of the Chinese monophthongs within 1 s, obtained with the time-domain autocorrelation algorithm, give the time-varying F0 contours used as one of the inputs with which the DIVA model generates tonal Chinese vowels.
As shown in Fig. 4, to obtain the formant frequencies of Chinese vowels with the LPC method, the Chinese monophthongs a, o, e, i, u, ü are first pre-filtered to eliminate interference; the pre-filtered vowels are framed, pre-emphasized, windowed, and endpoint-detected to determine the start and end of each vowel; linear predictive coding is applied to analyze the data and obtain the formant frequencies of the monophthongs; and the resulting formant frequencies are further median-filtered and linearly smoothed to obtain the Chinese vowel formant frequencies finally input to the DIVA neural network model.
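The front-end steps named above (pre-emphasis, framing, windowing) before the LPC analysis could look like this; the 0.97 pre-emphasis coefficient and the non-overlapping 20 ms frames are assumptions, since the patent does not state them.

```python
import numpy as np

def preprocess(x, fs, frame_ms=20.0, alpha=0.97):
    """Pre-emphasize a signal, then split it into non-overlapping,
    Hamming-windowed frames ready for LPC analysis."""
    x = np.asarray(x, dtype=float)
    y = np.append(x[0], x[1:] - alpha * x[:-1])   # pre-emphasis boosts high bands
    n = int(fs * frame_ms / 1000.0)               # samples per frame
    win = np.hamming(n)
    frames = [y[i:i + n] * win for i in range(0, len(y) - n + 1, n)]
    return np.array(frames)

fs = 8000
x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)  # 1 s toy vowel-like signal
frames = preprocess(x, fs)                        # 50 frames of 160 samples each
```

In practice overlapping frames (e.g. 10 ms hop) are more common; non-overlapping frames are used here only to keep the sketch short.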
As shown in Fig. 5, the first three formant frequency values of the Chinese vowels obtained with LPC are used as one of the inputs with which the DIVA model generates tonal Chinese vowels.
A specific embodiment
Step 1: the LPC method gives the first three formant frequencies F1, F2, and F3 of the Chinese vowel a as 1088 Hz, 1540 Hz, and 2420 Hz respectively, and the time-domain autocorrelation algorithm extracts the fundamental-frequency sequence of the third-tone vowel ǎ as 118 Hz, 117 Hz, 115 Hz, 117 Hz, 118 Hz, 120 Hz, 122 Hz, 150 Hz, 195 Hz, and 200 Hz;
Step 2: the first three formant frequencies and the fundamental-frequency sequence of the third-tone vowel ǎ are input into the DIVA neural network model, which is trained on this vowel. During training and learning, with the third-tone ǎ as a new speech sample, an unused cell in the speech-sound map is activated to represent it; the activated cell first produces the sound through the feedforward and feedback subsystems of the model, and after repeated training the accurate pronunciation of the third-tone ǎ is finally generated;
Step 3: if the already-learned third-tone ǎ is input again, DIVA only needs to activate the corresponding speech-sound map cell and generates the correct pronunciation directly, without going through the training process;
Step 4: if an untrained sound is input, step 2 is repeated. After a certain number of sounds have been learned, the model can even generate arbitrary combinations of sounds under certain constraints;
Step 5: analysis with the SPM2 toolbox shows that the brain regions whose activity differs across the tones of a Chinese vowel are the motor and premotor cortices.
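A hypothetical packaging of the step-1 acoustic measurements into per-frame model inputs might look like this. The actual input format of the Simulink DIVA model is not given in the patent, so the [F0, F1, F2, F3] row layout and the contour values are purely assumptions for illustration.

```python
import numpy as np

# F1-F3 of the vowel a from the embodiment (Hz); held constant across frames
formants_a3 = np.array([1088.0, 1540.0, 2420.0])
# A representative third-tone F0 contour (Hz), one value per 20 ms frame
f0_a3 = np.array([118.0, 117.0, 117.0, 118.0, 120.0, 122.0, 150.0, 195.0, 200.0])

# One feature row per frame: [F0, F1, F2, F3]
features = np.column_stack([f0_a3, np.tile(formants_a3, (len(f0_a3), 1))])
```

The formants stay fixed across frames (the vowel quality is constant) while F0 varies, which is exactly the separation of tone from vowel identity that the method exploits.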

Claims (3)

1. A method for articulating tonal Chinese vowels based on the DIVA model, characterized in that the movement instructions of the articulators are modified by changing the fundamental frequency and the formant frequencies, thereby generating tonal vowels and the corresponding brain activation regions, comprising the steps of:
Step 1: collecting tonal Chinese vowels, and obtaining the first three formant frequencies and the fundamental-frequency sequence of each tonal Chinese vowel;
Step 2: inputting the first three formant frequencies and the fundamental-frequency sequence of a tonal Chinese vowel into the DIVA neural network model and training the model on this vowel, wherein an unused cell in the speech-sound map is activated to represent the sound, and for a sound that has already been learned only the corresponding map cell needs to be activated;
Step 3: the activated cell of the speech-sound map producing the corresponding speech through the feedforward and feedback subsystems of the model.
2. The method for articulating Chinese vowels based on the DIVA neural network model according to claim 1, characterized in that in step 1 the formant frequencies are obtained by linear prediction (LPC) or by the cepstrum method.
3. The method for articulating Chinese vowels based on the DIVA neural network model according to claim 1, characterized in that in step 1 the fundamental-frequency sequence is obtained by the time-domain autocorrelation algorithm.
CN2013102611289A 2013-06-26 2013-06-26 Method for articulating Chinese vowels with tones and based on DIVA model Pending CN103310273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102611289A CN103310273A (en) 2013-06-26 2013-06-26 Method for articulating Chinese vowels with tones and based on DIVA model

Publications (1)

Publication Number Publication Date
CN103310273A true CN103310273A (en) 2013-09-18

Family

ID=49135460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102611289A Pending CN103310273A (en) 2013-06-26 2013-06-26 Method for articulating Chinese vowels with tones and based on DIVA model

Country Status (1)

Country Link
CN (1) CN103310273A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN108836574A (en) * 2018-06-20 2018-11-20 广州智能装备研究院有限公司 It is a kind of to utilize neck vibrator work intelligent sounding system and its vocal technique
CN111599347A (en) * 2020-05-27 2020-08-28 广州科慧健远医疗科技有限公司 Standardized sampling method for extracting pathological voice MFCC (Mel frequency cepstrum coefficient) features for artificial intelligence analysis
CN112309371A (en) * 2019-07-30 2021-02-02 上海流利说信息技术有限公司 Intonation detection method, apparatus, device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020174079A1 (en) * 1999-09-01 2002-11-21 Keith E. Mathias Method for improving neural network architectures using evolutionary algorithms
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102237088A (en) * 2011-06-17 2011-11-09 盛乐信息技术(上海)有限公司 Device and method for acquiring speech recognition multi-information text
CN103310376A (en) * 2013-06-26 2013-09-18 南昌航空大学 Index trend space-time product probability predicting method based on time smooth filtering algorithm (TSFA) and artificial nerve network (ANN)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130918