CN102789594B - Voice generation method based on DIVA neural network model - Google Patents


Info

Publication number: CN102789594B
Application number: CN201210219670.3A
Authority: CN (China)
Prior art keywords: neuron, hidden, hidden layer, candidate, layer candidate
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN102789594A (Chinese)
Inventors: 张少白, 徐磊, 刘欣
Current assignees: Boao Zongheng Network Technology Co., Ltd.; Guangzhou Zib Artificial Intelligence Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201210219670.3A; publication of CN102789594A; application granted; publication of CN102789594B
Landscapes: Feedback Control In General (AREA)
Abstract

The invention discloses a voice generation method based on the DIVA neural network model, comprising speech-sample extraction, speech-sample classification and learning, speech output, and correction of the output speech. The classification and learning of speech samples uses an adaptive growth neural network (AGNN). The number of candidate neurons in the input layer is computed from the acquired speech formant frequencies; the hidden-layer neurons are then determined from the input-layer candidate neurons; finally the output value of the AGNN is obtained and a phoneme is identified from that value. A neural network with this structure trains to high accuracy and learns quickly.

Description

Speech generation method based on a DIVA neural network model
Technical field
The present invention relates to a speech generation method, in particular to a speech generation method based on the DIVA neural network model.
Background technology
With the development of artificial intelligence, research in this field continues to deepen. Generating and acquiring speech similar to human pronunciation, and controlling this process, is an urgent problem for robotic articulation systems. Speech production and acquisition is a complex cognitive process involving many regions of the brain: it comprises a hierarchy extending from sentences or phrases organized by syntax and grammar down to phonemes, and it requires a neural network model of the interaction between the sensory and motor regions of the brain during vocalization. The DIVA (Directions Into Velocities of Articulators) model is a mathematical model describing speech production and acquisition, used mainly to simulate and describe the functions of the brain regions involved in speech production and speech understanding. It can also be described as an adaptive neural network model for generating words, syllables or phonemes and for controlling the motion of a simulated vocal tract. Among existing neural network models of speech production and acquisition with genuine biological significance, the DIVA model is comparatively the best defined and tested, and it is the only model that applies a pseudo-inverse control scheme.
The DIVA model was developed in response to the demand for a unified computational model of human language ability. Since it was first proposed by Guenther of the MIT speech laboratory in 1994, the model has been continuously updated and improved. The DIVA system consists of a speech-channel module, a cochlea module, an auditory cortex module, an auditory-cortex classification and perception module, a speech cell set module, a motor cortex module, a vocal tract module, a somatosensory cortex module, a sensory module and a sensory-channel module.
Analysis of the DIVA model shows that the classification method used in its auditory-cortex classification and perception module is an RBF network. An RBF neural network depends heavily on its samples, and for a given concrete problem there is currently no general, effective algorithm or theorem for determining a suitable number of hidden-layer nodes. In practice the network size is determined by experience and repeated trial, and this trial-and-error method is tedious and makes it difficult to find a suitable structure. The number of hidden-layer nodes strongly affects the convergence speed, precision and generalization ability of the network: with too many hidden nodes the network can complete training but converges slowly and may overfit, while with too few it cannot learn sufficiently and fails to reach the required training precision. In addition, training an RBF neural network is not fast enough.
Summary of the invention
The object of the present invention is to provide a speech generation method based on the DIVA neural network model with high pronunciation precision and fast learning speed.
The technical solution that realizes the object of the invention is a speech generation method based on the DIVA neural network model, comprising speech-sample extraction, speech-sample classification and learning, speech output, and correction of the output speech, where the speech-sample classification and learning uses an adaptive growth neural network (AGNN) to perform classification learning on the speech samples, specifically:

Step 1: convert the extracted speech formant frequencies to matrix form via a Jacobian transformation; the dimension of the feature vector of this matrix is the number m of input-layer candidate neurons. Compute the fitness function value of each input-layer candidate neuron and arrange the candidates in increasing order of fitness value, so that the list of input-layer candidate fitness values is S = {S_i1 ≤ S_i2 ≤ … ≤ S_im}, and place the candidate neurons in the corresponding order in a list X = (x_1, …, x_m). The fitness function is computed as

S = sqrt( Σ_{i=1..n} (y_i − ŷ_i)² / n )

where y_i is the actual output value, ŷ_i is the desired value, and n is the number of samples in the data set (n a natural number);
Step 2: initialize the number of hidden neurons r = 0 and set C_0 = S_i1, where C_0 is the fitness function value when the number of hidden neurons r = 0;
Step 3: set r = r + 1 and p = r + 1, where r indexes the r-th hidden-layer candidate neuron, and generate a hidden-layer candidate neuron with p inputs;
Step 4: if r > 1, connect this hidden-layer candidate neuron to all previous hidden neurons and to input node x_1; otherwise connect it only to input node x_1;
Step 5: set the initial value of h, the position in list X of the next element to be connected to the newly added hidden-layer candidate neuron, to 2, where 2 ≤ h ≤ m and m, h are positive integers; connect the p-th input of this hidden-layer candidate neuron to the input node at position h of list X;
Step 6: train this hidden-layer candidate neuron and compute its fitness function value C_r. If C_r ≥ C_{r−1}, go to step 7; if C_r < C_{r−1}, connect this candidate into the network as the r-th hidden neuron and return to step 3, repeating steps 3 to 6 until all m input-layer nodes have been connected into the network or the condition is no longer met;
Step 7: set h = h + 1 and train this hidden-layer candidate neuron again, repeating until h = m; if C_r < C_{r−1} is still not satisfied at h = m, training ends: this candidate is irrelevant to the classification, so it is discarded and the hidden neuron added immediately before it is taken as the output layer;
Step 8: determine the phoneme according to the output value of the output layer.
Further, in the speech generation method based on the DIVA neural network model, training the hidden-layer candidate neuron and computing its fitness function value C_r in step 6 is specifically:
(1) the data set formed by the normalized speech formant frequencies is divided into a training set, a validation set and a test set; the training set and validation set contain n_A and n_B samples respectively, divided so that n_A = n_B;
(2) using the three sets, the fitness value of the hidden-layer candidate neuron is computed as C_r = E_B(k) = ( Σ_{i=1..n_B} (e_Bi^(k))² )^{1/2}, where e_Bi^(k) = y_Bi − u_Bi^T W_{k−1} with y_Bi ∈ Y_B, Y_B is the target vector of the validation set, U_B (a matrix of p × 1 vectors) is the input presented by the validation set to the hidden neuron, W_{k−1} is the weight vector, and k is the iteration count, k = 0, 1, 2, 3, …, n, n a positive integer.
Further, in the speech generation method based on the DIVA neural network model, determining the phoneme from the output value of the output layer in step 8 is specifically: the output value of the output layer is a number in the interval 0 to 1, and the phoneme corresponding to the AGNN output value is determined from the value range assigned to each phoneme in the DIVA neural network model.
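The growth procedure of steps 1 to 7 can be illustrated with a minimal sketch. Two simplifying assumptions are made that are not in the patent: each candidate neuron is a plain least-squares linear unit rather than an iteratively trained neuron, and each ranked input is offered to exactly one new candidate; the data here is synthetic.

```python
import numpy as np

def rmse(y, y_hat):
    # Patent's fitness function: sqrt(sum_i (y_i - y_hat_i)^2 / n).
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def fit_unit(U, y):
    # "Train" a candidate neuron as a least-squares linear read-out of its inputs.
    W, *_ = np.linalg.lstsq(U, y, rcond=None)
    return W

def grow_agnn(Xtr, ytr, Xva, yva):
    """Grow hidden neurons one at a time (steps 2-7): each candidate sees x_1,
    every previously accepted hidden output, and the next ranked input x_h;
    it is kept only if its validation fitness C_r drops below C_{r-1}."""
    m = Xtr.shape[1]
    # Step 1: rank the input candidates by single-feature fitness (ascending).
    scores = [rmse(yva, Xva[:, [j]] @ fit_unit(Xtr[:, [j]], ytr)) for j in range(m)]
    order = np.argsort(scores)
    Xtr, Xva = Xtr[:, order], Xva[:, order]      # the sorted list X of the patent
    C_prev = scores[order[0]]                    # step 2: C_0 = S_i1
    hid_tr, hid_va = [], []                      # outputs of accepted hidden neurons
    for h in range(1, m):                        # steps 3-7, one ranked input per try
        Utr = np.column_stack([Xtr[:, 0], *hid_tr, Xtr[:, h]])
        Uva = np.column_stack([Xva[:, 0], *hid_va, Xva[:, h]])
        W = fit_unit(Utr, ytr)                   # step 6: train the candidate
        C_r = rmse(yva, Uva @ W)
        if C_r < C_prev:                         # accept as the r-th hidden neuron
            hid_tr.append(Utr @ W)
            hid_va.append(Uva @ W)
            C_prev = C_r
    return C_prev, len(hid_tr)                   # final fitness, hidden-neuron count
```

Because a candidate is accepted only when it strictly lowers the validation fitness, the returned fitness never exceeds that of the best single input, mirroring the narrow, minimally sized network the patent describes.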
Compared with the prior art, the invention has notable advantages. Because the adaptive growth neural network starts learning from a single input node, adjusts neuron weights according to external rules, and gradually adds new input nodes and new hidden neurons, the constructed AGNN is a narrow, deep network with close to the minimum number of input neurons, hidden neurons and connections. This effectively prevents overfitting of the network, keeps its computational cost low, and makes learning fast. The RBF network originally used in the DIVA model reaches about 80% classification precision on the samples, while the AGNN averages over 90%. For learning samples of ordinary difficulty, the original model takes 10–13 s for classification learning and speech generation, while the system refined with the AGNN model takes only 8–10 s under the same conditions, i.e. 2–3 s faster. For learning samples of medium difficulty and above, the AGNN-refined system performs even better: it is 4–5 s faster than the model before the improvement, and while the classification precision of the original system drops to 70–75%, the AGNN-refined system still maintains a high accuracy of 90% under equal conditions. Applying the adaptive growth neural network model to the DIVA model therefore yields higher pronunciation precision and faster learning.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the structural block diagram of the DIVA neural network model;
Fig. 3 is a schematic diagram of the AGNN structure used for classification in the embodiment.
Embodiment
The present invention is described in further detail below in conjunction with the drawings.
As shown in Fig. 1, the speech generation method based on the DIVA neural network model of the present invention comprises speech-sample extraction, speech-sample classification and learning, speech output, and correction of the output speech, and is characterized in that the speech-sample classification and learning uses an adaptive growth neural network (AGNN) to perform classification learning on the speech samples, specifically:
Step 1: convert the extracted speech formant frequencies to matrix form via a Jacobian transformation; the dimension of the feature vector of this matrix is the number m of input-layer candidate neurons. Compute the fitness function value of each input-layer candidate neuron and arrange the candidates in increasing order of fitness value, so that the list of input-layer candidate fitness values is S = {S_i1 ≤ S_i2 ≤ … ≤ S_im}, and place the candidate neurons in the corresponding order in a list X = (x_1, …, x_m). The fitness function is computed as S = sqrt( Σ_{i=1..n} (y_i − ŷ_i)² / n ), where y_i is the actual output value, ŷ_i is the desired value, and n is the number of samples in the data set (n a natural number);
Step 2: initialize the number of hidden neurons r = 0 and set C_0 = S_i1, where C_0 is the fitness function value when the number of hidden neurons r = 0;
Step 3: set r = r + 1 and p = r + 1, where r indexes the r-th hidden-layer candidate neuron, and generate a hidden-layer candidate neuron with p inputs;
Step 4: if r > 1, connect this hidden-layer candidate neuron to all previous hidden neurons and to input node x_1; otherwise connect it only to input node x_1;
Step 5: set the initial value of h, the position in list X of the next element to be connected to the newly added hidden-layer candidate neuron, to 2, where 2 ≤ h ≤ m and m, h are positive integers; connect the p-th input of this hidden-layer candidate neuron to the input node at position h of list X;
Step 6: train this hidden-layer candidate neuron and compute its fitness function value C_r. If C_r ≥ C_{r−1}, go to step 7; if C_r < C_{r−1}, connect this candidate into the network as the r-th hidden neuron and return to step 3, repeating steps 3 to 6 until all m input-layer nodes have been connected into the network or the condition is no longer met. The fitness function value C_r is computed as follows:
(1) the data set formed by the normalized speech formant frequencies is divided into a training set, a validation set and a test set; the training set and validation set contain n_A and n_B samples respectively, divided so that n_A = n_B;
(2) using the three sets, the fitness value of the hidden-layer candidate neuron is computed as C_r = E_B(k) = ( Σ_{i=1..n_B} (e_Bi^(k))² )^{1/2}, where e_Bi^(k) = y_Bi − u_Bi^T W_{k−1} with y_Bi ∈ Y_B, Y_B is the target vector of the validation set, U_B (a matrix of p × 1 vectors) is the input presented by the validation set to the hidden neuron, W_{k−1} is the weight vector, and k is the iteration count, k = 0, 1, 2, 3, …, n for positive integer n; the higher the required training precision, the larger the iteration count k.
Step 7: set h = h + 1 and train this hidden-layer candidate neuron again, repeating until h = m; if C_r < C_{r−1} is still not satisfied at h = m, training ends: this candidate is irrelevant to the classification, so it is discarded and the hidden neuron added immediately before it is taken as the output layer;
Step 8: determine the phoneme according to the output value of the output layer; the output value of the output layer is a number in the interval 0 to 1, and the phoneme corresponding to the AGNN output value is determined from the value range assigned to each phoneme in the DIVA neural network model.
Embodiment
As shown in Fig. 2, in this embodiment speech is first collected by a pronunciation device such as a microphone and passes through the speech-channel module with a given delay, which sends the formant frequencies of the speech to the cochlea module in vector form. The cochlea module computes the cochlear representation (spectrum) of the speech and sends the formant frequencies to the auditory cortex module. The auditory cortex module transmits the formant-frequency representation received from the cochlea module to the auditory-cortex classification and perception module. On receiving the speech, this module divides it into the basic units of speech, phonemes; the initialized phoneme targets reach, via the speech cell set module, the auditory and somatosensory results formed by the auditory cortex and somatosensory cortex modules respectively. The module identifies a speech fragment by comparing it with the stored phoneme representations, where each phoneme is represented by a numerical range between 0 and 1 stored in the speech cell set module. The identification process is as follows: the auditory-cortex classification and perception module matches the phoneme obtained from classification (i.e. the output value of the AGNN) one by one against the phoneme representations in the speech cell set; if no matching phoneme representation is found in the speech cell set, meaning this phoneme has not yet been learned, the speech cell set module creates a new phoneme representation in a specific region to represent the current phoneme. The relation between the phoneme targets output by the auditory-cortex classification and perception module and the speech cell set is one-to-one. Afterwards, the speech cell set module starts the generation of the phoneme fragment and sends the index of the phoneme target to be produced to the motor cortex, auditory cortex and somatosensory cortex modules. After receiving the phoneme target index from the speech cell set module, the motor cortex sends a control command to the vocal tract module; the vocal tract module computes the vocal-tract parameters for the received command and sends them to the sound device to produce the corresponding speech, while simultaneously sending the computed auditory effect and parameter configuration through the speech channel and sensory channel to the cochlea module and the sensory module respectively, forming feedback. After receiving the vocal-tract configuration information transmitted in vector form over the sensory channel, the sensory module computes the somatosensory result associated with that configuration and sends it to the somatosensory cortex module. The somatosensory cortex module then computes the difference between the cortical representation of the somatosensation and the somatosensory target, and sends the somatosensory error to the motor cortex module to correct the generated speech. After receiving the formant frequencies of the speech produced by the vocal tract module, transmitted in vector form through the speech channel, the cochlea module passes them to the auditory cortex module; the auditory cortex module computes the difference between this speech and the cortical representation of its target speech, and propagates the error to the motor cortex module to correct the generated speech.
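The feedback path just described (motor command → vocal tract → cochlea → auditory error → motor correction) can be sketched as a toy loop. Every function, gain and number below is an illustrative stand-in, not the DIVA implementation: the articulator is a made-up linear map and the formant targets are assumed values.

```python
import numpy as np

def vocal_tract(command):
    # Hypothetical articulator model: motor command -> formant frequencies.
    return 2.0 * command + 1.0

def cochlea(formants):
    # Cochlear representation; the identity is enough for this sketch.
    return formants

def auditory_error(produced, target):
    # Auditory cortex: difference between produced speech and its target.
    return produced - target

def motor_cortex(command, error, gain=0.3):
    # Correct the motor command with the fed-back auditory error.
    return command - gain * error

# Feedback loop of the embodiment: produce, compare, correct, repeat.
target = np.array([500.0, 1500.0, 2500.0])   # assumed formant targets (Hz)
command = np.zeros(3)
for _ in range(50):
    produced = cochlea(vocal_tract(command))
    command = motor_cortex(command, auditory_error(produced, target))
```

With any gain small enough that the correction is a contraction, the produced formants converge to the auditory target, which is the role the error feedback plays in the module description above.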
As shown in the table, the 29 phonemes stored in the speech cell set module of the existing DIVA neural network model each correspond to a numerical range. The classification result of the AGNN is a single number, and the value obtained represents a particular phoneme (the numerical interval into which the value falls identifies a specific phoneme).
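Since the table of 29 ranges is not reproduced here, the interval-to-phoneme lookup can be illustrated with a few hypothetical ranges; the phonemes and boundaries below are invented for the example and are not the patent's table.

```python
# Hypothetical value ranges standing in for the patent's table of 29 phonemes.
PHONEME_RANGES = {
    "/a/": (0.00, 0.30),
    "/i/": (0.30, 0.60),
    "/u/": (0.60, 1.00),
}

def phoneme_for(output_value):
    """Map an AGNN output in [0, 1] to the phoneme whose range contains it."""
    for phoneme, (lo, hi) in PHONEME_RANGES.items():
        if lo <= output_value < hi or (hi == 1.00 and output_value == 1.00):
            return phoneme
    # No stored representation matches: the speech cell set module would
    # create a new phoneme representation for this value.
    return None
```

The `None` branch mirrors the behaviour described above, where an unmatched output value causes the speech cell set module to create a new phoneme representation.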
As shown in Fig. 3, the learning rate η = 1.9 and Δ = 0.0015 are used, and the initial weights are drawn from a normal distribution.
From the input data set X, the dimension of the feature vector is computed to be m = 8 input-layer candidate neurons, using the formula S = sqrt( Σ (y_i − ŷ_i)² / n ), where y_i is the actual output value, ŷ_i is the desired value, and n is the number of samples in the data set. The fitness function value of each element of the input data set X is computed, the elements are arranged in increasing order of fitness, and the first 8 are chosen as candidate neurons: x_8, x_5, x_12, x_16, x_24, x_27, x_19, x_23, where the first input neuron x_8 has the smallest fitness function value, which is denoted C_0.
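The ranking step above can be sketched as follows. How a single input candidate is scored is not fully specified in the translated text, so this sketch assumes each candidate is scored by the fitness of a one-feature least-squares fit, on synthetic data with assumed sizes n = 40 and m = 8.

```python
import numpy as np

def fitness(y, y_hat, n):
    # S = sqrt( sum_i (y_i - y_hat_i)^2 / n ), as in the embodiment.
    return float(np.sqrt(np.sum((y - y_hat) ** 2) / n))

rng = np.random.default_rng(1)
n, m = 40, 8                        # assumed sample count and feature count
X = rng.normal(size=(n, m))
y = rng.normal(size=n)

# Score each input candidate by the error of a one-feature least-squares fit.
scores = []
for j in range(m):
    w, *_ = np.linalg.lstsq(X[:, [j]], y, rcond=None)
    scores.append(fitness(y, X[:, [j]] @ w, n))

order = np.argsort(scores)          # increasing fitness, as in list X
ranked = [f"x_{j + 1}" for j in order]  # ranked candidate input neurons
```

On the patent's real formant data this ranking would produce the specific order x_8, x_5, x_12, … quoted above; on the synthetic data here the order is arbitrary.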
A hidden-layer candidate neuron z_1 with 2 inputs is added. Its two inputs are connected to input-layer candidate neurons x_8 and x_5; the candidate is trained and its fitness value C_1 is computed. Comparing C_1 with C_0 gives C_1 < C_0, so z_1 joins the network as the 1st hidden neuron. A hidden-layer candidate neuron z_2 with 3 inputs is then added: its first 2 inputs are connected to the previous hidden neuron z_1 and to x_8, and its 3rd input to x_5. The candidate is trained, its fitness C_2 is computed, and since C_2 < C_1, z_2 joins the network as the 2nd hidden neuron. A hidden-layer candidate z_3 with 4 inputs is added: its first 3 inputs connect to hidden neurons z_1 and z_2 and to input node x_8, and its 4th input to x_5. After training, the computed fitness value is not less than C_2, so the 4th input is reconnected to x_12; training again gives C_3 with C_3 < C_2, and z_3 joins the network as the 3rd hidden neuron. z_4 is added as a hidden-layer candidate with 5 inputs: its first 4 inputs connect to z_1–z_3 and x_8 and its 5th to x_12, but the trained fitness is not less than C_3, so the 5th input is reconnected to x_16; retraining gives C_4 with C_4 < C_3, and z_4 joins as the 4th hidden neuron. Next z_5 is added with 6 inputs: the first 5 connect to z_1–z_4 and x_8 and the 6th to x_16; training gives C_5 with C_5 < C_4, so z_5 joins as the 5th hidden neuron. z_6 is added with 7 inputs: the first 6 connect to z_1–z_5 and x_8 and the 7th to x_16, but the trained fitness is not less than C_5, so the 7th input is reconnected to x_24; retraining gives C_6 with C_6 < C_5, and z_6 joins as the 6th hidden neuron. z_7 is added with 8 inputs: the first 7 connect to z_1–z_6 and x_8 and the 8th to x_24; training gives C_7 with C_7 < C_6, so z_7 joins as the 7th hidden neuron. z_8 is added with 9 inputs: the first 8 connect to z_1–z_7 and x_8 and the 9th to x_24, but the trained fitness is not less than C_7, so the 9th input is reconnected to x_27; retraining gives C_8 with C_8 < C_7, and z_8 joins as the 8th hidden neuron. z_9 is then added with 10 inputs: the first 9 connect to z_1–z_8 and x_8 and the 10th to x_27; training gives C_9 with C_9 < C_8, so z_9 joins as the 9th hidden neuron. z_10 is added with 11 inputs: the first 10 connect to z_1–z_9 and x_8 and the 11th to x_27, but the trained fitness is not less than C_9, so the 11th input is reconnected to x_19; retraining gives C_10 with C_10 < C_9, and z_10 joins as the 10th hidden neuron. z_11 is added with 12 inputs: the first 11 connect to z_1–z_10 and x_8 and the 12th to x_19; training gives C_11 with C_11 < C_10, so z_11 joins the network. z_12 is added with 13 inputs: its first 12 connect to z_1–z_11 and x_8 and the 13th to x_19, but the trained fitness is not less than C_11, so the 13th input is reconnected to x_23; retraining gives C_12 with C_12 < C_11, and z_12 joins the network as a hidden neuron. z_13 is added with 14 inputs: its first 13 connect to z_1–z_12 and x_8 and the 14th to x_23; training gives C_13 with C_13 < C_12, so z_13 joins the network as a hidden neuron. Finally z_14 is added with 15 inputs: its first 14 connect to z_1–z_13 and x_8 and the 15th to x_23, but the trained fitness is not less than C_13, and there are no further candidate inputs to connect, so z_14 is discarded and z_13 becomes the output neuron.
The network has thus selected 8 input features, 12 hidden neurons and 1 output neuron. The first hidden neuron is connected to input nodes x_8 and x_5; the inputs of the output neuron are connected to the outputs z_1–z_12 of the hidden neurons and to input nodes x_8 and x_23.

Claims (3)

1. A speech generation method based on a DIVA neural network model, comprising speech-sample extraction, speech-sample classification and learning, speech output, and correction of the output speech, characterized in that the speech-sample classification and learning uses an adaptive growth neural network to perform classification learning on the speech samples, specifically:
Step 1: convert the extracted speech formant frequencies to matrix form via a Jacobian transformation; the dimension of the feature vector of this matrix is the number m of input-layer candidate neurons. Compute the fitness function value of each input-layer candidate neuron and arrange the candidates in increasing order of fitness value, so that the list of input-layer candidate fitness values is S = {S_i1 ≤ S_i2 ≤ … ≤ S_im}, and place the candidate neurons in the corresponding order in a list X = (x_1, …, x_m). The fitness function is computed as S = sqrt( Σ_{i=1..n} (y_i − ŷ_i)² / n ), where y_i is the actual output value, ŷ_i is the desired value, and n is the number of samples in the data set (n a natural number);
Step 2: initialize the number of hidden neurons r = 0 and set C_0 = S_i1, where C_0 is the fitness function value when the number of hidden neurons r = 0;
Step 3: set r = r + 1 and p = r + 1, where r indexes the r-th hidden-layer candidate neuron, and generate a hidden-layer candidate neuron with p inputs;
Step 4: if r > 1, connect this hidden-layer candidate neuron to all previous hidden neurons and to input node x_1; otherwise connect it only to input node x_1;
Step 5: set the initial value of h, the position in list X of the next element to be connected to the newly added hidden-layer candidate neuron, to 2, where 2 ≤ h ≤ m and m, h are positive integers; connect the p-th input of this hidden-layer candidate neuron to the input node at position h of list X;
Step 6: train this hidden-layer candidate neuron and compute its fitness function value C_r; if C_r ≥ C_{r−1}, go to step 7; if C_r < C_{r−1}, connect this candidate into the network as the r-th hidden neuron and return to step 3, repeating steps 3 to 6 until all m input-layer nodes have been connected into the network or the condition is no longer met;
Step 7: set h = h + 1 and train this hidden-layer candidate neuron again, repeating until h = m; if C_r < C_{r−1} is still not satisfied at h = m, training ends: this candidate is irrelevant to the classification, so it is discarded and the hidden neuron added immediately before it is taken as the output layer;
Step 8: determine the phoneme according to the output value of the output layer.
2. The speech generation method based on the DIVA neural network model according to claim 1, characterized in that training the hidden-layer candidate neuron and computing its fitness function value C_r in step 6 is specifically:
(1) dividing the data set formed by the normalized speech formant frequencies into a training set, a validation set and a test set, the training set and validation set containing n_A and n_B samples respectively, divided so that n_A = n_B;
(2) computing, from the three sets, the fitness value of the hidden-layer candidate neuron as C_r = E_B(k) = ( Σ_{i=1..n_B} (e_Bi^(k))² )^{1/2}, where e_Bi^(k) = y_Bi − u_Bi^T W_{k−1} with y_Bi ∈ Y_B, Y_B being the target vector of the validation set, U_B (a matrix of p × 1 vectors) the input presented by the validation set to the hidden neuron, W_{k−1} the weight vector, and k the iteration count, k = 0, 1, 2, 3, …, n, n a positive integer.
3. The speech generation method based on the DIVA neural network model according to claim 1, characterized in that determining the phoneme from the output value of the output layer in step 8 is specifically: the output value of the output layer is a number in the interval 0 to 1, and the phoneme corresponding to the AGNN output value is determined from the value range assigned to each phoneme in the DIVA neural network model.
CN201210219670.3A 2012-06-28 2012-06-28 Voice generation method based on DIVA neural network model Active CN102789594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210219670.3A CN102789594B (en) 2012-06-28 2012-06-28 Voice generation method based on DIVA neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210219670.3A CN102789594B (en) 2012-06-28 2012-06-28 Voice generation method based on DIVA neural network model

Publications (2)

Publication Number Publication Date
CN102789594A CN102789594A (en) 2012-11-21
CN102789594B true CN102789594B (en) 2014-08-13

Family

ID=47154995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210219670.3A Active CN102789594B (en) 2012-06-28 2012-06-28 Voice generation method based on DIVA neural network model

Country Status (1)

Country Link
CN (1) CN102789594B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119810B (en) * 2019-03-29 2023-05-16 华东师范大学 Human behavior dependency analysis method based on neural network
CN112861988B (en) * 2021-03-04 2022-03-11 西南科技大学 Feature matching method based on attention-seeking neural network
CN115565540B (en) * 2022-12-05 2023-04-07 浙江大学 Invasive brain-computer interface Chinese pronunciation decoding method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650945A (en) * 2009-09-17 2010-02-17 浙江工业大学 Method for recognizing speaker based on multivariate core logistic regression model
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102222501A (en) * 2011-06-15 2011-10-19 中国科学院自动化研究所 Method for generating duration parameter in speech synthesis


Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
A neural network model of speech acquisition and motor equivalent speech production; Frank H. Guenther; Biological Cybernetics; 1994-12-31; vol. 72, no. 1, pp. 43-53 *
Brain-computer interfaces for speech communication; J.S. Brumberg et al.; Speech Communication; 2010-12-31; vol. 52, no. 2, pp. 367-379 *
A new cerebellar model construction method applicable to the DIVA model; Zhang Shaobai et al.; Proceedings of the 2009 Chinese Control and Decision Conference; 2009-06-17; pp. 954-959 *
Research on the influence of speech rate on speech production in the DIVA model; Liu Yanyan et al.; Computer Technology and Development; 2011-12-10; vol. 21, no. 12, pp. 33-35, 40 *
Application of an improved pseudo-inverse control scheme in the DIVA model; Zhang Xin et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition); 2012; vol. 32, no. 3, pp. 81-85 *

Also Published As

Publication number Publication date
CN102789594A (en) 2012-11-21

Similar Documents

Publication Publication Date Title
CN105139864B (en) Audio recognition method and device
CN105741832B (en) Spoken language evaluation method and system based on deep learning
CN105513591B (en) The method and apparatus for carrying out speech recognition with LSTM Recognition with Recurrent Neural Network model
CN107132516B (en) A kind of Radar range profile&#39;s target identification method based on depth confidence network
CN105279555B (en) A kind of adaptive learning neural network implementation method based on evolution algorithm
Wan et al. Day-ahead prediction of wind speed with deep feature learning
CN105243398A (en) Method of improving performance of convolutional neural network based on linear discriminant analysis criterion
CN106297792A (en) The recognition methods of a kind of voice mouth shape cartoon and device
CN111259750A (en) Underwater sound target identification method for optimizing BP neural network based on genetic algorithm
CN105023570B (en) A kind of method and system for realizing sound conversion
CN106683666A (en) Field adaptive method based on deep neural network (DNN)
CN109376933A (en) Lithium ion battery negative material energy density prediction technique neural network based
CN105259331A (en) Uniaxial strength forecasting method for jointed rock mass
CN107293290A (en) The method and apparatus for setting up Speech acoustics model
CN108109615A (en) A kind of construction and application method of the Mongol acoustic model based on DNN
CN102789594B (en) Voice generation method based on DIVA neural network model
CN107862329A (en) A kind of true and false target identification method of Radar range profile&#39;s based on depth confidence network
Haikun et al. Speech recognition model based on deep learning and application in pronunciation quality evaluation system
CN108461080A (en) A kind of Acoustic Modeling method and apparatus based on HLSTM models
Shibata et al. Analytic automated essay scoring based on deep neural networks integrating multidimensional item response theory
CN106611599A (en) Voice recognition method and device based on artificial neural network and electronic equipment
CN113378581B (en) Knowledge tracking method and system based on multivariate concept attention model
CN103680491A (en) Speed dependent prosodic message generating device and speed dependent hierarchical prosodic module
Weihong et al. Optimization of BP neural network classifier using genetic algorithm
CN113743083A (en) Test question difficulty prediction method and system based on deep semantic representation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20121121

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: NANJING University OF POSTS AND TELECOMMUNICATIONS

Contract record no.: 2016320000207

Denomination of invention: Voice generation method based on DIVA neural network model

Granted publication date: 20140813

License type: Common License

Record date: 20161109

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: NANJING University OF POSTS AND TELECOMMUNICATIONS

Contract record no.: 2016320000207

Date of cancellation: 20180116

EC01 Cancellation of recordation of patent licensing contract
TR01 Transfer of patent right

Effective date of registration: 20180517

Address after: Rooms 1101 and 1102, No. 374 Beijing Road, Yuexiu District, Guangzhou, Guangdong 510030 (for office use only).

Patentee after: GUANGZHOU ZIB ARTIFICIAL INTELLIGENCE TECHNOLOGY CO.,LTD.

Address before: Floors 1-4 of the podium of buildings B1 and B2, No. 231 and No. 233 Science Avenue, Guangzhou, Guangdong 510000.

Patentee before: BOAO ZONGHENG NETWORK TECHNOLOGY Co.,Ltd.

Effective date of registration: 20180517

Address after: Floors 1-4 of the podium of buildings B1 and B2, No. 231 and No. 233 Science Avenue, Guangzhou, Guangdong 510000.

Patentee after: BOAO ZONGHENG NETWORK TECHNOLOGY Co.,Ltd.

Address before: No. 66 Xinmofan Road, Gulou District, Nanjing, Jiangsu 210003

Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS

TR01 Transfer of patent right