CN103310272B - Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base - Google Patents


Info

Publication number
CN103310272B
CN103310272B CN201310274341.3A CN201310274341A CN103310272A
Authority
CN
China
Prior art keywords
diva
model
sound channel
knowledge base
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310274341.3A
Other languages
Chinese (zh)
Other versions
CN103310272A (en)
Inventor
张少白
徐歆冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201310274341.3A priority Critical patent/CN103310272B/en
Publication of CN103310272A publication Critical patent/CN103310272A/en
Application granted granted Critical
Publication of CN103310272B publication Critical patent/CN103310272B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The present invention relates to an articulation method, in particular to an articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base. The method uses an improved DIVA neural network model to which a vocal tract action knowledge base has been added. For speech sounds not present in the speech sound map, corrected auditory feedback information is obtained by applying a perturbation factor, and the corrected auditory feedback is then used to train the neural network. This reduces the number of training iterations the model needs to produce an utterance and improves articulation accuracy.

Description

Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base
Technical field
The present invention relates to an articulation method, in particular to an articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base.
Background art
A neuro-computational speech model is a model that simulates by computer the complex processes of speech production, perception, and acquisition. Its composition is quite complex, comprising at least a cognitive part, a motor-processing part, and a sensory-processing part. The cognitive part produces neuron activations (phonemic representations) during both speech production and speech perception. The motor-processing part begins with motor planning activated by the phonemic representation and ends with the articulator movements corresponding to a particular phonemic item. The sensory-processing part produces the auditory representation of an external speech signal and activates the corresponding phonemic representation.
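The three-part structure described above can be sketched as a toy pipeline. All class names, the two-word lexicon, and the identity "perception" mapping below are illustrative assumptions for exposition, not structures taken from the patent:

```python
# Hypothetical sketch of the cognitive / motor / sensory decomposition of a
# neuro-computational speech model. Names and data are illustrative only.

class CognitivePart:
    """Maps an intended word to a sequence of phoneme activations."""
    def activate(self, word):
        lexicon = {"ba": ["b", "a"], "da": ["d", "a"]}  # toy lexicon
        return lexicon.get(word, [])

class MotorPart:
    """Turns phoneme activations into articulator movement commands."""
    def plan(self, phonemes):
        return [("move_articulators", p) for p in phonemes]

class SensoryPart:
    """Maps an external acoustic signal back onto phoneme activations."""
    def perceive(self, signal):
        return [p for p in signal]  # identity toy mapping

def produce(word):
    """Production path: cognitive activation followed by motor planning."""
    phonemes = CognitivePart().activate(word)
    return MotorPart().plan(phonemes)

print(produce("ba"))  # [('move_articulators', 'b'), ('move_articulators', 'a')]
```

The point of the sketch is only the division of labour: the cognitive part selects phonemic content, the motor part converts it to movement commands, and the sensory part maps sound back to phonemes.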
Research on neuro-computational speech models has produced many results to date. Among these, the DIVA (Directions Into Velocities of Articulators) model is one of the more advanced neuro-computational models of speech production, perception, and acquisition.
The DIVA model was developed by Professor Frank Guenther and his team at the Boston University speech lab. Among current neuro-computational speech models with genuine biophysical grounding, DIVA is the most thoroughly defined and tested, and it is the only adaptive neural network model that applies pseudo-inverse control. The DIVA model describes the processing involved in speech acquisition, perception, and production, and can generate phonemes, syllables, or words by controlling a simulated vocal tract according to self-organizing principles. Fig. 1 shows the block diagram of the DIVA model.
The features of the DIVA model include:
The model contains two subsystems, feedforward control and feedback control;
The model's target regions are composed of the fundamental frequency F0, the first three formant frequencies, and the corresponding somatosensory targets;
The model's input is a word, syllable, or phoneme. Although the objects the model has focused on so far are short, simple speech sequences, the influence of language on longer and more complex structures (prosodic and metrical structure, morphology, word boundaries, etc.) has been taken into account in the model;
The model's account of coarticulation and its correlates is similar to Keating's window model, but it has the advantage over the window model of explaining how targets are learned;
The DIVA model has applied learning of the sensory-perceptual system with unprecedented success. Its method is to classify the speech sounds that exist, without needing to explain how they are learned.
The DIVA model also has some shortcomings, mainly the following: the model assumes that all state information at a given point in time is instantaneously available; it assumes there are no neural delays and uses instantaneous feedback control; its underlying control frame can use only the articulator (somatosensory) reference frame or the auditory reference frame, and the two cannot coexist; and its description of how cortical and subcortical processing relates to the brain regions involved is relatively coarse.
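The target regions mentioned in the feature list above (F0 plus the first three formants, together with somatosensory targets) can be illustrated as interval targets rather than point targets. The frequency ranges below are rough illustrative values for a vowel such as /i/, not figures from the patent:

```python
# Toy DIVA-style target region: for each acoustic dimension (F0 and the first
# three formants, in Hz) a (min, max) interval instead of a single target value.
# All ranges are illustrative assumptions.

TARGET_I = {
    "F0": (100, 250),
    "F1": (250, 350),
    "F2": (2000, 2500),
    "F3": (2800, 3400),
}

def in_target(frame, target):
    """A produced acoustic frame hits the target region when every
    dimension falls inside its interval."""
    return all(lo <= frame[k] <= hi for k, (lo, hi) in target.items())

frame = {"F0": 120, "F1": 300, "F2": 2200, "F3": 3000}
print(in_target(frame, TARGET_I))  # True
```

Representing targets as regions rather than points is what lets a convex-region model tolerate coarticulatory variation without signalling an error.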
Summary of the invention
The technical problem to be solved by the present invention is to address the deficiencies of the above background art by providing an articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base.
To achieve the above object, the present invention adopts the following technical scheme.
The articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base comprises the following steps:
Step 1: build the improved DIVA neuro-computational speech model by adding, to the DIVA model, a vocal tract action knowledge base that acts on the simulated articulators.
Activation of the vocal tract action knowledge base starts from activation of the phonemic representation of the speech item. When a high-frequency syllable is processed, its motor plan has already been acquired: the speech sound map activates the motor plan, the vocal tract action corresponding to each syllable produces a motor-neuron activation pattern, and neuromuscular processing drives the articulator movements that generate the speech signal through the articulatory-auditory model. When a low-frequency syllable is processed, the speech sound map activates motor plans from phonetically similar syllables to assemble the plan.
Step 2: collect the formant frequencies of the speech unit as the input to the DIVA neuro-computational speech model.
Step 3: map the input of the DIVA model into the speech sound map, initializing all phoneme units in the speech sound map to the inactive state.
Step 4: input the formant frequencies of an arbitrary speech unit and train the improved DIVA model:
if a phoneme unit with the same formant frequencies as the input speech unit already exists in the speech sound map, the simulated articulators produce the input speech unit directly through the feedforward path;
otherwise, the simulated articulators produce the input speech unit through feedback-control learning.
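Steps 2-4 can be sketched as a single training-step routine. The dictionary-based speech sound map, the `KnowledgeBase` stub, and its placeholder "learning" are all illustrative assumptions standing in for the model's actual maps:

```python
# Sketch of one training pass of the improved model (steps 2-4).
# The speech sound map is modelled as a dict from formant tuples to
# motor programs; the knowledge base is a stub. Illustrative only.

class KnowledgeBase:
    """Stand-in for the vocal tract action knowledge base; 'learning'
    here just fabricates a placeholder motor program."""
    def learn_with_feedback(self, formants):
        return {"motor_program_for": formants}

def train_step(formants, speech_map, knowledge_base):
    """Feedforward if the unit is already known, feedback learning otherwise."""
    if formants in speech_map:
        # known unit: produce it directly via the feedforward path
        return ("feedforward", speech_map[formants])
    # unseen unit: learn via perturbed feedback, then store the result
    program = knowledge_base.learn_with_feedback(formants)
    speech_map[formants] = program  # future productions become feedforward
    return ("feedback", program)

speech_map = {}
kb = KnowledgeBase()
print(train_step((300, 2200, 3000), speech_map, kb)[0])  # feedback
print(train_step((300, 2200, 3000), speech_map, kb)[0])  # feedforward
```

The second call shows the claimed benefit: once a unit has been learned through the feedback path, subsequent productions skip straight to feedforward control, reducing training iterations.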
In the described articulation method, the specific procedure by which the simulated articulators produce the input speech unit through feedback control in step 4 is as follows:
Step A: apply a perturbation to the speech unit produced by the simulated articulators; collect the auditory feedback and somatosensory feedback of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback.
Step B: map the auditory feedback of the DIVA model for the perturbed speech unit into the auditory state map.
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA model and the auditory feedback of the simulated articulators.
Step D: the articulator velocity and position map derives the training load of the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators articulate under the action of the vocal tract action knowledge base.
By adopting the above technical scheme, the present invention has the beneficial effects of reducing the number of training iterations the model needs to produce an utterance and improving articulation accuracy.
Brief description of the drawings
Fig. 1 is a block diagram of the DIVA model.
Fig. 2 is a block diagram of the vocal tract action knowledge base.
Fig. 3 is a block diagram of the improved DIVA model.
Detailed description of the invention
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
Fig. 2 shows the block diagram of the vocal tract action knowledge base model. The vocal tract action knowledge base comprises sensory-motor knowledge, speaking skills, and a mental syllabary.
The workflow of the vocal tract action knowledge base model is divided into two stages, speech production and classification/perception:
Speech production stage workflow: activation of the knowledge base model starts from activation of the phonemic representation of the speech item, and syllables are processed one at a time. When a high-frequency syllable is processed, the model has already acquired its motor plan: the motor plan is first activated by the speech sound map, and the vocal tract action corresponding to each syllable then produces a motor-neuron activation pattern. Neuromuscular processing subsequently drives the articulator movements, allowing a speech signal to be generated through the articulatory-auditory model. At the same time, the sensory modalities of the same syllable acquired earlier are activated by the speech sound map. In Fig. 3, state TS corresponds to state ES, and the current syllable is then produced. When a significant difference exists between them, auditory and somatosensory error signals are transmitted through the speech sound map and used to modify the motor plan of a new or updated syllable. For a low-frequency syllable, the speech sound map activates the plans of phonetically similar syllables through the motor-plan activation module, which then produces the motor plan.
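The frequency-dependent routing in the production stage can be sketched as a lookup with a fallback. The syllabary structure, the frequency threshold, and the shared-onset notion of "similar syllable" below are illustrative assumptions:

```python
def motor_plan(syllable, syllabary, freq_threshold=50):
    """Production routing sketch. `syllabary` maps syllable -> (usage_count,
    motor_program). High-frequency syllables retrieve their stored plan;
    low-frequency ones borrow the plan of a similar stored syllable (here,
    crudely, one sharing the same onset). Threshold is illustrative."""
    count, program = syllabary.get(syllable, (0, None))
    if program is not None and count >= freq_threshold:
        return ("stored", program)          # high-frequency: retrieve stored plan
    similar = [p for s, (c, p) in sorted(syllabary.items())
               if p is not None and s and syllable and s[0] == syllable[0]]
    if similar:
        return ("assembled", similar[0])    # low-frequency: borrow a similar plan
    return ("segmental", list(syllable))    # nothing similar: plan segment by segment

syllabary = {"ba": (120, ["b-gesture", "a-gesture"]),
             "bi": (3, ["b-gesture", "i-gesture"])}
print(motor_plan("ba", syllabary)[0])  # stored
print(motor_plan("bu", syllabary)[0])  # assembled
```

This mirrors the text above: frequent syllables go straight from the speech sound map to a ready-made motor plan, while rare ones are planned via similar stored syllables.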
Classification/perception stage workflow: speech perception starts from an external speech signal. If phoneme recognition is intended, the signal must be a high-frequency syllable. For this purpose, the signal is preprocessed in peripheral and subcortical regions, and the external auditory state is loaded into short-term memory. Its neuron activation pattern is then passed to the trained state maps, first producing a joint activation of neuron regions at the speech sound map level, and then a joint activation of specific neurons at the phonemic map level; the first represents the pronunciation of the syllable, the second its phonology. This neural pathway through the speech sound map, also called the dorsal pathway of speech perception, jointly activates a motor plan for high-frequency syllables as well. A second pathway, the ventral pathway of speech perception, links the auditory activation pattern directly to lexical processing. It is assumed that the dorsal pathway is most important during speech acquisition, whereas the ventral pathway dominates in later adult speech perception.
The improved DIVA model of the present invention is shown in Fig. 3; a vocal tract action knowledge base module and a perturbation module, both acting on the simulated articulators, have been added.
The model was initialized and trained with 200 examples of different speech-to-phoneme, sensory, and motor-plan mappings. The "knowledge" acquired from each example during the babbling and imitation stages is stored in bidirectional neural mappings between the speech sound map and the other maps. In the speech sound map, neurons represent:
(a) the realization of a vowel or vowel-consonant phoneme state;
(b) the motor-plan state;
(c) the auditory state;
(d) the somatosensory state.
The training experiments comprise a babbling stage and an imitation stage (as embodied in the DIVA model). In the babbling stage, the model associates motor-plan states with auditory states. On this basis, the model can produce motor plans in the imitation-training stage.
During the imitation-training stage, phoneme regions emerge at the speech sound map level. After these initial experiments, we proceeded to more complex model languages, including vowel (V), consonant-vowel (CV), and consonant-consonant-vowel (CCV) syllables, based on a larger consonant set. Training again showed a strict ordering of the speech sound map, related to the consonant type of the speech features, phoneme alignment features, and clustering.
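The babbling and imitation stages can be sketched as building and then querying a motor-auditory association table. The random three-parameter motor programs, the toy `produce`/`hear` stand-ins, and the nearest-neighbour lookup below are illustrative assumptions:

```python
import random

def babble(n_trials, produce, hear):
    """Babbling stage: try random motor programs and remember which auditory
    result each one gave (a motor<->auditory association). `produce` and
    `hear` stand in for the articulatory synthesizer and auditory model."""
    assoc = {}
    for _ in range(n_trials):
        motor = tuple(round(random.uniform(-1.0, 1.0), 2) for _ in range(3))
        assoc[hear(produce(motor))] = motor
    return assoc

def imitate(target, assoc):
    """Imitation stage: reuse the motor program whose remembered auditory
    outcome is nearest the target sound (nearest-neighbour lookup)."""
    nearest = min(assoc, key=lambda heard: abs(heard - target))
    return assoc[nearest]

random.seed(0)
# toy world: 'producing' sums the motor parameters, 'hearing' is the identity
assoc = babble(200, produce=sum, hear=lambda s: s)
motor = imitate(0.5, assoc)
print(motor in assoc.values())  # True
```

The association built during babbling is what makes imitation possible: a target sound is matched against remembered auditory outcomes, and the paired motor program is reused.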
To understand the workflow and articulatory effect of the improved DIVA process, we carried out the following learning experiments with the improved DIVA model:
1. a five-vowel system /i, e, a, o, u/;
2. a small consonant system (simple syllables formed from the voiced plosives /b, d, g/ and the five vowels obtained above);
3. a small model language comprising the five-vowel system, the voiced and voiceless plosives /b, d, g, p, t, k/, the nasals /m, n/, the lateral /l/, and three syllable types (V, CV, CCV);
4. the 200 most frequent syllables of English, tested against the standard of a six-year-old child.
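For experiment 3, the V, CV, and CCV inventory can be enumerated directly from the listed segment sets. The unconstrained CCV clusters below are an illustrative assumption, since the patent does not specify which consonant clusters were permitted:

```python
from itertools import product

# Segment sets of the small model language in experiment 3.
VOWELS = ["i", "e", "a", "o", "u"]
STOPS = ["b", "d", "g", "p", "t", "k"]
NASALS = ["m", "n"]
LATERAL = ["l"]
CONSONANTS = STOPS + NASALS + LATERAL  # 9 consonants

def syllable_inventory():
    """Enumerate the three syllable types (V, CV, CCV). CCV clusters are
    left unconstrained here, which is an assumption for illustration."""
    v = list(VOWELS)                                              # 5
    cv = ["".join(p) for p in product(CONSONANTS, VOWELS)]        # 9*5 = 45
    ccv = ["".join(p) for p in product(CONSONANTS, CONSONANTS, VOWELS)]  # 405
    return v + cv + ccv

inv = syllable_inventory()
print(len(inv))  # 455
```

Even this small language yields 455 candidate syllables, which shows why routing frequent syllables through stored motor plans pays off during training.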
Step 1: build the improved DIVA neuro-computational speech model by adding, to the DIVA model, a vocal tract action knowledge base acting on the simulated articulators.
Step 2: collect the formant frequencies of the speech unit as the input to the DIVA neuro-computational speech model.
Step 3: map the input of the DIVA model into the speech sound map, initializing all phoneme units in the speech sound map to the inactive state.
Step 4: input the formant frequencies of an arbitrary speech unit and train the improved DIVA model:
if a phoneme unit with the same formant frequencies as the input speech unit already exists in the speech sound map, the simulated articulators produce the input speech unit directly through the feedforward path;
otherwise, the simulated articulators produce the input speech unit through feedback-control learning.
In step 4, the specific procedure by which the simulated articulators produce the input speech unit through feedback control is as follows:
Step A: apply a perturbation to the speech unit produced by the simulated articulators; collect the auditory feedback and somatosensory feedback of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback.
Step B: map the auditory feedback of the DIVA model for the perturbed speech unit into the auditory state map.
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA model and the auditory feedback of the simulated articulators.
Step D: the articulator velocity and position map derives the training load of the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators articulate under the action of the vocal tract action knowledge base.
The purpose of mapping the perturbed speech unit into the auditory state map is to further refine the auditory state map, while the vocal tract action knowledge base is added to enrich the actions of the simulated articulators, thereby improving articulation accuracy and the learning efficiency of the whole DIVA model.
The modified model integrates sensorimotor processing and cognition. A serious problem faced by speech and sensorimotor models of spoken-language processing is that the development of the phoneme map is modeled before speech has been acquired. We improve on this by introducing a feasible solution: at the initial stage of acquisition, before explicit speech is introduced into the speech sound map, the action knowledge base is coupled directly to the mental lexicon. As a result, our modified DIVA model articulates with lower delay and higher accuracy than the original model.
Compared with the prior art, the present invention has the following significant advantages. Based on the DIVA neural network model, it describes and simulates the functions involved in articulation at the neuroanatomical and neurophysiological level. Adding a perturbation module enables the model to produce utterances more efficiently and accurately, and adding the vocal tract action knowledge base module enriches the original vocal tract configuration of the DIVA model, reducing the number of training iterations needed to produce an utterance and improving articulation accuracy. Combined with a brain-computer interface (BCI), the DIVA neural network model may ultimately be used to construct a neuro-computational model of Chinese speech generation and acquisition that conforms to the rules of Chinese phonation and has real physiological meaning, laying a theoretical and practical foundation for constructing a "thought reader" matched to the thinking characteristics of Chinese speakers.

Claims (2)

1. An articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base, characterized in that it comprises the following steps:
Step 1: build the improved DIVA neuro-computational speech model by adding, to the DIVA model, a vocal tract action knowledge base acting on the simulated articulators.
Activation of the vocal tract action knowledge base starts from activation of the phonemic representation of the speech item. When a high-frequency syllable is processed, its motor plan has already been acquired: the speech sound map activates the motor plan, the vocal tract action corresponding to each syllable produces a motor-neuron activation pattern, and neuromuscular processing drives the articulator movements that generate the speech signal through the articulatory-auditory model. When a low-frequency syllable is processed, the speech sound map activates motor plans from phonetically similar syllables.
Step 2: collect the formant frequencies of the speech unit as the input to the DIVA neuro-computational speech model.
Step 3: map the input of the DIVA model into the speech sound map, initializing all phoneme units in the speech sound map to the inactive state.
Step 4: input the formant frequencies of an arbitrary speech unit and train the improved DIVA model:
if a phoneme unit with the same formant frequencies as the input speech unit already exists in the speech sound map, the simulated articulators produce the input speech unit directly through the feedforward path;
otherwise, the simulated articulators produce the input speech unit through feedback-control learning.
2. The articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base according to claim 1, characterized in that the specific procedure by which the simulated articulators produce the input speech unit through feedback control in step 4 is as follows:
Step A: apply a perturbation to the speech unit produced by the simulated articulators; collect the auditory feedback and somatosensory feedback of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback;
Step B: map the auditory feedback of the DIVA model for the perturbed speech unit into the auditory state map;
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA model and the auditory feedback of the simulated articulators;
Step D: the articulator velocity and position map derives the training load of the simulated articulators from the somatosensory feedback command and the auditory feedback command, and the simulated articulators articulate under the action of the vocal tract action knowledge base.
CN201310274341.3A 2013-07-02 2013-07-02 Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base Expired - Fee Related CN103310272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310274341.3A CN103310272B (en) 2013-07-02 2013-07-02 Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310274341.3A CN103310272B (en) 2013-07-02 2013-07-02 Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base

Publications (2)

Publication Number Publication Date
CN103310272A CN103310272A (en) 2013-09-18
CN103310272B true CN103310272B (en) 2016-06-08

Family

ID=49135459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310274341.3A Expired - Fee Related CN103310272B (en) 2013-07-02 2013-07-02 Articulation method for a DIVA neural network model improved based on a vocal tract action knowledge base

Country Status (1)

Country Link
CN (1) CN103310272B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104605845B (en) * 2015-01-30 2017-01-25 南京邮电大学 Electroencephalogram signal processing method based on DIVA model
CN104679249B (en) * 2015-03-06 2017-07-07 南京邮电大学 A kind of Chinese brain-computer interface implementation method based on DIVA models
CN107368895A (en) * 2016-05-13 2017-11-21 扬州大学 A kind of combination machine learning and the action knowledge extraction method planned automatically

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102880906A (en) * 2012-07-10 2013-01-16 南京邮电大学 Chinese vowel pronunciation method based on DIVA nerve network model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102880906A (en) * 2012-07-10 2013-01-16 南京邮电大学 Chinese vowel pronunciation method based on DIVA nerve network model

Also Published As

Publication number Publication date
CN103310272A (en) 2013-09-18

Similar Documents

Publication Publication Date Title
Moulin-Frier et al. COSMO (“Communicating about Objects using Sensory–Motor Operations”): A Bayesian modeling framework for studying speech communication and the emergence of phonological systems
Ling et al. Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
Kröger et al. Towards a neurocomputational model of speech production and perception
CN104538024B (en) Phoneme synthesizing method, device and equipment
Kröger et al. Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception
CN103366618A (en) Scene device for Chinese learning training based on artificial intelligence and virtual reality
CN108960407A (en) Recurrent neural network language model training method, device, equipment and medium
Caponetti et al. Biologically inspired emotion recognition from speech
Murakami et al. Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing
CN103310272B (en) Based on the DIVA neural network model manner of articulation that sound channel action knowledge base is improved
Prom-on et al. Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach
Prom-on et al. Training an articulatory synthesizer with continuous acoustic data.
CN102880906B (en) Chinese vowel pronunciation method based on DIVA nerve network model
Kröger et al. Phonemic, sensory, and motor representations in an action-based neurocomputational model of speech production
Kröger et al. Emergence of an action repository as part of a biologically inspired model of speech processing: the role of somatosensory information in learning phonetic-phonological sound features
Ananthakrishnan et al. Using imitation to learn infant-adult acoustic mappings
Kröger et al. Phonetotopy within a neurocomputational model of speech production and speech acquisition
Kim et al. Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints.
Shaobai et al. Research on the mechanism for phonating stressed English syllables based on DIVA model
Lapthawan et al. Estimating underlying articulatory targets of Thai vowels by using deep learning based on generating synthetic samples from a 3D vocal tract model and data augmentation
Kröger et al. The LS Model (Lexicon-Syllabary Model)
Shitov Computational speech acquisition for articulatory synthesis
Rasilo et al. Discovering Articulatory Speech Targets from Synthesized Random Babble.
Liu Fundamental frequency modelling: An articulatory perspective with target approximation and deep learning
Ni et al. Superpositional HMM-based intonation synthesis using a functional F0 model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130918

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Denomination of invention: Articulation method of Directions Into of Articulators (DIVA) neural network model improved on basis of track action knowledge base

Granted publication date: 20160608

License type: Common License

Record date: 20161109

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Date of cancellation: 20180116

EC01 Cancellation of recordation of patent licensing contract
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160608

Termination date: 20190702

CF01 Termination of patent right due to non-payment of annual fee