CN105788608A - Chinese initial consonant and compound vowel visualization method based on neural network


Info

Publication number
CN105788608A
Authority
CN
China
Prior art keywords
sound
neural network
voice signal
mother
wavelet neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610121430.8A
Other languages
Chinese (zh)
Other versions
CN105788608B (en)
Inventor
韩志艳
王健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bohai University
Original Assignee
Bohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bohai University filed Critical Bohai University
Priority to CN201610121430.8A
Publication of CN105788608A
Application granted
Publication of CN105788608B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a method for visualizing Chinese initials and finals (initial consonants and compound vowels) based on a neural network. The method comprises: a step of acquiring a voice signal; a step of extracting characteristic parameters of the voice signal and reducing their dimensionality by PCA; a step of designing and training a wavelet neural network, wherein the 64 binary codes output by the wavelet neural network correspond in order to the 8×8 cells of a display screen, the first 47 binary codes and their cells corresponding in turn to the 47 initials and finals arranged by pronunciation characteristics, so that when the composite feature vector of an initial or final is input to the wavelet neural network, the network outputs the position information of that initial or final; a step of dividing the 47 initials and finals into 12 groups and assigning different RGB values to the cells of each group to obtain color information; and a step of synthesizing the position information and the color information to realize the visualization of the initials and finals. The visualized images are easy for deaf-mute users to memorize, the method has good robustness and intelligibility, and deaf-mute users can accurately identify the pronunciation corresponding to a visualized image.

Description

Method for visualizing Chinese initials and finals based on a neural network
Technical field
The present invention relates to a method for visualizing Chinese initials (shengmu) and finals (yunmu), and in particular to a neural-network-based method for visualizing Chinese initials and finals.
Background art
Speech is the acoustic expression of language and the most natural, most effective and most convenient means of human communication; it is indispensable in daily life. For deaf-mute people, however, spoken communication is out of reach. Research shows that, in perceiving the outside world, humans receive information fastest and in the greatest quantity through vision. If speech could be perceived visually, it would be of enormous help in speech training for deaf-mute people and in establishing and improving their auditory cognition.
In 1947, R. K. Potter and G. A. Kopp et al. proposed the sound spectrograph, a speech visualization method, and speech researchers subsequently studied and improved it: L. C. Stewart et al. proposed a color spectrogram in 1976, G. M. Kuhn et al. proposed a real-time spectrograph system for training the hard of hearing in 1984, and P. E. Stern (1986), F. Plante (1998) and R. Steinberg (2008) et al. proposed further improvements to the spectrograph. However, the displayed spectrograms are highly specialized and hard to distinguish and memorize. The same utterance spoken by different people, or even repeated by the same person, can change the spectrogram, and robustness to speech recorded in different environments is even worse.
In addition, some scholars have realized speech visualization from the motion of the articulatory organs and changes of facial expression, which effectively dissects the human phonation process; but the speech intelligibility achieved is far from ideal, and apart from a small number of experts, people can hardly perceive speech accurately just by observing articulator motion and facial expression.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a neural-network-based method for visualizing Chinese initials and finals. The method comprises the following steps:
Step 1, voice signal acquisition: input speech data through a microphone, and obtain the corresponding voice signal after sampling and quantization by a processing unit.
Step 2, voice signal pre-processing: apply pre-emphasis, framing with windowing, and endpoint detection to the acquired voice signal.
Step 3, extraction of voice signal characteristic parameters.
Step 3.1, estimate the formant frequencies of the pre-processed speech with a method based on the Hilbert-Huang transform, obtaining the formant eigenvalues F1, F2, F3 and F4 of each frame;
Step 3.2, compute the robust voice signal feature parameters WPTC1~WPTC20 based on the wavelet packet transform.
Step 3.3, compute the robust feature parameters PMUSIC-MFCC1~PMUSIC-MFCC12 based on MUSIC and perceptual characteristics.
Step 3.4, compute the Mel-frequency cepstral coefficients MFCC1~MFCC12.
Step 4, PCA dimensionality reduction: apply principal component analysis (PCA) to the above characteristic parameters to obtain the composite voice signal feature vector.
Step 5, neural network design: adopt a three-layer wavelet neural network with 12 input neurons, 8 hidden neurons and 6 output neurons. Train the network with M composite feature vectors, with expected error P and maximum iteration count Q; stop training when the network output error falls below the expected error or the number of training iterations reaches the maximum, completing the neural network design.
Step 6, position information mapping: the output layer of the wavelet neural network has 6 neurons, all binary-coded, giving 64 distinct binary codes. The display screen is divided into 64 cells arranged in 8 rows and 8 columns, and the 64 binary codes correspond in turn, left to right and top to bottom, to the 8×8 cells. The first 47 codes and their cells correspond in turn to the 47 initials and finals arranged by pronunciation characteristics: a o e i u ü, y w, an en in un ün, j q x, b p m f, d t n l, ang eng ing ong, zh ch sh r, g k h, z c s, ai ei ui ao ou iu ie üe er. When the composite feature vector of an initial or final is input to the wavelet neural network, the network outputs the binary code of the corresponding cell; this code is the position information of that initial or final, and the corresponding cell is selected.
Step 7, color information acquisition: divide the 47 initials and finals into 12 groups by pronunciation characteristic or place of articulation, and assign different RGB values to the cells of each of the 12 groups, so that the cells of the 12 groups display different colors.
Step 8, information synthesis: synthesize the position information and the color information; when the composite feature vector of an initial or final is input, the cell corresponding to that initial or final displays its assigned color while the remaining cells display black, realizing the visualization of the initials and finals.
In the voice signal acquisition of step 1, the sampling frequency used for sampling and quantization is 11.025 kHz and the quantization precision is 16 bits.
In the pre-processing of step 2, pre-emphasis is realized with a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; framing uses a frame length of 256 samples; each frame is weighted with a Hamming window; and endpoint detection uses the short-time energy-zero product method.
In the information synthesis of step 8, the position information of the input initial or final is obtained first, and the color information is then applied to the corresponding cell, so that the cell displays its assigned color.
Beneficial effect:
1) The invention designs the position information of each initial and final according to its pronunciation characteristics, making the display easy for deaf-mute users to memorize;
2) The invention divides the 47 cells into 12 regions of different colors by pronunciation characteristic or place of articulation, fully exploiting the strong visual memory of deaf-mute users for color stimuli;
3) The invention synthesizes the position information and color information into a single image, realizing voice signal visualization; compared with the prior art it has good robustness and intelligibility and overcomes the drawback that spectrograms are hard to distinguish and memorize, so that after a period of specialized training deaf-mute users can accurately recognize the pronunciation corresponding to a visualized image and communicate with hearing people;
4) The invention uses a wavelet neural network for the position information mapping; the wavelet neural network offers a designable structure, controllable convergence precision and fast convergence, effectively improving the correct coding rate of the Chinese initials and finals.
Brief description of the drawings
Fig. 1 is the flow chart of an embodiment of the invention;
Fig. 2 is a structural diagram of the wavelet neural network of an embodiment of the invention;
Fig. 3 is a position information mapping diagram of an embodiment of the invention;
Fig. 4 is an example of the visualization of the initial p in an embodiment of the invention;
Fig. 5 is an example of the visualization of the final o in an embodiment of the invention;
Fig. 6 is an example of the visualization of the initials y, w and the finals i, u in an embodiment of the invention.
Detailed description of the invention
The specific embodiments of the invention are described in detail below with reference to the accompanying drawings. The neural-network-based method for visualizing Chinese initials and finals comprises the following steps, as shown in Fig. 1:
Step 1, voice signal acquisition: speech data are input through a microphone and sampled and quantized by a processing unit such as a computer, microcontroller or DSP chip, at a sampling frequency of 11.025 kHz with a quantization precision of 16 bits, to obtain the corresponding voice signal. In this embodiment a computer is used as the processing unit.
Step 2, voice signal pre-processing: the acquired voice signal undergoes pre-emphasis, framing with windowing, and endpoint detection. Pre-emphasis is performed with a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; in this embodiment it is 0.9375. The signal is then divided into frames of 256 samples, each frame is weighted with a Hamming window, and endpoint detection is performed with the short-time energy-zero product method.
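By way of illustration only, the following Python sketch implements this pre-processing step; the frame shift of 128 samples and the endpoint-detection thresholds are assumptions, since the description fixes only the pre-emphasis coefficient, the frame length and the window type.

```python
import numpy as np

def preprocess(signal, alpha=0.9375, frame_len=256, hop=128):
    # First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames and weight each with a Hamming window
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)
    return np.stack([emphasized[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])

def is_speech(frame, energy_thresh=1e-4, ezp_thresh=1e-6):
    # Simplified short-time energy-zero product criterion: keep a frame when
    # the product of its energy and zero-crossing rate exceeds a threshold.
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    return energy > energy_thresh and energy * zcr > ezp_thresh
```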
Step 3, extraction of voice signal characteristic parameters.
Step 3.1, estimate the formant frequencies of the pre-processed speech with a method based on the Hilbert-Huang transform, obtaining the formant eigenvalues F1, F2, F3 and F4 of each frame.
The rough formant frequencies of the voice signal are first estimated with the fast Fourier transform (FFT) and used to set the parameters of the corresponding band-pass filters. The voice signal is filtered accordingly, and the filtered signal is decomposed by empirical mode decomposition (EMD) into a family of intrinsic mode functions (IMFs). The IMF containing the formant frequency is selected by the maximum-energy principle, and its instantaneous frequency and Hilbert spectrum are computed to obtain the formant frequency parameters of the voice signal.
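A hedged sketch of this formant estimator follows, using the third-party PyEMD package (pip install EMD-signal) for the empirical mode decomposition; the band-pass bandwidth and filter order are assumptions, and the Hilbert-spectrum step is reduced to the mean instantaneous frequency of the maximum-energy IMF.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from PyEMD import EMD

def formant_from_band(frame, f_rough, fs=11025, half_bw=300.0):
    # Band-pass filter around the rough FFT-based formant estimate
    lo = max(50.0, f_rough - half_bw)
    hi = min(fs / 2 - 50.0, f_rough + half_bw)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, frame)
    # Empirical mode decomposition; keep the maximum-energy IMF
    imfs = EMD().emd(band)
    imf = imfs[np.argmax([np.sum(c ** 2) for c in imfs])]
    # Instantaneous frequency from the analytic (Hilbert) signal
    phase = np.unwrap(np.angle(hilbert(imf)))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    return float(np.mean(inst_freq))  # formant frequency estimate in Hz
```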
Step 3.2, compute the robust voice signal feature parameters WPTC1~WPTC20 based on the wavelet packet transform.
The constant-Q property of each analysis band of the wavelet packet transform matches the way the human auditory system processes speech. Combining the multi-level band division of the wavelet packet with the characteristics of the auditory perception bands, frequency bands are selected adaptively and the robust feature parameters WPTC1~WPTC20 are computed from them.
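The text does not define the WPTC coefficients exactly, so the following PyWavelets sketch assumes a common recipe: a full wavelet packet decomposition, a fixed choice of 20 subbands standing in for the adaptive auditory-band selection, and log subband energies as the coefficients. The wavelet ('db4') and decomposition level (5) are also assumptions.

```python
import numpy as np
import pywt

def wptc(frame, wavelet="db4", level=5, n_coeffs=20):
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")  # 32 subbands in frequency order
    energies = np.array([np.sum(node.data ** 2) for node in nodes])
    # Keep the 20 lowest-frequency subbands as a stand-in for the
    # perceptually motivated band selection described above.
    return np.log(energies[:n_coeffs] + 1e-12)
```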
Step 3.3, compute the robust feature parameters PMUSIC-MFCC1~PMUSIC-MFCC12 based on MUSIC and perceptual characteristics.
To improve the robustness of the visualization, the multiple signal classification (MUSIC) spectral estimation technique is adopted and perceptual characteristics are introduced into it, yielding the robust feature parameters PMUSIC-MFCC1~PMUSIC-MFCC12.
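A minimal MUSIC pseudo-spectrum sketch is given below; the correlation-matrix size and signal-subspace order are assumptions, and the perceptual (mel-style) cepstral processing that turns this spectrum into the 12 PMUSIC-MFCC coefficients, which the text does not detail, is omitted.

```python
import numpy as np
from scipy.linalg import eigh, toeplitz

def music_spectrum(frame, fs=11025, m=32, p=12, n_freqs=257):
    n = len(frame)
    # Autocorrelation matrix of order m
    r = np.correlate(frame, frame, mode="full")[n - 1:n - 1 + m] / n
    R = toeplitz(r)
    _, v = eigh(R)                 # eigenvalues in ascending order
    noise = v[:, :m - p]           # noise subspace (smallest eigenvalues)
    freqs = np.linspace(0.0, fs / 2, n_freqs)
    steering = np.exp(-2j * np.pi * np.outer(np.arange(m), freqs / fs))
    denom = np.sum(np.abs(noise.conj().T @ steering) ** 2, axis=0)
    return freqs, 1.0 / (denom + 1e-12)   # peaks mark dominant frequencies
```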
Step 3.4, compute the Mel-frequency cepstral coefficients MFCC1~MFCC12.
Each pre-processed frame undergoes a discrete Fourier transform to obtain its linear spectrum, which is passed through a Mel-frequency filter bank; taking the logarithm and applying a discrete cosine transform yields the Mel-frequency cepstral coefficients.
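A compact sketch of this MFCC recipe (DFT, mel filter bank, logarithm, DCT) follows; the FFT size (512) and the 26-filter bank are assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs=11025, n_fft=512, n_filt=26, n_ceps=12):
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2          # power spectrum
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):                          # triangular filters
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    feat = np.log(fbank @ spec + 1e-12)                     # log mel energies
    return dct(feat, type=2, norm="ortho")[:n_ceps]         # MFCC1~MFCC12
```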
Step 4, PCA dimensionality reduction: principal component analysis (PCA) reduces the 48-dimensional voice signal feature vector formed by the above parameters to a 12-dimensional composite feature vector.
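Sketched with scikit-learn, the reduction of the concatenated 48-dimensional vector (4 formants + 20 WPTC + 12 PMUSIC-MFCC + 12 MFCC) to 12 dimensions looks as follows; the random matrix is only a placeholder for real feature vectors.

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.randn(1000, 48)        # placeholder: 1000 48-dim vectors
pca = PCA(n_components=12)
composite = pca.fit_transform(features)     # 12-dim composite feature vectors
print(pca.explained_variance_ratio_.sum())  # variance retained by 12 components
```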
Step 5, neural network design: a three-layer wavelet neural network is adopted, as shown in Fig. 2, with 12 input neurons, 8 hidden neurons and 6 output neurons. The network is trained with 1000 composite feature vectors, the expected error being 0.001 and the maximum iteration count 200; training stops when the network output error falls below the expected error or the number of training iterations reaches the maximum, completing the neural network design.
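A PyTorch sketch of the 12-8-6 wavelet neural network is given below. The text fixes only the layer sizes, the expected error 0.001 and the 200-iteration cap; the Morlet-type hidden activation with learnable scales and translations, the optimizer and the loss function are assumptions, and X, Y stand in for the real training data.

```python
import torch
import torch.nn as nn

class WaveletNet(nn.Module):
    def __init__(self, n_in=12, n_hidden=8, n_out=6):
        super().__init__()
        self.w1 = nn.Linear(n_in, n_hidden)
        self.a = nn.Parameter(torch.ones(n_hidden))    # wavelet scales
        self.b = nn.Parameter(torch.zeros(n_hidden))   # wavelet translations
        self.w2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        u = (self.w1(x) - self.b) / self.a
        h = torch.cos(1.75 * u) * torch.exp(-u ** 2 / 2)  # Morlet wavelet
        return torch.sigmoid(self.w2(h))   # 6 outputs, thresholded to bits

net = WaveletNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
X = torch.randn(1000, 12)                        # composite feature vectors
Y = torch.randint(0, 2, (1000, 6)).float()       # target binary codes
for epoch in range(200):                         # Q = 200 maximum iterations
    opt.zero_grad()
    loss = loss_fn(net(X), Y)
    loss.backward()
    opt.step()
    if loss.item() < 0.001:                      # P = 0.001 expected error
        break
```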
Step 6, position information mapping: the output layer of the wavelet neural network has 6 neurons, all binary-coded, giving 64 distinct binary codes. The display screen is divided into 64 cells arranged in 8 rows and 8 columns, and the 64 binary codes correspond in turn, left to right and top to bottom, to the 8×8 cells. The first 47 codes and their cells correspond in turn to the 47 initials and finals arranged by pronunciation characteristics: a o e i u ü, y w, an en in un ün, j q x, b p m f, d t n l, ang eng ing ong, zh ch sh r, g k h, z c s, ai ei ui ao ou iu ie üe er, as shown in Fig. 3. When the composite feature vector of an initial or final is input to the wavelet neural network, the network outputs the binary code of the corresponding cell; this code is the position information of that initial or final, and the corresponding cell is selected. For example, 000000 denotes the cell in the first row, first column and corresponds to the final a; 000001 denotes the cell in the first row, second column and corresponds to the final o; and so on.
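A minimal sketch of this code-to-cell decoding, assuming the six network outputs are rounded to bits:

```python
def code_to_cell(bits):
    """Map six 0/1 output values to a (row, column) cell of the 8x8 grid."""
    idx = int("".join(str(int(round(b))) for b in bits), 2)
    return divmod(idx, 8)   # row-major: left to right, top to bottom

assert code_to_cell([0, 0, 0, 0, 0, 0]) == (0, 0)   # final 'a'
assert code_to_cell([0, 0, 0, 0, 0, 1]) == (0, 1)   # final 'o'
```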
Step 7, color information acquisition: the 47 initials and finals are divided into 12 groups by pronunciation characteristic or place of articulation, and the cells of each group are assigned different RGB values so that the 12 groups display different colors:
• Zone 1, the simple-final zone (codes 000000-000101: a o e i u ü): R=0.95, G=0.75, B=0.68, pink;
• Zone 2, the y/w zone (codes 000110-000111): R=0, G=0.95, B=0, green;
• Zone 3, the front-nasal-final zone (codes 001000-001100: an en in un ün): R=0.52, G=0.38, B=0.76, blue-violet;
• Zone 4, the dorsal zone (codes 001101-001111: j q x): R=0.25, G=0.52, B=0.18, dark green;
• Zone 5, the bilabial zone (codes 010000-010010: b p m): R=0.12, G=0.98, B=0.76, verdigris;
• Zone 6, the labiodental zone (code 010011: f): R=0, G=0, B=0.55, blue;
• Zone 7, the blade-alveolar zone (codes 010100-010111: d t n l): R=0.75, G=0, B=0.55, purple;
• Zone 8, the back-nasal-final zone (codes 011000-011011: ang eng ing ong): R=0.75, G=0, B=0, red;
• Zone 9, the blade-palatal zone (codes 011100-011111: zh ch sh r): R=0.98, G=0.96, B=0, yellow;
• Zone 10, the velar (tongue-root) zone (codes 100000-100010: g k h): R=0.87, G=0.87, B=0.79, gray-white;
• Zone 11, the dental zone (codes 100011-100101: z c s): R=0.74, G=0.42, B=0, brown;
• Zone 12, the compound-final zone (codes 100110-101110: ai ei ui ao ou iu ie üe er): R=1, G=1, B=1, white.
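The zone table above translates directly into a Python lookup; the indices are the decimal values of the 6-bit binary codes.

```python
ZONES = [
    (range(0, 6),   (0.95, 0.75, 0.68)),  # 1 simple finals, pink
    (range(6, 8),   (0.00, 0.95, 0.00)),  # 2 y/w, green
    (range(8, 13),  (0.52, 0.38, 0.76)),  # 3 front nasal finals, blue-violet
    (range(13, 16), (0.25, 0.52, 0.18)),  # 4 dorsal j/q/x, dark green
    (range(16, 19), (0.12, 0.98, 0.76)),  # 5 bilabial b/p/m, verdigris
    (range(19, 20), (0.00, 0.00, 0.55)),  # 6 labiodental f, blue
    (range(20, 24), (0.75, 0.00, 0.55)),  # 7 blade-alveolar d/t/n/l, purple
    (range(24, 28), (0.75, 0.00, 0.00)),  # 8 back nasal finals, red
    (range(28, 32), (0.98, 0.96, 0.00)),  # 9 blade-palatal zh/ch/sh/r, yellow
    (range(32, 35), (0.87, 0.87, 0.79)),  # 10 velar g/k/h, gray-white
    (range(35, 38), (0.74, 0.42, 0.00)),  # 11 dental z/c/s, brown
    (range(38, 47), (1.00, 1.00, 1.00)),  # 12 compound finals, white
]

def code_to_rgb(idx):
    for codes, rgb in ZONES:
        if idx in codes:
            return rgb
    return (0.0, 0.0, 0.0)   # codes 47-63 are unused and stay black
```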
Step 8, information synthesis: the position information and the color information are synthesized; when the composite feature vector of an initial or final is input, the cell corresponding to that initial or final displays its assigned color and the remaining cells display black, realizing the visualization. The position information of the input initial or final is obtained first, and the color information is then applied to the corresponding cell so that it displays its color. As shown in Fig. 4, the cell for the initial p is in the third row, second column; its binary code is 010001 and its color is verdigris. As shown in Fig. 5, the cell for the final o is in the first row, second column; its binary code is 000001 and its color is pink. As shown in Fig. 6, y and i, and likewise w and u, are pronounced very similarly and their spectrograms are also very similar and hard to tell apart, whereas the present invention distinguishes them easily.
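An end-to-end rendering sketch that reuses code_to_cell and code_to_rgb from the sketches above; the 32-pixel cell size and the use of matplotlib are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def render(bits, cell=32):
    row, col = code_to_cell(bits)                # position information (step 6)
    rgb = code_to_rgb(row * 8 + col)             # color information (step 7)
    img = np.zeros((8 * cell, 8 * cell, 3))      # all cells start black
    img[row * cell:(row + 1) * cell, col * cell:(col + 1) * cell] = rgb
    plt.imshow(img)
    plt.axis("off")
    plt.show()

render([0, 1, 0, 0, 0, 1])   # initial 'p': third row, second column, verdigris
```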

Claims (5)

1. A method for visualizing Chinese initials and finals based on a neural network, characterized by comprising the following steps:
Step 1, voice signal acquisition: inputting speech data through a microphone, and obtaining the corresponding voice signal after sampling and quantization by a processing unit;
Step 2, voice signal pre-processing: applying pre-emphasis, framing with windowing, and endpoint detection to the acquired voice signal;
Step 3, extracting voice signal characteristic parameters;
Step 4, PCA dimensionality reduction: applying principal component analysis (PCA) to the above characteristic parameters to obtain the composite voice signal feature vector;
Step 5, neural network design: adopting a three-layer wavelet neural network with 12 input neurons, 8 hidden neurons and 6 output neurons; training the network with M composite feature vectors, with expected error P and maximum iteration count Q; stopping training when the network output error falls below the expected error or the number of training iterations reaches the maximum, completing the neural network design;
Step 6, position information mapping: the output layer of the wavelet neural network has 6 neurons, all binary-coded, giving 64 distinct binary codes; the display screen is divided into 64 cells arranged in 8 rows and 8 columns, and the 64 binary codes correspond in turn, left to right and top to bottom, to the 8×8 cells; the first 47 codes and their cells correspond in turn to the 47 initials and finals arranged by pronunciation characteristics: a o e i u ü, y w, an en in un ün, j q x, b p m f, d t n l, ang eng ing ong, zh ch sh r, g k h, z c s, ai ei ui ao ou iu ie üe er; when the composite feature vector of an initial or final is input to the wavelet neural network, the network outputs the binary code of the corresponding cell, this code being the position information of that initial or final, and the corresponding cell is selected;
Step 7, color information acquisition: dividing the 47 initials and finals into 12 groups by pronunciation characteristic or place of articulation, and assigning different RGB values to the cells of each of the 12 groups, so that the cells of the 12 groups display different colors;
Step 8, information synthesis: synthesizing the position information and the color information, so that when the composite feature vector of an initial or final is input, the cell corresponding to that initial or final displays its assigned color while the remaining cells display black, realizing the visualization of the initials and finals.
2. The method for visualizing Chinese initials and finals based on a neural network according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1, estimating the formant frequencies of the pre-processed speech with a method based on the Hilbert-Huang transform to obtain the formant eigenvalues F1, F2, F3 and F4 of each frame;
Step 3.2, computing the robust voice signal feature parameters WPTC1~WPTC20 based on the wavelet packet transform;
Step 3.3, computing the robust feature parameters PMUSIC-MFCC1~PMUSIC-MFCC12 based on MUSIC and perceptual characteristics;
Step 3.4, computing the Mel-frequency cepstral coefficients MFCC1~MFCC12.
3. The method for visualizing Chinese initials and finals based on a neural network according to claim 1, characterized in that in step 1 the processing unit performs sampling and quantization at a sampling frequency of 11.025 kHz with a quantization precision of 16 bits.
4. The method for visualizing Chinese initials and finals based on a neural network according to claim 1, characterized in that step 2 is carried out as follows: pre-emphasis is realized with a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; framing uses a frame length of 256 samples; each frame is weighted with a Hamming window; and endpoint detection uses the short-time energy-zero product method.
5. The method for visualizing Chinese initials and finals based on a neural network according to claim 1, characterized in that in the information synthesis of step 8 the position information of the input initial or final is obtained first, and the color information is then applied to the corresponding cell, so that the cell displays its assigned color.
CN201610121430.8A 2016-03-03 2016-03-03 Method for visualizing Chinese initials and finals based on a neural network Expired - Fee Related CN105788608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610121430.8A CN105788608B (en) 2016-03-03 2016-03-03 Method for visualizing Chinese initials and finals based on a neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610121430.8A CN105788608B (en) 2016-03-03 2016-03-03 Method for visualizing Chinese initials and finals based on a neural network

Publications (2)

Publication Number Publication Date
CN105788608A 2016-07-20
CN105788608B CN105788608B (en) 2019-03-26

Family

ID=56387776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610121430.8A Expired - Fee Related CN105788608B (en) 2016-03-03 2016-03-03 Method for visualizing Chinese initials and finals based on a neural network

Country Status (1)

Country Link
CN (1) CN105788608B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6070140A (en) * 1995-06-05 2000-05-30 Tran; Bao Q. Speech recognizer
CN102231281A (en) * 2011-07-18 2011-11-02 渤海大学 Voice visualization method based on integration characteristic and neural network
CN104205062A (en) * 2012-03-26 2014-12-10 微软公司 Profile data visualization
KR20140079937A (en) * 2012-12-20 2014-06-30 엘지전자 주식회사 Mobile device for having touch sensor and method for controlling the same
CN103177733A (en) * 2013-03-11 2013-06-26 哈尔滨师范大学 Method and system for evaluating Chinese mandarin retroflex suffixation pronunciation quality
US20150235637A1 (en) * 2014-02-14 2015-08-20 Google Inc. Recognizing speech in the presence of additional audio
CN104392728A (en) * 2014-11-26 2015-03-04 东北师范大学 Colored repeated sentence spectrum construction method for speech reconstruction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
苏敏 et al.: "Segmentation of Chinese Initials and Finals Based on a Fuzzy-Rough Neural Network" (in Chinese), 《电声技术》 (Audio Engineering) *
韩志艳 et al.: "Design of a Speech Recognition Classifier Based on a Genetic Wavelet Neural Network" (in Chinese), 《计算机科学》 (Computer Science) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312208A (en) * 2020-03-09 2020-06-19 广州深声科技有限公司 Neural network vocoder system with irrelevant speakers
CN111599347A (en) * 2020-05-27 2020-08-28 广州科慧健远医疗科技有限公司 Standardized sampling method for extracting pathological voice MFCC (Mel frequency cepstrum coefficient) features for artificial intelligence analysis
CN111599347B (en) * 2020-05-27 2024-04-16 广州科慧健远医疗科技有限公司 Standardized sampling method for extracting pathological voice MFCC (Mel-frequency cepstral coefficient) features for artificial intelligence analysis
CN111899724A (en) * 2020-08-06 2020-11-06 中国人民解放军空军预警学院 Voice feature coefficient extraction method based on Hilbert-Huang transform and related equipment
CN112101462A (en) * 2020-09-16 2020-12-18 北京邮电大学 Electromechanical device audio-visual information fusion method based on BMFCC-GBFB-DNN
CN112101462B (en) * 2020-09-16 2022-04-19 北京邮电大学 Electromechanical device audio-visual information fusion method based on BMFCC-GBFB-DNN
CN112270406A (en) * 2020-11-11 2021-01-26 浙江大学 Neural information visualization method of brain-like computer operating system

Also Published As

Publication number Publication date
CN105788608B (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN105788608B (en) Method for visualizing Chinese initials and finals based on a neural network
CN104272382B (en) Personalized singing synthetic method based on template and system
CN110085245B (en) Voice definition enhancing method based on acoustic feature conversion
Lee EMG-based speech recognition using hidden Markov models with global control variables
CN110675891B (en) Voice separation method and module based on multilayer attention mechanism
CN108564965B (en) Anti-noise voice recognition system
CN101916566A (en) Electronic larynx speech reconstructing method and system thereof
CN106653056A (en) Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof
CN102426834B (en) Method for testing rhythm level of spoken English
CN107293286A (en) A kind of speech samples collection method that game is dubbed based on network
CN112992121B (en) Voice enhancement method based on attention residual error learning
CN112382308A (en) Zero-order voice conversion system and method based on deep learning and simple acoustic features
CN102176313B (en) Formant-frequency-based Mandarin simple final voice visualizing method
CN116364096B (en) Electroencephalogram signal voice decoding method based on generation countermeasure network
CN109452932A (en) A kind of Constitution Identification method and apparatus based on sound
CN107274887A (en) Speaker's Further Feature Extraction method based on fusion feature MGFCC
CN111326170B (en) Method and device for converting ear voice into normal voice by combining time-frequency domain expansion convolution
Diener et al. Improving fundamental frequency generation in emg-to-speech conversion using a quantization approach
CN110349565B (en) Auxiliary pronunciation learning method and system for hearing-impaired people
CN102820037B (en) Chinese initial and final visualization method based on combination feature
CN101894566A (en) Visualization method of Chinese mandarin complex vowels based on formant frequency
Krecichwost et al. Automated detection of sigmatism using deep learning applied to multichannel speech signal
Healy et al. Deep learning based speaker separation and dereverberation can generalize across different languages to improve intelligibility
CN117854473A (en) Zero sample speech synthesis method based on local association information
CN102231275B (en) Embedded speech synthesis method based on weighted mixed excitation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190326

Termination date: 20200303