CN102820037B - Chinese initial and final visualization method based on combination feature - Google Patents
Chinese initial and final visualization method based on combination feature
- Publication number
- CN102820037B (application CN201210252989.6A)
- Authority
- CN
- China
- Prior art keywords
- information
- image
- frame
- feature
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention relates to a method for visualizing Chinese initials and finals based on combined features, comprising the steps of: pre-processing the speech signal; taking the frame count of the pre-processed signal as a duration feature; representing a resonance strength feature by the ratio of the frequency-domain peak amplitude to the average amplitude; obtaining the formant feature values of each frame; computing the robust feature parameters WPTC1-WPTC20 and PMUSIC-MFCC1-PMUSIC-MFCC12; encoding image width from the duration feature and image length from the resonance strength feature; encoding the main color from the formant features; feeding the 32 feature parameters into a neural network whose output is the corresponding pattern information, the outputs corresponding in turn to the 23 initials and 24 finals; and fusing the width, length, main color and pattern information into one image displayed on a screen. The method helps deaf-mutes in speech training to establish and improve auditory perception and form a correct speech reflex, so as to recover their speech function.
Description
Technical field
The present invention relates to a method for visualizing Chinese initials and finals, and in particular to a visualization method for Chinese initials and finals based on combined features.
Background technology
Speech is the acoustic realization of language; it is the most natural, effective and convenient means of human communication, and also a support of human thought. For deaf-mutes, however, communication is difficult: many are mute because their hearing organs are damaged, so speech information never reaches the brain. Research shows that the human auditory and visual systems are two distinct, complementary information systems. The visual system is a highly parallel information-receiving and processing system: the millions of cone cells on the retina connect to the brain through nerve fibers, forming a highly parallel channel whose information rate is very high. By measurement and estimation, the information reception rate while watching television is thousands of times the rate at which the auditory system receives speech, which is why it is believed that about 70% of the information humans acquire comes through vision. For the deaf, therefore, vision is a powerful aid: the deficit of hearing can be compensated by sight, so that speech can not only be heard but also, in various other forms, be "seen".
In 1947, R.K. Potter and G.A. Kopp proposed a visualization method, the sound spectrograph. Various speech researchers subsequently studied and improved speech visualization: L.C. Stewart et al. proposed color spectrograms in 1976, G.M. Kuhn proposed a real-time spectrograph system for training the deaf in 1984, and P.E. Stern in 1986, F. Plante et al. in 1998 and R. Steinberg in 2008 also proposed many improvements to the spectrograph. However, spectrograms are highly technical and hard to distinguish and memorize. The spectrogram of the same utterance varies across different speakers, and even for the same speaker, and its robustness to speech recorded in different environments is poor.
In addition, some scholars have visualized speech through the motion of the vocal organs and changes in facial expression, effectively dissecting the human phonation process. In terms of speech intelligibility, however, this is still far from ideal: apart from a few experts, people can hardly perceive speech accurately just by observing the motion of the vocal organs and facial expressions.
Summary of the invention
The technical problem to be solved by the invention is to provide a speech visualization method based on combined features that is easy to memorize and highly robust. The method can help deaf-mutes in speech training to establish and improve auditory perception, form a correct speech reflex, rebuild the auditory speech chain, and recover their own speech function as far as possible.
Technical solution of the present invention is:
A visualization method for Chinese initials and finals based on combined features comprises the following steps:
1, speech signal pre-processing
Input the speech signal through a microphone, sample and quantize it in a processing unit to obtain the corresponding speech data, then perform pre-emphasis, framing with windowing, and endpoint detection;
2, feature extraction
(2.1) Compute the frame count of the pre-processed speech signal as its duration feature;
(2.2) Represent the resonance strength feature by the ratio of the frequency-domain peak amplitude to the average amplitude. For the framed speech signal, the resonance intensity of each frame is

I = max_{1<=k<=K} |A_k| / (λ · E[|X(ω)|])

where the complex number A_k is the coefficient of the k-th harmonic component after transformation to the frequency domain; K is the number of harmonics of the frame; X(ω) is the frequency-domain transform of the frame; E[·] denotes averaging; and λ is adjusted according to the type of speech being recognized;
(2.3) Estimate the formant features of the pre-processed speech signal by a method based on the Hilbert-Huang transform, obtaining the formant feature values F1, F2 and F3 of each frame;
(2.4) Compute the robust speech feature parameters based on the wavelet packet transform (WPTC): WPTC1~WPTC20;
(2.5) Compute the robust feature parameters based on MUSIC and perceptual characteristics (PMUSIC-MFCC): PMUSIC-MFCC1~PMUSIC-MFCC12;
3, width information coding
Encode the image width from the duration feature: according to the pixel size of the display area, convert the duration feature into the image width by a linear transformation;
4, length information coding
Encode the image length from the resonance strength feature: according to the pixel size of the display area, convert the mean per-frame resonance strength into the image length by a linear transformation;
5, main color coding
Encode the main color from the formant features: average the formant feature values F1, F2 and F3 respectively, then convert them into the main color by R = 5F1/F3, G = 3F3/(5F2), B = F2/(3F1);
6, neural network design
The neural network is a three-layer BP neural network whose input layer has 32 neurons and whose output layer has 6 neurons;
7, pattern-information coding
The 32 combined features WPTC1~WPTC20 and PMUSIC-MFCC1~PMUSIC-MFCC12 serve as the input of the neural network, and the output of the neural network is the corresponding pattern information. The output layer has 6 neurons, all binary-coded, giving 64 distinct codes, of which only the first 47 are used, corresponding in turn to the 23 initials b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, y, w and the 24 finals a, o, e, i, u, ü, ai, ei, ui, ao, ou, iu, ie, üe, er, an, en, in, un, ün, ang, eng, ing, ong;
8, image synthesis
For image synthesis, fuse the width, length, main color and pattern information into one image and display it on the screen.
During image synthesis, first determine the image size from the width and length information, then fill the image area with the main color, and finally replace the main color at the relevant positions with the pattern information to obtain the corresponding speech image.
During speech signal pre-processing, the processing unit samples and quantizes at a sampling frequency of 11.025 kHz with 16-bit quantization precision; pre-emphasis is realized by a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; framing and windowing use a frame length of 256 points, applying a Hamming window to the framed data; and endpoint detection uses the short-time energy-zero product method.
The image width = duration feature × k1, where k1 is chosen so that the displayed image is easiest for the observer to identify.
The image length = mean per-frame resonance strength × k2, where k2 is likewise chosen so that the displayed image is easiest for the observer to identify.
The pattern of an initial's image is a white texture; the pattern of a final's image is a black texture.
When the resonance strength feature is represented by the ratio of the frequency-domain peak amplitude to the average amplitude, frames of 256 points are used.
The beneficial effects of the invention are as follows:
(1) The invention combines different speech features into one image, creating a readable pattern of the speech signal for deaf-mutes. Compared with the prior art it has good robustness and comprehensibility, making up for the spectrograph's weakness of being hard to distinguish and memorize. Whether hearing-impaired or not, after a period of specialized training a person can intuitively recognize the pronunciation corresponding to a visual image and communicate with the able-bodied.
(2) The invention fully exploits deaf-mutes' visual discrimination ability and their strong visual memory for color stimuli; the image color differs across initials and finals, which greatly increases deaf-mutes' interest in learning.
(3) The invention trains and models the neural network with a dynamic training set, avoiding the excessive training load caused by blindly searching for a training set, and effectively improving the correct coding rate of the pattern information.
(4) The invention lays out the pattern information according to the articulation rules of the initials and finals, greatly reducing the memory burden on deaf-mutes.
Accompanying drawing explanation
Fig. 1 is system architecture diagram of the present invention;
Fig. 2 is main color coding block diagram;
Fig. 3 is the structural representation of neural network in Fig. 1;
Fig. 4 is a pattern-information coding schematic for the compound finals;
Fig. 5 is a pattern-information coding schematic for the front nasal finals (an en in un ün);
Fig. 6 is a pattern-information coding schematic for the back nasal finals (ang eng ing ong);
Fig. 7 is a pattern-information coding schematic for the bilabials (b p m);
Fig. 8 is a pattern-information coding schematic for the labiodental (f);
Fig. 9 is a pattern-information coding schematic for the dentals (z c s);
Fig. 10 is a pattern-information coding schematic for the alveolars (d t n l);
Fig. 11 is a pattern-information coding schematic for the retroflexes (zh ch sh r);
Fig. 12 is a pattern-information coding schematic for the palatals (j q x);
Fig. 13 is a pattern-information coding schematic for the velars (g k h);
Fig. 14 is a pattern-information coding schematic for the initials y and w;
Fig. 15 is an example of the speech visualization effect for the initial "y";
Fig. 16 is an example of the speech visualization effect for the two-phone syllable "y+u";
Fig. 17 is an example of the speech visualization effect for the three-phone syllable "y+u+an";
Fig. 18 is an example of the speech visualization effect for the initials "y, w" and the finals "i, u".
Embodiment
The technical solution of the invention is elaborated below with reference to the drawings and an embodiment:
As shown in Fig. 1, the method comprises a speech signal pre-processing module, a feature extraction module, a width information coding module, a length information coding module, a main color coding module, a neural network design module, a pattern information coding module and an image synthesis module, as follows:
One, speech signal pre-processing
The processing unit samples and quantizes at a sampling frequency of 11.025 kHz with 16-bit quantization precision to obtain the corresponding speech data, then realizes pre-emphasis with a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; this example uses 0.9375. Next the data are framed with a frame length of 256 points, a Hamming window is applied to each frame, and endpoint detection is performed with the short-time energy-zero product method. The processing unit may be a computer, a single-chip microcontroller, a DSP chip, etc.; this example uses a computer.
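The pre-processing chain above (pre-emphasis with coefficient 0.9375, 256-point frames, Hamming window) can be sketched as follows; the 80-sample frame shift is taken from the feature-extraction step below, and NumPy is assumed:

```python
import numpy as np

def preprocess(signal, alpha=0.9375, frame_len=256, frame_shift=80):
    """Pre-emphasis plus framing with a Hamming window, using the
    parameter values quoted in the text (assumes len(signal) >= frame_len)."""
    # First-order digital pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapping 256-point frames with an 80-point shift, Hamming-windowed
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    return np.stack([
        emphasized[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
```

Endpoint detection (the energy-zero product stage) is omitted here, since the text gives no parameters for it.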
Two, feature extraction
1, duration feature
Compute the frame count of the pre-processed speech signal as its duration feature, with 256 sampling points per frame and a frame shift of 80 sampling points.
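As a sketch, the duration feature is just the frame count implied by the 256-point frame length and 80-point shift:

```python
def duration_feature(num_samples, frame_len=256, frame_shift=80):
    """Number of full frames in a segment of num_samples samples,
    with 256 samples per frame and an 80-sample frame shift."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // frame_shift
```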
2, resonance strength characteristic
Represent the resonance strength feature by the ratio of the frequency-domain peak amplitude to the average amplitude.
The harmonic model of speech is widely used in speech analysis and synthesis. Its core is a sinusoidal representation of the speech signal. For the framed signal, assuming the harmonic characteristics change little within a short frame, the resonance intensity of each frame is

I = max_{1<=k<=K} |A_k| / (λ · E[|X(ω)|])

where the complex number A_k is the coefficient of the k-th harmonic component after transformation to the frequency domain; K is the number of harmonics of the frame; X(ω) is the frequency-domain transform of the frame; E[·] denotes averaging; and λ is adjusted according to the type of speech being recognized, over the range 2~8.
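A per-frame resonance intensity of this peak-to-average form might be computed as below; this is a hedged sketch, with the FFT magnitude spectrum standing in for the harmonic coefficients |A_k|:

```python
import numpy as np

def resonance_intensity(frame, lam=4.0):
    """Ratio of the frequency-domain peak amplitude to the average
    amplitude, scaled by the tuning parameter lam (2 <= lam <= 8,
    adjusted per speech type). The FFT magnitude spectrum stands in
    for the harmonic coefficients |A_k| (an assumption)."""
    spectrum = np.abs(np.fft.rfft(frame))   # |X(w)| for this frame
    return spectrum.max() / (lam * spectrum.mean())
```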
3, formant feature
Estimate the formant frequencies of the pre-processed speech by a method based on the Hilbert-Huang transform, obtaining the formant feature values F1, F2 and F3 of each frame.
Specifically, the parameters of band-pass filters are determined from the formant frequencies pre-estimated by the fast Fourier transform (FFT); the speech signal is filtered with these parameters; the filtered signal is decomposed by empirical mode decomposition (EMD) into a family of intrinsic mode functions (IMFs); the IMF containing the formant frequency is selected by the maximum-energy principle; and the instantaneous frequency and Hilbert spectrum of that IMF yield the formant frequency parameters of the speech signal.
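The last step of this chain, the instantaneous frequency of the selected IMF via the Hilbert transform, can be sketched with SciPy; the EMD stage is omitted, so the input is assumed to be a single oscillatory component:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(imf, fs):
    """Instantaneous frequency (Hz) of one intrinsic mode function via
    the analytic signal; the formant frequency is then read off where
    this IMF carries the most energy."""
    analytic = hilbert(imf)                        # imf + j * H{imf}
    phase = np.unwrap(np.angle(analytic))          # instantaneous phase
    return np.diff(phase) * fs / (2.0 * np.pi)     # phase derivative -> Hz
```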
4, calculate WPTC parameter
Exploiting the property that the constant Q (quality factor) of each analysis band in the wavelet packet transform matches the way the human auditory system processes signals, combine the multi-level division of the frequency band by wavelet packets with adaptive band selection according to the auditory perceptual bands, and compute the robust speech feature parameters based on the wavelet packet transform (WPTC): WPTC1~WPTC20.
5, calculate PMUSIC-MFCC parameter
To improve the robustness of the visualization, adopt the Multiple Signal Classification (MUSIC) spectral estimation method and introduce perceptual characteristics into it, computing the robust feature parameters based on MUSIC and perceptual characteristics (PMUSIC-MFCC): PMUSIC-MFCC1~PMUSIC-MFCC12.
Three, width information coding
Encode the image width from the duration feature: according to the pixel size of the display area, convert the duration feature into the image width by the linear transformation image width = duration feature × k1, where k1 is chosen so that the displayed image is easiest for the observer to identify. This example uses a display area of 300 × 300 pixels; for instance, the duration of the initial y maps to a width of 15 pixels after the linear operation.
Four, length information coding
Encode the image length from the resonance strength feature: according to the pixel size of the display area, convert the mean per-frame resonance strength into the image length by the linear transformation image length = mean per-frame resonance strength × k2, where k2 is chosen so that the displayed image is easiest for the observer to identify; this example takes k2 = 180. For instance, the resonance strength of the initial y maps to a length of 150 pixels after the linear operation.
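The two linear codings can be sketched together; k2 = 180 is the value given in the text, while the width scale is not stated (only the 15-pixel example for y), so the k1 default here is illustrative:

```python
def encode_width(duration_frames, k1=0.1):
    """Image width in pixels = duration feature * k1 (k1 illustrative)."""
    return round(duration_frames * k1)

def encode_length(mean_resonance_strength, k2=180):
    """Image length in pixels = mean resonance strength * k2 (text: k2 = 180)."""
    return round(mean_resonance_strength * k2)
```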
Five, main color coding
As shown in Fig. 2, map the formant features to the main color: average the formant feature values F1, F2 and F3 respectively, then convert them into the main color by R = 5F1/F3, G = 3F3/(5F2), B = F2/(3F1). The coefficients 5, 3/5 and 1/3 were verified experimentally to give good color discrimination, the goal being that most pronunciations differ in color, which helps deaf-mutes recognize and memorize them. The main color is obtained by assigning these RGB values to the relevant screen positions; an RGB amplitude of all 1s gives white and all 0s gives black, the primaries combining by the additive color rule.
For example, the three mean formant values of the initial "b" are F1 = 538.97 Hz, F2 = 1059.73 Hz and F3 = 2841.58 Hz, giving R = 0.9484, G = 1.6089 and B = 0.6554, so the resulting main color of the image is light yellow.
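The main-color mapping is direct to code, with the initial "b" example above as a check; how values above 1 are handled on screen is not stated (clipping would be one assumption):

```python
def main_color(f1, f2, f3):
    """R = 5*F1/F3, G = (3/5)*F3/F2, B = (1/3)*F2/F1, as given in the text."""
    return 5 * f1 / f3, 3 * f3 / (5 * f2), f2 / (3 * f1)
```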
Six, neural network design
As shown in Fig. 3, the neural network is a three-layer BP neural network whose input layer has 32 neurons and whose output layer has 6 neurons. The network is trained and modeled with a dynamic training set: recognition starts from an existing (small) sample set; whenever an actual utterance is misrecognized, that speech sample is added to the corresponding training set, while correctly recognized samples are discarded, so the training set grows ever richer under the actual operating conditions. When the error probability falls below a certain level, the neural network model for those practical conditions is obtained.
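The dynamic-training-set loop might be sketched as below. The fit/predict interface and leaving the stopping rule to the caller are assumptions; the text specifies only that misrecognized utterances are added to the training set, correct ones discarded, and training stops once the error probability is small enough.

```python
def dynamic_training(model, seed_X, seed_y, labelled_stream):
    """Grow the training set from live utterances: retrain on every
    misrecognized sample, discard correctly recognized ones. 'model'
    is any classifier with fit/predict (a stand-in for the 3-layer BP
    network of the text)."""
    X, y = list(seed_X), list(seed_y)
    model.fit(X, y)
    for features, true_label in labelled_stream:
        if model.predict([features])[0] != true_label:
            X.append(features)          # keep only the failures
            y.append(true_label)
            model.fit(X, y)             # retrain with the enlarged set
    return model, X, y
```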
Seven, pattern-information coding
As shown in Figs. 4-14, the 32 combined features WPTC1~WPTC20 and PMUSIC-MFCC1~PMUSIC-MFCC12 serve as the input of the neural network, whose output is the corresponding pattern information. The output layer has 6 neurons, all binary-coded, giving 64 distinct codes, of which only the first 47 are used, corresponding in turn to the 23 initials and 24 finals. For example, 000000 represents the initial b, 000001 the initial p, and so on. The pattern of each initial's image is a white texture and that of each final's image is a black texture; different texture patterns are shown by changing the saturation of the RGB primaries at the relevant positions, while a, o, e, i, u and ü have no pattern. Sounds sharing a pattern in the figures are very similar in pronunciation: the first three, b, p and m, are bilabials, whose patterns all have white texture stripes at top and bottom; likewise d, t, n and l are alveolars, whose patterns sit in the middle position.
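The code assignment can be written out directly; the orderings are the ones listed in the claims (000000 → b, 000001 → p, …):

```python
INITIALS = ['b', 'p', 'm', 'f', 'd', 't', 'n', 'l', 'g', 'k', 'h', 'j',
            'q', 'x', 'zh', 'ch', 'sh', 'r', 'z', 'c', 's', 'y', 'w']
FINALS = ['a', 'o', 'e', 'i', 'u', 'ü', 'ai', 'ei', 'ui', 'ao', 'ou', 'iu',
          'ie', 'üe', 'er', 'an', 'en', 'in', 'un', 'ün', 'ang', 'eng',
          'ing', 'ong']

def code_for(phoneme):
    """Six-bit output code for an initial or final: the first 47 of the
    64 possible codes, initials first, in listed order."""
    return format((INITIALS + FINALS).index(phoneme), '06b')
```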
Eight, image synthesis
For image synthesis, fuse the width, length, main color and pattern information into one image and display it on the screen.
Specifically, first determine the image size from the width and length information, then fill the image area with the main color, and finally replace the main color at the relevant positions with the pattern information to obtain the corresponding speech image.
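A minimal NumPy sketch of the synthesis step; the row-band pattern placement is an assumption, since the text only says the pattern replaces the main colour at the relevant positions:

```python
import numpy as np

def synthesize_image(width, length, main_rgb, pattern=None):
    """Fill a length x width RGB image with the (clipped) main colour,
    then overwrite the pattern positions with the pattern colour."""
    img = np.empty((length, width, 3), dtype=float)
    img[:, :] = np.clip(main_rgb, 0.0, 1.0)   # main colour everywhere
    if pattern is not None:
        rows, pattern_rgb = pattern            # e.g. (slice(0, 10), (1, 1, 1))
        img[rows, :] = pattern_rgb             # white texture for an initial
    return img
```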
The entire process above is carried out by the computer.
Examples of image synthesis:
1. As shown in Fig. 15, the main color of the image of the initial y is blue, with a white texture pattern at the top.
2. As shown in Fig. 16, in the two-phone syllable yu the initial and final are spelled together, the initial first and the final after; the vowel u at the end becomes both heavy and long.
3. As shown in Fig. 17, the three-phone syllable yuan has, in Mandarin, a sound between the initial and the final called the medial; the vowel u serves as the medial and is weakened, becoming short and light, while the vowel an at the end becomes both heavy and long.
4. As shown in Fig. 18, although y sounds very similar to i and w to u, they are pronounced differently: the initials y and w are short and light, while the finals i and u are long and heavy, so the two are easy to tell apart in the images, whereas their spectrograms are very similar and hard to distinguish.
Claims (7)
1. A method for visualizing Chinese initials and finals based on combined features, characterized by comprising:
1.1, speech signal pre-processing
Input the speech signal through a microphone, sample and quantize it in a processing unit to obtain the corresponding speech data, then perform pre-emphasis, framing with windowing, and endpoint detection;
1.2, feature extraction
(a) Compute the frame count of the pre-processed speech signal as its duration feature;
(b) Represent the resonance strength feature by the ratio of the frequency-domain peak amplitude to the average amplitude; for the framed speech signal, the resonance intensity of each frame is

I = max_{1<=k<=K} |A_k| / (λ · E[|X(ω)|])

where the complex number A_k is the coefficient of the k-th harmonic component after transformation to the frequency domain, K is the number of harmonics of the frame, X(ω) is the frequency-domain transform of the frame, E[·] denotes averaging, and λ is adjusted according to the type of speech being recognized;
(c) Estimate the formant features of the pre-processed speech signal by a method based on the Hilbert-Huang transform, obtaining the formant feature values F1, F2 and F3 of each frame;
(d) Compute the robust speech feature parameters based on the wavelet packet transform, WPTC: WPTC1~WPTC20;
(e) Compute the robust feature parameters based on MUSIC and perceptual characteristics, PMUSIC-MFCC: PMUSIC-MFCC1~PMUSIC-MFCC12;
1.3, width information coding
Encode the image width from the duration feature: according to the pixel size of the display area, convert the duration feature into the image width by a linear transformation;
1.4, length information coding
Encode the image length from the resonance strength feature: according to the pixel size of the display area, convert the mean per-frame resonance strength into the image length by a linear transformation;
1.5, main color coding
Encode the main color from the formant features: average the formant feature values F1, F2 and F3 respectively, then convert them into the main color by R = 5F1/F3, G = 3F3/(5F2), B = F2/(3F1);
1.6, neural network design
The neural network is a three-layer BP neural network whose input layer has 32 neurons and whose output layer has 6 neurons;
1.7, pattern information coding
The 32 combined features WPTC1~WPTC20 and PMUSIC-MFCC1~PMUSIC-MFCC12 serve as the input of the neural network, whose output is the corresponding pattern information; the output layer has 6 neurons, all binary-coded, giving 64 distinct codes, of which only the first 47 are used, corresponding in turn to the 23 initials b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, y, w and the 24 finals a, o, e, i, u, ü, ai, ei, ui, ao, ou, iu, ie, üe, er, an, en, in, un, ün, ang, eng, ing, ong;
1.8, image synthesis
For image synthesis, fuse the width, length, main color and pattern information into one image and display it on the screen.
2. The method for visualizing Chinese initials and finals based on combined features according to claim 1, characterized in that: during image synthesis, the image size is first determined from the width and length information, then the image area is filled with the main color, and finally the main color at the relevant positions is replaced with the pattern information to obtain the corresponding speech image.
3. The method for visualizing Chinese initials and finals based on combined features according to claim 1, characterized in that: during speech signal pre-processing, the processing unit samples and quantizes at a sampling frequency of 11.025 kHz with 16-bit quantization precision; pre-emphasis is realized by a first-order digital pre-emphasis filter whose coefficient lies between 0.93 and 0.97; framing and windowing use a frame length of 256 points, applying a Hamming window to the framed data; and endpoint detection uses the short-time energy-zero product method.
4. The method for visualizing Chinese initials and finals based on combined features according to claim 1 or 2, characterized in that: the image width = duration feature × k1, where k1 is chosen so that the displayed image is easiest for the observer to identify.
5. The method for visualizing Chinese initials and finals based on combined features according to claim 1 or 2, characterized in that: the image length = mean per-frame resonance strength × k2, where k2 is chosen so that the displayed image is easiest for the observer to identify.
6. The method for visualizing Chinese initials and finals based on combined features according to claim 1, characterized in that: the pattern of an initial's image is a white texture and the pattern of a final's image is a black texture.
7. The method for visualizing Chinese initials and finals based on combined features according to claim 1, characterized in that: when the resonance strength feature is represented by the ratio of the frequency-domain peak amplitude to the average amplitude, frames of 256 points are used.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210252989.6A CN102820037B (en) | 2012-07-21 | 2012-07-21 | Chinese initial and final visualization method based on combination feature |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210252989.6A CN102820037B (en) | 2012-07-21 | 2012-07-21 | Chinese initial and final visualization method based on combination feature |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102820037A CN102820037A (en) | 2012-12-12 |
CN102820037B true CN102820037B (en) | 2014-03-12 |
Family
ID=47304122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210252989.6A Expired - Fee Related CN102820037B (en) | 2012-07-21 | 2012-07-21 | Chinese initial and final visualization method based on combination feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102820037B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105825847B (en) * | 2016-03-16 | 2019-08-16 | 北京语言大学 | The parameter synthesis method and perception scope measurement method, device of front and back nasal sound simple or compound vowel of a Chinese syllable |
CN106024010B (en) * | 2016-05-19 | 2019-08-20 | 渤海大学 | A kind of voice signal dynamic feature extraction method based on formant curve |
CN111009234B (en) * | 2019-12-25 | 2023-06-02 | 上海锦晟电子科技有限公司 | Voice conversion method, device and equipment |
CN111613240B (en) * | 2020-05-22 | 2023-06-27 | 杭州电子科技大学 | Camouflage voice detection method based on attention mechanism and Bi-LSTM |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101894566A (en) * | 2010-07-23 | 2010-11-24 | 北京理工大学 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
CN102176313A (en) * | 2009-10-10 | 2011-09-07 | 北京理工大学 | Formant-frequency-based Mandarin single final vioce visualizing method |
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1166345A (en) * | 1997-08-11 | 1999-03-09 | Sega Enterp Ltd | Image acoustic processor and recording medium |
US7624019B2 (en) * | 2005-10-17 | 2009-11-24 | Microsoft Corporation | Raising the visibility of a voice-activated user interface |
- 2012-07-21: CN201210252989.6A filed, granted as CN102820037B (Expired - Fee Related)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102176313A (en) * | 2009-10-10 | 2011-09-07 | 北京理工大学 | Formant-frequency-based Mandarin single final vioce visualizing method |
CN101894566A (en) * | 2010-07-23 | 2010-11-24 | 北京理工大学 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
Non-Patent Citations (3)
Title |
---|
JP特开平11-66345A 1999.03.09 |
Han Zhiyan et al., "Optimization of feature parameters for speech recognition based on orthogonal experimental design", Computer Science, Vol. 37, No. 1, Jan. 2010, pp. 214-216, 250 *
Also Published As
Publication number | Publication date |
---|---|
CN102820037A (en) | 2012-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105976809B (en) | Identification method and system based on speech and facial expression bimodal emotion fusion | |
CN102231281B (en) | Voice visualization method based on integration characteristic and neural network | |
CN105023573B (en) | It is detected using speech syllable/vowel/phone boundary of auditory attention clue | |
CN101916566B (en) | Electronic larynx speech reconstructing method and system thereof | |
US20200178883A1 (en) | Method and system for articulation evaluation by fusing acoustic features and articulatory movement features | |
CN110619301A (en) | Emotion automatic identification method based on bimodal signals | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
CN102820037B (en) | Chinese initial and final visualization method based on combination feature | |
CN105788608B (en) | Chinese phonetic mother method for visualizing neural network based | |
CN105448291A (en) | Parkinsonism detection method and detection system based on voice | |
CN111326178A (en) | Multi-mode speech emotion recognition system and method based on convolutional neural network | |
WO2022048404A1 (en) | End-to-end virtual object animation generation method and apparatus, storage medium, and terminal | |
CN114863905A (en) | Voice category acquisition method and device, electronic equipment and storage medium | |
CN101894566A (en) | Visualization method of Chinese mandarin complex vowels based on formant frequency | |
CN113349801A (en) | Imaginary speech electroencephalogram signal decoding method based on convolutional neural network | |
CN116434759B (en) | Speaker identification method based on SRS-CL network | |
CN114626424B (en) | Data enhancement-based silent speech recognition method and device | |
CN108831472B (en) | Artificial intelligent sounding system and sounding method based on lip language recognition | |
CN111009262A (en) | Voice gender identification method and system | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN115472182A (en) | Attention feature fusion-based voice emotion recognition method and device of multi-channel self-encoder | |
CN114550701A (en) | Deep neural network-based Chinese electronic larynx voice conversion device and method | |
JP4381404B2 (en) | Speech synthesis system, speech synthesis method, speech synthesis program | |
CN115410061B (en) | Image-text emotion analysis system based on natural language processing | |
CN116172580B (en) | Auditory attention object decoding method suitable for multi-sound source scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20140312 Termination date: 20140721 |
EXPY | Termination of patent right or utility model |