CN101894566A - Visualization method of Chinese mandarin complex vowels based on formant frequency - Google Patents
- Publication number
- CN101894566A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to a method for visualizing Mandarin Chinese compound vowels based on formant frequencies. The method comprises a feature extraction step, in which the original compound vowel signal undergoes pre-filtering, framing, pre-emphasis, windowing and endpoint detection, and the first three formant frequencies F1, F2 and F3 of each frame are extracted; and a visualization step, in which the abscissa represents the first formant frequency F1, the ordinate represents the ratio between two formant frequencies, the values of F2/F1 and F3/F2 are calculated for each frame, and the points (F1, F2/F1) and (F1, F3/F2) are plotted on a coordinate graph with different icons and colors. The invention displays compound vowels intuitively as images and can accurately distinguish compound vowel speech signals. It needs only simple acoustic parameters of the speech signal, such as the short-time average energy and the first three formant frequencies, and is easy to implement.
Description
Technical field
The present invention relates to a method for visualizing Mandarin Chinese compound vowels, in particular a compound vowel visualization method based on formant frequencies, and belongs to the field of speech visualization.
Background art
Speech is sound with a meaning-distinguishing function produced by the human vocal organs, and it is indispensable in daily life. Hearing-impaired people, however, do not receive sufficient acoustic information, so fluent communication is often very difficult for them. Studies show that, in human perception of the outside world, vision supplies the most information and hearing comes second, and that vision and hearing combined supply more information than any single sense. Experience also shows that charts are among the most convenient and intuitive ways to express ideas and convey information. People have therefore tried to perceive speech through vision, or to convey more useful information by combining sight and hearing. The purpose of the present invention is to explore a speech visualization method, that is, to display speech with visual elements so that speech can be perceived through vision, and thereby to give hearing-impaired people practical help in perceiving speech effectively and practicing correct pronunciation.
Before the present invention, many speech visualization methods were based on face models or vocal organs. Such methods describe the mouth shape during pronunciation qualitatively or quantitatively. Qualitative descriptions include lip rounding, lip spreading, the degree of mouth opening, tongue height and so on. Many current applications, such as virtual face synthesis and automatic machine lip reading, require objective quantitative measurement of visual speech. The international standard MPEG-4 defines Facial Definition Parameters (FDP), Facial Animation Parameters (FAP) and Facial Animation Parameter Units (FAPU). The advantages of the FAP parameters have made them the international standard for facial animation, and the FAPU normalizes the differences between individual faces, so that the same parameters produce similar facial expressions on different face models.
Visualizing speech through the movement of the vocal organs and changes in facial expression is comparatively natural for users. It captures the articulation process effectively and helps hearing-impaired people practice pronunciation. However, sounds produced by internal vocal organs such as the soft palate and the lower jaw are difficult to display effectively. The resulting speech intelligibility is also far from ideal: apart from a very small number of experts, people find it hard to perceive speech accurately and efficiently by watching the movement of the vocal organs. In addition, the visual effect is rather monotonous and not very expressive.
Some researchers have also studied the characteristics of human hearing, analyzing the hearing mechanism of the auditory organs and using corresponding auditory models to obtain features that distinguish speech signals, which are then visualized. Research on human auditory characteristics is still at an early stage, however, and the information that can be exploited is very limited.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech visualization method that integrates different speech features into a single, readable image. Methods of this kind use different colors, icons and icon sizes to represent speech visually as an image. Compared with methods based on vocal organ models or face models, speech visualization based on integrated speech features offers good readability and intelligibility. After relatively short training, both hearing-impaired people and people with normal hearing can recognize the visual image of a given pronunciation intuitively. By reading the images produced by this invention, the diphthong compound vowels of Mandarin Chinese can be told apart easily.
The technical solution of the present invention is:
A speech visualization method for Mandarin Chinese compound vowels based on formant frequencies, comprising the following steps:
I. Feature extraction. The specific method is:
(1) pre-filter the original compound vowel signal to eliminate mains-frequency (power-line) interference;
(2) perform framing, pre-emphasis, windowing and endpoint detection on the pre-filtered compound vowel to determine its start and end endpoints;
(3) extract the first three formant frequencies F1, F2 and F3 of every frame between the start and end endpoints.
II. Compound vowel visualization. The specific method is: the abscissa represents the first formant frequency F1 and the ordinate represents the ratio between two formant frequencies. For each frame, the values of F2/F1 and F3/F2 are calculated, and the points (F1, F2/F1) and (F1, F3/F2) are plotted on the coordinate graph with different icons or different colors.
The radius of each point on the coordinate graph increases or decreases regularly with the frame number, so that the direction in which the formant trajectory evolves over time can be read directly from the graph.
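The mapping from per-frame formants to plotted points can be sketched as follows. This is only an illustration under stated assumptions: the function name `vowel_points` and the choice to skip frames with non-positive F1 or F2 are hypothetical and not taken from the patent.

```python
def vowel_points(formants):
    """Turn per-frame (F1, F2, F3) triples in Hz into two point tracks.

    Returns (track_f2_f1, track_f3_f2), where each track is a list of
    (x, y) pairs with x = F1 and y = F2/F1 or F3/F2 respectively.
    """
    track_f2_f1 = []
    track_f3_f2 = []
    for f1, f2, f3 in formants:
        # Assumption: frames with a non-positive F1 or F2 are extraction
        # failures and are skipped to avoid dividing by zero.
        if f1 <= 0 or f2 <= 0:
            continue
        track_f2_f1.append((f1, f2 / f1))
        track_f3_f2.append((f1, f3 / f2))
    return track_f2_f1, track_f3_f2
```

Both tracks share the same abscissa F1, so the two points of a frame always sit on the same vertical line of the graph.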
Beneficial effects:
(1) The invention represents compound vowels intuitively as images. It distinguishes different Mandarin compound vowels by the trend of the first formant F1 over time and by the trends over time and relative positions of F2/F1 and F3/F2. The images of different Mandarin compound vowels differ clearly, so compound vowel speech signals can be distinguished accurately. For certain compound vowels, the sparsity of the two trajectories and the way they overlap allow an even more accurate distinction.
(2) The invention extracts only simple acoustic parameters of the speech signal, such as the short-time average energy and the first three formant frequencies, and is easy to implement.
Description of drawings
Fig. 1 is a block diagram of the Mandarin compound vowel speech visualization system.
Fig. 2 is a flow chart of the formant frequency calculation.
Fig. 3 is an example visualization of the Mandarin compound vowel ai, male voice.
Fig. 4 is an example visualization of the Mandarin compound vowel ai, female voice.
Fig. 5 is an example visualization of the Mandarin compound vowel ao, male voice.
Fig. 6 is an example visualization of the Mandarin compound vowel ao, female voice.
Fig. 7 is an example visualization of the Mandarin compound vowel ia, male voice.
Fig. 8 is an example visualization of the Mandarin compound vowel ia, female voice.
Fig. 9 is an example visualization of the Mandarin compound vowel ve, male voice.
Fig. 10 is an example visualization of the Mandarin compound vowel ve, female voice.
Fig. 11 is an example visualization of the Mandarin compound vowel ua, male voice.
Fig. 12 is an example visualization of the Mandarin compound vowel ua, female voice.
Embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a system that implements the method of the invention. It consists of two main parts: a feature extraction module and a visualization image generation module.
I. Feature extraction module. This module implements the feature extraction step of the invention.
First, the speech signal is preprocessed by pre-filtering, framing, windowing and so on. The short-time energy and the first three formant frequencies of every frame are then extracted directly. The formant frequencies of the last several frames in the latter half of the compound vowel are discarded, and the remaining trajectory then undergoes linear time-axis normalization and smoothing.
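A minimal sketch of the framing and windowing part of this preprocessing is given below. The patent does not state the pre-emphasis coefficient, frame length, frame shift or window type, so `alpha=0.97`, `frame_len=256`, `hop=128` and the Hamming window are assumptions, as are all function names. For brevity the sketch applies pre-emphasis to the whole signal before framing and leaves out pre-filtering and endpoint detection.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(n_points):
    """Hamming window of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * k / (n_points - 1))
        for k in range(n_points)
    ]


def windowed_frames(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize, frame, and window a signal (all parameters are assumed values)."""
    window = hamming(frame_len)
    emphasized = pre_emphasize(signal, alpha)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in split_frames(emphasized, frame_len, hop)
    ]
```

Each returned frame can then be passed to the energy and formant calculations described in the following items.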
(1) Short-time energy of the speech signal:
[Equation (1) is missing from the source text.]
Here m is the starting sample of the window and N is the window length in samples.
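As an illustration, the short-time energy can be computed with the common definition, the sum of squared samples from index m to m + N - 1. This definition is an assumption, since the patent's own formula is not available here. The helper `find_endpoints` is likewise a hypothetical, simplified threshold rule and not the patent's endpoint detector.

```python
def short_time_energy(signal, start, length):
    """Sum of squared samples in the window signal[start : start + length]."""
    return sum(sample * sample for sample in signal[start:start + length])


def find_endpoints(frame_energies, threshold):
    """Return (first, last) indices of frames whose energy exceeds the threshold.

    Assumption: a single fixed threshold is enough for illustration.
    Returns None when no frame exceeds the threshold.
    """
    active = [i for i, energy in enumerate(frame_energies) if energy > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

In this sketch only the frames between the two returned indices would be kept for formant extraction.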
(2) Solving for the formant frequencies with linear predictive coding (LPC):
As shown in Fig. 2, LPC is first used to obtain the transfer function H(z) of the vocal system. The roots of the numerator and denominator polynomials of a digital filter's transfer function H(z) are the zeros and poles of the system's frequency response. Here the speech transfer function H(z) is modeled as all-pole, so it has only a denominator polynomial:
[Equation (2) is missing from the source text.]
where M is the linear prediction order.
[Equation (3) is missing from the source text.]
In equation (3), $r_i$ is the modulus of a complex root and $\theta_i$ is its argument. Theoretical derivation gives the following relation to the formant frequency $F_i$:
$$F_i = \frac{\theta_i}{2\pi T} \qquad (4)$$
where $T$ is the sampling period. For general speech analysis, M is chosen between 10 and 18.
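The conversion in equation (4) from root arguments to frequencies can be sketched as follows. The sketch assumes that the complex roots of the LPC denominator polynomial have already been found by some numerical root finder, which is not shown. The function name `poles_to_formants` and the rule of keeping only roots with an argument strictly between 0 and π (one root per conjugate pair, no real roots) are assumptions.

```python
import cmath
import math


def poles_to_formants(poles, sample_rate):
    """Map complex poles to candidate formant frequencies in Hz, ascending.

    Uses F = theta / (2 * pi * T), with T = 1 / sample_rate.
    """
    period = 1.0 / sample_rate
    frequencies = []
    for pole in poles:
        theta = cmath.phase(pole)  # argument in (-pi, pi]
        # Assumption: keep one root of each conjugate pair and drop real roots.
        if 0.0 < theta < math.pi:
            frequencies.append(theta / (2.0 * math.pi * period))
    return sorted(frequencies)
```

For a pole placed at an angle of 2π·500/8000 with an 8000 Hz sampling rate, the function returns a frequency of about 500 Hz. A practical extractor would usually also screen candidates by bandwidth, which depends on the modulus $r_i$; that step is left out here.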
(3) Linear time-axis normalization
For a diphthong, the formant frequencies of its initial segment and middle transition segment are decisive for telling it apart, so the formant frequencies of the last several frames in the latter half of the compound vowel are discarded first. Because the formant trajectories of different compound vowels differ in length, the trajectories must be normalized to a common length. Here the normalized trajectory length is 50 frames. If fewer than 50 frames remain after the trailing frames are discarded, the trajectory is not compressed. If more than 50 frames remain, the normalization coefficient is:
$$\text{Coeff} = \frac{\text{original formant trajectory length}}{\text{normalized formant trajectory length}} \qquad (5)$$
Let the $n$ nodes of the original formant trajectory be $x_1 < x_2 < \dots < x_n$, with corresponding formant frequency values $y_i$ ($i = 1, 2, \dots, n$). The normalized formant trajectory has $m$ nodes [node symbol missing from the source text], with corresponding formant frequencies $z_i$ ($i = 1, 2, \dots, m$).
To obtain the frequency value at each node of the normalized trajectory, the node is first mapped onto the original formant trajectory, which gives its corresponding position $x_i$:
[Equation (6) is missing from the source text.]
Because $x_i$ is in most cases not an integer, the frequency values at the two original nodes closest to $x_i$, denoted $x_{i-1}$ and $x_{i+1}$, are used to compute the frequency value of the normalized trajectory:
$$z_i = y_{i-1}\,(x_{i+1} - x_i) + y_{i+1}\,(x_i - x_{i-1}) \qquad (7)$$
When the two nodes are adjacent frames, the two weights sum to 1, so equation (7) is a linear interpolation between them.
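The normalization can be sketched as a linear resampling of the trajectory. The mapping from normalized node to original position used here (zero-based index multiplied by Coeff) is an assumption, because the patent's mapping formula is not available; the function name `normalize_track` is also hypothetical.

```python
import math


def normalize_track(track, target_len=50):
    """Compress a formant trajectory to target_len frames by linear interpolation.

    Trajectories with target_len frames or fewer are returned unchanged,
    matching the rule that short trajectories are not compressed.
    """
    n = len(track)
    if n <= target_len:
        return list(track)
    coeff = n / target_len  # equation (5)
    normalized = []
    for i in range(target_len):
        position = i * coeff              # assumed mapping, zero-based
        lower = math.floor(position)      # nearest original node below
        upper = min(lower + 1, n - 1)     # nearest original node above
        frac = position - lower
        # Same weighting as equation (7): the weights (1 - frac) and frac sum to 1.
        normalized.append(track[lower] * (1.0 - frac) + track[upper] * frac)
    return normalized
```

A 75-frame trajectory is thereby reduced to 50 frames, while a 40-frame trajectory is left as it is.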
(4) Median smoothing:
Median smoothing is a method based on order statistics over a sliding window. Its basic principle is as follows. Let $\{x(n)\}$ be the input signal and $\{y(n)\}$ the output of the median filter, with window length $2L+1$. The output $y(n_0)$ at $n_0$ is the median of the input samples inside the window when the window is centered at $n_0$. The median is found by taking the $2L+1$ input samples $x(n_0-L), x(n_0-L+1), \dots, x(n_0), x(n_0+1), x(n_0+2), \dots, x(n_0+L)$ and accumulating them into a cumulative histogram; its 1/2 quantile is the median.
Median filtering corrects isolated outliers without affecting the values of the surrounding samples.
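A sliding-window median filter of this kind might look like the sketch below. The treatment of the edges (the window is simply truncated where it runs past the ends of the trajectory) is an assumption, since the patent does not specify it, and the name `median_smooth` is hypothetical.

```python
import statistics


def median_smooth(values, half_width=1):
    """Median filter with window length 2 * half_width + 1.

    Assumption: near the edges the window is truncated to the samples
    that exist, so edge outputs use fewer than 2L + 1 samples.
    """
    n = len(values)
    smoothed = []
    for center in range(n):
        low = max(0, center - half_width)
        high = min(n, center + half_width + 1)
        smoothed.append(statistics.median(values[low:high]))
    return smoothed
```

With the input `[500, 510, 2000, 520, 530]` and a 3-point window, the isolated value 2000 is replaced by 520 while its neighbours 510 and 520 keep plausible values.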
Linear smoothing applies a linear filter over a sliding window, that is:
[Equation (8) is missing from the source text.]
where $\{w(m),\ m = -L, -L+1, \dots, 0, 1, 2, \dots, L\}$ is a $(2L+1)$-point smoothing window that satisfies:
[Equation (9) is missing from the source text.]
For example, a 3-point window may take the values {0.25, 0.5, 0.25}. While correcting the non-smooth sample values of the input signal, linear smoothing also modifies the values of neighboring samples. The two smoothing techniques can be used in combination.
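Linear smoothing with the example window {0.25, 0.5, 0.25} can be sketched as a weighted moving average. Repeating the first and last samples at the edges is an assumption, and the name `linear_smooth` is hypothetical; the example weights sum to 1, so a constant trajectory passes through unchanged.

```python
def linear_smooth(values, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average with a symmetric, odd-length window.

    Assumption: samples beyond either end are replaced by the nearest
    edge sample.
    """
    half = len(weights) // 2
    n = len(values)
    smoothed = []
    for center in range(n):
        total = 0.0
        for k, weight in enumerate(weights):
            index = min(max(center + k - half, 0), n - 1)
            total += weight * values[index]
        smoothed.append(total)
    return smoothed
```

Unlike the median filter, this filter spreads an isolated spike onto its neighbours: `[0.0, 4.0, 0.0]` becomes `[1.0, 2.0, 1.0]`. That is one reason to apply median smoothing first when the two are combined.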
II. Visualization module:
Figs. 3 to 12 show the visualizations of the zero-initial diphthong finals /ai/, /ao/, /ia/, /ve/ (üe) and /ua/ of Mandarin Chinese, each pronounced by a male voice and by a female voice. The abscissa represents the first formant frequency F1 and the ordinate represents the ratio between two formant frequencies. For each frame, the values of F2/F1 and F3/F2 are calculated, and the points (F1, F2/F1) and (F1, F3/F2) are plotted on the coordinate graph with different icons or different colors. In this embodiment, red dots represent F2/F1 and blue diamonds represent F3/F2 in each view. To reflect the order of the formant trajectory over time, the size of each icon changes according to the following rule:
$$d_i = 3 + i^{0.6} \qquad (10)$$
where $i$ is the index of the icon and $d_i$ is the diameter of the $i$-th icon.
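The icon sizes can be sketched as below. The sketch reads the rule as a diameter of 3 plus the icon index raised to the power 0.6; this reading of the exponent is an assumption recovered from the garbled source, and the function name `icon_diameters` is hypothetical.

```python
def icon_diameters(frame_count):
    """Diameters d_i = 3 + i ** 0.6 for icons i = 1 .. frame_count."""
    return [3 + index ** 0.6 for index in range(1, frame_count + 1)]
```

Because the diameter grows with the index, later frames are drawn with larger icons, so the direction of the trajectory over time can be read from the icon sizes alone.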
The image reflects the trend of the first formant F1 over time and the relationship among F1, F2 and F3. Different Mandarin diphthong finals are distinguished by whether F1 varies from large to small, whether F3/F2 and F2/F1 vary from large to small, and whether the values of F3/F2 lie above the values of F2/F1 in the view. For certain compound vowels it can also be seen that the F2/F1 trajectory is more sparsely distributed, or that the F3/F2 and F2/F1 trajectories overlap at their ends, which provides additional information for distinguishing compound vowels more accurately. The specific method is as follows.
As the figures show, the trend of F1 and the trends and relative positions of F2/F1 and F3/F2 differ clearly from one pronunciation to another, so the human eye can sort them into several broad classes fairly easily. For a few pronunciations the F2/F1 and F3/F2 trajectories are discontinuous, mainly because of formant extraction errors in some frames.
With the method of the invention, Mandarin compound vowel speech signals are represented as coordinate graphs that can be told apart intuitively, which gives hearing-impaired people practical help in perceiving speech effectively and practicing correct pronunciation.
The above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified, or some technical features replaced by equivalents, without departing from the spirit of the technical solution of the invention; all such modifications fall within the scope of protection claimed by the invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102348459A CN101894566A (en) | 2010-07-23 | 2010-07-23 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101894566A true CN101894566A (en) | 2010-11-24 |
Family
ID=43103737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102348459A Pending CN101894566A (en) | 2010-07-23 | 2010-07-23 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101894566A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20101124 |