CN101894566A - Visualization method of Chinese mandarin complex vowels based on formant frequency - Google Patents
Info
- Publication number
- CN101894566A CN101894566A CN2010102348459A CN201010234845A CN101894566A CN 101894566 A CN101894566 A CN 101894566A CN 2010102348459 A CN2010102348459 A CN 2010102348459A CN 201010234845 A CN201010234845 A CN 201010234845A CN 101894566 A CN101894566 A CN 101894566A
- Authority
- CN
- China
- Prior art keywords
- formant
- formant frequency
- voice
- complex
- complex vowels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to a visualization method for Chinese Mandarin complex vowels based on formant frequency, comprising the following steps: feature extraction, i.e. carrying out pre-filtering, framing, pre-emphasis, windowing and endpoint detection on the original complex vowel and extracting the first three formant frequencies F1, F2 and F3 of each frame; and visualization of the complex vowel, i.e. letting the abscissa represent the first formant frequency F1 and the ordinate represent a ratio between two formant frequencies, computing the values of F2/F1 and F3/F2 for each frame, and showing the points (F1, F2/F1) and (F1, F3/F2) on a coordinate graph with different icons and colors. The invention displays complex vowels intuitively as images and can accurately distinguish complex-vowel speech signals; it only requires extracting simple acoustic parameters of the speech signal, such as the short-time average energy and the first three formant frequencies, and is therefore easy to implement.
Description
Technical field
The present invention relates to a method for visualizing Chinese Mandarin complex vowels, and in particular to a complex-vowel visualization method based on formant frequencies; it belongs to the field of speech visualization.
Background art
Speech consists of meaning-bearing sounds produced by the human vocal organs and is indispensable in daily life. For the hearing-impaired, however, the lack of sufficient acoustic information often makes fluent communication very difficult. Studies show that, of the information people perceive from the outside world, the largest share comes from vision, followed by hearing, and that the combination of vision and hearing yields more information than any single sense alone. Experience also tells us that charts are among the most convenient and intuitive ways for people to express ideas and convey information, so attempts have been made to perceive speech visually, or to convey more useful information through the combination of sight and hearing. The purpose of the present invention is to explore such a speech visualization method, i.e. to display speech with visual elements so as to "perceive speech through vision", and thereby provide practical help for the hearing-impaired in perceiving speech effectively and practising correct pronunciation.
Prior to the present invention, most speech visualization methods were based on face models or on the vocal organs. Such methods describe the articulatory mouth shape qualitatively or quantitatively. Qualitative descriptions include rounded versus spread lips, degree of mouth opening, tongue height, and so on. Many current applications, such as virtual-face synthesis or automatic lip reading by machine, require objective quantitative measurement of visual speech. The international standard MPEG-4 defines the Facial Definition Parameters (FDP), the Facial Animation Parameters (FAP) and the Facial Animation Parameter Units (FAPU). The advantages of the FAP parameters have made them the international standard for facial animation: by defining the FAPU to normalize the differences between faces, the same parameters can produce similar expressions on different face models.
Speech visualization based on the motion of the vocal organs and changes of facial expression is comparatively user-friendly; it analyses the human articulation process effectively and helps the hearing-impaired practise pronunciation. However, sounds produced by internal articulators such as the soft palate and the lower jaw are difficult to show effectively through vision. Moreover, in terms of speech intelligibility it is also difficult to achieve an ideal effect: apart from a few experts, people can hardly perceive speech accurately and efficiently simply by observing the motion of the vocal organs. In addition, the visual effect is rather monotonous and not very expressive.
In addition, some scholars have studied the characteristics of human hearing, attempting to analyse the hearing mechanism of the auditory organs and to use corresponding auditory models to obtain, and then visualize, the feature information that distinguishes speech signals. However, research on human auditory characteristics is still at an early stage, and the information that can be exploited is very limited.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech visualization method that integrates different phonetic features into a single image so that the image is readable. Methods of this kind use different colors, icons and icon sizes to present speech visually as images. Compared with methods based on vocal-organ models or face models, speech visualization based on integrated phonetic features offers better readability and intelligibility. After a relatively short period of training, hearing-impaired people as well as ordinary people can intuitively recognize the visual image corresponding to a pronunciation. By reading the visual images of this invention, the diphthong complex vowels of standard Chinese can easily be distinguished from one another.
The technical scheme of the present invention is as follows:
A visualization method for Chinese Mandarin complex vowels based on formant frequency comprises the following steps:
One, feature extraction, the specific method being:
(1) pre-filter the original complex vowel to eliminate power-line interference;
(2) frame, pre-emphasize, window and perform endpoint detection on the pre-filtered complex vowel, and determine its start point and end point;
(3) extract the first three formant frequencies F1, F2 and F3 of every frame between the start point and the end point;
Two, the complex-vowel visualization step, the specific method being: let the abscissa represent the first formant frequency F1 and the ordinate represent a ratio between two formant frequencies; for each frame, compute the values of F2/F1 and F3/F2, and plot the points (F1, F2/F1) and (F1, F3/F2) on the coordinate graph with different icons or different colors, respectively.
The radius of each point on the coordinate graph increases or decreases regularly with the frame number, so that the graph intuitively reflects the direction in which the formant trajectory changes over time.
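By way of illustration, a minimal Python sketch of this plotting step is given below. It assumes the per-frame arrays F1, F2 and F3 have already been extracted; the function name and the concrete marker-size rule (taken from the d_i = 3 + i^0.6 rule of the embodiment) are illustrative choices, not values prescribed by this scheme.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_complex_vowel(F1, F2, F3):
    """Plot (F1, F2/F1) and (F1, F3/F2) for every frame; marker size grows with frame index."""
    F1, F2, F3 = map(np.asarray, (F1, F2, F3))
    i = np.arange(1, len(F1) + 1)
    size = (3 + i ** 0.6) ** 2                 # scatter sizes are areas, so square the diameter
    plt.scatter(F1, F2 / F1, s=size, c="red", marker="o", label="F2/F1")
    plt.scatter(F1, F3 / F2, s=size, c="blue", marker="D", label="F3/F2")
    plt.xlabel("F1 (Hz)")
    plt.ylabel("formant-frequency ratio")
    plt.legend()
    plt.show()
```

Because the markers grow monotonically with the frame index, the direction of the trajectory over time can be read directly from the graph.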
Beneficial effects:
(1) The present invention represents complex vowels intuitively as images. Different Chinese Mandarin complex-vowel pronunciations are distinguished by the trend of the first formant F1 over time, by the trends of F2/F1 and F3/F2 over time, and by their relative positions. The image differences between Chinese Mandarin complex vowels are obvious, so complex-vowel speech signals can be distinguished accurately. For some specific complex vowels, the sparseness of the two trajectories and the degree to which they overlap allow an even more precise distinction.
(2) The present invention only extracts simple acoustic parameters of the speech signal, such as the short-time average energy and the first three formant frequencies, and is therefore easy to implement.
Description of drawings
Fig. 1 is a block diagram of the Chinese Mandarin complex-vowel visualization system.
Fig. 2 is a flowchart for solving the formant frequencies.
Fig. 3 is an example visualization of the male-voice Mandarin complex vowel ai.
Fig. 4 is an example visualization of the female-voice Mandarin complex vowel ai.
Fig. 5 is an example visualization of the male-voice Mandarin complex vowel ao.
Fig. 6 is an example visualization of the female-voice Mandarin complex vowel ao.
Fig. 7 is an example visualization of the male-voice Mandarin complex vowel ia.
Fig. 8 is an example visualization of the female-voice Mandarin complex vowel ia.
Fig. 9 is an example visualization of the male-voice Mandarin complex vowel ve.
Fig. 10 is an example visualization of the female-voice Mandarin complex vowel ve.
Fig. 11 is an example visualization of the male-voice Mandarin complex vowel ua.
Fig. 12 is an example visualization of the female-voice Mandarin complex vowel ua.
Embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a system implementing the method of the invention, which is mainly divided into two parts: a feature extraction module and a visualization-graph generation module.
One, the feature extraction module, which implements the feature extraction step of the invention.
First, the speech signal is pre-processed by pre-filtering, framing and windowing. The short-time energy and the first three formant frequencies of every frame are then extracted directly; the formant frequencies of the last few frames of the second half of the complex vowel are discarded, after which a linear time-axis conversion and smoothing are applied.
(1) Short-time energy of the speech signal:
E_m = Σ_{n=m}^{m+N−1} x²(n)   (1)
where m is the starting point of the window and N is the window length (number of samples).
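As an illustration, the following numpy sketch frames the (already pre-filtered and pre-emphasized) signal, applies a Hamming window and computes the per-frame short-time energy; the frame length of 256 samples and shift of 128 samples are example values, not ones fixed by this description.

```python
import numpy as np

def short_time_energy(x, frame_len=256, frame_shift=128):
    """Frame the signal, apply a Hamming window, and return the energy of each frame."""
    window = np.hamming(frame_len)
    starts = range(0, len(x) - frame_len + 1, frame_shift)
    frames = np.stack([x[m:m + frame_len] * window for m in starts])
    return np.sum(frames ** 2, axis=1)         # E_m summed over each windowed frame
```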
(2) Solving the formant frequencies with the LPC technique:
As shown in Fig. 2, the LPC technique is first used to obtain the transfer function H(z) of the speech system. The roots of the polynomials of a digital filter's transfer function H(z) correspond to the poles and zeros of the system's frequency-response curve. On this basis, the transfer function H(z) of the speech signal here is taken to be all-pole, i.e. it has only a denominator polynomial:
H(z) = G / (1 − Σ_{i=1}^{M} a_i·z^(−i))   (2)
where M is the linear prediction order and a_i are the prediction coefficients.
Solving the denominator polynomial of H(z) for its roots yields complex roots that can be written as
z_i = r_i·e^(jθ_i)   (3)
where r_i is the modulus of the complex root and θ_i is its argument. Theoretical derivation shows that they are related to the formant frequencies F_i by:
F_i = θ_i / (2πT)   (4)
where T is the sampling period. For general speech analysis, M takes a value of 10-18.
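The following numpy sketch illustrates this procedure (autocorrelation-method LPC, root solving, and conversion of root arguments to frequencies). The prediction order of 12 and the 90 Hz lower bound are illustrative choices; the frame is assumed to be pre-emphasized and windowed.

```python
import numpy as np

def lpc_denominator(frame, order=12):
    """Autocorrelation-method LPC: return the all-pole denominator polynomial A(z)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])     # normal equations R·a = r
    return np.concatenate(([1.0], -a))

def formant_frequencies(frame, fs, order=12, n_formants=3):
    """Estimate the first few formant frequencies of one voiced frame."""
    roots = np.roots(lpc_denominator(frame, order))
    roots = roots[np.imag(roots) > 0]          # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi) # F_i = theta_i / (2*pi*T), with T = 1/fs
    return np.sort(freqs[freqs > 90])[:n_formants]
```

A fuller implementation would typically also screen candidate roots by bandwidth; the sketch keeps only the frequency criterion described here.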
(3) Linear time-base conversion:
For a diphthong, the decisive information for distinguishing it lies in the formant frequencies of its initial segment and of its transition segment, so the formant frequencies of some frames in the second half of the complex vowel are discarded first. Because the formant trajectories of different complex vowels have different lengths, the trajectories must be length-normalized. Here the normalized formant-trajectory length is set to 50 frames: if the number of frames after discarding is not greater than 50, no compression is performed; if the number of frames is greater than 50, the normalization coefficient is:
Coeff = original formant-trajectory length / normalized formant-trajectory length   (5)
Suppose the original formant trajectory has n nodes x_1 < x_2 < … < x_n with corresponding formant frequency values y_i (i = 1, 2, …, n), and the normalized trajectory has m nodes with corresponding formant frequency values z_i (i = 1, 2, …, m). To obtain the frequency value of a node of the normalized trajectory, that node is first mapped onto the original trajectory, giving the corresponding position
x_i = i·Coeff   (6)
Because x_i is in most cases not an integer, the frequency values of the two nearest nodes x_{i−1} and x_{i+1} are used to compute the frequency value of the normalized trajectory:
z_i = y_{i−1}·(x_{i+1} − x_i) + y_{i+1}·(x_i − x_{i−1})   (7)
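A minimal numpy sketch of this length normalization is shown below. Note that the sketch performs standard linear interpolation between the two original nodes bracketing the mapped position rather than literally applying the weighting of equation (7); the target length of 50 frames follows the text above, and the function name is illustrative.

```python
import numpy as np

def normalize_trajectory(y, target_len=50):
    """Length-normalize a formant trajectory to target_len frames (cf. equations (5)-(7))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n <= target_len:
        return y                               # short trajectories are not compressed
    coeff = n / target_len                     # equation (5)
    z = np.empty(target_len)
    for i in range(target_len):
        pos = min(i * coeff, n - 1)            # mapped position on the original axis, cf. (6)
        lo = min(int(np.floor(pos)), n - 2)
        w = pos - lo                           # fractional distance to the lower node
        z[i] = (1 - w) * y[lo] + w * y[lo + 1] # linear interpolation between neighbours
    return z
```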
(4) Median filtering and image smoothing:
Median smoothing is a sliding-window method based on histogram statistics. Its basic principle is as follows: let {x(n)} be the input signal, let {y(n)} be the output of the median filter, and let the window length be 2L+1. The output value y(n_0) at position n_0 is the median of the input samples inside the window when the window centre is moved to n_0; that is, the 2L+1 input samples x(n_0−L), x(n_0−L+1), …, x(n_0), x(n_0+1), x(n_0+2), …, x(n_0+L) are accumulated into a histogram, and the 1/2 quantile of that cumulative histogram is the median.
Median filtering can correct isolated singular points without affecting the values of the surrounding samples.
Linear smoothing performs linear filtering with a sliding window, i.e.:
y(n) = Σ_{m=−L}^{L} w(m)·x(n−m)   (8)
where {w(m), m = −L, −L+1, …, 0, 1, 2, …, L} is a (2L+1)-point smoothing window that satisfies:
Σ_{m=−L}^{L} w(m) = 1   (9)
For example, a 3-point window may take the values {0.25, 0.5, 0.25}. Linear smoothing corrects the sample values where the input signal is not smooth, but it also modifies the values of nearby samples; the two smoothing methods above can be used in combination.
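The following scipy/numpy sketch combines the two smoothing steps; the median-window width of 5 is an example value, and the 3-point window {0.25, 0.5, 0.25} is the one quoted above.

```python
import numpy as np
from scipy.signal import medfilt

def smooth_trajectory(z, median_width=5, window=(0.25, 0.5, 0.25)):
    """Median filtering to remove isolated outliers, then 3-point linear smoothing."""
    z = medfilt(np.asarray(z, dtype=float), kernel_size=median_width)
    return np.convolve(z, window, mode="same")   # y(n) = sum_m w(m)·x(n-m)
```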
Two, the visualization module:
Figs. 3 to 12 show the visualization graphs of the zero-initial diphthong finals /ai/, /ao/, /ia/, /ve/ and /ua/ of standard Chinese, each with both a male-voice and a female-voice pronunciation. The abscissa represents the first formant frequency F1 and the ordinate represents a ratio between two formant frequencies; for each frame, the values of F2/F1 and F3/F2 are computed, and the points (F1, F2/F1) and (F1, F3/F2) are plotted on the coordinate graph with different icons or different colors. In the present embodiment, red round dots and blue diamond dots are used in each graph as the icons representing F2/F1 and F3/F2, respectively. In order to reflect the temporal order of the formant trajectory, the radius of each icon changes according to the following rule:
d_i = 3 + i^0.6   (where i denotes the i-th icon and d_i is the diameter of the i-th icon)   (10)
The image reflects the trend of the first formant F1 over time and the relations among F1, F2 and F3. Different standard-Chinese diphthong finals are distinguished by whether F1 varies in decreasing order, whether F3/F2 and F2/F1 show a decreasing trend, and whether the value of F3/F2 in the graph is greater than that of F2/F1. For some specific complex vowels, it can also be seen that their F2/F1 trajectory is more sparsely distributed, or that the F3/F2 and F2/F1 trajectories overlap at the end; all of this provides additional information for distinguishing the complex vowels more accurately.
As can be seen from the figures, the trend of F1 and the trends and relative positions of F2/F1 and F3/F2 differ markedly between pronunciations, and the human eye can easily divide them into several broad classes. The F2/F1 and F3/F2 trajectories of a few pronunciations appear discontinuous, which is mainly due to formant-extraction errors in some frames.
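Although these cues are described here for a human reader of the graphs, they can also be expressed programmatically. The sketch below is purely illustrative and is not part of the claimed method; the function name and the simple monotonicity tests are assumptions.

```python
import numpy as np

def trajectory_cues(F1, F2, F3):
    """Summarize the visual cues: F1 trend, ratio-track trends, and their relative position."""
    F1, F2, F3 = map(np.asarray, (F1, F2, F3))
    r21, r32 = F2 / F1, F3 / F2
    return {
        "F1_decreasing": bool(np.all(np.diff(F1) <= 0)),
        "F2/F1_decreasing": bool(np.all(np.diff(r21) <= 0)),
        "F3/F2_decreasing": bool(np.all(np.diff(r32) <= 0)),
        "F3/F2_above_F2/F1": bool(np.all(r32 > r21)),
    }
```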
Utilize the method for the invention, the Chinese mandarin complex vowels voice signal is expressed as the coordinate diagram that can intuitively distinguish, can provide actual help for the effective perceptual speech of impaired hearing crowd, exercise orthoepy.
The above embodiment is intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to a preferred embodiment, those of ordinary skill in the art should appreciate that the specific embodiments of the present invention may still be modified, or some technical features replaced by equivalents, without departing from the spirit of the technical scheme of the present invention; all such modifications shall fall within the scope of the technical scheme claimed by the present invention.
Claims (2)
1. A visualization method for Chinese Mandarin complex vowels based on formant frequency, characterized in that it comprises the following steps:
One, feature extraction, the specific method being:
(1) pre-filtering the original complex vowel to eliminate power-line interference;
(2) framing, pre-emphasizing, windowing and performing endpoint detection on the pre-filtered complex vowel, and determining its start point and end point;
(3) extracting the first three formant frequencies F1, F2 and F3 of every frame between the start point and the end point;
Two, a complex-vowel visualization step, the specific method being: letting the abscissa represent the first formant frequency F1 and the ordinate represent a ratio between two formant frequencies; for each frame, computing the values of F2/F1 and F3/F2, and plotting the points (F1, F2/F1) and (F1, F3/F2) on a coordinate graph with different icons or different colors, respectively.
2. The visualization method for Chinese Mandarin complex vowels based on formant frequency according to claim 1, characterized in that the radius of each point on the coordinate graph increases or decreases regularly with the frame number, so that the coordinate graph intuitively reflects the direction in which the formant trajectory changes over time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102348459A CN101894566A (en) | 2010-07-23 | 2010-07-23 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101894566A true CN101894566A (en) | 2010-11-24 |
Family
ID=43103737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102348459A Pending CN101894566A (en) | 2010-07-23 | 2010-07-23 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101894566A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0473698A (en) * | 1990-07-13 | 1992-03-09 | Sony Corp | Shape control method based on audio signal |
CN101281747A (en) * | 2008-05-30 | 2008-10-08 | 苏州大学 | Method for recognizing Chinese language whispered pectoriloquy intonation based on acoustic channel parameter |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
CN102231281B (en) * | 2011-07-18 | 2012-07-18 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
CN102820037A (en) * | 2012-07-21 | 2012-12-12 | 渤海大学 | Chinese initial and final visualization method based on combination feature |
CN102820037B (en) * | 2012-07-21 | 2014-03-12 | 渤海大学 | Chinese initial and final visualization method based on combination feature |
CN103077728A (en) * | 2012-12-31 | 2013-05-01 | 上海师范大学 | Patient weak voice endpoint detection method |
CN103077728B (en) * | 2012-12-31 | 2015-08-19 | 上海师范大学 | A kind of patient's weak voice endpoint detection method |
CN107993071A (en) * | 2017-11-21 | 2018-05-04 | 平安科技(深圳)有限公司 | Electronic device, auth method and storage medium based on vocal print |
CN108962251A (en) * | 2018-06-26 | 2018-12-07 | 珠海金山网络游戏科技有限公司 | A kind of game role Chinese speech automatic identifying method |
TWI749796B (en) * | 2020-09-30 | 2021-12-11 | 瑞軒科技股份有限公司 | Resonance test system and resonance test method |
US11641462B2 (en) | 2020-09-30 | 2023-05-02 | Amtran Technology Co., Ltd. | Resonant testing system and resonant testing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20101124 |