CN104299611A - Chinese tone recognition method based on time frequency crest line-Hough transformation - Google Patents

Info

Publication number
CN104299611A
Authority
CN
China
Prior art keywords
time
frequency
line
chinese
tone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410509560.XA
Other languages
Chinese (zh)
Inventor
于凤芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201410509560.XA priority Critical patent/CN104299611A/en
Publication of CN104299611A publication Critical patent/CN104299611A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a Chinese tone recognition method based on the time-frequency ridge line and the Hough transform. Tone recognition is recast as classifying the trend of a line segment in a time-frequency distribution image, which yields a new method and technique for Chinese tone recognition. First, the speech signal of the final, the part of the syllable that carries the tone, is represented by its SPWVD time-frequency distribution, in which the tone information appears as a group of nearly parallel time-frequency ridge lines. Second, because the main ridge line is the region of highest energy in the image and reflects the trend of the tone, the image is binarized, thresholded and thinned to reduce the amount of computation, which yields the center line segment of the main ridge. Third, the Hough transform is applied to the image containing this center line to obtain its intercept and angle parameters. Finally, the tone type is determined from the intercept and angle of the line segment together with the coordinates of its start and end points.

Description

Chinese tone recognition method based on time-frequency ridge line and Hough transform
Technical field
The invention belongs to the technical field of tone recognition in speech synthesis and speech recognition. The speech signal of the final, which carries the Chinese tone, is represented by a time-frequency distribution, and the tone information appears as the trend of the ridge lines in the time-frequency image. The image is preprocessed by binarization, thresholding and thinning to obtain line segments that reflect the tone trend; the Hough transform is applied to these line segments, and the Chinese tone is identified from the Hough transform parameters.
Background art
Besides the non-stationarity common to all speech signals, Chinese speech has tone. Tone is one of the basic attributes of Chinese and serves to form words, distinguish meanings and improve expressiveness. In Chinese, 30% of items share the same sound and differ only in tone, so tone cannot be avoided in Chinese speech analysis, and it plays an important role in speech recognition and synthesis. Speech recognition that uses tone features achieves a higher recognition rate, and speech synthesis that takes tone into account sounds less mechanical and more natural.
Each Chinese character is a single syllable, so the syllable can serve as the basic unit of Chinese speech analysis. A Chinese syllable consists of an initial and a final, and the tone is carried by the final. Standard Chinese is a tonal language whose tones are generally divided into four classes: the first tone (high level tone), the second tone (rising tone), the third tone (falling-rising tone) and the fourth tone (falling tone). Each tone has a pitch contour of a characteristic shape, which reflects the pitch pattern of a normal syllable and has an arch-like form.
At present, tone features are extracted mainly by time-domain and frequency-domain methods. Time-domain methods extract the fundamental frequency with linear prediction, the autocorrelation function and similar tools; frequency-domain methods apply cepstral analysis to the linear prediction residual to locate the fundamental frequency precisely. Time-domain methods require little computation, but they are sensitive to noise and prone to pitch doubling and halving errors. Frequency-domain methods that combine the Hilbert-Huang transform with the cepstrum are computationally complex. Whichever method is used, the extracted fundamental frequency track never matches the true track exactly. In addition, the extracted tone features are usually classified with a support vector machine, Gaussian mixture model, neural network or similar classifier, which must be trained before it can identify the tone, so the algorithms are complex and slow.
Summary of the invention
(1) Optimal time-frequency representation of Chinese finals
Speech is a typical non-stationary signal, and time-frequency distributions are a powerful tool for analyzing non-stationary signals. The Wigner-Ville distribution (WVD) has the best time-frequency localization, but for multicomponent signals it produces cross terms that interfere with the true time-frequency distribution of the signal. The smoothed pseudo Wigner-Ville distribution (SPWVD) suppresses the cross terms of the WVD by smoothing in the time domain and applying a window function in the frequency domain, which balances time-frequency localization against cross-term suppression. The SPWVD is defined as:
$$
\mathrm{SPWVD}_z(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z\!\left(t-u+\frac{\tau}{2}\right) z^{*}\!\left(t-u-\frac{\tau}{2}\right) g(u)\,h(\tau)\,e^{-j2\pi\tau f}\,du\,d\tau \qquad (1)
$$
In equation (1), g(u) and h(τ) are two real, even window functions with g(0) = h(0) = 1.
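A discrete version of equation (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Hamming windows, the window lengths, the number of frequency bins, the FFT-based analytic signal and the function names are all assumptions made here. The time window is normalized to unit sum instead of g(0) = 1, which only rescales the result.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real signal, built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0
        gain[n // 2] = 1.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def spwvd(x, fs, g_len=31, h_len=127, n_freq=256):
    """Discrete SPWVD. Returns (tfr, freqs); tfr has shape (n_freq, len(x))."""
    z = analytic_signal(x)
    n = len(z)
    half_h = h_len // 2
    # Odd windows are centred on lag 0; lags must fit in the FFT without wrapping.
    assert g_len % 2 == 1 and h_len % 2 == 1
    assert half_h < n_freq // 2 and g_len <= n
    g = np.hamming(g_len)
    g /= g.sum()             # unit-sum time smoothing (assumption, see lead-in)
    h = np.hamming(h_len)    # centre sample equals 1, so h(0) = 1
    zp = np.pad(z, half_h)   # zero padding keeps z[n +/- m] in range
    kernel = np.zeros((n_freq, n), dtype=complex)
    for m in range(-half_h, half_h + 1):
        # z[n + m] * conj(z[n - m]): the lag tau equals 2m samples
        r = zp[half_h + m:half_h + m + n] * np.conj(zp[half_h - m:half_h - m + n])
        r = np.convolve(r, g, mode="same")       # smoothing over u with g(u)
        kernel[m % n_freq] = h[m + half_h] * r   # lag window h(tau)
    tfr = np.real(np.fft.fft(kernel, axis=0))    # Fourier transform over the lag
    # Because tau = 2m, bin k corresponds to k * fs / (2 * n_freq) Hz.
    freqs = np.arange(n_freq) * fs / (2.0 * n_freq)
    return tfr, freqs
```

Longer windows suppress cross terms more strongly but thicken the ridge, which is the trade-off the thinning step later compensates for.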
The tone of Chinese is carried by the final, that is, the tone information lies in the voiced segment of speech. The invention applies the SPWVD to the tone-bearing final so that the variation of its instantaneous frequency over time is displayed clearly on the time-frequency plane. A time-frequency ridge line in the time-frequency image traces how the instantaneous frequency changes with time and marks the region where the signal energy is most concentrated. The SPWVD ridge lines show clearly how each tone evolves over time: for the same final, different tones produce ridge lines that change differently along the time axis. The SPWVD of the four tones of the final "o" is shown in Fig. 1.
(2) Main time-frequency ridge extraction and thinning preprocessing
Because Chinese finals are voiced sounds, they exhibit harmonics, so one or several time-frequency ridge lines appear in the time-frequency image. These ridge lines follow essentially the same trend, so only one of them, the main ridge, needs to be extracted for tone recognition; this requires thresholding the SPWVD time-frequency matrix. The SPWVD suppresses cross terms by smoothing the WVD with a time window and a frequency window, which degrades its time-frequency localization: the ridge lines become thicker, and the extracted ridge has a certain width. Applying the Hough transform directly to the SPWVD would increase the computation time. The SPWVD image is therefore further preprocessed by binarization, thresholding and thinning to extract the center line of the time-frequency ridge. The center lines of the SPWVD ridge lines for the four tones of the final "o" are shown in Fig. 2.
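The preprocessing described above can be approximated in a few lines. The sketch below is a simplified stand-in for morphological thinning, not the procedure used in the patent: it thresholds the time-frequency matrix at an assumed fraction of its maximum, and in every time frame keeps only the middle pixel of the thick band around the strongest bin. The function name and the `rel_threshold` parameter are hypothetical.

```python
import numpy as np

def main_ridge_centerline(tfr, rel_threshold=0.5):
    """Boolean image with a one-pixel-wide centre line of the strongest ridge.

    tfr: 2-D array (frequency bins x time frames) of time-frequency energy.
    rel_threshold: assumed fraction of the global maximum used to binarize.
    """
    tfr = np.asarray(tfr, dtype=float)
    binary = tfr >= rel_threshold * tfr.max()   # thresholding and binarization
    center = np.zeros_like(binary)
    n_bins = tfr.shape[0]
    for col in range(tfr.shape[1]):
        peak = int(np.argmax(tfr[:, col]))      # strongest bin in this frame
        if not binary[peak, col]:
            continue                            # no ridge in this frame
        lo = hi = peak
        while lo > 0 and binary[lo - 1, col]:   # grow the band downwards
            lo -= 1
        while hi < n_bins - 1 and binary[hi + 1, col]:   # and upwards
            hi += 1
        center[(lo + hi) // 2, col] = True      # middle of the thick ridge
    return center
```

Picking the strongest bin per frame assumes the same harmonic dominates throughout the final; if the dominant harmonic changes, the center line jumps, and a true skeletonization of one connected ridge would be the safer choice.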
(3) Obtaining the parameter space of the line segment by the Hough transform
The Hough transform is applied to the extracted center line of the SPWVD time-frequency ridge to obtain the intercept and angle parameters of the line segment and the coordinates of its start and end points. The Hough transform maps each straight line in the image to a peak at the position in parameter space that corresponds to the parameters of that line; the number and positions of the peaks give the straight lines in image space and their parameters.
The basic idea of the Hough transform is point-line duality between the image space, before the transform, and the parameter space, after it. In image space, every straight line through the point (x, y) satisfies the equation:
y=px+q (2)
Wherein p is slope, and q is intercept, and above-mentioned straight-line equation also can be written as:
q=-px+y (3)
This equation represents a straight line in parameter space that passes through the point (p, q). Two points (x_1, y_1) and (x_2, y_2) on the same straight line in image space both satisfy equation (2), which in parameter space gives q = -p x_1 + y_1 and q = -p x_2 + y_2. These are two different straight lines in parameter space, but because the two points share the same slope and intercept in image space, the two lines intersect at the point (p, q) of parameter space, as shown in Fig. 3(a) and (b). Collinear points in image space therefore correspond to intersecting lines in parameter space; conversely, all lines that intersect at one point in parameter space correspond to collinear points in image space. By this point-line duality, given some edge points in image space, the Hough transform determines the straight line that connects them. It converts the line detection problem in image space into a point detection problem in parameter space, and accumulating votes at the intersection points in parameter space accomplishes both line detection and parameter estimation.
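As a worked illustration of the duality (the numbers are chosen here for convenience and do not come from the patent), take two image points on the line y = 2x + 1. Each point maps to one line in (p, q) space via equation (3), and the two lines meet exactly at the slope and intercept of the original line:

```latex
\begin{aligned}
(x_1, y_1) = (1, 3) &\;\Rightarrow\; q = -p + 3, \\
(x_2, y_2) = (2, 5) &\;\Rightarrow\; q = -2p + 5, \\
-p + 3 = -2p + 5 &\;\Rightarrow\; p = 2,\quad q = 1 .
\end{aligned}
```

Any third point on y = 2x + 1, for example (3, 7), gives q = -3p + 7, which also passes through (p, q) = (2, 1); this is why the accumulator cell at that position collects one vote per collinear point.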
When a straight line is close to the vertical direction, the values of p and q approach infinity and the amount of computation grows. To avoid this problem, the straight line can instead be expressed in polar form:
$$
\rho = x\cos\theta + y\sin\theta = \sqrt{x^{2}+y^{2}}\,\sin\!\left(\theta + \arctan\frac{x}{y}\right) \qquad (4)
$$
Here ρ is the normal distance from the origin to the straight line, and θ is the angle between this normal and the positive x-axis, as shown in Fig. 4(a). Under this equation, a point in the original image space corresponds to a sinusoidal curve in the new parameter space: moving from Cartesian to polar parameters turns the original point-line duality into a point-sinusoid duality, as shown in Fig. 4(b). Detecting a straight line in image space then amounts to detecting the intersection of sinusoids in parameter space, and the line is described by the normal distance ρ and the angle θ between the normal and the positive x-axis.
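The voting scheme based on equation (4) can be sketched as below. This is an illustrative accumulator, assuming one-degree angle steps over [-90°, 90°), integer ρ bins, x taken as the column index and y as the row index; the function names are hypothetical.

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Vote for rho = x*cos(theta) + y*sin(theta) over all foreground pixels.

    Returns (acc, rhos, thetas); acc[i, j] is the number of pixels that lie
    on the line with parameters (rhos[i], thetas[j]).
    """
    binary = np.asarray(binary, dtype=bool)
    rows, cols = binary.shape
    thetas = np.deg2rad(np.arange(-90.0, 90.0, 180.0 / n_theta))
    diag = int(np.ceil(np.hypot(rows, cols)))   # bound on |rho|
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), len(thetas)), dtype=int)
    ys, xs = np.nonzero(binary)                 # y = row index, x = column index
    for j, theta in enumerate(thetas):
        # Each pixel traces a sinusoid in (rho, theta); round rho to its bin.
        rho = np.rint(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        acc[:, j] = np.bincount(rho + diag, minlength=len(rhos))
    return acc, rhos, thetas

def strongest_line(acc, rhos, thetas):
    """(rho, theta) of the accumulator cell with the most votes."""
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return rhos[i], thetas[j]
```

With this angle range a horizontal line at row y is reported as θ = -90° and ρ = -y, which describes the same line as θ = 90° and ρ = y.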
(4) Identifying the tone from the line-segment parameters obtained by the Hough transform
The tone type is determined from the normal distance ρ, the angle θ between the normal and the positive x-axis, and the start and end coordinates of the corresponding straight line. The range of θ, supplemented by the coordinates of the two endpoints of the line segment, distinguishes the four tones. If θ is positive, or the ordinate of the end point is greater than that of the start point, the tone is the second tone. If θ is negative, or the ordinate of the end point is less than that of the start point, the tone is the fourth tone. If θ is small, close to 0, the tone is the first tone. All other cases are the third tone.
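The decision rule can be written as a small function, with several assumptions stated plainly. The text joins the angle cue and the endpoint cue with "or" and gives no numeric limit for "close to 0"; taken literally, that leaves no case for the third tone. The sketch below therefore assumes that the two cues must agree for the second and fourth tones, that disagreement falls through to the third tone, that a positive angle means rising frequency, and that 5 degrees is the flatness tolerance. None of these choices comes from the patent.

```python
def classify_tone(theta_deg, start, end, flat_tol_deg=5.0):
    """Return the tone number 1, 2, 3 or 4 for a fitted ridge segment.

    theta_deg: signed inclination of the segment in degrees, positive when
               frequency rises over time (assumed sign convention).
    start, end: (time, frequency) coordinates of the segment's end points.
    flat_tol_deg: assumed tolerance below which the segment counts as level.
    """
    rise = end[1] - start[1]          # change in the ordinate (frequency)
    if abs(theta_deg) <= flat_tol_deg:
        return 1                      # almost level: first tone
    if theta_deg > 0 and rise > 0:
        return 2                      # rising: second tone
    if theta_deg < 0 and rise < 0:
        return 4                      # falling: fourth tone
    return 3                          # all other cases: third tone
```

Because the rule is a fixed comparison rather than a trained classifier, it needs no training data, which is the advantage the background section claims over SVM, GMM and neural-network approaches.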
Brief description of the drawings
Fig. 1 shows the SPWVD time-frequency distributions of the four tones of the final "o": Fig. 1(a) is the SPWVD time-frequency image of the first tone "o", Fig. 1(b) that of the second tone "o", Fig. 1(c) that of the third tone "o", and Fig. 1(d) that of the fourth tone "o".
Fig. 2 shows the center lines of the time-frequency ridge lines extracted after thresholding and thinning the SPWVD of the four tones of the final "o".
Fig. 3 is a schematic diagram of the Hough transform from image space to the parameter space of intercept and slope.
Fig. 4 is a schematic diagram of the Hough transform from image space to the parameter space of normal distance and angle.
Fig. 5 shows the overall framework of the invention.
Embodiment
Step 1: Speech signal preprocessing and segmentation. The signal is first filtered and pre-emphasized. Endpoint detection based on the short-time average magnitude difference function, the zero-crossing rate and similar measures then removes the unvoiced segments of the speech, and segmentation locates the tone-bearing final.
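Step 1 can be illustrated with a simple frame-based detector. The sketch is a simplification made here: it uses the short-time average magnitude instead of the average magnitude difference function named in the text, and the pre-emphasis coefficient, frame length, hop size and both thresholds are assumed values, not parameters from the patent.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_features(x, frame_len=256, hop=128):
    """Short-time average magnitude and zero-crossing rate for each frame."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    mag = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len]
        mag[i] = np.mean(np.abs(frame))
        # Fraction of adjacent sample pairs whose sign differs.
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return mag, zcr

def voiced_frames(x, mag_ratio=0.3, zcr_max=0.25, frame_len=256, hop=128):
    """Boolean mask of frames that look voiced: high magnitude, low ZCR."""
    mag, zcr = frame_features(x, frame_len, hop)
    return (mag >= mag_ratio * mag.max()) & (zcr <= zcr_max)
```

Voiced finals combine high energy with a low zero-crossing rate, whereas unvoiced initials and silence tend to show the opposite pattern, so the longest run of `True` frames is a reasonable first guess for the final.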
Step 2: Compute the SPWVD time-frequency distribution of the final. The SPWVD is applied to the final signal to obtain the SPWVD time-frequency image. The time-frequency ridge lines are the regions of higher energy in the image, and the ridges of different tones change differently along the time axis. Because the final has strong harmonics, several time-frequency ridge lines can appear in the image at the same time.
Step 3: Apply binarization, thresholding and thinning to the time-frequency image to obtain the main time-frequency ridge. Binarizing and thresholding the SPWVD time-frequency image extracts one main ridge line. The ridge extracted at this point has a certain width, so it is thinned with the bwmorph function to obtain the center line of the main ridge.
Step 4: Apply the Hough transform to the time-frequency image containing the center line of the main ridge. This yields the line segment formed by the center line together with its intercept and angle parameters, that is, the Hough matrix indexed by ρ and θ. The Hough matrix is searched for entries greater than or equal to a chosen threshold, the corresponding values of ρ and θ are returned, and the start and end coordinates of the corresponding straight line are saved.
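The search and endpoint extraction in Step 4 can be sketched as follows. The helper names, the one-pixel distance tolerance and the choice to order points along the time (x) axis are assumptions made for illustration; the accumulator is assumed to be indexed as acc[rho index, theta index].

```python
import numpy as np

def peaks_above(acc, rhos, thetas, threshold):
    """All (rho, theta) pairs whose vote count is at least `threshold`."""
    return [(float(rhos[i]), float(thetas[j]))
            for i, j in np.argwhere(np.asarray(acc) >= threshold)]

def segment_endpoints(binary, rho, theta, tol=1.0):
    """Start and end points (x, y) of the pixels lying on the line (rho, theta).

    A pixel belongs to the line when |x*cos(theta) + y*sin(theta) - rho| <= tol.
    Returns None when no foreground pixel is close enough to the line.
    """
    ys, xs = np.nonzero(np.asarray(binary, dtype=bool))
    dist = np.abs(xs * np.cos(theta) + ys * np.sin(theta) - rho)
    on_line = dist <= tol
    if not on_line.any():
        return None
    xs, ys = xs[on_line], ys[on_line]
    first, last = np.argmin(xs), np.argmax(xs)   # earliest and latest frame
    return (int(xs[first]), int(ys[first])), (int(xs[last]), int(ys[last]))
```

Ordering by x suits ridge center lines, which have one pixel per time frame; for a vertical line the start and end would have to be ordered by y instead.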
Step 5: Determine the tone type from the values of ρ and θ and the start and end coordinates of the corresponding straight line. The range of the extracted θ, supplemented by the coordinates of the two endpoints of the line segment, distinguishes the four tones. If θ is positive, or the ordinate of the end point is greater than that of the start point, the tone is the second tone. If θ is negative, or the ordinate of the end point is less than that of the start point, the tone is the fourth tone. If θ is small, close to 0, the tone is the first tone. All other cases are the third tone.

Claims (5)

1. A Chinese tone recognition method based on time-frequency ridge line and Hough transform, characterized in that:
the speech signal of the final, which carries the Chinese tone, is represented by a time-frequency distribution, so that the tone information appears as the trend of the ridge lines in the time-frequency image; the image is preprocessed by binarization, thresholding and thinning to obtain line segments that reflect the tone trend; the Hough transform is applied to these line segments; and the Chinese tone is identified from the Hough transform parameters.
2. The Chinese tone recognition method based on time-frequency ridge line and Hough transform as claimed in claim 1, characterized in that: the SPWVD is applied to the tone-bearing final so that the variation of the instantaneous frequency of the final speech signal over time is displayed clearly on the time-frequency plane.
Speech is a typical non-stationary signal, and time-frequency distributions are a powerful tool for analyzing non-stationary signals. The Wigner-Ville distribution (WVD) has the best time-frequency localization, but for multicomponent signals it produces cross terms that interfere with the true time-frequency distribution of the signal. The smoothed pseudo Wigner-Ville distribution (SPWVD) suppresses the cross terms of the WVD by smoothing in the time domain and applying a window function in the frequency domain, which balances time-frequency localization against cross-term suppression.
The tone of Chinese is carried by the final, that is, the tone information lies in the voiced segment of speech. The invention applies the SPWVD to the tone-bearing final so that the variation of its instantaneous frequency over time is displayed clearly on the time-frequency plane. A time-frequency ridge line in the time-frequency image traces how the instantaneous frequency changes with time and marks the region where the signal energy is most concentrated. The SPWVD ridge lines show clearly how each tone evolves over time: for the same final, different tones produce ridge lines that change differently along the time axis.
3. The Chinese tone recognition method based on time-frequency ridge line and Hough transform as claimed in claim 1, characterized in that: the SPWVD image is further preprocessed by binarization, thresholding and thinning to extract the center line of the time-frequency ridge.
Because Chinese finals are voiced sounds, they exhibit harmonics, so one or several time-frequency ridge lines appear in the time-frequency image. These ridge lines follow essentially the same trend, so only one of them, the main ridge, needs to be extracted for tone recognition; this requires thresholding the SPWVD time-frequency matrix. The SPWVD suppresses cross terms by smoothing the WVD with a time window and a frequency window, which degrades its time-frequency localization: the ridge lines become thicker, and the extracted ridge has a certain width. Applying the Hough transform directly to the SPWVD would increase the computation time. The SPWVD image is therefore further preprocessed by binarization, thresholding and thinning to extract the center line of the time-frequency ridge.
4. The Chinese tone recognition method based on time-frequency ridge line and Hough transform as claimed in claim 1, characterized in that: the Hough transform is applied to the extracted center line of the SPWVD time-frequency ridge to obtain the parameter-space values that give the intercept and angle parameters of the line segment and the coordinates of its start and end points.
The Hough transform maps each straight line in the image to a peak at the position in parameter space that corresponds to the parameters of that line; the number and positions of the peaks give the straight lines in image space and their parameters. The basic idea of the Hough transform is point-line duality between the image space, before the transform, and the parameter space, after it. Collinear points in image space correspond to intersecting lines in parameter space; conversely, all lines that intersect at one point in parameter space correspond to collinear points in image space. By this point-line duality, given some edge points in image space, the Hough transform determines the straight line that connects them. It converts the line detection problem in image space into a point detection problem in parameter space, and accumulating votes at the intersection points in parameter space accomplishes both line detection and parameter estimation.
5. The Chinese tone recognition method based on time-frequency ridge line and Hough transform as claimed in claim 1, characterized in that: the tone is identified from the line-segment parameters obtained by the Hough transform.
The tone type is determined from the normal distance ρ, the angle θ between the normal and the positive x-axis, and the start and end coordinates of the corresponding straight line. The range of θ, supplemented by the coordinates of the two endpoints of the line segment, distinguishes the four tones. If θ is positive, or the ordinate of the end point is greater than that of the start point, the tone is the second tone. If θ is negative, or the ordinate of the end point is less than that of the start point, the tone is the fourth tone. If θ is small, close to 0, the tone is the first tone. All other cases are the third tone.
CN201410509560.XA 2014-09-28 2014-09-28 Chinese tone recognition method based on time frequency crest line-Hough transformation Pending CN104299611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410509560.XA CN104299611A (en) 2014-09-28 2014-09-28 Chinese tone recognition method based on time frequency crest line-Hough transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410509560.XA CN104299611A (en) 2014-09-28 2014-09-28 Chinese tone recognition method based on time frequency crest line-Hough transformation

Publications (1)

Publication Number Publication Date
CN104299611A true CN104299611A (en) 2015-01-21

Family

ID=52319310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410509560.XA Pending CN104299611A (en) 2014-09-28 2014-09-28 Chinese tone recognition method based on time frequency crest line-Hough transformation

Country Status (1)

Country Link
CN (1) CN104299611A (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
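徐郑丹: "Research on Chinese tone recognition based on time-frequency analysis", China Master's Theses Full-text Database, Information Science and Technology *
徐郑丹 et al.: "Chinese tone recognition based on SPWD time-frequency ridge feature extraction", Computer Applications and Software *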

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257606A (en) * 2018-01-15 2018-07-06 江南大学 A kind of robust speech personal identification method based on the combination of self-adaptive parallel model
CN111401204A (en) * 2020-03-11 2020-07-10 南京工程学院 Feature extraction method based on fractional order Hilbert cepstrum
CN111401204B (en) * 2020-03-11 2022-07-26 南京工程学院 Feature extraction method based on fractional order Hilbert cepstrum
CN113297969A (en) * 2021-05-25 2021-08-24 中国人民解放军海军航空大学航空基础学院 Radar waveform identification method and system
CN113419228A (en) * 2021-06-02 2021-09-21 中国人民解放军海军航空大学航空作战勤务学院 Sea surface small target detection method and device based on time-frequency ridge-Radon transformation

Similar Documents

Publication Publication Date Title
Dahake et al. Speaker dependent speech emotion recognition using MFCC and Support Vector Machine
CN101136199B (en) Voice data processing method and equipment
Ramamohan et al. Sinusoidal model-based analysis and classification of stressed speech
CN101226742B (en) Method for recognizing sound-groove based on affection compensation
JP6189970B2 (en) Combination of auditory attention cue and phoneme posterior probability score for sound / vowel / syllable boundary detection
CN102982811B (en) Voice endpoint detection method based on real-time decoding
Ajmera et al. Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram
Deb et al. Multiscale amplitude feature and significance of enhanced vocal tract information for emotion classification
CN103714806B (en) A kind of combination SVM and the chord recognition methods of in-dash computer P feature
US20170154640A1 (en) Method and electronic device for voice recognition based on dynamic voice model selection
Liu et al. Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofing
KR20140079092A (en) Method and Apparatus for Context Independent Gender Recognition Utilizing Phoneme Transition Probability
KR20130133858A (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
CN105047194A (en) Self-learning spectrogram feature extraction method for speech emotion recognition
CN103366735B (en) The mapping method of speech data and device
CN104299611A (en) Chinese tone recognition method based on time frequency crest line-Hough transformation
Montalvo et al. Language identification using spectrogram texture
CN104103280A (en) Dynamic time warping algorithm based voice activity detection method and device
Matoušek et al. Classification-based detection of glottal closure instants from speech signals
Stanek et al. Algorithms for vowel recognition in fluent speech based on formant positions
Yarra et al. A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection
CN103366737A (en) An apparatus and a method for using tone characteristics in automatic voice recognition
Maqsood et al. A comparative study of classifier based mispronunciation detection system for confusing
Bouzid et al. Voice source parameter measurement based on multi-scale analysis of electroglottographic signal
Wang Speech emotional classification using texture image information features

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150121