Device for automatically segmenting a single-word speech signal
Technical field
The utility model relates to a device for segmenting a single-word speech signal, and in particular to a device that segments a single-word speech signal automatically.
Background art
Language learners often rely on learning aids such as electronic dictionaries to improve learning results and speed up learning. Most electronic dictionaries now offer a pronunciation function: after looking up a word or an example sentence, the user can play back its correct pronunciation. This can markedly improve the user's listening and speaking ability, so a growing number of manufacturers pay close attention to the pronunciation function of their electronic dictionaries.
Recently, electronic dictionaries have been advertised as offering real human pronunciation, and this has become a feature that every manufacturer pursues. Real human pronunciation can be achieved by having a person record the sound wave of each word. Recording every word in this way, however, consumes a very large amount of the electronic dictionary's memory and therefore raises cost.
Pronunciation synthesis was therefore developed to approximate real human pronunciation; it saves memory while also improving pronunciation quality. Pronunciation synthesis generally takes one of two forms, described below using English words as an example.
In the first approach, syllables are determined from the phonetic symbols in the word list of an English dictionary. Before the speech data of an English word is synthesized, the word must first be divided into one or more syllables; the sound waves corresponding to those syllables are then extracted from the original recordings and combined.
In the second approach, the sound wave of every syllable, covering all combinations of initials, finals, and tones, is recorded and stored in memory. Before the speech data of an English word is synthesized, the word must first be divided into one or more syllables; the sound wave corresponding to each syllable is then retrieved from the recorded data and combined.
As the above shows, whichever synthesis approach is used, the English word must first be divided into one or more syllables before any further processing can take place. Traditionally, this segmentation is done manually by ear, which requires a large amount of labor and working time. Moreover, manual syllable segmentation is tedious and the volume of work is huge, and segmentation by ear is highly prone to error.
How to solve the problems arising from the traditional manual segmentation of single-word speech is therefore an issue in urgent need of a solution.
Summary of the utility model
The utility model addresses the technical problems described in the background art above and proposes a device for automatically segmenting a single-word speech signal.
The technical solution of the utility model is a device for automatically segmenting a single-word speech signal, characterized in that the device comprises: a receiving unit, which receives the single-word speech signal and divides it into a plurality of frames; an analysis unit, which analyzes the frames and produces a speech feature corresponding to each frame; and a segmentation unit, which segments the single-word speech signal into syllables according to the speech features. The receiving unit is connected to the analysis unit, and the analysis unit is connected to the segmentation unit.
The speech feature comprises the average amplitude of the frame.
The speech feature comprises the average zero-crossing rate of the frame.
The speech feature comprises the cepstrum parameters of the frame.
The analysis unit produces a threshold according to the speech feature, and the segmentation unit compares the speech feature with the threshold.
The device provided by the utility model segments a single-word speech signal into syllables automatically on the basis of speech features. It replaces the traditional manual segmentation of single-word speech entirely: the whole process requires no manual intervention, saves time and labor, is highly efficient, and greatly reduces the error rate caused by human factors.
Description of the drawings
Fig. 1 is a schematic diagram of the device for automatically segmenting a single-word speech signal;
Fig. 2 is a schematic diagram of a multi-syllable single-word speech signal;
Fig. 3 is a schematic diagram of the segmentation of a multi-syllable single-word speech signal.
Reference numerals: 10, receiving unit; 20, analysis unit; 30, segmentation unit.
Embodiment
Referring to Fig. 1, the device for automatically segmenting a single-word speech signal comprises a receiving unit 10, an analysis unit 20, and a segmentation unit 30.
The receiving unit 10 receives the single-word speech signal and divides it into a plurality of frames. The analysis unit 20 analyzes the frames and produces a speech feature corresponding to each frame. The segmentation unit 30 then segments the single-word speech signal into syllables according to the speech features produced by the analysis unit 20.
The speech signals of different words are not identical, but they share some common characteristics. For example, the pronunciation of a multi-syllable word is made up of its individual syllables; the composition of syllables follows specific patterns in the speech signal; and speech features can be used to segment syllables. The utility model therefore proposes first dividing the single-word speech signal into a plurality of frames and then, taking each frame as a unit, using the analysis unit 20 to analyze the speech feature of each frame.
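The framing step described above can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: the function name `split_into_frames`, the fixed frame length, and the hop size are all assumptions, since the text does not specify how frames are sized or whether they overlap.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a sampled signal into fixed-length frames.

    frame_len and hop are hypothetical parameters; a hop smaller than
    frame_len gives overlapping frames. A trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames


# Usage: ten samples, frames of four samples, advancing two samples at a time.
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

Overlapping frames are a common choice in speech processing because they smooth the per-frame features, but non-overlapping frames (hop equal to frame length) fit the description equally well.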
The speech features mentioned above include the average amplitude, the average zero-crossing rate, and the cepstrum parameters. Each of these features is briefly explained below.
The amplitude of a speech signal indicates the magnitude of the signal. Just as a person's voice rises and falls while speaking, the waveform of a speech signal shows high and low, or strong and weak, portions. Since amplitude represents the magnitude of the speech signal, the average amplitude is obtained by summing the amplitudes of all frames and then averaging, which shows how the strength of a given section of the signal per unit time compares with that of the whole speech signal.
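One plausible reading of the average amplitude, sketched here under stated assumptions, is the mean absolute sample value within a frame, which can then be compared with the same quantity averaged over the whole signal. The text does not give an exact formula, so the function names `average_amplitude` and `relative_strength` are illustrative only.

```python
from typing import List, Sequence


def average_amplitude(frame: Sequence[float]) -> float:
    """Mean absolute sample value of one frame (assumed definition)."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(abs(x) for x in frame) / len(frame)


def relative_strength(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame average amplitude divided by the mean over all frames.

    Values well below 1.0 mark frames that are quiet relative to the word
    as a whole. Assumes at least one frame has non-zero amplitude.
    """
    per_frame = [average_amplitude(f) for f in frames]
    overall = sum(per_frame) / len(per_frame)
    return [a / overall for a in per_frame]


# Usage: a loud frame followed by a quiet frame.
loud = [0.5, -0.5, 0.25, -0.25]
quiet = [0.125, -0.125, 0.125, -0.125]
ratios = relative_strength([loud, quiet])
```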
The average zero-crossing rate of a speech signal is the number of times the signal waveform crosses the horizontal (zero) axis per unit time. In other words, the number of transitions of the signal amplitude between positive and negative values per unit time is called the zero-crossing rate. When the signal is divided into frames and the zero-crossing rates of all frames are averaged, the result is called the average zero-crossing rate.
Put simply, the zero-crossing rate is the number of times the speech signal passes through zero per unit time. The zero-crossing rate is widely used, especially in speech recognition. Sections with a high zero-crossing rate correspond to unvoiced sounds or silent regions, where noise is relatively strong, whereas sections with a lower zero-crossing rate correspond to voiced sounds. The zero-crossing rate can therefore be used to distinguish unvoiced from voiced sounds, and sound from silence, in a speech signal.
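The zero-crossing rate of a single frame can be computed by counting sign changes between consecutive samples. The sketch below normalizes the count by the number of sample pairs, so the result lies between 0 and 1; both that normalization and the choice to treat a sample of exactly zero as non-negative are assumptions, not details taken from the text.

```python
from typing import Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of consecutive sample pairs whose signs differ.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


# Usage: a rapidly alternating frame versus a slowly varying one.
noisy_like = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])   # every pair crosses zero
voiced_like = zero_crossing_rate([1.0, 2.0, 3.0, 4.0])    # no crossings
```

Multiplying the result by the sampling rate would convert it to crossings per second if a per-unit-time figure is needed.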
The cepstrum parameters are introduced next. In signal recognition, the most commonly used characteristic parameter is the energy of the signal on the frequency spectrum. For example, a high-frequency signal has large energy only in the high-frequency part, whereas a low-frequency signal has larger energy in the low-frequency part; these energy values on the spectrum can serve as feature values. The Fourier transform is used to convert a signal on the time axis into its frequency spectrum. For speech signals, however, another parameter, called the cepstrum, represents the characteristics of the signal better and improves discrimination. Using cepstrum parameters can therefore improve the discrimination of single-word speech signals.
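The text does not say which cepstrum variant is used. As an assumption, the sketch below computes the real cepstrum of one frame, defined as the inverse Fourier transform of the logarithm of the magnitude spectrum. It uses a naive O(n²) discrete Fourier transform from the standard library so that the example is self-contained; a practical implementation would use an FFT routine instead. The names `dft` and `real_cepstrum` and the `floor` parameter are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (or its inverse), O(n^2)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame: Sequence[float], floor: float = 1e-10) -> List[float]:
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    floor guards against log(0) for spectral bins with no energy.
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Usage: an impulse of height e has a flat spectrum of magnitude e,
# so its log magnitude is 1 in every bin.
cepstrum = real_cepstrum([math.e, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

In practice only the first few cepstral coefficients of each frame are usually kept as the feature vector.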
The utility model therefore analyzes the speech features, and the segmentation unit 30 then automatically segments the single-word speech signal into one or more syllables according to those features. The analysis unit 20 can produce a threshold from the speech features, and the threshold is used to decide whether a frame is a syllable cut point: when the speech feature of a frame of the single-word speech signal is below the threshold, that frame is taken as a syllable cut point. Thus, after the analysis unit 20 produces the threshold, the segmentation unit 30 compares the speech features with the threshold and divides the single-word speech signal into one or more syllables.
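The threshold comparison described above can be sketched as follows. How the threshold is derived from the speech features is not specified in the text; here it is assumed, purely for illustration, to be a fixed fraction of the mean feature value. The names `derive_threshold`, `find_cut_frames`, and the `ratio` parameter are hypothetical.

```python
from typing import List, Sequence


def derive_threshold(features: Sequence[float], ratio: float = 0.5) -> float:
    """Threshold as a fraction of the mean feature value (assumed rule)."""
    if not features:
        raise ValueError("features must not be empty")
    return ratio * sum(features) / len(features)


def find_cut_frames(features: Sequence[float], threshold: float) -> List[int]:
    """Indices of frames whose feature value falls below the threshold."""
    return [i for i, value in enumerate(features) if value < threshold]


# Usage: per-frame feature values (for example average amplitudes)
# with two clear dips between louder stretches.
features = [0.8, 0.9, 0.1, 0.7, 0.8, 0.05, 0.9]
threshold = derive_threshold(features)
cut_frames = find_cut_frames(features, threshold)
```

With these values the dips at frame indices 2 and 5 fall below the threshold and are reported as candidate cut points.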
As an illustration, refer to Fig. 2, which uses the word dagoba as an example. Dagoba has three syllables, and Fig. 2 clearly shows distinct speech features separating the syllables.
Referring to Fig. 3, this embodiment combines the average amplitude and the average zero-crossing rate as the speech features, although the utility model is not limited to this combination. The segmentation unit 30 compares the average amplitude and the average zero-crossing rate with the threshold; where both the average amplitude and the average zero-crossing rate are below the threshold, the frame is taken as a syllable cut point. As Fig. 3 clearly shows, the single-word speech signal of dagoba is thereby segmented into three syllables according to its speech features.
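The combined use of average amplitude and average zero-crossing rate can be sketched as below. A frame is treated as a cut point only when both features are below their thresholds, and consecutive frames that are not cut points are grouped into one syllable. The use of a separate threshold for each feature, the function name `segment_syllables`, and the sample values are assumptions made for illustration; they are not measurements taken from Fig. 3.

```python
from typing import List, Sequence, Tuple


def segment_syllables(
    amplitudes: Sequence[float],
    zero_crossing_rates: Sequence[float],
    amp_threshold: float,
    zcr_threshold: float,
) -> List[Tuple[int, int]]:
    """Group frames into syllables, returned as (start, end) frame ranges.

    end is exclusive. A frame is a cut point when both its average
    amplitude and its zero-crossing rate are below their thresholds.
    """
    if len(amplitudes) != len(zero_crossing_rates):
        raise ValueError("feature sequences must have the same length")
    segments = []
    start = None
    for i, (amp, zcr) in enumerate(zip(amplitudes, zero_crossing_rates)):
        is_cut = amp < amp_threshold and zcr < zcr_threshold
        if is_cut:
            if start is not None:
                segments.append((start, i))  # close the current syllable
                start = None
        elif start is None:
            start = i  # open a new syllable
    if start is not None:
        segments.append((start, len(amplitudes)))
    return segments


# Usage: made-up per-frame features for a three-syllable word.
amps = [0.02, 0.6, 0.7, 0.03, 0.5, 0.6, 0.02, 0.7, 0.8, 0.01]
zcrs = [0.05, 0.2, 0.3, 0.04, 0.25, 0.2, 0.05, 0.3, 0.2, 0.03]
syllables = segment_syllables(amps, zcrs, amp_threshold=0.1, zcr_threshold=0.1)
```

For these made-up values the frames at indices 0, 3, 6, and 9 are cut points, leaving three syllable ranges between them.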
In addition, after the segmentation unit 30 has divided the single-word speech signal into one or more syllables according to the speech features, a storage unit (not shown) can store each syllable for later use, for example for pronunciation synthesis in an electronic dictionary.