CN201323053Y - Automatic segmentation device of single-word speech signal - Google Patents


Info

Publication number
CN201323053Y
CN201323053Y (application number CNU200820222733XU)
Authority
CN
China
Prior art keywords: speech signal, unit, phonetic feature, single character, signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNU200820222733XU
Other languages
Chinese (zh)
Inventor
陈淮琰
韩召宁
杨亚冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Xian Co Ltd filed Critical Inventec Besta Xian Co Ltd
Priority to CNU200820222733XU priority Critical patent/CN201323053Y/en
Application granted granted Critical
Publication of CN201323053Y publication Critical patent/CN201323053Y/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Abstract

The utility model relates to an automatic segmentation device of a single-word speech signal, which comprises a reception unit, an analysis unit and a segmentation unit. The reception unit is connected with the analysis unit, and the analysis unit is connected with the segmentation unit. The utility model completely replaces the traditional method of manually separating the single-word speech; the whole process requires no manual intervention; the operation is time-saving, labor-saving and highly efficient; and the utility model greatly reduces the rate of human-induced errors.

Description

Device for automatically segmenting a single-word speech signal
Technical field
The utility model relates to a device for segmenting single-word speech signals, and in particular to a device for automatically segmenting a single-word speech signal.
Background technology
In language learning, learners often rely on language-learning tools such as electronic dictionaries to improve their results and accelerate their progress. Most current electronic dictionaries provide a pronunciation function: after looking up a word or example sentence, the user can play back and listen to its correct pronunciation. This significantly improves the learner's listening and speaking abilities, so more and more manufacturers are paying close attention to the pronunciation function of electronic dictionaries.
Recently, electronic dictionaries advertising human-voice pronunciation have become a feature that every manufacturer pursues. Human-voice pronunciation is achieved by having a person record the waveform of each word. However, recording the waveforms of all words consumes a very large amount of the electronic dictionary's memory, which in turn raises costs.
Speech synthesis was therefore developed to approximate human pronunciation, saving memory space while also improving pronunciation quality. Speech synthesis generally falls into two approaches, illustrated below using English words as an example.
In the first approach, syllables are determined from the phonetic symbols in the dictionary's word list. Before the speech data of an English word can be synthesized, the word must first be divided into one or more syllables; the waveform corresponding to each syllable is then retrieved from the original recording data, and the waveforms are combined.
In the second approach, the waveforms of all syllables formed by the various combinations of initial consonants, vowels, and tones are recorded and stored in memory. Before the speech data of an English word can be synthesized, the word must first be divided into one or more syllables; the waveform corresponding to each segmented syllable is then retrieved from the recorded data, and the waveforms are combined.
As the description above shows, whichever synthesis approach is used, the English word must first be divided into one or more syllables before any subsequent processing can take place. The traditional practice is manual segmentation by human listening, which requires a great deal of manpower and working time. Moreover, manual syllable segmentation is tedious work of enormous volume, and segmenting syllables by ear is highly error-prone.
Therefore, how to solve the problems caused by the traditional manual segmentation of single-word speech is an issue that urgently needs to be resolved.
Content of the utility model
The utility model solves the above technical problems existing in the background art by proposing a device for automatically segmenting a single-word speech signal.
The technical solution of the utility model is a device for automatically segmenting a single-word speech signal, characterized in that the device comprises: a receiving unit, for receiving the single-word speech signal and dividing it into a plurality of frames; an analysis unit, for analyzing the frames and producing a phonetic feature corresponding to each frame; and a segmentation unit, for segmenting the single-word speech signal into syllables according to the phonetic features. The receiving unit is connected to the analysis unit, and the analysis unit is connected to the segmentation unit.
The above phonetic feature comprises the average amplitude of the frames.
The above phonetic feature comprises the average zero-crossing rate of the frames.
The above phonetic feature comprises the cepstrum parameters of the frames.
The above analysis unit produces a threshold value according to the phonetic features, and the segmentation unit compares the phonetic features with the threshold value.
The device for automatically segmenting a single-word speech signal provided by the utility model automatically segments the signal into syllables according to phonetic features, completely replacing the traditional manual segmentation of single-word speech. The whole process requires no human intervention; it is time-saving, labor-saving, and highly efficient, and greatly reduces the rate of human error.
Description of drawings
Fig. 1 is a schematic diagram of the device for automatically segmenting a single-word speech signal;
Fig. 2 is a schematic diagram of a multisyllabic single-word speech signal;
Fig. 3 is a schematic diagram of the segmentation of a multisyllabic single-word speech signal.
Reference numerals: 10 - receiving unit; 20 - analysis unit; 30 - segmentation unit.
Embodiment
Referring to Fig. 1, the device for automatically segmenting a single-word speech signal comprises a receiving unit 10, an analysis unit 20, and a segmentation unit 30.
The receiving unit 10 receives the single-word speech signal and divides it into a plurality of frames. The analysis unit 20 analyzes the frames and produces a phonetic feature corresponding to each frame. The segmentation unit 30 then segments the single-word speech signal into syllables according to the phonetic features produced by the analysis unit 20.
The speech signal of each word differs, but all share some common characteristics: the pronunciation of a multisyllabic word is composed of its syllables; the composition of syllables follows specific rules within the speech signal; and phonetic features can be used to perform syllable segmentation. The utility model therefore first divides the single-word speech signal into a plurality of frames, then takes each frame as a unit and uses the analysis unit 20 to analyze the phonetic features of each frame.
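The framing step described above can be sketched as follows. The patent does not specify a frame length or overlap, so the 25 ms frames with a 10 ms hop used here are common but purely illustrative assumptions, and the function name is hypothetical:

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Divide a speech signal into overlapping fixed-length frames.

    frame_ms and hop_ms are illustrative defaults; the patent does not
    specify framing parameters.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    # Each row of the result is one frame of frame_len samples.
    return np.stack([signal[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])
```

Each subsequent feature (average amplitude, zero-crossing rate, cepstrum) is then computed per frame on the rows of this array.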
The phonetic features mentioned above include the average amplitude, the average zero-crossing rate, and cepstrum parameters, among others. Each is briefly described below.
The amplitude of a speech signal indicates its magnitude: just as human speech rises and falls in loudness, the waveform of a speech signal varies in height and strength. Amplitude represents the magnitude of the signal, and the average amplitude is obtained by summing and averaging the amplitude values, so it shows how the signal strength of a given segment within a unit of time compares with that of the whole speech signal.
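As a sketch of this feature, the short-time average amplitude of each frame can be computed as the mean absolute sample value. The patent does not spell out the exact averaging, so this is one common formulation, not necessarily the device's:

```python
import numpy as np

def average_amplitude(frames):
    """Short-time average amplitude: the mean of the absolute sample
    values within each frame (one value per frame)."""
    return np.mean(np.abs(frames), axis=1)
```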
The average zero-crossing rate of a speech signal refers to the number of times per unit of time that the signal waveform crosses the horizontal axis (zero). In other words, the number of transitions of the signal amplitude between positive and negative values per unit of time is called the zero-crossing rate. When the signal is divided into frames and the zero-crossing rates of all frames are averaged, the result is called the average zero-crossing rate.
In short, the zero-crossing rate is the number of zero crossings of the speech signal per unit of time. It is widely used, especially in speech recognition. Segments with a high zero-crossing rate correspond to unvoiced sounds or silence, where noise is relatively strong, while segments with a low zero-crossing rate correspond to voiced sounds. Thus, by examining the zero-crossing rate, one can distinguish unvoiced from voiced sounds, and speech from silence, within a speech signal.
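A minimal sketch of the zero-crossing computation per frame, counting sign changes between adjacent samples. Normalizing per frame (rather than per second) is an assumption for illustration:

```python
import numpy as np

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs
    differ, i.e. how often the waveform crosses zero."""
    signs = np.sign(frames)
    crossings = np.abs(np.diff(signs, axis=1)) > 0
    return crossings.mean(axis=1)
```

A frame that alternates between positive and negative on every sample gets the maximum rate of 1.0; a frame of constant sign gets 0.0.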
Next, the cepstrum parameters are introduced. In signal recognition, the most commonly used characteristic parameter is the signal's energy on the spectrum: for example, a high-frequency signal has large energy values only in the high-frequency region, while a low-frequency signal has larger energy in the low-frequency region, and these spectral energy values can serve as a kind of feature. The Fourier transform is used to convert the signal from the time axis to the spectrum. For speech signals, however, another parameter called the cepstrum represents the characteristics of the signal better and improves discrimination. Adopting cepstrum parameters therefore improves the discrimination of single-word speech signals.
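One standard formulation of this feature is the real cepstrum, computed as the inverse FFT of the log magnitude spectrum; the patent does not say which cepstrum variant the device uses, so this is a sketch under that assumption:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude
    spectrum. Low-order coefficients describe the spectral envelope."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real
```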
Thus the utility model analyzes the phonetic features, and the segmentation unit 30 then automatically segments the single-word speech signal into one or more syllables according to those features. The analysis unit 20 can derive a threshold value from the phonetic features, and this threshold is used to decide whether a frame is a syllable cut point: when the phonetic feature of a frame of the single-word speech signal falls below the threshold, that frame represents a syllable cut point. After the analysis unit 20 produces the threshold, the segmentation unit 30 compares the phonetic features with the threshold and divides the single-word speech signal into its syllables.
As an example, referring to Fig. 2, the word "dagoba" is used for illustration. Dagoba has three syllables, and Fig. 2 clearly shows distinct phonetic-feature boundaries between the syllables.
Referring to Fig. 3, this embodiment combines the average amplitude with the average zero-crossing rate among the phonetic features, although the utility model is not limited to this combination. The segmentation unit 30 compares the average amplitude and the average zero-crossing rate with the threshold; when both the average amplitude and the average zero-crossing rate fall below the threshold, a syllable cut point is indicated. As Fig. 3 clearly shows, the single-word speech signal of "dagoba" is thereby segmented into three syllables according to its phonetic features.
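The combined thresholding described for Fig. 3 can be sketched as below. The patent does not say how the thresholds are derived from the features, so the scaling factors here are purely illustrative assumptions, as are the function and parameter names:

```python
import numpy as np

def find_cut_frames(avg_amp, zcr, amp_factor=0.1, zcr_factor=0.5):
    """Mark frames whose average amplitude AND zero-crossing rate both
    fall below thresholds derived from the features themselves.

    amp_factor and zcr_factor are illustrative assumptions, not values
    from the patent.
    """
    amp_thresh = amp_factor * avg_amp.max()
    zcr_thresh = zcr_factor * zcr.mean()
    # A frame is a candidate syllable cut point only if both features
    # are below their thresholds, as described for Fig. 3.
    return np.where((avg_amp < amp_thresh) & (zcr < zcr_thresh))[0]
```

Consecutive cut frames would in practice be merged into a single boundary; the word is then split at those boundaries into its syllables.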
In addition, after the segmentation unit 30 divides the single-word speech signal into one or more syllables according to the phonetic features, a storage unit (not shown) can store each syllable for later use, for example in electronic dictionaries or in pronunciation synthesis.

Claims (5)

1. A device for automatically segmenting a single-word speech signal, characterized in that the device comprises: a receiving unit, an analysis unit, and a segmentation unit; said receiving unit is connected to the analysis unit, and said analysis unit is connected to the segmentation unit.
2. The device for automatically segmenting a single-word speech signal according to claim 1, characterized in that said phonetic feature comprises the average amplitude of the frames.
3. The device for automatically segmenting a single-word speech signal according to claim 1, characterized in that said phonetic feature comprises the average zero-crossing rate of the frames.
4. The device for automatically segmenting a single-word speech signal according to claim 1, characterized in that said phonetic feature comprises the cepstrum parameters of the frames.
5. The device for automatically segmenting a single-word speech signal according to claim 1, characterized in that said analysis unit produces a threshold value according to the phonetic features, and the segmentation unit compares the phonetic features with the threshold value.
CNU200820222733XU 2008-12-02 2008-12-02 Automatic segmentation device of single-word speech signal Expired - Fee Related CN201323053Y (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNU200820222733XU CN201323053Y (en) 2008-12-02 2008-12-02 Automatic segmentation device of single-word speech signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNU200820222733XU CN201323053Y (en) 2008-12-02 2008-12-02 Automatic segmentation device of single-word speech signal

Publications (1)

Publication Number Publication Date
CN201323053Y true CN201323053Y (en) 2009-10-07

Family

ID=41160429

Family Applications (1)

Application Number Title Priority Date Filing Date
CNU200820222733XU Expired - Fee Related CN201323053Y (en) 2008-12-02 2008-12-02 Automatic segmentation device of single-word speech signal

Country Status (1)

Country Link
CN (1) CN201323053Y (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019119553A1 (en) * 2017-12-21 2019-06-27 深圳市沃特沃德股份有限公司 Semantic recognition method and apparatus


Similar Documents

Publication Publication Date Title
CN101751919B (en) Spoken Chinese stress automatic detection method
CN109410914B (en) Method for identifying Jiangxi dialect speech and dialect point
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
Zhou et al. Efficient audio stream segmentation via the combined T/sup 2/statistic and Bayesian information criterion
US7089184B2 (en) Speech recognition for recognizing speaker-independent, continuous speech
US8326610B2 (en) Producing phonitos based on feature vectors
CN101136199A (en) Voice data processing method and equipment
CN102982811A (en) Voice endpoint detection method based on real-time decoding
CN102903361A (en) Instant call translation system and instant call translation method
CN101290766A (en) Syllable splitting method of Tibetan language of Anduo
CN111105785B (en) Text prosody boundary recognition method and device
CN105374352A (en) Voice activation method and system
CN107564543B (en) Voice feature extraction method with high emotion distinguishing degree
CN103985390A (en) Method for extracting phonetic feature parameters based on gammatone relevant images
CN109377981B (en) Phoneme alignment method and device
CN110459202A (en) A kind of prosodic labeling method, apparatus, equipment, medium
CN110428853A (en) Voice activity detection method, Voice activity detection device and electronic equipment
CN101419796A (en) Device and method for automatically splitting speech signal of single character
CN115240655A (en) Chinese voice recognition system and method based on deep learning
CN114550706A (en) Smart campus voice recognition method based on deep learning
Stanek et al. Algorithms for vowel recognition in fluent speech based on formant positions
CN201323053Y (en) Automatic segmentation device of single-word speech signal
CN103794208A (en) Device and method for separating English word pronunciation according to syllables by utilizing voice characteristics
CN111402887A (en) Method and device for escaping characters by voice
Cen et al. Segmentation of speech signals in template-based speech to singing conversion

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Assignee: Village Technology Limited

Assignor: Wudi Science and Technology Co., Ltd. (Xian)

Contract record no.: 2011310000129

Denomination of utility model: Device and method for automatically splitting speech signal of single character

Granted publication date: 20091007

License type: Exclusive License

Record date: 20110808

C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091007

Termination date: 20131202