TW394928B - A process for adjusting voice speed at a fixed frequency - Google Patents

A process for adjusting voice speed at a fixed frequency

Info

Publication number
TW394928B
Authority
TW
Taiwan
Prior art keywords
segment
point
speed
speech
segments
Prior art date
Application number
TW86109157A
Other languages
Chinese (zh)
Inventor
Bi-Yu Pan
Original Assignee
Pan Bi Yu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1997-06-30
Filing date
1997-06-30
Publication date
2000-06-21
Application filed by Pan Bi Yu
Priority to TW86109157A
Application granted
Publication of TW394928B


Landscapes

  • Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)

Abstract

A method for adjusting speech speed at a fixed frequency. The speech data is divided into several segments. In each segment the highest point is located, and a search proceeds from that point to the zero-crossing point, whose position is recorded. The zero-crossing point at the tail of the highest waveform in each segment is then used as the cutting point for segmentation, which eliminates the gap between two adjacent segments. The speech can thus be sped up or slowed down without changing its frequency.

Description

V. Description of the Invention

The present invention relates to a "method for adjusting speech speed at a fixed frequency", and in particular to a method that, in the replay function of a language learning machine or of computer-aided teaching software, slows down or speeds up speech without changing its frequency.

Background: a typical language learning machine can take the last passage of tape audio played before a pause and replay it repeatedly. The original sentence, however, is often too fast to be heard clearly, and it cannot simply be slowed down. Some language learning machines therefore provide a speed adjustment knob, but slowing down merely reduces the speed of the tape recorder's motor, or reduces the number of samples played per unit time during replay. Either way the frequency of the speech drops, sometimes to the point where its content can no longer be recognized. Figure 1 shows the original speech waveform and Figure 2 the waveform after the speed is reduced: the original waveform is stretched along the time axis, which lowers its frequency.

The main object of the present invention is therefore to provide a "method for adjusting speech speed at a fixed frequency" that divides the speech data into a number of short segments and either plays each segment twice or removes the even-numbered segments of the divided speech, so that the playback speed is adjusted while the frequency stays fixed.

In this method, the speech data is divided into segments of 128 or 256 points each, to prevent distortion of the reproduced sound.

The method uses 22 kHz as the sampling frequency of the speech data.

In each segment, the method takes the highest point, searches downward from it for the zero-crossing point, and records that position. The zero-crossing point at the tail of the wave containing each segment's highest point is then used as the cutting point for segmentation, which eliminates the gap between two segments and captures a complete speech waveform.
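
As a minimal sketch of the basic segment bookkeeping, assuming the speech is already a Python list of integer samples, the hypothetical helpers below cut fixed-length segments and either play each one twice (half speed) or keep only the odd-numbered ones (double speed). This naive version deliberately ignores the zero-crossing alignment that the method adds, so it illustrates only the repeat/drop idea.

```python
from typing import List, Sequence


def split_fixed(samples: Sequence[int], seg_len: int) -> List[List[int]]:
    """Cut the samples into consecutive fixed-length segments (the last may be shorter)."""
    return [list(samples[i:i + seg_len]) for i in range(0, len(samples), seg_len)]


def play_each_twice(segments: List[List[int]]) -> List[int]:
    """Half speed: every segment is played twice in a row, so pitch is unchanged."""
    out: List[int] = []
    for seg in segments:
        out.extend(seg)
        out.extend(seg)
    return out


def drop_even_segments(segments: List[List[int]]) -> List[int]:
    """Double speed: keep the 1st, 3rd, 5th, ... segments and drop the even-numbered ones."""
    out: List[int] = []
    for seg in segments[::2]:  # indices 0, 2, 4, ... are the 1st, 3rd, 5th, ... segments
        out.extend(seg)
    return out


segs = split_fixed(list(range(10)), 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(play_each_twice(segs))             # [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 6, 7, 8, 9, 9]
print(drop_even_segments(segs))          # [0, 1, 2, 6, 7, 8]
```

Because each segment is copied verbatim, the samples inside it keep their original spacing, which is why the pitch does not change; only the number of times each segment is heard changes.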

The main features and novelty of the present invention will become clearer from the following detailed description of embodiments with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram of the original speech waveform.
Figure 2 is a schematic diagram of the waveform when a conventional language learning machine slows down its motor speed.
Figure 3 is a schematic diagram of a waveform sampled with 2 bits.
Figure 4 is a schematic diagram of the waveform of Figure 3 at an increased sampling rate.
Figure 5 is a schematic diagram of a waveform sampled with 3 bits.
Figure 6 is a schematic diagram of a waveform sampled with 8 bits.
Figure 7 is a schematic diagram of a waveform sampled at 22 kHz.
Figures 8 and 9 are schematic diagrams of the speech waveform after segmentation.
Figure 10 is a schematic diagram of a speech waveform produced by the present invention.
Figure 11 is a schematic diagram of a waveform in which the present invention speeds up speech.
Figures 12 and 13 are reference tables for selecting segments for different speed adjustments.

Because the segmentation requires converting the analog speech signal into a digital speech signal, the sampling process is explained first.

The sampling rate is the number of samples taken per unit time (per second). Take a sine wave as an example. When a sinusoidal signal is digitized with 2 bits, as shown in Figure 3, there are 2^2 = 4 levels (0 to 3), and level 2 serves as the zero-crossing point (the reference point at the center of the waveform, i.e. the baseline when there is no speech signal). The waveform formed by joining the samples with straight-line segments is the reproduced waveform, and it differs considerably from the original sine wave. Figure 4 shows that even with a higher sampling rate the accuracy is still insufficient.
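
The level convention can be sketched as follows, assuming amplitudes normalized to the range -1.0 to 1.0 and rounding to the nearest level. `quantize`, `sample_sine`, and `max_error` are illustrative names, not taken from the patent, and the error is measured only at the sample instants rather than along the straight-line reproduction drawn in the figures. Mid-scale, 2^(n-1), represents zero, so 2 bits gives levels 0 to 3 with zero at 2, 3 bits gives zero at 4, and 8 bits gives zero at 128.

```python
import math
from typing import List


def quantize(x: float, bits: int) -> int:
    """Map an amplitude in [-1.0, 1.0] to an unsigned level 0 .. 2**bits - 1.

    Mid-scale, 2**(bits - 1), stands for zero amplitude (level 2 for 2 bits,
    4 for 3 bits, 128 for 8 bits).
    """
    zero = 1 << (bits - 1)              # reference (zero-crossing) level
    top = (1 << bits) - 1               # highest representable level
    level = zero + round(x * zero)      # nearest level; +1.0 would land one past the top
    return max(0, min(top, level))      # so clip into the valid range


def sample_sine(freq_hz: float, rate_hz: float, n: int, bits: int) -> List[int]:
    """Take n samples of sin(2*pi*f*t), rate_hz samples per second, at `bits` bits each."""
    return [quantize(math.sin(2 * math.pi * freq_hz * k / rate_hz), bits)
            for k in range(n)]


def max_error(freq_hz: float, rate_hz: float, n: int, bits: int) -> float:
    """Largest gap between the true sine and its quantized samples, in amplitude units."""
    zero = 1 << (bits - 1)
    worst = 0.0
    for k, level in enumerate(sample_sine(freq_hz, rate_hz, n, bits)):
        true = math.sin(2 * math.pi * freq_hz * k / rate_hz)
        worst = max(worst, abs((level - zero) / zero - true))
    return worst


for b in (2, 3, 8):
    print(b, "bits:", max_error(440, 22000, 100, b))
```

Running the loop shows the worst-case error shrinking as the bit depth grows from 2 to 3 to 8 bits, the same trend the figures illustrate for the reproduced waveform.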

Therefore, as shown in Figure 5, sampling can use 3 bits: since 2^3 = 8, there are 8 levels (0 to 7), with level 4 as the zero-crossing point, and the reproduced wave comes closer to the original sine wave. Commonly, as shown in Figure 6, 8-bit sampling is used, which divides the speech amplitude into 256 levels (0 to 255) with level 128 as the zero-crossing point.

Music, however, contains many high-frequency instruments and therefore needs a higher sampling rate. Human hearing spans roughly 20 Hz to 20 kHz, whereas speech frequencies lie below 1 kHz, so a sampling rate of 2 kHz or more should be sufficient for speech to remain intelligible (see Figure 7).

To slow down speech without changing its frequency, the sentences in the speech data can be divided into a number of short segments. Assuming the waveform changes little within each segment, playing each segment twice in succession halves the speech speed.
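
To see why this plain repetition needs more care, the sketch below uses assumed parameters (a 300 Hz sine sampled at 22 kHz, split into 128-sample segments), repeats each segment twice, and reports the largest jump between consecutive samples. Because 128 samples do not hold a whole number of periods, each repeated segment restarts at a different phase, so the jump at the joins is far larger than any step in the original wave. The helper names are hypothetical.

```python
import math
from typing import List


def sine_samples(freq_hz: float, rate_hz: float, n: int) -> List[float]:
    """n samples of a unit-amplitude sine at the given sampling rate."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


def repeat_each_twice(x: List[float], seg_len: int) -> List[float]:
    """Naive slow-down: fixed-length segments, each played twice, cut anywhere."""
    out: List[float] = []
    for i in range(0, len(x), seg_len):
        seg = x[i:i + seg_len]
        out.extend(seg + seg)
    return out


def largest_step(x: List[float]) -> float:
    """Largest jump between consecutive samples."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


tone = sine_samples(300, 22000, 1024)
stretched = repeat_each_twice(tone, 128)
print(len(tone), largest_step(tone))            # 1024 samples, steps below 0.09
print(len(stretched), largest_step(stretched))  # 2048 samples, jumps of about 1 or more
```

The duration doubles as intended, but these sudden jumps at the segment joins are the kind of discontinuity that is heard as jitter.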

However, because the sentences are cut and rearranged as shown in Figures 8 and 9, a gap appears at the junction between two adjacent segments, which makes the adjusted speech jitter. To avoid this jitter the gap must first be removed. Speech signals are quite complex, though: capturing exactly one complete period in each segment is very difficult, and because the frequency of speech changes continually, so does its wavelength, which makes capture with a fixed length impractical. The present invention therefore overcomes these shortcomings with the following steps:

1. Divide the digitized speech data into segments of 128 points each.
2. In each segment, locate the maximum value, i.e. the highest point.
3. Search downward from the highest point for the zero-crossing point and record its position.
4. Segment the data using the zero-crossing point at the tail end of the wave containing the highest point as the cutting point. This eliminates the gap at the junction of two segments: the values at the junctions all lie near the zero-crossing level and their slopes are all negative, so the line segment across each junction is gentle with few spikes (as shown in Figure 10).
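
The four steps, followed by the two playback modes, can be sketched in Python as below. This is a minimal interpretation under stated assumptions, not the patent's code: samples are integers whose zero level is `zero` (pass 128 for unsigned 8-bit data), "searching downward" is read as walking forward in time from the peak until the first sample at or below the zero level, and the audio before the first cut and after the last cut is treated as ordinary pieces. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cut_points(x: Sequence[int], block: int = 128, zero: int = 0) -> List[int]:
    """Steps 1-4: one falling zero crossing after the peak of each fixed block."""
    cuts: List[int] = []
    for start in range(0, len(x), block):                  # step 1: fixed blocks
        end = min(start + block, len(x))
        peak = max(range(start, end), key=lambda i: x[i])  # step 2: highest point
        if x[peak] <= zero:
            continue                                       # no positive peak (e.g. silence)
        i = peak + 1                                       # step 3: follow the falling edge
        while i < len(x) and x[i] > zero:                  # (may run past the block end)
            i += 1
        if i < len(x) and (not cuts or i > cuts[-1]):      # step 4: record the cut point
            cuts.append(i)
    return cuts


def zero_cross_segments(x: Sequence[int], block: int = 128, zero: int = 0) -> List[List[int]]:
    """Re-cut the signal at the cut points; the pieces still cover every sample."""
    bounds = [0] + cut_points(x, block, zero) + [len(x)]
    return [list(x[a:b]) for a, b in zip(bounds, bounds[1:])]


def slow_down(x: Sequence[int], block: int = 128, zero: int = 0) -> List[int]:
    """Half speed at unchanged pitch: every piece is played twice."""
    out: List[int] = []
    for seg in zero_cross_segments(x, block, zero):
        out.extend(seg + seg)
    return out


def speed_up(x: Sequence[int], block: int = 128, zero: int = 0) -> List[int]:
    """Roughly double speed: keep the 1st, 3rd, 5th, ... pieces."""
    out: List[int] = []
    for seg in zero_cross_segments(x, block, zero)[::2]:
        out.extend(seg)
    return out


def tone(freq_hz: float, rate_hz: float, n: int, amp: int = 100) -> List[int]:
    """Integer test signal: a sine with peak amplitude `amp` around zero."""
    return [round(amp * math.sin(2 * math.pi * freq_hz * k / rate_hz)) for k in range(n)]


x = tone(300, 22000, 2048)
print(cut_points(x)[:4], len(slow_down(x)), len(speed_up(x)))
```

Each cut falls where the wave passes downward through zero, so within a repeated piece the last sample sits just above zero and the first sample just at or below it. The jump at each such join is therefore smaller than two ordinary sample steps, instead of the near full-scale jumps that fixed-length cutting can produce. The leading and trailing pieces do not both start and end on a cut, so they may still click when repeated.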

The appropriate number of points per segment depends on the sampling rate. At a sampling rate of 22 kHz, taking 1024 points per segment gives the slowed-down speech a noticeable echo; 512 points per segment gives less echo; 256 or 128 points per segment gives very little distortion; and 64 points per segment gives noticeable noise. At 22 kHz, therefore, anything from 128 to 256 points per segment works, while at a sampling rate of 5.5 kHz each segment should contain 32 to 64 points. In both cases a segment spans roughly 5.8 to 11.6 ms.

Using the segmentation method above, speech can be sped up at a fixed frequency by keeping only the odd-numbered segments (the first, third, fifth, and so on), as shown in Figure 11.

For speeds between double and half speed, the speech speed can be adjusted by selecting speech segments as shown in Figure 12. In that table, the numbers in the third column are the sequence numbers of the cut speech segments: circled segments are removed and uncircled segments are kept, while the value in the second column is the ratio of the number of segments played to the number of segments originally cut. To simplify programming, Figure 12 can be condensed into the table of Figure 13, in which a variable denotes the sequence number of each cut speech segment.
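
Figures 12 and 13 are not reproduced in this text, so the selection rule below is a hypothetical stand-in rather than the patent's table. It decides how many times to play each segment so that the ratio of played to original segments approaches a target between 1/2 and 2. At the extremes it matches the behavior described above (keep the 1st, 3rd, 5th, ... segments; play every segment twice), and in between it spreads removals or repetitions evenly, which is one plausible way to select segments at different intervals. `repeat_counts` and `retime` are illustrative names.

```python
import math
from fractions import Fraction
from typing import List, Sequence, Union

Ratio = Union[int, str, Fraction]


def repeat_counts(n_segments: int, ratio: Ratio) -> List[int]:
    """Times to play each segment so that played/original segments approaches `ratio`.

    ratio 1/2 keeps the 1st, 3rd, 5th, ... segments (double speed); ratio 2 plays
    every segment twice (half speed). Fraction arithmetic keeps the rounding exact.
    """
    r = Fraction(ratio)
    if not Fraction(1, 2) <= r <= 2:
        raise ValueError("ratio must lie between 1/2 and 2")
    # ceil((i+1)*r) - ceil(i*r) telescopes: the counts sum to ceil(n_segments * r).
    return [math.ceil((i + 1) * r) - math.ceil(i * r) for i in range(n_segments)]


def retime(segments: Sequence[Sequence[int]], ratio: Ratio) -> List[int]:
    """Concatenate the segments, each repeated according to repeat_counts."""
    out: List[int] = []
    for seg, times in zip(segments, repeat_counts(len(segments), ratio)):
        for _ in range(times):
            out.extend(seg)
    return out


print(repeat_counts(8, "3/4"))  # [1, 1, 1, 0, 1, 1, 1, 0]
print(repeat_counts(6, "3/2"))  # [2, 1, 2, 1, 2, 1]
```

A ratio of "3/4" drops every fourth segment (playback about 1.33 times faster), while "3/2" repeats every other segment (playback at about two-thirds speed). Passing the ratio as a string or `Fraction` avoids the binary rounding that a float such as 0.7 would introduce.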


Claims (1)

VI. Scope of the Patent Claims

1. A "method for adjusting speech speed at a fixed frequency", mainly comprising:
dividing digitized speech data into a number of short segments, each containing an appropriate number of points;
taking the highest point in each segment;
searching downward from the highest point to the zero-crossing point and recording the position of that zero-crossing point;
segmenting the data with the zero-crossing point at the tail end of the wave containing the highest point as the cutting point, so as to eliminate the gap at the junction of two adjacent speech segments;
during playback, taking the odd-numbered segments, so that the speech is played at double speed at a fixed frequency;
during playback, playing each segment twice in succession, so that the speech is slowed to half speed at a fixed frequency; and
selecting speech segments at different intervals to obtain different playback speeds.
2. The method as described in claim 1, wherein the number of points in each segment of the speech data is based on a sampling rate of 22 kHz.
3. The method as described in claim 1, wherein each segment of the speech data contains 128 or 256 points.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW86109157A TW394928B (en) 1997-06-30 1997-06-30 A process for adjusting voice speed at a fixed frequency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW86109157A TW394928B (en) 1997-06-30 1997-06-30 A process for adjusting voice speed at a fixed frequency

Publications (1)

Publication Number Publication Date
TW394928B 2000-06-21

Family

ID=21626756

Family Applications (1)

Application Number Title Priority Date Filing Date
TW86109157A TW394928B (en) 1997-06-30 1997-06-30 A process for adjusting voice speed at a fixed frequency

Country Status (1)

Country Link
TW (1) TW394928B (en)

Similar Documents

Publication Publication Date Title
Arons Techniques, perception, and applications of time-compressed speech
US7853447B2 (en) Method for varying speech speed
Lewis Automated lip‐sync: Background and techniques
McLoughlin Speech and Audio Processing: a MATLAB-based approach
JP2000511651A (en) Non-uniform time scaling of recorded audio signals
CN1148230A (en) Method and system for karaoke scoring
JPH10187188A (en) Method and device for speech reproducing
Crockett High quality multi-channel time-scaling and pitch-shifting using auditory scene analysis
TW394928B (en) A process for adjusting voice speed at a fixed frequency
JPS5982608A (en) System for controlling reproducing speed of sound
Whalen et al. The Haskins Laboratories’ pulse code modulation (PCM) system
Martin Speech acoustic analysis
Modegi et al. Proposals of MIDI coding and its application for audio authoring
JP2003255999A (en) Variable speed reproducing device for encoded digital audio signal
CN1127053C (en) Method of and apparatus for discriminating non-sounds and voiceless sounds of speech signals
US20040054524A1 (en) Speech transformation system and apparatus
JPH10133678A (en) Voice reproducing device
Jones Compositional control of phonetic/nonphonetic perception
JP2734028B2 (en) Audio recording device
TW442740B (en) Method for changing articulation speed
KR20010025770A (en) On the Real-Time Fairy Tale Narration System with Parent's Voice Color
JP2011180194A (en) Phoneme code-converting device, phoneme code database, and voice synthesizer
KR100359988B1 (en) real-time speaking rate conversion system
CN1310439A (en) Speech speed regulating method in fixed frequency
JP2002502510A (en) Method and apparatus for playing recorded audio with alternative performance attributes and temporal characteristics

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees