V. Description of the Invention

The present invention relates to a "method for adjusting speech speed at a fixed frequency", and in particular to a method, intended for the replay function of a language learning machine or computer-based teaching software, that slows down or speeds up speech without changing its frequency.

A typical language learning machine can record the speech from a tape and replay the last passage played before the pause. The original sentence is often too fast to be understood, however, and such a machine cannot slow it down. Some language learning machines therefore provide a speed adjustment knob, but slowing the speed merely reduces the motor speed of the recorder, or reduces the number of sampling points per unit time during replay. Either way the frequency of the speech drops, possibly to the point that the speech content can no longer be recognized. Figure 1 shows the original speech waveform and Figure 2 the waveform after the motor speed is reduced: the original waveform is in effect stretched along the time axis, which lowers its frequency.

Accordingly, the main object of the present invention is to provide a "method for adjusting speech speed at a fixed frequency" that divides the speech data into a number of small segments and either plays each small segment twice or removes the even-numbered segments of the divided speech, so that the playback speed is adjusted while the frequency stays fixed.

In the foregoing method, the speech data is divided into segments of 128 points or 256 points each, to prevent distortion of the reproduced sound.

In the foregoing method, 22K is used as the sampling frequency of the speech data [remainder of sentence missing in source].

In the foregoing method, the highest point within each small segment is located, the zero-crossing point is searched for downward from that highest point and its position is recorded, and the data is then segmented using, as the cutting point, the zero-crossing point at the tail end of the wave containing each segment's highest point, so as to eliminate the gap between adjacent segments and to capture the complete speech waveform.
The main features and novelty of the present invention will become clearer from the following detailed description, taken together with the accompanying drawings:

Figure 1 is a schematic diagram of the original speech waveform.
Figure 2 is a schematic diagram of the waveform when a conventional language learning machine slows its motor speed.
Figure 3 is a schematic diagram of a waveform sampled with 2 bits.
Figure 4 is a schematic diagram of the waveform of Figure 3 with an increased sampling rate.
Figure 5 is a schematic diagram of a waveform sampled with 3 bits.
Figure 6 is a schematic diagram of a waveform sampled with 8 bits.
Figure 7 is a schematic diagram of a waveform sampled at a 22K frequency.
Figures 8 and 9 are schematic diagrams of the speech waveform after segmentation.
Figure 10 is a schematic diagram of the speech waveform of the present invention.
Figure 11 is a schematic diagram of the waveform when the present invention speeds up the speech.
Figures 12 and 13 are reference tables for segment selection at different speed adjustments of the present invention.

Because the present invention must convert the analog speech signal into a digital speech signal during segmentation, the conversion (sampling) process is explained first.

The sampling rate is the number of sampling points per unit time (per second). Taking a sine wave as an example, when a sine-wave signal is digitized with 2-bit samples as shown in Figure 3, 2^2 = 4, so there are 4 levels, 0 to 3, with 2 taken as the zero-crossing point (that is, the reference point at the center of the waveform, or the baseline when there is no speech signal). As the figure shows, the waveform formed by joining the straight line segments is the reproduced waveform, and it differs considerably from the original sine wave. As shown in Figure 4, even with an increased sampling rate the accuracy is still insufficient.
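The effect of the sample size on the reproduced waveform can be illustrated with a minimal Python sketch. The function names `quantize_sine` and `max_error` are hypothetical, and scaling the sine wave to fill the whole level range (with clipping at the top level) is an assumption made for illustration; the source only states the number of levels and the zero-crossing level.

```python
import math


def quantize_sine(bits, samples_per_cycle):
    """Quantize one cycle of a sine wave to 2**bits integer levels."""
    levels = 2 ** bits        # 2 bits -> 4 levels (0..3)
    center = levels // 2      # zero-crossing level, 2 for 2-bit samples
    out = []
    for n in range(samples_per_cycle):
        x = math.sin(2 * math.pi * n / samples_per_cycle)  # range -1..1
        level = round(center + x * center)
        # Assumption: clip the positive peak to the highest available level.
        out.append(max(0, min(levels - 1, level)))
    return out


def max_error(bits, samples_per_cycle):
    """Largest gap between the quantized samples and the true sine wave."""
    levels = 2 ** bits
    center = levels // 2
    quantized = quantize_sine(bits, samples_per_cycle)
    return max(
        abs((v - center) / center - math.sin(2 * math.pi * n / samples_per_cycle))
        for n, v in enumerate(quantized)
    )


if __name__ == "__main__":
    for bits in (2, 3, 8):
        print(bits, "bits, max error:", max_error(bits, 64))
```

With 2-bit samples the error stays large however many samples are taken per cycle, which matches the observation that raising the sampling rate alone does not give enough accuracy.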
Therefore, as shown in Figure 5, the sample size can be increased to 3 bits. Since 2^3 = 8, there are 8 levels, 0 to 7, with 4 as the zero-crossing point, which brings the reproduced wave closer to the original sine wave. In general, as shown in Figure 6, 8-bit samples are used, dividing the speech amplitude into 256 levels, 0 to 255, with 128 as the zero-crossing point.

Music generally contains many high-frequency instruments and therefore needs a higher sampling rate. The hearing range of the human ear is 20 Hz to 20 kHz, whereas speech frequencies lie below 1 kHz, so a sampling rate of 2K or more should be sufficient for recognition (as shown in Figure 7).

To slow speech down without changing its frequency, the sentences in the speech data can be divided into a number of small segments. Assuming the waveform changes little within each small segment, playing each segment twice halves the speed. However, as Figures 8 and 9 show, cutting and rearranging the sentence leaves gaps at the junctions between adjacent segments, which makes the adjusted speech jitter. To avoid the jitter, the gaps must first be removed. Speech signals are quite complex, though: it is very difficult to capture exactly one complete periodic wave in each segment, and because the frequency of speech changes constantly, the wavelength changes with it, so the segments can hardly be captured with a fixed length. The present invention therefore remedies this deficiency with the following steps:

1. Divide the digitized speech data into segments of 128 points each.
2. Find the location of the maximum value in each segment, that is, the highest point.
3. Search downward from the highest point for the zero-crossing point, and record the position of that zero-crossing point.
4. Segment the data using, as the cutting point, the zero-crossing point at the tail end of the wave containing the highest point. This eliminates the gap at the junction of two segments: the values at every junction are near the zero-crossing point and their slopes are all negative, so the line segment at the junction is comparatively smooth and shows little protrusion (as shown in Figure 10).
The appropriate number of points per segment is determined by the sampling rate. At a 22K sampling rate, 1024 points per segment produces an obvious echo in the slowed reproduced sound; 512 points per segment produces less echo; 256 or 128 points per segment produces very little distortion; and 64 points per segment produces obvious noise. At a 22K sampling rate, therefore, any segment length from 128 to 256 points is acceptable, while at a 5.5K sampling rate each segment should contain 32 to 64 points.

With the segmentation method described above, speech can be sped up at a fixed frequency by keeping only the odd-numbered segments (the first, third, fifth, and so on) when segmenting (as shown in Figure 11).

If speeds between twice as fast and half as fast are desired, the speech speed can be adjusted by selecting speech segments as shown in Figure 12. In that table, the numbers in the third column are the serial numbers of the cut speech segments: a circled number means the segment is removed, and an uncircled number means it is selected. The second column gives the ratio of the number of segments played to the number of segments in the original cut speech. To simplify handling in program design, Figure 12 can be summarized as the table in Figure 13, which is indexed by the serial number of the cut speech segment.
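The segmentation steps and the segment selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the samples are assumed to be 8-bit values with 128 as the zero-crossing level, the search for the crossing is allowed to run past the end of its nominal 128-point block, and the even-spread rule in `select_segments` stands in for the tables of Figures 12 and 13, which are not reproduced in the text.

```python
from fractions import Fraction
import math


def find_cut_points(samples, block=128, center=128):
    """Return one cut index per nominal block: the first sample at or
    below `center` that follows the block's highest point."""
    cuts = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        # Position of the highest point in this block.
        peak = start + max(range(len(chunk)), key=chunk.__getitem__)
        # Walk down the falling edge until the zero-crossing level is reached.
        i = peak
        while i < len(samples) and samples[i] > center:
            i += 1
        # Keep the crossing as a cut point; skip duplicates and crossings
        # that were not reached before the data ended.
        if i < len(samples) and (not cuts or i > cuts[-1]):
            cuts.append(i)
    return cuts


def split_at_cuts(samples, cuts):
    """Slice the samples into segments that start and end at cut points.
    Data before the first cut and after the last cut is discarded here."""
    return [samples[a:b] for a, b in zip(cuts, cuts[1:])]


def select_segments(segments, ratio):
    """Repeat or drop segments so the output has about ratio * len(segments)
    segments. ratio=2 plays every segment twice; ratio=1/2 keeps the first,
    third, fifth, ... segments. Other ratios use an assumed even spread."""
    ratio = Fraction(ratio)
    out = []
    for i, seg in enumerate(segments):
        copies = math.ceil((i + 1) * ratio) - math.ceil(i * ratio)
        out.extend([seg] * copies)
    return out


if __name__ == "__main__":
    # Synthetic 8-bit test tone standing in for digitized speech.
    samples = [round(128 + 100 * math.sin(2 * math.pi * n / 50)) for n in range(1000)]
    segments = split_at_cuts(samples, find_cut_points(samples))
    slow = [s for seg in select_segments(segments, 2) for s in seg]
    fast = [s for seg in select_segments(segments, Fraction(1, 2)) for s in seg]
    print(len(samples), len(slow), len(fast))
```

Because every segment begins and ends at a falling zero crossing, repeated or dropped segments join at nearly the same level and with the same slope direction, which is the property the description relies on to avoid gaps.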