TW394928B

TW394928B - A process for adjusting voice speed at a fixed frequency

Info

Publication number: TW394928B
Application number: TW86109157A
Authority: TW
Inventors: Bi-Yu Pan
Original assignee: Pan Bi Yu
Priority date: 1997-06-30
Filing date: 1997-06-30
Publication date: 2000-06-21

Abstract

A method for adjusting speech speed at a fixed frequency. The speech data are divided into several segments. The highest point in each segment is located, the zero-crossing point is searched for starting from that highest point, and its position is recorded. The zero-crossing point at the tail of the highest waveform in each segment is then used as the cutting point for segmentation, which eliminates the gap between adjacent segments. The speech speed can thus be increased or decreased without changing the speech frequency.

Description

The present invention relates to a method for adjusting speech speed at a fixed frequency, and in particular to a method that slows down or speeds up speech in the replay function of a language learning machine or computer teaching software without changing the frequency of the speech.

A typical language learning machine can record the last passage of tape audio played before a pause and replay it repeatedly. The original sentence is often too fast to be heard clearly, however, and slowing it down is not possible. Some language learning machines therefore provide a speed adjustment knob, but slowing the speed merely slows the recorder's motor, or reduces the number of sampling points per unit time during replay. Either approach lowers the frequency of the speech, to the point that its content may become unrecognizable. Figure 1 shows an original speech waveform and Figure 2 shows the waveform after the speed is reduced: the original waveform is in effect stretched along the time axis, which lowers its frequency.

The main purpose of the present invention is therefore to provide a method for adjusting speech speed at a fixed frequency. The speech data are divided into a number of small segments, and either each segment is repeated twice or the even-numbered segments are removed, so that the playback speed is adjusted while the frequency stays fixed.

In the foregoing method, the speech data are divided into segments of 128 or 256 points each, to prevent distortion of the reproduced sound.

In the foregoing method, 22 kHz is used as the sampling frequency of the speech data.

In the foregoing method, the highest point in each segment is found, the zero-crossing point is searched for downward from that highest point, and its position is recorded. The zero-crossing point at the tail of the highest waveform in each segment is then used as the cutting point for segmentation, which eliminates the gap between adjacent segments.

Segmenting in this way captures complete speech waveforms.

The main features and novelty of the present invention will become clearer from the following detailed description, read together with the accompanying drawings:

Figure 1 is a schematic diagram of an original speech waveform.
Figure 2 is a schematic diagram of the waveform when a typical language learning machine slows its motor speed.
Figure 3 is a schematic diagram of a waveform sampled with 2 bits.
Figure 4 is a schematic diagram of the waveform of Figure 3 at an increased sampling rate.
Figure 5 is a schematic diagram of a waveform sampled with 3 bits.
Figure 6 is a schematic diagram of a waveform sampled with 8 bits.
Figure 7 is a schematic diagram of a waveform sampled at 22 kHz.
Figures 8 and 9 are schematic diagrams of speech waveforms after segmentation.
Figure 10 is a schematic diagram of the speech waveform produced by the present invention.
Figure 11 is a schematic diagram of the waveform when the present invention speeds up the speech.
Figures 12 and 13 are reference tables for selecting segments at different speed adjustments.

Because the segmentation process requires the analog speech signal to be converted into a digital one, the sampling conversion is explained first. The sampling rate is the number of sampling points per unit time (per second). Take a sine wave as an example. When a sine wave signal is digitized with 2 bits per sample, as shown in Figure 3, there are 2^2 = 4 levels, numbered 0 to 3, and level 2 serves as the zero-crossing point (the reference point at the center of the waveform, or the baseline when there is no speech signal). The waveform formed by the straight line segments in the figure is the reproduced waveform, which differs considerably from the original sine wave. As shown in Figure 4, increasing the sampling rate still does not give sufficient accuracy.
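The relationship between bit depth, number of levels, and the zero-crossing reference level can be illustrated with a short sketch. This is not code from the patent: the function names and the choice of scaling the sine wave to just under half the level range are assumptions made for illustration.

```python
import math


def quantization_levels(bits):
    """Return (number_of_levels, zero_level) for unsigned samples of this bit depth."""
    levels = 2 ** bits          # e.g. 2 bits -> 4 levels (0..3)
    zero_level = levels // 2    # middle level used as the zero-crossing reference
    return levels, zero_level


def quantize_sine(bits, samples_per_cycle):
    """Quantize one cycle of a sine wave to unsigned integer levels (illustrative)."""
    levels, zero_level = quantization_levels(bits)
    amplitude = zero_level - 1  # assumed scaling so the wave stays inside 0..levels-1
    out = []
    for n in range(samples_per_cycle):
        x = math.sin(2 * math.pi * n / samples_per_cycle)
        q = int(round(zero_level + amplitude * x))
        out.append(max(0, min(levels - 1, q)))  # clamp to the valid level range
    return out


if __name__ == "__main__":
    for bits in (2, 3, 8):
        print(bits, quantization_levels(bits))
    print(quantize_sine(2, 4))
```

With only 2 bits the reproduced sine wave collapses to a handful of levels around the reference level, which is the coarse shape the description attributes to Figure 3.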

Therefore, as shown in Figure 5, the sample size can be increased to 3 bits. Since 2^3 = 8, there are 8 levels, numbered 0 to 7, with level 4 as the zero-crossing point, and the reproduced wave is closer to the original sine wave. In general, as shown in Figure 6, 8 bits per sample are used, which divides the speech amplitude into 256 levels, numbered 0 to 255, with level 128 as the zero-crossing point. Ordinary music contains many high-frequency instruments and therefore needs a higher sampling rate. The hearing range of the human ear is 20 Hz to 20 kHz, while speech frequencies lie below 1 kHz, so a sampling rate above 2 kHz should be sufficient for speech to be recognized (as shown in Figure 7).

To slow speech down without changing its frequency, the sentences of the speech data can be divided into a number of small segments. Assuming the waveform changes little within each segment, playing each segment twice slows the speech to half its original speed. Because the sentence is cut and rearranged, however, gaps appear at the joints between adjacent segments, as shown in Figures 8 and 9, and the adjusted speech acquires a tremor. To avoid the tremor, the gaps must first be removed. Speech signals are quite complex, though, so extracting exactly one complete period from each segment is very difficult. The frequency of speech also changes constantly, and the wavelength changes with it, so a fixed extraction length is hard to use. The present invention therefore remedies these shortcomings with the following steps:

1. Divide the digitized speech data into segments of 128 points each.
2. Find the location of the maximum value in each segment, i.e. the highest point.
3. Search downward from the highest point for the zero-crossing point, and record the position of that zero-crossing point.
4. Segment the data using the zero-crossing point at the tail of the highest waveform as the cutting point. This eliminates the gaps at the joints between segments: the values at the joints all lie near the zero-crossing level and their slopes are all negative, so the line segment at each joint is comparatively gentle.
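The four steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes unsigned 8-bit samples with 128 as the zero-crossing level, and it interprets "searching downward from the highest point" as walking forward in time along the falling edge of the peak until the signal first reaches or drops below that level. The function names `find_cut_points` and `split_at_cuts` are hypothetical.

```python
def find_cut_points(samples, segment_len=128, zero_level=128):
    """Return one cut index per fixed-length segment (steps 1 to 3).

    Each cut is the first sample at or below zero_level that follows the
    segment's maximum, so every cut lies on a falling (negative-slope) edge.
    """
    cuts = []
    for start in range(0, len(samples), segment_len):      # step 1: fixed segments
        end = min(start + segment_len, len(samples))
        peak = max(range(start, end), key=lambda i: samples[i])  # step 2: highest point
        i = peak
        while i < len(samples) and samples[i] > zero_level:      # step 3: walk down the tail
            i += 1
        # Keep the cut only if a crossing was found and it advances past the last cut.
        if i < len(samples) and (not cuts or i > cuts[-1]):
            cuts.append(i)
    return cuts


def split_at_cuts(samples, cuts):
    """Step 4: re-segment the data at the recorded zero-crossing positions."""
    bounds = [0] + list(cuts) + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

Because every piece produced by `split_at_cuts` begins at a sample near the zero-crossing level on a falling edge, repeating or dropping pieces joins waveforms with similar value and slope, which is the property step 4 relies on.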

The waveform joined in this way is smooth, with few protrusions, as shown in Figure 10.

The appropriate number of points per segment is determined by the sampling rate. At a 22 kHz sampling rate, 1024 points per segment produces an obvious echo in the slowed sound; 512 points produces less echo; 256 or 128 points produces very little distortion; and 64 points produces obvious noise. At 22 kHz, therefore, each segment may contain from 128 to 256 points, whereas at a sampling rate of 5.5 kHz each segment should contain 32 to 64 points.

With the segmentation method described above, speech can be sped up at a fixed frequency by keeping only the odd-numbered segments (the first, third, fifth, and so on), as shown in Figure 11. If other speeds between twice as fast and twice as slow are wanted, the speech segments can be selected as shown in Figure 12. In that table, the numbers in the third column are the serial numbers of the cut speech segments: circled numbers are segments that are removed, and uncircled numbers are segments that are selected. The value in the second column is the ratio of the number of segments played to the number of segments in the original cut speech. To simplify programming, Figure 12 can be summarized as the table in Figure 13, in which the index symbol denotes the serial number of the cut speech segment.
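The segment selection described above can be sketched as a single function driven by the ratio of played segments to original segments. The tables of Figures 12 and 13 are not reproduced in this text, so the selection rule below is a hypothetical stand-in rather than the patent's table. It was chosen only because it reproduces the two cases the description does state: a ratio of 2 plays every segment twice, and a ratio of 1/2 keeps only the odd-numbered segments. The name `select_segments` and its parameters are assumptions.

```python
from fractions import Fraction
from math import ceil


def select_segments(segments, played, original):
    """Repeat or drop segments so that `played` are output per `original` input.

    Hypothetical rule: segment k (0-based) is emitted
    ceil((k + 1) * r) - ceil(k * r) times, where r = played / original.
    """
    ratio = Fraction(played, original)  # exact arithmetic avoids float rounding
    out = []
    for k, seg in enumerate(segments):
        copies = ceil((k + 1) * ratio) - ceil(k * ratio)
        out.extend([seg] * copies)
    return out


if __name__ == "__main__":
    segs = ["s1", "s2", "s3", "s4"]
    print(select_segments(segs, 2, 1))  # each segment twice: half speed
    print(select_segments(segs, 1, 2))  # odd-numbered segments only: double speed
    print(select_segments(segs, 3, 4))  # an intermediate speed
```

The segments passed in would be the pieces cut at zero-crossing points, so the selected pieces can be concatenated directly for playback.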


Claims

1. A method for adjusting speech speed at a fixed frequency, mainly comprising:
dividing the speech data into several small segments, each containing a suitable number of points;
finding the highest point in each segment;
searching downward from the highest point for the zero-crossing point, and recording the position of the zero-crossing point;
segmenting with the zero-crossing point at the tail of the highest point as the cutting point, so as to eliminate the gap at the joint between adjacent speech segments;
taking only the odd-numbered segments, so that the speech is sped up at the fixed frequency;
taking each segment twice in succession during playback, so that the speech is slowed to half speed at the fixed frequency; and
selecting speech segments at different intervals to obtain different playback speeds.

2. The method as described in claim 1, wherein 22 kHz is used as the sampling rate of the speech data.

3. The method as described in claim 1, wherein each segment of the speech data contains 128 or 256 points.