1278219

IX. Description of the Invention:

[Technical Field]

The present invention relates to a system and method for adjusting voice speed in Internet telephony, and in particular to a system and method that process the real-time voice signal so that a user can choose the listening speed according to personal preference.

[Prior Art]

Traditionally, voice is transmitted by first establishing a dedicated circuit and then conducting the call over it. The drawback is that the transmission resource remains occupied, and the line cannot be used by anyone else until one party hangs up. With Voice over Internet Protocol (VoIP) technology, the voice is first digitized, the digital audio data is cut into many small units, and each unit is given an IP (Internet Protocol) header to form a packet.

Once these packets are sent onto an IP-based data network, a suitable transmission path can be chosen according to network conditions; on arrival at the destination, the packets are merged again and restored to the original sound. With this technique, voice packets can be sent anywhere in the world over the Internet without using the traditional Public Switched Telephone Network (PSTN) at all.

Early VoIP technology was rather crude and quite restricted in use. For example, an ordinary telephone could not be used directly for a VoIP call; only a computer could serve as the communication device. Sound quality was also unstable and depended entirely on how congested the Internet was at the time. Nevertheless, because VoIP allows communication with any part of the world without long-distance charges, more and more people adopted it.

In 1998, switches capable of integrating telecommunications networks were introduced, and VoIP formally entered the stage of integration with the traditional public switched telephone network.

With VoIP, telecommunications operators can use the Internet as the transmission backbone for long-distance calls, which lowers long-distance charges further. As VoIP has matured, many multinational companies have abandoned traditional long-distance calls and built internal voice transmission networks. In addition, telecommunications liberalization policies in various regions and countries have given rise to small telecommunications companies that use VoIP technology flexibly.

In the past, a telephone conversation was often hard to follow because of external noise or because the other party habitually spoke too fast, or a language barrier forced the other party to repeat the same words again and again before they were understood. Voice carried in packets, in particular, is often intermittent for reasons such as insufficient bandwidth. The problem is therefore how to adjust the speed at which a conversation is received so that the voice is conveyed clearly and is easy to listen to, in particular by slowing reception slightly or by shortening the silent intervals in the conversation, so that each party to a VoIP call can set the voice speed to personal preference and the listener can better understand conversations in different languages and at different speeds.

[Summary of the Invention]

In view of the above problems, the main object of the present invention is to provide a system and method for adjusting voice speed in Internet telephony, in which the voice signal received over the Internet telephone is adjusted before it is output, so that the user at the receiving end obtains a better listening result, and the calling end can also be notified that the receiving end is adjusting the voice speed and can respond appropriately to the prompt.
Accordingly, to achieve the above object, the system for adjusting voice speed in Internet telephony disclosed by the present invention comprises at least: a setting module for receiving the voice speed adjustment parameters set by the user; a transmission module for receiving the compressed and encoded voice data packets sent by the calling end and for sending a voice speed adjustment prompt signal to the calling end; a temporary memory for storing the voice signal sent by the calling end; a central processing unit for performing the computation for voice speed adjustment; a prompt module for giving a prompt according to the prompt signal; an adjusting module for compressing and decompressing the voice signal and for adjusting the voiceprint signal of each unit according to the voice speed adjustment parameters; and an output module for playing the adjusted voice signal.

In accordance with the object of the invention and to achieve the above advantages, the method of the present invention comprises the following steps. After the voice adjustment function is enabled, the user's voice speed settings are first received. Next, a prompt that the voice adjustment function has been enabled is sent to the calling end. Then, according to the adjustment parameters, the voiceprint signal of each unit of the voice signal held in the temporary memory of the receiving end or of the calling end is adjusted. Finally, the adjusted voice signal is output.

The features and implementation of the present invention are described in detail below through a preferred embodiment, with reference to the drawings.
[Embodiment]

The present invention discloses a system and method for adjusting voice speed in Internet telephony (VoIP, Voice over Internet Protocol). The following detailed description sets out numerous specific details in order to provide a thorough explanation of the invention. Those skilled in the art, however, can practice the invention without these specific details, or by using alternative elements or methods. In other instances, well-known methods, procedures, components, and circuits are not described in detail, so as not to obscure the focus of the invention unnecessarily.

Refer to Fig. 1, a block diagram of the system of the present invention, which comprises the following elements.

The temporary memory 110 is a random access memory (RAM), for example dynamic random access memory (DRAM), EDO DRAM (Extended Data Out DRAM), RDRAM (Rambus DRAM), SDRAM (Synchronous Dynamic RAM), VCM SDRAM (Virtual Channel Memory SDRAM), or double data rate (DDR) SDRAM, which has recently become the market mainstream. It temporarily stores the audio stream data received by the transmission module 140.

The transmission module 140 receives the voice packets sent by the calling end, and can also receive and send the prompt concerning the voice adjustment speed set at the receiving end. When voice data is cut into multiple packets for transmission, the network address of the receiving end and the information needed to reassemble the voice data are added to each header, to ensure that the data is correct. An important standard that a VoIP service therefore needs is a signaling protocol, which creates the connection between the software and hardware of clients on the network. The main functions of call setup and control are user address lookup, address translation, connection establishment, negotiation of service features, call termination, and management of call participants.
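The packet handling described for the transmission module 140 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `VoicePacket` fields, the function names, and the 160-byte unit size are all assumptions chosen for the example, and the header here carries only a destination address and a sequence number as the "information needed to reassemble the voice data".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoicePacket:
    dest_addr: str   # network address of the receiving end (hypothetical field)
    seq: int         # position of this unit in the stream, used for reassembly
    payload: bytes   # one small unit of digitized voice data


def packetize(voice: bytes, dest_addr: str, unit_size: int = 160) -> List[VoicePacket]:
    """Cut digitized voice into small units and attach a header to each unit."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    units = [voice[i:i + unit_size] for i in range(0, len(voice), unit_size)]
    return [VoicePacket(dest_addr, seq, unit) for seq, unit in enumerate(units)]


def reassemble(packets: List[VoicePacket]) -> bytes:
    """Restore the original byte stream even if packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

Because each packet travels independently, the sequence number is what lets the receiving end put the units back in order before they are stored in the temporary memory 110.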
Organizations that set VoIP standards include the ITU-T, the Internet Engineering Task Force (IETF), and the European Telecommunications Standards Institute (ETSI). Two notable standards for IP telephony signaling are the ITU's H.323 series and the IETF's Session Initiation Protocol (SIP), which was originally developed for holding multimedia conferences over the Internet. H.323 and SIP represent two different solutions to the same problem. In addition, two further protocols are considered part of the SIP architecture: the Session Description Protocol (SDP) and the Session Announcement Protocol (SAP).
Call setup and control in VoIP are mostly built on TCP, while the audio stream is carried over UDP. To keep transmission real-time, the IETF has added important protocols such as RSVP (Resource Reservation Protocol). Generally speaking, reserving enough bandwidth on the Internet for multimedia transmission is very difficult, so the IETF defined RSVP, which lets a receiver request a specific amount of bandwidth for data transmission and thereby obtain a QoS (Quality of Service) guarantee.

The setting module 120 receives the settings that the user enters from a keyboard or another input device. For example, from the keyboard the user can enable the voice adjustment function, choose the multiple by which the voice is sped up or slowed down, and enable the prompt function. The module passes the resulting parameters to the central processing unit 160 for the subsequent adjustment.
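The parameters collected by the setting module 120 could be represented along the following lines. This is only a sketch: the class name `SpeedSettings`, its field names, and the restriction to integer multiples are assumptions made for illustration, since the text specifies only that the user chooses whether adjustment is enabled, a speed-up or slow-down multiple, and whether the prompt is enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedSettings:
    enabled: bool = False        # voice adjustment function on or off
    slow_down: bool = True       # True: slow playback down, False: speed it up
    factor: int = 2              # multiple chosen by the user (assumed integer)
    notify_caller: bool = True   # whether the prompt function is enabled

    def __post_init__(self) -> None:
        if self.factor < 1:
            raise ValueError("factor must be a positive integer")

    def playback_rate(self) -> float:
        """Playback rate relative to normal speed (1.0 means unchanged)."""
        if not self.enabled:
            return 1.0
        return 1.0 / self.factor if self.slow_down else float(self.factor)
```

For example, `SpeedSettings(enabled=True, slow_down=False, factor=2).playback_rate()` evaluates to `2.0`, the double-speed case used later in the description.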
The prompt module 130 operates at the calling end: when the receiving end enables the voice adjustment function and the prompt signal set by the receiving end is received, the module indicates to the caller that voice adjustment has been enabled. The indication may be shown on the screen, given by a specific indicator light, or given by a sound effect, among other ways.

The adjusting module 150 operates when the voice adjustment function is enabled. At the receiving end, it takes the voice signal delivered by the transmission module 140 and, according to the speed parameter set by the user, duplicates a corresponding number of voiceprint-signal data units. Alternatively, at the calling end, after the caller's microphone picks up the user's voice and it is converted into a digital voice signal, the module duplicates the voice signal according to the multiple given by the speed parameter sent from the receiving end. As a further alternative, when compressing and encoding the voice signal, the module can add the number of copies to the packets to be sent, so that the receiving end adjusts according to that number when it reassembles the packets, thereby achieving the voice adjustment.

The central processing unit 160 encodes the voice signal. Using digital signal processing (DSP), voice coding, and voice compression techniques, it encodes the sound signal into a digital voice signal, compresses it, and divides the data into packets (packetization); each packet is transmitted independently over the data network.
At the receiving end, the received packets are reassembled (packet assembly), the packet format is removed (de-packetization), and the data is decompressed; the digital voice signal is then converted back into an analog signal, completing the voice transmission.

When the user wants to speed up voice playback, the central processing unit 160 processes the received voice signal according to the chosen speed-up multiple. At double speed, one of every two consecutive voiceprint signals is discarded, so the amount of voice data to be played is halved and overall playback is faster. Likewise, when the user slows playback by a factor of two, each voiceprint signal is duplicated once and the silent interval between sentences is shortened; the overall playback time may be extended where necessary.

The output module 170 here refers to the amplifier of the VoIP device, used to play the digital voice signal. Refer to Fig. 2a, a schematic diagram of voice playback at normal speed. Suppose the received voice signal contains three sentences, "Hello", "This is Smith" and "Is Mr. Murakami there?", with a silent interval after each sentence. When the user slows playback, as shown in Fig. 2b, the adjusting module 150 duplicates voiceprint signals according to the speed parameter set by the user, so the interval between sentences is shortened, and the total playback time of the three sentences may even be longer than normal. Because the prompt module 130 at the calling end indicates that the receiving end has enabled voice adjustment, the caller knows that the receiving party's reply will come more slowly than in a normal conversation.
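The unit-level adjustment described above (discarding one of every two units for double speed, duplicating each unit for half speed, and trimming the silence between sentences) can be sketched as follows. The function names are hypothetical, a "unit" stands for one voiceprint-signal data unit of whatever size the system uses, and the `is_silent` predicate is an assumption, since the text does not say how silent intervals are detected.

```python
from typing import Callable, List, Sequence, TypeVar

Unit = TypeVar("Unit")


def speed_up(units: Sequence[Unit], factor: int) -> List[Unit]:
    """Keep one unit out of every `factor` consecutive units."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return list(units[::factor])


def slow_down(units: Sequence[Unit], factor: int) -> List[Unit]:
    """Play every unit `factor` times in a row (factor 2 copies each unit once)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [unit for unit in units for _ in range(factor)]


def shorten_silence(units: Sequence[Unit],
                    is_silent: Callable[[Unit], bool],
                    max_run: int) -> List[Unit]:
    """Keep at most `max_run` consecutive silent units; drop the rest of each run."""
    kept: List[Unit] = []
    run = 0
    for unit in units:
        run = run + 1 if is_silent(unit) else 0
        if run <= max_run:
            kept.append(unit)
    return kept
```

With four units `["a", "b", "c", "d"]`, `speed_up(..., 2)` keeps `["a", "c"]`, halving the data to be played, while `slow_down(["a", "b"], 2)` yields `["a", "a", "b", "b"]`.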
Similarly, when the user at the receiving end speeds up playback, as shown in Fig. 2c, each sentence takes less time to play after adjustment according to the speed-up parameter; because the start time of each sentence is maintained, the silent interval between sentences becomes longer.

Refer to Fig. 3, a flowchart of the method of the present invention for voice adjustment at the receiving end. After the user enables the voice adjustment function at the VoIP receiving end, the setting module 120 at the receiving end first receives the user's playback settings for voice adjustment, namely whether to speed up or slow down, and the speed parameter (step 310). Next, the transmission module 140 sends a prompt to the calling end that the voice adjustment function has been enabled (step 320); the prompt module 130 at the calling end can alert the caller with a message, an indicator light, or a sound.

The transmission module 140 reassembles the received voice packets into a digital voice signal and first stores it in the temporary memory 110. The adjusting module 150 then adjusts the voiceprint signals of the voice signal in the temporary memory 110 one by one according to the speed parameter (step 330). For example, when the speed parameter is double speed, one unit out of every two consecutive voiceprint signals is discarded, so the amount of voice data to be played is halved and overall playback is faster. Finally, the output module 170 outputs the adjusted digital voice signal (step 340); once the transmission module 140 has received the data, the user at the receiving end hears the caller's adjusted voice through the output module 170.

With the method disclosed in the present invention, a user making a VoIP call can easily adjust the playback speed of the received voice signal according to the caller's speaking rate and the user's own needs, and the calling end is informed that the receiving end has enabled the voice adjustment function.
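Steps 310 to 340 can be read as a small receiving-end pipeline. The sketch below is an illustration under assumptions: the function name, the callback parameters, and the prompt text are hypothetical, and the sending, adjusting, and playing operations are passed in as plain callables rather than modelled as the actual modules 140, 150, and 170.

```python
from typing import Callable, List, Sequence


def receive_with_adjustment(
    units: Sequence[bytes],
    speed_param: int,
    send_prompt: Callable[[str], None],
    adjust: Callable[[List[bytes], int], List[bytes]],
    play: Callable[[bytes], None],
) -> None:
    # Step 310: the speed parameter comes from the user's settings (passed in here).
    # Step 320: tell the calling end that voice adjustment is active.
    send_prompt("voice adjustment enabled")
    # Temporary memory 110: buffer the reassembled voice units before adjusting.
    buffer: List[bytes] = list(units)
    # Step 330: adjust the buffered units according to the speed parameter.
    adjusted = adjust(buffer, speed_param)
    # Step 340: output the adjusted voice signal.
    for unit in adjusted:
        play(unit)


# Usage: double speed, keeping every second unit as in the example above.
prompts: List[str] = []
played: List[bytes] = []
receive_with_adjustment(
    units=[b"u0", b"u1", b"u2", b"u3"],
    speed_param=2,
    send_prompt=prompts.append,
    adjust=lambda buffered, n: buffered[::n],
    play=played.append,
)
```

Passing the adjustment in as a callable keeps the flow the same whether the chosen setting speeds playback up or slows it down.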
Although the present invention is disclosed above by way of the foregoing preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of patent protection is defined by the claims appended to this specification.

[Brief Description of the Drawings]

Fig. 1 is a system architecture diagram of the present invention;
Fig. 2a is a schematic diagram of normal playback according to the present invention;
Fig. 2b is a schematic diagram of slowed playback according to the present invention;
Fig. 2c is a schematic diagram of accelerated playback according to the present invention; and
Fig. 3 is a flowchart of the method of an embodiment at the receiving end of the present invention.

[Description of Main Reference Numerals]

110 Temporary memory
120 Setting module
130 Prompt module
140 Transmission module
150 Adjusting module
160 Central processing unit
170 Output module
Step 310 Receive the user's playback settings for voice adjustment
Step 320 Send a prompt to the calling end that the voice adjustment function is enabled
Step 330 Duplicate a different number of voiceprint-signal data units according to the voice adjustment speed parameter
Step 340 Output the adjusted voice signal