TWI790705B - Method for adjusting speech rate and system using the same - Google Patents
- Publication number: TWI790705B
- Authority: TW (Taiwan)
- Prior art keywords
- signal
- original
- unit
- speech
- frame
Landscapes
- Television Signal Processing For Recording (AREA)
- Mobile Radio Communication Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Description
The present invention relates to speech processing technology, and in particular to a speech rate adjustment method and a system using the same.
Audio description of audiovisual content uses a dedicated audio track to narrate events such as character actions and scene changes. Audio description makes visual content more accessible to people who are blind, have low vision, or have other visual impairments.
Audio descriptions are expensive and cumbersome to create. Traditionally, producers of audiovisual content hire scriptwriters and voice talent to create them. In this traditional approach, the scriptwriter watches the content to find the video segments that need audio description, determines the time points at which the audio description is to be inserted, estimates the time available for it, and writes a descriptive audio script based on the content of those segments. The voice talent then records, from the script, an audio description that fits the available time. Under the constraint of the available time, the scriptwriter and the voice talent usually have to repeat this process several times to optimize the resulting audio description. For example, the scriptwriter may change the video segment that needs audio description to obtain a new available time, the scriptwriter may rewrite the script to fit a shorter available time, or the voice talent may repeatedly adjust the speaking speed to fit the available time. Because of these challenges, traditional audio description services are quite expensive.
An embodiment of the present invention provides a speech rate adjustment method that includes: obtaining an original speech signal of a plurality of words and a total adjustment duration; analyzing the original speech signal to obtain a voiced signal segment and an unvoiced signal segment corresponding to each word; calculating a frame adjustment amount according to the total adjustment duration and a unit frame duration; and adjusting the number of frames in at least one voiced signal segment according to the frame adjustment amount to form an adjusted speech signal.
Another embodiment of the present invention provides a speech rate adjustment system that includes a storage unit, an analysis unit, and an adjustment unit. The analysis unit is coupled to the storage unit, and the adjustment unit is coupled to the storage unit and the analysis unit. The storage unit temporarily stores an original speech signal of a plurality of words. The analysis unit analyzes the original speech signal to obtain a voiced signal segment and an unvoiced signal segment corresponding to each word. The adjustment unit calculates a frame adjustment amount according to a total adjustment duration and a unit frame duration, and adjusts the number of frames in at least one voiced signal segment according to the frame adjustment amount to form an adjusted speech signal.
In summary, the speech rate adjustment method of any embodiment is suitable for adjusting the speech rate of a speech signal so as to provide an audio file that satisfies a time limit, thereby reducing the number of re-recordings and substantially reducing the production cost of the audio file.
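Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a speech rate adjustment system 10 that includes a storage unit 110, an analysis unit 120, and an adjustment unit 130. The analysis unit 120 is coupled to the storage unit 110, and the adjustment unit 130 is coupled to the storage unit 110 and the analysis unit 120. Here, the storage unit 110 temporarily stores an original speech signal of at least one word. The present invention also provides a speech rate adjustment method that can be implemented by the speech rate adjustment system. For clarity, the following description uses an original speech signal of a plurality of words as an example.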
Here, the speech rate adjustment method is applicable to adjusting the speech rate of the pronunciation of a single word or a sentence, or to adjusting the speech rate of the audio description of audiovisual content. The technical content of the speech rate adjustment method is detailed below.
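In an embodiment, the analysis unit 120 obtains and analyzes the original speech signal Si (step S21) to obtain a voiced sound signal segment and an unvoiced sound signal segment corresponding to each word (step S22). The adjustment unit 130 obtains the total adjustment duration N1 (step S21), calculates the number of frames to be removed (that is, the frame adjustment amount) according to the total adjustment duration N1 and the unit frame duration (step S23), and adjusts the number of frames of at least one of the voiced signal segments of these words according to the frame adjustment amount to form an adjusted speech signal So (step S24). A speech frame is the smallest signal segment used in speech signal processing, and the unit frame duration is the time length of one such smallest signal segment.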
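Step S23 can be illustrated with a small calculation. The following is a minimal sketch, not the patented implementation: the function name `frame_adjustment_amount`, the rounding to the nearest whole frame, and the 20 ms frame duration used in the example are all assumptions, since the text only states that the amount is computed from the total adjustment duration and the unit frame duration.

```python
def frame_adjustment_amount(total_adjust_s: float, unit_frame_s: float) -> int:
    """Number of frames to remove or insert for a given duration change.

    total_adjust_s: total adjustment duration N1 in seconds (sign ignored).
    unit_frame_s:   duration of one speech frame in seconds.
    Rounding to the nearest frame is an assumption of this sketch.
    """
    if unit_frame_s <= 0:
        raise ValueError("unit frame duration must be positive")
    return round(abs(total_adjust_s) / unit_frame_s)


# Hypothetical example: shorten the signal by 2.0 s with 20 ms frames.
frames_to_remove = frame_adjustment_amount(2.0, 0.02)
```

With these illustrative numbers the signal has to lose 100 frames in total, which are then distributed over the voiced signal segments.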
In some embodiments, the original speech signal Si includes the word sound signals Sp1~Sp7 of the words Wo, as shown in FIG. 3. Each of the word sound signals Sp1~Sp7 includes a plurality of frames (hereinafter referred to as original frames).
In linguistics, a sound produced with vocal cord vibration is called a voiced sound, and a sound produced without vocal cord vibration is called an unvoiced sound. There are also consonants, which include both unvoiced and voiced sounds. In the present case, for a word that contains a consonant, the voiced and unvoiced sounds can first be distinguished, and the speech rate is then adjusted.
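In some embodiments of step S22, the analysis unit 120 analyzes the original speech signal Si to find the word sound signal Sp1~Sp7 of each word Wo in the original speech signal Si (that is, the signal segment corresponding to each word), and then analyzes the word sound signal Sp1~Sp7 of each word Wo to find, in each word sound signal Sp1~Sp7, the voiced signal segment (the signal segment corresponding to voiced pronunciation) and the unvoiced signal segment (the signal segment corresponding to unvoiced pronunciation). For example, referring to FIG. 3, the analysis unit 120 converts the original speech signal Si from a time-versus-amplitude speech waveform F1 into a time-versus-frequency sound spectrum F2, and identifies the word sound signal Sp1~Sp7 of each word Wo according to the energy distribution. In FIG. 3, for the speech waveform F1, the horizontal axis is time (seconds) and the vertical axis is amplitude (decibels); for the sound spectrum F2, the horizontal axis is time (seconds) and the vertical axis is frequency (hertz, Hz). The analysis unit 120 then identifies the voiced signal segment Z2 and the unvoiced signal segment Z1 according to the energy distribution in the word sound signal Sp of each word Wo.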
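The text does not specify how the energy distribution is evaluated, so the following sketch swaps in a common time-domain heuristic instead of the spectrum-based analysis of FIG. 3: a frame with very low energy is treated as silence, and the remaining frames are labelled voiced or unvoiced by their zero-crossing rate. The function names, the thresholds, the 160-sample frame length, and the synthetic test signal are all assumptions made for illustration.

```python
import math
from itertools import groupby


def split_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def classify_frame(frame, silence_energy=1e-4, max_voiced_zcr=0.25):
    """Label one frame as 'silence', 'voiced' or 'unvoiced' (heuristic)."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < silence_energy:
        return "silence"
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    # Voiced speech is quasi-periodic and low-frequency, so its ZCR is low.
    return "voiced" if zcr <= max_voiced_zcr else "unvoiced"


def label_segments(labels):
    """Group consecutive equal labels into (label, first_frame, end_frame)."""
    segments, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((label, start, start + n))
        start += n
    return segments


def demo_signal(rate=8000, frame_len=160):
    """Synthetic stand-in: 2 noisy frames, 3 frames of a 100 Hz tone, 1 silent frame."""
    unvoiced = [0.1 if n % 2 == 0 else -0.1 for n in range(2 * frame_len)]
    voiced = [0.5 * math.sin(2 * math.pi * 100 * n / rate)
              for n in range(3 * frame_len)]
    return unvoiced + voiced + [0.0] * frame_len


labels = [classify_frame(f) for f in split_frames(demo_signal(), 160)]
segments = label_segments(labels)
```

On real recordings a heuristic like this needs tuning per speaker and microphone; it is shown only to make the notion of per-frame voiced and unvoiced segments concrete.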
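In an embodiment of step S23, the adjustment unit 130 removes the frame adjustment amount of original frames from the voiced signal segment Z2 by removing one original frame at a fixed interval, so as to form an adjusted speech signal So whose speech rate is faster than that of the original speech signal Si. The adjustment unit 130 performs this frame deletion on voiced signal segments Z2 whose total number of frames is greater than the frame adjustment amount.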
For example, assume that the frame adjustment amount is 20 original frames to be deleted per word. When the voiced signal segment Z2 of the word sound signal Sp1 of a word Wo has 100 original frames, the adjustment unit 130 deletes one original frame at every interval of five original frames in this voiced signal segment Z2 (100 / 20 = 5).
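The fixed-interval deletion in this example can be sketched as follows. This is an illustrative reading, not the patent's code: the function name `remove_frames` is hypothetical, and the choice to drop the last frame of each interval (the 5th, 10th, and so on) is an assumption, because the text only fixes the interval.

```python
def remove_frames(frames, n_remove):
    """Drop n_remove frames from a voiced segment at a fixed interval."""
    if n_remove <= 0:
        return list(frames)
    if n_remove >= len(frames):
        # The text only deletes from segments longer than the adjustment amount.
        raise ValueError("segment is not longer than the frame adjustment amount")
    interval = len(frames) // n_remove      # e.g. 100 // 20 = 5
    kept, removed = [], 0
    for i, frame in enumerate(frames):
        # Assumption: the last frame of every interval is the one dropped.
        if removed < n_remove and (i + 1) % interval == 0:
            removed += 1
            continue
        kept.append(frame)
    return kept


# The example from the text: 100 original frames, 20 to delete.
shortened = remove_frames(list(range(100)), 20)
```

Here integer indices stand in for frames, so the result shows directly which positions survive.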
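In some embodiments, the adjustment unit 130 calculates the frame adjustment amount of original frames to be removed for each word Wo according to the total adjustment duration, the unit frame duration, and the processing count (that is, the number of word sound signals among Sp1~Sp7 that have a voiced signal segment Z2), and then deletes the frame adjustment amount of original frames from each word sound signal that has a voiced signal segment Z2. In some embodiments, after the frame adjustment amount is calculated, the adjustment unit 130 first confirms whether the number of frames in the voiced signal segment Z2 of every word sound signal that has a voiced signal segment Z2 is greater than the current frame adjustment amount. Only when all of them are greater does the adjustment unit 130 perform frame deletion. Otherwise, the adjustment unit 130 excludes the word sound signals whose frame count is smaller, obtains a new processing count, and recalculates the frame adjustment amount.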
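The check-and-recalculate procedure can be written as a short loop. The sketch below is one possible reading under stated assumptions: the function name `per_word_adjustment` is hypothetical, the per-word amount is rounded up, and a segment whose length merely equals the amount is also excluded, since the text requires the frame count to be greater.

```python
import math


def per_word_adjustment(total_frames, voiced_lengths):
    """Return (frames to delete per word, indices of the words to process).

    total_frames:   frames to delete overall (total duration / frame duration).
    voiced_lengths: voiced-segment frame count per word, 0 if the word has none.
    """
    eligible = [i for i, n in enumerate(voiced_lengths) if n > 0]
    while eligible:
        per_word = math.ceil(total_frames / len(eligible))  # rounding is assumed
        too_short = {i for i in eligible if voiced_lengths[i] <= per_word}
        if not too_short:
            return per_word, eligible
        # Exclude short segments and recompute with the new processing count.
        eligible = [i for i in eligible if i not in too_short]
    raise ValueError("no voiced segment is long enough for this adjustment")


# Hypothetical example: 60 frames to delete across four words.
amount, words = per_word_adjustment(60, [100, 10, 100, 0])
```

In the example, the first pass asks for 20 frames from each of three words, the 10-frame segment fails the check, and the second pass asks for 30 frames from each of the two remaining words.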
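In another embodiment of step S23, the adjustment unit 130 inserts the frame adjustment amount of frames (hereinafter referred to as supplementary frames) into the voiced signal segment Z2 by inserting one supplementary frame at a fixed interval, so as to form an adjusted speech signal So whose speech rate is slower than that of the original speech signal Si.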
For example, assume that the frame adjustment amount is 20 supplementary frames to be added per word. When the voiced signal segment Z2 of the word sound signal Sp1 of a word Wo has 100 original frames, the adjustment unit 130 inserts one supplementary frame for every five original frames in this voiced signal segment Z2.
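In some embodiments, the inserted supplementary frame is related to at least one original frame adjacent to the insertion position in the voiced signal segment Z2. In one embodiment, the inserted supplementary frame is the average of the original frames adjacent to the insertion position in the voiced signal segment Z2. Continuing the previous example, the adjustment unit 130 inserts, between the 5th and 6th original frames of the voiced signal segment Z2, a supplementary frame obtained by averaging the 5th and 6th original frames. In another embodiment, the inserted supplementary frame is the original frame immediately before the insertion position in the voiced signal segment Z2. Continuing the previous example, the adjustment unit 130 inserts, between the 5th and 6th original frames of the voiced signal segment Z2, a supplementary frame obtained by copying the 5th original frame. In yet another embodiment, the inserted supplementary frame is the original frame immediately after the insertion position in the voiced signal segment Z2. Continuing the previous example, the adjustment unit 130 inserts, between the 5th and 6th original frames of the voiced signal segment Z2, a supplementary frame obtained by copying the 6th original frame. In other words, the adjustment unit 130 may, depending on the actual situation, adjust the interval at which frames are removed or inserted and may remove or insert one or more frames at a time; the present invention is not limited in this respect.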
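The three ways of building a supplementary frame can be compared side by side. The sketch below is illustrative only: frames are modelled as lists of samples, the names `make_supplement` and `insert_frames` and the `mode` strings are hypothetical, and copying the last frame when no following frame exists is an assumption the text does not address.

```python
def make_supplement(prev_frame, next_frame, mode):
    """Build one supplementary frame from the frames around the insertion point."""
    if mode == "average":
        return [(a + b) / 2 for a, b in zip(prev_frame, next_frame)]
    if mode == "previous":
        return list(prev_frame)
    if mode == "next":
        return list(next_frame)
    raise ValueError(f"unknown mode: {mode}")


def insert_frames(frames, n_insert, mode="average"):
    """Insert n_insert supplementary frames into a voiced segment at a fixed interval."""
    if n_insert <= 0:
        return [list(f) for f in frames]
    if n_insert > len(frames):
        raise ValueError("this sketch inserts at most one frame per original frame")
    interval = len(frames) // n_insert      # e.g. 100 // 20 = 5
    out, inserted = [], 0
    for i, frame in enumerate(frames):
        out.append(list(frame))
        if inserted < n_insert and (i + 1) % interval == 0:
            # Assumption: at the very end there is no next frame, so reuse the last one.
            nxt = frames[i + 1] if i + 1 < len(frames) else frame
            out.append(make_supplement(frame, nxt, mode))
            inserted += 1
    return out


# The example from the text: 100 original frames, 20 supplementary frames.
original = [[float(i)] for i in range(100)]
slowed = insert_frames(original, 20, mode="average")
```

Each one-sample frame holds its own index, so the frame inserted between the 5th and 6th original frames is easy to read off for every mode.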
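In some embodiments, the speech rate adjustment system 10 further generates narrated imagery from the adjusted speech signal So and a moving image. The adjusted speech signal corresponds to silent content in the moving image. In some embodiments, the moving image is a silent image sequence (for example, a GIF) or an original audiovisual video (for example, a film or an animation), and the generated narrated imagery is correspondingly an image sequence with sound (for example, a GIF with sound) or a narrated audiovisual video (for example, a narrated film or a narrated animation).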
In some embodiments, the original speech signal Si and the adjusted speech signal So can each be used to provide an event description of the image content (that is, the silent content) of a silent portion of the original audiovisual video. Here, a silent portion is one in which no one is speaking and there is no sound effect that is meaningful to the plot (for example, the sound of a door opening or of a vehicle approaching). In other words, the original speech signal Si and the adjusted speech signal So provide, at different speech rates, a spoken description of events such as character actions and/or scene changes in this silent portion.
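In some embodiments, referring to FIG. 4, the speech rate adjustment system 10 further includes an audiovisual processing unit 180. The audiovisual processing unit 180 is coupled to the storage unit 110 and the adjustment unit 130. Referring to FIG. 4 and FIG. 5, the audiovisual processing unit 180 generates a narrated audiovisual video Vo from the adjusted speech signal So and the original audiovisual video Vi (step S26).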
In some embodiments of step S21, the original speech signal Si and the total adjustment duration N1 are provided by an external device coupled to the speech rate adjustment system 10, and/or are input to the speech rate adjustment system 10 by a user through a user interface.
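In some embodiments, referring to FIG. 4, the speech rate adjustment system 10 further includes a conversion unit 140 and a judgment unit 150. The conversion unit 140 is coupled to the storage unit 110, and the judgment unit 150 is coupled to the conversion unit 140, the analysis unit 120, and the adjustment unit 130.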
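In some embodiments of step S21, referring to FIG. 4 and FIG. 5, the conversion unit 140 receives a description text XL corresponding to the silent content (step S11) and converts the description text XL into the original speech signal Si (step S12). The description text XL records an event description composed of the words Wo, and this event description narrates the silent content in the original audiovisual video Vi.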
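After the conversion, the conversion unit 140 temporarily stores the generated original speech signal Si in the storage unit 110, and the judgment unit 150 compares the total duration of the original speech signal Si with the total duration Tt of the silent content (step S13).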
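In some embodiments, the judgment unit 150 uses the comparison step (step S13) to confirm whether the total duration of the original speech signal Si is greater than the total duration Tt of the silent content (step S14).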
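When the total duration of the original speech signal Si is greater than the total duration Tt of the silent content, the judgment unit 150 calculates the time difference between the total duration of the original speech signal Si and the total duration Tt of the silent content to obtain the total adjustment duration N1 (step S15), and provides it to the adjustment unit 130. The judgment unit 150 also enables the analysis unit 120 to start analyzing the generated original speech signal Si, so that the adjustment unit 130 generates, according to the total adjustment duration N1 and the analysis result, an adjusted speech signal So whose speech rate is faster than that of the original speech signal Si (that is, steps S22~S24 are then performed).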
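When the total duration of the original speech signal Si is not greater than the total duration Tt of the silent content, the judgment unit 150 does not enable the analysis unit 120 (that is, steps S22~S24 are not performed). In this case, if the speech rate adjustment system 10 has the audiovisual processing unit 180, the judgment unit 150 causes the audiovisual processing unit 180 to generate the narrated audiovisual video Vo from the original speech signal Si and the original audiovisual video Vi (step S26).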
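In some embodiments, referring to FIG. 4 and FIG. 6, the judgment unit 150 uses the comparison step (step S13) to confirm whether the total duration of the original speech signal Si is equal to the total duration Tt of the silent content (step S14’).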
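When the total duration of the original speech signal Si is not equal to the total duration Tt of the silent content, the judgment unit 150 calculates the time difference between the total duration of the original speech signal Si and the total duration Tt of the silent content to obtain the total adjustment duration N1 (step S15), and provides it to the adjustment unit 130. The judgment unit 150 also enables the analysis unit 120 to start analyzing the generated original speech signal Si, so that the adjustment unit 130 generates the adjusted speech signal So according to the total adjustment duration N1 and the analysis result (that is, steps S22~S24 are then performed). When the total duration of the original speech signal Si is greater than the total duration Tt of the silent content, the adjustment unit 130 generates an adjusted speech signal So whose speech rate is faster than that of the original speech signal Si (step S24). When the total duration of the original speech signal Si is less than the total duration Tt of the silent content, the adjustment unit 130 generates an adjusted speech signal So whose speech rate is slower than that of the original speech signal Si (step S24).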
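When the total duration of the original speech signal Si is equal to the total duration Tt of the silent content, the judgment unit 150 does not enable the analysis unit 120 (that is, steps S22~S24 are not performed). In this case, if the speech rate adjustment system 10 has the audiovisual processing unit 180, the judgment unit 150 causes the audiovisual processing unit 180 to generate the narrated audiovisual video Vo from the original speech signal Si and the original audiovisual video Vi (step S26).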
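The two comparison variants (step S14 in FIG. 5 and step S14’ in FIG. 6) differ only in whether a speech signal that is too short is slowed down. A minimal sketch of that decision follows; the function name `plan_adjustment`, the `allow_slowdown` flag, and the tolerance used for the equality test are assumptions introduced for illustration.

```python
def plan_adjustment(speech_s, silent_s, allow_slowdown=True, tol=1e-9):
    """Decide the direction and the total adjustment duration N1 (steps S13 to S15).

    speech_s: total duration of the original speech signal Si, in seconds.
    silent_s: total duration Tt of the silent content, in seconds.
    allow_slowdown=False mimics the FIG. 5 flow (speed up only);
    allow_slowdown=True mimics the FIG. 6 flow (speed up or slow down).
    """
    diff = speech_s - silent_s
    if diff > tol:
        return ("faster", diff)        # remove frames from voiced segments
    if diff < -tol and allow_slowdown:
        return ("slower", -diff)       # insert supplementary frames
    return ("none", 0.0)               # use the original speech signal as is


decision = plan_adjustment(6.0, 5.0)
```

In the illustrative call, a 6-second description for a 5-second silent passage yields a speed-up with a total adjustment duration of 1 second.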
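In some embodiments, the audiovisual processing unit 180 combines the speech signal (that is, the original speech signal Si or the adjusted speech signal So) with the original audiovisual video Vi by mixing, replacement, or association, so as to form a narrated audiovisual video Vo in which the speech signal serves as the audio of the silent content.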
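In one embodiment, the audiovisual processing unit 180 receives the original audiovisual video Vi and separates it into an original audio track and a soundless picture video. The audiovisual processing unit 180 then mixes the original audio track with the speech signal (that is, the original speech signal Si or the adjusted speech signal So) to form an adjusted audio track, and combines the adjusted audio track and the soundless picture video into the narrated audiovisual video Vo by synchronizing them.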
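In another embodiment, the audiovisual processing unit 180 receives the original audiovisual video Vi and separates it into an original audio track and a soundless picture video. The audiovisual processing unit 180 then replaces, with the speech signal, the track segment of the original audio track that corresponds to the playback time of the silent content to form an adjusted audio track, and combines the adjusted audio track and the soundless picture video into the narrated audiovisual video Vo by synchronizing them.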
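In yet another embodiment, the audiovisual processing unit 180 receives the original audiovisual video Vi and finds the track segment that corresponds to the silent content in the original audiovisual video Vi. The audiovisual processing unit 180 then creates a substitution signal that substitutes the speech signal for this track segment, and generates a narrated audiovisual video Vo that contains the original audiovisual video Vi, the speech signal, and the substitution signal. Assume that the track segment corresponding to the silent content lies between a first playback time and a second playback time in the original audio track. During playback of the narrated audiovisual video Vo, the substitution signal is triggered at the first playback time so that playback switches from the original audio track to the speech signal, and at the second playback time playback switches back and resumes the original audio track from the position of the second playback time.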
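The three ways of combining the speech signal with the original audio track can be sketched on plain sample lists. This is an illustration under assumptions, not the unit's actual processing: samples are floats in [-1.0, 1.0], the function names are hypothetical, clipping after mixing is an added safeguard, and the substitution signal is reduced to a function that reports which source should play at a given time.

```python
def mix_speech(track, speech, start, gain=1.0):
    """Mixing variant: add the speech samples onto the original track."""
    out = list(track)
    for i, s in enumerate(speech):
        j = start + i
        if j >= len(out):
            break
        out[j] = max(-1.0, min(1.0, out[j] + gain * s))  # clip to valid range
    return out


def replace_with_speech(track, speech, start):
    """Replacement variant: overwrite the track segment with the speech samples."""
    out = list(track)
    for i, s in enumerate(speech):
        j = start + i
        if j >= len(out):
            break
        out[j] = s
    return out


def source_at(t, first_time, second_time):
    """Association variant: which source plays at playback time t."""
    return "speech" if first_time <= t < second_time else "original"


track = [0.25, 0.25, 0.25, 0.25]
mixed = mix_speech(track, [0.5, 0.5], start=1)
replaced = replace_with_speech(track, [0.5, 0.5], start=1)
```

Mixing keeps any ambient sound in the original track under the narration, while replacement and association leave only the narration audible during the silent passage.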
In some embodiments, semantic recognition is performed after the adjusted speech signal So is generated, and the adjusted speech signal So is output only when its semantics are recognizable.
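In some embodiments, referring to FIG. 4, the speech rate adjustment system 10 further includes a recognition unit 160 and an update unit 170. The recognition unit 160 is coupled to the adjustment unit 130 and the update unit 170.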
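Referring to FIG. 4 and FIG. 5, or to FIG. 4 and FIG. 6, after the adjustment unit 130 generates the adjusted speech signal So, the recognition unit 160 first detects the semantics of the adjusted speech signal So (step S25) to confirm whether the semantics are recognizable (that is, whether the content of the speech played back from the adjusted speech signal So can be recognized). Semantic detection techniques are well known to those skilled in the art and are therefore not described further here.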
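When the semantics are not recognizable, the recognition unit 160 does not output the adjusted speech signal So, and causes the update unit 170 to update the description text so as to reduce the number of words that make up the event description (step S27). The update unit 170 then provides the updated description text to the conversion unit 140 for conversion (that is, step S12 is then performed). In this case, if the speech rate adjustment system 10 has the audiovisual processing unit 180, the recognition unit 160 does not output the generated adjusted speech signal So to the audiovisual processing unit 180.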
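Only when the semantics are recognizable does the recognition unit 160 output the adjusted speech signal So. In this case, if the speech rate adjustment system 10 has the audiovisual processing unit 180, the recognition unit 160 outputs the adjusted speech signal So to the audiovisual processing unit 180 for audiovisual processing (that is, step S26 is performed).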
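The feedback loop through steps S12 to S15, S22 to S25, and S27 can be outlined as below. This is a sketch of the speed-up-only flow of FIG. 5 under assumptions: the function name `fit_description` is hypothetical, the five callables stand in for the conversion, adjustment, recognition, and update units and must be supplied by the caller, and the `max_rounds` limit is an added safeguard that the text does not mention.

```python
def fit_description(text, silent_s, synthesize, duration, speed_up,
                    is_intelligible, shorten, max_rounds=5):
    """Return (text, speech) such that the speech fits silent_s and is intelligible.

    synthesize(text) -> speech            text-to-speech conversion (step S12)
    duration(speech) -> seconds           total duration of a speech signal
    speed_up(speech, target_s) -> speech  frame removal (steps S15, S22 to S24)
    is_intelligible(speech) -> bool       semantic check (step S25)
    shorten(text) -> text                 fewer words in the description (step S27)
    """
    for _ in range(max_rounds):
        speech = synthesize(text)
        if duration(speech) <= silent_s:          # steps S13 and S14
            return text, speech                   # no adjustment needed
        adjusted = speed_up(speech, silent_s)
        if is_intelligible(adjusted):
            return text, adjusted
        text = shorten(text)                      # try again with fewer words
    raise RuntimeError("no intelligible description fits the available time")
```

A caller would plug in a real text-to-speech engine and a speech recognizer; with simple stand-ins, the loop shortens the text until the required speed-up no longer makes the speech unintelligible.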
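In some embodiments, the storage unit 110, the analysis unit 120, the adjustment unit 130, the conversion unit 140, the judgment unit 150, the recognition unit 160, the update unit 170, and the audiovisual processing unit 180 can be implemented by one or more processing components.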
In some embodiments, the original speech signal Si and the adjusted speech signal So provide the speech corresponding to the silent content in a silent image sequence. In other words, the original speech signal Si and the adjusted speech signal So provide speech with the same content at different speech rates. For example, the silent image sequence may show the changes in mouth shape when a word or sentence is pronounced, while the original speech signal Si and the adjusted speech signal So provide the pronunciation of that word or sentence.
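In some embodiments, referring to FIG. 7, the speech rate adjustment system 10 further includes a merging unit 190. The merging unit 190 is coupled to the adjustment unit 130. Referring to FIG. 7 and FIG. 8, the merging unit 190 receives a silent image sequence Mi and combines the adjusted speech signal So and the silent image sequence Mi into an image sequence with sound Mo by synchronizing them (step S26’).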
In some embodiments, the storage unit 110, the analysis unit 120, the adjustment unit 130, and the merging unit 190 can be implemented by one or more processing components.
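In some embodiments, the storage unit 110 can be implemented by one or more memories. The aforementioned processing component may be a microprocessor, a microcontroller, a central processing unit, a programmable logic controller, a logic circuit, an analog circuit, a digital circuit, or any analog and/or digital device that manipulates signals based on operating instructions.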
In some embodiments, the speech rate adjustment method of any embodiment can be implemented by a computer program product, such that the method is carried out when a computer loads and executes the program. In some embodiments, the computer program product is a non-transitory recording medium, and the program is stored in the non-transitory recording medium for the computer to load. In some embodiments, the program itself is the computer program product and is transmitted to the computer by wired or wireless means.
In summary, the speech rate adjustment method of any embodiment is suitable for adjusting the speech rate of a speech signal so as to provide an audio file that satisfies a time limit, thereby reducing the number of re-recordings and substantially reducing the production cost of the audio file.
10: speech rate adjustment system
110: storage unit
120: analysis unit
130: adjustment unit
140: conversion unit
150: judgment unit
160: recognition unit
170: update unit
180: audiovisual processing unit
190: merging unit
Si: original speech signal
N1: total adjustment duration
So: adjusted speech signal
Wo: word
Sp1~Sp7: word sound signals
F1: speech waveform
F2: sound spectrum
Z1: unvoiced signal segment
Z2: voiced signal segment
Vi: original audiovisual video
Vo: narrated audiovisual video
Mi: silent image sequence
Mo: image sequence with sound
XL: description text
Tt: total duration of the silent content
S21~S27: steps
S11~S15: steps
S14’: step
S26’: step
FIG. 1 is a flowchart of a speech rate adjustment method according to some embodiments.
FIG. 2 is a functional block diagram of a speech rate adjustment system according to some embodiments.
FIG. 3 is a schematic diagram of an original speech signal according to an embodiment.
FIG. 4 is a functional block diagram of a speech rate adjustment system according to some embodiments.
FIG. 5 is a flowchart of a speech rate adjustment method according to some embodiments.
FIG. 6 is a flowchart of a speech rate adjustment method according to some embodiments.
FIG. 7 is a functional block diagram of a speech rate adjustment system according to some embodiments.
FIG. 8 is a flowchart of a speech rate adjustment method according to some embodiments.
S21~S24: steps
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110129198A TWI790705B (en) | 2021-08-06 | 2021-08-06 | Method for adjusting speech rate and system using the same |
CN202111553476.4A CN115705838A (en) | 2021-08-06 | 2021-12-17 | Method and system for adjusting speed of speech |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110129198A TWI790705B (en) | 2021-08-06 | 2021-08-06 | Method for adjusting speech rate and system using the same |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI790705B true TWI790705B (en) | 2023-01-21 |
TW202308396A TW202308396A (en) | 2023-02-16 |
Family
ID=85181376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110129198A TWI790705B (en) | 2021-08-06 | 2021-08-06 | Method for adjusting speech rate and system using the same |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115705838A (en) |
TW (1) | TWI790705B (en) |
-
2021
- 2021-08-06 TW TW110129198A patent/TWI790705B/en active
- 2021-12-17 CN CN202111553476.4A patent/CN115705838A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
TW202308396A (en) | 2023-02-16 |
CN115705838A (en) | 2023-02-17 |