TW201118856A - The LR-Book handheld device based on ARM920T embedded platform

Publication number: TW201118856A
Authority: TW (Taiwan)
Prior art keywords: book, operating system, voice, text, system platform
Application number: TW98139276A
Other languages: Chinese (zh)
Other versions: TWI405184B (en)
Inventors: Jhing-Fa Wang, Tien-Huang Huang
Original Assignee: Univ Nat Cheng Kung
Application filed by: Univ Nat Cheng Kung
Priority to: TW98139276A (patent TWI405184B)
Publication of: TW201118856A
Application granted; publication of TWI405184B

Landscapes

  • Machine Translation (AREA)

Abstract

In recent years, handheld devices have become increasingly popular in daily life. Besides becoming cheaper and smaller, they usually offer powerful software functions and high computing capability. Thanks to these advances, many applications that were infeasible on older handheld devices can now be realized. This patent proposes a "Listenable and Readable Book" (LR-Book) device. The device is based on the S3C2440A with an ARM920T as the main processor, and Linux is adopted as the operating system. First, the text content of a physical book is converted into digital speech by a user-friendly text-to-speech (TTS) system. The speech content can then be downloaded into the memory of the LR-Book through the USB interface. Using optical character recognition (OCR), the LR-Book identifies the page number of the physical book currently being read and retrieves the corresponding digital speech content from memory. Finally, the LR-Book reads the speech content aloud. The proposed speech synthesis system is based on hidden Markov models and synthesizes smooth, easily understood speech. In the semantically unpredictable sentence (SUS) dictation test, the mean correct rate is 96.4%. In the naturalness test, the mean opinion score (MOS) is 3.6. The synthesis model is very small and, being flexible and portable, can be used in many applications.

Description

VI. Description of the Invention

[Technical Field]
The present invention relates to a read-and-listen e-book handheld device on an embedded operating system platform, and in particular to a portable handheld device that integrates speech synthesis (Text-to-Speech, TTS) with human-machine interface interaction and is especially suited to helping senior readers listen to a book while reading it.

[Prior Art]
In recent years, handheld devices have become increasingly common. They tend toward small size, low price, and high computing power, and they offer powerful software functions. Thanks to these advances, many applications that could not be realized on traditional handheld devices are now feasible.

However, the audio e-book services currently on the market require professionals to record the audio in advance, and the printed books must carry coded patterns. The labor cost is therefore bound to rise, and usability drops accordingly. In addition, synthesizing speech with higher naturalness and clarity requires a larger training corpus, and collecting, labeling, and correcting a balanced corpus also consumes a great deal of labor and time. Conventional approaches therefore fail to meet users' practical needs.

[Summary of the Invention]
The main purpose of the present invention is to overcome the above problems of the prior art and to provide a portable handheld device that integrates speech synthesis (TTS) with human-machine interface interaction, covers both how digital books are produced and how they are used, and helps senior readers listen while reading.

To achieve this purpose, the present invention is a read-and-listen e-book handheld device on an embedded operating system platform, comprising: a storage unit, connected to a speech synthesis module (TTS Synthesis System), for receiving and storing the speech files synthesized by that module; a character recognition element, connected to the storage unit and to a sensing element, which obtains the digit file recognized by the sensing element from a physical book, performs optical page-number recognition in conjunction with the speech files in the storage unit, and outputs a text file once recognition is complete; a processing element, connected to the character recognition element, for converting the text file into a speech format; and a post-processing module (Post-Processing), connected to the processing element, which decides the final recognition result for the recognized characters and sends that result to an output unit for speech playback.

[Embodiments]
Refer to Figure 1, a schematic diagram of the overall architecture of the present invention. As shown, the read-and-listen e-book handheld device 1 comprises a storage unit 10, a character recognition element 11, a processing element 12, and a post-processing module 13. It is characterized by integrating speech synthesis (TTS) with human-machine interface interaction, so that it combines the advantages of traditional paper books and e-books and can help senior readers listen while reading.

The storage unit 10 is connected to a speech synthesis module (TTS Synthesis System) 2 and receives and stores the speech files synthesized by the speech synthesis module 2. A speech file is multimedia digitized article content comprising speech data and a page-number correspondence table.

The character recognition element 11 is connected to the storage unit 10 and to a sensing element 3. It obtains the digit file recognized by the sensing element 3 from the physical book 7, performs optical page-number recognition in conjunction with the speech files in the storage unit 10, and outputs a text file once recognition is complete.

The processing element 12 is connected to the character recognition element 11 and converts the text file into a speech format.

The post-processing module 13 is connected to the processing element 12. It decides the final recognition result for the recognized characters and sends that result to an output unit 4 for speech playback. Together, these parts constitute the read-and-listen e-book handheld device 1.

In use, the present invention adopts the ARM920T processor (S3C2440A) developed by Samsung as the processing element 12 and an optical character recognition (OCR) system as the character recognition element 11, and implements the LR-Book handheld device 1 on a Linux operating system. First, through a user interface designed for senior readers, the content is synthesized by the user-friendly speech synthesis module 2. The user can then access the synthesized multimedia digitized article content through the USB interface and download it to the storage unit 10 of the LR-Book handheld device 1. Finally, the character recognition element 11 obtains the range of the physical book 7 currently being read and retrieves the matching multimedia digitized article content from the storage unit 10, and the output unit 4 reads the article aloud. The output unit 4 is a speaker.

Because the invention is intended to help senior readers listen while reading, the device covers both how digital books are produced and how they are used. To produce a digital book, the book is entered by manual editing or by automatic optical recognition, and its digital content is then processed by text-to-speech to produce the speech files, that is, the speech data and the page-number correspondence table of the book. In this step, text and prosody analysis performs sentence segmentation, word segmentation, and character-to-sound conversion on the article, and the context information of each pronunciation unit is then passed to the HMM-based speech synthesis module to synthesize speech. To use the book, the speech data and the page-number correspondence table are first placed in the storage unit of the device. The user only needs to aim the optical recognition sensing element at the page number of the physical book and press the scan key; the device recognizes the page number and then plays the speech content of that page.

Refer to Figure 2, a schematic diagram of the speech synthesis module architecture. The speech synthesis module 2 used in the present invention is an HMM-based Speech Synthesis System (HTS), which comprises a training part (Training Part) 20 and a synthesis part (Synthesis Part) 21.
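The page-number correspondence table described above can be pictured as a simple lookup from a recognized page number to a stored speech file. The following is a minimal sketch under stated assumptions: the names `SpeechEntry`, `build_page_table`, and `lookup_speech`, and the `.wav` file names, are hypothetical and do not come from the patent, which does not specify the table format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SpeechEntry:
    """One row of a hypothetical page-number correspondence table."""
    page: int
    audio_path: str  # assumed file name of the synthesized speech for this page


def build_page_table(entries: Iterable[SpeechEntry]) -> Dict[int, SpeechEntry]:
    """Index entries by page number; a later duplicate overwrites an earlier one."""
    return {entry.page: entry for entry in entries}


def lookup_speech(table: Dict[int, SpeechEntry],
                  recognized_digits: str) -> Optional[SpeechEntry]:
    """Map recognizer output such as '012' to a table entry.

    Returns None when the string is not a plain ASCII number or when
    no speech file is stored for that page.
    """
    digits = recognized_digits.strip()
    if not (digits.isascii() and digits.isdigit()):
        return None
    return table.get(int(digits))


table = build_page_table([SpeechEntry(12, "p012.wav"), SpeechEntry(13, "p013.wav")])
entry = lookup_speech(table, " 012 ")
print(entry.audio_path if entry else "no speech stored for this page")
```

Returning `None` for unreadable or unknown page numbers leaves it to the caller to decide whether to rescan or stay silent, which seems a reasonable choice for a one-button device.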

In the training part 20, prosodic parameters and spectral parameters are estimated from the collected speech corpus 201, while the text corresponding to the speech corpus 201 is analyzed by a text analyzer 202 into the corresponding phoneme sequence (label). The spectral parameters are then passed through the mel-cepstrum (MFC) vocoding technique to extract mel-frequency cepstral coefficients (MFCC), which are combined with the prosodic parameters and the phoneme sequence to form the training data for the hidden Markov model (Training of HMM) 203. Together with a context-dependent question set, state-clustering decision trees are trained, producing the context-dependent HMM models and duration models 204. In the synthesis part 21, the input text is analyzed by the same kind of text analyzer 211 into a phoneme sequence and its context information. The corresponding HMM model sequence is selected through the classification and regression trees, prosodic and spectral parameters are generated from the HMMs, and a synthesis filter 212 combines these parameters into the output speech signal.

The core techniques of this HMM-based speech synthesizer are:
(1) mel-cepstrum-based vocoding, which covers the analysis of mel-cepstral coefficients and the use of a Mel Log Spectrum Approximation filter (MLSA filter) to synthesize the speech signal directly from the mel-cepstral coefficients;
(2) a parameter generation algorithm that takes the dynamic characteristics of the parameters into account when generating speech parameters from the HMM models; and
(3) the multi-space probability distribution HMM (MSD-HMM), which reflects the fact that the fundamental frequency has a value only in voiced segments and is undefined in unvoiced segments, so that the dimension of the parameter is 1 in voiced segments and 0 in unvoiced segments.

Refer to Figures 3 to 6B, which show, respectively, the hardware embedded system architecture of the present invention, a comparison of NAND flash and NOR flash, the LM1117 datasheet, and two example voltage regulator circuit designs.
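Item (3) above says that the fundamental-frequency parameter has dimension 1 in voiced segments and dimension 0 in unvoiced segments. The sketch below only illustrates that data representation; it is not an MSD-HMM and not the HTS implementation. The function name `to_msd_observations` and the convention that `None` or a non-positive value marks an unvoiced frame are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# An observation is (dimension, values): (1, (f0,)) for voiced, (0, ()) for unvoiced.
MsdObservation = Tuple[int, Tuple[float, ...]]


def to_msd_observations(f0_track: List[Optional[float]]) -> List[MsdObservation]:
    """Convert a per-frame F0 track into multi-space style observations.

    Assumption: a frame is unvoiced when its F0 is None or not positive.
    Voiced frames carry one real value; unvoiced frames carry none.
    """
    observations: List[MsdObservation] = []
    for f0 in f0_track:
        if f0 is None or f0 <= 0.0:
            observations.append((0, ()))            # zero-dimensional space
        else:
            observations.append((1, (float(f0),)))  # one-dimensional space
    return observations


print(to_msd_observations([None, 120.0, 125.5, 0.0]))
```

Keeping the dimension explicit avoids the usual workaround of inventing a placeholder F0 value for unvoiced frames.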
點’可協助銀髮族隨讀隨聽閱讀,利用整合語音合成、光學頁 碼辨識及系統單晶片(System Qn ehip,Μ}軟硬體共_ &, 可減^人力與_之支出並增加可利用性。 細上所述’本發明係—種嵌人式作業系統平台之隨讀隨聽 電子書手_置’可有效改善習狀種種細,雜合語音合 =人機介面互動’包含數位書籍產生及方式,可合成: *暢及可雜之料並具可攜性,俾讀躲協祕髮族隨讀 ,聽閱讀之目的,進而使本發明之産生能更進步、更實用、更 符合使用者之所須,確已符合發明專利申請之要件,爰依法提 出專利申請。 惟以上所述者,僅為本發明之較佳實施例而已,當不能以 此限定本㈣實施之細;故,凡依本發明㈣專利範圍及發 明說明書内谷所作之解的等效變化與修飾,皆應仍屬本發明 專利涵蓋之範圍内。 【圖式簡單說明】 第1圖’係本發明之整體架構示意圖。 第2圖’係本發明之語音合成模組架構示意圖。 第3圖,係本發明之硬體嵌入式系統架構示意圖。 第4圖’係本發明比較NAND FLASH與NOR FLASH之 差異示意圖。 第5圖’係本發明之LM1117規格書示意圖。 第6 A圖’係本發明設計穩壓電路之範列一示意圖。 第6 B圖’係本發明設計穩壓電路之範列二示意圖。 第7圖,係本發明對各家嵌入式作業系統之比示意圖。 201118856 第8圖,係本發明LR_BOOK手持裝置之操作流程示意 圖。 第9圖,係本發明一較佳實例之整體架構示意圖。 【主要元件符號說明】 隨讀隨聽電子書手持裝置1 儲存單元10 文字辨識元件11 φ 處理元件12 後處理模組13 語音合成模組2 訓練部2 0 聲音語料201 文字分析器202 隱藏式馬可夫模型之訓練資料2 0 3 HMM模型與音長模型204 • 合成部21 文字分析器211 合成濾波器212 感測元件3 輸出單元4 有聲書產生器5 文字與韻律分析單元5 0 語音合成模組51 有聲筆裝置6 12 201118856 文字辨識元件6 Ο 儲存單元61 音檔對應/播放單元6 2 實體書籍7 0刺口八8Spectrum Approximation Filter, MLSA Filter) directly synthesizes the Mel cepstral coefficients back to the speech signal; (2) when generating speech parameters from the HMM model, generates algorithms using parameters that take into account the dynamic characteristics of the parameters; and (3) based on multi-space probability Multi-Space probability distribution HMM (MSD-HMM), considering that the fundamental frequency has a value only in the voiced segment, but does not define such a feature in the unvoiced segment, so that the dimension of the parameter is 1 in the voiced segment, unvoiced The paragraph is 〇. Please refer to FIG. 3 to FIG. 6B for a schematic diagram of the architecture of the hardware embedded system of the present invention, the difference between the NAND FLASH. and the N〇R flash of the present invention, and the LM1117 specification of the present invention. The schematic diagram, the schematic diagram of the design of the voltage regulator circuit of the present invention and the schematic diagram of the second embodiment of the voltage regulator circuit of the present invention. 
As shown in the figure: it is a hardware implementation of the device, which includes a voltage stabilization circuit, an additional memory unit (SDRAM, NAND Flash), a sound effect control circuit, a photographic lens control circuit, a serial transmission interface circuit, and a USB interface transmission circuit. And the keyboard control circuit, as shown in Figure 3. In the memory unit portion, the present invention uses NAND FLASH as the memory unit of the overall LR-B00K handheld device, and discards NORFSLASH, as illustrated in FIG. In the aspect of the voltage regulator circuit, the LM1117 is used. Because the whole system needs to use a variety of different voltages, it is necessary to adjust the voltage value through the voltage regulator. However, because the stabilizer does not produce a voltage that matches the user. The value of Fan Park's invention is based on the design of the voltage regulator circuit, such as Figure 5 and Figure 6 201118856 A, 6 B diagram 'Using the specifications provided in Figure 5 design & reach the voltage required by the present invention The voltage value. Among them, the example of Crane 6 (4) can regulate the voltage of $V (V) to 2·8ν, and the example of Figure 6B can regulate the voltage to UV, which is calculated as follows: (αϋέ +0·06^χ0·122^Ω+1.25ν22.8ν * 1.25ν +〇.06mA) x〇.〇\KCl + \ .25v = 1.3v 凊 See “Figure 7 and Figure 8”, The schematic diagram of the ratio of the embedded operating system and the LR B〇〇K handheld device of the present invention are shown in the figure as shown in the figure: As described above, the device adopts the Linux operating system. For its development environment, the advantages and disadvantages of various embedded operating systems such as Plam, Win CE and Symbian are shown in Figure 7. In the overall operation flow of the device, as shown in FIG. 
The overall operation flow of the device is shown in Figure 8. An optical recognition sensor serves as the sensing element 3 and captures an image from the physical book. The character recognition element 11 then applies binarization, digit separation, feature extraction, and digit recognition to obtain the digit file recognized from the physical book, and searches the speech data synthesized in advance by the TTS system and held in the storage unit 10 for the corresponding speech file, which is finally played through the speaker.

Refer to Figure 9, a schematic diagram of the overall architecture of a preferred embodiment. In this embodiment, the present invention comprises an audiobook generator 5, which produces the speech files, and an audio pen device 6 developed on the ARM920T, with the optical recognition and the audio-file matching and playback functions developed in a Linux environment.

The audiobook generator 5 comprises a text and prosody analysis unit 50 and a speech synthesis module 51. The text and prosody analysis unit 50 converts the sentences of an article into script information that carries pronunciation and structure information, and the HMM-based speech synthesis module 51 then converts the sentence content into speech files.

The audio pen device 6 comprises a character recognition element 60, a storage unit 61, and an audio-file matching/playback unit 62. The audio-file matching/playback unit 62 comprises an ARM920T processor (S3C2440A) and a post-processing module. The synthesized speech files are stored in the storage unit 61 through the data transfer interface of the audio pen device 6.
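The first two recognition steps named in the operation flow, binarization and digit separation, can be sketched on a tiny grayscale image. This is an illustration under stated assumptions, not the patent's algorithm: the fixed threshold of 128 and the column-projection method of separating digits are choices made for the example, and feature extraction and digit recognition are not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def binarize(image: Image, threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def split_digits(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, for each run of inked columns.

    Assumes digits are separated by at least one fully blank column.
    """
    if not binary:
        return []
    width = len(binary[0])
    column_has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(column_has_ink):
        if ink and start is None:
            start = x                      # a new digit begins
        elif not ink and start is not None:
            spans.append((start, x))       # the current digit ends
            start = None
    if start is not None:
        spans.append((start, width))       # digit touching the right edge
    return spans


page_crop = [
    [255, 0, 255, 255, 0, 0],
    [255, 0, 255, 255, 255, 0],
]
print(split_digits(binarize(page_crop)))
```

Each returned span could then be cropped and passed to whatever feature extractor and classifier the recognizer uses.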
During recognition, the character recognition element 60 obtains the range of the physical book 7 currently being read (or the full text of an e-book). The digit file recognized from the physical book is matched against the speech files in the storage unit 61 (multimedia digitized article content comprising speech data and a page-number correspondence table) to perform optical page-number recognition. When recognition is complete, the result is converted into a text file and output to the audio-file matching/playback unit 62, which converts the text file into a speech format, decides the final recognition result for the recognized characters, and outputs the corresponding article as speech through a speaker 8, so that the user can listen while reading.

The speech synthesis module developed in the present invention uses an HMM-based speech synthesizer. In a semantically unpredictable sentence (SUS) dictation test, the average accuracy of the test subjects reached 96.4%. In tests with short passages on different topics, the subjectively rated naturalness mean opinion score (MOS) reached 3.6. The device can therefore synthesize fluent and intelligible speech. The speech model used by the synthesis part also occupies very little memory, which is a further advantage for portability and adaptability.

The present invention thus combines the advantages of traditional paper books and e-books to help senior readers listen while reading. By integrating speech synthesis, optical page-number recognition, and system-on-chip (SoC) hardware/software co-design, it reduces labor expenditure and increases usability.
In summary, the present invention is a read-and-listen e-book handheld device on an embedded operating system platform that effectively remedies the shortcomings of conventional approaches. It integrates speech synthesis with human-machine interface interaction, covers how digital books are produced and used, synthesizes fluent and intelligible speech, and is portable, so that it can help senior readers listen while reading. The invention is therefore more advanced, more practical, and better matched to users' needs. It meets the requirements for an invention patent application, and this application is filed in accordance with the law.

The above describes only preferred embodiments of the present invention and does not limit the scope of its implementation. Simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.

[Brief Description of the Drawings]
Figure 1 is a schematic diagram of the overall architecture of the present invention.
Figure 2 is a schematic diagram of the speech synthesis module architecture of the present invention.
Figure 3 is a schematic diagram of the hardware embedded system architecture of the present invention.
Figure 4 is a schematic diagram comparing NAND flash and NOR flash.
Figure 5 is a schematic diagram of the LM1117 datasheet.
Figure 6A is a schematic diagram of voltage regulator circuit design example 1.
Figure 6B is a schematic diagram of voltage regulator circuit design example 2.
Figure 7 is a schematic diagram comparing embedded operating systems.
Figure 8 is a schematic diagram of the operation flow of the LR-Book handheld device.
Figure 9 is a schematic diagram of the overall architecture of a preferred embodiment of the present invention.

[Description of Main Component Symbols]
Read-and-listen e-book handheld device 1
Storage unit 10
Character recognition element 11
Processing element 12
Post-processing module 13
Speech synthesis module 2
Training part 20
Speech corpus 201
Text analyzer 202
Training data of the hidden Markov model 203
HMM models and duration models 204
Synthesis part 21
Text analyzer 211
Synthesis filter 212
Sensing element 3
Output unit 4
Audiobook generator 5
Text and prosody analysis unit 50
Speech synthesis module 51
Audio pen device 6
Character recognition element 60
Storage unit 61
Audio-file matching/playback unit 62
Physical book 7
Speaker 8

Claims (20)

201118856 七、申請專利範圍:' 1 ,種嵌人式作業系統平台之隨讀隨聽電子書手持裝置,其特 徵在於整合語音合成(Text_tG_Speeeh,TTS)及人機介面互 動包含數位書籍產生及使財式,俾以朗於協助銀髮族 隨讀隨聽閱讀,其包括: 、 一儲存單元,係與一語音合成模組(TTS Symhesis System)連接’肋魏存聰音合成馳合成後之語 音檔; 文子辨識元件,係與該儲存單元及一感測元件連接, 用以取传該感測元件從實體書籍所辨識出來之數字槽,配合 該儲存單元内之語音檔加以光學頁碼辨識,並於辨識完成後 轉換為一文字檔輸出; 一處理元件,係與該文字辨識元件連接,用以將該文字 檔轉換為語音格式;以及 一後處理模組(Post-Processing),係與該處理元件連接, 用以對辨識過之字元決定最後辨識結果,並將此最後辨識結 果傳送至一輸出單元供語音播放輸出。 2 ·依據申請專利範圍第丄項所述之嵌入式作業系統平台之隨讀 隨聽電子書手持裝置,其中,該文字辨識元件係經由二值化 處王里、各數字分離處理、特徵擷取處理及數字辨識處理取得 從實體書籍上所辨識出來之數字檔。 3.依據申請專利範圍第1項所述之嵌入式作業系統平台之隨讀 隨聽電子書手持裝置,其中,該文字辨識元件係將該感測元 件對準實體書籍之頁碼並按下掃描鍵,配合該儲存單元内之 ‘語音檔進行光學頁碼辨識,俾以獲得該文字檔。 4 ·依據申請專利範圍第;l項所述之嵌入式作業系統平台之隨讀 隨聽電子書手持裝置,其中,該語音觀包括語音資料及頁 碼對應表之多媒體數位化文章内容。 5·依據中請專利範圍第丄項所述之嵌人式作業系統平台之隨讀 隨聽電子書手持裝置,其中,該語音合賴組係與—文字與 韻律分析單元構成-有聲f產生H,關文字與讎分析單 謂文章内之語㈣換為具有發音錢贿訊之文稿資訊 後,再透過該語音合成模組將語句内容轉換為語音檔。 6 ·依據”專利範圍第1項所述之喪人式作業系統平台之隨讀 P遺聽電子書手絲置,其中,該語音合成模組係為一基於隱 藏式馬可夫模型之語音合成器(HMM based Spee比 System, HTS ) 〇 7 ·依據巾請專纖圍第w所述之嵌人式作業轉平台之隨讀 隨聽電子書手持裝置,其巾,該文字辨識元件料—光學文 字辨識系統(Optical Character Recognition,0〇〇。 •依據申請專利範圍第1項所述之嵌入式作業系統平台之隨讀 隨聽電子書手持震置,其中,該處理元件係為一 ARM92〇t 處理器(S3C2440A)。 •依據申請專纖㈣1項所述之嵌人式作業祕平台之隨讀 隨聽電子書手持裝置,其中,該輸出單元係為一督八。 0·依據申請專利範圍第i項所述之嵌入式作業系統平台之隨 讀隨聽電子書手持裝置,其中,該作業系統平台係為u 作業系統。 1·依據申凊專利範圍第1項所述之嵌入式作業系統平台之隨 讀隨聽電子書手持裝置,其中,本裝置係透過USB傳輸介面 201118856 存取合成後之語音檔並下載至該儲存單元中。 1 2 · -種以式作㈣統平台之隨讀隨聽電子書手持裝置,其 特徵在於整合語音合成及人機介面互動,包含數位書籍產生 及使用方式,俾以應驗協助銀髮族隨讀隨聽_,其包括: -有聲書產生器,係包含-文字與韻律分析單元及一語 音合成歡’用以將文章内之語句轉換為具有發音*结構^ 訊之文稿資訊,再透過該語音合成敝縣㈣容轉換為語 音播, -有聲筆裝置’係包含-文字辨識元件、—儲存單元及 -音標對應/減單元,用以_其娜傳齡面將合成後之 語音播存放至該儲存單元中;以及 當辨識時’係透過該文字觸元件取得從實體書籍所辨 識出來之數· ’配合簡存單元内之語音檔加以光學頁碼 辨識’並於辨識完成後轉換為一文字樓且輸出至該音槽對應/ 播放單元’俾使文章得以語音輸出。 ^ 1 3 ·依射請專利範圍第i 2項所述之嵌人式作業系統平台之 隨讀_電子書手魏置’其巾,該音撕蘭單元係包 含-ARM92GT處㈣(S3C244GA)及-後處理模組,用以 將該文字㈣換為語音格式’且對辨識過之字元決定最後辨 識結果,俾以將此最後辨識結果以語音輸出。 14·依據申凊專利範圍第12項所述之敌入式作業系統平台之 P遺讀隨聽電子書手縣置,其巾,該語音輸出錢過一飢 播放。 15 ·依據申凊專利範圍第12項所述之傲入式作業系統平台之 隨讀隨聽電子書手縣置,其中,該語音合成模組係為一基 201118856 於隱藏式馬可夫模型之語音合成器。 16·依據申請專利範圍第12項所述之嵌入式作業系統平台之 隨讀隨聽電子書手持裝置,其中,該文字辨識元件係為一光 學文字辨識系統。、 17·依據申請專利範圍第12項所述之嵌入式作業系統平台之 隨讀隨聽電子書手持裝置,其中,該作業系統平台係為Linux 作業系統。 
18·依據申請專利龕圍第12項所述之嵌入式作業系統平台之 隨讀隨聽電子書手持裝置,其中,該語音檔係包括語音資料 及頁碼對應表之多媒體數位化文章内容。 19·依據申請專利範圍第12項所述之嵌入式作業系統平台之 隨讀隨聽電子書手持裝置,其中,該文字辨識元件係經由二 值化處理、各數字分離處理、特徵擷取處理及數字辨識處理 取得從實體書籍上所辨識出來之數字檔。 2 0 ·依據申請專利範圍第12項所述之嵌入式作業系統平台之 隨讀隨聽電子書手持震置,其中,該文字辨識元件係將-感 測元件對渠實體書籍之頁碼並按下掃描鍵,配合該儲存單元 内之語音檔進行光學頁碼辨識,俾以獲得該文字檔。 1 17201118856 VII. Patent application scope: '1. The embedded e-book handheld device with embedded human operation system platform is characterized by integrated speech synthesis (Text_tG_Speeeh, TTS) and human-machine interface interaction including several books to generate and make money. The program is designed to assist the silver-haired family to read and listen, including: a storage unit connected to a speech synthesis module (TTS Symhesis System). The component is connected to the storage unit and a sensing component for taking the digital slot recognized by the sensing component from the physical book, and matching the voice file in the storage unit for optical page number identification, and after the identification is completed Converting to a text file output; a processing component coupled to the text recognition component for converting the text file to a voice format; and a post-processing device coupled to the processing component for Determine the final identification result for the recognized character, and transmit the final identification result to an output unit for voice playback output2. The portable audio-visual handheld device of the embedded operating system platform according to the scope of the patent application scope, wherein the character recognition component is separated by a binary computer, digital separation processing, and feature extraction The processing and digital identification process obtains the digital file identified from the physical book. 3. 
The audio-visual e-book handheld device of the embedded operating system platform according to claim 1, wherein the character recognition component aligns the sensing component with a page number of the physical book and presses a scan key The optical page number is recognized by the 'voice file' in the storage unit to obtain the text file. 4) The portable audio-visual handheld device of the embedded operating system platform according to the scope of the patent application, wherein the voice view includes the multimedia digital article content of the voice data and the page correspondence table. 5. The portable e-book handheld device of the embedded operating system platform according to the scope of the patent scope of the present invention, wherein the voice group and the text and rhythm analysis unit are formed - the sound f generates H The words in the article and the analytic list are used to convert the words in the article (4) into the manuscript with the pronunciation and bribes, and then the speech content is converted into a voice file through the speech synthesis module. 6 · According to the patent application scope of the mourning-type operating system platform, the P-reading e-book hand is set, wherein the speech synthesis module is a speech synthesizer based on the hidden Markov model ( HMM based Spee is more than System, HTS) 〇7 · According to the towel, please read the embedded work-reading e-book handheld device described in the second paragraph, the towel, the text recognition component material - optical character recognition System (Optical Character Recognition, 0〇〇. • The embedded operating system platform of the embedded operating system platform according to the scope of claim 1 is manually placed, wherein the processing component is an ARM92〇t processor. 
(S3C2440A) • According to the application of the special fiber (4) 1 item of the embedded operating secret platform of the audio-visual handheld device, wherein the output unit is a supervisor eight. 0. According to the scope of patent application i The portable operating system platform of the embedded listening and e-book handheld device, wherein the operating system platform is a u operating system. 1. The embedded operating system according to claim 1 of the patent scope The platform is connected to the e-book handheld device, wherein the device accesses the synthesized voice file through the USB transmission interface 201118856 and downloads it to the storage unit. 1 2 · - Type (4) platform reading The e-book handheld device is characterized by integrated speech synthesis and human-machine interface interaction, including the generation and use of digital books, and assists the silver-haired family to read and listen with _, including: - audio book generator, including - The text and prosody analysis unit and a speech synthesis joy" are used to convert the sentences in the article into manuscript information with pronunciation* structure^, and then through the speech synthesis, the county (four) capacity is converted into a voice broadcast, - the vocal pen device' The system includes a text recognition component, a storage unit, and a phonetic symbol corresponding/subtracting unit, wherein the synthesized voice broadcast is stored in the storage unit; and when recognized, the text component is transmitted through the text. Obtain the number recognized from the physical book · 'Compatate the voice file in the simple storage unit with the optical page number identification' and convert it into a text building and output after the identification is completed The sound box corresponding/playing unit '俾 enables the article to be voice output. 
13. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the voice correspondence/playback unit includes an ARM920T processor (S3C2440A) and a post-processing module, which converts the text file into a voice format and determines the final recognition result for the recognized characters, so that the final recognition result is output by voice.
14. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the voice output is delivered through a speaker.
15. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the speech synthesis module is a speech synthesizer based on the hidden Markov model.
16. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the character recognition component is an optical character recognition system.
17. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the operating system platform is a Linux operating system.
18. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the voice file includes multimedia digitized article content of voice data and a page number correspondence table.
19. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein the character recognition component obtains the digit file recognized from the physical book through binarization processing, digit separation processing, feature extraction processing, and digit recognition processing.
20. The audio-visual e-book handheld device of the embedded operating system platform according to claim 12, wherein, when the sensing component is aligned with a page number of the physical book and a scan key is pressed, the character recognition component performs optical page number recognition against the voice file in the storage unit to obtain the text file.
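As a rough illustration only, the first two page-number recognition steps named in claim 19 (binarization and digit separation) and the page number correspondence table of claim 18 can be sketched in Python as follows. This is not the patented implementation: the function names, the fixed threshold of 128, the blank-column separation rule, and the file name `page_017.wav` are all assumptions made for the example, and the feature extraction and digit recognition steps are omitted.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255 (assumed format)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Binarization: mark dark pixels as ink (1) and the rest as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_digits(binary: Image) -> List[Tuple[int, int]]:
    """Digit separation: return (start, end) column spans that contain ink,
    using completely blank columns as the separators between digits."""
    width = len(binary[0]) if binary else 0
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x in range(width):
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x                      # a digit begins at this column
        elif not has_ink and start is not None:
            spans.append((start, x))       # the digit ended at the previous column
            start = None
    if start is not None:                  # digit touching the right edge
        spans.append((start, width))
    return spans


def lookup_voice_file(page_number: int,
                      page_table: Dict[int, str]) -> Optional[str]:
    """Page number correspondence table: map a recognized page number to the
    stored voice file, or None if that page has no entry."""
    return page_table.get(page_number)


if __name__ == "__main__":
    # A tiny 3x7 "scan" with a two-column blob and a one-column blob.
    scan = [[255, 0, 0, 255, 255, 0, 255]] * 3
    print(segment_digits(binarize(scan)))
    # Hypothetical table entry; real file names are not given in the source.
    print(lookup_voice_file(17, {17: "page_017.wav"}))
```

A real device would pass each separated span to a feature extractor and a digit classifier before the table lookup. A fixed threshold is the simplest possible choice and would likely need to adapt to lighting and paper color in practice.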
TW98139276A 2009-11-19 2009-11-19 The LR-Book handheld device based on ARM920T embedded platform TWI405184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98139276A TWI405184B (en) 2009-11-19 2009-11-19 The LR-Book handheld device based on ARM920T embedded platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98139276A TWI405184B (en) 2009-11-19 2009-11-19 The LR-Book handheld device based on ARM920T embedded platform

Publications (2)

Publication Number Publication Date
TW201118856A TW201118856A (en) 2011-06-01
TWI405184B TWI405184B (en) 2013-08-11

Family

ID=44935890

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98139276A TWI405184B (en) 2009-11-19 2009-11-19 The LR-Book handheld device based on ARM920T embedded platform

Country Status (1)

Country Link
TW (1) TWI405184B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680223A (en) * 2013-12-02 2014-03-26 中国科学院深圳先进技术研究院 Auxiliary reading equipment, auxiliary reading system and auxiliary reading method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2291571A (en) * 1994-07-19 1996-01-24 Ibm Text to speech system; acoustic processor requests linguistic processor output
US6446040B1 (en) * 1998-06-17 2002-09-03 Yahoo! Inc. Intelligent text-to-speech synthesis
TWM263589U (en) * 2004-03-01 2005-05-01 Afaya Technology Corp An interactive learning device
TWM258352U (en) * 2004-06-09 2005-03-01 Hung-Teng Liu E-book structure with coded image

Also Published As

Publication number Publication date
TWI405184B (en) 2013-08-11

Similar Documents

Publication Publication Date Title
US10891928B2 (en) Automatic song generation
US9761219B2 (en) System and method for distributed text-to-speech synthesis and intelligibility
US8396714B2 (en) Systems and methods for concatenation of words in text to speech synthesis
US8583418B2 (en) Systems and methods of detecting language and natural language strings for text to speech synthesis
US8712776B2 (en) Systems and methods for selective text to speech synthesis
US8352272B2 (en) Systems and methods for text to speech synthesis
US8352268B2 (en) Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
US8355919B2 (en) Systems and methods for text normalization for text to speech synthesis
WO2019165748A1 (en) Speech translation method and apparatus
CN106898340B (en) Song synthesis method and terminal
CN108806655B (en) Automatic generation of songs
US20100082328A1 (en) Systems and methods for speech preprocessing in text to speech synthesis
JP2011033874A (en) Device for multilingual voice recognition, multilingual voice recognition dictionary creation method
US20100082327A1 (en) Systems and methods for mapping phonemes for text to speech synthesis
US20150112679A1 (en) Method for building language model, speech recognition method and electronic apparatus
JP2008185805A (en) Technology for creating high quality synthesis voice
US20120046948A1 (en) Method and apparatus for generating and distributing custom voice recordings of printed text
TW201517015A (en) Method for building acoustic model, speech recognition method and electronic apparatus
US20120046949A1 (en) Method and apparatus for generating and distributing a hybrid voice recording derived from vocal attributes of a reference voice and a subject voice
CN108305611B (en) Text-to-speech method, device, storage medium and computer equipment
CN104899192B (en) For the apparatus and method interpreted automatically
JP2019109278A (en) Speech synthesis system, statistic model generation device, speech synthesis device, and speech synthesis method
CN102970618A (en) Video on demand method based on syllable identification
Mukherjee et al. A Bengali speech synthesizer on Android OS
JP5693834B2 (en) Speech recognition apparatus and speech recognition method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees