TW201118856A

TW201118856A - The LR-Book handheld device based on ARM920T embedded platform

Info

Publication number: TW201118856A
Application number: TW98139276A
Authority: TW
Inventors: Jhing-Fa Wang; Tien-Huang Huang
Original assignee: Univ Nat Cheng Kung
Priority date: 2009-11-19
Filing date: 2009-11-19
Publication date: 2011-06-01
Also published as: TWI405184B

Abstract

In recent years, handheld devices have become increasingly common in daily life. Besides trending toward lower prices and smaller sizes, these devices usually offer powerful software functions and high computing capability. Owing to these advances, many applications that were infeasible on older handheld devices can now be realized. This patent proposes a "Listenable and Readable Book" (LR-Book) device. The device is based on the S3C2440A, with an ARM920T as the main processor, and adopts Linux as the operating system. First, the text content of a physical book is converted into digital speech by a user-friendly text-to-speech (TTS) system. The speech content can then be downloaded into the memory of the LR-Book through the USB interface. Using optical character recognition (OCR), the LR-Book identifies the page number of the physical book currently being read and retrieves the corresponding digital speech content from memory. Finally, the LR-Book plays the speech aloud. The proposed speech synthesis system is based on hidden Markov models and synthesizes smooth, easy-to-understand speech. In the semantically unpredictable sentence (SUS) dictation test, the mean correct rate is 96.4%. In the naturalness test, the mean opinion score (MOS) is 3.6. The synthesis model is very small and, because of its flexibility and portability, can be used in many applications.

Description

VI. Description of the Invention
[Technical Field of the Invention]
The present invention relates to a read-and-listen e-book handheld device on an embedded operating system platform, and in particular to a portable handheld device that integrates speech synthesis (text-to-speech, TTS) with human-machine interface interaction and is intended to help senior citizens listen to a book while reading it.
[Prior Art]
In recent years, handheld devices have become increasingly widespread. Their main characteristics are a trend toward small size, low price and high computing power, together with powerful software functions. Thanks to technical progress, many applications that could not be realized on traditional handheld devices are now feasible. However, the audio e-book services currently on the market all require professionals to make recordings in advance, and the printed books must carry coded patterns. The resulting labor cost is bound to rise, so usability is considerably reduced. In addition, synthesizing speech with higher naturalness and clarity requires more training corpora, and collecting, labeling and correcting a balanced corpus also consumes a great deal of manpower and time. Conventional approaches therefore cannot meet users' needs in actual use.
[Summary of the Invention]
The main object of the present invention is to overcome the above problems of the prior art and to provide a portable handheld device that integrates speech synthesis (text-to-speech, TTS) with human-machine interface interaction, covers both the production and the use of digital books, and helps senior citizens listen to a book while reading it.
To achieve the above object, the present invention is a read-and-listen e-book handheld device on an embedded operating system platform, comprising: a storage unit, connected to a speech synthesis module (TTS synthesis system), for receiving and storing the voice file synthesized by the speech synthesis module; a character recognition element, connected to the storage unit and to a sensing element, for obtaining the digit file that the sensing element recognizes from a physical book, performing optical page-number recognition against the voice file in the storage unit, and outputting the result as a text file once recognition is complete; a processing element, connected to the character recognition element, for converting the text file into a speech format; and a post-processing module, connected to the processing element, for determining the final recognition result for the recognized characters and sending that result to an output unit for speech playback.
[Embodiments]
Please refer to Fig. 1, a schematic diagram of the overall architecture of the present invention. As shown in the figure, the present invention is a read-and-listen e-book handheld device on an embedded operating system platform, consisting of a storage unit 10, a character recognition element 11, a processing element 12 and a post-processing module 13. It is characterized by integrating speech synthesis (text-to-speech, TTS) with human-machine interface interaction, so that it combines the advantages of traditional paper books and e-books and can help senior citizens listen while reading.
The storage unit 10 is connected to a speech synthesis module (TTS synthesis system) 2 and receives and stores the voice file synthesized by the speech synthesis module 2. The voice file is multimedia digitized article content comprising speech data and a page-number correspondence table.
The character recognition element 11 is connected to the storage unit 10 and to a sensing element 3. It obtains the digit file that the sensing element 3 recognizes from a physical book 7, performs optical page-number recognition against the voice file in the storage unit 10, and outputs the result as a text file once recognition is complete. The processing element 12 is connected to the character recognition element 11 and converts the text file into a speech format. The post-processing module 13 is connected to the processing element 12; it determines the final recognition result for the recognized characters and sends that result to an output unit 4 for speech playback. Together, these parts constitute a new read-and-listen e-book handheld device 1 on an embedded operating system platform.
In practice, the invention uses Samsung's S3C2440A, which is built around an ARM920T processor core, as the processing element 12, and an optical character recognition (OCR) system as the character recognition element 11. The read-and-listen e-book (LR-Book) handheld device 1 is implemented on a Linux operating system. First, with a user interface designed for senior citizens, the content is synthesized by the user-friendly speech synthesis module 2. The user can then access the synthesized multimedia digitized article content through the USB transmission interface and download it to the storage unit 10 of the LR-Book handheld device 1. Finally, the character recognition element 11 determines which part of the physical book 7 is currently being read and matches it to the multimedia digitized article content in the storage unit 10, so that the article is output as speech through the output unit 4 for the purpose of reading. The output unit 4 is a speaker.
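The lookup described above, in which a recognized page number selects the matching speech content, can be sketched roughly as follows. This is only an illustration: the patent does not specify how the page-number correspondence table is stored, so the `VoiceFile` class, the `handle_scan` function, and the dictionary of page numbers to audio paths are all hypothetical names and layouts.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VoiceFile:
    """Hypothetical container for one book: synthesized audio plus the page table."""
    title: str
    # Page-number correspondence table: page number -> audio clip path (assumed layout).
    page_table: Dict[int, str] = field(default_factory=dict)

    def clip_for_page(self, page: int) -> Optional[str]:
        """Return the audio clip for a page, or None if the page is not in the table."""
        return self.page_table.get(page)


def handle_scan(book: VoiceFile, recognized_digits: str) -> Optional[str]:
    """Map the digit string produced by page-number recognition to an audio clip.

    Returns None when the recognizer output is not a plain decimal number or
    the page has no entry, so the caller can ask the user to scan again.
    """
    if not (recognized_digits.isascii() and recognized_digits.isdigit()):
        return None
    return book.clip_for_page(int(recognized_digits))


book = VoiceFile("demo", {1: "audio/page_001.wav", 2: "audio/page_002.wav"})
print(handle_scan(book, "2"))   # clip path for page 2
print(handle_scan(book, "2a"))  # rejected: not a clean page number
```

Returning `None` for unreadable or unknown pages, rather than raising, is a design choice made here so that a failed scan can simply prompt a rescan.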
Because the invention is intended to help senior citizens listen while reading, the device covers both the production and the use of digital books. For production, the book can be digitized by manual editing or by automatic optical recognition, and the digital content is then passed through text-to-speech processing to produce the voice file, that is, the book's speech data together with its page-number correspondence table. In this process, text and prosody analysis first performs sentence segmentation, word segmentation and grapheme-to-phoneme conversion on the article, and the context information of each pronunciation unit is then sent to the speech synthesizer for synthesis. For use, the speech data and the page-number correspondence table are first placed in the storage unit of the device. The user only needs to aim the optical recognition sensing element at the page number of the physical book and press the scan key to complete page-number recognition, after which the speech content of that page is played.
Please refer to Fig. 2, a schematic diagram of the speech synthesis module architecture of the present invention. As shown in the figure, the speech synthesis module 2 adopted by the invention is a hidden-Markov-model-based speech synthesizer (HMM-based Speech Synthesis System, HTS) comprising a training part 20 and a synthesis part 21. In the training part 20, prosodic parameters and spectral parameters are estimated from the collected speech corpus 201, while the text corresponding to the speech corpus 201 is analyzed by a text analyzer 202 into the corresponding phoneme sequence (labels). The spectral and prosodic parameters are then extracted. The spectral parameters are processed by a mel-cepstrum-based vocoding technique to obtain mel-cepstral coefficients (MFCC), which are combined with the prosodic parameters and the phoneme sequence as the HMM training data 203. Together with a context-dependent question set, state merge/split trees are trained, producing context-dependent HMM models and duration models 204. In the synthesis part 21, the input text is analyzed by the same kind of text analyzer 211 into a phoneme sequence and its context information. The corresponding HMM model sequence is selected through classification and regression trees, prosodic and spectral parameters are generated from it, and a synthesis filter 212 turns these parameters into the output speech signal.
The core techniques of the HMM-based speech synthesizer are:
(1) mel-cepstrum-based vocoding, which includes the analysis of mel-cepstral coefficients and the use of a Mel Log Spectrum Approximation (MLSA) filter to synthesize the speech signal directly from the mel-cepstral coefficients;
(2) a parameter generation algorithm that takes the dynamic characteristics of the parameters into account when generating speech parameters from the HMM models; and
(3) the multi-space probability distribution HMM (MSD-HMM), which reflects the fact that the fundamental frequency has a value only in voiced segments and is undefined in unvoiced segments, so that the parameter has dimension 1 in voiced segments and dimension 0 in unvoiced segments.
Please refer to Figs. 3 to 6B, which are, respectively, a schematic diagram of the embedded hardware system architecture of the present invention, a schematic comparison of NAND flash and NOR flash, a schematic of the LM1117 datasheet, a schematic of the first regulator circuit design example, and a schematic of the second regulator circuit design example.
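Before turning to the hardware figures, core technique (3) above can be made concrete with a small sketch of how a fundamental-frequency (F0) track might be encoded as multi-space observations. This is an illustration under stated assumptions, not the patent's or any toolkit's implementation: the function name, the `VOICED`/`UNVOICED` labels, the use of `None` or 0 to mark unvoiced frames, and the choice of log F0 as the voiced value are all assumptions.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical space labels; not taken from the patent or from any HTS release.
VOICED, UNVOICED = 1, 0

Observation = Tuple[int, Tuple[float, ...]]


def to_msd_observations(f0_track: List[Optional[float]]) -> List[Observation]:
    """Encode an F0 track as (space label, vector) pairs.

    Voiced frames carry a 1-dimensional vector (log F0, an assumed choice);
    unvoiced frames carry a 0-dimensional, i.e. empty, vector.
    """
    observations: List[Observation] = []
    for f0 in f0_track:
        if f0 is not None and f0 > 0.0:
            observations.append((VOICED, (math.log(f0),)))
        else:
            observations.append((UNVOICED, ()))
    return observations


# Two voiced frames around one unvoiced frame.
for label, vector in to_msd_observations([120.0, None, 118.0]):
    print(label, len(vector))
```

The point of the encoding is that the model never has to invent an F0 value for unvoiced frames: the vector length itself (1 or 0) records whether the frame is voiced.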
As shown in the figures, the hardware implementation of the device comprises a voltage regulator circuit, additional memory units (SDRAM, NAND flash), an audio control circuit, a camera lens control circuit, a serial transmission interface circuit, a USB transmission interface circuit and a keypad control circuit, as shown in Fig. 3. For the memory unit, the invention uses NAND flash as the memory of the whole LR-Book handheld device and omits NOR flash, as explained in Fig. 4. For the voltage regulator circuit, the LM1117 is used. Because the system requires several different voltages, regulators are needed to produce them. Since regulators are not manufactured with output values in the required range, the invention designs its own regulator circuits, examples of which are shown in Fig. 5 and Figs. 6A and 6B, using the datasheet of Fig. 5 to reach the voltages the invention requires. The example of Fig. 6A regulates 5 V down to 2.8 V, and the example of Fig. 6B regulates the voltage to 1.3 V. The calculations are as follows (the R_1 values, and the R_2 value in the second expression, are not legible in the source):
$$V_{out} = \left(\frac{1.25\,\mathrm{V}}{R_1} + 0.06\,\mathrm{mA}\right) \times 0.122\,\mathrm{k\Omega} + 1.25\,\mathrm{V} \approx 2.8\,\mathrm{V}$$
$$V_{out} = \left(\frac{1.25\,\mathrm{V}}{R_1} + 0.06\,\mathrm{mA}\right) \times R_2 + 1.25\,\mathrm{V} = 1.3\,\mathrm{V}$$
Please refer to Figs. 7 and 8, which are, respectively, a schematic comparison of various embedded operating systems and a schematic diagram of the operation flow of the LR-Book handheld device. As noted above, the device uses Linux as its development environment. Its advantages and disadvantages relative to other embedded operating systems such as Palm, Win CE and Symbian are compared in Fig. 7. The overall operation flow of the device is shown in Fig. 8: an optical recognition sensor serves as the sensing element 3 and captures an image from the physical book. The character recognition element 11 then applies binarization, digit separation, feature extraction and digit recognition to obtain the digit file recognized from the physical book. Next, the corresponding voice file is looked up among the speech data in the storage unit 10 that was synthesized in advance by TTS, and finally its sound is played through the speaker.
Please refer to Fig. 9, a schematic diagram of the overall architecture of a preferred embodiment of the present invention. As shown in the figure, in a preferred embodiment the invention comprises an audio book generator 5 for producing voice files and a talking pen device 6 developed on the ARM920T, with the optical recognition, audio file matching and playback functions developed in a Linux environment. The audio book generator 5 comprises a text and prosody analysis unit 50 and a speech synthesis module 51. The text and prosody analysis unit 50 converts the sentences of an article into manuscript information carrying pronunciation and structure information, and the HMM-based speech synthesis module 51 then converts the sentence content into a voice file. The talking pen device 6 comprises a character recognition element 60, a storage unit 61 and an audio file matching/playback unit 62, and the audio file matching/playback unit 62 comprises an ARM920T processor (S3C2440A) and a post-processing module. The synthesized voice file is stored in the storage unit 61 through the data transmission interface of the talking pen device 6.
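The first two recognition steps named for Fig. 8, binarization and digit separation, can be sketched as follows. The patent does not say how either step is carried out, so everything here is an assumption made for illustration: the image is a list of grayscale rows, binarization uses a fixed threshold, and digits are separated by looking for runs of columns that contain ink. Feature extraction and digit recognition are not covered.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def binarize(img: Image, threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels as ink (1) and light pixels as background (0).

    The fixed threshold is an assumed, simplistic choice.
    """
    return [[1 if px < threshold else 0 for px in row] for row in img]


def split_digits(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, one per run of inked columns."""
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new digit begins
        elif not ink and start is not None:
            spans.append((start, x))       # the digit ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))       # digit touching the right edge
    return spans


# Tiny 2-row "page number" with two separate ink blobs.
page = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
]
print(split_digits(binarize(page)))
```

Each returned column range would then be cropped and handed to the feature extraction and digit recognition stages.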
During recognition, the character recognition element 60 determines which part of the physical book 7 (or the full text of the e-book) is currently being read. The digit file recognized from the physical book is matched against the voice file in the storage unit 61 (that is, the multimedia digitized article content comprising the speech data and the page-number correspondence table) for optical page-number recognition. Once recognition is complete, the result is converted into a text file and output to the audio file matching/playback unit 62, which converts the text file into a speech format and determines the final recognition result for the recognized characters. The article corresponding to that final result is then output as speech through a speaker 8, so that the user can listen while reading.
The speech synthesis module developed for the invention uses an HMM-based speech synthesizer. In the semantically unpredictable sentence (SUS) dictation test, the average correct rate across test subjects reaches 96.4%. In tests on short passages of different genres, the subjectively rated naturalness reaches a mean opinion score (MOS) of 3.6. The device can therefore synthesize fluent and intelligible speech. In addition, the speech models used for synthesis occupy very little memory, which is a further advantage in portability and adaptability.
The invention thus offers a way to combine the advantages of traditional paper books and e-books and to help senior citizens listen while reading. By integrating speech synthesis, optical page-number recognition and system-on-chip (SoC) hardware/software co-design, it reduces the expenditure of manpower and time and increases usability.
In summary, the present invention is a read-and-listen e-book handheld device on an embedded operating system platform that effectively remedies the shortcomings of conventional approaches. It integrates speech synthesis with human-machine interface interaction, covers both the production and the use of digital books, synthesizes fluent and intelligible speech, and is portable, thereby helping senior citizens listen while reading. The invention is thus more advanced, more practical and better suited to users' needs, and meets the requirements for an invention patent application, which is hereby filed in accordance with the law.
The above describes only preferred embodiments of the invention and does not limit the scope of its implementation. All simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.
[Brief Description of the Drawings]
Fig. 1 is a schematic diagram of the overall architecture of the present invention.
Fig. 2 is a schematic diagram of the speech synthesis module architecture of the present invention.
Fig. 3 is a schematic diagram of the embedded hardware system architecture of the present invention.
Fig. 4 is a schematic comparison of NAND flash and NOR flash.
Fig. 5 is a schematic of the LM1117 datasheet.
Fig. 6A is a schematic of the first regulator circuit design example of the present invention.
Fig. 6B is a schematic of the second regulator circuit design example of the present invention.
Fig. 7 is a schematic comparison of various embedded operating systems.
Fig. 8 is a schematic diagram of the operation flow of the LR-Book handheld device of the present invention.
Fig. 9 is a schematic diagram of the overall architecture of a preferred embodiment of the present invention.
[Description of Main Reference Numerals]
Read-and-listen e-book handheld device 1
Storage unit 10
Character recognition element 11
Processing element 12
Post-processing module 13
Speech synthesis module 2
Training part 20
Speech corpus 201
Text analyzer 202
HMM training data 203
HMM models and duration models 204
Synthesis part 21
Text analyzer 211
Synthesis filter 212
Sensing element 3
Output unit 4
Audio book generator 5
Text and prosody analysis unit 50
Speech synthesis module 51
Talking pen device 6
Character recognition element 60
Storage unit 61
Audio file matching/playback unit 62
Physical book 7
Speaker 8

Claims

VII. Scope of the Patent Application:
1. A read-and-listen e-book handheld device on an embedded operating system platform, characterized by integrating speech synthesis (text-to-speech, TTS) with human-machine interface interaction, covering both the production and the use of digital books, to help senior citizens listen while reading, the device comprising: a storage unit, connected to a speech synthesis module (TTS synthesis system), for receiving and storing the voice file synthesized by the speech synthesis module; a character recognition element, connected to the storage unit and to a sensing element, for obtaining the digit file recognized by the sensing element from a physical book, performing optical page-number recognition against the voice file in the storage unit, and outputting a text file after recognition is complete; a processing element, connected to the character recognition element, for converting the text file into a speech format; and a post-processing module, connected to the processing element, for determining the final recognition result for the recognized characters and sending the final recognition result to an output unit for speech playback.
2. The read-and-listen e-book handheld device on an embedded operating system platform according to claim [number illegible in source], wherein the character recognition element obtains the digit file recognized from the physical book through binarization processing, digit separation processing, feature extraction processing and digit recognition processing.
3. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 1, wherein, after the sensing element is aimed at a page number of the physical book and a scan key is pressed, the character recognition element performs optical page-number recognition against the voice file in the storage unit to obtain the text file.
4. The read-and-listen e-book handheld device on an embedded operating system platform according to claim [number illegible in source], wherein the voice file is multimedia digitized article content comprising speech data and a page-number correspondence table.
5. The read-and-listen e-book handheld device on an embedded operating system platform according to claim [number illegible in source], wherein the speech synthesis module and a text and prosody analysis unit form an audio book generator, in which the text and prosody analysis unit converts the sentences in an article into manuscript information carrying pronunciation and structure information, and the speech synthesis module then converts the sentence content into a voice file.
6. The read-and-listen e-book handheld device on an embedded operating system platform according to claim [number illegible in source], wherein the speech synthesis module is a hidden-Markov-model-based speech synthesizer (HMM-based Speech Synthesis System, HTS).
7. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 2, wherein the character recognition element is an optical character recognition (OCR) system.
8. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 1, wherein the processing element is an ARM920T processor (S3C2440A).
9. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 1, wherein the output unit is a speaker.
10. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 1, wherein the operating system platform is a Linux operating system.
11. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 1, wherein the device accesses the synthesized voice file through a USB transmission interface and downloads it to the storage unit.
12. A read-and-listen e-book handheld device on an embedded operating system platform, characterized by integrating speech synthesis with human-machine interface interaction, covering both the production and the use of digital books, to help senior citizens listen while reading, the device comprising: an audio book generator, comprising a text and prosody analysis unit and a speech synthesis module, for converting the sentences in an article into manuscript information carrying pronunciation and structure information and then converting the sentence content into a voice file through the speech synthesis module; and a talking pen device, comprising a character recognition element, a storage unit and an audio file matching/playback unit, wherein the synthesized voice file is stored in the storage unit, and, during recognition, the character recognition element obtains the digit file recognized from a physical book, performs optical page-number recognition against the voice file in the storage unit, and after recognition is complete converts the result into a text file and outputs it to the audio file matching/playback unit, so that the article is output as speech.
13. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the audio file matching/playback unit comprises an ARM920T processor (S3C2440A) and a post-processing module, for converting the text file into a speech format and determining the final recognition result for the recognized characters, so that the final recognition result is output as speech.
14. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the speech output is delivered through a speaker.
15. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the speech synthesis module is a hidden-Markov-model-based speech synthesizer.
16. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the character recognition element is an optical character recognition system.
17. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the operating system platform is a Linux operating system.
18. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the voice file is multimedia digitized article content comprising speech data and a page-number correspondence table.
19. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein the character recognition element obtains the digit file recognized from the physical book through binarization processing, digit separation processing, feature extraction processing and digit recognition processing.
20. The read-and-listen e-book handheld device on an embedded operating system platform according to claim 12, wherein, after the sensing element is aimed at a page number of the physical book and a scan key is pressed, the character recognition element performs optical page-number recognition against the voice file in the storage unit to obtain the text file.