TW202009891A - E-book apparatus with audible narration and method using the same - Google Patents

E-book apparatus with audible narration and method using the same

Info

Publication number
TW202009891A
Authority
TW
Taiwan
Prior art keywords
content
dynamic
mark
display area
text content
Prior art date
Application number
TW107127720A
Other languages
Chinese (zh)
Other versions
TWI717627B (en)
Inventor
洪士哲
吳宗銘
陳秀華
雷珵麟
鄧旭敦
施詠禎
蔡忠婷
廖秀美
吳淑琴
Original Assignee
台灣大哥大股份有限公司
Priority date
Filing date
Publication date
Application filed by 台灣大哥大股份有限公司
Priority to TW107127720A (patent TWI717627B)
Publication of TW202009891A
Application granted
Publication of TWI717627B

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to an automatic reading apparatus that receives multimedia content including text content. The reading apparatus includes a display having a display region for the multimedia content; an input interface that receives an input signal associated with a location identification and/or a change of the multimedia content shown in the display region; and a reading and highlighting unit that generates audio content together with one or more dynamic highlights associated with the text content, the dynamic highlight skipping from a first portion of the text content to a second portion of the text content in response to the input signal.

Description

E-book apparatus with audible narration and method using the same

The invention relates to an e-book reading apparatus and a method thereof, and in particular to an e-book reading apparatus capable of reading text content aloud and dynamically marking it.

E-books have been in development for many years; common formats include PDF, EPUB, MOBI, and AZW. With existing technology, the picture content and text content of an e-book can be fully presented visually, yet the read-aloud function has developed comparatively slowly, particularly machine-learning-based automatic narration. The reason is that machine narration is quite difficult: monotonous machine pronunciation must be overcome and contextual semantics must be analyzed before text can be read aloud smoothly. For example, the text strings "3/4開幕典禮" ("3/4 opening ceremony") and "3/4的影響範圍" ("the scope of influence of 3/4") both contain "3/4", but in the former it should be read as "March 4th" and in the latter as "three quarters". These problems are gradually being overcome as AI technology advances, and the read-aloud function of e-books will become increasingly common.

Existing e-book reading devices can enable a read-aloud function, and some also highlight the text content to guide the reader, so that the reader can enter a reading state more easily through the combination of text marking and narration. However, the read-aloud and marking functions of existing e-books merely proceed monotonously and one way through the text in order, and do not allow the target of narration and marking to be chosen arbitrarily.

Accordingly, there is a need for a reading apparatus or method that allows the narration target to be changed selectively in response to user operations, with the text marking synchronized accordingly.

An object of the present invention is to provide an automatic reading apparatus configured to receive and display multimedia content that includes at least text content. The reading apparatus comprises: a display having a display area for showing a portion of the multimedia content; an input interface that receives an input signal related to a location identification in the display area and/or to a change of said portion of the multimedia content in the display area; and a reading and marking unit configured to generate audio content and one or more dynamic marks associated with the text content, the dynamic mark skipping from a first part of the text content to a second part of the text content in response to the input signal, wherein the first part of the text content is related to a first audio content, and the second part of the text content is related to the input signal and appears in the display area.

In a specific embodiment, the dynamic mark comprises a sentence mark. Alternatively, the dynamic mark comprises a word mark. Or the dynamic mark comprises both a sentence mark and a word mark, the sentence mark and the word mark overlapping in a visually distinguishable manner. The extent of the sentence mark is defined by two punctuation marks in the text content.

In a specific embodiment, the second part of the text content is related to a second audio content.

The present invention also provides a non-transitory computer-readable medium containing a plurality of instructions executable by a processing unit to: analyze the text content contained in multimedia content to identify a plurality of sentences and/or words; receive an input signal related to a location identification in a display area and/or to a change of a portion of the multimedia content in the display area; generate one or more dynamic marks associated with the text content in response to the input signal, the dynamic marks being visible in the display area; and cause the dynamic mark to skip from a first part of the text content to a second part of the text content, wherein the second part of the text content is related to the input signal and appears in the display area.

In a specific embodiment, the instructions further: generate corresponding audio content based on a recognition of the sentences and/or words of the text content, the output of the audio content being synchronized with the dynamic marks.

In a specific embodiment, generating one or more dynamic marks associated with the text content includes canceling a previous dynamic mark.

In a specific embodiment, the change of a portion of the multimedia content in the display area includes a scrolling operation or a page-turning operation with respect to the display area.

The invention further provides an automatic reading method executed by a processing unit of a computing device. The method comprises: obtaining and displaying a portion of multimedia content in a display area of a display, the multimedia content having text content; initiating a machine narration process to output audio content based on the text content; generating one or more dynamic marks in the display area, the dynamic marks indicating a sentence and/or a word of the text content, the text indicated by the dynamic marks being synchronized with the text associated with the audio content; and receiving an input signal related to a location identification in the display area and/or to a change of said portion of the multimedia content in the display area, wherein the display of the dynamic marks and the output of the audio content skip from a first part of the text content to a second part in response to the input signal, the first part of the text content being related to a first audio content, and the second part of the text content being related to the input signal and a second audio content and appearing in the display area.

In a specific embodiment, generating one or more dynamic marks in the display area includes simultaneously generating a first dynamic mark indicating a sentence and a second dynamic mark indicating a word, the first dynamic mark and the second dynamic mark overlapping in a visually distinguishable manner.

In a specific embodiment, the input signal is associated with an operation of a touch interface, an image recognition result, or a speech recognition result.

In a specific embodiment, the display of the dynamic mark skipping from the first part to the second part of the text content in response to the input signal includes the display of the dynamic mark skipping from a first sentence in a first part of the text content to a second sentence in a second part of the text content.

In the following detailed description of a number of example embodiments, reference is made to the accompanying drawings, which form part of the present disclosure and which illustrate, by way of example, how the described embodiments may be practiced. Sufficient detail is provided to enable those skilled in the art to implement the described embodiments, and it is to be understood that other embodiments may be used and other changes may be made without departing from their spirit or scope. Furthermore, references to "a specific embodiment", although they may do so, need not refer to the same or a single embodiment. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the described embodiments is defined only by the appended claims.

Throughout the application and the claims, unless the context clearly indicates otherwise, the following terms have the meanings explicitly associated with them here. As used herein, unless explicitly stated otherwise, the term "or" is an inclusive "or" and is equivalent to the term "and/or". Unless the context clearly indicates otherwise, the term "based on" is not exclusive and allows for being based on additional factors not described. In addition, throughout the application, the meanings of "a", "an", and "the" include plural references, and the meaning of "in" includes "in" and "on".

As used herein, the term "network connection" means a collection of links and/or software components that enables one computing device to communicate with another computing device over a network. One such network connection may be a Transmission Control Protocol (TCP) connection, which is a virtual connection between two network nodes and is generally established through a TCP handshake protocol.

The following provides a brief summary of the innovative subject matter in order to give a basic understanding of some aspects. This brief summary is not intended as a complete overview, nor is it intended to identify key or critical elements or to delineate or narrow the scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description presented later.

FIG. 1 shows a system (100) provided by the present invention, which includes one or more servers (102) and a number of user terminal devices (104), also called client devices or user devices, connected to them over a network; in particular, the terminal device (104) is suitable for use as an e-book reader. From the point of view of the user device (104), the server (102) is a remote server. The server (102) can be programmed to create a website, or another form accessible to browser software on the user device, so that users can download data the server allows access to via the network (106), such as applications, multimedia content data, and software updates. The server (102) can further be configured to perform specific computations and provide the results to the user device (104) over the network connection. In some embodiments, the server (102) may provide an e-book website and download links for e-book reading software. Users can access e-books through a given server, and access to the server can be restricted, for example by limiting the number of concurrent users or the data traffic.

In general, the server (102) includes one or more central processing units (1020) and a memory (1022) for storing operating instructions executable by the processor. A network interface (1024) connects to one or more networks (106) and the user terminal devices (104) in order to receive data, requests, and commands from the network and to send various forms of data to the user devices, for example the multimedia content data stored in the digital storage unit (1026), which includes picture data, text data, and audio data. The processor (1020) can receive information and instructions from other computer systems or services via the network (106). For example, the processor (1020) may use the network interface (1024) to receive or provide various content items for e-book presentation, and may further use it to receive or transmit synchronization information about those content items. The processor (1020) can communicate with the memory (1022) to access the operating instructions stored in it. In some embodiments, the role played by the central processing unit (1020) and the operations it performs may be shared with or taken over by a processor native to the user device (104).

The memory (1022) contains a number of computer-executable instructions that can be executed by the central processing unit (1020) to carry out the various operations disclosed in the present invention. The memory (1022) can include any combination of transitory or non-transitory memory, including RAM, ROM, hard disks, solid-state drives, flash memory, and so on, and can store an operating system that provides computer program instructions used by the processor (1020) for general content management and operation. In other embodiments, the digital data storage unit (1026) may be included in the memory (1022) and store various content items to be presented on the user device, such as picture content, text content, and audio content. The digital data storage unit (1026) may contain other information about the content items, for example synchronization mapping information and metadata for the content items. In other embodiments, the server (102) may connect over a network to another, external digital data storage unit (not shown) to obtain content items for presentation on the user device or for storage in the memory (1022).

The user terminal device or user device (104) may be a personal computer, a tablet computer, a personal digital assistant, a mobile device, or any other suitable form. In this embodiment, the user device (104) includes a display, an input unit (such as a physical keyboard, touch screen, mouse, microphone, or imaging unit), a processing unit, memory, and any other configuration needed to carry out the embodiments of the present invention.

FIG. 2 shows an embodiment (200) of the user device (104) of FIG. 1, which includes a processing unit (202), a computer-readable medium (210), a network interface (220), a memory (230), an input/output interface (240), and a reading and marking unit (250). Similarly, the processing unit (202) can receive various information, instructions, and multimedia content from the network. The processing unit (202) can use the network interface (220) to receive the content items used to present an e-book, and can further use it to transmit or receive other information or instructions used in the various embodiments of the present invention, such as synchronization mapping information for multiple content items. The processing unit (202) can access the computer-executable programs contained in the memory (230), such as the client device's operating system (231), a content playback module (232), and content data (233), and output through the input/output interface (240) to an output unit (260), which may include one or more output devices for presenting content items to the user, such as a display and a speaker. The input/output interface (240) can receive input from an input unit (270), which may include one or more input devices, such as a touch screen (a combination of output device and input device), a mouse, a microphone, and an imaging device. For a touch screen, a touch event produces a corresponding input signal, and the processing unit (202) can determine one or more coordinates of the touch event from that input signal; by further analyzing those coordinates, the processing unit (202) can identify one or more pixels of the display and the corresponding touch behavior. Accordingly, the processing unit (202) can determine the relevant pixel output based on the coordinates.
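
As a hedged illustration of how a touch event's coordinates might be resolved to a piece of rendered text, the following TypeScript sketch assumes the renderer exposes a bounding box per sentence; the Rect and SentenceLayout shapes and the function name are assumptions for illustration, not part of the patent.

```ts
// Minimal sketch (assumed): map a touch point to the sentence whose rendered
// bounding box contains it.
interface Rect { x: number; y: number; width: number; height: number; }
interface SentenceLayout { sentenceId: string; rect: Rect; }

function sentenceAtPoint(
  layout: SentenceLayout[],   // bounding boxes produced by the renderer
  x: number,                  // touch coordinates reported by the input interface
  y: number,
): string | null {
  for (const s of layout) {
    const inX = x >= s.rect.x && x <= s.rect.x + s.rect.width;
    const inY = y >= s.rect.y && y <= s.rect.y + s.rect.height;
    if (inX && inY) return s.sentenceId;
  }
  return null; // the point fell outside every sentence (e.g. on a margin)
}
```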

The memory (230) may include any combination of transitory and non-transitory memory, such as RAM, ROM, hard disks, solid-state drives, and flash memory. The operating system (231) provides the computer programming instructions for general management and operation of the user device; these vary with the type of user device and are well known in the art, so they are not described further here. The content playback module (232) can be configured to present the various content items and to provide a user interface for controlling playback. The content data (233) contains one or more content items, such as text content, picture content, and audio content, which can be played through the content playback module (232). The content data (233) may further contain other information related to each content item, such as synchronization mapping information between two different content items and metadata for each item, and it can be updated with content data received through the network interface (220) or the input/output interface (240). The user device (200) can obtain other external content items and store them in the content data (233) for streaming or on-demand playback. For an e-book, the content playback module (232) can handle text content, picture content, and audio content, and can also provide user controls such as page-turn buttons or a scrolling component.

The reading and marking unit (250) is configured to perform audible narration of the corresponding text content together with the associated marking actions. In other embodiments, the reading and marking unit (250) can be split into a reading unit and a marking unit that are independent of each other. In some embodiments, the reading and marking unit (250) can be part of, or an extension of, the content playback module (232). Alternatively, part of the work of the reading and marking unit (250) can be performed on the server (102) side. FIG. 3 shows an embodiment (300) of the reading and marking unit of the present invention, which includes a text generation engine (301), a text processing engine (302), a semantic analysis engine (303), an audio matching engine (304), a text marking engine (305), and a synchronization engine (306).

The text generation engine (301) is configured to identify text content from one or more content items stored on the server (102) or in the content data (233) and to produce displayable visual text and punctuation, and it can determine visual properties such as layout, typeface, and font from other information in the content item. The layout may include the visual arrangement of pictures and text. The content items can be defined by various e-book-specific formats, such as PDF, EPUB, and AZW. The visual text may be contained in a single page of the e-book or spread across several pages. In other possible embodiments of the present invention, such as audiobook applications, the text generation engine (301) can be configured to use known speech recognition techniques to produce corresponding text content from received audio content.

The text processing engine (302) is configured to identify one or more sentences in the text content. For example, according to known rules, a sentence may be the text between two adjacent periods, or between any two adjacent punctuation marks (such as a comma and a period). One or more words enclosed by parentheses may also be treated as a recognized sentence. In other embodiments, sentence recognition can be further refined with machine-learning-based techniques, which can resolve cases that would otherwise fail because of punctuation errors. In one embodiment, text recognized as a sentence can be given an identifier or tag and stored in memory together with the corresponding text, so that each sentence has its own identifier or tag. For example, specific identifiers can be assigned to the sentences so that each sentence can be identified and the relationships between sentences can be clearly defined, such as the order of the sentences and the paragraph or line in which a sentence appears.
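
The punctuation-based rule described above can be illustrated with a short TypeScript sketch; the regular expression, the Sentence shape, and the id scheme are assumptions for illustration, not the patent's definition.

```ts
// Minimal sketch (assumed): split text into sentences at sentence-ending
// punctuation and give each one an identifier recording its order.
interface Sentence { id: string; index: number; text: string; }

function splitIntoSentences(text: string): Sentence[] {
  // Keep the trailing punctuation with each sentence; covers both
  // Western (. ! ?) and CJK (。！？) sentence-ending marks.
  const pieces = text.match(/[^.。！!？?]+[.。！!？?]?/g) ?? [];
  return pieces
    .map(p => p.trim())
    .filter(p => p.length > 0)
    .map((p, i) => ({ id: `s-${i}`, index: i, text: p }));
}

// Example: splitIntoSentences("我知道。今天天氣很好！") yields two sentences,
// each with its own identifier ("s-0", "s-1") and position index.
```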

The semantic analysis engine (303) is configured to determine, from the recognized sentence or sentences, semantic features associated with them, which can be stored in memory together with the text content and the identifiers or tags. The semantic features here refer to statistics or measures relating to a sentence's grammar, meaning, and/or word composition. In one embodiment, the semantic analysis engine (303) can divide the text of each sentence into several segments and determine corresponding semantic features for each segment. The semantic analysis engine (303) can be implemented with known machine-learning techniques (e.g., ???), and it can be built on a remote server and then downloaded and installed on the user terminal device. Alternatively, the semantic analysis engine (303) may not run on the user device at all, but run on the remote server with the analysis results stored there. Through continuous training feedback, the semantic analysis engine (303) can keep improving the accuracy of its semantic analysis and even detect errors in sentences.

The audio matching engine (304) is configured to identify, from the semantic features associated with a sentence, one or more audio content items corresponding to that sentence, thereby matching text content to audio content. The audio content may consist of one or more files and can be converted into an audio signal and output through a speaker. In one embodiment, the audio matching engine (304) can access an audio sample database (not shown) that stores candidate audio content corresponding to various words. In one embodiment, each audio content item in the audio sample database that corresponds to a character or word can be associated with one or more semantic features, and the matching is based at least on identifying the semantic features of the characters, words, and/or sentences and the semantic features associated with the audio content. The matching associates the text content (characters, words, sentences) with one or more audio content items. The audio matching engine (304) can be implemented by known means, for example an automatic read-aloud application. In other embodiments, such as audiobook applications, pre-recorded human narration can replace the audio content synthesized by the audio matching engine; that is, the human narration can be processed and associated with the corresponding text content for playback.
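
A hedged sketch of the kind of lookup this paragraph describes follows; the data shapes and the scoring rule (count of shared semantic features) are illustrative assumptions, not the patent's matching algorithm.

```ts
// Minimal sketch (assumed): pick the candidate audio clip that shares the
// most semantic features with the word being narrated.
interface AudioSample { clipUrl: string; features: Set<string>; }

function bestAudioFor(
  wordFeatures: Set<string>,     // e.g. { "date" } versus { "fraction" } for "3/4"
  candidates: AudioSample[],     // database entries for the same written form
): AudioSample | null {
  let best: AudioSample | null = null;
  let bestScore = -1;
  for (const c of candidates) {
    let score = 0;
    for (const f of c.features) if (wordFeatures.has(f)) score++;
    if (score > bestScore) { bestScore = score; best = c; }
  }
  return best;
}
```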

The text marking engine (305) is configured to generate, in response to a start signal or an input signal, one or more dynamic marks on the sentences and/or words associated with that signal. The dynamic mark can be presented visually to the user on the display and may take any suitable form, such as a highlighter-style mark over the text, an underline beneath it, or a change of the text's color, typeface, or font. "Dynamic" here means that during automatic narration the mark appears on, and skips between, sentences and/or words as the narration target advances; when automatic narration stops, the mark stays on the text or disappears. The start signal indicates the start of an automatic narration action; the text marking engine (305) marks the first recognized sentence or word of the text content in response to it. Alternatively, the start signal can indicate the resumption of automatic narration after a pause. How often the mark skips depends on the length of the sentences or words and on the narration speed. The input signal is produced by the input unit (270) and indicates position information on a display area of the display. In one embodiment, the input signal is produced by identifying a coordinate or set of coordinates in a display area of the display (for example when the user taps the touch screen), where the coordinates are associated with one or more pixel positions. In another embodiment, the input signal is produced by the selection of a part of the multimedia content (for example when the user taps part of the displayed content in the display area). Alternatively, the input signal can indicate position information for multimedia content that is not currently displayed (for example when the user taps Chapter 3 in a displayed table of contents); the text marking engine (305) can then associate a mark with text content that is not yet displayed in response to the input signal. Note that even though that multimedia content is not displayed, the position of each content item within it (for example, that the seventh sentence of the third paragraph of the text falls on lines one to five of page nine) can be determined from the layout rules already applied. The content playback module (232) described above can provide an input field that accepts navigation information for the e-book, such as chapter, page, or line number.
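
The following TypeScript sketch illustrates, under assumed names, how a marking component might react to a start signal and to an input signal that selects a new sentence; it is an interpretation of the behaviour described above, not the patent's implementation.

```ts
// Minimal sketch (assumed): a marker that starts on the first sentence and
// skips directly to whichever sentence an input signal selects.
class DynamicMarker {
  private currentId: string | null = null;

  constructor(private render: (sentenceId: string | null) => void) {}

  // Start signal: mark the first recognized sentence.
  start(firstSentenceId: string): void {
    this.currentId = firstSentenceId;
    this.render(this.currentId);
  }

  // Narration advanced to the next sentence in reading order.
  advance(nextSentenceId: string): void {
    this.currentId = nextSentenceId;
    this.render(this.currentId);
  }

  // Input signal: skip to the selected sentence, ignoring those in between.
  jumpTo(selectedSentenceId: string): void {
    this.currentId = selectedSentenceId;
    this.render(this.currentId);
  }

  // Narration stopped: keep the mark in place, or clear it.
  stop(clear = false): void {
    if (clear) { this.currentId = null; this.render(null); }
  }
}
```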

Taking an e-book as an example, FIG. 4 illustrates a display screen (400) in which a display area (401) shows part of the text content, while the remaining parts are either not displayed or are covered by a window (402). Optionally, the undisplayed content can be brought into view by a scrolling operation or a page-turning operation. For example, the input signal corresponding to a scrolling operation (403) indicates that a set of coordinates on the display area (401) changes along the vertical direction, so undisplayed text content is loaded into view from the top or bottom of the display area (401). The input signal corresponding to a page-turning operation (404) indicates that a set of coordinates on the display area (401) changes along the horizontal direction, so undisplayed content is loaded into view from the left or right side of the display area (401). The dynamic mark may either not respond to the loading of undisplayed content, or it may respond by remaining within the display area (401). A selection operation can be associated with the text content in the display area (401). As illustrated, the input signal corresponding to a first selection operation (405) indicates a coordinate or set of coordinates on the text "我知道" ("I know"), and the text marking engine can accordingly mark the sentence or word to which that text belongs. The input signal corresponding to a second selection operation (406) indicates a coordinate or set of coordinates on a page margin, and the text marking engine can accordingly mark the sentence or word of the text content closest to that margin. The input signal corresponding to a third selection operation (407) indicates a coordinate or set of coordinates at the junction of two sentences, and the text marking engine can accordingly choose to mark one of the two sentences.
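
As a hedged illustration, the vertical-versus-horizontal distinction described above could be classified from the start and end coordinates of a gesture roughly as follows; the threshold value and the three-way result type are assumptions for the sketch.

```ts
// Minimal sketch (assumed): classify a drag by how far the coordinates moved.
type Gesture = "select" | "scroll" | "page-turn";

function classifyGesture(
  startX: number, startY: number,
  endX: number, endY: number,
  tapThreshold = 10,            // pixels; below this, treat as a selection tap
): Gesture {
  const dx = endX - startX;
  const dy = endY - startY;
  if (Math.abs(dx) < tapThreshold && Math.abs(dy) < tapThreshold) return "select";
  // Dominant vertical movement -> scrolling; dominant horizontal -> page turn.
  return Math.abs(dy) >= Math.abs(dx) ? "scroll" : "page-turn";
}
```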

Various forms of input unit can carry out the operations described above and produce the corresponding input signals. A touch screen is a common input unit and also provides intuitive operation. Alternatively, the input unit can be an imaging device used to capture images of the reader's eyes or hand gestures, and with image recognition produce the input signal corresponding to an operation. Known image recognition techniques can determine where in the display area (401) the reader's eyes are focused, or track their gaze, to identify the operations described above. For example, when the reader's gaze rests on a position in the display area (401) for a certain time, or is combined with a blink, a selection operation can be recognized; when the reader is away from the display and makes a waving or pointing gesture, a page-turning or selection operation can be recognized. Alternatively, the input unit can be a microphone for capturing the reader's voice. Known speech recognition techniques can determine a keyword given by the reader and produce the input signal for the associated selection operation; further, combined with known search techniques, the text marking engine can mark every occurrence of the selected keyword in the text content. Offering a choice of input units is friendly to readers with disabilities and also helps applications in the field of teaching, rather than being limited to the use of conventional e-books.
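
As a hedged sketch of the keyword case mentioned above (a spoken keyword causing every occurrence in the text to be marked), the following assumes the keyword has already been produced by a speech recognizer; the return shape is illustrative.

```ts
// Minimal sketch (assumed): find every occurrence of a recognized keyword so
// the text marking engine can mark all of them.
function findKeywordRanges(
  text: string,
  keyword: string,
): Array<{ start: number; end: number }> {
  const ranges: Array<{ start: number; end: number }> = [];
  if (keyword.length === 0) return ranges;
  let from = 0;
  while (true) {
    const i = text.indexOf(keyword, from);
    if (i === -1) break;
    ranges.push({ start: i, end: i + keyword.length });
    from = i + keyword.length;
  }
  return ranges;
}
```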

Optionally, one or more dynamic marks can be shown in the display area. FIGS. 5A to 5D illustrate various embodiments of the dynamic mark of the present invention. FIG. 5A shows a dynamic mark (501) for a single sentence. In a scrolling embodiment, the dynamic mark (501) can be kept at a certain height or within a certain band of the display area during automatic narration, so the view is scrolled automatically as narration proceeds. In a page-turning embodiment, when the dynamic mark reaches the content at the bottom of the current page, loading the next view places the dynamic mark on the content at the top of the view. FIG. 5B shows a dynamic mark (502) for a single word. However, with long sentences or a fast narration speed, using only the sentence mark (501) or only the word mark (502) each has its drawbacks, so combining the two can compensate for them. FIG. 5C shows both marks at once, with the word mark (502) contained within the sentence mark (501); the two can be given an appropriate visual distinction, for example through color or transparency. FIG. 5D further shows a paragraph mark (503), which is suited to zoomed-out, paragraph-level content.
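
One way to render the combined sentence-plus-word mark of FIG. 5C in a web-style reader is sketched below; the markup, class names, and styling are assumptions, not taken from the patent.

```ts
// Minimal sketch (assumed): wrap the active sentence and, inside it, the
// active word, so the two marks overlap but stay visually distinct.
function renderSentenceWithWordMark(
  sentence: string,
  activeWordIndex: number,        // which whitespace-separated word is being read
): string {
  const words = sentence.split(/\s+/);
  const body = words
    .map((w, i) => (i === activeWordIndex ? `<span class="word-mark">${w}</span>` : w))
    .join(" ");
  // .sentence-mark could be a translucent background; .word-mark a stronger one.
  return `<span class="sentence-mark">${body}</span>`;
}

// Example: renderSentenceWithWordMark("I know the answer", 1)
//   -> '<span class="sentence-mark">I <span class="word-mark">know</span> the answer</span>'
```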

Returning to FIG. 3, the synchronization engine (306) is configured to synchronize the recognized audio content with the corresponding dynamic mark or marks. The synchronization engine (306) can, based on stored synchronization information or identification information (that is, identifiers or tags), output the audio content for a sentence or word and the dynamic mark to the output unit (260), such as the speaker and the display, in synchrony. In one embodiment, the synchronization engine (306) can use known identifiers or tags to associate a portion of the text content and its corresponding dynamic mark with a portion of the audio content, for example through a known linking mechanism, where the identifier or tag identifies a sentence or word in the text content. In some embodiments, the synchronization engine (306) can keep running the linking mechanism until all text content and audio content have been synchronized. The synchronization engine (306) can run on the remote server and store the synchronization result in the cloud, from which it can be downloaded to the user device together with the multimedia content. Further, the synchronization engine (306) can determine the display time of a synchronized dynamic mark from the playback time of the audio content item and record it in the synchronization result.
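
A hedged sketch of such a synchronization map follows; the record layout and the lookup by playback time are illustrative assumptions.

```ts
// Minimal sketch (assumed): a sync map from audio playback time to the
// sentence identifier that should be marked at that moment.
interface SyncEntry { sentenceId: string; startTime: number; endTime: number; } // seconds

function sentenceAtPlaybackTime(syncMap: SyncEntry[], t: number): string | null {
  for (const e of syncMap) {
    if (t >= e.startTime && t < e.endTime) return e.sentenceId;
  }
  return null;
}

// Example usage: call on every audio time-update tick to drive the mark.
// const id = sentenceAtPlaybackTime(syncMap, audioElement.currentTime);
// if (id !== null) markSentence(id);
```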

FIG. 6 shows a flowchart of the interaction between a user and the automatic reading device (such as the user device 104 of FIG. 1), comprising steps S600 to S640. In step S600, the user opens the e-book reader installed on the user device and opens an e-book file with it. The e-book reader can be downloaded from a website or link provided by the remote server, or it can be built into the user device. The reader can include a control interface that lets the user navigate the content of an article selectively, and it can also offer additional functions such as automatic narration and text-marking assistance. The reader can provide a reading window shown in a display area of the display, which shows part of the e-book content, including text, pictures, and even clickable links. The reader's navigation mode can be a scrolling mode or a page-turning mode, depending on user settings or on the type of e-book file loaded. Once the reader is open, part of the article appears in the display area, and step S600 ends.

In step S610, the user activates the automatic narration and marking functions through the reader's control interface (such as virtual or physical buttons). In one embodiment, the user can tap a virtual button, such as in the window (402) of FIG. 4, to call up a selection interface. Alternatively, the user can pull out a selection interface listing several selectable functions with a swipe gesture from an edge of the display area. The narration and marking functions can be independent of each other. Preferably, when both are set active, the results of narration and marking are synchronized. With narration and marking active, once the user presses the play button, the narrated audio signal and the dynamic mark are received by the user essentially simultaneously. The dynamic mark appears in the display area one character, one word, or one sentence at a time, and the text it covers matches, wholly or partly, the text associated with the audio signal. The dynamic marks and the audio signal are produced at an appropriate speed and in the predetermined order of the text content. For a display such as the one illustrated in FIG. 4, the dynamic mark and the narrated audio signal are automatically and synchronously associated from the first sentence or character of the first line of the display area down to the last sentence or character in the display area. In scrolling navigation mode, when narration reaches the last sentence or character in the display area, the reader can scroll the view automatically so that undisplayed text replaces part of the original, and narration and marking continue from the updated portion. In page-turning navigation mode, when narration reaches the last sentence or character in the display area, the reader can automatically load undisplayed text to replace the original and continue narration and marking from the updated portion. Unless the user commands the reader to stop, the narrated audio signal and the dynamic mark keep playing in the order of the article until the article ends, ending step S610.

In step S620, the reader determines whether the user has indicated a jump over, or a skipping of, part of the article. Such a jump or skip means that the user designates a new marking and narration target in the article that is not contained in the target currently being marked and/or narrated. The jump or skip can indicate that the user wants to switch from the current narration, marking, and/or display target to another target that is not yet being narrated, marked, and/or displayed; in other words, the user wants to change the text currently being narrated and/or marked. For example, the user can use the reader to decide on, and change, the sentence being narrated and dynamically marked. The reader keeps monitoring for any input signal indicating that the user wants to jump within the article, ending step S620.

If the reader does not receive such an indication, it continues to narrate and mark the next sentence or word in the predetermined order of the article until the article ends (step S630).

In step S640, the user instructs the reader to switch the target of automatic narration and dynamic marking. For example, the user can navigate to the target content and select within it a starting position for narration and marking. In one feasible embodiment, during navigation (whether scrolling or page turning) the narration and marking actions can automatically identify a starting position as the text content in the display area changes. For example, when scrolling causes the text currently being narrated and marked to disappear from the display area, the reader can be configured to automatically identify a position in the current display area from which new narration and marking continue. In other embodiments, the narration and marking actions do not change as the display area changes; that is, even if scrolling causes the text currently being narrated and marked to disappear from the display area, the narration position and order remain unchanged. This suits users who are only navigating and do not intend to change the current narration and marking target.

FIG. 7 shows the steps of dynamically marking text content according to the present invention, comprising steps S700 to S730. These steps can be implemented by executable instructions stored in one or more memories (for example in the server or the user device of FIG. 1). The steps can be executed entirely on the user terminal device, partly on the remote server, or jointly by the terminal device and the remote server.

In step S700, an instruction can be configured to analyze multimedia content (such as an e-book file) in order to recognize a plurality of sentences and/or words in the text content it contains. The recognition can be based on known semantic recognition and machine learning. In one embodiment, sentence recognition is based on the relationships between punctuation marks. In other embodiments, the recognition can include attempting to cut a run of text into different segments for analysis. In some embodiments, the recognition can include analyzing the semantics of a run of text to determine the extent of a sentence. Recognized sentences, words, or paragraphs can be given corresponding identification information, such as identifiers or tags, which can specifically indicate the contextual relationship or position of the sentence within an article. This identification information can be stored and transmitted along with the multimedia content, ending step S700.

In step S705, an instruction is configured to generate corresponding audio content based on the recognition of the sentences and/or words of the text content described above, which can be achieved with known machine narration techniques. Known techniques can output or synthesize corresponding audio content for every character, word, and sentence of the text. This audio content can be generated before narration and stored in suitable memory, or it can be generated and output on the fly while narration is being performed.
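
One readily available machine-narration facility that could play this role in a browser-based reader is the Web Speech API; the sketch below assumes that environment and uses the standard SpeechSynthesisUtterance interface, with the onend callback driving the sentence-by-sentence advance. It is an illustration of the kind of known means the paragraph refers to, not the patent's own synthesizer.

```ts
// Minimal sketch (assumed environment): narrate one sentence with the Web
// Speech API and report when it finishes, so the next sentence (and the
// dynamic mark) can advance in step.
function narrateSentence(
  text: string,
  lang: string,                 // e.g. "zh-TW" or "en-US"
  onFinished: () => void,
): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  utterance.onend = onFinished; // fires when the sentence has been spoken
  window.speechSynthesis.speak(utterance);
}
```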

In step S710, an instruction is configured to cause a processing unit to process an input signal, in particular an input signal related to a location identification in a display area of a display and/or to a change of a portion of the multimedia content in that display area. The input signal is received by the input/output interface (240) of the client device shown in FIG. 2. Different input units (270) can be communicatively connected to the input/output interface (240), such as a touch panel, an optical lens, or a microphone; the input signals these units produce have been described above. As shown in FIG. 4, the user can use these input units to perform navigation operations and selection operations relative to the display area, where the navigation operations (403, 404) cause the portion of the multimedia content shown in the display area to change (for example by scrolling, page turning, or switching), and the selection operations (405, 406, 407) select a position in the display area. In one operation, the user first navigates to find the content they want to read, and then performs a selection operation to choose a reading item, such as a sentence or a word. From this, the processing unit obtains at least one or more pieces of position information in the display area, ending step S710.

In step S720, an instruction is configured to cause the processing unit (or the marking unit) to generate one or more dynamic marks associated with the text content, the dynamic marks being visible in the display area. In one embodiment, the processing unit automatically applies the dynamic mark to the text in the display area. In another embodiment, using the position-identification information obtained for the display area as described above, the processing unit finds the sentence or word corresponding to that position and generates the dynamic mark over the corresponding range in the display area. As shown in FIG. 4, whether or not the position associated with the user's selection operation (405, 406, 407) points directly at a sentence or word of the article, the sentence most relevant to that position should be identified and marked first. In one embodiment, the processing unit can also, following a navigation operation, generate the dynamic mark on the text content on which the display area finally comes to rest; for example, after a page-turning operation a new dynamic mark can be generated on the first sentence of the new page. One or more visible dynamic marks can appear in the display area; FIGS. 5A to 5D show a single sentence mark, a single word mark, and combinations of the two.
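
To illustrate the "most relevant sentence" rule for selections that land in a margin or between sentences, the following self-contained sketch falls back to the sentence whose rendered bounding box is nearest to the tap point; the Box shape and the distance metric are assumptions.

```ts
// Minimal sketch (assumed): pick the sentence whose rendered bounding box is
// closest to the tap point, so taps on margins or between sentences still
// resolve to the most relevant sentence.
interface Box { sentenceId: string; x: number; y: number; width: number; height: number; }

function mostRelevantSentence(boxes: Box[], tapX: number, tapY: number): string | null {
  let best: string | null = null;
  let bestDist = Number.POSITIVE_INFINITY;
  for (const b of boxes) {
    // Distance from the tap to the box (zero if the tap lies inside it).
    const dx = Math.max(b.x - tapX, 0, tapX - (b.x + b.width));
    const dy = Math.max(b.y - tapY, 0, tapY - (b.y + b.height));
    const d = dx * dx + dy * dy;
    if (d < bestDist) { bestDist = d; best = b.sentenceId; }
  }
  return best;
}
```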

In step S730, the processing unit causes the dynamic mark to skip from a first part of the text content to a second part of the text content in response to the input signal, where the second part of the text content appears in the display area. In one possible scenario, a first sentence in the current display area already carries a dynamic mark, and after the user selects a second sentence in the current display area (thereby producing the input signal), the dynamic mark on the first sentence skips to the second sentence the user selected. The skipping described here does not refer to a literal jumping motion; it should be understood as a visual effect resembling a jump or switch, in which the dynamic mark can be seen to pass over the other sentences between the first sentence and the second sentence without dwelling on them. In another possible scenario, a first sentence carrying a dynamic mark disappears from the current display area because of a navigation operation, and once the user selects a second sentence in the current display area (producing the input signal), the dynamic mark returns to the display area and marks the second sentence. Although the dynamic mark may disappear from the display area as a result of such operations, it can be regarded as skipping from the undisplayed portion to the displayed content. Step S730 can be performed at the same time as step S720 or as part of step S720.

FIG. 8 shows the steps of the reading method of the present invention, comprising steps S800 to S830. These steps can be performed individually or jointly by one or more computing devices (such as the server 102 and the user device 104 of FIG. 1). For example, when the user device has a communication connection to the server, parts of these steps can be performed by both together; alternatively, when offline, the user device can perform these steps on its own.

In step S800, a multimedia content is obtained via a remote server or a user device, and a portion of the multimedia content is displayed in the display area of a display. The multimedia content may be a combination of various content items, such as text content, picture content, audio content and video content, and may also be integrated into streaming content transmitted from the remote server to the user device. The display may be included in the user device, or it may be an external device that is independent of the user device and communicatively connected to it. The display area (see the fourth figure, 401) shows a portion of the multimedia content. For an e-book, the display area is dominated by text content, and pictures or advertising banners may be interspersed between passages of text. Step S800 then ends.
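
A minimal sketch of step S800 is given below. It assumes a hypothetical /content endpoint on the remote server and a locally cached copy for the offline case; the endpoint, the content type and the storage key are illustrative assumptions and do not come from the disclosure.

```typescript
// Minimal sketch of step S800: obtain the multimedia content from a remote
// server when online, or from a copy previously stored on the user device.

interface MultimediaContent { text: string; imageUrls?: string[]; audioUrls?: string[]; }

async function loadContent(bookId: string, online: boolean): Promise<MultimediaContent> {
  if (online) {
    const res = await fetch(`/content/${bookId}`);          // hypothetical remote endpoint
    if (!res.ok) throw new Error(`content request failed: ${res.status}`);
    return await res.json() as MultimediaContent;
  }
  // Offline: fall back to a copy previously cached on the user device.
  const cached = localStorage.getItem(`book:${bookId}`);
  if (cached === null) throw new Error('no cached copy of the book');
  return JSON.parse(cached) as MultimediaContent;
}
```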

In step S810, a mechanical reading means is initiated via the user device to output an audio signal based on the text content. The user device may be configured with the ability to read text aloud; for example, it may include an e-book reader, a sound database, a semantic-recognition engine or module, and a speaker. The sound database stores audio data corresponding to each character, word or sentence, and this data can be matched against the results of semantic recognition to output the corresponding audio signal. The audio signal described herein may be in digital or analog form, and is not limited to the signal at the circuit-transmission stage or to the final audible output. If the user does not specify otherwise, the mechanical reading may start from any point in the text content, such as the first character of the text content, the first character of the text currently shown in the display area, or the position where the previous reading ended. The reading continues until the article ends or the user actively stops it, and step S810 ends.
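
Step S810 can be approximated with the browser's Web Speech API, used here purely as a stand-in for the sound database and semantic-recognition engine described above; the starting offset and the zh-TW language tag are assumptions.

```typescript
// Illustrative sketch of step S810: start narration of the text content,
// optionally from a user-chosen character offset.

function startNarration(text: string, startOffset = 0): SpeechSynthesisUtterance {
  const utterance = new SpeechSynthesisUtterance(text.slice(startOffset));
  utterance.lang = 'zh-TW';                 // assumption: Traditional Chinese content
  utterance.rate = 1.0;
  window.speechSynthesis.cancel();          // stop any narration already in progress
  window.speechSynthesis.speak(utterance);  // continues until the text ends or cancel() is called
  return utterance;
}
```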

In step S820, one or more dynamic marks are generated in the display area via the user device; each dynamic mark points out a sentence and/or a word that the reader is intended to focus on. The text content indicated by the dynamic mark is synchronized with the text content associated with the audio signal. Synchronization here means that the range covered by the dynamic mark matches the word currently being read aloud, or the sentence to which that word belongs; it does not require that the related signals occur at exactly the same instant. In one embodiment, the dynamic mark is a sentence mark, which appears at the position where a sentence is displayed so that the sentence is visually set apart from the surrounding text (see Figure 5A). In another embodiment, the dynamic mark is a word mark, which appears at the position where a word is displayed so that the word is visually set apart from the surrounding text (see Figure 5B); the word mark can keep jumping to the next word at the pace of the narration. In other embodiments, the dynamic mark is a combination of a sentence mark and a word mark, appearing simultaneously at the position of a sentence and at the position of a word within that sentence, with the two remaining visually distinguishable (see Figure 5C); for example, the sentence mark and the word mark may have different colors, or one of them may be an underline. The dynamic mark keeps jumping toward the end of the article until the narration stops, and step S820 ends. Preferably, step S810 and step S820 are performed together.
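
The word-level and sentence-level synchronization of step S820 can be sketched with the Web Speech API 'boundary' event, which reports the character index of the word currently being spoken. The fragment below is an assumed illustration only; in particular, the approximation of the word's end position is an assumption, since speech engines report boundaries differently.

```typescript
// Illustrative sketch of step S820: keep the word mark (and, via the caller,
// the sentence mark) in step with the narration.

function narrateWithMarks(
  text: string,
  markWord: (start: number, end: number) => void,
  markSentence: (offsetInSentence: number) => void,
): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (e: SpeechSynthesisEvent) => {
    if (e.name !== 'word') return;           // some engines also emit sentence boundaries
    const start = e.charIndex;
    // Approximate the word end at the next whitespace or punctuation mark;
    // engines differ in how (or whether) they report word lengths.
    const m = /[\s。，！？.,!?]/.exec(text.slice(start));
    const end = m ? start + m.index : text.length;
    markWord(start, end);        // word mark follows the narration word by word
    markSentence(start);         // caller maps the offset to its sentence mark
  };
  window.speechSynthesis.speak(utterance);
}
```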
In step S830, before the narration and the dynamic marks have stopped, or after they have stopped, an input signal is received via the user device; the input signal relates to a position identification in the display area and/or to a change of the displayed portion of the multimedia content in the display area. The user device may include, or be communicatively connected to, an input unit such as a touch panel, an optical lens or a microphone, which allows the user to indicate position information on the display area and to navigate through all of the multimedia content in the display area. The position information comprises a coordinate, or a set of coordinates, on the display area, and may indicate one or more user operations such as the selection operation or navigation operation described above. The display of the dynamic mark and the output of the audio signal jump from a first portion of the text content to a second portion in response to the input signal, where the second portion of the text content appears in the display area. When a selection operation is recognized by the user device, the sentence or word most closely associated with the position information is then identified in response to that selection operation. The sentence or word identified on the basis of the input signal becomes the new target of narration and marking, and is read aloud and marked immediately. The target of the dynamic mark jumps from the original sentence (the first sentence) to the identified sentence (the second sentence); the first and second sentences are different sentences, and the first sentence need not precede the second. If an input signal indicating a navigation operation causes the displayed content to change, the dynamic mark changes its position on the screen, or disappears, along with that change. In one embodiment, when the dynamic mark disappears in this way, the user device may generate a new dynamic mark in the changed display area to mark the current content, and the narration target is synchronized to the new target content at the same time. In some embodiments, such as an e-book table of contents containing chapter links or a shortcut for returning to the home page, an input signal indicating a selection or navigation operation may cause the user device to direct the reader to the page associated with the selected chapter link, and the narration and marking targets are synchronized to that page as well. Based on the input signal, the target of the dynamic mark and/or the narration target jumps to the new target content, and step S830 ends.
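
The retargeting of step S830 can be sketched as an input handler that cancels the current narration and restarts it from the newly identified sentence while re-anchoring the dynamic mark. All names below are illustrative assumptions rather than the disclosed implementation; the sentence-lookup callback stands in for the helpers sketched earlier.

```typescript
// Minimal sketch of step S830: an input signal retargets both the narration and
// the dynamic mark to the newly identified sentence.

interface TargetRange { start: number; end: number; }

function onUserInput(
  text: string,
  tappedOffset: number,
  findSentence: (offset: number) => TargetRange,
  applyMark: (r: TargetRange) => void,
): void {
  const target = findSentence(tappedOffset);
  applyMark(target);                           // the mark jumps straight to the new sentence
  window.speechSynthesis.cancel();             // abandon the previous reading target
  const utterance = new SpeechSynthesisUtterance(text.slice(target.start));
  utterance.lang = 'zh-TW';                    // assumption: Traditional Chinese content
  window.speechSynthesis.speak(utterance);     // narration resumes from the new target
}
```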

It is to be understood that the step illustrations, and combinations of steps, in such flowchart depictions can be implemented as computer program instructions. These program instructions may be provided to a processor to produce a machine, such that when the instructions are executed on the processor they provide means for implementing the actions specified in the flowchart block or blocks. The computer program instructions may be executed by a processor to cause the processor to perform a series of operational steps, forming a computer-implemented process, so that the instructions executed on the processor provide steps for implementing the actions specified in the flowchart block or blocks. These program instructions may be stored on a computer-readable medium or machine-readable medium, such as a computer-readable storage medium.

Accordingly, the descriptions support combinations of means for performing the specified actions, combinations of steps for performing the specified actions, and program-instruction means for performing the specified actions. It will also be understood that each block of the flowchart depictions, and combinations of blocks in the flowchart depictions, can be implemented by modules, such as special-purpose hardware-based systems that perform the specified actions or steps, or by combinations of special-purpose hardware and computer instructions.

The foregoing provides a complete description of the manufacture and use of the combinations described in the specific embodiments. Since many embodiments can be made without departing from the spirit and scope of this description, such embodiments fall within the scope of the claims appended below.

100‧‧‧System
102‧‧‧Server
1020‧‧‧Central processing unit
1022‧‧‧Memory
1024‧‧‧Network interface
1026‧‧‧Digital storage unit
104‧‧‧User terminal device, user device
106‧‧‧Network
200‧‧‧User device
202‧‧‧Processing unit
210‧‧‧Computer-readable medium
220‧‧‧Network interface
230‧‧‧Memory
231‧‧‧Operating system
232‧‧‧Content playback module
233‧‧‧Content data
240‧‧‧Output/input interface
250‧‧‧Reading and marking unit
260‧‧‧Output unit
270‧‧‧Input unit
300‧‧‧Reading and marking unit
301‧‧‧Text generation engine
302‧‧‧Text processing engine
303‧‧‧Semantic analysis engine
304‧‧‧Audio matching engine
305‧‧‧Text marking engine
306‧‧‧Synchronization generation engine
400‧‧‧Display screen
401‧‧‧Display area
402‧‧‧Window
403‧‧‧Scrolling operation
404‧‧‧Page-turning operation
405‧‧‧First selection operation
406‧‧‧Second selection operation
407‧‧‧Third selection operation
501‧‧‧Dynamic mark
502‧‧‧Dynamic mark
503‧‧‧Paragraph mark
S600-S640‧‧‧Steps
S700-S730‧‧‧Steps
S800-S830‧‧‧Steps

The first figure shows a system provided by the present invention.

The second figure shows an embodiment of the user device of the first figure.

The third figure shows an embodiment of the reading and marking unit of the present invention.

The fourth figure illustrates a display screen of a display.

Figures 5A to 5D schematically illustrate various embodiments of the dynamic mark of the present invention.

The sixth figure shows the flow of interaction between the user and the automatic reading device.

The seventh figure shows the flow of steps for dynamically marking text content according to the present invention.

The eighth figure shows the flow of steps of the reading method of the present invention.


Claims (14)

An automatic reading device configured to receive and display a multimedia content, the multimedia content comprising at least text content, the reading device comprising: a display having a display area for displaying a portion of the multimedia content; an input interface receiving an input signal, the input signal being related to a position identification in the display area and/or to a change of the portion of the multimedia content in the display area; and a reading and marking unit configured to generate audio content associated with the text content and one or more dynamic marks, the dynamic marks jumping from a first portion of the text content to a second portion of the text content in response to the input signal, wherein the first portion of the text content is related to a first audio content, and the second portion of the text content is related to the input signal and appears in the display area.

The automatic reading device as claimed in claim 1, wherein the dynamic mark comprises a sentence mark.

The automatic reading device as claimed in claim 1, wherein the dynamic mark comprises a word mark.

The automatic reading device as claimed in claim 1, wherein the dynamic mark comprises a sentence mark and a word mark, the sentence mark and the word mark overlapping in a visually distinguishable manner.

The automatic reading device as claimed in any one of claims 2 to 4, wherein the range of the sentence mark is defined by two punctuation marks of the text content.

The automatic reading device as claimed in claim 1, wherein the second portion of the text content is related to a second audio content.

A non-transitory computer-readable medium comprising a plurality of instructions executable by a processing unit to: analyze text content contained in a multimedia content to identify a plurality of sentences and/or words; receive an input signal, the input signal being related to a position identification in a display area and/or to a change of a portion of the multimedia content in the display area; generate one or more dynamic marks associated with the text content in response to the input signal, the dynamic marks being visible in the display area; and cause the dynamic marks to jump from a first portion of the text content to a second portion of the text content, wherein the second portion of the text content is related to the input signal and appears in the display area.
The non-transitory computer-readable medium as claimed in claim 7, wherein the instructions further perform: generating corresponding audio content based on an identification of the sentences and/or words of the text content, the output of the audio content being synchronized with the dynamic marks.

The non-transitory computer-readable medium as claimed in claim 7, wherein generating one or more dynamic marks associated with the text content comprises cancelling a previous dynamic mark.

The non-transitory computer-readable medium as claimed in claim 7, wherein the change of the portion of the multimedia content in the display area comprises a scrolling operation or a page-turning operation on the display area.

An automatic reading method performed by a processing unit of a computing device, the method comprising: obtaining a multimedia content and displaying a portion of it in a display area of a display, the multimedia content having text content; initiating a mechanical reading means to output audio content based on the text content; generating one or more dynamic marks in the display area, the dynamic marks indicating a sentence and/or a word of the text content, the text content indicated by the dynamic marks being synchronized with the text content associated with the audio content; and receiving an input signal, the input signal being related to a position identification in the display area and/or to a change of the portion of the multimedia content in the display area, the display of the dynamic marks and the output of the audio content jumping from a first portion of the text content to a second portion in response to the input signal, wherein the first portion of the text content is related to a first audio content, and the second portion of the text content is related to the input signal and to a second audio content and appears in the display area.

The automatic reading method as claimed in claim 12, wherein generating one or more dynamic marks in the display area comprises simultaneously generating a first dynamic mark indicating a sentence and a second dynamic mark indicating a word, the first dynamic mark and the second dynamic mark overlapping in a visually distinguishable manner.

The automatic reading method as claimed in claim 12, wherein the input signal is associated with an operation on a touch interface, an image recognition result, or a voice recognition result.
The automatic reading method as claimed in claim 12, wherein the display of the dynamic mark jumping from the first portion of the text content to the second portion in response to the input signal comprises the display of the dynamic mark jumping from a first sentence of a first portion of the text content to a second sentence of a second portion of the text content.
TW107127720A 2018-08-09 2018-08-09 E-book apparatus with audible narration and method using the same TWI717627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107127720A TWI717627B (en) 2018-08-09 2018-08-09 E-book apparatus with audible narration and method using the same


Publications (2)

Publication Number Publication Date
TW202009891A true TW202009891A (en) 2020-03-01
TWI717627B TWI717627B (en) 2021-02-01



Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100225809A1 (en) * 2009-03-09 2010-09-09 Sony Corporation And Sony Electronics Inc. Electronic book with enhanced features
CN104423868B (en) * 2013-09-04 2019-03-08 腾讯科技(深圳)有限公司 E-book reading localization method and device
CN107369462B (en) * 2017-07-21 2020-06-26 阿里巴巴(中国)有限公司 Electronic book voice playing method and device and terminal equipment
TWM575595U (en) * 2018-08-09 2019-03-11 台灣大哥大股份有限公司 E-book apparatus with audible narration

Also Published As

Publication number Publication date
TWI717627B (en) 2021-02-01
