TW434492B

TW434492B - Hyper text-to-speech conversion method

Info

Publication number: TW434492B
Application number: TW87110229A
Authority: TW
Inventors: Jin-Jiun Jung; Shau-Hua Huang; Chung-Bin Jung
Original assignee: Ind Tech Res Inst
Priority date: 1998-06-25
Filing date: 1998-06-25
Publication date: 2001-05-16

Abstract

The present invention relates to a system that can convert hyper text into speech signals. The system consists of a hyper text markup language (HTML) analyzer, an articulation control command analyzer, a tag converter, a text converter, and a traditional text-to-speech converter. The HTML analyzer reads and analyzes the input hyper text and divides it into text content, HTML tags that mark up the text structure, and articulation control commands that control the way of articulation. The articulation control command analyzer analyzes each articulation control command and, in accordance with the command type, stores the command content in a tag corresponding table, a sound effect table, a parameter table, a heteronym table, or a proper noun table. In accordance with the heteronym table and the proper noun table, the text converter carries out text replacement or conversion to correct the text articulation. The tag converter analyzes each HTML tag and, in accordance with the tag corresponding table and the parameter table, controls the way of articulation of the traditional text-to-speech converter; in accordance with the tag corresponding table and the sound effect table, it also synthesizes sound effects so that listeners can easily distinguish the text structure. Based on the way of articulation set by the tag converter, the traditional text-to-speech converter converts the text content produced by the text converter into speech signals.

Description

Background

A text-to-speech converter is a device that converts text into speech. For people with visual impairments, such a device helps them receive information from the outside world. In certain situations, such as while driving or using a telephone, it is also an important means for other people to obtain information. The information may come from electronic documents, or from text produced by an optical scanner and a character recognition device.

Sources of electronic information in daily life are increasingly numerous and are growing geometrically: e-mail, calendars, electronic news, stock information, and the much-watched World Wide Web. Converting this information into digital speech by manual recording followed by digitization would consume enormous manpower and storage space. Manual recording also cannot handle electronic information that a computer system assembles automatically according to user needs.

Converting electronic information that was designed for visual display into speech is a real challenge for designers of text-to-speech converters. The main reason is that presenting electronic information involves not only its text content but also the way that content is presented, such as capitalization, bold, italics, paragraph style, and list style in a visual display. In text-to-speech conversion, the formatting and font control codes that govern visual display cannot be converted directly into speech. Punctuation marks in the text content cannot be converted directly into speech either. In addition, the same character string may be pronounced differently in different contexts; Chinese heteronyms are a typical example.

The prior art has proposed various solutions to these problems. U.S. Patent No. 5,555,343 discloses a text-to-speech technique that covers the handling of formatting and font control codes, punctuation marks, and specific alphanumeric formats. The method uses a first pre-built table to map formatting and font control codes to speech control codes, which control the speed or volume of articulation. It uses a second pre-built table to map specific alphanumeric formats to spoken text strings. These formats include colon-separated digit strings that represent times, slash-separated digit strings that represent dates, and slash-separated text strings that represent file directories. It uses a third pre-built table to map punctuation marks or mathematical operators to spoken text strings or speech control codes. The method also uses a pre-built table to decide whether an input character is pronounceable. Only when an unpronounceable character is encountered does it consult the first, second, and third tables to determine a suitable articulation.

U.S. Patent No. 5,634,084 discloses another text-to-speech technique for this kind of problem. The method first classifies the input text by context into categories such as numbers, units of measure, geographic terms, and times and dates. It then expands the classified text, using one or more category-specific abbreviation tables, into spoken words. For example, it can convert the place-name abbreviation "SF, CA" into "San Francisco California", and it can convert "MPEG" into the spoken form "mpeg".

With the spread of the Internet and the World Wide Web, the World Wide Web has become one of the main sources of electronic information. Most electronic information on the World Wide Web is in Hyper Text Markup Language (HTML) format; such documents are called hypertext documents here. Unlike other electronic documents, the source of a hypertext document contains HTML tags in addition to the document content. HTML tags are text tags defined by HTML that mark up the content and structure of a document or control its display. The following example shows the source of a hypertext document:

<!BODY BGCOLOR=#DBFFFF>
<body bgcolor=white>
<CENTER>
<map name="Main">
<area shape="rect" coords="157,12,257,112" href="Main.html">
<area shape="rect" coords="293,141,393,241" href="VRML.html">
<area shape="rect" coords="18,141,118,241" href="VRML.html">
<area shape="rect" coords="157,226,257,366" href="Main.html">
</map>
<img src="Images/Main.gif" usemap="#Main" border=0></img>
<br><br><br><br>
Welcome to the VR workshop of our company
<a href="http://www.ccl.itri.org.tw"><font size=3 color=blue>ITRI</font></a> /
<a href="http://www.ccl.itri.org.tw">CCL</font></a> .
We have been developing some advanced technologies as follows,
<ul>
<a href="Main.html"> <li><font size=3 color=blue> PanoVR </a> (A panoramic image-based VR)</font><br>
<a href="VRML.html"> <li><font size=3 color=blue>CyberVR </a> (A VRML 1.0 browser)</font><br>
</ul>
<br><br>
<a href="Winner.html"><img src="Images/Winner.gif" border=no></img></a><br>
</a>
<br><br>
You are the <img src="cgi-bin/Count.cgi?df=vvr.dat" border=0 align=middle>th visitor
<HR size=2 WIDTH=480 ALIGN=CENTER>
(C) Copyright 1996 Computer and Communication Laboratory,
Industrial Technology Research Institute, Taiwan, R.O.C,
</BODY>

As this example shows, the source of a hypertext document consists entirely of displayable characters; it contains no special, non-displayable control codes.
HTML tags are delimited by "<" and ">" and are divided into start tags and end tags. A start tag begins with "<", and an end tag begins with "</". Thus a tag such as "<font size=3 color=blue>" is the start tag of the "font" tag, and "</font>" is its end tag. HTML gives special meaning to the text content enclosed by a start tag and an end tag in order to express the structure of the document, such as headings, paragraphs, lists, and tables. How these structural elements are displayed is controlled by the web browser, so the same hypertext document may be displayed differently in different browsers or display systems. HTML tags may also be nested. In the example above, for instance, two tags, one of them "<font size=3 color=black>", are applied in nested fashion to the text "Welcome to the VR workgroup of our company".

Although the example above is a hypertext document written in English, the text content of a hypertext document may be in other languages, such as Chinese, Japanese, and Korean. Readers of Chinese web pages are familiar with English proper nouns such as "World Wide Web", so English proper nouns are often mixed into ordinary Chinese web pages. Technical documents, meeting notices, and memos written in HTML also often use English proper nouns or abbreviations. As a result, ordinary web pages often mix several languages. The problem of a single string having several pronunciations must also be considered. With Chinese heteronyms, for example, the same character is pronounced differently in different words or contexts.

The traditional text-to-speech converters described above are not suitable for converting hypertext documents into speech signals. The first problem to solve in the conversion is how to parse a hypertext document in order to recognize its HTML tags. Because HTML tags consist of displayable characters rather than special control codes, and because HTML tags may be nested, the prior-art text-to-speech converters are not suitable for parsing hypertext documents. Furthermore, the traditional text-to-speech converters cannot solve the heteronym problem. Some Chinese text-to-speech converters solve part of the heteronym problem, but they do not consider text in which several languages are mixed. The present invention focuses on overcoming these problems.

Summary of the Invention

The method proposed by the present invention can convert hypertext documents into speech signals, or serve other purposes. The embodiment presented here is a computer system that converts hypertext documents into speech signals. The computer system comprises an HTML analyzer, an articulation control command analyzer, a tag converter, a text converter, and a traditional text-to-speech converter. The HTML analyzer reads and parses the input hypertext document and separates it into text content, HTML tags that mark the document structure, and articulation control commands that control the way of articulation. The articulation control command analyzer analyzes each articulation control command and, according to the command type, stores its content in the tag corresponding table, the sound effect table, the parameter table, the heteronym table, or the proper noun table. According to the heteronym table, the text converter replaces every string in the text content whose pronunciation must be corrected with a substitute string. According to the proper noun table, the text converter also replaces every string in the text content that must be translated with its translated string. The tag converter analyzes each HTML tag and, according to the tag corresponding table and the parameter table, controls the articulation parameters of the traditional text-to-speech converter, changing parameters such as the volume, speed, and pitch of the text content marked by that tag. The tag converter also synthesizes sound effects according to the tag corresponding table and the sound effect table, so that listeners can easily distinguish the structure of the document. The traditional text-to-speech converter converts the text content produced by the text converter into speech signals, using the way of articulation set by the tag converter.
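The separation step described above can be sketched in a few lines. This is a minimal illustration, not the patent's HTML analyzer: it relies on Python's standard `html.parser` module, assumes (as the detailed description states) that articulation control commands are carried in HTML comments, and assumes a hypothetical parameter syntax such as `speed=1.0`. The names `Splitter` and `split_document` are invented for this sketch.

```python
from html.parser import HTMLParser

# Command identifiers named in the description (PARAM, AUDIO, ALT, TERM).
COMMAND_KEYWORDS = ("PARAM", "AUDIO", "ALT", "TERM")


class Splitter(HTMLParser):
    """Collects (kind, payload) document elements in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch; only the tag name is kept.
        self.elements.append(("tag", tag.upper()))

    def handle_endtag(self, tag):
        self.elements.append(("tag", "/" + tag.upper()))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.elements.append(("text", text))

    def handle_comment(self, data):
        # Only comments that start with a command identifier are commands;
        # ordinary comments are dropped.
        words = data.split()
        if words and words[0] in COMMAND_KEYWORDS:
            self.elements.append(("command", data.strip()))


def split_document(source):
    """Separate HTML source into text, tag, and command elements."""
    parser = Splitter()
    parser.feed(source)
    parser.close()
    return parser.elements
```

Calling `split_document` on a fragment that contains a comment beginning with `PARAM`, a list, and an ordinary comment yields one command element, the tag and text elements in order, and nothing for the ordinary comment.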


.I - -t I t- -I ,-f - t ' ,-f - t ' 經濟部中央標隼局貝工消費合作社印製 A7 五、發明説明（y) 這個依據本發明所建構的系統，可以將超媒體文件轉換為語音訊號，同時也解決破音字及語言混雜等問題。這個系統不但設計精巧，同時也極具應用與擴充能力°此裝置與方法不但可提供個人化的文字語音轉換，而且也讓超媒體文件的提供者，更易於設計超媒體文件的語音表達方式。 434492 A7 B7 丨 —~ . . 五、發明説明（7) 圖式簡單說明經濟部中央標準局員工消费合作社印裝圖1說明本發明的一具體實施例。圖2說明本發明各個元件間的資料流程。圖3說明-序列隱藏於超媒體文件中的發音控制指令。圖4A’4B及4C分別說明參數表、音效表及標籤對應圖5A及5B分別說明破音字表及專有名詞表。 ^ 圖ό說明文件讀取控制器的執行步驟。圖7說明發音控制指令剖析器的執行步驟。圖8說明文字轉換器的執行步驟。圖9說明標蕺轉換器的執行步驟。發明的詳細說明圖1說明一依據本發明所建構的超媒體語音轉換統10。此系統是一電腦系統，内含一中央處理器U、、主記憶體12、網路裝置13、電話介面裝置14 和滑鼠15、音效裝置16、顯示器17以及儲存裝置 18。此系統使用一條排線π將這些裝置1Μ8連接在一起。經由此排線19 ’這些裝置11-18間可傳輪指令或資料。儲存裝置18可以是磁碟機，用於儲存資料及程 (pr〇CeSS)。主記憶體12也是儲存資料及程序，但崎當是用來儲存目前中央處理器21纟在執行的指令；料。中央處理ϋ n是絲執躲序巾的指令並處理資料。網路裝置13是用來與電腦網路連接，例如乙太網 {請先閲讀背面之注意事項再填寫本頁) 、-5 良 __________ - 10 - G法尺度適$中國國家標準了Gs、Μ規格經濟部中央標準局負工消費合作社印製 A7 ---—--- -B7 五、發明説明（丨0 ) 路連接器或是其他型態之網路卡。電話介面裝置14是用來與電話網路連接。鍵盤和滑& 15 A用於接收使用者輸入的命令或育料。顯示器丨7是以文字型式或圖形塑式顯示電子資訊。音效裝置16接收數位語音訊號，並透過喇17八或耳機發出聲音或音效。。如圖1所示，儲存裝置丨8内儲存了作業系統、應用程式超媒體文件禮案23、發音控制指令檀21及文件讀取模組29。作業线及應_式是-般熟知的技術在此不再贅述。文件讀取模組29包含文件讀取，制器28、文字語音轉換器27、超媒體標示語言剖析，24、發音控制指令剖析器22、標籤轉換器乃、標籤對應表41、參數表42'音效表43、文字轉換器26、破音字表31及專有名詞表％。雖然上述之程序22，24-28是以分時共享的方式，由中央處理ϋ Π貞責執行，但這只是為了方便陳述本發明之方法。上狀程彳22，24-28也可使用熟知的硬體技術，以硬體的實施方式來達到相同的功能，在此不再贅述這種實施方式。另外文字語音轉換器27及超媒體標示語言剖析器24也是熟知的技藝，在此不再詳述。文件讀取控制器28負責控制整個轉換過程中，各個程序_資料流程。圖2顯示文件讀取控制器28内之資料流程。超媒體文件檔案23可能源自於網路裝置13,或是從儲存裝置18讀取。發音控制指令檔21也是源自於網路裝置13，或是從儲存裝置讀取。 __ - π - 國國家標準（CNS ) A4現格（ -- ----- —I I— n H I Ά —I n .—I I--- ]—. 丁— — _____ n ^ - 、T-.' 
(讀先閱讀背面之注意事項再填寫本頁) 經濟部中央搮箪局員工消資合作社印製 4344 92 A7 ________ ___B7 五、發明説明（（1 ) 超媒體標示語言騎器24分析超媒體文件標案23 的=容’將其分離成文字内容、標示文件結構的超媒體標籤及控制發音方式的發音㈣指令^超雜標示語言剖析器24將分離出的超媒體標籤送至標籤轉換器25。超媒體標示語言剖析器24將分離出的文字内容送至文字轉換器26。超媒體標示語言剖析器24將分離出的發音控制指令送至發音控制指令剖析器22。發音控制指令剖析器22負責分析發音控制指令。發音控制指令可能儲存於獨立的發音控制指令檔21，或是隱含在超媒體文件檔案23中。發音控制指令分成下列四種型態： (1) 聲音控制指令’其格式為：PARAM超媒體標籤屬性發音參數； (2) 音效控制指令’其格式為：AUDIO超媒體標籤屬性音效資料； (3) 破音字控制指令，其格式為：ALT破音字_替代字串前後文字串； (4) 專有名詞控制指令，其格式為：TERM專有名詞字串譯文字串。圖3顯示一序列的發音控制指令no, 12〇, 13〇, 140， 150 ’ 160 ’ 170及180。這些發音控制指令是使用超媒體標示語言中的註解標籤（comment tag)加以標示，使其能隱藏於超媒體文件中。指令U0是一聲音控制指令，因其使用^PARAIVT識別碼111。這個聲音控制指令11〇定義所有以超媒體標籤113，Lr標示的文字内容，均 -12 - 本紙張尺度適用中國國家標準（CNS ) A4規格（210 X29*/公釐） --------- I I 1 (請先閱讀背面之注意事項再填寫本頁) 訂竦 4344 9 A7 B7 經濟部中央榡隼局員工消費合作社印製五、發明説明（丨>) 依據其發音參數115所定義的參數作聲音控制。這個聲音控制指令110中，發音參數115所定義的參數為：速度（speed)是 1.0，音量（volume)是 0.8，韻律(pitch)是 1.2。在聲音控制指令中，可以在超媒體標籤113及發音參數 115間，選擇性地加入屬性攔位。屬性欄位是用於限制該聲音控制指令110所適用之範圍。例如屬性欄位可以是超媒體標籤的一些屬性，使用屬性欄位則可限制該聲音控制指令110只適用於據有特定屬性的超媒體標籤 113。指令120是一音效控制指令，因其使用"AUDICT 識別碼121。這個音效控制指令120定義碰到超媒體標籤123 *LI〃時，必須插入音效資料125〜beep.au〃。在此例中，音效資料125是一取名為beep.au的音效資料檔。在音效控制指令120中，也可以選擇性地加入屬性攔位，用來限制此音效控制指令120所適用之範圍。發音控制指令剖析器22在分析聲音控制指令及音效控制指令時，會依據指令的内容修改如圖4A所示的參數表42，或是如圖4B所示的音效表43。然後發音控制指令剖析器22再修改如圖4C所示的標籤對應表41。如圖4A所示，發音控制指令剖析器22在分析聲音控制指令110時，會在參數表42中，加入或修改項目 .42-1。首先發音控制指令剖析器22必須在參數表42取得可以使用的項目42-1。項目42-1的取得方式可以是在參數表42中插入新的項目，或是重覆使用參數表42中不再使用的項目。在取得項目42-1之後，發音控制指令 -13 - 本紙張尺度適用中國國家標準（CNS丨A4規格（210X297公釐） 1 - I - —-H --- r-»—— - ί ! I β^- _ I -. » I ί —---ΐ _n J. 
(請先閱讀背面之注意事項再填寫本頁) 4344 92 經濟部中央樣準局貝工消費合作社印製 A7 —______B7五、發明説明（丨勹) ^ ~~〜 ' 剖析器22會將聲音控制指令110中發音參數丨丨5所定義的參數存入項目42-1的欄位42-12，4243及42-U, 圖4A中的PID欄位4241是表示參數識別碼，此欄位是為了說明方便’在實際使用時可以忽略不用。如圖4B所示，發音控制指令剖析器22在分析音致控制指令120時’會在音效表43中，加入或修改二"目 43-1 〇首先發音控制指令剖析器22必須在音效表43取得可以使用的項目43-1。項目43-1的取得方式可以是在音效表43中插入新的項目，或是重覆使用音效表43中不再使用的項目。在取得項目43_丨之後，發音控制指令剖析器22會將音效控制指令12〇中的音效資料檔名 125及其音效資料内容存入項目43-1的欄位43-12及Μ-ΐ 3 。圖 4B 中的 AID 攔位 43-11 是表示音效資料識別碼，此攔位是為了說明方便，在實際使用時可以忽略不用。另外，為了節省記憶體空間之使用’發音控制指令剖析器22在修改音效表43之前，可使用音效資料檔名125 對音效表43進行檢索，若發現相同的檔名已存在，則不做修改的動作。發音控制指令剖析器22在修改參數表42或音效表 43元成之後，接著修改標滅對應表々I的内容。首先發音控制指令剖析器22以發音控制指令丨或120的識別，111或121、超媒體標籤113或i23及屬性攔位檢索標籤對應表41 ’並獲得項目4M或41-2。在檢索中，若項目41-丨或41-2不存在，則建立新的項目或41-2。然後發音控制指令剖析器22會針對聲音控制指令11〇， -14 - 本纸狀度財闕家辟（CNS)A4規格（21QX 297公楚） f請先閑讀背面之注意事¾再填寫本頁) 装. 訂- t------- 1 1 I I [. .1 4344 y ? Α7 Β7 五、發明説明（丨仏）經濟部中央標準局員工消費合作社印製將、目4i-i的型態欄位4M3設定為嫩倾，以標明指，爛位4H4是指向參數表42t;㈣發音控制指令剖析器22也將指標棚位41_14設定為該聲音控制指令ιι〇在參數表42中的對應項目42-卜針對音效控制指令12〇，發音控制指令剖析器22將項目的型態爛位 41-23 設定為AUDIO ’以標明指標襴位4丨_24是指向音效表43。同時發音控制指令剖析H 2 2也將指賴位4丨_ 2 4設定為該音效控制指令12〇在音效表43中的對應項目43-1 » 在圖3令，指令13〇，14Q及150是破音字控制指令，這些指令的識別碼131，141及15】均為'、AIjr^每個破曰子控制指令丨30 ’ 140及150均定義破音字串133，143 及153，以及取代破音字串133 , 143及153的替代字串 135 ’ 145及155。其目的是使用替代字串135，145及155 讓文字語音轉換器27產生正確的發音。指令14〇及15〇也定義前後文字串147及157，用來限制指令140及150 所適用的範圍。如圖5A所示，發音控制指令剖析器22在分析破音字控制指令130，140及150時，會在破音字表31中，依序加入或修改項目31-1 ’ 31-2及31-3。發音控制指令剖析器22將破音字串133，143及153讀入並作適當的文字轉換，再分別存入項目31-1，31-2及31-3的破音字串攔位31-11，31-21及31-31。發音控制指令剖析器22 將替代字串135，145及155讀入並作適當的文字轉換，再分別存入項目31-1 ’ 31-2及31-3的替代字串欄位3μ Π ’ 31-22及31-32。發音控制指令剖析器22將前後文 I .----^-----.1"i . 
(請先閲讀背面之注意事項再填寫本頁) -15 - 本紙張尺度適用中國國家標率（CN’S ) A4規格（210 ·Χ 297公釐） 434492 經濟部中央標準局員工消費合作社印製 A7 B7五、發明説明（丨S ) 字串147及157讀入並作適當的文字轉換，再分別存入項目31-2及31-3的前後文字串攔位31-23及31-33。在圖3中，指令160，170及180是專有名詞控制指令，這些指令的識別碼161，171及181均為"TERM"。每個專有名詞控制指令160，170及180均定義專有名詞字串163，173及183，以及用來取代專有名詞字串163， 173及183的譯文字串165，Π5及185。其目的是使用譯文字串165，175及185，讓文字語音轉換器27能夠針對專有名詞字串163，173及183，產生正確的發音。例如當文字語音轉換器27只能轉換中文文字時，專有名詞控制指令160，170及180可用於將英文或_英混雜的專有名詞轉換為中文語音訊號。如圖5B所示，發音控制指令剖析器22在分析專有名詞控制指令160，170及180時，會在專有名詞表32 中，依序加入或修改項目32-1，32-2及32-3。發音控制指令剖析器22將專有名詞字串163，173及183讀入並作適當的文字轉換(詳述於後），再分別存入項目32-1，32-2 及32-3的專有名詞欄位32-11，32-21及32-31。發音控制指令剖析器22將譯文字串165，175及185讀入並作適當的文字轉換（詳述於後），再分別存入項目32-1，32-2 及32-3的譯文欄位32-12，32-22及32-32。在圖2中，文字轉換器26接收並處理由超媒體標示語言剖析器24所分離產生的文字内容。文字轉換器 26搜尋該文字内容，以決定是否有破音字表31内的破音字串及專有名詞表32内的專有名詞。若發現有破音 ----------^-----—iT---^----1-—— - - (請先閱讀背面之注意事項再填寫本頁) -16 - 本紙張尺度適用中國國家標準（CMS ) A4現格（2iOx 297公釐） 4344 9 2 經濟部中央標隼局員工消費合作社印製 A7 B7五、發明説明（丨^) 字串或專有名詞，文字轉換器26將依據音字表31或專有名詞表32的項目内容，執行字串替代。文字轉換器26將處理完成的文字内容，送交給文字語音轉換器 27 ° 標籤轉換器25接收並處理由超媒體標示語言剖析器24所分離產生的超媒體標籤。標籤轉換器25用此超媒體標籤檢索標籤對應表41，以決定該超媒體標籤是否需要做聲音控制或是音效控制。若發現該超媒體標籤需要做聲音控制，標籤轉換器25依據檢索獲得之項目的指標攔位，自參數表42取得對應之參數，並將這些發音參數送至文字語音轉換器27。若發現該超媒體標籤需要做音效控制，標籤轉換器25依據檢索獲得之項目的指標欄位，自音效表43取得對應之音效資料，並將其送至音效裝置16或是電話介面裝置14。文字語音轉換器27接收並處理來自文字轉換器26 的文字内容，以及來自標籤轉換器25的發音參數。文字語音轉換器27會依據新收到的發音參數，改變其參數設定，例如改變聲音的速度、音量及韻律等參數之設定值。文字語音轉換器27接收到文字内容時，會依據當時發音參數之設定值，將該文字内容轉換為語音訊號，並將結果送至音效裝置16或是電話介面裝置14。圖6說明文件讀取控制器28的執行步驟。在步驟 S1，超媒體語音轉換系統10 (中央處理器11執行作業系統或應用程式）決定是否需要讀取獨立的發音控制指令檔21。若是的話，文件讀取控制器28會執行步驟 (請先閲讀背面之注意事項再填寫本頁) 訂 -17 - 本紙張尺度適用中國國家標隼（CNS ) Μ規格（2[0Χ297公釐） 4344 92 經濟部中央標準局員工消費合作社印製 A7 B7五、發明説明（丨1 ) S2,讀取該檔案的内容，並在步驟S6將檔案内容交給發音控制指令剖析器22分析發音控制指令。在執行完步驟S6,或是不需要讀取獨立的發音控制指令檔21，文件讀取控制器28執行步驟S3,讀取超媒體文件檔案 23的内容。在步驟S4中，文件讀取控制器28將超媒體文件檔案23的内容交由超媒體標示語言剖析器24· 分離成文件元件。一文件元件可以是一超媒體標籤、一文字内容的文字串或是一發音控制指令。文件讀取控制器28則依序讀取並處理超媒體標示語言剖析器24分離出來的文件元件。在步驟S5中，文件讀取控制器28 決定超媒體標示語言剖析器24分離出來的文件元件是否為發音控制指令。若是的話，文件讀取控制器28執行步驟S6，將該發音控制指令交由發音控制指令剖析器22分析。執行完步驟S6後，文件讀取控制器28回到步驟S4，讀取並處理下一由超媒體標示語言剖析器 
24分離出來的文件元件。在步驟S5中，如果讀取的文件元件不是發音控制指令，文件讀取控制器28執行步驟S7。如果文件讀取控制器28在步驟S7發現該文件元件是一超媒體標籤，文件讀取控制器28在步驟S8 中，將該超媒體標籤交由標籤轉換器25進行聲音及音效的控制。然後文件讀取控制器28回到步驟S4，讀取並處理下一文件元件。在步驟S7中，如果讀取的文件元件不是超媒體標籤，文件讀取控制器28執行步驟 S9。在步驟S9中，文件讀取控制器28將讀取的文件元件視為一文字内容的文字争，並將其交由文字轉換器 (請先聞讀背面之注意事項再填寫本頁) . 線 -18 - 本纸張疋度適用中國國家標準（CNS ) A4規格（210X297公釐）經濟部卡央標準局員工消費合作杜印製 A7 __________B7 五、發明説明（洛） 26做文字的替代轉換。在步驟S10中，文件讀取控制器 28把轉換之結果交給文字語音轉換器27處理，^字; .音轉換器27會將其轉換為語音訊號，並經由音效裝置 16或電話介面裝置14播放出來。然後文件讀取控制器 28會回到步驟S4 ’讀取並處理下—文件元件。文件讀取控制器28會重覆執行這些步驟S4_Si〇，直到超媒體文件檔案23内所有的文件元件都處理完。圖7說明發音控制指令剖析器22的執行流程。在步驟S11中，發音控制指令剖析器22讀取發音控制指令。在步驟S12,發音控制指令剖析器22依據指令^ 識別碼，判別該發音控制指令是否為聲音控制指令。若是的話’發音控制指令剖析器22執行步驟sn，在標籤對應表41中加入或修改一項目，並將該指令的超^ 體標籤、指令型態（在此為PARAM) '屬性及參數指標存入對應的欄位。然後發音控制指令剖析器22執行步驟 S14,將該指令所定義的發音參數存入參數表42。執行完步驟S14後，發音控制指令剖析器22回到步驟su，讀取並處理下一發音控制指令。在步驟S12，如果指令不是聲音控制指令，發音控制指令剖析器22執行步驟S15,判別該指令是否二^ 效控制指令。若是的話，發音控制指令剖析器22執行步驟S16,在標籤對應表41中加入或修改一項目，並將該指令的超媒體標籤、指令型態（在此為Aum〇)、屬性及音效資料指標存入對應的欄位。然後發音控制指人剖析器22執行步驟S25,將該指令所定義的音效 ___- 19 - CNS ) A4規格（2丨0x297公釐） (請先閱讀背面之注意事項再填寫本頁) I - I — - I r 1— —κ τ» -6 域.I--t I t- -I, -f-t ', -f-t' Printed by A7, Shellfish Consumer Cooperative, Central Bureau of Standards of the Ministry of Economic Affairs 5. Description of the invention (y) This system is constructed in accordance with the present invention. , Can convert hypermedia files into voice signals, and also solve problems such as broken sounds and mixed languages. This system not only has a compact design, but also has great application and expansion capabilities. This device and method can not only provide personalized text-to-speech conversion, but also make it easier for hypermedia file providers to design hypermedia file speech expressions. 434492 A7 B7 丨 — ~.. V. Description of the invention (7) Brief illustration of the drawing Printed by the Consumer Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs Figure 1 illustrates a specific embodiment of the present invention. 
Figure 2 illustrates the data flow between the various components of the invention. Figure 3 illustrates the sequence of pronunciation control instructions hidden in a hypermedia file. 4A '4B and 4C illustrate the parameter table, sound effect table, and label correspondence, respectively. Figs. 5A and 5B illustrate the broken word table and proper noun table, respectively. ^ Figure 6 illustrates the steps performed by the file read controller. FIG. 7 illustrates the execution steps of the pronunciation control instruction parser. Figure 8 illustrates the execution steps of a text converter. Figure 9 illustrates the execution steps of a standard converter. DETAILED DESCRIPTION OF THE INVENTION Fig. 1 illustrates a hypermedia speech conversion system 10 constructed in accordance with the present invention. This system is a computer system including a central processing unit U, a main memory 12, a network device 13, a telephone interface device 14, a mouse 15, a sound effect device 16, a display 17, and a storage device 18. This system uses a ribbon cable π to connect these devices 1M8 together. Through this line 19 ', these devices 11-18 can pass wheel instructions or information. The storage device 18 may be a disk drive for storing data and programs (prCeSS). The main memory 12 is also used to store data and programs, but is used to store the instructions currently being executed by the central processing unit 21; The central processing unit ϋ n is the command to handle the data and process the data. Network device 13 is used to connect to a computer network. For example, Ethernet (please read the precautions on the back before filling this page), -5 __________-10-G-law scale is suitable for Chinese national standards. Gs, M7 printed by the Central Standards Bureau, Ministry of Economic Affairs, Consumer Cooperatives -------- -B7 V. Description of the invention (丨 0) Road connector or other types of network cards. 
The telephone interface device 14 is used to connect with the telephone network. The keyboard and slide & 15 A are used to receive commands or feeds entered by the user. The display 丨 7 displays electronic information in text or graphic form. The sound effect device 16 receives a digital voice signal, and emits a sound or sound effect through a sound cable or a headset. . As shown in FIG. 1, the storage device 8 stores an operating system, an application program hypermedia file courtesy 23, a pronunciation control instruction module 21, and a document reading module 29. The operation line and the application technique are generally well-known techniques and will not be repeated here. File reading module 29 includes file reading, controller 28, text-to-speech converter 27, analysis of hypermedia markup language, 24, pronunciation control command parser 22, tag converter, tag correspondence table 41, parameter table 42 ' Sound effect table 43, text converter 26, broken sound table 31 and proper noun table%. Although the above procedures 22, 24-28 are executed by the central processing unit Π 贞责 in a time-sharing manner, this is only for the convenience of stating the method of the present invention. The above procedures 彳 22, 24-28 can also use the well-known hardware technology to achieve the same function in a hardware implementation, which is not described in detail here. In addition, the text-to-speech converter 27 and the hypermedia markup language parser 24 are also well-known techniques and will not be described in detail here. The file reading controller 28 is responsible for controlling each program_data flow in the entire conversion process. FIG. 2 shows the data flow in the file reading controller 28. The hypermedia file file 23 may originate from the network device 13 or be read from the storage device 18. The pronunciation control command file 21 is also derived from the network device 13 or read from the storage device. 
__-π-National Standard (CNS) A4 is now available (------ —II— n HI Ά —I n .—I I ---] —. Ding — — _____ n ^-, T- . '(Read the precautions on the back before filling out this page) Printed by the Central Government Bureau of the Ministry of Economic Affairs, Employees' Cooperatives, 4344 92 A7 ________ ___B7 V. Invention Description ((1) Hypermedia Markup Language Rider 24 Analysis Hypermedia == 'in document mark 23 separates it into text content, a hypermedia label indicating the structure of the file, and a pronunciation command that controls the way of pronunciation ^ The hyper-markup language parser 24 sends the separated hypermedia label to the label converter 25. Hypermedia markup language parser 24 sends the separated text content to text converter 26. Hypermedia markup language parser 24 sends the separated pronunciation control command to pronunciation control command parser 22. Pronunciation control command parser 22 is responsible for analyzing the sound control instructions. The sound control instructions may be stored in a separate sound control instruction file 21 or hidden in the hypermedia file file 23. The sound control instructions are divided into the following four types: (1) Voice control instructions' Its format is : PARAM hypermedia tag attribute pronunciation parameters; (2) Audio control command 'in the format: AUDIO hypermedia tag attribute sound data; (3) Broken word control command, the format is: ALT Broken word _ alternative string before and after text string (4) Proper noun control instruction, its format is: TERM proper noun string translation string. Figure 3 shows a sequence of pronunciation control instructions no, 12〇, 13〇, 140, 150 '160' 170 and 180 These pronunciation control instructions are marked with a comment tag in the hypermedia markup language so that they can be hidden in the hypermedia file. The instruction U0 is a sound control instruction because it uses the ^ PARAIVT identification code 111. 
This voice control instruction 110 specifies that all text content marked with the hypermedia tag 113, "LI", is voiced according to the parameters defined by its pronunciation parameters 115. In this voice control instruction 110, the parameters defined by the pronunciation parameters 115 are: speed 1.0, volume 0.8, and pitch 1.2. In a voice control instruction, an attribute field may optionally be added between the hypermedia tag 113 and the pronunciation parameters 115. The attribute field limits the scope to which the voice control instruction 110 applies. For example, the attribute field may contain certain attributes of the hypermedia tag, so that the voice control instruction 110 applies only to hypermedia tags 113 that have those attributes.

Instruction 120 is a sound effect control instruction because it uses the identification code 121, "AUDIO". This sound effect control instruction 120 specifies that when the hypermedia tag 123, "LI", is encountered, the sound effect data 125, "beep.au", must be inserted. In this example, the sound effect data 125 is an audio data file named beep.au. In the sound effect control instruction 120, an attribute field may also optionally be added to limit the scope to which the instruction applies.

When the pronunciation control instruction parser 22 analyzes a voice control instruction or a sound effect control instruction, it modifies the parameter table 42 shown in FIG. 4A or the sound effect table 43 shown in FIG. 4B according to the content of the instruction.
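As an illustration only, the recognition of the four instruction types can be sketched in Python. The concrete syntax assumed here (whitespace-separated fields inside an HTML comment, and pronunciation parameters written as key=value tokens) is not specified by the description, and the names Instruction, parse_instruction, and pronunciation_parameters are hypothetical. The optional attribute field is omitted from this sketch.

```python
from dataclasses import dataclass, field

KINDS = {"PARAM", "AUDIO", "ALT", "TERM"}  # identification codes of the four types


@dataclass
class Instruction:
    kind: str                                  # PARAM, AUDIO, ALT or TERM
    args: list = field(default_factory=list)   # remaining fields, in order


def parse_instruction(comment: str):
    """Parse one HTML comment; return None when it is an ordinary comment."""
    body = comment.strip()
    if body.startswith("<!--") and body.endswith("-->"):
        body = body[4:-3].strip()
    parts = body.split()
    if not parts or parts[0] not in KINDS:
        return None                            # plain comment, ignored
    return Instruction(parts[0], parts[1:])


def pronunciation_parameters(args):
    """Collect key=value tokens (assumed syntax) into a dict of floats."""
    pairs = (a.split("=", 1) for a in args if "=" in a)
    return {key: float(value) for key, value in pairs}


inst = parse_instruction("<!-- PARAM LI speed=1.0 volume=0.8 pitch=1.2 -->")
print(inst.kind, inst.args[0], pronunciation_parameters(inst.args[1:]))
# PARAM LI {'speed': 1.0, 'volume': 0.8, 'pitch': 1.2}
```

A comment whose first word is not one of the four identification codes yields None, so it would simply be skipped as an ordinary comment.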
The pronunciation control instruction parser 22 then modifies the tag correspondence table 41 shown in FIG. 4C.

As shown in FIG. 4A, when the pronunciation control instruction parser 22 analyzes the voice control instruction 110, it adds or modifies item 42-1 in the parameter table 42. First, the pronunciation control instruction parser 22 must obtain a usable item 42-1 in the parameter table 42, either by inserting a new item into the parameter table 42 or by reusing an item that is no longer in use. After obtaining item 42-1, the pronunciation control instruction parser 22 stores the parameters defined by the pronunciation parameters 115 of the voice control instruction 110 in fields 42-12, 42-13, and 42-14 of item 42-1. The PID field 42-11 in FIG. 4A is a parameter identification code. This field is included for convenience of explanation and can be omitted in practice.

As shown in FIG. 4B, when the pronunciation control instruction parser 22 analyzes the sound effect control instruction 120, it adds or modifies item 43-1 in the sound effect table 43. First, the pronunciation control instruction parser 22 must obtain a usable item 43-1 in the sound effect table 43, either by inserting a new item into the sound effect table 43 or by reusing an item that is no longer in use. After obtaining item 43-1, the pronunciation control instruction parser 22 stores the sound effect data file name 125 of the sound effect control instruction 120 and the content of the sound effect data in fields 43-12 and 43-13 of item 43-1.
The AID field 43-11 in FIG. 4B is a sound effect data identification code. This field is included for convenience of explanation and can be omitted in practice. In addition, to save memory space, the pronunciation control instruction parser 22 can search the sound effect table 43 by the sound effect data file name 125 before modifying the table. If the same file name already exists, the table is not modified.

After modifying the parameter table 42 or the sound effect table 43, the pronunciation control instruction parser 22 modifies the content of the tag correspondence table 41. First, the pronunciation control instruction parser 22 uses the identification code 111 or 121, the hypermedia tag 113 or 123, and the attribute field of the pronunciation control instruction 110 or 120 to search the tag correspondence table 41 and obtain item 41-1 or 41-2. If item 41-1 or 41-2 does not exist, a new item 41-1 or 41-2 is created. For the voice control instruction 110, the pronunciation control instruction parser 22 sets the type field 41-13 of item 41-1 to PARAM, indicating that the index field 41-14 points into the parameter table 42, and sets the index field 41-14 to point to item 42-1, the item of the parameter table 42 that corresponds to the voice control instruction 110. For the sound effect control instruction 120, the pronunciation control instruction parser 22 sets the type field 41-23 of item 41-2 to AUDIO, indicating that the index field 41-24 points into the sound effect table 43, and sets the index field 41-24 to point to item 43-1, the item of the sound effect table 43 that corresponds to the sound effect control instruction 120.

In FIG. 3, instructions 130, 140, and 150 are heteronym control instructions. The identification codes 131, 141, and 151 of these instructions are all "ALT". Each heteronym control instruction 130, 140, and 150 defines a heteronym string 133, 143, or 153 and a substitute string 135, 145, or 155 that replaces it. The purpose is to use the substitute strings 135, 145, and 155 to make the text-to-speech converter 27 produce the correct pronunciation. Instructions 140 and 150 also define context strings 147 and 157, which limit the scope to which instructions 140 and 150 apply.

As shown in FIG. 5A, when the pronunciation control instruction parser 22 analyzes the heteronym control instructions 130, 140, and 150, it adds or modifies items 31-1, 31-2, and 31-3 of the heteronym table 31 in sequence. The pronunciation control instruction parser 22 reads the heteronym strings 133, 143, and 153, converts them as appropriate, and stores them in the heteronym string fields 31-11, 31-21, and 31-31 of items 31-1, 31-2, and 31-3, respectively. It reads the substitute strings 135, 145, and 155, converts them as appropriate, and stores them in the substitute string fields 31-12, 31-22, and 31-32 of items 31-1, 31-2, and 31-3, respectively. It reads the context strings 147 and 157, converts them as appropriate, and stores them in the context string fields 31-23 and 31-33 of items 31-2 and 31-3.

In FIG. 3, instructions 160, 170, and 180 are proper noun control instructions. The identification codes 161, 171, and 181 of these instructions are all "TERM". Each proper noun control instruction 160, 170, and 180 defines a proper noun string 163, 173, or 183 and a translation string 165, 175, or 185 that replaces it. The purpose is to use the translation strings 165, 175, and 185 so that the text-to-speech converter 27 can produce an appropriate pronunciation for the proper noun strings 163, 173, and 183. For example, when the text-to-speech converter 27 can only convert Chinese characters, the proper noun control instructions 160, 170, and 180 can be used to convert English or mixed Chinese-English proper nouns into Chinese speech signals.

As shown in FIG. 5B, when the pronunciation control instruction parser 22 analyzes the proper noun control instructions 160, 170, and 180, it adds or modifies items 32-1, 32-2, and 32-3 of the proper noun table 32 in sequence. The pronunciation control instruction parser 22 reads the proper noun strings 163, 173, and 183, converts them as appropriate (detailed later), and stores them in the proper noun fields 32-11, 32-21, and 32-31 of items 32-1, 32-2, and 32-3, respectively. It reads the translation strings 165, 175, and 185, converts them as appropriate (detailed later), and stores them in the translation fields 32-12, 32-22, and 32-32 of items 32-1, 32-2, and 32-3, respectively.

In FIG. 2, the text converter 26 receives and processes the text content produced by the hypermedia markup language parser 24. The text converter 26 searches the text content for heteronym strings listed in the heteronym table 31 and for proper nouns listed in the proper noun table 32.
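The updates that the pronunciation control instruction parser 22 makes to the parameter table 42, the sound effect table 43, and the tag correspondence table 41 (FIGS. 4A to 4C) can be sketched as follows. This is a minimal sketch under stated assumptions: the tables are modeled as a Python list, a list, and a dict, the key layout (tag, attribute, type) and the function names add_param and add_audio are hypothetical, and the tag "H1" is used purely for illustration.

```python
parameter_table = []     # table 42: one dict of pronunciation parameters per item
sound_effect_table = []  # table 43: (file name, audio bytes) per item
tag_table = {}           # table 41: (tag, attribute, type) -> index into 42 or 43


def add_param(tag, params, attribute=None):
    """Handle a PARAM instruction: store parameters, then link the tag to them."""
    parameter_table.append(dict(params))
    tag_table[(tag, attribute, "PARAM")] = len(parameter_table) - 1


def add_audio(tag, file_name, data=b"", attribute=None):
    """Handle an AUDIO instruction; an already stored file name is reused."""
    names = [name for name, _ in sound_effect_table]
    if file_name in names:
        index = names.index(file_name)       # same file: table 43 is not modified
    else:
        sound_effect_table.append((file_name, data))
        index = len(sound_effect_table) - 1
    tag_table[(tag, attribute, "AUDIO")] = index


add_param("LI", {"speed": 1.0, "volume": 0.8, "pitch": 1.2})
add_audio("LI", "beep.au")
add_audio("H1", "beep.au")  # second tag sharing the same sound effect item
```

Keeping the type in the key lets one tag carry both a PARAM entry and an AUDIO entry, which matches the two separate items 41-1 and 41-2 described for the tag "LI".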
If the text converter 26 finds a heteronym string or a proper noun in the text content, it performs string substitution according to the contents of the heteronym table 31 or the proper noun table 32. The text converter 26 sends the processed text content to the text-to-speech converter 27.

The tag converter 25 receives and processes the hypermedia tags produced by the hypermedia markup language parser 24. The tag converter 25 uses each hypermedia tag to search the tag correspondence table 41 and determine whether the tag requires voice control or sound effect control. If the tag requires voice control, the tag converter 25 obtains the corresponding pronunciation parameters from the parameter table 42 according to the index field of the retrieved item and sends them to the text-to-speech converter 27. If the tag requires sound effect control, the tag converter 25 obtains the corresponding sound effect data from the sound effect table 43 according to the index field of the retrieved item and sends it to the sound effect device 16 or the telephone interface device 14.

The text-to-speech converter 27 receives and processes the text content from the text converter 26 and the pronunciation parameters from the tag converter 25. It changes its parameter settings according to newly received pronunciation parameters, for example the speed, volume, and pitch of the voice.
When the text-to-speech converter 27 receives text content, it converts the text content into a speech signal according to the pronunciation parameter settings in effect at that time and sends the result to the sound effect device 16 or the telephone interface device 14.

FIG. 6 illustrates the steps executed by the file reading controller 28. In step S1, the hypermedia speech conversion system 10 (the central processing unit 11 executing the operating system or an application program) determines whether the separate pronunciation control instruction file 21 needs to be read. If so, the file reading controller 28 executes step S2, reads the content of the file, and passes it to the pronunciation control instruction parser 22, which analyzes the pronunciation control instructions in step S6. After step S6, or if the separate pronunciation control instruction file 21 does not need to be read, the file reading controller 28 executes step S3 and reads the content of the hypermedia file 23. In step S4, the file reading controller 28 passes the content of the hypermedia file 23 to the hypermedia markup language parser 24, which separates it into file elements. A file element can be a hypermedia tag, a text string of the text content, or a pronunciation control instruction. The file reading controller 28 reads and processes, in sequence, the file elements separated by the hypermedia markup language parser 24. In step S5, the file reading controller 28 determines whether the file element separated by the hypermedia markup language parser 24 is a pronunciation control instruction.
If so, the file reading controller 28 executes step S6 and passes the pronunciation control instruction to the pronunciation control instruction parser 22 for analysis. After step S6, the file reading controller 28 returns to step S4 to read and process the next file element separated by the hypermedia markup language parser 24. In step S5, if the file element is not a pronunciation control instruction, the file reading controller 28 executes step S7. If the file reading controller 28 finds in step S7 that the file element is a hypermedia tag, it passes the hypermedia tag to the tag converter 25 in step S8 for voice and sound effect control, and then returns to step S4 to read and process the next file element. In step S7, if the file element is not a hypermedia tag, the file reading controller 28 executes step S9. In step S9, the file reading controller 28 treats the file element as a text string of the text content and passes it to the text converter 26 for text conversion. In step S10, the file reading controller 28 passes the conversion result to the text-to-speech converter 27, which converts it into a speech signal and plays it through the sound effect device 16 or the telephone interface device 14. The file reading controller 28 then returns to step S4 to read and process the next file element. The file reading controller 28 repeats steps S4-S10 until all file elements in the hypermedia file 23 have been processed.

FIG. 7 illustrates the execution flow of the pronunciation control instruction parser 22. In step S11, the pronunciation control instruction parser 22 reads a pronunciation control instruction.
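The element-by-element loop of FIG. 6 (steps S4 to S10) can be sketched as a simple dispatcher. This is only an illustration: the function read_document, the (kind, value) element representation, and the four callables passed in are hypothetical stand-ins for the parser 22, the tag converter 25, the text converter 26, and the text-to-speech converter 27, and the sample strings are invented for the example.

```python
def read_document(elements, parse_instruction, convert_tag, convert_text, speak):
    """elements: (kind, value) pairs as separated by the markup parser in step S4."""
    for kind, value in elements:
        if kind == "instruction":    # S5 -> S6: pronunciation control instruction
            parse_instruction(value)
        elif kind == "tag":          # S7 -> S8: hypermedia tag
            convert_tag(value)
        else:                        # S9 -> S10: text content
            speak(convert_text(value))


log = []
read_document(
    [("instruction", "TERM WWW World Wide Web"), ("tag", "<LI>"), ("text", "WWW")],
    parse_instruction=lambda v: log.append(("S6", v)),
    convert_tag=lambda v: log.append(("S8", v)),
    convert_text=lambda v: v.replace("WWW", "World Wide Web"),
    speak=lambda v: log.append(("S10", v)),
)
print(log)
```

Passing the four stages in as callables keeps the controller free of any table logic, which mirrors the description of the controller as responsible only for data flow.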
In step S12, the pronunciation control instruction parser 22 determines, from the identification code of the instruction, whether the instruction is a voice control instruction. If so, the pronunciation control instruction parser 22 executes step S13: it adds or modifies an item in the tag correspondence table 41 and stores the hypermedia tag, the instruction type (here, PARAM), the attribute, and the parameter index of the instruction in the corresponding fields. The pronunciation control instruction parser 22 then executes step S14 and stores the pronunciation parameters defined by the instruction in the parameter table 42. After step S14, the pronunciation control instruction parser 22 returns to step S11 to read and process the next pronunciation control instruction.

In step S12, if the instruction is not a voice control instruction, the pronunciation control instruction parser 22 executes step S15 to determine whether the instruction is a sound effect control instruction. If so, the pronunciation control instruction parser 22 executes step S16: it adds or modifies an item in the tag correspondence table 41 and stores the hypermedia tag, the instruction type (here, AUDIO), the attribute, and the sound effect data index of the instruction in the corresponding fields. The pronunciation control instruction parser 22 then executes step S25 and stores the sound effect data file name defined by the instruction, together with the content of that file, in the sound effect table 43. It then returns to step S11 to read and process the next pronunciation control instruction.

In step S15, if the instruction is not a sound effect control instruction, the pronunciation control instruction parser 22 executes step S17 to determine whether the instruction is a proper noun control instruction. If so, the pronunciation control instruction parser 22 executes step S18 and performs text conversion on the instruction according to the existing content of the heteronym table 31. That is, for each item of the heteronym table 31 in sequence, it checks whether the instruction contains a heteronym string that needs to be converted. If such a heteronym string is found in the instruction, the pronunciation control instruction parser 22 converts it into the corresponding substitute string. After step S18, the pronunciation control instruction parser 22 executes step S19 and performs text conversion on the instruction according to the existing content of the proper noun table 32. That is, for each item of the proper noun table 32 in sequence, it checks whether the instruction contains a proper noun string that needs to be converted. If such a proper noun string is found in the instruction, the pronunciation control instruction parser 22 converts it into the corresponding translation string. After step S19, the pronunciation control instruction parser 22 executes step S20 and stores the converted proper noun control instruction in the proper noun table 32. It then returns to step S11 to read and process the next pronunciation control instruction.

In step S17, if the instruction is not a proper noun control instruction, the pronunciation control instruction parser 22 executes step S21 to determine whether the instruction is a heteronym control instruction. If so, the pronunciation control instruction parser 22 executes step S22 and performs text conversion on the instruction according to the existing content of the heteronym table 31. That is, for each item of the heteronym table 31 in sequence, it checks whether the instruction contains a heteronym string that needs to be converted. If such a heteronym string is found in the instruction, the pronunciation control instruction parser 22 converts it into the corresponding substitute string. After step S22, the pronunciation control instruction parser 22 executes step S23 and stores the converted heteronym control instruction in the heteronym table 31.
The pronunciation control instruction parser 22 then returns to step S11 to read and process the next pronunciation control instruction. In step S21, if the instruction is not a heteronym control instruction, the pronunciation control instruction parser 22 executes step S24, in which it treats the data it has read as a comment and ignores it. The pronunciation control instruction parser 22 then returns to step S11 to read and process the next pronunciation control instruction. The pronunciation control instruction parser 22 repeats steps S11-S24 until all pronunciation control instructions have been processed.

FIG. 8 illustrates the execution flow of the text converter 26. In step S31, the text converter 26 reads text content of the hypermedia file 23. The text converter 26 then executes step S32 and performs text conversion on the text content according to the existing content of the heteronym table 31. That is, for each item of the heteronym table 31 in sequence, the text converter 26 checks whether the text content contains a heteronym string that needs to be converted. If such a heteronym string is found in the text content, the text converter 26 converts it into the corresponding substitute string. For efficiency, when the text converter 26 applies an item of the heteronym table 31, it first searches the text content for the position of the item's heteronym string. It then takes the characters or strings before and after that position, together with the heteronym string itself, and compares them with the item's context string to decide whether the heteronym string in the text content should be replaced by the item's substitute string. If so, the text converter 26 replaces the heteronym string with the item's substitute string. If the item defines no context string, the text converter 26 replaces the heteronym string directly with the substitute string. The text converter 26 then continues to search the text content in the same way, finding and processing the next occurrence of the item's heteronym string, until the whole text content has been searched.

After step S32, the text converter 26 executes step S33 and, according to the existing content of the proper noun table 32, performs text conversion on the text content converted in step S32. That is, for each item of the proper noun table 32 in sequence, the text converter 26 checks whether the text content contains a proper noun string that needs to be converted. If such a proper noun string is found in the text content, the text converter 26 converts it into the corresponding translation string. When the text converter 26 applies an item of the proper noun table 32, it first searches the text content for the item's proper noun string and replaces it directly with the translation string. The text converter 26 then continues to search the text content in the same way, finding and processing the next occurrence of the item's proper noun string, until the whole text content has been searched.

After step S33, the text converter 26 returns to step S31 to read and process the next piece of text content produced by the hypermedia markup language parser 24.
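Steps S32 and S33 can be sketched as two passes over the text. This is a minimal sketch under stated assumptions: the description does not fix how the surrounding characters are compared with the context string, so the sketch assumes that the context string contains the heteronym string and is aligned on its first occurrence. The function names and the English sample entries ("lead", "led", "lead pipe", "WWW") are hypothetical illustrations, not data from the invention.

```python
def apply_heteronyms(text, heteronym_table):
    """S32: items are (heteronym, substitute, context or None)."""
    for heteronym, substitute, context in heteronym_table:
        if not heteronym:
            continue                         # guard against empty search strings
        out, pos = [], 0
        while True:
            hit = text.find(heteronym, pos)  # position of the next occurrence
            if hit < 0:
                break
            if context is None:
                matched = True               # no context string: always replace
            else:
                # Align the context string on the hit and compare the
                # surrounding text with it (assumed comparison rule).
                offset = context.find(heteronym)
                start = hit - offset
                matched = (offset >= 0 and start >= 0
                           and text[start:start + len(context)] == context)
            out.append(text[pos:hit])
            out.append(substitute if matched else heteronym)
            pos = hit + len(heteronym)
        out.append(text[pos:])
        text = "".join(out)
    return text


def apply_proper_nouns(text, proper_noun_table):
    """S33: items are (proper noun, translation); replacement is unconditional."""
    for noun, translation in proper_noun_table:
        text = text.replace(noun, translation)
    return text


table31 = [("lead", "led", "lead pipe")]
print(apply_heteronyms("They lead the way past a lead pipe.", table31))
# They lead the way past a led pipe.
```

Only the occurrence whose surroundings match the context string is replaced; the first "lead" is left untouched.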

FIG. 9 illustrates the steps executed by the tag converter 25. To handle nested hypermedia tags, the tag converter 25 uses a stack, stored in the main memory 12 or in the central processing unit 11, to carry out the conversion of hypermedia tags. In step S41, for a hypermedia tag produced by the hypermedia markup language parser 24, the tag converter 25 first determines whether it is a start tag. If the hypermedia tag is a start tag, the tag converter 25 executes step S42 and pushes it onto the stack. Otherwise, the tag converter 25 executes step S43 and pops a hypermedia tag from the stack. In step S44, the tag converter 25 performs tag conversion for the hypermedia tag at the top of the stack and uses that tag to search the tag correspondence table 41. In step S45, the tag converter 25 determines from the search result whether the hypermedia tag has a corresponding parameter setting item (one whose type field is PARAM). If so, the tag converter 25 executes step S46: it uses the parameter index of the item to read the corresponding parameters from the parameter table 42 and sends these parameters to the text-to-speech converter 27 to change the way subsequent text content is pronounced. After step S46, or if the hypermedia tag has no corresponding parameter setting item, the tag converter 25 executes step S47 and determines from the search result whether the hypermedia tag has a corresponding sound effect control item (one whose type field is AUDIO). If so, the tag converter 25 executes step S48: it uses the sound effect data index of the item to read the corresponding sound effect data from the sound effect table 43 and sends this sound effect data to the sound effect device 16 or the telephone interface device 14 for playback. In step S47, if the hypermedia tag has no corresponding sound effect control item, the tag converter 25 executes step S49 and ignores the hypermedia tag. After step S48 or step S49, the tag converter 25 returns to step S41 and waits for the next hypermedia tag produced by the hypermedia markup language parser 24.

The stack used by the tag converter 25 ensures that an inner hypermedia element (HTML element) can use its own voice and sound effect control, and that when processing returns to the enclosing hypermedia element, the voice and sound effect control used by that element is restored.

The embodiments described above serve only to illustrate the method proposed by the present invention. Those skilled in the art can derive different embodiments without departing from the spirit and scope disclosed in the following claims.
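The stack handling of FIG. 9 (steps S41 to S49) can be sketched as follows. This is only an illustration: make_tag_converter, the (tag, attribute, type) key layout, and the set_parameters and play callables are hypothetical stand-ins for the tag correspondence table 41, the text-to-speech converter 27, and the sound effect device 16, and the sample tags and values are invented for the example.

```python
def make_tag_converter(tag_table, parameter_table, sound_effect_table,
                       set_parameters, play):
    stack = []

    def convert(tag, is_start):
        if is_start:
            stack.append(tag)           # S42: push a start tag
        elif stack:
            stack.pop()                 # S43: an end tag pops one entry
        if not stack:
            return
        top = stack[-1]                 # S44: look up the tag now on top
        p = tag_table.get((top, None, "PARAM"))
        if p is not None:               # S45 -> S46: send parameters to TTS
            set_parameters(parameter_table[p])
        a = tag_table.get((top, None, "AUDIO"))
        if a is not None:               # S47 -> S48: play the sound effect
            play(sound_effect_table[a])
        # S49: a tag without a sound effect entry is otherwise ignored

    return convert


events = []
convert = make_tag_converter(
    {("UL", None, "PARAM"): 0, ("LI", None, "PARAM"): 1, ("LI", None, "AUDIO"): 0},
    [{"speed": 1.0}, {"speed": 0.8}],
    ["beep.au"],
    set_parameters=events.append,
    play=events.append,
)
convert("UL", True)
convert("LI", True)
convert("LI", False)
print(events)
```

After the end tag of the inner element is processed, the enclosing element is again on top of the stack, so its parameters are sent once more and the outer voice settings are restored.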

Claims


1. A computer system for converting a hypermedia file into a speech signal, comprising:
a hypermedia markup language parser, which separates a file in hypermedia markup language format into text content, hypermedia tags that mark the structure of the file, and pronunciation control instructions that control the manner of pronunciation;
a pronunciation control instruction parser, which analyzes the pronunciation control instructions and, according to their content, modifies a tag correspondence table, a sound effect table, a parameter table, a heteronym table, and a proper noun table;
a text converter, which, when a string that the heteronym table specifies as requiring a modified pronunciation appears in the text content, modifies its pronunciation in the manner specified by the heteronym table, and, when a string that the proper noun table specifies as requiring translation appears in the text content, translates it in the manner specified by the proper noun table;
a tag converter, which, when a hypermedia tag that the tag correspondence table specifies as requiring modified pronunciation parameters or an inserted sound effect appears in the hypermedia file, inserts sound effect data according to the sound effect table item specified in the tag correspondence table, and modifies the pronunciation parameters of the text content marked by the hypermedia tag according to the parameter table item specified in the tag correspondence table; and
a text-to-speech converter, which converts the text content, as modified by the text converter and the tag converter, into a speech signal.

2. A method for converting a hypermedia file into a speech signal in a pronunciation control instruction parser, comprising the following steps:
analyzing voice control instructions that specify the pronunciation parameters, such as volume, speed, and prosody, to be used for a particular hypermedia tag, and analyzing sound effect control instructions that specify the sound effect data to be used for a particular hypermedia tag; according to the content of the voice control instruction, storing the pronunciation parameters in a parameter table item and setting, in a tag correspondence table, the correspondence between the hypermedia tag and the parameter table item; and, according to the content of the sound effect control instruction, storing the sound effect data in a sound effect table item and setting, in the tag correspondence table, the correspondence between the hypermedia tag and the sound effect table item; and
analyzing heteronym control instructions that specify a pronunciation modification and a context string for a specific string in the text content of the hypermedia file, and analyzing proper noun control instructions that specify a translation for a specific string in the text content, the pronunciation modification and the translation being text strings that can be converted into speech signals by a text-to-speech converter; according to the content of the heteronym control instruction, generating a heteronym table that stores the specific string together with the corresponding pronunciation modification and context string; and, according to the content of the proper noun control instruction, generating a proper noun table that stores the specific string and the corresponding translation.

3. A method for converting a hypermedia file into a speech signal in a text converter, comprising the following steps:
for the specific string, in each item of the heteronym table, whose pronunciation must be modified, replacing that specific string in the text content of the hypermedia file with the substitute string specified for it, the substitute string fixing one particular pronunciation for a specific string that has multiple pronunciations; and
for the specific string, in each item of the proper noun table, that must be translated, replacing that specific string in the text content of the hypermedia file with the translation string specified for it, the translation string enabling the parts of the specific string that cannot be converted into a speech signal to be converted into a specified speech signal by a text-to-speech converter.

4. A method for converting a hypermedia file into a speech signal in a tag converter so as to modify pronunciation parameters and insert sound effects, comprising the following steps:
for the hypermedia tag, in each item of the tag correspondence table, whose pronunciation parameters must be modified, modifying the volume, speed, and prosody parameters of the text content marked by that hypermedia tag in the hypermedia file according to the parameter table item specified by the tag correspondence table item; and
for the hypermedia tag, in each item of the tag correspondence table, that requires an inserted sound effect, generating the sound effect for that hypermedia tag in the hypermedia file according to the sound effect table item specified by the tag correspondence table item.

5. A method for converting a hypermedia file into a speech signal, comprising the following steps:
analyzing pronunciation control instructions, this step including the following steps:
for a voice control instruction, generating a tag correspondence table item that is retrieved by the hypermedia tag specified by the voice control instruction, and generating a parameter table item that stores the volume, speed, and prosody parameters specified by the voice control instruction; and setting, in the index field of the tag correspondence table item, an index that points to the parameter table item;

for a sound effect control instruction, generating a tag correspondence table item retrievable by the hypermedia tag specified by the sound effect control instruction, generating a sound effect table item that stores the sound effect data specified by the sound effect control instruction, and setting an index in the index field of the tag correspondence table item to point to the sound effect table item;
for a heteronym control instruction, generating a heteronym table item retrievable by the specific string that the heteronym control instruction specifies as requiring pronunciation modification, and storing the substitute string specified by the heteronym control instruction, the substitute string fixing the specific string, which has multiple pronunciations, to one specific pronunciation; and
for a proper noun control instruction, generating a proper noun table item retrievable by the specific string that the proper noun control instruction specifies as requiring translation, and storing the translation string specified by the proper noun control instruction, the translation string enabling the specific string, which a text-to-speech converter cannot convert into a speech signal, to be converted into a specific speech signal.

6. The method described in item 5 of the scope of patent application, wherein the step comprises the following step: ... the pronunciation control instructions of the hypermedia file.

7. The method described in item 5 of the scope of patent application, wherein the step further comprises a step of reading each of the pronunciation control instructions.

8. The method described in item 5 of the scope of patent application, further comprising the following steps:
parsing the data of a hypermedia file;
retrieving the tag correspondence table with a hypermedia tag;
for the tag correspondence table item obtained by the retrieval, using the index in its index field to obtain a parameter table item and a sound effect table item;
using the set of pronunciation parameters in the parameter table item to modify the pronunciation parameters subsequently used by the text-to-speech converter; and
inserting the sound effect data stored in the sound effect table item.

9. The method described in item 8 of the scope of patent application, further comprising the following steps:
when a start tag of a hypermedia tag is encountered, pushing the start tag onto a stack; and
when an end tag of a hypermedia tag is encountered, popping a hypermedia tag from the stack,
wherein the specific hypermedia tag used for retrieval is the hypermedia tag at the top of the stack.

10. The method described in item 5 of the scope of patent application, further comprising the following steps:
parsing the data of a hypermedia file and retrieving the text content of the hypermedia file;
in the text content, replacing every string that a heteronym table item specifies as requiring pronunciation modification with the substitute string specified in that heteronym table item; and
in the text content, replacing every string that a proper noun table item specifies as requiring translation with the translation string specified in that proper noun table item.

11. The method described in item 10 of the scope of patent application, wherein the heteronym table item further contains context fields, and a string in the text content that the heteronym table item specifies as requiring pronunciation modification is replaced with the substitute string specified in the heteronym table item only when it also matches the preceding and following context strings specified in the context fields of the item.

12. The method described in item 11 of the scope of patent application, further comprising the following step:
generating a speech signal whose content includes the speech signal generated from the sound effect data and the speech signal converted, according to the pronunciation parameters, from the text content of the hypermedia file, the substitute strings, and the translation strings.
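
For illustration only, the table lookups, tag stack, and string replacements recited in the items above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: every name (`TagEntry`, `tag_table`, `convert_text`, `convert`), the regular-expression tokenizer, and all table contents are hypothetical, and the tables are filled in directly because the syntax of the pronunciation control instructions is not given in this section. The output is a list of events that a text-to-speech converter could consume, rather than an actual speech signal.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagEntry:
    """Tag correspondence table item: index fields into the other two tables."""
    param_index: Optional[int] = None
    effect_index: Optional[int] = None


# All table contents below are made-up illustration data (assumptions).
parameter_table = [{"volume": 1.5, "speed": 0.8}]
sound_effect_table = ["chime.wav"]
tag_table = {"h1": TagEntry(param_index=0, effect_index=0)}
# Heteronym items: (string, substitute, preceding context, following context).
heteronym_table = [("read", "red", "have ", "")]
proper_noun_table = {"SF": "San Francisco"}
DEFAULT_PARAMS = {"volume": 1.0, "speed": 1.0}


def convert_text(text):
    """Text converter: context-checked heteronym replacement, then translation."""
    for target, substitute, before, after in heteronym_table:
        # Replace only where the surrounding context strings also match.
        text = text.replace(before + target + after, before + substitute + after)
    for target, translation in proper_noun_table.items():
        # Whole-word match; the lambda keeps the translation string literal.
        text = re.sub(r"\b" + re.escape(target) + r"\b",
                      lambda m, t=translation: t, text)
    return text


def convert(html):
    """Tag converter: returns ("effect", data) and ("speak", text, params) events.

    Simplified tokenizer: comments and unclosed tags such as <br> are not handled.
    """
    events, stack = [], []
    for m in re.finditer(r"<(/?)(\w+)[^>]*>|([^<]+)", html):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if name is not None:
            if closing:
                if stack:
                    stack.pop()              # end tag: pop the stack
                continue
            name = name.lower()
            stack.append(name)               # start tag: push onto the stack
            entry = tag_table.get(name)
            if entry and entry.effect_index is not None:
                events.append(("effect", sound_effect_table[entry.effect_index]))
        elif text.strip():
            # The tag at the top of the stack selects the pronunciation parameters.
            entry = tag_table.get(stack[-1]) if stack else None
            params = DEFAULT_PARAMS
            if entry and entry.param_index is not None:
                params = parameter_table[entry.param_index]
            events.append(("speak", convert_text(text.strip()), params))
    return events


if __name__ == "__main__":
    for event in convert("<h1>News from SF</h1><p>I have read it.</p>"):
        print(event)
```

In this sketch the heading tag triggers a sound effect and a slower, louder reading, "SF" is expanded through the proper noun table, and "read" is respelled as "red" only when preceded by "have ", which mirrors the context check of item 11.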