KR20000014088A - Method of operation control in text/speech conversion system using hypertext mark-up language element

Info

Publication number: KR20000014088A
Application number: KR1019980033298A
Authority: KR
Inventors: 이영직; 이항섭; 이정철
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1998-08-17
Filing date: 1998-08-17
Publication date: 2000-03-06
Also published as: KR100279741B1

Abstract

PURPOSE: A method for controlling the operation of a text-to-speech (TTS) conversion system is provided. By defining an interface between HTML elements and the properties of synthesized speech, the method makes the visual characteristics of a hypertext markup language (HTML) document audible and makes content delivery through synthesized speech clearly distinguishable. CONSTITUTION: The TTS system includes an HTML document analyzer, a synthesizer control command generator that writes control commands as structured data, and a speech synthesizer. The method comprises the steps of: receiving a document containing HTML elements at the HTML document analyzer, separating the markup, and interpreting the document structure; reconstructing the HTML elements in sentence units using the interpreted document structure received from the HTML document analyzer; and generating synthesized speech from the structured TTS data received from the synthesizer control command generator.

Description

Operation Control Method of Text/Speech Converter Using Hypertext Markup Language Elements

The present invention relates to a method of controlling the operation of a text-to-speech conversion system (hereinafter, TTS) using HTML elements when a document written in the Hypertext Markup Language (hereinafter, HTML) document structure is input.

HTML is a language that has a hypertext function, which links to other web pages, and a markup function, which defines detailed commands for the format of text documents on the World Wide Web, for example the shape and size of characters and the shape of paragraphs.

When synthesized speech is generated by a TTS from HTML, the document format adopted and used as the standard on the WWW today, the hypertext function is excluded and the elements related to markup are used to control the operation of the TTS.

The function of the TTS is to let a computer provide various forms of information by voice to its human user.

To this end, the TTS must be able to provide the user with high-quality synthesized speech from a given text.

High-quality synthesized speech must have clear phonetic values and high naturalness, with prosodic elements such as pauses (phrasing breaks), duration, intensity, and pitch properly realized.

This is comparable to using appropriate vocabulary, spacing, and punctuation in text to convey meaning accurately to the reader.

However, the prosodic elements that appear in speech result from the combined action of the sentence's semantic structure, syntactic structure, words, coarticulation, the speaker's intention, speech rate, and so on.

Therefore, to generate high-quality synthesized speech, a conventional speech synthesizer takes an input text sentence through grapheme-to-phoneme conversion, syntactic structure analysis, prosody processing, and signal processing, and generates synthesized speech in which the selection of synthesis units, pauses, duration, intensity, and pitch are adjusted.

That is, as shown in FIG. 1, a conventional TTS generally goes through three stages between receiving the input text and generating the synthesized speech.

In the first stage, the language processing unit (1) first separates only the pure sentence text, including punctuation marks, from the input text.

It then estimates, through syntactic structure analysis of the separated sentences, syntactic structure information such as the boundaries of phrases and clauses within a sentence and the grammatical functions of words and phrases.

Finally, it converts the sentence into a phoneme string through grapheme-to-phoneme conversion.

The estimated syntactic structure information and the converted phoneme string are passed to the prosody processing unit (2), which is the second stage.

In the second stage, the prosody processing unit (2) takes the syntactic structure information and the phoneme string received from the language processing unit (1) as input and uses rules and prosody tables to calculate the prosodic parameter values related to pauses, pitch, intensity, and duration.

That is, it inserts appropriate pauses by applying phrasing rules according to the degree of separation at phrase and clause boundaries, calculates intonation-related pitch values using rules and tables according to the modification relations between words, and assigns the pitch values to the corresponding phonemes.

It also calculates the duration of each phoneme using the phoneme's intrinsic duration value and modification rules that depend on the phonological environment and the syntactic structure.

In addition, it generates the energy contour of each phoneme, taking into account the phoneme's articulatory characteristics and grammatical function.

The prosodic parameter values estimated through the above process are passed, together with the phoneme string information, to the signal processing unit (3), which is the third stage.

In the third stage, the signal processing unit (3) uses the phoneme string and the prosodic parameter values received from the prosody processing unit (2) to select suitable synthesis units from the synthesis unit database (4), processes the prosodic parameter values where necessary, and concatenates the units to generate the desired synthesized speech.

It follows that, in a conventional synthesizer, the language processing unit (1) and the prosody processing unit (2) must estimate the information related to the naturalness and intelligibility of the synthesized speech from the input text alone.

In this case, errors introduced while the language processing unit and the prosody processing unit apply their rules are unavoidable, and because the rules are applied uniformly there is no way for the document author's intention to be reflected in the synthesized speech.

As a remedy, markup languages that add to the text supplementary information for preventing analysis errors and expressing user intention have been proposed and tried to some extent.

Examples include the Speech Synthesis Markup Language (hereinafter, SSML), the Spoken Text Markup Language (hereinafter, STML), the Java Speech Markup Language (hereinafter, JSML), and the Sable Consortium markup language.

SSML is the markup language first proposed for use in synthesis, in 1995 by A. Isard of the University of Edinburgh, on the basis of the existing Standard Generalized Markup Language (hereinafter, SGML; ISO 8879:1996).

SSML uses four kinds of information: phrase boundaries, emphasized words, specific pronunciation transcriptions, and embedded sound files. By recording this supplementary information in the text in SSML notation and using it in the synthesizer, SSML aimed to reduce the unnaturalness of synthesized speech.

Meanwhile, the prior papers "A Markup Language for Text-to-Speech Synthesis", published in Proc. EUROSPEECH'97, and "SABLE: A Synthesis Markup Language (version 0.2)", published at tttp://www.cst r.ed.ac.uk/projetcs/sable_spes.2.html, work by inserting text into the document; they allow speech synthesis but cannot be implemented for HTML.

In addition, the prior US patents "Authoring and use systems for sound synchronized animation" (Registration No. 5111409) and "Speech animation and inflection system" (Registration No. 5278943) allow unlimited-vocabulary speech synthesis, but cannot be used to implement an HTML synthesizer.

Of course, synthesizer makers had previously each developed and used their own methods by which a user could enter supplementary information with specific character strings for use in the synthesizer. As interest in standardizing these notations grew, however, the University of Edinburgh and Bell Laboratories conducted joint research on how to make use of SSML, and as a result STML was created in 1996-1997.

STML uses Edinburgh's SSML as its basic framework and adopts what Bell Laboratories had been using for information on pitch range, accent patterns, speaker characteristics, and so on.

Sun Microsystems also proposed JSML, which adds supplementary information to text to improve the quality and naturalness of synthesized speech.

JSML extends beyond sentences to paragraphs. It is a structured language that accommodates detailed control over the pronunciation of words and phrases, emphasized words, boundary information, pauses, pitch, speech rate, and so on, and it also allows control tailored to a particular synthesizer.

Recently, the University of Edinburgh, Bell Laboratories, British Telecom, AT&T, and Sun Microsystems formed the Sable Consortium to unify the markup languages for synthesis so that a single language can be used in all synthesis systems, and released the Sable v0.1 specification on January 26, 1998.

However, the above approaches go no further than artificially adding to the text supplementary information for preventing the analysis errors of a conventional TTS and for expressing user intention.

As described above, in a conventional synthesizer the language processing unit (1) and the prosody processing unit (2) must estimate the information related to the naturalness and intelligibility of the synthesized speech from the input text alone.

Furthermore, when a conventional TTS is used to read text in the HTML format widely used on the Internet, the language processing unit (1) uses a parser to remove the supplementary information contained in the input text and separates out only the pure sentence text for use.

In this case, supplementary information in the document such as type size, color, and shape is all ignored, so there is almost no way for the document author's intention to be reflected in the synthesized speech.

The approach of using a markup language likewise goes no further than artificially adding to the text supplementary information for preventing the analysis errors of a conventional TTS and for expressing user intention.

To solve the above problems, the present invention defines an interface between the HTML elements that control the appearance of a document, such as the type size, color, and shape displayed on screen, and the speaker, timbre, speech rate, intonation, and intensity of the synthesized speech, and uses that interface in generating synthesized speech. Its object is thereby to make the visual characteristics of an HTML document audible and to make content delivery through synthesized speech clearly distinguishable.

FIG. 1 is a block diagram of a conventional text-to-speech converter.

FIG. 2 is a block diagram of a text-to-speech converter to which the present invention is applied.

* Explanation of reference numerals for the main parts of the drawings

1: language processing unit    2: prosody processing unit

3: signal processing unit    4: synthesis unit database

5: HTML document analyzer    6: synthesizer control command generator

7: language processing unit    8: prosody processing unit

9: signal processing unit    10: synthesis unit database

To achieve the above object, the present invention is characterized by comprising: a first process in which a Hypertext Markup Language (HTML) document analyzer receives a document containing HTML elements as input, separates the markup, and interprets the structure of the document; a second process of reconstructing the HTML elements using the tree structure information of the document received from the HTML document analyzer; and a third process of receiving the structured TTS data from the synthesizer control command generator and generating synthesized speech.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of a text-to-speech converter to which the present invention is applied. The converter comprises an HTML document analyzer (5) that analyzes an HTML document; a synthesizer control command generator (6) that, based on the analysis result, writes the sentences to be synthesized together with control commands for their pauses, intonation, stress, and duration as structured data; and a speech synthesizer (7) that receives the structured data and synthesizes speech.

The HTML document analyzer (5) receives a document containing HTML elements as input, first separates the markup, and then interprets the structure of the document.

First, it checks <HTML> and </HTML>, which mark the start and end of the HTML document, and reads page information such as the title stored within the section delimited by the <HEAD> and </HEAD> identifiers.

It then reads the body, which is the document content that actually appears on screen, from the section delimited by the <BODY> and </BODY> identifiers and uses a parser to interpret the composition of the document.

Because a markup language is written as a tree structure, the interpreted document has a tree structure; each markup element corresponds to one node, and the leaves of the tree are the words, phrases, clauses, or sentences of the text.

A markup element specified at a higher node remains in effect at its lower nodes unless it is newly set at a lower node.
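By way of illustration only, the tree interpretation and the inheritance rule described above might be sketched in Python as follows. The sketch relies on the standard library `html.parser` module; the `Node` and `TreeBuilder` classes and the `parse`, `leaves`, and `effective_attrs` helpers are hypothetical names introduced here and are not part of the disclosed analyzer. Void elements such as <BR> are not treated specially, which a practical analyzer would need to do.

```python
from html.parser import HTMLParser


class Node:
    """One markup element (or one text leaf) of the interpreted document."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.parent = parent
        self.children = []
        self.text = None  # set only on text leaves


class TreeBuilder(HTMLParser):
    """Builds a Node tree from markup; assumes reasonably well-nested input."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closes.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            leaf = Node("#text", parent=self.current)
            leaf.text = data
            self.current.children.append(leaf)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def leaves(node):
    """Yield the text leaves of the tree in document order."""
    if node.text is not None:
        yield node
    for child in node.children:
        yield from leaves(child)


def effective_attrs(leaf):
    """Merge attributes from the root downward, so that a value set at a
    lower node overrides the value inherited from a higher node."""
    chain = []
    node = leaf.parent
    while node is not None:
        chain.append(node)
        node = node.parent
    merged = {}
    for node in reversed(chain):
        merged.update(node.attrs)
    return merged


root = parse('<font size="5">big <font size="2">small</font> again</font>')
result = [(leaf.text.strip(), effective_attrs(leaf).get("size")) for leaf in leaves(root)]
```

In this example the inner element newly sets the size for the word it encloses, while the surrounding text keeps the size specified at the higher node.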

The tree structure information of the analyzed document is passed to the synthesizer control command generator (6).

The synthesizer control command generator (6) detects sentence boundaries from the tree structure information received from the HTML document analyzer and then, sentence by sentence, reconstructs the HTML elements in sentence units using the HTML elements of the nodes that connect the top node to the leaves making up the sentence.
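As a non-limiting sketch, the sentence-unit reconstruction might look like the following Python. It assumes that the tree has already been flattened into leaf fragments in document order, each paired with the tags of its ancestor nodes, and that a sentence ends at ".", "!", or "?". The function name `group_by_sentence` and the fragment format are assumptions made for illustration, and abbreviations or other punctuation conventions are not handled.

```python
import re

# A boundary is sentence-final punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def group_by_sentence(fragments):
    """Regroup (text, tags) leaf fragments into sentences.

    Each returned sentence is a list of (text, tags) pieces, so every piece
    keeps the HTML elements of the nodes above it.
    """
    sentences, current = [], []
    for text, tags in fragments:
        for piece in _SENTENCE_END.split(text):
            piece = piece.strip()
            if not piece:
                continue
            current.append((piece, tags))
            if piece[-1] in ".!?":
                sentences.append(current)
                current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences


fragments = [
    ("This is ", ()),
    ("bold", ("b",)),
    (" text. Next one", ()),
    (" ends here.", ("i",)),
]
sentences = group_by_sentence(fragments)
```

Here the first sentence is assembled from three leaves, only one of which carries the bold element, and the second sentence from two leaves.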

The body elements used in the reconstruction process include character size, physical style, and logical style.

The elements that indicate character size are <FONT SIZE= >, which specifies the actual font size, and <H >, which specifies the heading level.

The elements that indicate physical style include <B> for bold text, <I> for italic text, <U> for underlined text, and <TT> for typewriter text.

The elements that indicate logical style include <EM> for emphasis, <STRONG> for strong emphasis, <CITE> for citations, and <CODE> for computer code.

An example of the mapping rules between HTML elements and synthesizer control commands is shown below.

Each rule applies to the words enclosed within the corresponding tag, as shown in Table 1 below.

<Table 1>

Category | Tag | Change applied to the speech
Heading | <Hx></Hx> | Speak loudly and slowly, then insert a short pause.
Font size | <FONT SIZE=></FONT> | Vary the loudness according to the font size.
Font color | <FONT COLOR=></FONT> | Change the energy/pitch values.
Physical style | <B></B> | Emphasize.
Physical style | <I></I> | Change the energy/pitch values.
Physical style | <U></U> | Adjust the speed and articulate clearly.
Physical style | <TT></TT> | Lower the volume, raise the pitch.
Physical style | <STRIKE></STRIKE> | Adjust the volume and articulate clearly.
Physical style | <BIG></BIG> | Raise the volume.
Physical style | <SMALL></SMALL> | Lower the volume.
Logical style | <EM></EM> | Raise the volume and the pitch.
Logical style | <STRONG></STRONG> | Raise the volume and the pitch.
Logical style | <CITE></CITE> | Raise the volume and the pitch.
Logical style | <BLINK></BLINK> | Emphasize.
Logical style | <CODE></CODE> | Lower the volume, raise the pitch.
Logical style | <KBD></KBD> | Lower the volume, raise the pitch.
Specific text | <BLOCKQUOTE></BLOCKQUOTE> | Change the timbre, speak slightly slower.
Specific text |  | Articulate clearly, then break.
Specific text | <A HREF=></A> | Articulate clearly, then break.
Specific text | <A NAME=></A> | Articulate clearly, then break.
Specific text | <A REL=></A> | Articulate clearly, then break.
Specific text | <A REV=></A> | Articulate clearly, then break.
Specific text | <A TITLE=></A> | Articulate clearly, then break.
Document title | <TITLE></TITLE> | Emphasize and articulate clearly, then break.
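For illustration only, a few of the mapping rules of Table 1 might be encoded as a lookup table in Python as sketched here. The command names VOLUME, RATE, PITCH, EMPH, and BREAK and the heading values (+40% volume, -20% rate) follow the example of Table 2; every other percentage, the `RULES` table, and the `wrap` function are assumptions introduced for this sketch and are not values given in this description.

```python
# tag -> (list of (command, attribute, value) wrappers, insert a break afterwards?)
# Values other than the heading rule are illustrative assumptions.
RULES = {
    "h1": ([("VOLUME", "LEVEL", "+40%"), ("RATE", "SPEED", "-20%")], True),
    "h2": ([("VOLUME", "LEVEL", "+40%"), ("RATE", "SPEED", "-20%")], True),
    "b": ([("EMPH", None, None)], False),
    "em": ([("VOLUME", "LEVEL", "+20%"), ("PITCH", "BASE", "+20%")], False),
    "code": ([("VOLUME", "LEVEL", "-20%"), ("PITCH", "BASE", "+20%")], False),
    "title": ([("EMPH", None, None)], True),
}


def wrap(text, tag):
    """Wrap text in the control commands mapped to an HTML tag.

    Tags without a rule leave the text unchanged.
    """
    wrappers, add_break = RULES.get(tag.lower(), ([], False))
    out = text
    # Apply the innermost wrapper first so the first rule ends up outermost.
    for command, attr, value in reversed(wrappers):
        open_tag = f'<{command} {attr}="{value}">' if attr else f"<{command}>"
        out = f"{open_tag}{out}</{command}>"
    return out + ("<BREAK>" if add_break else "")


heading = wrap("Heading 1", "h2")
bold = wrap("important", "B")
plain = wrap("unchanged", "span")
```

Keeping the rules in a table rather than in code paths makes it straightforward to adjust a rule for a particular synthesizer without touching the reconstruction logic.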

The result of this process is the structured TTS input data of FIG. 2.

The generated structured TTS input data is passed to the structured data speech synthesizer (7).

Table 2 below shows an example of structured TTS input data.

<Table 2>

HTML document example:
<html><head><title>Test Page</title></head><body background="backround/ibori_back.gif"><center>This is an example.</center><h2>Heading 1</h2>The synthesizer creates synthesized speech in a different voice depending on the shape of the letters.</body></html>

Structured TTS input data:
<EMPH>This is an example.</EMPH><VOLUME LEVEL="+40%"><RATE SPEED="-20%">Heading 1</RATE></VOLUME><BREAK>The synthesizer creates synthesized speech <EMPH>in a different voice</EMPH> depending <EMPH><PITCH BASE="20%">on the shape of the letters</PITCH></EMPH>.

The structured data speech synthesizer (7) receives the structured TTS input data from the synthesizer control command generator (6), adjusts the pauses, intonation, stress, and duration within each sentence, generates synthesized speech for the sentence, and outputs it externally.
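As one possible sketch of the input side of such a synthesizer, the following Python reads structured TTS input data into a flat list of segments, each carrying the control settings in force for that text. It uses the standard library `html.parser` module, which lowercases command and attribute names; the `ControlReader` class and the `read_controls` function are hypothetical, and the actual prosody adjustment and waveform generation are outside the scope of the sketch.

```python
from html.parser import HTMLParser


class ControlReader(HTMLParser):
    """Collects (text, settings) segments from structured TTS input data."""

    def __init__(self):
        super().__init__()
        self.open_commands = []  # (command, attributes) pairs currently open
        self.segments = []       # text is None for a pause segment

    def handle_starttag(self, tag, attrs):
        if tag == "break":
            # BREAK has no closing tag; record it as a pause.
            self.segments.append((None, {"pause": True}))
        else:
            self.open_commands.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Close the innermost open command with this name, if any.
        for i in range(len(self.open_commands) - 1, -1, -1):
            if self.open_commands[i][0] == tag:
                del self.open_commands[i]
                break

    def handle_data(self, data):
        if data.strip():
            settings = {command: attrs for command, attrs in self.open_commands}
            self.segments.append((data.strip(), settings))


def read_controls(structured):
    reader = ControlReader()
    reader.feed(structured)
    reader.close()
    return reader.segments


segments = read_controls(
    '<EMPH>Hi.</EMPH><VOLUME LEVEL="+40%"><RATE SPEED="-20%">Head</RATE></VOLUME>'
    "<BREAK>Rest"
)
```

Each segment can then be handed to the prosody stage with its own volume, rate, pitch, and emphasis settings, and a pause segment tells that stage where to insert a break.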

As described above, the present invention generates distinguishable synthesized speech according to the visual characteristics of an HTML document, and can therefore be applied in many fields, including communication services such as web browser reading, e-mail reading, and on-demand novel (children's story) narration services, as well as education.

Claims

In a text-to-speech conversion method using a system that comprises an HTML document analyzer that analyzes Hypertext Markup Language (HTML) documents, a synthesizer control command generator that, based on the analysis result, writes the sentences to be synthesized and control commands for their pauses, intonation, stress, and duration as structured data, and a speech synthesizer that receives the structured data and synthesizes speech, a method of controlling the operation of a text-to-speech converter using hypertext markup language elements, characterized by comprising:

A first process in which the HTML document analyzer receives a document containing HTML elements as input, separates the markup, and interprets the structure of the document;

A second process of reconstructing the HTML elements in sentence units using the interpreted document structure received from the HTML document analyzer; and

A third process of, after the HTML elements are reconstructed, receiving the structured text-to-speech (TTS) data from the synthesizer control command generator and generating synthesized speech.

The method of claim 1, wherein the first process comprises:

A first step of identifying the identifiers that indicate the start and end of the HTML document and reading page information, such as the title, stored in the start identifier section;

A second step of, after the first step, reading the body, which is the document content that actually appears on screen, from the body identifier section and interpreting the structure of the document using a parser; and

A third step of passing the tree structure information of the interpreted document to the synthesizer control command generator.

The method of claim 1, wherein the second process comprises:

A first step of detecting sentence boundaries from the tree structure information of the document received from the Hypertext Markup Language (HTML) document analyzer; and

A second step of, after the first step, reconstructing the HTML elements in sentence units using the HTML elements of the nodes that connect the top node to the leaves making up each sentence.

The method of claim 1, wherein, in the second process, the Hypertext Markup Language (HTML) elements are reconstructed using, as the body elements used in the reconstruction process:

Character size, indicated by an element that specifies the actual font size and an element that specifies the heading level;

Physical style, indicated by the bold text, italic text, underlined text, and typewriter text elements; and

Logical style, indicated by the emphasis, strong emphasis, citation, and computer code elements.

The method of claim 1, wherein the third process comprises:

A first step of receiving the structured text-to-speech (TTS) input data from the synthesizer control command generator; and

A second step of generating synthesized speech for each sentence by adjusting the pauses, intonation, stress, and duration within the sentence according to the structured TTS input data, and outputting the synthesized speech externally.