KR100275529B1

KR100275529B1 - Text/voice converter and method using trick mode

Info

Publication number: KR100275529B1
Application number: KR1019970025241A
Authority: KR
Inventors: 이항섭; 이정철; 한민수
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1997-06-17
Filing date: 1997-06-17
Publication date: 2000-12-15
Also published as: KR19990001793A

Abstract

PURPOSE: A text-to-speech converter using a trick mode, and a method thereof, are provided to generate synthesized speech by letting a user designate an arbitrary sentence with a trick mode and slider bars, and by defining the interface between this information and the text-to-speech converter. CONSTITUTION: A graphical user interface module (230) selects a file via a file button and transmits information on the file to an input text analysis module (210). The input text analysis module (210) receives and analyzes the file information and transmits the result. A user input analysis module (220) receives the sentence analysis information from the input text analysis module (210), analyzes user commands entered through the graphical user interface module (230), and transmits the analyzed user input information. A synthesizer module (240) converts text into speech using the analyzed user input information, and reports which sentence is being output as synthesized speech to the user through the graphical user interface module (230).

Description

Text-to-speech converter using trick mode and method

The present invention relates to a text-to-speech (TTS) conversion system, and a method thereof, that converts text to speech after the user sets the position of a sentence within a file, and a syllable or word position within that sentence, using the trick mode and slider bars of a graphical user interface.

Conventional text-to-speech converters offer only the simple function of outputting data, input sentence by sentence, as synthesized speech. To output the sentences stored in a file continuously, a main control program is therefore needed that reads the sentences from the file and passes them to the converter as input. Some such programs simply synthesize the file once from beginning to end; because they offer no way to start output from a position the user chooses, they give little consideration to user convenience. Another approach generates synthesized speech in conjunction with a text editor, so that the user can select the region to be synthesized with the editor's functions, but this too is not very convenient. More recently, user convenience has been improved by adding a graphical interface to the main control program that implements sequential search and playback functions like those of a video player (VTR). However, no function has yet been implemented that lets the user arbitrarily select any sentence in a file, or any position within a sentence, and output synthesized speech from there.

In addition, because a conventional text-to-speech converter generates synthesized speech from the sentence it is given, generating speech from an arbitrary position within a sentence requires passing the converter only the part of the sentence that remains after the preceding text is removed. The converter, however, controls prosody by analyzing its input sentence, so when the input sentence changes, the prosody pattern changes as well. Consequently, even though the original sentence is the same, generating synthesized speech from an arbitrary position within it cannot preserve the prosody pattern of the original sentence.

Therefore, for a user to use a text-to-speech converter that supports a trick mode conveniently, two functions are needed: one that lets the user easily select any sentence in a file, or any position within a sentence, and output synthesized speech from there; and one that preserves the prosody pattern of the original sentence even when synthesis starts from an arbitrary position within it.

Accordingly, an object of the present invention is to provide a text-to-speech converter, and a method thereof, in which the user designates an arbitrary sentence in a file using the trick mode and a file slider bar, designates an arbitrary position within the sentence using a sentence slider bar, and synthesized speech is generated by defining the interface between this information and the text-to-speech converter.

FIG. 1 is a configuration diagram of a system to which the present invention is applied;

FIG. 2 is a block diagram of a text-to-speech converter using a trick mode according to the present invention;

FIG. 3 is an example of a graphical user interface using a trick mode according to the present invention; and

FIGS. 4A and 4B are flowcharts of a text-to-speech conversion method using a trick mode according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

101: user input device
102: central processing unit
103: text-to-speech conversion device
104: digital/analog conversion device
105: output device
210: input text analysis module
220: user input analysis module
230: graphical user interface module
240: synthesizer module
250: speaker
310: trick mode control window
320: additional options window
330: lip animation window
340: text output window

To achieve the above object, the apparatus of the present invention includes: graphical user interface means for selecting a file with a file button in order to start synthesis, transmitting information on the selected file's sentences, syllables, and word count, setting an arbitrary sentence position using the file slider bar of the trick mode, setting an arbitrary position within the sentence using the sentence slider bar, and transmitting that information; input text analysis means for receiving all of the sentence information contained in the file when the user opens a file in the graphical user interface, analyzing its sentences, syllables, and words, and transmitting the result; user input analysis means for receiving and using the sentence analysis information from the input text analysis means, analyzing user input commands entered through the graphical user interface means, informing the user of the result through the graphical user interface means, and transmitting the analyzed user input information; and synthesis means for converting text into speech using the analyzed user input information received from the user input analysis means, informing the user, through the graphical user interface means, of the sentence currently being output as synthesized speech and of the word or syllable within it, and outputting the speech to an audio output device.

The method of the present invention, applied to a text-to-speech converter, includes: a first step of opening a file selected for the synthesizer to operate on; a second step of selecting a synthesis start point among the sentences of the selected file using a trick mode; and a third step of synthesizing the corresponding sentence and converting it into speech.

The method further includes: a fourth step of, when the file slider changes, reading the corresponding sentence, setting the sentence slider, and then determining the synthesis start point and outputting the selected sentence; and a fifth step of, when the sentence slider changes, determining the synthesis start point and outputting the selected sentence.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a system to which the present invention is applied. In the drawing, 101 denotes a user input device, 102 a central processing unit, 103 a text-to-speech conversion device, 104 a digital/analog (D/A) conversion device, and 105 an output device.

The user input device (101) receives the user's input through a keyboard, mouse, or the like and passes it to the central processing unit (102).

The central processing unit (102) performs the processing appropriate to the user's request; it is where the algorithm of the present invention is loaded and executed.

The text-to-speech conversion device (103) receives text as input and converts it into speech.

The digital/analog (D/A) conversion device (104) converts the synthesized digital data into an analog signal and outputs it externally.

The output device (105) displays what the central processing unit is currently executing and the results, and provides the user with the information needed.

FIG. 2 is a block diagram of a text-to-speech converter using a trick mode according to the present invention, in which 210 denotes an input text analysis module, 220 a user input analysis module, 230 a graphical user interface module, 240 a synthesizer module, and 250 a speaker.

The present invention basically includes the language processing unit (241), prosody processing unit (242), signal processing unit (243), and synthesis unit database (244) of an existing text-to-speech converter, and further includes an input text analysis module (210), a user input analysis module (220), a graphical user interface module (230), and a video output device (105).

The graphical user interface module (230) selects a file with a file button in order to start synthesis and transmits information on the selected file's sentences, syllables, and word count to the input text analysis module (210). It sets an arbitrary sentence position using the file slider bar of the trick mode and an arbitrary position within the sentence using the sentence slider bar, so that the user can choose the sentence at which synthesis should start in the given text and the starting word or syllable within that sentence. It then transmits this information to the user input analysis module (220).

The input text analysis module (210) analyzes the input text to determine its structure: it determines how many sentences the input text data contains and how many syllables and words each sentence contains, and passes the result to the user input analysis module (220).
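The analysis performed by the input text analysis module can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `analyze_text` is hypothetical, and the sketch assumes that sentences end at `.`, `!`, or `?`, that words are separated by whitespace, and that one syllable corresponds to one precomposed Hangul syllable character (U+AC00 to U+D7A3), which suits Korean text but not English.

```python
import re


def analyze_text(text):
    """Return per-sentence word and syllable counts for a block of text.

    Assumptions (not taken from the patent): sentences end at '.', '!' or '?';
    words are whitespace-separated; a syllable is one Hangul syllable block.
    """
    # Each match is a run of non-terminator characters plus any terminators.
    chunks = re.findall(r"[^.!?]+[.!?]*", text)
    sentences = [c.strip() for c in chunks if c.strip()]

    report = []
    for sentence in sentences:
        words = sentence.split()
        syllables = sum(1 for ch in sentence if "\uac00" <= ch <= "\ud7a3")
        report.append(
            {"sentence": sentence, "words": len(words), "syllables": syllables}
        )
    return report


if __name__ == "__main__":
    for row in analyze_text("안녕하세요. 오늘 날씨가 좋습니다!"):
        print(row)
```

The length of the returned list would give the range of the file slider bar, and each entry's word or syllable count would give the range of the sentence slider bar for that sentence.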

The user input analysis module (220) uses the sentence analysis information from the input text analysis module (210) and analyzes the user input commands from the graphical user interface module (230). It reports the sentence currently being output as synthesized speech, and the word or syllable position within it, back to the user through the graphical user interface module (230). When controlling the synthesizer module (240), it passes the text data and other information needed to drive the synthesizer to the synthesizer module (240). When starting the synthesizer, it uses the synthesis start sentence and the start point within that sentence, as selected by the user through the graphical user interface, to pass the selected sentence, the start position within it, and the other information needed for synthesis, such as voice gender and speech rate, to the synthesizer module (240).

The synthesizer module (240) processes the text in order to convert it into speech using the information from the user input analysis module (220), and informs the user through the graphical user interface module (230) which position in the text is being synthesized. It also passes the lip shape corresponding to the phoneme currently being synthesized to the graphical user interface module (230), which makes lip animation possible.

To implement these functions, two slider bars (317, 318) indicate the sentence and the position within the sentence, and two radio buttons (326) select whether the position within the selected sentence is expressed in syllables or in words. Through this user interface module, the user can also select the various functions related to the synthesizer, run the desired function, and see the result.

The speaker (250) outputs the speech synthesized by the synthesizer module.

FIG. 3 is an example of a graphical user interface using a trick mode to which the present invention is applied, in which 310 denotes a trick mode control window, 320 an additional options window, 330 a lip animation window, and 340 a text output window.

The graphical user interface is the module that presents the output of the central processing unit; it accepts all user input and shows the results of processing.

First, the trick mode control window (310) has control buttons like those of a video player: play (311), pause (312), resume (313), stop (314), forward (315), and backward (316).

Play (311) starts the synthesizer; pause (312) temporarily halts it; resume (313) restarts a paused synthesizer from the position where it stopped; stop (314) terminates the synthesizer; backward (315) moves the sentence to be synthesized to the previous sentence; and forward (316) moves it to the next sentence.

The trick mode control window (310) also has a file slider bar (317), which represents the number of sentences, and a sentence slider bar (318), which represents the structure of the sentence selected on the file slider bar. These show how many sentences the text to be synthesized contains and how many syllables or words each sentence contains, so that the user can choose the synthesizer's start position, and they indicate the sentence currently being synthesized and the position within it. With the two slider bars, the user can select the sentence to synthesize and precisely adjust the start position within it. When a slider bar is moved to select a sentence for synthesis, the sentence at that position is displayed in the text output window (340).

The additional options window (320) has a file open button (321) and a file close button (322) for opening and closing the text to be synthesized, a lip animation on/off button (323), and the controls for choosing the settings the synthesizer needs: a male/female voice selection button (324), a speech rate slider bar (325), and a start unit selection button (326).

When the selection on the start unit selection button (326) changes, the display of the sentence slider bar (318) in the trick mode control window (310) changes accordingly.
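How the start unit selection could change what the sentence slider bar represents is sketched below in Python. This is an illustration under stated assumptions, not the patent's implementation: `unit_starts`, `slider_steps`, and `start_offset` are hypothetical names, words are taken to be whitespace-separated, and a syllable is taken to be one precomposed Hangul syllable character.

```python
import re


def unit_starts(sentence, unit):
    """Character offsets at which each word or syllable begins.

    Assumptions (not from the patent): words are whitespace-separated and a
    syllable is one Hangul syllable block (U+AC00 to U+D7A3).
    """
    if unit == "word":
        return [m.start() for m in re.finditer(r"\S+", sentence)]
    if unit == "syllable":
        return [i for i, ch in enumerate(sentence) if "\uac00" <= ch <= "\ud7a3"]
    raise ValueError("unit must be 'word' or 'syllable'")


def slider_steps(sentence, unit):
    """Number of positions the sentence slider offers for the chosen unit."""
    return len(unit_starts(sentence, unit))


def start_offset(sentence, unit, index):
    """Map a sentence slider position to a character offset in the sentence."""
    starts = unit_starts(sentence, unit)
    if not 0 <= index < len(starts):
        raise IndexError("slider position out of range")
    return starts[index]


if __name__ == "__main__":
    s = "오늘 날씨가 좋습니다"
    print(slider_steps(s, "word"), slider_steps(s, "syllable"))
    print(s[start_offset(s, "word", 1):])
```

Switching the unit from words to syllables changes only the number of slider steps and the offset each step maps to; the sentence itself is unchanged.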

The lip animation window (330) shows pictures of lip shapes for what the synthesizer is currently synthesizing. The pictures are made by standardizing, with the phonemic context taken into account, the patterns of a person's lip movements during speech, which gives the impression that a person is talking. The lip movement is tied to the lip animation on/off button (323) in the additional options window (320).

The text output window (340) shows the text data currently prepared for synthesis. It consists of a full-text window, which shows the entire text, and a current sentence window, which shows only the sentence the synthesizer is currently synthesizing. The contents of the current sentence window are tied to the file slider bar (317) and the backward (315) and forward (316) buttons of the trick mode control window (310).

FIGS. 4A and 4B are flowcharts of the text-to-speech conversion method using a trick mode according to the present invention. They show the flow of execution of each function when the user controls the graphical user interface with a user input device such as a keyboard or mouse.

First, the method checks whether there has been user input (401) and performs the corresponding function. Two variables are used here: a play flag (Flag_Play), which indicates whether the synthesizer is currently running, and a file flag (Flag_File), which indicates whether a target file for synthesis exists.

Before the synthesizer can run, the target file must first be opened. When the user selects File Open (402), the desired file is opened (413), the file flag is set to 1 (414), and the contents of the file are sent to the input text analysis module (415). The module analyzes how many sentences the input text contains and how many words and word phrases (eojeol) each sentence contains, and uses the result to initialize the file slider bar and sentence slider bar in the trick mode control window. Once the file is open, the text output window appears, and the first sentence is shown in the current sentence window.

When File Close (403) is selected, the file currently in use is closed, the file flag is set to 0 (417), and the values previously set by the input text analysis module are reset to 0 (415).

When Play (404) is selected, the file slider bar and sentence slider bar are examined to determine the synthesis start point (418), synthesis begins (419), and the play flag is set to 1 (420). At this point the voice gender (432) and speech rate (433) selected in the additional options window are checked, and the synthesizer is run (435) through the synthesizer control routine (434).
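The information gathered when Play is selected (start sentence, start position, voice gender, and speech rate) could be bundled into a single request before it is handed to the synthesizer. The Python sketch below shows one possible shape. `SynthesisRequest` and `build_request` are hypothetical names, and the field names, default values, and validation rules are assumptions made for illustration rather than anything specified in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical bundle of what the synthesizer needs to start."""
    sentence: str      # sentence selected with the file slider bar
    start_unit: str    # "syllable" or "word" (start unit selection button)
    start_index: int   # position chosen with the sentence slider bar
    voice: str         # "male" or "female"
    rate: float        # relative speech rate; 1.0 is assumed to mean normal


def build_request(sentences, file_pos, sent_pos,
                  unit="syllable", voice="female", rate=1.0):
    """Validate the GUI state and build a request for the synthesizer."""
    if not 0 <= file_pos < len(sentences):
        raise IndexError("file slider position out of range")
    if sent_pos < 0:
        raise IndexError("sentence slider position out of range")
    if unit not in ("syllable", "word"):
        raise ValueError("unit must be 'syllable' or 'word'")
    if voice not in ("male", "female"):
        raise ValueError("voice must be 'male' or 'female'")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return SynthesisRequest(sentences[file_pos], unit, sent_pos, voice, rate)


if __name__ == "__main__":
    req = build_request(["첫 문장.", "둘째 문장."], 1, 0,
                        unit="word", voice="male", rate=1.2)
    print(req)
```

Making the request immutable (`frozen=True`) is a design choice of this sketch: the settings in effect when Play was pressed cannot change underneath a running synthesis.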

Once the synthesizer starts, it begins synthesis from the given start point and indicates the position in the sentence currently being synthesized by moving the sentence slider bar in the trick mode control window (436). If the lip sync option is on (437), a standardized lip shape that takes the phonemic context into account is displayed in the lip animation window (438).

If the user makes no other request (401) after synthesis of the given sentence finishes, the file slider bar is advanced automatically (440) to synthesize the next sentence. If the sentence just synthesized was the last one in the file, the file slider bar does not change any further (441), and the play flag is set to 0 (442) to indicate that synthesis is complete.

When the position of the file slider bar changes, the sentence at that position is read in (427), the sentence slider bar is updated to match the selected sentence (428), and the currently selected sentence is shown in the text output window (430). The file slider bar and sentence slider bar may change automatically while the synthesizer is running, or the user may change them before the synthesizer runs in order to choose the synthesis start point. The play flag, which indicates the synthesizer's operating state, is therefore used to route the processing flow for the two cases (431).

If the synthesizer was running, it continues synthesis on the newly selected sentence after the male/female check (432) and the speech rate check (433).

When the synthesis start point chosen by the user is not the beginning of the sentence, the synthesizer first computes the prosody parameters over the entire sentence and then generates synthesized speech from the start point within it, so that the prosody pattern is the same as when the whole sentence is synthesized.
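The idea of computing prosody over the whole sentence and only then skipping to the start point can be illustrated with a toy Python sketch. The prosody model here is an assumption made purely for illustration: `pitch_contour` assigns each word a pitch that declines linearly from 220 Hz to 180 Hz across the sentence, which is not the prosody model of the patent. `synthesize_from` is likewise a hypothetical name.

```python
def pitch_contour(words, start_hz=220.0, end_hz=180.0):
    """Toy prosody model (an assumption): pitch falls linearly over the sentence."""
    n = len(words)
    if n == 0:
        return []
    if n == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (n - 1)
    return [start_hz + i * step for i in range(n)]


def synthesize_from(words, start_index):
    """Compute prosody for the whole sentence, then keep only the tail."""
    contour = pitch_contour(words)  # prosody uses the full sentence
    return list(zip(words[start_index:], contour[start_index:]))


if __name__ == "__main__":
    words = ["오늘", "서울의", "날씨는", "매우", "맑습니다"]

    # Start at the third word while keeping whole-sentence prosody.
    print(synthesize_from(words, 2))

    # Naive alternative: truncate first, so prosody is recomputed on the tail.
    print(list(zip(words[2:], pitch_contour(words[2:]))))
```

In the first call the tail keeps the pitch values it would have had inside the full sentence. In the naive alternative the truncated text is treated as a new sentence, so the contour restarts at the top of the range, which is the prosody mismatch described above.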

When Pause (406) is selected, the synthesizer is temporarily halted; if Resume (405) is then selected, synthesis restarts from the position where it stopped. When Stop (407) is selected, the synthesizer's operation is terminated. When Forward (408) is selected, the sentence to be synthesized moves to the next one, and when Backward (409) is selected, it moves to the previous one. Selecting Forward (408) or Backward (409) has the same effect as moving the file slider bar directly, except that if either is selected while the synthesizer is running, synthesis is first halted, as if Stop (407) had been selected, and the file slider bar is then moved.
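The control flow described for FIGS. 4A and 4B can be summarized as a small state machine. The Python sketch below is one possible reading, not the patented implementation. The class `TrickModeController` and its method names are hypothetical; `flag_file` and `flag_play` mirror Flag_File and Flag_Play from the description; the `paused` attribute, the clamping of the file slider at the first and last sentence, and the reset of the sentence slider on a sentence change are assumptions.

```python
class TrickModeController:
    """Hypothetical state holder for the trick mode buttons and sliders."""

    def __init__(self):
        self.flag_file = 0    # 1 while a target file is open (Flag_File)
        self.flag_play = 0    # 1 while the synthesizer is running (Flag_Play)
        self.paused = False   # assumption: pause is tracked separately
        self.sentences = []
        self.file_pos = 0     # file slider bar: index of the current sentence
        self.sent_pos = 0     # sentence slider bar: position in the sentence

    def open_file(self, sentences):
        self.sentences = list(sentences)
        self.flag_file = 1
        self.file_pos = self.sent_pos = 0

    def close_file(self):
        self.sentences = []
        self.flag_file = 0
        self.flag_play = 0
        self.paused = False
        self.file_pos = self.sent_pos = 0

    def play(self):
        # Synthesis can start only when a non-empty file is open.
        if self.flag_file and self.sentences:
            self.flag_play = 1
            self.paused = False

    def pause(self):
        if self.flag_play:
            self.paused = True

    def resume(self):
        if self.flag_play and self.paused:
            self.paused = False

    def stop(self):
        self.flag_play = 0
        self.paused = False

    def forward(self):
        self._move(+1)

    def backward(self):
        self._move(-1)

    def _move(self, delta):
        if not self.sentences:
            return
        self.stop()  # a running synthesizer is halted before the slider moves
        last = len(self.sentences) - 1
        self.file_pos = min(max(self.file_pos + delta, 0), last)
        self.sent_pos = 0

    def on_sentence_done(self):
        """Called when a sentence finishes with no other user request."""
        if not self.flag_play:
            return
        if self.file_pos + 1 < len(self.sentences):
            self.file_pos += 1   # advance to the next sentence automatically
            self.sent_pos = 0
        else:
            self.flag_play = 0   # last sentence: synthesis is complete


if __name__ == "__main__":
    c = TrickModeController()
    c.open_file(["첫 문장.", "둘째 문장.", "셋째 문장."])
    c.play()
    c.on_sentence_done()
    print(c.file_pos, c.flag_play)
```

Keeping the two flags as plain integers follows the description; a real implementation might prefer an enumeration of states so that combinations such as "paused but not playing" cannot occur.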

When the user, without operating any of the buttons, moves the file slider bar that selects a sentence or the sentence slider bar that selects a position within a sentence, the change in the position of the file slider bar (410) or of the sentence slider bar (411) is detected, the start point for synthesis is computed, and the selected sentence is shown in the text output window.

The present invention described above is not limited to the foregoing embodiment and the accompanying drawings, since a person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the invention.

As described above, the present invention lets the user start text-to-speech conversion after setting the position of a sentence within a file, and a syllable or word position within that sentence, using a graphical user interface on the display device that provides a file slider bar, a sentence slider bar, and a trick mode. This yields the following effects.

First, the graphical interface lets the user easily see the structure of the text to be synthesized: how many sentences it contains and how many words and syllables each sentence contains.

Second, the graphical interface lets the user easily set the position at which synthesis should start, by selecting a sentence and then a word or syllable within it, and, while the synthesizer is running, tells the user exactly which sentence is being synthesized and the position within it.

Third, even when the synthesis start position set by the user is some point within a sentence rather than its beginning, natural synthesized speech is produced that preserves the prosody pattern of the whole sentence.

Claims

A text-to-speech converter comprising:

graphical user interface means for selecting a file with a file button in order to start synthesis, transmitting information on the selected file's sentences, syllables, and word count, setting an arbitrary sentence position using a file slider bar of a trick mode, setting an arbitrary position within the sentence using a sentence slider bar, and transmitting that information;

input text analysis means for receiving all of the sentence information contained in the file when the user opens a file in the graphical user interface, analyzing its sentences, syllables, and words, and transmitting the result;

user input analysis means for receiving and using the sentence analysis information from the input text analysis means, analyzing user input commands entered through the graphical user interface means, informing the user of the result through the graphical user interface means, and transmitting the analyzed user input information; and

synthesis means for converting text into speech using the analyzed user input information received from the user input analysis means, informing the user, through the graphical user interface means, of the sentence currently being output as synthesized speech and of the word or syllable position within it, and outputting the speech to an audio output device.

The text-to-speech converter of claim 1, wherein

The synthesis means

Calculates the prosodic parameters for the entire sentence and then generates synthesized speech from the start point within the sentence designated by the user through the graphical user interface means, so that speech generated from any position in the sentence has the same prosody as when the sentence is synthesized from its beginning.

The text-to-speech converter of claim 1 or 2, wherein

The graphical user interface means comprises:

A trick mode in which the user designates an arbitrary sentence position using selection buttons;

A file slider bar that, when the target file for text-to-speech conversion is opened, counts the total number of sentences in the file in advance using sentence punctuation and shows it to the user, so that the user can easily specify the synthesis start point; and

A sentence slider bar that counts in advance the number of syllables, the unit of position movement within a sentence, in the sentence the user has chosen to synthesize, and shows it to the user.
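The two pre-computations in this claim can be sketched as follows, under stated assumptions: sentences are delimited by `.`, `?`, or `!`, and a syllable is counted as one precomposed Hangul syllable character (U+AC00 to U+D7A3). Neither rule is specified in this document beyond "sentence punctuation", and the function names are hypothetical.

```python
import re

# Assumption: these three marks end a sentence.
SENTENCE_END = re.compile(r"[.?!]+")


def split_sentences(text):
    """Split text into sentences at sentence-final punctuation."""
    return [p.strip() for p in SENTENCE_END.split(text) if p.strip()]


def count_hangul_syllables(sentence):
    """Count precomposed Hangul syllable blocks; other characters are ignored."""
    return sum(1 for ch in sentence if "\uac00" <= ch <= "\ud7a3")


def slider_ranges(text):
    """Return (number of sentences, syllable count of each sentence).

    The first value would size the file slider bar; the count for the
    selected sentence would size the sentence slider bar.
    """
    sentences = split_sentences(text)
    return len(sentences), [count_hangul_syllables(s) for s in sentences]


if __name__ == "__main__":
    print(slider_ranges("안녕하세요. 오늘 날씨가 좋습니다! 그렇죠?"))
```

A punctuation-only splitter mis-handles abbreviations and decimal numbers, so a practical analyzer would need additional rules.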

The text-to-speech converter of claim 3, wherein

The graphical user interface means

Further comprises text input means that allow the user to specify an arbitrary sentence to be synthesized and an arbitrary position within that sentence, based on the total number of sentences in the file and the total number of syllables in the sentence displayed on the screen.

The text-to-speech converter of claim 1 or 2, wherein

The graphical user interface means comprises:

A file slider bar that, when the target file for text-to-speech conversion is opened, counts the total number of sentences in the file in advance using sentence punctuation and shows it to the user, so that the user can easily specify the synthesis start point; and

A sentence slider bar that counts in advance the number of words, the unit of position movement within a sentence, in the sentence the user has chosen to synthesize, and shows it to the user.

The text-to-speech converter of claim 5, wherein

The graphical user interface means

Further comprises input means that allow the user to specify an arbitrary sentence to be synthesized and an arbitrary position within that sentence, based on the total number of sentences in the file and the total number of words in the sentence displayed on the screen.

A text-to-speech conversion method applied to a text-to-speech converter, comprising:

A first step of opening a selected file in order to operate the synthesizer;

A second step of selecting a synthesis start point using a trick mode on the sentences of the selected file; and

A third step of synthesizing the corresponding sentence into speech.

The method of claim 7, further comprising:

A fourth step of, when the file slider is changed, reading the corresponding sentence and, after the sentence slider is set, identifying the synthesis start point and outputting the selected sentence; and

A fifth step of, when the sentence slider is changed, identifying the synthesis start point and outputting the selected sentence.
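The fourth and fifth steps can be sketched as two event handlers. This is a minimal illustration, not the claimed implementation: the class and method names are hypothetical, words (split on whitespace) are used as the position unit, and resetting the sentence slider to position 0 whenever the file slider moves is an assumption.

```python
class StartPointSelector:
    """Tracks the synthesis start point chosen with the two sliders."""

    def __init__(self, sentences):
        if not sentences:
            raise ValueError("file must contain at least one sentence")
        self.sentences = list(sentences)
        self.sentence_index = 0
        self.unit_index = 0

    def units(self):
        # Assumption: the position unit is a whitespace-delimited word.
        return self.sentences[self.sentence_index].split()

    def on_file_slider(self, sentence_index):
        """Fourth step: read the chosen sentence and reset the sentence slider."""
        if not 0 <= sentence_index < len(self.sentences):
            raise ValueError("sentence index out of range")
        self.sentence_index = sentence_index
        self.unit_index = 0  # assumption: start of the newly chosen sentence
        return self.start_point()

    def on_sentence_slider(self, unit_index):
        """Fifth step: move the start point within the current sentence."""
        if not 0 <= unit_index < len(self.units()):
            raise ValueError("position outside the sentence")
        self.unit_index = unit_index
        return self.start_point()

    def start_point(self):
        """Return (sentence index, unit index, text to be output)."""
        remaining = " ".join(self.units()[self.unit_index:])
        return self.sentence_index, self.unit_index, remaining


if __name__ == "__main__":
    selector = StartPointSelector(["one two three", "four five"])
    print(selector.on_file_slider(1))
    print(selector.on_sentence_slider(1))
```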

The method of claim 7 or 8, wherein

The second step comprises the steps of:

Determining the synthesis start point and then starting synthesis when play is selected;

Restarting synthesis when restart is selected;

Pausing the synthesis momentarily when pause is selected;

Stopping the synthesis when stop is selected;

Moving the synthesis target to the previous sentence when forward is selected; and

Moving the synthesis target to the following sentence when backward is selected.
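The trick-mode selections listed in this claim can be sketched as a small state machine. The class name, the state names, and the command strings are hypothetical. Two points are assumptions rather than facts from this document: "restart" is treated as resuming playback, and the direction mapping follows the claim wording as translated (forward to the previous sentence, backward to the following one), held in one table so it can be flipped.

```python
class TrickModeController:
    """Minimal trick-mode state machine over the sentences of one file."""

    # Assumption: direction mapping taken literally from the translated claim.
    MOVES = {"forward": -1, "backward": +1}

    def __init__(self, sentence_count):
        if sentence_count < 1:
            raise ValueError("file must contain at least one sentence")
        self.sentence_count = sentence_count
        self.sentence = 0          # index of the sentence to synthesize
        self.state = "stopped"     # "stopped", "playing", or "paused"

    def handle(self, command):
        if command in ("play", "restart"):
            self.state = "playing"
        elif command == "pause":
            if self.state == "playing":
                self.state = "paused"
        elif command == "stop":
            self.state = "stopped"
        elif command in self.MOVES:
            moved = self.sentence + self.MOVES[command]
            # Clamp so the target stays inside the file.
            self.sentence = min(max(moved, 0), self.sentence_count - 1)
        else:
            raise ValueError(f"unknown command: {command}")
        return self.state, self.sentence


if __name__ == "__main__":
    controller = TrickModeController(3)
    for cmd in ["play", "backward", "pause", "restart", "stop"]:
        print(cmd, controller.handle(cmd))
```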

The method of claim 9, wherein

The third step,

In order to control the lip animation window of the graphical user interface, checks the gender and the speech rate, performs synthesis through a synthesizer control routine, and outputs the result to the user as a lip animation, using conversion rules between the synthesized speech and images obtained by standardizing human lip movements and converting them into image form.