
KR20030058708A - Voice recording device using text to speech conversion

Info

Publication number: KR20030058708A
Application number: KR1020010089240A
Authority: KR
Inventors: 전화성; 김은정
Original assignee: 에스엘투(주)
Priority date: 2001-12-31
Filing date: 2001-12-31
Publication date: 2003-07-07

Abstract

PURPOSE: To provide a voice recording system using text-to-speech conversion that users of general personal computers can use easily, and that makes it easy to synthesize and process the user's voice, speech converted from text, or music from ordinary music files. CONSTITUTION: A text-to-speech conversion module (112) converts text to speech. A speech processing module (114) processes various voice data, including music files. A text processing module (116) inputs and processes text documents. A control unit (100) converts the text input through the text processing module to speech, and plays the voice data from the music files, outputting the result to an external speaker or a recorder (180).

Description

Voice recording system using text / voice conversion {VOICE RECORDING DEVICE USING TEXT TO SPEECH CONVERSION}

The present invention relates to speech processing technology using text-to-speech (TTS) conversion, and more particularly to a voice recording system using the same.

Speech processing technology using TTS conversion converts a text message into a voice message by synthesis. Typically, text is converted to speech by pre-recording basic lexical elements and combining them, and in its early days the technology was used mainly to vocalize numeric text. More recently, it has been developed to convert not only Korean text but also text in various foreign languages, including English. In particular, by applying a phoneme combination method, it has reached the point where the converted speech can sound almost like an ordinary person speaking.

Speech processing technology using TTS conversion is applied to the Unified Messaging System (UMS), which integrates various types of messages into a single mailbox, ranging from voice over the public switched telephone network (PSTN) and fax (FAX) to e-mail over the Internet. It is also applied to, among other things, various text-to-speech solutions for the hearing impaired.

However, despite its wide applicability, speech processing technology using TTS conversion has not yet been offered as a solution that users of general personal computers can use easily and usefully.

Accordingly, an object of the present invention is to provide a voice recording system using text-to-speech conversion that users of general personal computers can use easily and usefully.

Another object of the present invention is to provide a voice recording system using text-to-speech conversion that makes it easy to synthesize and process the user's voice, speech converted from text, or music from ordinary music files.

To achieve the above objects, the present invention provides a voice recording system using text-to-speech conversion, comprising: a text-to-speech conversion module that converts text to speech; a speech processing module that processes voice data including music files; a text processing module that inputs and processes text documents; and a control unit that converts text input through the text processing module to speech through the text-to-speech conversion module, and causes the speech processing module to play voice data from the music file for output to an external speaker or a recorder.

FIG. 1 is an overall block diagram of a voice recording apparatus using text-to-speech conversion according to an embodiment of the present invention.

FIG. 2 is an exemplary view of a user interface screen of a voice recording apparatus using text-to-speech conversion according to an embodiment of the present invention.

FIG. 3 is a schematic flowchart of a voice recording operation using text-to-speech conversion according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The following description sets out specific details, such as particular components, but these are provided only to aid a general understanding of the present invention. It will be apparent to those of ordinary skill in the art that such details may be modified or changed within the scope of the present invention.

FIG. 1 is an overall block diagram of a voice recording apparatus using text-to-speech conversion according to an embodiment of the present invention. Referring to FIG. 1, the voice recording system according to the present invention includes a control unit 100, which is a central processing unit that controls the overall operation of the system; a ROM 142 used as operating program memory; a RAM 144 that temporarily stores data generated while various operations are performed; a nonvolatile memory 146 for storing data permanently; and a hard disk 140 serving as one or more auxiliary storage devices capable of storing large amounts of information for long periods. In addition to these recording media, various other storage media, including a CD-ROM drive, may be provided.

In the voice recording system according to the present invention, the user interface unit 126 provides the interface between the user and the system through the input unit 160, which may be a keyboard, a mouse, or the like. The display interface unit 124 processes data so that information handled by the system is displayed to the user through the display unit 150, which may be, for example, a monitor (CRT). The communication interface unit 122 handles communication between the system and external devices or the Internet, and may be a modem, a LAN card, or the like. The audio unit 130 converts digital voice data into analog voice data and outputs it through an output terminal to an external speaker or the like; the audio unit 130 may be an ordinary sound card.

The functional units of the voice recording system described above are connected to one another by a data or address bus, and the configuration of a general personal computer can be used for them as is.

Meanwhile, in accordance with a feature of the present invention, the control unit 100 of the voice recording system is provided with a voice/text processing tool 110. The voice/text processing tool 110 comprises a TTS module 112 that converts text to speech, a speech processing module 114 that processes various voice data including music files such as MP3 files, and a text processing module 116 for inputting various text documents. The text processing module 116 can process a document that the user enters directly through the input unit 160, and is also configured to accept text copied from a hypertext document on the Internet or from a document created in any of various document editing programs. The voice/text processing tool 110 may be implemented as an application program stored on the hard disk 140, and read and executed by the control unit 100.

The voice/text processing tool 110 configured in this way converts text that the user has entered through the text processing module 116 into speech through the TTS module 112, and processes voice data, such as a music file selected by the user, through the speech processing module 114. The control unit 100 outputs the voice data processed by the voice/text processing tool 110 through the audio unit 130 to an external speaker (SPK) or the recorder 180, so that it is emitted as audible sound or recorded by the recorder 180. During this operation, the control unit 100 outputs, through the display unit 150, a user interface screen that makes it easy for the user to select text or a music file, and receives the user's input through the input unit 160. An example of such an interface screen is shown in FIG. 2.
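The data flow just described can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: every class and method name (`TextProcessingModule`, `TTSModule`, `SpeechProcessingModule`, `Sink`, `Controller`) is hypothetical, the TTS step is a placeholder that emits one sample per character instead of performing real synthesis, and the music "player" simply passes through samples supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import List

Samples = List[int]  # PCM samples; the representation is an assumption of this sketch


class TextProcessingModule:
    """Hypothetical stand-in for module 116: accepts typed or pasted text."""

    def __init__(self) -> None:
        self._buffer: List[str] = []

    def input_text(self, text: str) -> None:
        self._buffer.append(text.strip())

    def document(self) -> str:
        return " ".join(t for t in self._buffer if t)


class TTSModule:
    """Hypothetical stand-in for module 112; no real synthesis is performed."""

    def synthesize(self, text: str) -> Samples:
        # Placeholder: one sample per character, derived from its code point.
        return [ord(ch) % 256 for ch in text]


class SpeechProcessingModule:
    """Hypothetical stand-in for module 114: passes through decoded music."""

    def play(self, decoded_music: Samples) -> Samples:
        return list(decoded_music)


@dataclass
class Sink:
    """Stands in for the external speaker or the recorder 180."""
    name: str
    received: Samples = field(default_factory=list)

    def write(self, samples: Samples) -> None:
        self.received.extend(samples)


class Controller:
    """Hypothetical stand-in for control unit 100: routes audio to the sinks."""

    def __init__(self, sinks: List[Sink]) -> None:
        self.text = TextProcessingModule()
        self.tts = TTSModule()
        self.speech = SpeechProcessingModule()
        self.sinks = sinks

    def speak_document(self) -> Samples:
        samples = self.tts.synthesize(self.text.document())
        self._output(samples)
        return samples

    def play_music(self, decoded_music: Samples) -> Samples:
        samples = self.speech.play(decoded_music)
        self._output(samples)
        return samples

    def _output(self, samples: Samples) -> None:
        # Every configured sink (speaker, recorder) receives the same audio.
        for sink in self.sinks:
            sink.write(samples)
```

In this sketch, a `Controller` built with a speaker sink and a recorder sink sends the same samples to both, mirroring the "speaker or recorder" output path of the audio unit 130.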

FIG. 2 is an exemplary view of a user interface screen of a voice recording apparatus using text-to-speech conversion according to an embodiment of the present invention. Referring to FIG. 2, the user interface screen 200 consists broadly of a music file execution window 210 for playing music files and a text input window 220 for entering text.

The music file execution window 210 consists of a main window 212 for controlling output of the music file currently being played, an equalizer window 214 for adjusting the equalizer, and a file list window 216 for setting the list of music files to be output. The main window 212 has a display bar 212a that shows information such as the elapsed time of the music file currently being output, and a menu bar 212b for receiving operation settings such as play, stop, and pause for that file. Similarly, the file list window 216 has a menu bar 216a for receiving operation settings such as opening a file or adding to the list, and a display bar 216b that shows the corresponding information. Through the music file execution window 210, the user can load and play any desired music files, and the control unit 100 plays the corresponding music file accordingly.

The text input window 220 has a text editing window 220b in which the text itself is entered, and a menu bar 220a for receiving various settings for converting the text entered in the text editing window 220b to speech. Through the text input window 220, the user can enter any desired text, either by typing it directly or by copying it from another text document, and can trigger its conversion to speech; the control unit 100 converts the text to speech and outputs it accordingly.

On the user interface screen 200, the user can run music file playback and text-to-speech conversion separately, or run them at the same time so that the music file output and the text-to-speech output are combined. The operation of the voice recording system of the present invention is described in more detail below with reference to the accompanying drawings.

FIG. 3 is a schematic flowchart of a voice recording operation using text-to-speech conversion according to an embodiment of the present invention. Referring to FIG. 3, in step 310 the user copies in a document or types text, and in step 320 the user opens a music file such as an MP3 or builds a file list. When the user triggers conversion of the entered document, the text is converted to speech and output in step 312; this output is recorded by the recorder in step 342a, or output through the speaker in step 342b so that the user can check it. Likewise, when the user triggers playback of a music file, the music file, such as an MP3, is played in step 322; this output is recorded by the recorder in step 342a, or output through the speaker in step 342b so that the user can check it.

The text-to-speech output of step 312 and the music file playback of step 322 can be performed at the same time, in which case the combined audio is recorded by the recorder or output through the speaker as in steps 342a and 342b. Furthermore, if the user speaks directly into a microphone, as in step 332, that voice is combined with the music file playback output and/or the text-to-speech output and is likewise recorded by the recorder or output through the speaker as in steps 342a and 342b.
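The patent does not specify how the simultaneous outputs are combined. One common approach, shown here purely as an assumption, is additive mixing of PCM samples with clamping to the sample range. The function name `mix` and the choice of 16-bit signed samples are hypothetical and chosen only for illustration.

```python
from itertools import zip_longest
from typing import Iterable, List

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed PCM range


def mix(*streams: Iterable[int]) -> List[int]:
    """Sum the streams sample by sample; shorter streams are padded with silence."""
    mixed: List[int] = []
    for frame in zip_longest(*streams, fillvalue=0):
        total = sum(frame)
        # Clamp so that loud passages clip instead of wrapping around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `mix(tts_samples, music_samples, mic_samples)` would yield a single stream that could then be sent to the speaker or the recorder; passing only one or two streams covers the cases in which only some of the sources are active.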

To summarize the operation of the voice recording system of the present invention: text alone can be converted to speech and recorded, a music file alone can be played and recorded, and the user's own voice alone can be recorded through the microphone. The text-to-speech output and the music file playback output can also be combined and recorded together. In addition, the user's own voice from the microphone can be combined with the music file playback output or with the text-to-speech output and recorded, and all three can be combined and recorded together.
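Taken together, the modes listed above amount to every non-empty combination of the three audio sources. The short sketch below enumerates them; the source labels and the function name `recording_modes` are hypothetical and used only for illustration.

```python
from itertools import combinations

# Illustrative labels for the three sources named in the description.
SOURCES = ("text-to-speech", "music file", "microphone")


def recording_modes(sources=SOURCES):
    """Return every non-empty combination of sources, smallest first."""
    return [
        combo
        for size in range(1, len(sources) + 1)
        for combo in combinations(sources, size)
    ]
```

With three sources this gives seven modes: three single-source recordings, three two-source combinations, and one combination of all three.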

The voice recording system of the present invention can be configured and operated as described above.

Although specific embodiments have been described above, various modifications may be made without departing from the scope of the present invention. The scope of the present invention should therefore be determined not by the described embodiments but by the claims and their equivalents.

As described above, the voice recording system using text-to-speech conversion according to the present invention can be used easily and usefully by users of general personal computers, and makes it easy to synthesize and process the user's voice, speech converted from text, or music from ordinary music files.

Claims

In a voice recording system using a text / voice conversion,

A text-to-speech module that converts text to speech,

A voice processing module for processing voice data including music files;

A text processing module for inputting and processing text documents,

And a control unit for converting text input through the text processing module into voice through the text-to-speech module, and for causing the voice processing module to reproduce the voice data from the music file and output it to an external speaker or a recorder.

The voice recording system of claim 1, wherein the document input to the text processing module is a document input by a user directly through an input unit, or is input by copying a document created by another document editing program.

The voice recording system of claim 1, wherein the control unit outputs to the user an interface screen for receiving operation settings from the user,

and the interface screen includes a music file execution window for executing the music file, and a text input window for inputting the text.

The voice recording system of claim 1, further comprising a microphone for receiving the voice of the user and outputting the voice to the external speaker or the voice recorder.