KR0153642B1

KR0153642B1 - Character-voice transformation service apparatus and control method of the same

Info

Publication number: KR0153642B1
Application number: KR1019950055898A
Authority: KR
Inventors: 이승훈; 강동규; 정유현; 김대웅
Original assignee: 양승택; 한국전자통신연구원; 이준; 한국전기통신공사
Priority date: 1995-12-23
Filing date: 1995-12-23
Publication date: 1998-11-16
Also published as: KR970056695A

Abstract

The present invention relates to a text-to-speech service apparatus, and a control method thereof, that connects to information providers, converts text menus and information into speech, and plays the synthesized speech to electronic-telephone subscribers over the telephone. The apparatus comprises subscriber access means (10) connected to an external telephone network access unit (1), central processing means (20) connected to the subscriber access means (10), and data processing means (30) connected between the central processing means (20) and an external packet network access unit (3). The control method applied to this apparatus comprises: a first step in which, when a call arrives from the subscriber access means (10), the central processing means (20) connects to an information provider through the packet network access unit (3) and receives data; a second step of analyzing the received data and, if the current state is not service release, determining whether it is a menu selection stage; a third step of, depending on whether it is the menu selection stage, either receiving menu data and generating synthesized speech, or detecting the content information and generating and transmitting the corresponding synthesized speech; and a fourth step of receiving the DTMF signal pressed by the user and transmitting, through the data processing means (30), a command to the information provider to advance to the next state. By converting text into speech, the method can provide information to users in spoken form.

Description

Text-to-speech service device and control method thereof

FIG. 1 is an overall configuration diagram of a communication network system to which the present invention is applied.

FIG. 2 is a hardware configuration diagram of the text-to-speech service apparatus according to the present invention.

FIG. 3 is an overall software configuration diagram of the text-to-speech service apparatus according to the present invention.

FIG. 4 is a structural diagram of the speech synthesis parameters according to the present invention.

FIG. 5 is a flowchart of speech synthesis processing according to the present invention.

FIG. 6 is an overall control flowchart of the text-to-speech service according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

1: telephone network access block
2: text-to-speech block
3: packet network access block
10: subscriber access unit
20: central processing unit
30: data processing unit
40: speech synthesis unit

The present invention relates to a text-to-speech service apparatus, and a control method thereof, that connects to information providers such as HiTEL, converts text menus and information into speech, and plays the synthesized speech to electronic-telephone subscribers over the telephone.

Voice services currently in use mainly record the information to be provided as speech, store it digitally, and then convert the information a user requests back to analog for playback. With this approach, information can be entered into the information providing system only when recording equipment, analog/digital conversion equipment, and the like are available, so it has been difficult for just anyone to provide information easily from any location.

The present invention was devised to solve these problems of the prior art. Its object is to provide a text-to-speech service apparatus and a control method thereof with which anyone who has equipment for entering text can provide information: information is entered in text form into the information providing system through the computers or terminals that many people already use, and the text is converted into speech and delivered to users in spoken form.

To achieve this object, the text-to-speech service apparatus of the present invention is applied to a communication network system comprising a telephone network access unit connected to the public telephone network that serves ordinary telephone subscribers, and a packet network access unit connected to the public data network that serves information providers. The apparatus comprises: subscriber access means that detects ring and hook-on signals, sends the hook-on signal to the telephone network access unit to request call disconnection, and converts synthesized analog speech into a PCM (Pulse Code Modulation) signal for transmission to the telephone network access unit; central processing means that controls overall system operation and is configured so that high-speed devices and a multitasking operating system (OS) can run, in order to perform in real time the speech synthesis function that supplies synthesized speech to the subscriber access means; data processing means connected between the central processing means and the packet network access unit and responsible for data input/output with the outside; and speech synthesis means that converts text into speech, synthesizing digital speech under the control of the central processing means for transmission to the subscriber access means.

The text-to-speech service control method, applied to the above text-to-speech service apparatus, comprises: a first step in which, when a call arrives from the subscriber access means, the central processing means connects to an information provider through the packet network access unit and receives data; a second step of analyzing the received data and, if the current state is not service release, determining whether it is a menu selection stage; a third step of, depending on whether it is the menu selection stage, either receiving menu data and generating synthesized speech, or detecting the content information and generating and transmitting the corresponding synthesized speech; and a fourth step of receiving the DTMF signal pressed by the user and transmitting, through the data processing means, a command to the information provider to advance to the next state.

That is, with the present invention configured as above, a large amount of information can be served easily and quickly, and because the information database consists of text files rather than speech data, the storage medium it requires is also smaller.

Meanwhile, as the service area of the present invention, text-to-speech conversion is being tested in connection with HiTEL, one of several information providers in Korea. HiTEL is the most widely used information and communication service accessed over the ordinary telephone network; its users can obtain and share a great deal of information, including news, weather, stock information, culture and lifestyle, and club activities. The service area of this apparatus is of course not limited to HiTEL; any value-added network operator can be served.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is an overall configuration diagram of a communication network system to which the present invention is applied. It consists of a telephone network access block (1) connected to the public telephone network that serves ordinary telephone subscribers, the text-to-speech block (2) of the present invention connected to the telephone network access block (1), and a packet network access block (3) connected between the text-to-speech block (2) and the public data network that serves information providers.

As shown in the figure, the telephone network access block (1) handles the telephone interface for users with ordinary electronic telephones. It connects to the electronic switching system and, each time a call comes in, allocates a channel and connects it to the text-to-speech block (2).

The text-to-speech block (2) is the part to which the present invention applies. It provides three functions: exchanging connection signals with the telephone network access block (1), running the speech synthesis algorithm to convert text into speech, and exchanging data with the information provider through the packet network access block (3).

The packet network access block (3) connects the text-to-speech block (2) to the information provider, performing the X.25 setup, connection, and release that link the information database over the public data network.

FIG. 2 is a hardware configuration diagram of the text-to-speech service apparatus according to the present invention, which is divided into a subscriber access unit (10), a central processing unit (20), a data processing unit (30), and a speech synthesis unit (40).

The subscriber access unit (10) consists of a codec for PCM (Pulse Code Modulation) signal conversion, a channel selector that sets the access channel, and a DTMF converter that detects DTMF (Dual Tone Multifrequency) signals and converts them into 8 bits. Its functions are as follows.

First, it detects ring and hook-on signals and sends the hook-on signal to the telephone network access block (1) to request call disconnection. Second, it converts synthesized analog speech into a PCM (2.048 MHz) signal and transmits it to the telephone network access block (1). Third, it sets up the telephone access channel. Fourth, it converts DTMF signals.
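The DTMF conversion function can be illustrated with a minimal sketch. The row and column frequencies are the standard DTMF keypad tones; the function names and the choice of ASCII as the 8-bit code are assumptions made for illustration, since the description only states that the converter outputs 8 bits.

```python
# Standard DTMF keypad: each key is one row tone plus one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def key_from_tones(row_hz: int, col_hz: int) -> str:
    """Return the keypad symbol for a detected (row, column) tone pair."""
    return KEYPAD[ROW_HZ.index(row_hz)][COL_HZ.index(col_hz)]


def key_to_byte(key: str) -> int:
    """Encode a key as one 8-bit value.

    ASCII is an assumed encoding; the source does not specify the code.
    """
    if len(key) != 1 or not any(key in row for row in KEYPAD):
        raise ValueError(f"not a DTMF key: {key!r}")
    return ord(key)
```

For example, `key_to_byte(key_from_tones(770, 1336))` yields the byte for the key "5" under the assumed ASCII encoding.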

The central processing unit (20) consists of a CPU, a floating-point unit (FPU), memory devices (ROM, SRAM, DRAM, DPRAM), and a device controller. It monitors the entire hardware and is where the system program and application programs run. In particular, it is configured so that high-speed devices and a multitasking operating system (OS) can run, in order to perform speech synthesis in real time.

In detail, it consists of an MC68030 CPU (33 MHz), an MC68882 FPU (33 MHz), 1 Mbyte of ROM, 2 Mbytes of SRAM, 16 Mbytes of DRAM, 4K words of DPRAM, and a device controller.

The ROM holds the system control functions, the SRAM holds the speech synthesis program, which requires fast execution, and the DRAM holds the synthesis database, which requires a large amount of memory. The DPRAM is used to exchange data with the speech synthesis unit (40).

The data processing unit (30) consists of serial I/O and Ethernet I/O and handles data input/output between the central processing unit (20) and the outside. The serial I/O provides three I/O ports using an MC68901 MFP and a Z8530 SCC and is used to exchange data with the packet network access block (3). That is, a DTMF signal pressed by a telephone subscriber is converted into 8-bit data and sent through the serial I/O to the packet network access block (3), and data sent by the information provider returns to the central processing unit (20) through the data processing unit (30) connected to the packet network access block (3).

The Ethernet I/O, built around an AM7990 LANCE, is used for debugging the system and for downloading the synthesis database, which is about 14 Mbytes in size, and the application programs.

The speech synthesis unit (40) converts text into speech and consists of a TMS320C30 DSP (Digital Signal Processor) (33 MHz), 2 Mbytes of SRAM, 16 Kbytes of ROM, a TLS320C46 AIC, and a device controller. The speech synthesis algorithm runs from ROM and SRAM and performs the signal-processing part of synthesis using the synthesis parameters that the central processing unit (20) stores in the DPRAM. The synthesized digital speech is sent to the codec of the subscriber access unit (10) every 0.0625 msec through the AIC, which has 14-bit resolution.

FIG. 3 is an overall software configuration diagram of the text-to-speech service apparatus according to the present invention. The software can be divided into the TMS320C30 DSP (Digital Signal Processor) section (101), which performs PSOLA synthesis, and the VRTX32 OS section (102), which runs on the central processing unit (20).

The VRTX32 OS section (102) running on the central processing unit (20) contains a DPRAM interface driver (110), a serial interface driver (120), an Ethernet interface driver (130), a telephone interface driver (140), an interrupt service routine (150), task management (160), system control (170), and synthesis parameter generation (180).

The DPRAM interface driver (110) exchanges synthesis parameters with the DSP section (101) through the DPRAM. It uses the two highest addresses of the 4K-word DPRAM area, 0xFFE and 0xFFF. Address 0xFFE is used when the central processing unit (20) interrupts the speech synthesis unit (40) to transfer the parameters needed for synthesis; address 0xFFF is used when the speech synthesis unit (40) interrupts the central processing unit (20) to hand over results produced partway through synthesis. The rest of the DPRAM area stores the synthesis parameters being exchanged.
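The mailbox arrangement can be sketched as a small simulation. This is an illustrative model only: the class and method names are hypothetical, and the convention that the mailbox word carries the parameter count is an assumption not stated in the description.

```python
DPRAM_WORDS = 0x1000          # 4K words, addresses 0x000..0xFFF
MBOX_TO_DSP = 0xFFE           # CPU writes here to interrupt the speech synthesis unit
MBOX_TO_CPU = 0xFFF           # DSP writes here to interrupt the central processing unit
PARAM_AREA_WORDS = MBOX_TO_DSP  # 0x000..0xFFD hold the parameters themselves


class DualPortRam:
    """Toy model of the shared memory; a list stands in for the hardware."""

    def __init__(self):
        self.words = [0] * DPRAM_WORDS
        self.interrupts = []   # log of raised interrupts, for illustration

    def write(self, addr, value):
        self.words[addr] = value
        if addr == MBOX_TO_DSP:
            self.interrupts.append("dsp")
        elif addr == MBOX_TO_CPU:
            self.interrupts.append("cpu")

    def send_parameters(self, params):
        """CPU side: copy parameters into the shared area, then signal the DSP."""
        if len(params) > PARAM_AREA_WORDS:
            raise ValueError("parameter block does not fit in DPRAM")
        for offset, word in enumerate(params):
            self.write(offset, word)
        # Assumed convention: the mailbox word carries the parameter count.
        self.write(MBOX_TO_DSP, len(params))
```

Writing the data first and the mailbox word last means the receiving side is interrupted only after the whole parameter block is in place.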

The serial I/O interface driver (120) manages data sent and received through the serial ports. Transmission is handled by polling because little data is sent, whereas reception is handled by interrupts because large amounts of data arrive at once from the information provider. Both directions are configured to operate at 9600 bps.
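The asymmetric design (polled transmit, interrupt-driven receive) can be sketched as follows. The class, its attributes, and the use of an in-memory queue are hypothetical stand-ins for the real port hardware.

```python
from collections import deque


class SerialPort:
    """Toy model: polled transmit, interrupt-driven receive."""

    def __init__(self):
        self.tx_ready = True      # stands in for a "transmitter empty" status bit
        self.sent = []            # bytes handed to the transmitter
        self.rx_queue = deque()   # filled by the receive interrupt handler

    def send_byte(self, byte):
        # Polling: busy-wait on the status flag, acceptable for a few bytes.
        while not self.tx_ready:
            pass
        self.sent.append(byte & 0xFF)

    def on_rx_interrupt(self, byte):
        # Interrupt path: only queue the byte so bursts are not lost.
        self.rx_queue.append(byte & 0xFF)

    def read_available(self):
        """Task-level code drains whatever the interrupt handler has queued."""
        data = bytes(self.rx_queue)
        self.rx_queue.clear()
        return data
```

Keeping the interrupt handler down to a single queue append is what lets a burst of provider data be absorbed while the application task is busy elsewhere.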

The Ethernet interface driver (130) is used for debugging the system and for downloading the synthesis database, which occupies about 14 Mbytes, and the application programs from a workstation into the memory of the central processing unit (20). Its transfer rate is in the 10 Mbps class.

The telephone interface driver (140) processes the hook on/off, ring, and DTMF signals coming from the subscriber access unit (10). When an interrupt arrives, it reads a fixed address and determines which of the three signals raised the interrupt before handling it.
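A minimal sketch of this read-then-dispatch pattern is given below. The bit layout of the status word and all names are assumptions for illustration; the description does not give the register format.

```python
from enum import IntFlag


class Source(IntFlag):
    """Assumed bit layout of the status word read at the fixed address."""
    HOOK = 0x01
    RING = 0x02
    DTMF = 0x04


def dispatch(status, handlers):
    """Call the handler for each signal flagged in `status`.

    Returns the names of the sources handled, in a fixed priority order.
    """
    handled = []
    for source in (Source.HOOK, Source.RING, Source.DTMF):
        if status & source:
            handlers[source]()
            handled.append(source.name)
    return handled
```

In use, `handlers` would map each `Source` member to the routine that processes that signal, for example starting call setup on `Source.RING`.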

The interrupt service routine (150) manages the interface drivers (110-140) and the timer interrupt that controls the system. When an interrupt occurs, it notifies the corresponding application program through a mailbox or a counter so that the program can respond.

Task management (160) executes the tasks created by system control (170) in order of their priority and is handled by the VRTX32 OS kernel.

System control (170) performs overall management of the system, including creating and deleting tasks, managing the interrupt service routine (150), managing mailboxes and queues, and managing variables.

Synthesis unit generation (180) converts the text received from the serial I/O of the data processing unit (30) into the parameters needed for speech synthesis, and the generated parameters are sent to the speech synthesis unit (40) through the DPRAM.

The DSP section (101), for its part, contains a DPRAM driver (190), a D/A conversion function (200), an interrupt service routine (210), and a PSOLA synthesizer (220), and performs the actual signal-processing speech synthesis.

The DPRAM driver (190) exchanges synthesis parameters and intermediate results with the central processing unit (20). The D/A converter (200) is driven by a timer interrupt every 0.625 msec and converts the synthesized digital speech to analog for output. The interrupt service routine (210) handles the interrupts that occur in the DSP section (101) and links them to the PSOLA synthesizer (220). The PSOLA synthesizer (220) produces synthesized speech by concatenating waveforms in the time domain, using the parameters received through the DPRAM together with prosody rules and phonetic knowledge; the generated waveform is passed to the D/A converter (200) through the interrupt service routine (210).

FIG. 4 is a structural diagram of the speech synthesis parameters according to the present invention. N is the number of synthesis units, and its value is 1228. Each synthesis unit in the synthesis database is divided into six elements.

The synthesis unit start position (310) gives the position in the database where the synthesis unit begins. The segment symbols and lengths (320) give the symbols that mark acoustic divisions within the unit and the length of each symbol within the unit. The pitch start position (330) gives the position where the pitch data begins, and the pitch data (340) stores the pitch values of the unit. The sample start position (350) is where the actual waveform begins, and the sample data (360) stores the speech waveform.
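The six elements can be pictured as one record per synthesis unit. The sketch below is an assumed in-memory layout for illustration: the field names and Python types are hypothetical, and the description does not give field widths or the on-disk format.

```python
from dataclasses import dataclass
from typing import List, Tuple

N_UNITS = 1228  # number of synthesis units stated in the description


@dataclass
class SynthesisUnit:
    unit_start: int                   # (310) start of the unit in the database
    segments: List[Tuple[str, int]]   # (320) (segment symbol, length) pairs
    pitch_start: int                  # (330) start of the pitch data
    pitch_data: List[int]             # (340) pitch values of the unit
    sample_start: int                 # (350) start of the waveform
    samples: List[int]                # (360) speech waveform samples

    def segment_total(self) -> int:
        """Sum of the segment lengths within the unit."""
        return sum(length for _, length in self.segments)
```

Storing start positions alongside the data lets a unit be located by offset without scanning the variable-length pitch and sample arrays.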

FIG. 5 is a flowchart of speech synthesis processing according to the present invention. It is divided into the language processing and synthesis unit generation performed in the central processing unit (20) and the speech synthesis performed in the speech synthesis unit (40).

The preprocessor (410) converts numerals and abbreviations into Korean text. The syntactic analyzer (420) segments a sentence through morphological analysis and then generates syntactic information. The prosody generator (430) generates information such as 13 kinds of prosodic information, durations, and pitch. The grapheme-to-phoneme converter (440) uses 26 phonological rules and an exception dictionary to generate a string of phonetic symbols that represents the text as it is pronounced.

The synthesis unit generator (450) converts each phonetic symbol string into synthesis units suitable for synthesis. The synthesis unit combiner (460) fetches the parameters needed for synthesis from the database and concatenates them in order, producing the time-domain synthesis parameters the synthesizer needs. The PSOLA synthesizer (470), which runs in the speech synthesis unit (40), uses the received parameters and phonetic knowledge to adjust the pitch, duration, and energy of the time-domain waveform by rule, producing the synthesized speech.
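The time-domain adjustment performed by a PSOLA synthesizer can be illustrated with a heavily simplified sketch. It covers pitch modification only (not duration or energy), assumes a constant pitch period with known pitch marks, and is a generic textbook-style overlap-add rather than the implementation described here; all names are hypothetical.

```python
import math


def hann(length):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]


def psola_pitch_shift(signal, marks, period, pitch_factor):
    """Re-space two-period, Hann-windowed grains to change the pitch.

    signal: list of samples; marks: ascending pitch-mark indices spaced
    `period` apart; pitch_factor > 1 raises the pitch.
    """
    window = hann(2 * period + 1)
    new_period = max(1, round(period / pitch_factor))
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)   # accumulated window weight, for gain correction
    t = marks[0]
    while t <= marks[-1]:
        src = min(marks, key=lambda m: abs(m - t))   # nearest analysis mark
        for k in range(-period, period + 1):
            i, j = src + k, t + k
            if 0 <= i < len(signal) and 0 <= j < len(out):
                w = window[k + period]
                out[j] += signal[i] * w
                norm[j] += w
        t += new_period
    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

With `pitch_factor` equal to 1 the grains land back on their own pitch marks and the region between the first and last mark is reproduced unchanged, which is a convenient sanity check for the overlap-add.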

FIG. 6 is an overall control flowchart of the text-to-speech service according to the present invention.

As shown in the figure, when a call arrives from the subscriber access unit (10), the central processing unit (20) attempts the corresponding call connection (510). Once the call is connected, the central processing unit (20) attempts to connect to the information provider (515). Once that connection is established, it receives the data coming from the information provider (520) and analyzes the data to determine which stage the session is currently in (525).

If the analysis shows that the session is at the stage asking whether to end the service, an exit message is played to confirm whether to release the session (530). If a DTMF signal corresponding to exit then arrives, the connection to the information provider is released and the service ends (535); otherwise, the apparatus checks whether the current state is a menu selection stage (540).

If the current state is a menu stage, the menu data is received and each menu item and guidance message is passed to the speech synthesis unit (40) (545), and synthesized speech is generated by applying Korean-language rules and the speech synthesis algorithm (550). The synthesized result is sent to the codec of the subscriber access unit (10) so that the user can hear it over the telephone (555).

If, on the other hand, the current state is not a menu stage but the final retrieved information the user wants, content detection is performed to locate the information section (560), the corresponding synthesized speech is generated (565), and the synthesized speech is sent to the codec (570).

After synthesized speech has been generated and transmitted for the data sent by the information provider in this way, the DTMF signal pressed by the user is received (575), and a command to advance to the next state is sent to the information provider through the data processing unit (30) (580). The process then returns to receiving data from the information provider (520).
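The control flow of steps 515 to 580 can be sketched as a loop. This is one reading of the flowchart under stated assumptions: the `Message` record, the `provider` interface (`connect`, `receive`, `send`, `disconnect`), the exit key, and the prompt wording are all hypothetical, and call setup (510) is taken to have already completed.

```python
from dataclasses import dataclass

EXIT_KEY = "#"                      # assumed key meaning "end the service"
EXIT_PROMPT = "End the service?"    # assumed wording of the exit message


@dataclass
class Message:
    """Result of data analysis (525) for one block of provider data."""
    text: str
    is_menu: bool = False
    asks_exit: bool = False


def extract_info(text):
    """Stand-in for content detection (560); here it only trims whitespace."""
    return text.strip()


def run_session(provider, read_dtmf, synthesize, play):
    provider.connect()                                # 515
    while True:
        msg = provider.receive()                      # 520, 525
        if msg.asks_exit:
            play(synthesize(EXIT_PROMPT))             # 530
            if read_dtmf() == EXIT_KEY:
                provider.disconnect()                 # 535
                return
        if msg.is_menu:                               # 540
            play(synthesize(msg.text))                # 545-555
        else:
            play(synthesize(extract_info(msg.text)))  # 560-570
        provider.send(read_dtmf())                    # 575, 580
```

Passing `read_dtmf`, `synthesize`, and `play` in as callables keeps the loop independent of the telephone and DSP hardware, so it can be exercised with stubs.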

The present invention, configured and carried out as described above, provides the following particular effects.

First, because text information is converted into speech and provided to telephone subscribers, anyone can obtain information quickly regardless of location.

Second, the range of services available from information providers is extended beyond users who have terminals to telephone users as well.

Third, because users can listen to the information they want, they can do other work at the same time.

Fourth, because the information database consists only of text, information providers can reduce cost and time compared with building the database from recordings.

Fifth, information services can also be provided to visually impaired people.

Sixth, new speech-based information services can be created.

Claims

Text-to-speech service applied to a communication network system comprising a telephone network connection unit 1 connected to a telephone network connected to a public telephone accommodating a general telephone subscriber and a packet network connection unit 3 connected between a public data network accommodating an information provider In the apparatus, detecting a ring signal and hook-on signal to send a hook-on signal to the telephone network connection unit (1) to request call disconnection, converts the synthesized analog voice into a PCM (Pulse Code Modulation) signal to the telephone network connection ( Subscriber access means 10 for transmitting to 1); Controlling the operation of the overall system, the central device configured to operate a high-speed device and a multitasking OS (Operating System) to perform a voice synthesis function for providing a synthesized voice to the subscriber access means 10 in real time Processing means 20; Data processing means (30) connected to the central processing means (20) and the packet network connection part (3) to perform data input / output with the outside; And the voice synthesis unit 40 converts a text into a voice, and synthesizes a digital voice under the control of the central processing unit 20 and transmits the synthesized voice to the subscriber access unit 10. Character-to-speech service device, characterized in that.

The method of claim 1, wherein the subscriber access terminal 10 comprises: a codec for PCM signal conversion; A channel selector for setting a connection channel; And a DTMF converter for detecting a dual tone multifrequency (DTMF) signal and converting the signal to a predetermined bit.

The apparatus of claim 1, wherein the central processing means (20) comprises a central processing unit (CPU), a floating-point unit (FPU), memory devices (ROM, SRAM, DRAM, DPRAM), and a device controller.

The apparatus of claim 3, wherein the memory devices comprise: a ROM holding system control functions; a static RAM (SRAM) holding the speech synthesis program, which requires fast execution; a dynamic RAM (DRAM) holding the synthesis database, which requires a large memory; and a dual-port RAM (DPRAM) used to exchange data with the voice synthesizing means (40).

The apparatus of claim 1, wherein the data processing means (30) comprises: a serial input/output unit that converts DTMF signals pressed by telephone subscribers into predetermined bit data and transmits the data to the packet network connection unit (3), and that passes data transmitted from an information provider to the central processing means (20); and an Ethernet input/output unit used for debugging the system and for downloading the synthesis database and application programs.

The apparatus of claim 1, wherein the voice synthesizing means (40) comprises a digital signal processor (DSP), a static RAM (SRAM), a ROM, an analog interface circuit (AIC), and a device controller.

A text-to-speech service control method applied to a text-to-speech service apparatus having subscriber access means (10) connected to an external telephone network connection unit (1), central processing means (20) connected to the subscriber access means (10), and data processing means (30) connected between the central processing means (20) and an external packet network connection unit (3), the method comprising: a first step in which, when a call arrives from the subscriber access means (10), the central processing means (20) connects to an information provider through the packet network connection unit (3) and receives data; a second step of analyzing the received data and, unless the current state is service release, determining whether the current state is a menu selection step; a third step of, depending on whether the current state is the menu selection step, either receiving menu data and generating synthesized speech, or detecting the information data and generating and transmitting the corresponding synthesized speech; and a fourth step of receiving a DTMF signal pressed by the user and transmitting, through the data processing means (30), a command to the information provider to advance to the next state.
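The four steps of the control method above can be read as a simple session loop. The sketch below is one possible reading under stated assumptions: the message labels (`"menu"`, `"info"`, `"release"`), the `ProviderMessage` type, the callback names, and the trivial `extract_information` helper are all hypothetical and stand in for the packet-network, DTMF, and synthesis hardware the claims describe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProviderMessage:
    kind: str  # hypothetical labels: "menu", "info", or "release"
    text: str


def extract_information(text: str) -> str:
    """Hypothetical stand-in for information detection: trims the payload."""
    return text.strip()


def run_session(
    receive: Callable[[], ProviderMessage],
    send_command: Callable[[str], None],
    read_dtmf: Callable[[], str],
    speak: Callable[[str], None],
) -> None:
    """Drive one call until the provider signals service release."""
    while True:
        message = receive()                  # step 1: receive provider data
        if message.kind == "release":        # step 2: service release ends the loop
            break
        if message.kind == "menu":           # step 3: menu text goes to synthesis
            speak(message.text)
        else:                                # step 3: otherwise detect the information
            speak(extract_information(message.text))
        send_command(read_dtmf())            # step 4: forward the user's key press
```

Passing the I/O as callbacks keeps the loop testable without telephone or packet-network hardware; a real system would bind them to the subscriber access, data processing, and voice synthesizing means.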

The method of claim 7, wherein the second step comprises: a fifth step of analyzing the current data to determine the current stage and, if termination of the service is requested, sending a termination message to confirm the release; and a sixth step of disconnecting from the information provider when, as a result of the confirmation, a DTMF signal corresponding to termination is received from the subscriber, and otherwise analyzing whether the current state is the menu selection step.

The method of claim 7, wherein the third step comprises: a fifth step of, when analysis of the menu selection step shows that the current state is the menu selection step, receiving menu data so that each menu and a guide message can be sent to the voice synthesizing means (40); a sixth step of, after the fifth step, applying Korean rules and a speech synthesis algorithm to generate synthesized speech and transmitting it to the subscriber access means (10) so that the user can hear it over the telephone; a seventh step of, when analysis of the menu selection step shows that the current state is not a menu step but the final information the user was searching for, performing information data detection to extract the information portion; and an eighth step of, after the seventh step, generating the corresponding synthesized speech and transmitting it to the subscriber access means (10).

The method of claim 7 or 9, wherein generating the synthesized speech is divided into a language processing and synthesis unit generation process performed by the central processing means (20) and a speech synthesis process performed by the voice synthesizing means (40).

The method of claim 10, wherein the language processing and synthesis unit generation process comprises: a first process of converting numbers and abbreviations into Korean text, dividing sentences through morphological analysis, and generating syntactic information; a second process of generating predetermined prosodic information such as duration and pitch, and generating a phonetic symbol string using predetermined phonological rules and an exception dictionary; and a third process of converting each phonetic symbol string into synthesis units suitable for synthesis and concatenating, in order, the parameters needed for synthesis from a database to generate time-domain synthesis parameters.
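To make the number-conversion part of the first process concrete, here is a minimal sketch that spells out digit strings using Sino-Korean numerals. It is an illustrative assumption, not the patent's rule set: the function names are invented, only values below 10^8 are handled, and readings that need the exception dictionary mentioned above (for example month names such as 10월) are deliberately left out.

```python
import re

DIGITS = "영일이삼사오육칠팔구"   # Sino-Korean readings of 0..9
UNITS = ("천", "백", "십", "")     # thousands, hundreds, tens, ones


def read_group(n: int) -> str:
    """Read 1..9999; a leading 일 is dropped before 천, 백, and 십."""
    out = ""
    for digit_char, unit in zip(f"{n:04d}", UNITS):
        digit = int(digit_char)
        if digit == 0:
            continue
        out += unit if (digit == 1 and unit) else DIGITS[digit] + unit
    return out


def read_number(n: int) -> str:
    """Read 0..99,999,999 as Sino-Korean text (sketch; larger units omitted)."""
    if not 0 <= n < 10**8:
        raise ValueError("sketch only covers values below 10**8")
    if n == 0:
        return DIGITS[0]
    high, low = divmod(n, 10_000)
    out = ""
    if high:
        out += ("" if high == 1 else read_group(high)) + "만"
    if low:
        out += read_group(low)
    return out


def expand_numbers(text: str) -> str:
    """Replace every run of digits in the text with its spelled-out reading."""
    return re.sub(r"\d+", lambda m: read_number(int(m.group())), text)
```

With this sketch, `expand_numbers("1995년")` yields `천구백구십오년`; a production front end would follow this with abbreviation expansion and the morphological analysis the claim describes.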

The method of claim 11, wherein the speech synthesis process generates synthesized speech by using the received time-domain synthesis parameters and phonetic knowledge to adjust, by rule, the pitch, duration, and energy of the waveform in the time domain.

The method of claim 11, wherein the data structure of the synthesis parameters comprises: a synthesis unit start position field (310) indicating the start position of the synthesis unit in the database; a segment symbol and length field (320) holding the symbols that mark acoustic divisions within the synthesis unit and the length of each within the unit; a pitch start position field (330) indicating the start position of the pitch data; a pitch data field (340) storing the pitch values of the synthesis unit; a sample start position field (350) indicating the start position of the actual waveform; and a sample data field (360) storing the speech waveform.
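The six fields of the synthesis parameter structure above could be modelled in memory roughly as follows. This is a sketch only: the class and attribute names are invented, the claim does not give field widths or units, and measuring segment lengths in samples is an assumption made so the example can compute segment boundaries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    symbol: str   # symbol marking an acoustic division within the unit
    length: int   # length of the division; samples are an assumed unit


@dataclass
class SynthesisUnit:
    unit_start: int           # field 310: start of the unit in the database
    segments: List[Segment]   # field 320: segment symbols and lengths
    pitch_start: int          # field 330: start position of the pitch data
    pitch_data: List[int]     # field 340: pitch values of the unit
    sample_start: int         # field 350: start position of the waveform
    samples: List[int]        # field 360: speech waveform samples

    def segment_bounds(self) -> List[Tuple[str, int, int]]:
        """Return (symbol, start, end) offsets for each segment in the waveform."""
        bounds = []
        position = 0
        for segment in self.segments:
            bounds.append((segment.symbol, position, position + segment.length))
            position += segment.length
        return bounds
```

Keeping the start-position fields alongside the data mirrors the claim's layout, where offsets let a synthesizer jump directly to the pitch and waveform data of a unit.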