KR100287905B1 - Real time voice playback system and voice playback method using the same

Info

Publication number: KR100287905B1
Application number: KR1019980062869A
Authority: KR
Inventors: 최준용
Original assignee: 서평원; 엘지정보통신주식회사
Priority date: 1998-12-31
Filing date: 1998-12-31
Publication date: 2001-05-02
Also published as: KR20000046192A

Abstract

The present invention relates to a real-time voice playback system and a voice playback method using the same, and in particular to a system and method suited to performing voice playback and speech synthesis (Text To Speech: TTS) in real time when a voice guidance service is provided according to a speech recognition result on a voice processing board that integrates speech recognition, speech synthesis, and voice playback. The real-time voice playback system comprises a voice processing unit consisting of a speech recognition module and a speech synthesis module; a voice playback unit capable of playing back voice data; a memory unit used by the speech recognition module, the speech synthesis module, and the voice playback unit; and a control unit that controls the voice processing unit, the memory unit, and the voice playback unit.

Description

Real time voice playback system and voice playback method using the same

The present invention relates to a real-time voice playback system and a voice playback method using the same, and more particularly to a system and method suited to processing voice playback and speech synthesis in real time when a voice guidance service is provided according to a speech recognition result on a voice processing board that integrates speech recognition, speech synthesis (Text To Speech: TTS), and voice playback.

Speech recognition and speech synthesis (TTS) for value-added communication systems, such as an automatic voice response service (Audio Response Service, hereinafter ARS) system, have conventionally been implemented by providing a dedicated speech recognition board, a speech synthesis board, and a voice playback board separately and having each board handle its own service.

Each of these independent dedicated boards contains a processor loaded with a program suited to its role (speech recognition, speech synthesis, or voice playback), and each executes its service under the control of a CPU (central processing unit) board that controls the boards.

With this approach, even when only one channel of speech recognition and speech synthesis service is needed, a separate board is required for each service and each board must be controlled, so system resources are inevitably wasted.

In addition, developing each service so that it runs in real time requires a high-speed data bus for data and message communication between the boards.

A conventional voice processing system for providing such a speech recognition guidance service will now be described with reference to the accompanying drawings.

FIG. 1 is a block diagram of a conventional voice processing system.

As shown in FIG. 1, the conventional voice processing system uses a dedicated board for each function in order to provide the speech recognition guidance service. It comprises a dedicated speech recognition board (2) that performs speech recognition, a dedicated speech synthesis board (3) that receives an arbitrary sentence and synthesizes speech from it, a dedicated voice playback board (4) that plays back voice, and a CPU board (1) that controls the dedicated speech recognition board (2), the dedicated speech synthesis board (3), and the dedicated voice playback board (4). The CPU board (1) and the three dedicated boards (2, 3, 4) are housed in an automatic voice response system (Audio Response System) (10).

The automatic voice response system (10) further includes a trunk card (7) for interfacing with the networks that provide communication services to users: the public switched telephone network (PSTN), a personal communication system (PCS) network, and a digital communication system (DCN) network.

In this arrangement, data pre-recorded by an announcer or the like is played back in response to the result recognized by the dedicated speech recognition board (2). The pre-recorded data is stored on the system's hard disk (HDD), and the voice is played back by the dedicated voice playback board (4) and delivered to users.

Reference numeral 6, not otherwise described, denotes a control signal and message transmission bus.

In this conventional voice processing system, after a service call arrives, the dedicated speech recognition board (2) receives the voice as a digital signal through the trunk card (7) connected to the outside, performs speech recognition with a speech recognition algorithm, and passes the result to the CPU board (1), on which the service scenario is programmed.

The CPU board (1) proceeds through the guidance scenario according to that result: it sends the pre-recorded voice file to the dedicated voice playback board (4) over the data bus (5), and the dedicated voice playback board (4), on receiving the data, plays it back in real time.

When voice is played back using the speech synthesis function, either a file synthesized in advance is played back or, even when real-time processing is requested, playback on the dedicated voice playback board (3) can begin only after synthesis of the entire input text has finished.

Such a conventional voice processing system has the following problems.

First, adding a new voice prompt to the service requires storing a voice data file recorded in an announcer's voice on the system's hard disk and also modifying the program that controls it, which is inconvenient and costly. In other words, because no speech synthesis function is applied, every new voice prompt incurs a separate cost.

Second, even where a speech synthesis service is applied, it either does not run in real time or requires a dedicated speech synthesis board, which wastes both cost and system resources.

The present invention has been devised in view of the problems of the prior art described above. Its object is to provide a real-time voice playback system, and a voice playback method using the same, in which the board is configured so that speech recognition, speech synthesis, and voice playback can be performed simultaneously and in parallel, so that the speech recognition service and the speech synthesis service can be provided with voice playback and speech synthesis processed in real time during a voice guidance service.

According to one aspect of the present invention for achieving the above object, the system comprises a voice processing unit consisting of a speech recognition module and a speech synthesis module; a voice playback unit capable of playing back voice data; a memory unit used by the speech recognition module, the speech synthesis module, and the voice playback unit; and a control unit that controls the voice processing unit, the memory unit, and the voice playback unit.

Preferably, the memory unit comprises a recognition memory unit, which holds the recognition parameters needed for recognition and the other memory areas needed to run the recognition program; a TTS memory unit, which holds the database for speech synthesis and the other memory areas needed to run the TTS program; and a shared memory unit, which the speech synthesis module and the voice playback unit use for real-time speech synthesis and playback.

Preferably, the voice processing unit consists of a processor loaded with the speech recognition and speech synthesis programs.

Preferably, the shared memory consists of a first synthesized data output area and a second synthesized data output area, and the voice playback unit reads the first and second synthesized data output areas separately to play back the voice.

Preferably, the voice processing system further includes an external interface that is connected to an arbitrary communication channel, can transmit and receive voice, handles incoming calls, and receives scenario messages and text.

According to another aspect of the present invention, the method comprises the steps of: preparing for speech recognition; determining whether the input voice calls for playback of a pre-recorded voice; playing back the pre-recorded voice file if it does; preparing for speech synthesis and preparing the text to be synthesized and served if it does not; synthesizing speech from the text to be served; copying the synthesized speech to one memory area; determining whether more text remains to be synthesized; if more text remains, playing back the synthesized speech in the one memory area while simultaneously synthesizing the remaining text into another memory area; and playing back the synthesized speech in the other memory area as soon as playback of the one memory area ends.

According to the present invention as described above, a speech-recognition ARS system can deliver any voice guidance service that follows a recognition result either through pre-recorded voice or, selectively, through TTS in real time.

FIG. 1 is a block diagram of a conventional voice processing system.

FIG. 2 is a block diagram of the real-time voice playback system of the present invention.

FIG. 3 is a flowchart of a voice playback method using the real-time voice playback system of the present invention.

Explanation of reference numerals for the main parts of the drawings

20: CPU board
30: dedicated voice processing board
31: external interface unit
32: voice processing unit
33: control unit
34: recognition memory unit
35: TTS memory unit
36: shared memory unit
37: voice playback unit
38: data bus
39: control signal and message bus
40: subscriber board

Hereinafter, the configuration and operation of a preferred embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 2 is a block diagram of the real-time voice playback system of the present invention.

The real-time voice playback system of the present invention is built into an automatic voice response system (Audio Response System) (40). In detail, it comprises a voice processing unit (32) consisting of a speech recognition module (32a) and a speech synthesis module (32b); a recognition memory unit (34) used by the speech recognition module (32a); a TTS memory unit (35) used by the speech synthesis module (32b); a voice playback unit (37) capable of playing back digital voice data; a shared memory unit (36) shared by the voice playback unit (37) and the speech synthesis module (32b); and a control unit (33) that controls the voice processing unit (32), the recognition memory unit (34), the TTS memory unit (35), the shared memory unit (36), and the voice playback unit (37).

So that speech recognition, speech synthesis, and voice playback can be performed simultaneously and in parallel, the voice processing unit (32), consisting of the speech recognition module (32a) and the speech synthesis module (32b), and the voice playback unit (37) are all placed on the dedicated voice processing board (30), and the internal memory of the dedicated voice processing board (30) is logically divided into the recognition memory unit (34), the TTS memory unit (35), and the shared memory unit (36).

The voice processing unit (32) consists of a processor loaded with the speech recognition and speech synthesis programs. A message processing module (not shown), the speech recognition module (32a), and the speech synthesis module (32b) are programmed on it so that it can perform speech recognition or speech synthesis according to the message received, which lets a single board provide either of the two services selectively, depending on the message.
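The message-driven choice between the two services can be pictured as a small dispatcher. The sketch below is only an illustration under stated assumptions: the message kinds `RECOGNIZE` and `SYNTHESIZE`, the `Message` type, and the two stub handlers are hypothetical names that do not come from the patent, which does not describe the message format.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    # Hypothetical message kinds; the patent does not name them.
    RECOGNIZE = auto()
    SYNTHESIZE = auto()


@dataclass
class Message:
    kind: Kind
    payload: bytes  # received voice samples, or text to synthesize


def recognize(samples: bytes) -> str:
    # Stand-in for the speech recognition module (32a).
    return f"recognized {len(samples)} bytes"


def synthesize(text: bytes) -> str:
    # Stand-in for the speech synthesis module (32b).
    return f"synthesized {text.decode('utf-8')!r}"


def handle(message: Message) -> str:
    """Route a message to one of the two modules on the same processor."""
    if message.kind is Kind.RECOGNIZE:
        return recognize(message.payload)
    if message.kind is Kind.SYNTHESIZE:
        return synthesize(message.payload)
    raise ValueError(f"unknown message kind: {message.kind}")
```

Because both handlers live in one process, no inter-board bus transfer is needed to switch services, which is the point the paragraph above makes about a single board.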

The recognition memory unit (34) holds the recognition parameters needed for recognition and the other memory areas needed to run the recognition program.

The TTS memory unit (35) holds the database for speech synthesis and the other memory areas needed to run the TTS program.

The shared memory unit (36) consists of a first synthesized data output area (36a) and a second synthesized data output area (36b) for real-time speech synthesis and playback.

The two areas (36a, 36b) of the shared memory unit (36) divided in this way can each be read separately by the voice playback unit (37) for voice playback.
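The logical partition of one board memory into a recognition region, a TTS region, and a shared region with two output areas could be sketched as follows. This is a minimal illustration, not the patented layout: the sizes and the `BoardMemory` class are assumptions, since the patent gives no addresses or sizes.

```python
# Hypothetical sizes in bytes; the patent gives no figures.
RECOGNITION_SIZE = 4096
TTS_SIZE = 4096
AREA_SIZE = 1024  # size of each synthesized data output area


class BoardMemory:
    """One physical memory block split into logical regions."""

    def __init__(self) -> None:
        total = RECOGNITION_SIZE + TTS_SIZE + 2 * AREA_SIZE
        self._raw = bytearray(total)
        view = memoryview(self._raw)
        offset = 0
        self.recognition = view[offset:offset + RECOGNITION_SIZE]  # unit 34
        offset += RECOGNITION_SIZE
        self.tts = view[offset:offset + TTS_SIZE]  # unit 35
        offset += TTS_SIZE
        # Shared memory unit (36): two independently readable areas.
        self.area_a = view[offset:offset + AREA_SIZE]  # area 36a
        offset += AREA_SIZE
        self.area_b = view[offset:offset + AREA_SIZE]  # area 36b


memory = BoardMemory()
memory.area_a[:3] = b"abc"  # a writer fills one area
memory.area_b[:3] = b"xyz"  # and, separately, the other
first = bytes(memory.area_a[:3])   # the player reads each area on its own
second = bytes(memory.area_b[:3])
```

The `memoryview` slices share storage with the single `bytearray`, so the regions are views onto one memory rather than separate buffers.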

The external interface unit (31) is connected to a communication channel and can transmit and receive voice; it must handle incoming calls, and it receives scenario messages and text from the CPU board (20).

The voice playback unit (37) is a module capable of playing back digital voice data.

Reference numeral 39, not otherwise described, denotes a data bus, and 38 denotes a control and message transmission bus.

FIG. 3 is a flowchart illustrating a voice processing method using the voice processing system of the present invention.

In the voice processing method using the voice processing system of the present invention, when a call arrives at the ARS equipment and the CPU of the ARS requests a speech recognition service from the dedicated voice processing board (30), the board (30) recognizes the voice received through the external interface (31) and then guides the user through the service content by voice playback via the external interface (31). Which of the following two cases applies is selected by an external message.

In the first case, according to the scenario that controls the service, pre-recorded voice data is read from the system's hard disk (not shown) and played back by the voice playback function of the dedicated voice processing board (30). In the second case, according to the scenario that controls the service, the text to be announced is received, converted to speech by speech synthesis (TTS), and played back on the dedicated voice processing board.

To provide the speech recognition service for the incoming call, the voice processing unit (32) of the voice signal processing board receives voice through the external interface (31) and prepares for speech recognition (100).

Next, when a sound is picked up, it is determined whether it is a user voice input (101).

If the determination (101) finds that it is not a voice input, the process returns to the speech recognition preparation state (100).

If the determination (101) finds that it is a voice input, speech recognition is performed and the recognition result is reported (102).

Speech recognition is performed by the speech recognition module (32a) of the voice processing unit (32), and the recognition result is reported to the CPU of the CPU board (33).

Next, it is determined from the recognition result whether playback of pre-recorded data is required (103).
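Steps 100 to 103 amount to a short decision chain. The sketch below restates it as a function for illustration only: `handle_input` and its three callback parameters are hypothetical, and the patent does not say how voice detection or the pre-recorded-versus-TTS decision is computed.

```python
from typing import Callable, Optional


def handle_input(
    frame: bytes,
    is_user_voice: Callable[[bytes], bool],
    recognize: Callable[[bytes], str],
    needs_prerecorded: Callable[[str], bool],
) -> Optional[str]:
    """Return the playback mode chosen for one input frame, or None.

    None means "stay in the recognition-ready state (100)".
    """
    if not is_user_voice(frame):      # step 101
        return None                   # back to step 100
    result = recognize(frame)         # step 102 (the result is reported)
    if needs_prerecorded(result):     # step 103
        return "prerecorded"          # continue with step 104
    return "tts"                      # continue with step 105
```

A caller would loop on incoming frames and start the matching playback path whenever the function returns a mode rather than `None`.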

If the determination (103) finds that pre-recorded voice data is to be played back, that is, in the first mode (pre-recorded voice data is read from the system's hard disk and played back by the dedicated voice processing board (30)), the CPU reads the voice data to be played back from the system hard disk and writes it, through the external interface (31) of the board (30), to the first synthesized data output area (36a) of the board (30) (hereinafter, the first area).

The first area (36a) and the second synthesized data output area (36b) (hereinafter, the second area) each have a fixed size, as large as is needed for one playback pass, so when writing data the CPU writes exactly one first-area's worth of data to the first area.

Once the voice data to be played back has been fully written to the first area, the voice playback unit (37) starts playback.

While this playback is in progress, the CPU writes one second-area's worth of the remaining voice data to the second area; when playback of the first area ends, the data already stored in the second area is played back.

Likewise, while the second area is being played back, the next portion of the remaining voice data is stored in the first area, and this process repeats until all the data has been played back (104).
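The alternation between the two output areas in step 104 can be traced with a small sequential sketch. It is an illustration only: the function names are hypothetical, the area size of 4 bytes is chosen to keep the example readable, and the real overlap of writing and playback is not modeled here, only which area holds which piece.

```python
from typing import Iterator, List, Tuple


def split_into_areas(data: bytes, area_size: int) -> List[bytes]:
    """Cut the file into pieces no larger than one output area."""
    return [data[i:i + area_size] for i in range(0, len(data), area_size)]


def alternating_schedule(data: bytes, area_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (area index, bytes played) in playback order.

    Piece k is held in area k % 2. In the scheme described above the
    write of piece k + 1 overlaps the playback of piece k; this
    sequential sketch only shows which area holds which piece.
    """
    areas = [b"", b""]
    pieces = split_into_areas(data, area_size)
    if pieces:
        areas[0] = pieces[0]  # fill the first area before playback starts
    for k in range(len(pieces)):
        current = k % 2
        if k + 1 < len(pieces):
            areas[1 - current] = pieces[k + 1]  # refill the idle area
        yield current, areas[current]           # play the current area


played = list(alternating_schedule(b"0123456789", 4))
# played == [(0, b"0123"), (1, b"4567"), (0, b"89")]
```

The last piece may be shorter than the area, which is why the sketch passes the piece itself rather than a fixed-length buffer.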

In the second mode (the text to be announced is received, converted to speech, and played back by the dedicated voice processing board (30)), what arrives through the external interface (31) is not voice data but the text to be served, received in pieces of a fixed size. The text is converted to speech (text-to-speech), and the result is written alternately to the first and second areas so that the two areas can be played back alternately, as in the first mode.

To make this alternating playback possible, the speech synthesis module (32b) must not wait until the entire input text has been synthesized. Instead, it synthesizes one first-area's (or second-area's) worth of speech and writes it to the first (or second) area; then, in parallel with playback of that area, it synthesizes the next area's worth of data and writes it to the other area, and it repeats this process.
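One way to picture synthesis running in parallel with playback is a producer and a consumer that exchange the two output areas. The following is a sketch under assumptions, not the patent's implementation: it uses Python threads and queues in place of the board's processor and playback hardware, and `stream_tts`, `synthesize`, and `play` are hypothetical names.

```python
import queue
import threading
from typing import Callable, List


def stream_tts(
    chunks: List[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[int, bytes], None],
) -> None:
    """Synthesize the next chunk while the previous one is being played."""
    areas: List[bytes] = [b"", b""]  # stand-ins for areas 36a and 36b
    free: queue.Queue = queue.Queue()   # indices of areas that may be overwritten
    ready: queue.Queue = queue.Queue()  # indices of areas filled and waiting to play
    free.put(0)
    free.put(1)

    def synthesizer() -> None:
        for chunk in chunks:
            index = free.get()  # wait for an area that is not being played
            areas[index] = synthesize(chunk)
            ready.put(index)
        ready.put(None)  # no text left

    worker = threading.Thread(target=synthesizer)
    worker.start()
    while True:
        index = ready.get()  # wait for a filled area
        if index is None:
            break
        play(index, areas[index])
        free.put(index)  # hand the area back to the synthesizer
    worker.join()


log: List[tuple] = []
stream_tts(
    ["Your balance ", "is ten ", "dollars."],
    synthesize=lambda text: text.upper().encode("utf-8"),  # toy stand-in for TTS
    play=lambda index, pcm: log.append((index, pcm)),
)
# log == [(0, b"YOUR BALANCE "), (1, b"IS TEN "), (0, b"DOLLARS.")]
```

An area is returned to the `free` queue only after it has been played, so the synthesizer can never overwrite data that is still being read, and playback of the first chunk starts without waiting for the whole text.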

This process is described in detail below.

When the CPU requests the TTS service together with the text to be played back, the external interface (31) of the board (30) receives the request message and the text (105).

The text input to the board (30) is copied to a specific area of memory (105-1), and the speech synthesis module (32b) of the voice processing unit (32) reads this text and synthesizes speech from it (105-2).

If more text is stored than the speech synthesis module (32b) can read at once, the module first reads and synthesizes the specified amount of text (105-2) and, after that synthesis is complete, reads and processes the remaining text (107).
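Reading the stored text a specified amount at a time can be sketched as a generator. This is an illustration only: `text_portions` and the limit of 10 characters are assumptions, and the patent does not state how the amount is specified or where the text is cut.

```python
from typing import Iterator


def text_portions(stored_text: str, limit: int) -> Iterator[str]:
    """Yield the stored text in portions of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    position = 0
    while position < len(stored_text):  # is there text left to synthesize?
        yield stored_text[position:position + limit]
        position += limit


portions = list(text_portions("The office is closed today.", 10))
# portions == ["The office", " is closed", " today."]
```

This sketch cuts at a fixed character count; a practical synthesizer would more likely cut at a word or sentence boundary so that pronunciation and intonation are not disturbed.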

Next, it is determined whether text remains to be synthesized (108).

If no text remains to be synthesized, the voice playback unit (37) plays back the voice in the first area (109).

If the determination (108) finds that text remains to be synthesized, the process continues as follows.

The speech synthesis module (32b) synthesizes speech according to a speech synthesis algorithm, and the program is structured so that once one first-area's (or second-area's) worth of speech has been synthesized, the synthesized data is written to the first (or second) area (110).

In other words, when speech synthesis starts and the synthesized data has been written to the first area, playback of the first area begins; while that playback is in progress, the speech synthesis module (32b) continues with the next portion of synthesis, and once one second-area's worth of speech has been produced it writes the data to the second area.

This is possible because the playback time of the first area is longer than the time required for speech synthesis and data storage.
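The timing condition stated above can be checked with simple arithmetic: playback of one area must last longer than synthesizing and storing the next one. The sketch below is a hedged illustration. The default audio format (8000 samples per second, 1 byte per sample, a common telephony format) and all the example timings are assumptions; the patent specifies none of them.

```python
def playback_seconds(area_bytes: int, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """Time needed to play one full output area."""
    return area_bytes / (sample_rate_hz * bytes_per_sample)


def keeps_up(area_bytes: int, synth_seconds: float, write_seconds: float,
             sample_rate_hz: int = 8000, bytes_per_sample: int = 1) -> bool:
    """True if the next area can be filled before the current one runs out."""
    return synth_seconds + write_seconds < playback_seconds(
        area_bytes, sample_rate_hz, bytes_per_sample)


# An 8000-byte area holds 1.0 s of audio in the assumed format.
ok = keeps_up(area_bytes=8000, synth_seconds=0.4, write_seconds=0.01)
too_slow = keeps_up(area_bytes=8000, synth_seconds=1.2, write_seconds=0.01)
```

When the check fails, the player would reach the end of one area before the other is ready, producing a gap in the announcement.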

Next, it is determined whether text remains to be synthesized (111).

If the determination (111) finds that no text remains to be synthesized, the synthesized speech in the second area is played back (112).

That is, when playback of the first area ends, playback of the data stored in the second area begins.

If the determination (111) finds that text remains to be synthesized, the speech synthesis module continues synthesizing as before and writes the next portion of voice data to the first area; this sequence continues until all of the text has been synthesized and played back (113).

As described above, the present invention allows voice playback and speech synthesis to be performed in real time during a voice guidance service (ARS) driven by speech recognition results, on a voice processing board that integrates speech recognition, speech synthesis, and voice playback, so that the speech recognition service and the speech synthesis service can be provided in real time for incoming calls.

Claims

A real-time voice playback system comprising:

a voice processing unit comprising a speech recognition module and a speech synthesis module;

a voice playback unit capable of playing back voice data;

a memory unit used by the speech recognition module, the speech synthesis module, and the voice playback unit; and

a control unit for controlling the voice processing unit, the memory unit, and the voice playback unit.

The real-time voice playback system of claim 1, wherein the memory unit comprises: a recognition memory unit holding the recognition parameters required for recognition and other memory areas for running a recognition program; a TTS memory unit holding a database for speech synthesis and other memory areas for running a TTS program; and a shared memory unit used by the speech synthesis module and the voice playback unit for real-time speech synthesis and playback.

The real-time voice playback system of claim 1, wherein the voice processing unit comprises a processor loaded with speech recognition and speech synthesis programs.

The real-time voice playback system of claim 1, wherein the shared memory comprises a first synthesized data output area and a second synthesized data output area, and the voice playback unit reads the first synthesized data output area and the second synthesized data output area separately to play back the voice.

The real-time voice playback system of claim 1, further comprising an external interface that is connected to an arbitrary communication channel, transmits and receives voice, handles incoming calls, and receives scenario messages and text.

A voice playback method comprising:

preparing for speech recognition;

determining whether the input voice calls for playback of a pre-recorded voice;

playing back the pre-recorded voice file if the determination finds that playback of the pre-recorded voice is called for;

preparing for speech synthesis and preparing the text to be synthesized and served if the determination finds that playback of the pre-recorded voice is not called for;

synthesizing speech from the text to be served;

copying the synthesized speech to one memory area;

determining whether more text remains to be synthesized;

if more text remains to be synthesized, playing back the synthesized speech in the one memory area while simultaneously synthesizing the remaining text into another memory area; and

playing back the synthesized speech in the other memory area as soon as playback of the one memory area ends.

The method of claim 6, further comprising: if more text remains to be synthesized while the synthesized speech in the other memory area is being played back, synthesizing the remaining text and copying the result to the one memory area; and playing back the speech copied to the one memory area as soon as playback of the synthesized speech in the other memory area ends.

The method of claim 6, wherein, when the pre-recorded voice file is larger than the one memory area, the remainder of the pre-recorded voice file is written to the other memory area while the one memory area is being played back, the voice data in the other memory area is played back as soon as that playback ends, and these steps are repeated until playback of the pre-recorded voice is complete.