KR100314659B1

KR100314659B1 - Automatic transformation device and method for telephone speech database

Info

Publication number: KR100314659B1
Application number: KR1019990051588A
Authority: KR
Inventors: 김기철; 김기재; 성종모; 최고봉
Original assignee: 오길록; 한국전자통신연구원; 이계철; 한국전기통신공사
Priority date: 1999-11-19
Filing date: 1999-11-19
Publication date: 2001-11-15
Also published as: KR20010047387A

Abstract

The present invention relates to an apparatus and method for automatically constructing a telephone voice database, in which an existing voice database recorded in a quiet environment is passed through a telephone line and thereby converted into telephone voice.

The apparatus of the present invention comprises a telephone network matching unit and a host. The telephone network matching unit places and answers calls, converts between digital voice signals and analog telephone voice, and emits a special tone marking the end of a voice file. The host reads voice files from a voice database, applies the necessary conversion, and outputs them to the telephone network matching unit; it then records the digital telephone voice that has passed through the telephone line and is output by the telephone network matching unit, and creates telephone voice files to build a telephone voice database. The host includes a voice database, a voice file reproducing unit, a call processing control unit, a telephone voice file recording and storage unit, a telephone voice database, and a system control unit.

By building a telephone voice database automatically, with neither an operator nor users who supply speech, the present invention reduces the working time and cost borne by operators and users. It also makes it possible to compare recognition performance on the original voice data against the telephone voice data, so that the influence of the telephone channel on recognition performance can be analyzed and a speech recognizer optimized for the telephone channel can be developed.

Description

Automatic transformation device and method for telephone speech database

The present invention relates to an apparatus for constructing the telephone voice database that is essential for telephone-line speech recognition, and more particularly to an apparatus and method for automatically constructing a telephone voice database by automatically converting a large-scale voice database recorded in a quiet environment into telephone voice data.

To implement a speech recognition service in a telephone environment, the speech recognition engine must be trained with voice data from that environment. This requires an apparatus for constructing a telephone voice database.

A typical telephone voice database construction apparatus collects users' voices directly over a telephone line. Known techniques for making the collection more efficient include: collecting the voices of several people simultaneously over multiple channels (see Korean Patent Application No. 1997-052666); using recorded announcements to collect users' voices automatically without an operator (see Korean Patent Application No. 1997-047152); providing an environment that makes voice labeling easy, to reduce the time needed to edit the collected voice data (see Korean Patent Application No. 1997-047337); and automatically extracting voice segments and storing them as voice files.

There is also a technique that uses a simulator instead of a telephone line environment (see US Patent No. 5,475,792). However, because the telephone channel affects speech recognition performance through many different factors, it is difficult to build a model optimized for a real telephone channel environment.

As one example, a conventional telephone voice database for telephone-line speech recognition is constructed with an automatic telephone voice collection apparatus such as the one shown in FIG. 1.

As shown in FIG. 1, the conventional automatic telephone voice collection apparatus comprises: a telephone line matching unit (10) that detects an incoming call from a user and converts between the analog signal on the telephone line and the digital signal inside the apparatus; a voice signal processing unit (12) that separates the user's voice from the mixture of the user's voice input through the telephone line matching unit (10) and the signal output as the announcement, and extracts only the user's voice segments; a voice file storage unit (16) that stores the voice segment data detected by the voice signal processing unit (12) as voice files; and a call control and voice recording unit (14) that controls call handling in the telephone line matching unit (10) and the storage of voice files.

However, building an apparatus capable of constructing a telephone voice database as shown in FIG. 1 is not simple, and the cost of collecting voice data is considerable. Many people must be mobilized, every recording must be checked individually for data lost through users' carelessness, and data labeling is complicated. It is also difficult to quantify how much recognition performance degrades because of channel distortion and how much it improves through algorithmic improvements.

If the various voice databases already recorded in quiet environments can be reused, much time and cost can be saved. In addition, the effect of the telephone channel on the same speech can be analyzed by comparing speech recognition performance, which benefits the development of speech recognition engines for the telephone environment.

One way to exploit these advantages and build a telephone voice database from an existing voice database is to use a channel simulator instead of passing the existing voice database through a real telephone line. However, a channel simulator is expensive, there is no way of knowing whether it adequately reflects the characteristics of a real telephone channel, and its many parameters are not easy to control.

Another way to build a telephone voice database from an existing voice database is to turn the voice files into analog audio, play them into a telephone handset, and collect them with a recording device connected to another telephone. However, the influence of the amplifier and loudspeaker cannot be ignored, which makes it difficult to analyze the change in recognition performance attributable to channel distortion alone, and problems such as delimiting the start and end of each voice segment and labeling must still be handled.

To solve the problems described above, an object of the present invention is to provide an apparatus and method for automatically constructing a telephone voice database that, instead of collecting users' voices directly by telephone, passes voice data already collected and labeled in a quiet environment through the telephone network, so that a large-scale telephone voice database can be built easily.

FIG. 1 is a block diagram of a conventional telephone voice data collection apparatus.

FIG. 2 is a block diagram of the apparatus for automatically constructing a telephone voice database according to the present invention.

FIG. 3 is a flowchart of the automatic conversion of a voice file into a telephone voice file according to the present invention.

FIG. 4 is a flowchart of the automatic construction of a telephone voice database according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

10: telephone line matching unit
12: voice signal processing unit
14: call control and voice recording unit
16: voice file storage unit
50: local telephone exchange of the public network (PSTN)
51: transmission telephone line
52: reception telephone line
60: telephone network matching unit
70: host computer
71: call processing control unit
72: system control unit
73: voice (clean speech) database
74: voice file reproducing unit
75: telephone voice file recording and storage unit
76: telephone voice database

To achieve the above object, the present invention comprises:

a telephone network matching unit that places calls to and answers calls from a local telephone exchange of the public network, converts the digital voice signal input from a voice file reproducing unit into analog telephone voice and transmits it over a transmission telephone line, converts the analog telephone voice received over a reception telephone line into a digital voice signal and outputs it to a telephone voice file recording and storage unit, and emits a special tone marking the end of a voice file;

a call processing control unit that controls the input and output of digital voice data to and from the telephone network matching unit, the placing and releasing of a call from one trunk-line interface of the telephone network matching unit to the other, and the emission of the special tone;

a voice file reproducing unit that reads a voice file from a voice database, downsamples it, applies the conversion needed for passage through the telephone line, and transmits the converted digital voice data to the telephone network matching unit;

a telephone voice file recording and storage unit that records the digital voice signal output by the telephone network matching unit until the telephone network matching unit receives the tone marking the end of voice file playback and, when recording of one voice file is complete, stores the telephone voice file in a telephone voice database on a mass storage device; and

a system control unit that exercises overall control of the apparatus by controlling the operation of the call processing control unit, the voice file reproducing unit, and the telephone voice file recording and storage unit.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 2 is a block diagram showing the configuration of the apparatus for automatically constructing a telephone voice database according to the present invention. So that a telephone voice database can be built from an existing voice database, the apparatus comprises: a local telephone exchange (50) of the public network; a host computer (70) including a call processing control unit (71), a voice file reproducing unit (74), a telephone voice file recording and storage unit (75), a voice database (73), a telephone voice database (76), and a system control unit (72); and a telephone network matching unit (60) that is connected to the local telephone exchange (50) through a transmission telephone line (51) and a reception telephone line (52) and is also connected to the host computer (70).

In detail, the units operate as follows.

The telephone network matching unit (60) places a call to the local telephone exchange (50) of the public network, converts the A-law PCM digital voice signal input from the voice file reproducing unit (74) into an analog voice signal, and transmits it over the transmission telephone line (51). It also detects an incoming call from the local telephone exchange (50), answers it, converts the analog voice signal received over the reception telephone line (52) into an A-law PCM digital voice signal, and outputs it to the telephone voice file recording and storage unit (75). In addition, it generates the required DTMF tones.

The call processing control unit (71) controls the digital voice data input and output of the telephone network matching unit (60); the setup and release of the call from the transmission telephone line (51), which sends out the voice data, to the reception telephone line (52), which receives the voice data after it has passed through the local telephone exchange (50); and the transmission and reception of the tone that marks the completion of playback of one voice file.

The voice file reproducing unit (74) reads a voice file from the voice database (73) recorded in a quiet environment, removes the file header information, passes the data through a low-pass filter, downsamples it to 8 kHz, quantizes it into 8-bit A-law PCM format, and transmits the converted digital voice data to the telephone network matching unit (60).

The telephone voice file recording and storage unit (75) records the digital voice signal that the telephone network matching unit (60) produces from the analog voice signal received on the telephone line, and continues recording until the telephone network matching unit (60) receives the DTMF tone marking the end of voice file playback. When recording of one voice file is complete, it converts the recording into 16-bit linear PCM format and stores the converted telephone voice file in the telephone voice database (76) on a mass storage device.

The system control unit (72) exercises overall control of the apparatus by controlling the operation of the call processing control unit (71), the voice file reproducing unit (74), and the telephone voice file recording and storage unit (75).

FIG. 3 shows the process by which the apparatus configured as described above automatically converts a voice file into a telephone voice file.

First, with a call connected from the transmission telephone line (51) of the telephone network matching unit (60) to the reception telephone line (52), the voice file reproducing unit (74) reads one voice file from the voice database (73) (S1).

The voice file reproducing unit (74) then removes the file header information (S2), passes the data through a digital low-pass filter with a cutoff at or below 4 kHz to suppress the aliasing that downsampling would otherwise introduce (S3), and downsamples the voice data, which usually has a sampling rate of 16 kHz or higher, to 8 kHz (S4).
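Steps S3 and S4 can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions, not the filter of the embodiment: it assumes a 16 kHz input, a 101-tap Hamming-windowed sinc FIR filter, and a 3.6 kHz cutoff (any cutoff at or below 4 kHz would fit the description). The function names `design_lowpass` and `downsample` are hypothetical.

```python
import math


def design_lowpass(num_taps, cutoff_hz, sample_rate):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / sample_rate              # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc                  # limit of the sinc at its center
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def downsample(samples, in_rate=16000, out_rate=8000,
               num_taps=101, cutoff_hz=3600.0):
    """Low-pass filter (S3), then keep every (in_rate // out_rate)-th sample (S4)."""
    if in_rate % out_rate != 0:
        raise ValueError("this sketch handles integer decimation factors only")
    factor = in_rate // out_rate
    taps = design_lowpass(num_taps, cutoff_hz, in_rate)
    half = (num_taps - 1) // 2
    out = []
    # Only the output samples that survive decimation are computed.
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k                  # taps centered on sample i
            if 0 <= j < len(samples):
                acc += tap * samples[j]
        out.append(acc)
    return out
```

Filtering before discarding samples matters because any energy above 4 kHz left in a 16 kHz signal would otherwise fold back into the 0 to 4 kHz band of the 8 kHz output.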

In addition, to cover the case in which voice data that has passed through the telephone network is recognized at, for example, the local telephone exchange (50) of the public network, the speech recognition system is assumed to be connected to an E1 trunk, and the digital voice data is therefore put into 8-bit A-law PCM format. To this end, the 16-bit linear PCM digital voice downsampled to 8 kHz is quantized into 8-bit A-law PCM digital voice (S5).
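The quantization of step S5 can be sketched as follows. This is a minimal illustration that assumes the A-law format meant here is the ITU-T G.711 A-law code; the segment table and bit layout follow the widely used reference conversion routine, and the function names are hypothetical.

```python
# Upper bound of each A-law segment, expressed on the 13-bit magnitude scale.
SEGMENT_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample):
    """Encode one 16-bit linear PCM sample as an 8-bit G.711 A-law code."""
    sample = max(-32768, min(32767, int(sample)))   # clamp to the int16 range
    value = sample >> 3                             # keep the 13 most significant bits
    if value >= 0:
        mask = 0xD5      # sign bit set for non-negative input, plus XOR 0x55
    else:
        mask = 0x55      # XOR 0x55 only (alternate-bit inversion)
        value = -value - 1
    # Find the segment (0..7) whose range contains the magnitude.
    segment = next(i for i, end in enumerate(SEGMENT_END) if value <= end)
    shift = 1 if segment < 2 else segment
    code = (segment << 4) | ((value >> shift) & 0x0F)
    return code ^ mask


def encode_alaw(samples):
    """Encode a sequence of 16-bit linear samples into A-law bytes."""
    return bytes(linear_to_alaw(s) for s in samples)
```

For example, `encode_alaw([0, 32767])` yields the bytes `0xD5` and `0xAA`, the A-law codes for the smallest positive level and positive full scale.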

When the voice file reproducing unit (74) sends the converted digital voice signal to the telephone network matching unit (60) on the channel designated under the control of the call processing control unit (71), the telephone network matching unit (60) converts it into an analog voice signal and sends it out on the transmission telephone line (51). The analog voice signal passes through the local telephone exchange (50) of the public network and is received on the reception telephone line (52) (S6).

The received analog voice signal is converted into a digital voice signal by the telephone network matching unit (60) and recorded by the telephone voice file recording and storage unit (75) (S7). When recording is complete, the recording is converted back into 16-bit linear PCM format (S8) and stored in the telephone voice database (76) as a telephone voice file (S9).

FIG. 4 is a flowchart showing how the apparatus according to the present invention automatically converts a large number of voice files into telephone voice files.

This process is described in detail below.

First, to build a telephone voice database from the voice database, the system control unit (72) has the telephone network matching unit (60), through the call processing control unit (71), place a call to the local telephone exchange (50) of the public network so that the transmission telephone line (51) and the reception telephone line (52) are connected, thereby forming a local loop through the local telephone exchange (50) (S10).

Once the local loop is formed, the voice file reproducing unit (74) reads one voice file from the voice database (73) (S11) and converts it into 8 kHz, 8-bit A-law PCM (S12). The system control unit (72) designates which voice file is read by referring to a voice file list. As described above, the conversion consists of removing the header information from the voice file, passing the data through a digital low-pass filter, downsampling it to 8 kHz, and converting it into 8-bit A-law PCM digital voice.

The telephone network matching unit (60) converts the 8 kHz, 8-bit A-law PCM digital voice data input from the voice file reproducing unit (74) into an analog voice signal and sends it out on the transmission telephone line (51), which starts playback of the voice file. It receives the analog voice signal on the reception telephone line (52) through the local telephone exchange (50) of the public network, converts it into a digital voice signal, and outputs it to the telephone voice file recording and storage unit (75), which starts recording of the telephone voice (S13).

The telephone voice file recording and storage unit (75) records the telephone voice under the control of the system control unit (72).

If the end of the voice file is detected (S15) while the voice file is being played onto the transmission telephone line (51) (S14), the telephone network matching unit (60), under the control of the call processing control unit (71), sends the DTMF tone '#' onto the transmission telephone line (51) (S16).

The end of the voice file is recognized either from the file's EOF (end of file) or from a DTMF tone appended to the end of the file. When the voice file reproducing unit (74) recognizes the end of the voice file, it notifies the system control unit (72), which then, through the call processing control unit (71), causes the telephone network matching unit (60) to transmit the DTMF tone '#' marking the end of the voice file.
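For reference, the end-of-file marker can be synthesized in software. The sketch below relies on the standard DTMF frequency grid, in which '#' is the sum of 941 Hz and 1477 Hz; the 100 ms duration, the amplitude, and the function names are illustrative assumptions, and in the embodiment the tone is generated by the telephone network matching unit (60) itself.

```python
import math

# Standard DTMF grid: one low-group (row) and one high-group (column) frequency.
DTMF_ROWS_HZ = (697.0, 770.0, 852.0, 941.0)
DTMF_COLS_HZ = (1209.0, 1336.0, 1477.0, 1633.0)
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair in Hz for a DTMF key."""
    if len(key) == 1:
        for row, keys in enumerate(DTMF_KEYS):
            col = keys.find(key)
            if col >= 0:
                return DTMF_ROWS_HZ[row], DTMF_COLS_HZ[col]
    raise ValueError("not a DTMF key: %r" % (key,))


def dtmf_tone(key="#", duration_s=0.1, rate=8000, amplitude=16000):
    """Synthesize a DTMF tone as a list of 16-bit linear PCM samples."""
    low, high = dtmf_frequencies(key)
    count = int(round(duration_s * rate))
    half = amplitude / 2.0                  # each sinusoid gets half the peak
    return [
        int(round(half * (math.sin(2.0 * math.pi * low * i / rate)
                          + math.sin(2.0 * math.pi * high * i / rate))))
        for i in range(count)
    ]
```

Both component frequencies lie well inside the telephone band, which is why such a tone can survive the passage through the exchange and be detected on the receiving side.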

Next, it is checked whether the DTMF tone '#' marking the end of the voice file has been detected on the reception telephone line (52) (S17). If it has not, voice recording continues (S18) and the process returns to step S14: the voice file continues to be played, and once the end of the file is detected and the DTMF tone '#' has been sent on the transmission telephone line (51), the check for that tone on the reception telephone line (52) is repeated.

If, however, the check in step S17 shows that the DTMF tone '#' has been detected on the reception telephone line (52), voice recording in the telephone voice file recording and storage unit (75) ends (S19). Recording is ended as follows: the telephone network matching unit (60) notifies the system control unit (72), through the call processing control unit (71), that the DTMF tone '#' has been detected on the reception telephone line (52), and the system control unit (72) then controls the telephone voice file recording and storage unit (75) accordingly.
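The check of steps S17 to S19 can be sketched in software with the Goertzel algorithm, a common way to measure signal power at a single frequency. The patent does not say how the telephone network matching unit (60) detects the tone, so the detector below is an assumption, as are the 205-sample frame length, the 0.25 energy-share threshold, and all function names.

```python
import math

RATE = 8000
FRAME = 205                        # frame length often used for DTMF at 8 kHz
HASH_ROW_HZ, HASH_COL_HZ = 941.0, 1477.0


def goertzel_power(frame, freq_hz, rate=RATE):
    """Squared magnitude of the frame's spectrum at freq_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def is_hash_tone(frame, rate=RATE, min_share=0.25):
    """True if both '#' frequencies each carry a large share of the energy."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False
    norm = energy * len(frame) / 2.0    # a lone sinusoid gives a share near 1
    row = goertzel_power(frame, HASH_ROW_HZ, rate) / norm
    col = goertzel_power(frame, HASH_COL_HZ, rate) / norm
    return row > min_share and col > min_share


def record_until_hash(samples, frame_len=FRAME):
    """Keep frames (S18) until a frame containing '#' is seen (S17, S19)."""
    recorded = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if is_hash_tone(frame):
            return recorded, True
        recorded.extend(frame)
    return recorded, False


def sine_mix(freqs, count, rate=RATE):
    """Test helper: equal-weight sum of sinusoids with peak at most 1.0."""
    return [sum(math.sin(2.0 * math.pi * f * i / rate) for f in freqs) / len(freqs)
            for i in range(count)]
```

Requiring both frequencies to be present rejects a single tone near 941 Hz or 1477 Hz. A deployed detector would normally add twist and duration checks, which are omitted here.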

When both playback and recording of one voice file are complete, the telephone voice file recording and storage unit (75) converts the voice recorded in A-law PCM format into 16-bit linear PCM format and stores it in the telephone voice database (76) (S20).
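The format conversion of step S20 is the inverse of the A-law quantization. The sketch below again assumes ITU-T G.711 A-law and follows the widely used reference expansion routine; the little-endian byte order of the stored file and the function names are assumptions made for illustration.

```python
import struct


def alaw_to_linear(code):
    """Expand one 8-bit G.711 A-law code to a 16-bit linear PCM sample."""
    code ^= 0x55                            # undo the alternate-bit inversion
    magnitude = (code & 0x0F) << 4          # 4 quantization bits
    segment = (code & 0x70) >> 4            # 3 segment bits
    if segment == 0:
        magnitude += 8                      # midpoint of the lowest step
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law the sign bit is set for non-negative samples.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data):
    """Convert A-law bytes to little-endian 16-bit linear PCM bytes."""
    samples = [alaw_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)
```

Because A-law is a lossy 8-bit code, the expanded samples are quantized approximations of the signal received from the line; the conversion changes the storage format only and does not restore resolution.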

The telephone voice file is stored in the telephone voice database (76) labeled with the name designated by the system control unit (72).

Next, it is checked whether every file in the voice database (73) has been converted (S21). If so, the call is released, which tears down the local loop through the local telephone exchange (50) of the public network, and construction of the telephone voice database ends (S22). If voice files remain to be converted in the voice database (73), the process returns to step S11, reads the next voice file, and repeats steps S12 to S21 to convert it into telephone voice.

Termination of the construction work is likewise carried out by the system control unit (72), which refers to the voice file list to confirm that every voice file has been converted into a telephone voice file and then controls the call processing control unit (71) and the telephone voice file recording and storage unit (75).
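The overall flow of FIG. 4 (S10 to S22) can be summarized as a control loop. The sketch below is a simplified software simulation, not the apparatus: `LoopbackLine` is a hypothetical stand-in for the telephone network matching unit (60) together with the local loop, the string `"#"` stands in for the DTMF end tone, and playback and recording run one after the other instead of concurrently.

```python
from dataclasses import dataclass, field

END_TONE = "#"      # stand-in for the DTMF '#' end-of-file tone


@dataclass
class LoopbackLine:
    """Hypothetical ideal line: whatever is sent is received unchanged."""
    connected: bool = False
    in_flight: list = field(default_factory=list)

    def dial(self):                 # S10: form the local loop
        self.connected = True

    def hang_up(self):              # S22: release the local loop
        self.connected = False

    def send(self, item):
        if not self.connected:
            raise RuntimeError("no call in progress")
        self.in_flight.append(item)

    def receive(self):
        return self.in_flight.pop(0)


def build_telephone_db(clean_db, line, to_line_format, to_stored_format):
    """Pass every file of clean_db through the line and collect the results."""
    phone_db = {}
    line.dial()                                       # S10
    try:
        for name, samples in clean_db.items():        # S11, repeated until S21
            for sample in to_line_format(samples):    # S12 to S14
                line.send(sample)
            line.send(END_TONE)                       # S15, S16
            recorded = []
            while True:
                item = line.receive()
                if item == END_TONE:                  # S17 -> S19
                    break
                recorded.append(item)                 # S18
            # S20: convert and store under the name taken from the file list.
            phone_db[name] = to_stored_format(recorded)
    finally:
        line.hang_up()                                # S22
    return phone_db
```

Because each output file is stored under the name of its source file, the labels of the original database carry over to the telephone voice database without manual work, which is the point made in the surrounding description.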

As described above, automating the passage of already-labeled voice data through the telephone line makes it possible to automate both the telephone voice conversion of a large voice database and the construction of the telephone voice database.

As described above, the present invention can automatically build a telephone voice database from an existing voice database, with neither an operator nor users who supply speech, and thus reduces the working time and cost borne by operators and users.

In addition, a speech recognition system can compare its recognition performance on the original voice data with its performance on the telephone voice data, so the influence of the telephone channel on recognition performance can be analyzed and a speech recognition system optimized for the telephone channel can be developed.

Although the technical idea of the present invention has been described above with reference to the accompanying drawings, the description illustrates a preferred embodiment by way of example and does not limit the invention. It is also evident that anyone of ordinary skill in this technical field can make various modifications and imitations without departing from the scope of the technical idea of the present invention.

Claims

A local telephone station 50 of the public network so that a telephone voice database can be constructed using an existing voice database; A host computer including a call processing control unit 71, a voice file reproducing unit 74, a phone voice file recording and storage unit 75, a voice database 73, a phone voice database 76 and a system control unit 72. 70 and; A telephone network matching unit 60 connected to the local telephone station 50 of the public network via a transmission telephone line 51 and a reception telephone line 52, and also connected to the host computer 70;

The telephone network matching unit 60 makes a call to the local telephone station 50 of the public network, converts the digital voice signal input from the voice file reproducing unit 74 into an analog voice signal, and transmits it through the transmission telephone line 51. It detects the incoming call of the local telephone company 50 of the public network to answer the call and converts the analog voice signal received through the receiving phone line 52 into a digital voice signal and outputs it to the telephone voice file recording and storage unit 75, It also exports DTMF tones that indicate the end of a voice file,

The call processing control unit 71 receives a digital voice data input / output of the telephone network matching unit 60 and a receiving telephone line 52 to receive voice data passing through a local telephone station 50 of a public network from a transmission telephone line 51 to which voice data is to be exported. Controls the generation / release of a call to and the transmission and reception of a tone which can be distinguished when one voice file is played.

The voice file reproducing unit 74 reads a voice file from the existing voice database 73, which was recorded in a quiet environment, performs the conversion needed for the data to pass through the telephone line, and then transmits the converted digital voice data to the telephone network matching unit 60;

The telephone voice file recording and storage unit 75 records the digital voice signal output from the telephone network matching unit 60 until the telephone network matching unit 60 receives the DTMF tone indicating the end of voice file playback; when recording of one voice file is complete, it converts the file format and stores the converted telephone voice file in the telephone voice database 76 on a mass storage device;

The system control unit 72 controls the overall construction of the telephone voice database by controlling the operations of the call processing control unit 71, the voice file reproducing unit 74, and the telephone voice file recording and storage unit 75. An apparatus for automatically constructing a telephone voice database, characterized by the foregoing.

A method for automatically constructing a telephone voice database using an existing voice database, the method comprising:

A first step of forming a local loop through the local telephone station 50 of the public network by dialing the receiving telephone line 52 from the transmission telephone line 51 of the telephone network matching unit 60;

A second step of the voice file reproducing unit 74 reading and converting one voice file from the voice database 73;

A third step in which the telephone network matching unit 60 starts playing the voice file out through the transmission telephone line 51 and receives it on the receiving telephone line 52, and the telephone voice file recording and storage unit 75 starts recording the telephone voice file;

A fourth step of, if the end of the voice file is detected during the reproduction of the voice file, outputting a DTMF tone indicating the end of the voice file from the telephone network matching unit 60 to the transmission telephone line 51;

A fifth step of repeating the fourth step while continuing to record the telephone voice if the DTMF tone indicating the end of the voice file is not detected on the receiving telephone line 52, and of ending the voice recording in the telephone voice file recording and storage unit 75 when that DTMF tone is detected;

A sixth step of converting the format of the recorded voice and storing it in the telephone voice database 76 once playback and recording of one voice file are both complete; and

A seventh step of checking whether all files in the voice database 73 have been converted into telephone voice files, hanging up the call to complete the telephone voice database construction if they have, and otherwise returning to the second step to convert the next voice file into a telephone voice file.
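The seven steps recited above can be sketched as a single control loop. This is only an illustration under stated assumptions, not the claimed implementation: `LoopbackLine`, `END_TONE`, `build_telephone_db`, and the `to_line`/`from_line` converter callbacks are hypothetical names, and the telephone network is replaced by an in-memory queue that simply returns whatever was sent.

```python
from collections import deque

END_TONE = "END_TONE"  # hypothetical stand-in for the end-of-file DTMF tone


class LoopbackLine:
    """Hypothetical stand-in for lines 51/52 looped through local station 50."""

    def __init__(self):
        self.connected = False
        self._queue = deque()

    def dial(self):
        self.connected = True

    def hang_up(self):
        self.connected = False

    def send(self, frame):
        self._queue.append(frame)

    def receive(self):
        return self._queue.popleft()


def build_telephone_db(voice_db, line, to_line=list, from_line=list):
    """Pass every file of voice_db through the line and collect the recordings."""
    phone_db = {}
    line.dial()                                # step 1: form the local loop
    for name, samples in voice_db.items():     # step 7: loop until all files are done
        frames = to_line(samples)              # step 2: read and convert one file
        for frame in frames:                   # step 3: play the file onto the line
            line.send(frame)
        line.send(END_TONE)                    # step 4: signal the end of the file
        recorded = []
        while True:                            # step 5: record until the tone arrives
            frame = line.receive()
            if frame == END_TONE:
                break
            recorded.append(frame)
        phone_db[name] = from_line(recorded)   # step 6: convert format and store
    line.hang_up()                             # all files converted: end the call
    return phone_db
```

With the default identity converters, `build_telephone_db({"a": [1, 2, 3]}, LoopbackLine())` returns the input unchanged; a real line would add the channel effects the database is meant to capture.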

The method of claim 2,

The voice file conversion in the second step,

A method for automatically constructing a telephone voice database, characterized in that the header information of the read voice file is removed, the data is passed through a digital lowpass filter to eliminate the interference (aliasing) that may occur during downsampling, and the result is downsampled to 8 kHz and converted to 8-bit A-law PCM.
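The conversion chain in this claim (lowpass filter, downsample to 8 kHz, 8-bit A-law) can be illustrated as follows. The sketch assumes headerless 16-bit samples at 16 kHz, a rate the claim does not state, and uses a 31-tap Hamming-windowed sinc filter chosen only for illustration. The A-law encoder follows the common segment-table formulation of G.711; all function names are hypothetical, and bit-exactness against any particular codec is not claimed.

```python
import math

SEG_END = (0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF)


def lowpass_taps(num_taps=31, cutoff=0.225):
    """Hamming-windowed sinc FIR taps; cutoff is a fraction of the input rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate(samples, factor=2):
    """Lowpass filter, then keep every factor-th sample (16 kHz -> 8 kHz for 2)."""
    taps = lowpass_taps(cutoff=0.45 / factor)  # just below the new Nyquist rate
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the file as zero
                acc += tap * samples[j]
        out.append(max(-32768, min(32767, int(round(acc)))))
    return out


def linear_to_alaw(sample):
    """Encode one 16-bit signed sample as an 8-bit A-law byte."""
    if sample >= 0:
        mask, mag = 0xD5, sample
    else:
        mask, mag = 0x55, -sample - 1
    mag = min(mag, 0x7FFF)
    seg = next(i for i, end in enumerate(SEG_END) if mag <= end)
    shift = 4 if seg < 2 else seg + 3
    return ((seg << 4) | ((mag >> shift) & 0x0F)) ^ mask


def to_telephone_format(samples_16k):
    """Assumed 16 kHz, 16-bit input -> 8 kHz, 8-bit A-law bytes."""
    return bytes(linear_to_alaw(s) for s in decimate(samples_16k, 2))
```

Filtering before discarding samples matters because any energy above 4 kHz would otherwise fold back into the telephone band after decimation.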

The method according to claim 2 or 3,

DTMF tones indicating the end of the voice file in the fourth step,

A method for automatically constructing a telephone voice database, characterized in that the tone is sent through the transmission telephone line 51 as follows: the voice file reproducing unit 74 recognizes the EOF of the voice file, or a DTMF tone appended to the end of the voice file, and informs the system control unit 72, which controls the call processing control unit 71 so that the telephone network matching unit 60 sends the DTMF tone indicating the end of the voice file.

The method according to claim 2 or 3,

The fifth step of terminating the voice recording,

A method for automatically constructing a telephone voice database, characterized in that, when the DTMF tone indicating the end of the voice file is detected on the receiving telephone line 52, the telephone network matching unit 60 notifies the system control unit 72 through the call processing control unit 71, and the system control unit 72 controls the telephone voice file recording and storage unit 75 to end the voice recording.
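The claims do not say which DTMF digit marks the end of a file or how the tone is detected. As a hedged illustration of the signaling idea only, the sketch below generates a standard DTMF tone and detects it with the Goertzel algorithm, a common choice for this task. The function names, the 50 ms block length, and the 0.1 detection threshold are all assumptions.

```python
import math

ROWS = (697, 770, 852, 941)       # standard DTMF low-group frequencies (Hz)
COLS = (1209, 1336, 1477, 1633)   # standard DTMF high-group frequencies (Hz)
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, rate=8000, duration=0.05):
    """Return float samples of the two-frequency DTMF tone for key."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    n = int(rate * duration)
    return [0.5 * math.sin(2 * math.pi * ROWS[r] * t / rate)
            + 0.5 * math.sin(2 * math.pi * COLS[c] * t / rate)
            for t in range(n)]


def goertzel_power(samples, freq, rate):
    """Squared magnitude of the block's spectrum at one frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, rate=8000, threshold=0.1):
    """Return the detected key, or None if the block is not a DTMF tone."""
    energy = sum(x * x for x in samples)
    if energy == 0:
        return None
    norm = len(samples) * energy  # an ideal DTMF tone scores about 0.25 per group
    row = [goertzel_power(samples, f, rate) / norm for f in ROWS]
    col = [goertzel_power(samples, f, rate) / norm for f in COLS]
    r, c = row.index(max(row)), col.index(max(col))
    if row[r] < threshold or col[c] < threshold:
        return None
    return KEYS[r][c]
```

A recorder built on this idea would call `detect_dtmf` on each incoming block and stop recording on the first block that returns the agreed end-of-file key.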

The method according to claim 2 or 3,

The sixth step,

Converts the recorded voice into 16-bit linear PCM, labels it with a name designated by the system control unit 72, and stores it in the telephone voice database 76.
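A minimal sketch of this sixth step, assuming the recording arrives as 8-bit A-law bytes at 8 kHz. The WAV container, the dictionary used as the database, and the names `alaw_to_linear` and `store_recording` are assumptions made for illustration; the claim specifies only 16-bit linear PCM and a name designated by the system control unit.

```python
import io
import struct
import wave


def alaw_to_linear(byte):
    """Decode one 8-bit A-law byte to a 16-bit signed linear sample."""
    byte ^= 0x55
    t = (byte & 0x0F) << 4
    seg = (byte & 0x70) >> 4
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if byte & 0x80 else -t


def store_recording(phone_db, name, alaw_bytes, rate=8000):
    """Convert to 16-bit linear PCM and store under the designated name."""
    pcm = [alaw_to_linear(b) for b in alaw_bytes]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono telephone channel
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    phone_db[name] = buf.getvalue()  # hypothetical in-memory "database"
    return pcm
```

For example, `store_recording(db, "utt0001", recorded_bytes)` leaves a complete 16-bit, 8 kHz mono file under the key `"utt0001"`.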

A computer-readable recording medium having recorded thereon a program for automatically constructing a telephone voice database using an existing voice database,

A fifth step of repeating the fourth step while continuing to record the telephone voice if the DTMF tone indicating the end of the voice file is not detected on the receiving telephone line 52, and of ending the voice recording in the telephone voice file recording and storage unit 75 when that DTMF tone is detected;

A seventh step of checking whether all files in the voice database 73 have been converted into telephone voice files, hanging up the call to complete the telephone voice database construction if they have, and otherwise returning to the second step to convert the next voice file into a telephone voice file.