KR20200098320A

KR20200098320A - CTS combine system with PU

Info

Publication number: KR20200098320A
Application number: KR1020190016317A
Authority: KR
Inventors: 허일; 김효순
Original assignee: 주식회사 더하일
Priority date: 2019-02-12
Filing date: 2019-02-12
Publication date: 2020-08-20
Also published as: KR102330345B1

Abstract

The CTS combine system with PU of the present invention comprises: a database unit in which voice files are stored; a text input unit for entering text; a control unit that extracts from the database unit the voice file corresponding to the text entered at the text input unit; a TTS engine that converts the voice file transmitted from the control unit into audio wave data; and an audio processing unit connected to the TTS engine that converts the audio wave data into an analog voice signal and transmits it to a speaker. By improving the integrity of the stored voice files, the system can reduce errors and processing load in the TTS implementation.

Description

PU-applied CTS combine system {CTS combine system with PU}

The present invention relates to a PU-applied CTS combine system in which text entered through a personal terminal is matched with a voice and rendered as speech.

A TTS (text-to-speech) system is a technology that converts ordinary text into a human voice. For example, the desired speech is generated by reading and combining the speech segments, stored in a database, that correspond to each piece of text. TTS systems have mainly been applied to unmanned automatic response systems (ARS) and to technologies that convert text information into speech for the visually impaired.

For example, Korean Patent Registration No. 10-1180783 proposes a user-customized broadcast service method using TTS technology, performed on a broadcast terminal equipped with a text-to-speech (TTS) function. The method comprises: step (a) of updating a user preference information database according to the input when user preference information is entered; step (b) of receiving broadcast program information and building a broadcast program information database; step (c) of, when a customized TTS service is selected, matching the contents of the broadcast program information database against the preference information database to build, by time or by channel, a customized broadcast program database suited to the user's taste; and step (d) of converting the text data of the customized broadcast program information database built in step (c) into speech and outputting it. The method further comprises step (e) of instructing the terminal by voice to perform a reserved-viewing or reserved-recording service.

However, the above technology does not address the problem that, when an error such as noise occurs in a voice file stored as data in the database, errors or load subsequently arise in the implementation of the TTS system.

Korean Patent Registration No. 10-1180783

The present invention has been devised to solve the above problem, and aims to provide a PU-applied CTS combine system that implements a TTS system while improving the integrity of the stored voice files, thereby improving the quality of implementation and preventing load.

To achieve the above object, the PU-applied CTS combine system of the present invention (hereinafter "the system of the present invention") comprises: a plurality of personal terminals, each including a text input unit for entering text, a first communication unit that transmits the text signal entered at the text input unit and receives a voice file matching the text signal, a TTS engine that converts the voice file into audio wave data, an audio processing unit connected to the TTS engine that converts the audio wave data produced by the TTS engine into an analog voice signal, and a speaker that outputs the voice signal converted by the audio processing unit; and a central server including a second communication unit that receives the text signal and transmits a voice file matching the text signal, and a database unit in which voice files are stored.

In one example, the central server includes a voice input unit through which voice files are entered into the database unit. The voice input unit includes a near-field microphone located close to the talker (the person providing the voice input), a far-field microphone located farther from the talker, and a noise removal unit. The noise removal unit computes the frequency components of the input signals from the near-field and far-field microphones by Fourier transform, computes the frequency signal power of each microphone from those frequency components, compares the frequency signal power of the near-field microphone with that of the far-field microphone to compute the ratio of the two signal powers, and determines, based on that ratio, whether a voice file is to be stored in the database unit.

In one example, the voice input unit has a housing from which the near-field microphone and the far-field microphone are exposed, and the noise removal unit is housed inside the housing. The far-field microphone is covered by a reflected-wave blocking cover that has a plurality of flow holes on the side facing the near-field microphone and a closed surface on the opposite side.

In one example, the voice input unit has a housing from which the near-field microphone and the far-field microphone are exposed. A recessed guide groove is formed in the housing, running from the near-field microphone to the far-field microphone. The far-field microphone is mounted on the side wall at the end of the guide groove, and a reflected-wave blocking cover is formed over the end of the guide groove where the far-field microphone is mounted.

The system of the present invention can be applied to a variety of uses, such as automated announcement and response systems, public-transport guidance and navigation, and educational applications, offering advantages such as convenience, consistency of voice, and economy.

In addition, because only clean voice files from which noise has been removed are stored as data, the operational efficiency of the TTS system can be improved.

FIG. 1 is a schematic diagram showing the system of the present invention.
FIG. 2 is a block diagram showing the detailed configuration of the personal terminal in FIG. 1.
FIG. 3 is a block diagram showing the detailed configuration of the central server in FIG. 1.
FIG. 4 is a block diagram showing the detailed configuration of the voice input unit, which is one component of the present invention.
FIGS. 5 and 6 are schematic diagrams showing respective embodiments of the voice input unit.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the drawings.

As shown in FIGS. 1 to 3, the system (1) of the present invention comprises: a plurality of personal terminals (2), each including a text input unit (21) for entering text, a first communication unit (25) that transmits the text signal entered at the text input unit (21) and receives a voice file matching the text signal, a TTS engine (23) that converts the voice file into audio wave data, an audio processing unit (24) connected to the TTS engine (23) that converts the audio wave data produced by the TTS engine (23) into an analog voice signal, and a speaker (26) that outputs the voice signal converted by the audio processing unit (24); and a central server (3) including a second communication unit (34) that receives the text signal and transmits a voice file matching the text signal, and a database unit (32) in which voice files are stored.

As shown in FIG. 2, the personal terminal (2) includes the text input unit (21), a terminal control unit (22), the TTS engine (23), the audio processing unit (24), the first communication unit (25), and the speaker (26).

As shown in FIG. 3, the central server (3) includes a voice input unit (31), the database unit (32), a central control unit (33), and the second communication unit (34).

The operation of these components is described below.

The text input unit (21) is connected to the terminal control unit (22) and has a plurality of numeric keys and function keys for performing various functions.

The terminal control unit (22) passes the text signal entered at the text input unit (21) to the first communication unit (25), which transmits the text signal to the central server (3).

The text signal is received by the second communication unit (34) of the central server (3). The central control unit (33) extracts the voice file matching the received text signal from the database unit (32) and transmits it through the second communication unit (34) to the first communication unit (25) of the corresponding personal terminal (2).

When the first communication unit (25) receives the voice file, the terminal control unit (22) passes it to the TTS engine (23). The TTS engine (23), which is connected to one side of the terminal control unit (22), converts the text sentence entered in a given language, that is, the corresponding voice file passed from the terminal control unit (22), into audio wave data.

The audio processing unit (24) is connected to one side of the TTS engine (23) and converts the audio wave data produced by the TTS engine (23) into an analog voice signal. The audio processing unit (24) comprises an audio driver as a general software module and an audio card as a hardware block.
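The exchange between a personal terminal (2) and the central server (3) described above can be sketched as follows. This is a minimal illustration only: the class and method names (`CentralServer`, `PersonalTerminal`, `handle_text_signal`, `request_voice_file`) are hypothetical, the database unit (32) is assumed to be a simple text-to-bytes mapping, "matching" is assumed to mean an exact lookup, and the network transport, TTS engine (23), and audio processing unit (24) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CentralServer:
    """Stand-in for the central server (3)."""

    # Database unit (32): assumed here to map text to stored voice-file bytes.
    database: Dict[str, bytes] = field(default_factory=dict)

    def handle_text_signal(self, text: str) -> Optional[bytes]:
        """Central control unit (33): return the voice file matching the text, if any."""
        return self.database.get(text)


@dataclass
class PersonalTerminal:
    """Stand-in for a personal terminal (2)."""

    server: CentralServer

    def request_voice_file(self, text: str) -> Optional[bytes]:
        """Send the text signal to the server and receive the matching voice file."""
        # A direct method call replaces the first/second communication units here.
        return self.server.handle_text_signal(text)


# Hypothetical database contents, for illustration only.
server = CentralServer(database={"hello": b"RIFF-hello", "stop": b"RIFF-stop"})
terminal = PersonalTerminal(server=server)

voice_file = terminal.request_voice_file("hello")
missing = terminal.request_voice_file("unknown")
```

In a real terminal, the returned voice file would next be handed to the TTS engine (23) and the audio processing unit (24) for playback.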

In the central server (3), the database unit (32) stores various voice files; specifically, it stores the voice files entered through the voice input unit (31) described below.

As shown in FIG. 3, the central server (3) includes a voice input unit (31) through which voice files are entered. In particular, the present invention presents an embodiment that addresses the problem of noise mixed into the voice files stored in the database unit (32) causing errors or load in the implementation of the TTS system.

As shown in FIG. 4, the voice input unit (31) of this embodiment includes a near-field microphone (311) located close to the talker, a far-field microphone (312) located farther from the talker, and a noise removal unit (313). The noise removal unit (313) computes the frequency components of the input signals from the near-field microphone (311) and the far-field microphone (312) by Fourier transform, computes the frequency signal power of each microphone from those frequency components, compares the frequency signal power of the near-field microphone (311) with that of the far-field microphone (312) to compute the ratio of the two signal powers, and determines, based on that ratio, whether a voice file is to be stored in the database unit (32).

The near-field microphone (311) is positioned close to the talker's mouth, and the far-field microphone (312) is positioned relatively farther from the talker's mouth than the near-field microphone (311).

The noise removal unit (313) first converts the analog signal from each microphone (311, 312) into a digital signal through an analog-to-digital converter.

The noise removal unit (313) compares the frequency signal power of the near-field microphone (311) with that of the far-field microphone (312) to compute the ratio of the two signal powers and, based on that ratio, determines whether a voice file is to be stored in the database unit (32); that is, it estimates signals other than the voice as noise and removes them.

In more detail, the noise removal unit (313) performs a Fourier transform on each of the digitized voice signals from the near-field microphone (311) and the far-field microphone (312) in order to identify their frequency components. Let d(n) denote the input voice signal of the near-field microphone (311) and x(n) that of the far-field microphone (312); then d(k), the Fourier transform of d(n), and x(k), the Fourier transform of x(n), are given by Equation 1 below.
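Assuming the standard N-point discrete Fourier transform, which is consistent with the index ranges stated for n and k, Equation 1 can be written as:

```latex
d(k) = \sum_{n=0}^{N-1} d(n)\, e^{-j 2\pi n k / N}, \qquad
x(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}
```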

Here, N is the number of samples in the block used for the Fourier transform, covering a predetermined preceding period up to and including the current sample, with 0 ≤ n ≤ N-1 and 0 ≤ k ≤ N-1.

The noise removal unit (313) then computes the frequency signal power for both microphones (311, 312). Denoting the frequency signal power of the k-th frequency components d(k) and x(k) of Equation 1 by D(k) and X(k), respectively, D(k) and X(k) are given by Equation 2 below.
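Assuming that the frequency signal power is the squared magnitude of each frequency component (a constant normalization such as 1/N may also be applied without changing the power ratios used later), Equation 2 can be written as:

```latex
D(k) = \lvert d(k) \rvert^{2}, \qquad X(k) = \lvert x(k) \rvert^{2}
```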

When the two microphones (311, 312) are placed close to each other, a noise signal originating relatively far away compared with the talker's mouth reaches each microphone (311, 312) at nearly the same signal power level.

In contrast, the talker's voice reaches the near-field microphone (311), which is close to the talker's mouth, at a higher frequency signal power level than it reaches the far-field microphone (312).

This embodiment exploits this phenomenon to estimate noise sections, so that only genuine voice files are stored as data in the database unit (32), improving the implementation efficiency of the TTS system.

First, in the noise section estimation method according to one embodiment of the present invention, when D(k) < X(k) for the values D(k) and X(k) obtained from Equation 2, that is, when the frequency signal power of the far-field microphone (312) is greater than that of the near-field microphone (311), the ratio A(l) of the signal powers D(k) and X(k) of the two microphones (311, 312) in the l-th frame is given by Equation 3 below.

Here, N is the number of samples in one block, and l is the frame index.

Because noise generally reaches the microphones from a greater distance than the talker's mouth, it arrives at the two microphones (311, 312) at nearly the same frequency signal power level, and A(l) then approaches 1.

Accordingly, the noise removal unit (313) of this embodiment estimates noise sections by comparing the ratio A(l) of D(k) and X(k) obtained from Equation 3 with a threshold: if A(l) > Thr_A in the l-th frame, the frame can be estimated to be a noise section. Here, the threshold Thr_A is a value between 0 and 1.

The threshold Thr_A is an optimal setting obtained from repeated experiments, and the present invention is not limited to any particular value.
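The A(l)-based noise test can be illustrated with a small sketch. Several points are assumptions rather than the patent's stated method: the power is taken as the squared DFT magnitude, A(l) is taken as the sum of D(k)/X(k) over the bins where D(k) < X(k) divided by N, and the threshold value 0.8 is arbitrary. The function names are hypothetical, and a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import random


def dft(samples):
    """Naive N-point discrete Fourier transform of one block."""
    n_samples = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def power_spectrum(samples):
    """Per-bin signal power, assumed here to be the squared DFT magnitude."""
    return [abs(c) ** 2 for c in dft(samples)]


def ratio_a(near_block, far_block, eps=1e-12):
    """Assumed form of A(l): sum of D(k)/X(k) over bins with D(k) < X(k), divided by N."""
    d_pow = power_spectrum(near_block)  # D(k), near-field microphone
    x_pow = power_spectrum(far_block)   # X(k), far-field microphone
    total = sum(d / (x + eps) for d, x in zip(d_pow, x_pow) if d < x)
    return total / len(d_pow)


def is_noise_frame(near_block, far_block, thr_a=0.8):
    """Flag the frame as a noise section when A(l) > Thr_A (threshold is illustrative)."""
    return ratio_a(near_block, far_block) > thr_a


# Synthetic test signals; the scaling factors are made up for illustration.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(64)]
voice = [rng.uniform(-1.0, 1.0) for _ in range(64)]

# Distant noise: both microphones receive nearly the same level.
noise_near, noise_far = [0.95 * s for s in noise], noise
# Talker's voice: much stronger at the near-field microphone.
voice_near, voice_far = voice, [0.3 * s for s in voice]

print(is_noise_frame(noise_near, noise_far))
print(is_noise_frame(voice_near, voice_far))
```

With these synthetic inputs the distant-noise frame gives a ratio close to 1 and is flagged, while the voice frame has no bins with D(k) < X(k) and is not.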

In a noise section estimation method according to another embodiment, which takes the opposite approach to the embodiment above, when D(k) > X(k) for the values D(k) and X(k) obtained from Equation 2, that is, when the frequency signal power of the near-field microphone (311) is greater than that of the far-field microphone (312), the ratio B(l) of the signal powers D(k) and X(k) of the two microphones (311, 312) in the l-th frame is given by Equation 4 below.

In this embodiment, because the talker's voice reaches the near-field microphone (311) at a relatively higher level than the far-field microphone (312), B(l) takes a value of 1 or less and tends to be smaller than A(l) in the preceding embodiment.

Accordingly, the l-th frame can be estimated to be a noise section when B(l) < Thr_B is satisfied. Here, the threshold Thr_B is a value between 0 and 1. The threshold Thr_B is likewise an optimal setting obtained from repeated experiments, and the present invention is not limited to any particular value.

B(l) takes a large value in sections without voice activity (noise) and a relatively small value in sections with voice activity; noise is therefore removed by extracting the spectrum of the noise signal only in sections without voice activity.

Through the two embodiments described above, the noise removal unit (313) distinguishes voice sections from noise sections and can extract and remove only the noise signal.

According to a preferred embodiment of the present invention, when the noise removal unit (313) estimates noise sections, smoothing is applied to each of A(l) and B(l) obtained from Equations 3 and 4 in order to protect voice sections.

For example, for A(l):

i) when A(l) ≥ A(l-1), A(l) = α1*A(l) + (1-α1)*A(l-1)

ii) when A(l) < A(l-1), A(l) = α2*A(l) + (1-α2)*A(l-1)

Here, 0 < α1 < α2 < 1.

With this smoothing, A(l) increases slowly and decreases quickly.

Conversely, for B(l):

i) when B(l) > B(l-1), B(l) = β1*B(l) + (1-β1)*B(l-1)

ii) when B(l) ≤ B(l-1), B(l) = β2*B(l) + (1-β2)*B(l-1)

Here, 1 > β1 > β2 > 0.

With this smoothing, B(l) increases relatively faster than A(l) and decreases slowly.
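The two smoothing rules share one structure, a first-order recursion whose coefficient depends on whether the ratio is rising or falling, and can be sketched together. The function name `smooth` and the coefficient values are illustrative assumptions; the sketch also assumes that the previous value in each step is the already-smoothed one and that the first frame is passed through unchanged, neither of which the text specifies.

```python
def smooth(values, rise_coeff, fall_coeff):
    """Asymmetric first-order smoothing of a per-frame ratio sequence.

    rise_coeff weights the new value when the ratio rises, fall_coeff when it
    falls. When the new value equals the previous one, both rules return the
    same result, so the >= / > distinction in the text does not matter here.
    """
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # first frame has no predecessor
            continue
        prev = smoothed[-1]  # assumed: previous *smoothed* value
        coeff = rise_coeff if value >= prev else fall_coeff
        smoothed.append(coeff * value + (1 - coeff) * prev)
    return smoothed


raw = [0.0, 1.0, 1.0, 0.0]

# A(l): 0 < alpha1 < alpha2 < 1, so it rises slowly and falls quickly.
a_smoothed = smooth(raw, rise_coeff=0.2, fall_coeff=0.8)

# B(l): 1 > beta1 > beta2 > 0, so it rises quickly and falls slowly.
b_smoothed = smooth(raw, rise_coeff=0.8, fall_coeff=0.2)

print(a_smoothed)
print(b_smoothed)
```

For the same raw sequence, the A-style coefficients reach only 0.36 after two rising frames and then drop to about 0.07, whereas the B-style coefficients reach 0.96 and then fall only to about 0.77.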

The smoothing method may be implemented in various ways using known techniques and is not limited to the example presented here.

Thereafter, when the comparison of the signal power ratio with the threshold does not identify a frame as a noise section, the noise removal unit (313) determines that voice activity is present and outputs a signal indicating voice activity; that is, the frame is judged to be a voice section.

Based on the signal output by the noise removal unit (313), which distinguishes the estimated noise sections from the voice sections, the digitized signals from the microphones (311, 312) are selectively edited, thereby removing the noise.
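One simple way to picture this selective editing is to drop the frames flagged as noise and keep the rest. This is only an assumed interpretation: the text does not say whether noise frames are removed, muted, or processed in another way, and the function name `remove_noise_frames` and its parameters are hypothetical.

```python
def remove_noise_frames(samples, noise_flags, frame_len):
    """Return the samples of all frames that are not flagged as noise.

    samples     -- digitized signal as a flat list
    noise_flags -- one boolean per frame, True where the frame is a noise section
    frame_len   -- number of samples per frame
    """
    kept = []
    for index, is_noise in enumerate(noise_flags):
        if not is_noise:
            start = index * frame_len
            kept.extend(samples[start:start + frame_len])
    return kept


# Four frames of two samples each; the first and last are flagged as noise.
signal = [0, 1, 2, 3, 4, 5, 6, 7]
flags = [True, False, False, True]
cleaned = remove_noise_frames(signal, flags, frame_len=2)
print(cleaned)
```

Only the two middle frames survive in this example; muting the flagged frames instead would preserve the original timing at the cost of silent gaps.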

In this way, the present invention removes noise that could otherwise enter the database unit (32). Such noise reaches the microphones (311, 312) together with the talker's voice signal as ambient noise, such as computer fan noise or TV sound in the room, and is expected to be readily removed by the operation of the noise removal unit (313). A reflected wave formed when the voice signal reflects off furniture or the like, however, has a frequency similar to that of the voice signal and is therefore not easy for the noise removal unit (313) to remove. In other words, reflected waves can cause a faulty voice file to pass through the noise removal unit (313) and be stored as data.

To address this, the present invention presents two embodiments, the first of which is shown in FIG. 5.

The voice input unit (31) of this embodiment has a housing (314) from which the near-field microphone (311) and the far-field microphone (312) are exposed, and the noise removal unit (313) is housed inside the housing (314). The far-field microphone (312) is covered by a reflected-wave blocking cover (315) that has a plurality of flow holes (315-1) on the side facing the near-field microphone (311) and a closed surface (315-2) on the opposite side.

The far-field microphone (312) is enclosed within the reflected-wave blocking cover (315). The talker produces a voice signal close to the near-field microphone (311); the voice signal (S1) travelling directly to the far-field microphone (312) enters through the flow holes (315-1), while reflected waves of the voice signal are blocked from reaching the far-field microphone (312) by the closed surface (315-2) of the reflected-wave blocking cover (315).

That is, the talker's voice reaches the near-field microphone (311) at a higher frequency signal power level than it reaches the far-field microphone (312). Because the voice arrives at the far-field microphone (312) at a low power level, the risk of signal distortion from reflected waves is high there, and reflected waves are therefore blocked from entering the far-field microphone (312).

Preferably, the reflected-wave blocking cover (315) is made of any of various known sound-absorbing materials so that reflected waves are absorbed by the cover (315).

The second embodiment is shown in FIG. 6.

The voice input unit (31) of this embodiment has a housing (314) from which the near-field microphone (311) and the far-field microphone (312) are exposed. A recessed guide groove (316) is formed in the housing (314), running from the near-field microphone (311) to the far-field microphone (312). The far-field microphone (312) is mounted on the side wall at the end of the guide groove (316), and a reflected-wave blocking cover (317) is formed over the end of the guide groove (316) where the far-field microphone (312) is mounted.

The far-field microphone (312) is enclosed by the guide groove (316) and the reflected-wave blocking cover (317). The talker produces a voice signal close to the near-field microphone (311); the voice signal (S1) travelling directly to the far-field microphone (312) follows the guide groove (316) and enters through the channel formed by the guide groove (316) and the reflected-wave blocking cover (317), while reflected waves (S2) of the voice signal are blocked from reaching the far-field microphone (312) by the reflected-wave blocking cover (317).

In this case too, the talker's voice reaches the near-field microphone (311) at a higher frequency signal power level than it reaches the far-field microphone (312). Because the voice arrives at the far-field microphone (312) at a low power level, the risk of signal distortion from reflected waves is high there, and reflected waves are therefore blocked from entering the far-field microphone (312).

In this embodiment as well, the reflected-wave blocking cover (317) is preferably made of any of various known sound-absorbing materials so that reflected waves are absorbed by the cover (317).

From the above description, those skilled in the art will appreciate that various changes and modifications are possible without departing from the technical idea of the present invention. Accordingly, the technical scope of the present invention should not be limited to the detailed description of the specification but should be determined by the claims.

1: present invention 2: personal terminal
3: central server

Claims

A PU-applied CTS combine system comprising: a plurality of personal terminals, each including a text input unit for entering text, a first communication unit that transmits the text signal entered at the text input unit and receives a voice file matching the text signal, a TTS engine that converts the voice file into audio wave data, an audio processing unit connected to the TTS engine that converts the audio wave data produced by the TTS engine into an analog voice signal, and a speaker that outputs the voice signal converted by the audio processing unit; and
a central server including a second communication unit that receives the text signal and transmits a voice file matching the text signal, and a database unit in which voice files are stored.

The system of claim 1,
wherein the central server includes
a voice input unit through which voice files are entered into the database unit, the voice input unit including a near-field microphone located close to the talker, a far-field microphone located farther from the talker, and a noise removal unit that computes the frequency components of the input signals from the near-field microphone and the far-field microphone by Fourier transform, computes the frequency signal power of each microphone from those frequency components, compares the frequency signal power of the near-field microphone with that of the far-field microphone to compute the ratio of the two signal powers, and determines, based on the computed ratio, whether a voice file is to be stored in the database unit.

The system of claim 2,
wherein the voice input unit has a housing from which the near-field microphone and the far-field microphone are exposed, the noise removal unit is housed inside the housing, and the far-field microphone is covered by a reflected-wave blocking cover that has a plurality of flow holes on the side facing the near-field microphone and a closed surface on the opposite side.

The system of claim 2,
wherein the voice input unit has a housing from which the near-field microphone and the far-field microphone are exposed, a recessed guide groove is formed in the housing running from the near-field microphone to the far-field microphone, the far-field microphone is mounted on the side wall at the end of the guide groove, and a reflected-wave blocking cover is formed over the end of the guide groove where the far-field microphone is mounted.