KR20030061241A

KR20030061241A - Conference service system and method using voice recognition and voice synthesis

Info

Publication number: KR20030061241A
Application number: KR1020020001805A
Authority: KR
Inventors: 전화성; 최도연
Original assignee: 에스엘투(주); (주)티아이스퀘어
Priority date: 2002-01-11
Filing date: 2002-01-11
Publication date: 2003-07-18

Abstract

PURPOSE: A conference service system using voice recognition and synthesis functions, and a method therefor, are provided to remarkably increase the number of simultaneously accessible users, to receive necessary information from a user through a voice recognition function, and to supply necessary information to the user through a voice synthesis function. CONSTITUTION: A conference service supplying system(100) has a trunk line interface(210). The trunk line interface(210) receives a user's signal from an external PSTN/mobile phone network or IP network and transmits signals to that network. The trunk line interface(210) is connected to a call processing controller(212), which controls the processing of the received call signal. The call processing controller(212) is connected to an H.110 bus controller(214), an internal bus controller(216), and a DSP controller(218). The H.110 bus controller(214) controls an H.110 bus that handles connections and data exchange with other boards. The internal bus controller(216) controls an internal bus that handles connections and data exchange between modules. The DSP controller(218) processes digital signals.

Description

Conference service system and method using voice recognition and voice synthesis

The present invention relates to a conference service system and method using voice recognition and voice synthesis, and more particularly, to a conference system and method that can greatly increase the number of users who can be connected simultaneously, that receives necessary information from a user through voice recognition, and that provides necessary information to the user through voice synthesis.

To let multiple parties at remote locations exchange information by voice, voice conference systems using telephones, and voice conference systems using Internet telephones or personal computers, have been developed and are in use.

In a multi-party call, the voice message from each participant must be delivered to all of the other participants. To this end, telephone-based voice conference systems sometimes use dedicated conference telephones, and in other cases the service system that provides the conference service supplies this function.

However, most of these conventional methods limit the number of users who can be connected simultaneously to a certain number. This is due not only to system limitations but also to the fact that a multi-party conversation itself becomes difficult when more than a certain number of users are connected at the same time.

Meanwhile, methods that use the voice interface, the interface most familiar to users, to control various devices or to provide various services are spreading. A conference service providing system and method would likewise be much more convenient to use if users could enter information by voice and have it recognized and processed, and could receive the information they need as synthesized speech.

An object of the present invention is to provide a service more advanced than such prior art, namely a conference service system and method that can greatly increase the number of users who can access it simultaneously.

Another object of the present invention is to provide a conference service system and method that can be used through the voice interface most familiar to users, by acquiring the information a user wishes to enter through voice recognition and by providing the information the user needs through voice synthesis.

Still another object of the present invention is to provide a system and method for providing various voice information services, such as a value-added voice information service using a specific voice timbre and a two-way automatic voice response service.

FIG. 1 is a diagram schematically showing the overall network configuration for providing a conference service using voice recognition and voice synthesis according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a conference service providing system according to an embodiment of the present invention.

FIG. 3 is a diagram schematically showing an example of the hardware configuration of a conference service providing system according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating the user configuration in a conference service according to an embodiment of the present invention.

FIG. 5 is a diagram showing another example of the hardware configuration of a conference service providing system according to an embodiment of the present invention.

FIGS. 6A and 6B are block diagrams illustrating the configurations of the trunk line board and the voice processing board of FIG. 5, respectively.

FIG. 7 is a flowchart of a conference service according to an embodiment of the present invention.

FIG. 8 is a diagram schematically illustrating the process of obtaining a conversion function for the voice conversion used in an information service with a specific voice, according to an embodiment of the present invention.

FIG. 9 is a diagram schematically illustrating the process of converting a source voice into the voice of a target speaker using the conversion function obtained by the method shown in FIG. 8.

FIG. 10 is a diagram of the overall network of a system for providing a two-way automatic voice response service according to an embodiment of the present invention.

FIG. 11 is a flowchart of a two-way automatic voice response service according to an embodiment of the present invention.

To achieve these objects, the present invention connects two or more voice processing boards through an H.110 bus so that many users can use the conference service simultaneously. Some of these users are set as active users, who can both transmit and receive voice signals, while the remaining users can only receive the voice signals from the active users. Information that a user enters by voice is recognized by a voice recognition unit, and information to be delivered to a user is synthesized and provided by a voice synthesis unit.

That is, the conference service providing system according to the present invention includes a trunk line interface unit that receives a user's signal from an external communication network and transmits signals to the external communication network, a call processing controller that controls the processing of the received call signal, an H.110 bus controller that controls the H.110 bus used for connection and data exchange with other boards, an internal bus controller that controls the internal bus used for connection and data exchange among the modules within its own board, and a DSP controller that processes the digital signals used in the conference service, and it comprises at least two boards connected through the H.110 bus.

The system may further include a voice recognition unit, connected to the DSP controller, that recognizes information a user delivers by voice; a voice synthesis unit, connected to the DSP controller, that synthesizes information to be provided to a user into speech; and/or a voice recording unit, connected to the DSP controller, that records the conversation, that is, the voice information exchanged through the conference service, to a recording device. It may also further include a reception processor, a transmission processor, and a mixing processor, each connected to the DSP controller, that handle data reception, data transmission, and data mixing, respectively, for the conference service.

In the conference service providing system of the present invention, the call processing controller and the DSP controller operate together so that a voice signal from any one of the first users (a subset of all users who are connected to the system and using the conference service) is delivered to every user except the one who sent it, while a voice signal from any one of the second users (all users other than the first users) is ignored.
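The forwarding rule for first (active) and second (passive) users can be sketched as a small function. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Role` and `recipients`, and the use of string participant IDs, are hypothetical, and a real system would apply the rule per audio frame inside the DSP and mixing path.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    ACTIVE = "active"    # a "first user": may transmit and receive
    PASSIVE = "passive"  # a "second user": may only receive


def recipients(sender: str, roles: Dict[str, Role]) -> List[str]:
    """Return the participants that should receive a voice signal from `sender`.

    Hypothetical helper: a signal from an active user goes to everyone except
    the sender; a signal from a passive user is ignored (no recipients).
    """
    if roles[sender] is Role.PASSIVE:
        return []
    return [user for user in roles if user != sender]


if __name__ == "__main__":
    room = {"T1": Role.ACTIVE, "L1": Role.PASSIVE, "L2": Role.PASSIVE}
    print(recipients("T1", room))  # ['L1', 'L2']
    print(recipients("L1", room))  # []
```

Because passive users' signals are dropped before any delivery, the amount of audio that has to be distributed grows with the number of talkers rather than with the total number of participants.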

Meanwhile, the conference service providing method according to the present invention uses a conference service providing system that includes a trunk line interface unit that receives a user's signal from an external communication network and transmits signals to the external communication network, a call processing controller that controls the processing of the received call signal, an H.110 bus controller that controls the H.110 bus used for connection and data exchange with other boards, an internal bus controller that controls the internal bus used for connection and data exchange among the modules within its own board, and a DSP controller that processes the digital signals used in the conference service. The method comprises the steps of: passing a call signal from a user through the trunk line interface unit to the call processing controller when the user connects to the conference service providing system through a communication network; determining whether the number of connected users exceeds the number of users who can be connected simultaneously; if it does not, determining whether the user will be an active user, allowed both to transmit and receive signals, or a passive user, allowed only to receive signals; and, if the user is determined to be an active user, delivering the voice signal from that user to all users other than that user, or, if the user is determined to be a passive user, ignoring the voice signal from that user without delivering it to other users.
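The admission and role-decision steps can be sketched as follows. This is a hypothetical illustration: the source says neither how roles are decided nor what happens when the room is full, so the sketch assumes first-come, first-served active seats (via a made-up `max_active` limit) and simply returns `None` for a rejected caller.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    ACTIVE = "active"    # allowed to transmit and receive
    PASSIVE = "passive"  # allowed only to receive


@dataclass
class ConferenceRoom:
    max_users: int   # number of users who can be connected simultaneously
    max_active: int  # hypothetical cap on active users (talkers)
    roles: Dict[str, Role] = field(default_factory=dict)

    def admit(self, user: str) -> Optional[Role]:
        """Admit a newly connected user and decide their role.

        Returns None if admitting the user would exceed the simultaneous-access
        limit (the source does not say what happens in that case).
        """
        if len(self.roles) + 1 > self.max_users:
            return None
        active = sum(1 for role in self.roles.values() if role is Role.ACTIVE)
        # Assumed policy: active seats go to the earliest arrivals.
        role = Role.ACTIVE if active < self.max_active else Role.PASSIVE
        self.roles[user] = role
        return role
```

Other selection policies (auction, lottery) would only change the line that picks `role`; the capacity check stays the same.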

The method may further include storing the voice signal from the active user in a recording device. Optionally, the conference service providing system may further include a reception processor, a transmission processor, and a mixing processor, each connected to the DSP controller, that handle data reception, data transmission, and data mixing, respectively, for the conference service; in that case, in the step of processing the voice signal from the user, signal reception, signal transmission, and signal mixing are performed by the reception processor, the transmission processor, and the mixing processor.

The method for providing a voice information service with a specific timbre according to the present invention uses a voice information service apparatus that includes a trunk line interface unit that receives signals from an external communication network and delivers signals to a user through the external communication network, a call processing controller that controls the processing of incoming and outgoing call signals, a voice synthesis processor that uses a conversion function to synthesize, from a source voice, the voice of a target speaker with a specific timbre, a DSP controller that processes the digital signals used in the voice information service, and a storage device that stores information provided by users and other information needed to provide the voice information service. The method comprises: a first step in which, when a user accesses the apparatus through a communication network using a user terminal and enters the details of the voice information service, including the content of the voice information to be provided, the timbre, and the time at which it is to be provided, those details are stored in the storage device of the apparatus; a second step in which the voice synthesis processor synthesizes the content of the requested voice information into speech with the specific timbre selected by the user; and a third step in which the synthesized speech is delivered to the user at the delivery time the user entered. The content of the requested voice information may be, for example, a wake-up call or schedule information.

Preferably, the conversion function used by the voice synthesis processor to synthesize speech with the desired timbre is obtained through the steps of: analyzing recorded voices of the source speaker and the target speaker with an HNM (Harmonic and Noise Model) to extract the spectral information of each voice; aligning corresponding positions of the same sounds in the two sets of spectral information using DTW (Dynamic Time Warping); expressing the relationship between the two sets of spectral information as a GMM (Gaussian Mixture Model) and training it progressively with the EM (Expectation Maximization) algorithm; and optimizing the trained GMM using the least squares method. Preferably, the second step includes: step 2-1 of performing HNM analysis on the source voice signal carrying the requested voice information, to separate the voiced segments into a spectral envelope and sound source (excitation) information; step 2-2 of converting the spectral envelope into the desired spectral envelope using the conversion function; step 2-3 of applying noise removal to the converted spectral envelope; step 2-4 of mapping the sound source information to the sound source information of the desired timbre; and step 2-5 of synthesizing speech with the desired timbre by HNM synthesis from the mapped sound source information and the noise-removed spectral envelope.
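The DTW alignment step can be illustrated with the textbook dynamic-programming recursion below. It is a sketch under assumptions: frames are taken to be fixed-length spectral feature vectors (for example, parameters derived from the HNM analysis), and the Euclidean frame distance and the three-move step pattern are common choices assumed here, not details given in the source. The returned index pairs are the source/target frame correspondences that would then be used to train the GMM.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(src: Sequence[Frame], tgt: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two frame sequences; return (total cost, list of (src_idx, tgt_idx) pairs)."""
    n, m = len(src), len(tgt)
    # cost[i][j] = cheapest alignment of src[:i] with tgt[:j]; row/column 0 are sentinels.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match
                                 cost[i - 1][j],      # source frame repeated
                                 cost[i][j - 1])      # target frame repeated
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

For instance, aligning the one-dimensional sequences `[0, 1, 2]` and `[0, 1, 1, 2]` maps the middle source frame onto both repeated target frames at zero cost.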

In the first step, the user terminal may be a wired or wireless telephone and the communication network may be a public switched telephone network or a mobile telephone network. In this case, in the third step, the synthesized speech may be delivered to the user terminal that the user used in the first step to access the apparatus.

Alternatively, in the first step, the user terminal may be a personal computer and the communication network may be the Internet.

The voice information service details that the user provides in the first step may also include user terminal information specifying the terminal that is to receive the voice information service; in that case, in the third step, the synthesized speech may be delivered to the user terminal specified by the user in the first step.

Preferably, after the second step, the method further includes storing the speech synthesized in the second step as a file in the storage device of the apparatus. In the third step, the synthesized speech may be delivered to the user after the call processing controller and the trunk line interface set up an outgoing call to the user's terminal, or it may be delivered as a voice message to the user's terminal.
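The storage and timed-delivery part of the first and third steps can be sketched as a priority queue ordered by delivery time. This is a hypothetical in-memory stand-in: `VoiceRequest` and `DeliveryScheduler` are illustrative names, synthesis (the second step) and the actual outgoing call or voice-message delivery are left out, and the destination rule (explicitly specified terminal, else the terminal the request came from) follows the two variants described above.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(order=True)
class VoiceRequest:
    deliver_at: datetime                          # only field used for ordering
    caller: str = field(compare=False)            # terminal used to register the request
    content: str = field(compare=False)           # e.g. wake-up call or schedule text
    timbre: str = field(compare=False)            # voice selected by the user
    target: Optional[str] = field(default=None, compare=False)  # explicitly chosen terminal


class DeliveryScheduler:
    """Hypothetical in-memory stand-in for the storage device and delivery timer."""

    def __init__(self) -> None:
        self._queue: List[VoiceRequest] = []

    def submit(self, request: VoiceRequest) -> None:
        heapq.heappush(self._queue, request)

    def due(self, now: datetime) -> List[Tuple[str, VoiceRequest]]:
        """Pop every request whose delivery time has arrived.

        Returns (destination, request) pairs; the destination is the terminal the
        user specified, falling back to the terminal the request came from.
        """
        ready = []
        while self._queue and self._queue[0].deliver_at <= now:
            request = heapq.heappop(self._queue)
            ready.append((request.target or request.caller, request))
        return ready
```

A periodic timer calling `due(datetime.now())` would then hand each destination and its stored audio file to the call processing side.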

In addition, the apparatus for providing a two-way automatic response service according to the present invention includes a trunk line interface that receives a user's signal from an external communication network and transmits signals to the external communication network, a call processing control module that controls the processing of the received call signal, a voice recognition module that recognizes the voice signal entered by a user, a keyword extraction and processing module that extracts keywords from the recognized speech and processes them accordingly, and a database that stores the information needed to connect a user to the desired service.

The method for providing a two-way automatic response service according to the present invention uses the apparatus described above and comprises the steps of: passing a call signal from a user through the trunk line interface to the call processing control module when the user connects to the apparatus through a communication network; recognizing, with the voice recognition module, the voice signal the user enters when speaking the details of the desired service; extracting keywords from the recognized voice signal with the keyword extraction and processing module and determining the type of service to be provided to the user by looking up the extracted keywords in the database; and connecting the user to the determined service.

Here, in the voice recognition step, the user may speak the matter on which he or she wants counseling; in the service determination step, the counselor to whom the user will be connected is determined using the keywords extracted from the user's request; and in the service connection step, the user and the counselor may be connected directly by telephone.
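The keyword extraction and database lookup can be sketched as below, starting from a transcript the voice recognition module has already produced. This is a hypothetical illustration: the routing table, the `operator` fallback, and the most-frequent-keyword rule are assumptions, and simple whitespace-style tokenization stands in for whatever keyword spotting the system actually uses (Korean speech would normally need morphological analysis).

```python
import re
from collections import Counter
from typing import Dict, List


def extract_keywords(transcript: str, vocabulary: Dict[str, str]) -> List[str]:
    """Return the words of a recognized transcript that appear in the keyword vocabulary."""
    tokens = re.findall(r"\w+", transcript.lower())
    return [t for t in tokens if t in vocabulary]


def route(transcript: str, routing_db: Dict[str, str], default: str = "operator") -> str:
    """Choose a service or counselor for a recognized request.

    The most frequent matching keyword wins; with no match the caller goes to `default`.
    """
    hits = extract_keywords(transcript, routing_db)
    if not hits:
        return default
    keyword, _ = Counter(hits).most_common(1)[0]
    return routing_db[keyword]


if __name__ == "__main__":
    db = {"loan": "loan_counselor", "card": "card_counselor"}  # hypothetical table
    print(route("I lost my card, please block the card", db))   # card_counselor
```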

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram showing the configuration of the overall network in which the conference service according to the present invention is provided.

As shown in FIG. 1, the conference service providing system 100 of the present invention is connected to a public switched telephone network/mobile digital cellular network (PSTN/MDCN) 110 through E1/PRI and R2 interfaces, and to an IP network 120, which uses the Internet protocol, through the HTTP/H.323 protocols. A user whose terminal is a wired telephone 112 or a wireless telephone 114 can access the conference service providing system 100 through the PSTN/MDCN 110, and a user whose terminal is a personal computer 116, an Internet telephone 118, or the like can access it through the IP network 120.

FIG. 2 is a block diagram showing the configuration of the conference service providing system 100 of FIG. 1 in more detail.

As shown in FIG. 2, the conference service providing system 100 has a trunk line interface unit 210 that receives users' signals from the external public telephone network/mobile telephone network 110 or the IP network 120 and transmits signals to them. The trunk line interface unit 210 is connected to a call processing controller 212 that controls the processing of the received call signals. The call processing controller 212 is connected to an H.110 bus controller 214 that controls the H.110 bus used for connection and data exchange with other boards, an internal bus controller 216 that controls the internal bus used for connection and data exchange among the modules within the board, and a DSP controller 218 that processes the digital signals used in the conference service.

Unlike an ordinary one-to-one call, a conference service lets three or more users talk together. Such a service requires that the signals sent and received by multiple users be shared: a signal transmitted by any one of the users of the conference service must be delivered to each of the remaining users. This concept is usually explained in terms of a virtual conference room. The number of users who can take part in one conference room is limited; typically, an implementation on a single board lets 12 users use the conference system. To increase the number of users who can take part in one conference room, the present invention connects multiple boards to one another, and the connected boards exchange data over an H.110 bus. The H.110 bus controller 214 handles the transmission and reception of data over the H.110 bus that connects the boards.

Data exchange between modules within a single board takes place over an internal bus, which is controlled by the internal bus controller 216.

That is, a conference service providing system that many users can access and use simultaneously can be implemented by connecting multiple boards 100_1, 100_2, ..., 100_n, each having the components shown in FIG. 2, using the H.110 protocol, as shown in FIG. 3. However, when the system is implemented with multiple boards, not every board needs the configuration of FIG. 2; each functional module may be included selectively on each board. Such an example is described later.

When a user's connection signal reaches the call processing controller 212 through the trunk line interface unit 210, the call processing controller 212 first determines on which board the call signal should be processed. Depending on the result, the H.110 bus controller 214 routes the call signal to that board, and the internal bus controller 216 of that board then routes it to the appropriate module. If, however, the call can be processed on the board that received it, the H.110 bus controller 214 is bypassed and control passes directly to the internal bus controller 216 of that board.
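The board-selection decision can be sketched as below. The source only states that the call processing controller decides which board handles the call; the selection order used here (a preferred board such as the one hosting the conference room, then the ingress board, then any board with free capacity) and the per-board `capacity` counts are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Board:
    capacity: int    # calls this board can serve (hypothetical figure)
    in_use: int = 0


def route_call(ingress: int, boards: Dict[int, Board],
               preferred: Optional[int] = None) -> Tuple[int, bool]:
    """Pick the board that will process a new call.

    Returns (board_id, crosses_h110). crosses_h110 is False when the call stays on
    the board that received it, so only that board's internal bus is involved.
    """
    order: List[int] = ([preferred] if preferred is not None else []) + [ingress] + sorted(boards)
    for board_id in order:
        board = boards[board_id]
        if board.in_use < board.capacity:
            board.in_use += 1
            return board_id, board_id != ingress
    raise RuntimeError("no board has free capacity")
```

Keeping a call on its ingress board whenever possible avoids spending H.110 bus timeslots on it, which mirrors the bypass case described above.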

The DSP controller 218 is connected to an Rx processor 226, a Tx processor 228, and a mixing processor 230, which handle data reception (Rx), data transmission (Tx), and data mixing, respectively, to provide the conference service. It is also connected to a voice recognition controller 220 that controls recognition when information a user delivers by voice must be recognized, a voice synthesis controller 222 that controls synthesis when information a user needs must be synthesized into speech and delivered, and a voice recording and broadcasting processor 224 that records or broadcasts the voice information provided through the conference service directly within the system.
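The source does not describe how the mixing processor combines audio. One common approach, assumed here, is "mix-minus" (N-1) mixing: sum the talkers' samples once, then give each participant that sum minus their own contribution, saturating to the sample range. The sketch assumes decoded 16-bit linear PCM frames of equal length (companded G.711 audio from the trunk would have to be expanded first); the function names are illustrative.

```python
from typing import Dict, Iterable, List, Set

PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit linear PCM range


def clip(sample: int) -> int:
    """Saturate a summed sample to the 16-bit range instead of letting it wrap."""
    return max(PCM_MIN, min(PCM_MAX, sample))


def mix_minus(frames: Dict[str, List[int]], talkers: Set[str],
              participants: Iterable[str]) -> Dict[str, List[int]]:
    """Build one output frame per participant from the talkers' input frames.

    Each participant hears the sum of all talkers except itself; frames from
    non-talkers (passive users) never enter the mix.
    """
    length = len(next(iter(frames.values())))
    total = [0] * length
    for talker in talkers:
        for k, sample in enumerate(frames[talker]):
            total[k] += sample
    mixed = {}
    for p in participants:
        own = frames[p] if p in talkers else [0] * length
        mixed[p] = [clip(total[k] - own[k]) for k in range(length)]
    return mixed
```

Since every passive listener receives the same frame, a real implementation could compute it once and fan it out, which is what keeps the cost manageable with a thousand or more listeners.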

In theory, a conference service is possible no matter how many users join one conference room; in practice, however, once more than a certain number of users join a room, the conference itself can become difficult. In particular, in the "meet the star" room service described as an example in the present invention, a thousand or more users join one conference room at the same time, and it would be impossible for all of them to talk together.

To solve this problem, the conference service providing system of the present invention lets only some of the users in a conference room transmit. The remaining users only receive what the transmitting users send. This behavior is implemented within the conference service providing system, not in the user devices.

FIG. 4 shows the user configuration of the conference service according to an embodiment of the present invention. Active users (talkers, marked "T") are chosen in a predetermined manner, and their signals are delivered to every other user. Signals from the remaining passive users (listeners, marked "L") are ignored and are not delivered to any other user.
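The talker/listener rule can be illustrated with a simple per-frame mixer. This is a minimal sketch and assumes floating-point PCM frames of equal length. The "mix-minus" choice (a talker does not hear its own voice) is common conferencing practice rather than something the patent specifies. `mix_conference` is a hypothetical name.

```python
import numpy as np


def mix_conference(frames: dict[str, np.ndarray], talkers: set[str]) -> dict[str, np.ndarray]:
    """Return the output frame each participant hears for one audio frame."""
    length = len(next(iter(frames.values())))
    total = np.zeros(length, dtype=float)
    for uid in talkers:
        total += frames[uid]              # only talker input is mixed
    out = {}
    for uid in frames:
        if uid in talkers:
            out[uid] = total - frames[uid]  # talkers hear the other talkers
        else:
            out[uid] = total.copy()         # listeners hear all talkers; their own input is ignored
    return out
```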

Active users can be selected from the participants in various ways. Examples include an auction (highest fee paid first), a first-come, first-served basis, or a lottery. If necessary, the active users within a single conference room can also change at predetermined time intervals or when a specific event occurs. This is described later.

As briefly mentioned above, the configuration shown in FIG. 2 need not be implemented on a single hardware board. Nor does the overall system have to consist of several identical boards connected together as shown in FIG. 3. The functional modules can instead be distributed across several boards as needed. In particular, when additional services are offered alongside the conference service, their functions need not be present on every board. In that case it can be advantageous to configure the system as shown in FIGS. 5 and 6.

That is, the system is divided into two kinds of board:

- trunk line boards 510_1 and 510_2, which provide general access functions and additional service functions;
- conference service boards 520_1, ..., 520_n, which provide the conference service.

The trunk line board 510 includes a trunk line interface 610, a call processing controller 612 and an H.110 bus controller 614. It also includes a speech recognition controller 620, a speech synthesis controller 622 and a voice recording and broadcasting processor 624, which can be used to provide additional services.

The conference service board 520, by contrast, includes an internal bus controller 616 for controlling the internal bus on that board. It also includes a DSP controller 618, which performs the signal processing for the conference service, and the Rx processor 626, Tx processor 628 and mixing processor 630 connected to it.

The conference service method is now described in detail, taking an actual service built on this system as an example. The conference service of the present invention is implemented as a so-called "meeting room" service that brings stars and their fans together.

A star connects to the conference service providing system during a scheduled time slot. Users who want to talk with that star then join the virtual conference room to which the star is connected. Because the service brings stars and fans together, the number of simultaneous users becomes very large. The exact limit depends on the hardware configuration, but 1,000 or more users can use the "meeting room" service at the same time.

The star is an active user (talker) by default. A certain number of the other users are also set as active users, and the rest are set as passive users (listeners). The active users can enjoy a multi-party call with the star, while the passive users listen to the conversation between the star and the active users. In effect, the conversation is broadcast to the passive users. Messages from active and passive users are controlled by the DSP controllers 218 and 618 and the processors connected to them: Rx processors 226 and 626, Tx processors 228 and 628, and mixing processors 230 and 630.

The users admitted to a star's meeting room can be selected on a first-come, first-served basis. The active users among them can be selected in order of the highest fee paid. For convenient billing, the meeting room service is preferably run on a membership basis.

The per-minute billing used in ordinary ARS (automatic response service) services could be used. However, the fee paid decides whether a user may take part in the conversation, so a separate payment method is preferable. For example, users can prepay a certain amount before using the service. The prepaid balance can be topped up by bank transfer, credit card, mobile phone payment, or billing together with the telephone bill. For this purpose, a billing server (not shown in the drawings) may be provided, along with a database that stores member information and balance information.
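The prepaid model can be sketched as a small account object. This is a hypothetical illustration of what the billing server and database might track; it is not the patent's implementation. The class name, the method names and the payment-method labels are all assumptions.

```python
# Hypothetical labels for the top-up methods listed in the text.
PAYMENT_METHODS = {"bank_transfer", "credit_card", "mobile_phone", "phone_bill"}


class PrepaidAccount:
    """Minimal prepaid balance for a meeting-room member (illustrative only)."""

    def __init__(self, member_id: str) -> None:
        self.member_id = member_id
        self.balance = 0

    def top_up(self, amount: int, method: str) -> None:
        if method not in PAYMENT_METHODS:
            raise ValueError(f"unsupported payment method: {method}")
        if amount <= 0:
            raise ValueError("top-up amount must be positive")
        self.balance += amount

    def charge(self, amount: int) -> bool:
        # Refuse (rather than overdraw) when the balance is insufficient.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True
```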

FIG. 7 is a flowchart of the meeting room service of the present invention.

First, the user connects to the conference service providing system using a wired or wireless telephone, a computer, an Internet phone or a similar device (S710). The user then selects the meeting room service by following the interactive voice response prompts (S720). If the service is membership-based, the user completes member authentication by following voice prompts, and can also sign up as a member the same way if needed.

Program information, such as which star appears in the meeting room at which time, can be promoted in advance online or offline. The user can also choose to hear it through voice prompts. Programs can be organized by time slot, and several conference rooms may be open in the same slot. In that case, after entering the meeting room service, the user selects the meeting room of the star he or she wants to join.

When the user selects the meeting room service, the system checks how many users are connected to that meeting room to decide whether the user can enter (S730).

If entry is possible, the user enters the meeting room, that is, is connected to the meeting room service (S750). If the room has already reached its limit of simultaneous users, a voice prompt tells the user that the connection is not possible (S740).
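Steps S730 to S750 amount to a capacity check before admission. A minimal sketch follows. `MeetingRoom` and `try_enter` are hypothetical names, and the voice prompts are represented only by the boolean result.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingRoom:
    star: str
    capacity: int                          # maximum simultaneous users (hardware-dependent)
    members: list[str] = field(default_factory=list)

    def try_enter(self, user_id: str) -> bool:
        # S730: compare the current occupancy with the limit.
        if len(self.members) >= self.capacity:
            return False                   # S740: caller hears a "cannot connect" prompt
        self.members.append(user_id)       # S750: user is admitted to the room
        return True
```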

After entering the meeting room, the user decides on and enters the fee he or she is willing to pay (S760). This is the amount the user offers in order to become an active user in that meeting room. The amount can be keyed in on the telephone keypad, where it is recognized as DTMF tones. It can also be spoken aloud, where it is recognized by speech recognition. Once the amount has been entered, the system preferably reads it back using speech synthesized by the speech synthesis controller 222 so that the user can confirm it.

After the amount has been recognized, the system decides whether the user will be an active user or a passive user (S770). For example, the system can compare the amounts entered by all users and designate a predetermined number of the highest bidders as active users.

The resulting status is announced to the user by voice. An active user can take part in a multi-party call with the star (S780). A passive user listens to the multi-party call between the star and the active users (S790).

User status can be handled in several ways. It may stay the same for the whole meeting room session, or it may change at predetermined time intervals. It may also change under the service operator's control, or in response to specific events involving the star in the meeting room.
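One of the options above, rotating active users at fixed intervals, can be sketched as a round-robin over the waiting users. This is a minimal sketch under stated assumptions:

- The star is always a talker.
- `queue` is an ordered list of the other users.
- `k` talker slots are handed out in turn for each interval.

The function name and the round-robin policy are illustrative choices, not the patent's.

```python
def talkers_for_interval(star: str, queue: list[str], k: int, interval_index: int) -> set[str]:
    """Active users during the given interval under a round-robin rotation."""
    n = len(queue)
    if n == 0 or k <= 0:
        return {star}
    k = min(k, n)
    start = (interval_index * k) % n        # each interval advances by k users
    chosen = {queue[(start + i) % n] for i in range(k)}
    return {star} | chosen                  # the star always keeps the talker role
```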

The call can also be recorded automatically within the conference service system. For this purpose, the system includes a recording processor that stores the conversation on a dedicated storage device such as a hard disk. The recording can later be produced on a CD or similar medium and provided to users.

The present invention also provides a voice wake-up call service and a voice scheduling service, together with systems for them. Both services deliver messages in a specific person's voice. They are implemented as the so-called "star wake-up call" and "star scheduling" services.

The star wake-up call service is described first.

The user connects to the voice wake-up call service system and selects two things: the star whose voice will deliver the wake-up call, and the time of the call. The user also enters the telephone number that should receive the wake-up call. If the user connected by telephone, the system can set this number automatically without separate input. The user may, however, choose a different number from the one used to connect. If needed, the user can also choose the wording of the wake-up call.

Using the information entered by the user, the speech synthesis module synthesizes the wake-up message in the star's voice. Because the message may be used repeatedly, it is preferably saved as a voice file.

At the wake-up time set by the user, the system calls the user's chosen number and plays the previously synthesized wake-up message.

The wake-up message is synthesized in the star's voice in two stages. First, a conversion function is derived from recordings of a source speaker and the target speaker (the star) uttering the same content. This function maps the source speaker's voice onto the star's voice. Second, the function is used to convert the desired message into the star's voice. The method is described in more detail below.

The purpose of voice conversion is to change the voice of a particular speaker A into the voice of a desired speaker B. In the present invention, A might be the operator of the star wake-up call service and B a particular star.

Voice conversion first requires extracting, in numerical form, the characteristics of a person's speech that identify that person. Fully representing a voice would mean quantifying the characteristics of the vocal tract itself (the voiceprint). It would also mean quantifying many other factors, such as pronunciation habits, intonation and prosody. This is very difficult and close to impossible with current technology.

Very short utterances, such as a single word, are a different matter. For them, much of the information beyond the word itself (average speaking rate, pitch level, intonation) has little influence on whether the voice is recognized as a particular person's. Instead, the characteristics of that person's vocal tract play the major role in identifying the speaker.

Therefore, for short utterances, fairly satisfactory results can be obtained even if voice conversion uses only the spectral envelope, which represents the speaker's vocal tract characteristics.

Voice conversion therefore reduces to finding a suitable model of the probabilistic relationship between the spectral envelopes of two speakers uttering the same content. From this model, a conversion function that maps one spectral envelope onto the other is derived. The model must be trained on experimental data.

FIG. 8 schematically shows how the conversion function for voice conversion is derived. First, the source speaker and the target speaker each record the same content (810_s, 810_t). Spectral envelope information (830_s, 830_t) is extracted from each recording using HNM (Harmonic plus Noise Model) analysis (820_s, 820_t). The same sounds in the two recordings are then aligned with each other using DTW (Dynamic Time Warping) (840).
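The DTW alignment step (840) can be illustrated with the textbook dynamic-programming recurrence. This is a minimal sketch; the patent does not give its step pattern or local distance. Here each frame is a spectral-envelope feature vector, the local cost is Euclidean distance, and only the three standard moves (diagonal, vertical, horizontal) are allowed. `dtw_path` is a hypothetical name.

```python
import numpy as np


def dtw_path(src: np.ndarray, tgt: np.ndarray) -> list[tuple[int, int]]:
    """Align two feature sequences (frames x dims); return (src_idx, tgt_idx) pairs."""
    n, m = len(src), len(tgt)
    # Local cost: Euclidean distance between every pair of frames.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(cost[i - 1, j - 1],
                                                  cost[i - 1, j],
                                                  cost[i, j - 1])
    # Backtrack from the end to recover the alignment path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

The resulting pairs give aligned (source frame, target frame) training examples for the model described next.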

To obtain the conversion function, a probabilistic model of the relationship between the two spectra must be trained from the data. For each aligned sound, the relationship between the two sets of spectral information is represented by a GMM (Gaussian Mixture Model). The model is trained iteratively with the EM (Expectation Maximization) method (860). The trained model is then matched again sound by sound, using the alignment path obtained from DTW.

The GMM (Gaussian Mixture Model) is a classic model used in many pattern recognition techniques. It assumes that the observed feature vectors are drawn from a weighted sum of normal (Gaussian) distributions, and that the observations are independent of one another. This makes it suitable when the order of the observations does not matter. It can be viewed as an HMM (Hidden Markov Model) whose output distributions are Gaussian and whose probability of moving to each state is the same.

The GMM can be used to build the voice conversion function because the function's input does not depend on time. It depends only on the spectral envelope of the source speech at a given instant.

In the GMM, each phoneme is ultimately characterized by the mean and the spread of its distribution. Once the GMM has been trained, the probability that a given input vector belongs to a particular phoneme class is given by a conditional probability.
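Written out in standard mixture notation, the conditional probability looks as follows. The symbols are not defined in the patent and are introduced here only for illustration: $m$ components $C_i$, weights $\alpha_i$, means $\boldsymbol{\mu}_i$ and covariances $\boldsymbol{\Sigma}_i$. The posterior of component $C_i$ given a spectral vector $\mathbf{x}$ is:

```latex
p(\mathbf{x}) = \sum_{j=1}^{m} \alpha_j \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j),
\qquad \sum_{j=1}^{m} \alpha_j = 1,

p(C_i \mid \mathbf{x}) = \frac{\alpha_i \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)}
                              {\sum_{j=1}^{m} \alpha_j \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
```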

The GMM parameters can be estimated from a given training set (in practice, a set of vectors) using the EM (Expectation Maximization) algorithm.

The EM algorithm is a general method for fitting such a model. It takes the vectors of the training set and repeatedly adjusts the model parameters so as to maximize the expected likelihood of each vector under the phoneme component it fits. Iterating until the model converges yields the conversion function.
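EM for a GMM alternates an E-step (posterior responsibilities) and an M-step (re-estimated weights, means and variances). The sketch below is a minimal batch implementation under stated assumptions:

- Covariances are diagonal, a common simplification that the patent does not state.
- The initial means are randomly chosen training vectors.
- Iteration stops when the mean log-likelihood stops improving.

```python
import numpy as np


def log_gaussian_diag(x, means, variances):
    """Log N(x; mu_i, diag(var_i)) for every frame and component -> shape (T, m)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=-1)


def fit_gmm(x, m, iters=200, tol=1e-8, seed=0):
    """Fit an m-component diagonal-covariance GMM to x (T x d) with EM."""
    rng = np.random.default_rng(seed)
    T, d = x.shape
    means = x[rng.choice(T, m, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + 1e-6, (m, 1))
    weights = np.full(m, 1.0 / m)
    prev = -np.inf
    for _ in range(iters):
        # E-step: responsibilities p(C_i | x_t), computed in the log domain.
        log_p = np.log(weights) + log_gaussian_diag(x, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate parameters from responsibility-weighted statistics.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / T
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
        ll = log_norm.mean()          # mean log-likelihood before this update
        if ll - prev < tol:
            break
        prev = ll
    return weights, means, variances
```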

The conversion function obtained in this way is then further optimized by the least squares method (860).
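The least squares refinement step can be illustrated with the widely used GMM-based mapping that has the following form (the source does not spell out its exact form):

F(x) = Σ_i p(C_i | x) [ν_i + Γ_i Σ_i⁻¹ (x − μ_i)]

With the GMM fixed, the unknowns ν_i and Γ_i enter linearly, so ordinary least squares on DTW-aligned (x, y) frame pairs can solve for them. This sketch rests on several assumptions:

- This mapping form is the one intended by the source.
- Γ_i is diagonal, so each output dimension can be fitted separately.
- The function names are hypothetical.

```python
import numpy as np


def gmm_posteriors(x, weights, means, variances):
    """p(C_i | x_t) for a diagonal-covariance GMM -> shape (T, m)."""
    diff = x[:, None, :] - means[None, :, :]
    log_p = np.log(weights) - 0.5 * np.sum(diff ** 2 / variances
                                           + np.log(2 * np.pi * variances), axis=-1)
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_conversion(x, y, weights, means, variances):
    """Least-squares estimate of nu_i and diagonal Gamma_i from aligned frames."""
    m, d = means.shape
    P = gmm_posteriors(x, weights, means, variances)           # (T, m)
    z = (x[:, None, :] - means[None]) / variances[None]       # Sigma_i^{-1}(x - mu_i)
    nu, gamma = np.zeros((m, d)), np.zeros((m, d))
    for k in range(d):                                         # one regression per dim
        A = np.hstack([P, P * z[:, :, k]])
        coef, *_ = np.linalg.lstsq(A, y[:, k], rcond=None)
        nu[:, k], gamma[:, k] = coef[:m], coef[m:]
    return nu, gamma


def convert(x, weights, means, variances, nu, gamma):
    """Apply F(x) = sum_i p(C_i|x) [nu_i + Gamma_i Sigma_i^{-1}(x - mu_i)]."""
    P = gmm_posteriors(x, weights, means, variances)
    z = (x[:, None, :] - means[None]) / variances[None]
    return np.einsum("tm,tmd->td", P, nu[None] + gamma[None] * z)
```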

With this method, a function that converts source speech into the target speaker's voice can be obtained from as little as about four minutes of recorded speech. It is especially useful for short, syllable-length utterances.

FIG. 9 schematically shows how source speech is converted into the target speaker's voice using the conversion function obtained by the method of FIG. 8.

First, HNM analysis (920) separates the source speech signal (910) into two parts: the spectral envelope of the voiced portions (930_v) and the prosodic information (prosodic specification) (930_p). The conversion function (940) turns the separated spectral envelope (930_v) into the desired spectral envelope (950). The converted envelope is not perfect, so a signal synthesized directly from it would contain audible noise. To avoid this, an additional noise filter removes the noise (970).

Separately, the prosodic information (930_p) taken from the source speech signal (910) is mapped to convert the pitch (960). It is then combined with the processed spectral envelope and passed through HNM synthesis (980), producing speech in the target speaker's voice (990).
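The patent does not say how the pitch mapping (960) is done. A common choice, used here purely as an assumption, is a linear transform in the log-F0 domain that matches the source speaker's mean and standard deviation to the target's. In this sketch, unvoiced frames are marked with F0 = 0 and left untouched. The function names are hypothetical.

```python
import numpy as np


def log_f0_stats(f0: np.ndarray) -> tuple[float, float]:
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    logf = np.log(f0[f0 > 0])
    return float(logf.mean()), float(logf.std())


def map_pitch(f0_src: np.ndarray, src_stats: tuple[float, float],
              tgt_stats: tuple[float, float]) -> np.ndarray:
    """Map source F0 contour into the target speaker's log-F0 range."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = np.zeros_like(f0_src, dtype=float)
    voiced = f0_src > 0
    out[voiced] = np.exp(mu_t + (np.log(f0_src[voiced]) - mu_s) * sd_t / sd_s)
    return out
```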

The star scheduling service is described next.

The user connects to the voice scheduling service system and selects the star whose voice will be used and the schedule content. The user also enters the telephone number that should receive the reminder. As with the wake-up call service, the system can set this number automatically without separate input. The user may, of course, choose a different number from the one used to connect. If the receiving telephone is a mobile phone, the user may also receive the reminder as a voice and/or text message instead of a direct call. As with the wake-up call service, the user can also choose the wording of the reminder.

Using the information entered by the user, the speech synthesis module synthesizes the schedule message in the star's voice. The message can be synthesized as soon as the user enters the schedule, or only when the reminder is due. A message synthesized in advance is preferably saved as a voice file. The synthesis in the star's voice works much as described above for the star wake-up call service.

At the time set by the user, the system either calls the chosen number and plays the synthesized schedule message, or delivers it as a voice and/or text message.
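The two choices described above (when to synthesize, and which channel to use) can be sketched together. This is an illustrative sketch under stated assumptions:

- The channel names and the `ScheduleItem`, `prepare` and `deliver` names are hypothetical.
- `synthesize` stands in for the speech synthesis module.
- Message channels are restricted to mobile numbers, as the text implies.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CHANNELS = {"call", "voice_message", "text_message"}   # hypothetical channel labels


@dataclass
class ScheduleItem:
    star: str
    text: str
    phone: str
    is_mobile: bool
    channel: str = "call"
    presynthesize: bool = True          # synthesize at entry time vs. at delivery time
    voice_file: Optional[str] = None


def prepare(item: ScheduleItem, synthesize: Callable[[str, str], str]) -> None:
    """Validate the request when it is entered; optionally synthesize right away."""
    if item.channel not in CHANNELS:
        raise ValueError(f"unknown channel: {item.channel}")
    if item.channel != "call" and not item.is_mobile:
        raise ValueError("voice/text messages require a mobile number")
    if item.presynthesize and item.channel != "text_message":
        item.voice_file = synthesize(item.star, item.text)   # saved for later delivery


def deliver(item: ScheduleItem, synthesize: Callable[[str, str], str]) -> tuple[str, str, str]:
    """Return (channel, destination, payload) to hand to the outbound path."""
    if item.channel == "text_message":
        return ("sms", item.phone, item.text)
    audio = item.voice_file or synthesize(item.star, item.text)  # late synthesis
    return (item.channel, item.phone, audio)
```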

The voice wake-up call system and voice scheduling system can also be built on the conference system whose structure and functions were described above. The division of work would be as follows:

- User connections are handled by the trunk line interface.
- Service selection and input of service details are handled by the call processing controller, the DSP controller, and the voice processing modules connected to the DSP controller (speech synthesis controller, speech recognition controller and so on).
- The wake-up calls and scheduling reminders themselves are delivered through the trunk line interface, call processing controller, DSP controller, speech synthesis controller and so on.

A separate system dedicated to these services can, of course, also be built.

If the voice wake-up call and/or scheduling service system includes a web server that users can reach over the Internet, it can offer more convenient and varied services.

For example, a user can enter the service details on the web server: the desired star, the receiving telephone number, the wake-up time, the schedule details and so on. The system then uses that information to provide the star wake-up call and/or scheduling service. Entering the information over the Internet is more convenient and cheaper than entering it by telephone.

Likewise, a conference service system with a web server reachable over the Internet can offer various additional services. These include the program guide for the star meeting room service, replays of recorded sessions, and mail-order sales of CDs containing the recordings.

The present invention also provides a two-way interactive voice response service and a system for it.

FIG. 10 shows an example configuration of the two-way interactive voice response system of the present invention. The system 1000 includes:

- a trunk line interface 1010, which controls the connection with a user 1060 who reaches the system over a communication network;
- a call processing module 1020, which controls the processing of the incoming call signal;
- a speech recognition module 1030, which recognizes the user's speech;
- a keyword extraction and processing module 1040, which extracts keywords from the recognized speech and acts on them;
- a database 1050, which stores the information used to connect the user to the desired service.

As described above, the user 1060 can reach the system through a network such as the public switched telephone network or a mobile telephone network. The system is also connected to the terminal 1070 of the experts who directly provide the services offered through it.

One example of a service provided through the two-way interactive voice response system is a two-way counseling service. Its flow is shown in FIG. 11.

First, the user connects to the two-way interactive voice response system using a user terminal such as a wired or wireless telephone (S1110).

Next, following the system's prompts, the user describes over the telephone what he or she needs counseling on (S1120).

The speech recognition module recognizes the user's speech (S1130), and keywords are extracted from the recognized sentences (S1140). From those keywords, the system can determine the category of specialist counseling the user needs and the appropriate counselor (S1150).

Once the counselor is determined, the system connects the user who requested counseling directly to that counselor, who then provides the counseling service (S1160).
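Steps S1140 and S1150 can be illustrated with a minimal Python sketch. It assumes the voice recognition module already returns text; the keyword table, category names, and counselor IDs below are hypothetical stand-ins for the contents of database 1050. The patent does not say how keywords are matched or how competing categories are resolved, so the simple "most keyword hits wins" rule is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical contents of database 1050: keyword -> service category.
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    "divorce": "legal",
    "lawsuit": "legal",
    "loan": "finance",
    "tax": "finance",
    "headache": "health",
    "fever": "health",
}

# Hypothetical contents of database 1050: category -> counselor terminal (1070).
CATEGORY_TO_COUNSELOR: Dict[str, str] = {
    "legal": "counselor-legal-01",
    "finance": "counselor-finance-01",
    "health": "counselor-health-01",
}


@dataclass
class RoutingDecision:
    keywords: List[str]
    category: Optional[str]
    counselor: Optional[str]


def extract_keywords(recognized_text: str) -> List[str]:
    """S1140: keep the recognized words that appear in the keyword table."""
    words = recognized_text.lower().replace(",", " ").replace(".", " ").split()
    return [w for w in words if w in KEYWORD_TO_CATEGORY]


def route(recognized_text: str) -> RoutingDecision:
    """S1150: choose the category with the most keyword hits and its counselor."""
    keywords = extract_keywords(recognized_text)
    hits: Dict[str, int] = {}
    for kw in keywords:
        category = KEYWORD_TO_CATEGORY[kw]
        hits[category] = hits.get(category, 0) + 1
    if not hits:
        # No keyword found: leave the decision open (e.g. re-prompt or hand to an operator).
        return RoutingDecision(keywords, None, None)
    best = max(hits, key=lambda c: hits[c])
    return RoutingDecision(keywords, best, CATEGORY_TO_COUNSELOR[best])
```

For example, `route("I have a tax question about my loan")` yields the keywords `["tax", "loan"]` and routes the caller to the hypothetical finance counselor, after which step S1160 would bridge the call to that counselor's terminal.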

Unlike conventional one-way services, this two-way voice answering service identifies the appropriate service from the voice information the user provides and connects the user to it. It is therefore convenient, and it also lets the user receive the desired service even when the user cannot determine exactly which category of service he or she needs.

The present invention has been described in detail with reference to preferred embodiments, but its scope is not limited to them and should be interpreted according to the following claims. Those skilled in the art will also understand that various modifications and changes can be made without departing from the spirit of the invention.

As described above, with the conference service system according to the present invention, more than 1,000 users can simultaneously access a single conference room and share the voice messages exchanged in it. By designating only certain users as active users who can both speak and listen, the system avoids the problems that arise when too many users join one conference room, and it is well suited to specific applications such as a fan club service in which fans talk with a star. Because users enter information through voice recognition, input is convenient, and because conversations can be recorded directly within the system, the recordings can be used for a variety of additional services.
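The active/passive arrangement amounts to a "mix-minus" conference mixer: every participant hears the active speakers, and no one hears themselves or any passive user. A minimal sketch follows, assuming each participant contributes one frame of 16-bit linear PCM per mixing interval; the function name, data layout, and clipping policy are assumptions for illustration, not details from the patent.

```python
from typing import Dict, List, Set

PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit linear PCM range


def mix_frames(frames: Dict[str, List[int]], active: Set[str]) -> Dict[str, List[int]]:
    """Return the frame each participant should hear for one mixing interval.

    frames: participant id -> samples for this interval (all the same length).
    active: ids allowed to speak; frames from passive users are never mixed in.
    Every listener hears the sum of all active speakers except itself.
    """
    if not frames:
        return {}
    n = len(next(iter(frames.values())))
    # Sum the active speakers once for the whole room.
    total = [0] * n
    for uid in active:
        for i, sample in enumerate(frames.get(uid, [0] * n)):
            total[i] += sample
    out: Dict[str, List[int]] = {}
    silence = [0] * n
    for uid, samples in frames.items():
        # Active listeners must not hear their own voice; passive ones hear everything.
        own = samples if uid in active else silence
        out[uid] = [max(PCM_MIN, min(PCM_MAX, t - o)) for t, o in zip(total, own)]
    return out
```

Summing the active speakers once and subtracting each listener's own contribution keeps the work proportional to the number of participants, which matters when a room holds over a thousand listeners but only a few speakers.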

In addition, speech synthesis and voice modulation can be used to provide wake-up call and/or scheduling services in a specific voice tone. When the user states the desired content by voice, the system analyzes it and connects the user to the corresponding service, providing a convenient and effective service.

Claims

A conference service providing system comprising at least two boards connected via an H.110 bus, each board comprising: a relay line interface unit that receives a user's signal from an external communication network and transmits signals to the external communication network; a call processing control unit that controls the processing of the delivered call signal; an H.110 bus control unit that controls the H.110 bus, which handles connections and data transmission and reception with other boards; an internal bus control unit that controls the internal bus, which handles data transmission and reception between the modules on its own board; and a DSP control unit that processes the digital signals used in the conference service.

The system of claim 1,

further comprising a voice recognition unit that is connected to the DSP control unit and recognizes information the user delivers by voice.

The system of claim 1,

further comprising a speech synthesis unit that is connected to the DSP control unit and synthesizes into speech the information to be provided to the user.

The system of claim 1,

further comprising a voice recording unit that is connected to the DSP control unit and records on a recording device the conversation content, that is, the voice information transmitted and received through the conference service.

The system of claim 1,

further comprising a reception processing unit, a transmission processing unit, and a mixing processing unit, each connected to the DSP control unit, which respectively process data reception, data transmission, and data mixing for the conference service.

The system of claim 1,

wherein the call processing control unit and the DSP control unit operate together so that a voice signal transmitted by any one of first users, who are some of all the users connected to the conference service providing system and using the conference service, is delivered to all other users except the user who transmitted it, and a voice signal transmitted by any one of second users, who are the users other than the first users, is not delivered to the other users.

A conference service providing method using a conference service providing system that comprises a relay line interface unit that receives a user's signal from an external communication network and transmits signals to the external communication network, a call processing control unit that controls the processing of the delivered call signal, an H.110 bus control unit that controls the H.110 bus, which handles connections and data transmission and reception with other boards, an internal bus control unit that controls the internal bus, which handles connections and data transmission and reception between the modules on its own board, and a DSP control unit that processes the digital signals used in the conference service, the method comprising:

delivering a call signal from the user to the call processing control unit via the relay line interface unit as the user accesses the conference service providing system through a communication network;

determining whether the number of connected users exceeds the number of users who can access the system simultaneously;

if the number of connected users is determined not to exceed the number of simultaneously accessible users, determining whether the user is an active user, who is allowed both to transmit and to receive signals, or a passive user, who is allowed only to receive signals; and

processing the user's voice signal such that, if the user is determined to be an active user, the voice signal transmitted by the user is delivered to all other users except that user, and if the user is determined to be a passive user, who is allowed only to receive signals, the voice signal delivered by the user is ignored and not passed on to other users.
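The admission check and role handling of this method can be sketched as below. The claim does not say how a caller is chosen to be active, so the policy here (a caller who asks to speak becomes active only while an assumed `max_active` cap has room) is hypothetical, as are the class and method names.

```python
from typing import Dict, List, Optional


class ConferenceRoom:
    """Hypothetical sketch of the admission and role logic in the method above."""

    def __init__(self, max_users: int, max_active: int) -> None:
        self.max_users = max_users    # simultaneous-access limit of the room
        self.max_active = max_active  # assumed policy: cap on active speakers
        self.roles: Dict[str, str] = {}  # user id -> "active" or "passive"

    def join(self, user_id: str, wants_active: bool = False) -> Optional[str]:
        """Admit a caller and assign a role; return None if the room is full."""
        if len(self.roles) >= self.max_users:
            return None
        n_active = sum(1 for r in self.roles.values() if r == "active")
        if wants_active and n_active < self.max_active:
            role = "active"
        else:
            role = "passive"
        self.roles[user_id] = role
        return role

    def recipients(self, sender: str) -> List[str]:
        """Users who receive the sender's voice; a passive sender's voice is ignored."""
        if self.roles.get(sender) != "active":
            return []
        return [u for u in self.roles if u != sender]
```

In a fan club room with `ConferenceRoom(max_users=3, max_active=1)`, the star joins as the active user, later callers become passive listeners, and a fourth caller is turned away.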

The method of claim 7,

further comprising storing the voice signal transmitted by the active user in a recording device.

The method of claim 7, wherein

the conference service providing system further comprises a reception processing unit, a transmission processing unit, and a mixing processing unit, each connected to the DSP control unit, which respectively process data reception, data transmission, and data mixing for the conference service, and

in processing the voice signal transmitted by the user, signal reception, signal transmission, and signal mixing are performed by the reception processing unit, the transmission processing unit, and the mixing processing unit, respectively.

A method of providing voice information in a specific voice tone using a voice information service device that comprises a relay line interface unit that receives signals from an external communication network and transmits signals to a user through the external communication network, a call processing control unit that controls incoming and outgoing call signals, a speech synthesis processing unit that uses a conversion function to synthesize, from an original voice, the voice of a target speaker having a specific tone, a DSP control unit that processes the digital signals used in the voice information service, and a storage device that stores information provided by users and other voice information service data, the method comprising:

a first step in which, as the user accesses the device through a communication network using a user terminal and inputs voice information service content including the content of the voice information to be provided, the desired voice tone, and the time at which the voice information is to be provided, the voice information service content provided by the user is stored in the storage device of the device;

a second step in which the speech synthesis processing unit synthesizes the content of the voice information to be provided into speech in the specific voice tone selected by the user; and

a third step of delivering the synthesized voice to the user at the delivery time input by the user.
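The three steps of this method can be sketched as a small scheduler. Everything here is an assumption for illustration: the `VoiceRequest` fields, the in-memory heap standing in for the storage device, and the `synthesize` callable standing in for the speech synthesis processing unit. Synthesizing at registration time and keeping the audio with the request is one possible ordering, not the only one the method allows.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List


@dataclass(order=True)
class VoiceRequest:
    deliver_at: datetime                        # time the user asked to be called
    terminal: str = field(compare=False)        # terminal that should receive the call
    text: str = field(compare=False)            # content, e.g. a wake-up message
    tone: str = field(compare=False)            # target voice tone chosen by the user
    audio: bytes = field(default=b"", compare=False)


class VoiceInfoScheduler:
    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize          # stand-in for the speech synthesis unit
        self._queue: List[VoiceRequest] = []   # stand-in for the storage device

    def register(self, req: VoiceRequest) -> None:
        # First step: store the request. Second step: synthesize it ahead of time.
        req.audio = self._synthesize(req.text, req.tone)
        heapq.heappush(self._queue, req)       # ordered by deliver_at only

    def pop_due(self, now: datetime) -> List[VoiceRequest]:
        # Third step: hand over every request whose delivery time has arrived.
        due = []
        while self._queue and self._queue[0].deliver_at <= now:
            due.append(heapq.heappop(self._queue))
        return due
```

A delivery loop would periodically call `pop_due` and pass each returned request to the call processing side, which places an outgoing call or leaves a voice message.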

The method of claim 10,

wherein the conversion function used by the speech synthesis processing unit to synthesize a voice of the desired tone is obtained by:

extracting the spectral information of each voice by analyzing recorded speech of the original speaker and of the target speaker with an HNM (Harmonic and Noise Model);

mapping the positions of identical pronunciations in the two sets of spectral information using dynamic time warping (DTW);

expressing the relationship between the two sets of spectral information as a Gaussian mixture model (GMM) and training it incrementally using the expectation-maximization (EM) method; and

optimizing the trained GMM using the least squares method.
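The DTW alignment step can be sketched as follows. It assumes each utterance has already been reduced to a sequence of per-frame spectral feature vectors (for example, parameters derived from the HNM analysis); the Euclidean frame distance and the standard three-way step pattern are common choices assumed here, not details given in the claim. The returned index pairs are the aligned source/target frames from which GMM training data would be assembled.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(src: Sequence[Frame], tgt: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two feature sequences; return total cost and (src, tgt) frame pairs."""
    n, m = len(src), len(tgt)
    # cost[i][j]: cheapest alignment of src[:i] with tgt[:j]; row/column 0 are padding.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

With `src = [[0], [1], [2], [3]]` and `tgt = [[0], [1], [1], [2], [3]]`, the path maps source frame 1 to both repeated target frames, which is exactly the "same pronunciation, different duration" case DTW is meant to absorb.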

The method of claim 10,

wherein the second step comprises:

a step 2-1 of analyzing, by HNM analysis, the original speech signal containing the content of the voice information provided by the user, and separating it into the spectral envelope and the sound source information of the voiced sounds;

a step 2-2 of converting the spectral envelope into the desired spectral envelope using the conversion function;

a step 2-3 of performing noise cancellation on the converted spectral envelope;

a step 2-4 of mapping the sound source information to sound source information of the desired tone; and

a step 2-5 of synthesizing a voice of the desired tone by HNM synthesis using the mapped sound source information and the noise-cancelled spectral envelope.
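The claim does not say how step 2-4 maps the sound source information. One widely used simple option, shown here purely as an assumption and not as the patent's method, is to map the pitch (F0) contour by matching the mean and standard deviation of log-F0 between the source and target speakers. Frames with an F0 of 0 are treated as unvoiced and left unchanged; the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def logf0_stats(f0_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and population std. dev. of log-F0 over voiced frames (needs >= 1 voiced frame)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return statistics.mean(logs), statistics.pstdev(logs)


def map_f0(f0_track: Sequence[float],
           src: Tuple[float, float],
           tgt: Tuple[float, float]) -> List[float]:
    """Map a source F0 contour onto the target speaker's log-F0 distribution."""
    mu_s, sd_s = src
    mu_t, sd_t = tgt
    scale = sd_t / sd_s if sd_s > 0 else 1.0  # guard against a flat source contour
    mapped = []
    for f in f0_track:
        if f <= 0:
            mapped.append(0.0)  # unvoiced frame: keep it unvoiced
        else:
            mapped.append(math.exp(mu_t + (math.log(f) - mu_s) * scale))
    return mapped
```

Working in the log domain keeps the mapping proportional, so a source speaker whose pitch range is one octave is mapped onto a target range that is also scaled geometrically rather than shifted by a fixed number of hertz.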

The method of claim 10,

wherein the content of the voice information to be provided is a wake-up call or scheduling information.

The method of claim 10,

wherein, in the first step,

the user terminal is a wired or wireless telephone, and

the communication network is a public telephone network or a mobile telephone network.

The method of claim 14,

wherein, in the third step, the synthesized voice is delivered to the user terminal that the user used to access the device in the first step.

The method of claim 10,

wherein, in the first step,

the user terminal is a personal computer, and

the communication network is the Internet.

The method of claim 10,

wherein the voice information service content provided by the user in the first step includes user terminal information specifying the user terminal that is to receive the voice information service, and

in the third step, the synthesized voice is delivered to the user terminal specified by the user in the first step.

The method of claim 10,

further comprising, after the second step,

storing the voice synthesized in the second step as a file in the storage device of the device.

The method of claim 10,

wherein, in the third step,

the call processing control unit and the relay line interface unit establish an outgoing call to the user's terminal, and the synthesized voice is delivered to the user over that call.

The method of claim 10,

wherein, in the third step,

the synthesized voice is delivered to the user's terminal in the form of a voice message.

An apparatus for providing a two-way voice answering service, comprising: a relay line interface that receives a user's signal from an external communication network and transmits signals to the external communication network; a call processing control module that controls the processing of the delivered call signal; a voice recognition module that recognizes the voice signal input by the user; a keyword extraction and processing module that extracts keywords from the recognized speech and processes them accordingly; and a database that stores information for connecting the user to the desired service.

A method of providing a two-way voice answering service using a device that comprises a relay line interface that receives a user's signal from an external communication network and transmits signals to the external communication network, a call processing control module that controls the processing of the delivered call signal, a voice recognition module that recognizes the voice signal input by the user, a keyword extraction and processing module that extracts keywords from the recognized speech and processes them accordingly, and a database that stores information for connecting the user to the desired service, the method comprising:

delivering a call signal from the user to the call processing control module via the relay line interface as the user connects to the device through a communication network;

recognizing, by the voice recognition module, the voice signal input as the user speaks the content of the desired service;

extracting, by the keyword extraction and processing module, keywords from the recognized voice signal, and determining the type of service to be provided to the user by consulting the database with the extracted keywords; and

connecting the user to the determined service.

The method of claim 22,

wherein, in the voice signal recognition step, the user states by voice the matter on which counseling is sought,

in the service type determination step, the counselor to whom the user is to be connected is determined using keywords extracted from the user's counseling content, and

in the service connection step, the user and the counselor are connected directly by telephone.