KR20030038921A

KR20030038921A - The voice recognition system for independent speech processing

Info

Publication number: KR20030038921A
Application number: KR1020010069128A
Authority: KR
Inventors: 김영진
Original assignee: 주식회사 보이스콤넷
Priority date: 2001-11-07
Filing date: 2001-11-07
Publication date: 2003-05-17
Also published as: KR100432373B1

Abstract

PURPOSE: A voice recognition system for independent voice processing is provided that eliminates the need to rebuild the system even when a voice recognition/synthesis server is replaced with another one. CONSTITUTION: The voice recognition system includes an application unit (30a), a switching server (100), an interface unit (40a), and voice recognition/synthesis servers (50a, 50b). The application unit processes a voice signal transmitted from a signal transmission means (20) into a voice response request signal. The switching server designates the voice recognition/synthesis server (50a, 50b) that will generate an answer to the voice response request signal, inserts text information into grammar file information, which is the command that causes the designated server to generate the answer, and generates a processed voice signal that includes the grammar file information and the voice signal. The interface unit automatically outputs the processed voice signal generated by the switching server. The voice recognition/synthesis server receives the processed voice signal and matches it against a voice response signal storage unit to generate the voice response signal that the user requests.

Description

The voice recognition system for independent speech processing

The present invention relates to a voice recognition system, and more particularly to a voice recognition system for independent voice processing in which the system does not need to be rebuilt even when the voice recognition/synthesis server, one component of the system, is replaced, and which can also be implemented when various types of voice recognition/synthesis servers are operated together.

In general, a voice recognition system is a device that recognizes a user's voice and operates according to its content. Its applicability has recently been suggested in many areas: placing calls by voice on mobile communication terminals (cellular phones, PCS), voice portal services on the Internet, securities services, education (for example, foreign language learning that requires the pronunciation of sentences or words in question-and-answer exercises), and telephone network services that retrieve information by the user's voice, such as airline and ticket reservations.

FIG. 1 is a block diagram showing a general voice recognition system.

As shown, the system comprises a voice signal input means (10) into which the user's voice is input; a signal transmission means (20) that transmits the input voice signal; an application unit (30) that processes the transmitted voice signal into a format the voice recognition/synthesis server (50) can recognize; the voice recognition/synthesis server (50), which receives the processed voice signal through an interface unit (40), recognizes it, and generates a voice response signal as the answer; and a voice response signal output means (60) that outputs the generated voice response signal so that the user can perceive it.

The user's voice is input to the voice signal input means (10) using a microphone provided in a computer terminal, a WAP phone (Wireless Application Protocol phone), which is a mobile communication terminal capable of Internet service, a telephone, or the like.

The signal transmission means (20) comprises an Internet network, a mobile communication network including base stations and base station controllers, a public switched telephone network (PSTN), and the like, and transmits the voice signal input at the voice signal input means (10) to the application unit (30).

The application unit (30) has a text storage unit (not shown), a database of the texts that fall within the range of pronunciations of the transmitted voice signal, built by enumerating the possible cases. It inserts the voice signal transmitted from the signal transmission means (20) and the text signal into a grammar file format (command) programmed in advance, and inputs the result to the voice recognition/synthesis server (50) through the interface unit (40).

Because the application unit (30) is dependent on the voice recognition/synthesis server (50), it must process the signal into a grammar file format (command) that the voice recognition/synthesis server (50) can recognize.

That is, before the voice recognition service is provided, the application unit (30) must be programmed in advance so that it can generate the grammar file format recognized by the installed voice recognition/synthesis server (50).
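The dependency described above can be pictured with a minimal Python sketch. The grammar syntax, the function name `build_request_for_vendor_x`, and the request layout are all invented for illustration; the source does not specify any engine's actual grammar file format.

```python
# Hypothetical illustration of the prior-art coupling: the application unit
# itself knows the grammar file format of the one engine it was written for.

def build_request_for_vendor_x(audio: bytes, candidates: list[str]) -> dict:
    """Application unit hard-wired to a made-up 'vendor X' grammar syntax."""
    # The candidate texts are written straight into the vendor-specific format.
    grammar = "#GRAMMAR-X\n" + " | ".join(candidates) + ";"
    return {"grammar": grammar, "audio": audio}


request = build_request_for_vendor_x(b"\x00\x01", ["yes", "no"])
print(request["grammar"])
```

In this arrangement, moving to an engine with a different grammar syntax means rewriting the function inside the application unit itself.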

The voice recognition/synthesis server (50) has a voice recognition engine, a voice synthesis engine, and a voice response signal storage unit (not shown), a database of the voice response signals (text, special characters, figures, voice, and combinations thereof) that correspond to processed voice signals. After recognizing the processed voice signal input through the interface unit (40), it matches the processed voice signal against the voice response signal storage unit by means of the voice synthesis engine and thereby generates the voice response signal the user requests.

Here, the voice recognition/synthesis server (50), the application unit (30), and the interface unit (40) are connected over a network.

In addition, various types of voice recognition/synthesis servers (50) are available so that a voice recognition service provider can select the one it needs according to its requirements, for example whether the service centers on numbers, names, service names, speech, or a combination of these.

The generated voice response signal is output to the voice response signal output means (60) through the signal transmission means (20).

Here, the voice response signal output to the voice response signal output means (60) may be transmitted from the voice recognition/synthesis server (50) to the signal transmission means (20) via the interface unit (40) and the application unit (30), or directly from the voice recognition/synthesis server (50) to the signal transmission means (20).

Demand for voice recognition systems of this kind is increasing day by day. To satisfy the varied needs of many users, there is a call for voice recognition systems equipped with various types of voice recognition/synthesis servers that recognize the voice signal of a user's query and generate an appropriate voice response signal.

In the general voice recognition system, however, the application unit is dependent on the voice recognition/synthesis server. When a voice recognition service provider uses a different type of voice recognition/synthesis server to suit the purpose of the service, the grammar file format previously programmed into the application unit must be reprogrammed for the replacement server so that the server can recognize the processed voice signal it receives. Changing the voice recognition/synthesis server therefore requires maintenance and redevelopment of the voice recognition system, which incurs costs for retraining program developers and rebuilding the system.

Furthermore, when various types of voice recognition/synthesis servers supplied by different providers are operated, as many separately programmed application units must be built as there are types of installed voice recognition/synthesis servers, so that each server can recognize its input.

The present invention is intended to solve these problems of the prior art. An object of the present invention is to provide a voice recognition system for independent voice processing that can be driven with only simple parameter inputs, without reprogramming the application unit, one component of the system, even when a voice recognition service provider switches to a different type of voice recognition/synthesis server to suit the purpose of the service, so that no retraining of program developers is required and no cost is incurred for rebuilding the voice recognition system.

Another object of the present invention is to provide a voice recognition system for independent voice processing equipped with middleware that makes the system operable even when a voice recognition service provider operates various types of voice recognition/synthesis servers.

As a technical idea for achieving these objects, the present invention provides a voice recognition system that recognizes a voice signal, input by a user at a voice signal input means and transmitted by a signal transmission means, synthesizes a signal corresponding to that voice signal, and outputs a voice response signal, the system comprising: an application unit that processes the voice signal transmitted from the signal transmission means into a voice response request signal so that a voice response signal can be requested; a switching server that receives the voice response request signal from the application unit, designates the voice recognition/synthesis server that will generate an answer to it, inserts text information falling within the range of pronunciations of the voice signal carried by the request into grammar file information, which is the command that causes the designated server to generate the answer, and generates a processed voice signal that includes the grammar file information with the inserted text information and the voice signal; an interface unit that automatically outputs the processed voice signal generated by the switching server; and the designated voice recognition/synthesis server, which receives the processed voice signal from the interface unit and, by means of its voice recognition/synthesis engine, matches the processed voice signal against a voice response signal storage unit to generate the voice response signal the user requests.

Here, the switching server unit comprises: a user input unit for entering the IP address and brand name of each voice recognition/synthesis server installed in the voice recognition system, the grammar file that enables that server to recognize input and generate a voice response signal, and the texts that fall within the range of pronunciations of the voice signals users input; an external data storage unit that stores, of the data entered at the user input unit, the grammar file data for the voice recognition/synthesis servers and the text data falling within the range of pronunciations of the voice signals; a data collection unit that receives a voice response request signal for which the transmission path to a voice recognition/synthesis server has been designated, matches it against the external data storage unit to extract the grammar file information of the designated server and the corresponding text data, inserts the extracted text information into the extracted grammar file information, and generates a processed voice signal that includes the grammar file information with the inserted text information and the user's voice signal; and a control unit that latches the IP address data, brand name data, grammar file data, and text data entered at the user input unit in a temporary storage unit, causes the grammar file data and text data among the latched data to be stored in the external data storage unit, reads the brand name data latched in the temporary storage unit to designate the voice recognition/synthesis server that will generate the voice response signal answering an input voice response request signal, and controls the data collection unit so that the designated server can generate the voice response signal.

Here, when input voice response request signals surge, the control unit reads the brand name data of the voice recognition/synthesis servers stored in the temporary storage unit and selectively designates among the voice recognition/synthesis servers.

In addition, voice response signals can be generated even when the voice recognition/synthesis servers are of various types whose grammar files, the commands that cause an answer to be generated in response to the user's voice signal, differ from one another.

FIG. 1 is a block diagram showing a general voice recognition system;

FIG. 2 is a block diagram showing the voice recognition system of the present invention;

FIG. 3 is a detailed block diagram showing the middleware that is one component of the voice recognition system of FIG. 2.

Description of reference numerals for the main parts of the drawings

A: middleware

10: voice signal input means

20: signal transmission means

30, 30a: application unit

40, 40a: interface unit

50, 50a, 50b: voice recognition/synthesis server

60: voice response signal output means

100: switching server

110: user input unit

120: data collection unit

130: external data storage unit

140: control unit

Hereinafter, the configuration and operation of an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram showing the voice recognition system of the present invention, and FIG. 3 is a detailed block diagram showing the middleware that is one component of FIG. 2.

As shown in FIG. 2, the voice recognition system recognizes a voice signal that is input by a user at a voice signal input means (10) and transmitted by a signal transmission means (20), synthesizes a signal corresponding to that voice signal, and outputs a voice response signal. It comprises: an application unit (30a) that processes the voice signal transmitted from the signal transmission means (20) into a voice response request signal so that a voice response signal can be requested; a switching server (100) that receives the voice response request signal from the application unit (30a), designates the voice recognition/synthesis server (50a, 50b) that will generate an answer to it, extracts the text information falling within the range of pronunciations of the voice signal carried by the request, inserts that text information into the grammar file of the designated server, which is the command that causes that server to generate the answer, so that the designated server can recognize it, and thereby generates a processed voice signal that includes the grammar file with the inserted text information and the user's voice signal; an interface unit (40a) that automatically outputs the processed voice signal generated by the switching server (100); and the designated voice recognition/synthesis server (50a, 50b), which receives the processed voice signal from the interface unit (40a) and, by means of its voice recognition/synthesis engine (not shown), matches the processed voice signal against a voice response signal storage unit (not shown) to generate the voice response signal the user requests.

The application unit (30a) does nothing more than request, each time a voice signal is input, the voice response signal to be generated in response to it.
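As a rough sketch of what such a request-only application unit might look like, the Python below builds a request that carries the voice signal and nothing engine-specific. The class name `VoiceResponseRequest`, the `session_id` field, and the overall shape are assumptions made for illustration, not details from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceResponseRequest:
    """Vendor-neutral request emitted by the application unit (hypothetical shape)."""
    audio: bytes      # the user's voice signal, passed through unchanged
    session_id: str   # assumed field for routing the reply back to the caller


def make_request(audio: bytes, session_id: str) -> VoiceResponseRequest:
    # No grammar file and no engine name appear here; in the described system
    # the switching server is the component that adds both.
    return VoiceResponseRequest(audio=audio, session_id=session_id)


req = make_request(b"\x00\x01", "call-1")
print(req)
```

Because the request holds no grammar information, nothing in this unit would need to change when the engine behind the switching server changes.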

The switching server unit (100) comprises a user input unit (110), an external data storage unit (130), a data collection unit (120), and a control unit (140).

The user input unit (110) is used to enter the IP (Internet Protocol) address of each voice recognition/synthesis server (50a, 50b) installed in the voice recognition system, the brand name of that server, the grammar file (command) that enables that server to recognize input and generate a voice response signal, and the texts, prepared by enumerating the possible cases, that fall within the range of pronunciations of the voice signals users input.

The external data storage unit (130) stores, of the data entered at the user input unit (110), the grammar file data for the voice recognition/synthesis servers (50a, 50b) and the text data falling within the range of pronunciations of the voice signals.

The data collection unit (120) receives a voice response request signal for which the transmission path to a voice recognition/synthesis server (50a, 50b) has been designated. It extracts the grammar file information of the designated server from the grammar file data stored in the external data storage unit (130) and, at the same time, matches the voice signal carried by the request against the text data stored in the external data storage unit (130) to extract text information. It inserts that text information into the grammar file information of the designated server and generates a processed voice signal that includes the grammar file information with the inserted text information and the user's voice signal.

The control unit (140) latches the IP address data, brand name data, grammar file data, and text data entered at the user input unit (110) in a temporary storage unit (not shown), and causes the grammar file data and text data among the latched data to be stored in the external data storage unit (130). To designate the voice recognition/synthesis server (50a, 50b) that will generate the voice response signal answering an input voice response request signal, it reads the brand name data latched in the temporary storage unit. It then controls the data collection unit (120) to generate the processed voice signal so that the designated server can generate the voice response signal for the request.

The system further includes an interface unit (40a) that automatically transmits the processed voice signal from the control unit (140), one component of the switching server unit (100), to the designated voice recognition/synthesis server. The switching server unit (100) and the interface unit (40a) are collectively referred to as the middleware (A).

When the type of voice recognition/synthesis server (50a, 50b) installed in the voice recognition system is replaced, only the brand name and grammar file of the replacement server need to be re-entered at the user input unit (110). Likewise, when various types of voice recognition/synthesis servers (50a, 50b) are installed and operated together, simply entering the brand name and grammar file of each type is enough for the voice recognition system to operate smoothly under the control of the control unit (140).
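The re-entry step can be sketched as a small registry update. This is a minimal Python illustration under stated assumptions: the class `SwitchingServerConfig`, its method names, the brand names, the IP addresses, and the grammar strings are all hypothetical, and the two dictionaries only loosely stand in for the temporary storage unit and the external data storage unit (130).

```python
from dataclasses import dataclass, field


@dataclass
class SwitchingServerConfig:
    # Stand-in for the temporary storage unit: brand name -> IP address.
    servers: dict[str, str] = field(default_factory=dict)
    # Stand-in for the external data storage unit: brand name -> grammar file.
    grammar_files: dict[str, str] = field(default_factory=dict)

    def register(self, brand: str, ip: str, grammar_file: str) -> None:
        """Record one engine, as if entered at the user input unit."""
        self.servers[brand] = ip
        self.grammar_files[brand] = grammar_file

    def replace(self, old_brand: str, new_brand: str, ip: str, grammar_file: str) -> None:
        """Swap engines by re-entering data only; no application code is touched."""
        self.servers.pop(old_brand, None)
        self.grammar_files.pop(old_brand, None)
        self.register(new_brand, ip, grammar_file)


config = SwitchingServerConfig()
config.register("EngineA", "10.0.0.1", "#GRAMMAR-A\n$words;")
config.replace("EngineA", "EngineB", "10.0.0.2", "(grammar $words)")
print(sorted(config.servers))
```

The point of the sketch is that an engine swap becomes a data change in the switching server rather than a code change in the application unit.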

In addition, when users' voice response request signals surge, the control unit (140) reads the brand name data of the voice recognition/synthesis servers (50a, 50b) stored in the temporary storage unit and selectively distributes the requests among the individual servers, to prevent the load that would arise if they were all input to a single voice recognition/synthesis server (50a, 50b).

The operation of the above configuration is described below.

When a user who wishes to use the voice recognition service inputs speech through the voice signal input means (10), the input voice signal is transmitted through the signal transmission means (20) to the application unit (30a). The application unit (30a) processes the transmitted voice signal into a voice response request signal, which requests a voice response signal, and inputs it to the control unit (140), one component of the switching server (100).

The control unit (140) reads the brand name data of the voice recognition/synthesis servers (50a, 50b) latched in the temporary storage unit, designates the server that will generate the voice response signal answering the input voice response request signal, and inputs the voice response request signal, now carrying the server designation, to the data collection unit (120).

Here, when input voice response request signals surge, the control unit (140) can read the brand name data latched in the temporary storage unit and selectively designate voice recognition/synthesis servers (50a, 50b) so that the requests are spread evenly across the individual servers installed in the voice recognition system, rather than concentrating on any single server and overloading it.
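The source does not say which selection policy the control unit uses to spread requests evenly. One simple possibility is round-robin designation over the latched brand names, sketched below in Python; the class name `ControlUnit`, the method `designate`, and the brand names are assumptions made for illustration.

```python
from itertools import cycle


class ControlUnit:
    """Designates an engine per request; round-robin is an assumed policy."""

    def __init__(self, brands: list[str]) -> None:
        if not brands:
            raise ValueError("at least one voice recognition/synthesis server is required")
        # Endless iterator over the latched brand names, in registration order.
        self._next_brand = cycle(brands)

    def designate(self) -> str:
        """Return the brand name of the server that should handle the next request."""
        return next(self._next_brand)


control = ControlUnit(["EngineA", "EngineB"])
assigned = [control.designate() for _ in range(4)]
print(assigned)
```

A policy that tracks outstanding requests per server would match the stated goal just as well; round-robin is used here only because it is the shortest to show.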

Based on the voice response request signal with its server designation, the data collection unit (120) extracts the grammar file data of the designated voice recognition/synthesis server (50a, 50b) from the external data storage unit (130) and, at the same time, extracts the text data falling within the range of pronunciations of the voice signal carried by that request.

The data collection unit (120) then inserts the text data into the extracted grammar file data and inputs to the control unit (140) a processed voice signal that includes the grammar file data with the inserted text data and the user's voice signal. The interface unit (40a) receives the processed voice signal from the control unit (140) and automatically inputs it to the designated voice recognition/synthesis server (50a, 50b). That server recognizes the processed voice signal and, by means of the voice synthesis engine, matches it against the voice response signal storage unit to generate the voice response signal the user requests.
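The insertion step can be pictured as filling a per-engine template. The Python sketch below treats each stored grammar file as a `string.Template` with a `$words` placeholder; the two grammar syntaxes, the brand names, the function `build_processed_signal`, and the dictionary layout of the processed voice signal are all hypothetical, since the source does not define a concrete format.

```python
from string import Template

# Stand-in for the grammar file data held in the external data storage unit,
# keyed by brand name. Both syntaxes are invented for this example.
GRAMMAR_FILES = {
    "EngineA": Template("#GRAMMAR-A\n$words;"),
    "EngineB": Template("(grammar $words)"),
}


def build_processed_signal(brand: str, audio: bytes, candidates: list[str]) -> dict:
    """Insert the candidate texts into the designated engine's grammar file."""
    grammar = GRAMMAR_FILES[brand].substitute(words=" | ".join(candidates))
    # The processed voice signal bundles the filled grammar with the raw audio.
    return {"server": brand, "grammar": grammar, "audio": audio}


signal = build_processed_signal("EngineB", b"\x00", ["yes", "no"])
print(signal["grammar"])
```

The same candidate texts yield a different grammar string for each designated brand, which is how the request sent by the application unit can stay engine-independent.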

The generated voice response signal is output to the voice response signal output means (60) so that the user can perceive it.

The voice response signal output to the voice response signal output means (60) may take either of two transmission paths. It may be input from the voice recognition/synthesis server (50a, 50b) to the middleware (A), which comprises the switching server (100) and the interface unit (40a), and then transmitted by the signal transmission means (20). Alternatively, it may be transmitted directly from the voice recognition/synthesis server (50a, 50b) to the signal transmission means (20).

As described above, according to the present invention, even when a voice recognition service provider replaces the voice recognition/synthesis server with a different type to suit the purpose of the service, the software built into the voice recognition system does not need to be reprogrammed. There is therefore no expenditure for rebuilding the voice recognition system, and the retraining of program developers that such a rebuild would otherwise require is also unnecessary.

In addition, even when the voice recognition service provider operates several different types of voice recognition/synthesis servers, the voice recognition system can be operated without a separate application unit for each individual server.

In addition, even when users' voice signals surge, the voice signals are selectively distributed among the voice recognition/synthesis servers, which prevents any one server from becoming overloaded.

Although the present invention has been described in detail only with respect to the specific embodiments set out above, it will be apparent to those skilled in the art to which the invention pertains that modifications and variations can be made within the spirit and scope of the invention, and such modifications and variations fall within the scope of the appended claims.

Claims

In a voice recognition system that includes voice signal input means (10) for receiving a user's voice signal and signal transmission means (20) for transmitting the voice signal input to the voice signal input means (10), and that recognizes the user's voice and synthesizes and outputs a corresponding voice response signal, a voice recognition system for independent speech processing, comprising:

an application unit (30a) that processes the voice signal transmitted from the signal transmission means (20) into a voice response request signal so as to request a voice response signal;

a switching server (100) that receives the voice response request signal processed by the application unit (30a), designates the voice recognition/synthesis server (50a, 50b) that is to generate an answer to the input voice response request signal, inserts text information belonging to the pronunciation range of the voice signal, which is the voice response request signal, into grammar file information, which is a command that causes the designated voice recognition/synthesis server to generate the answer, and generates a processed voice signal including the voice signal and the grammar file information into which the text information has been inserted;

an interface unit (40a) that automatically outputs the processed voice signal generated by the switching server (100); and

the corresponding voice recognition/synthesis server (50a, 50b), which receives the processed voice signal from the interface unit (40a) and, by means of a voice recognition/synthesis engine, matches the processed voice signal against a voice response signal storage unit to generate the voice response signal requested by the user.

The system of claim 1,

wherein the switching server (100) comprises:

a user input unit (110) for inputting the IP address and brand name of each voice recognition/synthesis server (50a, 50b) built into the voice recognition system, a grammar file that the voice recognition/synthesis server (50a, 50b) recognizes in order to generate a voice response signal, and text belonging to the pronunciation range of the voice signal input by the user;

an external data storage unit (130) that stores, among the data input through the user input unit (110), the grammar file data for the voice recognition/synthesis servers (50a, 50b) and the text data belonging to the pronunciation range of the voice signal input by the user;

a data collecting unit (120) that receives the voice response request signal for which a transmission path to a voice recognition/synthesis server (50a, 50b) has been designated, matches it against the external data storage unit (130) to extract the grammar file information of the corresponding voice recognition/synthesis server (50a, 50b) and the text information corresponding to the pronunciation of the voice information input by the user, inserts the extracted text information into the extracted grammar file information, and generates a processed voice signal including the user's voice signal and the grammar file information into which the text information has been inserted; and

a control unit (140) that latches into a temporary storage unit the data input through the user input unit (110), namely the IP address data and brand name data of the voice recognition/synthesis servers (50a, 50b), the grammar file data that the voice recognition/synthesis servers (50a, 50b) recognize in order to generate a voice response signal, and the text data belonging to the pronunciation range of the voice signal input by the user, controls the storing of the grammar file data and the text data, among the latched data, in the external data storage unit (130), designates the voice recognition/synthesis server (50a, 50b) that is to generate the voice response signal answering the input voice response request signal by reading the brand name data of the voice recognition/synthesis servers (50a, 50b) latched in the temporary storage unit, and controls the data collecting unit (120) so that the voice response signal is generated by the designated voice recognition/synthesis server (50a, 50b).

The system of claim 2,

wherein, when input voice response request signals surge, the control unit (140) further reads the brand name data of the voice recognition/synthesis servers (50a, 50b) stored in the temporary storage unit and selectively designates voice recognition/synthesis servers (50a, 50b) so that the load is shared among them.

The system of any one of claims 1 to 3,

wherein the voice recognition/synthesis servers (50a, 50b) can generate a voice response signal even when the grammar files, which are the commands that cause an answer corresponding to the user's voice signal to be generated, are of mutually different types.