KR20100058687A

KR20100058687A - Apparatus and method for processing speech recognition for large vocabulary speech recognition

Info

Publication number: KR20100058687A
Application number: KR1020080103315A
Authority: KR
Inventors: 류창선; 정영준; 구명완; 김재인
Original assignee: 주식회사 케이티
Priority date: 2008-10-21
Filing date: 2008-10-21
Publication date: 2010-06-04
Also published as: KR101483307B1

Abstract

PURPOSE: A speech recognition processing apparatus and method for large-vocabulary speech recognition are provided to perform large-vocabulary speech recognition in a constrained computing environment. CONSTITUTION: A speech recognition support unit (120) receives a recognition target word from a user through a server interface unit (140) according to the scenario of a speech recognition service. When the recognition target word is input, a recognition-target address book management unit (110) creates an integrated address book from a personal address book and a corporate address book. A speech recognition unit (130) recognizes the recognition target word as a recognition name in the integrated address book.

Description

Speech Recognition Processing Apparatus and Method for Large-Vocabulary Speech Recognition {APPARATUS AND METHOD FOR PROCESSING SPEECH RECOGNITION FOR LARGE VOCABULARY SPEECH RECOGNITION}

The present invention relates to a speech recognition processing apparatus and method for large-vocabulary speech recognition. More specifically, it generates an integrated address book for each user from a personal address book and a corporate address book, and recognizes, within that integrated address book, the recognition name corresponding to the word the user speaks. This allows large-vocabulary speech recognition to be performed accurately and quickly even in a constrained computing environment (for example, a 1,000-word-class speech recognition engine), so that a stable speech recognition service can be provided to the user.

Speech recognition technology is applied to services that offer speech recognition as a user interface, and it continues to develop. It still has difficulty, however, delivering services that meet the level users expect in the market. Take the "brand call service" as an example. Uptake of that service was held back by limits on recognition performance. At the time, users demanded a recognition rate of 100%. Today, speech recognition is regarded as one convenient interface rather than something that must reach 100%. Demands on speech recognition have also become more specific. A large-vocabulary speech recognition method that reflects these user requirements is urgently needed. Speech recognition technology must solve this problem; otherwise it is difficult to provide a large-vocabulary speech recognition service efficiently to the users who want one.

Even when a speech recognition engine capable of large-vocabulary service is available, the speech recognition function is constrained by the computing capacity of the system. Speech recognition is a CPU-bound job dominated by floating-point (FP) operations. When several channels occupy the CPU at the same time to request recognition, CPU utilization on the speech recognition server reaches 100%. Speech recognition algorithms consume that much CPU resource.

Consequently, however capable the speech recognition engine it adopts, a speech recognition system struggles with large-vocabulary recognition because of limited computing capacity. In particular, an Internet telephony (SoIP: Service over IP) system that uses the Broadband Convergence Network (BcN) as its base network urgently needs a speech recognition system that overcomes this limit to some degree. Internet telephony systems also call for large-vocabulary speech recognition.

Conventional speech recognition systems with constrained computing capacity have difficulty providing a large-vocabulary speech recognition service. Concrete examples are the speech recognition servers of the intelligent-network IP Multimedia Subsystem (IMS) and those in the BcN media server domain, which were introduced years ago and have not been upgraded since. The intention is nevertheless to put these speech recognition servers to active use in Internet telephony that uses the BcN as its base network.

The intelligent network and BcN equipment listed above was procured over several years as service conditions required, so computing capacity differs considerably depending on when each system was introduced. For example, the CPU of a recently introduced system is a quad-core part with four cores, whereas the earliest units have a single 800 MHz core. Because of the purchase specification, existing services were planned at the 1,000-word scale whenever the system was specified as 1,000-word class. The newest equipment can handle more than the 1,000-word class, which gives service planning more freedom.

Using the "brand call service" mentioned above as an example, the service scenario is briefly as follows.

First, the user calls the brand call service access number.

The user then speaks the brand name of the desired company into the user terminal. For example, the user says "Company A".

Next, the speech recognition service system performs speech recognition.

Then, if the recognition result is what the user intended, the speech recognition service system places a call to that company.

In this scenario, the recognition targets are the brand names of companies. Users of the service do not assume that only 1,000 words are recognized; they tend to expect that whatever they ask for will work. Judged from the user's standpoint, the recognition quality of the "brand call service" therefore appears insufficient. The service fails to deliver convenience, and users turn away from it. In other words, because the conventional technology recognizes speech at the 1,000-word scale, it selects only 1,000 brand names and offers recognition of those names to the majority of users. The conventional technology therefore has difficulty securing an adequate recognition rate, service quality suffers, and users abandon the service.

The prior art described above thus cannot efficiently provide a large-vocabulary speech recognition service to users in a constrained computing environment (for example, a 1,000-word-class speech recognition engine). Solving this problem is the task of the present invention.

Accordingly, an object of the present invention is to provide a speech recognition processing apparatus and method for large-vocabulary speech recognition that generates an integrated address book for each user from a personal address book and a corporate address book and recognizes, within that integrated address book, the recognition name corresponding to the word the user speaks, so that large-vocabulary speech recognition is performed accurately and quickly even in a constrained computing environment (for example, a 1,000-word-class speech recognition engine) and a stable speech recognition service is provided to the user.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned here can be understood from the following description and will become clearer from the embodiments of the invention. It will also be readily apparent that the objects and advantages of the invention can be realized by the means set out in the claims and combinations thereof.

To solve the above problem, the present invention is characterized in that it generates an integrated address book for each user from a personal address book and a corporate address book, and recognizes, within that integrated address book, the recognition name corresponding to the word the user speaks.

More specifically, the present invention provides a speech recognition processing apparatus for large-vocabulary speech recognition, comprising: server interface means for providing an interface with an application server for a speech recognition service; speech recognition support means for receiving a recognition target word from a user through the interface according to the scenario of the speech recognition service and supporting the speech recognition service; recognition-target address book management means for managing a personal address book and a corporate address book separately and, when the recognition target word is input by voice from the user, generating an integrated address book for each user from the managed personal and corporate address books; and speech recognition means for recognizing the input recognition target word as a recognition name in the generated integrated address book.

The present invention also provides a speech recognition processing method for large-vocabulary speech recognition, comprising: a word input step of receiving a recognition target word from a user through an interface with an application server according to the scenario of a speech recognition service; an integrated address book generation step of generating, when the recognition target word is input by voice from the user, an integrated address book for each user from a personal address book and a corporate address book that are kept separate from each other; and a speech recognition step of recognizing the input recognition target word as a recognition name in the generated integrated address book.

The method of the present invention further comprises a customer preference extraction step of extracting customer preferences by analyzing, for each user, the usage history of the recognized recognition names.

As described above, the present invention generates an integrated address book for each user from a personal address book and a corporate address book and recognizes, within that integrated address book, the recognition name corresponding to the word the user speaks. It thereby performs large-vocabulary speech recognition accurately and quickly even in a constrained computing environment (for example, a 1,000-word-class speech recognition engine) and provides a stable speech recognition service to the user.

That is, in a service using Internet telephony terminals on a BcN base network, the present invention provides large-vocabulary speech recognition on systems of the existing specification. This reduces system build-out cost and secures the recognition rate, making it possible to offer large-vocabulary speech recognition in service development. The invention also provides accurate recognition performance and stability when developing services that apply speech recognition, which makes stable service delivery to users easier and ultimately contributes to business growth.

The above objects, features, and advantages will become clearer from the detailed description given below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a broadband convergence network system to which the speech recognition processing apparatus according to the present invention is applied.

As shown in FIG. 1, the broadband convergence network system includes a speech recognition processing apparatus (100) and an application server (10). The speech recognition processing apparatus (100) includes a recognition-target address book management unit (110), a speech recognition support unit (120), a speech recognition unit (130), and a server interface unit (140). The subscriber terminal (101) comprises a corporate customer terminal (1011) and an individual customer terminal (1012).

The speech recognition processing apparatus (100) is applied to a broadband convergence network system that uses a server-based speech recognition service, as in FIG. 1. FIG. 1 thus shows one example of application in a broadband convergence network environment. The apparatus (100) can be implemented in the same way for an intelligent network.

To handle large-vocabulary recognition in the speech recognition processing apparatus (100), the performance of the speech recognition engine would ordinarily have to be improved. The speech recognition unit (130) of the apparatus (100) consists of an automatic speech recognition (ASR) server and provides the speech recognition function using a speech recognition engine. Even with a speech recognition server whose engine has a low specification, the apparatus (100) according to the present invention can be usefully applied when large-vocabulary speech recognition is wanted.

When a user requests the speech recognition service by dialing the service number from a user terminal (102) (for example, a BcN terminal), the server interface unit (140) of the speech recognition processing apparatus (100) forwards the user's request to the application server (10).

The application server (10) then looks up the static grammar and the location of the VXML server (VXML URL) for the speech recognition service number received from the server interface unit (140) of the speech recognition processing apparatus (100). The application server (10) sends the result back to the apparatus (100) through the server interface unit (140).

Next, the speech recognition processing apparatus (100) provides the user terminal (102) with a speech recognition service based on VXML (Voice eXtensible Markup Language). The speech the user inputs through the user terminal (102) is recognized by the apparatus (100), and the recognition result is sent to the application server (10) through the server interface unit (140).

The application server (10) then uses the recognition result to provide the application service to the user through the user terminal (102).

The description below begins with the process of registering recognition-target address books from the subscribers who wish to have speech recognition service provided through the speech recognition processing apparatus (100) according to the present invention, that is, from the owners of those address books.

For speech recognition, the speech recognition unit (130) must initialize its speech recognition engine with the recognition target words; the unit can recognize only words known to the engine. Suppose, for example, that the recognition target words are company names and that the vocabulary is large (for example, 10,000-word class). In that case a conventional speech recognition server selects only the 1,000 words users are most likely to use, initializes recognition with them, and then returns recognition results to users who call in.
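The limitation of the conventional approach can be illustrated with a minimal sketch. The function names, the usage-count input, and the tie-breaking rule are assumptions made for illustration and do not come from the patent; the sketch only shows that a fixed top-N vocabulary leaves every other name unrecognizable.

```python
def select_conventional_vocabulary(usage_counts, limit=1000):
    """Pick the `limit` most-used names (hypothetical helper).

    usage_counts maps a company name to how often callers asked for it.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(usage_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


def coverage(vocabulary, requested_names):
    """Fraction of requested names that the engine was initialized with."""
    if not requested_names:
        return 0.0
    known = set(vocabulary)
    hits = sum(1 for name in requested_names if name in known)
    return hits / len(requested_names)
```

With a 10,000-name catalogue and `limit=1000`, any request outside the selected 1,000 names cannot be recognized, however good the engine is.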

In contrast, when the speech recognition unit (130) according to the present invention is to recognize company names, a corporate customer registers a recognition-target address book for company names (hereinafter the "corporate address book") with the recognition-target address book management unit (110) over the Internet (for example, a service web site) using the corporate customer terminal (1011). That is, a corporate customer that wants to be recognized by voice must register its company name on the web through the corporate customer terminal (1011), entering the name the company itself wants. This lets users use the exact company name when the service is later in operation, and it lets the speech recognition unit (130) secure an accurate recognition rate for company names.

Meanwhile, the recognition-target address book management unit (110) receives a personalized service list from the individual customer terminal (1012) of every individual customer who wants to use the speech recognition service. The personalized service list consists of the company names for which the individual customer wants speech recognition service. Once the recognition-target address book an individual wants (hereinafter the "personal address book") is registered with the unit (110), the application server (10) uses it to provide a service tailored to that individual. The unit (110) also manages the integrated address book, which raises the recognition rate through this personalization. To prevent unrestricted use by an unspecified public, as in the conventional "brand call service", the unit (110) registers a personal address book from the individual customer terminal (1012) of each individual customer who wants the service. If the service were instead offered to an unspecified population of individual customers, recognition performance would degrade just as it does with conventional speech recognition technology.

The speech recognition initialization performed by the speech recognition unit (130) is described next.

A conventional speech recognition server initializes the target words for recognition. Depending on the type of service, this initialization is either fixed or dynamic. Fixed initialization means that the server initializes once and never again. Dynamic initialization means that the server re-initializes dynamically whenever needed.

The speech recognition unit (130), by contrast, performs a hybrid initialization that combines the fixed and dynamic types. It merges the corporate address book, which stays fixed, with the personal address book of names the individual customer is likely to use often, and stores the result as the integrated address book. For this integrated address book, the unit performs fixed initialization on the corporate address book and dynamic initialization on the personal address book.
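The hybrid initialization can be sketched as follows. This is an illustrative sketch only: the class name, method names, and the choice to drop personal names that already appear in the fixed part are assumptions, and a real engine would compile grammars rather than hold plain word lists.

```python
class HybridRecognizerVocabulary:
    """Hypothetical model of a vocabulary with a fixed and a dynamic part."""

    def __init__(self, corporate_names):
        # Fixed initialization: performed once, when the unit starts.
        self._fixed = tuple(dict.fromkeys(corporate_names))
        self._dynamic = ()
        self.dynamic_loads = 0  # counts how often the dynamic part was reloaded

    def load_personal(self, personal_names):
        # Dynamic initialization: repeated for each caller's personal address book.
        fixed = set(self._fixed)
        unique = dict.fromkeys(personal_names)  # removes repeats, keeps order
        self._dynamic = tuple(name for name in unique if name not in fixed)
        self.dynamic_loads += 1

    def active_words(self):
        """Words the engine can currently recognize (fixed part first)."""
        return list(self._fixed) + list(self._dynamic)
```

Only `load_personal` runs per call, so the cost of re-initialization is bounded by the size of the personal part rather than the whole vocabulary.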

The speech recognition unit (130) also aims to raise the recognition rate for large-vocabulary requests even when it has a low-specification speech recognition engine. To that end, the speech recognition support unit (120) collects customer preferences from the recognition results of the speech recognition unit (130), based on the history of the user's use of speech recognition through the user terminal (102). The recognition-target address book management unit (110) stores the collected preferences with the corresponding address book and reflects them in each address book.
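One way to picture the preference collection is a per-user counter of recognized names. The patent does not specify how preferences are computed; the class below, its method names, and the frequency-based ranking are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


class PreferenceStore:
    """Hypothetical per-user tally of recognized names."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, user_id, recognized_name):
        # Called once for each recognition result the user accepted.
        self._counts[user_id][recognized_name] += 1

    def top_names(self, user_id, n):
        """Return up to n names for the user, most used first, ties alphabetical."""
        counts = self._counts.get(user_id, Counter())
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [name for name, _ in ranked[:n]]
```

The ranked names could then be used to refresh a user's personal address book, which is the role the text assigns to the recognition-target address book management unit (110).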

FIG. 2 is a block diagram of one embodiment of the speech recognition processing apparatus for large-vocabulary speech recognition according to the present invention.

As shown in FIG. 2, the speech recognition processing apparatus (100) includes a recognition-target address book management unit (110), a speech recognition support unit (120), a speech recognition unit (130), and a server interface unit (140). The recognition-target address book management unit (110) includes an address book manager (111), a personal address book storage unit (112), a corporate address book storage unit (113), a customer preference storage unit (114), and an integrated address book manager (115). For the purposes of registering recognition names and of speech recognition initialization, customers are divided into individual customers (those who will later use the speech recognition service) and corporate customers (companies that want to be recognized by voice). Customer preferences are stored in the customer preference storage unit (114) according to the commonly used address book and each customer's usage history. The personal address book is subsequently updated according to the names the customer prefers.

Each component of the speech recognition processing apparatus (100) is described in detail below.

First, a corporate customer registers its company name on the web through the corporate customer terminal (1011), entering the company name, which is the recognition target word, together with the corresponding receiving telephone number. The address book manager (111) thereby receives over the web the company name needed for the voice-guided service, that is, the company name that serves as the recognition target word. The address book manager (111) stores the registered corporate address book in the corporate address book storage unit (113).

An individual customer likewise registers on the web, through the individual customer terminal (1012), the company names he or she prefers as recognition target words. The address book manager (111) stores the company names registered from the individual customer terminal (1012) in the personal address book storage unit (112), separated by individual. Each individual customer has a personal address book from the moment of subscription, and this address book can be used for personalized speech recognition.

In other words, a corporate customer registers for the service on the web through the corporate customer terminal 1011 by entering the recognition name it wants and the receiving telephone number. Individual customers who want to use the speech recognition service also register through the service web.

The address book management unit 111 stores the recognition name and receiving telephone number registered from the corporate customer terminal 1011 in the corporate address book storage unit 113 as a database. Likewise, it stores the recognition name and receiving telephone number transmitted from the individual customer terminal 1012 in the personal address book storage unit 112 as a database.

When a user requests the speech recognition service through the user terminal 102, the address book management unit 111 builds an integrated address book by merging the company names in the speech recognition target address books (the personal address book and the corporate address book). For example, the address book management unit 111 may separately manage 500 recognition names taken from the ranking of most preferred company names and 500 recognition names from the personal address book that the user can modify.

Meanwhile, at the request of an individual customer, the address book management unit 111 may update the personal address book stored in the personal address book storage unit 112. However, the approximately 500 company names held as fixed entries in the corporate address book storage unit 113 cannot be modified by individual customers; these 500 names are fixed data for speech recognition initialization. The remaining 500 company names can be managed directly by the individual customer through the personal address book. That is, when an individual customer connects to the network to use the service, the address book management unit 111 authenticates the user and then builds the integrated address book, which serves as the recognition target address book of the speech recognition engine, by combining the 500 fixed words with the 500 words the individual can modify.
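The combination of fixed and modifiable entries can be sketched as follows. This is a minimal illustration in Python, not the implementation of the address book management unit 111. The names `Entry`, `build_integrated_address_book`, `FIXED_SLOTS`, and `PERSONAL_SLOTS` are hypothetical, and the rule that a fixed entry wins when a personal entry has the same name is an assumption that the description does not state.

```python
from dataclasses import dataclass

FIXED_SLOTS = 500     # fixed company names (not user-modifiable)
PERSONAL_SLOTS = 500  # names managed by the individual customer


@dataclass(frozen=True)
class Entry:
    name: str       # recognition name (company name)
    phone: str      # receiving telephone number
    editable: bool  # False for fixed entries, True for personal ones


def build_integrated_address_book(fixed, personal):
    """Merge fixed and personal {name: phone} mappings into one book."""
    if len(fixed) > FIXED_SLOTS or len(personal) > PERSONAL_SLOTS:
        raise ValueError("address book exceeds its slot budget")
    book = {name: Entry(name, phone, False) for name, phone in fixed.items()}
    for name, phone in personal.items():
        # Assumption: a fixed entry takes precedence over a personal duplicate.
        book.setdefault(name, Entry(name, phone, True))
    return book
```

Because both inputs are capped at 500 entries, the merged book can never exceed the 1,000-word capacity assumed for the recognition engine.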

As described above, the speech recognition processing apparatus 100 is intended to provide a large vocabulary speech recognition service to individual customers registered through the service web or a similar channel. That is, the individual customer performs user registration on the service web, and the large vocabulary speech recognition service can then be provided through the individual customer terminal 1012 registered there. In this case, the individual customer can receive the speech recognition service with a high recognition rate even in the constrained computing environment of the speech recognition unit 130.

That is, the speech recognition processing apparatus 100 is intended to provide a large vocabulary speech recognition service with a high recognition rate, even with a low-specification speech recognition engine, for Internet telephony (SoIP) on a broadband integrated network (BcN). Accordingly, the speech recognition processing apparatus 100 is applied to a system for providing large vocabulary speech recognition when an SoIP-based speech recognition service is offered over the broadband integrated network.

Meanwhile, to provide the speech recognition service to individual customers, the address book management unit 111 builds, as a database in the customer preference storage unit 114, usage records in which each individual customer's usage history has been analyzed. The speech recognition support unit 120 extracts customer preferences from this database. When the speech recognition support unit 120 passes the extracted customer preferences to the address book management unit 111, the address book management unit 111 may reflect them in the personal address book at an arbitrary interval. Using the updated personal address book, the address book management unit 111 may then update the previously built integrated address book stored in the integrated address book management unit 115. Only the personal address book, which can be modified for each individual customer, is updated here; the predetermined fixed corporate address book is not.

Thereafter, when a user requests the speech recognition service from the speech recognition processing apparatus 100 through the user terminal 102, the speech recognition support unit 120 identifies the user and requests that user's integrated address book from the address book management unit 111. The address book management unit 111 then builds the integrated address book for that user from the personal address book of the requesting user and the predetermined corporate address book.

Next, the speech recognition support unit 120 receives the user's integrated address book from the recognition target address book management unit 110 and passes it to the speech recognition unit 130, which uses it to perform speech recognition initialization. When a subscribed caller uses the service, the speech recognition support unit 120 checks the registered number of the calling terminal and passes the integrated address book for the user of that terminal to the speech recognition unit 130 so that initialization is performed. The speech recognition unit 130 can thus provide a customized service by initializing the recognition target address book with the integrated address book built from each individual's pre-registered personal address book. The aim is to support a large vocabulary speech recognition service without degrading recognition performance even though the hardware specification of the previously deployed speech recognition engine is low. Recognition performance can also be improved by offering the large vocabulary service within the given environment. This can contribute significantly to the adoption of SoIP technology on the broadband integrated network.
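The per-caller initialization described above might look like the following sketch. The class `SpeechRecognizer` is only a stand-in for the speech recognition unit 130, and `initialize_for_caller`, the subscriber table, and the choice to load nothing for an unregistered calling number are all assumptions made for illustration.

```python
class SpeechRecognizer:
    """Stand-in for the speech recognition unit 130 (hypothetical interface)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity  # maximum vocabulary size of the engine
        self.vocabulary = []

    def initialize(self, names):
        """Load the recognition target names, enforcing the capacity limit."""
        names = list(names)
        if len(names) > self.capacity:
            raise ValueError("vocabulary exceeds engine capacity")
        self.vocabulary = names


def initialize_for_caller(cid, subscribers, address_books, recognizer):
    """Look up the caller by calling number and load that user's book."""
    user_id = subscribers.get(cid)
    if user_id is None:
        return False  # assumption: unregistered callers load nothing
    recognizer.initialize(address_books[user_id])
    return True
```

The engine only ever holds one user's vocabulary at a time, which is how a 1,000-word engine can serve a much larger overall set of names.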

Meanwhile, the performance of the speech recognition unit 130 depends heavily on the hardware (speech recognition engine) specification, because speech recognition algorithms rely on central processing unit (CPU) resources more than other functions do. Furthermore, when the number of target words grows large, recognition time can increase sharply. For a speech recognition processing apparatus 100 with a capacity on the order of 1,000 words, an increase in the number of target words therefore affects recognition time more than recognition rate.

The speech recognition processing apparatus 100 according to the present invention is therefore intended to give the user the effect of large vocabulary speech recognition on a limited hardware specification. To this end, the address book management unit 111 builds the integrated address book from the corporate address book registered by corporate customers, the personal address book registered by the individual customer, and the customer preferences that reflect the user's usage history. The speech recognition processing apparatus 100 thereby provides the effect of a large vocabulary speech recognition service while preserving the recognition rate. It offers the service through recognition names that users already use in everyday speech rather than names that are entirely new to them. By reflecting each user's customer preferences, it can provide a speech recognition service in which the recognition rate is ensured.

Meanwhile, in a speech recognition service based on company names, there is a gap between the company name used for official purposes and the recognition name that users actually say. The reason is simple: because the company name represents the company and is meant to carry meaning, the official name is written out in full, whereas ordinary users tend to say it as briefly as possible. For example, the official name is "Samsung Fire & Marine Insurance Co., Ltd.", but users usually say "Samsung Fire" or the brand name "Anycar".

Accordingly, when company names are to be recognized, the recognition names that ordinary users say for each company can be collected in advance. By allowing users to enter such collected recognition names directly, the address book management unit 111 can raise the recognition rate and improve customer satisfaction with the service.
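One way to picture the use of collected recognition names is to let every spoken variant resolve to the same receiving telephone number as the official name. The sketch below assumes this mapping; the function name `expand_aliases` and the placeholder telephone number are hypothetical and do not appear in the description.

```python
def expand_aliases(official_name, phone, aliases):
    """Map the official name and each spoken variant to one phone number."""
    vocabulary = {official_name: phone}
    for alias in aliases:
        # Each collected spoken name becomes its own recognition target.
        vocabulary[alias] = phone
    return vocabulary
```

For instance, `expand_aliases("Samsung Fire & Marine Insurance Co., Ltd.", "000-0001", ["Samsung Fire"])` yields two recognition targets that both lead to the same number.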

Meanwhile, when the speech recognition unit 130 recognizes company names for an unspecified large number of users, recognition performance drops and the service becomes difficult to provide. Because there are too many target words and new company names keep appearing, it is difficult to offer the same speech recognition service to everyone.

The speech recognition processing apparatus 100 therefore aims to tailor the speech recognition service to each customer who uses it. The address book management unit 111 accepts service registration on the web from individual customers who want company names to be recognized by voice, and corporate customers enter the company names they want.

Meanwhile, the speech recognition unit 130 cannot provide recognition for all of the selected company names; recognition is possible only for the company names with which it has been initialized with the support of the speech recognition support unit 120. The address book management unit 111 checks the previously analyzed frequency of recognition target names and selects 1,000 words. For example, because the hardware specification of the speech recognition engine supports on the order of 1,000 words, the address book management unit 111 builds the integrated address book by dividing the selected words into fixed company names and dynamic company names. Five hundred words (50%), taken from the company names stored in the corporate address book storage unit 113, are stored in the integrated address book management unit 115 so that ordinary users cannot modify them. The other five hundred words (50%) fill the remainder of the 1,000 words and can be modified by ordinary users. For consistency with the company names registered by the companies, the address book of recognition target words is composed as follows.

Specifically, the address book management unit 111 prevents users from modifying company names originally registered by corporate customers. That is, the telephone number for each company name entered directly by a corporate customer cannot be modified by users. In contrast, the address book management unit 111 lets users change the modifiable speech recognition names to the company names they want: of the 1,000 words, the 500 outside the fixed set can be modified by the user. In addition, when registering on the web, the user also registers the calling telephone number.
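The modification rule above can be illustrated with a small sketch. The class `IntegratedAddressBook` and its `replace` method are hypothetical names; the description specifies only that fixed entries are closed to user edits and that personal entries are open to them. Rejecting a new name that collides with a fixed entry is an added assumption.

```python
class IntegratedAddressBook:
    """Fixed entries are read-only for users; personal entries are editable."""

    def __init__(self, fixed, personal):
        self.fixed = dict(fixed)        # name -> phone, registered by companies
        self.personal = dict(personal)  # name -> phone, managed by the user

    def replace(self, old_name, new_name, phone):
        """Swap one personal entry for another chosen by the user."""
        if old_name in self.fixed:
            raise PermissionError("fixed entries cannot be modified by users")
        if old_name not in self.personal:
            raise KeyError(old_name)
        if new_name in self.fixed:
            # Assumption: a personal entry may not shadow a fixed one.
            raise ValueError("name already exists as a fixed entry")
        del self.personal[old_name]
        self.personal[new_name] = phone
```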

Meanwhile, when an ordinary user who has registered for the speech recognition service uses it, the address book management unit 111 receives the user's usage history from the speech recognition support unit 120 and stores it in the customer preference storage unit 114. To track recognition performance and usage history for subscribed users, the customer preference storage unit 114 is built as a database that stores customer preferences. As customer preferences, it stores the calling terminal number (CID), the user's voice information, and the speech recognition result. The calling terminal number (CID) is stored in order to distinguish users.
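A record of this kind can be sketched as a small in-memory store keyed by the calling terminal number. The names `UsageRecord` and `CustomerPreferenceStore` are hypothetical, and holding a reference to the voice data rather than the audio itself is an assumption; the description says only that the CID, the user's voice information, and the recognition result are stored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    cid: str              # calling terminal number, identifies the user
    voice_ref: str        # reference to the stored user voice data
    recognized_name: str  # speech recognition result


class CustomerPreferenceStore:
    """In-memory stand-in for the customer preference storage unit 114."""

    def __init__(self):
        self._records = defaultdict(list)  # cid -> list of UsageRecord

    def add(self, record):
        self._records[record.cid].append(record)

    def history(self, cid):
        """Return a copy of the records stored for one calling number."""
        return list(self._records.get(cid, ()))
```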

The speech recognition processing apparatus 100 extracts customer preferences, at an arbitrary interval, from the database built with the integrated address book and reflects them back into the speech recognition service.

Specifically, the address book management unit 111 aggregates the names uttered by all users requesting the speech recognition service, as delivered through the speech recognition support unit 120, and ranks each uttered name.

The address book management unit 111 then takes the ranked names and newly reflects the speech recognition names ranked 1st through 500th in the corporate address book, which consists of 500 fixed entries. It reflects the remaining 500 names in the personal address book that the customer can modify. Here, the address book management unit 111 may compare them with the user's previous modifiable recognition target address book and reflect only newly entered names in the personal address book. The address book management unit 111 also notifies the user of the newly entered entries, and the user can then modify the previous personal address book based on that notice. The purpose is to reflect customer preferences in each individual's address book.
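The ranking and comparison steps can be sketched as follows. The function names `rank_names` and `refresh` are hypothetical, ranking by raw utterance count with alphabetical tie-breaking is an assumption, and the sizes are parameters so that the 500/500 split in the description is only the default.

```python
from collections import Counter


def rank_names(utterances):
    """Return uttered names ordered by descending count (ties by name)."""
    counts = Counter(utterances)
    return sorted(counts, key=lambda name: (-counts[name], name))


def refresh(utterances, previous_personal, fixed_size=500, personal_size=500):
    """Split the ranking and report names new to the personal candidates."""
    ranked = rank_names(utterances)
    fixed = ranked[:fixed_size]
    candidates = ranked[fixed_size:fixed_size + personal_size]
    known = set(previous_personal)
    # Only names absent from the previous personal book are announced.
    newly_entered = [name for name in candidates if name not in known]
    return fixed, candidates, newly_entered
```

The `newly_entered` list corresponds to the entries that would be announced to the user, who then decides whether to adopt them into the personal address book.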

Thereafter, the recognition target address book that was selected in advance and used when the service was first launched changes over time according to actual users' usage history. Over time, the company names (recognition names) are thus refined according to individual tastes and usage history. Because the service is tailored to each individual even under large vocabulary recognition demands, the address book management unit 111 manages the personal address book, the corporate address book, and the integrated address book so as to provide the level of recognition performance the user wants.

When the speech recognition service is provided to a user, the speech recognition unit 130 uses the speech recognition names (company names) updated by the recognition target address book management unit 110. To this end, the address book management unit 111 treats the terminal number each user registered at subscription as that user's identification number and stores each user's personal address book in the personal address book storage unit 112. The user can modify this personal address book on the web through the individual customer terminal 1012, and modifications are reflected in the personal address book storage unit 112 in real time.

FIG. 3 is a flowchart of an embodiment of the speech recognition processing method for large vocabulary speech recognition according to the present invention.

First, the speech recognition support unit 120 receives a speech recognition target word from the user terminal 102, according to the scenario of the speech recognition service, through the interface with the application server 10 (302).

When the speech recognition target word is input by voice from the user terminal 102, the recognition target address book management unit 110 generates an integrated address book for each user from the separately managed personal address book and corporate address book (304).

Next, the speech recognition unit 130 recognizes the input speech recognition target word as a speech recognition name in the integrated address book generated by the recognition target address book management unit 110 (306).

Thereafter, the speech recognition support unit 120 extracts customer preferences by analyzing, for each user, the usage history of the speech recognition names recognized by the speech recognition unit 130 (308).

The recognition target address book management unit 110 then includes the user's calling terminal number, voice data, and the recognized speech recognition name in the customer preferences extracted by the speech recognition support unit 120, and updates the personal address book using those customer preferences (310).

Meanwhile, the recognition target address book management unit 110 checks the frequency of the recognized speech recognition names, selects a preset number of them, and generates an integrated address book containing the selected names. It also generates the integrated address book by dividing the selected speech recognition names, at a preset ratio, into modifiable names and non-modifiable names.
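Selection by frequency followed by a split at a preset ratio can be sketched as below. The function `select_and_split` and its parameters are hypothetical; the defaults of 1,000 names and a ratio of 0.5 mirror the figures given in the embodiment, and assigning the most frequent names to the non-modifiable part is an assumption consistent with the ranking described earlier.

```python
def select_and_split(frequencies, total=1000, fixed_ratio=0.5):
    """Pick the `total` most frequent names and split them by ratio.

    `frequencies` maps each speech recognition name to its count.
    Returns (non_modifiable, modifiable) lists.
    """
    if not 0.0 <= fixed_ratio <= 1.0:
        raise ValueError("fixed_ratio must be between 0 and 1")
    ranked = sorted(frequencies, key=lambda name: (-frequencies[name], name))
    selected = ranked[:total]
    n_fixed = int(len(selected) * fixed_ratio)
    return selected[:n_fixed], selected[n_fixed:]
```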

Meanwhile, the method of the present invention described above can be written as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program is stored in a computer-readable recording medium (information storage medium) and is read and executed by a computer to implement the method of the present invention. The recording medium includes all forms of computer-readable recording media.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since those of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from its technical spirit.

FIG. 1 is a block diagram of a broadband integrated network system to which the speech recognition processing apparatus according to the present invention is applied;

FIG. 2 is a block diagram of an embodiment of the speech recognition processing apparatus for large vocabulary speech recognition according to the present invention; and

FIG. 3 is a flowchart of an embodiment of the speech recognition processing method for large vocabulary speech recognition according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: application server 100: speech recognition processing apparatus

110: recognition target address book management unit 111: address book management unit

112: personal address book storage unit 113: corporate address book storage unit

114: customer preference storage unit 115: integrated address book management unit

120: speech recognition support unit 130: speech recognition unit

140: server interface unit

Claims

A speech recognition processing apparatus for large vocabulary speech recognition, comprising:

server interface means for providing an interface with an application server for a speech recognition service;

speech recognition support means for supporting the speech recognition service by receiving, through the interface, a speech recognition target word from a user according to a scenario of the speech recognition service;

recognition target address book management means for separately managing a personal address book and a corporate address book and, when the speech recognition target word is input by voice from the user, generating an integrated address book for each user from the managed personal address book and corporate address book; and

speech recognition means for recognizing the input speech recognition target word as a speech recognition name in the generated integrated address book.

The apparatus of claim 1, wherein the speech recognition support means extracts customer preferences by analyzing, for each user, the usage history of the recognized speech recognition name.

The apparatus of claim 2, wherein the speech recognition support means includes the user's calling terminal number, voice data, and the recognized speech recognition name in the extracted customer preferences.

The apparatus of claim 3, wherein the recognition target address book management means further comprises customer preference storage means for storing the extracted customer preferences for each user, and updates the managed personal address book using the stored customer preferences.

The apparatus of any one of claims 1 to 4, wherein the recognition target address book management means comprises:

address book management means for receiving registration of a personal address book and a corporate address book from an individual customer and a corporate customer, respectively, and for generating and managing an integrated address book by integrating the registered personal address book and the registered corporate address book;

personal address book storage means for storing, for each customer, the personal address book registered by the individual customer;

corporate address book storage means for storing the corporate address book registered by the corporate customer; and

integrated address book storage means for storing the generated integrated address book.

The apparatus of claim 5, wherein the address book management means checks the frequency of the recognized speech recognition names, selects a preset number of speech recognition names, and generates an integrated address book including the selected speech recognition names.

The apparatus of claim 6, wherein the address book management means generates the integrated address book by dividing the selected speech recognition names, at a preset ratio, into modifiable names and non-modifiable names.

The apparatus of claim 5, wherein the address book management means analyzes the recognized speech recognition names to obtain a ranking of the speech recognition names, and newly reflects the obtained ranking in the speech recognition names of the stored personal address book.

A speech recognition processing method for large vocabulary speech recognition, comprising:

a word input step of receiving a speech recognition target word from a user, according to a scenario of a speech recognition service, through an interface with an application server;

an integrated address book generation step of generating an integrated address book for each user from a personal address book and a corporate address book that are kept separate from each other, when the speech recognition target word is input by voice from the user; and

a speech recognition step of recognizing the input speech recognition target word as a speech recognition name in the generated integrated address book.

The method of claim 9, further comprising a customer preference extraction step of extracting customer preferences by analyzing, for each user, the usage history of the recognized speech recognition name.

The method of claim 10, wherein the customer preference extraction step includes the user's calling terminal number, voice data, and the recognized speech recognition name in the extracted customer preferences, and updates the managed personal address book using the extracted customer preferences.

The method of any one of claims 9 to 11, wherein the integrated address book generation step checks the frequency of the recognized speech recognition names, selects a preset number of speech recognition names, and generates an integrated address book including the selected speech recognition names.

13. The method of claim 12, wherein the integrated address book generation step generates the integrated address book by dividing the selected speech recognition names, at a preset ratio, into modifiable names and non-modifiable names.