KR100994930B1

KR100994930B1 - Adaptive voice recognition control method for voice recognition based home network system and the system thereof

Info

Publication number: KR100994930B1
Application number: KR1020080070484A
Authority: KR
Inventors: 김기현
Original assignee: 주식회사 씨에스
Priority date: 2008-07-21
Filing date: 2008-07-21
Publication date: 2010-11-17
Also published as: KR20100009730A

Abstract

An adaptive voice recognition control method for a voice recognition based home network system, and a system thereof, are disclosed. The method according to the present invention is performed in a voice recognition home network comprising an Internet network, a wall pad connected to the Internet network and a home network to perform calls and various control functions, a voice recognition unit connected to the wall pad to perform voice recognition, and a management server connected to the Internet network to manage the adapted voice recognition model in the voice recognition unit. The method comprises: (a) the wall pad receiving from a user a request to start adaptation; (b) the voice recognition unit performing adaptation in response to step (a) to generate an adaptation model for that user; (c) transmitting the adaptation model generated in step (b) from the wall pad to the management server through the Internet network; (d) the management server verifying the recognition rate of the adaptation model transmitted in step (c) using speech data from a sufficiently large number of speakers; and (e) the management server commanding deletion of the adaptation model if the recognition rate is verified as inadequate in step (d).

Home network, voice recognition, adaptation, rejection rate

Description

Adaptive voice recognition control method for voice recognition based home network system and the system thereof

The present invention relates to an adaptive voice recognition control method for a voice recognition based home network system and a system thereof. More particularly, it relates to a method and system for efficiently addressing the drop in rejection rate that can occur when speaker-dependent voice recognition using an adaptation technique is used to supplement speaker-independent voice recognition in a voice recognition based home network system.

In general, a home network system is a system that connects lighting, heating and cooling, curtains, gas valves, and other security devices in a home into a single network and controls them. Such a system consists of a user interface called a wall pad, which performs calls and various control functions; a device controller to which the various devices in the home are connected for monitoring and control; a door phone installed outside the front door and/or a lobby phone installed at the common entrance to manage access; a guard room phone installed in the guard room; and a complex server for maintaining and managing the housing complex, together with the servers that make up the information and communication network.

However, such conventional home network systems have a complicated user interface that is difficult to use, so ordinary users end up using little more than the interphone function. To solve this problem, voice recognition home network systems are being developed that add speaker-independent voice recognition to the user interface so that the various functions of the home network can be operated by speech.

Voice recognition techniques can be divided into those that use a speaker-independent standard voice model and those that use a speaker-dependent adaptive voice model. A standard voice model is obtained by extracting, from voice data recorded from a large number of unspecified speakers, the features common to each target word or sentence to be recognized, and using the extracted features as the reference model for voice recognition. However, individual speakers differ slightly in speaking habits and intonation, so recognition rates differ even when the same speaker-independent model is applied, and for a person with an unusual timbre or speaking habit the recognition rate can drop markedly.

In other words, even speaker-independent voice recognition performs markedly worse for people with unusual speaking habits or intonation. To remedy this, adaptation is used: a technique that raises the recognition rate of such a person to the level of ordinary users through a separate training process for the recognizer. This makes it possible to use a modified recognition model optimized for a specific speaker, that is, a voice recognition model in which an adapted model is combined with the standard voice recognition model, so that a recognition rate above a certain level is achieved for that speaker as well.

However, adaptation means that an adaptation model for a specific speaker is used in addition to the reference model. The recognition rate may improve, but the model is then out of the developer's hands, so the rejection rate becomes difficult to control and misrecognition may increase. In addition, verifying the recognition rate requires a very large database of command utterances from many speakers, and the control process for verification using those utterances is complex, so it is difficult to build such a function into the wall pad for the user to run directly. Likewise, verifying the rejection rate requires a large amount of household noise data and test time, and the processing program is highly complex, so it is not easy to implement on the wall pad for the user to verify directly.

The present invention has been made to solve the above problems. Its technical object is to provide a voice recognition home network system that resolves the problems that can arise when adaptation is applied to raise the recognition rate of speaker-independent recognition, so that both the rejection rate and the recognition rate can be optimized.

According to one aspect of the present invention for solving the above technical problem, there is provided an adaptive voice recognition control method for a voice recognition based home network system,

the method being performed in a voice recognition home network comprising an Internet network, a wall pad connected to the Internet network and a home network to perform calls and various control functions, a voice recognition unit connected to the wall pad to perform voice recognition, and a management server connected to the Internet network to manage the adapted voice recognition model in the voice recognition unit, and comprising: (a) the wall pad receiving from a user a request to start adaptation; (b) the voice recognition unit performing adaptation in response to step (a) to generate an adaptation model for that user; (c) transmitting the adaptation model generated in step (b) from the wall pad to the management server through the Internet network; (d) the management server verifying the recognition rate of the adaptation model transmitted through the Internet network in step (c) using speech data from a sufficiently large number of speakers; and (e) the management server commanding deletion of the adaptation model if the recognition rate is verified as inadequate in step (d).

More preferably, the method further comprises: (f) if the recognition rate is determined to be adequate in step (e), the management server verifying the rejection rate using household noise data of a sufficiently long duration; (g) if the rejection rate is determined to be inadequate in step (f), the management server generating an adaptation model supplement patch including a filler model required to secure the rejection rate; (h) the management server transmitting the adaptation model supplement patch to the wall pad through the Internet network; and (i) the wall pad receiving the adaptation model supplement patch transmitted in step (h) and the voice recognition unit updating the adaptation model using the received patch.

According to another aspect of the present invention for solving the above technical problem, there is provided an adaptive voice recognition control system for a home network system using voice recognition, comprising: a wall pad that is connected to an Internet network and to home network devices and that includes, along with call and various control functions, a guide unit providing the user with the sequence of guidance needed for adaptation; a voice recognition unit connected to the wall pad and having a first module that processes voice recognition and a second module that, in response to the guidance of the wall pad, executes a predetermined adaptive voice recognition algorithm based on speaker-dependent voice recognition to generate an adaptation model specialized for the user's speech, transmits the generated adaptive voice model to a management server through a communication network, and uses the verification result from the management server to carry out an adaptive voice model management process that includes deleting the adaptation model and/or updating it with an adaptation model supplement patch; and a management server including a test voice database storing speech data for the target commands from a sufficiently large number of unspecified speakers, a feature extraction unit that loads voices stored in the test voice database and extracts features, a comparison and search unit that compares and searches using the extracted features and the adaptation model transmitted over the Internet, a decision unit that determines from the search and comparison result whether the recognized voice corresponds to a predefined command, and a verification unit that verifies the recognition rate based on the recognition results of the comparison and search unit and the decision unit and, if the rate is judged inadequate, generates an adaptation model delete command as the verification result and transmits it to the wall pad; wherein the wall pad, on receiving the adaptation model delete command, controls the voice recognition unit to delete the corresponding adaptation model.

In addition, the test voice database may store both speech data for the target commands from a sufficiently large number of unspecified speakers and household noise data of a sufficiently long duration. In that case, the verification unit verifies the recognition rate based on the recognition results of the comparison and search unit and the decision unit; if the rate is judged inadequate, it generates an adaptation model delete command as the verification result and transmits it to the wall pad, and if the rate is judged adequate, it generates from the data on which misrecognition occurs an adaptation model supplement patch including a filler model as the verification result and transmits it to the wall pad. The wall pad, on receiving the adaptation model delete command, controls the voice recognition unit to delete the corresponding adaptation model, and on receiving the adaptation model supplement patch, controls the voice recognition unit to update the adaptation model using the patch.

With the adaptive voice recognition control method and system according to the present invention, the management server uses speech data from a sufficiently large number of speakers to verify whether the recognition rate is secured. If a predetermined criterion is met, for example at least 90% of the speech data is recognized correctly, the adaptive voice model is adopted as a supplement to the standard voice model. Otherwise, it is judged to deviate too far from the standard voice model and therefore to carry a high risk of misrecognition, and it is deleted. This reduces the likelihood of misrecognition arising from adaptation; in other words, it compensates for the problems that can occur when adaptive voice recognition is applied by the wall pad alone.
Optionally, when verification shows that input which should be rejected is instead misrecognized, the management server transmits an adaptation model supplement patch that supplements the adaptation model with a filler model to prevent the misrecognition, and the wall pad and the voice recognition unit update the voice model using the patch. Including this rejection rate verification process addresses more efficiently the drop in rejection rate that can occur while the recognition rate is being improved.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the schematic structure of a system for performing the adaptive voice recognition control method of a voice recognition based home network system according to the present invention. Referring to FIG. 1, the system includes a wall pad (10) that is connected to an Internet network (1) and home network devices (104) and that includes, along with call and various control functions, a guide unit (102) providing the user with the sequence of guidance needed for adaptation,

a voice recognition unit (12) connected to the wall pad (10) and having a first module (122) that processes voice recognition and a second module (126) that, in response to the guidance of the wall pad (10), executes a predetermined adaptive voice recognition algorithm based on speaker-dependent voice recognition built on the standard voice recognition model (124) to generate an adaptation model specialized for the user's speech, transmits the generated adaptive voice model (128) to the management server (14) through the Internet network (1), and uses the verification result from the management server (14) to carry out an adaptive voice model management process that includes deleting the adaptation model and/or updating it with an adaptation model supplement patch, and

a management server (14) including a test voice database (142) storing speech data for the target commands from a sufficiently large number of unspecified speakers, a feature extraction unit (144) that loads voices stored in the test voice database (142) and extracts features, a comparison and search unit (146) that compares the extracted features with the adaptation model (145) transmitted over the Internet, a decision unit (148) that determines from the search and comparison result of the comparison and search unit (146) whether the recognized voice corresponds to a predefined command, and a verification unit (150) that verifies the recognition rate based on the recognition results of the comparison and search unit (146) and the decision unit (148) and, if the rate is judged inadequate, generates an adaptation model delete command (delete_adaptmodel) as the verification result and transmits it to the wall pad.

More preferably, the test voice database (142) stores both speech data for the target commands from a sufficiently large number of unspecified speakers and household noise data of a sufficiently long duration. The verification unit (150) verifies the recognition rate based on the recognition results of the comparison and search unit (146) and the decision unit (148); if the rate is judged inadequate, it generates an adaptation model delete command (delete_adaptmodel) as the verification result and transmits it to the wall pad, and if the rate is judged adequate, it generates from the data on which misrecognition occurs an adaptation model supplement patch (complement_adaptmodel) including a filler model as the verification result and transmits it to the wall pad (10). The wall pad (10), on receiving the adaptation model delete command (delete_adaptmodel), controls the voice recognition unit (12) to delete the corresponding adaptation model, and on receiving the adaptation model supplement patch (complement_adaptmodel), controls the voice recognition unit (12) to update the adaptation model using the patch.
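The wall pad's handling of the two verification results can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the patent's implementation: the command names `delete_adaptmodel` and `complement_adaptmodel` come from the text, while the `VoiceRecognitionUnit` class, its fields, the dictionary-based patch layout, and the choice to discard filler models together with the deleted model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoiceRecognitionUnit:
    """Hypothetical stand-in for the voice recognition unit (12)."""
    adaptation_model: Optional[dict] = None           # per-user adaptation model
    filler_models: list = field(default_factory=list)

    def delete_adaptation_model(self) -> None:
        # Assumption: filler models belong to the adaptation model and go with it.
        self.adaptation_model = None
        self.filler_models.clear()

    def apply_patch(self, patch: dict) -> None:
        # Assumed patch layout: {"filler_models": [...]}.
        if self.adaptation_model is None:
            raise ValueError("no adaptation model to patch")
        self.filler_models.extend(patch.get("filler_models", []))


def handle_verification_result(unit: VoiceRecognitionUnit,
                               command: str,
                               payload: Optional[dict] = None) -> None:
    """Wall pad (10) side: route a result received from the management server (14)."""
    if command == "delete_adaptmodel":
        unit.delete_adaptation_model()
    elif command == "complement_adaptmodel":
        unit.apply_patch(payload or {})
    else:
        raise ValueError(f"unknown command: {command}")
```

Keeping the routing in the wall pad and the model state in the recognition unit mirrors the division of roles in the description, where the wall pad controls and the voice recognition unit acts.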

In such a system, the adaptive voice recognition control method according to the present invention is carried out in a distributed manner across the wall pad (10), the voice recognition unit (12), and the management server (14). The method is implemented by storing, loading, and executing programs, program code, or code segments for the series of steps handled by each component, and can be programmed by a person skilled in the art.

FIG. 2 is a flowchart showing the main processing steps of the adaptive voice recognition control method of a voice recognition based home network system according to the present invention. FIG. 1 is referred to as needed below. Referring to FIG. 2, the method according to an embodiment of the present invention is performed in a voice recognition home network comprising an Internet network (1), a wall pad (10) connected to the Internet network (1) and a home network to perform calls and various control functions, a voice recognition unit (12) connected to the wall pad (10) to perform voice recognition, and a management server (14) connected to the Internet network (1) to manage the adapted voice recognition model in the voice recognition unit (12), and includes:

(a) the wall pad (10) receiving from a user a request to start adaptation (S200);

(b) the voice recognition unit (12) performing adaptation in response to step (a) (S200) to generate an adaptation model for that user (S202);

(c) transmitting the adaptation model generated in step (b) (S202) from the wall pad to the management server (14) through the Internet network (1) (S204);

(d) the management server (14) verifying the recognition rate of the adaptation model transmitted through the Internet network (1) in step (c) (S204) using speech data from a sufficiently large number of speakers (S206); and

(e) if the recognition rate is verified as inadequate in step (d) (S206), the management server (14) generating an adaptation model delete command (delete_adaptmodel) that commands deletion of the adaptation model (S208).
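Steps (a) through (e) can be traced with a short sketch. It is only an outline under stated assumptions: the step labels S200 to S208 and the command name `delete_adaptmodel` come from the description, while the function name and the two callables `adapt` and `recognition_rate_ok` are hypothetical placeholders for the voice recognition unit's adaptation and the management server's verification.

```python
from typing import Callable, List, Optional, Tuple


def run_steps_a_to_e(
    adapt: Callable[[str], dict],
    recognition_rate_ok: Callable[[dict], bool],
    user_id: str,
) -> Tuple[List[str], Optional[str]]:
    """Trace S200-S208 and return (step log, command for the wall pad or None)."""
    log = ["S200: wall pad accepts adaptation request"]
    model = adapt(user_id)                  # (b) runs on the voice recognition unit
    log.append("S202: adaptation model generated")
    log.append("S204: model sent to management server")
    ok = recognition_rate_ok(model)         # (d) runs on the management server
    log.append("S206: recognition rate verified")
    if not ok:
        log.append("S208: delete_adaptmodel generated")
        return log, "delete_adaptmodel"
    return log, None
```

Passing the two stages in as callables keeps the sketch independent of any particular adaptation algorithm, which the description leaves open.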

FIG. 3 illustrates in more detail the process of updating the voice recognition model through adaptation. The process described with reference to FIG. 3 corresponds to step (b) of FIG. 2. Referring to FIG. 3, speech from a specific speaker whose recognition rate is markedly low, preferably utterances of a few predetermined sentences, is input, passed through input signal processing (302) such as level adjustment, and recorded (304). Features are extracted (308) from the stored voice data (306) and modeled (310), and the voice recognition model is updated (312). The updated voice recognition model (314) includes a voice model, a lexicon, and voice target IDs. Updating the voice recognition model here means that the adaptive voice model produced by the process above is added to the standard voice model, that is, the reference model built by extracting the standard features of the constituent phonemes of each target word or sentence from voice data recorded from a large number of unspecified speakers.

However, an adaptive voice model that updates the standard voice model through this adaptation process has not been verified by the developer. If it deviates too far from the standard voice model, it becomes more likely to misrecognize non-command speech or ambient noise as a command. According to the present invention, therefore, the adaptive voice model created by the adaptation process in the wall pad (10) and the voice recognition unit (12) is transmitted to the management server (14), which verifies its recognition rate and, if the rate is judged inadequate, generates an adaptation model delete command (delete_adaptmodel) and transmits it to the wall pad (10). Recognition rate verification uses speech data from a sufficiently large number of speakers for each command, for example at least 100 speakers. The speech data is stored in advance in the test voice database (142) of the management server (14). Against a predetermined criterion, for example a target performance of 90%, the data of at least 90 speakers must be recognized successfully in a voice recognition test using the data of 100 speakers.
In other words, the management server (14) applies the adaptive voice model that a user with a low recognition rate generated by operating the wall pad (10) through the adaptation process, and uses the speech data of many speakers stored in the test voice database (142) to verify whether the recognition rate is secured. If at least 90% of the speech data is recognized correctly, the model is adopted as an adaptive voice model supplementing the standard voice model; otherwise, it is judged to deviate too far from the standard voice model and thus to carry a high risk of misrecognition, and it is deleted.
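The 90-out-of-100 criterion can be made concrete with a short sketch. The 90% target and the `delete_adaptmodel` command name are taken from the description; the function names, the `recognize` callable, and the representation of the test set as `(audio, expected_command)` pairs are assumptions made for illustration.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def recognition_rate(recognize: Callable[[Any], Optional[str]],
                     test_utterances: Iterable[Tuple[Any, str]]) -> float:
    """Fraction of (audio, expected_command) pairs that are recognized correctly."""
    total = 0
    correct = 0
    for audio, expected in test_utterances:
        total += 1
        if recognize(audio) == expected:
            correct += 1
    if total == 0:
        raise ValueError("test set is empty")
    return correct / total


def verify_recognition_rate(recognize: Callable[[Any], Optional[str]],
                            test_utterances: Iterable[Tuple[Any, str]],
                            target: float = 0.90) -> Optional[str]:
    """Return None to keep the model, or 'delete_adaptmodel' to discard it."""
    rate = recognition_rate(recognize, test_utterances)
    return None if rate >= target else "delete_adaptmodel"
```

With 100 test speakers, a recognizer that succeeds on 90 of them passes the check, while one that succeeds on only 89 triggers the delete command.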

Meanwhile, always-on voice recognition may recognize telephone conversations, small talk, and other household noise as commands. Preventing such misrecognition requires an adequate rejection rate. To secure the rejection rate, voice recognition is attempted on household noise data of a sufficiently long duration to check whether any of it is recognized as a command. However, such household noise data is very large, and the rejection rate verification process that uses it is likewise unsuited to running on the wall pad (10). According to the present invention, therefore, the management server (14) uses the household noise data stored in the test voice database (142) to check whether the voice recognition model created during adaptation produces misrecognitions, and where it does, inserts a filler model, in the form of a patch, to reject those utterances.
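The rejection check amounts to running the recognizer over noise and collecting whatever it wrongly accepts. The sketch below illustrates that idea only; it does not train an actual filler model. The function names, the convention that `recognize` returns `None` for rejected input, and the patch layout are assumptions.

```python
from typing import Any, Callable, List, Optional


def find_false_accepts(recognize: Callable[[Any], Optional[str]],
                       noise_segments: List[Any]) -> List[Any]:
    """Return the noise segments that the recognizer wrongly accepts as a command.

    `recognize` is assumed to return a command string, or None when it rejects.
    """
    return [seg for seg in noise_segments if recognize(seg) is not None]


def build_supplement_patch(false_accepts: List[Any]) -> Optional[dict]:
    """Assumed patch format: one filler entry per falsely accepted segment."""
    if not false_accepts:
        return None  # rejection rate is adequate, nothing to patch
    return {"filler_models": [{"source": seg} for seg in false_accepts]}
```

In a real system the filler entries would be acoustic models trained on the offending segments; here they are placeholders that record which segments caused the false accepts.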

That is, the adaptive speech recognition control method of the speech recognition based home network system according to the present invention more preferably further includes: (f) when the recognition rate is determined to be adequate in step (e) (S206), the management server 14 verifying the rejection rate using everyday noise data of a sufficiently long duration (S220);

(g) when the rejection rate is determined to be inadequate in step (f) (S220), the management server 14 generating an adaptive model supplemental patch including a filler model required to secure the rejection rate (S222);

(h) the management server 14 transmitting the adaptive model supplemental patch to the wall pad 10 through the Internet network 1; and

(i) the wall pad receiving the adaptive model supplemental patch transmitted in step (h) (S224), and the speech recognition unit 12 updating the adaptive model using the received adaptive model supplemental patch (S226).
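Steps (f) and (g) can be sketched roughly as below. This is an assumption-laden illustration: the source gives no numeric rejection criterion, so `min_rejection_rate` is a required, hypothetical parameter, and the source does not describe how a filler model is trained, so `SupplementalPatch` merely records the noise segments that were wrongly accepted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical recognizer interface: returns a command, or None when it rejects the input.
Recognizer = Callable[[str], Optional[str]]


@dataclass
class SupplementalPatch:
    # Noise segments that were wrongly accepted as commands; a filler
    # model would be derived from these (training is not shown here).
    filler_sources: List[str]


def rejection_rate(recognize: Recognizer,
                   noise_segments: Sequence[str]) -> Tuple[float, List[str]]:
    """Return the share of noise segments rejected, plus the false accepts."""
    if not noise_segments:
        raise ValueError("noise data must not be empty")
    false_accepts = [seg for seg in noise_segments if recognize(seg) is not None]
    rate = 1.0 - len(false_accepts) / len(noise_segments)
    return rate, false_accepts


def build_patch_if_needed(recognize: Recognizer,
                          noise_segments: Sequence[str],
                          min_rejection_rate: float) -> Optional[SupplementalPatch]:
    """Step (g): produce a patch only when the rejection rate is inadequate."""
    rate, false_accepts = rejection_rate(recognize, noise_segments)
    if rate >= min_rejection_rate:
        return None
    return SupplementalPatch(filler_sources=false_accepts)
```

A returned patch would then be sent to the wall pad (step (h)) and applied by the speech recognition unit (step (i)); a return value of `None` means no patch is needed.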

Next, FIG. 4 is a diagram for explaining the program processing procedure for implementing the control method of FIG. 2. Referring to FIG. 4, the wall pad 10 detects whether the user presses the adaptation button provided on the wall pad (S402). When the press is detected, the wall pad 10 transmits an adaptation mode start protocol to the speech recognition unit 12, the speech recognition unit 12 receives the protocol and transmits an acknowledgment (ACK) (S408), and the wall pad 10 receives the ACK (S410) and proceeds with the adaptation mode (S412). Control for the adaptation mode is thereby started, and the speech recognition unit 12 performs the adaptation computation in response to that control (S414). When the adaptation computation is complete, the speech recognition unit 12 transmits to the wall pad 10 an adaptation computation completion protocol announcing that the computation is complete (S416). The wall pad 10 receives the adaptation computation completion protocol (S418) and ends the adaptation mode (S420), and the speech recognition unit 12 then switches to the recognition mode for speech recognition operation.
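The exchange of FIG. 4 can be sketched as a small in-process simulation. The message strings (`"START_ADAPTATION"`, `"ACK"`, `"ADAPTATION_DONE"`) and the class and method names are hypothetical, since the source does not define a wire format. Only the step order and the S-numbers named in the comments come from the text.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    RECOGNITION = "recognition"
    ADAPTATION = "adaptation"


class SpeechRecognitionUnit:
    def __init__(self) -> None:
        self.mode = Mode.RECOGNITION

    def handle(self, message: str) -> str:
        """Receive the start protocol and answer with an acknowledgment (S408)."""
        if message != "START_ADAPTATION":
            raise ValueError(f"unexpected message: {message}")
        self.mode = Mode.ADAPTATION
        return "ACK"

    def adapt(self) -> str:
        """Run the adaptation computation (S414) and report completion (S416)."""
        # Placeholder: the actual adaptation algorithm is not shown here.
        self.mode = Mode.RECOGNITION  # back to normal recognition afterwards
        return "ADAPTATION_DONE"


class WallPad:
    def __init__(self, unit: SpeechRecognitionUnit) -> None:
        self.unit = unit
        self.in_adaptation_mode = False
        self.log: List[str] = []

    def on_adaptation_button(self) -> List[str]:
        """Button press detected (S402): drive the whole handshake."""
        self.log.append("S402 button pressed")
        reply = self.unit.handle("START_ADAPTATION")
        if reply != "ACK":
            raise RuntimeError("speech recognition unit did not acknowledge")
        self.log.append("S410 ACK received")
        self.in_adaptation_mode = True               # S412
        done = self.unit.adapt()
        if done == "ADAPTATION_DONE":
            self.log.append("S418 completion received")
            self.in_adaptation_mode = False          # S420
        return self.log
```

In a real system the two parties would exchange these messages over whatever link connects the wall pad and the speech recognition unit. The direct method calls here only illustrate the ordering.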

FIGS. 5A and 5B show, as a flowchart, an example of an adaptation process that can be applied to the control method of FIG. 4. Referring to FIGS. 5A and 5B, the wall pad 10 displays an explanation of adaptation and presents a selection of gender and module on the screen (S502). It is preferable to provide, in graphical user interface form, guidance text, gender selection buttons, mode selection buttons, and Next/Cancel buttons. If necessary, an explanation of adaptation, precautions for adaptation, and gender selection guidance may be included.

Next, the number of adaptations the user has performed is checked (S504). If, for example, the count is greater than 3, the user is offered a choice of whether to initialize the model (S506); for that choice it is preferable to provide, through the GUI, a button for initializing the model and a Continue/Cancel button for choosing whether to continue or cancel. In other words, the user can be informed that adaptation cannot be performed three or more times per gender and that the model will be initialized. If, on the other hand, the count is less than 3, the user is offered a choice between full adaptation and selective adaptation (S508). When full adaptation is selected, the flow branches to a step in which the user reviews the full command list and then starts adaptation (S510); when selective adaptation is selected, the flow branches to a step in which the user selects the commands to adapt from the command list and then starts adaptation (S512).

When the user utters a command (S520) after checking an indicator on the wall pad 10, for example a green LED, it is determined whether the utterance is valid on the basis of whether it was recognized, the utterance length, a signal-to-noise ratio (SNR) check, the silence length, and so on (S522). It is more preferable to additionally provide a function for playing back the recorded utterance and a function for informing the user of the judgment on the utterance. If the utterance is judged valid, the above process is repeated for the next command. If it is judged invalid, the user is guided to utter the same command again (S524), and it is checked whether the command has been uttered three times (S526): if it has been uttered fewer than three times, the above process is repeated for the current command, and if it has been uttered three times, one more utterance is attempted (S528) and the utterance is registered. When all of the full or selected commands have been uttered, or when the 'adaptation button' is pressed even though not all of them have been uttered, the flow moves to the adaptation execution step (S530). When the utterances are finished, adaptation is performed (S540).
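The utterance check of step S522 combines four criteria: recognition result, utterance length, SNR, and silence length. A minimal sketch follows. All numeric limits are hypothetical placeholders, because the source names the criteria but gives no values, and the `Utterance` fields and function names are likewise illustrative. The SNR formula is the standard decibel definition, 10·log10(P_signal / P_noise).

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    recognized: bool    # did recognition match the prompted command?
    duration_s: float   # utterance length in seconds
    snr_db: float       # signal-to-noise ratio in decibels
    silence_s: float    # silence length in seconds


# Hypothetical limits; the source does not specify any of these values.
MIN_DURATION_S = 0.3
MAX_DURATION_S = 3.0
MIN_SNR_DB = 15.0
MAX_SILENCE_S = 1.0


def compute_snr_db(signal_power: float, noise_power: float) -> float:
    """Standard SNR in decibels from signal and noise power."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def is_valid_utterance(u: Utterance) -> bool:
    """S522: accept the utterance only if every criterion is satisfied."""
    return (u.recognized
            and MIN_DURATION_S <= u.duration_s <= MAX_DURATION_S
            and u.snr_db >= MIN_SNR_DB
            and u.silence_s <= MAX_SILENCE_S)
```

An utterance that fails this check would lead to the re-utterance guidance of step S524.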

Meanwhile, the concept disclosed in the present invention should be distinguished from distributed speech recognition. Distributed speech recognition (DSR) is speech recognition in a server-client system over a wired or wireless network. The essence of this approach is that the preprocessing stage and the recognition stage of speech recognition run on different computers: usually the preprocessing stage, which extracts speech features, runs on the client computer, and the recognition stage, that is, language processing, runs on a server computer that holds a large database. When the entire speech recognition process runs on the client computer, it usually suffers from a lack of data or processing power. Conversely, when the entire process runs on the server computer, noise and distortion arise when the audio data is transmitted to the server as is. Distributed speech recognition can overcome these problems and is a model suited to recent, increasingly distributed system environments.

According to the present invention, however, it should be noted that the entire speech recognition process for adaptation is performed solely on the wall pad side, which corresponds to the client computer, and the resulting adaptive speech model is transmitted to the management server. The management server performs only recognition rate verification and rejection rate verification on the adaptive speech model produced on the wall pad side, decides whether to delete the adaptive speech model, and generates and transmits a supplemental patch that prevents misrecognition. This structure compensates for the problems that can arise when adaptive speech recognition is applied by the wall pad alone.

As described above, the adaptive speech recognition control method of a speech recognition based home network system according to the present invention, and the system thereof, provide a management server and two verification processes. In the recognition rate verification process, the management server verifies the recognition rate using a test speech database that stores in advance the speech data of a sufficiently large number of speakers, and an adaptive model whose recognition rate proves low is deleted. In the rejection rate verification process, the rejection rate is verified using a test speech database that stores in advance everyday noise data of a sufficient duration. When noise is misrecognized instead of being rejected, the management server transmits an adaptive model supplemental patch that supplements the adaptive model, including a filler model as one example, so that the misrecognition no longer occurs, and the wall pad and the speech recognition unit update the speech model using the patch. The degradation of the rejection rate that can occur while improving the recognition rate is thereby remedied efficiently.

FIG. 1 is a block diagram schematically showing the structure of a system for performing the adaptive speech recognition control method of a speech recognition based home network system according to the present invention;

FIG. 2 is a flowchart showing the main processing steps of the adaptive speech recognition control method of a speech recognition based home network system according to the present invention;

FIG. 3 is a diagram for explaining in more detail the process of updating the speech recognition model through adaptation;

FIG. 4 is a diagram for explaining the program processing procedure for implementing the control method of FIG. 2; and

FIGS. 5A and 5B are flowcharts showing an example of an adaptation process that can be applied to the control method of FIG. 4.

Claims

An adaptive speech recognition control method performed in a speech recognition home network comprising an Internet network, a wall pad connected to the Internet network and a home network to perform calls and various control functions, a speech recognition unit connected to the wall pad to perform speech recognition, and a management server connected to the Internet network to manage the adapted speech recognition model in the speech recognition unit, the method comprising:

(a) the wall pad accepting a request from the user to start adaptation;

(b) generating an adaptive model specialized for the user's utterances by performing a recording process for adaptation according to the guidance of the wall pad and running a predetermined adaptive speech recognition algorithm, based on speaker-dependent speech recognition, on the recorded data;

(c) transmitting the adaptive model generated in step (b) from the wall pad to the management server through the Internet network;

(d) the management server retrieving speech stored in a test speech database that stores speech data of the commands to be recognized uttered by a plurality of speakers, extracting features, comparing and searching using the extracted features and the adaptive model transmitted through the Internet network in step (c), determining according to the comparison and search result whether the recognized speech corresponds to a predefined command, and verifying the recognition rate on the basis of the recognition results; and

(e) the management server instructing deletion of the corresponding adaptive model generated in the wall pad when it is verified in step (d) that the recognition success rate of speech recognition using the adaptive model is below a predetermined criterion.

The method of claim 1, further comprising:

(f) when it is verified in step (e) that the recognition success rate of the speech recognition using the adaptive model in step (d) is equal to or higher than the predetermined criterion, the management server verifying the rejection rate using everyday noise data stored in advance in the management server;

(g) when it is determined in step (f) that the rejection rate, which is the rate at which the everyday noise data is rejected rather than recognized as a command, is below a predetermined criterion, the management server generating an adaptive model supplemental patch including a filler model required to secure the rejection rate;

(h) the management server transmitting the adaptive model supplemental patch to the wall pad through the Internet network; and

(i) the wall pad receiving the adaptive model supplemental patch transmitted in step (h), and the speech recognition unit updating the adaptive model using the received adaptive model supplemental patch.

A speech recognition home network system comprising an Internet network; a wall pad connected to the Internet network and a home network, performing calls and various control functions, and including a guide unit that provides the user with a series of guidance needed for adaptation; a speech recognition unit connected to the wall pad to perform speech recognition; and a management server connected to the Internet network to manage the adapted speech recognition model in the speech recognition unit, wherein:

the speech recognition unit has two modules: a first module that is connected to the wall pad and processes speech recognition; and a module that generates an adaptive model specialized for the user's utterances by running a predetermined adaptive speech recognition algorithm, based on speaker-dependent speech recognition, in response to the guidance of the wall pad, transmits the generated adaptive speech model to the management server through a communication network, and performs an adaptive speech model management process, including deleting the adaptive model or updating the adaptive model with an adaptive model supplemental patch, using the verification result from the management server;

the management server includes a test speech database that stores speech data of the commands to be recognized uttered by a plurality of speakers, a feature extraction unit that retrieves the speech stored in the test speech database and extracts features, a comparison search unit that compares and searches using the extracted features and the adaptive model transmitted through the Internet network, a determination unit that determines, according to the comparison and search result of the comparison search unit, whether the recognized speech corresponds to a predefined command, and a verification unit that verifies the recognition rate on the basis of the recognition results of the comparison search unit and the determination unit and, when the recognition success rate of speech recognition using the adaptive model is below a predetermined criterion, generates as a verification result an adaptive model deletion command for deleting the adaptive model and transmits it to the wall pad; and

the wall pad,

when the adaptive model deletion command is received, controls the speech recognition unit to delete the adaptive model.

The system of claim 3, wherein:

the test speech database

stores speech data of the commands to be recognized uttered by a plurality of speakers, and everyday noise data;

the verification unit

verifies the recognition rate on the basis of the recognition results of the comparison search unit and the determination unit; when the recognition success rate of speech recognition using the adaptive model is below a predetermined criterion, generates as a verification result an adaptive model deletion command for deleting the adaptive model and transmits it to the wall pad; and when it is verified that the recognition success rate of speech recognition using the adaptive model is equal to or higher than the predetermined criterion, generates as a verification result an adaptive model supplemental patch including a filler model from the data in which misrecognition occurs and transmits it to the wall pad; and

the wall pad,

when the adaptive model deletion command is received, controls the speech recognition unit to delete the corresponding adaptive model, and when the adaptive model supplemental patch is received, controls the speech recognition unit to update the adaptive model using the adaptive model supplemental patch.