KR20070008993A

KR20070008993A - Method of data acquisition using collaboration between input modalities

Info

Publication number: KR20070008993A
Application number: KR1020050063895A
Authority: KR
Inventors: 박성찬; 구명완; 김문식
Original assignee: 주식회사 케이티; 정보통신연구진흥원; 주식회사 케이티프리텔; 주식회사 솔트룩스; 중앙대학교 산학협력단; 서치캐스트 주식회사
Priority date: 2005-07-14
Filing date: 2005-07-14
Publication date: 2007-01-18

Abstract

A data acquisition method using collaboration between input modalities is provided. It offers the service a user wants flexibly and efficiently by supporting diverse input modalities suited to the environment and by exploiting the complementary relationship between them. The system waits for data input in an input standby state (201). If the recognition confidence of data entered through one input modality is at or above a first reference threshold, the data is acquired; otherwise the user is guided to switch to another input modality with higher reliability. If the recognition confidence of data entered through that other modality is at or above its corresponding reference threshold, the data is acquired; otherwise the user is guided to switch to yet another modality with higher reliability. The input data acquired in each step is integrated to obtain integrated input data (209), which is then verified (210).

Description

Method of data acquisition using collaboration between input modalities

FIG. 1 is a configuration diagram of an embodiment of a multimodal system using speech recognition and pen recognition to which the present invention is applied.

FIG. 2 is a flowchart of an embodiment of a method for acquiring speech data using the complementary relationship between the speech recognition and pen input modalities according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: multimodal system    100: multimodal manager

101: multimodal browser    102: pen recognition module

103: speech recognition module    104: dialog manager

105: dialog engine    106: web server

The present invention relates to a data acquisition method that uses the complementary relationship between input modalities in a multimodal system, and to a computer-readable recording medium storing a program for implementing the method. More specifically, in a multimodal system that models two or more human senses, the method uses another input means that stands in a complementary (cooperative) relationship during speech input and output to transition to the next state while acquiring (collecting) data that previously failed to be acquired, thereby improving the performance and reliability of the overall system.

In general, a multimodal system models two or more human senses to provide the service a user wants. Examples of multimodal input include speech, keyboard, mouse, pen, touchscreen, gesture, and eye-movement input. For convenience, this description uses speech input and pen (or keyboard) input as the example. The invention is not limited to these, however, and can be implemented with any two or more different inputs.

Various multimodal input schemes are under development, but the simplest and most convenient one assumes a single modal input at a time. Such a sequential multimodal input scheme avoids technical difficulties such as conflicts between different inputs and synchronization. The present invention is therefore also described using an implementation that adopts the sequential input scheme.

Conventional multimodal systems treat every multimodal input pattern independently and make no attempt to use or infer relationships among them. In general, speech input needs no separate input device but has low reliability, whereas input through a device is highly reliable but less portable and less convenient. The preferred input method therefore changes with the environment, and different input methods come to stand in a complementary or cooperative relationship.

Nevertheless, conventional multimodal systems regard each input as independent of the others, so the user is forced to repeat an inefficient input until a correct one is received.

The present invention has been proposed to solve this problem. Its object is to provide a data acquisition method using the complementary relationship between input modalities, and a computer-readable recording medium storing a program for implementing the method, that offer various input methods depending on the environment and use their complementary (cooperative) relationship to provide the service (input method) the user wants flexibly and efficiently.

That is, the object of the present invention is to provide a data acquisition method using the complementary relationship between input modalities, and a computer-readable recording medium storing a program for implementing the method, that improve the performance and reliability of the overall system. In a multimodal system that models two or more human senses, another input means in a complementary (cooperative) relationship is used during speech input and output to transition to the next state while acquiring (collecting) data that previously failed to be acquired.

Other objects and advantages of the present invention can be understood from the following description and will become clearer from the embodiments. It will also be readily apparent that the objects and advantages can be realized by the means recited in the claims and combinations thereof.

To achieve the above object, the method of the present invention is a data acquisition method comprising: waiting for data input in an input standby state; a first processing step of acquiring the input data if the recognition confidence of data received through one input means is at or above a first reference threshold, and guiding the user to switch to another input means more reliable than the one input means if it is below the first reference threshold; a second processing step of acquiring the input data if the recognition confidence of data received through the other input means is at or above the corresponding reference threshold, and guiding the user to switch to yet another input means more reliable than the other input means if it is below that threshold, this process being repeated up to the input means with the highest reliability; and integrating the input data acquired in each processing step to obtain integrated input data.
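The claimed steps can be sketched as a loop over input means ordered from least to most reliable. This is a minimal illustration, not the patented implementation: the names `InputMeans`, `acquire`, and `chain`, the threshold values, and the stubbed recognizers are all hypothetical, and the sketch allows only one attempt per input means, whereas the embodiment described later allows retries.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class InputMeans:
    """One input modality (hypothetical structure for illustration)."""
    name: str
    threshold: float                              # reference threshold for this means
    recognize: Callable[[], Tuple[str, float]]    # returns (recognized text, confidence)


Attempt = Tuple[str, str, float]  # (means name, recognized text, confidence)


def acquire(chain: List[InputMeans]) -> Tuple[Optional[str], List[Attempt]]:
    """Walk the chain from least to most reliable input means.

    Returns the accepted text (or None if every means fell below its
    threshold) together with the log of all attempts, which plays the
    role of the integrated input data.
    """
    attempts: List[Attempt] = []
    for means in chain:
        text, confidence = means.recognize()
        attempts.append((means.name, text, confidence))
        if confidence >= means.threshold:
            return text, attempts      # confidence at or above threshold: acquire
        # Below threshold: fall through, i.e. guide the user to the next means.
    return None, attempts


# Stubbed recognizers with assumed confidence values.
chain = [
    InputMeans("speech", 0.7, lambda: ("Hong Gil-dong", 0.42)),
    InputMeans("pen", 0.8, lambda: ("Hong Gil-dong", 0.91)),
]
accepted, attempts = acquire(chain)
```

In this run the speech result falls below its threshold, so the pen result is the one accepted, and both attempts remain in the log for later integration and verification.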

The method further comprises a verification step of verifying the obtained integrated input data.

The method further comprises calculating the corresponding reference threshold from the input data finally obtained through the verification step and applying it to the next service.

The present invention also provides a computer-readable recording medium storing a program that causes a multimodal system having a processor to implement: a function of waiting for data input in an input standby state; a function of acquiring the input data if the recognition confidence of data received through one input means is at or above a first reference threshold, and guiding the user to switch to another input means more reliable than the one input means if it is below the first reference threshold; a function of acquiring the input data if the recognition confidence of data received through the other input means is at or above the corresponding reference threshold, and guiding the user to switch to yet another input means more reliable than the other input means if it is below that threshold, this process being repeated up to the input means with the highest reliability; and a function of integrating the acquired input data to obtain integrated input data.

The present invention also provides a computer-readable recording medium storing a program for further implementing a verification function that verifies the obtained integrated input data.

The present invention also provides a computer-readable recording medium storing a program for further implementing a function of calculating the corresponding reference threshold from the input data finally obtained through the verification function and applying it to the next service.

In this way, the present invention acquires data in a multimodal system by using the complementary (cooperative) relationship between input modalities and makes the acquired data available for improving recognition performance. That is, when a low-reliability input arrives and recognition fails repeatedly, the invention checks whether a higher-reliability input exists and, if so, has the user adopt it, which improves the reliability of the multimodal system as a whole. For example, even when speech recognition fails, offering another modal input (pen input) makes it possible to acquire and reuse the speech data and to improve system performance, raising the reliability of the multimodal system.

The above objects, features, and advantages will become clearer from the following detailed description taken with the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known art are omitted where they could unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

First, to aid understanding of the invention, the reliability of input methods and the concept of a multimodal system are reviewed.

The recognition result of a user's pen or speech input has a confidence score that varies with the input pattern. With a touchscreen, for example, the recognition result has only two outcomes, a signal was detected or it was not, so the confidence is always 1 or 0. The confidence of a pen or speech recognition result, by contrast, lies between 0 and 1 because of environmental influences and technical difficulties. A speech recognition system has no input means other than the speech interface, so it cannot move on to the next step when recognition fails repeatedly. A multimodal system, on the other hand, provides not only a variety of input means but also a complementary mechanism between input devices that offers another input when recognition of one input fails. This models a situation common between people: when the other party on a phone call cannot make out a name, for example, the speaker spells it out.

FIG. 1 is a configuration diagram of an embodiment of a multimodal system using speech recognition and pen recognition to which the present invention is applied.

As shown in FIG. 1, the multimodal system using speech recognition and pen recognition to which the present invention is applied comprises a multimodal manager (100) and a dialog manager (104), which are connected through a multimodal manager interface. The multimodal manager (100) comprises a multimodal browser (101), a pen recognition module (server) (102), and a speech recognition module (server) (103). The dialog manager (104) comprises a dialog engine (105) and a web server (106) and is connected to other external agents through a dialog description interface.

Multimodal recognition such as speech and pen (handwriting) recognition generally requires considerable computing power. It is therefore efficient for the terminal to perform a certain level of preprocessing on the input signal and send the extracted features to the speech recognition module (103) or the pen recognition module (102), which then produces the final result.

In the multimodal system, the dialog manager (104) supervises the flow of the dialog between the system and the user until the input conditions for a particular task are satisfied. For a more general approach, both VoiceXML (Voice eXtensible Markup Language), a dialog description language, and HTML (HyperText Markup Language), a web markup language, are used so that the two main input modalities can be integrated.

The multimodal browser (101) gives the user equivalent representations across input devices; in the current system it ties together the computing environment of the voice dialog and the existing graphical interface environment. The dialog manager (104) receives the user's input and stores it in its own storage device, and also retrieves the necessary information from the web server (106) and delivers it to the user. Because the description focuses on input devices, output modules such as a speech output module (server) for speech synthesis are omitted from the drawings for convenience.

FIG. 2 is a flowchart of an embodiment of the method for acquiring speech data using the complementary relationship between the speech recognition and pen input modalities according to the present invention. It shows the process from integrating and verifying the data received from the input devices to moving on to the next step.

As shown in FIG. 2, when a user of the multimodal system attempts input by speech and speech data whose confidence does not exceed the threshold is entered a predetermined number of times or more (for example, twice or more), the dialog manager (104) asks the user to enter the data with the pen, checks the validity of the result, and then collects the speech data and text data for valid entries and stores them in the storage device. Generalized to arbitrary multimodal input: when a low-reliability input arrives and recognition fails repeatedly, the system checks whether there is an input expected to give a higher-confidence recognition result and, if so, has the user adopt it, improving the recognition performance of the multimodal system as a whole. Ordered from lowest to highest reliability, the inputs are generally: speech, handwriting, keyboard, touchscreen (pen or finger), and mouse.
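The reliability ordering above suggests a simple lookup for the next input to offer. The sketch below encodes that ordering as an enumeration and picks the least reliable input that is still more reliable than the current one; the names `InputRank` and `next_more_reliable` and the idea of an `available` set of devices are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Iterable, Optional


class InputRank(IntEnum):
    """Inputs ranked from lowest to highest reliability, as listed in the text."""
    SPEECH = 1
    HANDWRITING = 2
    KEYBOARD = 3
    TOUCHSCREEN = 4
    MOUSE = 5


def next_more_reliable(current: InputRank,
                       available: Iterable[InputRank]) -> Optional[InputRank]:
    """Return the nearest more reliable input that the device offers, if any."""
    candidates = [m for m in available if m > current]
    return min(candidates) if candidates else None


# A hypothetical terminal with a microphone, a pen, and a touchscreen.
terminal = [InputRank.SPEECH, InputRank.HANDWRITING, InputRank.TOUCHSCREEN]
fallback = next_more_reliable(InputRank.SPEECH, terminal)
```

Choosing the nearest step up rather than jumping to the most reliable input keeps the switch as small as possible for the user, which is one possible reading of the escalation described here.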

The method for acquiring speech data using the complementary relationship between the speech recognition and pen input modalities is described in detail below with reference to FIG. 2.

First, when the service starts, the system enters a standby state in which it waits for the user's speech or pen input (201).

If, in the speech or pen input standby state (201), the user chooses speech input and speaks, the system receives the speech and detects its endpoints to obtain a speech data segment (202).

Next, the speech recognition confidence of the speech input data is computed (204) using the recognition grammar (205), which lists the recognition targets for the speech recognition engine and for pen recognition. As one example, the speech recognition confidence can be calculated as the ratio of the target model's phoneme sequence to the anti-model's phoneme sequence. An anti-model is built by merging all data other than that of the target model. Because the resulting model is larger than the original data, anti-data is selected at random in proportion to the amount of target data so that the two are balanced, producing a suitable amount of anti-data.
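One common way to read a target-to-anti-model ratio is as a likelihood ratio. The sketch below assumes that reading: it takes the log-likelihoods of the utterance under the target model and the anti-model, normalizes their difference by the number of frames, and squashes it into the range 0 to 1. The logistic mapping, the frame normalization, and the function names are assumptions, not details given in the source. The second function illustrates drawing anti-data at random in proportion to the amount of target data.

```python
import math
import random
from typing import List, Sequence


def confidence(target_loglik: float, anti_loglik: float, n_frames: int) -> float:
    """Frame-normalized log-likelihood ratio mapped into (0, 1).

    Assumed formulation: values above 0.5 mean the target model explains
    the utterance better than the anti-model.
    """
    llr = (target_loglik - anti_loglik) / max(n_frames, 1)
    llr = max(min(llr, 50.0), -50.0)   # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-llr))


def sample_anti_data(pool: Sequence[str], target_size: int,
                     ratio: float = 1.0, seed: int = 0) -> List[str]:
    """Randomly pick anti-data in proportion to the amount of target data.

    `pool` holds all utterances that do not belong to the target model;
    `ratio` (hypothetical parameter) sets anti-data size relative to target size.
    """
    k = min(len(pool), round(target_size * ratio))
    return random.Random(seed).sample(list(pool), k)


pool = [f"utt{i}" for i in range(100)]       # placeholder utterance identifiers
anti = sample_anti_data(pool, target_size=10)
score = confidence(target_loglik=-90.0, anti_loglik=-120.0, n_frames=50)
```

With equal log-likelihoods the score is exactly 0.5, so a reference threshold above 0.5 demands that the target model win by a margin.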

The computed speech recognition confidence is then compared with the reference (acceptance) threshold (207). If the first speech recognition attempt fails, the process returns to the speech input step (202) and repeats the subsequent steps; if the second attempt fails, it proceeds to the pen input step (203) and performs the subsequent steps; if speech recognition succeeds, it proceeds to the multimodal input integration step (209). As a concrete example of how the reference threshold is set, the threshold is calculated from the input data finally obtained through the later verification step (210) and applied to the next service. The handling of each success and failure case is described below.
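The source does not say how the reference threshold is calculated from the verified data. Purely as an illustration, the sketch below assumes the verification step labels each stored recognition result as correct or incorrect, and then picks the observed confidence value that minimizes the sum of false accepts and false rejects. The function name `update_threshold`, the labeled-pair format, and the selection rule are all hypothetical.

```python
from typing import List, Tuple


def update_threshold(verified: List[Tuple[float, bool]],
                     default: float = 0.5) -> float:
    """Choose a reference threshold from verified (confidence, was_correct) pairs.

    Assumed rule: among the observed confidence values, return the lowest one
    that minimizes false accepts (wrong results at or above the threshold)
    plus false rejects (correct results below it).
    """
    if not verified:
        return default

    def errors(threshold: float) -> int:
        false_accepts = sum(1 for c, ok in verified if c >= threshold and not ok)
        false_rejects = sum(1 for c, ok in verified if c < threshold and ok)
        return false_accepts + false_rejects

    candidates = sorted({c for c, _ in verified})
    return min(candidates, key=errors)


# Hypothetical verification results from one service session.
history = [(0.2, False), (0.4, False), (0.6, True), (0.8, True)]
threshold_for_next_service = update_threshold(history)
```

Any other estimator, such as a fixed false-accept rate, would fit the same slot; the point is only that the threshold for the next service is derived from verified outcomes rather than fixed by hand.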

If, on the other hand, the user chooses pen input in the speech or pen input standby state (201) and enters data with the pen, the system recognizes the data and obtains pen input data (203).

Next, the pen recognition confidence of the pen input data is computed (206) using the recognition grammar (205), which lists the recognition targets for the speech recognition engine and for pen recognition. Here, the handwriting data is first size-normalized, and the recognition confidence is then calculated in a manner similar to that used for speech recognition above.
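Size normalization of handwriting can be sketched as translating the ink to the origin and scaling it uniformly so that its longer side fits a unit box. The source only names the step, so the stroke representation (lists of x, y points), the uniform scaling, and the function name `normalize_size` are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def normalize_size(strokes: List[Stroke], box: float = 1.0) -> List[Stroke]:
    """Shift ink to the origin and scale it so the longer side equals `box`.

    Uniform scaling preserves the aspect ratio of the handwriting.
    """
    points = [p for stroke in strokes for p in stroke]
    if not points:
        return [list(stroke) for stroke in strokes]

    min_x = min(x for x, _ in points)
    max_x = max(x for x, _ in points)
    min_y = min(y for _, y in points)
    max_y = max(y for _, y in points)

    span = max(max_x - min_x, max_y - min_y)
    if span == 0:
        # All points coincide: collapse everything onto the origin.
        return [[(0.0, 0.0) for _ in stroke] for stroke in strokes]

    return [[((x - min_x) / span * box, (y - min_y) / span * box)
             for x, y in stroke]
            for stroke in strokes]


# Two strokes forming a "T" shape in arbitrary device coordinates.
ink = [[(10.0, 20.0), (30.0, 20.0)], [(20.0, 20.0), (20.0, 60.0)]]
normalized = normalize_size(ink)
```

After this step, the same word written large or small yields comparable features, which is what makes a confidence calculation shared with the speech path plausible.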

The computed pen recognition confidence is then compared with the reference (acceptance) threshold (208). If the first pen recognition attempt fails, the process returns to the pen input step (203) and repeats the subsequent steps; if the second attempt fails, it returns to the speech or pen input standby state (201) and performs the subsequent steps; if pen recognition succeeds, it proceeds to the multimodal input integration step (209). As a concrete example of how the reference threshold is set, the threshold is calculated from the input data finally obtained through the later verification step (210) and applied to the next service. The handling of each success and failure case is described below.

The recognized speech input data and pen input data are then integrated to obtain the complete input data (209). For example, the speech and pen recognition results for '홍길동' (Hong Gil-dong) are stored on the storage medium together with additional information, as shown in [Table 1]. In general, entries are stored in order from the input data with the lowest recognition confidence to the highest.
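[Table 1] is not reproduced in this text, so its actual columns are unknown. The sketch below only illustrates the stated storage order, lowest recognition confidence first; the record type `RecognitionResult`, its fields, the file names, and the confidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognitionResult:
    """One recognition result for a single input item (hypothetical fields)."""
    modality: str                  # e.g. "speech" or "pen"
    text: str                      # recognized text
    confidence: float              # recognition confidence in [0, 1]
    raw_ref: Optional[str] = None  # pointer to the stored waveform or ink


def integrate(results: List[RecognitionResult]) -> List[RecognitionResult]:
    """Order results from lowest to highest recognition confidence."""
    return sorted(results, key=lambda r: r.confidence)


# Assumed values for the '홍길동' example.
record = integrate([
    RecognitionResult("pen", "홍길동", 0.93, "ink_0001.dat"),
    RecognitionResult("speech", "홍길동", 0.41, "utt_0001.wav"),
])
```

Keeping the low-confidence speech entry next to the accepted pen text is what later yields speech data paired with a transcript.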

Additionally, the complete input data obtained can be checked for errors to yield high-purity input data in the end (210).

The cases in which the input data is successfully recognized in the comparison steps (207, 208) and the process moves on to the next step are as follows.

1. Initial state (waiting for pen or speech input) → speech (success) → next step

2. Initial state → pen (success) → next step

3. Initial state → pen (failure) → input request ("Please write it once more") → pen (success) → next step

4. Initial state → speech (failure) → input request ("Please say it once more") → speech (failure) → input request ("This time, please write it with the pen") → pen (success) → next step

5. Initial state → speech (failure) → input request ("Please say it once more") → speech (failure) → input request ("This time, please write it with the pen") → pen (failure) → input request ("Please write it once more") → pen (success) → next step

The cases in which recognition of the input data fails in the comparison steps (207, 208) and the process returns to the initial state are as follows.

1. Initial state → speech (failure) → timeout → initial state

2. Initial state → speech (failure) → input request ("Please say it once more") → timeout → initial state

3. Initial state → speech (failure) → input request ("Please say it once more") → speech (failure) → input request ("This time, please write it with the pen") → timeout → initial state

4. Initial state → speech (failure) → input request ("Please say it once more") → speech (failure) → input request ("This time, please write it with the pen") → pen (failure) → input request ("Please write it once more") → timeout → initial state

5. Initial state → speech (failure) → input request ("Please say it once more") → speech (failure) → input request ("This time, please write it with the pen") → pen (failure) → input request ("Please write it once more") → pen (failure) → initial state

6. Initial state → pen (failure) → input request ("Please write it once more") → timeout → initial state

7. Initial state → pen (failure) → input request ("Please write it once more") → pen (failure) → initial state
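The success and failure sequences listed above can be reproduced with a small state machine. This is a sketch under stated assumptions: each input is reduced to an outcome (`True` for success, `False` for failure, `None` for a timeout), two attempts are allowed per modality as in the example, and the names `run` and `MAX_ATTEMPTS` and the extra "waiting" result are hypothetical.

```python
from typing import List, Optional, Tuple

MAX_ATTEMPTS = 2  # attempts allowed per modality, following the example in the text


def run(first: str, outcomes: List[Optional[bool]]) -> Tuple[str, List[str]]:
    """Replay a sequence of recognition outcomes.

    `first` is the modality the user starts with ("speech" or "pen").
    Returns ("next", trace) when input is accepted, ("initial", trace) when
    the flow resets, or ("waiting", trace) if the outcomes run out first.
    """
    modality, failures, trace = first, 0, []
    for outcome in outcomes:
        if outcome is None:
            trace.append("timeout")
            return "initial", trace
        trace.append(f"{modality}({'success' if outcome else 'failure'})")
        if outcome:
            return "next", trace
        failures += 1
        if failures == MAX_ATTEMPTS:
            if modality == "speech":
                modality, failures = "pen", 0   # ask the user to write instead
            else:
                return "initial", trace         # pen failed twice: reset
    return "waiting", trace


# Success case 5: speech fails twice, pen fails once, then succeeds.
state, trace = run("speech", [False, False, False, True])
```

Replaying the other listed sequences, for example `run("pen", [False, False])` for failure case 7, returns "initial", matching the reset behavior described in the text.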

Data is finally acquired only when recognition succeeds and the process moves on to the next step. Consequently, speech data that meets the reference confidence has no accompanying text information, whereas input that fails to meet it and is finally accepted through the pen as the alternative input means includes text information along with the speech data. Such data is highly valuable, for example as training data for analysis after verification.

Thus, in an increasingly diverse user input environment, the present invention does not treat each input means as separate or independent. Instead, the relationships among input devices, from low-reliability to high-reliability, are set in advance, and when any input device fails at recognition, the dialog manager keeps asking the user for input through an alternative device until the confidence reaches the acceptance threshold, providing a more user-centered, robust, and intelligent service.

That is, the present invention defines in advance, as a table, the sequences of input devices that lead to successful final recognition, and has the dialog manager combine the speech recognition and pen input processes: when speech recognition fails two or more times, the user is asked to enter the recognition target word defined in the recognition grammar with the pen instead. This gives the user a seamless service and yields speech data paired with text. As a result, when an alternative input means exists, the reference threshold can be set somewhat more strictly to reduce the probability of misrecognition. Finally, to increase the reliability of the data, erroneous data is removed through third-party verification between the recognized speech data and the text data, so that high-quality speech data is ultimately obtained. This in turn can contribute to gradual quality improvement of the speech recognition system.

The above embodiment assumes two input devices, speech recognition and pen recognition, but it is only one example. In the present invention, any form of input device may be used as an input device for the multimodal system, including voice input, keyboard input, mouse input, pen input, touchscreen input, gesture input, and eye-movement input.

The method of the present invention as described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, ROM, RAM, floppy disk, hard disk, magneto-optical disk, etc.). Since this process can be readily carried out by a person of ordinary skill in the art to which the present invention pertains, it is not described in further detail.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since a person of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the present invention.

The present invention as described above has the effect of improving the performance and reliability of the overall system: in a multimodal system that models two or more human sensory organs, during voice input and output it transitions to the next state by using an input means that stands in a complementary (cooperative) relationship, and thereby acquires (collects) data whose acquisition previously failed.

That is, in an increasingly diverse user input environment, the present invention does not treat each input means as separate or independent but sets the relationships among the input devices in advance, ordered from the least reliable input to the most reliable. When any input device fails at recognition, the conversation manager successively asks the user to provide the input through an alternative device until the recognition reliability reaches the acceptance threshold, which has the effect of providing a more user-centered, robust, and intelligent service.

In addition, the present invention defines in advance, as a table, the sequence of input devices that leads to successful final recognition, and has the conversation manager combine the speech recognition process with the pen input process: when speech recognition fails two or more times, the user is asked to enter the recognition target word defined in the recognition grammar by pen instead. This gives the user a seamless service and makes it possible to obtain transcribed speech data. Consequently, when an alternative input means exists, the reference threshold can be set somewhat more strictly to reduce the probability of misrecognition. Finally, to increase the reliability of the data, erroneous data is removed through third-party verification between the recognized speech data and the text data, so that high-quality speech data is ultimately obtained, which has the effect of contributing to gradual quality improvement of the speech recognition system.

Claims

A data acquisition method using a complementary relationship between input modalities, the method comprising:

a standby step of waiting for data input in an input standby state;

a first processing step of acquiring input data when the recognition reliability of data input through one input means is greater than or equal to a first reference threshold, and inducing a switch to another input means having higher reliability than the one input means when the recognition reliability is less than the first reference threshold;

a second processing step of acquiring input data when the recognition reliability of data input through the other input means is greater than or equal to a corresponding reference threshold, and, when the recognition reliability is less than the corresponding reference threshold, inducing a switch to yet another input means having higher reliability than the other input means, this processing being repeated up to the input means having the highest reliability; and

an integration step of acquiring integrated input data by integrating the input data acquired in each processing step.

The method of claim 1, further comprising a verification step of verifying the acquired integrated input data.

The method of claim 2, further comprising a step of calculating the corresponding reference threshold using the input data finally obtained by performing the verification step, and applying the calculated threshold to the next service.

The method according to any one of claims 1 to 3, wherein the first processing step comprises:

a first input step of receiving data through the first input means;

a first comparing step of recognizing the data input in the first input step, measuring its recognition reliability, and comparing the measured reliability with the first reference threshold;

a step of acquiring the input data when the comparison in the first comparing step shows that the recognition reliability is greater than or equal to the first reference threshold;

a first re-input processing step of requesting re-input and repeating the first input step and the first comparing step when the comparison in the first comparing step shows that the recognition reliability is less than the first reference threshold;

a step of acquiring the input data when the comparison in the first re-input processing step shows that the recognition reliability is greater than or equal to the first reference threshold; and

a first switch induction step of inducing a switch to the other input means having higher reliability than the first input means when the comparison in the first re-input processing step shows that the recognition reliability is less than the first reference threshold.

The method of claim 4, wherein the second processing step comprises:

a second input step of receiving data through the other input means;

a second comparing step of recognizing the data input in the second input step, measuring its recognition reliability, and comparing the measured reliability with a second reference threshold;

a step of acquiring the input data when the comparison in the second comparing step shows that the recognition reliability is greater than or equal to the second reference threshold;

a second re-input processing step of requesting re-input and repeating the second input step and the second comparing step when the comparison in the second comparing step shows that the recognition reliability is less than the second reference threshold;

a step of acquiring the input data when the comparison in the second re-input processing step shows that the recognition reliability is greater than or equal to the second reference threshold; and

a second switch induction step of inducing a switch to yet another input means having higher reliability than the other input means when the comparison in the second re-input processing step shows that the recognition reliability is less than the second reference threshold.

The method of claim 5, wherein the first input means is a voice input means and the other input means is a pen input means, and wherein text data, or text data together with voice data, is obtained through a single switch induction process.

A computer-readable recording medium having recorded thereon a program for realizing, in a multimodal system having a processor:

a function of waiting for data input in an input standby state;

a function of acquiring input data when the recognition reliability of data input through one input means is greater than or equal to a first reference threshold, and inducing a switch to another input means having higher reliability than the one input means when the recognition reliability is less than the first reference threshold;

a function of acquiring input data when the recognition reliability of data input through the other input means is greater than or equal to a corresponding reference threshold, and, when the recognition reliability is less than the corresponding reference threshold, inducing a switch to yet another input means having higher reliability than the other input means, this processing being repeated up to the input means having the highest reliability; and

a function of acquiring integrated input data by integrating the acquired input data.

The recording medium of claim 7, wherein the program further realizes a verification function of verifying the acquired integrated input data.

The recording medium of claim 8, wherein the program further realizes a function of calculating the corresponding reference threshold using the input data finally obtained by performing the verification function, and applying the calculated threshold to the next service.