KR20120110751A

KR20120110751A - Speech processing apparatus and method

Info

Publication number: KR20120110751A
Application number: KR1020110028816A
Authority: KR
Inventors: 이근배; 최준휘; 김석환; 김경덕; 이동현
Original assignee: 포항공과대학교 산학협력단
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2012-10-10
Also published as: KR101197010B1

Abstract

PURPOSE: A speech processing apparatus and method are provided that automatically recognize, as soon as speech is input and without any special correction command, that an utterance is intended as a correction. CONSTITUTION: A speech recognition module (100) recognizes a user's utterance and outputs feature information for determining the user's utterance intent. An utterance intent determination module (300) determines the intent of the user's utterance. A character input module (500) performs character input according to that intent. [Reference numerals] (100) Speech recognition module; (200) Error extraction module; (300) Utterance intent determination module; (400) Training corpus database; (500) Character input module; (AA) User speech

Description

Speech Processing Apparatus and Method

The present invention relates to a speech processing apparatus and method, and more particularly to a speech processing apparatus and method that perform sentence input and correction by voice in building a voice word processor.

With the recent widespread adoption of mobile devices such as smartphones, interest in speech recognition software has been growing.

Speech recognition is the identification of linguistic meaning from speech by automatic means. Specifically, it is the process of taking a speech waveform as input, identifying a word or word sequence, and extracting its meaning. It is broadly divided into five stages: speech analysis, phoneme recognition, word recognition, sentence interpretation, and meaning extraction. In the narrow sense, the term often covers only speech analysis through word recognition. As one way of improving the human-machine interface, speech recognition, which inputs information by voice, and speech synthesis, which outputs information by voice, have long been under research and development. Speech recognition and speech synthesis devices once required large equipment, but advances in large-scale integration (LSI) have made it possible to implement them on integrated circuits a few millimeters on a side, which has made speech input/output devices practical.

Speech recognition is currently used for telephone-based bank balance inquiries, stock quote inquiries, mail-order applications, credit card inquiries, and hotel or airline seat reservations. These services, however, use isolated-word speech recognizers that recognize a limited number of words pronounced one at a time. The ultimate goal of speech recognition is complete speech-to-text conversion: recognizing naturally spoken speech and either accepting it as an executable command or entering it into a document as data. This means developing a speech understanding system that not only recognizes words but also accurately extracts the meaning of continuous speech or sentences using syntactic information, semantic information, and task-related information and knowledge. Research and development on such systems is actively under way around the world.

However, the error rate of current speech recognition technology is not low, and correcting errors requires either direct typing or explicit correction commands. Because of recognition errors, the output generally differs in part from the intended sentence; with current speech recognition technology the error rate is at least about 10%. In other words, a sentence of ten or more words can be expected to contain at least one word error.
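The arithmetic behind this estimate can be made explicit. The following sketch assumes, purely for illustration, that word errors occur independently at a fixed per-word rate; the specification states no such model, and the function names are hypothetical. Under that assumption a 10% rate gives one expected error per ten words, while the chance that a ten-word sentence contains at least one error is about 65%.

```python
def expected_word_errors(num_words: int, word_error_rate: float) -> float:
    """Expected number of misrecognized words in a sentence."""
    return num_words * word_error_rate


def prob_at_least_one_error(num_words: int, word_error_rate: float) -> float:
    """Chance of one or more errors, assuming independent word errors."""
    return 1.0 - (1.0 - word_error_rate) ** num_words


if __name__ == "__main__":
    print(expected_word_errors(10, 0.10))
    print(round(prob_at_least_one_error(10, 0.10), 3))
```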

Correcting such errors by direct typing means that text cannot be entered by voice alone. Correcting them with correction commands requires selecting the correction target and then determining the originally intended word, one step after the other, so entering a single sentence correctly takes considerable time.

The present invention is intended to overcome the above problems of the prior art. Its object is to provide a speech recognition apparatus and method that automatically determine whether a user's utterance is intended for input or for correction, and that carry out verification and correction processes for that determination.

A speech processing apparatus according to one aspect of the present invention includes: a speech recognition module that recognizes a user's utterance, outputs the recognition result as text, and outputs feature information for determining the user's utterance intent; an utterance intent determination module that uses the feature information output by the speech recognition module to determine whether the user's utterance is intended for character input or for correction of previously input characters; and a character input execution module that performs character input according to the utterance intent output by the utterance intent determination module.

The speech processing apparatus may further include an error extraction module that receives the result output by the speech recognition module, predicts which part of the at least one recognized character string contains an error, and outputs error prediction information. In this case, the utterance intent determination module may determine the user's utterance intent using the feature information output by the error extraction module together with that output by the speech recognition module.

The feature information may include training speech features, comprising at least one of the volume of the user's voice, the stress pattern of the user's voice, and the length of the user's utterance, and context features, comprising at least one of the similarity between the pronunciation sequences of the currently recognized sentence and a previously input sentence and whether the sentence has ended.
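As an illustration of how the two feature groups might be carried together, the following is a minimal sketch. The class names, field encodings (a single stress score, length in seconds), and the `to_vector` helper are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechFeatures:
    volume: float            # loudness of the user's voice (units assumed)
    stress_pattern: float    # a single stress score; the encoding is an assumption
    utterance_length: float  # length of the utterance, here in seconds


@dataclass
class ContextFeatures:
    pronunciation_similarity: float  # 0.0 to 1.0 against the previous sentence
    sentence_ended: bool             # whether the sentence has ended


@dataclass
class UtteranceFeatures:
    speech: SpeechFeatures
    context: ContextFeatures

    def to_vector(self) -> List[float]:
        """Flatten both feature groups into one numeric vector."""
        return [
            self.speech.volume,
            self.speech.stress_pattern,
            self.speech.utterance_length,
            self.context.pronunciation_similarity,
            1.0 if self.context.sentence_ended else 0.0,
        ]
```

A flat numeric vector is used here only because it is a convenient input for a trained classifier; the specification does not prescribe a representation.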

The speech processing apparatus may further include a training corpus database that stores a collection of corpora of correction utterances and input utterances.

According to a preferred embodiment of the present invention, the utterance intent determination module has been trained in advance, using the corpora stored in the training corpus database, to determine whether a user's utterance is intended as input or as a correction.

The character input execution module includes an automatic correction target setting unit that receives the user's correction intent and, using the context features of the input speech, predicts and sets the part of the at least one previously input character string that the user wants to correct.

The context features used to predict the part the user wants to correct may include at least one of the similarity between the pronunciation sequences of the currently recognized sentence and a previously input sentence and whether a predicted error exists in the at least one previously input character string.

The character input execution module may further include a user verification and determination unit that verifies the correction target set by the automatic correction target setting unit and determines, according to the verification result, which operation to perform among input, replacement, correction target change, correction sentence change, and cancellation.

The character input execution module may further include a manual correction target setting unit that, according to the determination of the user verification and determination unit, modifies the correction target set by the automatic correction target setting unit, adjusting at least one of the span and the position of the correction target according to the user's selection.

The character input execution module may also include a replacement unit that replaces the target character string to be corrected with the correction character string according to the determination of the user verification and determination unit, and a correction sentence change unit that changes the character string of the correction target to the character string entered as the correction according to the determination of the user verification and determination unit.

A speech processing method according to another aspect of the present invention includes: recognizing a user's utterance, outputting the recognition result as text, and outputting feature information for determining the user's utterance intent; determining, using the feature information, whether the user's utterance is intended for character input or for correction of previously input characters; and performing character input according to the determined utterance intent.

The speech processing method may further include outputting information that predicts which part of the at least one recognized character string contains an error, using the output feature information for determining the user's utterance intent. The step of determining whether the utterance is intended for character input or for correction of previously input characters then additionally uses this error prediction information.

The step of performing character input according to the determined utterance intent may include receiving the user's correction intent and, using the context features of the input speech, predicting the part of the at least one previously input character string that the user wants to correct and setting it as the correction target.

The step of performing character input according to the determined utterance intent may further include verifying the set correction target and determining, according to the verification result, which operation to perform among input, replacement, correction target change, correction sentence change, and cancellation.

The step of performing character input according to the determined utterance intent may further include changing the set correction target when the verification determines that it should be changed.

The step of performing character input according to the determined utterance intent may further include replacing the target character string to be corrected with the correction character string when the verification determines that the set correction target should be replaced with the newly input character string.

The step of performing character input according to the determined utterance intent may further include changing the character string of the set correction target to a character string entered as a correction by the user when the verification determines that the set correction target should be changed.

A voice word processor according to yet another aspect of the present invention includes: a speech recognition module that recognizes a user's utterance, outputs the recognition result as text, and outputs feature information for determining the user's utterance intent; an error extraction module that receives the result output by the speech recognition module, predicts which part of the at least one recognized character string contains an error, and outputs error prediction information; an utterance intent determination module that has been trained, using a pre-stored training corpus of correction utterances and input utterances, to determine whether a user's utterance is intended as input or as a correction, and that determines the user's utterance intent using the feature information output by the speech recognition module and the error prediction information output by the error extraction module; and a character input execution module that performs character input according to the utterance intent output by the utterance intent determination module.

The present invention improves on the conventional practice of manually correcting, one by one, sentences that were entered incorrectly through a speech recognition device. With voice-only input and without any special correction command, the system automatically recognizes that an utterance is intended as a correction, enabling efficient input correction.

FIG. 1 is a diagram illustrating the concept of speech recognition according to the present invention.
FIG. 2 is a block diagram of a speech processing apparatus according to a preferred embodiment of the present invention.
FIG. 3 is a detailed block diagram of a character input execution module according to a preferred embodiment of the present invention.
FIG. 4 is a diagram showing the sequential operation flow of a speech recognition and automatic character input method according to a preferred embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail below.

This is not, however, intended to limit the present invention to specific embodiments; it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the invention.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

As used herein, the term "character string" is used in the general sense of a sequence of one or more consecutive characters, and is not limited to the narrow sense of a series of characters or codes handled as data by a computer or the like.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To aid overall understanding, the same reference numerals are used for the same components throughout the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a diagram illustrating the concept of speech recognition according to the present invention.

Referring to FIG. 1, in addition to the ordinary speech recognition function, the present invention works as follows: when the user simply utters the sentence or word to be entered, without any special correction command, the system automatically determines whether the utterance is meant as new input or as a correction to a previously input sentence. After this determination, setting the correction target and determining the originally intended word are handled together, so that text can be entered by voice alone without cumbersome manual work.

That is, the present invention aims to handle sentence correction by voice for a voice word processor that takes its input through a speech recognition device. It improves on the conventional practice of manually correcting, one by one, sentences entered incorrectly through a speech recognition device: with voice-only input and without any special correction command, the system automatically recognizes that an utterance is intended as a correction, enabling efficient input correction.

More specifically, according to a preferred embodiment of the present invention, the system first determines whether the user's utterance is meant as new input or as a correction to a previously input sentence, and then carries out the corresponding task. For this determination, the utterance intent determination module is trained sufficiently in advance on a corpus of collected correction utterances and input utterances. Once the task has been determined, it is verified by the user. According to another preferred embodiment of the present invention, during verification the user may choose to change the task, the correction target, or the correction sentence.

The speech processing apparatus according to the present invention may take the form of a voice word processor capable of voice input and automatic correction.

Hereinafter, a speech processing apparatus according to a preferred embodiment of the present invention, which performs voice input and correction by entering the correction state seamlessly, is described in more detail with reference to FIGS. 2 and 3.

FIG. 2 shows the block configuration of a speech processing apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 2, the speech processing apparatus according to a preferred embodiment of the present invention may include a speech recognition module 100, an error extraction module 200, an utterance intent determination module 300, a training corpus database 400, and a character input execution module 500.

The speech recognition module 100 receives and recognizes the user's utterance, and outputs the recognized text together with the features that indicate whether the user's intent is correction or input, namely the training speech features and the context features.

The error extraction module 200 receives the result output by the speech recognition module 100, predicts which part of the sentence is likely to contain an error, and passes this information to the utterance intent determination module 300.
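The specification does not state how the error extraction module 200 makes its prediction. One common approach, shown here only as a hypothetical sketch, is to flag words whose recognizer confidence falls below a threshold. The availability of per-word confidence scores, the function name, and the default threshold of 0.5 are all assumptions.

```python
from typing import List, Tuple


def predict_error_positions(
    words_with_confidence: List[Tuple[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the specification
) -> List[int]:
    """Return indices of words whose confidence falls below the threshold."""
    return [
        index
        for index, (_, confidence) in enumerate(words_with_confidence)
        if confidence < threshold
    ]


if __name__ == "__main__":
    hypothesis = [("나는", 0.95), ("학교를", 0.40), ("간다", 0.90)]
    print(predict_error_positions(hypothesis))
```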

The utterance intent determination module 300 has preferably been trained, through the training corpus database 400 consisting of a collection of correction utterances and input utterances, to determine whether a user's utterance is intended as input or as a correction.

The training corpus database 400 may store the training corpus so that the corpus of input utterances and the corpus of correction utterances are kept distinct.
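Training on a corpus that keeps input utterances and correction utterances separate can be sketched as follows. The specification names no learning algorithm, so a nearest-centroid classifier is swapped in here as the simplest stand-in; the labels, the two-element feature vectors (pronunciation similarity, volume), and the toy values are invented for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(corpus: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each labeled sub-corpus."""
    centroids: Dict[str, List[float]] = {}
    for label, vectors in corpus.items():
        dim = len(vectors[0])
        centroids[label] = [
            sum(vector[d] for vector in vectors) / len(vectors) for d in range(dim)
        ]
    return centroids


def classify_intent(centroids: Dict[str, List[float]], features: Vector) -> str:
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Toy corpus: each vector is [pronunciation_similarity, volume] (assumed features).
TOY_CORPUS = {
    "input": [[0.1, 0.5], [0.2, 0.4]],
    "correction": [[0.9, 0.8], [0.8, 0.9]],
}
```

With this toy corpus, an utterance that closely resembles the previous sentence and is spoken more loudly falls nearer the "correction" centroid. A practical system would likely use a richer feature set and a stronger classifier.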

Here, preferred examples of the training speech features include the input volume of the user's voice, the user's stress pattern, and the length of the user's utterance; these are trained using the training corpus. Preferred examples of the context features include the similarity between the pronunciation sequences of the currently recognized sentence and the previously input sentence, whether the sentence has ended, and the error prediction information output from the error extraction module 200.

For example, suppose the user's first utterance is recognized as "나는 학교를 간다" and the following utterance is recognized as "나는 학교에 간다" (both roughly "I go to school", differing only in the particle attached to "학교"). Because the two utterances are very similar in form, the later utterance will be judged to be a correction. Likewise, if the utterance following the first one is recognized as "아! 학교에" ("Ah! To school"), it may be judged to be an utterance correcting the word that follows an interjection such as "아". In addition, a case in which the user repeats "학교에" in a raised tone may also be judged to be a correction utterance.
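The similarity in this example can be approximated in code. The sketch below decomposes Hangul syllables into their component jamo with Unicode NFD normalization and compares the resulting sequences with Python's `difflib.SequenceMatcher`. Treating jamo sequences as a stand-in for pronunciation sequences is an assumption; a real system would more likely compare the recognizer's phoneme output, and the specification does not define the similarity measure.

```python
import difflib
import unicodedata


def pronunciation_similarity(current: str, previous: str) -> float:
    """Similarity in [0, 1] between the jamo sequences of two utterances."""
    current_jamo = unicodedata.normalize("NFD", current)
    previous_jamo = unicodedata.normalize("NFD", previous)
    return difflib.SequenceMatcher(None, current_jamo, previous_jamo).ratio()


if __name__ == "__main__":
    first = "나는 학교를 간다"
    second = "나는 학교에 간다"
    unrelated = "오늘 날씨가 좋다"
    print(pronunciation_similarity(second, first))
    print(pronunciation_similarity(unrelated, first))
```

The two example utterances differ only in one particle, so their score is high, while an unrelated sentence scores lower. A threshold on this score is one possible input to the intent decision.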

The training speech features and context features mentioned above are output from the speech recognition module 100 and the error extraction module 200. The utterance intent determination module 300 comprehensively evaluates the information output by the speech recognition module 100 and the error extraction module 200 to determine whether the utterance is a correction or an input.

The character input execution module 500 acts on the output of the utterance intent determination module 300: according to the determined intent, it either performs character input from the user's utterance or carries out the follow-up procedure for a sentence correction and the resulting character input. The character input execution module 500 is described in more detail below with reference to FIG. 3.

FIG. 3 shows the detailed block configuration of a character input execution module according to a preferred embodiment of the present invention.

As shown in FIG. 3, the character input execution module according to a preferred embodiment of the present invention includes a character input unit 510, an automatic correction target setting unit 520, a user verification and determination unit 530, a replacement unit 540, a manual correction target setting unit 550, and a correction sentence change unit 560.

The user's utterance intent output by the utterance intent determination module 300, described with reference to FIG. 2, is passed to either the character input unit 510 or the automatic correction target setting unit 520.

When the utterance intent is determined to be correction, the correction intent is passed to the automatic correction target setting unit 520, which predicts which part of the previously input sentence or word the user wants to correct.

Here, preferred examples of the features needed for this prediction include pronunciation sequence similarity and the presence or absence of a predicted error. Using these features, the automatic correction target setting unit 520 automatically predicts and sets the position the user intends to correct.
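One way to combine the two features is sketched below: a window the width of the correction utterance slides over the previously input words, each window is scored by string similarity to the correction, and windows containing a predicted error receive a bonus. The sliding-window scheme, the character-level similarity, and the bonus value of 0.1 are assumptions made for illustration, not the method of the specification.

```python
import difflib
from typing import AbstractSet, List, Tuple


def set_correction_target(
    previous_words: List[str],
    correction_words: List[str],
    error_indices: AbstractSet[int] = frozenset(),
    error_bonus: float = 0.1,  # assumed weight for a predicted error
) -> Tuple[int, int]:
    """Return the (start, end) word span, end exclusive, most likely targeted."""
    width = min(len(correction_words), len(previous_words))
    target_text = " ".join(correction_words)
    best_span, best_score = (0, width), -1.0
    for start in range(len(previous_words) - width + 1):
        span_text = " ".join(previous_words[start:start + width])
        score = difflib.SequenceMatcher(None, span_text, target_text).ratio()
        if any(index in error_indices for index in range(start, start + width)):
            score += error_bonus
        if score > best_score:
            best_span, best_score = (start, start + width), score
    return best_span


if __name__ == "__main__":
    print(set_correction_target(["나는", "학교를", "간다"], ["학교에"], {1}))
```

For the example sentence, the single-word correction "학교에" is most similar to "학교를", so the span covering that word is selected.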

The user verification and determination unit 530 confirms once more, through a user command, whether the correction target automatically set by the automatic correction target setting unit 520 is correct; that is, it verifies whether automatic correction target setting was performed accurately. Preferably, the user verification and determination unit 530 includes a command speech recognizer for receiving commands by voice.

The present invention considers five main preferred examples of commands that can be received from the user in this process: input, replacement, correction target change, correction sentence change, and cancellation. When the user verification and determination unit 530 receives a command from the user, it determines which command it is and activates the block that performs the corresponding task.
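The five commands and the activation of the corresponding block can be sketched as a simple dispatch table. The enum, the handler table, and the returned strings are hypothetical. The mapping of replacement, correction target change, and correction sentence change to units 540, 550, and 560 follows the description of those units; the rows for input and cancellation are assumptions, since the text does not name a unit for them.

```python
from enum import Enum
from typing import Callable, Dict


class UserCommand(Enum):
    INPUT = "input"
    REPLACE = "replace"
    CHANGE_TARGET = "change_target"
    CHANGE_SENTENCE = "change_sentence"
    CANCEL = "cancel"


# Which block each command activates. The INPUT and CANCEL rows are
# assumptions; the text names units only for the other three commands.
HANDLERS: Dict[UserCommand, Callable[[], str]] = {
    UserCommand.INPUT: lambda: "character input unit (510)",
    UserCommand.REPLACE: lambda: "replacement unit (540)",
    UserCommand.CHANGE_TARGET: lambda: "manual correction target setting unit (550)",
    UserCommand.CHANGE_SENTENCE: lambda: "correction sentence change unit (560)",
    UserCommand.CANCEL: lambda: "no unit (correction discarded)",
}


def dispatch(command: UserCommand) -> str:
    """Activate the block for the given command and report which one ran."""
    return HANDLERS[command]()
```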

Taking these in turn, the replacement unit 540 replaces the correction target with the correction sentence when the user command received by the user verification and determination unit 530 is determined to be a replace command.

The manual correction target setting unit 550 adjusts the automatically set correction target. It can lengthen or shorten the span of the correction target and move the position of the correction target forward or backward. So that these functions can be controlled by voice, the manual correction target setting unit 550 preferably includes a speech recognizer for command recognition.
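The span adjustments described above can be sketched as a small function over word indices. This is only an illustration: the `Span` type, the command names, and the policy of ignoring an adjustment that would leave the text or empty the span are all assumptions, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # index of the first word in the target (inclusive)
    end: int    # index one past the last word in the target (exclusive)


def adjust_target(span, command, n_words):
    """Apply one manual adjustment command to a correction target.

    Hypothetical commands: expand_left, expand_right, shrink_left,
    shrink_right, move_left, move_right. An adjustment that would move the
    span outside the text or make it empty is ignored.
    """
    deltas = {
        "expand_left": (-1, 0),
        "expand_right": (0, 1),
        "shrink_left": (1, 0),
        "shrink_right": (0, -1),
        "move_left": (-1, -1),
        "move_right": (1, 1),
    }
    if command not in deltas:
        raise ValueError(f"unknown command: {command}")
    d_start, d_end = deltas[command]
    start, end = span.start + d_start, span.end + d_end
    if start < 0 or end > n_words or end - start < 1:
        return span  # invalid adjustment: keep the current target
    return Span(start, end)
```

For instance, with five words of input and a target of `Span(3, 4)`, `expand_left` gives `Span(2, 4)` and `move_right` gives `Span(4, 5)`.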

The correction sentence change unit 560 changes the sentence or word of the correction target to the sentence or word that was input as the correction. It operates when the sentence recognized as the intended correction input is itself incorrect. According to a preferred embodiment of the present invention, the correction sentence change unit 560 can present other candidates for the correction sentence or carry out a command to speak the correction again, and it includes a speech recognizer for the repeated utterance and a command speech recognizer for commands.

FIG. 4 shows the sequential operation flow of a speech recognition and automatic text input method according to a preferred embodiment of the present invention.

That is, FIG. 4 is a flowchart in which a speech processing apparatus or speech word processor according to a preferred embodiment of the present invention automatically distinguishes input from correction while sentences are being entered.

To perform the speech input and correction method for a speech word processor according to the present invention, in which the correction state is entered seamlessly, the utterance spoken by the user is first received (S401) and its speech intent is determined (S402), as shown in FIG. 4.

Speech intent refers to the meaning of the sentence uttered by the user. For input to the speech word processor according to a preferred embodiment of the present invention, determining the speech intent means deciding whether the recognized string is simply a string to be input or a string requesting correction of a previously input string.

If the utterance is for input, the text is entered at the cursor position at the end of the paragraph or sentence (S404), and the system again waits for the user's utterance (S401).

If the utterance is determined to be a request for correction (correction at S402), the process moves to the step in which the user verifies the correction intent (S403). The correction intent is verified because, even when the system judges an utterance to be a request for correction, that judgment may be wrong. The present invention considers five preferred commands that may be received from the user in this process: input, replace, change correction target, change correction sentence, and cancel. When one of these commands is received from the user, the process moves to the corresponding operation and handles it according to the command.

Taking these in order, it is first determined whether the result of user verification is a text input intent or a cancellation (S403). If a "cancel" command is received from the user (cancel at S403), the correction is canceled immediately and the system again waits for the user's utterance (S401).

Conversely, if an "input" command is received, the same operation is performed as when the utterance is determined to be for input (S404), and the system again waits for the user's utterance (S401).

If the command is neither input nor cancel (No at S403), it is determined whether the user intends to change the correction sentence or the correction target (S405). If so (Yes at S405), the correction sentence or correction target is changed (S407), and the process then returns to verification (S403).

If it is determined that the user does not intend to change the correction sentence or the correction target (No at S405), the command is taken to be "replace": the existing text is replaced with the newly input text (S406), and the system again waits for the user's utterance (S401).
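The flow of steps S401 to S407 can be sketched as a simple loop. This is an illustrative outline under stated assumptions rather than the patented implementation: the function `run_session`, the three callables passed to it, and the encoding of commands as strings or tuples are all hypothetical, and the intent classifier and target predictor are left to the caller.

```python
def run_session(utterances, classify_intent, predict_target, next_command):
    """Process recognized utterances following the FIG. 4 flow.

    utterances      : iterable of recognized strings (S401)
    classify_intent : (text, doc) -> "input" or "correction" (S402)
    predict_target  : (doc, text) -> (start, end) word span, end exclusive
    next_command    : (doc, target, text) -> "input", "cancel", "replace",
                      ("change_target", (start, end)) or
                      ("change_sentence", new_text) (S403)
    All names and encodings here are assumptions made for this sketch.
    """
    doc = []  # words entered so far
    for text in utterances:                          # S401
        if classify_intent(text, doc) == "input":    # S402
            doc.extend(text.split())                 # S404
            continue
        target = predict_target(doc, text)
        while True:                                  # S403: user verification
            cmd = next_command(doc, target, text)
            if cmd == "cancel":
                break                                # drop the correction
            if cmd == "input":
                doc.extend(text.split())             # S404
                break
            if isinstance(cmd, tuple) and cmd[0] == "change_target":
                target = cmd[1]                      # S405 yes -> S407
                continue
            if isinstance(cmd, tuple) and cmd[0] == "change_sentence":
                text = cmd[1]                        # S405 yes -> S407
                continue
            start, end = target                      # S405 no: treat as "replace"
            doc[start:end] = text.split()            # S406
            break
    return doc
```

As in the flowchart, any command that is not input, cancel, or a change request falls through to replacement, and a change request loops back to verification instead of ending the correction.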

By proceeding through the process described above with reference to FIG. 4, the user can continue entering text into the speech word processor according to the present invention.

For convenience of illustration, FIG. 4 describes the flow for determining the user command step by step in time order. It goes without saying, however, that after the correction intent is verified by a user command, any one of the input, replace, change correction target, change correction sentence, and cancel commands may be selected alternatively according to the verification result.

As described above, when an error occurs in speech input through a conventional speech input device, the present invention can distinguish an utterance intended to correct that error from an utterance intended as continued input.

In this way, the present invention allows a sentence to be corrected conveniently by speaking in the same way as for original input, rather than by typing directly or issuing correction commands. By simply continuing to speak what is to be entered, the user can effectively input the desired sentence without explicitly indicating which sentence to correct or how to correct it. In this process, the correction target can be selected automatically, and the sentence it is to be corrected to can be fixed in a single step. The user's subsequent verification of the operation then increases the reliability of the system.

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A speech processing apparatus comprising:
a speech recognition module that recognizes a user's utterance, outputs the recognition result as text, and outputs feature information for determining the user's speech intent;
a speech intent determination module that uses the feature information output by the speech recognition module to determine whether the user's speech intent is an utterance for text input or an utterance for correcting previously input text; and
a text input performing module that performs text input according to the user's speech intent output by the speech intent determination module.

The speech processing apparatus of claim 1,
further comprising an error extraction module that receives the result output by the speech recognition module, predicts which part of the at least one recognized character string contains an error, and outputs error prediction information,
wherein the speech intent determination module determines the user's speech intent using the feature information output by the speech recognition module and the error prediction information output by the error extraction module.

The speech processing apparatus of claim 1,
wherein the feature information includes at least one of:
a training speech feature including at least one of the volume of the user's voice, the accent pattern of the user's voice, and the length of the user's utterance; and a context feature including at least one of the pronunciation-string similarity between the currently recognized sentence and a previously input sentence, and whether the sentence has ended.

The speech processing apparatus of claim 1,
further comprising a training corpus database that stores a collection of corpora for utterances for correction and utterances for input.

The speech processing apparatus of claim 4,
wherein the speech intent determination module
is trained in advance, using the corpus stored in the training corpus database, to determine whether a user's utterance has an input intent or a correction intent.

The speech processing apparatus of claim 1,
wherein the text input performing module
includes an automatic correction target setting unit that, upon receiving the user's correction intent, uses context features of the input speech to predict and set the part of the at least one input character string that the user wants to correct.

The speech processing apparatus of claim 6,
wherein the context features
include at least one of the pronunciation-string similarity between the currently recognized sentence and a previously input sentence, and whether a predicted error is present in the at least one input character string.

The speech processing apparatus of claim 6,
wherein the text input performing module
further includes a user verification and determination unit that verifies the correction target set by the automatic correction target setting unit and determines, according to the verification result, which of the input, replace, change correction target, change correction sentence, and cancel operations to perform.

The speech processing apparatus of claim 8,
wherein the text input performing module
further includes a manual correction target setting unit that, according to the determination of the user verification and determination unit, modifies the correction target set by the automatic correction target setting unit by changing at least one of the span of the correction target and the position of the correction target according to the user's selection.

The speech processing apparatus of claim 8,
wherein the text input performing module further includes:
a replacement unit that replaces the character string of the correction target with the correction string according to the determination of the user verification and determination unit; and
a correction sentence change unit that changes the character string of the correction target to a character string corrected and input by the user according to the determination of the user verification and determination unit.

A speech processing method comprising:
recognizing a user's utterance, outputting the recognition result as text, and outputting feature information for determining the user's speech intent;
determining, using the feature information, whether the user's speech intent is an utterance for text input or an utterance for correcting previously input text; and
performing text input according to the determined speech intent of the user.

The method of claim 11,
further comprising outputting information that predicts which part of the at least one recognized character string contains an error,
wherein the determining of whether the user's speech intent is an utterance for text input or an utterance for correcting previously input text further uses the information that predicts which part of the at least one character string contains an error.

The method of claim 11,
wherein the feature information includes at least one of:
a training speech feature including at least one of the volume of the user's voice, the accent pattern of the user's voice, and the length of the user's utterance; and a context feature including at least one of the pronunciation-string similarity between the currently recognized sentence and a previously input sentence, and whether the sentence has ended.

The method of claim 11,
wherein the performing of text input according to the determined speech intent of the user
includes, upon receiving the user's correction intent, using context features of the input speech to predict the part of the at least one input character string that the user wants to correct and setting that part as a correction target.

The method of claim 14,
wherein the context features
include at least one of the pronunciation-string similarity between the currently recognized sentence and a previously input sentence, and whether a predicted error is present in the at least one input character string.

The method of claim 14,
wherein the performing of text input according to the determined speech intent of the user
further includes verifying the set correction target and determining, according to the verification result, which of the input, replace, change correction target, change correction sentence, and cancel operations to perform.

The method of claim 16,
wherein the performing of text input according to the determined speech intent of the user
further includes, when the verification determines that the set correction target should be changed, changing the set correction target.

The method of claim 16,
wherein the performing of text input according to the determined speech intent of the user
further includes, when the verification determines that the set correction target should be replaced with the newly input character string, replacing the character string of the correction target with the correction string.

The method of claim 16,
wherein the performing of text input according to the determined speech intent of the user
further includes, when the verification determines that the set correction target should be changed, changing the character string of the set correction target to a character string corrected and input by the user.

A speech processing apparatus comprising:
a speech recognition module that recognizes a user's utterance, outputs the recognition result as text, and outputs feature information for determining the user's speech intent;
an error extraction module that receives the result output by the speech recognition module, predicts which part of the at least one recognized character string contains an error, and outputs error prediction information;
a speech intent determination module that is trained, using a pre-stored training corpus of correction utterances and input utterances, to determine whether a user's utterance has an input intent or a correction intent, and that determines the user's speech intent using the feature information output by the speech recognition module and the error prediction information output by the error extraction module; and
a text input performing module that performs text input according to the user's speech intent output by the speech intent determination module.