KR20080075260A

KR20080075260A - Method for input message using voice recognition and image recognition in mobile terminal

Info

Publication number: KR20080075260A
Application number: KR1020070014159A
Authority: KR
Inventors: 안기모
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2007-02-12
Filing date: 2007-02-12
Publication date: 2008-08-18
Also published as: KR101373206B1

Abstract

A method for writing a document in a portable terminal using voice recognition and image recognition is provided. The method recognizes voice data and image data received from a microphone and a camera and converts the recognized data into text data, enabling more accurate recognition. When a document writing mode is executed, the microphone and the camera are activated (S201, S205). Voice data and image data are obtained from the microphone and the camera, and the voice data is recognized (S207-S217). If a voice data recognition error occurs, the image data is recognized to replace the erroneous portion of the voice data (S219-S227). Text data is generated from the recognized results (S229).

Description

Method for writing document in mobile terminal using voice recognition and image recognition {Method for input message using voice recognition and image recognition in Mobile terminal}

FIG. 1 is a block diagram showing the main configuration of a mobile terminal according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a document creation method in a mobile terminal using voice recognition and image recognition according to an embodiment of the present invention.

FIG. 3 is a detailed flowchart illustrating a method of changing the capture area displayed for image recognition in a mobile terminal according to an embodiment of the present invention.

FIGS. 4A and 4B are exemplary views showing screens of the capture area displayed for image recognition in a mobile terminal according to an embodiment of the present invention.

The present invention relates to a method for creating a document in a mobile terminal using voice recognition and, more particularly, to a method for creating a document in a mobile terminal using both voice recognition and image recognition.

As mobile terminals become more widespread, users can often be seen using them regardless of time and place. Users not only make phone calls with the mobile terminal but also enter and store a great deal of information, such as memos and text messages. Entering this information through ordinary key input alone takes considerable time and requires many keystrokes. To address this problem, simpler input methods using touch screens, voice recognition, and the like are being developed.

However, because such voice recognition technology relies solely on the voice uttered by the user, differences in vocalization and pronunciation between users make recognition of the intended input inaccurate. There is therefore a need for a voice recognition technique that enables more accurate recognition.

Accordingly, an object of the present invention is to provide a method for recognizing voice data and image data received from a microphone and a camera and converting them into text data.

Another object of the present invention is to provide a method for creating a document in which only the portions where recognition of the voice data is weak are converted into text data extracted from the image data.

To achieve the above objects, a document creation method using voice recognition and image recognition in a mobile terminal according to the present invention comprises: activating a microphone and a camera when a document creation mode is executed; acquiring voice data and image data from the microphone and the camera and recognizing the voice data; when a recognition error occurs in the voice data, recognizing the image data to replace the recognition error of the voice data; and generating text data from the recognized result.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiments below are described using, as an example, a mobile terminal that is equipped with a camera and supports both image recognition and voice recognition. The mobile terminal of the present invention is a terminal intended for user convenience. It will be apparent that the invention can be applied to any information and communication device or multimedia device that is equipped with a camera and supports image recognition and voice recognition, such as a mobile communication terminal, a mobile phone, a personal digital assistant (PDA), a smartphone, or a notebook computer, and to applications thereof.

In the embodiments of the present invention, a "document" means a set of text data constituting a command that the user wishes to input (for example, digits, letters, or a sentence composed of digits and letters). The set of text data may be a telephone number consisting only of digits, or may include text data for composing a text message. The mobile terminal performs voice recognition or image recognition on each item of text data in the set.

FIG. 1 is a block diagram showing the main configuration of a mobile terminal according to an embodiment of the present invention.

Referring to FIG. 1, the mobile terminal includes a radio frequency (RF) unit 101, an input unit 103, a camera 105, an image processing unit 107, a display unit 109, a control unit 111, a voice recognition unit 113, an image recognition unit 115, an audio unit 117, and a memory 119.

The RF unit 101 performs ordinary wireless communication between the mobile terminal and a mobile communication network. For example, the RF unit 101 transmits and receives voice data, text messages, and multimedia messages over the mobile communication network.

The input unit 103 consists of a conventional keypad and may also include a touch screen, a touch pad, a scroll wheel, and the like. The input unit 103 transmits operation signals entered by the user to the control unit 111 in order to control the operation of the mobile terminal.

The camera 105 includes an image sensor that converts the optical signal of a subject into an analog signal, and a signal processor (not shown) that converts the analog signal into a digital signal. That is, the camera 105 converts the optical signal collected through its lens into a digital signal to generate image data. The camera 105 can also be rotated or moved so that the user, while checking the capture area shown on the preview screen of the display unit 109, can align it with the position of his or her lips.

In particular, to reduce the computational load of image recognition, the camera 105 may generate image data only for the portion of the preview screen in which the capture area is displayed and transmit that image data to the control unit 111.
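The reduction in computation can be illustrated with a short Python sketch that crops a preview frame to the capture area before any recognition runs. The frame layout (a list of pixel rows) and the names `CaptureArea` and `crop_to_capture_area` are hypothetical and chosen only for illustration; the specification does not define a data format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical frame format: rows of grayscale pixel values.
Frame = List[List[int]]


@dataclass
class CaptureArea:
    x: int       # left column of the capture area
    y: int       # top row of the capture area
    width: int   # number of columns
    height: int  # number of rows


def crop_to_capture_area(frame: Frame, area: CaptureArea) -> Frame:
    """Return only the pixels that fall inside the capture area."""
    rows = frame[area.y:area.y + area.height]
    return [row[area.x:area.x + area.width] for row in rows]


# A 4x6 preview frame; the capture area covers a 2x3 patch of it.
preview = [[r * 10 + c for c in range(6)] for r in range(4)]
lips = crop_to_capture_area(preview, CaptureArea(x=2, y=1, width=3, height=2))
print(lips)  # [[12, 13, 14], [22, 23, 24]]
```

Only 6 of the 24 preview pixels reach the recognition stage in this example, which is the kind of saving the capture area is meant to provide.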

The image processing unit 107 processes the preview image data generated by the camera 105 to match the specifications of the display unit 109.

Under the control of the control unit 111, the display unit 109 displays the operating states and operation results of the mobile terminal along with various other information. The display unit 109 may be a display device such as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or a plasma display panel (PDP). Under the control of the control unit 111, the display unit 109 displays a capture area (capture window) on the preview screen, and displays changes to the position, size, and so on of the capture area according to change signals entered through the input unit 103.

The control unit 111 controls the overall operation of the mobile terminal. The control unit 111 includes the voice recognition unit 113, which converts voice data received from the audio unit 117 into text data, and the image recognition unit 115, which converts image data received from the camera 105 into text data.

The voice signal collected by the microphone (MIC) is converted into voice data by the audio unit 117 and transmitted to the voice recognition unit 113. The voice recognition unit 113 converts the received voice data into character codes and then converts the character codes into text data.

The image recognition unit 115 temporarily stores the image data of the capture area transmitted from the camera 105 (for example, image data of the area around the lips) in a buffer (not shown) of the memory 119 and, under the control of the control unit 111, extracts a lip change pattern from the image data. Alternatively, when the camera 105 transmits image data for the whole preview screen, the image recognition unit 115 may, under the control of the control unit 111, extract only the image data of the capture area and temporarily store it in the buffer of the memory 119.

The image recognition unit 115 searches the lip change patterns previously stored in the memory 119 for one that matches the extracted lip change pattern and, if a matching pattern exists, extracts the text data corresponding to that pattern.
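The lookup described above can be sketched in Python as follows. The specification does not state how a pattern is represented or how a match is decided, so this sketch assumes a pattern is a short sequence of mouth-opening values and that two patterns match when every value agrees within a tolerance. `STORED_PATTERNS`, `lookup_text`, and `tolerance` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple

Pattern = Tuple[float, ...]

# Hypothetical pre-stored lip change patterns and the text each one stands for.
STORED_PATTERNS: Dict[Pattern, str] = {
    (0.1, 0.8, 0.3): "1",
    (0.6, 0.2, 0.6): "2",
}


def lookup_text(extracted: Sequence[float], tolerance: float = 0.05) -> Optional[str]:
    """Return the text of the first stored pattern that matches, else None.

    Matching within a tolerance is an assumption of this sketch.
    """
    for stored, text in STORED_PATTERNS.items():
        same_length = len(stored) == len(extracted)
        if same_length and all(abs(s - e) <= tolerance for s, e in zip(stored, extracted)):
            return text
    return None


print(lookup_text([0.12, 0.79, 0.31]))  # 1
print(lookup_text([0.9, 0.9, 0.9]))     # None
```

Returning `None` when nothing matches keeps the "no matching pattern" case explicit, so the caller can decide how to handle a portion that neither recognizer resolved.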

If the control unit 111 detects that an error has occurred at some point in time while the voice recognition unit 113 is converting voice data into text data, the control unit 111 substitutes, for the erroneous portion, the text data that the image recognition unit 115 obtains from the image data corresponding to that point in time. The text data for the image data may be extracted by searching the lip change patterns previously stored in the memory 119.

The control unit 111 then takes the text data converted from the voice data as a basis, replaces the erroneous portions with the text data extracted from the image data, and displays on the display unit 109 the text data making up the document entered by the user (which may include sentences, sequences of digits, and the like). After displaying the document composed of this text data, the control unit 111 deletes the image data temporarily stored in the buffer of the memory 119.
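A minimal Python sketch of this substitution is given below. It assumes, purely for illustration, that the voice recognizer reports one result per time slot and marks a failed slot with `None`, and that the image recognizer supplies text keyed by slot index. The `"?"` placeholder for a slot that neither recognizer resolved is also an assumption, since the specification does not cover that case.

```python
from typing import Dict, List, Optional


def merge_results(voice: List[Optional[str]], lips: Dict[int, str]) -> str:
    """Keep voice results and fill failed slots from the lip-reading results."""
    merged = []
    for slot, text in enumerate(voice):
        if text is None:
            # Voice recognition failed here: use the image-based result.
            text = lips.get(slot, "?")
        merged.append(text)
    return "".join(merged)


voice_result = ["0", "1", None, "5", None]  # slots 2 and 4 failed
lip_result = {2: "0", 4: "9"}               # text recovered from lip images
print(merge_results(voice_result, lip_result))  # 01059
```

The voice result remains the primary source, and the image data is consulted only for the failed slots, which mirrors the stated aim of converting only the weakly recognized portions.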

The audio unit 117 converts the analog audio signal input through the microphone (MIC) into a digital audio signal and provides it to the control unit 111. It also converts the digital audio signal output from the control unit 111 into an analog audio signal and reproduces it through the speaker (SPK).

Under the control of the control unit 111, the memory 119 stores information related to the operation of the programs that control the mobile terminal (for example, settings and menu information). The memory 119 includes a buffer (not shown) which, under the control of the control unit 111, temporarily stores in real time, for a predetermined period, the image data that the control unit 111 receives from the camera 105. This image data may be the image data of the capture area.
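One way to picture such a buffer is a queue that discards frames once they are older than the retention period. The Python sketch below is an illustration only: the class name `FrameBuffer`, the `window` parameter, and the use of timestamps in seconds are assumptions, and the specification does not say how the predetermined period is enforced.

```python
from collections import deque
from typing import Deque, List, Tuple


class FrameBuffer:
    """Keeps only frames captured within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._frames: Deque[Tuple[float, bytes]] = deque()

    def push(self, timestamp: float, frame: bytes) -> None:
        self._frames.append((timestamp, frame))
        # Drop frames that have aged out of the retention window.
        while self._frames and timestamp - self._frames[0][0] > self.window:
            self._frames.popleft()

    def timestamps(self) -> List[float]:
        return [t for t, _ in self._frames]

    def clear(self) -> None:
        """Delete all buffered frames, e.g. after the text is displayed."""
        self._frames.clear()


buf = FrameBuffer(window=2.0)
for t in (0.0, 1.0, 2.0, 3.0):
    buf.push(t, b"frame")
print(buf.timestamps())  # [1.0, 2.0, 3.0]
```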

The memory 119 also stores lip change patterns for the feature information of each command. To build these patterns, lip feature information is extracted from the image data generated when the user pronounces a given command, the characteristics of the lip movement for that command are learned, and the learned information for each command is stored as text data.
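The specification does not say how the lip movement is learned. As one simple possibility, the Python sketch below averages several recorded feature sequences for the same command and stores the result under that command. Element-wise averaging, the function name `learn_pattern`, and the dictionary `lip_patterns` are all assumptions made for illustration.

```python
from typing import Dict, List


def learn_pattern(samples: List[List[float]]) -> List[float]:
    """Average several equally long feature sequences element-wise."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


# Hypothetical store: command text -> learned lip change pattern.
lip_patterns: Dict[str, List[float]] = {}

# Two recordings of the user pronouncing the command "7".
lip_patterns["7"] = learn_pattern([[0.25, 0.5], [0.75, 1.0]])
print(lip_patterns)  # {'7': [0.5, 0.75]}
```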

FIG. 2 is a flowchart illustrating a document creation method in a mobile terminal using voice recognition and image recognition according to an embodiment of the present invention. FIG. 3 is a detailed flowchart illustrating a method of changing the capture area displayed for image recognition in a mobile terminal according to an embodiment of the present invention. FIGS. 4A and 4B are exemplary views showing screens of the capture area displayed for image recognition in a mobile terminal according to an embodiment of the present invention.

Referring to FIGS. 2 and 4B, in step S201 the control unit 111 checks whether the document creation mode has been entered. If it has not, the control unit 111 proceeds to step S203 and performs the corresponding function. If the document creation mode has been entered in step S201, the control unit 111 proceeds to step S205 and activates the microphone (MIC) and the camera 105.

In step S207, the control unit 111 displays the image data obtained from the activated camera 105 on the display unit 109 as a preview screen, together with a capture area. An example of this screen is shown in FIG. 4A. Referring to FIG. 4A, image data of the user's face is displayed on the display unit 109, and a rectangular capture area, indicated by reference numeral 401, is displayed around the user's lips. The capture area is not limited to a rectangle and may take various shapes.

Then, if the control unit 111 receives a signal from the input unit 103 in step S209 requesting a change to the capture area, the control unit 111 changes the capture area in step S211. The change of the capture area is described in detail with reference to FIG. 3.

If the control unit 111 receives a signal from the input unit 103 in step S301 requesting a change in the position of the capture area, then in step S303 the control unit 111 changes the position of the capture area according to the position-change signal from the input unit 103. The position-change signal may be entered using the direction keys of the input unit 103 or the like.

If no position-change signal is received in step S301, or once the position of the capture area has been changed in step S303, the control unit 111 proceeds to step S305 and checks whether a signal requesting a change in the size of the capture area has been received. If such a signal is received from the input unit 103, the control unit 111 proceeds to step S307 and changes the size of the capture area in response to the size-change signal from the input unit 103. Then, when the control unit 111 receives a change-complete signal for the capture area in step S309, the process returns to FIG. 2 and proceeds to step S213.

An example screen for FIG. 3 is shown in FIG. 4B. Referring to FIG. 4B, image data of the user's face is displayed on the display unit 109, and selecting "Menu" displays a submenu for changing the position and size of the capture area (reference numeral 401). Selecting "1. Change position" allows the capture area to be moved up, down, left, or right using the direction keys of the input unit 103 or the like. Selecting "2. Change size" allows the size of the capture area to be changed, for example by a magnification factor, using the direction keys of the input unit 103 or the like.
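The position and size changes of steps S303 and S307 can be sketched in Python as two small functions. The screen dimensions, the clamping of the capture area to the screen bounds, and the names `move` and `resize` are assumptions of this sketch; the specification only states that the area can be moved in four directions and scaled.

```python
from dataclasses import dataclass

# Hypothetical preview screen size in pixels.
SCREEN_W, SCREEN_H = 240, 320


@dataclass
class CaptureArea:
    x: int
    y: int
    width: int
    height: int


def move(area: CaptureArea, dx: int, dy: int) -> CaptureArea:
    """Shift the capture area, keeping it fully on screen (S303)."""
    x = min(max(area.x + dx, 0), SCREEN_W - area.width)
    y = min(max(area.y + dy, 0), SCREEN_H - area.height)
    return CaptureArea(x, y, area.width, area.height)


def resize(area: CaptureArea, scale: float) -> CaptureArea:
    """Scale the capture area by a magnification factor (S307)."""
    width = min(max(int(area.width * scale), 1), SCREEN_W - area.x)
    height = min(max(int(area.height * scale), 1), SCREEN_H - area.y)
    return CaptureArea(area.x, area.y, width, height)


area = CaptureArea(x=80, y=200, width=80, height=40)
area = move(area, dx=-100, dy=10)  # direction keys: far left, slightly down
area = resize(area, scale=1.5)     # enlarge by a factor of 1.5
print(area)  # CaptureArea(x=0, y=210, width=120, height=60)
```

Clamping keeps the capture area inside the preview, so a long press on a direction key cannot push it off screen.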

Then, in step S213, the control unit 111 receives the voice data and image data obtained from the activated microphone (MIC) and camera 105. The voice data is captured by the microphone (MIC) and converted into digital voice data by the audio unit 117. The image data may be the image data of the capture area on the preview screen. In step S215, the control unit 111 temporarily stores the received image data of the capture area in the buffer of the memory 119 for a predetermined period.

In step S217, the voice recognition unit 113 of the control unit 111 converts the voice data received during the predetermined period into character codes and converts the character codes into text data, thereby recognizing the voice data as text data. Then, in step S219, the control unit 111 determines whether a recognition error occurred while the voice data for that period was being converted into text data.

If it is determined in step S219 that no recognition error occurred for the voice data, the control unit 111 proceeds to step S221 and generates text data from the voice recognition result of step S217.

Conversely, if it is determined in step S219 that a recognition error occurred at some point in the voice data, the control unit 111 proceeds to step S223 and extracts, from the image data temporarily stored in the buffer, the portion corresponding to the time at which the recognition error occurred. In step S225, the control unit 111 extracts the change pattern of the extracted image data, compares it with the change patterns previously stored in the memory 119, and, if a match is found, extracts the text data corresponding to the matching pattern. In step S227, the control unit 111 substitutes the extracted text data at the point where the recognition error occurred, and in step S229 the control unit 111 generates the text data making up the document entered by the user, including the substituted text data.
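Steps S223 and S225 can be sketched in Python under two assumptions that the specification leaves open: each buffered frame has already been reduced to a timestamp and a single mouth-opening value, and a "change pattern" is the sequence of differences between consecutive values. The function names and the sample data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical buffered data: (timestamp in seconds, mouth-opening value).
Sample = Tuple[float, float]


def samples_at_error(buffer: List[Sample], start: float, end: float) -> List[float]:
    """S223: keep only the values recorded during the error interval."""
    return [value for t, value in buffer if start <= t <= end]


def change_pattern(values: List[float]) -> List[float]:
    """S225: describe the lip movement as frame-to-frame differences."""
    return [round(b - a, 3) for a, b in zip(values, values[1:])]


buffered = [(0.0, 0.1), (0.5, 0.4), (1.0, 0.9), (1.5, 0.2)]
segment = samples_at_error(buffered, start=0.4, end=1.6)
print(segment)                  # [0.4, 0.9, 0.2]
print(change_pattern(segment))  # [0.5, -0.7]
```

The resulting pattern would then be compared with the stored patterns to obtain the replacement text for the failed interval.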

In step S231, the control unit 111 displays the text data generated in step S221 or in step S229 on the display unit 109. In step S233, the control unit 111 deletes the image data stored in step S215.

Finally, if the control unit 111 receives an input end signal from the input unit 103 in step S235, the control unit 111 ends document input from the user and performs an arbitrary function on the text data displayed on the display unit 109. If the control unit 111 does not receive the end signal in step S235, it returns to step S213 and repeats the above process, displaying the document the user is entering on the display unit 109 until the end signal is received.
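The overall loop of steps S213 through S235 can be summarized in a Python sketch with the recognizers stubbed out. Each input chunk is assumed to carry a voice result (`None` when recognition failed) and a lip-reading result for the same interval; `write_document` and `end_requested` are hypothetical names, and the display and buffer-deletion steps appear only as comments.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (voice result, or None on a recognition error; lip-reading result)
Chunk = Tuple[Optional[str], str]


def write_document(chunks: Iterable[Chunk], end_requested: Callable[[int], bool]) -> str:
    """Accumulate text chunk by chunk until an end signal arrives."""
    displayed: List[str] = []
    for index, (voice_text, lip_text) in enumerate(chunks):  # S213-S217
        if voice_text is not None:        # S219: no recognition error
            displayed.append(voice_text)  # S221
        else:
            displayed.append(lip_text)    # S223-S229
        # S231 would display the text; S233 would delete buffered frames.
        if end_requested(index):          # S235: end signal received
            break
    return "".join(displayed)


chunks = [("hel", "hal"), (None, "lo "), ("wor", "war"), ("ld", "lb")]
print(write_document(chunks, end_requested=lambda i: i == 2))  # hello wor
```

In this run the end signal arrives after the third chunk, so the fourth chunk is never processed, matching the description that input continues only until the end signal is received.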

Although the present invention has been described in detail above, the embodiments mentioned are illustrative only and not limiting. Changes to components that amount to equivalent substitutions, made without departing from the technical spirit or scope of the invention as set out in the following claims, fall within the scope of the present invention.

As described above, the present invention recognizes voice data and image data received from a microphone and a camera and converts them into text data.

More specifically, portions that are not clearly recognized because an error occurred while converting the voice data into text data are replaced with text data extracted from image data of the lip area, which improves the accuracy of voice recognition.

Furthermore, the improved accuracy of voice recognition allows the user to create documents more efficiently.

Claims

1. A method for creating a document using voice recognition and image recognition in a mobile terminal, the method comprising:

activating a microphone and a camera when a document creation mode is executed;

acquiring voice data and image data from the microphone and the camera, and recognizing the voice data;

when a recognition error occurs in the voice data, recognizing the image data to replace the recognition error of the voice data; and

generating text data from the recognized result.

2. The method of claim 1, wherein activating the microphone and the camera comprises:

displaying a capture area on a preview screen after activating the camera.

3. The method of claim 2, wherein acquiring the image data and recognizing the voice data comprises:

receiving the acquired voice data and the image data of the capture area; and

temporarily storing the received image data of the capture area.

4. The method of claim 3, wherein recognizing the voice data comprises:

recognizing the voice data acquired by the microphone to generate text data corresponding to the voice data.

5. The method of claim 3, wherein replacing the recognition error of the voice data comprises:

extracting, from the temporarily stored image data of the capture area, the image data corresponding to the error point at which the recognition error occurred;

extracting a change pattern of the extracted image data;

comparing the extracted change pattern with a previously stored change pattern;

extracting text data corresponding to the previously stored change pattern that matches the extracted change pattern; and

substituting the extracted text data at the error point at which the recognition error occurred.

6. The method of claim 5, wherein generating text data from the recognized result comprises:

generating, as the result of recognizing the voice data and the image data, text data that combines the text data produced by recognizing the voice data with the text data substituted at the error point at which the recognition error occurred;

displaying the generated text data; and

deleting the temporarily stored image data of the capture area.

7. The method of claim 5, wherein the previously stored change pattern is obtained by learning, from image data, the characteristics of the change in lip shape for each command, the learned information for each command being stored as text data in a memory of the mobile terminal.

8. The method of claim 3, further comprising:

generating and displaying text data corresponding to the voice data if no recognition error occurs in the voice data; and

deleting the temporarily stored image data of the capture area.

9. The method of claim 2, further comprising:

selecting whether to change the displayed capture area; and

changing the capture area if a change of the capture area is selected.

10. The method of claim 9, wherein changing the capture area comprises:

selecting a change of position of the capture area displayed on the preview screen;

changing the position of the capture area if the change of position is selected; and

displaying the capture area at the changed position when the position change of the capture area is completed.

11. The method of claim 10, wherein changing the capture area comprises:

selecting a change of size of the capture area displayed on the preview screen;

changing the size of the capture area if the change of size is selected; and

displaying the capture area at the changed size when the size change of the capture area is completed.