KR102042131B1

KR102042131B1 - Method for a video stabilization for real-time optical character recognition (OCR)

Info

Publication number: KR102042131B1
Application number: KR1020180011600A
Authority: KR
Inventors: 이윤구
Original assignee: 광운대학교 산학협력단
Priority date: 2018-01-30
Filing date: 2018-01-30
Publication date: 2019-11-07
Also published as: KR20190092155A

Abstract

In a preferred embodiment of the present invention, a shake correction method for character recognition in an image includes: setting a portion of an input frame as an output frame; determining a target word in the output frame using the center coordinates of the output frame and the position of each word predicted by applying an OCR technique to the output frame; and determining a correction amount based on at least one of the center coordinates of the output frame, the position of the target word, and the region of the target word.

Description

Method for stabilizing video during real-time character recognition in a terminal {Method for a video stabilization for real-time optical character recognition (OCR)}

The present invention relates to a method for recognizing characters in real time in an image captured by an image capturing device provided in a terminal. More specifically, it relates to a method of stabilizing the image during character recognition.

A real-time OCR technique recognizes, in real time, the characters within a region of interest that the user sets in an image. In this case, a marker is used to indicate a particular word at the center of the preview screen. Touch-based terminals such as smartphones provide an interface for placing the marker on a specific word of interest, but when the user's hand shakes, the preview screen shown on the display may shake as well.

Various image stabilization techniques have been applied to solve this problem. They include optical image stabilization (OIS), digital image stabilization (DIS), and camera stabilization.

In general, a camera stabilization mechanism such as a gimbal is used to detect camera shake or rotation and to compensate for the camera orientation. Other methods that have been developed include predicting camera motion with a gyroscope, stabilizing the whole image by estimating its global motion, and estimating the camera's motion in three-dimensional space, stabilizing that motion, and synthesizing the image at a new position. In addition, to predict the motion of the screen or the camera, objects in the image are tracked and the motion of the tracked objects is compensated.

KR 10-2007-0093995 A

A preferred embodiment of the present invention aims to provide a character-centered screen stabilization technique.

In a preferred embodiment of the present invention, a shake correction method for character recognition in an image includes: setting a portion of an input frame received from an image capturing device as an output frame; determining a target word in the output frame using the center coordinates of the output frame and the position of each word in the output frame predicted by applying an OCR technique to the output frame; and determining a correction amount based on at least one of the center coordinates of the n-th output frame, the position of the target word, and the region of the target word.

In a preferred embodiment of the present invention, the step of determining the target word selects, as the target word, the word for which the distance between the center coordinate M(n) of the n-th output frame and the center coordinate of the word is smallest. Here the position of each word is Rect(k) = (x(k), y(k), w(k), h(k)), where x(k) and y(k) are the starting point of the word, w(k) is its width, and h(k) is its height, and the center coordinate of each word is expressed as (x(k) + w(k)/2, y(k) + h(k)/2).

In a preferred embodiment of the present invention, the step of determining the target word selects a word as the target word when the center coordinate M(n) of the n-th output frame falls within the coordinate range of that word as recognized in the n-th output frame. Here the coordinate range of the word runs horizontally from (x(k), y(k)) to (x(k) + w(k), y(k)) and vertically from (x(k), y(k)) to (x(k), y(k) + h(k)).

In a preferred embodiment of the present invention, the step of determining the correction amount determines it based on the distance between the center coordinate of the n-th output frame and the center coordinate of the target word.

In a preferred embodiment of the present invention, the correction amount is determined to be inversely proportional to the distance between the center coordinate of the n-th output frame and the center coordinate of the target word.

In a preferred embodiment of the present invention, the n-th marker M_n is overlaid and displayed at the center of the n-th output frame.

In a preferred embodiment of the present invention, when the (n-1)-th marker M_{n-1}, displayed at the center of the (n-1)-th output frame, lies inside a word in the (n-1)-th output frame, that word is determined as the target word of the n-th output frame.

In a preferred embodiment of the present invention, when the (n-1)-th marker M_{n-1}, displayed at the center of the (n-1)-th output frame, lies in the empty space between the words recognized in the (n-1)-th output frame, the target word is determined based on the distance between the position of the marker M_{n-1} and the center of each word recognized in the (n-1)-th output frame.

In a preferred embodiment of the present invention, the word whose center is closest to the position of the (n-1)-th marker M_{n-1}, among the words recognized in the (n-1)-th output frame, is determined as the target word of the n-th output frame.
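The marker-based rules above (a word containing the previous marker wins; otherwise the nearest word center wins) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_target`, the tuple layout, and the tie-breaking order (first match wins) are assumptions made for the example.

```python
import math

def pick_target(marker, rects):
    """Pick the target word for frame n from the marker of frame n-1.

    marker: (mx, my), assumed position of M_{n-1}.
    rects:  list of (x, y, w, h) word boxes recognized in frame n-1.
    Returns the index of the chosen word, or None if no words were recognized.
    """
    mx, my = marker
    best_k, best_d = None, float("inf")
    for k, (x, y, w, h) in enumerate(rects):
        # Rule 1: the marker lies inside this word's box.
        if x <= mx <= x + w and y <= my <= y + h:
            return k
        # Rule 2 (fallback): remember the word whose center is nearest.
        d = math.hypot(x + w / 2.0 - mx, y + h / 2.0 - my)
        if d < best_d:
            best_k, best_d = k, d
    return best_k
```

Returning as soon as a containing box is found keeps the two rules in one pass; a real OCR result may contain overlapping boxes, which this sketch does not try to resolve.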

In another preferred embodiment of the present invention, the method includes: setting a portion of an input frame received from an image capturing device as an output frame; determining a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame; determining a correction amount inversely proportional to the distance between the center coordinate of the n-th output frame and the center coordinate of the target word; moving the set output frame by the correction amount; and cropping the moved output frame from the input frame and displaying it.

In a preferred embodiment of the present invention, the method further includes overlaying a marker M' at the center point of the moved output frame and displaying it.

In another preferred embodiment of the present invention, the method includes: setting a portion of an input frame received from an image capturing device equipped with a CMOS sensor as an output frame; performing rolling shutter distortion correction when rolling shutter distortion occurs in the input frame; determining a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame; determining a correction amount inversely proportional to the distance between the center coordinate of the n-th output frame and the center coordinate of the target word; moving the set output frame by the correction amount; and cropping the moved output frame from the input frame and displaying it.

In another preferred embodiment of the present invention, an image capturing device that performs shake correction during character recognition in an image includes: an output frame setting unit that sets a portion of an input frame received from the image capturing device as an output frame; a target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word in the output frame predicted by applying an OCR technique to the output frame; and a correction unit that determines a correction amount based on at least one of the center coordinates of the n-th output frame, the position of the target word, and the region of the target word.

In another preferred embodiment of the present invention, an image capturing device that performs shake correction during character recognition in an image includes: an output frame setting unit that sets a portion of an input frame received from the image capturing device as an output frame; a target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame; a correction unit that determines a correction amount inversely proportional to the distance between the center coordinate of the n-th output frame and the center coordinate of the target word, and moves the set output frame by the correction amount; and a display unit that crops the moved output frame from the input frame and displays it.

In another preferred embodiment of the present invention, an image capturing device that performs shake correction during character recognition in an image includes: an output frame setting unit that sets a portion of an input frame received from an image capturing device equipped with a CMOS sensor as an output frame; a distortion correction unit that performs rolling shutter distortion correction when rolling shutter distortion occurs in the input frame; a target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame; a correction unit that determines a correction amount inversely proportional to the distance between the center coordinate of the n-th output frame and the center coordinate of the target word, and moves the set output frame by the correction amount; and a display unit that crops the moved output frame from the input frame and displays it.

A preferred embodiment of the present invention proposes a technique that stabilizes a shaking image around characters rather than around the image as a whole. When several characters or words appear on the screen, the technique automatically infers the user's intention and stabilizes the screen around the character or word the user wants. This resolves the problem that conventional screen or image stabilization techniques overlook responsiveness to the user, so that the user's small intentional movements are reflected quickly.

In a preferred embodiment of the present invention, the shake correction method for character recognition in an image adjusts only the crop range of the output frame relative to the current input frame, so it corrects shake quickly without requiring heavy computation.

In a preferred embodiment of the present invention, when rolling shutter distortion occurs in an input frame captured by an image capturing device equipped with a CMOS sensor, a commonly used rolling shutter distortion correction is applied first and the shake correction of the present invention is performed afterward, so that character recognition improves even with such a device.

FIG. 1 (a) to (c) show an example of a magnifier app implemented on a smartphone.
FIG. 2 illustrates a problem with a screen stabilization technique that uses a low-pass filter.
FIG. 3 shows, as a preferred embodiment of the present invention, the internal configuration of an image capturing device that performs shake correction during character recognition in an image.
FIG. 4 shows, as a preferred embodiment of the present invention, the relationship between an input frame and an output frame.
FIG. 5 shows, as a preferred embodiment of the present invention, an example of correcting shake that occurs during character recognition in an image.
FIG. 6 shows, as a preferred embodiment of the present invention, an example of setting a target word in an output frame.
FIG. 7 shows, as a preferred embodiment of the present invention, a comparison between correction according to the present invention and correction according to a conventional method when shake occurs during character recognition in an input image.
FIGS. 8 and 9 each show, as a preferred embodiment of the present invention, a flowchart for correcting shake that occurs during character recognition in an input image.

FIG. 1 (a) to (c) show an example of a magnifier app implemented on a smartphone.

FIG. 1(a) shows an example in which the user, using a smartphone magnifier app, enlarges the word "bicycle" (110) among the words "car" (120), "bicycle" (110), and "train" (130) displayed on the smartphone screen, and the enlarged word "bicycle" (110a) is overlaid on the screen.

In the input image, the distance between the word "train" (130) and the edge of the screen is d1 (111) in FIG. 1(a), d2 (112) in FIG. 1(b), and d3 (113) in FIG. 1(c).

The user first views the enlarged word "bicycle" (110) as in FIG. 1(a), then enlarges "car" (120), the word to the left of "bicycle" (110), and views the enlarged word "car" (120a) as in FIG. 1(b), and then moves the smartphone screen again to enlarge "bicycle" (110) and view the enlarged word "bicycle" (110c) as in FIG. 1(c).

Conventionally, when the user moves the smartphone screen as in FIG. 1(a) to (c), no attempt was made to determine whether the movement was intentional or unintentional. When the screen moved, the usual approach was simply to stabilize it by passing the amount of smartphone movement, or the smartphone's position information, through a low-pass filter (LPF), converting the movement into a smooth path before showing the image to the user.

In this case, the user's small movements are judged to be unwanted screen shake and are stabilized away, so the movement from FIG. 1(a) to FIG. 1(b) and then back to FIG. 1(c) is not reflected faithfully on the smartphone screen.

Referring to FIG. 2, the problems that arise when the screen is stabilized with a low-pass filter are as follows. Screen shake generally consists of high-frequency components, so a low-pass filter is applied to reduce it. Passing the input signal (200) through a low-pass filter yields stable image signals (201, 202) from which small jitter components have been effectively removed.

However, a low-pass filter performs its filtering using only information about past or present time points obtained from the image. When a sudden movement occurs in an image capturing device such as a camera, the stable image signals (201, 202) are obtained as in FIG. 2, but they cannot follow the sudden changes R1 (210) and R2 (220).

Even with weak low-pass filtering, a response-time difference R2 (220) arises between the input signal (200) and the filtered image signal (201), and with ordinary low-pass filtering a very large response-time difference R2 (220) arises between the input signal (200) and the filtered image signal (202).
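The response lag described above can be reproduced with a small numerical sketch. The patent does not specify which low-pass filter is meant; the first-order exponential filter, the `alpha` values, and the helper names below are assumptions chosen only to show that stronger smoothing takes more frames to follow a sudden step.

```python
def low_pass(samples, alpha):
    """First-order IIR low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    out = []
    y = samples[0]
    for x in samples:
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out

def frames_to_reach(filtered, target, fraction=0.9):
    """Index of the first sample that reaches `fraction` of `target`, else None."""
    for n, y in enumerate(filtered):
        if y >= fraction * target:
            return n
    return None

# A sudden camera move: position jumps from 0 to 100 at frame 5.
step = [0.0] * 5 + [100.0] * 60
weak = low_pass(step, alpha=0.5)     # weak smoothing (assumed value)
strong = low_pass(step, alpha=0.1)   # stronger smoothing (assumed value)

weak_lag = frames_to_reach(weak, 100.0)
strong_lag = frames_to_reach(strong, 100.0)
```

Comparing `weak_lag` and `strong_lag` shows the trade-off: the smoother the path, the later the output catches up with an intentional movement.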

In other words, removing small jitter components with a low-pass filter secures a stable path, but the filter cannot follow changes such as sudden camera movement and cannot detect the user's small intentional movements. The user experiences this as the screen not updating immediately when the smartphone is moved.

A preferred embodiment of the present invention aims to solve the problems that arise when a low-pass filter or a conventional screen or image stabilization technique is applied as described above: the response time becomes too slow, or high-frequency components of the input signal are removed excessively.

FIG. 3 shows, as a preferred embodiment of the present invention, the internal configuration of an image capturing device that performs shake correction during character recognition in an image.

The image capturing device (300) includes an output frame setting unit (310), a target word determination unit (320), and a correction unit (330), and may further include a display unit (340).

In the present invention, the image capturing device may be interpreted to include terminals, mobile phones, smartphones, smartwatches, tablets, notebooks, computers, handheld devices, wearable devices, and the like.

The image capturing device may also include an optical unit that receives an optical signal from a subject, an imaging element that converts the optical signal received through the optical unit into an electrical signal, an input signal processing unit that performs signal processing such as noise reduction and conversion to a digital signal on the electrical signal from the imaging element, a motor that drives the optical unit, and a driving unit that controls the operation of the motor. It may further include a user input unit that receives the user's operation signals; SDRAM that temporarily stores input image data, data for computation, and processing results; flash memory; and an SD/CF/SM card or similar as a recording device for storing image files. A CMOS (Complementary Metal Oxide Semiconductor) sensor array, a CCD (Charge Coupled Device) sensor array, or the like may be used as the imaging element.

Each component of the image capturing device (300) shown in FIG. 3 has the following features.

The output frame setting unit (310) is implemented to set a portion of the input frame received from the image capturing device as the output frame. Referring to FIG. 4, the output frame setting unit (310) may set the (n-1)-th output frame (410) and the n-th output frame (420) within the input frame (400).
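Setting an output frame as a portion of the input frame amounts to choosing a crop window. The sketch below assumes a frame stored as a list of pixel rows and a window given by its top-left corner and size; the names `crop` and `centered_window` are hypothetical and only illustrate the idea.

```python
def centered_window(input_w, input_h, out_w, out_h):
    """Window (left, top, width, height) of size out_w x out_h centered in the input frame."""
    left = (input_w - out_w) // 2
    top = (input_h - out_h) // 2
    return (left, top, out_w, out_h)

def crop(frame, window):
    """Cut the output frame out of the input frame (a list of rows)."""
    left, top, width, height = window
    return [row[left:left + width] for row in frame[top:top + height]]

# Toy 6x4 input frame whose pixel value encodes its own (column, row) position.
frame = [[(c, r) for c in range(6)] for r in range(4)]
window = centered_window(6, 4, 4, 2)
output = crop(frame, window)
```

Because only the window changes from frame to frame, the input pixels themselves are never resampled, which is consistent with the low computational cost the description claims for crop-based correction.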

The target word determination unit (320) may determine the target word in the output frame using the position and region of each word predicted by applying an OCR technique to the output frame, together with the center coordinates of the output frame. It may determine the target word using only the current output frame, or it may additionally use information from the previous output frame.

As a preferred embodiment of the present invention, an example in which the target word determination unit (320) determines the target word using only the current output frame is as follows.

The target word determination unit (320) selects, as the target word, the word for which the distance between the center coordinate M(n) of the n-th output frame and the center coordinate of the word is smallest. The position of each word in the n-th output frame is Rect(k) = (x(k), y(k), w(k), h(k)), where x(k) and y(k) are the starting point of the word, w(k) is its width, and h(k) is its height, and the center coordinate of each word can be expressed as (x(k) + w(k)/2, y(k) + h(k)/2).
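The nearest-center rule can be sketched in a few lines. The box layout (x, y, w, h) follows the Rect(k) definition in the text; the function names and the assumption that (x, y) is the top-left corner of the box are illustrative choices, not part of the patent.

```python
import math

def word_center(rect):
    """Center of a word box (x, y, w, h), with (x, y) taken as the starting corner."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def nearest_word(frame_center, rects):
    """Index k of the word whose center is closest to the output-frame center M(n)."""
    if not rects:
        return None

    def distance(k):
        cx, cy = word_center(rects[k])
        return math.hypot(cx - frame_center[0], cy - frame_center[1])

    return min(range(len(rects)), key=distance)
```

For example, with boxes `[(0, 0, 10, 10), (40, 40, 20, 10), (90, 90, 10, 10)]` and a frame center of `(50, 50)`, the second box is selected because its center `(50.0, 45.0)` is the closest.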

Referring to the embodiment of FIG. 6, a portion of the input image (600) may be set as the output image (610). Each word in the output image (610), "bird", "car", "bicycle", and "train", can be recognized with an OCR technique, and the position of each recognized word can be predicted.

In a preferred embodiment of the present invention, the input image (600) uses an absolute coordinate system. Measured against the input image (600), "bicycle" (601) is closest to the center, but measured against the output image (610), "bird" (611) may be computed as closest to the center. In a preferred embodiment of the present invention, the target word determination unit (320) uses the distance from the center of the output image (610) and therefore selects "bird" (611), not "bicycle" (601), as the target word.

In this case, the correction unit (330) may perform correction by moving the output screen by an amount inversely proportional to the distance between the center (612) of the target word "bird" (611) and the center of the output image (610). In another embodiment of the present invention, the correction amount may be determined to be inversely proportional to the square of the distance between the center of the target word and the center of the output image.

Once the correction amount is determined, the display unit (340) moves the screen image by the correction amount, crops from the input image the output image that reflects the correction, and shows it on the display. In this case, the n-th marker M_n may be overlaid at the center of the n-th output frame and displayed together with it.
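Moving the output frame by the correction amount and placing the marker at its center can be sketched as below. Clamping the window so that it stays inside the input frame is an assumption added for the example (the text does not state how the frame boundary is handled), and the function names are hypothetical.

```python
def shift_window(window, correction, input_size):
    """Move the crop window (left, top, w, h) by correction (dx, dy).

    The window is clamped to the input frame; this boundary handling is an
    assumption of the sketch, not something the description specifies.
    """
    left, top, w, h = window
    dx, dy = correction
    in_w, in_h = input_size
    new_left = min(max(left + dx, 0), in_w - w)
    new_top = min(max(top + dy, 0), in_h - h)
    return (new_left, new_top, w, h)

def marker_position(window):
    """Input-frame coordinates of the marker overlaid at the output-frame center."""
    left, top, w, h = window
    return (left + w // 2, top + h // 2)
```

The shifted window is then used to crop the input frame, so the correction costs only a change of four integers per frame.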

This is where the present invention differs most from conventional shake correction techniques. Conventional techniques must predict the motion of the image or of the camera, so they necessarily analyze consecutive images or images captured at different times, for example the n-th and (n-1)-th images. In an embodiment of the present invention, by contrast, shake correction can be performed using only the n-th output frame (610).

In another embodiment, the target word determination unit (320) selects a word as the target word when the center coordinate M(n) of the n-th output frame falls within the coordinate range of that word as recognized in the n-th output frame. Here the coordinate range of the word runs horizontally from (x(k), y(k)) to (x(k) + w(k), y(k)) and vertically from (x(k), y(k)) to (x(k), y(k) + h(k)). In this case, the correction unit (330) may perform correction by moving the output screen by an amount inversely proportional to the distance between the center (612) of the target word "bird" (611) and the center of the output image (610). The correction unit (330) may also take the size of the word "bird" (611) into account when determining the correction amount.

Referring to FIG. 6, when the region of the word "새" ("bird") is (x1, y1) to (x1+w1, y1+h1) and the center (not shown) of the output image 510 lies within that region, "새" can be recognized as the target word. This method takes the size of the word into account. For example, the correction amount may be determined based on at least one of the area computed from the width w1 and height h1 of the word "새" 611, and the distance between the center 612 of "새" 611 and the center of the output image 610.
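The containment test described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `WordBox` type, the function name `word_containing`, and the sample coordinates are all hypothetical, and the word boxes are assumed to come from an OCR step that is not shown.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class WordBox(NamedTuple):
    """Hypothetical container for Rect(k) = (x(k), y(k), w(k), h(k))."""
    text: str
    x: float  # start point, horizontal
    y: float  # start point, vertical
    w: float  # horizontal length
    h: float  # vertical length


def word_containing(center: Tuple[float, float],
                    words: Sequence[WordBox]) -> Optional[WordBox]:
    """Return the first word whose region contains the frame center, else None."""
    cx, cy = center
    for word in words:
        inside_x = word.x <= cx <= word.x + word.w
        inside_y = word.y <= cy <= word.y + word.h
        if inside_x and inside_y:
            return word
    return None


# Made-up boxes for illustration only.
words = [WordBox("새", 100, 80, 40, 30), WordBox("기차", 160, 80, 70, 30)]
target = word_containing((120, 95), words)   # center lies inside the first box
no_hit = word_containing((150, 95), words)   # center lies in the gap between boxes
```

When the center falls in the gap between words, this sketch returns `None`; a caller would then need another rule, such as the distance-based selection described elsewhere in the description.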

In other words, the position can be corrected with a model in which any word recognized in the output image pulls the center point of the output image toward itself, much as universal gravitation depends on the distance between the Earth and an object. The size of the word "새" 611, w1 x h1, corresponds to the mass, and the distance between the center 612 of the word "새" and the center of the output image 610 corresponds to the distance between the Earth and the object. A gravitation-like force can be defined from these quantities, and the change in position over time can be computed from the magnitude of that force. In this model, the closer the word is to the center, the more strongly it is pulled, and the effect weakens as the distance grows. Therefore, once the target word is close to the center, it stays at the center of the output screen, and the user perceives the screen as stabilized. If a sudden movement by the user shifts the target word away from the center of the output image by more than a certain amount, the force acting between the word and the output image weakens and the center escapes the word's gravitational pull.
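One way to read the gravitation analogy is sketched below. The description only states that the word area plays the role of mass and that the pull weakens with distance; the inverse-square form, the `gain`, `min_dist`, and `dt` parameters, and the clamp that stops the step from overshooting the word center are all assumptions made for this illustration.

```python
import math


def attraction_step(frame_center, word_center, word_area,
                    gain=1.0, min_dist=1.0, dt=1.0):
    """Displacement of the output-frame center for one time step.

    Assumed model: force = gain * area / distance**2, applied along the
    line from the frame center to the word center.
    """
    dx = word_center[0] - frame_center[0]
    dy = word_center[1] - frame_center[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # already centered on the word
    # min_dist avoids a division blow-up when the centers nearly coincide.
    force = gain * word_area / max(dist, min_dist) ** 2
    # Never move past the word center in a single step.
    step = min(force * dt, dist)
    return (step * dx / dist, step * dy / dist)


# Same word area, two distances: the nearby word pulls much harder.
near = attraction_step((0.0, 0.0), (10.0, 0.0), word_area=1200.0)
far = attraction_step((0.0, 0.0), (100.0, 0.0), word_area=1200.0)
```

With these made-up numbers the nearby word pulls the center all the way onto itself, while the distant word moves it only a fraction of a pixel, which matches the "escape from the gravitational pull" behavior described above.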

In another preferred embodiment of the present invention, the target word determination unit 320 may determine the target word using only the previous output frame information and the current output frame information.

Referring to FIG. 4, when the position 411 of the (n-1)-th marker M_{n-1}, displayed at the center of the (n-1)-th output frame 410, lies inside any word in the (n-1)-th output frame 410, that word may be determined as the target word of the n-th output frame.

In another preferred embodiment, when the position 411 of the (n-1)-th marker M_{n-1}, displayed at the center of the (n-1)-th output frame 410, lies in the empty space between the words recognized in the (n-1)-th output frame 410, the target word may be determined based on the distance between the position 411 of the marker M_{n-1} and the center of each word recognized in the (n-1)-th output frame.

In another preferred embodiment, the word whose center is at the minimum distance from the position 411 of the (n-1)-th marker M_{n-1}, among the words recognized in the (n-1)-th output frame 410, may be determined as the target word of the n-th output frame.
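The marker-based embodiments above can be combined into one selection routine, sketched here under stated assumptions: boxes are hypothetical `(text, x, y, w, h)` tuples in the coordinates of the (n-1)-th output frame, the word center is taken as the midpoint of its box, and the function name is illustrative.

```python
import math


def center_of(box):
    """Midpoint of a (text, x, y, w, h) box."""
    _, x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def target_from_previous_marker(marker, boxes):
    """Pick the target word for frame n from the marker of frame n-1.

    1. If the marker lies inside a word box, that word is the target.
    2. Otherwise (marker in empty space), take the word whose center
       is nearest to the marker.
    """
    mx, my = marker
    for box in boxes:
        _, x, y, w, h = box
        if x <= mx <= x + w and y <= my <= y + h:
            return box
    if not boxes:
        return None  # nothing was recognized in the previous frame
    return min(boxes, key=lambda b: math.dist(marker, center_of(b)))


# Made-up boxes: two words with a gap between x = 50 and x = 60.
boxes = [("have", 10, 10, 40, 20), ("been", 60, 10, 40, 20)]
inside = target_from_previous_marker((70, 15), boxes)   # marker inside "been"
in_gap = target_from_previous_marker((53, 20), boxes)   # gap, nearer to "have"
```

Because only the previous frame's marker and word boxes are consulted, no motion estimation between frames is needed, which is the point the description makes about these embodiments.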

FIG. 5 illustrates, as a preferred embodiment of the present invention, an example in which the correction unit corrects shake in the output image.

Suppose the range of the output image is set in the n-th input image 500a so that the distance between the word "기차" ("train") and one side of the screen is d1 (501). If no shake occurs, the distance d3 (503) between the word "기차" and that side of the screen in the n-th output image 510 is substantially the same as d1.

However, if a slight movement of the image photographing apparatus while capturing the (n+1)-th input image 500b shifts the image slightly to the right, the distance d2 (502) between the word "기차" ("train") and one side of the screen becomes smaller than d1 (501). In a preferred embodiment of the present invention, an output image is set in the (n+1)-th input image 500b, a target word is set based on the center of that output image and the position of each word in it, and a correction is performed based on the center of the target word and the center of the output image to produce the (n+1)-th output image 520. In this case, the correction amount may also be determined so that the resulting distance equals d3 (503).

In a preferred embodiment of the present invention, the correction amount for the (n+1)-th output image 520 in FIG. 5 is determined based only on the (n+1)-th input image 500b, and may be chosen so that the distance d4 (504) between the word "기차" ("train") shown in the (n+1)-th output image 520 and one side of the screen is larger than d2 (502).

In another preferred embodiment of the present invention, the correction amount for the (n+1)-th output image 520 in FIG. 5 may be determined in consideration of both the n-th output image 510 and the (n+1)-th input image 500b.

In this case, if the center point of the (n+1)-th output image 520 is M_{n+1} and the center point of the n-th output image 510 is M_n, the two center points are related as follows.

M_{n+1} = M_n + update(n), where update(n) is the correction amount for the n-th output image 510

update(n) may be determined in consideration of the position of the target word, the size (area) of the target word, and the center M_n of the n-th output image 510.
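The recurrence above simply accumulates per-frame corrections onto the previous center. A minimal sketch, assuming the corrections have already been computed by some rule; the function name and the numeric updates below are made up for illustration.

```python
from itertools import accumulate


def center_trajectory(m0, updates):
    """Apply M_{n+1} = M_n + update(n) for each update, starting from m0.

    Returns the list [M_0, M_1, ..., M_N] of output-frame centers.
    """
    def add(center, update):
        return (center[0] + update[0], center[1] + update[1])

    return list(accumulate(updates, add, initial=m0))


# Hypothetical corrections for two consecutive frames.
trajectory = center_trajectory((160, 120), [(3, -2), (1, 0)])
# trajectory == [(160, 120), (163, 118), (164, 118)]
```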

FIG. 7 illustrates, as a preferred embodiment of the present invention, a comparison between correction according to the present invention and correction according to a conventional method when shake occurs during character recognition in an input image.

The figure shows the 2nd input frame 701, the 16th input frame 702, the 26th input frame 703, and the 38th input frame 704 of the input image 700. The markers (701a, 702a, 703a, 704a) indicating the center points of the input frames 701 to 704 stay at the center of the word "have", but there is slight movement between the input frames.

When the image photographing apparatus moves slightly, in the output image 720 corrected by the existing method, the center points (721a, 722a, 723a, 724a) of the 2nd output frame 721, the 16th output frame 722, the 26th output frame 723, and the 38th output frame 724 all fall outside the word "have" and lie in the empty space between words. This shows that the existing method is inefficient at compensating for fine movement in images of text captured by the image photographing apparatus.

In the output image 710, obtained by correcting the input image 700 with the shake correction method of the present invention, the center points (711a, 712a, 713a, 714a) of the 2nd output frame 711, the 16th output frame 712, the 26th output frame 713, and the 38th output frame 714 all lie at the center of the word "have", so the differences between the centers of the frames 711 to 714 are practically indistinguishable. That is, even when shake occurs while the photographing apparatus is recognizing characters in the input image, the shake is compensated stably.

FIGS. 8 and 9 are flowcharts, each according to a preferred embodiment of the present invention, of correcting shake that occurs during character recognition in an input image.

FIG. 8 is a flowchart, according to a preferred embodiment of the present invention, for the case in which shake occurs during character recognition in an input image.

The output frame setting unit sets a part of the input frame received from the photographing apparatus as the output frame (S810). The target word determination unit determines the target word using the center coordinates of the output frame set by the output frame setting unit and the position of each word recognized in the output frame (S820). In a preferred embodiment of the present invention, the position of each word in the output frame can be predicted or recognized by applying an OCR technique in real time; in addition, various other word detection techniques may be used to detect words, letters, characters, and the like in the image.

The correction unit determines a correction amount inversely proportional to the distance between the center coordinates of the n-th output frame and the center coordinates of the target word (S830), and moves the set output frame by the correction amount (S840). The display unit crops the moved output frame from the input frame and displays it (S850).
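Steps S810 to S850 can be sketched end to end as below. This is an illustration under several assumptions rather than the claimed implementation: the input frame is a plain 2D list, word boxes are `(x, y, w, h)` tuples in input-frame coordinates supplied by an OCR step that is not shown, the target word is chosen as the nearest word center, and the constant `k`, the overshoot clamp, and the clamping of the window to the frame bounds are all hypothetical choices.

```python
import math


def stabilize_frame(frame, window, words, k=50.0):
    """Return (cropped output frame, moved window).

    frame  : 2D list of pixels (rows x cols), the input frame
    window : (left, top, width, height) output frame set in S810
    words  : list of (x, y, w, h) word boxes in input-frame coordinates
    """
    left, top, width, height = window
    cx, cy = left + width / 2.0, top + height / 2.0

    if words:
        # S820: target word = word whose center is nearest the frame center.
        centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in words]
        tx, ty = min(centers, key=lambda c: math.dist(c, (cx, cy)))
        dist = math.dist((tx, ty), (cx, cy))
        if dist > 0:
            # S830: amount inversely proportional to the distance,
            # clamped so the window never overshoots the word center.
            amount = min(k / dist, dist)
            # S840: move the window toward the target word.
            left += amount * (tx - cx) / dist
            top += amount * (ty - cy) / dist

    # Keep the moved window inside the input frame.
    rows, cols = len(frame), len(frame[0])
    left = int(round(min(max(left, 0), cols - width)))
    top = int(round(min(max(top, 0), rows - height)))

    # S850: crop the moved output frame from the input frame.
    cropped = [row[left:left + width] for row in frame[top:top + height]]
    return cropped, (left, top, width, height)


# Toy 20 x 30 frame whose "pixels" record their own (row, col) position.
frame = [[(r, c) for c in range(30)] for r in range(20)]
output, moved = stabilize_frame(frame, (5, 5, 10, 10), [(12, 9, 4, 2)])
```

In the toy call, the window center starts 4 pixels to the left of the word center, so the window slides right until the two centers coincide; a smaller `k` would move it only part of the way per frame.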

FIG. 9 is a flowchart, according to a preferred embodiment of the present invention, for the case in which shake occurs during character recognition in an input image in an image photographing apparatus that uses a CMOS sensor.

The output frame setting unit sets a part of the input frame received from an image photographing apparatus equipped with a CMOS sensor as the output frame (S910). In a preferred embodiment of the present invention, when rolling shutter distortion occurs in the input frame because of the CMOS sensor, rolling shutter distortion correction is performed first (S920), and then the shake that occurred during character recognition in the input image is corrected (S930 to S960). For this shake correction, refer to steps S820 to S850 of FIG. 8.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

The embodiments of the present invention described above and illustrated in the drawings should not be construed as limiting the technical idea of the present invention. The scope of protection of the present invention is limited only by the matters recited in the claims, and a person of ordinary skill in the art can improve and modify the technical idea of the present invention in various forms. Such improvements and modifications therefore fall within the scope of protection of the present invention insofar as they are obvious to a person of ordinary skill in the art.

Claims

Setting a part of an input frame input from an image photographing apparatus as an output frame;
Determining a target word in the output frame using the center coordinates of the output frame and the position of each word predicted by applying an OCR technique to the output frame; and
Determining a correction amount based on at least one of the center coordinates of the output frame, the position of the target word, and the area of the target word,
wherein the correction amount is determined to decrease as the distance between the center coordinates of the output frame and the center coordinates of the target word increases.

The method of claim 1, wherein, in the determining of the target word,
the target word is the word having the minimum distance between the center coordinate M(n) of the n-th output frame and the center coordinate of each word in the output frame, where the position of each word is Rect(k) = (x(k), y(k), w(k), h(k)), (x(k), y(k)) is the start point of each word, w(k) is the horizontal length of each word, h(k) is the vertical length of each word, and the center coordinate of each word is expressed as

in the shake correction method for character recognition in an image.

The method of claim 1, wherein, in the determining of the target word,
when the center coordinate M(n) of the n-th output frame falls within the coordinate range of any word recognized in the n-th output frame, that word is determined as the target word, the coordinate range of the word running horizontally from (x(k), y(k)) to (x(k)+w(k), y(k)) and vertically from (x(k), y(k)) to (x(k), y(k)+h(k)).

The method of claim 1, wherein, in the determining of the correction amount,
the correction amount is determined based on the distance between the center coordinates of the output frame and the center coordinates of the target word.

delete

The method of claim 1,
wherein an n-th marker M_n is superimposed and displayed at the center of the n-th output frame.

The method of claim 6,
wherein, when the position of the (n-1)-th marker M_{n-1} displayed at the center of the (n-1)-th output frame lies inside any word in the (n-1)-th output frame, that word is determined as the target word of the n-th output frame.

The method of claim 6,
wherein, when the position of the (n-1)-th marker M_{n-1} displayed at the center of the (n-1)-th output frame lies in the empty space between the words recognized in the (n-1)-th output frame, the target word is determined based on the distance between the position of the (n-1)-th marker M_{n-1} and the center of each word recognized in the (n-1)-th output frame.

The method of claim 8,
wherein the word having the minimum distance between the position of the (n-1)-th marker M_{n-1} and the center of each word recognized in the (n-1)-th output frame is determined as the target word of the n-th output frame.

Setting a part of an input frame input from an image photographing apparatus as an output frame;
Determining a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame;
Determining a correction amount inversely proportional to the distance between the center coordinates of the output frame and the center coordinates of the target word;
Moving the set output frame by the correction amount; and
Displaying the moved output frame by cropping it from the input frame.

The method of claim 10,
wherein a marker M' is overlaid and displayed on the center point of the moved output frame.

Setting a part of an input frame input from an image photographing apparatus equipped with a CMOS sensor as an output frame;
Performing rolling shutter distortion correction when rolling shutter distortion occurs in the input frame;
Determining a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame;
Determining a correction amount inversely proportional to the distance between the center coordinates of the output frame and the center coordinates of the target word;
Moving the set output frame by the correction amount; and
Displaying the moved output frame by cropping it from the input frame.

An output frame setting unit that sets a part of an input frame input from an image photographing apparatus as an output frame;
A target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word predicted by applying an OCR technique to the output frame; and
A correction unit that determines a correction amount based on at least one of the center coordinates of the output frame, the position of the target word, and the area of the target word,
wherein the correction amount is determined to decrease as the distance between the center coordinates of the output frame and the center coordinates of the target word increases.

An output frame setting unit that sets a part of an input frame input from an image photographing apparatus as an output frame;
A target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame;
A correction unit that determines a correction amount inversely proportional to the distance between the center coordinates of the output frame and the center coordinates of the target word, and moves the set output frame by the correction amount; and
A display unit that crops the moved output frame from the input frame and displays it.

An output frame setting unit that sets a part of an input frame input from an image photographing apparatus equipped with a CMOS sensor as an output frame;
A distortion correction unit that performs rolling shutter distortion correction when rolling shutter distortion occurs in the input frame;
A target word determination unit that determines a target word in the output frame using the center coordinates of the output frame and the position of each word recognized in the output frame;
A correction unit that determines a correction amount inversely proportional to the distance between the center coordinates of the output frame and the center coordinates of the target word, and moves the set output frame by the correction amount; and
A display unit that crops the moved output frame from the input frame and displays it.

A computer-readable recording medium having recorded thereon a program for performing the method according to any one of claims 1 to 4.