KR101126466B1

KR101126466B1 - Photographic document imaging system

Info

Publication number: KR101126466B1
Application number: KR1020077006738A
Authority: KR
Inventors: 에드워드 피. 주니어. 히네이; 재커리 앙드레; 재커라이어 클레그; 제임스 다피니언; 커트 라펠즈; 윌리암 제이. 아담스; 재커리 비. 도즈
Original assignee: 컴퓨링크 매니지먼트 센터, 인크.
Priority date: 2004-08-26
Filing date: 2005-07-29
Publication date: 2012-04-23
Also published as: US20090238485A1; CN101048783A; IL181451A0; KR20070046946A; ZA200701630B; MX2007002237A; US7835589B2; BRPI0514674A; US20090237531A1; EP1787239A2; AU2005280513A1; US20060045379A1; IL181451A; WO2006026005A3; CN101048783B; US7593595B2; WO2006026005A2; US8204339B2; EP1787239A4

Abstract

Apparatus and methods are provided for processing a captured image and, more specifically, for processing a captured image that includes a document. In one embodiment, an apparatus comprising a camera for capturing a document is described. In another embodiment, a method of processing a captured image that includes a document is described, the method comprising distinguishing the imaged document from its background, adjusting the captured image to reduce distortion created by the use of a camera, and properly orienting the document.

Graphical information, pixel intensity, perspective distortion, outlier removal, distortion removal

Description

Photographic document imaging system {PHOTOGRAPHIC DOCUMENT IMAGING SYSTEM}

The present invention relates to an apparatus and method for processing a captured image and, more particularly, to an apparatus and method for processing a captured image that includes a document.

FIG. 1A is a block diagram illustrating typical components of a scanner. A scanner is typically used to capture an image of a document 110. The document 110 is placed on the scanner plate 112. A scan head 120, which generally comprises an optical subsystem 122 and a charge-coupled device (CCD) 124, is moved across the document 110. Although FIG. 1A shows only a two-dimensional view, the scan head 120 may move across the document both in the direction indicated by arrow 114 and in a direction perpendicular to the document 110. The optical subsystem 122 focuses light reflected from the document 110 onto the CCD 124. The CCD 124 is often implemented as a two-dimensional array of photosensitive capacitive elements. When light is incident on the photosensitive elements of the CCD 124, charge is trapped in the depletion region of the semiconductor elements. The amount of charge associated with each photosensitive capacitive element is related to the intensity of the light incident on that element during the sampling period. Thus, the image is captured by sampling the elements to determine the intensity of the incident light at each one. The analog information generated by the photosensitive capacitive elements is converted into digital information by an analog-to-digital (A/D) converter 130. The A/D converter 130 may convert the analog information received from the CCD 124 in either a serial or a parallel manner. The converted digital information may be stored in memory 140. The digital information is then processed by the processor 150 according to control software stored in ROM 180. The user may control scanning parameters through the user interface 170, and the scanned image is output through the output port 160.

A block diagram of a digital camera is shown in FIG. 1B. The optical subsystem 122 of a digital camera may be used, as in a scanner, to focus light reflected from the document 110 onto the CCD 124. In other digital cameras, devices other than CCDs, such as CMOS sensors, are used to capture the light reflected from the image. In a digital camera, in contrast to a scanner, the optical subsystem 122 does not move along the surface of the document. Rather, the optical system 122 is generally stationary with respect to the object being imaged, such as a document. In addition to digital cameras, photographs captured with film-based cameras may also be digitized.

Cameras offer significant advantages over scanners for capturing document images and other images. For example, cameras are generally more portable than scanners. In addition, because a scanner requires the item being captured to be placed on the scanner plate, cameras can capture a wider array of images than scanners. However, the use of a camera creates difficulties in image capture that do not exist when using a scanner. For example, lighting conditions vary when using a camera, whereas they are generally controlled in a scanner. In addition, the use of a camera introduces image distortions, which may depend on several variables, such as the angle of the camera relative to the image, the lens used by the camera and its distance from the image, whether the image including the document lies on a flat or a curved surface, and other factors. Because a scanner uses a moving scan head at a fixed distance from the document being imaged, these distortions generally do not occur in scanners.

Therefore, there is a need for an apparatus and method for capturing images of documents that exploits the advantages of cameras over scanners, yet reduces the difficulties presented by capturing document images with a camera as opposed to a scanner.

An apparatus and method for processing a captured image that includes an imaged document are described. In one embodiment, the apparatus includes a stationary camera that is used to capture the imaged document. In another embodiment, a non-stationary camera is used to capture the imaged document. In yet another embodiment, a method of processing a captured image that includes a document comprises distinguishing the imaged document from its background, adjusting the captured image to reduce distortion created by the use of a camera, and properly orienting the document.

FIG. 1A shows a conventional document scanner.

FIG. 1B shows a conventional digital camera.

FIG. 2 shows a general flowchart of a method of processing a captured image.

FIG. 3 shows a flowchart of another embodiment of a method of processing a captured image.

FIG. 4 shows a flowchart of a method of performing segmentation according to one implementation of the document imaging method disclosed herein.

FIG. 5 shows a flowchart of one method of performing the random sample consensus step shown in FIG. 4.

FIG. 6 shows a flowchart of one method of performing the outlier removal step shown in FIG. 4.

FIG. 7 shows a flowchart of another method of performing segmentation according to the document imaging method disclosed herein.

FIG. 8 shows a flowchart of one method of performing the distortion removal step shown in FIGS. 2 and 3.

FIG. 9 shows a flowchart of one method of performing the lines-of-text step shown in FIG. 3.

FIG. 10 shows a flowchart of one method of determining whether a document is properly oriented upright, according to one implementation of the document imaging method disclosed herein.

FIG. 11 shows one embodiment of an apparatus for capturing and processing an image that includes an imaged document.

FIG. 12 shows a flowchart of one method of determining whether a document is oriented upright, according to one implementation of the document imaging method disclosed herein.

FIG. 13 shows one embodiment of a system for processing a captured image.

The embodiments disclosed herein are operable to process an image, captured by a camera, that includes a document. They are operable to distinguish the captured document image from its background. After the captured document image has been isolated from its background, the embodiments are operable to reduce or remove distortions of the captured document image. After the distortion has been corrected, the embodiments are operable to rotate the captured document image to its proper orientation. Additionally, the embodiments disclosed herein provide the user with an evaluation of how successfully each step was carried out in the various embodiments of the invention.

FIG. 2 shows a general flowchart of a method of processing a captured image. After start 210, an image is received 220. The image may be received from various sources. For example, in one embodiment, the image may be received from a digital camera. In another embodiment, the image may be received from a stationary unit that includes a digital camera. In yet another embodiment, the image may be received from a film photograph that has been digitized. The received image 220 includes a document image. Step 230 operates to distinguish the captured document image from the remainder of the image, or background. Step 230 is referred to as segmentation. This step 230 may operate to detect the edges of the captured document image. It may also operate to crop the background of the image away from the captured document image so as to separate the document from its background. Step 240, referred to as distortion removal, operates to reduce or remove distortions of the captured document image. Some of the distortions this step 240 may correct are perspective distortion, lens distortion, and light distortion. Other distortions may also be corrected in this step 240. Step 250 operates to correct the orientation of the document. This step 250 may determine whether the captured document image should be in portrait or landscape orientation and rotate it accordingly. It may also determine whether the captured document image is upside down and rotate it accordingly. In step 260, the processed document image is output. The processed document image may be output 260 through various means, such as displaying it on a monitor, saving it to a computer file, transmitting it electronically, or printing it.

FIG. 3 shows a flowchart 300 of another embodiment of a method of processing a captured image. After start 305, the image is received 310. In step 315, the received image is converted into a device-independent bitmap. In step 320, segmentation is performed using an edge-based segmentation process. The edge-based segmentation process 320 identifies the edges of the captured document image in order to distinguish it from its background.

FIG. 4 shows a flowchart of one embodiment of edge-based segmentation 320. In this embodiment, horizontal and vertical edge points are located. This is done by searching for edge points. Edge points are determined by identifying portions of the received image that contain a transition from the background portion of the received image to the document portion. In one embodiment, the received image is scanned beginning at its center 410 and also beginning at its borders 420. In one embodiment, the document image is assumed to occupy the center of the received image. In another embodiment, the non-text portion of the captured document image is assumed to have a higher pixel intensity than its background. In the scan beginning at the center of the received image 410, after an area that can be identified as document pixels is found, the transition to background pixels is searched for along the scan. In the scan beginning at the border of the received image 420, an area is identified as background pixels, and the transition to document image pixels is identified. The process may be performed using either or both of these scans 410, 420. In one embodiment, the received image is scanned 410, 420 in both the horizontal and vertical directions.
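The center-outward scan 410 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a grayscale image stored as a list of rows, a fixed `threshold` separating bright document pixels from darker background, and it ignores dark text pixels inside the document. The function names are hypothetical.

```python
def scan_row_from_center(row, threshold):
    """Walk outward from the middle of one row and return the (left, right)
    columns of the last document-like pixels, or None if the centre pixel
    is not document-like.

    Assumption: document pixels are brighter than `threshold`.
    """
    center = len(row) // 2
    if row[center] <= threshold:
        return None  # centre is not on the document for this row
    left = center
    while left > 0 and row[left - 1] > threshold:
        left -= 1
    right = center
    while right < len(row) - 1 and row[right + 1] > threshold:
        right += 1
    return left, right


def horizontal_edge_points(image, threshold):
    """Collect (x, y) edge points from a horizontal scan of every row."""
    points = []
    for y, row in enumerate(image):
        hit = scan_row_from_center(row, threshold)
        if hit is not None:
            left, right = hit
            points.append((left, y))
            points.append((right, y))
    return points
```

A vertical scan would apply the same walk to each column; the border-inward scan 420 would start at the first and last pixels and look for the first bright pixel instead.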

A random sample consensus step 430 is then performed. FIG. 5 shows one embodiment of the random sample consensus step. In this embodiment, random sample consensus 430 is performed by selecting two points at random 510 from the edge points selected in steps 410 and 420. The line connecting these two randomly selected points is then calculated 520. In one embodiment, angle-distance coordinates are used, in which the angle value corresponds to the angle of the line segment about the center of the received image and the distance value corresponds to the distance from the center of the received image to the nearest point on the line segment. In other embodiments, other coordinate systems may be used, including, for example, Cartesian or polar coordinates. These values are then stored. The process of selecting two random points from the edge points obtained in steps 410 and 420 is repeated to obtain a sufficient sample group 530. In one embodiment, this process is repeated five thousand times, although different sample sizes may be used. After sampling, the pairs of points that all lie on the same line are grouped into bins. If the initial edge points selected in steps 410 and 420 accurately represent the edges of the document in the received image, approximately one quarter of the points will fall into four small ranges corresponding to the four document edges, while the remaining points will be spread roughly uniformly over the rest of the possible coordinates. The four sets of grouped line segments that contain the most segments 540 and that meet a minimum threshold of grouped line segments are identified as representing the four edges of the document in the received image 550. In one embodiment, these sets of line segments are then determined to be the left, right, top, and bottom edges according to their relative positions in the received image.
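The sampling and binning described above can be sketched as follows. The sketch makes several assumptions that the text does not fix: the angle-distance pair is taken as the direction of the line's normal (in degrees) and the perpendicular distance from the image center, bins are fixed-size cells of `theta_step` degrees by `rho_step` pixels (with `theta_step` dividing 360), and `min_votes` stands in for the minimum threshold. All names are hypothetical.

```python
import math
import random
from collections import Counter


def angle_distance(p1, p2, center):
    """Angle-distance form of the line through p1 and p2, relative to center.

    theta: direction of the line's unit normal, in degrees in [0, 360).
    rho:   non-negative distance from center to the line.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    nx, ny = dy / length, -dx / length
    rho = nx * (x1 - center[0]) + ny * (y1 - center[1])
    if rho < 0:  # flip the normal so that rho is never negative
        nx, ny, rho = -nx, -ny, -rho
    theta = math.degrees(math.atan2(ny, nx)) % 360.0
    return theta, rho


def dominant_lines(points, center, samples=5000, theta_step=2.0,
                   rho_step=2.0, min_votes=50, seed=0):
    """Sample point pairs, bin their lines, and return up to four
    (theta, rho) bin centres that collected at least min_votes pairs."""
    rng = random.Random(seed)
    theta_bins = int(round(360.0 / theta_step))
    votes = Counter()
    for _ in range(samples):
        p1, p2 = rng.sample(points, 2)
        if p1 == p2:
            continue  # duplicate edge point, no line defined
        theta, rho = angle_distance(p1, p2, center)
        key = (int(round(theta / theta_step)) % theta_bins,
               int(round(rho / rho_step)))
        votes[key] += 1
    return [(t * theta_step, r * rho_step)
            for (t, r), n in votes.most_common(4) if n >= min_votes]
```

Fixed-size bins are the simplest grouping choice; a line that falls near a bin boundary splits its votes between neighbors, so a real system would likely merge adjacent bins or smooth the vote table.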

After random sample consensus 430 is performed, in one embodiment an outlier removal step 440 is performed on the collection of edge points to further refine the identification of the document edges. In one embodiment, shown in FIG. 6, this is done by performing a linear regression on the collection of edge points corresponding to one of the edges of the received document image. In the linear regression technique, a line is drawn that attempts to fit the collection of edge points as accurately as possible 610. If the point farthest from this regression line is determined to be sufficiently far from it 620, the point is removed 630 and a new linear regression is performed. This process is repeated until the point farthest from the regression line is within a threshold value, at which point the resulting regression line is determined to be the edge line. This is performed on each of the four collections of edge points representing the four edges of the received document image.
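The fit, test, remove loop of FIG. 6 can be sketched with an ordinary least-squares fit. The sketch assumes each point is given as `(t, v)` with `t` the coordinate running along the edge (so a near-vertical edge would pass `(y, x)` pairs), and `max_distance` is a hypothetical name for the threshold.

```python
import math


def fit_line(points):
    """Least-squares fit v = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def remove_outliers(points, max_distance):
    """Refit repeatedly, dropping the farthest point each time, until the
    farthest remaining point lies within max_distance of the line."""
    kept = list(points)
    while len(kept) > 2:
        slope, intercept = fit_line(kept)
        norm = math.hypot(slope, 1.0)  # converts residual to perpendicular distance
        dists = [abs(v - (slope * t + intercept)) / norm for t, v in kept]
        worst = max(range(len(kept)), key=dists.__getitem__)
        if dists[worst] <= max_distance:
            break
        del kept[worst]
    return kept, fit_line(kept)
```

For example, ten points on `v = 2t + 1` plus a stray point at `(5, 40)` lose only the stray point, and the final fit recovers the original slope and intercept.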

Referring back to FIG. 3, in step 325 a measure of the accuracy of the edge line identification from edge-based segmentation 320 is calculated. This step 325 may be referred to as the confidence calculation. In one embodiment, a confidence is calculated for each edge of the received document image, and the lowest value is taken as the overall confidence. In another embodiment, the highest confidence value among the edge lines is taken as the overall confidence. In yet another embodiment, a combination of the edge line confidences, such as their average, is used to determine the overall confidence. One embodiment for calculating the confidence of a particular edge line is to calculate the ratio between the number of pixel points remaining in that edge's collection after outlier removal 440 and the total number of pixel points that could have been found on that edge. The confidence determination may be used to improve the distortion removal 240, 350 of the received document image, and may also be used to inform the user of the accuracy of the system's performance for a particular received image. In step 330, if the confidence in the edge-based segmentation step 320 is not sufficiently high, the content-based segmentation of step 335 is performed.
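The confidence calculation and the fallback decision of step 330 reduce to a few lines. In this sketch the `mode` argument selects among the three combinations the text mentions, and `min_confidence` is a hypothetical cutoff, since the text does not state one.

```python
def edge_confidence(points_kept, points_possible):
    """Ratio of edge points surviving outlier removal to the number of
    points that could have been found on that edge."""
    return points_kept / points_possible if points_possible else 0.0


def overall_confidence(per_edge, mode="min"):
    """Combine per-edge confidences: lowest, highest, or average."""
    if mode == "min":
        return min(per_edge)
    if mode == "max":
        return max(per_edge)
    if mode == "mean":
        return sum(per_edge) / len(per_edge)
    raise ValueError("unknown mode: " + mode)


def needs_content_based_segmentation(per_edge, min_confidence=0.7, mode="min"):
    """True when edge-based segmentation is not trusted (step 330)."""
    return overall_confidence(per_edge, mode) < min_confidence
```

Taking the minimum is the most conservative choice: one badly detected edge is enough to trigger the content-based fallback.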

Content-based segmentation step 335, one embodiment of which is shown in FIG. 7, identifies the text of the captured document image and calculates the edges of the captured document image in relation to the text. This is accomplished by identifying connected components in the received document image 710 and finding the nearest neighbor of each of those components 720. A connected component generally refers to a group of black or dark pixels that are adjacent to one another. These adjacent pixels are then connected into lines 730, which are in turn used to determine the border of the text 740. From these borders, a margin is added 750 to identify the location of the edges of the received document image. Although the size of the margin may vary, in one embodiment a standard margin is added in step 750.
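A much-simplified sketch of this idea follows. It finds connected groups of dark pixels (4-connectivity, an assumption) and then skips the nearest-neighbor and line-forming steps 720 to 740, approximating the text border by the union of the component bounding boxes before adding the margin of step 750. The `dark_below` cutoff and all function names are hypothetical.

```python
from collections import deque


def connected_components(image, dark_below=128):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected groups of dark pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] >= dark_below:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and image[ny][nx] < dark_below):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def document_bounds(image, margin, dark_below=128):
    """Text bounding box grown by `margin` pixels and clipped to the image."""
    boxes = connected_components(image, dark_below)
    if not boxes:
        return None
    h, w = len(image), len(image[0])
    return (max(0, min(b[0] for b in boxes) - margin),
            max(0, min(b[1] for b in boxes) - margin),
            min(w - 1, max(b[2] for b in boxes) + margin),
            min(h - 1, max(b[3] for b in boxes) + margin))
```

Because this shortcut yields only an axis-aligned box, it cannot recover a rotated or perspective-distorted page outline; linking components into text lines, as the text describes, is what would provide that information.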

In step 340, the corners of the captured document image are calculated. In one embodiment, the corners may be calculated from the intersections of the edge lines.
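Intersecting two edge lines is a 2x2 linear solve. The sketch below assumes each edge line is given in the general form `a*x + b*y = c` as a tuple `(a, b, c)`; that representation and the function names are choices made for the example.

```python
def intersect(line1, line2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 (Cramer's rule).
    Returns None when the lines are parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def document_corners(left, right, top, bottom):
    """Four corners from the four edge lines."""
    return {
        "top_left": intersect(top, left),
        "top_right": intersect(top, right),
        "bottom_right": intersect(bottom, right),
        "bottom_left": intersect(bottom, left),
    }
```

A line in angle-distance form converts to this representation as `(cos(theta), sin(theta), rho)` when the coordinates are measured from the same origin.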

The distortion removal 240, 350 step may involve a variety of adjustments to the received image. In one embodiment, distortion removal 240, 350 adjusts the received document image to correct for perspective distortion in the received image. For example, when the photograph is not taken from directly above the document and centered on it, the received document image will exhibit perspective distortion.

One embodiment for adjusting the image to correct for perspective distortion is shown in FIG. 8. This embodiment involves mapping 810 a set of image coordinates, for example (x, y), to a new set of image coordinates, for example (u, v). After the segmentation step 230, 320, 335, the four corners of the document are determined 340. Typically, in a document containing perspective distortion, these four corners will correspond to a trapezoid, whereas a document should generally have the shape of a rectangle. Thus, in one embodiment, the mapping 810 is performed between the received trapezoid and the desired rectangle. One embodiment for accomplishing this mapping 810 is to use a homogeneous transformation between the distorted and non-distorted pixel coordinates, by way of a homogeneous matrix, known in the art, that represents the transformation from distorted pixel coordinates to non-distorted pixel coordinates. The transformation can be calculated by comparing the four corners determined during segmentation 230, 320, 335 with the corrected dimensions of the non-distorted document image. In one embodiment, the need to calculate the transformation at each pixel point can be avoided by calculating the transformation only once per line and using linear interpolation to calculate the new pixel coordinates. After the new coordinates corresponding to a document with reduced perspective distortion have been mapped, the pixels are resampled 815.
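The homogeneous transformation can be recovered from the four corner correspondences by solving eight linear equations. The sketch below is one standard way to do this and is not taken from the source: it fixes the bottom-right matrix entry to 1, uses plain Gauss-Jordan elimination, and omits resampling 815 and the per-line interpolation shortcut. Function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix mapping four distorted corners (x, y) in src to four
    target corners (u, v) in dst, with the last entry fixed to 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    """Map (x, y) to (u, v), dividing by the homogeneous coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Degenerate input, such as three collinear corners, makes the system singular and raises `ZeroDivisionError` here; a production version would check for that before solving.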

Another aspect of the received image that may be adjusted in the distortion removal 240, 350 step is distortion caused by the camera lens 820. Distortion caused by the camera lens can make otherwise straight lines appear curved. This distortion depends on the particular lens used and on the distance of the camera from the captured image. Because the curvature created by lens distortion is generally radial, a uniform radial adjustment for the lens distortion can be performed using a parameter that approximates the degree of lens distortion. This parameter may be either calculated by the system or entered by the user.
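A single-parameter radial adjustment might look like the sketch below. The source does not name a distortion model, so the sketch assumes the common one-coefficient polynomial form, in which a point at radius r from the image center is moved to radius r(1 + k r²); `k` plays the role of the parameter that approximates the degree of distortion.

```python
def undistort_point(x, y, center, k):
    """Move (x, y) radially about center by the factor 1 + k * r**2.

    k = 0 leaves the point unchanged; the sign of k selects whether
    points are pushed outward or pulled inward.
    """
    dx, dy = x - center[0], y - center[1]
    scale = 1.0 + k * (dx * dx + dy * dy)
    return center[0] + dx * scale, center[1] + dy * scale
```

Because the correction grows with the square of the radius, pixels near the center barely move while pixels near the corners move the most, which matches the way lens curvature is strongest at the image periphery.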

Yet another aspect of the received image that may be adjusted in the distortion removal 240, 350 step is distortion caused by a document that is not entirely flat. For example, if the imaged document is a page in a book, the page may have a curvature that creates distortion when captured photographically. This distortion may also be corrected in the distortion removal step 240, 350. Other distortions may also be corrected, and the description of particular types of distortion herein is not intended to limit the types of distortion that may be reduced or removed.

In step 365, a thresholding process is performed on the image created in step 360. The thresholding process 365 reduces the color depth of the image and has the potential advantage of reducing the distortion created by a flash that may be used when photographing the image. In one embodiment, the thresholding process 365 reduces a 24-bit color image to a 1-bit black-and-white image. The potential benefits of reducing the image to black and white are a reduction of the effects introduced by the camera's flash and a reduction in the amount of information the system 300 must process. Thresholding 365 can be performed in a number of ways. One embodiment may use a dithering technique, which is known in the art. Examples of dithering techniques can be found in existing image software, such as the SNOWBOUND IMAGE LIBRARY by Snowbound Software Corp. One disadvantage of dithering, however, is the introduction of noise into the image. Another embodiment of thresholding 365 involves selecting a global threshold for the image. In such a technique, a threshold value is selected; pixels with an intensity greater than the threshold are deemed white, and the remaining pixels are deemed black. The threshold value may be selected in a number of ways. In one embodiment, a threshold value is selected once and applied to all received images. This technique has the disadvantage of not accounting for varied lighting conditions in the received images. In another embodiment, the threshold value is calculated from an analysis of the received image, such as its histogram. In one such embodiment, the assumption is made that the received image has two peaks in its intensity histogram, corresponding to the foreground and background of the received document image. This embodiment may not perform well for images to which that assumption does not apply. Another embodiment of thresholding 365 is to select a separate threshold value for each pixel in the received image. This embodiment has the advantage of responding to changing conditions within the document, such as lighting changes or background contrast. One embodiment of this technique is referred to as adaptive thresholding. In this embodiment, the previous pixel values are considered as each new pixel is analyzed to determine its threshold value. One way to accomplish this is to calculate a weighted average of the pixel values as each successive pixel of the received image is analyzed. One potential drawback of this embodiment is the introduction of noise if the received image contains a colored document.
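The difference between a global threshold and a running-average adaptive threshold can be sketched as follows. Both functions assume a grayscale image as a list of rows and output 1 for white and 0 for black. The adaptive version is one possible reading of the weighted-average idea: it keeps an exponentially weighted average along each row and marks a pixel black when it falls more than `margin` below that average. The `weight` and `margin` parameters are assumptions, not values from the source.

```python
def global_threshold(image, threshold=128):
    """Pixels brighter than threshold become white (1), the rest black (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def adaptive_threshold(image, weight=0.25, margin=0.25):
    """Compare each pixel with a running weighted average of the pixels
    that precede it in its row."""
    out = []
    for row in image:
        avg = float(row[0])  # start the average at the first pixel
        bits = []
        for p in row:
            bits.append(1 if p > avg * (1.0 - margin) else 0)
            avg = (1.0 - weight) * avg + weight * p  # fold in the new pixel
        out.append(bits)
    return out
```

On a row whose background fades from 200 to 120 with two dark marks, such as `[200, 190, 180, 60, 170, 160, 150, 40, 140, 130, 120]`, the global threshold at 128 turns the final dim background pixel black, while the adaptive version keeps it white and still detects both marks.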

In step 370, a lines-of-text step is performed. In this step 370, the system determines the lines of text in the received document image. FIG. 9 illustrates one embodiment of lines of text 370. In one embodiment, the system assumes that the pixels corresponding to text in the received document image have a lower intensity than the background pixels of the received document image. In this embodiment, the sum of the intensities of all pixels within each row of the received document image is calculated (910). These sums are then used to identify local peaks and valleys in pixel intensity. The peaks and valleys are then analyzed to determine the lines of text in the document. For example, if the received document image has black lines of text on a white background, rows of pixels that are entirely white will have the highest total intensity, and rows containing black text will have a substantially lower total intensity. The differences between these intensities can then be calculated, and the lines of text thereby determined. In a preferred embodiment, lines of text 370 is performed both horizontally and vertically across the received document image.
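By way of non-limiting illustration, the row-sum embodiment may be sketched in Python as follows. The function name find_text_lines and the cutoff ratio of 0.9 are assumptions made for this sketch; rather than a full analysis of local peaks and valleys, the sketch simply treats any run of rows whose sum falls below a fraction of the brightest row sum as one line of text.

```python
from typing import List, Tuple


def find_text_lines(image: List[List[int]], ratio: float = 0.9) -> List[Tuple[int, int]]:
    """Return inclusive (first_row, last_row) spans that appear to contain text.

    A row is treated as text when the sum of its pixel intensities is below
    `ratio` times the largest row sum (dark text on a light background).
    """
    sums = [sum(row) for row in image]  # one intensity sum per row (910)
    if not sums:
        return []
    cutoff = ratio * max(sums)
    lines, start = [], None
    for y, s in enumerate(sums):
        if s < cutoff and start is None:
            start = y                      # entering a valley: a text line begins
        elif s >= cutoff and start is not None:
            lines.append((start, y - 1))   # back at a peak: the text line ends
            start = None
    if start is not None:
        lines.append((start, len(sums) - 1))
    return lines


W = 255
page = [
    [W, W, W, W],
    [W, 0, 0, W],
    [0, 0, W, W],
    [W, W, W, W],
    [W, 0, W, 0],
    [W, W, W, W],
    [W, W, W, W],
]
print(find_text_lines(page))
```

For the vertical pass, the same function may be applied to the transposed image, for example `find_text_lines([list(col) for col in zip(*page)])`.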

Another embodiment of lines of text 370 performs a text line search similar to that performed in step 335. In one such embodiment, the text of the captured document image is identified and formed into lines. This may be accomplished by identifying the connected components in the captured document image and finding the nearest neighbor of each component. A connected component generally refers to a group of black or darker pixels that are adjacent to one another. Those adjacent pixels are then connected into lines. This process is similar to that described in steps 710, 720, and 730 of FIG. 7.
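By way of non-limiting illustration, the identification of connected components and their nearest neighbors may be sketched in Python as follows. The function names, the use of 8-connectivity, the bounding-box representation, and the center-to-center distance measure are all assumptions made for this sketch and are not taken from the specification.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(binary: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected groups of dark pixels (0 = dark, 1 = white)."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill over adjacent dark pixels
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and binary[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def nearest_neighbor(boxes: List[Box], i: int) -> Optional[int]:
    """Index of the box whose center is closest to the center of box `i`."""
    cy, cx = (boxes[i][0] + boxes[i][2]) / 2, (boxes[i][1] + boxes[i][3]) / 2
    best, best_d = None, float("inf")
    for j, b in enumerate(boxes):
        if j == i:
            continue
        d = ((b[0] + b[2]) / 2 - cy) ** 2 + ((b[1] + b[3]) / 2 - cx) ** 2
        if d < best_d:
            best, best_d = j, d
    return best
```

Chaining each component to its nearest neighbor in this way yields candidate lines of text; a practical implementation would likely also constrain the search direction, which this sketch omits.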

Step 375 determines whether the captured document image should be in landscape or portrait format. In one embodiment, this is accomplished by comparing the result of lines of text 370 in the vertical direction with the result of lines of text 370 in the horizontal direction. In one embodiment, the direction that yields the greater number of lines is determined to define the orientation of the received document image. For example, in a received document image whose height is greater than its width, if lines of text 370 in the vertical direction yields a greater number of lines than lines of text 370 in the horizontal direction, the received document image is determined to have a landscape orientation. As another example, in the same received document image, if lines of text 370 in the horizontal direction yields a greater number of lines than lines of text 370 in the vertical direction, the received document image is determined to have a portrait orientation.
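By way of non-limiting illustration, the comparison of step 375 may be sketched in Python as follows. The function name page_format and its parameters are assumptions made for this sketch. The specification gives examples only for an image that is taller than it is wide; the mirrored treatment of a wider-than-tall image, and the treatment of a tie as portrait, are assumptions added here.

```python
def page_format(width: int, height: int,
                horizontal_lines: int, vertical_lines: int) -> str:
    """Return "landscape" or "portrait" from the two line counts of step 370.

    `horizontal_lines` and `vertical_lines` are the numbers of text lines
    found by the horizontal and vertical passes, respectively.
    """
    vertical_wins = vertical_lines > horizontal_lines
    if height > width:
        # Case described in the specification.
        return "landscape" if vertical_wins else "portrait"
    # Assumption: a wider-than-tall image is handled as the mirror image.
    return "portrait" if vertical_wins else "landscape"


print(page_format(600, 800, horizontal_lines=12, vertical_lines=40))
print(page_format(600, 800, horizontal_lines=40, vertical_lines=12))
```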

Step 380 determines the upright orientation of the document. FIG. 10 illustrates one embodiment of determining (380) whether the received document image is properly oriented upright. In one embodiment, each line of text is analyzed. Fewer lines of text may be analyzed, but this may produce a less reliable result. In one embodiment, each line of text is divided into three sections (1010): an ascending section, a middle section, and a descending section. English-language characters have certain inherent statistical characteristics that may be used in certain embodiments to determine the upright orientation of the received document image. For example, the English alphabet has only five characters that descend below the bottom boundary of a sentence (i.e., g, j, p, q, and y) and many more characters that ascend above the top boundary of a sentence (e.g., b, d, f, h, i, and k). In one embodiment, this characteristic of English-language characters may be taken into account when calculating the respective number of pixels contained in the ascending section and the descending section (1020) and calculating those pixel densities (1030, 1040). For example, a received document image containing English-language characters that has more ascending-character pixels than descending-character pixels is likely upright and does not need to be rotated, whereas if the same document has more descending-character pixels than ascending-character pixels, the document likely needs to be rotated 180 degrees (1050).
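By way of non-limiting illustration, the comparison of ascending-section and descending-section pixels may be sketched in Python as follows. The function name needs_180_rotation is an assumption, as is the division of each text line into three sections of equal height; the specification does not state how the section boundaries are chosen. Because the two outer sections have the same area in this sketch, comparing pixel counts is equivalent to comparing pixel densities.

```python
from typing import List

LineImage = List[List[int]]  # one cropped text line; 0 = dark (text), 1 = white


def needs_180_rotation(text_lines: List[LineImage]) -> bool:
    """True when descending-section pixels outnumber ascending-section pixels."""
    ascending = descending = 0
    for line in text_lines:
        third = len(line) // 3  # assumed: three sections of equal height (1010)
        if third == 0:
            continue
        top = line[:third]                    # ascending section
        bottom = line[len(line) - third:]     # descending section
        ascending += sum(row.count(0) for row in top)      # dark pixels (1020)
        descending += sum(row.count(0) for row in bottom)
    return descending > ascending


upright = [
    [0, 1, 1, 1, 0],  # ascender strokes
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],  # middle section
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],  # a single descender stroke
    [1, 1, 1, 1, 1],
]
flipped = [row[::-1] for row in reversed(upright)]  # same line rotated 180 degrees

print(needs_180_rotation([upright]))
print(needs_180_rotation([flipped]))
```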

In other embodiments, other characteristics of English-language characters may also be considered. For example, characteristics of pixel location in the horizontal direction may be considered. Further, non-statistical methods, such as optical character recognition ("OCR"), may also be used to determine the upright orientation of a document. Another embodiment may use a neural network approach. In addition, similar inherent characteristics may be utilized for non-English documents. For example, Spanish-language characters are similar to those of English and will have similar inherent characteristics. As another example, Arabic-language characters include a greater number of descending characters, and embodiments may adjust for those characteristics accordingly.

FIG. 12 illustrates another embodiment for performing step 380 and determining whether the received document image is properly oriented upright. In one embodiment, connected components are used to determine each letter in a line of text. Each component is classified by height into one of two categories, small and large (1210). The center of the line of text is then determined (1220). In one embodiment, the heights of the small letters are used to determine the center of the line of text (1220). This may improve the estimate of the text-line center when the line is distorted, for example when it curves across the page. The large letters are then matched against the center of the line of text and grouped as ascending or descending based on their position relative to that center (1230). The total numbers of ascending and descending letters are then calculated. In a typical English-language document, the large letters will more often ascend toward the top of the page. Therefore, in one embodiment, if the number of ascending large letters is greater than the number of descending large letters, the document does not need to be rotated in step 385 before being output in step 390. If, however, the number of descending large letters is greater than the number of ascending large letters, the document is rotated in step 385 before being output in step 390.
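By way of non-limiting illustration, the classification and counting of FIG. 12 may be sketched in Python as follows. The function name, the representation of each component by its top and bottom rows, and the rule that a component is "large" when its height is at least 1.3 times the median height are assumptions made for this sketch; the specification does not state how small and large components are distinguished.

```python
from statistics import median
from typing import List, Tuple


def count_ascenders_descenders(boxes: List[Tuple[int, int]],
                               large_ratio: float = 1.3) -> Tuple[int, int]:
    """Return (ascending, descending) counts for the components of one text line.

    Each box is the inclusive (top_row, bottom_row) of a connected component,
    with row numbers increasing toward the bottom of the page.
    """
    if not boxes:
        return 0, 0
    heights = [b - t + 1 for t, b in boxes]
    cutoff = large_ratio * median(heights)  # assumed small/large boundary (1210)
    small = [box for box, h in zip(boxes, heights) if h < cutoff]
    large = [box for box, h in zip(boxes, heights) if h >= cutoff]
    if not small or not large:
        return 0, 0
    # Center of the text line estimated from the small letters only (1220).
    center = sum((t + b) / 2 for t, b in small) / len(small)
    # Large letters grouped by their position relative to that center (1230).
    ascending = sum(1 for t, b in large if (t + b) / 2 < center)
    descending = sum(1 for t, b in large if (t + b) / 2 > center)
    return ascending, descending


# Five small letters occupying rows 10-19, two ascenders and one descender.
line = [(10, 19), (3, 19), (10, 19), (3, 19), (10, 19), (10, 26), (10, 19), (10, 19)]
up, down = count_ascenders_descenders(line)
print(up, down, "rotate" if down > up else "keep")
```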

The image is then rotated in step 385 in accordance with the determinations of steps 380 and 375. The new document image is then output (390).

As described above, the documents imaged by the system may be captured with either a film camera or a digital camera. As an alternative to these freeform devices, a stationary camera system may be employed to capture the imaged documents. FIG. 11 illustrates an embodiment of a stationary camera system for capturing a document image. In this embodiment, the document 1110 is placed on the base 1120 of the system. In a preferred embodiment, the base 1120 of the system is of a predetermined color, which may have the advantage of facilitating the segmentation process described above. Extending from the base 1120 is a stand 1130, which may house a camera 1140 and lighting 1150. The camera and lighting may be permanently housed in the stand 1130, or they may be removable or adjustable. The lighting may be placed anywhere on the base 1120 or stand 1130. In another embodiment, no additional lighting is included on the base 1120 or stand 1130. In still another embodiment, the lighting is separate from the base 1120 or stand 1130. The stationary system is then coupled to a computer 1160 to perform the above-described processing of the received image document. In another embodiment, the computer may also be built into the apparatus. In still another embodiment, the captured image document may simply be stored in the digital camera 1140 or in another memory source and later coupled to a computer for processing. Such a stationary camera system may be placed, for example, as part of a user's workstation in an office.

There are several advantages to using a stationary camera system rather than a freeform camera. For example, with a stationary camera system, the amount of perspective distortion may be reduced, because the document is more likely to be perpendicular to, and centered with respect to, the camera lens. Another advantage may be that the system can better adjust for lens distortion, because the distance between the camera and the lens used is known, which reduces the need to calculate or approximate these parameters. Another potential advantage may be a reduction of the distortion created by a camera flash. In a preferred embodiment, the lighting 1150 of the stationary system may be positioned so as to reduce glare and other distortion created by a camera flash.

The approach described herein for processing a captured image is applicable to any type of processing application and, without limitation, is particularly well suited to computer-based applications for processing captured images. The approach described herein may be implemented in hardware circuitry, in computer software, or in a combination of hardware circuitry and computer software, and is not limited to a particular hardware or software implementation.

FIG. 13 is a block diagram that illustrates a computer system 1300 upon which an embodiment of the invention may be implemented. Computer system 1300 includes a bus 1345 or other communication mechanism for communicating information, and a processor 1335 coupled with bus 1345 for processing information. Computer system 1300 also includes a main memory 1320, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1345 for storing information and instructions to be executed by processor 1335. Main memory 1320 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 1335. Computer system 1300 further includes a read-only memory (ROM) 1325 or other static storage device coupled to bus 1345 for storing static information and instructions for processor 1335. A storage device 1330, such as a magnetic disk or optical disk, is provided and coupled to bus 1345 for storing information and instructions.

Computer system 1300 may be coupled via bus 1345 to a display 1305, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1310, including alphanumeric and other keys, is coupled to bus 1345 for communicating information and command selections to processor 1335. Another type of user input device is a cursor control 1315, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 1335 and for controlling cursor movement on display 1305. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allow the device to specify positions in a plane.

The methods described herein relate to the use of computer system 1300 for processing a captured image. According to one embodiment, the processing of the captured image is provided by computer system 1300 in response to processor 1335 executing one or more sequences of one or more instructions contained in main memory 1320. Such instructions may be read into main memory 1320 from another computer-readable medium, such as storage device 1330. Execution of the sequences of instructions contained in main memory 1320 causes processor 1335 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 1320. In alternative embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement the embodiments described herein. Thus, the embodiments described herein are not limited to any specific combination of hardware circuitry and software.

The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 1335 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 1330. Volatile media include dynamic memory, such as main memory 1320. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that make up bus 1345. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium; a CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; a RAM, a ROM, an EPROM, a FLASH-EPROM, or any other memory chip or cartridge; a carrier wave as described hereinafter; or any other medium from which a computer can read.

Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 1335 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1300 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to bus 1345 can receive the data carried in the infrared signal and place the data on bus 1345. Bus 1345 carries the data to main memory 1320, from which processor 1335 retrieves and executes the instructions. The instructions received by main memory 1320 may optionally be stored on storage device 1330 either before or after execution by processor 1335.

Computer system 1300 also includes a communication interface 1340 coupled to bus 1345. Communication interface 1340 provides a two-way data communication coupling to a network link 1375 that is connected to a local network 1355. For example, communication interface 1340 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1340 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1340 sends and receives electrical, electronic, or optical signals that carry digital data streams representing various types of information.

Network link 1375 typically provides data communication through one or more networks to other data services. For example, network link 1375 may provide a connection through local network 1355 to a host computer 1350 or to data equipment operated by an Internet Service Provider (ISP) 1365. ISP 1365 in turn provides data communication services through the worldwide packet data communication network commonly referred to as the "Internet" 1360. Local network 1355 and Internet 1360 both use electrical, electronic, or optical signals that carry digital data streams. The signals through the various networks, and the signals on network link 1375 and through communication interface 1340, which carry the digital data to and from computer system 1300, are exemplary forms of carrier waves transporting the information.

Computer system 1300 can send messages and receive data, including program code, through the network(s), network link 1375, and communication interface 1340. In the Internet example, a server 1370 might transmit requested code for an application program through Internet 1360, ISP 1365, local network 1355, and communication interface 1340. In accordance with the invention, one such downloaded application provides for the processing of captured images as described herein.

The received code may be executed by processor 1335 as it is received, and/or stored in storage device 1330 or other non-volatile storage for later execution. In this manner, computer system 1300 may obtain application code in the form of a carrier wave.

Claims

A method of processing a captured image that includes an imaged document, the method comprising:

Detecting graphical information of the captured image, related to edges of the imaged document;

Separating the imaged document from the background of the captured image based on the detected graphic information related to edges of the imaged document;

Calculating deviations of the imaged document from a non-distorted perspective of the imaged document;

Based on the calculated deviations, mapping pixels of the imaged document to a new image of the imaged document with reduced perspective distortion;

Resampling pixels of the new image;

Detecting graphical information of the new image of the imaged document, related to the orientation of the imaged document in the new image; and

Rotating the new image based on the graphical information related to the orientation of the imaged document in the new image.

The method of claim 1, wherein:

The detecting of graphical information in the captured image is performed by detecting graphical information in the captured image related to a transition between the imaged document and the rest of the captured image, and by selecting one or more lines from the graphical information corresponding to edges of the imaged document;

The separating step is performed by separating the imaged document from the background of the captured image based on the one or more lines corresponding to edges of the imaged document; and

The calculating step is performed by calculating deviations of corners of the imaged document from a non-distorted perspective of the imaged document.

3. The method of claim 2, further comprising:

Calculating corners of the imaged document based on intersections of the one or more lines corresponding to edges of the imaged document; and

Based on the calculated deviation, mapping coordinates of pixels of the imaged document to coordinates corresponding to a non-distorted perspective of the imaged document.

Detecting graphical information in the captured image regarding a transition between the imaged document and the remaining portion of the captured image;

Selecting one or more lines from the graphical information corresponding to edges of the imaged document;

Calculating corners of the imaged document based on intersections of the one or more lines corresponding to edges of the imaged document;

Separating the imaged document from the background of the captured image based on the one or more lines corresponding to edges of the imaged document;

Calculating deviations of corners of the imaged document from a non-distorted perspective of the imaged document;

Generating a new image of the imaged document with reduced perspective distortion based on the calculated deviations;

Mapping coordinates of pixels of the imaged document to coordinates corresponding to a non-distorted perspective of the imaged document based on the calculated deviation;

Converting a non-distorted imaged document into a two-color representation of the imaged document;

Calculating pixel intensities of the two-color representations along the vertical axis of the non-distorted imaged document;

Calculating pixel intensities of the two-color representations along a horizontal axis of the non-distorted imaged document;

Identifying contrasts of pixel intensities along a vertical and horizontal axis of the non-distorted imaged document;

Identifying text lines of the imaged document based on the contrasts of the pixel intensities;

Determining a format of the non-distorted imaged document based on a direction of the text lines of the non-distorted imaged document relative to sizes of the edges of the imaged document; and

Rotating the non-distorted imaged document in accordance with the format determination of the non-distorted imaged document.

5. The method of claim 4, further comprising:

Dividing the text lines into three portions along the vertical axis of the text lines;

Determining the direction of the text lines based on a comparison of pixel intensities of the portions of the text lines; and

Rotating the non-distorted imaged document based on the direction determination.

delete

A computer readable storage medium for processing a captured image comprising an imaged document, the computer readable storage medium comprising:

The computer readable storage medium has one or more sequences of one or more instructions that, when executed by one or more processors, cause the one or more processors to execute computer-implemented steps,

The computer-implemented steps,

Calculating deviations of the imaged document from the non-distorted perspective of the imaged document;

Mapping pixels of the imaged document to a new image of the imaged document with reduced perspective distortion based on the calculated deviations; and

Resampling pixels of the new image.

An apparatus for processing a captured image comprising an imaged document, the apparatus comprising:

One or more processors;

Memory communicatively coupled to the one or more processors; and

One or more modules included in the memory and configured to be executed by the one or more processors,

Wherein the one or more modules include:

A module for detecting graphical information of the captured image, related to edges of the imaged document;

A module for separating the imaged document from the background of the captured image based on the detected graphic information related to edges of the imaged document;

A module for calculating deviations of the imaged document from the non-distorted perspective of the imaged document;

A module for mapping pixels of the imaged document to a new image of the imaged document with reduced perspective distortion based on the calculated deviations;

A module for resampling pixels of the new image;

A module for detecting graphical information of the new image of the imaged document, related to the orientation of the imaged document in the new image; and

A module for rotating the new image based on the graphical information related to the orientation of the imaged document in the new image.