WO2016065551A1

WO2016065551A1 - Whiteboard and document image detection method and system

Info

Publication number: WO2016065551A1
Application number: PCT/CN2014/089780
Authority: WO
Inventors: Lu Yuan; Jiangyu Liu; Jian Sun; Takeshi Kubo; Yu UKAI; Francois de Sorbier DE POUGNADORESS; Seiichi Kato; Junko FUJIWARA
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2014-10-29
Filing date: 2014-10-29
Publication date: 2016-05-06
Also published as: CN106663207A

Abstract

In some examples, techniques and architectures for determining a boundary of a whiteboard in an image include detecting a plurality of quadrilaterals in the image, partitioning the image into a plurality of grids, determining a color of each grid, calculating an intersection space of at least a portion of the plurality of quadrilaterals, determining a color of the intersection space, and determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the whiteboard. The quadrilaterals may be ranked based, at least in part, on the number of stroke-marks within respective quadrilaterals that are in the foreground of the image. The boundary of the whiteboard in the image may be determined based, at least in part, on the ranking.

Description

WHITEBOARD AND DOCUMENT IMAGE DETECTION

BACKGROUND

Mobile computing devices, such as smartphones and tablets, are increasingly being utilized in lieu of standalone cameras for capturing photographs of whiteboards, blackboards (e.g., a writing surface having a colored background) and documents in association with various productivity scenarios in the workplace (e.g., meetings comprising slide presentations, brainstorming sessions, and the like) . The captured photographic images may then be utilized in one or more productivity applications for generating electronic documents. The aforementioned capturing of photographic images, however, may suffer from a number of drawbacks. For example, many photographs must be taken at an angle (which may be due to the physical dimension limitations of the room in which a user is located) as well as in less than ideal lighting conditions (e.g., due to glare from incident lights in a meeting room) . As a result, captured photographic images often contain unwanted perspective skews as well as unwanted regions (e.g., walls outside a whiteboard frame or table surfaces outside a document page boundary) that may be at least partially rectified prior to utilizing the images in external productivity applications. Moreover, captured photographic images may contain reflections of incident light residuals thereby making it necessary for these images to be “cleaned up” prior to being consumed by productivity application software.

SUMMARY

This disclosure describes techniques and architectures for detecting a boundary of a whiteboard region or a document region of an image captured by, for example, a smartphone, tablet, or any other suitable mobile computing device. In particular, such a boundary may comprise one of a relatively large number of quadrilateral configurations detected in the image by a quadrilateral detection process. The boundary is determined by selecting one of a number of quadrilateral candidates that are ranked accordingly to a set of criteria.

A quadrilateral detection process may involve a line segment detector that utilizes color-based edge detection. After fitting a series of line segments to detected edge points, the process includes a line validation function to remove and merge small or unstable line segments. Reducing the number of lines improves accuracy of line detection and may accelerate the quadrilateral detection process.

Two different ranking processes are used for determining which quadrilateral among the quadrilateral candidates is the boundary of the whiteboard or document region of the image. One ranking process may be used for whiteboard images and another ranking process may be used for document images. For example, a ranking process for whiteboard images may involve stroke-marks detection to help identify the whiteboard region in an image while excluding ambiguous regions such as white walls or white tables that may be in a background of the image. A ranking process for document images may involve an energy function that considers line color contrast in the image.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. The term “techniques, ” for instance, may refer to system (s) , method (s) , computer-readable instructions, module (s) , algorithms, hardware logic (e.g., Field-programmable Gate Arrays (FPGAs) , Application-specific Integrated Circuits (ASICs) , Application-specific Standard Products (ASSPs) , System-on-a-chip systems (SOCs) , Complex Programmable Logic Devices (CPLDs) ) , and/or other technique (s) as permitted by the context above and throughout the document.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.

FIG. 1 is a block diagram depicting an environment in which techniques described herein may be implemented, according to various implementations.

FIG. 2 is a block diagram depicting a device in which techniques described herein may be implemented, according to various implementations.

FIG. 3 is a block diagram of a mobile computing device in which techniques described herein may be implemented, according to various implementations.

FIG. 4 illustrates a screen display of a computing device which includes a user interface for capturing an image for processing, according to various implementations.

FIG. 5 is a flow diagram illustrating processes for detecting quadrilaterals in an image that includes a whiteboard image or document image, according to various implementations.

FIG. 6 is a flow diagram illustrating processes for detecting edges in an image that includes a whiteboard image or document image, according to various implementations.

FIG. 7 is a flow diagram illustrating processes for line detection and validation for an image that includes a whiteboard image or document image, according to various implementations.

FIGS. 8 and 9 illustrate modeling for collinear line merging, according to various implementations.

FIG. 10 is a flow diagram illustrating processes for collinear line merging, according to various implementations.

FIG. 11 is a flow diagram illustrating processes for detecting stroke-marks and ranking candidate quadrilaterals, according to various implementations.

FIG. 12 is a flow diagram illustrating processes for determining a boundary of a whiteboard region in an image, according to various implementations.

DETAILED DESCRIPTION

OVERVIEW

This disclosure describes techniques and architectures for detecting a boundary of a whiteboard or a document in an image captured by, for example, a smartphone, a tablet, or other mobile computing or computing device. Such a boundary may comprise a quadrilateral configuration, hereinafter referred to as a “quadrilateral” . An image that includes a whiteboard or document may also include a background (e.g., the region of the image outside the region of the whiteboard/document) and a foreground (e.g., the region of the image inside the region of the whiteboard/document) . The background and the foreground may include a number of lines, shapes, markings, contrasting color portions, and so on. For example, the whiteboard region of the image may be substantially covered with markings of words, drawings, tables, and so on, made with a dry eraser or other felt pen writing instrument. Such felt pen markings are hereinafter referred to as stroke-marks. A digitized analysis of such an image may thus detect any number of quadrilaterals in addition to the quadrilateral that is the whiteboard boundary. Accordingly, implementations described herein describe, among other things, techniques for determining which among a number of quadrilaterals represents a boundary of a whiteboard or document region of an image.

In some implementations, a technique for determining a boundary of a whiteboard or document region in an image may include partitioning the document region of the image or the whiteboard region of the image into multiple color components and detecting edges in the document region of the image or the whiteboard region of the image. The technique may further include generating line segments based, at least in part, on the detected edges and generating quadrilateral candidates comprising a subset of the line segments. The quadrilateral candidates may be subsequently ranked according to likelihood of the quadrilateral candidates being the boundary of the document region of the image or the whiteboard region of the image.

In some implementations, a first type of ranking process may be performed for an image that includes a document region, whereas a second type of ranking process may be performed for an image that includes a whiteboard region. The first type of ranking process is different from the second type of ranking process.

For example, the first type of ranking process may involve assigning respective scores to the quadrilateral candidates, wherein a score of an individual quadrilateral candidate is based, at least in part, on color contrast of a region at least partially bounding the individual quadrilateral candidate.

In some implementations, the second type of ranking process used for an image that includes a whiteboard region may include partitioning the image, which includes the whiteboard region, into a plurality of grids, determining a color of each grid, calculating an intersection space of at least a portion of the quadrilateral candidates, determining a color of the intersection space, and determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space.

Various implementations are described further with reference to FIGS. 1-12

EXAMPLE ENVIRONMENT

FIG. 1 is a block diagram depicting an environment 100 in which implementations involving image processing as described herein can operate, according to various implementations. In some examples, the various devices and/or components of environment 100 include distributed computing resources 102 that may communicate with one another and with external devices via one or more networks 104.

For example, network (s) 104 may include public networks such as the Internet, private networks such as an institutional and/or personal intranet, or some combination of private and public networks. Network (s) 104 may also include any type of wired and/or wireless network, including but not limited to local area networks (LANs) , wide area networks (WANs) , satellite networks, cable networks, Wi-Fi networks, WiMax networks, mobile communications networks (e.g., 3G, 4G, and so forth) or any combination thereof. Network (s) 104 may utilize communications protocols, including packet-based and/or datagram-based protocols such as internet protocol (IP) , transmission control protocol (TCP) , user datagram protocol (UDP) , or other types of protocols. Moreover, network (s) 104 may also include a number of devices that facilitate network communications and/or form a hardware basis for the networks, such as switches, routers, gateways, access points, firewalls, base stations, repeaters, backbone devices, and the like.

In some examples, network (s) 104 may further include devices that enable connection to a wireless network, such as a wireless access point (WAP) . Examples support connectivity through WAPs that send and receive data over various electromagnetic frequencies (e.g., radio frequencies) , including WAPs that support Institute of Electrical and Electronics Engineers (IEEE) 1302.11 standards (e.g., 1302.11g, 1302.11n, and so forth) , and other standards.

In various examples, distributed computing resource (s) 102 includes computing devices such as devices 106 (1) –106 (N) . Examples support scenarios where device (s) 106 may include one or more computing devices that operate in a cluster or other grouped configuration to share resources, balance load, increase performance, provide fail-over support or redundancy, or for other purposes. Although illustrated as servers, device (s) 106 may include a diverse variety of device types and are not limited to any particular type of device. Device (s) 106 may include specialized computing device (s) 108.

For example, device (s) 106 may include any type of computing device having one or more processing unit (s) 110 operably connected to computer-readable media 112, I/O interfaces (s) 114, and network interface (s) 116. Computer-readable media 112 may have an image processing framework 118 stored thereon. For example, image processing framework 118 may comprise computer-readable code that, when executed by processing unit (s) 110, receive and process images from a client server, such as specialized computing device (s) 120. Specialized computing device (s) 120, which may communicate with device (s) 106 via networks (s) 104, may include any type of computing device having one or more processing unit (s) 122 operably connected to computer-readable media 124, I/O interface (s) 126, and network interface (s) 128. I/O interface (s) may include a display device. Computer-readable media 124 may have a specialized computing device-side image processing framework 130 stored thereon. For example, image processing framework 130 may comprise computer-readable code that, when executed by processing unit (s) 122, perform image processing operations.

FIG. 2 depicts an illustrative device 200, which may represent device (s) 120 illustrated in FIG. 1, for example. Illustrative device 200 may include any type of computing device having one or more processing unit (s) 202, such as processing unit (s) 110 or 122, operably connected to computer-readable media 204, such as computer-

readable media

112 or 124. The connection may be via a bus 206, which in some instances may include one or more of a system bus, a data bus, an address bus, a PCI bus, a Mini-PCI bus, and any variety of local, peripheral, and/or independent buses, or via another operable connection. Processing unit (s) 202 may represent, for example, a CPU incorporated in device 200. The processing unit (s) 202 may similarly be operably connected to computer-readable media 204.

The computer-readable media 204 may include, at least, two types of computer-readable media, namely computer storage media and communication media. Computer storage media may include volatile and non-volatile machine-readable, removable, and non-removable media implemented in any method or technology for storage of information (in compressed or uncompressed form) , such as computer (or other electronic device) readable instructions, data structures, program modules, or other data to perform processes or methods described herein. The computer-readable media 112 and the computer-readable media 124 are examples of computer storage media. Computer storage media include, but are not limited to hard drives, floppy diskettes, optical disks, CD-ROMs, DVDs, Blu-ray, read-only memories (ROMs) , random access memories (RAMs) , EPROMs, EEPROMs, flash memory, magnetic or optical cards, solid-state memory devices, or other types of media/machine-readable medium suitable for storing electronic instructions.

In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media.

Device 200 may include, but is not limited to, desktop computers, server computers, web-server computers, personal computers, smartphones, mobile computers, laptop computers, tablet computers, wearable computers, implanted computing devices, telecommunication devices, automotive computers, network enabled televisions, thin clients, terminals, personal data assistants (PDAs) , game consoles, gaming devices, work stations, media players, personal video recorders (PVRs) , set-top boxes, cameras, integrated components for inclusion in a computing device, appliances, or any other sort of computing device such as one or more separate processor device (s) 208, such as CPU-type processors (e.g., micro-processors) 210, GPUs 212, or accelerator device (s) 214.

In some examples, as shown regarding device 200, computer-readable media 204 may store instructions executable by the processing unit (s) 202, which may represent a CPU incorporated in device 200. Computer-readable media 204 may also store instructions executable by an external CPU-type processor 210, executable by a GPU 212, and/or executable by an accelerator 214, such as an FPGA type accelerator 214 (1) , a DSP type accelerator 214 (2) , or any internal or external accelerator 214 (N) .

Executable instructions stored on computer-readable media 204 may include, for example, an operating system 216, an image processing framework 218, and other modules, programs, or applications that may be loadable and executable by processing units (s) 202, and/or 210. For example, image processing framework 218 may comprise computer-readable code that, when executed by processing unit (s) 202, perform image processing operations. In some implementations, modules may include a color partitioning module to partition an image of a document or an image of a whiteboard into multiple color components, an edge detecting module to detect edges in the image of the document or the image of the whiteboard, a line segment generator module to generate line segments based, at least in part, on the detected edges, a quadrilateral generator module to generate quadrilateral candidates comprising a subset of the line segments, and a ranking module to rank the quadrilateral candidates according to likelihood of the quadrilateral candidates being a boundary of the image of the document or the image of the whiteboard.

Alternatively, or in addition, the functionally described herein may be performed by one or more hardware logic components such as accelerators 214. For example, and without limitation, illustrative types of hardware logic components that may be used include Field-programmable Gate Arrays (FPGAs) , Application-specific Integrated Circuits (ASICs) , Application-specific Standard Products (ASSPs) , System-on-a-chip systems (SOCs) , Complex Programmable Logic Devices (CPLDs) , etc. For example, accelerator 214 (N) may represent a hybrid device, such as one that includes a CPU core embedded in an FPGA fabric.

In the illustrated example, computer-readable media 204 also includes a data store 220. In some examples, data store 220 includes data storage such as a database, data warehouse, or other type of structured or unstructured data storage. In some examples, data store 220 includes a relational database with one or more tables, indices, stored procedures, and so forth to enable data access. Data store 220 may store data for the operations of processes, applications, components, and/or modules stored in computer-readable media 204 and/or executed by processor (s) 202 and/or 210, and/or accelerator (s) 214. In some examples, data store 220 may store images of documents, papers, notes, whiteboards, or blackboards, among other things. Alternately, some or all of the above-referenced data may be stored on separate memories 222 such as a memory 222 (1) on board CPU type processor 210 (e.g., microprocessor (s) ) , memory 222 (2) on board GPU 212, memory 222 (3) on board FPGA type accelerator 214 (1) , memory 222 (4) on board DSP type accelerator 214(2) , and/or memory 222 (M) on board another accelerator 214 (N) .

Device 200 may further include one or more input/output (I/O) interface (s) 224, such as I/O interface (s) 114 or 126, to allow device 200 to communicate with input/output devices such as user input devices including peripheral input devices (e.g., a keyboard, a mouse, a pen, a game controller, a voice input device, a touch input device, a gestural input device, and the like) and/or output devices including peripheral output devices (e.g., a display, a printer, audio speakers, a haptic output, and the like) . Device 200 may also include one or more network interface (s) 226, such as network interface (s) 116 or 128, to enable communications between computing device 200 and other networked devices such as other device 120 over network (s) 104. Such network interface (s) 226 may include one or more network interface controllers (NICs) or other types of transceiver devices to send and receive communications over a network.

In some implementations, device 200 may comprise an image capture device 228, such as a camera, to capture images of a document, whiteboard, or blackboard, for example. Image capture device 228 may provide digitized images to image processing framework 218 or any other component of device 200.

FIG. 3 is a block diagram of a mobile computing device 300 in which techniques described herein may be implemented, according to various implementations. Mobile computing device 300 may include, without limitation, a smartphone, a tablet personal computer, a laptop computer, and the like, with which various implementations may be practiced. In the particular example implementation illustrated in FIG. 3, mobile computing device 300 may be a handheld computer having both input elements and output elements. Input elements may include touch screen display 302, camera 304, and input buttons 306 that allow the user to enter information into mobile computing device 300. Mobile computing device 300 may also incorporate an optional side input element 308 allowing further user input. Optional side input element 308 may be a rotary switch, a button, or any other type of manual input element. In the example implementation, side input element 308 is affixed to mobile computing device 300, but it is understood that side input element 308 may be physically separated from mobile computing device 300 and capable of remotely providing input to mobile computing device 300 through wireless communication. In alternative implementations, mobile computing device 300 may incorporate more or less input elements. In yet another alternative implementation, the mobile computing device may be a portable telephone system, such as a cellular phone having display 302 and input buttons 306. Mobile computing device 300 may also include an optional keypad 310. Optional keypad 310 may be a physical keypad or a “soft” keypad generated on the touch screen display.

Mobile computing device 300 may incorporate output elements, such as display 302, which can display a graphical user interface (GUI) . Other output elements include speaker 312, microphone 314, and LED 316. Additionally, mobile computing device 300 may incorporate a vibration module (not shown) , which causes mobile computing device 300 to vibrate to notify the user of an event. In yet another implementation, mobile computing device 300 may incorporate a headphone jack (not shown) for providing another means of providing output signals.

Although described herein in combination with mobile computing device 300, in alternative implementations may be used in combination with any number of computer systems, such as in desktop environments, laptop or notebook computer systems, multiprocessor systems, micro-processor based or programmable consumer electronics, network PCs, mini computers, main frame computers and the like. Various implementations may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network in a distributed computing environment； programs may be located in both local and remote memory storage devices. To summarize, any computer system having a plurality of environment sensors, a plurality of output elements to provide notifications to a user and a plurality of notification event types may incorporate the various implementations described herein.

FIG. 4 illustrates a screen display 400 of a computing device 402 which includes a user interface for capturing an image for processing, according to various implementations. The user interface may include

user controls

404, 406, and 408. For example, user control 404 may be utilized to select an image processing mode configured for standard photographic images. User control 406 may be utilized to select an image processing mode configured for whiteboard images. User control 408 may be utilized to select an image processing mode configured for document images. In accordance with various implementations, the selection of the user controls 404, 406, and 408 may be made by any number of gestures including tapping and swiping gestures. As illustrated in FIG. 4, user control 406 has been selected for whiteboard image processing and a user (represented by hands 410) is preparing to capture an image of a whiteboard 412 which may be, for example, mounted on a wall of a meeting room having a ceiling 414 or other background objects. The user may then capture the image of whiteboard 412 and the background as viewed in screen display 400.

FIG. 5 is a flow diagram illustrating a process 500 for detecting quadrilaterals in an image that includes a whiteboard image or document image, according to various implementations. A processor or computer system, such as device 200 illustrated in FIG. 2 for example, may perform process 500. Though process 500 may be performed by any of a number of devices or systems, the process will be described as being performed by a processor. Details of various portions of process 500 will be described with respect to subsequent figures.

At block 502, a processor may receive a color image that includes a document, a whiteboard, or a blackboard. The processor may receive the image from memory, from an image capturing device, or from a source external to the system that includes the processor, for example. A document may comprise a note, a sheet of paper, magazine portion, a receipt, or the like. Hereinafter, “whiteboard” is used to represent a physical display board or surface that may be written on and marked up, such as a chalkboard, a whiteboard, a blackboard, or any other board or substantially plane surface upon which are marks, writing, diagrams, postings (e.g., adhered to the whiteboard) , and so on. Such marks, writing, diagrams, or postings are referred to herein as stroke-marks. A stroke-mark may comprise a mark on a whiteboard such as a letter, number, character, drawing, table, figure, or any portion thereof.

The color image may include a foreground, considered to be the whiteboard, and a background, which may include all portions of the image that are not within the bounds of the whiteboard irrespective of whether those portions of the image are situated in front of, behind, or coplanar with the whiteboard. For example, the background of an image may include walls, ceiling, the floor, portions of a desk or chair, and people in the room (even though the people may be closer to a camera that captured the image than the whiteboard) , just to name a few examples.

At block 504, the processor performs edge detection of the received image to generate an edge map. For example, the processor may downscale the image by some factor with respect to a longest side of the image. Such downscaling may provide a benefit in that some portions of process 500 may proceed faster with a downscaled image as compared to an original image. Noise effects may also be reduced by such downscaling. The downscaled image may be partitioned into a number of color channels. For example, the image may be partitioned into three color channels, comprising a red channel, a green channel, and a blue channel. The processor may be configured to separately detect lines in each channel. Such a process of partitioning an image into multiple channels by respective colors may provide a number of benefits. For example, line detection performed by a single gray channel, as opposed to separate color channels, may have limitations on distinguishing among two or more different colors in an image. While the exemplary implementation partitions the image into a red, green, and blue color channel, it will be appreciated that other color partitions of the image may be used as well. For example, in an alternative implementation, red, yellow, and blue color channels may be used.

At block 506, the processor may perform a line fitting and validation process. For example, the processor may best-fit a parametric line to edge points in the edge map. In some implementations, respective lines may be fit to pairs of edge points that are linked by the edge. The processor may determine a fitting error for the lines that are fit to the edge points. For example, a fitting error may measure the extent to which a line fits to edge points based, at least in part, on distances from points on the line to the edge points, for example. A fitting error may have a quantitative value measureable by any of a number of methods. If such a fitting error is smaller than a threshold (e.g., a predetermined threshold value) , that indicates that the edge point satisfies the fitted line. The processor may, at least temporally, remove the satisfied edge points (whose fitting errors are smaller than the predetermined threshold value) from the edge map. The fitted line is then added to the list of fitted lines.. The processor may subsequently iteratively fit lines using remaining edge points (which may also include outlier edge points that have larger fitting errors than the predetermined threshold value) , check the fitting errors, and exclude outliers from the edge map. The processor may then group lines into a subset of lines that are portions of linear edge points. In this fashion, edge points in the edge map may be parameterized to a series of line segments.

At block 508, the processor may iteratively remove relatively small line segments that may be caused by noise. This process may improve robustness of quadrilateral detection and accelerate computation for detecting quadrilaterals. For example, in individual line subsets, the processor may rank lines according to length. The processor may remove line segments that are shorter than a median length of lines in the respective individual line subsets. The remaining line segments in each subset may be used to find quadrilateral candidates that may be the boundary of a whiteboard or document region of the image.

At block 510, the processor may rank candidate quadrilaterals according to likelihood of being the boundary of a whiteboard or document in the image. The processor may use a ranking technique for ranking candidate quadrilaterals for a document that is different from a ranking technique for ranking candidate quadrilaterals for a whiteboard. Such ranking techniques are described in detail below.

At block 512, the processor selects one of the candidate quadrilaterals as being the boundary of the whiteboard or document. The best (e.g., highest ranked) candidate quadrilateral may be selected based, at least in part, on the ranking performed in block 512, for example.

FIG. 6 is a flow diagram illustrating a process 600 for detecting edges in an image that includes a whiteboard or document, according to various implementations. For example, process 600 may be similar to or the same as processes represented by

blocks

502 and 504 in process 500. Similarly, a processor or computer system, such as device 200 illustrated in FIG. 2, for example, may perform process 600. At block 602, a processor may receive a color image, which may include a document or a whiteboard, for example. At block 604, the processor may perform processes to smooth the image. For example, the processor may smooth the image in each color channel using a low-pass Gaussian filter (e.g., σ＝0.67) . At block 606, the processor may then compute image gradients on both the X-coordinate (e.g., horizontal) and the Y-coordinate (e.g., vertical) using a Sobel operator. For example, a Sobel operator or filter may be used to generate an image that emphasizes edges and transitions. The Sobel operator is a discrete differentiation operator for computing an approximation of the gradient of an image intensity function. At each point in the image, the result of the Sobel operator is either the corresponding gradient vector or the norm of this vector. The Sobel operator is based on convolving the image with a small, separable, and integer-valued filter in horizontal and vertical directions, for example.

For the smoothing process, the processor may choose pixels with local maximum gradients and link the pixels together. Such a step is similar to, for example, a Canny detection operation. At block 608, the processor may generate an edge map based, at least in part, on the smoothing process and the image gradient computations. The edge map may be used for detecting objects in images for particular color channel images.

FIG. 7 is a flow diagram illustrating a process 700 for line detection and validation for an image that includes a whiteboard or document, according to various implementations. For example, process 700 may yield line candidates for possible quadrilaterals that will subsequently be ranked for likelihood of being the boundary of the whiteboard or document in the image. Some of the process steps in process 700 may be similar to or the same as process steps in process 500, for example. A processor or computer system, such as device 200 illustrated in FIG. 2, for example, may perform process 700.

At block 702, a processor may receive the color image. At block 704, the processor may downscale the image by some factor with respect to a longest side of the image. Such downscaling may provide a benefit in that some portions of process 700 may proceed faster with a downscaled image as compared to an original image. Noise effects may also be reduced by such downscaling. The downscaled image may be partitioned into three color channels, comprising a red channel, a green channel, and a blue channel, for example. At block 706 of each channel, the processor may detect edges. At block 708, the processor may fit lines to points of the detected edges. For example, a parametric line (l: ax + by + c＝0) may be fit to a set of edge points {e_i} , where (e_i: (x_i, y_i) ) in an edge map by the relation

argmin_a,b,c∑_i║ax + by + c║².

The processor may best-fit such parametric lines to edge points in an edge map. In some implementations, respective lines may be fit to pairs of edge points that are linked by the edge. The processor may determine a fitting error for the lines fit to the edge points. If such a fitting error is smaller than a threshold (e.g., a predetermined threshold value) , that indicates that the edge point satisfies the fitted line. The processor may, at least temporally, remove the satisfied edge points (whose fitting errors are smaller than the predetermined threshold value) from the edge map. The fitted line is then added to the list of fitted lines. The processor may subsequently iteratively fit lines using remaining edge points (which may also include outlier edge points that have larger fitting errors than the predetermined threshold value) , check the fitting errors, and exclude outliers from the edge map. The processor may then group lines into a subset of lines that are portions of linear edge points. In this fashion, edge points in the edge map may be parameterized to a series of line segments.

At block 710, the processor may perform collinear merging to merge detected line segments together from the three color channel images. In some implementations, criteria for such merging are based, at least in part, on the collinearity of any two line segments. For example, FIG. 8 illustrates three types of collinearity, identified as type (1) , type (2) , and type (3) . For example, type (1) collinearity involves a pair of line segments that do not overlap. Type (2) collinearity involves a pair of line segments that partially overlap. Type (3) collinearity involves a pair of line segments that completely overlap.

FIG. 9 illustrates line-pair modeling for collinear line merging and FIG. 10 is a flow diagram illustrating a process 1000 for collinear line merging, according to various implementations. For example, process 1000 may operate on a line pair comprising a first line ab and a second line cd. A processor or computer system, such as device 200 illustrated in FIG. 2, for example, may perform process 1000.

FIG. 9 illustrates a line pair comprising a line ab and a line cd. Each line is virtually extended so that the extended lines have the same length. For example line ab is extended to line ad’ a nd has the same length as line a’ d, which was extended from line cd. Orthogonal lines are projected from all endpoints of the two lines. For example, a line aa’ projects from endpoint a, a line cc’ projects from endpoint c, a line bb’ projects from endpoint b, and a line dd’ projects from endpoint d. As will be seen below, process 1000 uses this construction to determine whether to merge lines ab and cd.

In process 1000, at block 1002, the processor may compute an angle θ between line ab and line cd (FIG. 9 depicts lines ab and cd as being parallel, but this need not be the case) . At block 1004, the processor may determine whether angle θ is smaller than a threshold angle. If not, then process 1000 may proceed to block 1006 where line ab and line cd should not be merged together. On the other hand, if angle θ is smaller than the threshold angle, process 1000 may proceed to block 1008, where the processor performs a construction, such as that illustrated in FIG. 9. Such a construction involves computing a projection from the line ab to the other line cd and vice versa. Also, at block 1010, the processor may classify the type of collinearity of the line pair as one of type (1) , type (2) , or type (3) .

Remaining portions of process 1000 are based, at least in part, on which one of four conditions the line pair is in. The four conditions are identified as Condition A, Condition B, Condition C, and Condition D, and are based, at least in part, on line pair type (e.g., type (1) , type (2) , or type (3) . Condition A, Condition B, Condition C, and Condition D are defined as follows.

Conditions A is “type (2) or type (3) and max {|zc′|， |bd′|， |a′c|， |b′d|} ＜thresh” . Condition (B) is “max {|ac′|， |bd′|} ＜∈·|ab|， max {|a′c|， |b′d|} ＜∈·|cd|， d_max＜thresh， and min {|a′c|， |a′d|， |b′c|， |b′d|} ＜thresh” . Condition (C) is “max {|ac′|， |bd′|} ＜∈· |ab| ， max {|a′c|， |b′d|} ＜∈·|cd|， d_max＜thresh” . Condition (D) is “d_max＜thresh” . Here, d_max is max {|ac′|， |bd′|} when|ab|＞|cd|, otherwise, d_max＝max {|a′c|，|b′d|} . ) . Letter pairs (e.g., ab, bd, ac, b’ d, and so on) refer to line lengths. Of course, such conditions and definitions are merely examples, and claimed subject matter is not so limited.

Proceeding with process 1000, at block 1012, the processor determines whether the line pair ab-cd is in Condition A. If not, then process 1000 may proceed to block 1014 where the lines will not be merged together. On the other hand, if the line pair ab-cd is in Condition A, then process 1000 may proceed to block 1016 where the line pair is sorted based on its type. For example, if line pair ab-cd is type (1) , then process 1000 proceeds to block 1018. If the line pair ab-cd is in Condition B then the processor will merge the lines, else the processor will not merge the lines. If line pair ab-cd is type (2) , then process 1000 proceeds to block 1020. If the line pair ab-cd is in Condition C then the processor will merge the lines, else the processor will not merge the lines. If line pair ab-cd is type (3) , then process 1000 proceeds to block 1022. If the line pair ab-cd is in Condition D then the processor will merge the lines, else the processor will not merge the lines.

Returning to process 700 at block 712, in FIG. 7, the processor may classify detected line segments into four subsets according to their locations in the image: {left, top, right and bottom} . A technique for performing such classification may include computing the slope of each line segment. If the slope is nearer to the horizontal instead of the vertical, the line segment is placed in a top or bottom subset of lines. To distinguish between the top and the bottom subset, the processor may check if the endpoints of the line segments are lower or higher than the image center. Similarly, the processor may assign line segments to a left or a right subset of lines. After such line classification, any possible quadrilateral may be generated by the combination of four line segments, wherein each line segment may be in one of the four line subsets.

At block 714, the processor may exclude relatively small line segments that may result from noise. This step may improve the robustness of quadrilateral detection and accelerate the computation of detection. For example, in each line subset, the processor may rank lines according to their length. Finally, the processor may remove line segments having lengths smaller than a median length in the line subset for which the line is in.

Processes described above may produce a number of quadrilateral candidates, of which one is the boundary of the whiteboard region or the document region in the image. To determine which quadrilateral candidate is mostly likely to be the boundary, the processor may perform a ranking and elimination process, where relatively poor quadrilateral candidates are removed from the selection process. Such a ranking and elimination process may be a cascaded process, where one process is used for initial ranking and elimination, and a subsequent process is used for a final ranking and elimination process. The initial ranking and elimination process may produce a subset of quadrilaterals that are sub-optimal (e.g., relatively poor candidates) from all possible quadrilateral candidates. The final ranking and elimination process may choose the best one from the subset of the quadrilaterals which are obtained from the initial ranking and elimination process.

In a first ranking and elimination process, quadrilaterals (e.g., quadrilateral candidates) may be scored by the following relation:

Score ＝ Area^α × CoveredPerimeter^β × Ratio^γ ×

× ImageEdgePenalty.

[Eqn. 1]

“Area” is the area size of the quadrilateral, “CoveredPerimeter” is the ratio of the length of the detected lines of the quadrilateral to the length of the quadrilateral boundary, “Ratio” is the ratio of “CoveredPerimeter” to the quadrilateral's perimeter, “CornerPenalty” is the sum of the distances from endpoints of detected lines of the quadrilateral to corners of the quadrilateral, and “ImageEdgePenalty” ＝ (the number of lines that are not image boundary edges/4) ². α, β, γ, and

are user-selectable parameters that reflect tradeoffs of different energy terms. To achieve better parameters for the score, the processor may train a linear regression on a labeled dataset instead of maintaining original parameters (all are manually set to 1) . For example, α＝1, β＝4, γ＝2,

in a particular algorithm. After the score for each possible quadrilateral is computed, the processor may select N best candidates with highest scores (e.g., N ＝ 10) as a subset of sub-optimal quadrilaterals for the second ranking and elimination process.

In a second ranking and elimination process, two different strategies are used, one strategy for quadrilaterals in an image of a whiteboard and the other strategy for quadrilaterals in an image of a document. For the ranking and elimination process for quadrilaterals in an image of a document, the quadrilaterals may be scored by the following relation:

Score ＝ Area^α × CoveredPerimeter^β × Ratio^γ ×

× ImageEdgePenalty x ColorContrast

[Eqn. 2]

Equation 2 is similar to Equation 1, with the following differences. “ImageEdgePenalty” ＝ ( (the number of lines that are not image boundary edges + 1) /5) ². This relationship helps avoid a zero quality penalty for image edges.

“ColorContrast” ＝∏_i∈lines x (pixContrast_i) ^1/2， [Eqn. 3]

where “lines” is the subset of line segments {left, top, right, bottom of image} , and “pixContrast” ＝ min { (║C_in -C_out║² /3) ^1/2, 1} . C_in and C_out are the mean color of both margin sides of a line. After compute the scores for all quadrilaterals, the processor may select the quadrilateral with the highest score as the output.

FIG. 11 is a flow diagram illustrating a process 1100 for detecting stroke-marks and ranking candidate quadrilaterals, according to various implementations. For the ranking and elimination process for quadrilaterals in an image of a whiteboard (as opposed to a document) , the quadrilaterals may be ranked by process 1100. At block 1102, the processor may begin process 1100 by receiving a downscaled image, such as that described for block 704 in process 700, for example. At block 1104, the processor may convert the color image into a gray image. At block 1106, the processor may apply a difference-of-Gaussian (DoG) operation to the gray image. Here, in a particular implementation,

where G_σ＝4 is a Gaussian filter with the standard variance σ＝4 and

is a convolution operator. “Clamp” is an operation that computes the value of the first specified argument clamped to a range defined by the second and third specified arguments. The processor may threshold the DoGImage to get an initial mask of stroke-marks. To automatically estimate the threshold for an image, the processor may build a histogram according to the DoGImage and then calculate the gradients of nearby histogram bins. If the ratio of the gradient to the peak of histogram bin is greater than a constant (e.g., 64) , the processor may record the bin index as a threshold and use the bin index for thresholding the DoGImage.

At block 1108, the processor receives detected lines, such as from block 712 of process 700, for example. At block 1110, the processor removes some of the lines and isolated points. For example, various processes for detecting and generating lines may produce noise, such as spurious lines or points. Some noises result from relatively long lines that are relatively close to the image edges and isolated points and black spots on the DoGImage. The processor may use an eight-neighbor FloodFill algorithm to detect connected components, and check whether individual components are on a line or if the individual components are isolated points (or black spots) , which may be erased. For example, the processor may consider different ways to erase such noises. For example, for lines, if the length of a line segment is greater than 60％of the minimum length of image boundary or the line position (e.g., center, start or end) lies within the margin of the image boundary (e.g., the margin may be considered to be 20％of the image boundary length) , the connected component overlaying the line should be removed from the DoGImage. For isolated points and black spots, the processor may calculate the bounding box of a connect component. If the ratio of component area size to the bounding box size is greater than 0.8, for example, the processor may consider this component to be an isolated point or black spot to be removed.

At block 1112, the processor receives candidate quadrilaterals, such as from block 508 of process 500, for example. To rank the quadrilaterals, the processor may determine a region of the image that is foreground as opposed to a region of the image that is background. The processor may identify the foreground by comparing its color with that of a reference color, as follows.

At block 1114, the processor may compute the intersection region of all N quadrilateral candidates that were found in a first ranking process, described above. Although the intersection may not exactly be the foreground, the processor may consider it as the reference to foreground. At block 1116, the processor may partition the downscaled image into grids. For a particular example, grid size may be 25 X 25 pixels. For each grid, the processor may compute both its mean RGB color and its Lab color information. In addition, the processor may calculate the mean RGB color and the median Lab color within the intersection region. The mean RGB and median Lab may be considered as the reference colors. On one hand, the gray value of the reference colors may be used to determine whether the board is either a whiteboard or a blackboard. For example, if the gray value of the reference color is greater than the middle gray (i.e., 128) , it is the whiteboard. Otherwise it is the blackboard. For whiteboard and blackboard, the computation of DoGImage at block 1106 is different. On the other hand, the reference colors may be used to generate a foreground stroke-mark map by excluding background stroke-marks. The processor may then compute the respective distances (e.g., L2-norm Euclidean distance) between the RGB color of each grid and that of the reference, and between the LAB color of each grid and that of the reference. At block 1118, the processor may identify the foreground according to the mean color. For example, if the Lab distance between the color of each grid and that of the reference is less than a first threshold (for a particular example, threshold＝9) and the RGB distance is less than a second threshold (for a particular example, threshold＝70) , the responding grid may be regarded as the foreground, otherwise it is regarded as the background. At block 1120, the processor may generate a foreground stroke-mark map by separating all stroke-marks in the image into two subsets: foreground stroke-marks and background stroke-marks, which are then used to compute new features for score and ranking.

At block 1122, the processor may score and rank each quadrilateral based on a number of criteria. One of the criteria, for example, is the percentage of foreground strokes that are within the bounds of the quadrilateral (called “Percentage” ) . Another one of the criteria is the ratio of the size of quadrilateral to the image size. Yet another of the criteria is the ratio of the length of the detected lines to the length of the boundary (e.g., perimeter) of the quadrilateral boundary. This ratio is called “CoveredPerimeter” . The processor may only compute the rank score for quadrilaterals that have a CoveredPerimeter > 0.7, for example. The score is defined as:

RankScore ＝ Percentage -1/3 × (quadrilateral area size) / (image size)

Using the Rankscore, the processor may determine the quadrilateral with the highest score to be the bounds of the whiteboard region in the image.

FIG. 12 is a flow diagram illustrating a process 1200 for determining a boundary of a whiteboard region in an image, according to various implementations. A processor or computer system, such as device 200 illustrated in FIG. 2 for example, may perform process 1200. At block 1202, a processor may receive an image that includes an image of a whiteboard. At block 1204, the processor may detect a plurality of quadrilaterals in the image. For example, such a process may be similar to or the same as the process described for block 508 in process 500. At block 1206, the processor may partition the image into a plurality of grids. At block 1208, the partition may determine a color of each grid. For example, this process may be similar to or the same as the process described for block 1114 through block 1118 of process 1100. At block 1210, the processor may calculate an intersection space of at least a portion of the plurality of quadrilaterals. At block 1212, the processor may determine a color of the intersection space. At block 1214, the processor may determine a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the image of the whiteboard.

Example A, a method for image processing, the method comprising: receiving an image that includes a region picturing a whiteboard； detecting a plurality of quadrilaterals in the image； partitioning the image into a plurality of grids； determining a color of each grid； calculating an intersection space of at least a portion of the plurality of quadrilaterals； determining a color of the intersection space； and determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the region picturing the whiteboard.

Example B, the method as example A recites, wherein the region picturing the whiteboard includes stroke-marks, and the method further comprises: ranking the plurality of quadrilaterals based, at least in part, on the number of the stroke-marks within the respective quadrilaterals that are in the foreground of the image； and determining a boundary of the region picturing the whiteboard based, at least in part, on the ranking.

Example C, the method as example A recites, wherein detecting the plurality of quadrilaterals in the image comprises: partitioning the image into color channels so that each color channel comprises a component color of the image； detecting edges of each component color image so as to generate a plurality of lines； and based, at least in part, on predetermined criteria, selecting a subset of the plurality of lines to form the plurality of quadrilaterals.

Example D, the method as example C recites, wherein the predetermined criteria are based, at least in part, on (i) angles of the lines with respect to one another and (ii) location of the lines in the image.

Example E, the method as example C recites, wherein the plurality of lines comprises line pairs, and the method further comprises: classifying each of the lines pairs into one of three line-pair types based, at least in part, on an amount that lines of each line pair overlap one another, and wherein the predetermined criteria are based, at least in part, on a classification of the line pairs.

Example F, the method as any one of examples A-C recites, wherein the image is a color image, and wherein detecting the plurality of quadrilaterals in the image comprises: converting the color image to a gray-scale image that includes at least a portion of the stroke-marks； applying a difference-of-Gaussian operation to the gray-scale image to generate a difference-of-Gaussian image； and applying threshold criteria and a Flood-fill operation to the difference-of-Gaussian image to reduce the number of stroke-marks.

Example G, the method as any one of examples A-C recites, wherein the at least a portion of the plurality of quadrilaterals comprises a sub-optimal subset of quadrilaterals extracted from the plurality of quadrilaterals.

Example H, a system comprising: an input mechanism to receive an image picturing a document or a whiteboard； one or more processing units； and computer-readable media with modules thereon, the modules comprising: a color partitioning module to partition the image picturing the document or the whiteboard into multiple color components； an edge detecting module to detect edges in the image picturing the document or the whiteboard； a line segment generator module to generate line segments based, at least in part, on the detected edges； a quadrilateral generator module to generate quadrilateral candidates comprising a subset of the line segments； and a ranking module to rank the quadrilateral candidates according to likelihood of the quadrilateral candidates being a boundary of the document or the whiteboard.

Example I, the system as example H recites, wherein the image is picturing one of the document or the whiteboard, wherein the ranking module performs a first type of ranking process for the image picturing the document and performs a second type of ranking process for the image picturing the whiteboard, and wherein the first type of ranking process is different from the second type of ranking process.

Example J, the system as example I recites, wherein the first type of ranking process comprises: assigning respective scores to the quadrilateral candidates, wherein a score of an individual quadrilateral candidate is based, at least in part, on color contrast of a region at least partially bounding the individual quadrilateral candidate.

Example K, the system as example I recites, wherein the image is picturing the whiteboard, and wherein the second type of ranking process comprises: partitioning the image picturing the whiteboard into a plurality of grids； determining a color of each grid； calculating an intersection space of at least a portion of the quadrilateral candidates； determining a color of the intersection space； and determining a foreground of the image picturing the whiteboard and a background of the image picturing the whiteboard based, at least in part, on the color of each grid and the color of the intersection space.

Example L, the system as example K recites, wherein the foreground is inside a boundary of the whiteboard.

Example M, the system as example K recites, wherein the whiteboard includes stroke-marks, and wherein the second type of ranking process further comprises: ranking the plurality of quadrilateral candidates based, at least in part, on the number of the stroke-marks within the respective quadrilateral candidates that are in the foreground of the image picturing the whiteboard； and determining a boundary of the whiteboard in the image based, at least in part, on the ranking.

Example N, the system as example K recites, wherein the image picturing the whiteboard comprises a color image, the whiteboard includes stroke-marks, and the second type of ranking process further comprises: converting the color image to a gray-scale image that includes at least a portion of the stroke-marks； applying a difference-of-Gaussian operation to the gray-scale image to generate a difference-of-Gaussian image； and applying threshold criteria and a flood-fill operation to the difference-of-Gaussian image to reduce the number of stroke-marks.

Example O, the system as example H recites, wherein the modules further comprise: a line filter module to reduce the number of the line segments based, at least in part, on (i) angles of the line segments with respect to one another and (ii) location of the line segments in the image.

Example P, a method comprising: receiving an image that includes a region picturing a whiteboard； detecting a plurality of quadrilaterals in the image； partitioning the image into a plurality of grids； determining a color of each grid； calculating an intersection space of at least a portion of the plurality of quadrilaterals； determining a color of the intersection space； and determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the region picturing the whiteboard.

Example Q, the method as example P recites, wherein the whiteboard includes stroke-marks, and wherein the acts further comprise: ranking the plurality of quadrilaterals based, at least in part, on the number of the stroke-marks within the respective quadrilaterals that are in the foreground of the image； and determining a boundary of the region picturing the whiteboard based, at least in part, on the ranking.

Example R, the method as example P recites, wherein the acts further comprise: partitioning the image into color channels so that each color channel comprises a component color image of the image； detecting edges of each component color image so as to generate a plurality of lines； and based, at least in part, on predetermined criteria, selecting a subset of the plurality of lines to form the plurality of quadrilaterals.

Example S, the method as example R recites, wherein the predetermined criteria are based, at least in part, on (i) angles of the lines with respect to one another and (ii) location of the lines in the image.

Example T, the method as example P recites, wherein the at least a portion of the plurality of quadrilaterals comprises a sub-optimal subset of quadrilaterals extracted from the plurality of quadrilaterals.

CONCLUSION

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and steps are disclosed as example forms of implementing the claims.

All of the methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose computers or processors. The code modules may be stored in any type of computer-readable medium, computer storage medium, or other computer storage device. Some or all of the methods may alternatively be embodied in specialized computer hardware.

Conditional language such as, among others, "can, " "could, " "may" or "may, " unless specifically stated otherwise, are understood within the context to present that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that certain features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether certain features, elements and/or steps are included or are to be performed in any particular example.

Conjunctive language such as the phrase “at least one of X, Y or Z, ” unless specifically stated otherwise, is to be understood to present that an item, term, etc. may be either X, Y, or Z, or a combination thereof.

Any routine descriptions, elements or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternate implementations are included within the scope of the examples described herein in which elements or functions may be deleted, or executed out of order from that shown or discussed, including substantially synchronously or in reverse order, depending on the functionality involved as would be understood by those skilled in the art.

It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims

A method for image processing, the method comprising:

receiving an image that includes a region picturing a whiteboard；

detecting a plurality of quadrilaterals in the image；

partitioning the image into a plurality of grids；

determining a color of each grid；

calculating an intersection space of at least a portion of the plurality of quadrilaterals；

determining a color of the intersection space； and

determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the region picturing the whiteboard.
The method of claim 1, wherein the region picturing the whiteboard includes stroke-marks, and the method further comprises:

ranking the plurality of quadrilaterals based, at least in part, on the number of the stroke-marks within the respective quadrilaterals that are in the foreground of the image； and

determining a boundary of the region picturing the whiteboard based, at least in part, on the ranking.
The method of claim 1, wherein detecting the plurality of quadrilaterals in the image comprises:

partitioning the image into color channels so that each color channel comprises a component color of the image；

detecting edges of each component color image so as to generate a plurality of lines； and

based, at least in part, on predetermined criteria, selecting a subset of the plurality of lines to form the plurality of quadrilaterals.
The method of claim 3, wherein the predetermined criteria are based, at least in part, on (i)angles of the lines with respect to one another and (ii) location of the lines in the image.
The method of claim 3, wherein the plurality of lines comprises line pairs, and the method further comprises:

classifying each of the lines pairs into one of three line-pair types based, at least in part, on an amount that lines of each line pair overlap one another, and wherein the predetermined criteria are based, at least in part, on a classification of the line pairs.
The method of claim 1, wherein the image is a color image, and wherein detecting the plurality of quadrilaterals in the image comprises:

converting the color image to a gray-scale image that includes at least a portion of the stroke-marks；

applying a difference-of-Gaussian operation to the gray-scale image to generate a difference-of-Gaussian image； and

applying threshold criteria and a Flood-fill operation to the difference-of-Gaussian image to reduce the number of stroke-marks.
The method of claim 1, wherein the at least a portion of the plurality of quadrilaterals comprises a sub-optimal subset of quadrilaterals extracted from the plurality of quadrilaterals.
A system comprising:

an input mechanism to receive an image picturing a document or a whiteboard；

one or more processing units； and

computer-readable media with modules thereon, the modules comprising:

a color partitioning module to partition the image picturing the document or the whiteboard into multiple color components；

an edge detecting module to detect edges in the image picturing the document or the whiteboard；

a line segment generator module to generate line segments based, at least in part, on the detected edges；

a quadrilateral generator module to generate quadrilateral candidates comprising a subset of the line segments； and

a ranking module to rank the quadrilateral candidates according to likelihood of the quadrilateral candidates being a boundary of the document or the whiteboard.
The system of claim 8, wherein the image is picturing one of the document or the whiteboard, wherein the ranking module performs a first type of ranking process for the image picturing the document and performs a second type of ranking process for the image picturing the whiteboard, and wherein the first type of ranking process is different from the second type of ranking process.
The system of claim 9, wherein the first type of ranking process comprises:

assigning respective scores to the quadrilateral candidates, wherein a score of an individual quadrilateral candidate is based, at least in part, on color contrast of a region at least partially bounding the individual quadrilateral candidate.
The system of claim 9, wherein the image is picturing the whiteboard, and wherein the second type of ranking process comprises:

partitioning the image picturing the whiteboard into a plurality of grids；

determining a color of each grid；

calculating an intersection space of at least a portion of the quadrilateral candidates；

determining a color of the intersection space； and

determining a foreground of the image picturing the whiteboard and a background of the image picturing the whiteboard based, at least in part, on the color of each grid and the color of the intersection space.
The system of claim 11, wherein the foreground is inside a boundary of the whiteboard.
The system of claim 11, wherein the whiteboard includes stroke-marks, and wherein the second type of ranking process further comprises:

ranking the plurality of quadrilateral candidates based, at least in part, on the number of the stroke-marks within the respective quadrilateral candidates that are in the foreground of the image picturing the whiteboard； and

determining a boundary of the whiteboard in the image based, at least in part, on the ranking.
The system of claim 11, wherein

the image picturing the whiteboard comprises a color image,

the whiteboard includes stroke-marks, and

the second type of ranking process further comprises:

converting the color image to a gray-scale image that includes at least a portion of the stroke-marks；

applying a difference-of-Gaussian operation to the gray-scale image to generate a difference-of-Gaussian image； and

applying threshold criteria and a flood-fill operation to the difference-of-Gaussian image to reduce the number of stroke-marks.
The system of claim 8, wherein the modules further comprise:

a line filter module to reduce the number of the line segments based, at least in part, on (i) angles of the line segments with respect to one another and (ii) location of the line segments in the image.
A method comprising:

receiving an image that includes a region picturing a whiteboard；

partitioning the image into color channels so that each color channel comprises a component color of the image；

detecting edges of each component color image so as to generate a plurality of lines；

detecting a plurality of quadrilaterals in the image, wherein each of the plurality of quadrilaterals comprise four lines that are a subset of the plurality of lines；

partitioning the image into a plurality of grids；

determining a color of each grid；

calculating an intersection space of at least a portion of the plurality of quadrilaterals；

determining a color of the intersection space； and

determining a foreground of the image and a background of the image based, at least in part, on the color of each grid and the color of the intersection space, wherein the foreground is inside a boundary of the region picturing the whiteboard.
The method of claim 16, wherein the whiteboard includes stroke-marks, and wherein the acts further comprise:

ranking the plurality of quadrilaterals based, at least in part, on the number of the stroke-marks within the respective quadrilaterals that are in the foreground of the image； and

determining a boundary of the region picturing the whiteboard based, at least in part, on the ranking.
The method of claim 16, wherein the acts further comprise:

partitioning the image into color channels so that each color channel comprises a component color image of the image；

detecting edges of each component color image so as to generate a plurality of lines； and

based, at least in part, on predetermined criteria, selecting a subset of the plurality of lines to form the plurality of quadrilaterals.
The method of claim 18, wherein the predetermined criteria are based, at least in part, on (i)angles of the lines with respect to one another and (ii) location of the lines in the image.
The method of claim 16, wherein the at least a portion of the plurality of quadrilaterals comprises a sub-optimal subset of quadrilaterals extracted from the plurality of quadrilaterals.