WO2018120238A1 - Device, method and graphical user interface for processing documents - Google Patents

Device, method and graphical user interface for processing documents

Info

Publication number
WO2018120238A1
WO2018120238A1 (application PCT/CN2016/113987)
Authority
WO
WIPO (PCT)
Prior art keywords
color
pixel
quadrilateral
edge
color channel
Prior art date
Application number
PCT/CN2016/113987
Other languages
English (en)
French (fr)
Inventor
张运超 (Zhang Yunchao)
郜文美 (Gao Wenmei)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP16924927.3A priority Critical patent/EP3547218B1/en
Priority to CN201680091829.4A priority patent/CN110100251B/zh
Priority to US16/473,678 priority patent/US11158057B2/en
Priority to PCT/CN2016/113987 priority patent/WO2018120238A1/zh
Publication of WO2018120238A1 publication Critical patent/WO2018120238A1/zh

Classifications

    • G06V 30/1607: Correcting image deformation, e.g. trapezoidal deformation caused by perspective
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06T 7/13: Edge detection
    • G06T 7/90: Determination of colour characteristics
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 30/18019: Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections, by matching or filtering
    • G06T 2207/10024: Color image
    • G06T 2207/30176: Document

Definitions

  • This document relates to electronic devices that process documents, including but not limited to electronic devices, methods, and graphical user interfaces for detecting the edges of documents.
  • For example, a mobile device with a camera function, such as a camera, mobile phone, wearable device or webcam.
  • People can use a mobile phone to photograph information such as whiteboards, slides or documents at any time, without having to copy this information by hand, which is very convenient.
  • Because of the shooting angle, an originally rectangular target may appear in the captured image as an arbitrary quadrilateral, such as a trapezoid; this distortion is called tilt distortion.
  • A quadrilateral detection algorithm mainly uses edge-extraction algorithms from computer vision to detect the rectangular edges of a target such as a document or a whiteboard, so that the non-target area outside the rectangular frame can be eliminated.
  • The rectangular region obtained by the quadrilateral detection algorithm is then usually projection-corrected with a quadrilateral correction algorithm to obtain a higher-quality target image.
  • One existing quadrilateral detection algorithm converts the color image into a single-channel grayscale image and then performs quadrilateral edge detection, ignoring the color and saturation information of the image. In scenes where the foreground and background colors are similar, document edges are detected poorly; even in some scenes where the human eye perceives a large contrast between foreground and background, the document edges remain difficult to detect.
  • Another available quadrilateral detection algorithm analyzes each color channel of the color image to determine a corresponding busyness indicator that represents the complexity of the image data.
  • The color channel with the lowest busyness is then selected to detect the document edges.
  • This scheme takes the color, saturation, and luminance information of the color image into account, but only the least complex channel is actually used when detecting the document edges, so in practice it reduces to a single-channel processing algorithm.
  • Moreover, the minimum-complexity principle presumes that the document edges are clearly distinguishable. Selecting the least busy channel can eliminate some non-real edge interference, but it cannot solve the problem of document edges that are not clearly distinguishable. For example, when a white document on a pure white table is detected, the selected color channel has the lowest complexity, yet few document edges are detected and the true quadrilateral cannot be determined.
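  • As a numerical aside (not part of the patent text), the grayscale limitation can be illustrated with a short script. Assuming OpenCV and NumPy, the illustrative colors below are chosen so that a saturated color patch and a neutral gray patch have almost the same luminance: the grayscale image then shows essentially no step at the boundary, while the saturation (S) channel still separates the two regions clearly.

```python
import cv2
import numpy as np

# Left half: a saturated color; right half: a neutral gray of similar luminance.
img = np.zeros((100, 200, 3), dtype=np.uint8)
img[:, :100] = (80, 50, 220)     # BGR: strong red, luminance ~104
img[:, 100:] = (104, 104, 104)   # neutral gray, luminance 104

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Step across the boundary seen by a single-channel grayscale detector ...
print("gray step:", int(gray[50, 99]) - int(gray[50, 100]))           # ~0
# ... versus the step seen by the saturation channel of a multi-color approach.
print("S-channel step:", int(hsv[50, 99, 1]) - int(hsv[50, 100, 1]))  # ~197
```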
  • Therefore, there is a need for an electronic device with an efficient method and interface for capturing or processing documents.
  • Such a method and interface can respond to the user's individual needs more quickly, more efficiently, and more intelligently, and improve the success rate with which the electronic device detects document edges, thereby avoiding situations in which the electronic device cannot detect the edge of the document at all.
  • the electronic device is portable (eg, a laptop, tablet, or handheld device, or a wearable device).
  • the electronic device has a touchpad.
  • the electronic device has a touch sensitive display (also referred to as a "touch screen", a “touch screen display” or a “display with a touch-sensitive surface”).
  • the electronic device has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing a plurality of functions.
  • GUI graphical user interface
  • the user interacts with the GUI primarily through finger contact and/or gestures on the touch-sensitive surface.
  • These functions may include image editing, drawing, rendering, word processing, web page creation, disk editing, spreadsheet creation, playing games, making calls, video conferencing, emailing, instant messaging, exercise support, digital photography, digital video recording, web browsing, digital music playback, and/or digital video playback. Executable instructions for performing these functions can be included in a non-transitory computer readable storage medium or other computer program product configured for execution by one or more processors.
  • the first aspect provides an electronic device.
  • the device includes: a display; one or more processors; a memory; a plurality of applications; and one or more programs.
  • the one or more programs are stored in memory and configured to be executed by one or more processors.
  • the one or more programs include instructions.
  • The instructions are configured to: acquire multi-color channel data of each pixel of a color image, where the multi-color channel data includes the two-dimensional coordinate values of the pixel and the values of the pixel in the respective color channels; perform line detection on the multi-color channel data of each pixel of the color image; and detect a quadrilateral based on some or all of the straight lines obtained by the line detection and a preset condition, the preset condition including: the angle between opposite sides of the quadrilateral is smaller than a first threshold, the corner angles of the quadrilateral lie within a preset angle range, and the distance between opposite sides of the quadrilateral is greater than a second threshold, where the first threshold is an integer greater than zero and the second threshold is an integer greater than zero.
  • Optionally, preprocessing is performed on the color image, including but not limited to color space conversion and/or histogram equalization.
  • The preset condition further includes: the ratio of the number of actually detected edge pixels to the quadrilateral perimeter is the largest. The quadrilateral detected in this way is closer to the contour formed by the actual pixel points.
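  • The following is a minimal sketch (not the patent's code) of how the preset conditions above could be checked for a candidate quadrilateral: opposite sides roughly parallel (their angle difference below a first threshold), corner angles inside a preset range, and opposite sides farther apart than a second threshold. All threshold values here are illustrative assumptions.

```python
import math

ANGLE_DIFF_MAX_DEG = 15.0           # "first threshold" (assumed value)
CORNER_RANGE_DEG = (60.0, 120.0)    # preset corner-angle range (assumed)
MIN_OPPOSITE_DIST = 40.0            # "second threshold" in pixels (assumed)

def _direction(p, q):
    """Direction of segment p->q in degrees, folded into [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0

def _corner(a, b, c):
    """Interior angle at b, in degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def is_candidate_quad(pts):
    """pts: four corners in order (top-left, top-right, bottom-right, bottom-left)."""
    ang_diff = lambda a, b: min(abs(a - b), 180.0 - abs(a - b))
    top, bottom = _direction(pts[0], pts[1]), _direction(pts[3], pts[2])
    left, right = _direction(pts[0], pts[3]), _direction(pts[1], pts[2])
    if ang_diff(top, bottom) > ANGLE_DIFF_MAX_DEG or ang_diff(left, right) > ANGLE_DIFF_MAX_DEG:
        return False                                    # opposite sides not parallel enough
    for i in range(4):                                  # corner angles within the preset range
        ang = _corner(pts[i - 1], pts[i], pts[(i + 1) % 4])
        if not CORNER_RANGE_DEG[0] <= ang <= CORNER_RANGE_DEG[1]:
            return False
    mid = lambda p, q: ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    d1 = math.dist(mid(pts[0], pts[1]), mid(pts[3], pts[2]))   # top-to-bottom distance
    d2 = math.dist(mid(pts[0], pts[3]), mid(pts[1], pts[2]))   # left-to-right distance
    return d1 > MIN_OPPOSITE_DIST and d2 > MIN_OPPOSITE_DIST
```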
  • The instructions are further used to:
  • Perform preprocessing on the color image before acquiring the multi-color channel data of each pixel, the preprocessing including at least one of color space conversion or histogram equalization.
  • Adding the color space conversion and/or histogram equalization steps can further reduce the probability of false detections in edge detection.
  • Histogram equalization is a method of improving image contrast and can improve the detection rate of weak edges.
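  • A brief sketch of this preprocessing, assuming OpenCV: convert the frame to HSV and apply histogram equalization to the V (brightness) channel, which tends to raise the contrast of weak document edges. The function name and the choice of equalizing only the V channel are illustrative, not prescribed by the text above.

```python
import cv2

def preprocess(bgr_frame):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)   # color space conversion
    h, s, v = cv2.split(hsv)
    v_eq = cv2.equalizeHist(v)                         # histogram equalization on V
    return cv2.merge([h, s, v_eq])                     # equalized multi-color-channel image
```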
  • The instructions are further used to:
  • Performing line detection on the multi-color channel data of each pixel of the color image includes:
  • S1: calculating, for each color channel, the gradient value and gradient direction of every pixel in the color image;
  • S2: marking as the first state each pixel whose maximum gradient value over all color channels is greater than a third threshold;
  • S3: selecting, from the pixels marked as the first state, the pixel with the maximum gradient value over all color channels as a starting point, and performing line detection from that starting point in every color channel to obtain the straight lines corresponding to all the color channels; taking the longest of these straight lines, adding it to the candidate document edge list, and marking the pixels on the longest straight line as the second state in all color channels;
  • Step S3 is repeated until all pixels of the color image are marked as the second state.
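  • A minimal sketch of steps S1 and S2, assuming OpenCV/NumPy, Sobel gradients and an illustrative third threshold: compute the gradient value and direction of every pixel in each color channel, then mark as "first state" the pixels whose maximum gradient over all channels exceeds the threshold.

```python
import cv2
import numpy as np

THIRD_THRESHOLD = 40.0  # assumed value

def mark_first_state(hsv_img):
    mags, dirs = [], []
    for channel in cv2.split(hsv_img):                 # S1: one pass per color channel
        gx = cv2.Sobel(channel, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(channel, cv2.CV_64F, 0, 1, ksize=3)
        mags.append(np.hypot(gx, gy))                  # gradient magnitude
        dirs.append(np.arctan2(gy, gx))                # gradient direction (radians)
    mags = np.stack(mags)                              # shape: (channels, H, W)
    dirs = np.stack(dirs)
    max_mag = mags.max(axis=0)                         # maximum gradient over all channels
    first_state = max_mag > THIRD_THRESHOLD            # S2: mark strong-gradient pixels
    return first_state, mags, dirs
```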
  • The instructions are further used to:
  • Selecting, from the pixels marked as the first state, the pixel with the maximum gradient value over all color channels as a starting point and performing line detection in all color channels to obtain the straight line corresponding to each color channel includes: for each of the starting point's color channels, searching for pixels marked as the first state along the direction perpendicular to the gradient direction of the starting point, until the number of pixels on the search path that are marked as the second state is greater than a third threshold; and determining the straight line obtained by the line detection, where the two end points of the straight line are respectively the starting point and the end point of the search path, and the number of pixels on the search path that are marked as the first state is greater than a fourth threshold.
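  • The search itself could look roughly like the sketch below (a simplification, not the patent's exact procedure): walk from the starting point along the perpendicular of its gradient direction, collect first-state pixels, and stop once too many second-state pixels have been crossed. The state arrays, the single walking direction, and both threshold values are illustrative assumptions.

```python
import numpy as np

def grow_line(start, grad_dir, first_state, second_state,
              third_threshold=5, fourth_threshold=20):
    """Walk from `start` (x, y) perpendicular to its gradient direction,
    returning (start, end, hits) for an accepted candidate segment, else None.
    A fuller implementation would also walk in the opposite direction."""
    h, w = first_state.shape
    step = np.array([np.cos(grad_dir + np.pi / 2.0), np.sin(grad_dir + np.pi / 2.0)])
    pos = np.array(start, dtype=float)
    end, hits, misses = tuple(start), 0, 0
    while True:
        pos += step
        x, y = int(round(pos[0])), int(round(pos[1]))
        if not (0 <= x < w and 0 <= y < h):
            break                                      # left the image
        if second_state[y, x]:
            misses += 1
            if misses > third_threshold:               # too many already-used pixels
                break
        if first_state[y, x]:
            hits += 1
            end = (x, y)                               # extend the candidate end point
    # accept the segment only if it contains enough first-state pixels
    return (tuple(start), end, hits) if hits > fourth_threshold else None
```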
  • The instructions are further used to: performing line detection on the multi-color channel data of each pixel of the color image includes: performing Canny edge detection on all color channels of the color image, marking the pixels detected as edges in any of the color channels as edge points so as to construct a multi-color-channel blended edge map; and performing Hough line detection on the edge points of the multi-channel blended edge map and adding the detected straight lines to the candidate document edge list.
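  • A compact sketch of this alternative, assuming OpenCV: run Canny edge detection on every color channel, OR the per-channel edge maps into one blended edge image, then run probabilistic Hough line detection on the blended edges. The Canny and Hough parameters are illustrative.

```python
import cv2
import numpy as np

def candidate_lines(hsv_img):
    blended = np.zeros(hsv_img.shape[:2], dtype=np.uint8)
    for channel in cv2.split(hsv_img):
        edges = cv2.Canny(channel, 50, 150)            # per-channel Canny edges
        blended = cv2.bitwise_or(blended, edges)       # an edge point in any channel counts
    lines = cv2.HoughLinesP(blended, 1, np.pi / 180.0, 80,
                            minLineLength=60, maxLineGap=10)
    return [] if lines is None else [tuple(l[0]) for l in lines]   # (x1, y1, x2, y2)
```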
  • The instructions are further used to: performing line detection on the multi-color channel data of each pixel of the color image includes retaining, for color channel i, m·f(x(i)) straight lines in the candidate document edge list, where f(x(i)) is a value between 0 and 1 and m is the number of candidate document edges.
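  • A hedged sketch of this retention rule: for each color channel i, keep m·f(x(i)) lines. How f(x(i)) is computed is not spelled out in the passage above, so the per-channel weights are simply taken as inputs here.

```python
def retain_lines(lines_per_channel, weights, m):
    """lines_per_channel: {channel: [line, ...]} with lines sorted best-first.
    weights: {channel: f_i in [0, 1]}.  Returns the merged candidate edge list."""
    candidates = []
    for ch, lines in lines_per_channel.items():
        keep = int(round(m * weights.get(ch, 0.0)))    # m * f(x(i)) lines for channel i
        candidates.extend(lines[:keep])
    return candidates
```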
  • The instructions are further used to detect the quadrilateral based on some or all of the straight lines obtained by the line detection and the preset condition, as follows: the candidate straight lines obtained by the line detection are divided into four categories according to tilt angle and position, the four categories being top, bottom, left and right; straight lines are cyclically selected, one from each category, and a set of quadrilaterals satisfying the preset condition is constructed; the quadrilateral with the largest ratio is then selected from the set of quadrilaterals as the result of edge detection, the ratio being the number of actually detected edge pixels divided by the perimeter of the fitted quadrilateral.
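  • The selection step could be sketched as follows (a simplification under stated assumptions): enumerate one candidate line per category, keep only combinations accepted by a preset-condition check such as the `is_candidate_quad` sketch earlier, and score each quadrilateral by the number of actually detected edge pixels divided by its fitted perimeter. The `is_valid` and `edge_pixel_count` callables are assumed helpers.

```python
import itertools
import math

def intersect(l1, l2):
    """Intersection point of two infinite lines, each given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None                                    # parallel lines
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def best_quadrilateral(tops, bottoms, lefts, rights, is_valid, edge_pixel_count):
    best, best_ratio = None, 0.0
    for t, b, l, r in itertools.product(tops, bottoms, lefts, rights):
        corners = [intersect(t, l), intersect(t, r), intersect(b, r), intersect(b, l)]
        if any(c is None for c in corners) or not is_valid(corners):
            continue                                   # fails the preset conditions
        perimeter = sum(math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4))
        ratio = edge_pixel_count(corners) / perimeter  # detected edge pixels / fitted perimeter
        if ratio > best_ratio:
            best, best_ratio = corners, ratio
    return best
```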
  • The instructions are further configured to perform at least one of the following kinds of processing on the detected quadrilateral: estimation of the quadrilateral's original aspect ratio; attitude/projection matrix estimation; quadrilateral correction; or image enhancement.
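  • For the quadrilateral correction step, a brief sketch assuming OpenCV: once the document corners and an estimate of the original aspect ratio are available, a projection (homography) matrix is estimated and the region is warped back to a rectangle. The aspect-ratio estimate and output height are taken as inputs here.

```python
import cv2
import numpy as np

def correct_quadrilateral(image, corners, aspect_ratio, out_height=1000):
    """corners: top-left, top-right, bottom-right, bottom-left (pixel coordinates)."""
    out_w = int(round(out_height * aspect_ratio))
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_height - 1], [0, out_height - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)          # attitude/projection matrix
    return cv2.warpPerspective(image, M, (out_w, out_height))
```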
  • The second aspect provides a method for an electronic device, the method comprising: acquiring multi-color channel data of each pixel of a color image; performing line detection on the multi-color channel data of each pixel; and detecting a quadrilateral based on some or all of the straight lines obtained by the line detection and a preset condition, the preset condition including: the angle between opposite sides of the quadrilateral is smaller than a first threshold, the corner angles of the quadrilateral lie within a preset angle range, and the distance between opposite sides of the quadrilateral is greater than a second threshold, where the first threshold is an integer greater than zero and the second threshold is an integer greater than zero.
  • The preset condition further includes: the ratio of the number of actually detected edge pixels to the quadrilateral perimeter is the largest. The quadrilateral detected in this way is closer to the contour formed by the actual pixel points.
  • The method further includes:
  • Performing preprocessing on the color image before acquiring the multi-color channel data of each pixel, the preprocessing including at least one of color space conversion or histogram equalization.
  • Adding the color space conversion and/or histogram equalization steps can further reduce the probability of false detections in edge detection.
  • The method further includes:
  • Performing line detection on the multi-color channel data of each pixel of the color image includes:
  • S1: calculating, for each color channel, the gradient value and gradient direction of every pixel in the color image;
  • S2: marking as the first state each pixel whose maximum gradient value over all color channels is greater than a third threshold;
  • S3: selecting, from the pixels marked as the first state, the pixel with the maximum gradient value over all color channels as a starting point, and performing line detection from that starting point in every color channel to obtain the straight lines corresponding to all the color channels; taking the longest of these straight lines, adding it to the candidate document edge list, and marking the pixels on the longest straight line as the second state in all color channels;
  • Step S3 is repeated until all pixels of the color image are marked as the second state.
  • Performing line detection on the multi-color channel data of each pixel of the color image includes: performing Canny edge detection on all color channels of the color image, marking the pixels detected as edges in any of the color channels as edge points so as to construct a multi-color-channel blended edge map; and performing Hough line detection on the edge points of the multi-channel blended edge map and adding the detected straight lines to the candidate document edge list.
  • The quadrilateral is detected based on some or all of the straight lines obtained by the line detection and the preset condition as follows: the candidate straight lines obtained by the line detection are divided into four categories according to tilt angle and position, the four categories being top, bottom, left and right; straight lines are cyclically selected, one from each category, and a set of quadrilaterals satisfying the preset condition is constructed; the quadrilateral with the largest ratio is then selected from the set of quadrilaterals as the result of edge detection, the ratio being the number of actually detected edge pixels divided by the perimeter of the fitted quadrilateral.
  • The method further includes:
  • Performing at least one of the following kinds of processing on the detected quadrilateral: estimation of the quadrilateral's original aspect ratio; attitude/projection matrix estimation; quadrilateral correction; or image enhancement.
  • an electronic device comprising: a display; one or more processors; a memory; a plurality of applications; and one or more programs.
  • One or more of the programs are stored in memory and configured to be executed by one or more processors.
  • the one or more programs include instructions for performing the method according to the second aspect.
  • the one or more programs include instructions that, when executed by an electronic device comprising a display and a plurality of applications, cause the electronic device to perform the method according to the second aspect.
  • the electronic device includes a display, a memory, a plurality of applications, and one or more processors for executing one or more programs stored in the memory.
  • the graphical user interface includes a user interface displayed in accordance with the method of the second aspect, wherein the display includes a touch sensitive surface and a display screen.
  • An electronic device is provided, including: a display, wherein the display includes a touch-sensitive surface and a display screen; a plurality of applications; and means or modules or units for performing the method according to the second aspect.
  • Electronic devices include displays and multiple applications.
  • the information processing apparatus comprises: means for performing the method according to the second aspect, wherein the display comprises a touch sensitive surface and a display screen.
  • the electronic device can improve the success rate of detecting the edge of the document when the document and the background are not clearly distinguished.
  • FIG. 1 is a block diagram showing an electronic device in accordance with some embodiments.
  • FIG. 2 illustrates an electronic device in accordance with some embodiments.
  • FIG. 3 is a block diagram of an electronic device in accordance with some embodiments.
  • FIG. 4a shows a schematic diagram of an electronic device invoking an imaging device to capture a document, in accordance with some embodiments.
  • FIG. 4b shows a schematic diagram of the quadrilateral region in which a source rectangular picture appears in an image captured by an electronic device, in accordance with some embodiments.
  • FIG. 5 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • FIG. 6 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 7 shows a schematic diagram of an RGB color space conversion to an HSV color space.
  • FIG. 8 shows a schematic diagram of the effect of performing histogram equalization.
  • Fig. 9 shows a schematic diagram of a gradient of an image and a gradient direction of an image.
  • Figure 10 shows a flow chart of Canny edge detection.
  • Figure 11 shows a schematic diagram of suppressing non-maximal values.
  • Fig. 12 shows a schematic diagram of Hough line detection.
  • Figure 13 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 14 shows a schematic diagram of a linear fusion of multi-color channels.
  • Fig. 15 is a view showing the effect of the linear fusion of the multi-color channels.
  • Figure 16 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 17 illustrates a flow diagram of a document edge detection algorithm, in accordance with some embodiments.
  • Figure 18 shows a schematic diagram of a linear fusion of multiple color channels.
  • Figure 19 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 20 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 21 illustrates a flow diagram for detecting an edge of a document, in accordance with some embodiments.
  • Figure 22 shows a functional block diagram of the processing unit.
  • Electronic devices need efficient methods and interfaces for responding to operations that capture or process a document.
  • Such a method and interface can respond to the user's individual needs more quickly, more efficiently, and more intelligently, and improve the success rate with which the electronic device detects document edges.
  • The embodiments described below describe a technical solution for improving the success rate with which an electronic device detects document edges when the captured document is not well distinguished from the background.
  • the multi-color channels referred to herein include, but are not limited to, RGB (Red, Green, Blue) color channels, HSV (Hue, Saturation, Value) color channels, or HSL (Hue, Saturation, Lightness) color channels.
  • the multi-color channel is described by taking the HSV color channel as an example, but this does not constitute a limitation of the present invention.
  • The multi-color channel may also adopt the RGB three-color channels, the HSL three-color channels, or other types of multi-color channels.
  • The mathematical expressions of the multi-color channel data referred to herein include but are not limited to: (i, j, h, s, v), or (i, j, h), (i, j, s) and (i, j, v).
  • Here (i, j) are the coordinates of a pixel in the color image, and h, s, and v are the values of the pixel at (i, j) on the Hue, Saturation, and Value color channels, respectively.
  • The multi-color channel data can also be represented channel by channel: the pixel at coordinates (i, j) has the value (i, j, h) in the Hue channel, (i, j, s) in the Saturation channel, and (i, j, v) in the Value channel, i.e., the three tuples (i, j, h), (i, j, s) and (i, j, v).
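  • A small sketch of this representation, assuming OpenCV/NumPy: convert a color image to HSV and read, for each pixel coordinate (i, j), its values on the Hue, Saturation and Value channels, yielding the (i, j, h, s, v) tuples described above.

```python
import cv2

def multi_channel_data(bgr_image):
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    rows, cols = hsv.shape[:2]
    for i in range(rows):
        for j in range(cols):
            h, s, v = hsv[i, j]
            yield (i, j, int(h), int(s), int(v))       # coordinates plus per-channel values
```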
  • the color image may be a preview frame image captured by the electronic device through the camera, or may be a digital image saved by the electronic device.
  • the document can be any planar object having a rectangular edge, such as a business card, an ID card, a slide projection screen, and the like.
  • Figures 1, 2, 3 provide a description of an exemplary device.
  • For example, the first threshold could be named the second threshold, and the second threshold could be named the first threshold, without departing from the scope of the present invention.
  • Both the first threshold and the second threshold are thresholds; their values may or may not be equal.
  • The term "if" may be interpreted to mean "when" or "after" or "in response to determining" or "in response to detecting", depending on the context.
  • Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be interpreted to mean "when it is determined" or "in response to determining" or "when [the stated condition or event] is detected" or "in response to detecting [the stated condition or event]", depending on the context.
  • the device is a portable communication device, such as a mobile phone, that also includes other functions, such as a personal digital assistant and/or music player functionality.
  • Exemplary embodiments of the electronic device include, but are not limited to, electronic devices running any of a variety of operating systems.
  • Other electronic devices may also be used, such as a laptop or tablet with a touch-sensitive surface (eg, a touch screen display and/or a touchpad).
  • the device is not a portable communication device, but a desktop computer having a touch-sensitive surface (eg, a touch screen display and/or a touch pad).
  • an electronic device including a display and a touch-sensitive surface is described. It should be understood, however, that an electronic device can include one or more other physical user interface devices, such as a physical keyboard, mouse, and/or joystick.
  • Devices typically support a variety of applications, such as one or more of the following: drawing applications, rendering applications, word processing applications, web page creation applications, disk editing applications, spreadsheet applications, gaming applications, Telephony applications, video conferencing applications, email applications, instant messaging applications, workout support applications, photo management applications, digital camera applications, digital video camera applications, web browsing applications, digital music player applications Program, and/or digital video player application.
  • Various applications that can be executed on the device can use at least one shared physical user interface device, such as a touch-sensitive surface.
  • One or more functions of the touch-sensitive surface and corresponding information displayed on the device may be adjusted and/or changed from one application to the next and/or adjusted and/or varied within the respective application.
  • The device's shared physical architecture, such as a touch-sensitive surface, can support a variety of applications with a user interface that is intuitive to the user.
  • FIG. 1 is a block diagram showing a portable multifunction device 100 with a touch-sensitive display 112 in accordance with some embodiments.
  • Touch-sensitive display 112 is sometimes referred to as a "touch screen" for convenience, and may also be called a touch-sensitive display system or a display system having a touch-sensitive surface and a display.
  • the portable multifunction device 100 can include a memory 102 (which can include one or more computer readable storage media), a memory controller 122, one or more processing units (CPUs) 120, a peripheral device interface 118, RF circuitry 108, Audio circuitry 110, speaker 111, microphone 113, input/output (I/O) subsystem 106, other input control devices 116, and external port 124.
  • Portable multifunction device 100 can include one or more optical sensors 164. These components can communicate over one or more communication buses or signal lines 103.
  • The portable multifunction device 100 is just one example of an electronic device; the portable multifunction device 100 may have more or fewer components than those shown, may combine two or more components, or may have a different configuration or arrangement of these components.
  • the various components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • Memory 102 can include high speed random access memory and can also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid state memory devices. Access to memory 102 by other components of portable multifunction device 100, such as CPU 120 and peripheral interface 118, may be controlled by memory controller 122.
  • Peripheral device interface 118 can be used to couple the input and output peripherals of the device to CPU 120 and memory 102.
  • the one or more processors 120 execute or execute various software programs and/or sets of instructions stored in the memory 102 to perform various functions of the portable multifunction device 100 and process data.
  • the one or more processors 120 include an image signal processor and a dual or multi-core processor.
  • peripheral interface 118, CPU 120, and memory controller 122 can be implemented on a single chip, such as chip 104. In some other embodiments, they can be implemented on separate chips.
  • RF (radio frequency) circuitry 108 receives and transmits RF signals, also referred to as electromagnetic signals.
  • the RF circuitry 108 converts electrical signals into/from electromagnetic signals and communicates with communication networks and other communication devices via electromagnetic signals.
  • RF circuitry 108 may include well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, a Subscriber Identity Module (SIM) card, memory, and so forth.
  • The RF circuitry 108 can communicate by wireless communication with networks, such as the Internet (also known as the World Wide Web (WWW)), an intranet, and/or a wireless network (such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN)), as well as with other devices.
  • Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), High-Speed Downlink Packet Access (HSDPA), High-Speed Uplink Packet Access (HSUPA), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Wi-MAX, email protocols (e.g., Internet Message Access Protocol (IMAP) and/or Post Office Protocol (POP)), and instant messaging (e.g., Extensible Messaging and Presence Protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), and/or Instant Messaging and Presence Service (IMPS)).
  • Audio circuitry 110, speaker 111, and microphone 113 provide an audio interface between the user and the portable multifunction device 100.
  • Audio circuitry 110 receives audio data from peripheral interface 118, converts the audio data into electrical signals, and transmits the electrical signals to speaker 111.
  • the speaker 111 converts the electrical signal into a human audible sound wave.
  • the audio circuitry 110 also receives electrical signals converted by the microphone 113 based on the acoustic waves. Audio circuitry 110 converts the electrical signals into audio data and transmits the audio data to peripheral interface 118 for processing. Audio data may be retrieved from and/or transmitted to memory 102 and/or RF circuitry 108 by peripheral device interface 118.
  • audio circuitry 110 also includes a headset jack (eg, 212 in FIG. 2).
  • The headset jack provides an interface between the audio circuitry 110 and a removable audio input/output peripheral, such as an output-only headphone or a headset having both an output (e.g., a monaural or binaural headphone) and an input (e.g., a microphone).
  • the I/O subsystem 106 couples input/output peripherals on portable multifunction device 100, such as touch screen 112 and other input control devices 116, to peripheral device interface 118.
  • the I/O subsystem 106 can include a display controller 156 and one or more input controllers 160 for other input control devices.
  • The one or more input controllers 160 receive electrical signals from, and send electrical signals to, the other input control devices 116.
  • the other input control device 116 may include physical buttons (eg, a push button, rocker button, etc.), a dial, a slide switch, a joystick, a click wheel, and the like.
  • input controller 160 may be coupled to (or not coupled to) any of the following: a keyboard, an infrared port, a USB port, and a pointing device such as a mouse.
  • the one or more buttons may include up/down buttons for volume control of the speaker 111 and/or the microphone 113.
  • the one or more buttons can include a push button (eg, 206 in Figure 2).
  • Touch sensitive display 112 provides an input interface and an output interface between the device and the user.
  • Display controller 156 receives electrical signals from touch screen 112 and/or transmits electrical signals to touch screen 112.
  • Touch screen 112 displays a visual output to the user.
  • Visual output can include graphics, text, icons, video, and any combination thereof (collectively referred to as "graphics"). In some embodiments, some visual output or all of the visual output may correspond to a user interface object.
  • Touch screen 112 has a touch-sensitive surface, sensor, or set of sensors that accepts input from the user based on haptic and/or tactile contact.
  • Touch screen 112 and display controller 156 (along with any associated modules and/or sets of instructions in memory 102) detect contact on touch screen 112 (and any movement or interruption of the contact) and convert the detected contact to Interaction with user interface objects (eg, one or more soft keys, icons, web pages, or images) displayed on touch screen 112.
  • the point of contact between the touch screen 112 and the user corresponds to the user's finger.
  • Touch screen 112 may use LCD (Liquid Crystal Display) technology, LPD (Light Emitting Polymer Display) technology, or LED (Light Emitting Diode) technology, although other display technologies may be used in other embodiments.
  • Touch screen 112 and display controller 156 may detect contact and any movement or breaking thereof using any of a variety of touch-sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch screen 112. In an exemplary embodiment, a projected mutual capacitance sensing technique is used.
  • Touch screen 112 can have a video resolution in excess of 100 dpi. In some embodiments, the touch screen has a video resolution of approximately 160 dpi.
  • the user can contact the touch screen 112 using any suitable object or add-on such as a stylus, finger, or the like.
  • the user interface is designed to work primarily with finger-based contacts and gestures, which may be less accurate than the stylus-based input due to the larger contact area of the finger on the touch screen.
  • the device translates the finger-based coarse input into an accurate pointer/cursor position or command to perform the action desired by the user.
  • the portable multifunction device 100 can include a touchpad (not shown) for activating or deactivating a particular function.
  • the touchpad is a touch sensitive area of the device that is different from the touchscreen in that it does not display a visual output.
  • the touchpad can be a touch-sensitive surface that is separate from the touchscreen 112 or an extension of the touch-sensitive surface formed by the touchscreen.
  • the portable multifunction device 100 also includes a power system 162 for powering various components.
  • The power system 162 can include a power management system, one or more power sources (e.g., a battery, alternating current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a light-emitting diode (LED)), and any other components associated with the generation, management, and distribution of power in a portable device.
  • Portable multifunction device 100 may also include one or more optical sensors 164.
  • FIG. 1 shows an optical sensor coupled to optical sensor controller 158 in I/O subsystem 106.
  • Optical sensor 164 can include a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • Optical sensor 164 receives light projected through one or more lenses from the environment and converts the light into data representing an image.
  • In conjunction with imaging module 143 (also referred to as a camera module), optical sensor 164 can capture still images or video.
  • In some embodiments, one or more optical sensors are located on the rear of the portable multifunction device 100, opposite the touch screen display 112 on the front of the device, so that the touch screen display can be used as a viewfinder for still image and/or video image capture.
  • another or more optical sensors are located on the front of the device such that the user can view images of the user while viewing other video conferencing participants on the touch screen display for video conferencing.
  • Portable multifunction device 100 may also include one or more proximity sensors 166.
  • FIG. 1 shows a proximity sensor 166 coupled to a peripheral device interface 118.
  • proximity sensor 166 can be coupled to input controller 160 in I/O subsystem 106.
  • In some embodiments, when the electronic device is placed near the user's ear (e.g., when the user is making a phone call), the proximity sensor turns off and disables the touch screen 112.
  • Portable multifunction device 100 may also include one or more accelerometers 168.
  • FIG. 1 shows an accelerometer 168 coupled to a peripheral device interface 118.
  • accelerometer 168 can be coupled to input controller 160 in I/O subsystem 106.
  • the information is displayed in a portrait view or a landscape view on the touch screen display based on an analysis of the data received from the one or more accelerometers.
  • the portable multifunction device 100 optionally includes a magnetometer (not shown) and a GPS (or GLONASS or Beidou or other global navigation system) receiver (not shown) in addition to the accelerometer 168 for obtaining more information about the portable Information on the position and orientation (eg, portrait or landscape) of the functional device 100.
  • In some embodiments, the software components stored in memory 102 include an operating system 126, a communication module (or set of instructions) 128, a contact/movement module (or set of instructions) 130, a graphics module (or set of instructions) 132, a text input module (or set of instructions) 134, a GPS module (or set of instructions) 135, and applications (or sets of instructions) 136.
  • memory 102 stores device/global internal state 157, as shown in FIG.
  • the device/global internal state 157 includes one or more of the following: an active application state indicating which applications (if any) are currently active; a display state indicating which applications, views, or other Information occupies various areas of the touch screen display 112; sensor status, including information obtained from various sensors of the device and input control device 116; and position information regarding the position and posture of the device.
  • Operating system 126 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, ANDROID, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between the various hardware and software components.
  • In some embodiments, the memory 102 stores a digital camera roll 159 and a digital image pipeline 161.
  • Communication module 128 facilitates communication with other devices via one or more external ports 124, and also includes various software components for processing data received by RF circuitry 108 and/or external port 124.
  • External port 124 is, for example, a Universal Serial Bus (USB) port, FireWire, or the like.
  • the external port is adapted to be directly coupled to other devices or indirectly coupled through a network (eg, the Internet, a wireless LAN, etc.).
  • the external port is a multi-pin (eg, 30-pin) connector that is identical or similar to and/or compatible with a 30-pin connector used on an iPod (trademark of Apple Inc.) device.
  • the contact/movement module 130 can detect contact with the touch screen 112 (in conjunction with the display controller 156) and other touch sensitive devices (eg, a touchpad or physical click wheel).
  • The contact/movement module 130 includes a plurality of software components for performing various operations related to contact detection, such as determining whether contact has occurred (e.g., detecting a finger-down event), determining whether there is movement of the contact and tracking the movement across the touch-sensitive surface (e.g., detecting one or more finger-drag events), and determining whether the contact has ended (e.g., detecting a finger-up event or a break in contact).
  • Contact/movement module 130 receives contact data from the touch-sensitive surface.
  • Determining the movement of the contact point, which is represented by a series of contact data, may include determining the rate (magnitude), velocity (magnitude and direction), and/or acceleration (change in magnitude and/or direction) of the contact point. These operations can be applied to a single point of contact (e.g., one-finger contact) or to multiple simultaneous points of contact (e.g., "multi-touch"/multiple-finger contacts).
  • the contact/movement module 130 and the display controller 156 detect contact on the touchpad.
  • The contact/movement module 130 can detect a user's gesture input. Different gestures on the touch-sensitive surface have different contact patterns, so a gesture can be detected by detecting its specific contact pattern. For example, detecting a one-finger tap gesture includes detecting a finger-down event and then detecting a finger-up (lift-off) event at the same location (or substantially the same location) as the finger-down event (e.g., at the location of an icon). As another example, detecting a finger-swipe gesture on the touch-sensitive surface includes detecting a finger-down event, then detecting one or more finger-drag events, and then detecting a finger-up (lift-off) event.
  • Graphics module 132 includes a number of known software components for rendering and displaying graphics on touch screen 112 or other display, including components for changing the strength of the displayed graphics.
  • graphics includes any object that can be displayed to a user, including, without limitation, text, web pages, icons (such as user interface objects including soft keys), digital images, video, animation, and the like.
  • In some embodiments, graphics module 132 stores data representing graphics to be used. Each graphic can be assigned a corresponding code.
  • The graphics module 132 receives, from an application or the like, one or more codes specifying the graphics to be displayed, together with coordinate data and other graphic attribute data if necessary, and then generates screen image data to output to the display controller 156.
  • A text input module 134, which may be a component of the graphics module 132, provides a soft keyboard for entering text in a variety of applications (e.g., contacts 137, email 140, instant messaging 141, browser 147, and any other application requiring text input).
  • The GPS module 135 determines the location of the device and provides this information for use by various applications (e.g., to the phone 138 for location-based dialing, to the camera 143 as picture/video metadata, and to applications that provide location-based services, such as the weather desktop applet, the local yellow pages desktop applet, and the map/navigation desktop applet).
  • Application 136 may include the following modules (or sets of instructions) or subgroups or supersets thereof:
  • Contact module 137 (sometimes referred to as an address book or contact list);
  • Instant messaging (IM) module 141;
  • a camera module 143 for still images and/or video images
  • Image management module 144
  • Calendar module 148
  • a desktop applet module 149 which may include one or more of the following: weather desktop applet 149-1, stock market desktop applet 149-2, calculator desktop applet 149-3, alarm desktop applet 149-4 , the dictionary desktop applet 149-5, and other desktop applets obtained by the user, and the user-created desktop applet 149-6;
  • a desktop applet creator module 150 for generating a user-created desktop applet 149-6;
  • Video and music player module 152 which may be comprised of a video player module and a music player module;
  • Sound/audio recorder module 163; and/or
  • Notification module 165
  • Examples of other applications 136 that may be stored in memory 102 include other word processing applications, other image editing applications, drawing applications, rendering applications, JAVA enabled applications, encryption, digital rights management, voice recognition, And sound reproduction.
  • The contact module 137 can be used to manage an address book or contact list (e.g., stored in the application internal state 192 of the contact module 137 in memory 102 or memory 370), including: adding a name to the address book; deleting a name from the address book; associating a phone number, email address, physical address, or other information with a name; associating an image with a name; sorting and categorizing names; and providing a phone number or email address to initiate and/or facilitate communication by phone 138, video conference 139, email 140, or IM 141.
  • The telephone module 138 can be used to enter a sequence of characters corresponding to a telephone number, access one or more telephone numbers in the address book 137, modify a telephone number that has been entered, dial a corresponding telephone number, conduct a call, and disconnect or hang up when the call is completed.
  • wireless communication can use any of a variety of communication standards, protocols, and technologies.
  • the video conferencing module 139 includes executable instructions for initiating, conducting, and ending a video conference between the user and one or more other participants in accordance with user instructions.
  • The email client module 140 includes executable instructions for creating, sending, receiving, and managing email in response to user instructions.
  • the email client module 140 makes it very easy to create and send emails with still images or video images taken by the camera module 143.
  • The instant messaging module 141 includes executable instructions for entering a sequence of characters corresponding to an instant message, modifying previously entered characters, transmitting the corresponding instant message (for example, using the Short Message Service (SMS) or Multimedia Messaging Service (MMS) protocol for telephone-based instant messages, or using XMPP, SIMPLE, or IMPS for Internet-based instant messages), receiving instant messages, and viewing received instant messages.
  • the transmitted and/or received instant messages may include graphics, photos, audio files, video files, and/or other attachments supported in MMS and/or Enhanced Messaging Service (EMS).
  • instant messaging refers to both phone-based messages (eg, messages sent using SMS or MMS) and Internet-based messages (eg, messages transmitted using XMPP, SIMPLE, or IMPS).
  • The exercise support module 142 includes executable instructions for creating workouts (e.g., with time, distance, and/or calorie-consumption goals); communicating with exercise sensors (sports equipment); receiving workout sensor data; calibrating the sensors used to monitor workouts; selecting and playing music for workouts; and displaying, storing, and transmitting workout data.
  • The camera module 143 includes executable instructions for capturing still images or video (including a video stream) and storing them in memory 102 (e.g., in digital camera roll 159), modifying the characteristics of a still image or video, or deleting a still image or video from memory 102 (e.g., from digital camera roll 159).
  • The image management module 144 includes executable instructions for arranging, modifying (e.g., editing), or otherwise manipulating, labeling, deleting, presenting (e.g., in a digital slideshow or album), and storing still images and/or video images (including still images and/or video images stored in camera roll 159).
  • The browser module 147 includes executable instructions for browsing the Internet in accordance with user instructions, including searching for, linking to, receiving, and displaying web pages or portions thereof, as well as attachments and other files linked to web pages.
  • The calendar module 148 includes executable instructions for creating, displaying, modifying, and storing calendars and the data associated with calendars (e.g., calendar entries, to-do lists, etc.) in accordance with user instructions.
  • The desktop applet modules 149 are mini-applications that can be downloaded and used by the user (e.g., the weather desktop applet 149-1, the stock market desktop applet 149-2, the calculator desktop applet 149-3, the alarm desktop applet 149-4, and the dictionary desktop applet 149-5) or created by the user (e.g., the user-created desktop applet 149-6).
  • In some embodiments, a desktop applet includes an HTML (Hypertext Markup Language) file, a CSS (Cascading Style Sheets) file, and a JavaScript file.
  • In some embodiments, a desktop applet includes an XML (Extensible Markup Language) file and a JavaScript file (e.g., a Yahoo! desktop applet).
  • The desktop applet creator module 150 can be used by the user to create a desktop applet (for example, by turning a user-specified portion of a web page into a desktop applet).
  • The search module 151 includes executable instructions for searching memory 102, in accordance with user instructions, for text, music, sound, image, video, and/or other files that match one or more search criteria (e.g., one or more user-specified search terms).
  • The video and music player module 152 includes executable instructions that allow the user to download and play back recorded music and other sound files stored in one or more file formats (such as MP3 or AAC files), and executable instructions for displaying, presenting, or otherwise playing back video (e.g., on touch screen 112 or on an external display connected via external port 124).
  • device 100 may include the functionality of an MP3 player.
  • note pad module 153 includes executable instructions for creating and managing notes, to-do list, and the like in accordance with user instructions.
  • The map module 154 can be used to receive, display, modify, and store maps and data associated with maps (e.g., driving directions; data on stores and other points of interest at or near a particular location; and other location-based data) in accordance with user instructions.
  • The online video module 155 includes instructions that allow the user to access, browse, receive (e.g., by streaming and/or downloading), play back (e.g., on the touch screen or on an external display connected via external port 124), and send an email with a link to a particular online video.
  • the instant message module 141 is used instead of the email client module 140 to send a link to a particular online video.
  • The sound/audio recorder module 163 includes executable instructions that allow the user to record audio (e.g., sound) in one or more file formats (such as MP3 or AAC files), and executable instructions for presenting or otherwise playing back the recorded audio files.
  • The notification module 165 includes executable instructions for displaying notifications or alerts (such as incoming messages or incoming calls, calendar event reminders, application events, and so on) on touch screen 112.
  • Each of the above-identified modules and applications corresponds to a set of executable instructions for performing one or more of the functions described above and the methods described in this application (e.g., the computer-implemented methods and other information processing methods described herein).
  • memory 102 can store a subset of the modules and data structures described above.
  • memory 102 can store additional modules and data structures not described above.
  • portable multifunction device 100 is a device in which the operation of a predefined set of functions on the device is performed exclusively through a touch screen and/or a touchpad.
  • by using the touch screen and/or the touchpad as the primary input control device for the operation of portable multifunction device 100, the number of physical input control devices (such as push buttons, dials, and the like) on portable multifunction device 100 can be reduced.
  • the predefined set of functions that can be performed exclusively through the touch screen and/or touchpad includes navigation between user interfaces.
  • in some embodiments, when the touchpad is touched by the user, portable multifunction device 100 is navigated from any user interface displayable on the device to a main, home, or root menu.
  • in such embodiments, the touchpad can be referred to as a "menu button".
  • the menu button can be a physical push button or other physical input control device instead of a touchpad.
  • FIG. 2 illustrates a portable multifunction device 100 having a touch screen 112 in accordance with some embodiments.
  • the touch screen can display one or more graphics within a user interface (UI) 200.
  • the user may select one or more of the graphics by making a gesture on the graphics, for example, with one or more fingers 202 (not drawn to scale in the figure) or one or more styluses 203 (not drawn to scale in the figure).
  • the selection of one or more graphics occurs when the user interrupts contact with the one or more graphics.
  • the gesture may include one or more taps, one or more swipes (from left to right, from right to left, upward and/or downward), and/or a flick of a finger that has made contact with portable multifunction device 100 (from right to left, from left to right, upward and/or downward).
  • unintentional contact with a graphic does not select the graphic. For example, when the gesture corresponding to selection is a tap, a swipe gesture that sweeps over an application icon does not select the corresponding application.
  • Portable multifunction device 100 may also include one or more physical buttons, such as a "home screen" or menu button 204.
  • menu button 204 can be used to navigate to any of a set of applications that can run on device 100.
  • the menu button is implemented as a soft key displayed in a GUI on touch screen 112.
  • portable multifunction device 100 includes touch screen 112, menu button 204, push button 206 for powering the device on/off and locking the device, volume adjustment button(s) 208, subscriber identity module (SIM) card slot 210, headset jack 212, and docking/charging external port 124.
  • push button 206 can be used to power the device on and off by pressing the button and holding it in the depressed state for a predefined time interval; to lock the device by pressing the button and releasing it before the predefined time interval has elapsed; and/or to unlock the device or initiate an unlocking process.
  • portable multifunction device 100 may also accept voice input for activating or deactivating certain functions via microphone 113.
  • Device 300 need not be portable.
  • device 300 is a laptop computer, desktop computer, tablet computer, multimedia player device, navigation device, educational device (such as a children's learning toy), game system, or control device (for example, a home or industrial controller).
  • Device 300 typically includes one or more processing units (CPUs) 310, one or more network or other communication interfaces 360, memory 370, and one or more communication buses 320 for interconnecting these components.
  • in some embodiments, processing unit 310 includes an image signal processor and a dual-core or multi-core processor.
  • Communication bus 320 may include circuitry (sometimes referred to as a chipset) that interconnects system components and controls communication between system components.
  • Device 300 includes an input/output (I/O) interface 330 having a display 340, which is typically a touch screen display.
  • the I/O interface 330 may also include a keyboard and/or mouse (or other pointing device) 350 and a trackpad 355.
  • Device 300 also includes an optical sensor 164 and an optical sensor controller 158.
  • Memory 370 includes high speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and may include nonvolatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, Or other non-volatile solid-state storage devices.
  • memory 370 can include one or more storage devices remotely located from CPU 310.
  • memory 370 stores programs, modules, and data structures similar to the programs, modules, and data structures stored in memory 102 of electronic device 100 (FIG. 1), or a subset thereof. Moreover, memory 370 can store additional programs, modules, and data structures that are not present in memory 102 of electronic device 100.
  • the memory 370 of the device 300 can store the drawing module 380, the rendering module 382, the word processing module 384, the web page creation module 386, the disk editing module 388, and/or the spreadsheet module 390, while memory 102 of electronic device 100 (FIG. 1) may not store these modules.
  • Each of the above identified components in Figure 3 can be stored in one or more of the aforementioned memory devices.
  • Each of the above identified modules corresponds to a set of instructions for performing the functions described above.
  • the above identified modules or programs (ie, sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments.
  • memory 370 can store a subset of the modules and data structures described above.
  • memory 370 can store additional modules and data structures not described above.
  • the electronic device referred to herein includes the portable multifunction device 100 of FIGS. 1 and 2 and the device 300 of FIG. 3.
  • the focus is now directed to an embodiment of detecting a document edge that can be implemented on an electronic device, such as device 300 or electronic device 100.
  • FIGS. 4a and 4b show a schematic diagram of a usage scenario of an electronic device.
  • FIG. 4a shows the electronic device invoking the camera to photograph a document; the captured picture is generally larger than the source rectangular picture of interest (such as the document) and is tilted at a certain angle.
  • a unit inside the electronic device processes the captured picture in real time and outputs, in the form of ordered coordinates, the quadrilateral formed by the four vertices of the source rectangular picture of interest.
  • FIG. 4b shows an example of the quadrilateral region in which the source rectangular picture is located within the captured image.
  • the vertices of the quadrilateral in which the source rectangular picture is located are marked with circles; the quadrilateral occupies only a part of the captured image and is tilted.
  • in the following, the technical solution is described using the captured image as the input and the quadrilateral formed by the four detected corner points as the output.
  • FIG. 5 illustrates a process or algorithm for detecting a document edge, in accordance with some embodiments.
  • the electronic device acquires multi-color channel data of each pixel of a color image, where the multi-color channel data includes two-dimensional coordinate values of the pixel points, and values of the pixel points in the respective color channels;
  • the electronic device performs line detection on the multi-color channel data of each pixel of the color image;
  • the electronic device detects a quadrilateral based on some or all of the straight lines obtained by performing the line detection and on a preset condition, where the preset condition includes that the angle between opposite sides of the quadrilateral is smaller than a first threshold, the angle between adjacent sides of the quadrilateral falls within a preset angle range, and the distance between opposite sides of the quadrilateral is greater than a second threshold, the first threshold being an integer greater than zero and the second threshold being an integer greater than zero.
  • the preset condition further includes that the ratio of the number of actually detected edge pixels to the perimeter of the quadrilateral is the largest.
  • compared with FIG. 5, FIG. 6 adds two optional steps, 101 and 102.
  • steps 101 and 102 can be selected independently, or both can be selected.
  • the electronic device performs color space conversion on the input color image.
  • the electronic device performs histogram equalization on the color image to be processed.
  • for step 101, a schematic diagram of performing color space conversion on the input color image is shown in FIG. 7, which illustrates conversion from the RGB color space to the HSV color space.
  • the color image input to step 101 may be an image captured by the photographing apparatus of the electronic device, an image saved on the electronic device, or a color image on which histogram equalization has already been performed.
  • it is assumed that the color image input to step 101 consists of RGB color channels, whereas the HSV color space best matches the characteristics of human vision; detecting document edges on the HSV channels keeps the edge detection algorithm consistent with human perception, that is, the algorithm can detect the edge information perceived by the human eye.
  • a color space conversion algorithm can be used to convert the RGB color channels into the HSV color channels (see the sketch after this list).
  • R, G, and B respectively denote the Red, Green, and Blue channel values of the RGB color space.
  • H, S, and V respectively denote the Hue, Saturation, and Value channel values of the HSV color space.
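  • A minimal illustrative Python sketch of this conversion is given below (it is not part of the original disclosure). It follows the conversion steps (1)-(9) listed later in the description, assumes floating-point R, G, B values normalized to [0, 1], and adds guards for grey pixels (max = min) and black pixels (max = 0) that the listed steps leave implicit; in practice a library routine such as OpenCV's cv2.cvtColor(image, cv2.COLOR_BGR2HSV) would typically be used instead of a per-pixel loop.

    def rgb_to_hsv(r, g, b):
        """r, g, b in [0, 1]; returns (h, s, v) with h in degrees [0, 360)."""
        c_max = max(r, g, b)
        c_min = min(r, g, b)
        delta = c_max - c_min
        if delta == 0:            # grey pixel: hue undefined, use 0 by convention
            h = 0.0
        elif c_max == r:
            h = (g - b) / delta
        elif c_max == g:
            h = 2.0 + (b - r) / delta
        else:                     # c_max == b
            h = 4.0 + (r - g) / delta
        h *= 60.0
        if h < 0.0:
            h += 360.0
        v = c_max                                 # Value
        s = 0.0 if c_max == 0 else delta / c_max  # Saturation
        return h, s, v

    # Example: a saturated orange pixel
    print(rgb_to_hsv(1.0, 0.5, 0.0))  # approximately (30.0, 1.0, 1.0)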
  • the color image to be processed may be an image captured by the camera of the electronic device, or may be an image saved by the electronic device.
  • in that case, optional step 101 is effectively omitted and optional step 102 is selected directly, or step 102 is performed before step 101.
  • if step 102 is performed after step 101, the image to be processed is the color image obtained after the color space conversion. The effect of performing histogram equalization is shown in FIG. 8.
  • histogram equalization changes the distribution histogram of one channel of the original image from a relatively concentrated range into a uniform distribution over the whole range of values, stretching the image non-linearly and thereby increasing the overall contrast of the image.
  • the following conditions are generally met during histogram equalization: (1) however the pixels are mapped, the original ordering must be preserved, that is, brighter regions remain brighter and darker regions remain darker, only with increased contrast; (2) for an 8-bit image, the value range of the pixel mapping function must stay between 0 and 255 and cannot go out of bounds; (3) pixel values are distributed uniformly by means of the cumulative distribution function. A minimal sketch follows.
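  • As a minimal illustrative sketch (not part of the original disclosure), the following Python/NumPy function equalizes one 8-bit channel through its cumulative distribution function while respecting the three conditions above; the handling of a constant-valued channel is an added assumption.

    import numpy as np

    def equalize_channel(channel):
        """channel: 2-D uint8 array; returns the equalized channel (uint8)."""
        hist = np.bincount(channel.ravel(), minlength=256)
        cdf = hist.cumsum()
        cdf_min = cdf[cdf > 0][0]                  # first non-zero CDF value
        denom = max(channel.size - cdf_min, 1)     # guard for constant images
        # Monotonic mapping into [0, 255]: ordering preserved, no out-of-bounds values.
        lut = np.clip(np.round((cdf - cdf_min) * 255.0 / denom), 0, 255).astype(np.uint8)
        return lut[channel]

    # For a single 8-bit channel this is essentially the mapping cv2.equalizeHist performs.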
  • step 105 includes, but is not limited to, the following three approaches:
  • Step 1053a is repeatedly performed until all pixel points of the color image are marked as the second state.
  • the gradient is essentially the derivative of the two-dimensional discrete function represented by the image, reflecting how the image intensity changes.
  • the gradient includes two dimensions: gradient value and gradient direction.
  • the gradient value reflects the specific magnitude of the image intensity change, while the gradient direction refers to the direction in which the image intensity changes the most.
  • similarly, the Saturation channel gradient of the image at pixel (i, j) is GradS(i, j) and the Value channel gradient is GradV(i, j); the gradient of the pixel is taken as the maximum over all channels, Grad(i, j) = max(GradH(i, j), GradS(i, j), GradV(i, j)).
  • the gradient direction at a pixel is the direction in which the image brightness changes the most; at a document edge the brightness generally changes sharply.
  • the direction of a document edge is generally perpendicular to the gradient direction, which is used here to determine the search direction of the line (see the sketch following the description of FIG. 9).
  • Figure 9 shows the gradient of the image and the gradient direction of the image.
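  • A minimal Python/NumPy sketch of this per-channel gradient computation is given below (not part of the original disclosure). It uses the forward differences dx(i, j) = I(i+1, j) - I(i, j) and dy(i, j) = I(i, j+1) - I(i, j) described in the specification, takes absolute values when forming the gradient value (an added robustness assumption; the specification writes the value simply as dx + dy), and defines Grad(i, j) as the maximum over the H, S, and V channels.

    import numpy as np

    def channel_gradient(chan):
        """chan: 2-D float array; returns (gradient value, gradient direction)."""
        dx = np.zeros_like(chan)
        dy = np.zeros_like(chan)
        dx[:-1, :] = chan[1:, :] - chan[:-1, :]   # I(i+1, j) - I(i, j)
        dy[:, :-1] = chan[:, 1:] - chan[:, :-1]   # I(i, j+1) - I(i, j)
        value = np.abs(dx) + np.abs(dy)
        direction = np.arctan2(dy, dx)            # direction of greatest change
        return value, direction

    def max_channel_gradient(hsv):
        """hsv: H x W x 3 array; Grad(i, j) = max over the three channels."""
        values, directions = zip(*(channel_gradient(hsv[..., c].astype(float))
                                   for c in range(3)))
        values = np.stack(values, axis=-1)
        directions = np.stack(directions, axis=-1)
        best = values.argmax(axis=-1)
        grad = np.take_along_axis(values, best[..., None], axis=-1)[..., 0]
        grad_dir = np.take_along_axis(directions, best[..., None], axis=-1)[..., 0]
        return grad, grad_dir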
  • the specific process of detecting a line on a single channel is to search, along the direction perpendicular to the gradient direction of the starting point, for pixels marked as the first state, until the number of pixels on the search path marked as the second state (for example, Used) exceeds a certain number (for example, 3), at which point the search stops.
  • if the number of pixels marked as the first state along that search direction exceeds a threshold, the segment from the starting point to the end point of the search is taken as a line on that color channel; with multiple color channels, multiple such lines are obtained.
  • optionally, a further condition can be set to limit the search end point, namely that the angle between the gradient direction of every pixel on the path and that of the starting point is smaller than a certain threshold. A sketch of this search is given below.
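  • The following Python sketch (not part of the original disclosure) illustrates the single-channel search just described; the state encoding, the step limit, and the minimum number of first-state pixels required to accept a line are assumptions chosen for the sketch.

    import math

    UNUSED, USED = 1, 2   # first state / second state markers (assumed encoding)

    def grow_line(start, start_grad_dir, state, max_used=3, min_unused=10,
                  max_steps=2000):
        """Walk from `start` (row, col) perpendicular to the gradient direction,
        collecting UNUSED pixels until more than `max_used` USED pixels are met.
        `state` is a dict mapping (row, col) -> marker for every image pixel."""
        perp = start_grad_dir + math.pi / 2.0          # edge direction
        step_col, step_row = math.cos(perp), math.sin(perp)
        row, col = float(start[0]), float(start[1])
        path, used_seen = [start], 0
        for _ in range(max_steps):
            row, col = row + step_row, col + step_col
            pixel = (int(round(row)), int(round(col)))
            if pixel not in state:                     # walked off the image
                break
            if state[pixel] == USED:
                used_seen += 1
                if used_seen > max_used:               # too many used pixels: stop
                    break
            elif state[pixel] == UNUSED:
                path.append(pixel)
        if len(path) > min_unused:                     # enough edge support
            return path[0], path[-1]                   # endpoints of the detected line
        return None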
  • the original data is convolved with the Gaussian kernel, and the resulting image is slightly blurred compared to the original image.
  • a single highlighted pixel has little effect on the Gaussian smoothed image.
  • Canny Edge Detection uses 4 convolutional templates to detect edges in the horizontal, vertical, and diagonal directions.
  • the convolution of the original image with each convolution template is stored. For each pixel we identify the maximum value at this pixel and the direction of the generated edge. In this way, we generate the luminance gradient map of each pixel and the direction of the luminance gradient from the original image.
  • non-maximum suppression (NMS) is then applied to the gradient magnitudes: a pixel is kept as an edge candidate only if its gradient value is not smaller than the gradient values of its two neighbors along the gradient direction.
  • a typical method of reducing the number of false edges is the dual-threshold algorithm, which sets all values below the threshold to zero.
  • the dual-threshold algorithm applies two thresholds τ1 and τ2 to the non-maximum-suppressed image, with 2τ1 ≈ τ2, so that two thresholded edge images are obtained.
  • the high-threshold image contains few false edges but has discontinuities (it is not closed), so edges found in the low-threshold image are used to connect the edges of the high-threshold image into contours.
  • a polar coordinate transformation is applied to the set of edge points obtained by the Canny detection; from the straight-line equation it is known that points lying on the same straight line in the original Cartesian coordinate system fall on the same coordinate in the polar coordinate (Hough) space. A combined sketch of the multi-channel Canny and Hough steps follows.
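  • The multi-channel Canny step and the Hough step can be combined as in the following illustrative OpenCV sketch (not part of the original disclosure); the Canny thresholds and the Hough parameters shown are assumptions.

    import cv2
    import numpy as np

    def candidate_document_edges(bgr_image, low=50, high=150):
        """Run Canny on every HSV channel, OR the edge maps into one mixed edge
        image, then collect candidate lines with the probabilistic Hough transform."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        mixed = np.zeros(hsv.shape[:2], dtype=np.uint8)
        for c in range(3):
            edges = cv2.Canny(hsv[:, :, c], low, high)   # per-channel Canny
            mixed = cv2.bitwise_or(mixed, edges)         # edge point if any channel fires
        lines = cv2.HoughLinesP(mixed, rho=1, theta=np.pi / 180, threshold=80,
                                minLineLength=60, maxLineGap=5)
        return [] if lines is None else [tuple(l[0]) for l in lines]  # (x1, y1, x2, y2)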
  • 1052c: line detection is performed on each color channel separately, the detected lines are sorted by length, and the m*f(x(i)) longest lines of channel i are added to the candidate document edge list, where x(i) is the complexity of color channel i (for example, its information entropy or its JPG compression rate) and f(x(i)) is a value between 0 and 1, for example f(x(i)) = 1 - x(i)/sum(x(i)).
  • m is the number of candidate document edges.
  • m*f(x(i)) is the number of lines retained for color channel i.
  • "selecting the m*f(x(i)) longest lines" means that the selected m*f(x(i)) lines are the top-ranked m*f(x(i)) lines when the lines obtained by the line detection are sorted by length in descending order. A sketch of this selection is given below.
  • for step 107, the quadrilateral is detected based on some or all of the straight lines obtained by performing the line detection and on the preset condition, as follows:
  • the candidate straight lines obtained by performing the line detection are divided into four categories according to tilt angle and position, the four categories being: top, bottom, left, and right;
  • one line is cyclically selected from each category, and a set of quadrilaterals is constructed according to the preset condition;
  • the quadrilateral with the largest ratio is selected from the set of quadrilaterals as the result of edge detection, the ratio being the number of actually detected edge pixels divided by the perimeter of the fitted quadrilateral.
  • the preset condition includes that the angle between opposite sides of the quadrilateral is smaller than the first threshold (for example, the first threshold ranges from 0 to 50 degrees), the angle between adjacent sides of the quadrilateral falls within a preset angle range (for example, 60 to 120 degrees), and the distance between opposite sides of the quadrilateral is greater than the second threshold (for example, one fifth of the length of the corresponding image side); the first threshold is an integer greater than zero and the second threshold is an integer greater than zero. A sketch of these checks is given below.
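  • The preset condition can be checked as in the following illustrative Python sketch (not part of the original disclosure), using the example thresholds quoted above; the vertex ordering (top-left, top-right, bottom-right, bottom-left) and the midpoint-based measure of the distance between opposite sides are assumptions.

    import math

    def _direction(p, q):
        """Undirected direction of segment p -> q in degrees, in [0, 180)."""
        return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0

    def _angle_between(a, b):
        d = abs(a - b) % 180.0
        return min(d, 180.0 - d)          # in [0, 90]

    def satisfies_preset_condition(quad, img_w, img_h,
                                   max_opposite=50.0, min_adjacent=60.0):
        tl, tr, br, bl = quad             # assumed vertex order, points are (x, y)
        top, bottom = _direction(tl, tr), _direction(bl, br)
        left, right = _direction(tl, bl), _direction(tr, br)
        # 1. Opposite sides are nearly parallel (angle below the first threshold).
        if _angle_between(top, bottom) > max_opposite:
            return False
        if _angle_between(left, right) > max_opposite:
            return False
        # 2. Adjacent sides meet at 60-120 degrees, i.e. the undirected angle
        #    between their directions is at least 60 degrees.
        for a, b in ((top, left), (top, right), (bottom, left), (bottom, right)):
            if _angle_between(a, b) < min_adjacent:
                return False
        # 3. Opposite sides are farther apart than 1/5 of the corresponding image side.
        mid = lambda p, q: ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        vertical_gap = abs(mid(tl, tr)[1] - mid(bl, br)[1])
        horizontal_gap = abs(mid(tl, bl)[0] - mid(tr, br)[0])
        return vertical_gap > img_h / 5.0 and horizontal_gap > img_w / 5.0

    def quad_ratio(quad, detected_edge_pixels):
        """Score used to rank candidates: detected edge pixels / fitted perimeter."""
        perimeter = sum(math.dist(quad[k], quad[(k + 1) % 4]) for k in range(4))
        return detected_edge_pixels / perimeter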
  • on the basis of the foregoing technical solutions, an alternative solution is shown in FIG. 13.
  • for local edge lines located at different positions of the image, the technical solution described above performs line detection on multiple color channels simultaneously and, through multi-color-channel line fusion, adds the longest line detected among the different color channels of the same local region to the edge candidate list.
  • as shown in FIG. 15, the lighting on the left and right sides of the card differs: the V channel detects the left edge better, while the S channel is better suited to the right edge.
  • FIG. 16 shows a schematic diagram including all of the optional steps; the more of these optional steps are used, the better the final output image.
  • adding an image pre-processing operation before the quadrilateral is detected can further improve the success rate of detecting the document edge when the edge is not clearly distinguishable from the background behind the document.
  • adding some or all of the optional steps in FIG. 16 after the quadrilateral is detected can further implement document correction, so that the electronic device outputs a clear rectangular document even when the document is photographed at an oblique angle.
  • an optional multi-color-channel document edge detection algorithm flow is shown in FIG. 17, and the processing flow of the multi-color-channel line fusion is shown in FIG. 18.
  • for the specific steps of FIG. 17, refer to the corresponding descriptions above; details are not repeated here.
  • FIG. 22 shows a functional block diagram of a processing unit, in accordance with some embodiments.
  • the processing unit 1702 in FIG. 22 may be the processor 120 in FIG. 1 and FIG. 2; or it may be the central processing unit 310 of FIG. 3; or it may be a coprocessor not shown in FIG. 1 and FIG. 2; or it may be a graphics processing unit (GPU) not shown in FIG. 1 and FIG. 2; or it may be a coprocessor not shown in FIG. 3; or it may be a graphics processor not shown in FIG. 3.
  • processing unit 1702 may be implemented in hardware, software, or a combination of hardware and software to perform the principles of the present invention. Those skilled in the art will appreciate that the functional blocks described in FIG. 22 can be combined or separated into sub-blocks to implement the principles of the invention described above. Accordingly, the description herein may support any possible combination or separation or further definition of the functional blocks described herein. It should be understood that the processing unit may be a processor or a coprocessor.
  • the processing unit 1702 includes an obtaining unit 102 configured to acquire multi-color channel data of each pixel of a color image, where the multi-color channel data includes the two-dimensional coordinate values of the pixel and the value of the pixel in each color channel, and a line detecting unit 103 coupled to the obtaining unit 102.
  • the line detecting unit 103 is configured to perform line detection on the multi-color channel data of each pixel of the color image; a quadrilateral detecting unit 104 is coupled to the line detecting unit 103 and is configured to detect a quadrilateral based on some or all of the straight lines obtained by performing the line detection and on a preset condition, where the preset condition includes that the angle between opposite sides of the quadrilateral is smaller than a first threshold, the angle between adjacent sides of the quadrilateral falls within a preset angle range, and the distance between opposite sides of the quadrilateral is greater than a second threshold.
  • the first threshold is an integer greater than zero.
  • the second threshold is an integer greater than zero.
  • the preset condition further includes that the ratio of the number of actually detected edge pixels to the perimeter of the quadrilateral is the largest.
  • the processing unit 1702 further includes a pre-processing unit 101 configured to perform pre-processing on the color image before the multi-color channel data of each pixel of the color image is acquired.
  • the pre-processing includes color space conversion and/or histogram equalization.
  • the processing unit 1702 further includes at least one of a quadrilateral original scale estimation unit 105, an attitude projection matrix estimation unit 106, a quadrilateral correction unit 107, and an image enhancement unit 108.
  • the quadrilateral original ratio estimating unit 105 is configured to perform original ratio estimation on the detected quadrilateral.
  • the attitude projection matrix estimation unit 106 is configured to perform an attitude projection matrix estimation on the input quadrilateral.
  • the quadrilateral correction unit 107 is for performing quadrilateral correction on the input quadrilateral.
  • the image enhancement unit 108 is operative to perform image enhancement on the input image.
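  • As an illustrative sketch only (not part of the original disclosure), the functional blocks described above could be composed in software roughly as follows; every stage is injected as a callable, and all names other than the unit numbers are assumptions.

    class DocumentEdgePipeline:
        """Composes the units described above; each stage is an injected callable."""

        def __init__(self, acquire, detect_lines, detect_quad,
                     preprocess=None, postprocess=()):
            self.acquire = acquire            # obtaining unit 102
            self.detect_lines = detect_lines  # line detecting unit 103
            self.detect_quad = detect_quad    # quadrilateral detecting unit 104
            self.preprocess = preprocess      # optional pre-processing unit 101
            self.postprocess = postprocess    # optional units 105-108

        def run(self, color_image):
            if self.preprocess is not None:   # color space conversion / equalization
                color_image = self.preprocess(color_image)
            channel_data = self.acquire(color_image)
            lines = self.detect_lines(channel_data)
            quad = self.detect_quad(lines)
            for step in self.postprocess:     # ratio estimation, pose projection matrix
                quad = step(quad)             # estimation, correction, enhancement
            return quad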
  • the functions of the processing unit 1702 are described above. Those of ordinary skill in the art will understand that these functions correspond to the various embodiments of the electronic devices, systems, apparatuses, methods, graphical user interfaces, information processing apparatuses (e.g., processor chips or processor chipsets), and computer-readable storage media described herein, and that their various mutual combinations are combinations that a person of ordinary skill in the art can directly and unambiguously conceive after understanding this document.

Abstract

一种检测文档边缘的方法,方法包括:获取一个彩色图像的每个像素点的多颜色通道数据(103),所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;对所述彩色图像的每个像素点的多颜色通道数据执行直线检测(105);基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形(107),采用上述方法可以提高文档边缘检测的成功率。

Description

用于处理文档的设备、方法和图形用户界面 技术领域
本文涉及处理文档的电子设备,包括但不限于用于检测文档边缘的电子设备、方法和图形用户界面。
背景技术
用户可以使用具有拍照功能的移动设备(例如相机、手机,可穿戴设备或网络摄像头)拍照或录像。例如,在会议室中,人们可以使用手机随时拍下白板、幻灯片或文件等资料上的信息,不必手写记录这些信息,十分便捷。
但是,在使用这些移动设备拍摄目标物时,由于拍摄距离、角度等因素的限制,摄像头的像平面往往与被拍摄平面之间存在一定的夹角,这样就产生了较大的图像失真。例如,原本一个矩形的目标图像可能会畸变成一个任意四边形,例如梯形,这种失真称为倾斜失真。为解决该问题,通常需要利用四边形检测以及四边形校正等算法对输入图像进行倾斜校正,获取目标图像。四边形检测算法主要是利用计算机视觉中的边缘提取算法检测文档、白板等目标图像的矩形边缘,用以剔除矩形边框外侧非目标区域。考虑到拍摄视角引起的图像投影畸变,通常采用四边形校正算法对上述四边形检测算法获得的矩形区域进行投影校正,获得质量较高的目标图像。
现有的一种四边形检测算法是将彩色图像转化为单通道灰度图像,然后进行四边形边缘检测,忽略了图像的色彩和饱和度信息。在一些前后景颜色相似场景下,文档边缘的检测效果不好,甚至某些人眼感觉前后景反差较大的场景,仍难以检出文档边缘。
现有的另一种四边形检测算法是分析彩色图像的每一彩色通道以确定表示图像数据的复杂度的对应繁忙度指标。选择具有最低繁忙度的所述彩色通道以检测文档边缘。这种方案考虑到了彩色图像的色彩、饱和度和亮度信息,但在检测文档边缘过程中仅使用了复杂度最低的通道,实际上仍可归结为单通道处理算法。复杂度最低原则是建立在文档边缘区分度明显的前提下,复杂度最低通道选择原则能够保证剔除部分非真正边缘干扰,但是无法解决文档边缘区分度不明显的问题。比如在纯白桌子上检测白色文档,色彩通道复杂度最低,但是检出的文档边缘较少,无法确定出真正的四边形。
发明内容
因此,需要有电子设备具有高效的方法和界面来响应电子设备拍摄文档或处理文档的操作。这样的方法和界面可以更快速、更高效、更智能地响应用户的个性需求,提高电子设备检测文档边缘的成功率,避免出现电子设备无法检测出文档边缘的情况。
本文的一些实施例通过所公开的电子设备提供了上述的方法和界面。在一些实施例中,该电子设备是便携式的(例如,笔记本电脑、平板电脑、或手持设备,或可穿戴设备)。在一些实施例中,该电子设备具有触控板。在一些实施例中,该电子设备具有触敏显示器(也被称为“触摸屏”、“触摸屏显示器”或“具有触敏表面的显示器”)。在一些实施例中,该电子设备具有图形用户界面(GUI)、一个或多个处理器、存储器以及存储在存储器中用于执行多个功能的一个或多个模块、程序或指令集。在一些实施例中,用户主要通过触敏表面上的手指接触和/或手势来与GUI交互。在一些实施例中,这些功能可以包括图像编辑、画图、呈现、文字处理、网页创建、盘编辑、电子表格制作、玩游戏、接打电话、视频会议、收发电子邮件、即时消息通信、锻炼支持、数字摄影、数字视频记录、网络浏览、数字音乐播放、和/或数字视频播放。用于执行这些功能的可执行指令可被包括在被配置用于由一个或多个处理器执行的非暂态计算机可读存储介质或其他计算机程序产品中。
第一方面提供了一种电子设备。设备包括:显示器;一个或多个处理器;存储器;多个应用程序;以及一个或多个程序。这一个或多个程序被存储在存储器中并被配置为被一个或多个处理器执行。这一个或多个程序包括指令。指令用于:获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。这样的电子设备可以更高效、更智能地响应用户的个性需求,提高电子设备检测文档边缘的成功率。可选 的,对该彩色图像执行预处理,所述预处理包括但不限于:颜色空间转换和/或直方图均衡化。
在第一方面的第一种可能的实现方式中,所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。这样检测到的四边形更加接近实际像素点构成的轮廓。
在第一方面的第二种可能的实现方式中,所述指令进一步用于:
在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换或直方图均衡化中的至少一个。增加了颜色空间转换和/或直方图均衡化步骤,可以进一步提高边缘检测的成功率降低误判的可能性。直方图均衡化是一种提高图像对比度方法,可以提高微弱边缘的检出率。
在第一方面的第三种可能的实现方式中,指令进一步用于:
对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
S1:计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值及梯度方向;S2:将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态;S3:从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;重复执行步骤S3直至所述彩色图像的全部像素点标记为所述第二状态。
在第一方面的第四种可能的实现方式中,指令进一步用于:
所述从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线,包括:对该起始点的所有颜色通道中每个颜色通道执行如下操作:沿该起始点的所述梯度方向的垂直方向搜索标记所述第一状态的像素点,直至搜索路径上标记为所述第二状态的像素点的数量大于第三阈值;确定执行直线检测得到的一个直线,所述直线的两个端点分别为所述起始点和搜索路径上的结束点,该搜索路径上标记为所述第一状态的像素点的数量大于第四阈值。
在第一方面的第五种可能的实现方式中,指令进一步用于:对所述彩色 图像的每个像素点的多颜色通道数据执行直线检测,包括:在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;对所述多通道混合边缘上的边缘点执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
在第一方面的第六种可能的实现方式中,指令进一步用于:
对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,…,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;
在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。
在第一方面的第七种可能的实现方式中,指令进一步用于:
基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形，包括：将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别，所述四种类别包括：上、下、左和右；从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果，所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
在第一方面的第八种可能的实现方式中,指令进一步用于:对检测到的四边形执行以下至少一种处理:四边形原始比例估计;或者,姿态投影矩阵估计;或者,四边形校正;或者,图像增强。
第二方面提供了一种方法,用于电子设备,方法包括:
获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
在第二方面的第一种可能的实现方式中,所述预设条件还包括:实际检 测到的边缘像素数与所述四边形周长的比值最大。这样检测到的四边形更加接近实际像素点构成的轮廓。
在第二方面的第二种可能的实现方式中,方法进一步包括:
在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换或直方图均衡化中的至少一个。增加了颜色空间转换和/或直方图均衡化步骤,可以进一步提高边缘检测的成功率降低误判的可能性。
在第二方面的第三种可能的实现方式中,方法进一步包括:
对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
S1:计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值及梯度方向;S2:将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态;S3:从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;重复执行步骤S3直至所述彩色图像的全部像素点标记为所述第二状态。
在第二方面的第四种可能的实现方式中,其中,所述从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线,包括:对该起始点的所有颜色通道中每个颜色通道执行如下操作:沿该起始点的所述梯度方向的垂直方向搜索标记所述第一状态的像素点,直至搜索路径上标记为所述第二状态的像素点的数量大于第三阈值;确定执行直线检测得到的一个直线,所述直线的两个端点分别为所述起始点和搜索路径上的结束点,该搜索路径上标记为所述第一状态的像素点的数量大于第四阈值。
在第二方面的第五种可能的实现方式中,其中,对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;对所述多通道混合边缘上的边缘点执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
在第二方面的第六种可能的实现方式中,其中,对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,…,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。
在第二方面的第七种可能的实现方式中，其中，基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形，包括：将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别，所述四种类别包括：上、下、左和右；从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果，所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
在第二方面的第八种可能的实现方式中,方法进一步包括:
对检测到的四边形执行以下至少一种处理:四边形原始比例估计;或者,姿态投影矩阵估计;或者,四边形校正;或者,图像增强。
其它方面提供了一种电子设备,包括:显示器;一个或多个处理器;存储器;多个应用程序;以及一个或多个程序。其中一个或多个程序被存储在存储器中并被配置为被一个或多个处理器执行。一个或多个程序包括用于执行根据第二方面的方法的指令。
其它方面提供了一种存储一个或多个程序的计算机可读存储介质。一个或多个程序包括指令,指令当被包括显示器和多个应用程序的电子设备执行时使电子设备执行根据第二方面的方法。
其它方面提供了一种电子设备上的图形用户界面。电子设备包括显示器、存储器、多个应用程序;和用于执行存储在存储器中的一个或多个程序的一个或多个处理器。图形用户界面包括根据第二方面的方法显示的用户界面,其中,显示器包括触敏表面和显示屏。
其它方面提供了一种电子设备,包括:显示器,其中,显示器包括触敏 表面和显示屏;多个应用程序;以及用于执行根据第二方面的方法的装置或模块或单元。
其它方面提供了一种在电子设备中使用的信息处理装置。电子设备包括显示器和多个应用程序。信息处理装置包括:用于执行根据第二方面的方法的装置,其中,显示器包括触敏表面和显示屏。
基于上述技术方案,电子设备可以在文档与背景区分不明显的时候提高检测文档边缘的成功率。
附图说明
为了更好地理解本发明的前述实施例以及本发明的附加实施例,应该结合以下附图参考下面的实施例的说明,在附图中,相同的附图标号在所有附图中指示相应的部件。
图1是示出根据一些实施例的电子设备的框图。
图2示出了根据一些实施例的电子设备。
图3是根据一些实施例的电子设备的框图。
图4a示出了根据一些实施例的电子设备调用摄像装置拍摄文档的示意图。
图4b示出了根据一些实施例的电子设备所拍摄图像中源矩形画面所在四边形区域的示意图。
图5示出了根据一些实施例的用于检测文档边缘的流程图。
图6示出了根据一些实施例的用于检测文档边缘的流程图。
图7示出了RGB颜色空间转换到HSV颜色空间的示意图。
图8示出了执行直方图均衡化的效果示意图。
图9示出了图像的梯度和图像的梯度方向的示意图。
图10示出了Canny边缘检测的流程图。
图11示出了抑制非极大值的示意图。
图12示出了霍夫直线检测的示意图。
图13示出了根据一些实施例的用于检测文档边缘的流程图。
图14示出了多颜色通道直线融合的示意图。
图15示出了多颜色通道直线融合的效果示意图。
图16示出了根据一些实施例的用于检测文档边缘的流程图。
图17示出了根据一些实施例的用于检测文档边缘算法的流程图。
图18示出了多颜色通道直线融合的示意图。
图19示出了根据一些实施例的用于检测文档边缘的流程图。
图20示出了根据一些实施例的用于检测文档边缘的流程图。
图21示出了根据一些实施例的用于检测文档边缘的流程图。
图22示出了处理单元的功能框图。
具体实施方式
通常,电子设备需要具有高效的方法和界面来响应电子设备拍摄文档或处理文档的操作。这样的方法和界面可以更快速、更高效、更智能地响应用户的个性需求,提高电子设备检测文档边缘的成功率。
以下介绍的实施例描述了在所拍摄的文档与背景的区分度不高的情况下,提高电子设备检测文档边缘的成功率的技术方案。
应理解:本文所涉及的多颜色通道包括但不限于:RGB(Red、Green、Blue)颜色通道,HSV(Hue、Saturation、Value)颜色通道,或HSL(Hue、Saturation、Lightness)颜色通道。需要说明的是:本文多颜色通道以HSV颜色通道为例进行描述,但这并不构成对本发明的限制,在具体设计中,多颜色通道也可以采用RGB三颜色通道,或HSL三颜色通道,或者其他类型的多颜色通道。以HSV三颜色通道为例,本文所涉及的多颜色通道数据的数学表达式包括但不限于:(i,j,h,s,v),或者,(i,j,h),(i,j,s)以及(i,j,v)。其中(i,j)为彩色图像上一个像素点的坐标,h、s、v分别指坐标为(i,j)像素点在Hue、Saturation、Value颜色通道上的值。当然,多颜色通道数据也可以分别表示,即坐标为(i,j)像素点在Hue单通道的值为(i,j,h),坐标为(i,j)像素点在Saturation单通道的值为(i,j,s),坐标为(i,j)像素点在Value单通道的值为(i,j,v),即,(i,j,h),(i,j,s)以及(i,j,v)。
应理解:该彩色图像可以为电子设备通过摄像头拍摄得到的预览帧图像,或者可以为电子设备保存的数字图像。
应理解:文档可以是任意具有矩形边缘的平面物体,例如名片、ID卡、幻灯片投影画面等。
下面,图1、2、3提供了对示例性设备的描述。
示例性设备
现在将详细地参考实施例,这些实施例的示例在附图中被示出。在下面的详细描述中给出了许多具体细节,以便提供对本发明的充分理解。但是,对本领域技术人员将显而易见的是本发明可以在没有这些具体细节的情况下被实践。在其他情况下,没有详细地描述众所周知的方法、过程、部件、电路、和网络,从而不会不必要地使实施例的方面晦涩难懂。
还将理解的是,虽然术语“第一”、“第二”、“第三”、“第四”等可能在本文中用来描述各种元素,但是这些元素不应当被这些术语限定。这些术语只是用来将一个元素与另一元素区分开。例如,第一阈值可以被命名为第二阈值,并且类似地,第二阈值可以被命名为第一阈值,而不背离本发明的范围。第一阈值和第二阈值二者都是阈值,它们的大小可以相等,它们的大小也可以不相等。
在本文中对本发明的描述中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本发明的限制。如本在发明的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“一种”和“这一”旨在也包括复数表达形式,除非其上下文中明确地有相反指示。还将理解的是,本文中所使用的术语“和/或”是指并且涵盖相关联地列出的项目中一个或多个项目的任何和全部可能的组合。还将理解的是,术语“包括”和/或“包含”当在本说明书中使用时是指定存在所陈述的特征、整数、步骤、操作、元素和/或部件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元素、部件和/或其分组。
如本文中所用,根据上下文,术语“如果”可以被解释为意思是“当…时”或“在…后”或“响应于确定”或“响应于检测到”。类似地,根据上下文,短语“如果确定…”或“如果检测到[所陈述的条件或事件]”可以被解释为意思是“在确定…时”或“响应于确定…”或“在检测到[所陈述的条件或事件]时”或“响应于检测到[所陈述的条件或事件]”。
介绍了电子设备、用于这样的设备的用户界面、和用于使用这样的设备的相关联过程的实施例。在一些实施例中,设备是还包含其它功能诸如个人数字助理和/或音乐播放器功能的便携式通信设备,诸如移动电话。电子设备的示例性实施例包括但不限于搭载
（此处为原文中嵌入的操作系统标识图形）
或者其它操作系统的电子设备。也可以使用其它电子设备,诸如具有触敏表面(例如,触摸屏显示器和/或触控板)的膝上型计算机或平板电脑。还应当理解的是, 在一些实施例中,设备不是便携式通信设备,而是具有触敏表面(例如,触摸屏显示器和/或触控板)的台式计算机。
在下面的讨论中,介绍了一种包括显示器和触敏表面的电子设备。然而应当理解,电子设备可以包括一个或多个其他物理用户接口设备,诸如物理键盘、鼠标和/或操作杆。
设备通常支持多种应用程序,诸如以下中的一种或多种:画图应用程序、呈现应用程序、文字处理应用程序、网页创建应用程序、盘编辑应用程序、电子表格应用程序、游戏应用程序、电话应用程序、视频会议应用程序、电子邮件应用程序、即时消息应用程序、锻炼支持应用程序、相片管理应用程序、数字相机应用程序、数字视频摄像机应用程序、网络浏览应用程序、数字音乐播放器应用程序、和/或数字视频播放器应用程序。
可在设备上执行的各种应用程序可使用至少一个共用的物理用户接口设备,诸如触敏表面。触敏表面的一种或多种功能以及显示在设备上的相应信息可从一种应用程序调整和/或变化至下一种应用程序和/或在相应应用程序内被调整和/或变化。这样,设备的共用物理架构(诸如触敏表面)可利用对于用户而言直观清楚的用户界面来支持各种应用程序。
现在关注具有触敏显示器的便携式设备的实施例。图1是示出根据一些实施例的具有触敏显示器112的便携式多功能设备100的框图。触敏显示器112有时为了方便被称为“触摸屏”,并且也可被称为是或者被叫做触敏显示器系统,也可以被称为具有触敏表面(touch-sensitive surface)和显示屏(display)的显示器系统。便携式多功能设备100可包括存储器102(其可包括一个或多个计算机可读存储介质)、存储器控制器122、一个或多个处理单元(CPU)120、外围设备接口118、RF电路系统108、音频电路系统110、扬声器111、麦克风113、输入/输出(I/O)子系统106、其他输入控制设备116、和外部端口124。便携式多功能设备100可包括一个或多个光学传感器164。这些部件可通过一根或多根通信总线或信号线103进行通信。
应当理解,便携式多功能设备100只是一种电子设备的一个示例,并且便携式多功能设备100可具有比所示出的更多或更少的部件,可组合两个或更多个部件,或者可具有这些部件的不同配置或布置。图1中所示的各种部件可以硬件、软件方式或软硬件组合来实现,包括一个或多个信号处理和/或专用集成电路。
存储器102可以包括高速随机存取存储器,并且还可包括非易失性存储器,诸如一个或多个磁盘存储设备、闪存存储器设备、或其他非易失性固态存储器设备。便携式多功能设备100的其他部件(诸如CPU 120和外围设备接口118)对存储器102的访问可由存储器控制器122来控制。
外围设备接口118可以被用来将设备的输入和输出外围设备耦接到CPU120和存储器102。该一个或多个处理器120运行或执行存储在存储器102中的各种软件程序和/或指令集,以执行便携式多功能设备100的各种功能以及处理数据。在一些实施例中,该一个或多个处理器120包括图像信号处理器和双核或多核处理器。
在一些实施例中,外围设备接口118、CPU 120、和存储器控制器122可以被实现在单个芯片诸如芯片104上。在一些其他实施例中,它们可以被实现在独立的芯片上。
RF(射频)电路系统108接收和发送RF信号,也被叫做电磁信号。RF电路系统108将电信号转换为电磁信号/将电磁信号转换为电信号,并且经由电磁信号与通信网络及其他通信设备通信。RF电路系统108可包括用于执行这些功能的众所周知的电路系统,包括但不限于天线系统、RF收发器、一个或多个放大器、调谐器、一个或多个振荡器、数字信号处理器、编解码芯片组、用户身份模块(SIM)卡、存储器等等。RF电路系统108可通过无线通信与网络以及其他设备通信,网络诸如是互联网(也被称为万维网(WWW))、内联网和/或无线网络(诸如蜂窝电话网络、无线局域网(LAN)和/或城域网(MAN))。无线通信可使用多种通信标准、协议和技术中的任何类型,包括但不限于全球移动通信系统(GSM)、增强数据GSM环境(EDGE)、高速下行链路分组接入(HSDPA)、高速上行链路分组接入(HSUPA)、宽带码分多址(W-CDMA)、码分多址(CDMA)、时分多址(TDMA)、蓝牙、无线保真(WI-Fi)(例如,IEEE 802.11a、IEEE 802.11b、IEEE 802.11g和/或IEEE 802.11n)、因特网语音协议(VoIP)、Wi-MAX、电子邮件协议(例如,因特网消息访问协议(IMAP)和/或邮局协议(POP))、即时消息(例如,可扩展消息处理现场协议(XMPP)、用于即时消息和现场利用扩展的会话发起协议(SIMPLE)、即时消息和到场服务(IMPS))、和/或短消息服务(SMS)、或者其他任何适当的通信协议,包括在本文献提交日还未开发出的通信协议。
音频电路系统110、扬声器111、和麦克风113提供用户与便携式多功 能设备100之间的音频接口。音频电路系统110从外围设备接口118接收音频数据,将音频数据转换为电信号,并将电信号传输到扬声器111。扬声器111将电信号转换为人类可听的声波。音频电路系统110还接收由麦克风113根据声波转换来的电信号。音频电路系统110将电信号转换为音频数据,并将音频数据传输到外围设备接口118以进行处理。音频数据可由外围设备接口118检索自和/或传输至存储器102和/或RF电路系统108。在一些实施例中,音频电路系统110还包括耳麦插孔(例如,图2中的212)。耳麦插孔提供音频电路系统110与可移除的音频输入/输出外围设备之间的接口,该外围设备诸如仅输出的耳机或者具有输出(例如,单耳或双耳耳机)和输入(例如,麦克风)二者的耳麦。
I/O子系统106将便携式多功能设备100上的输入/输出外围设备,诸如触摸屏112和其他输入控制设备116,耦接到外围设备接口118。I/O子系统106可以包括显示控制器156和用于其他输入控制设备的一个或多个输入控制器160。该一个或多个输入控制器160从其他输入控制设备116接收电信号/发送电信号到其他输入控制设备116。所述其他输入控制设备116可包括物理按钮(例如,下压按钮、摇臂按钮等)、拨号盘、滑动开关、操纵杆、点击式转盘等等。在一些另选实施例中,输入控制器160可耦接到(或不耦接到)以下任一个:键盘、红外线端口、USB端口、和指针设备诸如鼠标。该一个或多个按钮(例如,图2中的208)可包括用于扬声器111和/或麦克风113的音量控制的上/下按钮。该一个或多个按钮可包括下压按钮(例如,图2中的206)。
触敏显示器112提供设备与用户之间的输入接口和输出接口。显示控制器156从触摸屏112接收电信号和/或向触摸屏112发送电信号。触摸屏112向用户显示视觉输出。视觉输出可包括图形、文本、图标、视频及它们的任何组合(统称为“图形”)。在一些实施例中,一些视觉输出或全部的视觉输出可对应于用户界面对象。
触摸屏112具有基于触觉和/或触觉接触从用户接受输入的触敏表面、传感器或传感器组。触摸屏112和显示控制器156(与存储器102中的任何相关联模块和/或指令集一起)检测触摸屏112上的接触(和该接触的任何移动或中断),并且将所检测到的接触转换为与显示在触摸屏112上的用户界面对象(例如,一个或多个软按键、图标、网页或图像)的交互。在示例性实施例中, 触摸屏112与用户之间的接触点对应于用户的手指。
触摸屏112可使用LCD(液晶显示器)技术、LPD(发光聚合物显示器)技术、或LED(发光二极管)技术,但是在其他实施例中可使用其他显示技术。触摸屏112和显示控制器156可以利用现在已知的或以后将开发出的多种触摸感测技术中的任何技术以及其他接近传感器阵列或用于确定与触摸屏112接触的一个或多个点的其他元件来检测接触及其任何移动或中断,该多种触摸感测技术包括但不限于电容性的、电阻性的、红外线的、和表面声波技术。在一示例性实施例中,使用投射式互电容感测技术。
触摸屏112可以具有超过100dpi的视频分辨率。在一些实施例中,触摸屏具有大约160dpi的视频分辨率。用户可以利用任何合适的物体或附加物诸如触笔、手指等等,与触摸屏112接触。在一些实施例中,用户界面被设计为主要与基于手指的接触和手势一起工作,这与基于触笔的输入相比由于手指在触摸屏上接触面积更大而可能精确度更低。在一些实施例中,设备将基于手指的粗略输入翻译为精确的指针/光标位置或命令,以执行用户所期望的动作。
在一些实施例中,除了触摸屏之外,便携式多功能设备100可包括用于激活或解除激活特定功能的触控板(未示出)。在一些实施例中,触控板是设备的触敏区域,该触敏区域与触摸屏不同,其不显示视觉输出。触控板可以是与触摸屏112分开的触敏表面,或者是由触摸屏形成的触敏表面的延伸部分。
便携式多功能设备100还包括用于为各种部件供电的电力系统162。电力系统162可包括电力管理系统、一个或多个电源(例如,电池、交流电(AC))、再充电系统、电力故障检测电路、功率变换器或逆变器、电力状态指示器(例如,发光二极管(LED))和任何其他与便携式设备中电力的生成、管理和分配相关联的部件。
便携式多功能设备100还可包括一个或多个光学传感器164。图1示出了耦接到I/O子系统106中光学传感器控制器158的光学传感器。光学传感器164可包括电荷耦合器件(CCD)或互补金属氧化物半导体(CMOS)光电晶体管。光学传感器164从环境接收通过一个或多个透镜投射的光,并且将光转换为表示图像的数据。结合成像模块143(也称为相机模块),光学传感器164可以捕获静态图像或视频。在一些实施例中,一个或者多个光学传感器 位于便携式多功能设备100的后部,与设备前部上的触摸屏显示器112相对,使得触摸屏显示器可用作用于静态图像和/或视频图像采集的取景器。在一些实施例中,另一个或者多个光学传感器位于设备的前部上,使得用户在触摸屏显示器上观看其它视频会议参与者的同时可以获得该用户的图像以用于视频会议。
便携式多功能设备100还可以包括一个或多个接近传感器166。图1示出了耦接到外围设备接口118的接近传感器166。作为另外一种选择,接近传感器166可耦接到I/O子系统106中的输入控制器160。在一些实施例中,当电子设备被置于用户耳朵附近时(例如,当用户正在进行电话呼叫时),接近传感器关闭并禁用触摸屏112。
便携式多功能设备100还可包括一个或多个加速度计168。图1示出了耦接到外围设备接口118的加速度计168。作为另外一种选择,加速度计168可耦接到I/O子系统106中的输入控制器160。在一些实施例中,信息基于对从该一个或多个加速度计所接收的数据的分析而在触摸屏显示器上以纵向视图或横向视图被显示。便携式多功能设备100可选地除了加速度计168之外还包括磁力计(未示出)和GPS(或GLONASS或北斗或其它全球导航系统)接收器(未示出),用于获得关于便携式多功能设备100的位置和取向(例如,纵向或横向)的信息。
在一些实施例中,存储在存储器102中的软件部件包括操作系统126、通信模块(或指令集)128、接触/移动模块(或指令集)130、图形模块(或指令集)132、文本输入模块(或指令集)134、全球定位系统(GPS)模块(或指令集)135、以及应用程序(或指令集)136。此外,在一些实施例中,存储器102存储设备/全局内部状态157,如图1所示。设备/全局内部状态157包括以下中一者或多者:活动应用程序状态,用于指示哪些应用程序(如果有的话)当前是活动的;显示状态,用于指示什么应用程序、视图或其它信息占据触摸屏显示器112的各个区域;传感器状态,包括从设备的各个传感器和输入控制设备116获得的信息;和关于设备的位置和姿态的位置信息。
操作系统126(例如,Darwin、RTXC、LINUX、UNIX、OS X、WINDOWS、ANDROID或嵌入式操作系统(诸如Vx Works))包括用于控制和管理一般系统任务(例如,存储器管理、存储设备控制、电力管理等)的各种软件部件和/或驱动器,并且有利于各个硬件和软件部件之间的通信。此外,在一些实施 例中,存储器102存储数字相机胶卷159和数字图像流水线161。
通信模块128有利于通过一个或多个外部端口124与其它设备通信,并且还包括用于处理由RF电路系统108和/或外部端口124所接收的数据的各种软件部件。外部端口124(例如,通用串行总线(USB)、火线等)适于直接耦接到其他设备或者间接地通过网络(例如,因特网、无线LAN等)耦接。在一些实施例中,外部端口是与iPod(Apple Inc.的商标)设备上所使用的30针连接器相同的或类似的以及/或者与其兼容的多针(例如,30针)连接器。
接触/移动模块130可检测与触摸屏112(结合显示控制器156)和其他触敏设备(例如,触控板或物理点击式转盘)的接触。接触/移动模块130包括多个软件部件用于执行与接触检测相关的各种操作,诸如确定是否已经发生了接触(例如,检测手指按下事件)、确定是否存在接触的移动并在整个触敏表面上跟踪该移动(例如,检测一个或多个手指拖动事件)、以及确定接触是否已经终止(例如,检测手指抬起事件或者接触中断)。接触/移动模块130从触敏表面接收接触数据。确定接触点的移动可以包括确定接触点的速率(量值)、速度(量值和方向)、和/或加速度(量值和/或方向的改变),接触点的移动由一系列接触数据来表示。这些操作可被应用于单点接触(例如,一个手指接触)或者多点同时接触(例如,“多点触摸”/多个手指接触)。在一些实施例中,接触/移动模块130和显示控制器156检测触控板上的接触。
接触/移动模块130可检测用户的手势输入。触敏表面上不同的手势具有不同的接触图案。因此,可通过检测具体接触图案来检测手势。例如,检测到单指轻击手势包括检测到手指按下事件、然后在与手指按下事件相同的位置(或基本上相同的位置)处(例如,在图标位置处)检测到手指抬起(抬离)事件。又如,在触敏表面上检测到手指轻扫手势包括检测到手指按下事件、然后检测到一个或多个手指拖动事件、并且随后检测到手指抬起(抬离)事件。
图形模块132包括用于在触摸屏112或其他显示器上渲染和显示图形的多个已知软件部件,包括用于改变被显示图形的强度的部件。如本文所用,术语“图形”包括可以被显示给用户的任何对象,非限制性地包括文本、网页、图标(诸如包括软按键的用户界面对象)、数字图像、视频、动画等等。
在一些实施例中,图形模块132存储要使用的数据表示图形。每个图形可以被分配有相应的代码。图形模块132从应用程序等接收指定要显示的图形的一个或多个代码,在必要的情况下还一起接收坐标数据和其他图形属性 数据,并且然后生成屏幕图像数据来输出给显示控制器156。
可作为图形模块132的部件的文本输入模块134提供用于在多种应用程序(例如,联系人137、电子邮件140、即时消息141、浏览器147、和需要文本输入的任何其他应用程序)中输入文本的软键盘。
GPS模块135确定设备的位置,并且提供该信息以在各种应用程序中使用(例如,提供给电话138来用于基于位置的拨号、提供给相机143作为图片/视频元数据、以及提供给提供基于位置的服务的应用程序,诸如天气桌面小程序、本地黄页桌面小程序、和地图/导航桌面小程序)。
应用程序136可包括以下模块(或指令集)或者其子组或超集:
●联系人模块137(有时也称为通讯录或联系人列表);
●电话模块138;
●视频会议模块139;
●电子邮件客户端模块140;
●即时消息(IM)模块141;
●锻炼支持模块142;
●用于静态图像和/或视频图像的相机模块143;
●图像管理模块144;
●浏览器模块147;
●日历模块148;
●桌面小程序模块149,其可以包括以下中一者或多者:天气桌面小程序149-1、股市桌面小程序149-2、计算器桌面小程序149-3、闹钟桌面小程序149-4、字典桌面小程序149-5、和用户获得的其他桌面小程序、以及用户创建的桌面小程序149-6;
●用于生成用户创建的桌面小程序149-6的桌面小程序创建器模块150;
●搜索模块151;
●视频和音乐播放器模块152,其可以由视频播放器模块和音乐播放器模块构成;
●便签模块153;
●地图模块154;
●在线视频模块155;
●声音/音频录制器模块163;和/或
●通知模块165。
可被存储在存储器102中的其他应用程序136的示例包括其他文字处理应用程序、其他图像编辑应用程序、画图应用程序、呈现应用程序、JAVA启用的应用程序、加密、数字权益管理、声音识别、和声音复制。
结合触摸屏112、显示控制器156、接触模块130、图形模块132、和文本输入模块134,联系人模块137可被用于管理通讯录或联系人列表(例如,存储在存储器102或存储器370中联系人模块137的应用程序内部状态192中),包括:添加姓名到通讯录;从通讯录删除姓名;将电话号码、电子邮件地址、实际地址或其他信息与姓名关联;将图像与姓名关联;对姓名进行分类和归类;提供电话号码或电子邮件地址来发起和/或促进通过电话138、视频会议139、电子邮件140或IM 141的通信;等等。
结合RF电路系统108、音频电路系统110、扬声器111、麦克风113、触摸屏112、显示控制器156、接触模块130、图形模块132、和文本输入模块134,电话模块138可以被用于输入对应于电话号码的字符序列、访问通讯录137中的一个或多个电话号码、修改已经输入的电话号码、拨打相应的电话号码、进行通话以及当通话完成时断开或挂断。如上所述,无线通信可以使用多个通信标准、协议和技术中的任一个。
结合RF电路系统108、音频电路系统110、扬声器111、麦克风113、触摸屏112、显示控制器156、光学传感器164、光学传感器控制器158、接触模块130、图形模块132、文本输入模块134、联系人列表137、和电话模块138,视频会议模块139包括用于根据用户指令发起、进行、和结束用户与一个或多个其他参与方之间的视频会议的可执行指令。
结合RF电路系统108、触摸屏112、显示控制器156、接触模块130、图形模块132、和文本输入模块134,电子邮件客户端模块140包括用于响应于用户指令来创建、发送、接收、和管理电子邮件的可执行指令。结合图像管理模块144,电子邮件客户端模块140使得非常容易创建和发送具有由相机模块143拍摄的静态图像或视频图像的电子邮件。
结合RF电路系统108、触摸屏112、显示控制器156、接触模块130、图形模块132、和文本输入模块134,即时消息模块141包括用于输入对应于即时消息的字符序列、修改先前输入的字符、传输相应即时消息(例如, 使用短消息服务(SMS)或多媒体消息服务(MMS)协议用于基于电话的即时消息或者使用XMPP、SIMPLE、或IMPS用于基于因特网的即时消息)、接收即时消息以及查看所接收的即时消息的可执行指令。在一些实施例中,所传输和/或接收的即时消息可包括图形、相片、音频文件、视频文件以及/或者MMS和/或增强消息服务(EMS)中所支持的其他附接件。如本文所用,“即时消息”是指基于电话的消息(例如,利用SMS或MMS发送的消息)和基于因特网的消息(例如,利用XMPP、SIMPLE、或IMPS发送的消息)二者。
结合RF电路系统108、触摸屏112、显示控制器156、接触模块130、图形模块132、文本输入模块134、GPS模块135、地图模块154、和音乐播放器模块146,锻炼支持模块142包括可执行指令,用于创建锻炼(例如,具有时间、距离、和/或卡路里消耗目标);与锻炼传感器(体育设备)通信;接收锻炼传感器数据;校准用于监视锻炼的传感器;为锻炼选择和播放音乐;以及显示、存储和传输锻炼数据。
结合触摸屏112、显示控制器156、光学传感器164、光学传感器控制器158、接触模块130、图形模块132、数字图像流水线161(其将来自光学传感器的原始数据转换为最终图像或视频)、和图像管理模块144,相机模块143包括用于捕获静态图像或视频(包括视频流)并将其存储到存储器102中(例如,在数字相机胶卷159中)、修改静态图像或视频的特性、或从存储器102(例如,从数字相机胶卷159)删除静态图像或视频的可执行指令。
结合触摸屏112、显示控制器156、接触模块130、图形模块132、文本输入模块134、和相机模块143,图像管理模块144包括用于排列、修改(例如,编辑)、或以其他方式操控、加标签、删除、呈现(例如在数字幻灯片或相册中)、以及存储静态图像和/或视频图像(包括存储在相机胶卷159中的静态图像和/或视频图像)的可执行指令。
结合RF电路系统108、触摸屏112、显示系统控制器156、接触模块130、图形模块132、和文本输入模块134,浏览器模块147包括用于根据用户指令浏览因特网(包括搜索、链接到、接收、和显示网页或其部分、以及链接到网页的附件和其他文件)的可执行指令。
结合RF电路系统108、触摸屏112、显示系统控制器156、接触模块130、图形模块132、文本输入模块134、电子邮件客户端模块140、和浏览器模块147,日历模块148包括用于根据用户指令创建、显示、修改、和存储日历 和与日历相关联的数据(例如,日历条目、待办任务列表等)的可执行指令。
结合RF电路系统108、触摸屏112、显示系统控制器156、接触模块130、图形模块132、文本输入模块134、和浏览器模块147,桌面小程序模块149是可以由用户下载并使用的微型应用程序(例如,天气桌面小程序149-1、股市桌面小程序149-2、计算器桌面小程序149-3、闹钟桌面小程序149-4、和字典桌面小程序149-5)或由用户创建的微型应用程序(例如,用户创建的桌面小程序149-6)。在一些实施例中,桌面小程序包括HTML(超文本标记语言)文件、CSS(层叠样式表)文件、和JavaScript文件。在一些实施例中,桌面小程序包括XML(可扩展标记语言)文件和JavaScript文件(例如,Yahoo!桌面小程序)。
结合RF电路系统108、触摸屏112、显示系统控制器156、接触模块130、图形模块132、文本输入模块134、和浏览器模块147,桌面小程序创建器模块150可以被用户用来创建桌面小程序(例如,将网页的用户指定部分转到桌面小程序中)。
结合触摸屏112、显示系统控制器156、接触模块130、图形模块132、和文本输入模块134,搜索模块151包括用于根据用户指令在存储器102中搜索与一个或多个搜索标准(例如,用户指定的一个或多个搜索词)匹配的文本、音乐、声音、图像、视频、和/或其他文件的可执行指令。
结合触摸屏112、显示系统控制器156、接触模块130、图形模块132、音频电路系统110、扬声器111、RF电路系统108、和浏览器模块147,视频和音乐播放器模块152包括允许用户下载和回放以一种或多种文件格式(诸如MP3或AAC文件)存储的所记录的音乐和其他声音文件的可执行指令,以及用于显示、呈现或以其他方式回放视频(例如,在触摸屏112上或在经由外部端口124连接的外部显示器上)的可执行指令。在一些实施例中,设备100可以包括MP3播放器的功能性。
结合触摸屏112、显示控制器156、接触模块130、图形模块132、和文本输入模块134,便签模块153包括用于根据用户指令创建和管理便签、待办任务清单等的可执行指令。
结合RF电路系统108、触摸屏112、显示系统控制器156、接触模块130、图形模块132、文本输入模块134、GPS模块135、和浏览器模块147,地图模块154可以被用于根据用户指令接收、显示、修改、和存储地图及与地图 相关联的数据(例如,驾车路线;特定位置处或附近的商店和其他兴趣点的数据;和其他基于位置的数据)。
结合触摸屏112、显示系统控制器156、接触模块130、图形模块132、音频电路系统110、扬声器111、RF电路系统108、文本输入模块134、电子邮件客户端模块140、和浏览器模块147,在线视频模块155包括允许用户访问、浏览、接收(例如,流式接收和/或下载)、回放(例如,在触摸屏上或在经由外部端口124连接的外部显示器上)、发送具有到特定在线视频的链接的电子邮件、以及以其他方式管理一种或多种文件格式(诸如H.264)的在线视频的指令。在一些实施例中,使用即时消息模块141、而不是电子邮件客户端模块140来发送到特定在线视频的链接。
结合触摸屏112、显示系统控制器156、接触模块130、图形模块132、音频电路系统110、扬声器111、和麦克风113,声音/音频录制器模块163包括允许用户以一种或多种文件格式(诸如MP3或AAC文件)记录音频(例如,声音)的可执行指令、以及用于呈现或以其他方式回放所记录的音频文件的可执行指令。
结合触摸屏112、显示系统控制器156、接触模块130、和图形模块132,通知模块165包括在触摸屏112上显示通知或警告(诸如传入消息或来电呼叫、日历事件提醒、应用程序事件等等)的可执行指令。
上述每个模块和应用程序对应于用于执行上述一种或多种功能以及在本申请中所介绍的方法(例如,本文中所描述的计算机实现的方法和其他信息处理方法)的一组可执行指令。这些模块(即指令集)不必被实现为分开的软件程序、过程或模块,因此这些模块的各种子组可以在各种实施例中被组合或以其他方式重新布置。在一些实施例中,存储器102可存储上述模块和数据结构的一个子组。此外,存储器102可以存储上面没有描述的另外的模块和数据结构。
在一些实施例中,便携式多功能设备100是这样一种设备,即在该设备上预定义的一组功能的操作唯一地通过触摸屏和/或触控板来执行。通过使用触摸屏和/或触控板作为用于便携式多功能设备100的操作的主要输入控制设备,可以减少便携式多功能设备100上物理输入控制设备(诸如下压按钮、拨号盘等等)的数量。
唯一地可通过触摸屏和/或触控板执行的该预定义的一组功能包括在用 户界面之间的导航。在一些实施例中,当触控板被用户触摸时将便携式多功能设备100从可显示在便携式多功能设备100上的任何用户界面导航到主要菜单、主菜单或根菜单。在这样的实施例中,触控板可被称为“菜单按钮”。在一些其他实施例中,菜单按钮可以是物理下压按钮或者其他物理输入控制设备,而不是触控板。
图2示出了根据一些实施例的具有触摸屏112的一种便携式多功能设备100。触摸屏可以在用户界面(UI)200内显示一个或多个图形。在该实施例中,以及在下文中介绍的其他实施例中,用户可以通过例如用一根或多根手指202(在附图中没有按比例绘制)或者用一个或多个触控笔203(在附图中没有按比例绘制)在图形上作出手势来选择这些图形中的一个或多个。在一些实施例中,当用户中断与该一个或多个图形的接触时,发生对一个或多个图形的选择。在一些实施例中,手势可包括一次或多次轻击、一次或多次滑动(从左向右、从右向左、向上和/或向下)和/或已经与便携式多功能设备100接触的手指(从右向左、从左向右、向上和/或向下)拨动。在一些实施例中,无意地与图形接触不会选择该图形。例如,当对应于选择的手势是轻击时,在应用程序图标之上扫动的轻扫手势不会选择相应的应用程序。
便携式多功能设备100还可包括一个或多个物理按钮,诸如“主屏幕”或菜单按钮204。如前所述,菜单按钮204可以被用于导航到可以在设备100上运行的一组应用程序中的任何应用程序136。或者,在一些实施例中,菜单按钮被实现为显示在触摸屏112上的GUI中的软键。
在一个实施例中,便携式多功能设备100包括触摸屏112、菜单按钮204、用于设备开关机和锁定设备的下压按钮206、(一个或多个)音量调节按钮208、用户身份模块(SIM)卡槽210、耳麦插孔212、和对接/充电外部端口124。下压按钮206可被用于通过压下该按钮并将该按钮保持在压下状态达预定义的时间间隔来对设备进行开关机;通过压下该按钮并在经过该预定义的时间间隔之前释放该按钮来锁定设备;和/或解锁设备或发起解锁过程。在一另选的实施例中,便携式多功能设备100还可通过麦克风113接受用于激活或解除激活某些功能的语音输入。
图3是根据一些实施例的具有显示器和触敏表面的一种示例性设备的框图。设备300不必是便携式的。在一些实施例中,设备300是膝上型计算机、台式计算机、平板电脑、多媒体播放器设备、导航设备、教育设备(诸如儿 童学习玩具)、游戏系统或控制设备(例如,家用或工业用控制器)。设备300通常包括一个或多个处理单元(CPU)310、一个或多个网络或其他通信接口360、存储器370和用于将这些部件互联的一根或多根通信总线320。在一些实施例中,处理单元310包括图像信号处理器和双核或多核处理器。通信总线320可包括将系统部件互联及控制系统部件之间通信的电路系统(有时称为芯片组)。设备300包括具有显示器340的输入/输出(I/O)接口330,显示器340通常是触摸屏显示器。I/O接口330还可以包括键盘和/或鼠标(或其他指向设备)350和触控板355。设备300还包括光学传感器164和光学传感器控制器158。存储器370包括高速随机存取存储器,诸如DRAM、SRAM、DDR RAM或其他随机存取固态存储器设备;并可包括非易失性存储器,诸如一个或多个磁盘存储设备、光盘存储设备、闪存设备、或其他非易失性固态存储设备。任选地,存储器370可包括从CPU 310远程定位的一个或多个存储设备。在一些实施例中,存储器370存储与电子设备100(图1)的存储器102中所存储的程序、模块和数据结构类似的程序、模块、和数据结构,或它们的子组。此外,存储器370可存储在电子设备100的存储器102中不存在的另外的程序、模块、和数据结构。例如,设备300的存储器370可存储画图模块380、呈现模块382、文字处理模块384、网页创建模块386、盘编辑模块388、和/或电子表格模块390,而电子设备100(图1)的存储器102可不存储这些模块。
图3中上述所识别的元件的每一个可被存储在一个或多个前面提到的存储器设备中。上述所识别的模块的每一个对应于用于执行上述功能的一组指令。上述所识别的模块或程序(即,指令集)不必被实现为单独的软件程序、过程或模块,并且因此这些模块的各种子组可以在各种实施例中被组合或以其它方式重新布置。在一些实施例中,存储器370可存储上述模块和数据结构的子组。此外,存储器370可存储上面没有描述的另外的模块和数据结构。
需要说明的是:本文所涉及的电子设备包括图1和图2中的便携多功能设备100和图3中的设备300。
现在将关注点转到可以在电子设备(诸如设备300或电子设备100)上实现的检测文档边缘的实施例。
图4a至图4b示出了电子设备的一种使用场景示意图。
图4a示出了电子设备调用摄像装置拍摄文档,通常摄像装置拍摄画面 一般要大于文档等源矩形画面的大小,并且存在一定的倾斜角。电子设备内部的单元实时处理拍摄画面,将感兴趣的源矩形画面四个顶点构成的四边形以有序坐标形式输出。
图4b示出了所拍摄图像中源矩形画面所在四边形区域的示例。源矩形画面所在四边形顶点用圆圈标出,该四边形仅占据拍摄画面的一部分,并且存在倾斜。下面以上述拍摄的图像作为输入,以检测到的四个角点构成的四边形作为输出为例描述技术方案。
图5示出了根据一些实施例的用于检测文档边缘的过程或算法。
103.电子设备获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;
105.该电子设备对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;
107.该电子设备基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
可选的,所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。
在图5的基础上,可选的,参见图6所示,图6与图5相比,增加了两个可选步骤101和102,101和102可以独立选用,也可以101和102都选用。
101,该电子设备对输入的彩色图像执行颜色空间转换。
102.该电子设备对待处理彩色图像执行直方图均衡化。
对于步骤101,对输入的彩色图像执行颜色空间转换的示意图可参见图7,图7示出了RGB颜色空间转换到HSV颜色空间的示意图。下面假设步骤101输入的彩色图像可以是由电子设备的拍摄装置捕获的图像,也可以是电子设备保存的图像,也可以是执行直方图均衡化的彩色图像。且假设步骤101输入的彩色图像由RGB颜色通道构成,而最符合人眼视觉原理的为HSV颜色通道,使用HSV通道检测文档边缘符合人眼与检测文档边缘算法的感 受一致性,即检测文档边缘算法能够检测出人眼感知到的边缘信息。此处可使用颜色空间转换算法将RGB颜色通道转化为HSV颜色通道,具体过程如下,其中R、G、B分别表示RGB颜色通道的Red、Green、Blue三颜色通道信息,H、S、V表示HSV颜色通道的Hue、Saturation、Value三颜色通道信息:
(1)max=max(R,G,B)
(2)min=min(R,G,B)
(3)if R=max,H=(G-B)/(max-min)
(4)if G=max,H=2+(B-R)/(max-min)
(5)if B=max,H=4+(R-G)/(max-min)
(6)H=H*60
(7)if H<0,H=H+360
(8)V=max(R,G,B)
(9)S=(max-min)/max
对于步骤102,对待处理彩色图像可以是由电子设备的拍摄装置捕获的图像,也可以是电子设备保存的图像,这种情况下,相当于省略可选步骤101直接选用可选步骤102或者先执行步骤102后执行步骤101。如果先执行步骤101后执行步骤102,则待处理图像为执行颜色空间转换后的彩色图像。执行直方图均衡化的效果示意图如图8所示。通常直方图均衡化是将原始图像某通道的分布直方图从比较集中的某个空间变成在全部取值范围内的均匀分布,对图像进行非线性拉伸,从而增加了图像整体对比度,直方图均衡化过程中一般满足如下条件:
(1)像素无论怎么映射,一定要保证原来的大小关系不变,较亮的区域,依旧是较亮的,较暗依旧暗,只是对比度增大,绝对不能明暗颠倒;
(2)如果是八位图像,那么像素映射函数的值域应在0和255之间的,不能越界
(3)通过累积分布函数均匀分配像素值。
对于步骤105,包括但不限于以下三种方式:
方式一:
1051a,计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值及梯度方向。
1052a,将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态。
1053a,从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;
重复执行步骤1053a直至所述彩色图像的全部像素点标记为所述第二状态。具体地,(1)梯度实质是是对图像这个二维离散函数的求导,反映了图像的强弱变化。梯度包含梯度值、梯度方向两个维度,梯度值反映了图像强弱变化具体大小,而梯度方向则指代图像强弱变化最大的方向。以Hue单通道为例,在坐标点(i,j)处的值为h(i,j),则图像在该点x轴方向的变化dx(i,j)=I(i+1,j)-I(i,j);y轴方向的变化dy(i,j)=I(i,j+1)-I(i,j);因此Hue单通道图像在像素点(i,j)处的梯度值为GradH(i,j)=dx(i,j)+dy(i,j),类似的,图像在像素点(i,j)处的Saturation通道梯度为GradS(i,j),Value通道梯度GradV(i,j)。将所有像素点上的三个通道最大梯度值定义为该点的梯度,即Grad(i,j)=max(GradH(i,j),GradS(i,j),GradV(i,j)),然后将所有像素点按梯度值大小进行排序,若某像素点存在某一通道将梯度值大于某一阈值的点标记为第一状态(例如,Unused)。应理解:梯度是向量,梯度值是标量。
某像素点处梯度方向是图像亮弱变化最大的方向,通常文档边缘亮弱变化比较明显,文档边缘方向一般是梯度方向的垂直方向,此处用于决定直线的搜索方向。
图9示出了图像的梯度和图像的梯度方向。
(2)从图像中标记为第一状态(例如,Unused)的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道同时执行直线检测。在某一单通道上直线检测的具体过程为:沿该点梯度方向的垂直方向搜索标记为第一状态的像素点,直至搜索路径上标记为第二状态(例如,Used)的像素点超过一定数目(例如,数目设定为3),则停止搜索。统计该搜索方向上标记为第一状态的像素点数目超过一定阈值,则定义从起始点到搜索结束点为该颜色通道上的一条直线,多个颜色通道存在多条直线。可选的,还可以设置一个条件来进一步限定搜索结束点,即, 所有像素点与起始点梯度方向夹角小于某一阈值。
(3)将从该起始点出发不同颜色通道的直线按照长度排序,保存长度最长直线,加入候选文档边缘列表,并将所有通道中该最长直线上点标记为第二状态(例如Used)。
(4)继续执行步骤(2)和(3),直至所有像素点标记为第二状态,候选文档直线检测结束。
方式二:
对该起始点的所有颜色通道中每个颜色通道执行如下操作1051b和1052b:
1051b,在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;
1052b,对所述多通道混合边缘上的边缘点执行霍夫(Hough)直线检测,将检测到的直线加入候选文档边缘列表。
具体地,(1.1)在图像中不同颜色通道分别执行Canny边缘点检测,将不同通道检测到的边缘均标记为边缘点,构建多通道混合边缘。
(1.2)在多颜色通道混合边缘点上执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
其中,Canny边缘检测的具体流程如图10所示:
(1001)利用高斯滤波器平滑图像以去除图像噪声
对原始数据与高斯核作卷积,得到的图像与原始图像相比有些轻微的模糊。这样,单独的一个突出像素在经过高斯平滑的图像上变得几乎没有影响。
(1002)用一阶偏导的有限差分来计算梯度值(即,梯度的幅值)和梯度方向(即,梯度的方向)。
图像中的边缘可能会指向不同的方向,所以Canny边缘检测使用4个卷积模板检测水平、垂直以及对角线方向的边缘。原始图像与每个卷积模板所作的卷积都存储起来。对于每个像素点我们都标识在这个像素点上的最大值以及生成的边缘的方向。这样我们就从原始图像生成了图像中每个像素点亮度梯度图以及亮度梯度的方向。
(1003)对梯度幅值进行非极大值抑制。
仅仅得到全局的梯度并不足以确定边缘,因此为确定边缘,必须保留局 部梯度最大的点,而抑制非极大值(non-maxima suppression,缩写:NMS)。具体地解决方法为利用梯度的方向:将梯度角离散为圆周的四个扇区之一,以便用3*3的窗口作抑制运算。四个扇区的标号为0到3,对应3*3邻域的四种可能组合。在每一像素点上,邻域的中心象素与沿着梯度线的两个象素相比。如果中心象素的梯度值不比沿梯度线的两个相邻象素梯度值大,则令中心象素为0。图11示出了抑制非极大值的示意图。
(1004)用双阈值算法检测和连接边缘。
减少假边缘数量的典型方法是双阈值算法,将低于阈值的所有值赋零。双阈值算法包含对非极大值抑制图象作用的两个阈值τ1和τ2,且2τ1≈τ2,从而可以得到两个阈值边缘图象。高阈值图像含有很少的假边缘,但有间断(不闭合),需要低阈值图像中寻找边缘与原有高阈值图像边缘连接成轮廓。
其中,霍夫(Hough)直线检测的流程如图12所示:
(1.2.1)从Canny检测获得的边缘点集合进行极坐标变换,由直线方程可知,在原直角坐标系上位于同一直线上的点落在极坐标系中的同一坐标上;
(1.2.2)在边缘点集合对应的极坐标上随机选取一个像素点,对应的累加器加1;
(1.2.3)将该边缘点从集合中删除,继续重复步骤(1.2.2),直至该极坐标点上不存在边缘点;
(1.2.4)若该像素点对应的累加器数目大于某一阈值,则定位该像素点为一条直线。
方式三:
1051c,计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,…,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;
1052c在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。应理解:“选择m*f(x(i))条长度较长的直线”可以理解为选择的m*f(x(i))条直线是执行直线检测得到的直线排序靠前的m*f(x(i))条,排序的原则为按照长度的大小从大到小排列。另外,例如,f(x(i))=1-x(i)/sum(x(i))。
对于步骤107,基于执行所述直线检测得到部分或全部直线以及预设条 件检测到四边形,包括:
将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别,所述四种类别包括:上、下、左和右;
从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；
从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果,所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
应理解:所述预设条件包括所述四边形的对边夹角值小于第一阈值(例如第一阈值取值范围为0-50度),所述四边形的临边夹角值介于预设角度值域(例如,预设角度值域为(60度-120度)),且所述四边形的对边距离大于第二阈值(例如,第二阈值为为图像相应边的1/5长度),所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
在前述技术方案的基础上,一种可选方案如图13所示,具体步骤参见前述相应描述,此处不再赘述。
应理解:对于步骤107中的“基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形”可以理解为多颜色通道直线融合,即每个颜色通道提供部分边缘,最后融合为一个四边形,如图14所示。
前述描述的技术方案针对位于图像不同位置的局部边缘直线,从多个颜色通道同时进行直线检测,通过多颜色通道直线融合,将相同局部区域不同颜色通道检测到的最长直线加入边缘候选列表。如图15所示,卡片左右侧光线不同,左边缘V通道检测效果较好,而右边缘S通道更为理想。
在前述技术方案的基础上，一种可选方案如图16所示，与前述步骤相同的步骤请参见前述相应描述，此处不再赘述。可选的：在步骤107之后，对检测到的四边形执行以下至少一种处理：四边形原始比例估计；或者，姿态投影矩阵估计；或者，四边形校正；或者，图像增强。图16示出了包括所有可选步骤的示意图，这些可选步骤采用的越多，最后输出的图像的效果越好。例如，在检测到四边形之前，增加图像预处理操作，可以进一步提高在文档边缘与文档所处背景区分不明显的时候检测文档边缘的成功率。再例如，在检测到四边形之后，增加图16中部分或所有可选步骤，可以进一步实现文档校正，使得电子设备在倾斜角度下拍摄文档时都能输出清晰的矩形文档。
在前述技术方案的基础上,一种可选的多颜色通道检测文档边缘算法流程图如图17所示,其中多颜色通道直线融合的处理流程参见图18。具体步骤参见前述相应描述,此处不再赘述。
在前述技术方案的基础上,一种可选方案如图19所示,与前述步骤相同的步骤请参见前述相应描述,此处不再赘述。需要说明的是:Sobel边缘检测是Canny边缘检测的替代方案,还可以采用其他边缘检测的技术。
在前述技术方案的基础上,一种可选方案如图20所示,与前述步骤相同的步骤请参见前述相应描述,此处不再赘述。
在前述技术方案的基础上,一种可选方案如图21所示,与前述步骤相同的步骤请参见前述相应描述,此处不再赘述。
根据一些实施例,图22示出了处理单元的功能框图。图22中的处理单元1702可以为图1和图2中处理器120;或,图22中的处理单元1702可以为图3中央处理器310;或,图22中的处理单元1702可以为图1和图2中未示出的协处理器;或图22中的处理单元1702可以为图1和图2中未示出的图形处理器(Graphics Processing Unit,缩写:GPU);或,图22中的处理单元1702可以为图3中未示出的协处理器;或,图22中的处理单元1702可以为图3中未示出的图形处理器。处理单元1702的功能模块可由硬件、软件、或者软硬件组合实现,以执行本发明的原理。本领域的技术人员能够理解,图22中所述的功能块可被组合为或者被分离为子块,以实现上述的本发明的原理。因此,本文的描述可以支持本文所述功能块的任何可能的组合或分离或进一步限定。应理解:处理单元可以为处理器,也可以为协处理器。
如图22中所示,处理单元1702包括:获取单元102,获取单元102用于获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;以及直线检测单元103,该直线检测单元103耦接到获取单元102。该直线检测单元103用于对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;还有检测四边形单元104,该检测四边形单元104耦接到直线检测单元103,该检测四边形单元104用于基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈 值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
可选的,所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。
可选的,所述处理单元1702还包括预处理单元101,该预处理单元101用于在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换和/或直方图均衡化。
可选的,所述处理单元1702还包括四边形原始比例估计单元105,姿态投影矩阵估计单元106,四边形校正单元107以及图像增强单元108中的至少一个。其中,该四边形原始比例估计单元105用于对检测到的四边形执行原始比例估计。该姿态投影矩阵估计单元106用于对输入的四边形执行姿态投影矩阵估计。该四边形校正单元107用于对输入的四边形执行四边形校正。该图像增强单元108用于对输入的图像执行图像增强。
以上对处理单元1702的功能进行描述,本领域普通技术人员可以理解,以下的功能跟本文描述的电子设备、系统、装置、方法、图形用户界面、信息处理装置(例如:处理器芯片或者处理器芯片组)、计算机可读存储介质各个实施例是相呼应的,其互相结合和/或组合的各种情况均为本领域普通技术人员在理解本文以后可以直接而毫无疑义地想到的组合。
应当理解,本领域普通技术人员通过理解本文可以认识到上面参考图5至图21所述的操作可由图1或图3或图22中所示的部件来实现。类似地,本领域技术人员会清楚地知道基于在图1或图3或图22中所示的部件可如何实现其他过程。
为了解释的目的,前面的描述是通过参考具体实施例来进行描述的。然而,上面的示例性的讨论并非意图是详尽的,也并非意图要将本发明限制到所公开的精确形式。根据以上教导内容,很多修改形式和变型形式都是可能的。选择和描述实施例是为了充分阐明本发明的原理及其实际应用,以由此使得本领域的其他技术人员能够充分利用具有适合于所构想的特定用途的各种修改的本发明以及各种实施例。

Claims (31)

  1. 一种电子设备,包括:
    显示器;
    一个或多个处理器;
    存储器;
    多个应用程序;以及
    一个或多个程序,其中所述一个或多个程序被存储在所述存储器中并被配置为被所述一个或多个处理器执行,所述一个或多个程序包括指令,所述指令用于:
    获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
  2. 如权利要求1所述的电子设备,其中,所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。
  3. 如权利要求1或2所述的电子设备,其中,所述指令进一步用于:
    在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换或直方图均衡化中的至少一个。
  4. 如权利要求1至3任一项所述的电子设备,其中,所述指令进一步用于:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    S1:计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值 及梯度方向;
    S2:将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态;
    S3:从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;
    重复执行步骤S3直至所述彩色图像的全部像素点标记为所述第二状态。
  5. 如权利要求4所述的电子设备,其中,所述指令进一步用于:
    所述从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线,包括:
    对该起始点的所有颜色通道中每个颜色通道执行如下操作:
    沿该起始点的所述梯度方向的垂直方向搜索标记所述第一状态的像素点,直至搜索路径上标记为所述第二状态的像素点的数量大于第三阈值;
    确定执行直线检测得到的一个直线,所述直线的两个端点分别为所述起始点和搜索路径上的结束点,该搜索路径上标记为所述第一状态的像素点的数量大于第四阈值。
  6. 如权利要求1至3任一项所述的电子设备,其中,所述指令进一步用于:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;
    对所述多通道混合边缘上的边缘点执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
  7. 如权利要求1至3任一项所述的电子设备,其中,所述指令进一步用于:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,...,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;
    在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。
  8. 如权利要求1-7任一项所述的电子设备,其中,所述指令进一步用于:
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,包括:
    将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别,所述四种类别包括:上、下、左和右;
    从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；
    从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果,所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
  9. 如权利要求1-8任一项所述的电子设备,其中,所述指令进一步用于:
    对检测到的四边形执行以下至少一种处理:
    四边形原始比例估计;或者,
    姿态投影矩阵估计;或者,
    四边形校正;或者,
    图像增强。
  10. 一种方法,用于电子设备,所述方法包括:
    获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
  11. 如权利要求10所述的方法,其中,
    所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。
  12. 如权利要求10或11所述的方法,进一步包括:
    在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换或直方图均衡化中的至少一个。
  13. 如权利要求10至12任一项所述的方法,其中,
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    S1:计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值及梯度方向;
    S2:将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态;
    S3:从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;
    重复执行步骤S3直至所述彩色图像的全部像素点标记为所述第二 状态。
  14. 如权利要求13所述的方法,其中,
    所述从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线,包括:
    对该起始点的所有颜色通道中每个颜色通道执行如下操作:
    沿该起始点的所述梯度方向的垂直方向搜索标记所述第一状态的像素点,直至搜索路径上标记为所述第二状态的像素点的数量大于第三阈值;
    确定执行直线检测得到的一个直线,所述直线的两个端点分别为所述起始点和搜索路径上的结束点,该搜索路径上标记为所述第一状态的像素点的数量大于第四阈值。
  15. 如权利要求10至12任一项所述的方法,其中,
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;
    对所述多通道混合边缘上的边缘点执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
  16. 如权利要求10至12任一项所述的方法,其中,
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,...,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;
    在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。
  17. 如权利要求10至16任一项所述的方法,其中,
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,包括:
    将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别,所述四种类别包括:上、下、左和右;
    从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；
    从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果,所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
  18. 如权利要求10至17任一项所述的方法,进一步包括:
    对检测到的四边形执行以下至少一种处理:
    四边形原始比例估计;或者,
    姿态投影矩阵估计;或者,
    四边形校正;或者,
    图像增强。
  19. 一种存储一个或多个程序的计算机可读存储介质,所述一个或多个程序包括指令,所述指令当被具有加速度计和磁力计的电子设备执行时使所述电子设备执行以下事件:
    获取一个彩色图像的每个像素点的多颜色通道数据,所述多颜色通道数据包括像素点的二维坐标值,以及像素点在各个颜色通道的值;
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测;
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,所述预设条件包括所述四边形的对边夹角值小于第一阈值,所述四边形的临边夹角值介于预设角度值域,且所述四边形的对边距离大于第二阈值,所述第一阈值为大于零的整数,所述第二阈值为大于零的整数。
  20. 如权利要求19所述的计算机可读存储介质,其中,
    所述预设条件还包括:实际检测到的边缘像素数与所述四边形周长的比值最大。
  21. 如权利要求19或20所述的计算机可读存储介质,其中,所述电子 设备进一步执行以下事件:
    在获取一个彩色图像的每个像素点的多颜色通道数据之前,还对所述彩色图像执行预处理,所述预处理包括颜色空间转换或直方图均衡化中的至少一个。
  22. 如权利要求19至21任一项所述的计算机可读存储介质,其中,所述电子设备进一步执行以下事件:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    S1:计算所述彩色图像中每一个像素点在每个颜色通道中的梯度值及梯度方向;
    S2:将所有颜色通道中的梯度值中的最大值大于第三阈值的像素点标记为第一状态;
    S3:从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线;将从该起始点出发执行直线检测得到所有颜色通道对应的直线中最长直线保存,加入候选文档边缘列表,并将所有颜色通道中该最长直线上点标记为第二状态;
    重复执行步骤S3直至所述彩色图像的全部像素点标记为所述第二状态。
  23. 如权利要求22所述的计算机可读存储介质,
    所述电子设备进一步执行以下事件:
    所述从标记为所述第一状态的多个像素点中选择所有颜色通道中的梯度值中的最大值为最大的像素点作为起始点,在所有颜色通道执行直线检测得到所有颜色通道对应的直线,包括:
    对该起始点的所有颜色通道中每个颜色通道执行如下操作:
    沿该起始点的所述梯度方向的垂直方向搜索标记所述第一状态的像素点,直至搜索路径上标记为所述第二状态的像素点的数量大于第三阈值;
    确定执行直线检测得到的一个直线，所述直线的两个端点分别为所述起始点和搜索路径上的结束点，该搜索路径上标记为所述第一状态的像素点的数量大于第四阈值。
  24. 如权利要求19至21任一项所述的计算机可读存储介质,
    所述电子设备进一步执行以下事件:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    在所述彩色图像中所有多颜色通道分别执行Canny边缘检测,将在所有多颜色通道中检测到的边缘均标记为边缘点,构建多颜色通道混合边缘;
    对所述多通道混合边缘上的边缘点执行霍夫直线检测,将检测到的直线加入候选文档边缘列表。
  25. 如权利要求19至21任一项所述的计算机可读存储介质,
    所述电子设备进一步执行以下事件:
    对所述彩色图像的每个像素点的多颜色通道数据执行直线检测,包括:
    计算所述彩色图像每个颜色通道的复杂度x(i),i为颜色通道编号,i=0,1,...,n-1,n为颜色通道数目,所述复杂度包括所述彩色图像的信息熵,或者所述彩色图像的JPG压缩率;
    在每个颜色通道分别执行直线检测,并分别按照长度的大小排序,
    选择m*f(x(i))条长度较长的直线加入候选文档边缘列表,f(x(i))为介于0和1之间的数值,m为候选文档边缘数目,m*f(x(i))为i颜色通道的保留直线数目。
  26. 如权利要求19至21任一项所述的计算机可读存储介质,
    所述电子设备进一步执行以下事件:
    基于执行所述直线检测得到部分或全部直线以及预设条件检测到四边形,包括:
    将基于执行所述直线检测得到的候选直线按照倾斜角以及位置分成四种类别,所述四种类别包括:上、下、左和右;
    从每个类别直线中循环选择一条直线，按所述预设条件构建四边形集合；
    从所述四边形集合中选取比值最大的一个四边形作为边缘检测的结果,所述比值是实际检测到的边缘像素数除以拟合四边形周长得到的值。
  27. 如权利要求19至26任一项所述的计算机可读存储介质,
    所述电子设备进一步执行以下事件:
    对检测到的四边形执行以下至少一种处理:
    四边形原始比例估计;或者,
    姿态投影矩阵估计;或者,
    四边形校正;或者,
    图像增强。
  28. 一种电子设备,包括:
    显示器;一个或多个处理器;存储器;多个应用程序;以及一个或多个程序,其中所述一个或多个程序被存储在所述存储器中并被配置为被所述一个或多个处理器执行,所述一个或多个程序包括用于执行根据权利要求1至9任一项所述的方法的指令。
  29. 一种存储一个或多个程序的计算机可读存储介质,所述一个或多个程序包括指令,所述指令当被包括显示器和多个应用程序的电子设备执行时使所述电子设备执行根据权利要求1至9任一项所述方法。
  30. 一种电子设备上的图形用户界面,所述电子设备包括显示器、存储器、多个应用程序;和用于执行存储在所述存储器中的一个或多个程序的一个或多个处理器,所述图形用户界面包括根据权利要求1至9任一项所述的方法显示的用户界面。
  31. 一种电子设备,包括:
    显示器;
    多个应用程序;以及
    用于执行根据权利要求1至9任一项所述的方法的装置。
PCT/CN2016/113987 2016-12-30 2016-12-30 用于处理文档的设备、方法和图形用户界面 WO2018120238A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP16924927.3A EP3547218B1 (en) 2016-12-30 2016-12-30 File processing device and method, and graphical user interface
CN201680091829.4A CN110100251B (zh) 2016-12-30 2016-12-30 用于处理文档的设备、方法和计算机可读存储介质
US16/473,678 US11158057B2 (en) 2016-12-30 2016-12-30 Device, method, and graphical user interface for processing document
PCT/CN2016/113987 WO2018120238A1 (zh) 2016-12-30 2016-12-30 用于处理文档的设备、方法和图形用户界面

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/113987 WO2018120238A1 (zh) 2016-12-30 2016-12-30 Device, method, and graphical user interface for processing document

Publications (1)

Publication Number Publication Date
WO2018120238A1 true WO2018120238A1 (zh) 2018-07-05

Family

ID=62706782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/113987 WO2018120238A1 (zh) 2016-12-30 2016-12-30 Device, method, and graphical user interface for processing document

Country Status (4)

Country Link
US (1) US11158057B2 (zh)
EP (1) EP3547218B1 (zh)
CN (1) CN110100251B (zh)
WO (1) WO2018120238A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018055496A (ja) * 2016-09-29 2018-04-05 日本電産サンキョー株式会社 Medium recognition device and medium recognition method
CN110100251B (zh) * 2016-12-30 2021-08-20 华为技术有限公司 Device, method, and computer-readable storage medium for processing document
CN108419062B (zh) * 2017-02-10 2020-10-02 杭州海康威视数字技术股份有限公司 Image fusion device and image fusion method
JP6810892B2 (ja) * 2017-06-05 2021-01-13 京セラドキュメントソリューションズ株式会社 Image processing device
JP6996200B2 (ja) * 2017-09-29 2022-01-17 富士通株式会社 Image processing method, image processing device, and image processing program
CN110926486B (zh) * 2019-11-26 2021-06-11 百度在线网络技术(北京)有限公司 Route determination method, apparatus, device, and computer storage medium
CN111539269A (zh) * 2020-04-07 2020-08-14 北京达佳互联信息技术有限公司 Text region recognition method and apparatus, electronic device, and storage medium
CN111582134A (zh) * 2020-04-30 2020-08-25 平安科技(深圳)有限公司 Certificate edge detection method, apparatus, device, and medium
CN112418204A (zh) * 2020-11-18 2021-02-26 杭州未名信科科技有限公司 Text recognition method and system based on paper documents, and computer medium
CN115061616A (zh) * 2022-06-02 2022-09-16 北京字跳网络技术有限公司 Document view switching method and apparatus, and electronic device
CN115063414B (zh) * 2022-08-05 2022-12-20 深圳新视智科技术有限公司 Detection method, apparatus, device, and storage medium for adhesive tape of lithium battery electrode sheet

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7301564B2 (en) * 2002-07-17 2007-11-27 Hewlett-Packard Development Company, L.P. Systems and methods for processing a digital captured image
US7171056B2 (en) 2003-02-22 2007-01-30 Microsoft Corp. System and method for converting whiteboard content into an electronic document
US7672507B2 (en) * 2004-01-30 2010-03-02 Hewlett-Packard Development Company, L.P. Image processing methods and systems
US7330604B2 (en) * 2006-03-02 2008-02-12 Compulink Management Center, Inc. Model-based dewarping method and apparatus
US20070253040A1 (en) 2006-04-28 2007-11-01 Eastman Kodak Company Color scanning to enhance bitonal image
JP4835459B2 (ja) 2007-02-16 2011-12-14 富士通株式会社 Table recognition program, table recognition method, and table recognition device
EP2143041A4 (en) 2007-05-01 2011-05-25 Compulink Man Ct Inc Photo-document segmentation method and system
JP4402138B2 (ja) 2007-06-29 2010-01-20 キヤノン株式会社 Image processing apparatus, image processing method, and computer program
US9672510B2 (en) * 2008-01-18 2017-06-06 Mitek Systems, Inc. Systems and methods for automatic image capture and processing of documents on a mobile device
JP5274305B2 (ja) 2009-02-27 2013-08-28 キヤノン株式会社 Image processing apparatus, image processing method, and computer program
JP5596938B2 (ja) 2009-06-02 2014-09-24 キヤノン株式会社 Image processing apparatus, image processing method, and program
US9047531B2 (en) * 2010-05-21 2015-06-02 Hand Held Products, Inc. Interactive user interface for capturing a document in an image signal
JP5528229B2 (ja) 2010-06-23 2014-06-25 キヤノン株式会社 Document generation apparatus, document generation system, document upload method, and program
US8781152B2 (en) * 2010-08-05 2014-07-15 Brian Momeyer Identifying visual media content captured by camera-enabled mobile device
US8989515B2 (en) * 2012-01-12 2015-03-24 Kofax, Inc. Systems and methods for mobile image capture and processing
US10289660B2 (en) * 2012-02-15 2019-05-14 Apple Inc. Device, method, and graphical user interface for sharing a content object in a document
JP2014092899A (ja) * 2012-11-02 2014-05-19 Fuji Xerox Co Ltd Image processing apparatus and image processing program
WO2014160426A1 (en) * 2013-03-13 2014-10-02 Kofax, Inc. Classifying objects in digital images captured using mobile devices
CN105830091A (zh) 2013-11-15 2016-08-03 柯法克斯公司 Systems and methods for generating composite images of long documents using mobile video data
US10115031B1 (en) * 2015-02-27 2018-10-30 Evernote Corporation Detecting rectangular page and content boundaries from smartphone video stream
CN105184265A (zh) 2015-09-14 2015-12-23 哈尔滨工业大学 Self-learning-based method for fast recognition of handwritten digit strings in tables
CN105955599B (zh) 2016-05-13 2019-06-07 锐达互动科技股份有限公司 Implementation method for simulating a document reading mode on an electronic device
US10176403B2 (en) * 2016-05-24 2019-01-08 Morphotrust Usa, Llc Shape detection
US10503997B2 (en) * 2016-06-22 2019-12-10 Abbyy Production Llc Method and subsystem for identifying document subimages within digital images
CN110100251B (zh) * 2016-12-30 2021-08-20 华为技术有限公司 Device, method, and computer-readable storage medium for processing document

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105339951A (zh) * 2013-06-12 2016-02-17 柯达阿拉里斯股份有限公司 Method for detecting document boundaries
CN106063240A (zh) * 2013-11-14 2016-10-26 微软技术许可有限责任公司 Image processing for productivity applications
CN105450900A (zh) * 2014-06-24 2016-03-30 佳能株式会社 Distortion correction method and device for document images
WO2016065551A1 (en) * 2014-10-29 2016-05-06 Microsoft Technology Licensing, Llc Whiteboard and document image detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3547218A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442521A (zh) * 2019-08-02 2019-11-12 腾讯科技(深圳)有限公司 Control unit detection method and apparatus
CN110442521B (zh) * 2019-08-02 2023-06-27 腾讯科技(深圳)有限公司 Control unit detection method and apparatus
CN112906681A (zh) * 2019-12-04 2021-06-04 杭州海康威视数字技术股份有限公司 Meter reading method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
US11158057B2 (en) 2021-10-26
EP3547218A4 (en) 2019-12-11
EP3547218A1 (en) 2019-10-02
US20190355122A1 (en) 2019-11-21
EP3547218B1 (en) 2023-12-20
CN110100251A (zh) 2019-08-06
CN110100251B (zh) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2018120238A1 (zh) Device, method, and graphical user interface for processing document
US20230393721A1 (en) Method and Apparatus for Dynamically Displaying Icon Based on Background Image
US10701273B1 (en) User terminal apparatus and control method thereof
US11114130B2 (en) Method and device for processing video
CN107197169B (zh) High dynamic range image shooting method and mobile terminal
US9479693B2 (en) Method and mobile terminal apparatus for displaying specialized visual guides for photography
EP2770729B1 (en) Apparatus and method for synthesizing an image in a portable terminal equipped with a dual camera
CN106713696B (zh) Image processing method and apparatus
US20220408020A1 (en) Image Processing Method, Electronic Device, and Cloud Server
US10290120B2 (en) Color analysis and control using an electronic mobile device transparent display screen
WO2020134558A1 (zh) Image processing method and apparatus, electronic device, and storage medium
EP3893495A1 (en) Method for selecting images based on continuous shooting and electronic device
CN105957037B (zh) Image enhancement method and apparatus
TW201905760A (zh) Display method of panoramic image and electronic device thereof
WO2015196715A1 (zh) Image relocation method and apparatus, and terminal
CN109068063B (zh) Three-dimensional image data processing and display method and apparatus, and mobile terminal
US20230224574A1 (en) Photographing method and apparatus
CN107730443B (zh) Image processing method and apparatus, and user equipment
CN106775548B (zh) Page processing method and apparatus
US11284003B2 (en) User terminal apparatus and control method thereof
CN113056905B (zh) System and method for capturing telephoto-like images
CN108540726B (zh) Continuous shooting image processing method and apparatus, storage medium, and terminal
KR101324809B1 (ko) Portable terminal and control method thereof
CN112995539B (zh) Mobile terminal and image processing method
WO2021243955A1 (zh) Dominant color extraction method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16924927

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2016924927

Country of ref document: EP

Effective date: 20190627