WO2017140233A1 - Text detection method and system, device and storage medium - Google Patents

Text detection method and system, device and storage medium

Info

Publication number
WO2017140233A1
WO2017140233A1 PCT/CN2017/073407 CN2017073407W WO2017140233A1 WO 2017140233 A1 WO2017140233 A1 WO 2017140233A1 CN 2017073407 W CN2017073407 W CN 2017073407W WO 2017140233 A1 WO2017140233 A1 WO 2017140233A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
blocks
color
text
connected blocks
Prior art date
Application number
PCT/CN2017/073407
Other languages
French (fr)
Chinese (zh)
Inventor
徐昆
郭晓威
黄飞跃
郑宇飞
张惜今
卢艺帆
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017140233A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10008 Still image; Photographic image from scanner, fax or copier
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Definitions

  • the invention relates to a text detection technology in an image, in particular to a text detection method and system, a device and a storage medium.
  • a document image is an image format document, which converts a paper document or the like into an image format by some means (such as scanning) for electronic reading by a user.
  • Typical examples of document images are Portable Document Format (PDF) images and DjVu images.
  • the current text detection technology can detect the text in the document image (the area in the image where the text is carried), and perform text recognition based on the detected area of the text.
  • An image in the general sense includes not only document images but also non-document images (that is, images such as those a user uploads to a web album rather than scanned-format images), which may be Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tag Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, and Exchangeable Image File Format (EXIF) images.
  • detecting the text in the image is a necessary pre-step.
  • Current text detection technology uses manually designed features to determine whether an image contains text, and it mostly targets English characters. Because the glyph structures of Chinese and English differ significantly, the accuracy of detecting Chinese in a document image differs greatly from the accuracy of detecting English, which makes it difficult to meet the needs of practical applications.
  • Embodiments of the present invention provide a text detection method, system, device, and storage medium, which can accurately detect text in an image.
  • an embodiment of the present invention provides a text detection method, including:
  • an embodiment of the present invention provides a text detection system, including:
  • a color-reduction and binarization processing unit configured to perform color reduction processing on each of the three color channels of the target image to obtain a color-reduced image, and to convert the target image into a binary image;
  • a first merging unit configured to merge connected blocks having the same color in the color-reduced image, and to merge connected blocks having the same color in the binary image;
  • a second merging unit configured to merge, in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text areas in the target image;
  • a determining unit configured to extract a specific area on the target image corresponding to the position of a candidate text area, and to determine, based on a comparison of the probability that the extracted specific area contains text with a preset probability threshold, whether the extracted specific area contains a text line or a text column.
  • an embodiment of the present invention provides a text detection device, including a memory and a processor, where the memory stores executable instructions that cause the processor to perform the following operations:
  • an embodiment of the present invention provides a storage medium, where executable instructions are stored for performing a text detection method provided by an embodiment of the present invention.
  • The image is divided into connected blocks according to color, and each connected block is a potential bounding box containing characters. A convolutional neural network applied over a sliding window is then used to estimate the probability that each bounding box contains a text line (or a text column).
  • If the probability is greater than the preset probability threshold, the bounding box is determined to contain a text line (or a text column). This processing applies to both document images and non-document images, so the text in an image can be detected accurately.
  • FIG. 1-1 to FIG. 1-6 are schematic diagrams of pixel relationships provided by an embodiment of the present invention.
  • FIG. 2 is an optional structural diagram of a character detection system according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a character detecting device according to an embodiment of the present invention.
  • FIG. 4 is a first schematic flowchart of a text detection method according to an embodiment of the present invention.
  • FIG. 5 is a second schematic diagram of a flow of a text detection method according to an embodiment of the present invention.
  • FIG. 6 to FIG. 9 are schematic diagrams showing detection results of a character detecting method according to an embodiment of the present invention.
  • FIG. 10 to FIG. 11 are schematic diagrams of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a character detection system according to an embodiment of the present invention.
  • The terms "including", "comprising", or any other variations thereof are intended to cover non-exclusive inclusion, such that a method or apparatus comprising a series of elements includes not only the elements explicitly listed but also other elements not explicitly listed, or elements that are inherent to the implementation of the method or apparatus.
  • An element defined by the phrase "comprising a ..." does not exclude the presence of additional related elements (e.g., a step in the method or a unit in the device) in the method or device that includes the element.
  • Gray value: an integer that represents the brightness of a pixel. For example, if pixel values range from 0 to 255, the image is said to have 256 gray levels.
  • Adjacency: two pixels are adjacent if they are in contact. A pixel is in contact with the pixels in its neighborhood. Adjacency considers only the spatial relationship of pixels.
  • Adjacencies include the following types:
  • D adjacency: as shown in FIG. 1-2, the D neighborhood of the pixel p(x, y), denoted ND(p), consists of the pixels on its diagonals: (x+1, y+1); (x+1, y-1); (x-1, y+1); (x-1, y-1).
  • Connectivity includes the following types:
  • Pixels that are connected to each other (under any of the above types of connectivity) form one area, and points that are not connected form different areas.
  • Such a set of points where all points are connected to each other is called a connected domain.
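  • The definitions above can be illustrated with a short sketch. It assumes the standard 4-neighborhood and 8-neighborhood offsets alongside the D neighborhood given above (the extracted text lists only the D neighborhood), and it labels connected domains of equal-valued pixels with a breadth-first search. The names `N4`, `ND`, `N8`, and `connected_domains` are illustrative and do not come from the patent.

```python
from collections import deque

# Offsets (dx, dy) of the 4-neighborhood and the diagonal (D) neighborhood.
N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ND = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
N8 = N4 + ND  # 8-neighborhood = 4-neighborhood plus D neighborhood


def connected_domains(grid, offsets=N8):
    """Label connected domains of equal-valued pixels using breadth-first search.

    grid is a 2-D list of pixel values; returns (labels, number_of_domains).
    """
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue  # already assigned to a domain
            labels[y][x] = count
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for dx, dy in offsets:
                    nx, ny = cx + dx, cy + dy
                    inside = 0 <= nx < w and 0 <= ny < h
                    if inside and labels[ny][nx] == -1 and grid[ny][nx] == grid[cy][cx]:
                        labels[ny][nx] = count
                        queue.append((nx, ny))
            count += 1
    return labels, count
```

  • On the 2x2 checkerboard `[[1, 0], [0, 1]]`, 8-connectivity joins the diagonal pixels and yields two domains, while 4-connectivity (`offsets=N4`) yields four.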
  • Embodiments of the present invention provide a method, system, device, and storage medium for detecting text in images (including images in a scan format and images in a non-scan format). The images described herein include not only images in a conventional scan format, such as PDF, but also non-document images such as Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphic Interchange Format (GIF) images, Exchangeable Image File Format (EXIF) images, and images of any other form.
  • The text detection method, system, device, and storage medium perform text detection to locate the area of an image that carries text.
  • The image detected by the text detection system may be a document image, such as a PDF document, or a non-document image, such as a JPG, BMP, TIFF, GIF, or EXIF image. The main image sources are scans made with electronic devices (such as smartphones, tablets, and laptops), electronic versions of printed matter such as posters, and other digital images containing printed Chinese characters.
  • In step 101, color reduction processing is performed on each of the three color channels of the target image to obtain a color-reduced image, and the target image is converted into a binary image. In step 102, the connected blocks having the same color in the color-reduced image are merged, and the connected blocks having the same color in the binary image are merged.
  • In step 103, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image are merged in the vertical and horizontal directions, respectively, to obtain the candidate text areas in the target image. In step 104, a specific area on the target image corresponding to the position of a candidate text area is extracted, and whether the extracted specific area contains a text line or a text column is determined by comparing the probability that the specific area contains text with a preset probability threshold.
  • The text detection system locates the text lines (or text columns) in the image, as shown in FIG. 6 to FIG. 9, through color clustering, layering, connected-block merging and filtering, and discrimination based on a deep convolutional neural network. A text line may consist of Chinese characters, letters, numbers, symbols, or any combination of these character types, and the text in the text line is then recognized based on the located text line.
  • the text detection system can be implemented by a plurality of servers arranged in a distributed manner.
  • A plurality of servers cooperate to detect text in the image; that is, each server completes at least some of the steps of the text detection method and sends its results to the other servers that depend on them, so that together they form the final text detection result.
  • Alternatively, each server can perform text detection on different images (or the same image) in parallel; that is, the server 21 does not depend on the detection results of the other servers (the server 22 to the server 24) when performing text detection.
  • FIG. 3 shows that the structure of the electronic device 30 is only one example of a suitable structure and is not intended to suggest any limitation with respect to the structure of the electronic device.
  • The electronic device 30 includes a personal computer, a server computer, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant (PDA), or a media player), a consumer electronic device, a small computer, a mainframe computer, a distributed computing environment including any of the above devices, and the like.
  • Computer readable instructions may be distributed via computer readable media (discussed below).
  • Computer readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the functionality of the computer readable instructions can be arbitrarily combined in various environments or distributed.
  • FIG. 3 illustrates an example of the structure of an electronic device 30 provided in accordance with an embodiment of the present invention.
  • electronic device 30 includes at least one processing unit 31 and storage unit 32.
  • The storage unit 32 may be volatile (such as RAM), non-volatile (such as ROM or flash memory), or some combination of the two. This configuration is illustrated by dashed lines in FIG. 3.
  • electronic device 30 may include additional features and/or functionality.
  • electronic device 30 may also include additional storage devices (eg, removable and/or non-removable) including, but not limited to, magnetic storage devices, optical storage devices, and the like.
  • This additional storage device is illustrated by storage unit 33 in FIG. 3.
  • computer readable instructions for implementing one or more embodiments provided by embodiments of the present invention may be in storage unit 33.
  • the storage unit 33 may also store other computer readable instructions for implementing an operating system, an application, and the like.
  • Computer readable instructions may be loaded into storage unit 32 for execution by, for example, processing unit 31.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data.
  • the storage unit 32 and the storage unit 33 are examples of computer storage media.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage device, magnetic tape cassette, magnetic tape, magnetic disk storage device or other magnetic storage device, or any other medium that can be used to store desired information and that can be accessed by electronic device 30. Any such computer storage media may be part of the electronic device 30.
  • Electronic device 30 may also include a communication connection 36 that allows electronic device 30 to communicate with other devices.
  • Communication connection 36 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interface for connecting electronic device 30 to other electronic devices.
  • Communication connection 36 may include a wired connection or wireless connection. Communication connection 36 can transmit and/or receive communication media.
  • Computer readable medium can include a communication medium.
  • Communication media typically embodies computer readable instructions or other data in a “modulated data signal” such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • The term "modulated data signal" can include a signal in which one or more of the signal characteristics are set or changed in such a manner as to encode information into the signal.
  • Electronic device 30 may include an input unit 35 such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device.
  • Output unit 34 may also be included in electronic device 30, such as one or more displays, speakers, printers, and/or any other output device.
  • Input unit 35 and output unit 34 may be connected to electronic device 30 via a wired connection, a wireless connection, or any combination thereof.
  • an input device or output device from another electronic device can be used as the input unit 35 or output unit 34 of the electronic device 30.
  • the components of electronic device 30 can be connected by various interconnects, such as a bus.
  • interconnects may include Peripheral Component Interconnect (PCI) (such as Fast PCI), Universal Serial Bus (USB), Firewire (IEEE 1394), optical bus architecture, and the like.
  • the components of electronic device 30 may be interconnected by a network.
  • The storage unit 32 may be composed of a plurality of physical memory units that are located in different physical locations and interconnected by a network.
  • The method by which the text detection system detects text in the embodiment of the present invention can be applied to a text detection system or text detection device such as those described above, and includes the following steps:
  • Step 201 Perform color reduction processing on the target image to obtain a subtractive image of the target image.
  • Since each channel of the RGB three-color channels has 256 brightness levels (0-255), the target image can have 256^3 (256 cubed) colors. After the brightness of each of the RGB channels is divided into K intervals, the target image has K^3 (K cubed, fewer than 256^3) colors, and the color-reduced image f1 is thus obtained.
  • For example, with K = 2 each channel has two brightness levels, 0 and 1, after quantization: levels 0-127 of each channel's 0-255 range are mapped to quantized brightness 0, and levels 128-255 are mapped to quantized brightness 1. If the brightness of a pixel in the RGB channels of the target image is (0, 122, 255), its brightness after color reduction is (0, 0, 1). This brightness mapping is performed for every pixel in the target image.
  • Step 201 achieves the following technical effect for the above two cases: the text in the color-reduced image has one of K^3 colors.
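  • The quantization in step 201 can be sketched as follows. This is a minimal illustration that assumes equal-width intervals (the text only says the brightness is divided into K intervals); `quantize_channel` and `reduce_colors` are hypothetical names, not taken from the patent.

```python
def quantize_channel(value, k):
    """Map a brightness level in 0..255 to one of k equal-width intervals (0..k-1)."""
    return value * k // 256


def reduce_colors(pixels, k):
    """Quantize every channel of every (R, G, B) pixel into k levels."""
    return [tuple(quantize_channel(c, k) for c in px) for px in pixels]


# With k = 2, levels 0-127 map to 0 and levels 128-255 map to 1.
print(reduce_colors([(0, 122, 255)], 2))  # [(0, 0, 1)]
```

  • After this mapping an image can contain at most K^3 distinct colors, for example 8 colors when K = 2.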
  • Step 202 Perform local binarization processing on the target image to obtain a binary image of the target image.
  • Step 202 achieves the following technical effect for the above two cases: the text in the binary image is either black or white.
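  • The text does not say which local binarization method step 202 uses. The sketch below therefore assumes one common choice, thresholding each pixel against the mean of its local window; the function name `local_binarize` and the parameters `radius` and `offset` are illustrative assumptions only.

```python
def local_binarize(gray, radius=1, offset=0):
    """Binarize a 2-D list of gray values against the local window mean.

    Assumption: a pixel becomes 1 when it is brighter than the mean of the
    (2 * radius + 1)-sized window around it minus `offset`, else 0.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] > mean - offset else 0
    return out
```

  • Because the threshold adapts to each neighborhood, a bright stroke on a dark patch and a dark stroke on a bright patch both end up as one of the two output values, which matches the stated effect that text becomes either black or white.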
  • The pixels corresponding to the text in the color-reduced image and in the binary image obtained in steps 201 and 202 have the same color. In step 203, each pixel is initially treated as a connected block, and connected blocks having the same color are merged so that the text is connected.
  • Step 203: identify the connected blocks in the color-reduced image and in the binary image, merge the connected blocks having the same color in the color-reduced image, and merge the connected blocks having the same color in the binary image.
  • Each pixel of the image is regarded as a vertex of an undirected graph, and adjacent pixels are regarded as joined by an edge, so the entire image is treated as an undirected graph.
  • For each pixel i in the color-reduced image f1 (I1 ≥ i ≥ 1, where I1 is the number of pixels in f1) and a channel X (any one of the RGB three-color channels, here assumed to be the R channel): if pixel i and any pixel j among its 8 adjacent pixels (the pixels above, below, left and right of pixel i and at the ends of its two diagonals) have the same brightness in that channel, the connected block to which pixel i belongs and the connected block to which pixel j belongs are merged into one connected block.
  • Each connected block is then traversed and its pixel area determined: if the pixel area of connected block k (k ranges over the connected blocks) is smaller than the threshold (4 pixels), connected block k is merged into a connected block adjacent to it, and the color of the pixels in connected block k is set to the brightness of the connected block into which it is merged.
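  • The merging of same-brightness neighbors can be sketched with a union-find (disjoint-set) structure, which is one natural way to treat the image as an undirected graph. The class `DisjointSet` and the function `merge_same_color` are illustrative names; the absorption of blocks smaller than 4 pixels is not shown.

```python
class DisjointSet:
    """Minimal union-find over the integers 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_same_color(channel):
    """Return a label per pixel; 8-adjacent pixels of equal brightness share a label.

    channel is a 2-D list holding the quantized brightness of one color channel.
    """
    h, w = len(channel), len(channel[0])
    ds = DisjointSet(h * w)  # every pixel starts as its own connected block
    for y in range(h):
        for x in range(w):
            # Right, down and the two downward diagonals visit each 8-neighbor pair once.
            for dx, dy in ((1, 0), (0, 1), (1, 1), (-1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and channel[ny][nx] == channel[y][x]:
                    ds.union(y * w + x, ny * w + nx)
    return [[ds.find(y * w + x) for x in range(w)] for y in range(h)]
```

  • The same routine applies unchanged to the binary image, where the "brightness" is simply 0 or 1.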
  • Similarly, for each pixel i in the grayscale image of the target image (I2 ≥ i ≥ 1, where I2 is the number of pixels in the grayscale image): if the color (gray value) of pixel i is the same as that of any pixel j among its 8 adjacent pixels (the pixels above, below, left and right of pixel i and at the ends of its two diagonals), the connected block to which pixel i belongs and the connected block to which pixel j belongs are merged into the same connected block.
  • Each connected block is then traversed and its pixel area determined: if the pixel area of connected block k (k ranges over the connected blocks) is smaller than a threshold (4 pixels), connected block k is merged into a connected block adjacent to it, and the gray value of the pixels in connected block k is set to the gray value of the pixels in the connected block into which it is merged.
  • Step 203 thus combines the pixels belonging to the same character (for a Chinese character, at least the same stroke) into one unit, called a connected block, for subsequent processing.
  • Step 204: after the connected blocks in the color-reduced image and the binary image have been merged, discard the connected blocks in the color-reduced image and in the binary image that match the preset features (the preset features here correspond to the features of non-text regions in the image).
  • The connected blocks of each color channel in the color-reduced image f1 and the connected blocks of the binary image f2 are discarded if they meet at least one of the following conditions:
  • the pixel area of the connected block is still smaller than a pixel area threshold (for example, 4 pixels), in which case the block is regarded as not carrying text;
  • the length of either side of the connected block is greater than a first preset ratio (for example, 0.8 times) of the length of the corresponding edge of the image;
  • the length of any side of the connected block is greater than the frame length threshold (such as 65 pixels), and the ratio of the pixel area of the connected block to the area of its bounding box is less than the ratio threshold (such as 0.22).
  • The bounding box of a connected block is the smallest rectangle that includes all the pixels contained in the connected block (the sides of the rectangle are parallel to the x and y axes of the image, so it is uniquely determined).
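  • The three discard conditions can be expressed as a small predicate. The sketch below uses the example thresholds quoted above as default values; the function name `should_discard` and its parameter names are illustrative.

```python
def should_discard(block_w, block_h, pixel_area, image_w, image_h,
                   min_area=4, side_ratio=0.8, frame_len=65, fill_ratio=0.22):
    """Return True if a connected block matches any non-text feature.

    block_w and block_h are the bounding-box sides; pixel_area is the number
    of pixels that actually belong to the connected block.
    """
    # 1) Too few pixels to carry text.
    if pixel_area < min_area:
        return True
    # 2) A side spans almost the whole corresponding image edge.
    if block_w > side_ratio * image_w or block_h > side_ratio * image_h:
        return True
    # 3) Long but sparsely filled, as is typical for frames and borders.
    if max(block_w, block_h) > frame_len and pixel_area / (block_w * block_h) < fill_ratio:
        return True
    return False
```

  • For a 100x100 image, a 70x70 block with only 300 pixels is discarded by the third condition (fill ratio about 0.06), while a 20x20 block with 200 pixels is kept.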
  • Step 206 may then be performed to merge the strokes of characters in the image (such as Chinese characters, or the letters i and j in English).
  • Step 205: merge the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationships (such as distance and intersection), and merge the connected blocks in the binary image into new connected blocks based on their positional relationships (such as distance and intersection).
  • Merge connected blocks whose distance is smaller than the distance threshold (the distance is the Chebyshev distance d between the center points of the bounding boxes of the two connected blocks).
  • Merge connected blocks whose bounding boxes intersect in a way that conforms to the preset intersection feature. For example, if the bounding boxes of two connected blocks intersect, the area of the intersection is greater than a preset 10% of the area of the smaller of the two bounding boxes, and the area of the intersection is less than 10% of the area of the image, the two connected blocks are merged.
  • Merge connected blocks that are aligned. Alignment means that the bounding boxes of the connected blocks are aligned horizontally or vertically, that is: 1) the bounding boxes of the two connected blocks have the same height and the same position in the vertical direction; or 2) the bounding boxes of the two connected blocks have the same width and the same position in the horizontal direction.
  • An example of an alignment merge rule: take the merged bounding box of the two connected blocks (that is, the smallest bounding box containing both bounding boxes); if its area is smaller than an area threshold relative to the image area (for example, 10%), the two connected blocks are merged.
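  • The distance and intersection rules of step 205 can be sketched as below. Boxes are assumed to be `(x0, y0, x1, y1)` tuples; the distance threshold is not given in the text, so `dist_threshold` is a hypothetical parameter, and all function names are illustrative. The alignment rule is not shown.

```python
def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def chebyshev(a, b):
    """Chebyshev distance between the center points of two bounding boxes."""
    (ax, ay), (bx, by) = center(a), center(b)
    return max(abs(ax - bx), abs(ay - by))


def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection_area(a, b):
    """Area shared by two boxes (0 when they do not intersect)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def should_merge(a, b, image_area, dist_threshold, ratio=0.10):
    """Apply the distance rule, then the intersection rule with the 10% example."""
    if chebyshev(a, b) < dist_threshold:
        return True
    inter = intersection_area(a, b)
    return inter > ratio * min(area(a), area(b)) and inter < ratio * image_area
```

  • For instance, boxes `(0, 0, 10, 10)` and `(5, 5, 15, 15)` have a Chebyshev center distance of 5 and share an area of 25, which exceeds 10% of the smaller box and stays below 10% of a 100x100 image, so they are merged even with a distance threshold of 1.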
  • Step 206: merge the connected blocks of each color channel of the RGB three-color channels of the color-reduced image f1 and the connected blocks of the binary image f2 in the vertical and horizontal directions, respectively, to obtain the candidate text areas in the image (including text line areas and text column areas).
  • The purpose is to connect individual characters (such as Chinese characters) into a text line or text column. Based on the join merge rule (the same rule is used for merging in the horizontal direction and in the vertical direction, and is described later), a horizontal merge is performed first, then a vertical merge, and finally another horizontal merge.
  • Horizontally arranged text is more common in images than vertical text, so in step 206 the connected blocks are first merged horizontally. Merging horizontally arranged characters first reduces the possibility that horizontal characters are erroneously merged vertically. The connected blocks are then merged vertically, which joins the blocks that do not satisfy the horizontal merge rule but do satisfy the vertical merge rule. Because this process may change the bounding boxes of connected blocks and so produce new pairs of bounding boxes that satisfy the horizontal merge rule, the connected blocks are merged once more in the horizontal direction.
  • The join merge rule is that two connected blocks are joined into a new connected block when their bounding boxes satisfy at least one of the following conditions:
  • the center distance of the bounding boxes of the two connected blocks on the reference axis (the horizontal axis or the vertical axis), that is, the distance between the center coordinates of the two bounding boxes on that axis, or their minimum edge distance (the distance between the edge coordinates of the two bounding boxes on the reference axis), is less than a first preset ratio (e.g., 0.15 times) of the smaller of the two bounding boxes' side lengths along the reference axis (the side parallel to the reference axis);
  • the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is smaller than a second preset ratio (e.g., twice) of the smaller of the two bounding boxes' side lengths along the reference axis;
  • the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is smaller than a third preset ratio (e.g., 30%) of the smaller of the two bounding boxes' side lengths along the reference axis.
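  • The three conditions of the join merge rule can be sketched as follows, under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples, `ref_axis` is 0 for the horizontal axis and 1 for the vertical axis, and the "minimum edge distance" is read as the smaller of the differences between corresponding edges. The text does not settle that reading, nor which axis serves as the reference for a given merge direction, so `join_conditions` returns the three results separately and leaves their combination to the caller.

```python
def span(box, axis):
    """Return the (low, high) coordinates of a box along axis 0 (x) or 1 (y)."""
    return box[axis], box[axis + 2]


def join_conditions(a, b, ref_axis, r1=0.15, r2=2.0, r3=0.30):
    """Evaluate the three join conditions for two bounding boxes.

    Returns a tuple (close_on_axis, close_across_axis, similar_size).
    """
    a_lo, a_hi = span(a, ref_axis)
    b_lo, b_hi = span(b, ref_axis)
    min_side = min(a_hi - a_lo, b_hi - b_lo)

    # Condition 1: center distance or minimum edge distance on the reference axis.
    center_dist = abs((a_lo + a_hi) / 2 - (b_lo + b_hi) / 2)
    edge_dist = min(abs(a_lo - b_lo), abs(a_hi - b_hi))  # assumed interpretation

    # Condition 2: gap between the boxes perpendicular to the reference axis.
    p_lo, p_hi = span(a, 1 - ref_axis)
    q_lo, q_hi = span(b, 1 - ref_axis)
    gap = max(0, max(p_lo, q_lo) - min(p_hi, q_hi))

    # Condition 3: difference of the side lengths along the reference axis.
    side_diff = abs((a_hi - a_lo) - (b_hi - b_lo))

    return (
        min(center_dist, edge_dist) < r1 * min_side,
        gap < r2 * min_side,
        side_diff < r3 * min_side,
    )
```

  • Two same-size character boxes on one line, `(0, 0, 10, 10)` and `(12, 0, 22, 10)` with `ref_axis=1`, satisfy all three conditions. A distant box of equal size satisfies only the third, so applying `any(...)` as the wording "at least one" suggests merges far more aggressively than applying `all(...)`; which combination is intended is left open here.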
  • Step 207: extract the specific area on the target image corresponding to the position of each bounding box of the connected blocks that were joined together (that is, each candidate text area that may contain a text line or a text column), and, for each extracted specific area, determine from the probability that the specific area contains a text line or a text column whether it actually contains one.
  • The bounding box obtained by joining connected blocks in the color-reduced image f1 and the binary image f2 (that is, the new bounding box obtained as the union of the bounding boxes joined in a row) is rectangular and is a potential area containing a text line or a text column (a candidate text area). The region of interest (ROI, that is, the aforementioned specific area: the area to be processed, outlined as a box, circle, ellipse, irregular polygon, or the like) is extracted from the target image I.
  • A sliding window is moved over the area with a specific window size and step, for example with the shortest side length S of the area as the window side length and 0.5S as the step. Each window is evaluated by a pre-trained convolutional neural network (CNN) classifier to obtain the probability p_w that the window contains text, and all p_w are averaged to obtain the probability that the candidate text area is a text line (or a text column).
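  • The sliding-window averaging can be sketched as below. The CNN itself is outside the scope of the sketch: `classify` is a hypothetical callable that stands in for the pre-trained classifier and returns p_w for the window at `(x, y)` with side `s`. The function names are illustrative.

```python
def window_origins(width, height):
    """Return the window side S and the top-left corners of all sliding windows.

    S is the shortest side of the region and the step is 0.5 * S (at least 1 pixel).
    """
    s = min(width, height)
    step = max(1, s // 2)
    xs = range(0, width - s + 1, step)
    ys = range(0, height - s + 1, step)
    return s, [(x, y) for y in ys for x in xs]


def region_probability(width, height, classify):
    """Average the per-window text probabilities p_w over the whole region."""
    s, origins = window_origins(width, height)
    probs = [classify(x, y, s) for x, y in origins]
    return sum(probs) / len(probs)
```

  • A 40x10 region yields S = 10, a step of 5, and seven windows. The region is accepted as a text line (or text column) when the averaged value exceeds the preset probability threshold, whose value the text does not specify.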
  • In step 208, the overlapping bounding boxes are merged into one bounding box and output as a region containing text.
  • Steps 201 to 204 ensure the positional accuracy of the bounding boxes (that is, the potential text areas): even if a bounding box contains another image element rather than a text line (or a text column), that image element can be accurately discarded.
  • The probability threshold filtering in step 208 ensures that the bounding boxes that pass the filter contain text lines (or text columns). Because the bounding boxes that pass the filter have relatively accurate positions, non-maximum suppression is not required; all overlapping bounding boxes are directly merged into one bounding box and output.
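  • Merging overlapping boxes without non-maximum suppression can be sketched as a repeated pairwise union. Boxes are assumed to be `(x0, y0, x1, y1)` tuples, and the names `overlaps` and `merge_overlapping` are illustrative.

```python
def overlaps(a, b):
    """True when two boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping(boxes):
    """Repeatedly replace any overlapping pair by its union until none overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the union may overlap other boxes
    return boxes
```

  • The scan restarts after every merge because a union box can overlap boxes that neither of its parts touched, so chains of overlapping boxes collapse into a single output box.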
  • The color-reduction and binarization processing unit 121 is configured to perform color reduction processing on each of the three color channels of the target image to obtain a color-reduced image, and to convert the target image into a binary image;
  • a first merging unit 122 configured to merge connected blocks having the same color in the color-reduced image, and to merge connected blocks having the same color in the binary image;
  • a second merging unit 123 configured to merge, in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text areas in the target image;
  • a determining unit 124 configured to extract a specific area on the target image corresponding to the position of a candidate text area, and to determine, based on a comparison of the probability that the extracted specific area contains text with a preset probability threshold, whether the extracted specific area contains a text line or a text column.
  • the subtractive color binary processing unit 121 is further configured to quantize each of the red, green, and blue color channels of the target image into K levels to obtain K levels of intervals;
  • K is an integer and 255>K>1.
  • the first merging unit 122 is further configured to treat each pixel in the color-reduced image and in the binary image as a single connected block, and to process the pixels with a union-find (disjoint-set) structure.
  • the first merging unit 122 is further configured to, if any pixel adjacent to a pixel has the same color as that pixel, merge the connected blocks to which the two adjacent same-colored pixels belong into one connected block.
  • the first merging unit 122 is further configured to determine the pixel area of each connected block and, if the pixel area of a connected block is smaller than a pixel area threshold, merge that connected block into an adjacent connected block and set its color to the color of that adjacent connected block.
  • system further includes:
  • a discarding processing unit 125 configured to, after the first merging unit 122 merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, discard the connected blocks in the color-reduced image and in the binary image that match preset features;
  • the preset features include at least one of the following:
  • a connected block in which any side of the bounding box is longer than a side-length threshold and the ratio of the pixel area to the bounding-box area is smaller than a ratio threshold.
  • system further includes
  • a third merging unit 126 configured to, after the first merging unit 122 merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, merge the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationship, and merge the connected blocks in the binary image into new connected blocks based on their positional relationship;
  • the third merging unit 126 is further configured to perform at least one of the following processes:
  • merging connected blocks whose bounding boxes intersect and whose intersecting portion matches a preset intersection feature
  • the second merging unit 123 is further configured to perform merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction according to connection merging rules of different types; wherein the connection merging rules include:
  • two connected blocks that satisfy at least one of the following conditions are merged by connection into a new connected block:
  • the smaller of the center distance and the edge distance between the bounding boxes of the two connected blocks along the reference axis is less than a first preset ratio of the smaller of the two bounding boxes' side lengths along the reference axis;
  • the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is less than a second preset ratio of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
  • the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is less than a third preset ratio of the smaller of the two bounding boxes' side lengths along the reference axis.
  • the determining unit 124 is further configured to extract a region of interest from the target image corresponding to the bounding boxes of the connected blocks in the color-reduced image and the binary image, slide a window over it with a specific sliding-window step, and evaluate each window with a convolutional neural network classifier to obtain the probability that each sliding window contains text;
  • the determining unit 124 is further configured to average the probabilities that the sliding windows contain text, obtaining the probability that the candidate text region contains a text line or a text column;
  • the determining unit 124 is further configured to determine that a text line or a text column exists in the region of interest if the obtained probability is greater than a preset probability threshold.
  • the functional division of the text detection system shown in FIG. 12 applies, by way of example, to the functional structure of the electronic device provided by the embodiments of the present invention. Based on FIG. 12 and its description, a person skilled in the art could readily modify the functional structure, for example by merging some of the functional units or dividing them further. The functional structure of the text detection system provided by the embodiments of the present invention is therefore not limited to FIG. 12.
  • An embodiment of the present invention provides a non-volatile storage medium storing executable instructions for executing the text detection method illustrated in FIG. 2 or FIG. 5. The storage medium includes mobile storage devices, random access memory (RAM), read-only memory (ROM), magnetic disks, optical disks, and the like.
  • the embodiments of the invention provide a method, a system, a device, and a storage medium for detecting text in an image, suitable for locating text such as printed Chinese characters in images in a network album; the output can serve as the input to a character recognition system and help ultimately produce accurate text recognition results.
  • the foregoing storage medium includes: a mobile storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.
  • the integrated unit of the present invention, if implemented as a software function module and sold or used as a standalone product, may be stored in a computer-readable storage medium.
  • the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product that is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.)
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a RAM, a ROM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

A text detection method and system, a device and a storage medium. The method comprises: performing colour reduction processing on each of the three colour channels of a target image to obtain a colour-reduced image, and converting the target image into a binary image (101); merging connected blocks with the same colour in the colour-reduced image and merging connected blocks with the same colour in the binary image (102); merging, by connection, in the vertical and horizontal directions respectively, the connected blocks of each of the three colour channels of the colour-reduced image and the connected blocks in the binary image, so as to obtain a candidate text area in the target image (103); and extracting a specific area from the target image at the position corresponding to the candidate text area and, based on a comparison between the probability that the extracted specific area includes a text area and a pre-set probability threshold value, determining whether the extracted specific area includes a text row or a text column (104). Text in an image can thereby be detected accurately.

Description

Text detection method and system, device, and storage medium
Technical field
The invention relates to text detection technology for images, and in particular to a text detection method and system, a device, and a storage medium.
Background
A document image is a document in image format: a paper document or the like that has been converted into image format by some means (such as scanning) so that users can read it electronically. Typical examples of document images are Portable Document Format (PDF) images and DjVu images.
Current text detection technology can detect text in document images (that is, locate the regions of the image that carry text) and perform text recognition on the detected regions.
Images in the general sense include not only document images but also non-document images (that is, images not in a scanned format, such as images uploaded by users to a network album). These may be Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, Exchangeable Image File Format (EXIF) images, and so on.
If text in non-document images can be recognized, accurate semantic information can be obtained to help users retrieve and manage images. Detecting the text in an image is a necessary step before recognizing text in non-scan-format images. Current text detection techniques mostly rely on manually specified features to decide whether an image contains text, and mostly target English characters. Because Chinese and English differ significantly in glyph structure, the accuracy of these techniques on Chinese text differs considerably from their accuracy on English text in document images, and it is difficult for them to meet the needs of practical applications.
Summary of the invention
Embodiments of the present invention provide a text detection method and system, a device, and a storage medium that can accurately detect text in an image.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a text detection method, including:
performing color reduction processing on the image of each of the three color channels of a target image to obtain a color-reduced image, and converting the target image into a binary image;
merging connected blocks having the same color in the color-reduced image, and merging connected blocks having the same color in the binary image;
merging, by connection, in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
extracting a specific region from the target image at the position corresponding to a candidate text region, and determining whether the extracted specific region contains a text line or a text column based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold.
In a second aspect, an embodiment of the present invention provides a text detection system, including:
a color reduction and binarization processing unit, configured to perform color reduction processing on the image of each of the three color channels of a target image to obtain a color-reduced image, and to convert the target image into a binary image;
a first merging unit, configured to merge connected blocks having the same color in the color-reduced image, and to merge connected blocks having the same color in the binary image;
a second merging unit, configured to merge, by connection, in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
a determining unit, configured to extract a specific region from the target image at the position corresponding to a candidate text region, and to determine whether the extracted specific region contains a text line or a text column based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold.
In a third aspect, an embodiment of the present invention provides a text detection device, including a memory and a processor, where the memory stores executable instructions that cause the processor to perform the following operations:
performing color reduction processing on the image of each of the three color channels of a target image to obtain a color-reduced image;
converting the target image into a binary image;
merging connected blocks having the same color in the color-reduced image, and merging connected blocks having the same color in the binary image;
merging, by connection, in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
extracting a specific region from the target image at the position corresponding to a candidate text region;
determining whether the extracted specific region contains a text line or a text column based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing executable instructions for performing the text detection method provided by the embodiments of the present invention.
In the embodiments of the present invention, an image is segmented into connected blocks by color, and the connected blocks are taken as potential bounding boxes containing text. A convolutional neural network is then applied with a sliding window to verify the probability that each bounding box contains a text line (or text column); when the probability is greater than a preset probability threshold, the bounding box is determined to contain a text line (or text column). This processing applies to both document images and non-document images and can accurately detect text in an image.
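The sliding-window verification can be sketched as follows. This is a minimal illustration, not the patented implementation: it follows the window geometry stated elsewhere in this document (window side equal to the region's shortest side S, step 0.5S, per-window probabilities averaged), while the function names, the `classify` callable standing in for the trained CNN, the extra final window that covers the tail of the region, and the default threshold of 0.5 are all assumptions made for the example.

```python
def window_origins(width, height):
    """Return (x, y, side) for square windows slid along the long axis of a box.

    The side is the shortest side S of the box and the step is S/2.
    Appending a final window flush with the far edge is an assumption.
    """
    s = min(width, height)
    step = max(1, s // 2)
    long_side = max(width, height)
    offsets = list(range(0, long_side - s + 1, step))
    if offsets[-1] != long_side - s:
        offsets.append(long_side - s)
    horizontal = width >= height
    return [(o, 0, s) if horizontal else (0, o, s) for o in offsets]


def region_contains_text(width, height, classify, threshold=0.5):
    """Average per-window probabilities and compare with a threshold.

    classify(x, y, side) is a hypothetical stand-in for the CNN classifier
    and must return the probability that the window contains text.
    """
    probs = [classify(x, y, s) for x, y, s in window_origins(width, height)]
    mean = sum(probs) / len(probs)
    return mean > threshold, mean
```

For a 40 x 10 candidate box this produces seven 10 x 10 windows at horizontal offsets 0, 5, ..., 30; a vertical box is scanned along its height in the same way.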
Description of the drawings
FIG. 1-1 to FIG. 1-6 are schematic diagrams of pixel relationships provided by an embodiment of the present invention;
FIG. 2 is an optional schematic structural diagram of a text detection system provided by an embodiment of the present invention;
FIG. 3 is an optional schematic structural diagram of a text detection device provided by an embodiment of the present invention;
FIG. 4 is a first schematic flowchart of a text detection method provided by an embodiment of the present invention;
FIG. 5 is a second schematic flowchart of a text detection method provided by an embodiment of the present invention;
FIG. 6 to FIG. 9 are schematic diagrams of detection results of the text detection method provided by an embodiment of the present invention;
FIG. 10 and FIG. 11 are schematic diagrams of a convolutional neural network provided by an embodiment of the present invention;
FIG. 12 is an optional schematic structural diagram of a text detection system provided by an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments provided here are only intended to explain the present invention and not to limit it. In addition, the embodiments provided below are some of the embodiments for carrying out the invention, not all of them; embodiments obtained by recombining the technical solutions of the following embodiments, and other embodiments based on the invention, obtained by a person skilled in the art without creative effort, all fall within the protection scope of the present invention.
It should be noted that, in the embodiments of the present invention, the terms "including", "comprising", and any other variants are intended to cover non-exclusive inclusion, so that a method or apparatus that includes a series of elements includes not only the elements explicitly recited but also other elements not explicitly listed, or elements inherent to implementing the method or apparatus. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional related elements (such as steps in a method or units in an apparatus) in the method or apparatus that includes the element.
The nouns and terms used in the embodiments of the present invention are to be interpreted as follows.
1) Gray value: an integer representing the brightness of a pixel. For example, if pixel values range from 0 to 255, the image is said to have 256 gray levels.
2) Adjacency: two pixels are adjacent if they touch. A pixel touches the pixels in its neighborhood. Adjacency considers only the spatial relationship of pixels.
Adjacency includes the following types:
2.1) 4-adjacency: as shown in FIG. 1-1, the 4-neighborhood of pixel p(x, y), denoted N4(p), consists of the adjacent pixels (x+1, y), (x-1, y), (x, y+1), and (x, y-1).
2.2) D-adjacency: as shown in FIG. 1-2, the D-neighborhood of pixel p(x, y), denoted ND(p), consists of the diagonal pixels (x+1, y+1), (x+1, y-1), (x-1, y+1), and (x-1, y-1).
2.3) 8-adjacency: as shown in FIG. 1-3, the 8-neighborhood of pixel p(x, y), denoted N8(p), consists of the pixels of the 4-neighborhood plus the pixels of the D-neighborhood.
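The three neighborhoods defined above can be written down directly. The following is an illustrative sketch only; the function names `n4`, `nd`, and `n8` are chosen here to mirror the N4(p), ND(p), and N8(p) notation and are not taken from the patent. Image bounds are deliberately not checked.

```python
def n4(x, y):
    """4-neighborhood of pixel (x, y): the horizontal and vertical neighbors."""
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def nd(x, y):
    """D-neighborhood of pixel (x, y): the four diagonal neighbors."""
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def n8(x, y):
    """8-neighborhood of pixel (x, y): the union of the 4- and D-neighborhoods."""
    return n4(x, y) | nd(x, y)
```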
3) Connectivity: two pixels are connected if (1) they are adjacent, and (2) their gray values (or other attributes) satisfy a specific similarity criterion (for example, the gray values are equal or belong to a given set).
Connectivity includes the following types:
3.1) 4-connectivity
As shown in FIG. 1-4, for pixels p and q having gray value V, if q is in the set N4(p), the two pixels are 4-connected.
3.2) 8-connectivity
As shown in FIG. 1-5, for pixels p and q having gray value V, if q is in the set N8(p), the two pixels are 8-connected.
As shown in FIG. 1-6, for pixels p and q having gray value V, if:
I. q is in the set N4(p), or
II. q is in the set ND(p), and the intersection of N4(p) and N4(q) is empty (that is, it contains no pixel with gray value V),
then pixels p and q are m-connected, that is, connected by a mixture of 4-connectivity and D-connectivity.
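The three connectivity tests above can be sketched as small predicates. This is an illustration under stated assumptions: the image is a list of rows indexed as `img[y][x]`, `v` is the set of gray values that count as similar, and all function names are hypothetical.

```python
def _n4(p):
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def _nd(p):
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def in_v(img, p, v):
    """True if p lies inside the image and its gray value belongs to v."""
    x, y = p
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] in v


def connected_4(img, p, q, v):
    return in_v(img, p, v) and in_v(img, q, v) and q in _n4(p)


def connected_8(img, p, q, v):
    return in_v(img, p, v) and in_v(img, q, v) and q in (_n4(p) | _nd(p))


def connected_m(img, p, q, v):
    """Mixed connectivity: 4-connected, or diagonal with no shared 4-neighbor in v."""
    if not (in_v(img, p, v) and in_v(img, q, v)):
        return False
    if q in _n4(p):
        return True
    if q in _nd(p):
        shared = _n4(p) & _n4(q)
        return not any(in_v(img, s, v) for s in shared)
    return False
```

The m-connectivity test removes the ambiguity of 8-connectivity: two diagonal pixels are joined directly only when no 4-connected path through a shared neighbor already links them.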
4) Connected region, also called a connected domain: pixels that are connected to one another (by any of the connectivity types above) form one region, while pixels that are not connected form different regions. Such a set of pixels, all connected to one another, is called a connected domain.
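One common way to compute connected domains is a union-find (disjoint-set) pass over the image, in which every pixel starts as its own block and adjacent pixels of equal value are merged. The sketch below assumes 4-connectivity and exact value equality as the similarity criterion; the function name and the list-of-rows image representation are choices made for this example only.

```python
def label_components(img):
    """Label 4-connected regions of equal value; returns a grid of region ids."""
    h, w = len(img), len(img[0])
    parent = list(range(h * w))  # each pixel starts as its own block

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for y in range(h):
        for x in range(w):
            # Checking only the right and lower neighbors covers every 4-adjacent pair once.
            if x + 1 < w and img[y][x] == img[y][x + 1]:
                union(y * w + x, y * w + x + 1)
            if y + 1 < h and img[y][x] == img[y + 1][x]:
                union(y * w + x, (y + 1) * w + x)

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

The returned ids are arbitrary root indices; only equality between them is meaningful.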
Embodiments of the present invention provide a method, system, device, and storage medium for detecting text in images (including both scan-format and non-scan-format images). The images described here include not only conventional scan-format images such as PDF, but also non-document images in any form, such as Joint Photographic Experts Group (JPG) images, bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, and Exchangeable Image File Format (EXIF) images.
The text detection method and system, device, and storage medium described in the embodiments of the present invention locate the regions of an image that carry text by performing the text detection method. The images on which the text detection system performs text detection may be document images such as PDF documents, or non-document images such as JPG, BMP, TIFF, GIF, and EXIF images. The images mainly come from screenshots of electronic devices (such as smartphones, tablets, and laptops), scanned electronic versions of printed matter such as posters and magazines, and other digital images containing printed Chinese characters.
Referring to FIG. 4, which shows an optional schematic flowchart of the text detection method provided by an embodiment of the present invention: in step 101, color reduction processing is performed on the image of each of the three color channels of the target image to obtain a color-reduced image, and the target image is converted into a binary image; in step 102, connected blocks having the same color in the color-reduced image are merged, and connected blocks having the same color in the binary image are merged; in step 103, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image are merged by connection in the vertical and horizontal directions respectively, to obtain candidate text regions in the target image; in step 104, a specific region is extracted from the target image at the position corresponding to a candidate text region, and whether the extracted specific region contains a text line or a text column is determined based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold.
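Step 101 can be illustrated with a short sketch. Elsewhere this document states that each of the red, green, and blue channels is quantized into K levels with 255 > K > 1; the uniform quantization below is one simple way to do that. The binarization rule (mean of the three channels compared with a fixed threshold of 128) is purely an assumption for the example, since the conversion method is not specified here, and the function names are hypothetical.

```python
def reduce_colors(pixels, k):
    """Quantize each 0-255 channel value of every RGB pixel into k uniform levels."""
    if not 1 < k < 255:
        raise ValueError("k must satisfy 255 > k > 1")
    step = 256 / k
    return [
        [tuple(min(int(c / step), k - 1) for c in px) for px in row]
        for row in pixels
    ]


def to_binary(pixels, threshold=128):
    """Binarize by thresholding the channel mean (an assumed rule, not the patent's)."""
    return [[1 if sum(px) / 3 >= threshold else 0 for px in row] for row in pixels]
```

After quantization, neighboring pixels whose original values differed slightly fall into the same level, which is what allows same-colored connected blocks to form in step 102.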
It can be seen that the text detection system locates the text lines in images such as those shown in FIG. 6 to FIG. 9 through color clustering and layering of the image, merging and filtering of connected blocks, and discrimination based on a deep convolutional neural network. What is located may also be a text column; a text line may consist of Chinese characters, of letters such as English letters, of digits or symbols, or of any combination of Chinese characters, letters, digits, symbols, and other characters. The text in a located text line can then be recognized.
As for the text detection system provided by the embodiments of the present invention, the system may be implemented by multiple servers deployed in a distributed manner.
For example, referring to the optional schematic structural diagram of the text detection system 20 shown in FIG. 2, multiple servers (server 21 to server 24) cooperate to detect text in an image: each server completes at least some of the steps of the text detection method and sends its processing result to the other servers that depend on that result, forming the final text detection result.
Of course, the servers in the text detection system shown in FIG. 5 may each perform text detection on different images (or on the same image) in parallel; that is, server 21 performs text detection without depending on the detection results of the other servers (server 22 to server 24).
As for the text detection device provided by the embodiments of the present invention, refer to FIG. 3, which shows by way of example the structure of an electronic device 30 for text detection provided by an embodiment of the present invention. The structure of the electronic device 30 shown in FIG. 3 is only one example of a suitable structure and is not intended to suggest any limitation on the structure of the electronic device. The electronic device 30 may be a personal computer, a server computer, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant (PDA), or a media player), a consumer electronic device, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above devices, and so on.
Although not required, the embodiments are described in the general context of "computer-readable instructions" being executed by one or more electronic devices. Computer-readable instructions may be distributed via computer-readable media (discussed below). Computer-readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), and data structures, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer-readable instructions may be combined or distributed as desired in various environments.
FIG. 3 illustrates an example of the structure of the electronic device 30 provided by an embodiment of the present invention. In one configuration, the electronic device 30 includes at least one processing unit 31 and a storage unit 32. Depending on the exact configuration and type of the electronic device, the storage unit 32 may be volatile (such as RAM), non-volatile (such as ROM or flash memory), or some combination of the two. This configuration is illustrated by a dashed line in FIG. 3.
In other embodiments, the electronic device 30 may include additional features and/or functionality. For example, the electronic device 30 may also include additional storage (for example, removable and/or non-removable), including but not limited to magnetic storage and optical storage. Such additional storage is illustrated in FIG. 3 by the storage unit 33. In one embodiment, computer-readable instructions for implementing one or more embodiments provided by the present invention may be stored in the storage unit 33. The storage unit 33 may also store other computer-readable instructions for implementing an operating system, application programs, and the like. Computer-readable instructions may be loaded into the storage unit 32 for execution by, for example, the processing unit 31.
The term "computer-readable medium" as used in the embodiments of the present invention includes computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions or other data. The storage unit 32 and the storage unit 33 are examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by the electronic device 30. Any such computer storage medium may be part of the electronic device 30.
The electronic device 30 may also include a communication connection 36 that allows the electronic device 30 to communicate with other devices. The communication connection 36 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or another interface for connecting the electronic device 30 to other electronic devices. The communication connection 36 may include a wired connection or a wireless connection. The communication connection 36 can transmit and/or receive communication media.
The term "computer readable medium" may include communication media. Communication media typically embody computer readable instructions or other data in a "modulated data signal" such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
The electronic device 30 may include an input unit 35, such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. The electronic device 30 may also include an output unit 34, such as one or more displays, speakers, printers, and/or any other output device. The input unit 35 and the output unit 34 may be connected to the electronic device 30 via a wired connection, a wireless connection, or any combination thereof. In one embodiment, an input device or output device of another electronic device may be used as the input unit 35 or the output unit 34 of the electronic device 30.
The components of the electronic device 30 may be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI) (such as PCI Express), Universal Serial Bus (USB), FireWire (IEEE 1394), optical bus structures, and the like. In another embodiment, the components of the electronic device 30 may be interconnected by a network. For example, the storage unit 32 may consist of multiple physical memory units that are located in different physical locations and interconnected by a network.
Referring again to FIG. 5, the method by which the text detection system of the embodiment of the present invention detects text may be applied to a text detection system or text detection device such as those described above, and includes the following steps:
Step 201: perform color reduction on the target image to obtain a color-reduced image of the target image.
The target image to be detected is input into the text detection system or text detection device, and each of the red, green, and blue (RGB) channels of the target image is quantized into K levels (K is an integer with 255 > K > 1, for example K = 4). That is, the brightness of each RGB channel is divided (for example, uniformly) into K intervals (bins), which reduces the brightness levels from 0-255 to 0-(K-1), and the brightness of each pixel of the target image in each RGB channel is mapped to the corresponding bin of that channel.
Because each of the RGB channels of the target image has 256 brightness levels (0-255), the target image can have 256^3 (256 cubed) colors. After the brightness of each RGB channel is divided into K intervals, the image has K^3 (K cubed, less than 256^3) colors, which yields the color-reduced image f1.
Taking K = 2 as an example, each channel has two brightness levels, 0 and 1, after quantization: brightness levels 0-127 of each channel are mapped to quantized brightness 0, and levels 128-255 are mapped to quantized brightness 1. For example, if the RGB brightness of a pixel in the target image is (0, 122, 255), its brightness after color reduction is (0, 0, 1). This brightness mapping is applied to every pixel in the target image.
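The quantization of step 201 can be sketched as follows. This is a minimal illustration that assumes uniform bins and an image stored as nested lists of (R, G, B) tuples; the function names `quantize_channel` and `reduce_colors` are hypothetical and do not come from the patent.

```python
def quantize_channel(value: int, k: int) -> int:
    """Map an 8-bit brightness (0-255) to one of k uniform bins (0..k-1)."""
    if not 1 < k < 255:
        raise ValueError("K must satisfy 255 > K > 1")
    # Integer arithmetic keeps the bin edges exact: for k = 2, 0-127 -> 0, 128-255 -> 1.
    return value * k // 256


def reduce_colors(pixels, k):
    """Quantize every (R, G, B) pixel of a row-major image independently per channel."""
    return [[tuple(quantize_channel(c, k) for c in px) for px in row] for row in pixels]


# The pixel (0, 122, 255) from the text, plus one extra pixel, with K = 2.
image = [[(0, 122, 255), (128, 127, 64)]]
reduced = reduce_colors(image, 2)
print(reduced)
```

With K = 2 the first pixel maps to (0, 0, 1), which matches the worked example in the text.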
Text in an image usually falls into two cases: 1) the text is of a single color; 2) the brightness of the text differs markedly from that of the surrounding area. For these two cases, step 201 achieves the following technical effect: the text in the color-reduced image takes one of the K^3 colors.
Step 202: perform local binarization on the target image to obtain a binary image of the target image.
The target image is converted into a grayscale image (with a single grayscale channel), and locally adaptive binarization is applied to the grayscale image: the grayscale image is divided into N windows, and within each of the N windows the pixels are divided into two classes according to a uniform threshold T, yielding the binary image f2. T is the Gaussian-weighted sum of a window of preset size (for example, 25*25 pixels) centered on the pixel.
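The local binarization of step 202 can be sketched as below. This is only one reading of the description: the threshold of each pixel is taken as the normalized Gaussian-weighted mean of the window centered on it. The choice of sigma (window / 6), the border handling (weights outside the image are ignored), the small tolerance `eps`, and the function names are all assumptions made for the illustration.

```python
import math


def gaussian_weights(window: int, sigma: float):
    """Unnormalized 2-D Gaussian weights for an odd window size."""
    half = window // 2
    return [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
             for dx in range(-half, half + 1)]
            for dy in range(-half, half + 1)]


def adaptive_binarize(gray, window=25, sigma=None, eps=1e-9):
    """Return a 0/1 image: 1 where a pixel is brighter than its local threshold T."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if sigma is None:
        sigma = window / 6.0  # assumption: the patent does not specify sigma
    weights = gaussian_weights(window, sigma)
    half = window // 2
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = norm = 0.0
            for dy in range(-half, half + 1):
                yy = y + dy
                if not 0 <= yy < h:
                    continue
                for dx in range(-half, half + 1):
                    xx = x + dx
                    if not 0 <= xx < w:
                        continue
                    wt = weights[dy + half][dx + half]
                    acc += wt * gray[yy][xx]
                    norm += wt
            threshold = acc / norm  # Gaussian-weighted local mean
            # eps keeps perfectly flat regions at 0 despite rounding error.
            out[y][x] = 1 if gray[y][x] > threshold + eps else 0
    return out


# A flat dark image with a single bright pixel in the middle.
gray = [[10] * 5 for _ in range(5)]
gray[2][2] = 200
binary = adaptive_binarize(gray, window=3)
```

The nested loops cost O(height * width * window^2), which is acceptable for a sketch; a real implementation would normally use a separable filter from an image library.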
Text in an image usually falls into two cases: 1) the text is of a single color; 2) the brightness of the text differs markedly from that of the surrounding area. For these two cases, step 202 achieves the following technical effect: the text in the binary image is either black or white.
In the color-reduced image and the binary image obtained in steps 201 and 202, the pixels corresponding to text have the same color. In step 203, each pixel is taken as a connected block and connected blocks having the same color are merged, so that the text is connected.
Step 203: identify the connected blocks in the color-reduced image and in the binary image, merge the connected blocks having the same color in the color-reduced image, and merge the connected blocks having the same color in the binary image.
The following processing is performed on the connected blocks of each color channel of the RGB channels of the color-reduced image f1 and on the connected blocks of the binary image f2 (which has only one grayscale channel):
1) Treat each pixel as a separate connected block (that is, a connected subgraph, a concept from graph theory: each pixel of the image is a vertex of an undirected graph, adjacent pixels are regarded as joined by an edge, and the whole image is regarded as an undirected graph).
2) Build a union-find (disjoint-set) structure, a classic data structure used to carry out the merging of connected blocks efficiently.
3) Traverse each pixel of the color-reduced image f1 and of the binary image f2 to perform the following processing:
Traverse the pixels in the color-reduced image f1. For a given pixel, if its color is the same as that of any of its 8-adjacent pixels (the 8 neighbors above, below, to the left and right, and at the two ends of the two diagonals), the connected blocks to which the two adjacent same-colored pixels belong are merged into one connected block. Here, the color of a pixel in any RGB channel means its brightness value in that channel, and the color of a pixel in the grayscale image means its gray value in that image. Then each connected block is traversed and its pixel area is checked: if the pixel area of connected block k (k ranges over the connected blocks) is smaller than the pixel area threshold (4 pixels), connected block k is merged into a connected block adjacent to it, and the color of connected block k is set to the color of the connected block into which it is merged.
For example, consider the brightness of pixel i in the color-reduced image f1 (1 ≤ i ≤ I1, where I1 is the number of pixels in f1) in a channel X, where X is any one of the RGB channels and is taken here to be the R channel. If pixel i has the same brightness in that channel as any pixel j among its 8-adjacent pixels (the 8 neighbors above, below, to the left and right of pixel i, and at the two ends of its two diagonals), the connected block to which pixel i belongs and the connected block to which pixel j belongs are merged into one connected block. Then each connected block is traversed and its pixel area is checked: if the pixel area of connected block k (k ranges over the connected blocks) is smaller than the threshold (4 pixels), connected block k is merged into a connected block adjacent to it, and the brightness of the pixels in connected block k is set to the brightness of the connected block into which it is merged.
As another example, if pixel i in the grayscale image of the target image (1 ≤ i ≤ I2, where I2 is the number of pixels in the grayscale image) has the same color (gray value) as a pixel j among its 8-adjacent pixels (the 8 neighbors above, below, to the left and right of pixel i, and at the two ends of its two diagonals), the connected blocks to which the adjacent pixels i and j belong are merged into one connected block. Then each connected block is traversed and its pixel area is checked: if the pixel area of connected block k (k ranges over the connected blocks) is smaller than the threshold (4 pixels), connected block k is merged into a connected block adjacent to it, and the gray value of the pixels in connected block k is set to the gray value of the pixels in the connected block into which it is merged.
Step 203 merges the pixels belonging to the same character (for Chinese characters, at least the same stroke) into a single connected block for subsequent processing.
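The merging in step 203 can be sketched with a union-find structure over the pixels of one channel. This is a minimal illustration: the class and function names are hypothetical, the image is a nested list of per-pixel values for a single channel, and the absorption of blocks smaller than 4 pixels into a neighboring block is left out because the text does not say which neighbor is chosen.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a: int, b: int) -> int:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def label_connected_blocks(img):
    """Label 8-connected blocks of equal value; returns a label per pixel."""
    h, w = len(img), len(img[0])
    ds = DisjointSet(h * w)  # every pixel starts as its own connected block
    for y in range(h):
        for x in range(w):
            # Visiting E, SW, S, SE covers all 8 neighbours by symmetry.
            for dy, dx in ((0, 1), (1, -1), (1, 0), (1, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and img[yy][xx] == img[y][x]:
                    ds.union(y * w + x, yy * w + xx)
    return [[ds.find(y * w + x) for x in range(w)] for y in range(h)]


img = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
labels = label_connected_blocks(img)
block_count = len({label for row in labels for label in row})
```

In this small image the diagonal adjacency joins all pixels of value 1 into one block and all pixels of value 0 into another, so `block_count` is 2.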
The subsequent step 204 discards the connected blocks in the color-reduced image and in the binary image that match preset features (the preset features correspond to features of non-text regions in an image).
Step 204: after the connected blocks in the color-reduced image and in the binary image have been merged, discard the connected blocks in the color-reduced image and in the binary image that match the preset features (the preset features correspond to features of non-text regions in an image).
At least one of the following is applied to the connected blocks of each color channel of the color-reduced image f1 and to the connected blocks of the binary image f2:
1) Discard connected blocks whose area is still smaller than the pixel area threshold (for example, 4 pixels); such connected blocks are regarded as not carrying text.
2) Discard connected blocks corresponding to the background color: connected blocks in which the length of any side is greater than a first preset ratio (for example, 0.8 times) of the corresponding side length of the image.
3) Discard connected blocks corresponding to frames: connected blocks in which the length of any side is greater than a frame length threshold (for example, 65 pixels) and the ratio of the pixel area of the connected block to the area of its bounding box is smaller than a ratio threshold (for example, 0.22). The bounding box of a connected block is the smallest rectangle that contains all pixels of the connected block (the sides of the rectangle are parallel to the x and y axes of the image, so the rectangle is uniquely determined).
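The three discard rules of step 204 can be sketched as a single predicate. The sketch assumes that a connected block is summarized by its pixel area and the width and height of its bounding box; the `Block` type and the function name are hypothetical, and the default thresholds are the example values from the text.

```python
from dataclasses import dataclass


@dataclass
class Block:
    pixel_area: int  # number of pixels in the connected block
    width: int       # bounding-box width in pixels
    height: int      # bounding-box height in pixels


def is_non_text(block: Block, image_w: int, image_h: int,
                min_area: int = 4, background_ratio: float = 0.8,
                frame_len: int = 65, fill_ratio: float = 0.22) -> bool:
    """Return True if the block matches any of the three non-text features."""
    # 1) Too small to carry text.
    if block.pixel_area < min_area:
        return True
    # 2) Background: a side spans more than 0.8 of the matching image side.
    if block.width > background_ratio * image_w or block.height > background_ratio * image_h:
        return True
    # 3) Frame: a long side combined with a sparsely filled bounding box.
    box_area = block.width * block.height
    if max(block.width, block.height) > frame_len and block.pixel_area / box_area < fill_ratio:
        return True
    return False


blocks = [Block(3, 2, 2), Block(5000, 90, 60), Block(200, 70, 70), Block(120, 12, 14)]
kept = [b for b in blocks if not is_non_text(b, image_w=100, image_h=100)]
```

Only the last block survives here: the first is too small, the second spans 90% of the image width, and the third is a thin 70*70 outline with a fill ratio of about 0.04.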
Optionally, since an image may contain text whose strokes are not connected, such as Chinese characters, step 205 may also be performed to merge the disconnected strokes of such text (for example, Chinese characters, and the letters i and j in English) together.
Step 205: merge the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationship (such as distance or intersection), and likewise merge the connected blocks of the binary image into new connected blocks based on their positional relationship (such as distance or intersection).
1) Merge connected blocks whose distance is less than a distance threshold (the distance is the Chebyshev distance d between the center points of the bounding boxes of the two connected blocks).
2) Take the larger of the two connected blocks' averages of length and width, denoted ms = max((a1+b1)/2.0, (a2+b2)/2.0), where a1 and b1 are the length and width of the bounding box of the first connected block and a2 and b2 are the length and width of the bounding box of the second connected block, and use 0.4*ms as the distance threshold. Then, if a preset condition is satisfied, such as 0.4*ms < 1 or 1 < 0.4*ms < 3, and the distance d < 3, the two selected connected blocks are merged.
3) For the connected blocks of each of the RGB channels of the color-reduced image f1 and the connected blocks of the binary image f2, merge connected blocks whose bounding boxes intersect and whose intersection matches a preset intersection feature. For example, if the bounding boxes of two connected blocks intersect, the area of the intersection is greater than a preset 10% of the area of the smaller of the two bounding boxes, and the area of the intersection is less than 10% of the image area, the two connected blocks are merged.
4) Merge connected blocks whose bounding boxes are aligned and satisfy a preset alignment merging rule. Aligned means that the bounding boxes of the connected blocks are aligned in the horizontal or vertical direction, that is: (i) the bounding boxes of the two connected blocks have the same height and the same position in the vertical direction; or (ii) the bounding boxes of the two connected blocks have the same width and the same position in the horizontal direction.
One example of the alignment merging rule is as follows: if, after the aligned connected blocks are merged, the area of the merged bounding box (the smallest bounding box containing the two bounding boxes) exceeds the sum of the areas of the two bounding boxes by less than an area increment ratio threshold (for example, 10%), and the area of the merged bounding box is less than a ratio threshold (for example, 10%) of the image area, the bounding boxes of the two connected blocks are merged.
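The positional tests of step 205 can be sketched on bounding boxes as below. The `Box` type, the function names, and the exclusive upper coordinates are assumptions of the sketch. The area increment is read here as (merged area - sum of areas) / sum of areas, which is one possible interpretation, and the additional example condition on 0.4*ms and d < 3 is not encoded.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box; (x1, y1) is exclusive."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def area(self) -> float:
        return self.width * self.height


def chebyshev_distance(a: Box, b: Box) -> float:
    """Chebyshev distance d between the two bounding-box centers."""
    dx = abs((a.x0 + a.x1) / 2.0 - (b.x0 + b.x1) / 2.0)
    dy = abs((a.y0 + a.y1) / 2.0 - (b.y0 + b.y1) / 2.0)
    return max(dx, dy)


def distance_threshold(a: Box, b: Box) -> float:
    """0.4 * ms, with ms the larger of the two mean side lengths."""
    ms = max((a.width + a.height) / 2.0, (b.width + b.height) / 2.0)
    return 0.4 * ms


def intersection_area(a: Box, b: Box) -> float:
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0.0


def should_merge_by_overlap(a: Box, b: Box, image_area: float, ratio: float = 0.10) -> bool:
    inter = intersection_area(a, b)
    return inter > ratio * min(a.area, b.area) and inter < ratio * image_area


def aligned(a: Box, b: Box, tol: float = 0.0) -> bool:
    same_row = abs(a.y0 - b.y0) <= tol and abs(a.y1 - b.y1) <= tol
    same_col = abs(a.x0 - b.x0) <= tol and abs(a.x1 - b.x1) <= tol
    return same_row or same_col


def should_merge_aligned(a: Box, b: Box, image_area: float,
                         growth: float = 0.10, max_share: float = 0.10) -> bool:
    merged = Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))
    total = a.area + b.area
    return (aligned(a, b)
            and merged.area - total < growth * total
            and merged.area < max_share * image_area)


a, b, c = Box(0, 0, 10, 10), Box(11, 0, 21, 10), Box(8, 0, 18, 10)
d_ab = chebyshev_distance(a, b)
```

For these boxes, a and b are aligned in a row and pass the alignment rule on a 100*100 image, while a and c pass the overlap rule.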
Step 206: merge the connected blocks of each color channel of the RGB channels of the color-reduced image f1, and the connected blocks of the binary image f2, by connecting them in the vertical and horizontal directions respectively, to obtain the candidate text regions in the image (including text line regions and text column regions).
The aim is to connect individual characters (such as Chinese characters) into text lines or columns. Based on a connection merging rule (the same rule is used for horizontal and vertical merging and is described below), the connected blocks are first merged once in the horizontal direction, then once in the vertical direction, and finally once more in the horizontal direction.
Horizontal text is usually more common in images than vertical text, so step 206 first merges the connected blocks in the horizontal direction. This ensures that horizontally arranged text is merged first and reduces the chance that horizontal text is wrongly merged vertically. The connected blocks are then merged in the vertical direction, which merges the blocks that do not satisfy the rule horizontally but do satisfy it vertically. Because the bounding boxes of the connected blocks may change during this process and produce new pairs of bounding boxes that satisfy the rule horizontally, the connected blocks are merged once more in the horizontal direction.
One example of the connection merging rule is that two connected blocks are connected into a new connected block when their bounding boxes satisfy at least one of the following conditions:
1) The smaller of the center distance of the bounding boxes of the two connected blocks along the reference axis (horizontal or vertical), that is, the distance between the center coordinates of the two bounding boxes along that axis, and their edge distance, that is, the distance between the edge coordinates of the two bounding boxes along that axis, is less than a first preset ratio (for example, 0.15 times) of the smaller of the two bounding boxes' side lengths along the reference axis (the sides parallel to the reference axis);
Because the coordinate ranges of the two bounding boxes along the reference axis may be separate or may partially overlap, taking the smaller of the center distance and the edge distance characterizes the distance between the bounding boxes of the two connected blocks along the reference axis most accurately.
2) The distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is less than a second preset ratio (for example, two times) of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
3) The difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is less than a third preset ratio (for example, 30%) of the smaller of the two bounding boxes' side lengths along the reference axis.
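The three conditions of the connection merging rule can be sketched as below. Several details are assumptions of this sketch rather than statements of the patent: boxes are (x0, y0, x1, y1) tuples, the edge distance is taken as the smallest gap between any edge of one box and any edge of the other along the reference axis, and the perpendicular distance is taken between box centers. The function names are hypothetical.

```python
def connection_conditions(a, b, axis, r1=0.15, r2=2.0, r3=0.30):
    """Evaluate conditions 1) to 3) for boxes (x0, y0, x1, y1).

    axis = 0 uses the horizontal axis as reference, axis = 1 the vertical axis.
    """
    perp = 1 - axis
    a_lo, a_hi, b_lo, b_hi = a[axis], a[axis + 2], b[axis], b[axis + 2]
    len_a, len_b = a_hi - a_lo, b_hi - b_lo

    # 1) Smaller of center distance and edge distance along the reference axis.
    center_dist = abs((a_lo + a_hi) / 2.0 - (b_lo + b_hi) / 2.0)
    edge_dist = min(abs(p - q) for p in (a_lo, a_hi) for q in (b_lo, b_hi))
    cond1 = min(center_dist, edge_dist) < r1 * min(len_a, len_b)

    # 2) Distance perpendicular to the reference axis (center to center here).
    pa_lo, pa_hi, pb_lo, pb_hi = a[perp], a[perp + 2], b[perp], b[perp + 2]
    perp_dist = abs((pa_lo + pa_hi) / 2.0 - (pb_lo + pb_hi) / 2.0)
    cond2 = perp_dist < r2 * min(pa_hi - pa_lo, pb_hi - pb_lo)

    # 3) Difference of the side lengths along the reference axis.
    cond3 = abs(len_a - len_b) < r3 * min(len_a, len_b)
    return cond1, cond2, cond3


def should_connect(a, b, axis):
    """The text requires at least one of the conditions to hold."""
    return any(connection_conditions(a, b, axis))


left, right, far = (0, 0, 10, 10), (11, 0, 21, 10), (100, 100, 140, 105)
neighbours = connection_conditions(left, right, axis=0)
```

The two side-by-side 10*10 boxes satisfy all three conditions, while the distant box satisfies none. A stricter variant could require all conditions with `all(...)`; that would be a design choice that goes beyond what the text states.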
Step 207: extract a specific region from the target image at the position of each bounding box of connected blocks that have been connected together (that is, each candidate text region containing a text line or text column), and for each extracted specific region, determine whether it contains a text line or text column based on the probability that it does.
From steps 201 to 206, the connected bounding boxes obtained in the color-reduced image f1 and the binary image f2 are new bounding boxes formed as the union of the bounding boxes that were connected into a line. They are rectangular and are potential regions containing a text line or text column (the candidate text regions). For each of them, a region of interest (ROI) is extracted from the target image I; this is the specific region mentioned above, and in general an ROI is a region to be processed that is outlined in the target image I by a box, circle, ellipse, irregular polygon, or the like. A window is slid over the ROI with a specific window size and stride, for example a window side equal to the shortest side S of the region and a stride of 0.5S, and each window is fed to a pre-trained convolutional neural network (CNN) classifier, which gives the probability p_w that the window contains text. The average of all p_w gives the probability p_l that the candidate text region is a text line (or text column). If p_l is greater than a preset probability threshold (0.5), the ROI is determined to contain a text line (or text column).
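The sliding-window decision of step 207 can be sketched as below. The CNN is replaced by an arbitrary callable `classify` that returns a probability for a window; that stand-in, the integer box coordinates, and the function names are assumptions of this illustration.

```python
def sliding_windows(box, stride_ratio=0.5):
    """Square windows of side S (the shortest box side) with stride 0.5 * S.

    box is (x0, y0, x1, y1) with integer coordinates; windows slide along the
    longer side. A remainder shorter than one stride at the far end is left
    uncovered in this sketch.
    """
    x0, y0, x1, y1 = box
    side = min(x1 - x0, y1 - y0)
    step = max(1, int(side * stride_ratio))
    windows = []
    if x1 - x0 >= y1 - y0:  # text line: slide horizontally
        pos = x0
        while pos + side <= x1:
            windows.append((pos, y0, pos + side, y0 + side))
            pos += step
    else:                   # text column: slide vertically
        pos = y0
        while pos + side <= y1:
            windows.append((x0, pos, x0 + side, pos + side))
            pos += step
    return windows


def line_probability(box, classify):
    """p_l: mean of the per-window probabilities p_w."""
    probs = [classify(window) for window in sliding_windows(box)]
    return sum(probs) / len(probs)


def contains_text_line(box, classify, threshold=0.5):
    return line_probability(box, classify) > threshold


def fake_classifier(window):
    """Hypothetical stand-in for the CNN: 'text' only in the left half."""
    return 0.9 if window[0] < 50 else 0.2


roi = (0, 0, 100, 20)
windows = sliding_windows(roi)
is_text = contains_text_line(roi, fake_classifier)
```

For a 100*20 region the window side is 20 and the stride is 10, which gives nine windows; with the stand-in classifier their mean probability is about 0.59, so the region is accepted.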
Step 208: merge overlapping bounding boxes into one bounding box and output it as a region containing text.
Steps 201 to 204 ensure the positional accuracy of the bounding boxes (the potential text regions): even when a bounding box contains other image elements rather than a text line (or text column), the corresponding image elements can be accurately discarded. The probability threshold filtering in step 207 ensures that every bounding box that passes the filter contains a text line (or text column). Because the bounding boxes that pass the filter are already accurately positioned, non-maximum suppression is not needed; all overlapping bounding boxes are directly merged into one bounding box and output.
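The merging of overlapping bounding boxes in step 208 can be sketched as a repeated union until no two boxes overlap. The box format (x0, y0, x1, y1) with exclusive upper coordinates and the function names are assumptions of the sketch.

```python
def overlaps(a, b):
    """True if two boxes (x0, y0, x1, y1) share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping(boxes):
    """Repeatedly replace overlapping boxes by their union until stable."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, other in enumerate(result):
                if overlaps(box, other):
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)  # no overlap with any box kept so far
        merged = result
    return merged


boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (14, 14, 20, 20), (30, 30, 40, 40)]
regions = merge_overlapping(boxes)
```

The first three boxes chain into a single region and the fourth stays separate. The outer loop is needed because a union can grow enough to overlap a box that was previously disjoint.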
Training of the convolutional neural network:
For the data (images containing text), the Chinese characters in the images are labeled. The output of step 206 above (before filtering by the convolutional neural network) is then screened to select the parts that are close to the labels, the bounding boxes are cut into sliding windows according to the method in step 207 above, and the windows that belong to text are manually separated from those that do not. All windows are scaled to 32*32 pixels.
Training and validation data are built from these windows to train the convolutional neural network shown in FIG. 10 and FIG. 11. During training, each sample is randomly cropped to 27*27 pixels and randomly flipped. The network is trained with stochastic gradient descent (SGD) with a batch size (batch_size) of 50, a weight decay (weight_decay) of 0.0005, and a momentum of 0.9. The learning rate is computed as lr = base_lr * (1 + 0.0001 * iter)^(-0.75), where iter is the number of iterations; base_lr is 0.001 for the first 100,000 iterations and 0.0001 thereafter.
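The learning rate schedule can be written out directly from the formula. The function name and the zero-based iteration counter (iterations 0 to 99,999 count as the first 100,000) are assumptions of this sketch.

```python
def learning_rate(iteration: int) -> float:
    """lr = base_lr * (1 + 0.0001 * iter) ** (-0.75), with a stepped base_lr."""
    base_lr = 0.001 if iteration < 100_000 else 0.0001
    return base_lr * (1 + 0.0001 * iteration) ** (-0.75)


for it in (0, 10_000, 99_999, 100_000):
    print(it, learning_rate(it))
```

The rate starts at 0.001, decays smoothly with the iteration count, and drops by a further factor of ten when base_lr switches at iteration 100,000.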
Continuing with the division of the functional structure of the text detection system provided by the embodiment of the present invention, FIG. 12 shows a text detection system 12 provided by an embodiment of the present invention, which includes a color-reduction and binarization processing unit 121, a first merging unit 122, a second merging unit 123, and a determining unit 124.
The color-reduction and binarization processing unit 121 is configured to perform color reduction on each of the three color channels of the target image to obtain a color-reduced image, and to convert the target image into a binary image;
The first merging unit 122 is configured to merge the connected blocks having the same color in the color-reduced image, and to merge the connected blocks having the same color in the binary image;
The second merging unit 123 is configured to merge the connected blocks of each color channel of the three color channels of the color-reduced image, and the connected blocks of the binary image, by connecting them in the vertical and horizontal directions respectively, to obtain the candidate text regions in the target image;
The determining unit 124 is configured to extract a specific region from the target image at the position of each candidate text region, and to determine whether the extracted specific region contains a text line or text column based on a comparison between the probability that the specific region contains a text region and a preset probability threshold.
Optionally, the color-reduction and binarization processing unit 121 is further configured to quantize each of the red, green, and blue channels of the target image into K levels to obtain K intervals;
and to map the brightness of each pixel of the target image in each RGB channel to the corresponding quantized interval of that channel, where K is an integer and 255 > K > 1.
Optionally, the first merging unit 122 is further configured to treat each pixel in the color-reduced image and in the binary image as a separate connected block, to build a union-find structure over the pixels, and to perform the following processing:
The first merging unit 122 is further configured to, if the color of a pixel is the same as that of any of its 8-adjacent pixels, merge the connected blocks to which the two adjacent same-colored pixels belong into one connected block.
The first merging unit 122 is further configured to check the pixel area of each connected block and, if the pixel area of a connected block is smaller than the pixel area threshold, to merge that connected block into a connected block adjacent to it and set its color to the color of the connected block into which it is merged.
Optionally, the system further includes:
a discarding processing unit 125, configured to discard the connected blocks in the color-reduced image and in the binary image that match preset features, after the first merging unit 122 has merged the connected blocks having the same color in the color-reduced image and merged the connected blocks having the same color in the binary image; discarding according to the preset features includes at least one of the following:
discarding connected blocks whose area is smaller than the pixel area threshold;
discarding connected blocks in which the length of any side is greater than a first preset ratio of the corresponding side length of the image;
discarding connected blocks in which the length of any side is greater than the frame length threshold and the ratio of the pixel area to the bounding box area is smaller than the ratio threshold.
Optionally, the system further includes:
a third merging unit 126, configured to, after the first merging unit 122 has merged the connected blocks having the same color in the color-reduced image and merged the connected blocks having the same color in the binary image, merge the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationship, and merge the connected blocks of the binary image into new connected blocks based on their positional relationship;
wherein the third merging unit 126 is further configured to perform at least one of the following:
merging connected blocks whose distance is less than a distance threshold;
taking the larger of the averages of length and width of any two of the connected blocks and, if that value satisfies a preset condition, merging the two selected connected blocks;
merging connected blocks whose bounding boxes intersect and whose intersection matches a preset intersection feature;
merging connected blocks whose bounding boxes are aligned and satisfy a preset alignment merging rule.
Optionally, the second merging unit 123 is further configured to perform, in sequence and based on different types of connection merging rules, merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction; wherein the connection merging rules include:
connecting two selected connected blocks into a new connected block when at least one of the following conditions is satisfied:
the smaller of the center distance and the edge distance between the bounding boxes of the two connected blocks along a reference axis is less than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is less than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
the difference between the side lengths of the two bounding boxes along the reference axis is less than a third preset proportion of the smaller of those two side lengths.
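The three connection conditions above can be evaluated directly on bounding boxes. The Python sketch below is a minimal illustration under stated assumptions: the `Box` type, the function names, and the proportion values `r1`, `r2`, `r3` are hypothetical; the edge distance is taken as the gap between the nearest edges, and the perpendicular distance is taken as the distance between box centers, neither of which is fixed by the text.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float


def _extent(box: Box, axis: int) -> Tuple[float, float]:
    # (low, high) coordinates of the box along axis 0 (x) or axis 1 (y).
    return (box.x0, box.x1) if axis == 0 else (box.y0, box.y1)


def rule_flags(a: Box, b: Box, axis: int,
               r1: float = 0.5, r2: float = 0.5, r3: float = 0.5
               ) -> Tuple[bool, bool, bool]:
    """Evaluate the three conditions with `axis` as the reference axis."""
    a_lo, a_hi = _extent(a, axis)
    b_lo, b_hi = _extent(b, axis)
    a_len, b_len = a_hi - a_lo, b_hi - b_lo

    # Condition 1: smaller of center distance and edge gap along the axis.
    center_dist = abs((a_lo + a_hi) / 2 - (b_lo + b_hi) / 2)
    edge_dist = max(0.0, max(a_lo, b_lo) - min(a_hi, b_hi))
    cond1 = min(center_dist, edge_dist) < r1 * min(a_len, b_len)

    # Condition 2: center distance perpendicular to the axis (assumption).
    pa_lo, pa_hi = _extent(a, 1 - axis)
    pb_lo, pb_hi = _extent(b, 1 - axis)
    perp_dist = abs((pa_lo + pa_hi) / 2 - (pb_lo + pb_hi) / 2)
    cond2 = perp_dist < r2 * min(pa_hi - pa_lo, pb_hi - pb_lo)

    # Condition 3: similar side lengths along the reference axis.
    cond3 = abs(a_len - b_len) < r3 * min(a_len, b_len)
    return cond1, cond2, cond3


def should_connect(a: Box, b: Box, axis: int) -> bool:
    # The text connects two blocks when at least one condition holds.
    return any(rule_flags(a, b, axis))
```

Calling `should_connect` with `axis=0` corresponds to a horizontal pass and `axis=1` to a vertical pass; an implementation that wanted stricter merging could require all three flags instead of any one, but that would depart from the wording above.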
Optionally, the determining unit 124 is further configured to extract a region of interest from the target image at the bounding box obtained by the connection in the color-reduced image and the binary image, and to feed that bounding box, window by window with a specific sliding-window stride, into a convolutional neural network classifier, obtaining the probability that each sliding window contains text;
The determining unit 124 is further configured to average the probabilities that the sliding windows contain text, obtaining the probability that the candidate text region includes a text line or a text column;
The determining unit 124 is further configured to determine that a text line or a text column exists in the region of interest if the obtained probability is greater than a preset probability threshold.
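The sliding-window decision above amounts to averaging per-window probabilities and comparing the mean with a threshold. The Python sketch below illustrates only that control flow. The classifier is passed in as a plain callable, and `mean_intensity` is a stand-in used for demonstration, not a convolutional neural network; the function names, the horizontal sliding direction, and the default threshold are assumptions.

```python
from typing import Callable, List, Sequence

Region = List[Sequence[float]]  # rows of pixel values


def region_text_probability(region: Region, window_width: int, stride: int,
                            classify: Callable[[Region], float]) -> float:
    """Average the per-window text probabilities over a non-empty region.

    Windows slide horizontally, which suits a text line; a text column
    would slide vertically instead (not shown).
    """
    width = len(region[0])
    last_start = max(width - window_width, 0)
    probabilities = []
    for x in range(0, last_start + 1, stride):
        window = [row[x:x + window_width] for row in region]
        probabilities.append(classify(window))
    return sum(probabilities) / len(probabilities)


def contains_text(region: Region, window_width: int, stride: int,
                  classify: Callable[[Region], float],
                  threshold: float = 0.5) -> bool:
    # Text is reported only when the mean probability exceeds the threshold.
    return region_text_probability(region, window_width, stride, classify) > threshold


def mean_intensity(window: Region) -> float:
    """Stand-in scorer for demonstration only; not a real text classifier."""
    values = [value for row in window for value in row]
    return sum(values) / len(values)
```

In a real system the `classify` argument would wrap the trained classifier's forward pass, and the stride would trade detection accuracy against the number of classifier calls.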
It can be understood that the functional division of the text detection system shown in FIG. 12 is exemplary and applies equally to the functional structure of the electronic device provided by the embodiments of the present invention. Based on FIG. 12 and its description, a person skilled in the art can readily modify the functional structure, for example by merging some of the functional units or by further dividing some of them. Therefore, the functional structures usable for the text detection system provided by the embodiments of the present invention are not limited to the one shown in FIG. 12.
An embodiment of the present invention provides a non-volatile storage medium storing executable instructions for executing the text detection method shown in FIG. 2 or FIG. 5. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disc.
In summary, the embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention provide a method, system, device, and storage medium for detecting text in an image, suitable for locating text such as printed Chinese characters in images in online photo albums. The output can serve as the input of a text recognition system, helping to produce accurate text recognition results.
A person skilled in the art can understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated units of the present invention are implemented as software functional modules and sold or used as standalone products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes over the related art, may be embodied as a software product that is stored in a storage medium and includes a number of instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the claims.

Claims (16)

  1. A text detection method, comprising:
    performing color reduction on the image of each of the three color channels of a target image to obtain a color-reduced image;
    converting the target image into a binary image;
    merging connected blocks having the same color in the color-reduced image, and merging connected blocks having the same color in the binary image;
    merging, by connection in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
    extracting a specific region from the target image at the position corresponding to the candidate text region;
    determining, based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold, whether the extracted specific region contains a text line or a text column.
  2. The method of claim 1, wherein performing color reduction on the image of each of the three color channels of the target image to obtain the color-reduced image comprises:
    quantizing each of the three color channels of the target image into K levels to obtain K level intervals;
    mapping the brightness of each pixel of the target image in the red, green, and blue channels to the quantization interval of the corresponding channel, where K is an integer and 255 > K > 1.
  3. The method of claim 1, wherein merging the connected blocks having the same color in the color-reduced image and merging the connected blocks having the same color in the binary image comprises:
    treating each pixel in the color-reduced image and in the binary image as a separate connected block, building a union-find (disjoint-set) structure over the pixels, and performing the following processing:
    if a pixel has the same color as any of its 8-adjacent pixels, merging the connected blocks to which the two adjacent same-colored pixels belong into a single connected block;
    checking the pixel area of each connected block and, if the pixel area of a connected block is smaller than a pixel area threshold, merging that connected block into a connected block adjacent to it and setting its color to the color of the connected block it is merged into.
  4. The method of claim 1, further comprising:
    after merging the connected blocks having the same color in the color-reduced image and merging the connected blocks having the same color in the binary image,
    discarding the connected blocks in the color-reduced image and in the binary image that match a preset feature;
    wherein the preset feature includes at least one of the following:
    a connected block whose area is smaller than a pixel area threshold;
    a connected block in which the length of any side is greater than a first preset proportion of the corresponding side length of the image;
    a connected block in which any side is longer than a border length threshold and the ratio of the pixel area to the bounding-box area is smaller than a ratio threshold.
  5. The method of claim 1, further comprising:
    after merging the connected blocks having the same color in the color-reduced image and merging the connected blocks having the same color in the binary image,
    merging the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationship, channel by channel;
    merging the connected blocks in the binary image into new connected blocks based on their positional relationship;
    wherein merging into new connected blocks includes at least one of the following:
    merging connected blocks whose distance is smaller than a distance threshold;
    for any two connected blocks, taking the larger of the two blocks' respective averages of length and width, and merging the two selected connected blocks if that maximum satisfies a preset condition;
    merging connected blocks whose bounding boxes intersect and whose intersection matches a preset intersection feature;
    merging connected blocks whose bounding boxes are aligned and satisfy a preset alignment merging rule.
  6. The method of claim 1, wherein merging, by connection in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain the candidate text regions in the target image, comprises:
    performing, in sequence and based on different types of connection merging rules, merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction;
    wherein the connection merging rules include at least one of the following:
    the smaller of the center distance and the edge distance between the bounding boxes of the two connected blocks along a reference axis is less than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
    the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is less than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
    the difference between the side lengths of the two bounding boxes along the reference axis is less than a third preset proportion of the smaller of those two side lengths.
  7. The method of any one of claims 1 to 6, wherein extracting the specific region from the target image at the position corresponding to the candidate text region, and determining, based on the comparison between the probability that the extracted specific region contains a text region and the preset probability threshold, whether the extracted specific region contains a text line or a text column, comprises:
    extracting the specific region from the target image at the bounding box obtained by the connection in the color-reduced image and the binary image;
    feeding the bounding box obtained by the connection in the color-reduced image and the binary image, window by window with a specific sliding-window stride, into a convolutional neural network classifier, to obtain the probability that each sliding window contains text;
    averaging the probabilities that the sliding windows contain text, to obtain the probability that the candidate text region includes a text line or a text column;
    if the obtained probability is greater than the preset probability threshold, determining that a text line or a text column exists in the specific region.
  8. A text detection system, comprising:
    a color-reduction and binarization processing unit, configured to perform color reduction on the image of each of the three color channels of a target image to obtain a color-reduced image;
    the color-reduction and binarization processing unit being further configured to convert the target image into a binary image;
    a first merging unit, configured to merge connected blocks having the same color in the color-reduced image;
    the first merging unit being further configured to merge connected blocks having the same color in the binary image;
    a second merging unit, configured to merge, by connection in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
    a determining unit, configured to extract a specific region from the target image at the position corresponding to the candidate text region;
    the determining unit being further configured to determine, based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold, whether the extracted specific region contains a text line or a text column.
  9. The system of claim 8, wherein
    the color-reduction and binarization processing unit is further configured to quantize each of the three color channels of the target image into K levels to obtain K level intervals;
    the color-reduction and binarization processing unit is further configured to map the brightness of each pixel of the target image in the three color channels to the quantization interval of the corresponding channel, where K is an integer and 255 > K > 1.
  10. The system of claim 8, wherein
    the first merging unit is further configured to treat each pixel in the color-reduced image and in the binary image as a separate connected block, build a union-find (disjoint-set) structure over the pixels, and perform the following processing:
    the first merging unit is further configured to, if a pixel has the same color as any of its 8-adjacent pixels, merge the connected blocks to which the two adjacent same-colored pixels belong into a single connected block;
    the first merging unit is further configured to check the pixel area of each connected block and, if the pixel area of a connected block is smaller than a pixel area threshold, merge that connected block into a connected block adjacent to it and set its color to the color of the connected block it is merged into.
  11. The system of claim 8, further comprising:
    a discarding processing unit, configured to, after the first merging unit merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, discard the connected blocks in the color-reduced image and in the binary image that match a preset feature; the preset feature includes at least one of the following:
    a connected block whose area is smaller than a pixel area threshold;
    a connected block in which the length of any side is greater than a first preset proportion of the corresponding side length of the image;
    a connected block in which any side is longer than a border length threshold and the ratio of the pixel area to the bounding-box area is smaller than a ratio threshold.
  12. The system of claim 8, further comprising:
    a fourth merging unit, configured to, after the first merging unit merges the connected blocks having the same color in the color-reduced image and merges the connected blocks having the same color in the binary image, merge the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationship, channel by channel, and merge the connected blocks in the binary image into new connected blocks based on their positional relationship;
    wherein the fourth merging unit is further configured to merge into new connected blocks in at least one of the following ways:
    merging connected blocks whose distance is smaller than a distance threshold;
    for any two connected blocks, taking the larger of the two blocks' respective averages of length and width, and merging the two selected connected blocks if that maximum satisfies a preset condition;
    merging connected blocks whose bounding boxes intersect and whose intersection matches a preset intersection feature;
    merging connected blocks whose bounding boxes are aligned and satisfy a preset alignment merging rule.
  13. The system of claim 8, wherein
    the second merging unit is further configured to perform, in sequence and based on different types of connection merging rules, merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction; wherein the connection merging rules include:
    connecting two selected connected blocks into a new connected block when at least one of the following conditions is satisfied:
    the smaller of the center distance and the edge distance between the bounding boxes of the two connected blocks along a reference axis is less than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
    the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is less than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
    the difference between the side lengths of the two bounding boxes along the reference axis is less than a third preset proportion of the smaller of those two side lengths.
  14. The system of any one of claims 8 to 13, wherein
    the determining unit is further configured to extract a specific region from the target image at the bounding box obtained by the connection in the color-reduced image and the binary image, and to feed that bounding box, window by window with a specific sliding-window stride, into a convolutional neural network classifier, to obtain the probability that each sliding window contains text;
    the determining unit is further configured to average the probabilities that the sliding windows contain text, to obtain the probability that the candidate text region includes a text line or a text column;
    the determining unit is further configured to determine that a text line or a text column exists in the specific region if the obtained probability is greater than a preset probability threshold.
  15. A text detection device, comprising a memory and a processor, the memory storing executable instructions for causing the processor to perform the following operations:
    performing color reduction on the image of each of the three color channels of a target image to obtain a color-reduced image;
    converting the target image into a binary image;
    merging connected blocks having the same color in the color-reduced image, and merging connected blocks having the same color in the binary image;
    merging, by connection in the vertical and horizontal directions respectively, the connected blocks of each of the three color channels of the color-reduced image and the connected blocks in the binary image, to obtain candidate text regions in the target image;
    extracting a specific region from the target image at the position corresponding to the candidate text region;
    determining, based on a comparison between the probability that the extracted specific region contains a text region and a preset probability threshold, whether the extracted specific region contains a text line or a text column.
  16. A storage medium storing executable instructions for performing the text detection method of any one of claims 1 to 7.
PCT/CN2017/073407 2016-02-18 2017-02-13 Text detection method and system, device and storage medium WO2017140233A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610091568.8 2016-02-18
CN201610091568.8A CN107093172B (en) 2016-02-18 2016-02-18 Character detection method and system

Publications (1)

Publication Number Publication Date
WO2017140233A1 true WO2017140233A1 (en) 2017-08-24

Family

ID=59625563

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073407 WO2017140233A1 (en) 2016-02-18 2017-02-13 Text detection method and system, device and storage medium

Country Status (2)

Country Link
CN (1) CN107093172B (en)
WO (1) WO2017140233A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977956A (en) * 2019-04-29 2019-07-05 腾讯科技(深圳)有限公司 A kind of image processing method, device, electronic equipment and storage medium
CN110059685A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Word area detection method, apparatus and storage medium
CN111062365A (en) * 2019-12-30 2020-04-24 上海肇观电子科技有限公司 Method, device, chip circuit and computer readable storage medium for identifying mixed typesetting characters
CN111178346A (en) * 2019-11-22 2020-05-19 京东数字科技控股有限公司 Character area positioning method, device, equipment and storage medium
CN111222368A (en) * 2018-11-26 2020-06-02 北京金山办公软件股份有限公司 Method and device for identifying document paragraph and electronic equipment
CN111325199A (en) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 Character inclination angle detection method and device
CN111401110A (en) * 2019-01-03 2020-07-10 百度在线网络技术(北京)有限公司 Method and device for extracting information
CN111681229A (en) * 2020-06-10 2020-09-18 创新奇智(上海)科技有限公司 Deep learning model training method, wearable clothes flaw identification method and wearable clothes flaw identification device
CN112650832A (en) * 2020-12-14 2021-04-13 中国电子科技集团公司第二十八研究所 Knowledge correlation network key node discovery method based on topology and literature characteristics
CN113806505A (en) * 2021-09-09 2021-12-17 科大讯飞股份有限公司 Element comparison method and device, electronic equipment and storage medium

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205676B (en) * 2017-11-22 2019-06-07 西安万像电子科技有限公司 The method and apparatus for extracting pictograph region
CN108989793A (en) * 2018-07-20 2018-12-11 深圳市华星光电技术有限公司 A kind of detection method and detection device of text pixel
CN109191539B (en) * 2018-07-20 2023-01-06 广东数相智能科技有限公司 Oil painting generation method and device based on image and computer readable storage medium
CN109389150B (en) * 2018-08-28 2022-04-05 东软集团股份有限公司 Image consistency comparison method and device, storage medium and electronic equipment
CN109815957A (en) * 2019-01-30 2019-05-28 邓悟 A kind of character recognition method based on color image under complex background
CN110058838B (en) * 2019-04-28 2021-03-16 腾讯科技(深圳)有限公司 Voice control method, device, computer readable storage medium and computer equipment
CN111369441B (en) * 2020-03-09 2022-11-15 稿定(厦门)科技有限公司 Word processing method, medium, device and apparatus
CN111340028A (en) * 2020-05-18 2020-06-26 创新奇智(北京)科技有限公司 Text positioning method and device, electronic equipment and storage medium
CN112149523B (en) * 2020-09-04 2021-05-28 开普云信息科技股份有限公司 Method and device for identifying and extracting pictures based on deep learning and parallel-searching algorithm
CN112418204A (en) * 2020-11-18 2021-02-26 杭州未名信科科技有限公司 Text recognition method, system and computer medium based on paper document

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
CN101763516A (en) * 2010-01-15 2010-06-30 南京航空航天大学 Character recognition method based on fitting functions
CN103034856A (en) * 2012-12-18 2013-04-10 深圳深讯和科技有限公司 Method and device for locating text area in image
US20130243321A1 (en) * 2012-03-19 2013-09-19 Pfu Limited Image processing apparatus, character recognition method, and computer-readable, non-transitory medium
CN103632159A (en) * 2012-08-23 2014-03-12 阿里巴巴集团控股有限公司 Method and system for training classifier and detecting text area in image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090148043A1 (en) * 2007-12-06 2009-06-11 International Business Machines Corporation Method for extracting text from a compound digital image
CN101398894B (en) * 2008-06-17 2011-12-07 浙江师范大学 Automobile license plate automatic recognition method and implementing device thereof
CN101447027B (en) * 2008-12-25 2011-12-28 东莞市微模式软件有限公司 Binaryzation method of magnetic code character area and application thereof
CN102136064A (en) * 2011-03-24 2011-07-27 成都四方信息技术有限公司 System for recognizing characters from image
CN103839062B (en) * 2014-03-11 2017-08-08 东方网力科技股份有限公司 A kind of pictograph localization method and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222368B (en) * 2018-11-26 2023-09-19 北京金山办公软件股份有限公司 Method and device for identifying document paragraphs and electronic equipment
CN111222368A (en) * 2018-11-26 2020-06-02 北京金山办公软件股份有限公司 Method and device for identifying document paragraph and electronic equipment
CN111325199B (en) * 2018-12-14 2023-10-27 中移(杭州)信息技术有限公司 Text inclination angle detection method and device
CN111325199A (en) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 Character inclination angle detection method and device
CN111401110A (en) * 2019-01-03 2020-07-10 百度在线网络技术(北京)有限公司 Method and device for extracting information
CN110059685B (en) * 2019-04-26 2022-10-21 腾讯科技(深圳)有限公司 Character area detection method, device and storage medium
CN110059685A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Word area detection method, apparatus and storage medium
CN109977956B (en) * 2019-04-29 2022-11-18 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN109977956A (en) * 2019-04-29 2019-07-05 腾讯科技(深圳)有限公司 A kind of image processing method, device, electronic equipment and storage medium
CN111178346A (en) * 2019-11-22 2020-05-19 京东数字科技控股有限公司 Character area positioning method, device, equipment and storage medium
CN111178346B (en) * 2019-11-22 2023-12-08 京东科技控股股份有限公司 Text region positioning method, text region positioning device, text region positioning equipment and storage medium
CN111062365B (en) * 2019-12-30 2023-05-26 上海肇观电子科技有限公司 Method, apparatus, chip circuit and computer readable storage medium for recognizing mixed typeset text
CN111062365A (en) * 2019-12-30 2020-04-24 上海肇观电子科技有限公司 Method, device, chip circuit and computer readable storage medium for identifying mixed typesetting characters
CN111681229A (en) * 2020-06-10 2020-09-18 创新奇智(上海)科技有限公司 Deep learning model training method, wearable clothes flaw identification method and wearable clothes flaw identification device
CN112650832A (en) * 2020-12-14 2021-04-13 中国电子科技集团公司第二十八研究所 Knowledge correlation network key node discovery method based on topology and literature characteristics
CN112650832B (en) * 2020-12-14 2022-09-06 中国电子科技集团公司第二十八研究所 Knowledge correlation network key node discovery method based on topology and literature characteristics
CN113806505A (en) * 2021-09-09 2021-12-17 科大讯飞股份有限公司 Element comparison method and device, electronic equipment and storage medium
CN113806505B (en) * 2021-09-09 2024-04-16 科大讯飞股份有限公司 Element comparison method, device, electronic apparatus, and storage medium

Also Published As

Publication number Publication date
CN107093172B (en) 2020-03-17
CN107093172A (en) 2017-08-25

Similar Documents

Publication Publication Date Title
WO2017140233A1 (en) Text detection method and system, device and storage medium
JP6139396B2 (en) Method and program for compressing binary image representing document
JP4016342B2 (en) Apparatus and method for code recognition
JP3345350B2 (en) Document image recognition apparatus, method thereof, and recording medium
US10867171B1 (en) Systems and methods for machine learning based content extraction from document images
JP4646797B2 (en) Image processing apparatus, control method therefor, and program
CN109241861B (en) Mathematical formula identification method, device, equipment and storage medium
US9171224B2 (en) Method of improving contrast for text extraction and recognition applications
CN107590491B (en) Image processing method and device
JP4522468B2 (en) Image discrimination device, image search device, image search program, and recording medium
CN107977658B (en) Image character area identification method, television and readable storage medium
CN104298982A (en) Text recognition method and device
KR101659091B1 (en) Device and method for creating text collage message
CN103577818A (en) Method and device for recognizing image characters
CN102955943A (en) Image processing apparatus, and image processing method
US20150371100A1 (en) Character recognition method and system using digit segmentation and recombination
JP2009169948A (en) Device and method for determining orientation of document, and program and recording medium thereof
CN113095327B (en) Method and system for positioning optical character recognition area and storage medium thereof
TW200540728A (en) Text region recognition method, storage medium and system
CN114429637B (en) Document classification method, device, equipment and storage medium
CN110210467B (en) Formula positioning method of text image, image processing device and storage medium
US20030202696A1 (en) Activity detector
CN104346596A (en) Identification method and identification device for QR (Quick Response) code
JP2010074342A (en) Image processing apparatus, image forming apparatus, and program
JP2021149452A (en) Image processing device, control method and control program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17752665; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 17752665; Country of ref document: EP; Kind code of ref document: A1