WO2018166276A1 - Text area localization method and apparatus, and computer-readable storage medium (文字区域定位方法和装置、计算机可读存储介质) - Google Patents


Info

Publication number
WO2018166276A1
Authority
WO
WIPO (PCT)
Prior art keywords
edge
point
text area
image
edge point
Prior art date
Application number
PCT/CN2017/119692
Other languages
English (en)
French (fr)
Inventor
王永亮
王青泽
陈标龙
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Priority to US16/491,020 priority Critical patent/US11017260B2/en
Publication of WO2018166276A1 publication Critical patent/WO2018166276A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Definitions

  • the present disclosure relates to the field of image processing, and in particular to a text area positioning method and apparatus, and a computer readable storage medium.
  • the stroke width positioning method exploits the fact that printed text has a constant stroke width: it searches a picture for pairs of parallel lines, treats each pair as a stroke, and then merges strokes that are close to each other into a text area.
  • the stable extreme value region detection method uses the sharp contrast between the text area and the background image to perform text area localization.
  • the inventors have found that the above related art has different drawbacks.
  • the strokes of Microsoft's black-body (heiti) font all have the same width, so it can be located by the stroke width positioning method; however, the stroke width of Song (songti) text is not uniform, so it is not suitable for positioning with the stroke width positioning method.
  • the use of stable extreme value region detection requires high pixel contrast in the text area, but in the application process, the high contrast area is not necessarily text, so the algorithm can easily introduce additional noise.
  • both methods can only locate individual characters first, and require an additional algorithm to string the single characters into a line, which is cumbersome and reduces computational efficiency.
  • the present disclosure proposes a text area location scheme, which can improve the adaptability to different fonts and improve the accuracy of text area location.
  • a text area localization method including: acquiring a variance map from an original image; acquiring an edge image of the variance map; and, when the difference between the distances between opposite edge points of two adjacent edge lines in the edge image is within a predetermined distance difference range, determining the area between the two adjacent edge lines to be a text area.
  • determining the area between the two adjacent edge lines as the text area comprises: determining a first edge point and a second edge point located on adjacent edge lines; determining a row height from the distance between the first edge point and the second edge point; connecting adjacent first edge points whose row height difference is within a predetermined distance difference to determine a first edge line, and connecting adjacent second edge points whose row height difference is within the predetermined distance difference to determine a second edge line; the area between the first edge line and the second edge line is the text area.
  • determining the first edge point and the second edge point on adjacent edge lines comprises: taking a point in the edge image as the first edge point; emitting a ray from the first edge point in the direction of the pixel gradient until it reaches the next edge point; and, when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold, determining the next edge point to be the second edge point.
  • acquiring the variance map from the original image includes: acquiring the pixel value of a target pixel position in the original image and the pixel values of adjacent pixel points of the target pixel position, where the adjacent pixel points are a predetermined number of consecutive pixel points in different directions from the target pixel position; and taking the variance of the pixel value of the target pixel position and the pixel values of the adjacent pixel points to determine the pixel value of the target pixel position in the variance map.
  • acquiring the edge image of the variance map comprises: computing the edges of the image with the Canny operator on the basis of the variance map to obtain the edge image.
  • the text area includes at least one of a horizontal text area, a vertical text area, a slanted text area, and a fan-shaped text area.
  • a text area locating apparatus including: a variance map determining module, configured to acquire a variance map from an original image; an edge image acquiring module, configured to acquire an edge image of the variance map; and a text area positioning module, configured to determine, when the difference between the distances between opposite edge points of two adjacent edge lines in the edge image is within a predetermined distance difference, that the area between the two adjacent edge lines is a text area.
  • the text area positioning module includes: an edge point determining unit, configured to determine a first edge point and a second edge point located on adjacent edge lines; a row height determining unit, configured to determine a row height from the distance between the first edge point and the second edge point; and an edge line connecting unit, configured to connect adjacent first edge points whose row height difference is within a predetermined distance difference to determine a first edge line, and to connect adjacent second edge points whose row height difference is within the predetermined distance difference to determine a second edge line, the area between the first edge line and the second edge line being the text area.
  • the edge point determining unit includes: a first edge point specifying subunit, for taking a point in the edge image as the first edge point; a next edge point acquiring subunit, for emitting a ray from the first edge point in the direction of the pixel gradient until the next edge point; and a second edge point determining subunit, configured to determine that the next edge point is the second edge point when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold.
  • the variance map determining module is configured to: acquire the pixel value of a target pixel position in the original image and the pixel values of adjacent pixel points of the target pixel position, where the adjacent pixel points are a predetermined number of consecutive pixel points in different directions from the target pixel position; and take the variance of the pixel value of the target pixel position and the pixel values of the adjacent pixel points to determine the pixel value of the target pixel position in the variance map.
  • the edge image obtaining module is configured to: calculate an edge of the image by using a Canny operator based on the variance map, and obtain an edge image.
  • the text area includes at least one of a horizontal text area, a vertical text area, a slanted text area, and a fan-shaped text area.
  • Such a device can exploit the similarity of the characters within a text area and determine the text area from the distance between edge lines in the edge image; it is not affected by variation in text stroke thickness, is applicable to various fonts, avoids the influence of complex pixel variation in the image on positioning, and improves the accuracy of text area localization.
  • a text area locating apparatus comprising: a memory; and a processor coupled to the memory, the processor being configured to perform, based on instructions stored in the memory, any one of the text area localization methods mentioned above.
  • Such a device can exploit the similarity of the characters within a text area and determine the text area from the distance between edge lines in the edge image; it is not affected by variation in text stroke thickness, is applicable to various fonts, avoids the influence of complex pixel variation in the image on positioning, and improves the accuracy of text area localization.
  • a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the steps of any of the text area localization methods mentioned above.
  • Such a computer storage medium, during operation of the text area locating device, can exploit the similarity of the characters within a text area and determine the text area from the distance between edge lines in the edge image; it is applicable to various fonts and improves the accuracy of text area localization.
  • FIGS. 1A to 1C are schematic diagrams of a stroke width positioning method in the related art, wherein FIG. 1A is a schematic diagram of a stroke enlargement effect, FIG. 1B is a schematic diagram of a contour, and FIG. 1C is a schematic diagram of a stroke width calculation.
  • Fig. 2 is a schematic diagram of a stable extreme value region detecting method in the related art.
  • FIG. 3A is a schematic diagram of a font with a uniform stroke width.
  • FIG. 3B is a schematic diagram of a font whose text stroke width is inconsistent.
  • FIG. 4 is a schematic diagram of a picture not applicable to the stable extreme value area detection method.
  • FIG. 5 is a flow chart of some embodiments of a text area location method of the present disclosure.
  • 6A is an original image of some embodiments of a text region localization method employing the present disclosure.
  • FIG. 6B is a variance diagram determined when the text area localization method of the present disclosure is applied to FIG. 6A.
  • FIG. 6C is an edge image determined when the text area localization method of the present disclosure is applied to FIG. 6B.
  • FIG. 6D is a schematic diagram of a text area determined when the text area localization method of the present disclosure is applied to FIG. 6C.
  • FIG. 7 is a flow chart of some embodiments of locating a text region in an edge image in the text region localization method of the present disclosure.
  • FIG. 8 is a flow chart of some embodiments of determining edge points in a text area location method of the present disclosure.
  • FIG. 9 is a schematic diagram of some embodiments of a text area locating device of the present disclosure.
  • FIG. 10 is a schematic diagram of some embodiments of a text area locating module in the text area locating device of the present disclosure.
  • FIG. 11 is a schematic diagram of some embodiments of an edge point determining unit in the text area locating device of the present disclosure.
  • FIG. 12 is a schematic diagram of still another embodiment of a text area locating device of the present disclosure.
  • FIG. 13 is a schematic diagram of still another embodiment of the text area locating device of the present disclosure.
  • FIGS. 1A to 1C are schematic diagrams of text positioning by a stroke width positioning method.
  • the gray area of Fig. 1A is the effect after a stroke of a character is enlarged several times, where a small gray grid cell is a pixel on the stroke and a small white grid cell is the image background.
  • In Fig. 1B, the Canny operator is used to extract the two contours of the stroke. It can be seen that the two contours are parallel to each other; p and q are opposite points on the two sides of the contour, and the straight-line distance between the two points is W.
  • Fig. 1C shows, on the basis of Fig. 1B, the minimum distance computed from each pixel on one contour to the pixels on the contour parallel to it; this minimum distance is the stroke width.
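The per-pixel width computation of Fig. 1C can be sketched as follows (a toy illustration with assumed inputs: the function name and the list-of-coordinates representation of the contours are not from the disclosure; the real stroke width transform casts gradient-direction rays from each edge pixel):

```python
import math

def min_widths(contour_a, contour_b):
    """For each point on contour_a, the minimum straight-line distance to
    any point on the parallel contour_b, i.e. the per-pixel stroke width
    of Fig. 1C.  Contours are lists of (x, y) pixel coordinates."""
    return [min(math.hypot(bx - ax, by - ay) for (bx, by) in contour_b)
            for (ax, ay) in contour_a]
```

For the parallel contours of Fig. 1B, every such minimum equals the stroke width W.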
  • Fig. 2 is a schematic diagram of a stable extreme value region detecting method in the related art.
  • the font in the image contrasts sharply with the background color.
  • after detection, the two pictures on the right can be obtained, and the text information can be clearly seen in them.
  • FIG. 3A is a schematic diagram of a font with a uniform stroke width.
  • FIG. 3B is a schematic diagram of a font whose text stroke width is inconsistent.
  • the strokes of Microsoft's black-body (heiti) font all have the same width, and positioning can be performed by the stroke width positioning method.
  • the stroke width of the Song text shown in FIG. 3B is not uniform; for example, a stroke may be thicker at its upper part and thinner at its lower part, so this font is not suitable for positioning by the stroke width positioning method.
  • the use of stable extreme value region detection requires high pixel contrast in the text area, but in the application process, the high contrast area is not necessarily text, so the algorithm can easily introduce additional noise.
  • FIG. 4 is a schematic diagram of a picture not applicable to the stable extreme value area detection method.
  • the selected position of the rectangular frame is a stable extreme value area, but less than half of the area is a text area.
  • both methods can only locate individual characters first, and additional algorithms are needed to string the single characters into a line, which is cumbersome and reduces computational efficiency.
  • the present disclosure proposes a text area positioning scheme, which can improve the adaptability to different fonts and improve the accuracy of text area positioning.
  • FIG. 5 illustrates a flow diagram of some embodiments of a text area location method of the present disclosure. As shown in FIG. 5, the text area positioning method includes steps 501-503.
  • In step 501, a variance map is acquired from the original image.
  • the variance of the pixel values of each pixel point and several surrounding pixel points may be computed from the pixel values of the points in the original image, for example as the variance of the pixel values of several consecutive points in the horizontal direction; that is, the variance value of a point is determined by calculating the variance of the pixel values of that point and several points around it.
  • In step 502, an edge image of the variance map is acquired.
  • the edge image may be computed using any of the edge detection algorithms of the related art.
  • In step 503, when the difference between the distances between opposite edge points of two adjacent edge lines in the edge image is within a predetermined distance difference, the area between the two adjacent edge lines is determined to be a text area.
  • two approximately parallel edge lines may be obtained in the edge image; the lines may be straight or curved and may have breakpoints in the middle. If the distance between the two edge lines is relatively stable, i.e. the range of distance variation is within a predetermined distance difference, the area between the two edge lines can be considered a text area.
  • the pixel value of the target pixel position in the original image and the pixel values of the adjacent pixel points of the target pixel position may be acquired, and the variance of these values is taken to determine the pixel value of the target pixel position in the variance map.
  • the adjacent pixel points may be a predetermined number of consecutive pixel points in different directions (e.g., horizontal or vertical) from the target pixel position. The predetermined number can be set according to experience or actual needs.
  • let the pixel value of the pixel point at coordinate position (x, y) in the original image be G(x, y), where G(0, 0) represents the pixel value of the upper left corner of the image; let the variance map be I, and let the pixel value of the pixel point at coordinate position (x, y) in the variance map be I(x, y).
  • the neighboring pixels of G(x, y) include G(x-t, y), G(x-t+1, y), …, G(x-1, y), G(x+1, y), …, G(x+t, y); according to the formula
    I(x, y) = Var(G(x-t, y), G(x-t+1, y), …, G(x, y), G(x+1, y), …, G(x+t, y)),
    the pixel value I(x, y) of the point (x, y) in the variance map is calculated.
  • the value of t can be set according to needs or effects, for example t = 5.
  • I(0,0) can be determined based only on G(0,0), G(1,0)...G(t,0).
  • a vertical variance map can likewise be obtained by determining the variance values from the pixel values of a predetermined number of pixels in the vertical direction. It is also possible to take the pixel points within a predetermined range above, below, left, and right as the adjacent pixel points.
  • the variance map can be calculated on the basis of the original image.
  • the variance map reflects changes in the image, revealing the positions where the image changes drastically, which makes it convenient to distinguish the text area from other image areas.
  • the original image is shown in Fig. 6A, and Fig. 6B is the variance map of Fig. 6A. It can be seen from the variance map that the text areas are clearly elongated and have prominent features.
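A minimal sketch of the horizontal variance map described above (the function name is illustrative; note that NumPy arrays are indexed [row, column], i.e. [y, x], while the text writes (x, y)):

```python
import numpy as np

def horizontal_variance_map(gray, t=5):
    """Sketch of the horizontal variance map: I(x, y) is the variance of
    the 2*t + 1 horizontally consecutive pixel values G(x-t, y), ...,
    G(x+t, y) of the grayscale image, with the window clipped at the
    image border (so I(0, 0) uses only G(0, 0), ..., G(t, 0))."""
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - t), min(w, x + t + 1)  # clipped window
            out[y, x] = np.var(gray[y, lo:hi])
    return out
```

A vertical variance map follows by sliding the same window over columns instead of rows.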
  • the edge contour of the variance map may be further extracted to obtain an edge image. It can be implemented by any edge image extraction algorithm in the related art, such as using the Canny operator to calculate the edge of the image to obtain the edge image.
  • the edge contour of the variance map can be further obtained on the basis of the variance map, thereby facilitating the calculation based on the edge image and obtaining the text region between the edge points.
  • edge contour extraction is performed on the basis of Fig. 6B, yielding the edge image in Fig. 6C.
  • the edge lines in Fig. 6C are clear, which facilitates edge point extraction and distance calculation, resulting in the text area diagram shown in Fig. 6D.
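The disclosure computes the edge image of the variance map with the Canny operator; in practice one would call, for example, OpenCV's cv2.Canny on the variance map. As a dependency-free stand-in, the sketch below thresholds the Sobel gradient magnitude (the function name and threshold are assumptions, not from the disclosure):

```python
import numpy as np

def sobel_edge_image(img, thresh):
    """Minimal edge map: threshold the Sobel gradient magnitude.
    A simple stand-in for the Canny operator mentioned in the text."""
    img = img.astype(np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")  # replicate borders
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 3, x:x + 3]
            gx[y, x] = np.sum(win * kx)
            gy[y, x] = np.sum(win * ky)
    mag = np.hypot(gx, gy)  # gradient magnitude
    return (mag >= thresh).astype(np.uint8)
```

Unlike Canny, this sketch has no non-maximum suppression or hysteresis, so its edges are thicker; it only illustrates where the variance map changes sharply.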
  • As shown in FIG. 7, locating the text region in the edge image includes steps 701-703.
  • In step 701, a first edge point and a second edge point on adjacent edge lines are determined.
  • the edge image may be traversed, taking one edge point at a time as the first edge point, until each edge point on the entire image (or on the entire edge line) has been associated with its opposite second edge point.
  • a pixel point on the adjacent edge line opposite the position of the first edge point may be taken as the second edge point. For example, if two horizontal edge lines are parallel to each other and the coordinates of the first edge point are (x, y), then the coordinates of the second edge point are (x, y + n), where n is the distance between the first edge point and the second edge point.
  • In step 702, a row height is determined from the distance between the first edge point and the second edge point.
  • the entire map may be traversed to obtain a row height between each of the first edge points and the corresponding second edge point.
  • In step 703, adjacent first edge points whose row height difference is within a predetermined distance difference are connected to determine a first edge line, and adjacent second edge points whose row height difference is within the predetermined distance difference are connected to determine a second edge line; the area between the first edge line and the second edge line is the text area.
  • the first edge point and the second edge point can be considered to be, respectively, the upper edge point and the lower edge point of the text (the left and right edge points for vertically set text). Therefore, the adjacent edge points can be connected to obtain the upper and lower edges of the text (the left and right edges in the vertical case), and the area between the edges is the text area.
  • the edges of the characters can thus be obtained on the basis of the edge image, and thereby the text area. Since individual characters need not be judged, the amount of computation is reduced, and the method is not affected by variation in stroke thickness or by irregular areas with large pixel-value differences, which improves the efficiency and accuracy of text area localization.
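Step 703's grouping of adjacent edge points by row height can be sketched as follows (a simplified, hypothetical representation: one row height per column, with maximal runs of similar heights delimiting candidate text areas):

```python
def group_edge_runs(row_heights, max_diff):
    """Scan per-column row heights (the distance between each first edge
    point and its second edge point) and keep maximal runs of adjacent
    columns whose heights differ by at most max_diff.  Each run (start,
    end) delimits one candidate text area; the representation is
    illustrative, not from the disclosure."""
    runs = []
    start = 0
    for i in range(1, len(row_heights)):
        if abs(row_heights[i] - row_heights[i - 1]) > max_diff:
            runs.append((start, i - 1))  # close the current run
            start = i
    runs.append((start, len(row_heights) - 1))
    return runs
```

Connecting the first edge points (and, separately, the second edge points) within one run yields the two edge lines bounding that text area.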
  • As shown in FIG. 8, determining the edge points includes steps 801-803.
  • In step 801, a point in the edge image is taken as the first edge point.
  • the edge image may be traversed, taking one edge point at a time as the first edge point, until the second edge point corresponding to each edge point in the entire image (or on the entire edge line) has been determined.
  • In step 802, a ray is emitted from the first edge point in the direction of the pixel gradient until it reaches the next edge point, so that the point on the adjacent edge line opposite the first edge point can be found.
  • In step 803, if the angle between the normal vectors of the first edge point and the next edge point is less than the predetermined angle threshold, the next edge point is determined to be the second edge point.
  • the predetermined angle threshold can be 30 degrees.
  • the normal vector of a pixel is the gradient, i.e. the derivative, at that pixel.
  • a digital image can be regarded as a two-dimensional discrete function, and the normal vector can be determined by differentiating this function.
  • the first edge point and the second edge point opposite it can thus be determined on the basis of the edge image, providing a data basis for calculating the distance between the first edge point and the second edge point. Because the second edge point is found by casting a ray in the direction of the pixel gradient, its position relative to the first edge point is not fixed: depending on the pixel changes, it may lie above, below, to the left, or in another direction. This makes it possible to determine horizontal text areas, vertical text areas, slanted text areas, and even fan-shaped text areas, preventing missed localization caused by irregular typesetting and improving the accuracy of text localization.
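Steps 801-803 can be sketched as follows (the data layout, a binary edge map plus a per-pixel unit normal array, and the anti-parallel normal comparison are assumptions made for illustration; the disclosure only states that the angle between the normal vectors must be below the threshold, 30 degrees in the example):

```python
import numpy as np

def find_opposite_edge_point(edges, normals, start,
                             max_steps=100, angle_thresh_deg=30.0):
    """From the first edge point `start` = (row, col), step along the
    pixel-gradient (normal) direction until the next edge point is hit;
    accept it as the second edge point only if its normal is roughly
    anti-parallel to the starting normal, within the angle threshold.
    `edges` is a binary map; `normals` holds a unit normal per pixel."""
    y, x = start
    n0 = normals[y, x]
    dy, dx = n0  # ray direction = gradient/normal direction
    for step in range(1, max_steps):
        yy = int(round(y + dy * step))
        xx = int(round(x + dx * step))
        if not (0 <= yy < edges.shape[0] and 0 <= xx < edges.shape[1]):
            return None  # ray left the image without hitting an edge
        if edges[yy, xx]:
            n1 = normals[yy, xx]
            # opposite edges of a text line have roughly opposite normals,
            # so compare the angle between n0 and -n1 with the threshold
            cos_a = np.clip(np.dot(n0, -n1), -1.0, 1.0)
            angle = np.degrees(np.arccos(cos_a))
            return (yy, xx) if angle < angle_thresh_deg else None
    return None
```

With the 30-degree threshold this accepts point pairs that sit on roughly parallel, opposite edges, which is what makes the row-height computation of step 702 meaningful.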
  • the text area localization apparatus includes a variance map determination module 901, an edge image acquisition module 902, and a text area location module 903.
  • the variance map determination module 901 is capable of acquiring a variance map from the original image.
  • the variance of the pixel values of each pixel point and several surrounding pixel points may be computed from the pixel values of the points in the original image, for example as the variance of the pixel values of several consecutive points in the horizontal direction; that is, the variance value of a point is determined by calculating the variance of the pixel values of that point and several points around it.
  • the variance map determination module 901 can acquire the pixel value of the target pixel position in the original image and the pixel values of the adjacent pixel points of the target pixel position, and take the variance of these values to determine the pixel value of the target pixel position in the variance map.
  • the adjacent pixel points may be a predetermined number of pixel points in the horizontal and vertical directions from the target pixel position, or pixel points within a predetermined range above, below, left, and right. The predetermined number can be set according to experience or actual needs.
  • the variance map can be calculated on the basis of the original image; it reflects changes in the image, revealing the positions where the image changes drastically and making it convenient to distinguish the text area from other image areas.
  • the edge image acquisition module 902 is capable of acquiring an edge image of the variance map.
  • the edge image may be computed using any of the edge detection algorithms of the related art.
  • the edge image acquisition module 902 can further extract the edge contour of the variance map based on the variance map to obtain an edge image. It can be implemented by any edge image extraction algorithm in the related art, such as using the Canny operator to calculate the edge of the image to obtain the edge image.
  • the edge contour of the variance map can be further obtained on the basis of the variance map, thereby facilitating the calculation based on the edge image and obtaining the text region between the edge points.
  • the text area positioning module 903 can determine that the area between two adjacent edge lines is a text area when the difference between the distances between opposite edge points of the two adjacent edge lines in the edge image is within a predetermined distance difference range.
  • two approximately parallel edge lines may be obtained in the edge image; the edge lines may be straight or curved and may have breakpoints in the middle. If the distance between the two edge lines is relatively stable, i.e. the range of distance variation is within the predetermined distance difference, the area between the two edge lines can be considered a text area.
  • Such a device can exploit the similarity of the characters within a text area and determine the text area from the distance between edge lines in the edge image; it is not affected by variation in text stroke thickness, is applicable to various fonts, avoids the influence of complex pixel variation in the image on positioning, and improves the accuracy of text area localization.
  • Figure 10 further illustrates a schematic diagram of some embodiments of a text area location module in the text area location device of the present disclosure. As shown in FIG. 10, the text area positioning module includes an edge point determining unit 1001, a line height determining unit 1002, and an edge line connecting unit 1003.
  • the edge point determining unit 1001 is capable of determining a first edge point and a second edge point located on an adjacent edge line.
  • the edge image may be traversed, taking one edge point at a time as the first edge point, until each edge point on the entire image (or on the entire edge line) has been associated with its opposite second edge point.
  • a pixel point on the adjacent edge line opposite the position of the first edge point may be taken as the second edge point. For example, if two horizontal edge lines are parallel to each other and the coordinates of the first edge point are (x, y), then the coordinates of the second edge point are (x, y + n), where n is the distance between the first edge point and the second edge point.
  • the line height determining unit 1002 is capable of determining the line height based on the distance between the first edge point and the second edge point. In some embodiments, the entire map may be traversed to obtain a row height between each of the first edge points and the corresponding second edge point.
  • the edge line connecting unit 1003 connects adjacent first edge points whose row height difference is within a predetermined distance difference to determine a first edge line, and connects adjacent second edge points whose row height difference is within the predetermined distance difference to determine a second edge line; the area between the first edge line and the second edge line is the text area.
  • when first edge points are adjacent, the second edge points corresponding to them are also adjacent; if, among these adjacent edge points, the difference between the first-edge-point-to-second-edge-point distances is within the predetermined distance difference, the first edge points and second edge points can be considered to be, respectively, the upper edge points and the lower edge points of the text (the left and right edge points for vertically set text). Therefore, the adjacent edge points can be connected to obtain the upper and lower edges of the text (the left and right edges in the vertical case), and the area between the edges is the text area.
  • Such a device can obtain the edges of the text on the basis of the edge image and thereby obtain the text area. Since individual characters need not be judged, the amount of computation is reduced, and the method is not affected by variation in stroke thickness or by irregular areas with large pixel-value differences, which improves the efficiency and accuracy of text area localization.
  • Figure 11 further illustrates a schematic diagram of some embodiments of the edge point determining unit in the text region positioning device of the present disclosure.
  • As shown in Figure 11, the edge point determining unit includes a first edge point designating subunit 1101, a next edge point obtaining subunit 1102, and a second edge point determining subunit 1103.
  • The first edge point designating subunit 1101 can take a point in the edge image as the first edge point.
  • In some embodiments, the edge image may be traversed, taking one edge point at a time as the first edge point, until the second edge point corresponding to every edge point in the whole image has been determined, or until the second edge point of every edge point on a whole edge line has been determined.
  • The next edge point obtaining subunit 1102 can cast a ray from the first edge point along the direction of the pixel gradient until the next edge point is reached, so that the point opposed to the first edge point on the edge line adjacent to the edge line on which the first edge point lies can be found.
  • The second edge point determining subunit 1103 can determine the next edge point to be the second edge point when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold. In some embodiments, the predetermined angle threshold can be 30 degrees.
  • Such a device can determine the first edge point and the second edge point opposed to it on the basis of the edge image, providing a data basis for computing the distance between the first edge point and the second edge point. Since the second edge point is determined by casting a ray along the direction of the pixel gradient, the relative position of the second edge point with respect to the first edge point is not fixed; depending on how the pixels vary, it may be above, below, left, right, or in some other positional relationship. In this way, horizontal, vertical, slanted, and even fan-shaped text regions can be determined, preventing missed positioning caused by irregular typesetting and improving the accuracy of text positioning.
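As an illustration of the kind of data the row height determining unit 1002 consumes, the following is a minimal sketch in Python with NumPy. It is an assumed implementation, not taken from the disclosure: it simplifies the gradient-direction rays to vertical scans over a binary edge mask (which corresponds to horizontal text), and the function name and data representation are our own.

```python
import numpy as np

def row_heights(edge_mask: np.ndarray) -> dict:
    """For each column of a binary edge mask, treat the topmost edge pixel
    as the first edge point and the next edge pixel below it as the
    candidate second edge point; the gap between them is that column's
    row height. (The disclosure casts rays along the pixel gradient; a
    vertical scan is a simplification for horizontal text lines.)"""
    heights = {}
    for x in range(edge_mask.shape[1]):
        ys = np.flatnonzero(edge_mask[:, x])
        if len(ys) >= 2:
            heights[x] = int(ys[1] - ys[0])
    return heights
```

Columns whose heights agree within the predetermined distance difference would then be connected into the first and second edge lines by unit 1003.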
  • Figure 12 is a block diagram of other embodiments of the text region positioning device of the present disclosure. As shown in FIG. 12, the text region positioning device includes a memory 1210 and a processor 1220.
  • The memory 1210 can be a magnetic disk, flash memory, or any other non-volatile storage medium. The memory is used to store the instructions of the corresponding embodiments of the text region positioning method, including simulation-platform-side instructions, and may also include management-system-side instructions.
  • The processor 1220 is coupled to the memory 1210 and can be implemented as one or more integrated circuits, such as a microprocessor or a microcontroller. The processor 1220 is configured to execute the instructions stored in the memory and can thereby implement the positioning of the text region.
  • In some embodiments, as shown in Figure 13, the text region positioning device 1300 includes a memory 1310 and a processor 1320. The processor 1320 is coupled to the memory 1310 via a bus 1330. The text region positioning device 1300 can also be connected to an external storage device 1350 via a storage interface 1040 to access external data, and can also be connected to a network or another computer system (not shown) via a network interface 1360; this is not described in detail here.
  • In this embodiment, by storing data instructions in the memory and processing those instructions with the processor, the operation of the text region positioning device can be realized.
  • In other embodiments, a computer readable storage medium has computer program instructions stored thereon which, when executed by a processor, implement the steps of the text region positioning method in the corresponding embodiments.
  • In other embodiments, a computer program product has computer program instructions stored thereon which, when executed by a processor, implement the steps of the text region positioning method in the corresponding embodiments.
  • As will be appreciated by those skilled in the art, the present disclosure may be provided as a method, a device, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The methods and apparatus of the present disclosure may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware, and firmware. The above sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated.
  • The present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine readable instructions for implementing a method in accordance with the present disclosure; the present disclosure therefore also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

A text region positioning method and device, and a computer readable storage medium, relating to the field of image processing. The text region positioning method comprises: obtaining a variance map from an original image (501); obtaining an edge image of the variance map (502); and when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, determining the region between the two adjacent edge lines to be a text region (503). Such a method exploits the fact that characters in a text region have highly similar heights, determining the text region from the distances between edge lines in the edge image. It is unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning.

Description

Text region positioning method and device, and computer readable storage medium
Cross-Reference to Related Application
This application is based on and claims priority to CN application No. 201710152728.X, filed on March 15, 2017, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of image processing, and in particular to a text region positioning method and device, and a computer readable storage medium.
Background
In text recognition, a text region must first be located in an image by means of certain algorithms, and the text content is then extracted from that region using related image algorithms; text positioning is therefore an important module of a text recognition engine. Two commonly used methods are the Stroke Width Transform and Maximally Stable Extremal Regions detection.
The Stroke Width Transform exploits the property that the stroke width of printed characters is constant: it searches an image for pairs of parallel lines, judges the parallel lines to be strokes, and then clusters mutually close strokes into text regions.
Maximally Stable Extremal Regions detection locates text regions by exploiting the fact that the text regions in an image contrast sharply with the background image.
Summary
The inventors have found that the above related techniques each have their own drawbacks. The strokes of Microsoft JhengHei characters have uniform width and can be located with the Stroke Width Transform; the strokes of SimSun (Song) characters, however, vary in width, so the Stroke Width Transform is not applicable to them. Maximally Stable Extremal Regions detection requires high pixel contrast in the text region, but in practice a high-contrast region is not necessarily text, so the algorithm easily introduces extra noise. Moreover, both methods can only locate text regions first and require an additional algorithm to string individual characters into lines, which is cumbersome and reduces computational efficiency.
To solve at least one of the above problems, the present disclosure proposes a text region positioning scheme that improves adaptability to different fonts and the accuracy of text region positioning.
According to some embodiments of the present disclosure, a text region positioning method is proposed, comprising: obtaining a variance map from an original image; obtaining an edge image of the variance map; and when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, determining the region between the two adjacent edge lines to be a text region.
Optionally, determining the region between the two adjacent edge lines to be a text region comprises: determining a first edge point and a second edge point located on an adjacent edge line; determining a row height from the distance between the first edge point and the second edge point; and connecting adjacent first edge points whose row-height differences are within the predetermined distance-difference range to determine a first edge line, and connecting adjacent second edge points whose row-height differences are within the predetermined distance-difference range to determine a second edge line, the region between the first edge line and the second edge line being the text region.
Optionally, determining the first edge point and the second edge point located on the adjacent edge line comprises: taking a point in the edge image as the first edge point; casting a ray from the first edge point along the direction of the pixel gradient until the next edge point is reached; and when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold, determining the next edge point to be the second edge point.
Optionally, obtaining the variance map from the original image comprises: obtaining the pixel value at a target pixel position in the original image and the pixel values of neighboring pixels of the target pixel position, wherein the neighboring pixels are a predetermined number of consecutive pixels from the target pixel position in different directions; and taking the variance of the pixel values of the target pixel position and the neighboring pixels in the original image to determine the pixel value at the target pixel position in the variance map.
Optionally, obtaining the edge image of the variance map comprises: computing image edges from the variance map using the Canny operator to obtain the edge image.
Optionally, the text region comprises at least one of a horizontal text region, a vertical text region, a slanted text region, and a fan-shaped text region.
With such a method, the fact that characters in a text region have highly similar heights can be exploited to determine the text region from the distances between edge lines in the edge image; the method is unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning.
According to other embodiments of the present disclosure, a text region positioning device is proposed, comprising: a variance map determining module configured to obtain a variance map from an original image; an edge image obtaining module configured to obtain an edge image of the variance map; and a text region positioning module configured to determine, when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, the region between the two adjacent edge lines to be a text region.
Optionally, the text region positioning module comprises: an edge point determining unit configured to determine a first edge point and a second edge point located on an adjacent edge line; a row height determining unit configured to determine a row height from the distance between the first edge point and the second edge point; and an edge line connecting unit configured to connect adjacent first edge points whose row-height differences are within the predetermined distance-difference range to determine a first edge line, and to connect adjacent second edge points whose row-height differences are within the predetermined distance-difference range to determine a second edge line, the region between the first edge line and the second edge line being the text region.
Optionally, the edge point determining unit comprises: a first edge point designating subunit configured to take a point in the edge image as the first edge point; a next edge point obtaining subunit configured to cast a ray from the first edge point along the direction of the pixel gradient until the next edge point is reached; and a second edge point determining subunit configured to determine the next edge point to be the second edge point when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold.
Optionally, the variance map determining module is configured to: obtain the pixel value at a target pixel position in the original image and the pixel values of neighboring pixels of the target pixel position, wherein the neighboring pixels are a predetermined number of consecutive pixels from the target pixel position in different directions; and take the variance of the pixel values of the target pixel position and the neighboring pixels in the original image to determine the pixel value at the target pixel position in the variance map.
Optionally, the edge image obtaining module is configured to compute image edges from the variance map using the Canny operator to obtain the edge image.
Optionally, the text region comprises at least one of a horizontal text region, a vertical text region, a slanted text region, and a fan-shaped text region.
Such a device can exploit the fact that characters in a text region have highly similar heights, determining the text region from the distances between edge lines in the edge image; it is unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning.
According to still other embodiments of the present disclosure, a text region positioning device is proposed, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform any one of the text region positioning methods mentioned above based on instructions stored in the memory.
Such a device can exploit the fact that characters in a text region have highly similar heights, determining the text region from the distances between edge lines in the edge image; it is unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning.
According to further embodiments of the present disclosure, a computer readable storage medium is proposed, having computer program instructions stored thereon which, when executed by a processor, implement the steps of any one of the text region positioning methods mentioned above.
Such a computer storage medium can, while the text region positioning device is running, exploit the fact that characters in a text region have highly similar heights, determining the text region from the distances between edge lines in the edge image; it is applicable to a wide variety of fonts and improves the accuracy of text region positioning. Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings described here are provided for a further understanding of the present disclosure and constitute a part of this application; the illustrative embodiments of the present disclosure and their descriptions serve to explain the present disclosure and do not unduly limit it.
FIGS. 1A to 1C are schematic diagrams of the Stroke Width Transform in the related art, in which FIG. 1A shows a magnified stroke, FIG. 1B is a schematic diagram of its contours, and FIG. 1C is a schematic diagram of the stroke width computation.
FIG. 2 is a schematic diagram of Maximally Stable Extremal Regions detection in the related art.
FIG. 3A is a schematic diagram of a font whose strokes have uniform width.
FIG. 3B is a schematic diagram of a font whose strokes have non-uniform width.
FIG. 4 is a schematic diagram of an image unsuited to Maximally Stable Extremal Regions detection.
FIG. 5 is a flowchart of some embodiments of the text region positioning method of the present disclosure.
FIG. 6A is an original image used with some embodiments of the text region positioning method of the present disclosure.
FIG. 6B is the variance map determined when the text region positioning method of the present disclosure is applied to FIG. 6A.
FIG. 6C is the edge image determined when the text region positioning method of the present disclosure is applied to FIG. 6B.
FIG. 6D is a schematic diagram of the text region determined when the text region positioning method of the present disclosure is applied to FIG. 6C.
FIG. 7 is a flowchart of some embodiments of locating the text region in the edge image in the text region positioning method of the present disclosure.
FIG. 8 is a flowchart of some embodiments of determining edge points in the text region positioning method of the present disclosure.
FIG. 9 is a schematic diagram of some embodiments of the text region positioning device of the present disclosure.
FIG. 10 is a schematic diagram of some embodiments of the text region positioning module in the text region positioning device of the present disclosure.
FIG. 11 is a schematic diagram of some embodiments of the edge point determining unit in the text region positioning device of the present disclosure.
FIG. 12 is a schematic diagram of other embodiments of the text region positioning device of the present disclosure.
FIG. 13 is a schematic diagram of still other embodiments of the text region positioning device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. Note that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Meanwhile, it should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, such techniques, methods, and devices should be regarded as part of the granted specification.
In all the examples shown and discussed here, any specific value should be interpreted as merely illustrative rather than limiting; other examples of the exemplary embodiments may therefore have different values.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
FIGS. 1A to 1C are schematic diagrams of text positioning using the Stroke Width Transform.
The gray area in FIG. 1A shows a character stroke magnified several times: the small gray squares are pixels on the stroke, and the small white squares are the image background. In FIG. 1B, the two contours of the stroke have been traced with the Canny operator; it can roughly be seen that the two contours are parallel to each other, p and q are opposed points on the two sides of the contour, and the straight-line distance between the two points is W. In FIG. 1C, building on FIG. 1B, the minimum distance from each pixel on one contour to the pixels on the parallel contour is computed; this distance is the stroke width.
FIG. 2 is a schematic diagram of Maximally Stable Extremal Regions detection in the related art.
As shown in FIG. 2, the font in the image contrasts sharply with the background color; by continually raising the binarization threshold, the two images on the right can be obtained, in which the text information can be clearly seen.
However, the above related techniques each have their own drawbacks.
FIG. 3A is a schematic diagram of a font whose strokes have uniform width, and FIG. 3B is a schematic diagram of a font whose strokes have non-uniform width. As shown in FIG. 3A, the strokes of Microsoft JhengHei characters have uniform width and can be located with the Stroke Width Transform. However, the SimSun characters shown in FIG. 3B have strokes of varying width; for example, the left-falling stroke (丿) is thicker in its upper half and thinner toward the bottom, so the Stroke Width Transform is not applicable to them. Maximally Stable Extremal Regions detection, in turn, requires high pixel contrast in the text region, but in practice a high-contrast region is not necessarily text, so the algorithm easily introduces extra noise.
FIG. 4 is a schematic diagram of an image unsuited to Maximally Stable Extremal Regions detection.
As shown in FIG. 4, all the positions selected by the rectangular boxes are maximally stable extremal regions, but fewer than half of them are text regions. Moreover, both methods can only locate text regions first and require an additional algorithm to string individual characters into lines, which is cumbersome and reduces computational efficiency.
On this basis, the present disclosure proposes a text region positioning scheme that improves adaptability to different fonts and the accuracy of text region positioning.
FIG. 5 shows a flowchart of some embodiments of the text region positioning method of the present disclosure. As shown in FIG. 5, the text region positioning method comprises steps 501 to 503.
In step 501, a variance map is obtained from the original image. In some embodiments, the variance of the pixel values of each pixel and several surrounding pixels may be computed from the pixel value of each point in the original image, for example taking the variance of the pixel values of several horizontally consecutive points to determine the variance-map pixel of one of those points. By computing, for every point, the variance of its pixel value and those of several surrounding points, the variance map can be determined.
In step 502, an edge image of the variance map is obtained. In some embodiments, any edge detection algorithm in the related art may be used to compute the edge image.
In step 503, when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, the region between the two adjacent edge lines is determined to be a text region. In some embodiments, two approximately parallel edge lines can be obtained in the edge image; these edge lines may be straight or curved and may contain breakpoints. If the distance between the two edge lines is relatively stable, varying within the predetermined distance-difference range, the region between the two edge lines can be considered a text region.
With such a method, the fact that characters in a text region have highly similar heights can be exploited to determine the text region from the distances between edge lines in the edge image. The method is thus unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning. In addition, since there is no need to determine text regions character by character and then stitch them together, text lines are located directly and quickly in printed images with complex layouts, which improves the efficiency of text region determination.
In some embodiments, the pixel value at a target pixel position in the original image and the pixel values of its neighboring pixels may be obtained, and the variance of the pixel values of the target pixel position and the neighboring pixels taken to determine the pixel value at the target pixel position in the variance map. The neighboring pixels may be a predetermined number of consecutive pixels from the target pixel position in different directions (for example, horizontal or vertical). The predetermined number may be set empirically or according to actual requirements. For example, let the original image be G, and let the pixel value at coordinate (x, y) in the original image be G(x, y), so that G(0, 0) denotes the pixel value at the top-left corner of the image. Let the variance map be I, with pixel value I(x, y) at coordinate (x, y). Taking a horizontal variance map as an example, the neighboring pixels of G(x, y) are G(x-t, y), G(x-t+1, y), ..., G(x-1, y), G(x+1, y), ..., G(x+t, y), and according to the formula
I(x, y) = Var(G(x-t, y), G(x-t+1, y), ..., G(x, y), G(x+1, y), ..., G(x+t, y))
the pixel value I(x, y) at point (x, y) in the variance map is computed. In the formula, the value of t may be set as needed or according to the desired effect, for example t = 5.
For pixels at the two ends, such as G(0, 0), I(0, 0) may be determined only from G(0, 0), G(1, 0), ..., G(t, 0).
For special applications, a vertical variance map may also be computed, that is, the variance value is determined from the pixel values of a predetermined number of consecutive pixels in the vertical direction. Pixels within a predetermined range above, below, left, and right may also be set as the neighboring pixels.
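The horizontal variance map described above can be sketched as follows in Python with NumPy (a minimal assumed implementation; the function and variable names are ours). Each output pixel I(x, y) is the variance of the 2t+1 horizontally consecutive pixel values centered on G(x, y), with the window clipped at the image borders as described for G(0, 0):

```python
import numpy as np

def horizontal_variance_map(gray: np.ndarray, t: int = 5) -> np.ndarray:
    """I(x, y) = Var(G(x-t, y), ..., G(x, y), ..., G(x+t, y)),
    clipping the window at the left/right image borders."""
    h, w = gray.shape
    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - t), min(w, x + t + 1)
            out[y, x] = np.var(gray[y, lo:hi])
    return out
```

Flat background regions map to zero variance, while the rapid pixel changes inside a text line produce high-variance values, which is what makes the text lines stand out as elongated strips.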
With such a method, the variance map of the original image can be computed. The variance map reflects how the image changes, revealing the positions where the image changes sharply and making it easy to distinguish text regions from other image regions. FIG. 6A shows an original image, and FIG. 6B its variance map. The variance map shows that the text regions appear as distinctly elongated strips, a prominent feature.
In some embodiments, on the basis of the variance map, the edge contours of the variance map may be further extracted to obtain the edge image. Any edge image extraction algorithm in the related art may be used, for example computing the image edges with the Canny operator to obtain the edge image.
With such a method, the edge contours of the variance map can be obtained on the basis of the variance map, which facilitates computation on the edge image to obtain the text region lying between edge points. As shown in FIG. 6C, extracting the edge contours of FIG. 6B yields the edge image of FIG. 6C. The lines of the edge image in FIG. 6C are clear, which facilitates edge point extraction and distance computation and yields the text region diagram shown in FIG. 6D.
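To keep the sketch dependency-light, the edge-extraction step can be illustrated with a simple gradient-magnitude threshold; this is an assumed stand-in, not the Canny operator named in the disclosure (a production pipeline would use a real Canny implementation, e.g. OpenCV's cv2.Canny, which adds non-maximum suppression and hysteresis thresholding on top of this idea):

```python
import numpy as np

def edge_image(variance_map: np.ndarray, thresh: float) -> np.ndarray:
    """Mark pixels of the variance map whose gradient magnitude exceeds
    `thresh`. This captures the sharp transitions at text-line borders,
    which is the part of Canny's output the method relies on."""
    gy, gx = np.gradient(variance_map.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    return (magnitude > thresh).astype(np.uint8)
```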
FIG. 7 shows a flowchart of some embodiments of locating the text region in the edge image in the text region positioning method of the present disclosure. As shown in FIG. 7, the text region positioning method comprises steps 701 to 703.
In step 701, a first edge point and a second edge point located on an adjacent edge line are determined. In some embodiments, the edge image may be traversed, taking one edge point at a time as the first edge point, until the association between every edge point and its opposed second edge point has been confirmed for the whole image or a whole edge line. Given the first edge point, the pixel opposed to the first edge point on the edge line adjacent to the edge line on which the first edge point lies may be taken as the second edge point. For example, if two horizontal edge lines are parallel one above the other and the first edge point has coordinates (x, y), the second edge point has coordinates (x, y+n), where n is the distance between the first and second edge points.
In step 702, a row height is determined from the distance between the first edge point and the second edge point. In some embodiments, the whole image may be traversed to obtain the row height between every first edge point and its corresponding second edge point.
In step 703, adjacent first edge points whose row-height differences are within the predetermined distance-difference range are connected to determine a first edge line, and adjacent second edge points whose row-height differences are within the predetermined distance-difference range are connected to determine a second edge line; the region between the first edge line and the second edge line is the text region. In some embodiments, if at least two first edge points are adjacent, the second edge points corresponding to those first edge points are also adjacent, and among these adjacent edge points the differences between the first-edge-point-to-second-edge-point distances are within the predetermined distance-difference range, the first edge points and second edge points can be considered the upper and lower edge points of the text respectively (the left and right edge points when the text is vertical). The adjacent edge points can therefore be connected to obtain the upper and lower edges of the text (the left and right edges when the text is vertical), and the region between the edges is the text region.
With such a method, the edges of the text can be obtained on the basis of the edge image, and thus the text region. Since no judgment of individual characters is required, the amount of computation is reduced, and the method is unaffected by irregular regions where stroke thickness differs and pixel values vary greatly, improving the efficiency and accuracy of text region positioning.
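The grouping in step 703 can be sketched as follows (an assumed minimal implementation; the representation of edge points as (x, row_height) pairs is ours). Adjacent first edge points are connected as long as their row heights agree within the predetermined distance difference; each resulting run is one first edge line, and its paired second edge points form the corresponding second edge line:

```python
def connect_edge_points(points, max_diff=2):
    """points: (x, row_height) pairs for adjacent first edge points,
    sorted by x. Splits them into runs whose successive row heights
    differ by no more than `max_diff`; each run yields one edge line."""
    runs, current = [], [points[0]]
    for p in points[1:]:
        if abs(p[1] - current[-1][1]) <= max_diff:
            current.append(p)
        else:
            runs.append(current)
            current = [p]
    runs.append(current)
    return runs
```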
FIG. 8 shows a flowchart of some embodiments of determining edge points in the text region positioning method of the present disclosure. As shown in FIG. 8, the text region positioning method comprises steps 801 to 803.
In step 801, a point in the edge image is taken as the first edge point. In some embodiments, the edge image may be traversed, taking one edge point at a time as the first edge point, until the second edge point corresponding to every edge point in the whole image has been determined, or until the second edge point of every edge point on a whole edge line has been determined.
In step 802, a ray is cast from the first edge point along the direction of the pixel gradient until the next edge point is reached, so that the point opposed to the first edge point on the edge line adjacent to the edge line on which the first edge point lies can be found.
In step 803, if the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold, the next edge point is determined to be the second edge point. In some embodiments, the predetermined angle threshold may be 30 degrees. The normal vector of a pixel is the pixel's gradient, that is, the pixel's derivative; a digital image, as a spectrum of discrete point values, can be regarded as a two-dimensional discrete function, and its normal vectors can be determined by differentiating that function.
With such a method, the first edge point and the second edge point opposed to it can be determined on the basis of the edge image, providing a data basis for computing the distance between the first edge point and the second edge point. Since the second edge point is determined by casting a ray along the direction of the pixel gradient, the relative position of the second edge point with respect to the first edge point is not fixed; depending on how the pixels vary, it may be above, below, left, right, or in some other positional relationship. In this way, horizontal, vertical, slanted, and even fan-shaped text regions can be determined, preventing missed positioning caused by irregular typesetting and improving the accuracy of text positioning.
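The acceptance test of step 803 reduces to an angle comparison between two gradient vectors. A minimal sketch (assumed helper names; the 30-degree threshold is the example value given above):

```python
import numpy as np

def normal_angle_deg(n1, n2) -> float:
    """Angle in degrees between two pixel normal vectors (gradients)."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    cos = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def is_second_edge_point(n_first, n_next, threshold_deg=30.0) -> bool:
    """Accept the edge point hit by the ray as the second edge point
    when the angle between the two normals is below the threshold."""
    return normal_angle_deg(n_first, n_next) < threshold_deg
```

Because the ray follows the local gradient rather than a fixed axis, the same test works whether the opposed edge lies below, beside, or diagonally across from the first edge point.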
FIG. 9 shows a schematic diagram of some embodiments of the text region positioning device of the present disclosure. As shown in FIG. 9, the text region positioning device comprises a variance map determining module 901, an edge image obtaining module 902, and a text region positioning module 903.
The variance map determining module 901 can obtain a variance map from the original image. In some embodiments, the variance of the pixel values of each pixel and several surrounding pixels may be computed from the pixel value of each point in the original image, for example taking the variance of the pixel values of several horizontally consecutive points to determine the variance-map pixel of one of those points. By computing, for every point, the variance of its pixel value and those of several surrounding points, the variance map is determined.
In some embodiments, the variance map determining module 901 can obtain the pixel value at a target pixel position in the original image and the pixel values of neighboring pixels of the target pixel position, and take the variance of the pixel values of the target pixel position and the neighboring pixels to determine the pixel value at the target pixel position in the variance map. The neighboring pixels may be a predetermined number of consecutive pixels in the horizontal or vertical direction, or pixels within a predetermined range above, below, left, and right of the target pixel position. The predetermined number may be set empirically or according to actual requirements.
With such a method, the variance map of the original image can be computed; the variance map reflects how the image changes, revealing the positions where the image changes sharply and making it easy to distinguish text regions from other image regions.
The edge image obtaining module 902 can obtain an edge image of the variance map. In some embodiments, any edge detection algorithm in the related art may be used to compute the edge image.
In some embodiments, the edge image obtaining module 902 can further extract the edge contours of the variance map to obtain the edge image. Any edge image extraction algorithm in the related art may be used, for example computing the image edges with the Canny operator to obtain the edge image.
With such a method, the edge contours of the variance map can be obtained on the basis of the variance map, facilitating computation on the edge image to obtain the text region lying between edge points.
The text region positioning module 903 can determine the region between two adjacent edge lines in the edge image to be a text region when the difference between the distances of opposed edge points on the two adjacent edge lines is within a predetermined distance-difference range. In some embodiments, two approximately parallel edge lines can be obtained in the edge image; these edge lines may be straight or curved and may contain breakpoints. If the distance between the two edge lines is relatively stable, varying within the predetermined distance-difference range, the region between the two edge lines can be considered a text region.
Such a device can exploit the fact that characters in a text region have highly similar heights, determining the text region from the distances between edge lines in the edge image; it is unaffected by variations in stroke thickness, is applicable to a wide variety of fonts, avoids the influence of complex pixel variations in the image on the positioning, and improves the accuracy of text region positioning.
FIG. 10 further shows a schematic diagram of some embodiments of the text region positioning module in the text region positioning device of the present disclosure. As shown in FIG. 10, the text region positioning module comprises an edge point determining unit 1001, a row height determining unit 1002, and an edge line connecting unit 1003.
The edge point determining unit 1001 can determine a first edge point and a second edge point located on an adjacent edge line. In some embodiments, the edge image may be traversed, taking one edge point at a time as the first edge point, until the association between every edge point and its opposed second edge point has been confirmed for the whole image or a whole edge line. Given the first edge point, the pixel opposed to the first edge point on the edge line adjacent to the edge line on which the first edge point lies may be taken as the second edge point. For example, if two horizontal edge lines are parallel one above the other and the first edge point has coordinates (x, y), the second edge point has coordinates (x, y+n), where n is the distance between the first and second edge points.
The row height determining unit 1002 can determine a row height from the distance between the first edge point and the second edge point. In some embodiments, the whole image may be traversed to obtain the row height between every first edge point and its corresponding second edge point.
The edge line connecting unit 1003 connects adjacent first edge points whose row-height differences are within the predetermined distance-difference range to determine a first edge line, and connects adjacent second edge points whose row-height differences are within the predetermined distance-difference range to determine a second edge line; the region between the first edge line and the second edge line is the text region.
In some embodiments, if at least two first edge points are adjacent, the second edge points corresponding to those first edge points are also adjacent, and among these adjacent edge points the differences between the first-edge-point-to-second-edge-point distances are within the predetermined distance-difference range, the first edge points and second edge points can be considered the upper and lower edge points of the text respectively (the left and right edge points when the text is vertical). The adjacent edge points can therefore be connected to obtain the upper and lower edges of the text (the left and right edges when the text is vertical), and the region between the edges is the text region.
Such a device can obtain the edges of the text on the basis of the edge image, and thus the text region; since no judgment of individual characters is required, the amount of computation is reduced, and the device is unaffected by irregular regions where stroke thickness differs and pixel values vary greatly, improving the efficiency and accuracy of text region positioning.
FIG. 11 further shows a schematic diagram of some embodiments of the edge point determining unit in the text region positioning device of the present disclosure. As shown in FIG. 11, the edge point determining unit comprises a first edge point designating subunit 1101, a next edge point obtaining subunit 1102, and a second edge point determining subunit 1103.
The first edge point designating subunit 1101 can take a point in the edge image as the first edge point. In some embodiments, the edge image may be traversed, taking one edge point at a time as the first edge point, until the second edge point corresponding to every edge point in the whole image has been determined, or until the second edge point of every edge point on a whole edge line has been determined.
The next edge point obtaining subunit 1102 can cast a ray from the first edge point along the direction of the pixel gradient until the next edge point is reached, so that the point opposed to the first edge point on the edge line adjacent to the edge line on which the first edge point lies can be found.
The second edge point determining subunit 1103 can determine the next edge point to be the second edge point when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold. In some embodiments, the predetermined angle threshold may be 30 degrees.
Such a device can determine the first edge point and the second edge point opposed to it on the basis of the edge image, providing a data basis for computing the distance between the first edge point and the second edge point. Since the second edge point is determined by casting a ray along the direction of the pixel gradient, the relative position of the second edge point with respect to the first edge point is not fixed; depending on how the pixels vary, it may be above, below, left, right, or in some other positional relationship. In this way, horizontal, vertical, slanted, and even fan-shaped text regions can be determined, preventing missed positioning caused by irregular typesetting and improving the accuracy of text positioning.
FIG. 12 shows a structural schematic diagram of other embodiments of the text region positioning device of the present disclosure. As shown in FIG. 12, the text region positioning device comprises a memory 1210 and a processor 1220.
The memory 1210 may be a magnetic disk, flash memory, or any other non-volatile storage medium. The memory is used to store the instructions of the corresponding embodiments of the text region positioning method, including simulation-platform-side instructions, and may also include management-system-side instructions.
The processor 1220 is coupled to the memory 1210 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 1220 is used to execute the instructions stored in the memory and can thereby implement the positioning of the text region.
In some embodiments, as also shown in FIG. 13, the text region positioning device 1300 comprises a memory 1310 and a processor 1320. The processor 1320 is coupled to the memory 1310 via a bus 1330. The text region positioning device 1300 may also be connected to an external storage device 1350 via a storage interface 1040 in order to access external data, and may be connected to a network or another computer system (not shown) via a network interface 1360; this is not described in detail here.
In this embodiment, by storing data instructions in the memory and processing those instructions with the processor, the operation of the text region positioning device can be realized.
In other embodiments, a computer readable storage medium has computer program instructions stored thereon which, when executed by a processor, implement the steps of the method in the corresponding embodiments of the text region positioning method. As will be appreciated by those skilled in the art, embodiments of the present disclosure may be provided as a method, a device, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present disclosure has thus been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure; from the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed here.
The methods and devices of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is merely illustrative, and the steps of the methods of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, these programs including machine readable instructions for implementing the methods according to the present disclosure; the present disclosure therefore also covers a recording medium storing a program for executing the methods according to the present disclosure.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific implementations of the present disclosure may still be modified, and some technical features may be replaced with equivalents, without departing from the spirit of the technical solutions of the present disclosure; all such modifications and replacements shall fall within the scope of the technical solutions claimed by the present disclosure.

Claims (14)

  1. A text region positioning method, comprising:
    obtaining a variance map from an original image;
    obtaining an edge image of the variance map; and
    when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, determining the region between the two adjacent edge lines to be a text region.
  2. The text region positioning method according to claim 1, wherein determining the region between the two adjacent edge lines to be a text region comprises:
    determining a first edge point and a second edge point located on an adjacent edge line;
    determining a row height from the distance between the first edge point and the second edge point; and
    connecting adjacent ones of the first edge points whose row-height differences are within a predetermined distance-difference range to determine a first edge line, and connecting adjacent ones of the second edge points whose row-height differences are within the predetermined distance-difference range to determine a second edge line, the region between the first edge line and the second edge line being the text region.
  3. The text region positioning method according to claim 2, wherein determining the first edge point and the second edge point located on the adjacent edge line comprises:
    taking a point in the edge image as the first edge point;
    casting a ray from the first edge point along the direction of the pixel gradient until a next edge point is reached; and
    when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold, determining the next edge point to be the second edge point.
  4. The text region positioning method according to claim 1, wherein obtaining the variance map from the original image comprises:
    obtaining a pixel value at a target pixel position in the original image and pixel values of neighboring pixels of the target pixel position, wherein the neighboring pixels are a predetermined number of consecutive pixels from the target pixel position in different directions; and
    taking the variance of the pixel values of the target pixel position and the neighboring pixels in the original image to determine a pixel value at the target pixel position in the variance map.
  5. The text region positioning method according to claim 1, wherein obtaining the edge image of the variance map comprises: computing image edges from the variance map using the Canny operator to obtain the edge image.
  6. The text region positioning method according to claim 1, wherein the text region comprises at least one of a horizontal text region, a vertical text region, a slanted text region, and a fan-shaped text region.
  7. A text region positioning device, comprising:
    a variance map determining module configured to obtain a variance map from an original image;
    an edge image obtaining module configured to obtain an edge image of the variance map; and
    a text region positioning module configured to determine, when the difference between the distances of opposed edge points on two adjacent edge lines in the edge image is within a predetermined distance-difference range, the region between the two adjacent edge lines to be a text region.
  8. The text region positioning device according to claim 7, wherein the text region positioning module comprises:
    an edge point determining unit configured to determine a first edge point and a second edge point located on an adjacent edge line;
    a row height determining unit configured to determine a row height from the distance between the first edge point and the second edge point; and
    an edge line connecting unit configured to connect adjacent ones of the first edge points whose row-height differences are within a predetermined distance-difference range to determine a first edge line, and to connect adjacent ones of the second edge points whose row-height differences are within the predetermined distance-difference range to determine a second edge line, the region between the first edge line and the second edge line being the text region.
  9. The text region positioning device according to claim 8, wherein the edge point determining unit comprises:
    a first edge point designating subunit configured to take a point in the edge image as the first edge point;
    a next edge point obtaining subunit configured to cast a ray from the first edge point along the direction of the pixel gradient until a next edge point is reached; and
    a second edge point determining subunit configured to determine the next edge point to be the second edge point when the angle between the normal vectors of the first edge point and the next edge point is less than a predetermined angle threshold.
  10. The text region positioning device according to claim 7, wherein the variance map determining module is configured to:
    obtain a pixel value at a target pixel position in the original image and pixel values of neighboring pixels of the target pixel position, wherein the neighboring pixels are a predetermined number of consecutive pixels from the target pixel position in different directions; and
    take the variance of the pixel values of the target pixel position and the neighboring pixels in the original image to determine a pixel value at the target pixel position in the variance map.
  11. The text region positioning device according to claim 7, wherein the edge image obtaining module is configured to compute image edges from the variance map using the Canny operator to obtain the edge image.
  12. The text region positioning device according to claim 7, wherein the text region comprises at least one of a horizontal text region, a vertical text region, a slanted text region, and a fan-shaped text region.
  13. A text region positioning device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to perform the text region positioning method according to any one of claims 1 to 6 based on instructions stored in the memory.
  14. A computer readable storage medium having computer program instructions stored thereon which, when executed by a processor, implement the text region positioning method according to any one of claims 1 to 6.
PCT/CN2017/119692 2017-03-15 2017-12-29 Text region positioning method and device, and computer readable storage medium WO2018166276A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/491,020 US11017260B2 (en) 2017-03-15 2017-12-29 Text region positioning method and device, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710152728.X 2017-03-15
CN201710152728.XA CN108573251B (zh) 2017-03-15 2017-03-15 Text region positioning method and device

Publications (1)

Publication Number Publication Date
WO2018166276A1 true WO2018166276A1 (zh) 2018-09-20

Family

ID=63521757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119692 WO2018166276A1 (zh) 2017-03-15 2017-12-29 Text region positioning method and device, and computer readable storage medium

Country Status (3)

Country Link
US (1) US11017260B2 (zh)
CN (1) CN108573251B (zh)
WO (1) WO2018166276A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695557A (zh) * 2019-08-30 2020-09-22 新华三信息安全技术有限公司 Image processing method and device

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340036A (zh) * 2020-03-25 2020-06-26 上海眼控科技股份有限公司 Tamper detection method for vehicle VIN codes, computer device, and storage medium
CN111652013B (zh) * 2020-05-29 2023-06-02 天津维创微智能科技有限公司 Text filtering method, device, apparatus, and storage medium
CN113762244A (zh) * 2020-06-05 2021-12-07 北京市天元网络技术股份有限公司 Method and device for extracting document information
CN113986152A (zh) * 2020-07-08 2022-01-28 森大(深圳)技术有限公司 Inkjet printing method, device, apparatus, and storage medium with segmented image conversion
US11954932B2 (en) * 2020-10-16 2024-04-09 Bluebeam, Inc. Systems and methods for automatic detection of features on a sheet
CN113313111B (zh) * 2021-05-28 2024-02-13 北京百度网讯科技有限公司 Text recognition method, device, apparatus, and medium
CN118334492B (zh) * 2024-06-14 2024-08-16 山东科技大学 Edge detection model training method, edge detection method, apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324927A (zh) * 2012-03-19 2013-09-25 株式会社Pfu Image processing device and character recognition method
CN104751142A (zh) * 2015-04-01 2015-07-01 电子科技大学 Natural scene text detection algorithm based on stroke features
CN104794479A (zh) * 2014-01-20 2015-07-22 北京大学 Text detection method for natural scene images based on local stroke width transform
CN105718926A (zh) * 2014-12-03 2016-06-29 夏普株式会社 Text detection method and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1265324C (zh) * 2003-11-06 2006-07-19 上海交通大学 Character image segmentation method based on statistics of distances between adjacent edge points
CN102332096B (zh) * 2011-10-17 2013-01-02 中国科学院自动化研究所 Method for extracting and recognizing video caption text
ES2432479B2 (es) * 2012-06-01 2014-10-21 Universidad De Las Palmas De Gran Canaria Method for the automatic identification and classification of arachnid species from their spider webs
CN103034856B (zh) * 2012-12-18 2016-01-20 深圳深讯和科技有限公司 Method and device for locating text regions in an image
CN104112135B (zh) * 2013-04-18 2017-06-06 富士通株式会社 Text image extraction device and method
CN103593653A (zh) * 2013-11-01 2014-02-19 浙江工业大学 Scanner-gun-based method for recognizing two-dimensional character barcodes
CN105224941B (zh) * 2014-06-18 2018-11-20 台达电子工业股份有限公司 Object recognition and positioning method
CN104361336A (zh) * 2014-11-26 2015-02-18 河海大学 Character recognition method for underwater video images
CN106033528A (zh) * 2015-03-09 2016-10-19 富士通株式会社 Method and apparatus for extracting specific regions from color document images
CN105868757A (zh) * 2016-03-25 2016-08-17 上海珍岛信息技术有限公司 Method and device for locating characters within image text
CN106295648B (zh) * 2016-07-29 2019-03-19 湖北工业大学 Binarization method for low-quality document images based on multispectral imaging
CN106485710A (zh) * 2016-10-18 2017-03-08 广州视源电子科技股份有限公司 Method and device for detecting misplaced components

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324927A (zh) * 2012-03-19 2013-09-25 株式会社Pfu Image processing device and character recognition method
CN104794479A (zh) * 2014-01-20 2015-07-22 北京大学 Text detection method for natural scene images based on local stroke width transform
CN105718926A (zh) * 2014-12-03 2016-06-29 夏普株式会社 Text detection method and device
CN104751142A (zh) * 2015-04-01 2015-07-01 电子科技大学 Natural scene text detection algorithm based on stroke features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG, TIANCAI ET AL.: "Wordart Detection in Natural Scene Based on Stroke Growing", COMPUTER SIMULATION, vol. 32, no. 08, 31 August 2015 (2015-08-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695557A (zh) * 2019-08-30 2020-09-22 新华三信息安全技术有限公司 Image processing method and device
CN111695557B (zh) * 2019-08-30 2024-04-26 新华三信息安全技术有限公司 Image processing method and device

Also Published As

Publication number Publication date
US11017260B2 (en) 2021-05-25
CN108573251A (zh) 2018-09-25
US20200012879A1 (en) 2020-01-09
CN108573251B (zh) 2021-09-07

Similar Documents

Publication Publication Date Title
WO2018166276A1 (zh) Text region positioning method and device, and computer readable storage medium
US9488483B2 (en) Localization using road markings
US9519968B2 (en) Calibrating visual sensors using homography operators
US9208395B2 (en) Position and orientation measurement apparatus, position and orientation measurement method, and storage medium
US10088294B2 (en) Camera pose estimation device and control method
CN106249881B (zh) Dynamic registration method for an augmented reality field-of-view space and virtual three-dimensional targets
US20200074665A1 (en) Object detection method, device, apparatus and computer-readable storage medium
CN108381549B (zh) Binocular-vision-guided rapid robot grasping method, device, and storage medium
US9164583B2 (en) Method and apparatus for gaze point mapping
US9405182B2 (en) Image processing device and image processing method
TW202011733A (zh) 對影像進行目標取樣的方法及裝置
US20200327653A1 (en) Automatic detection, counting, and measurement of logs using a handheld device
US20180039843A1 (en) Detecting device, detecting method, and program
JP2016167229A (ja) Coordinate transformation parameter determination device, coordinate transformation parameter determination method, and computer program for coordinate transformation parameter determination
KR20150037374A (ko) Method, apparatus, and computer readable recording medium for converting a document image captured by a camera into a scanned document image
US9727776B2 (en) Object orientation estimation
CN112017231B (zh) Monocular-camera-based human body weight recognition method, device, and storage medium
CN110688947A (zh) Method for simultaneously locating feature points in a three-dimensional face point cloud and segmenting the face
CN106296587B (zh) Stitching method for tire mold images
US20190310348A1 (en) Recording medium recording information processing program, information processing apparatus, and information processing method
CN109977959A (zh) Method and device for segmenting character regions of train tickets
US11216905B2 (en) Automatic detection, counting, and measurement of lumber boards using a handheld device
CN102831578B (zh) Image processing method and image processing device
CN110832851B (zh) Image processing device and image transformation method
US20200191577A1 (en) Method and system for road image reconstruction and vehicle positioning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17900585

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 09.12.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17900585

Country of ref document: EP

Kind code of ref document: A1