CN111814778A - Text line region positioning method, layout analysis method and character recognition method - Google Patents


Info

Publication number
CN111814778A
Authority
CN
China
Prior art keywords
line region
image
text line
positioning
region
Prior art date
Legal status
Pending
Application number
CN202010640573.6A
Other languages
Chinese (zh)
Inventor
张岩
刘丽辉
Current Assignee
Beijing Sinosecu Technology Co ltd
Original Assignee
Beijing Sinosecu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinosecu Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition


Abstract

The invention belongs to the technical field of optical character recognition (OCR) within digital image recognition, and specifically relates to a text line region positioning method, a layout analysis method based on that positioning, a character recognition method based on the layout analysis, an apparatus, and a storage medium. The text line region positioning method comprises the following steps: acquiring a grayscale image of the image to be recognized; obtaining a positioning image from the grayscale image; and identifying the positive-color and/or reverse-color text line regions in the positioning image. Both the layout analysis method and the character recognition method include applying this positioning method to obtain the positive-color and/or reverse-color text line regions. By splicing images according to the located text line regions, the invention produces a result image containing only positive-color (or only reverse-color) text line regions, so that all character information can be obtained in a single recognition pass, improving character recognition efficiency.

Description

Text line region positioning method, layout analysis method and character recognition method
Technical Field
The invention belongs to the technical field of optical character recognition (OCR) within digital image recognition, and specifically relates to a text line region positioning method, a layout analysis method based on the text line region positioning, a character recognition method based on the layout analysis, an apparatus, and a storage medium.
Background
OCR (Optical Character Recognition) refers to the process by which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text through character recognition algorithms. For printed characters, the text on a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the characters in the image into a text format for further editing by word-processing software.
In the field of digital image recognition, a region with dark characters on a light background is generally defined as a positive-color region; conversely, a region with light characters on a dark background is defined as a reverse-color region. Existing OCR technology generally recognizes dark characters on a light background. Therefore, with conventional OCR, an image containing only positive-color regions can be recognized directly, while an image containing only reverse-color regions is recognized after reverse-color (inversion) processing. For an image with a complex layout containing both positive-color and reverse-color regions, the processing is more involved: the image is first recognized directly to read the positive-color regions, then inverted and recognized again to read the reverse-color regions, and finally the two recognition results are merged to obtain the final result.
Disclosure of Invention
When existing OCR technology performs character recognition on an image to be recognized with a complex layout, all of the image's character information can be obtained only by running recognition twice, which leads to long recognition times and low recognition efficiency.
In order to solve the above technical problems, the present application aims to provide a text line region positioning method, a layout analysis method based on the text line region positioning, a character recognition method based on the layout analysis, an apparatus, and a storage medium, so that all character information in a complex-layout image can be obtained with only one OCR pass, greatly improving the efficiency of character recognition for such images.
In one aspect of the present invention, a method for positioning a text line region is provided, which comprises the following steps:
acquiring a grayscale image of an image to be recognized;
obtaining a positioning image from the grayscale image;
and identifying the positive-color text line region and/or the reverse-color text line region in the positioning image.
In another aspect of the present invention, a layout analysis method based on text line region positioning is provided, comprising: applying the above text line region positioning method to obtain the positive-color and reverse-color text line regions in the positioning image; and determining the positive-color regions and reverse-color regions in the grayscale image of the image to be recognized.
In another aspect of the present invention, a method for character recognition based on layout analysis is provided, which includes the following steps:
applying the above text line region positioning method to obtain the positive-color text line region and/or the reverse-color text line region in the positioning image, and obtaining a positive-color image and a reverse-color image from the grayscale image used in the positioning method;
acquiring, from the positive-color and/or reverse-color text line regions in the positioning image, at least one of: the positive-color text line region of the positive-color image, the reverse-color text line region of the positive-color image, the positive-color text line region of the reverse-color image, and the reverse-color text line region of the reverse-color image;
splicing the positive-color image and the reverse-color image according to at least one of those four kinds of regions to obtain a result image, in which the text line regions are either all positive-color or all reverse-color;
and performing character recognition on the result image to obtain a recognition result.
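The splicing step above can be sketched in a few lines. This is an illustrative interpretation, not the patent's disclosed implementation: it assumes each located reverse-color line box in the positive image is replaced by the same box cut from the reverse-color (inverted) image, where that line appears in positive color. The function and variable names are the author's own.

```python
import numpy as np

def stitch(positive, inverse, reverse_boxes):
    """Build a result image whose text lines are all positive color.

    `positive` is the grayscale positive image, `inverse` its reverse-color
    counterpart, and `reverse_boxes` the located reverse-color text line
    regions, each stored as (x0, y0, x1, y1). For every reverse-color line,
    the patch is copied from the inverse image instead.
    """
    result = positive.copy()
    for x0, y0, x1, y1 in reverse_boxes:
        result[y0:y1, x0:x1] = inverse[y0:y1, x0:x1]
    return result

# Toy example: a positive-color line on the left, a reverse-color line on the right.
img = np.full((6, 20), 220, dtype=np.uint8)   # light background
img[2:4, 1:8] = 30                            # dark characters (positive line)
img[1:5, 10:19] = 30                          # dark patch (reverse-color region)
img[2:4, 12:17] = 230                         # light characters inside it
result = stitch(img, 255 - img, [(10, 1, 19, 5)])
```

After stitching, the right-hand region also shows dark characters on a light background, so one recognition pass covers both lines.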
In another aspect of the present invention, a method for character recognition based on layout analysis is provided, which includes the following steps:
acquiring a gray image of an image to be identified;
obtaining an outline binary image, an inverse outline binary image, and an inverse grayscale image from the grayscale image;
identifying a sixth text line region in the outline binary image and a seventh text line region in the inverse outline binary image;
splicing the grayscale image and the inverse grayscale image according to the sixth text line region and the seventh text line region to obtain a third result image, in which all text line regions are positive-color text line regions;
and performing character recognition on the third result image to obtain a recognition result.
The invention also provides a character recognition apparatus comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the apparatus implements the steps of the methods described above.
The invention also provides a computer storage medium having stored thereon a computer program which, when being executed by a processor, carries out the method steps as set forth above.
Compared with existing OCR technology, which must run recognition twice to obtain all character information from a complex-layout image, the text line region positioning method, layout analysis method, character recognition method, apparatus, and storage medium of the invention splice the positive-color image and the reverse-color image according to the positive-color and/or reverse-color text line regions in the positioning image to obtain a result image containing only positive-color (or only reverse-color) text line regions. All character information of the image to be recognized can therefore be obtained with a single recognition pass over the result image, greatly shortening recognition time and improving character recognition efficiency for complex-layout images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 illustrates an exemplary flow chart of a text line region locating method of the present invention.
FIG. 2 illustrates an exemplary flow chart of one embodiment of a text line region locating method of the present invention.
FIG. 3 illustrates an exemplary flow chart of another embodiment of a text line region locating method of the present invention.
FIG. 4 illustrates an exemplary flow chart of the layout analysis based character recognition method of the present invention.
FIG. 5 illustrates an exemplary flow chart of a preferred embodiment of the layout analysis based character recognition method of the present invention.
Fig. 6 is a diagram showing an example of the result image in one embodiment of the character recognition method based on layout analysis of the present invention.
Fig. 7(a) shows an exemplary diagram of a grayscale image.
Fig. 7(b) shows an exemplary diagram of a reverse gray image.
Fig. 7(c) shows an exemplary diagram of a binary image.
Fig. 7(d) shows an exemplary diagram of a reverse binary image.
Fig. 7(e) shows an exemplary diagram of an outline binary image.
Fig. 7(f) shows an exemplary diagram of an inverse outline binary image.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted. It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of at least one other feature, element, step or component. It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 1 is an exemplary flowchart of the text line region positioning method of the present application. The basic idea of the method is to introduce the concepts of a "positive-color text line region" and a "reverse-color text line region", and to determine them from the text line regions of the positioning image according to their respective characteristics. A "positive-color text line region" is a text line region within a positive-color region, and a "reverse-color text line region" is a text line region within a reverse-color region. For an image to be recognized whose complex layout contains both positive-color and reverse-color regions, the method can obtain the positive-color and reverse-color text line regions simultaneously in a single pass. Compared with the prior art, which obtains only the coordinate position of a text line region within the whole image, the method adds a judgment of whether each text line region is positive or reverse color, enriching the positioning information with this color information so that subsequent processing of the text line region can be more accurate. For example, in OCR recognition of a text line region, an appropriate recognition strategy can be chosen depending on whether the region is positive-color or reverse-color.
Those skilled in the art will understand that the text line region positioning method of the present application also applies to an image containing only positive-color regions or only reverse-color regions; the result is then that all located regions are positive-color text line regions or all are reverse-color text line regions, respectively.
The text line region positioning method provided by the application comprises the following steps:
step S101: acquiring a grayscale image of an image to be recognized;
step S102: obtaining a positioning image from the grayscale image;
step S103: identifying the positive-color text line region and/or the reverse-color text line region in the positioning image.
The image to be recognized in step S101 may be an electronic image of a planar printed matter such as a paper photograph, business card, bank card, identification card, book, leaflet, or passport, or an electronic image obtained by photographing a three-dimensional object. The grayscale image in step S101 may be obtained by graying a color image to be recognized, or may be a directly input grayscale image. The ways of obtaining the grayscale image described here are only examples; those skilled in the art may choose other approaches according to actual needs, and redundant description is omitted.
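The graying step of S101 is deliberately left open by the patent; a minimal sketch using the common ITU-R BT.601 luminance weights (an assumption on our part, since the patent does not name a graying method) looks like this:

```python
import numpy as np

def to_grayscale(rgb):
    """Gray an H x W x 3 RGB image into an H x W grayscale image.

    The 0.299/0.587/0.114 weights are the common BT.601 luminance
    coefficients; they are an illustrative choice, not the patent's.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(np.float64) @ weights).astype(np.uint8)

# A 1x2 image: one white pixel, one black pixel.
img = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(img)
```

A directly input grayscale image would skip this step entirely, as the paragraph above notes.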
In step S103, for an image to be recognized containing only positive-color regions, only the positive-color text line regions of the positioning image need be identified; for an image containing only reverse-color regions, only the reverse-color text line regions need be identified; and for a complex-layout image containing both positive-color and reverse-color regions, both the positive-color and the reverse-color text line regions of the positioning image must be identified.
In the field of digital image recognition, a region with dark characters on a light background is generally defined as a positive-color region; conversely, a region with light characters on a dark background is defined as a reverse-color region. In a positive-color region, the characters of a text line are darker than the background: the in-line region appears dark and the out-of-line region appears light, so the gray value of the in-line region is smaller than that of the out-of-line region. In a reverse-color region, the characters are lighter than the background: the in-line region appears light and the out-of-line region dark, so the gray value of the in-line region is greater than that of the out-of-line region. It is therefore possible to determine whether a text line region is a positive-color or reverse-color text line region simply by comparing the gray values of its in-line and out-of-line regions. Although this gray-value comparison is simple, it can quickly and accurately establish whether a text line region is "positive color" or "reverse color".
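The comparison just described can be sketched as follows. This is an illustrative reading of the idea, not the patent's implementation: the in-line region is taken as the line's bounding box, the out-of-line region as a thin surrounding ring, and the mean gray values of the two are compared (the function name, box format, and ring width are our own choices).

```python
import numpy as np

def classify_text_line(gray, box, margin=2):
    """Classify a text line region as positive or reverse color.

    `gray` is the positioning (grayscale) image; `box` = (x0, y0, x1, y1)
    is the line's bounding rectangle. The out-of-line region is a ring of
    `margin` pixels around the box, clipped to the image.
    """
    x0, y0, x1, y1 = box
    inner = gray[y0:y1, x0:x1].astype(np.int64)
    outer = gray[max(y0 - margin, 0):y1 + margin,
                 max(x0 - margin, 0):x1 + margin].astype(np.int64)
    # Mean gray of the ring = (sum over box+ring - sum over box) / ring size.
    outer_mean = (outer.sum() - inner.sum()) / (outer.size - inner.size)
    return "positive" if inner.mean() < outer_mean else "reverse"

gray = np.full((10, 20), 230, dtype=np.uint8)  # light background
gray[3:7, 5:15] = 40                           # dark in-line region
```

On this synthetic image, `classify_text_line(gray, (5, 3, 15, 7))` reports a positive-color line, and the same call on the inverted image `255 - gray` reports a reverse-color line.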
Based on this, in a specific embodiment of the text line region positioning method provided in the present application, as shown in fig. 2, the text line region in the positioning image is a first text line region, the in-line region of the first text line region is a first in-line region, and the out-of-line region of the first text line region is a first out-of-line region. The method specifically comprises the following steps:
step S111: acquiring a gray image of an image to be identified;
step S112: obtaining a positioning image according to the gray level image;
step S1131: identifying a first text line region of the positioning image;
step S1132: selecting a first in-line area and a first out-of-line area in the positioning image;
step S1133: determining that the first text line region is a positive-color text line region when the gray value of the first in-line region is smaller than the gray value of the first out-of-line region; and/or,
determining that the first text line region is a reverse-color text line region when the gray value of the first in-line region is greater than the gray value of the first out-of-line region.
The first text line region in step S1131 may be obtained by existing connected-component extraction methods. In the above embodiment, step S1131 obtains the coordinate position of the first text line region within the whole image, and steps S1132 and S1133 determine whether it is a positive-color or reverse-color text line region simply by comparing the gray value of the first in-line region with that of the first out-of-line region. The labels "positive color" and "reverse color" are thereby added to the positioning information of the first text line region, so that later processing of the text line region, particularly OCR recognition, can be performed more accurately on the basis of this information.
In particular, when both the positive-color and the reverse-color text line regions of the positioning image must be identified, step S1133 determines that the first text line region is a positive-color text line region of the positioning image when the gray value of the first in-line region is smaller than that of the first out-of-line region, and a reverse-color text line region of the positioning image when it is greater.
In practice, the positioning image may be one of a positioning grayscale image (see fig. 7(a)), a positioning inverse grayscale image (see fig. 7(b)), a positioning binary image (see fig. 7(c)), and a positioning inverse binary image (see fig. 7(d)). In any of these images, the gray-value comparison above reveals whether the first text line region is a positive-color or a reverse-color text line region.
Because the grayscale image and the inverse grayscale image largely preserve the outline, color depth, and other details of each character in the image to be recognized, when the positioning image is a positioning grayscale image or a positioning inverse grayscale image, the positive-color and/or reverse-color text line regions can be identified using the richer image detail and character information these images contain. The positioning grayscale image may be the grayscale image acquired in step S111; the positioning inverse grayscale image is obtained by at least applying reverse-color processing to the grayscale image.
Because the pixels of a binary image take only the gray values 0 and 255, giving a large gray-value difference between characters and their background, the positive-color and reverse-color text line regions can be located more accurately when the positioning image is a positioning binary image or a positioning inverse binary image. The positioning binary image is obtained by at least binarizing the grayscale image, and the positioning inverse binary image by at least applying reverse-color processing and binarization to the grayscale image. Both reverse-color processing and binarization are well-established techniques and are not described in detail here.
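The two elementary operations named here, reverse-color (inversion) processing and binarization, can be sketched as below. The fixed threshold of 128 is purely illustrative; the patent leaves the binarization method open (Otsu or adaptive thresholds are common alternatives):

```python
import numpy as np

def invert(gray):
    """Reverse-color processing: flip every gray level."""
    return 255 - gray

def binarize(gray, threshold=128):
    """Global-threshold binarization: dark pixels -> 0, light pixels -> 255.

    The fixed threshold is an illustrative choice, not the patent's method.
    """
    return np.where(gray < threshold, 0, 255).astype(np.uint8)

gray = np.array([[10, 200], [130, 90]], dtype=np.uint8)
binary = binarize(gray)
inverse_binary = binarize(invert(gray))
```

On this tiny example the inverse binary image is exactly the pixel-wise complement of the binary image, which is why positive-color and reverse-color lines trade appearances between the two.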
To position text line regions even more precisely, the inventors propose an outline binary image (see fig. 7(e)) and an inverse outline binary image (see fig. 7(f)). In both images the background is white; in positive-color regions the characters appear as solid black characters, while in reverse-color regions they appear as black-edged hollow (outline) characters (see the sample characters in fig. 7(e) and the first six lines in fig. 7(f)). Accordingly, a positive-color text line region shows solid black characters on a white background, and a reverse-color text line region shows black-edged hollow characters on a white background.
Accordingly, as shown in fig. 3, in another embodiment of the text line region positioning method provided in the present application, the positioning image includes a positioning outline binary image and a positioning inverse outline binary image. The method specifically comprises the following steps:
step S121: acquiring a grayscale image of an image to be recognized;
step S122: obtaining a positioning outline binary image and a positioning inverse outline binary image from the grayscale image;
step S123: identifying the positive-color text line region and/or the reverse-color text line region in the positioning outline binary image and the positioning inverse outline binary image.
Because the outline binary image and the inverse outline binary image preserve the outline information of each character to the greatest extent, text line regions can be identified more accurately; moreover, since every character outline has the same color, all text line regions in the image can be found by screening connected components with a threshold. The positioning outline binary image and positioning inverse outline binary image therefore allow text line regions to be located accurately, simply, and quickly, greatly improving both the accuracy and the efficiency of text line region positioning. Combined with subsequent processing of the text line regions, particularly OCR recognition of the characters, this also improves the accuracy and efficiency of character recognition.
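This excerpt does not disclose how the outline (hollow) effect is produced. One common way to turn solid shapes into outlines, sketched here purely as an assumption and not as the patent's construction, is a morphological-gradient-style operation: subtract an eroded copy of the binary character mask from the mask itself, leaving only each stroke's one-pixel boundary.

```python
import numpy as np

def erode(mask):
    """3x3 binary erosion: a pixel survives only if all 8 neighbours are set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def outline(mask):
    """Keep only the 1-pixel boundary of each solid shape (hollowing)."""
    return mask & ~erode(mask)

mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True          # a solid 5x5 "character stroke"
hollow = outline(mask)
```

Here the 5x5 solid block is reduced to its 16-pixel perimeter; interior pixels such as the center are cleared, matching the black-edged hollow appearance described for reverse-color characters.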
In a possible embodiment, the text line region in the positioning outline binary image is a second text line region, its in-line region is a second in-line region, its out-of-line region is a second out-of-line region, and the absolute value of the difference between the gray values of the second in-line and second out-of-line regions is a second gray difference value;
the text line region in the positioning inverse outline binary image is a third text line region, its in-line region is a third in-line region, its out-of-line region is a third out-of-line region, and the absolute value of the difference between the gray values of the third in-line and third out-of-line regions is a third gray difference value;
in step S123, identifying the positive-color text line region and/or the reverse-color text line region in the positioning outline binary image and the positioning inverse outline binary image specifically includes:
step S1231: identifying a second text line region in the positioning outline binary image and a third text line region in the positioning inverse outline binary image;
step S12321: when the second text line region overlaps at least one third text line region, acquiring the second in-line and second out-of-line regions of the second text line region, and the third in-line and third out-of-line regions of the corresponding third text line region; then computing the second gray difference value of the second text line region and the third gray difference value of the corresponding third text line region;
alternatively,
when the third text line region overlaps at least one second text line region, acquiring the third in-line and third out-of-line regions of the third text line region, and the second in-line and second out-of-line regions of the corresponding second text line region; then computing the third gray difference value of the third text line region and the second gray difference value of the corresponding second text line region;
step S12322: when the second gray difference value is greater than the third gray difference value, determining that the second text line region is a positive-color text line region, and/or that the third text line region is a reverse-color text line region;
and/or,
when the second gray difference value is smaller than the third gray difference value, determining that the second text line region is a reverse-color text line region, and/or that the third text line region is a positive-color text line region.
In the art, a text line region refers to a rectangular block in the image, usually stored as the lower-left and upper-right corner coordinates of the block. In this embodiment, an overlapping area between two text line regions means that their position coordinates intersect. For example, "the second text line region and the third text line region have an overlapping region" means that the coordinates of the second text line region intersect those of the third text line region. The areas of the two regions may or may not be equal, but both overlapping regions cover the corresponding text line region in the grayscale image. The second text line region in the positioning outline binary image and the third text line region in the positioning inverse outline binary image can be obtained by existing connected-component extraction methods.
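With regions stored as corner coordinates, the overlap test above reduces to a standard rectangle-intersection check. The function name and tuple layout are illustrative choices:

```python
def boxes_overlap(a, b):
    """True when two text line regions share any area.

    Each region is stored, as described above, by its lower-left corner
    (x0, y0) and upper-right corner (x1, y1): a = (x0, y0, x1, y1).
    Boxes that merely touch along an edge do not count as overlapping.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

assert boxes_overlap((0, 0, 10, 5), (8, 2, 20, 8))       # partial overlap
assert not boxes_overlap((0, 0, 10, 5), (10, 0, 20, 5))  # touching edges only
```

Whether edge-touching boxes count as overlapping is a design choice; strict inequalities (as here) treat them as disjoint.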
As described above, in both the positioning outline binary image and the positioning inverse outline binary image, a positive-color text line region shows solid black characters on a white background, and a reverse-color text line region shows black-edged hollow characters on a white background. That is, the in-line region of a positive-color text line region contains solid black characters on white while its out-of-line region is white, so its gray difference value (the absolute value of the difference between the gray values of the in-line and out-of-line regions) equals the difference between a solid-black-character-on-white area and a pure-white background area of the same size. Similarly, the in-line region of a reverse-color text line region contains black-edged hollow characters on white while its out-of-line region is white, so its gray difference value equals the difference between a hollow-character-on-white area and a pure-white background area of the same size.
Therefore, when there is an overlapping region between a text line region of the positioning hollow binary image and the text line region at the corresponding position of the positioning inverse hollow binary image, the gray difference of the positive color text line region is greater than the gray difference of the reverse color text line region; that is, the text line region corresponding to the greater of the second gray difference and the third gray difference is the positive color text line region, and the text line region corresponding to the smaller is the reverse color text line region. Thus, in this embodiment, whether the second text line region and the third text line region are the positive color text line region or the reverse color text line region of the positioning image is determined by comparing the magnitudes of the second gray difference of the second text line region and the third gray difference of the third text line region. This realizes the positioning of the positive color text line region and/or the reverse color text line region of the positioning image accurately and quickly by a simple and reliable method.
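The two checks described above, box intersection and comparison of gray differences, can be sketched as follows. This is an illustrative sketch, not code from the patent: the bounding-box tuple format and the names `boxes_overlap` and `classify_by_gray_diff` are assumptions made for the example.

```python
# Hedged sketch: text line regions stored as bounding boxes; two regions
# "overlap" when their boxes intersect, and of an overlapping pair the region
# with the larger gray difference is the positive color one.

def boxes_overlap(a, b):
    """a, b: (x_min, y_min, x_max, y_max) bounding boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def classify_by_gray_diff(second_diff, third_diff):
    """Solid characters give a larger in-line/out-of-line gray difference
    than hollow outline characters, so the larger difference marks the
    positive color text line region."""
    if second_diff > third_diff:
        return "second_is_positive"
    return "third_is_positive"
```

For example, a solid-character region against white typically yields a gray difference near 255, while an outline-character region (mostly white inside the strokes) yields a much smaller one, so the comparison is robust in practice.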
In this embodiment, there may also be a second text line region for which no matching third text line region can be found, that is, the second text line region has no overlapping region with any of the third text line regions, which means that the corresponding content in the grayscale image can only be located in the positioning hollow binary image. Considering that, in this case, the positioning of "positive color" and "reverse color" cannot be realized simply by comparing gray differences between the two images, step S123 in this embodiment, identifying the positive color text line region and/or the reverse color text line region in the positioning hollow binary image and the positioning inverse hollow binary image, further includes the following steps:
step S12331: when the second text line region and each third text line region do not have an overlapping region, acquiring a fourth text line region corresponding to the second text line region in the gray-scale image;
step S12332: acquiring a fourth in-line region and a fourth out-of-line region; the in-line region of the fourth text line region is the fourth in-line region, and the out-of-line region of the fourth in-line region is the fourth out-of-line region;
step S12333: determining that the second text line region is a positive color text line region when the gray value of the fourth in-line region is smaller than the gray value of the fourth out-of-line region; and/or,
determining that the second text line region is a reverse color text line region when the gray value of the fourth in-line region is greater than the gray value of the fourth out-of-line region.
In this embodiment, a gray value comparison method with simple calculation and high accuracy is still adopted. In view of the fact that the "positive color" or "reverse color" of the second text line region in the positioning hollow binary image is consistent with that of the fourth text line region in the grayscale image, this embodiment determines whether the second text line region is "positive color" or "reverse color" by determining whether the fourth text line region is "positive color" or "reverse color". When the gray value of the fourth in-line region is smaller than the gray value of the fourth out-of-line region, the fourth text line region is positive color, and the corresponding second text line region is a positive color text line region of the positioning hollow binary image; when the gray value of the fourth in-line region is greater than the gray value of the fourth out-of-line region, the fourth text line region is reverse color, and the corresponding second text line region is a reverse color text line region of the positioning hollow binary image. This determines whether the second text line region is a positive color or reverse color text line region in a simple judging manner, and improves the accuracy of the determination result.
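The fallback comparison of steps S12331 to S12333 can be sketched as below. This is an illustrative sketch under the assumption that the in-line and out-of-line regions are supplied as flat lists of gray values; the function names are hypothetical, not from the patent.

```python
# Hedged sketch: when a second text line region has no overlapping third
# region, classify it via the corresponding (fourth) region of the grayscale
# image by comparing mean in-line and out-of-line gray values.

def mean_gray(pixels):
    return sum(pixels) / len(pixels)

def classify_unmatched_region(in_line_pixels, out_line_pixels):
    """Positive color means dark characters on a light background, so the
    in-line gray value is smaller than the out-of-line gray value."""
    if mean_gray(in_line_pixels) < mean_gray(out_line_pixels):
        return "positive"
    return "reverse"
```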
Similarly, there may be a third text line region that has no overlapping region with any second text line region, so step S123 may further include the following steps:
step S12341: when the third text line region does not have an overlapping region with each second text line region, acquiring a fifth text line region corresponding to the third text line region from the gray-scale image;
step S12342: acquiring a fifth in-line region and a fifth out-of-line region; an in-line region of the fifth text line region is a fifth in-line region, and an out-of-line region of the fifth in-line region is the fifth out-of-line region;
step S12343: determining that the third text line region is a positive color text line region when the gray value of the fifth in-line region is smaller than the gray value of the fifth out-of-line region; and/or determining that the third text line region is a reverse color text line region when the gray value of the fifth in-line region is greater than the gray value of the fifth out-of-line region.
In this embodiment, the principle of determining the positive color and the negative color of the third text line region by using the positive color and the negative color of the fifth text line region is the same as the principle of determining the positive color and the negative color of the second text line region by using the positive color and the negative color of the fourth text line region, and the description thereof is omitted here.
The following will briefly describe the positioning outline binary image and the positioning inverse outline binary image referred to in the present application, and the in-line region and the out-of-line region of each text line region.
In another embodiment of the above text line region positioning method provided in this application, in step S122, a positioning hollow binary image and a positioning anti-hollow binary image are obtained according to the grayscale image, and the positioning hollow binary image and the positioning anti-hollow binary image may be obtained by performing at least background-removing binarization processing on the grayscale image. Specifically, at least binarization processing and edge detection processing are carried out on the gray level image to obtain the positioning hollow binary image; and at least performing reverse color processing, binarization processing and edge detection processing on the gray level image to obtain the positioning reverse hollow binary image.
In a specific embodiment, step S122, at least performing background-removing binarization processing on the grayscale image to obtain the positioning hollow binary image and the positioning inverse hollow binary image, specifically includes the following steps:
step S1221: performing at least binarization processing on the gray image, obtaining a connected domain in the processed image, setting to 0 the gray value of each pixel point at the edge of the connected domain (pixel points whose gray value is 255), and setting to 255 the gray value of each pixel point of the background corresponding to the connected domain, so as to obtain the positioning hollow binary image;
and,
step S1222: performing at least reverse color processing and binarization processing on the gray image, obtaining a connected domain in the processed image, setting to 0 the gray value of each pixel point at the edge of the connected domain (pixel points whose gray value is 255), and setting to 255 the gray value of each pixel point of the background corresponding to the connected domain, so as to obtain the positioning inverse hollow binary image.
In this embodiment, the binarization processing, the edge detection processing, and the reverse color processing are all prior art. The positioning hollow binary image can be obtained by performing binarization processing with a global gray threshold and Canny edge detection processing on the gray image, and the positioning inverse hollow binary image can be obtained by performing reverse color processing, binarization processing with a global gray threshold and Canny edge detection processing on the gray image. Alternatively, only local threshold binarization processing is performed on the gray image, and the positioning hollow binary image and the positioning inverse hollow binary image are obtained at the same time.
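The hollow-image idea above can be sketched without any image library. This is a minimal, self-contained approximation: where the embodiment suggests Canny edge detection, the sketch marks a character pixel as an edge if any 4-neighbour is background, which produces the same black-edged hollow characters on white. The function name and threshold value are assumptions for illustration.

```python
# Hedged sketch: global-threshold binarization, then keep only the edge
# pixels of the character (connected-domain) regions black, everything
# else white, yielding a "hollow" (outline) binary image.

def hollow_binary(gray, threshold=128):
    """gray: 2-D list of gray values (0..255); returns the outline image."""
    h, w = len(gray), len(gray[0])
    # binarize: character pixels -> 0, background -> 255
    binary = [[0 if gray[y][x] < threshold else 255 for x in range(w)]
              for y in range(h)]
    out = [[255] * w for _ in range(h)]      # start all white
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0:            # character pixel
                # edge if any 4-neighbour is background (or image border)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or binary[ny][nx] == 255:
                        out[y][x] = 0
                        break
    return out
```

Running the inverse color processing first (replacing each gray value g with 255 - g) and then the same routine would give the positioning inverse hollow binary image.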
In addition, in the above embodiments of the present application, each text line region includes a first text line region, a second text line region, a third text line region, a fourth text line region, and a fifth text line region. The in-line region of each text line region may be at least one unit character region in the text line region. The unit character area is an area corresponding to any character in the text line area. In order to obtain the out-of-line area of each text line area by using the following method, the in-line area is preferably the first unit character area or the last unit character area of the text line area.
Meanwhile, the out-of-line area of each in-line area is obtained, and one of a left area, a right area, an upper area and a lower area adjacent to the in-line area in the positioning image or the gray image may be selected as the out-of-line area. Of course, the selection of the out-of-line area of the in-line area of the first, second, and third text line areas is performed in the positioning image corresponding to each text line area, and the selection of the out-of-line area of the in-line area of the fourth and fifth text line areas is performed in the grayscale image.
In order to make the number of pixels in the out-of-line region the same as the number of pixels in the in-line region, the area of the out-of-line region is selected to be the same as the area of the in-line region. Acquiring the out-of-line region then specifically includes:
when the areas of the left side area and the in-line area are the same and the areas of the right side area and the in-line area are different, selecting the left side area as the out-of-line area;
when the areas of the right side area and the in-line area are the same and the areas of the left side area and the in-line area are different, selecting the right side area as the out-of-line area;
when the areas of the left side area and the right side area are the same as the areas of the in-line areas, selecting the left side area or the right side area as the out-of-line area;
and when the areas of the left side area and the right side area are different from the area of the in-line area, selecting the upper area or the lower area as the out-of-line area.
Wherein, selecting the upper area or the lower area as the out-of-line area specifically comprises:
when the areas of the upper area and the in-line area are the same and the areas of the lower area and the in-line area are different, selecting the upper area as the out-of-line area;
when the areas of the lower area and the in-line area are the same and the areas of the upper area and the in-line area are different, selecting the lower area as the out-of-line area;
and when the areas of the upper area and the lower area are the same as the areas of the in-line areas, selecting the upper area or the lower area as the out-of-line area.
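The selection rules above form a simple cascade, sketched below. This is illustrative only: each candidate is represented by its area in pixels, the function name is hypothetical, and the final fallback (when neither the upper nor the lower area matches) is not specified by the text, so the sketch simply returns the upper area there.

```python
# Hedged sketch of the out-of-line area selection cascade: prefer a
# horizontally adjacent area whose area equals the in-line area, then
# fall back to the vertically adjacent areas.

def select_out_of_line(in_area, left, right, upper, lower):
    """Arguments are areas in pixels; returns which adjacent region to use."""
    if left == in_area and right != in_area:
        return "left"
    if right == in_area and left != in_area:
        return "right"
    if left == in_area and right == in_area:
        return "left"      # either side is acceptable; pick the left one
    # neither side matches: apply the same rules to the upper/lower areas
    if upper == in_area and lower != in_area:
        return "upper"
    if lower == in_area and upper != in_area:
        return "lower"
    # both match (either is acceptable) or neither matches (unspecified
    # in the text); pick the upper area
    return "upper"
```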
Preferably, the out-of-line area is selected to have the same shape as the in-line area, i.e., the height and width of the out-of-line area are the same as those of the in-line area.
The text line region positioning method provided by the application can accurately determine the coordinate information and the image information of the positive color text line region and/or the reverse color text line region, and processing the image to be recognized using the determined coordinate information and image information makes the processing result more accurate. Even when an image to be recognized with a complex layout is processed, the positive color text line region and the reverse color text line region can be recognized quickly and accurately.
Based on the text line region positioning method provided by the application, the application also provides a layout analysis method based on text line region positioning. The method applies the text line region positioning method to obtain the positive color text line region and the reverse color text line region in the positioning image, and then determines the positive color area and the reverse color area in the grayscale image of the image to be recognized. Because the difference between the background gray values of the positive color area and the reverse color area is large, after the positive color text line region or the reverse color text line region is located, whether the image to be recognized contains only a positive color area, contains only a reverse color area, or is a complex layout including both can be determined by judging whether the background gray value of the grayscale image near the positive color or reverse color text line regions has an abrupt change. In this process, a threshold can be set for the abrupt change of the gray value, and whether the gray value of the grayscale image has an abrupt change is determined by whether the gray difference of two adjacent pixel points is greater than the threshold. When a large number of images to be recognized need layout analysis, this layout analysis method based on text line region positioning can complete the layout analysis quickly and accurately, effectively improving working efficiency and reducing the corresponding labor cost and the error rate of manual participation.
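The abrupt-change test described in this paragraph reduces to a one-line scan, sketched here. The function name and the default threshold value of 100 are assumptions for illustration; the patent only says a threshold is set.

```python
# Hedged sketch: a complex layout mixes light and dark background areas, so
# a run of background gray values contains an adjacent-pixel jump larger
# than the threshold exactly when the page mixes positive and reverse areas.

def has_abrupt_change(background_grays, threshold=100):
    return any(abs(a - b) > threshold
               for a, b in zip(background_grays, background_grays[1:]))
```

For example, a background row crossing from a white region into a black reverse-color block, such as `[250, 248, 30, 28]`, triggers the check, while a uniformly light background does not.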
Fig. 4 is an exemplary flowchart of the layout analysis-based character recognition method of the present application. The basic idea of the method is as follows: by applying the above text line region positioning method, the positioning image and the positive color text line region and/or reverse color text line region in the positioning image are obtained according to the grayscale image of the image to be recognized, so that the positive color text line region and/or reverse color text line region in the positive color image and/or reverse color image obtained according to the grayscale image are found; accordingly, a result image in which all text line regions are positive color text line regions, or all are reverse color text line regions, is obtained by stitching, so that character recognition is performed on the result image only once and all character information in the image to be recognized is obtained. When character recognition is carried out on an image to be recognized with a complex layout, the prior OCR technology needs to carry out recognition twice, whereas this method obtains all character information of the image in a single pass.
The character recognition method based on layout analysis comprises the following steps:
step S201: the method for positioning the text line region is applied to obtain the orthochromatic text line region and/or the inverse-chromatic text line region in the positioning image; obtaining a positive color image and a negative color image according to the gray level image in the text line region positioning method;
step S202: acquiring at least one of an orthochromatic text line region of the orthochromatic image, a reverse-chromatic text line region of the orthochromatic image, an orthochromatic text line region of the reverse-chromatic image and a reverse-chromatic text line region of the reverse-chromatic image according to the orthochromatic text line region and/or the reverse-chromatic text line region in the positioning image;
step S203: splicing the positive color image and the reverse color image according to at least one of the positive color text line region of the positive color image, the reverse color text line region of the positive color image, the positive color text line region of the reverse color image and the reverse color text line region of the reverse color image, to obtain a result image; the text line regions in the result image are all positive color text line regions or all reverse color text line regions;
step S204: and performing character recognition on the result image to obtain a recognition result.
In step S201, the order of "applying the text line region positioning method to obtain the positive color text line region and/or the reverse color text line region in the positioning image" and "obtaining the positive color image and the reverse color image according to the grayscale image" is not limited; the two processes may be performed in parallel or sequentially in any order, and a person skilled in the art may select the order as needed.
In this application, the text line region positioning method described in any of the above embodiments may be used to obtain the positive color text line region and the reverse color text line region in the positioning image. Therefore, according to the various embodiments of the above text line region positioning method, the positive color text line region of the positioning image may be the positive color text line region of one of the positioning grayscale image, the positioning inverse grayscale image, the positioning binary image, the positioning inverse binary image, the positioning hollow binary image, and the positioning inverse hollow binary image; likewise, the reverse color text line region of the positioning image may be the reverse color text line region of one of the positioning grayscale image, the positioning inverse grayscale image, the positioning binary image, the positioning inverse binary image, the positioning hollow binary image, and the positioning inverse hollow binary image.
For step S202, when the positioning image is one of the positioning grayscale image, the positioning binary image, and the positioning outline binary image, the orthochromatic text line region of the orthochromatic image and the inverse-chromatic text line region of the inverse-chromatic image are regions corresponding to the orthochromatic text line region of the positioning image, and likewise, the inverse-chromatic text line region of the orthochromatic image and the orthochromatic text line region of the inverse-chromatic image are regions corresponding to the inverse-chromatic text line region of the positioning image; when the positioning image is one of the positioning inverse gray image, the positioning inverse binary image and the positioning inverse hollow binary image, the orthochromatic text line region of the orthochromatic image and the inverse chromatic text line region of the inverse chromatic image are regions corresponding to the inverse chromatic text line region of the positioning image, and similarly, the inverse chromatic text line region of the orthochromatic image and the orthochromatic text line region of the inverse chromatic image are regions corresponding to the orthochromatic text line region of the positioning image.
According to the result obtained in step S202, in a specific embodiment, the step S203 for obtaining the result image by stitching specifically includes:
according to the orthochromatic text line region of the orthochromatic image and/or the orthochromatic text line region of the inverse-chromatic image, the orthochromatic text line region of the orthochromatic image and the orthochromatic text line region of the inverse-chromatic image are intercepted and spliced, or the orthochromatic text line region of the orthochromatic image is intercepted and correspondingly spliced to the inverse-chromatic image, or the orthochromatic text line region of the inverse-chromatic image is intercepted and correspondingly spliced to the orthochromatic image, so that a first result image is obtained; the text line areas in the first result image are all orthochromatic text line areas;
alternatively,
according to the reverse color text line region of the positive color image and/or the reverse color text line region of the reverse color image, intercepting the reverse color text line region of the positive color image and the reverse color text line region of the reverse color image for splicing, or intercepting the reverse color text line region of the positive color image to be correspondingly spliced to the reverse color image, or intercepting the reverse color text line region of the reverse color image to be correspondingly spliced to the positive color image, so as to obtain a second result image; the text line regions in the second result image are all reverse color text line regions.
As described above, in order to obtain the first result image in which all text line regions are positive color text line regions, the positive color text line regions of the positive color image and of the reverse color image may be cut out and stitched together onto the corresponding positions of a first blank image with a white background; alternatively, the cut-out positive color text line regions of the positive color image may be stitched to the corresponding positions of the reverse color image, or the cut-out positive color text line regions of the reverse color image may be stitched to the corresponding positions of the positive color image. Similarly, in order to obtain the second result image in which all text line regions are reverse color text line regions, the reverse color text line regions of the positive color image and of the reverse color image may be cut out and stitched together onto the corresponding positions of a second blank image with a black background; alternatively, the cut-out reverse color text line regions of the positive color image may be stitched to the corresponding positions of the reverse color image, or the cut-out reverse color text line regions of the reverse color image may be stitched to the corresponding positions of the positive color image. It should be understood by those skilled in the art that the first blank image and the second blank image are both the same size as the image to be recognized.
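The first of the stitching variants above (pasting cut-out positive color regions onto a white first blank image) can be sketched as follows. Images are represented as 2-D lists of gray values and the function names and box format are assumptions made for the example, not names from the patent.

```python
# Hedged sketch: build a first result image whose text line regions are all
# positive color by pasting the positive color regions of the positive image
# and of the reverse image onto a white blank image of the same size.

def paste(dst, src, box):
    """Copy the pixels of src inside box = (x0, y0, x1, y1) into dst."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            dst[y][x] = src[y][x]

def stitch_positive(pos_img, rev_img, pos_boxes, rev_boxes):
    h, w = len(pos_img), len(pos_img[0])
    result = [[255] * w for _ in range(h)]   # first blank image, all white
    for box in pos_boxes:                    # positive regions of positive image
        paste(result, pos_img, box)
    for box in rev_boxes:                    # positive regions of reverse image
        paste(result, rev_img, box)
    return result
```

The second result image variant would be symmetric: start from a black blank image (`0` instead of `255`) and paste the reverse color regions of both images.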
In the layout analysis-based character recognition method proposed in the present application, the positive color image is at least one of a grayscale image (see fig. 7 (a)), a binary image (see fig. 7 (c)), and an outline binary image (see fig. 7 (e)), and the reverse color image is at least one of an inverse grayscale image (see fig. 7 (b)), an inverse binary image (see fig. 7 (d)), and an inverse outline binary image (see fig. 7 (f)). The binary image, the outline binary image, the inverse grayscale image, the inverse binary image and the inverse outline binary image are all obtained from the grayscale image. In the stitching process, the positive color image and the reverse color image can be selected according to the actual use scene. For example, two or more positive color images may be combined with one reverse color image, or two or more reverse color images may be combined with one positive color image. When one positive color image and one reverse color image are selected for cutting and stitching: if more detail information of the image to be recognized is desired, the positive color image may be the grayscale image and the reverse color image the inverse grayscale image; if a larger difference between the characters in the text line regions and the background is desired in the result image, the positive color image may be the binary image and the reverse color image the inverse binary image, or the positive color image may be the outline binary image and the reverse color image the inverse outline binary image.
The character recognition method based on layout analysis converts the image to be recognized into a result image with only positive color text line regions or only reverse color text line regions and then recognizes the result image, thereby simultaneously supporting all types of images to be recognized, including those with a dark background, those with a light background, and those combining both. Meanwhile, all character information of the image to be recognized can be obtained with a single recognition pass over the result image, so the method has the characteristics of an accurate recognition result and a high recognition rate. The character recognition method based on layout analysis achieves the purpose of quickly and accurately recognizing all characters in a complex layout, and effectively solves the problems of low accuracy and low efficiency of the conventional character recognition method when recognizing a complex layout.
Fig. 5 shows a preferred embodiment of the layout analysis-based character recognition method. Its basic idea is to stitch the grayscale image and the inverse grayscale image according to each text line region in the image to be recognized, obtain a third result image (shown in fig. 6) in which the text line regions are all positive color text line regions, and then perform character recognition on the third result image. Compared with the foregoing character recognition method based on layout analysis, this method does not need to mark each text line region of the image to be recognized as "positive color" or "reverse color", but directly finds the text line regions satisfying the "positive color" condition, thereby saving the time for marking and searching again. For an image to be recognized with a complex layout containing both positive color and reverse color areas, or with irregular reverse color and/or positive color areas, the method gets rid of the limitation of the positive color and reverse color areas and of their shapes, obtains all character information of the image to be recognized through one character recognition pass, ensures the accuracy of the character recognition result, greatly shortens the image processing time before character recognition, and improves the efficiency of character recognition.
The character recognition method based on layout analysis comprises the following steps:
step S301: acquiring a grayscale image of an image to be recognized;
step S302: obtaining an empty binary image, an inverse empty binary image and an inverse gray image according to the gray image;
step S303: identifying a sixth text line region in the outline binary image and a seventh text line region in the inverse outline binary image;
step S304: splicing the gray level image and the inverse gray level image according to the sixth text line region and the seventh text line region to obtain a third result image; the text line areas in the third result image are all orthochromatic text line areas;
step S305, performing character recognition on the third result image to obtain a recognition result.
In this embodiment, the method for obtaining the outline binary image, the inverse outline binary image and the inverse gray level image according to the gray level image may refer to the method for obtaining the positioning outline binary image, the positioning inverse outline binary image and the positioning inverse gray level image through the gray level image in the above text line region positioning method.
In this embodiment, the in-line area of the sixth text line area is a sixth in-line area, the out-of-line area of the sixth in-line area is a sixth out-of-line area, and an absolute value of a difference between a grayscale value of the sixth in-line area and a grayscale value of the sixth out-of-line area is a sixth grayscale difference;
an in-line region of the seventh text line region is a seventh in-line region, an out-of-line region of the seventh in-line region is a seventh out-of-line region, and an absolute value of a difference between a gradation value of the seventh in-line region and a gradation value of the seventh out-of-line region is a seventh gradation difference value;
in the step S304, the grayscale image and the inverse grayscale image are spliced according to the sixth text line region and the seventh text line region to obtain a third result image, which specifically includes:
when there is an overlapping area of the sixth text line region with at least one of the seventh text line regions,
step S30411: acquiring a sixth in-line area and a sixth out-of-line area of the sixth text line area, and a seventh in-line area and a seventh out-of-line area of a seventh text line area corresponding to the sixth text line area; acquiring a sixth gray difference value of the sixth text line region and a seventh gray difference value of a seventh text line region corresponding to the sixth text line region through calculation;
step S30412: when the sixth gray scale difference value is smaller than the seventh gray scale difference value, intercepting a region corresponding to the seventh text line region in the inverse gray scale image and correspondingly splicing the region to the gray scale image to obtain a third result image;
when there is no overlapping area between the sixth text line region and each of the seventh text line regions or between the seventh text line region and each of the sixth text line regions,
step S30421, obtaining an eighth text line region corresponding to the sixth text line region or the seventh text line region in the grayscale image;
step S30422: acquiring an eighth in-line region and an eighth out-of-line region; the in-line region of the eighth text line region is the eighth in-line region, and the out-of-line region of the eighth in-line region is the eighth out-of-line region;
step S30423: when the gray value of the eighth in-line region is greater than the gray value of the eighth out-of-line region, intercepting the region corresponding to the eighth text line region from the inverse grayscale image and correspondingly splicing it to the grayscale image to obtain the third result image.
In this embodiment, the color of each character and its background in the outline binary image is the same as that in the positioning hollow binary image in the text line region positioning method, and similarly the features of the inverse outline binary image are the same as those of the positioning inverse hollow binary image. Between the outline binary image and the inverse outline binary image, the larger of the gray differences (the absolute value of the difference between the gray value of the in-line region and the gray value of the out-of-line region) is the absolute value of the difference between the gray value of a black solid character region and the gray value of a white background region of the same area, and the smaller is the absolute value of the difference between the gray value of a black-edged hollow character region and the gray value of a white background region of the same area.
Therefore, when there is an overlapping region between the sixth text line region and the seventh text line region: if the sixth grayscale difference is greater than the seventh grayscale difference, the region in the grayscale image corresponding to the sixth text line region is "positive color", so that region needs no processing, and the text line region at that position in the third result image is already a positive color text line region; if the sixth grayscale difference is smaller than the seventh grayscale difference, the region in the grayscale image corresponding to the sixth text line region is "reverse color" while the region in the inverse grayscale image corresponding to the seventh text line region is "positive color", so the region corresponding to the seventh text line region is spliced from the inverse grayscale image into the grayscale image, and the text line region at that position in the third result image becomes a positive color text line region.
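The overlap-case comparison described above can be sketched in the same vein. The helper names `gray_difference` and `sixth_region_is_positive` are illustrative, and taking the mean gray value of each region is an assumption; the patent speaks only of "the gray value" of a region.

```python
import numpy as np

def gray_difference(img, in_box, out_box):
    """Absolute value of the difference between the gray value of an
    in-line region and that of its out-of-line region (boxes are
    (top, bottom, left, right); the mean is an assumed aggregation)."""
    it, ib, il, ir = in_box
    ot, ob, ol, outer_r = out_box
    return abs(float(img[it:ib, il:ir].mean()) - float(img[ot:ob, ol:outer_r].mean()))

def sixth_region_is_positive(sixth_diff, seventh_diff):
    """Overlap case: the larger grayscale difference belongs to the
    solid-character (positive color) rendering, so the sixth text line
    region keeps its grayscale pixels only when its difference is larger;
    otherwise the region is taken from the inverse grayscale image."""
    return sixth_diff > seventh_diff
```

When `sixth_region_is_positive` returns `False`, the region corresponding to the seventh text line region would be spliced from the inverse grayscale image into the grayscale image.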
When there is no overlapping region between the sixth text line region and any seventh text line region, or between the seventh text line region and any sixth text line region, whether the corresponding eighth text line region in the grayscale image is "positive color" or "reverse color" is judged directly. When the gray value of the eighth in-line region is greater than the gray value of the eighth out-of-line region, the eighth text line region in the grayscale image is reverse color and its corresponding region in the inverse grayscale image is positive color; therefore, the region corresponding to the eighth text line region is captured from the inverse grayscale image and spliced into the grayscale image, so that the text line region at that position in the third result image is a positive color text line region.
In summary, the present application applies different comparison modes to the overlapping and non-overlapping cases according to the overlap between the sixth text line region and the seventh text line region, and then captures and splices the corresponding regions of the grayscale image and the inverse grayscale image according to the comparison results. Through this compare-and-splice procedure, a third result image containing only positive color text line regions is obtained directly, while ensuring that the third result image covers the regions corresponding to all text line regions of the image to be recognized. Character recognition is then performed on the third result image, so all character information of the image to be recognized is obtained in a single recognition pass, which yields an accurate recognition result in a short recognition time.
The text line regions in this embodiment include the sixth text line region, the seventh text line region, and the eighth text line region. The in-line region of each text line region is the first or the last unit character region of that text line region. The out-of-line region is one of the left, right, upper, and lower regions adjacent to the in-line region.
Preferably, selecting the out-of-line region from the left, right, upper, and lower regions adjacent to the in-line region specifically comprises:
when the area of the left region is the same as that of the in-line region and the area of the right region is not, selecting the left region as the out-of-line region;
when the area of the right region is the same as that of the in-line region and the area of the left region is not, selecting the right region as the out-of-line region;
when the areas of both the left region and the right region are the same as that of the in-line region, selecting either the left region or the right region as the out-of-line region;
and when the areas of both the left region and the right region differ from that of the in-line region, selecting the upper region or the lower region as the out-of-line region.
Wherein selecting the upper region or the lower region as the out-of-line region specifically comprises:
when the area of the upper region is the same as that of the in-line region and the area of the lower region is not, selecting the upper region as the out-of-line region;
when the area of the lower region is the same as that of the in-line region and the area of the upper region is not, selecting the lower region as the out-of-line region;
and when the areas of both the upper region and the lower region are the same as that of the in-line region, selecting either the upper region or the lower region as the out-of-line region.
Further, the out-of-line region has the same height and width as the in-line region.
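The selection rules above can be collected into a single hypothetical helper. The argument names and the tie-breaking toward the left and upper regions are choices made for the sketch; the patent allows either side in the tie cases and does not address the case where no candidate matches.

```python
def choose_out_of_line_region(in_area, left, right, upper, lower):
    """Sketch of the preferred out-of-line-region selection. Each argument
    is the area (pixel count) of a candidate region; a candidate qualifies
    when its area equals that of the in-line region (e.g. it was not
    clipped at the image border)."""
    if left == in_area and right != in_area:
        return "left"
    if right == in_area and left != in_area:
        return "right"
    if left == in_area and right == in_area:
        return "left"              # "left or right": left chosen here
    # Both horizontal candidates differ from the in-line area:
    if upper == in_area and lower != in_area:
        return "upper"
    if lower == in_area and upper != in_area:
        return "lower"
    if upper == in_area and lower == in_area:
        return "upper"             # "upper or lower": upper chosen here
    return None                    # case not specified by the patent
```

Because the out-of-line region shares the in-line region's height and width, a mismatched area in practice signals a candidate truncated by the image boundary.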
This character recognition method based on layout analysis merges every positive color text line region of the grayscale image and of the inverse grayscale image into a single image to obtain the third result image, and then performs character recognition on the third result image. Recognizing the result image once thus yields all character information of the image to be recognized, which greatly shortens the recognition service time and improves the efficiency of character recognition for images with complex layouts.
Accordingly, the present invention also discloses a character recognition apparatus, which may comprise a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the apparatus implements the steps of the method described above.
The invention also relates to a storage medium, which may be a tangible storage medium such as an optical disc, a USB flash drive, a floppy disk, or a hard disk, having stored thereon computer program code which, when executed by a processor, implements the method steps described above.
Those of skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A text line region positioning method is characterized by comprising the following steps:
acquiring a grayscale image of an image to be recognized;
obtaining a positioning image according to the grayscale image;
and identifying a positive color text line region and/or a reverse color text line region in the positioning image.
2. The method according to claim 1, wherein the text line region in the positioning image is a first text line region, the in-line region of the first text line region is a first in-line region, and the out-of-line region of the first in-line region is a first out-of-line region;
identifying the positive color text line region and/or the reverse color text line region in the positioning image specifically comprises:
identifying the first text line region of the positioning image;
selecting the first in-line region and the first out-of-line region in the positioning image;
determining that the first text line region is a positive color text line region when the gray value of the first in-line region is smaller than the gray value of the first out-of-line region; and/or,
determining that the first text line region is a reverse color text line region when the gray value of the first in-line region is greater than the gray value of the first out-of-line region.
3. The method of claim 2, wherein the positioning image is one of a positioning grayscale image, a positioning inverse grayscale image, a positioning binary image, and a positioning inverse binary image; the positioning grayscale image is the grayscale image, the positioning inverse grayscale image is obtained by performing at least inverse-color processing on the grayscale image, the positioning binary image is obtained by performing at least binarization processing on the grayscale image, and the positioning inverse binary image is obtained by performing at least inverse-color processing and binarization processing on the grayscale image.
4. The method of claim 1, wherein the positioning image comprises a positioning hollow binary image and a positioning inverse hollow binary image;
obtaining the positioning image according to the grayscale image specifically comprises:
obtaining a positioning hollow binary image and a positioning inverse hollow binary image according to the grayscale image;
identifying the positive color text line region and/or the reverse color text line region in the positioning image specifically comprises:
identifying the positive color text line region and/or the reverse color text line region in the positioning hollow binary image and the positioning inverse hollow binary image.
5. The method of claim 4, wherein the text line region in the positioning hollow binary image is a second text line region, the in-line region of the second text line region is a second in-line region, the out-of-line region of the second in-line region is a second out-of-line region, and the absolute value of the difference between the gray value of the second in-line region and the gray value of the second out-of-line region is a second grayscale difference;
the text line region in the positioning inverse hollow binary image is a third text line region, the in-line region of the third text line region is a third in-line region, the out-of-line region of the third in-line region is a third out-of-line region, and the absolute value of the difference between the gray value of the third in-line region and the gray value of the third out-of-line region is a third grayscale difference;
identifying the positive color text line region and/or the reverse color text line region in the positioning hollow binary image and the positioning inverse hollow binary image specifically comprises:
identifying the second text line region in the positioning hollow binary image and the third text line region in the positioning inverse hollow binary image;
when there is an overlapping region between the second text line region and at least one third text line region, acquiring the second grayscale difference of the second text line region and the third grayscale difference of the third text line region corresponding to the second text line region; or when there is an overlapping region between the third text line region and at least one second text line region, acquiring the third grayscale difference of the third text line region and the second grayscale difference of the second text line region corresponding to the third text line region;
when the second grayscale difference is greater than the third grayscale difference, determining that the second text line region is a positive color text line region, and/or determining that the third text line region is a reverse color text line region;
and/or,
when the second grayscale difference is smaller than the third grayscale difference, determining that the second text line region is a reverse color text line region, and/or determining that the third text line region is a positive color text line region.
6. A layout analysis method based on text line region positioning, wherein the text line region positioning method according to any one of claims 1 to 5 is applied to obtain a positive color text line region and a reverse color text line region in a positioning image; and a positive color region and a reverse color region in the grayscale image of the image to be recognized are determined.
7. A character recognition method based on layout analysis is characterized by comprising the following steps:
applying the text line region positioning method according to any one of claims 1 to 5 to obtain a positive color text line region and/or a reverse color text line region in a positioning image; obtaining a positive color image and a reverse color image from the grayscale image in the text line region positioning method according to any one of claims 1 to 5;
acquiring at least one of a positive color text line region of the positive color image, a reverse color text line region of the positive color image, a positive color text line region of the reverse color image, and a reverse color text line region of the reverse color image according to the positive color text line region and/or the reverse color text line region in the positioning image;
splicing the positive color image and the reverse color image according to at least one of the positive color text line region of the positive color image, the reverse color text line region of the positive color image, the positive color text line region of the reverse color image, and the reverse color text line region of the reverse color image to obtain a result image, wherein the text line regions in the result image are all positive color text line regions or all reverse color text line regions;
and performing character recognition on the result image to obtain a recognition result.
8. A character recognition method based on layout analysis is characterized by comprising the following steps:
acquiring a grayscale image of an image to be recognized;
obtaining a hollow binary image, an inverse hollow binary image, and an inverse grayscale image according to the grayscale image;
identifying a sixth text line region in the hollow binary image and a seventh text line region in the inverse hollow binary image;
splicing the gray level image and the inverse gray level image according to the sixth text line region and the seventh text line region to obtain a third result image, wherein the text line regions in the third result image are all positive color text line regions;
and performing character recognition on the third result image to obtain a recognition result.
9. A character recognition apparatus, comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the apparatus implements the steps of the method according to any one of claims 1 to 8.
10. A computer storage medium, characterized in that a computer program is stored thereon which, when being executed by a processor, carries out the method steps of any one of claims 1 to 8.
CN202010640573.6A 2020-07-06 2020-07-06 Text line region positioning method, layout analysis method and character recognition method Pending CN111814778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010640573.6A CN111814778A (en) 2020-07-06 2020-07-06 Text line region positioning method, layout analysis method and character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010640573.6A CN111814778A (en) 2020-07-06 2020-07-06 Text line region positioning method, layout analysis method and character recognition method

Publications (1)

Publication Number Publication Date
CN111814778A true CN111814778A (en) 2020-10-23

Family

ID=72841612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010640573.6A Pending CN111814778A (en) 2020-07-06 2020-07-06 Text line region positioning method, layout analysis method and character recognition method

Country Status (1)

Country Link
CN (1) CN111814778A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887484A (en) * 2021-10-20 2022-01-04 前锦网络信息技术(上海)有限公司 Card type file image identification method and device
CN114419636A (en) * 2022-01-10 2022-04-29 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
US10817741B2 (en) Word segmentation system, method and device
CN108229386B (en) Method, apparatus, and medium for detecting lane line
US9053361B2 (en) Identifying regions of text to merge in a natural image or video frame
JP4323328B2 (en) System and method for identifying and extracting character string from captured image data
US7751648B2 (en) Image processing apparatus, image processing method, and computer program
US10198661B2 (en) System for determining alignment of a user-marked document and method thereof
US11004194B2 (en) Inspection device, image forming apparatus, and inspection method
CN102360419B (en) Method and system for computer scanning reading management
US20070253040A1 (en) Color scanning to enhance bitonal image
CN111259891B (en) Method, device, equipment and medium for identifying identity card in natural scene
CN111814778A (en) Text line region positioning method, layout analysis method and character recognition method
CN111626249B (en) Method and device for identifying geometric figure in topic image and computer storage medium
WO2017141802A1 (en) Image processing device, character recognition device, image processing method, and program recording medium
CN112329756A (en) Method and device for extracting seal and recognizing characters
CN111461100A (en) Bill identification method and device, electronic equipment and storage medium
CN109389110B (en) Region determination method and device
CN110210467B (en) Formula positioning method of text image, image processing device and storage medium
CN113808004B (en) Image conversion device, image conversion method, and computer program for image conversion
US20140086473A1 (en) Image processing device, an image processing method and a program to be used to implement the image processing
CN108090425B (en) Lane line detection method, device and terminal
JP2013254242A (en) Image recognition device, image recognition method, and image recognition program
CN115984211A (en) Visual positioning method, robot and storage medium
CN115100663A (en) Method and device for estimating distribution situation of character height in document image
CN115410191A (en) Text image recognition method, device, equipment and storage medium
CN102682308B (en) Imaging processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination