CN113723412A - Character extraction method, device and equipment for circular red official seal - Google Patents


Info

Publication number
CN113723412A
Authority
CN
China
Prior art keywords
circular
official seal
image
rectangular
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110807454.XA
Other languages
Chinese (zh)
Inventor
郭大勇
张海龙
兰永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tongban Information Service Co ltd
Original Assignee
Shanghai Tongban Information Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tongban Information Service Co ltd filed Critical Shanghai Tongban Information Service Co ltd
Priority to CN202110807454.XA priority Critical patent/CN113723412A/en
Publication of CN113723412A publication Critical patent/CN113723412A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T3/053 Detail-in-context presentations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a method, a device and equipment for extracting characters of a circular red official seal, wherein the method comprises the following steps: acquiring an RGB image containing a circular official seal, and automatically positioning the position of the circular official seal; intercepting the image of the circular official seal, and removing background black characters; positioning a circular ring where characters of the circular official seal to be extracted are located; converting the circular ring into a rectangular image by converting the polar coordinates into rectangular coordinates; performing OCR recognition on the rectangular images, and re-splicing the rectangular images according to the recognized character coordinates; performing background optimization on the rectangular image after re-splicing; and performing OCR recognition on the optimized rectangular image. By the method, intelligent recognition of characters in the red circular official seal can be achieved, recognition speed is high, character information extraction is accurate, and official seal character recognition accuracy and efficiency are improved.

Description

Character extraction method, device and equipment for circular red official seal
Technical Field
The invention relates to the technical field of image processing, in particular to a method, a device and equipment for extracting characters of a circular red official seal.
Background
With the development of society, official seals are used more and more frequently. The official seal carries authority, is widely used by state organs, organizations, enterprises and public institutions in China, and gives legal effect to the text it is stamped on. In government office materials in particular, both certificates and application forms bear corresponding official seals, and the validity of a document often depends on its official seal, so intelligent recognition of official seals is particularly important in intelligent government affairs handling.
OCR (Optical Character Recognition) is now a well-established technology with applications in various fields. However, conventional OCR handles curved text such as circular official seal text by detecting rotated text boxes and applying affine transformations, then extracting the text regions on the feature map and recognizing them with methods such as CRNN.
Among official seals, circular official seals are the most difficult to recognize intelligently. Accurately recognizing the characters on a circular official seal is becoming more and more important, yet no effective solution is currently available on the market.
Disclosure of Invention
The invention aims to provide a method, a device and equipment for extracting characters of a circular red official seal, so as to solve the problems in the technical background.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the application provides a method for extracting characters of a circular red official seal, which comprises the following steps:
acquiring an RGB image containing a circular official seal, and automatically positioning the position of the circular official seal;
intercepting the image of the circular official seal, and removing background black characters;
positioning a circular ring where characters of the circular official seal to be extracted are located;
converting the circular ring into a rectangular image by converting the polar coordinates into rectangular coordinates;
performing OCR recognition on the rectangular images, and re-splicing the rectangular images according to the recognized character coordinates;
performing background optimization on the rectangular image after re-splicing;
and performing OCR recognition on the optimized rectangular image.
Preferably, acquiring the RGB image containing the circular official seal and automatically positioning the position of the circular official seal includes the following steps:
collecting image data containing circular official seals;
labeling the data with labelImg, marking the circular official seal to be intercepted;
training an object detection model from a YOLOv5 pre-trained model;
and recognizing the circular official seal with the trained object detection model.
Preferably, the intercepting the image of the circular official seal and the removing of the background black characters comprises the following steps:
intercepting an RGB image containing a circular official seal to generate an image of the circular official seal, wherein the image of the circular official seal is a minimum rectangular picture comprising the circular official seal;
converting the intercepted image of the circular official seal from an RGB color space into an HSV color space;
obtaining a color area containing each color according to the component threshold value of each color;
and finding the coordinates of all black or gray pixel points, and replacing the color of those points with white.
Preferably, the step of locating the ring where the characters of the circular official seal to be extracted are located includes the following steps:
zooming the image of the circular official seal with the background black characters removed to a preset size;
assuming that the curved text area is a part of a circular area of a circular official seal, and acquiring a circle center coordinate and a circle radius of the circular area;
and estimating an arc area corresponding to the curved text area according to the circle center coordinate and the circle radius, and acquiring the outer diameter and the inner diameter of the arc area to obtain the ring diameter width of the ring in which the curved text area is located.
Preferably, the converting the circular ring into the rectangular image by converting the polar coordinates into the rectangular coordinates includes the following steps:
firstly, generating a full-white rectangular image, wherein the width of the rectangular image is the ring diameter width of a ring where characters are located, and the length of the rectangular image is the outer diameter perimeter of the ring;
traversing each pixel point of the rectangular image from left to right and from top to bottom, and searching a corresponding point in the circular ring;
and replacing the pixel value of the corresponding point of the rectangular image with the pixel value of the circular ring, so that the curved text area where the characters are located is mapped to the rectangular area, and the ring-to-rectangle image is obtained.
More preferably, assuming that the coordinate point on the transformed rectangular image is (col, row), the coordinate point on the image where the circular ring is located is (x, y), the length of the rectangular image is w, the outer diameter of the circular ring is radius, and the central point of the circular ring is (circle_center_x, circle_center_y), the calculation steps are as follows:
θ = 2π/w * (col + 1)
rho = radius - row - 1
x = INT(circle_center_x + rho*cos(θ) + 0.5) - 1
y = INT(circle_center_y - rho*sin(θ) + 0.5) - 1
where w = 2 * radius * π, with π taken as 3.14 and the result rounded to the nearest integer.
Preferably, the OCR recognition of the rectangular image and the re-stitching of the rectangular image according to the recognized character coordinates comprise the following steps:
performing OCR recognition on the converted rectangular image, which returns text boxes;
judging whether the height of the text box is greater than or equal to a preset threshold value, if so, determining that the identified text box is effective, and otherwise, setting the text box as noise and not processing the noise;
sorting all the text boxes meeting the requirements from small to large according to the coordinate value of the horizontal axis of the upper left corner point;
and taking a horizontal axis coordinate value of the upper right corner point of the first text box, performing image vertical cutting by using the horizontal axis coordinate value, and splicing the cut image behind the rectangular image.
Preferably, the background optimization of the rectangular image after the re-splicing includes the following steps:
converting the spliced rectangular image from a color image into a gray image;
carrying out binarization processing on the gray level image;
and finding the coordinates with a pixel value of 255 in the binarized image, and changing the pixel points at the corresponding coordinates in the re-spliced rectangular image to white.
More preferably, the binarizing processing of the grayscale map includes: a second preset threshold is set, and when the pixel value exceeds the second preset threshold, the pixel value is set to 255, otherwise, the pixel value is set to 0.
Preferably, the method further comprises: performing post-processing on the character result after the OCR recognition, wherein the post-processing includes, but is not limited to, one or more of text error correction, redundant character and symbol removal, regular-expression extraction and named entity recognition extraction.
Preferably, the method is suitable for extracting characters from circular red official seals in image materials of government affairs office work.
A second aspect of the present application provides a character extracting apparatus for a circular red official seal, including:
the image acquisition module is used for acquiring an RGB image containing a circular official seal;
the official seal extraction module is used for carrying out official seal detection and positioning on the RGB image containing the circular official seal and intercepting the image of the circular official seal according to a detection and positioning result, wherein the image of the circular official seal is a minimum rectangular image comprising the circular official seal;
the background black character removing module is used for removing background black characters in the image of the circular official seal;
the circular ring positioning module is used for positioning a circular ring where the characters of the circular official seal to be extracted are located;
the circular ring-to-rectangular image module is used for converting the polar coordinates into rectangular coordinates and converting the circular ring where the characters of the circular official seal are located into a rectangular image;
the rectangular image splicing module is used for carrying out OCR recognition on the rectangular images and splicing the rectangular images again according to the recognized character coordinates;
the rectangular image background optimization module is used for carrying out background optimization on the rectangular image after the re-splicing; and
the character recognition module is used for performing OCR recognition on the rectangular image after the background optimization to obtain character information in the circular official seal.
Preferably, the apparatus further comprises: a character recognition post-processing module used for post-processing the character result after the OCR recognition, wherein the post-processing includes, but is not limited to, one or more of text error correction, redundant character and symbol removal, regular-expression extraction and named entity recognition extraction.
A third aspect of the present application provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of text extraction of a circular red seal as described above.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the method can realize intelligent identification of characters in the red circular official seal, is high in identification speed, accurate in character information extraction, and improves the precision and efficiency of official seal character identification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for extracting characters from a circular red official seal according to the present application;
FIG. 2 is an example of an image containing a circular official seal in the embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the effect of directly recognizing characters by OCR after positioning a circular official seal by an object detection method in the embodiment of the present application;
FIG. 4 is a schematic diagram of a seal cut out after a circular seal is identified by the object detection model in the embodiment of the present application;
FIG. 5 is a schematic diagram illustrating the effect of replacing coordinates of all black or gray pixels with white in the embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an effect of locating a ring where official seal characters to be extracted are located in the embodiment of the present application;
FIG. 7 is a schematic diagram illustrating an effect of converting a circular ring where official seal characters are located into a rectangular image according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of the present application, in which OCR recognition is performed on rectangular images, and rectangular images are re-spliced according to recognized character coordinates;
FIG. 9 is a schematic diagram of a process of replacing a rectangular image with a white background in an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating the whole process of the text extraction method for a circular red official seal in the embodiment of the present application;
fig. 11 is a schematic structural diagram of a character extraction device of a circular red official seal according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below by way of examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a flowchart of a text extraction method for a circular red official seal according to the present application. The method for extracting characters of the circular red official seal comprises the following steps:
step S1: acquiring an RGB image containing a circular official seal, and automatically positioning the position of the circular official seal;
step S2: intercepting the image of the circular official seal, and removing background black characters;
step S3: positioning a circular ring where characters of the circular official seal to be extracted are located;
step S4: converting the circular ring into a rectangular image by converting the polar coordinates into rectangular coordinates;
step S5: performing OCR recognition on the rectangular images, and re-splicing the rectangular images according to recognized character coordinates;
step S6: performing background optimization on the rectangular image after re-splicing;
step S7: and performing OCR recognition on the rectangular image after the background optimization.
Step S8: performing text error correction, redundant character removal, regular-expression extraction, named entity recognition and the like on the result of the OCR recognition.
Examples
The first step: positioning the circular official seal by an object detection method.
If OCR recognition is performed directly on office image materials, it is difficult to extract the official seal characters from the recognition results. The circular official seal therefore needs to be located first, and an object detection method is used for this.
Step 101: image data containing circular official seals is collected, as shown in fig. 2.
Step 102: the data is labeled with labelImg, marking the circular official seal to be intercepted.
Step 103: an object detection model is trained from the YOLOv5 pre-trained model.
Step 104: the trained object detection model is obtained and used to recognize the circular official seal.
At this point, if OCR is applied directly to the located circular official seal, it is still difficult to extract the official seal characters; see fig. 3.
The second step: intercepting the official seal image and removing the background black characters.
Step 201: after the object detection model identifies the circular official seal, a small official seal image is intercepted; the intercepted image is the minimum rectangular image that contains the circular official seal, as shown in FIG. 4.
Step 202: when the image is read it is in RGB form, and colors are difficult to judge in RGB space, so the image needs to be converted from the RGB color space into the HSV color space. The conversion formula is as follows:
[Formula images for h and s are not reproduced in the source text.]
v=max
wherein r, g, b are R, G, B color values input in the RGB color space, max, min are the maximum and minimum values in r, g, b, respectively, and h, s, v represent hue, saturation and brightness in the HSV color space, respectively.
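The formulas for h and s are not legible in this text. As an assumption, they presumably correspond to the standard RGB-to-HSV conversion, which is consistent with v = max. With r, g, b scaled to [0, 1] and h expressed in degrees (both scaling choices are assumptions, since other conventions exist), that conversion reads:

```latex
h =
\begin{cases}
0^\circ, & \max = \min \\[4pt]
60^\circ \cdot \dfrac{g - b}{\max - \min}, & \max = r,\ g \ge b \\[8pt]
60^\circ \cdot \dfrac{g - b}{\max - \min} + 360^\circ, & \max = r,\ g < b \\[8pt]
60^\circ \cdot \dfrac{b - r}{\max - \min} + 120^\circ, & \max = g \\[8pt]
60^\circ \cdot \dfrac{r - g}{\max - \min} + 240^\circ, & \max = b
\end{cases}
\qquad
s =
\begin{cases}
0, & \max = 0 \\[4pt]
1 - \dfrac{\min}{\max}, & \text{otherwise}
\end{cases}
\qquad
v = \max
```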
Step 203: the color of each pixel is determined by table look-up (the HSV component thresholds for each color are given, for example, in Table 1).
TABLE 1 Example of HSV component thresholds for each color
[Table image not reproduced in the source text.]
Step 204: the coordinates of all black or gray pixel points are found, and the color of those points is replaced with white.
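The black and gray removal in the second step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the values of Table 1 are not available here, so the thresholds `BLACK_V_MAX`, `GRAY_S_MAX` and `GRAY_V_MAX` are assumed values, and the image is represented as a plain list of rows of (R, G, B) tuples. The function names are hypothetical.

```python
import colorsys

# Assumed thresholds on a 0..1 scale; the patent's Table 1 is not reproduced here.
BLACK_V_MAX = 46 / 255   # at or below this brightness a pixel counts as black
GRAY_S_MAX = 43 / 255    # at or below this saturation a pixel may count as gray
GRAY_V_MAX = 220 / 255   # gray pixels are not brighter than this
WHITE = (255, 255, 255)


def is_black_or_gray(rgb):
    """Classify one (R, G, B) pixel in HSV space using the assumed thresholds."""
    r, g, b = (channel / 255.0 for channel in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v <= BLACK_V_MAX:
        return True
    return s <= GRAY_S_MAX and v <= GRAY_V_MAX


def remove_black_text(image):
    """Return a copy of the image with every black or gray pixel replaced by white."""
    return [[WHITE if is_black_or_gray(px) else px for px in row] for row in image]
```

Red seal pixels have high saturation, so they fail the gray test and stay untouched, while dark document text is whitened.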
The third step: positioning the circular ring where the official seal characters to be extracted are located.
Step 301: the image obtained in the second step is first scaled to a fixed size; fig. 6 shows the scaled image, with height 180 and width 180.
Step 302: the center, the outer diameter and the diameter width of the circular ring are determined. As shown in fig. 6, the circle center circle_center is (90, 90), the outer diameter of the ring radius is 80, and the diameter width of the ring (equal to the outer diameter minus the inner diameter) radius_width is 40.
The fourth step: converting the circular ring into a rectangle by converting polar coordinates into rectangular coordinates.
Step 401: first, a full-white rectangular image is generated. Its width is the ring diameter width h, namely radius_width, and its length is the outer perimeter w of the ring, namely 2 * radius * π (with π taken as 3.14), rounded to the nearest integer.
Step 402: from left to right, from top to bottom, each pixel point of the rectangular image is traversed, and a corresponding point is searched in the circular ring. With reference to fig. 6 and 7, the conversion method is as follows:
Let the coordinate point on the converted rectangular image be (col, row), the coordinate point on the image where the circular ring is located be (x, y), the length of the rectangular image be w, the outer diameter of the circular ring be radius, and the central point of the circular ring be (circle_center_x, circle_center_y). Then (x, y) and (col, row) can be put in correspondence by the polar-to-rectangular coordinate formula, and the calculation steps are as follows:
θ = 2π/w * (col + 1)
rho = radius - row - 1
x = INT(circle_center_x + rho*cos(θ) + 0.5) - 1
y = INT(circle_center_y - rho*sin(θ) + 0.5) - 1
where w = 2 * radius * π, with π taken as 3.14 and the result rounded to the nearest integer.
Step 403: each pixel point of the rectangular image has a corresponding pixel point in the circular ring; the pixel value of each point of the white rectangle is replaced with the pixel value of the corresponding point in the ring, finally giving the ring-to-rectangle image shown in fig. 7.
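The fourth step can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the image is a list of rows indexed as `image[y][x]`, INT is taken to mean rounding down, the x coordinate uses the cosine of θ (the usual polar-to-rectangular form), and the bounds check is an added safeguard that the text does not mention. The function name `unwrap_ring` is hypothetical.

```python
import math

PI_APPROX = 3.14  # the text fixes pi at 3.14 when computing the rectangle length


def unwrap_ring(image, center, radius, radius_width, white=(255, 255, 255)):
    """Map the ring with outer radius `radius` and width `radius_width` to a rectangle."""
    cx, cy = center
    w = int(2 * radius * PI_APPROX + 0.5)  # rectangle length, rounded to nearest integer
    h = radius_width                       # rectangle height
    rect = [[white] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            theta = 2 * math.pi / w * (col + 1)
            rho = radius - row - 1
            x = math.floor(cx + rho * math.cos(theta) + 0.5) - 1
            y = math.floor(cy - rho * math.sin(theta) + 0.5) - 1
            # Added safeguard: leave the pixel white if the source point is off-image.
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                rect[row][col] = image[y][x]
    return rect
```

With the values of fig. 6 (center (90, 90), radius 80, radius_width 40), the rectangle is 40 pixels high and round(2 * 80 * 3.14) = round(502.4) = 502 pixels long. Row 0 of the rectangle samples the outermost circle of the ring and the last row samples the innermost one.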
The fifth step: performing OCR recognition on the rectangular image, and re-splicing it according to the recognized character coordinates.
Step 501: OCR recognition returns text boxes, and a recognized text box is judged valid if its height is at least the threshold of 20; text boxes less than 20 high are treated as noise. As shown in fig. 8, the valid text boxes recognized are "Service Co., Ltd." and "Shanghai Tongban Information", respectively. Note that although text boxes with a height less than 20 are ignored here, OCR recognition is performed again after the image is re-spliced, so all text information is still retained.
Step 502: all valid text boxes are sorted in ascending order of the horizontal coordinate of their upper-left corner.
Step 503: the horizontal coordinate of the upper-right corner of the first text box from the previous step is taken, the image is cut vertically at that coordinate, and the cut-off part is spliced to the back of the rectangular image, as shown in fig. 8.
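Steps 501 to 503 can be sketched as follows. The box format `(x_left, y_top, x_right, y_bottom)` and the function name `resplice` are assumptions made for illustration; a real OCR engine returns its own box structure, which would need to be adapted.

```python
def resplice(rect, boxes, min_height=20):
    """Move the part of the image left of the first valid text box's right edge to the back.

    rect:  list of rows (each row a list of pixels)
    boxes: list of (x_left, y_top, x_right, y_bottom) tuples from an OCR pass (assumed format)
    """
    # Step 501: keep only boxes that are tall enough; shorter ones are treated as noise.
    valid = [box for box in boxes if box[3] - box[1] >= min_height]
    if not valid:
        return rect
    # Step 502: sort by the horizontal coordinate of the upper-left corner.
    valid.sort(key=lambda box: box[0])
    # Step 503: cut at the upper-right corner of the first box and append the cut part.
    cut_x = valid[0][2]
    return [row[cut_x:] + row[:cut_x] for row in rect]
```

Because the unwrapped ring starts at an arbitrary angle, the end of the seal text can land at the left edge of the rectangle; moving that piece to the back restores the reading order before the second OCR pass.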
The sixth step: performing background optimization on the re-spliced rectangular image.
Step 601: the spliced color image is converted into a grayscale image (formula: Gray = (R*38 + G*75 + B*15) >> 7).
Step 602: the grayscale image is then binarized. Grayscale pixel values lie in [0, 255]; here the threshold (i.e., the second preset threshold) is set to 220, so a pixel value above 220 is set to 255 and any other value is set to 0, where 255 is white and 0 is black.
Step 603: the coordinates with a pixel value of 255 are found in the binarized image, and the pixel points at the corresponding coordinates in the re-spliced rectangular image are changed to white. The transformation is visualized in fig. 9.
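Steps 601 to 603 can be condensed into a short sketch, assuming the image is a list of rows of (R, G, B) tuples. The function names are hypothetical; the integer grayscale formula and the threshold of 220 are taken from the text.

```python
WHITE = (255, 255, 255)


def to_gray(r, g, b):
    """Integer grayscale approximation from the text: weights 38, 75, 15 sum to 128 (= 1 << 7)."""
    return (r * 38 + g * 75 + b * 15) >> 7


def whiten_background(rect, threshold=220):
    """Set every pixel whose gray value exceeds the threshold to pure white."""
    return [
        [WHITE if to_gray(*px) > threshold else px for px in row]
        for row in rect
    ]
```

The binarized image is never stored here: a pixel that would be set to 255 in step 602 is whitened directly, which gives the same result as step 603.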
The seventh step: performing OCR recognition on the background-optimized rectangular image.
The eighth step: post-processing the character result of the OCR recognition.
In some cases the result may be mixed with unwanted text or contain errors caused by blurred characters. Post-processing can be added for such cases, for example text error correction, redundant character and symbol removal, regular-expression extraction, and named entity recognition extraction.
For example, text error correction can target errors that may occur during official seal character recognition, such as "Shanghai Tongban Information Service Co., Ltd." being recognized with one character misread as a similar-looking character meaning "power".
Sometimes symbols that do not appear in official seal characters, such as commas and periods, are recognized; these can be removed directly.
Sometimes irrelevant characters are recognized for various reasons. For example, if the recognition result of the seventh step contains "Shanghai Tongban Information Service Co., Ltd." together with irrelevant characters, then "Shanghai Tongban Information Service Co., Ltd." can be extracted by named entity recognition.
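Two of the post-processing options, symbol removal and regular-expression extraction, can be sketched as below. This is only an illustration: the pattern `COMPANY` is an assumed rule that looks for a run of Chinese characters ending in "有限公司" (limited company), it stands in for the regular-expression branch and is not a named entity recognizer, and the company name used in the usage note is made up.

```python
import re

# Punctuation and whitespace that should not occur inside seal text (assumed list).
PUNCTUATION = re.compile(r"[，。、；：！？,.;:!?\s]")
# Assumed rule: a run of Chinese characters ending with "有限公司".
COMPANY = re.compile(r"[\u4e00-\u9fff]+有限公司")


def postprocess(text):
    """Strip stray symbols, then keep only the company name if one is found."""
    cleaned = PUNCTUATION.sub("", text)
    match = COMPANY.search(cleaned)
    return match.group(0) if match else cleaned
```

For instance, `postprocess("12示例信息，服务有限公司。ab")` drops the comma and period, and the pattern then discards the surrounding digits and Latin letters, leaving the hypothetical name "示例信息服务有限公司".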
The whole process of the character extraction method of the circular red official seal is shown in fig. 10.
On the other hand, the application also provides a character extraction device of the circular red official seal. Because the working principle of the device is the same as or similar to that of the character extraction method of the circular red official seal disclosed by the application, the parts in common are not described again.
Referring to fig. 11, the present application further discloses a text extraction device 100 for a circular red official seal, including:
the image acquisition module 110 is configured to acquire an RGB image including a circular official seal;
the official seal extraction module 120 is configured to perform official seal detection and positioning on an RGB image containing a circular official seal, and intercept an image of the circular official seal according to a detection and positioning result, where the image of the circular official seal is a minimum rectangular image including the circular official seal;
the background black character removing module 130 is used for removing background black characters in the image of the circular official seal;
a ring positioning module 140, configured to position a ring where characters of a circular official seal to be extracted are located;
a circular ring-to-rectangular image module 150, configured to convert a polar coordinate into a rectangular coordinate, and convert a circular ring in which characters of a circular official seal are located into a rectangular image;
the rectangular image splicing module 160 is used for performing OCR recognition on the rectangular images and splicing the rectangular images again according to the recognized character coordinates;
a rectangular image background optimization module 170, configured to perform background optimization on the rectangular image after re-stitching; and
and the character recognition module 180 is configured to perform OCR recognition on the background-optimized rectangular image to obtain character information in the circular official seal.
In a preferred embodiment, the apparatus 100 further comprises: a character recognition post-processing module 190, configured to perform post-processing on the character result after the OCR recognition, where the post-processing includes, but is not limited to, one or more of text error correction, redundant character and symbol removal, regular-expression extraction, and named entity recognition extraction.
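One way to picture how the modules cooperate is as a chain in which each module consumes the output of the previous one. The sketch below is an assumption about wiring only; `run_modules` and the `Stage` alias are hypothetical names, and the actual modules 120 to 190 would be supplied as callables.

```python
from typing import Any, Callable, Sequence

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def run_modules(rgb_image: Any, modules: Sequence[Stage]) -> Any:
    """Feed the output of each module into the next one, in order."""
    data = rgb_image
    for module in modules:
        data = module(data)
    return data
```

In such a layout the acquired image from module 110 would be passed in as `rgb_image`, and the list would hold the extraction, black-character removal, ring positioning, ring-to-rectangle, splicing, background optimization, recognition and post-processing stages in that order.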
In another aspect, the present application further provides an electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of text extraction of a circular red official seal as described above.
In summary, the application discloses a method, a device and equipment for extracting characters of a circular red official seal, the method provided by the application can be used for realizing intelligent identification of the characters in the red circular official seal, the identification speed is high, the extraction of character information is accurate, the precision and the efficiency of official seal character identification are improved, and particularly the precision and the efficiency of intelligent identification of official seals in government affair intelligent handling are improved.
The embodiments of the present invention have been described in detail, but the embodiments are only examples, and the present invention is not limited to the above-described embodiments. Any equivalent modifications and substitutions to those skilled in the art are also within the scope of the present invention. Therefore, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.

Claims (10)

1. A method for extracting characters of a round red official seal is characterized by comprising the following steps:
acquiring an RGB image containing a circular official seal, and automatically positioning the position of the circular official seal;
intercepting the image of the circular official seal, and removing background black characters;
positioning a circular ring where characters of the circular official seal to be extracted are located;
converting the circular ring into a rectangular image by converting the polar coordinates into rectangular coordinates;
performing OCR recognition on the rectangular images, and re-splicing the rectangular images according to the recognized character coordinates;
performing background optimization on the rectangular image after re-splicing;
and performing OCR recognition on the optimized rectangular image.
2. The method for extracting characters from a circular red official seal as claimed in claim 1, wherein said obtaining of RGB images of the circular official seal to be recognized and automatic positioning of the position of the circular official seal comprises the steps of:
collecting image data containing circular official seal;
labeling with labelImg to mark the circular official seal to be intercepted;
training a target detection model from a YOLOv5 pre-trained model;
and recognizing the circular official seal with the trained target detection model.
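For illustration only (not part of the claims): when labelImg saves annotations in YOLO format, each box is stored as a class index followed by the box center and size, all normalized by the image dimensions. A minimal sketch of that conversion, assuming a pixel box given as (x_min, y_min, x_max, y_max) and a hypothetical helper name:

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel bounding box to one line of a YOLO-format label file."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w   # normalized box center, x
    y_center = (y_min + y_max) / 2 / img_h   # normalized box center, y
    width = (x_max - x_min) / img_w          # normalized box width
    height = (y_max - y_min) / img_h         # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A seal occupying pixels (100, 50)-(300, 250) in an 800x600 scan, class 0.
line = to_yolo_label(0, (100, 50, 300, 250), 800, 600)
```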
3. The method for extracting characters from a circular red official seal according to claim 1, wherein the step of intercepting the image of the circular official seal and removing background black characters comprises the following steps:
intercepting an RGB image containing a circular official seal to generate an image of the circular official seal, wherein the image of the circular official seal is a minimum rectangular picture comprising the circular official seal;
converting the intercepted image of the circular official seal from an RGB color space into an HSV color space;
obtaining a color area containing each color according to the component threshold value of each color;
and finding the coordinates of all black or gray pixel points, and replacing the color of those points with white.
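For illustration only (not part of the claims): the color-space step above might be sketched in plain Python as follows. The image is assumed to be a list of rows of (R, G, B) tuples in the range 0 to 255, and both thresholds are assumed values; the application only states that per-color component thresholds are used.

```python
import colorsys

def remove_black_text(image, max_saturation=0.25, max_value=0.75):
    """Return a copy in which black or gray pixels are replaced with white.

    A pixel counts as black or gray when its HSV saturation is low and its
    HSV value is not bright. Both thresholds are illustrative assumptions.
    """
    white = (255, 255, 255)
    result = []
    for row in image:
        new_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_black_or_gray = s <= max_saturation and v <= max_value
            new_row.append(white if is_black_or_gray else (r, g, b))
        result.append(new_row)
    return result

sample = [[(200, 30, 30), (20, 20, 20)],
          [(128, 128, 128), (255, 255, 255)]]
cleaned = remove_black_text(sample)
```

The red seal pixel keeps its color because its saturation is high, while the black and gray pixels become white.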
4. The method for extracting characters of a circular red official seal as claimed in claim 1, wherein said step of locating the ring where the characters of the circular official seal to be extracted are located comprises the steps of:
zooming the image of the circular official seal with the background black characters removed to a preset size;
assuming that the curved text area is a part of a circular area of a circular official seal, and acquiring the circle center coordinate and the circle radius of the circular area;
and estimating an arc area corresponding to the curved text area according to the circle center coordinate and the circle radius, and acquiring the outer diameter and the inner diameter of the arc area to obtain the ring diameter width of the ring in which the curved text area is located.
5. The method for extracting words from circular red official seal according to claim 1, wherein said converting the circular ring into a rectangular image by converting the polar coordinates into rectangular coordinates comprises the steps of:
firstly, generating a full-white rectangular image, wherein the width of the rectangular image is the diameter width of a circular ring where characters are located, and the length of the rectangular image is the outer diameter perimeter of the circular ring;
traversing each pixel point of the rectangular image from left to right and from top to bottom, and searching a corresponding point in the circular ring;
and replacing the pixel value of each point of the rectangular image with the pixel value of the corresponding point in the circular ring, so that the curved text region where the characters are located is mapped to a rectangular region, yielding the ring-to-rectangle image.
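For illustration only (not part of the claims): a minimal sketch of the ring-to-rectangle mapping. It assumes a single-channel image stored as a list of rows, nearest-neighbour sampling, a start angle at the 3 o'clock position and clockwise traversal in image coordinates; none of these choices is fixed by the application.

```python
import math

def unwrap_ring(image, cx, cy, inner_r, outer_r, background=255):
    """Map the ring between inner_r and outer_r around (cx, cy) to a rectangle."""
    height = int(round(outer_r - inner_r))       # ring width
    length = int(round(2 * math.pi * outer_r))   # outer circumference
    rect = [[background] * length for _ in range(height)]
    for y in range(height):                      # top to bottom
        radius = outer_r - y                     # top row is the outer edge
        for x in range(length):                  # left to right
            theta = 2 * math.pi * x / length
            src_x = int(round(cx + radius * math.cos(theta)))
            src_y = int(round(cy + radius * math.sin(theta)))
            if 0 <= src_y < len(image) and 0 <= src_x < len(image[0]):
                rect[y][x] = image[src_y][src_x]
    return rect

# 21x21 test image with one marked pixel on the outer edge at angle 0.
img = [[0] * 21 for _ in range(21)]
img[10][20] = 7
strip = unwrap_ring(img, cx=10, cy=10, inner_r=5, outer_r=10)
```

Because the start angle is arbitrary, the seam may cut through the text, which is why the re-splicing step of claim 6 follows.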
6. The method for extracting characters from circular red official seal as claimed in claim 1, wherein said OCR recognizing rectangular images and said re-splicing rectangular images according to recognized character coordinates comprises the following steps:
performing OCR recognition on the converted rectangular image, which returns text boxes;
judging whether the height of the text box is greater than or equal to a preset threshold value, if so, determining that the identified text box is effective, and otherwise, setting the text box as noise and not processing the noise;
sorting all the text boxes meeting the requirements from small to large according to the coordinate value of the horizontal axis of the upper left corner point;
and taking a horizontal axis coordinate value of the upper right corner point of the first text box, performing image vertical cutting by using the horizontal axis coordinate value, and splicing the cut image behind the rectangular image.
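For illustration only (not part of the claims): the filtering, sorting and re-splicing steps might be sketched as follows. Each text box is assumed to be a tuple (x_left, y_top, x_right, y_bottom) in pixels, so the horizontal coordinate of the upper right corner is `x_right`; the box format and function name are assumptions for this example.

```python
def resplice(rect, boxes, min_height):
    """Cut the strip at the right edge of the leftmost valid box and move
    the cut-off left part to the end of the strip."""
    # Boxes shorter than the threshold are treated as noise and ignored.
    valid = [b for b in boxes if (b[3] - b[1]) >= min_height]
    if not valid:
        return rect
    # Sort by the horizontal coordinate of the upper left corner.
    valid.sort(key=lambda b: b[0])
    cut_x = valid[0][2]  # horizontal coordinate of the upper right corner
    return [row[cut_x:] + row[:cut_x] for row in rect]

strip = [[0, 1, 2, 3, 4, 5],
         [10, 11, 12, 13, 14, 15]]
boxes = [(3, 0, 5, 2), (0, 0, 2, 2), (1, 0, 2, 1)]
respliced = resplice(strip, boxes, min_height=2)
```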
7. The method for extracting characters from circular red official seal according to claim 1, wherein the background optimization of the rectangular image after re-splicing comprises the following steps:
converting the spliced rectangular image from a color image into a gray image;
carrying out binarization processing on the gray level image;
and finding the coordinates whose pixel value is 255 in the binarized image, and changing the pixels at the corresponding coordinates in the re-spliced rectangular image to white.
8. The method for extracting words from circular red official seal according to claim 1, further comprising: post-processing the character result after the OCR recognition, wherein the post-processing comprises one or more of text error correction, removal of redundant characters and symbols, regular-expression extraction, and named-entity-recognition extraction.
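For illustration only (not part of the claims): a minimal sketch of the background optimization. It assumes the common luma weights 0.299, 0.587 and 0.114 for the gray conversion and a fixed threshold of 160; the application does not state which binarization method or threshold is used.

```python
def whiten_background(image, threshold=160):
    """Set every pixel that binarizes to 255 to pure white; keep the rest."""
    result = []
    for row in image:
        new_row = []
        for r, g, b in row:
            gray = 0.299 * r + 0.587 * g + 0.114 * b   # color to gray
            binary = 255 if gray > threshold else 0     # fixed-threshold binarization
            new_row.append((255, 255, 255) if binary == 255 else (r, g, b))
        result.append(new_row)
    return result

# A pale pink background pixel and a saturated red stroke pixel.
optimized = whiten_background([[(230, 200, 200), (200, 30, 30)]])
```

The pale pixel is cleaned to white while the red stroke is left unchanged, which gives the second OCR pass a more uniform background.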
9. A character extraction device of a circular red official seal is characterized by comprising:
the image acquisition module is used for acquiring an RGB image containing a circular official seal;
the official seal extraction module is used for carrying out official seal detection and positioning on the RGB image containing the circular official seal and intercepting the image of the circular official seal according to a detection and positioning result, wherein the image of the circular official seal is a minimum rectangular image comprising the circular official seal;
the background black character removing module is used for removing background black characters in the image of the circular official seal;
the circular ring positioning module is used for positioning a circular ring where the characters of the circular official seal to be extracted are located;
the circular ring-to-rectangular image module is used for converting the polar coordinates into rectangular coordinates and converting the circular ring where the characters of the circular official seal are located into a rectangular image;
the rectangular image splicing module is used for carrying out OCR recognition on the rectangular images and splicing the rectangular images again according to the recognized character coordinates;
the rectangular image background optimization module is used for carrying out background optimization on the re-spliced rectangular image; and
the character recognition module is used for performing OCR recognition on the rectangular image after the background optimization to obtain character information in the circular official seal.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of text extraction of a circular red official seal as claimed in any one of claims 1 to 8.
CN202110807454.XA 2021-07-16 2021-07-16 Character extraction method, device and equipment for circular red official seal Pending CN113723412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110807454.XA CN113723412A (en) 2021-07-16 2021-07-16 Character extraction method, device and equipment for circular red official seal

Publications (1)

Publication Number Publication Date
CN113723412A true CN113723412A (en) 2021-11-30

Family

ID=78673526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110807454.XA Pending CN113723412A (en) 2021-07-16 2021-07-16 Character extraction method, device and equipment for circular red official seal

Country Status (1)

Country Link
CN (1) CN113723412A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6608930B1 (en) * 1999-08-09 2003-08-19 Koninklijke Philips Electronics N.V. Method and system for analyzing video content using detected text in video frames
US20080263440A1 (en) * 2007-04-19 2008-10-23 Microsoft Corporation Transformation of Versions of Reports
CN109040565A (en) * 2018-09-10 2018-12-18 天津科技大学 Panoramic shooting system
CN112381081A (en) * 2020-11-16 2021-02-19 深圳壹账通智能科技有限公司 Official seal character automatic identification method and device, computer equipment and storage medium
CN112733639A (en) * 2020-12-28 2021-04-30 贝壳技术有限公司 Text information structured extraction method and device
CN112949689A (en) * 2021-02-01 2021-06-11 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄剑航: "基于HALCON的圆环区域字符识别实现", 《现代计算机》, pages 1 - 4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376142A (en) * 2022-07-20 2022-11-22 北大荒信息有限公司 Image-based business license information extraction method, computer equipment and readable storage medium
CN115376142B (en) * 2022-07-20 2023-09-01 北大荒信息有限公司 Image-based business license information extraction method, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200435 11th Floor, Building 27, Lane 99, Shouyang Road, Jing'an District, Shanghai

Applicant after: Shanghai Tongban Information Service Co.,Ltd.

Address before: 200433 No. 11, Lane 100, Zhengtong Road, Yangpu District, Shanghai

Applicant before: Shanghai Tongban Information Service Co.,Ltd.

CB02 Change of applicant information