CN115331229A - Optical character recognition method, computer readable storage medium and electronic device - Google Patents

Optical character recognition method, computer readable storage medium and electronic device Download PDF

Info

Publication number
CN115331229A
CN115331229A CN202210954615.2A CN202210954615A CN115331229A CN 115331229 A CN115331229 A CN 115331229A CN 202210954615 A CN202210954615 A CN 202210954615A CN 115331229 A CN115331229 A CN 115331229A
Authority
CN
China
Prior art keywords
vertex
text box
text
coordinates
empirical value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210954615.2A
Other languages
Chinese (zh)
Inventor
王闯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiehui Technology Co Ltd
Original Assignee
Beijing Jiehui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiehui Technology Co Ltd filed Critical Beijing Jiehui Technology Co Ltd
Priority to CN202210954615.2A priority Critical patent/CN115331229A/en
Publication of CN115331229A publication Critical patent/CN115331229A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1463Orientation detection or correction, e.g. rotation of multiples of 90 degrees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/16Image preprocessing
    • G06V30/162Quantising the image signal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Abstract

The invention relates to the technical field of computers, in particular to an optical character recognition method, a computer readable storage medium and electronic equipment, aiming at solving the problems that the existing optical character recognition method aiming at an inclined image to be detected needs to be performed with inclination correction before information extraction, the recognition effect is poor, the working procedures are multiple and time is consumed. For this purpose, the optical character recognition method comprises the steps of determining a first text box with character information as target recognition content from text boxes of an image to be detected, determining a capture area according to the character information of the first text box, coordinates of two opposite angle fixed points and a deflection angle, and screening out a second text box with the character information matched with the target recognition content according to the capture area, so that an optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.

Description

Optical character recognition method, computer readable storage medium and electronic device
Technical Field
The invention relates to the technical field of computers, and particularly provides an optical character recognition method, a computer-readable storage medium and electronic equipment.
Background
Optical Character Recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and then translates the shapes into computer text using character Recognition methods; that is, the process of scanning the text data, then analyzing the image file and obtaining the character and layout information. When an OCR technology is applied to perform optical character recognition on an image to be detected, when the image to be detected is inclined, inclination correction needs to be performed on the image to be detected in advance, then information extraction is performed on the basis of the corrected image to be detected, and the optical character recognition of the image to be detected is completed through the two stages. However, the existing method has the disadvantages that the tilt correction result in the first stage directly affects the information extraction effect in the second stage, and the two stages have more processes, which increases the total duration of the optical character recognition.
Disclosure of Invention
The invention aims to solve the technical problems that the existing optical character recognition for the inclined image to be detected needs to be performed with inclination correction before information extraction, the recognition effect is poor, the working procedures are multiple, and time is consumed.
In a first aspect, the present invention provides an optical character recognition method comprising:
acquiring a text box of an image to be detected and acquiring two diagonal vertex coordinates, a deflection angle and character information of the text box; wherein the two diagonal vertex coordinates comprise an upper left vertex coordinate and a lower right vertex coordinate;
taking the text box with the text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, the coordinates of the two diagonal vertexes and the deflection angle;
aiming at the rest text boxes except the first text box in the text box corresponding to the image to be detected, screening out a second text box positioned in the capture area from the rest text boxes, wherein the character information corresponding to the second text box is matched with the target identification content;
and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.
In some embodiments, said determining a capture area from said text information of said first text box, said two diagonal vertex coordinates and said deflection angle comprises:
determining preset empirical values corresponding to the text information, the two diagonal vertex coordinates and the deflection angle of the first text box, wherein the vertex coordinates at the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex at the upper left corner, the vertex coordinates at the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex at the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;
determining a first direction coordinate of a first vertex of the capture area according to the first direction coordinate of the lower right corner vertex of the first text box and the first experience value, wherein the first vertex is a vertex of the capture area closest to the upper left corner vertex of the first text box;
determining a first direction coordinate of a second vertex of the capture area according to the first direction coordinate of the vertex of the lower right corner and the second empirical value; the second vertex is a diagonal vertex of the first vertex;
determining a second direction coordinate of the first vertex according to the first direction coordinate of the first vertex, the first direction coordinate of the top left vertex, the second direction coordinate of the top left vertex, the deflection angle and the third empirical value;
determining a second direction coordinate of the second vertex according to the first direction coordinate of the second vertex, the first direction coordinate of the vertex of the lower right corner, the second direction coordinate of the vertex of the lower right corner, the deflection angle and the fourth empirical value;
determining the capture area according to the first and second directional coordinates of the first vertex and the first and second directional coordinates of the second vertex.
In some embodiments, the capture area is determined by the following expression:
Figure DEST_PATH_IMAGE001
Figure 891969DEST_PATH_IMAGE002
Figure DEST_PATH_IMAGE003
Figure 894691DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE005
and
Figure 856831DEST_PATH_IMAGE006
respectively representing a first directional coordinate and a second directional coordinate of the first vertex,
Figure DEST_PATH_IMAGE007
and
Figure 219679DEST_PATH_IMAGE008
respectively representing a first directional coordinate and a second directional coordinate of the second vertex,
Figure DEST_PATH_IMAGE009
and
Figure 832932DEST_PATH_IMAGE010
respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,
Figure DEST_PATH_IMAGE011
and
Figure 990244DEST_PATH_IMAGE012
a first directional coordinate and a second directional coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,
Figure DEST_PATH_IMAGE013
a first set of empirical values is expressed that,
Figure 190412DEST_PATH_IMAGE014
a second empirical value is expressed that represents a second empirical value,
Figure DEST_PATH_IMAGE015
a third empirical value is expressed that represents a third empirical value,
Figure 622531DEST_PATH_IMAGE016
represents the fourth empirical value.
In some embodiments, before said determining a capture area from said text information of said first text box, said two diagonal vertex coordinates and said deflection angle, said method further comprises:
setting actual capture areas covering the second text box aiming at the first text box and the second text box under different deflection angles respectively;
obtaining coordinates of the two diagonal vertexes of the first text box and coordinates of a third vertex and a fourth vertex of the actual capturing area, wherein the third vertex is the vertex of the actual capturing area closest to the vertex of the upper left corner of the first text box, and the fourth vertex is the diagonal vertex of the third vertex;
and determining the preset empirical value according to the deflection angle, the coordinates of the two diagonal vertexes of the first text box, the coordinates of the third vertex and the fourth vertex, and correspondingly storing the preset empirical value with the character information, the deflection angle and the coordinates of the two diagonal vertexes of the first text box.
In some embodiments, said determining said preset empirical value based on said deflection angle, said two diagonal vertex coordinates of said first text box, said third vertex and said fourth vertex coordinates comprises:
determining a first intersection point according to an intersection point of a first auxiliary line and a first extension line, wherein the first auxiliary line passes through the third vertex of the actual capture area and is parallel to the second direction, and the first extension line is an extension line passing through the top left vertex and the top right vertex of the first text box;
determining a second intersection point according to an intersection point of a second auxiliary line passing through the fourth vertex of the actual capturing area and being parallel to the second direction and a second extended line passing through a lower right corner vertex of the first text box and having an inclination angle equal to the deflection angle;
constructing a position relation model of the actual capturing area and the first text box according to the first intersection point, the second intersection point, the preset experience value and the deflection angle;
determining the preset empirical value according to the two diagonal vertex coordinates of the first text box and the coordinates of the third vertex and the fourth vertex of the actual capturing area based on the positional relationship model;
the preset empirical value includes a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the position relationship model is expressed as:
Figure DEST_PATH_IMAGE017
Figure 824711DEST_PATH_IMAGE018
Figure DEST_PATH_IMAGE019
Figure 684082DEST_PATH_IMAGE020
Figure DEST_PATH_IMAGE021
Figure 840388DEST_PATH_IMAGE022
wherein, the first and the second end of the pipe are connected with each other,
Figure DEST_PATH_IMAGE023
and
Figure 76198DEST_PATH_IMAGE024
first and second directional coordinates respectively representing the third vertex of the actual capture area,
Figure DEST_PATH_IMAGE025
and
Figure 416042DEST_PATH_IMAGE026
first and second direction coordinates respectively representing the fourth vertex of the actual capture area,
Figure DEST_PATH_IMAGE027
a second directional coordinate representing the first intersection point,
Figure 180735DEST_PATH_IMAGE028
a second directional coordinate representing the second intersection point,
Figure 808026DEST_PATH_IMAGE009
and
Figure 785209DEST_PATH_IMAGE010
respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,
Figure 932288DEST_PATH_IMAGE011
and
Figure 602303DEST_PATH_IMAGE012
a first directional coordinate and a second directional coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,
Figure DEST_PATH_IMAGE029
a value representing a first of said empirical values is,
Figure 248048DEST_PATH_IMAGE030
a second empirical value is expressed that represents a second empirical value,
Figure DEST_PATH_IMAGE031
a third empirical value is expressed that represents a third empirical value,
Figure 543769DEST_PATH_IMAGE032
represents the fourth empirical value.
In some embodiments, said deriving an optical character recognition result from at least said text information of said first text box and said text information of said second text box comprises:
when one second text box is obtained, splicing the character information of the first text box and the character information of the second text box into a dictionary to obtain an optical character recognition result;
when a plurality of second text boxes are obtained, sorting the plurality of second text boxes from near to far according to the distance between the top left corner vertex or the bottom right corner vertex of each second text box and the top left corner vertex or the bottom right corner vertex of the first text box;
splicing the character information of the first text box and the character information of the second text boxes into a dictionary based on the sequence of the second text boxes after sequencing to obtain an optical character recognition result;
alternatively, the first and second electrodes may be,
when a plurality of second text boxes are obtained, sorting according to the size of the first-direction coordinates of the top-left corner vertex or the bottom-right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinate of the top left corner vertex or the bottom right corner vertex of the second text boxes;
and splicing the character information of the first text box and the sequenced character information of the plurality of second text boxes into a dictionary to obtain an optical character recognition result.
In some embodiments, the obtaining a text box of an image to be detected and obtaining two diagonal vertex coordinates, a deflection angle and text information of the text box includes:
inputting the image to be detected into a text detection model to obtain a text box area image of the image to be detected, and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area image;
and inputting the text box area graph into a character recognition model to obtain character information corresponding to the text box.
In some embodiments, inputting the image to be detected into a text detection model to obtain a text box area map of the image to be detected, and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area map, includes:
inputting the image to be detected into a backbone network of the text detection model to obtain a plurality of initial characteristic graphs with different sizes;
respectively up-sampling a plurality of initial feature maps with different sizes by using a feature map pyramid network of the text detection model and inputting sampling results into a cascade layer of the text detection model;
obtaining a characteristic diagram to be detected through the cascade layer;
respectively inputting the feature map to be detected into a probability prediction branch network and a threshold prediction branch network of the text detection model to obtain a probability prediction map and a threshold map of the feature map to be detected;
inputting the probability prediction graph and the threshold value graph into a differentiable binarization layer of the text detection model to obtain an approximate binarization graph;
and obtaining a text box of the image to be detected, the two diagonal vertex coordinates of the text box and the deflection angle according to the approximate binary image, and intercepting the text box area image from the approximate binary image according to the two diagonal vertex coordinates of the text box and the deflection angle.
In a second aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the optical character recognition method of any one of the above.
In a third aspect, the invention provides an electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, implements the optical character recognition method of any of the above.
Under the condition of adopting the technical scheme, the optical character recognition method and the optical character recognition system can determine the first text box with the character information as the target recognition content from the text boxes of the image to be detected, determine the capture area according to the character information of the first text box, the coordinates of the two opposite angle fixed points and the deflection angle, and screen out the second text box with the character information matched with the target recognition content according to the capture area, so that the optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.
Drawings
Preferred embodiments of the present invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart illustrating an optical character recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image to be detected according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for determining a predetermined empirical value according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a sample image provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of another sample image provided by an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an optical character recognition method according to another embodiment of the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Optical Character Recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and then translates the shapes into computer text using character Recognition methods; namely, the process of scanning the text data, then analyzing and processing the image file and obtaining the character and layout information. When an OCR technology is applied to perform optical character recognition on an image to be detected, when the image to be detected inclines, the image to be detected needs to be subjected to inclination correction in advance, then information extraction is performed on the basis of the corrected image to be detected, and the optical character recognition of the image to be detected is completed through the two stages. However, the existing method has the disadvantages that the tilt correction result in the first stage directly affects the information extraction effect in the second stage, and the two stages have more processes, which increases the total duration of the optical character recognition.
In view of the above, the present invention provides an optical character recognition method, which determines a first text box with text information as target recognition content from text boxes of an image to be detected, determines a capture area according to the text information of the first text box, coordinates of two opposite angular fixed points and a deflection angle, and screens out a second text box with text information matched with the target recognition content according to the capture area, so as to obtain an optical character recognition result at least according to the text information of the first text box and the text information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.
Referring to fig. 1, fig. 1 is a schematic flow chart of an optical character recognition method according to an embodiment of the present invention, which may include:
step S11: acquiring a text box of an image to be detected and acquiring coordinates of two opposite angle vertexes, deflection angles and character information of the text box; the two diagonal vertex coordinates comprise a top left corner vertex coordinate and a bottom right corner vertex coordinate;
step S12: taking a text box with text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, coordinates of two opposite angle vertexes and a deflection angle;
step S13: screening a second text box located in the capture area from other text boxes aiming at other text boxes except the first text box in the text box corresponding to the image to be detected, wherein the text information corresponding to the second text box is matched with the target identification content;
step S14: and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.
Referring to fig. 2, fig. 2 is a schematic diagram of an image to be detected according to a specific example of the present invention, which may establish a 0-xy two-dimensional coordinate system, and obtain two diagonal vertex coordinates of a middle text box a and a text box B of the image to be detected in the two-dimensional coordinate system, where the two diagonal vertex coordinates of the text box a may be represented as: vertex coordinates of upper left corner is (A) x1 ,A y1 ) The vertex coordinate of the lower right corner is (A) x2 ,A y2 ) Angle of deflection theta 1 (ii) a The two diagonal vertex coordinates for text box B may be expressed as: the vertex coordinate of the upper left corner is (B) x1 ,B y1 ) The vertex coordinate of the lower right corner is (B) x2 ,B y2 ) Angle of deflection θ 2
In some embodiments, step S11 may be specifically to directly obtain, according to the pre-stored data, the text box of the image to be detected, and the coordinates, the deflection angle, and the text information of the two diagonal vertices of the text box. In other embodiments, the text box of the image to be detected, the coordinates of the two diagonal vertices of the text box, and the deflection angle may also be obtained by using a text detection model, and the text information of the image to be detected may be obtained by using a text recognition model, which may be specifically described in another embodiment of the present invention below.
The coordinates of two diagonal vertices, the deflection angle, and the text information of the text box may be stored in a list, and the obtained list may be represented as { [ (a) in fig. 2 as an example x1 ,A y1 ), (A x2 ,A y2 ),angle_θ 1 Text _ "legal representative"],[ (B x1 ,B y1 ), (B x2 ,B y2 ) ,angle_θ 2 Text _ "Li Ming"]}。
In some embodiments, the step S12 of using the text box with the text information as the target identification content as the first text box may specifically be: and screening out a text box with the character information as the target identification content from the plurality of text boxes of the acquired image to be detected by adopting a traversal method according to the target identification content, and taking the text box as a first text box.
Taking fig. 2 as an example, when a legal representative in a document needs to be identified, the "legal representative" can be taken as target identification content, the text box a corresponding to the "legal representative" as a first text box, and "li ming" as another target identification content matching the "legal representative".
In some embodiments, the determining the capture area according to the text information of the first text box, the coordinates of the two diagonal vertices and the deflection angle in step S12 includes:
determining preset empirical values corresponding to character information, two diagonal vertex coordinates and a deflection angle of the first text box, wherein the vertex coordinates of the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex of the upper left corner, the vertex coordinates of the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex of the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;
determining a first direction coordinate of a first vertex of the capturing area according to a first direction coordinate and a first experience value of a vertex at the lower right corner of the first text box, wherein the first vertex is a vertex with the closest distance from the capturing area to the vertex at the upper left corner of the first text box;
determining a first direction coordinate of a second vertex of the capture area according to the first direction coordinate of the vertex of the lower right corner and a second empirical value; the second vertex is a diagonal vertex of the first vertex;
determining a second direction coordinate of the first vertex according to the first direction coordinate of the first vertex, the first direction coordinate of the top left vertex, the second direction coordinate of the top left vertex, the deflection angle and a third empirical value;
determining a second direction coordinate of the second vertex according to the first direction coordinate of the second vertex, the first direction coordinate of the vertex of the lower right corner, the second direction coordinate of the vertex of the lower right corner, the deflection angle and the fourth empirical value;
the capture area is determined from the first and second directional coordinates of the first vertex and the first and second directional coordinates of the second vertex.
A two-dimensional coordinate system may be constructed in advance based on a first direction and a second direction that are perpendicular to each other, taking fig. 2 as an example, the first direction may be an x direction, and the second direction may be a y direction, so as to obtain two diagonal vertex coordinates of the text box in the corresponding coordinate system.
In some embodiments, the capture area may be determined by the following expression:
Figure 60201DEST_PATH_IMAGE001
Figure 901118DEST_PATH_IMAGE002
Figure 253733DEST_PATH_IMAGE003
Figure 838299DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure 209237DEST_PATH_IMAGE005
and
Figure 955476DEST_PATH_IMAGE006
respectively representing a first directional coordinate and a second directional coordinate of the first vertex,
Figure 779076DEST_PATH_IMAGE007
and
Figure 416599DEST_PATH_IMAGE008
respectively representing a first directional coordinate and a second directional coordinate of the second vertex,
Figure DEST_PATH_IMAGE033
and
Figure 173203DEST_PATH_IMAGE034
respectively representing a first direction coordinate and a second direction coordinate of a vertex at the upper left corner,
Figure 90343DEST_PATH_IMAGE011
and
Figure 417550DEST_PATH_IMAGE012
respectively, a first direction coordinate and a second direction coordinate of a vertex of a lower right corner, theta represents a deflection angle,
Figure 343918DEST_PATH_IMAGE013
a first empirical value is represented that is,
Figure DEST_PATH_IMAGE035
a second empirical value is represented that is,
Figure 955028DEST_PATH_IMAGE036
a third empirical value is represented that is,
Figure DEST_PATH_IMAGE037
a fourth empirical value is indicated.
In the embodiment of the present invention, referring to fig. 3, the preset empirical value may be determined by the following steps before step S12:
step S31: respectively setting actual capturing areas covering the second text box aiming at the first text box and the second text box under different deflection angles;
step S32: acquiring coordinates of two opposite angle vertexes of the first text box and coordinates of a third vertex and a fourth vertex of the actual capturing area, wherein the third vertex is the vertex of the actual capturing area with the closest distance from the vertex at the upper left corner of the first text box, and the fourth vertex is the opposite angle vertex of the third vertex;
step S33: and determining a preset experience value according to the deflection angle, the coordinates of the two diagonal vertexes of the first text box, the coordinates of the third vertex and the coordinates of the fourth vertex, and correspondingly storing the preset experience value and the character information, the deflection angle and the coordinates of the two diagonal vertexes of the first text box.
Referring to fig. 4, fig. 4 is a schematic diagram of a sample image provided by an embodiment of the present invention, on which a preset empirical value may be determined.
And establishing a two-dimensional coordinate system in a first direction a and a second direction b which are perpendicular to each other, wherein the sample image comprises a first text box C and a second text box D, and the actual capturing area is a dotted line rectangular box. Wherein, two diagonal vertex coordinates that can obtain the first text box C are respectively (a) a1 ,A b1 ) And (A) a2 ,A b2 ) The coordinates of two diagonal vertexes of the second text box D are respectively (B) a1 ,B b1 ) And (B) a2 ,B b2 ) The deflection angles of the first text box C and the second text box D are both theta, and the third vertex coordinate of the actual capture area is (M) a01 ,M b01 ) And (M) a02 ,M b02 )。
In some embodiments, step S31 may be embodied as adjusting the size of the actual capture area as needed on the premise of ensuring that the actual capture area can completely cover the second text box and not cover other text boxes. Wherein the actual capture area may be rectangular and the same as the deflection angle of the first text box. In other embodiments, the capture area may take on other shapes as well.
In some embodiments, determining the preset empirical value according to the deflection angle, the coordinates of the two diagonal vertices of the first text box, the coordinates of the third vertex and the fourth vertex in step S33 includes:
determining a first intersection point according to the intersection point of the first auxiliary line and the first extension line, wherein the first auxiliary line passes through a third vertex of the actual capture area and is parallel to the second direction, and the first extension line is an extension line of a vertex at the upper left corner and a vertex at the upper right corner of the first text box;
determining a second intersection point according to an intersection point of a second auxiliary line and a second extension line, wherein the second auxiliary line passes through a fourth vertex of the actual capturing area and is parallel to the second direction, the second extension line passes through a vertex of a lower right corner of the first text box, and an inclination angle of the second extension line is equal to the deflection angle;
constructing a position relation model of the actual capture area and the first text box according to the first intersection point, the second intersection point, the preset empirical value and the deflection angle;
based on the position relation model, determining a preset empirical value according to the coordinates of two diagonal vertexes of the first text box and the coordinates of a third vertex and a fourth vertex of the actual capturing area;
the preset empirical value comprises a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the position relationship model is expressed as:
Figure 354654DEST_PATH_IMAGE038
Figure DEST_PATH_IMAGE039
Figure 418425DEST_PATH_IMAGE040
Figure DEST_PATH_IMAGE041
Figure 430375DEST_PATH_IMAGE042
Figure 99253DEST_PATH_IMAGE022
wherein the content of the first and second substances,
Figure 889355DEST_PATH_IMAGE023
and
Figure DEST_PATH_IMAGE043
first and second direction coordinates respectively representing third vertices of the actual capture area,
Figure 961128DEST_PATH_IMAGE025
and
Figure 229298DEST_PATH_IMAGE044
first and second direction coordinates respectively representing a fourth vertex of the actual capturing area,
Figure DEST_PATH_IMAGE045
a second directional coordinate representing the first intersection point,
Figure 549421DEST_PATH_IMAGE046
a second directional coordinate representing a second intersection point,
Figure 510424DEST_PATH_IMAGE033
and
Figure 768361DEST_PATH_IMAGE034
respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,
Figure 840222DEST_PATH_IMAGE011
and
Figure 14852DEST_PATH_IMAGE012
a first direction coordinate and a second direction coordinate respectively representing a vertex of a lower right corner, theta represents a deflection angle,
Figure DEST_PATH_IMAGE047
a first empirical value is represented that is,
Figure 396023DEST_PATH_IMAGE035
represents the second warpThe value of the test is tested,
Figure 656103DEST_PATH_IMAGE036
a third empirical value is represented that is,
Figure 266076DEST_PATH_IMAGE037
a fourth empirical value is indicated.
Referring to fig. 5, a perpendicular line to the first auxiliary line may be drawn at the top left vertex of the first text box C, and the first intersection point E, the top left vertex of the first text box C, and the intersection point of the perpendicular line and the first auxiliary line may form a right triangle, wherein the angle between the first extension line and the perpendicular line is equal to the deflection angle θ, based on which the relation formula in the position relation model may be determined:
Figure 295212DEST_PATH_IMAGE019
similarly, a perpendicular line to the second auxiliary line may be made at the top of the lower right corner of the first text box C, the second intersection point F, the top of the lower right corner of the first text box C, and the intersection point of the perpendicular line and the second auxiliary line may form a right triangle, where an included angle between the second extension line and the perpendicular line is equal to the deflection angle θ, and based on this, a relational expression in the position relation model may be determined
Figure 83171DEST_PATH_IMAGE048
In some embodiments, step S13 may specifically be:
according to the first direction coordinate of the first vertex of the capture area
Figure DEST_PATH_IMAGE049
And first direction coordinates of the second vertex
Figure 96126DEST_PATH_IMAGE007
Judging that the first direction coordinate of the vertex except the first text box and the top left corner in the text box is less than
Figure 509790DEST_PATH_IMAGE049
And the first direction coordinate of the vertex of the lower right corner is larger than
Figure 393432DEST_PATH_IMAGE007
And the text box is taken as a second text box.
As an example, when the text information of the first text box is the legal representative, a text box with the text information of the specific name "li ming" of the legal representative is correspondingly determined to be used as a second text box; and when the text information of the first text box is the 'identification card number', correspondingly determining that the text box with the text information as the number of the specific identification card number is used as a second text box.
In some embodiments, step S14 may specifically be: and when a second text box is obtained, splicing the character information of the first text box and the character information of the second text box into a dictionary to obtain an optical character recognition result.
As an example, the text information of the first text box is "legal representative", and the text information of the second text box is "Li Ming", so that the dictionary of { "legal representative" can be obtained: "Li Ming" }.
In other embodiments, step S14 may specifically be: when a plurality of second text boxes are obtained, sorting the plurality of second text boxes from near to far according to the distance between the top left corner vertex or the bottom right corner vertex of each second text box and the top left corner vertex or the bottom right corner vertex of the first text box;
and splicing the character information of the first text box and the character information of the second text boxes into a dictionary based on the sequence of the second text boxes after sequencing to obtain an optical character recognition result.
As an example, the plurality of second text boxes are ordered according to the distance between the top left corner vertex of the plurality of second text boxes and the top left corner vertex of the first text box from near to far, so as to obtain a second text box B 1 <Second text box B 2 <Second text box B 3 After the text information of the first text box and the text information of the plurality of second text boxes are spliced into a dictionary, a text _ a: text _ B 1 + text_ B 2 + text_ B 3
In other embodiments, step S14 may specifically be: when a plurality of second text boxes are obtained, sorting according to the size of the first direction coordinates of the top left corner vertex or the bottom right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinates of the top left corner vertex or the bottom right corner vertex of the second text boxes;
and splicing the character information of the first text box and the character information of the plurality of sequenced second text boxes into a dictionary to obtain an optical character recognition result.
Wherein the ordering of the plurality of second text boxes may be determined according to the relative positions of the first text box and the second text box. As an example, when the first text box is on the left side of the second text boxes and the first direction coordinate of the top left vertex of the first text box is smaller than the first direction coordinates of the top left vertices of all the second text boxes, the plurality of second text boxes may be ordered from small to large according to the first direction coordinates of the top left vertices of the second text boxes.
The optical character recognition method provided by the embodiment of the invention determines the first text box with the character information as the target recognition content from the text boxes of the image to be detected, determines the capture area according to the character information of the first text box, the coordinates of the two opposite angular fixed points and the deflection angle, and screens out the second text box with the character information matched with the target recognition content according to the capture area, so that the optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.
Referring to fig. 6, fig. 6 is a schematic flow chart of an optical character recognition method according to another embodiment of the present invention, which may include:
step S61: inputting an image to be detected into a text detection model to obtain a text box area image of the image to be detected and two diagonal vertex coordinates and a deflection angle of each text box in the text box area image;
step S62: inputting the text box area image into a character recognition model to obtain character information corresponding to the text box;
step S63: taking a text box with text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, coordinates of two opposite angle vertexes and a deflection angle;
step S64: screening a second text box positioned in the capture area from the rest text boxes aiming at the rest text boxes except the first text box in the text box corresponding to the image to be detected, wherein the character information corresponding to the second text box is matched with the target identification content;
step S65: and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.
Steps S63 to S65 may be performed in the same manner as steps S12 to S14, and for brevity, steps S61 and S62 will be mainly described hereinafter.
In some embodiments, step S61 may specifically be:
inputting an image to be detected into a backbone network of a text detection model to obtain a plurality of initial characteristic graphs with different sizes;
respectively up-sampling a plurality of initial feature maps with different sizes by using a feature map pyramid network of a text detection model and inputting sampling results into a cascade layer of the text detection model;
obtaining a characteristic diagram to be detected through a cascade layer;
respectively inputting the feature map to be detected into a probability prediction branch network and a threshold prediction branch network of the text detection model to obtain a probability prediction map and a threshold map of the feature map to be detected;
inputting the probability prediction graph and the threshold graph into a differentiable binarization layer of the text detection model to obtain an approximate binarization graph;
and obtaining a text box of the image to be detected, two diagonal vertex coordinates and a deflection angle of the text box according to the approximate binary image, and intercepting a text box area image from the approximate binary image according to the two diagonal vertex coordinates and the deflection angle of the text box.
Wherein the two diagonal vertex coordinates may include an upper left vertex coordinate and a lower right vertex coordinate.
As an example, the text detection model may employ a DBNet (differential localization Net) model.
In some embodiments, after the text box region map is obtained, the text box region map may be further labeled based on the obtained diagonal vertex coordinates and deflection angle of each text box.
In some embodiments, step S62 may specifically be inputting the labeled text box area map into the character recognition model to obtain character information corresponding to the text box.
As an example, the word recognition model may employ a DenseNet model.
The optical character recognition method provided in another embodiment of the present invention can achieve the same beneficial effects as those of the embodiment corresponding to fig. 1, and combines the text detection model and the character recognition model, so that the preprocessing step of the image to be detected can be reduced, and the optical character recognition can be directly performed on the image to be detected.
Another aspect of the present invention also provides a computer-readable storage medium, in which a computer program is stored, and the computer program can implement the optical character recognition method in any one of the above embodiments when executed by a processor. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.
In another aspect of the present invention, an electronic device is further provided, which includes: a memory and a processor, the memory having stored therein a computer program, the computer program when executed by the processor implementing the optical character recognition method as described in any of the above embodiments.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An optical character recognition method, comprising:
acquiring a text box of an image to be detected and acquiring two diagonal vertex coordinates, a deflection angle and character information of the text box; wherein the two diagonal vertex coordinates comprise an upper left vertex coordinate and a lower right vertex coordinate;
taking the text box with the text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, the coordinates of the two diagonal vertexes and the deflection angle;
aiming at the rest text boxes except the first text box in the text box corresponding to the image to be detected, screening out a second text box positioned in the capture area from the rest text boxes, wherein the character information corresponding to the second text box is matched with the target identification content;
and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.
2. The method of claim 1, wherein determining a capture area based on the text information of the first text box, the two diagonal vertex coordinates, and the deflection angle comprises:
determining preset empirical values corresponding to the text information of the first text box, the two diagonal vertex coordinates and the deflection angle, wherein the vertex coordinates at the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex at the upper left corner, the vertex coordinates at the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex at the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;
determining a first direction coordinate of a first vertex of the capture area according to the first direction coordinate of the lower right corner vertex of the first text box and the first experience value, wherein the first vertex is a vertex of the capture area closest to the upper left corner vertex of the first text box;
determining a first direction coordinate of a second vertex of the capture area according to the first direction coordinate of the vertex of the lower right corner and the second empirical value; the second vertex is a diagonal vertex of the first vertex;
determining a second direction coordinate of the first vertex according to the first direction coordinate of the first vertex, the first direction coordinate of the top left vertex, the second direction coordinate of the top left vertex, the deflection angle and the third empirical value;
determining a second direction coordinate of the second vertex according to the first direction coordinate of the second vertex, the first direction coordinate of the vertex of the lower right corner, the second direction coordinate of the vertex of the lower right corner, the deflection angle and the fourth empirical value;
determining the capture area according to the first and second directional coordinates of the first vertex and the first and second directional coordinates of the second vertex.
3. The method of claim 2, wherein the capture area is determined by the expression:
Figure 394695DEST_PATH_IMAGE001
Figure 108574DEST_PATH_IMAGE002
Figure 170202DEST_PATH_IMAGE003
Figure 830990DEST_PATH_IMAGE004
wherein the content of the first and second substances,
Figure 176521DEST_PATH_IMAGE005
and
Figure 530142DEST_PATH_IMAGE006
respectively representing a first direction coordinate and a second direction coordinate of the first vertex,
Figure 577601DEST_PATH_IMAGE007
and
Figure 307660DEST_PATH_IMAGE008
respectively representing first and second directional coordinates of the second vertex,
Figure 507697DEST_PATH_IMAGE009
and
Figure 32219DEST_PATH_IMAGE010
respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,
Figure 52128DEST_PATH_IMAGE011
and
Figure 71030DEST_PATH_IMAGE012
a first direction coordinate and a second direction coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,
Figure 125574DEST_PATH_IMAGE013
a value representing a first of said empirical values is,
Figure 86577DEST_PATH_IMAGE014
a second empirical value is expressed that represents a second empirical value,
Figure 859361DEST_PATH_IMAGE015
a third empirical value is expressed that represents a third empirical value,
Figure 649331DEST_PATH_IMAGE016
represents the fourth empirical value.
4. The method of claim 2 or 3, wherein before determining a capture area based on the text information of the first text box, the two diagonal vertex coordinates, and the deflection angle, the method further comprises:
setting actual capture areas covering the second text box aiming at the first text box and the second text box under different deflection angles respectively;
obtaining coordinates of the two diagonal vertexes of the first text box and coordinates of a third vertex and a fourth vertex of the actual capturing area, wherein the third vertex is the vertex of the actual capturing area closest to the vertex of the upper left corner of the first text box, and the fourth vertex is the diagonal vertex of the third vertex;
and determining the preset empirical value according to the deflection angle, the coordinates of the two diagonal vertexes of the first text box, the coordinates of the third vertex and the fourth vertex, and correspondingly storing the preset empirical value with the character information, the deflection angle and the coordinates of the two diagonal vertexes of the first text box.
5. The method of claim 4, wherein determining the preset empirical value based on the deflection angle, the coordinates of the two diagonal vertices of the first text box, the coordinates of the third vertex and the fourth vertex comprises:
determining a first intersection point according to an intersection point of a first auxiliary line and a first extension line, wherein the first auxiliary line passes through the third vertex of the actual capture area and is parallel to the second direction, and the first extension line is an extension line passing through the top left vertex and the top right vertex of the first text box;
determining a second intersection point according to an intersection point of a second auxiliary line and a second extended line, wherein the second auxiliary line passes through the fourth vertex of the actual capture area and is parallel to the second direction, the second extended line passes through the vertex of the lower right corner of the first text box, and the inclination angle of the second extended line is equal to the deflection angle;
constructing a position relation model of the actual capturing area and the first text box according to the first intersection point, the second intersection point, the preset empirical value and the deflection angle;
determining the preset empirical value according to the two diagonal vertex coordinates of the first text box and the coordinates of the third vertex and the fourth vertex of the actual capturing area based on the positional relationship model;
the preset empirical value includes a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the positional relationship model is expressed as:
Figure 823960DEST_PATH_IMAGE017
Figure 690285DEST_PATH_IMAGE018
Figure 950365DEST_PATH_IMAGE019
Figure 311071DEST_PATH_IMAGE020
Figure 340207DEST_PATH_IMAGE021
Figure 377433DEST_PATH_IMAGE022
wherein, the first and the second end of the pipe are connected with each other,
Figure 859230DEST_PATH_IMAGE023
and
Figure 272893DEST_PATH_IMAGE024
first and second direction coordinates respectively representing the third vertex of the actual capture area,
Figure 411663DEST_PATH_IMAGE025
and
Figure 885369DEST_PATH_IMAGE026
first and second direction coordinates respectively representing the fourth vertex of the actual capture area,
Figure 588883DEST_PATH_IMAGE027
a second directional coordinate representing the first intersection point,
Figure 540659DEST_PATH_IMAGE028
a second directional coordinate representing the second intersection point,
Figure 295119DEST_PATH_IMAGE029
and
Figure 939727DEST_PATH_IMAGE030
respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,
Figure 396116DEST_PATH_IMAGE031
and
Figure 151583DEST_PATH_IMAGE032
a first directional coordinate and a second directional coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,
Figure 993506DEST_PATH_IMAGE033
a value representing a first of said empirical values is,
Figure 809015DEST_PATH_IMAGE034
a second empirical value is expressed that is representative of the second empirical value,
Figure 487121DEST_PATH_IMAGE035
a third empirical value is expressed that represents a third empirical value,
Figure 46278DEST_PATH_IMAGE036
represents the fourth empirical value.
6. The method of claim 1, wherein obtaining an optical character recognition result based on at least the text information of the first text box and the text information of the second text box comprises:
when one second text box is obtained, splicing the character information of the first text box and the character information of the second text box into a dictionary to obtain an optical character recognition result;
when a plurality of second text boxes are obtained, sorting the plurality of second text boxes from near to far according to the distance between the top left corner vertex or the bottom right corner vertex of each second text box and the top left corner vertex or the bottom right corner vertex of the first text box;
splicing the character information of the first text box and the character information of the second text boxes into a dictionary based on the sequence of the second text boxes after sequencing to obtain an optical character recognition result;
alternatively, the first and second electrodes may be,
when a plurality of second text boxes are obtained, sorting according to the size of the first-direction coordinates of the top-left corner vertex or the bottom-right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinates of the top left corner vertex or the bottom right corner vertex of the second text boxes;
and splicing the character information of the first text box and the character information of the plurality of sequenced second text boxes into a dictionary to obtain an optical character recognition result.
7. The method of claim 1, wherein the obtaining a text box of an image to be detected and obtaining two diagonal vertex coordinates, a deflection angle and text information of the text box comprises:
inputting the image to be detected into a text detection model to obtain a text box area image of the image to be detected, and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area image;
and inputting the text box area graph into a character recognition model to obtain character information corresponding to the text box.
8. The method according to claim 7, wherein inputting the image to be detected into a text detection model to obtain a text box area map of the image to be detected and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area map comprises:
inputting the image to be detected into a backbone network of the text detection model to obtain a plurality of initial feature maps with different sizes;
respectively up-sampling a plurality of initial feature maps with different sizes by using a feature map pyramid network of the text detection model and inputting sampling results into a cascade layer of the text detection model;
obtaining a characteristic diagram to be detected through the cascade layer;
respectively inputting the feature map to be detected into a probability prediction branch network and a threshold prediction branch network of the text detection model to obtain a probability prediction map and a threshold map of the feature map to be detected;
inputting the probability prediction graph and the threshold graph into a differentiable binarization layer of the text detection model to obtain an approximate binarization graph;
and obtaining a text box of the image to be detected, the two diagonal vertex coordinates of the text box and the deflection angle according to the approximate binary image, and intercepting the text box area image from the approximate binary image according to the two diagonal vertex coordinates of the text box and the deflection angle.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the optical character recognition method according to any one of claims 1 to 8.
10. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the optical character recognition method of any one of claims 1 to 8.
CN202210954615.2A 2022-08-10 2022-08-10 Optical character recognition method, computer readable storage medium and electronic device Pending CN115331229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210954615.2A CN115331229A (en) 2022-08-10 2022-08-10 Optical character recognition method, computer readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210954615.2A CN115331229A (en) 2022-08-10 2022-08-10 Optical character recognition method, computer readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN115331229A true CN115331229A (en) 2022-11-11

Family

ID=83921764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210954615.2A Pending CN115331229A (en) 2022-08-10 2022-08-10 Optical character recognition method, computer readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN115331229A (en)

Similar Documents

Publication Publication Date Title
US20150347837A1 (en) Model-based dewarping method and apparatus
CN111914834A (en) Image recognition method and device, computer equipment and storage medium
US20160253573A1 (en) Automatically Capturing and Cropping Image of Check from Video Sequence for Banking or other Computing Application
CN105308944A (en) Classifying objects in images using mobile devices
CN112734641A (en) Training method and device of target detection model, computer equipment and medium
CN110647882A (en) Image correction method, device, equipment and storage medium
CN111325104A (en) Text recognition method, device and storage medium
JP2019102061A5 (en)
JP2019102061A (en) Text line segmentation method
US7110568B2 (en) Segmentation of a postal object digital image by Hough transform
CN112613506A (en) Method and device for recognizing text in image, computer equipment and storage medium
CN112507782A (en) Text image recognition method and device
CN115359239A (en) Wind power blade defect detection and positioning method and device, storage medium and electronic equipment
CN109635729B (en) Form identification method and terminal
CN114511865A (en) Method and device for generating structured information and computer readable storage medium
CN113221897B (en) Image correction method, image text recognition method, identity verification method and device
CN112580499A (en) Text recognition method, device, equipment and storage medium
CN115457559B (en) Method, device and equipment for intelligently correcting texts and license pictures
CN115331229A (en) Optical character recognition method, computer readable storage medium and electronic device
CN111738979A (en) Automatic certificate image quality inspection method and system
CN115205113A (en) Image splicing method, device, equipment and storage medium
CN114926829A (en) Certificate detection method and device, electronic equipment and storage medium
CN112418210B (en) Intelligent classification method for tower inspection information
CN114359352A (en) Image processing method, apparatus, device, storage medium, and computer program product
CN113159029A (en) Method and system for accurately capturing local information in picture

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination