CN115331229A

CN115331229A - Optical character recognition method, computer readable storage medium and electronic device

Info

Publication number: CN115331229A
Application number: CN202210954615.2A
Authority: CN
Inventors: 王闯
Original assignee: Beijing Jiehui Technology Co Ltd
Current assignee: Beijing Jiehui Technology Co Ltd
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2022-11-11

Abstract

The invention relates to the technical field of computers, in particular to an optical character recognition method, a computer readable storage medium and electronic equipment, aiming at solving the problems that the existing optical character recognition method aiming at an inclined image to be detected needs to be performed with inclination correction before information extraction, the recognition effect is poor, the working procedures are multiple and time is consumed. For this purpose, the optical character recognition method comprises the steps of determining a first text box with character information as target recognition content from text boxes of an image to be detected, determining a capture area according to the character information of the first text box, coordinates of two opposite angle fixed points and a deflection angle, and screening out a second text box with the character information matched with the target recognition content according to the capture area, so that an optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.

Description

Optical character recognition method, computer readable storage medium and electronic device

Technical Field

The invention relates to the technical field of computers, and particularly provides an optical character recognition method, a computer-readable storage medium and electronic equipment.

Background

Optical Character Recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and then translates the shapes into computer text using character Recognition methods; that is, the process of scanning the text data, then analyzing the image file and obtaining the character and layout information. When an OCR technology is applied to perform optical character recognition on an image to be detected, when the image to be detected is inclined, inclination correction needs to be performed on the image to be detected in advance, then information extraction is performed on the basis of the corrected image to be detected, and the optical character recognition of the image to be detected is completed through the two stages. However, the existing method has the disadvantages that the tilt correction result in the first stage directly affects the information extraction effect in the second stage, and the two stages have more processes, which increases the total duration of the optical character recognition.

Disclosure of Invention

The invention aims to solve the technical problems that the existing optical character recognition for the inclined image to be detected needs to be performed with inclination correction before information extraction, the recognition effect is poor, the working procedures are multiple, and time is consumed.

In a first aspect, the present invention provides an optical character recognition method comprising:

acquiring a text box of an image to be detected and acquiring two diagonal vertex coordinates, a deflection angle and character information of the text box; wherein the two diagonal vertex coordinates comprise an upper left vertex coordinate and a lower right vertex coordinate;

taking the text box with the text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, the coordinates of the two diagonal vertexes and the deflection angle;

aiming at the rest text boxes except the first text box in the text box corresponding to the image to be detected, screening out a second text box positioned in the capture area from the rest text boxes, wherein the character information corresponding to the second text box is matched with the target identification content;

and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.

In some embodiments, said determining a capture area from said text information of said first text box, said two diagonal vertex coordinates and said deflection angle comprises:

determining preset empirical values corresponding to the text information, the two diagonal vertex coordinates and the deflection angle of the first text box, wherein the vertex coordinates at the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex at the upper left corner, the vertex coordinates at the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex at the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;

determining a first direction coordinate of a first vertex of the capture area according to the first direction coordinate of the lower right corner vertex of the first text box and the first experience value, wherein the first vertex is a vertex of the capture area closest to the upper left corner vertex of the first text box;

determining a first direction coordinate of a second vertex of the capture area according to the first direction coordinate of the vertex of the lower right corner and the second empirical value; the second vertex is a diagonal vertex of the first vertex;

determining a second direction coordinate of the first vertex according to the first direction coordinate of the first vertex, the first direction coordinate of the top left vertex, the second direction coordinate of the top left vertex, the deflection angle and the third empirical value;

determining a second direction coordinate of the second vertex according to the first direction coordinate of the second vertex, the first direction coordinate of the vertex of the lower right corner, the second direction coordinate of the vertex of the lower right corner, the deflection angle and the fourth empirical value;

determining the capture area according to the first and second directional coordinates of the first vertex and the first and second directional coordinates of the second vertex.

In some embodiments, the capture area is determined by the following expression:

wherein the content of the first and second substances,

and

respectively representing a first directional coordinate and a second directional coordinate of the first vertex,

and

respectively representing a first directional coordinate and a second directional coordinate of the second vertex,

and

respectively representing a first directional coordinate and a second directional coordinate of the top left corner vertex,

and

a first directional coordinate and a second directional coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,

a first set of empirical values is expressed that,

a second empirical value is expressed that represents a second empirical value,

a third empirical value is expressed that represents a third empirical value,

represents the fourth empirical value.

In some embodiments, before said determining a capture area from said text information of said first text box, said two diagonal vertex coordinates and said deflection angle, said method further comprises:

setting actual capture areas covering the second text box aiming at the first text box and the second text box under different deflection angles respectively;

obtaining coordinates of the two diagonal vertexes of the first text box and coordinates of a third vertex and a fourth vertex of the actual capturing area, wherein the third vertex is the vertex of the actual capturing area closest to the vertex of the upper left corner of the first text box, and the fourth vertex is the diagonal vertex of the third vertex;

and determining the preset empirical value according to the deflection angle, the coordinates of the two diagonal vertexes of the first text box, the coordinates of the third vertex and the fourth vertex, and correspondingly storing the preset empirical value with the character information, the deflection angle and the coordinates of the two diagonal vertexes of the first text box.

In some embodiments, said determining said preset empirical value based on said deflection angle, said two diagonal vertex coordinates of said first text box, said third vertex and said fourth vertex coordinates comprises:

determining a first intersection point according to an intersection point of a first auxiliary line and a first extension line, wherein the first auxiliary line passes through the third vertex of the actual capture area and is parallel to the second direction, and the first extension line is an extension line passing through the top left vertex and the top right vertex of the first text box;

determining a second intersection point according to an intersection point of a second auxiliary line passing through the fourth vertex of the actual capturing area and being parallel to the second direction and a second extended line passing through a lower right corner vertex of the first text box and having an inclination angle equal to the deflection angle;

constructing a position relation model of the actual capturing area and the first text box according to the first intersection point, the second intersection point, the preset experience value and the deflection angle;

determining the preset empirical value according to the two diagonal vertex coordinates of the first text box and the coordinates of the third vertex and the fourth vertex of the actual capturing area based on the positional relationship model;

the preset empirical value includes a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the position relationship model is expressed as:

wherein, the first and the second end of the pipe are connected with each other,

and

first and second directional coordinates respectively representing the third vertex of the actual capture area,

and

first and second direction coordinates respectively representing the fourth vertex of the actual capture area,

a second directional coordinate representing the first intersection point,

a second directional coordinate representing the second intersection point,

and

and

a value representing a first of said empirical values is,

a second empirical value is expressed that represents a second empirical value,

a third empirical value is expressed that represents a third empirical value,

represents the fourth empirical value.

In some embodiments, said deriving an optical character recognition result from at least said text information of said first text box and said text information of said second text box comprises:

when one second text box is obtained, splicing the character information of the first text box and the character information of the second text box into a dictionary to obtain an optical character recognition result;

when a plurality of second text boxes are obtained, sorting the plurality of second text boxes from near to far according to the distance between the top left corner vertex or the bottom right corner vertex of each second text box and the top left corner vertex or the bottom right corner vertex of the first text box;

splicing the character information of the first text box and the character information of the second text boxes into a dictionary based on the sequence of the second text boxes after sequencing to obtain an optical character recognition result;

alternatively, the first and second electrodes may be,

when a plurality of second text boxes are obtained, sorting according to the size of the first-direction coordinates of the top-left corner vertex or the bottom-right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinate of the top left corner vertex or the bottom right corner vertex of the second text boxes;

and splicing the character information of the first text box and the sequenced character information of the plurality of second text boxes into a dictionary to obtain an optical character recognition result.

In some embodiments, the obtaining a text box of an image to be detected and obtaining two diagonal vertex coordinates, a deflection angle and text information of the text box includes:

inputting the image to be detected into a text detection model to obtain a text box area image of the image to be detected, and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area image;

and inputting the text box area graph into a character recognition model to obtain character information corresponding to the text box.

In some embodiments, inputting the image to be detected into a text detection model to obtain a text box area map of the image to be detected, and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area map, includes:

inputting the image to be detected into a backbone network of the text detection model to obtain a plurality of initial characteristic graphs with different sizes;

respectively up-sampling a plurality of initial feature maps with different sizes by using a feature map pyramid network of the text detection model and inputting sampling results into a cascade layer of the text detection model;

obtaining a characteristic diagram to be detected through the cascade layer;

respectively inputting the feature map to be detected into a probability prediction branch network and a threshold prediction branch network of the text detection model to obtain a probability prediction map and a threshold map of the feature map to be detected;

inputting the probability prediction graph and the threshold value graph into a differentiable binarization layer of the text detection model to obtain an approximate binarization graph;

and obtaining a text box of the image to be detected, the two diagonal vertex coordinates of the text box and the deflection angle according to the approximate binary image, and intercepting the text box area image from the approximate binary image according to the two diagonal vertex coordinates of the text box and the deflection angle.

In a second aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the optical character recognition method of any one of the above.

In a third aspect, the invention provides an electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, implements the optical character recognition method of any of the above.

Under the condition of adopting the technical scheme, the optical character recognition method and the optical character recognition system can determine the first text box with the character information as the target recognition content from the text boxes of the image to be detected, determine the capture area according to the character information of the first text box, the coordinates of the two opposite angle fixed points and the deflection angle, and screen out the second text box with the character information matched with the target recognition content according to the capture area, so that the optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.

Drawings

Preferred embodiments of the present invention are described below with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart illustrating an optical character recognition method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an image to be detected according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for determining a predetermined empirical value according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a sample image provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of another sample image provided by an embodiment of the present invention;

FIG. 6 is a flowchart illustrating an optical character recognition method according to another embodiment of the present invention.

Detailed Description

Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

Optical Character Recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and then translates the shapes into computer text using character Recognition methods; namely, the process of scanning the text data, then analyzing and processing the image file and obtaining the character and layout information. When an OCR technology is applied to perform optical character recognition on an image to be detected, when the image to be detected inclines, the image to be detected needs to be subjected to inclination correction in advance, then information extraction is performed on the basis of the corrected image to be detected, and the optical character recognition of the image to be detected is completed through the two stages. However, the existing method has the disadvantages that the tilt correction result in the first stage directly affects the information extraction effect in the second stage, and the two stages have more processes, which increases the total duration of the optical character recognition.

In view of the above, the present invention provides an optical character recognition method, which determines a first text box with text information as target recognition content from text boxes of an image to be detected, determines a capture area according to the text information of the first text box, coordinates of two opposite angular fixed points and a deflection angle, and screens out a second text box with text information matched with the target recognition content according to the capture area, so as to obtain an optical character recognition result at least according to the text information of the first text box and the text information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.

Referring to fig. 1, fig. 1 is a schematic flow chart of an optical character recognition method according to an embodiment of the present invention, which may include:

step S11: acquiring a text box of an image to be detected and acquiring coordinates of two opposite angle vertexes, deflection angles and character information of the text box; the two diagonal vertex coordinates comprise a top left corner vertex coordinate and a bottom right corner vertex coordinate;

step S12: taking a text box with text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, coordinates of two opposite angle vertexes and a deflection angle;

step S13: screening a second text box located in the capture area from other text boxes aiming at other text boxes except the first text box in the text box corresponding to the image to be detected, wherein the text information corresponding to the second text box is matched with the target identification content;

step S14: and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.

Referring to fig. 2, fig. 2 is a schematic diagram of an image to be detected according to a specific example of the present invention, which may establish a 0-xy two-dimensional coordinate system, and obtain two diagonal vertex coordinates of a middle text box a and a text box B of the image to be detected in the two-dimensional coordinate system, where the two diagonal vertex coordinates of the text box a may be represented as: vertex coordinates of upper left corner is (A) _x1 ,A _y1 ) The vertex coordinate of the lower right corner is (A) _x2 ,A _y2 ) Angle of deflection theta ₁ (ii) a The two diagonal vertex coordinates for text box B may be expressed as: the vertex coordinate of the upper left corner is (B) _x1 ,B _y1 ) The vertex coordinate of the lower right corner is (B) _x2 ,B _y2 ) Angle of deflection θ ₂ 。

In some embodiments, step S11 may be specifically to directly obtain, according to the pre-stored data, the text box of the image to be detected, and the coordinates, the deflection angle, and the text information of the two diagonal vertices of the text box. In other embodiments, the text box of the image to be detected, the coordinates of the two diagonal vertices of the text box, and the deflection angle may also be obtained by using a text detection model, and the text information of the image to be detected may be obtained by using a text recognition model, which may be specifically described in another embodiment of the present invention below.

The coordinates of two diagonal vertices, the deflection angle, and the text information of the text box may be stored in a list, and the obtained list may be represented as { [ (a) in fig. 2 as an example _x1 ,A _y1 ), (A _x2 ,A _y2 ),angle_θ ₁ Text _ "legal representative"],[ (B _x1 ,B _y1 ), (B _x2 ,B _y2 ) ,angle_θ ₂ Text _ "Li Ming"]}。

In some embodiments, the step S12 of using the text box with the text information as the target identification content as the first text box may specifically be: and screening out a text box with the character information as the target identification content from the plurality of text boxes of the acquired image to be detected by adopting a traversal method according to the target identification content, and taking the text box as a first text box.

Taking fig. 2 as an example, when a legal representative in a document needs to be identified, the "legal representative" can be taken as target identification content, the text box a corresponding to the "legal representative" as a first text box, and "li ming" as another target identification content matching the "legal representative".

In some embodiments, the determining the capture area according to the text information of the first text box, the coordinates of the two diagonal vertices and the deflection angle in step S12 includes:

determining preset empirical values corresponding to character information, two diagonal vertex coordinates and a deflection angle of the first text box, wherein the vertex coordinates of the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex of the upper left corner, the vertex coordinates of the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex of the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;

determining a first direction coordinate of a first vertex of the capturing area according to a first direction coordinate and a first experience value of a vertex at the lower right corner of the first text box, wherein the first vertex is a vertex with the closest distance from the capturing area to the vertex at the upper left corner of the first text box;

determining a first direction coordinate of a second vertex of the capture area according to the first direction coordinate of the vertex of the lower right corner and a second empirical value; the second vertex is a diagonal vertex of the first vertex;

determining a second direction coordinate of the first vertex according to the first direction coordinate of the first vertex, the first direction coordinate of the top left vertex, the second direction coordinate of the top left vertex, the deflection angle and a third empirical value;

the capture area is determined from the first and second directional coordinates of the first vertex and the first and second directional coordinates of the second vertex.

A two-dimensional coordinate system may be constructed in advance based on a first direction and a second direction that are perpendicular to each other, taking fig. 2 as an example, the first direction may be an x direction, and the second direction may be a y direction, so as to obtain two diagonal vertex coordinates of the text box in the corresponding coordinate system.

In some embodiments, the capture area may be determined by the following expression:

wherein the content of the first and second substances,

and

and

and

respectively representing a first direction coordinate and a second direction coordinate of a vertex at the upper left corner,

and

respectively, a first direction coordinate and a second direction coordinate of a vertex of a lower right corner, theta represents a deflection angle,

a first empirical value is represented that is,

a second empirical value is represented that is,

a third empirical value is represented that is,

a fourth empirical value is indicated.

In the embodiment of the present invention, referring to fig. 3, the preset empirical value may be determined by the following steps before step S12:

step S31: respectively setting actual capturing areas covering the second text box aiming at the first text box and the second text box under different deflection angles;

step S32: acquiring coordinates of two opposite angle vertexes of the first text box and coordinates of a third vertex and a fourth vertex of the actual capturing area, wherein the third vertex is the vertex of the actual capturing area with the closest distance from the vertex at the upper left corner of the first text box, and the fourth vertex is the opposite angle vertex of the third vertex;

step S33: and determining a preset experience value according to the deflection angle, the coordinates of the two diagonal vertexes of the first text box, the coordinates of the third vertex and the coordinates of the fourth vertex, and correspondingly storing the preset experience value and the character information, the deflection angle and the coordinates of the two diagonal vertexes of the first text box.

Referring to fig. 4, fig. 4 is a schematic diagram of a sample image provided by an embodiment of the present invention, on which a preset empirical value may be determined.

And establishing a two-dimensional coordinate system in a first direction a and a second direction b which are perpendicular to each other, wherein the sample image comprises a first text box C and a second text box D, and the actual capturing area is a dotted line rectangular box. Wherein, two diagonal vertex coordinates that can obtain the first text box C are respectively (a) _a1 ,A _b1 ) And (A) _a2 ,A _b2 ) The coordinates of two diagonal vertexes of the second text box D are respectively (B) _a1 ,B _b1 ) And (B) _a2 ,B _b2 ) The deflection angles of the first text box C and the second text box D are both theta, and the third vertex coordinate of the actual capture area is (M) _a01 ,M _b01 ) And (M) _a02 ,M _b02 )。

In some embodiments, step S31 may be embodied as adjusting the size of the actual capture area as needed on the premise of ensuring that the actual capture area can completely cover the second text box and not cover other text boxes. Wherein the actual capture area may be rectangular and the same as the deflection angle of the first text box. In other embodiments, the capture area may take on other shapes as well.

In some embodiments, determining the preset empirical value according to the deflection angle, the coordinates of the two diagonal vertices of the first text box, the coordinates of the third vertex and the fourth vertex in step S33 includes:

determining a first intersection point according to the intersection point of the first auxiliary line and the first extension line, wherein the first auxiliary line passes through a third vertex of the actual capture area and is parallel to the second direction, and the first extension line is an extension line of a vertex at the upper left corner and a vertex at the upper right corner of the first text box;

determining a second intersection point according to an intersection point of a second auxiliary line and a second extension line, wherein the second auxiliary line passes through a fourth vertex of the actual capturing area and is parallel to the second direction, the second extension line passes through a vertex of a lower right corner of the first text box, and an inclination angle of the second extension line is equal to the deflection angle;

constructing a position relation model of the actual capture area and the first text box according to the first intersection point, the second intersection point, the preset empirical value and the deflection angle;

based on the position relation model, determining a preset empirical value according to the coordinates of two diagonal vertexes of the first text box and the coordinates of a third vertex and a fourth vertex of the actual capturing area;

the preset empirical value comprises a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the position relationship model is expressed as:

wherein the content of the first and second substances,

and

first and second direction coordinates respectively representing third vertices of the actual capture area,

and

first and second direction coordinates respectively representing a fourth vertex of the actual capturing area,

a second directional coordinate representing the first intersection point,

a second directional coordinate representing a second intersection point,

and

and

a first direction coordinate and a second direction coordinate respectively representing a vertex of a lower right corner, theta represents a deflection angle,

a first empirical value is represented that is,

represents the second warpThe value of the test is tested,

a third empirical value is represented that is,

a fourth empirical value is indicated.

Referring to fig. 5, a perpendicular line to the first auxiliary line may be drawn at the top left vertex of the first text box C, and the first intersection point E, the top left vertex of the first text box C, and the intersection point of the perpendicular line and the first auxiliary line may form a right triangle, wherein the angle between the first extension line and the perpendicular line is equal to the deflection angle θ, based on which the relation formula in the position relation model may be determined:

。

similarly, a perpendicular line to the second auxiliary line may be made at the top of the lower right corner of the first text box C, the second intersection point F, the top of the lower right corner of the first text box C, and the intersection point of the perpendicular line and the second auxiliary line may form a right triangle, where an included angle between the second extension line and the perpendicular line is equal to the deflection angle θ, and based on this, a relational expression in the position relation model may be determined

。

In some embodiments, step S13 may specifically be:

according to the first direction coordinate of the first vertex of the capture area

And first direction coordinates of the second vertex

Judging that the first direction coordinate of the vertex except the first text box and the top left corner in the text box is less than

And the first direction coordinate of the vertex of the lower right corner is larger than

And the text box is taken as a second text box.

As an example, when the text information of the first text box is the legal representative, a text box with the text information of the specific name "li ming" of the legal representative is correspondingly determined to be used as a second text box; and when the text information of the first text box is the 'identification card number', correspondingly determining that the text box with the text information as the number of the specific identification card number is used as a second text box.

In some embodiments, step S14 may specifically be: and when a second text box is obtained, splicing the character information of the first text box and the character information of the second text box into a dictionary to obtain an optical character recognition result.

As an example, the text information of the first text box is "legal representative", and the text information of the second text box is "Li Ming", so that the dictionary of { "legal representative" can be obtained: "Li Ming" }.

In other embodiments, step S14 may specifically be: when a plurality of second text boxes are obtained, sorting the plurality of second text boxes from near to far according to the distance between the top left corner vertex or the bottom right corner vertex of each second text box and the top left corner vertex or the bottom right corner vertex of the first text box;

and splicing the character information of the first text box and the character information of the second text boxes into a dictionary based on the sequence of the second text boxes after sequencing to obtain an optical character recognition result.

As an example, the plurality of second text boxes are ordered according to the distance between the top left corner vertex of the plurality of second text boxes and the top left corner vertex of the first text box from near to far, so as to obtain a second text box B ₁ <Second text box B ₂ <Second text box B ₃ After the text information of the first text box and the text information of the plurality of second text boxes are spliced into a dictionary, a text _ a: text _ B ₁ + text_ B ₂ + text_ B ₃ ｝

In other embodiments, step S14 may specifically be: when a plurality of second text boxes are obtained, sorting according to the size of the first direction coordinates of the top left corner vertex or the bottom right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinates of the top left corner vertex or the bottom right corner vertex of the second text boxes;

and splicing the character information of the first text box and the character information of the plurality of sequenced second text boxes into a dictionary to obtain an optical character recognition result.

Wherein the ordering of the plurality of second text boxes may be determined according to the relative positions of the first text box and the second text box. As an example, when the first text box is on the left side of the second text boxes and the first direction coordinate of the top left vertex of the first text box is smaller than the first direction coordinates of the top left vertices of all the second text boxes, the plurality of second text boxes may be ordered from small to large according to the first direction coordinates of the top left vertices of the second text boxes.

The optical character recognition method provided by the embodiment of the invention determines the first text box with the character information as the target recognition content from the text boxes of the image to be detected, determines the capture area according to the character information of the first text box, the coordinates of the two opposite angular fixed points and the deflection angle, and screens out the second text box with the character information matched with the target recognition content according to the capture area, so that the optical character recognition result can be obtained at least according to the character information of the first text box and the character information of the second text box. Aiming at the inclined image to be detected, the method does not need to carry out inclination correction, can directly output the identification result, improves the identification effect and reduces the time consumption.

Referring to fig. 6, fig. 6 is a schematic flow chart of an optical character recognition method according to another embodiment of the present invention, which may include:

step S61: inputting an image to be detected into a text detection model to obtain a text box area image of the image to be detected and two diagonal vertex coordinates and a deflection angle of each text box in the text box area image;

step S62: inputting the text box area image into a character recognition model to obtain character information corresponding to the text box;

step S63: taking a text box with text information as target identification content as a first text box, and determining a capture area according to the text information of the first text box, coordinates of two opposite angle vertexes and a deflection angle;

step S64: screening a second text box positioned in the capture area from the rest text boxes aiming at the rest text boxes except the first text box in the text box corresponding to the image to be detected, wherein the character information corresponding to the second text box is matched with the target identification content;

step S65: and obtaining an optical character recognition result at least according to the character information of the first text box and the character information of the second text box.

Steps S63 to S65 may be performed in the same manner as steps S12 to S14, and for brevity, steps S61 and S62 will be mainly described hereinafter.

In some embodiments, step S61 may specifically be:

inputting an image to be detected into a backbone network of a text detection model to obtain a plurality of initial characteristic graphs with different sizes;

respectively up-sampling a plurality of initial feature maps with different sizes by using a feature map pyramid network of a text detection model and inputting sampling results into a cascade layer of the text detection model;

obtaining a characteristic diagram to be detected through a cascade layer;

inputting the probability prediction graph and the threshold graph into a differentiable binarization layer of the text detection model to obtain an approximate binarization graph;

and obtaining a text box of the image to be detected, two diagonal vertex coordinates and a deflection angle of the text box according to the approximate binary image, and intercepting a text box area image from the approximate binary image according to the two diagonal vertex coordinates and the deflection angle of the text box.

Wherein the two diagonal vertex coordinates may include an upper left vertex coordinate and a lower right vertex coordinate.

As an example, the text detection model may employ a DBNet (differential localization Net) model.

In some embodiments, after the text box region map is obtained, the text box region map may be further labeled based on the obtained diagonal vertex coordinates and deflection angle of each text box.

In some embodiments, step S62 may specifically be inputting the labeled text box area map into the character recognition model to obtain character information corresponding to the text box.

As an example, the word recognition model may employ a DenseNet model.

The optical character recognition method provided in another embodiment of the present invention can achieve the same beneficial effects as those of the embodiment corresponding to fig. 1, and combines the text detection model and the character recognition model, so that the preprocessing step of the image to be detected can be reduced, and the optical character recognition can be directly performed on the image to be detected.

Another aspect of the present invention also provides a computer-readable storage medium, in which a computer program is stored, and the computer program can implement the optical character recognition method in any one of the above embodiments when executed by a processor. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.

In another aspect of the present invention, an electronic device is further provided, which includes: a memory and a processor, the memory having stored therein a computer program, the computer program when executed by the processor implementing the optical character recognition method as described in any of the above embodiments.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. An optical character recognition method, comprising:

2. The method of claim 1, wherein determining a capture area based on the text information of the first text box, the two diagonal vertex coordinates, and the deflection angle comprises:

determining preset empirical values corresponding to the text information of the first text box, the two diagonal vertex coordinates and the deflection angle, wherein the vertex coordinates at the upper left corner comprise a first direction coordinate and a second direction coordinate of the vertex at the upper left corner, the vertex coordinates at the lower right corner comprise a first direction coordinate and a second direction coordinate of the vertex at the lower right corner, and the preset empirical values comprise a first empirical value, a second empirical value, a third empirical value and a fourth empirical value;

3. The method of claim 2, wherein the capture area is determined by the expression:

wherein the content of the first and second substances,

and

respectively representing a first direction coordinate and a second direction coordinate of the first vertex,

and

respectively representing first and second directional coordinates of the second vertex,

and

and

a first direction coordinate and a second direction coordinate respectively representing the vertex of the lower right corner, theta represents the deflection angle,

a value representing a first of said empirical values is,

a second empirical value is expressed that represents a second empirical value,

a third empirical value is expressed that represents a third empirical value,

represents the fourth empirical value.

4. The method of claim 2 or 3, wherein before determining a capture area based on the text information of the first text box, the two diagonal vertex coordinates, and the deflection angle, the method further comprises:

5. The method of claim 4, wherein determining the preset empirical value based on the deflection angle, the coordinates of the two diagonal vertices of the first text box, the coordinates of the third vertex and the fourth vertex comprises:

determining a second intersection point according to an intersection point of a second auxiliary line and a second extended line, wherein the second auxiliary line passes through the fourth vertex of the actual capture area and is parallel to the second direction, the second extended line passes through the vertex of the lower right corner of the first text box, and the inclination angle of the second extended line is equal to the deflection angle;

constructing a position relation model of the actual capturing area and the first text box according to the first intersection point, the second intersection point, the preset empirical value and the deflection angle;

the preset empirical value includes a first empirical value, a second empirical value, a third empirical value and a fourth empirical value, and the positional relationship model is expressed as:

and

first and second direction coordinates respectively representing the third vertex of the actual capture area,

and

a second directional coordinate representing the first intersection point,

a second directional coordinate representing the second intersection point,

and

and

a value representing a first of said empirical values is,

a second empirical value is expressed that is representative of the second empirical value,

a third empirical value is expressed that represents a third empirical value,

represents the fourth empirical value.

6. The method of claim 1, wherein obtaining an optical character recognition result based on at least the text information of the first text box and the text information of the second text box comprises:

alternatively, the first and second electrodes may be,

when a plurality of second text boxes are obtained, sorting according to the size of the first-direction coordinates of the top-left corner vertex or the bottom-right corner vertex of the plurality of second text boxes; or sorting according to the size of the second direction coordinates of the top left corner vertex or the bottom right corner vertex of the second text boxes;

7. The method of claim 1, wherein the obtaining a text box of an image to be detected and obtaining two diagonal vertex coordinates, a deflection angle and text information of the text box comprises:

8. The method according to claim 7, wherein inputting the image to be detected into a text detection model to obtain a text box area map of the image to be detected and the two diagonal vertex coordinates and the deflection angle of each text box in the text box area map comprises:

inputting the image to be detected into a backbone network of the text detection model to obtain a plurality of initial feature maps with different sizes;

obtaining a characteristic diagram to be detected through the cascade layer;

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the optical character recognition method according to any one of claims 1 to 8.

10. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the optical character recognition method of any one of claims 1 to 8.