CN112016561B - Text recognition method and related equipment - Google Patents


Info

Publication number
CN112016561B
CN112016561B (application CN202010905503.9A)
Authority
CN
China
Prior art keywords
text
text recognition
image
recognition area
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010905503.9A
Other languages
Chinese (zh)
Other versions
CN112016561A (en)
Inventor
吴文建
王继武
赵小柱
张明威
丁平
綦红镀
江贵林
王冠华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010905503.9A
Publication of CN112016561A
Application granted
Publication of CN112016561B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 — Character recognition
    • G06V30/14 — Image acquisition
    • G06V30/148 — Segmentation of character regions
    • G06V30/153 — Segmentation of character regions using recognition of characters or words
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The application discloses a text recognition method and related equipment. The method comprises the following steps: after an image to be recognized is acquired, a text recognition area is extracted from the image to be recognized; when the text recognition area is determined to meet a first condition, a data text area is intercepted from the text recognition area; text recognition is then performed on the data text area to obtain a target text. Because the data text of the image to be recognized is recorded in the data text area, the text recognized from the data text area is the data text of the image to be recognized, so the data text in the image to be recognized can be accurately recognized.

Description

Text recognition method and related equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a text recognition method and related devices.
Background
Text recognition refers to recognizing all or part of text (e.g., data text) in an image to be recognized (e.g., a bank note image as shown in fig. 1).
In some cases, when the image to be recognized includes name text (i.e., a key) and data text (i.e., a value), only the data text in the image needs to be recognized. For example, in the bank note image shown in fig. 1, text such as "Account No/Credit Card No." is name text, and text such as "9027040025768183" is data text.
However, how to accurately identify the data text in the image to be identified is a technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the application provides a text recognition method and related equipment, which can accurately recognize data text in an image to be recognized.
In order to achieve the above object, the technical solution provided in the embodiments of the present application is as follows:
the embodiment of the application provides a text recognition method, which comprises the following steps:
extracting a text recognition area from an image to be recognized;
when the text recognition area meets a first condition, intercepting a data text area from the text recognition area;
and carrying out text recognition on the data text region to obtain a target text.
Optionally, the intercepting the data text area from the text recognition area includes:
acquiring a data text position of the text recognition area;
and determining the data text area according to the data text position of the text recognition area.
Optionally, the acquiring the data text position of the text recognition area includes:
acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area;
and determining the data text position of the text recognition area according to the demarcation position of the text recognition area.
Optionally, the method further comprises:
matching a preset template with the text recognition area;
when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets a first condition;
and when it is determined that the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
Optionally, the extracting the text recognition area from the image to be recognized includes:
preprocessing the image to be identified to obtain a preprocessed image to be identified;
and carrying out text detection on the preprocessed image to be identified to obtain the text identification area.
Optionally, the preprocessing the image to be identified to obtain a preprocessed image to be identified includes:
determining the blurring degree of the image to be identified;
when the blurring degree of the image to be identified is determined to reach a second condition, acquiring the display direction of the image to be identified;
and when the display direction of the image to be identified is determined to reach a third condition, carrying out direction correction on the image to be identified to obtain the preprocessed image to be identified.
Optionally, the method further comprises:
and correcting the target text according to a preset rule to obtain a corrected text.
The embodiment of the application also provides a text recognition device, which comprises:
the extraction unit is used for extracting a text recognition area from the image to be recognized;
the intercepting unit is used for intercepting a data text region from the text recognition region when the text recognition region is determined to meet a first condition;
and the identification unit is used for carrying out text identification on the data text area to obtain a target text.
The embodiment of the application also provides equipment, which comprises a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to execute any implementation mode of the text recognition method provided by the embodiment of the application according to the computer program.
The present embodiments also provide a computer readable storage medium for storing a computer program for executing any implementation of the text recognition method provided by the embodiments of the present application.
Compared with the prior art, the embodiment of the application has at least the following advantages:
in the text recognition method provided by the embodiment of the application, after the image to be recognized is obtained, a text recognition area can be extracted from it; when the text recognition area is determined to meet a first condition, a data text area is intercepted from the text recognition area; text recognition is then performed on the data text area to obtain a target text. Because the data text of the image to be recognized is recorded in the data text area, the text recognized from the data text area is the data text of the image to be recognized, so the data text in the image to be recognized can be accurately recognized.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a bank note image provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a text recognition area included in the bank note image of FIG. 1 according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a text recognition area including both name text and data text provided by an embodiment of the present application;
fig. 4 is a flowchart of a text recognition method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a text recognition device according to an embodiment of the present application;
fig. 6 is a schematic diagram of an apparatus structure according to an embodiment of the present application.
Detailed Description
In the inventor's research on text recognition, it was found that in the related art, when an image to be recognized includes name text (i.e., a key) and data text (i.e., a value) and only the data text needs to be recognized, a text recognition area (the newly added rectangular box area shown in fig. 2) may be extracted from the image to be recognized (e.g., the bank note image shown in fig. 1), and text recognition may then be performed on that area to obtain the data text. However, the text recognition area may include both the name text and the data text (such as the area shown in fig. 3). In that case the text recognition area is relatively long and its content is diverse, so the correct data text may not be recognized from it, resulting in poor recognition accuracy of the data text.
In order to solve the technical problems in the background art and the drawbacks of the related art, an embodiment of the present application provides a text recognition method, which includes: extracting a text recognition area from an image to be recognized; when the text recognition area is determined to meet a first condition, intercepting a data text area from the text recognition area; and performing text recognition on the data text area to obtain a target text. Because the data text of the image to be recognized is recorded in the data text area, the text recognized from the data text area is the data text of the image to be recognized, so the data text can be accurately recognized. Moreover, because the data text area includes only the data text and not the name text, the data text area is shorter and its content is single, which effectively overcomes the defects of the related art and helps improve the recognition accuracy of the data text.
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Method embodiment
Referring to fig. 4, a flowchart of a text recognition method according to an embodiment of the present application is shown.
The text recognition method provided by the embodiment of the application comprises the following steps of S1-S5:
s1: and extracting a text recognition area from the image to be recognized.
The image to be recognized refers to an image in which data text needs to be recognized. The embodiment of the present application does not limit the image to be recognized; for example, it may be a bank note image (e.g., a foreign money transfer application image).
The text recognition area refers to an area where text in the image to be recognized is located (the newly added rectangular box area shown in fig. 2). The embodiment of the present application does not limit the content of the text recognition area; for example, the text recognition area may include data text and/or name text.
In addition, the number of text recognition areas is not limited in the embodiment of the present application; for example, the image to be recognized may include N text recognition areas, where N is a positive integer.
It should be noted that, in order to further improve the recognition accuracy of the data text, the text recognition area may include both the data text and the name text, so that an area including only the data text can be extracted later by the area intercepting operation.
In addition, the embodiment of the application also provides an implementation mode of S1, which specifically comprises S11-S12:
s11: and preprocessing the image to be identified to obtain a preprocessed image to be identified.
The preprocessing is used for optimizing the image to be identified, so that the characters in the image to be identified are clearer.
In addition, the embodiment of the present application does not limit the implementation of the preprocessing; for example, the preprocessing may include at least one of image brightness adjustment, image sharpness adjustment, image display direction adjustment, image blur adjustment, and the like.
To facilitate understanding of S11, the following description is made in connection with an example.
As an example, S11 may specifically include S111-S115:
s111: and determining the blurring degree of the image to be identified.
The blurring degree is used to describe whether the characters in the image to be recognized are clear; specifically, the higher the blurring degree, the less clear the characters in the image to be recognized, and conversely, the lower the blurring degree, the clearer the characters.
It should be noted that the embodiment of the present application does not limit the manner of obtaining the blurring degree; for example, it may be computed by calling a blur-measurement routine in OpenCV.
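To make S111 concrete, a common blur measure is the variance of the image's Laplacian response (in OpenCV, `cv2.Laplacian(gray, cv2.CV_64F).var()`). The NumPy sketch below illustrates that measure under the assumption that it is an acceptable choice; the patent itself does not fix a particular computation.

```python
import numpy as np

def blur_degree(gray: np.ndarray) -> float:
    """Estimate sharpness as the variance of the Laplacian response.

    Higher variance means more edge energy, i.e. a sharper (less blurred)
    image; low variance suggests blur. Equivalent in spirit to
    cv2.Laplacian(gray, cv2.CV_64F).var() in OpenCV.
    """
    g = gray.astype(np.float64)
    # 4-neighbour discrete Laplacian: f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)-4f(x,y)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def meets_second_condition(gray: np.ndarray, blur_threshold: float) -> bool:
    # "Blurring degree below a preset threshold" corresponds to Laplacian
    # variance ABOVE a threshold, since variance drops as blur increases
    # (the threshold value itself is application-specific).
    return blur_degree(gray) >= blur_threshold
```

Note the inversion: the patent's "blurring degree" falls as sharpness rises, so a variance-based score must be compared with the opposite inequality.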
S112: judging whether the blurring degree of the image to be identified reaches a second condition, if so, executing S113-S114; if not, S115 is performed.
The second condition is a condition reached by an image on which character recognition is possible. The embodiment of the present application does not limit the second condition; for example, the second condition may be that the blurring degree is below a preset blur threshold.
In the embodiment of the present application, after the blurring degree of the image to be recognized is obtained, it may be directly judged whether the blurring degree reaches the second condition (for example, whether it is below the blur threshold). If it does (e.g., it is below the blur threshold), it is determined that the text in the image to be recognized can be recognized; if it does not (e.g., it is not below the blur threshold), it is determined that the text in the image to be recognized cannot be recognized.
S113: and acquiring the display direction of the image to be identified.
The display direction refers to the direction in which the image to be recognized is displayed. For example, the display direction may be a forward display, a reverse display, or a side display.
In addition, the embodiment of the present application does not limit the method for determining the display direction; any method capable of determining the display direction of the image to be recognized may be used.
S114: judging whether the display direction of the image to be identified reaches a third condition, if so, carrying out direction correction on the image to be identified to obtain a preprocessed image to be identified; if not, determining the image to be identified as the preprocessed image to be identified.
The third condition is preset; for example, the third condition may be that the image is not displayed in the forward direction (i.e., that direction correction is required).
The direction correction means correcting an image to be recognized which is not displayed in the forward direction to an image to be recognized which is displayed in the forward direction.
S115: ending the current flow.
Based on the related content of S111 to S115, after the image to be recognized is obtained, it may be determined in turn whether its blurring degree reaches the second condition and whether its display direction reaches the third condition. When both hold, direction correction is performed on the image to obtain the preprocessed image to be recognized, which ensures that the preprocessed image is a clearly legible image displayed in the forward direction.
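As a minimal sketch of S113-S114, suppose the display direction obtained in S113 is one of four hypothetical labels (the patent does not fix a representation for the direction); correcting the direction then amounts to rotating the image by a multiple of 90 degrees:

```python
import numpy as np

# Hypothetical direction labels for S113; the mapping to the number of
# 90-degree counter-clockwise rotations needed to restore forward display
# is an assumption made for illustration only.
ROTATIONS = {"forward": 0, "side_left": 1, "reverse": 2, "side_right": 3}

def correct_direction(img: np.ndarray, direction: str) -> np.ndarray:
    """S114: if the display direction is not forward, rotate the image back
    to forward display; otherwise the image is already the preprocessed one."""
    k = ROTATIONS[direction]
    return np.rot90(img, k=k) if k else img
```

A forward image is returned unchanged, matching the branch of S114 where the image to be recognized is itself the preprocessed image.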
S12: and carrying out text detection on the preprocessed image to be identified to obtain the text identification area.
In the embodiment of the present application, after the preprocessed image to be recognized is obtained, text detection may be performed directly on it to obtain the text recognition area, so that the text recognition area includes data text and/or name text. Note that the embodiment of the present application does not limit the implementation of text detection; for example, text detection may be implemented using a pre-trained Connectionist Text Proposal Network (CTPN) model for detecting text in natural images.
Based on the above-mentioned related content of S1, after the image to be identified is obtained, the text identification area may be directly extracted from the image to be identified, so that the text identification area includes the data text and/or the name text.
It should be noted that, the embodiment of the present application is not limited to the method of acquiring the image to be identified, for example, the image to be identified may be read from a preset storage space, or the image to be identified may be acquired from a preset image storage path.
S2: judging whether the text recognition area meets a first condition, if so, executing S3; if not, S4 is executed.
The first condition refers to a condition satisfied by a text recognition area whose content is not single (i.e., that contains both name text and data text). The embodiment of the present application does not limit the first condition; for example, the first condition may be that an area that can be successfully matched with the preset template exists in the text recognition area.
Based on this, the embodiment of the present application further provides an implementation of S2, specifically: matching a preset template with the text recognition area; when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets the first condition; and when the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
The preset template is used for describing the boundary between the name text and the data text. In addition, the embodiment of the present application is not limited to the preset template, and for example, the preset template may be "No." for the text region shown in fig. 3.
In addition, different text recognition areas can correspond to different preset templates, so after the text recognition area is acquired, the preset template corresponding to the text recognition area can be determined according to the text recognition area, and then whether the preset template corresponding to the text recognition area is successfully matched with the text recognition area or not is judged.
Based on the above, if the image to be recognized includes N text recognition areas, then after the ith text recognition area is obtained, it is first matched with its preset template. If it is determined that an area successfully matching the preset template exists in the ith text recognition area, it can be determined that the ith text recognition area matches the preset template successfully, and therefore that it meets the first condition. Conversely, when it is determined that no area in the ith text recognition area matches the preset template successfully, it can be determined that the match fails, and therefore that the ith text recognition area does not meet the first condition. Here i is a positive integer, and i is less than or equal to N.
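The matching in S2 could be done at the image level (e.g., template matching) or at the string level once the area's text is available; the patent does not prescribe either. A minimal string-level sketch, using the example texts from fig. 1 and the hypothetical template "No.", is:

```python
def satisfies_first_condition(area_text: str, preset_template: str) -> bool:
    """S2: the text recognition area meets the first condition when an area
    matching the preset template is found in it."""
    return preset_template in area_text

# Example recognition areas from the bank note image of fig. 1 (illustrative).
areas = [
    "Account No/Credit Card No. 9027040025768183",  # name text + data text
    "9027040025768183",                             # data text only
]
flags = [satisfies_first_condition(t, "No.") for t in areas]
# Only the first area contains both name and data text, so only it requires
# the intercepting step S3; the second proceeds directly to S4.
```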
S3: and intercepting a data text area from the text recognition area, and continuing to execute S5.
The data text area is an area where the data text is located in the text recognition area.
In addition, the embodiment of the application also provides an implementation mode for intercepting the text region (namely S3), which specifically comprises S31-S32:
s31: and acquiring the data text position of the text recognition area.
The data text location is the location of the data text in the text recognition area.
In some cases, there is a boundary between the name text and the data text in the text recognition area, and the name text is typically located to the left of the data text (as shown in fig. 3). Based on this, the embodiment of the application also provides an implementation manner for determining the position of the data text, which specifically includes: acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area; and determining the data text position of the text recognition area according to the demarcation position of the text recognition area.
After determining that the ith text recognition area includes the name text and the data text, the preset template can be matched with the ith text recognition area, the position of the area successfully matched with the preset template in the ith text recognition area is determined to be the boundary position of the ith text recognition area, and the position of the area on the right side of the boundary position of the ith text recognition area is determined to be the data text position of the text recognition area. Wherein i is a positive integer, and i is less than or equal to N.
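Given the demarcation position found in S31 (for example, the right edge of the area matched by the preset template, which could be located with something like OpenCV's `cv2.matchTemplate`), the interception of S32 reduces to a column slice. A sketch, with the boundary x-coordinate assumed to be already known:

```python
import numpy as np

def intercept_data_text_area(area_img: np.ndarray,
                             boundary_x: int) -> np.ndarray:
    """S31-S32: everything to the right of the demarcation position is taken
    as the data text area, since the name text lies to its left (fig. 3)."""
    return area_img[:, boundary_x:]
```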
S32: and determining the data text area according to the data text position of the text recognition area.
In the embodiment of the present application, after the data text position of the text recognition area is obtained, the area represented by that position may be directly determined as the data text area of the text recognition area.
Based on the above-mentioned related content of S3, in the embodiment of the present application, when it is determined that the i-th text recognition area meets the first condition, it may be determined that the i-th text recognition area includes the name text and the data text, and in this case, in order to avoid an adverse effect of the name text on the recognition process of the data text, the area including the data text (i.e., the data text area) may be directly intercepted from the i-th text recognition area, so that only the area including the data text may be subjected to text recognition in step S5. Wherein i is a positive integer, and i is less than or equal to N.
S4: if the text recognition area includes a data text, determining the text recognition area as a data text area, and continuing to execute S5.
In the embodiment of the present application, when it is determined that the ith text recognition area does not satisfy the first condition, it can be determined that the ith text recognition area includes only data text or only name text. Therefore, when it is further determined that the ith text recognition area includes data text, it includes only data text, and the text recognition area may be directly determined as a data text area for text recognition.
S5: and carrying out text recognition on the data text region to obtain a target text.
In the embodiment of the application, after the data text region only including the data text is acquired, text recognition can be directly performed on the data text region to obtain the target text, so that the target text can accurately represent the data text in the data text region.
Note that the embodiment of the present application does not limit the text recognition method; for example, it may be implemented using a pre-trained Convolutional Recurrent Neural Network (CRNN) text recognition model.
In some cases, the target text may contain recognition errors (e.g., the letter "O" recognized as the digit "0"). To correct such errors, the embodiment of the present application further provides an implementation of the text recognition method that includes S6 in addition to S1-S5:
s6: and correcting the target text according to a preset rule to obtain a corrected text.
The preset rule is a preset rule for correcting the data text, and the preset rule can be set according to an application scene.
Therefore, in the embodiment of the application, after the target text is obtained, the target text can be corrected according to the preset rule to obtain the corrected text, so that the corrected text does not have the text with the wrong recognition, and the recognition accuracy can be effectively improved.
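For a numeric field such as the account number in fig. 1, one plausible preset rule maps letters that OCR commonly confuses with digits back to digits. The rule table below is purely illustrative — as the text notes, the actual rules are set according to the application scenario.

```python
# Illustrative confusion table for an all-digit field (an assumption; the
# patent leaves the preset rules application-specific).
DIGIT_FIELD_RULES = str.maketrans({"O": "0", "o": "0", "l": "1",
                                   "I": "1", "S": "5", "B": "8"})

def correct_target_text(target_text: str, digits_only: bool = True) -> str:
    """S6: correct the recognized target text according to a preset rule."""
    if digits_only:
        return target_text.translate(DIGIT_FIELD_RULES)
    return target_text
```

Applying the rule only when the field is known to be all digits avoids corrupting mixed alphanumeric fields, where "O" may be a genuine letter.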
Based on the related content above, in the text recognition method provided by the embodiment of the application, after the image to be recognized is obtained, a text recognition area can be extracted from it; when the text recognition area is determined to meet a first condition, a data text area is intercepted from the text recognition area; text recognition is then performed on the data text area to obtain a target text. Because the data text of the image to be recognized is recorded in the data text area, the text recognized from the data text area is the data text of the image to be recognized, so the data text can be accurately recognized. Moreover, because the data text area includes only the data text and not the name text, the data text area is shorter and its content is single, which effectively overcomes the defects of the related art and helps improve the recognition accuracy of the data text.
Based on the text recognition method provided by the above method embodiment, the present application embodiment also provides a text recognition device, which is explained and illustrated below with reference to the accompanying drawings.
Device embodiment
For technical details of the text recognition device provided in the device embodiment, please refer to the above method embodiment.
Referring to fig. 5, the structure of a text recognition device according to an embodiment of the present application is shown.
The text recognition device 500 provided in the embodiment of the present application includes:
an extracting unit 501, configured to extract a text recognition area from an image to be recognized;
an intercepting unit 502, configured to intercept a data text region from the text recognition region when it is determined that the text recognition region meets a first condition;
and the recognition unit 503 is configured to perform text recognition on the data text region to obtain a target text.
In a possible implementation manner, the intercepting unit 502 includes:
an acquisition subunit, configured to acquire a data text position of the text recognition area;
and the determining subunit is used for determining the data text area according to the data text position of the text recognition area.
In a possible implementation manner, the acquiring subunit is specifically configured to:
acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area;
and determining the data text position of the text recognition area according to the demarcation position of the text recognition area.
In one possible implementation, the text recognition device 500 further includes:
the matching unit is configured to match a preset template with the text recognition area; when the preset template is successfully matched with the text recognition area, determine that the text recognition area meets the first condition; and when the preset template fails to match the text recognition area, determine that the text recognition area does not meet the first condition.
In a possible implementation manner, the extraction unit 501 includes:
the processing subunit is used for preprocessing the image to be identified to obtain a preprocessed image to be identified;
and the detection subunit is used for carrying out text detection on the preprocessed image to be identified to obtain the text identification area.
In a possible embodiment, the processing subunit is specifically configured to:
determining the blurring degree of the image to be identified;
when the blurring degree of the image to be identified is determined to reach a second condition, acquiring the display direction of the image to be identified;
and when the display direction of the image to be identified is determined to reach a third condition, carrying out direction correction on the image to be identified to obtain the preprocessed image to be identified.
In one possible implementation, the text recognition device 500 further includes:
and the correction unit is used for correcting the target text according to a preset rule to obtain a corrected text.
Based on the above description of the text recognition device 500, after the image to be recognized is obtained, a text recognition area may be extracted from it; when it is determined that the text recognition area meets the first condition, a data text area may be cut from the text recognition area, and text recognition is then performed on the data text area to obtain a target text. Because the data text of the image to be recognized is recorded in the data text area, the text recognized from the data text area is exactly that data text, so the data text in the image to be recognized can be recognized accurately.
Based on the text recognition method provided by the method embodiment above, an embodiment of the present application further provides a device, which is explained and illustrated below with reference to the accompanying drawings.
Device embodiment
For the technical details of the device provided in the device embodiment, please refer to the above method embodiment.
Referring to fig. 6, a schematic diagram of an apparatus structure according to an embodiment of the present application is shown.
The apparatus 600 provided in the embodiment of the present application includes: a processor 601 and a memory 602;
the memory 602 is used for storing a computer program;
the processor 601 is configured to execute any implementation of the text recognition method provided by the above method embodiment according to the computer program. That is, the processor 601 is configured to perform the steps of:
extracting a text recognition area from an image to be recognized;
when the text recognition area meets a first condition, intercepting a data text area from the text recognition area;
and carrying out text recognition on the data text region to obtain a target text.
Optionally, the intercepting the data text area from the text recognition area includes:
acquiring a data text position of the text recognition area;
and determining the data text area according to the data text position of the text recognition area.
Optionally, the acquiring the data text position of the text recognition area includes:
acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area;
and determining the data text position of the text recognition area according to the demarcation position of the text recognition area.
Optionally, the method further comprises:
matching a preset template with the text recognition area;
when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets a first condition;
and when it is determined that the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
Optionally, the extracting the text recognition area from the image to be recognized includes:
preprocessing the image to be recognized to obtain a preprocessed image to be recognized;
and carrying out text detection on the preprocessed image to be recognized to obtain the text recognition area.
Optionally, the preprocessing the image to be recognized to obtain a preprocessed image to be recognized includes:
determining the blurring degree of the image to be recognized;
when it is determined that the blurring degree of the image to be recognized reaches a second condition, acquiring the display direction of the image to be recognized;
and when it is determined that the display direction of the image to be recognized reaches a third condition, performing direction correction on the image to be recognized to obtain the preprocessed image to be recognized.
Optionally, the method further comprises:
and correcting the target text according to a preset rule to obtain a corrected text.
The foregoing describes the device 600 provided in the embodiments of the present application.
Based on the text recognition method provided by the method embodiment, the embodiment of the application also provides a computer readable storage medium.
Media embodiment
For technical details of the computer-readable storage medium provided in the medium embodiment, please refer to the method embodiment.
The present application provides a computer-readable storage medium for storing a computer program, and the computer program is used to execute any implementation of the text recognition method provided by the above method embodiment. That is, the computer program is configured to perform the steps of:
extracting a text recognition area from an image to be recognized;
when the text recognition area meets a first condition, intercepting a data text area from the text recognition area;
and carrying out text recognition on the data text region to obtain a target text.
Optionally, the intercepting the data text area from the text recognition area includes:
acquiring a data text position of the text recognition area;
and determining the data text area according to the data text position of the text recognition area.
Optionally, the acquiring the data text position of the text recognition area includes:
acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area;
and determining the data text position of the text recognition area according to the demarcation position of the text recognition area.
Optionally, the method further comprises:
matching a preset template with the text recognition area;
when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets a first condition;
and when it is determined that the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
Optionally, the extracting the text recognition area from the image to be recognized includes:
preprocessing the image to be recognized to obtain a preprocessed image to be recognized;
and carrying out text detection on the preprocessed image to be recognized to obtain the text recognition area.
Optionally, the preprocessing the image to be recognized to obtain a preprocessed image to be recognized includes:
determining the blurring degree of the image to be recognized;
when it is determined that the blurring degree of the image to be recognized reaches a second condition, acquiring the display direction of the image to be recognized;
and when it is determined that the display direction of the image to be recognized reaches a third condition, performing direction correction on the image to be recognized to obtain the preprocessed image to be recognized.
Optionally, the method further comprises:
and correcting the target text according to a preset rule to obtain a corrected text.
The foregoing describes the computer-readable storage media provided by the embodiments of the present application.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or similar expressions refers to any combination of those items, including a single item or any combination of plural items. For example, at least one of a, b or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention in any way. Although the invention has been disclosed above by way of preferred embodiments, these are not limiting. Any person skilled in the art may, using the methods and technical content disclosed above and without departing from the scope of the technical solution of the present invention, make possible variations and modifications to that technical solution or modify it into equivalent embodiments. Therefore, any simple modification, equivalent variation or modification made to the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (7)

1. A method of text recognition, the method comprising:
extracting a text recognition area from an image to be recognized;
when the text recognition area meets a first condition, intercepting a data text area from the text recognition area;
performing text recognition on the data text region to obtain a target text;
the intercepting the data text area from the text recognition area comprises the following steps:
acquiring a data text position of the text recognition area;
determining the data text region according to the data text position of the text recognition region;
the acquiring the data text position of the text recognition area comprises the following steps:
acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area;
determining the data text position of the text recognition area according to the demarcation position of the text recognition area;
the method further comprises the steps of:
matching a preset template with the text recognition area;
when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets a first condition;
and when it is determined that the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
2. The method of claim 1, wherein extracting a text recognition region from an image to be recognized comprises:
preprocessing the image to be identified to obtain a preprocessed image to be identified;
and carrying out text detection on the preprocessed image to be identified to obtain the text identification area.
3. The method according to claim 2, wherein preprocessing the image to be identified to obtain a preprocessed image to be identified comprises:
determining the blurring degree of the image to be identified;
when the blurring degree of the image to be identified is determined to reach a second condition, acquiring the display direction of the image to be identified;
and when the display direction of the image to be identified is determined to reach a third condition, carrying out direction correction on the image to be identified to obtain the preprocessed image to be identified.
4. The method according to claim 1, wherein the method further comprises:
and correcting the target text according to a preset rule to obtain a corrected text.
5. A text recognition device, the device comprising:
the extraction unit is used for extracting a text recognition area from the image to be recognized;
the intercepting unit is used for intercepting a data text region from the text recognition region when it is determined that the text recognition area meets a first condition;
the recognition unit is used for carrying out text recognition on the data text region to obtain a target text;
the interception unit comprises:
an acquisition subunit, configured to acquire a data text position of the text recognition area;
a determining subunit, configured to determine the data text area according to the data text position of the text recognition area;
the obtaining subunit is specifically configured to: acquiring the position of a preset template in the text recognition area as the demarcation position of the text recognition area; determining the data text position of the text recognition area according to the demarcation position of the text recognition area;
the text recognition apparatus further includes:
the matching unit is used for matching a preset template with the text recognition area; when the preset template is successfully matched with the text recognition area, determining that the text recognition area meets the first condition; and when it is determined that the preset template fails to match the text recognition area, determining that the text recognition area does not meet the first condition.
6. A text recognition device, the device comprising a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1-4 according to the computer program.
7. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-4.
CN202010905503.9A 2020-09-01 2020-09-01 Text recognition method and related equipment Active CN112016561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010905503.9A CN112016561B (en) 2020-09-01 2020-09-01 Text recognition method and related equipment

Publications (2)

Publication Number Publication Date
CN112016561A CN112016561A (en) 2020-12-01
CN112016561B true CN112016561B (en) 2023-08-04

Family

ID=73515532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010905503.9A Active CN112016561B (en) 2020-09-01 2020-09-01 Text recognition method and related equipment

Country Status (1)

Country Link
CN (1) CN112016561B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018233171A1 (en) * 2017-06-23 2018-12-27 平安科技(深圳)有限公司 Method and apparatus for entering document information, computer device and storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN111126125A (en) * 2019-10-15 2020-05-08 平安科技(深圳)有限公司 Method, device and equipment for extracting target text in certificate and readable storage medium
CN111259889A (en) * 2020-01-17 2020-06-09 平安医疗健康管理股份有限公司 Image text recognition method and device, computer equipment and computer storage medium
CN111461105A (en) * 2019-01-18 2020-07-28 顺丰科技有限公司 Text recognition method and device

Also Published As

Publication number Publication date
CN112016561A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
US8724924B2 (en) Systems and methods for processing mobile images to identify and extract content from forms
CN109658584B (en) Bill information identification method and device
CN110119741B (en) Card image information identification method with background
CN104217203B (en) Complex background card face information identifying method and system
US9082192B2 (en) Text image trimming method
US8326041B2 (en) Machine character recognition verification
EP2660753B1 (en) Image processing method and apparatus
US9679354B2 (en) Duplicate check image resolution
CN104239861A (en) Curly text image preprocessing method and lottery ticket scanning recognition method
US20210271857A1 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
CN110490190B (en) Structured image character recognition method and system
CN109635625B (en) Intelligent identity verification method, equipment, storage medium and device
CN110969052A (en) Operation correction method and equipment
US8787702B1 (en) Methods and apparatus for determining and/or modifying image orientation
CN115641594A (en) OCR technology-based identification card recognition method, storage medium and device
CN112001200A (en) Identification code identification method, device, equipment, storage medium and system
CN112115907A (en) Method, device, equipment and medium for extracting structured information of fixed layout certificate
CN112766275B (en) Seal character recognition method and device, computer equipment and storage medium
CN114694161A (en) Text recognition method and equipment for specific format certificate and storage medium
CN113963149A (en) Medical bill picture fuzzy judgment method, system, equipment and medium
CN113920520A (en) Image text recognition method, system, storage medium and electronic equipment
US10803431B2 (en) Portable device for financial document transactions
CN111507181B (en) Correction method and device for bill image and computer equipment
CN112016561B (en) Text recognition method and related equipment
CN110956133A (en) Training method of single character text normalization model, text recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant