CN114067328A - Text recognition method and device and electronic equipment - Google Patents

Text recognition method and device and electronic equipment

Info

Publication number
CN114067328A
CN114067328A
Authority
CN
China
Prior art keywords
character recognition, information, recognition result, text, text line
Prior art date
Legal status
Pending
Application number
CN202111375638.XA
Other languages
Chinese (zh)
Inventor
李虎
程林鹏
胡翔
郑邦东
熊博颖
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111375638.XA
Publication of CN114067328A
Pending

Landscapes

  • Character Input (AREA)

Abstract

The disclosure provides a method, a device, and an electronic device for recognizing text. The method includes: in response to obtaining an image to be recognized, performing text positioning on the image and determining text line scale information for at least some of the text lines it contains; for each piece of text line scale information, updating its scale values to obtain at least one piece of expanded text line scale information associated with it; determining a first character recognition result for the image slice corresponding to the text line scale information and a second character recognition result for the image slice corresponding to each piece of expanded text line scale information; and, if the first and second character recognition results are the same, outputting the first or the second character recognition result.

Description

Text recognition method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technology and the field of finance, and more particularly, to a method for recognizing text, a device for recognizing text, and an electronic device.
Background
Optical Character Recognition (OCR) includes text detection and character recognition. In the related art, text detection is performed first to determine a text box; character recognition is then performed on the image slice inside the text box.
If the text box produced by text detection is not accurately positioned, the accuracy of the character recognition result suffers.
Disclosure of Invention
In view of the above, the present disclosure provides a method for recognizing text, an apparatus for recognizing text, and an electronic device, so as to at least partially solve the problem that inaccurate text box positioning leads to low character recognition accuracy.
One aspect of the present disclosure provides a method of recognizing text, including: in response to obtaining an image to be recognized, performing text positioning on the image and determining text line scale information for at least some of the text lines it contains; for each piece of text line scale information, updating its scale values to obtain at least one piece of expanded text line scale information associated with it; determining a first character recognition result for the image slice corresponding to the text line scale information and a second character recognition result for the image slice corresponding to each piece of expanded text line scale information; and, if the first and second character recognition results are the same, outputting the first or the second character recognition result.
In some embodiments, the text line scale information includes coordinate information. Updating the scale values to obtain at least one piece of expanded text line scale information then includes: updating the coordinate information based on a preset rule to obtain at least one piece of expanded coordinate information associated with the text line scale information, so as to determine a second character recognition result for the image slice corresponding to each piece of expanded coordinate information.
In some embodiments, the coordinate information includes reference point coordinate information, and the text line scale information further includes at least one of height information and width information. Updating the scale values then includes: updating at least one of the coordinate information, the height information, and the width information based on a preset rule to obtain at least one of expanded coordinate information, expanded height information, and expanded width information associated with the text line scale information, so as to determine a second character recognition result for the corresponding image slice.
In some embodiments, the reference point coordinate information includes the coordinates of the lower-left vertex of the text box. The updating then includes at least one of the following: determining an expansion coefficient; updating the coordinates of the lower-left vertex based on the expansion coefficient and a first preset rule to obtain expanded lower-left vertex coordinates; and updating the height information and/or the width information based on the expansion coefficient and a second preset rule to obtain expanded height information and/or expanded width information.
In some embodiments, the coordinate information includes the coordinates of the lower-left and upper-right vertices of the text box. Updating the coordinate information based on a preset rule then includes: determining an expansion coefficient; updating the coordinates of the lower-left vertex based on the expansion coefficient and a third preset rule to obtain expanded lower-left vertex coordinates; and updating the coordinates of the upper-right vertex based on the expansion coefficient and a fourth preset rule to obtain expanded upper-right vertex coordinates.
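As an illustration, the two-vertex update might take the following form. The patent does not fix the concrete formulas, so the symmetric-growth rule and the name `expand_box` below are assumptions, not the claimed third and fourth preset rules themselves:

```python
def expand_box(x1, y1, x2, y2, k):
    """Expand a text box given its lower-left vertex (x1, y1) and
    upper-right vertex (x2, y2) by an expansion coefficient k.

    The growth rule is an assumption: each vertex moves outward by
    k times the box's width/height, one plausible instantiation of
    the "third" and "fourth" preset rules.
    """
    w, h = x2 - x1, y2 - y1
    # Assumed third preset rule: move the lower-left vertex outward.
    nx1, ny1 = x1 - k * w, y1 - k * h
    # Assumed fourth preset rule: move the upper-right vertex outward.
    nx2, ny2 = x2 + k * w, y2 + k * h
    return nx1, ny1, nx2, ny2
```

With k = 0.25, for example, a 20x10 box grows to 30x15 while keeping its center fixed.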
In some embodiments, the method further includes: after the first and/or second character recognition result is determined, matching it against a dictionary, and outputting the first or second character recognition result if the matching succeeds.
In some embodiments, the method further includes: if the dictionary match for the first or second character recognition result is empty but the confidence of that result is higher than a first confidence threshold, still outputting the result.
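A minimal sketch of this output policy follows; the function name, the dictionary-as-set representation, and the 0.9 default threshold are illustrative assumptions, not taken from the patent:

```python
def resolve_result(result, dictionary, confidence, first_confidence=0.9):
    """Decide whether a character recognition result may be output.

    If the result matches a dictionary entry, it is output.  If the
    dictionary match is empty but the model's confidence exceeds the
    first confidence threshold, the result is still output.  Otherwise
    the result is withheld (None).
    """
    if result in dictionary:
        return result              # dictionary matching succeeded
    if confidence > first_confidence:
        return result              # no match, but confidence is high
    return None                    # withhold the result
```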
In some embodiments, the method further includes: if the first character recognition result differs from the second character recognition result, withholding both results from output.
In some embodiments, the method further includes: if the first and second character recognition results differ, determining a second confidence for each of them, and outputting prompt information indicating the second confidences so that a selection operation for the first or second character recognition result can be received.
One aspect of the present disclosure provides an apparatus for recognizing text, including a text line scale information determining module, a text line scale information expansion module, a multi-scale character recognition module, and a character recognition result output module. The determining module performs text positioning on an obtained image to be recognized and determines text line scale information for at least some of the text lines it contains; the expansion module updates the scale values of each piece of text line scale information to obtain at least one piece of expanded text line scale information; the multi-scale character recognition module determines a first character recognition result for the image slice corresponding to the text line scale information and a second character recognition result for the image slice corresponding to each piece of expanded text line scale information; and the output module outputs the first or second character recognition result if the two are the same.
Another aspect of the present disclosure provides an electronic device including one or more processors and a memory storing executable instructions that, when executed by the processors, implement the above method of recognizing text.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method of recognizing text as above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing a method of recognizing text as above when executed.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of the method for recognizing text, the apparatus for recognizing text, and the electronic device according to an embodiment of the disclosure;
fig. 2 schematically illustrates an exemplary system architecture to which the method and apparatus for recognizing text may be applied, according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of recognizing text in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of text line scale information, in accordance with an embodiment of the disclosure;
FIG. 5 schematically shows a schematic diagram of extended text line scale information, according to an embodiment of the present disclosure;
FIG. 6 schematically shows a schematic diagram of extended text line scale information according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for recognizing text according to an embodiment of the present disclosure; and
FIG. 8 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, or C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand it (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
OCR based on deep learning uses deep learning techniques to read out characters printed or written on paper. It comprises text detection and character recognition: text detection is performed first to find the coordinates of a text box, and the corresponding region is then fed into a deep learning network for recognition.
In the related art, the accuracy of OCR can be improved either algorithmically or strategically. Algorithmic improvement has two directions. The first is improving the accuracy of text detection; however, text detection has always been a technical difficulty and bottleneck in OCR, so the room for improving text detection algorithms is limited, and inaccurate text positioning affects subsequent character recognition. The second is improving the accuracy of character recognition; but character recognition is constrained by the accuracy of text detection, and if the text positioning is inaccurate, no character recognition algorithm can recover the correct characters. In other words, the accuracy of text detection determines the upper limit of character recognition. When algorithmic improvement hits a bottleneck, strategically increasing OCR accuracy therefore becomes more feasible.
To mitigate this problem, a multi-model method may be adopted in the related art to improve character recognition accuracy: several different character recognition models are trained, each positioned text line is fed into all of them, and the result is output only if the recognition results are the same. This method still cannot compensate for inaccurate text positioning.
When algorithmic improvement has reached a bottleneck, how to further improve OCR accuracy by strategically optimizing the text detection result on top of existing text detection and character recognition techniques becomes an urgent problem.
In order to facilitate understanding of the technical solutions of the embodiments of the present disclosure, some terms are first described.
Optical Character Recognition (OCR) uses optical and computer technology to read characters printed or written on paper and convert them into a format a computer can accept and understand.
"Multiple scales" refers to multiple sizes of pictures; in OCR it means cutting slices of different sizes out of the image to be recognized.
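For instance, cutting slices of different sizes around the same text line might look like this minimal NumPy sketch; the helper name and the example boxes are illustrative:

```python
import numpy as np

def crop_slice(image, box):
    """Cut one text-line slice out of an image array given a box
    (x1, y1, x2, y2) in pixel coordinates, clamping the box to the
    image bounds so that expanded boxes remain valid."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    h, w = image.shape[:2]
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    return image[y1:y2, x1:x2]

# Two boxes around the same text line produce slices of different sizes.
image = np.zeros((100, 200), dtype=np.uint8)
base = crop_slice(image, (10, 20, 60, 40))      # 50x20 slice
expanded = crop_slice(image, (8, 18, 62, 42))   # 54x24 slice
```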
The method for recognizing text includes a text line scale information expansion process and a character recognition result output process. In the expansion process, text positioning is first performed on the obtained image to be recognized, determining text line scale information for at least some of its text lines; then, for each piece of text line scale information, its scale values are updated to obtain at least one piece of expanded text line scale information associated with it. The output process then determines a first character recognition result for the image slice corresponding to the text line scale information and a second character recognition result for the image slice corresponding to each piece of expanded text line scale information; if the first and second character recognition results are the same, one of them is output.
In the embodiments of the disclosure, OCR accuracy is improved with a multi-scale strategy: after the text positioning algorithm outputs a text line, its scale information is varied (e.g., its coordinates are changed) to generate several text lines of different scales, each of which is fed into the character recognition model. The result is output only if the recognition results agree; otherwise nothing is output, which improves the accuracy of character recognition.
In the technical solution of the disclosure, the acquisition, storage, and use of the image to be recognized all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 schematically illustrates an application scenario of the method for recognizing text, the device for recognizing text, and the electronic device according to an embodiment of the disclosure.
As shown in fig. 1, the first line of text includes "ring size … …" and the second line includes "vector … …". Because the character rendered here as "quantity" has a top-bottom structure, irregular handwriting that draws it large can leave its top and bottom components far apart, so that text line division easily splits the character across two lines.
The box 101 in fig. 1 represents the text box a related-art method produces for the text line. The "quantity" character is not completely enclosed by box 101, so the text line is divided incorrectly. This can result in the wrong word "in" being recognized.
In this embodiment, the box 101 is expanded to obtain boxes 102 and 103. Box 102 divides the text line correctly, and character recognition then yields the correct result "vector"; the recognition result corresponding to box 103 is "inward". The character recognition process thus produces two results, "inward" and "vector". Even if both results have high confidence, at least one of them must be wrong, so the character recognition result is not output.
Fig. 2 schematically illustrates an exemplary system architecture to which the method and apparatus for recognizing text according to an embodiment of the present disclosure may be applied. Note that fig. 2 is only an example of a system architecture to which embodiments of the disclosure may be applied, intended to help those skilled in the art understand the technical content of the disclosure; it does not mean that the embodiments cannot be applied to other devices, systems, environments, or scenarios.
As shown in fig. 2, the system architecture 200 according to this embodiment may include terminal devices 201, 202, 203, a network 204 and a server 205. The network 204 may include a plurality of gateways, routers, hubs, network wires, etc. to provide a medium for communication links between the end devices 201, 202, 203 and the server 205. Network 204 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 201, 202, 203 to interact with other terminal devices and the server 205 via the network 204, for example to send a text recognition request, an image to be recognized, or a text recognition model request. The terminal devices 201, 202, 203 may be installed with various communication client applications, such as (by way of example only) a word processing application, a web browser, a banking application, an e-commerce application, a search application, an office application, an instant messaging tool, a mailbox client, or social platform software.
The terminal devices 201, 202, 203 include, but are not limited to, electronic devices such as smart phones, tablet computers, laptop portable computers, and the like.
The server 205 may receive and process requests such as model training requests. For example, the server 205 may be a background management server or a server cluster. The background management server can analyze received training requests, recognition requests, and the like, and feed the processing result back to the terminal device.
It should be noted that the method for recognizing text provided by the embodiment of the present disclosure may be executed by the terminal device 201, 202, 203 or the server 205. Accordingly, the text recognition device provided by the embodiment of the present disclosure may be disposed in the terminal device 201, 202, 203 or the server 205. It should be understood that the number of terminal devices, networks, and servers are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 3 schematically shows a flow chart of a method of recognizing text according to an embodiment of the present disclosure.
As shown in fig. 3, the method of recognizing text may include operations S310 to S340.
In operation S310, in response to obtaining the image to be recognized, text positioning is performed on the image, and text line scale information is determined for at least some of the text lines it contains.
In this embodiment, the image to be recognized may include printed text and/or handwritten text.
Text detection detects the position, extent, and layout of text in the image, and typically includes layout analysis and text line detection. The main questions it answers are where the text is and how large its extent is.
Specifically, the image to be recognized may be input into a text detection model to obtain, for each text line, information such as its coordinates; these serve as the initial, base text line coordinates.
The text in the image to be recognized may be arranged in rows. At least one text line may be extracted from the image to be recognized.
It should be noted that all text lines in the image may be extracted at a time, or only a part of text lines may be extracted from the image to be recognized at a time.
FIG. 4 schematically shows a schematic diagram of text line scale information according to an embodiment of the disclosure.
As shown in fig. 4, text line extraction may be performed based on the related art, resulting in a text line as indicated by block 101.
In operation S320, for each text line scale information, the scale value of the text line scale information is updated to obtain at least one expanded text line scale information associated with the text line scale information.
In this embodiment, the expanded text line may be obtained by expanding the text box, so as to obtain the expanded text line scale information. Wherein the scale information includes but is not limited to: at least one of coordinates of vertices of the text box, a height of the text box, a width of the text box, and the like.
Fig. 5 schematically shows a schematic diagram of extended text line scale information according to an embodiment of the present disclosure. Fig. 6 schematically shows a schematic diagram of extended text line scale information according to another embodiment of the present disclosure.
As shown in fig. 5, the box 102 is taller than the box 101 shown in fig. 4, so that all the strokes of the "vector" character can be framed.
As shown in fig. 6, the box 103 is wider than the box 101 shown in fig. 4.
In operation S330, a first character recognition result is determined for the image slice in the image to be recognized corresponding to the text line scale information, and a second character recognition result is determined for the image slice corresponding to each piece of expanded text line scale information.
The method for character recognition may adopt various related technical means, such as a character recognition technology based on a deep learning technology, which is not limited herein.
In operation S340, if the first character recognition result and the second character recognition result are the same, the first character recognition result or the second character recognition result is output.
In this embodiment, when the first character recognition result is the same as the second character recognition result, the extracted text line can be considered accurate and the recognition result reliable, so the result may be output.
Specifically, for example, the three obtained text line slices are each fed into the character recognition model, yielding three recognition results. The three character strings are compared; if they are completely consistent, the result is considered correct and output, and otherwise the recognition is considered problematic and nothing is output.
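The consistency check described above can be sketched as follows; the function name is an assumption, and the check generalizes to any number of slices:

```python
def vote_output(results):
    """Output the recognized string only if every slice's character
    recognition result is identical; otherwise withhold the output,
    treating the disagreement as a sign of unreliable positioning."""
    if results and len(set(results)) == 1:
        return results[0]
    return None
```

For example, three agreeing results yield the string, while any disagreement yields None and nothing is output.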
The method for recognizing characters provided by the embodiments of the disclosure requires no change to the algorithm models of related-art text detection and character recognition. When the algorithms have reached a bottleneck, the embodiments can continue to improve the effect through strategy optimization, improving the text positioning effect and thereby raising the upper limit of OCR recognition.
In some embodiments, the text line scale information includes coordinate information.
Accordingly, updating the scale value of the text line scale information to obtain at least one expanded text line scale information associated with the text line scale information may include the following operations.
The coordinate information corresponding to the text line scale information is updated based on a preset rule to obtain at least one piece of expanded coordinate information associated with it, so as to determine a second character recognition result for the image slice corresponding to each piece of expanded coordinate information.
The coordinate information includes, but is not limited to, the coordinates of the four vertices of the text box; the expanded coordinate information may then be the coordinates of the four vertices of the expanded text box. The coordinates may be expressed relative to the center of the display screen, the four corners of the display screen, or the origin of a specified coordinate system.
In some embodiments, the coordinate information includes reference point coordinate information, and the text line scale information further includes: at least one of height information and width information.
Correspondingly, updating the scale value of the text line scale information to obtain at least one piece of extended text line scale information associated with the text line scale information comprises: updating at least one of the coordinate information, the height information and the width information corresponding to the text line scale information based on a preset rule to obtain at least one of extended coordinate information, extended height information and extended width information associated with the text line scale information so as to determine a second character recognition result of the image slice corresponding to at least one of the extended coordinate information, the extended height information and the extended width information.
The height information may be in units of pixels, micrometers, millimeters, centimeters, or the like.
Specifically, as shown with reference to fig. 5, the reference point coordinate information includes coordinate values of the lower left vertex of the text box.
Accordingly, updating at least one of the coordinate information, the height information, and the width information corresponding to the text line size information based on a preset rule to obtain at least one of the extended coordinate information, extended height information, and extended width information associated with the text line size information includes at least one of the following. First, an expansion coefficient is determined. Then, the coordinate value of the lower left vertex of the text box is updated based on the expansion coefficient and a first preset rule to obtain the expanded lower-left-vertex coordinate, and the height information and/or the width information is updated based on the expansion coefficient and a second preset rule to obtain the expanded height information and/or expanded width information. The expansion coefficient may be preset, or may be set according to the actual scene.
In one embodiment, the multi-scale step proceeds as follows: a multi-scale coordinate transformation is applied to each text line to generate two new text lines, which together with the base text line (the text line identified by the related art) form three text lines in total. The three text line slices are each fed into the existing character recognition model to obtain three results. It should be noted that two text lines, four text lines, or more may also be used, which is not limited herein.
Regarding the multi-scale principle: the characters to be recognized include both handwriting and print, and exhibit skew, crowding, and adhesion, which are especially common in handwriting. As a result, text localization is not always accurate, and some characters are missed, falling outside the text line slice; characters at the edge of a slice may likewise be cut off or stuck together. A manually designed coordinate transformation therefore enlarges or shrinks the length and width of the base text line so that missed characters are more easily captured. In other words, without knowing whether characters were missed, the localization is perturbed at several scales, and the resulting boxes are each fed into the recognition model. If the results agree, it is highly probable that no characters were missed and the result is correct; otherwise the result is treated as incorrect and is not output.
The multi-scale implementation is as follows. The base text line has coordinates (origin_x, origin_y, origin_width, origin_height), and expand_value is defined as origin_height * 0.2. The two new boxes are then (origin_x, origin_y - 0.5 * expand_value, origin_width, origin_height + expand_value) and (origin_x - 0.5 * expand_value, origin_y, origin_width + expand_value, origin_height). This is the configuration we found to work well for bill recognition; it can be adjusted to the actual situation, and the number of multi-scale boxes can be increased.
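The formulas above can be sketched as a small helper; this is a minimal illustration of the heuristic described in the text (the 0.2 factor is the value the text reports for bill recognition), not a definitive implementation.

```python
def make_multiscale_boxes(origin_x, origin_y, origin_width, origin_height):
    """Generate the two expanded boxes from a base text-line box (x, y, w, h).

    expand_value = 0.2 * origin_height, per the heuristic in the text;
    the factor can be tuned for a given document type.
    """
    expand_value = origin_height * 0.2
    # Box 1: grow vertically, keeping the box centered on the original line.
    box_tall = (origin_x, origin_y - 0.5 * expand_value,
                origin_width, origin_height + expand_value)
    # Box 2: grow horizontally, likewise centered.
    box_wide = (origin_x - 0.5 * expand_value, origin_y,
                origin_width + expand_value, origin_height)
    return [box_tall, box_wide]
```

For example, a 100x30 base box at (10, 20) yields one box stretched to height 36 and one stretched to width 106, each re-centered on the original line.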
In some embodiments, the text box of a text line may also be expanded using coordinate information alone.
For example, the coordinate information includes coordinate values of a lower left vertex and an upper right vertex of the text box.
Accordingly, updating the coordinate information corresponding to the text line size information based on the preset rule to obtain at least one piece of expanded coordinate information associated with the text line size information may include the following. First, an expansion coefficient is determined. Then, the coordinate value of the lower left vertex of the text box is updated based on the expansion coefficient and a third preset rule to obtain the expanded lower-left-vertex coordinate, and the coordinate value of the upper right vertex is updated based on the expansion coefficient and a fourth preset rule to obtain the expanded upper-right-vertex coordinate.
The expansion coefficient, and the expansion of the four vertices of the text box, may both follow the previous embodiment. For example, the diagonal vertices of the base text line are coordinate 1 (origin_x1, origin_y1) and coordinate 2 (origin_x2, origin_y2). The expanded coordinate 1 is (origin_x1 - 0.5 * expand_value, origin_y1 - 0.5 * expand_value), and the expanded coordinate 2 is (origin_x2 + 0.5 * expand_value, origin_y2 + 0.5 * expand_value).
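The two-vertex variant can be sketched in the same way; this is a minimal illustration of the diagonal-vertex rule stated above, with expand_value supplied by the caller.

```python
def expand_diagonal(origin_x1, origin_y1, origin_x2, origin_y2, expand_value):
    """Expand a box given by its lower-left and upper-right vertices.

    Moves the lower-left vertex down-left and the upper-right vertex
    up-right by half the expansion value each, enlarging the box
    symmetrically around its center.
    """
    new_p1 = (origin_x1 - 0.5 * expand_value, origin_y1 - 0.5 * expand_value)
    new_p2 = (origin_x2 + 0.5 * expand_value, origin_y2 + 0.5 * expand_value)
    return new_p1, new_p2
```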
The text box corresponding to the text line can be expanded by the method, and the probability of inaccurate text line extraction result is reduced.
In some implementations, the method may further include: after the first character recognition result and/or the second character recognition result is determined, matching the first or second character recognition result against a dictionary, and outputting that result if the match succeeds.
Furthermore, the method may further include the following operation: if the dictionary match for the first or second character recognition result is empty, but the confidence of that result is higher than a first confidence, the first or second character recognition result is output. The confidence may relate to, for example, the sharpness of the image: the clearer the image, the higher the confidence of the recognized text.
In this embodiment, character recognition builds on text detection: it converts the text in an image slice into machine-readable text, answering the question of what each character is. The recognized text typically needs a further check to ensure its correctness, and text correction can be regarded as part of this stage. When the recognized content is restricted to words in a lexicon, the recognition is called lexicon-based; otherwise it is called lexicon-free. A lexicon-free recognition result may carry lower confidence than a lexicon-based one. However, if the confidence of the result is high enough, for example above a preset threshold, and the rule above is met (for example, the multiple recognition results are consistent), the character recognition result is still trustworthy and can be output.
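The match-then-confidence decision above can be sketched as follows; the set-based `dictionary`, the `confidence` score, and the `threshold` default are all assumptions for illustration, not the patent's prescribed data structures.

```python
def decide_output(result, dictionary, confidence, threshold=0.9):
    """Decide whether to output a recognition result.

    `dictionary` is a set of known words, `confidence` is the model's
    score in [0, 1], and `threshold` stands in for the "first
    confidence" mentioned in the text.
    """
    if result in dictionary:       # lexicon-based: match succeeded
        return result
    if confidence > threshold:     # lexicon-free, but confidently recognized
        return result
    return None                    # withhold an unmatched, low-confidence result
```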
In some embodiments, the method may further include: if the first character recognition result differs from the second character recognition result, output of both results is suppressed.
When the recognition results of the multiple text lines are inconsistent, the text box corresponding to at least some of the results is wrong; outputting them could return an erroneous recognition, so the output may be omitted. It should be noted that, to avoid silently dropping characters, a mark such as a prompt message may be placed at the position of the inconsistent recognition result.
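The consistency check itself is simple; a minimal sketch, where `results` holds the strings recognized from the base slice and each expanded slice, and a `None` return means the result is withheld and the span can be flagged for review:

```python
def consensus_output(results):
    """Return the recognized text only if all scales agree, else None."""
    if results and len(set(results)) == 1:
        return results[0]
    return None
```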
In some embodiments, the method may further include the following operations.
If the first character recognition result differs from the second character recognition result, a second confidence is determined for each of them. The confidence may be the score output by the model when producing the recognition result, typically between 0 and 1.
Prompt information indicating the second confidence is then output, so that a selection operation for the first or second character recognition result can be received. For example, the prompt may take the form of an annotation. A result whose scales disagree may also be shown in a color different from results whose scales agree, and the low-confidence candidates may be listed at the inconsistent position for the user to choose from.
In some embodiments, in order to improve the accuracy of the recognition result, the image to be recognized may be preprocessed.
Specifically, the word recognition process may include an image preprocessing process, a text detection process, and a word recognition process.
Image preprocessing usually corrects imaging problems. Common preprocessing steps include geometric transformations (perspective, warping, rotation, etc.), distortion correction, deblurring, image enhancement, and lighting correction, among others.
In the embodiment of the disclosure, on the basis of the text detection and character recognition of the related art, several existing models may be adopted. The text detection model finds the positions of the text lines to be recognized in the page and outputs their coordinates. Using this localization result, text line slices are cut out of the page, and each slice is then expanded to obtain several expanded text lines for the same text. The base and expanded text lines are each fed into the character recognition model and the results are compared; when they are consistent, the accuracy of the character recognition result can be assured.
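The pipeline just described can be tied together end to end; in this sketch `detect_lines`, `recognize`, and `make_scales` are assumed callables standing in for the existing detection model, the recognition model, and the box-expansion rule — none of them are real library APIs.

```python
def multiscale_ocr(image, detect_lines, recognize, make_scales):
    """End-to-end sketch of the multi-scale strategy.

    For each detected text line, recognize the base slice and every
    expanded slice, and keep the text only when all scales agree.
    """
    outputs = []
    for box in detect_lines(image):
        candidates = [box] + make_scales(box)
        results = [recognize(image, b) for b in candidates]
        # Output only when every scale yields the same text.
        if len(set(results)) == 1:
            outputs.append(results[0])
    return outputs
```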
In the disclosed embodiment, OCR recognition accuracy is improved both algorithmically and strategically. Algorithmically, improving the detection model is difficult and yields limited gains, while improving the recognition algorithm cannot compensate for inaccurate localization. Strategically, the traditional multi-model ensemble approach likewise fails to address inaccurate text localization.
In view of the above, embodiments of the present disclosure improve OCR accuracy with a multi-scale strategy. On the basis of a text line output by an existing text detection algorithm, several text lines at different scales are generated through coordinate transformation, and the image of each is fed into the character recognition model. If the recognition results are consistent, the result is output; otherwise it is not. This scheme effectively improves recognition accuracy. Its first advantage is the optimization of localization, which raises the upper bound of OCR recognition; its second is that, being a strategic optimization, it can be adapted to most OCR recognition pipelines.
Another aspect of the present disclosure also provides an apparatus for recognizing text.
Fig. 7 schematically shows a block diagram of an apparatus for recognizing text according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 may include: a text line size information determining module 710, a text line size information expanding module 720, a multi-scale character recognition module 730 and a character recognition result output module 740.
The text line size information determining module 710 is configured to perform text positioning on the image to be recognized in response to the obtained image to be recognized, and determine text line size information of at least some text lines in at least one text line included in the image to be recognized.
The text line scale information expansion module 720 is configured to update the scale value of the text line scale information for each piece of text line scale information to obtain at least one piece of expanded text line scale information associated with the text line scale information.
The multi-scale character recognition module 730 is configured to determine a first character recognition result of an image slice corresponding to the text line scale information in the image to be recognized, and a second character recognition result of an image slice corresponding to each expanded text line scale information in the at least one expanded text line scale information.
The character recognition result output module 740 is configured to output the first character recognition result or the second character recognition result if the first character recognition result is the same as the second character recognition result.
In some embodiments, the text line scale information includes coordinate information.
Correspondingly, the text line scale information extension module 720 is specifically configured to update the coordinate information corresponding to the text line scale information based on a preset rule, to obtain at least one extension coordinate information associated with the text line scale information, so as to determine a second character recognition result of the image slice corresponding to the at least one extension coordinate information.
In some embodiments, the coordinate information includes reference point coordinate information, and the text line scale information further includes: at least one of height information and width information.
Correspondingly, the text line size information extension module 720 is specifically configured to update at least one of the coordinate information, the height information, and the width information corresponding to the text line size information based on a preset rule, to obtain at least one of extended coordinate information, extended height information, and extended width information associated with the text line size information, so as to determine a second character recognition result of the image slice corresponding to the at least one of the extended coordinate information, the extended height information, and the extended width information.
In some embodiments, the reference point coordinate information includes coordinate values of a lower left vertex of the text box.
Correspondingly, the text line scale information expansion module 720 is specifically configured to update the coordinate value of the lower left vertex of the text box based on the expansion coefficient and the first preset rule to obtain the coordinate value of the expanded lower left vertex, and update the height information and/or the width information based on the expansion coefficient and the second preset rule to obtain the expanded height information and/or the expanded width information.
It should be noted that the implementation, solved technical problems, implemented functions, and achieved technical effects of each module/unit and the like in the apparatus part embodiment are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of each corresponding step in the method part embodiment, and are not described in detail herein.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any plurality of the text line size information determination module 710, the text line size information expansion module 720, the multi-scale character recognition module 730, and the character recognition result output module 740 may be combined and implemented in one module, or any one of them may be divided into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the text line size information determining module 710, the text line size information expanding module 720, the multi-scale word recognition module 730, and the word recognition result outputting module 740 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementation manners of software, hardware, and firmware, or by a suitable combination of any of them. Alternatively, at least one of the text line size information determination module 710, the text line size information expansion module 720, the multi-scale word recognition module 730, and the word recognition result output module 740 may be at least partially implemented as a computer program module that, when executed, may perform corresponding functions.
FIG. 8 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are communicatively connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, electronic device 800 may also include an input/output (I/O) interface 805, likewise connected to bus 804. Electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 810 as necessary, so that a computer program read out therefrom can be installed into the storage section 808 as needed.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
It should be noted that the method for recognizing text, the apparatus for recognizing text, and the electronic device provided in the embodiments of the present disclosure may be used for text-recognition-related applications in the field of artificial intelligence, in many fields outside artificial intelligence, and also in the financial field. The application fields of the method, the apparatus, and the electronic device provided by the embodiments of the present disclosure are not limited.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present disclosure also provide a computer program product comprising a computer program, the computer program containing program code for performing the method provided by the embodiments of the present disclosure; when the computer program product runs on an electronic device, the program code causes the electronic device to implement the method for recognizing text provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 801, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via communication section 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Such languages include, but are not limited to, Java, C++, Python, C, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
Those skilled in the art will appreciate that various combinations and/or sub-combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations are not expressly recited in the present disclosure. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A method of recognizing text, comprising:
in response to the obtained image to be recognized, performing text positioning on the image to be recognized, and determining text line scale information of at least part of text lines in at least one text line included in the image to be recognized;
for each of the text line scale information,
updating the scale value of the text line scale information to obtain at least one piece of extended text line scale information associated with the text line scale information;
determining a first character recognition result of an image slice corresponding to the text line scale information in the image to be recognized and a second character recognition result of an image slice corresponding to each expanded text line scale information in the at least one expanded text line scale information; and
if the first character recognition result is the same as the second character recognition result, outputting the first character recognition result or the second character recognition result.
2. The method of claim 1, wherein the text line scale information comprises coordinate information;
updating the scale value of the text line scale information to obtain at least one expanded text line scale information associated with the text line scale information comprises:
updating the coordinate information corresponding to the text line scale information based on a preset rule to obtain at least one piece of extended coordinate information associated with the text line scale information so as to determine a second character recognition result of the image slice corresponding to the at least one piece of extended coordinate information.
3. The method of claim 2, wherein the coordinate information comprises reference point coordinate information, the text line scale information further comprising: at least one of height information and width information; and
updating the scale value of the text line scale information to obtain at least one expanded text line scale information associated with the text line scale information comprises:
updating at least one of the coordinate information, the height information and the width information corresponding to the text line scale information based on a preset rule to obtain at least one of extended coordinate information, extended height information and extended width information associated with the text line scale information, so as to determine a second character recognition result of the image slice corresponding to at least one of the extended coordinate information, the extended height information and the extended width information.
4. The method of claim 3, wherein the reference point coordinate information includes a coordinate value of a lower left vertex of a text box;
updating at least one of coordinate information, height information and width information corresponding to the text line scale information based on a preset rule, and obtaining at least one of extended coordinate information, extended height information and extended width information associated with the text line scale information includes:
updating the coordinate value of the lower left vertex of the text box based on an expansion coefficient and a first preset rule to obtain the expanded coordinate value of the lower left vertex, and updating the height information and/or the width information based on the expansion coefficient and a second preset rule to obtain the expanded height information and/or the expanded width information.
5. The method of claim 2, wherein the coordinate information includes a coordinate value of a lower left vertex and a coordinate value of an upper right vertex of the text box;
updating the coordinate information corresponding to the text line scale information based on a preset rule, and obtaining at least one piece of extended coordinate information associated with the text line scale information includes:
updating the coordinate value of the lower left vertex of the text box based on the expansion coefficient and a third preset rule to obtain the expanded coordinate value of the lower left vertex, and updating the coordinate value of the upper right vertex of the text box based on the expansion coefficient and a fourth preset rule to obtain the expanded coordinate value of the upper right vertex.
6. The method of claim 1, further comprising, after determining the first character recognition result and/or the second character recognition result:
matching the first character recognition result or the second character recognition result against a dictionary, and if the matching succeeds, outputting the first character recognition result or the second character recognition result.
7. The method of claim 6, further comprising:
if the matching result of the first character recognition result or the second character recognition result is empty, and the confidence of the first character recognition result or the second character recognition result is higher than a first confidence, outputting the first character recognition result or the second character recognition result.
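Claims 6 and 7 can be read as a dictionary post-check with a confidence fallback. A minimal sketch, assuming a simple set-membership dictionary and a scalar confidence (both the threshold name and value are assumptions, not from the publication):

```python
def postprocess(result, confidence, dictionary, first_confidence=0.9):
    """Hypothetical post-check for an OCR result.

    If the result matches the dictionary, output it (claim 6).
    If the dictionary match is empty but the recognizer's confidence
    exceeds a first confidence threshold, output the raw result
    anyway (claim 7). Otherwise, suppress the result."""
    if result in dictionary:
        return result
    if confidence > first_confidence:
        return result
    return None
```

In practice the dictionary could be a domain vocabulary (e.g. bank form field values) rather than a plain set.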
8. The method of any of claims 1-7, further comprising:
if the first character recognition result is different from the second character recognition result, refraining from outputting the first character recognition result and the second character recognition result.
9. The method of any of claims 1-7, further comprising:
if the first character recognition result and the second character recognition result are different, determining respective second confidence degrees of the first character recognition result and the second character recognition result; and
outputting prompt information indicating the second confidence degrees, so as to receive a selection operation for the first character recognition result or the second character recognition result.
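The consistency check of claims 1, 8 and 9 amounts to: accept a result only when recognition at the original and expanded scales agree; otherwise withhold it and surface both confidences for a user selection. A sketch under that reading (the return shape is an assumption for illustration):

```python
def multiscale_verify(first, second, first_conf, second_conf):
    """Compare the first (original-scale) and second (expanded-scale)
    character recognition results.

    Agreement: return the result and no prompt (claim 1).
    Disagreement: return no result plus prompt information carrying
    both results and their second confidence degrees (claims 8-9)."""
    if first == second:
        return first, None
    prompt = {"first": (first, first_conf), "second": (second, second_conf)}
    return None, prompt
```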
10. An apparatus for recognizing text, comprising:
the text line scale information determining module is used for, in response to an obtained image to be recognized, performing text positioning on the image to be recognized and determining text line scale information of at least part of text lines in at least one text line included in the image to be recognized;
the text line scale information expansion module is used for updating the scale value of the text line scale information for each piece of text line scale information to obtain at least one piece of expanded text line scale information associated with the text line scale information;
the multi-scale character recognition module is used for determining a first character recognition result of an image slice corresponding to the text line scale information in the image to be recognized and a second character recognition result of an image slice corresponding to each expanded text line scale information in the at least one expanded text line scale information; and
the character recognition result output module is used for outputting the first character recognition result or the second character recognition result if the first character recognition result is the same as the second character recognition result.
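The four modules of claim 10 compose a pipeline mirroring the method claims. A hypothetical wiring, with the detection, recognition and expansion back-ends injected as callables since the publication specifies none:

```python
class TextRecognizer:
    """Sketch of the claimed apparatus: one method per module.

    detect(image)        -> iterable of text line boxes
                            (text line scale information determining module)
    expand(box)          -> iterable of expanded boxes
                            (text line scale information expansion module)
    recognize(image, box) -> recognized string
                            (multi-scale character recognition module)
    All three back-ends are assumed interfaces, not part of the patent."""

    def __init__(self, detect, recognize, expand):
        self.detect = detect
        self.recognize = recognize
        self.expand = expand

    def run(self, image):
        results = []
        for box in self.detect(image):
            first = self.recognize(image, box)
            # Output module: emit only when every expanded-scale
            # recognition agrees with the original-scale result.
            if all(self.recognize(image, b) == first for b in self.expand(box)):
                results.append(first)
        return results
```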
11. An electronic device, comprising:
one or more processors;
a storage device for storing executable instructions which, when executed by the one or more processors, implement the method of recognizing text according to any one of claims 1 to 9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, implement a method of recognizing text according to any one of claims 1 to 9.
CN202111375638.XA 2021-11-19 2021-11-19 Text recognition method and device and electronic equipment Pending CN114067328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111375638.XA CN114067328A (en) 2021-11-19 2021-11-19 Text recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111375638.XA CN114067328A (en) 2021-11-19 2021-11-19 Text recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114067328A true CN114067328A (en) 2022-02-18

Family

ID=80278555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111375638.XA Pending CN114067328A (en) 2021-11-19 2021-11-19 Text recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114067328A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331231A (en) * 2022-08-17 2022-11-11 北京睿企信息科技有限公司 Method for recognizing target text based on text, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
US10762376B2 (en) Method and apparatus for detecting text
CN108229303B (en) Detection recognition and training method, device, equipment and medium for detection recognition network
US11288719B2 (en) Identifying key-value pairs in documents
CN109117814B (en) Image processing method, image processing apparatus, electronic device, and medium
US10163007B2 (en) Detecting orientation of textual documents on a live camera feed
CN108734161B (en) Method, device and equipment for identifying prefix number area and storage medium
CN112560862A (en) Text recognition method and device and electronic equipment
CN115861400B (en) Target object detection method, training device and electronic equipment
CN113362420A (en) Road marking generation method, device, equipment and storage medium
CN114724166A (en) Title extraction model generation method and device and electronic equipment
CN115620325A (en) Table structure restoration method and device, electronic equipment and storage medium
CN114782957A (en) Method, device, electronic equipment and medium for determining text information in stamp image
US11881044B2 (en) Method and apparatus for processing image, device and storage medium
CN114067328A (en) Text recognition method and device and electronic equipment
CN114596188A (en) Watermark detection method, model training method, device and electronic equipment
EP4105896A2 (en) Method, apparatus and platform of generating document, electronic device, storage medium and program product
CN114120305B (en) Training method of text classification model, and text content recognition method and device
CN113033431B (en) Optical character recognition model training and recognition method, device, equipment and medium
CN114663886A (en) Text recognition method, model training method and device
CN114511863A (en) Table structure extraction method and device, electronic equipment and storage medium
CN113361371A (en) Road extraction method, device, equipment and storage medium
CN111339341A (en) Model training method and device, positioning method and device, and equipment
US11710331B2 (en) Systems and methods for separating ligature characters in digitized document images
CN113112567A (en) Method and device for generating editable flow chart, electronic equipment and storage medium
CN114049633A (en) Image recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination