CN112183250A - Character recognition method and device, storage medium and electronic equipment - Google Patents

Character recognition method and device, storage medium and electronic equipment

Info

Publication number
CN112183250A
CN112183250A (application CN202010963512.3A)
Authority
CN
China
Prior art keywords
character
text
text box
recognition result
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010963512.3A
Other languages
Chinese (zh)
Inventor
姜仟艺
刘曦
宋祺
张睿
周锴
李楠
周永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202010963512.3A
Publication of CN112183250A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure relates to a character recognition method and device, a storage medium and an electronic device. The method comprises: performing location detection on the characters in an image to be recognized to obtain the text boxes in the image, where the characters in each text box share the same line direction; for each text box, determining the line direction of the characters in the text box; and obtaining a character recognition result of the text box according to its line direction. With this method, the text boxes in the image to be recognized can be detected regardless of the font type and size of the characters and regardless of their line direction or layout, and a character recognition result corresponding to the detected line direction of each text box can be obtained. The method can therefore perform character recognition on images of any format.

Description

Character recognition method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of character recognition technologies, and in particular to a character recognition method, a character recognition device, a storage medium and an electronic device.
Background
People often need to process large numbers of pictures, texts and reports in daily life. Character recognition technology was developed to reduce this workload and improve processing efficiency. It can be applied in many fields, such as document and picture retrieval, letter and parcel sorting, manuscript editing and proofreading, summarizing and analyzing statistical reports and cards, bank check processing, statistical summarizing of commodity invoices, commodity code recognition, recognition of various certificates, sensitive-word auditing and the like.
In the related art, characters in an image are recognized by Optical Character Recognition (OCR) techniques. Specifically, a model is trained on picture samples of a certain fixed format, and the trained model is then used to perform character recognition on pictures of that fixed format, so as to obtain the character content in the pictures. However, such a trained model cannot be used to recognize characters in pictures of other formats.
Disclosure of Invention
The present disclosure is directed to a method, an apparatus, a storage medium, and an electronic device for recognizing characters, so as to solve the problems in the related art.
In order to achieve the above object, according to a first aspect of the embodiments of the present disclosure, there is provided a character recognition method, comprising:
performing location detection on characters in an image to be recognized to obtain text boxes in the image, where the characters in each text box share the same line direction;
for each text box, determining the line direction of the characters in the text box; and
obtaining a character recognition result of the text box according to the line direction of the text box.
Optionally, the determining, for each text box, the line direction of the characters in the text box comprises:
for each text box, determining the center-point coordinates of each character in the text box; and
determining the line direction of the text box according to the center-point coordinates of each character in the text box.
Optionally, the line directions comprise a horizontal line direction and a vertical line direction;
the determining the line direction of the text box according to the center-point coordinates of each character in the text box comprises:
calculating the X-axis offset and the Y-axis offset between the center-point coordinates of each character pair in the text box, where a character pair denotes two adjacent characters in the text box;
calculating the average X-axis offset and the average Y-axis offset over all character pairs of the text box;
determining the line direction of the text box to be the horizontal line direction when the average X-axis offset is greater than or equal to the average Y-axis offset; and
determining the line direction of the text box to be the vertical line direction when the average X-axis offset is less than the average Y-axis offset.
Optionally, the obtaining a character recognition result of the text box according to the line direction of the text box comprises:
if the line direction of the text box is the horizontal line direction, inputting the text box into a horizontal recognition model to obtain the character recognition result of the horizontal line; and
if the line direction of the text box is the vertical line direction, inputting the text box into a vertical recognition model to obtain the character recognition result of the vertical line.
Optionally, before the determining, for each text box, the line direction of the characters in the text box, the method further comprises:
for each text box, inputting the text box into a horizontal recognition model and a vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model;
and the obtaining a character recognition result of the text box according to the line direction of the text box comprises:
selecting the character recognition result of the text box from the first recognition result and the second recognition result according to the line direction.
Optionally, the determining the line direction of the characters in the text box comprises:
determining the line direction of the text box to be the horizontal line direction when the first recognition result is not empty and the length of the text box is greater than its height;
determining the line direction of the text box to be the vertical line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, and the length of the first recognition result is less than the length of the second recognition result;
determining the line direction of the text box to be the vertical line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is greater than 1; and
determining the line direction of the text box to be the horizontal line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is less than or equal to 1.
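The four conditions above form a simple decision procedure. A hedged Python sketch (function and argument names are illustrative; the claim does not state what happens when the first recognition result is empty, so the fallback below is an assumption):

```python
def choose_line_direction(first_result, second_result, box_length, box_height):
    """Pick a text box's line direction from the horizontal model's output
    (first_result), the vertical model's output (second_result), and the
    box geometry, following the four conditions stated above."""
    if not first_result:
        # Not covered by the claim; falling back to vertical is an assumption.
        return "vertical"
    if box_length > box_height:
        return "horizontal"
    if len(first_result) < len(second_result):
        return "vertical"
    # From here on, len(first_result) >= len(second_result)
    return "vertical" if len(second_result) > 1 else "horizontal"
```

For example, a wide box always resolves to horizontal, while a tall box whose vertical-model output is both longer and multi-character resolves to vertical.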
Optionally, before the performing location detection on the characters in the image to be recognized, the method further comprises:
correcting the image to be recognized into a front view;
and after the text boxes in the image to be recognized are obtained, the method further comprises:
correcting each obtained text box so that the text box is rectangular in shape.
Optionally, the correcting the obtained text box so that the text box is rectangular in shape comprises:
determining the circumscribed quadrilateral having the largest intersection-over-union with the area of the text box;
determining the coordinates of four target vertices according to the coordinates of the four vertices of the circumscribed quadrilateral;
determining a transformation matrix according to the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; and
transforming the text box according to the transformation matrix to obtain a text box that is rectangular in shape.
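The matrix-based rectification above corresponds to a standard perspective (homography) transform. A minimal numpy sketch of the transformation-matrix step, assuming the four quadrilateral vertices and the four target rectangle vertices are already known (the max-IoU quadrilateral fitting is omitted; in practice OpenCV's getPerspectiveTransform and warpPerspective perform these steps):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 transformation matrix that maps the four
    quadrilateral vertices `src` onto the four target vertices `dst`
    (each a sequence of four (x, y) pairs), via the standard
    8-equation linear system for a homography."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(M, p):
    """Apply the transformation matrix to a single (x, y) point."""
    u, v, w = M @ np.array([p[0], p[1], 1.0])
    return (u / w, v / w)
```

Applying `warp_point` with the resulting matrix to each of the source vertices yields the corresponding target rectangle corners, which is exactly the property the conversion step relies on.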
Optionally, the horizontal recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module.
Optionally, the horizontal recognition model comprises a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module;
and the method further comprises:
before a text box whose line direction is the vertical line direction is input into the vertical recognition model, rotating the text box by a preset angle in a preset direction.
According to a second aspect of the embodiments of the present disclosure, there is provided a character recognition device, comprising:
a generating module configured to perform location detection on characters in an image to be recognized to obtain text boxes in the image, where the characters in each text box share the same line direction;
a first determining module configured to determine, for each text box, the line direction of the characters in the text box; and
an execution module configured to obtain a character recognition result of the text box according to the line direction of the text box.
Optionally, the first determining module comprises:
a first determining submodule configured to determine, for each text box, the center-point coordinates of each character in the text box; and
a second determining submodule configured to determine the line direction of the text box according to the center-point coordinates of each character in the text box.
Optionally, the line directions comprise a horizontal line direction and a vertical line direction;
and the second determining submodule is specifically configured to:
calculate the X-axis offset and the Y-axis offset between the center-point coordinates of each character pair in the text box, where a character pair denotes two adjacent characters in the text box;
calculate the average X-axis offset and the average Y-axis offset over all character pairs of the text box;
determine the line direction of the text box to be the horizontal line direction when the average X-axis offset is greater than or equal to the average Y-axis offset; and
determine the line direction of the text box to be the vertical line direction when the average X-axis offset is less than the average Y-axis offset.
Optionally, the execution module comprises:
a first execution submodule configured to input the text box into a horizontal recognition model when the line direction of the text box is the horizontal line direction, to obtain the character recognition result of the horizontal line; and
a second execution submodule configured to input the text box into a vertical recognition model when the line direction of the text box is the vertical line direction, to obtain the character recognition result of the vertical line.
Optionally, the device further comprises:
an input module configured to, before the line direction of the characters in each text box is determined, input each text box into a horizontal recognition model and a vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model;
and the execution module comprises:
a selection submodule configured to select the character recognition result of the text box from the first recognition result and the second recognition result according to the line direction.
Optionally, the first determining module comprises:
a third determining submodule configured to determine the line direction of the text box to be the horizontal line direction when the first recognition result is not empty and the length of the text box is greater than its height;
a fourth determining submodule configured to determine the line direction of the text box to be the vertical line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, and the length of the first recognition result is less than the length of the second recognition result;
a fifth determining submodule configured to determine the line direction of the text box to be the vertical line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is greater than 1; and
a sixth determining submodule configured to determine the line direction of the text box to be the horizontal line direction when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is less than or equal to 1.
Optionally, the device further comprises:
a first correction module configured to correct the image to be recognized into a front view before location detection is performed on the characters in the image; and
a second correction module configured to correct each obtained text box after the text boxes in the image are obtained, so that the text box is rectangular in shape.
Optionally, the second correction module is specifically configured to:
determine the circumscribed quadrilateral having the largest intersection-over-union with the area of the text box;
determine the coordinates of four target vertices according to the coordinates of the four vertices of the circumscribed quadrilateral;
determine a transformation matrix according to the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; and
transform the text box according to the transformation matrix to obtain a text box that is rectangular in shape.
Optionally, the horizontal recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module.
Optionally, the horizontal recognition model comprises a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module;
and the device further comprises:
a rotation module configured to rotate a text box whose line direction is the vertical line direction by a preset angle in a preset direction before the text box is input into the vertical recognition model.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any one of the first aspect above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon; and
a processor configured to execute the computer program in the memory to implement the steps of the method of any one of the first aspect above.
By adopting the above technical solution, at least the following technical effects can be achieved:
Location detection is performed on the characters in the image to be recognized to obtain the text boxes in the image, where each text box is a local image containing characters from the image to be recognized and the characters in each text box share the same line direction. For each text box, the line direction of the characters in the text box is determined, and a character recognition result of the text box is obtained according to that line direction. In this way, the character recognition result of the whole image can be obtained. With this method, the text boxes in the image can be detected regardless of the font type and size of the characters and regardless of their line direction or layout, and a character recognition result corresponding to the detected line direction of each text box can be obtained. Compared with the related art, the method can perform character recognition on images of different formats, that is, it is suitable for character recognition on images of any format, and thus solves the problems in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 is a flow chart illustrating a character recognition method according to an exemplary embodiment of the present disclosure.
Fig. 2 illustrates a neural network for detecting the location of a text in an image to be recognized according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a method of determining a line direction of a text box according to an exemplary embodiment of the present disclosure.
FIG. 4 is an architectural block diagram illustrating a lateral recognition model according to an exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating another method of determining a line direction of a text box according to an exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating yet another method of determining a line direction of a text box according to an exemplary embodiment of the present disclosure.
Fig. 7 is a flow chart illustrating another character recognition method according to an exemplary embodiment of the present disclosure.
Fig. 8 is a flow chart illustrating yet another character recognition method according to an exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram illustrating a character recognition device according to an exemplary embodiment of the present disclosure.
Fig. 10 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
People often need to process large numbers of pictures, texts and reports in daily life. For example, sensitive-word auditing is performed on the words in a picture. As another example, a copy or picture of a document is recognized. As yet another example, the text information in an invoice picture is recorded, and so on.
In the related art, a trained model is generally used to recognize characters in an image. For example, for identity-card images, a large number of identity-card images are used as training samples to train an identity-card recognition model, and an identity-card image to be recognized is then input into the trained model to obtain the character recognition result it outputs. Likewise, for ordinary invoice images, a large number of invoice images are used as training samples to train an invoice recognition model, and an invoice image to be recognized is then input into the trained model to obtain the character recognition result it outputs.
However, each model trained in this way can only recognize images of the fixed format corresponding to that model; the identity-card recognition model cannot recognize the characters in an invoice image, and the invoice recognition model cannot recognize the characters in an identity-card image. It is understood that each model cannot be applied to images outside its corresponding fixed format because a non-fixed character arrangement increases the difficulty of character recognition, as do drastic changes in the size and font of the characters to be recognized.
In view of the above, embodiments of the present disclosure provide a method, an apparatus, a storage medium, and an electronic device for recognizing characters, so as to solve the problems in the related art.
Fig. 1 is a flow chart illustrating a character recognition method according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the method may include the following steps:
S11, performing location detection on characters in an image to be recognized to obtain text boxes in the image, where the characters in each text box share the same line direction;
S12, for each text box, determining the line direction of the characters in the text box;
S13, obtaining a character recognition result of the text box according to the line direction of the text box.
A text box is a local image containing characters from the image to be recognized. The characters in each text box share the same line direction. For example, the line direction may be horizontal, vertical or oblique.
In one implementation, the joint detection neural network based on pixel segmentation and an attention mechanism shown in fig. 2 may be used to perform location detection on the characters in the image to be recognized, to obtain the text boxes in the image; for example, the text boxes shown as white boxes in the lower-right picture of fig. 2 may be obtained.
After the text boxes in the image to be recognized are obtained, the line direction of each text box can be determined. By way of example, with continued reference to fig. 2, it can clearly be determined that the line direction of the text boxes shown by the white boxes in fig. 2 is horizontal.
After the line direction of a text box is determined, the recognition result of the text box can be obtained based on that line direction. For example, each character in the text box is recognized in turn according to the line direction of the text box, so as to obtain the corresponding character recognition result.
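The flow just described can be sketched as a small dispatch loop (a hedged illustration; detect_text_boxes, get_line_direction and recognize are hypothetical stand-ins for the detection network, the direction-determination step and the direction-aware recognition models):

```python
def recognize_image(image, detect_text_boxes, get_line_direction, recognize):
    """Run the S11-S13 pipeline: detect text boxes, determine each box's
    line direction, then recognize each box according to that direction."""
    results = []
    for box in detect_text_boxes(image):           # S11: location detection
        direction = get_line_direction(box)        # S12: line direction per box
        results.append(recognize(box, direction))  # S13: direction-aware recognition
    return results
```

The per-box dispatch is what lets a single image mix horizontal and vertical lines: each text box is recognized with the model matching its own line direction.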
With this method, the text boxes in the image to be recognized are obtained by performing location detection on the characters in the image, where each text box is a local image containing characters and the characters in each text box share the same line direction. For each text box, the line direction of the characters in the text box is determined, and a character recognition result of the text box is obtained according to that line direction. In this way, the character recognition result of the whole image to be recognized can be obtained. The text boxes can be detected regardless of the font type and size of the characters and regardless of their line direction or layout, and the specific line direction of the characters in each text box, such as horizontal, vertical or oblique, can be obtained. A character recognition result corresponding to the detected line direction of each text box is then obtained. Compared with the related art, the method can perform character recognition on images of different formats, that is, it is suitable for character recognition on images of any format, and thus solves the problems in the related art.
Optionally, the determining, for each text box, the line direction of the characters in the text box may include the following steps:
for each text box, determining the center-point coordinates of each character in the text box; and determining the line direction of the text box according to the center-point coordinates of each character in the text box.
Specifically, a Faster R-CNN network may be used to locate each individual character in the image to be recognized, so as to obtain the center-point coordinates of each character. For each text box, the line direction of the text box can then be determined according to the center-point coordinates of the characters in the text box.
It should be noted that, in the present disclosure, the line direction specifically indicates whether the characters are arranged left-to-right or top-to-bottom. It is understood that, besides horizontal and vertical lines, the line direction may also be oblique, uneven, and so on. Oblique and uneven lines can likewise be classified as horizontal or vertical according to whether their characters are arranged roughly horizontally or vertically. Thus, in the present disclosure, all line directions can be divided into the horizontal line direction and the vertical line direction.
In detail, referring to fig. 3, the line direction of each text box can be determined by:
S31, for each text box, calculate the X-axis offset and the Y-axis offset of the center point coordinates of each character pair in the text box, where a character pair is two adjacent characters in the text box.

A character pair refers to any two adjacent characters in the text box. For example, if the text in the box is ABCDE (horizontal, vertical, or oblique), all character pairs in the box are AB, BC, CD, DE. Next, the X-axis and Y-axis offsets of the two characters in each pair are calculated; for example, the X-axis offset of the pair AB is |Ax - Bx| and its Y-axis offset is |Ay - By|, where |·| denotes the absolute value.
S32, calculate the average X-axis offset and the average Y-axis offset over all character pairs of the text box.
By way of example, the average X-axis offset over the pairs AB, BC, CD, DE can be calculated as

(|Ax - Bx| + |Bx - Cx| + |Cx - Dx| + |Dx - Ex|) / 4.

Similarly, the average Y-axis offset over all character pairs can be calculated.
S33, when the average X-axis offset is greater than or equal to the average Y-axis offset, determine the line direction of the text box to be the horizontal line direction.

That is, after the average X-axis offset and the average Y-axis offset of all character pairs in the text box have been determined, if the average X-axis offset is greater than or equal to the average Y-axis offset, the line direction of the text box is determined to be horizontal.
It should be noted that when the average X-axis offset is greater than or equal to the average Y-axis offset, their ratio (average X-axis offset divided by average Y-axis offset) is greater than or equal to 1, and by trigonometry the angle between the projection of the mean character center-point offset in the text box and the X axis necessarily lies between -45° and 45°.
S34, when the average X-axis offset is smaller than the average Y-axis offset, determine the line direction of the text box to be the vertical line direction.

That is, after the average X-axis offset and the average Y-axis offset of all character pairs in the text box have been determined, if the average X-axis offset is smaller than the average Y-axis offset, the line direction of the text box is determined to be vertical.
It will be understood that when the average X-axis offset is smaller than the average Y-axis offset, their ratio (average X-axis offset divided by average Y-axis offset) is less than 1, and by trigonometry the angle between the projection of the mean character center-point offset in the text box and the X axis necessarily lies between 45° and 90° or between -90° and -45°.
Therefore, with the above manner of determining the line direction of each text box, besides classifying horizontal-line boxes as horizontal and vertical-line boxes as vertical, oblique text boxes are also classified as either horizontal or vertical. Hence, for text boxes of every line-direction type in the image to be recognized (i.e., boxes whose mean center-point offset projects onto the X axis at any angle), it can be determined whether the text is, specifically, horizontal or vertical.
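The S31-S34 procedure above can be sketched in a few lines. This is an illustrative reconstruction, not the patent's actual implementation, and the function and variable names are invented for the example:

```python
# Minimal sketch of steps S31-S34. `centers` lists the character
# center-point coordinates (x, y) of one text box, in order.

def line_direction(centers):
    # S31: X- and Y-axis offsets of each adjacent character pair
    dxs = [abs(a[0] - b[0]) for a, b in zip(centers, centers[1:])]
    dys = [abs(a[1] - b[1]) for a, b in zip(centers, centers[1:])]
    # S32: averages over all pairs
    mean_dx = sum(dxs) / len(dxs)
    mean_dy = sum(dys) / len(dys)
    # S33/S34: compare the averages (a tie counts as horizontal)
    return "horizontal" if mean_dx >= mean_dy else "vertical"
```

Because only the two averages are compared, oblique boxes fall out naturally as horizontal or vertical, as described above.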
Further, after the line direction of each text box in the image to be recognized has been determined, obtaining the text recognition result of a text box according to its line direction may include the following steps:

If the line direction of the text box is the horizontal line direction, the text box is input into a horizontal recognition model to obtain the recognition result of the horizontal line of text; if the line direction of the text box is the vertical line direction, the text box is input into a vertical recognition model to obtain the recognition result of the vertical line of text.
It should be noted that the horizontal recognition model is trained with samples of horizontal-line text boxes, and the vertical recognition model is trained with samples of vertical-line text boxes. Therefore, inputting a text box whose line direction is horizontal into the horizontal recognition model yields the recognition result of its horizontal line of text, and inputting a text box whose line direction is vertical into the vertical recognition model yields the recognition result of its vertical line of text.
The specific training processes of the horizontal recognition model and the vertical recognition model are similar to those of the model training method in the related art, and are not described herein again.
Optionally, the horizontal recognition model comprises a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module.
The seq2seq (Sequence to Sequence) decoding module decodes the encoded feature vectors using an attention-based sequence-to-sequence recognition approach. Because a recurrent neural network is introduced and the seq2seq decoder uses a sequence attention module, the seq2seq decoding module relies not only on the features of the characters themselves but also on the contextual information between characters during decoding. The character sequence recognition result (i.e., the text recognition result) finally output by the seq2seq decoding module includes the category of each character and the probability corresponding to that category.
The CTC (Connectionist Temporal Classification) decoding module decodes the encoded feature vectors using connectionist temporal classification of the sequence. Since no recurrent neural network is introduced and CTC considers only image texture features, the CTC decoding module relies mainly on the characters' own features for decoding. The character sequence recognition result finally output by the CTC decoding module likewise includes each character's category and the corresponding probability.
It should be noted that the horizontal recognition model and the vertical recognition model may share a coding module. The parameters of the feature extraction layer in the coding module can be obtained by training by using networks such as ResNet, Inception V4, DenseNet and the like.
Illustratively, taking a horizontal recognition model as an example, the architecture of the horizontal recognition model may be as shown in fig. 4. It should be understood that, when the decoding module in the transverse recognition model includes a seq2seq decoding module and a CTC decoding module, the feature vectors of the text box can be decoded by the seq2seq decoding module and the CTC decoding module respectively, and then the output results of the seq2seq decoding module and the CTC decoding module are probability-fused to obtain the final text recognition result of the text box.
The probability fusion may be calculated as

S = argmax( min(P(S1_1), P(S1_2), …, P(S1_m)), min(P(S2_1), P(S2_2), …, P(S2_n)) ),

where S represents the final text recognition result, S1 and S2 represent the output results of the CTC decoding module and the seq2seq decoding module respectively, P(·) denotes the probability of a character in an output result, and m and n represent the numbers of characters in the outputs of the CTC decoding module and the seq2seq decoding module respectively. That is, the output whose weakest character probability is higher is taken as the final result.
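A minimal sketch of this min-probability selection between the two decoder outputs; the (text, per-character probabilities) pairing is an assumed interface for illustration, not the patent's API:

```python
# Pick the decoder output whose weakest character probability is higher,
# matching the argmax-of-min fusion rule. Names are illustrative.

def fuse(ctc_text, ctc_probs, s2s_text, s2s_probs):
    return ctc_text if min(ctc_probs) >= min(s2s_probs) else s2s_text
```

The intuition is that a single very uncertain character is usually where a recognition error hides, so the result with the stronger worst case is preferred.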
Optionally, the method may further include:
Before a text box whose line direction is the vertical line direction is input into the vertical recognition model, the text box is rotated by a preset angle in a preset direction.
For example, before entering a text box in a vertical literary direction into the vertical recognition model, the text box may be rotated 90 degrees in a clockwise direction.
It should be noted that when the vertical recognition model, with an architecture similar to that of fig. 4, recognizes a text box in the vertical line direction, the feature extraction layer produces a feature map of length W and height 1, which is then decoded sequentially along the X-axis direction. In this process, if the height of a vertical text box were directly compressed to 1, the text in the box could not be recognized at all. Therefore, in the embodiment of the present disclosure, the text box is rotated by a preset angle in a preset direction before being input into the vertical recognition model, so that when the characters in the box are recognized in turn along the X-axis direction in the vertical recognition model, they are in fact recognized in turn along the box's original line direction, from top to bottom or from bottom to top.
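Assuming the text box is held as a NumPy image array, the pre-rotation can be sketched as follows. The patent only says "a preset angle in a preset direction"; 90 degrees clockwise is the example given above, and `np.rot90` with `k=-1` performs exactly that rotation:

```python
import numpy as np

# Rotate a vertical-line text box 90 degrees clockwise before feeding
# it to the vertical recognition model, so decoding along the X axis
# follows the original top-to-bottom reading order.

def pre_rotate_vertical(box_img):
    return np.rot90(box_img, k=-1)

tall = np.array([[1, 2],
                 [3, 4],
                 [5, 6]])          # height 3, width 2 (a "tall" box)
wide = pre_rotate_vertical(tall)   # becomes height 2, width 3
```

After rotation, the column that ran top-to-bottom now runs left-to-right, which is the direction the recognition model decodes in.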
In an implementation, before determining, for each text box, the line direction of the text in the text box, the method may further include the following steps:

For each text box, the text box is input into a horizontal recognition model and a vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model; the text recognition result of the text box is then selected from the first recognition result and the second recognition result according to the line direction.
In a possible implementation, when the line direction of a text box has not been determined, that is, when the line direction is not determined by locating each individual character in the image to be recognized with the Faster R-CNN network, the text box may be input into the horizontal recognition model and the vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model.
And the line direction of the text box can be determined according to the first recognition result and the second recognition result of the text box. After the line direction of the text box is determined according to the first recognition result and the second recognition result of the text box, the final text recognition result of the text box can be selected from the first recognition result and the second recognition result according to the determined line direction.
In such an implementable embodiment, the horizontal recognition model includes at least one of a seq2seq decoding module and a CTC decoding module, and the vertical recognition model includes at least one of a seq2seq decoding module and a CTC decoding module.
Specifically, if a text box is input into the horizontal and vertical recognition models without its line direction having been determined, then in order to obtain a recognition result quickly, the horizontal and vertical recognition models may each decode with only one of the seq2seq decoding module and the CTC decoding module.
In detail, referring to fig. 5, determining the line direction of the text box based on the first recognition result and the second recognition result of the text box may include:
s51, under the condition that the first recognition result and the second recognition result are both empty, determining that the line direction of the text box is a default line direction, wherein the default line direction is the transverse line direction or the vertical line direction.
It should be understood that when the first recognition result and the second recognition result of a text box are both empty, the content of the text box cannot be determined, i.e., neither result is meaningful. In this case, the line direction of the text box is simply set to the default line direction, which may be configured as either horizontal or vertical.
S52, determining the line direction of the text box as the vertical line direction under the condition that the first recognition result is empty and the second recognition result is not empty.
In the case where the first recognition result is empty but the second recognition result is not empty, the characterizing horizontal recognition model cannot recognize the content of the text box, and the vertical recognition model can recognize the content of the text box. Therefore, in this case, it can be determined that the line direction of the text box is the vertical line direction.
S53, determining the line direction of the text box to be a transverse line direction if the first recognition result is not empty and the length of the text box is greater than the height.
When the first recognition result is not empty, the horizontal recognition model can recognize the text box, so its line direction may be horizontal. However, the horizontal model may also produce a non-empty (possibly spurious) result for a box that is not actually horizontal. Therefore, to keep the recognition result accurate, the line direction of the text box is determined to be horizontal only when, in addition, the length of the text box is greater than its height.
Wherein the length of the text frame is the transverse length, and the height is the vertical length.
S54, determining that the line direction of the text box is a vertical line direction when the first recognition result is not empty, the length of the text box is less than or equal to the height, and the length of the first recognition result is less than the length of the second recognition result.
In a possible case, in the step S53, if it is further determined that the length of the text box is less than or equal to the height, it indicates that the text box may still be in the vertical line direction even though the horizontal recognition model can recognize the text box. In this case, if it is determined that the length of the first recognition result is smaller than the length of the second recognition result, it is determined that the line direction of the text box is the vertical line direction.
S55, when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is greater than 1, determine the line direction of the text box to be the vertical line direction.
In a possible case, if it is further determined in step S54 that the length of the first recognition result is greater than or equal to the length of the second recognition result, it is still further determined that the line direction of the text box is the vertical line direction if it is determined that the number of characters in the second recognition result is greater than 1.
S56, when the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is less than or equal to 1, determine the line direction of the text box to be the horizontal line direction.

That is, if it is further determined in step S55 that the number of characters in the second recognition result is less than or equal to 1, the line direction of the text box is determined to be horizontal.
After the line direction of the text box has been determined from the first and second recognition results, the corresponding result is selected from the first recognition result and the second recognition result, according to the determined line direction, as the text recognition result of the text box.
Referring to fig. 6, the specific implementation of determining the line direction of a text box from its first and second recognition results may be:

S601, judge whether the first recognition result of the text box is empty;

if so, perform S602, otherwise perform S605.

S602, judge whether the second recognition result of the text box is empty;

if so, perform S603, otherwise perform S604.

S603, determine the line direction of the text box to be the horizontal line direction;

S604, determine the line direction of the text box to be the vertical line direction;

S605, judge whether the length of the text box is greater than its height;

if so, perform S606, otherwise perform S607.

S606, determine the line direction of the text box to be the horizontal line direction;

S607, judge whether the length of the second recognition result is greater than that of the first recognition result;

if so, perform S608, otherwise perform S609.

S608, determine the line direction of the text box to be the vertical line direction;

S609, judge whether the number of characters in the second recognition result is greater than 1;

if so, perform S610, otherwise perform S611.

S610, determine the line direction of the text box to be the vertical line direction;

S611, determine the line direction of the text box to be the horizontal line direction.
S603, S606, and S611 may be the same step, and S604, S608, and S610 may be the same step. It should be noted that the order of the judgment steps above can be adjusted without departing from the idea of the method.
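The whole decision procedure of fig. 6 can be sketched as a small function; this is an illustrative reconstruction with invented names, where an empty string stands for an empty recognition result:

```python
# r1/r2 are the horizontal/vertical model outputs, width/height are the
# text box dimensions, and `default` is the default line direction of
# step S51. Branches are commented with the fig. 6 step they mirror.

def infer_direction(r1, r2, width, height, default="horizontal"):
    if not r1 and not r2:      # S601/S602: both empty -> default direction
        return default
    if not r1:                 # only the vertical model recognized it
        return "vertical"
    if width > height:         # S605: wide box -> horizontal
        return "horizontal"
    if len(r2) > len(r1):      # S607: longer vertical result -> vertical
        return "vertical"
    if len(r2) > 1:            # S609: multi-character vertical result
        return "vertical"
    return "horizontal"        # S611
```

The final text recognition result is then simply r1 or r2, whichever matches the inferred direction.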
Optionally, for any of the above text recognition methods, before locating and detecting the text in the image to be recognized, the method may further include: correcting the image to be recognized into a forward view.
Inevitably, the image to be recognized uploaded by a user may be upright, upside down, or rotated clockwise or counterclockwise. An inverted or rotated image degrades text localization and detection, and also affects the recognition process of the horizontal or vertical recognition model. Therefore, to improve the accuracy of text localization and of the recognition process, embodiments of the present disclosure may correct the image to be recognized into a forward view before locating and detecting the text in it. Specifically, the orientation of the image may be determined with a trained orientation classification model, and the image corrected according to the classification result.
Optionally, after obtaining the text box in the image to be recognized, the method may further include:
Correcting the obtained text box so that the text box is rectangular in shape.
When an image is captured, the shooting angle affects the apparent shape of the subject. A text box extracted from the image to be recognized may therefore be tilted (i.e., distorted), and the text inside it tilted as well. In view of this, to overcome text tilt caused by the shooting angle, the shape of the text box may be corrected so that the text box becomes rectangular.
An implementation manner of correcting the obtained text box to make the text box rectangular in shape may specifically include the following steps:
Determining the circumscribed quadrilateral having the largest intersection-over-union with the area of the text box; determining the coordinates of the four target vertices from the coordinates of the four vertices of the circumscribed quadrilateral; determining a transformation matrix from the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; and transforming the text box with the transformation matrix to obtain a text box with a rectangular shape.
Specifically, given the coordinates (x0, y0), (x1, y1), (x2, y2), (x3, y3) of the four vertices of the circumscribed quadrilateral, the target length and width of the quadrilateral can be calculated. For example, with the vertices ordered top-left, top-right, bottom-right, bottom-left, the target length may be taken as

w = max( sqrt((x1 - x0)^2 + (y1 - y0)^2), sqrt((x2 - x3)^2 + (y2 - y3)^2) )

and the target width as

h = max( sqrt((x3 - x0)^2 + (y3 - y0)^2), sqrt((x2 - x1)^2 + (y2 - y1)^2) ).

Further, from the target length and width, the coordinates of the four target vertices of the circumscribed quadrilateral can be determined to be (0, 0), (w, 0), (w, h), (0, h). The transformation matrix (a 3x3 perspective transformation) can be obtained, for example by RANSAC random sampling, from the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; the text box is then transformed by this matrix to obtain a text box with a rectangular shape.
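A hedged sketch of this rectification step. Two assumptions not stated verbatim in the source: the quadrilateral vertices are ordered top-left, top-right, bottom-right, bottom-left, and the homography is solved directly from the four correspondences instead of by RANSAC sampling:

```python
import numpy as np

def rect_targets(quad):
    """Target length w and width h of the circumscribed quadrilateral,
    taken as the longer of each pair of opposite edges (an assumption)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    w = max(np.hypot(x1 - x0, y1 - y0), np.hypot(x2 - x3, y2 - y3))
    h = max(np.hypot(x3 - x0, y3 - y0), np.hypot(x2 - x1, y2 - y1))
    return w, h

def perspective_matrix(src, dst):
    """3x3 homography H mapping the 4 src points onto the 4 dst points
    (H[2,2] fixed to 1), via the standard 8x8 linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    coeffs = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(coeffs, 1.0).reshape(3, 3)

quad = [(2, 3), (12, 3), (12, 8), (2, 8)]   # toy box, already rectangular
w, h = rect_targets(quad)
H = perspective_matrix(quad, [(0, 0), (w, 0), (w, h), (0, h)])
```

The text box image would then be warped with this matrix (e.g., with cv2.warpPerspective in OpenCV) to obtain the rectangular box.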
In this way, a text box whose projection makes an angle with the X axis in the range (-45°, 45°) (i.e., whose distortion angle lies in that range) is corrected into a horizontal text box, and a text box whose projection makes an angle with the X axis in the range (45°, 90°) or (-90°, -45°) is corrected into a vertical text box. It should be noted that this correction does not change the line direction of the text in the text box.
Fig. 7 is a flow chart illustrating another method of word recognition according to an exemplary embodiment of the present disclosure, which may include the steps of, as shown in fig. 7:
s71, correcting the image to be recognized into a forward view;
s72, carrying out positioning detection on the characters in the image to be recognized to obtain a character frame in the image to be recognized;
s73, correcting the obtained text box to make the shape of the text box be rectangular;
s74, inputting the text box into the horizontal recognition model and the vertical recognition model respectively aiming at each text box to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model;
and the decoding modules in the horizontal recognition model and the vertical recognition model are seq2seq decoding modules or CTC decoding modules.
S75, determining the line direction of the text box according to the first recognition result and the second recognition result of the text box;
and S76, selecting the character recognition result of the character frame from the first recognition result and the second recognition result according to the line character direction of the character frame.
With this method, when the decoding modules in the horizontal and vertical recognition models are CTC decoding modules, text recognition is faster than when the decoding modules are seq2seq decoding modules, but the accuracy is relatively lower. The reason is that the seq2seq decoding module uses an attention mechanism and incorporates textual context information during decoding.
Fig. 8 is a flowchart illustrating yet another method of word recognition according to an exemplary embodiment of the present disclosure, which may include the following steps, as shown in fig. 8:
s81, correcting the image to be recognized into a forward view;
s82, carrying out positioning detection on the characters in the image to be recognized to obtain a character frame in the image to be recognized;
s83, determining the coordinates of the center point of each character in each character frame aiming at each character frame; and determining the line direction of the text frame according to the coordinates of the central point of each text in the text frame.
S84, correcting the obtained text box to make the shape of the text box be rectangular;
s85, if the line direction of the text box is the transverse line direction, inputting the text box into a transverse recognition model, wherein the transverse recognition model comprises a seq2seq decoding module and a CTC decoding module;
s86, if the line direction of the text box is the vertical line direction, inputting the text box into a vertical recognition model, wherein the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module;
and S87, performing probability fusion processing on the results output by the seq2seq decoding module and the CTC decoding module to obtain the character recognition result of the character frame.
Compared with the two variants of the method shown in fig. 7, this method adds step S83, decodes each text box with both the seq2seq decoding module and the CTC decoding module, and adds the probability fusion processing, so its recognition speed is lower than that of the fig. 7 variants, but its text recognition accuracy is higher.
Let the first mode be the method of fig. 7 in which the decoding modules in the horizontal and vertical recognition models are CTC decoding modules, the second mode be the method of fig. 7 in which they are seq2seq decoding modules, and the third mode be the method of fig. 8. In a test, the three modes were used to recognize the text in the same image to be recognized; the resulting differences are shown in Table 1.
TABLE 1
Mode           Accuracy    Time consumed
First mode     85%         1.5508 s
Second mode    89%         1.9984 s
Third mode     91%         2.4624 s
According to an implementation mode, corresponding operation options can be set on a user operation interface respectively aiming at the three modes, so that a user can flexibly select any one of the three modes according to requirements to identify character information in an image to be identified.
For example, when the user has a high requirement on the real-time performance of character recognition but has a low requirement on the accuracy of the recognition result, the first method may be selected to recognize characters in the image to be recognized, so as to quickly obtain character information in the image to be recognized.
In another example, when the real-time requirement and the accuracy requirement of the user for character recognition are balanced, the second mode may be selected to recognize the characters in the image to be recognized.
For another example, when the user has a high requirement on the accuracy of character recognition, the third method may be selected to recognize characters in the image to be recognized, so as to obtain a character recognition result with high accuracy.
Based on the same inventive concept, an embodiment of the present disclosure further provides a character recognition apparatus, as shown in fig. 9, the apparatus 900 includes:
the generating module 901 is configured to perform positioning detection on characters in an image to be recognized to obtain character frames in the image to be recognized, where the characters in each character frame have the same line direction;
a first determining module 902 configured to determine, for each of the text boxes, the literary direction of the text in the text box;
and the execution module 903 is configured to obtain a text recognition result of the text box according to the line text direction of the text box.
With this apparatus, text boxes in the image to be recognized are obtained by locating and detecting the text in the image, where each text box is a local image containing text from the image to be recognized and all text within a text box shares the same line direction. For each text box, the line direction of its text is determined, and the text recognition result of the box is obtained according to that line direction; the recognition results of all boxes together form the text recognition result of the image. Text boxes can thus be detected regardless of the font type and size of the text in the image, and regardless of its line direction or layout, and the specific line direction of the text in each box (horizontal, vertical, or oblique) can be determined. A recognition result matching the detected line direction is then obtained for each box. Compared with the related art, the apparatus can perform text recognition on images of different layouts, i.e., it is applicable to images of any layout, which solves the problems in the related art.
Optionally, the first determining module 902 includes:
a first determining submodule configured to determine, for each of the text boxes, coordinates of a center point of each of the text in the text box; and are
And the second determining submodule is configured to determine the line direction of the text box according to the center point coordinate of each text in the text box.
Optionally, the line directions include a horizontal line direction and a vertical line direction;
the second determination submodule is specifically configured to:
calculating the X-axis offset and the Y-axis offset of the center point coordinate of each character pair in the character frame, wherein the character pair represents two adjacent characters in the character frame;
calculating the average value of the X-axis offset and the average value of the Y-axis offset of all the character pairs of the character frame;
determining the line direction of the text box as the transverse line direction under the condition that the average value of the X-axis offset is greater than or equal to the average value of the Y-axis offset;
and under the condition that the average value of the X-axis offset is smaller than the average value of the Y-axis offset, determining that the line direction of the text box is the vertical line direction.
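As a sketch, the pairwise-offset rule above can be written in a few lines of Python (the function name, and the assumption that the character center points are supplied in reading order, are illustrative rather than taken from the patent):

```python
import statistics

def text_line_direction(centers):
    """Classify a text box as "horizontal" or "vertical" from the
    center points of its characters, listed in reading order.

    centers: list of (x, y) center-point coordinates, one per character.
    """
    if len(centers) < 2:
        # A single character yields no adjacent pairs; defaulting to
        # horizontal here is an assumption, not specified by the patent.
        return "horizontal"
    # Absolute offsets between the center points of each adjacent pair.
    dx = [abs(b[0] - a[0]) for a, b in zip(centers, centers[1:])]
    dy = [abs(b[1] - a[1]) for a, b in zip(centers, centers[1:])]
    # Horizontal if characters advance mostly along the X axis, else vertical.
    return "horizontal" if statistics.mean(dx) >= statistics.mean(dy) else "vertical"
```

A box whose characters step mainly along the X axis is classified as horizontal, otherwise as vertical, matching the average-offset comparison described above.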
Optionally, the executing module 903 includes:
a first execution submodule configured to, when the text-line direction of the text box is the horizontal text-line direction, input the text box into a horizontal recognition model to obtain the character recognition result of the horizontal text; and
a second execution submodule configured to, when the text-line direction of the text box is the vertical text-line direction, input the text box into a vertical recognition model to obtain the character recognition result of the vertical text.
Optionally, the apparatus 900 further comprises:
an input module configured to, before the text-line direction of the text in each text box is determined, input each text box into a horizontal recognition model and a vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model;
the executing module 903 comprises:
a selection submodule configured to select the character recognition result of the text box from the first recognition result and the second recognition result according to the text-line direction.
Optionally, the first determining module 902 includes:
a third determining submodule configured to determine that the text-line direction of the text box is the horizontal text-line direction if the first recognition result is not empty and the length of the text box is greater than its height;
a fourth determining submodule configured to determine that the text-line direction of the text box is the vertical text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, and the length of the first recognition result is less than the length of the second recognition result;
a fifth determining submodule configured to determine that the text-line direction of the text box is the vertical text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is greater than 1; and
a sixth determining submodule configured to determine that the text-line direction of the text box is the horizontal text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is less than or equal to 1.
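The four conditions above can be collected into a single decision function. A minimal Python sketch follows (the names are illustrative, and the fallback for an empty first recognition result is an assumption, since the patent only enumerates the non-empty cases):

```python
def choose_direction(first_result, second_result, box_width, box_height):
    """Decide the text-line direction of a box from the two model outputs.

    first_result: string recognized by the horizontal model.
    second_result: string recognized by the vertical model.
    Returns "horizontal" or "vertical" per the four rules above.
    """
    if not first_result:
        # Not covered by the patent's enumerated cases; assume the horizontal
        # model failing on the box suggests a vertical text line.
        return "vertical"
    if box_width > box_height:
        # Non-empty horizontal result and a wide box: horizontal.
        return "horizontal"
    # The box is at least as tall as it is wide.
    if len(first_result) < len(second_result):
        return "vertical"
    # Horizontal result is at least as long as the vertical one: vertical
    # only if the vertical model still recognized more than one character.
    return "vertical" if len(second_result) > 1 else "horizontal"
```

Each branch mirrors one of the third-through-sixth determining submodules.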
Optionally, the apparatus 900 further comprises:
a first correction module configured to correct the image to be recognized into a front view before the text in the image to be recognized is located and detected; and
a second correction module configured to, after the text boxes in the image to be recognized are obtained, correct each obtained text box so that the text box is rectangular in shape.
Optionally, the second correction module is specifically configured to:
determine a circumscribed quadrilateral having the largest intersection-over-union with the region of the text box;
determine the coordinates of four target vertices according to the coordinates of the four vertices of the circumscribed quadrilateral;
determine a transformation matrix according to the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; and
transform the text box according to the transformation matrix to obtain a text box that is rectangular in shape.
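The transformation matrix in the last two steps can be taken as a perspective (homography) matrix determined by the four vertex correspondences. A self-contained numpy sketch of solving for it is shown below; the patent does not fix a particular solver, and in practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` compute the same matrix and apply it to the image:

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 perspective matrix H mapping the four source
    vertices onto the four target vertices: H @ [x, y, 1] ~ [x', y', w].

    src, dst: 4x2 arrays of (x, y) corner coordinates.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each correspondence contributes two rows of the 8x8 system A h = b,
    # where h holds the first eight entries of H (H[2, 2] is fixed to 1).
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Apply H to a list of (x, y) points, including the homogeneous divide."""
    pts = np.column_stack([np.asarray(pts, dtype=float), np.ones(len(pts))])
    out = pts @ H.T
    return out[:, :2] / out[:, 2:3]
```

With the target vertices chosen as the corners of an axis-aligned rectangle, warping the text box with this matrix yields the rectangular box described above.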
Optionally, the horizontal recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module.
Optionally, the horizontal recognition model comprises a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module;
the apparatus 900 further comprises:
a rotation module configured to rotate a text box whose text-line direction is the vertical text-line direction by a preset angle in a preset direction before the text box is input into the vertical recognition model.
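For example, a vertical text box can be rotated a quarter turn so that its characters run left to right before being fed to the recognizer. A minimal numpy sketch (the 90-degree clockwise choice is illustrative; the patent only specifies a preset angle and a preset direction):

```python
import numpy as np

def rotate_box_clockwise(box_image):
    """Rotate an image array of shape (H, W[, C]) 90 degrees clockwise,
    so a vertical text line becomes a horizontal one."""
    # np.rot90 rotates counter-clockwise by default; k=-1 gives one
    # clockwise quarter turn.
    return np.rot90(box_image, k=-1)
```

After rotation, a box of shape (H, W) becomes (W, H), with the top of the original vertical line at the left edge.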
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a block diagram illustrating an electronic device 700 in accordance with an example embodiment. As shown in fig. 10, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the above-described character recognition method. The memory 702 is used to store various types of data to support the operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700, and application-related data such as contact data, transmitted and received messages, pictures, audio, video, and the like. The Memory 702 may be implemented by any type of volatile or non-volatile Memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and an audio component, wherein the screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described character recognition method.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided; when the program instructions are executed by a processor, the steps of the above-described character recognition method are implemented. For example, the computer-readable storage medium may be the memory 702 described above, which includes program instructions executable by the processor 701 of the electronic device 700 to perform the character recognition method described above.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (13)

1. A character recognition method, the method comprising:
obtaining text boxes in an image to be recognized by locating and detecting the text in the image to be recognized, wherein the text in each text box has the same text-line direction;
for each text box, determining the text-line direction of the text in the text box; and
obtaining a character recognition result of the text box according to the text-line direction of the text box.
2. The method of claim 1, wherein the determining, for each text box, the text-line direction of the text in the text box comprises:
for each text box, determining the center-point coordinates of each character in the text box; and
determining the text-line direction of the text box according to the center-point coordinates of each character in the text box.
3. The method of claim 2, wherein the text-line directions comprise a horizontal text-line direction and a vertical text-line direction;
the determining the text-line direction of the text box according to the center-point coordinates of each character in the text box comprises:
calculating the X-axis offset and the Y-axis offset between the center-point coordinates of each character pair in the text box, wherein a character pair denotes two adjacent characters in the text box;
calculating the average X-axis offset and the average Y-axis offset over all character pairs of the text box;
determining that the text-line direction of the text box is the horizontal text-line direction if the average X-axis offset is greater than or equal to the average Y-axis offset; and
determining that the text-line direction of the text box is the vertical text-line direction if the average X-axis offset is less than the average Y-axis offset.
4. The method of claim 3, wherein the obtaining the character recognition result of the text box according to the text-line direction of the text box comprises:
if the text-line direction of the text box is the horizontal text-line direction, inputting the text box into a horizontal recognition model to obtain the character recognition result of the horizontal text; and
if the text-line direction of the text box is the vertical text-line direction, inputting the text box into a vertical recognition model to obtain the character recognition result of the vertical text.
5. The method of claim 1, wherein before the determining, for each text box, the text-line direction of the text in the text box, the method further comprises:
for each text box, inputting the text box into a horizontal recognition model and a vertical recognition model respectively, to obtain a first recognition result output by the horizontal recognition model and a second recognition result output by the vertical recognition model;
and the obtaining the character recognition result of the text box according to the text-line direction of the text box comprises:
selecting the character recognition result of the text box from the first recognition result and the second recognition result according to the text-line direction.
6. The method of claim 5, wherein the determining the text-line direction of the text in the text box comprises:
determining that the text-line direction of the text box is the horizontal text-line direction if the first recognition result is not empty and the length of the text box is greater than its height;
determining that the text-line direction of the text box is the vertical text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, and the length of the first recognition result is less than the length of the second recognition result;
determining that the text-line direction of the text box is the vertical text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is greater than 1; and
determining that the text-line direction of the text box is the horizontal text-line direction if the first recognition result is not empty, the length of the text box is less than or equal to its height, the length of the first recognition result is greater than or equal to the length of the second recognition result, and the number of characters in the second recognition result is less than or equal to 1.
7. The method according to any one of claims 1-6, wherein before the locating and detecting of the text in the image to be recognized, the method further comprises:
correcting the image to be recognized into a front view;
and after the text boxes in the image to be recognized are obtained, the method further comprises:
correcting each obtained text box so that the text box is rectangular in shape.
8. The method according to claim 7, wherein the correcting each obtained text box so that the text box is rectangular in shape comprises:
determining a circumscribed quadrilateral having the largest intersection-over-union with the region of the text box;
determining the coordinates of four target vertices according to the coordinates of the four vertices of the circumscribed quadrilateral;
determining a transformation matrix according to the coordinates of the four vertices of the circumscribed quadrilateral and the coordinates of the four target vertices; and
transforming the text box according to the transformation matrix to obtain a text box that is rectangular in shape.
9. The method of claim 5, wherein the horizontal recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises at least one of a seq2seq decoding module and a CTC decoding module.
10. The method of claim 4, wherein the horizontal recognition model comprises a seq2seq decoding module and a CTC decoding module, and the vertical recognition model comprises a seq2seq decoding module and a CTC decoding module;
the method further comprises:
before a text box whose text-line direction is the vertical text-line direction is input into the vertical recognition model, rotating the text box by a preset angle in a preset direction.
11. A character recognition apparatus, comprising:
a generating module configured to locate and detect text in an image to be recognized to obtain text boxes in the image to be recognized, wherein the text in each text box has the same text-line direction;
a first determining module configured to determine, for each text box, the text-line direction of the text in the text box; and
an executing module configured to obtain a character recognition result of the text box according to the text-line direction of the text box.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
13. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 10.
CN202010963512.3A 2020-09-14 2020-09-14 Character recognition method and device, storage medium and electronic equipment Pending CN112183250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010963512.3A CN112183250A (en) 2020-09-14 2020-09-14 Character recognition method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112183250A true CN112183250A (en) 2021-01-05

Family

ID=73920824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010963512.3A Pending CN112183250A (en) 2020-09-14 2020-09-14 Character recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112183250A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329777A (en) * 2021-01-06 2021-02-05 平安科技(深圳)有限公司 Character recognition method, device, equipment and medium based on direction detection
CN114998885A (en) * 2022-06-23 2022-09-02 小米汽车科技有限公司 Page data processing method and device, vehicle and storage medium
CN115830600A (en) * 2023-02-22 2023-03-21 杭州金诚信息安全科技有限公司 Image-text design page identification and arrangement method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135346A (en) * 2019-05-16 2019-08-16 深圳市信联征信有限公司 Identity card automatic identifying method and system based on deep learning
CN110443239A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 The recognition methods of character image and its device
CN111062365A (en) * 2019-12-30 2020-04-24 上海肇观电子科技有限公司 Method, device, chip circuit and computer readable storage medium for identifying mixed typesetting characters
CN111476067A (en) * 2019-01-23 2020-07-31 腾讯科技(深圳)有限公司 Character recognition method and device for image, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination