CN111985469B - Method and device for recognizing characters in image and electronic equipment - Google Patents

Method and device for recognizing characters in image and electronic equipment

Info

Publication number
CN111985469B
CN111985469B (application CN201910427882.2A)
Authority
CN
China
Prior art keywords
rectangular frame
frame area
characters
image
row
Prior art date
Legal status
Active
Application number
CN201910427882.2A
Other languages
Chinese (zh)
Other versions
CN111985469A (en)
Inventor
徐潇宇
Current Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc and Zhuhai Kingsoft Office Software Co Ltd
Priority to CN201910427882.2A
Publication of CN111985469A
Application granted
Publication of CN111985469B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the invention provides a method and a device for recognizing characters in an image and electronic equipment, wherein the method comprises the following steps: acquiring an image to be identified; carrying out character area recognition on the image to be recognized, and determining each rectangular frame area containing a row of characters; rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area; inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and carrying out character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area; comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction; and carrying out character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result. By adopting the embodiment of the invention, characters in various directions can be identified, and the accuracy of character identification in images can be improved.

Description

Method and device for recognizing characters in image and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for recognizing characters in an image, and an electronic device.
Background
At present, in many industries such as banking, insurance, finance and libraries, the characters in an image need to be entered into an information database to store the related information, so the image needs to be recognized to obtain the characters in it.
Conventional image recognition methods generally include two steps: the first step is to identify the region of the image where the text is located, for example using a conventional image processing algorithm such as the Sobel operator, or a deep learning method; the second step is to perform character recognition on the text region identified in the first step, for example using a deep learning method or OCR (Optical Character Recognition).
However, the direction of the text in an image may not match the viewing direction of the person viewing the image, that is, some text in the image may not be forward. The above image recognition method can only recognize images in which the text direction is forward; if the text direction is not forward, the text in the image cannot be correctly recognized.
Disclosure of Invention
The embodiment of the invention aims to provide a method for recognizing characters in an image, which can accurately recognize the characters in the image even when the direction of the characters does not match the viewing direction of the person viewing the image. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for identifying characters in an image, where the method includes:
acquiring an image to be identified, wherein the image to be identified contains characters;
performing character area recognition on the image to be recognized, and determining each rectangular frame area containing a row of characters;
rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one row of characters;
inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image features of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area, wherein the character recognition model comprises a corresponding relation between the image features and the character recognition results and correct probabilities thereof;
Comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
and carrying out character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result.
Optionally, before the step of rotating the first rectangular frame area by 180 degrees to obtain the second rectangular frame area, the method further includes:
judging whether the height of each rectangular frame area containing a row of characters is larger than the width;
if the height of the rectangular frame area containing one row of characters is larger than the width, rotating the rectangular frame area containing one row of characters by 90 degrees towards the preset direction.
Optionally, the step of identifying the text region of the image to be identified and determining each rectangular frame region containing a row of text includes:
inputting the image to be identified into a text line detection model, and processing according to the image features of the image to be identified to obtain the distance between each pixel point in the image to be identified and the four sides of the rectangular frame area containing a row of characters to which the pixel point belongs, and the included angle between each rectangular frame area containing a row of characters and the horizontal direction, wherein the text line detection model comprises a corresponding relation between image features and the distance between each pixel point in the image and the four sides of the rectangular frame area containing a row of characters and the included angle between the rectangular frame area and the horizontal direction;
According to the distance, determining rectangular frame areas containing a row of characters in each image to be identified through a non-maximum value suppression algorithm;
and adjusting the rectangular frame area containing one line of characters with the included angle larger than 45 degrees to be vertical, and adjusting the rectangular frame area containing one line of characters with the included angle larger than 0 degrees and not larger than 45 degrees to be horizontal.
Optionally, the output result of the text line detection model further includes a probability that each pixel point in the image to be identified belongs to a rectangular frame area;
the step of determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum suppression algorithm according to the distance comprises the following steps:
removing pixel points which do not belong to the rectangular frame area according to the probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold value;
and determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance corresponding to the residual pixel points.
Optionally, the training manner of the text line detection model includes:
acquiring an initial text line detection model and a plurality of image samples;
Marking four vertex coordinates of each rectangular frame area containing a row of characters in each image sample according to a preset rule;
calculating the distance between each pixel point and four sides of the rectangular frame area containing one row of characters, the included angle between each rectangular frame area containing one row of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area according to the four vertex coordinates of each rectangular frame area containing one row of characters, and obtaining the detection label of each image sample;
inputting the image sample into the initial text line detection model to obtain a prediction label;
adjusting parameters of the initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples;
and judging whether the iteration times of the initial text line detection model reach preset times or not, or whether the accuracy of the predictive label output by the initial text line detection model reaches a preset value or not, and stopping training to obtain the text line detection model.
Optionally, the step of performing text recognition on the rectangular frame area other than the first rectangular frame area according to the target direction to obtain a recognition result includes:
If the target direction is the direction of the characters in the first rectangular frame area, identifying each rectangular frame area containing a row of characters except the first rectangular frame area in the image to be identified, and obtaining the characters corresponding to each rectangular frame area containing a row of characters;
if the target direction is the direction of the characters in the second rectangular frame area, rotating the image to be identified by 180 degrees to obtain a target identification image; and identifying each rectangular frame area containing one row of characters except the first rectangular frame area in the target identification image to obtain the characters corresponding to each rectangular frame area containing one row of characters.
In a second aspect, an embodiment of the present invention provides a device for recognizing characters in an image, where the device includes: an image acquisition module, configured to acquire an image to be recognized, where the image to be recognized includes characters;
the rectangular frame area determining module is used for carrying out character area recognition on the image to be recognized and determining each rectangular frame area containing a row of characters;
the first rotating module is used for rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one row of characters;
The first recognition module is used for inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and carrying out character recognition according to the image features of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area, wherein the character recognition model comprises a corresponding relation between the image features and the character recognition results and correct probabilities thereof;
the target direction determining module is used for comparing the correct probabilities and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
and the second recognition module is used for recognizing characters of the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result.
Optionally, the apparatus further includes:
the rectangular frame area judging module is used for judging whether the height of each rectangular frame area containing one row of characters is larger than the width of each rectangular frame area before the first rectangular frame area is rotated 180 degrees to obtain a second rectangular frame area;
and the second rotating module is used for rotating the rectangular frame area containing one row of characters by 90 degrees towards the preset direction if the height of the rectangular frame area containing one row of characters is larger than the width of the rectangular frame area containing one row of characters.
Optionally, the rectangular frame area determining module includes:
the text line detection sub-module is used for inputting the image to be identified into a text line detection model, processing the image according to the image characteristics of the image to be identified to obtain the distance between each pixel point in the image to be identified and four edges of a rectangular frame area containing a line of characters and the included angle between the rectangular frame area containing a line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and four edges of the rectangular frame area containing a line of characters and the included angle between the rectangular frame area and the horizontal direction, and the text line detection model is pre-trained by the model training module based on an image sample and a detection label thereof;
the rectangular frame region determining submodule is used for determining each rectangular frame region containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance;
and the rectangular frame area adjusting sub-module is used for adjusting the rectangular frame area containing one line of characters with the included angle larger than 45 degrees to be in the vertical direction, and adjusting the rectangular frame area containing one line of characters with the included angle larger than 0 degrees and not larger than 45 degrees to be in the horizontal direction.
Optionally, the output result of the text line detection model further includes a probability that each pixel point in the image to be identified belongs to a rectangular frame area;
the rectangular frame area determination submodule includes:
the pixel point removing unit is used for removing the pixel points which do not belong to the rectangular frame area according to the probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold value;
and the rectangular frame area determining unit is used for determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance corresponding to the residual pixel points.
Optionally, the model training module includes:
the image sample acquisition sub-module is used for acquiring an initial text line detection model and a plurality of image samples;
the image sample marking sub-module is used for marking four vertex coordinates of each rectangular frame area containing a row of characters in each image sample according to a preset rule;
the detection label generation sub-module is used for calculating the distance between each pixel point and four sides of the rectangular frame area containing one line of characters, the included angle between each rectangular frame area containing one line of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area according to the four vertex coordinates of each rectangular frame area containing one line of characters, so as to obtain the detection label of each image sample;
The prediction label generation sub-module is used for inputting the image sample into the initial text line detection model to generate a prediction label;
the parameter adjustment sub-module is used for adjusting parameters of the initial text line detection model based on the prediction label and the detection label of the corresponding image sample;
and the model generation sub-module is used for judging whether the iteration times of the initial text line detection model reach preset times or not, or whether the accuracy of the predictive label output by the initial text line detection model reaches a preset value or not, stopping training, and obtaining the text line detection model.
Optionally, the second identifying module includes:
the first character recognition sub-module is used for recognizing each rectangular frame area containing one row of characters in the image to be recognized to obtain characters corresponding to each rectangular frame area containing one row of characters if the target direction is the direction corresponding to the first rectangular frame area;
the second character recognition sub-module is used for rotating the image to be recognized by 180 degrees to obtain a target recognition image if the target direction is the direction of the characters in the second rectangular frame area, and identifying each rectangular frame area containing one row of characters in the target recognition image to obtain the characters corresponding to each rectangular frame area containing one row of characters.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the identification method of the characters in the image when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements any of the above steps of a method for recognizing characters in an image.
In the scheme provided by the embodiment of the invention, the electronic device can acquire an image to be recognized, perform character region recognition on the image to be recognized, determine each rectangular frame region containing a row of characters, determine one of the rectangular frame regions containing a row of characters as a first rectangular frame region, input the first rectangular frame region and a second rectangular frame region obtained by rotating the first rectangular frame region by 180 degrees into a character recognition model, perform character recognition according to the image features of the first rectangular frame region and the second rectangular frame region to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities, compare the correct probabilities, determine the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction, and finally perform character recognition on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain the recognition result. Therefore, the electronic device recognizes the characters in the image according to the target direction determined by this scheme, and can accurately recognize the characters in the image even when the character direction in the image does not match the viewing direction of the person viewing the image.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for recognizing characters in an image according to an embodiment of the present invention;
FIG. 2 (a) is a schematic diagram of an image to be identified according to an embodiment of the present invention;
FIG. 2 (b) is a schematic diagram of a rectangular frame area containing a line of text according to an embodiment of the present invention;
FIG. 2(c) is a schematic diagram of an image to be identified in which the text direction is the left direction according to an embodiment of the present invention;
FIG. 2(d) is a schematic diagram of an image to be identified in which the text direction is the right direction according to an embodiment of the present invention;
FIG. 3 is a specific flowchart of step S102 in the embodiment shown in FIG. 1;
FIG. 4 is a schematic diagram of a text line detection model output result according to an embodiment of the present invention;
FIG. 5 is a flowchart showing a step S302 in the embodiment shown in FIG. 3;
FIG. 6 is a flowchart of a training method of a text line detection model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a device for recognizing characters in an image according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
For convenience of description and clarity of scheme, the direction of the characters in the image, which accords with the viewing angle of the image watched by the person, is referred to as a forward direction, the direction obtained by rotating the forward direction clockwise by 90 degrees is referred to as a left direction, the direction obtained by rotating the forward direction anticlockwise by 90 degrees is referred to as a right direction, and the direction obtained by rotating the forward direction by 180 degrees is referred to as a reverse direction. In the conventional image recognition method, the electronic device recognizes the characters in the image to be recognized according to the forward direction, and when the directions of the characters in the image are the left direction, the right direction or the reverse direction, the electronic device cannot correctly recognize the characters in the image.
In order to correctly identify characters in an image when the character direction in the image does not accord with the direction of the viewing angle of a person viewing the image, embodiments of the present invention provide a method, an apparatus, an electronic device, and a computer-readable storage medium for identifying characters in an image.
The following first describes a method for recognizing characters in an image provided by an embodiment of the present invention.
The method for recognizing characters in an image provided by the embodiment of the invention can be applied to any electronic device that needs to recognize characters in an image, for example a computer, a mobile phone, a processor and the like, which is not specifically limited here. For convenience of description, it is hereinafter referred to simply as the electronic device.
As shown in fig. 1, a method for identifying characters in an image, the method includes:
s101, acquiring an image to be identified;
wherein the image to be identified contains text.
S102, recognizing text areas of an image to be recognized, and determining rectangular frame areas containing a row of text;
s103, rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area;
the first rectangular frame area is one of rectangular frame areas each containing a row of characters.
S104, inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and carrying out character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area;
The character recognition model comprises a corresponding relation between image features and a character recognition result and a correct probability thereof.
S105, comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
and S106, performing character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result.
In the scheme provided by the embodiment of the invention, the electronic equipment can acquire the image to be identified, identify the text region of the image to be identified, determine each rectangular frame region containing one row of text, determine one of the rectangular frame regions containing one row of text as a first rectangular frame region, input the first rectangular region and a second rectangular frame region obtained by rotating the first rectangular frame region by 180 degrees into a text identification model, perform text identification according to the image features of the first rectangular frame region and the second rectangular frame region, obtain text identification results corresponding to the first rectangular frame region and the second rectangular frame region and correct probabilities thereof, compare the correct probabilities thereof, determine the text direction in the rectangular frame region corresponding to the highest correct probability as a target direction, and finally perform text identification on the rectangular frame regions outside the first rectangular frame region according to the target direction to obtain identification results. Therefore, the electronic equipment can identify the characters in the image according to the target direction determined by the scheme, and can accurately identify the characters in the image when the character direction in the image does not accord with the direction of the viewing angle of the person for viewing the image.
In the above step S101, the electronic device may acquire an image to be identified, where the image to be identified contains characters. If the electronic device has an image acquisition function, the image to be identified may be an image captured by the electronic device; it may also be an image stored locally on the electronic device, or an image transmitted by another electronic device.
After the image to be identified is obtained, the electronic device may execute the step S102, that is, perform text region identification on the image to be identified, and determine each rectangular frame region containing a line of text, where the number of the rectangular frame regions containing a line of text is the number of lines of text in the image to be identified.
For example, fig. 2 (a) is a schematic diagram of an image to be recognized, fig. 2 (b) is a schematic diagram of a determined rectangular frame area containing one line of characters after the electronic device recognizes the text area of the image to be recognized shown in fig. 2 (a), and fig. 2 (b) includes 8 rectangular frame areas 01 containing one line of characters, which illustrates that the image to be recognized shown in fig. 2 (a) contains 8 lines of characters. In one embodiment, the electronic device may obtain, using the text line detection model, rectangular frame regions each containing a line of text according to image features of the image to be identified.
In general, in an image to be recognized, all the directions of characters are the same direction. In the rectangular frame region including one line of characters obtained in the above step S102, the directions of the characters are uniform. Therefore, the electronic device can determine the direction of the characters in the image to be recognized only by taking one rectangular frame area from any one of the rectangular frame areas containing one row of characters for recognition. For convenience of description and clarity of the scheme, one rectangular frame area taken from any one of the rectangular frame areas each containing a line of text will be referred to as a first rectangular frame area.
In the above step S103, the electronic device may rotate the first rectangular frame area by 180 degrees to obtain the second rectangular frame area. The direction of the characters in the first rectangular frame area may be a forward direction or a reverse direction, the electronic device may rotate the first rectangular frame area 180 degrees to obtain a second rectangular frame area, if the direction of the characters in the first rectangular frame is a reverse direction, the direction of the characters in the second rectangular frame is a forward direction, and if the direction of the characters in the first rectangular frame is a forward direction, the direction of the characters in the second rectangular frame is a reverse direction.
In general, the number of characters in each line of an image to be recognized varies: the larger the aspect ratio of a rectangular frame area, the more characters it contains, and the smaller the aspect ratio, the fewer characters it contains. The more characters a rectangular frame area contains, the more representative its character direction is of the character direction of the whole image to be recognized, so the electronic device can select the rectangular frame area containing the most characters as the first rectangular frame area for recognition. In one embodiment, the electronic device may select the rectangular frame area with the largest aspect ratio as the first rectangular frame area, and rotate the first rectangular frame area by 180 degrees to obtain the second rectangular frame area.
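For illustration only, the following is a minimal sketch of this selection step, assuming each detected rectangular frame area has already been cropped into a NumPy array; the helper names are hypothetical and not part of the patent.

```python
import numpy as np

def pick_first_region(line_crops):
    """Choose the cropped line image with the largest width-to-height ratio as the first rectangular frame area."""
    return max(line_crops, key=lambda crop: crop.shape[1] / crop.shape[0])

def rotate_180(crop):
    """Rotate a line crop by 180 degrees (flip both axes) to obtain the second rectangular frame area."""
    return crop[::-1, ::-1]
```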
After the first rectangular frame area and the second rectangular frame area are obtained, the electronic device may execute the step S104, that is, input the first rectangular frame area and the second rectangular frame area into a text recognition model, and perform text recognition according to the image features of the first rectangular frame area and the second rectangular frame area, so as to obtain text recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area.
The character recognition model comprises a corresponding relation between image features and a character recognition result and a correct probability thereof.
The character recognition model can be obtained by the electronic device training an initial character recognition model. During training, the initial character recognition model can learn the correspondence between image features and the character recognition result and its correct probability.
In one embodiment, the electronic device may input the first rectangular frame area and the second rectangular frame area into the text recognition model to obtain the text recognition results and correct probabilities corresponding to the two areas. For example, the text recognition model may be a CNN (Convolutional Neural Network): the electronic device inputs the first rectangular frame area and the second rectangular frame area into the CNN, the CNN extracts the image features of the two areas and converts them into feature sequences, and the feature sequences are then passed through a normalized exponential function (softmax) to obtain a posterior probability matrix.
Each column of the posterior probability matrix represents an output character category, and each row corresponds to one element of the feature sequence. The maximum probability in each row is taken to obtain the most probable character for that element, and these characters are combined into the most probable character sequence. If adjacent characters in this sequence are repeated, the repeated characters are de-duplicated to obtain the character recognition result, and finally the posterior probability of the character recognition result is calculated according to the Bayesian formula and used as the correct probability.
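As a rough illustration of this decoding step, the sketch below takes the most probable character per feature-sequence element, merges adjacent repeats, and uses the product of the selected probabilities as a simplified confidence; the Bayesian posterior described above is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def greedy_decode(prob_matrix, charset):
    """prob_matrix: (num_steps, num_classes) posterior probabilities; charset maps class index to character."""
    best_ids = prob_matrix.argmax(axis=1)                          # most probable class per sequence element
    best_probs = prob_matrix[np.arange(len(best_ids)), best_ids]
    chars = [charset[idx] for i, idx in enumerate(best_ids)
             if i == 0 or idx != best_ids[i - 1]]                  # de-duplicate adjacent repeated characters
    return "".join(chars), float(np.prod(best_probs))              # recognition result and a simplified correct probability
```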
After the correct probabilities are obtained, the electronic device may compare the two correct probabilities and determine the direction of the characters in the rectangular frame area corresponding to the highest correct probability as the target direction. The higher the correct probability, the more accurate the corresponding character recognition result, and the more likely the direction of the characters in the corresponding rectangular frame area is the forward direction. Therefore, the electronic device can take the direction of the characters in the rectangular frame area with the highest correct probability as the target direction, which indicates that the characters in that rectangular frame area are in the forward direction.
Furthermore, the electronic device can perform character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction, and a recognition result is obtained. In general, in an image to be recognized, all directions of characters are the same, so that the electronic device performs character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction, and a correct recognition result can be obtained.
In one embodiment, the electronic device may perform text recognition on the rectangular frame area other than the first rectangular frame area in an OCR manner according to the target direction, to obtain a recognition result.
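Putting steps S101 to S106 together, a minimal end-to-end sketch could look as follows; detect_lines, recognize and ocr_lines stand in for the text line detection model, the character recognition model and the final OCR pass, and pick_first_region / rotate_180 are the helpers sketched earlier. This is an assumed illustration, not the patented implementation.

```python
def recognize_image(image, detect_lines, recognize, ocr_lines):
    """Determine the target direction from one line, then recognize the whole image accordingly."""
    line_crops = detect_lines(image)               # rectangular frame areas, each containing one row of text
    first = pick_first_region(line_crops)          # first rectangular frame area (largest aspect ratio)
    second = rotate_180(first)                     # second rectangular frame area
    _, p_first = recognize(first)                  # recognition result and its correct probability
    _, p_second = recognize(second)
    if p_first >= p_second:                        # the text in the image is already forward
        return ocr_lines(image)
    return ocr_lines(rotate_180(image))            # otherwise rotate the whole image 180 degrees first
```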
As an implementation manner of the embodiment of the present invention, before the step of rotating the first rectangular frame area by 180 degrees to obtain the second rectangular frame area, the method may further include the following steps:
judging whether the height of each rectangular frame area containing a row of characters is larger than the width; if the height of the rectangular frame area containing one row of characters is larger than the width, rotating the rectangular frame area containing one row of characters by 90 degrees towards the preset direction.
The text direction in the image to be identified acquired by the electronic device may be left or right. For example, as shown in a rectangular frame region 02 in fig. 2 (c), since the text direction is left, the rectangular frame region has a height larger than a width. As shown in a rectangular frame area 03 in fig. 2 (d), since the text direction is right, the rectangular frame area has a larger height than a width.
Therefore, in order to determine whether the text direction in the image to be recognized is the left direction or the right direction, the electronic device may determine whether the height of each rectangular frame area containing one line of text is greater than the width, and if the height of a rectangular frame area containing one line of text is greater than the width, it is indicated that the text direction in the rectangular frame area may be the left direction or the right direction.
Furthermore, when the text direction in the acquired image to be recognized may be the left or right direction, in order to change the text direction in the image to be recognized to the forward or reverse direction so as to facilitate the subsequent text recognition processing, the electronic device may rotate the rectangular frame area containing a line of text by 90 degrees in a preset direction.
If the preset direction is clockwise: when the text direction in the image to be identified is the left direction, the electronic device rotates the rectangular frame area clockwise by 90 degrees, and the text direction in the rectangular frame area changes from left to reverse; when the text direction in the image to be identified is the right direction, the electronic device rotates the rectangular frame area clockwise by 90 degrees, and the text direction in the rectangular frame area changes from right to forward.
If the preset direction is counterclockwise: when the text direction in the image to be identified is the left direction, the electronic device rotates the rectangular frame area counterclockwise by 90 degrees, and the text direction in the rectangular frame area changes from left to forward; when the text direction in the image to be identified is the right direction, the electronic device rotates the rectangular frame area counterclockwise by 90 degrees, and the text direction in the rectangular frame area changes from right to reverse.
In this embodiment, the electronic device may determine whether the height of each rectangular frame area containing a line of characters is greater than its width; if so, it rotates that rectangular frame area by 90 degrees in the preset direction. In this way, when the character direction in the image to be recognized is the left or right direction, it can be changed to the forward or reverse direction to facilitate the subsequent character recognition processing.
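A minimal sketch of this height/width check, assuming a clockwise preset direction and NumPy line crops; the function name is hypothetical.

```python
import numpy as np

def normalize_line_crop(crop, clockwise=True):
    """Rotate a line crop 90 degrees in the preset direction if it is taller than it is wide."""
    h, w = crop.shape[:2]
    if h > w:                                            # the text direction is probably left or right
        return np.rot90(crop, -1 if clockwise else 1)    # np.rot90 rotates counterclockwise for positive k
    return crop
```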
As shown in fig. 3, the step of identifying the text region of the image to be identified and determining each rectangular frame region containing a row of text may include the following steps:
s301, inputting an image to be identified into a text line detection model, and processing according to image characteristics of the image to be identified to obtain distances between each pixel point in the image to be identified and four sides of a rectangular frame area containing one line of characters and included angles between the rectangular frame area containing one line of characters and the horizontal direction, wherein the four sides of the rectangular frame area containing one line of characters are associated with the pixel point;
the text line detection model may include image features, distances between each pixel point in the image and four sides of a rectangular frame area containing a line of text, and corresponding relations between the rectangular frame area and an included angle in a horizontal direction.
The text line detection model can be obtained by the electronic device training an initial text line detection model. During training, the initial text line detection model can learn the correspondence between image features and the distances between each pixel point in the image and the four sides of the rectangular frame area containing one line of text to which it belongs, and the included angle between that rectangular frame area and the horizontal direction.
Fig. 4 is a schematic diagram of the output result of the text line detection model according to an embodiment of the present invention, that is, a schematic diagram of the distances between a pixel point in the image and the four sides of the rectangular frame area containing a line of text to which the pixel point belongs, and the included angle between that rectangular frame area and the horizontal direction. The distances between a pixel point 401 in the image to be identified and the four sides of the rectangular frame area 400 containing a row of characters are the dotted line segments 402, 403, 404 and 405 respectively, and the included angle between the rectangular frame area 400 and the horizontal direction 407 is the angle 406.
S302, determining rectangular frame areas containing a row of characters in each image to be identified through a non-maximum suppression algorithm according to the distance;
after the distances between each pixel point in the image to be identified and the four sides of the rectangular frame area containing the characters, to which the pixel point belongs, are obtained, the electronic equipment can determine each rectangular frame area containing one row of characters in the image to be identified through a non-maximum suppression algorithm according to the distances.
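Before suppression, each retained pixel implies one candidate box. The sketch below shows this for the axis-aligned case; handling of the predicted angle is omitted for brevity, and the helper is an illustrative assumption.

```python
def pixel_to_box(x, y, d_top, d_right, d_bottom, d_left):
    """Axis-aligned candidate box (x1, y1, x2, y2) implied by one pixel's predicted side distances."""
    return (x - d_left, y - d_top, x + d_right, y + d_bottom)
```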
S303, adjusting the rectangular frame area containing one line of characters with the included angle larger than 45 degrees to be vertical, and adjusting the rectangular frame area containing one line of characters with the included angle larger than 0 degrees and not larger than 45 degrees to be horizontal.
The electronic device may adjust the rectangular frame area containing a line of characters with the included angle greater than 45 degrees to a vertical direction according to the included angle between the rectangular frame area containing a line of characters and the horizontal direction, and adjust the rectangular frame area containing a line of characters with the included angle greater than 0 degrees and not greater than 45 degrees to the horizontal direction. For example, the included angle 406 between the rectangular frame area 400 including a line of characters and the horizontal direction 407 shown in fig. 4 is 10 degrees and less than 45 degrees, so the electronic device can rotate the rectangular frame area 400 clockwise by 10 degrees to adjust to the horizontal direction.
It can be seen that, in this embodiment, the electronic device may input the image to be identified into the text line detection model and process it according to its image features to obtain, for each pixel point, the distances to the four sides of the rectangular frame area containing a line of text to which the pixel point belongs and the included angle between that rectangular frame area and the horizontal direction; the electronic device may then determine each rectangular frame area containing a line of text according to the distances, adjust the rectangular frame areas whose included angle is greater than 45 degrees to the vertical direction, and adjust those whose included angle is greater than 0 degrees and not greater than 45 degrees to the horizontal direction. In this way, when a rectangular frame area containing a line of text in the image to be identified is neither horizontal nor vertical, the electronic device can adjust it to the horizontal or vertical direction, which facilitates the subsequent recognition of the rectangular frame area.
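For illustration, the following sketch rotates a line crop back to the horizontal or vertical direction with OpenCV, assuming the angle is measured counterclockwise from the horizontal (consistent with the 10-degree example of Fig. 4, which is corrected by a 10-degree clockwise rotation); the crop format is an assumption.

```python
import cv2

def straighten(crop, angle_deg):
    """Rotate a line crop so it ends up horizontal (angle <= 45) or vertical (angle > 45)."""
    h, w = crop.shape[:2]
    target = 90.0 if angle_deg > 45.0 else 0.0
    # positive OpenCV angles rotate counterclockwise, so target - angle_deg turns a
    # 10-degree tilt into a 10-degree clockwise correction back to the horizontal direction
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), target - angle_deg, 1.0)
    return cv2.warpAffine(crop, rot, (w, h))
```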
As an implementation manner of the embodiment of the present invention, the electronic device inputs the image to be identified into the text line detection model, and the result of processing according to the image features of the image to be identified may further include a probability that each pixel point in the image to be identified belongs to a rectangular frame area.
The text line detection model can be obtained by the electronic device training an initial text line detection model. During training, the initial text line detection model can learn the correspondence between image features and the probability that each pixel point in the image belongs to a rectangular frame area. Therefore, when the electronic device inputs the image to be identified into the text line detection model and processes it according to the image features, the result can include the probability that each pixel point in the image to be identified belongs to a rectangular frame area.
For this case, as shown in fig. 5, the step of determining, according to the distance, each rectangular frame area containing a line of characters in the image to be identified by a non-maximum suppression algorithm may include:
s501, the electronic device can remove the pixel points which do not belong to the rectangular frame area according to the probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold value;
The electronic device may input the image to be identified into the text line detection model, and the text line detection model may determine, according to the image features of the image to be identified, the probability that each pixel point in the image belongs to a rectangular frame area and output it. Since the electronic device only needs to identify the rectangular frame areas that contain a row of characters, it can remove the pixel points that do not belong to any rectangular frame area. Specifically, the electronic device may set a probability threshold in advance, determine whether each pixel point belongs to a rectangular frame area according to the relation between the pixel point's probability and the preset threshold, and then remove the pixel points that do not belong to a rectangular frame area.
When the probability that the pixel point belongs to the rectangular frame area is not smaller than a preset threshold value, the probability that the pixel point belongs to the rectangular frame area is higher, so that the electronic equipment can keep the pixel point; when the probability that the pixel point belongs to the rectangular frame area is smaller than the preset threshold value, the probability that the pixel point does not belong to the rectangular frame area is higher, so that the electronic equipment can remove the pixel point.
S502, the electronic equipment determines each rectangular frame area containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance corresponding to the residual pixel points.
The remaining pixel points are those left after the electronic device, in step S501, removes the pixel points whose probability of belonging to a rectangular frame area is less than the preset threshold. The electronic device can determine each rectangular frame area containing a row of characters in the image to be identified through a non-maximum suppression algorithm according to the distances corresponding to the remaining pixel points; further, according to the included angle between the rectangular frame area corresponding to the remaining pixel points and the horizontal direction, it can adjust the rectangular frame areas whose included angle is greater than 45 degrees to the vertical direction, and adjust those whose included angle is greater than 0 degrees and not greater than 45 degrees to the horizontal direction.
It can be seen that, in this embodiment, the electronic device may remove the pixels that do not belong to the rectangular frame area according to the probability that each pixel belongs to the rectangular frame area in the image to be identified and the preset threshold, and further determine, according to the distance corresponding to the remaining pixels, the rectangular frame area containing a row of characters in each image to be identified through a non-maximum suppression algorithm. Therefore, the electronic equipment only needs to determine the rectangular frame area containing a row of characters in each image to be identified according to the distance corresponding to the pixel points belonging to the rectangular frame area, and the calculated amount can be reduced and the efficiency can be improved by removing the pixel points not belonging to the rectangular frame area.
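A minimal sketch of the two steps above (thresholding the per-pixel probability, then a greedy non-maximum suppression over the candidate boxes), reusing the pixel_to_box helper sketched earlier; shapes, names and thresholds are assumptions for illustration.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def detect_line_boxes(score_map, geo_map, score_thresh=0.8, iou_thresh=0.2):
    """score_map: (H, W) probability of belonging to a frame area; geo_map: (H, W, 4) side distances."""
    ys, xs = np.where(score_map >= score_thresh)                  # remove pixels below the preset threshold
    candidates = [(score_map[y, x], pixel_to_box(x, y, *geo_map[y, x])) for y, x in zip(ys, xs)]
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = []
    for _, box in candidates:                                     # greedy non-maximum suppression
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```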
As shown in fig. 6, the training manner of the text line detection model according to the embodiment of the present invention may include:
s601, an electronic device acquires an initial text line detection model and a plurality of image samples;
wherein each of the plurality of image samples contains characters. The initial text line detection model may be a deep learning model such as a convolutional neural network, its parameters may be randomly initialized, and the structure of the initial text line detection model is not specifically limited here.
S602, marking four vertex coordinates of each rectangular frame area containing a row of characters in each image sample by the electronic equipment according to a preset rule;
after the plurality of image samples are obtained, in order to train the initial text line detection model, the electronic device can determine a rectangular frame area containing one line of text in each image sample, further calibrate four vertexes of each rectangular frame area containing one line of text in the image sample, and determine coordinates of the four vertexes so as to calculate a detection label of the rectangular frame area.
S603, according to the four vertex coordinates of each rectangular frame area containing a row of characters, calculating the distance between each pixel point and four sides of the rectangular frame area containing a row of characters, the included angle between each rectangular frame area containing a row of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area, and obtaining the detection label of each image sample;
Because the text line detection model obtained by training needs to determine, when processing an image, the distance between each pixel point and the four sides of the rectangular frame area containing a line of characters to which it belongs, the included angle between each rectangular frame area containing a line of characters and the horizontal direction, and the probability that each pixel point belongs to a rectangular frame area, the electronic device can calculate these quantities from the four vertex coordinates of each rectangular frame area containing a line of characters and take them as the detection label of each image sample.
In one embodiment, in order to ensure that the remaining pixel points determined by the electronic device through the preset probability threshold indeed belong to a rectangular frame area, when calculating the probability that a pixel point belongs to a rectangular frame area, the rectangular frame area containing a row of characters may be shrunk by one third to obtain a reduced rectangular frame area; it is then determined whether each pixel point falls inside the reduced rectangular frame area, a higher score is assigned if it does and a lower score if it does not, and the resulting score is used as the probability that the pixel point belongs to the rectangular frame area.
In another embodiment, the distance between each pixel point in each image sample and four sides of the rectangular frame area containing a line of characters, and the included angle between each rectangular frame area containing a line of characters in the horizontal direction can also be calculated in other manners, which is not limited herein.
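For an axis-aligned rectangular frame area (angle taken as 0), the label computation described above might be sketched as follows; the one-third shrink is interpreted as moving each side inwards by one sixth of the corresponding dimension, which is an assumption, and all names are hypothetical.

```python
import numpy as np

def make_detection_label(h, w, x1, y1, x2, y2):
    """Per-pixel score map and side-distance map for one axis-aligned rectangular frame area."""
    score = np.zeros((h, w), dtype=np.float32)
    geo = np.zeros((h, w, 4), dtype=np.float32)                   # distances to top, right, bottom, left sides
    sx, sy = (x2 - x1) / 6.0, (y2 - y1) / 6.0                     # shrink the box by one third overall
    score[int(y1 + sy):int(y2 - sy), int(x1 + sx):int(x2 - sx)] = 1.0
    ys, xs = np.mgrid[0:h, 0:w]
    inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
    geo[..., 0] = np.where(inside, ys - y1, 0.0)                  # distance to the top side
    geo[..., 1] = np.where(inside, x2 - xs, 0.0)                  # distance to the right side
    geo[..., 2] = np.where(inside, y2 - ys, 0.0)                  # distance to the bottom side
    geo[..., 3] = np.where(inside, xs - x1, 0.0)                  # distance to the left side
    return score, geo
```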
S604, the electronic equipment inputs the image sample into an initial text line detection model to obtain a prediction label;
after the image sample is marked to obtain the detection label, the electronic device can input the image sample into the initial text line detection model, the initial text line detection model can process the image sample based on current parameters, and the distance between each pixel point and four sides of a rectangular frame area containing a line of characters, the included angle between each rectangular frame area containing a line of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area are determined according to the image characteristics of the image sample, namely the prediction label.
S605, adjusting parameters of an initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples;
Because the initial text line detection model cannot yet process the image sample accurately enough to produce an accurate output result, after the prediction label is obtained the electronic equipment can compare the prediction label with the corresponding detection label and adjust the parameters of the initial text line detection model according to the difference between the two, so that the parameters become better suited to the task. The parameters of the initial text line detection model may be adjusted with a model parameter adjustment method such as stochastic gradient descent, which is not specifically limited or described herein.
In the training process, the initial text line detection model continuously learns the correspondence between the image characteristics of the image samples and the distance between each pixel point in an image sample and the four sides of the rectangular frame area containing one line of characters to which it belongs, the included angle between each rectangular frame area containing one line of characters and the horizontal direction, and the probability that each pixel point belongs to a rectangular frame area.
And S606, the electronic equipment judges whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, stops training to obtain the text line detection model.
If the iteration times of the initial text line detection model reach the preset times or the accuracy of the prediction label output by the initial text line detection model reaches the preset value, the current initial text line detection model can process the image to obtain an accurate output result, so that training can be stopped at the moment to obtain the text line detection model.
The preset number of times may be set according to factors such as the recognition requirement and the model structure, for example, 5000 times, 10000 times, 12000 times, and the like, which are not limited herein. The preset value may be defined according to the recognition requirement, the model structure, and other factors, for example, 99%, 99.1%, 99.2%, and the like, which are not particularly limited herein.
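Putting S604 to S606 together, the training loop might look roughly like the following sketch; the loss terms, the accuracy measure, and the assumption that each label is a dict of tensors shaped like the model outputs are illustrative choices rather than the patent's prescription (the `TextLineDetector` class is the hypothetical sketch given earlier).

```python
import torch
import torch.nn.functional as F

def train_text_line_detector(model, samples, labels,
                             max_iters=10000, target_acc=0.99, lr=0.01):
    """Illustrative training loop: stop once the preset iteration count or the
    preset prediction-label accuracy is reached (loss/accuracy choices assumed).

    samples: list of image tensors shaped (1, 3, H, W); labels: matching dicts
    with "distances" (1, 4, H, W), "angle" (1, 1, H, W) and "score" (1, 1, H, W).
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # stochastic gradient descent
    for it in range(max_iters):
        img, label = samples[it % len(samples)], labels[it % len(samples)]
        pred = model(img)

        # Compare the prediction label with the detection label (S605).
        loss = (F.smooth_l1_loss(pred["distances"], label["distances"])
                + F.smooth_l1_loss(pred["angle"], label["angle"])
                + F.binary_cross_entropy(pred["score"], label["score"]))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Stop when the predicted score map agrees with the label on enough pixels
        # (S606); a real implementation would measure this on held-out samples.
        acc = ((pred["score"] > 0.5).float() == label["score"]).float().mean().item()
        if acc >= target_acc:
            break
    return model
```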
It can be seen that, in this embodiment, the electronic device may obtain an initial text line detection model and a plurality of image samples, mark the four vertex coordinates of each rectangular frame area containing a line of text in each image sample according to a preset rule, calculate from those coordinates the distance between each pixel point and the four sides of the rectangular frame area containing a line of text to which it belongs, the included angle between each rectangular frame area containing a line of text and the horizontal direction, and the probability that each pixel point belongs to a rectangular frame area, so as to obtain the detection label of each image sample, input the image sample into the initial text line detection model to obtain a prediction label, adjust the parameters of the initial text line detection model based on the prediction label and the detection label of the corresponding image sample, and finally judge whether the number of iterations of the initial text line detection model reaches the preset number or whether the accuracy of the prediction labels output by the model reaches the preset value, stopping training when either condition is met to obtain the text line detection model. In this way, the electronic device can compare the detection label with the prediction label and adjust the parameters of the initial text line detection model according to the comparison result, obtaining a text line detection model that meets the requirements and improving the recognition accuracy.
As an implementation manner of the embodiment of the present invention, the step of performing text recognition on the rectangular frame area other than the first rectangular frame area according to the target direction to obtain a recognition result may include the following steps:
If the target direction is the direction corresponding to the characters in the first rectangular frame area, each rectangular frame area containing a row of characters in the image to be identified is recognized, and the characters corresponding to each such rectangular frame area are obtained; if the target direction is the direction corresponding to the characters in the second rectangular frame area, the image to be identified is rotated by 180 degrees to obtain a target recognition image, and each rectangular frame area containing a row of characters in the target recognition image is recognized to obtain the characters corresponding to each such rectangular frame area.
In general, all the characters in an image to be recognized face the same direction. Therefore, if the target direction is the direction corresponding to the characters in the first rectangular frame area, the direction of all the characters in the image to be recognized already matches the target direction, and the electronic equipment can recognize each rectangular frame area containing a row of characters in the image to be recognized according to the target direction, obtaining the characters corresponding to each such rectangular frame area.
If the target direction is the direction corresponding to the characters in the second rectangular frame area, the direction of all the characters in the image to be recognized will match the target direction only after the image is rotated by 180 degrees. The electronic equipment can therefore rotate the image to be recognized by 180 degrees to obtain a target recognition image, and then recognize each rectangular frame area containing a row of characters in the target recognition image according to the target direction, obtaining the characters corresponding to each such rectangular frame area.
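A minimal sketch of this branch is given below; the box format (x, y, width, height), the `recognize_line` callable, and the coordinate remapping after rotation are placeholders assumed for illustration.

```python
import cv2

def recognize_all_lines(image, line_boxes, target_is_first_direction, recognize_line):
    """Recognize every row-of-text box according to the target direction (sketch)."""
    if not target_is_first_direction:
        # The characters read correctly only after a 180-degree turn, so rotate the
        # whole image once instead of rotating every crop separately.
        image = cv2.rotate(image, cv2.ROTATE_180)
        h, w = image.shape[:2]
        # Map each (x, y, bw, bh) box into the rotated image's coordinates.
        line_boxes = [(w - x - bw, h - y - bh, bw, bh) for (x, y, bw, bh) in line_boxes]
    return [recognize_line(image[y:y + bh, x:x + bw]) for (x, y, bw, bh) in line_boxes]
```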
It can be seen that, in this embodiment, if the target direction is the direction corresponding to the first rectangular frame area, the electronic device may identify each rectangular frame area containing a row of characters in the image to be identified, so as to obtain characters corresponding to each rectangular frame area containing a row of characters, and if the target direction is the direction corresponding to the second rectangular frame area, the electronic device may rotate the image to be identified by 180 degrees, so as to obtain the target identification image, and further identify each rectangular frame area containing a row of characters in the target identification image, so as to obtain characters corresponding to each rectangular frame area containing a row of characters. Therefore, the electronic equipment can identify the characters outside the first rectangular frame area in the image to be identified according to the correct direction, repeated identification is avoided, and the working efficiency is improved.
Corresponding to the above method for recognizing the characters in the image, the embodiment of the application also provides a device for recognizing the characters in the image. The following describes a device for recognizing characters in an image provided in an embodiment of the present application.
As shown in fig. 7, a schematic structural diagram of a device for recognizing characters in an image includes the following modules.
The image to be identified acquisition module 701 is configured to acquire an image to be identified;
wherein the image to be identified contains text.
The rectangular frame area determining module 702 is configured to perform text area recognition on an image to be recognized, and determine each rectangular frame area containing a row of text;
a first rotation module 703, configured to rotate the first rectangular frame region by 180 degrees to obtain a second rectangular frame region;
the first rectangular frame area is one of rectangular frame areas each containing a row of characters.
The first recognition module 704 is configured to input the first rectangular frame area and the second rectangular frame area into a text recognition model, perform text recognition according to image features of the first rectangular frame area and the second rectangular frame area, and obtain text recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area, where the text recognition model includes a correspondence between the image features and the text recognition results and correct probabilities thereof;
The target direction determining module 705 is configured to compare the correct probabilities, and determine a direction of the text in the rectangular frame area corresponding to the highest correct probability as a target direction;
the second recognition module 706 is configured to perform text recognition on the rectangular frame area other than the first rectangular frame area according to the target direction, so as to obtain a recognition result.
In the scheme provided by the embodiment of the invention, the electronic equipment can acquire the image to be identified, identify the text region of the image to be identified, determine each rectangular frame region containing one row of text, determine one of the rectangular frame regions containing one row of text as a first rectangular frame region, input the first rectangular region and a second rectangular frame region obtained by rotating the first rectangular frame region by 180 degrees into a text identification model, perform text identification according to the image features of the first rectangular frame region and the second rectangular frame region, obtain text identification results corresponding to the first rectangular frame region and the second rectangular frame region and correct probabilities thereof, compare the correct probabilities thereof, determine the text direction in the rectangular frame region corresponding to the highest correct probability as a target direction, and finally perform text identification on the rectangular frame regions outside the first rectangular frame region according to the target direction to obtain identification results. Therefore, the electronic equipment can identify the characters in the image according to the target direction determined by the scheme, and can accurately identify the characters in the image when the character direction in the image does not accord with the direction of the viewing angle of the person for viewing the image.
As an implementation manner of the embodiment of the present invention, the device for recognizing characters in an image may further include:
a rectangular frame area judging module (not shown in fig. 7) for judging whether the height of each rectangular frame area containing a line of characters is larger than the width before rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area;
and a second rotation module (not shown in fig. 7) for rotating the rectangular frame area containing one line of characters by 90 degrees in a preset direction if the height of the rectangular frame area containing one line of characters is larger than the width.
As an implementation manner of the embodiment of the present invention, the rectangular frame area determining module 702 may include:
a text line detection sub-module (not shown in fig. 7) for inputting the image to be identified into a text line detection model, and processing according to the image characteristics of the image to be identified to obtain the distance between each pixel point in the image to be identified and four sides of a rectangular frame region containing a line of characters and the included angle between the rectangular frame region containing a line of characters and the horizontal direction;
the text line detection model comprises the corresponding relation between image characteristics and both the distance between each pixel point in an image and the four edges of the rectangular frame area containing a line of characters to which the pixel point belongs and the included angle between the rectangular frame area and the horizontal direction, and is pre-trained by a model training module based on image samples and their detection labels.
A rectangular frame region determining submodule (not shown in fig. 7) for determining rectangular frame regions each containing a line of characters in the image to be identified by a non-maximum suppression algorithm according to the distance;
and the rectangular frame area adjusting sub-module (not shown in fig. 7) is used for adjusting the rectangular frame area containing one line of characters whose included angle is larger than 45 degrees to the vertical direction, and adjusting the rectangular frame area containing one line of characters whose included angle is larger than 0 degrees and not larger than 45 degrees to the horizontal direction.
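For illustration, the 45-degree adjustment performed by this sub-module might be sketched as follows; the (center, size, angle) box parameterization and the rotation sign convention are assumptions, since the embodiment only specifies the threshold.

```python
import cv2

def snap_line_crop(image, center, size, angle_deg):
    """Snap one detected row-of-text box to the horizontal or vertical direction.

    Illustrative sketch: `center` (x, y), `size` (long side, short side) and
    `angle_deg` (included angle with the horizontal, in [0, 90)) are an assumed
    parameterization; depending on how the detector reports the angle, the sign
    of the rotation below may need to be flipped.
    """
    h, w = image.shape[:2]
    if angle_deg > 45:
        rot = angle_deg - 90                   # adjust to the vertical direction
        bw, bh = int(size[1]), int(size[0])    # the long side ends up vertical
    else:
        rot = angle_deg                        # adjust to the horizontal direction
        bw, bh = int(size[0]), int(size[1])
    M = cv2.getRotationMatrix2D(tuple(center), rot, 1.0)
    deskewed = cv2.warpAffine(image, M, (w, h))
    x0, y0 = int(center[0] - bw / 2), int(center[1] - bh / 2)
    return deskewed[max(y0, 0):y0 + bh, max(x0, 0):x0 + bw]
```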
As an implementation manner of the embodiment of the present invention, the rectangular frame area determining sub-module may include:
a pixel point removing unit (not shown in fig. 7) configured to remove pixel points that do not belong to a rectangular frame area according to a probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold;
the probability that each pixel point in the image to be identified belongs to a rectangular frame area is another output result of the text line detection model, and the text line detection model is pre-trained by a model training module based on an image sample and a detection label thereof.
A rectangular frame region determining unit (not shown in fig. 7) configured to determine, according to the distance corresponding to the remaining pixel points, a rectangular frame region including a line of characters in each of the images to be identified by a non-maximum suppression algorithm.
As an implementation manner of the embodiment of the present invention, the model training module may include:
an image sample acquisition sub-module (not shown in fig. 7) for acquiring an initial text line detection model and a plurality of image samples;
an image sample marking sub-module (not shown in fig. 7) for marking four vertex coordinates of each rectangular frame region containing a line of text in each image sample according to a preset rule;
a detection label generating sub-module (not shown in fig. 7) for calculating the distance between each pixel point and four sides of the rectangular frame area containing a row of characters, the included angle between each rectangular frame area containing a row of characters and the horizontal direction, and the probability that each pixel point belongs to the rectangular frame area according to the four vertex coordinates of each rectangular frame area containing a row of characters, so as to obtain the detection label of each image sample;
a predictive label generation sub-module (not shown in fig. 7) for inputting the image samples into the initial text line detection model to generate predictive labels;
a parameter adjustment sub-module (not shown in fig. 7) for adjusting parameters of the initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples;
And a model generating sub-module (not shown in fig. 7) for judging whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, stopping training to obtain the text line detection model.
As an implementation manner of the embodiment of the present invention, the second identifying module may include:
a first text recognition sub-module (not shown in fig. 7) configured to, if the target direction is a direction corresponding to the first rectangular frame area, recognize each rectangular frame area containing a row of text in the image to be recognized, and obtain text corresponding to each rectangular frame area containing a row of text;
a second text recognition sub-module (not shown in fig. 7) configured to rotate the image to be recognized by 180 degrees if the target direction is a direction corresponding to the second rectangular frame area, so as to obtain a target recognition image; and identifying each rectangular frame area containing one row of characters in the target identification image to obtain characters corresponding to each rectangular frame area containing one row of characters.
The embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804;
A memory 803 for storing a computer program;
the processor 801 is configured to implement the method for recognizing characters in an image according to any one of the embodiments described above when executing the program stored in the memory 803.
In the scheme provided by the embodiment of the invention, the electronic equipment can acquire the image to be identified, identify the text region of the image to be identified, determine each rectangular frame region containing one row of text, determine one of the rectangular frame regions containing one row of text as a first rectangular frame region, input the first rectangular region and a second rectangular frame region obtained by rotating the first rectangular frame region by 180 degrees into a text identification model, perform text identification according to the image features of the first rectangular frame region and the second rectangular frame region, obtain text identification results corresponding to the first rectangular frame region and the second rectangular frame region and correct probabilities thereof, compare the correct probabilities thereof, determine the text direction in the rectangular frame region corresponding to the highest correct probability as a target direction, and finally perform text identification on the rectangular frame regions outside the first rectangular frame region according to the target direction to obtain identification results. Therefore, the electronic equipment can identify the characters in the image according to the target direction determined by the scheme, and can accurately identify the characters in the image when the character direction in the image does not accord with the direction of the viewing angle of the person for viewing the image.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, there is further provided a computer readable storage medium, in which a computer program is stored, the computer program implementing the steps of the method for recognizing characters in an image according to any one of the embodiments, when the computer program is executed by a processor.
In the scheme provided by the embodiment of the invention, the electronic equipment can acquire the image to be identified, identify the text region of the image to be identified, determine each rectangular frame region containing one row of text, determine one of the rectangular frame regions containing one row of text as a first rectangular frame region, input the first rectangular region and a second rectangular frame region obtained by rotating the first rectangular frame region by 180 degrees into a text identification model, perform text identification according to the image features of the first rectangular frame region and the second rectangular frame region, obtain text identification results corresponding to the first rectangular frame region and the second rectangular frame region and correct probabilities thereof, compare the correct probabilities thereof, determine the text direction in the rectangular frame region corresponding to the highest correct probability as a target direction, and finally perform text identification on the rectangular frame regions outside the first rectangular frame region according to the target direction to obtain identification results. Therefore, the electronic equipment can identify the characters in the image according to the target direction determined by the scheme, and can accurately identify the characters in the image when the character direction in the image does not accord with the direction of the viewing angle of the person for viewing the image.
It should be noted that, with respect to the apparatus, electronic device, and computer-readable storage medium embodiments described above, since they are substantially similar to the method embodiments, the description is relatively simple, and reference should be made to the description of the method embodiments for relevant points.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (14)

1. A method for recognizing text in an image, the method comprising:
acquiring an image to be identified, wherein the image to be identified contains characters;
performing character area recognition on the image to be recognized, and determining each rectangular frame area containing a row of characters;
rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one row of characters;
inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image features of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area, wherein the character recognition model comprises a corresponding relation between the image features and the character recognition results and correct probabilities thereof;
Comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
performing character recognition on the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result;
the correct probability is obtained by the text recognition model based on the following modes:
the character recognition model extracts image features of the first rectangular frame area and the second rectangular frame area, converts the image features into feature sequences, and calculates the feature sequences through a normalized exponential function to obtain a posterior probability matrix; wherein each column of the posterior probability matrix represents an output text category and each row represents text identified by the feature sequence;
aiming at each column in the posterior probability matrix, obtaining the characters with the highest probability in the column, and combining the characters with the highest probability in each column into a character sequence as the character sequence with the highest probability;
judging whether adjacent characters in the character sequence with the maximum probability are repeated, and if the repeated characters exist, performing duplicate removal on the repeated characters to obtain a character recognition result corresponding to the input data;
And calculating the posterior probability of the character recognition result as the correct probability corresponding to the input data.
2. The method of claim 1, wherein prior to the step of rotating the first rectangular frame region 180 degrees to obtain the second rectangular frame region, the method further comprises:
judging whether the height of each rectangular frame area containing a row of characters is larger than the width;
if the height of the rectangular frame area containing one row of characters is larger than the width, rotating the rectangular frame area containing one row of characters by 90 degrees towards the preset direction.
3. The method according to claim 1, wherein the step of recognizing the text region of the image to be recognized, and determining each rectangular frame region containing a line of text, comprises:
inputting the image to be identified into a text line detection model, and processing according to the image characteristics of the image to be identified to obtain the distance between each pixel point in the image to be identified and the four sides of the rectangular frame area containing a row of characters to which the pixel point belongs, and the included angle between the rectangular frame area containing a row of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between image characteristics and both the distance between each pixel point in an image and the four sides of the rectangular frame area containing a row of characters to which it belongs and the included angle between the rectangular frame area and the horizontal direction;
According to the distance, determining rectangular frame areas containing a row of characters in each image to be identified through a non-maximum value suppression algorithm;
and adjusting the rectangular frame area containing one line of characters with the included angle larger than 45 degrees to be vertical, and adjusting the rectangular frame area containing one line of characters with the included angle larger than 0 degrees and not larger than 45 degrees to be horizontal.
4. A method according to claim 3, wherein the output result of the text line detection model further includes a probability that each pixel point in the image to be identified belongs to a rectangular frame region;
the step of determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum suppression algorithm according to the distance comprises the following steps:
removing pixel points which do not belong to the rectangular frame area according to the probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold value;
and determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance corresponding to the residual pixel points.
5. A method according to claim 3, wherein the training mode of the text line detection model comprises:
Acquiring an initial text line detection model and a plurality of image samples;
marking four vertex coordinates of each rectangular frame area containing a row of characters in each image sample according to a preset rule;
calculating the distance between each pixel point and four sides of the rectangular frame area containing one row of characters, the included angle between each rectangular frame area containing one row of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area according to the four vertex coordinates of each rectangular frame area containing one row of characters, and obtaining the detection label of each image sample;
inputting the image sample into the initial text line detection model to obtain a prediction label;
adjusting parameters of the initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples;
and judging whether the iteration times of the initial text line detection model reach preset times or not, or whether the accuracy of the predictive label output by the initial text line detection model reaches a preset value or not, and stopping training to obtain the text line detection model.
6. The method according to claim 1, wherein the step of performing text recognition on the rectangular frame area other than the first rectangular frame area according to the target direction to obtain a recognition result includes:
If the target direction is the direction corresponding to the characters in the first rectangular frame area, identifying each rectangular frame area containing a row of characters except the first rectangular frame area in the image to be identified, and obtaining characters corresponding to each rectangular frame area containing a row of characters;
if the target direction is the direction corresponding to the characters in the second rectangular frame area, rotating the image to be identified by 180 degrees to obtain a target identification image; and identifying each rectangular frame area containing one row of characters except the first rectangular frame area in the target identification image to obtain characters corresponding to each rectangular frame area containing one row of characters.
7. An apparatus for recognizing text in an image, the apparatus comprising:
the image to be identified acquisition module is used for acquiring an image to be identified, wherein the image to be identified contains characters;
the rectangular frame area determining module is used for carrying out character area recognition on the image to be recognized and determining each rectangular frame area containing a row of characters;
the first rotating module is used for rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one row of characters;
The first recognition module is used for inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and carrying out character recognition according to the image features of the first rectangular frame area and the second rectangular frame area to obtain character recognition results and correct probabilities thereof corresponding to the first rectangular frame area and the second rectangular frame area, wherein the character recognition model comprises a corresponding relation between the image features and the character recognition results and correct probabilities thereof;
the target direction determining module is used for comparing the correct probabilities and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
the second recognition module is used for recognizing characters in the rectangular frame areas except the first rectangular frame area according to the target direction to obtain a recognition result;
the correct probability is obtained by the text recognition model based on the following modes:
the character recognition model extracts image features of the first rectangular frame area and the second rectangular frame area, converts the image features into feature sequences, and calculates the feature sequences through a normalized exponential function to obtain a posterior probability matrix; wherein each column of the posterior probability matrix represents an output text category and each row represents text identified by the feature sequence;
Aiming at each column in the posterior probability matrix, obtaining the characters with the highest probability in the column, and combining the characters with the highest probability in each column into a character sequence as the character sequence with the highest probability;
judging whether adjacent characters in the character sequence with the maximum probability are repeated, and if the repeated characters exist, performing duplicate removal on the repeated characters to obtain a character recognition result corresponding to the input data;
and calculating the posterior probability of the character recognition result as the correct probability corresponding to the input data.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the rectangular frame area judging module is used for judging whether the height of each rectangular frame area containing one row of characters is larger than the width of each rectangular frame area before the first rectangular frame area is rotated 180 degrees to obtain a second rectangular frame area;
and the second rotating module is used for rotating the rectangular frame area containing one row of characters by 90 degrees towards the preset direction if the height of the rectangular frame area containing one row of characters is larger than the width of the rectangular frame area containing one row of characters.
9. The apparatus of claim 7, wherein the rectangular box area determination module comprises:
The text line detection sub-module is used for inputting the image to be identified into a text line detection model, processing the image according to the image characteristics of the image to be identified to obtain the distance between each pixel point in the image to be identified and four edges of a rectangular frame area containing a line of characters and the included angle between the rectangular frame area containing a line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and four edges of the rectangular frame area containing a line of characters and the included angle between the rectangular frame area and the horizontal direction, and the text line detection model is pre-trained by the model training module based on an image sample and a detection label thereof;
the rectangular frame region determining submodule is used for determining each rectangular frame region containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance;
and the rectangular frame area adjusting sub-module is used for adjusting the rectangular frame area containing one line of characters with the included angle larger than 45 degrees to be in the vertical direction, and adjusting the rectangular frame area containing one line of characters with the included angle larger than 0 degrees and not larger than 45 degrees to be in the horizontal direction.
10. The apparatus of claim 9, wherein the output result of the text line detection model further includes a probability that each pixel in the image to be identified belongs to a rectangular frame region;
the rectangular frame area determination submodule includes:
the pixel point removing unit is used for removing the pixel points which do not belong to the rectangular frame area according to the probability that each pixel point in the image to be identified belongs to the rectangular frame area and a preset threshold value;
and the rectangular frame area determining unit is used for determining each rectangular frame area containing a row of characters in the image to be identified through a non-maximum value suppression algorithm according to the distance corresponding to the residual pixel points.
11. The apparatus of claim 10, wherein the model training module comprises:
the image sample acquisition sub-module is used for acquiring an initial text line detection model and a plurality of image samples;
the image sample marking sub-module is used for marking four vertex coordinates of each rectangular frame area containing a row of characters in each image sample according to a preset rule;
the detection label generation sub-module is used for calculating the distance between each pixel point and four sides of the rectangular frame area containing one line of characters, the included angle between each rectangular frame area containing one line of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame area according to the four vertex coordinates of each rectangular frame area containing one line of characters, so as to obtain the detection label of each image sample;
The prediction label generation sub-module is used for inputting the image sample into the initial text line detection model to generate a prediction label;
the parameter adjustment sub-module is used for adjusting parameters of the initial text line detection model based on the prediction label and the detection label of the corresponding image sample;
and the model generation sub-module is used for judging whether the iteration times of the initial text line detection model reach preset times or not, or whether the accuracy of the predictive label output by the initial text line detection model reaches a preset value or not, stopping training, and obtaining the text line detection model.
12. The apparatus of claim 7, wherein the second identification module comprises:
the first character recognition sub-module is used for recognizing each rectangular frame area containing one row of characters in the image to be recognized to obtain characters corresponding to each rectangular frame area containing one row of characters if the target direction is the direction corresponding to the first rectangular frame area;
the second character recognition sub-module is used for rotating the image to be recognized by 180 degrees to obtain a target recognition image if the target direction is the direction corresponding to the second rectangular frame area; and identifying each rectangular frame area containing one row of characters in the target identification image to obtain characters corresponding to each rectangular frame area containing one row of characters.
13. The electronic equipment is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-6 when executing a program stored on a memory.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
CN201910427882.2A 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment Active CN111985469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910427882.2A CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910427882.2A CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Publications (2)

Publication Number Publication Date
CN111985469A CN111985469A (en) 2020-11-24
CN111985469B true CN111985469B (en) 2024-03-19

Family

ID=73436355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910427882.2A Active CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Country Status (1)

Country Link
CN (1) CN111985469B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381057A (en) * 2020-12-03 2021-02-19 上海芯翌智能科技有限公司 Handwritten character recognition method and device, storage medium and terminal
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN113313117B (en) * 2021-06-25 2023-07-25 北京奇艺世纪科技有限公司 Method and device for identifying text content
CN117235831B (en) * 2023-11-13 2024-02-23 北京天圣华信息技术有限责任公司 Automatic part labeling method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09237318A (en) * 1996-03-04 1997-09-09 Fuji Electric Co Ltd Inclination correcting method for character image data inputted by image scanner
JP2005346419A (en) * 2004-06-03 2005-12-15 Canon Inc Method for processing character and character recognition processor
US9552527B1 (en) * 2015-08-27 2017-01-24 Lead Technologies, Inc. Apparatus, method, and computer-readable storage medium for determining a rotation angle of text
CN108229303A (en) * 2017-11-14 2018-06-29 北京市商汤科技开发有限公司 Detection identification and the detection identification training method of network and device, equipment, medium
CN109117848A (en) * 2018-09-07 2019-01-01 泰康保险集团股份有限公司 A kind of line of text character identifying method, device, medium and electronic equipment
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An On-line Handwritten Japanese Text Recognition System Free from Line Direction and Character Orientation Constraints; Motoki Onuma, et al.; IEICE Trans. Inf. & Syst.; 2005-08-31; Vol. 88, No. 8; pp. 1823-1830 *
Video Text Region Localization and Recognition Based on Deep Learning; Liu Mingzhu et al.; Journal of Harbin University of Science and Technology; Vol. 21, No. 6; pp. 61-66 *
Research on Features and Similarity Measurement in Character Recognition; Li Jie et al.; Journal of Yancheng Institute of Technology (Natural Science Edition); 2016-12-31; Vol. 29, No. 4; pp. 42-46 *

Also Published As

Publication number Publication date
CN111985469A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN111985469B (en) Method and device for recognizing characters in image and electronic equipment
US10755120B2 (en) End-to-end lightweight method and apparatus for license plate recognition
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN110647829A (en) Bill text recognition method and system
CN112508975A (en) Image identification method, device, equipment and storage medium
CN109284355B (en) Method and device for correcting oral arithmetic questions in test paper
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN111353501A (en) Book point-reading method and system based on deep learning
CN112001406B (en) Text region detection method and device
CN112560861A (en) Bill processing method, device, equipment and storage medium
CN109858327B (en) Character segmentation method based on deep learning
CN109447080B (en) Character recognition method and device
CN111310826B (en) Method and device for detecting labeling abnormality of sample set and electronic equipment
CN110222704B (en) Weak supervision target detection method and device
CN115797735A (en) Target detection method, device, equipment and storage medium
CN116580407A (en) Training method of text detection model, text detection method and device
US20230401809A1 (en) Image data augmentation device and method
CN112837404A (en) Method and device for constructing three-dimensional information of planar object
CN112396057A (en) Character recognition method and device and electronic equipment
CN113807407B (en) Target detection model training method, model performance detection method and device
CN115937875A (en) Text recognition method and device, storage medium and terminal
CN109902724B (en) Text recognition method and device based on support vector machine and computer equipment
CN116935179B (en) Target detection method and device, electronic equipment and storage medium
CN112699886B (en) Character recognition method and device and electronic equipment
CN112288003B (en) Neural network training and target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant