CN111985469A - Method and device for recognizing characters in image and electronic equipment - Google Patents


Publication number
CN111985469A
CN111985469A (application CN201910427882.2A)
Authority
CN
China
Prior art keywords
rectangular frame
image
characters
frame area
line
Prior art date
Legal status
Granted
Application number
CN201910427882.2A
Other languages
Chinese (zh)
Other versions
CN111985469B (en)
Inventor
徐潇宇
Current Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc and Zhuhai Kingsoft Office Software Co Ltd
Priority to CN201910427882.2A
Publication of CN111985469A
Application granted
Publication of CN111985469B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/242: Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the invention provides a method and a device for recognizing characters in an image and electronic equipment, wherein the method comprises the following steps: acquiring an image to be recognized; carrying out character area recognition on the image to be recognized, and determining rectangular frame areas each containing a line of characters; rotating a first rectangular frame area by 180 degrees to obtain a second rectangular frame area; inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and the correct probabilities of the character recognition results; comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction; and performing character recognition on the rectangular frame areas other than the first rectangular frame area according to the target direction to obtain a recognition result. By adopting the embodiment of the invention, characters in various directions can be recognized, and the accuracy of character recognition in the image can be improved.

Description

Method and device for recognizing characters in image and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for recognizing characters in an image, and an electronic device.
Background
At present, in many industries such as banking, insurance, finance and libraries, the characters in images need to be entered into an information database to store the relevant information, so the images need to be recognized to obtain the characters they contain.
The conventional image recognition method generally includes two steps: the first step is to identify the region where the characters in the image are located, for example by using a traditional image processing algorithm such as the Sobel operator, or a deep learning method; the second step is to perform character recognition on the region identified in the first step, for example by using a deep learning method or OCR (Optical Character Recognition).
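The first, conventional step can be illustrated with a short sketch. The snippet below is a hedged, prior-art-style example that assumes OpenCV is available; the kernel sizes, thresholds, and minimum width are illustrative choices rather than values taken from this patent.

```python
import cv2

def sobel_text_candidates(image_path):
    """Rough prior-art style text-region detection with the Sobel operator."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Horizontal gradients respond strongly to vertical character strokes.
    grad = cv2.convertScaleAbs(cv2.Sobel(gray, cv2.CV_16S, 1, 0, ksize=3))
    _, binary = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Close horizontally so characters in the same line merge into one blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 3))
    merged = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep wide, line-like blobs as candidate text regions (x, y, w, h).
    return [cv2.boundingRect(c) for c in contours if cv2.boundingRect(c)[2] > 30]
```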
If the direction of the characters that matches the viewing angle of a person looking at the image is called the forward direction, then the characters in some images may not be in the forward direction. The above image recognition method can only recognize images in which the characters are in the forward direction, so characters whose direction is not the forward direction cannot be recognized correctly.
Disclosure of Invention
An object of an embodiment of the present invention is to provide a method for recognizing characters in an image, which can correctly recognize characters in an image when a direction of the characters in the image does not conform to a direction of a viewing angle from which a person views the image. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for recognizing characters in an image, where the method includes:
acquiring an image to be recognized, wherein the image to be recognized comprises characters;
carrying out character area recognition on the image to be recognized, and determining rectangular frame areas containing a line of characters;
rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one line of characters;
inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and correct probabilities thereof, wherein the character recognition model comprises the corresponding relation between image characteristics and character recognition results and the correct probabilities thereof;
comparing the correct probabilities, and determining the direction of the characters in the rectangular frame area corresponding to the highest correct probability as a target direction;
and performing character recognition on the rectangular frame area except the first rectangular frame area according to the target direction to obtain a recognition result.
Optionally, before the step of rotating the first rectangular frame area by 180 degrees to obtain the second rectangular frame area, the method further includes:
judging whether the height of each rectangular frame area containing a line of characters is larger than the width;
and if the height of the rectangular frame area containing a line of characters is larger than the width, rotating the rectangular frame area containing a line of characters by 90 degrees towards a preset direction.
Optionally, the step of performing text region identification on the image to be identified and determining each rectangular frame region containing a line of text includes:
inputting the image to be recognized into a text line detection model, and processing according to the image characteristics of the image to be recognized to obtain the distance between each pixel point in the image to be recognized and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area containing one line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area and the horizontal direction;
determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance;
and adjusting a rectangular frame area containing a line of characters whose included angle is larger than 45 degrees to the vertical direction, and adjusting a rectangular frame area containing a line of characters whose included angle is larger than 0 degrees and not larger than 45 degrees to the horizontal direction.
Optionally, the output result of the text line detection model further includes a probability that each pixel point in the image to be identified belongs to the rectangular frame region;
the step of determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance comprises the following steps:
removing pixel points which do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold;
and determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance corresponding to the residual pixel points.
Optionally, the training mode of the text line detection model includes:
acquiring an initial text line detection model and a plurality of image samples;
marking four vertex coordinates of each rectangular frame area containing a line of characters in each image sample according to a preset rule;
calculating the distance between each pixel point and four sides of the rectangular frame region containing a line of characters, the included angle between each rectangular frame region containing a line of characters and the horizontal direction and the probability of each pixel point belonging to the rectangular frame region according to the four vertex coordinates of each rectangular frame region containing a line of characters, and obtaining the detection label of each image sample;
inputting the image sample into the initial text line detection model to obtain a prediction label;
adjusting parameters of the initial text line detection model based on the prediction labels and detection labels of the corresponding image samples;
and when the number of iterations of the initial text line detection model reaches a preset number, or the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, stopping training to obtain the text line detection model.
Optionally, the step of performing character recognition on the rectangular frame area outside the first rectangular frame area according to the target direction to obtain a recognition result includes:
if the target direction is the direction corresponding to the characters in the first rectangular frame area, identifying each rectangular frame area which contains one line of characters except the first rectangular frame area in the image to be recognized to obtain the characters corresponding to each rectangular frame area containing one line of characters;
if the target direction is the direction corresponding to the characters in the second rectangular frame area, rotating the image to be recognized by 180 degrees to obtain a target recognition image; and identifying each rectangular frame area which contains one line of characters except the first rectangular frame area in the target identification image to obtain characters corresponding to each rectangular frame area containing one line of characters.
In a second aspect, an embodiment of the present invention provides an apparatus for recognizing characters in an image, where the apparatus includes: an image acquisition module, configured to acquire an image to be recognized, where the image to be recognized includes characters;
the rectangular frame area determining module is used for carrying out character area identification on the image to be identified and determining rectangular frame areas containing a line of characters;
the first rotating module is used for rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one line of characters;
the first recognition module is used for inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and correct probabilities thereof, wherein the character recognition model comprises the corresponding relation between image characteristics and character recognition results and the correct probabilities thereof;
the target direction determining module is used for comparing the correct probabilities and determining the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction;
and the second identification module is used for carrying out character identification on the rectangular frame area except the first rectangular frame area according to the target direction to obtain an identification result.
Optionally, the apparatus further comprises:
a rectangular frame area judgment module, configured to judge whether the height of each rectangular frame area containing a line of characters is greater than the width before the first rectangular frame area is rotated by 180 degrees to obtain a second rectangular frame area;
the second rotation module is configured to rotate the rectangular frame area including the line of characters by 90 degrees in a preset direction if the height of the rectangular frame area including the line of characters is larger than the width of the rectangular frame area.
Optionally, the rectangular frame region determining module includes:
the text line detection submodule is used for inputting the image to be recognized into a text line detection model and processing the image according to the image characteristics of the image to be recognized to obtain the distance between each pixel point in the image to be recognized and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area containing one line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area and the horizontal direction, and the text line detection model is trained in advance by the model training module based on an image sample and a detection label thereof;
the rectangular frame area determining submodule is used for determining rectangular frame areas containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance;
and the rectangular frame area adjusting submodule is used for adjusting a rectangular frame area containing a line of characters whose included angle is larger than 45 degrees to the vertical direction, and adjusting a rectangular frame area containing a line of characters whose included angle is larger than 0 degrees and not larger than 45 degrees to the horizontal direction.
Optionally, the output result of the text line detection model further includes a probability that each pixel point in the image to be identified belongs to the rectangular frame region;
the rectangular frame region determination submodule includes:
the pixel point removing unit is used for removing pixel points which do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold value;
and the rectangular frame area determining unit is used for determining the rectangular frame areas containing one line of characters in the image to be recognized through a non-maximum suppression algorithm according to the corresponding distances of the residual pixel points.
Optionally, the model training module includes:
the image sample acquisition sub-module is used for acquiring an initial text line detection model and a plurality of image samples;
the image sample marking submodule is used for marking four vertex coordinates of each rectangular frame area containing a line of characters in each image sample according to a preset rule;
the detection label generation submodule is used for calculating the distance between each pixel point and four sides of the rectangular frame area containing one line of characters, the included angle between the rectangular frame area containing one line of characters and the horizontal direction and the probability of each pixel point belonging to the rectangular frame area according to the four vertex coordinates of the rectangular frame area containing one line of characters, and obtaining the detection label of each image sample;
the prediction label generation submodule is used for inputting the image sample into the initial text line detection model to generate a prediction label;
the parameter adjusting submodule is used for adjusting the parameters of the initial text line detection model based on the prediction label and the detection label of the corresponding image sample;
and the model generation submodule is used for stopping training to obtain the text line detection model when the number of iterations of the initial text line detection model reaches a preset number, or the accuracy of the prediction labels output by the initial text line detection model reaches a preset value.
Optionally, the second identification module includes:
the first character recognition submodule is used for, if the target direction is the direction corresponding to the characters in the first rectangular frame area, identifying each rectangular frame area containing one line of characters in the image to be recognized to obtain the characters corresponding to each rectangular frame area containing one line of characters;
the second character recognition submodule is used for, if the target direction is the direction corresponding to the characters in the second rectangular frame area, rotating the image to be recognized by 180 degrees to obtain a target recognition image, and identifying each rectangular frame area containing a line of characters in the target recognition image to obtain the characters corresponding to each rectangular frame area containing a line of characters.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the steps of the method for identifying the characters in the image when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when being executed by a processor, the computer program implements any of the above-mentioned steps of the method for recognizing characters in an image.
In the scheme provided by the embodiment of the invention, the electronic equipment can acquire an image to be recognized, perform character region recognition on the image to be recognized, determine each rectangular frame region containing one line of characters, and determine one of the rectangular frame regions containing one line of characters as a first rectangular frame region. The electronic equipment then inputs the first rectangular frame region and a second rectangular frame region, obtained by rotating the first rectangular frame region by 180 degrees, into a character recognition model, performs character recognition according to the image characteristics of the first rectangular frame region and the second rectangular frame region to obtain character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and correct probabilities thereof, compares the correct probabilities, and determines the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction. Finally, the electronic equipment performs character recognition on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result. Therefore, the electronic equipment recognizes the characters in the image according to the target direction determined by the scheme, and can correctly recognize the characters in the image even when the direction of the characters in the image does not accord with the direction of the visual angle from which people view the image.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for recognizing characters in an image according to an embodiment of the present invention;
FIG. 2(a) is a schematic diagram of an image to be recognized according to an embodiment of the present invention;
FIG. 2(b) is a schematic diagram of a rectangular frame area containing a line of text according to an embodiment of the present invention;
fig. 2(c) is a schematic diagram of a left direction of characters in an image to be recognized according to an embodiment of the present invention;
fig. 2(d) is a schematic diagram of a text direction in an image to be recognized being a right direction according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the step S102 in the embodiment shown in FIG. 1;
FIG. 4 is a diagram illustrating an output result of a text line detection model according to an embodiment of the present invention;
FIG. 5 is a flowchart of a specific manner of step S302 in the embodiment shown in FIG. 3;
FIG. 6 is a flowchart of a training method of a text line detection model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an apparatus for recognizing characters in an image according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For convenience of description and clear solution, the direction in which the direction of the characters in the image corresponds to the angle of view of the image viewed by a person is referred to as the forward direction, the direction obtained by clockwise rotating the forward direction by 90 degrees is referred to as the left direction, the direction obtained by counterclockwise rotating the forward direction by 90 degrees is referred to as the right direction, and the direction obtained by rotating the forward direction by 180 degrees is referred to as the reverse direction. In a conventional image recognition mode, the electronic device recognizes characters in an image to be recognized in a forward direction, and when the direction of the characters in the image is the left direction, the right direction or the reverse direction, the electronic device cannot correctly recognize the characters in the image.
In order to correctly identify characters in an image when the direction of the characters in the image does not conform to the direction of the visual angle of a person watching the image, the embodiment of the invention provides a method and a device for identifying the characters in the image, electronic equipment and a computer readable storage medium.
First, a method for recognizing characters in an image according to an embodiment of the present invention is described below.
The method for recognizing characters in an image provided by the embodiment of the present invention can be applied to any electronic device that needs to recognize characters in an image, for example, a computer, a mobile phone, a processor, and the like, which is not specifically limited here. For convenience of description, these are hereinafter referred to simply as the electronic device.
As shown in fig. 1, a method for recognizing characters in an image, the method comprising:
s101, acquiring an image to be identified;
wherein, the image to be identified comprises characters.
S102, carrying out character area identification on an image to be identified, and determining rectangular frame areas containing a line of characters;
s103, rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area;
the first rectangular frame area is one of the rectangular frame areas which respectively contain a line of characters.
S104, inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and correct probabilities of the character recognition results;
The character recognition model comprises the corresponding relation between the image characteristics and the character recognition result and the correct probability thereof.
S105, comparing the correct probabilities, and determining the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction;
and S106, performing character recognition on the rectangular frame area except the first rectangular frame area according to the target direction to obtain a recognition result.
It can be seen that, in the solution provided in the embodiment of the present invention, the electronic device may obtain an image to be recognized, perform text region recognition on the image to be recognized, determine each rectangular frame region containing one line of text, and determine one of the rectangular frame regions containing one line of text as a first rectangular frame region. The electronic device may then input the first rectangular frame region and a second rectangular frame region, obtained by rotating the first rectangular frame region by 180 degrees, into a text recognition model, perform text recognition according to the image features of the first rectangular frame region and the second rectangular frame region to obtain text recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities, compare the correct probabilities, and determine the direction of the text in the rectangular frame region corresponding to the highest correct probability as a target direction. Finally, the electronic device may perform text recognition on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result. Therefore, the electronic device recognizes the text in the image according to the target direction determined by this solution, and can correctly recognize the text in the image even when the direction of the text in the image does not match the viewing angle from which a person views the image.
In step S101, the electronic device may acquire an image to be recognized, where the image to be recognized includes characters. If the electronic device has an image acquisition function, the image to be recognized containing characters may be an image acquired by the electronic device itself; it may also be an image stored locally on the electronic device, or an image transmitted by another electronic device.
After obtaining the image to be recognized, the electronic device may execute step S102, that is, perform text region recognition on the image to be recognized, and determine each rectangular frame region containing one line of text, where the number of the rectangular frame regions containing one line of text is the number of lines of text in the image to be recognized.
For example, fig. 2(a) is a schematic diagram of an image to be recognized, and fig. 2(b) is a schematic diagram of the rectangular frame areas, each containing one line of text, determined by the electronic device after performing text area recognition on the image to be recognized shown in fig. 2(a). Fig. 2(b) includes 8 rectangular frame areas 01, each containing one line of text, which shows that the image to be recognized shown in fig. 2(a) contains 8 lines of text. In one embodiment, the electronic device may use a text line detection model to obtain the rectangular frame areas, each containing a line of text, according to the image features of the image to be recognized.
In general, in the image to be recognized, the directions of all the characters are the same direction. In each rectangular frame region including one line of characters obtained in step S102, the directions of the characters are all the same. Therefore, the electronic device can determine the direction of the characters in the image to be recognized only by selecting any one rectangular frame region from the rectangular frame regions containing one line of characters. For convenience of description and clarity of the scheme, a rectangular frame region selected from rectangular frame regions each including a line of text will be referred to as a first rectangular frame region.
In step S103, the electronic device may rotate the first rectangular frame region by 180 degrees to obtain a second rectangular frame region. The direction of the characters in the first rectangular frame area may be a forward direction or a reverse direction, the electronic device may rotate the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, if the direction of the characters in the first rectangular frame area is the reverse direction, the direction of the characters in the second rectangular frame area is the forward direction, and if the direction of the characters in the first rectangular frame area is the forward direction, the direction of the characters in the second rectangular frame area is the reverse direction.
Generally, the number of characters in each line of the image to be recognized is not the same: the larger the width-to-height ratio of a rectangular frame area is, the more characters it contains, and the smaller the width-to-height ratio, the fewer characters it contains. The direction of the characters in a rectangular frame area containing more characters better represents the direction of the characters in the image to be recognized, so the electronic device can select the rectangular frame area containing the largest number of characters as the first rectangular frame area for recognition. In one embodiment, the electronic device may select the rectangular frame area with the largest width-to-height ratio as the first rectangular frame area, and rotate the first rectangular frame area by 180 degrees to obtain the second rectangular frame area.
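As a hedged sketch of this selection step, assuming the detected lines are already available as cropped NumPy arrays and that OpenCV is used for the rotation (the function and variable names are illustrative), the first and second rectangular frame areas could be obtained as follows:

```python
import cv2

def pick_and_flip(line_crops):
    """line_crops: list of cropped line images (NumPy arrays), one per detected text line.

    Returns the crop with the largest width-to-height ratio, taken here as the
    first rectangular frame area, and its 180-degree rotation, the second one.
    """
    first = max(line_crops, key=lambda img: img.shape[1] / img.shape[0])
    second = cv2.rotate(first, cv2.ROTATE_180)
    return first, second
```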
After obtaining the first rectangular frame region and the second rectangular frame region, the electronic device may execute step S104, that is, input the first rectangular frame region and the second rectangular frame region into the character recognition model, and perform character recognition according to the image features of the first rectangular frame region and the second rectangular frame region, to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities.
The character recognition model comprises the corresponding relation between the image characteristics and the character recognition result and the correct probability thereof.
The character recognition model can be obtained by training the initial character recognition model through the electronic equipment. In the training process, the initial character recognition model can learn the corresponding relation between the image characteristics and the character recognition result and the correct probability thereof.
In one embodiment, the electronic device may input the first rectangular frame region and the second rectangular frame region into the character recognition model, and obtain the character recognition result and the correct probability thereof corresponding to the first rectangular frame region and the second rectangular frame region. For example, the character recognition model may be a CNN (Convolutional Neural Network); the electronic device may input the first rectangular frame region and the second rectangular frame region into the CNN, and the CNN may extract the image features in the first rectangular frame region and the second rectangular frame region, convert the image features into a feature sequence, and then process the feature sequence through a normalized exponential function to obtain a posterior probability matrix.
Each column of the posterior probability matrix corresponds to one element of the feature sequence, and each row represents an output character category. The maximum probability in each column is taken to obtain the most probable character for that column, and these characters are combined into the most probable character sequence. It is then judged whether adjacent characters in this sequence are repeated; if repeated characters exist, they are removed to obtain the character recognition result. Finally, the posterior probability of the character recognition result is calculated according to the Bayesian formula and used as its correct probability.
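A minimal decoding sketch, assuming the posterior matrix is available as a NumPy array with one row per feature-sequence step and one column per character category (the orientation described above only needs a transpose), and assuming index 0 is a blank class; the confidence returned here is the product of the per-step maxima, a common stand-in for the posterior probability of the result rather than the patent's exact Bayesian computation.

```python
import numpy as np

def greedy_decode(posterior, charset):
    """posterior: (T, C) array of per-step class probabilities; charset: list of
    C symbols, with index 0 assumed to be a blank / no-character class."""
    best_ids = posterior.argmax(axis=1)        # most likely class at each step
    best_probs = posterior.max(axis=1)
    chars, confidence, prev = [], 1.0, -1
    for idx, p in zip(best_ids, best_probs):
        confidence *= float(p)                 # joint probability of the greedy path
        if idx != prev and idx != 0:           # drop adjacent repeats and blanks
            chars.append(charset[idx])
        prev = idx
    return "".join(chars), confidence
```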
After obtaining the correct probabilities, the electronic device may compare the two correct probabilities, and determine the direction of the text in the rectangular frame region corresponding to the highest correct probability as the target direction. The higher the correct probability is, the more accurate the corresponding character recognition result is, and the higher the probability that the direction of the character in the corresponding rectangular frame region is the forward direction is. Therefore, the electronic device can determine the direction of the text in the rectangular frame region corresponding to the highest correct probability as the target direction, which means that the direction of the text in the rectangular frame region is the forward direction.
Furthermore, the electronic device can perform character recognition on the rectangular frame area except the first rectangular frame area according to the target direction to obtain a recognition result. In general, in the image to be recognized, all the directions of the characters are the same, so that the electronic device can obtain a correct recognition result by performing character recognition on the rectangular frame region other than the first rectangular frame region according to the target direction.
In one embodiment, the electronic device may perform character recognition on a rectangular frame region other than the first rectangular frame region by using an OCR method according to the target direction to obtain a recognition result.
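A hedged sketch of step S106, assuming axis-aligned (x, y, w, h) line boxes and any line-level recognizer passed in as recognize_line; both the box format and the recognizer interface are assumptions, not the patent's API.

```python
import cv2

def recognize_remaining(image, line_boxes, first_index, target_from_rotated, recognize_line):
    """Recognize every line box except the first one in the agreed target direction."""
    if target_from_rotated:
        # The readable orientation was the flipped copy, so flip the whole page
        # and flip the box coordinates with it.
        image = cv2.rotate(image, cv2.ROTATE_180)
        h_img, w_img = image.shape[:2]
        line_boxes = [(w_img - x - w, h_img - y - h, w, h) for (x, y, w, h) in line_boxes]
    results = []
    for i, (x, y, w, h) in enumerate(line_boxes):
        if i == first_index:
            continue
        results.append(recognize_line(image[y:y + h, x:x + w]))
    return results
```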
As an implementation manner of the embodiment of the present invention, before the step of rotating the first rectangular frame region by 180 degrees to obtain the second rectangular frame region, the method may further include the following steps:
judging whether the height of each rectangular frame area containing a line of characters is larger than the width; if the height of the rectangular frame area containing a line of characters is larger than the width, the rectangular frame area containing a line of characters is rotated by 90 degrees towards the preset direction.
The direction of the characters in the image to be recognized acquired by the electronic equipment can be left or right. For example, as shown in a rectangular frame area 02 in fig. 2(c), since the direction of the characters therein is left, the height of the rectangular frame area is larger than the width. As shown in the rectangular frame area 03 in fig. 2(d), since the direction of the characters therein is rightward, the height of the rectangular frame area is larger than the width thereof.
Therefore, in order to determine whether the direction of the text in the image to be recognized is in the left direction or the right direction, the electronic device may determine whether the height of each rectangular frame region containing a line of text is greater than the width, and if the height of a rectangular frame region containing a line of text is greater than the width, it indicates that the direction of the text in the rectangular frame region may be in the left direction or the right direction.
Furthermore, when the direction of the characters in the image to be recognized acquired by the electronic device may be leftward or rightward, in order to change the direction of the characters in the image to be recognized to the forward or reverse direction for subsequent character recognition processing, the electronic device may rotate the rectangular frame area containing one line of characters by 90 degrees in the preset direction.
If the preset direction is clockwise and the direction of the characters in the image to be recognized is leftward, the electronic device may rotate the rectangular frame region by 90 degrees clockwise, and the direction of the characters in the rectangular frame region changes from leftward to reverse. If the direction of the characters in the image to be recognized is rightward, the electronic device may rotate the rectangular frame region by 90 degrees clockwise, and the direction of the characters in the rectangular frame region changes from rightward to forward.
If the preset direction is counterclockwise and the direction of the characters in the image to be recognized is leftward, the electronic device may rotate the rectangular frame region by 90 degrees counterclockwise, and the direction of the characters in the rectangular frame region changes from leftward to forward. If the direction of the characters in the image to be recognized is rightward, the electronic device may rotate the rectangular frame region by 90 degrees counterclockwise, and the direction of the characters in the rectangular frame region changes from rightward to reverse.
Therefore, in this embodiment, the electronic device may determine whether the height of each rectangular frame region containing a line of characters is greater than its width; if the height of a rectangular frame region containing a line of characters is greater than its width, the rectangular frame region is rotated by 90 degrees in the preset direction. In this way, when the direction of the characters in the image to be recognized is leftward or rightward, it can be changed to the forward or reverse direction, which facilitates subsequent character recognition processing.
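A brief sketch of this pre-rotation, assuming line crops stored as NumPy arrays, OpenCV for the rotation, and a clockwise preset direction (the preset direction is a design choice; the counterclockwise case only swaps the rotation code):

```python
import cv2

def normalize_orientation(crop, clockwise=True):
    """Rotate a tall line crop by 90 degrees so the line becomes horizontal."""
    h, w = crop.shape[:2]
    if h > w:  # the text in this line is likely leftward or rightward
        code = cv2.ROTATE_90_CLOCKWISE if clockwise else cv2.ROTATE_90_COUNTERCLOCKWISE
        crop = cv2.rotate(crop, code)
    return crop
```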
As an implementation manner of the embodiment of the present invention, as shown in fig. 3, the step of performing character region recognition on the image to be recognized and determining each rectangular frame region containing one line of characters may include the following steps:
s301, inputting an image to be recognized into a text line detection model, and processing according to the image characteristics of the image to be recognized to obtain the distance between each pixel point in the image to be recognized and the four sides of a rectangular frame region containing a line of characters to which the pixel point belongs and the included angle between the rectangular frame region containing the line of characters and the horizontal direction;
the text line detection model may include a correspondence relationship between image characteristics and distances between each pixel point in the image and four sides of a rectangular frame region including a line of text to which the pixel point belongs, and an included angle between the rectangular frame region and a horizontal direction.
The text line detection model may be trained on an initial text line detection model by the electronic device. In the training process, the initial text line detection model can learn the corresponding relation between the image characteristics and the distance between each pixel point in the image and the four sides of the rectangular frame region containing a line of characters and the included angle between the rectangular frame region and the horizontal direction.
Fig. 4 is a schematic diagram of an output result of the text line detection model according to the embodiment of the present invention, that is, a schematic diagram of distances between each pixel point in the image and four sides of a rectangular frame region including a line of characters to which the pixel point belongs, and an included angle between the rectangular frame region and a horizontal direction. Distances between one pixel point 401 in the image to be recognized and four sides of the rectangular frame area 400 containing one line of characters to which the pixel point belongs are a dotted line section 402, a dotted line section 403, a dotted line section 404 and a dotted line section 405 respectively, and an included angle between the rectangular frame area 400 and the horizontal direction 407 is an angle 406.
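The per-pixel output just described (four side distances plus an included angle) can be decoded back into box corners. The sketch below assumes a particular ordering of the four distances and a counterclockwise-positive angle in radians; these are assumptions about the parameterization rather than the patent's exact definition.

```python
import numpy as np

def pixel_to_rotated_box(px, py, d_top, d_right, d_bottom, d_left, angle_rad):
    """Rebuild the four corners of a rotated text-line box from one pixel's
    predicted distances to the box sides and the box angle with the horizontal."""
    # Corner offsets in the box's own axis-aligned frame, relative to the pixel.
    corners = np.array([[-d_left, -d_top],
                        [ d_right, -d_top],
                        [ d_right,  d_bottom],
                        [-d_left,  d_bottom]], dtype=np.float32)
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]], dtype=np.float32)
    # Rotate the offsets by the box angle and translate to the pixel position.
    return corners @ rot.T + np.array([px, py], dtype=np.float32)
```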
S302, determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance;
after obtaining the distances between each pixel point in the image to be recognized and the four sides of the rectangular frame area containing the text to which the pixel point belongs, the electronic device may determine, according to the distances, each rectangular frame area containing one line of text in the image to be recognized through a non-maximum suppression algorithm.
S303, adjusting a rectangular frame area containing a line of characters whose included angle with the horizontal direction is larger than 45 degrees to the vertical direction, and adjusting a rectangular frame area containing a line of characters whose included angle is larger than 0 degrees and not larger than 45 degrees to the horizontal direction.
The electronic device can adjust the rectangular frame area containing a line of characters with the included angle larger than 45 degrees to the vertical direction and adjust the rectangular frame area containing a line of characters with the included angle larger than 0 degree and not larger than 45 degrees to the horizontal direction according to the included angle between the rectangular frame area containing a line of characters and the horizontal direction. For example, the angle 406 between the rectangular frame area 400 containing a line of text and the horizontal direction 407 shown in fig. 4 is 10 degrees and less than 45 degrees, so the electronic device can rotate the rectangular frame area 400 clockwise by 10 degrees and adjust the rectangular frame area to the horizontal direction.
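A hedged sketch of this angle adjustment, assuming the included angle is given in degrees in (0, 90) and that OpenCV's rotation sign matches how the angle was measured (if the measurement convention differs, the residual simply needs its sign flipped):

```python
import cv2

def align_crop(crop, angle_deg):
    """Rotate a line crop so its long side ends up horizontal or vertical.

    angle_deg: included angle between the detected box and the horizontal.
    Boxes over 45 degrees are treated as vertical text lines, the rest as horizontal.
    """
    h, w = crop.shape[:2]
    # Residual rotation toward the nearest axis: the angle itself for
    # near-horizontal boxes, or angle - 90 for near-vertical ones.
    residual = angle_deg if angle_deg <= 45 else angle_deg - 90
    m = cv2.getRotationMatrix2D((w / 2, h / 2), residual, 1.0)
    return cv2.warpAffine(crop, m, (w, h))
```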
As can be seen, in this embodiment, the electronic device may input the image to be recognized into the text line detection model and process it according to the image features of the image to be recognized, to obtain the distance between each pixel point in the image to be recognized and the four sides of the rectangular frame region containing a line of text to which the pixel point belongs, and the included angle between that rectangular frame region and the horizontal direction. According to the distances, the electronic device may then determine each rectangular frame region containing a line of text, adjust a rectangular frame region whose included angle is greater than 45 degrees to the vertical direction, and adjust a rectangular frame region whose included angle is greater than 0 degrees and not greater than 45 degrees to the horizontal direction. Therefore, when a rectangular frame region containing one line of text in the image to be recognized is not in the horizontal or vertical direction, the electronic device can adjust it to the horizontal or vertical direction, which facilitates subsequent recognition of the rectangular frame region.
As an implementation manner of the embodiment of the present invention, the electronic device inputs the image to be recognized into the text line detection model, and a result of processing according to the image feature of the image to be recognized may further include a probability that each pixel point in the image to be recognized belongs to the rectangular frame region.
The text line detection model may be trained on an initial text line detection model by the electronic device. In the training process, the initial text line detection model can learn the corresponding relation between the image characteristics and the probability that each pixel point in the image belongs to the rectangular frame region. Therefore, the electronic device inputs the image to be recognized into the text line detection model, and the result of processing according to the image features of the image to be recognized may include the probability that each pixel point in the image to be recognized belongs to the rectangular frame region.
In this case, as shown in fig. 5, the step of determining, according to the distance and by using a non-maximum suppression algorithm, each rectangular frame region containing a line of characters in the image to be recognized may include:
s501, the electronic equipment can remove pixel points which do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold;
The electronic equipment can input the image to be recognized into the text line detection model, and the text line detection model can determine the probability that each pixel point in the image to be recognized belongs to the rectangular frame region according to the image characteristics of the image to be recognized and output the probability. The electronic equipment only needs to identify each rectangular frame area containing one line of characters in the image, so that the electronic equipment can remove the pixel points which do not belong to the rectangular frame area.
When the probability that the pixel point belongs to the rectangular frame region is not smaller than the preset threshold, the probability that the pixel point belongs to the rectangular frame region is high, so that the electronic equipment can keep the pixel point; when the probability that the pixel point belongs to the rectangular frame region is smaller than the preset threshold, the probability that the pixel point does not belong to the rectangular frame region is high, and therefore the electronic device can remove the pixel point.
S502, the electronic equipment determines rectangular frame areas containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance corresponding to the residual pixel points.
The remaining pixel points are the pixel points left in step S501 after the electronic device removes the pixel points whose probability of belonging to the rectangular frame region is lower than the preset threshold. The electronic device can determine each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distances corresponding to the remaining pixel points, and then, according to the included angle between each such rectangular frame area and the horizontal direction, adjust a rectangular frame area whose included angle is larger than 45 degrees to the vertical direction and a rectangular frame area whose included angle is larger than 0 degrees and not larger than 45 degrees to the horizontal direction.
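The filtering and suppression steps can be sketched as follows, assuming the surviving pixels' predictions have already been decoded into axis-aligned (x1, y1, x2, y2) boxes and that a plain IoU-based non-maximum suppression is acceptable; the patent does not spell out its NMS variant, so this, like the thresholds, is an assumption.

```python
import numpy as np

def filter_and_nms(scores, boxes, score_thresh=0.8, iou_thresh=0.2):
    """scores: (N,) per-pixel text probabilities; boxes: (N, 4) candidate boxes,
    one decoded from each pixel's distance predictions."""
    keep = scores >= score_thresh                 # drop pixels outside text regions
    scores, boxes = scores[keep], boxes[keep]
    order = scores.argsort()[::-1]                # highest-scoring candidates first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # Intersection of the current box with the remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-6)
        order = order[1:][iou < iou_thresh]       # keep candidates that overlap little
    return boxes[kept]
```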
Therefore, in this embodiment, the electronic device may remove the pixel points that do not belong to the rectangular frame region according to the probability that each pixel point in the image to be recognized belongs to the rectangular frame region and the preset threshold, and then determine, according to the distances corresponding to the remaining pixel points, each rectangular frame region containing one line of characters in the image to be recognized through a non-maximum suppression algorithm. Therefore, the electronic equipment only needs to determine the rectangular frame areas containing one line of characters in the image to be recognized according to the distances corresponding to the pixel points belonging to the rectangular frame areas, and the calculated amount can be reduced by removing the pixel points not belonging to the rectangular frame areas, so that the efficiency is improved.
As an implementation manner of the embodiment of the present invention, as shown in fig. 6, the method for training the text line detection model may include:
s601, the electronic equipment obtains an initial text line detection model and a plurality of image samples;
wherein all the image samples contain characters. The initial text line detection model may be a deep learning model such as a convolutional neural network, its parameters may be randomly initialized, and the structure of the initial text line detection model is not specifically limited here.
S602, the electronic equipment marks four vertex coordinates of each rectangular frame area containing a line of characters in each image sample according to a preset rule;
After obtaining the plurality of image samples, in order to train the initial text line detection model, the electronic device may determine the rectangular frame areas containing a line of characters in each image sample, mark the four vertices of each such rectangular frame area, and determine the coordinates of the four vertices so as to calculate the detection label.
S603, calculating the distance between each pixel point and four sides of the rectangular frame region containing a line of characters to which the pixel point belongs, the included angle between each rectangular frame region containing a line of characters and the horizontal direction and the probability of each pixel point belonging to the rectangular frame region according to the four vertex coordinates of each rectangular frame region containing a line of characters, and obtaining the detection label of each image sample;
Because the trained text line detection model needs to process an image and determine the distance between each pixel point and the four sides of the rectangular frame region containing a line of characters to which it belongs, the included angle between each such rectangular frame region and the horizontal direction, and the probability that each pixel point belongs to a rectangular frame region, the electronic device can calculate these quantities from the four vertex coordinates of each rectangular frame region containing a line of characters, thereby obtaining the detection label of each image sample.
In one embodiment, in order to ensure that the remaining pixel points determined by the electronic device through the preset probability threshold indeed belong to a rectangular frame region, when calculating the probability that a pixel point belongs to the rectangular frame region, the rectangular frame region containing a line of characters may be reduced by one third to obtain a reduced rectangular frame region, and then whether each pixel point belongs to the reduced rectangular frame region is determined. If it does, the pixel point is given a higher score; if not, a lower score. The resulting score is used as the probability that the pixel point belongs to the rectangular frame region.
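A hedged sketch of how the probability part of the detection label might be built from the annotated corners: each quadrilateral is shrunk toward its centroid by one third and the pixels inside the shrunk region are marked as positive. The shrinking rule and the use of a binary score map are assumptions made for illustration.

```python
import cv2
import numpy as np

def score_map_label(image_shape, line_quads, shrink=1.0 / 3.0):
    """Build a per-pixel text-score label for one image sample.

    line_quads: list of (4, 2) arrays with the annotated corner coordinates of
    each text-line box; pixels inside the shrunk box get probability 1."""
    h, w = image_shape[:2]
    score = np.zeros((h, w), dtype=np.uint8)
    for quad in line_quads:
        quad = np.asarray(quad, dtype=np.float32)
        center = quad.mean(axis=0)
        shrunk = center + (quad - center) * (1.0 - shrink)   # reduce by one third
        cv2.fillPoly(score, [np.round(shrunk).astype(np.int32)], 1)
    return score
```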
In another embodiment, the distance between each pixel point in each image sample and four sides of the rectangular frame region containing a line of text to which the pixel point belongs, and the included angle in the horizontal direction of each rectangular frame region containing a line of text may also be calculated by other methods, which is not specifically limited herein.
S604, the electronic equipment inputs the image sample into an initial text line detection model to obtain a prediction label;
After the image samples are marked and their detection labels are obtained, the electronic device may input an image sample into the initial text line detection model. The initial text line detection model processes the image sample based on its current parameters and determines, according to the image characteristics of the image sample, the distance between each pixel point and the four sides of the rectangular frame region containing a line of characters to which the pixel point belongs, the included angle between each rectangular frame region containing a line of characters and the horizontal direction, and the probability that each pixel point belongs to the rectangular frame region, that is, the prediction label.
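The text does not fix a network architecture; as an assumption-heavy sketch, a small fully convolutional PyTorch model whose per-pixel outputs mirror the prediction label described above (four side distances, one included angle, one probability) could look like this:

```python
import torch
import torch.nn as nn

class TinyTextLineDetector(nn.Module):
    """Toy fully convolutional detector with the three per-pixel outputs above."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.dist_head = nn.Conv2d(64, 4, kernel_size=1)   # distances to the four sides
        self.angle_head = nn.Conv2d(64, 1, kernel_size=1)  # included angle with the horizontal
        self.score_head = nn.Conv2d(64, 1, kernel_size=1)  # probability of belonging to a box

    def forward(self, x):
        feat = self.backbone(x)
        return {
            "dist": torch.relu(self.dist_head(feat)),       # distances are non-negative
            "angle": self.angle_head(feat),
            "score": torch.sigmoid(self.score_head(feat)),  # probability in [0, 1]
        }

# Example: the prediction label for a dummy 256x256 RGB image.
model = TinyTextLineDetector()
prediction = model(torch.randn(1, 3, 256, 256))
print({name: tuple(t.shape) for name, t in prediction.items()})
```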
S605, adjusting parameters of an initial text line detection model based on the prediction label and the detection label of the corresponding image sample;
Because the initial text line detection model cannot yet process an image sample accurately enough to produce an accurate output result, after the prediction label and the detection label are obtained, the electronic device can compare the prediction label with the corresponding detection label and then adjust the parameters of the initial text line detection model according to the difference between them, so that the parameters become more appropriate. The method for adjusting the parameters of the initial text line detection model may be a model parameter adjustment method such as the stochastic gradient descent algorithm, which is not specifically limited or described here.
In the training process, the initial text line detection model can continuously learn the corresponding relation between the image characteristics of the image sample and the distance between each pixel point in the image sample and the four sides of the rectangular frame region containing one line of characters, the included angle between each rectangular frame region containing one line of characters and the horizontal direction and the probability that each pixel point belongs to the rectangular frame region.
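A hedged sketch of one such comparison-and-adjustment step, reusing the TinyTextLineDetector sketch above and assuming ground-truth tensors with the same shapes as its outputs; the particular loss terms are illustrative choices, since the text only calls for some adjustment method such as stochastic gradient descent:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image, gt_dist, gt_angle, gt_score):
    """One comparison-and-adjustment step: prediction label vs. detection label."""
    pred = model(image)
    mask = gt_score  # only penalize geometry predictions inside text regions
    loss = (
        F.l1_loss(pred["dist"] * mask, gt_dist * mask)
        + F.l1_loss(pred["angle"] * mask, gt_angle * mask)
        + F.binary_cross_entropy(pred["score"], gt_score)
    )
    optimizer.zero_grad()
    loss.backward()   # difference between prediction label and detection label
    optimizer.step()  # stochastic-gradient-descent style parameter update
    return loss.item()

# Example wiring, reusing the TinyTextLineDetector sketch above:
# model = TinyTextLineDetector()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```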
S606, the electronic equipment judges whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, stops training to obtain the text line detection model.
If the number of iterations of the initial text line detection model reaches the preset number, or the accuracy of the prediction labels output by the initial text line detection model reaches the preset value, it indicates that the current initial text line detection model can process images to obtain accurate output results, so training can be stopped at this point to obtain the text line detection model.
The preset number of times may be set according to factors such as identification requirements and model structures, for example, the preset number of times may be 5000 times, 10000 times, 12000 times, and the like, and is not particularly limited herein. The preset value may be defined according to the recognition requirement, the model structure, and other factors, and may be, for example, 99%, 99.1%, 99.2%, and the like, and is not specifically limited herein.
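Putting the two stopping criteria together, a training loop might be organized as follows; train_step is the sketch above and evaluate_accuracy is a hypothetical helper that measures the accuracy of the prediction labels on held-out samples:

```python
def train(model, optimizer, data_iter, max_iters=10000, target_accuracy=0.99,
          evaluate_accuracy=None):
    """Train until a preset iteration count or a preset accuracy is reached."""
    for step in range(1, max_iters + 1):
        image, gt_dist, gt_angle, gt_score = next(data_iter)
        train_step(model, optimizer, image, gt_dist, gt_angle, gt_score)
        # Periodically check the accuracy criterion (hypothetical helper).
        if evaluate_accuracy is not None and step % 500 == 0:
            if evaluate_accuracy(model) >= target_accuracy:
                break
    return model  # the trained text line detection model
```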
It can be seen that, in this embodiment, the electronic device may obtain an initial text line detection model and a plurality of image samples, and mark the four vertex coordinates of each rectangular frame region containing a line of characters in each image sample according to a preset rule. From the four vertex coordinates, it may calculate the distance between each pixel point and the four sides of the rectangular frame region containing a line of characters to which the pixel point belongs, the included angle between each rectangular frame region containing a line of characters and the horizontal direction, and the probability that each pixel point belongs to the rectangular frame region, thereby obtaining the detection label of each image sample. The electronic device may then input the image samples into the initial text line detection model to obtain prediction labels, adjust the parameters of the initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples, and finally judge whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value. In this way, the electronic device can compare the detection labels with the prediction labels, adjust the parameters of the initial text line detection model according to the comparison results, and obtain a text line detection model that meets the requirements, thereby further improving recognition accuracy.
As an implementation manner of the embodiment of the present invention, the step of performing character recognition on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result may include the following steps:
if the target direction is the direction corresponding to the characters in the first rectangular frame area, identifying each rectangular frame area containing one line of characters in the image to be identified to obtain the characters corresponding to each rectangular frame area containing one line of characters; if the target direction is the direction corresponding to the characters in the second rectangular frame area, rotating the image to be recognized by 180 degrees to obtain a target recognition image; and identifying each rectangular frame area containing a line of characters in the target identification image to obtain characters corresponding to each rectangular frame area containing a line of characters.
In general, the characters in the image to be recognized all have the same direction. Therefore, if the target direction is the direction corresponding to the characters in the first rectangular frame region, which indicates that the directions of all the characters in the image to be recognized are consistent with the target direction, the electronic device can recognize each rectangular frame region containing a line of characters in the image to be recognized according to the target direction, and obtain the characters corresponding to each rectangular frame region containing a line of characters.
If the target direction is the direction corresponding to the characters in the second rectangular frame region, which indicates that the directions of all the characters in the image to be recognized become consistent with the target direction only after the image is rotated by 180 degrees, the electronic device can rotate the image to be recognized by 180 degrees to obtain a target recognition image, and then recognize each rectangular frame region containing a line of characters in the target recognition image according to the target direction to obtain the characters corresponding to each rectangular frame region containing a line of characters.
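A minimal sketch of this direction handling using OpenCV, where recognize_line stands in for the character recognition model and the axis-aligned (x, y, w, h) boxes stand in for the detected rectangular frame areas — both assumptions of the sketch, not details given in the text:

```python
import cv2

def recognize_by_target_direction(image, boxes, first_direction_is_target, recognize_line):
    """Recognize every text-line box, rotating the whole image 180 degrees first
    when the target direction is the one corresponding to the second region."""
    h_img, w_img = image.shape[:2]
    if not first_direction_is_target:
        image = cv2.rotate(image, cv2.ROTATE_180)
        # Remap each (x, y, w, h) box into the rotated image's coordinate system.
        boxes = [(w_img - x - w, h_img - y - h, w, h) for (x, y, w, h) in boxes]
    return [recognize_line(image[y:y + h, x:x + w]) for (x, y, w, h) in boxes]
```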
As can be seen, in this embodiment, if the target direction is the direction corresponding to the text in the first rectangular frame region, the electronic device may identify each rectangular frame region containing one line of text in the image to be identified, to obtain the text corresponding to each rectangular frame region containing one line of text, and if the target direction is the direction corresponding to the text in the second rectangular frame region, the electronic device may rotate the image to be identified by 180 degrees, to obtain the target identification image, and further identify each rectangular frame region containing one line of text in the target identification image, to obtain the text corresponding to each rectangular frame region containing one line of text. Therefore, the electronic equipment can identify the characters outside the first rectangular frame region in the image to be identified according to the correct direction, so that repeated identification is avoided, and the working efficiency is improved.
Corresponding to the above method for recognizing characters in an image, an embodiment of the present invention further provides an apparatus for recognizing characters in an image, which is described below.
As shown in fig. 7, a schematic structural diagram of an apparatus for recognizing characters in an image includes the following modules.
An image to be recognized acquisition module 701, configured to acquire an image to be recognized;
wherein, the image to be identified comprises characters.
A rectangular frame region determining module 702, configured to perform character region identification on an image to be identified, and determine rectangular frame regions each including a line of characters;
a first rotation module 703, configured to rotate the first rectangular frame region by 180 degrees to obtain a second rectangular frame region;
the first rectangular frame area is one of the rectangular frame areas which respectively contain a line of characters.
The first recognition module 704 is configured to input the first rectangular frame region and the second rectangular frame region into a character recognition model, and perform character recognition according to the image features of the first rectangular frame region and the second rectangular frame region to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities, where the character recognition model includes the correspondence between image features and character recognition results and their correct probabilities;
A target direction determining module 705, configured to compare the correct probabilities, and determine the direction of the text in the rectangular frame region corresponding to the highest correct probability as a target direction;
the second identifying module 706 is configured to perform character identification on the rectangular frame area other than the first rectangular frame area according to the target direction to obtain an identification result.
It can be seen that, in the solution provided by the embodiment of the present invention, the electronic device may obtain an image to be recognized, perform character region recognition on it, and determine each rectangular frame region containing a line of characters. One of these rectangular frame regions is taken as a first rectangular frame region, and the first rectangular frame region together with a second rectangular frame region obtained by rotating it by 180 degrees is input into a character recognition model, which performs character recognition according to the image features of the two regions to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities. The correct probabilities are then compared, the direction of the characters in the rectangular frame region corresponding to the highest correct probability is determined as a target direction, and character recognition is performed on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result. Therefore, the electronic device recognizes the characters in the image according to the target direction determined by this solution, and can correctly recognize the characters in the image even when the direction of the characters does not match the direction from which a person views the image.
As an implementation manner of the embodiment of the present invention, the apparatus for recognizing characters in an image may further include:
A rectangular frame area judging module (not shown in fig. 7) configured to judge, before the first rectangular frame area is rotated by 180 degrees to obtain the second rectangular frame area, whether the height of each rectangular frame area containing a line of characters is greater than its width;
a second rotation module (not shown in fig. 7) configured to rotate the rectangular frame area containing a line of characters by 90 degrees in a preset direction if the height of the rectangular frame area containing a line of characters is larger than the width of the rectangular frame area containing a line of characters.
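As a small illustration (not the patent's code) of the judgment performed by these two modules, a crop that is taller than it is wide can be rotated 90 degrees in a preset direction, chosen here arbitrarily as clockwise:

```python
import cv2

def normalize_tall_box(crop):
    """Rotate a text-line crop 90 degrees in a preset direction (clockwise here,
    an arbitrary choice) whenever its height is greater than its width."""
    h, w = crop.shape[:2]
    return cv2.rotate(crop, cv2.ROTATE_90_CLOCKWISE) if h > w else crop
```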
As an implementation manner of the embodiment of the present invention, the rectangular frame region determining module 702 may include:
a text line detection submodule (not shown in fig. 7) configured to input the image to be recognized into a text line detection model, and perform processing according to image features of the image to be recognized to obtain distances between each pixel point in the image to be recognized and four sides of a rectangular frame region including a line of characters to which the pixel point belongs, and an included angle between the rectangular frame region including a line of characters and a horizontal direction;
the text line detection model comprises the corresponding relation between image characteristics and the distance between each pixel point in the image and four sides of a rectangular frame area containing a line of characters and the included angle between the rectangular frame area and the horizontal direction, and is trained in advance by a model training module based on an image sample and a detection label thereof.
A rectangular frame region determining submodule (not shown in fig. 7) configured to determine, according to the distances, each rectangular frame region containing a line of characters in the image to be recognized through a non-maximum suppression algorithm;
and a rectangular frame region adjusting submodule (not shown in fig. 7) configured to adjust the rectangular frame region containing a line of characters and having the included angle greater than 45 degrees to a vertical direction, and adjust the rectangular frame region containing a line of characters and having the included angle greater than 0 degrees and not greater than 45 degrees to a horizontal direction.
As an implementation manner of the embodiment of the present invention, the rectangular frame area determination sub-module may include:
a pixel point removing unit (not shown in fig. 7) configured to remove pixel points that do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold;
the probability that each pixel point in the image to be recognized belongs to the rectangular frame region is another output result of the text line detection model, and the text line detection model is trained in advance by the model training module based on the image sample and the detection label thereof.
And a rectangular frame region determining unit (not shown in fig. 7) configured to determine, according to the distances corresponding to the remaining pixels, each rectangular frame region containing one line of text in the image to be recognized through a non-maximum suppression algorithm.
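A sketch of the combined behaviour of these two units under stated assumptions: pixels whose probability falls below a preset threshold are discarded, each remaining pixel's predicted distances are decoded into an axis-aligned candidate box, and standard non-maximum suppression keeps the highest-scoring non-overlapping candidates. The left/top/right/bottom decoding convention and the thresholds are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def decode_boxes(score, dist, threshold=0.8):
    """Keep pixels with score >= threshold; each yields an axis-aligned box
    from its predicted distances to the left/top/right/bottom sides."""
    ys, xs = np.where(score >= threshold)
    left, top, right, bottom = dist[:, ys, xs]
    boxes = np.stack([xs - left, ys - top, xs + right, ys + bottom], axis=1)
    return boxes, score[ys, xs]

def nms(boxes, scores, iou_threshold=0.3):
    """Standard non-maximum suppression over (x1, y1, x2, y2) boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-8)
        order = order[1:][iou <= iou_threshold]
    return boxes[keep]
```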
As an implementation manner of the embodiment of the present invention, the model training module may include:
an image sample acquisition sub-module (not shown in fig. 7) for acquiring an initial text line detection model and a plurality of image samples;
an image sample marking submodule (not shown in fig. 7) for marking four vertex coordinates of each rectangular frame region containing a line of characters in each image sample according to a preset rule;
a detection label generation submodule (not shown in fig. 7) configured to calculate, according to the four vertex coordinates of each rectangular frame region containing a line of characters, a distance between each pixel point and four sides of the rectangular frame region containing a line of characters to which the pixel point belongs, an included angle between each rectangular frame region containing a line of characters and the horizontal direction, and a probability that each pixel point belongs to the rectangular frame region, so as to obtain a detection label of each image sample;
a predictive label generating sub-module (not shown in fig. 7) for inputting the image sample into the initial text line detection model to generate a predictive label;
a parameter adjusting sub-module (not shown in fig. 7) for adjusting parameters of the initial text line detection model based on the prediction labels and the detection labels of the corresponding image samples;
And a model generation sub-module (not shown in fig. 7) configured to judge whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, to stop training to obtain the text line detection model.
As an implementation manner of the embodiment of the present invention, the second identifying module may include:
a first character recognition sub-module (not shown in fig. 7) configured to, if the target direction is the direction corresponding to the first rectangular frame region, recognize each rectangular frame region containing a line of characters in the image to be recognized, and obtain characters corresponding to each rectangular frame region containing a line of characters;
a second character recognition sub-module (not shown in fig. 7) configured to, if the target direction is a direction corresponding to the second rectangular frame region, rotate the image to be recognized by 180 degrees to obtain a target recognition image; and identifying each rectangular frame area containing a line of characters in the target identification image to obtain characters corresponding to each rectangular frame area containing a line of characters.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, which includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804.
The memory 803 is configured to store a computer program;
the processor 801 is configured to implement the method for recognizing characters in an image according to any of the embodiments described above when executing the program stored in the memory 803.
It can be seen that, in the solution provided by the embodiment of the present invention, the electronic device may obtain an image to be recognized, perform character region recognition on it, and determine each rectangular frame region containing a line of characters. One of these rectangular frame regions is taken as a first rectangular frame region, and the first rectangular frame region together with a second rectangular frame region obtained by rotating it by 180 degrees is input into a character recognition model, which performs character recognition according to the image features of the two regions to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities. The correct probabilities are then compared, the direction of the characters in the rectangular frame region corresponding to the highest correct probability is determined as a target direction, and character recognition is performed on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result. Therefore, the electronic device recognizes the characters in the image according to the target direction determined by this solution, and can correctly recognize the characters in the image even when the direction of the characters does not match the direction from which a person views the image.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for recognizing characters in an image according to any one of the above embodiments.
It can be seen that, in the solution provided by the embodiment of the present invention, the electronic device may obtain an image to be recognized, perform character region recognition on it, and determine each rectangular frame region containing a line of characters. One of these rectangular frame regions is taken as a first rectangular frame region, and the first rectangular frame region together with a second rectangular frame region obtained by rotating it by 180 degrees is input into a character recognition model, which performs character recognition according to the image features of the two regions to obtain the character recognition results corresponding to the first rectangular frame region and the second rectangular frame region and their correct probabilities. The correct probabilities are then compared, the direction of the characters in the rectangular frame region corresponding to the highest correct probability is determined as a target direction, and character recognition is performed on the rectangular frame regions other than the first rectangular frame region according to the target direction to obtain a recognition result. Therefore, the electronic device recognizes the characters in the image according to the target direction determined by this solution, and can correctly recognize the characters in the image even when the direction of the characters does not match the direction from which a person views the image.
It should be noted that, for the above-mentioned apparatus, electronic device and computer-readable storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A method for recognizing characters in an image, the method comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises characters;
carrying out character area recognition on the image to be recognized, and determining rectangular frame areas containing a line of characters;
rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one line of characters;
inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and correct probabilities thereof, wherein the character recognition model comprises the corresponding relation between the image characteristics and the character recognition results and the correct probabilities thereof;
Comparing the correct probabilities, and determining the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction;
and performing character recognition on the rectangular frame area except the first rectangular frame area according to the target direction to obtain a recognition result.
2. The method of claim 1, wherein before the step of rotating the first rectangular frame area by 180 degrees to obtain the second rectangular frame area, the method further comprises:
judging whether the height of each rectangular frame area containing a line of characters is larger than the width;
and if the height of the rectangular frame area containing a line of characters is larger than the width, rotating the rectangular frame area containing a line of characters by 90 degrees towards a preset direction.
3. The method according to claim 1, wherein the step of performing text region recognition on the image to be recognized and determining rectangular frame regions each containing a line of text comprises:
inputting the image to be recognized into a text line detection model, and processing according to the image characteristics of the image to be recognized to obtain the distance between each pixel point in the image to be recognized and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area containing one line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area and the horizontal direction;
Determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance;
and adjusting the rectangular frame area containing a line of characters whose included angle is greater than 45 degrees to the vertical direction, and adjusting the rectangular frame area containing a line of characters whose included angle is greater than 0 degrees and not greater than 45 degrees to the horizontal direction.
4. The method according to claim 3, wherein the output result of the text line detection model further includes a probability that each pixel point in the image to be recognized belongs to a rectangular frame region;
the step of determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance comprises the following steps:
removing pixel points which do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold;
and determining each rectangular frame area containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance corresponding to the residual pixel points.
5. The method of claim 3, wherein the manner of training the text line detection model comprises:
Acquiring an initial text line detection model and a plurality of image samples;
marking four vertex coordinates of each rectangular frame area containing a line of characters in each image sample according to a preset rule;
calculating the distance between each pixel point and four sides of the rectangular frame region containing a line of characters, the included angle between each rectangular frame region containing a line of characters and the horizontal direction and the probability of each pixel point belonging to the rectangular frame region according to the four vertex coordinates of each rectangular frame region containing a line of characters, and obtaining the detection label of each image sample;
inputting the image sample into the initial text line detection model to obtain a prediction label;
adjusting parameters of the initial text line detection model based on the prediction labels and detection labels of the corresponding image samples;
and judging whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, stopping training to obtain the text line detection model.
6. The method according to claim 1, wherein the step of performing character recognition on the rectangular frame area other than the first rectangular frame area according to the target direction to obtain a recognition result comprises:
If the target direction is the direction corresponding to the characters in the first rectangular frame area, identifying each rectangular frame area which contains one line of characters except the first rectangular frame area in the image to be identified to obtain the characters corresponding to each rectangular frame area containing one line of characters;
if the target direction is the direction corresponding to the characters in the second rectangular frame area, rotating the image to be recognized by 180 degrees to obtain a target recognition image; and identifying each rectangular frame area which contains one line of characters except the first rectangular frame area in the target identification image to obtain characters corresponding to each rectangular frame area containing one line of characters.
7. An apparatus for recognizing characters in an image, the apparatus comprising:
the device comprises an image to be recognized acquisition module, a recognition module and a recognition module, wherein the image to be recognized acquisition module is used for acquiring an image to be recognized, and the image to be recognized comprises characters;
the rectangular frame area determining module is used for carrying out character area identification on the image to be identified and determining rectangular frame areas containing a line of characters;
the first rotating module is used for rotating the first rectangular frame area by 180 degrees to obtain a second rectangular frame area, wherein the first rectangular frame area is one of the rectangular frame areas containing one line of characters;
the first recognition module is used for inputting the first rectangular frame area and the second rectangular frame area into a character recognition model, and performing character recognition according to the image characteristics of the first rectangular frame area and the second rectangular frame area to obtain character recognition results corresponding to the first rectangular frame area and the second rectangular frame area and correct probabilities thereof, wherein the character recognition model comprises the corresponding relation between the image characteristics and the character recognition results and the correct probabilities thereof;
the target direction determining module is used for comparing the correct probabilities and determining the direction of the characters in the rectangular frame region corresponding to the highest correct probability as a target direction;
and the second identification module is used for carrying out character identification on the rectangular frame area except the first rectangular frame area according to the target direction to obtain an identification result.
8. The apparatus of claim 7, further comprising:
a rectangular frame area judgment module, configured to judge whether the height of each rectangular frame area containing a line of characters is greater than the width before the first rectangular frame area is rotated by 180 degrees to obtain a second rectangular frame area;
the second rotation module is configured to rotate the rectangular frame area including the line of characters by 90 degrees in a preset direction if the height of the rectangular frame area including the line of characters is larger than the width of the rectangular frame area.
9. The apparatus of claim 7, wherein the rectangular box area determination module comprises:
the text line detection submodule is used for inputting the image to be recognized into a text line detection model and processing the image according to the image characteristics of the image to be recognized to obtain the distance between each pixel point in the image to be recognized and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area containing one line of characters and the horizontal direction, wherein the text line detection model comprises the corresponding relation between the image characteristics and the distance between each pixel point in the image and the four sides of the rectangular frame area containing one line of characters to which the pixel point belongs and the included angle between the rectangular frame area and the horizontal direction, and the text line detection model is trained in advance by the model training module based on an image sample and a detection label thereof;
the rectangular frame area determining submodule is used for determining rectangular frame areas containing a line of characters in the image to be recognized through a non-maximum suppression algorithm according to the distance;
and the rectangular frame area adjusting submodule is used for adjusting the rectangular frame area containing a line of characters whose included angle is greater than 45 degrees to the vertical direction, and adjusting the rectangular frame area containing a line of characters whose included angle is greater than 0 degrees and not greater than 45 degrees to the horizontal direction.
10. The apparatus according to claim 9, wherein the output result of the text line detection model further includes a probability that each pixel point in the image to be recognized belongs to a rectangular frame region;
the rectangular frame region determination submodule includes:
the pixel point removing unit is used for removing pixel points which do not belong to the rectangular frame region according to the probability that each pixel point in the image to be identified belongs to the rectangular frame region and a preset threshold value;
and the rectangular frame area determining unit is used for determining the rectangular frame areas containing one line of characters in the image to be recognized through a non-maximum suppression algorithm according to the corresponding distances of the residual pixel points.
11. The apparatus of claim 10, wherein the model training module comprises:
the image sample acquisition sub-module is used for acquiring an initial text line detection model and a plurality of image samples;
the image sample marking submodule is used for marking four vertex coordinates of each rectangular frame area containing a line of characters in each image sample according to a preset rule;
the detection label generation submodule is used for calculating the distance between each pixel point and four sides of the rectangular frame area containing one line of characters, the included angle between the rectangular frame area containing one line of characters and the horizontal direction and the probability of each pixel point belonging to the rectangular frame area according to the four vertex coordinates of the rectangular frame area containing one line of characters, and obtaining the detection label of each image sample;
The prediction label generation submodule is used for inputting the image sample into the initial text line detection model to generate a prediction label;
the parameter adjusting submodule is used for adjusting the parameters of the initial text line detection model based on the prediction label and the detection label of the corresponding image sample;
and the model generation submodule is used for judging whether the number of iterations of the initial text line detection model reaches a preset number, or whether the accuracy of the prediction labels output by the initial text line detection model reaches a preset value, and if so, stopping training to obtain the text line detection model.
12. The apparatus of claim 7, wherein the second identification module comprises:
the first character recognition sub-module is used for recognizing each rectangular frame area containing one line of characters in the image to be recognized if the target direction is the direction corresponding to the first rectangular frame area to obtain the characters corresponding to each rectangular frame area containing one line of characters;
the second character recognition submodule is used for rotating the image to be recognized by 180 degrees to obtain a target recognition image if the target direction is the direction corresponding to the second rectangular frame area; and identifying each rectangular frame area containing a line of characters in the target identification image to obtain characters corresponding to each rectangular frame area containing a line of characters.
13. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201910427882.2A 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment Active CN111985469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910427882.2A CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910427882.2A CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Publications (2)

Publication Number Publication Date
CN111985469A true CN111985469A (en) 2020-11-24
CN111985469B CN111985469B (en) 2024-03-19

Family

ID=73436355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910427882.2A Active CN111985469B (en) 2019-05-22 2019-05-22 Method and device for recognizing characters in image and electronic equipment

Country Status (1)

Country Link
CN (1) CN111985469B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09237318A (en) * 1996-03-04 1997-09-09 Fuji Electric Co Ltd Inclination correcting method for character image data inputted by image scanner
JP2005346419A (en) * 2004-06-03 2005-12-15 Canon Inc Method for processing character and character recognition processor
US9552527B1 (en) * 2015-08-27 2017-01-24 Lead Technologies, Inc. Apparatus, method, and computer-readable storage medium for determining a rotation angle of text
CN108229303A (en) * 2017-11-14 2018-06-29 北京市商汤科技开发有限公司 Detection identification and the detection identification training method of network and device, equipment, medium
CN109117848A (en) * 2018-09-07 2019-01-01 泰康保险集团股份有限公司 A kind of line of text character identifying method, device, medium and electronic equipment
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MOTOKI ONUMA, ET AL.: "An On-line Handwritten Japanese Text Recognition System Free from Line Direction and Character Orientation Constraints", 《IEICE TRANS. INF. & SYST》, vol. 88, no. 8, 31 August 2005 (2005-08-31), pages 1823 - 1830 *
LIU Mingzhu et al.: "Video text region localization and recognition based on deep learning", Journal of Harbin University of Science and Technology, vol. 21, no. 6, pages 61 - 66 *
LI Jie et al.: "Research on features and similarity measurement in character recognition", Journal of Yancheng Institute of Technology (Natural Science Edition), vol. 29, no. 4, 31 December 2016 (2016-12-31), pages 42 - 46 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381057A (en) * 2020-12-03 2021-02-19 上海芯翌智能科技有限公司 Handwritten character recognition method and device, storage medium and terminal
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN113313117A (en) * 2021-06-25 2021-08-27 北京奇艺世纪科技有限公司 Method and device for recognizing text content
CN113313117B (en) * 2021-06-25 2023-07-25 北京奇艺世纪科技有限公司 Method and device for identifying text content
CN117235831A (en) * 2023-11-13 2023-12-15 北京天圣华信息技术有限责任公司 Automatic part labeling method, device, equipment and storage medium
CN117235831B (en) * 2023-11-13 2024-02-23 北京天圣华信息技术有限责任公司 Automatic part labeling method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111985469B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN111985469B (en) Method and device for recognizing characters in image and electronic equipment
CN109726643B (en) Method and device for identifying table information in image, electronic equipment and storage medium
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN109993160B (en) Image correction and text and position identification method and system
US20190095730A1 (en) End-To-End Lightweight Method And Apparatus For License Plate Recognition
US20200074178A1 (en) Method and system for facilitating recognition of vehicle parts based on a neural network
US8600165B2 (en) Optical mark classification system and method
CN111353501A (en) Book point-reading method and system based on deep learning
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN111310826B (en) Method and device for detecting labeling abnormality of sample set and electronic equipment
US8542912B2 (en) Determining the uniqueness of a model for machine vision
CN112926564B (en) Picture analysis method, system, computer device and computer readable storage medium
WO2020039882A1 (en) Discrimination device and machine learning method
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN112001406A (en) Text region detection method and device
US8542905B2 (en) Determining the uniqueness of a model for machine vision
CN113420848A (en) Neural network model training method and device and gesture recognition method and device
CN116434266A (en) Automatic extraction and analysis method for data information of medical examination list
CN110222704B (en) Weak supervision target detection method and device
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN112396057B (en) Character recognition method and device and electronic equipment
US20230401809A1 (en) Image data augmentation device and method
CN110705633A (en) Target object detection and target object detection model establishing method and device
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN114495108A (en) Character detection method and device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant