CN114220108A - Text recognition method, readable storage medium and text recognition device for natural scene - Google Patents


Info

Publication number
CN114220108A
Authority
CN
China
Prior art keywords
text
text region
angle
character
data set
Prior art date
Legal status
Pending
Application number
CN202111565107.7A
Other languages
Chinese (zh)
Inventor
李球
王和平
陈昌全
陈余泉
徐波
陈雅琼
Current Assignee
Maxvision Technology Corp
Original Assignee
Maxvision Technology Corp
Priority date
Filing date
Publication date
Application filed by Maxvision Technology Corp filed Critical Maxvision Technology Corp
Priority to CN202111565107.7A
Publication of CN114220108A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a text recognition method for natural scenes, comprising the following steps: acquiring a text image to be recognized, and performing text region detection on it to obtain a first text region bounded by a rectangular frame; performing perspective transformation on the first text region and rotating the transformed region to obtain a second text region whose rectangular frame has its long side parallel to the X axis; training an angle detection model based on a deep learning model, detecting the angle of the characters in the second text region with the angle detection model, and adjusting the character angle of the second text region accordingly to obtain a third text region in which the character angle is 0 degrees; and performing single-character segmentation and single-character recognition on the characters in the third text region. The present application also provides a computer-readable storage medium and a text recognition apparatus.

Description

Text recognition method, readable storage medium and text recognition device for natural scene
Technical Field
The present application relates to the field of text recognition technologies, and in particular, to a text recognition method, a readable storage medium, and a text recognition apparatus for a natural scene.
Background
With the current development of science and technology, recognizing characters from images has become common. Such technologies can be broadly divided into optical character recognition, character recognition in natural scenes, and the like. Optical character recognition (OCR) is mainly aimed at high-definition document images; such techniques assume that the input image has a clean background, simple fonts, and orderly arranged characters. When these assumptions hold, a trained network model can achieve high recognition accuracy, and the training process is fast.
Character recognition in natural scenes (scene text recognition, STR) is mainly aimed at natural scene images containing characters. In practice, however, the characters in such scenes appear at various angles and with other irregular attributes, which makes them difficult to recognize.
Disclosure of Invention
In view of the prior art, the technical problem to be solved by the present application is to provide a text recognition method, a readable storage medium, and a text recognition apparatus for natural scenes, which help improve the recognition efficiency of texts containing characters at different angles.
In order to solve the above technical problem, the present application provides a text recognition method for a natural scene, including:
acquiring a text image to be recognized, and performing text region detection on the text image to be recognized to obtain a first text region of a rectangular frame;
performing perspective transformation on the first text region, and rotating the first text region after the perspective transformation to obtain a second text region, so that the long side of a rectangular frame of the second text region is parallel to the X axis;
training based on a deep learning model to obtain an angle detection model, detecting the angle of the characters in the second text region by using the angle detection model, and adjusting the character angle of the second text region of the rectangular frame according to the angle detected by the angle detection model to obtain a third text region, so that the included angle of the characters in the third text region is 0 degrees;
performing single character segmentation and single character recognition on the characters in the third text region;
wherein the X axis and the Y axis are mutually perpendicular and form an image coordinate system, and the character angle is the included angle between the characters and the Y axis.
In one possible implementation manner, the step of rotating the perspective-transformed first text region to obtain the second text region includes:
judging whether the ratio of the rectangular frame's length along the Y axis to its length along the X axis is greater than 1.5;
if so, rotating the first text region of the rectangular frame counterclockwise by 90 degrees;
otherwise, leaving the first text region of the rectangular frame unrotated (a rotation of 0 degrees).
In one possible implementation, the step of obtaining the angle detection model based on deep learning model training includes:
cropping, from natural scenes, text images of rectangular boxes in which the characters are horizontally distributed in parallel and the character angle is 0 degrees, as a data set;
dividing the data set into six parts, recorded respectively as the first, second, third, fourth, fifth, and sixth parts of data;
rotating each character of each text image in the first part of data counterclockwise by 0 degrees (i.e., leaving it unrotated) to obtain a first training data set; rotating each character of each text image in the second part of data counterclockwise by 90 degrees to obtain a second training data set; rotating each character of each text image in the third part of data counterclockwise by 180 degrees to obtain a third training data set; rotating each character of each text image in the fourth part of data counterclockwise by 270 degrees to obtain a fourth training data set; rotating each character of each text image in the fifth part of data counterclockwise by 45 degrees to obtain a fifth training data set; and rotating each character of each text image in the sixth part of data clockwise by 45 degrees (a rotation of minus 45 degrees) to obtain a sixth training data set;
extracting, with the feature layers of a ShuffleNet V2 network model, the character angle features of the first, second, third, fourth, fifth, and sixth training data sets relative to the text images to generate feature maps, and performing learning and training based on the ShuffleNet V2 network model until it converges, thereby obtaining the angle detection model.
In one possible implementation, the number of text images in the first training data set, the second training data set, the third training data set, the fourth training data set, the fifth training data set, and the sixth training data set is set to be the same.
In a possible implementation manner, the step of performing text angle adjustment on the second text region of the rectangular frame according to the angle detected by the angle detection model to obtain the third text region includes:
if the character angle in the second text region detected by the angle detection model is 0 degrees, maintaining the character angle of the second text region unchanged;
if the character angle in the second text region detected by the angle detection model is 90 degrees, rotating the second text region counterclockwise by 270 degrees;
if the character angle in the second text region detected by the angle detection model is 180 degrees, rotating the second text region counterclockwise by 180 degrees;
if the character angle in the second text region detected by the angle detection model is 270 degrees, rotating the second text region counterclockwise by 90 degrees;
and if the character angle in the second text region detected by the angle detection model is 45 degrees, rotating the second text region counterclockwise by 315 degrees.
In one possible implementation manner, the step of performing text region detection on the text image to be recognized to obtain a first text region of a rectangular frame includes:
continuously performing five convolution operations on the text image using a 3 × 3 convolution kernel, and performing cascade fusion on the results of the five convolution operations based on a feature pyramid network to obtain a feature map of the text image;
predicting the feature map by using a DBNet learning network to obtain a probability map about the text;
carrying out threshold operation on the probability map to obtain a segmentation result about the text;
and extracting the contour of the segmentation result and calculating the circumscribed rectangular frame of the contour, wherein the circumscribed rectangular frame frames the first text region.
In one possible implementation, the step of performing single-character segmentation and single-character recognition on the characters in the third text region includes:
segmenting all the single characters in the third text region, together with the circumscribed rectangular frame of each single character, using the YOLOv3 model;
and inputting the single characters one by one into a single-character recognition model for character recognition, in ascending order of the horizontal coordinate of the top-left vertex of each character's circumscribed rectangular frame.
In one possible implementation, the single character recognition model is the ResNet50 learning model.
The present application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the method for recognizing text in a natural scene.
The present application further provides a text recognition apparatus comprising a memory and one or more processors coupled to the memory; the memory is used to store computer program code comprising computer instructions which, when executed by the text recognition apparatus, cause the text recognition apparatus to perform the text recognition method for a natural scene.
In the text recognition method for a natural scene, text region detection is first performed on the text image to be recognized to obtain a first text region bounded by a rectangular frame. The perspective-transformed first text region is then rotated so that the long side of its rectangular frame is parallel to the X axis, yielding a second text region in a horizontal rectangular frame. The trained angle detection model detects the angle of the characters in the second text region, and the character angle of the second text region is adjusted accordingly to obtain a third text region in which the character angle is 0 degrees; that is, the characters in the third text region have no angular deviation from the Y axis and are unified into the orientation in which human eyes conventionally read them. This avoids the increased difficulty and reduced efficiency of subsequent character recognition that characters at different angles would otherwise cause.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a flowchart of a text recognition method for a natural scene according to an embodiment of the present application;
FIG. 2 is a diagram illustrating results of a first text region obtained, perspective transformation and rotation performed on the first text region, a second text region obtained, and a third text region obtained according to an embodiment of the present application;
fig. 3 is a flowchart illustrating steps for performing single-character segmentation and single-character recognition on characters in the third text region according to an embodiment of the present application.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element.
It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be considered as limiting the present application.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
A text recognition method, a readable storage medium, and a text recognition apparatus for a natural scene according to embodiments of the present application will now be described with reference to the drawings.
Referring to fig. 1, a text recognition method for a natural scene provided in an embodiment of the present application includes the following steps:
step S100: acquiring a text image to be recognized, and performing text region detection on the text image to be recognized to obtain a first text region of the rectangular frame, wherein each text in a first column in fig. 2 represents a text result of performing text region detection to obtain the first text region of the rectangular frame.
Step S200: and performing perspective transformation on the first text region, and rotating the first text region after the perspective transformation to obtain a second text region, so that the long side of the rectangular frame of the second text region is parallel to the X axis.
Step S300: and training based on the deep learning model to obtain an angle detection model.
Step S400: and detecting the angle of the characters in the second text region by using the angle detection model.
Step S500: performing character angle adjustment on a second text region of the rectangular frame according to the angle detected by the angle detection model to obtain a third text region, so that the included angle of characters in the third text region is 0 degree; .
Step S600: and performing single-character segmentation and single-character recognition on the characters in the third text region.
In the above steps, the X axis and the Y axis are mutually perpendicular and form the image coordinate system shown in fig. 2. It should be noted that the character angle is the included angle between the characters and the Y axis; it can be understood as the angular deviation of the characters from the Y axis as observed from the usual visual angle of human eyes. To illustrate: the character angle in the text region of the fifth rectangular box in the first column of texts in fig. 2 is 0 degrees, as is that of the first rectangular box in the second column; the character angle of the fifth rectangular box in the third column is 90 degrees; that of the sixth rectangular box in the third column is 270 degrees; that of the seventh rectangular box in the third column is 45 degrees; and that of the second rectangular box in the third column is 180 degrees.
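By way of illustration only, the pipeline of steps S100 through S600 can be sketched as follows. This is a minimal, hypothetical sketch rather than the application's implementation: every helper name (detect_first_text_regions, warp_and_level, order_points, predict_character_angle, level_characters, recognize_characters) is an assumed placeholder for a component described in the sections below.

```python
import cv2

def recognize_scene_text(image, detector, angle_model, yolo_detect, recognizer):
    """End-to-end sketch of steps S100-S600; all helper names are assumptions."""
    results = []
    for rect in detect_first_text_regions(image, detector):       # step S100
        pts = cv2.boxPoints(rect)                                 # rotated rect -> 4 corners
        # order_points is a hypothetical corner-sorting helper
        region = warp_and_level(image, order_points(pts))         # step S200
        angle = predict_character_angle(angle_model, region)      # steps S300/S400
        region = level_characters(region, angle)                  # step S500
        results.append(recognize_characters(region, yolo_detect, recognizer))  # step S600
    return results
```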
Referring to fig. 3, in step S100, the step of performing text region detection on the text image to be recognized to obtain a first text region of a rectangular frame includes:
step S110: the text image is continuously subjected to five convolution operations using a 3 × 3 convolution kernel.
Step S120: performing cascade fusion on the results of the five convolutions based on a feature pyramid network (FPN) to obtain a feature map of the text image, where the features in the feature map relate to the characteristics of the text image.
Step S130: and predicting the feature map by using a DBNet learning network to obtain a probability map about the text.
Step S140: and performing threshold operation on the probability map to obtain a segmentation result about the text.
Step S150: extracting the contour of the segmentation result and calculating the circumscribed rectangular frame of the contour, wherein the circumscribed rectangular frame frames the first text region.
In one embodiment, the threshold of the threshold operation in step S140 is 0.2.
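A minimal sketch of steps S110 through S150 follows, assuming OpenCV and PyTorch. The application specifies only the five 3 × 3 convolutions, the FPN-based cascade fusion, the DBNet-style probability map, and the 0.2 threshold; the TinyDetector class, its channel widths, and the single-scale fusion are illustrative assumptions, not the application's exact network.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Illustrative stand-in for steps S110-S130: five 3x3 convolutions whose
    outputs are fused in cascade, followed by a DBNet-style probability head.
    Channel widths, the single-scale fusion, and the head are assumptions."""
    def __init__(self, ch=64):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else ch, ch, 3, padding=1) for i in range(5)]
        )
        self.fuse = nn.Conv2d(5 * ch, ch, 1)       # cascade fusion of the five results
        self.prob_head = nn.Conv2d(ch, 1, 1)       # per-pixel text probability

    def forward(self, x):
        feats, f = [], x
        for conv in self.stages:
            f = torch.relu(conv(f))
            feats.append(f)
        fused = self.fuse(torch.cat(feats, dim=1))   # concatenate then fuse
        return torch.sigmoid(self.prob_head(fused))  # probability map about the text

def detect_first_text_regions(image_bgr, model, threshold=0.2):
    """Steps S140-S150: threshold the probability map at 0.2, extract contours,
    and return the circumscribed (minimum-area) rectangle of each contour."""
    x = torch.from_numpy(image_bgr).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        prob = model(x)[0, 0].numpy()
    mask = (prob > threshold).astype(np.uint8) * 255   # segmentation result
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.minAreaRect(c) for c in contours]      # first text regions
```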
In step S200, perspective transformation is performed on the first text region; the second column of texts in fig. 2 shows the result of perspective-transforming the first column.
In step S200, the step of rotating the perspective-transformed first text region to obtain a second text region includes: judging whether the ratio of the rectangular frame's length along the Y axis to its length along the X axis is greater than 1.5; if so, rotating the first text region of the rectangular frame counterclockwise by 90 degrees; otherwise, leaving it unrotated (a rotation of 0 degrees). The third column of texts in fig. 2 is the result after the second column has been rotated.
It is understood that the length of the rectangular frame of the first text region along the Y axis can be understood as the height of the frame, and its length along the X axis as the width. The rotation of the perspective-transformed first text region in step S200 is performed to obtain a horizontal rectangular frame, i.e., one whose long side is parallel to the X axis. If the rectangular frame is square, either side may be designated the long side; that is, either the length along the X axis or the length along the Y axis of the first text region's rectangular frame may be taken as the long side.
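A sketch of step S200 under these conventions might look as follows; the corner ordering of box_points is an assumption, while the 1.5 ratio test and the 90-degree counterclockwise rotation follow the text above.

```python
import cv2
import numpy as np

def warp_and_level(image, box_points):
    """Step S200 sketch. box_points is a 4x2 float array of quadrilateral corners
    ordered top-left, top-right, bottom-right, bottom-left (this ordering is an
    assumption). The region is warped to an axis-aligned rectangle, then rotated
    90 degrees counterclockwise when the Y/X length ratio exceeds 1.5."""
    tl, tr, br, bl = box_points
    w = int(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl)))  # length along X
    h = int(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr)))  # length along Y
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(box_points.astype(np.float32), dst)
    region = cv2.warpPerspective(image, M, (w, h))
    if h / w > 1.5:  # vertical box: make the long side parallel to the X axis
        region = cv2.rotate(region, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return region
```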
Step S300: the step of obtaining the angle detection model based on deep learning model training comprises the following steps:
cropping, from natural scenes, text images of rectangular boxes in which the characters are horizontally distributed in parallel and the character angle is 0 degrees, as a data set;
dividing the data set into six parts, recorded respectively as the first, second, third, fourth, fifth, and sixth parts of data;
rotating each character of each text image in the first part of data counterclockwise by 0 degrees (i.e., leaving it unrotated) to obtain a first training data set; rotating each character of each text image in the second part of data counterclockwise by 90 degrees to obtain a second training data set; rotating each character of each text image in the third part of data counterclockwise by 180 degrees to obtain a third training data set; rotating each character of each text image in the fourth part of data counterclockwise by 270 degrees to obtain a fourth training data set; rotating each character of each text image in the fifth part of data counterclockwise by 45 degrees to obtain a fifth training data set; and rotating each character of each text image in the sixth part of data clockwise by 45 degrees (a rotation of minus 45 degrees) to obtain a sixth training data set;
extracting, with the feature layers of a ShuffleNet V2 network model, the character angle features of the first, second, third, fourth, fifth, and sixth training data sets relative to the text images to generate feature maps, and performing learning and training based on the ShuffleNet V2 network model until it converges, thereby obtaining the angle detection model. The ShuffleNet V2 network model is a lightweight convolutional neural network.
Further, to improve the accuracy of the angle detection model, the number of text images in the first, second, third, fourth, fifth, and sixth training data sets is set to be the same, and negative samples are added to each of the six training data sets.
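The training setup of step S300 can be sketched as below. Using torchvision's shufflenet_v2_x1_0 as the ShuffleNet V2 feature extractor, the 224 × 224 input size, and the plain cross-entropy loop are assumptions; the six angle classes and the rotation-based data synthesis follow the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision.transforms import functional as TF

# The six angle classes of step S300: 0, 90, 180, 270, 45, and -45 degrees,
# where positive values are counterclockwise rotations.
ANGLE_CLASSES = [0, 90, 180, 270, 45, -45]

def make_rotated_sample(pil_image, class_idx):
    """Synthesize one training sample by rotating a level (0-degree) text image.
    TF.rotate treats positive angles as counterclockwise."""
    return TF.rotate(pil_image, ANGLE_CLASSES[class_idx], expand=True), class_idx

def build_angle_model(num_classes=len(ANGLE_CLASSES)):
    """ShuffleNet V2 feature layers with a six-way angle classification head."""
    model = torchvision.models.shufflenet_v2_x1_0(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_step(model, images, labels, optimizer, criterion=nn.CrossEntropyLoss()):
    """One step of a plain cross-entropy training loop, run until convergence."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict_character_angle(model, region_bgr):
    """Step S400 sketch: map the argmax class back to an angle in degrees.
    The 224x224 input size and the normalization are assumptions."""
    x = torch.from_numpy(region_bgr).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = F.interpolate(x, size=(224, 224))
    with torch.no_grad():
        return ANGLE_CLASSES[model(x).argmax(dim=1).item()]
```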
With further reference to fig. 1, in step S500, the step of performing a text angle adjustment on the second text region of the rectangular frame according to the angle detected by the angle detection model to obtain a third text region includes:
step S510: if the angle of the characters in the second text region detected by the angle detection model is 0 degree, maintaining the angle of the characters in the second text region unchanged;
step S520: if the character angle in the second text region detected by the angle detection model is 90 degrees, rotating the second text region counterclockwise by 270 degrees;
step S530: if the character angle in the second text region detected by the angle detection model is 180 degrees, rotating the second text region by 180 degrees anticlockwise;
step S540: if the character angle in the second text region detected by the angle detection model is 270 degrees, rotating the second text region by 90 degrees anticlockwise;
step S550: and if the character angle in the second text region detected by the angle detection model is 45 degrees, rotating the second text region by 215 degrees anticlockwise.
It should be noted that a character angle of 0 degrees can be understood as follows: when a person views the characters from the usual visual angle, the characters are upright, written normally, and have no angular deviation in the vertical direction. For example, the characters in the fourth column of texts in fig. 2 are at 0 degrees to the Y-axis direction; observed from the usual visual angle of human eyes, they are upright with no angular deviation in the vertical direction.
In step S500, the character angle of the second text region of the rectangular frame is adjusted according to the angle detected by the angle detection model to obtain the third text region, so that the characters in the third text region are parallel to the Y axis, i.e., have no angular deviation in the vertical direction. The character angles in the third text region are thus unified into the orientation that human eyes are conventionally used to reading, which facilitates subsequent single-character segmentation and recognition and reduces their difficulty. It can be understood that, because the characters in commonly used character recognition libraries are upright, recognizing characters at various angles against such a library inevitably increases recognition difficulty and reduces recognition efficiency.
The fourth column of texts in fig. 2 is a result of performing angle detection and character angle adjustment on the third column of texts by using the angle detection model.
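The angle-to-rotation mapping of step S500 reduces to a small lookup, sketched below with OpenCV; handling the 45-degree case (a 315-degree counterclockwise correction) via an affine rotation is an illustrative assumption, since rotations that are not multiples of 90 degrees cannot use cv2.rotate.

```python
import cv2

# Step S500: counterclockwise correction for each detected character angle.
CCW_CORRECTION = {0: 0, 90: 270, 180: 180, 270: 90, 45: 315}

def level_characters(region, detected_angle):
    """Rotate the second text region so that its character angle becomes 0 degrees."""
    correction = CCW_CORRECTION[detected_angle]
    if correction == 0:
        return region
    exact = {90: cv2.ROTATE_90_COUNTERCLOCKWISE,
             180: cv2.ROTATE_180,
             270: cv2.ROTATE_90_CLOCKWISE}   # 270 degrees CCW equals 90 degrees CW
    if correction in exact:
        return cv2.rotate(region, exact[correction])
    # Corrections that are not multiples of 90 degrees (here, 315 degrees CCW)
    # need an affine rotation; the canvas is kept at the original size for
    # simplicity, so corners may be clipped.
    h, w = region.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), correction, 1.0)  # positive = CCW
    return cv2.warpAffine(region, M, (w, h))
```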
In step S600, the step of performing single-character segmentation and single-character recognition on the characters in the third text region includes: segmenting all the single characters in the third text region, together with the circumscribed rectangular frame of each single character, using the YOLOv3 model; and inputting the single characters one by one into the single-character recognition model for character recognition, in ascending order of the horizontal coordinate of the top-left vertex of each character's circumscribed rectangular frame.
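Step S600 can be sketched as follows; yolo_detect and recognizer are hypothetical callables standing in for the YOLOv3 single-character detector and the single-character recognition model, and the (x, y, w, h) box format is an assumption.

```python
def recognize_characters(region, yolo_detect, recognizer):
    """Step S600 sketch. yolo_detect(region) is assumed to return a list of
    (x, y, w, h) circumscribed rectangles for the single characters, and
    recognizer(crop) is assumed to return one recognized character."""
    boxes = yolo_detect(region)
    # Ascending order of the horizontal coordinate of each top-left vertex,
    # i.e., left-to-right reading order.
    boxes.sort(key=lambda b: b[0])
    text = []
    for x, y, w, h in boxes:
        crop = region[y:y + h, x:x + w]   # crop one circumscribed rectangle
        text.append(recognizer(crop))
    return "".join(text)
```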
In one embodiment of the application, the single-character recognition model is a ResNet50 learning model. The training data for the ResNet50 learning model consists of the 6763 Chinese characters in the level-1 and level-2 character libraries of the GB 2312-80 character set. To increase the diversity of the data set and the accuracy of the ResNet50 learning model, the brightness of at least some character images in the training character set is randomly changed to 70% to 130% of the original; the contrast of at least some character images is randomly changed to 70% to 130% of the original; and the saturation of at least some character images is randomly changed to 70% to 130% of the original. The images with changed brightness, saturation, and contrast are then mixed into the original character set to generate new training data.
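The described 70% to 130% brightness, contrast, and saturation changes map directly onto torchvision's ColorJitter, as in this sketch; treating the three jitters as one combined transform is an assumption.

```python
from torchvision import transforms

# Brightness, contrast, and saturation each jittered to 70%-130% of the
# original, matching the ranges described for the ResNet50 training data.
character_augment = transforms.ColorJitter(
    brightness=(0.7, 1.3),
    contrast=(0.7, 1.3),
    saturation=(0.7, 1.3),
)

# Usage: augmented copies are mixed back into the original GB 2312-80 set.
# augmented = character_augment(pil_character_image)
```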
In the text recognition method for a natural scene, text region detection is first performed on the text image to be recognized to obtain a first text region bounded by a rectangular frame. The perspective-transformed first text region is then rotated so that the long side of its rectangular frame is parallel to the X axis, yielding a second text region in a horizontal rectangular frame. The trained angle detection model detects the angle of the characters in the second text region, and the character angle of the second text region is adjusted accordingly to obtain a third text region in which the character angle is 0 degrees; that is, the characters in the third text region have no angular deviation from the Y axis and are unified into the orientation in which human eyes conventionally read them. This avoids the increased difficulty and reduced efficiency of subsequent character recognition that characters at different angles would otherwise cause.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the text recognition method for a natural scene of the foregoing embodiments.
In this embodiment, the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
Embodiments of the present application also provide a text recognition apparatus comprising a memory and one or more processors coupled to the memory; the memory is used to store computer program code comprising computer instructions which, when executed by the text recognition apparatus, cause the text recognition apparatus to execute the text recognition method for a natural scene of the embodiments.
In this embodiment, the processor may include one or more processing units, such as an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU); the different processing units may be separate devices or may be integrated into one or more processors. The memory may be, but is not limited to, an electronic, magnetic, optical, or semiconductor system, apparatus, or device; in particular, it may be, but is not limited to, a magnetic disk, a hard disk, a read-only memory, a random-access memory, or an erasable programmable read-only memory. The processor may be a central processing unit, but may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A text recognition method for a natural scene, comprising:
acquiring a text image to be recognized, and performing text region detection on the text image to be recognized to obtain a first text region of a rectangular frame;
performing perspective transformation on the first text region, and rotating the first text region after the perspective transformation to obtain a second text region, so that the long side of a rectangular frame of the second text region is parallel to the X axis;
training based on a deep learning model to obtain an angle detection model, detecting the angle of the characters in the second text region by using the angle detection model, and adjusting the character angle of the second text region of the rectangular frame according to the angle detected by the angle detection model to obtain a third text region, so that the included angle of the characters in the third text region is 0 degrees;
performing single character segmentation and single character recognition on the characters in the third text region;
wherein the X axis and the Y axis are mutually perpendicular and form an image coordinate system, and the character angle is the included angle between the characters and the Y axis.
2. The method for recognizing text in natural scene according to claim 1, wherein the step of rotating the perspective-transformed first text region to obtain the second text region comprises:
judging whether the ratio of the rectangular frame's length along the Y axis to its length along the X axis is greater than 1.5;
if so, rotating the first text region of the rectangular frame counterclockwise by 90 degrees;
otherwise, leaving the first text region of the rectangular frame unrotated (a rotation of 0 degrees).
3. The natural scene text recognition method of claim 1, wherein the step of obtaining the angle detection model based on deep learning model training comprises:
cropping, from natural scenes, text images of rectangular boxes in which the characters are horizontally distributed in parallel and the character angle is 0 degrees, as a data set;
dividing the data set into six parts, recorded respectively as the first, second, third, fourth, fifth, and sixth parts of data;
rotating each character of each text image in the first part of data counterclockwise by 0 degrees (i.e., leaving it unrotated) to obtain a first training data set; rotating each character of each text image in the second part of data counterclockwise by 90 degrees to obtain a second training data set; rotating each character of each text image in the third part of data counterclockwise by 180 degrees to obtain a third training data set; rotating each character of each text image in the fourth part of data counterclockwise by 270 degrees to obtain a fourth training data set; rotating each character of each text image in the fifth part of data counterclockwise by 45 degrees to obtain a fifth training data set; and rotating each character of each text image in the sixth part of data clockwise by 45 degrees (a rotation of minus 45 degrees) to obtain a sixth training data set;
extracting, with the feature layers of a ShuffleNet V2 network model, the character angle features of the first, second, third, fourth, fifth, and sixth training data sets relative to the text images to generate feature maps, and performing learning and training based on the ShuffleNet V2 network model until it converges, thereby obtaining the angle detection model.
4. The method for recognizing text in natural scenes according to claim 3, wherein the number of text images in the first training data set, the second training data set, the third training data set, the fourth training data set, the fifth training data set, and the sixth training data set is set to be the same.
5. The method for recognizing texts in natural scenes according to claim 3, wherein the step of performing text angle adjustment on the second text region of the rectangular frame according to the angle detected by the angle detection model to obtain the third text region comprises:
if the character angle in the second text region detected by the angle detection model is 0 degrees, maintaining the character angle of the second text region unchanged;
if the character angle in the second text region detected by the angle detection model is 90 degrees, rotating the second text region counterclockwise by 270 degrees;
if the character angle in the second text region detected by the angle detection model is 180 degrees, rotating the second text region counterclockwise by 180 degrees;
if the character angle in the second text region detected by the angle detection model is 270 degrees, rotating the second text region counterclockwise by 90 degrees;
and if the character angle in the second text region detected by the angle detection model is 45 degrees, rotating the second text region counterclockwise by 315 degrees.
6. The natural scene text recognition method of claim 1, wherein the step of performing text region detection on the text image to be recognized to obtain a first text region of a rectangular box comprises:
continuously performing five convolution operations on the text image using a 3 × 3 convolution kernel, and performing cascade fusion on the results of the five convolution operations based on a feature pyramid network to obtain a feature map of the text image;
predicting the feature map by using a DBNet learning network to obtain a probability map about the text;
carrying out threshold operation on the probability map to obtain a segmentation result about the text;
and extracting the contour of the segmentation result and calculating the circumscribed rectangular frame of the contour, wherein the circumscribed rectangular frame frames the first text region.
7. The natural scene text recognition method of claim 1, wherein the step of performing single-character segmentation and single-character recognition on the characters in the third text region comprises:
segmenting all the single characters in the third text region, together with the circumscribed rectangular frame of each single character, using the YOLOv3 model;
and inputting the single characters one by one into a single-character recognition model for character recognition, in ascending order of the horizontal coordinate of the top-left vertex of each character's circumscribed rectangular frame.
8. The method for recognizing text of a natural scene as recited in claim 1, wherein the single character recognition model is a ResNet50 learning model.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the text recognition method of a natural scene according to any one of claims 1 to 8.
10. A text recognition apparatus comprising a memory and one or more processors coupled to the memory, the memory storing computer program code comprising computer instructions which, when executed by the text recognition apparatus, cause the text recognition apparatus to perform the text recognition method for a natural scene of any one of claims 1 to 8.
CN202111565107.7A 2021-12-20 2021-12-20 Text recognition method, readable storage medium and text recognition device for natural scene Pending CN114220108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111565107.7A CN114220108A (en) 2021-12-20 2021-12-20 Text recognition method, readable storage medium and text recognition device for natural scene


Publications (1)

Publication Number Publication Date
CN114220108A 2022-03-22

Family

ID=80704519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111565107.7A Pending CN114220108A (en) 2021-12-20 2021-12-20 Text recognition method, readable storage medium and text recognition device for natural scene

Country Status (1)

Country Link
CN (1) CN114220108A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457559A (en) * 2022-08-19 2022-12-09 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting text and license pictures
CN115457559B (en) * 2022-08-19 2024-01-16 上海通办信息服务有限公司 Method, device and equipment for intelligently correcting texts and license pictures


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination