CN108885699B - Character recognition method, device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN108885699B
Authority
CN
China
Prior art keywords
image
character
correction processing
text line
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880001125.2A
Other languages
Chinese (zh)
Other versions
CN108885699A (en
Inventor
梁昊
南一冰
廉士国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Shanghai Robotics Co Ltd
Original Assignee
Cloudminds Shenzhen Robotics Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudminds Shenzhen Robotics Systems Co Ltd filed Critical Cloudminds Shenzhen Robotics Systems Co Ltd
Publication of CN108885699A publication Critical patent/CN108885699A/en
Application granted granted Critical
Publication of CN108885699B publication Critical patent/CN108885699B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The present disclosure relates to a character recognition method, apparatus, storage medium, and electronic device. The method comprises: first, determining an image category corresponding to a target image that includes characters to be recognized; then, correcting the target image using the correction processing mode corresponding to the image category; next, extracting at least one text line image from the corrected target image; and finally, recognizing the characters to be recognized in the at least one text line image through a preset character recognition model. Because different image categories correspond to different correction processing modes, images of different categories can each be corrected in the corresponding mode, and character recognition is then performed on the corrected images.

Description

Character recognition method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a character recognition method, device, storage medium, and electronic apparatus.
Background
With the rapid development of computer technology and multimedia, more and more information is spread in the form of images, and the information in an image may be descriptive text. At present, text images can be divided into document images and scene images. A document image usually contains a large number of characters, a regular character distribution, and a uniform image background; a scene image, by contrast, generally contains a small number of characters, rich character types, randomly distributed characters, and a complex image background.
Because document images and scene images have different image characteristics, and each current character recognition algorithm targets one specific kind of text image, document images and scene images must be recognized by different character recognition algorithms, so the universality of any single character recognition algorithm is poor.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a character recognition method, apparatus, storage medium, and electronic device.
According to a first aspect of the present disclosure, there is provided a character recognition method, the method comprising:
determining an image category corresponding to a target image that includes characters to be recognized, wherein different image categories correspond to different correction processing modes;
correcting the target image using the correction processing mode corresponding to the image category;
extracting at least one text line image from the corrected target image; and
recognizing the characters to be recognized in the at least one text line image through a preset character recognition model.
According to a second aspect of the present disclosure, there is provided a character recognition apparatus, the apparatus comprising:
a determining module, configured to determine the image category corresponding to a target image that includes characters to be recognized, wherein different image categories correspond to different correction processing modes;
a correction module, configured to correct the target image using the correction processing mode corresponding to the image category;
an extraction module, configured to extract at least one text line image from the corrected target image; and
a recognition module, configured to recognize the characters to be recognized in the at least one text line image through a preset character recognition model.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect.
In the above technical solution, first, an image category corresponding to a target image that includes characters to be recognized is determined; then, the target image is corrected using the correction processing mode corresponding to the image category; next, at least one text line image is extracted from the corrected target image; and finally, the characters to be recognized in the at least one text line image are recognized through a preset character recognition model. Because different image categories correspond to different correction processing modes, images of different categories can each be corrected in the corresponding mode, and character recognition is then performed on the corrected images.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram illustrating a character recognition method in accordance with an exemplary embodiment;
FIG. 2 is a block diagram illustrating a first type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 3 is a block diagram illustrating a second type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating a third type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a fourth type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a fifth type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating a sixth type of character recognition apparatus in accordance with an exemplary embodiment;
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
First, an application scenario of the present disclosure is explained. The present disclosure may be applied to character recognition, where a character recognition algorithm mainly includes two steps: character detection and character recognition. At present, character detection can be done in two ways, single character detection and text line extraction. Single character detection directly detects individual characters in the target image, while text line extraction mainly extracts character regions distributed in lines. Single character detection is prone to missed detections, that is, one or more characters in the target image are not detected, which lowers the accuracy of character recognition. Text line extraction treats the characters distributed in a line as a whole, so missed detections are less likely, but each character in the text line must be segmented after the text line is detected, which places high demands on segmentation accuracy. The character recognition approach differs according to the detection approach: with single character detection, the extracted single characters are recognized individually and then arranged and combined according to their position information to generate the final recognition result; with text line extraction, the characters in each text line are first segmented and then recognized, and the recognition results of the text lines are arranged and combined according to the position information of each text line to generate the final recognition result.
Current text images can be divided into document images and scene images. A document image generally contains a large number of characters, a regular character distribution, and a uniform image background; a scene image, by contrast, generally contains a small number of characters, rich character types, randomly distributed characters, and a complex image background. Because of these different image characteristics, a current character recognition algorithm cannot handle both document images and scene images; each must be recognized by a different character recognition algorithm, which makes the universality of the character recognition algorithm poor.
In order to solve the above problem, the present disclosure provides a character recognition method, an apparatus, a storage medium, and an electronic device. The image category of a target image is determined, the correction processing mode corresponding to the target image is determined according to the image category, the target image is corrected in that mode, at least one text line image is extracted from the corrected target image, and the characters to be recognized in the at least one text line image are recognized by a character recognition model. Because different image categories correspond to different correction processing modes, images of different categories can each be corrected in the corresponding mode, and character recognition is then performed on the corrected images.
The present disclosure is described in detail below with reference to specific examples.
FIG. 1 is a flow diagram illustrating a character recognition method according to an exemplary embodiment. As shown in fig. 1, the method includes:
S101, determining an image category corresponding to a target image including characters to be recognized.
In this step, the image category may include a document image and a scene image. A document image generally contains a large number of characters, a regular character distribution, and a uniform image background; a scene image, by contrast, generally contains a small number of characters, rich character types, randomly distributed characters, and a complex image background. Because document images and scene images have these different image characteristics, different image categories correspond to different correction processing modes. The image categories above are only examples and do not limit the disclosure.
In a possible implementation, image samples whose image categories are already determined may be obtained, and the image category corresponding to the target image may be determined according to the image samples. The image samples may include document image samples and scene image samples, where the difference between the number of document image samples and the number of scene image samples is less than or equal to a preset threshold. Based on a deep learning method, a preset classifier is trained with the document image samples and the scene image samples to obtain a target classifier; when the target image is input into the target classifier, the target classifier outputs the image category corresponding to the target image.
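The sample-count constraint described above can be illustrated with a minimal sketch. The function name `balance_samples`, its parameters, and the choice to randomly discard surplus samples are assumptions made for illustration; the disclosure only requires that the difference between the two sample counts not exceed a preset threshold and does not say how that is achieved.

```python
import random


def balance_samples(document_samples, scene_samples, max_difference, seed=0):
    """Trim the larger sample list so the count difference is <= max_difference.

    Hypothetical helper: randomly discarding surplus samples is only one
    possible way to satisfy the preset-threshold condition.
    """
    rng = random.Random(seed)
    docs, scenes = list(document_samples), list(scene_samples)
    # `larger` aliases either `docs` or `scenes`, so trimming it in place
    # also trims the list that is returned.
    larger, smaller = (docs, scenes) if len(docs) >= len(scenes) else (scenes, docs)
    excess = len(larger) - len(smaller) - max_difference
    if excess > 0:
        rng.shuffle(larger)
        del larger[:excess]
    return docs, scenes
```

For example, with 100 document samples, 60 scene samples, and a threshold of 10, the sketch keeps 70 document samples and all 60 scene samples before classifier training.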
S102, performing correction processing on the target image using the correction processing mode corresponding to the image category.
When the image category is a document image, the characters to be recognized are usually densely distributed, so inclination and/or distortion of those characters may reduce the accuracy of character recognition. To avoid this problem, the present disclosure may perform correction processing on the document image, where the correction processing mode includes direction correction processing and/or distortion correction processing. In this case, performing correction processing on the target image using the correction processing mode corresponding to the image category may include the following steps:
S11, acquiring a first inclination angle between the characters to be recognized in the document image and a horizontal axis.
In a possible implementation, the first inclination angle may be obtained by a projection analysis method or a Hough transform method. Alternatively, the document image may be threshold-segmented to obtain a binary document image, and the first inclination angle may be obtained from the pixel information of the characters to be recognized in the binary document image. The specific process may refer to the prior art and is not repeated here.
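As an illustration of the projection analysis idea, the following sketch tries a range of candidate angles, projects the foreground pixels onto the vertical axis for each, and keeps the angle whose projection profile is most concentrated. The function name, the candidate range of ±15 degrees, the 0.5 degree step, and the sum-of-squares sharpness score are all assumptions for illustration; the disclosure defers the details to the prior art.

```python
import math


def estimate_skew_angle(pixels, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest projection profile.

    pixels: iterable of (x, y) coordinates of foreground (character) pixels,
    e.g. taken from a binary image. All parameters are illustrative.
    """
    pixels = list(pixels)
    best_angle, best_score = 0.0, -1.0
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        rad = math.radians(angle)
        sin_a, cos_a = math.sin(rad), math.cos(rad)
        profile = {}
        for x, y in pixels:
            # Row index of the pixel after undoing a tilt of `angle` degrees.
            row = math.floor(y * cos_a - x * sin_a)
            profile[row] = profile.get(row, 0) + 1
        # A profile concentrated in few rows gives a large sum of squares.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

When the candidate angle matches the tilt of the text lines, each line collapses into one or two rows of the profile, which is what the score rewards.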
S12, determining whether the first inclination angle is greater than or equal to a preset angle.
When the first inclination angle is greater than or equal to the preset angle, steps S13 and S14 are executed;
when the first inclination angle is smaller than the preset angle, step S14 is executed.
S13, performing direction correction processing on the document image.
The direction correction processing may be to rotate the document image until the first inclination angle between the characters to be recognized in the document image and the horizontal axis is smaller than the preset angle.
S14, determining whether the characters to be recognized in the document image are distorted.
When a scanner or camera captures a document image, distortion arises if the document is tilted or bent or the shooting angle is oblique, so that text lines that were originally horizontal or vertical become curved. The text lines in the image then interfere with one another, which affects the final recognition result of the characters to be recognized.
When the characters to be recognized in the document image are distorted, step S15 is executed;
when the characters to be recognized in the document image are not distorted, it is determined that the correction processing is completed.
S15, performing distortion correction processing on the document image.
The distortion correction processing may use the blank positions between text lines to straighten the text lines, so that they are restored to a horizontal or vertical distribution. The specific process may refer to the prior art and is not described again.
It should be noted that, for simplicity, the above method embodiments are described as a series of actions, but those skilled in the art will understand that the present disclosure is not limited by the described order, because some steps may be performed in another order or simultaneously. For example, steps S14 and S15 may be performed before step S11, in which case the distortion correction processing is performed before the direction correction processing. Those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the disclosure.
In summary, based on the image characteristics of the document image, steps S11 to S15 correct the inclination and distortion of the characters to be recognized in the document image, thereby improving the accuracy of character recognition in the subsequent steps.
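The control flow of steps S11 to S15 can be sketched as follows. The callables `estimate_angle`, `rotate`, `has_distortion`, and `dewarp` are hypothetical placeholders for the prior-art techniques the description refers to, and comparing the absolute value of the angle with the preset angle, as well as capping the number of rotations, are assumptions of this sketch.

```python
def correct_document_image(image, estimate_angle, rotate, has_distortion, dewarp,
                           preset_angle=1.0, max_rotations=10):
    """Apply direction correction, then distortion correction, to a document image.

    All callables are supplied by the caller; none of them is defined by the
    disclosure itself.
    """
    for _ in range(max_rotations):
        angle = estimate_angle(image)      # S11: first inclination angle
        if abs(angle) < preset_angle:      # S12: compare with the preset angle
            break
        image = rotate(image, -angle)      # S13: direction correction
    if has_distortion(image):              # S14: distortion check
        image = dewarp(image)              # S15: distortion correction
    return image
```

Because the steps are passed in as functions, the same skeleton also accommodates the variant in which distortion correction is run before direction correction, by reordering the two parts.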
When the image category is a scene image, the characters to be recognized are usually sparsely distributed, often as a small number of randomly placed text lines, so the text lines in a scene image have little influence on one another and distortion correction processing is not required. For a scene image, the corresponding correction processing mode is therefore direction correction processing. Specifically, performing correction processing on the target image using the correction processing mode corresponding to the image category includes the following steps:
S21, performing text region detection on the scene image to obtain at least one text region.
The text region detection may use any one of edge detection, region detection, texture detection, or learning-based detection, or a combination of two, three, or all four of these methods. The above are only examples, and the disclosure is not limited thereto.
S22, sequentially acquiring a second inclination angle between the characters to be recognized in the at least one text region and the horizontal axis.
Similarly, the second inclination angle may be obtained by a projection analysis method or a Hough transform method. Alternatively, the scene image may be threshold-segmented to obtain a binary scene image, and the second inclination angle may be obtained from the pixel information of the characters to be recognized in the binary scene image. The specific process may refer to the prior art and is not repeated here.
When the second inclination angle is greater than or equal to the preset angle, step S23 is executed;
when the second inclination angle is smaller than the preset angle, it is determined that the correction processing is completed.
S23, performing direction correction processing on the at least one text region.
The direction correction processing may be to rotate the text region until the second inclination angle between the characters to be recognized in the text region and the horizontal axis is smaller than the preset angle.
In summary, based on the image characteristics of the scene image, steps S21 to S23 correct the second inclination angle of the characters to be recognized in the scene image, thereby improving the accuracy of character recognition in the subsequent steps.
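Steps S21 to S23 differ from the document case in that each detected text region is corrected on its own. A minimal sketch under that reading is given below; `detect_regions`, `estimate_angle`, and `rotate` are hypothetical callables standing in for the detection and angle-estimation techniques named above, and the use of the absolute angle and a rotation cap are assumptions.

```python
def correct_scene_image(image, detect_regions, estimate_angle, rotate,
                        preset_angle=1.0, max_rotations=10):
    """Return the direction-corrected text regions of a scene image."""
    corrected = []
    for region in detect_regions(image):       # S21: text region detection
        for _ in range(max_rotations):
            angle = estimate_angle(region)     # S22: second inclination angle
            if abs(angle) < preset_angle:
                break                          # this region needs no (more) correction
            region = rotate(region, -angle)    # S23: direction correction
        corrected.append(region)
    return corrected
```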
S103, extracting at least one text line image from the corrected target image.
In this step, the at least one text line image may be extracted based on a deep learning method, which may specifically include the following steps:
S31, extracting spatial features of the target image through multiple convolution layers in a text line detection model.
The spatial features may be the correlations between pixels in the target image.
S32, inputting the spatial features into a recurrent neural network layer in the text line detection model to obtain sequence features of the target image.
In this step, the recurrent neural network layer may be an LSTM (Long Short-Term Memory network), a BLSTM (Bidirectional Long Short-Term Memory network), or a GRU (Gated Recurrent Unit, a variant of the LSTM). The above are only examples, and the disclosure is not limited thereto.
S33, acquiring candidate text boxes in the target image according to a preset rule, and classifying the candidate text boxes based on the sequence features.
In a possible implementation, a sliding window with a preset size and a preset aspect ratio may be slid over the target image to cut out the candidate text boxes. The specific process may refer to the prior art and is not described in detail in this disclosure.
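The sliding-window step can be sketched as a simple enumeration of box coordinates. The function name, the `(left, top, right, bottom)` box convention, and the single shared stride are assumptions for illustration; the window width and height together encode the preset size and aspect ratio.

```python
def sliding_window_boxes(image_width, image_height, box_width, box_height, stride):
    """Enumerate candidate boxes as (left, top, right, bottom) tuples.

    Only windows that fit entirely inside the image are produced.
    """
    boxes = []
    for top in range(0, image_height - box_height + 1, stride):
        for left in range(0, image_width - box_width + 1, stride):
            boxes.append((left, top, left + box_width, top + box_height))
    return boxes
```

For a 64 by 32 image, a 16 by 16 window, and a stride of 16, this yields eight non-overlapping candidate boxes; a smaller stride yields overlapping candidates.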
The classification may be completed by a classification layer in the text line detection model. For example, the classification layer may be a softmax layer, whose input and output dimensions are the same. When the dimension of the features fed to the classification layer does not match the required output dimension, a fully connected layer needs to be added before the softmax layer so that the input and output dimensions of the softmax layer are consistent.
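The role of the fully connected layer can be seen in a small sketch: it maps a feature vector of arbitrary length to one score per class, and softmax then turns those scores into probabilities of the same length. The two-class setting (text versus non-text) and the weight values in the usage note are assumptions for illustration only.

```python
import math


def fully_connected(features, weights, biases):
    """Map a feature vector to one score per class (one weight row per class)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert scores to probabilities; output length equals input length."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, `softmax(fully_connected([0.5, 1.0, -0.5], [[1, 0, 0], [0, 1, 0]], [0.0, 0.0]))` reduces a three-dimensional feature to two class probabilities that sum to one.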
S34, obtaining text box position information of the candidate text boxes by using a regression convolution layer in the text line detection model.
S35, screening the candidate text boxes according to the text box position information and the classification result by using the non-maximum suppression (NMS) method to obtain the text line images.
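Non-maximum suppression keeps the highest-scoring box and discards any remaining box that overlaps an already kept box too much. The sketch below is a plain greedy version; the `(left, top, right, bottom)` box format and the 0.5 overlap threshold are assumptions, since the disclosure does not specify them.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Here the scores would come from the classification result and the boxes from the regressed text box position information.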
S104, recognizing the characters to be recognized in the at least one text line image through a preset character recognition model.
Conventionally, character recognition is performed character by character: the characters are segmented, and a character classifier then predicts each one. However, when a text line image is complex, character segmentation is difficult and may damage character structure, and the segmentation precision directly affects the final recognition result. To avoid the low recognition accuracy caused by character segmentation, the text line image can be treated as a whole: the characters to be recognized in the text line image are not segmented, and all of them are recognized directly, so that the contextual relationship between characters can be fully exploited.
Before this step, the method further includes acquiring position information of the at least one text line image. After a text line image is determined in step S103, its position information can be determined from the text box position information. The characters to be recognized in the at least one text line image are then recognized through the preset character recognition model and the position information. The preset character recognition model includes a deep learning layer, a recurrent network layer, and an encoding layer. Specifically, the character recognition process may include the following steps:
S41, extracting character features of the at least one text line image through the deep learning layer.
The deep learning layer may be a convolutional neural network (CNN). Through the CNN, the at least one text line image is divided into a plurality of slices along the horizontal direction, each slice corresponding to one character feature; because adjacent slices may overlap, the character features carry a certain amount of context.
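The overlapping horizontal slices can be pictured as column ranges over the text line image. In an actual CNN the slices correspond to receptive fields rather than explicit crops, so the following is only a simplified illustration; the function name and the `slice_width` and `stride` parameters are assumptions.

```python
def horizontal_slices(image_width, slice_width, stride):
    """Return (left, right) column ranges of slices along the horizontal direction.

    Slices overlap whenever stride < slice_width. Trailing columns that do not
    fill a whole slice are ignored in this simplified sketch.
    """
    if image_width <= slice_width:
        return [(0, image_width)]
    return [(left, left + slice_width)
            for left in range(0, image_width - slice_width + 1, stride)]
```

With a width of 100, a slice width of 16, and a stride of 8, each slice shares half of its columns with the next one, which is how neighboring character features come to share context.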
S42, inputting the extracted character features into the recurrent network layer to obtain the feature vectors corresponding to the at least one text line image.
The recurrent network layer may be an LSTM, BLSTM, GRU, or the like, through which the character features are further learned to obtain the feature vector corresponding to each slice. The above are only examples, and the disclosure is not limited thereto.
S43, inputting the feature vectors into the encoding layer to obtain an encoding result of the at least one text line image, and obtaining text information of the at least one text line image according to the encoding result.
In this step, the encoding layer may be a CTC (Connectionist Temporal Classification) layer, from which the encoding result is obtained. Because a text line image may include a plurality of characters to be recognized, the encoding result may include a plurality of codes. Each code in the encoding result is matched against a preset code correspondence to obtain the character corresponding to that code, and the characters are arranged in the order of the codes to obtain the text information of the text line image. The preset code correspondence is a correspondence between code samples and character samples. The above is merely an example, and the disclosure is not limited thereto.
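A common way to turn per-slice CTC outputs into text is greedy decoding: consecutive repeated codes are merged, a reserved blank code is dropped, and the remaining codes are looked up in the code-to-character table. The disclosure does not spell out this rule, so the sketch below, including the blank code 0 and the example table, is an assumption for illustration.

```python
def ctc_greedy_decode(frame_codes, code_to_char, blank=0):
    """Collapse repeated codes, drop blanks, and map codes to characters.

    frame_codes: the most likely code for each slice, in order.
    code_to_char: the preset correspondence between codes and characters.
    """
    chars = []
    previous = blank
    for code in frame_codes:
        # Emit a character only when a new non-blank code starts.
        if code != blank and code != previous:
            chars.append(code_to_char[code])
        previous = code
    return "".join(chars)
```

The blank code is what allows doubled letters to survive: in `[1, 1, 0, 2, 3, 3, 0, 3, 4, 4]` with the table `{1: "h", 2: "e", 3: "l", 4: "o"}`, the blank between the two runs of code 3 keeps them as two separate characters.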
S44, arranging the text information of the at least one text line image in order according to the position information to obtain the target recognition result of the target image.
In this step, the order of the at least one text line image within the target image can be obtained from the position information, and the text information of the at least one text line image is sorted in that order to obtain the target recognition result.
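One simple reading of this ordering step is to sort the recognized lines top to bottom and, for lines at the same height, left to right. The tuple layout `(left, top, text)` and the newline join are assumptions of this sketch, not details given in the disclosure.

```python
def assemble_recognition_result(lines):
    """Join recognized text lines in reading order.

    lines: list of (left, top, text) tuples, where (left, top) is the
    position of the text line image in the target image.
    """
    ordered = sorted(lines, key=lambda line: (line[1], line[0]))
    return "\n".join(text for _, _, text in ordered)
```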
It should be noted that the present disclosure is described taking horizontally arranged characters as an example. When the characters to be recognized are arranged vertically, at least one text column image may be extracted from the target image, and the characters to be recognized in the at least one text column image may be recognized by the preset character recognition model. The specific process may refer to the description of the text line image and is not repeated here.
With the above method, the image category of the target image is first determined; the correction processing mode corresponding to the target image is then determined according to the image category; the target image is corrected in that mode; at least one text line image is extracted from the corrected target image; and finally the characters to be recognized in the at least one text line image are recognized by the character recognition model. Because different image categories correspond to different correction processing modes, images of different categories can each be corrected in the corresponding mode, and character recognition is then performed on the corrected images.
Fig. 2 is a block diagram illustrating a character recognition apparatus 20 according to an exemplary embodiment. As shown in fig. 2, the apparatus includes:
a determining module 201, configured to determine an image category corresponding to a target image including characters to be recognized, wherein different image categories correspond to different correction processing modes;
a correction module 202, configured to perform correction processing on the target image using the correction processing mode corresponding to the image category;
an extracting module 203, configured to extract at least one text line image from the corrected target image; and
a recognition module 204, configured to recognize the characters to be recognized in the at least one text line image through a preset character recognition model.
Optionally, the image category includes a document image and a scene image.
Fig. 3 is a block diagram illustrating the determination module 201 according to an exemplary embodiment, and as shown in fig. 3, the determination module 201 includes:
a first obtaining sub-module 2011, configured to obtain an image sample of the determined image category;
the first determining sub-module 2012 is configured to determine an image category corresponding to the target image according to the image sample.
FIG. 4 is a block diagram of the correction module 202 according to an exemplary embodiment. As shown in FIG. 4, when the image category is a document image, the correction processing mode includes direction correction processing and/or distortion correction processing; when the correction processing mode includes both the direction correction processing and the distortion correction processing, the correction module 202 includes:
a second obtaining sub-module 2021, configured to obtain a first inclination angle between the characters to be recognized in the document image and a horizontal axis;
a first correction sub-module 2022, configured to perform direction correction processing on the document image when the first inclination angle is greater than or equal to a preset angle;
a second determining sub-module 2023, configured to determine whether the characters to be recognized in the document image are distorted; and
a second correction sub-module 2024, configured to perform distortion correction processing on the document image when the characters to be recognized in the document image are distorted.
FIG. 5 is a block diagram of the correction module 202 according to an exemplary embodiment. As shown in FIG. 5, when the image category is a scene image, the correction processing mode includes direction correction processing, and the correction module 202 includes:
a detection sub-module 2025, configured to perform text region detection on the scene image to obtain at least one text region;
a third obtaining sub-module 2026, configured to sequentially obtain a second inclination angle between the characters to be recognized in the at least one text region and the horizontal axis; and
a third correction sub-module 2027, configured to perform direction correction processing on the at least one text region when the second inclination angle in the at least one text region is greater than or equal to a preset angle.
Fig. 6 is a block diagram illustrating the character recognition apparatus 20 according to an exemplary embodiment, as shown in fig. 6, further including:
an obtaining module 305, configured to obtain position information of at least one text line image before the character to be recognized in the at least one text line image is recognized through a preset character recognition model;
the recognition module 304 is configured to recognize the character to be recognized in at least one of the text line images through the preset character recognition model and the position information.
FIG. 7 is a block diagram of the recognition module 304 according to an exemplary embodiment. The preset character recognition model includes a deep learning layer, a recurrent network layer, and an encoding layer. As shown in FIG. 7, the recognition module 304 includes:
an extracting sub-module 3041, configured to extract character features from the at least one text line image by means of the deep learning layer;
a fourth obtaining sub-module 3042, configured to input the extracted character features to the recurrent network layer to obtain a feature vector corresponding to the at least one text line image;
a fifth obtaining sub-module 3043, configured to input the feature vector to the encoding layer to obtain an encoding result of the at least one text line image, and to obtain text information of the at least one text line image according to the encoding result;
a sixth obtaining sub-module 3044, configured to arrange the text information of the at least one text line image in order according to the position information to obtain a target recognition result of the target image.
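The last two steps above (turning an encoding result into text, then ordering the text lines by position) can be sketched as follows. This passage does not say how the encoding result is converted to text, so the sketch assumes a CTC-style greedy collapse (merge consecutive repeats, then drop a blank label), which is a common choice for recognizers of this shape but is an assumption here. The `BLANK` index, the `(top, left)` form of the position information, and both function names are likewise hypothetical.

```python
from typing import Sequence, Tuple

BLANK = 0  # hypothetical index reserved for the "no character" label


def greedy_collapse(label_ids: Sequence[int], alphabet: str) -> str:
    """Merge consecutive repeats, then drop blanks (CTC-style, assumed)."""
    chars = []
    previous = BLANK
    for label in label_ids:
        if label != previous and label != BLANK:
            chars.append(alphabet[label - 1])  # label 1 maps to alphabet[0]
        previous = label
    return "".join(chars)


def arrange_by_position(lines: Sequence[Tuple[Tuple[int, int], str]]) -> str:
    """Order text lines top-to-bottom, then left-to-right, and join them."""
    ordered = sorted(lines, key=lambda item: item[0])
    return "\n".join(text for _, text in ordered)
```

Sorting on the `(top, left)` pair means that lines recognized in any order are restored to reading order before they are joined into the target recognition result.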
With the above apparatus, the image category of the target image is first determined, and the correction processing mode corresponding to the target image is determined according to that image category. The target image is then corrected in the corresponding correction processing mode, at least one text line image is extracted from the corrected target image, and finally the character to be recognized in the at least one text line image is recognized by the character recognition model. Because different image categories correspond to different correction processing modes, images of different categories can each be corrected in the mode suited to them before character recognition is performed on the corrected images.
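The overall flow summarized above can be sketched as a category-driven dispatch. The sketch is an illustration under stated assumptions: the category labels, the `CORRECTION_MODES` table, and the function `recognize_characters` with its four callable parameters are hypothetical names, and the actual classification, correction, line extraction, and recognition steps are passed in as placeholders rather than implemented.

```python
from typing import Any, Callable, Dict, List

# Hypothetical category labels; the source names document and scene images.
DOCUMENT = "document"
SCENE = "scene"

# Correction modes the source associates with each image category.
# Whether a given mode actually changes the image is left to `correct`.
CORRECTION_MODES: Dict[str, List[str]] = {
    DOCUMENT: ["direction", "distortion"],
    SCENE: ["direction"],
}


def recognize_characters(image: Any,
                         classify: Callable[[Any], str],
                         correct: Callable[[Any, str], Any],
                         extract_lines: Callable[[Any], List[Any]],
                         recognize_line: Callable[[Any], str]) -> List[str]:
    """Classify, correct by category, extract text lines, recognize each."""
    category = classify(image)
    for mode in CORRECTION_MODES[category]:
        image = correct(image, mode)
    return [recognize_line(line) for line in extract_lines(image)]
```

Keeping the category-to-mode mapping in a table makes the central point of the passage explicit: the image category alone selects which corrections are attempted.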
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. As shown in FIG. 8, the electronic device 800 may include a processor 801 and a memory 802. The electronic device 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the electronic device 800 so as to complete all or part of the steps of the character recognition method described above. The memory 802 is configured to store various types of data to support operation of the electronic device 800, such as instructions for any application or method operating on the electronic device 800 and application-related data such as contact data, transmitted and received messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the character recognition method described above.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the character recognition method described above. For example, the computer readable storage medium may be the memory 802 described above that includes program instructions that are executable by the processor 801 of the electronic device 800 to perform the character recognition method described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure, as long as they do not depart from the spirit of the present disclosure.

Claims (10)

1. A method of character recognition, the method comprising:
determining an image category corresponding to a target image comprising characters to be recognized; different image categories correspond to different correction processing modes, and the image categories comprise document images and scene images;
correcting the target image in the correction processing mode corresponding to the image category;
extracting at least one text line image from the corrected target image;
recognizing the character to be recognized in at least one text line image through a preset character recognition model;
when the image category is a document image, the correction processing mode comprises direction correction processing and/or distortion correction processing; when the correction processing mode includes the direction correction processing and the distortion correction processing, the correcting the target image in the correction processing mode corresponding to the image category includes:
acquiring a first inclination angle between the character to be recognized in the document image and a horizontal axis;
when the first inclination angle is larger than or equal to a preset angle, performing direction correction processing on the document image;
determining whether the character to be recognized in the document image has distortion;
when the character to be recognized in the document image has distortion, carrying out distortion correction processing on the document image;
when the image category is a scene image, the correction processing mode comprises direction correction processing; the correcting the target image in the correction processing mode corresponding to the image category comprises:
performing text region detection on the scene image to obtain at least one text region;
sequentially acquiring a second inclination angle between the character to be recognized and a horizontal axis in at least one text region;
and when the second inclination angle in at least one text region is greater than or equal to a preset angle, performing direction correction processing on the at least one text region.
2. The method of claim 1, wherein determining the image category corresponding to the target image including the character to be recognized comprises:
acquiring an image sample of the determined image category;
and determining the image category corresponding to the target image according to the image sample.
3. The method according to claim 1, wherein before the recognizing the character to be recognized in at least one of the text line images by a preset character recognition model, the method further comprises:
acquiring position information of at least one text line image;
the recognizing the character to be recognized in at least one text line image through a preset character recognition model comprises:
and recognizing the character to be recognized in at least one text line image through the preset character recognition model and the position information.
4. The method according to claim 3, wherein the preset character recognition model comprises a deep learning layer, a recurrent network layer and an encoding layer, and the recognizing the character to be recognized in at least one of the text line images through the preset character recognition model and the position information comprises:
extracting character features of at least one text line image according to the deep learning layer;
inputting the extracted character features into the recurrent network layer to obtain a feature vector corresponding to at least one text line image;
inputting the feature vector into the encoding layer to obtain an encoding result of at least one text line image, and obtaining text information of at least one text line image according to the encoding result;
and sequentially arranging the text information of at least one text line image according to the position information to obtain a target recognition result of the target image.
5. An apparatus for character recognition, the apparatus comprising:
the determining module is used for determining the image category corresponding to the target image comprising the character to be recognized; different image categories correspond to different correction processing modes, and the image categories comprise document images and scene images;
the correction module is used for carrying out correction processing on the target image in a correction processing mode corresponding to the image category;
the extraction module is used for extracting at least one text line image from the corrected target image;
the recognition module is used for recognizing the character to be recognized in at least one text line image through a preset character recognition model;
when the image category is a document image, the correction processing mode comprises direction correction processing and/or distortion correction processing; when the correction processing mode includes the direction correction processing and the distortion correction processing, the correction module includes:
the second obtaining submodule is used for obtaining a first inclination angle between the character to be recognized and a horizontal axis in the document image;
the first correction submodule is used for performing direction correction processing on the document image when the first inclination angle is larger than or equal to a preset angle;
the second determining submodule is used for determining whether the character to be recognized in the document image has distortion or not;
the second correction submodule is used for carrying out distortion correction processing on the document image when the character to be recognized in the document image has distortion;
when the image category is a scene image, the correction processing mode comprises direction correction processing; the correction module includes:
the detection submodule is used for carrying out text region detection on the scene image to obtain at least one text region;
the third obtaining submodule is used for sequentially obtaining a second inclination angle between the character to be recognized and the horizontal axis in at least one text region;
and the third correction submodule is used for performing direction correction processing on at least one text region when the second inclination angle in at least one text region is greater than or equal to a preset angle.
6. The apparatus of claim 5, wherein the determining module comprises:
the first obtaining sub-module is used for obtaining an image sample of the determined image category;
and the first determining submodule is used for determining the image category corresponding to the target image according to the image sample.
7. The apparatus of claim 5, further comprising:
the acquisition module is used for acquiring the position information of at least one text line image before the character to be recognized in the at least one text line image is recognized through a preset character recognition model;
the recognition module is used for recognizing the character to be recognized in at least one text line image through the preset character recognition model and the position information.
8. The apparatus of claim 7, wherein the preset character recognition model comprises a deep learning layer, a recurrent network layer, and an encoding layer, and wherein the recognition module comprises:
the extraction submodule is used for extracting character features of at least one text line image according to the deep learning layer;
the fourth obtaining submodule is used for inputting the extracted character features into the recurrent network layer to obtain a feature vector corresponding to at least one text line image;
the fifth obtaining submodule is used for inputting the feature vector to the encoding layer to obtain an encoding result of the at least one text line image, and obtaining text information of the at least one text line image according to the encoding result;
and the sixth obtaining submodule is used for sequentially arranging the text information of at least one text line image according to the position information to obtain a target recognition result of the target image.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 4.
CN201880001125.2A 2018-07-11 2018-07-11 Character recognition method, device, storage medium and electronic equipment Active CN108885699B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/095295 WO2020010547A1 (en) 2018-07-11 2018-07-11 Character identification method and apparatus, and storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN108885699A (en) 2018-11-23
CN108885699B (en) 2020-06-26

Family

ID=64325024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880001125.2A Active CN108885699B (en) 2018-07-11 2018-07-11 Character recognition method, device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN108885699B (en)
WO (1) WO2020010547A1 (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695377B (en) * 2019-03-13 2023-09-29 杭州海康威视数字技术股份有限公司 Text detection method and device and computer equipment
CN111723627A (en) * 2019-03-22 2020-09-29 北京搜狗科技发展有限公司 Image processing method and device and electronic equipment
CN111832371A (en) * 2019-04-23 2020-10-27 珠海金山办公软件有限公司 Text picture correction method and device, electronic equipment and machine-readable storage medium
CN110490190B (en) * 2019-07-04 2021-10-26 贝壳技术有限公司 Structured image character recognition method and system
CN110674811B (en) * 2019-09-04 2022-04-29 广东浪潮大数据研究有限公司 Image recognition method and device
CN110807454A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Character positioning method, device and equipment based on image segmentation and storage medium
CN112949638B (en) * 2019-11-26 2024-04-05 金毛豆科技发展(北京)有限公司 Certificate image uploading method and device
CN113128306A (en) * 2020-01-10 2021-07-16 北京字节跳动网络技术有限公司 Vertical text line recognition method, device, equipment and computer readable storage medium
CN111242083B (en) * 2020-01-21 2024-01-26 腾讯云计算(北京)有限责任公司 Text processing method, device, equipment and medium based on artificial intelligence
CN111444908B (en) * 2020-03-25 2024-02-02 腾讯科技(深圳)有限公司 Image recognition method, device, terminal and storage medium
CN111444834A (en) * 2020-03-26 2020-07-24 同盾控股有限公司 Image text line detection method, device, equipment and storage medium
CN111353493B (en) * 2020-03-31 2023-04-28 中国工商银行股份有限公司 Text image direction correction method and device
CN113554558A (en) * 2020-04-26 2021-10-26 北京金山数字娱乐科技有限公司 Image processing method and device
CN111563502B (en) * 2020-05-09 2023-12-15 腾讯科技(深圳)有限公司 Image text recognition method and device, electronic equipment and computer storage medium
CN111639566A (en) * 2020-05-19 2020-09-08 浙江大华技术股份有限公司 Method and device for extracting form information
CN111611933B (en) * 2020-05-22 2023-07-14 中国科学院自动化研究所 Information extraction method and system for document image
CN111814538B (en) * 2020-05-25 2024-03-05 北京达佳互联信息技术有限公司 Method and device for identifying category of target object, electronic equipment and storage medium
CN111832558A (en) * 2020-06-15 2020-10-27 北京三快在线科技有限公司 Character image correction method, device, storage medium and electronic equipment
CN111695566B (en) * 2020-06-18 2023-03-14 郑州大学 Method and system for identifying and processing fixed format document
CN111753850A (en) * 2020-06-29 2020-10-09 珠海奔图电子有限公司 Document processing method and device, computer equipment and computer readable storage medium
CN111767859A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Image correction method and device, electronic equipment and computer-readable storage medium
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112001331A (en) * 2020-08-26 2020-11-27 上海高德威智能交通系统有限公司 Image recognition method, device, equipment and storage medium
CN112149663A (en) * 2020-08-28 2020-12-29 北京来也网络科技有限公司 RPA and AI combined image character extraction method and device and electronic equipment
CN114429632B (en) * 2020-10-15 2023-12-12 腾讯科技(深圳)有限公司 Method, device, electronic equipment and computer storage medium for identifying click-to-read content
CN112364834A (en) * 2020-12-07 2021-02-12 上海叠念信息科技有限公司 Form identification restoration method based on deep learning and image processing
CN112560862B (en) * 2020-12-17 2024-02-13 北京百度网讯科技有限公司 Text recognition method and device and electronic equipment
CN112699871B (en) * 2020-12-23 2023-11-14 平安银行股份有限公司 Method, system, device and computer readable storage medium for identifying field content
CN112733623A (en) * 2020-12-26 2021-04-30 科大讯飞华南人工智能研究院(广州)有限公司 Text element extraction method, related equipment and readable storage medium
CN112784932A (en) * 2021-03-01 2021-05-11 北京百炼智能科技有限公司 Font identification method and device and storage medium
CN113033377A (en) * 2021-03-16 2021-06-25 北京有竹居网络技术有限公司 Character position correction method, character position correction device, electronic equipment and storage medium
CN113191345A (en) * 2021-04-28 2021-07-30 北京有竹居网络技术有限公司 Text line direction determining method and related equipment thereof
CN113076961B (en) * 2021-05-12 2023-09-05 北京奇艺世纪科技有限公司 Image feature library updating method, image detection method and device
CN113408270B (en) * 2021-06-10 2023-02-10 广州三七极创网络科技有限公司 Variant text recognition method and device and electronic equipment
CN113298079B (en) * 2021-06-28 2023-10-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113610073A (en) * 2021-06-29 2021-11-05 北京搜狗科技发展有限公司 Method and device for identifying formula in picture and storage medium
CN113642556A (en) * 2021-08-04 2021-11-12 五八有限公司 Image processing method and device, electronic equipment and storage medium
CN113657364B (en) * 2021-08-13 2023-07-25 北京百度网讯科技有限公司 Method, device, equipment and storage medium for identifying text mark
CN114155546B (en) * 2022-02-07 2022-05-20 北京世纪好未来教育科技有限公司 Image correction method and device, electronic equipment and storage medium
CN114495106A (en) * 2022-04-18 2022-05-13 电子科技大学 MOCR (metal-oxide-semiconductor resistor) deep learning method applied to DFB (distributed feedback) laser chip
CN115640401B (en) * 2022-12-07 2023-04-07 恒生电子股份有限公司 Text content extraction method and device
CN115983938A (en) * 2022-12-13 2023-04-18 北京京东拓先科技有限公司 Online medicine purchasing management method and device
CN117237957A (en) * 2023-11-16 2023-12-15 新视焰医疗科技(杭州)有限公司 Method and system for detecting direction of document and correcting inclined or malformed document

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636743B (en) * 2013-11-06 2021-09-03 北京三星通信技术研究有限公司 Method and device for correcting character image
CN105631448B (en) * 2015-12-28 2019-03-08 小米科技有限责任公司 Method for correcting image and device
CN107610091A (en) * 2017-07-31 2018-01-19 阿里巴巴集团控股有限公司 Vehicle insurance image processing method, device, server and system
CN107862303B (en) * 2017-11-30 2019-04-26 平安科技(深圳)有限公司 Information identifying method, electronic device and the readable storage medium storing program for executing of form class diagram picture

Also Published As

Publication number Publication date
CN108885699A (en) 2018-11-23
WO2020010547A1 (en) 2020-01-16

Similar Documents

Publication Publication Date Title
CN108885699B (en) Character recognition method, device, storage medium and electronic equipment
US11893782B2 (en) Recurrent deep neural network system for detecting overlays in images
CN110517246B (en) Image processing method and device, electronic equipment and storage medium
CN108388879B (en) Target detection method, device and storage medium
US9141874B2 (en) Feature extraction and use with a probability density function (PDF) divergence metric
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
US9076242B2 (en) Automatic correction of skew in natural images and video
WO2014092979A1 (en) Method of perspective correction for devanagari text
US9619753B2 (en) Data analysis system and method
JP2008217347A (en) License plate recognition device, its control method and computer program
CN103198311B (en) Image based on shooting recognizes the method and device of character
US9836456B2 (en) Techniques for providing user image capture feedback for improved machine language translation
US11087137B2 (en) Methods and systems for identification and augmentation of video content
CN114723646A (en) Image data generation method with label, device, storage medium and electronic equipment
CN110969154A (en) Text recognition method and device, computer equipment and storage medium
CN112052702A (en) Method and device for identifying two-dimensional code
CN112396594A (en) Change detection model acquisition method and device, change detection method, computer device and readable storage medium
CN111160340B (en) Moving object detection method and device, storage medium and terminal equipment
CN113221718B (en) Formula identification method, device, storage medium and electronic equipment
KR102101481B1 (en) Apparatus for lenrning portable security image based on artificial intelligence and method for the same
CN113343983B (en) License plate number recognition method and electronic equipment
CN116501176B (en) User action recognition method and system based on artificial intelligence
CN117372286B (en) Python-based image noise optimization method and system
US20240127588A1 (en) Recurrent Deep Neural Network System for Detecting Overlays in Images
CN116434242A (en) Text detection method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210301

Address after: 201111 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Patentee after: Dalu Robot Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Patentee before: Shenzhen Qianhaida Yunyun Intelligent Technology Co.,Ltd.

CP03 Change of name, title or address

Address after: 201111 Building 8, No. 207, Zhongqing Road, Minhang District, Shanghai

Patentee after: Dayu robot Co.,Ltd.

Address before: 201111 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Patentee before: Dalu Robot Co.,Ltd.