CN111767859A - Image correction method and device, electronic equipment and computer-readable storage medium - Google Patents

Image correction method and device, electronic equipment and computer-readable storage medium Download PDF

Info

Publication number
CN111767859A
CN111767859A
Authority
CN
China
Prior art keywords
image
text
text region
semantic segmentation
preprocessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010612625.9A
Other languages
Chinese (zh)
Inventor
杨舰
庞敏辉
谢国斌
陈兴波
韩光耀
张瑾
冯博豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010612625.9A priority Critical patent/CN111767859A/en
Publication of CN111767859A publication Critical patent/CN111767859A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines

Abstract

The application discloses an image correction method and device, electronic equipment, and a computer-readable storage medium, relating to the technical fields of image processing and deep learning. The specific implementation scheme is as follows: acquire a preprocessed image, obtained through preprocessing, that contains a text region; perform a four-class rotation (0°, 90°, 180°, 270°) on the preprocessed image, and take the rotated candidate with the smallest included angle between the text lines of the text region and a preset direction as the initially rotated image; perform semantic segmentation on the initially rotated image to obtain the text region after semantic segmentation; calculate the included angle between the text lines of the segmented text region and the preset direction; and rotate the initially rotated image by that angle to obtain a corrected image in which the text lines of the text region are aligned with the preset direction, so that the preprocessed image can be conveniently read and recognized in subsequent processing.

Description

Image correction method and device, electronic equipment and computer-readable storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to the technical field of image processing and the technical field of deep learning, and particularly relates to a method and a device for image correction, electronic equipment and a computer-readable storage medium.
Background
To comply with national regulatory policy, companies and banks in the consumer credit business must manually review the large volume and wide variety of consumption vouchers uploaded by clients every day, checking whether the information on each receipt matches the information in the application form.
In practice, after completing a purchase, users typically photograph the consumption voucher casually in a natural environment with a portable device such as a mobile phone and upload it to satisfy the review requirements of consumer-credit companies and banks.
Disclosure of Invention
The application provides a method, a device, an electronic device and a storage medium for image correction.
In a first aspect, an embodiment of the present application provides a method for image correction, including: acquiring a preprocessed image obtained through preprocessing, wherein the preprocessed image contains a text region; performing a four-class rotation on the preprocessed image, and taking the rotated candidate with the smallest included angle between the text lines of the text region and a preset direction as the initially rotated image; performing semantic segmentation on the initially rotated image to obtain the text region after semantic segmentation; calculating the included angle between the text lines of the segmented text region and the preset direction; and rotating the initially rotated image by the included angle to obtain a corrected image, wherein the text lines of the text region in the corrected image are aligned with the preset direction.
In a second aspect, an embodiment of the present application provides an apparatus for image correction, including: an image acquisition unit configured to acquire a preprocessed image obtained through preprocessing, wherein the preprocessed image contains a text region; a first rotation unit configured to perform a four-class rotation on the preprocessed image and take the rotated candidate with the smallest included angle between the text lines of the text region and a preset direction as the initially rotated image; a semantic segmentation unit configured to perform semantic segmentation on the initially rotated image to obtain the text region after semantic segmentation; an included angle calculation unit configured to calculate the included angle between the text lines of the segmented text region and the preset direction; and a second rotation unit configured to rotate the initially rotated image by the included angle to obtain a corrected image, wherein the text lines of the text region in the corrected image are aligned with the preset direction.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of image correction as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium having computer instructions stored thereon, comprising: the computer instructions are for causing the computer to perform a method of image correction as described in any implementation of the first aspect.
After a preprocessed image obtained through preprocessing is acquired, a four-class rotation is performed on it, and the rotated candidate with the smallest included angle between the text lines of the text region and the preset direction is taken as the initially rotated image; semantic segmentation is then performed on the initially rotated image to obtain the segmented text region, the included angle between the text lines of the segmented text region and the preset direction is calculated, and the initially rotated image is rotated by that angle to obtain a corrected image whose text lines are aligned with the preset direction. After the image is corrected through these steps, the characters in the preprocessed image are arranged horizontally and upright, which facilitates subsequent reading and review of the text content and improves reading and review efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of image correction according to the present application;
FIG. 3 is a diagram of a deep high-resolution representation learning neural network architecture according to another embodiment of a method of image correction of the present application;
FIG. 4 is a flow diagram of another embodiment of a method of image correction according to the present application;
FIG. 5a is a diagram illustrating the result of image correction of an application scene according to the method of image correction of the present application;
FIG. 5b is a diagram illustrating the result of image correction of an application scene according to the method of image correction of the present application;
FIG. 5c is a diagram illustrating the result of image correction of an application scene according to the method of image correction of the present application;
FIG. 5d is a diagram illustrating the result of image correction of an application scene according to the method of image correction of the present application;
FIG. 6 is a schematic structural diagram of one embodiment of an image correction apparatus according to the present application;
FIG. 7 is a block diagram of an electronic device suitable for use in implementing image correction of embodiments of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method, apparatus, electronic device, and computer-readable storage medium of image correction of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various image acquisition applications, such as a consumption audit application, an image document conversion application, an image text audit application, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., for implementing an image correction service) or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server providing various services. For example, upon receiving an image correction request from a user, the server acquires a preprocessed image obtained through preprocessing from the terminal devices 101, 102, and 103 via the network 104, performs a four-class rotation on the preprocessed image, takes the rotated candidate with the smallest included angle between the text lines of the text region and the preset direction as the initially rotated image, performs semantic segmentation on the initially rotated image to obtain the segmented text region, calculates the included angle between the text lines of the segmented text region and the preset direction, and rotates the initially rotated image by the included angle to obtain a corrected image in which the text lines of the text region are aligned with the preset direction.
It should be noted that the method for image correction provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for image correction is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules, for example, to provide distributed services, or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be further noted that the terminal devices 101, 102, and 103 may also be installed with an image correction application, in which case the terminal devices themselves may acquire the preprocessed image obtained through preprocessing, perform the four-class rotation on it, take the rotated candidate with the smallest included angle between the text lines of the text region and the preset direction as the initially rotated image, perform semantic segmentation on the initially rotated image to obtain the segmented text region, calculate the included angle between the text lines of the segmented text region and the preset direction, and rotate the initially rotated image by the included angle to obtain a corrected image in which the text lines of the text region are aligned with the preset direction.
At this time, the method of image correction may be executed by the terminal apparatuses 101, 102, 103, and accordingly, the apparatus of image correction may be provided in the terminal apparatuses 101, 102, 103. At this point, the exemplary system architecture 100 may also not include the server 105 and the network 104.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of image correction according to the present application is shown. The image correction method comprises the following steps:
step 201, a preprocessed image obtained through preprocessing is acquired.
In this embodiment, an executing body of the method for image correction (for example, the server 105 shown in fig. 1) may obtain a preprocessed image either from a local storage device or from a non-local storage device (for example, the terminal devices 101, 102, 103 shown in fig. 1) through a wired or wireless connection (for example, the network 104 shown in fig. 1). It should be understood that the preprocessed image must contain a text region, because the aim of this application is to correct the preprocessed image according to the content of that text region, so that the text used in subsequent reading, review, and extraction is horizontal and upright.
The local storage device may be a data storage module disposed in the execution body, and in this case, the preprocessed image obtained by preprocessing is acquired only by locally reading the preprocessed image; the non-local storage device may be another data storage server dedicated to storing the pre-processed image uploaded by the user, in which case the executing entity may obtain the pre-processed image returned by the data storage server by sending a pre-processed image obtaining command to the data storage server.
The preprocessing mainly applies unified standardization to the input image: it eliminates irrelevant information, recovers the useful real information, enhances the detectability of the relevant information, and simplifies the data as far as possible, thereby improving the reliability of image segmentation.
In some optional implementations of this embodiment, the pre-processing includes: at least one of an image format normalization operation, a watermark removal operation, and a contrast enhancement operation.
Specifically, since picture storage formats in natural scenes are not uniform (for example RGB, RGBA, and JPEG), the picture format is first recognized, and the picture data is then uniformly converted into a three-channel RGB format, which is convenient for the subsequent steps.
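The format normalization step can be sketched as follows (a minimal NumPy-only illustration, assuming the image has already been decoded into an array; the function name `to_rgb` and the white-background compositing choice are illustrative, not from the patent):

```python
import numpy as np

def to_rgb(img: np.ndarray) -> np.ndarray:
    """Normalize a decoded image array to 3-channel RGB of shape (H, W, 3)."""
    if img.ndim == 2:                      # grayscale -> replicate channels
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 4:                 # RGBA -> resolve transparency
        rgb = img[..., :3].astype(np.float32)
        alpha = img[..., 3:4].astype(np.float32) / 255.0
        # composite over a white background so the alpha channel is removed
        return (rgb * alpha + 255.0 * (1.0 - alpha)).astype(np.uint8)
    return img                             # already 3-channel RGB

gray = np.full((2, 2), 128, dtype=np.uint8)
print(to_rgb(gray).shape)  # (2, 2, 3)
```

In a real pipeline the decoding itself (JPEG, PNG, ...) would be handled by an image library; this sketch only covers the channel-layout unification the paragraph describes.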
An image shot in a natural scene may be watermarked automatically by the device when it is captured or uploaded; such a watermark is irrelevant image information. In the preprocessing stage, the watermark may be eliminated using any existing or future watermark removal technique, which this application does not limit. For example, the watermark may be removed using existing watermark removal tools.
In some optional implementations of this embodiment, the step of the watermark cancellation operation includes: and outputting the original image to a pre-trained neural network for watermark elimination operation to obtain the preprocessed image.
Specifically, a convolutional neural network (ConvNet) can be used to process the image according to the image convolution principle and remove the watermark, so that the subsequently recognized content is clear and accurate; in particular, when the semantic segmentation operation is performed on the image, recognition errors caused by watermark characters are avoided.
In addition, when a user shoots an image with a handheld device such as a mobile phone or a tablet computer, the lighting is not always controlled, so the image may be overexposed or too dark, which is very disadvantageous for later image segmentation. Therefore, in the preprocessing stage, a pre-trained algorithm, a neural network, or existing image processing software can be used to enhance the contrast of the image, for example using OpenCV, so as to highlight the text region and reduce the difficulty of segmentation-based correction.
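As one simple stand-in for the contrast enhancement mentioned above (the patent only names OpenCV loosely; this sketch uses a plain linear min-max stretch in NumPy, which is an assumption rather than the patent's exact method):

```python
import numpy as np

def stretch_contrast(img: np.ndarray) -> np.ndarray:
    """Linearly stretch pixel intensities to the full [0, 255] range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                     # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return out.astype(np.uint8)

# A dim image occupying only [100, 130] is stretched to [0, 255].
dim = np.array([[100, 110], [120, 130]], dtype=np.uint8)
stretched = stretch_contrast(dim)  # [[0, 85], [170, 255]]
```

Production code would more likely use histogram equalization or CLAHE from an image library, but the effect (pulling a compressed intensity range apart so text stands out) is the same.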
After the images are preprocessed in the preprocessing modes, the text regions contained in the preprocessed images can be easily determined in the obtained preprocessed images, so that the preprocessed images can be conveniently subjected to subsequent operation according to the text regions, and the efficiency and the accuracy of image correction are improved.
Step 202, performing a four-class rotation on the preprocessed image, and taking the rotated candidate with the smallest included angle between the text lines of the text region and the preset direction as the initially rotated image.
In this embodiment, the preprocessed image obtained in step 201 is input into a pre-trained rotation neural network and processed with a predetermined rotation model algorithm. The four-class rotation generally means that 360° is divided into the four angles 0°, 90°, 180°, and 270°, and the image is rotated by each of them in turn. The pre-trained rotation network evaluates the four rotated candidates and determines which rotation makes the included angle between the text lines of the text region and the preset direction smallest; that candidate is taken as the initially rotated image.
For example, if, after the preprocessed image is rotated by 180 degrees, the angle between the positive direction of the text lines of the text region and the positive horizontal direction is determined to be acute, then the image obtained by the 180-degree rotation is taken as the initially rotated image. Similarly, the four-class rotation may be performed by angle detection, rotating the image according to the detected angle to achieve the same purpose.
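The selection logic of this step can be sketched in a few lines (the scoring function stands in for the pre-trained rotation network, whose output is stubbed here; the names are illustrative):

```python
import numpy as np

CANDIDATE_ROTATIONS = (0, 90, 180, 270)

def four_class_rotate(img, residual_angle_fn):
    """Try all four 90-degree rotations and keep the candidate whose
    residual text-line angle (as judged by residual_angle_fn, a stand-in
    for the pre-trained rotation network) is smallest."""
    best_img, best_angle = img, float("inf")
    for k, deg in enumerate(CANDIDATE_ROTATIONS):
        candidate = np.rot90(img, k=k)
        angle = abs(residual_angle_fn(candidate, deg))
        if angle < best_angle:
            best_img, best_angle = candidate, angle
    return best_img, best_angle

# Stub scorer: pretend the classifier reports the document is upside down,
# so the 180-degree candidate has the smallest residual angle.
scorer = lambda cand, deg: {0: 178, 90: 92, 180: 2, 270: 88}[deg]
img = np.arange(6).reshape(2, 3)
best, angle = four_class_rotate(img, scorer)
```

In the patent's pipeline the scorer would be the ResNet-based angle classifier described in the following optional implementation.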
In some optional implementations of the embodiment, performing the four-class rotation on the preprocessed image comprises: inputting the preprocessed image into a pre-trained residual neural network for the four-class rotation.
Specifically, a residual neural network (ResNet) can be used to extract deep features of the preprocessed image, with the text region angle in the image serving as the sample label for training a text-region angle recognition model. After the angle of the text region in the image is detected, rotation correction is performed according to the detected angle, so that the included angle between the positive direction of the text lines of the text region in the rotated image and the positive horizontal direction is acute or 0 degrees. After the four-class rotation is applied to the preprocessed image by the rotation network, the angle between the text region and the standard direction falls within an acute range, which makes it convenient to subsequently adjust the direction of the text region to exactly 0 degrees.
And 203, performing semantic segmentation on the primarily rotated image to obtain a text region after the semantic segmentation.
In this embodiment, the text region in the initially rotated image determined in step 202 may be identified using a pre-trained semantic segmentation neural network with a predetermined semantic segmentation model algorithm, which recovers a high-resolution representation from the low-resolution representation generated by a high-to-low resolution network. First, an encoder downsamples the picture and extracts its identifying features; then a decoder upsamples the result to project the semantics back onto pixel space (high resolution), yielding a dense classification whose region can be regarded as the text region in the initially rotated image.
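The encoder downsample / decoder upsample flow can be illustrated with the simplest possible building blocks (plain NumPy average pooling and nearest-neighbor upsampling in place of the learned convolutional layers; this is a structural sketch, not the patent's network):

```python
import numpy as np

def downsample2(x: np.ndarray) -> np.ndarray:
    """Encoder step: 2x average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Decoder step: 2x nearest-neighbor upsampling back toward pixel space."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

img = np.arange(16, dtype=np.float32).reshape(4, 4)
low = downsample2(img)            # (2, 2) low-resolution representation
restored = upsample2(low)         # (4, 4) projected back to full resolution
```

A real segmentation network would learn these transforms and emit per-pixel class scores rather than raw intensities, but the shape bookkeeping is the same.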
In some optional implementations of the embodiment, performing semantic segmentation on the primarily rotated image includes: and inputting the primarily rotated image into a depth high-resolution representation learning neural network trained in advance for semantic segmentation.
Specifically, a pre-trained deep high-resolution representation learning neural network (HRNetV2) may be used. As shown in fig. 3, this network starts directly from a high-resolution subnetwork as the first stage, gradually adds subnetworks from high resolution to low resolution to form further stages, and connects the multi-resolution subnetworks in parallel. At the same time, multi-scale feature fusion is performed repeatedly, so that each representation, from high resolution to low resolution, repeatedly receives information from the other parallel representations, thereby obtaining rich high-resolution representations and improving the accuracy of semantic recognition.
And 204, calculating an included angle between the text line of the text region after the semantic segmentation and the preset direction.
In this embodiment, the text lines of the segmented text region are used as the reference: for example, the text pixels within a text line are connected, and the included angle is calculated from the angle between this pixel connecting line and the preset direction.
For example, the initially rotated image is binarized, the rows of text pixels in the binarized image are projected horizontally, and the text pixels belonging to the same text line are connected in the projection result to obtain a text-line connecting line; the initially rotated image is then rotated according to the angle of this connecting line to obtain a corrected image in which the text lines of the text region are aligned with the preset direction.
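One simple way to turn connected text pixels into an angle is a least-squares line fit over the foreground pixel coordinates (a sketch under the assumption that the binary mask covers a single text line; the function name is illustrative):

```python
import numpy as np

def estimate_skew_deg(binary: np.ndarray) -> float:
    """Estimate text-line skew by fitting a line through foreground pixels.
    `binary` is a 0/1 mask where 1 marks the text pixels of one text line."""
    ys, xs = np.nonzero(binary)            # row = y, column = x
    slope, _intercept = np.polyfit(xs, ys, 1)   # fit y = slope * x + b
    return float(np.degrees(np.arctan(slope)))

# A synthetic "text line" climbing one pixel every two columns (~26 degrees).
mask = np.zeros((10, 10), dtype=np.uint8)
for x in range(10):
    mask[x // 2, x] = 1
skew = estimate_skew_deg(mask)
```

Note that image row indices grow downward, so a positive result here means the line slopes downward on screen; the sign convention must match whichever rotation routine consumes the angle.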
Step 205, rotating the primarily rotated image according to the included angle to obtain a corrected image, where a text line of the text region in the corrected image is the same as the preset direction.
In this embodiment, the initially rotated image is rotated according to the included angle calculated in step 204 to obtain the corrected image. Since the rotation angle is determined from the included angle between the text lines of the text region and the preset direction, the text lines of the text region in the corrected image are aligned with the preset direction after this rotation.
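The final rotation by an arbitrary angle can be sketched via inverse coordinate mapping with nearest-neighbor sampling (a minimal grayscale-only illustration; real systems would use a library routine with interpolation and border handling):

```python
import numpy as np

def rotate_image(img: np.ndarray, deg: float) -> np.ndarray:
    """Rotate a 2-D image about its center by `deg` degrees
    (nearest-neighbor sampling, out-of-range pixels filled with 0)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.radians(deg)
    ys, xs = np.indices((h, w)).astype(np.float64)
    # Inverse mapping: for each output pixel, find the source location.
    sx = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    sy = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sxi, syi = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(img)
    out[valid] = img[syi[valid], sxi[valid]]
    return out
```

Feeding the angle from step 204 into such a routine straightens the text lines into the preset direction.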
The image correction method provided by the embodiment of the application acquires a preprocessed image obtained through preprocessing, wherein the preprocessed image contains a text region; performs a four-class rotation on the preprocessed image and takes the rotated candidate with the smallest included angle between the text lines of the text region and the preset direction as the initially rotated image; performs semantic segmentation on the initially rotated image to obtain the segmented text region; calculates the included angle between the text lines of the segmented text region and the preset direction; and rotates the initially rotated image by the included angle to obtain a corrected image whose text lines are aligned with the preset direction. With this method, the preprocessed image can be quickly rotated and adjusted, which facilitates subsequent reading and recognition.
In some optional implementations of embodiments of the present application, the method of the embodiment shown in fig. 2 further includes: cropping the initially rotated image.
Specifically, the initially rotated image is cropped according to the text region it contains, so that its size is regular after cropping, which facilitates reading.
Through fig. 4, the present application further provides an implementation flow 400 of another embodiment of image correction. Building on the flow 200 of the above embodiment, the flow 400 provides a concrete rotation scheme for the initially rotated image, including:
in step 401, a minimum rectangle that can enclose the text region in the primarily rotated image is determined.
Specifically, after semantic segmentation is performed on the preprocessed image by the pre-trained semantic segmentation neural network, the result is usually a pixel region corresponding to the text region in the initially rotated image; the minimum rectangle that can enclose this pixel region is then determined.
Step 402, obtaining the vertex coordinates of the minimum rectangle, and determining the inclination angle of the minimum rectangle according to the vertex coordinates.
Specifically, a coordinate system is established, coordinates are assigned to the vertices of the minimum rectangle, the posture of the rectangle is determined from these vertices, and the angle by which the rectangle is inclined with respect to the horizontal is determined.
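The angle determination in step 402 can be sketched as follows: take the two bottom-most vertices of the rectangle and measure the slope of the edge joining them with `atan2`. The function name and the image-coordinate convention (y grows downwards) are assumptions for illustration:

```python
import math

def tilt_angle(corners):
    """Tilt of a rectangle w.r.t. the horizontal, in degrees, from its
    vertex coordinates. `corners` are (x, y) points in image coordinates
    (y grows downwards); the angle is taken from the bottom edge, i.e.
    the edge joining the two bottom-most (largest-y) vertices."""
    pts = sorted(corners, key=lambda p: p[1])   # sort by y (image row)
    (x0, y0), (x1, y1) = sorted(pts[2:])        # two bottom vertices, left to right
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```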
In step 403, the primarily rotated image is rotated according to the inclination angle of the minimum rectangle.
As can be seen from fig. 4, on the basis of the embodiment corresponding to fig. 2, the flow 400 of the image correction method in this embodiment provides a specific way of rotating the initially rotated image. The scheme described in this embodiment therefore determines the rotation angle accurately, so that the initially rotated image can be rotated accurately.
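The rotation applied in step 403 is typically carried out as an affine warp about the image centre. As a sketch under that assumption (not the patent's stated implementation), the 2×3 matrix below has the same form as the one returned by OpenCV's `cv2.getRotationMatrix2D` with scale 1:

```python
import math
import numpy as np

def rotation_matrix(center, angle_deg):
    """2x3 affine matrix rotating points by `angle_deg` counter-clockwise
    about `center` = (cx, cy); same form as cv2.getRotationMatrix2D
    with scale = 1. Apply to a point as M @ [x, y, 1]."""
    cx, cy = center
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    return np.array([[a,  b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])
```

In an OpenCV-based pipeline the matrix would then be passed to a warp routine such as `cv2.warpAffine` to rotate the whole image.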
To deepen understanding, the application further provides a specific implementation scheme in combination with a specific application scenario. In this application scenario, the result of each processing step on the preprocessed image is shown in fig. 5a to 5d. The steps are as follows:
acquiring a preprocessed image obtained through preprocessing; wherein, the preprocessed image contains text regions;
inputting the preprocessed image into a pre-trained residual neural network for four-classification angle rotation processing, wherein the processing result is shown in fig. 5a, and the picture corresponding to panel (a) in fig. 5a is determined as the initially rotated image;
inputting the primarily rotated image into a pre-trained deep high-resolution representation learning neural network for semantic segmentation, wherein the obtained processing result is shown in fig. 5 b;
the smallest rectangle that can enclose the text region in the image after the first rotation is determined, as shown in fig. 5 c.
Establishing a coordinate system, obtaining the vertex coordinates of the minimum rectangle, and determining the inclination angle of the minimum rectangle from the vertex coordinates, for example from the difference between the coordinates of the two bottom vertices.
And rotating the primarily rotated image according to the inclination angle of the minimum rectangle.
The initially rotated image is cropped with reference to the text region it contains, and the result is shown in fig. 5 d.
This application scenario shows that the image correction method can quickly rotate the preprocessed image into alignment, facilitating subsequent reading and recognition of the image.
As shown in fig. 6, the image correction apparatus 600 of the present embodiment may include: an image acquisition unit 601 configured to acquire a preprocessed image obtained by preprocessing, wherein the preprocessed image contains a text region; a first rotation unit 602 configured to perform four-classification rotation (rotation by one of 0°, 90°, 180° and 270°) on the preprocessed image, and take the rotated preprocessed image with the smallest included angle between the text line of the text region and a preset direction as an initially rotated image; a semantic segmentation unit 603 configured to perform semantic segmentation on the initially rotated image to obtain the text region after semantic segmentation; an included angle calculation unit 604 configured to calculate the included angle between a text line of the segmented text region and the preset direction; and a second rotation unit 605 configured to rotate the initially rotated image according to the included angle, resulting in a corrected image in which the text lines of the text region are aligned with the preset direction.
In some optional implementations of the present embodiment, the pre-processing in the image acquisition unit 601 includes: at least one of an image format normalization operation, a watermark removal operation, and a contrast enhancement operation.
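As one concrete (assumed) form of the contrast enhancement operation, a simple min-max stretch maps the darkest pixel to 0 and the brightest to 255; the function name and the choice of a linear stretch are illustrative, not taken from the patent:

```python
import numpy as np

def stretch_contrast(image):
    """Linearly stretch 8-bit pixel intensities to the full 0-255 range:
    one simple form of contrast-enhancement preprocessing."""
    img = image.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                          # flat image: nothing to stretch
        return image.copy()
    return ((img - lo) / (hi - lo) * 255).astype(np.uint8)
```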
In some optional implementations of this embodiment, the watermark removal operation in the image acquisition unit 601 includes: inputting the original image into a pre-trained neural network to perform the watermark removal operation, obtaining the preprocessed image.
In some optional implementations of the present embodiment, the performing four-classification rotation on the preprocessed image in the first rotation unit 602 includes: inputting the preprocessed image into a pre-trained residual neural network to perform the four-classification rotation.
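The four-classification rotation can be sketched as follows: the classifier scores the four candidate orientations, and the image is rotated by the winning multiple of 90 degrees. The score vector, the function name, and the mapping from class index to number of counter-clockwise turns are assumptions; `np.rot90` stands in for the actual rotation:

```python
import numpy as np

def apply_four_class_rotation(image, scores):
    """Rotate `image` by the multiple of 90 degrees whose classifier
    score is highest. `scores` holds the (assumed) network outputs for
    the four classes, interpreted here as the number of 90-degree
    counter-clockwise turns needed: [0, 90, 180, 270] degrees."""
    k = int(np.argmax(scores))            # winning class index
    return np.rot90(image, k)
```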
In some optional implementations of this embodiment, performing semantic segmentation on the primarily rotated image in the semantic segmentation unit 603 includes: and inputting the primarily rotated image into a depth high-resolution representation learning neural network trained in advance for semantic segmentation.
In some optional implementations of this embodiment, the step of calculating the included angle between the text line of the text region after the semantic segmentation and the preset direction in the included angle calculation unit 604 includes: determining a minimum rectangle that can enclose the text region; acquiring the vertex coordinates of the minimum rectangle, and determining the inclination angle of the minimum rectangle according to the vertex coordinates; and calculating the included angle between the text line of the text region after the semantic segmentation and the preset direction according to the inclination angle of the minimum rectangle.
In some optional implementations of this embodiment, the apparatus shown above further includes: an image cropping unit configured to crop a size of the corrected image.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 7, it is a block diagram of an electronic device according to the method of image correction of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic apparatus includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of image correction provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of image correction provided herein.
The memory 702, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the method of image correction in the embodiment of the present application (for example, the image acquisition unit 601, the first rotation unit 602, the semantic segmentation unit 603, the angle calculation unit 604, and the second rotation unit 605 shown in fig. 6). The processor 701 executes various functional applications of the server and data processing, i.e., a method of implementing image correction in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 702.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device for image correction, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, which may be connected to image correction electronics over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of image correction may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus for image correction, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, a host product in a cloud computing service system intended to overcome the drawbacks of difficult management and weak service scalability in traditional physical host and Virtual Private Server (VPS) services.
According to the technical scheme of the embodiments of the application, a preprocessed image obtained through preprocessing is acquired, wherein the preprocessed image contains a text region; four-classification rotation (rotation by one of 0°, 90°, 180° and 270°) is performed on the preprocessed image, and the rotated preprocessed image whose text lines form the smallest included angle with a preset direction is taken as the initially rotated image; semantic segmentation is performed on the initially rotated image to obtain the text region after semantic segmentation; the included angle between a text line of the segmented text region and the preset direction is calculated; and the initially rotated image is rotated according to the included angle to obtain a corrected image in which the text lines of the text region are aligned with the preset direction. With this image correction method, the preprocessed image can be quickly rotated into alignment, which facilitates subsequent reading and recognition of the image.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method of image correction, comprising:
acquiring a preprocessed image obtained through preprocessing; the preprocessed image comprises a text region;
performing four-classification rotation on the preprocessed image, and taking the rotated preprocessed image with the minimum included angle between the text line of the text region and a preset direction as an image after primary rotation;
performing semantic segmentation on the image subjected to the primary rotation to obtain the text region subjected to the semantic segmentation;
calculating an included angle between a text line of the text region after the semantic segmentation and the preset direction;
and rotating the image after the primary rotation according to the included angle to obtain a corrected image, wherein the text line of the text region in the corrected image is the same as the preset direction.
2. The method of claim 1, wherein the pre-processing comprises:
at least one of an image format normalization operation, a watermark removal operation, and a contrast enhancement operation.
3. The method of claim 2, wherein the watermark removal operation comprises:
and outputting the original image to a pre-trained neural network for watermark elimination operation to obtain the preprocessed image.
4. The method of claim 1, wherein the performing four-classification rotation on the preprocessed image comprises:
inputting the preprocessed image into a pre-trained residual neural network to perform the four-classification rotation.
5. The method of claim 1, wherein the semantically segmenting the primarily rotated image comprises:
and inputting the primarily rotated image into a pre-trained deep high-resolution representation learning neural network for semantic segmentation.
6. The method according to claim 1, wherein the step of calculating an angle between a text line of the text region after the semantic segmentation and the preset direction comprises:
determining a minimum rectangle that can enclose the text region;
acquiring the vertex coordinates of the minimum rectangle, and determining the inclination angle of the minimum rectangle according to the vertex coordinates;
and calculating an included angle between the text line of the text region after the semantic segmentation and the preset direction according to the inclination angle of the minimum rectangle.
7. The method of any one of claims 1-6, further comprising:
and cutting the size of the corrected image.
8. An apparatus for image correction, comprising:
an image acquisition unit configured to acquire a preprocessed image obtained by preprocessing; the preprocessed image comprises a text region;
a first rotation unit configured to perform four-classification rotation on the preprocessed image, and take the rotated preprocessed image with the smallest included angle between the text line of the text region and a preset direction as an initially rotated image;
a semantic segmentation unit configured to perform semantic segmentation on the primarily rotated image to obtain the text region after the semantic segmentation;
an included angle calculation unit configured to calculate an included angle between a text line of the text region after the semantic segmentation and the preset direction;
a second rotation unit configured to rotate the primarily rotated image according to the included angle, resulting in a corrected image in which a text line of the text region is the same as the preset direction.
9. The apparatus of claim 8, wherein the preprocessing in the image acquisition unit comprises:
at least one of an image format normalization operation, a watermark removal operation, and a contrast enhancement operation.
10. The apparatus of claim 9, wherein the watermark removal operation comprises:
and outputting the original image to a pre-trained neural network for watermark elimination operation to obtain the preprocessed image.
11. The apparatus of claim 8, wherein the performing four-classification rotation on the preprocessed image in the first rotation unit comprises:
inputting the preprocessed image into a pre-trained residual neural network to perform the four-classification rotation.
12. The apparatus of claim 8, wherein the semantically segmenting the primarily rotated image in the semantically segmenting unit comprises:
and inputting the primarily rotated image into a pre-trained deep high-resolution representation learning neural network for semantic segmentation.
13. The apparatus according to claim 8, wherein the step of calculating an angle between a text line of the text region after the semantic segmentation and the preset direction in the angle calculation unit includes:
determining a minimum rectangle that can enclose the text region;
acquiring the vertex coordinates of the minimum rectangle, and determining the inclination angle of the minimum rectangle according to the vertex coordinates;
and calculating an included angle between the text line of the text region after the semantic segmentation and the preset direction according to the inclination angle of the minimum rectangle.
14. The apparatus of any one of claims 8-13, further comprising:
an image cropping unit configured to crop a size of the corrected image.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions, comprising: the computer instructions are for causing the computer to perform the method of any one of claims 1-7.
CN202010612625.9A 2020-06-30 2020-06-30 Image correction method and device, electronic equipment and computer-readable storage medium Pending CN111767859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010612625.9A CN111767859A (en) 2020-06-30 2020-06-30 Image correction method and device, electronic equipment and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN111767859A true CN111767859A (en) 2020-10-13

Family

ID=72724191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010612625.9A Pending CN111767859A (en) 2020-06-30 2020-06-30 Image correction method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111767859A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110052094A1 (en) * 2009-08-28 2011-03-03 Chunyu Gao Skew Correction for Scanned Japanese/English Document Images
CN106295638A (en) * 2016-07-29 2017-01-04 北京小米移动软件有限公司 Certificate image sloped correcting method and device
CN108921158A (en) * 2018-06-14 2018-11-30 众安信息技术服务有限公司 Method for correcting image, device and computer readable storage medium
CN109345460A (en) * 2018-09-28 2019-02-15 百度在线网络技术(北京)有限公司 Method and apparatus for correcting image
CN109961064A (en) * 2019-03-20 2019-07-02 深圳市华付信息技术有限公司 Identity card text positioning method, device, computer equipment and storage medium
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device
CN111260569A (en) * 2020-01-10 2020-06-09 百度在线网络技术(北京)有限公司 Method and device for correcting image inclination, electronic equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAMIR MALAKAR等: "Two-stage skew correction of handwritten Bangla document images", 《 2012 THIRD INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY》 *
WANG, Tao; JIANG, Jiahe: "Text recognition in arbitrary directions based on semantic segmentation technology", Applied Science and Technology, no. 03

Similar Documents

Publication Publication Date Title
CN111753727B (en) Method, apparatus, device and readable storage medium for extracting structured information
CN111523468B (en) Human body key point identification method and device
US10943107B2 (en) Simulating image capture
CN114072850A (en) Subtracting video background using depth
CN111260569A (en) Method and device for correcting image inclination, electronic equipment and storage medium
CN111709878A (en) Face super-resolution implementation method and device, electronic equipment and storage medium
US9355333B2 (en) Pattern recognition based on information integration
US11710210B1 (en) Machine-learning for enhanced machine reading of non-ideal capture conditions
JP7389824B2 (en) Object identification method and device, electronic equipment and storage medium
CN112380566A (en) Method, apparatus, electronic device, and medium for desensitizing document image
CN111539897A (en) Method and apparatus for generating image conversion model
CN114550177A (en) Image processing method, text recognition method and text recognition device
CN112241716B (en) Training sample generation method and device
CN112115921A (en) True and false identification method and device and electronic equipment
CN113055593B (en) Image processing method and device
CN111523292B (en) Method and device for acquiring image information
CN111552829B (en) Method and apparatus for analyzing image material
CN111767859A (en) Image correction method and device, electronic equipment and computer-readable storage medium
CN115422389A (en) Method for processing text image, neural network and training method thereof
CN113971810A (en) Document generation method, device, platform, electronic equipment and storage medium
CN112990201A (en) Text box detection method and device, electronic equipment and computer storage medium
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN112560678A (en) Expression recognition method, device, equipment and computer storage medium
JP7315639B2 (en) Paper data digitization method and device, electronic device, storage medium
CN112101281B (en) Face image detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination