CN111104941B - Image direction correction method and device and electronic equipment - Google Patents

Image direction correction method and device and electronic equipment

Info

Publication number
CN111104941B
Authority
CN
China
Prior art keywords
target
image
information
line segment
rotation angle
Prior art date
Legal status
Active
Application number
CN201911115498.5A
Other languages
Chinese (zh)
Other versions
CN111104941A
Inventor
郭双双
龚星
李斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911115498.5A
Publication of CN111104941A
Application granted
Publication of CN111104941B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/146 - Aligning or centring of the image pick-up or image-field
    • G06V30/1475 - Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 - Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/469 - Contour-based spatial representations, e.g. vector-coding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/48 - Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image direction correction method, an image direction correction device and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring an image to be corrected, and performing feature extraction on a target object in the image to be corrected through an image processing model to acquire detection information corresponding to the target object; vectorizing the detection information to obtain coordinate information corresponding to the target object; and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle. The method and the device can perform fine-grained correction of the rotation angle of a tilted image and improve the accuracy of image direction correction; at the same time, they can improve the detection efficiency and recognition accuracy of the information in a document image, which greatly improves the efficiency and accuracy of subsequent information structuring.

Description

Image direction correction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to an image direction correction method, an image direction correction device, a computer storage medium, and an electronic apparatus.
Background
With the rapid development of computer technology, business in every industry is gradually shifting from manual processing to machine processing. With the explosive growth of data volume, ever higher demands are placed on the efficiency and accuracy of machine processing.
Taking document information extraction as an example, a document image may be tilted because of the shooting angle or other factors, and the tilted document image needs to be corrected in order to ensure the efficiency and accuracy of document information recognition. At present, document images are mainly corrected by rotating the image by a limited number of fixed discrete values, such as 90 degrees or 180 degrees. This approach is very limited and performs poorly on document images tilted at other angles.
In view of this, there is a need in the art to develop a new image direction correction method.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides an image direction correction method, an image direction correction device, a computer storage medium and electronic equipment, so that fine granularity direction correction can be performed on an inclined image at least to a certain extent, and the accuracy of image direction correction is improved.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the embodiments of the present disclosure, there is provided an image direction correction method including: acquiring an image to be corrected, and extracting characteristics of a target object in the image to be corrected through an image processing model so as to acquire detection information corresponding to the target object; vectorizing the detection information to obtain coordinate information corresponding to the target object; and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle.
According to an aspect of the embodiments of the present disclosure, there is provided an image direction correcting apparatus including: the detection information acquisition module is used for acquiring an image to be corrected, and extracting characteristics of a target object in the image to be corrected through an image processing model so as to acquire detection information corresponding to the target object; the coordinate information acquisition module is used for carrying out vectorization processing on the detection information so as to acquire coordinate information corresponding to the target object; and the image direction correction module is used for determining a rotation angle corresponding to the image to be corrected based on the coordinate information and correcting the direction of the image to be corrected according to the rotation angle.
In some embodiments of the present disclosure, the image processing model includes a feature extraction sub-model, a feature fusion sub-model, and a post-processing sub-model; based on the foregoing solution, the detection information obtaining module includes: the target feature information acquisition unit is used for carrying out multi-layer convolution on the target object through the feature extraction submodel so as to acquire multi-level target feature information; the target fusion characteristic information acquisition unit is used for carrying out characteristic fusion according to the target characteristic information of each level through the characteristic fusion sub-model so as to acquire target fusion characteristic information; and the detection information acquisition unit is used for carrying out convolution processing on the target fusion characteristic information through the post-processing sub-model so as to acquire detection information corresponding to the target object.
In some embodiments of the disclosure, the feature extraction sub-model comprises a first convolution layer, a pooling layer connected with the first convolution layer, and a residual network module connected with the pooling layer, wherein the residual network module comprises M+1 sequentially connected residual network layers, M being a positive integer; based on the foregoing, the target feature information acquisition unit is configured to: input the image to be corrected into the first convolution layer, and perform feature extraction on the target object through the first convolution layer to obtain initial feature information; input the initial feature information into the pooling layer, and perform dimension reduction on the initial feature information through the pooling layer to obtain dimension-reduced feature information; and input the dimension-reduced feature information into the residual network module, and perform feature extraction on the dimension-reduced feature information through the sequentially connected residual network layers in the residual network module to obtain the multi-level target feature information.
In some embodiments of the disclosure, the feature fusion sub-model includes N sequentially connected fusion network layers and a second convolution layer connected to the N-th fusion network layer; based on the foregoing, the target fusion feature information acquiring unit is configured to: fuse, through the n-th fusion network layer, the (n-1)-level fused feature information with the target feature information output by the (M+1-n)-th residual network layer to obtain n-level fused feature information; repeat the previous step until N-level fused feature information is obtained; and input the N-level fused feature information into the second convolution layer, and perform feature extraction on the N-level fused feature information through the second convolution layer to obtain the target fusion feature information; wherein the zero-level fused feature information is the target feature information output by the (M+1)-th residual network layer, n is a positive integer not exceeding N, and N is a positive integer not exceeding M.
In some embodiments of the disclosure, the target object is text, the detection information is text detection information, and the post-processing sub-model includes a third convolution layer, a fourth convolution layer, and a fifth convolution layer that are independent of each other; based on the foregoing, the detection information acquisition unit is configured to: extracting features of the target fusion feature information through the third convolution layer to obtain a text detection score map in the text detection information; extracting features of the target fusion feature information through the fourth convolution layer to obtain a text distance regression graph in the text detection information; and extracting features of the target fusion feature information through the fifth convolution layer to obtain character frame angle information in the character detection information.
In some embodiments of the disclosure, the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing sub-model includes a sixth convolution layer; based on the foregoing, the detection information acquisition unit is configured to: and extracting the characteristics of the target fusion characteristic information through the sixth convolution layer to acquire the straight line segment detection information.
In some embodiments of the disclosure, the target object is a text and a straight line segment, the detection information is a text detection information and a straight line segment detection information, and the post-processing sub-model includes a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer, independent of each other; based on the foregoing, the detection information acquisition unit is configured to: extracting features of the target fusion feature information through the seventh convolution layer to obtain a text detection score map in the text detection information; extracting features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information; extracting features of the target fusion feature information through the ninth convolution layer to obtain character frame angle information in the character detection information; and extracting the characteristics of the target fusion characteristic information through the tenth convolution layer to acquire the straight line segment detection information.
In some embodiments of the disclosure, based on the foregoing solution, the coordinate information acquisition module is configured to: screening pixels in the text detection score map according to a first threshold value to obtain target pixels with text detection scores greater than or equal to the first threshold value; calculating the frame coordinates of the characters corresponding to the target pixels according to the character distance regression graph and the character frame angle information; and filtering the frame coordinates according to the text detection score corresponding to the target pixel and the overlapping degree of the text frame corresponding to the frame coordinates so as to obtain the frame coordinates corresponding to the target text frame.
In some embodiments of the disclosure, based on the foregoing solution, the coordinate information acquisition module is configured to: performing Hough transformation on the straight line segment detection information to obtain coordinate information of a plurality of line segments; determining any two line segments in the plurality of line segments as a first line segment and a second line segment, and calculating a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment; judging whether the first distance and the second distance are smaller than a second threshold value or not; when a target first line segment and a target second line segment with the first distance and the second distance smaller than the second threshold exist, splicing the first target line segment and the second target line segment; and acquiring the endpoint coordinates of the straight line segment formed after the plurality of line segments are spliced.
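As a concrete illustration of this line-segment vectorization, the following is a minimal Python/OpenCV sketch that assumes a binary pixel-level line map as input; the Hough parameters, the distance threshold (the "second threshold") and the stitching strategy are illustrative assumptions, not the patent's reference implementation.

```python
# Illustrative sketch only; parameter values and names are assumptions.
import numpy as np
import cv2


def point_to_line_dist(px, py, seg):
    """Distance from point (px, py) to the infinite line through segment seg."""
    x1, y1, x2, y2 = seg
    num = abs((y2 - y1) * px - (x2 - x1) * py + x2 * y1 - y2 * x1)
    den = np.hypot(y2 - y1, x2 - x1) + 1e-6
    return num / den


def merge_collinear_segments(line_mask, dist_thresh=10.0):
    """Vectorize a pixel-level line map: Hough transform, then stitch segments
    whose midpoints lie close to each other's supporting lines."""
    segments = cv2.HoughLinesP(line_mask.astype(np.uint8) * 255, 1, np.pi / 180,
                               threshold=50, minLineLength=30, maxLineGap=5)
    if segments is None:
        return []
    merged = [tuple(s[0]) for s in segments]  # (x1, y1, x2, y2)

    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                ma = ((a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0)  # midpoint of a
                mb = ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)  # midpoint of b
                # Both midpoint-to-line distances must fall below the threshold
                # (the "second threshold" in the text) before stitching.
                if (point_to_line_dist(*ma, b) < dist_thresh and
                        point_to_line_dist(*mb, a) < dist_thresh):
                    pts = [(a[0], a[1]), (a[2], a[3]), (b[0], b[1]), (b[2], b[3])]
                    # Keep the two endpoints that are farthest apart as the
                    # endpoints of the stitched segment.
                    p, q = max(((p, q) for p in pts for q in pts),
                               key=lambda pq: np.hypot(pq[0][0] - pq[1][0],
                                                       pq[0][1] - pq[1][1]))
                    merged[i] = (p[0], p[1], q[0], q[1])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged  # endpoint coordinates of the stitched straight line segments
```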
In some embodiments of the present disclosure, the target object is a text, and the coordinate information is a frame coordinate of a frame of the target text; based on the foregoing, the image direction correction module is configured to: determining an upper edge line and a lower edge line of the text frame according to the frame coordinates of the target text frame; calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and obtaining a first target slope with the largest occurrence times; and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
In some embodiments of the present disclosure, the target object is a straight line segment, and the coordinate information is an endpoint coordinate of the straight line segment; based on the foregoing, the image direction correction module is configured to: calculating the slope of the straight line segment according to the endpoint coordinates of the straight line segment; counting the occurrence times of the slopes, and obtaining a second target slope with the largest occurrence times; and determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In some embodiments of the present disclosure, the target object is text and a straight line segment, and the coordinate information is the frame coordinates of the target text frame and the endpoint coordinates of the straight line segment; based on the foregoing, the image direction correction module is configured to: calculate the rotation angle of the text according to the frame coordinates of the target text frame, and calculate the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment; take the absolute value of the difference between the rotation angle of the text and the rotation angle of the straight line segment to obtain a rotation angle difference; compare the rotation angle difference with a third threshold; when the rotation angle difference is smaller than or equal to the third threshold, take the average of the rotation angle of the text and the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected; and when the rotation angle difference is larger than the third threshold, take the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
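A minimal sketch of this angle-decision logic follows, assuming quadrilateral text frames given as four vertices (top-left, top-right, bottom-right, bottom-left) and line segments given as (x1, y1, x2, y2) tuples; the histogram-based estimate of the "most frequent slope" and the agreement threshold value are illustrative assumptions.

```python
# Illustrative sketch only; thresholds and data layouts are assumptions.
import numpy as np


def dominant_angle(slopes, num_bins=180):
    """Most frequent inclination (in degrees) among a set of slopes."""
    angles = np.degrees(np.arctan(np.asarray(slopes, dtype=float)))
    hist, edges = np.histogram(angles, bins=num_bins, range=(-90, 90))
    k = int(np.argmax(hist))
    return (edges[k] + edges[k + 1]) / 2.0


def image_rotation_angle(text_boxes, line_segments, agree_thresh=10.0):
    """Combine the text angle and the line-segment angle as described above:
    average them when they agree, otherwise trust the straight line segments."""
    text_slopes = []
    for (x0, y0), (x1, y1), (x2, y2), (x3, y3) in text_boxes:
        text_slopes.append((y1 - y0) / (x1 - x0 + 1e-6))   # upper edge line
        text_slopes.append((y2 - y3) / (x2 - x3 + 1e-6))   # lower edge line
    line_slopes = [(y2 - y1) / (x2 - x1 + 1e-6)
                   for (x1, y1, x2, y2) in line_segments]

    text_angle = dominant_angle(text_slopes)
    line_angle = dominant_angle(line_slopes)
    if abs(text_angle - line_angle) <= agree_thresh:   # "third threshold"
        return (text_angle + line_angle) / 2.0
    return line_angle
```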
In some embodiments of the disclosure, based on the foregoing scheme, the image direction correction module is configured to: determining a rotation matrix according to the rotation angle and the center point coordinates of the image to be corrected; multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix to correct the direction of the image to be corrected.
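As an illustration of this last step, a brief sketch using OpenCV follows; the interpolation and border-handling choices are assumptions, and the sign convention of the angle depends on how the rotation angle was estimated.

```python
# Illustrative sketch only; flag choices are assumptions.
import cv2


def correct_orientation(image, rotation_angle_deg):
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Rotation matrix built from the estimated angle and the image centre point.
    matrix = cv2.getRotationMatrix2D(center, rotation_angle_deg, 1.0)
    # Apply the affine transform to the pixel matrix of the image to be corrected.
    return cv2.warpAffine(image, matrix, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)
```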
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image direction correction method as described in the above embodiments.
According to one aspect of an embodiment of the present disclosure, there is provided an electronic device including one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image direction correction method as described in the above embodiments.
In the technical solution provided by the embodiments of the disclosure, firstly, an image to be corrected is obtained, and feature extraction is performed on a target object in the image through an image processing model to obtain detection information corresponding to the target object; then, the detection information is vectorized to obtain coordinate information corresponding to the target object; and finally, a rotation angle corresponding to the image to be corrected is determined according to the coordinate information, and the direction of the image to be corrected is corrected according to the rotation angle. With this technical solution, fine-grained direction correction can be performed on a tilted image, improving the accuracy of image direction correction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of an image direction correction method according to one embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural schematic of an image processing model according to one embodiment of the present disclosure;
FIG. 4 schematically illustrates a structural schematic of a feature extraction sub-model according to one embodiment of the disclosure;
FIG. 5 schematically illustrates a flow diagram of feature extraction by an image processing model according to one embodiment of the disclosure;
FIG. 6 schematically illustrates a structural schematic of a residual network layer according to one embodiment of the present disclosure;
FIG. 7 schematically illustrates a structural schematic of a feature fusion sub-model according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates a structural schematic of a converged network layer in accordance with one embodiment of the present disclosure;
FIGS. 9A-9C schematically illustrate interface diagrams of an image to be corrected before and after processing by an image processing model according to one embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow diagram of vectorization processing according to text detection information according to one embodiment of the present disclosure;
FIGS. 11A-11B schematically illustrate interface diagrams of images to be corrected before and after segment stitching according to one embodiment of the present disclosure;
FIGS. 12A-12B schematically illustrate interface diagrams after vectorizing the detection information according to one embodiment of the present disclosure;
FIGS. 13A-13B schematically illustrate interface diagrams before and after correction of the orientation of an image to be corrected according to one embodiment of the present disclosure;
FIGS. 14A-14D schematically illustrate interface diagrams for orientation correction of an oblique medical document image according to one embodiment of the present disclosure;
FIGS. 15A-15D schematically illustrate interface diagrams for orientation correction of an oblique medical document image according to one embodiment of the present disclosure;
FIG. 16 schematically illustrates a block diagram of an image orientation correction apparatus according to one embodiment of the present disclosure;
fig. 17 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks and servers as practical. For example, the server 103 may be a server cluster formed by a plurality of servers. The terminal device 101 may be a terminal device with an imaging unit and a display screen such as a notebook, a portable computer, a smart phone, or a terminal device such as a camera, a video camera, or the like.
In one embodiment of the present disclosure, a user may photograph an object including a target object through an image capturing unit in the terminal apparatus 101 to acquire an image including the target object. Due to reasons such as a shooting angle, improper operation of a submitter or limited submission scene, etc., the finally acquired image may have inclination, does not conform to a normal reading direction, the terminal device 101 may send the image to be corrected with inclination to the server 103 through the network 102, after the server 103 receives the image to be corrected, feature extraction may be performed on a target object in the image to obtain detection information corresponding to the target object, where the detection information is information at a pixel level, for example, where characters exist, where lines exist, etc., are represented by values of pixel points; then, vectorization processing can be performed on the detection information to obtain coordinate information corresponding to the target object, for example, frame coordinates of a text frame, endpoint coordinates of a straight line segment and the like can be obtained; then, based on the coordinate information corresponding to the target object, determining a rotation angle corresponding to the image to be corrected, wherein the rotation angle is the inclination angle of the image to be corrected; and finally, determining a rotation matrix according to the rotation angle, and transforming pixels of the image to be corrected according to the rotation matrix, so as to realize correction of the direction of the image to be corrected. The image to be corrected in the embodiment of the disclosure can be various document images, such as medical records, a bill of charge, a settlement bill, a medical invoice, a check report, a financial statement and the like, and the technical scheme of the embodiment of the disclosure can correct the image with any inclination angle, thereby realizing correction of the fine granularity direction of the inclined image and improving the accuracy of correction of the image direction; in addition, the information recognition is carried out on the image after the correction direction, so that the recognition efficiency and the recognition accuracy can be improved.
It should be noted that, the image direction correction method provided in the embodiments of the present disclosure is generally executed by a server, and accordingly, the image direction correction device is generally disposed in the server. However, in other embodiments of the present disclosure, the image direction correction method provided by the embodiments of the present disclosure may also be performed by a terminal device.
In the related art, taking the correction of document images as an example, document images come in many layouts, so it is difficult to ensure that a method based on simple spatial rules applies to all of them. Two methods for correcting the image direction currently exist. In the first method, a character region in the document image is located first, and structural features, including character aspect ratio, stroke features, connected-domain features and the like, are then extracted for the character blocks in that region. If the response value of each structural feature is greater than a preset confidence level, the document direction is the normal reading direction; otherwise, the image is rotated by 90 degrees, 180 degrees and 270 degrees in turn until the response values of the structural features exceed the confidence level, i.e. the direction of the document image is corrected to the normal reading direction. In the second method, text candidate regions containing valid information are first obtained, features are then extracted from the candidate regions with a neural network, text regions and non-text regions are classified with a Softmax function, and finally the text regions are grouped, the overlap with the training-set images is obtained step by step, and the final corrected text is obtained by superimposing all overlapping parts.
In both methods, the text region is located first and then rotated a limited number of times in turn, so as to find the orientation with the largest feature response or the largest overlap between the text region and the training-set images. However, the rotation angles of document images cannot be exhausted; classifying the orientation of a document image into one of a few preset values is inaccurate and unreliable, and has large limitations in practical application scenarios.
In view of the problems in the related art, the embodiments of the present disclosure provide an image direction correction method implemented based on artificial intelligence (Artificial Intelligence, AI), which is a theory, method, technique, and application system that simulates, extends, and expands human intelligence using a digital computer or a machine controlled by a digital computer, senses an environment, acquires knowledge, and uses the knowledge to obtain an optimal result. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Computer Vision (CV) is the science of studying how to make a machine "see"; more specifically, it replaces human eyes with cameras and computers to perform machine vision tasks such as recognition, tracking and measurement on a target, and further performs graphic processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the related theory and technology in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like.
With research and advancement of artificial intelligence technology, research and application of artificial intelligence technology is being developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, automatic driving, unmanned aerial vehicles, robots, smart medical treatment, smart customer service, etc., and it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and with increasing importance value.
The scheme provided by the embodiment of the disclosure relates to an image processing technology of artificial intelligence, and is specifically described by the following embodiments:
Fig. 2 schematically illustrates a flow chart of an image direction correction method according to one embodiment of the present disclosure, which may be performed by a server, which may be the server 103 illustrated in fig. 1. Referring to fig. 2, the image direction correcting method at least includes steps S210 to S230, and is described in detail as follows:
in step S210, an image to be corrected is acquired, and feature extraction is performed on a target object in the image to be corrected through an image processing model, so as to acquire detection information corresponding to the target object.
In one embodiment of the disclosure, taking a document image as an example, when information in a document image is extracted by a computer algorithm, the paper document usually needs to be photographed first to obtain the document image, which is then uploaded to a terminal or a server for subsequent information recognition and extraction. When a user photographs a paper document with the terminal device 101 equipped with an image capturing unit, the captured document image may be tilted because of the shooting angle, the placement angle of the paper and other factors; likewise, when the user uploads a document image from local storage or another storage medium through the terminal device 101, the document image may be tilted because of improper operation. A tilted document image is an image in an abnormal reading direction; when the characters in it are later recognized by an optical character recognition (OCR) algorithm or another machine learning model, the recognition efficiency and accuracy are poor, and a poor character recognition result seriously affects the subsequent information structuring process, reducing the accuracy of the whole system. Therefore, after a tilted document image is acquired, its direction needs to be corrected to ensure that the document image used for subsequent information recognition is in the normal reading direction.
In one embodiment of the present disclosure, a document image with inclination is used as an image to be corrected, and after the image to be corrected is received, feature extraction may be performed on a target object therein, so as to obtain detection information corresponding to the target object. Specifically, feature extraction can be performed on a target object in an image to be corrected through an image processing model so as to acquire detection information corresponding to the target object. In the embodiment of the present disclosure, the target object may be a text and/or a straight line segment in the image to be corrected, for example, when the content in the image to be corrected is pure text information such as a log, a novel, etc., the text is the target object; when the image to be corrected is an image containing characters and straight line segments, such as a medical document, a report, a music score and the like, the characters and the straight line segments are target objects; when the image to be corrected is a blank image such as a medical document or report, the straight line segment is the target object. Of course, other types of images to be corrected are also possible, and the images can be taken as target objects as long as characters and/or straight line segments exist in the images.
In one embodiment of the present disclosure, fig. 3 shows a schematic structural diagram of an image processing model. As shown in fig. 3, an image processing model 300 includes a feature extraction sub-model 301, a feature fusion sub-model 302, and a post-processing sub-model 303, where the feature extraction sub-model 301 is used to perform multi-layer convolution on the target object to obtain multi-level target feature information; the feature fusion sub-model 302 is used to perform feature fusion on the target feature information of each level to obtain target fusion feature information; and the post-processing sub-model 303 is used to perform convolution processing on the target fusion feature information to obtain detection information corresponding to the target object.
In one embodiment of the present disclosure, fig. 4 shows a schematic structural diagram of the feature extraction sub-model 301. As shown in fig. 4, the feature extraction sub-model 301 includes a first convolution layer 401, a pooling layer 402, and a residual network module 403 connected to the pooling layer 402, where the residual network module 403 includes M+1 sequentially connected residual network layers: ResBlock 1, ResBlock 2, …, ResBlock M+1, where M is a positive integer.
In one embodiment of the present disclosure, fig. 5 shows a schematic flow chart of feature extraction by the image processing model. As shown in fig. 5, the feature extraction sub-model 301 includes a first convolution layer 501, a pooling layer 502, and four sequentially connected residual network layers 503-1, 503-2, 503-3 and 503-4, where the convolution kernel of the first convolution layer 501 is 7×7 with 64 channels, and the numbers of channels of the residual network layers 503-1, 503-2, 503-3 and 503-4 are 64, 128, 256 and 512, respectively. When multi-layer convolution is performed on the target object through the feature extraction sub-model 301, the specific flow is as follows: first, the image to be corrected is input into the first convolution layer 501, and features of the target object are extracted through the first convolution layer 501 to obtain initial feature information; the initial feature information is input into the pooling layer 502, and dimension reduction is performed on it through the pooling layer 502 to obtain dimension-reduced feature information; finally, the dimension-reduced feature information is input into the residual network module 503, and feature extraction is performed on it through the sequentially connected residual network layers 503-1, 503-2, 503-3 and 503-4 to obtain the multi-level target feature information. Specifically, the dimension-reduced feature information is convolved by the residual network layer 503-1 to obtain first-level target feature information; the first-level target feature information is convolved by the residual network layer 503-2 to obtain second-level target feature information; the second-level target feature information is convolved by the residual network layer 503-3 to obtain third-level target feature information; and finally the third-level target feature information is convolved by the residual network layer 503-4 to obtain fourth-level target feature information. Of course, the number of residual network layers in the residual network module 503 is not limited to four and may be any other number; the embodiments of the present disclosure are not limited in this respect.
In one embodiment of the present disclosure, fig. 6 further illustrates the structure of a residual network layer. As shown in fig. 6, the residual network layer sequentially includes an input layer 601, a convolution layer 602, an activation layer 603, a convolution layer 604, a connection layer 605, an activation layer 606, and an output layer 607, where the convolution kernels of the convolution layer 602 and the convolution layer 604 are both 3×3 with 64 channels, and the activation functions used by the activation layer 603 and the activation layer 606 may be a ReLU function, a sigmoid function, or the like. After receiving the output information of the network layer preceding the current residual network layer, the input layer 601 takes this output information as the input information of the current residual network layer, and the input information is convolved by the convolution layer 602 to obtain first feature information; the first feature information is then processed by the activation layer 603, and the activated first feature information is input into the convolution layer 604 for further feature extraction to obtain second feature information; next, the second feature information and the input information are connected through the connection layer 605, the connected feature information is input into the activation layer 606 and processed by it; and finally, the second feature information processed by the activation layer 606 is output to the next network layer through the output layer 607 for subsequent processing.
In the residual network layer, the input information is subjected to a convolution layer and an activation layer to obtain nonlinear expression information corresponding to the input information, and then the nonlinear expression information is added with the input information to obtain an output result, so that the information of a shallow network can be ensured to be fully transmitted to a deep network, the network training is easier, the degradation problem in the network training process can be solved, and the representation capability of the network is improved.
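To make the above structure concrete, the following is a minimal PyTorch sketch of the feature extraction sub-model of figs. 4 to 6 (a 7×7 stem convolution, a pooling layer, and four residual stages with 64, 128, 256 and 512 channels); the strides, the 1×1 projection on the skip connection and all layer names are illustrative assumptions rather than the patent's exact configuration.

```python
# Illustrative sketch only; strides and the skip projection are assumptions.
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual layer as in fig. 6: conv -> ReLU -> conv, skip connection, ReLU."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection so the skip connection matches the output shape
        # (an assumption; the text only states that input and output are added).
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.skip(x))


class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 7, stride=2, padding=3)   # first convolution layer
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)        # dimension-reduction pooling
        self.res1 = ResBlock(64, 64, stride=1)
        self.res2 = ResBlock(64, 128)
        self.res3 = ResBlock(128, 256)
        self.res4 = ResBlock(256, 512)

    def forward(self, x):
        x = self.pool(self.stem(x))
        f1 = self.res1(x)          # first-level target features
        f2 = self.res2(f1)         # second-level target features
        f3 = self.res3(f2)         # third-level target features
        f4 = self.res4(f3)         # fourth-level target features
        return f1, f2, f3, f4
```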
In one embodiment of the present disclosure, FIG. 7 shows a schematic structural diagram of the feature fusion sub-model 302. As shown in FIG. 7, the feature fusion sub-model 302 includes N sequentially connected fusion network layers 701-1, 701-2, …, 701-N and a second convolution layer 702 connected to the N-th fusion network layer 701-N. After the multi-level target feature information is acquired, the (n-1)-level fused feature information and the target feature information output by the (M+1-n)-th residual network layer can be fused through the n-th fusion network layer in the feature fusion sub-model 302 to obtain n-level fused feature information; this step is repeated until N-level fused feature information is obtained; the N-level fused feature information can then be input into the second convolution layer, and feature extraction is performed on it through the second convolution layer to obtain the target fusion feature information. Here the zero-level fused feature information is the target feature information output by the (M+1)-th residual network layer, n is a positive integer not exceeding N, and N is a positive integer not exceeding M.
Returning to fig. 5, the feature fusion sub-model 302 specifically includes three fusion network layers 504-1, 504-2, and 504-3 connected in sequence and a second convolution layer 505 connected to the third fusion network layer 504-3, where the number of channels of the fusion network layers 504-1, 504-2, and 504-3 are 128, 64, and 32, respectively, and the convolution kernel size of the second convolution layer 505 is 3×3 and the number of channels is 32. When feature fusion is performed, first, the first fusion network layer 504-1 fuses the four-level target feature information output by ResBlock4 with the three-level target feature information output by ResBlock3 to obtain first-level fusion feature information; then, the second fusion network layer 504-2 fuses the primary fusion characteristic information and the secondary target characteristic information output by ResBlock2 to obtain secondary fusion characteristic information; then, the third fusion network layer 504-3 fuses the secondary fusion characteristic information and the primary target characteristic information output by ResBlock1 to obtain tertiary fusion characteristic information; and finally, inputting the three-level fusion characteristic information into a second convolution layer 505, and carrying out characteristic extraction on the three-level fusion characteristic information through the second convolution layer 505 so as to acquire target fusion characteristic information.
In one embodiment of the present disclosure, fig. 8 further illustrates the structure of a fusion network layer. As shown in fig. 8, the fusion network layer includes a first input layer 801, a second input layer 802, an upper pooling layer 803, a connection layer 804, a convolution layer 805, an activation layer 806, a convolution layer 807, an activation layer 808, and an output layer 809, where the convolution kernels of the convolution layer 805 and the convolution layer 807 are both 3×3 with 64 channels, and the activation functions used by the activation layer 806 and the activation layer 808 may be a ReLU function, a sigmoid function, or the like. After receiving the (n-1)-level fused feature information, the first input layer 801 sends it to the upper pooling layer 803, which pools the (n-1)-level fused feature information up to the same size as the target feature information output by the (M+1-n)-th residual network layer received by the second input layer 802; the upper-pooled (n-1)-level fused feature information is then connected with the target feature information output by the (M+1-n)-th residual network layer through the connection layer 804 to obtain connected feature information; the connected feature information then passes through the convolution layer 805, the activation layer 806, the convolution layer 807 and the activation layer 808 in turn, i.e. convolution-activation-convolution-activation, to obtain the n-level fused feature information; and finally the n-level fused feature information is output to the next network layer through the output layer 809 for subsequent processing.
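A minimal PyTorch sketch of one such fusion network layer follows; bilinear interpolation is used here in place of the upper pooling layer, and the channel arguments are assumptions for illustration.

```python
# Illustrative sketch only; upsampling mode and channel counts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionLayer(nn.Module):
    def __init__(self, fused_ch, skip_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(fused_ch + skip_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, fused, skip):
        # Upper pooling: bring the (n-1)-level fused features to the same
        # spatial size as the residual-stage features they are merged with.
        fused = F.interpolate(fused, size=skip.shape[2:], mode="bilinear",
                              align_corners=False)
        x = torch.cat([fused, skip], dim=1)      # connection layer
        x = F.relu(self.conv1(x))                # convolution-activation
        return F.relu(self.conv2(x))             # convolution-activation
```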
In the fusion network layer, features of two different scales, namely the output features of the deep network and the output features of the previous shallow layer, undergo a series of operations and are fused to produce the input of the next fusion network layer. Continuously fusing multi-scale features in this way helps to provide richer information. Meanwhile, unlike using large convolution kernels such as 5×5 or 7×7, stacking multiple small 3×3 convolution kernels obtains the same receptive field while increasing the number of activation layers, which improves the nonlinear representation capability of the network.
In one embodiment of the present disclosure, the structure of the post-processing sub-model 303 varies with the target object. When the target object is text, the task of the image processing model is mainly to obtain text detection information, which may specifically include a text detection score map, a text distance regression map and a text frame angle; correspondingly, the post-processing sub-model 303 may include three convolution layers: a third convolution layer, a fourth convolution layer and a fifth convolution layer, where feature extraction is performed on the target fusion feature information through the third convolution layer to obtain the text detection score map, through the fourth convolution layer to obtain the text distance regression map, and through the fifth convolution layer to obtain the text frame angle. When the target object is a straight line segment, the task of the image processing model is to acquire straight-line-segment detection information, and the post-processing sub-model 303 may include only one convolution layer: a sixth convolution layer, which performs feature extraction on the target fusion feature information to obtain the straight-line-segment detection information. When the target object is text and a straight line segment, the task of the image processing model is to obtain both text detection information and straight-line-segment detection information, and the post-processing sub-model 303 may include four mutually independent convolution layers: a seventh, an eighth, a ninth and a tenth convolution layer, where the text detection score map, the text distance regression map and the text frame angle are obtained through the seventh, eighth and ninth convolution layers respectively, and feature extraction is performed on the target fusion feature information through the tenth convolution layer to obtain the straight-line-segment detection information.
Returning to fig. 5, the post-processing submodel 303 includes a third convolution layer 506, a fourth convolution layer 507, a fifth convolution layer 508, and a sixth convolution layer 509, where the third convolution layer 506 is configured to perform feature extraction on the target fusion feature information to obtain a text detection score map; the fourth convolution layer 507 is used for extracting features of the target fusion feature information to obtain a text distance regression graph; the fifth convolution layer 508 is configured to perform feature extraction on the target fusion feature information to obtain text frame angle information; the sixth convolution layer 509 is configured to perform feature extraction on the target fusion feature information to obtain straight-line segment detection information.
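The following sketch illustrates such a post-processing sub-model with four parallel heads on the fused features (text detection score map, text distance regression map, text frame angle, straight-line-segment map); the 1×1 kernels, the sigmoid activations and the 32 input channels (matching the output of the second convolution layer 505 in fig. 5) are assumptions rather than values prescribed by the patent.

```python
# Illustrative sketch only; kernel sizes and activations are assumptions.
import torch
import torch.nn as nn


class PostProcessHeads(nn.Module):
    def __init__(self, in_ch=32):
        super().__init__()
        self.score_head = nn.Conv2d(in_ch, 1, 1)   # text detection score map
        self.dist_head = nn.Conv2d(in_ch, 4, 1)    # distances to the 4 frame edges
        self.angle_head = nn.Conv2d(in_ch, 1, 1)   # text frame rotation angle
        self.line_head = nn.Conv2d(in_ch, 1, 1)    # straight-line-segment map

    def forward(self, fused):
        score = torch.sigmoid(self.score_head(fused))
        dist = self.dist_head(fused)
        angle = self.angle_head(fused)
        line = torch.sigmoid(self.line_head(fused))
        return score, dist, angle, line
```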
Figs. 9A-9C are schematic diagrams of the interfaces before and after the image to be corrected is processed by the image processing model. Fig. 9A shows the original tilted image to be corrected; after feature extraction is performed on the text and straight line segments in the image through the image processing model, pixel-level text detection information (fig. 9B) and straight-line-segment detection information (fig. 9C) are obtained.
In one embodiment of the present disclosure, each pixel value in the text detection score map represents the magnitude of the likelihood that text exists at the location, each pixel value of the text distance regression map represents the distance of the location to the four edges of the nearest text border, and each pixel value of the text border angle information represents the rotation angle of the text at the location. And determining the vertex coordinates of the text frame according to the text detection score map, the text distance regression map and the text frame angle information. Meanwhile, the straight line segment detection information is used for obtaining a pixel level detection result of the straight line segment in the image to be corrected.
The image processing model in the embodiments of the disclosure has an efficient and unified network structure that can simultaneously output pixel-level detection results for the text and straight lines in a document, which is simple and efficient. It exploits the synergy between text and straight line segments, combining the information from both to judge the rotation angle of the document image, which gives better robustness.
In step S220, the detection information is vectorized to obtain coordinate information corresponding to the target object.
In one embodiment of the present disclosure, the text detection information and/or the straight line segment detection information acquired in step S210 are both detection results at the pixel level, and specific coordinate information of the text and/or the straight line segment in the image to be corrected cannot be given, so that vectorization processing needs to be performed on the obtained detection information to acquire coordinate information corresponding to the target object.
In one embodiment of the present disclosure, after the text detection information is obtained, the frame coordinates corresponding to the target text frame may be obtained according to the text detection score map, the text distance regression map, and the text frame angle therein. Fig. 10 shows a schematic flow chart of vectorization processing according to text detection information, and as shown in fig. 10, the flow at least includes steps S1001-S1003, specifically:
In step S1001, pixels in the text detection score map are filtered according to the first threshold value to obtain a target pixel with a text detection score greater than or equal to the first threshold value.
In one embodiment of the present disclosure, the text detection score map is a numerical matrix in which each value lies in [0,1] and indicates the probability that the pixel at that location belongs to text: a small detection score means text is unlikely to exist at that location, and a large detection score means text is likely to exist there. A first threshold is therefore set to screen the detection scores in the text detection score map. By comparing the detection score at each location with the first threshold, only detection scores greater than or equal to the first threshold are retained; the pixel locations corresponding to the retained detection scores are taken as locations where text exists, and the pixels at those locations are defined as target pixels. The first threshold may be set to 0.8, or to other values such as 0.78 or 0.9.
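As an illustration of step S1001, a minimal NumPy sketch of the screening is given below; the function and variable names are assumptions.

```python
import numpy as np

def select_target_pixels(score_map: np.ndarray, first_threshold: float = 0.8):
    """Return the (row, col) coordinates whose text detection score passes the threshold."""
    mask = score_map >= first_threshold          # keep only confident text locations
    rows, cols = np.nonzero(mask)                # target pixel positions
    return np.stack([rows, cols], axis=1), score_map[mask]
```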
In step S1002, the frame coordinates of the text corresponding to the target pixel are calculated from the text distance regression graph and the text frame angle information.
In one embodiment of the present disclosure, after obtaining the pixel positions where text is highly likely to exist, the frame coordinates of the text corresponding to each target pixel may be calculated at those positions according to the text distance regression graph and the text frame angle information, where the frame coordinates are the four vertex coordinates of the text frame corresponding to the text. The pixel points in the text distance regression graph and in the text frame angle information are in one-to-one correspondence. For each pixel point (c_x, c_y), four distance values d_1, d_2, d_3 and d_4 corresponding to that position can be obtained from the text distance regression graph, representing the perpendicular distances from the pixel point to the upper, right, lower and left boundaries of the frame respectively, and an angle value corresponding to that position is obtained from the text frame angle information, representing the rotation angle θ of the text frame. From (c_x, c_y), d_1, d_2, d_3, d_4 and θ, the coordinates of the 4 vertices of the text frame can be calculated; for example, the coordinates of the upper-left vertex are: x = (c_x - d_4)·cosθ + (c_y + d_1)·sinθ + (1 - cosθ)·c_x - sinθ·c_y, y = -(c_x - d_4)·sinθ + (c_y + d_1)·cosθ + sinθ·c_x + (1 - cosθ)·c_y. The other vertex coordinates can be obtained by similar calculations, which are not repeated here.
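A minimal Python sketch of this vertex calculation is given below. It follows the upper-left-vertex formula above, i.e. each axis-aligned corner is rotated about the pixel point by θ; the corner ordering and axis conventions of the other three vertices are assumptions.

```python
import math

def text_box_vertices(cx, cy, d1, d2, d3, d4, theta):
    """Recover the four vertices of a rotated text frame from one inner pixel.

    (cx, cy): pixel position inside the frame; d1..d4: distances to the upper,
    right, lower and left edges; theta: frame rotation angle in radians.
    """
    def rotate(x, y):
        # rotate point (x, y) about (cx, cy); matches the upper-left-vertex formula in the text
        xr = (x - cx) * math.cos(theta) + (y - cy) * math.sin(theta) + cx
        yr = -(x - cx) * math.sin(theta) + (y - cy) * math.cos(theta) + cy
        return xr, yr

    top_left = rotate(cx - d4, cy + d1)
    top_right = rotate(cx + d2, cy + d1)
    bottom_right = rotate(cx + d2, cy - d3)
    bottom_left = rotate(cx - d4, cy - d3)
    return top_left, top_right, bottom_right, bottom_left
```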
In step S1003, the frame coordinates are filtered according to the text detection score corresponding to the target pixel and the overlapping degree of the text frame corresponding to the frame coordinates, so as to obtain the frame coordinates corresponding to the target text frame.
In one embodiment of the present disclosure, a large number of text frames may be obtained by the method of step S1002, and some of them may overlap heavily. To obtain the frame coordinates corresponding to the target text frames, the overlapping text frames may be screened and filtered according to the text detection scores and the degree of overlap between the text frames, yielding target text frames with a large output response and a small overlapping area, where the output response is the text detection score; in other words, each target text frame is very likely to contain text and overlaps little or not at all with the other target text frames. In the embodiment of the disclosure, non-maximum suppression may be used for the filtering, as sketched below: the text frame with the largest text detection score is selected from a group of text frames, the degree of overlap between this frame and each of the other frames in the group is calculated, and any frame whose overlap exceeds a preset threshold is filtered out; these steps are then repeated on the remaining frames. In this way, a plurality of target text frames with a large output response and a small overlapping area are obtained, and the frame coordinates corresponding to the target text frames are obtained from them.
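The following is a minimal Python sketch of the non-maximum suppression described above. For brevity it measures overlap with axis-aligned IoU, whereas the actual text frames are rotated quadrilaterals, so this is an illustrative simplification rather than the disclosed implementation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring frame, drop frames that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```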
In one embodiment of the present disclosure, after the straight line segment detection information is acquired, vectorization processing may be performed on the straight line segment detection information to acquire coordinate information corresponding to the straight line segment. Specifically, hough transformation can be performed on the straight line segment detection information, and coordinate information of a plurality of line segments is obtained according to the pixel-level straight line segment detection information, wherein the coordinate information is coordinate information of each line segment in a Cartesian rectangular coordinate system; then, the nearest neighbor line segments in the plurality of line segments can be spliced to obtain longer straight line segments; and finally, after all the spliced straight-line segments are obtained, obtaining the endpoint coordinates of the two ends of the straight-line segments.
When performing Hough transformation on the straight line segment detection information, a set of correlation matrices is first initialized, consisting of an angle list, a distance list and a voting matrix. The angle list is alpha = [0, 1, 2, …, 178, 179], where each angle is the angle between the x coordinate axis and the perpendicular dropped from the origin onto the target straight line segment. The distance list is rho = [-diag_len+1, …, diag_len-1, diag_len], where each distance is the perpendicular distance from the origin to the target straight line segment. The voting matrix is initialized to all zeros, with as many rows as there are elements in the distance list and as many columns as there are elements in the angle list. Then each non-zero pixel point in the pixel-level straight line segment detection information is traversed; for each angle value in the angle list, the perpendicular distance value corresponding to the pixel point at that angle is calculated, giving a data pair (rho, alpha), and the value at the corresponding position (rho + diag_len, alpha) in the voting matrix is incremented by 1. The larger the accumulated value at a matrix position, the higher the confidence of the (rho, alpha) value corresponding to that position. Therefore the accumulated value at each matrix position is compared with a preset threshold, the target matrix positions whose accumulated values exceed the preset threshold are obtained, and the (rho, alpha) values at those positions are read out. Since a straight line in the rectangular coordinate system corresponds to a point in the polar coordinate system, a point in the polar coordinate space where many curves intersect very likely corresponds to a straight line in the rectangular coordinate system, so the relevant information about the straight line segments in the image to be processed can be calculated from the (rho, alpha) values at the target matrix positions.
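A minimal NumPy sketch of this voting procedure is given below; the vote threshold and the 1° angle step are assumptions.

```python
import numpy as np

def hough_lines(line_mask: np.ndarray, vote_threshold: int = 100):
    """Accumulate Hough votes over a binary straight-line-segment detection map.

    Angles are sampled every 1 degree in [0, 180); rho spans +/- the image
    diagonal. Parameters (rho, alpha) whose vote count exceeds the threshold
    are returned as candidate lines.
    """
    h, w = line_mask.shape
    diag_len = int(np.ceil(np.hypot(h, w)))
    alphas = np.deg2rad(np.arange(0, 180))                       # angle list, 1-degree steps
    votes = np.zeros((2 * diag_len + 1, len(alphas)), dtype=np.int32)

    ys, xs = np.nonzero(line_mask)                               # every non-zero detection pixel
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(alphas) + y * np.sin(alphas)).astype(int)
        votes[rhos + diag_len, np.arange(len(alphas))] += 1      # one vote per (rho, alpha) pair

    rho_idx, alpha_idx = np.nonzero(votes > vote_threshold)
    return [(r - diag_len, float(np.rad2deg(alphas[a]))) for r, a in zip(rho_idx, alpha_idx)]
```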
When splicing the nearest-neighbor line segments among the plurality of line segments, any two of the line segments are taken as a first line segment and a second line segment; a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment are then calculated; the first distance and the second distance are each compared with a second threshold, and whether to splice the first and second line segments is decided according to the comparison result. Specifically, when the first distance and the second distance are both smaller than the second threshold, the first and second line segments are likely to belong to the same straight line segment and may therefore be spliced. Strictly speaking, for two broken segments that lie on the same straight line, the distance from the midpoint of one segment to the other segment is 0; however, to widen the applicability of the image direction correction method provided by the embodiment of the disclosure, the second threshold may be set to a small non-zero value, for example 20, so that even when the first and second line segments are not strictly collinear they can still be spliced as long as the threshold is not exceeded. Applying this processing to all line segments produces longer straight line segments: the number of segments decreases, their length increases, more accurate line slopes can then be obtained, and the whole image becomes more regular. Fig. 11A-11B are schematic diagrams of the image to be corrected before and after segment stitching: fig. 11A shows the broken segments obtained after the Hough transformation, and after stitching they form several longer straight segments, shown as thicker lines in fig. 11B.
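The following is a minimal Python sketch of the midpoint-distance test and splicing. How the stitched endpoints are chosen (here, the farthest-apart endpoint pair) is an assumption, and the 20-pixel threshold follows the example value above.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def maybe_merge(seg1, seg2, second_threshold: float = 20.0):
    """Splice two segments when each midpoint lies close to the other segment.

    seg = ((x1, y1), (x2, y2)); returns the stitched segment, or None if the
    two segments are not close enough to be considered one straight line.
    """
    mid1 = (np.asarray(seg1[0], float) + np.asarray(seg1[1], float)) / 2
    mid2 = (np.asarray(seg2[0], float) + np.asarray(seg2[1], float)) / 2
    d1 = point_to_segment_distance(mid1, *seg2)   # first distance
    d2 = point_to_segment_distance(mid2, *seg1)   # second distance
    if d1 < second_threshold and d2 < second_threshold:
        pts = [seg1[0], seg1[1], seg2[0], seg2[1]]
        # keep the farthest-apart endpoint pair as the stitched, longer segment
        return max(((p, q) for p in pts for q in pts),
                   key=lambda pq: np.linalg.norm(np.asarray(pq[0], float) - np.asarray(pq[1], float)))
    return None
```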
Meanwhile, fig. 12A-12B are schematic diagrams of the interfaces after the detection information is vectorized: fig. 12A shows the text frames formed from the frame coordinates corresponding to the target text frames obtained by vectorizing the text detection information, and fig. 12B shows the straight line segments formed from the coordinates of the plurality of line segments obtained by vectorizing the straight line segment detection information.
In step S230, a rotation angle corresponding to the image to be corrected is determined based on the coordinate information, and the direction of the image to be corrected is corrected according to the rotation angle.
In one embodiment of the present disclosure, when the target object is only text, after obtaining the frame coordinates of the target text frames with a large output response and a small overlapping area, the upper edge and the lower edge of each text frame may be determined from the frame coordinates, and the slopes of the upper and lower edges may then be calculated. Since frame coordinates corresponding to a plurality of target text frames are obtained in step S220, calculating the slopes of the upper and lower edges yields many slope values, which generally differ from one another. To determine the rotation angle of the image to be corrected from them, statistics are taken over all the slopes, the number of occurrences of each slope is counted, and the slope occurring most often is taken as the first target slope; the angle corresponding to the first target slope is the rotation angle of the text, which is also the rotation angle of the image to be corrected. For example, if the first target slope is 1, the rotation angle of the image to be corrected is 45° clockwise or 225° counterclockwise from the normal reading direction.
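A minimal Python sketch of this slope-voting step is given below; rounding the slopes before counting is an assumption introduced so that nearly identical slopes vote together.

```python
import math
from collections import Counter

def text_rotation_angle(text_boxes):
    """Estimate the text rotation angle from the upper/lower edges of detected frames.

    text_boxes: list of four vertices (top-left, top-right, bottom-right, bottom-left).
    """
    slopes = []
    for tl, tr, br, bl in text_boxes:
        for p, q in ((tl, tr), (bl, br)):                 # upper edge and lower edge
            dx, dy = q[0] - p[0], q[1] - p[1]
            if abs(dx) > 1e-6:
                slopes.append(round(dy / dx, 2))
    if not slopes:
        return 0.0
    first_target_slope, _ = Counter(slopes).most_common(1)[0]   # most frequent slope
    return math.degrees(math.atan(first_target_slope))          # rotation angle of the text
```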
In one embodiment of the present disclosure, when the target object is only a straight line segment, since there may be a plurality of straight line segments in the image to be corrected, the slope of each straight line segment may be calculated in order to obtain the rotation angle of the image to be corrected; the number of occurrences of each slope is then counted, and the slope occurring most often is taken as the second target slope. The angle corresponding to the second target slope is the rotation angle of the straight line segments, that is, the rotation angle of the image to be corrected. For example, if the second target slope corresponds to an angle of 30° (that is, the slope equals tan 30°), the rotation angle of the image to be corrected is 30° rotated counterclockwise or 270° rotated clockwise from the normal reading direction. When only one straight line segment exists in the image to be corrected, the angle corresponding to the slope of that straight line segment is the rotation angle of the image to be corrected.
In one embodiment of the present disclosure, when the target object is both text and straight line segments, the rotation angle of the text and the rotation angle of the straight line segments may each be obtained by the methods above, and the rotation angle of the image to be corrected is then determined from the two. Specifically, the rotation angle of the text is first subtracted from the rotation angle of the straight line segments and the absolute value is taken, giving the rotation angle difference; the rotation angle difference is then compared with a third threshold. When the rotation angle difference is smaller than or equal to the third threshold, the average of the rotation angle of the text and the rotation angle of the straight line segments is taken as the rotation angle corresponding to the image to be corrected; when the rotation angle difference is larger than the third threshold, the rotation angle of the straight line segments is taken as the rotation angle corresponding to the image to be corrected. For example, when the rotation angle of the text is 45° clockwise, the rotation angle of the straight line segments is 44° clockwise and the third threshold is 1°, the rotation angle difference is 1°, which equals the third threshold, so the average of the two angles, 44.5°, is taken as the rotation angle of the image to be corrected. If instead the rotation angle of the text is 45° clockwise and the rotation angle of the straight line segments is 43° clockwise with a third threshold of 1°, the rotation angle difference is 2°, which is greater than the third threshold, so the rotation angle of the straight line segments, 43°, is taken as the rotation angle of the image to be corrected.
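The angle-fusion rule can be summarized in a few lines; the sketch below assumes angles expressed in degrees and a third threshold of 1°, following the example above.

```python
def combined_rotation_angle(text_angle: float, line_angle: float, third_threshold: float = 1.0) -> float:
    """Fuse the text rotation angle and the straight-line rotation angle.

    If the two angles agree to within the third threshold, use their mean;
    otherwise fall back to the straight-line-segment angle.
    """
    if abs(text_angle - line_angle) <= third_threshold:
        return (text_angle + line_angle) / 2.0
    return line_angle
```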
Fig. 13A-13B are schematic diagrams of the interface before and after the direction of the image to be corrected is corrected: fig. 13A shows the original, inclined image to be corrected, and fig. 13B shows the image conforming to the normal reading direction obtained after correction by the image direction correction method of the embodiment of the present disclosure.
In one embodiment of the disclosure, when the target object includes both text and straight line segments, the rotation angle of the image to be corrected is determined from the rotation angle of the text and the rotation angle of the straight line segments, so that both the text information and the straight-line information in the document image are used. The two kinds of information complement each other, which effectively reduces the risk that either one is missing or abnormal, so the determined rotation angle of the image to be corrected is more accurate and more robust.
In one embodiment of the present disclosure, after the rotation angle of the image to be corrected is acquired, the direction of the image to be corrected may be corrected according to the rotation angle. Specifically, the rotation matrix can be determined according to the rotation angle of the image to be corrected and the center point coordinate of the image to be corrected, and then the image to be corrected is corrected according to the rotation matrix. The rotation matrix may be determined according to equation one:
M = [ [cosθ, sinθ, (1 - cosθ)·c_0x - sinθ·c_0y], [-sinθ, cosθ, sinθ·c_0x + (1 - cosθ)·c_0y] ]
wherein M is the rotation matrix, θ is the rotation angle of the image to be corrected, and (c_0x, c_0y) are the coordinates of the center point of the image to be corrected.
When the image to be corrected is corrected according to the rotation matrix, the pixel matrix corresponding to the image to be corrected can be multiplied by the rotation matrix, so that the direction of the image to be corrected can be corrected, and the direction of the document image is changed into the normal reading direction.
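As an illustration, the following sketch performs the rotation with OpenCV, whose getRotationMatrix2D produces a 2×3 matrix of the same form as equation one; the sign convention of the angle and the white border fill are assumptions that may need adjusting to how the rotation angle was defined upstream.

```python
import cv2
import numpy as np

def correct_image_direction(image: np.ndarray, rotation_angle_deg: float) -> np.ndarray:
    """Rotate the image about its center by the estimated rotation angle."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)                                      # (c_0x, c_0y)
    M = cv2.getRotationMatrix2D(center, rotation_angle_deg, 1.0)     # 2x3 rotation matrix
    return cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))               # fill exposed borders with white
```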
In one embodiment of the present disclosure, after the image to be corrected is acquired, a first direction correction may be applied to it, for example a rotation by 90°, 180° or 270°, so that the remaining rotation angle after the first correction lies within the range [-45°, 45°]. Feature extraction is then performed on this coarsely corrected image by the image processing model to acquire detection information, the detection information is vectorized, and a second direction correction is performed according to the coordinate information obtained by vectorization. The methods of feature extraction, vectorization and second direction correction are the same as in the embodiments above and are not repeated here.
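A minimal sketch of the first (coarse) correction stage is given below; how the number of 90° turns is decided (for example, by a separate coarse orientation classifier) is not specified here and is left as an assumption.

```python
import numpy as np

def coarse_correction(image: np.ndarray, quarter_turns: int) -> np.ndarray:
    """First-stage correction: rotate by a multiple of 90 degrees so that the
    residual rotation angle falls within [-45, 45] degrees.

    quarter_turns: number of counterclockwise 90-degree turns (0-3).
    """
    return np.rot90(image, k=quarter_turns % 4)   # lossless 90/180/270 rotation
```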
In one embodiment of the present disclosure, after a document image having a normal reading direction is acquired, optical character recognition may be performed on the document image to recognize and acquire text information therein. Of course, the feature extraction may also be performed on the document image through other machine learning models, such as convolutional neural networks, cyclic neural networks, and the like, so as to obtain text information therein.
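The disclosure does not name a particular OCR engine; purely as an illustration, the following sketch uses the pytesseract library, and the language packs passed to it are assumptions.

```python
import pytesseract
from PIL import Image

def extract_text(corrected_image_path: str) -> str:
    """Run optical character recognition on the direction-corrected document image."""
    image = Image.open(corrected_image_path)
    return pytesseract.image_to_string(image, lang="chi_sim+eng")  # language packs are assumptions
```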
Taking a medical document image as an example, fig. 14A-14D show interface diagrams for correcting the direction of an inclined medical document image. Fig. 14A shows the original medical document image, which is inclined clockwise with an inclination angle within the range [-45°, 45°]. The characters and straight line segments in the inclined medical document image are first extracted by the image processing model, giving character detection information corresponding to the characters and straight line segment detection information corresponding to the straight line segments, both at the pixel level. The character detection information and the straight line segment detection information are then vectorized. For the character detection information, the pixel positions of the characters are first filtered according to the character detection score map, the vertex coordinates of each character frame are then calculated according to the character distance regression map and the character frame angles, and finally part of the character frames are filtered out by non-maximum suppression, which yields target character frames with a large output response and a small overlapping area, together with the frame coordinates corresponding to these target text frames. For the straight line segment detection information, Hough transformation is first performed to obtain a plurality of broken line segments, the distances between the broken segments are then evaluated, nearby segments are finally spliced into long straight line segments, and the endpoint coordinates of the spliced straight line segments are obtained; the target text frames are shown in fig. 14B, and the spliced straight line segments are shown as thicker lines in fig. 14C. Finally, the rotation angle of the characters is determined from the frame coordinates and the rotation angle of the straight line segments is determined from their endpoint coordinates; the rotation angle of the inclined medical document image is determined from these two rotation angles, the rotation matrix is determined from that rotation angle, and the inclined medical document image is processed according to the rotation matrix to obtain the corrected medical document image shown in fig. 14D.
Likewise, fig. 15A-15D also show interface diagrams for correcting the direction of the tilted medical document image, where fig. 15A shows the original medical document image with the tilt, fig. 15B shows the target text frame in the medical document image, fig. 15C shows the straight line segment after the stitching in the medical document image, fig. 15D shows the medical document image after the correction, and the method for obtaining each interface diagram is the same as that in fig. 14, and will not be described again here.
According to the embodiments of the disclosure, feature extraction is performed on the target object in the image to be corrected by the image processing model containing the feature extraction sub-model, the feature fusion sub-model and the post-processing sub-model, and detection information corresponding to the target object is obtained; the detection information is then vectorized to obtain coordinate information corresponding to the target object; finally, the rotation angle corresponding to the image to be corrected is determined based on the coordinate information, and the direction of the image to be corrected is corrected according to that rotation angle. By combining traditional image processing techniques with deep learning, the embodiments of the disclosure can apply fine-grained rotation angle correction to the image to be corrected, which improves correction accuracy; at the same time, the efficiency and accuracy of information detection and information recognition are improved, and the efficiency and accuracy of subsequent information structuring processing are greatly improved.
The following describes embodiments of the apparatus of the present disclosure that may be used to perform the image orientation correction method of the above-described embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the image direction correction method described above in the present disclosure.
Fig. 16 schematically illustrates a block diagram of an image direction correction apparatus according to one embodiment of the present disclosure.
Referring to fig. 16, an image direction correcting apparatus 1600 according to an embodiment of the present disclosure includes: a detection information acquisition module 1601, a coordinate information acquisition module 1602, and an image direction correction module 1603.
The detection information acquisition module 1601 is configured to acquire an image to be corrected, and perform feature extraction on a target object in the image to be corrected through an image processing model to acquire detection information corresponding to the target object; a coordinate information obtaining module 1602, configured to perform vectorization processing on the detection information to obtain coordinate information corresponding to the target object; the image direction correcting module 1603 is configured to determine a rotation angle corresponding to the image to be corrected based on the coordinate information, and correct the direction of the image to be corrected according to the rotation angle.
In one embodiment of the present disclosure, the image processing model includes a feature extraction sub-model, a feature fusion sub-model, and a post-processing sub-model; the detection information acquisition module 1601 includes: the target feature information acquisition unit is used for carrying out multi-layer convolution on the target object through the feature extraction submodel so as to acquire multi-level target feature information; the target fusion characteristic information acquisition unit is used for carrying out characteristic fusion according to the target characteristic information of each level through the characteristic fusion sub-model so as to acquire target fusion characteristic information; and the detection information acquisition unit is used for carrying out convolution processing on the target fusion characteristic information through the post-processing sub-model so as to acquire detection information corresponding to the target object.
In one embodiment of the disclosure, the feature extraction sub-model comprises a first convolution layer, a pooling layer connected with the first convolution layer, and a residual network module connected with the pooling layer, wherein the residual network module comprises M+1 residual network layers connected in sequence, and M is a positive integer; the target feature information acquisition unit is configured to: inputting the image to be corrected into the first convolution layer, and extracting the characteristics of the target object through the first convolution layer to obtain initial characteristic information; inputting the initial characteristic information into the pooling layer, and performing dimension reduction processing on the initial characteristic information through the pooling layer to obtain dimension reduction characteristic information; and inputting the dimension reduction characteristic information into the residual error network module, and carrying out characteristic extraction on the dimension reduction characteristic information through the residual error network layers which are sequentially connected in the residual error network module so as to obtain the multi-stage target characteristic information.
In one embodiment of the disclosure, the feature fusion sub-model includes N fusion network layers connected in sequence and a second convolution layer connected with an nth fusion network layer; the target fusion characteristic information acquisition unit is configured to: fusing the n-1 level fusion characteristic information and the target characteristic information output by the M+1-n residual error network layers through the nth fusion network layer to obtain n level fusion characteristic information; repeating the previous step until N-level fusion characteristic information is obtained; inputting the N-level fusion characteristic information into the second convolution layer, and carrying out characteristic extraction on the N-level fusion characteristic information through the second convolution layer so as to acquire the target fusion characteristic information; the zero-order fusion characteristic information is target characteristic information output by an M+1th residual error network layer, N is a positive integer not exceeding N, and N is a positive integer not exceeding M.
In one embodiment of the disclosure, the target object is a text, the detection information is text detection information, and the post-processing sub-model includes a third convolution layer, a fourth convolution layer, and a fifth convolution layer that are independent of each other; the detection information acquisition unit is configured to: extracting features of the target fusion feature information through the third convolution layer to obtain a text detection score map in the text detection information; extracting features of the target fusion feature information through the fourth convolution layer to obtain a text distance regression graph in the text detection information; and extracting features of the target fusion feature information through the fifth convolution layer to obtain character frame angle information in the character detection information.
In one embodiment of the disclosure, the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing sub-model includes a sixth convolution layer; the detection information acquisition unit is configured to: and extracting the characteristics of the target fusion characteristic information through the sixth convolution layer to acquire the straight line segment detection information.
In one embodiment of the disclosure, the target object is a text and a straight line segment, the detection information is a text detection information and a straight line segment detection information, and the post-processing sub-model includes a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer, which are independent of each other; the detection information acquisition unit is configured to: extracting features of the target fusion feature information through the seventh convolution layer to obtain a text detection score map in the text detection information; extracting features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information; extracting features of the target fusion feature information through the ninth convolution layer to obtain character frame angle information in the character detection information; and extracting the characteristics of the target fusion characteristic information through the tenth convolution layer to acquire the straight line segment detection information.
In one embodiment of the present disclosure, the coordinate information acquisition module 1602 is configured to: screening pixels in the text detection score map according to a first threshold value to obtain target pixels with text detection scores greater than or equal to the first threshold value; calculating the frame coordinates of the characters corresponding to the target pixels according to the character distance regression graph and the character frame angle information; and filtering the frame coordinates according to the text detection score corresponding to the target pixel and the overlapping degree of the text frame corresponding to the frame coordinates so as to obtain the frame coordinates corresponding to the target text frame.
In one embodiment of the present disclosure, the coordinate information acquisition module 1602 is configured to: performing Hough transformation on the straight line segment detection information to obtain coordinate information of a plurality of line segments; determining any two line segments in the plurality of line segments as a first line segment and a second line segment, and calculating a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment; judging whether the first distance and the second distance are smaller than a second threshold value or not; when a first target line segment and a second target line segment with the first distance and the second distance smaller than the second threshold exist, splicing the first target line segment and the second target line segment; and acquiring the endpoint coordinates of the straight line segment formed after the plurality of line segments are spliced.
In one embodiment of the disclosure, the target object is a word, and the coordinate information is a frame coordinate of a frame of the target word; the image direction correction module 1603 is configured to: determining an upper edge line and a lower edge line of the text frame according to the frame coordinates of the target text frame; calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and obtaining a first target slope with the largest occurrence times; and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
In one embodiment of the disclosure, the target object is a straight line segment, and the coordinate information is an endpoint coordinate of the straight line segment; the image direction correction module 1603 is configured to: calculating the slope of the straight line segment according to the endpoint coordinates of the straight line segment; counting the occurrence times of the slopes, and obtaining a second target slope with the largest occurrence times; and determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In one embodiment of the disclosure, the target object is a text and a straight line segment, and the coordinate information is frame coordinates of a frame of the target text and endpoint coordinates of the straight line segment; the image direction correction module 1603 is configured to: calculating the rotation angle of the text according to the frame coordinates of the target text frame, and calculating the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment; the rotation angle of the characters is differenced with the rotation angle of the straight line segment, and the absolute value is taken, so that the rotation angle difference is obtained; comparing the rotation angle difference with a third threshold; when the rotation angle difference is smaller than or equal to the third threshold value, acquiring an average value of the rotation angle of the characters and the rotation angle of the straight line segment, and taking the average value as the rotation angle corresponding to the image to be corrected; and when the rotation angle difference is larger than the third threshold value, taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In one embodiment of the present disclosure, the image direction correction module 1603 is configured to: determining a rotation matrix according to the rotation angle and the center point coordinates of the image to be corrected; multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix to correct the direction of the image to be corrected.
Fig. 17 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that the computer system 1700 of the electronic device shown in fig. 17 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present disclosure.
As shown in fig. 17, the computer system 1700 includes a central processing unit (Central Processing Unit, CPU) 1701, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1702 or a program loaded from a storage portion 1708 into a Random Access Memory (RAM) 1703, implementing the image direction correction method described in the above embodiments. In the RAM 1703, various programs and data required for system operation are also stored. The CPU 1701, ROM 1702, and RAM 1703 are connected to each other through a bus 1704. An Input/Output (I/O) interface 1705 is also connected to the bus 1704.
The following components are connected to the I/O interface 1705: an input section 1706 including a keyboard, a mouse, and the like; an output portion 1707 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker, etc.; a storage portion 1708 including a hard disk or the like; and a communication section 1709 including a network interface card such as a LAN (Local Area Network ) card, a modem, or the like. The communication section 1709 performs communication processing via a network such as the internet. The driver 1710 is also connected to the I/O interface 1705 as needed. A removable medium 1711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 1710, so that a computer program read therefrom is installed into the storage portion 1708 as needed.
In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1709, and/or installed from the removable media 1711. When executed by a Central Processing Unit (CPU) 1701, performs the various functions defined in the system of the present disclosure.
It should be noted that, the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present disclosure also provides a computer-readable medium that may be contained in the image processing apparatus described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. An image direction correction method, comprising:
acquiring an image to be corrected, and carrying out multi-layer convolution on a target object in the image to be corrected through a feature extraction sub-model in an image processing model so as to acquire multi-level target feature information; performing feature fusion according to the target feature information of each level through a feature fusion sub-model in the image processing model so as to obtain target fusion feature information; convolving the target fusion characteristic information through a post-processing sub-model in the image processing model to obtain detection information corresponding to the target object; the target object is characters and/or straight line segments in the image to be corrected, the detection information is character detection information and/or straight line segment detection information, and the character detection information comprises a character detection score graph, a character distance regression graph and a character frame angle; the feature fusion sub-model comprises N fusion network layers and a second convolution layer, wherein the N fusion network layers are sequentially connected, and the second convolution layer is connected with the Nth fusion network layer;
Vectorizing the detection information to obtain coordinate information corresponding to the target object;
and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle.
2. The method of claim 1, wherein the feature extraction sub-model comprises a first convolution layer, a pooling layer connected to the first convolution layer, and a residual network module connected to the pooling layer, wherein the residual network module comprises m+1 sequentially connected residual network layers, M being a positive integer;
the multi-layer convolution is performed on the target object in the image to be corrected through the feature extraction sub-model in the image processing model so as to obtain multi-level target feature information, and the multi-layer convolution comprises the following steps:
inputting the image to be corrected into the first convolution layer, and extracting the characteristics of the target object through the first convolution layer to obtain initial characteristic information;
inputting the initial characteristic information into the pooling layer, and performing dimension reduction processing on the initial characteristic information through the pooling layer to obtain dimension reduction characteristic information;
and inputting the dimension reduction characteristic information into the residual error network module, and carrying out characteristic extraction on the dimension reduction characteristic information through the residual error network layers which are sequentially connected in the residual error network module so as to obtain the multi-stage target characteristic information.
3. The method according to claim 1, wherein the feature fusion is performed by a feature fusion sub-model in the image processing model according to the target feature information of each level to obtain target fusion feature information, including:
fusing the n-1 level fusion characteristic information and the target characteristic information output by the M+1-n residual error network layers through the nth fusion network layer to obtain n level fusion characteristic information;
repeating the previous step until N-level fusion characteristic information is obtained;
inputting the N-level fusion characteristic information into the second convolution layer, and carrying out characteristic extraction on the N-level fusion characteristic information through the second convolution layer so as to acquire the target fusion characteristic information;
the zero-order fusion characteristic information is target characteristic information output by an M+1th residual error network layer, N is a positive integer not exceeding N, and N is a positive integer not exceeding M.
4. The method of claim 1, wherein the target object is text and the detection information is text detection information, and wherein the post-processing sub-model comprises a third convolution layer, a fourth convolution layer, and a fifth convolution layer that are independent of each other;
the convolving the target fusion characteristic information through a post-processing sub-model in the image processing model to obtain detection information corresponding to the target object, including:
Extracting features of the target fusion feature information through the third convolution layer to obtain a text detection score map in the text detection information;
extracting features of the target fusion feature information through the fourth convolution layer to obtain a text distance regression graph in the text detection information;
and extracting features of the target fusion feature information through the fifth convolution layer to obtain a character frame angle in the character detection information.
5. The method of claim 1, wherein the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing sub-model comprises a sixth convolution layer;
the convolving the target fusion characteristic information through a post-processing sub-model in the image processing model to obtain detection information corresponding to the target object, including:
and extracting the characteristics of the target fusion characteristic information through the sixth convolution layer to acquire the straight line segment detection information.
6. The method of claim 1, wherein the target object is text and straight line segments, the detection information is text detection information and straight line segment detection information, and the post-processing sub-model comprises a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer, independent of each other;
The convolving the target fusion characteristic information through a post-processing sub-model in the image processing model to obtain detection information corresponding to the target object, including:
extracting features of the target fusion feature information through the seventh convolution layer to obtain a text detection score map in the text detection information;
extracting features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information;
extracting features of the target fusion feature information through the ninth convolution layer to obtain a character frame angle in the character detection information;
and extracting the characteristics of the target fusion characteristic information through the tenth convolution layer to acquire the straight line segment detection information.
7. The method of claim 4, wherein vectorizing the detection information to obtain coordinate information corresponding to the target object comprises:
screening pixels in the text detection score map according to a first threshold value to obtain target pixels with text detection scores greater than or equal to the first threshold value;
Calculating the frame coordinates of the characters corresponding to the target pixels according to the character distance regression graph and the character frame angles;
and filtering the frame coordinates according to the text detection score corresponding to the target pixel and the overlapping degree of the text frame corresponding to the frame coordinates so as to obtain the frame coordinates corresponding to the target text frame.
8. The method of claim 5, wherein vectorizing the detection information to obtain coordinate information corresponding to the target object comprises:
performing Hough transformation on the straight line segment detection information to obtain coordinate information of a plurality of line segments;
determining any two line segments in the plurality of line segments as a first line segment and a second line segment, and calculating a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment;
judging whether the first distance and the second distance are smaller than a second threshold value or not;
when a first target line segment and a second target line segment with the first distance and the second distance smaller than the second threshold exist, splicing the first target line segment and the second target line segment;
And acquiring the endpoint coordinates of the straight line segment formed after the plurality of line segments are spliced.
9. The method of claim 1, wherein the target object is a text and the coordinate information is a frame coordinate of a frame of the target text;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
determining an upper edge line and a lower edge line of the target text frame according to the frame coordinates of the target text frame;
calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and obtaining a first target slope with the largest occurrence times;
and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
10. The method of claim 1, wherein the target object is a straight line segment and the coordinate information is an endpoint coordinate of the straight line segment;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
calculating the slope of the straight line segment according to the endpoint coordinates of the straight line segment;
counting the occurrence times of the slopes, and obtaining a second target slope with the largest occurrence times;
And determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
11. The method of claim 1, wherein the target object is a text and a straight line segment, and the coordinate information is a frame coordinate of a frame of the target text and an endpoint coordinate of the straight line segment;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
calculating the rotation angle of the text according to the frame coordinates of the target text frame, and calculating the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment;
the rotation angle of the characters is differenced with the rotation angle of the straight line segment, and the absolute value is taken, so that the rotation angle difference is obtained;
comparing the rotation angle difference with a third threshold;
when the rotation angle difference is smaller than or equal to the third threshold value, acquiring an average value of the rotation angle of the characters and the rotation angle of the straight line segment, and taking the average value as the rotation angle corresponding to the image to be corrected;
and when the rotation angle difference is larger than the third threshold value, taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
12. The method according to claim 1, wherein correcting the direction of the image to be corrected according to the rotation angle comprises:
determining a rotation matrix according to the rotation angle and the center point coordinates of the image to be corrected;
multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix to correct the direction of the image to be corrected.
13. An image direction correcting apparatus, comprising:
the detection information acquisition module is used for acquiring an image to be corrected, and carrying out multi-layer convolution on a target object in the image to be corrected through a feature extraction sub-model in the image processing model so as to acquire multi-level target feature information; performing feature fusion according to the target feature information of each level through a feature fusion sub-model in the image processing model so as to obtain target fusion feature information; convolving the target fusion characteristic information through a post-processing sub-model in the image processing model to obtain detection information corresponding to the target object; the target object is characters and/or straight line segments in the image to be corrected, the detection information is character detection information and/or straight line segment detection information, and the character detection information comprises a character detection score graph, a character distance regression graph and a character frame angle; the feature fusion sub-model comprises N fusion network layers and a second convolution layer, wherein the N fusion network layers are sequentially connected, and the second convolution layer is connected with the Nth fusion network layer;
the coordinate information acquisition module is used for carrying out vectorization processing on the detection information so as to acquire coordinate information corresponding to the target object;
and the image direction correction module is used for determining a rotation angle corresponding to the image to be corrected based on the coordinate information and correcting the direction of the image to be corrected according to the rotation angle.
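The following PyTorch sketch illustrates the three sub-models named in claim 13 (feature extraction, feature fusion with N sequentially connected fusion layers followed by a second convolution layer, and a post-processing head producing a text score map, a distance regression map and a box angle); the backbone depth, channel widths and value of N are assumptions for illustration only, not the patented network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageProcessingModel(nn.Module):
    def __init__(self, channels=(32, 64, 128, 256)):
        super().__init__()
        # Feature extraction sub-model: stacked convolutions producing
        # multi-level target feature information (one map per level).
        layers, in_ch = [], 3
        for out_ch in channels:
            layers.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.extract = nn.ModuleList(layers)

        # Feature fusion sub-model: N fusion network layers connected in
        # sequence, followed by a second convolution layer.
        self.fuse = nn.ModuleList([
            nn.Conv2d(channels[i] + channels[i + 1], channels[i], 1)
            for i in reversed(range(len(channels) - 1))])
        self.second_conv = nn.Conv2d(channels[0], channels[0], 3, padding=1)

        # Post-processing sub-model: convolutions mapping the fused features
        # to detection information (score map, distance regression, angle).
        self.score_head = nn.Conv2d(channels[0], 1, 1)
        self.dist_head = nn.Conv2d(channels[0], 4, 1)
        self.angle_head = nn.Conv2d(channels[0], 1, 1)

    def forward(self, x):
        # Multi-layer convolution -> multi-level target feature information.
        feats = []
        for layer in self.extract:
            x = layer(x)
            feats.append(x)

        # Fuse from the deepest level upwards to obtain the
        # target fusion feature information.
        fused = feats[-1]
        for fusion_layer, skip in zip(self.fuse, reversed(feats[:-1])):
            fused = F.interpolate(fused, size=skip.shape[2:],
                                  mode='bilinear', align_corners=False)
            fused = fusion_layer(torch.cat([fused, skip], dim=1))
        fused = self.second_conv(fused)

        # Detection information corresponding to the target object.
        return (torch.sigmoid(self.score_head(fused)),
                self.dist_head(fused),
                self.angle_head(fused))
```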
14. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image direction correction method of any one of claims 1 to 12.
CN201911115498.5A 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment Active CN111104941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911115498.5A CN111104941B (en) 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN111104941A CN111104941A (en) 2020-05-05
CN111104941B (en) 2023-06-13

Family

ID=70420558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911115498.5A Active CN111104941B (en) 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111104941B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115917586A (en) * 2020-11-11 2023-04-04 Oppo广东移动通信有限公司 Image processing method, device, equipment and storage medium
CN113158895B (en) * 2021-04-20 2023-11-14 北京中科江南信息技术股份有限公司 Bill identification method and device, electronic equipment and storage medium
CN113397459A (en) * 2021-05-18 2021-09-17 浙江师范大学 Capsule type medical device control system and method based on electromechanical integration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003271897A (en) * 2002-03-15 2003-09-26 Ricoh Co Ltd Character recognizer, image processor, image processing method, and program used for executing the method
JP2010220059A (en) * 2009-03-18 2010-09-30 Casio Computer Co Ltd Apparatus, program, method and system for image processing
CN109614972A (en) * 2018-12-06 2019-04-12 泰康保险集团股份有限公司 Image processing method, device, electronic equipment and computer-readable medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077914A1 (en) * 2001-03-23 2002-10-03 International Business Machines Corporation Method, system and program for inputting handwritten characters
CN108446698B (en) * 2018-03-15 2020-08-21 腾讯大地通途(北京)科技有限公司 Method, device, medium and electronic equipment for detecting text in image
CN109271967B (en) * 2018-10-16 2022-08-26 腾讯科技(深圳)有限公司 Method and device for recognizing text in image, electronic equipment and storage medium
CN109993202B (en) * 2019-02-15 2023-08-22 广东智媒云图科技股份有限公司 Line manuscript type graph similarity judging method, electronic equipment and storage medium
CN110378249B (en) * 2019-06-27 2024-01-12 腾讯科技(深圳)有限公司 Text image inclination angle recognition method, device and equipment
CN110427939A (en) * 2019-08-02 2019-11-08 泰康保险集团股份有限公司 Method, apparatus, medium and the electronic equipment of correction inclination text image


Also Published As

Publication number Publication date
CN111104941A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
US20220058426A1 (en) Object recognition method and apparatus, electronic device, and readable storage medium
CN108304835B (en) character detection method and device
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
US11775574B2 (en) Method and apparatus for visual question answering, computer device and medium
US20180114071A1 (en) Method for analysing media content
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN111104941B (en) Image direction correction method and device and electronic equipment
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
US11768876B2 (en) Method and device for visual question answering, computer apparatus and medium
WO2020223859A1 (en) Slanted text detection method, apparatus and device
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN111241989A (en) Image recognition method and device and electronic equipment
CN113011144B (en) Form information acquisition method, device and server
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112052845A (en) Image recognition method, device, equipment and storage medium
CN115631112B (en) Building contour correction method and device based on deep learning
CN115131218A (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN115577768A (en) Semi-supervised model training method and device
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN112651399B (en) Method for detecting same-line characters in inclined image and related equipment thereof
TWI803243B (en) Method for expanding images, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant