CN111104941A - Image direction correcting method and device and electronic equipment


Info

Publication number
CN111104941A
Authority
CN
China
Prior art keywords
information
image
target
line segment
rotation angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911115498.5A
Other languages
Chinese (zh)
Other versions
CN111104941B (en)
Inventor
郭双双
龚星
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911115498.5A priority Critical patent/CN111104941B/en
Publication of CN111104941A publication Critical patent/CN111104941A/en
Application granted granted Critical
Publication of CN111104941B publication Critical patent/CN111104941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/469 Contour-based spatial representations, e.g. vector-coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image direction correcting method and device, and electronic equipment, relating to the field of artificial intelligence. The method includes: acquiring an image to be corrected, and performing feature extraction on a target object in the image to be corrected through an image processing model to obtain detection information corresponding to the target object; vectorizing the detection information to obtain coordinate information corresponding to the target object; and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle. The method and device can correct a tilted image at a fine-grained rotation angle, improving the accuracy of image direction correction; at the same time, they improve the detection efficiency and recognition accuracy of the information in document images, greatly improving the efficiency and accuracy of subsequent information structuring.

Description

Image direction correcting method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image direction correction method, an image direction correction apparatus, a computer storage medium, and an electronic device.
Background
With the rapid development of computer technology, business processes in various industries are gradually shifting from manual handling to machine processing. With the explosive growth of data volume, ever higher demands are being placed on the efficiency and accuracy of machine processing.
Taking document information extraction as an example, a document image may be tilted because of the shooting angle or other factors, and the tilted document image must be corrected to ensure efficient and accurate recognition of the document information. At present, document images are mainly corrected by rotating them by a limited number of fixed discrete values, for example only 90° or 180°. This approach is highly limited, and the correction effect is poor for document images with other inclination angles.
In view of this, there is a need in the art to develop a new image direction correction method.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides an image direction correcting method, an image direction correcting device, a computer storage medium and an electronic device, so that fine-grained direction correction can be performed on an oblique image at least to a certain extent, and the accuracy of image direction correction is improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of an embodiment of the present disclosure, there is provided an image direction correction method including: acquiring an image to be corrected, and performing feature extraction on a target object in the image to be corrected through an image processing model to acquire detection information corresponding to the target object; vectorizing the detection information to obtain coordinate information corresponding to the target object; and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle.
According to an aspect of an embodiment of the present disclosure, there is provided an image direction correcting apparatus including: the detection information acquisition module is used for acquiring an image to be corrected and extracting the characteristics of a target object in the image to be corrected through an image processing model so as to acquire detection information corresponding to the target object; a coordinate information obtaining module, configured to perform vectorization processing on the detection information to obtain coordinate information corresponding to the target object; and the image direction correcting module is used for determining a rotating angle corresponding to the image to be corrected based on the coordinate information and correcting the direction of the image to be corrected according to the rotating angle.
In some embodiments of the present disclosure, the image processing model comprises a feature extraction sub-model, a feature fusion sub-model, and a post-processing sub-model; based on the foregoing solution, the detection information obtaining module includes: the target characteristic information acquisition unit is used for carrying out multilayer convolution on the target object through the characteristic extraction submodel so as to acquire multistage target characteristic information; the target fusion characteristic information acquisition unit is used for carrying out characteristic fusion according to the target characteristic information of each level through the characteristic fusion submodel to acquire target fusion characteristic information; and the detection information acquisition unit is used for performing convolution processing on the target fusion characteristic information through the post-processing sub-model so as to acquire detection information corresponding to the target object.
In some embodiments of the present disclosure, the feature extraction submodel includes a first convolution layer, a pooling layer connected to the first convolution layer, and a residual network module connected to the pooling layer, wherein the residual network module includes M +1 residual network layers connected in sequence, M is a positive integer; based on the foregoing solution, the target feature information obtaining unit is configured to: inputting the image to be corrected to the first convolution layer, and performing feature extraction on the target object through the first convolution layer to obtain initial feature information; inputting the initial characteristic information into the pooling layer, and performing dimensionality reduction processing on the initial characteristic information through the pooling layer to obtain dimensionality reduction characteristic information; and inputting the dimension reduction characteristic information into the residual error network module, and performing characteristic extraction on the dimension reduction characteristic information through the residual error network layers sequentially connected in the residual error network module to obtain the multi-stage target characteristic information.
In some embodiments of the present disclosure, the feature fusion sub-model includes N fusion network layers connected in sequence and a second convolutional layer connected to the nth fusion network layer; based on the foregoing scheme, the target fusion feature information obtaining unit is configured to: fusing the n-1 level fusion characteristic information and the target characteristic information output by the M +1-n residual error network layers through the nth fusion network layer to obtain n level fusion characteristic information; repeating the previous step until N-level fusion characteristic information is obtained; inputting the N-level fusion feature information into the second convolution layer, and performing feature extraction on the N-level fusion feature information through the second convolution layer to obtain the target fusion feature information; the zero-level fusion feature information is target feature information output by an M +1 th residual network layer, N is a positive integer not exceeding N, and N is a positive integer not exceeding M.
In some embodiments of the present disclosure, the target object is a text, the detection information is text detection information, and the post-processing submodel includes a third convolution layer, a fourth convolution layer, and a fifth convolution layer independently of each other; based on the foregoing solution, the detection information obtaining unit is configured to: extracting the characteristics of the target fusion characteristic information through the third convolution layer to obtain a character detection score map in the character detection information; extracting the characteristics of the target fusion characteristic information through the fourth convolution layer to obtain a text distance regression graph in the text detection information; and performing feature extraction on the target fusion feature information through the fifth convolution layer to acquire character frame angle information in the character detection information.
In some embodiments of the present disclosure, the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing submodel includes a sixth convolutional layer; based on the foregoing solution, the detection information obtaining unit is configured to: and performing feature extraction on the target fusion feature information through the sixth convolution layer to acquire the straight-line segment detection information.
In some embodiments of the present disclosure, the target object is a text and a straight line segment, the detection information is text detection information and straight line segment detection information, and the post-processing submodel includes a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer independently of each other; based on the foregoing solution, the detection information obtaining unit is configured to: extracting the characteristics of the target fusion characteristic information through the seventh convolution layer to obtain a character detection score map in the character detection information; extracting the features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information; extracting the characteristics of the target fusion characteristic information through the ninth convolution layer to obtain character frame angle information in the character detection information; and performing feature extraction on the target fusion feature information through the tenth convolution layer to acquire the straight-line segment detection information.
In some embodiments of the present disclosure, based on the foregoing solution, the coordinate information obtaining module is configured to: screening pixels in the character detection score map according to a first threshold value to obtain target pixels with character detection scores larger than or equal to the first threshold value; calculating the frame coordinate of the character corresponding to the target pixel according to the character distance regression graph and the character frame angle information; and filtering the frame coordinates according to the character detection scores corresponding to the target pixels and the overlapping degree of the character frames corresponding to the frame coordinates to obtain the frame coordinates corresponding to the target character frames.
In some embodiments of the present disclosure, based on the foregoing solution, the coordinate information obtaining module is configured to: perform a Hough transform on the straight-line-segment detection information to obtain coordinate information of a plurality of line segments; determine any two of the line segments as a first line segment and a second line segment, and calculate a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment; judge whether the first distance and the second distance are both smaller than a second threshold; when a target first line segment and a target second line segment exist for which the first distance and the second distance are both smaller than the second threshold, splice the target first line segment and the target second line segment; and acquire the endpoint coordinates of the straight line segments formed by splicing the line segments.
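As an illustration of this line-segment vectorization, the following Python sketch uses OpenCV's probabilistic Hough transform on the pixel-level line map and a pairwise midpoint-distance test for splicing; the function names, thresholds, and the greedy merging strategy are illustrative assumptions rather than details fixed by the disclosure.

    import cv2
    import numpy as np

    def point_to_segment_distance(p, a, b):
        # Distance from point p to the segment with endpoints a and b.
        p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
        return np.linalg.norm(p - (a + t * ab))

    def merge_segments(segments, merge_thresh=5.0):
        # Greedy splice of pairs whose midpoints lie close to the other segment
        # (an illustrative simplification of the pairwise test described above).
        segments = [np.array(s, float).reshape(2, 2) for s in segments]
        used = [False] * len(segments)
        merged = []
        for i in range(len(segments)):
            if used[i]:
                continue
            cur = segments[i]
            for j in range(i + 1, len(segments)):
                if used[j]:
                    continue
                d1 = point_to_segment_distance(cur.mean(0), segments[j][0], segments[j][1])
                d2 = point_to_segment_distance(segments[j].mean(0), cur[0], cur[1])
                if d1 < merge_thresh and d2 < merge_thresh:
                    # Splice: keep the two farthest endpoints of the pair.
                    pts = np.vstack([cur, segments[j]])
                    dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
                    a, b = np.unravel_index(np.argmax(dists), dists.shape)
                    cur = np.vstack([pts[a], pts[b]])
                    used[j] = True
            merged.append(cur)
        return merged  # list of 2x2 arrays of endpoint coordinates

    def vectorize_line_map(line_map, score_thresh=0.5):
        # line_map: HxW float map from the line-segment head, values in [0, 1].
        binary = (line_map >= score_thresh).astype(np.uint8) * 255
        lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=50,
                                minLineLength=30, maxLineGap=5)
        if lines is None:
            return []
        segments = [((x1, y1), (x2, y2)) for x1, y1, x2, y2 in lines[:, 0]]
        return merge_segments(segments)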
In some embodiments of the present disclosure, the target object is a text, and the coordinate information is a frame coordinate of a frame of the target text; based on the foregoing solution, the image direction correcting module is configured to: determining an upper edge line and a lower edge line of the character frame according to the frame coordinates of the target character frame; calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and acquiring a first target slope with the largest occurrence times; and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
In some embodiments of the present disclosure, the target object is a straight line segment, and the coordinate information is an endpoint coordinate of the straight line segment; based on the foregoing solution, the image direction correcting module is configured to: calculating the slope of the straight line segment according to the coordinates of the endpoint of the straight line segment; counting the occurrence times of the slopes, and acquiring a second target slope with the maximum occurrence times; and determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In some embodiments of the present disclosure, the target object is a text and a straight line segment, and the coordinate information is a frame coordinate of a frame of the target text and an endpoint coordinate of the straight line segment; based on the foregoing solution, the image direction correcting module is configured to: calculate the rotation angle of the text according to the frame coordinates of the frame of the target text, and calculate the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment; take the absolute value of the difference between the rotation angle of the text and the rotation angle of the straight line segment to obtain a rotation angle difference; compare the rotation angle difference with a third threshold; when the rotation angle difference is smaller than or equal to the third threshold, obtain the average value of the rotation angle of the text and the rotation angle of the straight line segment, and take the average value as the rotation angle corresponding to the image to be corrected; and when the rotation angle difference is larger than the third threshold, take the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
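The angle-selection logic described in the preceding paragraphs can be sketched in Python as follows, assuming the rotation angles are obtained from the slopes via the arctangent and that "the slope with the largest number of occurrences" is found by rounding and counting; the rounding precision and threshold value are illustrative assumptions.

    import math
    from collections import Counter

    def dominant_angle(slopes, precision=1.0):
        # Most frequent rotation angle (degrees), derived from a list of slopes.
        angles = [round(math.degrees(math.atan(s)) / precision) * precision
                  for s in slopes]
        return Counter(angles).most_common(1)[0][0]

    def image_rotation_angle(text_slopes, line_slopes, angle_diff_thresh=10.0):
        text_angle = dominant_angle(text_slopes)   # from top/bottom edges of text borders
        line_angle = dominant_angle(line_slopes)   # from detected straight segments
        if abs(text_angle - line_angle) <= angle_diff_thresh:
            return (text_angle + line_angle) / 2.0
        return line_angle  # fall back to the line-segment estimate when they disagree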
In some embodiments of the present disclosure, based on the foregoing, the image direction correcting module is configured to: determining a rotation matrix according to the rotation angle and the coordinates of the central point of the image to be corrected; and multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix so as to correct the direction of the image to be corrected.
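A sketch of this final step using OpenCV's rotation utilities is given below; building the matrix with cv2.getRotationMatrix2D about the image centre and applying it with cv2.warpAffine is one common way to realize the pixel-matrix transformation described above, not necessarily the exact implementation of the disclosure.

    import cv2

    def correct_direction(image, rotation_angle_deg):
        h, w = image.shape[:2]
        center = (w / 2.0, h / 2.0)
        # 2x3 affine rotation matrix about the image centre.
        rot_mat = cv2.getRotationMatrix2D(center, rotation_angle_deg, 1.0)
        return cv2.warpAffine(image, rot_mat, (w, h),
                              flags=cv2.INTER_LINEAR,
                              borderValue=(255, 255, 255))  # pad with white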
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the image orientation correction method according to the embodiments described above.
According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image orientation correction method as described in the above embodiments.
In the technical solutions provided by the embodiments of the present disclosure, an image to be corrected is first acquired, and feature extraction is performed on a target object in the image through an image processing model to obtain detection information corresponding to the target object; the detection information is then vectorized to obtain coordinate information corresponding to the target object; finally, a rotation angle corresponding to the image to be corrected is determined from the coordinate information, and the direction of the image to be corrected is corrected according to the rotation angle. The technical solutions of the present disclosure can perform fine-grained direction correction on a tilted image, improving the accuracy of image direction correction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which technical aspects of embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of an image orientation correction method according to one embodiment of the present disclosure;
FIG. 3 schematically shows a structural schematic of an image processing model according to one embodiment of the present disclosure;
FIG. 4 schematically shows a structural schematic of a feature extraction submodel according to one embodiment of the present disclosure;
FIG. 5 schematically shows a flow diagram of feature extraction by an image processing model according to one embodiment of the present disclosure;
FIG. 6 schematically shows a structural schematic of a residual network layer according to an embodiment of the present disclosure;
FIG. 7 schematically shows a structural schematic of a feature fusion submodel according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates a structural schematic of a converged network layer, according to one embodiment of the present disclosure;
FIGS. 9A-9C schematically illustrate interface diagrams of an image to be corrected before and after processing by an image processing model, according to one embodiment of the present disclosure;
fig. 10 schematically shows a flowchart of vectorization processing according to text detection information according to an embodiment of the present disclosure;
FIGS. 11A-11B schematically illustrate interface diagrams of images to be corrected before and after line-segment splicing, according to one embodiment of the present disclosure;
FIGS. 12A-12B schematically illustrate interfaces after vectorization processing of detection information according to one embodiment of the present disclosure;
FIGS. 13A-13B schematically illustrate interface diagrams before and after correcting the direction of an image to be corrected, according to one embodiment of the present disclosure;
FIGS. 14A-14D schematically illustrate interface diagrams for direction correction of a tilted medical document image according to one embodiment of the present disclosure;
FIGS. 15A-15D schematically illustrate interface diagrams for direction correction of a tilted medical document image according to one embodiment of the present disclosure;
fig. 16 schematically shows a block diagram of an image orientation correcting apparatus according to an embodiment of the present disclosure;
FIG. 17 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired. For example, the server 103 may be a server cluster composed of a plurality of servers. The terminal apparatus 101 may be a terminal apparatus with an image pickup unit and a display screen, such as a notebook, a portable computer, a smartphone, or a terminal apparatus such as a camera, a video camera, or the like.
In one embodiment of the present disclosure, a user may photograph a scene containing a target object with the camera unit of the terminal device 101 to acquire an image including the target object. Because of the shooting angle, improper operation, or constraints of the capture scene, the resulting image may be tilted and not in the normal reading direction. The terminal device 101 may send the tilted image to be corrected to the server 103 through the network 102. After receiving the image to be corrected, the server 103 may perform feature extraction on the target object in it to obtain detection information corresponding to the target object, where the detection information is pixel-level information represented by pixel values, for example where text is and where lines are; vectorization processing may then be performed on the detection information to obtain coordinate information corresponding to the target object, for example the border coordinates of a text border or the endpoint coordinates of a straight line segment; a rotation angle corresponding to the image to be corrected, i.e., the tilt angle of the image, may then be determined based on the coordinate information of the target object; and finally a rotation matrix is determined from the rotation angle, and the pixels of the image to be corrected are transformed according to the rotation matrix to correct the direction of the image. The image to be corrected in the embodiments of the present disclosure may be any of various types of document images, such as medical records, expense lists, settlement lists, medical invoices, inspection reports, and financial statements. The technical solutions of the embodiments of the present disclosure can correct an image with any inclination angle, thereby achieving fine-grained direction correction of tilted images and improving the accuracy of image direction correction; in addition, recognizing information from the direction-corrected image improves both recognition efficiency and recognition accuracy.
It should be noted that the image direction correcting method provided by the embodiment of the present disclosure is generally executed by a server, and accordingly, the image direction correcting apparatus is generally disposed in the server. However, in other embodiments of the present disclosure, the image direction correcting method provided by the embodiments of the present disclosure may also be performed by a terminal device.
In the related art, taking the correction of document images as an example, document images come in many different styles, and it is difficult to ensure that a method based on simple spatial rules applies to all of them. At present there are two main methods for correcting image direction. The first method first locates the text region in the document image and then extracts structural features, including character aspect ratio, stroke features, and connected-component features, from the text blocks in that region. If the response value of each structural feature is greater than a preset confidence level, the document is in the normal reading direction; otherwise, the image is rotated by 90°, 180°, and 270° in turn until the structural-feature response exceeds the confidence level, i.e., until the direction of the document image has been corrected to the normal reading direction. The second method first obtains text candidate regions containing valid information, then uses a neural network to extract features from the text candidate regions, classifies text regions and non-text regions with a Softmax function, and finally groups the text regions, progressively obtaining the parts that overlap with training-set images and superimposing all the overlapping parts to obtain the final corrected text.
In both methods, the text region is first located and then rotated a limited number of times in order to find the orientation with the maximum feature response, or the orientation in which the text region overlaps the training-set images. However, the rotation angle of a document image is not restricted to a finite set of values, so classifying the orientation of a document image into one of a few preset values is inaccurate and unreliable, which greatly limits these methods in practical application scenarios.
In view of the problems in the related art, the embodiments of the present disclosure provide an image direction correction method implemented based on Artificial Intelligence (AI). Artificial intelligence is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Its basic technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and to further process images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of acquiring information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, AI has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous and self-driving vehicles, unmanned aerial vehicles, robots, smart healthcare, and smart customer service.
The scheme provided by the embodiment of the disclosure relates to an artificial intelligence image processing technology, and is specifically explained by the following embodiment:
fig. 2 schematically illustrates a flowchart of an image orientation correction method according to one embodiment of the present disclosure, which may be performed by a server, which may be the server 103 illustrated in fig. 1. Referring to fig. 2, the image direction correcting method at least includes steps S210 to S230, which are described in detail as follows:
in step S210, an image to be corrected is obtained, and feature extraction is performed on a target object in the image to be corrected through an image processing model, so as to obtain detection information corresponding to the target object.
In an embodiment of the present disclosure, taking a document image as an example, when information in a document image is extracted by a computer algorithm, the paper document generally needs to be photographed to obtain the document image, which is then uploaded to a terminal or a server for subsequent information recognition and extraction. When a user photographs a paper document with the terminal device 101 equipped with a camera unit, the resulting document image may be tilted because of factors such as the shooting angle and the placement angle of the paper; likewise, when the user uploads a document image from local storage or other storage media through the terminal device 101, the document image may be tilted because of improper operation. A tilted document image is an image in an abnormal reading direction, and when the text in such an image is later recognized by an optical character recognition (OCR) algorithm or another machine learning model, both recognition efficiency and recognition accuracy are poor; the poor text recognition result in turn seriously affects the subsequent information structuring process and reduces the accuracy of the whole system. Therefore, after a tilted document image is obtained, its direction needs to be corrected to ensure that the document image used for subsequent information recognition is in the normal reading direction.
In an embodiment of the present disclosure, a document image with a tilt is taken as an image to be corrected, and after receiving the image to be corrected, feature extraction may be performed on a target object therein to obtain detection information corresponding to the target object. Specifically, feature extraction may be performed on a target object in the image to be corrected through an image processing model to obtain detection information corresponding to the target object. In the embodiment of the present disclosure, the target object may be a text and/or a straight line segment in the image to be corrected, for example, when the content in the image to be corrected is pure text information such as a log, a novel, and the like, the text is the target object; when the image to be corrected is an image containing characters and straight line segments, such as a medical document, a report, a music score and the like, the characters and the straight line segments are target objects; and when the image to be corrected is a blank medical document or report image, the straight line segment in the image is the target object. Of course, other types of images to be corrected may be used, as long as text and/or straight line segments exist therein, and they can be used as target objects.
In an embodiment of the present disclosure, fig. 3 shows a schematic structural diagram of an image processing model. As shown in fig. 3, the image processing model 300 includes a feature extraction sub-model 301, a feature fusion sub-model 302, and a post-processing sub-model 303, where the feature extraction sub-model 301 is used for performing multi-layer convolution on the target object to obtain multi-level target feature information; the feature fusion sub-model 302 is used for performing feature fusion on the target feature information of each level to obtain target fusion feature information; and the post-processing sub-model 303 is configured to perform convolution processing on the target fusion feature information to obtain detection information corresponding to the target object.
In an embodiment of the present disclosure, fig. 4 shows a schematic structural diagram of the feature extraction sub-model 301. As shown in fig. 4, the feature extraction sub-model 301 includes a first convolution layer 401, a pooling layer 402, and a residual network module 403 connected to the pooling layer 402, where the residual network module 403 includes M+1 residual network layers connected in sequence: ResBlock 1, ResBlock 2, ..., ResBlock M+1, where M is a positive integer.
In an embodiment of the present disclosure, fig. 5 shows a schematic flow chart of feature extraction performed by the image processing model. As shown in fig. 5, the feature extraction sub-model 301 includes a first convolution layer 501, a pooling layer 502, and four sequentially connected residual network layers 503-1, 503-2, 503-3, and 503-4, where the convolution kernel size of the first convolution layer 501 is 7 × 7 with 64 channels, and the numbers of channels of the residual network layers 503-1, 503-2, 503-3, and 503-4 are 64, 128, 256, and 512, respectively. Multi-layer convolution of the target object through the feature extraction sub-model 301 proceeds as follows: first, the image to be corrected is input to the first convolution layer 501, and feature extraction is performed on the target object through the first convolution layer 501 to obtain initial feature information; the initial feature information is then input to the pooling layer 502, which performs dimensionality reduction on it to obtain dimension-reduced feature information; finally, the dimension-reduced feature information is input to the residual network module 503, and feature extraction is performed on it through the sequentially connected residual network layers 503-1, 503-2, 503-3, and 503-4 to obtain the multi-level target feature information. Specifically, the residual network layer 503-1 first performs convolution on the dimension-reduced feature information to obtain first-level target feature information; the residual network layer 503-2 performs convolution on the first-level target feature information to obtain second-level target feature information; the residual network layer 503-3 then performs convolution on the second-level target feature information to obtain third-level target feature information; and finally the residual network layer 503-4 performs convolution on the third-level target feature information to obtain fourth-level target feature information. Of course, the number of residual network layers in the residual network module 503 is not limited to four and may be any other number, which is not specifically limited in this disclosure.
In an embodiment of the present disclosure, fig. 6 further illustrates a schematic structural diagram of a residual network layer. As shown in fig. 6, the residual network layer sequentially includes an input layer 601, a convolutional layer 602, an activation layer 603, a convolutional layer 604, a connection layer 605, an activation layer 606, and an output layer 607, where the convolution kernel sizes of the convolutional layers 602 and 604 are both 3 × 3 and the number of channels is 64; the activation functions employed by the activation layers 603 and 606 may be ReLU functions, sigmoid functions, and the like. After receiving the output information of the network layer preceding the current residual network layer, the input layer 601 takes this output information as the input information of the current residual network layer, and convolution processing is performed on the input information through the convolutional layer 602 to obtain first feature information; the first feature information is then processed through the activation layer 603, and the activated first feature information is input into the convolutional layer 604, which performs further feature extraction to obtain second feature information; the second feature information and the input information are then connected through the connection layer 605, and the connected feature information is input into and processed by the activation layer 606; finally, the feature information processed by the activation layer 606 is output to the next network layer through the output layer 607 for subsequent processing.
In the residual network layer, the input information passes through the convolutional layers and the activation layer to obtain nonlinear expression information corresponding to the input information, and the input information is then added to this nonlinear expression information to obtain the output result. In this way, the information of shallow layers can be fully transmitted to deep layers, network training becomes easier, the degradation problem in network training can be alleviated, and the characterization capability of the network is improved.
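As an illustration of the structures in figs. 5 and 6, the following Python (PyTorch-style) sketch is given under several assumptions: strides, padding, the pooling type, and the 1 × 1 projection on the skip branch are not specified in the text and are chosen here only so that the tensor shapes match; it is a sketch of the described architecture, not the exact implementation.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        # Residual network layer of fig. 6: conv -> ReLU -> conv, then the input
        # is added to the branch output and passed through a final ReLU.
        def __init__(self, in_ch, out_ch, stride=2):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
            self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
            # 1x1 projection so the skip branch matches the branch shape
            # (an assumption; the text only states that the two are added).
            self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            y = self.conv2(self.relu(self.conv1(x)))
            return self.relu(y + self.skip(x))

    class FeatureExtractor(nn.Module):
        # Feature extraction sub-model of fig. 5: a 7x7 convolution with 64
        # channels, a pooling layer, and four residual stages (64/128/256/512).
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3)
            self.pool = nn.MaxPool2d(2)
            self.res1 = ResBlock(64, 64)
            self.res2 = ResBlock(64, 128)
            self.res3 = ResBlock(128, 256)
            self.res4 = ResBlock(256, 512)

        def forward(self, x):
            x = self.pool(self.conv1(x))
            f1 = self.res1(x)    # first-level target feature information
            f2 = self.res2(f1)   # second-level
            f3 = self.res3(f2)   # third-level
            f4 = self.res4(f3)   # fourth-level
            return f1, f2, f3, f4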
In one embodiment of the present disclosure, fig. 7 shows a schematic structural diagram of the feature fusion sub-model 302. As shown in fig. 7, the feature fusion sub-model 302 includes N fusion network layers 701-1, 701-2, ..., 701-N connected in sequence and a second convolution layer 702 connected to the N-th fusion network layer 701-N. After the multi-level target feature information is obtained, the n-th fusion network layer in the feature fusion sub-model 302 fuses the (n-1)-level fusion feature information with the target feature information output by the (M+1-n)-th residual network layer to obtain n-level fusion feature information; this step is repeated until N-level fusion feature information is obtained; the N-level fusion feature information is then input into the second convolution layer, which performs feature extraction on it to obtain the target fusion feature information. Here, the zero-level fusion feature information is the target feature information output by the (M+1)-th residual network layer, n is a positive integer not exceeding N, and N is a positive integer not exceeding M.
Returning to fig. 5, the feature fusion sub-model 302 specifically includes three fusion network layers 504-1, 504-2, and 504-3 connected in sequence and a second convolutional layer 505 connected to the third fusion network layer 504-3, where the numbers of channels of the fusion network layers 504-1, 504-2, and 504-3 are 128, 64, and 32, respectively, the convolution kernel size of the second convolutional layer 505 is 3 × 3, and its number of channels is 32. When feature fusion is performed, the first fusion network layer 504-1 first fuses the fourth-level target feature information output by ResBlock 4 with the third-level target feature information output by ResBlock 3 to obtain first-level fusion feature information; the second fusion network layer 504-2 then fuses the first-level fusion feature information with the second-level target feature information output by ResBlock 2 to obtain second-level fusion feature information; the third fusion network layer 504-3 then fuses the second-level fusion feature information with the first-level target feature information output by ResBlock 1 to obtain third-level fusion feature information; finally, the third-level fusion feature information is input into the second convolution layer 505, which performs feature extraction on it to obtain the target fusion feature information.
In an embodiment of the present disclosure, fig. 8 further illustrates a schematic structural diagram of a fusion network layer. As shown in fig. 8, the fusion network layer includes a first input layer 801, a second input layer 802, an up-pooling layer 803, a connection layer 804, a convolutional layer 805, an activation layer 806, a convolutional layer 807, an activation layer 808, and an output layer 809, where the convolution kernel sizes of the convolutional layers 805 and 807 are both 3 × 3 and the number of channels is 64; the activation functions employed by the activation layers 806 and 808 may be ReLU functions, sigmoid functions, and the like. After receiving the (n-1)-level fusion feature information, the first input layer 801 sends it to the up-pooling layer 803, which up-samples it to the same size as the target feature information, output by the (M+1-n)-th residual network layer, received by the second input layer 802; the up-sampled (n-1)-level fusion feature information and the target feature information output by the (M+1-n)-th residual network layer are then connected through the connection layer 804 to obtain connection feature information; the connection feature information is then subjected to convolution-activation-convolution-activation processing through the convolutional layer 805, the activation layer 806, the convolutional layer 807, and the activation layer 808 in sequence to obtain the n-level fusion feature information; finally, the n-level fusion feature information is output to the next network layer through the output layer 809 for subsequent processing.
In the fusion network layer, the output features of the deeper network and the output features of the preceding shallower network, which have two different scales, are fused after a series of operations to obtain the input information of the next fusion network layer; multi-scale features are thus fused continuously, which helps provide richer information. Meanwhile, instead of using large convolution kernels such as 5 × 5 and 7 × 7, the same receptive field is obtained by stacking several small 3 × 3 convolution kernels, which increases the number of activation layers and improves the nonlinear characterization capability of the network.
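Continuing the sketch above, the fusion sub-model of figs. 5, 7, and 8 can be drafted as follows, assuming that the up-pooling step is nearest-neighbour up-sampling to the skip feature's spatial size, that the connection layer is channel concatenation, and that the channel widths follow the 128/64/32 example given for fig. 5; these choices are illustrative, not prescribed by the text.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusionLayer(nn.Module):
        # One fusion network layer (fig. 8): up-sample the deeper fused feature,
        # concatenate it with the shallower backbone feature, then conv-ReLU-conv-ReLU.
        def __init__(self, fused_ch, skip_ch, out_ch):
            super().__init__()
            self.conv1 = nn.Conv2d(fused_ch + skip_ch, out_ch, 3, padding=1)
            self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)

        def forward(self, fused_prev, skip_feat):
            up = F.interpolate(fused_prev, size=skip_feat.shape[-2:], mode="nearest")
            x = torch.cat([up, skip_feat], dim=1)          # connection layer
            return F.relu(self.conv2(F.relu(self.conv1(x))))

    class FeatureFusion(nn.Module):
        # Feature fusion sub-model of fig. 5: three fusion layers (128/64/32
        # channels) followed by a final 3x3 convolution with 32 channels.
        def __init__(self):
            super().__init__()
            self.fuse1 = FusionLayer(512, 256, 128)  # fourth-level with third-level
            self.fuse2 = FusionLayer(128, 128, 64)   # with second-level
            self.fuse3 = FusionLayer(64, 64, 32)     # with first-level
            self.conv = nn.Conv2d(32, 32, 3, padding=1)

        def forward(self, f1, f2, f3, f4):
            g1 = self.fuse1(f4, f3)   # first-level fusion feature information
            g2 = self.fuse2(g1, f2)   # second-level
            g3 = self.fuse3(g2, f1)   # third-level
            return self.conv(g3)      # target fusion feature information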
In one embodiment of the present disclosure, the post-processing submodel 303 is structurally different depending on the target object. When the target object is a character, the task of the image processing model is mainly to obtain character detection information, the character detection information may specifically include a character detection score map, a character distance regression map, and a character frame angle, and accordingly, the post-processing sub-model 303 may include three convolution layers: the character detection score map is obtained by performing feature extraction on the target fusion feature information through the third convolution layer, the character distance regression map is obtained by performing feature extraction on the target fusion feature information through the fourth convolution layer, and the character frame angle is obtained by performing feature extraction on the target fusion feature information through the fifth convolution layer; when the target object is a straight line segment, the task of the image processing model is to obtain the detection information of the straight line segment, and then the post-processing sub-model 303 may only include one convolution layer: the sixth convolution layer is used for extracting the characteristics of the target fusion characteristic information through the sixth convolution layer so as to obtain straight-line segment detection information; when the target object is a text and a straight line segment, the task of the image processing model is to obtain text detection information and straight line segment detection information, then the post-processing submodel 303 may contain four convolutional layers: and the seventh convolution layer, the eighth convolution layer, the ninth convolution layer and the tenth convolution layer respectively acquire a character detection score map, a character distance regression map and a character frame angle through the seventh convolution layer, the eighth convolution layer and the ninth convolution layer, and perform feature extraction on the target fusion feature information through the tenth convolution layer to acquire straight-line segment detection information.
Returning to fig. 5, the post-processing sub-model 303 includes a third convolution layer 506, a fourth convolution layer 507, a fifth convolution layer 508 and a sixth convolution layer 509, wherein the third convolution layer 506 is used for performing feature extraction on the target fusion feature information to obtain a text detection score map; the fourth convolution layer 507 is used for extracting the characteristics of the target fusion characteristic information to obtain a text distance regression graph; the fifth convolutional layer 508 is used for extracting the characteristics of the target fusion characteristic information to obtain the angle information of the character frame; the sixth convolutional layer 509 is used to perform feature extraction on the target fusion feature information to obtain straight-line segment detection information.
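The four heads can be sketched as follows, assuming 1 × 1 convolutions, sigmoid activations on the score and line maps, and output channel counts of 1 (score), 4 (distances to the four borders), 1 (angle), and 1 (line map); these are inferences from the later description of the outputs rather than details fixed by the text.

    import torch
    import torch.nn as nn

    class PostProcessHeads(nn.Module):
        # Parallel heads on the fused feature map: text score, text distance
        # regression (four borders), text-box angle, and straight-line-segment map.
        def __init__(self, in_ch=32):
            super().__init__()
            self.score_head = nn.Conv2d(in_ch, 1, 1)   # text detection score map
            self.dist_head = nn.Conv2d(in_ch, 4, 1)    # distances to top/right/bottom/left
            self.angle_head = nn.Conv2d(in_ch, 1, 1)   # text-border rotation angle
            self.line_head = nn.Conv2d(in_ch, 1, 1)    # straight-line-segment map

        def forward(self, fused):
            score = torch.sigmoid(self.score_head(fused))
            dists = self.dist_head(fused)
            angle = self.angle_head(fused)
            line = torch.sigmoid(self.line_head(fused))
            return score, dists, angle, line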
Figs. 9A-9C show schematic diagrams of the interface before and after the image to be corrected is processed by the image processing model. As shown in fig. 9A, the original image to be corrected is tilted; after the text and line segments in it are subjected to feature extraction by the image processing model, pixel-level text detection information (fig. 9B) and line-segment detection information (fig. 9C) are obtained.
In one embodiment of the disclosure, each pixel value in the text detection score map represents the likelihood that text exists at that position, each pixel value in the text distance regression map represents the distances from that position to the four sides of the nearest text border, and each pixel value in the text border angle information represents the rotation angle of the text at that position. The vertex coordinates of the text border can be determined from the text detection score map, the text distance regression map, and the text border angle information. Meanwhile, the straight-line-segment detection information provides the pixel-level detection result of the straight line segments in the image to be corrected.
The image processing model in the embodiments of the present disclosure is an effective, unified network structure that can simultaneously output pixel-level detection results for the text and the straight lines in a document. It is simple and efficient, exploits the synergy between text and straight line segments, and judges the rotation angle of the document image by combining their information, which gives better robustness.
In step S220, vectorization processing is performed on the detection information to acquire coordinate information corresponding to the target object.
In an embodiment of the present disclosure, the character detection information and/or the straight-line segment detection information obtained in step S210 are pixel-level detection results, and specific coordinate information of the characters and/or the straight-line segments in the image to be corrected cannot be given, so that vectorization processing needs to be performed on the obtained detection information to obtain coordinate information corresponding to the target object.
In an embodiment of the disclosure, after the character detection information is obtained, the frame coordinates corresponding to the target character frame may be obtained according to the character detection score map, the character distance regression map, and the character frame angle. Fig. 10 is a schematic diagram illustrating a flow of vectorization processing according to text detection information, and as shown in fig. 10, the flow at least includes steps S1001 to S1003, specifically:
in step S1001, pixels in the text detection score map are filtered according to a first threshold to obtain target pixels with text detection scores greater than or equal to the first threshold.
In one embodiment of the present disclosure, the text detection score map is a numeric matrix in which each value lies in [0, 1] and represents the probability that the pixel at that position belongs to text: a small detection score means text is unlikely to exist there, and a large detection score means text is likely to exist there. A first threshold is therefore set to screen the detection scores in the text detection score map. The detection score at each position is compared with the first threshold and only scores greater than or equal to the first threshold are retained; the pixel positions corresponding to the retained detection scores are the positions where text exists, and the pixels at these positions may be defined as target pixels. The first threshold may be set to 0.8, and other values, such as 0.78 or 0.9, may also be used, which is not specifically limited in the embodiments of the present disclosure.
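A minimal sketch of this screening step, assuming the score map is available as a NumPy array named score_map (the variable names are illustrative, not from the patent):

import numpy as np

# Keep only positions whose text detection score reaches the first threshold.
first_threshold = 0.8
target_ys, target_xs = np.nonzero(score_map >= first_threshold)
# (target_xs[i], target_ys[i]) are the coordinates of the i-th target pixel.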
In step S1002, frame coordinates of the text corresponding to the target pixel are calculated from the text distance regression map and the text frame angle information.
In an embodiment of the present disclosure, after the pixel positions where text is likely to exist are obtained, the frame coordinates of the text corresponding to each target pixel may be calculated at those positions according to the text distance regression map and the text frame angle information, where the frame coordinates are the four vertex coordinates of the text frame corresponding to the text. The pixel points in the text distance regression map and in the text frame angle information correspond one to one. For each pixel point (cx, cy), the four distance values d1, d2, d3 and d4 corresponding to that position can be obtained from the text distance regression map, representing the perpendicular distances from the pixel point to the upper, right, lower and left boundaries of the frame respectively, and the angle value corresponding to that position can be obtained from the text frame angle information, representing the rotation angle θ of the text frame. From (cx, cy), d1, d2, d3, d4 and θ, the coordinates of the four vertices of the text frame can be calculated; for example, the vertex at the upper left corner is given by x = (cx − d4)cosθ + (cy + d1)sinθ + (1 − cosθ)cx − sinθ·cy and y = −(cx − d4)sinθ + (cy + d1)cosθ + sinθ·cx + (1 − cosθ)cy. The other vertex coordinates can be obtained by similar calculations, which are not repeated here.
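The vertex computation above can be sketched as follows; the function name, the corner sign conventions for the three remaining vertices and the use of NumPy are assumptions for illustration, with the top-left vertex following the closed-form expressions just given:

import numpy as np

def text_box_vertices(cx, cy, d1, d2, d3, d4, theta):
    # d1..d4 are the distances to the upper, right, lower and left borders;
    # theta is the rotation angle of the text frame (radians).
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Corners before rotation, following the sign convention of the
    # top-left-vertex formula above (top-left, top-right, bottom-right, bottom-left).
    corners = [(cx - d4, cy + d1), (cx + d2, cy + d1),
               (cx + d2, cy - d3), (cx - d4, cy - d3)]
    vertices = []
    for x, y in corners:
        # Rotate each corner about the target pixel (cx, cy).
        xr = x * cos_t + y * sin_t + (1 - cos_t) * cx - sin_t * cy
        yr = -x * sin_t + y * cos_t + sin_t * cx + (1 - cos_t) * cy
        vertices.append((xr, yr))
    return vertices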
In step S1003, the frame coordinates are filtered according to the text detection score corresponding to the target pixel and the overlap degree of the text frame corresponding to the frame coordinates, so as to obtain the frame coordinates corresponding to the target text frame.
In an embodiment of the present disclosure, the method in step S1002 may produce a large number of text frames, some of which overlap to a large extent. To obtain the frame coordinates corresponding to the target text frames, the overlapping text frames may be screened and filtered according to their text detection scores and degrees of overlap, yielding target text frames with a large output response and a small overlapping area, where the output response is the text detection score; that is, each target text frame is very likely to contain text and does not overlap, or only slightly overlaps, the other target text frames. In the embodiment of the disclosure, the filtering may be performed by non-maximum suppression: the text frame with the maximum text detection score is selected from a group of text frames, the degree of overlap between each remaining text frame in the group and the selected frame is calculated, and any frame whose overlap exceeds a preset threshold is filtered out; these steps are then repeated on the remaining text frames. In this way, a plurality of target text frames with large output responses and small overlapping areas are obtained, and the frame coordinates corresponding to the target text frames are thereby obtained.
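A minimal sketch of this greedy non-maximum suppression, assuming axis-aligned boxes for simplicity (rotated text frames would need a polygon-overlap test instead); the function names and the IoU-based overlap measure are illustrative assumptions:

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms_text_boxes(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over candidate text boxes.
    Returns the indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # box with the highest detection score
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep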
In an embodiment of the present disclosure, after the straight-line segment detection information is obtained, vectorization processing may be performed on the straight-line segment detection information to obtain coordinate information corresponding to the straight-line segment. Specifically, hough transformation may be performed on the straight-line segment detection information, and coordinate information of a plurality of line segments is obtained according to the pixel-level straight-line segment detection information, where the coordinate information is coordinate information of each line segment in a cartesian rectangular coordinate system; next, the nearest neighbor line segments of the multiple line segments can be spliced to obtain longer straight line segments; and finally, acquiring the coordinates of the end points at the two ends of the straight line segment after acquiring all spliced straight line segments.
When Hough transform is carried out according to the straight-line segment detection information, the relevant matrices are first initialized: an angle list, a distance list and a voting matrix. The angle list α = [0, 1, 2, …, 178, 179], where each angle refers to the angle between the x coordinate axis and the perpendicular from the origin to a candidate straight line; the distance list ρ = [-diag_len + 1, …, diag_len - 1, diag_len], where each distance refers to the length of the perpendicular from the origin to the candidate straight line; the voting matrix is initialized to 0, with as many rows as there are elements in the distance list and as many columns as there are elements in the angle list. Then each non-zero pixel point in the pixel-level straight-line segment detection information is traversed; for every angle value in the angle list, the perpendicular distance value corresponding to the pixel point under that angle is calculated, the distance value and the angle value form a data pair (ρ, α), and the element of the voting matrix corresponding to this data pair is increased by 1. Finally, each data pair (ρ, α) whose vote count in the voting matrix is greater than a preset threshold is taken as a detected straight line, and its coordinate information in the Cartesian rectangular coordinate system is calculated.
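A compact sketch of this voting procedure, assuming the pixel-level line detection result is a binary NumPy array named mask and that diag_len is the length of the image diagonal; the threshold value and function name are illustrative:

import numpy as np

def hough_lines(mask, vote_threshold=50):
    """Accumulate Hough votes over a binary line-detection mask.
    Returns a list of (rho, alpha_degrees) pairs describing detected lines."""
    h, w = mask.shape
    diag_len = int(np.ceil(np.hypot(h, w)))
    alphas = np.deg2rad(np.arange(0, 180))        # angle list: 0..179 degrees
    rhos = np.arange(-diag_len, diag_len + 1)     # distance list
    votes = np.zeros((len(rhos), len(alphas)), dtype=np.int64)

    ys, xs = np.nonzero(mask)
    for x, y in zip(xs, ys):
        # Perpendicular distance of the line through (x, y) for every angle,
        # then one vote per (rho, alpha) data pair.
        r = np.round(x * np.cos(alphas) + y * np.sin(alphas)).astype(int)
        votes[r + diag_len, np.arange(len(alphas))] += 1

    lines = []
    for ri, ai in zip(*np.nonzero(votes >= vote_threshold)):
        lines.append((rhos[ri], np.rad2deg(alphas[ai])))
    return lines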
When the nearest-neighbor line segments among the plurality of line segments are spliced, any two of the line segments may be taken as a first line segment and a second line segment; a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment are then calculated; the first distance and the second distance are each compared with a second threshold, and whether to splice the first line segment and the second line segment is judged according to the comparison result. Specifically, when both the first distance and the second distance are smaller than the second threshold, the first line segment and the second line segment can form one straight line segment and may therefore be spliced. Strictly speaking, for any two of the broken line segments that belong to the same straight line, the distance from the midpoint of one to the other is 0; however, to widen the applicable range of the image direction correction method provided by the embodiment of the present disclosure, the second threshold may be set to a small non-zero value, for example 20, so that even if the first line segment and the second line segment do not lie strictly on the same straight line, they can still be spliced as long as the distances do not exceed the second threshold. By performing this processing on all the line segments, longer straight line segments can be obtained: the number of line segments decreases and their length increases, a more accurate straight-line slope can be obtained, and the whole image becomes more regular. Figs. 11A-11B show schematic diagrams of the interfaces of the image to be corrected before and after line segment splicing: the broken line segments obtained after the Hough transform (fig. 11A) form a plurality of longer straight line segments after splicing, shown as the thicker lines in fig. 11B.
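The midpoint-distance test and the splicing step might look like the following sketch; the helper names, the choice of keeping the farthest pair of endpoints when splicing, and the default threshold of 20 pixels are assumptions made for illustration:

import numpy as np

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment with endpoints a and b."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def should_merge(seg1, seg2, second_threshold=20.0):
    """True when the midpoint of each segment is close enough to the other segment."""
    m1 = (np.asarray(seg1[0], float) + np.asarray(seg1[1], float)) / 2.0
    m2 = (np.asarray(seg2[0], float) + np.asarray(seg2[1], float)) / 2.0
    d1 = point_segment_distance(m1, *seg2)   # first distance
    d2 = point_segment_distance(m2, *seg1)   # second distance
    return d1 < second_threshold and d2 < second_threshold

def merge(seg1, seg2):
    """Splice two segments into one by keeping the farthest pair of endpoints."""
    pts = [np.asarray(p, float) for p in (*seg1, *seg2)]
    i, j = max(((i, j) for i in range(4) for j in range(i + 1, 4)),
               key=lambda ij: np.linalg.norm(pts[ij[0]] - pts[ij[1]]))
    return tuple(pts[i]), tuple(pts[j])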
Meanwhile, figs. 12A-12B show schematic diagrams of the interfaces after vectorization of the detection information. Fig. 12A shows the text frames formed from the frame coordinates of the target text frames obtained by vectorizing the character detection information, and fig. 12B shows the straight line segments formed from the coordinates of the line segments obtained by vectorizing the straight-line segment detection information.
In step S230, a rotation angle corresponding to the image to be corrected is determined based on the coordinate information, and the direction of the image to be corrected is corrected according to the rotation angle.
In one embodiment of the present disclosure, when the target object is only text, after the frame coordinates of the target text frames with large output responses and small overlapping areas are obtained, the upper edge line and the lower edge line of each text frame may be determined according to the frame coordinates, and the slopes of the upper and lower edge lines may then be calculated. Since frame coordinates are obtained for a plurality of target text frames in step S220, a plurality of upper and lower edge-line slopes are generated, and these slopes differ from one another. To determine the rotation angle of the image to be corrected, all the slopes may be counted, the number of occurrences of each slope obtained, and the slope with the largest number of occurrences taken as the first target slope; the angle corresponding to the first target slope is the rotation angle of the text, which is also the rotation angle of the image to be corrected. For example, if the first target slope is 1, the rotation angle of the image to be corrected is 45° clockwise or 225° counterclockwise relative to the normal reading direction.
In an embodiment of the present disclosure, when the target object is only a straight line segment, since there may be a plurality of straight line segments in the image to be corrected, in order to obtain the rotation angle of the image to be corrected, a slope of each straight line segment may be calculated, then the occurrence number of each slope is counted, and the slope with the largest occurrence number is taken as a second target slope, where an angle corresponding to the second target slope is the rotation angle of the straight line segment, that is, the rotation angle of the image to be corrected. For example, the second target slope is
Figure BDA0002273922320000191
The rotation angle of the image to be corrected is either 30° counterclockwise or 270° clockwise from the normal reading direction. When only one straight line segment exists in the image to be corrected, the angle corresponding to the slope of that straight line segment is the rotation angle of the image to be corrected.
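For either case, counting slope occurrences and converting the most frequent slope to an angle could be sketched as follows; rounding the slopes before counting is an assumption added so that nearly identical values vote together, and is not stated in the patent:

import math
from collections import Counter

def dominant_rotation_angle(slopes, ndigits=2):
    """Rotation angle (degrees) from a list of edge-line or segment slopes."""
    rounded = [round(s, ndigits) for s in slopes]          # group nearly identical slopes
    target_slope, _ = Counter(rounded).most_common(1)[0]   # slope with the most occurrences
    return math.degrees(math.atan(target_slope))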
In one embodiment of the present disclosure, when the target object is text and a straight line segment, the rotation angle of the text and the rotation angle of the straight line segment may be obtained by the methods above, and the rotation angle of the image to be corrected is then determined from the two. Specifically, the difference between the rotation angle of the text and the rotation angle of the straight line segment is taken and its absolute value is used as the rotation angle difference; the rotation angle difference is then compared with a third threshold. When the rotation angle difference is smaller than or equal to the third threshold, the average of the rotation angle of the text and the rotation angle of the straight line segment is taken as the rotation angle corresponding to the image to be corrected; when the rotation angle difference is larger than the third threshold, the rotation angle of the straight line segment is taken as the rotation angle corresponding to the image to be corrected. For example, when the rotation angle of the text is 45° clockwise, the rotation angle of the straight line segment is 44° clockwise and the third threshold is 1°, the rotation angle difference is 1°, which equals the third threshold, so the average of the two angles, 44.5°, is taken as the rotation angle of the image to be corrected. If the rotation angle of the text is 45° clockwise, the rotation angle of the straight line segment is 43° clockwise and the third threshold is 1°, the rotation angle difference is 2°, which is greater than the third threshold, so the straight line segment's rotation angle of 43° is taken as the rotation angle of the image to be corrected.
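A minimal sketch of this fusion rule, with angles expressed in degrees and the threshold value as an illustrative default:

def combined_rotation_angle(text_angle, line_angle, third_threshold=1.0):
    """Fuse the text rotation angle and the line-segment rotation angle.
    When the two estimates agree (difference not above the threshold), their
    average is used; otherwise the line-segment angle is used, as described above."""
    if abs(text_angle - line_angle) <= third_threshold:
        return (text_angle + line_angle) / 2.0
    return line_angle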
Figs. 13A-13B are schematic diagrams of the interfaces before and after image direction correction: an originally tilted image to be corrected (fig. 13A) is turned into an image that conforms to the normal reading direction (fig. 13B) after being corrected by the image direction correction method of the embodiment of the present disclosure.
In one embodiment of the disclosure, when the target object includes characters and straight line segments, the rotation angle of the image to be corrected is determined according to the rotation angle of the characters and the rotation angle of the straight line segments, so that not only character information in the document image is utilized, but also straight line information in the document image is utilized, and the two are complementary, thereby effectively avoiding the risk of any information loss or abnormality, and enabling the determined rotation angle of the image to be corrected to be more accurate and more robust.
In an embodiment of the present disclosure, after the rotation angle of the image to be corrected is obtained, the direction of the image to be corrected may be corrected according to the rotation angle. Specifically, the rotation matrix may be determined according to the rotation angle of the image to be corrected and the coordinates of the center point of the image to be corrected, and then the image to be corrected is corrected according to the rotation matrix. The rotation matrix may be determined according to equation one:
M = [  cosθ   sinθ   (1 − cosθ)·c0x − sinθ·c0y ]
    [ −sinθ   cosθ   sinθ·c0x + (1 − cosθ)·c0y ]
wherein M is the rotation matrix, θ is the rotation angle of the image to be corrected, and (c0x, c0y) are the coordinates of the center point of the image to be corrected.
When the image to be corrected is corrected according to the rotation matrix, the pixel matrix corresponding to the image to be corrected can be multiplied by the rotation matrix, so that the direction of the image to be corrected can be corrected, and the direction of the document image is changed into the normal reading direction.
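One possible implementation of this correction step uses OpenCV's center-point rotation matrix, which has the same form as equation one, followed by an affine warp that applies it to every pixel coordinate. OpenCV is only an illustrative choice and is not named in the patent, and the sign of the angle should be checked against the convention used for the estimated rotation angle:

import cv2

def correct_orientation(image, angle_deg):
    """Rotate the image back to the normal reading direction (sketch)."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Builds the same [[cos, sin, ...], [-sin, cos, ...]] matrix as equation one.
    m = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))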
In an embodiment of the present disclosure, after the image to be corrected is obtained, a first direction correction may be performed on it, for example angle corrections of 90°, 180° and 270°, so that the rotation angle of the image after the first direction correction falls within [-45°, 45°]. Feature extraction is then performed on this image through the image processing model to obtain detection information, vectorization processing is performed on the detection information, and a second direction correction is performed according to the coordinate information obtained by the vectorization processing. The methods of feature extraction, vectorization processing and second direction correction are the same as in the foregoing embodiments and are not repeated here.
In one embodiment of the present disclosure, after acquiring a document image having a normal reading direction, optical character recognition may be performed on the document image to recognize and acquire text information therein. Of course, feature extraction may be performed on the document image through other machine learning models, such as a convolutional neural network, a cyclic neural network, and so on, to obtain text information therein.
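For instance, a corrected document image could be passed to an off-the-shelf OCR engine; the snippet below uses pytesseract purely as an example, which is not a component of the patented method, and the file name is a placeholder:

import pytesseract
from PIL import Image

# "corrected.png" is assumed to be the document image after direction correction.
text = pytesseract.image_to_string(Image.open("corrected.png"), lang="eng")
print(text)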
Figs. 14A-14D show schematic diagrams of the interfaces for performing direction correction on a tilted medical document image. As shown in fig. 14A, the original medical document image is tilted clockwise, with a tilt angle in the range of [-45°, 45°]. Features of the characters and straight line segments in the tilted medical document image are extracted by the image processing model to obtain character detection information corresponding to the characters and straight-line segment detection information corresponding to the straight line segments, both at the pixel level. The character detection information and the straight-line segment detection information are then vectorized. For the character detection information, the pixel positions containing characters are first filtered according to the character detection score map, the vertex coordinates of each character frame are then calculated according to the character distance regression map and the character frame angle, and finally part of the character frames are filtered out by non-maximum suppression, yielding target character frames with large feature responses and small overlapping areas together with their frame coordinates. For the straight-line segment detection information, Hough transformation is first performed to obtain a plurality of broken line segments, the distances between these segments are then judged, and nearby segments are finally spliced into long straight line segments whose endpoint coordinates are obtained. The target character frames are shown as thick lines in fig. 14B, and the spliced straight line segments as thick lines in fig. 14C. Finally, the rotation angle of the text is determined from the frame coordinates, the rotation angle of the straight line segments is determined from their endpoint coordinates, the rotation angle of the tilted medical document image is determined from these two angles, a rotation matrix is determined from that rotation angle, and the tilted medical document image is processed according to the rotation matrix to obtain the corrected medical document image shown in fig. 14D.
Similarly, fig. 15A to 15D also show schematic diagrams of interfaces for performing direction correction on an oblique medical document image, where fig. 15A shows an original medical document image with an oblique angle, fig. 15B shows a border of a target text in the medical document image, fig. 15C shows a straight line segment after splicing in the medical document image, and fig. 15D shows the medical document image after direction correction, and a method for acquiring each interface schematic diagram is the same as the method in fig. 14, and is not repeated here.
In the image direction correction method of the embodiments of the disclosure, the image processing model having the feature extraction submodel, the feature fusion submodel and the post-processing submodel performs feature extraction on the target object in the image to be corrected to obtain detection information corresponding to the target object; vectorization processing is performed on the detection information to obtain coordinate information corresponding to the target object; and finally, a rotation angle corresponding to the image to be corrected is determined based on the coordinate information, and the direction of the image to be corrected is corrected according to the rotation angle. By combining traditional image processing techniques with deep learning, the embodiments of the disclosure can correct fine-grained rotation angles of the image to be corrected, improving the correction accuracy; at the same time, the efficiency and accuracy of information detection and information recognition are improved, which greatly improves the efficiency and accuracy of subsequent information structuring.
The following describes embodiments of the apparatus of the present disclosure, which may be used to perform the image direction correction method in the above embodiments of the present disclosure. For details that are not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the image direction correcting method described above in the present disclosure.
Fig. 16 schematically shows a block diagram of an image orientation correcting apparatus according to an embodiment of the present disclosure.
Referring to fig. 16, an image orientation correcting apparatus 1600 according to an embodiment of the present disclosure includes: a detection information acquisition module 1601, a coordinate information acquisition module 1602, and an image direction correction module 1603.
The detection information acquiring module 1601 is configured to acquire an image to be corrected, and perform feature extraction on a target object in the image to be corrected through an image processing model to acquire detection information corresponding to the target object; a coordinate information obtaining module 1602, configured to perform vectorization processing on the detection information to obtain coordinate information corresponding to the target object; an image direction correcting module 1603, configured to determine a rotation angle corresponding to the image to be corrected based on the coordinate information, and correct the direction of the image to be corrected according to the rotation angle.
In one embodiment of the present disclosure, the image processing model includes a feature extraction sub-model, a feature fusion sub-model, and a post-processing sub-model; the detection information acquiring module 1601 includes: the target characteristic information acquisition unit is used for carrying out multilayer convolution on the target object through the characteristic extraction submodel so as to acquire multistage target characteristic information; the target fusion characteristic information acquisition unit is used for carrying out characteristic fusion according to the target characteristic information of each level through the characteristic fusion submodel to acquire target fusion characteristic information; and the detection information acquisition unit is used for performing convolution processing on the target fusion characteristic information through the post-processing sub-model so as to acquire detection information corresponding to the target object.
In an embodiment of the present disclosure, the feature extraction submodel includes a first convolution layer, a pooling layer connected to the first convolution layer, and a residual network module connected to the pooling layer, where the residual network module includes M +1 residual network layers connected in sequence, and M is a positive integer; the target feature information acquisition unit is configured to: inputting the image to be corrected to the first convolution layer, and performing feature extraction on the target object through the first convolution layer to obtain initial feature information; inputting the initial characteristic information into the pooling layer, and performing dimensionality reduction processing on the initial characteristic information through the pooling layer to obtain dimensionality reduction characteristic information; and inputting the dimension reduction characteristic information into the residual error network module, and performing characteristic extraction on the dimension reduction characteristic information through the residual error network layers sequentially connected in the residual error network module to obtain the multi-stage target characteristic information.
In one embodiment of the present disclosure, the feature fusion sub-model includes N fusion network layers connected in sequence and a second convolutional layer connected to the nth fusion network layer; the target fusion feature information acquisition unit is configured to: fusing the n-1 level fusion characteristic information and the target characteristic information output by the M +1-n residual error network layers through the nth fusion network layer to obtain n level fusion characteristic information; repeating the previous step until N-level fusion characteristic information is obtained; inputting the N-level fusion feature information into the second convolution layer, and performing feature extraction on the N-level fusion feature information through the second convolution layer to obtain the target fusion feature information; the zero-level fusion feature information is target feature information output by an M +1 th residual network layer, N is a positive integer not exceeding N, and N is a positive integer not exceeding M.
In one embodiment of the present disclosure, the target object is a text, the detection information is text detection information, and the post-processing submodel includes a third convolution layer, a fourth convolution layer, and a fifth convolution layer that are independent of each other; the detection information acquisition unit is configured to: extracting the characteristics of the target fusion characteristic information through the third convolution layer to obtain a character detection score map in the character detection information; extracting the characteristics of the target fusion characteristic information through the fourth convolution layer to obtain a text distance regression graph in the text detection information; and performing feature extraction on the target fusion feature information through the fifth convolution layer to acquire character frame angle information in the character detection information.
In an embodiment of the present disclosure, the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing submodel includes a sixth convolutional layer; the detection information acquisition unit is configured to: and performing feature extraction on the target fusion feature information through the sixth convolution layer to acquire the straight-line segment detection information.
In one embodiment of the present disclosure, the target object is a text and a straight line segment, the detection information is text detection information and straight line segment detection information, and the post-processing submodel includes a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer independently of each other; the detection information acquisition unit is configured to: extracting the characteristics of the target fusion characteristic information through the seventh convolution layer to obtain a character detection score map in the character detection information; extracting the features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information; extracting the characteristics of the target fusion characteristic information through the ninth convolution layer to obtain character frame angle information in the character detection information; and performing feature extraction on the target fusion feature information through the tenth convolution layer to acquire the straight-line segment detection information.
In one embodiment of the present disclosure, the coordinate information obtaining module 1602 is configured to: screening pixels in the character detection score map according to a first threshold value to obtain target pixels with character detection scores larger than or equal to the first threshold value; calculating the frame coordinate of the character corresponding to the target pixel according to the character distance regression graph and the character frame angle information; and filtering the frame coordinates according to the character detection scores corresponding to the target pixels and the overlapping degree of the character frames corresponding to the frame coordinates to obtain the frame coordinates corresponding to the target character frames.
In one embodiment of the present disclosure, the coordinate information obtaining module 1602 is configured to: carrying out Hough transform on the straight line segment detection information to obtain coordinate information of a plurality of line segments; determining any two line segments of the line segments as a first line segment and a second line segment, and calculating a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment; judging whether the first distance and the second distance are both smaller than a second threshold value; when a target first line segment and a target second line segment exist, wherein the first distance and the second distance are both smaller than the second threshold value, the first target line segment and the second target line segment are spliced; and acquiring the endpoint coordinates of the straight line segment formed by splicing the line segments.
In one embodiment of the present disclosure, the target object is a character, and the coordinate information is a frame coordinate of a frame of the target character; the image orientation correction module 1603 is configured to: determining an upper edge line and a lower edge line of the character frame according to the frame coordinates of the target character frame; calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and acquiring a first target slope with the largest occurrence times; and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
In one embodiment of the present disclosure, the target object is a straight line segment, and the coordinate information is an endpoint coordinate of the straight line segment; the image orientation correction module 1603 is configured to: calculating the slope of the straight line segment according to the coordinates of the endpoint of the straight line segment; counting the occurrence times of the slopes, and acquiring a second target slope with the maximum occurrence times; and determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In one embodiment of the present disclosure, the target object is a text and a straight line segment, and the coordinate information is a frame coordinate of a frame of the target text and an endpoint coordinate of the straight line segment; the image orientation correction module 1603 is configured to: calculating the rotation angle of the character according to the frame coordinates of the frame of the target character, and calculating the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment; making a difference between the rotation angle of the characters and the rotation angle of the straight line segment and taking an absolute value to obtain a rotation angle difference; comparing the rotation angle difference to a third threshold; when the rotation angle difference is smaller than or equal to the third threshold value, obtaining an average value of the rotation angle of the characters and the rotation angle of the straight line segment, and taking the average value as the rotation angle corresponding to the image to be corrected; and when the rotation angle difference is larger than the third threshold value, taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
In one embodiment of the present disclosure, the image direction correction module 1603 is configured to: determining a rotation matrix according to the rotation angle and the coordinates of the central point of the image to be corrected; and multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix so as to correct the direction of the image to be corrected.
FIG. 17 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 1700 of the electronic device shown in fig. 17 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 17, the computer system 1700 includes a Central Processing Unit (CPU) 1701 which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1702 or a program loaded from a storage portion 1708 into a Random Access Memory (RAM) 1703, thereby implementing the image direction correction method described in the above embodiments. In the RAM 1703, various programs and data necessary for system operation are also stored. The CPU 1701, the ROM 1702 and the RAM 1703 are connected to each other through a bus 1704. An Input/Output (I/O) interface 1705 is also connected to the bus 1704.
The following components are connected to the I/O interface 1705: an input section 1706 including a keyboard, a mouse, and the like; an output section 1707 including a Display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 1708 including a hard disk and the like; and a communication section 1709 including a network interface card such as a LAN (Local area network) card, a modem, or the like. The communication section 1709 performs communication processing via a network such as the internet. A driver 1710 is also connected to the I/O interface 1705 as necessary. A removable medium 1711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1710 as necessary, so that a computer program read out therefrom is mounted into the storage portion 1708 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1709, and/or installed from the removable media 1711. When the computer program is executed by a Central Processing Unit (CPU)1701, various functions defined in the system of the present disclosure are executed.
It should be noted that the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present disclosure also provides a computer-readable medium that may be contained in the image processing apparatus described in the above-described embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An image orientation correction method, comprising:
acquiring an image to be corrected, and performing feature extraction on a target object in the image to be corrected through an image processing model to acquire detection information corresponding to the target object;
vectorizing the detection information to obtain coordinate information corresponding to the target object;
and determining a rotation angle corresponding to the image to be corrected based on the coordinate information, and correcting the direction of the image to be corrected according to the rotation angle.
2. The method of claim 1, wherein the image processing model comprises a feature extraction sub-model, a feature fusion sub-model, and a post-processing sub-model;
the feature extraction of the target object in the image to be corrected through the image processing model to obtain the detection information corresponding to the target object includes:
performing multilayer convolution on the target object through the feature extraction submodel to obtain multilevel target feature information;
performing feature fusion according to the target feature information of each level through the feature fusion submodel to obtain target fusion feature information;
and performing convolution processing on the target fusion characteristic information through the post-processing sub-model to acquire detection information corresponding to the target object.
3. The method of claim 2, wherein the feature extraction submodel comprises a first convolutional layer, a pooling layer connected to the first convolutional layer, and a residual network module connected to the pooling layer, wherein the residual network module comprises M +1 residual network layers connected in sequence, M being a positive integer;
the multi-layer convolution is carried out on the target object through the feature extraction submodel to obtain multi-level target feature information, and the method comprises the following steps:
inputting the image to be corrected to the first convolution layer, and performing feature extraction on the target object through the first convolution layer to obtain initial feature information;
inputting the initial characteristic information into the pooling layer, and performing dimensionality reduction processing on the initial characteristic information through the pooling layer to obtain dimensionality reduction characteristic information;
and inputting the dimension reduction characteristic information into the residual error network module, and performing characteristic extraction on the dimension reduction characteristic information through the residual error network layers sequentially connected in the residual error network module to obtain the multi-stage target characteristic information.
4. The method of claim 2, wherein the feature fusion sub-model comprises N fused network layers connected in sequence and a second convolutional layer connected to the nth fused network layer;
the feature fusion is performed according to the target feature information of each level through the feature fusion submodel to obtain target fusion feature information, and the method comprises the following steps:
fusing the n-1 level fusion characteristic information and the target characteristic information output by the M +1-n residual error network layers through the nth fusion network layer to obtain n level fusion characteristic information;
repeating the previous step until N-level fusion characteristic information is obtained;
inputting the N-level fusion feature information into the second convolution layer, and performing feature extraction on the N-level fusion feature information through the second convolution layer to obtain the target fusion feature information;
the zero-level fusion feature information is target feature information output by an M +1 th residual network layer, N is a positive integer not exceeding N, and N is a positive integer not exceeding M.
5. The method of claim 1, wherein the target object is text, the detection information is text detection information, and the post-processing submodel includes a third convolutional layer, a fourth convolutional layer, and a fifth convolutional layer independently of each other;
the performing convolution processing according to the target fusion characteristic information through the post-processing submodel to obtain the detection information includes:
extracting the characteristics of the target fusion characteristic information through the third convolution layer to obtain a character detection score map in the character detection information;
extracting the characteristics of the target fusion characteristic information through the fourth convolution layer to obtain a text distance regression graph in the text detection information;
and performing feature extraction on the target fusion feature information through the fifth convolution layer to acquire character frame angle information in the character detection information.
6. The method of claim 1, wherein the target object is a straight line segment, the detection information is straight line segment detection information, and the post-processing submodel includes a sixth convolutional layer;
the performing convolution processing according to the target fusion characteristic information through the post-processing submodel to obtain the detection information includes:
and performing feature extraction on the target fusion feature information through the sixth convolution layer to acquire the straight-line segment detection information.
7. The method of claim 1, wherein the target object is a text and a straight line segment, the detection information is text detection information and straight line segment detection information, and the post-processing submodel includes a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, and a tenth convolution layer independently of each other;
the performing convolution processing according to the target fusion characteristic information through the post-processing submodel to obtain the detection information includes:
extracting the characteristics of the target fusion characteristic information through the seventh convolution layer to obtain a character detection score map in the character detection information;
extracting the features of the target fusion feature information through the eighth convolution layer to obtain a text distance regression graph in the text detection information;
extracting the characteristics of the target fusion characteristic information through the ninth convolution layer to obtain character frame angle information in the character detection information;
and performing feature extraction on the target fusion feature information through the tenth convolution layer to acquire the straight-line segment detection information.
8. The method according to claim 5, wherein the vectorizing the detection information to obtain coordinate information corresponding to the target object includes:
screening pixels in the character detection score map according to a first threshold value to obtain target pixels with character detection scores larger than or equal to the first threshold value;
calculating the frame coordinate of the character corresponding to the target pixel according to the character distance regression graph and the character frame angle information;
and filtering the frame coordinates according to the character detection scores corresponding to the target pixels and the overlapping degree of the character frames corresponding to the frame coordinates to obtain the frame coordinates corresponding to the target character frames.
9. The method according to claim 6, wherein the vectorizing the detection information to obtain coordinate information corresponding to the target object includes:
carrying out Hough transform on the straight line segment detection information to obtain coordinate information of a plurality of line segments;
determining any two line segments of the line segments as a first line segment and a second line segment, and calculating a first distance from the midpoint of the first line segment to the second line segment and a second distance from the midpoint of the second line segment to the first line segment;
judging whether the first distance and the second distance are both smaller than a second threshold value;
when a target first line segment and a target second line segment exist, wherein the first distance and the second distance are both smaller than the second threshold value, the first target line segment and the second target line segment are spliced;
and acquiring the endpoint coordinates of the straight line segment formed by splicing the line segments.
10. The method of claim 1, wherein the target object is a text, and the coordinate information is a frame coordinate of a frame of the target text;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
determining an upper edge line and a lower edge line of the target character frame according to the frame coordinates of the target character frame;
calculating the slopes of the upper edge line and the lower edge line, counting the occurrence times of the slopes, and acquiring a first target slope with the largest occurrence times;
and determining the rotation angle of the characters according to the first target slope, and taking the rotation angle of the characters as the rotation angle corresponding to the image to be corrected.
11. The method according to claim 1, wherein the target object is a straight line segment, and the coordinate information is an end point coordinate of the straight line segment;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
calculating the slope of the straight line segment according to the coordinates of the endpoint of the straight line segment;
counting the occurrence times of the slopes, and acquiring a second target slope with the maximum occurrence times;
and determining the rotation angle of the straight line segment according to the second target slope, and taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
12. The method of claim 1, wherein the target object is a text and a straight line segment, and the coordinate information is a frame coordinate of a frame of the target text and an endpoint coordinate of the straight line segment;
the determining a rotation angle corresponding to the image to be corrected based on the coordinate information includes:
calculating the rotation angle of the character according to the frame coordinates of the frame of the target character, and calculating the rotation angle of the straight line segment according to the endpoint coordinates of the straight line segment;
making a difference between the rotation angle of the characters and the rotation angle of the straight line segment and taking an absolute value to obtain a rotation angle difference;
comparing the rotation angle difference to a third threshold;
when the rotation angle difference is smaller than or equal to the third threshold value, obtaining an average value of the rotation angle of the characters and the rotation angle of the straight line segment, and taking the average value as the rotation angle corresponding to the image to be corrected;
and when the rotation angle difference is larger than the third threshold value, taking the rotation angle of the straight line segment as the rotation angle corresponding to the image to be corrected.
13. The method according to claim 1, wherein the correcting the direction of the image to be corrected according to the rotation angle comprises:
determining a rotation matrix according to the rotation angle and the coordinates of the central point of the image to be corrected;
and multiplying the pixel matrix corresponding to the image to be corrected by the rotation matrix so as to correct the direction of the image to be corrected.
14. An image orientation correcting apparatus, comprising:
the detection information acquisition module is used for acquiring an image to be corrected and extracting the characteristics of a target object in the image to be corrected through an image processing model so as to acquire detection information corresponding to the target object;
a coordinate information obtaining module, configured to perform vectorization processing on the detection information to obtain coordinate information corresponding to the target object;
and the image direction correcting module is used for determining a rotating angle corresponding to the image to be corrected based on the coordinate information and correcting the direction of the image to be corrected according to the rotating angle.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the image orientation correction method of any one of claims 1 to 13.
CN201911115498.5A 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment Active CN111104941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911115498.5A CN111104941B (en) 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111104941A (en) 2020-05-05
CN111104941B (en) 2023-06-13

Family

ID=70420558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911115498.5A Active CN111104941B (en) 2019-11-14 2019-11-14 Image direction correction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111104941B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077914A1 (en) * 2001-03-23 2002-10-03 International Business Machines Corporation Method, system and program for inputting handwritten characters
JP2003271897A (en) * 2002-03-15 2003-09-26 Ricoh Co Ltd Character recognizer, image processor, image processing method, and program used for executing the method
JP2010220059A (en) * 2009-03-18 2010-09-30 Casio Computer Co Ltd Apparatus, program, method and system for image processing
CN108446698A (en) * 2018-03-15 2018-08-24 腾讯大地通途(北京)科技有限公司 Method, apparatus, medium and the electronic equipment of text are detected in the picture
CN109271967A (en) * 2018-10-16 2019-01-25 腾讯科技(深圳)有限公司 The recognition methods of text and device, electronic equipment, storage medium in image
CN109614972A (en) * 2018-12-06 2019-04-12 泰康保险集团股份有限公司 Image processing method, device, electronic equipment and computer-readable medium
CN109993202A (en) * 2019-02-15 2019-07-09 广东智媒云图科技股份有限公司 A kind of line chirotype shape similarity judgment method, electronic equipment and storage medium
CN110378249A (en) * 2019-06-27 2019-10-25 腾讯科技(深圳)有限公司 The recognition methods of text image tilt angle, device and equipment
CN110427939A (en) * 2019-08-02 2019-11-08 泰康保险集团股份有限公司 Method, apparatus, medium and the electronic equipment of correction inclination text image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DING J, ET AL.: "A Correction Algorithm for Document Images Based on Edge Contour", International Conference on Industrial Technology and Management Science *
LI ZHENG, ET AL.: "A Document Image Skew Correction Method Based on Hough Transform" (in Chinese), Computer Applications *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099492A1 (en) * 2020-11-11 2022-05-19 Oppo广东移动通信有限公司 Image processing method, apparatus and device, and storage medium
CN113158895A (en) * 2021-04-20 2021-07-23 北京中科江南信息技术股份有限公司 Bill identification method and device, electronic equipment and storage medium
CN113158895B (en) * 2021-04-20 2023-11-14 北京中科江南信息技术股份有限公司 Bill identification method and device, electronic equipment and storage medium
CN113397459A (en) * 2021-05-18 2021-09-17 浙江师范大学 Capsule type medical device control system and method based on electromechanical integration
CN113627442A (en) * 2021-08-19 2021-11-09 平安医疗健康管理股份有限公司 Medical information input method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111104941B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
US12099577B2 (en) Object recognition method and apparatus, electronic device, and readable storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
CN108304835B (en) character detection method and device
US20210406468A1 (en) Method and device for visual question answering, computer apparatus and medium
JP7559263B2 (en) Method and apparatus for recognizing text - Patents.com
CN113240580A (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN109726718B (en) Visual scene graph generation system and method based on relation regularization
CN111241989A (en) Image recognition method and device and electronic equipment
CN114444565B (en) Image tampering detection method, terminal equipment and storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN115631112B (en) Building contour correction method and device based on deep learning
CN115131218A (en) Image processing method, image processing device, computer readable medium and electronic equipment
Xiang et al. Crowd density estimation method using deep learning for passenger flow detection system in exhibition center
CN115082935A (en) Method, apparatus and storage medium for correcting document image
CN112749691A (en) Image processing method and related equipment
CN111597375B (en) Picture retrieval method based on similar picture group representative feature vector and related equipment
CN117423116B (en) Training method of text detection model, text detection method and device
CN111680722B (en) Content identification method, device, equipment and readable storage medium
CN116912852B (en) Method, device and storage medium for identifying text of business card
CN113850301B (en) Training data acquisition method and device, model training method and device
CN111597373B (en) Picture classifying method and related equipment based on convolutional neural network and connected graph
WO2024000728A1 (en) Monocular three-dimensional plane recovery method, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant