CN113850220A - Text perspective transformation method and equipment

Text perspective transformation method and equipment

Info

Publication number
CN113850220A
CN113850220A
Authority
CN
China
Prior art keywords
text
image
identification code
detected
perspective transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111168050.7A
Other languages
Chinese (zh)
Inventor
谭黎敏
龚霁程
赵钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Westwell Information Technology Co Ltd
Original Assignee
Shanghai Westwell Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Westwell Information Technology Co Ltd filed Critical Shanghai Westwell Information Technology Co Ltd
Priority to CN202111168050.7A priority Critical patent/CN113850220A/en
Publication of CN113850220A publication Critical patent/CN113850220A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a text perspective transformation method and equipment for processing an image to be detected that contains a standard text, wherein the standard text comprises a first identification code and a second identification code located on the same side of a text body. The text perspective transformation method comprises the following steps: identifying the positions of the first identification code and the second identification code in the image to be detected; rotating the image to be detected according to the positions of the first identification code and the second identification code; identifying four corner points of the text body in the rotated image to be detected; and performing perspective transformation on the rotated image to be detected according to the four corner points of the text body. The invention makes the forward direction of the image the forward direction of the text, and corrects the tilt, perspective distortion, and the like of the image.

Description

Text perspective transformation method and equipment
Technical Field
The invention relates to the field of image processing, in particular to a text perspective transformation method and text perspective transformation equipment.
Background
Currently, in image processing, the processing of text in images, such as optical character recognition, is a frequently used technique. However, before such text processing is performed, the forward direction of the image must be made to match the forward direction of the text, and the tilt, perspective distortion, and the like introduced by photographing and other causes must be corrected; how to achieve this is a technical problem to be solved in the art.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a text perspective transformation method and text perspective transformation equipment that make the forward direction of the image the forward direction of the text and correct the tilt, perspective distortion, and the like of the image.
According to an aspect of the present invention, a text perspective transformation method is provided, which is used for processing an image to be detected containing a standard text, wherein the standard text comprises a first identification code and a second identification code which are located on the same side of a text main body, and the text perspective transformation method comprises the following steps:
identifying the positions of the first identification code and the second identification code in the image to be detected;
rotating the image to be detected according to the positions of the first identification code and the second identification code;
identifying four corner points of the text body in the rotated image to be detected;
and carrying out perspective transformation on the rotated image to be detected according to four corner points of the text body.
In some embodiments of the present invention, the identifying the positions of the first identification code and the second identification code in the image to be detected includes:
extracting the characteristics of the image to be detected through a first convolutional neural network;
identifying the positions of the first identification code and the second identification code respectively according to the extracted features,
wherein the first convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel, each bottleneck module comprises a plurality of serially connected basic modules and a jumper link layer, and each basic module comprises a serially connected normalization layer, an activation layer, and a convolution layer.
In some embodiments of the invention, each bottleneck module comprises three basic modules, and the convolution kernels of the basic modules are 1 × 1, 3 × 3 and 1 × 1 in sequence from the input to the output of the bottleneck module.
In some embodiments of the present invention, the performing, by the first convolutional neural network, feature extraction on the image to be detected includes:
performing feature extraction on the image to be detected through a first convolutional neural network to obtain a first quasi-feature map and a second quasi-feature map, wherein the downsampling multiple of the second quasi-feature map is twice that of the first quasi-feature map;
up-sampling the second quasi-feature map by a factor of 2;
and fusing the up-sampled second quasi-feature map with the first quasi-feature map to obtain a first feature map.
In some embodiments of the present invention, the identifying of the positions of the first identification code and the second identification code respectively according to the extracted features employs a first recognition model, the first recognition model being trained using a classification loss and a regression loss, wherein the classification loss is a cross-entropy loss and the regression loss is a mean-square-error loss.
In some embodiments of the present invention, the rotating the image to be detected according to the positions of the first and second identification codes includes:
acquiring central point coordinates of the first identification code and the second identification code;
judging whether the coordinate difference of a second coordinate axis of the center point coordinates of the first identification code and the second identification code is larger than the coordinate difference of the first coordinate axis;
if so, rotating the image to be detected by 90 degrees clockwise or anticlockwise;
judging whether the coordinate of the first coordinate axis of the coordinate of the center point of the first identification code is larger than the coordinate of the first coordinate axis of the coordinate of the center point of the second identification code;
and if so, rotating the image to be detected by 180 degrees.
In some embodiments of the present invention, the identifying four corner points of the text body in the rotated image to be detected comprises:
performing feature extraction on the rotated image to be detected through a second convolutional neural network to obtain a second feature map;
identifying four corner points of the text body according to the second feature map,
wherein the second convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel, each bottleneck module comprises a plurality of serially connected basic modules and a jumper link layer, and each basic module comprises a serially connected normalization layer, an activation layer, and a convolution layer.
In some embodiments of the present invention, the identifying four corner points of the text body according to the extracted features comprises:
performing a first convolution operation and a second convolution operation respectively on the second feature map to obtain a first output layer and a second output layer, wherein the value output by the first output layer indicates whether text is present at the current position, and the second output layer outputs the offsets of the four corner points of the text main body.
In some embodiments of the present invention, the performing of perspective transformation on the rotated image to be detected according to the four corner points of the text body comprises:
calculating a perspective transformation matrix according to the positions of the four corner points of the text main body and the positions of the four corner points of the set transformed text main body;
multiplying the perspective transformation matrix with the rotated image to be detected to obtain a rectified image.
According to still another aspect of the present invention, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Compared with the prior art, the invention has the advantages that:
According to the invention, an image to be detected containing a standard text, with a first identification code and a second identification code located on the same side of the text body, is processed to identify the positions of the first identification code and the second identification code, so that the image to be detected can be rotated based on these positions to bring the text body into the horizontal direction. Perspective transformation is then performed on the rotated image to be detected according to the four identified corner points of the text body, rectifying the image so that the forward direction of the image is the forward direction of the text, and the tilt, perspective distortion, and the like of the image are corrected.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow chart of a text perspective transformation method according to an embodiment of the invention;
FIG. 2 shows a schematic view of a rectified image;
FIG. 3 shows a schematic view of an image after rotation;
FIG. 4 shows a schematic diagram of a bottleneck module according to an embodiment of the invention;
FIG. 5 is a block diagram of a text perspective transformation apparatus according to an embodiment of the present invention;
FIG. 6 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure;
fig. 7 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In order to overcome the defects of the prior art, the invention provides a text perspective transformation method for processing an image to be detected containing a standard text, wherein the standard text comprises a first identification code and a second identification code located on the same side of a text body. Specifically, as shown in fig. 2, the first identification code 11 may be a two-dimensional code and the second identification code 12 may be a bar code, but the invention is not limited thereto; other custom shapes or patterns with marking properties may also serve as the first identification code 11 and the second identification code 12. The first identification code 11 and the second identification code 12 may be positioned on the upper side of the text body, with their centers aligned in the horizontal direction.
Fig. 1 shows a flow chart of a text perspective transformation method according to an embodiment of the invention. Fig. 1 shows the following:
step S110: and identifying the positions of the first identification code and the second identification code in the image to be detected.
Step S120: and rotating the image to be detected according to the positions of the first identification code and the second identification code.
Step S130: and identifying four corner points of the rotated text body of the image to be detected.
Step S140: and carrying out perspective transformation on the rotated image to be detected according to four corner points of the text body.
The invention provides a text perspective transformation method in which an image to be detected containing a standard text, with a first identification code and a second identification code located on the same side of the text body, is processed to identify the positions of the first identification code and the second identification code, so that the image to be detected can be rotated based on these positions to bring the text body into the horizontal direction. Perspective transformation is then performed on the rotated image to be detected according to the four identified corner points of the text body, rectifying the image so that the forward direction of the image is the forward direction of the text, and the tilt, perspective distortion, and the like of the image are corrected.
Specifically, the step S110 of identifying the positions of the first identification code and the second identification code in the image to be detected may be implemented by the following steps: extracting features of the image to be detected through a first convolutional neural network; and respectively identifying the positions of the first identification code and the second identification code according to the extracted features. The first convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel. As shown in fig. 4, each bottleneck module includes a plurality of serially connected basic modules 201-203 and a jumper link layer 204, and each of the basic modules 201-203 may include a serially connected normalization layer, activation layer, and convolution layer for sequentially performing normalization, activation, and convolution. Each bottleneck module includes three basic modules whose convolution kernels are 1 × 1, 3 × 3, and 1 × 1 in order from the input to the output of the bottleneck module. In other words, the convolution kernel of the basic module 201 is 1 × 1, that of the basic module 202 is 3 × 3, and that of the basic module 203 is 1 × 1. The input of the bottleneck module is also fed to the jumper link layer 204, which performs processing such as tensor addition on the output of the basic module 203 and the input of the bottleneck module. In one embodiment of the invention, the first convolutional neural network may include 19 serially connected bottleneck modules.
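For illustration, the bottleneck module described above can be sketched in PyTorch roughly as follows; the choice of BatchNorm for the normalization layer, ReLU for the activation layer, the channel counts, and all names are assumptions not fixed by the description.

```python
import torch
import torch.nn as nn

class BasicModule(nn.Module):
    """One basic module: normalization -> activation -> convolution, in series."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x):
        return self.block(x)

class BottleneckModule(nn.Module):
    """Three serial basic modules (1x1, 3x3, 1x1 kernels) plus a jumper link
    that adds the module input to the output of the last basic module."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.body = nn.Sequential(
            BasicModule(channels, mid_channels, 1),
            BasicModule(mid_channels, mid_channels, 3),
            BasicModule(mid_channels, channels, 1),
        )

    def forward(self, x):
        return self.body(x) + x  # tensor addition on the jumper link

x = torch.randn(1, 64, 128, 128)
y = BottleneckModule(64, 16)(x)  # shape preserved: (1, 64, 128, 128)
```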
Specifically, the step of performing feature extraction on the image to be detected through the first convolutional neural network in the above embodiment may further include the following steps: performing feature extraction on the image to be detected through the first convolutional neural network to obtain a first quasi-feature map and a second quasi-feature map, wherein the downsampling multiple of the second quasi-feature map is twice that of the first quasi-feature map; up-sampling the second quasi-feature map by a factor of 2; and fusing the up-sampled second quasi-feature map with the first quasi-feature map to obtain a first feature map. These steps build on the bottleneck modules described above: the second quasi-feature map is the output of the last bottleneck module, and the first quasi-feature map is the output of the penultimate bottleneck module. In a preferred embodiment, the downsampling multiples of the first quasi-feature map and the second quasi-feature map are 16 and 32, respectively. Considering that different feature maps carry different semantic information, the second quasi-feature map can be up-sampled by a factor of 2 and then fused with the first quasi-feature map, so as to combine deep and shallow semantic information.
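A minimal sketch of the up-sampling and fusion step follows; fusion by channel concatenation is an assumption (element-wise addition would also fit the description), and the example shapes assume the 16× and 32× downsampling multiples of the preferred embodiment.

```python
import torch
import torch.nn.functional as F

def fuse_feature_maps(first_quasi, second_quasi):
    """Upsample the 32x-downsampled map by a factor of 2 and fuse it with the
    16x-downsampled map. Channel concatenation is assumed here."""
    upsampled = F.interpolate(second_quasi, scale_factor=2, mode="nearest")
    return torch.cat([first_quasi, upsampled], dim=1)

# e.g. for a 512 x 512 input: 16x map is 32 x 32, 32x map is 16 x 16
f1 = torch.randn(1, 256, 32, 32)   # first quasi-feature map
f2 = torch.randn(1, 512, 16, 16)   # second quasi-feature map
fused = fuse_feature_maps(f1, f2)  # first feature map: (1, 768, 32, 32)
```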
Specifically, identifying the positions of the first identification code and the second identification code respectively according to the extracted features in step S110 of fig. 1 may employ a first recognition model. The first recognition model may be any neural network learning model for recognizing a target object. It can be trained using a classification loss and a regression loss, where the classification loss is a cross-entropy loss and the regression loss is a mean-square-error loss.
Specifically, the regression loss covers four pieces of information: the center point of the box and its width and height. The xy coordinates of the center point are constrained to (0, 1) by a sigmoid, and the width and height are mapped to the whole real-number space by a logarithmic function. In addition, to ensure detection accuracy for small targets, each box term is multiplied by a coefficient of (2 - wh) to increase the loss weight of small targets.
$$loss\_box = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \, (2 - w_i h_i) \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]$$

$$loss\_cls = -\lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c} \left[ \hat{p}_i(c) \log p_i(c) + (1 - \hat{p}_i(c)) \log(1 - p_i(c)) \right]$$

$$loss\_obj = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \hat{c}_i \log c_i + (1 - \hat{c}_i) \log(1 - c_i) \right] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left[ \hat{c}_i \log c_i + (1 - \hat{c}_i) \log(1 - c_i) \right]$$

$$loss = loss\_box + loss\_cls + loss\_obj$$

where $\mathbb{1}_{ij}^{noobj}$ is 1 if the prior box at (i, j) contains no target and 0 otherwise, and $\mathbb{1}_{ij}^{obj}$ is 1 if the prior box at (i, j) contains a target and 0 otherwise. The λ coefficients weight the different parts of the loss function: $\lambda_{coord}$ is taken as 5, $\lambda_{class}$ as 1, and $\lambda_{noobj}$ as 0.5. $S^2$ denotes the size of the detection layer, and $B$ denotes the number of prior boxes obtained by clustering, taken as 9. $x_i, y_i$ denote the x and y coordinates of the center point of the prior box, and $w_i, h_i$ its width and height; $\hat{x}_i, \hat{y}_i$ denote the x and y coordinates of the center point of the ground-truth box, and $\hat{w}_i, \hat{h}_i$ its width and height. $p_i(c)$ denotes the predicted probability that the prior box belongs to class c, and $\hat{p}_i(c)$ the probability (0 or 1) that the ground-truth box belongs to class c. $c_i$ denotes the predicted probability that the i-th prior box contains an object, and $\hat{c}_i$ the probability that the i-th prior box matches the ground-truth box. Here loss_box is the loss function of the detection box, loss_cls is the classification loss function, and loss_obj is the foreground/background loss function, with obj denoting the foreground and noobj the background.
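As a rough illustration of the box-regression term, the following sketch (PyTorch is an assumption; the description does not name a framework) computes the mean-square box loss with the (2 - wh) weighting over matched prior boxes; the tensor layout and names are hypothetical.

```python
import torch

def loss_box(pred, target, obj_mask, lambda_coord=5.0):
    """Box-regression term sketched from the description: squared error over
    (x, y, w, h), weighted per box by (2 - w*h) so small targets contribute
    more. pred/target are (N, 4) with normalized coordinates; obj_mask is a
    (N,) 0/1 indicator for prior boxes that contain a target."""
    weight = 2.0 - target[:, 2] * target[:, 3]   # the (2 - wh) coefficient
    sq_err = ((pred - target) ** 2).sum(dim=1)   # squared error over x, y, w, h
    return lambda_coord * (obj_mask * weight * sq_err).sum()

# toy usage
pred = torch.rand(3, 4)
target = torch.rand(3, 4)
mask = torch.tensor([1.0, 0.0, 1.0])
print(loss_box(pred, target, mask))
```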
Specifically, step S120 in fig. 1 of rotating the image to be detected according to the positions of the first identification code and the second identification code may be implemented by the following steps: acquiring the center-point coordinates of the first identification code and the second identification code; judging whether the coordinate difference of the two center points along the second coordinate axis is larger than their coordinate difference along the first coordinate axis; if so, rotating the image to be detected by 90 degrees clockwise or anticlockwise; judging whether the first-axis coordinate of the center point of the first identification code is larger than the first-axis coordinate of the center point of the second identification code; and if so, rotating the image to be detected by 180 degrees. For example, assume that the center point of the first identification code is (x1, y1) and the center point of the second identification code is (x2, y2). It can first be judged whether y2 - y1 is greater than x2 - x1; if so, the text is in a vertical state and needs to be rotated by 90 degrees clockwise or anticlockwise. After the text is horizontal, the sizes of x2 and x1 are compared; if x1 is greater than x2, the text is upside down and needs to be rotated by 180 degrees.
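A sketch of this rotation logic with OpenCV follows; interpreting the coordinate differences as absolute values and requiring the first identification code to end up left of the second are assumptions drawn from the example above.

```python
import cv2

def rotate_by_codes(image, c1, c2):
    """Rotate the image so the text reads upright, per the two checks in the
    description. c1 and c2 are the (x, y) centers of the first and second
    identification codes."""
    (x1, y1), (x2, y2) = c1, c2
    h = image.shape[0]
    if abs(y2 - y1) > abs(x2 - x1):
        # Codes are stacked vertically: the text is sideways. Rotating
        # clockwise is arbitrary; the 180-degree check below self-corrects.
        image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
        # Under this rotation a point (x, y) maps to (h - 1 - y, x).
        (x1, y1), (x2, y2) = (h - 1 - y1, x1), (h - 1 - y2, x2)
    if x1 > x2:
        # First code lies right of the second: the text is upside down.
        image = cv2.rotate(image, cv2.ROTATE_180)
    return image
```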
Specifically, the step S130 in fig. 1 of identifying the four corner points of the text body in the rotated image to be detected may be implemented by the following steps: performing feature extraction on the rotated image to be detected through a second convolutional neural network to obtain a second feature map; and identifying the four corner points of the text body according to the second feature map. The second convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel (see fig. 4); each bottleneck module includes a plurality of serially connected basic modules and a jumper link layer, and each basic module includes a serially connected normalization layer, activation layer, and convolution layer. The structure of the bottleneck module is not repeated here. The second convolutional neural network may share (be multiplexed with) the feature-extraction layers of the first convolutional neural network. In this embodiment, the size of the second feature map is consistent with that of the image to be detected: the second convolutional neural network downsamples the rotated image to be detected three times, then upsamples layer by layer, and finally obtains a feature map of the same size as the image to be detected, namely 32 × H × W, where H and W are the height and width of the original image and 32 is the number of feature-map channels.
Specifically, in fig. 1, the step S130 of identifying the four corner points of the text body according to the extracted features may be implemented by performing a first convolution operation and a second convolution operation respectively on the second feature map to obtain a first output layer and a second output layer, wherein the value output by the first output layer indicates whether text is present at the current position, and the second output layer outputs the offsets of the four corner points of the text body. The size of the first output layer may be 2 × H × W, with the value at each point indicating whether there is text; the size of the second output layer may be 8 × H × W, with the 8 values representing the offsets of the four corner points of the text.
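The two output heads can be sketched as two convolutions applied to the 32-channel second feature map; the 1 × 1 kernel size is an assumption, as the description only fixes the output shapes.

```python
import torch
import torch.nn as nn

# Illustrative heads on the full-resolution, 32-channel second feature map:
# a 2-channel text/no-text score map and an 8-channel corner-offset map.
score_head = nn.Conv2d(32, 2, kernel_size=1)   # first output layer: 2 x H x W
corner_head = nn.Conv2d(32, 8, kernel_size=1)  # second output layer: 8 x H x W

feat = torch.randn(1, 32, 480, 640)            # second feature map (32 x H x W)
text_score = score_head(feat)                  # is there text at each position?
corner_offsets = corner_head(feat)             # offsets of the four corners
```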
For training, the loss function of the first output layer is the cross entropy, and the loss function of the second output layer is:
$$loss\_quad = \min_{\tilde{Q} \in P_{Q^*}} \sum_{i=1}^{4} \frac{\mathrm{smoothed}_{L1}\left(\hat{c}_i - \tilde{c}_i\right)}{8 \times N_{Q^*}}$$

where $N_{Q^*}$ is a normalization parameter denoting the length of the shortest side of the quadrilateral $Q^*$; $Q^*$ denotes the coordinates of the ground-truth quadrilateral, with $c_i$ the coordinates of its vertices; $\hat{Q}$ denotes the coordinates of the predicted quadrilateral, with vertices $\hat{c}_i$; and $P_{Q^*}$ denotes all possible forms of the quadrilateral coordinates of $Q^*$ (different starting points result in different representations of the same quadrilateral), with $\tilde{c}_i$ the vertices of a candidate form $\tilde{Q}$.
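A sketch of this corner loss in PyTorch follows; restricting the equivalent quadrilateral forms to the four cyclic vertex orderings is an assumption.

```python
import torch
import torch.nn.functional as F

def quad_loss(pred, truth):
    """Smoothed-L1 distance between the 4 predicted vertices and the
    best-matching cyclic ordering of the ground truth, normalized by
    8 times the shortest side length. pred/truth are (4, 2) tensors."""
    n_q = torch.linalg.norm(truth - truth.roll(-1, dims=0), dim=1).min()
    losses = []
    for s in range(4):
        perm = truth.roll(-s, dims=0)  # alternative starting vertex
        losses.append(F.smooth_l1_loss(pred, perm, reduction="sum"))
    return torch.stack(losses).min() / (8 * n_q)
```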
Specifically, step S140 in fig. 1 performs perspective transformation on the rotated image to be detected according to the four corner points of the text body through the following steps: calculating a perspective transformation matrix according to the positions of the four corner points of the text body and the set positions of the four corner points of the transformed text body; and multiplying the perspective transformation matrix with the rotated image to be detected to obtain a rectified image. For example, let the coordinates of the four detected corner points be Poly1 = {{x1, y1}, {x2, y2}, {x3, y3}, {x4, y4}}, and let the transformed corner points be defined as Poly2 = {{u1, v1}, {u2, v2}, {u3, v3}, {u4, v4}}. A perspective transformation matrix M is solved from the two sets of points, so that the points of Poly1 are transformed into Poly2:
$$\begin{bmatrix} u_i \cdot w \\ v_i \cdot w \\ w \end{bmatrix} = M \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad i = 1, 2, 3, 4$$
where w is an arbitrary scale factor. Substituting the four groups of corresponding points solves the perspective transformation matrix M, and multiplying the matrix with the original image yields the image rectified by perspective transformation.
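With OpenCV, solving the matrix and rectifying the image can be sketched as follows; deriving the output size from the target corner points is an assumption.

```python
import cv2
import numpy as np

def rectify(image, poly1, poly2):
    """Solve the perspective matrix M from the four detected corner points
    (poly1) and the four target corner points (poly2), then warp the image."""
    src = np.float32(poly1)  # {{x1, y1}, ..., {x4, y4}} detected corners
    dst = np.float32(poly2)  # {{u1, v1}, ..., {u4, v4}} target corners
    M = cv2.getPerspectiveTransform(src, dst)  # 3x3 matrix, solved from 4 pairs
    w = int(dst[:, 0].max())
    h = int(dst[:, 1].max())
    return cv2.warpPerspective(image, M, (w, h))
```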
The text perspective transformation method provided by the present invention is only schematically described above, and the present invention is not limited thereto.
The invention also provides a text perspective transformation device for processing an image to be detected containing a standard text, wherein the standard text comprises a first identification code and a second identification code located on the same side of the text body. Fig. 5 is a block diagram of a text perspective transformation apparatus according to an embodiment of the present invention. The text perspective transformation apparatus 300 includes a first recognition module 310, a rotation module 320, a second recognition module 330, and a transformation module 340.
The first recognition module 310 is configured to identify the positions of the first identification code and the second identification code in the image to be detected;
the rotating module 320 is configured to rotate the image to be detected according to the positions of the first identification code and the second identification code;
the second identification module 330 is configured to identify four corner points of the rotated text body of the image to be detected;
the transformation module 340 is configured to perform perspective transformation on the rotated image to be detected according to four corner points of the text body.
The invention provides a text perspective transformation device that processes an image to be detected containing a standard text, with a first identification code and a second identification code located on the same side of the text body, to identify the positions of the first identification code and the second identification code, so that the image to be detected can be rotated based on these positions to bring the text body into the horizontal direction. Perspective transformation is then performed on the rotated image to be detected according to the four identified corner points of the text body, rectifying the image so that the forward direction of the image is the forward direction of the text, and the tilt, perspective distortion, and the like of the image are corrected.
Fig. 5 is merely a schematic block diagram of the text perspective transformation apparatus 300 provided by the invention; the blocks may be divided, combined, or added to without departing from the concept of the invention, and such variations remain within the protection scope of the invention.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by, for example, a processor, may implement the steps of the text perspective transformation method described in any of the above embodiments. In some possible embodiments, the aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the invention described in the text perspective transformation method section above of this specification, when the program product is run on the terminal device.
Referring to fig. 6, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the text perspective transformation method in any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Accordingly, various aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit", "module", or "system".
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the text perspective transformation method section above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the text perspective transformation method according to the embodiments of the present disclosure.
Compared with the prior art, the invention has the advantages that:
According to the invention, an image to be detected containing a standard text, with a first identification code and a second identification code located on the same side of the text body, is processed to identify the positions of the first identification code and the second identification code, so that the image to be detected can be rotated based on these positions to bring the text body into the horizontal direction. Perspective transformation is then performed on the rotated image to be detected according to the four identified corner points of the text body, rectifying the image so that the forward direction of the image is the forward direction of the text, and the tilt, perspective distortion, and the like of the image are corrected.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A text perspective transformation method is characterized by being used for processing an image to be detected containing a standard text, wherein the standard text comprises a first identification code and a second identification code which are positioned on the same side of a text main body, and the text perspective transformation method comprises the following steps:
identifying the positions of the first identification code and the second identification code in the image to be detected;
rotating the image to be detected according to the positions of the first identification code and the second identification code;
identifying four corner points of the text body in the rotated image to be detected;
and carrying out perspective transformation on the rotated image to be detected according to four corner points of the text body.
2. The method of text perspective transformation as claimed in claim 1, wherein said identifying the position of the first identification code and the second identification code in the image to be detected comprises:
extracting the characteristics of the image to be detected through a first convolutional neural network;
identifying the positions of the first identification code and the second identification code respectively according to the extracted features,
wherein the first convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel, each bottleneck module comprises a plurality of serially connected basic modules and a jumper link layer, and each basic module comprises a serially connected normalization layer, an activation layer, and a convolution layer.
3. The method of text perspective transformation of claim 2, wherein each bottleneck module comprises three basic modules, and the convolution kernels of the basic modules are 1 × 1, 3 × 3 and 1 × 1 in order from the input to the output of the bottleneck module.
4. The method of text perspective transformation as claimed in claim 2, wherein said feature extraction of the image to be detected by the first convolutional neural network comprises:
performing feature extraction on the image to be detected through a first convolutional neural network to obtain a first quasi-feature map and a second quasi-feature map, wherein the downsampling multiple of the second quasi-feature map is twice that of the first quasi-feature map;
up-sampling the second quasi-feature map by a factor of 2;
and fusing the up-sampled second quasi-feature map with the first quasi-feature map to obtain a first feature map.
5. The method of text perspective transformation of claim 2, wherein the identifying of the positions of the first identification code and the second identification code respectively according to the extracted features employs a first recognition model, the first recognition model being trained using a classification loss and a regression loss, wherein the classification loss is a cross-entropy loss and the regression loss is a mean-square-error loss.
6. The text perspective transformation method of claim 1, wherein the rotating the image to be detected according to the positions of the first and second identification codes comprises:
acquiring central point coordinates of the first identification code and the second identification code;
judging whether the coordinate difference of a second coordinate axis of the center point coordinates of the first identification code and the second identification code is larger than the coordinate difference of the first coordinate axis;
if so, rotating the image to be detected by 90 degrees clockwise or anticlockwise;
judging whether the coordinate of the first coordinate axis of the coordinate of the center point of the first identification code is larger than the coordinate of the first coordinate axis of the coordinate of the center point of the second identification code;
and if so, rotating the image to be detected by 180 degrees.
7. The method of text perspective transformation as claimed in claim 1, wherein said identifying four corner points of the text body in the rotated image to be detected comprises:
performing feature extraction on the rotated image to be detected through a second convolutional neural network to obtain a second feature map;
identifying four corner points of the text body according to the second feature map,
wherein the second convolutional neural network comprises a plurality of bottleneck modules connected in series and/or in parallel, each bottleneck module comprises a plurality of serially connected basic modules and a jumper link layer, and each basic module comprises a serially connected normalization layer, an activation layer, and a convolution layer.
8. The method of text perspective transformation of claim 7, wherein the identifying four corner points of the body of text from the extracted features comprises:
performing a first convolution operation and a second convolution operation respectively on the second feature map to obtain a first output layer and a second output layer, wherein the value output by the first output layer indicates whether text is present at the current position, and the second output layer outputs the offsets of the four corner points of the text main body.
9. The method of claim 1, wherein the perspective transformation of the rotated image to be detected according to the four corner points of the text body comprises:
calculating a perspective transformation matrix according to the positions of the four corner points of the text main body and the positions of the four corner points of the set transformed text main body;
multiplying the perspective transformation matrix with the rotated image to be detected to obtain a rectified image.
10. An electronic device, characterized in that the electronic device comprises:
a processor;
a storage medium having stored thereon a computer program which, when executed by the processor, performs:
the method of text perspective transformation of any one of claims 1 to 9.
CN202111168050.7A 2021-09-30 2021-09-30 Text perspective transformation method and equipment Pending CN113850220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111168050.7A CN113850220A (en) 2021-09-30 2021-09-30 Text perspective transformation method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111168050.7A CN113850220A (en) 2021-09-30 2021-09-30 Text perspective transformation method and equipment

Publications (1)

Publication Number Publication Date
CN113850220A 2021-12-28

Family

ID=78977674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111168050.7A Pending CN113850220A (en) 2021-09-30 2021-09-30 Text perspective transformation method and equipment

Country Status (1)

Country Link
CN (1) CN113850220A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100506A (en) * 2022-06-27 2022-09-23 平安银行股份有限公司 Method, server and system for correcting screen image in double-recording scene
CN115100506B (en) * 2022-06-27 2024-05-24 平安银行股份有限公司 Method, server and system for correcting images of screen in double-recording scene
WO2024125350A1 (en) * 2022-12-13 2024-06-20 北京字跳网络技术有限公司 Image processing method and apparatus, and device and medium

Similar Documents

Publication Publication Date Title
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
CN114155543B (en) Neural network training method, document image understanding method, device and equipment
WO2019240964A1 (en) Teacher and student based deep neural network training
CN111615702B (en) Method, device and equipment for extracting structured data from image
CN113785305A (en) Method, device and equipment for detecting inclined characters
CN113850220A (en) Text perspective transformation method and equipment
CN112016559A (en) Example segmentation model training method and device and image processing method and device
CN109934229B (en) Image processing method, device, medium and computing equipment
CN112633159A (en) Human-object interaction relation recognition method, model training method and corresponding device
CN111428805A (en) Method and device for detecting salient object, storage medium and electronic equipment
CN111767889A (en) Formula recognition method, electronic device and computer readable medium
CN111414913A (en) Character recognition method and recognition device and electronic equipment
CN113496212A (en) Text recognition method and device for box-type structure and electronic equipment
CN114239760B (en) Multi-modal model training and image recognition method and device, and electronic equipment
CN113191364B (en) Vehicle appearance part identification method, device, electronic equipment and medium
CN115861922A (en) Sparse smoke and fire detection method and device, computer equipment and storage medium
CN115984886A (en) Table information extraction method, device, equipment and storage medium
CN113205092A (en) Text detection method, device, equipment and storage medium
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN114741697A (en) Malicious code classification method and device, electronic equipment and medium
CN112801960A (en) Image processing method and device, storage medium and electronic equipment
CN114529891A (en) Text recognition method, and training method and device of text recognition network
CN118097706B (en) Method, system, equipment and medium for detecting graphic element of power grid station wiring diagram
CN110555498A (en) Two-dimensional code generation method and device, electronic equipment and storage medium
CN116259050B (en) Method, device, equipment and detection method for positioning and identifying label characters of filling barrel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant after: Shanghai Xijing Technology Co.,Ltd.

Address before: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant before: SHANGHAI WESTWELL INFORMATION AND TECHNOLOGY Co.,Ltd.