WO2023019974A1 - Document image correction method and apparatus, electronic device, and storage medium - Google Patents

Document image correction method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023019974A1
Authority
WO
WIPO (PCT)
Prior art keywords
document image
corrected
training
dimensional
distorted
Prior art date
Application number
PCT/CN2022/085587
Other languages
English (en)
French (fr)
Inventor
谢群义
钦夏孟
章成全
姚锟
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023019974A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field

Definitions

  • the disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning; it can be applied to scenarios such as image processing and image recognition, and specifically relates to a document image correction method and apparatus, an electronic device, and a storage medium.
  • OCR Optical Character Recognition
  • the disclosure provides a document image correction method, device, electronic equipment and storage medium.
  • a method for rectifying a document image including:
  • a two-dimensional backward image is obtained by performing interpolation calculation on the two-dimensional forward image, and a rectified document image is generated according to the two-dimensional backward image.
  • an apparatus for correcting a document image including:
  • the first input module is configured to input the document image to be corrected into the shape network model, so as to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected;
  • the second input module is used to input the distorted three-dimensional coordinates into the corrected coordinate prediction network model, so as to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates;
  • a first calculation module configured to calculate a corresponding two-dimensional forward map according to the corrected three-dimensional coordinates and the corner points of the document image to be corrected;
  • a processing module configured to obtain a two-dimensional backward image by performing interpolation calculation on the two-dimensional forward image, and generate a rectified document image according to the two-dimensional backward image.
  • an electronic device including:
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the method described in the preceding aspect.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method described in the preceding aspect.
  • a computer program product comprising a computer program which, when executed by a processor, implements the method according to the preceding aspect.
  • the document image correction method, device, electronic device and storage medium input the document image to be corrected into the shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected; input the distorted three-dimensional coordinates into the corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates; calculate the corresponding two-dimensional forward map according to the corrected three-dimensional coordinates and the corner points of the document image to be corrected; perform interpolation calculation on the two-dimensional forward map to obtain a two-dimensional backward map; and generate a corrected document image according to the two-dimensional backward map.
  • the local distortion rate and OCR character error rate of the document image to be corrected can be reduced.
  • FIG. 1 is a schematic flowchart of a method for correcting a document image provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of a training method for a corrected coordinate prediction network model provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of generating a true value of a coordinate value offset provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of another training method for a corrected coordinate prediction network model provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of a training method for a shape network model provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an apparatus for correcting a document image provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an apparatus for correcting a document image provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic block diagram of an exemplary electronic device provided by an embodiment of the present disclosure.
  • rectification of a deformed document image has been realized based on 3D shape reconstruction, for example by using a projector system to obtain a 3D model, or by using a dedicated laser rangefinder to capture the 3D model of the deformed document, and then restoring the shape according to the physical characteristics of the paper.
  • this approach relies mostly on physical equipment and requires that equipment to be highly accurate.
  • the stability and accuracy of physical equipment are difficult to control, which makes the constructed 3D model unstable and directly leads to poor rectification quality and a high distortion rate in the document image.
  • two deep learning models, namely the shape network model and the rectified coordinate prediction network model, are used first.
  • the two-dimensional document image to be corrected is converted into three-dimensional coordinates, and these three-dimensional coordinates are corrected.
  • the corrected three-dimensional coordinates and the corner points of the document image to be corrected are mapped into a two-dimensional forward map, and the backward map is then obtained by interpolation.
  • the rectified image is generated from the document image to be corrected (the distorted image) by bilinear interpolation.
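The four stages described above can be read as a simple chain of functions. The sketch below only illustrates that data flow: `rectify_document` and all five callables are hypothetical names introduced here, not part of the disclosure, and the real models and mapping routines would have to be supplied by the caller.

```python
from typing import Any, Callable


def rectify_document(
    image: Any,
    shape_net: Callable[[Any], Any],
    correction_net: Callable[[Any], Any],
    to_forward_map: Callable[[Any], Any],
    to_backward_map: Callable[[Any], Any],
    sample: Callable[[Any, Any], Any],
) -> Any:
    """Chain the stages: image -> distorted 3D -> corrected 3D -> forward map -> backward map -> output."""
    distorted_3d = shape_net(image)              # per-pixel distorted 3D coordinates
    corrected_3d = correction_net(distorted_3d)  # per-pixel corrected 3D coordinates
    forward_map = to_forward_map(corrected_3d)   # uses the corner points contained in corrected_3d
    backward_map = to_backward_map(forward_map)  # obtained by interpolation
    return sample(image, backward_map)           # e.g. bilinear sampling of the distorted image
```

Passing the stages in as callables keeps the sketch independent of any particular network architecture or interpolation routine.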
  • FIG. 1 is a schematic flowchart of a method for correcting a document image provided by an embodiment of the present disclosure.
  • the method includes the following steps:
  • Step 101 input the document image to be corrected into the shape network model, so as to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected.
  • the shape network model is used to predict, pixel by pixel, the 3D coordinates of the distorted document image to be corrected.
  • the document image to be corrected is a two-dimensional image. After it is input into the shape network model, the two-dimensional document image to be corrected is converted and mapped to the corresponding distorted 3D coordinates.
  • Step 102 input the distorted three-dimensional coordinates into the corrected coordinate prediction network model to obtain corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates.
  • the output of step 101 is the input of step 102: the distorted 3D coordinates output by the shape network model are input to the rectified coordinate prediction network model, which converts them into the corrected 3D coordinates of the distorted image. Specifically, the rectified coordinate prediction network model predicts the coordinate value offset from the distorted 3D coordinates to the corrected 3D coordinates, and this offset describes the conversion from the distorted 3D coordinates of the document image to be corrected to its corrected 3D coordinates.
  • the corrected coordinate prediction network model may use a DenseNet-based model as the mapping function that converts a distorted 3D coordinate map into a corrected 3D coordinate map. A differentiable combined objective function is used to jointly optimize the shape network model and the corrected coordinate prediction network model, which can improve the conversion accuracy for the document image to be corrected.
  • Step 103 calculating a corresponding two-dimensional forward map according to the corrected three-dimensional coordinates and the corner points of the document image to be corrected.
  • the document image to be corrected has four corner points. The corner points referred to in this step are the corrected 3D coordinates corresponding to the corner points of the two-dimensional document image to be corrected; the corrected 3D coordinates are converted, pixel by pixel, into the corresponding 2D coordinates.
  • the corrected 3D coordinate map is converted pixel by pixel, in combination with the corrected 3D coordinates of the corner points, to obtain the 2D forward map.
  • the two-dimensional forward map allows the document image to be corrected to be restored well.
  • the horizontal coordinates of the two-dimensional forward map are calculated according to the upper left corner point and the upper right corner point of the document image to be corrected and the width of the document image to be corrected; the vertical coordinates of the two-dimensional forward map are calculated according to the lower left corner point and the lower right corner point of the document image to be corrected and the height of the document image to be corrected; and the forward map is generated according to the horizontal coordinates and the vertical coordinates.
  • the four corners of the correction map are the upper left corner point C1, the lower left corner point C2, the lower right corner point C3 and the upper right corner point C4, and the corrected 3D coordinates corresponding to these four corner points are set according to the way their true values are generated. In the 3D corrected coordinates output by the corrected coordinate prediction network model, assume that the corrected 3D coordinates of any point P are (x, y, z) and that its 2D coordinates in the forward map are (x1, y1); the 2D coordinates corresponding to point P are then computed from its corrected coordinates and the corner coordinates, where width and height are the set image width and height values.
  • Step 104 a two-dimensional backward map is obtained by performing interpolation calculation on the two-dimensional forward map, and a rectified document image is generated according to the two-dimensional backward map.
  • the 2D forward map is converted into a 2D backward map by interpolation, and the rectified image is finally restored by bilinear interpolation.
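The last part of this step, restoring the rectified image from a backward map by bilinear interpolation, can be illustrated with a small sketch. It assumes a grayscale image stored as nested lists and a backward map that gives, for each output pixel, a fractional (x, y) position in the distorted image; the names `bilinear_sample` and `remap` and the border clamping are choices made for this example, not details taken from the disclosure.

```python
import math
from typing import List, Tuple

Image = List[List[float]]                      # grayscale image as rows of pixel values
BackwardMap = List[List[Tuple[float, float]]]  # per output pixel: (x, y) in the distorted image


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Interpolate img at a fractional (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend along x on the two neighboring rows, then blend the rows along y.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def remap(img: Image, backward_map: BackwardMap) -> Image:
    """Build the output image by sampling img at every position in the backward map."""
    return [[bilinear_sample(img, x, y) for (x, y) in row] for row in backward_map]
```

A backward map is convenient here because every output pixel receives exactly one value, so the rectified image has no holes; converting the forward map into that backward map is the separate interpolation step mentioned in the text and is not shown.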
  • the document image correction method inputs the document image to be corrected into the shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected; inputs the distorted three-dimensional coordinates into the corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates; calculates the corresponding two-dimensional forward map according to the corrected three-dimensional coordinates and the corner points of the document image to be corrected; obtains the two-dimensional backward map by interpolating the two-dimensional forward map; and generates a rectified document image according to the two-dimensional backward map.
  • the local distortion rate and OCR character error rate of the document image to be corrected can be reduced by pixel-by-pixel correction.
  • the shape network model and the corrected coordinate prediction network model are first trained separately; after that training is completed, the two best models are jointly trained.
  • for the shape network model, a small amount of data is first used for pre-training. This process uses only the predicted target information of the distorted map's 3D coordinates and the pixel-by-pixel difference as constraints; the rest of the data and the other constraints are then added for retraining until the best model for 3D coordinate prediction is obtained.
  • for the corrected coordinate prediction network model, similarly, a part of the data is first used for pre-training. This process uses only the true value of the 3D corrected coordinates of the distorted map as a constraint; the other training data, gradient constraints and normal vector constraints are then added.
  • FIG. 2 shows a flow chart of a training method for the corrected coordinate prediction network model provided by the present application, and the method includes:
  • Step 201 acquiring first training distorted document information
  • the first training distorted document information includes a training distorted document image and a true value of a coordinate value offset from the distorted 3D coordinates to the corrected 3D coordinates of the training distorted document image.
  • the first training warped document information used in the embodiment of the present application is an open-source Doc3D synthetic data set.
  • the first training warped document information includes 2D training warped document images that can be used for training, as well as, for each image, 3D coordinate annotations, depth map annotations, normal vector annotations, backward map annotations, and so on. The 3D corrected coordinates of the warped map used by the corrected coordinate prediction network model are therefore also generated from the backward map information provided by the Doc3D dataset.
  • the true value of the coordinate value offset is used as the constraint information of the corrected coordinate prediction network model.
  • it is generated as follows, as shown in FIG. 3: the backward map in the Doc3D dataset is interpolated into the forward map, and Z-axis information is added according to the backward map to obtain the 3D coordinates of the correction map. The 3D corrected coordinates corresponding to the distorted 3D coordinates are then generated from the forward map and the 3D coordinates of the correction map, and these 3D corrected coordinates are used as the true value of the coordinate value offset of the corrected 3D coordinates.
  • the added Z axis is the basis for converting the two-dimensional training warped document image into 3D. The Z axis can be added by using the center point of the training warped document image and the normal vector formed at that center point; in another implementation of the present application, the Z axis can also be added by any method from the related art, and the present application does not limit the specific implementation.
  • the coordinates of the document image to be corrected in the two-dimensional state are absolute values.
  • for example, the coordinates of a certain point are (0.8, 0.8, 1.9), but under the corrected coordinate prediction network model the corresponding 3D coordinates are expressed as (1-0.2, 1-0.2, 2-0.1), where 0.2, 0.2 and 0.1 are the coordinate offsets.
  • the foregoing is only an exemplary description for ease of understanding, and is not a specific limitation on a certain coordinate.
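The arithmetic in the example above can be written out as a tiny sketch. It assumes, for illustration only, that the values 1, 1 and 2 act as a reference coordinate from which the offsets are subtracted; the names `apply_offset` and `offset_between` are hypothetical and do not come from the disclosure.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def apply_offset(reference: Vec3, offset: Vec3) -> Vec3:
    """Return reference - offset, component by component."""
    return (reference[0] - offset[0], reference[1] - offset[1], reference[2] - offset[2])


def offset_between(reference: Vec3, absolute: Vec3) -> Vec3:
    """Inverse view: the offset that turns the reference into the absolute coordinate."""
    return (reference[0] - absolute[0], reference[1] - absolute[1], reference[2] - absolute[2])


# The numbers from the text: (1, 1, 2) with offsets (0.2, 0.2, 0.1) gives about (0.8, 0.8, 1.9).
point = apply_offset((1.0, 1.0, 2.0), (0.2, 0.2, 0.1))
```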
  • Step 202 input the training distorted document image into the rectified coordinate prediction network model to obtain a predicted value of the coordinate value offset from the distorted 3D coordinates of the training distorted document image to the rectified 3D coordinates.
  • Step 203 according to the difference between the true value of the coordinate value offset and the predicted value of the coordinate value offset, train the corrected coordinate prediction network model to obtain a trained corrected coordinate prediction network model.
  • the difference between the true value of the coordinate value offset and the predicted value of the coordinate value offset is converted into a gradient error, and the corrected coordinate prediction network model is trained based on the gradient error.
  • the gradient error is used to describe smoothness. For example, if a certain coordinate is (0.2, 0.2, 0.3), its adjacent coordinate should be close to it, such as (0.22, 0.20, 0.31), rather than (1, 0.20, 0.31).
  • in this way, the accuracy of the trained corrected coordinate prediction network model can be improved, which in turn improves the accuracy of the corrected document image.
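One common way to turn a per-pixel difference into a smoothness-oriented gradient error is to compare neighboring values of the difference map. The sketch below is an assumption about how such an error could be computed, since the text does not give the exact formula; `gradient_error` is a hypothetical name, and it handles a single coordinate channel stored as nested lists.

```python
from typing import List

Grid = List[List[float]]  # one coordinate channel (for example the x offset) per pixel


def gradient_error(pred: Grid, true: Grid) -> float:
    """Mean absolute horizontal and vertical finite difference of (pred - true)."""
    h, w = len(pred), len(pred[0])
    diff = [[pred[r][c] - true[r][c] for c in range(w)] for r in range(h)]
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # horizontal neighbor
                total += abs(diff[r][c + 1] - diff[r][c])
                count += 1
            if r + 1 < h:  # vertical neighbor
                total += abs(diff[r + 1][c] - diff[r][c])
                count += 1
    return total / count if count else 0.0
```

With this definition a single outlier such as the value 1 in the example above produces a large error, while a constant shift of the whole map produces none, which is why such a term would be used alongside the plain difference rather than in place of it.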
  • the training process of the corrected coordinate prediction network model is generally an iterative process with multiple rounds: the parameters of each network layer are adjusted continuously until the training results converge, which completes the training of the corrected coordinate prediction network model.
  • the present application uses gradient and normal vector features as enhancements in the correction coordinate prediction network model to more accurately predict the 3D correction coordinates of distorted images.
  • FIG. 4 shows another training method for the corrected coordinate prediction network model provided by the present application. This method is used together with the method of FIG. 2 during training, which can make the corrected 3D coordinates output by the corrected coordinate prediction network model more accurate.
  • the method includes:
  • Step 401 acquiring the first pixel in the training warped document image.
  • the first pixel is any pixel in the training warped document image, and it is assumed that the first pixel is pixel A.
  • Step 402 obtaining a second pixel adjacent to the first pixel in the horizontal direction, and obtaining a third pixel adjacent to the first pixel in the vertical direction, the first pixel, the second pixel and the third pixel forming a neighborhood.
  • for example, the second pixel is the pixel immediately to the right of the first pixel and the third pixel is the pixel immediately below the first pixel; the first pixel, the second pixel and the third pixel together form a neighborhood.
  • Step 403 obtaining the normal vector of the center pixel of the neighborhood.
  • Step 404 calculating a corrected three-dimensional coordinate normal vector according to the vectors of three-dimensional coordinates formed by any two pixels among the first pixel, the second pixel and the third pixel.
  • here P_A denotes the 3D coordinates of point A.
  • Step 405 calculating the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector.
  • the cross product result of the normal vector of the neighborhood pixel and the normal vector of the center pixel position is calculated in a neighborhood.
  • the goal is that the normal vector in the neighborhood is parallel, that is, the cross product result is 0.
  • the specific cross product formula is as follows:
  • Step 406 Train the corrected coordinate prediction network model according to the difference between the cross-product result and the target cross-product result.
  • the target cross product result is 0, which means that the document image is flattened, and the training difficulty also increases accordingly.
  • the target cross product result can also be set to 0.01, 0.02, and so on. The setting of the target cross product result depends on the specific application scenario and is not limited in this embodiment.
  • training the corrected coordinate prediction network model based on the normal vector is generally an iterative process with multiple rounds: the parameters of each network layer are adjusted continuously until the training results converge, which completes the training of the corrected coordinate prediction network model.
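The normal vector constraint of steps 401 to 406 can be sketched for a single neighborhood. This is a minimal illustration under stated assumptions: the patch normal is taken as the cross product of the two edge vectors from pixel A to its right and lower neighbors, and the error is the length of the cross product between that normal and the center pixel normal. The function names are hypothetical, and the formulas actually used in the disclosure are not reproduced here.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def neighborhood_normal(p_a: Vec3, p_b: Vec3, p_c: Vec3) -> Vec3:
    """Normal of the patch spanned by A, its right neighbor B and its lower neighbor C."""
    return cross(sub(p_b, p_a), sub(p_c, p_a))


def parallelism_error(center_normal: Vec3, patch_normal: Vec3) -> float:
    """Length of the cross product of two normals; zero when they are parallel."""
    cx, cy, cz = cross(center_normal, patch_normal)
    return math.sqrt(cx * cx + cy * cy + cz * cz)
```

For a flat patch the two normals are parallel and the error is 0, which matches the target cross product result of 0 described above; a small non-zero target would correspond to tolerating a slight residual bend.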
  • FIG. 5 shows a flow chart of a training method for a shape network model provided in an embodiment of the present application, and the method includes:
  • Step 501 acquire second training warped document information, where the second training warped document information includes a training warped document image and sample target information of the training warped document image, and the training warped document image contains a marked two-dimensional coordinate system and the annotated true value of the angle.
  • the second training distorted document information used in the embodiment of the present application is the open-source Doc3D synthetic data set. It contains 2D training distorted document images that can be used for training, and each training distorted document image contains a marked two-dimensional coordinate system and the annotated true value of the angle.
  • the calculation of the true value of the angle needs to use the backward map, that is, the mapping relationship from the distorted map to the corrected map.
  • the purpose of using the true value of the angle as a constraint is to keep the angles of the distorted map (the map formed by the distorted 3D coordinates) and of the predicted map (the map formed by the corrected 3D coordinates) consistent with the true value of the angle.
  • Step 502 generating first angle offset value pairs in the X-axis direction and the Y-axis direction respectively based on the X-axis of the two-dimensional coordinate system.
  • Step 503 generating second angle offset value pairs respectively in the X-axis direction and the Y-axis direction based on the Y-axis of the two-dimensional coordinate system.
  • the shape network model predicts 4 auxiliary channels: for the X axis, the first angle offset value pair in the X-axis and Y-axis directions, and for the Y axis, the second angle offset value pair in the X-axis and Y-axis directions. The four values are denoted $(\varphi_{xx}, \varphi_{xy}, \varphi_{yx}, \varphi_{yy})$, where $\varphi_{xy}$ is the offset value predicted in the Y direction for the X axis, and so on.
  • Step 504 calculate the first magnitude of the angle of the training warped document image according to the first angle offset value pair.
  • Step 505 calculating a second magnitude of the training warped document image angle according to the second angle offset value pair.
  • Step 506 generating a predicted angle of the training warped document image according to the first magnitude and the second magnitude of the training warped document image angle.
  • $\theta_i = \operatorname{arctan2}(\varphi_{ix}, \varphi_{iy})$
  • the value calculated in step 506 is used in the angle loss.
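Steps 504 to 506 can be illustrated for one axis with the standard library. The sketch follows the arctan2 expression above for the angle and assumes, as a labeled guess, that the magnitude of an offset pair is its Euclidean norm; the parameter names `phi_ix` and `phi_iy` mirror the subscripts in the text, while `angle_and_magnitude` is a hypothetical helper name.

```python
import math
from typing import Tuple


def angle_and_magnitude(phi_ix: float, phi_iy: float) -> Tuple[float, float]:
    """Return (angle, magnitude) for one axis i from its offset value pair.

    The argument order passed to atan2 follows the expression as written,
    with phi_ix first. Note that math.atan2 treats its first argument as
    the "y" component, so the order matters.
    """
    theta = math.atan2(phi_ix, phi_iy)  # angle in radians, in (-pi, pi]
    rho = math.hypot(phi_ix, phi_iy)    # assumed magnitude: Euclidean norm of the pair
    return theta, rho
```

Calling the helper once with the first offset value pair and once with the second gives the two magnitudes and the predicted angle for each axis.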
  • Step 507 inputting the predicted angle into the shape network model to obtain the marked warped three-dimensional coordinates and predicted target information in the training warped document image.
  • Step 508 according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training warped document image and the true value of the angle, train the shape network model to obtain the trained shape network model.
  • the training process of the shape network model is generally a multiple iterative training process. On the premise of continuously adjusting the network layer parameters of each layer, the training results are more convergent, so as to complete the training of the shape network model.
  • FIG. 6 is a schematic structural diagram of a document image correction device provided by an embodiment of the present disclosure. As shown in FIG. 6, it includes: a first input module 61, a second input module 62, a first calculation module 63 and a processing module 64.
  • the first input module 61 is configured to input the document image to be corrected into the shape network model, so as to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected.
  • the second input module 62 is configured to input the distorted three-dimensional coordinates into the corrected coordinate prediction network model, so as to obtain corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates.
  • the first calculation module 63 is configured to calculate a corresponding two-dimensional forward map according to the corrected three-dimensional coordinates and the corner points of the document image to be corrected.
  • the processing module 64 is configured to perform interpolation calculation on the two-dimensional forward image to obtain a two-dimensional backward image, and generate a rectified document image according to the two-dimensional backward image.
  • the device includes: a first input module 71, a second input module 72, a first calculation module 73, a processing module 74, a first acquisition module 75, a third input module 76 and a first training module 77. For the description of the first input module 71, the second input module 72, the first calculation module 73 and the processing module 74, please refer to the corresponding descriptions of the first input module 61, the second input module 62, the first calculation module 63 and the processing module 64 in FIG. 6, which are not repeated in this application.
  • the first acquisition module 75 is configured to acquire the first training distorted document information, where the first training distorted document information includes the training distorted document image and the true value of the coordinate value offset from the distorted three-dimensional coordinates of the training distorted document image to the corrected three-dimensional coordinates.
  • the third input module 76 is configured to input the training distorted document image into the rectified coordinate prediction network model, so as to obtain a predicted value of the coordinate value offset from the distorted 3D coordinates of the training distorted document image to the rectified 3D coordinates.
  • the first training module 77 is configured to train the corrected coordinate prediction network model according to the difference between the true value of the coordinate value offset and the predicted value of the coordinate value offset, so as to obtain a trained corrected coordinate prediction network model.
  • the first training module 77 further includes:
  • a first acquiring unit 771, configured to acquire the first pixel in the training warped document image;
  • a second acquiring unit 772, configured to acquire a second pixel adjacent to the first pixel in the horizontal direction, and to acquire a third pixel adjacent to the first pixel in the vertical direction;
  • a forming unit 773, configured to form a neighborhood from the first pixel, the second pixel and the third pixel;
  • a third acquiring unit 774, configured to acquire the normal vector of the center pixel of the neighborhood;
  • a first calculation unit 775, configured to calculate the corrected three-dimensional coordinate normal vector according to the vectors of three-dimensional coordinates formed by any two pixels among the first pixel, the second pixel and the third pixel;
  • a second calculation unit 776, configured to calculate the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector;
  • a first training unit 777, configured to train the corrected coordinate prediction network model according to the difference between the cross product result and the target cross product result.
  • the first training module 77 further includes:
  • a conversion unit 778 configured to convert the difference between the true value of the coordinate value offset and the predicted value of the coordinate value offset into a gradient error
  • the second training unit 779 is configured to train the corrected coordinate prediction network model based on the gradient error.
  • the device further includes:
  • the second obtaining module 78 is used to obtain the second training distorted document information, where the second training distorted document information includes the training distorted document image and the sample target information of the training distorted document image, and the training distorted document image contains a marked two-dimensional coordinate system and the true value of the marked angle;
  • the first generating module 79 is configured to generate first angle offset value pairs in the X-axis direction and the Y-axis direction based on the X-axis of the two-dimensional coordinate system;
  • the second generation module 710 is configured to generate a pair of second angle offset values in the X-axis direction and the Y-axis direction respectively based on the Y-axis of the two-dimensional coordinate system;
  • the second calculation module 711 is configured to calculate the first magnitude of the angle of the training distorted document image according to the first angle offset value pair;
  • a third calculation module 712 configured to calculate a second magnitude of the training warped document image angle according to the second angle offset value pair
  • a generating module 713 configured to generate a predicted angle of the training distorted document image according to the first magnitude and the second magnitude of the training distorted document image angle;
  • the fourth input module 714 is configured to input the predicted angle into the shape network model, so as to obtain the marked warped three-dimensional coordinates and predicted target information in the training warped document image;
  • the second training module 715 is configured to train the shape network model according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training warped document image and the true value of the angle, so as to obtain a trained shape network model.
  • the document image to be corrected includes four corner points
  • the first calculation module 73 includes:
  • the first calculation unit 731 is configured to calculate the horizontal coordinates of the two-dimensional forward map according to the upper left corner point, the upper right corner point of the document image to be corrected and the width of the document image to be corrected;
  • the second calculation unit 732 is configured to calculate the longitudinal coordinates of the two-dimensional forward map according to the lower left corner point, the lower right corner point of the document image to be corrected and the height of the document image to be corrected;
  • a generating unit 733 configured to generate a forward map according to the horizontal coordinate and the vertical coordinate.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure.
  • Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • As shown in FIG. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803.
  • The RAM 803 can also store various programs and data necessary for the operation of the device 800.
  • the computing unit 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An I/O (Input/Output, input/output) interface 805 is also connected to the bus 804 .
  • Multiple components of the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disk; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units that run machine learning model algorithms, a DSP (Digital Signal Processor), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 801 executes various methods and processes described above, such as a correction method of a document image. For example, in some embodiments, the method for rectifying a document image may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 808 .
  • part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809 .
  • When a computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the methods described above may be performed.
  • the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute the aforementioned method for rectifying a document image.
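  • Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuitry, an FPGA (Field Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), an ASSP (Application Specific Standard Product), an SOC (System on Chip), a CPLD (Complex Programmable Logic Device), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, may receive data and instructions from a storage system, at least one input device, and at least one output device, and may transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.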
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read-Only Memory), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), the Internet, and blockchain networks.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS ("Virtual Private Server") services.
  • the server can also be a server of a distributed system, or a server combined with a blockchain.
  • artificial intelligence is a discipline that studies the use of computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.), including both hardware-level technology and software-level technology.
  • Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technology.
  • steps may be reordered, added or deleted using the various forms of flow shown above.
  • each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

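The present disclosure provides a document image correction method and apparatus, an electronic device, and a storage medium. It relates to the field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as image processing and image recognition. A document image to be corrected is input into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected; the distorted three-dimensional coordinates are input into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates; a corresponding two-dimensional forward map is calculated from the corrected three-dimensional coordinates and the corner points of the document image to be corrected; a two-dimensional backward map is obtained by interpolating the two-dimensional forward map, and a corrected document image is generated from the two-dimensional backward map. By using deep learning to remove deformation from a single document image to be corrected, the local distortion rate of the document image and the OCR character error rate can be reduced.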

Description

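Document image correction method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110945049.4, filed on August 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning, can be applied to scenarios such as image processing and image recognition, and specifically relates to a document image correction method and apparatus, an electronic device, and a storage medium.
Background
With the rapid growth in the number of mobile camera terminals, casual photography has become a common way to record documents digitally, and Optical Character Recognition (OCR) technology allows documents to be analyzed and recognized in the form of images, automating text recognition to a certain extent.
However, how well a document image is recognized depends closely on its quality, and physical deformation of the paper greatly interferes with the recognition of text images.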
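Summary
The present disclosure provides a document image correction method and apparatus, an electronic device, and a storage medium.
According to one aspect of the present disclosure, a document image correction method is provided, including:
inputting a document image to be corrected into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected;
inputting the distorted three-dimensional coordinates into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates;
calculating a corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected;
obtaining a two-dimensional backward map by interpolating the two-dimensional forward map, and generating a corrected document image from the two-dimensional backward map.
According to another aspect of the present disclosure, a document image correction apparatus is provided, including:
a first input module, configured to input a document image to be corrected into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected;
a second input module, configured to input the distorted three-dimensional coordinates into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates;
a first calculation module, configured to calculate a corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected;
a processing module, configured to obtain a two-dimensional backward map by interpolating the two-dimensional forward map, and to generate a corrected document image from the two-dimensional backward map.
According to another aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of the foregoing aspect.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions cause a computer to perform the method of the foregoing aspect.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method of the foregoing aspect.
With the document image correction method and apparatus, electronic device, and storage medium provided by the present disclosure, a document image to be corrected is input into a shape network model to obtain the corresponding distorted three-dimensional coordinates; the distorted three-dimensional coordinates are input into a corrected coordinate prediction network model to obtain the corresponding corrected three-dimensional coordinates; a corresponding two-dimensional forward map is calculated from the corrected three-dimensional coordinates and the corner points of the document image to be corrected; a two-dimensional backward map is obtained by interpolating the two-dimensional forward map; and a corrected document image is generated from the two-dimensional backward map. By using deep learning to remove deformation from a single document image to be corrected, the local distortion rate of the document image and the OCR character error rate can be reduced.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of this application, nor to limit the scope of this application. Other features of this application will become readily understood from the following description.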
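Brief Description of the Drawings
The drawings are provided for a better understanding of the solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a schematic flowchart of a document image correction method provided by an embodiment of the present disclosure.
FIG. 2 is a schematic flowchart of a training method for the corrected coordinate prediction network model provided by an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of generating the ground-truth coordinate offset provided by an embodiment of the present disclosure.
FIG. 4 is a schematic flowchart of another training method for the corrected coordinate prediction network model provided by an embodiment of the present disclosure.
FIG. 5 is a schematic flowchart of a training method for the shape network model provided by an embodiment of the present disclosure.
FIG. 6 is a schematic structural diagram of a document image correction apparatus provided by an embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of another document image correction apparatus provided by an embodiment of the present disclosure.
FIG. 8 is a schematic block diagram of an exemplary electronic device provided by an embodiment of the present disclosure.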
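Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
The document image correction method and apparatus, electronic device, and storage medium of the embodiments of the present disclosure are described below with reference to the drawings.
In the related art, document image deformation is corrected by 3D shape reconstruction, for example by obtaining a 3D model with a projector system, or by capturing a 3D model of the deformed document with a dedicated laser rangefinder, and then restoring the shape according to the physical properties of the paper. This approach depends heavily on physical equipment and demands high equipment accuracy. When building the 3D model, the stability and accuracy of the equipment are hard to control, so the resulting 3D model is unstable, which directly leads to poor correction quality and a high distortion rate.
In the present disclosure, to avoid the poor correction quality and high distortion rate caused by unstable physical equipment, two deep learning models, namely the shape network model and the corrected coordinate prediction network model, are first used to convert the two-dimensional image to be corrected into a three-dimensional image and to correct the coordinates of that three-dimensional image. Next, the corrected three-dimensional coordinates and the corner points of the document image to be corrected are mapped to a two-dimensional forward map. Finally, a backward map is obtained by interpolation, and the corrected image is generated from the document image to be corrected (the distorted image) by bilinear interpolation according to the backward map.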
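FIG. 1 is a schematic flowchart of a document image correction method provided by an embodiment of the present disclosure.
As shown in FIG. 1, the method includes the following steps:
Step 101: input the document image to be corrected into the shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected.
The shape network model predicts, pixel by pixel, the 3D coordinates of the distorted image, that is, the document image to be corrected. The document image to be corrected is a two-dimensional image; after it is input into the shape network model, it is mapped to the corresponding distorted 3D coordinates.
Step 102: input the distorted three-dimensional coordinates into the corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates.
The output of step 101 is the input of step 102: the distorted 3D coordinates output by the shape network model are input into the corrected coordinate prediction network model, which converts them into the corrected 3D coordinates of the distorted image. The corrected coordinate prediction network model predicts the coordinate offset from the distorted 3D coordinates to the corrected 3D coordinates; this offset describes the transformation from the distorted 3D coordinates of the document image to be corrected to its corrected 3D coordinates.
In practice, the corrected coordinate prediction network model may rely on a DenseNet-based model as the mapping function that converts the distorted 3D coordinate map into the corrected 3D coordinate map. A differentiable combined objective function is used to jointly optimize the shape network model and the corrected coordinate prediction network model, which improves the conversion accuracy for the document image to be corrected.
Step 103: calculate the corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected.
Typically, the document image to be corrected has four corner points. The corner points referred to in this step are the corrected 3D coordinates corresponding to the corner points of the document image to be corrected in its two-dimensional state. These corrected 3D coordinates are converted into the corresponding 2D coordinates, again pixel by pixel; once the corrected three-dimensional coordinate map has been converted pixel by pixel, the two-dimensional forward map is obtained. A forward map obtained by combining the corner points with the corrected three-dimensional coordinates restores the document image to be corrected well.
The two-dimensional forward map is calculated as follows:
The horizontal coordinate of the two-dimensional forward map is calculated from the top-left corner point and the top-right corner point of the document image to be corrected and the width of the document image to be corrected; the vertical coordinate of the two-dimensional forward map is calculated from the bottom-left corner point and the bottom-right corner point of the document image to be corrected and the height of the document image to be corrected; and the forward map is generated from the horizontal and vertical coordinates.
Assume the four corner points of the corrected image are the top-left corner C1, the bottom-left corner C2, the bottom-right corner C3, and the top-right corner C4, and that the corrected 3D coordinates of these four corner points are set according to how their ground truth is generated. Among the corrected 3D coordinates of the corrected coordinate prediction network model, let the corrected 3D coordinates of an arbitrary point P be (x, y, z) and the 2D coordinates to be calculated in the forward map be (x1, y1). The 2D coordinates corresponding to P are then:
Figure PCTCN2022085587-appb-000001
Figure PCTCN2022085587-appb-000002
where width and height are the configured image width and height.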
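Step 104: obtain a two-dimensional backward map by interpolating the two-dimensional forward map, and generate a corrected document image from the two-dimensional backward map.
Based on the 2D forward map obtained in step 103, the 2D forward map is converted into a 2D backward map by interpolation, and the corrected image is finally recovered using bilinear interpolation.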
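The final resampling in step 104 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation of the disclosure: it assumes a grayscale image stored as a list of rows, and it assumes the backward map already exists and gives, for each output pixel, the (x, y) position to read from in the distorted image. The names `bilinear_sample` and `rectify` are hypothetical, and the interpolation that turns the forward map into the backward map is not shown.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at the fractional position (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp the sampling position to the valid pixel range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend along x on the two bracketing rows, then blend along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(distorted, backward_map):
    """backward_map[v][u] = (x, y): source position in `distorted` for output pixel (u, v)."""
    return [[bilinear_sample(distorted, x, y) for (x, y) in row]
            for row in backward_map]


# Toy data (assumed for illustration only).
distorted = [[0.0, 10.0], [20.0, 30.0]]
identity = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
centre = [[(0.5, 0.5)]]

print(rectify(distorted, identity))  # an identity map returns the input unchanged
print(rectify(distorted, centre))    # the centre sample averages the four pixels
```

A backward map is convenient here because every output pixel is assigned exactly one source position, so the corrected image has no holes; a forward map alone would scatter source pixels onto irregular output positions.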
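In the document image correction method provided by the present disclosure, a document image to be corrected is input into a shape network model to obtain the corresponding distorted three-dimensional coordinates; the distorted three-dimensional coordinates are input into a corrected coordinate prediction network model to obtain the corresponding corrected three-dimensional coordinates; a corresponding two-dimensional forward map is calculated from the corrected three-dimensional coordinates and the corner points of the document image to be corrected; a two-dimensional backward map is obtained by interpolating the two-dimensional forward map; and a corrected document image is generated from the two-dimensional backward map. By using deep learning to remove deformation from a single document image to be corrected, with pixel-by-pixel correction, the local distortion rate of the document image and the OCR character error rate can be reduced.
The shape network model and the corrected coordinate prediction network model are trained separately, and once that training is complete the two best models are trained jointly. During training, the shape network model is first pre-trained on a small amount of data; this stage uses only the predicted target information of the distorted image's 3D coordinates and the pixel-wise difference as constraints. The remaining data and the other constraints are then added and training is repeated until the best model for 3D coordinate prediction is obtained. Likewise, the corrected coordinate prediction network model is first pre-trained on part of the data, using only the ground-truth corrected 3D coordinates of the distorted image as a constraint; the other training data, the gradient constraint, and the normal vector constraint are then added.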
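FIG. 2 shows a flowchart of a training method for the corrected coordinate prediction network model provided by this application. As shown in FIG. 2, the method includes:
Step 201: obtain first training distorted document information, which contains a training distorted document image and the ground-truth coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates.
The first training distorted document information used in the embodiments of this application is the open-source Doc3D synthetic dataset. It contains 2D training distorted document images usable for training, together with, for each image, 3D coordinate annotations, depth map annotations, normal vector annotations, backward map annotations, and so on. Accordingly, the corrected 3D coordinates of the distorted image used in the corrected coordinate prediction network model are also generated from the backward map information provided by the Doc3D dataset.
The ground-truth coordinate offset serves as the constraint information for the corrected coordinate prediction network model. It is generated as follows, as illustrated in FIG. 3: the backward map in the Doc3D dataset is interpolated into a forward map, and Z-axis information is added according to the backward map to obtain the 3D coordinates of the corrected image; the corrected 3D coordinates corresponding to the distorted 3D coordinates are generated from the forward map and the 3D coordinates of the corrected image, and these corrected 3D coordinates serve as the ground-truth coordinate offset.
In one implementation of this application, adding the Z axis is the basis for converting the two-dimensional training distorted document image to 3D. The Z axis may be added by taking the center point of the training distorted document image, forming a normal vector at that center point, and adding the Z axis according to that normal vector. In another implementation of this application, the Z axis may be added using any approach from the related art; this application does not limit the specific implementation.
To make the offset easier to understand, consider an example. The coordinates of the document image to be corrected in the two-dimensional state are absolute values; for example, a point has coordinates (0.8, 0.8, 1.9). When training the corrected coordinate prediction network model, however, the corresponding 3D coordinates are expressed as (1-0.2, 1-0.2, 2-0.1), where 0.2, 0.2, and 0.1 are the offsets of that coordinate. This is only an illustration to aid understanding and does not limit any particular coordinate.
Step 202: input the training distorted document image into the corrected coordinate prediction network model to obtain the predicted coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates.
Step 203: train the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset, so as to obtain a trained corrected coordinate prediction network model.
The difference between the ground-truth coordinate offset and the predicted coordinate offset is converted into a gradient error, and the corrected coordinate prediction network model is trained based on that gradient error. The gradient error describes smoothness: for example, if a point has coordinates (0.2, 0.2, 0.3), a neighboring point might have coordinates (0.22, 0.20, 0.31), rather than something like (1, 0.20, 0.31). Using the gradient error improves the training accuracy of the corrected coordinate prediction network model and therefore the accuracy of the corrected document image.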
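One way to read the gradient error is as a penalty on how quickly the prediction error changes between neighboring pixels. The sketch below is an assumption made for illustration: the disclosure does not give the formula, so the finite differences, the mean absolute value, the single-channel layout, and the name `gradient_error` are all hypothetical choices.

```python
def gradient_error(pred, truth):
    """Mean absolute finite difference of (pred - truth) along x and y.

    `pred` and `truth` are single-channel maps stored as lists of rows.
    """
    h, w = len(pred), len(pred[0])
    diff = [[pred[r][c] - truth[r][c] for c in range(w)] for r in range(h)]
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # horizontal neighbor
                total += abs(diff[r][c + 1] - diff[r][c])
                count += 1
            if r + 1 < h:  # vertical neighbor
                total += abs(diff[r + 1][c] - diff[r][c])
                count += 1
    return total / count if count else 0.0


truth = [[0.0, 0.0], [0.0, 0.0]]
shifted = [[0.5, 0.5], [0.5, 0.5]]   # uniform error: perfectly smooth
outlier = [[0.0, 0.0], [0.0, 1.0]]   # one pixel jumps away from its neighbors

print(gradient_error(shifted, truth))
print(gradient_error(outlier, truth))
```

Under this reading, a uniform error contributes nothing to the gradient term, which is why such a term would be combined with a direct difference between predicted and ground-truth offsets rather than used alone.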
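In a specific implementation, training the corrected coordinate prediction network model is generally an iterative process: the parameters of each network layer are adjusted repeatedly so that the training result converges, which completes the training of the corrected coordinate prediction network model.
For wrinkles and distortion within local patches, constraining smoothness with normal vectors is more suitable. This application therefore uses gradient and normal vector features as enhancements in the corrected coordinate prediction network model, so as to predict the corrected 3D coordinates of the distorted image more accurately.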
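FIG. 4 shows another training method for the corrected coordinate prediction network model provided by this application. Training with this method on top of the method of FIG. 2 makes the corrected 3D coordinates output by the corrected coordinate prediction network model more accurate. As shown in FIG. 4, the method includes:
Step 401: obtain a first pixel in the training distorted document image.
The first pixel is an arbitrary pixel in the training distorted document image; call it pixel A.
Step 402: obtain a second pixel adjacent to the first pixel in the horizontal direction and a third pixel adjacent to the first pixel in the vertical direction; the first, second, and third pixels form a neighborhood.
The second pixel is the pixel immediately to the right of the first pixel, and the third pixel is the pixel immediately below the first pixel; the first, second, and third pixels form a neighborhood.
Step 403: compute the normal vector of the center pixel within the neighborhood.
Step 404: compute the corrected three-dimensional coordinate normal vector from the vectors formed by the three-dimensional coordinates of any two of the first, second, and third pixels.
The corrected three-dimensional coordinate normal vector is computed with the following formula:
Figure PCTCN2022085587-appb-000003
where $P_A$ denotes the 3D coordinates of point A,
Figure PCTCN2022085587-appb-000004
denotes the vector pointing from the 3D coordinates of point A to the 3D coordinates of point B, and
Figure PCTCN2022085587-appb-000005
denotes the vector pointing from the 3D coordinates of point A to the 3D coordinates of point C.
Step 405: compute the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector.
When the normal vector is added as a training constraint, the cross product between the normal vector of each neighboring pixel and the normal vector at the center pixel is computed within a neighborhood. The goal is for the normal vectors within the neighborhood to be parallel, that is, for the cross product to be 0. The cross-product formula is as follows:
Figure PCTCN2022085587-appb-000006
where $J$ denotes the traversal over all pixels and $K_j$ denotes a neighborhood of pixel $j$.
Step 406: train the corrected coordinate prediction network model according to the difference between the cross-product result and the target cross-product result.
Ideally the target cross-product result is 0, which means the document image is completely flat; this also makes training harder. In practice, the target cross-product result may be set to 0.01, 0.02, or a similar value, depending on the specific application scenario; this embodiment does not limit it.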
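The normal vector constraint in steps 401 to 406 can be sketched in a few lines. The formulas in the disclosure are images that are not reproduced here, so this sketch assumes the natural reading of the surrounding text: the normal at pixel A is the cross product of the vectors from A to its right neighbor B and from A to its lower neighbor C, and the error is how far the magnitude of the cross product of two normals is from the target. The function names and the use of the vector norm are hypothetical.

```python
import math


def sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def normal(p_a, p_b, p_c):
    """Normal at A from the vectors A->B (right neighbor) and A->C (lower neighbor)."""
    return cross(sub(p_b, p_a), sub(p_c, p_a))


def parallelism_error(n_centre, n_neighbor, target=0.0):
    """Distance between |n_centre x n_neighbor| and the target cross-product result."""
    return abs(norm(cross(n_centre, n_neighbor)) - target)


# Two patches on a flat sheet, and one patch whose right neighbor is lifted.
n_flat_a = normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
n_flat_b = normal((1, 0, 0), (2, 0, 0), (1, 1, 0))
n_fold = normal((0, 0, 0), (1, 0, 1), (0, 1, 0))

print(parallelism_error(n_flat_a, n_flat_b))  # parallel normals
print(parallelism_error(n_flat_a, n_fold))    # normals diverge at the fold
```

Because the cross product of parallel vectors is the zero vector, a flat region gives zero error for a target of 0, while a fold or wrinkle gives a positive value.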
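Training the corrected coordinate prediction network model with normal vectors is generally an iterative process: the parameters of each network layer are adjusted repeatedly so that the training result converges, which completes the training of the corrected coordinate prediction network model.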
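FIG. 5 shows a flowchart of a training method for the shape network model provided by an embodiment of this application. As shown in FIG. 5, the method includes:
Step 501: obtain second training distorted document information, which includes a training distorted document image and sample target information of the training distorted document image; the training distorted document image contains an image with an annotated two-dimensional coordinate system and an annotated ground-truth angle.
The second training distorted document information used in the embodiments of this application is the open-source Doc3D synthetic dataset. It contains 2D training distorted document images usable for training, each of which contains an image with an annotated two-dimensional coordinate system and an annotated ground-truth angle.
Computing the ground-truth angle requires the backward map, that is, the mapping from the distorted image to the corrected image. The purpose of using the ground-truth angle as a constraint is to keep the angle between the distorted image (the map formed by the distorted 3D coordinates) and the predicted image (the map formed by the corrected 3D coordinates) consistent with the angle between the distorted image and the ground truth.
Step 502: based on the X axis of the two-dimensional coordinate system, generate a first angle offset value pair in the X-axis direction and the Y-axis direction.
Step 503: based on the Y axis of the two-dimensional coordinate system, generate a second angle offset value pair in the X-axis direction and the Y-axis direction.
The shape network model predicts four auxiliary channels: the first angle offset value pair, generated for the X axis in the X-axis and Y-axis directions, and the second angle offset value pair, generated for the Y axis in the X-axis and Y-axis directions. They are denoted $(\varphi_{xx}, \varphi_{xy}, \varphi_{yx}, \varphi_{yy})$, where $\varphi_{xy}$ is the predicted offset of the X axis in the Y direction, and so on.
Step 504: calculate the first magnitude of the angle of the training distorted document image from the first angle offset value pair.
Step 505: calculate the second magnitude of the angle of the training distorted document image from the second angle offset value pair.
Step 506: generate the predicted angle of the training distorted document image from the first magnitude and the second magnitude of the angle of the training distorted document image.
For each axis, the angle $\theta_i$ ($i \in \{x, y\}$) and its magnitude $\rho_i$ are computed as:
$\theta_i = \operatorname{arctan2}(\varphi_{ix}, \varphi_{iy})$
$\rho_i = \lVert (\varphi_{ix}, \varphi_{iy}) \rVert_2$
where "arctan2" is the four-quadrant variant of the arctangent operator (also known as "atan2").
The values computed in step 506 are used in the angle loss.
Figure PCTCN2022085587-appb-000007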
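The per-axis angle and magnitude from steps 504 to 506 map directly onto two standard library calls. The sketch below is illustrative only: the function name is hypothetical, the argument order of `atan2` simply follows the order written in the text, and the angle loss itself is not implemented because its formula is an image that is not reproduced here.

```python
import math


def angle_and_magnitude(phi_ix, phi_iy):
    """Return (theta_i, rho_i) for one axis i from its offset pair.

    theta_i uses the four-quadrant arctangent with the arguments in the
    order given in the text; rho_i is the Euclidean (L2) norm of the pair.
    """
    theta = math.atan2(phi_ix, phi_iy)
    rho = math.hypot(phi_ix, phi_iy)
    return theta, rho


# The four auxiliary channels at one pixel (values assumed for illustration).
phi_xx, phi_xy, phi_yx, phi_yy = 3.0, 4.0, 1.0, 1.0

theta_x, rho_x = angle_and_magnitude(phi_xx, phi_xy)
theta_y, rho_y = angle_and_magnitude(phi_yx, phi_yy)
print(theta_x, rho_x)
print(theta_y, rho_y)
```

Note that `math.atan2` treats its first argument as the sine-like component, so swapping the two offsets changes the angle; which channel plays which role is an assumption taken from the written order.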
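Step 507: input the predicted angle into the shape network model to obtain the annotated distorted three-dimensional coordinates and the predicted target information in the training distorted document image.
Step 508: train the shape network model according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training distorted document image and the ground-truth angle, so as to obtain a trained shape network model.
Training the shape network model is generally an iterative process: the parameters of each network layer are adjusted repeatedly so that the training result converges, which completes the training of the shape network model.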
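FIG. 6 is a schematic structural diagram of a document image correction apparatus provided by an embodiment of the present disclosure. As shown in FIG. 6, the apparatus includes a first input module 61, a second input module 62, a first calculation module 63, and a processing module 64.
The first input module 61 is configured to input a document image to be corrected into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected.
The second input module 62 is configured to input the distorted three-dimensional coordinates into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates.
The first calculation module 63 is configured to calculate a corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected.
The processing module 64 is configured to obtain a two-dimensional backward map by interpolating the two-dimensional forward map, and to generate a corrected document image from the two-dimensional backward map.
Further, in one possible implementation of this embodiment, as shown in FIG. 7, the apparatus includes a first input module 71, a second input module 72, a first calculation module 73, a processing module 74, a first acquisition module 75, a third input module 76, and a first training module 77. For the first input module 71, the second input module 72, the first calculation module 73, and the processing module 74, refer to the corresponding descriptions of the first input module 61, the second input module 62, the first calculation module 63, and the processing module 64 shown in FIG. 6; they are not repeated here.
The first acquisition module 75 is configured to obtain first training distorted document information, which contains a training distorted document image and the ground-truth coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates.
The third input module 76 is configured to input the training distorted document image into the corrected coordinate prediction network model to obtain the predicted coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates.
The first training module 77 is configured to train the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset, so as to obtain a trained corrected coordinate prediction network model.
Further, in one possible implementation of this embodiment, as shown in FIG. 7, the first training module 77 further includes:
a first acquisition unit 771, configured to obtain a first pixel in the training distorted document image;
a second acquisition unit 772, configured to obtain a second pixel adjacent to the first pixel in the horizontal direction and a third pixel adjacent to the first pixel in the vertical direction;
a forming unit 773, configured to form a neighborhood from the first, second, and third pixels;
a third acquisition unit 774, configured to compute the normal vector of the center pixel within the neighborhood;
a first calculation unit 775, configured to compute the corrected three-dimensional coordinate normal vector from the vectors formed by the three-dimensional coordinates of any two of the first, second, and third pixels;
a second calculation unit 776, configured to compute the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector;
a first training unit 777, configured to train the corrected coordinate prediction network model according to the difference between the cross-product result and the target cross-product result.
Further, in one possible implementation of this embodiment, as shown in FIG. 7, the first training module 77 further includes:
a conversion unit 778, configured to convert the difference between the ground-truth coordinate offset and the predicted coordinate offset into a gradient error;
a second training unit 779, configured to train the corrected coordinate prediction network model based on the gradient error.
Further, in one possible implementation of this embodiment, as shown in FIG. 7, the apparatus further includes:
a second acquisition module 78, configured to obtain second training distorted document information, which includes a training distorted document image and sample target information of the training distorted document image, the training distorted document image containing an image with an annotated two-dimensional coordinate system and an annotated ground-truth angle;
a first generation module 79, configured to generate, based on the X axis of the two-dimensional coordinate system, a first angle offset value pair in the X-axis direction and the Y-axis direction;
a second generation module 710, configured to generate, based on the Y axis of the two-dimensional coordinate system, a second angle offset value pair in the X-axis direction and the Y-axis direction;
a second calculation module 711, configured to calculate the first magnitude of the angle of the training distorted document image from the first angle offset value pair;
a third calculation module 712, configured to calculate the second magnitude of the angle of the training distorted document image from the second angle offset value pair;
a generation module 713, configured to generate the predicted angle of the training distorted document image from the first magnitude and the second magnitude of the angle of the training distorted document image;
a fourth input module 714, configured to input the predicted angle into the shape network model to obtain the annotated distorted three-dimensional coordinates and the predicted target information in the training distorted document image;
a second training module 715, configured to train the shape network model according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training distorted document image and the ground-truth angle, so as to obtain a trained shape network model.
Further, in one possible implementation of this embodiment, as shown in FIG. 7, the document image to be corrected contains four corner points, and the first calculation module 73 includes:
a first calculation unit 731, configured to calculate the horizontal coordinate of the two-dimensional forward map from the top-left corner point and the top-right corner point of the document image to be corrected and the width of the document image to be corrected;
a second calculation unit 732, configured to calculate the vertical coordinate of the two-dimensional forward map from the bottom-left corner point and the bottom-right corner point of the document image to be corrected and the height of the document image to be corrected;
a generation unit 733, configured to generate the forward map from the horizontal coordinate and the vertical coordinate.
It should be noted that the foregoing explanation of the method embodiments also applies to the apparatus of this embodiment; the principle is the same and is not repeated here.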
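According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.
Multiple components of the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disk; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units that run machine learning model algorithms, a DSP (Digital Signal Processor), and any appropriate processor, controller, microcontroller, etc. The computing unit 801 performs the methods and processes described above, such as the document image correction method. For example, in some embodiments, the document image correction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the aforementioned document image correction method in any other appropriate way (for example, by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuitry, an FPGA (Field Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), an ASSP (Application Specific Standard Product), an SOC (System on Chip), a CPLD (Complex Programmable Logic Device), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, may receive data and instructions from a storage system, at least one input device, and at least one output device, and may transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read-Only Memory), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (for example, a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), the Internet, and blockchain networks.
A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technology.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed here.
The specific embodiments above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.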

Claims (15)

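  1. A document image correction method, including:
    inputting a document image to be corrected into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected;
    inputting the distorted three-dimensional coordinates into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates;
    calculating a corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected;
    obtaining a two-dimensional backward map by interpolating the two-dimensional forward map, and generating a corrected document image from the two-dimensional backward map.
  2. The document image correction method according to claim 1, wherein the method further includes:
    obtaining first training distorted document information, which contains a training distorted document image and the ground-truth coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates;
    inputting the training distorted document image into the corrected coordinate prediction network model to obtain the predicted coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates;
    training the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset, so as to obtain a trained corrected coordinate prediction network model.
  3. The document image correction method according to claim 2, wherein training the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset further includes:
    obtaining a first pixel in the training distorted document image;
    obtaining a second pixel adjacent to the first pixel in the horizontal direction and a third pixel adjacent to the first pixel in the vertical direction;
    forming a neighborhood from the first, second, and third pixels;
    computing the normal vector of the center pixel within the neighborhood;
    computing the corrected three-dimensional coordinate normal vector from the vectors formed by the three-dimensional coordinates of any two of the first, second, and third pixels;
    computing the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector;
    training the corrected coordinate prediction network model according to the difference between the cross-product result and the target cross-product result.
  4. The document image correction method according to claim 2, wherein training the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset includes:
    converting the difference between the ground-truth coordinate offset and the predicted coordinate offset into a gradient error;
    training the corrected coordinate prediction network model based on the gradient error.
  5. The document image correction method according to any one of claims 1 to 4, wherein the method further includes:
    obtaining second training distorted document information, which includes a training distorted document image and sample target information of the training distorted document image, the training distorted document image containing an image with an annotated two-dimensional coordinate system and an annotated ground-truth angle;
    generating, based on the X axis of the two-dimensional coordinate system, a first angle offset value pair in the X-axis direction and the Y-axis direction;
    generating, based on the Y axis of the two-dimensional coordinate system, a second angle offset value pair in the X-axis direction and the Y-axis direction;
    calculating the first magnitude of the angle of the training distorted document image from the first angle offset value pair;
    calculating the second magnitude of the angle of the training distorted document image from the second angle offset value pair;
    generating the predicted angle of the training distorted document image from the first magnitude and the second magnitude of the angle of the training distorted document image;
    inputting the predicted angle into the shape network model to obtain the annotated distorted three-dimensional coordinates and the predicted target information in the training distorted document image;
    training the shape network model according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training distorted document image and the ground-truth angle, so as to obtain a trained shape network model.
  6. The document image correction method according to any one of claims 1 to 5, wherein the document image to be corrected contains four corner points, and calculating the corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected includes:
    calculating the horizontal coordinate of the two-dimensional forward map from the top-left corner point and the top-right corner point of the document image to be corrected and the width of the document image to be corrected;
    calculating the vertical coordinate of the two-dimensional forward map from the bottom-left corner point and the bottom-right corner point of the document image to be corrected and the height of the document image to be corrected;
    generating the forward map from the horizontal coordinate and the vertical coordinate.
  7. A document image correction apparatus, including:
    a first input module, configured to input a document image to be corrected into a shape network model to obtain the distorted three-dimensional coordinates corresponding to the document image to be corrected;
    a second input module, configured to input the distorted three-dimensional coordinates into a corrected coordinate prediction network model to obtain the corrected three-dimensional coordinates corresponding to the distorted three-dimensional coordinates;
    a first calculation module, configured to calculate a corresponding two-dimensional forward map from the corrected three-dimensional coordinates and the corner points of the document image to be corrected;
    a processing module, configured to obtain a two-dimensional backward map by interpolating the two-dimensional forward map, and to generate a corrected document image from the two-dimensional backward map.
  8. The document image correction apparatus according to claim 7, wherein the apparatus further includes:
    a first acquisition module, configured to obtain first training distorted document information, which contains a training distorted document image and the ground-truth coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates;
    a third input module, configured to input the training distorted document image into the corrected coordinate prediction network model to obtain the predicted coordinate offset from the distorted three-dimensional coordinates of the training distorted document image to its corrected three-dimensional coordinates;
    a first training module, configured to train the corrected coordinate prediction network model according to the difference between the ground-truth coordinate offset and the predicted coordinate offset, so as to obtain a trained corrected coordinate prediction network model.
  9. The document image correction apparatus according to claim 8, wherein the first training module further includes:
    a first acquisition unit, configured to obtain a first pixel in the training distorted document image;
    a second acquisition unit, configured to obtain a second pixel adjacent to the first pixel in the horizontal direction and a third pixel adjacent to the first pixel in the vertical direction;
    a forming unit, configured to form a neighborhood from the first, second, and third pixels;
    a third acquisition unit, configured to compute the normal vector of the center pixel within the neighborhood;
    a first calculation unit, configured to compute the corrected three-dimensional coordinate normal vector from the vectors formed by the three-dimensional coordinates of any two of the first, second, and third pixels;
    a second calculation unit, configured to compute the cross product of the center pixel normal vector and the corrected three-dimensional coordinate normal vector;
    a first training unit, configured to train the corrected coordinate prediction network model according to the difference between the cross-product result and the target cross-product result.
  10. The document image correction apparatus according to claim 8, wherein the first training module further includes:
    a conversion unit, configured to convert the difference between the ground-truth coordinate offset and the predicted coordinate offset into a gradient error;
    a second training unit, configured to train the corrected coordinate prediction network model based on the gradient error.
  11. The document image correction apparatus according to any one of claims 7 to 10, wherein the apparatus further includes:
    a second acquisition module, configured to obtain second training distorted document information, which includes a training distorted document image and sample target information of the training distorted document image, the training distorted document image containing an image with an annotated two-dimensional coordinate system and an annotated ground-truth angle;
    a first generation module, configured to generate, based on the X axis of the two-dimensional coordinate system, a first angle offset value pair in the X-axis direction and the Y-axis direction;
    a second generation module, configured to generate, based on the Y axis of the two-dimensional coordinate system, a second angle offset value pair in the X-axis direction and the Y-axis direction;
    a second calculation module, configured to calculate the first magnitude of the angle of the training distorted document image from the first angle offset value pair;
    a third calculation module, configured to calculate the second magnitude of the angle of the training distorted document image from the second angle offset value pair;
    a generation module, configured to generate the predicted angle of the training distorted document image from the first magnitude and the second magnitude of the angle of the training distorted document image;
    a fourth input module, configured to input the predicted angle into the shape network model to obtain the annotated distorted three-dimensional coordinates and the predicted target information in the training distorted document image;
    a second training module, configured to train the shape network model according to the difference between the sample target information and the predicted target information, and the difference between the predicted angle of the training distorted document image and the ground-truth angle, so as to obtain a trained shape network model.
  12. The document image correction apparatus according to any one of claims 7 to 11, wherein the document image to be corrected contains four corner points, and the first calculation module includes:
    a first calculation unit, configured to calculate the horizontal coordinate of the two-dimensional forward map from the top-left corner point and the top-right corner point of the document image to be corrected and the width of the document image to be corrected;
    a second calculation unit, configured to calculate the vertical coordinate of the two-dimensional forward map from the bottom-left corner point and the bottom-right corner point of the document image to be corrected and the height of the document image to be corrected;
    a generation unit, configured to generate the forward map from the horizontal coordinate and the vertical coordinate.
  13. An electronic device, including:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to perform the method according to any one of claims 1-6.
  15. A computer program product, including a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-6.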
PCT/CN2022/085587 2021-08-17 2022-04-07 Document image correction method and apparatus, electronic device, and storage medium WO2023019974A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110945049.4A CN113792730B (zh) 2021-08-17 2021-08-17 Document image correction method and apparatus, electronic device, and storage medium
CN202110945049.4 2021-08-17

Publications (1)

Publication Number Publication Date
WO2023019974A1 true WO2023019974A1 (zh) 2023-02-23

Family

ID=78876141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085587 WO2023019974A1 (zh) 2021-08-17 2022-04-07 Document image correction method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113792730B (zh)
WO (1) WO2023019974A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
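CN113792730B (zh) * 2021-08-17 2022-09-27 北京百度网讯科技有限公司 Document image correction method and apparatus, electronic device, and storage medium
CN114155546B (zh) * 2022-02-07 2022-05-20 北京世纪好未来教育科技有限公司 Image correction method and apparatus, electronic device, and storage medium
CN114937271B (zh) * 2022-05-11 2023-04-18 中维建通信技术服务有限公司 Intelligent entry and proofreading method for communication data
CN115187995B (zh) * 2022-07-08 2023-04-18 北京百度网讯科技有限公司 Document correction method and apparatus, electronic device, and storage medium
CN115760620B (zh) * 2022-11-18 2023-10-20 荣耀终端有限公司 Document correction method and apparatus, and electronic device
CN115984856A (zh) * 2022-12-05 2023-04-18 百度(中国)有限公司 Training method for a document image correction model, and document image correction method
CN115641280B (zh) * 2022-12-16 2023-03-17 摩尔线程智能科技(北京)有限责任公司 Image correction method and apparatus, electronic device, and storage medium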

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170076169A1 (en) * 2011-10-17 2017-03-16 Sharp Laboratories of America (SLA), Inc. System and Method for Scanned Document Correction
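CN108921804A (zh) * 2018-07-04 2018-11-30 苏州大学 Method for correcting distorted document images
CN112509106A (zh) * 2020-11-17 2021-03-16 科大讯飞股份有限公司 Document picture flattening method, apparatus, and device
CN113255664A (zh) * 2021-05-26 2021-08-13 北京百度网讯科技有限公司 Image processing method, related apparatus, and computer program product
CN113792730A (zh) * 2021-08-17 2021-12-14 北京百度网讯科技有限公司 Document image correction method and apparatus, electronic device, and storage medium
CN114255337A (zh) * 2021-11-03 2022-03-29 北京百度网讯科技有限公司 Document image correction method and apparatus, electronic device, and storage medium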

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
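KR100546646B1 (ko) * 2003-04-22 2006-01-26 엘지전자 주식회사 Screen distortion correction method and apparatus
CN102801894B (zh) * 2012-07-18 2014-10-01 天津大学 Method for flattening deformed book pages
CN109688392B (zh) * 2018-12-26 2021-11-02 联创汽车电子有限公司 AR-HUD optical projection system, mapping relationship calibration method, and distortion correction method
CN110059691B (zh) * 2019-03-29 2022-10-14 南京邮电大学 Mobile-terminal-based geometric correction method for multi-view distorted document images
CN111832371A (zh) * 2019-04-23 2020-10-27 珠海金山办公软件有限公司 Text picture correction method and apparatus, electronic device, and machine-readable storage medium
CN111260586B (zh) * 2020-01-20 2023-07-04 北京百度网讯科技有限公司 Method and apparatus for correcting distorted document images
CN113160294B (zh) * 2021-03-31 2022-12-23 中国科学院深圳先进技术研究院 Image scene depth estimation method and apparatus, terminal device, and storage medium
CN113034406B (zh) * 2021-04-27 2024-05-14 中国平安人寿保险股份有限公司 Distorted document restoration method, apparatus, device, and medium
CN113205090B (zh) * 2021-04-29 2023-10-24 北京百度网讯科技有限公司 Picture correction method and apparatus, electronic device, and computer-readable storage medium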

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DAS SAGNIK; MA KE; SHU ZHIXIN; SAMARAS DIMITRIS; SHILKROT ROY: "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 131 - 140, XP033723931, DOI: 10.1109/ICCV.2019.00022 *

Also Published As

Publication number Publication date
CN113792730B (zh) 2022-09-27
CN113792730A (zh) 2021-12-14

Similar Documents

Publication Publication Date Title
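WO2023019974A1 (zh) Document image correction method and apparatus, electronic device, and storage medium
US11854118B2 (en) Method for training generative network, method for generating near-infrared image and device
JP7373554B2 (ja) Cross-domain image translation
WO2015139574A1 (zh) Static object reconstruction method and system
WO2019024808A1 (zh) Training method and apparatus for a semantic segmentation model, electronic device, and storage medium
CN111768477B (zh) Method and apparatus for establishing three-dimensional facial expression bases, storage medium, and electronic device
US10521919B2 (en) Information processing device and information processing method for applying an optimization model
CN112771573A (zh) Speckle-image-based depth estimation method and apparatus, and face recognition system
US10318102B2 (en) 3D model generation from 2D images
US20210312650A1 (en) Method and apparatus of training depth estimation network, and method and apparatus of estimating depth of image
US20220358675A1 (en) Method for training model, method for processing video, device and storage medium
JP2017182302A (ja) Image processing program, image processing apparatus, and image processing method
KR20240002898A (ko) Method and apparatus for training a 3D face reconstruction model, and method and apparatus for generating a 3D face shape
JP7390445B2 (ja) Training method for a character positioning model, and character positioning method
KR20210039996A (ko) Hairstyle transformation method, apparatus, device, and storage medium
WO2023019995A1 (zh) Training method, translation display method, apparatus, electronic device, and storage medium
CN113255664B (zh) Image processing method, related apparatus, and computer program product
CN115984856A (zh) Training method for a document image correction model, and document image correction method
US20230162413A1 (en) Stroke-Guided Sketch Vectorization
CN113205090B (zh) Picture correction method and apparatus, electronic device, and computer-readable storage medium
CN113766117B (zh) Video de-jittering method and apparatus
CN113850714A (zh) Training of an image style transfer model, image style transfer method, and related apparatus
US11783501B2 (en) Method and apparatus for determining image depth information, electronic device, and media
CN116030150B (zh) Avatar generation method and apparatus, electronic device, and medium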
WO2024007968A1 (en) Methods and system for generating an image of a human

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22857292

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE